SAS Proc Transpose – how have I not written about this before?

When I was young and knew everything, I would frequently see procedures or statistics and think, “When am I ever going to use THAT?” That was my thought when I learned about this new procedure to transpose a data set. (It was new then. Keep in mind, I learned SAS when I was pregnant with my first child. She is now CEO of a an educational game company and the mother of three children. )

PROC TRANSPOSE is super-useful. You might only think it is useful for transforming data for use with PROC GLM to use with PROC MIXED, or you might have no idea what the hell that means and it is still super-useful.

Let me give you today’s example. I’m looking for data to use in a biostatistics class I’m teaching next month. It’s a small data set, with data on eight states included in the Center for Disease Control’s Autism and Developmental Disabilities Monitoring Network.

The data looks like this:

As you can see, each state is a column. I would like to know, for example, what percentage of people with autism also have a physical disability. There is a way to do it by finding the mean across variables but I want to use this data set for a few examples and it would be much easier for me if each of those categories was a variable.

The code is super simple:

PROC TRANSPOSE DATA=mydata.autism OUT=mydata.autism2 NAME=state;
ID eligibility ;

The NAME = option is not required nor is the ID statement but they will make your life easier. First, let’s take a look at our new data.

Now, instead of state being a variable, we have one record for each state, the percent with autism diagnosis only is one variable, percent with emotional disturbance another, and so on. What the NAME = option does is give a name to that new variable which was the name of each column. If you don’t use that option, the first column would be named _name_ . Now, with these data it would still be pretty obvious that this variable is the state but in some cases it wouldn’t be obvious at all.

The ID statement is really necessary in this case because otherwise each column is going to be named “COL1”, “COL2” etc. Personally, I found the ID statement here confusing because normally the ID statement I think of as the individual ID for each record, like a social security number or student ID. In this case, the variable name you give in the ID statement is going to be used to name the variables. So, as you can see above, the first column is named Autism(%), the second is named Emotional Disturbance (%) and so on.

So, that’s it. All I need to do to get means, standard deviation, minimum and maximum is :

PROC MEANS DATA =mydata.autism2;

So, that’s it.

By the way, I get this data set and a few others from SAS Curriculum Pathways. Nice source for small data sets to start off a course.

I live in opposite world, where my day job is making games and I teach statistics and write about programming for fun. You can check out our games here. You’re probably already pretty good with division but you’ll learn about the Lakota language and culture with Making Camp Lakota. A bilingual (English-Lakota) game that teaches math.

feather

Categorical Data Analysis is even more fun than it sounds

ByAnnMaria De Mars June 17, 2011

I was very happy to get an acceptance letter from WUSS telling me that my class proposal for the 2011 conference was accepted. I’m usually happy to get acceptance letters in general but this was particularly nice since I have gotten the first four chapters of my naked mole rat book done and the next…

statistics

Predictor variables – when order does and does not matter

ByAnnMaria De Mars June 27, 2012

Having failed recently to use BMI as a variable from a data set on school children in our example for propensity score matching, because people who fill out surveys are big, fat liars, we next went to a sample of really old people and used death as a dependent variable Try faking it on the…

Algebra | statistics

Logarithms

ByAnnMaria De Mars February 24, 2009February 24, 2009

Logistic regression is based on logarithms. Ordinary Least Squares regression and analysis of variance uses the actual values as the dependent and independent variables in an equation. Logistic regression does not. What is a log, anyway? Let’s start with the very basics. First we learned to add: 5+5+5+5 = 20 After about eight years of…

Software | Technology

SAS Global Forum started out as planned …

ByAnnMaria De Mars April 9, 2018April 9, 2018

The first time I went to SAS Global Forum, over 30 years ago, it was actually called SUGI (SAS Users Group International) and it was in Reno, NV. I was a just-divorced single mom and there was no such thing as a Working Mothers Room (which I noticed signs for here in Denver). I paid…

statistics

Yes, You Totally CAN Understand Model Fit Statistics, with M & M’s

ByAnnMaria De Mars October 15, 2014

Ever wonder why with goodness of fit tests non- significance is what you want? Why is that sometimes when you have a significant p-value it means your hypothesis is correct, there is a relationship between the price of honey and the number of bees, and in other cases, significance means your model is rejected? Well,…

Software

Changing your path with pointing and clicking in SAS Enterprise Guide

ByAnnMaria De Mars October 6, 2013October 6, 2013

I am aware that the world is full of people who don’t know what a C-prompt is and who are unfamiliar with the concept of a path. Suppose you are one of those people (and no, I’m not judging you As Will Rogers said, “Everyone is ignorant, just on different subjects. He was a pretty…

Similar Posts

Leave a Reply