|

Random SAS tips: Mixed data types

This week I had one of those pain-in-the-ass problems. I had a test with 24 items but they were of mixed types. That is, for some the answer was multiple choice and for others it was numbers.

The data was received as an Excel file.

Now, I could have opened it with SAS Enterprise Guide and specified data types for each variable, but the problem is, I am going to get this particular data set over and over, so I want code I can write once and run every time.

As if that wasn’t bad enough, the variables all had names like:

which_choice_is_the_same_as_the_

I wanted to rename these all to something sane like q1, q2 etc.

The first step was an option I don’t think I’ve ever used before, oddly enough.

 

proc contents data= annoying  varnum ;

 

Normally, SAS gives you the variables in a data set in alphabetic order when you do a PROC CONTENTS. The varnum option lists the variables in the order they appear on the data set. This was immensely helpful because it spared me going through the data trying to figure which was the first question, which was the second, and so on.

I just copied the variables in order after a RENAME statement and tacked on an =q1, q2, etc. like so

Data better ;

set annoying ;

rename

which_choice_is_the_same_as_the_ = q1
what_is_five_time_six__ = q2

etc.

proc contents data= better ;

I could have combined this with the previous step , but the fact is that unless the data set is really gigantic, the time that needs to be preserved is not computer processing time but my time, and this way was quicker because I didn’t have to write out those ridiculous variable names and worry about the program failing because I used  _ in the name instead of __ .

SAS does have a function to detect variable type, but that wouldn’t really have helped me because I still need to write all of these variables into a single array of item1 – item24 for later use, and you cannot have mixed type arrays. So, I did this

 

data mo_better ;
set better ;
array qs{*} q2 – q6 q10 q12 q14 -q16 q19- q21 ;
array itemN {*} $12 item2 – item6 item10 item12 item14 – item16 item19 – item21 ;
array qsA {*} $12 q1 q7-q9 q11 q13 q17 q18 q22- q24 ;
array itemA {*} $12 item1 item7 – item9 item11 item13 item17 item18 item22- item24 ;
do i = 1 to dim(qs) ;
 itemN{i} = put(qs{i},12.) ;
end ;
do j = 1 to dim(qsa) ;
 itemA{j} = qsa{j} ;
end ;
drop i j q1 – q24 ;

I have 4 arrays. The first consists of the numeric variable type questions. I couldn’t use _numeric_ to create an array of all numeric variables because there were others in the data set that were NOT test questions but were numeric and I did not want them in my array. I had to actually list each variable individually or in a range like q14-q16.

The next array is the one I am going to recode the variables into as character variables. Notice that character arrays need a $ and a length. The next two arrays are the character variables and the variables I’m going to copy them into. I could have just renamed the character variables in a RENAME statement and then changed the length in an ATTRIB statement  but  it would have taken more typing.

The DIM function is the dimension of the array, so it is going to loop through from 1 to however many variables in the array because I didn’t feel like counting them.

The PUT function is going to put this numeric variable into a new character variable with the specified length. It changes the variable to character.

The next loop just puts all the character variables into other character variables with the names item1, item7, etc. Now I have variables that are all the same type and length, named item1 – item24 and I can do things with them like compare each student’s response to each variable to the answer key, score it right or wrong and sum up the scored items, like this (1ANSWER is the first username)

 

Data in.pre_scored ;
set mo_better ;
by username ;
Array scored {24} sc1 – sc24 ;
Array items {24} $12 item1- item24 ;
Array ans{24} $12 ans1 – ans24 ;
if _n_ = 1 then do i = 1 to 24 ;
ans{i} = items{i} ;
end ;
else do i = 1 to 24 ;
if ans{i} = items{i} then scored{i} = 1 ;
else scored{i} = 0 ;
end ;
Retain ans1 – ans24 ;
total = sum(of sc1-sc24) ;

Since this is part of a two-year grant and I am going to receive these same test data sets many times, I am now finished with reading in and scoring the data for the next two years. After this, I just need to import the excel file and click run. I am happy.

Also curious, because I noted that this year’s pre-test scores are 1.5 standard deviations higher than the previous year.  I suspect this is because we have many more fifth-graders in this sample. So … with the scoring done automatically, I can now go on to interesting stuff.

If you want to check out the game these results came from, you can read about it here

 

Similar Posts

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *