|

Scoring open data with VVALUE : The best function I’d never heard of

Anyone who tells you they know all of SAS is like that creepy guy at the fraternity party who swears that his father is the Duke of Canada, that is, they have a perception of themselves that is not in screaming distance of contact with reality.

Even though I have been using SAS for decades, I am still discovering new functions, tricks, tips and procedures all of the time. In fact, the one I came across today was so helpful, I even cross-posted this on SAScommunity.org

My problem and how I solved it

There are currently over 370,000 datasets on the data.gov site, not to mention the numerous others available from the National Center for Education Statistics and many other open data sources. One problem users of these data often encounter is that the formats used by the creator are not necessarily those desired by the end user. For example, many files have user-defined formats for each individual item such as:

value S9FMT
1 = “A”
2 = “B”
3 = “C”
4 = “D*”
6 = “NOT REACHED”
8 = “NOT ADMIN.”
9 = “OMITTED”
;
value S10FMT
1 = “A”
2 = “B”
3 = “C”
4 = “D”
5 = “E*”
6 = “NOT REACHED”
8 = “NOT ADMIN.”
9 = “OMITTED”
;

A common desire would be to have the items that were not administered to the student have a missing value and the rest scored as either correct or incorrect. At first thought, using the PUT function to get the formatted value might seem like a good idea, but that would require specifying the format. Since many of these datasets include several hundred variables and several hundred different formats, that’s not going to work.

Here is one solution using the 2007 dataset for eighth grade students from the Trends in International Mathematics and Science Study (TIMSS) :

%include "C:\Users\me\Documents\TIMSS\Tformats.sas" ;

data scored ;
set in.G8_ACHIEVE07 ;
attrib Fval length = $9. ;
array rec{*}  M022043 — BSSREA05 ;
do i = 1 to dim(rec) ;
FVAL = vvalue(rec{i}) ;
if FVAL  = “NOT ADMIN” then rec{i} = . ;
else if index(FVAL,”*”) > 0 or FVAL = “CORRECT R” then rec{i} = 1  ;
else rec{i} = 0 ;
end ;

The %INCLUDE statement includes the formats defined by the organization that created the dataset.
(TIMSS, like many of the original data sources, includes a folder of SAS files along with the data downloaded that read in text data, create formats and labels, merge files and perform other useful functions. )

The VVALUE function returns the formatted value of the variable.
In the program above, it is necessary to first recode the items that were not administered to have a missing value, and then score the students who were administered an item as having been correct or incorrect. In this particular example, all of the formatted values for correct responses either had an “*” next to the correct multiple choice value or the words “CORRECT RESPONSE” . Of course, this statement would need to be modified depending on the formatted values of your particular dataset.

Similar Posts

2 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *