For reasons I may explain later – or maybe not – I decided to analyze the TIMSS data, which is Trends in International  Mathematics and Science Study.

Use a colon: Nifty tip #1
** Ran this first ***

libname LIB ‘C:\TIMSS2007\Data’;

proc contents data = lib.G4_ACHIEVE07;

*** Modified to only keep math items ;
data lib.G4_ACHIEVE07;
set lib.G4_ACHIEVE07;
drop s: ;

I was only interested in the mathematics items, not the science ones, and since I did not want 170+ items cluttering up my data set, I used the statement below


This statement drops all of the variables beginning with S. You should be cautious of this, because  if there is a variable with a name like STUDENT_ID that will be dropped also.  This is why I ran the PROC CONTENTS first and verified that all of the science items and only t hose items began with an S.

Nifty tip #2  - use a %INCLUDE statement

It only appears that the point of today’s blog is to include all possible special characters. That is merely a fringe benefit.

The %INCLUDE statement essentially copies and pastes code from another file into your program in the spot where you inserted it. I like it for things like 400 lines of formats because just like I don’t like extra variables cluttering up my data set, I don’t like extraneous lines cluttering up my code. I do need to use the PROC FORMAT but I don’t need to see it every time I run the program and I do not want to store it permanently.

%include “c:\timss2007\programs\” ;

Problem solved.

Nifty tip #3  Use the LINESIZE option to see all of your results on one line

I am easily annoyed. If you read my blog often, you know that this has been established. If I have 200 variables and the minimum and maximum does not fit on the same line as the mean because the label is

“This is that question where we asked the student about long division which involves dividing a two- digit number into a three-digit number and has a remainder”

and then you have the mean and need to scroll down 200 lines to see the minimum and another 200 lines to see the maximum, well it’s annoying.

Do this:


or whatever large number you like. No, I don’t have paper that is that wide, but I’m not planning on printing this out, I just want to scroll through and see that the minimum, maximum, mean and standard deviation are reasonable.

Nifty tip #4 Use a temporary data step to find the number of variables

No, smart ass, PROC CONTENTS would NOT do this. I want to know how many math items there are, not how many total. The math items (now that I deleted the science ones above) are in order. I run this statement, look in my log and it tells me there are 178 variables in the data set.

data test ;
set lib.G4_ACHIEVE07;
keep M031106 –M041191 ;

Nifty tips #5  and #6 – Create an array and use the VVALUE function to score data

TIMSS has formats (remember the %INCLUDE ) that are things like 98 = “NOT ADMIN.” , 10 = “CORRECT RESPONSE”.

data lib.G4_scored ;
set lib.G4_ACHIEVE07;
array ans{*} M031106 –M041191 ;
array sc{*} $18 tmp1 – tmp178 ;

I created an array of the mathematics items and the ans{*} says to create an array of dimension however many variables there are between M031106 and M041191 . The double dashes signify between as in “between locations in the data set” with M031106 coming first. If you use one dash SAS assumes the variables are numbered M031106  M031107 all the way toM041190 M041191. Which they are not. Do double dashes count as two special characters or only one?

I could have used 178 instead of * since I actually knew there were 178 variables, but I wanted to throw in another special character. Yes, I am immature. That was established long ago. The $18 denotes this as an array of character variables and assigns them all a length of 18 which is the length of the maximum formatted response. Also, a $ is another special character.

do i = 1 to 178 ;
sc{i} = trim(vvalue(ans{i})) ;
if sc{i} in (“INCORRECT RESPONSE”,”NOT REACHED”, “OMITTED”) then ans{i} = 0 ;
else if sc{i} = “CORRECT RESPONSE” then ans{i} = 1 ;
else if sc{i} = ” PARTIAL RESPONSE” then ans{i} = .5 ;
else if sc{i} = “NOT ADMIN.” then ans{i} = . ;
end ;
drop tmp1 – tmp178 M031002 M031223;

Here I have my handy do-loop and a VVALUE function. You can use VVALUE when you don’t know the variable format, or, as in my case, are too lazy to look it up and type it in. The formatted value of ans{i} , whatever that format might be, is put into sc{i}. I also used the TRIM function to trim trailing blanks while I was at it.

Now that I have scored all of the items to suit my nefarious purposes, I drop the temporary variables as well as two variables that it turned out are questions not administered to anyone.

And that, is my nifty SAS tips of the night.



















One Response to “Little SAS tips (& special characters) while analyzing TIMSS data”

  1. Rob Meekings on October 5th, 2012 5:35 am

    Great tips, thank you!

    I’d take issue with #4, I think you can use Proc CONTENTS for this. Whilst I was thinking about it I came up with a couple of other approaches (one of which uses the colon):

    data temp;
    array m(20) 8 m1-m20;
    array s(12) $2 s1-s12;
    do i = 1 to 10;

    /* Approach 1: use Proc CONTENTS */
    /* 1a: filter output obs */
    proc contents noprint
    out=myconts(keep=name where=(substr(name,1,1)=”m”))
    /* Approach 1b: use Proc CONTENTS, filter input variables */
    proc contents noprint

    /* Approach 2: use a _null_ datastep and the PDV */
    data _null_;
    if 0 then set temp;
    length _name_ $32 _i 3;
    call vnext(_name_);
    do until (upcase(_name_) = ‘_NAME_’);
    if (substr(_name_, 1, 1) = ‘m’) then
    call vnext(_name_);
    _i + (-1);
    call symput(‘vars’, compress(_i));
    put _i=;

    /* Approach 3: use dictionary.columns */
    %macro get_vars(like=, ds=, lib=work);
    /* Create a macro variable to store number of variables that match*/
    %global &like._vars;
    /* Count the number of matching variables */
    proc sql noprint;
    select compress(put(count(*), comma12.)) into :&like._vars
    from dictionary.columns
    where libname = upcase(strip(“&lib”))
    and memname = upcase(strip(“&ds”))
    and upcase(substr(name, 1, length(strip(“&like”)))) = upcase(strip(“&like”))
    /* Write the number to the log */
    %put NOTE: &like._vars=&&&like._vars;

    %mend get_vars;
    /* Invokve the macro */
    %get_vars(like=m, ds=temp);

    The macro could go in your %include file!

Leave a Reply