
PROC COMPARE FOR VALIDATING SAS CODE

I know people who are so obsessive about testing and validating their code that they spend more time testing it than actually writing it and analyzing the output. I said I know people like that; I didn’t say I was one of them. However, it is good practice to validate your SAS code and, despite false rumors spread by my enemies, I do it sometimes.

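Here is a simple example. I believed that using the COMPRESS function with the modifier “l” (add lowercase letters to the list of characters to remove) or “I” (ignore case) gave the same results when the character list is the uppercase alphabet. I wanted to test that. So, I ran two data steps: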

DATA USE_L;
set mydata.aztech_pre ;
q3 = compress(Q3,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','l');
q5 = compress(Q5,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','l');

… and a whole bunch more statements like that.

Then, I ran the exact same data step, creating USE_I, but with an “I” instead of an “l”.

Finally, I ran a PROC COMPARE step:

PROC COMPARE base=USE_L compare=USE_I ;
Title "Using l for lowercase vs I for insensitive" ;
run;

PROC COMPARE RESULTS SHOW NO DIFFERENCES
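
If you want to try the same check without my data, here is a minimal sketch. The data set SAMPLE and its four rows are made up for illustration (they stand in for mydata.aztech_pre), and only one variable is cleaned. The last line reads the SYSINFO automatic macro variable, which PROC COMPARE sets to 0 when it finds no differences.

```sas
/* Made-up sample rows standing in for mydata.aztech_pre (an assumption) */
data sample;
   input q5 $char20.;
   datalines;
150m
42 miles
one thousand
200 MILES
;
run;

/* Uppercase list plus the "l" modifier, which adds lowercase letters */
data use_l;
   set sample;
   q5 = compress(q5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'l');
run;

/* Same list with the "i" modifier, which ignores case */
data use_i;
   set sample;
   q5 = compress(q5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'i');
run;

proc compare base=use_l compare=use_i;
   title "Using l for lowercase vs I for insensitive";
run;

/* Read SYSINFO right after the step, before another step resets it */
%put NOTE: PROC COMPARE return code = &sysinfo;
```

A return code of 0 in the log says the same thing as the printed report, which is handy if you would rather not scroll through output.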

But, hey, maybe PROC COMPARE just doesn’t work. And is COMPRESS really removing every letter, whether it is upper or lower case? To test this, I ran the procedure again, this time comparing the original data set with the data set holding the compressed results.

PROC COMPARE base=mydata.aztech_pre compare=use_I ;
Title "Comparing with and without compress function" ;
run;

The result was a whole lot of output, which I am not going to reproduce here, but the most relevant part was:

Values Comparison Summary

Number of Variables Compared with All Observations Equal: 24.
Number of Variables Compared with Some Observations Unequal: 16.
Number of Variables with Missing Value Differences: 10.
Total Number of Values which Compare Unequal: 694.

Looking further in the results, I can see a comparison of the values for each variable by observation number:

          ||  Base Value      Compare Value
      Obs ||  q5              q5
 ________ ||  ____________    ____________
          ||
        5 ||  150m            150
        6 ||  42 miles        42
       10 ||  one thousand
       12 ||  200 MILES       200
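
If all that output is more than you want to read, one option is to restrict the comparison to a single variable and write only the mismatched rows to a data set. This is a sketch, not what I ran for this post: the output data set name Q5_DIFFS is made up, and it assumes USE_I was built from mydata.aztech_pre with the observations in the same order, since there is no ID statement.

```sas
/* Compare only q5 and keep just the observations that differ */
proc compare base=mydata.aztech_pre compare=use_i
             out=q5_diffs      /* hypothetical output data set name */
             outnoequal        /* drop observations where the values match */
             outbase outcomp   /* write both the base and the compare value */
             noprint;          /* suppress the printed report */
   var q5;
run;

proc print data=q5_diffs;
   title "Rows where q5 changed after COMPRESS";
run;
```

Each mismatched observation shows up as a BASE row and a COMPARE row, so you can eyeball the before and after side by side.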

So, I can see that the data step is doing what I want, which is removing all of the text from the responses and only leaving numbers. This is important because the next step is comparing the responses to the questions with the answer key and I don’t want any mismatches to occur because the student wrote ‘200 miles’ instead of 200.
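
To give a rough idea of why this matters for scoring, here is a sketch of that next step for a single question. Everything in it is hypothetical: the four responses are made up, the answer key value of 200 is an assumption and not the real key, and the names RESPONSES, SCORED, Q5_CLEAN, Q5_NUM and Q5_CORRECT are invented for the example.

```sas
/* Made-up responses; not the real pretest data */
data responses;
   input q5 $char20.;
   datalines;
150m
42 miles
one thousand
200 MILES
;
run;

data scored;
   set responses;
   key_q5 = 200;   /* assumed answer key value, for illustration only */
   /* Strip letters of either case, as in the data step above */
   q5_clean = compress(q5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'i');
   /* Convert to numeric; ?? suppresses log errors for non-numeric text */
   q5_num = input(q5_clean, ?? 12.);
   /* 1 if the cleaned response matches the key, 0 otherwise */
   q5_correct = (q5_num = key_q5);
run;

proc print data=scored;
run;
```

With the letters removed, ‘200 MILES’ scores as a match for 200, while ‘one thousand’ ends up blank, converts to missing, and scores 0.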

In case you are interested, this is the pretest for two games that are used to teach fractions and statistics. You can find Aztech: The Story Begins here and play it for free on your iPad, Mac, Windows or Chromebook computer.


Forgotten Trail can be played in a browser on any Mac, Windows or Chromebook computer.
