Watch me work: Compress Function for Test Scoring

Did you ever fill out one of those online forms where you kept trying to submit it and got messages like,

“You need to enter your phone number in the format 311-234-12234”


“You cannot have any special characters in this field.”

That one really irritates me because, in fact, my last name has a space in it and many websites refuse to accept it. Take it up with The Invisible Developer, or his ancestors.

Have you ever just said the hell with it, and skipped filling out the form? Preventing users from entering anything but the expected data type saves problems when you analyze your data, but it can also cause people to give up on your stupid web form.

So … when I created the pretest for Forgotten Trail and Aztech, I made it accept just about anything. If you wanted to write in 6, six, 9R6, 6 left over — any and all of those would be accepted and recorded.

You can get the first two games we developed here.


Forgotten Trail and Aztech are in beta and will not be commercially available for another couple of months.

What now? I have to score that test, but I’d rather the difficulty be on me than on the 150 or so middle school students who are our first test group.

So… how to fix it, with SAS character functions. Here is me, scoring the first half of the test:

First, I read the data into a new data set because I want to preserve the original data and not write over it. I may want to look at the exact incorrect answers later.

I create a character array of all 32 items on the test, and then I use a DO loop to change all of the answers to upper case.


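data in.recode ;
   set in.pretestGMS ;
   /* 27 + 3 + 2 = 32 character variables, one per test item */
   array qs{32} $ q1-q27 q28a q28b q28c q29 q30 ;
   do i = 1 to 32 ;
      qs{i} = upcase(qs{i}) ;   /* so "four" and "Four" both become "FOUR" */
   end ;
   /* the DATA step continues with the scoring statements below */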

Now, on to the questions. I eventually need all of these items to be scored 1 = correct, 0 = incorrect.

q1 is a question about money. People put all kinds of wrong answers, like $35 and $40, as well as the correct answer, written as 100 or $100. I used the COMPRESS function to remove the ‘$’, then set q1 to equal 1 if the answer was 100, and 0 otherwise.
q1 = compress(q1,"$") ;      /* remove every dollar sign */
/* q1 is character, so SAS converts it to a number for this comparison */
if q1 = 100 then q1 = 1 ;
else q1 = 0 ;
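If you want to try the dollar-sign step on its own, here is a minimal sketch. The data set name demo_q1, the variables cleaned and score, and the four sample answers are all made up for illustration; they are not from the real pretest. It also differs from the statements above in one way: it puts the 0/1 score in a separate numeric variable rather than writing it back into the character variable, which is just one way to avoid the automatic character-to-numeric conversion notes in the log.

```sas
/* Hypothetical sample answers, not the real pretest data */
data demo_q1 ;
   input q1 $20. ;                    /* read each answer as character */
   cleaned = compress(q1, "$") ;      /* remove every dollar sign */
   score = (cleaned = "100") ;        /* comparison gives 1 if true, 0 if false */
   datalines ;
$100
100
$35
$40
;
run ;

proc print data = demo_q1 ;
run ;
```

With these four rows, the first two should score 1 and the last two should score 0.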

The second use of the compress function removes blanks. If you don’t put any second parameter in the compress function, it removes all blanks, not only the trailing ones, so “FOUR FROGS” becomes “FOURFROGS”. In q2, the answer was 4 but the students put “four”, “four frogs”, “4/14” and so on. All of these are correct. You can have a list in an IF statement, and if the variable matches any of the values in the list, then do something. In this case, set the answer as correct.
q2 = compress(q2) ;      /* no second parameter: remove all blanks */
if q2 in ("4","FOUR","FOURFROGS","4/14","4OUTOF","4FROGS") then q2 = 1 ;
else q2 = 0 ;
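Here is a small stand-alone sketch of the blank-removal step together with the IN list. Again, demo_q2, squeezed, score and the sample answers are invented for illustration, and the list of accepted values is shortened.

```sas
/* Hypothetical sample answers, not the real pretest data */
data demo_q2 ;
   input q2 $20. ;                    /* $20. keeps embedded blanks */
   q2 = upcase(q2) ;                  /* same upper-casing as the DO loop */
   squeezed = compress(q2) ;          /* no second argument: all blanks removed */
   score = (squeezed in ("4", "FOUR", "FOURFROGS", "4/14", "4FROGS")) ;
   datalines ;
four frogs
4 frogs
4 / 14
five
;
run ;

proc print data = demo_q2 ;
run ;
```

The first three rows collapse to FOURFROGS, 4FROGS and 4/14, so they should score 1. The last row, FIVE, is not in the list and should score 0.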

*** How to keep only numeric data using a simple SAS function (take that all you regular expression fetishists!)

The third use of the compress function KEEPS the characters listed in the second parameter instead of discarding them, because I added the optional third parameter “k” (for KEEP). So, this keeps the digits and deletes everything else from the answer. If what is left is 150, it is scored correct; otherwise, it’s wrong.
/* keep only the digits, then compare what is left to 150 */
if compress(q5,"0123456789","k") = 150 then q5 = 1 ;
else q5 = 0 ;
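A quick sketch of the keep-only-digits trick on some made-up answers follows. The names demo_q5, digits, digits_kd and score are hypothetical. It also shows the “kd” modifier, where “d” adds all digits to the character list, so you can leave the second parameter empty and get the same result.

```sas
/* Hypothetical sample answers, not the real pretest data */
data demo_q5 ;
   input q5 $20. ;
   digits    = compress(q5, "0123456789", "k") ;  /* keep only the listed characters */
   digits_kd = compress(q5, , "kd") ;             /* same result using the d modifier */
   score = (digits = "150") ;                     /* 1 if only 150 is left, else 0 */
   datalines ;
150
150 miles
about 150
1.50
15
;
run ;

proc print data = demo_q5 ;
run ;
```

One thing to watch: because everything except digits is thrown away, an answer like 1.50 also collapses to 150 and would be scored as correct. If decimals are plausible wrong answers for an item, you might keep the period too and compare on that instead.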


A lot of the items were similar, so that is half of scoring the test. I’ll try to write up the rest from the airport tomorrow, but for now, I need to write a couple of emails, finish this scoring program and pack before 2 am, and that only gives me about 40 minutes.
