Lately, I’ve been working on a report that uses eight datasets that all have the same problems with the usernames.
In addition to needing to remove every username that contained the word “test” or “intern” we also needed to delete specific names of the classroom teachers who had played the game. We needed to correct names that were misspelled.
Here are a few examples of a very long list of statements:
if username in("MSDMARIA","1ANSWERKEY","MSDELAPAZ","MSCARRINGTON") then delete ;
if username = “GRETBUFFALO” then username = “GREYBUFFALO” ;
else if username = “HALFHORES” then username = “HALFHORSE” ;
else if username =”TTCARLSON18TTCARLSON18″ then username = “TTCARLSON18” ;
These problems occurred in every dataset.
A second problem found when looking at the contents of each of the 8 datasets was that the username variable was not the same length in all of them, which could cause problems later when they were to be merged together or concatenated. Also, now that all of the usernames have been cleaned up, none should be over 12 characters in length.
Wouldn’t it be nice if there was a way to just get the first n characters of a string?
Enter our character function, substr, which returns a substring of a variable beginning at any position and for as many characters as you like. Problem solved.
newid = substr(username, 1, 12) ;
It seems pretty inefficient to write this set of statements eight times in eight different data sets. Also, next year we will have another eight data sets, and some will have these same students’ usernames and same problems. Wouldn’t it be a lot easier to have these statements in one place and add to the “fixnames.sas” file whenever we find a new problem?
So, now we have the write once, use anywhere solution of %INCLUDE.
What %INCLUDE does
The %INCLUDE statement references lines from an external file and processes them immediately. It has almost the exact same effect as if you had copied and pasted those lines write into your program. The exception that makes it “almost” is that a %INCLUDE statement must begin at a statement boundary. That is, it has to be either the first statement in your program or occur after a semi-colon ending a statement as in this example.
data studentsf ;
infile inf delimiter = “,” missover ;
attrib teacher length = $12. username length = $ 16. ;
input username $ age sex $ grade school $ teacher $ ;
%include “/courses/u_mine.edu1/wuss14/fixnames.sas” ;
Also, you need to think about it as if you had copied and pasted those lines into your program. Is it still valid code? Whenever using %INCLUDE, you should make sure the code runs in your program as expected, with no errors, before cutting it out and making it an external file.
To source or not to source
The default is not to show the statements that were in the included file. Generally, this is desirable. This is code you have already debugged and if you are using it multiple times (otherwise, why bother with the %INCLUDE), having the same 20 lines repeated 8 times in your log just makes it harder to debug.
Professors might want to use real data but hide all of the messy data handling from the students initially in fear they would run screaming for the door. I meant, professors might want to gradually introduce SAS statements and functions for data handling.
In either case, students could use the %INCLUDE statement as shown in the example above. To see the code include in your log is quite simple, just add a source2 option as shown.
%include “/courses/u_mine.edu1/wuss14/fixnames.sas” /source2 ;
It will be in your log as follows
NOTE: %INCLUDE (level 1) file “/courses/u_mine.edu1/wuss14/fixnames.sas is file “/courses/u_mine.edu1/wuss14/fixnames.sas.
419 +username = compress(upcase(username),”. “) ;
420 +if (index(username,”TEST”) > 0 or index(username,”INTERN”) > 0
and so on.
The + signs next to each statement denote it was in the included file.
If you want to know why I think it is so important for new SAS users to learn about the %INCLUDE statement, you should come to the Western Users of SAS Software conference in San Jose next month. Especially if you’re a student, you should come, because they cut you a really good deal.
If you’re not a student and you have a real professional job – well, then, you should be able to afford it. There will be funny hats, beer, coding and cookies. What more could one ask?