%Include – a step toward making little black boxes

Years ago, I read a science fiction story about a future where all plays were performed by robots that had been programmed with the combined characteristics of the world’s best actors. An aspiring actor sadly asked the technician working on the computers to run these robots:

“What would you do if they invented a little black box to do your job?”

The technician paused thoughtfully and responded,

“Well, I guess I’d get me a job making them little black boxes.”

Remember “word processor”? You probably don’t, but thirty years ago it was one of the fastest growing careers in America. Women were moving out of the typing pool to “technical” jobs using word processing machines. Then applications like WordPerfect, Word, WordStar and Apple Writer got to be easy enough that no one needed a word processing department, and all of those jobs evaporated practically overnight.

I think that as soon as you have DATA steps, PROC SORT and other basic steps down, it’s really time for you to look around for new challenges. The world is changing, whether you like it or not. SAS Enterprise Guide, Stata, SPSS, JMP, S-PLUS and even Excel/OpenOffice, to some extent, are all making it easier and easier for people to merge, create and analyze data sets on their own, using pre-written procedures that can be selected from a menu and populated with choices dragged and dropped in a pop-up window.

When you start out as a SAS programmer, you use the pre-written formats to make your data read “August 18, 2010” instead of 8/18/10 and you think you are cool. Then you find out you can create your own formats and have it print “Dennis’ birthday” for the 18th and “Naked Mole Rat Appreciation Day” for the 19th and so on and you are convinced that you are super-cool.

If you liked PROC FORMAT allowing you to write your own formats, you’ll really love the SAS macro language, a major extension of Base SAS that allows you to write a program to write programs.  But … learning to write macros is a big leap toward being an expert programmer, especially for someone new who is just getting used to PROC MEANS.

There are a couple of baby steps you can take to get started in that direction. The first is the use of %INCLUDE. I usually introduce this to new programmers first because it is relatively easy.

%INCLUDE is a gentle way to get a novice introduced to the concept of SAS macros, which can be a bit intimidating.

When you include something, unlike with a macro, you don’t need to change any of the variables, statements or functions. Debugging is a snap because you can run the code exactly as is, copy it into another file once it is bug free and then include that file. Yet it leads you into the idea of having code from somewhere other than within your program run over and over.

So, what does %INCLUDE do?

It includes programming statements from another file outside of your program. There are two common reasons to use it. One is to save copying or typing the same code over and over. A common example is acknowledgements of funding, legal disclaimers, legends or other text that might always go at the bottom of every page on a website, such as:

Footnote1 "Material under the Creative Commons license for this site" ;
Footnote2 "can be freely produced as it was created by pixies who live on air" ;
Footnote3 "Other brand and product names are trademarks of their respective companies." ;
Footnote4 "Which are probably owned by the devil and run by communists" ;
Footnote5 "Except for SAS" ;
Footnote6 "and SPSS (now owned by IBM) on Thursdays" ;
Footnote7 "and Stata which is in Texas and run by cowboys." ;

If you have 10 lines of this, you can copy and paste it every time you write a new program, or you can simply save the footnotes to a file and, whenever you need them, use a line something like this:

****** This is the text for footnotes **** ;

%include "c:\myfiles\mysasfiles\footers.sas" ;

You really don’t need those ten lines of footnotes getting in your way every time you’re trying to read the program and debug it or document it. A major advantage over copying and pasting the same lines in every program is that if you get a new grant, or SAS gets bought by IBM, you don’t have to go find every program and change that code. You can just change it in the footers.sas file and you’re done.
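Here is a minimal sketch of a program that pulls in the shared footnotes. It assumes the footnote statements above have been saved to footers.sas at the path shown. The fileref name footers and the PROC PRINT of SASHELP.CLASS are only illustrations, not part of the original program.

```sas
* Point a fileref at the shared footnote file (path from the example above) ;
filename footers "c:\myfiles\mysasfiles\footers.sas" ;

* SOURCE2 echoes the included statements in the log, which helps when debugging ;
%include footers / source2 ;

* Output produced after this point carries the footnotes ;
proc print data=sashelp.class (obs=5) ;
run ;

* A FOOTNOTE statement with no text clears the footnotes when you are done ;
footnote ;
```

Using a fileref instead of the quoted path is optional. It just keeps the path in one place if you include the same file more than once in a program.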

A second common use for %include is, again, to make the code more readable and get rid of distractions. While it is kind that ICPSR includes PROC FORMAT code with many of the SAS data sets available on its site, these can get in the way.

You may get a program from ICPSR or another source that has hundreds of lines of PROC FORMAT, LABEL and FORMAT statements (this happens to me all the time).

proc format;

value statnum   1='(1) Alabama'

2='(2) Arizona'

3='(3) Arkansas'

And a hundred more lines of the same …

Followed by:

label

rec_bh  = 'bh:record type'

statnum = 'numeric state code'

ori     = 'originating agency identifier' ;

format state statnum.   division divisn. ;

And a hundred more lines …

Moving the PROC FORMAT, labels and formats out cut the length of the program I had received in this example from 387 lines to 160. There are several reasons I may want to do this. The first is debugging. Before I am sure I have read in the data exactly the way I want, the formats are pretty irrelevant. Often, for whatever reason, the file is not in the exact format I expected. I don’t want to slog through a lot of useless formats and labels before the data are actually read in. When I do have a permanent data set, I may want to use those formats but not have to see hundreds of extra lines in my code.

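Important point: %INCLUDE statements are executed as they are read. When SAS hits a %INCLUDE statement, it is as if the code was copied into your program at that spot. Think about this. PROC FORMAT is a step of its own and has to run before the DATA step that uses the formats, while the LABEL and FORMAT statements have to sit inside the DATA step. What this means is that I had to move the FORMAT procedure into one file and the LABEL and FORMAT statements into another file.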

My program now looks like this:

*****  Proc Format to create user-written formats  ***** ;

%include "c:\myfiles\mysasfile\crimefmt.sas" ;

Data libref.crimes ;

Infile   datasetname ;

Input variables ;

<bunch of other statements >

******* Labels and Formats *********************** ;

%include "c:\myfiles\mysasfile\labelfmt.sas" ;

run;

Note that if I have the labels and formats after the run statement, it will give me an error, because the LABEL and FORMAT statements would then be outside of any DATA step.
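If you want to try the placement rule without any files of your own, here is a small self-contained sketch. Everything in it is a made-up stand-in: the two temporary files play the role of crimefmt.sas and labelfmt.sas, and the three-row data set, the variable state and its label are invented for the illustration.

```sas
* Hypothetical stand-ins for crimefmt.sas and labelfmt.sas, written to temporary files ;
filename crimefmt temp ;
filename labelfmt temp ;

data _null_ ;
   file crimefmt ;
   put "proc format ;" ;
   put "   value statnum 1='(1) Alabama' 2='(2) Arizona' 3='(3) Arkansas' ;" ;
   put "run ;" ;
run ;

data _null_ ;
   file labelfmt ;
   put "label state = 'numeric state code' ;" ;
   put "format state statnum. ;" ;
run ;

* This file holds a complete step, so it is included before the DATA step starts ;
%include crimefmt ;

data crimes ;
   do state = 1 to 3 ;
      output ;
   end ;
   * This file holds statements only, so it is included inside the DATA step, before RUN ;
   %include labelfmt ;
run ;

proc print data=crimes label ;
run ;
```

Move the second %include below the run statement and rerun it to see the error for yourself.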

Try it. It’s very non-threatening and once you get used to seeing those %____ in your program you are ready to move on to the next baby step, assigning macro variables.

Don’t believe that crap about how hard programming is or how hard statistics is. Everything is hard when you first start doing it. As a good friend of mine, who is a very good coach, said at practice one day,

“When you learned to walk it was hard. You fell down on your butt. You screamed, you cried, you were really frustrated. But you didn’t give up and now you walk around all the time and walking, that’s just nothing to you.”

Yeah, it’s like that.
