Bonus SAS Tips Before SAS Global Forum

I’m giving a talk on Preparing Students for the Real World of Data at SAS Global Forum next month.

You’d think 50 minutes would be long enough for me to talk, but that just goes to show you don’t know me as well as you think you do. One point made in the template for papers is that you should not try to tell every single thing you know about the DATA step, for example, because it will bore your audience to death.

Random Tips That Didn’t Make it Into the Paper

1. CATS removes blanks and concatenates

While I did give a few shout outs to character functions, it was not possible to put in every function that is worth mentioning. One that didn’t make the cut is the CATS function.

The CATS function concatenates strings, removing all leading and trailing blanks.

Let’s say that I want to have each category renamed with a leading “F” to distinguish all of the variables from the Fish Lake game. I also want to add a ‘_’ to problems 10-14 so that when I chart the variables 11 comes just before 12, not before 2 (which is what would happen in alphabetical order). So, I include these statements in my DATA step.

IF problem_num IN(11,10,12,13,14) THEN probname = CATS(‘F’,’_’,probname);
ELSE probname = CATS(game,probname) ;

Now when I chart the results you can see the drop off in correct answers as the game gets more difficult.

Graph by variable name

2. Not all export files are created equal

Nine of the ten datasets I needed I was able to download as an EXCEL file and open up in SAS Enterprise Guide. It was a piece of cake, as I mentioned last time. Unfortunately, the third file was download from a different site and it had special characters in it, like division signs, and the data had commas in the middle of it. When I opened it up in SAS Studio it looked like this.

Ugly dataFixing it was actually super simple. This was an Excel file. I simply did a Replace ALL and changed the division signs to “DIV” and the commas to spaces. The whole thing took FIVE lines to read in after that.

3. Listen to Michelle Homes and know your data

filename fred “/courses/abc123add/sgf15/sl_pretest.csv ” ;

Data pretest keyed;

LENGTH item9 $ 38. ;

infile fred firstobs = 2 dlm=”,”;
input started $ ended $ username $ (item1 – item24) ($) ;

Thank you to the lovely  Michelle Homes for catching this! As she pointed out in the comments, the input statement assumes that the variables are 8 characters in length and character data. This is true for 26 of the 27 variables. However, ONE of the 24 items on the test is a question that can be answered with something like Four million, four thousand and twelve.

That, as you can see, is over 8 characters. So, I added a LENGTH statement. That brought up another issue, but that is the next post …

I’ll have a lot more to talk about in Dallas. Hope to see you there.

============

Want to be even smarter? Back us on Kickstarter! We make games that make you smarter. The latest one, Forgotten Trail, is going to be great! You can get cool prizes and great karma.

Boy walking in rain

local_offerevent_note March 26, 2015

account_box AnnMaria De Mars

3 thoughts on “Bonus SAS Tips Before SAS Global Forum”

  • Good tips!

    In Tip #2, it seems that your character columns have a maximum of 8 characters which is the default length when list input is used and no length, informat or attrib statement isn’t specified before the input statement – http://support.sas.com/documentation/cdl/en/lrcon/67885/HTML/default/viewer.htm#n1w749t788cgi2n1txpuccsuqtro.htm

    I recommend to SAS programmers to check a column’s type and length before submitting the code (know the PDV configuration). This way they can potentially avoid character type and truncation issues. Also helps avoid debugging the log. 🙂

  • I completely agree. In the paper, I say about a dozen times “Know your data” in every way possible except for the biblical sense (because that’s just a little creepy).

  • Personally I think cats & catx should make the cut, they cut out so much fiddly coding.

    But your example is a good one for formats…

    IF problem_num IN(11,10,12,13,14) THEN
    probname = CATS(‘F’,’_’,probname);

    ELSE probname = CATS(game,probname) ;

    Can be done easier with one statement:

    probname = CATS(‘F’, put(probname, z2.) ;

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.