|

SAS Tip: Preventing Disaster When Variable Lengths Differ

Over the weekend, I wrote a post showing how SAS can be used to make what appears to be a complex problem quite simple.

First of all, am I just being dramatic? Seriously, how can having your variable lengths differ be a disaster?

Simple. You are merging by a variable that is a unique user identifier like username, social security number. Because the two different data sets have different lengths, they do not match. If you are computing the number of unique users you may overestimate by a huge amount. If you want the number of people who are in both data sets, you may vastly underestimate the amount of true matches.

As with anything in programming, there are many ways to do this. My solution is to create a new variable and set it to the identical length and format using the ATTRIB statement. Extra bonus is this will work when you have variables that are not only different lengths but different types, say character in one data set and numeric in the other.

You really only need two statements in your data step, an ATTRIB statement and then an assignment statement that sets the value of the variable you created to whatever the variable is you want to merge.

DATA dsname ;
ATTRIB newvar LENGTH = $49 ;
SET mydata2.dsname ;
newvar = oldvar ;
Repeat this step for the second data set and then merge (or concatenate) to  your little heart’s delight.

Angry guard faceThe voice of experience: 
Notice two things here:  I created a temporary data set from my permanent one. Although SAS has gotten more forgiving over the years in not writing over your existing data sets when there is an error, it is still better to err on the side of caution and make sure all is wonderful before saving over that existing data, especially if it took you a lot of effort to get the data in that form.
Second, I created a new variable and kept the old one as is. I don’t always do this but it is good practice. You may be tempted to just use the first 9 digits because we all know social security numbers are 9 digits and then later you find that it was entered as 123-45-6789  and now you only have 123-45-67

—- Feel smarter after reading this blog?

Fish Lake artwork

Want to feel even smarter? Download and play our games!  You can run around in our virtual world while reviewing your basic math skills. If you are too busy (seriously?) you can still give a game as a gift or donate a game to a classroom or school.

Similar Posts

2 Comments

  1. Hi there,

    Speaking from an efficiency point of view, wouldn’t it be quicker to only use the attrib on one dataset and make it the same length and type as the dataset you want to merge with?

    Just a thought.

  2. This is a VERY neat tip! And I really like the “Voice of Experience!” Reminds me of my Mom’s advice; but she was SO MUCH prettier! 🙂

    Thanks,
    Adrienne
    Research Scientist, California Department of Public Health

Leave a Reply

Your email address will not be published. Required fields are marked *