1. Use the data sets in the sashelp directory for dummy data.

2. The RANUNI function is worth remembering.

Today I needed to check something. Specifically, I was using the ranuni function to generate random numbers for sampling with replacement. I wanted to sort the data set randomly, select the first match, then re-sort the data and re-sample. (Yes, obviously, I was doing propensity score matching.)

When you use a 0 for the seed, e.g.,

randnum = ranuni(0) ;

SAS actually uses the time of day. I did not know to what precision (seconds, microseconds, nanoseconds or whatever) it rounds the time. Even if I had known, I’m not sure that would have helped me, because what I wanted to know was whether the rounding factor is small enough that, when my program runs on a small data set, there is already a new seed by the next step. I *thought* so but you know, there is a reason we do testing. It’s because things don’t always come out the way you think.
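For contrast, here is a minimal sketch of the difference between a positive seed and a seed of 0. The data set names fixed_seed and clock_seed and the seed value 12345 are made up for illustration.

```sas
/* Sketch only: fixed_seed, clock_seed and the seed 12345 are hypothetical. */

/* A positive seed starts the same stream every time the step runs. */
data fixed_seed ;
  do i = 1 to 5 ;
    randnum = ranuni(12345) ;
    output ;
  end ;
run ;

/* A seed of 0 takes its starting point from the clock,      */
/* so the values change from one run of the step to the next. */
data clock_seed ;
  do i = 1 to 5 ;
    randnum = ranuni(0) ;
    output ;
  end ;
run ;
```

Run the program twice and fixed_seed should hold the same five numbers both times, while clock_seed should not.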

I was going to create a dummy data set and then I realized, hey! There are all kinds of data sets already in the sashelp directory. You may never have looked at them, or noticed, but they are there. So, I did this:

Data yes ;
set sashelp.air ;
randnum = ranuni(0) ;
run ;
proc means data = yes ;
var randnum ;
run ;

* Second pass: overwrite randnum with a fresh draw ;
Data yes ;
set yes ;
randnum = ranuni(0) ;
run ;
proc means data = yes ;
var randnum ;
run ;

I did the PROC MEANS because I was too lazy to open the two data sets and look at the numbers. Yes, it worked. I expected it would.
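If you would rather compare the numbers row by row than eyeball two means, something like the following sketch should do it. The data set names first and second are hypothetical, and this is just one way to run the check, not the one I used.

```sas
/* Sketch only: first and second are hypothetical data set names. */
data first ;
  set sashelp.air ;
  randnum = ranuni(0) ;
run ;

data second ;
  set sashelp.air ;
  randnum = ranuni(0) ;
run ;

/* Report where randnum differs between the two data sets. */
proc compare base = first compare = second ;
  var randnum ;
run ;
```

If the second step picked up a new seed, PROC COMPARE should report unequal values for randnum.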

3. Avoid the unconditional ELSE statement.

This is a habit I got into years ago. I’m not one for giving other people rules for how to code because I think most of those rules given out as gospel are just one of many acceptable ways of doing things. This one, however, is worth remembering. Say you have two possible conditions, experimental and control. You would *think* it would make sense to type

If group = 1 then output experiment ;
else output control ;

It would make sense to you if you have never met any actual people. There are data entry errors, where someone types 11 or the letter “l” instead of a 1. People type “experiment” instead of 1. They leave that field blank because they don’t know if this person was in the experimental or control group. All of those people end up in the control group. The technical term statisticians use for this state of affairs is “bad”.

Instead, at a minimum, do this:

If group = 1 then output experiment ;
else if group = 0 then output control ;

I worked with someone who had a good habit of always creating one more data set than he needed, which he named junk. Then, at the end of every IF statement that sent data to different groups, he added this:

If group = 1 then output experiment ;
else if group = 0 then output control ;
else output junk ;

Not only did your 1s go only to the experimental group and your 0s only to the control group, but you also had a data set that collected the junk, where you could look at who these people were and try to figure out what their problem was. Personally, I don’t do that as a routine, but it is a good habit.
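Put together as a complete step, the pattern might look like the sketch below. The input data set study and its five records are invented for illustration, and group is assumed to be numeric.

```sas
/* Sketch only: study and its records are made-up test data. */
data study ;
  input id group ;
  datalines ;
1 1
2 0
3 11
4 .
5 0
;
run ;

/* All three output data sets are named on the DATA statement. */
data experiment control junk ;
  set study ;
  if group = 1 then output experiment ;
  else if group = 0 then output control ;
  else output junk ;   /* typos and blanks land here */
run ;

proc print data = junk ;
run ;
```

Here the record with group 11 and the record with a missing group end up in junk instead of quietly joining the control group.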

 

Comments

10 Responses to “Random basic SAS tips on ranuni, sampling with replacement & junk”

  1. Ann on January 28th, 2012 7:45 am

    Ooh… I like John Cook’s blog, and I read some SAS blogs…but I REALLY like your blog – and now I get some SAS tips too? Life is good…

  2. Rick Wicklin on January 30th, 2012 8:16 am

    For more information on random number seeds and how they work in SAS, see http://blogs.sas.com/content/iml/2011/08/31/random-number-streams-in-sas-how-do-they-work/

    For long-running simulations/permutations, you might want to use the newer RAND function instead of the older RANUNI function. See the last paragraph of
    http://blogs.sas.com/content/iml/2011/10/19/four-essential-functions-for-statistical-programmers/

  3. AnnMaria on January 30th, 2012 1:59 pm

    Thanks, Rick. I’ll check it now. Turns out I just finished something which is a very long-running simulation, and although it’s running (good), I’m looking for ways to improve it.

  4. Jaspreet on February 14th, 2012 2:44 am

    If I already have a dataset, how do I randomize it using ranuni()?

  5. AnnMaria on February 14th, 2012 2:51 am

    If you mean how do you get it in random order, you can create a variable just as shown above and then sort on that variable.

    Data yes ;
    Set yes ;
    randnum = ranuni(0) ;
    run ;
    proc sort data = yes ;
    by randnum ;
    run ;

  6. Jaspreet on February 14th, 2012 3:00 am

    data rand;
    input CT RT @@;
    Label CT = ‘Circuit Type’;
    Label RT = ‘Response Time’;
    datalines;
    1 9 1 12 1 10 1 8 1 15
    2 20 2 21 2 23 2 17 2 30
    3 6 3 5 3 8 3 16 3 7
    ;
    order = ranuni(0);
    output;
    end;
    end;
    proc print data = rand;
    title ‘Raw Dataset’;
    run;
    proc sort data = rand;
    by order;
    run;
    proc print data = rand;
    title ‘Randomized Dataset’;
    run;

    This is my SAS code. I am getting the same result in the raw as well as the randomized dataset and cannot figure out my mistake; it’s either in the way I entered the data or in how I am using the ranuni() command.

  7. annmaria on February 16th, 2012 2:46 am

    You should have also gotten some errors in your log.

    Once you had the datalines statement, the data and the ; after the data, that data step ended. To create the order variable, you need to start a new data step.

    Try this. It will work.

    data rand;
    input CT RT @@;
    Label CT = 'Circuit Type';
    Label RT = 'Response Time';
    datalines;
    1 9 1 12 1 10 1 8 1 15
    2 20 2 21 2 23 2 17 2 30
    3 6 3 5 3 8 3 16 3 7
    ;

    proc print data = rand;
    title 'Raw Dataset';
    run;
    data rand ;
    set rand ;
    order = ranuni(0);
    run;
    proc sort data = rand;
    by order;
    run;
    proc print data = rand;
    title 'Randomized Dataset';
    run;

  8. Mohammad Rahman on September 17th, 2013 11:29 am

    Hi,
    I do not know why the following two data steps produce the same random numbers for x_10 and x_20. Could you please help me in this regard? Is it a bug, or am I doing something illegal? Thank you in advance.

    data x1;
    do i=1 to 3;
    xx=ranuni(401);
    do j= 1 to 2;
    x_10=ranuni(10); output;
    end;
    end;
    run;
    data x2;
    do i=1 to 3;
    xx=ranuni(401);
    do j= 1 to 2;
    x_20=ranuni(20); output;
    end;
    end;
    run;

  9. Mohammad Rahman on September 17th, 2013 12:08 pm

    I also checked the rand() function; it also produces the same random numbers. When I delete “xx=ranuni(401);”, the two steps produce different random numbers, as expected. Thank you again.

  10. Jonathan on June 17th, 2014 12:24 am

    Actually, the seed is only generated on your first observation. If you used ranuni(1), your random number wouldn’t be the same for every observation in the step. What SAS does is this: for _n_=1, it uses the time of day as a seed. Let’s say the random number it generated was 12345/(2^31-1) (ranuni generates a random number between 1 and 2^31-1, then normalizes it to be between 0 and 1 by division). At _n_=2, the new seed would be 12345, NOT the new time of day.
