As statisticians, we like to say that statistics is everywhere. Here is an example. Regular readers of this blog might know that my darling daughter number three is the world champion in mixed martial arts. There is a very wide gap between the general discourse at mixed martial arts events and at, say, the Joint Statistical Meetings or SAS Global Forum.

A few days before her fight, a fighter from the UFC was arrested for vandalizing a church while naked. Do not ask me why. This is not a why question.

Some people have been put off by my daughter’s very forthright manner of speaking and compared her to some of the fighters in the UFC. So, what was my daughter doing during that week?

Actually, she was making weight in the run-up to the defense of her world title and, as she did the last time, running a contest to raise money for the World Food Programme (WFP). There is a site called freerice.com where, for every correct answer you give, the sponsor donates 10 grains of rice to WFP. Ronda asked her fans to play as a group. So far, they have donated over 30,000,000 grains of rice, which, at 3,400 grains per bowl, is enough to feed over 8,800 people.
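For anyone who wants to check the bowl arithmetic, here is a tiny DATA step sketch. It divides the 30,000,000 grains by the 3,400 grains per bowl quoted above and keeps only whole bowls, which comes to 8,823. The data set name rice_math is just an illustrative choice.

```sas
/* Bowl arithmetic: 30,000,000 grains at 3,400 grains per bowl */
data rice_math ;
   grains   = 30000000 ;
   per_bowl = 3400 ;
   bowls    = floor(grains / per_bowl) ;  /* count whole bowls only        */
   put bowls= comma8. ;                   /* write the result to the log   */
run ;
```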

People who donated over 1,000,000 grains all got a prize. The top two donors got personalized TOPPS trading cards with Ronda’s signature. Since only two of those cards exist, they should be worth some money.

What about all of the other people, though? How do you keep people who only have a little time to play the game motivated while still rewarding those who play a lot?

Enter PROC SURVEYSELECT and the SIZE statement.

Here is how you select a sample with probability proportional to size (PPS).

First, I downloaded the file that had all of the donors. It is a CSV file. Then, from the SAS FILE menu, I selected IMPORT DATA, chose CSV as the file type from the drop-down list and opened my file.
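If you would rather not click through the menus, a PROC IMPORT step along these lines should produce the same rice data set. It is a sketch: the file path is made up, and it assumes the CSV has a header row with a column that SAS turns into the variable grains_in_group.

```sas
/* Sketch: a code alternative to FILE > IMPORT DATA.                  */
/* The path below is hypothetical; point it at your own download.     */
proc import datafile = "C:\freerice\donors.csv"
            out      = rice
            dbms     = csv
            replace ;
   getnames     = yes ;    /* first row of the file holds the column names */
   guessingrows = 32767 ;  /* read many rows before deciding column types  */
run ;

/* Confirm the grain counts came in as a numeric grains_in_group column */
proc contents data = rice ;
run ;
```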

Here is the program:

*** This deletes people who did not donate any rice, and also
*** those who donated more than 1,000,000 grains, since those people are already guaranteed a prize.
*** It also breaks the rest into 3 groups: high N = 7, mid N = 12 and low N = 188 ;
data winners ;
   set rice ;
   if grains_in_group > 1000000 or grains_in_group = 0 then delete ;
   else if grains_in_group > 100000 then donate = "high" ;
   else if grains_in_group > 20000 then donate = "mid" ;
   else donate = "low" ;
run ;

*** This just runs a frequency by group to check that all is well ;
proc freq data = winners ;
   tables donate ;
run ;

*** The documentation says you need to sort by the strata variable.
*** I tried it without sorting, too, because I am just like that,
*** and it still works. Maybe SAS loves me ;
proc sort data = winners ;
   by donate ;
run ;

*** METHOD = PPS requests a sample with probability proportional to size,
*** without replacement. The sample sizes in N = (2 1 3) are matched to the
*** strata in sorted order (high, low, mid), so this selects 2 from high,
*** 1 from low and 3 from mid ;
proc surveyselect data = winners method = pps n = (2 1 3) out = selected ;
   size grains_in_group ;
   strata donate ;
run ;
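To see who won, and how likely each winner was to be picked, you can print the OUT= data set. This sketch assumes the PPS output includes the SelectionProb and SamplingWeight variables, which is how I read the SURVEYSELECT documentation; SamplingWeight should be the reciprocal of SelectionProb.

```sas
/* List the 6 winners with their group, donation, selection probability */
/* and sampling weight (assumed to be added to OUT= for METHOD = PPS)   */
proc print data = selected noobs ;
   var donate grains_in_group SelectionProb SamplingWeight ;
run ;
```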

Two things to note

1. When you specify N=(2 1 3), the sample sizes are assigned to the strata in sorted order, so the order is not high, middle, low. Sorted alphabetically, it is high, low, mid. So the 7 people who donated more than 100,000 (and up to 1,000,000) grains of rice had, on average, a 1 in 3.5 chance of being selected. The people who donated more than 20,000 up to 100,000 grains had, on average, a 1 in 4 chance of being selected, and the people who donated 20,000 grains or fewer had, on average, a 1 in 188 chance.

2. I could have dropped the SIZE statement, switched to METHOD = SRS and drawn a simple random sample stratified by donate. If I had done that, the people in the high donor group would still have had a better chance of winning than those in the middle or low donor groups, but within a group, the person who donated 10,000 grains would have had no more chance than the person who donated 10. I didn’t think that was fair. In fact, after I had already selected the winners, I ran the program several times with each method, PPS and a stratified simple random sample, just to see what would happen. On average, the total donated by the winners differed by 30,000 to 50,000 grains between the two methods.
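The chances in note 1 are averages within each group. As I understand METHOD = PPS, a donor's chance of selection is the group's sample size times that donor's share of the group's grains, so averaging over the group gives back n/N. Here h is a group, n_h the number selected from it, N_h the number of donors in it and M_hi the grains donated by donor i (the notation is mine, not SAS's):

```latex
\begin{aligned}
\pi_{hi} &= \frac{n_h\, M_{hi}}{\sum_{j=1}^{N_h} M_{hj}}, \qquad
\frac{1}{N_h}\sum_{i=1}^{N_h} \pi_{hi} = \frac{n_h}{N_h},\\
\text{high: } \frac{2}{7} &= \frac{1}{3.5}, \qquad
\text{mid: } \frac{3}{12} = \frac{1}{4}, \qquad
\text{low: } \frac{1}{188}.
\end{aligned}
```

Donors who gave more than their group's average donation had a better chance than these figures, and donors who gave less had a worse one. This only works if every pi_hi is at most 1, that is, no single donor holds more than 1/n_h of the group's grains.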
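Note 2 describes re-running both methods several times. One way to sketch that comparison is with the REPS= option, which draws many samples in one step. The 100 replicates, SEED = 2012 and all the output data set names are my own arbitrary choices, not the settings behind the 30,000 to 50,000 figure.

```sas
/* Draw 100 replicate samples with each method; SEED= makes runs repeatable */
proc surveyselect data = winners method = pps n = (2 1 3)
                  reps = 100 seed = 2012 out = pps_reps noprint ;
   size grains_in_group ;
   strata donate ;
run ;

proc surveyselect data = winners method = srs n = (2 1 3)
                  reps = 100 seed = 2012 out = srs_reps noprint ;
   strata donate ;
run ;

/* Total grains donated by the 6 winners in each replicate sample */
proc means data = pps_reps noprint nway ;
   class replicate ;
   var grains_in_group ;
   output out = pps_totals sum = total_grains ;
run ;

proc means data = srs_reps noprint nway ;
   class replicate ;
   var grains_in_group ;
   output out = srs_totals sum = total_grains ;
run ;

/* Stack the two sets of totals and label which method produced each */
data compare ;
   set pps_totals (in = in_pps) srs_totals ;
   length method $ 3 ;
   if in_pps then method = "PPS" ;
   else method = "SRS" ;
run ;

/* Average winners' total under each method */
proc means data = compare mean ;
   class method ;
   var total_grains ;
run ;
```

The gap between the two means is the kind of difference note 2 describes; expect it to shift somewhat with a different seed.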

So, thanks to all the people who donated. Hurray for Ronda for doing something good while she’s starving to make weight.

And hurray to SAS for making it so easy to select people fairly.

[Photo: niece with belt]

As for the fight, Ronda won it in 54 seconds. She keeps the belt (and yes, that is REAL gold). The only one who has been able to take the belt from her so far is her 4-year-old niece, but she cheated. She used cuteness.

Also, even though the fight is over, you can still join the group and donate. It’s a good cause, because, to quote Ronda, “It sucks to be hungry.”

Comments

One Response to “PPS sampling, PROC SURVEYSELECT and not getting naked in church”
