Hopefully, you have read my Beginner’s Guide to Propensity Score matching or through some other means become aware of what the hell propensity score matching is. Okay, fine, how do you get those propensity scores?

Think about this carefully for a moment. If you are using quintiles, you are matching people by which fifth of the sample they fall into when everyone is ranked by predicted probability of being in the treatment group. So, if your friend, Bob, has a predicted probability of 15% of being in the treatment group and that puts him in the lowest 20% of the sample, his quintile would be 1, that is, the bottom fifth. If your other friend, Luella, has a predicted probability of 57% and that lands her between the 40th and 60th percentiles, then she is in the third quintile.

Oh, if only there were a means of getting the predicted probability of being in a certain category – oh, wait, there is!

Let’s do binary logistic regression with SAS Studio

First, log into your SAS Studio account.

Second, you probably need to run a program with a LIBNAME statement to make your data available. In this example I’m using one of the SASHELP data sets and creating a data set in my WORK library, so I don’t need a LIBNAME for that but, as you will see, I do need it later. Here is the program I ran.

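* Library where the output data set will be saved later ;
libname mydata "/courses/blahblah/c_123/" ;

* Copy the heart data to WORK, leaving out underweight people ;
data psm_ex ;
set sashelp.heart ;
where weight_status ne "Underweight" ;
* Recode SMOKING into a 0/1 smoker flag, missing stays missing ;
if smoking = 0 then smoker = 0 ;
else if smoking > 0 then smoker = 1 ;
run ;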

My question is this: if I had people who had the same propensity to smoke, based on age, gender, etc., would smoking still be a factor in the outcome (in this case, death)? To answer that, I need propensity scores.

Third, in the window on the left, click on TASKS AND UTILITIES, then STATISTICS and select BINARY LOGISTIC REGRESSION, as shown below.

[Screenshot: selecting the Binary Logistic Regression task]

Next, choose the data set you want by clicking on the thing under the word DATA that looks like a table of data and selecting the library and data set in that library. Then, under RESPONSE, click the + sign and select the dependent variable for which you want to predict the probability. In this case, it’s whether the person is a smoker or not. Click the arrow next to EVENT OF INTEREST and pick which value you want to predict. In this case, your choices are 0 or 1. I selected 1 because I want to predict if the person is a smoker.

Below that, select your classification variable.

[Screenshot: choosing data]

 

There is also a choice for continuous variables (not shown) on the same screen.  I selected AGEATSTART.

I’m going to select the defaults for everything but OUTPUT. Click the arrow at the top of the screen next to MODEL and keep clicking until you see the OUTPUT tab. Click on the box next to CREATE OUTPUT DATASET. Browse for a directory where you want to save it.  I had set that directory in my LIBNAME statement (remember the LIBNAME statement) so it would be available to save the data. Select that directory and give the data set a name.

Click the arrow next to PREDICTED VALUES and in the 3 boxes that appear below it, click the box next to predicted values.

[Screenshot: create output data set]

 

After this, you are ready to run your analysis. Click the image of the little running guy above.  When your analysis runs you will have a data set with all of your original data plus your predicted scores.

[Screenshot: predicted values in the output data set]
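If you would rather see code than clicks, the task above corresponds roughly to a PROC LOGISTIC step like the one below. This is a sketch, not the exact code SAS Studio generates. It assumes the WORK.PSM_EX data set and MYDATA libref from the program above, and it assumes SEX is the classification variable, which the post does not state. The output data set name and the PRED_ variable match the names used in the quintile code later in this post.

```sas
/* Sketch of the model behind the point-and-click task.             */
/* Assumptions: WORK.PSM_EX exists, MYDATA libref is assigned, and   */
/* SEX is the classification variable (not stated in the post).      */
proc logistic data=work.psm_ex;
  class sex / param=ref;
  /* event='1' models the probability that smoker = 1 */
  model smoker(event='1') = sex ageatstart;
  /* keep all original variables plus the predicted probability */
  output out=mydata.statspsm predicted=pred_;
run;
```

The PRED_ column in the output data set is the propensity score: each person's predicted probability of being a smoker given the variables in the model.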

 

Now, we just need to compute quintiles. You could find the quintiles by doing this:

proc freq data=mydata.statspsm ;
tables pred_ ;
run ;

and look in the cumulative percent column for the 20th, 40th, etc. percentile.

However, an easier way if you have thousands of records is

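proc univariate data=mydata.statspsm ;
var pred_ ;
* OUT= names the data set instead of relying on the default name DATA1 ;
output out=quintile_cuts pctlpre=P_ pctlpts=20 to 80 by 20 ;
run ;
proc print data=quintile_cuts ;
run ;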

Which will give you the percentiles.
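If you want SAS to assign each person to a quintile rather than reading off the cut points yourself, PROC RANK is one way to do it. This is a sketch that goes a step beyond the original post. The data set WORK.PSM_QUINTILES and the variable PS_QUINTILE are names I made up for illustration. MYDATA.STATSPSM and PRED_ are the ones used above.

```sas
/* GROUPS=5 splits the records into fifths by PRED_, numbered 0 to 4 */
proc rank data=mydata.statspsm out=work.psm_quintiles groups=5;
  var pred_;
  ranks ps_quintile;
run;

/* Shift to 1-5 so the lowest fifth is quintile 1 */
data work.psm_quintiles;
  set work.psm_quintiles;
  if not missing(ps_quintile) then ps_quintile = ps_quintile + 1;
run;

/* Count smokers and non-smokers in each quintile */
proc freq data=work.psm_quintiles;
  tables ps_quintile * smoker;
run;
```

Because tied scores all go into the same group, the five groups may not be exactly equal in size when many people share the same predicted probability.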

Support my day job AND get smarter. Buy Fish Lake for Mac or Windows. Brush up on math skills and canoe the rapids.

girl in canoe

For random advice from me and my lovely children, subscribe to our youtube channel 7GenGames TV

One advantage of writing this blog for almost a decade is that there are a lot of topics I have already covered. However, software moving at the speed that it does, there are always updates.

So, today I’m going to recycle a couple of older posts that introduce you to propensity score matching. Then, tomorrow, I will show you how to get your propensity scores with just pointing and clicking with a FREE (as in free beer) version of SAS.

Before you even THINK about doing propensity score matching …

Propensity score matching has had a huge rise in popularity over the past few years. That isn’t a terrible thing, but in my not so humble opinion, many people are jumping on the bandwagon without thinking through if this is what they really need to do.

The idea is quite simple – you have two groups which are non-equivalent, say, people who attend a support group to quit being douchebags and people who don’t. At the end of the group term, you want to test for a decline in douchebaggery.

However, you believe that people who don’t attend the groups are likely different from those who do in the first place: bigger douchebags, younger, and, it goes without saying, more likely to be male.

The very, very important key phrase in that sentence is YOU BELIEVE.

Before you ever do a propensity score matching program you should test that belief and see if your groups really ARE different. If not, you can stop right now. You’d think doing a few ANOVAs, t-tests or cross-tabs in advance would be common sense. Let me tell you something, common sense suffers from false advertising. It’s not common at all.

Even if there are differences between the groups, they may not matter unless they are related to your dependent variable, in this case, the Unreliable Measure of Douchebaggedness.
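For a concrete version of those checks, here is a sketch using the smoking example from the SAS Studio post rather than the support group. It assumes the WORK.PSM_EX data set created there, with SMOKER as the group and STATUS (alive or dead) as the outcome. The choice of SEX and AGEATSTART as the variables to check is mine, for illustration only.

```sas
/* Do smokers and non-smokers differ on a continuous variable? */
proc ttest data=work.psm_ex;
  class smoker;
  var ageatstart;
run;

/* Do they differ on a categorical variable? SEX is an assumed choice */
proc freq data=work.psm_ex;
  tables sex * smoker / chisq;
run;

/* Are those same variables related to the outcome, STATUS? */
proc ttest data=work.psm_ex;
  class status;
  var ageatstart;
run;

proc freq data=work.psm_ex;
  tables sex * status / chisq;
run;
```

If the groups don't differ, or the variables they differ on have nothing to do with the outcome, that is a hint that matching may not buy you much.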

For more information, you can read the whole post here. Also read the comments, because they make some good points.

What type of Propensity Score Matching is for you? A statistics fable

Once upon a time there were statisticians who thought the answer to everything was to be as precise, correct and “bleeding edge” as possible. If their analyses were precise to 12 decimal places instead of 5, of course they were better because, as everyone knows, 12 is more than 5 (and statisticians knew it better, being better at math than most people).

Occasionally, people came along who suggested that newer was not always better, that perhaps sentences with the word “bleeding” in them were not always reflective of best practices, as in,

“I stuck my hand in the piranha tank and now I am bleeding.”

Such people had their American Statistical Association membership cards torn up by a pack of wolves and were banished to the dungeon where they were forced to memorize regular expressions in Perl until their heads exploded. Either that, or they were eaten by piranhas.

Perhaps I am exaggerating a tad bit, but it is true that there has been an over-emphasis on whatever is the shiniest, new technique on the block. Before my time, factor analysis was the answer to everything. I remember when Structural Equation Modeling was the answer to everything (yes, I am old). After that, Item Response Theory (IRT) was the answer to everything. Multiple imputation and mixed models both had their brief flings at being the answer to everything. Now it is propensity scores.

A study by Sturmer et al. (2006) is just one example of a few recent analyses that have shown an almost exponential growth in the popularity of propensity score matching, from a handful of studies in the late nineties to everybody and their brother.

You can read the rest of the post about choosing a method of propensity score matching here. If your clicking finger is tired, the take-away message is this: quintiles, which are much simpler, faster to compute and easier to explain, are generally just as effective as more complex methods.

Now that we are all excited about quintiles, the next couple of posts will show you how to compute those in a mostly pointy-clicky manner.

I have to choose between either SAS or SPSS for a new course in multivariate statistics. You can take it up with the university if you like, but  these are my only two options, in part because the course is starting soon.

I need to decide in a few days which way to go. Here are my very idiosyncratic reasons for one versus the other:

SPSS

  • There is a really good textbook on multivariate statistics that I think would be perfect for these students and it uses SPSS. The book is Advanced and Multivariate Statistics by Mertler & Vannatta, in case you were wondering.
  • SPSS can be installed pretty easily on the desktop and these are pretty non-technical students, so that’s a plus.
  • The point and click interface for SPSS is pretty easy and similar to Excel which most people have used.
  • Personally, I haven’t used SPSS in a while so it would be nice to use something different.

SAS

  • Students can just register and go to the website to use SAS Studio
  • Structural equation modeling and other advanced statistics procedures are built in, not an add-on
  • SAS Studio is free vs $80 or so for students and $260 for professor (i.e., me) to buy SPSS academic versions including add-ons needed
  • I’m more familiar with SAS and find it easier to code than SPSS syntax.

I’ve toyed with the idea of showing both options but that uses up class time better spent on teaching, for example, how do you interpret a factor loading or AIC.

My big objection to SAS is I can’t find a recent textbook that is good for a multivariate analysis course in a social sciences department. The best one is by Cody and that is from 2005. I also use a couple of chapters from the Hosmer & Lemeshow book on Applied Logistic Regression, but I need something that covers factor analysis, repeated measures ANOVA and, hopefully, MANOVA and discriminant function analysis, too.

I think most of these students have careers in non-profits and they are not going to be creating new APIs to analyze tweets or anything using enormous databases, so the ability to analyze terabytes is moot. This will probably be their second course in statistics and maybe their first introduction to statistical software.

Suggestions are more than welcome.

P. S. You can skip the hateful comments on why SAS and SPSS both suck and I should be using R, Python or whatever your favorite thing is. Universities don’t usually give carte blanche. These are my two choices.

P.P.S. You can also skip the snarky comments on how doctoral students should have a lot more statistics courses, all take at least a year of Calculus, etc. Even if I might agree with you, they don’t and I need tools that work for the students in my classes, not some hypothetical ideal student.

The most useful function Facebook has served for me is as a time machine. That is, students, friends and acquaintances I had not seen in 20, 30 or 40 years, who are in my memory as small children or teenagers, all of a sudden reappear in my life as young adults with spouses and children, or old, retired people.

It’s weird seeing that 8-year-old that I used to coach now 42 years old with adult children of her own. The serious, hard-working 11-year-old boy is now 27, a college graduate and new father. My fellow enthusiastic, naive graduate students are professors emeriti. How weird is that?

The first thing I have learned is that nothing lasts. The kid who was sobbing because she lost in the finals of the Junior Olympics and it ruined her life has rarely thought about that match in 30 years. The teammate who was so in love with himself in his twenties, who always had at least two girlfriends at a time, and who I thought was an egocentric pain in the ass, now looks back on those days with amusement and embarrassment. What little hair he has left is snow white. He didn’t become a movie star as he expected. He ran a Harley Davidson dealership for 30 years and is now retired in Florida.

We can love our children more than life itself, but they are still going to grow up, get jobs and families of their own and live their own lives, as they should.

The second thing I have learned is that family is what brings us the greatest joys in life, if we are lucky, the greatest sorrows, if we are cursed, and a mix of both if we are normal. All of the photos of young parents have that same lovestruck and bewildered expression, as if to say, “I love this baby” and “I have no idea what the fuck I am doing” both at the same time.

The newly married/ newly engaged couples all have the same phrases about how lucky they are and the divorced/ separated couples mostly sound equally bitter.

When we’re young we’re mostly focused on careers, because how else are we going to pay for diapers and baby food and tournament entry fees and piano lessons and college tuition for those babies? When we get older, we realize it doesn’t matter so much whether we are a retired professor or a retired janitor. Our grandchildren couldn’t care less.

The third thing I have learned is how lucky I am to live in the time and place that I do. Lately, in my spare time that I do not have, I have been reading a lot of history. Whether it is hygiene or women’s rights or economic inequality or violence in society, in the overall scheme of things, we are SO much better off than we have ever been. That’s a post for another day, though, since I have to leave for Palm Springs in a few hours.

The year I turned 55, I wrote a series of blog posts on 55 things I’ve learned in 55 years. I’ve probably learned more than three things since, but one particular lesson has come back to me over and over the past few years.

People are more than their accomplishments – sometimes for better and sometimes worse.

This is one of those lessons that should be obvious and we’ve all probably given lip service to it at one time or another. “The janitor in the building deserves just as much respect as the university president.” As far as I can see, most people don’t really believe that. They go to major effort to attend any event where billionaires or celebrities are present, and despite all of the talk about ‘supporting our troops’, they really wouldn’t bother to go to a barbecue for the guy who came home from Afghanistan a few years ago.

Maybe it’s because as people get older they get tired of maintaining barriers and let you see more who they really are. Maybe I’ve just gotten better at paying attention.

I used to think I was smarter, more motivated, harder working and braver than the average person because I had overcome a lot of hurdles to accomplish at a high level in sports, academics and business. That’s embarrassing to admit because I now realize how completely wrong I was, and how I let the opportunity slip through my fingers to get to know better some really amazing people.

I’ve come to know people who came thousands of miles, hopping freight trains, hiding in the back of tractor-trailers, to escape civil war and violence, who worked 14 hour days at minimum wage to give their children a better shot in life. I’ve learned the university president has been in rehab three times for alcoholism. I’ve found out that the mid-level manager for the medium-sized company is far from mediocre, having spent 20 years in the military, first in combat zones and then training recruits how to survive. I’ve learned that the old guy who retired from the factory had been in some of the bloodiest battles in World War II.

It’s not just surviving wars or escaping from them. There are people who at first seem like the most staid, judgemental bureaucrats you’d ever meet, who would never lift a finger to do anything outside of the box, and then you find they are raising their five grandchildren after their child overdosed on methamphetamine or they spend their evenings volunteering at the prison to teach literacy classes. That really quiet guy that works at the library? Yeah, he spent nine years working for start-ups in Africa ‘because I wanted to understand more of the world than where I grew up’.

There is the flip side, too, the people who seemed to have it all together who turn out to have no real moral standing. Someone can be financially successful, well-educated and hit the gym at 5 am every morning, yet that person will still do business with someone known to have molested children and then bribed officials to get out of being prosecuted because, “Well, it’s just business.”

People with absolutely stellar credentials will lie to your face and it won’t bother them at all. On the other hand, people with equally stellar credentials will work another two hours on top of the 18 hours they already worked because they promised they would come to your fundraiser and they always keep a promise.

Whoever is up may be down next year and whoever is down might be up

Some people work for one company, volunteer for one organization or live in one community until they are doddering up to get the lifetime achievement award for fifty years of service. I’m the opposite of that, and so I’ve had the experience many times of running across someone I had not seen for 10, 20 or 40 years. People I was so angry with because they made an unethical decision a decade or two ago, I look at now and they are lonely, pathetic old people who have to live with themselves. Other people, I was a complete idiot to not pay enough attention to because they were ‘not important enough’ or ‘not interesting enough’ or ‘not smart enough’ and they have led fascinating, productive lives that I admire.

So, the biggest lesson I have learned is to take more time to listen to people and get to know them. Sometimes, getting to know them means I head in the opposite direction as far and fast as possible. More often, though, it means I learn more about the world than my little place in it.


Have kids? Know anyone who has kids? Like kids? Own a computer? Fish Lake will teach fractions and Native American history, with no whining and all for under ten bucks.

Buy our games