
Choosing the Right Propensity Score Method: A statistics fable

Once upon a time there were statisticians who thought the answer to everything was to be as precise, correct and “bleeding edge” as possible. If their analyses were precise to 12 decimal places instead of 5, of course they were better because, as everyone knows, 12 is more than 5 (and statisticians knew it better, being better at math than most people).

Occasionally, people came along who suggested that newer was not always better, that perhaps sentences with the word “bleeding” in them were not always reflective of best practices, as in,

“I stuck my hand in the piranha tank and now I am bleeding.”

Such people had their American Statistical Association membership cards torn up by a pack of wolves and were banished to the dungeon where they were forced to memorize regular expressions in Perl until their heads exploded. Either that, or they were eaten by piranhas.

Perhaps I am exaggerating a tad bit, but it is true that there has been an over-emphasis on whatever is the shiniest, new technique on the block. Before my time, factor analysis was the answer to everything. I remember when Structural Equation Modeling was the answer to everything (yes, I am old). After that, Item Response Theory (IRT) was the answer to everything. Multiple imputation and mixed models both had their brief flings at being the answer to everything. Now it is propensity scores.

A study by Sturmer et al. (2006) is just one example of a few recent analyses that have shown almost exponential growth in the popularity of propensity score matching, from a handful of studies in the late nineties to everybody and their brother.

I tend to be on the cynical side when it comes to new techniques, and sometimes I’m completely wrong about differences from older techniques being a bunch of useless trivial crap someone over-hyped to get seven articles published so they could get tenure. Not long ago, I was working on a project that used both item response theory and multiple imputation.

Sometimes, but not always.

This brings us to propensity score matching. I kind of disagree with the article by Sturmer and friends. They say,

Only 9 of 69 studies (13%) had an effect estimate that differed by more than 20% from that obtained with a conventional outcome model in all PS analyses presented. … Publication of results based on propensity score methods has increased dramatically, but there is little evidence that these methods yield substantially different estimates compared with conventional multivariable methods.

Actually, 14 of the studies found differences greater than 20% (read their whole study, not just the abstract, to understand the discrepancy, you slacker!) Let us split no hairs, however, and agree for the moment that 13% it is … if more often than one out of every eight times the results differ by at least 20%, I’d say a technique is worth doing. I think the Sturmer gang has called the glass empty when it is actually one-eighth full.

HOWEVER … I do agree that the call for increased complexity can go too far.

The moral of the story, so if you are tired, you can stop reading after this:

Given that there are times that propensity score matching does make a substantial difference in the results, if you go with the simplest, most understandable method of propensity score matching the results will be pretty much the same as if you went with the most difficult, most obscure method.

Recently, I gave a talk on propensity scores and the results in the example I used replicated what I and many others have found time after time. That is, whether you do quintiles or nearest neighbor matching or calipers, you get pretty much the same result.  This is relevant because as I will show in the next blog post or two, when I get around to it, doing a quintile match is relatively easy. “Relatively” is the key word in that sentence. If you find logistic regression easy you will find propensity score matching on quintiles easy.  Nearest neighbor matching is harder and calipers is harder still.

In quintiles, you divide your sample into five groups: the 20% LEAST likely to end up in your treatment group is quintile 1, and so on, up to the 20% with the GREATEST likelihood of ending up in your treatment group, which is quintile 5. You match the subjects by quintiles. So, if 12% of the treatment group is in quintile 1, you randomly select 12% of the control subjects from quintile 1.  You can easily do quintile matching in SAS with PROC LOGISTIC, PROC UNIVARIATE and a few DATA steps.
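
For readers who think better in code, here is a minimal sketch of the quintile idea in plain Python rather than SAS. It assumes the propensity scores have already been estimated (for example, by a logistic regression), and it takes one reading of the matching step: within each quintile, draw as many controls at random as there are treated subjects, so the control sample ends up with the same quintile distribution as the treatment group. The function name `quintile_match`, the tuple layout and the `seed` parameter are made up for illustration.

```python
import random
import statistics
from bisect import bisect_left
from collections import defaultdict


def quintile_match(subjects, seed=0):
    """Match controls to treated subjects by propensity score quintile.

    subjects: list of (subject_id, is_treated, propensity_score) tuples.
    Returns {quintile: (treated_ids, sampled_control_ids)}.
    Assumption: scores were estimated beforehand; this only does the matching.
    """
    rng = random.Random(seed)
    # Four cut points split the pooled sample into five equal-sized groups.
    cuts = statistics.quantiles([score for _, _, score in subjects], n=5)

    treated = defaultdict(list)
    controls = defaultdict(list)
    for subject_id, is_treated, score in subjects:
        quintile = bisect_left(cuts, score) + 1  # 1 = least likely treated
        (treated if is_treated else controls)[quintile].append(subject_id)

    matched = {}
    for quintile in range(1, 6):
        # Draw one control per treated subject, capped by what is available.
        k = min(len(treated[quintile]), len(controls[quintile]))
        matched[quintile] = (treated[quintile], rng.sample(controls[quintile], k))
    return matched


if __name__ == "__main__":
    # Hypothetical data: 20 subjects with evenly spaced scores.
    demo = [(i, i % 2 == 0, (i + 0.5) / 20) for i in range(20)]
    for quintile, (t_ids, c_ids) in quintile_match(demo).items():
        print(quintile, t_ids, c_ids)
```

If a quintile has fewer controls than treated subjects, this sketch simply takes every control it can; a real analysis would need to decide what to do with the leftover treated subjects.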

In nearest neighbor matching, as the name implies, you match each subject in the treatment group with a subject in the control group who is nearest in probability of ending up in the treatment group. This would be really difficult to do in SAS without some knowledge of macro programming. Stata has a psmatch2 command. I’ve read some interesting discussion on it. There is also an nnmatch command.  I haven’t used either myself. Both appear to be .ado files and similar to a user-written SAS macro. Suffice it to say that nearest neighbor matching does not avail itself of basic statistical procedures in the same way that quintiles does.
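
To make the idea concrete, here is a rough sketch of greedy one-to-one nearest neighbor matching in plain Python. It is not what psmatch2 or nnmatch do internally; it is just the simplest version of the description above, assuming the propensity scores are already in hand. The function name `nearest_neighbor_match` and the dictionary layout are hypothetical, and matching is done without replacement, so each control is used at most once.

```python
def nearest_neighbor_match(treated, controls):
    """Greedy 1:1 nearest neighbor matching on the propensity score.

    treated, controls: dicts mapping subject_id -> propensity score.
    Returns a list of (treated_id, control_id, score_distance) tuples.
    Assumption: matching without replacement, in the order treated is given.
    """
    available = dict(controls)  # copy, so the caller's dict is untouched
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break  # ran out of controls; remaining treated stay unmatched
        # Pick the unused control whose score is closest to this subject's.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs.append((t_id, c_id, abs(available[c_id] - t_score)))
        del available[c_id]
    return pairs


if __name__ == "__main__":
    # Hypothetical scores for two treated and four control subjects.
    treated = {"t1": 0.30, "t2": 0.70}
    controls = {"c1": 0.10, "c2": 0.32, "c3": 0.65, "c4": 0.90}
    for pair in nearest_neighbor_match(treated, controls):
        print(pair)
```

Because the matching is greedy, the result can depend on the order of the treated subjects, which is one reason this is fiddlier than quintiles.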

Then, there is calipers (radius) matching, which uses the nearest neighbors within a given radius. Attempting this in SAS without the use of macro programming will just drive you insane, and your neighbors with you.  Earlier this year, I rambled on a great deal about how you could do it using a macro.
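
As a small illustration outside of SAS, radius matching can be sketched in a few lines of Python. This assumes precomputed propensity scores and keeps, for each treated subject, every control whose score falls within the caliper. The function name `radius_match` and the default caliper of 0.05 are arbitrary choices for the example, not recommendations.

```python
def radius_match(treated, controls, caliper=0.05):
    """Radius (caliper) matching on the propensity score.

    treated, controls: dicts mapping subject_id -> propensity score.
    Returns {treated_id: [control_ids within the caliper, nearest first]}.
    Assumption: controls may be reused across treated subjects.
    """
    matches = {}
    for t_id, t_score in treated.items():
        # Keep (distance, id) pairs for every control inside the radius.
        in_range = [
            (abs(c_score - t_score), c_id)
            for c_id, c_score in controls.items()
            if abs(c_score - t_score) <= caliper
        ]
        matches[t_id] = [c_id for _, c_id in sorted(in_range)]
    return matches


if __name__ == "__main__":
    # Hypothetical scores; t3 has no control anywhere near it.
    treated = {"t1": 0.30, "t2": 0.70, "t3": 0.99}
    controls = {"c1": 0.28, "c2": 0.33, "c3": 0.50, "c4": 0.72}
    print(radius_match(treated, controls))
```

A treated subject with an empty list has no control within the radius, which is exactly the situation where the nearest neighbor is not very near at all.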

Inter-American Development Bank vs. AnalysisFactor

Recently, I read an article by Heinrich, Maffioli and Vázquez of the Inter-American Development Bank, who said

…. alternative approaches have recently been recognized as superior to pairwise matching. In contrast to matching one comparison group case with a given treated case, it has been found that estimates are more stable (and make better use of available data) if they consider all comparison cases that are sufficiently close to a given treated case.

To which I reply,


Superior is in the eye of the beholder.

I also recently read a post by The Analysis Factor, a.k.a. Karen Grace-Martin, on when to fight for your analysis and when to give in. I really loved this point she made:

Keep in mind the two goals in data analysis:

  1. The analysis needs to accurately reflect the limits of the design and the data, while still answering the research question.
  2. The analysis needs to communicate the results to the audience.


Given that quintiles are far easier to communicate, and considering both of these goals, I would say MOST of the time the quintile method is superior and almost all of the rest of the time the nearest neighbor method is superior. The only time you’d really benefit from methods such as radius matching is when the nearest neighbor is often really not very near at all. And in that case, I would question the wisdom of doing propensity score matching at all.

