Propensity score matching has had a huge rise in popularity over the past few years. That isn’t a terrible thing, but in my not-so-humble opinion, many people are jumping on the bandwagon without thinking through whether this is what they really need to do.
The idea is quite simple – you have two groups which are non-equivalent, say, people who attend a support group to quit being douchebags and people who don’t. At the end of the group term, you want to test for a decline in douchebaggery.
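The basic mechanics can be sketched in a few lines. This is a minimal, hypothetical illustration, not anyone’s production code: all the data is made up, the variable names are invented, and it uses scikit-learn’s `LogisticRegression` for the propensity model with simple 1:1 nearest-neighbor matching.

```python
# Minimal sketch of propensity score matching on made-up data.
# "attended" marks support-group members; age and a baseline
# douchebaggery score are covariates we suspect differ between groups.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
age = rng.normal(35, 10, n)
baseline = rng.normal(50, 10, n)

# Attendance depends on the covariates, so the groups are non-equivalent
p_attend = 1 / (1 + np.exp(-(0.05 * (35 - age) + 0.05 * (50 - baseline))))
attended = rng.random(n) < p_attend

# Step 1: model the probability of attending from the covariates
X = np.column_stack([age, baseline])
ps = LogisticRegression().fit(X, attended).predict_proba(X)[:, 1]

# Step 2: pair each attender with the non-attender whose propensity
# score is closest (1:1, with replacement, for simplicity)
treated_idx = np.flatnonzero(attended)
control_idx = np.flatnonzero(~attended)
matches = [(t, control_idx[np.argmin(np.abs(ps[t] - ps[control_idx]))])
           for t in treated_idx]
```

After matching, you would compare the outcome across the matched pairs instead of the raw, non-equivalent groups.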
However, you believe that people who don’t attend the groups are likely different from those who do in the first place: bigger douchebags, younger, and, it goes without saying, more likely to be male.
The very, very important key phrase in that sentence is YOU BELIEVE.
Before you ever run a propensity score matching program, you should test that belief and see if your groups really ARE different. If not, you can stop right now. You’d think doing a few ANOVAs, t-tests or cross-tabs in advance would be common sense. Let me tell you something, common sense suffers from false advertising. It’s not common at all.
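That pre-check takes about four lines. Here’s a hypothetical version with invented samples, using `scipy.stats.ttest_ind` to compare the two groups on one covariate:

```python
# Pre-check with made-up data: do attenders and non-attenders
# even differ on age? If not, there may be nothing to match on.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
age_attenders = rng.normal(40, 8, 60)      # hypothetical samples
age_nonattenders = rng.normal(33, 8, 90)

t, p = stats.ttest_ind(age_attenders, age_nonattenders)
print(f"t = {t:.2f}, p = {p:.4f}")
```

Run the same kind of test (or a chi-square, for categorical variables) on every covariate you BELIEVE differs between the groups.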
Even if there are differences between the groups, they may not matter unless they are related to your dependent variable, in this case, the Unreliable Measure of Douchebaggedness.
Say, for example, that you find that your subjects in the support group are more likely to eat grapefruits for breakfast, live on even-numbered streets and own a parrot. Even though I’d be a little suspicious of anyone who gets up early enough to eat breakfast, if it turns out that none of those variables are related to how big of a douchebag you are, there is no point in doing a propensity score match.
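Checking that is just a correlation. In this invented example, the outcome depends on age but not on parrot ownership, so only one of the two covariates would be worth matching on:

```python
# Hypothetical check: is a covariate related to the outcome at all?
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 300
owns_parrot = rng.integers(0, 2, n).astype(float)  # unrelated covariate
age = rng.normal(30, 10, n)
score = 80 - 0.8 * age + rng.normal(0, 5, n)       # outcome depends on age only

r_parrot, p_parrot = stats.pearsonr(owns_parrot, score)
r_age, p_age = stats.pearsonr(age, score)
print(f"parrot: r = {r_parrot:.2f}; age: r = {r_age:.2f}")
```

If a covariate shows no relationship to the dependent variable, a group difference on it is just trivia.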
Finally, and perhaps most obvious and most frequently overlooked, if your dependent variable is not measured reliably, no amount of statistical hocus-pocus is going to make anything predict it. (Short explanation – an unreliable measure is one that has a large proportion of error variance. Error variance is, by definition, random. Random error is not going to be related to anything. Imagine that every student just colored in the bubbles in the test at random. Now imagine trying to predict the test scores with any variable. Not happening. I think all students SHOULD color in the test sheets at random. I did once. The school psychologist told me I was mentally retarded. She was wrong.)
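You can see the random-bubbling problem in a two-minute simulation. The scores below are pure noise by construction, so no predictor, however sensible, will correlate with them:

```python
# Simulation: an outcome that is pure random error is
# unpredictable, no matter how good the predictor is.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
study_hours = rng.normal(10, 3, n)        # a perfectly sensible predictor
random_scores = rng.integers(0, 101, n)   # bubbles colored in at random

r = np.corrcoef(study_hours, random_scores)[0, 1]
print(f"r = {r:.3f}")
```

The correlation hovers around zero, and more matching, more covariates, or fancier models won’t change that.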
and AFTER you do propensity score matching (or anything else) …
Even after all of this, sometimes it still doesn’t work. A few years ago, I had a client who had a really logical theory and a well-designed study, and when we ran the analyses every which way, none of the data supported their hypotheses.
At the end of it all, the client asked me what else we could do, and I said
“There isn’t anything else we can do that I would recommend. You know, sometimes the theory is just wrong.”
It reminds me of the title of a good presentation I went to at the Joint Statistical Meetings earlier this month,
“Bayesian statistics are powerful but they’re not magical”
I think that could be applied to just about any kind of statistical technique. I wish I had said it first.