Propensity score matching has had a huge rise in popularity over the past few years. That isn’t a terrible thing, but in my not-so-humble opinion, many people are jumping on the bandwagon without thinking through whether this is what they really need to do.
The idea is quite simple – you have two groups which are non-equivalent, say, people who attend a support group to quit being douchebags and people who don’t. At the end of the group term, you want to test for a decline in douchebaggery.
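The basic mechanics can be sketched in a few lines. This is a minimal, hypothetical illustration, not anyone’s production code: all the data is made up, the variable names are invented, and it uses scikit-learn’s `LogisticRegression` for the propensity model with simple 1:1 nearest-neighbor matching.

```python
# Minimal sketch of propensity score matching on made-up data.
# "attended" marks support-group members; age and a baseline
# douchebaggery score are covariates we suspect differ between groups.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
age = rng.normal(35, 10, n)
baseline = rng.normal(50, 10, n)

# Attendance depends on the covariates, so the groups are non-equivalent
p_attend = 1 / (1 + np.exp(-(0.05 * (35 - age) + 0.05 * (50 - baseline))))
attended = rng.random(n) < p_attend

# Step 1: model the probability of attending from the covariates
X = np.column_stack([age, baseline])
ps = LogisticRegression().fit(X, attended).predict_proba(X)[:, 1]

# Step 2: pair each attender with the non-attender whose propensity
# score is closest (1:1, with replacement, for simplicity)
treated_idx = np.flatnonzero(attended)
control_idx = np.flatnonzero(~attended)
matches = [(t, control_idx[np.argmin(np.abs(ps[t] - ps[control_idx]))])
           for t in treated_idx]
```

After matching, you would compare the outcome across the matched pairs instead of the raw, non-equivalent groups.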
However, you believe that people who don’t attend the groups are likely different from those who do in the first place: bigger douchebags, younger, and, it goes without saying, more likely to be male.
The very, very important key phrase in that sentence is YOU BELIEVE.
Before you ever run a propensity score matching program, you should test that belief and see if your groups really ARE different. If not, you can stop right now. You’d think doing a few ANOVAs, t-tests or cross-tabs in advance would be common sense. Let me tell you something, common sense suffers from false advertising. It’s not common at all.
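That pre-check takes about four lines. Here’s a hypothetical version with invented samples, using `scipy.stats.ttest_ind` to compare the two groups on one covariate:

```python
# Pre-check with made-up data: do attenders and non-attenders
# even differ on age? If not, there may be nothing to match on.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
age_attenders = rng.normal(40, 8, 60)      # hypothetical samples
age_nonattenders = rng.normal(33, 8, 90)

t, p = stats.ttest_ind(age_attenders, age_nonattenders)
print(f"t = {t:.2f}, p = {p:.4f}")
```

Run the same kind of test (or a chi-square, for categorical variables) on every covariate you BELIEVE differs between the groups.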
Even if there are differences between the groups, they may not matter unless they are related to your dependent variable, in this case, the Unreliable Measure of Douchebaggedness.
Say, for example, that you find that your subjects in the support group are more likely to eat grapefruits for breakfast, live on even-numbered streets and own a parrot. Even though I’d be a little suspicious of anyone who gets up early enough to eat breakfast, if it turns out that none of those variables are related to how big of a douchebag you are, there is no point in doing a propensity score match.
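Checking that is just a correlation. In this invented example, the outcome depends on age but not on parrot ownership, so only one of the two covariates would be worth matching on:

```python
# Hypothetical check: is a covariate related to the outcome at all?
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 300
owns_parrot = rng.integers(0, 2, n).astype(float)  # unrelated covariate
age = rng.normal(30, 10, n)
score = 80 - 0.8 * age + rng.normal(0, 5, n)       # outcome depends on age only

r_parrot, p_parrot = stats.pearsonr(owns_parrot, score)
r_age, p_age = stats.pearsonr(age, score)
print(f"parrot: r = {r_parrot:.2f}; age: r = {r_age:.2f}")
```

If a covariate shows no relationship to the dependent variable, a group difference on it is just trivia.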
Finally, and perhaps most obvious and most frequently overlooked, if your dependent variable is not measured reliably, no amount of statistical hocus-pocus is going to make anything predict it. (Short explanation – an unreliable measure is one that has a large proportion of error variance. Error variance is, by definition, random. Random error is not going to be related to anything. Imagine that every student just colored in the bubbles in the test at random. Now imagine trying to predict the test scores with any variable. Not happening. I think all students SHOULD color in the test sheets at random. I did once. The school psychologist told me I was mentally retarded. She was wrong.)
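You can see the random-bubbling problem in a two-minute simulation. The scores below are pure noise by construction, so no predictor, however sensible, will correlate with them:

```python
# Simulation: an outcome that is pure random error is
# unpredictable, no matter how good the predictor is.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
study_hours = rng.normal(10, 3, n)        # a perfectly sensible predictor
random_scores = rng.integers(0, 101, n)   # bubbles colored in at random

r = np.corrcoef(study_hours, random_scores)[0, 1]
print(f"r = {r:.3f}")
```

The correlation hovers around zero, and more matching, more covariates, or fancier models won’t change that.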
and AFTER you do propensity score matching (or anything else) …
Even after all of this, sometimes it still doesn’t work. A few years ago, I had a client who had a really logical theory and a well-designed study, and when we ran the analyses every which way, none of the data supported their hypotheses.
At the end of it all, the client asked me what else we could do, and I said
“There isn’t anything else we can do that I would recommend. You know, sometimes the theory is just wrong.”
It reminds me of the title of a good presentation I went to at the Joint Statistical Meetings earlier this month,
“Bayesian statistics are powerful but they’re not magical”
I think that could be applied to just about any kind of statistical technique. I wish I had said it first.