First of all, I want to draw your attention to this retraction in the Journal of the American Medical Association and mad props to Drs. Aboumatar and Wise and John Hopkins for doing the right thing in publicly retracting it.

For the TL; DR crowd

Someone who is probably now unemployed miscoded the study groups in this randomized clinical trial of self-management of Chronic Obstructive Pulmonary Disease. What does that mean? In this case, it meant that the reported results were the exact opposite of what was really observed because the treatment groups were coded incorrectly. Also, read the seven tips at the end of this post.

When I talk about statistical analysis, I focus 80% or more of my time and attention on the basics of knowing your data, cleaning your data and examining your data some more. To some, mostly younger, statisticians, that is not the sexy stuff. Why am I not talking about neural nets or generalized linear mixed models? Don’t I know that improving your prediction by .3% can result in millions of dollars in profit for a corporation that has 38 million customers?

What I know is that problems like the one in that JAMA article occur more often than we like to admit.

Recently, a student sent thesis results and then the next day sent an email saying, “Oops, I meant to use the DESCENDING option in PROC LOGISTIC but I didn’t, so the results are the exact opposite of what I said.”

A couple of years ago, I did an analysis with a depression scale for which the standardized coding is 0 to 3, but the application had used 1 to 4. The first analysis showed that every single person in the sample was clinically depressed. Fortunately, I caught this before it was published. Even when I re-analyzed the data with the correct scoring the mean score was extremely high. This was not a random sample of the population, but rather, children with a family member addicted to methamphetamine. The original (incorrect) analysis wasn’t in the opposite direction but it did somewhat overstate the problem.

Several years before that, I worked for a client who had a previous consultant with no knowledge of their particular field but who was a very good programmer. In reviewing some of that person’s code to understand the data and how it had been scored, I found that NONE of the items that should have been reverse-coded had been. The consultant had simply taken the sum of all of the items. This research had been published, by the way. I mentioned this to the client and suggested that a retraction was in order. That retraction never happened and I never worked for that client again.

My Six Tips for Saving Your Ass

A seventh, extra bonus tip

If you can’t understand the code that someone has written, not because you are a moron (can’t help you there), but because they are one of those people who never write comments in code, don’t believe in documentation and write code that includes an unnecessary number of macro variables, user-written macros and overly complicated solutions, fire their sorry ass and hire someone less pompous. I’m not saying you shouldn’t have macros or that because a person uses a DATA step and you prefer PROC SQL you should get rid of them. What I am saying is if you ask a person what decisions they made in writing that code and what was the reason for, say, using a generalized linear model instead of a general linear model, they should be able to tell you.


WP Themes