# Replication, Correlation and Causation

There is not nearly enough replication in scientific research. It’s unfortunate that funding agencies and academic journals always want to see a new twist – a different technique, a different population. Personally, I’m very interested in reading studies that say:

“I did the exact same study as Mary Lou Who and I found pretty much the same thing.”

One reason this is interesting is that it controls for the history effect. Maybe a specific event determined the outcome. A second reason I find replication interesting is that people are very quick to generate causal hypotheses to explain relationships after the fact. In a subsequent study, those hypotheses can be assessed. Do they still stand up?

Here is an example that comes up in my personal life a lot. People assume that, since Darling Daughter Number 3 is on TV and in the movies a lot, it helps my business.

Let’s take a look at the graph below:

This shows website statistics for The Julia Group site. Those lines are average daily visits to this site in months when my little pumpkin had UFC world title fights. I used average daily visits to control for the fact that some months have more days than others. Contrary to expectations, in the months when she had fights I had a stagnant or declining number of visitors. Hearing this, some of the same people who had suggested her career would have a positive effect on business reversed themselves without blinking an eye and said it must be because I was distracted and away from the office during those months.
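For anyone who wants to make the same adjustment, here is a minimal sketch in Python of turning monthly totals into average daily visits. The function name and the monthly totals are made up for illustration; they are not the actual site statistics, and the analytics tool behind the graph may calculate this differently.

```python
import calendar


def average_daily_visits(total_visits, year, month):
    """Divide a month's total visits by the number of days in that month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return total_visits / days_in_month


# Hypothetical monthly totals, invented purely for illustration.
monthly_totals = {
    (2015, 2): 2800,  # February 2015 has 28 days
    (2015, 3): 3100,  # March 2015 has 31 days
}

for (year, month), total in monthly_totals.items():
    rate = average_daily_visits(total, year, month)
    print(f"{year}-{month:02d}: {rate:.1f} visits per day")
```

In this made-up case March looks about 11% busier than February on raw totals, yet the per-day rate is identical, which is exactly the kind of false difference the adjustment removes.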

Let’s replicate that graph with data from 2012-2013. You see a pretty similar trend between the top and bottom lines. Over the past couple of years, visits have been rising, so average daily visits are higher than in 2012-13, but the pattern is the same – an increase during the months from September to December and fewer visits in the summer months. December 2012 was a little unusual compared to most years – usually there is a drop over the holidays.

Because I see these same trends year after year, I realize the pattern is not at all attributable to how much Ronda is in the media in a given month. It’s a seasonal trend. Since I write about statistics and programming a lot, I’m pretty sure more people come to this blog during the academic year when they are taking a class. Also, people can read my blog at work and pretend it is work-related, even if I’m just ranting about something that day, because, hey, there is a possibility that it COULD be about something relevant.

This assumption is further supported by the fact that the lowest days of the week for website visits are Saturday and Sunday.
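A day-of-week check like that is easy to run on any daily visit log. Below is a rough sketch, assuming the log is a simple mapping from date to visit count. The function name and the counts are hypothetical, chosen only to show the calculation.

```python
from collections import defaultdict
from datetime import date

# Fixed English names, indexed by date.weekday() (Monday is 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_visits_by_weekday(daily_visits):
    """Average the visit counts separately for each day of the week."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for day, visits in daily_visits.items():
        name = WEEKDAYS[day.weekday()]
        totals[name] += visits
        counts[name] += 1
    # Only weekdays that actually appear in the data are returned.
    return {name: totals[name] / counts[name] for name in totals}


# Hypothetical counts for two weeks in March 2015, invented for illustration.
daily_visits = {
    date(2015, 3, 2): 100, date(2015, 3, 3): 110, date(2015, 3, 4): 105,
    date(2015, 3, 5): 108, date(2015, 3, 6): 95, date(2015, 3, 7): 40,
    date(2015, 3, 8): 45, date(2015, 3, 9): 120, date(2015, 3, 10): 115,
    date(2015, 3, 11): 112, date(2015, 3, 12): 109, date(2015, 3, 13): 98,
    date(2015, 3, 14): 42, date(2015, 3, 15): 50,
}

for name, mean in mean_visits_by_weekday(daily_visits).items():
    print(f"{name}: {mean:.1f}")
```

Averaging per weekday rather than summing matters for the same reason as the monthly adjustment: a given stretch of dates can contain five Mondays but only four Sundays.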

## It’s also interesting when you don’t find the same thing

If one defines “interesting” as not getting what you want, I had an interesting experience with a research project recently. Replicating the project a second year, we ran into all kinds of technical difficulties and the results were far from significant. In short, the subjects did not receive the planned intervention so no effect of intervention was observed. Much swearing ensued. I’m now analyzing data from the third year of the same project.

Multi-year studies make so much more sense to me and it troubles me that there are not more of them. I understand the reasons. For one thing, there is so much pressure to publish in many institutions that people put out as many articles as they can as quickly as they can (everyone except for YOU, of course). For another, multi-year studies are expensive, and it is hard to justify funding to study something you have supposedly already studied and reported on.

Yeah, I get it, but just like those people who confidently explain my website statistics, without replication it is too easy to be persuaded that one’s first, or completely contradictory second, hypothesis is correct.

=============

Want to be even smarter? Back us on Kickstarter. You can check out the video to see what I am researching now. Yes, I collect data from video games with animated chickens. Don’t judge me.

## One Comment

1. James says:

It’s funny because right now you read a lot about Bayesian vs. Frequentist and all these methodological comparisons. At the end of the day, simple replication of research solves a lot of the drama associated with that debate.

I remember in grad school one of my professors said it used to be that Master’s students had to do their thesis as a replication of another study. Seemed like a brilliant idea to me – great way to see if somebody can write and run a study and it also allows for needed replication.