In a previous post, I asked: what would you do if one person’s score changed your results?
- Would you throw them out?
- Leave them in?
- Does it depend on whether they support your hypothesis or not?
A few people suggested collecting more data, and I completely agree with their very valid point: if one person can change your results from significant to non-significant, you probably have a small sample size. We did, and that is a problem for a number of reasons that warrant their own posts. It’s not always possible to collect more data, due to time, money or other constraints (only so many people are considerate enough to die from rabies bites in a given year). In our case, we have a grant under review to follow up on this pilot study with a much larger sample, so if you are on the review committee, let me just take this opportunity to say that you are good-looking and your mother doesn’t dress you funny at all.
A couple of other people commented on not getting tied up with significance vs non-significance too much, especially since a confidence interval with a sample size this small tends to be awfully wide. I agree with that also, but that, too, is a post in itself.
So, what would I do?
First of all, I would check if there were any problems in data entry. You’d laugh if you knew how often I have heard people trying to explain results driven by an outlier, only for that outlier to turn out to be a data entry person who typed 00 instead of 20, or a student who just went down the column circling everything “Always”.
For example, on this particular screening measure for depression, some of the items are reverse coded. If you did not pay attention to that and you just answered “A lot” for every item you would get an artificially depressed score (no pun intended). That was not the case here. I looked at the individual responses and, for example, the subject answered “Not at all” to “I felt down and unhappy” and “A lot” to “I felt happy”.
I checked to see that the measure was scored properly. Yes, the answers were consistent, with “Not at all” to all of the depressed items and “A lot” to all of the reverse coded items. This was just a happy kid.
So, that wasn’t it.
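If you wanted to automate that kind of check, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the four-point scale, the two item labels and the two made-up respondents are not the actual screening measure, and a real instrument would have its own scoring key.

```python
# Hypothetical 4-point response scale; the real measure's labels and
# point values may differ.
SCALE = {"Not at all": 0, "A little": 1, "Some": 2, "A lot": 3}


def is_straight_lined(responses):
    """Return True if the respondent gave the identical answer to every item."""
    return len(set(responses.values())) == 1


def score(responses, reverse_items):
    """Sum the item scores, flipping reverse-coded items so that a
    higher total always means more depressive symptoms."""
    top = max(SCALE.values())
    total = 0
    for item, answer in responses.items():
        value = SCALE[answer]
        total += (top - value) if item in reverse_items else value
    return total


# Made-up items and answers, for illustration only.
reverse_items = {"I felt happy"}
happy_kid = {"I felt down and unhappy": "Not at all", "I felt happy": "A lot"}
column_circler = {"I felt down and unhappy": "A lot", "I felt happy": "A lot"}

for name, person in [("happy_kid", happy_kid), ("column_circler", column_circler)]:
    print(name, "straight-lined:", is_straight_lined(person),
          "score:", score(person, reverse_items))
```

A respondent who gives the same answer to both regular and reverse coded items gets flagged, while a consistently happy (or consistently unhappy) respondent does not. A flag is only a reason to look at the individual responses, not a reason to delete anyone.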
Second, I checked to see if there was a problem with the subject. Occasionally, we will get a perfect score on the pre or post-tests for our math games and, upon closer examination, it turns out that the prodigy is actually a teacher who wanted to see what our test was like for him/herself. Either that, or it was a really dumb kid who failed fifth grade 37 times.
That wasn’t it, either. This student was in the same target age group, and from one of the same two American Indian reservations, as the rest of the students.
After ruling out both non-sampling error and sampling error, I then did what most people recommended: I analyzed the data both ways. In my case, the one student did not change the results. So, when I reported the results to staff from the cooperating reservations, I mentioned that there was one outlier, but that 2/3 of the youth tested were above the screening cut-off for symptoms of depression. The cut-off score is 15, while the mean for the young people assessed on their reservation was 21. I should note that this was not a random sample but rather a sample of young people who had a family member addicted to alcohol or drugs, mostly methamphetamine.
Since in this case the results did not change substantively, I just reported the results including the outlier.
If there HAD been a major difference, I would have reported both results, starting with the results without the outlier, stating that these were with one subject excluded and that, with that outlier included, the results were X.
I think the results without the outlier are more reliable because if your finding of significance (or not) depends on that one person, it’s not a very robust finding.
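Analyzing the data “both ways” can be as simple as running the same summary with and without the one case. Here is a minimal sketch in Python. The scores are made up (they are not the study data), the function names are mine, and I am assuming “above the cut-off” means strictly greater than 15.

```python
from statistics import mean

CUTOFF = 15  # screening cut-off score mentioned in the post


def summarize(scores, cutoff=CUTOFF):
    """Return sample size, mean, and proportion scoring above the cut-off."""
    return {
        "n": len(scores),
        "mean": mean(scores),
        # Assumption: "above" means strictly greater than the cut-off.
        "prop_above": sum(s > cutoff for s in scores) / len(scores),
    }


def both_ways(scores, outlier_index):
    """Summarize the data with every case, then with one case dropped."""
    without = scores[:outlier_index] + scores[outlier_index + 1:]
    return summarize(scores), summarize(without)


# Made-up scores for illustration; the first one plays the outlier.
scores = [0, 12, 14, 18, 22, 25, 27, 30, 32]
with_all, without_outlier = both_ways(scores, outlier_index=0)

print("With outlier:   ", with_all)
print("Without outlier:", without_outlier)
```

If the two summaries tell the same story, report one and mention the outlier. If they don’t, report both.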
Here is my general philosophy of statistics and it has served me well in terms of preventing retracted results and looking like an idiot.
Look for convergence.
What I mean by that is to analyze your data multiple ways, and, if possible, over multiple years with multiple samples. That’s one reason I’m really grateful we’ve received USDA Small Business Innovation Research funding over multiple years. While university tenure committees are fond of seeing people crank out articles, the truth is that, at least in education, psychology and most fields dealing with actual humans, it often takes quite some time to see a response to an intervention. Not only that, but there is a lot of variation in the human population. So, you are going to have a lot more confidence in your results if you have been able to replicate them with different samples, in different places, at different times.
If your significant finding only occurs with a specific group of 19 people tested on January 2, 2018 in De Soto, Missouri, and only when you don’t include the responses from Betty Ann McAfferty, then it’s probably not that significant, now is it?
Please check our latest series in the app store for your iPad, Aztech Games, which teaches Latin American history and (what else) statistics. The first game in the series is free.