I tell clients on our statistical consulting side all of the time that if your conclusion is only valid if you look at this specific subset of your sample, with this particular statistical technique. You need to look for a convergence or results. Does the mean score increase? Does the proportion of people passing a test increase? Do the test scores still increase when you co-vary for the pretest score?
(This is for my friend, Dr. Longie, who tells me I always put too many numbers in things and should get to the point – no matter how we sliced it, the scores of students who played Fish Lake improved over 30% from their pretest. Analysis is continuing on Spirit Lake and other data from Fish Lake. There! Are you happy now?)
We are just at the very beginning stages of analyzing data from the second phase of our research grant funded by the U.S. Department of Agriculture. Coincidentally, we are in Maryland at the National SBIR Conference this week and got the chance to meet in person all of the folks whose email we have been receiving for years.)
When we were in the middle of developing and testing Fish Lake, one of the interns in our office asked me,
“Are you sure this is going to work?”
I told her,
“No, I’m not sure. That’s why they call it research.”
School has now ended at all of our test sites and I have just completed cleaning the data for analysis from the first data set, which is the pre- and post-test data for Fish Lake, our game that teaches fractions as your avatar retraces the Ojibwe migration – canoeing, hunting and fishing your way across the continent.
So … what happened?
The first thing I did was compute the mean and standard deviation for the students who completed the pretest and the posttest. Then, I merged the datasets together and did a paired t-test for the 61 students who took the post-test and pre-test both. I didn’t show you any of those results because I assumed (correctly) that the merge would have to be reviewed because some people would have misspelled their username on the pretest or posttest. Surprisingly, I only found two of those, as well as one record that was just testing the software by one of our interns. The programs that I developed to clean the data (programs presented at a couple of regional SAS software conferences) worked pretty well.
Then, I re-ran the analysis.
Pre-test mean = 22.4%, SD= 16.5% N=260
Post-test mean = 30.8% SD =17.4% N=63
So far, so good. We were not surprised by the low scores on the pretest. We knew that the majority of students in several of our test schools were achieving a year or two below grade level. The improvement from pre-test to post-test of 8.4% represented an improvement in test scores of 37.5%
BUT …. what if the students who did not take the post-test were the lower performing students? Shouldn’t we do a pretest and post-test comparison only including matched pairs?
This brings us to ….
Result 2 – With Matched Pairs
Pre-test mean = 23.6%, SD= 17.4% N=63
Post-test mean = 30.8% SD =17.4% N=63
As hypothesized, the students who completed the post-test scored higher on the pretest than the average, but not dramatically so. The difference was still statistically significant (p < .01)
What about outliers? That standard deviation seems awfully high to me and when I look at the raw data I find five players who have a 0 on the pretest or post-test and one who had one of the highest scores on the pretest whose test is blank after the first few questions.
Now, it is possible that those students just knew none of the questions – but it appears they just entered their username and (almost) nothing else. I deleted those 6 records and got this
Result 3 – Matched Pairs with Outliers Deleted
Pre-test mean =24.4 SD =16.2 N=57
Post-test mean = 32.7 SD = 16.7 N=57
With a difference of 8.3 percentage points, this presents an improvement of 34% (p<.001 )
Conclusion ? Well, we are not even close to a conclusion because we have a LOT of more data still to analyze, but what I can say is that the results are looking promising.