The truth is, what I wanted to be talking about today was either data mining, text mining or mixed models. Those are three things I want to be doing more, and would be doing more, except that we have a Kickstarter campaign going on to fund the next six levels of our game that teaches math, which really is awesome. Seriously, it is.
So, even though I am feeling statistically deprived these days, I have to admit that sometimes simple statistics do give you pretty straight answers. Let’s take the test we created to see whether this game really works to improve students’ math scores. This is what we used to measure the effectiveness of the game in our pilot phase. Our original idea was to take released items from the state standards test. It turns out that North Dakota, where we piloted the game, is one of the states that never releases the items on its tests. So … we found other states that had identical standards, like, “Solve problems involving division of multi-digit numbers by one-digit numbers.”
Then we took questions released from those tests, like:
6. Valerie has 225 pennies. She divides her pennies into five equal piles. How many pennies are in each pile?
and created our test aligned to state standards. That is good for content validity, that is, our test matches the content teachers were supposed to be teaching. When we look at the percentage of students who answered each item correctly, by grade level, we see two things.
First, if you look at those vertical lines after the third, eighth and eighteenth questions, those mark the grade levels of the items. As I wrote about previously, this gives us some evidence for construct validity, given that fourth-grade students answer most questions at the second-grade level correctly, and relatively few at the fourth- and fifth-grade level. (Because this was a low-performing school on other criteria, we expected many students to be below grade level.)
Notice the dashed horizontal line I added, though. That is at 25%. If students just randomly guessed, they would get 25% correct. Many of those who got those items “correct”, I would suppose, just guessed. This introduces random error and makes your results less reliable. Now, correction of scores for guessing is not new. Frary, Cross and Lowry published an interesting article on the topic and how it affects reliability in the Journal of Experimental Education back in 1977, and there has certainly been plenty of discussion since.
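To make the idea of correcting for guessing concrete, here is a minimal sketch of the classic "formula scoring" correction, where a fraction of the wrong answers is subtracted from the right answers. This is not necessarily how we scored our test. The function name, the example counts, and the assumption of four answer options per item (which is what a 25% chance rate implies) are all mine, for illustration only.

```python
def corrected_score(num_right, num_wrong, num_options=4):
    """Classic correction for guessing: right - wrong / (options - 1).

    Omitted items count as neither right nor wrong.
    num_options=4 is an assumption matching a 25% chance rate.
    """
    if num_options < 2:
        raise ValueError("an item needs at least two answer options")
    return num_right - num_wrong / (num_options - 1)


# Hypothetical student who guessed on all 24 items and landed at chance:
# 6 right, 18 wrong, so the corrected score falls to zero.
pure_guesser = corrected_score(6, 18)

# Hypothetical student with 15 right and 9 wrong out of 24 items.
stronger_student = corrected_score(15, 9)

print(pure_guesser, stronger_student)
```

Under these assumptions, a student who guesses at random on every item ends up with an expected corrected score of zero, which is the point of the correction: it removes the score a student would be expected to pick up by chance alone.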
Also interesting to me, notice how many questions are BELOW that line of 25%. Why do you think that is? Are Native American kids just bad guessers? I know the answer to that question, but put your guesses in the comments and I’ll tell you on Friday.