Day 2: Start-up News – Boring, Important Measurement

Bar graph showing percentage correct by item grade level

It never ceases to amaze me that intelligent people will spend huge amounts of time doing a literature review, designing elaborate theories, generating elegant hypotheses, selecting a three-stage stratified random sample, and performing multivariate analyses, and yet the measures on which this brilliant study rests are some questions they made up with their three best friends over Chardonnay during happy hour one Friday night. This is also known as the “panel of experts” method, and it has the added benefit that it allows you to deduct the wine on your taxes. (Not actual tax advice. Consult your accountant. Of course, if you are doing your 1040 based on reading this blog, you are probably beyond help.)

We did not go with this approach. Our original idea was to use released items from the North Dakota state standards test but, unfortunately, that is one of the states that never releases items. What we did instead was find standards that were the same, verbatim, as those of other states and then find items from those states that had been released. For example, the standard would be

“Compute a given percent of a whole number”

and the problem would be

“What is 40% of 250?”

with the same four multiple choice options that had been used on the state test.

As someone pointed out, even if this exact test had not been used previously, the individual items had been validated, since we pulled only the items that tested exactly what we included in the game. So, we had content validity.

One bit of evidence for construct validity came from the item difficulty levels. Here is one of several charts. This shows what percentage of the fourth-grade students answered each item correctly. The items are broken down by grade level. It is also important to know that the state tests showed the majority of students at this school to be low-performing in mathematics. What we see is that as students go from second-grade level items, all of which the majority of the students answered correctly, to fifth-grade items, the percentage correct declines. For the fifth-grade items, the students exceeded the 25% that would be answered correctly by random guessing on only one of them (remember, there were four multiple-choice options).

Since the state’s tests have shown these students to be performing poorly, we should see that they generally are not at grade level, that is, they do not answer many of the fourth-grade items correctly at a rate exceeding chance. That, as you can see from the chart, is the exact situation.
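If you want to see the arithmetic behind a chart like this, here is a minimal sketch in Python. To be clear, this is not the code we ran and these are not our data: the item names, grade levels and responses are made up for illustration. It just computes the percentage correct for each item and flags whether that percentage exceeds the 25% you would expect from guessing among four options.

```python
# Sketch only: item names, grade levels and responses are hypothetical,
# not the data from the study. 1 = correct, 0 = incorrect.
CHANCE = 1 / 4  # four multiple-choice options, so 25% by random guessing

responses = {
    "item_a": {"grade": 2, "answers": [1, 1, 1, 0, 1, 1, 0, 1]},
    "item_b": {"grade": 3, "answers": [1, 0, 1, 1, 0, 1, 0, 0]},
    "item_c": {"grade": 4, "answers": [0, 1, 0, 0, 1, 0, 0, 0]},
    "item_d": {"grade": 5, "answers": [0, 0, 1, 0, 0, 0, 0, 0]},
}


def percent_correct(answers):
    """Percentage of students who answered the item correctly."""
    return 100 * sum(answers) / len(answers)


def summarize(items, chance=CHANCE):
    """Return one row per item, ordered by item grade level."""
    rows = []
    for name, item in items.items():
        pct = percent_correct(item["answers"])
        rows.append({
            "item": name,
            "grade": item["grade"],
            "pct_correct": pct,
            # Strictly greater than the guessing rate
            "above_chance": pct > 100 * chance,
        })
    return sorted(rows, key=lambda r: (r["grade"], r["item"]))


for row in summarize(responses):
    flag = "above chance" if row["above_chance"] else "at or below chance"
    print(f"grade {row['grade']}  {row['item']}: {row['pct_correct']:.1f}%  ({flag})")
```

Note that this only compares each percentage to the guessing rate. With a small number of students, an item can land a little above 25% by luck alone, so a real analysis would want a larger sample or a significance test before calling an item "above chance".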

Of course, we did more than this, beginning with replicating this identical chart with fifth-graders, who showed pretty much the same pattern but, as would be expected, answered a higher proportion correctly at each grade level than did the fourth-graders.

That’s the sort of thing that too many studies take for granted and never test. This isn’t the exciting part of creating a game, the part where you make an attack scene and the kid gets to shoot flaming arrows. So, what good does this do us? Well, the combination of the different analyses confirms that the measure we used to test whether or not students’ mathematics achievement increased is, in fact, a valid measure of mathematics achievement.

Also, this method has the advantage that we are not required to share any of the wine with our best friends/expert panel, so we get to drink it all ourselves.
