I’m often skeptical of proponents of both Item Response Theory and multiple imputation procedures, not because either IRT or MI is a bad thing in itself but because including either one makes data analysis and reporting more complicated. At the National Center for Education Statistics (NCES) seminar this week, Dr. Emmanuel Sikali gave some damn good reasons why the National Assessment of Educational Progress (NAEP) makes use of both IRT and multiple imputation.

Very briefly, he noted that when conducting an assessment it is only fair to measure what is taught in the curriculum. Your good old basic content validity. However, doing that for an area like mathematics or reading would take a very large number of questions. We not only want students to answer the test questions, we also want background information such as whether they have a computer at home, how many minutes a night they spend on math homework, and a few dozen similar questions. We can’t really expect schools to volunteer to have students take a four-hour-long, 250-item test for each subject. We can’t expect students to be willing to spend that much time on testing without getting tired or sick of it. (The tests are done in grade 4 and grade 8. I can already hear the world’s most spoiled 13-year-old saying, “This stupid test sucks. I’m not doing it.”)

SO … how do you fairly assess the curriculum without giving students a zillion-item test? You create, say, four different tests and give each student 1/4 of a zillion items. Then, based on the answers students gave for the questions they did receive, you estimate the scores they would have gotten on the other items. Because students randomly get one version of a test, the data really ARE missing at random: which items a student never saw depends only on the random assignment, not on anything about the student. Of course, you don’t want to treat the data you imputed the same as the items the students actually answered, because there is some uncertainty as to whether your estimate is correct. So, you do this imputation multiple times. Hence the name. You can read a pretty nice introduction to multiple imputation here by Joseph Schafer at the Penn State University Methodology Center. They provide a ton of useful information on their site. Between them being funded by DHHS and the Dept of Ed funding NCES, I have decided that I will not become a Republican this week because I have proof that the government DOES do some things right.
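To make the "four different tests" idea concrete, here is a minimal sketch of randomly handing out test versions so that the unanswered items are missing by design. Everything in it is an assumption for illustration: the function names, the 20-item pool, and the even split into four blocks are hypothetical, and this is not NAEP's actual booklet design, which is more elaborate.

```python
import random

def assign_booklets(student_ids, n_booklets=4, seed=0):
    """Randomly give each student one of n_booklets test versions."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return {sid: rng.randrange(n_booklets) for sid in student_ids}

def items_in_booklet(booklet, n_items=20, n_booklets=4):
    """Items in one booklet: the pool is cut into equal consecutive blocks."""
    per_booklet = n_items // n_booklets
    start = booklet * per_booklet
    return list(range(start, start + per_booklet))

def response_pattern(booklet, n_items=20, n_booklets=4):
    """True where the student was given the item, False where it is missing by design."""
    seen = set(items_in_booklet(booklet, n_items, n_booklets))
    return [item in seen for item in range(n_items)]

if __name__ == "__main__":
    booklets = assign_booklets(["s1", "s2", "s3", "s4", "s5"])
    for sid, booklet in booklets.items():
        print(sid, booklet, items_in_booklet(booklet))
```

Because the booklet number comes from a random draw and nothing else, whether an item is missing for a given student is unrelated to that student's ability or background, which is what justifies imputing the scores they would have gotten on the unseen items.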

Okay, so we are convinced of the greater goodness of multiple imputation. Now what do we do with those plausible values (the multiply imputed scores)? Also, I should throw in that because students are sampled within schools, you need to account for the clustering in the sample. Oh, and it is not a simple random sample, so you need to include student weights. You could use SAS. If you pay for the complex samples module, you can use SPSS.
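For a rough idea of what the software is doing with plausible values and weights, here is a minimal sketch in Python. It computes a weighted mean for one set of plausible values, then combines the estimates from several sets using the standard multiple imputation combining rules (Rubin's rules). The function names and the numbers are made up for illustration. The sketch also assumes you already have a sampling variance for each estimate from a method that respects the clustered design (for example, a jackknife over replicate weights); it does not compute that part.

```python
from statistics import mean, variance

def weighted_mean(values, weights):
    """Mean of one plausible value across students, using student weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def combine_plausible_values(estimates, sampling_variances):
    """Combine one estimate per plausible value using Rubin's rules.

    estimates: the statistic computed separately on each plausible value.
    sampling_variances: the design-based variance of each of those estimates
        (assumed to be supplied by the caller).
    Returns (point_estimate, total_variance).
    """
    m = len(estimates)
    point = mean(estimates)             # average over the imputations
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return point, total

if __name__ == "__main__":
    # Hypothetical: the same statistic estimated on five plausible values.
    estimates = [280.0, 282.0, 281.0, 279.0, 283.0]
    sampling_variances = [4.0, 4.0, 4.0, 4.0, 4.0]
    point, total = combine_plausible_values(estimates, sampling_variances)
    print(point, total ** 0.5)  # point estimate and its standard error
```

The extra between-imputation term is the whole point of imputing more than once: it inflates the standard error to reflect the uncertainty about scores the students never actually earned on items they never saw.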

The Department of Education funded development of AM Statistical Software (no, it was not named after me). You can download it for free and it is unbelievably simple to use. It is all pointing and clicking. As far as I know, it only runs on Windows. I used an earlier version that only imported SPSS datasets. The AM website says they now import SAS datasets also. It’s no problem if you have the older version. I just did the creation of factors, recoding, etc. that I wanted in SAS, then exported it as an SPSS file.

Analysis is also super simple. More on that tomorrow, though.

