My Life, TIMSS and Open Data – random & probably of interest to no one but me

I was disappointed to see that the Open Data community is pretty inactive over at data.gov. With 305,000 datasets released and counting, you'd think there'd be more than a handful of people posting over there. I decided I would start on my own with the TIMSS data. TIMSS stands for Trends in International Mathematics and Science Study. Props to them for releasing their data along with their programs, codebooks and publications. It takes some nerve to open up your work to the public and give other people the freedom to scrutinize it and possibly criticize it.

First, I downloaded three folders containing 20 files – six text data files, six codebooks, and eight files of SAS programs and other documentation.

No one asked my opinion on this – but that’s never stopped me before. Presumably one benefit the government is hoping to obtain by releasing its data is to get feedback – crowd-sourcing, differing perspectives.

Here are a few observations on the TIMSS data. I'm not saying there was anything wrong with the way the analyses were done, only that different people (me, for example) might have different interests.

In their analyses, they seem to care whether a question was answered incorrectly or simply not answered, either because the student skipped that item and went on to others or because he or she didn't get that far in the test, presumably because time ran out. Personally, I am interested in whether the student got it right or wrong. I'm assuming that if the student skipped an item, it was probably because she didn't know the answer. Deleting the formats that specified whether the student answered, omitted or skipped an item saved me thousands of lines.
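If a skipped item simply counts as wrong, scoring a multiple-choice item comes down to "does the answer match the key?" Here is a minimal SAS sketch of that rule. Everything in it is hypothetical: the data set, the item names Q1-Q3, the answer key, and the use of 9 as the "omitted" code (check the TIMSS codebook for the real codes).

```sas
/* Hypothetical responses: Q1-Q3 hold the option chosen (1-4); 9 = omitted (assumed code) */
data responses ;
  input id q1-q3 ;
  datalines ;
1 2 4 1
2 2 9 3
3 1 4 9
;
run ;

data scored ;
  set responses ;
  array resp{*} q1-q3 ;                /* raw answers */
  array key{3} _temporary_ (2 4 1) ;   /* hypothetical answer key */
  array score{*} s1-s3 ;               /* 1 = right, 0 = wrong or skipped */
  do i = 1 to dim(resp) ;
    score{i} = (resp{i} = key{i}) ;    /* an omitted 9 never matches the key, so it scores 0 */
  end ;
  drop i ;
run ;
```

The comparison returns 1 or 0, so no separate IF/ELSE is needed, and anything that isn't the keyed answer, including a blank, lands in the "wrong" column.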

I created an array of all variables that had a 998 or 999 and changed those to missing.
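A minimal sketch of that recode, assuming 998 and 999 are the only missing-value codes involved; the data set RAW and the variables AGE, HOURS and SCORE are hypothetical stand-ins for the real TIMSS variables.

```sas
/* Hypothetical variables that use 998/999 as missing-value codes */
data raw ;
  input age hours score ;
  datalines ;
14 3 512
999 2 530
15 998 480
;
run ;

data clean ;
  set raw ;
  array codes{*} age hours score ;                  /* the variables that use 998/999 */
  do i = 1 to dim(codes) ;
    if codes{i} in (998, 999) then codes{i} = . ;   /* code -> SAS missing */
  end ;
  drop i ;
run ;
```

Swapping the explicit list for _NUMERIC_ would sweep in every numeric variable, which saves typing but is only safe if no real value can ever be 998 or 999.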

At first, for some reason, I couldn't open the file to view it in SAS. I wondered whether the data viewer had a limit on the number of variables, but now I think I just had too many applications running at once and not enough memory available, because it opens fine now.

My first ARRAY for recoding included all of the variables from the first mathematics item to the last variable in the file, BSREA05.

When I started running PROC MEANS to check everything, I realized this wasn't what I wanted, because it also changed all of the subscale scores, gender and age to 0 or 1. I only want the actual test questions coded that way. So I changed it to:

array rec{*}  M022043 -- S042164 ;

Yes, this recodes all of the science items too, which I ended up dropping, but the math and science items were interspersed and I sure as hell wasn't going to type in 200+ variable names. My typing skills aren't that good.
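For what it's worth, SAS has another variable-list shortcut besides the double-dash range. A double dash (A -- B) takes every variable stored between A and B in the data set, which is why the science items came along for the ride. A colon after a prefix (M0:) takes every variable whose name starts with that prefix, wherever it sits. The sketch below assumes, hypothetically, that every math item name starts with M0, every science item name starts with S0, and nothing else does; the data set and variable names are made up, so check the real names against the codebook before relying on this.

```sas
/* Hypothetical data set with math and science items interspersed */
data items ;
  input id m01 s01 m02 s02 gender ;
  datalines ;
1 1 0 1 1 2
2 0 1 1 0 1
;
run ;

data math_only ;
  set items ;
  array math{*} m0: ;                  /* only variables whose names begin with M0 */
  math_total = sum(of math{*}) ;
  drop s0: ;                           /* drop every science item without typing its name */
run ;
```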

In fact, because I am a total lazy slacker, I did this:
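proc means data = in.g8_achieve07 n ;
output out = sam ;   /* no statistics listed, so SAS saves N, MIN, MAX, MEAN and STD */
run ;
proc transpose data = sam out = data1 ;   /* name the output instead of relying on the default DATA1 */
id _stat_ ;
run ;
proc sort data = data1 ;
by _name_ ;
run ;
proc print data = data1 noobs ;
var _name_ ;
run ;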

Then I copied the variable names beginning with S from the output, pasted them after the word DROP, and had my drop statement. Yes, I will do almost anything to avoid typing. My mother told me that my life's goal should be NOT to become a secretary (she was a secretary for 25 years, so I guess she knows whereof she speaks).
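If you'd rather skip the copy-and-paste step entirely, PROC SQL can build the list from DICTIONARY.COLUMNS and store it in a macro variable. A sketch, assuming the variables to drop are exactly the ones whose names start with S; the data set WORK.ACHIEVE and the macro variable SCI_VARS are hypothetical names.

```sas
/* Hypothetical data set standing in for the real achievement file */
data work.achieve ;
  input m01 s01 m02 s02 ;
  datalines ;
1 0 1 1
;
run ;

/* Collect every variable name starting with S into one space-separated list */
proc sql noprint ;
  select name into :sci_vars separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'ACHIEVE'
      and upcase(name) like 'S%' ;
quit ;

data math_only ;
  set work.achieve ;
  drop &sci_vars ;   /* expands to the list built above */
run ;
```

DICTIONARY.COLUMNS stores library and member names in uppercase, hence 'WORK' and 'ACHIEVE'. If no variable matches, the macro variable is empty and the DROP statement fails, which is at least a loud failure.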

At first, I just coded everything as either correct or incorrect, ignoring partial responses. Then I got to wondering whether that would make any difference. So I re-ran the scoring program, giving half a point for a partial response and one point for a full response. This is not identical to the TIMSS scoring, which, on some of the written-response items, gives 2 points for a fully correct response and 1 for a partial response. I guess the rationale is that a partial response to a really difficult question should be equivalent to a completely correct response to a much easier question. I am just guessing this is their reasoning, and it does make some sense.
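Here's a minimal sketch of the scoring rules side by side, under assumptions: each item has already been converted to score points (0, 1 or 2, where 2 is only possible on a partial-credit item and 1 on such an item means partial credit), and the data set, item names P1-P3 and maximum-points vector are hypothetical. Requiring the maximum gives right/wrong, dividing by the maximum gives the half-point rule, and summing the raw points gives TIMSS-style totals.

```sas
/* Hypothetical score points; P3 is the only partial-credit (2-point) item */
data points ;
  input id p1-p3 ;
  datalines ;
1 1 0 2
2 0 1 1
3 1 1 0
;
run ;

data rules ;
  set points ;
  array pts{*} p1-p3 ;
  array maxpts{3} _temporary_ (1 1 2) ;   /* hypothetical maximum points per item */
  array rw{*} rw1-rw3 ;                   /* right/wrong: full credit only */
  array half{*} half1-half3 ;             /* 1 = full, .5 = partial, 0 = wrong */
  do i = 1 to dim(pts) ;
    rw{i}   = (pts{i} = maxpts{i}) ;
    half{i} = pts{i} / maxpts{i} ;
  end ;
  rw_total   = sum(of rw{*}) ;
  half_total = sum(of half{*}) ;
  raw_total  = sum(of pts{*}) ;           /* TIMSS-style raw score points */
  drop i ;
run ;
```

Only items whose maximum is 2 can come out differently under the first two rules, which matches the point that items without partial credit are unaffected.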

The reason I used 1 point and .5 point is that it took me about one second to modify my program. You can see at right that whether or not I coded partial credit made very little difference overall, but for some items it made a sizable difference. Of course, for the items that didn't have partial credit, it made absolutely no difference. The overall mean item difficulty barely changed: it was around .50 either way, which is generally considered optimal for a test, since a right/wrong item's score variance, p(1 - p), is largest when p = .50.
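Item difficulty here is just the mean item score: the proportion of students answering correctly for a right/wrong item, or the average fraction of credit earned under half-credit scoring. A minimal sketch that computes it for both rules with PROC MEANS; the data set and the variables RW1-RW2 and HALF1-HALF2 are hypothetical.

```sas
/* Hypothetical item scores under the two rules */
data item_scores ;
  input rw1 rw2 half1 half2 ;
  datalines ;
1 0 1 0.5
0 1 0 1
1 0 1 0
1 1 1 1
;
run ;

/* Mean of each item score = item difficulty (p-value); MEAN= keeps the original names */
proc means data = item_scores noprint ;
  var rw1 rw2 half1 half2 ;
  output out = difficulty (drop = _type_ _freq_) mean = ;
run ;

proc print data = difficulty noobs ;
run ;
```

Comparing each RW column with its HALF partner shows which items the partial-credit decision actually moves.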

Note that neither of the ways I scored the data was the way TIMSS scored it. In their method, all multiple-choice items were worth one point; SOME of the problems that required a written solution were worth 2 points, with 1 point possible for a partial response, while others did not award partial credit. It's not difficult to use the formats supplied with the TIMSS data to create a scoring program, but it will take longer than the one second it took me previously.
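A sketch of what that scoring program might look like for the written-response items. It assumes, and you should verify this against the supplied formats and codebooks, that these items use two-digit codes whose first digit is the score level: 2x for full credit, 1x for one point (partial credit on a 2-point item, full credit on a 1-point item), 7x for incorrect and 9x for blank or not reached. The data set and the names CR1-CR2 and PTS1-PTS2 are hypothetical, and blanks are counted as wrong, in line with my right-or-wrong approach.

```sas
/* Hypothetical two-digit constructed-response codes (assumed coding scheme) */
data cr_codes ;
  input id cr1 cr2 ;
  datalines ;
1 20 11
2 10 79
3 99 10
;
run ;

data cr_points ;
  set cr_codes ;
  array code{*} cr1-cr2 ;
  array pts{*} pts1-pts2 ;
  do i = 1 to dim(code) ;
    select (int(code{i} / 10)) ;   /* first digit of the code */
      when (2)    pts{i} = 2 ;     /* full credit on a 2-point item */
      when (1)    pts{i} = 1 ;     /* partial credit, or full credit on a 1-point item */
      when (7, 9) pts{i} = 0 ;     /* incorrect, or blank/not reached counted as wrong */
      otherwise   pts{i} = . ;     /* anything unexpected: leave missing */
    end ;
  end ;
  drop i ;
run ;
```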

One reason I am fiddling with the scoring is that I want to see how robust these results are. People always say you can prove anything with statistics. Yeah, in the same way you can prove anything with an apple pie.

You can say, "If you listen very closely, this apple pie says President Obama is an alien," and some people will believe you, either because they are stupid or because they are very, very ill-informed. (Psst – apple pies can't talk. Now you know.)

Before I go ahead and do any analyses by group, I want to know whether any global decisions, like awarding partial credit or not, make a difference. Everything I am doing so far is just getting to know the data better: how it was coded, how it's distributed, and how different scoring criteria might make a difference. I was interested in partial credit because right or wrong is a completely objective fact in mathematics (the area of the rectangle is 32 or it isn't), but the decision to award partial credit or not is just that, a decision.
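One quick way to check whether a global decision like partial credit matters is to score every student both ways and correlate the two totals; a correlation close to 1 suggests the choice barely changes how students compare to one another. A sketch with made-up totals (the data set and the names RW_TOTAL and HALF_TOTAL are hypothetical); the OUTP= option on PROC CORR saves the Pearson correlations to a data set so they can be checked in code rather than by eye.

```sas
/* Hypothetical student totals under right/wrong and half-credit scoring */
data totals ;
  input rw_total half_total ;
  datalines ;
10 11
14 14.5
7 8
12 12
;
run ;

/* Pearson correlation between the two totals, saved to CORRS */
proc corr data = totals outp = corrs noprint ;
  var rw_total half_total ;
run ;
```

A high overall correlation can still hide a few items or groups that shift, so this is a first look, not a substitute for the item-by-item comparison.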

For now, though, I have to get back to work that pays actual money. Since it was Easter, I went to Mass, of course, then out to eat with my lovely children, then to Universal Studios, which meant getting back, doing work for actual money until 3 a.m., and then updating my blog.

Therein lies the drawback of much open-data analysis: it relies on the goodwill of people like me to conduct and document their analyses for free, goodwill that is limited by the desire to occasionally see my children, the importance to me of being at Mass on Easter, and the need to have enough cash for annual passes to Universal Studios.

Oh, and Happy Easter!
