Software

When 3 = 15: Another annoying data problem

By AnnMaria De Mars, March 19, 2012

Last week I mentioned a problem with scoring questions when each of dozens of true/false questions had not been scored true or false (as one might think) or 1 or 0 (as one might think in mathematical terms) but, no, in some bizarre Alice in Wonderland mushroom-eating logic, each question was recorded as TWO VARIABLES. The first was 1 if the person answered true, and missing otherwise. The second was coded 1 if the person responded false, and missing otherwise.

Last week, I also gave the solution to scoring these in Normal World, where we ended up with variables scored 0 or 1.
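For anyone who missed that post, here is a minimal sketch of one way to collapse such a pair into a single 0/1 variable. The names q1_true and q1_false and the three made-up rows are hypothetical, and this is not necessarily the approach last week's post used.

```sas
/* Hypothetical names: q1_true and q1_false are the two columns
   recorded for a single true/false item. */
data tf_demo ;
  input q1_true q1_false ;
  /* 1 if "true" was checked, 0 if "false" was checked, missing if neither */
  if q1_true = 1 then q1 = 1 ;
  else if q1_false = 1 then q1 = 0 ;
  else q1 = . ;
  datalines ;
1 .
. 1
. .
;
run ;

proc print data = tf_demo ;
run ;
```

The three rows should come out with q1 equal to 1, 0 and missing, in that order.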

Just because that way of recording data was not fucked up enough, this little data problem presented itself:

Respondents were asked to rate their ability to read, write and speak a second language on a scale from 1 = None to 5 = Native speaker.

You might assume that these would be scored on a scale of 1 to 5 for three variables. You think that, my friend, because you are not stupid.

You might assume that if these were for some unfathomable reason scored as fifteen variables, the first variable would be 1 if the respondent answered 1 for the first question and missing otherwise. The second variable would be 1 if the respondent answered 2 for the first question and missing otherwise. You think this because you noted a pattern above and are logical.

Neither of those assumptions is true. In this case, the data were coded like so:

V1 = 1 if answered 1 to question 1, missing otherwise

V2 = 1 if answered 1 to question 2, missing otherwise

V3 = 1 if answered 1 to question 3

….. all the way down to …..

V15 = 1 if answered 5 to question 3, otherwise V15 is a missing value.

If that wasn’t enough to make you pull your hair out, after I scored it, I found out that some people had a score of 9 on a 1 to 5 scale.

In examining the data, it turned out that a few people had checked both 4 = Advanced ability and 5 = Native speaker. While I understand how people could see those as not mutually exclusive categories and check both, the researcher wanted these people to have a score of 5.

Simply stated, the problem is this:

Take these 15 variables and code them into three questions. When respondents selected two choices, assign the larger value.

The solution is actually quite simple and it is another array:

data test ;
  set newfile ;
  array language {3} writing listening speaking ;
  array langq {15} q1 - q15 ;
  * Each block of three variables is one answer level, so multiply by that level ;
  do L = 1 to 3 ;
    language{L} = max(langq{L}, langq{L+3}*2, langq{L+6}*3, langq{L+9}*4, langq{L+12}*5) ;
  end ;
run ;

So, for speaking, for example, if the respondent checked:

q3, none: the score = 1
q6, basic: the score = 2
q9, intermediate: the score = 3

and so on.
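To see the mapping in action, here is a small sketch that runs the same array logic on two made-up respondents. The id variable and the data rows are invented for illustration, and it assumes that V1 to V15 in the description above are the variables named q1 to q15 in the code.

```sas
/* Two made-up respondents in the 15-column layout (id is hypothetical).
   Respondent 1: writing = 2 (q4), listening = 3 (q8), speaking = 1 (q3).
   Respondent 2: writing = 5 (q13), listening = 4 (q11),
                 speaking checked at both 4 (q12) and 5 (q15). */
data newfile ;
  input id q1-q15 ;
  datalines ;
1 . . 1 1 . . . 1 . . . . . . .
2 . . . . . . . . . . 1 1 1 . 1
;
run ;

data test ;
  set newfile ;
  array language {3} writing listening speaking ;
  array langq {15} q1 - q15 ;
  do L = 1 to 3 ;
    /* MAX ignores missing arguments, so unchecked boxes drop out */
    language{L} = max(langq{L}, langq{L+3}*2, langq{L+6}*3,
                      langq{L+9}*4, langq{L+12}*5) ;
  end ;
  drop L ;
run ;

proc print data = test ;
  var id writing listening speaking ;
run ;
```

Respondent 1 should score 2, 3 and 1 on writing, listening and speaking; respondent 2 should score 5, 4 and 5. The log will note that missing values were generated by the multiplications, which is expected here because MAX simply skips them.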

I could have used the SUM function if it weren’t for the people who checked both 4 and 5. Using the MAX function gives those people a score of 5. We also had a discussion with the research team about (hypothetically) people who checked both 2 and 3, for example, because they felt their reading ability fell between basic and intermediate. In that case, their score would be rounded up to the next whole number. The MAX function would give a 3, so it works in that case too. That case hasn’t actually occurred in these data yet, but we like to be prepared.
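The difference between the two functions is easy to check on a couple of made-up rows. This sketch uses only the five speaking columns; the data values and the names by_sum and by_max are hypothetical.

```sas
/* The five speaking columns for two made-up respondents:
   row 1 checked both 4 and 5, row 2 checked both 2 and 3. */
data sum_vs_max ;
  input q3 q6 q9 q12 q15 ;
  /* Both functions ignore missing arguments */
  by_sum = sum(q3, q6*2, q9*3, q12*4, q15*5) ;
  by_max = max(q3, q6*2, q9*3, q12*4, q15*5) ;
  datalines ;
. . . 1 1
. 1 1 . .
;
run ;

proc print data = sum_vs_max ;
run ;
```

For the first row, SUM gives 9 and MAX gives 5. For the second, SUM gives 5 and MAX gives 3. That second SUM is the sneakier problem: a 5 looks like a perfectly valid score, so nothing would flag it.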

3 Comments

Sara says:

March 20, 2012 at 7:29 am

I would have liked to be a fly on the wall in the room when you were talking with the team and realized their bizarre reinterpretation of the concept of variables and variable levels.

Annmaria says:

March 21, 2012 at 1:56 am

It wasn’t their fault. When you download SurveyMonkey data in SPSS format, this is what you get.