Software

When 3 = 15: Another annoying data problem

By AnnMaria De Mars, March 19, 2012

Last week I mentioned a problem with scoring dozens of true/false questions. Each one had not been scored true or false (as one might think) or 1 or 0 (as one might think in mathematical terms) but, no, in some bizarre Alice in Wonderland mushroom-eating logic, each question was recorded as TWO VARIABLES. The first was 1 if the person answered true, and missing otherwise. The second variable was coded 1 if the person responded false, and missing otherwise.

Last week, I also gave the solution to scoring these in Normal World, where we ended up with variables scored 0 or 1.

Just because that way of recording data was not fucked up enough, this little data problem presented itself:

Respondents were asked to rate their ability to read, write and speak a second language on a scale from 1 = None to 5 = Native speaker.

You might assume that these would be scored on a scale of 1 to 5 for three variables. You think that, my friend, because you are not stupid.

You might assume that if these were for some unfathomable reason scored as fifteen variables, the first variable would be 1 if the respondent answered 1 for the first question and missing otherwise. The second variable would be 1 if the respondent answered 2 for the first question and missing otherwise. You think this because you noted a pattern above and are logical.

Neither of those assumptions is true. In this case, the data were coded like so:

V1 = 1 if answered 1 to question 1, missing otherwise

V2 = 1 if answered 1 to question 2, missing otherwise

V3 = 1 if answered 1 to question 3, missing otherwise

….. all the way down to …..

V15 = 1 if answered 5 to question 3, missing otherwise
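If it helps to see the whole layout at once, here is a minimal sketch that just prints which of the fifteen variables goes with which answer and question. The formula 3 * (answer - 1) + question is my shorthand for the pattern above, and the variable names answer, question and v are made up for this illustration.

```sas
/* Sketch only: list the assumed mapping from (answer, question) to variable number. */
data _null_ ;
    do answer = 1 to 5 ;
        do question = 1 to 3 ;
            /* Answers of 1 fill variables 1-3, answers of 2 fill 4-6, and so on */
            v = 3 * (answer - 1) + question ;
            put v= answer= question= ;
        end ;
    end ;
run ;
```

The log should show fifteen lines, starting with v=1 for answer 1 to question 1 and ending with v=15 for answer 5 to question 3.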

If that wasn’t enough to make you pull your hair out, after I scored it, I found out that some people had a score of 9 on a 1 to 5 scale.

In examining the data, it turned out that a few people had checked both 4 = Advanced ability and 5 = Native speaker. While I understand how people could see those as not mutually exclusive categories and check both, the researcher wanted these people to have a score of 5.

Simply stated, the problem is this:

Take these 15 variables and code them into three questions. When respondents selected two choices, assign the larger value.

The solution is actually quite simple and it is another array:

data test ;
    set newfile ;
    array language {3} writing listening speaking ;
    array langq {15} q1 - q15 ;
    * For each skill, weight each checked box by its scale value and keep the largest ;
    do L = 1 to 3 ;
        language{L} = max(langq{L}, langq{L+3}*2, langq{L+6}*3, langq{L+9}*4, langq{L+12}*5) ;
    end ;
run ;

So, for speaking, for example, if the respondent checked:

q3, none – the score = 1
q6, basic – the score = 2
q9, intermediate – the score = 3

and so on.

I could have used the SUM function if it wasn’t for the people who checked both 4 and 5. Using the MAX function gives those people a score of 5. We also had a discussion with the research team about (hypothetically) people who checked both 2 and 3, for example, because they felt their reading ability fell between basic and intermediate. In that case, their score would be rounded up to the next whole number. The MAX function would give a 3, so it works in that case too. That case hasn’t actually occurred in these data yet, but we like to be prepared.
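To see the difference between SUM and MAX side by side, here is a small sketch on made-up data. The data set names toy and scored, the id variable, and the three respondents are all invented for illustration; only the q1 - q15 layout and the weights come from the code above. Only the speaking columns (q3, q6, q9, q12, q15) are filled in.

```sas
/* Hypothetical toy data: respondent 1 checked intermediate only,
   respondent 2 checked both 4 and 5, respondent 3 checked both 2 and 3. */
data toy ;
    input id q1-q15 ;
    datalines ;
1 . . . . . . . . 1 . . . . . .
2 . . . . . . . . . . . 1 . . 1
3 . . . . . 1 . . 1 . . . . . .
;
run ;

data scored ;
    set toy ;
    array langq {15} q1 - q15 ;
    /* SUM adds up every box checked; MAX keeps only the highest one. */
    speak_sum = sum(langq{3}, langq{6}*2, langq{9}*3, langq{12}*4, langq{15}*5) ;
    speak_max = max(langq{3}, langq{6}*2, langq{9}*3, langq{12}*4, langq{15}*5) ;
    keep id speak_sum speak_max ;
run ;

proc print data = scored noobs ;
run ;
```

Respondent 1 gets 3 either way. Respondent 2 gets 9 from SUM and 5 from MAX. Respondent 3 is the sneaky one: SUM gives 5, which looks like a perfectly valid score, while MAX gives the 3 we actually want. Expect a note in the log about missing values being generated, since multiplying a missing value by a weight is still missing; both SUM and MAX ignore those missing arguments.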

3 Comments

Sara says:

March 20, 2012 at 7:29 am

I would have liked to be a fly on the wall in the room when you were talking with the team and realized their bizarre reinterpretation of the concept of variables and variable levels.

Annmaria says:

March 21, 2012 at 1:56 am

It wasn’t their fault. When you download SurveyMonkey data in SPSS format, this is what you get.