
Since there is a blizzard and I’m inside analyzing data, I thought it was a good time for another random SAS tip.

The default output type for SAS 9.4 is HTML, which is nice for presentation and sharing, but sometimes I would like a plain text output, especially if I’m going to be doing something like copying and pasting the output into my program.

You can easily toggle between plain text and HTML output by doing this:

Go to the TOOLS menu, select OPTIONS and then PREFERENCES.

In the Preferences window, click on the RESULTS tab.


Check the box next to Create listing and uncheck the box next to Create HTML. Click OK.

Now your results will be in plain text output instead of HTML. To switch back, just go through the same steps, uncheck the listing box and check the HTML box.
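If you would rather not click through menus every time, the same toggle can be done in code with ODS statements. This is a minimal sketch, not anything from my programs, using one of the built-in SASHELP datasets:

```sas
ods html close ;  /* turn off the default HTML output */
ods listing ;     /* turn on plain text (listing) output */

proc means data=sashelp.class ;
run ;

ods listing close ;  /* switch back when you are done */
ods html ;
```

The two CLOSE statements do the same thing as unchecking the boxes in the Preferences window.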

Speaking of copying your output into your program ….

I wrote previously about a problem where I needed to do, among other things, a PROC CONTENTS with the variables in the order they occur in the dataset, not alphabetical order (use the VARNUM option).

 

If you just copy and paste the output of a typical CONTENTS procedure and all you want is the variable names, you will have a lot of stuff about data type, length and label that you need to delete. Also, my life experience has been that the more keystrokes you make (including the backspace key), the more likely you are to make mistakes. When it comes to the keyboard, less is more.

What I really want is the variables output in the order they appear and nothing more. The SHORT option does this for me.

proc contents data=in.sl_pretest varnum short ;
run ;

and produces this output:

Which_choice_is_the_same_as_the What_is_five_time_six__ Fred_walked_for_one_hour__How_ma How_is_nine_thousand__thirty_sev There_are_124_students_making_th
Valerie_has_225_pennies__She_div Joe_did_this_division_problem Which_sign_goes_in_the_space_to

What possible good is that mess? Well, I copy and paste it under the RENAME keyword and then hit the spacebar between each variable name and type = q1, or whatever number, like so:

RENAME

Which_choice_is_the_same_as_the = q1

What_is_five_time_six__  = q2

Fred_walked_for_one_hour__How_ma = q3

 

As I mentioned in the previous post, I could not do this using an array statement because the data were of mixed type, character and numeric, and SAS does not accept data of mixed types. I also mentioned how to get around that so if you are interested, go back and read that post.

This week I had one of those pain-in-the-ass problems. I had a test with 24 items, but they were of mixed types. That is, for some items the answer was multiple choice and for others it was a number.

The data was received as an Excel file.

Now, I could have opened it with SAS Enterprise Guide and specified data types for each variable, but the problem is, I am going to get this particular data set over and over, so I want code I can write once and run every time.
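Since the data arrive as Excel every time, the import itself can be code, too. This is a generic sketch, not my actual program; the file path here is made up:

```sas
proc import datafile="C:\data\pretest.xlsx"  /* hypothetical path */
    out=annoying
    dbms=xlsx
    replace ;
    getnames=yes ;
run ;
```

With REPLACE, re-running the program each time a new file arrives just overwrites the previous import.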

As if that wasn’t bad enough, the variables all had names like:

which_choice_is_the_same_as_the_

I wanted to rename these all to something sane like q1, q2 etc.

The first step was an option I don’t think I’ve ever used before, oddly enough.

 

proc contents data=annoying varnum ;
run ;

 

Normally, SAS gives you the variables in a data set in alphabetical order when you do a PROC CONTENTS. The VARNUM option lists the variables in the order they appear in the data set. This was immensely helpful because it spared me going through the data trying to figure out which was the first question, which was the second, and so on.

I just copied the variables in order after a RENAME statement and tacked on an = q1, q2, etc., like so:

Data better ;
set annoying ;
rename
which_choice_is_the_same_as_the_ = q1
what_is_five_time_six__ = q2
/* etc. */
;
run ;

proc contents data=better ;
run ;

I could have combined this with the previous step, but the fact is that unless the data set is really gigantic, the time that needs to be conserved is not computer processing time but my time, and this way was quicker because I didn’t have to write out those ridiculous variable names and worry about the program failing because I used _ in a name instead of __ .

SAS does have a function to detect variable type, but that wouldn’t really have helped me because I still needed to write all of these variables into a single array of item1-item24 for later use, and you cannot have mixed-type arrays. So, I did this:

 

data mo_better ;
set better ;
array qs{*} q2-q6 q10 q12 q14-q16 q19-q21 ;
array itemN {*} $12 item2-item6 item10 item12 item14-item16 item19-item21 ;
array qsA {*} $12 q1 q7-q9 q11 q13 q17 q18 q22-q24 ;
array itemA {*} $12 item1 item7-item9 item11 item13 item17 item18 item22-item24 ;
do i = 1 to dim(qs) ;
 itemN{i} = put(qs{i},12.) ;
end ;
do j = 1 to dim(qsA) ;
 itemA{j} = qsA{j} ;
end ;
drop i j q1-q24 ;
run ;

I have 4 arrays. The first consists of the numeric variable type questions. I couldn’t use _numeric_ to create an array of all numeric variables because there were others in the data set that were NOT test questions but were numeric and I did not want them in my array. I had to actually list each variable individually or in a range like q14-q16.

The next array is the one I am going to recode those variables into as character variables. Notice that character arrays need a $ and a length. The next two arrays are the character variables and the variables I’m going to copy them into. I could have just renamed the character variables in a RENAME statement and then changed the length in an ATTRIB statement, but it would have taken more typing.

The DIM function returns the dimension of the array, that is, the number of elements in it, so the loop runs from 1 to however many variables are in the array, because I didn’t feel like counting them.

The PUT function writes the numeric variable out using the specified format, and the result goes into a new character variable with the specified length. In effect, it converts the variable to character.

The next loop just copies all the character variables into other character variables with the names item1, item7, etc. Now I have variables that are all the same type and length, named item1-item24, and I can do things with them like compare each student’s response on each variable to the answer key, score it right or wrong and sum up the scored items, like this (1ANSWER is the first username, so the answer key sorts to the top and is read as the first record):

 

Data in.pre_scored ;
set mo_better ;
by username ;
Array scored {24} sc1-sc24 ;
Array items {24} $12 item1-item24 ;
Array ans{24} $12 ans1-ans24 ;
if _n_ = 1 then do i = 1 to 24 ;
ans{i} = items{i} ;
end ;
else do i = 1 to 24 ;
if ans{i} = items{i} then scored{i} = 1 ;
else scored{i} = 0 ;
end ;
Retain ans1-ans24 ;
total = sum(of sc1-sc24) ;
run ;

Since this is part of a two-year grant and I am going to receive these same test data sets many times, I am now finished with reading in and scoring the data for the next two years. After this, I just need to import the Excel file and click run. I am happy.

I’m also curious, because I noted that this year’s pre-test scores are 1.5 standard deviations higher than the previous year’s. I suspect this is because we have many more fifth-graders in this sample. So … with the scoring done automatically, I can now go on to the interesting stuff.

If you want to check out the game these results came from, you can read about it here.

 

I read a blog post where the author said the women who dropped out of programming “should have been discouraged” because it’s not for everyone and many women try to use smiles and flattery to get men to do their work for them.

I actually have had the experience the author cites, but with both men and women. It’s true there are some people in the tech field who are very introverted or socially inept. They are willing to help you with your technical problems if you will just stop by and have a cup of coffee and chat with them.

I’m not that person. I have a husband and four daughters. Interestingly, The Invisible Developer, who is so introverted as to never be seen in public, is also not that person. He has me and the aforementioned four daughters. That is enough for him.

Clearly, people who want you to do their work for them are annoying; however, I haven’t found them to be limited to one gender at all. Lately, I’ve been wondering whether they are like that in SOME cases because they don’t believe they can learn to do it themselves. I don’t know the answer to that.

What I do know, though, is that over the years I have known many people to succeed in areas where I would not have given them a chance. Two very fine physicians I know didn’t attend the best high schools or have the best grades as undergraduates, and honestly, I didn’t think they had a prayer of getting into medical school, much less succeeding. Neither got accepted into medical school the first year they tried. People I would not have given a prayer’s chance in hell of becoming elite athletes have often gone on to surprise me, including a couple who won Olympic medals.

Life discourages people enough. Don’t add to it!

That advice is particularly true for programming. The last couple of days have been discouraging. We had our next install almost ready and then I found some bugs in it. Then we thought it was done, and I found some more bugs in it.

Yes, he's a man and we work together to fix stuff

The Invisible Developer is upstairs fixing those and testing the latest version. I am downstairs fixing his code on the next game (so much for women wanting men to do their work for them, and he is definitely a man. I can point to the fact of having collaborated in producing The Spoiled One as irrefutable proof of said manliness. Photograph attached.). Actually, he’s brilliant and totally capable of fixing it himself, but he was already working on the other game.

Everyone’s code, if it is the least bit complicated, is going to have bugs in it. Sometimes it can take you days to find them.

Some days we succeed in writing quizzes where students can drag and drop answers, video clips with sound and animation play in response to correct answers with dialogue in English and Dakota, and then the student is transported back to a 3-D virtual world to continue playing.

Other days, nothing happens. Just nothing. There are no errors in our consoles, just a screen looking obstinately back at us refusing to do what it’s supposed to do.

Programming is discouraging some days on its own and the LAST thing you need those days is someone saying,

“Maybe you’re just not cut out for this.”

I was complaining about how today had just not been productive – I wanted to have the latest fixes on Spirit Lake in the hands of the teachers today, but it wasn’t saving the game state frequently enough. While The Invisible Developer worked on that, I found that some of the quizzes in the next build of Fish Lake were telling the student the answer was wrong even when it was right.

The Spoiled One said,

“Don’t worry, Mom. You’ll figure it out. You have time. Life is long.”

You know what? She was right. We figured it all out today. People should be encouraged. I’m proud that she has figured this out at not-quite-sixteen.

One of the many questions on start-up accelerator applications that makes me go “Hmm” is this:

How many lines of code have you written?

I have heard of, but thankfully never worked at, organizations that evaluated their technical staff by the lines of code written.

Let me give you two stories that illustrate why this is a bad idea.


Once upon a time ….

Many years ago, I worked at an organization that decided the programming staff was overpaid and generally had a bad attitude. (No, this wasn’t due solely to me. In fact, unbelievably, I was one of the easier-to-get-along-with people on the technical staff.) So … they hired some people at low salaries who had, I believe, a three-month training course in SAS. Most of the senior people avoided the cube farm where these new hires were housed, believing that it would be apparent soon enough that you get what you pay for.

I would generally come in around 10:30 or 11 and leave the office around 8 pm. I couldn’t help but notice several times that some of these new programmers were still there when I left. Leaving one evening, I saw one woman in tears in her cubicle, so I stopped and asked what was the matter. She said she had come into the office at 6 a.m. and was still waiting for her program to run. I sat down with her and looked at her program, which was a simple thing to create a few total and subtotal scores and get statistics on these by state. Her program looked like this:

 

LIBNAME in "directory" ;

Data Alabama ;
set in.us ;
If var1 = . then var1 = 0 ;
If var2 = . then var2 = 0 ;
If var3 = . then var3 = 0 ;
Total = var1 + var2 + var3 ;
If state = "Alabama" ;

run;
Proc means data = alabama ;
var total ;
run;

REPEATED 50 TIMES (50 states + Washington, DC) for a total of 562 lines of code (there is only one Libname statement).

The reason it was taking so long is that she was reading in this dataset with millions of records 51 times. There are many ways this could be fixed. Since I was on my way home, I sat down and did this.

libname mydata "directory" ;
data test ;
set mydata.us ;
total = sum(var1,var2,var3) ;
keep total state ;

Proc tabulate data= test ;
class state ;
var total ;
Table state, total*(n*f=comma12.0 (mean std)*f=comma8.2) ;
run ;

My program was 10 lines, read the dataset in once and produced a nicely formatted table.

So, was she 60 times more productive? I don’t think so.

Story number two happened in the last week. I have been working on improving our two games, Spirit Lake and, particularly, Fish Lake. A major improvement has been merging multiple scripts into one.

Here is what we did with our prototype, since we had to meet a deadline:

  • Wrote a script to handle multiple choice tests.
  • Wrote another script to handle tests that had an integer or decimal answer.
  • Wrote a third script to handle tests that had a fraction as an answer, like 4/5 , to be sure it also accepted 8/10, etc.
  • Wrote a fourth script to handle tests where the answer was dragged and dropped.

etc.

Now obviously, debugging would be simpler if we had only one or two scripts. So, this week, I have been taking a couple of scripts, making them more generalizable and deleting many others.

Another thing I’ve done is create a CSS style sheet for each game and included that link in files instead of having the common classes defined in each page.
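The stylesheet change itself is just the standard HTML link pattern; a generic sketch (the file name here is made up, not our actual file):

```html
<!-- in the <head> of each page, replacing the per-page style blocks -->
<link rel="stylesheet" href="css/fishlake.css">
```

Now a class only has to be changed in one place instead of in every page that uses it.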

 

The number of lines of code in the project has gone DOWN by hundreds, but I think the ease of maintenance and documentation has gone UP.

Now, if you asked me how many lines of code I have written in my life, that might be a relevant question. (True story, I once worked on a job where I did repeated measures ANOVA so many times for so many projects, I got so bored, I started writing statements backward beginning with the semi-colon.)

Well, I better get to bed since it is well past midnight, I have seven teenagers sleeping over at my house and I have to get up in the morning and take them all to Disneyland for The Spoiled One’s birthday.


BTW – You can buy Spirit Lake: The Game here 

The Invisible Developer had commented that I write an awful lot about SAS and maybe I should write about some other language. For Christmas last year, someone gave me an impact.js license, so I made a little game where players drop snares to catch rabbits and collect berries. This doesn’t have much educational value; I was just playing around. I thought it would be amusing to have the food items they collect in the game be equal in value to the number of calories in that item.

If you have impact and wanted to do this yourself, here is what you would do.

1. Basic stuff - include game.entities.berry, game.entities.rabbit and any other food item in your main.js script. It goes right at the beginning with any other entities you require

 

ig.module(
'game.main'
)
.requires(
'game.entities.berry',
'game.entities.rabbit',

// more stuff

)

2. Create the score in your game info function that stores information

GameInfo = new function(){
this.food = 0;
// other stuff you want to initialize
};

 

3. When you extend the game to add your own cool stuff include an addFood function

MyGame = ig.Game.extend({

// init and other functions

addFood: function(amt){
// pick up item
GameInfo.food += amt; // add caloric value to the food score
},

// draw and other functions

 

4. To each entity script, add a function that defines how the player gets the food. Here are two examples.

Collecting berries

In the case of the berries, the player will just walk by the bushes and collect the berries. Think Pac Man!

In your berry.js file add a check function like this

EntityBerry = ig.Entity.extend({

// other stuff

check: function(other){
if (other.name == "player"){
ig.game.addFood(5);
this.kill();
}
}
});

So …. it is about 5 calories per berry. When the player walks by a bush and comes into contact with a berry (picks the berry), the berry disappears and the player’s food count goes up by 5.

Snaring rabbits

Here is a second example. In this one, they drop snares around the virtual woods, and when they snare a rabbit they get 1,000 points, which is the approximate calorie content of a dressed rabbit, according to the USDA Nutrient Database. I assumed this yielded an average of 2 pounds of meat.

For my rabbit I have extended the rabbit.js script as follows

EntityRabbit = ig.Entity.extend({

// other stuff

kill: function(){
ig.game.addFood(1000);
this.parent();
}
});

But what is going to kill my rabbits? The snares, of course, so I added this into my snare.js script

EntitySnare = ig.Entity.extend({

// other stuff

check: function(other){
if (other.name == "rabbit"){
other.receiveDamage(100, this);
ig.game.addFood(1000);
}
}
});

Since the rabbit only has 100 health points, that kills it off so your rabbit disappears and your food value goes up by 1,000.

As you can see, you could easily add shooting deer, buffalo and other food in the same way.

——————————————–

After I had played around with this for a bit, I thought it was a waste to just trash it, so I put it into our upcoming game, Fish Lake, in between levels. When they finish Level 3, they play this game and then go on to Level 4. Our main game is 3-D; this is just a little interlude. I like to throw surprises into the game so kids like it and keep playing.

—————————————-

Someone in Los Angeles was very upset by our Spirit Lake game where players shoot wolves and buffalo. She said she just could not kill animals. (The Invisible Developer asked me if she was aware that they were virtual animals and not real.) I told her that our games are based on Native American history and history is what happened, not what you think should have happened or wanted to happen. In fact, there is a very touching story in Fish Lake narrated by Debbie Gourneau of the Turtle Mountain reservation on how many people died of starvation and how many more would have died were it not for the jackrabbits.

——————————

Click here to get Spirit Lake: The Game for $9.99

 

P.S. The amount of information produced by the USDA is nothing short of amazing, and I don’t say that just because they funded our grant. They really are incredible.

 

 

From the random file — I’ve been super-busy working on our new startup, 7 Generation Games, and Darling Daughter Number Three had to defend her world title again, which distracted me a bit, so I have a bunch of half-written posts I thought I’d just put up at random, for the same reason I do everything else on this blog: the hell of it.

902q798q467453q965pq86-34q9e’w5wi34ytrsghsf.ksfbcmn  - random!

I spend some time playing with other people’s data for a whole lot of reasons – for students to analyze as a learning experience, because I’m interested in a problem addressed by the data, to create presentations for elementary schoolchildren showing what one can learn from statistics.

Here are a few tips that may make your life easier:

Read the user’s guide. Most of all, check to see if this is a random sample. If you are just using the data for the purpose of teaching your students how to compute a t-test, then it really doesn’t matter whether it is a completely random sample or not. However, if you are going to be drawing any conclusions based on these results, make sure you know whether the data should be weighted, stratified, or just really not used to generalize to the population at all. If your sample consists of actuaries who are also equestrian competitors, I’m afraid not too much generalization should occur. (Don’t write and tell me about your horse, Beau, and how the two of you are exactly representative of the state of Vermont. You’re not and I don’t care anyway.)
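If the user’s guide does say the data came from a complex survey design, SAS has procedures that apply the design for you. Here is a minimal sketch, with made-up names for the dataset and the weight, stratum and cluster variables:

```sas
proc surveymeans data=mysurvey ;  /* dataset and variable names are hypothetical */
   weight samp_wt ;      /* sampling weight from the user's guide */
   strata stratum_id ;   /* design strata */
   cluster psu_id ;      /* primary sampling units */
   var income ;
run ;
```

The point is that the design variables come from the documentation, not from guessing, which is one more reason to read the user’s guide.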

Much of the open data I work with comes in very large data sets, and I spend several hours trying to get a feel for the data before I do much with it. If I’m going to use the same data set for a course with a lot of students, I’d like it to have lots of variables, and many of them numeric, so the students could combine them into scales, do a factor analysis or other quantitative uses, and they wouldn’t all end up using the same few numeric variables. They could have a little individuality in their research questions and designs.

Here is one way to find the number of numeric variables in a data set using SAS:

data testmiss ;
set in._500family ;
array allnums {*} _numeric_ ;
x = dim(allnums) ;
run ;

proc means data = testmiss ;
var x ;
run ;

 ++ Equally Random +++


If you buy the beta for Spirit Lake now for $9.99 you’ll get our version 2.0 for free in May. It will be good.  I’ve been working on the newest game, Fish Lake for the last two weeks, but soon I’m going to swap with The Invisible Developer and do nothing but work on Spirit Lake for another few weeks.

 


There are multiple reasons that I haven’t gotten around to Day 10 of the 20-day blogging challenge. Part of it is that I have been really busy, and the other part is that I read this topic,

“Share ideas that your classroom uses for brain breaks and/or indoor recess”

and I thought

I got nothin’

Anyone who knows me well can tell you that I am NOT a very fun person. I like to think that I have some good qualities, but playfulness is not among them. Ph.D., world champion, founded or co-founded a few companies, publishes scientific articles – does this sound like someone who spends a lot of time playing frisbee in the park? No, I didn’t think so. About the closest I come to this in class is on the first day, having everyone introduce themselves and talk about their research interests – which is not really very close, I must admit.

For the last SAS assignment of the Public Health Research Methods course, I decided to make a video and upload it to YouTube. For one of the dependent variables, I used how often in a year a person engaged in binge drinking, defined as four or more drinks per day. I’ve probably had four drinks in a day a few times in my LIFE, so I was surprised to find that the average person (out of over 40,000) said they did this on average 2.4 times per year.

Today has been a really frustrating day. Yesterday, after a margarita at dinner, I came home and worked on our newest game, Fish Lake, and everything progressed smoothly. Today, for both The Invisible Developer and me, it has been just beating our heads against the wall. For example, I have this PHP script that ran intermittently today – I have three records written to the database – and all of the rest of the times, it failed with an error. The I.D. has been having similar problems.

I took a break and made a video on how to do simple statistics with SAS to test the hypothesis that I could do a screen recording with Quicktime, write a program using SAS On-Demand in Firefox, record the audio in Garageband and drink Chardonnay all at the same time because Von’s had a half-price sale on wine over $20 a bottle and, well, you know – science.

You can determine if my hypothesis – whatever the hell it was – was supported. Bizarrely, the equals signs do not show up in the video. How weird is that?

Today I’m getting around to day nine of the 20-day blogging challenge while I wait for The Invisible Developer to get out of the shower where he is curled in a fetal position whining about having to go outside when it is 14 below zero. Actually, he is probably just taking a shower, but lots of whining has taken place this week, let me tell you.

Today’s question is what I did this week in teaching that I would do again, and what I would not do again. I think I’ll answer both. Coincidentally (or maybe not, since I’ve been working on a course re-design to incorporate SAS programming), both of those things have to do with SAS. Short version: if your data are in a form amenable to SAS, it is a godsend for teaching statistics. If your data are not in a very SAS-compatible format, it just blows. If, God forbid, you are limited to using SAS On-Demand, as I am this week because I have yet to receive the Windows 8 compatible version from the university, and I am in North Dakota working on my laptop, well then, your life is about to suck, I am sorry to say.

The thing I would totally do again, if I were teaching an epidemiology course, is PROC STDRATE. I love everything about this procedure. The documentation explains the procedure in very plain language, which I did not have to rewrite at all for the students; I just included the overview in my livebinder.

“Two commonly used event frequency measures are rate and risk:

  • A rate is a measure of the frequency with which an event occurs in a defined population in a specified period of time. …

  • A risk is the probability that an event occurs in a specified time period. “

It also includes datasets that can be used as examples, and they are easily typed or copied and pasted into your SAS program. Further, these data are very similar in format to the types of data that students will usually come across. Most important, this is one of the most useful procedures for students beginning to learn epidemiology, providing a lot of statistics in one: population attributable risk, population attributable fraction, standardized morbidity rate and more. It will save loads of time over computing statistics on a calculator to answer homework questions – which I think is just silly, because it is 2014 and we have computers. Also, the syntax is relatively easy.

You can read one example of using STDRATE for crude risks, reference risk and attributable fractions here.
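To give a flavor of the syntax, here is a rough sketch of a directly standardized rate. The dataset and variable names are all invented, so check the statements against the examples in the documentation:

```sas
proc stdrate data=study refdata=standard  /* both dataset names are hypothetical */
      method=direct
      stat=rate(mult=1000) ;
   population group=region event=deaths total=persons ;
   reference  total=persons ;
   strata age ;
run ;
```

The STRATA statement is where the standardization happens: the stratum-specific rates in the study data get weighted by the reference population totals.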

So, that was the good part. What I would never do again, if I had any choice at all, is

a) Use SAS to create maps or, really, analyze in any way data that was not either already in a SAS dataset or in a very easy-to-read format, e.g., no missing data, no variable-length variables, and

b) Use the SAS Enterprise Guide version of SAS On-Demand for anything, ever

There are some significant drawbacks to the SAS Web Editor as well, but they pale in comparison with the slowness of SAS Enterprise Guide in the on-demand version. With some programs, you could maybe get a cup of coffee while waiting for them to run; with the on-demand version of SAS EG, you can drive to Starbucks, wait in line, buy your coffee, drive back to the office, park, take the elevator to your floor and STILL be there just about when your cross-tabulation has completed. It’s ridiculous, which is sad, because if it ran ten times faster it would be a really great tool. It’s terrific on my desktop.

Someone on Twitter commented that they hated SAS because it did not play well with open data. Ain’t that the truth! Now, the exception is if you can get your data in a SAS dataset format. Then it’s wonderful. Well, I was using HIV prevalence data from gapminder.org – great site, by the way – and it took me an HOUR to get it read by the SAS Web Editor. You can only upload csv files or SAS files to the web editor, so I couldn’t use PROC IMPORT to read in the Excel file. The data I had used country names as the ID, and those didn’t match the IDs in the SAS map files – it’s a long, sad story with the moral that if I had the option of not using SAS for maps, I would certainly be looking into that right now, and if I never have to use SAS Enterprise Guide again (which only seems to have the US map in the on-demand version anyway), it will be too soon.

Yes, in the end, I did get my world HIV map. The computer will not defeat me!


 

Day eight of the 20-day blogging challenge was to write about a professional read – a book, article or blog post that has had an impact on me. To be truthful, I would have to say that the SAS documentation has had a profound impact on me. SAS documentation is extremely well-written (to be fair, so is SPSS’s), in contrast to most operating system documentation, which reads as if feces-flinging monkeys were somehow given words instead, which they flung onto a page, which then became a manual. But I digress – more than usual. It’s not reasonable to suggest that someone read the entire SAS documentation, which is several thousand pages by now. Instead, I’d recommend Jennifer Waller’s paper on arrays and do-loops. This isn’t the paper where I first learned about arrays – that was before pdf files, and I have met Jennifer Waller, and she was probably barely in elementary school at the time. It’s a good paper, though, and if you are interested in arrays, you should check it out.

Here is what I did today, why and how. I wanted to score a dataset that had hundreds of student records. I had automatically received the raw score for each student, percent correct and what answer they gave for each multiple-choice question. I wanted more than that. I wanted to know for each question whether or not they got it correct so that I could do some item analyses, test reliability and create subtests. This is a reasonable thing for a teacher to want to know – did my students do worse on the regression questions, say, than the ones on probability, or vice versa? Do the data back up that the topics I think are the hardest are the ones my students really score worst on? Of course, test reliability is something that would be useful to know, and most teachers just assume it but don’t actually assess it. So, that’s what I did and why. Here is how.
filename sample "my-directory/data2013.csv" ;
libname mydata "mydirectory" ;
data mydata.data2013 ;
infile sample firstobs = 2 dsd missover ;
input group_type $ idnum $ raw pct_correct qa qb qc q1- q70 ;

** These statements read in the raw data, which was an Excel file I had saved as csv file ;
** The first line was the header and I forgot to delete it so I used FIRSTOBS = 2 ;
*** That way, I started reading at the actual data. ;
*** The dsd specifies comma-delimited data. dlm="," would have worked equally well ;
*** Missover instructs it to leave any data missing if there are no values, rather than skipping to the next line ;

Data scored ;
set mydata.data2013 ;
array ans{70} q1-q70 ;
array correct{70} c1-c70 ;
array scored{70} sc1-sc70 ;

*** Here I created three arrays. One is the actual responses ;
*** The second array is the correct answer for each item ;
*** The third array is where I will put the scored right or wrong answers ;

if _N_ = 1 then do i = 1 to 70 ;
correct{i} = ans{i} ;
end ;

*** If it is the first record (the answer key) then c1-c70 will be set to whatever the value for the correct answer is ;

else do i = 1 to 70 ;
if ans{i} = correct{i} then scored{i} = 1 ;
else scored{i} = 0 ;
end ;

**** If it is NOT the first record, then if the answer = the correct answer from the key, it is 1 , otherwise 0 ;

Retain c1-c70 ;

**** We want to retain the correct answers that were in the key for all of the records in the data set ;
**** Since we never put a new value in c1-c70, they will stay the correct answers ;

raw_c = sum(of sc1-sc70) ;
*** This sums the raw score ;

pct_c = raw_c/70 ;
*** This gives the proportion correct (multiply by 100 if you want a percent) ;

proc means data=scored ;
var sc1-sc10 c1-c10 ;

*** This is just a spot check. Does the mean for each scored item fall between 0 and 1? Is the minimum 0 and the maximum 1? ;
*** The correct answers should have a standard deviation of 0 because every record should be the same ;
*** Also, the mean of each correct answer should be 1, 2, 3, 4 or 5 ;

proc corr data = scored ;
var raw_c pct_c raw pct_correct ;

*** Spot check 2. The raw score I calculated, the percent score I calculated ;
*** The original raw score and percent score, all should correlate 1.0 ;

data mydata.scored ;
set scored ;
if idnum ne "KEY" ;
drop c1-c70 q1-q70 ;

*** Here is where I save the data set I am going to analyze. I drop the answer key as a record. I also drop the 70 correct answer fields and the original answers, just keeping the scored items ;
proc corr alpha nocorr data= mydata.scored ;
var sc1-sc70 ;

*** Here is where I begin my analyses, starting with the Cronbach alpha value for internal consistency reliability ;
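For anyone curious what PROC CORR with the ALPHA option is computing: Cronbach's alpha for k items is k/(k-1) × (1 − sum of the item variances / variance of the total score). Here is a minimal sketch in Python, with a made-up 0/1 score matrix; the function and data are illustrative, not the real test.

```python
def cronbach_alpha(item_scores):
    """item_scores: one row per student, each a list of 0/1 scored items."""
    k = len(item_scores[0])

    def variance(xs):                   # sample variance, n - 1 denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up data: four students, four items
scores = [[1, 1, 1, 0],
          [1, 0, 1, 0],
          [0, 0, 1, 0],
          [1, 1, 1, 1]]
print(round(cronbach_alpha(scores), 3))  # 0.667
```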

I want to point something out here, which I think is where professional statisticians are distinguished from others: it's second nature to check and verify. Even though this program should work perfectly – and it did – I threw in reality checks at a couple of different points. Maybe I spelled a variable name wrong; maybe there was a problem with data entry.

One thing I did NOT do was write over the original data. Should I decide I need to look at what the actual answers were, I still can. If, say, students were selecting chi-square instead of t-test (my hypothetical correct answer), that would alert me to some confusion.

Incidentally, for those who think that all of the time saved grading is taken up by entering individual scores, I would recommend having your students take tests on the computer if you possibly can. I was at a school today where a group of fourth graders took two math tests using Google Chrome to access the test and type in answers. They had very little difficulty with it. I wrote the code for one of those tests; the other was created using SurveyMonkey and it was super easy.

I’d love to include pictures or video of the kids in the computer lab but the school told me it was not allowed )-:

The question for Day 3 is :

“What is a website that you cannot live without? Tell about your favorite features and how you use it in your teaching and learning.”

The first part is easy. Oh my God, I love, love, LOVE stackoverflow, a site where all of your programming questions are answered. It's free and you don't have to register. You can just go there and search for an answer to why your css is not properly aligning 5 pixels from the left margin of the container, or whatever is bothering you at the moment. Normally, when I type a question into Google, one of the first few hits will be on stackoverflow.com and I go read whatever it is. Even if my question isn't answered, I'll learn something, and I can usually search the site or look at the related topics in the sidebar and find what it was I was trying to learn.

I can’t really say that I use stackoverflow for teaching, except indirectly. One of The Julia Group companies, 7 Generation Games, makes games to teach kids math, and many of the problems I encounter are related to game development.

There are sites I use for teaching, and I was going to list more here, but I peeked ahead and saw this question comes up again in the 20-day challenge, so I'll save those for later. There are a few other good sites, including a couple of blogs, that I like for statistics, SAS and SPSS. But to answer the first part of the question: what site, if I woke up tomorrow and it wasn't there, would have me screaming NO-O-O-O!!! and searching for the nearest lake to drown myself in? Definitely, stackoverflow.com

 

lake for drowning in
