How SAS Helped Me Make Our Best-Selling Educational Game: Part 2
February 15, 2018 | Leave a Comment
Last time, I went over the requirements for a game where you match as many synonyms as possible in one minute, and how what I learned using SAS was the basis for several parts of the game. This activity is going into Making Camp Premium, which will be a paid version of our best-selling game, Making Camp Ojibwe. I don't know if you can really call it best-selling because you can download it for free, and Spirit Lake has been around longer so it has more players, but Making Camp gets more new downloads each month than any of our other games. This is surprising since the game is written in JavaScript and we have other games made with Unity that have way cooler effects. Just goes to show you can't predict perfectly what kids will like.
While you are waiting for me to finish this game, head to the app store and get Making Camp Ojibwe, free, for your iPad.
Now, back to the synonyms game. We'd finished the timer which, when it ended, showed your total points and a happy or sad image.
Okay, this first part is boring, just initializing a bunch of variables I will use later.
var thisone = 0;
var boxmove = 0;
var thisel;
var thesepts = 0;
var question;
var correct = 0;

// This is the array of words. The first is displayed as the word to match.
// The next three words are synonyms and the last four words are incorrect answers.
var words = [
    ["large", "big", "enormous", "gigantic", "awkward", "introspective", "sane", "bulbous"],
    ["fast", "rapid", "quick", "speedy", "awkward", "boring", "dull", "bulbous"],
    ["fat", "stout", "thick", "overweight", "thin", "unprofitable", "sense", "dazzling"],
    ["bad", "terrible", "not good", "awful", "couch", "sad", "ugly", "usual"],
    ["angry", "mad", "furious", "livid", "happy", "simple", "connected", "personal"],
    ["tale", "story", "fable", "yarn", "hind leg", "hippo", "newspaper", "earnest"],
    ["little", "small", "tiny", "itty bitty", "large", "thoughtless", "sleek", "perturbed"],
    ["strange", "odd", "queer", "weird", "couch", "sad", "ugly", "happy"],
    ["rare", "uncommon", "unusual", "not typical", "irate", "musical", "aromatic", "within"]
];
I need more rows in this array. If you feel creative and want to help a sister out, post a word and 3 synonyms in the comments. Getting back to SAS, I have used SAS arrays since they first came out and were implicitly indexed. In other words, it's been a minute. If one-dimensional arrays were great, two-dimensional arrays were great-squared. Some people will tell you that JavaScript does not have two-dimensional arrays, but rather an array of arrays. To those people, I say, "Bah, humbug!"
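Just to show what I mean, here is a rough sketch of that same sort of word table as a SAS two-dimensional array. To be clear, there is no SAS in the game, and this little data step – the temporary array, the names, all of it – is something I made up for this post, not code from any of our games.

DATA _NULL_ ;
   ARRAY words{2,8} $ 16 _TEMPORARY_
      ("large", "big", "enormous", "gigantic", "awkward", "introspective", "sane", "bulbous",
       "fast", "rapid", "quick", "speedy", "awkward", "boring", "dull", "bulbous") ;
   * The first element in each row is the word to match, just like the JavaScript array ;
   prompt = words{1,1} ;
   PUT prompt= ;
RUN ;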
Systematic Random Sampling Saves the Day
Alrighty, then, on to creating the synonym problem. Sometimes you can be too clever. My challenge was to make sure that the choices were put in random order so that the first 3 boxes weren't always the correct answers. I went through a lot of possible solutions where I tried to splice the array to pull out a randomly chosen word, then pull another random choice from the shortened array, using the length attribute.
After all of that, I realized there was a really simple solution. Pull a random number. Take that item and the rest of the items in the row, then start at the beginning again. Systematic random sampling. Yep. Super simple. Every useful programming language on earth has a random number function, including SAS, of course. First, we randomly pull a row out of the array. Then, we start with the n+1 word in that array, where n is a random number between 1 and 7. (Look at qnum to see how we get that.) We pull the word that is in the n+1 position in the row and assign it to the first box. Then, the next box gets the next word in order. When we get to the end of the row, the next box will have the first synonym. So, if my random number is 5, the boxes for the choices are words # 5, 6, 7, 1, 2, 3, 4 and boxes 4-6 are the correct answers.
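If you would rather see the idea stripped of all the jQuery, here is a hedged SAS sketch of the same systematic sampling trick: pick a random start between 1 and 7, then wrap around when you run past the end of the row. The data step, array and variable names are invented for illustration; the real thing is the JavaScript at the bottom of this post.

DATA choice_order ;
   ARRAY choice{7} $ 16 _TEMPORARY_
      ("big", "enormous", "gigantic", "awkward", "introspective", "sane", "bulbous") ;
   n = CEIL(RAND("UNIFORM") * 7) ;   * random start between 1 and 7 ;
   DO box = 1 TO 7 ;
      word = choice{ MOD(n + box - 2, 7) + 1 } ;   * wrap back to the start after 7 ;
      OUTPUT ;
   END ;
   DROP n ;
RUN ;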
Next, we have a for loop:
for (var i=1; i < 8; i++) {
some code
}
Really, it is exactly the same as
DO i = 1 to 7 ;
*** some code ;
END ;
After that, there are some IF-THEN-ELSE and assignment statements. The only thing not really applicable to SAS is the draggable function and appending some divs to the page.
I started this post writing about how everything I knew from SAS made it easy for me to develop games using JavaScript, but now that I think of it, it would work just as well the other way: if you know some JavaScript, learning SAS would be a piece of cake. You can check out the code below. It's getting late here in Santiago, Chile, and I still want to call my infinitely patient husband back in California, so I'll pick up next time with scoring the answers right or wrong.
/* THIS CREATES THE PROBLEM.
   A word is selected randomly from the array, then the start point in the list of
   synonyms is randomly selected. This is systematic random sampling. The words are
   put in boxes for the divs starting with the random number and when it gets to 7,
   it goes back to the beginning of the word list (but after the word you are finding
   the synonym for, that's why you need the 1 + ).
   Divs that get the first 3 synonyms in the array are assigned a class of 'right'
   and the others are assigned a class of 'wrongb'.
   Draggable function is assigned to each of the choice boxes created. If the choice
   is correct, the variable thisone is assigned the value of 1 when the box is dragged. */

function createProblem() {
    question = Math.floor(Math.random() * words.length);
    $("#segment2").text(words[question][0]);
    var qnum = 1 + Math.floor(Math.random() * 7); // Start at random number
    for (var i = 1; i < 8; i++) {
        divid = "#div" + i;
        var boxid = "#box" + i;
        if (qnum < 4) {
            $(divid).append('<div class="smallbox draggable right" id="' + boxid + '">' + words[question][qnum] + '</div>');
        } else {
            $(divid).append('<div class="smallbox draggable wrongb" id="' + boxid + '">' + words[question][qnum] + '</div>');
        }
        $('.draggable').draggable({
            start: function (event, ui) {
                if ($(this).hasClass('right')) {
                    thisone = 1;
                    thisel = this;
                } else {
                    thisone = 0;
                }
            }
        });
        if (qnum < 7) { qnum++; } else { qnum = 1; }
    }
} // END CREATION OF WORD BANK PROBLEM
SAS taught me how to make best-selling games
February 14, 2018 | 1 Comment
I’m going to be speaking at SAS Global Forum about the places you can go starting your career with SAS, for example …
If you know anything about SAS, you might think from the title that I used my mad data analysis skills to figure out what works and what doesn’t for games. While that is somewhat true, it is not at all what this post is about. In fact, learning SAS first helped me a lot when it came to actually MAKING games. No, there is not a lick of SAS code in our games, but the concepts and ideas came to me fairly easily because of my experience using SAS.
(If you read this and start to post a comment saying I could have learned everything here from Python or C or whatever your favorite language is, I am sure you are right. The fact is, though, I didn't.)
Let me give you an example:
The object of the game is to match as many synonyms as possible in one minute. This is what has to happen:
- On loading the page, randomly select a word to display on the screen, start the timer and music
- Show the number of seconds on the page, going down every second
- On the page, show 7 other words, 3 that are synonyms and 4 that are not, making sure that the correct and incorrect words show up in random order.
- If the player drags a correct word into the box, it turns green and adds 1 point to the score.
- If the player drags an incorrect word, the box turns red
- If all three choice boxes are filled, all the boxes are cleared and a new word and choice boxes are shown
- When the time is up, if the player has a perfect score, show a happy image and appropriate text.
- If the player doesn’t have a perfect score, show a less happy image and appropriate text.
- When time is up, show a button the player can click to play again.
What in the heck does all of that have to do with SAS? It's all written in JavaScript (the reason for that is a post for another day), but let's look at some code:
<script type="text/javascript">
$(document).ready(function () {
    //Timer script
    var time = 60000;
    var timer;
This first bit just starts a script, and the beginning of a function that will execute when the document is ready. That is, I don’t want JavaScript to try acting on elements that aren’t loaded yet. My first exposure to writing functions was in the 1980s. It was a very significant event. I swear, I even remember the cramped graduate assistant office at the University of California, Riverside where I read my first book of SAS macros. I think it was a book of macros written by users. This is how we distributed things before the Internet. I thought the idea of writing my own functions was the coolest thing I had ever heard.
Now, for the timer. Everyone knows what a variable is, or at least you do if you have done anything with any programming language. Here, I am initializing the time to 60,000 milliseconds. Initializing a variable is another basic idea I learned from SAS. I'm going to use that other variable, timer, later to execute the myTimer function. Just wait.
//Timer script
function myTimer() {
    if (time > 0) {
        var nowTime = time / 1000;
        document.getElementById("timer").innerText = nowTime;
        time = time - 1000;
    } else if (time <= 0) {
        document.getElementById("timer").innerText = "0";
        clearInterval(timer);
        $("#form1").hide();
        //IF ALL OF YOUR ANSWERS WERE CORRECT
        if (correct === boxmove) {
            $("#correct").text("PERFECT! You answered " + thesepts + " correctly.");
            $("#correcto").slideDown('slow');
            playAudioLocal("../../sounds/correct1");
        } else {
            $("#wrongo").show();
            $("#incorrect").text("You answered " + thesepts + " correctly.").slideDown('slow');
            playAudioLocal("../../sounds/flute");
        }
        $("#redo").show();
    }
}
Except for a few specific details, everything in the script above, I learned or improved from using SAS.
IF-THEN-DO-END – instead of DO and END, I have an opening { and a closing }, but it's the same thing.
If the time is greater than 0, the variable nowTime is set to time divided by 1,000, since most people would prefer to see their time in seconds rather than milliseconds. By the way, nowTime is a local variable, defined within a function. Local variables are another idea I first learned from SAS macros, thank you very much. The text of the element on the page named 'timer' is then set to whatever the number of seconds remaining is (nowTime). We deduct another 1,000 milliseconds from time.
ELSE DO is another common bit of SAS code. If there is no time left, do all of this stuff, e.g., set the displayed time to 0 and stop calling the timer function.
You can have nested IF-THEN-DO code in SAS, as I do here in my JavaScript.
While SAS didn’t introduce me to text functions, it’s where I learned a lot of them. Here, we have a JavaScript text function where I’m concatenating a string with a variable and then another string.
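Just for fun, here is a loose SAS translation of the timer logic above. This is a sketch I made up for this post – there is no SAS in the game – and the data set name game_state is invented; the variable names simply mirror the JavaScript.

DATA timer_check ;
   SET game_state ;
   LENGTH message $ 60 ;
   IF time > 0 THEN DO ;
      nowTime = time / 1000 ;   * show seconds instead of milliseconds ;
      time = time - 1000 ;
   END ;
   ELSE DO ;
      IF correct = boxmove THEN
         message = "PERFECT! You answered " || STRIP(PUT(thesepts, 8.)) || " correctly." ;
      ELSE
         message = "You answered " || STRIP(PUT(thesepts, 8.)) || " correctly." ;
   END ;
RUN ;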
So, we've knocked off numbers 2, 7, 8 and 9. All of the showing and hiding of elements had nothing to do with SAS. That part was straight jQuery, but that was the easy part. Actually, this whole part was pretty easy. A few tricky bits show up later on. Maybe I'll get to them in my next post. While you are waiting with bated breath …
Check out Making Camp because maturity is overrated. Learn Ojibwe history, brush up on your math skills and build out your virtual wigwam.
Standardized testing: Solving your reliability problem
December 9, 2016 | Leave a Comment
One person, whose picture I have replaced with the mother from our game, Spirit Lake, so she can remain anonymous, said to me:
But there is nothing we can do about it, right? I mean, how can you stop kids from guessing?
This was the wrong question. What we know about the measure could be summarized as this:
- Students in many low-performing schools were even further below grade level than we or the staff in their districts had anticipated. This is known as new and useful knowledge, because it helps to develop appropriate educational technology for these students. (Thanks to USDA Small Business Innovation Research funds for enabling this research.)
- Because students did not know many of the answers, they often guessed at the correct answer.
- Because the questions were multiple choice, usually A-D, the students had a 25% probability of getting the correct answer just by chance, which introduced a significant amount of error when nearly all of the students were just guessing on the more difficult items.
- Three-fourths of the test items were below the fifth-grade level. In other words, if the average seventh-grader had answered correctly only the items three or more years below grade level, he or she would have scored 75% – generally, a C.
There are actually two ways to address this and we did both of them. The first is to give the test to students who are more likely to know the answers, so less guessing occurs. We did this, administering the test to an additional 376 students in low-performing schools in grades four through eight. While the test scores were significantly higher (mean of 53% as opposed to a mean of 37% for the younger students), they were still low. The larger sample had a much higher reliability of .87. Hopefully, you remember from your basic statistics that restriction of range attenuates the correlation. By increasing the range of scores, we increased our reliability.
The second thing we did was remove the probability of guessing correctly by changing almost all of the multiple choice questions into open-ended ones. There were a few where this was not possible, such as which of four graphs shows that students liked eggs more than bacon. We administered this test to 140 seventh-graders. The reliability, again, was much higher: .86.
However, did we really solve the problem? After all, these students also were more likely to know (or at least, think they knew, but that’s another blog) the answer. The mean went up from 37% to 46%.
To see whether the change in item type was effective for lower performing students, we selected out a sub-sample of third and fourth-graders from the second wave of testing. With this sample, we were able to see that reliability did improve substantially from .57 to .71. However, when we removed four outliers (students who received a score of 0), reliability dropped back down to .47.
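For anyone wondering what estimating reliability looks like in SAS, here is a minimal sketch using one common choice, coefficient alpha from PROC CORR. The data set and item names are made up, and this isn't necessarily the exact procedure we used for the numbers above.

PROC CORR DATA = mathtest ALPHA NOMISS ;
   VAR item1 - item30 ;   * one variable per scored test item ;
RUN ;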
What does this tell us? Depressingly, and this is a subject for a whole bunch of posts, that a test at or near their stated ‘grade level’ is going to have a floor effect for the average student in a low-performing school. That is, most of the students are going to score near the bottom.
It also tells us that curriculum needs to start AT LEAST two or three years below the students’ ostensible grade level so that they can be taught the prerequisite math skills they don’t know. This, too, is the subject for a lot of blog posts.
—-
For schools who use our games, we provide automated scoring and data analysis. If you are one of those schools and you’d like a report generated for your school, just let us know. There is no additional charge.
Standardized Testing In Plain Words
November 19, 2016 | 1 Comment
I hate the concept of those books with titles like "something or other for dummies" or "idiot's guide to whatever" because of the implication that if you don't know microbiology or how to create a bonsai tree or take out your own appendix, you must be a moron. I once had a student ask me if there was a structural equation modeling for dummies book. I told her that if you are doing structural equation modeling, you're no dummy. I'm assuming you're no dummy, and I felt like doing some posts on standardized testing without the jargon.
I haven’t been blogging about data analysis and programming lately because I have been doing so much of it. One project I completed recently was analysis of data from a multi-year pilot of our game, Spirit Lake.
Before playing the game, students took a test to assess their mathematics achievement. Initially, we created a test that modeled the state standardized tests administered during the previous year, which were multiple choice. We knew that students in the schools were performing below grade level, but how far below surprised both us and the school personnel. A sample of 93 students in grades 4 and 5 took a test that measured math standards for grades 2 through 5. The mean score was 37%. The highest score was 63%.
Think about this for a minute in terms of local and national norms. The student, let's call him Bob, who received a 63% was the highest among students from two different schools across multiple classes. (These were small, rural schools.) So, Bob would be the 'smartest' kid in the area. With a standard deviation of 13%, Bob scored two standard deviations above the mean.
Let's look at it from a different perspective, though. Bob, a fifth-grader, took a test where three-fourths of the questions were at least a year, if not two or three, below his current grade level, and barely achieved a passing score. Compared to his local norm, Bob is a frigging genius. Compared to national norms, he's none too bright. I actually met Bob and he is a very intelligent boy, but when most of his class still doesn't know their multiplication tables, it's hard for the teacher to get time to teach Bob decimals, and really, why should she worry? He's acing every test. Of course, the class tests are a year below what should be his grade level.
One advantage of standardized testing is that if every student in your school or district is performing below grade level, it allows you to recognize the severity of the problem and not think, "Oh, Bob is doing great."
He wouldn’t be the first student I knew who went from a ‘gifted’ program in one community to a remedial program when he moved to a new, more affluent school.
—
Statistics Answers the Most Important Social Question
October 24, 2016 | Leave a Comment
Occasionally, when I am teaching about a topic like repeated measures Analysis of Variance, a brave student will raise a hand and ask,
Seriously, professor, WHEN will I ever use this?
The aspiring director of a library, clinic, afterschool program, etc. does not see how statistics apply to conducting an outreach campaign or HIV screening or running a recreational program for youth – or whatever one of hundreds of other good causes these students intend to pursue with their graduate degrees. Honestly, they often look at the required research methods and statistics courses as a waste of time mandated for some unknown reason by the university, probably to keep professors employed. Often, they will find a way to do a dissertation using only qualitative analysis and never think about statistics again.
This is a huge mistake.
For all of those people who say, “I never used statistics in my career”, I would answer, “well, I never used French in my career either and you know why – because I never learned it very well.”
Now, those people who don’t see a real use for French probably aren’t convinced. However, to me, it’s pretty evident that if I could speak French I could be making games in both French and English.
Actually, statistics can answer the very most important question in any social program – does it work?
So, I had written a couple of blog posts about the presentation I gave at SACNAS (Society for the Advancement of Chicanos and Native Americans in Science) where I discussed using statistics to identify the need for intervention in mathematics for students prior to middle school. I also gave examples of teaching statistics concepts in games.
The question is, did these games work for increasing student scores?
For this – surprise! Surprise! Drumroll – – – we used repeated measures Analysis of Variance. If you look at the graph below you can see that the students who played the games improved substantially more from pretest to posttest than the students in the control group.
This was a relatively small sample, because it was our first pilot study, and conducted in two small rural schools, that also happen to have very high rates of mobility and absenteeism, so we were only able to obtain complete data from 58 students.
Now, the results look impressive, but were these differences higher than one would expect by chance with four groups (two grades from each school) of a fairly small size?
Well, when we look at the ANOVA results, we see that the time by school interaction, which tests whether one school changed more over time than the other, is quite significant (F = 7.13, p = .01). Yes, the p-value equaled exactly .0100.
The time by school by grade three-way interaction was not significant. It's worth noting that the fifth grade at the intervention school had less time playing the game for logistical reasons – they had to schedule the computer lab as opposed to playing in their classroom, and because their class was scheduled later in the day, they sometimes missed playing the game altogether when school was let out early due to weather.
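If it helps to see what that kind of analysis looks like in SAS, here is a hedged sketch of a repeated measures ANOVA along these lines. It is not our actual program; it assumes a made-up data set named pilot with one row per student and pretest and posttest columns.

PROC GLM DATA = pilot ;
   CLASS school grade ;
   MODEL pretest posttest = school grade school*grade ;
   REPEATED time 2 ;   * pretest and posttest are the two levels of time ;
RUN ;
QUIT ;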
One way that I could reanalyze these data – and I will – would be to look at them not by grade but by time spent playing. So, instead of four groups, I would have three: those who played the game not at all (in other words, the control group), those who played it less than the recommended amount and those who played it the recommended amount.
My point is that repeated measures ANOVA is just one of the many statistical techniques that can answer the most important questions in social programs – whether something works and under what conditions it works best. There’s also the question of who it works best for – and statistics can answer that too.
So, my answer to the student who questions if he or she will ever use this is, “if you’re smart you will.”
For all of those who have asked us if these data are going to be published, the answer is yes, we have two articles in press that should come out in 2017.
We are working on more in our copious spare time that we do not have, but right now we are focusing on game updates and on our new, free iPad game, Making Camp.
A picture of a pretest: Statistical graphics
August 15, 2016 | Leave a Comment
A picture says 1,000 words – especially if you are talking to a non-technical audience. Take the example below.
We wanted to know whether the students who played our game Fish Lake at least through the first math problem and the students who gave up at the first sight of math differed in achievement. Maybe the kids who played the games were the higher achieving students and that would explain why they did better on the post-test.
You can see from the chart below this is not the case. The distribution of pretest scores is pretty similar for the kids who quit playing (the top) and those who persisted.
Beneath the graphs, you can see the box and whisker plots. The persistent group has fewer students at the very low end, and we actually know why that is – students with special needs in the fourth and fifth grades, for example those who were non-readers, could not really play the game and either quit on their own very soon or were given alternative assignments by the teacher.
The median (the line inside the box), the mean (the diamond) and 25th percentile (the bottom of the box) are all slightly higher for the persisting group – for the same reason, the students with the lowest scores quit right away.
These data tell us that the group that continued playing and the group that quit were pretty similar except for not having the very lowest achieving students.
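If you want to make this kind of picture yourself, here is a minimal sketch of one way to do it in SAS. It is not the code behind the chart described in this post; it assumes a made-up data set named pretest with a score variable and a persisted flag.

PROC SGPANEL DATA = pretest ;
   PANELBY persisted / ROWS = 2 ;   * one panel for quitters, one for persisters ;
   HISTOGRAM score ;
RUN ;

PROC SGPLOT DATA = pretest ;
   VBOX score / CATEGORY = persisted ;   * box and whisker plots by group ;
RUN ;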
So, if academic achievement wasn’t a big factor in determining which students continued playing the games, what was?
That’s another chart for another day, but first, try to guess what it was.
———–
Would you like to play one of our games? Check them out here – all games run on Mac and Windows.
What about Chromebooks? Check out Forgotten Trail.
Mom! That Evaluator Keeps Looking at Me!
July 20, 2016 | Leave a Comment
If I were to give one piece of advice to a would-be program evaluator, it would be to get to know your data so intimately it’s almost immoral.
Generally, program evaluation is an activity undertaken by someone with a degree of expertise in research methods and statistics (hopefully!) using data gathered and entered by people whose interest is something completely different, from providing mental health services to educating students.
Because their interest in providing data is minimal, your interest in checking that data better be maximal. Let’s head on with the data from the last post. We have now created two data sets that have the same variable formats so we are good to go with concatenating them.
DATA answers hmph;
SET fl_answers ansfix1 ;
IF username IN ("UNDEFINED","UNKNOWN") or INDEX(username,"TEST") > 0 THEN OUTPUT hmph;
ELSE OUTPUT answers;
PRO TIP : I learned from a wise man years ago that one should not just gleefully delete data without looking at it. That is, instead of having a dataset where you put the data you expect and deleting the rest, send the unwanted data to a data set. If it turns out to be what you expected, you can always delete the data after you look at it.
There should be very few people with a username of 'UNDEFINED' or 'UNKNOWN'. The only way to get that is to be one of our developers who are entering the data in forms as they create and test them, not by logging in and playing the game. The INDEX function searches the variable in the first argument for the string given in the second and returns the starting position of the string, if found. So, INDEX(username, "TEST") > 0 looks for the word TEST anywhere in the username.
Since we ask our software testers to put that word in the username they pick, it should pull out all of the tester records. I looked at the hmph data set, the distribution of usernames was just as I expected, and most of the records ended up in the answers data set with valid usernames.
Did you remember that we had concatenated the data set from the old server and the new server?
I hope you did, because if you didn't you will end up with a whole lot of the same answers in there twice.
Getting rid of the duplicates
PROC SORT DATA = answers OUT=in.all_fl_answers NODUP ;
by username date_entered ;
The difference between NODUP and NODUPKEY is relevant here. It is possible we could have a student with the same username and date_entered because different schools could have assigned students the same username. (We do our lookups by username + school). Some other student with the same username might have been entering data at the same time in a completely different part of the country. The NODUP option only removes records if every value of every variable is the same. The NODUPKEY removes them if the variables in the BY statement are duplicates.
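For contrast, here is a sketch of what the step would look like with NODUPKEY – the option we did not use, because it would throw away a legitimate answer from a second student who happened to share a username and a timestamp with another.

PROC SORT DATA = answers OUT = in.all_fl_answers NODUPKEY ;
   BY username date_entered ;   * keeps only the first record per key - not what we want here ;
RUN ;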
All righty then, we have the cleaned up answers data, now we go back and create a summary data set as explained in this post. You don’t have to do it with SAS Enterprise Guide as I did there, I just did it for the same reason I do most things, the hell of it.
MERGING THE DATA
PROC SORT DATA = in.answers_summary ;
BY username ;
PROC SORT DATA = in.all_fl_students ;
BY username ;
DATA in.answers_studunc odd;
MERGE in.answers_summary (IN=a) in.all_fl_students (IN=b) ;
IF a AND b THEN OUTPUT in.answers_studunc ;
IF a AND NOT b THEN OUTPUT odd ;
The PROC SORT steps sort. The MERGE statement merges. The IN= option creates a temporary variable with the name ‘a’ or ‘b’. You can use any name so I use short ones. If there is a record in both the student record file and the answers summary file then the data is output to a data set of all students with summary of answers.
There should not be any cases where there are answers but no record in the student file. If you recall, that is what set me off on finding that some were still being written to the old server.
LOOK AT YOUR LOG FILE!
There is a sad corner of statistical purgatory for people who don’t look at their log files because they don’t know what they are looking for. ‘Nuff said.
This looks exactly as it should. A consistent finding in the pilot studies of assessment of educational games has been a disconcertingly low level of persistence. So, it is expected that many players quit when they come to the first math questions. The fact that, of the 875 players, slightly less than 600 had answered any questions was somewhat expected. Also as expected, there were no records where there were answers but no matching student record.
NOTE: There were 596 observations read from the data set IN.ANSWERS_SUMMARY.
NOTE: There were 875 observations read from the data set IN.ALL_FL_STUDENTS.
NOTE: The data set IN.ANSWERS_STUDUNC has 596 observations and 11 variables.
NOTE: The data set WORK.ODD has 0 observations and 11 variables.
So, now, after several blog posts, we have a data set ready for analysis ….. almost.
Want to see these data at the source?
You can also donate a copy of the game to a school or give as a gift.
Further Reading
For more on SAS character functions check out Ron Cody’s paper An Introduction to Character Functions, an oldie but goodie from WUSS back in 2003.
Or you could read my last post!
The Secret Life of Evaluators, with SAS
July 20, 2016 | 4 Comments
At the Western Users of SAS Software conference (yes, they DO know that is WUSS), I’ll be speaking about using SAS for evaluation.
“If the results bear any relationship at all to reality, it is indeed a fortunate coincidence.”
I first read that in a review of research on expectancy effects, but I think it is true of all types of research.
Here is the interesting thing about evaluation – you never know what kind of data you are going to get. For example, in my last post I had created a data set that was a summary of the answers players had given in an educational game, with a variable for the mean percentage correct and another variable for number of questions answered.
When I merged this with the user data set so I could test for relationships between characteristics of these individuals – age, grade, gender, achievement scores – and perseverance I found a very odd thing. A substantial minority were not matched in the users file. This made no sense because you have to login with your username and password to play the game.
The reason I think that results are often far from reality is just this sort of thing – people don’t scrutinize their data well enough to realize when something is wrong, so they just merrily go ahead analyzing data that has big problems.
In a sense, this step in the data analysis revealed a good problem for us. We actually had more users than we thought. Several months ago, we had updated our games. We had also switched servers for the games. Not every teacher installed the new software so it turned out that some of the records were being written to our old server.
Here is what I needed to do to fix this:
- Download the files from our servers. I exported these as .xls files.
- Read the files into SAS
- Fix the variables so that the format was identical for both files.
- Concatenate the files of the same type, e.g., the student file with the student file from the other server.
- Remove the duplicates
- Merge the files with different data, e.g., answers file with student file
I did this in a few easy steps using SAS.
1. Use PROC IMPORT to read in the files.
Now, you can use the IMPORT DATA option from the file menu but that gets a bit tedious if you have a dozen files to import.
TIP: If you are not familiar with the IMPORT procedure, do it with the menus once and save the code. Then you can just change the data set names and copy and paste this a dozen times. You could also turn it into a macro if you are feeling ambitious, but let’s assume you are not. The code looks like this:
PROC IMPORT OUT= work.answers DATAFILE= “C:\Users\Spirit Lake\WUSS16\fish_data\answers.xls”
DBMS=EXCEL REPLACE;
RANGE=”answers$”;
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
Assuming that your Excel file has the names of the columns in the first row (GETNAMES = YES), all you need to do for the next 11 data sets is change the values in lower case – the name you want for your SAS data set goes after OUT =, the Excel file after DATAFILE =, and the sheet in that file that has your data after RANGE =.
Notice there is a $ at the end of that sheet name.
Done. That’s it. Copy and paste however many times you want and change those three values for output dataset name, location of the input data and the sheet name.
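And if you are feeling ambitious, here is roughly what the macro version mentioned in the tip above might look like. It's a sketch, not production code; the macro name readxl and its parameters are made up, and it assumes every sheet is laid out the same way.

%MACRO readxl(out=, file=, sheet=) ;
   PROC IMPORT OUT = &out DATAFILE = "&file"
        DBMS = EXCEL REPLACE ;
        RANGE = "&sheet.$" ;
        GETNAMES = YES ;
   RUN ;
%MEND readxl ;

%readxl(out=work.answers, file=C:\Users\Spirit Lake\WUSS16\fish_data\answers.xls, sheet=answers)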
2. Fix the variables so that the format is identical for both files
A. How do you know if the variables are the same format for each file?
PROC CONTENTS DATA = answers ;
This LOOKS good, right?
B. Look at a few records from each file.
OPTIONS OBS= 3 ;
PROC PRINT DATA = fl_answers_new ;
VAR date_entered ;
PROC PRINT DATA = fl_answers_old ;
VAR date_entered ;
OPTIONS OBS = MAX ;
PAY ATTENTION HERE!!! The OPTIONS OBS = 3 only shows the first three records, which is a good idea because you don't need to print out all 7,000+ records. However, if you forget to change it back to OBS = MAX, then all of your procedures after that will only use the first 3 records, which is probably not what you want.
So, although my PROC CONTENTS showed the files were the same format in terms of variable type and length, there was a weird thing: since the servers were in different time zones, the time was recorded as 5 hours different, so
2015-08-20 13:23:30
Became
2015-08-20 18:23:30
Since this was recorded as a character variable, not a date (see the output for the contents procedure above), I couldn’t just subtract 5 from the hour.
Because the value was not the same, if I sorted by username and date_entered, each one of these records that was moved over from the old server would be included in the data set twice, because SAS would not recognize these were the same record.
So, what did I do?
I’m so glad you asked that question.
I read in the data to a new data set and the third statement gives a length of 19 to a new character variable.
Next, I create a variable from the part of the date_entered variable that starts at the 12th position and goes for the next two characters (that is, the value of the hour).
Now, I add 5 to the hour value. Because I am adding a number to it, this will be created as a numeric value. Even though datefix1 is a character variable – since it was created using a character function, SUBSTR – when I add a number to it, SAS will try to make the resulting value a number.
Finally, I'm setting the value of datefixed to be the first 11 characters of the original date value, the part before the hour. I'm using the TRIM function to get rid of trailing blanks. I'm concatenating this value (that's what the || does) with exactly one blank space. Next, I am concatenating this with the new hour value. First, though, I am left-aligning that number and trimming any blanks. Finally, I'm concatenating the last 6 characters of the original date-time value. If I didn't do this trimming and left alignment, I would end up with a whole bunch of extra spaces and it still wouldn't match.
I still need to get this to be the value of the date_entered variable so it matches the date_entered value in the other data set.
I’m going to DROP the date_entered variable, and also the datefix1 and datefixn variables since I don’t need them any more.
I use the RENAME statement to rename datefixed to date_entered and I’m ready to go ahead with combining my datasets.
DATA ansfix1 ;
SET flo_answers ;
LENGTH datefixed $19 ;
* Pull out the two characters for the hour, starting at position 12 ;
datefix1 = SUBSTR(date_entered,12,2);
* Adding 5 adjusts for the time zone and converts the value to a number ;
datefixn = datefix1 + 5 ;
* Put the date, a single space, the corrected hour and the minutes and seconds back together ;
datefixed = TRIM(SUBSTR(date_entered,1,11)) || " " || TRIM(LEFT(datefixn)) || SUBSTR(date_entered,14,6) ;
DROP date_entered datefix1 datefixn ;
RENAME datefixed = date_entered ;
RUN ;
Check out the games that provided these data!
They're fun and will make you smarter – just like this blog!
Buy one for your family or donate to a child or school.
Your Baby Is Ugly and Other Start-up Lessons
April 9, 2016 | 1 Comment
It's almost 6 am here on the east coast. After flying all day, during which I worked on a final report for a grant to develop our latest educational game and made bug fixes on same, I landed and wrote a report for a client, because that pays the bills.
In the meantime, over on our 7 Generation Games blog, Maria wrote a post where she called bullshit on venture capitalists who claim not to be interested in educational games because they aren’t a billion dollar business but then fund other enterprises that no way in hell are a billion dollar business.
She seems to have touched a nerve because now we are getting comments from people saying no one wants to fund you because your games are bad and you are mean.
That is part of the start-up life, really. You have this idea for a business that you think is wonderful, it is your baby. Like a baby, you get too little sleep, because you are working all of the time, but you think it’s worth it.
And every day, you run into people who are essentially telling you that your baby is ugly.
People like to believe they are reasonable and give reasons for their belief in your baby’s ugliness. I think you should consider those explanations because they could be right. Maybe your baby IS ugly.
For example, someone said, “Maybe venture capitalists don’t want to invest in your games because they aren’t as good as the PS4 , Wii and Xbox games and kids don’t want to play them.”
I answered that he was correct: our games, which cost schools an average of $2-$3 per student and cost individuals $9.99, are NOT as good as games that cost $40-$60. If you have 200 kids in your school playing our games, you probably can't afford to pay us $10,000, which is roughly what 200 copies at console-game prices would cost. I know this is true. Could I be wrong about the price of the games to which he was comparing ours? I went and checked on Amazon, which is probably one of the cheapest places to buy games, and I was correct.
I have a Prius. My daughter has a BMW that costs four times as much. Her car looks much cooler than mine and goes much faster. Does that mean Prius sucks and no one should invest in them? Obviously, no.
Actually, we have thousands of kids playing our games and they sincerely seem to like them, and upper elementary and middle school kids are usually pretty honest about what they think sucks.
People sometimes point out that our graphics could be cooler or our game world could be larger or other really, really great ideas that I completely agree with. The fact is, though, that we want our games to be an option for schools, parents across the income spectrum, after-school programs and even nursing homes, in some cases. (There is a whole group of “silver gamers”.) These markets often do NOT have the type of hardware that hard-core gamers do. In fact, the minimal hardware requirement we aim to support is Chromebooks and we are building web-based versions that will run in areas that don’t have high-speed Internet access.
Did you ever have that experience where you call tech support for a problem and the person on the other end says,
Well, it works on my computer.
What good does that do me?
So, we are trying to make games that work on a lot of people’s computers. Believe me, I do get it. I play games on my computer and I have a really nice desktop in an area with high-speed Internet and I would LOVE to do some way cooler things. We made the decision to try to provide games people could play even if the only computer they can access is some piece of junk computer that most of us would throw out. Don’t get me started on the need to upgrade our schools and libraries, that is a rant for another day.
A teacher commented the other day that while she really liked the educational quality of our games, what she really wanted for her classroom were Xbox-quality games for free. I would like a free computer, too, but those bastards at Apple keep charging me when I want a new one. I guess that is a rant for another day, too.
My whole point is that running a start-up is a lot of hard work and a lot of rejection. Almost like being an aspiring actor or author or raising a teenager. You have to consider the criticisms without being discouraged. Maybe they are correct that Shakespeare wouldn’t have said,
Like, you know, to be or not.
On the other hand, I remember that publishers rejected Harry Potter, and just about every successful company over the last few decades has had more detractors than supporters when it got started. And let it be noted I was right about that jerk I told you not to date, too.
In the meantime, check out our games, they really are fun and DO make you smarter!
Urban vs Rural Barriers to Ed Tech: An example of Fisher’s Exact Test
February 16, 2016 | 1 Comment
Who was it that said asking a statistician about sample size is like asking a jeweler about price? If you have to ask, you can't afford it.
We all know that the validity of a chi-square test is questionable if the expected sample size of the cells is less than five. Well, what do you do when, as happened to me recently, ALL of your cells have a sample size less than five?
The standard answer might be to collect more data, and we are in the process of that, but having the patience of the average toddler, I wanted that data analyzed NOW because it was very interesting.
It was our hypothesis that rural schools were less likely to face obstacles in installing software than urban schools, due to the extra layers of administrative approval required in the latter (some might call it bureaucracy). On the other hand, we could be wrong (horrors!). Maybe rural schools had more problems because they had more difficulty finding qualified personnel to fill information technology positions. We had data from 17 schools, 9 from urban school districts and 8 from rural districts. To participate in our study, schools had to have a contact person who was willing to attempt to get the software installed on the school computers. This was not a survey asking them whether it would be difficult or how long it would take. We actually wanted them to get software (7 Generation Games) not currently on their school computers installed. To make sure that cost was not an issue, all 17 schools received donated licenses.
You can see the full results here.
In short, 8 of the 9 urban schools had barriers to installation of the games which delayed their use in the classroom by a median of three months. I say median instead of mean because four of the schools STILL have not been able to get the games installed. The director of one after-school program that wanted to use the games decided it was easier for his program to go out and buy their own computers than to get through all of the layers of district approval to use the school computer labs, so that is what they did.
For the rural schools, 7 out of 8 reported no policy or administrative barriers to installation. The median length of time from when they received the software to installation was two weeks. In two of the schools, the software was installed the day it was received.
Here is a typical comment from an urban school staff member,
“I needed to get it approved by the math coach, and she was all on board. Then I got it approved at the building level. We had new administration this year so it took them a few weeks to get around to it, and then they were all for it. Then it got sent to the district level. Since your games had already been approved by the district, that was just a rubber stamp but it took a few weeks until it got back to us, then we had all of the approvals so we needed to get it installed but the person who had the administrator password had been laid off. Fortunately, I had his phone number and I got it from him. Then, we just needed to find someone who had the spare time to put the game on all of the computers. All told, it took us about three months, which was sad because that was a whole semester lost that the kids could have been playing the games. “
And here is a typical comment from a rural staff member.
“It took me, like, two minutes to get approval. I called the IT guy and he came over and installed it.”
The differences sound pretty dramatic, but are they different from what one would expect by chance, given the small sample size? Since we can’t use a chi-square, we’ll use Fisher’s exact test. Here is the SAS code to do just that:
PROC FREQ DATA = install ;
TABLES rural*install / CHISQ ;
Wait a minute! Isn’t that just a PROC FREQ and a chi-square? How the heck did I get a Fisher’s exact test from that?
Well, it turns out that if you have a 2 x 2 table, SAS automatically computes the Fisher exact test, as well as several others. I told you that you could see the full results here but you didn’t look, now, did you?
You can see the full results here.
In case you still didn’t look, the probability of obtaining this table under the null hypothesis that there is no difference in administrative barriers in urban versus rural districts is .0034.
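If you want to play with this yourself, here is a minimal sketch using the cell counts reported above (8 of 9 urban schools with barriers, 1 of 8 rural schools). The data step, the variable names and the WEIGHT setup are mine, not our original program, and the EXACT statement shows how you would request Fisher's exact test explicitly for a table bigger than 2 x 2.

DATA install ;
   INPUT setting $ barrier $ count ;
   DATALINES ;
urban yes 8
urban no 1
rural yes 1
rural no 7
;
RUN ;

PROC FREQ DATA = install ;
   TABLES setting*barrier / CHISQ ;
   WEIGHT count ;   * each row is a cell count, not an individual school ;
   EXACT FISHER ;   * not needed for a 2 x 2, but required for larger tables ;
RUN ;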
If you think these data suggest it is easier to adopt educational technology in rural districts than in urban ones, well, not exactly. Rural districts have their own set of challenges, but that is a post for another day.