A picture says 1,000 words – especially if you are talking to a non-technical audience. Take the example below.

We wanted to know whether the students who played our game Fish Lake at least through the first math problem and the students who gave up at the first sight of math differed in achievement. Maybe the kids who played the games were the higher achieving students and that would explain why they did better on the post-test.

You can see from the chart below this is not the case. The distribution of pretest scores is pretty similar for the kids who quit playing (the top) and those who persisted.

Graphs produced by ODS

Beneath the graphs, you can see the box and whisker plots. The persistent group has fewer students at the very low end, and we actually know why that is: fourth- and fifth-grade students with special needs, for example those who were non-readers, could not really play the game and either quit on their own very soon or were given alternative assignments by the teacher.

The median (the line inside the box), the mean (the diamond) and 25th percentile (the bottom of the box) are all slightly higher for the persisting group, for the same reason: the students with the lowest scores quit right away.

These data tell us that the group that continued playing and the group that quit were pretty similar, except that the persisting group did not include the very lowest achieving students.

So, if academic achievement wasn’t a big factor in determining which students continued playing the games, what was?

That’s another chart for another day, but first, try to guess what it was.

———–

Would you like to play one of our games? Check them out here – all games run on Mac and Windows.

trail

What about Chromebooks?  Check out Forgotten Trail.

characters traveling on map

If I were to give one piece of advice to a would-be program evaluator, it would be to get to know your data so intimately it’s almost immoral.

Generally, program evaluation is an activity undertaken by someone with a degree of expertise in research methods and statistics (hopefully!) using data gathered and entered by people whose interest is something completely different, from providing mental health services to educating students.

Because their interest in providing data is minimal, your interest in checking that data better be maximal. Let’s head on with the data from the last post. We have now created two data sets that have the same variable formats so we are good to go with concatenating them.
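DATA answers hmph ;
SET fl_answers ansfix1 ;
/* Send developer and tester records to hmph so they can be inspected, not silently deleted */
IF username IN ("UNDEFINED","UNKNOWN") OR INDEX(username,"TEST") > 0 THEN OUTPUT hmph ;
ELSE OUTPUT answers ;
RUN ;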

PRO TIP : I learned from a wise man years ago that one should not just gleefully delete data without looking at it. That is, instead of having a dataset where you put the data you expect and deleting the rest, send the unwanted data to a data set. If it turns out to be what you expected, you can always delete the data after you look at it.

There should be very few records with a username of ‘UNDEFINED’ or ‘UNKNOWN’. The only way to get one is to be one of our developers, who enter data in forms as they create and test them rather than logging in and playing the game. The INDEX function searches the variable given as its first argument for the string given as its second and returns the starting position of the string if found, or 0 if not. So, INDEX(username, "TEST") > 0 looks for the word TEST (in upper case, since the search is case-sensitive) anywhere in the username.

Since we ask our software testers to put that word in the username they pick, this should catch all of the tester records. I looked at the hmph data set and the distribution of usernames was just as I expected, and most of the records ended up in the answers data set with valid usernames.
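A minimal sketch of that kind of look at the rejected records, assuming nothing beyond the hmph data set created above:

```sas
/* Count how many records each rejected username contributed, most frequent first */
PROC FREQ DATA = hmph ORDER = FREQ ;
  TABLES username / NOCUM ;
RUN ;
```

If anything other than UNDEFINED, UNKNOWN or a tester name shows up in that table, it is worth finding out why before deleting anything.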

Did you remember that we had concatenated the data set from the old server and the new server?

I hope you did because if you didn’t you will end up with a whole lot of the same answers in there twice.

Getting rid of the duplicates

PROC SORT DATA = answers OUT=in.all_fl_answers NODUP ;
by username date_entered ;

The difference between NODUP and NODUPKEY is relevant here. It is possible we could have a student with the same username and date_entered because different schools could have assigned students the same username. (We do our lookups by username + school). Some other student with the same username might have been entering data at the same time in a completely different part of the country. The NODUP option only removes records if every value of every variable is the same. The NODUPKEY removes them if the variables in the BY statement are duplicates.
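To see the difference on a toy example, here is a small sketch. The records are made up for illustration (the username eagle1 and the schools north and south are not from our data): two exact duplicates plus a third record with the same username and date_entered from a different school.

```sas
/* Made-up records: two exact duplicates plus a same-key record from another school */
DATA demo ;
  INFILE DATALINES DLM = ',' ;
  INPUT username :$12. school :$12. date_entered :$19. ;
  DATALINES ;
eagle1,north,2015-08-20 18:23:30
eagle1,north,2015-08-20 18:23:30
eagle1,south,2015-08-20 18:23:30
;
RUN ;

/* NODUP drops only the record that matches on every variable: 2 records remain */
PROC SORT DATA = demo OUT = demo_nodup NODUP ;
  BY username date_entered ;
RUN ;

/* NODUPKEY keeps just the first record per BY group: 1 record remains */
PROC SORT DATA = demo OUT = demo_nodupkey NODUPKEY ;
  BY username date_entered ;
RUN ;
```

One caution with NODUP: it compares each record with the one written just before it, so exact duplicates are only caught when the sort brings them next to each other.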

All righty then, we have the cleaned up answers data, now we go back and create a summary data set as explained in this post. You don’t have to do it with SAS Enterprise Guide as I did there, I just did it for the same reason I do most things, the hell of it.

MERGING THE DATA

PROC SORT DATA = in.answers_summary ;
BY username ;
RUN ;

PROC SORT DATA = in.all_fl_students ;
BY username ;
RUN ;

DATA in.answers_studunc odd ;
MERGE in.answers_summary (IN=a) in.all_fl_students (IN=b) ;
BY username ;
IF a AND b THEN OUTPUT in.answers_studunc ;
IF a AND NOT b THEN OUTPUT odd ;
RUN ;

The PROC SORT steps sort. The MERGE statement merges, and the BY statement tells it to match records on username. The IN= option creates a temporary variable with the name ‘a’ or ‘b’. You can use any name so I use short ones. If there is a record in both the student record file and the answers summary file then the record is output to a data set of students with their summary of answers. If there is an answers record with no match in the student file, it goes to the odd data set.

There should not be any cases where there are answers but no record in the student file. If you recall, that is what set me off on finding that some records were still being written to the old server.

LOOK AT YOUR LOG FILE!

There is a sad corner of statistical purgatory for people who don’t look at their log files because they don’t know what they are looking for. ‘Nuff said.

This looks exactly as it should. A consistent finding in pilot studies assessing educational games is a disconcertingly low level of persistence. So, it is expected that many players quit when they come to the first math questions. The fact that of the 875 players slightly fewer than 600 had answered any questions was somewhat expected. As expected, there were no records where there were answers but no matching record in the student file.

NOTE: There were 596 observations read from the data set IN.ANSWERS_SUMMARY.
NOTE: There were 875 observations read from the data set IN.ALL_FL_STUDENTS.
NOTE: The data set IN.ANSWERS_STUDUNC has 596 observations and 11 variables.
NOTE: The data set WORK.ODD has 0 observations and 11 variables.

So, now, after several blog posts, we have a data set ready for analysis ….. almost.


Want to see these data at the source?

Check out our game, playable on Mac or Windows. Download Spirit Lake or Fish Lake  to play, or for Forgotten Trail, just click on the link provided, no download required.

Mom and kid

You can also donate a copy of the game to a school or give as a gift.

Further Reading

For more on SAS character functions check out Ron Cody’s paper An Introduction to Character Functions, an oldie but goodie from WUSS back in 2003.

Or you could read my last post!

This paper by Britta Kelsey from SAS Users Group International in 2005 will tell you more than you want to know about the NODUP and NODUPKEY.

At the Western Users of SAS Software conference (yes, they DO know that is WUSS), I’ll be speaking about using SAS for evaluation.

“If the results bear any relationship at all to reality, it is indeed a fortunate coincidence.”

I first read that in a review of research on expectancy effects, but I think it is true of all types of research.

This is me on my soapbox

Here is the interesting thing about evaluation – you never know what kind of data you are going to get.  For example, in my last post I had created a data set that was a summary of the answers players had given in an educational game, with a variable for the mean percentage correct and another variable for number of questions answered.

When I merged this with the user data set so I could test for relationships between characteristics of these individuals (age, grade, gender, achievement scores) and perseverance, I found a very odd thing. A substantial minority were not matched in the users file. This made no sense because you have to log in with your username and password to play the game.

The reason I think that results are often far from reality is just this sort of thing – people don’t scrutinize their data well enough to realize when something is wrong, so they just merrily go ahead analyzing data that has big problems.

In a sense, this step in the data analysis revealed a good problem for us. We actually had more users than we thought. Several months ago, we had updated our games. We had also switched servers for the games. Not every teacher installed the new software so it turned out that some of the records were being written to our old server.

Here is what I needed to do to fix this:

  1. Download the files from our servers. I exported these as .xls files.
  2. Read the files into SAS.
  3. Fix the variables so that the format was identical for both files.
  4. Concatenate the files of the same type, e.g., the student file with the student file from the other server.
  5. Remove the duplicates.
  6. Merge the files with different data, e.g., answers file with student file.

 

I did this in a few easy steps using SAS.

  1. USE PROC IMPORT to read in the files.

Now, you can use the IMPORT DATA option from the file menu but that gets a bit tedious if you have a dozen files to import.

TIP: If you are not familiar with the IMPORT procedure, do it with the menus once and save the code. Then you can just change the data set names and copy and paste this a dozen times. You could also turn it into a macro if you are feeling ambitious, but let’s assume you are not. The code looks like this:

PROC IMPORT OUT= work.answers  DATAFILE= "C:\Users\Spirit Lake\WUSS16\fish_data\answers.xls"
DBMS=EXCEL REPLACE;
RANGE="answers$";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;

This assumes that the first row of your Excel file has the names of the columns (GETNAMES=YES). All you need to do for the next 11 data sets is to change the values in lower case: the name you want for your SAS data set goes after OUT=, the Excel file after DATAFILE= and the sheet in that file that has your data after RANGE=.

Notice there is a $ at the end of that sheet name.

Done. That’s it. Copy and paste however many times you want and change those three values for output dataset name, location of the input data and the sheet name.
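If you are feeling ambitious after all, here is a rough sketch of the macro version. The macro name import_xls and its three parameters are made up for illustration, and it keeps only the statements that vary or matter most; add the other options back if your files need them.

```sas
/* Hypothetical macro: one call per Excel file instead of a pasted PROC IMPORT */
%MACRO import_xls(outname, xlsfile, sheet) ;
  PROC IMPORT OUT = work.&outname
    DATAFILE = "&xlsfile"
    DBMS = EXCEL REPLACE ;
    /* The period ends the macro variable name so the $ lands right after the sheet name */
    RANGE = "&sheet.$" ;
    GETNAMES = YES ;
  RUN ;
%MEND import_xls ;

/* Same file and sheet as the example above */
%import_xls(answers, C:\Users\Spirit Lake\WUSS16\fish_data\answers.xls, answers)
```

Each additional file is then one more line with a different output name, path and sheet name.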

2. Fix the variables so that the format is identical for both files

A. How do you know if the variables are the same format for each file?

PROC CONTENTS DATA = answers ;

contents of data set

This LOOKS good, right?

B. Look at a few records from each file.

OPTIONS OBS= 3 ;
PROC PRINT DATA = fl_answers_new ;
VAR  date_entered ;
RUN ;
PROC PRINT DATA = fl_answers_old ;
VAR  date_entered ;
RUN ;

OPTIONS OBS = MAX ;

PAY ATTENTION HERE !!! OPTIONS OBS = 3 limits every step that follows to the first three records. That’s a good idea here because you don’t need to print out all 7,000+ records. However, if you forget to change it back to OBS = MAX then all of your procedures after that will only use the first 3 records, which is probably not what you want.
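If forgetting the reset is a worry, one alternative is the OBS= data set option, which limits a single step and leaves the global setting alone. A minimal sketch using the same two data sets:

```sas
/* OBS= as a data set option applies only to the step it appears in */
PROC PRINT DATA = fl_answers_new (OBS = 3) ;
  VAR date_entered ;
RUN ;

PROC PRINT DATA = fl_answers_old (OBS = 3) ;
  VAR date_entered ;
RUN ;
```

There is nothing to change back afterwards, because the system option was never touched.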

So, although my PROC CONTENTS showed the files were the same format in terms of variable type and length, here was a weird thing: since the servers were in different time zones, the time was recorded as 5 hours different, so

2015-08-20 13:23:30

Became

2015-08-20 18:23:30

Since this was recorded as a character variable, not a date (see the output for the contents procedure above), I couldn’t just subtract 5 from the hour.

Because the value was not the same, if I sorted by username and date_entered , each one of these that was moved over from the old server would be included in the data set twice, because SAS would not recognize these were the same record.

So, what did I do?

I’m so glad you asked that question.

I read in the data to a new data set and the third statement gives a length of 19 to a new character variable.

Next, I create a variable that holds the two characters of the date_entered variable starting at the 12th position (that is, the value of the hour).

Now, I add 5 to the hour value. Because I am adding a number to it , this will be created as a numeric value. Even though datefix1 is a character variable  – since it was created using a character function, SUBSTR, when I add a number to it, SAS will try to make the resulting value a number.

Finally, I’m putting the value of datefixed to be the first 11 characters of the original date value , the part before the hour. I’m using the TRIM function to get rid of trailing blanks. I’m concatenating this value (that’s what the || does) with  exactly one blank space. Next, I am concatenating this with the new hour value. First, though, I am left aligning that number and trimming any blanks. Finally, I’m concatenating the last 6 characters of the original date-time value. If I didn’t do this trimming and left alignment, I would end up with a whole bunch of extra spaces and it still wouldn’t match.

I still need to get this to be the value of the date_entered variable so it matches the date_entered value in the other data set.

I’m going to DROP the date_entered variable, and also the datefix1 and datefixn variables since I don’t need them any more.

I use the RENAME statement to rename datefixed to date_entered and I’m ready to go ahead with combining my datasets.

DATA ansfix1 ;
SET flo_answers ;
LENGTH datefixed $19 ;
datefix1 = SUBSTR(date_entered,12,2);
datefixn = datefix1 +5 ;
datefixed = TRIM(SUBSTR(date_entered,1,11)) || " " || TRIM(LEFT(datefixn)) || SUBSTR(date_entered,14,6) ;
DROP date_entered datefix1 datefixn ;
RENAME datefixed = date_entered ;
RUN ;
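The string approach is fine as long as the shifted hour lands between 10 and 23. A record entered at 03:15:00 would come out with an hour of 8 rather than 08, and one entered at 21:00:00 would come out with an hour of 26. A hedged alternative, not the code used for this project, is to convert the text to a real SAS datetime, add five hours' worth of seconds, and write it back out. The data set name ansfix1_alt and the variables d, t and dt are made up for illustration.

```sas
DATA ansfix1_alt ;
  SET flo_answers ;
  LENGTH datefixed $19 ;
  /* Split the text into its date and time parts and read each as a number */
  d = INPUT(SUBSTR(date_entered, 1, 10), YYMMDD10.) ;
  t = INPUT(SUBSTR(date_entered, 12, 8), TIME8.) ;
  /* Build a datetime (in seconds) and shift it forward five hours */
  dt = DHMS(d, 0, 0, t) + 5 * 3600 ;
  /* Write it back in the same yyyy-mm-dd hh:mm:ss layout, leading zeros included */
  datefixed = PUT(DATEPART(dt), YYMMDD10.) || " " || PUT(TIMEPART(dt), TOD8.) ;
  DROP date_entered d t dt ;
  RENAME datefixed = date_entered ;
RUN ;
```

With this version 2015-08-20 13:23:30 still becomes 2015-08-20 18:23:30, and a late-evening record rolls over to the next day instead of producing an impossible hour.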

 


They’re fun and will make you smarter – just like this blog!

Check out the games that provided these data!

Fish lake splash screen

Buy one for your family or donate to a child or school.

 

 

It’s almost 6 am here on the east coast, and after flying all day during which I worked on a final report for a grant to develop our latest educational game and make bug fixes on same, I landed and wrote a report for a client, because that pays the bills.

In the meantime, over on our 7 Generation Games blog, Maria wrote a post where she called bullshit on venture capitalists who claim not to be interested in educational games because they aren’t a billion dollar business but then fund other enterprises that no way in hell are a billion dollar business.

She seems to have touched a nerve because now we are getting comments from people saying no one wants to fund you because your games are bad and you are mean.

That is part of the start-up life, really. You have this idea for a business that you think is wonderful, it is your baby. Like a baby, you get too little sleep, because you are working all of the time, but you think it’s worth it.

kid acting ugly

And every day, you run into people who are essentially telling you that your baby is ugly.

People like to believe they are reasonable and give reasons for their belief in your baby’s ugliness. I think you should consider those explanations because they could be right. Maybe your baby IS ugly.

For example, someone said, “Maybe venture capitalists don’t want to invest in your games because they aren’t as good as the PS4 , Wii and Xbox games and kids don’t want to play them.”

I answered that he was correct: our games, which cost schools an average of $2-$3 per student and cost individuals $9.99, are NOT as good as games that cost $40-$60. If you have 200 kids in your school playing our games, you probably can’t afford to pay us $10,000. I know this is true. Could I be wrong about the price of the games to which he was comparing ours? I went and checked on Amazon, which is probably one of the cheapest places to buy games, and I was correct.

I have a Prius. My daughter has a BMW that costs four times as much. Her car looks much cooler than mine and goes much faster. Does that mean Prius sucks and no one should invest in them? Obviously, no.

Actually, we have thousands of kids playing our games and they sincerely seem to like them, and upper elementary and middle school kids are usually pretty honest about what they think sucks.

People sometimes point out that our graphics could be cooler or our game world could be larger or other really, really great ideas that I completely agree with. The fact is, though, that we want our games to be an option for schools, parents across the income spectrum, after-school programs and even nursing homes, in some cases. (There is a whole group of “silver gamers”.) These markets often do NOT have the type of hardware that hard-core gamers do. In fact, the minimal hardware requirement we aim to support is Chromebooks and we are building web-based versions that will run in areas that don’t have high-speed Internet access.

Did you ever have that experience where you call tech support for a problem and the person on the other end says,

Well, it works on my computer.

What good does that do me?

So, we are trying to make games that work on a lot of people’s computers. Believe me, I do get it. I play games on my computer and I have a really nice desktop in an area with high-speed Internet and I would LOVE to do some way cooler things. We made the decision to try to provide games people could play even if the only computer they can access is some piece of junk computer that most of us would throw out. Don’t get me started on the need to upgrade our schools and libraries, that is a rant for another day.

A teacher commented the other day that while she really liked the educational quality of our games what she really wanted for her classroom were Xbox quality games for free . I would like a free computer, too, but those bastards at Apple keep charging me when I want a new one. I guess that is a rant for another day, too.

My whole point is that running a start-up is a lot of hard work and a lot of rejection. Almost like being an aspiring actor or author or raising a teenager. You have to consider the criticisms without being discouraged. Maybe they are correct that Shakespeare wouldn’t have said,

Like, you know, to be or not.

On the other hand, I remember that publishers rejected Harry Potter, and just about every successful company over the last few decades has had more detractors than supporters when it got started. And let it be noted I was right about that jerk I told you not to date, too.

In the meantime, check out our games, they really are fun and DO make you smarter!

Fish lake splash screen

 

 

 

Who was it that said asking a statistician about sample size is like asking a jeweler about price? If you have to ask, you can’t afford it.

We all know that the validity of a chi-square test is questionable if the expected count in any cell is less than five. Well, what do you do when, as happened to me recently, ALL of your cells have an expected count less than five?

baby mashing cake

The standard answer might be to collect more data, and we are in the process of that, but having the patience of the average toddler, I wanted that data analyzed NOW because it was very interesting.

It was our hypothesis that rural schools were less likely to face obstacles in installing software than urban schools, due to the extra layers of administrative approval required in the latter (some might call it bureaucracy). On the other hand, we could be wrong (horrors!). Maybe rural schools had more problems because they had more difficulty finding qualified personnel to fill information technology positions. We had data from 17 schools, 9 from urban school districts and 8 from rural districts. To participate in our study, schools had to have a contact person who was willing to attempt to get the software installed on the school computers. This was not a survey asking them whether it would be difficult or how long it would take. We actually wanted them to get software ( 7 Generation Games ) not currently on their school computers installed. To make sure that cost was not an issue, all 17 schools received donated licenses.

You can see the full results here.

In short, 8 of the 9 urban schools had barriers to installation of the games which delayed their use in the classroom by a median of three months. I say median instead of mean because four of the schools STILL have not been able to get the games installed. The director of one after-school program that wanted to use the games decided it was easier for his program to go out and buy their own computers than to get through all of the layers of district approval to use the school computer labs, so that is what they did.

For the rural schools, 7 out of 8 reported no policy or administrative barriers to installation. The median length of time from when they received the software to installation was two weeks. In two of the schools, the software was installed the day it was received.

Here is a typical comment from an urban school staff member,

“I needed to get it approved by the math coach, and she was all on board. Then I got it approved at the building level.  We had new administration this year so it took them a few weeks to get around to it, and then they were all for it. Then it got sent to the district level. Since your games had already been approved by the district, that was just a rubber stamp but it took a few weeks until it got back to us, then we had all of the approvals so we needed to get it installed but the person who had the administrator password had been laid off. Fortunately, I had his phone number and I got it from him. Then, we just needed to find someone who had the spare time to put the game on all of the computers. All told, it took us about three months, which was sad because that was a whole semester lost that the kids could have been playing the games. “

And here is a typical comment from a rural staff member.

“It took me, like, two minutes to get approval. I called the IT guy and he came over and installed it.”

The differences sound pretty dramatic, but are they different from what one would expect by chance, given the small sample size? Since we can’t use a chi-square, we’ll use Fisher’s exact test. Here is the SAS code to do just that:

PROC FREQ DATA = install ;
TABLES rural*install / CHISQ ;

Wait a minute! Isn’t that just a PROC FREQ and a chi-square? How the heck did I get a Fisher’s exact test from that?

Well, it turns out that if you have a 2 x 2 table and ask for CHISQ, SAS automatically computes Fisher’s exact test, as well as several other statistics. I told you that you could see the full results here but you didn’t look, now, did you?

You can see the full results here.

In case you still didn’t look, the two-sided probability from Fisher’s exact test, that is, the probability of obtaining a table at least this extreme under the null hypothesis that there is no difference in administrative barriers in urban versus rural districts, is .0034.
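For anyone who wants to reproduce that number without the original file, the counts reported above are enough: 8 of 9 urban schools had barriers, and 1 of 8 rural schools did. This is a sketch, not the original program. The data set name install_counts, the count variable and the 0/1 coding are assumptions made for illustration, with install = 1 taken to mean the software went in without barriers.

```sas
/* Cell counts taken from the post; the 0/1 coding is an assumption */
DATA install_counts ;
  INPUT rural install count ;
  DATALINES ;
0 0 8
0 1 1
1 0 1
1 1 7
;
RUN ;

/* WEIGHT tells PROC FREQ that each row stands for count schools */
PROC FREQ DATA = install_counts ;
  WEIGHT count ;
  TABLES rural*install / CHISQ ;
RUN ;
```

The two-sided Fisher probability from this table should come out at about .0034, matching the figure quoted above. Working it by hand gives the same answer: with all margins fixed there are C(17,9) = 24,310 equally likely ways to pick which 9 schools have barriers, and the tables no more likely than the observed one account for 72 + 1 + 9 = 82 of them, so 82 / 24,310 is roughly .0034.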

If you think these data suggest it is easier to adopt educational technology in rural districts than in urban ones, well, not exactly. Rural districts have their own set of challenges, but that is a post for another day.

 

When I first taught multivariate statistics, I was nervous. The material is more difficult than Statistics 101 so I assumed teaching the course would be more difficult as well.  Over 25 years of teaching, I’ve found the opposite. The more advanced you get in a field, the easier the courses are to teach. You might expect it is because you have more motivated or capable students, and there is some of that effect. A bigger effect, I’ve found, is because once students have the basic concepts you have something to generalize from. Also, you have a common vocabulary. It’s much easier to explain that multiple regression is just simple regression with multiple predictor variables than to explain what regression is to someone who has never been exposed to the concepts of correlation and regression.

I’m in the middle of making a game to teach statistics to middle school students and was thinking how to explain to them why what they are learning is important and how to explain statistics to someone who has never been exposed to the idea. On top of this challenge is the fact that I know many of the students playing our games will be limited in English proficiency, either because it is their second language or simply because they have a limited vocabulary.

Why learn statistics? Did you even know that the type of mathematics you are learning at the moment has its own name? If you did, pat yourself on the back for being smart. Go ahead, I’ll wait.

Statistics is the practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of making inferences.

We’re going to break down that definition.

Collecting numerical data.

Collecting: bringing or gathering together.  Notice people don’t have a collection of one thing!

Numerical: Numbers that have meaning as a measurement. The fact that 1 bass can feed 2 people is numerical data.

Data : Facts or figures from which conclusions can be drawn

Analyzing: looking at in detail, examining the basic parts – like looking at each category of animal and how many people it can feed

Let’s take the example of the Mayans hunting, using  this graph that shows how many people you can feed with each type of animal.

mayan_hunting_graph

Based on the data that you have, you know you can feed more people from a peccary than a bass, so you could draw the conclusion that an area with a lot of peccaries would be a better place to be looking for food than one with a lot of bass.

This is what a peccary looks like, in case you were wondering.

peccary looks like a wild pig

Here is what is important to know about the science of collecting and analyzing numerical data – you are making decisions based on facts.

Why on earth would you hunt peccary? They can be dangerous if threatened, and trying to kill one and eat it is certainly threatening it.

On the other hand, no one ever got injured by a bass, as far as I know.

bass

As you can see from the graph above, you can feed 9 times as many people from a peccary, so maybe it is worth the risk.

 

You’re just learning to be a baby statistician at this point, working with really small quantities of data.

The same methods using bar graphs, computing the mean and analyzing variability are used everywhere with huge amounts of data. The military uses statistics, for everything from figuring out how many tanks they need to order to deciding when to move soldiers from one part of the country to another. One of the first uses of statistics was for agriculture, to decide what was working to raise more corn and what wasn’t. You’ll get to see for yourself when you get to the floating gardens of the Aztecs.

—–

Here’s my question to you, oh reader people: what resources have you found useful for teaching statistics? I mean, resources you have really watched or used and thought, “Hey, this would be great for teaching”?

There is a lot of mediocre, boring stuff on the interwebz and if any of you could point me to what you think rises above the rest, I’d be super appreciative.

—–

grandfather

If you want to check out our previous games, which teach multiplication and division (Spirit Lake) or fractions (Fish Lake), you can see them here. If you buy a game this month you can get our newest game, Forgotten Trail (fractions and statistics) as a free bonus.

 

 

Years ago, a friend of mine was in college and had an old, beat up car that leaked oil on to the street where it was parked, which, for some reason annoyed her elderly neighbor. When we returned from a trip overseas competing for the U.S., there was a notice on her car – the neighbor had reported the car as abandoned and we got home just in time to stop the city from towing it away. As a joke, the coach got her a bumper sticker that read, “This is not an abandoned vehicle.”

old car

It’s almost two weeks since I last posted. Contrary to appearances, this is not an abandoned blog!

I just this minute – hurray, tap-dancing – submitted a grant I’ve been working on for the past two weeks.

Girl on TV playing game

While writing the grant this week, I’ve been in North Dakota, first giving a presentation on Using Native American Culture to Increase Math Performance. You can see a bit of it that was shown on the local TV station here.

meeting students

After meeting lots of students at Minot State, we headed over to the Minot Job Corps and I met with students and faculty, talking about our games, starting a company and life in general.

On  to New Town, on the Fort Berthold Reservation where I met with the staff and students from the Boys and Girls Club, again, gave demonstrations of our games, and threw a judo demonstration in along with it.

armbarring at Boys and Girls club

Along there somewhere, I finished the final report on our Dakota Math project that once again found significant improvement in performance of students who played our games, hired two more employees, signed another consulting contract, had way too many meetings and squashed a few bugs in the games.

Tomorrow, I head home to Santa Monica, for two weeks, until I head out to Fort Totten, ND. In the meantime, I’m back to blogging. Did you miss me?

—-

If you want to see what I’m working on these days, you can check it out here:

Sam is running

If you buy a game this week, we’ll throw in a beta release of Forgotten Trail for free!

 

 

The results are in! The chart below gladdens my little heart, somewhat.

Graph showing significant improvement from pretest to posttest

One thing to note is the fact that the 95% confidence interval is comfortably above zero. Another point is that it looks like a pretty normal distribution.

What is it? It is the difference between pretest and post-test scores for 71 students at two small, rural schools who played Spirit Lake: The Game.
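A chart like this can come out of a paired t-test. The sketch below is an assumption about how one might produce it, not a record of what was run: the data set name spirit_lake and the variables pretest and posttest are hypothetical, with one row per student who has both scores.

```sas
/* Paired t-test on post-test minus pretest (hypothetical data set and variable names) */
ODS GRAPHICS ON ;
PROC TTEST DATA = spirit_lake ALPHA = 0.05 ;
  PAIRED posttest*pretest ;
RUN ;
```

The output reports the mean difference with its 95% confidence limits, which is the interval to check against zero.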

I selected these schools to analyze first, and held my breath. These were the schools we had worked with the most closely, who had implemented the games as we had recommended (play twice a week for 25-30 minutes). If it didn’t work here, it probably wasn’t going to work.

Two years ago, with a sample of 39 students from 4th and 5th grade from one school, we found a significant difference compared to the control group.

COULD WE DO IT AGAIN?

You probably don’t feel nervous reading that statement because you have not spent the last three years of your life developing games that you hope will improve children’s performance in math.

The answer, at least for the first group of data we have analyzed is – YES!

Scores improved 20%, in relative terms, from pre-test to post-test. This was not as impressive as the improvement of 30% we had found in the first year, but this group also began with a substantially higher score. Two years ago, the average student scored 39% on the pre-test. This year, for 71 students with complete data, the average pre-test score was 47.9% and the post-test mean was 57.4%, a gain of 9.5 percentage points. I started this post saying my little heart was gladdened “somewhat” because I still want to see the students improve more.

There is a lot more analysis to do. For a start, there is analysis of data from schools who were not part of our study but who used the pretest and post-test – with them, we can’t really tell how the game was implemented but at least we can get psychometric data on the tests.

We have data on persistence – which we might be able to correlate with post-test data, but I doubt it, since I suspect students who didn’t finish the game probably didn’t take the post-test.

We have data on Fish Lake, which also looks promising.

Overall, it’s just a great day to be a statistician at 7 Generation Games.

buffalo in the winter

Here is my baby, Spirit Lake. It can be yours for ten bucks. If you are rocking awesome at multiplication and division, including with word problems, but you’d like to help out a kid or a whole classroom, you can donate a copy.

Some problems that seem really complex are quite simple when you look at them in the right way. Take this one, for example:

My hypothesis is that a major problem in math achievement is persistence. Students just give up at the first sign of trouble.

I have three different data sets with student data from the Spirit Lake game. Many of the students in the student table are in the control group, so they have no data on game play. There is a table of answers to the math challenges and another table with answers to quizzes, which students took only if they missed a math challenge. When students miss a math challenge in the game, depending on which educational resource they choose, they may take one of two or three different quizzes to get back into the game. Also, some of the quiz records were not from quizzes actually in the game but from supplemental activities we provided.

So, how do I identify where in the process students drop out and present it in a simple graphic to discuss with schools? Just to complicate matters, the username had different lengths in the different datasets, and the timestamp variable had different names.

It turns out, the problem was not that difficult.

  1. Merge the student table with the answers (math challenges) and only include those students with at least one answer.
  2. Merge the student table with the quizzes and only include those students with at least one quiz.
  3. Concatenate the data sets from steps 1 & 2
  4. Create a new userid variable and set it equal to the username
  5. Create a new “entered” variable and set it equal to whichever of the datetime fields exists on that record
  6. Delete the quizzes not included in the game.
  7. Sort the dataset by userid and the date and time entered.
  8. Keep the last record for each userid. Now you have their last date of activity.
  9. If there is a value for the math challenge field then that is the name of the last activity, otherwise the quiz name is the name for the last activity.
  10. Use a PROC FORMAT to assign each activity a value equal to the step in the game.
  11. Do a PROC FREQ using that format and the order = FORMATTED option.
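Steps 1 through 3 can be sketched outside SAS as an inner join followed by a concatenation. The JavaScript below is only an illustration of the idea: the rows are made up, the field names `school`, `inputform` and `quiztype` are placeholders for whatever the real tables contain, and `joinWithStudents` is a hypothetical helper. Trimming the username stands in for reconciling the different username lengths.

```javascript
// Made-up rows; the field names are illustrative, not the real table layout.
const students = [
  { username: "amy  ", school: "A" }, // padded, like a fixed-length field
  { username: "bob", school: "A" },   // control group: no game play
  { username: "cal", school: "B" },
];
const answers = [
  { username: "amy", inputform: "findcepansi" },
  { username: "amy", inputform: "x2x9" },
  { username: "cal", inputform: "findcepansi" },
];
const quizzes = [{ username: "cal", quiztype: "multiplyby5" }];

// Inner join: keep only activity rows whose user is in the student table,
// and attach the student fields to each activity row.
function joinWithStudents(studentRows, activityRows) {
  const byUser = new Map(studentRows.map((s) => [s.username.trim(), s]));
  return activityRows
    .filter((row) => byUser.has(row.username.trim()))
    .map((row) => ({
      ...byUser.get(row.username.trim()),
      ...row,
      username: row.username.trim(),
    }));
}

// Steps 1 and 2 (the two joins), then step 3 (concatenate).
const combined = [
  ...joinWithStudents(students, answers),
  ...joinWithStudents(students, quizzes),
];
console.log(combined.length); // 4
```

The control-group student has no activity rows, so he never appears in the combined data.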

Once I had the frequencies, I just put them into a table in a Word document and shaded the columns to match the percentage. There may be a way in SAS/Graph or something else to do this automatically, but honestly, the table took me two minutes once I had the data.
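For anyone who would rather not shade cells by hand, one rough way to get a similar picture is to print a text bar for each step. This is a sketch in JavaScript, not SAS/Graph. The counts are made up for illustration and are not the frequencies from the study, and `barRows` is a hypothetical helper.

```javascript
// Made-up counts for illustration; these are not the study's frequencies.
const lastStepCounts = [
  { step: "01", count: 10 },
  { step: "02", count: 25 },
  { step: "03", count: 15 },
];

// Build one text row per step: a bar filled in proportion to the
// percentage of students whose last activity was at that step.
function barRows(rows, width = 20) {
  const total = rows.reduce((sum, r) => sum + r.count, 0);
  return rows.map((r) => {
    const percent = (r.count / total) * 100;
    const filled = Math.round((percent / 100) * width);
    const bar = "#".repeat(filled).padEnd(width, ".");
    return `${r.step} ${bar} ${percent.toFixed(1)}%`;
  });
}

console.log(barRows(lastStepCounts).join("\n"));
```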

graph showing students dropping out at each step

I think it illustrates my points pretty clearly, which are:

  • A sizable number of students drop out after the second problem.
  • 25% of the students drop out after the first difficulty they have (missing the second problem).
  • Only a minority of students persist all the way to the end, less than 25% of the total sample.

This isn’t based on a tiny sample, either. The data above represent a sample of 397 students.

In case you would like to see it, the code for steps 3-11 is below. Particularly useful is the PROC FORMAT. Notice that you can have multiple values have the same format, which was important here because players can take multiple paths that are still the same step in the sequence.

data persist ;
attrib userid length= $49 ;
set mydata2.sl_answers mydata2.sl_quizzes ;
*** EACH RECORD HAS ONLY ONE OF THE TWO DATETIME FIELDS, SO MAX RETURNS WHICHEVER EXISTS ;
entered = max(date_answered_dt,date_taken_dt) ;
**** DELETES QUIZZES IN EXTRA AND SUMMER SITE, NOT IN MAIN GAME ;
if quiztype in ("problemsolve","divide1long","multiplyby23") then delete ;
userid = new_username ;
format entered datetime20. ;
run ;

proc sort data=persist ;
by quiztype ;
run ;

proc sort data=persist ;
by userid entered ;
run ;

*** KEEPS THE LAST RECORD FOR EACH USER AND NAMES THE LAST ACTIVITY ;
data retention ;
set persist ;
by userid ;
if last.userid ;
attrib last_activity length= $14 ;
if inputform ne "" then last_activity = inputform ;
else last_activity = quiztype ;
run ;

*** UNFORMATTED FREQUENCIES, TO SEE THE RAW ACTIVITY NAMES ;
proc freq data= retention ;
tables last_activity ;
run ;

*** SEVERAL ACTIVITIES CAN SHARE ONE STEP NUMBER ;
proc format ;
value
$activity
"findcepansi" = "01"
"x2x9" = "02"
"math2x" = "02"
"math2_2" = "02"
"wolves1a" = "02"
"multiplyby5" = "03"
"multiplyby4" = "03"
"multiplyby3" = "04"
"wolves1b" = "05"
/* …. AND SO ON …. */

"horseform2" = "21"
;
run ;

ods rtf file = "C:\Users\Spirit Lake\phaseII\pipeline.rtf" ;
proc freq data= retention order=formatted ;
tables last_activity ;
format last_activity $activity. ;
run ;
ods rtf close ;
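The many-to-one mapping in the PROC FORMAT can also be pictured as a plain lookup table followed by a count. Here is a minimal JavaScript sketch of that idea. It uses only the activity names listed before the elided part of the format, and the list of last activities is made up for illustration. `countByStep` is a hypothetical helper, and the "unmapped" label is an assumption for anything not in the table.

```javascript
// Activity name -> step number, copied from the visible part of the format.
// Several activities share a step because players can take different paths.
const stepByActivity = new Map([
  ["findcepansi", "01"],
  ["x2x9", "02"],
  ["math2x", "02"],
  ["math2_2", "02"],
  ["wolves1a", "02"],
  ["multiplyby5", "03"],
  ["multiplyby4", "03"],
  ["multiplyby3", "04"],
  ["wolves1b", "05"],
]);

// Count students by the step of their last activity, ordered by step,
// which is roughly what PROC FREQ with ORDER=FORMATTED reports.
function countByStep(lastActivities) {
  const counts = new Map();
  for (const activity of lastActivities) {
    const step = stepByActivity.get(activity) || "unmapped";
    counts.set(step, (counts.get(step) || 0) + 1);
  }
  return [...counts.entries()].sort(([a], [b]) => a.localeCompare(b));
}

// Made-up last activities for five students.
const sample = ["x2x9", "math2x", "findcepansi", "wolves1a", "multiplyby3"];
console.log(countByStep(sample));
// [ [ '01', 1 ], [ '02', 3 ], [ '04', 1 ] ]
```

Three different activity names collapse into step 02, so they are counted together.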

—- Feel smarter after reading this blog?
Fish Lake artwork
Want to feel even smarter? Download and play our games!  You can run around in our virtual world while reviewing your basic math skills. If you are too busy (seriously?) you can still give a game as a gift or donate a game to a classroom or school

Many years ago, I was walking through the exhibits at the county fair with my late husband (he was alive then, that’s why he was able to walk with me) and I lamented,

Look at those quilts. My grandmother makes quilts. Look at those crocheted tablecloths. My other grandmother crochets. Look at me – what do I make?

My wonderful husband turned to me and said in his good-old-boy, country accent,

Money. That’s what you make that your grandmothers didn’t make. You make money, darlin’.

Everyone is posting pictures of the cute Halloween costumes their mom made for them or that they made for their children. I never made a Halloween costume in my life, but here is a copy of some code I finished last weekend that makes a graph with different types of pastries. Another function I wrote (not shown here) changes it from Spanish to English. If you get it correct, it takes you to another problem that does bar graphs with actual bars.

I didn’t make a costume but I did make money from working on this project which The Spoiled One can use to buy whatever costume she likes.

graph with pastries

<script type="text/javascript">
    $( window ).load(function() {
        var ncup = 0;
        var nd = 0 ;
        var ncake = 0 ;
        var thisone = 0;
        var sesstries = 0 ;
        document.getElementById("arrow").addEventListener("click", function(){
           if (ncake === 4 && nd === 5 && ncup === 7) {
               window.location.href="problem5_go_to.html" ;
           }
            else {goFail();}
        });
        document.getElementById("button1").addEventListener("click", function(){
            location.reload();
        });
        document.getElementById("button2").addEventListener("click", function(){
           window.location.href ="../learn_more4.html";
        });
        $(function () {
            $(".abox").draggable({
                helper: "clone",
                start: function (event, ui) {
                    thisone = 1;
                },
                revert: function (event, ui) {
                    $(this).data("uiDraggable").originalPosition = {
                        top: 0,
                        left: 0
                    };
                    return !event;
                }
            });
            $(".bbox").draggable({
                helper: "clone",
                start: function (event, ui) {
                    thisone = 100;
                },
                revert: function (event, ui) {
                    $(this).data("uiDraggable").originalPosition = {
                        top: 0,
                        left: 0
                    };
                    return !event;
                }
            });
            $(".cbox").draggable({
                helper: "clone",
                start: function (event, ui) {
                    thisone = 1000;
                },
                revert: function (event, ui) {
                    $(this).data("uiDraggable").originalPosition = {
                        top: 0,
                        left: 0
                    };
                    return !event;
                }
            });
            $(".a").droppable({

                drop: function (event, ui) {
                    if (thisone != 1) {goFail();}
                    if (thisone == 1) {
                        nd++;
                        if (nd > 5) {
                           goFail();
                        }

                       // $(this).draggable('disable');
                        $(this).append($(ui.helper).html());
                    }
                    else {

                        $(".abox").draggable('disable');
                       $(".bbox").draggable('disable');
                    }
                }
            });

            $(".b").droppable({

                drop: function (event, ui) {

                    if (thisone == 100) {
                       ncup++;
                        if (ncup > 7) {
                            goFail();
                        }
                        // $(this).draggable('disable');
                        $(this).append($(ui.helper).html());
                    }
                    else {
                        $(".abox").draggable('disable');
                        $(".bbox").draggable('disable');
                    }
                }
            });
            $(".c").droppable({

                drop: function (event, ui) {

                    if (thisone == 1000) {
                        ncake++;
                        if (ncake > 4) {
                            goFail();
                        }
                        // $(this).draggable('disable');
                        $(this).append($(ui.helper).html());
                    }
                    else {
                        $(".cbox").draggable('disable');
                    }
                }
            });
        });

        function goFail(){
            var prev = sessionStorage.getItem("caketries");
            $(".missed").hide();

            // No miss recorded yet (or the flag was reset): record it and show #wrong1.
            if (prev != 1)
            {
                sessionStorage.setItem("caketries", "1") ;
                $("#wrong1").show();
                prev = sessionStorage.getItem("caketries");
            }
            // A miss was already recorded: reset the flag and show #wrong2.
            else {
                sessionStorage.setItem("caketries", "0") ;
                $("#wd2").hide();
                $("#container").addClass("green");
                $("#wrong2").show();
            }
        }
    }) ;

</script>
