It ought to be easier than this, and perhaps I could have found an easier way if I had more patience than the average ant or very young infant. However, I don't.
Here was the problem. I wanted control charts for two different variables: satisfaction with care surveyed at discharge, and satisfaction with care 3 months after discharge.
The data were given in the form of the number of patients out of a sample of 500 who reported being unsatisfied. PROC SHEWHART does not have a WEIGHT statement. You could try using the WEIGHT statement in PROC MEANS, but that won't work. It will give you the correct means if you have the number unsatisfied (undisc = 1) and the number satisfied (undisc = 0) out of 500, but the incorrect standard deviation, because as far as SAS is concerned the N is 2.
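For instance, here is a minimal sketch of that dead end (the data set and variable names are hypothetical): the WEIGHT statement reproduces the weighted mean, but the standard deviation is computed from only the two records SAS actually sees.

proc means data=satisfaction_counts mean std n;
class month ;
var undisc ;    /* 1 = unsatisfied, 0 = satisfied */
weight count ;  /* count = number of patients giving that answer */
run;
/* The means come out right, but N = 2 per month, so the standard deviations are wrong */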
So, here is what I did. It was not elegant, but it worked.
  1. I created two data sets, named q4disc and q4disc3, keeping the month and a 0/1 indicator of dissatisfaction at discharge and 3 months after discharge, respectively.
  2. I read in the 3 values I was given: month of sample, number unsatisfied at discharge and number unsatisfied 3 months later.
  3. Now, I am going to create a data set of raw data based on the numbers I have. First, in a DO loop that runs once for each person who said they were unsatisfied, I set the value of undisc (unsatisfied at discharge) to 1 and output a record to the q4disc data set.
  4. Next, in a DO loop that runs (500 minus the number dissatisfied) times, I set undisc = 0 and output a record to the same data set.
  5. Now, repeat steps 3 & 4 to create a data set of the values of people unhappy 3 months after discharge.
  6. Following the programming statements are the original data.

So, now, I have created two data sets of 6,000 records each, with two variables apiece. It doesn't seem a very efficient way to do it, but now I have the data I need; it didn't take long and it doesn't take up much space.

data q4disc (keep = undisc month) q4disc3 (keep = undisc3 month) ;
input month $ discunwt disc3unwt ;
/* One record per person unsatisfied at discharge (1 = unsatisfied) */
do i = 1 to discunwt ;
    undisc = 1 ;
    output q4disc ;
end ;
/* One record per person satisfied at discharge */
do j = 1 to (500 - discunwt) ;
    undisc = 0 ;
    output q4disc ;
end ;
/* Same coding for dissatisfaction 3 months after discharge */
do k = 1 to disc3unwt ;
    undisc3 = 1 ;
    output q4disc3 ;
end ;
do x = 1 to (500 - disc3unwt) ;
    undisc3 = 0 ;
    output q4disc3 ;
end ;
datalines ;
JAN 24 17
FEB 44 24
MAR 36 15
APR 18 8
MAY 16 11
JUN 19 7
JUL 17 11
AUG 18 9
SEP 27 10
OCT 26 15
NOV 29 12
DEC 26 11
;
RUN ;
proc shewhart data=WORK.Q4disc;
xschart undisc*month ;
run;
According to the SAS documentation:

"The XSCHART statement creates X-bar and s charts for subgroup means and standard deviations, which are used to analyze the central tendency and variability of a process."

For the three-months-after-discharge variable, just run another PROC SHEWHART with q4disc3 as the data set and undisc3 as the measurement variable, as shown below.
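This is just the statement above with the other data set and variable names swapped in:

proc shewhart data=work.q4disc3;
xschart undisc3*month ;
run;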

OR, once you have the data set created, you can get the chart using SAS Studio by selecting the CONTROL CHARTS task.

Control charts window with month as subgroup and undisc as measure

Either way will give you this result:

Control chart

I wouldn't normally consider Excel for analysis, but there are four reasons I'll be using it sometimes for the next class I'm teaching. First, we start out with some pretty basic statistics (I'm not even sure I'd call them statistics) and Excel is good for that kind of stuff. Second, Excel now has data analysis tools available for the Mac; years ago, that was not the case. Since my students may have Mac or Windows, I need something that works on both. Third, many of the assignments in the course I will be teaching use small data sets, and this is real life. If you are at a clinic, you don't have 300,000,000 records. Fourth, the number of functions and the ease of use of functions in Excel have increased over the years.

For example,

TRANSPOSE AN ARRAY IN EXCEL

Select all of the data you want and select COPY

Click on the cell where you want the data copied and select PASTE SPECIAL from the edit menu. Click the bottom right button next to TRANSPOSE and click OK. Voila. Data transposed.

PERFORMING A REGRESSION ANALYSIS

Once you have your data in columns (and if they aren't, see TRANSPOSE above), you just need to:

Excel add-ons window

  1. Add the Analysis ToolPak. You only need to do this once and it should be available in Excel forevermore. To do that, go to TOOLS and select EXCEL ADD-INS. Then click the box next to Analysis ToolPak and click OK.
  2. Now, go to TOOLS, select DATA ANALYSIS and then pick REGRESSION.

Regression analysis menu

You just need to select the range for the Y variable, probably one column, select the range for the X variables, probably a column adjacent to it, and click OK. You may also select confidence limits, fit plots, residuals and more.

So, yeah, for simple analyses, Excel can be super-simple.

Believe it or not, this is what I do for fun. In my day job, I make video games that teach math and social studies.

You can check out the games we make here.

Wigwam

Working on some fun things using SAS Studio, so expect a number of short posts over the next few days. Last time, I talked about the utilities and how easy it is to import an Excel file. Now let's say maybe you are not a Unix person and you have no idea how exactly to code a LIBNAME statement that is not on Windows. Never fear, it's super easy.

Right-click on the folder where you want to save your data set. From the menu that appears, select the last choice, which is 'Properties'.

A window will come up that shows the name of your folder and its location. The path is easy to spot because it's right next to the word Location. It will look something like this:

/home/your_name/data_analysis_examples

To save data you have uploaded in an Excel file and imported into SAS, remember that the imported files were saved in the work directory and named import, import1, etc. If I wanted to sort those data sets and then merge them together into a permanent data set, I'd do it in exactly the same way as if I were using Windows. The only thing different is the LIBNAME statement, as you can see below.

LIBNAME in "/home/your_name/data_analysis_examples";

proc sort data = work.import;
by username;
run;

proc sort data = work.import1;
by username;
run;

/* Merge the two imported data sets into one permanent data set */
data in.crossroads ;
merge work.import work.import1;
by username;
run;

If, later on, I want to use that data set in a program, again I would do it exactly the same as in Windows and the only thing different would be my LIBNAME.

 

LIBNAME in "/home/your_name/data_analysis_examples";

 

proc means data = in.crossroads;
run;

Completely random fact, unrelated to SAS Studio, or maybe it is related: I hurt my arm again, so I have been writing my SAS programs using Dragon voice recognition software. If you are going to use SAS Studio on a Mac, you should be aware that Dragon does not work with Firefox on the Mac, so open up Chrome if you want to use voice recognition software, or at least the software from Dragon. This has nothing to do with SAS specifically.

Support My Day Job! Buy Games That Teach Social Studies And Math And Have Fun! (You can even get Making Camp for free!)

Buy our games

 

It’s been about a year since I last looked at SAS Studio much  –

OKAY, LISTEN UP PEOPLE

In my previous life, I taught for years at a small liberal arts college with under 2,000 students. I also taught at a tribal community college with fewer than 500 students. In neither of those situations did we have the funding to pay for expensive software. SAS Studio is FREE. I could have really used this when I was teaching at those small schools. Check it out.

students

So, it's free, but I don't teach that often, because I have a day job as president of The Julia Group, where clients want me to do so much stuff that we quit taking new clients years ago, and also president of 7 Generation Games, where they want me to do more stuff.

The last class I taught, we used SAS on a remote desktop – which I liked a lot. So, yes, no SAS Studio for me for a while.

In case, like me, you are more a programming type and haven’t been too pointy-clicky, perhaps you missed the TASKS AND UTILITIES. Well, don’t.

Let’s say you want to import a file from Excel into SAS. First, upload it by clicking on the folder where you want it stored and then clicking the upload button at the top left of your screen.

Look to the bottom left of your screen and you will see this. Well, you'll see the Tasks and Utilities anyway; the stuff above it is files for class examples.

Tasks and utilities menu

Click on the arrow next to Tasks and Utilities and you'll find all kinds of cool stuff. Click the arrow next to Utilities and pick IMPORT DATA.

upload window

Drag the file you uploaded into the window on the right and, voila!

There you go, your Excel file is imported into SAS. You can see the code in the CODE window. DON’T FORGET TO CLICK THE LITTLE RUNNING GUY AT THE TOP OF YOUR SCREEN TO RUN THIS.

Note that the file is named WORK.IMPORT; you'll need that name for the next task, but that's next time, because I have to go back to work.

/* Generated Code (IMPORT) */
/* Source File: testit.xlsx */
/* Source Path: /home/annmaria.demars/homework */
/* Code generated on: 2/6/17, 11:27 PM */

%web_drop_table(WORK.IMPORT);

FILENAME REFFILE '/home/annmaria.demars/homework/testit.xlsx';

PROC IMPORT DATAFILE=REFFILE
DBMS=XLSX
OUT=WORK.IMPORT;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=WORK.IMPORT; RUN;

%web_open_table(WORK.IMPORT);

 

PROC CONTENTS output for the imported data set

SAS nicely runs the PROC CONTENTS, too, so you end up with a table telling you the contents of your new data set.

Once you have your data imported, you can use the TASKS menu to complete (what else) statistical tasks. I wrote about those in some other posts.

My point is, there is a lot of stuff under that little tab and you should check it out. Also, if you are at a small school, SAS Studio is an awesome resource you can get for free, and I bet you could use it.


Support my day job! Learn about Ojibwe history and culture. Practice multiplication and division

FREE GAME FOR iPad or Android tablets

wig wam in snow

Click over here to find links to Making Camp in the App Store or on Google Play. Yes, it’s free.

So this is attempt number two with voice recognition software. Now that I have my new custom splint on and I look something like Darth Vader with the robot arm I thought I had better not just keep doing the same thing that caused this problem in the first place.

my darth vader arm

The arthritis in my hands has just been getting worse, to the point where I just had my left thumb reconstructed. I know from other sports injuries that when you injure one part, other body parts get stressed and start to get injured. For example, if you injure your right knee, you start putting so much weight on your left knee to compensate that your left knee soon is giving you problems as well.

The Dragon software that I have only works on Windows, although the Mac version is coming out very soon. So far it seems to work better than Read and Write, the Google Chrome extension I have used.

What I like about this software so far is that it can do more than just type. It will open a web browser, and you can correct and underline words and do other formatting.

It's going to be kind of weird to get used to dictating instead of typing. I'm sure it's going to take me a while; after all, I've been typing for probably 40 years. I'm certain, though, that this will help the problems I have with my hands a lot. I'm not sure I'll be able to do a lot of coding with this, though who knows.

I don't think it will really work on planes and in airports, where I spend an inordinate amount of my time. Maybe it will, though. I have a friend who is visually impaired, and she talks into her phone all the time, giving it messages and commands, so I'm sure it's just a matter of getting used to it.

Well, I currently have about 900 unanswered e-mail messages, an IRB application to complete and loads of documentation to write. I expect that, just like learning to use a word processor for the first time, this will be a bit of a time-consuming learning process, but well worth it in the end.

You'd think that talking to your computer would feel more natural and make it easier to write, but I can't say that's the case at all. Obviously, I'm much more used to typing.

We'll see as time passes if this gets easier. I presume it will.

Do you use voice recognition software to type? If so, how long was it before you felt comfortable doing it?

 

————

SCHOOL IS STARTING! DONATE 7 GENERATION GAMES TO A STUDENT, CLASSROOM OR SCHOOL. Under $20 for all three games.

Fish lake woman
————-

you can make anything into an opportunity.

For example today I had this very unpleasant operation on my phone actually that was my thumb not my phone. as you may have guessed comma I am now writing using a piece of voice recognition software.

me wearing cast inside giant foam pillow on arm

It’s a Google Chrome extension. this makes me happy for two reasons. the pain pills are not one of them.

the first reason is that I have been wanting to experiment more with Google Chrome extensions.  

At some point we are planning on using Chrome extensions 4  for our game making camp. this is a great opportunity for me to start learning more about how extensions work.

The second reason this is a great opportunity is that I have wondered for some time what I’m going to do when I get old.

I’m just not sitting around knitting type of person. my hand has been bothering me for quite some time. it’s only a matter of time until my other hand starts to bother me as well. So I’ve been wondering about this comma what could I do if I didn’t work.

Now all kinds of people including all of my relatives most of my friends tell me all of the time that I should not work so much. I mention that I did not ask any of these people their opinion? You see the issue isn’t that I can’t think of things to do instead of work. the point is that I like to work and the thought that I couldn’t do it anymore is a bit depressing.

 

There are a few drawbacks of  read and write for Google Chrome which you may have already detected. One is that it has a rather random view of capitalization.  I’m sure that if you read this post closely you can identify other drawbacks. for example like Siri it often misinterpret your words. I left most of the errors here so that you could see. I did fix a few where the sentence made absolutely no sense.

I found it works better if you speak more slowly.

 

So far  it hasn’t been too bad. it was super easy to install and I figured out how to use the speech to text by watching 2 minute YouTube video.

 

On the other hand haha  that’s a joke since I only have one hand –  it seems like the only way to get the premium features is to be at a school that licenses those at the school or maybe classroom level.  right now I’m using the 30 day trial version.

 

The other problem I have found is that sometimes the microphone just randomly quits working.  toggle it off and on to fix Problem.

 

2 move 2A new line All you need to do is say those words which ironically since I wanted actually those words in the sentence I had to take them otherwise it would have gone to a well you know.

 

now if you read this you can see it kind of makes me look like a cross between a teenager using text-speak and someone with a very poor grasp of grammar and spelling. however I think that much of that could be improved with practice and getting 2 no the software better. we’ll see if with practice the voice recognition can be accurate at a faster speed because this slow pace is pretty annoying. the invisible developer just told me that I  sound like a bit from Find old radio show called the slow talkers of America.

New line I also think it would be really really difficult to write code using this with all of the special characters required like  square brackets and curly brackets parentheses etcetera etcetera.

 

After a few weeks tough trying this out I’m going to check out dragon I have a friend who is visually impaired who uses that so I’m going to ask her 2 show me because I’m sure she knows all of the special features as I believe she even used it to write her thesis.. You’re line

If you have any other suggestions either 4 Chrome extensions in general or on using Speech-to-Text software please post it and the comments.

————

SCHOOL IS STARTING! DONATE 7 GENERATION GAMES TO A STUDENT, CLASSROOM OR SCHOOL. Under $20 for all three games.

Fish lake woman

A picture says 1,000 words – especially if you are talking to a non-technical audience. Take the example below.

We wanted to know whether the students who played our game Fish Lake at least through the first math problem and the students who gave up at the first sight of math differed in achievement. Maybe the kids who played the games were the higher achieving students and that would explain why they did better on the post-test.

You can see from the chart below that this is not the case. The distribution of pretest scores is pretty similar for the kids who quit playing (the top) and those who persisted.

Graphs produced by ODS

Beneath the graphs, you can see the box and whisker plots. The persistent group has fewer students at the very low end, and we actually know why that is: students with special needs in the fourth and fifth grades, for example those who were non-readers, could not really play the game and either quit on their own very soon or were given alternative assignments by the teacher.

The median (the line inside the box), the mean (the diamond) and the 25th percentile (the bottom of the box) are all slightly higher for the persisting group, for the same reason: the students with the lowest scores quit right away.

These data tell us that the group that continued playing and the group that quit were pretty similar, except that the persisting group lacked the very lowest achieving students.

So, if academic achievement wasn’t a big factor in determining which students continued playing the games, what was?

That’s another chart for another day, but first, try to guess what it was.

———–

Would you like to play one of our games? Check them out here – all games run on Mac and Windows.

trail

What about Chromebooks?  Check out Forgotten Trail.

characters traveling on map

If I were to give one piece of advice to a would-be program evaluator, it would be to get to know your data so intimately it’s almost immoral.

Generally, program evaluation is an activity undertaken by someone with a degree of expertise in research methods and statistics (hopefully!), using data gathered and entered by people whose interest is something completely different, from providing mental health services to educating students.

Because their interest in providing data is minimal, your interest in checking that data had better be maximal. Let's head on with the data from the last post. We have now created two data sets that have the same variable formats, so we are good to go with concatenating them.
DATA answers hmph;
SET fl_answers ansfix1 ;
/* Route developer and tester records to a separate data set for inspection */
IF username IN("UNDEFINED","UNKNOWN") or INDEX(username,"TEST") > 0 THEN OUTPUT hmph;
ELSE OUTPUT answers;
RUN;

PRO TIP: I learned from a wise man years ago that one should not just gleefully delete data without looking at it. That is, instead of keeping only the data you expect and deleting the rest, send the unwanted data to a separate data set. If it turns out to be what you expected, you can always delete the data after you look at it.

There should be very few people with a username of 'UNDEFINED' or 'UNKNOWN'. The only way to get one of those is to be one of our developers, who enter data in the forms as they create and test them, rather than logging in and playing the game. The INDEX function searches the variable in the first argument for the string given in the second argument and returns the starting position of the string if it is found, and 0 if it is not. So, INDEX(username, "TEST") > 0 looks for the word TEST anywhere in the username.
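A couple of hypothetical values make the behavior concrete:

DATA _NULL_ ;
/* INDEX returns the starting position, or 0 if the string is not found */
a = INDEX("BETATESTER","TEST") ; /* 5: TEST starts at position 5 */
b = INDEX("student99","TEST") ;  /* 0: TEST does not appear */
PUT a= b= ;
RUN ;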

Since we ask our software testers to put that word in the username they pick, this should catch all of the tester records. I looked at the hmph data set, and the distribution of usernames was just as I expected; most of the records were in the answers data set, with valid usernames.

Did you remember that we had concatenated the data sets from the old server and the new server?

I hope you did, because if you didn't, you will end up with a whole lot of the same answers in there twice.

Getting rid of the duplicates

PROC SORT DATA = answers OUT = in.all_fl_answers NODUP ;
BY username date_entered ;
RUN ;

The difference between NODUP and NODUPKEY is relevant here. It is possible we could have two students with the same username and date_entered, because different schools could have assigned students the same username. (We do our lookups by username + school.) Some other student with the same username might have been entering data at the same time in a completely different part of the country. The NODUP option only removes a record if every value of every variable matches the record before it. NODUPKEY removes records that duplicate the variables in the BY statement.
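To see the difference, consider a hypothetical pair of records from two different students who happen to share a username and a timestamp:

/* username  date_entered         answer           */
/* jsmith    2016-09-01 10:15:00  3/4  <- school A */
/* jsmith    2016-09-01 10:15:00  7/8  <- school B */
PROC SORT DATA = answers OUT = nokey NODUPKEY ;
BY username date_entered ;
RUN ;
/* NODUPKEY would throw the second record away; NODUP keeps it
   because the two records are not identical on every variable */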

All righty then, we have the cleaned-up answers data. Now we go back and create a summary data set, as explained in this post. You don't have to do it with SAS Enterprise Guide as I did there; I just did it for the same reason I do most things, the hell of it.

MERGING THE DATA

PROC SORT DATA = in.answers_summary ;
BY username ;
RUN ;

PROC SORT DATA = in.all_fl_students ;
BY username ;
RUN ;

/* Match students with their answer summaries; send any answers
   without a student record to a data set named odd */
DATA in.answers_studunc odd ;
MERGE in.answers_summary (IN=a) in.all_fl_students (IN=b) ;
BY username ;
IF a AND b THEN OUTPUT in.answers_studunc ;
IF a AND NOT b THEN OUTPUT odd ;
RUN ;

The PROC SORT steps sort. The MERGE statement merges, matching records by username. The IN= option creates a temporary variable with the name 'a' or 'b'; you can use any name, so I use short ones. If there is a record in both the student record file and the answers summary file, then the data are output to a data set of all students with a summary of their answers.

There should not be any cases where there are answers but no record in the student file. If you recall, that is what set me off on finding that some were still being written to the old server.

LOOK AT YOUR LOG FILE!

There is a sad corner of statistical purgatory for people who don’t look at their log files because they don’t know what they are looking for. ‘Nuff said.

This looks exactly as it should. A consistent finding in pilot studies of assessment of educational games is a disconcertingly low level of persistence, so it is expected that many players quit when they come to the first math questions. The fact that, of the 875 players, slightly fewer than 600 had answered any questions was somewhat expected. Also as expected, there were no records with answers but no matching student record, as the log notes below show.

NOTE: There were 596 observations read from the data set IN.ANSWERS_SUMMARY.
NOTE: There were 875 observations read from the data set IN.ALL_FL_STUDENTS.
NOTE: The data set IN.ANSWERS_STUDUNC has 596 observations and 11 variables.
NOTE: The data set WORK.ODD has 0 observations and 11 variables.

So, now, after several blog posts, we have a data set ready for analysis ….. almost.


Want to see these data at the source?

Check out our game, playable on Mac or Windows. Download Spirit Lake or Fish Lake  to play, or for Forgotten Trail, just click on the link provided, no download required.

Mom and kid

You can also donate a copy of the game to a school or give as a gift.

Further Reading

For more on SAS character functions check out Ron Cody’s paper An Introduction to Character Functions, an oldie but goodie from WUSS back in 2003.

Or you could read my last post!

This paper by Britta Kelsey from SAS Users Group International in 2005 will tell you more than you want to know about NODUP and NODUPKEY.

At the Western Users of SAS Software conference (yes, they DO know that is WUSS), I’ll be speaking about using SAS for evaluation.

“If the results bear any relationship at all to reality, it is indeed a fortunate coincidence.”

I first read that in a review of research on expectancy effects, but I think it is true of all types of research.

This is me on my soapbox

Here is the interesting thing about evaluation: you never know what kind of data you are going to get. For example, in my last post I had created a data set that was a summary of the answers players had given in an educational game, with a variable for the mean percentage correct and another variable for the number of questions answered.

When I merged this with the user data set, so I could test for relationships between characteristics of these individuals (age, grade, gender, achievement scores) and perseverance, I found a very odd thing. A substantial minority were not matched in the users file. This made no sense, because you have to log in with your username and password to play the game.

The reason I think that results are often far from reality is just this sort of thing: people don't scrutinize their data well enough to realize when something is wrong, so they just merrily go ahead analyzing data that have big problems.

In a sense, this step in the data analysis revealed a good problem for us. We actually had more users than we thought. Several months ago, we had updated our games. We had also switched servers for the games. Not every teacher installed the new software so it turned out that some of the records were being written to our old server.

Here is what I needed to do to fix this:

  1. Download the files from our servers. I exported these as .xls files.
  2. Read the files into SAS
  3. Fix the variables so that the format was identical for both files.
  4. Concatenate the files of the same type, e.g., the student file with the student file from the other server.
  5. Remove the duplicates
  6. Merge the files with different data, e.g., answers file with student file

 

I did this in a few easy steps using SAS.

  1. USE PROC IMPORT to read in the files.

Now, you can use the IMPORT DATA option from the file menu, but that gets a bit tedious if you have a dozen files to import.

TIP: If you are not familiar with the IMPORT procedure, do it with the menus once and save the code. Then you can just change the data set names and copy and paste this a dozen times. You could also turn it into a macro if you are feeling ambitious (there is a sketch at the end of this step), but let's assume you are not. The code looks like this:

PROC IMPORT OUT= work.answers DATAFILE= "C:\Users\Spirit Lake\WUSS16\fish_data\answers.xls"
DBMS=EXCEL REPLACE;
RANGE="answers$";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;

Assuming that your Excel file has the names of the columns in the first row (GETNAMES = YES), all you need to do for the next 11 data sets is change the values in lower case: the name you want for your SAS data set goes after OUT=, the Excel file after DATAFILE= and the sheet in that file that has your data after RANGE=.

Notice there is a $ at the end of that sheet name.

Done. That's it. Copy and paste however many times you want and change those three values: the output data set name, the location of the input data and the sheet name.
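If you do feel ambitious after all, a minimal macro sketch might look like this (the macro name and parameters are mine, not anything built into SAS):

%MACRO readsheet(out=, file=, sheet=) ;
/* Wraps the generated PROC IMPORT so only three values change */
PROC IMPORT OUT= &out DATAFILE= "&file"
DBMS=EXCEL REPLACE;
RANGE="&sheet.$";
GETNAMES=YES;
RUN;
%MEND readsheet ;

%readsheet(out=work.answers, file=C:\Users\Spirit Lake\WUSS16\fish_data\answers.xls, sheet=answers)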

2. Fix the variables so that the format is identical for both files

A. How do you know if the variables are the same format for each file?

PROC CONTENTS DATA = answers ;
RUN ;

contents of data set

This LOOKS good, right?

B. Look at a few records from each file.

OPTIONS OBS = 3 ;
/* Print the first three records from each file */
PROC PRINT DATA = fl_answers_new ;
VAR date_entered ;
RUN ;
PROC PRINT DATA = fl_answers_old ;
VAR date_entered ;
RUN ;

/* Reset so later steps use all of the records */
OPTIONS OBS = MAX ;

PAY ATTENTION HERE!!! The OPTIONS OBS = 3 statement only processes the first three records; that's a good idea here because you don't need to print out all 7,000+ records. However, if you forget to change it back to OBS = MAX, then all of your procedures after that will only use the first 3 records, which is probably not what you want.

So, although my PROC CONTENTS showed the files had the same format in terms of variable type and length, here was a weird thing: since the servers were in different time zones, the time was recorded as 5 hours different, so

2015-08-20 13:23:30

Became

2015-08-20 18:23:30

Since this was recorded as a character variable, not a date (see the output for the contents procedure above), I couldn’t just subtract 5 from the hour.

Because the values were not the same, if I sorted by username and date_entered, each record that was moved over from the old server would be included in the data set twice, because SAS would not recognize that the two versions were the same record.

So, what did I do?

I’m so glad you asked that question.

I read the data into a new data set, and the third statement gives a length of 19 to a new character variable, datefixed.

Next, I create a variable from the part of the date_entered value that starts at the 12th position and spans the next two characters, that is, the hour.

Now, I add 5 to the hour value. Even though datefix1 is a character variable (it was created using a character function, SUBSTR), when I add a number to it, SAS will convert the result to a numeric value.

Finally, I set datefixed to the first 11 characters of the original date value, the part before the hour. I use the TRIM function to get rid of trailing blanks, concatenate this value (that's what the || does) with exactly one blank space, and then concatenate that with the new hour value, first left-aligning the number and trimming any blanks. Last, I concatenate the final 6 characters of the original date-time value. If I didn't do this trimming and left alignment, I would end up with a whole bunch of extra spaces and it still wouldn't match.

I still need to get this to be the value of the date_entered variable so it matches the date_entered value in the other data set.

I’m going to DROP the date_entered variable, and also the datefix1 and datefixn variables since I don’t need them any more.

I use the RENAME statement to rename datefixed to date_entered and I’m ready to go ahead with combining my datasets.

DATA ansfix1 ;
SET flo_answers ;
LENGTH datefixed $19 ;
/* Positions 12-13 of "YYYY-MM-DD HH:MM:SS" hold the hour */
datefix1 = SUBSTR(date_entered,12,2);
/* Adding a number converts the hour to a numeric value */
datefixn = datefix1 + 5 ;
/* Rebuild the date-time string around the corrected hour */
datefixed = TRIM(SUBSTR(date_entered,1,11)) || " " || TRIM(LEFT(datefixn)) || SUBSTR(date_entered,14,6) ;
DROP date_entered datefix1 datefixn ;
RENAME datefixed = date_entered ;
RUN ;
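A quick check, reusing the OPTIONS OBS trick from earlier, confirms the rebuilt values now look like the ones from the new server:

OPTIONS OBS = 3 ;
PROC PRINT DATA = ansfix1 ;
VAR date_entered ;
RUN ;
OPTIONS OBS = MAX ;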

 


They’re fun and will make you smarter – just like this blog!

Check out the games that provided these data!

Fish lake splash screen

Buy one for your family or donate to a child or school.

 

 

Occasionally, a brave student will ask me,

When will I ever use this?

The “this” can be anything from a mixed model analysis to nested arrays. (I have answers for both of those, by the way.)

I NEVER get that question when discussing topics like filtering data, whether for records or variables, because it is so damn ubiquitous.

computer in a field

Before I headed out to do, literally, testing in the field (you can read why here), I was working on an evaluation of the usability of one of our games, Fish Lake.

I had expected to find a correlation between performance and persistence, but it didn't quite turn out that way, because the players who had 100% of the problems correct skewed the results.

My next thought was that many students played the game for a very short time, got the first answer correct and then quit. I decided to take a closer look at those people.

First step: from the top menu select TASKS, then DATA, then FILTER AND SORT

filter and sort

Second step: Create the filter. Click on the FILTER tab; select from the drop-down menu the variable to use for the filter, in this case the one named "correct_Mean"; select the type of filter in the next drop-down menu, in this case EQUAL TO; and in the box, enter the value you want it to equal. If you don't remember all of the values you want, clicking on the three dots next to that box will bring up a list of values. You can also filter by more than one variable, but in this case, I only want one, so I'm done.

Create filter

Third step: Select the variables. Steps two and three don't have to be done in a particular order, but you DO have to select variables or your procedure won't run, since it would end up with an empty data set. I do the filter first so I don't forget. I know the filter is the whole point, and you're probably thinking you'd never forget that, but you're probably smarter than me or never rushed.

Selecting variables

If you click the double arrows in the middle, that will select all of the variables.  In this case, I just selected the two variables I wanted and clicked the single arrow (the top one) to move those over.

Why include correct_mean, since obviously that is a constant?

Because I could have made a mistake somewhere, and then these wouldn't all be 100% correct. (Turns out, I didn't and they were, but you never know in advance whether you made a mistake, because if you did, then you wouldn't make it.)

I click OK and now I have created a data set of just the people who answered 100% correctly.
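If you prefer code, the equivalent is a WHERE statement. This is a sketch; I am assuming the summary data set is named answers_summary and that 100% correct is stored as 1 (if it is stored as 100, adjust accordingly):

DATA perfect_scorers ;
SET answers_summary ;
WHERE correct_Mean = 1 ; /* keep only the players with every answer correct */
RUN ;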

For a first look, I graphed the frequency distribution of the number of questions answered by these perfect scorers.  To do this,

  1. Go to TASKS > GRAPH > Bar Chart

bar chart menu to select type of graph

2. Click on the first chart to select it, that’s a simple vertical bar chart

data menu
3. Click on the DATA tab and drag correct_N under column to chart

appearance option

4. Under APPEARANCE, click the box next to SPECIFY NUMBER OF BARS. The default, one bar for each unique data value, is already selected. Be cautious with this if you might have hundreds of values, but I happen to know the max is less than 20.
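Behind the scenes, this task writes graphing code for you. A rough hand-coded equivalent, using the hypothetical perfect_scorers data set from the sketch above, would be:

PROC SGPLOT DATA = perfect_scorers ;
VBAR correct_N ; /* one bar per distinct number of questions answered */
RUN ;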

bar chart of number of answers

I thought I'd find a bunch who answered one question and a few who answered all of the questions, and maybe those few were data entry errors, say, teachers who tested the game and shouldn't be in the database. When I look at this graph, I'm surprised. There are a lot more people who answered 100% correctly than I expected, and they are distributed a lot more across the number of questions than I expected. That's the fun of exploratory data analysis. You never know what you are going to find.

SO, now what?

 


Want to see the game that generated these data? Canoe rapids, catch fish and learn fractions.

Fish lake splash screen

Runs on Mac and Windows.


So, now what?

I want to find out more about the relationship between persistence and performance. To do this, I'm going to need to merge the answers summary data set with demographics.

I'm going to go back to the summary data set I created in the last post (remember that one?) and just filter variables this time, keeping all of the records.

Again, I'm going to go to the TASKS menu and select DATA, then FILTER AND SORT. This time, I'm going to have no filter and just select the variables.

Since the pop-up window opens with the VARIABLES tab selected, I just click the variables I want, which happen to be "correct_N", "correct_mean" and "username", click the single arrow in between the panes to move them over, and click OK at the bottom of the pop-up window. Done! My data set is created.

variables selected
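For the code-inclined, the same thing is a KEEP= option, again assuming the summary data set is named answers_summary:

DATA answers_three_vars ;
/* All of the records, just the three variables */
SET answers_summary (KEEP = username correct_N correct_mean) ;
RUN ;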

You can always click on PROGRAM from the main menu to write code in SAS Enterprise Guide, but being an old dinosaur type, I'd like to export this data set I just created and do some programming with it using SAS. Personally, I find it easier to write code when I'm doing a lot of merging and data analysis. I find Enterprise Guide good for quick looks and graphics, but for more detailed analysis, the old-timey SAS editor is my preference. If you happen to be like me, all you need to do to output your data set is click on it in the process flow and select EXPORT.

export file option

You want to export this file as a stand-alone data set, not as a step in a project. Just select the first option and you can save it like any file: select the folder you want and give it the name you want. No LIBNAME statement required.

And it’s a beautiful sunny day in Santa Monica, so that’s it on this project for today.
