A couple of weeks ago, I wrote a post that showed student achievement was not what made a difference in whether students persisted in playing our games. With the exception of a few students who were very low achieving, e.g., non-readers in fourth grade and up, there was little difference in the pretest scores of students who gave up at the first math problem and those who persisted.
What did make the difference, then? My next hypothesis was that it was the teacher that made a difference. To test this hypothesis, I did an Analysis of Variance with SAS using PROC GLM. The WHERE statement was used to eliminate extreme observations from the analysis – students who had attempted more than 50 problems. Complete code is shown below.
PROC GLM DATA = teach ;
CLASS teacher ;
MODEL prob_correct_N = teacher ;
LABEL prob_correct_N = "Problems Completed" ;
WHERE prob_correct_N < 50 ;
RUN ;
QUIT ;
The teacher variable explained 22% of the variance (F = 9.79, p < .0001). When grade level (grades 4-7 were included in the analysis) was added to the model, the additional explained variance was trivial – about 0.5%.
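The comparison with grade level can be sketched as a second run of the same procedure. This is a minimal sketch, not the code from the analysis: it assumes the grade variable is named grade (a hypothetical name) and that it lives in the same teach data set. The number to compare is the R-Square reported by each run.

```sas
ODS GRAPHICS ON ;

/* Same model as before, with grade added as a second class effect. */
/* "grade" is a hypothetical variable name used for illustration.   */
PROC GLM DATA = teach ;
  CLASS teacher grade ;
  MODEL prob_correct_N = teacher grade ;
  WHERE prob_correct_N < 50 ;
RUN ;
QUIT ;
```

If R-Square barely moves between the teacher-only model and this one, grade is adding very little beyond what teacher already explains.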
In the figure above, produced by default for this analysis when ODS GRAPHICS is on, a box and whisker plot is given for each teacher, showing mean, median and outliers (note the numbers on the plot are observation number for outliers and not a value for the dependent variable). There are clear differences among the teachers in mean number of problems completed.
The next step (thank goodness this is a longitudinal study) is to see what those teachers are doing that explains persistence.
Shameless plug #1 :
I will be giving a talk at the Western Users of SAS Software conference tomorrow (Thursday, September 8, 2016) at the Grand Hyatt in San Francisco that includes a lot more detail about the use of SAS for program evaluation.
Shameless plug # 2:
I was going to write more about graphics using SAS but I realized it was a Friday before a three-day weekend and most people don’t want to think about work – and that is the point of this post.
Many of the people who I know have done a far better job planning for retirement financially than they have planning for anything in their life other than work. So yes I guess this is one more post about work-life balance, of which I personally have none. This is obvious by the fact that I am writing this blog using voice input software rather than sitting on the couch watching reality TV – or maybe by the fact that I took one day off to have an operation and was back working 10 hours a day pretty much as soon as the anesthesia wore off. But we’re not here to talk about me, hypocrite that I may be.
Recently, “I ran into” three people who I have known for years. I put ran into in quotation marks because one of them was actually dead and I simply received the news from a mutual friend.
Let’s call them all Bob (not their real names) and start with Bob number one (or should I call him Dead Bob?). If you ever heard the song Eleanor Rigby, that was kind of Dead Bob but with more money, which I’m sure you will agree is not much help to you if you are dead. Now, DB made a number of what I would consider ethical compromises in his career. He always used the excuse, “If I don’t, I’ll lose my job.” I never understood that, because why would you want a job where you had to compromise your principles? Also, it wasn’t even as if DB was making millions of dollars – not that that would make it okay, but it would maybe make it slightly understandable. So here you have it, someone who lived only to his 50s, had a middle management position, not a lot of respect from the people around him and that was it. When I look back on the times that DB overlooked sexual harassment or agreed to promote the boss’s nephew who had no more qualifications than a drowned rat, I wonder what was the point? In the end, he had a mediocre life with no respect from the people around him. What a waste! Don’t be a DB.
Bob number two is 70 years old, quite well-off financially and a pretty good guy. I asked him why he was still working when he didn’t need the money and I know that he just likes his job kind of okay. I was a bit shocked by what he told me. He’s been married forever and he said that he just couldn’t take the idea of spending all that time with his wife! He said it’s fine the way it is now, when they see each other at dinner time, maybe go out and try a nice restaurant. However, he said that when he takes a few days of vacation, or even on the weekends, she kind of drives him nuts. She does things like ask him to take out the trash and then, two minutes later, ask if he took out the trash. It’s stuff like that multiplied 1000 times over. He says that it’s much better to keep things the way they are because if he was home 24/7 he’d probably end up strangling her.
Bob number three is the most like me. He’s 67 years old, owns his own business and could easily just shut the doors, go home and not work for the rest of his life. He still goes to work every day though, not because he likes his work so much, although I know that he does, but because he’s never really done anything but work. Yes, like many people, he was into sports when he was young but that’s been 40 years or more. He’ll work 50 weeks a year and maybe go on vacation for two. In brief, he’s still working because he doesn’t really have anything to do but work. He doesn’t have any real hobbies that he’s into and is not one to sit around and watch TV. So, Bob Three is pretty much still working on autopilot – not to imply that he’s not doing a good job but rather he’s doing a good job because that’s what he’s always done.
All three Bobs could have had more of a life than they did/do. Maybe it’s too late for them but I feel like Scrooge in A Christmas Carol, where I’m looking at the past and future and it’s my opportunity to change what the future would be. If you’re not as old as Bob two or three, and you are not dead (which I presume because you’re reading this blog) then it’s probably an opportunity for you, too.
Maybe you want to try that now. Try to imagine what it would be like if you were retired, and if you really hate that picture, do something to change it.
One of my concerns about not being able to use my left hand has been how I’m going to be able to continue coding.
However, SAS was a whole different story. Working on my paper for the Western Users of SAS Software conference, I had to run some analyses just to verify what I was saying in the paper. I’m a little obsessive like that. I may have run a procedure 500 times but before I write about it in the paper I will still run some analysis just to be 100% sure that the binomial option does exactly what I think it does.
Also, it is very helpful for an audience, particularly if they are seeing a technique for the first time, to see the output that is created. Because SAS is a very natural language, especially when using the statistical procedures rather than, say, macro programming, it was actually quite easy to run a PROC FREQ with pretty much every known option. Even a data step that included data lines and entering the data was a piece of cake.
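To give a sense of how little punctuation that kind of program needs, here is a minimal sketch of a data step with data lines followed by a PROC FREQ using the binomial option. The data set name, variable names and values are all made up for illustration and are not from the paper.

```sas
/* Made-up data, for illustration only */
DATA quiz ;
  INPUT student $ passed $ ;
  DATALINES ;
s01 yes
s02 no
s03 yes
s04 yes
s05 no
s06 yes
;
RUN ;

/* BINOMIAL tests a proportion against the null value P=.        */
/* LEVEL= picks which category is tested; without it, PROC FREQ  */
/* uses the first level in the table, which here would be "no".  */
PROC FREQ DATA = quiz ;
  TABLES passed / BINOMIAL (LEVEL = "yes" P = 0.5) ;
RUN ;
```

Running it once and reading the output is the quickest way to confirm which level the test is actually reporting on.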
Now of course most programs are going to be a lot more complicated than a proc freq with a data step, but still I can see how I could easily do a lot of SAS programming using Dragon.
Once I figure out a few of the mathematical symbols, I should be able to do just about anything with SAS.
I think this is a pretty important point because if you have a physical disability that makes it difficult for you to use a keyboard you might want to consider learning SAS as a valuable career skill. If you put that knowledge together with knowledge in a content area, for example, a degree in statistics, you would be very marketable.
This was on my mind because I just returned from a site visit at a vocational rehabilitation project where their goal is to find jobs for people with disabilities.
I wasn’t thinking of going to any more SAS conferences for a while after the one in September just because my schedule is very, very packed. However, I think I might make an exception in a year or two and demonstrate how one could use Dragon to write SAS programs using only their voice.
Actually, Dragon worked better with SAS than it did with this blogging software. Yes, I am now only writing my blogs using voice input software as I am saving any typing I do for actual programming.
So this is attempt number two with voice recognition software. Now that I have my new custom splint on and I look something like Darth Vader with the robot arm I thought I had better not just keep doing the same thing that caused this problem in the first place.
The arthritis in my hands has just been getting worse to the point where I just had my left thumb reconstructed. I know from other sports injuries that what happens when you injure one part is that other body parts get stressed and start to get injured. For example, if you injure your right knee you start putting so much weight on your left knee to compensate that your left knee soon is giving you problems as well.
The Dragon software that I have only works on Windows, although the Mac version is coming out very soon. So far it seems to work better than Read and Write, the Google Chrome extension I have used.
What I like about this software so far is that it can do more than just type. It will open a web browser, and you can correct and underline words and do other formatting.
It’s going to be kind of weird to get used to dictating instead of typing. I’m sure it’s going to take me a while; after all, I’ve been typing for probably 40 years. I’m certain though that this will help the problems I have with my hands a lot. I’m not sure I’ll be able to do a lot of coding with this, though who knows.
I don’t think it will really work on planes and airports where I spend an inordinate amount of my time. Maybe it will though, I have a friend who is visually impaired and she talks into her phone all the time giving it messages and commands so I’m sure it’s just a matter of getting used to it.
Well I currently have about 900 unanswered e-mail messages, I also have an IRB application to complete and loads of documentation to write. I expect just like learning to use a word processor for the first time this will be a bit of a time-consuming learning process but well worth it in the end.
You’d think that talking to your computer would feel more natural and it would be easier to write but I can’t say that’s the case at all. Obviously I’m much more used to typing.
We’ll see as time passes if this gets easier. I presume it does.
Do you use voice recognition software to type? If so, how long was it before you felt comfortable doing it?
you can make anything into an opportunity.
For example today I had this very unpleasant operation on my phone actually that was my thumb not my phone. as you may have guessed comma I am now writing using a piece of voice recognition software.
It’s a Google Chrome extension. this makes me happy for two reasons. the pain pills are not one of them.
the first reason is that I have been wanting to experiment more with Google Chrome extensions.
At some point we are planning on using Chrome extensions 4 for our game making camp. this is a great opportunity for me to start learning more about how extensions work.
The second reason this is a great opportunity is that I have wondered for some time what I’m going to do when I get old.
I’m just not sitting around knitting type of person. my hand has been bothering me for quite some time. it’s only a matter of time until my other hand starts to bother me as well. So I’ve been wondering about this comma what could I do if I didn’t work.
Now all kinds of people including all of my relatives most of my friends tell me all of the time that I should not work so much. I mention that I did not ask any of these people their opinion? You see the issue isn’t that I can’t think of things to do instead of work. the point is that I like to work and the thought that I couldn’t do it anymore is a bit depressing.
There are a few drawbacks of read and write for Google Chrome which you may have already detected. One is that it has a rather random view of capitalization. I’m sure that if you read this post closely you can identify other drawbacks. for example like Siri it often misinterpret your words. I left most of the errors here so that you could see. I did fix a few where the sentence made absolutely no sense.
I found it works better if you speak more slowly.
So far it hasn’t been too bad. it was super easy to install and I figured out how to use the speech to text by watching 2 minute YouTube video.
On the other hand haha that’s a joke since I only have one hand – it seems like the only way to get the premium features is to be at a school that licenses those at the school or maybe classroom level. right now I’m using the 30 day trial version.
The other problem I have found is that sometimes the microphone just randomly quits working. toggle it off and on to fix Problem.
2 move 2A new line All you need to do is say those words which ironically since I wanted actually those words in the sentence I had to take them otherwise it would have gone to a well you know.
now if you read this you can see it kind of makes me look like a cross between a teenager using text-speak and someone with a very poor grasp of grammar and spelling. however I think that much of that could be improved with practice and getting 2 no the software better. we’ll see if with practice the voice recognition can be accurate at a faster speed because this slow pace is pretty annoying. the invisible developer just told me that I sound like a bit from Find old radio show called the slow talkers of America.
New line I also think it would be really really difficult to write code using this with all of the special characters required like square brackets and curly brackets parentheses etcetera etcetera.
After a few weeks tough trying this out I’m going to check out dragon I have a friend who is visually impaired who uses that so I’m going to ask her 2 show me because I’m sure she knows all of the special features as I believe she even used it to write her thesis.. You’re line
If you have any other suggestions either 4 Chrome extensions in general or on using Speech-to-Text software please post it and the comments.
A picture says 1,000 words – especially if you are talking to a non-technical audience. Take the example below.
We wanted to know whether the students who played our game Fish Lake at least through the first math problem and the students who gave up at the first sight of math differed in achievement. Maybe the kids who played the games were the higher achieving students and that would explain why they did better on the post-test.
You can see from the chart below this is not the case. The distribution of pretest scores is pretty similar for the kids who quit playing (the top) and those who persisted.
Beneath the graphs, you can see the box and whisker plots. The persistent group has fewer students at the very low end and we actually know why that is – students with special needs in the fourth and fifth grade, for example, those who were non-readers, could not really play the game and either quit on their own very soon or were given alternative assignments by the teacher.
The median (the line inside the box), the mean (the diamond) and 25th percentile (the bottom of the box) are all slightly higher for the persisting group – for the same reason, the students with the lowest scores quit right away.
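A chart of this kind, with a histogram for each group and box plots beneath, can be produced with PROC TTEST when ODS GRAPHICS is on. The following is a minimal sketch under stated assumptions, not necessarily the code behind the figure: the data set name fish_pretest and the variables persisted and pretest are hypothetical.

```sas
ODS GRAPHICS ON ;

/* PLOTS(ONLY)=SUMMARY requests just the summary panel:          */
/* one histogram per group with box plots underneath.            */
/* fish_pretest, persisted and pretest are hypothetical names.   */
PROC TTEST DATA = fish_pretest PLOTS(ONLY) = SUMMARY ;
  CLASS persisted ;   /* must have exactly two levels, e.g. quit vs. persisted */
  VAR pretest ;
RUN ;
```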
These data tell us that the group that continued playing and the group that quit were pretty similar, except that the persisting group did not include the very lowest achieving students.
So, if academic achievement wasn’t a big factor in determining which students continued playing the games, what was?
That’s another chart for another day, but first, try to guess what it was.
Hey, boys and girls, it’s that time again, for another episode of Mama AnnMaria’s Guide on How not to Get Your Sorry Ass Fired.
Lately, I’ve run into a few people who think they are getting away with something because they are SO smart. (Hint: You’re not. I wrote about this months ago. You should have been paying attention.) Let’s call the three of them Bozo 1, Bozo 2 and Bozo 3 (not their real names). There will be a quiz at the end, so pay attention.
Bozo 1 started out as a good employee, so good, in fact, that she received perks like her own office and telecommuting three days a week. Since no one ever questioned the hours she put on her time card, she concluded that no one would know the difference whether she worked 40 hours a week or 38, which was probably true. Gradually, though, she dropped to 30 hours, then 25. She’s still charging for 40, of course. Bozo 1 thinks she’s getting away with it. In fact, her boss let her slide at first because she had been a good employee and the boss figured maybe B1 was just having some personal or health issues. Fed up as the situation has deteriorated, her employer is starting to put together documentation to fire Bozo 1.
Here is the CRAZY objection someone made the other day:
That’s not fair! The boss should give her some kind of warning that the company is on to her!
To which I could only reply:
Are you fucking kidding me? She is LYING to her employer, basically stealing money in that she is getting paid for work she did not do, and you think that the employer owes HER? Her boss doesn’t owe her squat. She is getting fired and she’s too arrogant to see it coming.
On to Bozo 2: He is very good at his job and made the company a pile of money. He also got the corner office, name your own hours, work from Bali if you want to. The problem is B2 became too impressed with his own success. When his staff told him that the new project was not bringing in as many sales with him in Bali, that he needed to get out there and talk to the customers, he told them to quit bothering him, he knew what he was doing, and go get him coffee because his great brain needs caffeine. One particular sales person insisted to him that the personal approach was NECESSARY, that sales were going DOWN. B2 didn’t want to hear it and fired the sales guy saying, ‘I don’t need negativity in my life.’ Bozo 2 still has lots of money coming into the company as a result of work he did a year or two ago. As soon as that money dries up, he’s gone and no one will miss him because he’s been a pompous, inconsiderate jerk to everyone.
The same self-righteous young person objected to this, too.
He has made a LOT of money for the company.
This is true, and they rewarded him for that money he made with a lot of money and nice benefits. Now he isn’t making money and he is treating everyone like they are dirt beneath his feet. He’s going to get his sorry ass fired.
Bozo 3 is EVERYWHERE, even on The Simpsons, which The Invisible Developer will be delighted to hear me paraphrase,
“You don’t go on strike if you hate your job. You just go in every day and do it half-ass. That’s the American Way.”
I’ve dealt with a lot of people lately who have that attitude. They believe they cannot get fired because their organization is too big to fail and they have seniority. You see this from employees in big banks to universities to government. If you have to wait in line for two hours, they lost your paperwork, they failed to comply with some government regulation, they shrug it off because their organization is untouchable – and so are they, by association.
One advantage of being old is that you get a long-term view. I’ve seen plenty of organizations that hadn’t had layoffs in their entire history close entire departments or plants or institute widespread layoffs. No one is untouchable. Eventually it catches up to you, and who do you think is going to speak up when your budget gets cut? Not John Q. Public you couldn’t be bothered to care about.
So, what did we learn today, children? No matter who you are, where you work or how good you think you are, do your job and don’t be a jerk, because no one gets away with it forever.
CRAZY ! Our ecommerce site went out of business last night with zero days notice. So …
We should have our online store back online tomorrow.
If I were to give one piece of advice to a would-be program evaluator, it would be to get to know your data so intimately it’s almost immoral.
Generally, program evaluation is an activity undertaken by someone with a degree of expertise in research methods and statistics (hopefully!) using data gathered and entered by people whose interest is something completely different, from providing mental health services to educating students.
Because their interest in providing data is minimal, your interest in checking that data had better be maximal. Let’s head on with the data from the last post. We have now created two data sets that have the same variable formats so we are good to go with concatenating them.
DATA answers hmph ;
SET fl_answers ansfix1 ;
IF username IN("UNDEFINED","UNKNOWN") OR INDEX(username,"TEST") > 0 THEN OUTPUT hmph ;
ELSE OUTPUT answers ;
RUN ;
PRO TIP : I learned from a wise man years ago that one should not just gleefully delete data without looking at it. That is, instead of having a dataset where you put the data you expect and deleting the rest, send the unwanted data to a data set. If it turns out to be what you expected, you can always delete the data after you look at it.
There should be very few people with a username of ‘UNDEFINED’ or ‘UNKNOWN’. The only way to get that is to be one of our developers who are entering the data in forms as they create and test them, not by logging in and playing the game. The INDEX function searches the variable given in the first argument for the string given in the second and returns the starting position of the string if found, or 0 if not. So, INDEX(username, "TEST") > 0 looks for the word TEST anywhere in the username.
Since we ask our software testers to put that word in the username they pick, it should delete all of the tester records. I looked at the hmph data set and the distribution of usernames was just as I expected and most of the usernames were in the answers data set with valid usernames.
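To see what INDEX returns, here is a small sketch using made-up usernames. One thing to keep in mind is that INDEX is case sensitive, so the last line wraps the value in UPCASE; whether the real usernames are stored in upper case is an assumption I have not verified here.

```sas
/* Made-up usernames, for illustration only */
DATA _NULL_ ;
  pos_found   = INDEX("MARIATEST2", "TEST") ;          /* TEST starts at position 6 */
  pos_missing = INDEX("MARIA", "TEST") ;               /* not found, so 0           */
  pos_mixed   = INDEX(UPCASE("tester_bob"), "TEST") ;  /* upper-cased first, so 1   */
  PUT pos_found= pos_missing= pos_mixed= ;
RUN ;
```

The values are written to the log, which is one more reason to read it.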
Did you remember that we had concatenated the data set from the old server and the new server?
I hope you did because if you didn’t you will end up with a whole lot of the same answers in there twice.
Getting rid of the duplicates
PROC SORT DATA = answers OUT = in.all_fl_answers NODUP ;
BY username date_entered ;
RUN ;
The difference between NODUP and NODUPKEY is relevant here. It is possible we could have a student with the same username and date_entered because different schools could have assigned students the same username. (We do our lookups by username + school.) Some other student with the same username might have been entering data at the same time in a completely different part of the country. The NODUP option only removes records if every value of every variable is the same. The NODUPKEY option removes them if the values of the variables in the BY statement are duplicates.
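The difference is easiest to see on a tiny example. This is a sketch with made-up records: two are identical in every variable, and a third shares the username and date_entered but comes from a different school. Note that NODUP (an alias for NODUPRECS) compares each record with the one before it after sorting, so identical records that do not end up next to each other may survive.

```sas
/* Made-up records, for illustration only */
DATA demo ;
  INFILE DATALINES DSD ;
  INPUT username :$12. date_entered :$19. school :$10. answer ;
  DATALINES ;
eagle1,2016-05-02 14:05:10,north,7
eagle1,2016-05-02 14:05:10,north,7
eagle1,2016-05-02 14:05:10,south,3
;
RUN ;

/* Removes only the record that matches its neighbor on every variable: 2 rows left */
PROC SORT DATA = demo OUT = by_record NODUPRECS ;
  BY username date_entered ;
RUN ;

/* Removes every record that repeats the BY values: 1 row left, */
/* and the student from the other school is lost                */
PROC SORT DATA = demo OUT = by_key NODUPKEY ;
  BY username date_entered ;
RUN ;
```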
All righty then, we have the cleaned up answers data, now we go back and create a summary data set as explained in this post. You don’t have to do it with SAS Enterprise Guide as I did there, I just did it for the same reason I do most things, the hell of it.
MERGING THE DATA
PROC SORT DATA = in.answers_summary ;
BY username ;
RUN ;
PROC SORT DATA = in.all_fl_students ;
BY username ;
RUN ;
DATA in.answers_studunc odd ;
MERGE in.answers_summary (IN=a) in.all_fl_students (IN=b) ;
BY username ;
IF a AND b THEN OUTPUT in.answers_studunc ;
IF a AND NOT b THEN OUTPUT odd ;
RUN ;
The PROC SORT steps sort. The MERGE statement merges. The IN= option creates a temporary variable with the name ‘a’ or ‘b’. You can use any name so I use short ones. If there is a record in both the student record file and the answers summary file then the data is output to a data set of all students with summary of answers.
There should not be any cases where there are answers but no record in the student file. If you recall, that is what set me off on finding that some were still being written to the old server.
LOOK AT YOUR LOG FILE!
There is a sad corner of statistical purgatory for people who don’t look at their log files because they don’t know what they are looking for. ‘Nuff said.
This looks exactly as it should. A consistent finding in the pilot studies of assessment of educational games has been a disconcertingly low level of persistence. So, it is expected that many players quit when they come to the first math questions. The fact that, of the 875 players, slightly fewer than 600 had answered any questions was somewhat expected. As expected, there were no records where there were answers but no record in the student file.
NOTE: There were 596 observations read from the data set IN.ANSWERS_SUMMARY.
NOTE: There were 875 observations read from the data set IN.ALL_FL_STUDENTS.
NOTE: The data set IN.ANSWERS_STUDUNC has 596 observations and 11 variables.
NOTE: The data set WORK.ODD has 0 observations and 11 variables.
So, now, after several blog posts, we have a data set ready for analysis ….. almost.
For more on SAS character functions check out Ron Cody’s paper An Introduction to Character Functions, an oldie but goodie from WUSS back in 2003.
At the Western Users of SAS Software conference (yes, they DO know that is WUSS), I’ll be speaking about using SAS for evaluation.
“If the results bear any relationship at all to reality, it is indeed a fortunate coincidence.”
I first read that in a review of research on expectancy effects, but I think it is true of all types of research.
Here is the interesting thing about evaluation – you never know what kind of data you are going to get. For example, in my last post I had created a data set that was a summary of the answers players had given in an educational game, with a variable for the mean percentage correct and another variable for number of questions answered.
When I merged this with the user data set so I could test for relationships between characteristics of these individuals – age, grade, gender, achievement scores – and perseverance I found a very odd thing. A substantial minority were not matched in the users file. This made no sense because you have to login with your username and password to play the game.
The reason I think that results are often far from reality is just this sort of thing – people don’t scrutinize their data well enough to realize when something is wrong, so they just merrily go ahead analyzing data that has big problems.
In a sense, this step in the data analysis revealed a good problem for us. We actually had more users than we thought. Several months ago, we had updated our games. We had also switched servers for the games. Not every teacher installed the new software so it turned out that some of the records were being written to our old server.
Here is what I needed to do to fix this:
- Download the files from our servers. I exported these as .xls files.
- Read the files into SAS
- Fix the variables so that the format was identical for both files.
- Concatenate the files of the same type, e.g., student file with the student file from the other server.
- Remove the duplicates
- Merge the files with different data, e.g., answers file with student file
I did this in a few easy steps using SAS.
1. Use PROC IMPORT to read in the files.
Now, you can use the IMPORT DATA option from the file menu but that gets a bit tedious if you have a dozen files to import.
TIP: If you are not familiar with the IMPORT procedure, do it with the menus once and save the code. Then you can just change the data set names and copy and paste this a dozen times. You could also turn it into a macro if you are feeling ambitious, but let’s assume you are not. The code looks like this:
PROC IMPORT OUT= work.answers DATAFILE= "C:\Users\Spirit Lake\WUSS16\fish_data\answers.xls"
This assumes that your Excel file has the names of the columns in the first row (GETNAMES = YES). All you need to do for the next 11 data sets is to change the values in lower case – the file name you want for your SAS file goes after the OUT = , the Excel file after DATAFILE = and the sheet in that file that has your data after the RANGE =.
Notice there is a $ at the end of that sheet name.
Done. That’s it. Copy and paste however many times you want and change those three values for output dataset name, location of the input data and the sheet name.
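Put together, the three values sit in a step like the one below. This is a sketch rather than the exact saved code: DBMS = EXCEL assumes SAS/ACCESS Interface to PC Files is available, and the sheet name answers$ is a hypothetical stand-in for whatever your sheet is called.

```sas
PROC IMPORT OUT = work.answers
    DATAFILE = "C:\Users\Spirit Lake\WUSS16\fish_data\answers.xls"
    DBMS = EXCEL REPLACE ;   /* assumption: SAS/ACCESS Interface to PC Files */
  RANGE = "answers$" ;       /* hypothetical sheet name, with the trailing $ */
  GETNAMES = YES ;           /* first row holds the column names             */
RUN ;
```

REPLACE lets you rerun the step without an error when work.answers already exists.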
2. Fix the variables so that the format is identical for both files
A. How do you know if the variables are the same format for each file?
PROC CONTENTS DATA = answers ;
RUN ;
This LOOKS good, right?
B. Look at a few records from each file.
OPTIONS OBS = 3 ;
PROC PRINT DATA = fl_answers_new ;
VAR date_entered ;
RUN ;
PROC PRINT DATA = fl_answers_old ;
VAR date_entered ;
RUN ;
OPTIONS OBS = MAX ;
PAY ATTENTION HERE !!! The OPTIONS OBS = 3 only shows the first three records. That’s a good idea because you don’t need to print out all 7,000+ records. However, if you forget to change it back to OBS = MAX then all of your procedures after that will only use the first 3 records, which is probably not what you want.
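If you are the forgetful type, a different technique from the system option is the OBS= data set option, which limits only the one data set it is attached to, so there is nothing to reset afterward. A minimal sketch using the same data set names:

```sas
/* OBS= as a data set option applies to this step only */
PROC PRINT DATA = fl_answers_new (OBS = 3) ;
  VAR date_entered ;
RUN ;

PROC PRINT DATA = fl_answers_old (OBS = 3) ;
  VAR date_entered ;
RUN ;
```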
So, although my PROC CONTENTS showed the files were the same format in terms of variable type and length, here was a weird thing: since the servers were in different time zones, the time was recorded as 5 hours different, so the same record had a different date_entered value in each file.
Since this was recorded as a character variable, not a date (see the output for the contents procedure above), I couldn’t just subtract 5 from the hour.
Because the value was not the same, if I sorted by username and date_entered , each one of these that was moved over from the old server would be included in the data set twice, because SAS would not recognize these were the same record.
So, what did I do?
I’m so glad you asked that question.
I read in the data to a new data set and the third statement gives a length of 19 to a new character variable.
Next, I create a variable that is the part of the date_entered value that starts at the 12th position and goes for two characters (that is, the value of the hour).
Now, I add 5 to the hour value. Because I am adding a number to it, this will be created as a numeric value. Even though datefix1 is a character variable, since it was created using a character function, SUBSTR, when I add a number to it SAS will try to make the resulting value a number.
Then I set datefixed to the first 11 characters of the original date value, the part before the hour. I’m using the TRIM function to get rid of trailing blanks. I’m concatenating this value (that’s what the || does) with exactly one blank space. Next, I am concatenating this with the new hour value. First, though, I am left aligning that number and trimming any blanks. Finally, I’m concatenating the last 6 characters of the original date-time value. If I didn’t do this trimming and left alignment, I would end up with a whole bunch of extra spaces and it still wouldn’t match.
I still need to get this to be the value of the date_entered variable so it matches the date_entered value in the other data set.
I’m going to DROP the date_entered variable, and also the datefix1 and datefixn variables since I don’t need them any more.
I use the RENAME statement to rename datefixed to date_entered and I’m ready to go ahead with combining my datasets.
DATA ansfix1 ;
SET flo_answers ;
LENGTH datefixed $19 ;
datefix1 = SUBSTR(date_entered,12,2) ;
datefixn = datefix1 + 5 ;
datefixed = TRIM(SUBSTR(date_entered,1,11)) || " " || TRIM(LEFT(datefixn)) || SUBSTR(date_entered,14,6) ;
DROP date_entered datefix1 datefixn ;
RENAME datefixed = date_entered ;
RUN ;
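Adding 5 to the hour as text works as long as the result stays under 24 and keeps two digits; an hour of 04 would come out as 9 rather than 09, and 21 would come out as 26. A sketch of an alternative, under the assumption that date_entered looks like 2016-05-02 21:15:42 (the actual layout may differ), is to convert the string to a SAS datetime, add five hours in seconds and write it back out as text. The sample value is made up.

```sas
/* Made-up value, for illustration only */
DATA _NULL_ ;
  LENGTH datefixed $19 ;
  date_entered = "2016-05-02 21:15:42" ;

  /* Assumes a yyyy-mm-dd hh:mm:ss layout; 5 hours = 5 * 3600 seconds */
  dt = INPUT(date_entered, YMDDTTM19.) + 5 * 3600 ;

  /* Rebuild the text as date, one blank, time with leading zeros */
  datefixed = PUT(DATEPART(dt), YYMMDD10.) || " " || PUT(TIMEPART(dt), TOD8.) ;

  PUT date_entered= datefixed= ;
RUN ;
```

Because the arithmetic is done on a real datetime, a late-evening record rolls over to the next day instead of producing an hour past 23.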
Occasionally, a brave student will ask me,
When will I ever use this?
The “this” can be anything from a mixed model analysis to nested arrays. (I have answers for both of those, by the way.)
I NEVER get that question when discussing topics like filtering data, whether for records or variables, because it is so damn ubiquitous.
Before I headed out to be, literally, testing in the field (you can read why here), I was working on an evaluation of the usability of one of our games, Fish Lake.
My next thought was that many students played the game for a very short time, got the first answer correct and then quit. I decided to take a closer look at those people.
First step: From the top menu select TASKS, then DATA, then FILTER AND SORT.
Second step: Create the filter. Click on the FILTER tab, select from the drop-down menu the variable to use to filter, in this case the one named “correct_Mean”, select the type of filter in the next drop-down menu, in this case EQUAL TO, and in the box, enter the value you want it to equal. If you don’t remember all of the values you want, clicking on the three dots next to that box will bring up a list of values. You can also filter by more than one variable, but in this case, I only want one, so I’m done.
Third step: Select the variables. Steps two and three don’t have to be done in a particular order, but you DO have to select variables or your procedure won’t run, since it would end up with an empty data set. I do the filter first so I don’t forget. I know the filter is the whole point and you’re probably thinking you’d never forget that, but you’re probably smarter than me or have never been rushed.
If you click the double arrows in the middle, that will select all of the variables. In this case, I just selected the two variables I wanted and clicked the single arrow (the top one) to move those over.
Why include correct_mean, since obviously that is a constant?
Because I could have made a mistake somewhere and these records might not all be 100% correct. (Turns out, I didn’t and they were, but you never know in advance if you made a mistake, because if you did know then you wouldn’t make it.)
I click OK and now I have created a data set of just the people who answered 100% correctly.
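If you would rather type than click, the same filter can be sketched in a few lines of code. This is only a rough equivalent of the Filter and Sort task, not the code Enterprise Guide generates. The data set names (summary, perfect), the toy records, and the idea that 100% correct is stored as a correct_mean of 1 are all assumptions for illustration.

```sas
/* Toy stand-in for the summary data set; names and values are hypothetical */
DATA summary ;
  INPUT username $ correct_N correct_mean ;
  DATALINES ;
amy 3 1
ben 5 0.6
cat 1 1
;
RUN ;

/* Keep only the perfect scorers and the two variables of interest */
DATA perfect ;
  SET summary (KEEP = correct_N correct_mean) ;
  WHERE correct_mean = 1 ;   /* assumes 100% correct is stored as 1 */
RUN ;

/* Check that correct_mean really is a constant in the result */
PROC PRINT DATA = perfect ;
RUN ;
```

The KEEP= option does the job of step three and the WHERE statement does the job of step two.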
For a first look, I graphed the frequency distribution of the number of questions answered by these perfect scorers. To do this,
1. Go to TASKS > GRAPH > Bar Chart
2. Click on the first chart to select it, that’s a simple vertical bar chart
4. Under APPEARANCE click the box next to SPECIFY NUMBER OF BARS. The default here is one bar for each unique data value, which is already clicked. Caution with this if you might have hundreds of values, but I happen to know the max is less than 20.
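For anyone working in code instead of the task menus, a similar chart can be sketched with PROC SGPLOT. This is an alternative route, not what the Bar Chart task runs behind the scenes, and the data set name and values below are made up.

```sas
/* Toy data: number of questions answered by each perfect scorer (hypothetical) */
DATA perfect ;
  INPUT correct_N @@ ;
  DATALINES ;
1 1 2 3 3 3 5 8 12
;
RUN ;

/* One vertical bar per distinct value of correct_N, height = frequency */
PROC SGPLOT DATA = perfect ;
  VBAR correct_N ;
RUN ;
```

As with the point-and-click version, one bar per unique value is fine when you know there are only a handful of values.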
I thought I’d find that a bunch answered one question and a few answered all of the questions, and maybe those few were data entry errors, say teachers who tested the game and shouldn’t be in the database. When I look at this graph, I’m surprised. There are a lot more people who had answered 100% correctly than I expected and they are distributed a lot more across the number of questions than I expected. That’s the fun of exploratory data analysis. You never know what you are going to find.
SO, now what?
I want to find out more about the relationship between persistence and performance. To do this, I’m going to need to merge the answers summary data set with demographics.
I’m going to go back to the Summary Data Set I created in the last post (remember that one?) and just filter variables this time, keeping all of the records.
Again, I’m going to go to the TASKS menu, select DATA, then FILTER AND SORT. This time, I’m going to have no filter and just select the variables.
Since the pop-up window opens with the VARIABLES tab selected, I just click the variables I want, which happen to be “correct_N”, “correct_mean” and “username”, click the single arrow in between the panes to move them over, and click OK at the bottom of the pop-up window. Done! My data set is created.
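In code, selecting variables without a filter is just a KEEP= option, and the merge this is all leading up to is a sort and a MERGE by username. Here is a minimal sketch under stated assumptions: the data set names, the extra_var and grade variables, and the toy records are all hypothetical, and the real demographics file may be keyed differently.

```sas
/* Toy stand-ins for the two data sets; names and values are hypothetical */
DATA summary ;
  INPUT username $ correct_N correct_mean extra_var ;
  DATALINES ;
ben 5 0.6 7
amy 3 1 9
;
RUN ;

DATA demographics ;
  INPUT username $ grade ;
  DATALINES ;
amy 4
ben 5
;
RUN ;

/* Keep all records but only the three variables needed */
DATA summary_small ;
  SET summary (KEEP = username correct_N correct_mean) ;
RUN ;

/* Both data sets must be sorted by the key before a match-merge */
PROC SORT DATA = summary_small ;
  BY username ;
RUN ;
PROC SORT DATA = demographics ;
  BY username ;
RUN ;

DATA combined ;
  MERGE summary_small demographics ;
  BY username ;
RUN ;
```

Keeping username in the smaller data set is what makes the merge possible later, which is why it is on the list even though it is not being analyzed.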
You can always click on PROGRAM from the main menu to write code in SAS Enterprise Guide, but being an old dinosaur type, I’d like to export this data set I just created and do some programming with it using SAS. Personally, I find it easier to write code when I’m doing a lot of merging and data analysis. I find Enterprise Guide to be good for the quick looks and graphics but for more detailed analysis, the old timey SAS Editor is my preference. If you happen to be like me, all you need to do to output your data set is click on it in the process flow and select EXPORT.
You want to export this file as a stand-alone data set, not as a step in a project. Just select the first option and you can save it like any file, select the folder you want, give it the name you want. No LIBNAME statement required.
And it’s a beautiful sunny day in Santa Monica, so that’s it on this project for today.