Data Analysis by Example: That’s funny …
June 26, 2016 | 2 Comments
In the last post, I used SAS Enterprise Guide to filter out a couple of ‘bad’ records that came from test data, then I created a summary table of the number of questions answered and the percentage correct. Then, I calculated the mean percentage correct for the subjects, which came to around 84%. That seemed a bit high to me.
Having (temporarily) answered the first question regarding the number of individual subjects and the average percent of correct answers from the 424 subjects, I turned to the next question:
Is there a correlation between percentage correct and the number of questions attempted? That is, do students who are getting the answers correct persist more often?
Since I had both variables, N and the mean correct (which, since this was scored 0 = incorrect, 1 = correct, gave me the percentage correct), from the summary tables I had created in the previous step, it was a simple procedure to compute the correlation.
I just went to the TASKS menu, selected MULTIVARIATE and then CORRELATIONS.
Under ANALYSIS VARIABLES I dragged correct_N, the N for the ‘correct’ variable, which holds whether the student answered correctly, 0 (= no) or 1 (= yes). Under CORRELATE WITH I dragged correct_mean, which has the percentage each student answered correctly.
Since it is just a bivariate correlation and the correlation of X with Y = the correlation of Y with X , it would make absolutely no difference if I switched the spots where I dragged the two variables.
I clicked RUN and got a somewhat unexpected result: a correlation of -.07.
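If you would rather see the computation than the menus, here is a minimal Python sketch of a bivariate Pearson correlation. It is not SAS and not the code Enterprise Guide generates; the function name `pearson_r` and the six made-up students are assumptions for illustration only, so the numbers it prints have nothing to do with the -.07 from the real data. It also shows that swapping the two variables changes nothing.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Made-up per-student values (hypothetical, NOT the project data)
correct_n = [1, 3, 5, 8, 12, 20]                   # questions attempted
correct_mean = [1.0, 0.67, 0.8, 0.75, 0.92, 0.85]  # proportion correct

r_xy = pearson_r(correct_n, correct_mean)
r_yx = pearson_r(correct_mean, correct_n)  # same pair, roles swapped
print(round(r_xy, 3), round(r_yx, 3))
```

Both printed values are identical, which is the point about X with Y versus Y with X.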
I also note that the minimum number of answers attempted is 1. Now, I have done (and published) analyses of these data elsewhere, as this is an on-going project.
Other analyses from this same project can be found in:
Telling Stories with Your Data and
Because of these analyses of ‘Fidelity of Implementation’, that is the degree to which a project is implemented as planned, I am pretty sure that these data include a large proportion of students who only had the opportunity to play the game once.
So … I decided to run a scatter plot and check my suspicion. This is pretty simple. I just go to the TASKS menu and select GRAPH then SCATTER PLOT.
I selected 2-D Scatter Plot
Then, I clicked on the DATA tab, dragged correct_mean under Horizontal and correct_N under Vertical, then clicked RUN.
This produced the graph below.
Now, this graph isn’t fancy but it serves its purpose, which is to show me that there IS in fact a correlation of mean correct and the number of problems attempted. Look at that graph a minute and tell me that you don’t see a linear trend – but it is pulled off by the line of 1.0 at the far end.
This did NOT fit my preconceived notion, though, that the lack of correlation was due to the players who played once, and so there would be a bunch of people who had answered 1 or 2 questions and got 100% of them correct. Actually, those 100-percenters were all over the distribution in terms of number of problems attempted.
This reminds me of a great quote by Isaac Asimov,
The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ (I found it!) but ‘That’s funny …’
Well, we shall see, as our analysis continues …
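The suspicion I checked with the scatter plot, that the students with 100% correct were mostly the ones who answered only 1 or 2 questions, can also be checked by counting. Here is a rough Python sketch of that idea (again, not SAS). The `students` list is made up and the function name is hypothetical; the real data set has one such pair of values for each of the 424 subjects.

```python
from collections import Counter

# Hypothetical (correct_n, correct_mean) pairs, one per student; NOT the real data
students = [(1, 1.0), (1, 0.0), (2, 1.0), (4, 0.75), (9, 1.0), (15, 0.8), (22, 1.0)]

def perfect_scorers_by_attempts(rows):
    """Count students with a mean of exactly 1.0 at each number of attempts."""
    return Counter(n for n, mean_correct in rows if mean_correct == 1.0)

counts = perfect_scorers_by_attempts(students)
for n in sorted(counts):
    print(n, counts[n])  # number attempted, how many 100-percenters

# How many of the 100-percenters attempted only 1 or 2 questions?
few_attempts = sum(c for n, c in counts.items() if n <= 2)
print(few_attempts, "of", sum(counts.values()))
```

If the perfect scores were all one-time players, nearly every count would sit at 1 or 2 attempts. In the toy data, as in my scatter plot, they are spread across the range.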
Want to see these data at the source?
You can also follow the link above to donate a copy of the game to a school or give as a gift.
Jun
20
The government is extremely fond of amassing great quantities of statistics. These are raised to the nth degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases.
Josiah Stamp
Any time you do anything with any data your first step is to consider the wisdom of Sir Josiah Stamp and check the validity of your data. One quick first step is using the Summary Tables task from SAS Enterprise Guide. If you are not familiar with SAS Enterprise Guide, it is a menu-driven application for using SAS for data analysis. You can open a program window and write code if you like, and I do that every now and then but that’s another post. In my experience, SAS Enterprise Guide works much better with smaller data sets, defined by me, as the blog owner, as fewer than 400,000 records or so. Your mileage may vary depending upon your system.
How to do it:
1. Open SAS Enterprise Guide.
2. Open your data set (FILE > OPEN > DATA).
3. From the TASKS menu, select DESCRIBE and then SUMMARY TABLES. The window below will pop up.
4. Drag the variables to the roles you want for each. Since I have fewer than 450 usernames here, I just quickly want to see whether there are duplicates or errors (e.g. ‘gret bear’ is really the same kid as ‘grey bear’, with a typo). I also want to find out the number of problems each student attempted and the percent correct. So, I drag ‘username’ under CLASSIFICATION VARIABLES and ‘correct’ under ANALYSIS VARIABLES. You can have more than one of each but it just so happens I only have one classification and one analysis variable I’m interested in right now.
5. Next click on the tab at left that says SUMMARY TABLES and drag your variables and statistics where you want them. I want ‘username’ as the row, so I drag it to the side, ‘correct’ as the column, N is already filled in as a statistic if you drag your classification variable to the table first. I also want the mean, so I drag that next to the N. Then, click RUN.
Wait a minute! Didn’t I say I wanted the percent correct for each student? Why would I select mean instead of percent?
Because the pctN will simply tell me what percent of the total N the responses from this username make up. I don’t want that. Since the answers are scored 0 = wrong, 1 = right, the mean will tell me what percentage of the questions were answered correctly by each student. Hey, I know what I’m doing here.
6. Look at the data! In looking at the raw data, I see that there are two erroneous usernames that shouldn’t be there. These data have been cleaned pretty well already, so I don’t find much to fix. Now, I want to re-run the analysis deleting these two usernames.
7. At the top of your table, you’ll see an option that says “Modify Task”. Click that.
8. You’ll have the summary tables window pop up, this time with your data filled in. Click on the edit button at the top right of this window. You are about to create a task filter.
9. Under TASK FILTER pull down the first box to show the variable ‘username’. Pull down the second box to show the option NOT EQUAL TO and then click the three dots next to the third box. This will pull up a list of all of your values for usernames. You can select the one you want to exclude and click OK. Next to the three dots, pull down to select AND, then go through this again to select the second username you want to delete. You can also just type in the values, but I tend to do it this way because I’m a bad typist with a bad short-term memory.
10. Create a SAS dataset of the output. It’s super easy. Click on the RESULTS tab to the left and in the window that pops up click SAVE RESULTS TO A DATA SET. Then, click RUN.
11. The most recently created data set should be your default data set for analysis, but click on it in your process flow diagram to activate it just in case.
12. From the DESCRIBE menu again, select SUMMARY STATISTICS.
13. Drag ‘correct_mean’ under ANALYSIS VARIABLES and click RUN.
The resulting table gives me my answer – the mean is .838 with a standard deviation of .26 for N=424 subjects. So … the average subject answered 84% of the problems correctly. This, however, is just the first step. There are a couple more interesting questions to be answered with this data set before moving on. Read the next step here.
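For anyone who thinks better in code than in menus, the steps above can be sketched in a few lines of Python. This is not SAS and not what Enterprise Guide runs behind the scenes. The records, the username ‘test1’ and the function name `summarize` are all made up for illustration. The sketch does the same three things: N and mean of ‘correct’ per username, a filter that drops the erroneous usernames, and then the mean and (sample) standard deviation of the per-student means.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical raw records: (username, correct), scored 0 = wrong, 1 = right
records = [
    ("grey bear", 1), ("grey bear", 1), ("grey bear", 0), ("grey bear", 1),
    ("red hawk", 1), ("red hawk", 0),
    ("test1", 0),  # stand-in for an erroneous username
    ("blue deer", 1),
]
excluded = {"test1"}  # plays the role of the task filter (username NOT EQUAL TO ...)

def summarize(rows, exclude=()):
    """Return {username: (N, mean of correct)} for usernames not excluded."""
    by_user = defaultdict(list)
    for username, correct in rows:
        if username not in exclude:
            by_user[username].append(correct)
    # With 0/1 scoring, the mean is the proportion answered correctly
    return {user: (len(vals), mean(vals)) for user, vals in by_user.items()}

summary = summarize(records, excluded)
student_means = [m for _, m in summary.values()]

print(summary)
# Mean and sample standard deviation of the per-student proportions
print(round(mean(student_means), 3), round(stdev(student_means), 3))
```

Note that the final mean averages the students, not the individual answers, so a student who answered one question counts as much as a student who answered twenty.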
————–
Want to play the game that produced these data? Own a Mac or Windows computer? Have ten bucks?
Success in Parenting Isn’t What You Think
June 6, 2016 | 1 Comment
It’s been a good week for the darling daughters.
The Spoiled One graduated summa cum laude, also president of the senior class, and is heading to the east coast to attend a small liberal arts college where she has an academic scholarship and a spot on the soccer team.
The book co-authored by Darling Daughter One and Darling Daughter Three won International Sports Biography of the Year, and the two lovelies pictured above flew to London to receive the award.
The Perfect Jennifer has tenure now and is finishing out another year of being an outstanding teacher.
A couple of years ago, there was a book with the thesis that Chinese mothers are superior and all Americans are raising a bunch of lazy slackers. It irritated me and I wrote a blog with the title “Why American mothers are superior” because that seemed more professional than “Go Fuck Yourself” . And no, in all seriousness, I really don’t think that one race or country has better mothers, but I also think the idea that if we don’t regiment our children lock-step for 18 years straight into MIT we are a bunch of losers is irritating as fuck.
You might think this is my rubbing it in post to say, “How you like me now? My kids are doing awesome.”
You’d be wrong. To paraphrase Erma Bombeck yet again, no mother should ever be arrogant because she can’t be sure that at any moment the principal won’t call to tell her that one of her children rode a motorcycle through the gymnasium.
I wanted to talk about something different – definitions of success that Tiger Mom Lady probably would not understand at all.
A friend of mine has a son in his mid-twenties who lives at home. He earned a degree from a two-year college. He is not crushing it as a hedge fund manager, but rather, has a regular job with benefits. I’m sure Tiger Mom would be dismayed if he was her kid.
My friend was distraught over the situation at work. The company had been acquired and reorganized. Her new boss was a nightmare and she came home in tears more often than not. Despite over a decade of good performance, she was afraid she was going to be laid off and was becoming depressed and stressed. They couldn’t afford to make the payments on their house on one income, and they had already lost a home back in 2008 when the housing market imploded. They were the collateral damage of those hedge fund managers.
It was at this point that her son (remember him?) stepped up. He had been living at home to save money for a down payment on a house of his own. Since he is single, has no children and gets along well with his parents, it seemed like a good arrangement, and he was paying them rent, but a lot less than it would cost to go out and get his own apartment. Plus, there were those home-cooked meals. He said something like this,
Look, you took care of me for 26 years. I make enough money now to cover the mortgage. If you are that unhappy about your job, quit. Even if you don’t quit your job, at least quit worrying about being laid off. I’ll pick up any slack. Between Dad and me, we got you covered.
Look at this family – they all love each other, the mom, dad and son. They get along well enough that he feels comfortable living at home to save money. Her son is hard-working and appreciates the fact that his parents have done what they could to support him. He can take the perspective of another person, see the stress his mother is experiencing and offer to do what he can to alleviate it out of appreciation for what they have done for him.
In my view, my friend is a success as a mother and her son is a success as a human being.
———