Thank you, God, for St. Mary’s Catholic School, where someone back in the 1960s decided that a speed-reading machine and programmed learning were a good idea. My father used to say that if they put words on toilet paper, I’d read it. I still read six or seven books a week, selected pretty much at random. Here are some books I read lately that I just happen to like.
Applied statistics and the SAS programming language by Ronald Cody and Jeffrey Smith
Experienced programmers will probably find several chapters too basic, but that’s okay, I just skipped to the interesting ones.
Consequential strangers by Melinda Blau & Karen Fingerman
About the importance of acquaintances in our lives, those people like the hairdresser or dentist you have gone to for ten years, who are not quite friends but not really strangers, either.
The End of Your Life Book Club by Will Schwalbe
I doubt Schwalbe, who had a privileged upper middle class childhood, realized how interesting that would be to the other 95% of the world – the children of the secretaries, housekeepers and gardeners who wondered what it would be like to grow up in one of those nice houses with a grand piano – and that part is just described in passing. His mother, who died of cancer, had a fascinating life – from Harvard admissions director to years of work with refugees. Throughout, he adds discussions of the books they read together to pass the time while waiting for her chemotherapy.
jQuery UI by Eric Sarrion
It’s billed as a beginner to intermediate book on the jQuery user interface, and that it is. I liked it because you can breeze through it in a day or two, and each chapter is pretty self-contained so if you are interested particularly in animation, for example, you can skip to that chapter.
A really insightful book on how games can be applied to real world problems, why some games work, what we like about them. After the first 250 pages or so, I thought it got to be redundant. The point I liked the best is how sometimes failing spectacularly can be reinforcing, so that people will want to keep playing a game even more after they get killed (virtually speaking, that is).
So … what have you been reading lately?
P.S. Does anyone know why they don’t have speed reading programs in school any more? Being able to read hundreds of words a minute saved my ass from high school all the way through three graduate degrees. I wasn’t really the most motivated student for much of those years and if I’d had to spend the hours reading that many of my classmates did, I think I would have given up and gotten a job selling fish and chips or something.
I tried to find an easily comprehended explanation of the F-statistic for my students but I could not, so, here as a public service is mine. If you have some other pages you can recommend, please let me know.
Okay, why ANOVA? Why not just do a t-test? Well, let’s say you have five groups. Then you will have ten pairwise comparisons. You compare group 1 to groups 2, 3, 4 and 5. That’s four. Now you compare group 2 to groups 3, 4 and 5. That’s another three t-tests. And so on. So now, you don’t really have a 5% probability of a type I error when p = .05 because you actually had TEN tests. If you did 100 tests, you’d expect five of them to turn out significant just by chance. So, let’s just accept that many pairwise tests = bad.
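To put a number on "many pairwise tests = bad", here is a minimal Python sketch. It counts the pairwise comparisons for five groups and estimates the chance of at least one type I error across all of them. The function name is made up for illustration, and the calculation assumes the tests are independent, which pairwise t-tests on the same groups are not, so treat the result as a rough guide rather than an exact figure.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one false positive).

    Assumption: the tests are treated as independent, which is only
    approximately true for pairwise t-tests that share groups.
    """
    n_tests = comb(n_groups, 2)          # number of distinct pairs of groups
    fwer = 1 - (1 - alpha) ** n_tests    # 1 minus the chance of no false positives
    return n_tests, fwer


n_tests, fwer = familywise_error_rate(5)
print(f"{n_tests} pairwise t-tests, chance of at least one type I error = {fwer:.3f}")
```

With five groups and alpha at .05, the chance of at least one spurious "significant" result comes out at roughly 40%, not 5%.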
Enter ANOVA, short for Analysis of Variance. Let’s talk about a one-way ANOVA for now. You have a continuous, numeric dependent variable – say height. You have a categorical independent variable with two or more levels. You could do ANOVA with just two levels but in that case you might as well do a t-test. In this case, let’s assume that we have children raised eating an unrestricted diet, children who were raised vegetarian and children who were raised vegan. At age 10, we decide to measure all of their heights.
What is our null hypothesis? It is that there is no difference among the means, or
μ1 = μ2 = μ3
Enter the F-test. We are going to state that if there is no difference in the means then the estimate of variance you get from the difference in group means should be the same as the estimate of the population variance you get within groups. The F statistic is calculated like this
F = variance between groups / variance within groups
If the null hypothesis is correct, these two estimates of the variance should be close to the same and your F ratio should be near 1.0
How to get the within group variance
Well, it’s just like any other time you get a variance. Imagine that group 1 is a sample for a study. What do you do? You sum the squared deviations from the mean and divide by n minus 1, right?
That gives you the within group variance for group 1. You do the same thing for group 2 and group 3.
BUT … not all groups are created equal. What if you have five times as many people in group 3 as you do in group 1 and group 2?
Being the reasonable person you are, you weight each within group variance by the degrees of freedom of its group, that is to say, the number of subjects in that group minus 1. You add up those weighted variances and divide the sum by the total number of subjects minus the number of groups. This is your within group estimate of the variance. This is your denominator. Let’s say that the value you get for this is 42.
Now you need the between groups variance
First, subtract each group mean from the overall mean. Square that.
Second, multiply each squared difference by the number in that group
Third, add up the results
Fourth, divide by the number of groups minus 1
Let’s just suppose, for the sake of supposing, that the value you get for this is 108. Your F-ratio is then 108/42 = 2.57
And that, my dears, is how you get an F value.
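If you would like to see those steps run, here is a minimal Python sketch of the one-way F ratio computed by hand. The heights are invented purely for illustration (they are not from any real study), and the function and variable names are my own assumptions, not anything from a statistics package.

```python
from statistics import mean, variance


def one_way_f(groups):
    """Return (between-groups variance, within-groups variance, F ratio)."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # total number of subjects
    grand_mean = mean([x for g in groups for x in g])

    # Within: weight each group's sample variance by its degrees of freedom,
    # add them up, then divide by total subjects minus number of groups.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    ms_within = ss_within / (n_total - k)

    # Between: squared distance of each group mean from the overall mean,
    # times the group size, summed, then divided by number of groups minus 1.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)

    return ms_between, ms_within, ms_between / ms_within


# Hypothetical heights in centimeters at age 10 (made-up numbers).
unrestricted = [140, 138, 142, 145, 135]
vegetarian = [137, 139, 141, 136, 142]
vegan = [134, 138, 136, 140, 132]

ms_between, ms_within, f_ratio = one_way_f([unrestricted, vegetarian, vegan])
print(f"between = {ms_between:.2f}, within = {ms_within:.2f}, F = {f_ratio:.2f}")
```

To turn the F ratio into a p-value you would still look it up against an F distribution with (groups minus 1) and (subjects minus groups) degrees of freedom, which this sketch leaves out.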
A Pew Research Poll asked 1,201 adults
“All in all, do you think affirmative action programs designed to increase the number of black and other minority students on campus are a good thing or a bad thing?” Sixty percent said good, 30% said bad and 10% said don’t know. Let π denote the population proportion who said it is good. Find the p-value for testing Ho: π = 0.50 against Ha: π ≠ 0.50. Interpret.
Going back to our previous discussion of steps in solving a problem, let’s look at these one at a time as they apply to this question about proportions.
1. Chill. I have accomplished this by sitting here with a cup of Chamomile tea and a glass of Chardonnay. I intend to finish both of them before this post. In all seriousness, though, I think students often get problems wrong just because they panic. What do I do? What do I do? Relax. Take a sip of wine. Okay, better? Yes, and the picture above is my daughter and her friend, just goofing off in the photo shoot for my book. Seriously. Chill.
2. Understand the problem.
60% said affirmative action was good
30% said it was bad
10% were undecided
The question asks you to test the PROPORTION (not the mean) who said affirmative action was good. This has nothing to do with the 30% or 10%. You do NOT want to compute a chi-square here. That would test if there was an association between two variables, the respondents’ rating of affirmative action and some other variable. You do not have any other variable.
3. Select a strategy.
Identify the statistic you need, the formula you will need to obtain that statistic and the numbers you need to find to plug into that formula.
Then, compare the obtained statistic to the table of critical values.
You need a z-value. To get a z-value you use this formula:
z = (Obtained proportion – Hypothesized population proportion) / Standard error
Since it starts with a P for proportion and population, let’s call the hypothesized population proportion ∏
To get the standard error, you use this formula
The SQUARE ROOT OF ( ∏ * (1 – ∏) / n ), where n is the number of people in the sample
4. Execute the strategy
First, I need to find the standard error. My ∏ value is .50 – remember, that is the hypothesized population value, not the value you obtained.
So — I take .5 * (1-.5) and get .25
I divide that by 1,201 and get .000208 or thereabouts.
I take the square root of that and get .014
Now, I have my standard error. I think bells should ring at this point, but they did not. I was sadly disappointed so I drank some more Chardonnay to get over it.
Now, I calculate my z-value by plugging in more numbers. The obtained value is .60, the hypothesized value (∏ ) is .50 and my standard error is .014
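z = (.60 – .50) / .014
that equals 6.93 if you carry the unrounded standard error of .01443 through the division (plugging in the rounded .014 gives 7.14)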
I compare that to a z table in a handy dandy statistics textbook, which only goes up to 5.0 but that has a probability of way less than .0001, so I call it a day, saying that it is extremely unlikely one would get a proportion of 60% in a sample of 1,201 people if the population proportion was truly 50%. This assumes all of the usual suspects, that is no bias in the wording of the question, random sampling.
5. Test it. Evaluate your answer. The first thing I always do is a reality check. Unless I’ve had a LOT of glasses of Chardonnay, I can generally perceive reality fairly well. Does it make sense that a sample that large would be that far off? No, not really. So, it does seem pretty likely that if the obtained proportion from well over 1,000 people randomly sampled is 60% it is not as low as 50% in the population.
Another way I might test it is to throw it into some statistical software and see if I get the same answer. Maybe if I’m feeling ambitious, I’ll do that tomorrow. Sadly, I am now all out of chamomile tea. Happily, there is still more wine.
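For anyone who wants to check the arithmetic without waiting for me, here is a minimal Python sketch of the same one-proportion z-test, using only the standard library. The function name is my own invention, and the two-sided p-value comes from the normal approximation, which is an assumption that is reasonable with a sample of 1,201.

```python
from math import erfc, sqrt


def one_proportion_z_test(p_hat, p0, n):
    """Return (standard error, z, two-sided p-value) for Ho: proportion = p0."""
    se = sqrt(p0 * (1 - p0) / n)         # standard error uses the hypothesized value
    z = (p_hat - p0) / se
    p_value = erfc(abs(z) / sqrt(2))     # two-sided tail area under the normal curve
    return se, z, p_value


se, z, p_value = one_proportion_z_test(0.60, 0.50, 1201)
print(f"standard error = {se:.4f}, z = {z:.2f}, two-sided p = {p_value:.2e}")
```

The z it reports agrees with the 6.93 worked out by hand, and the p-value is far below .0001.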
Here we have analysis of open data using free software with – uh, SAS?
Click the links below and watch the videos. Seriously. They are too large to embed in the post. Sorry.
Yes, you might think of SAS as the choice of multinational corporations with unlimited software budgets. You now have two options, if you are a student or faculty member, and those are either
- SAS web editor – which is fast and runs on both Windows and Mac (hurray!) but does require more knowledge of programming, OR
- SAS Enterprise Guide – which is MUCH slower in the typical university environment where it seems to be an accrediting body requirement that your wireless speed has to blow, but EG doesn’t require much programming, is much more pointy and click-y, which makes some people very happy. It also includes a process flow diagram which is like a security blanket for people in management who have some weird kind of Freudian attachment to Microsoft Project imitators.
If you haven’t seen the new SAS web editor, I highly recommend you take a peek at this video on how to do a regression analysis with the SAS web editor.
I did it for my class but it nicely demonstrates how easy it is to get a quick view of your data with the web editor. This is a decent size data set of actual data from the 2007 TIMSS study. I did reduce it down to a few dozen variables. It’s really good because it has actual problems like user-written formats, missing data, non-obvious coding. This is good because my biggest complaint in hiring new graduates is they have only used data in the back of the statistics textbook and they have no idea how to work with data collected from actual human beings.
You can compare this first video to doing the same analysis with SAS On-Demand for SAS Enterprise Guide, another video I made for the same class. You can see that SAS Enterprise Guide takes longer and this was recorded in my office where we have an extremely good Internet connection. I was NOT using the wireless which seems to be pathetically slow at every educational institution where I have ever been. One of the reasons that I record these for the class is that with SAS Enterprise Guide it just takes so-o-o long. As I say on the video, I could sing Christmas carols while waiting for the results, if I could sing.
So, this semester I have used both options, but presuming it gets out of beta and is available next year, I’m thinking about using the SAS web editor for my next class. Even though it does require some programming, I think the increase in speed, use across all operating systems and lack of problems in installation make up for it.
Anyone else who has used one or both of these, please chime in with your opinions.
Last week, I wrote about my disagreement with those who want to go out and hire a code monkey. Being deeply immersed in writing a computer game to teach kids math, here is my perspective from the monkey cage on the benefits of coding your own stunts.
- I like it. This seems to be a greatly under-rated reason. Every night, I have to force myself to quit working and go to bed, take a break, read a book.
- It’s WAY faster to integrate. I have an office downstairs. The Rocket Scientist works upstairs. Several times a day one of us will wander up or down the steps and say, “I was thinking of doing X, what do you think?” We can integrate my code and his code over dinner, over coffee, over cognac, while one of us is riding the exercise bike in the living room.
- It’s way faster to innovate. If I have an idea at 10 pm, it may be done by 2 am because I sat down and worked on it. I didn’t have to call anyone, schedule a meeting, write specifications.
- It’s way faster to iterate. We are in beta test mode right now. If the consultant at the school calls in the evening I can usually have whatever he needs done before school opens the next morning. If it’s more complicated, it might take a week or two.
- We can give firm deadlines. If I tell our consultant at the school when something will be done, I can be really close to that estimate because I am not relying on too many other people for the core competency to deliver it. I may need some graphics or animation work done, in which case I always ask the folks working on that, and, if necessary, we can almost always work around them.
- Our out of pocket costs are less. We do cost, because when either of us works on this game we are not doing work on other contracts that would pay us money. We could definitely hire someone to work for less than we charge on our consulting contracts, but I am almost certain we could not get anyone to do the same quality of work for the cost of our foregone income.
- The communication is better. Rather than telling someone what I have in mind, I write it myself. Similarly, when the Rocket Scientist wants to change the look of something, add a different weapon or obstacle, he just does it. We do have to talk to each other and the graphic artists/ animators. Still, the fewer people you need to go through, and the more of a concrete product you have to show those people, the fewer misunderstandings occur.
- You can’t buy commitment. This isn’t the first time we have found a bug around 2 a.m. The Rocket Scientist suggested he work all night fixing everything he can while I sleep and then I can get up and test everything in the morning. There have been plenty of times when I thought, “Oh, I don’t really need to add music there”, or “That doesn’t have to have logic to give progressively harder or easier problems” but then, of course, caught myself and went ahead and did it. Any time I have an idea to make the game better, I either implement it right then or, if it is too close to a deadline to get done, add it to my list of stuff to put in a future release.
There are down sides, too, don’t get me wrong. The only one I can think of – but it is huge- is that all of this coding takes time away from other things. If everyone who told me we should be on Kickstarter gave me $20 we’d never need Kickstarter! I really like teaching statistics and writing about statistics, and I don’t get time to do as much of that as I would like. I’ve gone to a couple of meet-ups and talked to a couple of investors, but I really don’t have much time to do that. I need to finish the Phase II application, write another application, should probably be doing more on the marketing side, more on the design side – in short, coding takes you away from all of the other parts of running a business. I have no good solution other than what I do now, which is set aside large blocks of time right after an update – like next week, to do some of those business-side responsibilities. Hopefully, someone to take up some of those responsibilities will be joining the company very soon and one of our existing staff members will be picking up new responsibilities as well.
There are tasks, though, that no one but me is going to take over. Someone else may edit it, add some figures or add up the budget, for example, but no one else is going to write our grant proposals but me. Still, I think all of the responsibilities that I retain I will do better BECAUSE I’m involved in the coding as well.
I admit that some months I am so busy that I toss Significance out without reading it – this is the magazine of the American Statistical Association (ASA) and Royal Statistical Society. No, I don’t pile up things to read later because I never do read them later.
Anyway … taking two days off work, I have been doing a lot of reading. Here is just one example of why Significance is a treat, from a discussion of how many school playing fields really had been sold, was it 5,000 or 10,000 or 500? Where did these published numbers come from?
“It seems almost literally the case that someone made a flagrant guess, added some wild assumptions, extrapolated from a year and a half’s wrong data to 20 years of even wronger data, then divided by 2 for luck and rounded it down. For further details of this rather wonderful method of not getting anything right, see the Royal Statistical Society’s Getstats website. ... “
Wins the prize for combining scholarly and snarky. One article on research on anti-depressants, I copied and sent to a mental health advocate with a special interest in this area. A second article on yelling at the radio and twelve criteria for judging the value of a health study, I scanned and uploaded as recommended reading for my class. There were other articles, like the use of data “sonification” as well as visualization – which is something you have engaged in any time you took your car in because the engine “sounded funny”.
In my opinion, ASA has come a long way in popularizing statistics. The magazine is one example. I had signed up for ASA in January because I went to a local chapter meeting where the new president, Bob Rodriguez, gave a talk. Since then, I went to a data hack-a-thon I heard about on an ASA email group – as an interested observer, because it was close by – and attended JSM in San Diego, where I was even a discussant on a panel.
My personal opinion is that ESPECIALLY IF YOU ARE A STUDENT, it’s worth signing up. The student membership is $15! There is even a $50 membership for the year after you graduate.
They offer a number of meetings each year. I’m still debating on whether I have time to take off to go to the Joint Statistical Meetings next year in Montreal. There is a Conference on Statistical Practice that has a very applied focus. It looks like it would be recommended for people starting out in the field.
I run into a lot of people who ask me how they could learn more about statistics, because they are in graduate school or working in a field outside of statistics but it would be very helpful for them to understand statistical analysis, results and caveats. My advice for today is to join ASA, especially if you are a student. Hell, you probably spent more than $15 on lattes to pull an all-nighter for that last exam.
According to my partner, Dr. Erich Longie, the Dakota believe that how one acts immediately after the death of a loved one is how a person will act for the rest of his/her life. Thus, when someone dies, it is a very bad idea to get drunk, sleep around or turn to other dysfunctional ways of dealing with grief, lest you end up being that kind of person forever.
Well, I’m not Native American, but I can tell you that when my husband died 17 years ago, I worked pretty much from when I woke up in the morning until I fell asleep from exhaustion around 2 a.m. the next morning, then did it all over again. In two years, I paid off over $40,000 in medical bills, funeral bills, taxes due the IRS. As Erich might have predicted, I pretty much kept on that path for the next fifteen years. Originally, when I decided to split off from Spirit Lake Consulting, Inc, I thought I would retire. Within less than three months, we had spun off the satellite office in Santa Monica as a separate company, The Julia Group, AND I had taken a full-time position at USC.
Right now I’m teaching a course at Pepperdine University along with being principal investigator of a research grant and our usual consulting contracts. Oh, yes, and I wrote a book on matwork for judo, grappling and mixed martial arts that should be coming out soon. Except for my aborted retirement attempt, I’ve worked more than one job for the past 17 years.
I’ve really been trying to work less. I went on a vacation this summer to all sorts of lovely places. Of course, right before I went we received an SBIR award so I spent my vacation as a “laptop with a view”.
The last revision for the book I co-authored on matwork went to the editor a few weeks ago. Yesterday, the Rocket Scientist insisted we get version 2.0 of the game done and shipped out on Wednesday afternoon so we would have nothing to do on Thanksgiving. As of 4 pm yesterday, I decided to take two days off. Now, it has been pointed out to me that other people do this, take off from the afternoon of a particular day, then don’t work the next two days. Maria (a.k.a. darling daughter number one) reminds me that this is called a “weekend”. Before the shortest retirement in history, the only time in the past several years I had taken off over 48 hours in a row was when the same daughter number one insisted that we take a cruise to the Bahamas and that I NOT open my computer once during the four days we were gone. That was about seven years ago.
Obviously, I have been a huge failure at retirement. So, I thought if I was going to ever have the option of doing anything but working in my life, I should make an attempt to, well, do anything but work. So far, my day of not working has gone like this:
- Finished The End of Your Life Book Club
- Read a mystery novel, Sweet Revenge (There is a whole genre of mystery novels with recipes? Who knew?)
- Did The Spoiled One’s laundry, which she had saved up, since she is home for five days and brought no clean clothes (but a basket of dirty ones).
- Cleaned up the downstairs aftermath of The Spoiled One’s sleepover with her best friend
- Changed the office guinea pigs’ cage
- Made Thanksgiving dinner for eight
- Saw all of my wonderful children, even darling daughter number one and the genius grandchildren popped in for half an hour via Skype
- Had a terrific Thanksgiving dinner
- Watched Addams Family Values with my lovely family
- Cleaned up after dinner and made turkey soup
- Read the book Unaccustomed Earth
- Read The Incorporated Knight
Yes, I do read very, very fast. So far, I have not done any work other than read and respond to one email from a student. Tomorrow, I plan to go to the Iliad bookstore in North Hollywood with The Perfect Jennifer (a.k.a. darling daughter number two), go out drinking with a couple of friends and the Rocket Scientist, and write an article for Black Belt magazine. I am not sure I have this whole weekend concept down. I really don’t like not working, to be honest. Still, I think I’m going to try to do this a couple of times a month. I can guarantee that within a few months my house will be spotless and perfectly organized. After that, maybe I’ll learn to knit.
Note: I did actually take four days off work in 2009, but that doesn’t really count since I had my knee replaced and was either being operated on or in the hospital being given large doses of – I think it was – morphine, for three of them.
Having received email from readers who believe that Spirit Lake: The Game is exploitative or disrespectful of Native Americans, I asked the individual who designed the Level 5 problem shown in my blog post of November 18 to respond in a guest post.
Dr. Erich Longie
Although this game is still in its early development stages the Native American students we are working with are really excited about it. What is exciting to us is that they think it’s entertainment when it’s actually a math game. However, what is most important to me is that the game’s theme is based on our Dakota culture. Students will be exposed to our culture from my, a Dakota wicasa’s (Indian man’s) point of view. Not the romanticized version, not the Indians are uncivilized savages version, but my version.
Take the problem AnnMaria talks about in this blog as an example. In the final version the students will learn that horse stealing was an honorable sport among the Plains tribes. And only the bravest warriors crept into the villages to steal the most valuable horses, the war horses. They will learn that courage was the most desirable of all values. And, death in battle was preferred to dying of old age, and counting coup on the enemy without killing that enemy (horse stealing as an example) was the bravest act of all.
The game successfully combines culture, math skills and entertainment to make learning math interesting and fun.
The beginning of an explanation of number lines, given below, illustrates how culture is infused throughout our game.
There were no street lights or road signs when Tasina and Hoksinato lived on the prairie. Yet, they were able to walk to the river in darkest night to fetch water and back again without getting lost. How did they do it? When the sky was clear they used the stars to guide them and then they counted the number of steps it took to get to the river and back. Here is an example of the path they followed to the river and back. By learning how to count forward and backward they were able to move around the village on the darkest nights and not get lost.
Students are constantly reinforced by positive information about their culture which in turn raises their self-esteem. A high self-esteem is critical for a student’s success.
(Note: You can read Dr. Longie’s blog posts at the lastrealindians.com website. Dr. Longie, who has earned three degrees from the University of North Dakota, has been an outspoken and active opponent of the Fighting Sioux logo. He is an enrolled member of the Spirit Lake Dakota Nation, co-founder and president of Spirit Lake Consulting, Inc. )
The past couple of weeks, I’ve been hearing my friends from Turtle Mountain and Spirit Lake talk about the election in North Dakota. I was particularly interested because this was the one election that Nate Silver predicted incorrectly. He had Heitkamp down by 3.9 percent, and yet she won.
I have no idea how Silver’s model is coded and I doubt he’ll be telling me any time soon. On top of that, most of the research I do is on education, social services and demographics, not politics. However, one of the variables I would definitely include would be ethnicity, what percentage of registered voters are a particular ethnic group, what percentage usually vote and in which direction.
So, if, as in California, 16% of likely voters in the state are Latino, and they vote overwhelmingly Democrat, I would factor that in. In fact, the Latino vote was one factor in turning what was supposed to be a close race for governor in 2010. With Latinos supporting Jerry Brown by over a 40 point margin, this group contributed significantly to his 11 point victory in the overall vote.
So, let’s go back to Native Americans. I’m not the first to point out that even though Native Americans are a small percentage of the voting population in North Dakota, in victories that are claimed by a margin of a few thousand votes, that small percentage matters.
It would not surprise me if Nate Silver did not have enough data on Native Americans to include them in his model. Often, in our own research, we will find polling data has an asterisk in the Native American column when reporting by ethnicity. Down in the footnote you will see, “Not enough data to estimate”. Even if they do provide estimates, with small sample sizes, you’ll see very large margins of error.
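To see why small samples get the asterisk, here is a minimal Python sketch of the usual 95% margin of error for a proportion at a few sample sizes. The sample sizes and the 50% proportion are hypothetical, chosen only to illustrate the point, and the function name is my own. It also leans on the normal approximation, which is itself shaky when n is small, so the real uncertainty for tiny samples is if anything worse than shown.

```python
from math import sqrt


def margin_of_error(p_hat, n, z_crit=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z_crit * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical subgroup sizes within a statewide poll.
moe_by_n = {n: margin_of_error(0.50, n) for n in (30, 100, 1000)}
for n, moe in moe_by_n.items():
    print(f"n = {n:>4}: plus or minus {moe * 100:.1f} percentage points")
```

With 30 respondents the estimate can be off by close to 18 points either way, compared with about 3 points for 1,000 respondents.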
Dr. Carol Davis, at the Turtle Mountain Reservation, which is located in Rolette County, which Heitkamp carried with 80% of the vote, commented, “Heidi came here seven times. We didn’t see Berg once. At least ACT like you want our votes!”
Dr. Erich Longie, one of the collaborators on our Spirit Lake game, not only went to vote but rounded up his younger relatives and told them to get out and do their civic duty as well. The Spirit Lake Nation is located next to Benson County, which Heitkamp also carried.
So … for the next election, American Indians certainly ought to be in the model, especially in North Dakota.
On the long list of things that irritate me, few score higher than people who have “an idea” for a start-up and are then going to find “you know, people” to code it for them.
I was cheered up when I ran across this YouTube video where AngelList founder Naval Ravikant advises the winners of a start-up competition, “Don’t outsource coding.”
“At last! A voice of reason!”
This was particularly welcome because I had heard someone, who actually is a start-up founder, giving advice that began,
“Say you have an idea, but you are not a technical person. It’s not like YOU’RE GOING TO LEARN TO CODE!”
Imagine the latter sentence said in the same tone as if I was discussing cleaning out an outhouse with a shovel. Obviously, no one with any education or other options would choose to do such disgusting, low level work.
Completely apart from whether expecting you to learn to code is unreasonable, hearing that tone made me decide I would never work with that person. Honestly, who would want to work with a person who looked down on them? But it gets better. This same person went on in great detail to explain, “Working with your technical person.”
I have found that when you work with your technical person, you need to give them very specific details so that he knows exactly what you want him to do. Otherwise, I would get back something that was not what I had in mind.
Interesting, so now we know that developers do not read minds (who knew?) and are apparently incapable of having any ideas of their own. Where do they get these ideas? Harvard Business School, apparently. I found this site that advises, “Don’t code”. They give 5 steps to a minimum viable product which include design (can be outsourced or run a design contest), hiring a freelance developer (most bids will come from off-shore and be $20+ per hour).
So … you have an idea and pay people in third-world countries far below the going rate in the U.S. to develop it for you. Remind me never, ever to work with anyone who took this start-up course at Harvard. The main take-away seems to be to get everything you can for yourself while you pay the people who actually make it as little as possible and pit them against one another.
I read a lot of articles that agreed with the Harvard Business School people, so, I was very pleased to find, before I threw a brick through my monitor, an excellent article by Paul Graham, How to Start a Startup.
One of his points, “spend as little money as possible”, is the opposite of the BAD advice I mentioned previously of people telling us “you can’t afford not to do X”.
The first group of students who spoke with Naval planned to spend $25,000 to pay developers – that’s a lot of money to spend on a student income. The Harvard MBA program said a complex app should take at most $15,000 and that is paying designers and developers, but I guess if you hire them all off-shore, it’s cheaper.
The question that always comes to me in these cases is why any investor would need you. Couldn’t they just go hire those people themselves for $15K or $25K?
Since we are shipping our update next week, I’m going to grab a glass of Chardonnay and go on to the last level of the game that will be in version 2. I just hope that the folks at Y Combinator and AngelList are right and the Harvard Business School is wrong, not just because one fits more the model of our company. One also fits more the model of the way I want the world to work.