# It’s not the math that’s hard

November 17, 2012 | 3 Comments

Here is a math problem:

Hoksinato and Tasunka Ska are going to steal horses. They could steal the scrub ponies from the edge of the camp. The last 15 times warriors from the tribe tried to steal scrub ponies, they got away 12 times and were caught and tortured 3 times. If they steal the war ponies instead, they will show more bravery, maybe even earn an eagle feather. Plus, war ponies are much more valuable. The last 10 times warriors from the tribe tried to steal war ponies they got away 3 times and were caught and killed 7 times. What is the probability that they will steal the war ponies and get away?

The correct answer is 30%, in other words, 3 out of 10 times.

Another way you could possibly answer it is 12%. You could interpret it as there were 25 attempts at stealing ponies, and that the question was,

What is the probability that they will steal the war ponies AND get away?

In that case, the probability is 40% that they will steal the war ponies (versus the scrub ponies) and 30% that they will get away, .40 *.30 = .12

Hoksinato is not sure whether or not they should try to steal the war ponies. What is the probability that warriors stealing war ponies will get away?

A statistician would immediately think of this in terms of P(A|B); in other words, what is the probability of escape given that they are stealing war ponies? The answer is clearly 30%.
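To make the two readings concrete, here is a minimal sketch in Python that works both answers out from the counts in the problem. The variable names are my own, and the joint reading assumes (as the 12% answer above does) that the share of past attempts aimed at war ponies stands in for the probability of choosing them.

```python
from fractions import Fraction

# Counts taken from the word problem.
war_escaped, war_attempts = 3, 10
scrub_escaped, scrub_attempts = 12, 15  # given in the problem, irrelevant to either answer
total_attempts = war_attempts + scrub_attempts  # 25 attempts in all

# Conditional reading: P(escape | stealing war ponies)
p_escape_given_war = Fraction(war_escaped, war_attempts)

# Joint reading: P(war ponies AND escape) = P(war ponies) * P(escape | war ponies)
p_war = Fraction(war_attempts, total_attempts)
p_war_and_escape = p_war * p_escape_given_war

print(float(p_escape_given_war))  # 0.3
print(float(p_war))               # 0.4
print(float(p_war_and_escape))    # 0.12
```

Notice that the scrub pony counts only enter through the total of 25 attempts, and only under the joint reading.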

Here is why solving this problem is hard

1. It is not worded to be completely clear whether I need to know the probability of getting away when stealing war ponies or the probability of getting away AND stealing war ponies.
2. I need to decide which numbers are relevant. For this particular problem, the probability of escaping when stealing scrub ponies is irrelevant.
3. I need to decide on the correct operation, in this case, to find the percentage of successful war pony theft attempts, which I find by dividing ten into three.
4. Finally, I need to know the answer to 3/10

Why don’t I just ask, “What is the probability of escaping, if a warrior escapes 3 times out of 10?” Because that is a much easier problem. Here’s the kick in the ass – problems in real life don’t come up that way. They occur ambiguously worded with extraneous information thrown in.

Many people, including most of those awarding funding at the National Science Foundation, realize this and thus very strongly urge mathematics programs to teach problem-solving via discovery learning, guided discovery or other methods. These people are right – to an extent. As you can see above, basic mathematics alone won’t solve this problem.

Many other people, including many math teachers at low-performing schools, believe the NSF is run by COMPLETE IDIOTS because the students don’t know the concept of probability, much less P(A|B) , they have no idea of the notation, where to even begin deciding which are the relevant numbers and oh, yes, they can’t divide 10 into 3 and come up with .30 either. These people are also right – to an extent.

If we are going to teach kids math effectively, we need to fund projects that bring these two sides together.

Y’all get on it.

November 16, 2012 | 3 Comments

Lots of random people give me advice. I don’t know why. Maybe it’s because:

1. I really do try very hard to listen to other people openly and,
2. I go out of my way to meet new people.

Neither of these traits comes naturally to me and I have to put a lot of effort into them. I’m not an extrovert by nature. Given a choice between sitting at my computer writing a new macro to automate reporting and going to a meet-up, I would pick the former 10 times out of 10. It turns out, though, that to meet people who will hire you, invest in your company or work for you, you first have to meet people. I tried the alternative of staying in the office, waiting for someone to drive by and throw a bag full of money through the open door, and it didn’t work out for me.

Here then, in no particular order, for those of you still sitting in the office, is some of the advice I have received.

• Everything is a numbers game – sales, investment, even learning about statistics or programming. The more people you meet, the more likely you will meet someone who will hire you, that you want to hire, who knows about a statistical technique or code library with which you are unfamiliar but that can be the solution to the next problem that comes up.
• Fake it until you make it – by this I do NOT mean lie about your credentials. I mean if you feel uncomfortable pitching to investors, writing research reports, programming in SAS or javascript, whatever it is, just do it. Get a job doing it, volunteer for some project at a non-profit, get up at a pitch camp, make up your own side project to create a database for all of your DVDs. Just do it. You’ll get better.
• Hire for character first. Everything else is secondary. – A good friend of mine who is a very successful businessman told me that and I still think it is the best single piece of advice anyone ever gave me as a small business owner. The value of our company IS our people, probably even more than most businesses given that 90% of the work here would be characterized as “knowledge work”. I can teach people skills. I can’t teach them honesty, persistence, reliability, initiative.
• You can learn from anyone, especially the people who disagree with you. Listen. This was from my advisor in graduate school. He said people who don’t like you or are jealous of you are the ones most likely to point out your flaws, personally and in your work. So what, he said. They’re still flaws. Fix ’em.

You can’t afford not to …  almost anything that begins with this phrase turns out to be bad advice.  We can’t afford not to update our website, invest more in marketing, hire an attorney, increase our business insurance, etc. Actually, we’ve done pretty well keeping overhead as low as possible. I’ve written my own contracts for 27 years. I think I lost \$500 once because a client didn’t pay. I wrote them off as people never to work with again and went on. I think we’re more likely to get in trouble by running in the red. Bad decisions are made in desperate circumstances.

Google/ Microsoft/ Facebook did – almost everything that starts with that is useless. First of all, maybe they did and maybe they didn’t. Often I have no evidence other than the fact that the speaker says so. Even if they did, so what? Maybe Google gave free lunches  to all of its employees, but so did eToys and Pets.com and they both went bankrupt. (Actually, I have no idea whether any of them did or not, see how that works?)

I know I’ve been given a lot more bad and useless advice, but it’s the good advice that has stuck with me. If you have any advice of any kind to contribute, please do.

# If SAS software products were men …

November 15, 2012 | 5 Comments

I may expand this into a series on software products in general. Years ago, I wrote a post on the similarities between the Rocket Scientist and SAS Enterprise Guide. Neither made a great first impression, both revealed their brilliance over time, and I am still with both lo these many years later.

Experiencing both SAS Web Editor and SAS On-demand this semester, I have to say this …. (well, I don’t HAVE to, but that’s never stopped me before) …

SAS On-demand is like a guy I dated who was on the Olympic team. You’d look at him and think,

“Gee, he should be a winner. He’s certainly got good genes and when he shows up, he looks nice.”

He was a nice guy, too, but just thick as a brick. Just like SAS On-Demand for Enterprise Guide, he meant well, but I couldn’t count on him to keep up. I don’t have to worry he’ll be reading my blog and get his feelings hurt because that sentence involved both “reading” and “blog”. Actually, I could have stopped at the word, “reading”. So, it came down to this,

“You know, I think we should see other people. We’re just not compatible. I’m really interested in a faster lifestyle.”

The positive  thing about dumping people who are really dumb is that they don’t realize you’ve dumped them until a couple of months later, if ever. Just like my old boyfriend, SAS On-demand for Enterprise Guide might turn out okay and it is certainly fine for some people. I happen to know he married a very nice, not-too-bright woman and they have charming, athletic, attractive, moderately intelligent children who will probably run for Congress some day.

SAS Web Editor is like the promising young men I’d like to introduce to my daughters – intelligent, quick-witted, good-looking and seems likely to have a very good future. Really, it looks to be exactly SAS running on Linux with just a shiny front-end. Kind of like that guy getting his Ph.D. in computer science, but he’s cute.

(Of course, none of said daughters are actually getting introductions to men from me. They have told me in no uncertain terms that they don’t need my help in getting dates. Random picture of sample daughter presented below as support for this hypothesis. Also, a camel family.)

If you would like to compare your most (least) favorite product to a person of the opposite gender, please feel free to chime in.

# How to add hours to the day

My problem as a professor has been how to fit everything into the schedule that students really should learn given that universities are under pressure to require less and less. There isn’t an easy answer. Unlike back when I was attending graduate school, riding in on my pterodactyl, most students are now working full-time, which means they have fewer hours in the day to study and attend class. When I took the course I am teaching now, it was SIX hours a week, three of lecture and three of computer lab. Now I have three hours and that’s it, but – and you may have noticed this – computers are still used in statistical analysis.

My solution is to record how-to videos and upload these to the class website. This semester, I started using SAS Web Editor after the semester started so I have to make TWO sets of videos, one for SAS Enterprise Guide and one for the SAS Web Editor.

The SAS Web Editor runs SO much better, though, that I think I will only use it after this semester.

You can compare the two here:

SAS Enterprise Guide (this isn’t even On-Demand. The On-Demand movie was so slow I couldn’t stand it. This was done on a desktop).

This is the SAS Web Editor. It’s much faster and it runs on both Mac and Windows.

There are a lot of benefits of movies like these. They expand instructional time. Students can watch them at their own pace, stop and re-wind.

The disadvantage, of course, is that while it adds hours to the instruction the students receive, it also adds hours to MY work day. That’s one of the reasons that I only teach one course a year. I don’t want to do it unless I have time to do it well. That’s also one of the reasons I quit being a professor full time. I never felt like I had the time to do as much as I should to teach really well, and I’ve never been able to accept doing “just okay”.

# Software packages I cannot live without

November 13, 2012 | 3 Comments

I read an interesting question years ago, on the JMP blog, “What are 5-9 software packages you can’t live without?”

That reminded me that when I started this blog almost four years ago, I wrote a post with the title “Nine software packages I can’t live without” . I never finished publishing the list, I started another list a year ago that I never published, and then started another one today – which I finally finished!

My daughters call me an “anti-hoarder” because I am always going through the house and giving away old clothes, appliances or other items we no longer need. With a new “app for that” coming along every 15 seconds, I thought it would be interesting to re-visit that and see which I really and truly could not live without. Top of my list, showing up all three years, was:

1. SAS

I hesitated to put this at the top of the list because if I didn’t get a free SAS On-Demand license from teaching at Pepperdine University and have free access to computers with licenses on-site at clients’ offices, I’d certainly gulp at the price tag. That being said, SAS does almost everything I need done and does it well. Last year and this year, I had it on my list as the software I used the second-most. Four years ago, it was further down, but the point is, there has not been a year that I have not used it OFTEN. As I rant about a lot here, I get a lot of messy data that needs to be beaten into shape and SAS is good for that. I also do every kind of statistics from bar graphs to mixed models to survival analysis. As I tell students all of the time, if you’re taking one statistics course, buy SPSS. It’s cheap (for a student license) and easy to learn. If you are planning a research career, learn SAS. Yes, it is harder. The hard stuff is what people are willing to pay you for (duh!).

2. Graphic converter – Some people call this the poor man’s photoshop. I would have no idea. I never use photoshop, having failed art in junior high and never tried it again. People say you can’t fail art. They are wrong. You just have to be really, really bad at it. I use it for photo editing, changing gif backgrounds to transparent, cropping, re-sizing. It’s super-cheap. \$40. You can’t go wrong. This has also been on my list every single year.

3. WebStorm. I purely love WebStorm. I started coding Javascript with TextWrangler (see below) but WebStorm has saved me SO much time and caught so many errors. It is the package I used most often this year.

4. Dreamweaver: This was on the list four years ago, not on the list last year and indispensable today. We had some “legacy clients” for whom we did website work. They came with The Julia Group when we split off from Spirit Lake Consulting, Inc. years ago. Some of them have thousands of web pages and Dreamweaver was the best solution to manage that. Although we don’t do much website development any more (including our own, obviously, if you take a look at the site!) we do provide “vertical integration” of services. That is, we have some clients in remote communities and we’re their technical “go-to” for everything. These days, the rocket scientist (retired) is handling that and he prefers WordPress. On the rare occasion that I actually got around to doing anything with The Julia Group site, I used TextWrangler for that, too, and uploaded it with Fetch. This is ironic because in my post four years ago, I made fun of people who coded HTML and CSS by hand as being slow. The difference is, if it is only a line or two, I can fix it in the time Dreamweaver takes to open. THEN, this year, we ended up doing a lot more web development. So, I am back to Dreamweaver again.

5. NOTE-TAKING AND BRAINSTORMING

OmniOutliner Pro: I still use this for outlining any large new project, whether it is a grant or a book I am writing. I bought it years ago for pretty cheap – I think I got it on sale for under \$40. Four or five years later, I downloaded the upgrade and it was free. Pretty awesome deal. I thought I could replace this with something that would be accessible everywhere. I tried a number of products for notes – Google notebook worked fine until they discontinued it. I tried another notes package I have forgotten since I used it for about a month and then they went out of business! I tried both Zoho and Evernote. Neither worked seamlessly between my iPad and computers, so in the end I am stuck taking notes on the iPad notepad if I don’t have my computer at hand and using omnioutliner the rest of the time. I don’t use this all that often but I do use it several times a year when I am starting a new project.

TextWrangler from Bare Bones Software: It’s surprising because it is free and was not on my list four years ago. At the beginning of the year, I used it almost every day, sometimes for quickly editing html. Mostly, I used it when I was playing around with simple Javascript. These days, my js isn’t simple and it isn’t playing so I’ve moved up to WebStorm. Very often I need it for an “off-label” use. I often get files in formats ranging from SAS report to God knows what and I have no application on my Mac that will open them. Using TextWrangler, I can open the file and read the text. Yes, there may be all kinds of ugly formatting codes around it, but if I just need to see R-squared = .648 or “Meet me at 3:30 by the pool”, as long as I can read the text, I’m fine.

7. Open Office

My Windows office computer, the three laptops (including the one belonging to the world’s most spoiled fourteen-year-old), the rocket scientist’s desktop and the computer in the living room (because someone felt we must have one there) all run Open Office. I like the templates for Impress – their version of Powerpoint – better than the Microsoft ones. Plus, we have so many computers, and are buying another one, that buying the Microsoft version is just too much money.

8. SPSS

Some clients are more comfortable with SPSS and it runs native on a Mac, so if I don’t want to go to the massive effort of turning around and using the computer on the desk on the other side of my office, or re-starting my laptop or desktop in boot camp, I use SPSS. Also, if you teach, there are some really good deals for educators buying SPSS, check it out. Otherwise, it costs you approximately a kidney and your first born.

On thin ice

The Microsoft Office package … I have it on two computers, my Mac desktop and one of the laptops. I’ve actually found myself using Office MORE this year. Last year I only had it on one computer. The main reason I am using it more is that I teach a class and use Powerpoint. I also wrote a book that was hundreds of pages long with lots of photos and used Office for that. The book is done and the class ends in three weeks so I don’t know how much need I will have for Office after that.

So, that is my list of software I would take with me on a desert island – or, more likely, to the Bahamas. What’s yours?

Disclaimer regarding sponsorship: Nobody paid me a damn thing to endorse their products above. You’d think that someone could at least go to the effort to TRY to bribe me now and then, but NO-O-O !

# I was wrong

Last week, teaching my statistics class, I gave an example of regression using actual data. I had hypothesized that given the greater discrimination and poverty on the reservation 40 or 50 years ago, there would be a negative relationship between age and educational attainment within the adult population. That is, people over 60 would be less likely to have obtained a high school diploma, associate’s degree or bachelor’s degree than those under 40.

To get to the point – I was wrong. I deleted the outliers on both age and education. I was still wrong. The students were waiting for me to do the next analysis to prove that I was right but there was no next analysis. It seems that, at least for the population I had sampled, I was simply wrong.

I try to do this at least two or three times during the semester with every course I teach.  The first time, the students are always surprised. We present research in textbooks in such a neat, linear fashion – you have a hypothesis, you collect data, analyze data, reject the null hypothesis, write up your conclusions and either publish your results or pick up a fat speaker fee to talk about your brilliant study.

It doesn’t always work that way. One advantage of being a small company with multiple clients is that we aren’t so tied to anyone that we feel that we MUST produce certain findings. I’ve worked for pharmaceutical companies, educational institutions – you name it. All of my clients are really smart, competent people. If I didn’t believe that, I wouldn’t be working with them – you only get one life and why waste it with people who aren’t rewarding to work with? No matter how smart, educated, experienced and hard-working you are, sometimes the results don’t come out the way you expected.

The profoundest lesson my advisor, Dr. Richard Eyman, taught me was,

“The data show what they show.”

Sometimes they show that you are wrong. Not only do you have to accept that, you even have to expect it.

# My not-quite-year of code

November 11, 2012 | 2 Comments

I decided about a year ago that it was time to write an educational game to teach math. None of the languages I knew were an optimal choice for this. I’d been programming in SAS for 30 years and it’s an excellent choice for processing the data that come in from pre-tests, student answers and site statistics, but not for writing a game. I’d used FORTRAN, BASIC and even COBOL decades ago but those were totally out. I’d toyed with Ruby some last year, even going so far as to complete a project with it for text analysis. In the end, javascript, jQuery and some related libraries seemed the best choice. At one point, I did talk to a couple of people about writing the code and me doing the game design and data analysis, but that never quite got off the ground. I had a couple of very brief meetings with people who suggested that I come up with an idea and then pay someone in India or eastern Europe sub-minimum wage to code it up. For a long list of reasons, I thought that idea totally sucked.

Let me be clear that we’re not talking about an app you make with Game Salad or some other kit where you are basically shooting things and dropping things. First of all, it needed to align with state standards – that is, the specific skills and concepts that students are supposed to know at a particular grade. Second, we needed to save the data so that we knew how kids were doing, could track that and report back to their teachers and school administration. Third, it had to be fun to play. Fourth, it had to provide instruction to students, so that if they gave the wrong answers, they were routed to APPROPRIATE instruction. Fifth, since the schools were on American Indian reservations, the game had to include the students’ language and culture – which meant things like embedded sound files, accurate renditions of Native American legends and history.

All of those five parts were important.

I started out with Codecademy and perhaps it has gotten much better since I tried it, although given the fact that ten months later, comments on that post reflect similar frustrations, I don’t think so. My main advice is that if you have tried Codecademy and gotten frustrated, don’t give up. I really did give it the old college try. I went through 73 exercises in a week or two and decided my time could be better spent. Your mileage may vary. The next thing I did was get two books, The Essential Guide to HTML5 – which uses game programming to teach javascript – and Javascript: The Definitive Guide.

I took a page from when I wrote my master’s thesis back in 1980 – I got up and worked on javascript for two hours first thing every morning. Some mornings I lay in bed and read the definitive guide for two hours. The Rocket Scientist had given me the book and he loved it, describing it as “really easy to read”. I didn’t find it easy (and I have three graduate degrees and years of programming experience) but it was definitely definitive. I’m way past the age of feeling the need to prove I’m smart, so if I didn’t understand part of it, I wasn’t terribly bothered. I read that part again, and sometimes a third time.

The Essential Guide to HTML5, by Jeanine Meyer, was the opposite of the Definitive Guide in that it did not at all try to explain javascript in a linear fashion. Instead, she starts right in with game programming. I went back and forth between the two books and within a few weeks had bits and pieces of games – a program to roll dice and play craps, a timer, a memory card game, a program to shoot a buffalo with a cannon (why not?), something to move a small Indian hunter avatar around a screen. In retrospect, that is pretty good for a language I didn’t know at all a few weeks before, but at the time it seemed glacial speed because I really wanted to get working on this game and my skills had nowhere near caught up to my ideas.

Although I had written simple games for use in teaching as far back as when I was a student teacher for middle school mathematics in 1985 (I wrote the game in Basic), and done quite a few on-line learning applications, I had never written a large-scale educational game of this type. Enter the USDA Small Business Innovation Research program, which awarded our company \$99,000 to develop a prototype. I submitted the grant in February based on the presumption that by the time funding came around in June I would have learned enough javascript to be able to do it.

It wasn’t just javascript/ jQuery etc. It was also SAS, which was a piece of cake for the data analysis. There was also Dreamweaver, SQL, HTML, iMovie, Graphic Converter and CSS all of which I had been using for years but needed to know a little better. I picked up a book on Dreamweaver at Barnes & Noble and watched a few youtube videos. There was Final Cut Pro (which I DON’T like) that I had to learn and Garageband I hadn’t touched in years that I needed to re-learn. Fortunately, every piece of software out there has multiple communities on the Internet and almost all of them are helpful. The game has 2-D, 3-D and data analysis components. About the time I was getting started, The Rocket Scientist retired and since I really needed another developer and he is the best there is, I convinced him to work with me by a combination of sex, money and the chance to select whatever parts he wanted to work on. He chose the 3-D component. When we were falling behind schedule, we were lucky to get another terrific programmer to step in and write the PHP scripts.

In August, we did a demo of a really rough draft of our game. The beta version was installed in October and had improved so much that our site coordinator on the reservation said,

“Wow! This is great! Now when I tell people how good our game is, I won’t be lying.”

“No, I’m just kidding. I knew you could do it.”

Version 1.1 shipped out on Thursday. I have made massive progress on the version 2.0 which will be available by the end of November. After all of these days working from 10 a.m. to 2 a.m., I think I should have some profound advice. I will offer this – don’t give up. If you find that “everybody is learning to code with Codecademy” and it just is not doing it for you, the pace is too slow, you don’t give a rat’s ass about making a taxi fare calculator, whatever – then do it a different way.  Just about every day, I have to FORCE myself to get up from the computer and go buy groceries, get some exercise, change the guinea pigs’ cage.

I have five meetings scheduled in the next two weeks, two proposals for development funds due in the next four months and a Kickstarter video to finish. My fondest wish is for someone else to take it all over so I can code. Realistically, some of it I cannot delegate to anyone else, and we do have the right person coming on in two months to pick up some of the slack.

So, how has my year of code come out? Pretty phenomenal, I’d say.

# Start-up Life: Day 1 or Day 7,832

We shipped. Again. It was not everything I wanted to go into version 1.1 but we had four days to get it installed on all of the computers being used at our test site and one of those days was being taken by FedEx to get our flash drive to North Dakota and the other three were a three-day weekend. So we had to mail it to arrive by Friday afternoon. Yes, future updates will be done remotely but that was yet another thing we did not have time to do yet.

Have we learned anything? Well, in addition to learning javascript, jQuery, Unity 3D, WebStorm, PhpStorm, PHP and increasing our knowledge of SQL, CSS, HTML and Dreamweaver – yes.

In an amazing post that I am going to memorize, on Why Startups Die, Andrew Montalenti quotes Paul Graham’s advice not to do other things: don’t go to graduate school, don’t have other projects. We’re decades past graduate school, so we’re good on that.

In fact, the Rocket Scientist retired to work on this project. I have quit taking new clients.  (I am keeping clients who still need us. We have been a consulting company for 27 years and I am a big believer in loyalty.)

Andrew advises not to be “scared of code”. Since we are technical-heavy in our founding team, that is not as much of a problem. As we ARE boot-strapping and paying the bills with \$100K in SBIR funds plus money from our consulting projects, I do sometimes take a deep breath when I am writing something that I KNOW I will need to re-write later, but it’s a fact. Sometimes, you don’t know what you need to fix until you have a working prototype, shake your head and say,

“That’s not right.”

Andrew’s advice is right on target when he says, “Be persistent.”

I have had this idea for a game to teach math since I applied to the Ph.D. program in 1985. It was on my application for what I wanted to do for my dissertation – build a game to teach kids math. I did it, in BASIC, but not for my dissertation. The capabilities for what I wanted to do did not exist when I graduated in 1990.

Going back to Paul Graham’s original post, he says to find something at least someone really loves. We knew we were on to something when teachers at the school pilot site and parents of kids testing our program after school had to say repeatedly,

“You have to go now. Five more minutes and you really have to stop playing. Okay, you need to shut the machine down, I’ll give you thirty seconds.”

After sending off the update, the rocket scientist went to a physics lecture at UCLA to relax and I went to teach my class on Advanced Quantitative Analysis. We went out late, had a few drinks to celebrate and this morning were at it again.

Be persistent.

Here is some additional advice from me. If you’re tired and you don’t want to write one more line of code, edit another video, make another GIF file or whatever it is you’ve been working on, do some other part of the project. Do documentation. Work on a presentation for investors. Go to a meet-up. Read a book on programming or game design. Watch a video on a javascript library you might use. Read articles you could include in a grant proposal.

That is a lot of what I did today and will probably do tomorrow.

Curiously, this is something I learned from becoming the world judo champion. When I could not spend another five seconds doing arm bar drills because I had already done 10,000 of them, I would go to the gym and lift weights.

It reminds me of a line from a philosopher I read when I was in high school.

“The best is the enemy of the good.”

Maybe the BEST thing you could do each day is to write more lines of code to make your game dance, but documenting it, designing the next level, looking at other games to get new ideas, talking to the beta testers to get their feedback – those are all GOOD things. When you are burnt out from doing the best thing you can do, go do some of those good things until you feel a bit rested up, and then go back to your core function.

So far, it seems to be working for us.

# Baby steps to regression

November 8, 2012 | 1 Comment

What do you see when you look at a regression analysis? Because me, all I see is a bunch of numbers and I have no idea where to look first or what’s important. Could you start me off with regression in some baby steps? What is it that you are looking at when you stare at this stuff?

Never one to shy away from a student’s request, here you go. I had data from 104 people aged 16-71 living on an American Indian reservation. All but 4 of them were over 18. I thought that there would be a NEGATIVE relationship between educational achievement and age given that the older people would have had fewer opportunities, for a lot of reasons.

When I ran the regression, this was the first table I looked at.

This tells me that the correlation between age and years of education was .268. Since it is positive, I can already see my hypothesis is not supported. The R-square is the amount of explained variance (with one predictor, it is simply the correlation squared) – so, 7.2% of the variance in educational attainment in this adult population is explained by age.

The ANOVA table tells me that my F-value is 7.94 and the probability of an F-value that large is less than .01 – in fact, it is .006. So, age is positively related, it explains about 7.2% of the variance and this is statistically significant. If you divide the regression sum of squares by the total sum of squares you will find the quotient is .072. This is not coincidence.
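To see why that quotient is not a coincidence, here is a minimal sketch in plain Python. The ages and years of education are hypothetical toy numbers, not the reservation sample, and the function name is my own. It fits a one-predictor regression by hand and shows that the regression sum of squares divided by the total sum of squares is the same number as the squared correlation.

```python
# Hypothetical toy data for illustration only; these are NOT the
# 104 reservation respondents analyzed in the post.
age = [18, 22, 25, 31, 36, 40, 47, 52, 58, 63]
education = [11, 12, 12, 10, 13, 12, 14, 12, 13, 15]


def anova_for_simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares; return ANOVA pieces."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]

    # Split the total variation in y into explained and unexplained parts.
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)
    ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))

    return {
        "r": sxy / (sxx * syy) ** 0.5,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "ss_total": syy,
        "r_square": ss_regression / syy,
        # One predictor, so 1 and n - 2 degrees of freedom.
        "f_value": ss_regression / (ss_residual / (n - 2)),
    }


result = anova_for_simple_regression(age, education)
print("R-square from sums of squares:", round(result["r_square"], 4))
print("Correlation squared:          ", round(result["r"] ** 2, 4))
print("F-value:                      ", round(result["f_value"], 2))
```

The two R-square lines print the same value, which is the same relationship as .268 squared being about .072 in the output above.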

The intercept tells me what the value of education would be if age was zero, which is where the regression line intercepts the Y axis. The constant is 9.993.  Children on the reservation really aren’t born with almost 10 years of education, which gives you some insight into the fact that you really shouldn’t interpret the intercept in cases where an X of 0 is not really feasible. I’m interested in educational attainment of ADULTS.

A more useful statistic is the standardized beta coefficient. In the case where you only have one predictor, this will always equal the correlation between the dependent and independent variables. Of course, it is significant and at the same level as the overall model, since it is the only variable in the model. If you square the t-value of 2.818, you’ll see it equals 7.94. This isn’t a coincidence, either.
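Both of those "not a coincidence" facts can be checked with a short sketch. Again, the data are hypothetical toy numbers and the function name is mine, not anything from the original analysis. With a single predictor, the standardized beta works out to the correlation and the squared t-value for the slope works out to the F-value.

```python
# Hypothetical toy data for illustration only, not the post's sample.
age = [18, 22, 25, 31, 36, 40, 47, 52, 58, 63]
education = [11, 12, 12, 10, 13, 12, 14, 12, 13, 15]


def slope_statistics(x, y):
    """Return t, F, standardized beta and r for a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    ss_regression = slope * sxy          # explained sum of squares
    ss_residual = syy - ss_regression    # unexplained sum of squares
    mean_square_error = ss_residual / (n - 2)

    standard_error = (mean_square_error / sxx) ** 0.5
    t_value = slope / standard_error
    f_value = ss_regression / mean_square_error

    # Standardized beta: the slope rescaled by the two standard deviations.
    sd_x = (sxx / (n - 1)) ** 0.5
    sd_y = (syy / (n - 1)) ** 0.5
    standardized_beta = slope * sd_x / sd_y
    r = sxy / (sxx * syy) ** 0.5

    return t_value, f_value, standardized_beta, r


t_value, f_value, standardized_beta, r = slope_statistics(age, education)
print("t squared:", round(t_value ** 2, 4), " F:", round(f_value, 4))
print("standardized beta:", round(standardized_beta, 4), " r:", round(r, 4))

# The same check on the numbers reported in the post.
reported_t = 2.818
print("reported t squared:", round(reported_t ** 2, 2))
```

The last line reproduces the 7.94 from the ANOVA table.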

Okay, so we have a model that is significant, there is a positive relationship between age and education, with age explaining about 7% of the variance.

I always want to do some checks for possible outliers, so I graph the data like this:

It’s a pretty skewed distribution, with that one person at the far right being four standard deviations above the mean for education.

I also see, when I plot age by years of education, that our one highly educated person is also over 60, so extreme on both variables.

I re-run the analysis without this one individual to see what happens. In fact, the regression is still significant, still positive but by dropping this one person the explained variance has dropped from 7.2% to 5%. (I could have looked at all of the same tables again, but you asked about a “quick and dirty” look, and I’d probably just glance at that one.)

You might think if I dropped one outlier and it made that much difference, maybe dropping the handful who were under 18 years of age would make a difference also. I did that, ran the regression again, and this time with 99 of my original 104 people the explained variance had dropped to 2.6% — so, by dropping out just five people, less than 5%, the explained variance is now one-third of what it was and my model is non-significant.
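Here is a small sketch of that "drop a case and re-run" check. The five data points are hypothetical and deliberately extreme, so the effect is far more dramatic than in the real sample: without the one older, highly educated person there is no linear relationship at all.

```python
def r_square(x, y):
    """Explained variance for a one-predictor regression (correlation squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: the last person is both the oldest and by far
# the most educated, like the outlier described in the post.
age = [20, 30, 40, 50, 65]
education = [12, 10, 10, 12, 20]

# Drop the single most educated person and re-run.
outlier = education.index(max(education))
age_trimmed = [a for i, a in enumerate(age) if i != outlier]
education_trimmed = [e for i, e in enumerate(education) if i != outlier]

r2_all = r_square(age, education)
r2_trimmed = r_square(age_trimmed, education_trimmed)

print("R-square with everyone:     ", round(r2_all, 3))
print("R-square without the outlier:", round(r2_trimmed, 3))
```

With everyone included, age appears to explain more than half the variance. Without that one person it explains none, which is the exaggerated version of watching 7.2% fall to 5% and then to 2.6%.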

So …. hopefully this gives you a bit of insight into the first glances at a regression model and also the importance of not jumping up and running off as soon as you find a model with a significant F-value. Try to consider significance, explained variance, the standardized regression coefficient and the potential effect of outliers, for your first few baby steps.

# Nate Silver: The Statistician’s Hero

November 7, 2012 | 11 Comments

While about equal percentages of the general public will be either happy or upset with the outcome of the presidential and congressional elections, 100% of statisticians will be cheering Nate Silver.

The reason I’ve watched Silver’s blog very closely is because I’m a big believer in the Central Limit Theorem, which tells us that the means of reasonably large random samples are approximately normally distributed around the population mean, so the average of a great many such samples will land very close to it. Although there were not an infinite number of polls taken before the election – it only seemed that way – there was definitely a large number.

In states where 50 or more polls were taken and ALL or all but one or two showed a small advantage in favor of President Obama, the probability that the true population mean was in favor of Mitt Romney was very, very, very low.

Then, if you take several states where that was the case, the probability that several of them actually had a population mean in favor of Romney would be even lower. It would not be the product of the individual probabilities because it is very unlikely that those probabilities are independent. If the polls in one state were all wrong, it would be due to bias in their sampling and that same bias would exist in other state samples as well.
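To put rough numbers on "very, very, very low", here is a back-of-the-envelope sketch. It rests on simplifying assumptions that are mine, not Silver's model: the race in a state is treated as truly tied, and each poll as an independent, unbiased coin flip for which candidate it shows ahead. The 5 states are a hypothetical count. Real polls are neither independent nor free of bias, which is exactly the point of the paragraph above.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that 48 or more of 50 polls lean the same way in a dead-even race.
p_one_state = prob_at_least(48, 50, 0.5)

# Hypothetical: suppose this happened in 5 states.
states = 5

# If the states' polling errors were independent, multiply the probabilities.
if_independent = p_one_state ** states

# If one shared bias drove every state's polls, being wrong in one state
# means being wrong in all of them, so the joint chance is no lower than
# the single-state chance.
if_fully_shared = p_one_state

print("One state:                  ", p_one_state)
print("5 states, independent:      ", if_independent)
print("5 states, fully shared bias:", if_fully_shared)
```

If the errors are positively related across states, the real answer lies somewhere between the last two numbers. Either way it is tiny, but it is not the simple product.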

What impressed me about Silver was not his math, not his models. I made the same predictions in my statistics class weeks ago. In fact, I just received this email from a student,

“Dr. De Mars,

If Obama wins can you please “statistically” explain to me why you projected that last month? “

No, what impressed me about Silver was his courage. Every statistician who looked at his results nodded in agreement. Some even tweeted when Silver was disparaged,

“I guess the Republicans have secretly disproved the Central Limit Theorem.”

It’s one thing to make predictions in your class, or make snarky comments on Twitter. It’s another to make them in the New York Times and on national television. Although the probability of an Obama victory may have been 92%, there was still an 8% chance he would be wrong and be publicly humiliated on every conservative network and blog, and on all of the moderate and liberal ones that didn’t understand math.

Most people wouldn’t take an 8% chance of that – which is why Nate Silver is my hero. By taking a chance, going out on a limb, he brought mathematics, statistics and science in general to a much higher profile and level of confidence. Now maybe people will believe scientists about that global warming stuff, too.

This cognac is to you, Nate. You’re my hero.
