
Yes, You Totally CAN Understand Model Fit Statistics, with M & M’s

By AnnMaria De Mars, October 15, 2014

Ever wonder why, with goodness-of-fit tests, non-significance is what you want?

Why is it that sometimes a significant p-value means your hypothesis is correct (there is a relationship between the price of honey and the number of bees), while in other cases significance means your model is rejected? If you are reading this blog, it’s possible you already know all of this, but I can guarantee that students who start off in statistics learning that a significant p-value is a good thing are often confused to learn that with model fit statistics, non-significance is (usually) what you want.

You are hoping to find non-significance when you look at model fit statistics because the null hypothesis being tested is that the model you have postulated does not differ from the full model, the one that has as many parameters as there are observations. A significant result means your model fits significantly worse than the full model, so your model is rejected.

To understand model fit statistics, you should think about three models.

The null model contains only one parameter, the mean. Think of it this way: if all of your explanatory variables are useless, then your best prediction for the dependent variable is the mean. If you knew nothing about the next woman likely to walk into the room, your best prediction of her height would be 5’4″ if you live in the U.S., because that is roughly the average height.
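To make the null model concrete, here is a minimal Python sketch using the four IQ scores from the M & M example in this post (100, 70, 60, 100). The null model gives everyone the same prediction, the mean, and the sum of squared errors measures how badly that one-size-fits-all guess misses. The variable names are illustrative, not from any particular software.

```python
from statistics import mean

# IQ scores of the four people in the M & M example
iq = [100, 70, 60, 100]

# The null model has one parameter, the mean: everyone gets the same prediction
grand_mean = mean(iq)
predictions = [grand_mean] * len(iq)

# Sum of squared errors: the total amount by which the null model misses
sse_null = sum((y - p) ** 2 for y, p in zip(iq, predictions))

print(grand_mean)  # 82.5
print(sse_null)    # 1275.0
```

That 1275 is the total variation in the data, which is what any better model has to explain.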

The full model has one parameter per observation. With this model, you can predict the data perfectly. Wouldn’t that be great? No, it would be useless. Using the full model is a bad idea because it is not replicable.

Here is an example data set where I predict IQ using gender, the number of M & M’s in each person’s pocket, and hair color.

EXAMPLE

Gender    M & M’s   Hair color   IQ
Male      10        redhead      100
Female    0         blonde       70
Male      10        blonde       60
Female    30        brunette     100

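Predicted IQ = 50 + (1 × number of M & M’s) + (20 × female) + (40 × redhead), where female and redhead are each 1 if true and 0 if not. Plug in any of the four people and you get their IQ exactly.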
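Here is a quick Python check that this equation really is a full model for the data set: four parameters (intercept, M & M slope, female effect, redhead effect) for four people, and it reproduces every IQ score with zero error. Blonde and brunette share the baseline because the equation has no term for them. The function name full_model is illustrative.

```python
# The four people from the example: (is_female, m_and_ms, is_redhead, iq)
people = [
    (0, 10, 1, 100),  # male, redhead
    (1, 0, 0, 70),    # female, blonde
    (0, 10, 0, 60),   # male, blonde
    (1, 30, 0, 100),  # female, brunette
]


def full_model(is_female, m_and_ms, is_redhead):
    """Predicted IQ: 50 + 1*M&Ms + 20*female + 40*redhead."""
    return 50 + 1 * m_and_ms + 20 * is_female + 40 * is_redhead


# Observed minus predicted IQ for each person
residuals = [iq - full_model(f, mm, r) for f, mm, r, iq in people]
n_parameters = 4  # intercept, M&M slope, female effect, redhead effect

print(residuals)                    # [0, 0, 0, 0]
print(n_parameters == len(people))  # True
```

Zero error on this sample, with as many parameters as people, is exactly why the equation tells you nothing about the next four people.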

Is that replicable at all? If you selected another random sample of four people from the population, do you think you could predict their scores perfectly using this equation?

No.

Also, I do not know why that woman has so many M & M’s in her pocket.

In between these two useless models is your model. The hypothesis you are testing is that your model, whatever it is, does not differ significantly from the full model. If you throw out one of your parameters, your new model won’t fit as well as the full model (that one extra parameter may explain one case), but the question is whether the model without that parameter differs significantly from the full model. If it doesn’t, then we can conclude that the parameters we excluded did not add significantly to explaining the data.
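In many kinds of models (logistic and Poisson regression, log-linear models, structural equation models), this comparison is made with a likelihood-ratio, or deviance, test: twice the difference in log-likelihoods between the bigger model and yours is compared to a chi-square distribution whose degrees of freedom equal the number of parameters you dropped. The Python sketch below assumes you already have both log-likelihoods from whatever software fit the models. The function names are hypothetical, the log-likelihood values in the last lines are made-up numbers for illustration (not results from any data), and the chi-square tail probability uses the standard incomplete-gamma series and continued fraction so that only the standard library is needed.

```python
import math


def _lower_gamma_series(a, z):
    """Regularized lower incomplete gamma P(a, z) by power series (use when z < a + 1)."""
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def _upper_gamma_cf(a, z):
    """Regularized upper incomplete gamma Q(a, z) by continued fraction (use when z >= a + 1)."""
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    if z < a + 1.0:
        return 1.0 - _lower_gamma_series(a, z)
    return _upper_gamma_cf(a, z)


def deviance_test(loglik_model, loglik_full, k_model, k_full):
    """Likelihood-ratio (deviance) test of your model against a model with more parameters.

    Returns (G2, df, p). A small p means your model fits significantly worse.
    """
    g2 = 2.0 * (loglik_full - loglik_model)
    df = k_full - k_model
    return g2, df, chi2_sf(g2, df)


# Hypothetical log-likelihoods, made up for illustration only
g2, df, p = deviance_test(loglik_model=-104.2, loglik_full=-100.0, k_model=3, k_full=8)
print(f"G2 = {g2:.1f}, df = {df}, p = {p:.3f}")
```

With these made-up numbers, G2 is 8.4 on 5 degrees of freedom, below the 0.05 critical value of about 11.07, so p is above 0.05: the smaller model does not differ significantly from the bigger one, which is the non-significant result you were hoping for.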

We have a more parsimonious model and we are happy.

But WHY do more parsimonious models make us happy? Well, because that is kind of the whole point of model building. If you need a parameter for each person, why not just examine each person individually? The whole point of a model is dimension reduction, that is, reducing the number of dimensions you need to measure while still adequately explaining the data.

If, instead of needing 2,000 parameters to explain the data gathered from 2,000 people, you can do just as well with 47 parameters, then you have made some strides forward in understanding how the world works.

Coincidentally, I discussed dimension reduction on this blog almost exactly a year ago, in a post with the title “What’s all that factor analysis crap mean, anyway?”

(Prediction: At least one person who follows this link will be surprised at the title of the post.)


