Ever wonder why with goodness of fit tests non- significance is what you want?

Why is that sometimes when you have a significant p-value it means your hypothesis is correct, there is a relationship between the price of honey and the number of bees, and in other cases, significance means your model is rejected? Well, if you are reading this blog, it’s possible you already know all of this, but I can guarantee you that students who start off in statistics learning that a significant p-value is a good thing often are confused to learn that with model fit statistics, non-significance is (usually) what you want.

You are hoping that you find non-significance when you are looking at model fit statistics because the hypothesis you are testing is that the full model – one that has as many parameters as there are observations – is different than this model you have postulated.

To understand model fit statistics, you should think about three models.

**The null model, and contains only one parameter, the mean.** Think of it this way, if all of your explanatory variables are useless then your best prediction for the dependent variable is the mean. If you knew nothing about the next woman likely to walk into the room, your best prediction of her height would be 5’4″ , if you live in the U.S., because that is the average height.

**The full model has one parameter per observation.** With this model, you can predict the data perfectly. Wouldn’t that be great? No, it would be useless. Using the full model is a bad idea because it is non- replicable

Here is an example data set where I predict IQ using gender, number of M & M’s in your pocket and hair color.

EXAMPLE

Male 10 redhead 100

Female 0 blonde. 70

Male 10 blonde 60

Female 30 brunette 100

50 + MMx1 + female x 20 + redhead x 40

Is that replicable at all? If you selected another random sample of 4 people from the population do you think you could predict their scores perfectly using this equation?

No.

Also, I do not know why that woman has so many M & M’s in her pocket.

In between these two useless models is your model. The hypothesis you are testing is that your model, whatever it is, is non-significantly different from the full model. If you throw out one of your parameters, your new model won’t be as good as the full model – that one extra parameter may explain one case – but the question is, does the model without that parameter differ significantly from the full model. If it doesn’t then we can conclude that the parameters we have excluded from the model were unimportant.

We have a more parsimonious model and we are happy.

But WHY do more parsimonious models make us happy? Well, because that is kind of the whole point of model building. If you need a parameter for each person, why not just examine each person individually? The whole point of a model is dimension reduction, that is, reducing the number of dimensions you need to measure while still adequately explaining the data.

If, instead of needing 2,000 parameters to explain the data gathered from 2,000 people you can do just as well with 47 parameters, then you would have made some strides forward in understanding how the world works.

Coincidentally, I discussed dimension reduction on this blog almost exactly a year ago, in a post with the title “What’s all that factor analysis crap mean, anyway?”

(Prediction: At least one person who follows this link will be surprised at the title of the post.)

i don’t think you can quantify IQ AT ALL. For instance some people are incredibly intuitive and that can Not be quantified either in my opinion. Even if you spent years studying an individual’s IQ, the variances alone I don’t believe can be measured or labeled.