Predictor variables – when order does and does not matter

Having recently failed to use BMI from a data set on school children as a variable in our propensity score matching example, because people who fill out surveys are big, fat liars, we next went to a sample of really old people and used death as the dependent variable.

Try faking it on the dichotomous dead (yes/ no) variable. I dare you!

… which got me thinking about other differences when using logistic regression.

By the end of two or three statistics classes, everyone knows that there are four types of sums of squares. Okay, well, at least they know that there are two of them, Type I and Type III.

The first one, coincidentally referred to as Type I, is also called the sequential sum of squares. Let’s assume you have two factors, Alcohol Use and Cigarette Smoking, and from whether the student uses alcohol or smokes cigarettes you are trying to predict how much marijuana he smokes. You have an Analysis of Variance with a continuous dependent variable and two categorical predictors. If your model uses Type I sums of squares, and your statement is

MODEL marijuana = alcohol cigarettes ;

You will get one estimate for the sum of squares for, say, alcohol. If your statement is

MODEL marijuana = cigarettes alcohol ;

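You will get a DIFFERENT estimate for alcohol, because with Type I each effect is adjusted only for the effects listed before it. (The exception is a balanced design with equal cell sizes, where the two predictors are uncorrelated and order makes no difference.)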

On the other hand, with the Type III sum of squares, every effect is adjusted for all of the other effects in the model, so you will get the same result regardless of order.
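If you want to see this on your own data, here is a minimal sketch. The data set name (work.students) and the variable names are made up for illustration, so substitute your own. The SS1 and SS3 options on the MODEL statement just spell out the two tables that PROC GLM prints by default.

```sas
/* Hypothetical data set and variable names, for illustration only */

/* Run 1: alcohol listed first */
proc glm data=work.students;
  class alcohol cigarettes;
  /* SS1 = sequential (Type I) sums of squares, SS3 = Type III */
  model marijuana = alcohol cigarettes / ss1 ss3;
run;
quit;

/* Run 2: same model, predictors in the opposite order */
proc glm data=work.students;
  class alcohol cigarettes;
  model marijuana = cigarettes alcohol / ss1 ss3;
run;
quit;
```

Compare the Type I tables from the two runs. In each run the effect listed first is not adjusted for anything and the effect listed second is adjusted for the first, so with unbalanced data the alcohol line should change when you swap the order. The Type III tables should match. Since there is no interaction in this model, the effect listed last should also show the same Type I and Type III sum of squares.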

SAS (in PROC GLM, at least) by default gives you both the Type I and Type III sums of squares and expects you to know what you are doing. SPSS by default gives you only the Type III sum of squares and expects you not to ask any questions.

There is a nice explanation of Type I and Type III sum of squares on Matt’s blog, which he actually re-posted from somewhere else, but I found Matt’s blog generally interesting so I linked there.

What if, instead of doing an ANOVA, you were doing a logistic regression, where the dependent variable is not how much marijuana the person smoked but whether he ever smoked it at all?

Does order in your MODEL statement matter in logistic regression? The short answer is – no.

The fact that the table with the Wald chi-square is labeled Type 3 Analysis of Effects would tip you off there, but what about odds ratios, parameter estimates, concordant pairs? Nope, nope and nope.

Try it for yourself and see.
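A minimal sketch of that experiment might look like the following. Again, the data set and variable names are hypothetical, and it assumes the dependent variable ever_marijuana is coded 1 for yes and 0 for no.

```sas
/* Hypothetical data set and variable names, for illustration only */

/* Run 1: alcohol listed first */
proc logistic data=work.students;
  class alcohol cigarettes / param=ref;
  /* EVENT='1' models the probability that ever_marijuana equals 1 */
  model ever_marijuana(event='1') = alcohol cigarettes;
run;

/* Run 2: same model, predictors in the opposite order */
proc logistic data=work.students;
  class alcohol cigarettes / param=ref;
  model ever_marijuana(event='1') = cigarettes alcohol;
run;
```

Put the two sets of output side by side. The rows of the tables will appear in a different order, but the Wald chi-squares, parameter estimates, odds ratios and percent concordant should be identical.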

(Of course, you COULD do stepwise logistic regression – but I wouldn’t.)


One Comment

  1. Dear Annmaria,

    Can you help me fit a Poisson regression in SAS with only an intercept term and no other covariate?

    Moreover, how can I fit a generalized Poisson regression in SAS, as there is no built-in function available?
