# Ask Me Anything: Part I

It’s that time of year, near the end of the semester, when I ask students to write down any questions they may have about material covered in the course. This semester I am teaching Advanced Quantitative Data Analysis. I thought other people might be interested in the answers to the questions students asked as well.

Is it possible to have a Mean Square of 0?

Yes, it is possible, but if your model mean square is exactly zero then it is extremely likely that something is wrong with your model, in addition to the obvious problem that your mean square is zero. Normally, even if you use a random number as a predictor or dependent variable, you’ll get some very small model mean square value; it won’t be EXACTLY zero. One instance in which you will get a zero model mean square is when your dependent variable is a constant.
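Here is a minimal SAS sketch of the constant-dependent-variable case. The data set and variable names (const_dv, x, y) are made up for illustration and are not from the course.

```sas
/* Hypothetical data: a random predictor and a constant dependent variable */
DATA const_dv ;
  DO i = 1 TO 20 ;
    x = RANUNI(4) ;   /* random predictor */
    y = 5 ;           /* dependent variable never changes */
    OUTPUT ;
  END ;
RUN ;

/* y has no variance to explain, so the model sum of squares
   (and therefore the model mean square) is zero */
PROC GLM DATA = const_dv ;
  MODEL y = x ;
RUN ;
QUIT ;
```

Because the error sum of squares is also zero here, don't expect a usable F test from a run like this.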

Another case in which you will get a zero for model mean square is if you are using SAS Enterprise Guide, your dependent variable is standardized, with a mean of zero, and you forget to specify effects for your model. If you fail to specify effects, SAS EG fits a model that includes only the intercept. If your intercept truly is zero, as it will be when the dependent variable has a mean of zero, you will get a model mean square of zero. I’d say if your model mean square is exactly zero, something fishy is up.
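A rough sketch of what that situation looks like in code, assuming the Enterprise Guide task generates a PROC GLM step. The data set and variable names (raw, zdata, score) are hypothetical. My understanding is that when no effects are listed after the equals sign, PROC GLM fits only the intercept.

```sas
/* Hypothetical raw scores */
DATA raw ;
  DO i = 1 TO 20 ;
    score = 50 + 10 * RANNOR(4) ;
    OUTPUT ;
  END ;
RUN ;

/* Standardize score to a mean of 0 and a standard deviation of 1 */
PROC STANDARD DATA = raw MEAN = 0 STD = 1 OUT = zdata ;
  VAR score ;
RUN ;

/* No effects specified: the only term left in the model is the intercept,
   and the intercept of a variable with a mean of zero is zero */
PROC GLM DATA = zdata ;
  MODEL score = ;
RUN ;
QUIT ;
```

Because of floating-point rounding, the standardized mean may be a hair away from zero, so the printed value could be zero or an extremely small number.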

Is there a limit to the number of variables to be used for a multiple regression?

If your question is whether there is an absolute number, like no more than 42, the answer is,

“No, there is no set maximum number of variables that can be used for a multiple regression.”

Obviously, you need to have at least two predictor variables or it is not a multiple regression. You also cannot have more parameters to estimate than there are observations, because then you cannot find a unique solution. If you have 30 subjects you can’t have 42 independent variables. In fact, counting the intercept, you need more observations than parameters to have any degrees of freedom left for error. On the other hand, if you have 3,000,000 subjects, 42 independent variables would work mathematically. It might be a difficult analysis to interpret, though, and you might run into problems of multicollinearity.
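To see the "more predictors than observations" problem for yourself, here is a small sketch with made-up data: 5 subjects and 6 random predictors. All names (toomany, x1-x6, y) are hypothetical.

```sas
/* Hypothetical data: 5 observations, 6 random predictors */
DATA toomany ;
  ARRAY x{6} x1-x6 ;
  DO subject = 1 TO 5 ;
    DO j = 1 TO 6 ;
      x{j} = RANUNI(4) ;
    END ;
    y = RANUNI(4) ;
    OUTPUT ;
  END ;
  DROP j ;
RUN ;

/* Seven parameters (intercept plus six slopes) and only five observations:
   there is no unique least-squares solution */
PROC REG DATA = toomany ;
  MODEL y = x1-x6 ;
RUN ;
QUIT ;
```

With data like this, PROC REG should tell you that the model is not full rank, rather than quietly handing you seven trustworthy estimates.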

When doing a 2-way ANOVA with SAS Enterprise Guide, does it matter if I drag the dependent variable under classification variables?

You drag INDEPENDENT variables under classification variables. You would never put your dependent variable there; it goes under the dependent variable role.

Can Type I and Type III sums of squares have equal values sometimes?

They can and do. The Type I Sum of Squares is also called the sequential sum of squares. The sum of squares for each term is given controlling for the terms that precede it in the model. Some people don’t like Type I sum of squares because the SS can change for a variable if you change its order in the model. Let’s say we are predicting SAT scores based on SES (upper, middle, lower) and school type (private or public). If we look at the effect of school without controlling for SES and with controlling for SES we may get very different sums of squares.

The Type III Sum of Squares is also called the partial (or marginal) sum of squares. It gives you the sum of squares for each effect controlling for all of the other effects in the model.

If you think about it for a minute, you’ll see that the Type I and Type III sums of squares are always the same for the last term entered in the model. In a two-way ANOVA, this is usually the interaction effect. For the last term, Type I gives you the sum of squares controlling for all of the effects entered previously on your model statement, which is all of the other effects. Type III gives you the sum of squares controlling for all of the other effects in the model for every effect, including the last one.

So …. if your model statement is

MODEL gpa = ses school ses * school ;

The Type I and Type III sums of squares for ses * school will be identical.

If you leave off your interaction term and have

MODEL gpa = ses school ;

The Type I and Type III sums of squares for school will be identical.
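If you want to check this yourself, here is a sketch of a complete PROC GLM step built around the first model statement above. The data are simulated and the data set name (students) is hypothetical; only gpa, ses, and school come from the model statement. The CLASS statement is the code equivalent of dragging variables under classification variables in Enterprise Guide.

```sas
/* Simulated stand-in data with unequal cell sizes */
DATA students ;
  DO i = 1 TO 60 ;
    ses = CEIL(3 * RANUNI(4)) ;     /* 1, 2, or 3 */
    school = (RANUNI(4) > .42) ;    /* 0 or 1 */
    gpa = 2 + 2 * RANUNI(4) ;       /* made-up GPA between 2 and 4 */
    OUTPUT ;
  END ;
RUN ;

PROC GLM DATA = students ;
  CLASS ses school ;                /* independent variables only */
  MODEL gpa = ses school ses * school / SS1 SS3 ;   /* Type I and Type III */
RUN ;
QUIT ;
```

Compare the last row of the Type I table with the last row of the Type III table. Then drop the interaction term and compare the rows for school.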

You will also get the same values for Type I and Type III for effects in the model other than the last one if there is zero shared variance. Just like with getting exactly a zero mean square, though, that is unlikely to happen. I tried a couple of times just creating categories from a random number function:

IF RANUNI(4) > .42 THEN rand2col = 912 ;

ELSE rand2col = 875 ;

Even then, the Type I and Type III sums of squares were not EXACTLY the same. They always differed by just a tiny bit, due to a minute amount of shared variance being present just by chance. However, the Type I and Type III were the same for all practical purposes. For example, in one case I used attitudes toward abortion (a scale) as the dependent variable, and church type (fundamentalist or non-fundamentalist) and my random categories as the independent variables.

The Type I SS for church type was 31.34 (F = 8.56, p < .008) and the Type III SS was 33.0 (F = 9.01, p < .007).
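A sketch of that kind of experiment, using simulated stand-in data rather than the data behind the numbers above. The names attitudes, church_type, and attitude are hypothetical, and putting church type first in the model statement is my assumption.

```sas
/* Simulated stand-in data, NOT the data described above */
DATA attitudes ;
  DO id = 1 TO 40 ;
    church_type = (RANUNI(4) > .5) ;                /* 1 or 0 */
    attitude = 10 + 2 * church_type + RANNOR(4) ;   /* made-up scale score */
    IF RANUNI(4) > .42 THEN rand2col = 912 ;        /* random categories */
    ELSE rand2col = 875 ;
    OUTPUT ;
  END ;
RUN ;

/* church_type comes first, so its Type I SS ignores rand2col,
   while its Type III SS adjusts for rand2col */
PROC GLM DATA = attitudes ;
  CLASS church_type rand2col ;
  MODEL attitude = church_type rand2col / SS1 SS3 ;
RUN ;
QUIT ;
```

Since rand2col is pure noise, the two sums of squares for church_type should land close together, but unless the cell sizes happen to come out perfectly balanced, they are unlikely to match to the last decimal.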

The fact that I have time to create random categories in an attempt to find exactly identical Type I and Type III SS is perhaps evidence that I need to find a hobby. Maybe knitting?