Let’s Talk about Multivariate Research Designs: Part 1

(There may even be a part two, if I get around to it.)

Let me ask you a couple of questions:

1. Do you have more than just one dependent variable and one independent variable?

2. If you said yes, do you have a CATEGORICAL or ORDINAL dependent variable? If so, use logistic regression. I have written several posts on it. You can find a list of them here. Some involve Euclid, marriage, SAS and SPSS. Alas, none involve a naked mole rat. I shall have to remedy that.

3. You said yes to #1, multiple variables, but no to #2, so I am assuming you have multiple variables in your design and your dependent variable is interval or continuous, something like sales for the month of December, average annual temperature or IQ. The next question is: do you have only ONE dependent variable, and is it measured only ONCE per observation? For example, you have measured the average annual temperature of each city in 2013, or sales in December 2012. In this case, you would do either Analysis of Variance or multiple regression. It doesn’t matter much which you do if you code it correctly. Both are specific cases of the general linear model and will give you the same result. You may also want to do a general linear MIXED model, where you have city as a random effect and something else, say, whether the administration was Democratic or Republican, as a fixed effect. In this case, I assume that you have sales as your dependent variable because, contrary to the beliefs of some extremists, political parties do not determine the weather. Generally, whether you use a mixed model or a plain vanilla Ordinary Least Squares (OLS) ANOVA or regression will not have a dramatic impact on your results, unless the result is a grade in a course where the professor REALLY wants you to show that you know that school is a random effect when comparing curricula.

4. Still here? I’m guessing you have one of two other common designs. The first is that you have measured the same subjects, stores, cities, whatever, more than once. Most commonly, it is the good old pretest-posttest design and you have an experimental and a control group. You want to know if your treatment works. If you have only tested your people twice, you are perfectly fine with a repeated measures ANOVA. If you have tested them more than twice, you are very likely to have grossly violated the assumption of compound symmetry and I would recommend a mixed model.

5. All righty then, you DO have multiple variables, your dependent variable is NOT categorical or ordinal, and it is NOT repeated, so you must have multiple dependent variables. In that case, you would do a multivariate Analysis of Variance (MANOVA).
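Back to question 3 for a second, and the claim that ANOVA and regression give you the same result if you code it correctly. Here is a minimal sketch of that, in plain Python with no stats package. The sales figures are made up for illustration, and I am using the simplest possible case: one factor with two groups, coded as a 0/1 dummy variable for the regression.

```python
from statistics import mean

# Hypothetical December sales (in $1000s); these numbers are invented.
sales = {
    "Democratic": [12.0, 15.0, 11.0, 14.0],
    "Republican": [10.0, 9.0, 13.0, 8.0],
}


def anova_f(groups):
    """One-way ANOVA F from between- and within-group sums of squares."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def regression_f(x, y):
    """F test for the slope in a simple OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    predicted = [intercept + slope * xi for xi in x]
    ss_model = sum((p - my) ** 2 for p in predicted)
    ss_resid = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    return (ss_model / 1) / (ss_resid / (len(y) - 2))


# Dummy coding: 0 = Democratic, 1 = Republican.
x = [0.0] * len(sales["Democratic"]) + [1.0] * len(sales["Republican"])
y = sales["Democratic"] + sales["Republican"]

f_anova = anova_f(list(sales.values()))
f_reg = regression_f(x, y)
print(f"ANOVA F = {f_anova:.4f}, regression F = {f_reg:.4f}")
```

The two F values match because the dummy-coded regression predicts each group's mean, which is exactly what the ANOVA compares. With more than two groups you need more dummy variables and a multiple regression, but the same logic holds.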

Some might argue that logistic regression is not a multivariate design. Other people would argue with them that, assuming your data are multinomial, you need multiple logit functions, so it really is a type of multivariate design. A third group of people would say it is multivariate in the ordinal or multinomial case because there are multiple possible outcomes.

Personally, I wonder about all of those types of people. I wonder about the amount of time in higher education spent in forcing students to learn answers to questions that have no real use or purpose as far as I can see.

On the other hand, while knowing whether something falls in the multivariate category or not probably won’t impact your life or analyses, if you treat time as just another independent variable and analyze your repeated measures data as an ordinary 2 x 2 ANOVA of time by condition, you’re screwed.
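To make that last point concrete, here is a rough sketch in plain Python of the same pretest-posttest data analyzed both ways. The scores are invented, the groups are assumed to be the same size, and the variable names are mine. The "right" version relies on a standard result: with only two time points, the condition-by-time interaction in a repeated measures ANOVA is the same test as comparing gain scores between the two groups.

```python
from statistics import mean

# Hypothetical (pretest, posttest) scores; each pair is ONE person.
experimental = [(10, 15), (12, 18), (9, 13), (11, 17)]
control = [(10, 11), (13, 13), (8, 10), (11, 12)]
n = len(experimental)  # people per group; this sketch assumes equal sizes


def ss(values):
    """Sum of squared deviations from the mean."""
    center = mean(values)
    return sum((v - center) ** 2 for v in values)


pre_e = [pre for pre, _ in experimental]
post_e = [post for _, post in experimental]
pre_c = [pre for pre, _ in control]
post_c = [post for _, post in control]
gain_e = [post - pre for pre, post in experimental]
gain_c = [post - pre for pre, post in control]

# Condition x time interaction: did the experimental group gain more?
contrast = mean(gain_e) - mean(gain_c)
ss_interaction = n * contrast ** 2 / 4  # balanced 2 x 2 design, 1 df

# WRONG: treat the 16 scores as if they came from 16 different people.
df_wrong = 4 * n - 4
ms_error_wrong = (ss(pre_e) + ss(post_e) + ss(pre_c) + ss(post_c)) / df_wrong
f_wrong = ss_interaction / ms_error_wrong

# RIGHT: the error term is how much people in the same group differ
# in their CHANGE from pretest to posttest.
df_right = 2 * n - 2
ms_error_right = ((ss(gain_e) + ss(gain_c)) / 2) / df_right
f_right = ss_interaction / ms_error_right

print(f"wrong: F(1, {df_wrong}) = {f_wrong:.2f}")
print(f"right: F(1, {df_right}) = {f_right:.2f}")
```

With these made-up numbers, the wrong analysis gives a much smaller F, because stable differences between people get dumped into the error term. Your data may not err in the same direction, but either way the error term and the degrees of freedom do not describe the design you actually ran.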