# What you need to know before multivariate statistics

You might have gotten the misimpression from my previous post, where I said I don’t think students need to learn all that much matrix algebra, that I am a slacker as far as expecting students to come to courses with some prior knowledge. That’s not exactly the case. In fact, here are some things I just assume students coming into a multivariate statistics course should know. Some textbooks begin with these, but, well, all I can say is if you have had three statistics courses and you still don’t know what a covariance is, I think something has gone awry in your education.

- Know the equation to compute variance – it’s pretty darn basic – and have a really good understanding of interpreting variance, such as what 0 variance means and the statistical and practical interpretation of explained variance. I personally view science as the search for explained variance.
- REALLY understand covariance – that is, know how it is calculated, that it is a measure of linear relationship, and that independent variables have a covariance of 0 but a covariance of 0 does not always signify independence.
- Be able to interpret a correlation.
- Have a basic grasp of the Central Limit Theorem and the difference between population values and sample statistics.
- Understand what a chi-square is, how you get it and how you interpret it.
- Remember the definition and interpretation of an F-test.
- Understand the difference between statistical significance and effect size.
- Know what a null hypothesis is and what it means to test it.
- Realize that before you do ANYTHING with data, if you don’t check the data coding and quality you are an idiot. You should have some understanding of how to read a codebook and be able to compute a frequency distribution, descriptive statistics and data description (like a PROC CONTENTS with SAS). When I look at the scant attention many so-called researchers pay to issues like missing data, miscoded data and non-random sampling, I am surprised we’re ever able to replicate anything.
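
For anyone who wants to check themselves on the first two items, here is a minimal sketch in plain Python. The function names and the toy numbers are my own assumptions for illustration, not from any dataset or textbook. It computes the sample variance and sample covariance by hand, and shows a case where the covariance is 0 even though one variable is completely determined by the other.

```python
from statistics import mean


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Made-up illustrative data (an assumption, not real data).
constant = [5, 5, 5, 5]      # every value identical
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]      # y depends entirely on x, but not linearly

var_constant = sample_variance(constant)  # no spread, so variance is 0
var_x = sample_variance(x)
cov_xy = sample_covariance(x, y)          # 0, despite total dependence

print(var_constant, var_x, cov_xy)
```

The point of the last case is that covariance only picks up linear relationships: `y` here is a perfect function of `x`, yet the positive and negative cross-products cancel out.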

Diving into MANOVA was really what I wanted to blog about next, so maybe I will actually get to that in the context of analyzing missing data, but having failed already at my attempt to leave my desk before midnight, that will have to wait until next time.

Having found no significant differences between the missing and non-missing data, as I’d expected, I went on to do a couple more analyses where I was quite surprised not to find differences, but that will also have to wait for next time. I’m really only mentioning it here so I don’t forget. Wouldn’t you think that there would be differences in hospital length of stay and age by race and region? Well, I would, but I was wrong.

On a random note, I have to say, I really do love this remote desktop setup for teaching. It solves the problem of whether students have Windows or Mac, and of having to get the needed software installed. All the way around, I love it.