The Multivariate Social Scientist: Book Review & Notes on Generalized Linear Models

I’ve been looking high and low for a supplemental text for a course on multivariate statistics, and I found this one:

The Multivariate Social Scientist, by Graeme Hutcheson & Nick Sofroniou

They are big proponents of generalized linear models. In fact, the subtitle is “Introductory statistics using generalized linear models”, so if you don’t like generalized linear models, you won’t like this book.

I liked this book a lot. Because this is a random blog, here is day one of my random notes.

A generalized linear model has three components:

  • The random component is the probability distribution assumed to underlie the response variable (y).
  • The systematic component is the fixed structure of the explanatory variables, usually linear (x1, x2 … xn).
  • The link function maps the systematic component onto the random component.

The systematic component takes the form

η = α + β₁x₁ + β₂x₂ + … + βₙxₙ
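
If it helps to see that as code, here is a minimal Python sketch of the systematic component as a plain weighted sum. The function name linear_predictor and all of the numbers are made up for illustration; none of them come from the book.

```python
def linear_predictor(alpha, betas, xs):
    """Return eta = alpha + beta_1*x_1 + ... + beta_n*x_n."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Hypothetical intercept, coefficients, and one observation's x values
eta = linear_predictor(alpha=1.0, betas=[0.5, -2.0], xs=[4.0, 1.0])
print(eta)  # 1.0 + 0.5*4.0 + (-2.0)*1.0 = 1.0
```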

They use η to designate the predicted variable instead of y-hat. I know you were dying to know that.

Obviously, since that IS a multiple regression equation (which could also be used for ANOVA), when you have linear regression, the link function is simply the identity. With logistic regression, it is the logit function, which maps the log odds of the random component onto the systematic one.
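
To make the two links concrete, here is a rough sketch in Python using only the standard library. The function names are my own, not the book's, and the value of eta is an arbitrary example. With the identity link the linear predictor is already the prediction; with the logit link you have to undo the log odds to get back to a probability.

```python
import math


def identity_link(mu):
    """Identity link (linear regression): eta is just the mean."""
    return mu


def logit_link(p):
    """Logit link (logistic regression): log odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Map a value on the log-odds scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


eta = 0.0  # arbitrary example value of the linear predictor
print(identity_link(eta))  # linear regression: the prediction is eta itself
print(inverse_logit(eta))  # logistic regression: log odds of 0 means p = 0.5
```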

The reason I think this is such a good book for students taking a multivariate statistics course is that it relates to what they should know.  They certainly should be familiar with multiple regression and logistic regression, and understand that the log of the odds is used in the latter.

The book also discusses the log link used in loglinear analyses, which I don’t necessarily assume every student will have used. I don’t say that as a criticism, merely an observation.
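
For students who haven’t met the log link, a small sketch may help. This one assumes a single explanatory variable and made-up coefficients, and the function name expected_count is hypothetical. The idea is that the linear predictor lives on the log scale, so you exponentiate it to get an expected count, and a one-unit change in x multiplies that count by exp(β).

```python
import math


def expected_count(alpha, beta, x):
    """Inverse of the log link: mu = exp(eta), with eta = alpha + beta * x."""
    eta = alpha + beta * x
    return math.exp(eta)


# Hypothetical coefficients, chosen only for illustration
alpha, beta = 2.0, 0.5
mu_at_0 = expected_count(alpha, beta, 0)
mu_at_1 = expected_count(alpha, beta, 1)

# The ratio of the two expected counts equals exp(beta)
print(mu_at_1 / mu_at_0, math.exp(beta))
```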





  1. eta is the linear predictor but not on the scale of the outcome variable. So y_hat is inv_link(eta), not eta.

  2. It depends. If you are thinking in terms of multiple regression, which is where most students begin the course, then they are used to seeing that equation = y_hat because the link function is identity, and that equation + error = the actual y.

    You’re right, though, that the whole point of generalized models is to generalize beyond that.

    As you’ve probably guessed, sometimes I write this blog as sort of thinking out loud while working on lecture notes for an upcoming class. I’m assuming that many students will be used to seeing the same equation in a different context.

    Your point helps clarify it, though. Thank you.
