Categorical data analysis used to be simple. You had two nominal variables and you did a chi-square analysis. If it was statistically significant, that was all it took to make life good.
Then logistic regression came along, with the reasonable notions that:
A. Dichotomous choices such as bought a candy apple/ate tofu & bean sprouts instead, lived/died or voted/didn’t vote were a very far cry from a normal distribution and pretending otherwise was just a bad thing.
B. It could be useful to be able to predict such dichotomous choices from a combination of other variables, not just one, and it would be even nicer if the variables could be categorical, continuous or a combination.
If you are interested, and who wouldn’t be, a good basic discussion of logistic regression can be found on this page at the University of South Florida site.
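To make point B concrete, here is a minimal sketch of how a logistic regression turns a mix of continuous and categorical predictors into a predicted probability. The coefficients, the predictors (age and party membership) and the function name are all made up for illustration; nothing here is estimated from real data.

```python
import math

def predicted_probability(intercept, coefs, values):
    """Logistic regression prediction: P(y = 1) for one case."""
    # Linear predictor on the log-odds scale
    log_odds = intercept + sum(b * x for b, x in zip(coefs, values))
    # The inverse logit squeezes the log-odds into the (0, 1) range
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients: age in years (continuous) and
# registered party member (categorical, coded 0/1)
p_vote = predicted_probability(-2.0, [0.04, 1.1], [50, 1])
print(round(p_vote, 3))
```

The point of the sketch is only that the outcome stays dichotomous while the predictors can be whatever mix you like.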
Just when we got logistic regression settled in, with a pseudo-R squared which made us feel comfortable with something kind of like the ordinary least squares regression we had come to understand, if not exactly know and love, a new complication entered the picture.
What about when you have more than one choice, say voting Democrat, Republican or Independent? Maybe we went from plain old caramel apples or red candy apples to a choice of white chocolate covered, dark chocolate, caramel, M&Ms or Reese’s Pieces.
Enter Multinomial Logistic Regression.
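As a rough sketch of the idea, a multinomial logit gives each choice its own score (a linear predictor) and converts the scores into probabilities that sum to one. The scores below are hypothetical, with one category fixed at zero as the reference, and the function name is my own.

```python
import math

def multinomial_probs(scores):
    """Turn one linear-predictor score per choice into probabilities."""
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(expd.values())
    return {k: e / total for k, e in expd.items()}

# Hypothetical scores for one voter; the reference category is fixed at 0
probs = multinomial_probs(
    {"Democrat": 0.0, "Republican": -0.3, "Independent": -1.2}
)
for party, p in probs.items():
    print(party, round(p, 3))
```

With only two choices this collapses back to ordinary logistic regression.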
Sometimes, though, choices are predicated on one another. For example, I decide that I am going to take a vacation. I like vacations. I go to Travelocity, on which I have spent approximately 8 zillion dollars over the past ten years, and am confronted with three options: within driving distance, Europe, or somewhere with flamingos. Why not? I like flamingos.
Nested Logistic Regression
Nested logit models are used when one choice is based on, or “nested in” another. For example, once I have chosen the “flamingo” option, I can choose between two destinations, Florida and the Bahamas. (Did you know that the flamingo is the national bird of the Bahama Islands? Well, now you do.) Bahamas are nested within the flamingo option, because if I had chosen to take a vacation somewhere within driving distance or Europe, I could not have chosen the Bahamas.
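Here is a sketch of the vacation example using one common formulation of the nested logit: the probability of a destination is the probability of its nest times the probability of the destination within that nest. The utilities and the dissimilarity parameters (`lam`) are invented for illustration, and the two options without sub-choices are treated as one-item nests, which is an assumption of this sketch.

```python
import math

def nested_logit_probs(nests, lam):
    """nests: {nest: {alternative: utility}}; lam: {nest: dissimilarity}."""
    # Inclusive value of each nest: log-sum of the scaled utilities inside it
    inclusive = {
        n: math.log(sum(math.exp(v / lam[n]) for v in alts.values()))
        for n, alts in nests.items()
    }
    denom = sum(math.exp(lam[n] * inclusive[n]) for n in nests)
    probs = {}
    for n, alts in nests.items():
        p_nest = math.exp(lam[n] * inclusive[n]) / denom  # P(nest)
        within = sum(math.exp(v / lam[n]) for v in alts.values())
        for alt, v in alts.items():
            # P(alternative) = P(nest) * P(alternative | nest)
            probs[alt] = p_nest * math.exp(v / lam[n]) / within
    return probs

# Hypothetical utilities; Florida and the Bahamas sit in the flamingo nest
nests = {
    "driving": {"Driving distance": 0.5},
    "europe": {"Europe": 0.2},
    "flamingos": {"Florida": 0.8, "Bahamas": 1.0},
}
lam = {"driving": 1.0, "europe": 1.0, "flamingos": 0.5}
probs = nested_logit_probs(nests, lam)
for place, p in probs.items():
    print(place, round(p, 3))
```

Setting every dissimilarity parameter to 1 makes the nesting irrelevant, and the result matches a plain multinomial logit over all four destinations.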
Both a nested logit model and a multinomial model take advantage of the fact that you have more information than under a simple yes/no model.
Another model that seems to be growing in popularity is the ordered logit model. This is used when your data can be rank-ordered. For example, consider responses to “How likely are you to vote for candidate X?” such as:
Not unless winged monkeys descend from the sky and carry me off to the voting booth.
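A minimal sketch of how an ordered logit handles rank-ordered responses like these: one linear predictor, plus a set of increasing cutpoints that slice the scale into categories. The cutpoints, the predictor value and the choice of four categories are all assumptions made up for this example.

```python
import math

def ordered_logit_probs(linear_predictor, cutpoints):
    """Probabilities for len(cutpoints) + 1 ordered categories.

    cutpoints must be in increasing order.
    """
    # Cumulative P(Y <= j) at each cutpoint, then 1.0 for the top category
    cumulative = [
        1.0 / (1.0 + math.exp(-(c - linear_predictor))) for c in cutpoints
    ]
    cumulative.append(1.0)
    probs, previous = [], 0.0
    for c in cumulative:
        # Each category gets the slice between adjacent cumulative values
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical respondent and cutpoints: four categories, least to most likely
probs = ordered_logit_probs(0.4, [-1.0, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

A higher linear predictor shifts probability toward the top categories without changing their order.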
I will be writing about these models for the next week because they make me happy. For now, though, I am off to meetings.