It’s also not just so you can get your own original, signed illustration of the difference between ordinary least squares and maximum entropy methods from Don from SAS,
I was very pleasantly surprised to learn more than I expected at WUSS this year. I was aware of the GLMSELECT procedure for selecting the best-fitting model, but I had not actually used it. Funda Gunes, from SAS, gave a great talk on model selection methods. To summarize the hour: you create 1,000 or so bootstrapped samples, fit a model to each of those, and then average the coefficient estimates from the 1,000 models. This is the best model not in the stepwise regression sense of giving you the highest explained variance, but in the sense of being most likely to correctly reflect the population values. That is a GROSS over-simplification, but if you have any interest in model selection techniques, I highly recommend you download and read her paper, which should be available in the conference proceedings when they are eventually published on the WUSS site.
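For anyone who wants to see the bootstrap-and-average idea in miniature, here is a rough Python sketch. It is not Funda's method and not what PROC GLMSELECT does internally. The simulated data, the function names, and the crude "keep predictors with |t| > 2" selection rule are all assumptions made up for illustration. A predictor that is not selected in a given bootstrap sample contributes a coefficient of zero to the average.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients and t-statistics (X must include the intercept column)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

def select_and_fit(X, y, t_cut=2.0):
    """Hypothetical one-pass selection: keep the intercept plus predictors with |t| > t_cut, then refit."""
    _, t = fit_ols(X, y)
    keep = np.abs(t) > t_cut
    keep[0] = True  # always keep the intercept
    beta = np.zeros(X.shape[1])  # unselected predictors stay at zero
    beta[keep], _ = fit_ols(X[:, keep], y)
    return beta, keep

def bootstrap_average(X, y, n_boot=1000, seed=0):
    """Average the selected-model coefficients over n_boot bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    betas = np.zeros((n_boot, X.shape[1]))
    kept = np.zeros((n_boot, X.shape[1]), dtype=bool)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        betas[b], kept[b] = select_and_fit(X[idx], y[idx])
    return betas.mean(axis=0), kept.mean(axis=0)

# Simulated data (an assumption for the demo): two real predictors, two pure noise.
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
true_beta = np.array([1.0, 2.0, 0.0, -1.5, 0.0])
y = X @ true_beta + rng.normal(size=n)

avg_beta, sel_freq = bootstrap_average(X, y)
print("averaged coefficients:", np.round(avg_beta, 3))
print("selection frequency:  ", np.round(sel_freq, 3))
```

The selection frequencies are a handy by-product: a predictor that survives in nearly every bootstrap sample is a lot more convincing than one that shows up a third of the time.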
A second good paper on model selection was by Scott Leslie, pretty much the polar opposite of Funda’s on the technical side. He showed a series of ROC curves to illustrate the gradual (or sometimes substantial) improvement in a model as new predictors were added. He ended with a discussion of what might be better predictors of adherence to a prescribed medication regimen and how you would get those data.
In Kechen Zhao’s presentation, I learned about using PROC GENMOD to compare four different model types: logistic, log-binomial, Poisson and modified Poisson. He discussed relative risk as a variable of interest versus odds ratios, and the fact that logistic regression in particular can produce substantially different estimates than the other models. This is worth a whole post in itself that I will try to get to next week.
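To see why an odds ratio (what logistic regression gives you) can drift away from the relative risk, a little arithmetic on a 2x2 table helps. The counts below are invented for illustration and are not from the presentation. The two measures nearly agree when the outcome is rare and diverge when it is common.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed group divided by odds in the unexposed group."""
    return (a * d) / (b * c)

# Common outcome (made-up counts): 60 of 100 exposed vs. 40 of 100 unexposed.
rr_common = relative_risk(60, 40, 40, 60)
or_common = odds_ratio(60, 40, 40, 60)

# Rare outcome (made-up counts): 6 of 100 exposed vs. 4 of 100 unexposed.
rr_rare = relative_risk(6, 94, 4, 96)
or_rare = odds_ratio(6, 94, 4, 96)

print(f"common outcome: RR = {rr_common:.2f}, OR = {or_common:.2f}")
print(f"rare outcome:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
```

In both tables the relative risk is 1.5, but the odds ratio is 2.25 for the common outcome and about 1.53 for the rare one, which is the kind of gap that makes the choice of model matter.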
As added icing on the cake, in a session by Marie Bowman-Davis I learned about a public use data set, the California Health Interview Survey. (I did not know these data were available for public use, and they are obviously a great resource for teaching.)
Despite all of these good things, I left the conference a bit concerned about the future of SAS – the average age of attendees at the conference was probably over 50. More about why that is and why that’s a problem later, since this post is already long enough and I have actual work to do.