# Random non-parametric thingies: This is your brain on stats

Here is how the Wald statistic works: You divide the maximum likelihood coefficient estimate by its standard error and square the result.

If you wanted to be really specific about it, what you are dividing is the difference between the obtained coefficient estimate and your hypothesized value. I would say, though, that 99% of the time the hypothesis you are testing is that the coefficient is zero, that is, that the independent variable has zero effect on the outcome variable. Since the coefficient estimate minus zero is the coefficient estimate, it is actually simpler, although somewhat less accurate, to state it the way that I just did.
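For anyone who would rather see that as arithmetic, here is a minimal sketch in Python. The function names and the numbers are made up for illustration and do not come from any particular package. The p-value line assumes the usual large-sample result that the Wald statistic for a single coefficient follows a chi-square distribution with 1 degree of freedom.

```python
import math

def wald_statistic(estimate, std_error, hypothesized=0.0):
    """Square of (estimate minus hypothesized value) divided by the standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    z = (estimate - hypothesized) / std_error
    return z ** 2

def wald_p_value(wald):
    """Upper-tail probability under a chi-square distribution with 1 df.

    With 1 df, P(chi-square > w) equals P(|Z| > sqrt(w)) for a standard
    normal Z, which can be written as erfc(sqrt(w / 2)).
    """
    return math.erfc(math.sqrt(wald / 2.0))

# Hypothetical numbers: coefficient of 0.50 with a standard error of 0.25,
# tested against the usual hypothesized value of zero.
w = wald_statistic(0.50, 0.25)
p = wald_p_value(w)
print(f"Wald = {w:.2f}, p = {p:.4f}")
```

Passing something other than zero as `hypothesized` gives you the "really specific" version; leaving it at the default gives the everyday version.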

In my experience, people who use discriminant function analysis and logistic regression usually differ in their intent. Discriminant function analysis attempts to sort people into two (or more) groups. Logistic regression predicts the probability of an individual being in a specific group.

People who use discriminant function analysis are often interested in predicting, for example, who will drop dead of a heart attack and who won’t. If they find that 80% of those who drop dead can be predicted correctly, and 77% of those who don’t can also be predicted correctly using a combination of education, the Selye Stress Scale and how many times a year the patient eats liver with onions, then they are happy. (Topic for future research – why would anyone eat liver? It tastes totally gross.)

People who use logistic regression are often almost as interested in the relative effects of the predictors as they are in the overall model. So, they are happy to know that the pseudo-R² is .35 but they are at least as interested in knowing that the coefficient for Stress is positive and substantially higher than the coefficient for education, while the coefficient for liver (no matter how gross it may taste) is non-significant.

From a statistical standpoint, the major difference between discriminant function analysis and logistic regression is that discriminant function analysis makes a lot of assumptions about the distribution of the independent (i.e., predictor) variables, specifically that these are normally distributed and linearly related to the dependent variable. Logistic regression does not make these assumptions.

So, for the person on SAS community who said that for the next Los Angeles Basin SAS Users Group they would like a discussion of non-parametrics so easy a hamster could understand it (BYOH – bring your own hamster) – this was the best I could do on a Friday afternoon.

And yes, I do know that is not a hamster, but all I had hanging around was a guinea pig named Edward G. Robinson and a spare cockatiel.

# What’s next? The most interesting statistical problem?


I will be finishing reading thousands of pages of grants and spending a few days on grant reviews. A grant I have been working on is almost done. The semester is almost over. I have two articles I submitted to journals under review. So… the question is, what’s next?

I thought about trying to make the deadline for the Western Users of SAS Software conference, but there was just no time. Besides, like most people my age, I have done so many dozens of conference papers that I don’t even list them all on my resume. I just pick a dozen or so sample topics, each of which I have probably done five or ten times.

Here is what I am thinking about:
1. Writing a final article on on-line education for people with disabilities on American Indian reservations. This was one of the craziest ideas ever, including individuals with mental retardation and reading disabilities. How do you have a web-based course with people who can’t read, for crying out loud? And no, we did not use videos. When the results came in, we were jumping up and down excited. The data collection was completed over a year ago and I still haven’t written this up. Hey, I did two other articles, taught classes, reviewed grants, etc.

2. Writing an article or two on the ten years of data on training teachers of English language learners. This includes some really interesting qualitative data on what makes the best teachers. There is also the standard stuff on GPA and test scores. The main question is – what characteristics are shared by those teachers who are the best of the best, the type we remember 20 years later?

3. Writing up data on an after-school tutoring program for hundreds of kids, which at first glance seemed to have failed but I think it actually sort of succeeded. The data were a total mess when I received them, but what I THINK happened was that many of the kids went to tutoring only rarely and those who did go to at least X hours showed improvement. The most interesting question here is to find X.

4. Analyzing qualitative data from interviews of 30 Native American parents of children with disabilities about how they first found out about their child’s diagnosis, and the experiences they had with the school personnel and other professionals.

5. Doing something completely different and working on a design I am interested in right now using a combination of social network analysis and proportional hazards models to predict the movement from casual use through abuse to compulsion for youth using alcohol and other drugs.

6. Writing a book on SAS Enterprise Guide as a tool for researchers.

Because I am clearly all over the map here and I have a lot of data that is not being used, I think what I might do is write the book and use each of 1-5 as an example problem. That way, I will have the first draft of part of each article written along with the book. It will also show how you can apply EG to lots of different research problems.

This undoubtedly makes me sound as if my research interests are all over the map, and they are. This doesn’t even include the evaluation reports I am being paid to do. Still, reading these grants, I recognize the names of some of the same people who have been doing the same type of research for the past 15 or 20 years. Some people might call it having a passion for the topic. I call it boring. I don’t care if I was on the French Riviera studying the impact of cocaine on beach-side sexual behavior of porn stars and my covariate was the quality of champagne sipped by the researcher while watching. I’d still be bored with it way before 15 years.

Disclaimer: I don’t know if porn stars actually vacation on the French Riviera, so if you go there and are disappointed, don’t blame me.

# My Favorite Things – not so secret documents, part 2

Because I am no fun at all, as any of my children can tell you, I do not have a post for April Fool’s Day (although you should totally check out the blog on SAS for the Wii).

Being counter-cyclical can be good at times, for example, making money on your investments last year. Bucking the trend today, I thought I would be serious and semi-useful.

Although aforementioned children refer to email as “for old people”, there are still some tremendous resources out there I would have thought all humans and even most naked mole rats would have heard about by now. Photo supplied for those of you wondering, “WTF is a naked mole rat and what does one look like?”
I am nothing if not service-oriented.

The SPSSX-L archives and their not-at-all-evil twin, the SAS-L archives, can be the one and only location for finding the answer to that incredibly esoteric question about multinomial logistic regression using complex samples with dates entered in Roman numerals. I’d recommend using dates to restrict your search because both of those go back many years, apparently justifying the youngsters’ belief that email is for old people. If you don’t restrict your search, you may get a lot of results that are no longer relevant, e.g., links to macros to do something with SPSS 11.0 or SAS 8.2 that can now be done much more simply using menus or a newer PROC step.

# Statistical Software – The Secret Documents, Part 1

All right, well maybe they are not that secret, but there are some really great resources out there that you may want to check out.

If you didn’t get to the SPSS Higher Education Road Show at UCLA because rush hour in Los Angeles is from 7-9, 4-6 and any time you are on the 405, go here and check out the PowerPoint slides.