statistics

Interesting thoughts from JSM

By AnnMaria De Mars, July 30, 2012

Several interesting random thoughts from JSM:

A session by Freeda Cooner entitled “Bayesian statistics are powerful, not magical” discussed ways in which Bayes results could be slanted (one hopes unwittingly). One point worth repeating is that the validity of the results assumes accurate priors. Kind of obvious, no? Yet, the question is, where did you get your prior probabilities? Did you base them on studies of this drug in adults when your current study is of children? Did you base them on studies of a “similar” drug when this is a study of a new drug?

As I said, when you think about this point, it is kind of obvious but I suspect people don’t think about it often enough.

A second interesting point was made by Milo Schield about “causal heterogeneity”. When we test a new treatment, we like to think that those who survive do so because of the treatment (“saved”) and those who die do so as a result of the failure of the treatment (“killed”). That is, we act as if there are only two categories. In reality, he says, there are four groups. In addition to the “saved” and “killed” groups, there are those who would have lived regardless (“immune”) and those who would have died regardless (“doomed”).
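To make the four groups concrete, here is a minimal simulation sketch. It is my own illustration, not anything Schield presented. It assumes one particular reading of the labels (“saved” survive only if treated, “killed” survive only if untreated), and the group shares in `population_a` and `population_b` are made-up numbers. The point it shows: two populations with very different mixes of the four groups can produce the same survival rates in both arms of a randomized trial.

```python
import random

# Potential outcomes per group: (survives if treated, survives if untreated).
# This mapping is an assumed reading of the four labels, for illustration only.
OUTCOMES = {
    "immune": (True, True),
    "doomed": (False, False),
    "saved": (True, False),
    "killed": (False, True),
}

def simulate_trial(shares, n=40000, seed=0):
    """Randomize n people to treatment or control; return survival rate per arm."""
    rng = random.Random(seed)
    groups = list(shares)
    weights = [shares[g] for g in groups]
    survived = {"treated": 0, "control": 0}
    counts = {"treated": 0, "control": 0}
    for _ in range(n):
        group = rng.choices(groups, weights=weights)[0]
        arm = "treated" if rng.random() < 0.5 else "control"
        if_treated, if_untreated = OUTCOMES[group]
        lives = if_treated if arm == "treated" else if_untreated
        counts[arm] += 1
        survived[arm] += lives
    return {arm: survived[arm] / counts[arm] for arm in counts}

# Hypothetical group shares; both imply 70% survival treated, 50% untreated.
population_a = {"immune": 0.5, "doomed": 0.3, "saved": 0.2, "killed": 0.0}
population_b = {"immune": 0.3, "doomed": 0.1, "saved": 0.4, "killed": 0.2}

rates_a = simulate_trial(population_a, seed=1)
rates_b = simulate_trial(population_b, seed=2)
print("Population A:", rates_a)
print("Population B:", rates_b)
```

Under these assumptions the treated survival rate is immune + saved and the control rate is immune + killed, so the observed difference only tells you saved minus killed, not the size of either group.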

Another point by Schield was that although we always say that correlation does not mean causation, we almost always give examples of confounding variables. We say, for example, that although ice cream sales go up along with violent crime, eating ice cream doesn’t cause you to go after your neighbor with a baseball bat, unless perhaps your neighbor is spied eating ice cream off of your spouse. However, p-values are about chance. We really spend most of our introductory statistics courses testing whether or not observed relationships are a coincidence, so we should emphasize guarding against coincidence more than confounding.
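If you want to show students what “coincidence” looks like, a small simulation helps. This is a sketch of my own, not something from the talk: it draws two completely unrelated normal variables with 10 observations each, many times over, and counts how often the correlation is at least as large as 0.632 in absolute value (roughly the two-tailed critical value of r at the .05 level for n = 10). The function names and defaults are just illustrative choices.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def chance_rate(n_obs=10, threshold=0.632, trials=4000, seed=1):
    """Share of trials in which two independent variables look 'significantly' related."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_obs)]
        ys = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(xs, ys)) >= threshold:
            hits += 1
    return hits / trials

rate = chance_rate()
print(f"Unrelated variables passed the threshold in {rate:.1%} of trials")
```

The share should land near 5%, which is the whole point of the .05 level: with no relationship at all, about one sample in twenty will still look like something.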

Personally, I think I do talk about this a lot, so if you do not, feel shame.

Another really interesting idea came from Chris Fonnesbeck. He was discussing putting his code up on GitHub and I thought,

“Why don’t I do that? Why don’t other people in our company?”

I hate to admit that we had just never taken the time to do it, for which I now feel guilty, because I do look for code on GitHub occasionally and, more often, just browse it looking for interesting ideas, or for the hell of it.

Speaking of @fonnesbeck, I met both him and @randomjohn from Twitter tonight. I feel smarter just having been around them.

4 Comments

Josh M says:

July 30, 2012 at 11:16 pm

Do you spend much/any time on Monte Carlo methods? One of the more amusing things I’ve done in stats/probabilities is to implement Monte Carlo solutions for most of “Fifty Challenging Problems in Probability” (by Mosteller). His solutions are involved, beautiful, nuanced applications of basic probabilities. My solutions are a couple dozen ugly lines of ruby. Yet, with enough iterations*, they converge to many decimal places.

It seems like this is something that we don’t drag enough students through, since it’s often a more effective/efficient solution for much of the “statistics” work we do in industry (I’m in machine learning/”data science”).

But, I may only be mentioning this because we just hired a guy with a fresh Master’s degree, and I’ve had to drag him kicking and screaming into just writing simulations, instead of spending hours at the whiteboard.

* How many iterations are needed to get a certain level of precision? Well, that’s just a matter of running a lot of runs of simulations, and seeing how they settle out! It’s dice rolls all the way down.
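
For readers who have not tried this, here is a minimal sketch of the approach, in Python rather than ruby. It is not Josh’s code. It uses the classic birthday problem as a stand-in, ignores leap years, and compares the simulated answer with the exact one. The standard error line is the usual binomial formula, which shrinks with the square root of the number of trials, so each extra decimal place costs roughly 100 times more iterations.

```python
import math
import random

def has_shared_birthday(group_size, rng):
    """Draw random birthdays (365 equally likely days) and check for a repeat."""
    days = [rng.randrange(365) for _ in range(group_size)]
    return len(set(days)) < group_size

def estimate(group_size=23, trials=20000, seed=42):
    """Monte Carlo estimate of P(shared birthday) with its standard error."""
    rng = random.Random(seed)
    hits = sum(has_shared_birthday(group_size, rng) for _ in range(trials))
    p = hits / trials
    std_err = math.sqrt(p * (1 - p) / trials)
    return p, std_err

def exact(group_size=23):
    """Exact probability: 1 minus the chance that all birthdays differ."""
    p_none = 1.0
    for k in range(group_size):
        p_none *= (365 - k) / 365
    return 1 - p_none

p_hat, std_err = estimate()
print(f"simulated: {p_hat:.4f} (standard error about {std_err:.4f})")
print(f"exact:     {exact():.4f}")
```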
Annmaria says:

August 1, 2012 at 10:43 pm

That does sound really fun, but sadly it is one of the many things I don’t have time for at the moment.
John Johnson says:

August 4, 2012 at 2:32 pm

Awwww, thank you! I wish you could have been at Rick Wicklin’s roundtable!

A couple of random thoughts:

1. Many intro stats classes would be better off if shown (and asked to discuss) this:
http://dilbert.com/strips/comic/2011-11-28/

2. The addition of “immune” and “doomed” groups is part of a discussion by Angrist and Rubin (principal stratification) that appears to be catching on, but that I don’t quite understand well enough. It’s related to the notion of causality developed by Pearl (in one camp) and Rubin (in another, hopefully reconcilable with Pearl’s), which I’ve really had to study for a couple of different applications (cancer vaccines in one area, observational research in another) and still don’t understand well enough.
John Johnson says:

August 4, 2012 at 2:38 pm

One more thought — the thing I like about Bayesian statistics is that you can characterize how bullheaded you have to be to reject a study’s results. The smaller the variance on your prior, the more you believe it, and the harder the likelihood has to work to overcome it. On the flip side, if you put a huge variance on your prior, you don’t formally show faith in it, and the posterior looks more like the likelihood. There is this interesting theory that characterizes priors in terms of an effective sample size they represent. If you don’t have true prior data or some justification for your prior’s effective sample size, you are placing too much faith in it. It’s not a perfect system, but a useful way of recasting priors in a language that makes you (hopefully) stop and think about it.
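
The effective sample size idea is easiest to see with a Beta prior on a proportion. The sketch below is an illustration under stated assumptions, not John’s method: it uses the standard Beta-binomial result that a Beta(a, b) prior behaves like a + b earlier observations, and the prior mean of 0.60 and the data (12 successes in 40 trials) are made-up numbers. `posterior_mean` is a hypothetical helper name.

```python
def posterior_mean(prior_mean, prior_ess, successes, trials):
    """Posterior mean of a proportion under a Beta prior.

    The prior is Beta(a, b) with a = prior_mean * prior_ess and
    b = (1 - prior_mean) * prior_ess, so a + b is its effective sample size.
    """
    a = prior_mean * prior_ess
    b = (1 - prior_mean) * prior_ess
    return (a + successes) / (a + b + trials)

# Made-up example: the prior says 60%, the new study observes 12 of 40 (30%).
successes, trials = 12, 40
for prior_ess in (2, 40, 400):
    post = posterior_mean(0.60, prior_ess, successes, trials)
    print(f"prior worth {prior_ess:>3} observations -> posterior mean {post:.3f}")
```

With a prior worth 2 observations the posterior sits close to the data; with a prior worth 400 it barely moves from 0.60. Stating the prior as “this is worth 400 patients” makes it much harder to slip in a strong prior without justifying it.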
