Statistics is definitely in.
In the last month I’ve gotten three invitations to “tech events” from organizations which will remain nameless on the grounds that I may want to do business with them some time in the current century. All of them wanted to do something with statistics. And apps. And social media. And whatever that other buzzword was. Maybe ketchup. Or Simpsons.
It reminded me of a Simpsons episode that I could not avoid watching. (What do rocket scientists do when they are not doing rocket science? They sit in the living room and watch re-runs of the Simpsons over and over. Trust me. I know.)
Homer decides to open an Internet company. The comic book guy comes in as a customer and starts asking about T1 lines, backing up his hard drive on the server and upload and download speed. Homer stares at him for a few seconds and then asks plaintively,
“Can I have money now?”
Don’t get me wrong. I LOVE statistics. I think World Statistics Day is a wonderful idea. I think treasure troves like data.gov, the National Center for Education Statistics, the U.S. Census Bureau and the Inter-university Consortium for Political and Social Research are wonderful-squared. The next best thing to sex.
HOWEVER… There are a lot of other data/statistics/tech/I-don’t-know-what-the-hell-they-are-doing events that are the next best thing to sex with a ketchup packet, with which they have a lot in common.
I must make three points that were not evident to the Homer Simpson wanna-bes.
- You are incorrect in the point you try to make in your invitations that, “We need to try all of these things or we won’t know if they work or not.” Unlike a lot of other things I refer to on this blog, I have never actually tried sex with a ketchup packet and yet, having some knowledge of both sex and ketchup packets, I can still tell you that combining the two is very likely not going to meet with success. Similarly, even though action video games where you kill simulated people in graphic ways, data mining and iPhone apps are all big, having a two-day hack-a-thon where we all drink vodka and Red Bull while trying to create a Lethal Weapon Neural Network app is a bad idea. I’m not coming.
- No matter what you tell yourselves, most statisticians are really not all that much sexier than Homer Simpson. As photographic evidence, I have enclosed the following, uh, photograph.
Guess which one is the statistician? I’ll give you a hint. It is not the one with the pierced navel nor the one wearing a dress which caused her sister to punch her, which caused the punched sister to holler “Mom!” which caused the punching sister to mumble defensively, “Well, I THOUGHT they were padded.”
- Statistics takes a lot of time and effort to get right. You can’t just throw out an invitation, have a bunch of people come into a room and prove that the Central Limit Theorem is incorrect and the answer to every statistics problem is, in fact, six. Let me give you a very simple example. I’ve been playing with some FCC data that I downloaded from data.gov. I wanted to do more with it than just create some maps but the data wasn’t in the format I needed.
So… I wanted to merge some datasets together and look at relationships between variables like the percentage of cross-owned stations and county population. Unfortunately, the variables weren’t coded identically. Here is just one tiny example. Some datasets would have a city listed for media area, such as Dallas. Others would have DALLAS and others still had “Dallas, ETC.” meaning Dallas and surrounding cities. So, I wrote a macro that converted city to uppercase and stripped out the etceteras. Actually, it would work without that IF statement, but it would leave messages in my SAS log and that would be messy.
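libname in "e:\fccstuff" ;
%macro fx(dsn) ;
  /* Standardize city names in one dataset so it can be merged by city */
  data &dsn ;
    set in.&dsn ;
    city = upcase(city) ;
    /* Position of the ", ETC." suffix, or 0 if it is not there */
    fnd = index(city, ", ETC.") ;
    /* Keep only the text before the comma */
    if fnd > 0 then city = substr(city, 1, fnd - 1) ;
    drop fnd ;
  run ;
  proc sort data = &dsn ;
    by city ;
  run ;
%mend fx ;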
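Once every dataset has been through the macro, the merge itself might look something like the sketch below. The dataset names (stations, counties) and the variable names (pct_cross_owned, county_pop) are made up for illustration; the actual FCC files will have their own names. It also assumes city alone identifies a row, which will not hold if the same city name shows up in more than one state.

```sas
/* Hypothetical dataset names: substitute the actual FCC files */
%fx(stations) ;
%fx(counties) ;

data merged ;
  merge stations (in = in_stations)
        counties (in = in_counties) ;
  by city ;
  /* Keep only the cities found in both datasets */
  if in_stations and in_counties ;
run ;

/* Hypothetical variable names */
proc corr data = merged ;
  var pct_cross_owned county_pop ;
run ;
```

Dropping the subsetting IF keeps the unmatched cities too, which is a handy way to spot any names the cleanup missed.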
My point is that even if you put the data out there, which is all kinds of wonderful, someone still needs more than rudimentary knowledge to put it together and get more than rudimentary statistics out of it.
It’s because issues like this arise far more often than not that getting to KNOW your data is a crucial part of statistics. That’s something that could happen in 24 hours, if you didn’t sleep, but then you still haven’t gotten to the statistical analysis part.
And maybe you shouldn’t. I read several articles the past few days where the statistics were acceptable, basic multiple regression type of stuff. However, when I read the part before that, where they collected the data, I actually said out loud,
“What are you doing? Your sampling is the equivalent of going outside and collecting rain drops and generalizing from that to the Pacific Ocean, or asking six drunk guys at a bar and predicting the general election! Don’t you know that?!”
The only ones around were the two frogs in the tank on my desk, Type I and Type II. They didn’t answer but I am pretty sure they could have designed a study equally well.
You can’t just skip over the sampling part and report on regression coefficients and t-statistics and think that it won’t be noticed.
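To make the six-drunk-guys problem concrete, here is a small simulation sketch. Everything in it is invented for illustration: the population size, the 5% of voters who are in the bar, and the support rates of 80% inside the bar and 45% outside it. By construction, overall support works out to about 47%, while support among the bar crowd is about 80%.

```sas
/* Simulated electorate: all numbers below are made up for illustration */
data population ;
  call streaminit(2010) ;              /* fixed seed so the run is repeatable */
  do id = 1 to 100000 ;
    in_bar = rand("bernoulli", 0.05) ; /* 1 = the kind of voter you find in the bar */
    if in_bar = 1 then p = 0.80 ;
    else p = 0.45 ;
    vote = rand("bernoulli", p) ;      /* 1 = supports the candidate */
    output ;
  end ;
  drop p ;
run ;

/* Support in the whole population */
proc means data = population mean ;
  var vote ;
run ;

/* Support among bar patrons only: the biased "sample" */
proc means data = population mean ;
  where in_bar = 1 ;
  var vote ;
run ;
```

No amount of regression on the second group gets you back to the first number, because the problem happened before any statistics were computed.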
My oldest daughter, who is not in the picture, being in Cambridge, wouldn’t notice. She said this requirement for great effort, time and actual numbers is the reason that she wouldn’t find statistics interesting even if you combined it with the Simpsons, sex and ketchup packets.
World Statistics Day is NOT followed by Whack Your Children on the Head Day. But it should be.