# Logistic regression proves I have no soul

December 5, 2010 | 9 Comments

The #reverb10 prompt for December 3rd was to write about a time when you felt most truly alive in 2010. There were more prompts, about what you wonder about and other examine-your-soul types of introspection. This isn’t that kind of blog. I don’t think I’m that type of person. For the record, the time I feel most alive is when I am with my family, but I wasn’t the least bit interested in writing about how much I love my family right now. In fact, I was very interested in logistic regression.

I'll get this down eventually

Should YOU wonder about logistic regression? Well, that depends.

• Do you have a continuous, numeric dependent variable? If yes, do something else, maybe multiple regression.
• Is one of your variables a dependent variable? If no, do something else, maybe log-linear modeling.
• Do you have more than one independent variable? If not, do something else, usually either a chi-square (if both your variables are categorical) or a t-test (if one of your variables is continuous).

Logistic regression is the statistical technique of choice when you have a single categorical dependent variable and multiple independent variables from which you would like to predict it.

With logistic regression, what you are modeling is built from the ODDS: the PROBABILITY of Y being a certain value divided by ONE MINUS THE PROBABILITY. Let’s start with the simplest model, binary logistic regression. There are two possible outcomes, married or not. We are modeling the probability that an individual is married, yes or no. [Logistic regression is NOT what you would use to model how long a marriage lasted. That would be survival analysis.]

The logistic regression formula models the log of the odds (not the odds ratio, which comes later). The odds are

The probability of y =1 / probability of y = 0

So, the left side of your equation is

ln(p / (1- p) )

**** Very, mega- super-important point here – the p in this equation is NOT the same old p as in p < .05. No, au contraire. Completely different. This is the probability of event = 1. For example, the probability of being married. 1-p then would be 1 – the probability of being married.  Yes, that second number is the same as the probability of being single. You aren’t missing anything.

I was, in this post, going to use the probability of being a dumb-ass, but some people have written and told me that I am too hostile for a statistician, so I am trying to mend my ways, it being around the holidays and all.

The right side of the equation is the same old β0 + β1X1 + … + βnXn that you are used to with Ordinary Least Squares (OLS) regression, also known as multiple regression or multiple linear regression, or, if you are a complete weirdo, Monkey-Bob.
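Putting the two sides together, the model can be written out in standard notation (nothing here beyond what the text above already says, just typeset):

$$
\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n
$$

Solving for p gives the probability itself:

$$
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n)}}
$$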

The ODDS RATIO is

The probability of y =1 / probability of y = 0  when x =1

divided by

The probability of y = 1/ probability of y = 0 when x = 0
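Written as a single expression, for a predictor x coded 0/1, that definition is:

$$
\mathrm{OR} = \frac{P(y=1 \mid x=1)\,/\,P(y=0 \mid x=1)}{P(y=1 \mid x=0)\,/\,P(y=0 \mid x=0)}
$$

In a model with a single 0/1 predictor, this is the same number as $e^{\beta_1}$, which is why exponentiating the coefficient gives the odds ratio.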

I presume the only reason you have read this far is that you have some deep-rooted need or desire to understand logistic regression. An example will help. I have discovered lately that I love my husband for a very important reason. He is not a dumb ass. I have had multiple husbands (not simultaneously; that would be polyandry, illegal in most states and immoral according to certain anal-retentive religions). What they all had in common, other than the obvious of being married to me, is that they were all in technical fields and pretty good at what they did. Let’s go with the hypothesis that people who are in a technical field are more likely to be married. Further, let’s say that we have sampled 100 people in computer science and 100 people in French literature. We find that 90 of the computer scientists are married and 45 of the French literature people.

So, if the probability of marriage is 90/100 and the probability of not married is 10/100, then the odds for the computer scientists are 9:1 = 9

For the French literature people, the probability of marriage is 45/100 and the probability of not being married is 55/100, so the odds are 45/55 = .818

So, the odds ratio is 9/.818 = 11.00

This tells you that the odds of a computer scientist being married versus single are 11 times that of a French literature professor. Also, that you should study computer science instead of French.
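If you would rather check the arithmetic than take it on faith, here is a minimal Python sketch using the counts from the example. The function name `odds` is just an illustration, not anything from SPSS.

```python
def odds(successes, total):
    """Odds = P(event) / P(no event), computed from counts."""
    p = successes / total
    return p / (1 - p)

cs_odds = odds(90, 100)       # computer scientists: 90 of 100 married
french_odds = odds(45, 100)   # French literature: 45 of 100 married
odds_ratio = cs_odds / french_odds

print(round(cs_odds, 3), round(french_odds, 3), round(odds_ratio, 2))
```

The three printed numbers are the odds for each group and the ratio of the two.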

If you really had nothing else to do in your life and wanted to run this using SPSS just to see if I was correct (really, now!) you would get this output.

Gasp! The value of β0, that being our constant, is -.201. The inverse of the natural log is Exp(x), also shown as “e to the x”. This is a function in SPSS, if you want to double-check. Also a function in SAS, Stata and Excel, but NOT on the calculator on my iPhone. Steve Jobs should feel shame.

The value of  Exp(-.201) = .818  –  the odds  for French literature people.

The value of Exp(2.398) = 11.00 – the odds ratio for computer scientists versus French literature whatever-you-call-them (unemployed would be my guess).
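The same double-check works in Python with nothing but the standard `math` module. This sketch relies on the fact that with a single 0/1 predictor the constant is the log odds of the reference group and the coefficient is the log of the odds ratio; the variable names are illustrative.

```python
import math

# Log odds for the reference group (cs = 0, French literature).
b0 = math.log(45 / 55)
# Log of the odds ratio, computer science versus French literature.
b1 = math.log((90 / 10) / (45 / 55))

print(round(b0, 3), round(b1, 3))
# Going the other way: exponentiate the rounded values from the output table.
print(round(math.exp(-0.201), 3), round(math.exp(2.398), 2))
```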

Coincidence? I think not!

In interpreting a logistic regression analysis you want to look at the significance of the parameter estimates (.000) and the parameter estimate, in this case the β = 2.398. A positive coefficient says that the outcome is MORE likely if the variable has the value in question. In SPSS, that value is shown in parentheses. Notice it says cs(1) – that means when cs has the value of 1, the outcome is more likely to occur. How much more likely? Look to your right. (On the table, in this blog post, not to your right in your room. What are you thinking?) The odds are 11 times greater for computer scientists than for French literature whats-its.

A really good reference if you want a plain language introduction to logistic regression is by Newsom. There are a lot of really bad references to logistic regression in very obscure language but I decided not to bother mentioning them.

The syntax for producing this table in SPSS is below.

LOGISTIC REGRESSION VARIABLES married
/METHOD=ENTER cs
/CONTRAST (cs)=Indicator(1)
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
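For readers without SPSS, the fit can be approximated in plain Python. This is a rough sketch, not SPSS's implementation: it runs a bare-bones Newton-Raphson on the grouped counts from the example (90 of 100 and 45 of 100), ignores all the /CRITERIA options, and reports no standard errors or significance tests. The function name `fit_logistic` is made up for illustration.

```python
import math

# Grouped data from the example: (cs, number married, group size).
groups = [(1, 90, 100), (0, 45, 100)]

def fit_logistic(groups, iterations=25):
    """Newton-Raphson for ln(p/(1-p)) = b0 + b1*cs with one 0/1 predictor."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for x, married, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = married - n * p      # observed minus expected count
            w = n * p * (1.0 - p)        # binomial variance weight
            g0 += resid
            g1 += x * resid
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system by inverting the information matrix directly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(groups)
print(round(b0, 3), round(b1, 3), round(math.exp(b1), 2))
```

The printed values should line up with the constant, the coefficient for cs, and the odds ratio discussed above.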

# My Mother Didn’t Want a Hacker for a Child

As my lovely oldest daughter noted, most of what most people turn out to be, for better or for worse, they owe to their mothers.

Family legend has it that my mother, a former Catholic nun, former second grade teacher, was bored sitting home in Karachi, West Pakistan while my father worked on the Air Force Base. Girls were not allowed to attend school, so my mother, always the quietly rebellious type, taught me to read before I turned three years old.

Another family rumor holds that my father taught me how to swim by throwing me off the end of a pier in Biloxi, Mississippi.

I don’t know if either of those stories is true, but they are certainly in character for both of my parents, and two things I have been able to do as far back as I can remember are read and swim.

The reverb10 prompt for the day is “What keeps you from writing?”

Certainly, reading is the SECOND most common activity that keeps me from writing. Thanks to mom again, I attended St. Mary’s Catholic School, where the speed-reading machine was the educational innovation of the day. By the end of seventh grade, when I left St. Mary’s, I could read hundreds of words a minute. I assume the machine fell out of favor because my results were not typical, which made it an early introduction to the difference between individual and mean results. I am the absolute archetype of the voracious reader. I have daily subscriptions to the Los Angeles Times (in print) and the New York Times (on my Kindle). I read an average of four books a week, sometimes ten during vacations. If I like an author, like Malcolm Gladwell or Alexander McCall Smith or Alice Walker, I’ll read everything he or she ever wrote. I have a Sony eReader, a Kindle, an iPad, cards for the Santa Monica Public Library, Los Angeles Public Library and three university libraries.

Mom wanted an academic. That was pretty clear.  For a while there, it seemed like she might get her wish. I collected scholarships, degrees, even was a professor, on the tenure track no less, for seven years.

Sorry, mom, it wasn’t for me. I really liked teaching, and I like to think I was pretty good at it. What I liked even more was finding out stuff. I was like Kipling’s The Elephant’s Child, full of insatiable curiosity. The part, though, where you rewrite the article eleven times, and someone sends it back to you because you wrote “11” instead of spelling out the word and thinks you ought to rewrite that sentence where you said 11 people died of grammatical errors to 11 people passed away and you didn’t cite Billy Bob Thornton’s classic piece on death by semi-colon and … AAAAH !!!

And did I mention professors don’t get paid much even when they DO have full-time tenure track positions?

I spend most of my time hacking away at code. My husband, the resident honest-to-God rocket scientist, tells me that hacking is not a positive term and I shouldn’t use it all the time. But he’s not the boss of me (as he would be the first to attest, under oath, if necessary).

I spend hours downloading datasets from places like data.gov, the Inter-University Consortium for Political and Social Research, and loads of other government and private websites that offer data. Then, of course, there are the datasets I get paid to analyze.

I’m not only a disappointment to my mother, but I’m a disappointment to my doctoral adviser, too, or I would be, if he weren’t dead. I like statistics and sometimes I like going through the underlying equations for procedures. It makes me feel like I’m on NUMB3RS, my absolute favorite show. Instead of spending my time delving deeper into matrix algebra, though, I liked to figure things out.

How many high school girls want to go into the military? Who are these girls? How are they different from other girls their age, or from boys who plan to enter the service?

Do people posting on forums for athletes have a more accepting view of eating disorders (you do what you gotta do) than people posting on forums with a more general audience?

When you ask teachers what they do in their classrooms, how does that relate to what they believe is important?

How do you measure what people believe, anyway?

It turns out that a whole hell of a lot of the time spent answering those questions does not go to doing a correlation or factor analysis or non-linear mixed model. Nope, most of it is whacking your data into shape, reading it in, merging it, checking it, fixing it. Once you figure out how to do that, you want to figure out how to do it better, faster, smarter. So, I might do something once and then realize, hey, if I changed this into a macro, I could do it over and over, like SO…
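```
Title "Thing to Look at Statistics" ;
%macro tbles(strt,nd,ndsn,dsn) ;
    /* Summary statistics for every variable from &strt through &nd */
    proc summary data = &ndsn ;
        var &strt -- &nd ;
        output out = &dsn ;
    run ;
    /* Flip it so each variable is a row and each statistic is a column */
    proc transpose data = &dsn out = frq&dsn ;
        id _stat_ ;
        var &strt -- &nd ;
    run ;
    proc sort data = frq&dsn ;
        by descending mean ;
    run ;
    /* Print the variables ranked by mean, shown as a percentage */
    proc print data = frq&dsn split = " " ;
        id _name_ ;
        label _name_ = "Type of &dsn" mean = "Percentage of Things" ;
        var mean ;
        format mean percent8.1 ;
    run ;
%mend tbles ;
```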

Now, every time I want a table of something in order, like how often teachers used a particular strategy or customers bought a certain brand or whatever I can just do this:

%tbles(question1,question39,teachers,strategies) ;
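For anyone who does not have SAS handy, the same idea (take the mean of each item, then list the items from highest to lowest as percentages) can be sketched in Python. This is an analogy, not a translation of the macro: the function name `ranked_means`, the three-question layout and the made-up 0/1 answers are all assumptions for illustration.

```python
from statistics import mean

def ranked_means(rows, columns):
    """Return (column, mean) pairs sorted by descending mean."""
    means = [(col, mean(row[col] for row in rows)) for col in columns]
    return sorted(means, key=lambda pair: pair[1], reverse=True)

# Hypothetical 0/1 answers from four teachers (1 = uses the strategy).
teachers = [
    {"question1": 1, "question2": 0, "question3": 1},
    {"question1": 1, "question2": 0, "question3": 0},
    {"question1": 1, "question2": 1, "question3": 0},
    {"question1": 0, "question2": 0, "question3": 1},
]

strategies = ranked_means(teachers, ["question1", "question2", "question3"])
for name, value in strategies:
    # Show each mean as a percentage, like the percent8.1 format in the macro.
    print(f"{name:<12}{value:>8.1%}")
```

Wrapping the steps in a function plays the same role as the macro: write it once, then call it with different columns and datasets.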

But THEN, it occurs to me that I might want to run this same thing 20,000 times for 20,000 different datasets I read in. So, if I have them all in Windows, say, and I go to the command line and type

dir c:\thisdirectory\otherdirectory\*.* >> files.txt

Then I could read in the directory as a dataset and have a %DO  / %END loop that reads each file and does whatever is in my macro for all of these and maybe even in the middle there concatenates all of the datasets so I can look at it for all of the 20,000 studies ….
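The loop-over-every-file idea looks roughly like this in Python. It is a sketch under loud assumptions: the datasets are assumed to be CSV files in one directory (the post does not say what format they are in), the function name `stack_csv_files` is invented, and the two tiny files written at the bottom exist only so the example runs on its own.

```python
import csv
import tempfile
from pathlib import Path

def stack_csv_files(directory, pattern="*.csv"):
    """Read every matching file and concatenate the rows, tagging each with its source file."""
    combined = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.name   # remember which study the row came from
                combined.append(row)
    return combined

# Demo with two made-up files standing in for the 20,000 studies.
with tempfile.TemporaryDirectory() as tmp:
    for name, value in [("study1.csv", "0.5"), ("study2.csv", "0.7")]:
        Path(tmp, name).write_text("mean\n" + value + "\n")
    rows = stack_csv_files(tmp)

print(len(rows), rows[0]["source_file"], rows[1]["source_file"])
```

Listing the directory from inside the program replaces the step of saving the `dir` output to a text file and reading that back in.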

… and so that is why I don’t get more writing done.

# Computers: The Word of the Day

December 2, 2010 | 2 Comments

The reverb10 prompt for today was to sum up 2010 in one word.  It reminded me of a kid I used to play with. Her name was Star, her sister’s name was Secret and her brother’s name was Arrow. She said that, after each birth, her mom opened the dictionary at random and whatever word her finger landed on, that was the child’s name. Personally, I think this story is bullshit.

I mean, honestly, do you think that on three tries she never once landed on a word like “the” or “pancreas” or “fornication” ?

So, I read a lot of these other posts.  One advantage of being a procrastinator (which is Greek for “night person”) is you get to read other people’s stuff before you write your own.  Most of them had words like “Sanguine” .

The blog by The Annoyed Army Wife was more credible as she said the first word that came to mind was “fuck!” but  then she had a beer and thought of a different one (“surprise”, in case you are wondering, oh, sorry, now I ruined it).

The word that best describes my year is, “Computers”.

Coincidentally, next week is Computer Science Education Week and everyone is trying to be all politically correct in telling you, “Hey, kids! STEM is great! Computers are cool!” and showing pictures with nationally representative genders and races included of people looking like the cult from Bubble Boy and smiling at screenshots of the moon. I predict this will be about as successful as the “Just say no to drugs” campaign, which, contrary to expectations, did not stop people doing drugs. Don’t even get me started on abstinence education.

I think perhaps what we should say to young people instead is this ….

COMPUTERS ….

I have a truly great life and it revolves around computers.

I spend most of my working hours sitting in front of a computer doing things I could never have imagined thirty years ago. I’ve worked on a couple of projects this year analyzing genetics data, both human and animal; that’s the type of job that only existed on Star Trek back in the 1960s. Much of 2010 I spent working at a university with a direct line to several thousand CPUs right over my head. Other projects ranged from studies of government spending and “reading” posts to websites (yeah, you’re in my database, watch out) to projections of election results, complete with color-coded maps and other stuff that you see on TV and movies now, except I’m a real person, not an actor, and I get to do this for a living. Yeah, I know, it’s hard for me to believe, too.

At my first job I made minimum wage, I think it was about $2 an hour back then, waiting tables from midnight to 8 a.m., then going to class. Even if we give a discount for telecommuting, flexible hours and volume, I still make more in an hour now than I used to in a week. I get paid this because whether you want it using Linux or Windows or on a Mac, I can get it done and I have all three operating systems in my house. If your data are on websites scattered across the world, in an Excel file written in Korean or a 100GB dataset on a supercomputer, I can pull out what you need to know. For this, they pay me. Well. Many of the people I knew growing up ended up in jail, on welfare or dead before they got to my age. I didn’t take an eight-month course at some institute – I spent 40 hours or more a week in front of a computer for decades of my life and it paid off. And I was HAPPY doing it. (See previous paragraph.)

Speaking of that telecommuting – flexible hours thing. Because almost all of my work is done through a computer, I can do it in jeans and a t-shirt, starting at 10 a.m., with Christmas music playing in the background, a glass of Chardonnay on one side of my monitor and a candle burning on the other. Walking to the beach for a break is nice, too.

Don’t get me wrong – I work a lot of hours and do my very, very damnedest to deliver great work – BUT when I was in business school, young women (of which there were two or three in most of my classes, counting me) were counseled that they should not have pictures of their children on their desk because it would make them seem less serious about their careers. Imagine me saying this when I graduated with my MBA back in 1980 –

“Dr. Erickson, I think I want to work out of my home, because, after all, that’s where my stuff is. I put a lot of time into making it comfortable for me, so, yeah, I don’t think I want to go to an office. Besides, I’d have to get up and drive in rush hour traffic, and that pretty much sucks. Oh, and I don’t like mornings so I’m not going to work before 10. See, if I don’t drive to work, I can just get up at 9:55 and start work.”

Well, I wouldn’t have done that because I am afraid he would have had a heart attack on the spot.

If you’re really, really good with computers, it doesn’t matter if you are male, female, black, white, over fifty, an immigrant, overweight, shy, gay, once slept with the dean’s wife or a hundred other things. You’ll have work. That doesn’t mean you won’t have to sometimes deal with idiots, or get laid off or quit, get fired for sleeping with the dean’s wife or have to learn a new programming language or operating system because the one you knew is no longer in demand.

The other reverb10 prompt was to give a word that I hope will describe 2011.

Sex would be a good word but I am over fifty and married for a very long time, so, seriously, who am I kidding. I was going to go with computers again, because I like my life, but I guess that’s cheating, or redundant.

So, my word for 2011 is reboot.

A reboot every now and then is a good thing. Hopefully, your system starts up again, all the applications that you want are working the way you want them, the communication among the various parts of your computer is working again and you have recovered from any errors.

System working – communication – recovered from errors. That’s what I want for 2011. A reboot for my life.
