Event history models of all types have a few characteristics that make them unique. First of all, forget that whole symmetry thing around zero.

Here our dependent variable of interest is time to event. We are interested in how long a person lives, remains sober, stays with a given company, or, in a study of my parenting skills, goes without threatening to skin a child alive and tack her hide upside the door as a warning to her sisters.

Regardless of the specific nature of the event, we are interested in TIME, which by definition must be positive. You cannot have negative duration.

Let’s take death as an example of an event. We will define death operationally as the time of death written on the death certificate. As our beginning point, let’s take attack by weasels. Some people might die right after a weasel attack, if, say, attacked by a particularly large weasel, or a whole sneak of weasels. (Yes, the correct term for a collection of weasels is a ‘sneak’. If you don’t believe me, look it up.)

Others might linger for a while and then die, with their bodies unable to combat those severe weasel-bite wounds. Some additional number may die from complications of infections due to weasel bites and so on.

The dependent variable we are interested in is T, where T represents the time from the biting weasel onslaught to death. In each time period, there is a baseline hazard rate. Remember this term, because it is important.

The baseline hazard rate is a constant within a given time period, though it can differ from one period to the next. Weasel attack survival may be like this. Say 5% of weasel attacks are the sneak variety and the victim dies within 24 hours. However, of those who survive, only 1% die within the next 24 hours, and 0.2% catch some type of nosocomial infection and die within the following 48 hours. In an exponential model, the baseline hazard rate is a constant – period – because we assume that the rate of an event does not change with time. For other models, the baseline hazard rate is a constant for a given time interval. The Weibull model, for example, allows for a monotonic hazard rate, i.e., it can be increasing or decreasing but only in one direction.

The baseline hazard rate for that second period is .01, so h(2) = .01.
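To make the arithmetic concrete, here is a minimal Python sketch that chains the period-by-period hazards from the weasel example into the share of victims still alive after each period. The function name `survival_from_hazards` is made up for illustration, and the three hazards are just the invented percentages from the example, not real weasel data.

```python
def survival_from_hazards(hazards):
    """Turn per-period hazards into the share still alive after each period."""
    alive = 1.0
    curve = []
    for h in hazards:
        # h is the share of those still alive who die during this period
        alive *= 1.0 - h
        curve.append(alive)
    return curve


# Illustrative hazards from the weasel example: the first 24 hours,
# the next 24 hours, and the following 48 hours.
hazards = [0.05, 0.01, 0.002]
curve = survival_from_hazards(hazards)

for period, share in enumerate(curve, start=1):
    print(f"alive after period {period}: {share:.4f}")
```

Note that each hazard applies only to the people who made it through the earlier periods, which is why the shares are multiplied rather than the percentages simply added up.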

However, one can, and usually will, have covariates. I mean a person is more than the sum of his episodes of attack-by-weasel, right? So, while the hazard rate may be .01, it may increase if a person has other pre-existing conditions, such as old age. A 99-year-old weasel attack victim may have a greater hazard rate than a 17-year-old victim. Other factors may have a negative relationship with hazard, for example, having been vaccinated for rabies.

Thus, the Weibull model can be expressed as a log-linear function

log(T) = b0 + b1X1 + b2X2 + σε

where the last part is a stochastic disturbance term, stochastic disturbance sounding better to say than ‘error’ and less likely to draw the attention of malpractice attorneys and hedge fund investors.

What makes the Weibull model different is that it also includes a shape parameter. The covariates alter the scale value but the shape (if it is increasing, decreasing or flat) remains constant.
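Here is a rough Python sketch of that scale-versus-shape idea, assuming the usual Weibull parameterization in which the scale is exp(b0 + b1X1 + b2X2) and the shape is 1/σ. The coefficients are hypothetical numbers invented for illustration, and the helper names `weibull_scale` and `weibull_hazard` are mine. One thing to keep in mind: because the model is written for log(T), a covariate that raises the hazard (old age) gets a negative coefficient, and one that lowers it (vaccination) gets a positive one.

```python
import math


def weibull_scale(b0, coefs, xs):
    """Scale implied by the log-linear form: exp(b0 + b1*X1 + b2*X2 + ...)."""
    return math.exp(b0 + sum(b * x for b, x in zip(coefs, xs)))


def weibull_hazard(t, scale, shape):
    """Weibull hazard at time t > 0, with shape = 1/sigma."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical coefficients, invented for illustration only.
b0 = 3.0
coefs = [-0.02, 0.5]  # X1 = age in years, X2 = vaccinated (0 or 1)

young_vaccinated = weibull_scale(b0, coefs, [17, 1])
old_unvaccinated = weibull_scale(b0, coefs, [99, 0])

# Covariates move the scale; the shape alone decides the direction of the hazard.
for shape in (0.5, 1.0, 2.0):
    early = weibull_hazard(1.0, old_unvaccinated, shape)
    late = weibull_hazard(5.0, old_unvaccinated, shape)
    if math.isclose(early, late):
        trend = "flat"
    elif late > early:
        trend = "increasing"
    else:
        trend = "decreasing"
    print(f"shape {shape}: hazard is {trend}")
```

With a shape of 1 the hazard is flat, which is the exponential model again; whatever the shape, the 99-year-old's smaller scale gives a higher hazard at any given time than the 17-year-old's.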

A Weibull model can work over defined ranges but may not always be the best pick. Think mortality, for example. There is actually a relatively high mortality rate in the first year of life – being born is a risky business – but then mortality drops until age 14 after which your risk of death goes up again until, well, until you die.



What is an event history model?

Think of it like this – you are interested in whether something happens, what predicts whether it happens and how long until it happens. Let’s take a common one, like, say, death.

An event history model could predict the duration from diagnosis of tuberculosis to death. In this model you have two groups, those who died during the study and those who were still alive at the end of the study. You could use a simple logistic regression model. I guess this says something about me that I use simple and logistic regression model adjacent to one another in the same sentence.

Logistic regression fails to use a critical piece of information, that is, how long the person survived.

Some terms to know when thinking about event history analysis:
1. There are various types. Survival analysis is a special case of event history analysis. In this case, the curve eventually reaches zero – in the end, there are no survivors, everybody dies. Also, survival analysis does not have a recidivism rate. You only die once. Related to this, it is a final point. You don’t die and then come back. I know every Christian from that original Mormon guy to Father Mike says you do, but it has never happened in the duration of any statistical study in which I have been involved. In mathematical terms, it would be said that the survivor function S(t) is a non-increasing function: it can hold steady or drop, but it never goes back up.

2. Some observations are censored. That does not mean they have been running around your study naked (although they could be, there is nothing to prevent a censored subject from going naked). Censored subjects have not experienced the event by the time the study ended or you lost track of them. (If you had kept their clothes, that might have prevented them from running off, but it is too late now. You should have thought of that sooner.) If your study is of the use of illegal drugs, some people will not have used drugs at all by the time the study ends. If your study lasts 700 days and Joseph goes out and does massive amounts of cocaine on day 700, while Mary is at church singing hymns all day for all 700 days of the study, it wouldn’t make any sense to consider Joseph as having just one day less of a cocaine-free lifestyle than Mary. In fact, it is very plausible that Mary will continue drug-free throughout the rest of her life for another 7,000 days or more, and, with behavior like this, she may even come back from the dead and live drug-free hymn-singing some more. You could drop Mary out of the study as “missing data”, since there is no data on when she began using illegal drugs. That’s an unsatisfactory solution also, though. Not only is she not really missing data but the data you do have is usually the outcome you are most interested in – the not-drug-taking, not-dead, not-incarcerated people.

3. Some event history models allow for multiple episodes of the event, whether your variable of interest might be drug use, incarceration, military intervention, or its don’t-try-this-at-home counterpart of domestic violence.
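To show how censored subjects can stay in the analysis instead of being dropped as missing data, here is a small Python sketch of one standard approach, the Kaplan-Meier estimator, which the discussion above does not name; it is swapped in purely as an illustration. The six subjects are hypothetical, with the last two standing in for Joseph (first use on day 700) and Mary (still drug-free when the study ended on day 700).

```python
def kaplan_meier(durations, observed):
    """Return [(time, share still event-free)] at each distinct event time."""
    # Only times at which at least one event was actually observed get a step.
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone followed at least until t is still at risk, censored or not.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: days until first drug use.
# observed=False means censored: no drug use seen by the last contact.
durations = [120, 300, 300, 450, 700, 700]
observed = [True, True, False, True, True, False]

km_curve = kaplan_meier(durations, observed)
for day, share in km_curve:
    print(f"day {day}: {share:.3f} still drug-free")
```

Mary never counts as an event, but she does count in the at-risk group on every day up to 700, so her 700 drug-free days still pull the curve upward.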



Whether you are a statistician, SPSS guru, SAS programmer or professor and world-renowned expert on re-incarceration, odds are great that you are susceptible to bubble-vision. You work, breathe and socialize within one or two very narrow bubbles.


This is bad and unhealthy. You’ll miss much of life that is beautiful, exciting, dramatic, interesting, tragic and delightfully fun. You’ll also focus too much on things that are not particularly important because you are looking only at whether your colleague in the Study of Very Important Flagellum Department unfairly criticized your latest conference presentation, who voted for you as Treasurer of the SVIF Society and what that editor of the Journal of SVIF said about your latest article submitted.

Be like Julia (the eponym for The Julia Group), live life large, interested and happy. In the interest of that goal, here are some interesting links to follow that relate to the world outside of my personal bubble:

The Disease Management Care Blog – is unfortunately named because, contrary to what you might think, it is far more interesting than a rectal exam. The latest post was on Comparative Effectiveness Research. I don’t wholly agree with the point cited that CER doesn’t take into account co-existing conditions, personal preferences, etc. It may not in all cases but that is no reason it couldn’t. The author discusses both sides of the issue of CER funding, whether we are spending too much on it, too little and does it do any good in the end? These are pretty general questions of life.

I love the New York Times because their coverage is intelligent and thought-provoking. This series on social class in America is even more so than usual. My family certainly lives in a different class than the one I grew up in. When Julia was about four, I asked her if someone she had mentioned was her friend’s mother and she answered contemptuously,
“No, she’s him’s ‘anny !”

After all, who could be so dumb as to not know it is your NANNY that takes you to the park, not your mommy. Your mommy is probably working on a documentary or writing a blog on statistics or at the hospital delivering a baby.

When I was eight years old, I walked a mile home from school with my brothers and sister. During the summers, we watched ourselves, made ourselves lunch and solved our own fights, by means best not shared with my mother to this day. Let me just say that the broken front window, the broken down bathroom door and the scars on my second brother’s forehead – none of those were me. My oldest brother’s broken finger or the drainpipe inexplicably pulling away from the second floor, well, I plead the fifth.

Their discussion of class was fascinating to me in part because, being over-involved in judo (I am the president of the United States Judo Association) in my copious spare time, of which I have none, I meet people from all possible strata of American society, most of whom haven’t a clue what a stratum is. Some are absolutely infuriated that I do not do as I am told. What the New York Times articles highlighted was the class differences in the value placed on doing what one is told versus finding the right answer. It never even occurred to me that blind obedience could even be considered a virtue.

Wiki-books is an interesting concept. Free textbooks. Not great in quantity, but hey, if you want to contribute, go ahead, or read whatever happens to be there. Every now and then I go just to read at random. Today, I read How to Do Nothing. As anyone who has ever met me can tell you, it is a textbook I sorely need to read.

Speaking of the judo association, another good site to check out is the page on Social Capital from . This Internet thing is pretty cool. Where else could you read original research by people from Harvard University while sitting in your massage chair? Or find 150 ways to increase your social capital.

Right now, I think I am going to do #86, log off and go to the park, even though I am not, in fact, a nanny.



FINALLY got a few minutes to download the latest version. For some reason the download I received was for the planned installation as opposed to the basic installation.

In 25 words or less, basic installation is for stand-alone installs on a single machine, which we have hundreds of users doing. The planned installation would be used if you had a meta-data repository, SAS on a server distributed to client machines or some other configuration which we did not have.

So, I have logged in as SAS administrator, downloaded the download manager, applied the order number and key, created a software depot and — nothing.

After slogging through several documents, I realized that we had been sent the wrong thing. Either that, or one of the right things, the one telling us how to use this for a non-planned installation, had been omitted. Got through right away to the lovely Angie McKinley from SAS who sent me a link on how to skip the planning part and voilà! My deployment deploys and I now have SAS 9.2 v2 and Enterprise Guide 4.2 on a computer running Windows XP.

By the way, since I am taking this incredibly stupid required course on Workplace Harassment Prevention, let me just specify that I do not actually know what Angie McKinley looks like and the lovely is referring to her helpfulness and is not in any way a reflection of ageist/sexist/gender-specificist/racist/lookist stereotypical intent. Come on, I am Hispanic, female and over 40. I believe as a group we are mostly accused of harassing our children for not calling often enough. (“Yes, I know you are covering the World Cup. So, what, they don’t have phones in South Africa?”)

SAS 9.2, which I am testing in between clicking on the stupid harassment training, is so far working well. Opened up an xlsx file no problem. Tried Enterprise Guide 4.2 and
Hey wait a minute …. something looks different here…

First of all, there is no longer a DATA menu. Instead, under tasks, there is a FILTER and SORT. There is also a QUERY BUILDER which is where you now create new variables a.k.a. computed columns. Okay, so having just completed the docs on Enterprise Guide on my personal pages, I will need to go recreate them. This does not motivate me to do my little happy dance.

Other than having to redo a few pages I just finished, though, I cannot complain about EG 4.2, personally. With the FILTER & SORT and Query Builder, it looks more Access-ish.

So, what have we got here… a combination of SAS, SQL, Access, Excel and something that looks like the new ODS Graphics. SPSS users will find it WAY easier to move to Enterprise Guide than they would to SAS. Kind of like Esperanto, it has bits of everything to make it a little familiar to anyone who has experience with just about any fringe of data management or statistical software. Except, unlike Esperanto, I think it will catch on. (You see, I used the Esperanto reference here rather than some breeding analogy so that no one could feel harassed. Except for maybe celibate people who speak Esperanto, but AFAIK they are not a protected class.)

