statistics

Death is different: More on event history models

By AnnMaria De Mars, May 20, 2009

Event history models of all types have a few characteristics that make them unique. First of all, forget that whole symmetry thing around zero.

Here our dependent variable of interest is time to event. We are interested in how long a person lives, remains sober, stays with a given company, or, in a study of my parenting skills, goes without threatening to skin a child alive and tack her hide upside the door as a warning to her sisters.

Regardless of the specific nature of the event, we are interested in TIME, which by definition must be positive. You cannot have negative duration.

Let’s take death as an example of an event. We will define the time of death operationally as the time written on the death certificate. As our beginning point, let’s take attack by weasels. Some people might die right after a weasel attack if, say, attacked by a particularly large weasel, or a whole sneak of weasels. (Yes, the correct term for a collection of weasels is a ‘sneak’. If you don’t believe me, look it up.)

Others might linger for a while and then die, with their bodies unable to combat those severe weasel-bite wounds. Some additional number may die from complications of infections due to weasel bites and so on.

The dependent variable we are interested in is T, where T represents the time from the biting weasel onslaught to death. At each time period, there is a baseline hazard rate. Remember this term, because it is important.

Within a given time period, the baseline hazard rate is a constant. Weasel attack survival may be like this: say 5% of weasel attacks are the sneak variety and the victim dies within 24 hours. However, of those who survive, only 1% die within the next 24 hours, and 0.2% catch some type of nosocomial infection and die within the following 48 hours. In an exponential model, the baseline hazard rate is a constant, period, because we assume that the rate of an event does not change with time. For other models, the baseline hazard rate is a constant for a given time interval. The Weibull model, for example, allows for a monotonic hazard rate, i.e., it can be increasing or decreasing but only in one direction.

The baseline hazard rate for that second period is .01, so h(2) = .01.
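To make the period-by-period arithmetic concrete, here is a minimal Python sketch of the weasel example. It assumes each percentage is a share of the people still alive at the start of that period (the post says this outright only for the second period), and the names `hazards` and `survival_curve` are made up for illustration.

```python
# Assumed reading of the weasel example: each hazard is the share of people
# still alive at the start of a period who die during that period.
periods = ["0-24 hours", "24-48 hours", "48-96 hours"]
hazards = [0.05, 0.01, 0.002]  # h(1), h(2), h(3)


def survival_curve(hazards):
    """Return the proportion still alive at the end of each period."""
    alive = 1.0
    curve = []
    for h in hazards:
        alive *= 1.0 - h  # survivors of this period
        curve.append(alive)
    return curve


survival = survival_curve(hazards)
for label, h, s in zip(periods, hazards, survival):
    print(f"{label}: hazard = {h:.3f}, still alive at end = {s:.4f}")
```

Under that assumption, about 94% of victims are still alive 96 hours after the attack, even though the hazard in the first period is fifty times the hazard in the second.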

However, one can, and usually will, have covariates. I mean, a person is more than the sum of his episodes of attack-by-weasel, right? So, while the baseline hazard rate may be .01, it may increase if a person has other pre-existing conditions, such as old age. A 99-year-old weasel attack victim may have a greater hazard rate than a 17-year-old victim. Other factors may have a negative relationship with hazard, for example, having been vaccinated for rabies.

Thus, the Weibull model can be expressed as a log-linear function:

log(T) = b0 + b1X1 + b2X2 + σε

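where b0 is the intercept, b1 and b2 are the coefficients for the covariates X1 and X2 (say, age and rabies vaccination), σ is a scale parameter, and the last part, ε, is a stochastic disturbance term, stochastic disturbance sounding better to say than ‘error’ and less likely to draw the attention of malpractice attorneys and hedge fund investors.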
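As a rough illustration of the log-linear form above, the following Python sketch simulates survival times for a 17-year-old and a 99-year-old victim. Everything numeric here is invented for the example: the coefficients, σ, and the choice of age and vaccination as X1 and X2. The one statistical fact it leans on is that, for a Weibull model, ε follows the standard (minimum) extreme value distribution, which is the log of an Exponential(1) draw.

```python
import math
import random

# Hypothetical coefficients, chosen only for illustration.
B0 = 3.0      # intercept
B1 = -0.02    # age in years: older victims have shorter times
B2 = 0.5      # vaccinated for rabies (0 or 1): longer times
SIGMA = 0.5   # scale of the disturbance; Weibull shape = 1 / SIGMA


def draw_time(age, vaccinated, rng):
    """Draw one T from log(T) = B0 + B1*age + B2*vaccinated + SIGMA*eps."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard against log(0) in the (very unlikely) edge case
        e = rng.expovariate(1.0)
    eps = math.log(e)  # standard minimum extreme value disturbance
    return math.exp(B0 + B1 * age + B2 * vaccinated + SIGMA * eps)


rng = random.Random(2009)
young = [draw_time(17, 0, rng) for _ in range(5000)]
old = [draw_time(99, 0, rng) for _ in range(5000)]

print("mean time, age 17:", sum(young) / len(young))
print("mean time, age 99:", sum(old) / len(old))
```

Every simulated T is positive, as a duration has to be, and the covariates move the whole distribution of times up or down without changing its shape.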

What makes the Weibull model different from the exponential model is that it also includes a shape parameter. The covariates alter the scale value but the shape (whether the hazard is increasing, decreasing or flat) remains constant.
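A short Python sketch can show what the shape parameter does. It uses one common way of writing the Weibull hazard, h(t) = (shape / scale) * (t / scale)^(shape - 1); the function name `weibull_hazard` and the specific shape and scale values are assumptions picked for illustration.

```python
def weibull_hazard(t, shape, scale):
    """Hazard rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


times = [1.0, 2.0, 4.0, 8.0]

# Same scale, three shapes: decreasing, flat (exponential), increasing.
for shape in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, shape, scale=10.0) for t in times]
    print(f"shape={shape}:", [round(r, 4) for r in rates])

# Changing the scale (which is what covariates do) moves the level of the
# hazard, but an increasing hazard stays increasing.
for scale in (5.0, 10.0):
    rates = [weibull_hazard(t, 2.0, scale) for t in times]
    print(f"shape=2.0, scale={scale}:", [round(r, 4) for r in rates])
```

With a shape of 1 the hazard is flat and the model collapses to the exponential case; a shape below 1 gives a falling hazard and a shape above 1 a rising one.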

A Weibull model can work over defined ranges but may not always be the best pick. Think mortality, for example. There is actually a relatively high mortality rate in the first year of life – being born is a risky business – but then mortality drops until age 14 after which your risk of death goes up again until, well, until you die.

3 Comments

Mike says:

July 15, 2009 at 7:18 am

Please reconsider the second sentence regarding “symmetry thing around zero.” One requirement of a parametric survival model is that the distribution fit, whether it be Weibull, lognormal, exponential, etc. While not symmetry around zero, determination of the appropriate parametric distribution follows the same notions as normality for a linear model.

Ashley says:

July 15, 2009 at 10:30 am

Thanks for the chuckles from using the weasel attack example.

As someone who took statistics a long time ago, I feel like I need more explanation how the two time periods in the weasel example get modeled (you don’t define the terms in the Weibull model).

Ian Cuthill says:

July 15, 2009 at 1:48 pm

Most providential encounter. Will add weasel sensitivity to Alzheimers and organ failure as potential hazards in my plan for eternal life.
