Why biostatistics is confusing and how to make it less so

I’m getting ready to teach biostatistics in a few weeks, and it seems to me that the real confusion in most cases is not the calculations, which can be fairly simple, but that there are often several ways of looking at the same question. Let’s take “risk” as an example.

What is the “risk” of diabetes?

You could answer this by prevalence – 9.3% of Americans have diabetes. So you could say you have about a 1 in 11 chance of having diabetes. Is that your risk?

On the other hand, incidence, the number of new cases per year, is about 1.8 million, which comes out to around 0.6% in a population of 313 million. So, your chance of being newly diagnosed with diabetes in a given year is around 1 in 170. Is that your risk?
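
To make the two answers concrete, here is a minimal Python sketch that redoes the arithmetic with the figures quoted above. The helper name one_in_n is my own invention for illustration, and the incidence here is simply new cases divided by the whole population, as in the text.

```python
def one_in_n(probability):
    """Express a probability as a rounded '1 in N' figure."""
    return round(1 / probability)

# Figures quoted in the text
prevalence = 0.093            # share of Americans who have diabetes
new_cases_per_year = 1.8e6    # newly diagnosed cases per year
population = 313e6            # total population

# Crude annual incidence: new cases divided by the whole population
incidence = new_cases_per_year / population

print(f"prevalence: {prevalence:.1%}, about 1 in {one_in_n(prevalence)}")
print(f"annual incidence: {incidence:.2%}, about 1 in {one_in_n(incidence)}")
# prevalence: 9.3%, about 1 in 11
# annual incidence: 0.58%, about 1 in 174
```

Same disease, same population, and the two "risks" differ by a factor of about sixteen, depending only on which question you asked.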

In discussing risk of a disease, it may be useful to consider the specific population. For example, the CONDITIONAL risk of having diabetes given the condition that your ethnicity is Asian-American of Chinese descent is 4.4%. (Conditional risk of a disease is defined here as the prevalence given a specific condition.)

Conditional risk given that you are Puerto Rican is 14.8%.

What is the relationship between diabetes and ethnicity?

This is another simple-sounding question that can be answered in multiple ways. First of all, what is your reference group?

Is it, say, Puerto Ricans compared to the total prevalence of 9.3%? Is it Puerto Ricans compared to non-Hispanic whites? To all Hispanics? To Americans of Chinese descent? If the latter sounds silly, I’m not sure why it is any sillier than non-Hispanic whites, but perhaps someone can enlighten me.

Once you have a reference group, then what do you pick as the method of measuring relationship?

Risk difference is the difference in probabilities between two groups, an absolute measure rather than a relative one.

The relative risk is the risk of one group divided by the risk of the other group. Take Puerto Ricans (14.8%) and Central or South Americans, whose prevalence works out to 8.5% from the other figures given here. So, the relative risk is 14.8 / 8.5 = 1.74. Rounding it up, you could say that Puerto Ricans are twice as likely to have diabetes as Central or South Americans, which sounds considerably different from saying that the difference between the two ethnic groups’ risks is .063.
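
Here is a small Python sketch of the two measures side by side. One assumption to flag loudly: the 8.5% prevalence for Central or South Americans is not quoted directly in this post. It is the value implied by the other numbers (14.8% minus a difference of .063). The function names are my own, for illustration only.

```python
def risk_difference(risk_a, risk_b):
    """Absolute measure: how many percentage points apart the risks are."""
    return risk_a - risk_b

def relative_risk(risk_a, risk_b):
    """Relative measure: how many times larger risk_a is than risk_b."""
    return risk_a / risk_b

puerto_rican = 0.148              # prevalence quoted in the text
central_south_american = 0.085    # ASSUMPTION: implied by .148 - .063

rd = risk_difference(puerto_rican, central_south_american)
rr = relative_risk(puerto_rican, central_south_american)

print(f"risk difference: {rd:.3f}")
print(f"relative risk:   {rr:.2f}")
# risk difference: 0.063
# relative risk:   1.74
```

Both numbers come from the same two prevalences. Which one you report depends on whether you care about the extra cases per 100 people or about how the groups compare proportionally.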

Then there are odds ratios, which I have written about extensively, including here. There are also proportional attributable fraction and proportional attributable risk.
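
As a quick illustration of how the odds ratio differs from the relative risk, here is a hedged Python sketch using the same two groups. Again, the 8.5% figure for Central or South Americans is an assumption, backed out of the 14.8% prevalence and the .063 difference rather than quoted from a source, and the function names are hypothetical.

```python
def odds(probability):
    """Convert a probability to odds: p / (1 - p)."""
    return probability / (1 - probability)

def odds_ratio(risk_a, risk_b):
    """Ratio of the odds in group A to the odds in group B."""
    return odds(risk_a) / odds(risk_b)

puerto_rican = 0.148              # prevalence quoted in the text
central_south_american = 0.085    # ASSUMPTION: implied by .148 - .063

odds_ratio_value = odds_ratio(puerto_rican, central_south_american)
print(f"odds ratio: {odds_ratio_value:.2f}")
# odds ratio: 1.87
```

So for the very same data, the relative risk is 1.74 and the odds ratio is about 1.87. The two get closer as the condition gets rarer and drift apart as it gets more common.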

Well, I can go on for weeks – and will, once class starts.

How to make it all less confusing

Start with this question, “What do you want to know and why do you want to know that?”

If you want to know what the probable demand for insulin will be in the next year, you might care most about prevalence + incidence. If you are interested in predicting diabetes 10 years from now, you might be very interested in differing probabilities within ethnic groups, as some have a much faster rate of growth than others.

If you are interested in screening or prevention, you would be very interested in which groups have the highest incidence.

I’m thinking a fun and useful thing to do for both biostatistics and epidemiology would be to have students make a flowchart with questions like: If you want to know this, then do that.
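
A flowchart like that could start life as nothing fancier than a lookup table. Here is a rough Python sketch built only from the three examples in this post. The question wording, the dictionary name and the function name are all placeholders I made up, not a real classification scheme.

```python
# "If you want to know this, then do that", using the examples from this post
MEASURE_FOR_QUESTION = {
    "demand for insulin next year": "prevalence + incidence",
    "diabetes 10 years from now": "probabilities within each ethnic group",
    "screening or prevention": "incidence by group",
}

def suggest_measure(question):
    """Return the suggested measure, or send the student back to step one."""
    return MEASURE_FOR_QUESTION.get(
        question,
        "start over: what do you want to know, and why?",
    )

print(suggest_measure("screening or prevention"))
# incidence by group
```

Students would fill in the rest of the table themselves, which is really the point of the exercise.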