SENSITIVITY AND SPECIFICITY – TWO ANSWERS TO “DO YOU HAVE A DISEASE?”
Both sensitivity and specificity address the same question – how accurate is a test for disease – but from opposite perspectives. Sensitivity is defined as the proportion of those who have the disease that are correctly identified as positive. Specificity is the proportion of those who do not have the disease who are correctly identified as negative.
Students and others new to biostatistics often confuse the two, perhaps because the names are somewhat similar. If I was in charge of naming things, I would have named one ‘sensitivity’ and the other something completely different like ‘unfabuloso’. Why I am never consulted on these issues is a mystery to me, too.
Specificity and sensitivity can be computed simultaneously, as shown in the example below using a hypothetical Disease Test. The results are in and the following table has been obtained:
Results from Hypothetical Screening Test
COMPUTING SENSITIVITY AND SPECIFICITY USING SAS
Step 1 (optional): Reading the data into SAS. If you already have the data in a SAS data set, this step is unnecessary.
The example below demonstrates several SAS statements in reading data into a SAS dataset when only aggregate results are available. The ATTRIB statement sets the length of the result variable to be 10, rather than accepting the SAS default of 8 characters. The INPUT statement uses list input, with a $ signifying character variables.
a statement on a line by itself, precedes the data. (Trivial pursuit fact : CARDS; will also work, dating back to the days when this statement was followed by cards with the data punched on them.) A semi-colon on a line by itself denotes the end of the data.
DATA diseasetest ;
ATTRIB result LENGTH= $10 ;
INPUT result $ disease $ weight ;
positive present 240
positive absent 40
negative present 60
negative absent 160
Step 2: PROC FREQ
PROC FREQ DATA= diseasetest ORDER=FREQ ;
TABLES result* disease;
WEIGHT weight ;
Yes, plain old boring PROC FREQ. The ORDER = FREQ option is not required but it makes the data more readable, in my opinion, because with these data the first column will now be those who had a positive result and did, in fact, have the disease. This is the numerator for the formula for sensitivity, which is:
Sensitivity = (Number tested positive)/ (Total with disease).
TABLES variable1*variable2 will produce a cross-tabulation with variable1 as the row variable and variable2 as the column variable.
Weight weightvariable will weight each record by the value of the weight variable. The variable was named ‘weight’ in the example above but any valid SAS name is acceptable. Leaving off this statement will result in a table that only has 4 subjects, 1 subject for each combination of result and disease, corresponding to the data lines above.
Results of the PROC FREQ are shown below. The bottom value in each box is the column percent.
Because the first category happens to be the “tested positive” and the first column is “disease present”, the column percent for the first box in the cross-tabulation – positive test result, disease is present – is the sensitivity, 80%. This is the proportion of those who have the disease (the disease present column) who had a positive test result.
|Table of result by disease|
Output from PROC FREQ for Sensitivity and Specificity
The column percentage for the box corresponding to a negative test result and absence of disease is the value for specificity. In this example, the two values, coincidentally, are both 80%.
Three points are worthy of emphasis here:
- While the location of specificity and sensitivity in the table may vary based on how the data and PROC FREQ are coded, the values for sensitivity and specificity will always be diagonal to one another.
- This exact table produces four additional values of interest in evaluating screening and diagnostic tests; positive predictive value, negative predictive value, false positive probability and false negative probability. Further details on each of these, along with how to compute the confidence intervals for each can be found in Usage Note 24170 (SAS Institute, 2015).
- The same exact procedure produces six different statistics used in evaluating the usefulness of a test. Yes, that is pretty much the same as point number 2, but it bears repeating.
Speaking of that SAS Usage Note, you should really check it out.