Playing around today with possible activities for my Biostatistics course and used this simulated random sample of race by antigen type. Here is a random trivia question for you that took me about 10 minutes to figure out.
By the way, ChatGPT did not help
I’m interested in how my students could use ChatGPT or other AI, so I often paste questions into ChatGPT or Google Bard to see how the answers compare to reality. ChatGPT said this code should return observations for the second analysis, but it did not.
The second PROC FREQ said “no observations”. Why?
The first chi-square showed, as expected, not enough observations per cell. I decided to try to get a 2x 2 contingency table and a Fisher’s chi-square, but first I was just going to do a chi-square for two racial groups.
data blood ; input race $ num antigen $ ; datalines; white 15 a_anti white 6 b_anti white 0 ab_anti white 29 no_antigen african_am 7 a_anti african_am 4 b_anti african_am 0 ab_anti african_am 9 no_antigen asian_am 7 a_anti asian_am 5 b_anti asian_am 1 ab_anti asian_am 7 no_antigen ; Title "Data from https://sciencecourseware.org/biol300/bloodgroups.html" ; proc freq data=blood ; tables race*antigen / chisq; weight num ; run; proc freq data=blood ; tables race*antigen / chisq; where race in ("white","asian_am") ; weight num ; run;
How I figured it out
First, I looked at the output for the initial chi-square and saw race was listed as “white” and “asian_a”. So, I thought perhaps it was truncating the value of race when it was read in to 8 characters – but that doesn’t explain why “white” didn’t match. Anyway, I tried it again with “white” and “asian_a” as the two values and still nothing.
I thought perhaps there were trailing blanks, I tried using the trim function to remove trailing blanks. Still, no observations.
I did this:
DATA test2; SET blood ; SUBSTR(race,1,3) ; PROC PRINT DATA = test2 ;
And that’s when I spotted it – the first three characters were ” wh” and ” as”. There was a LEADING blank!
There is a function for deleting all blanks as well but I actually just went and deleted the 12 spaces in the 12 lines and it worked.
How did those leading blanks get in there in the first place? I don’t know. I might have just typed a space without thinking or copied the labels from a table.
It’s a little thing, but all told, I spent 10 minutes trying to figure this out, so I thought I’d post the answer here.