|

Why won’t SAS see this character variable is equal?

Playing around today with possible activities for my Biostatistics course and used this simulated random sample of race by antigen type. Here is a random trivia question for you that took me about 10 minutes to figure out.

By the way, ChatGPT did not help

I’m interested in how my students could use ChatGPT or other AI, so I often paste questions into ChatGPT or Google Bard to see how the answers compare to reality. ChatGPT said this code should return observations for the second analysis, but it did not.

The second PROC FREQ said “no observations”. Why?

The first chi-square showed, as expected, not enough observations per cell. I decided to try to get a 2x 2 contingency table and a Fisher’s chi-square, but first I was just going to do a chi-square for two racial groups.

data blood ;
	input race $  num  antigen $ ;
	datalines;
	white 15 a_anti
	white 6 b_anti 
	white  0 ab_anti 
	white  29 no_antigen
	african_am 7 a_anti
	african_am 4 b_anti 
	african_am  0 ab_anti 
	african_am  9 no_antigen
	asian_am 7 a_anti
	asian_am  5 b_anti 
	asian_am   1 ab_anti 
	asian_am   7 no_antigen
	;
	
   Title "Data from https://sciencecourseware.org/biol300/bloodgroups.html" ;
proc freq data=blood ;
  tables race*antigen / chisq;
  weight num ;
  run;
proc freq data=blood ;
  tables race*antigen / chisq;
  where race in ("white","asian_am") ;
  weight num ;
  run;

How I figured it out

First, I looked at the output for the initial chi-square and saw race was listed as “white” and “asian_a”. So, I thought perhaps it was truncating the value of race when it was read in to 8 characters – but that doesn’t explain why “white” didn’t match. Anyway, I tried it again with “white” and “asian_a” as the two values and still nothing.

I thought perhaps there were trailing blanks, I tried using the trim function to remove trailing blanks. Still, no observations.

I did this:

DATA test2;
   SET blood ;
   SUBSTR(race,1,3) ;
PROC PRINT DATA = test2 ;

And that’s when I spotted it – the first three characters were ” wh” and ” as”. There was a LEADING blank!

There is a function for deleting all blanks as well but I actually just went and deleted the 12 spaces in the 12 lines and it worked.

How did those leading blanks get in there in the first place? I don’t know. I might have just typed a space without thinking or copied the labels from a table.

It’s a little thing, but all told, I spent 10 minutes trying to figure this out, so I thought I’d post the answer here.

Similar Posts

3 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *