How to write a statistical analysis paper: Step 4

We’ve looked at data on Body Mass Index (BMI) by race. Now let’s take a look at our sample another way. Instead of using BMI as a variable, let’s use obesity as a dichotomous variable, defined as a BMI greater than 30. It just so happened (really) that this variable was already in the data set so I didn’t even need to create it.

The code is super-simple and shown below. The reserved SAS keywords are capitalized just to make it easier to spot what must remain the same. Let’s look at this line by line

LIBNAME mydata “/courses/some123/c_1234/” ACCESS=READONLY;
PROC FREQ DATA = mydata.coh602 ;
TABLES race*obese / CHISQ ;
WHERE race NE “” ;
RUN ;

LIBNAME mydata “/courses/some123/c_1234/” ACCESS=READONLY;

Identifies the directory where the data for your course are stored. As a student, you only have read access.
PROC FREQ DATA = mydata.coh602 ;

Begins the frequency procedure, using the data set in the directory linked with mydata in the previous statement.

TABLES race*obese / CHISQ ;

Creates a cross-tabulation of race by obesity and the CHISQ following the option statistic produces the second table you see below of chi-square and other statistics that test the hypothesis of a relationship between two categorical variables.
WHERE race NE “” ;

Only selects those observations where we have a value for race (where race is not equal to missing)
RUN ;

Pretty obvious? Runs the program.

Similar to our ANOVA results previously, we see that the obesity rates for black and Hispanic samples are similar at 35% and 38% while the proportion of the white population that is obese is 25%. These numbers are the percentage for each row. As is standard practice, a 0 for obesity means no, the respondent is not obese and a 1 means yes, the person is obese.

The CHISQ option produces the table below. The first three statistics are all tests of statistical significance of the relationship between the two variables.

You can see from this that there is a statistically significant relationship between race and obesity. Another way to phrase this might be that the distribution of obesity is not the same across races.

The next three statistics give you the size of the relationship. A value of 1.0 denotes perfect agreement (be suspicious if you find that, it’s more often you coded something wrong than that everyone of one race is different from everyone of another race). A value of 0 indicates no relationship whatsoever between the two variables. Phi and Cramer’s V range from -1 to +1 , while the contingency coefficient ranges from 0 to 1. The latter seems more reasonable to me since what does a “negative” relationship between two categorical variables really mean? Nothing.

From this you can conclude that the relationship between obesity and race is not zero and that it is a fairly small relationship.

Next, I’d like to look at the odds ratios and also include some multivariate analyses. However, I’m still sick and some idiot hit my brand new car on the freeway yesterday and sped off, so I am both sick and annoyed. So … I’m going back to bed and discussion of the next analyses will have to wait until tomorrow.

Translating Ruby to SAS (or vice-versa)

ByAnnMaria De Mars February 18, 2011February 18, 2011

I’ve just been hired on a project that uses SAS and I know nothing but SPSS, but I know that really well. Can I come over to your office tomorrow? … so began a very interesting afternoon, where the new employee would show me code that she had written, say VALUE LABELS and ask, “Can…

statistics

GUEST POST: Mental Health Outcome Statistics Saved my Life

ByAnnMaria De Mars September 16, 2011September 16, 2011

I met Corinna when she was a member of the 1996 Olympic team. In the intervening 15 years she has had a battle with mental illness – or maybe I should say, the mental health system. This article is re-posted, with permission, from her blog, where she writes about mental illness, mental health, motivational speaking,…

Technology

Why Business Travelers (and Grandmas) Need an iPad

ByAnnMaria De Mars April 20, 2012April 20, 2012

The rocket scientist told me that I am too one-dimensional and write too much about “that SAS thing you are always going on about”. Since I will be at SAS Global Forum for days and writing about nothing else, here, straight from BFE, Florida, in a nod to the paternal figure of The Spoiled One…

Software | statistics | Technology

Text Miner = Coolness

ByAnnMaria De Mars May 2, 2012May 2, 2012

In the past, when I had to do any type of parsing of text, I wrote my own code with a zillion SUBSTR functions and IF statements and it did the job but it was *so-o-o ugly and painful that I never even considered including text mining in any courses I taught. I looked into…

Dr. De Mars General Life Ramblings | statistics

Mixed Models and other new stuff

ByAnnMaria De Mars January 15, 2013

“I’m XX years old and I’ve never used (insert any statistical technique here). Why do I need to know this?” I’ve heard some version of the above on every topic from linear regression to structural equation models, and at every age from teenagers to executives in their sixties. I’ve even made comments like that myself….

Software | statistics | Technology

SAS On-Demand – Nice statistics teaching tool, with graphics & possibly cookies

ByAnnMaria De Mars July 18, 2013

I’ve been working with SAS Enterprise Guide 6.1 with SAS On-Demand on Virtual Box and I can’t find much to complain about. It is slow, but since I’m usually doing 14 things at once, while I’m writing for a task to complete I can read an email, download a file or answer a phone call….

Similar Posts

Leave a Reply