How to write a statistical analysis paper: Step 4

By AnnMaria De Mars, May 24, 2015

We’ve looked at data on Body Mass Index (BMI) by race. Now let’s take a look at our sample another way. Instead of using BMI as a variable, let’s use obesity as a dichotomous variable, defined as a BMI greater than 30. It just so happened (really) that this variable was already in the data set so I didn’t even need to create it.
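Had the variable not been in the data set, a DATA step along these lines could create it. This is a minimal sketch, not the code used for this post: the variable name bmi and the output data set name coh602_obese are assumptions made for illustration. Because the course library is read-only, the new data set goes to the WORK library.

```sas
/* Sketch only: bmi and coh602_obese are assumed names, not from the course data */
LIBNAME mydata "/courses/some123/c_1234/" ACCESS=READONLY;

DATA work.coh602_obese ;
  SET mydata.coh602 ;
  /* SAS treats a missing number as smaller than any value, so test for it first */
  IF MISSING(bmi) THEN obese = . ;
  ELSE IF bmi > 30 THEN obese = 1 ;   /* obese, using the definition given above */
  ELSE obese = 0 ;                    /* not obese */
RUN ;
```

Checking for a missing BMI first matters: without that line, everyone with no BMI recorded would be counted as not obese.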

The code is super-simple and shown below. The reserved SAS keywords are capitalized just to make it easier to spot what must remain the same. Let’s look at this line by line.

LIBNAME mydata "/courses/some123/c_1234/" ACCESS=READONLY;
PROC FREQ DATA = mydata.coh602 ;
TABLES race*obese / CHISQ ;
WHERE race NE "" ;
RUN ;

LIBNAME mydata "/courses/some123/c_1234/" ACCESS=READONLY;

Identifies the directory where the data for your course are stored. As a student, you only have read access.

PROC FREQ DATA = mydata.coh602 ;

Begins the frequency procedure, using the data set in the directory linked with mydata in the previous statement.

TABLES race*obese / CHISQ ;

Creates a cross-tabulation of race by obesity. The CHISQ option following the slash produces the second table you see below, with chi-square and other statistics that test the hypothesis of a relationship between two categorical variables.

WHERE race NE "" ;

Selects only those observations where we have a value for race (where race is not equal to missing).

RUN ;

Pretty obvious? Runs the program.

Similar to our ANOVA results previously, we see that the obesity rates for black and Hispanic samples are similar at 35% and 38% while the proportion of the white population that is obese is 25%. These numbers are the percentage for each row. As is standard practice, a 0 for obesity means no, the respondent is not obese and a 1 means yes, the person is obese.
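If the row percentages are all you want to read, the cross-tabulation can be trimmed. The sketch below uses the same data set and variables as the program above, and assumes the LIBNAME statement has already been run in the session. NOCOL and NOPERCENT are TABLES statement options that suppress the column percentages and the overall percentages, leaving the count and the row percentage in each cell.

```sas
/* Same cross-tabulation, showing only frequencies and row percentages */
PROC FREQ DATA = mydata.coh602 ;
  TABLES race*obese / NOCOL NOPERCENT ;
  WHERE race NE "" ;
RUN ;
```

With race as the row variable, each row percentage in the obese = 1 column is the obesity rate for that group.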

The CHISQ option produces the table below. The first three statistics are all tests of statistical significance of the relationship between the two variables.

You can see from this that there is a statistically significant relationship between race and obesity. Another way to phrase this might be that the distribution of obesity is not the same across races.
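For reference, the first two of those tests can be written out as follows. This is the standard textbook notation, not something taken from the SAS output: n_ij is the observed count in row i and column j, n_i. and n_.j are the row and column totals, n is the total sample size, and R and C are the number of rows and columns.

```latex
% Expected count in each cell if race and obesity were unrelated
\[ e_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n} \]

% Pearson chi-square
\[ Q_P = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} \]

% Likelihood ratio chi-square
\[ G^2 = 2 \sum_{i=1}^{R} \sum_{j=1}^{C} n_{ij} \ln\!\left(\frac{n_{ij}}{e_{ij}}\right) \]

% Both are compared to a chi-square distribution with these degrees of freedom
\[ df = (R - 1)(C - 1) \]
```

The bigger the gap between the observed counts and the counts expected under no relationship, the bigger the statistic and the smaller the p-value.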

The next three statistics give you the size of the relationship. A value of 1.0 denotes a perfect relationship (be suspicious if you find that; more often you coded something wrong than everyone of one race is different from everyone of another race). A value of 0 indicates no relationship whatsoever between the two variables. For a 2 x 2 table, SAS reports phi and Cramer’s V on a scale from -1 to +1. For a larger table like this one, Cramer’s V ranges from 0 to 1 and phi cannot be negative. The contingency coefficient is never negative, and its maximum is somewhat below 1. A scale that starts at 0 seems more reasonable to me, since what does a “negative” relationship between two categorical variables really mean? Nothing.
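All three measures are rescalings of the Pearson chi-square. As a sketch in standard notation (the symbols are not from the SAS output), let Q_P be the Pearson chi-square, n the total sample size, and R and C the number of rows and columns. For a table larger than 2 x 2:

```latex
% Phi coefficient
\[ \phi = \sqrt{\frac{Q_P}{n}} \]

% Contingency coefficient
\[ P = \sqrt{\frac{Q_P}{Q_P + n}} \]

% Cramer's V
\[ V = \sqrt{\frac{Q_P / n}{\min(R - 1,\; C - 1)}} \]
```

Dividing by n is what separates the size of a relationship from its significance: with a large enough sample, even a tiny relationship produces a large chi-square.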

From this you can conclude that the relationship between obesity and race is not zero and that it is a fairly small relationship.

Next, I’d like to look at the odds ratios and also include some multivariate analyses. However, I’m still sick and some idiot hit my brand new car on the freeway yesterday and sped off, so I am both sick and annoyed. So … I’m going back to bed and discussion of the next analyses will have to wait until tomorrow.
