statistics

Chi-square, by request, and not in a few words

By AnnMaria De Mars | December 15, 2008 | December 17, 2008

Recently, someone asked me if I could explain chi-square in a few words. The short answer is, “No, I am incapable of using only a few words for any purpose whatsoever. If you doubt this, ask any of my children.”

What is chi-square?
Chi-square is a test of the relationship between two categorical variables. For example, let’s pick Proposition 8, the recent ballot initiative regarding gay marriage. The two categories of voters were “For” and “Against”. This initiative passed 52% to 48%. Remember these proportions. They are important later on.

The gender of voters also fell into two categories, “Male” and “Female”, who are around 49% and 51% of the population. If I wanted to test whether there was a relationship between votes on Prop 8 and gender, a chi-square would be a great test to use.

The null hypothesis being tested is: “There is no relationship between gender and how one voted on Proposition 8”.

I tried to find actual data on this relationship but after searching through a lot of websites and articles trying to find facts on this issue, I was depressed by the number of people who hate other groups of people and were not at all reluctant to write about it, data or no, so I just gave up. Here, proceeding without any interference from real data, is a hypothetical example.

We find 1,000 people who are willing to tell us how they voted and their gender. Just to make life easier, we deliberately select 500 males and 500 females. This gives us a two-by-two table:

            Vote
Gender      Yes     No
Female      237     263
Male        280     220

More males voted yes and more females voted no. Was this just random or are males really more likely to vote against gay marriage?

The formula for a chi-square is the sum, across all of the cells, of the observed number in each cell minus the expected number, squared, and divided by the expected number.
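
Written out in symbols, the same formula might look like the line below. The letters O (observed count) and E (expected count) and the cell index i are notation chosen here for illustration; they do not appear in the original post.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```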

In this case, if there were no relationship between gender and which way you voted, the expected numbers would be 260 yes (52% of 500) and 240 no (48% of 500) for both males and females.

In the first cell, we have (237 - 260)**2 / 260 = 2.03
In the second cell, we have (263 - 240)**2 / 240 = 2.20
In the third cell, (280 - 260)**2 / 260 gives us 1.54
And, in the fourth cell, (220 - 240)**2 / 240 = 1.67
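
The four cell calculations can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable and function names are made up for illustration, and the expected counts of 260 and 240 are the ones used in the hand calculation.

```python
# Observed counts from the hypothetical two-by-two table.
observed = {
    ("Female", "Yes"): 237,
    ("Female", "No"): 263,
    ("Male", "Yes"): 280,
    ("Male", "No"): 220,
}

# Expected counts used in the hand calculation: 52% and 48% of 500.
expected = {"Yes": 260, "No": 240}


def cell_contribution(obs, exp):
    """(observed - expected) squared, divided by expected."""
    return (obs - exp) ** 2 / exp


# One contribution per cell, keyed by (gender, vote).
contributions = {
    cell: cell_contribution(count, expected[cell[1]])
    for cell, count in observed.items()
}

# The chi-square value is the sum of the four contributions.
chi_square = sum(contributions.values())

for (gender, vote), value in contributions.items():
    print(f"{gender:6s} {vote:3s} {value:.2f}")
print(f"chi-square = {chi_square:.2f}")
```

Adding the four pieces together gives about 7.44, which rounds to 7.4.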

I end up with a chi-square value of 7.4, which is statistically significant. The probability of obtaining a chi-square value of 7.4 or larger, if there were really no relationship, is less than .01, or one out of 100. Therefore, if these data were real and not some random numbers that I made up, I could conclude that women are less likely to be opposed to gay marriage than men.
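
For anyone who wants to check the "less than .01" claim without a printed table, here is a minimal Python sketch. It assumes one degree of freedom, the usual value for a two-by-two table; the post does not state the degrees of freedom, so treat that as an assumption. The function name is hypothetical. With one degree of freedom, the chi-square statistic is the square of a standard normal variable, so the upper-tail probability can be computed with the standard library's complementary error function.

```python
import math


def chi_square_p_value_1df(chi_square):
    """Upper-tail probability for a chi-square statistic with 1 df.

    Assumes exactly one degree of freedom, where the tail probability
    reduces to erfc(sqrt(x / 2)). Not valid for other df.
    """
    return math.erfc(math.sqrt(chi_square / 2))


# 7.44 is the hand-calculated chi-square value, kept to two decimals.
p_value = chi_square_p_value_1df(7.44)
print(f"p = {p_value:.4f}")
print("significant at .01" if p_value < 0.01 else "not significant at .01")
```

Under that assumption the probability comes out below .01, which matches what a chi-square table shows for one degree of freedom, where the .01 cutoff is about 6.63.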

Why did I detour into chi-square when I said I was going to spend the next week talking about categorical models? It’s not a detour, really.

Understanding chi-square is one of the building blocks of getting into log-linear models and more. Next, I want to talk about another basic statistic, the phi coefficient, and how, like marzipan, it really isn’t all it’s cracked up to be.

============================

How to get a chi-square in SAS:

proc freq data = datasetname ;
   tables variable1 * variable2 / chisq ;
run ;

============================

How to get a chi-square in SPSS:

CROSSTABS
  /TABLES = variable1 BY variable2
  /STATISTICS = CHISQ.

============================

How to get a chi-square in Stata:

tabulate variable1 variable2, chi2
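
The commands above all request a Pearson chi-square for a cross-tabulation. To the best of my knowledge, these procedures build the expected counts from the table's own row and column totals rather than from an outside figure such as the 52% to 48% statewide result, so a printout for the hypothetical table would show a value very close to, but not exactly the same as, the hand calculation. The Python sketch below illustrates that version of the calculation. It is not the code of any of the three packages, and the function name is hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square for a table of counts (list of rows).

    Expected counts come from the row and column totals.
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    # (rows - 1) * (columns - 1)
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Rows: Female, Male. Columns: Yes, No (hypothetical counts from the post).
statistic, df = pearson_chi_square([[237, 263], [280, 220]])
print(f"chi-square = {statistic:.2f}, df = {df}")
```

For this table the expected counts work out to 258.5 yes and 241.5 no per gender, and the statistic is about 7.40 with one degree of freedom, so the conclusion is the same as with the hand calculation.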

Now you know more than you wanted to know about chi-square.

3 Comments

missukamaka says:

October 5, 2013 at 11:14 am

Thank you for this information!
I am trying to familiarize myself with stats and it has not been easy. My question: how did you determine that the value of 7.4 was statistically significant?

AnnMaria says:

October 5, 2013 at 12:13 pm

You can read the p value on your printout. If you calculated the chi-square by hand or on a calculator, you can look up the probability in a chi-square table.

missukamaka says:

October 6, 2013 at 10:44 am

Thank you!
