Why chi-square is expecting the expected value

This is one of those things that is obvious after someone points it out to you and you smack your head saying, “Of course! I knew that.”

As I was going through everything I have to say about analyzing categorical data, trying to winnow it down to a three-hour workshop for the WUSS conference (Western Users of SAS Software) next week, I wondered how many people ever THOUGHT about probability again once they had finished that chapter or two in their statistics course.

Professors are optimistic when they believe that students forget almost everything they have learned six months after the course. I have found that if you give chapter tests, students forget a lot of what they have learned by the next week. And I don’t blame them. Very seldom have I seen a real effort made in textbooks to draw connections back to what was learned previously. This is why I have a hatred, varying only in degree of venom, for all mathematics textbooks ever written.
So, as a public service, here is what the information you learned about probabilities has to do with expected value.

The probability of two independent events both occurring is the product of their individual probabilities. That is, under the assumption that

the probability of event A occurring, P(A),

is unrelated to

the probability of event B occurring, P(B),

then the probability of A and B both occurring, which is written as P(A ∩ B) and read as “the probability of the intersection of A and B,”

is equal to P(A) * P(B).
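If you would rather see the product rule than take my word for it, here is a small sketch in Python that checks it by brute force. The two dice and the events A and B below are my own illustration, not part of the desk data: A is "the first die is even" and B is "the second die shows a six", which have nothing to do with each other.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of rolling two fair dice (36 in all).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false test on one outcome."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

def a(o):
    return o[0] % 2 == 0  # A: first die is even

def b(o):
    return o[1] == 6      # B: second die shows a six

p_a = prob(a)
p_b = prob(b)
p_both = prob(lambda o: a(o) and b(o))  # A and B both occur

print(p_a, p_b, p_both, p_a * p_b)  # 1/2 1/6 1/12 1/12
```

Counting the outcomes directly and multiplying the two separate probabilities give the same answer, which is all that independence is claiming.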

Let’s say that whether or not you have your own desk at home (yes or no) as a middle school student is unrelated to gender. Parents are equally likely to provide a desk for a boy or a girl.

Let’s say we have a population of 7,286 eighth-graders that is almost exactly divided between girls (50.51%) and boys (49.49%).

We also find that, of those 7,286 eighth-graders, 85.08% have their own desk.

Then our EXPECTED frequency for girls having their own desk is 50.51% times 85.08% times 7,286:

.5051 * .8508 * 7286 = 3,131 (rounded to the nearest whole student)

What an amazing coincidence! That is exactly what the expected frequency is in this table.
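The same arithmetic works for every cell, not just girls with desks. Here is a minimal sketch in Python using the percentages quoted above. The 14.92% for "no desk" is not stated anywhere in the post; I am assuming it is simply 100% minus 85.08%.

```python
n = 7286  # eighth-graders in the population

# Marginal proportions from the text; "no" is assumed to be 1 - 0.8508.
p_sex = {"girl": 0.5051, "boy": 0.4949}
p_desk = {"yes": 0.8508, "no": 0.1492}

# Under independence, each expected count is P(sex) * P(desk) * n.
expected = {
    (sex, desk): p_sex[sex] * p_desk[desk] * n
    for sex in p_sex
    for desk in p_desk
}

for cell, value in expected.items():
    print(cell, round(value))
# ('girl', 'yes') 3131
# ('girl', 'no') 549
# ('boy', 'yes') 3068
# ('boy', 'no') 538
```

The four expected counts add back up to 7,286, as they should, since all we have done is carve the total up according to the two sets of percentages.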

Remember (and if you never knew, let it be a brand new surprise to you) that the chi-square is calculated by taking, for each cell, the observed minus the expected, squared (hence the name chi-square), divided by the expected, and then summing over all of the cells.

So, the further your observed frequency is from the frequency expected under the assumption the two variables are independent, the larger your chi-square value.
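To make that concrete, here is a rough sketch of the whole calculation in Python. The observed counts in `made_up` are invented purely for illustration (they are NOT the desk data), and the function names are mine. Row total times column total divided by n is the same thing as multiplying the two proportions by n, just written with counts.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Sum over all cells of (observed - expected) squared, divided by expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Invented 2 x 2 table of observed counts, for illustration only.
made_up = [[30, 10],
           [20, 20]]

print(expected_counts(made_up))       # [[25.0, 15.0], [25.0, 15.0]]
print(round(chi_square(made_up), 4))  # 5.3333
```

Every cell here is off from its expected count by 5, but the cells with the smaller expected count (15) contribute more to the total than the cells with the larger one (25).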

Why divide by the expected? Well, if your expected value is 10 and your observed value is 20, then 10 more than expected is a lot of difference; it is twice what was expected. On the other hand, if your expected value is 2,000 and your observed value is 2,010, then your observed is actually pretty close to the expected, percentage-wise.
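Plugging those two cases into the per-cell piece of the formula shows how much the division matters. This is just a sketch; `cell_contribution` is a name I made up for it.

```python
def cell_contribution(observed, expected):
    """One cell's share of the chi-square: (observed - expected)^2 / expected."""
    return (observed - expected) ** 2 / expected

small_cell = cell_contribution(20, 10)      # off by 10 when 10 was expected
large_cell = cell_contribution(2010, 2000)  # off by 10 when 2,000 was expected

print(small_cell)  # 10.0
print(large_cell)  # 0.05
```

Same difference of 10 in both cells, but the first one adds 200 times as much to the chi-square as the second.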

How to get some tables….

I was feeling all pointy and clicky today so I produced the SAS table above using SAS Enterprise Guide. Go to the TASKS menu, select DESCRIBE and TABLE ANALYSIS. Under cells be sure to click on expected frequency and cell percentages. (If you are using a screen reader, click here for an html version of the table)

If you want to do the same thing in SPSS, you can use this syntax:

CROSSTABS
  /TABLES=ITSEX BY BS4GTH03
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED TOTAL
  /COUNT ROUND CELL.

Or, you can go to ANALYZE then DESCRIPTIVE STATISTICS then CROSSTABS then click on CELLS and click the button next to expected.
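And if you have neither SAS nor SPSS handy, the count and expected count for each cell are easy enough to produce yourself. Below is a minimal sketch in plain Python. The eight records are invented, and I am assuming each record is a pair of values for ITSEX and BS4GTH03 (the variable names in the SPSS syntax above), coded here as plain words rather than whatever codes the real file uses.

```python
from collections import Counter

# Invented (ITSEX, BS4GTH03) pairs, for illustration only.
records = (
    [("girl", "yes")] * 3
    + [("girl", "no")] * 1
    + [("boy", "yes")] * 2
    + [("boy", "no")] * 2
)

def crosstab_with_expected(rows):
    """Return {(row value, column value): (count, expected count)}."""
    counts = Counter(rows)
    n = len(rows)
    row_totals = Counter(r for r, _ in rows)
    col_totals = Counter(c for _, c in rows)
    table = {}
    for r in sorted(row_totals):
        for c in sorted(col_totals):
            expected = row_totals[r] * col_totals[c] / n
            table[(r, c)] = (counts[(r, c)], expected)
    return table

table = crosstab_with_expected(records)
for cell, (count, expected) in table.items():
    print(cell, count, expected)
# ('boy', 'no') 2 1.5
# ('boy', 'yes') 2 2.5
# ('girl', 'no') 1 1.5
# ('girl', 'yes') 3 2.5
```

With only eight made-up students this is a toy, but it is the same count-and-expected layout you ask for when you tick the expected box in the cells dialog.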

And now I was feeling guilty because even though we have four desks in the house, two are in my office, one is upstairs and one is in the living room so that anyone who wants to work on the computer while watching TV can. None of them belong to the world’s most spoiled 13-year-old personally.

But ... then I re-read the question and saw that it just asked if there was a study desk or table the student could use. So, we are off the hook. Which is a good thing, too, because her shopping list for today includes:

One Halloween costume

Zero Desk

All of the make-up sold by MAC and Sephora


5 Comments


I love this article, well done!

Just a clarification “then the probability of A and B occurring , which is written as P(A U B) and read as “the probability of the union of A and B)
This should be P(A n B) probability of A intersection B.
