Science is boring! Math is boring!
This is the whine of the world’s most spoiled 13-year-old as she does her homework, and I find it hard to argue with her. I have read her textbooks, and every one of them would be put to better use curing insomnia or starting a fire than actually teaching children.
Math and science teaching often IS boring, but it doesn’t have to be.
I just read Alan Alda’s book, Things I Overheard While Talking to Myself, in which he describes making the act of carrying a glass of water 25 feet dramatic by announcing that if the student spills a single drop, his entire village will die. (No, the university does not actually give him the authority to wipe out a student’s home town, but nonetheless, it is the SET-UP that makes it interesting.)
If he can make carrying a glass of water interesting, why can’t we do it for statistics?
One of the reasons students are bored is that we tell them about statistics rather than showing them. One statement I don’t hear very often in statistics classes is,
“Let’s see what happens.”
For example, in my last post I gave an example of a chi-square with two variables, both with two options, yes or no. I said that
One group is much larger than the other - in this case, the students who said they did use a computer at home were 91% of the total sample,
AND there is a significant chi-square, showing a relationship,
AND you can look at the cell chi-square values and see that most of this chi-square value comes from the cells of the smaller group.
Do you think this will always happen?
How do we get a cell chi-square value? It is based on how much the observed number in a cell differs from the expected number.
Let’s say that 60% of all of the people we survey said “Yes,” they sometimes use a computer at public facilities like libraries. If there really is no relationship between having a home computer and public use, we EXPECT that 60% of the 6,600 people who have a home computer will say yes, they use public computers (that’s an expected number of about 4,000). We’ll also expect 60% of the 630 people who DON’T have a home computer to say yes, they use public computers (that’s an expected number of about 380, if you’re keeping track).
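If you are keeping track and want the exact numbers, here is a minimal sketch of that arithmetic in a SAS data step. The variable names are mine, made up for illustration, and the 60%, 6,600 and 630 are the rounded figures from the paragraph above, not values read from the data set.

```sas
/* Expected "Yes" counts if home computer and public use are unrelated */
/* All names here are hypothetical; the inputs are the rounded figures from the text */
data _null_ ;
  p_yes    = 0.60 ;   /* overall share saying Yes to public computer use */
  n_home   = 6600 ;   /* people who have a home computer */
  n_nohome = 630 ;    /* people who do not have a home computer */
  exp_home   = p_yes * n_home ;     /* expected Yes count, home computer group */
  exp_nohome = p_yes * n_nohome ;   /* expected Yes count, no home computer group */
  put exp_home= exp_nohome= ;       /* write both values to the log */
run ;
```

The two products come to 3,960 and 378, which is where "about 4,000" and "about 380" come from.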
When we actually run the cross-tabulation and look at the cell chi-square values, we find that the people who do have a home computer are right around 60%, but the people who don’t have a home computer are far off.
At this point, I would ask students in the class what they thought would happen if we had equally sized groups. Very often, people will guess that the group that has the huge cell chi-square values will continue to be responsible for most of the total chi-square value, whether the two groups (people who do and don’t have a home computer) were equal in size or not.
Let’s test it and see what happens.
Re-read that last statement. To create drama in any situation, that’s a very useful thing to say.
Fortunately, SAS has a surveyselect procedure I can use to select a stratified random sample of 140 people from each group, those who do and those who don’t have home computers. I re-run the analysis and look at my new results. Lo and behold, I am correct!
The chi-square is still significant, but this time, instead of 90% of the chi-square value coming from the two cells for the people who don’t have a computer at home, the cell chi-square values are all about equal.
Why is that? Do you think this will always happen? Do you want to run it again with a different sample?
After running it a few more times, we can have a discussion on WHY the cell chi-square values for the smaller group are larger when you have an uneven distribution.
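One possible way to run it a few more times without re-submitting the code by hand is sketched below. It assumes lib.student_int has already been sorted by BS4GCHOM. The output data set name computer_reps, the choice of five replicates and the seed value are all arbitrary choices of mine for illustration.

```sas
/* Draw 5 independent stratified samples, 140 per stratum in each */
/* ASSUMPTIONS: input is sorted by BS4GCHOM; computer_reps, reps = 5 and the seed are illustrative */
proc surveyselect data = lib.student_int out = computer_reps
     method = srs n = 140 reps = 5 seed = 20240 ;
  strata BS4GCHOM ;
run ;

/* One cross-tab per sample; surveyselect adds the variable Replicate to the output */
proc freq data = computer_reps ;
  by Replicate ;
  tables BS4GCELS * BS4GCHOM / expected chisq cellchi2 ;
run ;
```

Each replicate gets its own table, so the class can compare the cell chi-square values from one sample to the next. Change the seed, or drop it, to draw different samples.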
Yes, students memorize that you get a cell chi-square value by squaring (observed – expected) and dividing by the expected frequency.
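Where do you get that expected frequency? You get it from the whole sample, right? So if almost all of the sample is in one category, the overall percentage, and with it your expected frequency, is going to be very close to whatever was observed for that category. The big group can’t stray far from an expectation it mostly determines.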
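To see the formula at work, here is a sketch with made-up counts. These are NOT the TIMSS results. I chose them only so that the group sizes (6,600 and 630) and the overall 60% match the rounded figures used earlier. The data set name toy and its variable names are hypothetical.

```sas
/* HYPOTHETICAL counts, invented for illustration only */
/* home = has a home computer, public = uses public computers */
data toy ;
  input home $ public $ count ;
  datalines ;
Yes Yes 3900
Yes No 2700
No Yes 438
No No 192
;
run ;

/* WEIGHT tells proc freq that each row stands for COUNT people */
proc freq data = toy ;
  weight count ;
  tables public * home / expected chisq cellchi2 ;
run ;
```

In these made-up data, every cell is off from its expected count by exactly 60 people, which has to happen in a two-by-two table because the row and column totals are fixed. What differs is the divisor: 60 squared divided by 3,960 is about 0.9, while 60 squared divided by 378 is about 9.5. Same miss, much smaller expected frequency, much bigger cell chi-square.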
Does this mean that the cell chi-square is a useless statistic? No. Sometimes it can be very useful.
Now, admit it, aren’t you at least just a little bit curious about what those times are?
Disclaimer: It is true that there are some students who are just not going to be interested in statistics unless the professor sets herself on fire. For most students, though, just that little bit of set-up, asking what they think will happen, and that magic phrase, “Do you want to find out?” will spark their interest, at least for a minute.
IN CASE YOU ARE INTERESTED, THE CODE TO DO THIS IS BELOW
/* Create an html file and run the cross-tab with unequal distribution */
/* Request expected frequency, chi-square and cell chi-square statistics */
ods html file = "C:\TIMSS\sasout\chisq2.html" (title = "Cell-chi-square with UNEQUAL distributions") style = brick ;
Title "Cell Chi-Square and Expected Values with Unequal Distribution" ;
proc freq data = lib.student_int ;
  tables BS4GCELS * BS4GCHOM / expected chisq cellchi2 ;
run ;
/* End the step before closing the destination so the output is written to the file */
ods html close ;
/* Sort the data by strata */
proc sort data = lib.student_int ;
  by BS4GCHOM ;
run ;
/* Select a stratified random sample of 140 in each stratum */
proc surveyselect data = lib.student_int out = computer method = srs n = 140 ;
  strata BS4GCHOM ;
run ;
/* Re-run the analysis using the sample with equal Ns per stratum */
ods html file = "C:\TIMSS\sasout\chisqsample.html" (title = "Cell-chi-square with EQUAL distributions" )
style = brick ;
Title "Cell Chi-Square and Expected Values with EQUAL Distribution" ;
proc freq data = computer ;
  tables BS4GCELS * BS4GCHOM / expected chisq cellchi2 ;
run ;
/* End the step before closing the destination so the output is written to the file */
ods html close ;