What does everybody already know about categorical data?

By AnnMaria De Mars, October 3, 2011

I’m teaching a class on categorical data analysis after the Western Users of SAS Software conference next week. As always, I have WAY more information than I can cover. Handouts are limited to 40 pages, so I sent the organizers 80 slides at two to a page, but I know I am going to cover way more than that. Why not put them three to a page? Because that is just silly. I’d rather have people have 80 slides they can read than 120 they can’t.

From lectures and papers over the years, I have way, way more material than will fit in three hours. Now, the question is what do I include and what do I leave out? There are some obvious things to leave in:

How to code and interpret a logistic regression analysis.
How to interpret model fit statistics.
What is an odds ratio and how do you get it?

The above points all address things that people will commonly want to do, like use multiple variables to predict which category a person will fall into (hence the need for logistic regression).
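For anyone who wants the odds ratio question made concrete, here is a minimal sketch in Python (not SAS, and not taken from my slides). The 2x2 counts are made up for illustration, and the function names are hypothetical. The confidence interval uses the usual large-sample formula on the log odds ratio, which assumes no cell is zero.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows are two groups, columns are outcome yes/no, so the odds of
    "yes" are a/b in the first group and c/d in the second.
    """
    return (a * d) / (b * c)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% interval, computed on the log odds ratio scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: 20 of 100 in group 1 say yes, 10 of 100 in group 2.
print(odds_ratio(20, 80, 10, 90))
print(odds_ratio_ci(20, 80, 10, 90))
```

With these made-up counts the odds of "yes" are 20/80 in the first group and 10/90 in the second, so the ratio is 2.25, and the interval just barely includes 1.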

What can everyone be expected to know already?

Okay, that’s the easy part. The harder part is knowing what everyone can be expected to know already, since I don’t want to waste anyone’s time.

How many people really look at those different chi-square values, like the Pearson and the likelihood ratio chi-square? Does everyone know WHY the expected frequency is expected to be that number?
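As a refresher on the "why", here is a small Python sketch (again not SAS, with made-up counts and hypothetical function names). If the row and column variables were independent, each cell would be expected to hold row total times column total divided by N. The Pearson chi-square and the likelihood-ratio chi-square (sometimes called the maximum likelihood chi-square) are two ways of measuring how far the observed counts fall from those expected counts.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def likelihood_ratio_chi_square(table):
    """2 * sum over cells of observed * ln(observed / expected).

    Empty cells contribute zero, so they are skipped.
    """
    expected = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, expected)
                   for o, e in zip(obs_row, exp_row)
                   if o > 0)

# Hypothetical 2x2 table of counts.
table = [[20, 80], [10, 90]]
print(expected_counts(table))
print(pearson_chi_square(table))
print(likelihood_ratio_chi_square(table))
```

The two statistics are usually close when the sample is reasonably large, which is part of why people tend to glance at one and ignore the other.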

Does pretty much everybody know what a phi coefficient is? Yes, I know we all learned it in basic statistics but how many people never thought about it again?
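In case it has been a while, here is a sketch of the phi coefficient for a 2x2 table in Python (not SAS; the counts are made up and the function name is hypothetical). It is the Pearson correlation between two dichotomous variables, and its square equals the Pearson chi-square divided by N.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]].

    phi = (a*d - b*c) / sqrt(product of the four marginal totals)
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)

# Hypothetical counts.
phi = phi_coefficient(20, 80, 10, 90)
print(phi)

# Check against the chi-square relationship: phi squared * N = Pearson chi-square.
n = 20 + 80 + 10 + 90
print(phi ** 2 * n)
```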

Can I just skip discussing the marginal distribution and conditional distribution because “everyone knows that”?
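For readers who are not sure they belong to "everyone", here is a quick Python sketch (not SAS; made-up counts, hypothetical function names). The marginal distribution of a variable ignores the other variable entirely, while the conditional distribution looks at one variable within each level of the other.

```python
def marginal_distributions(table):
    """Proportion of all cases in each row, and in each column."""
    n = sum(sum(row) for row in table)
    row_marginal = [sum(row) / n for row in table]
    col_marginal = [sum(col) / n for col in zip(*table)]
    return row_marginal, col_marginal

def conditional_on_rows(table):
    """Distribution of the column variable within each row (row percentages)."""
    return [[count / sum(row) for count in row] for row in table]

# Hypothetical counts: rows are two groups, columns are yes / no.
table = [[20, 80], [10, 90]]
rows, cols = marginal_distributions(table)
print(rows)   # share of cases in each group
print(cols)   # share of yes / no overall
print(conditional_on_rows(table))  # share of yes / no within each group
```

If the conditional distributions look the same in every row, and the same as the column marginal, there is not much of a relationship to talk about.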

How about computing confidence intervals with PROC FREQ?

What about testing the null hypothesis that the population value is a specific proportion, also using PROC FREQ?
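In PROC FREQ, both of these come from the BINOMIAL option on the TABLES statement. To show the arithmetic behind the large-sample versions, here is a sketch in Python (not SAS; the counts, the null value of 0.20 and the function names are all made up for illustration). It covers only the normal-approximation interval and z test, not the exact ones.

```python
import math
from statistics import NormalDist

def wald_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def z_test_proportion(successes, n, p0):
    """Two-sided z test of H0: population proportion equals p0.

    The standard error uses p0, the value assumed under the null.
    """
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 30 "yes" answers out of 200, tested against p0 = 0.20.
print(wald_ci(30, 200))
print(z_test_proportion(30, 200, 0.20))
```

Note that the interval uses the sample proportion in its standard error while the test uses the null value, so the two can occasionally disagree near the boundary.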

In a normal household one might ask one’s spouse, to at least get some indication of whether “the man on the street” would be familiar with a topic. I, on the other hand, married someone who decided to pursue a doctorate in particle physics because he found nuclear physics too easy. Somehow, I don’t think he does a very good imitation of the man in the street.

Here is my plan right now:

Collate everything I have on categorical data analysis, including the material from the two courses on non-parametric statistics I taught years ago. I had forgotten I ever taught them until I found the PowerPoint presentations in a folder. Then I remembered, oh yeah, THOSE courses on non-parametric statistics!
Put these in order in an outline under “Questions you want answered”.

These are the questions I have so far:

Are your data any good? (Always a good question to ask first)
What is the distribution of X?
What is the distribution of X given Y?
Is there a significant relationship between X and Y?
Given X, what are the odds of Y?
How well, and with what variables, can we predict which category of X a person falls into?
Is this set of variables significantly better for predicting X than that other set of variables lying over there?

Then, there are those questions of special cases:

What if you only have a small number of cases in one or more cells?

What if your data are repeated measures?

Any suggestions, experience or good categorical data analysis jokes are welcome.

(Hey, if there are SQL jokes, there must be some categorical data analysis jokes.)

Tip of the day: A three-way interaction has an entirely different meaning in categorical data analysis than it does in an X-rated video. I actually found a three-way interaction with sex within military service. It was not at all exciting. It only meant that the relationship between school experiences and plans to enter the military varied by gender.

6 Comments

Chris Hemedinger says:

October 3, 2011 at 1:33 pm

Well, since you started it…here’s a lame SAS joke that might require the documentation to decipher:

A FREQ walks into a bar, orders a chi-square. The bartender says: hey, we don’t serve homogeneities here, go WEIGHT outside.

AnnMaria says:

October 3, 2011 at 1:40 pm

Okay, that was a dumb joke but I admit that I still laughed anyway.

Waynette Tubbs says:

October 4, 2011 at 10:57 am

I know for a fact that I wouldn’t be considered for the “man on the street” in your example above, but I laughed out loud at Hemedinger’s joke.

Joe Levy says:

November 25, 2011 at 8:42 am

Hi

Can you share the slides on categorical data with your readers?

Thank you

AnnMaria says:

November 26, 2011 at 12:11 am

The slides on logistic regression are here

http://www.thejuliagroup.com/presentations.html

along with a lot of other presentation stuff

Joe Levy says:

November 29, 2011 at 4:52 pm

Thank you!
