How to solve any (statistics) problem: Part 3, proportions

Last month, I wrote about the steps to solving any statistics problem.

A Pew Research poll asked 1,201 adults:

“All in all, do you think affirmative action programs designed to increase the number of black and other minority students on campus are a good thing or a bad thing?” Sixty percent said good, 30% said bad and 10% said don’t know. Let π denote the population proportion who said it is good. Find the p-value for testing Ho: π = 0.50 against Ha: π ≠ 0.50. Interpret.

Ronda punching Crystal — Chill. It’s just my daughter and a friend goofing off

Going back to our previous discussion of steps in solving a problem, let’s look at these one at a time as they apply to this question about proportions.

1. Chill. I have accomplished this by sitting here with a cup of Chamomile tea and a glass of Chardonnay. I intend to finish both of them before this post. In all seriousness, though, I think students often get problems wrong just because they panic. What do I do? What do I do? Relax. Take a sip of wine. Okay, better? Yes, and the picture above is my daughter and her friend, just goofing off in the photo shoot for my book. Seriously. Chill.

2. Understand the problem.

60% said affirmative action was good

30% said it was bad

10% were undecided

The question asks you to test the PROPORTION (not the mean) who said affirmative action was good. This has nothing to do with the 30% or 10%. You do NOT want to compute a chi-square test of independence here. That would test if there was an association between two variables, the respondents’ rating of affirmative action and some other variable. You do not have any other variable.

3. Select a strategy.

Identify the statistic you need, the formula you will need to obtain that statistic and the numbers you need to find to plug into that formula.

Then, compare the obtained statistic to the table of critical values.

You need a z-value. To get a z-value you use this formula:

z = (Obtained proportion – Hypothesized population proportion) / standard error

Since it starts with a P for proportion and population, let’s call the hypothesized population proportion ∏. (This is the same quantity the problem statement writes as π.)

To get the standard error, you use this formula:

standard error = the SQUARE ROOT OF ( ∏*(1 – ∏) / N )

4. Execute the strategy

First, I need to find the standard error. My ∏ value is .50 – remember, that is the hypothesized population value, not the value you obtained.

So — I take .5 * (1-.5) and get .25

I divide that by 1,201 and get .000208 or thereabouts.

I take the square root of that and get .014
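If you would rather let a computer do the arithmetic, here is a minimal Python sketch of those same three steps. The variable names are my own invention, not anything from a textbook, and it assumes nothing beyond the numbers given in the problem.

```python
import math

pi_0 = 0.50   # hypothesized population proportion (from Ho), not the obtained 60%
n = 1201      # number of adults in the poll

variance = pi_0 * (1 - pi_0) / n       # .25 divided by 1,201
standard_error = math.sqrt(variance)   # square root of that

print(round(variance, 6), round(standard_error, 4))
```

Notice that the .60 from the sample never shows up here. The standard error for this test is built entirely from the hypothesized value.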

Now, I have my standard error. I think bells should ring at this point, but they did not. I was sadly disappointed so I drank some more Chardonnay to get over it.

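Now, I calculate my z-value by plugging in more numbers. The obtained value is .60, the hypothesized value (∏) is .50 and my standard error is .014.

(.60 – .50) / .014

that equals 6.93 if you carry the unrounded standard error (.01443) through the division. Plug in the rounded .014 and you get 7.14 instead; either way the conclusion is the same.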
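The same division can be sketched in a few lines of Python. This is just the z formula from step 3 with the numbers plugged in; the names p_hat and pi_0 are labels I made up for the obtained and hypothesized proportions.

```python
import math

p_hat = 0.60  # obtained proportion who said "good"
pi_0 = 0.50   # hypothesized population proportion
n = 1201      # sample size

standard_error = math.sqrt(pi_0 * (1 - pi_0) / n)
z = (p_hat - pi_0) / standard_error   # uses the unrounded standard error

print(round(z, 2))
```

Letting the computer hold on to all the decimal places of the standard error saves you from wondering how much rounding moved the answer.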

I compare that to a z table in a handy dandy statistics textbook, which only goes up to 5.0, but even 5.0 has a probability of way less than .0001. So I call it a day: the p-value is less than .0001, which means it is extremely unlikely one would get a proportion of 60% in a sample of 1,201 people if the population proportion was truly 50%. This assumes all of the usual suspects, that is, random sampling and no bias in the wording of the question.
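If your z table runs out like mine did, the two-sided p-value can be computed directly from the standard normal distribution. Here is a hedged sketch using only Python's standard library. It relies on the identity that the two-tailed probability beyond z equals erfc(|z| / √2), and the variable names are mine.

```python
import math

p_hat = 0.60  # obtained proportion
pi_0 = 0.50   # hypothesized population proportion
n = 1201      # sample size

z = (p_hat - pi_0) / math.sqrt(pi_0 * (1 - pi_0) / n)

# Two-sided p-value: probability of a z at least this far from 0
# in either direction, because Ha is "not equal to".
p_value = math.erfc(abs(z) / math.sqrt(2))

print(p_value < 0.0001, p_value)
```

I used erfc rather than subtracting a cumulative probability from 1 because, this far out in the tail, the subtraction loses most of its precision to rounding.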

5. Test it. Evaluate your answer. The first thing I always do is a reality check. Unless I’ve had a LOT of glasses of Chardonnay, I can generally perceive reality fairly well. Does it make sense that a sample that large would be that far off? No, not really. So, it does seem pretty likely that if the obtained proportion from well over 1,000 people randomly sampled is 60% it is not as low as 50% in the population.

Another way I might test it is to throw it into some statistical software and see if I get the same answer. Maybe if I’m feeling ambitious, I’ll do that tomorrow. Sadly, I am now all out of chamomile tea. Happily, there is still more wine.
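In the meantime, here is one cheap way to check the answer without any statistics package at all. To be clear, this is a simulation, not the z-test above: it pretends the population really is 50/50, runs a bunch of make-believe polls of 1,201 people, and counts how many land as far from 50% as the real poll did. The number of pretend polls (2,000) and the random seed are arbitrary choices of mine.

```python
import random

random.seed(2024)   # arbitrary seed so the run is repeatable

n = 1201            # people per poll
pi_0 = 0.50         # population proportion if Ho is true
n_polls = 2000      # number of pretend polls (my choice, not from the problem)

extreme = 0
for _ in range(n_polls):
    # each simulated respondent says "good" with probability pi_0
    said_good = sum(random.random() < pi_0 for _ in range(n))
    p_hat = said_good / n
    # two-sided: at least as far from .50 as the observed .60
    if abs(p_hat - pi_0) >= 0.10:
        extreme += 1

print(extreme, "of", n_polls, "simulated polls were as extreme as the real one")
```

With a p-value as tiny as the one from the z-test, you should expect essentially none of the pretend polls to come out that extreme, which agrees with the reality check in step 5.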
