# How to solve any statistics problem

October 16, 2012 | 5 Comments

I was grading the quizzes from my Advanced Quantitative Data Analysis class. This is a class of really smart people in a doctoral program at a selective university. And yet, some of them still had problems with the quiz. Therefore, in however many parts I feel like doing, I am going to discuss how to solve any statistics problem.

**#1 CHILL!**

I mean this most seriously. Often, I see people make mistakes because they panic, think they can’t do it, underestimate themselves and think, “The problem cannot be that easy”.

Here is an example:

For 17 girls diagnosed with anorexia, weight change after family therapy was as follows:

11, 11, 6, 9, 14, -3, 0, 7, 22, -5, -4, 13, 13, 9, 4, 6, 11

Partial results are shown below. Fill in the missing results:

| Lower C.L. | Upper C.L. | t-value | df | 2-tail Sig |
|---|---|---|---|---|
| 3.60 | | | | .0007 |

**#2 UNDERSTAND!**

What is it you are asked to do in the problem? You need to find the upper confidence limit for the mean, the t-value and the degrees of freedom.

What are the degrees of freedom for a t-test?

When you are estimating the mean from one sample, the degrees of freedom are N-1, or 17-1, which is 16.

To understand a problem, look at the numbers you DO have.

- You have the lower confidence limit.
- You have all of the individual scores.
- You know the number of scores (17).

Think about what you DO know (or can look up in a textbook):

- The mean is the sum of the scores divided by the number of scores.
- The lower confidence limit is the obtained mean MINUS (t * standard error), where t is the critical value from the t table for your degrees of freedom, not the t-value you are asked to fill in.
- The UPPER confidence limit is the obtained mean PLUS (t * standard error).
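If you want to see those three facts in action, here is a minimal Python sketch that builds the interval from the raw scores. It rests on two assumptions that the problem does not state: that the interval is a 95% two-sided one (so the critical t for 16 degrees of freedom is about 2.120, taken from a standard t table), and that the t-value in the table tests the mean change against zero. The variable names are only for illustration.

```python
import math
import statistics

# Weight changes for the 17 girls, copied from the problem statement.
scores = [11, 11, 6, 9, 14, -3, 0, 7, 22, -5, -4, 13, 13, 9, 4, 6, 11]

n = len(scores)
df = n - 1                                   # 17 - 1 = 16
mean = statistics.mean(scores)               # sum of scores / number of scores
std_error = statistics.stdev(scores) / math.sqrt(n)

# Assumption: 95% two-sided interval, critical t for 16 df read from a t table.
T_CRITICAL = 2.120

margin = T_CRITICAL * std_error              # the (t * standard error) term
lower_cl = mean - margin
upper_cl = mean + margin

# Assumption: the t-value asked for tests the mean change against zero.
t_value = mean / std_error

print(f"df = {df}, mean = {mean:.2f}, SE = {std_error:.2f}")
print(f"CI: {lower_cl:.2f} to {upper_cl:.2f}, t = {t_value:.2f}")
```

Under those assumptions the lower limit comes out at about 3.60, matching the partial results, which suggests the 95% guess is a reasonable one.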

**#3 SELECT A STRATEGY**

There are a number of ways to find the upper confidence limit, but all involve adding the value of (t * standard error) to the mean. With what you have from #2, I’d think the easiest strategy is:

- Find what the mean is
- Find the difference between the lower confidence limit and the mean
- Add that number to the mean

This is often the step where people have trouble. I think it comes from three missteps. One is that they are too stressed out. The second is they don’t relax a minute and think about what they DO know first. The third is that they don’t relax a minute and think about what is the right strategy. In short, I think most people (and I am as guilty of this as anyone) don’t spend enough time on the first three steps before jumping right to number four.

**#4 DO IT**

Carry out your strategy.

- The mean is 124 / 17 = 7.29
- 7.29 - 3.60 = 3.69, which is the distance from the lower confidence limit to the mean
- Add 3.69 to 7.29 to get 10.98
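The same three steps can be sketched in a few lines of Python. The function name `upper_limit` is made up for this example, and the only inputs are the scores and the lower confidence limit given in the problem. Because the code does not round the mean along the way, it lands about 0.01 higher than the hand arithmetic on two-decimal values.

```python
def upper_limit(scores, lower_cl):
    """Mirror the lower confidence limit around the sample mean."""
    mean = sum(scores) / len(scores)   # step 1: find the mean
    margin = mean - lower_cl           # step 2: distance from lower limit to mean
    return mean + margin               # step 3: add that distance to the mean


scores = [11, 11, 6, 9, 14, -3, 0, 7, 22, -5, -4, 13, 13, 9, 4, 6, 11]
LOWER_CL = 3.60  # given in the partial results

print(f"Upper C.L. = {upper_limit(scores, LOWER_CL):.2f}")
```

Notice that no t table and no standard error were needed. The shortcut works because the interval for a mean is symmetric around the mean.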

That’s your answer.

**#5 TEST IT**

Do a reality check. The mean is 7.29. If it doesn’t fall between your upper and lower confidence limits, you did something wrong.
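A reality check like this is easy to automate. The sketch below uses a hypothetical helper, `check_interval`, that tests whether the mean falls between the limits. It also adds a second check that is not in the steps above but follows from the formulas: both limits should be the same distance from the mean. The tolerance of 0.01 is an arbitrary allowance for rounding.

```python
def check_interval(mean, lower_cl, upper_cl, tolerance=0.01):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    if not lower_cl < mean < upper_cl:
        problems.append("mean is not between the confidence limits")
    if abs((mean - lower_cl) - (upper_cl - mean)) > tolerance:
        problems.append("limits are not the same distance from the mean")
    return problems


# The answer worked out by hand.
print(check_interval(7.29, 3.60, 10.98))

# A hypothetical slip: reporting the margin (3.69) as the upper limit.
print(check_interval(7.29, 3.60, 3.69))
```

The first call should come back with nothing to complain about, while the second flags both problems.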

Check back tomorrow for further proof that these steps can be applied to any statistics problem (and any math problem, maybe any problem in life).

# Comments

5 Comments so far

[…] Yesterday, I mentioned this problem […]

[…] Last month, I wrote about the steps to solving any statistics problem. […]

Access the Pizzasales.xls dataset in the documents library. Create a scatter plot of Sales vs. Income and have Excel – plot the regression line as well. Does the picture reveal any likely opportunities to improve your model? Construct a new variable, Comp*Inc, by multiplying the Competitor and Income variable together. Run a regression to predict sales using all three variables: Competitor, Income, and Comp*Inc.

Is the Competitor variable in this model statistically significant?

Estimate the daily sales for a store without competition whose neighborhood income is $300 per week.

Estimate the daily sales for a store with a competitor whose neighborhood income is $300 per week.

Compare your answers to part b and part c. Reconcile the results of this comparison with your answers to part a.

Could u help me answer it on MS Excel

This is a great way to solve this problem. I have also used your technique to solve a similar problem and found it very useful.

In an email, 5 features are extracted. Let n=20 data are observed from this email.

(a) What is your proposed model of data? (Hint: you are allowed to choose freely parameters of the model so that the conditions of the proposed model met.)

(b) What is the probability that we observe 2, 1, 0, 0, 17 data respectively from feature one to five?

(c) What is the probability that we observe at most 4 data from the last feature?