# Wilcoxon, Normality, Paired T-test & Smart Boys

May 22, 2011 | 2 Comments

Lately, I’ve been missing some of my former colleagues at the USC Medical School. This is not just because they are super-nice people, which they are, but also because they used to ask for different types of statistics, and I do think variety is the spice of life – except for in marital relationships where it is the spice of divorce courts.

Many of the physicians I’ve worked with deal with small sample sizes, especially if they are just looking at their own practices. Not wanting to violate any confidentiality agreements here, let’s make up a disease, say, fear of naked mole rats, or nakedmoleratophobia. In the normal course of one’s practice, you may only see a couple dozen people a year who have this malady.

Thus, many of the medical studies on which I have been a consultant involve small sample statistics. I haven’t done a lot of that lately, so I was coveting a Mann-Whitney U (used in place of an independent t-test) or a Wilcoxon signed rank test.

I ran through what I had that could be a small sample and produce an answer to a question that interested me, and here is what I have been thinking about:

I’ve heard most of my life from most of the experts that allowing gifted children to skip grades and attend school with older children is a bad idea. Short version – my brother and I both started college at 16. I thought it was a terrific advantage and two of my daughters began college before age 18. My brother thought it was a bad idea and neither of his children began college early.

I got to wondering specifically about males who were accelerated, since you hear that boys mature later. Another common belief is that boys are less verbal. While I was wondering, I noticed in the TIMSS data that there were 31 males who were younger than the typical age range – that is younger than 13.5 at the time of the test.

I wondered whether, given the bias against promoting children, and especially boys, these young boys would be exceptionally advanced. I also wondered if they would be doing relatively better in mathematics than in science since, based on my completely casual observations, it seemed like middle school science requires a lot more reading than middle school mathematics does.

Both mathematics and science on the TIMSS are measured on a scale with a mean of 500, so I thought I could compare these using a Wilcoxon signed rank test. In case you didn’t know, this is a non-parametric test for related (paired) measures, often used when the sample is small. Kind of like a paired t-test for non-normal data.

There are all sorts of statistical packages you could use to do this, and with small sample sizes like the one I have, you can even do it by hand. I happened to use SAS. I was going to try it with SPSS also but that would have required moving at least four feet to the computer and desk behind me. (Yes, my office does have two desks and two computers. What of it?)
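For the by-hand route, here is a minimal sketch in Python. It is not what was run for this post, and the six pairs of scores are made up purely for illustration. It assumes the statistic is defined the way SAS reports it: the sum of the ranks of the positive differences, minus n(n+1)/4, where n counts only the nonzero differences. The function name `signed_rank` is just a label chosen for this sketch.

```python
def signed_rank(x, y):
    """Return (S, n): the centered signed rank statistic and the number of nonzero differences."""
    # Difference scores, discarding pairs with no difference
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)

    # Rank the absolute differences, giving tied values their average rank
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum of ranks for the positive differences, centered at its expected value
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return w_plus - n * (n + 1) / 4, n


# Made-up scores for six hypothetical students (not TIMSS data)
math_scores = [540, 512, 498, 530, 561, 505]
science_scores = [535, 520, 498, 541, 550, 507]

s, n = signed_rank(math_scores, science_scores)
print(s, n)
```

With these made-up scores, one pair is tied and dropped, the positive differences carry ranks 2 and 4.5, and S comes out to 6.5 - 7.5 = -1.0. A value near zero, as here, means the positive and negative differences roughly balance out.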

It’s quite simple, really. You create a difference score by subtracting one variable from the other and then do a PROC UNIVARIATE. (This page from the University of Delaware gives a few other ways to do it. It also has a picture of a turkey’s head, which is something you don’t see that often. You also don’t hear much from Delaware. They are awfully quiet there. They are probably up to something.)

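data smartboys ;

set sm ;

* Keep boys (itsex = 2) whose age rounds to 13 ;

where itsex = 2 and round(bsdage) = 13 ;

* Difference score: mathematics minus science ;

diff = BSMMAT01 - BSSSci01 ;

run ;

proc univariate data = smartboys ;

var diff ;

run ;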

This gives me the following:

Tests for Location: Mu0=0

| Test | Statistic | Value | p Value | |
|---|---|---|---|---|
| Student’s t | t | -0.07961 | Pr > \|t\| | 0.9371 |
| Sign | M | -1.5 | Pr >= \|M\| | 0.7201 |
| Signed Rank | S | -24 | Pr >= \|S\| | 0.6458 |

Plus a bunch of other stuff.
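The sign test line is easy to check by hand. The sketch below, in Python, rests on two assumptions that the output above does not state outright: that all 31 boys have a nonzero difference, and that M is half the difference between the number of positive and negative differences. Under those assumptions, M = -1.5 means 14 positive and 17 negative differences, and the p-value is a two-sided binomial probability with p = 0.5. The function name `sign_test` is made up for this sketch.

```python
from math import comb


def sign_test(n_pos, n_neg):
    """Return (M, p) for a two-sided sign test from counts of positive and negative differences."""
    n = n_pos + n_neg
    m = (n_pos - n_neg) / 2
    # Probability of a split at least as lopsided as the observed one, doubled for two sides
    k = min(n_pos, n_neg)
    tail = sum(comb(n, j) for j in range(k + 1))
    p = min(1.0, 2 * tail / 2 ** n)
    return m, p


# Assumed split: 14 positive and 17 negative differences among 31 boys
m, p = sign_test(14, 17)
print(m, round(p, 4))  # -1.5 0.7201
```

That this reproduces 0.7201 exactly suggests the assumed split of 14 and 17 is the right reading of the output.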

Well, clearly, there is a non-significant difference between their scores in mathematics and science. This isn’t very surprising when you learn that their average score in mathematics is 535.7 and in science 537.2. So, it is a really small difference and not at all what I expected. Also, from looking at the PROC UNIVARIATE output for the mathematics and science scores, it was obvious that the distributions were quite normal and I could have gone ahead and used a paired t-test. The t statistic, shown above and helpfully included as part of the univariate output, has an even larger p-value (0.94, versus 0.65 for the signed rank test), so the t-test finds no more of a difference than the signed rank test did.

HOWEVER – and here is where it is useful and highly recommended to know something about your data – it turns out that the mean scores for the U.S. are anything BUT identical. In fact, the mean for U.S. students in mathematics is 508 with a standard deviation around 77, while the mean for science is about 520 with a standard deviation of 84. So, these young boys are about .37 standard deviations above their peers in mathematics and about .23 standard deviations higher in science. In fact, when I compared them to the other students, these boys WERE significantly higher than their peers in mathematics but not in science.

data testboys ;

set lib.statsfile ;

* Flag the young boys (smb = 1) versus all other students (smb = 0) ;

if itsex = 2 and round(bsdage) = 13 then smb = 1 ;

else smb = 0 ;

run ;

proc ttest data = testboys ;

class smb ;

var bsmmat01 bsssci01 ;

run ;
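The standardized gaps mentioned above are just the boys’ mean minus the U.S. mean, divided by the U.S. standard deviation. Here is a small sketch in Python using only the rounded figures quoted in this post (treating 77 and 84 as the standard deviations). The function name `standardized_gap` is made up for illustration.

```python
def standardized_gap(group_mean, reference_mean, reference_sd):
    """Difference between a group mean and a reference mean, in reference standard deviations."""
    return (group_mean - reference_mean) / reference_sd


# Rounded figures quoted in the post: young boys versus all U.S. students
math_gap = standardized_gap(535.7, 508, 77)
science_gap = standardized_gap(537.2, 520, 84)
print(round(math_gap, 2), round(science_gap, 2))
```

With these rounded inputs the gaps come out at roughly 0.36 and 0.20, a little below the figures given in the text, which may well have been computed from unrounded means and standard deviations. Either way the picture is the same: a modest advantage, larger in mathematics than in science.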

I had thought, given that there seems to be a prejudice against starting school early or skipping grades, both in general and especially for boys, that these boys would have to be amazingly ahead of their peers. As you can see, that isn’t the case. Yes, they were ahead, and yes, in mathematics it was statistically significant, but they weren’t far out there on the right of the normal curve.

On the other hand, most of them were doing quite fine, thank you, and being youngest in their classes didn’t seem to be affecting them in any negative way, at least, not academically.

Of course, since it did turn out that the data were quite normal, I could have just simply done a paired t-test, like so:

proc ttest data = smartboys ;

paired BSMMAT01 * BSSSci01 ;

run ;

Of course, this will give me the EXACT same result as for the t-test in the univariate output above, with one less step because I don’t need to use a data step and create a variable which is the difference between the two.
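The reason the two results match is that a paired t-test is a one-sample t-test on the difference scores: the mean difference divided by its standard error, with n - 1 degrees of freedom. A minimal sketch in Python, with made-up scores rather than the TIMSS data and a hypothetical function name `paired_t`:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired t-test, computed as a one-sample t-test on the differences."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1


# Made-up scores for six hypothetical students (not TIMSS data)
math_scores = [540, 512, 498, 530, 561, 505]
science_scores = [535, 520, 498, 541, 550, 507]

t, df = paired_t(math_scores, science_scores)
print(round(t, 4), df)
```

Nothing in the calculation uses the two original variables separately, only their difference, which is why computing the difference yourself and testing it against zero gives the same t.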

However, I got to do my Wilcoxon signed rank test and I got an answer to my questions. In fact, for the question of math vs science, I got two answers, and they both agreed. On top of it all, the world’s most spoiled 13-year-old received a letter today telling her that she was accepted for the Summer Scholars program, despite not being 12 or a boy (which, since it is a program for high-achieving girls, actually worked in her favor).

So, I am satisfied and fulfilled. It’s just another sunny day in paradise.

# Comments

2 Comments so far

[…] regression or cluster analysis or installing SAS on Unix or whatever. Yes, my discussion of Wilcoxon and normality includes a naked mole rat, but there is a reason for that. Many of the people I work with are really, really smart in their […]

Mole rats are in the news today (in the UK, at least) because their genome has been sequenced and it is hoped that this may offer clues to their longevity: http://www.bbc.co.uk/nature/14031978