{"id":1599,"date":"2011-08-25T22:49:14","date_gmt":"2011-08-26T03:49:14","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=1599"},"modified":"2011-08-25T22:53:58","modified_gmt":"2011-08-26T03:53:58","slug":"sports-equality-t-tests-and-standard-error","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/sports-equality-t-tests-and-standard-error\/","title":{"rendered":"Sports equality, t-tests and standard error"},"content":{"rendered":"<p>Today, taking a break from writing the grant proposal that has no end, I found myself thinking about easy ways to explain and understand standard error.<\/p>\n<p>To understand standard error, you have to have some statistic that you&#8217;re discussing the standard error of. As a random example, let&#8217;s just take the mean.<\/p>\n<p><strong>T-TEST PROCEDURE FOR TESTING FOR DIFFERENCE BETWEEN MEANS<\/strong><br \/>\nLet&#8217;s just say we have a sports organization that is interested in knowing whether there is a significant difference between the numbers of competitors in the male and female divisions each year. Since Title IX passed and we are supposed to be all equal in sports, there should be no significant difference, right?<\/p>\n<p>To test for the difference in means between two groups, we compute a t-test. A t-test done with SAS will give four tables of results. 
The first one is shown below.<\/p>\n<p><strong>Table 1<\/strong><br \/>\n<strong>First PROC TTEST Table<\/strong><br \/>\nsex\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 N\u00a0\u00a0\u00a0 Mean\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Std Dev\u00a0\u00a0\u00a0 Std Err\u00a0\u00a0\u00a0 Minimum\u00a0\u00a0\u00a0 Maximum<br \/>\nFemale\u00a0\u00a0\u00a0 22\u00a0\u00a0\u00a0 97.0909\u00a0\u00a0\u00a0 15.8201\u00a0\u00a0\u00a0 3.3729\u00a0\u00a0\u00a0 66.0000\u00a0\u00a0\u00a0 119.0<br \/>\nMale\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 22\u00a0\u00a0\u00a0 222.8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 45.2253\u00a0\u00a0\u00a0 9.6421\u00a0\u00a0\u00a0 146.0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 324.0<br \/>\nDiff (1-2)\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 -125.7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 33.8792\u00a0\u00a0\u00a0 10.2150<\/p>\n<p>There were 22 records for males and 22 for females. The mean number of competitors each year was 97 for females, with a standard deviation of 15.8 and a range from 66 to 119. For males, the mean number of competitors was almost 223 per year, with a standard deviation of 45 and a range from 146 to 324.<\/p>\n<p>What exactly is a standard deviation? A standard deviation is, roughly, the average amount by which observations differ from the mean. So, if you pulled out a year at random, you wouldn\u2019t expect it to necessarily have exactly 222.8 male competitors. In fact, it would be pretty tough on that last guy who was the .8! On the other hand, you would be surprised if that year there were only 150 competitors, or if there were 320. On the average, a year will be within 45 competitors of the 223 male athletes and, if the yearly counts are roughly normally distributed, about 95% of the years will be within two standard deviations, or, from 133 to 313.\u00a0 That is, 223 &#8211; (2 x 45)\u00a0 to 223 + (2 x 45).<\/p>\n<p><strong>WHAT IS THE STANDARD ERROR AND WHAT DETERMINES IT?<\/strong><br \/>\nBut what is the standard error? 
The standard error is the average amount by which we can expect our sample mean to differ from the population mean.\u00a0 If we take a different sample of years, say, 1988-2009, 1991-2012, all odd-numbered years for the last 30 years and so on, each time we\u2019ll get a different mean. It won\u2019t be exactly 97.09 for women and 222.8 for men. Each time, there will be some error in estimating the true population value. Sometimes we\u2019ll have a higher value than the real mean. Sometimes we\u2019ll underestimate it.<\/p>\n<p>On the average, our error in estimate will be 9.64 for men, 3.37 for women.<\/p>\n<p>Why is the standard error for women lower? Because the standard deviation is lower, while the sample size, 22, is the same for both groups.<\/p>\n<p>The standard error of the mean is the standard deviation divided by the square root of N, where N is your sample size. The square root of 22 is 4.69. If you divide 15.82 by 4.69, you get 3.37.\u00a0 Why the N matters seems somewhat obvious. If you had sampled several hundred thousand tournaments, assuming you did an unbiased sample, you would expect to get a mean pretty close to the true population mean. If you sampled two tournaments, you wouldn\u2019t be surprised if your mean was pretty far off. We all know this. We walk around with a fairly intuitive understanding of error. If a teacher gives a final exam with only one or two questions, students complain, and rightly so. With such a small sample of items, it\u2019s likely that there is a large amount of error in the teacher\u2019s estimate of the true mean number of items the student could answer correctly. If we hear a survey found that children of mothers who ate tofu during pregnancy scored .5 points higher on a standard mathematics achievement test, and then find out that this was based on a sample of only ten people, we are skeptical about the results.<br \/>\nWhat about the standard deviation? Why does that matter? 
The smaller the variation in the population, the smaller the error is going to be in our estimate of the mean. Let\u2019s go back to our sample of mothers eating tofu during pregnancy. Let\u2019s say that we found that children of those mothers had .5 more heads. So, the average child is born with one head, but half of these ten mothers had babies with two heads, bringing their mean number of heads to 1.5. I\u2019ll bet if that was a true study, it would be enough for you never to eat tofu again. There is very, very little variation in the number of heads per baby, so even with a very small N, you\u2019d expect a small standard error in estimating the mean.<br \/>\nThe second table produced by the TTEST procedure is shown in Table 2 below. Here we have an application of our standard error. We see that the mean for females is 97, with 95% confidence limits (CL) from 90.08 to 104.1.\u00a0 That 95% interval runs from roughly the mean minus two times the standard error to the mean plus two times the standard error. That is, approximately 97.09 &#8211; (2 x 3.37)\u00a0 to 97.09 + (2 x 3.37). With only 22 observations, the exact multiplier comes from the t distribution with 21 degrees of freedom and is about 2.08 rather than 2, which is why the table shows 90.08 rather than 90.35.<\/p>\n<p>Why does that sound familiar? Perhaps because it is exactly what we discussed 10 seconds ago about a normal distribution? Yes, errors in estimating a mean approximately follow a normal distribution. Errors in estimation should be equally likely to occur above the mean or below the mean. We would not expect very large errors to occur very often. 
In fact, 95% of the time, our sample mean should be within two standard errors of the mean.<br \/>\n<strong>Table 2<\/strong><br \/>\n<strong>Second PROC TTEST Table<\/strong><\/p>\n<h5>sex\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Method\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Mean\u00a0\u00a0\u00a0\u00a0\u00a0 95% CL Mean\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Std Dev\u00a0\u00a0\u00a0 95% CL Std Dev<br \/>\nFemale\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0 \u00a0\u00a0 \u00a0\u00a0 97.0909\u00a0\u00a0\u00a0 90.0766\u00a0\u00a0\u00a0 104.1\u00a0\u00a0\u00a0 15.8201\u00a0\u00a0\u00a0 12.1713\u00a0\u00a0\u00a0 22.6080<br \/>\nMale\u00a0\u00a0\u00a0 \u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 222.8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 202.7\u00a0\u00a0\u00a0 242.8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 45.2253\u00a0\u00a0\u00a0 34.7941\u00a0\u00a0\u00a0 64.6299<br \/>\nDiff (1-2)\u00a0\u00a0\u00a0 Pooled\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -125.7\u00a0\u00a0\u00a0 -146.3\u00a0\u00a0\u00a0 -105.1\u00a0\u00a0\u00a0 33.8792\u00a0\u00a0\u00a0 27.9348\u00a0\u00a0\u00a0 43.0609<br \/>\nDiff (1-2)\u00a0\u00a0\u00a0 Satterthwaite\u00a0\u00a0\u00a0 -125.7\u00a0\u00a0\u00a0 -146.7\u00a0\u00a0\u00a0 -104.7<\/h5>\n<p>The next two lines both say Diff (1-2) and both show the difference between the\u00a0 two means is -125.7. That is, if you subtract the mean for the number of male competitors from the mean number of female competitors, you get negative 125.7. So, there is a difference of 125.7 between the two means. Is that statistically significant? How often would a difference this large occur by chance? To answer this question we look at the next table. 
It gives us two answers. The first line (Pooled) is used when the variances of the two groups are equal. If the variances are unequal, we would use the statistics shown on the second line (Satterthwaite). In this instance, both give us the same conclusion, that is, the probability of finding a difference between means this large if the population values were equal is less than 1 in 10,000. That is the value you see under Pr &gt; |t|, the probability of a greater absolute value of t. If you were writing this up in a report, you would say,<\/p>\n<blockquote><p>\u201cThere were, on average, 126 fewer female competitors each year than male competitors.\u00a0 This difference was statistically significant (t = -12.30, p &lt;.0001).\u201d<\/p><\/blockquote>\n<p><strong>Table 3<\/strong><br \/>\n<strong>Third PROC TTEST Table<\/strong><br \/>\nMethod\u00a0\u00a0\u00a0 Variances\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DF\u00a0\u00a0\u00a0 t Value\u00a0\u00a0\u00a0 Pr &gt; |t|<br \/>\nPooled\u00a0\u00a0\u00a0 Equal\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 42\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 -12.30\u00a0\u00a0\u00a0 &lt;.0001<br \/>\nSatterthwaite\u00a0\u00a0\u00a0 Unequal\u00a0\u00a0\u00a0 26.064\u00a0\u00a0\u00a0 -12.30\u00a0\u00a0\u00a0 &lt;.0001<\/p>\n<p>In this case the t-values and probabilities are the same, but what if they are not? How do we know which of those two methods to use?<\/p>\n<p>This is where our fourth and final table from the TTEST procedure comes into use. This is the test for equality of variances. The test statistic in this case is the F value. We see the probability of a greater F is &lt; .0001. 
This means that we would get an F-value larger than this fewer than 1 in 10,000 times if the variances were really equal in the population.\u00a0 Since that is a really small probability, and the normal cut-off for statistical significance is p &lt; .05, and .0001 is a LOT less than .05, we would say that there is a statistically significant difference between the variances. That is, they are unequal. We would use the second line in Table 3 above to make our decision about whether or not the differences in means are statistically significant.<\/p>\n<p><strong>Table 4<\/strong><br \/>\n<strong>Fourth PROC TTEST Table<\/strong><br \/>\n<strong>Equality of Variances<\/strong><br \/>\nMethod\u00a0\u00a0\u00a0 Num DF\u00a0\u00a0\u00a0 Den DF\u00a0\u00a0\u00a0 F Value\u00a0\u00a0\u00a0 Pr &gt; F<br \/>\nFolded F\u00a0\u00a0\u00a0 21\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 21\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 8.17\u00a0\u00a0\u00a0 &lt;.0001<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, taking a break from writing the grant proposal that has no end, I found myself thinking about easy ways to explain and understand standard error. To understand standard error, you have to have some statistic that you&#8217;re discussing the standard error of. As a random example, let&#8217;s just take the mean. 
T-TEST PROCEDURE FOR&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[11],"tags":[],"class_list":["post-1599","post","type-post","status-publish","format-standard","hentry","category-statistics"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1599","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=1599"}],"version-history":[{"count":3,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1599\/revisions"}],"predecessor-version":[{"id":1601,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1599\/revisions\/1601"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=1599"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=1599"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/tags?post=1599"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}