{"id":1175,"date":"2011-03-22T02:54:18","date_gmt":"2011-03-22T07:54:18","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=1175"},"modified":"2011-03-22T02:54:18","modified_gmt":"2011-03-22T07:54:18","slug":"failing-forward-my-excellent-adventure-with-microdata-continues","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/failing-forward-my-excellent-adventure-with-microdata-continues\/","title":{"rendered":"Failing Forward: My Excellent Adventure with Microdata Continues"},"content":{"rendered":"<p><strong>I was very thrilled to be invited to speak to six classes of seventh- and eighth-grade students at an urban school. Actually, they wanted me to speak to seven classes but there is no way on earth I am getting up at 6:30 a.m. or whatever ungodly hour would be required for me to make it to an 8 a.m. class.<br \/>\n<\/strong><\/p>\n<p>These students live in an area where basically everything you want to be low is high &#8211; poverty, crime, unemployment\u00a0 &#8211;\u00a0 and everything you want to be high is low &#8211; education, income, fluency in English. I spoke to a teacher at a similar school and she said her students were very interested in issues of race and inequality. In her words,<\/p>\n<blockquote><p>&#8220;My students aren&#8217;t stupid. They&#8217;re getting screwed in America and they know it. There just isn&#8217;t anything they can do about it because they&#8217;re all, like, thirteen years old.&#8221;<\/p><\/blockquote>\n<p><strong>Failure #1: Summary Tables <\/strong><\/p>\n<p>My initial thought was to download the summary tables from the census site, read those into a JMP file and create the graphics from there.<\/p>\n<p>I tried downloading some summary tables of those characteristics but concluded, after a day of trying to get the data into the format I wanted, that it would be far easier to do it in SAS. 
Now, if you don&#8217;t know SAS, it probably would take you more time to learn it than to just go ahead and use JMP but, hey, at the end you&#8217;d know SAS. (Note to self: Learn more about JSL).<\/p>\n<p>Everyone is complaining about the price of SAS products and I have always been at a university or corporation that paid for the license. So, I thought I would actually check the price, and holy shit, this stuff is expensive. I thought perhaps I should see what I could do in the OpenOffice spreadsheet on Unix just in case I run out of clients or employers with licenses and have to pay for the software myself. Unfortunately, the spreadsheet application has a limit of 65,000 rows. Also, it occurred to me that if I didn&#8217;t have any clients, I wouldn&#8217;t have that much need for the software, now would I?<\/p>\n<p>Anyway &#8230; the whole summary tables thing didn&#8217;t work out because I wanted variables defined more simply than the Census Bureau did, because this is for a middle school class. For example, I wanted two categories, employed and unemployed, rather than the six the census uses.<\/p>\n<p><strong>Failure #2: JMP<\/strong><\/p>\n<p>I downloaded the <a href=\"http:\/\/www.census.gov\/acs\/www\/data_documentation\/documentation_main\/\">PUMS (as in Public Use Microdata Sample)<\/a> from the American Community Survey for California. This is a 1% sample of the state &#8211; 352,875 people, to be exact. You can download it as a SAS dataset, which I would recommend.<\/p>\n<p>My initial thought was to download it, do a few data manipulations with SAS, then output it to a JMP file and create the graphics from there. I like JMP in part because it makes good graphics more easily than SAS and because it runs natively on Mac OS. Using Windows requires moving 45 feet to a computer downstairs or waiting approximately 15 seconds for a virtual machine to open, thus negatively impacting my quality of life by requiring movement or waiting. 
I am an American, after all.<\/p>\n<p>Hold that thought &#8211; I still think SAS to JMP is a good idea but there turns out to be a bit more massaging of the data necessary than I had originally planned.<\/p>\n<p><strong>Success? Data Massaging with SAS<\/strong><\/p>\n<p>As I mentioned previously, finding how many people are unemployed was not a simple matter of looking at how many people said they were unemployed, although that would certainly seem like a reasonable way to do it. It also seems reasonable to raise taxes on people and corporations making over $10 million a year and fund health care, education and fire departments, but we don&#8217;t see that happening, now, do we?<\/p>\n<p>My first thought was to create a dataset that was a subset of the variables I needed in SAS, do a little bit of recoding and then run the graphs using Enterprise Guide. This did not completely work out.<\/p>\n<p>In trying to find a way to use a weight variable in graphs for SAS Enterprise Guide (which I never figured out in the twenty minutes I spent on it), I came across this <a href=\"http:\/\/www.freakalytics.com\/2007\/09\/26\/the-joy-of-sas-enterprise-guide\/\">Freakalytics blog on The Joy of SAS Enterprise Guide<\/a>. While I agreed with most of his points, I have to agree with one point made by a dissenter in the comment section &#8211; SAS Enterprise Guide IS slow. I&#8217;m running it under VMware on a Mac with 2GB allocated to the virtual machine, and I still keep something up in the background, like a bid I&#8217;m working on, or answer email while I wait for the next step to pop up. How annoying it is to you no doubt depends both on your work habits and patience. Since I probably think 60 times a day, &#8220;Well, what about this?&#8221; waiting a minute for each analysis takes an hour of my time each day. Yes, I do remember when we would submit jobs to run overnight and pick up our output on that green and white lined computer paper the next morning. 
No, I am sure the fact that I was using a dataset with 325,000 or so records didn&#8217;t help. Your point being?<\/p>\n<p>I ended up doing some recoding of the data in SAS 9.2, then opened the dataset I saved, with just the variables of interest, in SAS Enterprise Guide. Even though I thought I had done the recoding I wanted in SAS, I still ended up at several steps creating a filter to, for example, only include the working age population, or recoding race to drop the &#8220;other&#8221; category.<\/p>\n<p>Again, a lot of this was due to my need to reduce the categories to better fit the level of the group to whom I was presenting. Finally, I lumped all the things I needed to recode, subset, etc. into one program and ran that in Enterprise Guide, then went ahead with my analyses.<\/p>\n<p>Three points regarding Enterprise Guide.<\/p>\n<p>First, I had thought maybe I could just take 10% or so of the 325,000 and use Excel or OpenOffice. If it was a random sample, it should do almost as well. One reason I am very glad I did not choose that route is the MODIFY TASK option in Enterprise Guide. I am constantly wanting to look at results in a different way, and this allows me to do that without starting all over. Inhabiting some parallel universe where this looked like a good idea, Microsoft has made it so the latest version of Office for the Mac doesn&#8217;t allow you to record macros. How much does that blow? Answer: A lot.<\/p>\n<p>Second, I think the SUMMARY TABLES option is much better than TABLE ANALYSIS for the types of things I needed to do. It just allows a lot more flexibility.<\/p>\n<p>Third, since I didn&#8217;t see any way to use a weight variable and then get a percentage in a pie chart, I ended up doing summary tables, outputting the results to a dataset and then analyzing that dataset. 
I did compare weighted and unweighted results and it really did not make that much difference.<\/p>\n<p>Once I started getting results, that was both the fun and the depressing part. Actually doing the statistics is the fun part, but the results were not what I would have if I ran the world. Several times, I have re-run results, or compared them with census data for the state or nation thinking this can&#8217;t possibly be right &#8211; but it is.<\/p>\n<p>Below is the distribution of income. In California, median personal income, that is, the income for the median person surveyed, was under $24,000 per year. I looked it up in a frequency table to get a more precise estimate and the amount was $20,000. <a href=\"http:\/\/www.census.gov\/prod\/2010pubs\/acsbr09-3.pdf\">The census gives a more cheery $26,000 or so. <\/a>The difference is due to the fact that their estimate is based on &#8220;full-time year-round workers&#8221; whereas mine was based on &#8220;people of working age&#8221;, which I defined as 18 &#8211; 62.<\/p>\n<p>For people who want to believe we have a fairly egalitarian society, this is a pretty depressing chart. 
And it gets worse &#8230;<\/p>\n<figure id=\"attachment_1177\" aria-describedby=\"caption-attachment-1177\" style=\"width: 800px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/03\/incomedistribution.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1177\" title=\"incomedistribution\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/03\/incomedistribution.jpg\" alt=\"Graph of income distribution\" width=\"800\" height=\"600\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/03\/incomedistribution.jpg 800w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/03\/incomedistribution-300x225.jpg 300w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/a><figcaption id=\"caption-attachment-1177\" class=\"wp-caption-text\">Distribution of Personal Income in California<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>I was very thrilled to be invited to speak to six classes of seventh- and eighth-grade students at an urban school. Actually, they wanted me to speak to seven classes but there is no way on earth I am getting up at 6:30 a.m. 
or whatever ungodly hour would be required for me to make&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9,11],"tags":[],"class_list":["post-1175","post","type-post","status-publish","format-standard","hentry","category-software","category-statistics"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1175","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=1175"}],"version-history":[{"count":3,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1175\/revisions"}],"predecessor-version":[{"id":1179,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1175\/revisions\/1179"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=1175"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=1175"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/tags?post=1175"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{r
el}","templated":true}]}}