{"id":4249,"date":"2014-09-08T03:36:43","date_gmt":"2014-09-08T08:36:43","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=4249"},"modified":"2014-09-08T12:10:50","modified_gmt":"2014-09-08T17:10:50","slug":"interestingness-from-wuss-part-2-condensing-big-data","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/interestingness-from-wuss-part-2-condensing-big-data\/","title":{"rendered":"Interestingness from WUSS: Part 2 Condensing Big Data"},"content":{"rendered":"<p>Sometimes the benefits of attending a conference aren&#8217;t so much the specific sessions you attend as the ideas they spark. One example was at the Western Users of SAS Software conference last week. I was sitting in a session on PROC PHREG and the presenter was talking about analyzing the covariance matrix when it hit me &#8212;<\/p>\n<p>Earlier in the week, Rebecca Ottesen (from Cal Poly) and I had been discussing the limitations of directory size with SAS Studio. You can only have 2 GB of data in a course directory. Well, that&#8217;s not very big data, now, is it?<\/p>\n<p>It&#8217;s a very reasonable limit for SAS to impose. They can&#8217;t go around hosting terabytes of data for each course.<\/p>\n<p>If you, the professor, have a regular SAS license, which many professors do, you can create a covariance matrix for your students to analyze. Even if you include 500 variables, that&#8217;s going to be a pretty tiny dataset but it has\u00a0the data you would need for a lot of analyses &#8211; factor analysis, structural equation models, regression.<\/p>\n<p>Creating a covariance data set is a piece of cake. 
Just do this:<\/p>\n<p>proc corr data=sashelp.heart cov outp=mydata.test2 ;<br \/>\nvar ageatdeath ageatstart ageCHDdiag ;<br \/>\nrun ;<\/p>\n<p>The COV option requests the covariances and the OUTP option has those written to a SAS data set.<\/p>\n<p>If you don&#8217;t have access to a high performance computer and have to run the analysis on your desktop, you are going to be somewhat limited, but far less so than if you just used SAS Studio.<\/p>\n<p>So &#8212; create a covariance matrix and have them analyze that. Pretty obvious and I don&#8217;t know why I haven&#8217;t been doing it all along.<\/p>\n<p>What about means, frequencies and chi-square and all that, though?<\/p>\n<p>Well, really, the output from a PROC FREQ can condense your data down dramatically. Say I have 10,000,000 people and I want\u00a0age at death, blood pressure status, cholesterol status, cause of death, sex and smoking status. I can create an output data set like this. (Not that the heart data set has 10,000,000 records but you get the idea.)<\/p>\n<p>Proc freq data= sashelp.heart ;<br \/>\nTables AgeAtDeath<br \/>\n*BP_Status<br \/>\n*Chol_Status<br \/>\n*DeathCause<br \/>\n*Sex<br \/>\n*Smoking \/noprint out=mydata.test1;<br \/>\nrun ;<\/p>\n<p>This creates a data set with a count\u00a0variable, which you can use in your WEIGHT statement in just about any procedure, like the one below. (If you want the N and standard deviation to reflect the number of people rather than the number of rows, the FREQ statement is the one to use with counts.)<\/p>\n<p>proc means data = mydata.test1 ;<\/p>\n<p>weight count ;<\/p>\n<p>var ageatdeath ;<\/p>\n<p>run ;<\/p>\n<p>&nbsp;<\/p>\n<p>Really, you can create &#8220;cubes&#8221; and analyze your big data on SAS Studio that way.<\/p>\n<p>Yeah, obvious, I know, but I hadn&#8217;t been doing it with my students.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes the benefits of attending a conference aren&#8217;t so much the specific sessions you attend as the ideas they spark. One example was at the Western Users of SAS Software conference last week. 
I was sitting in a session on PROC PHREG and the presenter was talking about analyzing the covariance matrix when it hit&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9,11,8],"tags":[],"class_list":["post-4249","post","type-post","status-publish","format-standard","hentry","category-software","category-statistics","category-technology"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/4249","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=4249"}],"version-history":[{"count":3,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/4249\/revisions"}],"predecessor-version":[{"id":4252,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/4249\/revisions\/4252"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=4249"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=4249"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp
-json\/wp\/v2\/tags?post=4249"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}