{"id":5591,"date":"2018-02-24T18:33:27","date_gmt":"2018-02-24T23:33:27","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=5591"},"modified":"2018-02-24T20:57:30","modified_gmt":"2018-02-25T01:57:30","slug":"whipping-your-data-into-shape-with-sas-part-1-for-today","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/whipping-your-data-into-shape-with-sas-part-1-for-today\/","title":{"rendered":"Whipping your data into shape with SAS : Part 1 for Today"},"content":{"rendered":"<p>I&#8217;m sure I&#8217;ve written about this before &#8211; after all, I&#8217;ve been writing this blog for 10 years &#8211; but here&#8217;s something I&#8217;ve been thinking about:<\/p>\n<h3>Most students don&#8217;t graduate with nearly enough experience with real data.<\/h3>\n<p>You can use government websites with de-identified data from surveys, and I do, but I teach primarily engineering and business students so it would be helpful to have some business data, too. Unfortunately, businesses aren&#8217;t lining up to hand me their financial, inventory and manufacturing data (bunch of jerks!)<\/p>\n<p>So, I <a href=\"https:\/\/itunes.apple.com\/es\/app\/medica-scientific\/id1116555200?mt=8\">downloaded this free app, Medica Scientific from the app store and ran a simulation of data for a medical device company<\/a>. Some friends did the same and this gave me 4 data sets, as if from 4 different companies.<\/p>\n<p>Now, that I have 4 Excel files with the data, before you get to uploading the file, I&#8217;m going to give you a tip. By default, SAS is going to import the first worksheet. So, move the worksheet you want to be first. In this case, it&#8217;s a worksheet named &#8220;Financials&#8221;. Since SAS will use the first worksheet, it could just as well be named &#8220;A whale ate my sandwich&#8221;, but it wouldn&#8217;t be as obvious.<\/p>\n<p>While you are at it, take a look at the data, variable names in the first row.\u00a0 ALWAYS give your data at least a cursory glance. If it is millions of records, opening the file isn&#8217;t feasible and we cover other &#8216;quick looks&#8217; in class.<\/p>\n<p><strong>These steps and the next few use SAS Studio, which is super-duper helpful for online courses.<\/strong><\/p>\n<p>1. Upload the file into the desired directory<br \/>\n2. Under Tasks and Utilities select Utilities and then Import Data<br \/>\n3. Click select file and then navigate to the folder where your file is and click open<br \/>\n4. You&#8217;ll see a bunch of code but nothing actually happens until you click on the little running guy.<\/p>\n<figure id=\"attachment_5592\" aria-describedby=\"caption-attachment-5592\" style=\"width: 450px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2018\/02\/data1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-5592 size-full\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2018\/02\/data1.jpg\" alt=\"menus to select data to import\" width=\"450\" height=\"174\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2018\/02\/data1.jpg 450w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2018\/02\/data1-300x116.jpg 300w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><figcaption id=\"caption-attachment-5592\" class=\"wp-caption-text\">First select the data set<\/figcaption><\/figure>\n<figure id=\"attachment_5593\" aria-describedby=\"caption-attachment-5593\" style=\"width: 450px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2018\/02\/data2.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-5593 size-full\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2018\/02\/data2.jpg\" alt=\"the import data window\" width=\"450\" height=\"495\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2018\/02\/data2.jpg 450w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2018\/02\/data2-273x300.jpg 273w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><figcaption id=\"caption-attachment-5593\" class=\"wp-caption-text\">Have you clicked the running guy? Good!<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Okay, now you have your code. Not only has SAS imported your data file into SAS, it&#8217;s also written the code for you.<\/p>\n<p><code>FILENAME REFFILE '\/home\/annmaria\/examples\/simulation\/Tech2Demo.xlsx';<br \/>\nPROC IMPORT DATAFILE=REFFILEDBMS=XLSX OUT=WORK.IMPORT1;<br \/>\nGETNAMES=YES;<br \/>\nRUN;<br \/>\nPROC CONTENTS DATA=WORK.IMPORT1;<br \/>\nRUN;<\/code><\/p>\n<p>Now, if you had a nice professor who only gave you one data set, you would be done, which is why I showed you the easy way to do it.<\/p>\n<p>However, very often, we want to compare several factories or departments or whatever it is.<\/p>\n<h3>Also, life comes with problems. Sigh.<\/h3>\n<p>One of your problems, which you&#8217;d notice if you opened the data set is that the variables have names like &#8220;Simulation Day&#8221; .\u00a0 I don&#8217;t want spaces in my variable names.<\/p>\n<p>My second problem is that I need to upload all of my files and concatenate them so I have one long file.<\/p>\n<p>Let&#8217;s attack both of these at once. First, upload the rest of your files.<\/p>\n<p>Now,\u00a0 open a new SAS program and at the top of your file, put this:<\/p>\n<p>OPTION VALIDVARNAME=V7 ;<\/p>\n<p>It will make life easier in general if your variable names don&#8217;t have spaces in them. The option above automatically recodes the variables to valid variable names without spaces.<\/p>\n<p>Now, to import the next 3 files, just create a new SAS program and copy and paste the code created by your IMPORT procedure\u00a0 FOUR TIMES (yes, four).<\/p>\n<p style=\"text-align: center;\"><strong>From Captain Obvious:<\/strong><\/p>\n<figure id=\"attachment_104\" aria-describedby=\"caption-attachment-104\" style=\"width: 450px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2009\/01\/captnobvious.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-104\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2009\/01\/captnobvious-693x1023.jpg\" alt=\"\" width=\"450\" height=\"665\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2009\/01\/captnobvious-693x1023.jpg 693w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2009\/01\/captnobvious-203x300.jpg 203w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2009\/01\/captnobvious.jpg 1245w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><figcaption id=\"caption-attachment-104\" class=\"wp-caption-text\">Captain Obvious wearing her obvious hat<\/figcaption><\/figure>\n<p>Although you&#8217;d think this would be obvious, experience has shown that I need to say it.<\/p>\n<ul>\n<li>Do NOT copy the code in this blog post. Copy the code produced by your own IMPORT procedure, it will have your own directory name.<\/li>\n<li>Do NOT name every output data set IMPORT1 because if you do, each step will replace the data set and you will end up with one dataset and be sad.<\/li>\n<\/ul>\n<p>Since I want to replace the first file, I&#8217;m going to need to add the REPLACE option in the first PROC IMPORT statement.<br \/>\n<code><br \/>\nOPTION VALIDVARNAME=V7 ;<\/code><\/p>\n<p><code>FILENAME REFFILE '\/home\/annmaria\/examples\/simulation\/Tech2Demo.xlsx';<br \/>\nPROC IMPORT DATAFILE=REFFILEDBMS=XLSX<br \/>\nREPLACE<br \/>\nOUT=WORK.IMPORT1;<br \/>\nGETNAMES=YES;<br \/>\nRUN;<br \/>\nPROC CONTENTS DATA=WORK.IMPORT1;<br \/>\nRUN;<\/code><\/p>\n<p><code>FILENAME REFFILE '\/home\/annmaria\/examples\/simulation\/Tech2Demo2.xlsx';<br \/>\nPROC IMPORT DATAFILE=REFFILEDBMS=XLSX<br \/>\nREPLACE OUT=WORK.IMPORT2;<br \/>\nGETNAMES=YES;<br \/>\nRUN;<br \/>\nPROC CONTENTS DATA=WORK.IMPORT2;<br \/>\nRUN;<\/code><\/p>\n<p>Do that two more times for the last two datasets<\/p>\n<p><strong>Did you need to do the utility? Couldn&#8217;t you just have done the code from the beginning?<\/strong> Yes. I just wanted to show you that the utility existed. If you only had one file and it had valid filenames, which is a very common situation, you would be done at that point.<\/p>\n<p>In a real-life scenario, you would want to merge all of these into one file so you could compare clinics, plants, whatever. Super easy.<\/p>\n<p>[IF you have write access to a directory, you could create a permanent dataset here using a LIBNAME statement, but I&#8217;m going to assume that you are a student and you do not. The default is to write to the working directory. ] ;<\/p>\n<p><code>DATA allplants ;<br \/>\nset import1 - import4 ;<\/code><\/p>\n<p>IF you get an error at this point, what should you do?<\/p>\n<p>There are a few different answers to that question and I will answer them in my next post.<\/p>\n<p><a href=\"https:\/\/www.microsoft.com\/es-cl\/store\/p\/spirit-lake-demo\/9mz1j58bm6ks\">SUPPORT MY DAY JOB . IT&#8217;S FUN AND FREE!<br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-5510\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2017\/10\/spirit_lake_village.png\" alt=\"\" width=\"450\" height=\"307\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2017\/10\/spirit_lake_village.png 1022w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2017\/10\/spirit_lake_village-300x205.png 300w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2017\/10\/spirit_lake_village-768x525.png 768w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/>YOU CAN DOWNLOAD A SPIRIT LAKE DEMO FOR YOUR WINDOWS COMPUTER FROM THE MICROSOFT STORE<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m sure I&#8217;ve written about this before &#8211; after all, I&#8217;ve been writing this blog for 10 years &#8211; but here&#8217;s something I&#8217;ve been thinking about: Most students don&#8217;t graduate with nearly enough experience with real data. You can use government websites with de-identified data from surveys, and I do, but I teach primarily engineering&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9,11,8],"tags":[26,12,23],"class_list":["post-5591","post","type-post","status-publish","format-standard","hentry","category-software","category-statistics","category-technology","tag-data-management","tag-sas","tag-statistics"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/5591","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=5591"}],"version-history":[{"count":5,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/5591\/revisions"}],"predecessor-version":[{"id":5599,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/5591\/revisions\/5599"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=5591"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=5591"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/tags?post=5591"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}