{"id":5680,"date":"2019-01-20T23:17:36","date_gmt":"2019-01-21T04:17:36","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=5680"},"modified":"2019-01-20T23:18:25","modified_gmt":"2019-01-21T04:18:25","slug":"super-basic-introduction-to-data-analysis","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/super-basic-introduction-to-data-analysis\/","title":{"rendered":"SUPER BASIC INTRODUCTION TO DATA ANALYSIS"},"content":{"rendered":"<p>I was going to write more about reading JSON data, but that will have to wait because I\u2019m teaching a biostatistics class and I think this will be helpful to them.<\/p>\n<h2>What\u2019s a codebook?<\/h2>\n<p>If you are using even a moderately complex data set, you will want a codebook. At a minimum, it will tell you the name of each variable, its type (character, numeric or date), its label (if it has one) and its position in the data set. It will also tell you the number of records and the number of variables in the data set. In SAS, you can get all of this by running PROC CONTENTS. 
(PROC DATASETS can also produce this information, but we don\u2019t cover that procedure in this class.)<\/p>\n<p>For the sashelp.heart data set, for example, you would see:<\/p>\n<p><a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.16.38.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-5682\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.16.38-300x140.png\" alt=\"output from Proc contents\" width=\"450\" height=\"210\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.16.38-300x140.png 300w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.16.38-768x358.png 768w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.16.38-1024x477.png 1024w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.16.38.png 1258w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><\/p>\n<p>The variable AgeAtDeath is the 12th variable in the data set. It is numeric, with a length of 8, and its label is \u201cAge At Death\u201d. Because it is numeric, it will not behave the way you expect if you use it with character functions, like finding a substring. SAS converts the number to text behind the scenes, writes a note to the log and may return results you did not intend. (A substring is a subset of a string, so \u2018ABC\u2019 is a substring of \u2018ABCDE\u2019.)<\/p>\n<p>Similarly, BP_Status is the 15th variable in the data set. It is a character variable with a length of 7 and the label \u201cBlood Pressure Status\u201d. Because it is a character variable, if you try to use it in a procedure that expects numeric variables, like finding the mean with PROC MEANS, you will get an error. 
SAS uses the label in output, like the table below.<\/p>\n<p><a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.21.20.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-5683\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.21.20.png\" alt=\"Frequency distribution of blood pressure status\" width=\"450\" height=\"206\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.21.20.png 1248w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.21.20-300x137.png 300w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.21.20-768x351.png 768w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2019\/01\/Screen-Shot-2019-01-19-at-14.21.20-1024x468.png 1024w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><\/p>\n<p>This is useful because you may have no idea what BP_Status is supposed to mean. HOWEVER, if you use \u201cBlood Pressure Status\u201d in your SAS statements, as in the example below, you will get an error.<\/p>\n<p>**** WRONG!!! ;<br \/>\nProc means data=sashelp.heart ;<br \/>\nVar blood pressure status ;<br \/>\nRun ;<\/p>\n<p>Seems unfair, but that\u2019s the way it is.<\/p>\n<p>SAS reads the VAR statement above as a request for the means of three separate variables named \u201cblood\u201d, \u201cpressure\u201d and \u201cstatus\u201d.<\/p>\n<p>There are no variables in the data set named \u201cblood\u201d or \u201cpressure\u201d, so you will get an error. 
There is a variable named \u201cstatus\u201d, but it is something completely different: a variable that tells whether the subject is alive or dead.<\/p>\n<p>Even if you don\u2019t have a real codebook available, you should, at a minimum, start any analysis by running PROC CONTENTS so you have the correct variable names and types.<\/p>\n<p>What about these errors I was talking about, though? Where will you see them?<\/p>\n<h2>LOOK AT YOUR SAS LOG!!<\/h2>\n<p>If you are using SAS Studio, the log is the second tab in the middle window, labeled LOG, to the right of the tab that says CODE.<\/p>\n<p>Click on that tab, and if you have any SYNTAX errors, they will conveniently show up in red.<\/p>\n<p>Also, if you are taking a course and want help from your professor or a classmate, the easiest way for them to help you is for you to copy and paste your SAS log into an email or, even better, download it and send it as an attachment.<\/p>\n<p>Having no errors in the SAS log doesn\u2019t mean everything is all good, but the log is always the first place you should look.<\/p>\n<p>For example, to get a table of blood pressure status, you may have typed something like:<\/p>\n<p>Proc freq data=sashelp.heart ;<br \/>\nTables status ;<br \/>\nRun ;<\/p>\n<p>That will run without errors, but it will give you a table of Status (alive or dead), not of blood pressure status (high, normal or optimal).<\/p>\n<p>PROC CONTENTS is a sort of \u201ccodebook light\u201d. A real codebook should also include the mean, minimum, maximum and more for each variable. We\u2019ll talk about that in the next post. Or, who knows, maybe I&#8217;ll finally finish talking about reading in JSON data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was going to write more about reading JSON data but that will have to wait because I\u2019m teaching a biostatistics class and I think this will be helpful to them. What\u2019s a codebook? If you are using even a moderately complex data set, you will want a code book. 
At a minimum, it will&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9,11,8],"tags":[],"class_list":["post-5680","post","type-post","status-publish","format-standard","hentry","category-software","category-statistics","category-technology"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/5680","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=5680"}],"version-history":[{"count":2,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/5680\/revisions"}],"predecessor-version":[{"id":5685,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/5680\/revisions\/5685"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=5680"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=5680"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/tags?post=5680"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}