{"id":1543,"date":"2011-07-18T10:39:34","date_gmt":"2011-07-18T15:39:34","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=1543"},"modified":"2011-07-18T10:39:34","modified_gmt":"2011-07-18T15:39:34","slug":"fixing-data-the-easy-way-part-2","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/fixing-data-the-easy-way-part-2\/","title":{"rendered":"Fixing data the easy way, part 2"},"content":{"rendered":"<p>I have learned not to be too smart for my own good. Yesterday was an example.<\/p>\n<p>My client provides many different types of services to the consumers who use their program. There are about 15 different options, from counseling to on-the-job training to assistive technology. We want to get the total number of services each participant in their program receives and the average number for the total customer base. However, there are fifteen different variables and for each one, the data are entered &#8220;Yes&#8221; for if the customer received it and &#8220;No&#8221; if they didn&#8217;t. Of course, these data being entered by real human beings and not computers, the data are actually entered, &#8220;Yes&#8221; or &#8220;yes&#8221; or &#8220;YES&#8221; or &#8220;Yes\u00a0\u00a0 &#8221; . You get the idea. And, of course, all of those yes&#8217;s are interpreted differently by the computer.<\/p>\n<p>My first (dumb) thought was to do two array statements, create 15 different variables, all numeric and use a DO-loop to convert to 0=No, 1 = Yes. Then I could sum those variables to get how many services each customer received. I could apply a PROC FORMAT so when the report printed out it would print as &#8220;YES&#8221; and &#8220;NO&#8221;.<\/p>\n<p>There is a much easier way.<br \/>\n<code>Data services ;<br \/>\nset report7_11 ;<br \/>\nArray nums{*} counsel -- Other_services ;<br \/>\nnum_services = 0 ;<br \/>\ndo i = 1 to dim(nums) ;<br \/>\nnums{i} = upcase(nums{i}) ;<br \/>\nif trim(nums{i}) = \"YES\" then num_services + 1 ;<br \/>\nend ;<\/code><\/p>\n<p>Let&#8217;s look at each of these statements &#8230;.<\/p>\n<p>Data services ;<br \/>\nset in.consumers7_11 ;<\/p>\n<p>Well, obviously, the first one creates a data set. The second statement reads in data from a permanent SAS dataset.<br \/>\nArray nums{*} counsel &#8212; Other_services ;<\/p>\n<p>This creates an array of all of the variables in the dataset from the &#8220;counsel&#8221; variable to the &#8220;Other_services&#8221; variable. Conveniently for me, and for the data entry personnel, their dataset was set up so these were 15 consecutive variables.<\/p>\n<p>num_services = 0 ;<\/p>\n<p>This sets my num_services variable to 0. If you DON&#8217;T do this the value will be retained from one iteration of the data step to the next. Your first record will be correct, but for the second person you&#8217;ll get the number of services they received plus the number the first person received. Bad!<\/p>\n<p>do i = 1 to dim(nums) ;<br \/>\nObviously, sets up the DO loop to go from the first variable to however many variables there were in the array. I don&#8217;t know if there were exactly 15 and I was too lazy to count them.<\/p>\n<p>nums{i} = upcase(nums{i}) ;<\/p>\n<p>This changes all of the values to upper case.<\/p>\n<p>if trim(nums{i}) = &#8220;YES&#8221; then num_services + 1 ;<br \/>\nThis trims the trailing blanks for the values and, if the result is &#8220;YES&#8221;\u00a0 (notice it is now upper case with no trailing blanks) the value for num_services is increased by one.<\/p>\n<p>end ;<\/p>\n<p>End the DO-loop and we&#8217;re done.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have learned not to be too smart for my own good. Yesterday was an example. My client provides many different types of services to the consumers who use their program. There are about 15 different options, from counseling to on-the-job training to assistive technology. We want to get the total number of services each&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9],"tags":[],"class_list":["post-1543","post","type-post","status-publish","format-standard","hentry","category-software"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1543","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=1543"}],"version-history":[{"count":2,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1543\/revisions"}],"predecessor-version":[{"id":1545,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1543\/revisions\/1545"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=1543"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=1543"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/tags?post=1543"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}