{"id":2674,"date":"2012-10-02T01:03:15","date_gmt":"2012-10-02T06:03:15","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=2674"},"modified":"2012-10-02T01:08:00","modified_gmt":"2012-10-02T06:08:00","slug":"little-sas-tips-while-analyzing-timss-data","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/little-sas-tips-while-analyzing-timss-data\/","title":{"rendered":"Little SAS tips (&#038; special characters) while analyzing TIMSS data"},"content":{"rendered":"<p>For reasons I may explain later &#8211; or maybe not &#8211; I decided to analyze the <a href=\"http:\/\/nces.ed.gov\/timss\/\">TIMSS data, which is the Trends in International Mathematics and Science Study<\/a>.<\/p>\n<p><strong>Use a colon: Nifty tip #1<\/strong><br \/>\n*** Ran this first ;<\/p>\n<p>libname LIB 'C:\\TIMSS2007\\Data';<\/p>\n<p>proc contents data = lib.G4_ACHIEVE07;<br \/>\nrun ;<\/p>\n<blockquote><p>*** Modified to only keep math items ;<br \/>\ndata lib.G4_ACHIEVE07;<br \/>\nset lib.G4_ACHIEVE07;<br \/>\ndrop s: ;<\/p><\/blockquote>\n<p>I was only interested in the mathematics items, not the science ones, and since I did not want 170+ items cluttering up my data set, I used the statement below:<\/p>\n<p>DROP S: ;<\/p>\n<p>This statement drops all of the variables beginning with S. Be cautious with this, because if there is a variable with a name like STUDENT_ID, it will be dropped too. This is why I ran the PROC CONTENTS first and verified that all of the science items, and only those items, began with an S.<\/p>\n<p><strong>Nifty tip #2 \u00a0&#8211; use a %INCLUDE statement<\/strong><\/p>\n<p>It only appears that the point of today&#8217;s blog is to include all possible special characters. That is merely a fringe benefit.<\/p>\n<p>The %INCLUDE statement essentially copies and pastes code from another file into your program in the spot where you inserted it. 
I like it for things like 400 lines of formats because, just like I don&#8217;t like extra variables cluttering up my data set, I don&#8217;t like extraneous lines cluttering up my code. I do need to run the PROC FORMAT, but I don&#8217;t need to see it every time I run the program and I do not want to store the formats permanently.<\/p>\n<blockquote><p>%include &quot;c:\\timss2007\\programs\\achievefmts.sas&quot; ;<\/p><\/blockquote>\n<p>Problem solved.<\/p>\n<p><strong>Nifty tip #3 \u00a0Use the LINESIZE option to see all of your results on one line<\/strong><\/p>\n<p>I am easily annoyed. If you read my blog often, you know that this has been established. If I have 200 variables and the minimum and maximum do not fit on the same line as the mean because the label is<\/p>\n<p>&#8220;This is that question where we asked the student about long division which involves dividing a two-digit number into a three-digit number and has a remainder&#8221;<\/p>\n<p>and then you have the mean and need to scroll down 200 lines to see the minimum and another 200 lines to see the maximum, well, it&#8217;s annoying.<\/p>\n<p>Do this:<\/p>\n<blockquote><p>OPTIONS LINESIZE = 255 ;<\/p><\/blockquote>\n<p>or whatever large number you like, up to the SAS maximum of 256. No, I don&#8217;t have paper that is that wide, but I&#8217;m not planning on printing this out. I just want to scroll through and see that the minimum, maximum, mean and standard deviation are reasonable.<\/p>\n<p><strong>Nifty tip #4 Use a temporary data step to find the number of variables<\/strong><\/p>\n<p>No, smart ass, PROC CONTENTS would NOT do this. I want to know how many math items there are, not how many total. The math items (now that I deleted the science ones above) are in order. 
I run this step, look in my log and it tells me there are 178 variables in the data set.<\/p>\n<blockquote><p>data test ;<br \/>\nset lib.G4_ACHIEVE07;<br \/>\nkeep M031106 -- M041191 ;<br \/>\nrun ;<\/p><\/blockquote>\n<p><strong>Nifty tips #5 \u00a0and #6 &#8211; Create an array and use the VVALUE function to score data<\/strong><\/p>\n<p>TIMSS has formats (remember the %INCLUDE) that are things like 98 = &#8220;NOT ADMIN.&#8221;, 10 = &#8220;CORRECT RESPONSE&#8221;.<\/p>\n<blockquote><p>data lib.G4_scored ;<br \/>\nset lib.G4_ACHIEVE07;<br \/>\narray ans{*} M031106 -- M041191 ;<br \/>\narray sc{*} $18 tmp1 - tmp178 ;<\/p><\/blockquote>\n<p>I created an array of the mathematics items, and the ans{*} says to give the array a dimension of however many variables there are between M031106 and M041191. The double dashes signify between, as in &#8220;between locations in the data set&#8221;, with M031106 coming first. If you use one dash, SAS assumes the variables are numbered consecutively: M031106, M031107, all the way to M041190, M041191. Which they are not. Do double dashes count as two special characters or only one?<\/p>\n<p>I could have used 178 instead of * since I actually knew there were 178 variables, but I wanted to throw in another special character. Yes, I am immature. That was established long ago. The $18 denotes this as an array of character variables and assigns them all a length of 18, which is the length of the longest formatted response. Also, a $ is another special character.<\/p>\n<blockquote><p>do i = 1 to 178 ;<br \/>\nsc{i} = trim(vvalue(ans{i})) ;<br \/>\nif sc{i} in (&quot;INCORRECT RESPONSE&quot;,&quot;NOT REACHED&quot;, &quot;OMITTED&quot;) then ans{i} = 0 ;<br \/>\nelse if sc{i} = &quot;CORRECT RESPONSE&quot; then ans{i} = 1 ;<br \/>\nelse if sc{i} = &quot; PARTIAL RESPONSE&quot; then ans{i} = .5 ;<br \/>\nelse if sc{i} = &quot;NOT ADMIN.&quot; then ans{i} = . 
;<br \/>\nend ;<br \/>\ndrop tmp1 - tmp178 M031002 M031223;<br \/>\nrun ;<\/p><\/blockquote>\n<p>Here I have my handy do-loop and a VVALUE function. You can use VVALUE when you don&#8217;t know the variable&#8217;s format, or, as in my case, are too lazy to look it up and type it in. The formatted value of ans{i}, whatever that format might be, is put into sc{i}. I also used the TRIM function to trim trailing blanks while I was at it.<\/p>\n<p>Now that I have scored all of the items to suit my nefarious purposes, I drop the temporary variables as well as two variables that turned out to be questions not administered to anyone.<\/p>\n<p>And those are my nifty SAS tips of the night.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For reasons I may explain later &#8211; or maybe not &#8211; I decided to analyze the TIMSS data, which is Trends in International \u00a0Mathematics and Science Study. 
Use a colon: Nifty tip #1 ** Ran this first *** libname LIB &#8216;C:\\TIMSS2007\\Data&#8217;; proc contents data =\u00a0lib.G4_ACHIEVE07; *** Modified to only keep math items ; data lib.G4_ACHIEVE07;&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9,8],"tags":[],"class_list":["post-2674","post","type-post","status-publish","format-standard","hentry","category-software","category-technology"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/2674","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=2674"}],"version-history":[{"count":3,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/2674\/revisions"}],"predecessor-version":[{"id":2676,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/2674\/revisions\/2676"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=2674"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=2674"},{"taxonomy":"post_tag","embeddable"
:true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/tags?post=2674"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}