{"id":2102,"date":"2012-02-16T03:27:02","date_gmt":"2012-02-16T08:27:02","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=2102"},"modified":"2012-02-21T21:39:04","modified_gmt":"2012-02-22T02:39:04","slug":"learning-advanced-sas-from-a-macro","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/learning-advanced-sas-from-a-macro\/","title":{"rendered":"Learning Advanced SAS from a Macro"},"content":{"rendered":"<p>I was using <a href=\"http:\/\/www.lexjansen.com\/pharmasug\/2006\/publichealthresearch\/pr05.pdf\">a macro this week written by Feng, Yu &amp; Xu<\/a> and there were so many less-used and advanced techniques in it I thought it would be a great example to use for teaching SAS.<\/p>\n<p><strong>** ** Nifty thing one computing the caliper width\u00a0 ***** ;<\/strong><\/p>\n<p>Like with other propensity score macros, it started with a logistic regression. This was to do a matching with calipers and since a lot of people use .25* the logit as the caliper value, I need to calculate that. Notice the xbeta = logit on the\u00a0 OUTPUT statement in the PROC LOGISTIC step.<\/p>\n<p>%macro match(data, class, yvar, xvar, id);<br \/>\n**Perform logistic regression, select independent variables, and create the propensity<br \/>\nscore***;<br \/>\nPROC LOGISTIC descending data=&amp;data noprint;<br \/>\n&amp;class;<br \/>\nmodel &amp;yvar=&amp;xvar;<br \/>\nOUTPUT =propen(drop=_level_) prob=prob XBETA = LOGIT ;<br \/>\nrun;<\/p>\n<p><strong>****\u00a0 Here they use PROC UNIVARIATE to output the standard deviation of the logit. The result is a file with one record\u00a0 ;<\/strong><br \/>\n\/* Calculate standard deviation of the logit *\/<\/p>\n<p>proc univariate data=propen noprint;<br \/>\nvar logit;<br \/>\noutput out=propen_sd std=sd;<br \/>\nrun;<\/p>\n<p>*** This is pretty standard just outputting to case and control files, giving the control file a random number;<\/p>\n<p>***create datasets of case and control, which contain patients from case and control groups ***;<br \/>\ndata ctrl case;<br \/>\nset propen;<br \/>\nif &amp;yvar=1 and prob ne . then do;<br \/>\nrannum=ranuni(1234567);<br \/>\nlabel rannum=&#8217;Uniform Randomization Score&#8217;;<br \/>\noutput ctrl;<br \/>\nend;<br \/>\nelse if &amp;yvar=0 and prob ne . then do;<br \/>\noutput case;<br \/>\nend;<\/p>\n<p><strong>**** Nifty thing two: Adding the standard deviation of the logit to every record.<\/strong><\/p>\n<p>data ctrl;<br \/>\nif _n_=1 then set propen_sd(keep=sd);<br \/>\nset ctrl;<br \/>\nrun;<\/p>\n<p>I had to look this up. Did you know that using a conditional SET statement causes you to RETAIN every variable in that dataset. Notice that data set only has one variable, due to the KEEP option that follows it. The result is that the value of sd, that is, the SD of the logit, is added to every record in the ctrl file.<\/p>\n<p>Thanks to <a href=\"www2.sas.com\/proceedings\/sugi25\/25\/po\/25p222.pdf\">Paul McDonald who wrote a paper for SAS Global Forum If _N_ = 1 then set <\/a>. Yes, that was actually the name of it.<\/p>\n<p>Then, they sort by the random number, not too\u00a0 interesting.<\/p>\n<p>***sort ctrl by random number generated ***;<br \/>\nproc sort data=ctrl;<br \/>\nby rannum;<br \/>\nrun;<\/p>\n<p><strong>*** Nifty thing three, if you are learning macros , an example of CALL SYMPUT, plus creating a macro variable that is the total N ;<\/strong><\/p>\n<p>\/* The tot macro variable is created here *\/<\/p>\n<p>data _null_;<br \/>\nset ctrl;<br \/>\nby sd;<br \/>\nif last.sd then call symput(&#8216;tot&#8217;,put(_n_,8.));<br \/>\nrun;<\/p>\n<p>This is nifty in the first place because it uses CALL SYMPUT to put the value of\u00a0 _N_ into a newly created macro variable called tot . Notice they did not actually bother sorting by sd, since it is the same for every record. However, they used the BY sd and then LAST.SD . That means the IF statement will be true on the last record in the data set, since everyone has the same value for SD. On the last record, the value of _N_ is the last record in the data set. I found this very clever because there are many times when people would want to know the number of records and put that into a macro variable and there are multiple ways to do it. This is one I had not thought of, plus it demonstrates the CALL SYMPUT so double learning points.<\/p>\n<p><strong>*** Nifty macro learning things four, five and six<\/strong><\/p>\n<p>First, they just create a data set (boring!) then they use the %LET statement and the %EVAL function. The %EVAL function is pretty essential in SAS macro language. It evaluates the value of a variable as a number instead of text. By default, SAS is going to treat the value of any macro variable as text. So, for example, 1 +2\u00a0\u00a0 is not evaluated as 3.\u00a0 %EVAL only works with integers but since the number of records is always going to be an integer, you&#8217;re good here.<br \/>\ndata case_temp;<br \/>\nset case;<br \/>\n*** select one patient at a timee time from dataset ctrl, and search for matched-pair in\u00a0 case group ***;<br \/>\n%let j=%eval(&amp;tot);<\/p>\n<p>Next, they use the value of j, which is now the total number of subjects in the data set, in a DO statement, so that they are going to do these statements for every person in the control group. A &#8220;caliper&#8221; is set up for each person so that is the logit for that subject plus or minus .25* the standard deviation of the logit. All the cases that fall within that range are collected in a data set named match_temp<br \/>\n%do i=1 %to &amp;j;<br \/>\ndata ctrl_temp;<br \/>\nset ctrl;<br \/>\nif _n_=&amp;i;<br \/>\nlow=logit &#8211; 0.25*sd;<br \/>\nup =logit + 0.25*sd;<br \/>\nrun;<br \/>\ndata match_temp;<br \/>\nif _n_=1 then set ctrl_temp(keep=low up);<br \/>\nset case_temp(drop=rannum);<br \/>\nif low&lt;=logit&lt;=up;<br \/>\nrun;<\/p>\n<p><strong>*** Nifty thing seven &#8211; an illustration of a use of PROC SQL.<\/strong><\/p>\n<p>*************************************************************************************;<br \/>\n*** calculate number of patients who were included in the match dataset\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ***\u00a0 ;<\/p>\n<p>*** 1) if there is no match found, then select next patient from case group\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ***<br \/>\n*** 2) if there is one matched, select this patient and go to next round\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ***<br \/>\n*** 3) if there are more than 1 patients, calculate Mahalanobis distance and\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ***<br \/>\n*** select the patient with smallest distance, then go to next round ***<br \/>\n*************************************************************************************;<\/p>\n<p>Here they use PROC SQL for another way of finding the number of records in a file (I told you there were multiple ways). They create a new variable that is the count of IDs, which is going to be\u00a0 the number of people in the data set since each one has a unique ID.<br \/>\nproc sql;<br \/>\ncreate table casen<br \/>\nas select count(&amp;id) as n<br \/>\nfrom match_temp;<br \/>\nquit;<br \/>\ndata _null_;<br \/>\nset casen;<br \/>\nif _n_=1 then call symput(&#8216;casen&#8217;,put(n,8.));<br \/>\nrun;<\/p>\n<p>This is followed by some more %IF , %THEN and %DO loops that, if no match is found, goes on to the next case, and if only one matches is found, keeps that in the matched data set we are building.<\/p>\n<p>If more than one match is found &#8212; well, that&#8217;s where it gets really interesting.<\/p>\n<p>&nbsp;<\/p>\n<p>&#8212;&#8211;\u00a0 Tune in next time for the rest of the story, as Paul Harvey used to say.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was using a macro this week written by Feng, Yu &amp; Xu and there were so many less-used and advanced techniques in it I thought it would be a great example to use for teaching SAS. ** ** Nifty thing one computing the caliper width\u00a0 ***** ; Like with other propensity score macros, it&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9,11,8],"tags":[],"class_list":["post-2102","post","type-post","status-publish","format-standard","hentry","category-software","category-statistics","category-technology"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/2102","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=2102"}],"version-history":[{"count":5,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/2102\/revisions"}],"predecessor-version":[{"id":2130,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/2102\/revisions\/2130"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=2102"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=2102"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/tags?post=2102"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}