{"id":2104,"date":"2012-02-13T05:41:04","date_gmt":"2012-02-13T10:41:04","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=2104"},"modified":"2012-02-13T05:46:36","modified_gmt":"2012-02-13T10:46:36","slug":"spss-propensity-scores-part-2","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/spss-propensity-scores-part-2\/","title":{"rendered":"SPSS Propensity Scores &#8211; Part 2"},"content":{"rendered":"<p><a href=\"http:\/\/www.thejuliagroup.com\/blog\/?p=216\">I wrote Part 1 a couple of years ago<\/a>, so I guess I&#8217;m due for a part 2. In this case, I started with a data set in SAS but because it was going to be used by a group who had some SAS users and some SPSS users, they wanted to have the code for both SPSS and SAS.<\/p>\n<p><a href=\"http:\/\/www.spsstools.net\/Syntax\/RandomSampling\/MatchCasesOnBasisOfPropensityScores.txt\">Levesque wrote a lovely macro years ago to do propensity score matching<\/a> and a few years later <a href=\"http:\/\/www.unc.edu\/~painter\/SPSSsyntax\/propen.txt\">John Painter added to it a bit<\/a>. Both of  them did great work for which many people should be extremely grateful (I am!) .<\/p>\n<p>However, I think for many people who use SPSS primarily by pointing and clicking, they still may have a bit of trouble with the pre-processing that this macro assumes you will do. They also may not be so sure about how to code things for a Mac or how to do the post-processing after the macro. So, as a public service, here you go, Part 2.<\/p>\n<p>Start with defining the path where your interim files and the final matched file will be stored. 
If my path looks funny to you it is because you are probably using Windows and I did this on a Mac.<\/p>\n<p>\/* Change file path here and only here *\/<\/p>\n<p>DEFINE !pathd() '\/Volumes\/Mystuff\/SaveHere\/' !ENDDEFINE.<\/p>\n<p>\/* This is the data set with all of my original data *\/<\/p>\n<p>DEFINE !readin() '\/Volumes\/Otherplace\/HereIs\/inputfile.sav' !ENDDEFINE.<\/p>\n<p>IT IS ASSUMED that your dependent variable is named treatm and coded 0 for the control (larger) group and 1 for the treatment (smaller) group. If that is NOT the case you will need to execute a GET FILE statement to read in your data and then do whatever you need to make it coded 0 or 1. In my example I have a variable, Alive, coded in the OPPOSITE direction of what I need, that is, most people are alive (1) and a few people are dead (0). It is easy enough here to execute a COMPUTE command and make treatm = 1 - Alive, so the few people who are dead become the treatment group (1) and everyone else becomes the control group (0).<\/p>\n<p>*  Get file and make sure  .<br \/>\n*  Dependent is named treatm and coded 0 or 1.<\/p>\n<p>GET<br \/>\n  FILE= !readin .<br \/>\nCOMPUTE treatm=1 - Alive .<br \/>\nEXECUTE.<\/p>\n<p>******************** .<br \/>\n* Perform logistic regression to compute propensity score .<\/p>\n<p>******************* .<\/p>\n<p>I could have made the variable list and the dependent variable macro variables also, but since they are only used this one time, that seemed kind of silly.<\/p>\n<p>Note the RENAME VARIABLES &#8211; that is going to rename the propensity score to propen. That name is used throughout the macro, so don&#8217;t change it. Also, the output file from the logistic regression is going to be test.sav in whatever directory you specified above. That is also used throughout the macro, so don&#8217;t change that either. 
In fact, don&#8217;t change anything here other than the dependent variable name and the list of independent variables after ENTER .<\/p>\n<p>LOGISTIC REGRESSION VARIABLES depend<br \/>\n  \/METHOD=ENTER V1 v2 othervar morevar1 morevar2<br \/>\n  \/SAVE=PRED<br \/>\n  \/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).<br \/>\nRENAME VARIABLES (PRE_1=propen) .<br \/>\nSAVE OUTFILE=!pathd + &#8220;test.sav&#8221; .<\/p>\n<p>EXECUTE.<\/p>\n<p>After this is the macro Levesque wrote. It works fine. Just copy and paste it into your syntax file. <a href=\"http:\/\/www.spsstools.net\/spss_programming.htm\">Yay, Levesque. He also has a really good book on data management and programming.<\/a> I HIGHLY recommend it, as well as just perusing his site. You&#8217;ll learn a ton about SPSS.<\/p>\n<p>********************* .<br \/>\n** End Preparation .<br \/>\n********************* .<br \/>\nGET FILE= !pathd + &#8220;test.sav&#8221;.<br \/>\nCOMPUTE x = RV.UNIFORM(1,1000000) .<br \/>\nSORT CASES BY treatm(D) propen x.<br \/>\nCOMPUTE idx=$CASENUM.<br \/>\nSAVE OUTFILE=!pathd + &#8220;mydata.sav&#8221;.<\/p>\n<p>* Erase the previous temporary result file, if any.<br \/>\nERASE FILE=!pathd + &#8220;results.sav&#8221;.<br \/>\nCOMPUTE key=1.<br \/>\nSELECT IF (1=0).<br \/>\n* Create an empty data file to receive results.<br \/>\nSAVE OUTFILE=!pathd + &#8220;results.sav&#8221;.<br \/>\nexec.<\/p>\n<p>********************************************.<br \/>\n* Define a macro which will do the job.<br \/>\n********************************************.<\/p>\n<p>SET MPRINT=no.<br \/>\n*\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/.<br \/>\nDEFINE !match (nbtreat=!TOKENS(1))<br \/>\n!DO !cnt=1 !TO !nbtreat<\/p>\n<p>GET FILE=!pathd + &#8220;mydata.sav&#8221;.<br \/>\nSELECT IF idx=!cnt OR treatm=0.<br \/>\n* Select one treatment case and all control .<br \/>\nDO IF $CASENUM=1.<br \/>\nCOMPUTE #target=propen.<br \/>\nELSE.<br \/>\nCOMPUTE delta=propen-#target.<br \/>\nEND 
IF.<br \/>\nEXECUTE.<br \/>\nSELECT IF ~MISSING(delta).<br \/>\nIF (delta<0) delta=-delta.\n\nSORT CASES BY delta.\nSELECT IF $CASENUM=1.\nCOMPUTE key=!cnt .\nSAVE OUTFILE=!pathd + \"used.sav\".\nADD FILES FILE=* \n\t\/FILE=!pathd + \"results.sav\".\nSAVE OUTFILE=!pathd + \"results.sav\".\n\n************************************************ Match back to original and drop case  from original .\nGET FILE= !pathd + \"mydata.sav\".\nSORT CASES BY idx .\nMATCH FILES \n \/FILE=*\n \/IN=mydata\n \/FILE=!pathd + \"used.sav\"\n \/IN=used\n \/BY idx .\nSELECT IF (used = 0).\nSAVE OUTFILE=!pathd + \"mydata.sav\"\n \/ DROP = used mydata key delta.\nEXECUTE.\n!DOEND\n!ENDDEFINE.\n*\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/.\n\nSET MPRINT=yes.\n\n**************************.\n* MACRO CALL (first insert the number of cases after nbtreat below) .\n**************************.\n\nSo much for the macro definition. Now you need to call it.\n\nReplace the ### here with the number in your treatment (smaller) group, the people that are coded treatm = 1 .\n\n!match nbtreat= ### .\n\nHere is more of Levesque's work. Just copy and paste it.\n\n\n* Sort results file to allow matching.\n\nGET FILE=!pathd + \"results.sav\".\nSORT CASES BY key.\nSAVE OUTFILE=!pathd + \"results.sav\".\n\n******************.\n* Match each treatment cases with the most similar non treatment case.\n* To include additional variables from original file list them on the RENAME subcommand below .\n******************.\n\nGET FILE=!pathd + \"mydata.sav\".\nMATCH FILES \/FILE=*\n \/FILE=!pathd + \"results.sav\"\n \/RENAME (idx = d0) (id=id2) (propen=propen2)\n  (treatm=treatm2) (key=idx)\n \/BY idx\n \/DROP= d0.\nFORMATS delta propen propen2  (F10.8).\nSAVE OUTFILE=!pathd + \"mydata and results.sav\".\nEXECUTE.\n\n* That's it!.\n\nHe says that's it, but there is a little more to it than that. 
The original macro assumed you had four variables, a propensity score, ID, treatm and improve, that is, a variable that shows improvement.\n\nWhat if you have a lot more than that and you would like to merge this back with your original data set and have the matched and treatment subjects selected out by ID number with everything else you may have recorded on them?\nIn the file the macro produces, it has N cases, where N is the number of people in the treatment group. Perfect if you want to do a dependent t-test sort of analysis, but that is not what I want. I want N*2 cases with each ID on a separate row.\n\nThere are probably more beautiful ways to do this, but here is one that works.\n\nGet the file that the macro produced with all of the data. If the case has a value for propen2, it was one of the cases selected. Output that to the results2 file and keep id2 (this is the match ID). Rename that variable to ID.\n\nGET \n  FILE= !pathd +'mydata and results.sav'.\nSELECT IF NOT (SYSMIS(propen2)).\n SAVE OUTFILE=!pathd + 'results2.sav' \/ KEEP = id2. \nEXECUTE.\nGET \n  FILE=!pathd + 'results2.sav'.\nRENAME VARIABLES (ID2 =ID) .\n SAVE OUTFILE=!pathd + 'results2.sav' \/ KEEP = id .\nEXECUTE.\n\nGet the file with the results and keep the id variable. That is the case ID.  Output that to the results1 file.\n\nGET \n  FILE= !pathd + 'mydata and results.sav'.\nSELECT IF NOT (SYSMIS(propen2)).\n SAVE OUTFILE=!pathd + 'results1.sav' \/ KEEP = id. \nEXECUTE.\n\nConcatenate the results1 and results2 files. This is your file of all of the ids, the treatment cases and their nearest match.\nADD FILES \n    \/FILE=  !pathd + 'results1.sav' \n  \/FILE=  !pathd + 'results2.sav' . \n SAVE OUTFILE=!pathd + 'resultsall.sav' .\nEXECUTE.\n\nThis is just a quality check. On the final results file, all of the records should have a variable inmatch with a value of 1. \nGET \n  FILE= !pathd + 'resultsall.sav'.\nCOMPUTE inmatch = 1 . \nSAVE OUTFILE=!pathd + 'resultsall.sav' . 
\nEXECUTE.\n \nYou absolutely have to sort the cases by ID and save the sorted file before you can merge them. This sorts the resultsall.sav file just created.\n\nSORT CASES BY ID(A). \nSAVE OUTFILE=  !pathd + 'resultsall.sav' \n \/COMPRESSED. \n\nThis sorts the original data file (remember your original data file?)\nGET \n  FILE= !readin . \nEXECUTE. \nSORT CASES BY ID(A). \nSAVE OUTFILE=  !readin\n \/COMPRESSED. \n\nNow, we will finally match the subjects from the treatment group and their matched controls back together and save them in a new file named matches.sav that is in that directory you specified at the very beginning.\n\nMATCH FILES \/TABLE= !pathd + 'resultsall.sav'  \n  \/FILE= !readin\n  \/BY ID. \nSAVE OUTFILE=  !pathd + 'matches.sav' \n \/COMPRESSED. \nEXECUTE.\n\nNow that you have your matched file, I strongly suggest you do some tests to see that your propensity score matching worked as hoped.\n\nWhich I did and nothing was within shouting distance of being significant, so I was happy. \n<a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2008\/09\/drd.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2008\/09\/drd.jpg\" alt=\"\" title=\"Happy Times with Statistics\" width=\"100\" height=\"138\" class=\"aligncenter size-full wp-image-58\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I wrote Part 1 a couple of years ago, so I guess I&#8217;m due for a part 2. 
In this case, I started with a data set in SAS but because it was going to be used by a group who had some SAS users and some SPSS users, they wanted to have the code&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9,11,8],"tags":[],"class_list":["post-2104","post","type-post","status-publish","format-standard","hentry","category-software","category-statistics","category-technology"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/2104","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=2104"}],"version-history":[{"count":4,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/2104\/revisions"}],"predecessor-version":[{"id":2111,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/2104\/revisions\/2111"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=2104"}],"wp:term":[{"taxonomy"
:"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=2104"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/tags?post=2104"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}