Learning Advanced SAS from a Macro: Part 4

Continuing to learn Advanced SAS from the Propensity Score Matching with Calipers macro from Feng, Yu & Xu, we pick up with the data set we built in the earlier steps. To recap: we ran a principal components analysis on the cases (experimental) group, used the coefficients from that analysis to score every record in our control group, concatenated the cases and controls, and performed a cluster analysis on the whole group that adds a variable, “distance”, which is the distance from the seed of the one cluster. The data set we ended up with after all this statistical slam-dunking was named mahalanobis_to_point.

I was thinking about the FASTCLUS and I realized something. The PROC APPEND appends all of the records that fall within the calipers to the reference point data set. In FASTCLUS, the first complete record is going to be the seed. So that first record, which is the seed, is from one group and all of the other records are from your matching group. (Yes, Feng et al. said exactly that in their paper but I needed to see how it worked for myself.)

So, what now? First, we’re going to use SQL to select all the records from mahalanobis_to_point that are not in the &refdata data set. That is, we are going to pull out the case we’re trying to match so we don’t end up with the silly situation where something is matched with itself.

I’m not sure if I can call PROC SQL a nifty thing.

I guess it is a good thing in the sense that homework and canned spinach are a good thing. Anyway, here we throw in some practice with PROC SQL.

proc sql;
  create table mahalanobis_to_point as
  select a.*
  from mahalanobis_to_point a, &refdata b
  where a.&id^=b.&id;
quit;
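
For anyone who wants to see the exclusion step in isolation, here is a minimal sketch on toy data. The data set names demo_ref, demo_points and demo_candidates, the id variable patid, and all of the values are made up for illustration; none of them come from the Feng et al. macro. It also swaps in a NOT IN subquery for the macro's two-table join. The join form relies on &refdata holding exactly one record, while the subquery should behave the same way if it held several.

```sas
/* Toy stand-ins for the macro's inputs (hypothetical names and values) */
%let id = patid;

data demo_ref;            /* plays the role of &refdata: the one record being matched */
  input patid distance;
  datalines;
101 0
;
run;

data demo_points;         /* plays the role of mahalanobis_to_point */
  input patid distance;
  datalines;
101 0
205 1.7
318 0.4
422 2.9
;
run;

proc sql;
  /* Keep every record whose id does not appear in the reference data set */
  create table demo_candidates as
  select a.*
  from demo_points as a
  where a.&id not in (select &id from demo_ref);
quit;

proc print data=demo_candidates;
run;
```

Writing the result to a new table, rather than back over the table being read, also sidesteps the warning SAS issues when a CREATE TABLE statement references its own target.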

Next, we sort all of the records by distance, read them back in with SET and a BY distance statement so they arrive in sorted order, and keep only the first record, because that is the one with the smallest distance from the seed, i.e., from the case we’re trying to match.
proc sort data=mahalanobis_to_point;
  by distance;
run;

data mahalanobis_to_point;
  set mahalanobis_to_point;
  by distance;
  if _n_=1;
run;
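
The sort-and-subset above could also be collapsed into a single PROC SQL step. This is only a sketch on toy data, not what the macro does: demo_candidates, demo_nearest, patid and the distance values are hypothetical. It leans on the fact that SAS remerges a summary statistic like MIN() back onto the detail rows, so a HAVING clause can compare each record to the overall minimum.

```sas
/* Hypothetical candidates, already stripped of the reference record */
data demo_candidates;
  input patid distance;
  datalines;
205 1.7
318 0.4
422 2.9
;
run;

proc sql;
  /* MIN(distance) is remerged onto every row; HAVING keeps the closest record(s) */
  create table demo_nearest as
  select *
  from demo_candidates
  having distance = min(distance);
quit;
```

One difference worth knowing: if two candidates tie for the smallest distance, this version keeps both, while the _n_=1 version keeps only whichever one sorted first.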

We add those two records into a data set named case_ctrl and give them both the value of &i (the loop counter) for their newid variable, which we can now use to link the case and its control group match. We don’t want to write over their old id variable because that is usually needed. The need to apply a newid is why we can’t (as you might be tempted at first glance) combine these into the case_ctrl_together data set in one step.

In the next step we add these two records, the case and its matched control, to the case_ctrl_together data set we are building. Then we pull the record for the matched case out of the temporary data set of cases. We are going to keep going around the loop until all of the cases have been pulled out, either because they were matched here, they were matched previously (because there was only one match) or because there is no control record that matches within the defined caliper.
data case_ctrl;
  set &refdata(keep=&id) mahalanobis_to_point(keep=&id);
  newid=&i;
run;

data case_ctrl_together;
  set case_ctrl_together case_ctrl;
  if &id=. then delete;
run;

proc sql;
  create table case_temp as
  select a.*
  from case_temp a, mahalanobis_to_point b
  where a.&id^=b.&id;
quit;
*** Exclude patients in case group which are selected in the Mahalanobis macro ***;

%mend Mahalanobis;
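
The pairing step is easier to follow on a two-record toy example. In this sketch the data sets demo_ref, demo_match, demo_pair and demo_pairs_all, the variable patid, and the fixed value of &i are all hypothetical. It also uses PROC APPEND to grow the running collection, which is a swapped-in technique; the macro itself re-reads case_ctrl_together with a SET statement.

```sas
/* Hypothetical toy inputs; in the macro these come from &refdata and mahalanobis_to_point */
%let id = patid;
%let i  = 1;              /* loop counter, fixed here for illustration */

data demo_ref;            /* the record being matched */
  input patid;
  datalines;
101
;
run;

data demo_match;          /* its nearest neighbor within the caliper */
  input patid;
  datalines;
318
;
run;

/* Stack the two records and tag both with the same newid */
data demo_pair;
  set demo_ref(keep=&id) demo_match(keep=&id);
  newid = &i;
run;

/* Add the pair to the running collection; PROC APPEND creates the base table if it does not exist yet */
proc append base=demo_pairs_all data=demo_pair;
run;
```

After this runs, demo_pairs_all holds patid 101 and 318, both with newid=1, while each record keeps its original id.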

You may be excused for having forgotten by now that this macro was executed inside of a loop inside of another loop inside of a macro, but it was. Having cranked through the macro as many times as necessary we hit a %END statement, and go searching for another case-control match. When we have matched all of our cases, we’ll hit our second %END statement for the last time and exit the outer loop.

%end;
%end;

***Create test dataset that contains patients from case_control_together, and all variables and propensity scores as well***;

This final PROC SQL is going to select out only the matched records into a data set (table) named test from the original data set we had created at the very beginning as output from the PROC LOGISTIC with our propensity scores (remember propensity scores? That’s what this macro was about, matching by propensity scores.)

It was a lot of work to get this and I wanted it saved as a permanent data set just in case there was some problem later in the program and I accidentally did something stupid like save over this data set after it took 40 minutes for the program to run and create it. So, I added a DATA step to save it off somewhere safely away in a permanent SAS data set.
proc sql;
  create table test as
  select a.*, b.newid
  from propen a, case_ctrl_together b
  where a.&id=b.&id
  order by newid, &yvar;
quit;

data study.calipers;
  set test;
run;

%mend match;

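
Before trusting a data set that took 40 minutes to build, a quick sanity check may be worth the extra few lines. This is a sketch of one possible check, not part of the macro, and it assumes that every newid should appear on exactly two records (one case, one control). The toy data set demo_test and its values are hypothetical.

```sas
/* Hypothetical matched output: newid 2 is deliberately missing its partner */
data demo_test;
  input patid newid;
  datalines;
101 1
318 1
102 2
;
run;

proc sql;
  /* List any newid that does not have exactly two records */
  select newid, count(*) as n_records
  from demo_test
  group by newid
  having count(*) ne 2;
quit;
```

On the toy data this lists newid 2 with a single record. On real output, an empty result would suggest every match is a complete pair.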
Was it worth it?

So, there you have it, how to do propensity score matching with calipers in 5,000 words or more over four posts. Even given all of that, I am not so sure it is worth it, other than the educational value of going through the macro, and there IS considerable educational value. However, using quintiles is far simpler and often works to reduce the bias a great deal. The GREEDY match macro is also far simpler (though not as simple as quintiles) and has worked really well with the data I’ve been using. I’m not convinced that the improvement in precision is worth the tradeoff in complexity. Yeah, it’s a cool macro and all, but you need someone to understand it and maintain it. When disseminating the results of your research you’re going to need to explain it to people like a hospital board or a city council. Increasing the probability of difference between case and control from .76 to .91 and reducing the r-square from .002 to .00018? Well, I’m just not sure it is a non-trivial improvement in a lot of cases.

That, however, is a philosophical discussion for another day.

One Comment

Gabe says:

March 8, 2012 at 3:33 pm

I’m pretty sure you can avoid that first PROC SORT call by adding an ORDER BY statement to the PROC SQL call directly before it. Like this:
proc sql;
create table mahalanobis_to_point
as select a.*
from mahalanobis_to_point a, &refdata b
where a.&id^=b.&id
order by distance;
quit;

I think you can also get PROC SQL to select the record with the shortest distance for you, but that gets a little more complicated.
