This is part 3 (hence the name) of my posts on how to learn advanced SAS from other people’s code. Yes, taking a class is a good idea – I have taken several – a couple from SAS Institute here and there, several before or after conferences like WUSS and SAS Global Forum.
For me, though, the two best ways to learn are:
1. Take something really complicated someone else has done, take it apart to really understand it and modify it to fit my needs.
2. Use the language to solve some problem that is bothering you (or someone paying you).
So, here we are, continuing with the propensity score matching with calipers macro from Feng, Yu & Xu. This is the really interesting part, in my opinion, and the crux of the whole matter: finding the best match.
%MACRO Mahalanobis(data, var, refdata);
*** This macro is to calculate Mahalanobis distance from each point to a reference
*** point. For example, reference point can be a patient from case group, other
*** points can be the patients from control group who meet the criteria within
*** 1/4 standard deviation of Logit of propensity score ***;
Nifty thing number twelve – a principal components analysis
So, here we are doing a principal components analysis with the data set passed in as data. OUT= is going to be a data set with our data plus the principal component scores. In this case there is only one principal component because there is only one variable, the propensity score. (You COULD have multiple variables: the propensity score and additional variables.) The OUTSTAT= data set is going to contain the statistics computed from that first data set, including the means, standard deviations, eigenvalues and scoring coefficients. Going by the comment at the top of the macro, that first data set holds the points to be compared against the reference point, for example the control patients who fall within the caliper.
Personally, I don’t like the NOPRINT option. I always want to print everything and look at it. In this example, since you only have one variable, you won’t be getting a lot of output, or necessarily learning a lot about what the PRINCOMP procedure does, but at the least you can open the data sets from your explorer window and see what is in them.
PROC PRINCOMP DATA=&data STD OUT=out OUTSTAT=outstat NOPRINT ;
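If you want to look at everything, as I do, here is a minimal sketch you could run outside the macro. The data set controls and the variable logit_ps are made up for illustration; they are not from the Feng, Yu & Xu code. It is the same PROC PRINCOMP call minus NOPRINT, followed by a PROC PRINT of each output data set.

```sas
/* Hypothetical toy data: five patients with a logit of the propensity score */
DATA controls;
  INPUT id logit_ps;
  DATALINES;
1 -0.42
2 -0.10
3 0.05
4 0.31
5 0.58
;
RUN;

/* Same options as the macro, but without NOPRINT so the results are printed */
PROC PRINCOMP DATA=controls STD OUT=out OUTSTAT=outstat;
  VAR logit_ps;
RUN;

/* OUT= holds the original variables plus the component score Prin1 */
PROC PRINT DATA=out; RUN;

/* OUTSTAT= holds one row per statistic, labeled by the _TYPE_ variable */
PROC PRINT DATA=outstat; RUN;
```

In the OUTSTAT= data set, look at the _TYPE_ column. The MEAN, STD and SCORE rows are the ones PROC SCORE relies on in the next step.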
Nifty thing thirteen – PROC SCORE
PROC SCORE DATA = &refdata SCORE = outstat OUT = reference_point ;
“The SCORE procedure multiplies values from two SAS data sets, one containing coefficients (for example, factor-scoring coefficients or regression coefficients) and the other containing raw data to be scored using the coefficients from the first data set. The result of this multiplication is a SAS data set containing linear combinations of the coefficients and the raw data values.”
In this case, it is going to take the coefficients output from the PRINCOMP procedure and apply them to the second data set, the one passed in as refdata. In the simpler example I am using here, there is only one variable in the principal components step and thus only one principal component. The scored data set, now named reference_point, is going to have a new variable added, PRIN1, which is the principal component score for the reference point (per the macro's header comment, for example a patient from the case group), calculated using the means, standard deviations and coefficients from the first data set.
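To see what PROC SCORE hands back, here is a small sketch with two variables instead of one. The data sets controls and one_case and the variables logit_ps and age are hypothetical, invented for this illustration. The PRINCOMP step is estimated on one data set, and the SCORE step applies its means, standard deviations and coefficients to a single new observation.

```sas
/* Hypothetical data used to estimate the principal components */
DATA controls;
  INPUT id logit_ps age;
  DATALINES;
1 -0.42 61
2 -0.10 55
3 0.05 70
4 0.31 48
5 0.58 66
;
RUN;

/* Hypothetical single observation to be scored (the reference point) */
DATA one_case;
  INPUT id logit_ps age;
  DATALINES;
100 0.12 59
;
RUN;

/* Estimate standardized components and save the statistics */
PROC PRINCOMP DATA=controls STD OUT=out OUTSTAT=outstat NOPRINT;
  VAR logit_ps age;
RUN;

/* Apply the saved statistics and coefficients to the new observation */
PROC SCORE DATA=one_case SCORE=outstat OUT=reference_point;
  VAR logit_ps age;
RUN;

PROC PRINT DATA=reference_point; RUN;
```

The printed reference_point data set should contain the original variables plus Prin1 and Prin2 for that one observation.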
I said it’s a simpler example because really, if you’ve been following this series from the start, it’s become abundantly clear to you that this is not a simple example at all. But that’s the whole point: we’re now moving into more advanced programming territory, either because the techniques take a bit of statistics, like principal components, which I’ll probably discuss next week, or because they use less common procedures or macros.
Nifty thing fourteen – PROC APPEND
PROC APPEND DATA =out BASE=reference_point;
Compared to a lot of this other stuff, PROC APPEND is easy. I just threw it in here as a nifty thing because not everyone knows about it. You could create a new data set using a DATA step and a SET statement. Or, you can use PROC APPEND and add the observations from the DATA= data set to the end of the BASE= data set. As David Carr so helpfully explains in this SAS Global Forum paper, this can save you processing time and, as you will learn when you run this macro, reducing processing time is a desirable feature.
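Here is a minimal sketch of the two approaches side by side. The data sets base_ds and new_rows are hypothetical and exist only for this illustration.

```sas
/* Hypothetical base data set with a single row */
DATA base_ds;
  INPUT id prin1;
  DATALINES;
100 0.20
;
RUN;

/* Hypothetical rows to be added */
DATA new_rows;
  INPUT id prin1;
  DATALINES;
1 -1.10
2 0.35
3 0.90
;
RUN;

/* Option 1: the DATA step reads both data sets and writes a new one */
DATA combined;
  SET base_ds new_rows;
RUN;

/* Option 2: PROC APPEND adds the rows of new_rows to the end of base_ds */
PROC APPEND BASE=base_ds DATA=new_rows;
RUN;

PROC PRINT DATA=base_ds; RUN;
```

Either way the row that was already in the base data set stays first, which matters for the FASTCLUS step in this macro.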
Nifty things fifteen and sixteen – PROC FASTCLUS & VAR PRIN:
PROC FASTCLUS DATA=reference_point MAXC=1 REPLACE=none MAXITER=0 NOPRINT OUT=mahalanobis_to_point(DROP = CLUSTER);
VAR PRIN: ;
Note they used the PRIN: syntax. Because there is no way for the programmers who wrote this to know how many variables you will be using, and hence how many principal components you will have, they just used PRIN: , which selects all of the variables whose names begin with PRIN, from PRIN1 and PRIN2 up to however many you have.
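The colon trick works in any variable list, not just in FASTCLUS. A quick sketch, using a made-up data set named scores and PROC MEANS purely as a convenient way to show which variables get picked up:

```sas
/* Hypothetical data: three variables start with PRIN, two do not */
DATA scores;
  INPUT id prin1 prin2 prin3 other;
  DATALINES;
1 0.1 0.2 0.3 9
2 0.4 0.5 0.6 9
;
RUN;

/* prin: expands to prin1 prin2 prin3; id and other are not selected */
PROC MEANS DATA=scores N MEAN;
  VAR prin: ;
RUN;
```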
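PROC FASTCLUS does a cluster analysis on the reference_point data set, which after the PROC APPEND step holds the reference point followed by all of the points from the first data set. The variables used in the cluster analysis are the principal component scores generated in the steps above. Remember, every one of those scores was computed with the same coefficients, the ones PROC PRINCOMP estimated from the first data set. (Think about this until it makes sense – we’re trying to generate the closest match to a case in the treatment group.)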
Again, they used the NOPRINT option, which is not my preference because I like to look at everything. They are creating one cluster and the new data set mahalanobis_to_point . That data set will have one additional variable, distance, which is the distance from the seed of the one cluster.
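Now, FASTCLUS starts with the first complete observation as the seed. There is no iteration, so you are going to end up with the distance from that first observation. Because PROC APPEND added the rows from out after the row already in reference_point, that first observation is the reference point. And because the STD option on PROC PRINCOMP scales every component score to unit variance, the Euclidean distance FASTCLUS computes on those scores is the Mahalanobis distance the macro is named for.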
As I look at this, I wonder if you would get better results if you did let it iterate. You certainly would get different results (I know because I tried it with a sample data set).
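Here is a small sketch for checking both points on toy data. Everything in it is hypothetical: the data set pts, the ids and the values are invented, and the first row plays the role of the reference point. The first FASTCLUS call uses the macro's options, the DATA step recomputes the Euclidean distance from the first observation by hand so you can compare the two columns, and the last call lets FASTCLUS iterate. My understanding is that once it iterates, the seed is replaced by the cluster mean, so DISTANCE is measured from the centroid of all the points rather than from the reference point.

```sas
/* Hypothetical scores; the first row stands in for the reference point */
DATA pts;
  INPUT id prin1 prin2;
  DATALINES;
100 0.0 0.0
1 3.0 4.0
2 -1.0 0.0
3 0.5 0.5
;
RUN;

/* The macro's options: one cluster, seed never replaced, no iteration */
PROC FASTCLUS DATA=pts MAXC=1 REPLACE=NONE MAXITER=0 NOPRINT
              OUT=dist_no_iter(DROP=cluster);
  VAR prin: ;
RUN;

/* Hand check: Euclidean distance from the first observation */
DATA dist_check;
  SET dist_no_iter;
  RETAIN ref1 ref2;
  IF _N_ = 1 THEN DO;
    ref1 = prin1;
    ref2 = prin2;
  END;
  dist_by_hand = SQRT((prin1 - ref1)**2 + (prin2 - ref2)**2);
  DROP ref1 ref2;
RUN;

PROC PRINT DATA=dist_check; RUN;

/* Variation: allow up to 10 iterations and compare the DISTANCE column */
PROC FASTCLUS DATA=pts MAXC=1 REPLACE=NONE MAXITER=10 NOPRINT
              OUT=dist_iter(DROP=cluster);
  VAR prin: ;
RUN;

PROC PRINT DATA=dist_iter; RUN;
```

If DISTANCE and dist_by_hand agree in the first printout, the no-iteration call is doing what the macro needs. If the second printout differs, that would be the seed moving away from the reference point.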
Tomorrow-ish (defined as tomorrow or whatever day I get around to it), I’ll finish off with the propensity score with calipers macro.
Later, I’ll try some different things with FASTCLUS, when I have a bit more time. One of the benefits of working through someone else’s code is it makes you think. At first you’re thinking about what they did and trying to understand the decisions they made. Then, you’re thinking about how you might do it differently and testing out your ideas.