
You learn one programming language, you’ve learned them all (sort of): SPSS Quintiles Example

Recently, I had the need to write the exact same programs twice, once using SAS and once using SPSS syntax. Even though these aren’t the same language, having done it once made it much easier to do it the second time. Let’s start with quintile matching. I’ve been rambling on about propensity scores lately and quintiles are one way of matching propensity scores. I could go on at length about the different methods and probably will some day, but this is not that day.

No, what I’m going to ramble on about is how much transfer occurs between one language and another.

The first thing that transfers is the structure of the solution.

Listen up and do what I say, not what I do. It is one of my many failings that I don’t sit down and draft out a solution right away. Often, I’ll start coding and not actually draw out the program until I have spun my wheels for a while. Flow charts are under-rated. And no, you don’t need to bother with all that crap they teach you in school, where a diamond is a decision point and a trapezoid is a process and this other thing is input data. No. Just break it into logical, sequential steps.

==================================================
Run Logistic Regression
==================================================
                        |
==================================================
Compute Quintiles of Propensity Score
==================================================
                        |
==================================================
Add Random Number & Quintiles to original data set
==================================================
                        |
==================================================
Randomly select out control cases in proportion
to the quintile distribution of treatment cases
==================================================
                        |
==================================================
Create a data set of the treatment cases, plus
the control cases you pulled out in the previous step
==================================================

Ta-da, ta-da. Done!

Whether I am going to use SAS, SPSS or something else, I want to follow the same basic steps outlined above. Knowing what you need to do is half the battle. Okay, well, maybe not half, but it pretty much guarantees that in the battle between you and your computer, you’re going to win.

 

The second thing that transfers is the sense to re-use as much code as possible.

I was heartbroken that I could not find a simple quintile matching program with SPSS somewhere and had to actually write some statements myself. Still, as much as possible, I tried to re-use code I had already written. The first box above was already done for a different SPSS program.

Start with defining the path where your interim files and the final matched file will be stored. I changed from the program I had written previously using SPSS 20 on the Mac because the Mac version *** BLOWS *** . It crashed so often as to be worthless, so I switched to my Windows 7 machine.

/* Change file path here and only here. Keep the trailing backslash so that !pathd + 'test.sav' is a valid path */

DEFINE !pathd() 'c:\Users\AnnMaria\Documents\Here\' !ENDDEFINE.

/* This is the data set with all of my original data */

DEFINE !readin() 'c:\Users\AnnMaria\Documents\Here\ItIs.sav' !ENDDEFINE.

The third thing that transfers is your knowledge of concepts relevant to solving the problem.

For example, even if you’ve never written a line of SPSS syntax in your life, if you use SAS you recognize the usefulness of something like a LIBNAME statement, which defines the directory for your files so that you don’t have to do it over and over. Also, you can change that directory in one place, and the change carries through the whole program. That is exactly what I am doing in that DEFINE !pathd() above.

Similarly, if you are familiar with macro variables, you can understand that both !pathd and !readin are acting as macro variables, to be reused throughout the program.
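In keeping with the theme, here is the same define-it-once idea as a minimal sketch in a third language, Python. The names PATHD and data_file are hypothetical stand-ins for the !pathd macro, not part of the SPSS program.

```python
from pathlib import PureWindowsPath

# Hypothetical stand-in for !pathd: the one and only place the folder is named.
PATHD = PureWindowsPath(r"c:\Users\AnnMaria\Documents\Here")


def data_file(name):
    """Build a full file path from the single base folder."""
    return PATHD / name


print(data_file("test.sav"))
print(data_file("quintilematch.sav"))
```

Change PATHD and every file the program touches moves with it, which is the whole point of the LIBNAME statement and the DEFINE alike.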

IT IS ASSUMED that the dependent variable is named treatm and coded 0 for the control (larger) group and 1 for the treatment (smaller) group. I also need to perform a logistic regression to compute the propensity score. I covered how to do that in a previous blog. You should have been paying attention.

Next is computing quintiles. Here is how you happen to do it in SPSS. Again, even if you’ve never used SPSS before, if you have some experience with statistical programming, you know what a quintile is (it’s dividing the group into five equal parts – the lowest 20%, 20-40% etc.).

* Find the quintiles .
GET
FILE=!pathd + 'test.sav'.
FREQUENCIES VARIABLES=PRE_1
/FORMAT=NOTABLE
/NTILES=5
/ORDER=ANALYSIS.
EXECUTE.

****** SELECT AND RUN EVERYTHING FROM THE  FIRST LINE TO HERE TO CALCULATE QUINTILES .

This will give you output of percentiles that looks like this:

Percentiles

20    .9345586
40    .9565614
60    .9690461
80    .9789572

***** STEP 2: SELECT AND RUN EVERYTHING BETWEEN THESE LINES, from here .

***** USING THE NUMBERS FROM THE PREVIOUS STEP, code the quintiles .

*Code the quintiles in the data set.

GET FILE= !pathd + "test.sav".
RECODE PRE_1 (0 thru .9345586=1) (.9345586 thru .9565614=2) (.9565614 thru .9690461=3) (.9690461 thru .9789572=4) (.9789572 thru 1=5) INTO Quintile.
COMPUTE x = RV.UNIFORM(1,1000000) .
VARIABLE LABELS Quintile 'Quintile'.
SAVE OUTFILE=!pathd + "test.sav" .
EXECUTE.
COMPUTE filter_$=(treatm = 1).
VARIABLE LABELS filter_$ 'treatm = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

FREQUENCIES VARIABLES= Quintile
/ORDER=ANALYSIS.
EXECUTE.

****** STEP 2: SELECT AND RUN EVERYTHING BETWEEN THESE LINES, to here .
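For anyone who thinks more easily in another language, the quintile coding step can be sketched in Python. This is an illustration only, not part of the SPSS program. The function names are hypothetical, and statistics.quantiles may interpolate slightly differently from SPSS, so don't expect its cutpoints to match the FREQUENCIES output to the last decimal.

```python
from bisect import bisect_left
from statistics import quantiles


def quintile_cutpoints(scores):
    """Return the four cutpoints (20th, 40th, 60th, 80th percentiles)."""
    return quantiles(scores, n=5)


def code_quintile(score, cutpoints):
    """Return 1 to 5. A score equal to a cutpoint goes in the lower quintile,
    the same way RECODE applies the first range that matches."""
    return bisect_left(cutpoints, score) + 1


# Cutpoints copied from the percentile output above.
cutpoints = [0.9345586, 0.9565614, 0.9690461, 0.9789572]

for score in (0.50, 0.9345586, 0.96, 0.99):
    print(score, code_quintile(score, cutpoints))
```

The four scores come out as quintiles 1, 1, 3 and 5.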

The fourth thing that transfers is your knowledge of how computers work.

You COULD take the macro Levesque wrote that I discussed earlier that does nearest neighbor matching and run it with quintile substituted for propensity score. That would work. It would also take probably 50 times as long to run, somewhere on the order of an hour as opposed to a minute. Here is why. You DON’T need to do a 1 to 1 match.

You have, say, 315 people in the first quintile in your treatment group. You want to pick 315 people in the first quintile in your control group. Coincidentally (okay, it wasn’t a coincidence), there is a random number assigned to each record. In the next step, I output all the control subjects from Quintile 1 to a data set, sort them by that random number and then take the first 315. I do the same for the next four quintiles.

Sorting a data set and selecting out the first N people is FAR quicker than taking every one of, say, 900 people from the treatment group and comparing them to each one of 20,000 people from the control group.
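To put rough numbers on that, here is a back-of-the-envelope count in Python. It assumes a sort costs about n * log2(n) comparisons and it only counts comparisons, so treat it as an order-of-magnitude sketch and not a timing of either program.

```python
import math

treatment_n = 900     # treatment cases, from the example above
control_n = 20_000    # control cases, from the example above

# Comparing every treatment case with every control case.
pairwise_comparisons = treatment_n * control_n

# Rough cost of sorting all the controls once (assumed n * log2 n).
sort_comparisons = control_n * math.log2(control_n)

print(pairwise_comparisons)       # 18000000
print(round(sort_comparisons))    # roughly 286,000
```

Sorting each quintile separately, as the program does, works on smaller piles still.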
* Create quintile 1 data set.

GET FILE= !pathd + "test.sav".
SELECT IF (treatm = 0 & Quintile = 1).
* Sort by the random number BEFORE saving, so the saved file is in random order.
SORT CASES BY x(A).
SAVE OUTFILE=!pathd + "quintile1.sav" .
EXECUTE.

GET FILE= !pathd + "quintile1.sav".
SELECT IF ($CASENUM LE 315).
SAVE OUTFILE=!pathd + "quintile1.sav" .
EXECUTE.

* Create quintile 2 data set.

GET FILE= !pathd + "test.sav".
SELECT IF (treatm = 0 & Quintile = 2).
* Sort by the random number BEFORE saving, so the saved file is in random order.
SORT CASES BY x(A).
SAVE OUTFILE=!pathd + "quintile2.sav" .
EXECUTE.
GET FILE= !pathd + "quintile2.sav".
SELECT IF ($CASENUM LE 150).
SAVE OUTFILE=!pathd + "quintile2.sav" .
EXECUTE.

* Create quintile 3 data set.
GET FILE= !pathd + "test.sav".
SELECT IF (treatm = 0 & Quintile = 3).
* Sort by the random number BEFORE saving, so the saved file is in random order.
SORT CASES BY x(A).
SAVE OUTFILE=!pathd + "quintile3.sav" .
EXECUTE.
GET FILE= !pathd + "quintile3.sav".
SELECT IF ($CASENUM LE 113).
SAVE OUTFILE=!pathd + "quintile3.sav" .
EXECUTE.

* Create quintile 4 data set.

GET FILE= !pathd + "test.sav".
SELECT IF (treatm = 0 & Quintile = 4).
* Sort by the random number BEFORE saving, so the saved file is in random order.
SORT CASES BY x(A).
SAVE OUTFILE=!pathd + "quintile4.sav" .
EXECUTE.
GET FILE= !pathd + "quintile4.sav".
SELECT IF ($CASENUM LE 83).
SAVE OUTFILE=!pathd + "quintile4.sav" .
EXECUTE.

* Create quintile 5 data set.

GET FILE= !pathd + "test.sav".
SELECT IF (treatm = 0 & Quintile = 5).
* Sort by the random number BEFORE saving, so the saved file is in random order.
SORT CASES BY x(A).
SAVE OUTFILE=!pathd + "quintile5.sav" .
EXECUTE.
GET FILE= !pathd + "quintile5.sav".
SELECT IF ($CASENUM LE 47).
SAVE OUTFILE=!pathd + "quintile5.sav" .
EXECUTE.

********  Data set with  cases .

GET FILE= !pathd + "test.sav".
SELECT IF (treatm = 1).
SAVE OUTFILE=!pathd + "cases.sav" .
EXECUTE.
ADD FILES /FILE= !pathd + "cases.sav"
/FILE= !pathd + "quintile1.sav"
/FILE= !pathd + "quintile2.sav"
/FILE= !pathd + "quintile3.sav"
/FILE= !pathd + "quintile4.sav"
/FILE= !pathd + "quintile5.sav" .
EXECUTE.
SAVE OUTFILE=!pathd + "quintilematch.sav" .
EXECUTE.

The result will be a data set quintilematch.sav with treatment cases matched by quintile with controls.
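The matching steps above can be sketched compactly in Python, as one more illustration of how the structure carries over between languages. This is a sketch under assumptions, not a translation of the SPSS program: the record layout (dicts with keys treatm and quintile) and the function name quintile_match are hypothetical, and the toy data are made up purely to show the counts.

```python
import random
from collections import Counter


def quintile_match(records, seed=None):
    """Keep every treatment case plus, for each quintile, an equal number
    of randomly chosen control cases."""
    rng = random.Random(seed)
    treated = [r for r in records if r["treatm"] == 1]
    needed = Counter(r["quintile"] for r in treated)

    matched = list(treated)
    for q in sorted(needed):
        controls = [r for r in records if r["treatm"] == 0 and r["quintile"] == q]
        # Give each control a random number, sort by it, keep the first N.
        controls.sort(key=lambda r: rng.random())
        matched.extend(controls[: needed[q]])
    return matched


# Toy data: 3 treatment cases in quintile 1, 2 in quintile 2,
# and 10 control cases in each of the five quintiles.
records = (
    [{"id": i, "treatm": 1, "quintile": 1} for i in range(3)]
    + [{"id": 10 + i, "treatm": 1, "quintile": 2} for i in range(2)]
    + [{"id": 100 + i, "treatm": 0, "quintile": 1 + i % 5} for i in range(50)]
)

matched = quintile_match(records, seed=1)
controls_kept = Counter(r["quintile"] for r in matched if r["treatm"] == 0)
print(len(matched), dict(controls_kept))  # 10 {1: 3, 2: 2}
```

If a quintile has fewer controls than treatment cases, this sketch simply keeps all the controls it can find, so check the counts before trusting the match.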

The fifth thing that transfers is your knowledge of how people work with computers.

I get the occasional snarky comment from someone that they could code “better” than me, meaning more efficiently or using more cool things. Sometimes the snarky comment is implied by the following, said in a wondering tone …

“I don’t understand why you sometimes are turning away clients while I am always looking for work ….”

For example, the above program requires selecting something, running it, typing in eight numbers, running something else and typing in another five numbers. If I were running this in batch, that would not be feasible, so this program is not as extensible as it could be. Also, if this were going to be used to run against a data set with millions of records, say, the census data, running it interactively wouldn’t work. It would be more impressive if I had the first frequency output fed back in as input for coding the test.sav data set, had the next frequency output create macro variables, and then had a loop that went through five times, created the five quintile data sets and concatenated all of them.

All of that would be impressive, but it wouldn’t be helpful. The environment where this program will be used doesn’t have more than 50,000 records. Why on earth would they run a program in batch that runs in a minute? They aren’t primarily an SPSS shop, so it will be much easier for them to use and maintain a program that essentially requires only knowing where you saved your data set, what your dependent and independent variables are, and how to copy some numbers from the screen into a program.

Actually, I can do some fairly impressive coding when the situation calls for it, but that’s a pretty important qualifying phrase there. Usually, it doesn’t. Here is the secret to success as a consultant. I will give it to you for free, since I am feeling generous.

“It turns out that people don’t want to be impressed nearly so much as they want to be helped.”

 

 


One Comment

  1. Hello! I am not a specialist in computer languages, but I am a linguist and I know that all the languages have common pattern. Having studied one of them you can learn any other quite fast and without much difficulty. The same principle can be applied to the programming languages, I suppose.
