
More after the data step (the naked mole rat continues)

[Image: Rocky & Bullwinkle]

When last seen, our heroes were attempting to write a book with the title

Beyond SAS Basics: Tips, Statistics and a Naked Mole Rat

The first chapter was entitled

After the Data Step. The first half of it was posted here earlier, which you would know if you were following this blog in the probably vain hope that you might learn something.

Writing the second half of the chapter was delayed by people offering to pay me actual money if I would fly around the country to hither and yon and do work like a real grown-up. I didn’t make it to hither, but you can see a picture of yon below.

[Image: Lac du Flambeau]

Now that I have returned, I have completed the rest of Chapter 1. To wit …

The next section could have been an entire book in itself. I LOVE statistics. I have spent most of my life as a statistician, and also won a world judo championship and married three husbands (not at the same time). It is a myth that statisticians are boring, and it is not true that math is hard; I don’t care what that stupid Talking Barbie doll said. Math is a lot easier than unemployment, in the opinion of most people. Since this book is titled “Beyond SAS Basics”, I did not include means, frequencies or correlations in the statistics section. I could have included simple linear regression or one-way Analysis of Variance – I know those are not that basic to most people.

If at this point your eyes are starting to glaze over and you’re starting to get anxious, just cut it out right now! You’re NOT that bad at math and it’s NOT that hard. It’s not rocket science. Besides which, having been married to a rocket scientist for fifteen years, I can tell you that they aren’t perfect, either.

This section includes just two chapters. The first is on logistic regression – when your data really DO fit in neat little boxes – like did someone live or die, buy your widget or walk on by, vote Democrat or Republican. These are the kinds of things we want to predict on a daily basis. The second chapter in this section is the most common research design for testing whether something works – an experimental group and control group are each given a pre-test and a post-test. Read all about it in the chapter on Repeated Measures Analysis of Variance.

If I have not convinced you, you can skip this chapter and still understand the rest of the book perfectly. Then, you can wait for my next book – Hamster Statistics with SAS. (Under suggestions for next year’s topic, one conference attendee wrote, “Statistics so simple a hamster can understand it – Bring your own hamster.”)

This next section is for those of you who don’t like statistics – and for those who do.
“Public agencies are very keen on amassing statistics – they collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But what you must never forget is that every one of those figures comes in the first instance from the village watchman, who just puts down what he damn pleases.” – Sir Josiah Stamp

People who don’t want to get too involved in statistics can take comfort in the fact that many statistical results are flawed because the data are of poor quality. If this describes you, there is plenty of work available out there fixing the data. I’ve read books that asserted that 80% of the time in any data analysis project is spent on data cleaning and data management. I deeply suspect that they just made this number up, but, as Dilbert said, studies have shown that real numbers are no more useful than numbers you just make up. (How many studies have shown this? 42. I just made that number up, too. See, it works!) My point, and you may rightly have despaired by now of my ever having one, is that a very large proportion of the time on any project goes into fixing the data. So, if you don’t want to get very involved in statistics but you still want to use SAS for fun and profit, specialize in data quality improvement and you will be the life of the party. (Of course, that will only be at parties attended by nerdy SAS programmers, but judging by the fact that you are reading this book, I assume you will fit right in.)
For those of you who DO love statistics (and, please, come sit next to me), the section on data quality is essential because unless you’ve been hanging out at parties where you met that guy in the last paragraph (and you didn’t invite me!), then you need to make sure your data are as near to error-free as you can get.

Section four is an introduction to SAS macros. There are a lot of reasons to like SAS macros. Any time you do the same type of task repetitively, you could write a macro and just supply the information that changes. For example, say you have a report you do for 24 different departments and the only difference is the name of the dataset you read in, the name of the department in the title and the name of the department manager – macro material for sure! Another reason to like macros is that a lot of the concepts you learn are applicable to other programming languages beyond SAS, and we’re all about being generalists here.
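Here is a minimal sketch of what that department report might look like as a macro. Everything in it is made up for illustration: the macro name dept_report, the work.acct dataset, and the department and manager names are assumptions, not code from the book.

```sas
/* Hypothetical example: the dataset, department and manager names below
   are invented for illustration, as is the macro name dept_report. */

/* A tiny stand-in dataset so the sketch runs on its own */
data work.acct;
    input employee $ hours;
    datalines;
Ann 40
Bob 35
;
run;

/* Everything that changes between departments becomes a parameter */
%macro dept_report(dsn=, dept=, manager=);
    title "Report for the &dept Department";
    title2 "Manager: &manager";
    proc print data=&dsn noobs;
    run;
    title;   /* clear titles so they do not leak into later output */
%mend dept_report;

/* One call per department; only the three values change */
%dept_report(dsn=work.acct, dept=Accounting, manager=Pat Jones)
```

With 24 departments you would write 24 one-line calls (or generate them) instead of copying and editing the whole report 24 times.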
The main reason not to like macros is that they look like they are written in Micmac. Micmac (also spelled Mi’kmaq) is the language of a tribe native to Canada. For a sample of the language, see Exhibit A.

[Image: Scroll in Micmac]

Exhibit A

For a sample of a macro, see Exhibit B, from a 1997 paper by Art Carpenter*. I was right, wasn’t I?


EXHIBIT B

%do q = 1 %to &n;
PROC FSEDIT DATA=dedata.p&&dsn&q mod
SCREEN=GLSCN.descn.p&&dsn&q...SCREEN;
RUN;
%end;
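For the curious, the scary-looking part of Exhibit B is less mysterious than it appears. The macro processor rescans a reference such as &&dsn&q until nothing is left to resolve: on the first scan the double ampersand collapses to one and &q becomes the loop counter, and on the second scan the resulting &dsn1 becomes its value. The sketch below shows only that mechanism; the values assigned to dsn1, dsn2 and n, and the macro name show_names, are assumptions made up for illustration.

```sas
/* Hypothetical values: these dataset-name suffixes are invented. */
%let dsn1 = demog;
%let dsn2 = vitals;
%let n = 2;

%macro show_names;
    %do q = 1 %to &n;
        /* First scan collapses the double ampersand and resolves q;
           second scan resolves the resulting dsn1 or dsn2 reference. */
        %put Iteration &q would edit dataset dedata.p&&dsn&q;
    %end;
%mend show_names;

%show_names
```

With these made-up values, the log shows dedata.pdemog on the first iteration and dedata.pvitals on the second. The three periods in Exhibit B work the same way: each scan uses up one period as a delimiter, so one is left over to separate the entry name from SCREEN.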

There is also the problem that the way people learn the macro language is usually sufficient to send them screaming in the opposite direction. Macro processing is taught beginning with several chapters on parameter scope, tokens, quoting and masking text.  Instead, I’ve included a couple of macros so you can see right away how useful macros can be and learn the statements and functions as we go along.
So, now we come to the final section, which is the “where do you go from here?” section. Since I don’t know you well enough to differentiate between you and a hairless monkey, it’s a bit surprising that I have an answer for you, but I do. The secret to keeping excited about the work you do and keeping other people excited enough to pay you is NOT Viagra, regardless of the 1,247,877 emails you have received. In fact, the answer is to really and truly keep learning. This section includes recommended resources from websites to mailing lists to conferences to specific books and papers I found both useful and interesting.

It also includes a naked mole rat.

* Carpenter, A. L. (1997). Resolving and Using &&var&i Macro Variables.
