Software Books I Want That I Have Not Got

The new year is a popular time for bloggers to list the favorite books they read over the last year. Reading several of these posts did not inspire in me any desire to update my Amazon wish list. Novels really aren’t my cup of tea. I don’t care about any girls who knocked over bee hives or whatever.

I was thinking this morning about books that I would like to have read if they existed, or maybe books I did read that I would like to have been written differently. Lately, I have read several hundred pages of documentation of SAS software. Stata documentation, by the way, is written exactly the same, only more so.

I had the 224-page PROC MIXED book excerpt on my desktop, so I just opened a random page in the first twenty pages and here is what it says (click to see larger font, as if that will help – ha!):

Now, maybe I am just grumpy because I have to teach this stuff to graduate students who generally don’t want to learn it, and professionals who do want to learn it but have rather unreasonable expectations, like being made an expert by Thursday.

That being said, the reaction of the average student is generally along the lines of, and I may be paraphrasing here (or not):

“Are you fucking kidding me?”

I’d like a book that, for the first 20 pages, provided a general description of the procedure and when it is used, and compared and contrasted it with other procedures. The next 100 pages would give examples of appropriate uses of mixed models (or whatever the particular procedure happened to be) with the appropriate code after each one. The book would introduce, say, the Akaike Information Criterion, and show how it could be used to compare models, using one model with several predictor variables and then a second model without one of those variables.

The examples used would be real ones with real data. Picking mixed models again, the first example in the SAS manual is predicting height from the variables family (with a random sample of families) and gender. These are good variables from the standpoint of an example of random effects (families randomly sampled from all possible families) and fixed effects (gender having two fixed levels, male and female). However, as I read this example, I tried to think of any possible scenario in which it would matter to predict height from these two variables. I failed. Perhaps it would matter if you were a biologist who had discovered a new species, say, the Pine-baby Tree, and you wanted to determine whether the male of the species was significantly larger than the female.

(As no expense is spared in the researching of this blog, a photo of the Pine-baby Tree in its natural environment of living room sofas next to smart phones is included. I had to brave suburbia to take this picture. You’re welcome.)

My complaint, like that of the 50% of students who begin majoring in science and then switch majors, is that the examples presented early on are not in any context. I know this demand is hard on the authors, because you are asking for an example that is simple enough for someone new to a language or procedure to understand, general enough that it will make sense to the majority of readers and, at the same time, a real-world application.

This challenge is addressed in an interesting way by a book I’m reading now, Beginning Ruby: From Novice to Professional. The author starts off with the example of Pets as a class, then discusses dog, cat and snake as subclasses and gets into the issue of inheritance. Now, it no doubt helped that I already knew what classes and inheritance were (as well as knowing about pets, dogs, cats and snakes), but it also helps that he continually draws specific generalizations:

“Now, you can see how this would apply if the class was Person or Tickets.”

One could argue that the Ruby book is more of a textbook or self-teaching tool while the SAS documentation is meant for reference, like the Unix man pages (man as in manual, not as in only meant for men). However, the situation is unlike Unix, for which one can find lots of well-written, helpful books.

For statistical software, once you get past the most basic statistics (for which there are some good books available), all of the books and articles I read seem to follow the same frustrating format – a few pages of introduction, if any, and then pages of formulas, with 20 pages at the end of the stuff I really need to know.

I feel like someone who wants to drive from Los Angeles to San Francisco and the first 195 pages of the map are a discussion of the manufacture, operation and quality testing of internal combustion engines. A few pages mixed in there cover important points about how you have to put gas in when the gauge is near empty, what windshield wipers do, and so on. Somewhere else in there are all of the possible routes one can take to go anywhere in the United States, one of which includes going from Los Angeles to San Francisco, with different routes through all California cities of over 50,000. At the end of the book is an example of driving from San Diego to Sacramento. However, since you don’t know where the important things, like putting in gas, are covered, you have to read the entire book, making you two days late for your meeting in San Francisco.

Let me give a real-life example for statistics, since I just complained about people not doing that. If you are using PROC LOGISTIC, GLM or MIXED, you need to use a CLASS statement to define your categorical variables. For example, I used five different schools where I administered an experimental training program. At each I had an experimental and control group.

If I did this:

Proc mixed data = mystudy ; model score = group school ;

I would get an error message because school is not a numeric variable and therefore needs to be specified in the CLASS statement. That’s the sort of thing you need to know up front.
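For what it's worth, here is a minimal sketch of the version that should run. It assumes that group and school are both categorical variables in the mystudy data set from the statement above, and it simply lists them in a CLASS statement ahead of the MODEL statement:

```sas
/* Sketch only: mystudy, score, group and school are the example names used above. */
proc mixed data = mystudy ;
  class group school ;          /* declare the categorical variables first */
  model score = group school ;  /* same model as above, both entered as fixed effects */
run ;
```

Whether school belongs in the MODEL statement as a fixed effect or in a RANDOM statement instead is a separate modeling decision that this sketch does not try to settle.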

The discussion of the asymptotic covariance matrix and what the ODS object name is for it, well, that can wait (AsyCov, if you really just couldn’t).

I’d like to have read about ten books like that in 2010 but Santa didn’t bring me any for Christmas. If you have any to recommend, I’d be extremely grateful.

5 Comments

disgruntledphd says:

January 5, 2011 at 7:46 am

I like Gelman and Hill’s Applied Regression:

http://www.amazon.com/Analysis-Regression-Multilevel-Hierarchical-Models/dp/052168689X/ref=sr_1_1?ie=UTF8&s=books&qid=1294231506&sr=8-1

It’s both comprehensive and informative, without going too deeply into asymptotic covariance matrices (except when actually necessary). It focuses on R and WinBugs, which I’m not sure if you use, but it’s a really good read nonetheless.
admin says:

January 5, 2011 at 1:40 pm

Thanks a lot for the suggestion. I’ll check it out. For the first time in years, I am not teaching this semester, so I have a chance to just read books. On the other hand, since my sabbatical isn’t going to last forever most likely, I’m taking the time to find some better resources.
Rob Meekings says:

January 6, 2011 at 6:15 am

Any chance you’ll write the book you (and we) want to read?
admin says:

January 6, 2011 at 1:38 pm

Actually, I am co-authoring a book we plan to have done by the end of the year. It’s on judo, though.

I’m also writing a couple of papers for a users group meeting in Hawaii this month.