The difference between working for a business and being at a university is that in a business we’re not paying you to show us how smart you are.

Don’t get me wrong. I think the best career advice I ever read was in an article by James Watson in Technology Review: never be the brightest person in the room. Yes, we want people who are very smart, but we are not paying them to BE smart, we are paying them to DO stuff.

In a post entitled “Is SAS Enterprise Guide Making You Stupid?”, Chris Hemedinger (yes, that IS his real name) compared using SAS EG to using a GPS to get around London: now, all of a sudden, no one has the streets memorized any more.

I don’t see a problem with that, since my goal in London is to get from Point A to Point B as quickly as possible, preferably where Point B involves a pub with good beer, and, if that fails, at least a garden, those being the two best things in England in my view. Since my cab driver doesn’t have to memorize all the streets in London, he or she can just focus on the important things, like which roads are closed and which pubs have the best beer.

How this all relates to SAS Enterprise Guide …

Enterprise Guide versus SAS programming is often presented as an either-or choice, but I personally find it a lot easier to do certain things by just writing the code. This includes setting the options, selecting a subset, and writing formats and labels.

I’ve been doing this stuff for a long time so there’s usually a program already on my hard drive that does close to what I want to do. All I need to do is open SAS Enterprise Guide from the START menu, then from the FILE menu select OPEN > PROGRAM and select the program.

For the analysis I’m doing today, I need a new format. One of the great tragedies of life – although defining your own sex format sounds like it’s going to be really hot, all it is, is this.

Proc format ;
value sex
1 = "Male"
2 = "Female" ;
run ;

A very abbreviated form of the program is shown below.

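libname readin "e:\wuss2010" ;
options nodate missing = " " ;
/* User-defined format for sex. YN., grades. and $race. are presumably */
/* defined in the full program and are left out of this abbreviated version */
proc format ;
value sex
1 = "Male"
2 = "Female" ;
run ;
data recruits ;
set readin.recruits ;
/* Subsetting IF: keep only the records where will_joi = 4 */
if will_joi = 4 ;
format sex sex. FAILGRAD SUMMERSC SUSPEND YN. GRADES grades. race $race. ;
Label grades = "H.S. GRADES"
Failgrad = "Failed A Grade"
Summersc = "Attended Summer School"
Suspend = "EVER SUSPENDED" ;
run ;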

I can use Enterprise Guide to create summary tables. [If you are familiar with PROC TABULATE, this accomplishes exactly the same thing. If you look at the code, it is, in fact, PROC TABULATE.]

From the TASKS menu I select DESCRIBE and then SUMMARY TABLES.

Task Roles in SAS Enterprise Guide

Let’s pause briefly before continuing with the pointing and clicking to review task roles in SAS Enterprise Guide, because this concept comes up a lot. Many Enterprise Guide tasks will require an analysis and/or classification variable.

• Analysis variables are variables for which statistics will be produced, e.g., mean, N, standard deviation. An analysis variable must be numeric.
• Classification variables are variables by which subjects will be classified within an analysis. Classification variables can be either character (categorical) or numeric. However, if a numeric variable is defined as a classification variable, it will be treated as if it were a categorical variable.
• Group variables are variables for which separate analyses will be performed, one analysis for each value of the group variable.
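
In code terms, the three roles line up roughly with the VAR, CLASS and BY statements of a procedure like PROC MEANS. Here is a minimal sketch, assuming a tiny made-up dataset (toy, school, sex and grades are hypothetical names, not the Recruits data), just to show which statement plays which role.

```sas
/* Hypothetical toy data, NOT the Recruits dataset */
data toy ;
input school $ sex grades ;
datalines ;
A 1 3.0
A 2 3.5
A 1 2.0
B 2 4.0
B 1 2.5
B 2 3.0
;
run ;

/* BY-group processing needs the data sorted by the group variable */
proc sort data = toy ;
by school ;
run ;

proc means data = toy n mean std ;
by school ;    /* group variable: a separate analysis for each school */
class sex ;    /* classification variable: numeric, but treated as categorical */
var grades ;   /* analysis variable: must be numeric */
output out = toy_stats mean = mean_grades ;
run ;
```

The toy_stats dataset ends up with one overall row plus one row per sex for each school, which is the same split the task roles give you in the point-and-click version.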

The window for the SUMMARY TABLES task pops up with the data window. The first step is to define the variables.

I drag sex, race, failgrad, vo-tech and suspend under CLASSIFICATION variables and grades under ANALYSIS variables.

Notice that the RUN button is grayed out and, just in case I didn’t know why, there is a helpful note at the bottom of the window that says
“You must add at least one variable to the Table definition”

Next, I click on Summary Tables (in the left pane of the window) and begin dragging the variables to the positions where I want them, either row or column. This is very much like painting a table, which is probably the closest I am ever going to get to art.

NOTE that I defined grades as an analysis variable, so it is going to be treated as numeric, with only one row, for whatever statistic I choose. I defined SUSPEND as a classification variable, so it will have categories.

For each variable, I drag the statistic I want from the pane in the lower left to the desired position in the table.

I want the output to be in HTML and think the default format is ugly, so I’m going to click on the PROPERTIES tab, select HTML for the output type and Torn for the style.
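
Since the Summary Tables task writes PROC TABULATE behind the scenes, all of that dragging corresponds to something like the sketch below. This is not the code Enterprise Guide actually generated for me. The toy dataset, the file name recruits_summary.html and the choice of statistics are assumptions for illustration. The Torn style may not ship with every SAS release; as far as I know SAS falls back to the default style with a warning if it is missing.

```sas
/* Hypothetical stand-in for the Recruits data */
data toy_recruits ;
input sex suspend grades ;
datalines ;
1 1 2.0
1 2 3.0
2 2 3.5
2 1 2.5
2 2 4.0
;
run ;

/* Send the output to an HTML file (made-up file name) using the Torn style */
ods html file = "recruits_summary.html" style = Torn ;

proc tabulate data = toy_recruits ;
class sex suspend ;   /* classification variables: one row or column per category */
var grades ;          /* analysis variable: a single row for the chosen statistic */
/* rows: counts by suspend, then mean grades; columns: each sex, then everyone */
table suspend * n  grades * mean , sex all ;
run ;

ods html close ;
```

The TABLE statement is the "painting" step: everything before the comma is a row, everything after it is a column.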

The process flow for this “program” is shown below. It begins with the program that produced the Recruits dataset, followed by the summary tables task, which used the Recruits dataset as input and produced the HTML file as output.

But what about replicability? In research it is crucial to be able to defend and justify your procedures that led to the results. And my procedure looks like a computer version of Lego. What about THAT?

For those who want to look at the individual code, any of these steps can be double-clicked and a window will pop up with both a CODE tab and a LOG tab at the top and you can read code to your heart’s content. You can copy the code, paste it into a program window and modify it however you want.

When we went from solving equations with a pencil and paper to using computers, I don’t think it made us dumber. Maybe some of the dumb people could now find SPSS in the START menu and run a logistic regression, but they still didn’t know how to interpret it or whether they had done it right or not (my guess would be “not”) and the smart people had more time to read articles on logistic regression, read documentation for whatever software they were using and learn more stuff in the time that they previously would have spent mindlessly crunching numbers.

In fact, I’m not sure that anyone would do a maximum likelihood method by hand unless the person was really, really weird.

By freeing us from the necessity to memorize and perform repetitive tasks, SAS Enterprise Guide has given us the time to engage in more high-level intellectual pursuits.

So, my conclusion is that SAS Enterprise Guide does not make us stupid. It makes us smarter. And less weird.

We should all write Chris and thank him.

Let’s assume, based on a random fact that I just made up for the moment, that 50% of all businesses that succeed are restaurants.

Based on this fact, as a business consultant, you advise me that I should offer daily specials, make up flyers that I post on cars around the neighborhood, be sure I get listed in Yelp and invest in search engine optimization.

But … I protest, we specialize in statistical consulting and evaluation research. Our clients come from around the country and that’s a lot of cars to paper. Besides, I don’t think there is a statistical consultant category on Yelp.

BUT .. counters the business consultant (proving that I am not the only one who can make up statistics), the success of these types of organizations matches the total percentage for all other business types combined!

I have run one business or another since 1985 – sole proprietorship, partnership and corporation. I haven’t made a fortune to rival Bill Gates, but it was enough that, after my husband died, I was able to support three children through college degrees and Olympic competition and live by the beach in Santa Monica. So, it’s been okay. And then, social media experts tell me I am doing it all wrong.

As Annie Pettit said, I must listen to them because some of them have days of experience in social media.

I am not stupid and I once was a young MBA (back in 1980) who thought I had brilliant ideas. And I did know some stuff and was pretty bright, just like the young people that lecture me today, but please, listen y’all …

If you have no clue what my business is, don’t lecture me about it.

The fact is, the majority of our work comes either directly or indirectly (via subcontract) from federal and charitable grants or government contracts. Usually, the grant review panel is specifically prohibited from including information in their decision that is not in the proposal. That means that even if we had an orgasm-producing website with fireworks and a data mining application that read the consumer’s mind, the reviewers would not be allowed to consider that in their decision. And there is usually a representative of the agency on hand to make sure they don’t.

My husband, the honest-to-God-rocket-scientist, commented the other day that I “dominate the AnnMaria space”. If you type AnnMaria into Google the first link is my blog on judo. This blog, which is a combination of statistics, statistical software and just rambling on random shit, is also on the first page.

So… to all those companies that call me and leave messages about search engine optimization, my question is

.. Why the hell do I care if the first thing that pops up in a search on my first name is me?

Of course, it helps if you have an unusual first name. I bet the person who is named Eshnapitaluki is the first link to pop up if you search on that, too. (Unless that just happens to be the most common first name in Thailand, which, for all I know, it is.)

I had a couple of conversations with people from different companies, very earnest, undoubtedly intelligent but not wildly experienced young people.

Search Engine Optimization company:

Him: We can help you be at the top in searches on Google, Yahoo and other major search engines.

Me: Ri-i-ght. That’s very nice. I don’t think that would help our company.

Him: Of course it would! We can increase your business by 50%!

Me: Do you even know what we do?

Company providing consulting on marketing to the federal government:

Him: AnnMaria, don’t you think you should get your share of YOUR money?

Me: Huh?

Him: The federal government gives out billions of dollars every year. You should get YOUR share. Don’t you think that’s unfair that you’re not getting any of that money?

Me: Well, actually, I think we should have a good chance if we go in with a good proposal. I mean, I tried the business plan of sitting around with the door open and waiting for people to throw bags of money in, but that just didn’t work out for us.

Him: We can show you how to get on a GSA schedule.

Me: Well, yes, that’s nice, but we generally do statistical consulting, evaluation research. Our average contract is probably around $100,000. I don’t think people are buying statistical consulting services on a GSA schedule.

Him: Oh, yes there are. Every day, millions of dollars worth of business.

Me: Can you give me the name of one example?

SIGH.

[Actually, there are some opportunities on a GSA schedule but the vast majority of business that fits our company is not.]

I am sure that if you are doing search engine optimization it is truly wonderful for restaurants, people selling templates for Dreamweaver (I love pop menu magic from Project 7) and apps from the app store, whether they make you fart or not.

Yes, it would be nice if we updated our website more often. We have received two new contracts, hired a new person and published a few papers since the last time we added any pages or updated any of the old ones. Please do not call and give me your public relations expertise about how I should be sending out press releases and posting something on our site every time we get asked to present at a conference. Please don’t ask me (rhetorically) whether I know that would increase our visibility and bring us more business.

Here is half of my marketing plan:
Deliver products that exceed expectations. I know that is a very trite saying, but the more times in a week a client says to me,
“Thank you, you didn’t have to do that” or
“Wow! We didn’t expect you to do that.”

the more in compliance we are with our business plan. Perhaps that did not fit the metric you had in mind?

Here is the other half of my marketing plan: find out what my clients want and what they need. Talk to them, not at them. LISTEN. I am pretty sharp as a statistician and programmer but no one is more of an expert on what you need and want than YOU. Before I tell people what I am going to do for them and what they need to do, I listen to find out where they want to end up.

This is where the self-appointed social media and SEO gurus are missing the point.

You see, my goal is not to have the largest possible business, make the most possible money and hire the greatest number of people, preferably those working in Elbonia who I can pay a monthly wage of $42 and one goat.

My goal is to have a good life while paying other people well so they can have a good life, too, and providing excellent service to my clients so their lives are easier.

One day I wanted to go to the Smithsonian in Washington, D.C. and look at the flowers. Which I did. See my photographic evidence of flowers at Smithsonian above.

The next week, I wanted to be in Boston to play dinosaurs with my granddaughter. See photographic evidence of dinosaur (not actual size) and granddaughter (still not actual size, but closer) below.

The week after that, I wanted to make enchiladas for Sunday dinner in Santa Monica with my daughters.
Photographic evidence of daughters included, but not the enchiladas. They ate those.

Later, I wanted to go to the Long Beach Aquarium.

There was a hackathon one could attend that offered twelve hours in a room coding with other developers, free coffee and fast food included.

I didn’t go. With my husband, I walked down to the wine-tasting at the Casa Del Mar, sampled white wine (I recommend the Cakebread) and champagne, ate oysters and watched the sun go down over Santa Monica Bay.

There’s been a lot of talk about why there aren’t more women in tech, more start-ups by women – um, I started a company in 1985 but I don’t think that is what they mean.

All the people who call and email me with unsolicited advice on what I am doing wrong tell me that if I just followed their advice I would make a pile of money and have a good life.

I would just give them three words of advice.

Know your customer.

Because, when I scroll up and look at those pictures, it seems to me that I have a good life now.

You know those movies you see where there’s a guy that looks like Brad Pitt (or is Brad Pitt) and some woman who looks like Miss Universe and they sit down at a random computer and in less than five minutes they have re-programmed all of the computers on Planet Earth while simultaneously disarming the Deathstar? Did you ever notice that the biggest problem they ever have in those movies is figuring out the password, which is never something like cabbage.Liberia6 and always the name of the evil villain’s one true love?

Yeah, well, they have talking raccoons in the movies, too, and that’s not real either.

It all started when I was looking at a graph for a paper for WUSS (Western Users of SAS Software).

I decided I wanted the lines on the graph to be thicker so it would be easier to read.

Here is my whole goal – not writing code for deactivating a nuclear reactor in Iceland using only X11 on my iPad (note to self: install X11 on iPad, also, find out if they really have nuclear reactors in Iceland).

No, I did not want to de-activate nuclear reactors in random Scandinavian countries, I just wanted to make some lines thicker on a graph.

Here was my amazingly unproductive day.

1. Go to Google, source of all knowledge, and see that there is an SGEDIT option to edit statistical graphics. Just type SGEDIT ON in the results window. I try this and get an error message.

2. Go to Google again, see that you can create .sge files (as in editable SAS graphs) by including this statement in your program:

ods listing sge = on ;

Do that, get .sge as well as other types of files. Double-click on it and get message that there is no program associated with this file.

3. Begin to suspect that SAS Graphics Editor has not been installed on this computer. Try the SGEDIT ON in the results window on someone else’s computer and it works. Swear.

4. Copy SAS Software Depot from that computer to hard drive. Try to plug hard drive into my computer and see that it does not have the correct type of Firewire port to plug in the hard drive.

5. Spend an hour sorting through approximately 6,192 cables. Do not find one that will work.

6. Give up and copy SAS software depot to server. This will take 20 minutes.

I remember Joe Perry at SAS Global Forum (he was either being particularly brilliant that night or it was all the beers we sampled at the microbrewery in Seattle) commenting that twenty years ago he felt it was possible for him to know all of SAS, and that he did know a good percentage of it, but that now it has gotten so huge it is literally impossible for one person to know it all. This is comforting and relevant because …

In one of the papers I read looking into this, I see the code:

dm 'clear output' ;

dm 'clear log' ;

Smack self in head, having forgotten that this clears the output and log in display manager. In other words, you don’t have to select all and clear. I KNOW this, used it for years, then was in a position where everything we did was batch and it totally slipped my mind. I vaguely remember something about SGEDIT when ODS Statistical Graphics was announced, and thinking I didn’t care how pretty my graphs looked, I just wanted the information. I had totally forgotten about it.

Since the depot is copying and in honor of Joe’s memory (he’s not dead, as far as I know, I just happened to remember him today, him, and also the beer), I went down to the corner and bought beer.

7. Try to install from server and it updates several installed applications but doesn’t give me the option for installing additional software.

8. Remember this is a planned installation and if I want to install additional software I need to go to the RUN command and run setup.exe – skipplanning

9. Add Graphics Editor

10. Run program AGAIN.

11. Double-click on my graph and make thicker lines.
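
For the record, if a graph comes from one of the SG procedures, thicker lines can probably be had in code without the editor at all. A minimal sketch, assuming a PROC SGPLOT series plot; I am not claiming this is how the graph in my paper was made, and the dataset and variable names are invented.

```sas
/* Invented data purely for illustration */
data toy_trend ;
input year score ;
datalines ;
2006 10
2007 14
2008 13
2009 17
;
run ;

proc sgplot data = toy_trend ;
/* LINEATTRS= controls how the line is drawn; THICKNESS= makes it thicker */
series x = year y = score / lineattrs = (thickness = 3) ;
run ;
```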

Yeah, I know, for all that trouble you’d think it would look more impressive.

I don’t look like Miss Universe, either.

Reality is over-rated.

AN ACTUAL CONVERSATION THIS WEEK …

“This paper is not going to be as much an academic treatise as most of the ones I write, but I am hoping it will be more interesting. I was wondering about the fact that some well-respected people say the secret to career success is to be the foremost specialist in some obscure application or language. That doesn’t fit with my experience at all, though. So, instead of citing some articles I pretty much just sent a shout-out on Twitter, got responses from smart people and quoted them.”

“You mean you crowd-sourced it?”

“Gee, it sounds SO much more professional and scientific when you put it that way!”

Often there is a distinction made between programmers, who write code, and analysts, managers or other categories who use the results of that coding. Some would say that the secret to career success is to specialize. There is something about this view that bothers me, so I went to Twitter, which is quickly replacing Google and Wikipedia in my life as the source of all knowledge, and asked fellow twitterers their opinions.

Dr. Peter Flom, a statistician, replied,

I am not a programmer, but as data analyst/statistician, I think you can be successful either way

In my experience, there are a good number of people who have made a successful decades-long career as the maven of PROC REPORT or SAS/AF or some other specific niche that was of critical importance to some division of their organization.

Jon Peltier, an Excel programmer, answered,

Depends on how much in demand your specialty is.

Evan Stubbs, of SAS Institute in Australia, put it best when he succinctly summed up what bothers me about the specialization paradigm.

Fly high, fall far; pay’s good for specializing until you go the way of the buggy whip. Generalists fit anywhere, learn faster

Let’s be generalists then, and apply what we know about SAS to answer some questions using statistical procedures. In the first part of SAS Essentials, I said that one distinction between a novice and an intermediate programmer is being able to make design choices because he or she knows more than one way to achieve a task. A second distinction is being able to put together the things you know.

We’re going to try to put together some of the procedures you may know to understand a bit more about an incomprehensible subject: hate crimes. These are crimes that are motivated by bias against the victim’s race, religion, sexual orientation or disability status. We’ve already seen in a prior example the most common categories. How often does this happen and what do hate crimes look like? I want to start with the victims and offenders so I use PROC UNIVARIATE. The code below will give some initial statistics for both the number of victims and the number of offenders.

ODS GRAPHICS ON ;
Proc univariate data = in.hatecrime plots ;
Var tnumvtms tnumoff ;
/* Keep only the records flagged as hate crimes */
Where hc_flag = 1 ;
run ;

The first set of results is very interesting. This tells you that there were 7,783 hate crimes in the database in 2008 and the average had slightly less than one victim. Of course, this is very curious so let’s explore it further.

In the table below, both the median (the score that half of the population falls above and half fall below) and mode (the most common score) are one. So, in general a hate crime seems to be perpetrated against an individual. In a normal distribution, the mean = the median = the mode. This would seem to meet that criterion with the mean for number of victims = .98, median = 1 and mode = 1. Yet, according to the statistics in the table above, the distribution is very far from normal. You can see this by looking at the skewness and kurtosis statistics, which are enormous.[1] A kurtosis value for a normal distribution is 0. Ours is 787. Skewness measures how symmetric (or not) the curve is and kurtosis measures how flat or peaked it is (in contrast to the “bell” shape we would expect in a normal curve).
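
For anyone who wants to see where the zero comes from, here are the textbook population definitions of the two statistics. This is only a sketch of the idea: SAS applies a small-sample correction on top of these, which I have left out.

```latex
% Population definitions, with \mu the mean and \sigma the standard deviation
\[
\text{skewness} = \frac{E\left[(X-\mu)^3\right]}{\sigma^3},
\qquad
\text{kurtosis} = \frac{E\left[(X-\mu)^4\right]}{\sigma^4} - 3
\]
% For a normal distribution E[(X-\mu)^4] = 3\sigma^4, so the kurtosis above is 3 - 3 = 0
```

A symmetric curve has skewness of zero, and subtracting 3 is what puts the normal curve at a kurtosis of zero, so a value of 787 is a very long way from a bell.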

The standard deviation is about one, again, suggesting that most crimes are committed against a solitary victim with not a lot of variation from the mean. To gain a little more understanding, let’s take a look at the frequency distribution produced by PROC UNIVARIATE when we used the ODS GRAPHICS ON statement.

What this picture shows is that hate crimes are overwhelmingly likely to have only one victim. The next most common number is zero, which is weird, but we’ll get back to that in a minute.

In this case, the results shown in the next table, the t-test that the population mean is zero, aren’t of much interest to us. If I’d included this table (which I didn’t because it was irrelevant and hence boring), we could see, unsurprisingly, that it is hugely significant. No real information here: the average number of victims of a hate crime in the population is not zero (duh). There are times when this statistic would be of interest. This isn’t one of those times.

The next table is quite interesting, though. It gives the responses in quantiles, from 0% to 100%. The minimum number of victims, and, in fact, up to the tenth percentile, is zero.

When I look at the results for the number of offenders, I find a similar pattern, where 10% of the records show zero offenders.

Something is definitely strange here. How can you have crimes with no victims and no offenders, and, if so, how do you know they are hate crimes? To learn a little bit more about this, I do the following:


data in.check ;
set in.hatecrime ;
attrib victim_off length = $11 ;
/* Subsetting IF: keep only hate crimes */
if hc_flag = 1 ;
if tnumvtms = 0 and tnumoff > 0 then victim_off = "No victim" ;
else if tnumvtms > 0 and tnumoff = 0 then victim_off = "No offender" ;
else if ( tnumvtms = 0 and tnumoff = 0) then victim_off = "Neither" ;
else if ( tnumvtms > 0 and tnumoff > 0) then victim_off = "Both" ;
run ;

/* Cross-tabulate offense code by category, leaving out the ordinary "Both" cases */
proc freq data = in.check ;
tables offcod1 * victim_off ;
where victim_off ne "Both" ;
run ;

The above code creates a dataset that only includes hate crimes (hc_flag = 1). It also creates a variable, victim_off, that has four categories: no victim, no offender, neither victim nor offender, or both victims and offenders given. The FREQ procedure shown creates a cross-tabulation of the offense code by the victim_off category.

… and it becomes somewhat clear to me by reviewing the offense codes in the resulting table, which I did not show because I am by now tired of showing tables in this post.

About two-thirds of the cases with no victim and/or no offender are destruction or vandalism. So, if someone trashes a church or synagogue and leaves behind spray-painted racist or anti-semitic comments, that would be considered a hate crime and you would have an identifiable group that the bias was motivated against, but there wouldn’t necessarily be an identified victim.

The next step simply requires going back to the documentation. It turns out that if the offender is unknown, a zero is entered in this field.
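
One practical consequence: a zero in the offender count is really a missing value in disguise, so it drags the mean down. Here is a sketch of one way to handle that, assuming that zero always means unknown for this field. The variable tnumoff_known is a name I made up, and the five toy records stand in for in.hatecrime.

```sas
/* Made-up records standing in for in.hatecrime */
data toy_hate ;
input tnumvtms tnumoff ;
datalines ;
1 1
1 0
1 2
0 0
2 3
;
run ;

data toy_hate2 ;
set toy_hate ;
/* Treat a recorded zero as offender unknown, not as no offenders */
if tnumoff = 0 then tnumoff_known = . ;
else tnumoff_known = tnumoff ;
run ;

/* NMISS counts the unknowns; the second mean uses only records with a known count */
proc means data = toy_hate2 n nmiss mean ;
var tnumoff tnumoff_known ;
run ;
```

With these five made-up records the mean of tnumoff is 1.2, while the mean of tnumoff_known is 2.0, because the two unknowns no longer count as zeros.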

This is just a tiny, beginning part of an analysis. Why even bother?

Let’s say you are a brand new programmer who maybe just finished the first SAS class your company sent you to, so some code with a few IF statements and a PROC FREQ is a reasonable expectation from you. You proudly hand over your charts and tables and someone says,

“How the hell can you have zero victims and zero offenders for a hate crime?”

You can kind of shrink back into yourself and say,

“I don’t know.”

Or you can very defensively and somewhat aggressively state,

“How should I know, that’s not my job. I’m just the programmer.”

Or you could say …

“Well, it’s like this …”

Even if you’re a brand-new baby programmer that is just learning to cobble together a data step and a proc freq, I can tell you that in most organizations one of those three answers is going to get you more respect than the others.


[1] The formula for kurtosis sometimes subtracts 3 (which makes the value for a normal distribution equal to 0), and sometimes doesn’t. SAS software uses the formula that subtracts 3.

So, I am writing these papers on moving from novice to intermediate programmer and Kim Le Bouton has to go apply logic to it and ask,

“Just how do you define a novice programmer, anyway?”

I was tempted to be a smart ass about it and answer that it was anyone who didn’t come to my papers, but was overcome by an uncharacteristic burst of maturity.

First of all, having been elected the word chooser of this blog unanimously by a nationally representative random sample of all of the people who are me, I would define a novice programmer this way:

“Being a novice, as distinct from an expert programmer, is not merely a function of years of experience; it also reflects the quality and results of that experience. A novice programmer is a person who is limited in knowledge of the field.”

Recently, someone told me there was a surplus of programmers and a shortage of managers. As evidence, he cited some report he had seen where a couple of programmers knew all sorts of programming languages but couldn’t get a job.

I told him,

“I don’t believe that. I believe there are people who know a programming language who can’t find a job but taking a course in a language doesn’t make you an expert programmer any more than writing in English makes you Hemingway. There’s never been a surplus of excellence and I don’t believe there ever will be. Managers who consider everyone who knows a programming language to be interchangeable are going to find that out to their detriment.”

One difference between novice and expert programmers is hours. I loved the book Outliers, by Malcolm Gladwell. His main point was that people who are outstanding in a field spend much, much more time in practice than people who are simply very good.

A while back, Mark Stevens posted a blog on Zero to SAS Certification in Ninety Days. Now, Mark Stevens seems to be a pretty smart guy, who started out with the education, motivation and experience that would make him derive the maximum benefit from this training and it is theoretically possible that I am dumber than a rock, but I seriously question what exactly one is being certified as in three months.

After 28 years of working with SAS, I would like to believe I have learned more than could be picked up in 90 days of study. So, back to Kim’s question, what would that be?

WHAT: A novice programmer is one who knows a fairly limited set of procedures or solutions for most problems. For example, given the need to aggregate categories, he or she might consider several IF-THEN, ELSE statements and probably an ARRAY statement with a DO-loop. A more experienced programmer would consider other options such as PROC FORMAT or PROC FREQ, to name just two. An example of the former… I am using the 2008 Uniform Crime Reporting data on hate crimes. These are coded in infinite detail. I’d like to combine all crimes against races other than black or white, since there are very few in each category. I’d like to combine the categories “Anti-homosexual male”, “Anti-homosexual female”, “Anti-homosexual- both sexes” etc. into a single category. Below was my solution:

Proc format ;
VALUE biasmo1f
11 = 'Anti-White'
12 = 'Anti-Black'
13 - 15 = 'Anti-Other Race'
21 = 'Anti-Jewish'
22 - 25 = 'Anti-Other Religion'
26 - 27, 44 = 'Other'
32 = 'Anti-Hispanic'
33 = 'Other'
41 - 43, 45 = 'Anti-Homosexual'
51 - 52 = 'Anti-Disability' ;
run ;
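
Defining a format does not change the data; the grouping happens when the format is applied. Here is a short sketch, assuming the bias motivation code lives in a variable called biasmo1 (a guess based on the format name). It uses a handful of made-up records and a cut-down, hypothetical format called biasdemo rather than the real file and the full format.

```sas
/* Cut-down, hypothetical version of the format, for illustration only */
proc format ;
value biasdemo
11 = 'Anti-White'
12 = 'Anti-Black'
13 - 15 = 'Anti-Other Race'
41 - 43, 45 = 'Anti-Homosexual' ;
run ;

/* Made-up records; biasmo1 is an assumed variable name */
data toy_bias ;
input biasmo1 @@ ;
datalines ;
11 12 12 13 14 15 41 42 45
;
run ;

/* PROC FREQ groups by the FORMATTED value, so 13, 14 and 15 collapse into one row */
proc freq data = toy_bias ;
tables biasmo1 / out = bias_counts ;
format biasmo1 biasdemo. ;
run ;
```

The nine toy records come out as four rows, one per formatted category, without a single IF-THEN statement and without touching the stored values.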

WHEN: The solution isn’t always to use, or even learn, proc format. Perhaps I wanted to aggregate in a different way. I would like to learn more about the locations in which hate crimes occur. There are 25 categories for location but only a few of them occur as often as 5% of the time. The following few statements will pull out only those locations that occur more than 4% of the time and give me a frequency distribution of those locations along the way.

proc freq data = in.hatecrime ;
tables loccod1 / out = location (where = (percent > 4)) ;
run ;
proc sort data = location ;
by loccod1 ;
run ;
proc sort data = in.hatecrime ;
by loccod1 ;
run ;
data common ;
merge location (in = a) in.hatecrime ;
by loccod1 ;
/* Keep only records whose location is in the frequent-location list */
if a ;
run;

WHY: There is an almost magnetic attraction between software and one-upmanship. Someone might say the above solution is not efficient, that there is a better way to do this without two sort steps. Maybe. I can give a reason why I did it this way.

Total processing time (real time, not CPU time) was 78 seconds. It took me another minute to type those statements. So, in terms of both processing and programming time, it was efficient. Most of all, it is easy to read, so if I need to explain it to someone or turn the program over to someone else because I am leaving a project where I was brought in as a consultant for a short period, it is a simpler transition.

I ran the frequency procedure, selecting those locations that had a percent > 4. I sorted both datasets by location and then created a new dataset from the original dataset that excluded the locations with low frequency.
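
For comparison only, here is one way the same subset could be pulled without either sort, using PROC SQL. I am not claiming it is faster or better, and it is not what I ran. The dataset toy_crime and its values are invented stand-ins for in.hatecrime, and the 25% cutoff replaces the 4% cutoff only because the toy data are tiny.

```sas
/* Invented location codes standing in for in.hatecrime */
data toy_crime ;
input loccod1 @@ ;
datalines ;
20 20 20 20 13 13 13 5 7 9
;
run ;

/* Same first step as before: keep only the locations above the cutoff */
proc freq data = toy_crime noprint ;
tables loccod1 / out = location (where = (percent > 25)) ;
run ;

/* Keep the records whose location is in that list; no sorting needed */
proc sql ;
create table common as
select *
from toy_crime
where loccod1 in (select loccod1 from location) ;
quit ;
```

In the toy data, locations 20 and 13 make the cut (40% and 30%), so common keeps seven of the ten records.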

When someone presents me with a more complex solution to a problem like this, I am the opposite of impressed [that would not be unimpressed. Unimpressed is null. People like that score negative on my impressed scale]. I’ve had people tell me, very condescendingly, that code like the above is wrong because it is inefficient and doesn’t minimize CPU usage. And I sit there thinking that CPU time was 39 seconds, so why do I care?

HOW: This is the Hemingway part, I think. An expert programmer is able to put together those different pieces of knowledge, the what and when and why, apply what they know, integrating information on some subject area (be it marketing, statistics, genetics or what have you) and come up with a solution that is greater than the sum of the parts.

A novice programmer just hasn’t put in the hours yet to learn a wide array of techniques that can be useful in solving a variety of problems. This in NO WAY implies the person is dumb or incapable of learning to be a fantastic programmer. He or she just hasn’t become that yet.


This is usually because the person is new to the field, but it can also be a result of a lack of interest or a lack of time. I don’t buy that it is due to a lack of opportunity. If you are anywhere with an Internet connection and you have a few bucks to buy a trial or learning sample of the software there are tons of resources out there for you to learn. There are even open source offerings like Linux and R that you can get for free.

The secret is to just hack away at it, and the deeper secret than that is to love it. Without going into boring details (unlike how I usually do), when my hotel turned out to be more one star than four star, I was extremely upset and frustrated last night. My solution was to sit up until 4 a.m. reading up on generalized linear models, link functions, canonical variates, response bias, and trying different things with proc format.

When I do this kind of work, I’m happy and content (Mihaly Csikszentmihalyi would call it “Flow”) and so I work a lot.

I think I am a damn good programmer and statistician and I think that is the reason why. There isn’t a secret decoder ring. Sorry.