So, I am writing these papers on moving from novice to intermediate programmer and Kim Le Bouton has to go apply logic to it and ask,
“Just how do you define a novice programmer, anyway?”
I was tempted to be a smart ass about it and answer that it was anyone who didn’t come to my papers, but was overcome by an uncharacteristic burst of maturity.
First of all, having been unanimously elected word chooser of this blog by a nationally representative random sample of all of the people who are me, I define a novice programmer this way:
“Being a novice, as distinct from an expert programmer, is not merely a function of years of experience; it also reflects the quality and results of that experience. A novice programmer is a person whose knowledge of the field is limited.”
Recently, someone told me there was a surplus of programmers and a shortage of managers. As evidence, he cited some report he had seen where a couple of programmers knew all sorts of programming languages but couldn’t get a job.
I told him,
“I don’t believe that. I believe there are people who know a programming language and can’t find a job, but taking a course in a language doesn’t make you an expert programmer any more than writing in English makes you Hemingway. There’s never been a surplus of excellence and I don’t believe there ever will be. Managers who consider everyone who knows a programming language to be interchangeable are going to find that out to their detriment.”
One difference between novice and expert programmers is hours of practice. I loved the book Outliers, by Malcolm Gladwell. His main point was that people who are outstanding in a field spend much, much more time in practice than people who are simply very good.
A while back, Mark Stevens wrote a blog post on Zero to SAS Certification in Ninety Days. Now, Mark Stevens seems to be a pretty smart guy who started out with the education, motivation and experience to get the maximum benefit from this training, and it is theoretically possible that I am dumber than a rock, but I seriously question what exactly one is being certified as after three months.
After 28 years of working with SAS, I would like to believe I have learned more than could be picked up in 90 days of study. So, back to Kim’s question, what would that be?
WHAT: A novice programmer is one who knows a fairly limited set of procedures or solutions for most problems. For example, given the need to aggregate categories, he or she might consider several IF-THEN/ELSE statements and probably an ARRAY statement with a DO loop. A more experienced programmer would consider other options such as PROC FORMAT or PROC FREQ, to name just two. An example of the former… I am using the 2008 Uniform Crime Reporting data on hate crimes. These are coded in infinite detail. I’d like to combine all crimes against races other than black or white, since there are very few in each category. I’d also like to combine the categories “Anti-homosexual male”, “Anti-homosexual female”, “Anti-homosexual, both sexes”, etc. into a single category. My solution used PROC FORMAT.
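A minimal sketch of what a PROC FORMAT grouping can look like follows. It is an illustration, not the program run on the UCR file: the variable name BIAS, its text values and the group labels are all hypothetical (the real file may store numeric codes instead), and a small made-up dataset stands in for the real data.

```sas
/* Hypothetical grouping format: several detailed labels map to one group.
   Variable values and labels are assumptions for illustration only. */
proc format ;
  value $biasgrp
    'Anti-White'                  = 'Anti-White'
    'Anti-Black'                  = 'Anti-Black'
    'Anti-American Indian',
    'Anti-Asian/Pacific Islander',
    'Anti-Multi-Racial Group'     = 'Anti-other race'
    'Anti-homosexual male',
    'Anti-homosexual female',
    'Anti-homosexual both sexes'  = 'Anti-homosexual'
    other                         = 'Other bias' ;
run ;

/* Made-up stand-in for the real data (hypothetical, illustration only) */
data bias_demo ;
  length bias $ 40 ;
  bias = 'Anti-Black' ;                  output ;
  bias = 'Anti-White' ;                  output ;
  bias = 'Anti-Asian/Pacific Islander' ; output ;
  bias = 'Anti-American Indian' ;        output ;
  bias = 'Anti-homosexual male' ;        output ;
  bias = 'Anti-homosexual female' ;      output ;
run ;

/* PROC FREQ groups by formatted values, so the aggregated table
   needs no new variable and no IF-THEN/ELSE chain */
proc freq data = bias_demo ;
  tables bias ;
  format bias $biasgrp. ;
run ;
```

Regrouping differently means editing the format, not the data. The same format can also create a permanent grouped variable in a DATA step with PUT(bias, $biasgrp.).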
WHEN: The solution isn’t always to use, or even learn, PROC FORMAT. Perhaps I want to aggregate in a different way. I would like to learn more about the locations in which hate crimes occur. There are 25 location categories, but only a few of them occur as often as 5% of the time. The following few statements pull out only those locations that occur more than 4% of the time and give me a frequency distribution of those locations along the way.
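proc freq data = in.hatecrime ;
  /* keep only locations with more than 4% of non-missing records */
  tables loccod1 / out = location (where = (percent > 4)) ;
run ;
proc sort data = location ;
  by loccod1 ;
run ;
proc sort data = in.hatecrime ;
  by loccod1 ;
run ;
data common ;
  /* keep= drops the COUNT and PERCENT columns that PROC FREQ added */
  merge location (in = a keep = loccod1) in.hatecrime ;
  by loccod1 ;
  /* keep only records whose location is on the common list */
  if a ;
run ;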
WHY: There is an almost magnetic attraction between software and one-upmanship. Someone might say the above solution is not efficient, that there is a better way to do this without two sort steps. Maybe. I can give a reason why I did it this way.
Total processing time (real time, not CPU time) was 78 seconds. It took me another minute to type those statements. So, in terms of both processing and programming time, it was efficient. Most of all, it is easy to read, so if I need to explain it to someone, or hand the program over because I am leaving a project where I was brought in as a short-term consultant, the transition is simpler.
I ran the frequency procedure, keeping only those locations with a percent greater than 4, sorted both datasets by location, and then created a new dataset from the original one that excluded the low-frequency locations.
When someone presents me with a more complex solution to a problem like this, I am the opposite of impressed [that is not the same as unimpressed. Unimpressed is null. People like that score negative on my impressed scale]. I’ve had people tell me, very condescendingly, that code like the above is wrong because it is inefficient and doesn’t minimize CPU usage. And I sit there thinking that CPU time was 39 seconds, so why do I care?
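For comparison, the sort-free alternative a critic might propose could be a single PROC SQL step with a subquery. This is a sketch, not a claim that it runs faster on the UCR file: a small made-up dataset named HATECRIME stands in for in.hatecrime, and the 4% threshold is computed over non-missing LOCCOD1 values, which is what PROC FREQ’s PERCENT uses by default. The log reports real and CPU time for each step; the FULLSTIMER option adds memory and other detail, which makes it easy to time both versions on your own data.

```sas
/* More detailed per-step resource statistics in the log */
options fullstimer ;

/* Made-up stand-in for in.hatecrime (hypothetical values):
   location 1 appears 20 times, location 2 nine times, location 3 once */
data hatecrime ;
  do i = 1 to 30 ;
    if i <= 20 then loccod1 = 1 ;
    else if i <= 29 then loccod1 = 2 ;
    else loccod1 = 3 ;
    output ;
  end ;
  drop i ;
run ;

/* One step, no PROC SORT: keep records whose location accounts for
   more than 4% of the non-missing LOCCOD1 values */
proc sql ;
  create table common as
  select *
  from hatecrime
  where loccod1 in
        (select loccod1
         from hatecrime
         where loccod1 is not missing
         group by loccod1
         having count(*) > 0.04 * (select count(loccod1) from hatecrime)) ;
quit ;
```

With this toy data, location 3 (1 of 30 records, about 3.3%) is dropped and the other 29 records are kept. Whether this counts as “better” is exactly the judgment call above: it avoids the two sorts, but a reader has to untangle a nested query instead of four short, familiar steps.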
HOW: This is the Hemingway part, I think. An expert programmer is able to put together those different pieces of knowledge, the what, when and why, integrate them with knowledge of some subject area, be it marketing, statistics, genetics or what have you, and come up with a solution that is greater than the sum of the parts.
A novice programmer just hasn’t put in the hours yet to learn a wide array of techniques that can be useful in solving a variety of problems. This in NO WAY implies the person is dumb or incapable of learning to be a fantastic programmer. He or she just hasn’t become that yet.
This is usually because the person is new to the field, but it can also be a result of a lack of interest or a lack of time. I don’t buy that it is due to a lack of opportunity. If you are anywhere with an Internet connection and you have a few bucks to buy a trial or learning edition of the software, there are tons of resources out there for you to learn from. There are even open source offerings like Linux and R that you can get for free.
The secret is to just hack away at it, and the deeper secret is to love it. Without going into boring details (unlike my usual habit), when my hotel turned out to be more one-star than four-star last night, I was extremely upset and frustrated. My solution was to sit up until 4 a.m. reading up on generalized linear models, link functions, canonical variates and response bias, and trying different things with PROC FORMAT.
When I do this kind of work, I’m happy and content (Mihaly Csikszentmihalyi would call it “Flow”) and so I work a lot.
I think I am a damn good programmer and statistician and I think that is the reason why. There isn’t a secret decoder ring. Sorry.