So, I am writing a paper on how you know that you (or someone else) are a “real” programmer. That is, someone who doesn’t fit in that “new user” box anymore. But how do you make that decision?

Is it like pornography, you just know it when you look at it? (Not that I ever personally looked at any of course, but I have heard you can find it on the Internet if you try really hard.)

Yesterday, Rob Meekings made a comment about design decisions. That is certainly a distinction, when you get to the point that you are actually thinking that way. For example, I often will merge everything together in one long dataset, a habit that makes those who love SQL and the star schema cringe. The REASON I do this is that most of the people I work with are researchers using very powerful computers with datasets of a few thousand observations, or, at most, a few hundred thousand. Even on a desktop, an analysis with SAS, Stata or SPSS takes seconds. It isn’t worth taking an extra hour or two to make a program run in one second instead of two. The optimized version may also be more difficult for the users to maintain themselves.

HOWEVER, when I am running a program that runs against a 100GB dataset and can take hours to run because the researcher cannot use a supercomputer, e.g., due to security classification, I’ll spend a good bit of time trying to make it run as efficiently as possible.

If there isn’t a pressing reason not to do it, I’d recommend that someone with a large dataset consider running it on a cluster to take advantage of parallel processing capabilities. This means changing your code slightly to run on a different OS, often Linux or some other Unix version.

I do a lot of “throw-away programming”. That’s not to say it’s garbage. Sometimes I think my work is quite good, in fact, but it’s not production code that runs every day to produce reports on 500 different stores. When I DO write production code, I do several things differently. One is that I make good use of %include statements. For example, if there is a footnote that is going to be in every single output that says, “Funding provided by National Science Foundation Rural Systemic Initiative Grant #1234-2010” and several more lines about the university, address for contact, etc., I am going to have a small file that I just include. Yes, I could copy and paste it or have that as a template for when I create a new program. BUT what happens when we get another grant and we want to recognize both funding agencies in everything we publish? With an include file, I change one file. With copy and paste, I have to find and edit every program.
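Here is a rough sketch of what that might look like in SAS. The file name, the path, and the wording of the second footnote line are all made up for illustration. Only the funding sentence comes from the example above. First, the small shared file:

```sas
/* funding_footnote.sas: hypothetical shared file, kept in one place.    */
/* footnote2 is placeholder text standing in for the university details. */
footnote1 "Funding provided by National Science Foundation Rural Systemic Initiative Grant #1234-2010";
footnote2 "University name and contact address go here";
```

Then each production program pulls it in with one line:

```sas
/* Any report program. The path below is an assumption; point it at */
/* wherever the shared file actually lives on your system.          */
%include "/projects/shared/funding_footnote.sas";

/* sashelp.class ships with SAS and stands in for the real data. */
proc means data=sashelp.class;
run;
```

When the second grant arrives, the only edit is to funding_footnote.sas, and every program that includes it picks up the new wording the next time it runs.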

My point, and you may be surprised by this point to find that I do, in fact, have one, is that a distinction between novice and non-novice programmers is that the non-novices have the luxury of thinking about a design because they know more than one way to do something.
