I'll get this down eventually

Writing a presentation for WUSS, I had to fill out the usual check box for the intended audience:

Level of programming expertise:

___ Novice __ Intermediate __ Advanced

and I started wondering when exactly does someone stop being a novice? One answer is that your programming no longer LOOKS like it was written by a novice. That’s kind of circular reasoning, though, isn’t it? To be more specific, here are a few of those signs, generated from a survey of a random sample of 1.

(Note, if your programming does not always show all of the characteristics mentioned below, you are forbidden to feel bad. All but a very exceptional few programmers will admit to having made every ‘newbie’ mistake when they started, and on occasion, they still do when they are rushed, tired or distracted by three fighting children or after their third martini. As for that exceptional few – they’re chronic liars. Stay away from them.)

Five signs you’re no longer a novice, in no particular order ….

1. Good use of functions

AvgQtr = (Jan + Feb + Mar) /3

is a sign of a novice

AvgQtr = Sum(Jan, Feb, Mar) /3

is better

AvgQtr = Mean(Jan,Feb, Mar)

is what an intermediate programmer would do.
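Why does it matter? Here is a minimal sketch with made-up numbers showing how the three versions behave once a month is missing. The dataset name quarter and the variables Avg1 through Avg3 are my own, just for illustration.

```sas
/* Made-up data: the second row has a missing value for Feb */
data quarter;
    input Jan Feb Mar;
    Avg1 = (Jan + Feb + Mar) / 3;   /* missing if any month is missing */
    Avg2 = sum(Jan, Feb, Mar) / 3;  /* sums the non-missing months, still divides by 3 */
    Avg3 = mean(Jan, Feb, Mar);     /* divides by the number of non-missing months */
    datalines;
10 20 30
10 . 30
;
run;

proc print data = quarter;
run;
```

In the second row, the first version comes out missing, the second gives 13.33 because it still divides by 3, and MEAN gives 20 because it divides by the two months that actually have data.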

2. You know options of options
3. You understand how the particular language you are using processes data.

For example, in SAS, let’s say you have two datasets

Pretest has the following variables: Id Age Gender Testscore
Where testscore is (obviously) the pretest score.
Posttest has the same variables: Id Age Gender Testscore
Where testscore is (obviously) the posttest score.

If you do this (bad!)

Proc sort data = libref.pretest ;
By id ;
run ;
Proc sort data = libref.posttest ;
By id ;
run ;
Data libref.alltests ;
Merge libref.pretest libref.posttest ;
By id ;
run ;

You have just created a dataset that is effectively a copy of posttest: for every id that appears in both datasets, the testscore from the second dataset named overwrites the testscore from the first, so the pretest scores are lost.

Try this:

Proc sort data = libref.pretest out = pre (rename = (testscore = pretest)) ;
By id ;
run ;
Proc sort data = libref.posttest out = post (rename = (testscore = posttest)) ;
By id ;
run ;
Data libref.alltests ;
Merge pre post ;
By id ;
run ;

Yes, you COULD have done this with one or more extra data steps that rename the testscore variable, but adding an extra step is inefficient.
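If you want to try this without touching real data, here is a self-contained sketch. The values are made up, everything lives in the WORK library instead of libref, and the Gain variable is my own addition to show that both scores survive the merge.

```sas
/* Made-up data, for illustration only */
data pretest;
    input Id Age Gender $ Testscore;
    datalines;
1 25 F 60
2 31 M 55
3 28 F 70
;
run;

data posttest;
    input Id Age Gender $ Testscore;
    datalines;
1 25 F 75
2 31 M 68
;
run;

/* Sort and rename in the same step, using the OUT= and RENAME= options */
proc sort data = pretest out = pre (rename = (Testscore = Pretest));
    by Id;
run;

proc sort data = posttest out = post (rename = (Testscore = Posttest));
    by Id;
run;

/* Both scores now have their own variable, so nothing is overwritten */
data alltests;
    merge pre post;
    by Id;
    Gain = Posttest - Pretest;  /* missing for Id 3, who has no posttest */
run;

proc print data = alltests;
run;
```

Id 3 took only the pretest, so that row keeps its Pretest value and has a missing Posttest and a missing Gain.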

A good, short article on beyond the basics in proc sort was written by Kelsey Basset.

4. Use your knowledge of functions in your programming logic.
5. Don’t forget about missing values.

For example, a researcher wants to categorize people who have ANY positive response to five questions on raising taxes, “Would you vote to raise taxes if … the state budget isn’t balanced?” “Would you raise taxes if … the option was to cut social services?” and so on.

A novice response would be:

If q1 = 1 then taxes = 1 ;
Else If q2 = 1 then taxes = 1 ;
Else If q3 = 1 then taxes = 1 ;
Else If q4 = 1 then taxes = 1 ;
Else If q5 = 1 then taxes = 1 ;
Else taxes = 0 ;

Better

If sum(of q1-q5) > 0 then taxes = 1 ;
Else if sum(of q1-q5) = 0 then taxes = 0 ;

The reason for having the second IF in there is that if you use a plain ELSE instead, everyone with missing values on all five questions gets set to zero, which may throw off your results by a great deal, depending on how frequent missing data is.

There are a variety of ways to do this, some better, some worse. However, one statement that does exactly what we want is:

Taxes = Max(of q1-q5) ;

If any of the questions were answered 1, the value of taxes is 1. If every question that was answered was answered 0, the value is 0, and if all were missing, the value is missing.
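A quick way to convince yourself is to run both versions side by side on a few made-up rows. This sketch assumes the answers are coded 0 and 1. The dataset name survey and the variables TaxesIf and TaxesMax are mine, for illustration only.

```sas
/* Made-up answers, assumed to be coded 0 = no, 1 = yes, . = missing */
data survey;
    input q1-q5;
    /* Two-IF version: stays missing when every answer is missing */
    if sum(of q1-q5) > 0 then TaxesIf = 1;
    else if sum(of q1-q5) = 0 then TaxesIf = 0;
    /* One-statement version */
    TaxesMax = max(of q1-q5);
    datalines;
0 1 0 0 0
0 0 0 0 0
. . . . .
0 . 0 . 0
;
run;

proc print data = survey;
run;
```

The two variables should agree on every row: 1 for the first, 0 for the second and fourth, and missing for the third, where nobody answered anything.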

I saw a similar example from SPSS on Douglas Smith’s page. Although Recode is actually a command and not a function, my point is the same. Once you proceed from being a novice, you are naturally seeing the ways you can make your program more efficient.

“Another example of using recode might be to invert the order of the values for a subjective evaluation variable. For instance, the variable “happy” has three valid response categories:

1 = Very Happy
2 = Pretty Happy
3 = Not Too Happy

You might want to change the order to go from least happy to most happy. To do this, all you need to do is swap the values 1 and 3. The recode statement that will accomplish this is:

recode happy (1=3) (3=1).”

Oh, and if you don’t use the command window, much less the Do-file editor in Stata, you are definitely a novice. Same goes for anyone who doesn’t write syntax for SPSS or hasn’t found a use for the Program window in SAS Enterprise Guide.

That isn’t to say that there will never come a day when one can be considered a programmer by simply being very good at pointing and clicking.

Just sayin’ …. today is not that day.

Comments

3 Responses to “Signs you’re not a novice programmer”

  1. Tweets that mention Signs you’re not a novice programmer : AnnMaria’s Blog -- Topsy.com on July 4th, 2010 4:21 am

    […] This post was mentioned on Twitter by annmariastat, Yossi Levy. Yossi Levy said: Sharing: Signs you’re not a novice programmer http://bit.ly/dmtg0t […]

  2. Rob Meekings on July 5th, 2010 8:58 am

    At the risk of completely missing the point, would you consider using a hash table for (3), above? For a little bit more code you’d get a lot more control, reduce I/O and remove the need to sort?

    I suppose, (trying to get back on topic,) that getting someone to talk through their code, to defend or justify the “design” decisions they’ve made in their choice of tools and approach, could be a lot more illuminating than forming a judgement based on a code snippet in isolation.

    Could, or should, some of this be captured in comments in the code?

    Rob

  3. admin on July 5th, 2010 1:55 pm

    Personally, I would think a hash table would be past intermediate, but you’re right, I would usually take it as a sign someone was not a novice.

    Of course you are right that no one should make a decision based on a snippet of code.

    What I was speculating about (sort of musing out loud as I am writing a paper on it) is how do you decide someone (you or anyone else) is a novice versus intermediate programmer? We’re required to check this box for all sorts of things – specifications to HR for a job description, intended audience for a presentation, our own skill level, references for jobs.

    When we say someone is an intermediate programmer what exactly do we have in mind?

    I think comment statements are an excellent example of that (thanks!) and, as you pointed out, illuminating comments, not the kind you see after a

    Data fabulous ;

    /* This creates a dataset named fabulous */

    Sometimes the reason WHY someone did something is more interesting than what they did.
