One of the many questions on start-up accelerator applications that make me go “Hmm” is this one:
How many lines of code have you written?
I have heard of, but thankfully never worked at, organizations that evaluated their technical staff by the lines of code written.
Let me give you two stories that illustrate why this is a bad idea.
Once upon a time ….
Many years ago, I worked at an organization that decided the programming staff was overpaid and generally had a bad attitude. (No, this wasn’t due solely to me. In fact, unbelievably, I was one of the easier to get along with people on the technical staff). So … they hired some people at low salaries who had, I believe, a three-month training course in SAS. Most of the senior people avoided the cube farm where these new hires were housed, believing that it would be apparent soon enough that you get what you pay for.
I would generally come in around 10:30 or 11 and leave the office around 8 pm. I couldn’t help but notice several times that some of these new programmers were still there when I left. Leaving one evening, I saw one woman in tears in her cubicle, so I stopped and asked what was the matter. She said she had come into the office at 6 a.m. and was still waiting for her program to run. I sat down with her and looked at her program, which was a simple thing to create a few total and subtotal scores and get statistics on these by state. Her program looked like this:
LIBNAME in "directory" ;
Data Alabama ;
set in.us ;
If var1 = . then var1 = 0 ;
If var2 = . then var2 = 0 ;
If var3 = . then var3 = 0 ;
Total = var1 + var2 + var3 ;
If state = "Alabama" ;
run ;
Proc means data = alabama ;
var total ;
run ;
REPEATED FOR EVERY STATE, 51 times in all (50 states + Washington, DC), for a total of 562 lines of code (there is only one Libname statement).
The reason it was taking so long is that she was reading in this dataset, with its millions of records, 51 times. There are many ways this could be fixed. Since I was on my way home, I sat down and did this:
libname mydata "directory" ;
data test ;
set mydata.us ;
total = sum(var1,var2,var3) ;
keep total state ;
Proc tabulate data= test ;
class state ;
var total ;
Table state ,(total*(n*f=comma12.0 (mean std)*f=comma8.2) );
run ;
My program was 10 lines, read the dataset in once and produced a nicely formatted table.
So, was she 60 times more productive? I don’t think so.
Story number two happened in the last week. I have been working on improving our two games, Spirit Lake and, particularly, Fish Lake. A major improvement has been merging multiple scripts into one.
Here is what we did with our prototype, since we had to meet a deadline:
- Wrote a script to handle multiple choice tests.
- Wrote another script to handle tests that had an integer or decimal answer.
- Wrote a third script to handle tests that had a fraction as an answer, like 4/5 , to be sure it also accepted 8/10, etc.
- Wrote a fourth script to handle tests where the answer was dragged and dropped.
Now obviously, debugging would be simpler if we had only one or two scripts. So, this week, I have been taking a couple of scripts, making them more generalizable, and deleting many others.
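To give a feel for what "more generalizable" can mean here, this is a minimal sketch in Python, not the actual game scripts (which aren't shown in this post). The function names parse_number and is_correct are made up for illustration. The idea is that a single checker compares answers as exact numbers when both sides look numeric, so 8/10 and 0.8 both match 4/5, and falls back to a plain text comparison for things like a multiple choice letter or the label of a dragged item.

```python
from fractions import Fraction


def parse_number(text):
    """Turn '4/5', '0.8' or '3' into an exact Fraction; return None if it is not a number."""
    try:
        # Remove spaces so that input like '8 / 10' is accepted too.
        return Fraction(text.strip().replace(" ", ""))
    except (ValueError, ZeroDivisionError):
        return None


def is_correct(response, expected):
    """Hypothetical one-size-fits-all checker for several question types."""
    got = parse_number(response)
    want = parse_number(expected)
    if got is not None and want is not None:
        # Integer, decimal and fraction answers: compare exact values.
        return got == want
    # Multiple choice letters or drag-and-drop labels: compare as text.
    return response.strip().lower() == expected.strip().lower()


if __name__ == "__main__":
    print(is_correct("8/10", "4/5"))
    print(is_correct("b", "B"))
```

One function like this stands in for what would otherwise be separate integer, decimal, fraction and multiple choice checks, which is the kind of merge that shrinks the line count while making the code easier to debug.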
Another thing I’ve done is create a CSS style sheet for each game and link to it from each file, instead of having the common classes defined in every page.
The number of lines of code in the project has gone DOWN by hundreds, but I think the ease of maintenance and documentation has gone UP.
Now, if you asked me how many lines of code I have written in my life, that might be a relevant question. (True story, I once worked on a job where I did repeated measures ANOVA so many times for so many projects, I got so bored, I started writing statements backward beginning with the semi-colon.)
Well, I’d better get to bed, since it is well past midnight, I have seven teenagers sleeping over at my house, and I have to get up in the morning and take them all to Disneyland for The Spoiled One’s birthday.