Semi-programming as a way to simplify life

By AnnMaria De Mars, December 8, 2012

More about the Los Angeles Basin SAS Users Group (LABSUG) later, but I did want to mention one tangential point from the first presentation. It was on the Graph Template Language (GTL). The first example was pretty cool: the population pyramid by gender and year for the United States, then for Qatar for the same years. That is obviously hundreds of numbers to plot, age by gender for two countries for two years.

You can read how to make this in “Off the Beaten Path,” the paper presented at SAS Global Forum this year.

So, this plot was cool, but there were others where I looked at the plot and thought, “Gee, I’d just have an artist draw that.”

My take-away from that session was that you COULD graph anything any way you wanted with GTL. Whether it was worth the effort is another story.

This brings me to “semi-programming” which is a term I just made up. (Not to be confused with semi-infinite programming, which is actually a thing, or quasi-programming, which is what I was going to call it until I found out that was another actual thing.)

Semi-programming is when it is simpler to program half the solution and do the other half some other way.

SEMI-PROGRAMMING EXAMPLE #1

Let’s say your boss wanted you to create a bubble plot of the 14 most popular animal species at the Santa Barbara Zoo, with each bubble’s size relative to the mean weight of animals of that species. I’m sure you could do something with SAS or some other program with a gazillion programming statements to draw capybaras and parrots and other species on your chart. Or, you could do the bubble plot, find photos of the 14 species on Wikipedia, paste them onto the plot, and drag each photo to the right size to just fit over its bubble. You could probably do the whole thing with three statements, a few minutes on Google and some copying and pasting, and be done in half an hour.

Spare me the question,

“What if you have to do it again?”

My initial reaction is that then you quit, because your boss is an ass if she regularly asks you to do stuff like that. Seriously, though, you could create a bunch of these with the semi-programming method in the time it would take you to do ONE purely by programming.

SEMI-PROGRAMMING EXAMPLE #2

Not convinced? I had another example today. I want to merge my pretest and posttest groups. Because they are very valuable, I don’t want to lose a single subject. Unfortunately, our subjects are children and, due to very strict concerns about confidentiality, we have no personally identifying information. We merge the data sets by username, grade and school. In a few cases, the kids did the test twice. It’s online. They accidentally submitted the test, then realized they had skipped a question or two, so they opened it again and continued. (There’s a reason we want them to be able to do that.)

So, we have two identical records for that child.

Sometimes, though, the same username shows up twice because kids at two different schools had similar usernames and one mistyped it. So, your username might be MightyMan and mine is MightyMean, and I accidentally left out the “e”. In that case, it is NOT the same kid twice. One way to tell a real duplicate from an error is to sort the data sets by school, grade and then username and merge them. But what if the student didn’t enter the name of the school, which a few did not? Or entered the wrong school? (Yes, that happened.)

I thought about all kinds of complicated solutions to this until it occurred to me that there were probably no more than 5 or 6 records that were really a problem. So, here is my semi-programming solution.

1. Write a program that does this:

Sorts records by username, grade and school
If there is no duplicate (it is the only combination of username, grade and school), output that record to one data set
If there is a duplicate, output those records to the other data set

2. Look at the 5 or 6 records that are duplicates. Identify which are the same student twice, and in that case delete the incomplete record. If it is an error, figure out who it really is (for example, see that MightyMean is really at school number 2) and add the missing “e” to the username.
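Step 1 takes only a few lines. The post does not show the actual SAS program; in SAS this kind of check is commonly a PROC SORT followed by a DATA step that uses the FIRST. and LAST. BY-group variables. The Python below is only a stand-in sketch under stated assumptions: each record is a dict, and the field names (`username`, `grade`, `school`, `answered`), the function name `split_duplicates` and the example records are all made up for illustration.

```python
from collections import Counter


def split_duplicates(records, key_fields=("username", "grade", "school")):
    """Split records into (unique, duplicates) on the given key fields.

    Hypothetical helper: `records` is a list of dicts and the field
    names are illustrative assumptions, not the real data set's variables.
    """
    def key(rec):
        # A missing value (None) is kept as-is, so two records that both
        # lack a school still count as the same combination.
        return tuple(rec.get(field) for field in key_fields)

    # How many records share each key combination
    counts = Counter(key(rec) for rec in records)

    unique, duplicates = [], []
    for rec in records:
        if counts[key(rec)] == 1:
            unique.append(rec)      # only record with this combination
        else:
            duplicates.append(rec)  # a human needs to look at these

    # Sort the problem records so look-alikes sit next to each other;
    # str() keeps None and numbers comparable (display order only).
    duplicates.sort(key=lambda rec: tuple(str(v) for v in key(rec)))
    return unique, duplicates


# Made-up example records
records = [
    {"username": "MightyMan", "grade": 4, "school": "1", "answered": 28},
    {"username": "MightyMan", "grade": 4, "school": "1", "answered": 30},
    {"username": "MightyMan", "grade": 4, "school": None, "answered": 30},
    {"username": "Zoomer", "grade": 5, "school": "2", "answered": 30},
]

unique, problems = split_duplicates(records)
for rec in problems:
    print(rec)  # the short list to fix by hand in step 2
```

Dropping `"school"` from `key_fields` would also push records with a missing or wrong school into the problem pile, at the cost of a few more records to eyeball by hand.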

Since I expect no more than half a dozen of these, it will probably take me less than a minute to fix them manually, after I have my little program to spit out the problem children.

Semi-programming. Think about it.

Dr. De Mars General Life Ramblings


3 Comments

disgruntledphd says:

December 8, 2012 at 5:29 am

While I agree with your major point, I typically find (especially with your duplicate records example) that if I do it that way, then I normally have to repeat an extremely similar process soon after, while if I do it in a fully automated fashion then I never use that damn script again.

I suppose the moral is, don’t automate until you have to do something for the second time.
Jef Allbright says:

December 8, 2012 at 10:00 am

Also on the subject of the optimal mix between more reductive (programming) and more holistic approaches to a new problem, where the problem is even moderately complex, it pays to work through the solution manually _before_ starting to automate it. Almost always, edge cases and previously unrecognized facets will be discovered, as well as shortcuts and opportunities for refactoring.

Likewise, I have found that before working with a new dataset, it’s worthwhile to “simply” browse through the data, looking especially at the beginning and end, assessing maxima, minima, gaps and outliers, and other patterns or features of interest, _before_ proceeding with automated analysis.
AnnMaria says:

December 8, 2012 at 4:37 pm

The moral is: don’t automate until you have to do it for the second time. I think that is a perfect moral!

As for Jef’s approach, I wholeheartedly agree. That was going to be the first point in a paper on macros that I never got around to writing: make sure it runs as regular code first. I’m also with you on taking half an hour or so to eyeball the data. I DID write a couple of macros for that, to produce the mean, minimum and maximum, print the first five rows, etc.
