
Hybrid Open-Source SAS – or what are friends for?

I was reading an article in the Wall Street Journal on SAS software, in which CEO Jim Goodnight said,

“We’re producing so many new products. [They’re all] in the funnel and we’ve got to get them in production and it’s taking us longer to get them out the door…. A lot of the problems are testing issues. It’s taking too long to solve all the problems. Every piece of software has its bugs. And with more and more products, we’re struggling with compatibility to make sure the next release is easy to migrate to for certain customers. With the sheer number of software solutions we have, it’s making it harder and harder to develop and test them all.”

Which got me to wondering why SAS does not take advantage of its considerable user base. In another article from CBR Business Intelligence, it’s claimed that SAS has 45,000 customer sites. (Not that I’m stalking y’all or anything). I have no idea what the average number of users is at each site but let’s try a really low estimate of around 22. I say really low because I know that many of those sites are universities and corporations that have hundreds of users and it would take a heck of a lot of small shops to bring it down to 22, but let’s go with that. So, this gives them around a million users.
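For anyone who wants to check my back-of-the-envelope math, here it is as a quick sketch. The 45,000 sites is the figure from the CBR article; the 22 users per site is purely my own low-ball guess, not a number from SAS.

```python
# Back-of-the-envelope estimate of the SAS user base.
customer_sites = 45_000   # figure claimed in the CBR Business Intelligence article
users_per_site = 22       # assumption: a deliberately low guess, not a published number

estimated_users = customer_sites * users_per_site
print(f"Estimated users: {estimated_users:,}")
```

Bump the per-site guess up to something more realistic for universities and large corporations and the total only gets bigger.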

You would think that somewhere in those million users you would have people who would be happy to dive in and break things (also known as testing). One benefit of open source software is that you have people trying things all the time. When they come up with problems, they create fixes. Open source software SOMETIMES has disadvantages too: lack of documentation, lack of support. That’s not always true. In my experience, for large open source projects like Linux, the documentation is massive, as is the support within the user community. There can also be legal issues if a company is marketing something and it has someone else’s code included.

However, there is a hybrid option, that can happen on two fronts:

1. Testing.

When I was at USC, whenever we got a new version of any software or operating system, there were several people (including me) whose immediate reaction was,

“Let’s try to break it!”

We’d install it on every configuration of virtual machine, hardware and operating system. We’d try to use enormous files and analyze huge matrices of dependent and independent variables. We’d compare results from different applications. Ostensibly we did this as part of our job so that when anyone asked us a question we had a better answer than,

“Your guess is as good as mine.”

There was also the aspect of, as one of my co-workers said,

“They pay me to play with computers all day. How cool is that?”

The fact is when you create a piece of software of a significant degree of complexity (and SAS is certainly on the far right of the bell curve) it is damn near impossible to test every possible permutation –

“What if I run SAS on Linux running on VMware on a MacBook to read a 100GB dataset that was originally created by SAS on Windows in China using a double-byte character set and then exported using PROC COPY?”
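To get a feel for how fast the permutations pile up, here is a minimal sketch that just counts combinations. The dimensions and the values in them are made up for illustration, loosely following the question above; a real test matrix for SAS would have far more of both.

```python
from itertools import product

# Hypothetical test dimensions (illustrative only, not an actual SAS test matrix).
dimensions = {
    "host_os": ["Linux", "Windows", "macOS"],
    "virtualization": ["none", "VMware"],
    "dataset_created_on": ["Windows", "Linux"],
    "character_set": ["single-byte", "double-byte"],
    "transfer_method": ["PROC COPY", "other export"],
}

# Every combination of one value per dimension is a configuration to test.
configurations = list(product(*dimensions.values()))
print(len(configurations))  # 3 * 2 * 2 * 2 * 2 = 48
```

Five small dimensions already give 48 configurations, and every added dimension multiplies the total, which is why an in-house team alone can never cover them all.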

With a large enough user base, SAS could make copies of work in progress available for testing which would allow identification of problems. I can see many reasons that users would be happy to do this:

  • Consultants would see this as an opportunity to get ahead of the curve, using the newest software before it was available to the general public.
  • Students could use this chance to learn more about the latest software.
  • Just for the hell of it. (This is my motivation for most things.)

Fixing those bugs would still have to be done in-house to allow for quality control, documentation and liability issues. Hence the hybrid part. I’d be interested to see how having a large number of users turning the software inside and out could supplement whatever is done internally to minimize the bugs in software when it is released (and just accept the fact that there will always be bugs).

Someone suggested that SAS may be worried about damaging its reputation if it lets out software with lots of bugs in it. I don’t think that would happen if they were very upfront and said,

“Look, this is a work in progress. Run it through the paces and see what you think.”

as opposed to doing what some companies do and essentially releasing their beta version.

And, of course, no matter what you do or say, some people will complain and criticize because some people are just stupid that way.

“What! I can’t believe you did not see the importance of including a Serbo-Croatian to Mandarin translation function! And you say you have a comprehensive set of character functions, you fools!”

2. User-written macros

Back in the 1980s, there used to be a book (yes, an actual physical phone book type of book) of SAS user-written macros. Over the years, of course, many of those have morphed into SAS procedures. I remember learning SPSS because SAS didn’t do loglinear models, so this was quite some time ago, and there is not as much need for user-written macros now.

Of course, there are a zillion packages for R, which is true open source, but Stata, which is not, also has a host of procedures that are written by users. Raynald Levesque’s website has 140 macros for SPSS. SAS macros exist in diverse places: individual websites, SUGI/SGF and local user group proceedings.

With the data.gov initiative alone I think there is a lot of extensibility of SAS that is going untapped.

Lately, I’ve been pondering if I should be picking up some other language, partly just because it is good to learn new stuff, but also because it seems as if SAS is a bit behind on getting involved with some of these opportunities like with open government.

All the same points could be made about SPSS but they at least have the excuse that there are not as many users, not as many people writing syntax versus pointing-and-clicking and they have been bought by IBM so who knows what direction they are taking.

It’s just puzzling to me why, with such a strong user community, SAS is overlooking so much of the potential for being faster, better, smarter.


5 Comments

  1. As I noted on Twitter, I think at least part of SAS’ lack of a significant relationship with the userbase is the paradigm of the central product.

    As much as SAS is one of the standard tools of business analytics, medicine, and so forth, the core product is really from the 1970s. While there were time-sharing and a growing number of desktop systems (especially by the end of the decade), batch processing was still commonplace, and the general lack of interactivity in SAS pretty much comes from its batch processing roots.

    I think part of the batch processing idea is a certain sense of centralization, and it seems reflected in SAS itself. While there are, as you mentioned, third-party SAS macros, it’s not something that gets much official attention or support, and certainly not something that becomes almost quasi-official (as with some Stata packages).

    On the other hand, products like R and Stata are more recent and are part of a different design philosophy. Instead of taking a batch-file approach to handling a given problem, R and Stata both allow for all sorts of exploratory analysis on the desktop. Load in the data, type in some commands. There’s a direct connection with the work and the analysis that I found in those two that I didn’t get in SAS.

    Also, if I want to know how routines are implemented in Stata, I can look at the ado files and see how, say, GMM is implemented. I can learn to write programs that extend and expand upon what’s already provided. The same goes for R (even more so, since R is released under the GPL). I never got that sense of access and empowerment from SAS. (Not that SAS is bad — I find that it’s blazingly fast on huge datasets, for one thing. But it’s something where I don’t reach for it unless I have to.)

    Also, I think that while I’ve seen SAS in academia, the majority of contexts in which I’ve seen it and used it were in commercial applications, where I didn’t get a strong sense of collaboration among users. On the other hand, I mostly see R and Stata in academia, and I think that strong academic usage lends itself to people having problems, writing tools to solve the problems, and sharing them with other people in the hope they may find it useful.

    (I cannot speak for SPSS, since I have no experience with it. But I notice something similar to R and Stata with MATLAB and Octave, and more recently with Python packages for numerical and scientific computing.)

  2. This just doesn’t fit with how SAS do business. They seem to make a lot of money by being a closed shop.
