I read Allen Engelhardt’s post this morning on R vs SAS/SPSS in corporations, and it motivated me to set aside my infinite to-do list and write about something I’ve been thinking about for a long time.
Since Allen writes on R-bloggers, it will surprise no one that his conclusion was that R is preferable to SAS and the main obstacle to its use is the inertia and ignorance of executives and HR departments. What may surprise some people is that I agree with him that there may be cases where R is preferable, although not for the same reasons he gives, and that SAS Institute has some serious issues it needs to address, although looking at it from the side of someone who likes and uses SAS, I see different problems.
As someone who has used SAS daily for 29 years, I disagree with some of Mr. Engelhardt’s reasons both for and against SAS. I do agree, though, that there are some serious issues that, unless SAS Institute starts taking them seriously, may eventually end with SAS going the way of WordPerfect or COBOL.
Engelhardt said that one reason R is not the choice for corporations is
“R takes talent to use. (That is kind of why we like it.) It takes talent to maintain. My problem as the manager of a commercial analytical insights team is that it is very hard for me to retain that talent.”
I quoted this so you would not think I made it up. I thought of incredibly brilliant people like Rick Wicklin, the author of Statistical Programming with SAS/IML software. The first paper I pulled up at random in my notes from SAS Global Forum was An Overview of Survival Analysis using Complex Sample Survey Data, by Dr. Patricia Berglund. I could add a vast number of examples of SAS users who are not talent-less hacks, but you get my point.
He’s incorrect in assuming most of the people who use SAS use the menu-driven SAS Enterprise Guide, Enterprise Miner, etc. I’ve been to many user group meetings and conferences where, when the room is asked how many do, fewer than 10% raise their hands. (Non-random sample, I know.) But in 29 years in diverse organizations I see the same thing – the great majority of people who use SAS write code. Those who use it for very long write macros, create their own formats, extend it with CSS, Perl, Python, IML and sometimes even R. Assuming R = talented, SAS = pointing, clicking drone is a bit over-simplistic.
With SPSS, I’ve seen the opposite and I agree on that point. People who are SPSS users are hardly likely to abandon it for R – yet (see below for why they may). I was once speaking with a developer at SPSS about a problem and he asked me, as one of the standard questions, “Do you write syntax?” Then, because we had been talking for a while already, he caught himself and said, “Of course you do.” My point is that the assumption was that you did not use syntax, and, again, in my admittedly non-random sample over 25 years of using SPSS, that assumption has been increasingly borne out ever since menus became an option.
So, I disagree with his assumption that R people are just more talented (although that was popular with readers of R-bloggers) and I am not completely sold on his disadvantage that SAS costs corporations a lot of money. I think Mr. Engelhardt over-estimated the ignorance of executives and under-estimated the cost of the vast body of legacy code out there.
As I have said before,
Re-writing everything to run on free software is only a good deal if your time has no value.
I think he under-emphasized this for corporations: the enormous COST of replacing legacy code. You’d need to re-write the code, re-write the documentation and re-train the employees. Anyone who has written much code, especially for a complex system, realizes that it will not work right out of the gate. For a while, you will be running two parallel systems. That’s expensive. You will need to keep all of your SAS people until you have your new system up and running. Will you have those people learn R? As Engelhardt notes, there is a difference between reading an introductory book and being an expert. Will you hire new people with years of experience with R? Then what will you do with your SAS people? Fire them all? I presume they have other knowledge of statistics, your industry, etc. that you might want. Will you just take the SAS code and re-write it in R? As anyone who has worked in corporations on large systems will guarantee, a lot of that code “grew like Topsy”. It can be improved because you probably have patches on top of patches. What do you say to your manager when your R code has a bug and quits running? (This happens to everyone, but remember, you are replacing a system that was running with a new one that, although made with free software and better in many ways, is not running.) Also, does that mean your people who are writing the R code are going to be well-versed in SAS, too? Or are you going to have one of those talent-less SAS people you are going to fire sit next to you and tell you what each piece does?
I said this before, but who is going to write the documentation of everything the program does and how to maintain it for when your talented R person leaves?
So why should SAS (and SPSS) be worried about R?
First of all, for those people and organizations that do NOT have legacy code, the major barrier I just talked about is removed. If you are a new company, you don’t have any legacy. There is no cost of re-writing, re-documenting anything. If you are a student, your time doesn’t have any value to anyone but you. This is why R is so popular among students, and this should make SAS very, very worried. Yes, lots of students hate R, but lots of them hate SAS, too (more about that in a minute).
A few days ago, I was at a SAS USERS GROUP MEETING and three people sitting around me were discussing using R to teach students. One person said that the students would hate it because it was too difficult, whereas a second professor countered that he had used RStudio and it was not that difficult. The third chimed in that he had used it in graduate school. Again, this is not a random sample but rather one that should be biased toward SAS. These are people who are interested in SAS enough to attend users group meetings, and yet they were discussing the benefits of switching to R. One had already done it, a second was at least considering it, though unconvinced, and the third saw no problem with it.
A major reason that people, especially in academia, consider switching to R – or a piece of slate and a sharp rock for that matter – is that the SAS installation process BLOWS. If you have never had to install SAS, let me just tell you that it is bad beyond imagination and has been for thirty years. I remember in graduate school, using SAS 5, how every time we had to renew our license and I had to get things working again the SPSS people in sociology would laugh at me. It has only gotten worse. A month ago, I was having lunch with the SAS administrator at a large university and she told me she hated SAS. She tells people to switch to JMP or SPSS every chance she gets. I asked about SAS On-demand and she said that almost every single person had a problem installing it. At one point, I was the SAS administrator for a large university and about 10% of the people had trouble installing SAS. These are not stupid, lazy people. They’re faculty and researchers at a prestigious institution.
I used SAS On-demand for the statistics course I am teaching. Here is what I did:
- Tested everything myself and registered a month before class.
- Made a PowerPoint showing, step by step, how to get the software.
- Made a MOVIE of how to get and install the software that students can watch to review the steps
- Demonstrated in class how to get a SAS profile, register for the course and download and install the software.
Obviously I did this because I believed learning SAS would benefit my students, but it took quite a bit of time I would not have had when I was an assistant professor trying to get tenure.
As it is, about half of my students have been able to use SAS On-demand. Why? Mostly because it doesn’t run on a Mac (more on that later). Those who had Windows were able to get it to run by the third week of class. One student, however, could not get it to run. I tried uninstalling and re-installing it. Still didn’t run. In the end, he received this message from SAS Technical Support, who were no doubt correct:
It sounds like you may have a registry key that is acting up. Lets try the following:
1. Reboot your system.
2. Log in as the Administrator.
3. Close all applications including anti-virus software (even if it is just running in the background).
4. Go to the system registry by clicking on Start>Run and type:
5. Examine the following Windows registry key:
If it contains FileRenameOperations or PendingFileRenameOperations, delete this key, and retry your SAS installation.
Warning: Always back up your registry before you make any registry changes. For assistance, see Windows Help, Microsoft documentation, or the Microsoft Windows Web site. SAS is not responsible when you edit the Windows registry: changes in the Windows registry can render your system unusable and will require that you reinstall the operating system.
After removing these keys, continue on with the installation.
I am not faulting SAS Technical Support. They are probably right, this was probably the problem and it probably would have worked. I have done similar things getting Enterprise Miner to work on a computer once and it did work. The problem is that when you send this to a student who is just trying to pass a statistics class, and Advanced Quantitative Data Analysis is not a fluff course to begin with, their response is going to be, and I believe this is a direct quote, “Fuck it!”
The student asked if he could use a different software package he had used as an undergraduate and I said sure, go ahead.
This type of problem does not occur often – this was one out of 10 or 11 students who tried to install SAS – but when it does, this student becomes like the SAS Administrator I mentioned above. They both hate SAS. This cannot be good.
After the problem of installation, the biggest problem SAS has is that it does not run natively on a Mac, and SAS On-Demand doesn’t run on virtual machines, either.
Of the 17 students in my class, 7 or 8 have Macs. When I required SAS On-Demand, I found that it did not run on a virtual machine, so I had to partition my hard drive, install Boot Camp, buy a copy of Windows 7 and install that. Since I am using this for a class, I was able to get Windows 7 for under $50, so it was not a big deal for me. Still, my “free” version of SAS has now cost me $50, which is as much as or more than many student licenses for statistical software. Also, there is the time part. I like playing with computers, so installing Boot Camp and partitioning the hard drive was pretty effortless (your mileage may vary), and downloading and installing SAS On-Demand took very little time with the very, very good connection we have in our office.
I have taught statistics at three private universities in California in the last several years (again, a non-random sample). At one, 25% of the students had a Mac. At the other two, it was closer to 50%. According to tech support, this was what they saw campuswide. Perhaps if you can afford $30K and up for tuition you buy more expensive computers. This was also something the folks at the SAS user group mentioned about R – you know it runs on Mac and Unix, too.
A few of the students did what I did: installed Boot Camp, installed SAS On-demand, and it worked fine. The only problem now is that much of your other software, like PowerPoint and Word, is probably on the Mac side. You can do what I do and install OpenOffice, which I really like, but now you are taking more time to install Boot Camp and install OpenOffice – so the time advantage of using SAS over R is starting to disappear.
The final problem – the free cloud-based service, SAS On-Demand, is pathetically slow. I’m holding out hope for that one, though, because it has improved so much from a year ago, when it was just useless. Useless to usable and decent but slow is a pretty big leap.
Why I Recommend SAS Anyway – for now
There are advantages, too.
First of all, amazing technical support. Engelhardt just brushes this aside, but SAS tech support is AMAZING. See the answer above. If I really wanted to get SAS working, and I was that student, I’ll bet it would work. I called them the other day because a client needed the equations used to calculate power in PROC POWER because her dissertation committee required it (no, I very seldom have clients who are students because they can’t afford our fees, but this was a special case). I got transferred to the right person and got an answer in 10 minutes. Or you can read here about the amazing Tom from SAS Technical Support (not to be confused with the creepy Tom from MySpace). See this post, I see smart people, for more details on both the problems with SAS installation and the amazingness of technical support.
Compare this to SPSS where I have sat on hold for 45 minutes, as the norm. (This was before they were bought by IBM, it may be better now.)
Second, SAS has a huge user group base. Their user groups are amazing. I know R has meet-ups and meetings that are becoming more common around the country. From what I have seen, though, the SAS user groups are growing in size and activity as well. Orange County is starting a new user group, the one in San Diego meets quarterly, LA has annual meetings and we were discussing at WUSS possibly making this semi-annual. There is SAS-L and its archives, which are a fountain of information, and the growing SAScommunity.org. Did I mention their user groups are amazing? They have regional user group meetings, PharmaSUG and SAS Global Forum, which is amazing cubed. All of the regional user groups offer student and junior professional scholarships, including travel, to allow people starting their careers to attend for free, learn and network.
Third, SAS does EVERYTHING. This might be why it takes the sacrifice of a flamingo to get it to install sometimes, but once installed it can be used for anything. More than once, when someone has had a problem computing a statistic, I’ve heard someone sniff, “Well you could do that in R.” Believe me, whether it is reporting with columns in alternating chartreuse and magenta, running a nightly analysis of your data that is uploaded to the web at 2 a.m. or analyzing a complex national survey, SAS does it.
Because SAS does everything, including being great for analyzing huge and complex data sets, really great statistical graphics, maps, every flavor of report and every type of statistic, there are jobs out there in those corporations now. That is the main reason I chose it for my students. Many of them are mid-career professionals getting a Ph.D. and there will be SAS jobs available when they graduate and for the 10 or 15 years remaining until they retire.
For younger students, and down the road, I think unless SAS Institute can get SAS On-demand working and fix its installation fiasco, there are going to be some serious problems. That makes me sad because I think SAS On-Demand could be insanely great and SAS Institute is completely missing it. This must be how Steve Jobs and Steve Wozniak felt when they saw the first GUI and mouse at Xerox.
Dudes! This could be insanely great! Don’t you see that?
Apparently, they don’t. If nothing else, they should license it to some start-up that will realize that potential. If you are interested in that, holler and I’ll holler back.
More on that later. This post is already thousands of words longer than I meant to write today, I have a paper to write and a contract to price, and the rocket scientist is asking why we live by the beach in Santa Monica if I won’t walk down and have a drink with him while overlooking the ocean. Having no answer for that, I’m heading out for Chardonnay.