
The Next Big Thing

I’m in Seattle this week, at SAS Global Forum, and it is even better than usual. I go to several conferences each year, some because I am presenting, some because there is a topic that particularly interests me, but there are three I go to every year. Of these, SAS Global Forum is the one I would absolutely not miss. It is not for those on a limited budget, but it is worth it. You get the chance to meet A LOT of the smartest people in the world. Seriously. And I have a basket of degrees and am married to an honest-to-God rocket scientist, so my bar for “smartest people in the world” is pretty high.

One of the other two I always attend is the Western Users of SAS Software conference: you learn a lot, it’s relatively inexpensive and not far to travel. Lots of bang for the buck. The other is the SPSS Directions conference.

At ALL of these, and in general, in the back of my mind all of the time, I am looking for “the next big thing”. Whether as an individual, a university or a company, I think that to stay competitive in the long run you need to be ahead of the curve, or, as people who want to be smart-asses call it, on the “bleeding edge”. Think about it: if you were teaching statistics twenty years ago, you had the choice of having your students learn SPSS, SAS, SYSTAT, BMDP or Minitab. Of those, BMDP, which was “for real statisticians” (kind of like the R of its day), is one I haven’t seen used in years. I thought SYSTAT was off the market, but I did see an ad for it recently and was surprised to learn it still existed.

If you had taught your students SAS twenty years ago and they stuck with it, they are much more marketable now than if you had made any of the other choices. My definition of marketable is based on how many jobs are available requiring SAS as a skill, and how extensible those skills are. For example, Stata is not really feasible for running a company’s entire data management and data analysis operation. If you are an individual economist and you just need to run some specific econometric procedures, you don’t care about that, but if you are looking for “the next big thing”, something that will be around and used by millions of people twenty years from now, Stata is probably not it. Actually, I don’t think that’s their plan, anyway. I think their plan is to be a very good choice for high-level statistical analysis and to stay in business as a profitable company.

Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. Note: For those of you who were so unhappy with the example I used previously, here is a new snippet of code from the site R by example.

Below is an example of R code:

# Goal: Simulate a dataset from the OLS model and
# obtain OLS estimates for it.

x <- runif(100, 0, 10)     # 100 draws from U(0,10)
y <- 2 + 3*x + rnorm(100)  # beta = [2, 3] and sigma = 1

# You want to just look at OLS results?
summary(lm(y ~ x))

# Suppose x and y were packed together in a data frame --
D <- data.frame(x, y)
summary(lm(y ~ x, D))

# Full and elaborate steps --
d <- lm(y ~ x)
# Learn about this object by saying ?lm and str(d)

# Compact model results --
print(d)

# Pretty graphics for regression diagnostics --
par(mfrow = c(2, 2))
plot(d)

Follow this link for the rest of the program.

I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.
There are two developments that I see coming as The Next Big Thing.
Data visualization. I am teaching a workshop this summer on this topic. This isn’t an ad; it is not open to the public, so you couldn’t come anyway. I’m teaching it because I have seen more and more professors AND students frustrated by the fact that the average graduate student has trouble really understanding statistics. They may be able to get the correct answer on a multiple-choice test that asks about a critical p-value, but I have lived over half a century now and discovered that life holds very few multiple-choice tests. We need statistical thinking, data literacy or whatever cool catchphrase someone can coin. This is the wave of the future. I am going to use examples from SPSS, SAS Enterprise Guide and JMP in this course because they can all be driven by pointing and clicking AND, for those who want to go further, all have a coding option, giving you that extensibility thing.

Analyzing enormous quantities of unstructured data: First, let me explain structured data. That is data that is in a set format. Say you have your annual expenditures: the first column is the date of the expense, the second column is the check number, the third is the amount. That’s structured data. It can span more than one row and take all sorts of other forms, but the main point is that you have some sort of definite structure. The overwhelming majority of data – forum posts, blogs, comments on customer service cards, websites, etc., etc. – is unstructured data. People start wherever they want, finish wherever they want, change subjects and just basically do it however the hell they want. And there is a ginormous amount of this stuff. The Next Big Thing is going to be finding meaning in this data. Google and its imitators are doing it with their search engines. Every company that has a clue is mining it for market information.
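To make that concrete, here is a minimal sketch in R of what structured expenditure data looks like (the dates, check numbers and amounts are made up purely for illustration):

```r
# Hypothetical annual expenditures: every row follows the same format
expenses <- data.frame(
  date   = as.Date(c("2010-01-05", "2010-02-11", "2010-03-20")),
  check  = c(1001, 1002, 1003),
  amount = c(250.00, 89.95, 1200.50)
)

str(expenses)  # three columns, each with a fixed, known type
```

A blog post or a comment card, by contrast, has no fixed columns at all, and that is exactly what makes it unstructured.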
So, for the next year, those are the eggs I am putting in my basket. I am sure the shape of those two fields will change over the years, but I guarantee that neither will go the way of BMDP, MUMPS and COBOL.

67 Comments

  1. Hello,

    Interesting post – I am writing a reply to it on my blog now.

    Just before that, I would like to encourage you to re-check your R code, since it doesn’t seem to run…

    With respect,
    Tal

  2. AnnMaria:

    The R code you posted does not run:

    > a sigma y r plot(-4:4, -4:4, xlab = 'x', ylab = 'y', main = "", sub = "", type = "n")
    > points(x, y, pch = 19, cex = 0.2)
    Error in points(x, y, pch = 19, cex = 0.2) : object 'x' not found
    > legend(-3.9, 3.8, substr(paste("r=", r), 1, 8), bg = 'gray90')
    Error in paste("r=", r) : object 'r' not found
    >

    You failed to define x and, as a result, you failed to define y and r.

    It looks as if you don’t know how to program in R. How can you criticize software you don’t seem to understand?

    Best regards.

  3. Well, if R is an “epic fail” because it needs programming, what about the (dinosaur-like) so-called programming of SAS? Or you do not actually use the programming interface of SAS?

    And would you please post R code that is executable for your readers? The variables ‘x’ and ‘xnorm’ were missing, and the error term was a constant 0!

    I’m glad to see data visualization is one of the “big things” in your list, as that is also what I’m interested in. I’m not sure if you ever checked the R packages related to this area: http://cran.r-project.org/web/views/Graphics.html Among them you might be interested in rggobi (or GGobi: http://www.ggobi.org/). The R Graph Gallery is also a place to see what R can do in graphics: http://addictedtor.free.fr/graphiques/

  4. “The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.”

    That’s completely true, but the majority of people have no interest in statistics. Whereas the majority of people dealing with statistics should be programmers if we want statistics to have meaning in real life.

  5. “However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.”

    I don’t follow this argument. You must purchase SAS, deal with technicians at your institution, and navigate a nightmare of licensing, and all of that can take a long time. R takes less than 2 minutes to install.

    I think that the younger generation of statisticians are using R and are more familiar with coding. Everyone has been exposed to some language, especially HTML.

    Graphically, SAS really can’t compare ( http://addictedtor.free.fr/graphiques/ ). I mean I would rather graph in Excel than SAS! Sure you can use some macro that is about 4 paragraphs of code, or you can use one line of code in R. Especially with multivariate statistics.

    With that said, I do feel that mixed models are much stronger in SAS than R. Especially those with complicated nested structures. I do all my mixed models in SAS, all my graphics, simulations, and multivariate statistics in R. I do think R is the future.

  6. Google and its imitators are doing it with their search engines

    The people at Google do a good deal of their analysis of both structured and unstructured data … in R.

    And not so much in SPSS or SAS.

    They would agree with you that R isn’t the next big thing, because for them it’s already a big thing.

  7. The R code was copied from an example I found on line.

    Every time I mention anything negative about a programming language (or any language – a lot of people were not happy that I commented on Esperanto) there are people who believe that is the greatest thing since sliced bread and if I only spoke Esperanto or programmed in R then I would not be so ignorant and see that it really is the answer to everything and world peace.

    My point, which I stand by, is not that I am an expert on R, that R doesn’t do graphics or even that R does not work well for some things, but rather if you even LOOK at R code – bug-free or not, compilable or not – it should be evident that this is not how the average person uses a computer. If we are talking about something that is going to be used by a large number of people, R is not it.

    I read in response to another blog
    http://www.iq.harvard.edu/blog/sss/archives/2010/04/the_inevitable.shtml

    The comment,
    “If you think that R needs a point and click GUI, you can build one.”

    This really made me laugh and it illustrates my point perfectly. The average person does not think when they look at DOS, “Gee, I should write a Windows (or better yet, Mac) OS.”

    How many people use computers now compared to when you had to build your own from a kit from Radio Shack?

    Maybe the vast majority of people who use statistics SHOULD be programmers – that is debatable and I could argue either side of that issue – but there are NOT a vast number of people out there who are going to be programmers whether they should be or not.

    A point I don’t think is debatable is that we would be much better off if a vast number of people could perform statistics and understand statistical analyses. They aren’t going to be doing it with R.

    Maybe “young statisticians” will. However, I would not think any product aimed at the young-statistician-and-people-who-work-at-Google market is going to be getting a lot of venture capital money.

    As for SAS/Graph, you can do a lot with it but I am not a big fan of that. Personally, if I were using SAS I would do the coding in SAS to create the type of analytic dataset desired, and do the graphics in SAS Enterprise Guide.

  8. I have been hearing about the wonders of data visualization since about 1995. I have come to believe it is a nice-sounding but empty term. It’s like when managers say they expect “excellence.”

  9. I’d have to say both data visualization and excellence are a good idea. I’m afraid I share your cynicism about the excellence part.

    As far as data visualization, I’d say the software we have available has improved by leaps and bounds over the last decade. Whether that potential will be realized remains to be seen but I think the odds are we will see widespread data visualization before we see widespread excellence. More often now when I see excellence it’s unexpected!

  10. An interesting post. However, you are confounding technologies with techniques. That is, visualization of data and scalable analysis of non-structured data are indeed two big areas of interest. But the beginning of your post is talking about tools, any of which can be used to do some of your “next big things”. I can use parts of SAS, SPSS, R or Python to visualize and analyze all sorts of data.

    I think you are trying to get at the mixture of these two:
    * Easier, more visual tools which make the process of analyzing and understanding data more accessible, and
    * More powerful tools which can impute order on non-structured data using very scalable approaches that take advantage of the abundance of computing power that clouds and other modern tech approaches provide.

    So, is R the next big thing? I agree that, if they don’t get their act together around visual use and scalability, then no, by itself it won’t be. But I think what R is doing, along with Hadoop and Mahout, is allowing users to have a shared approach for analyzing data. This shared approach means that they can now focus on issues like visualization, stemming, and other important parts of what you mention.

    From that point of view, then, R may be part of the foundation of the “next big thing”, which is more accessible analytics as part of more and more experiences, including the two you mention, but many more besides.

  11. I think that the blog is spot on—data analysis is about understanding data, not about programming code. That’s a required step at some point but there’s no reason that it has to be done by the end user. It reminds me of the critiques that are lobbed against my field (I am a social psychologist) for not being more like cognitive psychology or neuropsychology (on the micro side) or sociology (on the macro side). Those are wonderful disciplines but they are not the same thing as my chosen level and domain of analysis. The same is true of programming and data analysis—yes, there’s a connection but programming is NOT data analysis. It’s programming. And I think the fact that the snippet of R code does not work is EXACTLY the point—even very smart people have a hard time with this program. (And that’s why researchers hire people to do R instead of doing it themselves; it’s not what they’re interested in and it’s not what they need to focus on.)

    I know a professor who requires his graduate students to perform canonical correlation by hand and from memory. I imagine he believes this is the best way for them to understand the MEANING of the procedure and how it is affected by and reflects the data. To me, that is just silly. What he’s teaching isn’t data analysis but memorization and test performance under anxiety, not the ability to understand and work with data in an intelligent and interpretable way.

    As I see it, the most difficult part of data analysis is not the computation (which is where R comes in) or even the visual presentation. The most difficult part is being able to tell a story about the data—a story that is intelligible, accurate, insightful, and interesting. Computation has nothing to do with that.

  12. As the author of “R for SAS and SPSS Users” and (with Joe Hilbe) “R for Stata Users”, I like having a wide range of tools to work with. I think they each offer a feature set that suits different situations and/or people. I do agree that R is harder to learn, but I do not think that will affect its long-term success. There are several graphical user interfaces available for R (see http://r4stats.com/add-on-modules). Some are well developed while others are at an early stage of development. People who prefer to point-and-click their way through analyses can choose any style they like. People who prefer instead to program won’t be bothered by the fact that R is harder, since it gives them great flexibility, merging the data step, proc step, DMS/OMS, macro language and IML/Matrix languages into a seamless, coherent whole.

    But will R replace the commercial packages? I doubt it. They offer a different style of programming and people are a diverse lot. Now that SPSS, SAS and JMP have vendor-supported interfaces to R, those users have access to the 3,000+ R packages without having to migrate from their preferred environment. SPSS and JMP users can even add R functions to their menus and dialog boxes.

    For visualization, I think SAS’ new SG series of graphics is very well done. I think SPSS’ Graphics Production Language is more powerful but also much more complex. Hadley Wickham’s ggplot2 package for R is almost as flexible as SPSS’ GPL (on which it is modeled) while being as easy as the new SAS procs. You can get a feel for it at http://had.co.nz/ggplot2/.
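    For readers who haven’t seen ggplot2, a minimal sketch looks like this (the simulated data is invented purely for illustration, and it assumes the ggplot2 package is installed):

```r
library(ggplot2)

# Simulated data: 100 points with a linear trend plus noise
d <- data.frame(x = runif(100, 0, 10))
d$y <- 2 + 3 * d$x + rnorm(100)

# One readable layer per idea: raw points, then a fitted regression line
ggplot(d, aes(x, y)) + geom_point() + geom_smooth(method = "lm")
```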

    I agree that huge data sets are an issue for R at the moment, but several efforts are underway (similar to Thomas Lumley’s biglm package) to overcome the data-in-memory limitation.

    I also agree with your assessment that text analysis is one of the next “big things.” Although I have a list of R packages that do text analysis on my web site, I have not had time to try them. I do use SAS Text Miner, which does a great job of implementing the Latent Semantic Analysis approach. I also use SPSS Text Analysis for Surveys, which does well with Linguistic Analysis. I use the wonderfully easy and powerful WordStat software for the Content Analytic approach. I also occasionally use QDA Miner to manually select sections of text (Qualitative Analysis) to then analyze using more automated methods. What is wrong with this picture?? Every company chooses its favorite approach and ignores the others. This is how the world of statistics was before SAS and SPSS existed. We need one company to offer all the text tools, the way stat packages offer all the popular methods. SAS is expanding its approach to include more linguistic/sentiment analysis, so I hope we will see this come to pass some day.

    Cheers,
    Bob Muenchen

  13. The reason many people have criticized your comments on R is because they are just Fear, Uncertainty, and Doubt (FUD). It is similar to what people used to say (and still do) about Linux, that it is not for regular users. Well, it is not trying to be. Regular users do not need to do statistics.

    If you think you can analyze enormous quantities of data without knowing some programming, you have a lot of time on your hands.

    Furthermore, your problems with R are demonstrably false: “much greater cost of software is the time it takes to install it, maintain it, learn it and document it”
    – Installation takes a minute or two
    – Maintain? Updates take a minute or two
    – Learn it – This is your only valid point
    – Document it – Try documenting mouse clicks (‘first click Stats, then click x, then click y…’). A session in R serves as full documentation of what was done.
    How can you talk about something when you have no idea how it works?

    If you think visualization and analysis are the next big things and R is not, check the article in the New York Times: Data Analysts Captivated by R’s Power (http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html). Just one quote from it: “Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use [R].”

    Also, “If we are talking about something that is going to be used by a large number of people, R is not it.” Do you really think a large number of people are going to do “Data visualization” and “Analyzing enormous quantities of unstructured data”?

  14. Everything you have said is pathetically thin on evidence. You are an amateur at best in your comparison of SAS and R.

    1- SAS graphics are atrocious. ggplot2 makes better plots than anything the entire SAS team could put together.

    2- You can use the R pkg reshape for your data structuring needs.

    3- R is FREE! How is that not cheaper than SAS?

    4- Maybe if you knew how to use a computer, you could install it. It’s called download and double-click, done.

    5- R can hook into most any type of Database. It can even work with Hadoop.

    6- The SAS language does not even compare to R’s.

    You really did not do your homework on R.

  15. “If we are talking about something that is going to be used by a large number of people, R is not it.”

    Also C++, C#, Java, Perl, Python, Unix, Linux are not it.

    Also SAS and JMP are not it.

    “A large number of people” consume results of statistical analysis but do not produce the results and the analysis.

    For statisticians who produce the results, the time to learn R is 5% of the time to learn statistics.

  16. We do not want epi and soci people using R! This is why so many scientific papers are published today with terrible theoretical grounding. Lesson learned: hire a statistician. I will go out on a limb and say that every statistician under 50 can program in at least SAS, if not R and some real languages like Python or even C. SAS and SPSS are too computationally slow to answer a lot of our most challenging questions. R does this best. R also runs laps around SAS and SPSS in terms of simulations and graphics. Have you used any of these languages before?!?

  17. I spent the first 2-3 years of my career in the Pharma industry learning to program SAS since we had no real GUI to work from (1993-1995). I’ve since learned to program R. No biggy.

    You’re arguing from the point of view of end users who are happy for others to program their analytics and then use them. There are others who distrust, disagree or want something more than these stock functions and so are happy to program their own. Compare to folks who tinker with engines and cars compared to those who just use the car to get from A to B. It’s not like one group is right and the other wrong. Just preference.

    Industry likes SAS because it’s a controlled environment. Unlike R. Which is the very reason why I think R users like R…

    Blog reply:
    http://mikeksmith.posterous.com/statisticians-and-programming-languages-and-g

  18. I disagree with your comments on R for one reason alone: I am not a programmer and I taught myself R code over the course of two days at work. I then learned on the third day how to use Sweave and automatically generate well-formatted and reproducible reports. The documentation is terse but thorough and the range of libraries available is amazing.

    I took a semester of biostatistics based around SAS and was never able to comprehend its terrible GUI. Not to mention: when I graduated, I could no longer afford the (non-student) license.

    Want to learn some R quickly? Try these books:
    Introductory Statistics with R by Dalgaard
    R in a Nutshell, by Adler
    ggplot2, by Wickham
    Introductory Time Series with R, by Cowpertwait and Metcalfe

    There are also many tutorials online for free.

  19. Dr. AnnMaria De Mars,

    I agree with you that two of the “next big things” in data management & analysis are data visualization and dealing with unstructured data. I’m of the opinion that there is a third area, related to the “Internet of Things” and the tsunami of data it will create.

    These are conceptual areas, however, and not software packages or computing languages. SAS, IBM/SPSS and Pentaho are of the first type; R is of the second.

    The major thrust of your post seems to be helping to guide students into areas of study that will survive in the job market in the coming decades. This is always difficult for mentors, as we can’t always anticipate the “black swan” events that might change things drastically.

    In 1979, when I first sat down with a FORTRAN programmer to turn my Bayesian methodologies into practical applications to determine the reliability and risk associated with the STAR48 kick motor and associated Payload Assist Module (PAM), the statistical libraries for FORTRAN seemed amazing. The ease with which we were able to create the program and churn through decades of NASA data (after buying a 1MB memory box for the mainframe) was wondrous 😉

    Today, not so much wonder from such a feat. The evolution of computing has drastically affected the way in which we apply mathematics and statistics today. Several of the comments to your post argue both sides of the statement that anyone doing statistics today should be a programmer, or shouldn’t. It’s an interesting argument, that I’ve also seen reflected in chemistry, as fewer technicians are used in the lab, and the Ph.D.s work directly with the robots to prepare the samples and interpret the results.

    Perhaps a discussion of what skills a statistics students needs to be marketable over the course of their career would be a more profitable 😉 discussion in these comments than the unnecessary R vs. SAS war.

    1. Never stop learning

    2. Learn to tell stories about data; your audience may not be statisticians, or a statistician with your particular focus

    3. Understand computing as well as statistics; they have become a vital part of our field

    4. Match your tools, statistical and computing, to the problem at hand; not all tools are applicable to all problems, not all problems respond to the same set of tools

    – Twitter.com/JAdP

  20. AnnMaria, I read your post a couple of times and I think I can understand where you are coming from but, if I might say so, I think you came across as somewhat confused. The confusion seems to stem primarily from two misconceptions: (1) confusing the language with the tools that can be created through the language, and (2) assuming that the market for statistical languages and software is a unified block rather than a highly segmented environment.

    In relation to point #1, your post assumes that to benefit from R one has to know programming. This would be equivalent to saying that to benefit from a Web application built on Java one would have to get a Java certification rather than simply being able to log into the application. If you look at Web-based applications built on R such as http://rweb.stat.ucla.edu/stockplot/ or http://www.stat.ucla.edu/~jeroen/ggplot2.html you will quickly notice that R is indeed tipping towards some revolutionary applications for non-technical audiences.

    In relation to point #2, I think the confusion is even greater. If we were to take your argument to an extreme, it would probably be fair to say that not only is R difficult to use, but SPSS and SAS would never have succeeded because Excel is so much easier to use. I believe that what your post neglects is that there are different types of users with different goals and different skill sets. Programming in R fulfills a very important role within one specific segment of users who are interested in creating analyses that go beyond off-the-shelf, pre-packaged recipes. And as my point #1 outlines, it is also establishing itself in other, less technical niches.

    So, this is all to say that your post was somewhat hasty, or might have benefited from a little more research.

  21. Despite all the criticisms here in comments and on other blogs, I find myself agreeing with your assessment of R. R is a great statistical programming environment, and I use it whenever I can. It is also incredibly hard to learn, with an interface that is hidden, buried in the online help, manuals and the r-help mailing list. R is an environment for experts that does little or nothing to encourage mastery or to aid new users in accomplishing even simple tasks (just watch a new user try to get the help working if they’re behind a corporate firewall, or to perform a simple t-test).

    As a manager in a corporate environment, I wish that I had the time and resources to train my people in the use of R, or could afford a full-time statistician who was also an expert in R. Instead, I have to settle for the 95% solution that is easy to learn and works for my team. In my case, this is Minitab. I wish it were R.

    The R developer community seems to be largely dominated by individuals who are comfortable with R’s steep learning curve. There is, however, some promising work being done to make R accessible to a broader range of users, such as R Commander, REvolution and R Analytic Flow. I even recall seeing a prototype R GUI that looked just like Minitab, though I cannot find my links to it. I hope that these efforts will successfully transform R from an arcane interface to an explorable one, so that R will become The Next Big Thing.

  22. As far as the comparisons with Excel, with people who work on engines versus those who own cars, I completely agree.

    That is my point. If your target market is “People who own cars that drive from point A to point B” that is much BIGGER than “people who work on engines”. If you are looking for a job making things or selling things or providing services, the former is more likely to pay off for you than the latter.

    Telling people that if they can’t appreciate an internal combustion engine they are too stupid to own a car probably won’t help, either.

  23. “However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.”

    There are companies who pay 2 million Euro for SAS (PER YEAR!!). Have you ever installed SAS Enterprise Miner, the metadata server and all the other stuff that is needed to run SAS EM? Well, SAS sells installation engineers at 1000 Euro per day; that says it all…

    Compare that to R.

    A data analyst who does not know some form of programming or scripting would have a low market value. I would never hire such persons.

  24. R is a programming language for doing statistics. As such, it’s great for developing algorithms or doing other programming tasks with data. But if you are an analyst, a programming paradigm may not be the best fit. Besides requiring you to write code that looks a lot like C, the output you get may be oriented towards the programmer. (The available R GUIs are rather primitive by current standards.) Here is an error message that I get a lot from a popular R package.

    “Error in optim(0, f, control = control, hessian = TRUE, method = “BFGS”) :
    non-finite finite-difference value [1]”

    I know what that means. Would an analyst?

    There are a lot of things I would change about R as a programming language, but, no doubt, it offers a lot of tools for developing algorithms and the library of things already on the shelf doubtless exceeds any alternative.

    It is not easy to learn, though, unless you are just doing things simple enough that you don’t really need to learn much of it anyway. I have programmed for too many decades in many languages, and I found R harder to learn than most other languages I have used.

    It seems to me that it is harder to focus on the important statistical questions when using a language like R than in higher level packages, but if you want precise control over all aspects of the calculations, R can’t be beat.

    So, to me, it’s a coexistence scenario: R is great for certain things, and higher level packages are better for others. Neither branch is likely to take over the (statistical) world.

  25. Let me get this straight – R is a failure because you’re lazy? That’s the whole argument, right?

  26. Yeah, go ahead and have fun dealing with expensive, clunky GUI software the rest of your life, dolt. There are people who do real science, and there are people who produce excessively complicated Excel spreadsheets. The learning curve for R is not bad, and the other side of the curve is wondrous.

  27. I understand your point that R is more of a programming/scripting language, but saying that it does not have clickable buttons to perform analyses is not correct. I think you are either too busy working with software like SAS and SPSS to read about R (which would make you ignorant of it), or you are biased toward that software.

    I would like to mention some important things about R:

    1. Use Rcmdr: it is a package that provides a GUI for R — which actually solves your question about clickable buttons

    2. Data Visualization: there are many excellent packages and awesome commands for this, e.g. FIX, graphs packages, etc.

    3. I agree with your point that Google won’t use COBOL or MUMPS, but they are still using C/C++, languages that have been in the market since the ’80s. I think your example was too narrow-minded. What matters is the strength and robustness of a language. C/C++ shows that. R shows it too, but in some respects it is not good, especially at writing files.

    4. I am a statistical geneticist / computational biologist, and I am sure the kind of unstructured data and the vast amounts of data we see are far beyond what the financial guys have. I am not boasting, but the data in our field grow 4-6 fold every year due to sequencing/genotyping, and still, in many workshops, they recommend R and not SAS/Stata.

    I don’t think your perspective on R is correct. Moreover, Google is using Perl/R for financial data.
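    For readers who haven’t seen it, the Rcmdr package mentioned in point 1 is real and on CRAN; a minimal sketch of getting to its menus (assumes an interactive R session with internet access):

```r
# R Commander: a menu-driven, point-and-click GUI built on top of R
install.packages("Rcmdr")  # one-time install from CRAN
library(Rcmdr)             # loading the package opens the GUI window
```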

  28. I’m a little shocked at the ferocity and rudeness of some responses. Perhaps some people don’t realize that not everyone else uses statistics in the same way they do.

    I primarily help researchers, mainly in biology and social science, apply statistics to their research. They are not doing “business analytics,” do not have enormous databases, and really have no need to program anything beyond what SAS or SPSS syntax does. They are not programmers or statisticians, and they don’t have backgrounds in programming or math.

    I believe they are the kinds of users of statistics that you are referring to and I agree with you wholeheartedly that they are probably the majority of statistics users and they have no need for a programming language. They don’t want to nor need to program new statistical procedures.

    There are clearly people who do, but I agree they’re not the majority. At least not in the fields I work in. That is not to say (nor do I think you are saying) that R is not a good fit in some other fields and applications.

    For people who have been well trained in SAS or SPSS, but not “programming” as in C, etc., (and I include myself here), the logic of R feels strange. This shift in logic is not insurmountable, if needed, but it does take some time and focus for training, which is often the ingredient most lacking.

    I learned S-Plus (the basis of R) in my statistics grad program (after already having been trained, and used BMDP, SPSS, and SAS, on unix, for years). I didn’t like the logic, and I stopped using it once I could, because SAS and SPSS worked fine for me. To be fair, my training in S-Plus was minimal, so using it was always a challenge. (Although I’ve heard great things about Bob Muenchen’s book, and may look into R again in the future).

    I wrote a blog post a while ago about choosing a statistical software package. http://www.analysisfactor.com/statchat/?p=321. I still say the most important consideration is what your colleagues use, but you always need to have at least two packages under your belt.

    And I’m pretty sure BMDP disappeared b/c SPSS bought it and pretty much stopped supporting it.

  29. If “the next big thing” is something that “the average person” will be doing, then clearly R is not it.

    It’s obvious that ubiquitous connectivity and vastly cheaper storage have led to corpora of data of almost unimaginable size by the “BMDP-era” standards, and that the analysis and visualization of this material will be (and already is) huge. For dabblers, a GUI will be all they need and all they see. Underneath that GUI, however, is an engine. I would hardly count R out on that score.

    Of course, if your definition of “big” is defined as the surface manifestations visible to most people, R ain’t it; but then again, such a prediction would be by definition superficial.

  30. @AnnMaria – did you have any idea how much of an impact this post was going to have? The phrase “epic fail” certainly seemed to capture people’s imaginations 😉

    I have just written a follow-up post “R, the Epic Fail blog, and SOFA Statistics” (http://www.sofastatistics.com/blog/?p=314) in which I relate the “Epic Fail” debate to directions for the open source SOFA Statistics project (http://www.sofastatistics.com). SOFA Statistics is free, with an emphasis on ease of use, learn as you go, and beautiful output. It is currently packaged for Windows and Ubuntu, with a Mac package in the pipeline.

  31. I’ve spent several decades in commercial statistical software development (working in a variety of R&D roles at SYSTAT, StatView, JMP, and SAS), and I now do custom JMP scripting, etc., to make my prejudices clear.

    I can say with hard-won authority that:

    – good statistical software development is difficult and expensive
    – good quality assurance is more difficult and expensive
    – designing a good graphical user interface is difficult and expensive
    – a good GUI is worthwhile, because the easier it is to try more things, the more things you will try, and
    – creative insight is worth a lot more than programming skill

    Even commercial software tends to be under-supported, and I’ll be the first to admit that my own programming is as buggy as anybody else’s, but if I’m making life-and-death or world-changing decisions, I want to be sure that I’m not the only one who’s looked at my code, tested border cases, considered the implications of missing values, controlled for underflow and overflow errors, done smart things with floating point fuzziness, and generally thought about any given problem in a few more directions than I have. I want to know that when serious bugs are discovered, the knowledge is disseminated and somebody’s job is on the line to fix them.
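    As a small illustration of the “floating point fuzziness” the commenter mentions (my example, not theirs):

```r
# Classic floating-point gotcha: exact comparison of inexact binary values
0.1 + 0.2 == 0.3                    # FALSE in R (and most languages)
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: comparison with a tolerance
```

    Untested code that compares floating-point results exactly is precisely the sort of bug a fierce QA program exists to catch.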

    For all these reasons, I temper my sincere enthusiasm about the wide open frontiers of open source products like R with a conservative appreciation for software that has a big company’s reputation and future riding on its accuracy, and preferably a big company that has been in the business long enough to develop the paranoia that drives a fierce QA program.

    R is great for what it is, as long as you bear in mind what it isn’t. Your own R code or R code that you find sitting around is only as good as your commitment to testing and understanding of thorny computational gotchas.

    I share the apparently-common opinion that R’s interface leaves a lot to be desired. Confidentiality agreements prevent me from confirming or denying the rumors about JMP 9 interfacing with R, but I will say that if they turn out to be true, both products would benefit from it. JMP, like any commercial product, improves when it faces stiff competition and attends to it, and R, like most open source products, could use a better front end.

    [An expanded version of my comments is cross-posted on Global Pragmatica LLC’s blog at http://globalpragmatica.com/?p=230].

  32. “The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.”

    The vast majority of people are not interested in data analytics. People interested in data analytics who just “look at things and click” could not do any data analytics, whatever tool they used.

    I don’t see the point in blaming a programming language for being a programming language.

    There is a lot to criticize in R, though. Compared to Python with the proper packages (including its R interface), it is not really more powerful, and it is far from being as well designed or as easy to use.

  33. You say that people “are used to looking at things and clicking on things.” Is this really true of people who use SAS for serious work? I am making the transition from SAS to R, due in substantial measure to working for a government agency lacking funds to continue with SAS. I have always done any meaningful analysis via a script. I wrote code for SAS and I now write code for R. Scripts are self-documenting. They enable reproducibility and coherent updating. Although it can be quicker to point and click one’s way through an analysis, being able a few months later to support a conclusion so derived can be nearly impossible.

    R certainly does not fit the way most people use computers – few computer users undertake meaningful analysis of numerical data. R does surely fit the approach of those who are serious about supportable analyses of non-trivial data sets.
