I’m in Seattle this week, at SAS Global Forum, and it is even greater than usual. I go to several conferences each year, some because I am presenting, some because there is a topic that particularly interests me, but there are three I go to every year. Of these, SAS Global Forum is the one I would absolutely not miss. It is not for those on a limited budget, but it is worth it. You get the chance to meet A LOT of the smartest people in the world. Seriously. And I have a basket of degrees and am married to an honest-to-God rocket scientist, so my bar for “smartest people in the world” is pretty high.

One of the other two I always attend is the Western Users of SAS Software conference. You learn a lot, it’s relatively inexpensive and it’s not far to travel. Lots of bang for the buck. The other is the SPSS Directions conference.

At ALL of these, and in general, in the back of my mind all of the time, I am looking for “the next big thing”. Whether as an individual, a university or a company, I think that to stay competitive in the long run you need to be ahead of the learning curve or, as people who want to be smart-asses refer to it, on the “bleeding edge”. Think about it: if you were teaching statistics twenty years ago, you had the choice of having your students learn SPSS, SAS, SYSTAT, BMDP or Minitab. Of those, BMDP, which was “for real statisticians”, kind of like the R of the day, is one I haven’t seen used in years. I thought SYSTAT was off the market, but I did see an ad for it recently and was surprised to learn it still existed.

If you had taught your students SAS twenty years ago and they stuck with it, they are much more marketable now than if you had made any of the other choices. My definition of marketable is based on how many jobs are available requiring SAS as a skill, and how extensible those skills are. For example, Stata is not really feasible to use for running a company’s entire data management and data analysis. If you are an individual economist and you just need to do some specific econometric procedures, you don’t care about that, but if you are looking for “the next big thing”, something that will be around and used by millions of people twenty years from now, Stata is probably not it. Actually, I don’t think that’s their plan, anyway. I think their plan is to be a very good choice for high-level statistical analysis and stay in business as a profitable company.

Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. Note: For those of you who were so unhappy with the example I used previously, here is a new snippet of code from the site R by example.

Below is an example of R code:

# Goal: Simulate a dataset from the OLS model and
# obtain OLS estimates for it.

x <- runif(100, 0, 10) # 100 draws from U(0,10)
y <- 2 + 3*x + rnorm(100) # beta = [2, 3] and sigma = 1

# You want to just look at OLS results?
summary(lm(y ~ x))

# Suppose x and y were packed together in a data frame --
D <- data.frame(x,y)
summary(lm(y ~ x, D))

# Full and elaborate steps --
d <- lm(y ~ x)
# Learn about this object by saying ?lm and str(d)
# Compact model results --
print(d)
# Pretty graphics for regression diagnostics --
par(mfrow=c(2,2))
plot(d)

Follow this link for the rest of the program.

I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

There are two developments that I see coming as The Next Big Thing.

Data visualization. I am teaching a workshop this summer on this topic. This isn’t an ad; it is not open to the public, so you can’t come anyway. I’m teaching it because I have seen more and more professors AND students frustrated by the fact that the average graduate student has trouble really understanding statistics. They may be able to get the correct answer on a multiple-choice test that asks about a critical p-value. I have lived over half a century now and discovered that life holds very few multiple-choice tests. We need statistical thinking, data literacy or whatever cool catch phrase someone can coin. This is the wave of the future. I am going to use examples from SPSS, SAS Enterprise Guide and JMP in this course because everything in them can be done by pointing and clicking AND, for those who want to go further, all have a coding option, which gives you that extensibility thing.

Analyzing enormous quantities of unstructured data: First, let me explain structured data. That is data that is in a set format. Say you have your annual expenditures. The first column is date of expense, the second column is check number, the third is the amount. That’s structured data. It can be spread over more than one row and arranged in all sorts of other ways, but the main point is that you have some sort of definite structure. The overwhelming majority of data – forum posts, blogs, comments on customer service cards, websites, etc. – is unstructured data. People start wherever they want, finish wherever they want, change subjects and just basically do it however the hell they want. And there is a ginormous amount of this stuff. The Next Big Thing is going to be finding meaning from this data. Google and its imitators are doing it with their search engines. Every company that has a clue is mining for market information.
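
To make the distinction concrete, here is a minimal sketch in base R. Everything in it is made up for illustration: the expense records, the customer comments and the variable names are all assumptions, not real data. The first half shows why structured data is easy to work with; the second half shows one crude first step (counting words) toward getting something usable out of free text. Real text mining goes far beyond this.

```r
# Structured data: every record has the same fields in the same order.
# (Hypothetical expense records, invented for illustration.)
expenses <- data.frame(
  date   = as.Date(c("2010-01-05", "2010-01-12", "2010-02-03")),
  check  = c(1001, 1002, 1003),
  amount = c(250.00, 75.50, 120.25)
)

# Because the structure is known in advance, a total is one line.
total_spent <- sum(expenses$amount)

# Unstructured data: free text with no set format.
# (Hypothetical customer comments, invented for illustration.)
comments <- c(
  "Loved the workshop, but the room was freezing.",
  "Shipping was slow. The product itself is great!",
  "Great support, slow shipping, will probably order again."
)

# A crude first step toward structure: lowercase the text, split it
# on anything that is not a letter, and count how often each word appears.
words <- unlist(strsplit(tolower(comments), "[^a-z]+"))
words <- words[nchar(words) > 0]   # drop any empty strings left by the split
word_counts <- sort(table(words), decreasing = TRUE)

print(total_spent)
print(head(word_counts, 5))
```

The point of the contrast: with the data frame you ask a question directly, while with the comments you first have to impose some structure of your own (here, a word-frequency table) before you can ask anything at all.
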
So, for the next year, those are the eggs I am putting in my basket. I am sure the shape of those two fields will change over the years, but I guarantee that neither will go the way of BMDP, MUMPS and COBOL.

Comments

64 Responses to “The Next Big Thing”

  1. Tal Galili on April 14th, 2010 9:29 am

    Hello,

    Interesting post – I am writing a reply to it on my blog now.

    Just before that, I would like to encourage you to recheck your R code, since it doesn’t seem to compile…

    With respect,
    Tal

  2. R, “the next big thing” and Statistics in the cloud | R-statistics blog on April 14th, 2010 10:28 am

    [...] A friend just e-mailed me about a blog post by Dr. AnnMaria De Mars titled “The Next Big Thing”. [...]

  3. La opinión sobre R de una pobre señora « Datanalytics: estadística y minería de datos on April 14th, 2010 4:37 pm

    [...] by datanalytics. News reaches me of a poor lady who, it seems, has a blog in which she talks about things that, by all appearances, are beyond her. She says the following: Contrary to what some people seem to think, R is definitely not the next big [...]

  4. Juan José Gibaja Martins on April 15th, 2010 2:07 am

    AnnMaria:

    The R code you posted does not run:

    > a sigma y r plot(-4:4, -4:4, xlab= ‘x’, ylab= ‘y’, main= “”, sub = “”,type = “n”)
    > points(x,y,pch=19,cex=0.2)
    Error en points(x, y, pch = 19, cex = 0.2) : objeto ‘x’ no encontrado
    > legend(-3.9, 3.8,substr(paste(“r=”,r), 1, 8), bg=’gray90′)
    Error en paste(“r=”, r) : objeto ‘r’ no encontrado
    >

    You failed to define x and, as a result, you failed to define y and r.

    It looks as if you didn’t know how to program in R. How can you criticize a software you don’t seem to understand?

    Best regards.

  5. Yihui Xie on April 15th, 2010 2:11 am

    Well, if R is an “epic fail” because it needs programming, what about the (dinosaur-like) so-called programming of SAS? Or you do not actually use the programming interface of SAS?

    And would you post R code that is executable to your reader? The variables ‘x’ and ‘xnorm’ were missing, and the error term was a constant 0!

    I’m glad to see data visualization is one of the “big things” in your list, as that is also what I’m interested in. I’m not sure if you ever checked the R packages related to this area: http://cran.r-project.org/web/views/Graphics.html Among them you might be interested in rggobi (or GGobi: http://www.ggobi.org/). The R Graph Gallery is also a place to see what R can do in graphics: http://addictedtor.free.fr/graphiques/

  6. Tweets that mention The Next Big Thing : AnnMaria’s Blog -- Topsy.com on April 15th, 2010 6:37 am

    [...] This post was mentioned on Twitter by Drew Conway, annmariastat. annmariastat said: Wrote post on The Next Big Thing. Hoping my Next Big Thing after this is new beer http://www.thejuliagroup.com/blog/?p=433 [...]

  7. didi on April 15th, 2010 7:57 am

    “The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.”

    That’s completely true, but the majority of people have no interest in statistics, whereas the majority of people dealing with statistics should be programmers if we want statistics to have meaning in real life.

  8. Kyle on April 15th, 2010 9:03 am

    “However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.”

    I don’t follow this argument. You must purchase SAS, deal with technicians at your institution and deal with a nightmare of licensing, and all of that can take a long time. R takes less than 2 minutes to install.

    I think that the younger generation of statisticians are using R and are more familiar with coding. Everyone has been exposed to some language, especially HTML.

    Graphically, SAS really can’t compare ( http://addictedtor.free.fr/graphiques/ ). I mean I would rather graph in Excel than SAS! Sure you can use some macro that is about 4 paragraphs of code, or you can use one line of code in R. Especially with multivariate statistics.

    With that said, I do feel that mixed models are much stronger in SAS than R. Especially those with complicated nested structures. I do all my mixed models in SAS, all my graphics, simulations, and multivariate statistics in R. I do think R is the future.

  9. Joseph Dunn on April 15th, 2010 10:54 am

    Interesting thoughts. I’ve written a bit of a response here: http://jdunn.posterous.com/r-is-not-the-next-big-thingand153

  10. efrique on April 15th, 2010 10:57 am

    Google and its imitators are doing it with their search engines

    The people at Google do a good deal of their analysis of both structured and unstructured data … in R.

    And not so much in SPSS or SAS.

    They would agree with you that R isn’t the next big thing, because for them it’s already a big thing.

  11. admin on April 15th, 2010 5:15 pm

    The R code was copied from an example I found online.

    Every time I mention anything negative about a programming language (or any language – a lot of people were not happy that I commented on Esperanto) there are people who believe that is the greatest thing since sliced bread and if I only spoke Esperanto or programmed in R then I would not be so ignorant and see that it really is the answer to everything and world peace.

    My point, which I stand by, is not that I am an expert on R, that R doesn’t do graphics or even that R does not work well for some things, but rather if you even LOOK at R code – bug-free or not, compilable or not – it should be evident that this is not how the average person uses a computer. If we are talking about something that is going to be used by a large number of people, R is not it.

    I read in response to another blog
    http://www.iq.harvard.edu/blog/sss/archives/2010/04/the_inevitable.shtml

    The comment,
    “If you think that R needs a point and click GUI, you can build one.”

    This really made me laugh and it illustrates my point perfectly. The average person does not think when they look at DOS, “Gee, I should write a Windows (or better yet, Mac) OS.”

    How many people use computers now compared to when you had to build your own from a kit from Radio Shack?

    Maybe the vast majority of people who use statistics SHOULD be programmers – that is debatable and I could argue either side of that issue – but there are NOT a vast number of people out there who are going to be programmers whether they should be or not.

    A point I don’t think is debatable is that we would be much better off if a vast number of people could perform statistics and understand statistical analyses. They aren’t going to be doing it with R.

    Maybe “young statisticians” will. However, I would not think any product aimed at the young-statistician-and-people-who-work-at-Google market is going to be getting a lot of venture capital money.

    As for SAS/Graph, you can do a lot with it but I am not a big fan of that. Personally, if I were using SAS I would do the coding in SAS to create the type of analytic dataset desired, and do the graphics in SAS Enterprise Guide.

  12. Darrell Rudmann on April 15th, 2010 7:19 pm

    I have been hearing about the wonders of data visualization since about 1995. I have come to believe it is a nice-sounding but empty term. It’s like when managers say they expect “excellence.”

  13. admin on April 15th, 2010 7:36 pm

    I’d have to say both data visualization and excellence are a good idea. I’m afraid I share your cynicism about the excellence part.

    As far as data visualization, I’d say the software we have available has improved by leaps and bounds over the last decade. Whether that potential will be realized remains to be seen but I think the odds are we will see widespread data visualization before we see widespread excellence. More often now when I see excellence it’s unexpected!

  14. Eduardo on April 15th, 2010 7:58 pm

    Why not follow up with a post contrasting the same task being done with different languages/systems. E.g compare ggplot2 (the latest and greatest graphics package in R. See learnr.wordpress.com/) to the new features in JMP (see http://junkcharts.typepad.com/junk_charts/2010/04/hoisted-from-the-archives-a-revolution.html). Just an idea…

  15. John on April 16th, 2010 12:14 am

  16. An article attacking R gets responses from the R blogosphere – some reflections | R-statistics blog on April 16th, 2010 6:32 am

    [...] am very grateful to Dr. AnnMaria De Mars for writing her post “The Next Big Thing”. In her post, Dr. De Mars attacked R by accusing it of being “an epic fail” (in [...]

  17. R Command Line » stotastic on April 16th, 2010 9:12 am

    [...] The Next Big Thing [...]

  18. Michael Wexler on April 16th, 2010 2:35 pm

    An interesting post. However, you are confounding technologies with techniques. That is, visualization of data and scalable analysis of non-structured data are indeed two big areas of interest. But the beginning of your post is talking about tools, any of which can be used to do some of your “next big things”. I can use parts of SAS, SPSS, R or Python to visualize and analyze all sorts of data.

    I think you are trying to get at the mixture of these two:
    * Easier, more visual tools which make the process of analyzing and understanding data more accessible, and
    * More powerful tools which can impose order on non-structured data using very scalable approaches which take advantage of the abundance of computing power that clouds and other modern tech approaches provide.

    So, is R the next big thing? I agree that, if they don’t get their act together around visual use and scalability, then no, by itself it won’t be. But I think what R is doing, along with Hadoop and Mahout, is allowing users to have a shared approach for analyzing data. This shared approach means that they can now focus on issues like visualization, stemming, and other important parts of what you mention.

    From that point of view, then, R may be part of the foundation of the “next big thing”, which is more accessible analytics as part of more and more experiences, including the two you mention, but many more besides.

  19. sxbxchen on April 16th, 2010 9:15 pm

    If you really want to click a button in R, please check the Rcmdr library. From input of raw data to final result (regression, analysis……), you don’t have to use the keyboard.

    http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/
    http://cran.r-project.org/web/packages/Rcmdr/index.html

  20. Barton Poulson on April 16th, 2010 10:54 pm

    I think that the blog is spot on—data analysis is about understanding data, not about programming code. That’s a required step at some point but there’s no reason that it has to be done by the end user. It reminds me of the critiques that are lobbed against my field (I am a social psychologist) for not being more like cognitive psychology or neuropsychology (on the micro side) or sociology (on the macro side). Those are wonderful disciplines but they are not the same thing as my chosen level and domain of analysis. The same is true of programming and data analysis—yes, there’s a connection but programming is NOT data analysis. It’s programming. And I think the fact that the snippet of R code does not work is EXACTLY the point—even very smart people have a hard time with this program. (And that’s why researchers hire people to do R instead of doing it themselves; it’s not what they’re interested in and it’s not what they need to focus on.)

    I know a professor who requires his graduate students to perform canonical correlation by hand and from memory. I imagine that he believes that this is the best way for them to understand the MEANING of the procedure and how it is affected by and reflects the data. To me, that is just silly. What he’s teaching isn’t data analysis but memorization and test performance under anxiety, not the ability to understand and work with data in an intelligent and interpretable way.

    As I see it, the most difficult part of data analysis is not the computation (which is where R comes in) or even the visual presentation. The most difficult part is being able to tell a story about the data—a story that is intelligible, accurate, insightful, and interesting. Computation has nothing to do with that.

  21. Bob Muenchen on April 17th, 2010 8:13 am

    As the author of “R for SAS and SPSS Users” and (with Joe Hilbe) “R for Stata Users”, I like having a wide range of tools to work with. I think they each offer a feature set that suits different situations and/or people. I do agree that R is harder to learn, but I do not think that will affect its long-term success. There are several graphical user interfaces available for R (see http://r4stats.com/add-on-modules). Some are well developed while others are at an early stage of development. People who prefer to point-and-click their way through analyses can choose any style they like. People who prefer instead to program won’t be bothered by the fact that R is harder, since it gives them great flexibility, merging the data step, proc step, DMS/OMS, macro language and IML/Matrix languages into a seamless coherent whole.

    But will R replace the commercial packages? I doubt it. They offer a different style of programming and people are a diverse lot. Now that SPSS, SAS and JMP have vendor-supported interfaces to R, those users have access to the 3,000+ R packages without having to migrate from their preferred environment. SPSS and JMP users can even add R functions to their menus and dialog boxes.

    For visualization, I think SAS’ new SG series of graphics is very well done. I think SPSS’ Graphics Production Language is more powerful but also much more complex. Hadley Wickham’s ggplot2 package for R is almost as flexible as SPSS’ GPL (on which it is modeled) while being as easy as the new SAS procs. You can get a feel for it at http://had.co.nz/ggplot2/.

    I agree that huge data sets are an issue for R at the moment, but several efforts are underway (similar to Thomas Lumley’s biglm package) to overcome the data-in-memory limitation.

    I also agree with your assessment that text analysis is one of the next “big things.” Although I have a list of R packages that do text analysis on my web site, I have not had time to try them. I do use SAS Text Miner, which does a great job of implementing the Latent Semantic Analysis approach. I also use SPSS Text Analysis for Surveys, which does well with Linguistic Analysis. I use the wonderfully easy and powerful WordStat software for the Content Analytic approach. I also occasionally use QDA Miner to manually select sections of text (Qualitative Analysis) to then analyze using more automated methods. What is wrong with this picture?? Every company chooses its favorite approach, and ignores the others. This is how the world of statistics was before SAS and SPSS existed. We need one company to offer all text tools as stat packages offer all popular methods. SAS is expanding its approach to include more linguistic/sentiment analysis, so I hope we will see this come to pass some day.

    Cheers,
    Bob Muenchen

  22. Luis J. Villanueva on April 17th, 2010 10:46 am

    The reason many people have criticized your comments on R is because they are just Fear, Uncertainty, and Doubt (FUD). It is similar to what people used to say (and still do) about Linux, that it is not for regular users. Well, it is not trying to be. Regular users do not need to do statistics.

    If you think you can analyze enormous quantities of data without knowing some programming, you have a lot of time on your hands.

    Furthermore, your problems with R are demonstrably false: “much greater cost of software is the time it takes to install it, maintain it, learn it and document it”
    – Installation takes a minute or two
    – Maintain? Updates take a minute or two
    – Learn it – This is your only valid point
    – Document it – Try documenting mouse clicks (‘first click Stats, then click x, then click y…’). A session in R serves as full documentation of what was done.
    How can you talk about something you have no idea how it works?

    If you think visualization and analysis are the next big things and R is not, check the article in the New York Times: Data Analysts Captivated by R’s Power (http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html). Just one quote from it: “Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use [R].”

    Also, “If we are talking about something that is going to be used by a large number of people, R is not it.” Do you really think a large number of people are going to do “Data visualization” and “Analyzing enormous quantities of unstructured data”?

  23. R is an epic fail or is it just overhyped « DECISION STATS on April 17th, 2010 9:32 pm

    [...] caught my attention were the words from http://www.thejuliagroup.com/blog/?p=433 However, for me personally and for most users, both individual and organizational, the much greater [...]

  24. patrick on April 18th, 2010 12:18 am

    Everything you have said is pathetically backed up with evidence. You are amateur at best in your comparison of SAS and R.

    1- SAS graphics are atrocious. ggplot2 makes better plots than anything the entire SAS team could put together.

    2- You can use the R pkg reshape for your data structuring needs.

    3- R is FREE! How is that not cheaper than SAS?

    4- Maybe if you knew how to use a computer, you could install it. It’s called download and double-click, done.

    5- R can hook into most any type of Database. It can even work with Hadoop.

    6- The SAS language does not even compare to R’s.

    You really did not do your homework on R.

  25. Alex Zolot, Cert.Adv.SAS programmer on April 18th, 2010 6:11 am

    “If we are talking about something that is going to be used by a large number of people, R is not it.”

    Also C++, C#, Java, Perl, Python, Unix, Linux are not it.

    Also SAS and JMP are not it.

    “A large number of people” consume results of statistical analysis but do not produce the results and the analysis.

    For statisticians who produce the results time to learn R is 5% of time to learn statistics.

  26. Eric on April 18th, 2010 11:36 am

    We do not want epi and soci people using R! This is why so many scientific papers are published today with terrible theoretical grounding. Lesson learned: hire a statistician. I will go out on a limb and say that every statistician under 50 can program in at least SAS, if not R and some real languages like Python or even C. SAS and SPSS are too computationally slow to answer a lot of our most challenging questions. R does this best. R also runs laps around SAS and SPSS in terms of simulations and graphics. Have you used any of these languages before?!?

  27. Mike K Smith on April 18th, 2010 11:53 am

    I spent the first 2-3 years of my career in the Pharma industry learning to program SAS since we had no real GUI to work from (1993-1995). I’ve since learned to program R. No biggy.

    You’re arguing from the point of view of end users who are happy for others to program their analytics and then use them. There are others who distrust, disagree or want something more than these stock functions and so are happy to program their own. Compare to folks who tinker with engines and cars compared to those who just use the car to get from A to B. It’s not like one group is right and the other wrong. Just preference.

    Industry likes SAS because it’s a controlled environment. Unlike R. Which is the very reason why I think R users like R…

    Blog reply:
    http://mikeksmith.posterous.com/statisticians-and-programming-languages-and-g

  28. Clay on April 19th, 2010 9:08 am

    I disagree with your comments on R for one reason alone: I am not a programmer and I taught myself R code over the course of two days at work. I then learned on the third day how to use Sweave and automatically generate well-formatted and reproducible reports. The documentation is terse but thorough and the range of libraries available is amazing.

    I took a semester of biostatistics based around SAS and never was able to comprehend the terrible GUI interface. Not to mention: when I graduated, I no longer could afford the (non-student) license.

    Want to learn some R quickly? Try these books:
    Introductory Statistics with R by Dalgaard
    R in a Nutshell, by Adler
    ggplot2, by Wickham
    Introductory Time Series with R, by Cowpertwait and Metcalfe

    There are also many tutorials online for free.

  29. Joseph A. di Paolantonio on April 19th, 2010 4:33 pm

    Dr. AnnMaria De Mars,

    I agree with you that two of the “next big things” in data management & analysis are data visualization and dealing with unstructured data. I’m of the opinion that there is a third area, related to the “Internet of Things” and the tsunami of data it will create.

    These are conceptual areas, however, and not software packages nor computing languages. SAS, IBM/SPSS and Pentaho are of the first type; R is of the latter.

    The major thrust of your post seems to be in helping to guide students into areas of study that will survive in the job market in the coming decades. This is always difficult for mentors, as we can’t always anticipate the “black swan” events that might change things drastically.

    In 1979, when I first sat down with a FORTRAN programmer to turn my Bayesian methodologies into practical applications to determine the reliability and risk associated with the STAR48 kick motor and associated Payload Assist Module (PAM), the statistical libraries for FORTRAN seemed amazing. The ease with which we were able to create the program and churn through decades of NASA data (after buying a 1MB memory box for the mainframe) was wondrous ;-)

    Today, not so much wonder from such a feat. The evolution of computing has drastically affected the way in which we apply mathematics and statistics today. Several of the comments to your post argue both sides of the statement that anyone doing statistics today should be a programmer, or shouldn’t. It’s an interesting argument, that I’ve also seen reflected in chemistry, as fewer technicians are used in the lab, and the Ph.D.s work directly with the robots to prepare the samples and interpret the results.

    Perhaps a discussion of what skills a statistics student needs to be marketable over the course of their career would be a more profitable ;-) discussion in these comments than the unnecessary R vs. SAS war.

    1. Never stop learning

    2. Learn to tell stories about data; your audience may not be statisticians, or a statistician with your particular focus

    3. Understand computing as well as statistics; they have become a vital part of our field

    4. Match your tools, statistical and computing, to the problem at hand; not all tools are applicable to all problems, not all problems respond to the same set of tools

    - Twitter.com/JAdP

  30. Ricardo on April 20th, 2010 6:30 am

    AnnMaria, I read your post a couple of times and I think I can understand where you are coming from but, if I might say so, I think you came across somewhat confused. The confusion seems to stem primarily from two misconceptions: (1) confusing the language with the tools that can be created through the language, and (2) treating the market for statistical languages and software as a unified block rather than a highly segmented environment.

    In relation to point #1, your post assumes that to benefit from R one has to know programming. This would be equivalent to saying that to benefit from Web applications built on Java one would have to get a certification rather than simply being able to log into a Web application built on Java. If you look at Web-based applications based on R such as http://rweb.stat.ucla.edu/stockplot/ or http://www.stat.ucla.edu/~jeroen/ggplot2.html you will quickly notice that R is indeed tipping towards some revolutionary applications for non-technical audiences.

    In relation to point #2, I think the confusion is even greater. If we were to take your argument to an extreme, it would probably be fair to say that not only is R difficult to use, but SPSS and SAS would never succeed because Excel is so much easier to use. I believe that what your post neglects is that there are different types of users with different goals and different skill sets. Programming in R fulfills a very important role within one specific segment of users who are interested in creating analyses that go beyond off-the-shelf, pre-packaged recipes. And as my point #1 outlines, it is also establishing itself in other, less technical niches.

    So, this is all to say that your post was somewhat hasty and might have benefited from a little more research.

  31. Tom on April 20th, 2010 8:30 am

    Despite all the criticisms here in comments and on other blogs, I find myself agreeing with your assessment of R. R is a great statistical programming environment, and I use it whenever I can. It is also incredibly hard to learn, with an interface that is hidden, buried in the online help, manuals and the r-help mailing list. R is an environment for experts that does little or nothing to encourage mastery or to aid new users in accomplishing even simple tasks (just watch a new user try to get the help working if they’re behind a corporate firewall, or to perform a simple t-test).

    As a manager in a corporate environment, I wish that I had the time and resources to train my people in the use of R, or could afford a full-time statistician who was also an expert in R. Instead, I have to settle for the 95% solution that is easy to learn and works for my team. In my case, this is Minitab. I wish it were R.

    The R developer community seems to be largely dominated by individuals who are comfortable with R’s steep learning curve. There is, however, some promising work being done to make R accessible to a broader range of users, such as R Commander, REvolution and R Analytic Flow. I even recall seeing a prototype R GUI that looked just like Minitab, though I cannot find my links to it. I hope that these efforts will successfully transform R from an arcane interface to an explorable one, so that R will become The Next Big Thing.

  32. admin on April 20th, 2010 11:11 am

    As far as the comparisons with Excel, with people who work on engines versus those who own cars, I completely agree.

    That is my point. If your target market is “People who own cars that drive from point A to point B” that is much BIGGER than “people who work on engines”. If you are looking for a job making things or selling things or providing services, the former is more likely to pay off for you than the latter.

    Telling people that if they can’t appreciate an internal combustion engine they are too stupid to own a car probably won’t help, either.

  33. Para que copien, peguen y disfruten « Datanalytics: estadística y minería de datos on April 20th, 2010 6:18 pm

    [...] To the allegations that the R code she published on her page is not even R code, she replied that she had copied it “from the internet” (how much harm there is on those pages where one browses without fear of [...]

  34. LHL on April 21st, 2010 7:24 am

    “However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.”

    There are companies that pay 2 million euros for SAS (PER YEAR!!). Have you ever installed SAS Enterprise Miner, the metadata server and all the other stuff that is needed to run SAS EM? Well, SAS sells installation engineers at 1,000 euros per day. That says it all…

    Compare that to R.

    A data analyst who does not know some form of programming or scripting would have a low market value. I would never hire such persons.

  35. Jon Peck on April 22nd, 2010 10:49 am

    R is a programming language for doing statistics. As such, it’s great for developing algorithms or doing other programming tasks with data. But if you are an analyst, a programming paradigm may not be the best fit. Besides requiring you to write code that looks a lot like C, the output you get may be oriented towards the programmer. (The available R GUIs are rather primitive by current standards.) Here is an error message that I get a lot from a popular R package.

    “Error in optim(0, f, control = control, hessian = TRUE, method = "BFGS") :
    non-finite finite-difference value [1]”

    I know what that means. Would an analyst?

    There are a lot of things I would change about R as a programming language, but, no doubt, it offers a lot of tools for developing algorithms and the library of things already on the shelf doubtless exceeds any alternative.

    It is not easy to learn, though, unless you are just doing things simple enough that you don’t really need to learn much of it anyway. I have programmed for too many decades in many languages, and I found R harder to learn than most other languages I have used.

    It seems to me that it is harder to focus on the important statistical questions when using a language like R than in higher level packages, but if you want precise control over all aspects of the calculations, R can’t be beat.

    So, to me, it’s a coexistence scenario: R is great for certain things, and higher level packages are better for others. Neither branch is likely to take over the (statistical) world.

  36. Brenda on April 22nd, 2010 11:48 am

    Let me get this straight – R is a failure because you’re lazy? That’s the whole argument, right?

  37. andy on April 22nd, 2010 11:59 am

    Yeah, go ahead and have fun dealing with expensive, clunky, GUI software the rest of your life, dolt. There are people who do real science, and there are people who produce excessively complicated Excel spreadsheets. The learning curve for R is not bad, and the learned side is wondrous.

  38. R is an ‘epic fail’ – or how to make statisticians mad | EMDMA on April 22nd, 2010 12:27 pm

    [...] are mad and out for blood. Someone called R an epic fail and said it wasn’t the next big thing. I know that R is free and I am actually a Unix fan [...]

  39. Ayush Raman on April 23rd, 2010 12:50 am

    I understand your point that R is more of a programming/scripting language, but saying that it does not have click-able buttons to perform analysis is not correct. I think either you are too busy working with software like SAS, SPSS, etc. and don’t read about R (which means you are ignorant of it), or you are biased towards that software.

    I would like to mention some important things about R:

    1. Use Rcmdr: it is a package that provides a GUI for R, which answers your question about click-able buttons.

    2. Data visualization: There are many excellent packages and awesome commands for that, e.g. FIX, graph packages, etc.

    3. I agree with your point that Google won’t use COBOL or MUMPS, but they are still using C/C++, a language that has been on the market since the ’80s. I think your example was too narrow-minded. What matters is the strength and robustness of a language. C/C++ shows that. R shows it too, though in some respects it is not good, especially at writing files.

    4. I am a statistical geneticist-computational biologist, and I am sure the kind of unstructured data and the vast amount of data we see is far more than what the financial guys see. I am not boasting, but the data in our field grows 4-6 fold every year due to sequencing/genotyping etc., and still in many workshops they recommend R and not SAS/Stata etc.

    I don’t think that your perspective on R is correct. Moreover, Google is using Perl/R for financial data.

  40. Karen on April 23rd, 2010 10:00 am

    I’m a little shocked at the ferocity and rudeness of some responses. Perhaps some people don’t realize that not everyone else uses statistics in the same way they do.

    I primarily help researchers, mainly in biology and social science, apply statistics to their research. They are not doing “business analytics,” do not have enormous databases, and really have no need to program anything beyond what SAS or SPSS syntax does. They are not programmers or statisticians, and they don’t have backgrounds in programming or math.

    I believe they are the kinds of users of statistics that you are referring to and I agree with you wholeheartedly that they are probably the majority of statistics users and they have no need for a programming language. They don’t want to nor need to program new statistical procedures.

    There are clearly people who do, but I agree they’re not the majority. At least not in the fields I work in. That is not to say (nor do I think you say) that in some other fields and applications, R isn’t a good fit.

    For people who have been well trained in SAS or SPSS, but not “programming” as in C, etc., (and I include myself here), the logic of R feels strange. This shift in logic is not insurmountable, if needed, but it does take some time and focus for training, which is often the ingredient most lacking.

    I learned S-Plus (the basis of R) in my statistics grad program (after already having been trained in, and having used, BMDP, SPSS, and SAS, on Unix, for years). I didn’t like the logic, and I stopped using it once I could, because SAS and SPSS worked fine for me. To be fair, my training in S-Plus was minimal, so using it was always a challenge. (Although I’ve heard great things about Bob Muenchen’s book, and may look into R again in the future.)

    I wrote a blog post a while ago about choosing a statistical software package. http://www.analysisfactor.com/statchat/?p=321. I still say the most important consideration is what your colleagues use, but you always need to have at least two packages under your belt.

    And I’m pretty sure BMDP disappeared b/c SPSS bought it and pretty much stopped supporting it.

  41. Nathan on April 23rd, 2010 1:32 pm

    @Karen – calling something an epic fail is pretty rude, so the responses matched that flavor.

  42. Mankiw, thank you for making parenthetical notes about yourself in your textbook « things which happen twice on April 24th, 2010 4:39 am

    [...] Opinion and Discussion about the future of R. (The Julia Group/AnnMaria) [...]

  43. Chris on April 24th, 2010 1:42 pm

    If “the next big thing” is something that “the average person” will be doing, then clearly R is not it.

    It’s obvious that ubiquitous connectivity and vastly cheaper storage have led to corpora of data of almost unimaginable size by the “BMDP-era” standards, and that the analysis and visualization of this material will be (and already is) huge. For dabblers, a GUI will be all they need and all they see. Underneath that GUI, however, is an engine. I would hardly count R out on that score.

    Of course, if your definition of “big” is defined as the surface manifestations visible to most people, R ain’t it; but then again, such a prediction would be by definition superficial.

  44. R, the Epic Fail blog, and SOFA Statistics « Statistics Open For All on April 26th, 2010 1:33 am

    [...] R is an open source programming language and software environment for statistics. And it is not just any old programming language – it is the dominant system for open source statistics. So was it fair to call R an “epic fail” as Dr. AnnMaria De Mars did in her notorious blog post The Next Big Thing? [...]

  45. Grant Paton-Simpson on April 26th, 2010 2:01 am

    @AnnMaria – did you have any idea how much of an impact this post was going to have? The phrase “epic fail” certainly seemed to capture people’s imaginations ;-)

    I have just written a follow-up post “R, the Epic Fail blog, and SOFA Statistics” (http://www.sofastatistics.com/blog/?p=314) in which I relate the “Epic Fail” debate to directions for the open source SOFA Statistics project (http://www.sofastatistics.com). SOFA Statistics is free, with an emphasis on ease of use, learn as you go, and beautiful output. It is currently packaged for Windows and Ubuntu, with a Mac package in the pipeline.

  46. R and the Google Summer of Code 2010 – accepted students and projects! | R-statistics blog on April 26th, 2010 3:46 pm

    [...] Deducer and ggplot2) might finally provide the bridge to the layman-statistician that some people recently wrote to be one of R’s weak spots (while other bloggers wrote back that this is o.k., still no one [...]

  47. Erin Vang, PMP on April 28th, 2010 3:00 pm

    I’ve spent several decades in commercial statistical software development (working in a variety of R&D roles at SYSTAT, StatView, JMP, and SAS), and I now do custom JMP scripting, etc., to make my prejudices clear.

    I can say with hard-won authority that:

    - good statistical software development is difficult and expensive
    - good quality assurance is more difficult and expensive
    - designing a good graphical user interface is difficult and expensive
    - a good GUI is worthwhile, because the easier it is to try more things, the more things you will try, and
    - creative insight is worth a lot more than programming skill

    Even commercial software tends to be under-supported, and I’ll be the first to admit that my own programming is as buggy as anybody else’s, but if I’m making life-and-death or world-changing decisions, I want to be sure that I’m not the only one who’s looked at my code, tested border cases, considered the implications of missing values, controlled for underflow and overflow errors, done smart things with floating point fuzziness, and generally thought about any given problem in a few more directions than I have. I want to know that when serious bugs are discovered, the knowledge is disseminated and somebody’s job is on the line to fix them.

    For all these reasons, I temper my sincere enthusiasm about the wide open frontiers of open source products like R with a conservative appreciation for software that has a big company’s reputation and future riding on its accuracy, and preferably a big company that has been in the business long enough to develop the paranoia that drives a fierce QA program.

    R is great for what it is, as long as you bear in mind what it isn’t. Your own R code or R code that you find sitting around is only as good as your commitment to testing and understanding of thorny computational gotchas.

    I share the apparently-common opinion that R’s interface leaves a lot to be desired. Confidentiality agreements prevent me from confirming or denying the rumors about JMP 9 interfacing with R, but I will say that if they turn out to be true, both products would benefit from it. JMP, like any commercial product, improves when it faces stiff competition and attends to it, and R, like most open source products, could use a better front end.

    [An expanded version of my comments is cross-posted on Global Pragmatica LLC’s blog at http://globalpragmatica.com/?p=230.]

  48. The controversy about R: epic fail or epic success? on April 28th, 2010 4:20 pm

    [...] of AnnMaria De Mars, Ph.D. (President of The Julia Group and a SAS Global Forum attendee) in her blog that the open source statistical analysis tool R is an “epic fail,” or to put it in [...]

  49. haltux on April 29th, 2010 6:07 am

    “The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.”

    The vast majority of people are not interested in data analytics. People interested in data analytics who just “look at things and click” could not do any data analytics, whatever tool they would use.

    I don’t see the point in blaming a programming language for being a programming language.

    There is a lot to criticize in R, though. Compared to Python with proper packages (including its R interface), it is not really more powerful, and it is far from being as well designed or as easy to use.

  50. Wayne Richter on May 9th, 2010 10:39 am

    You say that people “are used to looking at things and clicking on things.” Is this really true of people who use SAS for serious work? I am making the transition from SAS to R, due in substantial measure to working for a government agency lacking funds to continue with SAS. I have always done any meaningful analysis via a script. I wrote code for SAS and I now write code for R. Scripts are self-documenting. They enable reproducibility and coherent updating. Although it can be quicker to point and click one’s way through an analysis, being able a few months later to support a conclusion so derived can be nearly impossible.

    R certainly does not fit the way most people use computers – few computer users undertake meaningful analysis of numerical data. R does surely fit the approach of those who are serious about supportable analyses of non-trivial data sets.

  51. Alan on May 16th, 2010 7:45 pm

    I never have understood the almost religious fanaticism that goes along with some computer tools (Mac vs. PC, C# vs. Java and so on, ad infinitum). The world isn’t so black-and-white. Why pick only one package to use for an area so vast as data analysis?

    For data analysis, I use SAS, SPSS, R, Python, Java, Excel, ESRI and GRASS GIS, JMP, Tableau and (Revolution) R. You see, I have a toolbox and use whatever is appropriate for the task. None of these are rocket science by the way, so maybe your husband could help you understand them. Of course, if you are not able to learn programming (even with your “basket of degrees”), your set of choices will be more limited. A limited choice set can only reduce what you are able to do.

    By your logic, the most “marketable” human language is English. But I still took the time to learn French, Spanish and Japanese, and I find these useful. No doctor should take a specialty, because by your logic specialized skills aren’t marketable (I think dermatologists would disagree). You see, value comes not from market size, but from scarcity. There are certain so-called network effects that can be derived from using a common package, but R (and all the others) have reached a critical mass sufficient to generate these network effects.

    By the way, “data visualization” (by which you seem to mean “data exploration”; the difference is not just semantic) and “analyzing enormous quantities of unstructured data” are not software tools. They are already a big thing and most of us have been doing them for decades, using a variety of tools. SAS wouldn’t be my first choice for either of these tasks (and R wouldn’t be either).

  52. watson on May 20th, 2010 6:56 pm

    “On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. ”

    I completely agree with your thoughts and applaud them. I will also add that the R tutorials, the documentation on how to use R packages and the generally available instructions are so thin on details it’s shocking…..

  53. datanalytics » Para que copien, peguen y disfruten on July 6th, 2010 9:07 am

    [...] To the allegations that the R code she published on her page is not even R code, she replied that she had copied it “from the internet” (how much harm there is on those pages where one browses without fear of [...]

  54. Mark on July 27th, 2010 6:10 am

    I’m currently looking at stats software options for a large UK financial company, and as a result I’m interested in any discussions of the relative merits of various software options. What a shame that many self-important, arrogant stats geeks have posted such rude and socially inept responses here to what is a provocative but measured post that insults no one and opens itself to commentary. No doubt most of the unsavoury commentators wouldn’t dare make the same remarks face to face, but that’s typical of the spineless making the most of niche expertise and the internet’s relative anonymity; it’s all over the internet. While the poisonous few post unhelpful and self-indulgent flames, there are many useful commentaries here, and I’m rapidly coming to the conclusion that my end-users will benefit from a combined package of SAS and S-Plus; they will be able to make the most of both packages, exploiting each to its strengths. What I hope they won’t do is to retreat into a ghetto of ignorant name calling and pathetic one-upmanship.

  55. R be dragons | Timothée Poisot on August 18th, 2010 6:15 pm

    [...] R has a steep learning curve, a fact that some used to call it an epic fail [what is an epic fail?]. So does LaTeX, so does C, and so does almost any other language. If we [...]

  56. Art on October 6th, 2010 2:01 am

    I work in a research laboratory. The scientists here are very intelligent people. But they work with plants and insects, and their understanding of technology is not that great. They can explain in detail how a GC machine works, but they will not know the difference between RAM and Hard Disk. And they don’t need to, because it has nothing to do with their work. But they do need to do statistical analysis of a lot of data, and R is not at all suitable for them despite its features and ability to do so many things. Such people would find it much easier to use a calculator to analyze their data, rather than learn R and its syntax. Even a GUI such as R-commander is not very intuitive.
    I have been looking for a few months now for a package that has the power and features of R and a friendly interface (maybe something like SPSS), but i have not found anything suitable yet.
    I am thinking of a client-server system which uses R to analyse the data, MySQL to store the data, and a web-based front-end to enter the data and select the tests to be performed. I have seen some websites which allow you to process data up to a certain limit, but i have not seen anything similar that is freely available.

  57. BK Waas on October 27th, 2010 8:53 am

    I, too, am surprised at the ferocity and rudeness of some of the responses. There seem to be a lot of immature posters who have just pounced on the original posting as an opportunity to advertise their self-appraised intelligence.

    I’m trained as a computer programmer and have done stats and other things with computers for more than 20 years. The whole debate about the “next big thing” seems to me to be misplaced. I just use whichever tool strikes me as the most efficient way to accomplish my goals on a given project. Sometimes that’s R, sometimes it isn’t.

    Obsessing about a particular platform, whether it’s R or SAS or anything else, is just a stupid case of letting the tail wag the dog. I, for one, am more concerned about analyzing data than I am about whether one platform or another is acknowledged as the “next big thing.” You people clearly don’t have enough work to do, can’t get attractive dates, and probably have a lot of other serious problems. Get a friggin’ life, losers.

  58. R.sas on October 29th, 2010 11:56 pm

    nice BK Waas, but your comment is itself rude and a display of ———..!

    anyway, I’m a heavy user of R, and I agree it has limitations, and i will not recommend R to a biologist or journalist etc

    But if you’re into computational statistics, then R is really great (able to look into the source code, greater freedom …)
    which really explains why almost all new statistical techniques/methods are first developed in R. Most PhDs I know use R.

    Maybe R is not the ‘next big thing’ among nonspecialists (which is being fixed by the R community) but it is the next big thing for the ‘computational statistics’ community and the like =).

    (one PhD developer at SAS that i know really loves R and studies the statistical algorithms developed by the R community to test them and extend them to SAS)

  59. Vijayan on November 23rd, 2010 9:23 am

    I cannot but disagree that R is an epic fail. I have never seen software that is more versatile than R. To top it all, it is free and has a great user community that is ready to help all who want to learn it.
    If tomorrow’s world belongs to intellectuals, then R will overtake every other commercial package that is on the market today. I pray that day happens sooner rather than later.
    What you can achieve with R is limited only by your imagination!
    Long Live R!

  60. disgruntledphd on April 9th, 2011 3:58 am

    Hi, i read this post last year (when i had just started with R), and have come back to it now (desperately trying to avoid real work).

    Anyway, thanks for the post, it certainly created a lot of buzz.

    What i would say (on the basis of my 1 year of experience with R) is that you have a point.

    R is harder than point and click. That being said, the reason i originally went towards R (i’m a psychologist) is because it has features for almost everything i want to use.

    Parallel analysis, item response theory, wonderful graphs and structural equation modelling can all be done with R. If i hadn’t learned R i would have had to learn about 5 or 6 different programs to do this, and i wouldn’t have been able to afford them if i wasn’t working in an academic or commercial environment.

    Also, when i find something missing in R, i can program it. Granted this takes time, but it can be done.

    Overall, although learning R (and i’m still learning) took time, when i integrated it with LaTeX and Sweave, it speeded up my writing of papers immensely. An example was a few months ago, when i went to my data file and found that it contained crazy values, i was able to revert back to an older file and rerun my analysis in about 30 minutes (it would have taken less time with a better computer). Also, this integration stops me worrying about creating tables in the correct format for journals, and enables anyone to reproduce my results.

    That being said, R (and its obscure error messages) has forced me to go back and relearn linear algebra and calculus as well as gain a shallow understanding of non-linear optimisation. While this is probably not for everyone, it certainly allows me to do better analyses and understand what the hell i am doing (something which is sadly lacking in my field).

    So all in all, R has been a net gain for me, but i can understand that others may not feel the same way.

  61. Women jewelry on May 16th, 2011 6:43 am

    I have been hearing about the wonders of data visualization since about 1995. I have come to believe it is a nice-sounding but empty term. It’s like when managers use the word expect “excellence.”

  62. Really the next big thing: SAS Global Forum Day 3 : AnnMaria’s Blog on April 24th, 2012 12:31 am

    [...] Two years ago, I said the data visualization was the next big thing. I also said that people would stick with SAS because it was easier to use and there are more people who DON’T want to be programmers than do. [...]

  63. SOFA, open stats « Labrigger on July 5th, 2012 9:05 am

    [...] then I wonder why we’re friends. But since we are, here are some links, you psychopath: “R is a programming language missing a GUI” “R is really important to the point that it’s hard to overvalue it” [...]

  64. Aman Khurana on December 25th, 2012 2:40 pm

    While it is an ongoing debate whether R on its own is the next big thing, I feel that if the megatrends of cloud, big data and statistical analytics are put together, it has a chance to emerge as the next big thing. There are many initiatives in this direction, and ours at GingerBrain (http://www.gingerbrain.com) is one of them. I would appreciate it if you could take a moment to drop by our web site and share your honest feedback on whether something like this makes sense. Thank you.

    -Aman
