
The Next Big Thing

I’m in Seattle this week, at SAS Global Forum, and it is even better than usual. I go to several conferences each year, some because I am presenting, some because there is a topic that particularly interests me, but there are three I go to every year. Of these, SAS Global Forum is the one I would absolutely not miss. It is not for those on a limited budget, but it is worth it. You get the chance to meet A LOT of the smartest people in the world. Seriously. And I have a basket of degrees and am married to an honest-to-God rocket scientist, so my bar for “smartest people in the world” is pretty high.

One of the other two I always attend is the Western Users of SAS Software conference: you learn a lot, it’s relatively inexpensive and it’s not far to travel. Lots of bang for the buck. The other is the SPSS Directions conference.

At ALL of these, and in general, in the back of my mind all of the time, I am looking for “the next big thing”. Whether you are an individual, a university or a company, I think that to stay competitive in the long run you need to be ahead of the learning curve or, as people who want to be smart-asses call it, on the “bleeding edge”. Think about it: if you were teaching statistics twenty years ago, you had the choice of having your students learn SPSS, SAS, SYSTAT, BMDP or Minitab. Of those, BMDP, which was “for real statisticians”, kind of like the R of the day, is one I haven’t seen used in years. I thought SYSTAT was off the market, but I did see an ad for it recently and was surprised to hear it still existed.

If you had taught your students SAS twenty years ago and they stuck with it, they are much more marketable now than if you had made any of the other choices. My definition of marketable is based on how many jobs are available requiring SAS as a skill, and how extensible those skills are. For example, Stata is not really feasible to use for running a company’s entire data management and data analysis. If you are an individual economist and you just need to do some specific econometric procedures, you don’t care about that, but if you are looking for “the next big thing”, something that will be around and used by millions of people twenty years from now, Stata is probably not it. Actually, I don’t think that’s their plan, anyway. I think their plan is to be a very good choice for high-level statistical analysis and stay in business as a profitable company.

Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. Note: For those of you who were so unhappy with the example I used previously, here is a new snippet of code from the site R by example.

Below is an example of R code:

# Goal: Simulate a dataset from the OLS model and
# obtain OLS estimates for it.

x <- runif(100, 0, 10)       # 100 draws from U(0,10)
y <- 2 + 3*x + rnorm(100)    # beta = [2, 3] and sigma = 1

# You want to just look at OLS results?
summary(lm(y ~ x))

# Suppose x and y were packed together in a data frame --
D <- data.frame(x,y)
summary(lm(y ~ x, D))

# Full and elaborate steps --
d <- lm(y ~ x)
# Learn about this object by saying ?lm and str(d)
# Compact model results --
print(d)
# Pretty graphics for regression diagnostics --
par(mfrow=c(2,2))
plot(d)

Follow this link for the rest of the program.

I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

There are two developments that I see coming as The Next Big Thing.

Data visualization: I am teaching a workshop this summer on this topic. This isn’t an ad; it is not open to the public, so you can’t come anyway. I’m teaching it because I have seen more and more professors AND students frustrated by the fact that the average graduate student has trouble really understanding statistics. They may be able to get the correct answer on a multiple choice test that asks about a critical p-value, but I have lived over half a century now and discovered that life holds very few multiple choice tests. We need statistical thinking, data literacy or whatever cool catch phrase someone can coin. This is the wave of the future. I am going to use examples from SPSS, SAS Enterprise Guide and JMP in this course because all of them can be used by pointing and clicking AND, for those who want to go further, all have a coding option, which gives you that extensibility thing.

Analyzing enormous quantities of unstructured data: First, let me explain structured data. That is data that is in a set format. Say you have your annual expenditures. The first column is the date of the expense, the second column is the check number, the third is the amount. That’s structured data. A record can run over more than one row and vary in all sorts of other ways, but the main point is that you have some sort of definite structure. The overwhelming majority of data (forum posts, blogs, comments on customer service cards, websites, etc.) is unstructured data. People start wherever they want, finish wherever they want, change subjects and just basically do it however the hell they want. And there is a ginormous amount of this stuff. The Next Big Thing is going to be finding meaning in this data. Google and its imitators are doing it with their search engines. Every company that has a clue is mining for market information.
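
To make the distinction concrete, here is a minimal sketch in R, since that is the language of the snippet above. Everything in it is made up for illustration: the expense values, the customer comments and the variable names are assumptions, not real data. It only shows why a question about structured data takes one line, while unstructured text needs some processing before you can even start counting things.

```r
# Structured data: every record has the same fields in the same order.
# (Values are invented purely for illustration.)
expenses <- data.frame(
  date         = as.Date(c("2010-01-05", "2010-01-12", "2010-02-03")),
  check_number = c(1001, 1002, 1003),
  amount       = c(250.00, 75.50, 120.25)
)

# Because the structure is known, "how much did I spend?" is one line.
total_spent <- sum(expenses$amount)

# Unstructured data: free text with no fixed fields (also invented).
comments <- c(
  "Love the product, but shipping was slow.",
  "Shipping took forever!! Support was helpful though.",
  "Great support. Will buy again."
)

# A crude first step toward finding meaning: lower-case the text,
# split on anything that is not a letter, and count the words.
words <- unlist(strsplit(tolower(comments), "[^a-z]+"))
words <- words[words != ""]
word_counts <- sort(table(words), decreasing = TRUE)

print(total_spent)
print(head(word_counts, 5))
```

Counting words is about the simplest thing you can do with text, and it is nowhere near real text mining, but it shows the extra work that unstructured data demands before any analysis can begin.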
So, for the next year, those are the eggs I am putting in my basket. I am sure the shape of those two fields will change over the years, but I guarantee that neither will go the way of BMDP, MUMPS and COBOL.


67 Comments

  1. I never have understood the almost religious fanaticism that goes along with some computer tools (Mac vs. PC, C# vs. Java and so on, ad infinitum). The world isn’t so black-and-white. Why pick only one package to use for an area so vast as data analysis?

    For data analysis, I use SAS, SPSS, R, Python, Java, Excel, ESRI and GRASS GIS, JMP, Tableau and (Revolution) R. You see, I have a toolbox and use whatever is appropriate for the task. None of these are rocket science by the way, so maybe your husband could help you understand them. Of course, if you are not able to learn programming (even with your “basket of degrees”), your set of choices will be more limited. A limited choice set can only reduce what you are able to do.

    By your logic, the most “marketable” human language is English. But I still took the time to learn French, Spanish and Japanese, and I find these useful. No doctor should take a specialty, because by your logic specialized skills aren’t marketable (I think dermatologists would disagree). You see, value comes not from market size, but from scarcity. There are certain so-called network effects that can be derived from using a common package, but R (and all the others) have reached a critical mass sufficient to generate these network effects.

    By the way, “data visualization” (by which you seem to mean “data exploration”; the difference is not just semantic) and “analyzing enormous quantities of unstructured data” are not software tools. They are already a big thing and most of us have been doing them for decades, using a variety of tools. SAS wouldn’t be my first choice for either of these tasks (and R wouldn’t be either).

  2. “On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. ”

    I completely agree with your thoughts and applaud them. I will also add that the R tutorials, the documentation on how to use R packages and the generally available instructions are so thin on details it’s shocking…

  3. I’m currently looking at stats software options for a large UK financial company, and as a result I’m interested in any discussions of the relative merits of various software options. What a shame that many self-important, arrogant stats geeks have posted such rude and socially inept responses here to what is a provocative but measured post that insults no one and opens itself to commentary. No doubt most of the unsavoury commentators wouldn’t dare make the same remarks face to face, but that’s typical of the spineless making the most of niche expertise and the internet’s relative anonymity; it’s all over the internet. While the poisonous few make unhelpful and self-indulgent flames, there are many useful commentaries here, and I’m rapidly coming to the conclusion that my end-users will benefit from a combined package of SAS and SPlus; they will be able to make the most of both packages, exploiting each to its strengths. What I hope they won’t do is to retreat into a ghetto of ignorant name calling and pathetic one-upmanship.

  4. I work in a research laboratory. The scientists here are very intelligent people. But they work with plants and insects, and their understanding of technology is not that great. They can explain in detail how a GC machine works, but they will not know the difference between RAM and Hard Disk. And they don’t need to, because it has nothing to do with their work. But they do need to do statistical analysis of a lot of data, and R is not at all suitable for them despite its features and ability to do so many things. Such people would find it much easier to use a calculator to analyze their data, rather than learn R and its syntax. Even a GUI such as R-commander is not very intuitive.
    I have been looking for a few months now for a package that has the power and features of R and a friendly interface (maybe something like SPSS), but i have not found anything suitable yet.
    I am thinking of a client-server system which uses R to analyse the data, MySQL to store the data, and a web-based front-end to enter the data and select the tests to be performed. I have seen some websites which allow you to process data up to a certain limit, but i have not seen anything similar that is freely available.

  5. I, too, am surprised at the ferocity and rudeness of some of the responses. There seem to be a lot of immature posters who have just pounced on the original posting as an opportunity to advertise their self-appraised intelligence.

    I’m trained as a computer programmer and have done stats and other things with computers for more than 20 years. The whole debate about the “next big thing” seems to me to be misplaced. I just use whichever tool strikes me as the most efficient way to accomplish my goals on a given project. Sometimes that’s R, sometimes it isn’t.

    Obsessing about a particular platform, whether it’s R or SAS or anything else, is just a stupid case of letting the tail wag the dog. I, for one, am more concerned about analyzing data than I am about whether one platform or another is acknowledged as the “next big thing.” You people clearly don’t have enough work to do, can’t get attractive dates, and probably have a lot of other serious problems. Get a friggin’ life, losers.

  6. nice BK Waas, but your comment is itself rude and a display of ———..!

    anyway, I’m a heavy user of R, and I agree it has limitations, and i would not recommend R to a biologist or journalist etc

    But if you’re into computational statistics, then R is really great (able to look into the source code, greater freedom …)
    which really explains why almost all new statistical techniques/methods are first developed with R. Most PhDs I know use R.

    Maybe R is not the ‘next big thing’ among nonspecialists (which is being fixed by the R community) but it is the next big thing for the ‘computational statistics’ community and the like =).

    (one PhD developer at SAS that i know really loves R and studies the statistical algorithms developed by the R community to test them and extend them to SAS)

  7. I cannot but disagree that R is an epic fail. I have never seen software that is more versatile than R. To top it all, it is free and has a great user community that is ready to help all who want to learn it.
    If tomorrow’s world belongs to intellectuals, then R will overtake every other commercial package that is on the market today. I pray that day comes sooner rather than later.
    What you can achieve with R is limited only by your imagination!
    Long Live R!

  8. Hi, i read this post last year (when i had just started with R), and have come back to it now (desperately trying to avoid real work).

    Anyway, thanks for the post, it certainly created a lot of buzz.

    What i would say (on the basis of my 1 year experience with R) is that you have a point.

    R is harder than point and click. That being said, the reason i originally went towards R (i’m a psychologist) is because it has features for almost everything i want to use.

    Parallel analysis, item response theory, wonderful graphs and structural equation modelling can all be done with R. If i hadn’t learned R i would have had to learn about 5 or 6 different programs to do this, and i wouldn’t have been able to afford them if i wasn’t working in an academic or commercial environment.

    Also, when i find something missing in R, i can program it. Granted this takes time, but it can be done.

    Overall, although learning R (and i’m still learning) took time, when i integrated it with LaTeX and Sweave, it speeded up my writing of papers immensely. An example was a few months ago, when i went to my data file and found that it contained crazy values, i was able to revert back to an older file and rerun my analysis in about 30 minutes (it would have taken less time with a better computer). Also, this integration stops me worrying about creating tables in the correct format for journals, and enables anyone to reproduce my results.

    That being said, R (and its obscure error messages) have forced me to go back and relearn linear algebra and calculus as well as gain a shallow understanding of non linear optimisation. While this is probably not for everyone, it certainly allows me to do better analyses and understand what the hell i am doing (something which is sadly lacking in my field).

    So all in all, R has been a net gain for me, but i can understand that others may not feel the same way.

  9. I have been hearing about the wonders of data visualization since about 1995. I have come to believe it is a nice-sounding but empty term. It’s like when managers use the word “excellence.”

  10. While it is an ongoing debate whether R on its own is the next big thing, I feel that if the mega trends of cloud, big data and statistical analytics are put together, it has a chance to emerge as the next big thing. There are many initiatives in this direction and ours at GingerBrain http://www.gingerbrain.com is one amongst those. I would appreciate it if you could take a moment to drop by our web site and share your honest feedback on whether something like this makes sense. Thank you.

    -Aman

  11. I’m a statistician who has been using SAS for over 20 years and I also know some R. I think there is a point here that we statisticians might be missing. Suppose you, as a statistician, were asked to become proficient in another field in the next month or so. In order to do so, you would need to read and digest five to ten very difficult textbooks and to also learn a new programming language that is popular in this new field. Would you have time, given all the work that goes into maintaining your knowledge of statistics?
    Now look at it in terms of, say, a new medical researcher who has had two methods courses, knows ANOVA and multiple regression, but has been given a dataset with 100 variables (not uncommon anymore). If there is no statistician nearby, what course of action should he or she take, given that there has been no exposure to multivariate statistics? Perhaps we statisticians haven’t provided the right tools to other researchers so that they could at least be able to look at such data in a reasonable and systematic way. — Bill

  12. You can go back to your high school and ask your computer teacher whether macros are what the majority of people use, Mr Epicfail.
