I’m in Seattle this week, at SAS Global Forum, and it is even better than usual. I go to several conferences each year, some because I am presenting, some because there is a topic that particularly interests me, but there are three I go to every year. Of these, SAS Global Forum is the one I would absolutely not miss. It is not for those on a limited budget, but it is worth it. You get the chance to meet A LOT of the smartest people in the world. Seriously. And I have a basket of degrees and am married to an honest-to-God rocket scientist, so my bar for “smartest people in the world” is pretty high.
One of the other two I always attend is the Western Users of SAS Software conference. You learn a lot, it’s relatively inexpensive and it’s not far to travel. Lots of bang for the buck. The other is the SPSS Directions conference.
At ALL of these, and in general, in the back of my mind all of the time, I am looking for “the next big thing”. Whether as an individual, a university or a company, I think to stay competitive in the long run you need to be ahead of the learning curve or, as people who want to be smart-asses refer to it, on the “bleeding edge”. Think about it. If you were teaching statistics twenty years ago, you had the choice of having your students learn SPSS, SAS, SYSTAT, BMDP or Minitab. Of those, BMDP, which was “for real statisticians”, kind of like the R of the day, is one I haven’t seen used in years. I thought SYSTAT was off the market, but I did see an ad for it recently and was surprised to learn it still existed.
If you had taught your students SAS twenty years ago and they stuck with it, they are much more marketable now than if you had made the other choices. My definition of marketable is based on how many jobs are available requiring SAS as a skill, and how extensible those skills are. For example, Stata is not really a feasible choice for running a company’s entire data management and data analysis. If you are an individual economist and you just need to do some specific econometric procedures, you don’t care about that, but if you are looking for “the next big thing”, something that will be around and used by millions of people twenty years from now, Stata is probably not it. Actually, I don’t think that’s their plan, anyway. I think their plan is to be a very good choice for high-level statistical analysis and stay in business as a profitable company.
Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. Note: For those of you who were so unhappy with the example I used previously, here is a new snippet of code from the site R by Example.
Below is an example of R code:
# Goal: Simulate a dataset from the OLS model and
# obtain OLS estimates for it.
x <- runif(100, 0, 10) # 100 draws from U(0,10)
y <- 2 + 3*x + rnorm(100) # beta = [2, 3] and sigma = 1
# You want to just look at OLS results?
summary(lm(y ~ x))
# Suppose x and y were packed together in a data frame --
D <- data.frame(x,y)
summary(lm(y ~ x, D))
# Full and elaborate steps --
d <- lm(y ~ x)
# Learn about this object by saying ?lm and str(d)
# Compact model results --
print(d)
# Pretty graphics for regression diagnostics --
par(mfrow = c(2, 2)) # 2 x 2 grid so all four diagnostic plots fit on one page
plot(d)
I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.
There are two developments that I see coming as The Next Big Thing.
Data visualization. I am teaching a workshop this summer on this topic. This isn’t an ad; it is not open to the public, so you can’t come anyway. I’m teaching it because I have seen more and more professors AND students frustrated by the fact that the average graduate student has trouble really understanding statistics. They may be able to get the correct answer on a multiple choice test that asks about a critical p-value, but I have lived over half a century now and discovered that life holds very few multiple choice tests. We need statistical thinking, data literacy or whatever cool catch phrase someone can coin. This is the wave of the future. I am going to use examples from SPSS, SAS Enterprise Guide and JMP in this course because they can all be done with pointing and clicking AND, for those who want to go further, all have a coding option, giving that extensibility thing.
Analyzing enormous quantities of unstructured data. First, let me explain structured data. That is data that is in a set format. Say you have your annual expenditures. The first column is date of expense, the second column is check number, the third is the amount. That’s structured data. It can be spread over more than one row and laid out in all sorts of other ways, but the main point is that you have some sort of definite structure. The overwhelming majority of data (forum posts, blogs, comments on customer service cards, websites, etc.) is unstructured data. People start wherever they want, finish wherever they want, change subjects and just basically do it however the hell they want. And there is a ginormous amount of this stuff. The Next Big Thing is going to be finding meaning from this data. Google and its imitators are doing it with their search engines. Every company that has a clue is mining for market information.
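To make the structured versus unstructured distinction concrete, here is a minimal sketch in R, since that is the language of the snippet above. Everything in it is an assumption for illustration: the expense records, the comments, and the variable names are all made up, and counting words is only the crudest possible first pass at "finding meaning", not how any real text mining product works.

```r
# Structured data: every record has the same three fields (values are made up).
expenses <- data.frame(
  date         = as.Date(c("2010-04-01", "2010-04-03", "2010-04-07")),
  check_number = c(1041, 1042, 1043),
  amount       = c(250.00, 19.95, 1200.00)
)
# Because the structure is fixed, a question like "how much did I spend?"
# is a single line.
total_spent <- sum(expenses$amount)

# Unstructured data: free text with no fixed fields (also made up).
comments <- c(
  "Love the new release, but the install took forever.",
  "Install failed twice. Support was great though!",
  "Great conference. Will the install ever get easier?"
)

# Crude first pass: lower-case the text, split on anything that is not a
# letter, drop words of three letters or fewer, and count what is left.
words <- unlist(strsplit(tolower(comments), "[^a-z]+"))
words <- words[nchar(words) > 3]
word_counts <- sort(table(words), decreasing = TRUE)

print(total_spent)
print(head(word_counts, 3))
```

With the expenses, the answer falls straight out of the columns. With the comments, you have to decide what a "word" is, what to throw away and what counts as a pattern before you learn anything at all, and that is with three sentences rather than three million.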
So, for the next year, those are the eggs I am putting in my basket. I am sure the shape of those two fields will change over the years, but I guarantee that neither will go the way of BMDP, MUMPS and COBOL.