Sabbaticals?
June 28, 2010
When I was in high school, I had a very defined career path. I told anyone who asked me (which was very few people, since no one cares what a high school kid thinks) that my career goal was to be president of General Motors. I even applied to the General Motors College. (Bet you didn’t know they had their own college!)
They were my first choice, but due to a glitch in getting the materials in from my school, my paperwork was not complete by the deadline, and their rules were carved in stone.
So, I went to Washington University in St. Louis, where I received a great education despite the fact that I attended slightly more parties than classes. Wash U is generally known more for its pre-med program than as a party school, but I didn’t let that stop me.
At 19, I graduated from college. I worked full-time all through college, then worked full-time while getting my MBA and as an engineer for several years after that. For a while, I taught math and earned a second master’s at the same time.
At 29, I quit working full-time so I could finish my Ph.D.
After I graduated, I started working as a professor and expanded the consulting business I had started in 1985 into a full-time operation. (Yes, that’s two jobs.)
At 39, I quit working full-time and took a post-doctoral position for a year. (Having my fourth baby at 39 slowed me down a bit.) Then, I took a position at a consulting company and continued the company I had started in 1985, taking on more and more business. (Yes, that’s two jobs.)
At 49, I quit working full-time, briefly retired and then took a position at a university. By then, the consulting company had split into two companies, which form The Julia Group. (Yes, that’s two jobs. In fact, for a while it was three, as I was teaching statistics for the graduate division of another university.)
Someone noticed this recently and asked me,
“Do you deliberately take sabbaticals? That is supposed to be something only done in universities. And what are you going to do now?”
Well, it hasn’t been completely coincidental that I have done reverse sabbaticals and gone to a university every decade.
I find that as a consultant, I get paid for what I know and what I do. So, I may get asked to do a repeated measures Analysis of Variance over and over, for six projects in a row. Or, I may find myself repeatedly getting contracts to write grants for the Department of Education, because I have already gotten several funded.
Business is like that. There may be a few rare jobs where you get paid to learn things, but those are mostly jobs where you learn things AFTER you have already put in a 40-hour week, and those are your other 20 hours.
When I was an undergraduate, back when I attended classes with Fred Flintstone and Barney Rubble (if you even recognize that reference, you are old!), there was a saying,
“No one ever got in trouble for buying IBM.”
Business hasn’t really changed all that much. Plenty of people buy Microsoft products because that is what they have always bought. If you’re going to hire a consultant to do X, you are pretty safe hiring a consultant who has already done X seven times for satisfied clients. That way, even if the person screws up, no one can blame you; it was a reasonable choice.
So, every ten years or so, I get tired of doing X and decide to do Y or 7 or purple.
I decide,
“You know, programming in SAS is cool, but I think I’ll take a look at what this Enterprise Guide thing will do, or maybe JMP or data mining or see what they’ve been developing at SPSS. Or, what the hell, maybe I’ll just go to Beijing and Tunisia.”
More than once, I have been called “insane” for giving up a great opportunity. The irony of that is that the second and third time I was giving up insanely great opportunities that I wouldn’t have had if I had not been “insane” enough to give up the first one.
I’ve been an engineer, math teacher, professor, statistician, programmer and consultant, and for thirty years I have run a business while raising four daughters.
And yet, the bizarre fact is that it has all turned out okay. After every “sabbatical” (which, incidentally, has always entailed a HUGE cut in pay because university salaries *blow* compared to the corporate sector), I’ve stepped into a new stream that paid much better and was more challenging than the one I left.
Not only have I ignored every bit of career advice I was ever given, from “Stick to one thing” to “Dress for success” to “Don’t have pictures of your children on your desk or you won’t be taken seriously” to “Always show up at work before the boss” to “Don’t express your own opinions,”
but to most of it I have replied,
“Bite me!”
It occurred to me that I have not so much had a career path as a career random walk.
Yet, it has turned out okay, as measured on The Julia Group scale, which is a factor score consisting of (unequally weighted) jelly beans, Chardonnay, time spent lying on tropical beaches, how much I love my children, terabytes and years of marriage to someone who brings me coffee in bed at 9 a.m.
So, what now? Well, I have a contract under review with a federal agency, six papers I’m committed to writing and a family that wants to go to Hawaii.
After that? I haven’t the faintest idea. But I’m sure I’ll like it. Because if I don’t, I won’t do it.
What Would I Do with JMP?
June 22, 2010
A large part of my day is spent playing with new software and trying to break it. Yes, there are actually grown-ups who get paid to do this for a living.
I find it hard to believe myself.
The theory, which actually works well, is that whenever someone has a question about something he or she wants to do, no matter how esoteric, I will have tried it at some point, based on my general philosophy of life which is, “What the hell… let’s see what happens.”
My inappropriately named desktop, since it is actually under my desk, runs Mac OS 10.6 and has five virtual machines with Vista, Windows 7 (32- and 64-bit), XP and Ubuntu. There is a supercomputer over my head that I can tap into directly from here, which also runs SAS and Stata. So, why would I need JMP?
Besides, what really annoyed me about all the JMP events I went to (an N of 3) was that they were all about “look at these pretty pictures we got with JMP” and nothing on how to make them. Finally, I went to one at SAS Global Forum by Wayne Levin of Predictum, which was excellent (full disclosure: I probably wouldn’t recognize Wayne Levin again if I tripped over him. I only know the name because it is on a handout on my desk, which has not been cleaned since I got back from SGF, and he’s never given me so much as a jelly bean. It was still excellent.)
JMP is one of the many things that have been lying around here for the last couple of years that I’d look at every now and then, thinking maybe I should do something with it. Lately, three things occurred to me.
1. It runs on a Mac, thus sparing me the 30 seconds of opening a virtual machine, which could then be used for such extremely important tasks as getting jelly beans out of my drawer.
2. It makes pictures, which fits well into my current interest in visual data analysis.
3. It gives me an answer for people who call up and say,
“SAS doesn’t run on a Mac? What the hell am I supposed to do now?”
I am actually married to one of those people who doesn’t believe anyone should buy software unless it AT LEAST runs on a Mac and preferably Linux, too. Learning JMP turned out to be less trouble than finding another husband as good as the one I already have, so I decided to go with that.
I had a dataset that I had downloaded from ICPSR and done lots of work on in SAS. I was working on a project with someone who only uses JMP, so I saved the dataset as a JMP file. We were trying to predict who would enlist in the military, and I had a sample of more than 2,500 high school sophomores who had been asked their plans after graduation. In JMP, I selected ANALYZE from the main menu and then DISTRIBUTION. I moved the two variables into the Y column and clicked OK.
JMP TIP: NOTICE THE ARROWS
Those little red arrows next to almost everything do stuff. For example, when the results window first came up, I didn’t like the looks of it. No, it wasn’t rolling its eyes at me. It had the histogram vertically oriented and a table of Quantiles I had no interest in. Grey arrows expand and contract things; red arrows give you options. If a grey arrow is pointing down and you click it, it hides what is underneath. Conversely, if it is pointing sideways, it has hidden stuff underneath, and you can click it to expand it and see what that is. So, I got rid of the quantiles.
Clicking on the red arrow next to each variable gives a whole list of options, and some options of the options. I clicked HISTOGRAM OPTIONS and then clicked on VERTICAL, which had been selected by default, to turn it off. Then I selected SHOW PERCENTS. Here is my first picture and my first conclusion: people are a bunch of liars.
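SHOW PERCENTS just turns each bar’s count into a share of the non-missing responses. Here is a minimal Python sketch of that arithmetic (not JMP’s own code), assuming answers coded 1 to 4 as in the enlistment question, where 4 = “definitely will”; the sample answers are made up for illustration:

```python
from collections import Counter

def percent_distribution(responses):
    """Return {code: percent of non-missing responses}, the numbers a
    histogram with 'show percents' turned on would label each bar with."""
    valid = [r for r in responses if r is not None]  # None stands in for a missing answer
    counts = Counter(valid)
    n = len(valid)
    return {code: 100.0 * counts[code] / n for code in sorted(counts)}

# Hypothetical answers: 1 = definitely won't ... 4 = definitely will (made-up values)
answers = [1, 1, 1, 2, 2, 3, 4, 1, None, 2]
print(percent_distribution(answers))
```

Missing answers are left out of the denominator here, which is an assumption of the sketch; check how your own software treats them before comparing percentages.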
Curt Gilroy was cited in the Army Times and has the impressive title of Director of Accessions for the Pentagon (which does not, despite what may have been implied by Sister Marion in seventh grade, have anything to do with the Virgin Mary going to heaven. That was the Ascension, or the Assumption. Either way, it definitely did not involve the Pentagon.)
Anyway, Gilroy says that 12% of military-eligible youth show an interest in military service. So, if we add the 4% who said they “definitely will” (coded 4) and the 9% who said they “probably will” (coded 3) join the armed services after high school, we get 13%, which sounds about right.
However, 89% say that they definitely or probably will go to a four-year college. Uh, no. First of all, according to the National Center for Education Statistics, only 73% of high school freshmen will graduate, and of those only 69% will enroll in a four-year school. So, .73 * .69 = 50.4%, and even allowing for the fact that some of the dropping out has already occurred by the spring of tenth grade, uh, how about no: 89% of you are not going to four-year schools, I am sorry to say.
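The back-of-the-envelope reasoning in the last two paragraphs amounts to adding two mutually exclusive shares and multiplying two sequential rates. A small Python sketch of that arithmetic; the helper names are mine, and the inputs are only the percentages quoted above:

```python
def combined_share(*shares):
    """Add the shares of mutually exclusive answer categories."""
    return sum(shares)

def chained_rate(*rates):
    """Multiply sequential rates, each one conditional on the step before it."""
    result = 1.0
    for rate in rates:
        result *= rate
    return result

military = combined_share(0.04, 0.09)  # "definitely will" + "probably will" enlist
four_year = chained_rate(0.73, 0.69)   # graduate, then (among graduates) enroll in a four-year school
print(f"Military interest: {military:.0%} (Gilroy: 12%)")
print(f"Four-year enrollment: {four_year:.1%} (students' own answer: 89%)")
```

Multiplying is appropriate only because the 69% is a rate among graduates, so it applies to the 73% who graduate rather than to everyone who started high school.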
I think race is a factor in military service. The data I used coded race as 1 = African-American, 2 = White, 3 = everyone else. I thought that third category didn’t really make much sense for analysis, so I created a new variable, African-American, which was 1 if race = 1 and 0 if race = 2 or 3. Here is how:
Select COLS, then NEW COLUMN. In the pop-up window, give the column a name and then select FORMULA under Column Properties.
In the functions select CONDITIONAL and pick IF.
A formula box will pop up, and it should be pretty obvious. You can just click on RACE to move it into your formula, then type = and put a 1 in the first box and a 1 in the second box, for
If RACE = 1 then the new variable = 1.
Put a 0 in the else box so that everyone with RACE = 2 or 3 gets a 0.
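The JMP formula is an ordinary if/then/else recode. The same logic as a minimal Python sketch; the function name is hypothetical, and leaving a missing RACE as missing (rather than coding it 0) is my own choice, so check how your formula handles missing values:

```python
def african_american_indicator(race):
    """Recode RACE (1 = African-American, 2 = White, 3 = everyone else)
    into a 0/1 dummy. A missing RACE stays missing instead of becoming 0."""
    if race is None:
        return None
    return 1 if race == 1 else 0

races = [1, 2, 3, 1, None]
print([african_american_indicator(r) for r in races])  # [1, 0, 0, 1, None]
```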
Next, I can go to ANALYZE > MODELING > PARTITION, click on SPLIT a few times, and I get my decision tree. It’s a start. I still think race should factor in there, and I think the reason it doesn’t is that “garbage category” 3 for everyone else: Asian, Native American, people who didn’t say. My hypothesis is that if I change that, race will become a factor.
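To see roughly what each click on SPLIT is doing, here is a minimal Python sketch of one binary split on one predictor for a 0/1 outcome. It scores cut points with Gini impurity, which is one common criterion but an assumption here: JMP’s Partition platform uses its own splitting statistic, so this illustrates the idea rather than reproducing JMP. The data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (0 = pure, 0.5 = a 50/50 mix)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(x, y):
    """Try every cut 'x <= t' and return (t, weighted impurity) for the cut that
    most reduces impurity of y; returns (None, gini(y)) if no cut helps."""
    best = (None, gini(y))
    n = len(y)
    for t in sorted(set(x))[:-1]:  # cutting at the largest value would leave one side empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical: stated plans (1-4) versus whether the student later enlisted (0/1)
plans = [1, 1, 2, 2, 3, 3, 4, 4]
enlisted = [0, 0, 0, 0, 1, 0, 1, 1]
print(best_split(plans, enlisted))  # (2, 0.1875)
```

A full tree just repeats this search within each resulting group, over every candidate predictor.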
So, what would I do with JMP? I guess since I should have left for home an hour ago, the answer is “get immersed in questions I’m interested in and lose track of time.”
Essentially, the same thing I do every day.
Why SAS Enterprise Miner on demand is like Jennifer & other thoughts on learning data mining
June 17, 2010
A miracle has occurred, and I have had time to evaluate two things that have been on my to-do list forever: JMP and SAS Enterprise Miner. Both of these products are produced by SAS, and the first interesting point is that knowing SAS won’t really help at all. That isn’t to say that some knowledge of programming logic won’t help. Meanwhile, I am taking a data mining course just for fun. It is very interesting, because while I have taken plenty of workshops and short courses, I haven’t been a student in a regular class for over a decade.

[Photo: Jenn after finishing her M.A.]
Until she went to college, every parent-teacher conference ever held about my daughter, Jennifer, went like this:
“Jennifer has great potential. She is obviously brilliant and if she just exerted some effort, she could do anything she wanted. Jennifer makes A’s on all of the work that she turns in. ”
In fact, Jenn dropped out of high school, took her GED, went to community college, finished her B.A. at 21, taught school for a while and had her master’s degree from USC by 24. So, that is my general view on SAS Enterprise Miner on-demand: I think it has great potential and is worth keeping around. When it grows up, it will do impressive stuff and be a really good teacher.
In learning data mining, whether using JMP or Enterprise Miner, background knowledge makes a difference. Because I have had decades of experience with both programming and statistics, when I see something in JMP like FORMULA > Conditional, it makes perfect sense to me as an IF statement. Some people reading this are probably thinking, “Of course.” If you are one of those people, you may be proficient with SAS, SPSS syntax or any number of programming languages. In Enterprise Miner, when I right-click on the Partition Node and see options like Cluster and Stratification, again, I think “of course.” This is why my fellow students hate me.
It’s not just me. There were a few posts about SAS Enterprise Miner’s On-Demand version on this cool blog, BZST:
http://blog.bzst.com/2009/10/sas-on-demand-enterprise-miner-update.html
http://blog.bzst.com/2010/05/sas-on-demand-take-3-success.html
I agreed with pretty much all of her points. Enterprise Miner is cool, and the current on-demand version is a great improvement. It is much easier to install than the desktop version, and as for the client-server version, it involves over 340,000 steps to install, one of which (and I may be imagining this) requires a band of marching flamingos.

[Image: Bring in the Flamingos]
So… points in favor of Enterprise Miner on Demand
1. Way easier to install than previous versions
2. Free for students and faculty for teaching purposes
3. Students like it better than sitting in the lab. They can download it and use it on their own computers.
4. Just the general cool options: you can use the Partition node to create training, validation and test data sets. When you first read in a data source, you can set Bayes prior probabilities and include the costs of decisions. It is really cool. I was going to include screen shots of some of the really cool output from the cluster analysis I did earlier today, but SAS EM kept giving me an error about
“The load balancing object spawner timed out. Please check your Enterprise Miner license.”
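The partitioning idea from point 4 can be sketched in a few lines of Python: shuffle the rows once with a fixed seed, then carve off training, validation and test sets. This is a simple random split, not a stratified one, and the 40/30/30 proportions and the seed are illustrative assumptions, not Enterprise Miner settings:

```python
import random

def partition(rows, train=0.4, validate=0.3, seed=12345):
    """Randomly split rows into training, validation and test sets.
    Whatever is left after the training and validation shares goes to test."""
    if not 0 <= train + validate <= 1:
        raise ValueError("train + validate must be between 0 and 1")
    shuffled = list(rows)                   # copy so the caller's data are untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validate)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = partition(range(100))
print(len(train_set), len(val_set), len(test_set))  # 40 30 30
```

The model is fit on the training set, tuned against the validation set, and judged once on the test set it never saw.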
Disadvantages
1. It is not easy. Very little of it is self-evident, and even less so if you have never used SAS or JMP. As Dr. Shmueli said in her posts, most MBA students probably aren’t going to be thrilled by the need to download, install and learn another piece of software. On the other hand, those really interested in statistics, software or data mining will probably be pumped about that part.
2. As also noted in the BZST blog, if you don’t have some knowledge of statistics and a general idea of program logic, you are going to have a hard time using Enterprise Miner. Some people, and I can’t say I wholly disagree, will say this is not a disadvantage. You should know what the heck you are using.
3. It can be excruciatingly slow. Sometimes it pops up in a minute. Other times, once you add the delays in opening EM, adding a new data source, creating a new diagram, dragging the data source to the diagram, creating a sample and running an analysis, it may take 15 minutes from the time it opens until one analysis runs and gives results. When using it at my desk, I usually read a book while waiting for each step to execute. For teaching in a lab, it is just about useless from what I have observed. [And kudos to those brave souls who tried.]
4. It is unreliable. Even while writing this blog on the cool stuff it does, I could not get it to come up to do the cool stuff.
So…. EM is like Jennifer, because:
1. It will no doubt be awesome when it is all grown-up
2. It is worth waiting around for, and
3. The growing pains in the meantime can be REALLY irritating (oh, you have no idea).
SAS Enterprise Guide as a Magic Wand
June 7, 2010
You know those giveaways you get at conferences? The favorite one I ever saw, which I did not get because they had run out, was a magic wand. It was wand-shaped, with sparkles floating inside. They had a vase full of them, with a note that this was the magic wand some people seemed to want you to wave to make all of the bugs disappear.
It did not have sparkles, much to my disappointment, but SAS Enterprise Guide actually made all of my data problems disappear today and I was happy.
Here is my problem – I downloaded a dataset from ICPSR (Interuniversity Consortium for Political and Social Research) that had hundreds of variables, each of which had a user-defined format and a name like V12345. I did not want hundreds of variables. I actually only wanted (I thought) 21.
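So, first I did this in SAS 9.2, reading in the .stc file with PROC CIMPORT and using the NOFMTERR option so I didn’t get format errors:
* Point the IN library at the folder holding the ICPSR download ;
Libname in "E:\DS0008" ;
Filename readit 'E:\DS0008\25422-0008-Data.stc' ;
* Unpack the transport (.stc) file into the IN library ;
proc cimport infile = readit library = in ;
run ;
* Keep going even though the user-defined formats are not available ;
options nofmterr ;
* Keep only the 21 variables of interest ;
data in.iom ;
set in.da25422p8 ;
keep caseid v4259 - v4267 v4240 v4253 v4254 v4255 v4241 v4116 - v4121 ;
run;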
BUT … I still wanted to rename all of these variables and change the formats. I closed SAS and opened up Enterprise Guide.
Under EDIT, I turned off PROTECT DATA. Then, for each of the variables, I right-clicked on the column (actually ctrl-clicked, since I was using a Mac) and selected Properties. This was very efficient for me because I was not actually sure these were the exact variables I wanted, and when I saw the labels I could delete some right then. I changed the names, labels and formats.
I did not have to do a proc contents, write a drop statement for the variables I didn’t want, a rename statement for the variables I wanted to rename, a label statement for the variables I wanted to relabel and then an attrib statement or some other method of changing the format.
Then, I opened a code window and wrote a few lines to change the -9 value to . (missing) for all numeric variables, so it didn’t throw off my calculations.
Because I was very curious, I selected TASKS > CHARACTERIZE DATA to take a look at what I had. It was kind of sad, really. These data are from a longitudinal study of youth, and the particular variables I had were from their senior year of high school. The sad part was the great disparity between the percentage of students who said they expected to go to a four-year college and the percentage who actually will. Because this was the interesting part, I went to TASKS > MULTIVARIATE > CORRELATIONS (yeah, I wouldn’t have put correlations there, either, but whatever). In short, mother’s and father’s education both relate significantly to every positive educational outcome you can imagine, but mother’s education matters more.
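In SAS, a common way to do this is an ARRAY over _NUMERIC_ with a DO loop. The same logic as a minimal Python sketch, assuming each record is a dict of variable name to value and that -9 is the only missing-value code (some datasets use more than one, so check the codebook). The record values are hypothetical:

```python
MISSING_CODE = -9  # the "no answer" code described above; check your own codebook

def recode_missing(record, code=MISSING_CODE):
    """Copy one record (variable name -> value), replacing every numeric value
    equal to `code` with None, Python's closest stand-in for SAS's '.'."""
    return {
        name: None if isinstance(value, (int, float)) and value == code else value
        for name, value in record.items()
    }

record = {"caseid": "A1", "v4259": 3, "v4260": -9}
print(recode_missing(record))  # {'caseid': 'A1', 'v4259': 3, 'v4260': None}

# Why it matters: a -9 left in place drags an average down instead of being excluded.
scores = [3, 4, -9]
valid = [s for s in scores if s != MISSING_CODE]
print(sum(scores) / len(scores), sum(valid) / len(valid))
```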
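For readers who want to see what the correlations task computes, here is Pearson’s r in plain Python, dropping any pair where either value is missing. The variable names and values below are hypothetical, and unlike Enterprise Guide this sketch does not compute p-values:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists, skipping any pair
    in which either value is missing (None)."""
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation has no correlation")
    return sxy / sqrt(sxx * syy)

# Hypothetical values: mother's years of schooling and an educational-expectation score
mother_ed = [12, 16, 12, None, 18, 14]
expects = [2, 4, 3, 4, 4, 3]
print(round(pearson_r(mother_ed, expects), 3))
```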
I right-clicked on the dataset in the Process Flow window and picked Export, to export it to Excel, since the person I am working with on this project does not have SAS on her computer.
Okay, it’s past 1 a.m., and even though it is supercool that I was able to at least look at my data somewhat tonight, I need to go to bed so I can get up tomorrow and work to buy the world’s most spoiled twelve-year-old whatever she decides she wants next. Today it was a 32GB iPhone, but the fact that my sixth-grader has in her pocket more computing power than existed in the world when her grandparents were twelve, well, that’s another story.
SAS ENTERPRISE MINER NOT WORKING? HERE’S WHY (maybe)
June 2, 2010
If I had time, which I don’t, I would start a series of how-to articles for statistical software and copy the Car Talk scale they use as a guide for whether or not you should attempt a job yourself, from
a. There are two kinds of screwdrivers?
to
e. I have built a working nuclear reactor out of wood
I was very excited when I heard that SAS On-Demand was going to offer a cloud version of Enterprise Miner for use in teaching, for free, even. “Was” is the key word in that sentence. Should you do this yourself? Well, it depends. This is very far past an “e” on the do-it-yourself scale. Do you remember the part in Iron Man where the guy built the Iron Man superhero suit out of scrap parts while trapped in a cave? Well, if you’re that guy, you can do it yourself.
Sigh. I can discern from the fact that you are still reading this that you are not going to listen to me and you are going to try anyway. Yeah, I didn’t listen to me either. There are approximately 3,476 steps in getting Enterprise Miner to work. Let’s assume you have a SAS profile, you logged in, you have a user name and password for SAS on-demand and you have set up a course or someone has registered you for a course. If you are lost already, go here:
http://support.sas.com/ondemand/index.html
This page has straightforward instructions and all of the information you need to set up your account. If you set up your account and Enterprise Miner does not work, as in, fails to start, your problem may be that you have the wrong version of Java enabled. You may have been fooled by the system requirements for Enterprise Miner, which said: {Warning: incorrect information between the lines}
==========================================================================
“System Requirements for SAS® Enterprise Miner™
Operating System(s): Any system that supports the Sun Microsystems Java Runtime Environment (JRE). Typically, this includes Unix, Linux, and various Microsoft Windows operating systems, such as Microsoft Windows XP and Microsoft Windows Vista.
Macintosh operating systems are not officially supported. For information about a possible workaround that you could test, see SAS Usage Note 18131.
Java Runtime Environment: Java Runtime Environment (JRE) version 1.6.0_15 or greater.”
=========================================================================
NOT EXACTLY !!!
Do not be fooled into thinking that everything you need to know about system requirements is on the page you get when you follow the System Requirements link.
After you log in to your SAS on-demand account and click on a link to install your software you will see a link about configuring your system for Enterprise Miner. CLICK ON THIS LINK AND READ EVERYTHING OR YOU WILL BE SORRY.
http://support.sas.com/ondemand/emconfig.html
- You may be tempted to skip over the part about the Java Run Time environment because you just read the part above under systems requirements and you met those. Do not do that.
- You may be tempted to go to the Sun site and download the latest JRE version. Do not do that either.
Do ALL of the stuff on this page linked above.
Open a command prompt (cmd) and type javaws -viewer.
If you don’t have JRE 1.6.0_18 enabled (and who does?), go to the link on that page and download it. It is NOT the latest version.
Follow the directions on the page that I told you to read every word of, and uncheck all of the other versions you may have installed so that only 1.6.0_18 is enabled.
Now … try starting SAS OnDemand for Academics: Enterprise Miner by going back to that page and clicking on the second link. It should start.
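Part of the trap is that “1.6.0_15 or greater” and “exactly 1.6.0_18” are different requirements. A small Python sketch that parses old-style Java version strings (the 1.6.0_NN form only; the helper names are mine) and checks both, comparing numbers rather than raw strings:

```python
import re

def parse_java_version(text):
    """Parse an old-style Java version string like '1.6.0_18' into (1, 6, 0, 18)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)_(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())

def meets_minimum(version, minimum="1.6.0_15"):
    """True if version is at least the published minimum."""
    return parse_java_version(version) >= parse_java_version(minimum)

def matches_required(version, required="1.6.0_18"):
    """True only for the exact version the configuration page asks for."""
    return parse_java_version(version) == parse_java_version(required)

for v in ["1.6.0_15", "1.6.0_18", "1.6.0_20"]:
    print(v, meets_minimum(v), matches_required(v))
```

Comparing the raw strings would claim that 1.6.0_9 is newer than 1.6.0_15, which is why the sketch converts each version to a tuple of integers first.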
Patience is a virtue.
Enterprise Miner can be really slow. At first, I thought it wasn’t working. I switched to a better connection and a faster computer (it wasn’t hard: I had to roll my desk chair a few feet, but being the finely tuned athletic machine that I am, I managed). My advice: if you have several computers, use the best one for this. For a lot of things, the speed of your connection and how much RAM you have may not make a difference. This is not one of those things.
Getting Your Data into SAS Enterprise Miner
But… you have no data, do you? Your problem may be that you are not an instructor. Only instructors can upload data to the course. If that’s your problem, there’s not much I can do for you. If you are an instructor, go to the instructor home page > course information. Scroll down and you will find, in about the middle of the page, instructions on how to upload your data. You can use any FTP program. In fact, even though Enterprise Miner does not run on the Macintosh, my data happened to be on a Mac, and I uploaded it using Fetch. It worked fine.
If your data DON’T upload fine, check the settings on your FTP program. A lot of organizations have set the default to be SFTP. SAS didn’t seem to like this. I changed it to FTP and my data uploaded happily away.
If you upload a SAS data set, then you and your students will be able to access the data using the LIBNAME statement shown below. You’ll want to include the access=readonly parameter to prevent your students from modifying the data.
libname mydata "/courses/BLAH/BLAHBLAH/THISCOURSE/saslib" access=readonly;
The BLAHs will be replaced with your course specific information. If you are teaching more than one course, when you upload your data and when you use the libname statement, be sure you include the name for THISCOURSE. Otherwise, you won’t be the first professor to have uploaded the data for the wrong course and have a class of very confused students. You won’t be the last, either.
Okay, you have uploaded your data to your directory and opened Enterprise Miner. Now what?
Create a new project. Go to FILE > NEW > PROJECT. Give it a name. I named mine Joe. On second thought, I should have named it Bob, because when you spell it backwards, it’s still Bob.
Open the program editor window. I thought when I went to the FILE menu and picked NEW I would have the option for program, code or something. No.
See that little thing that looks like the program editor window that you wouldn’t have noticed if you weren’t specifically looking for it? That’s it.
Run the Libname statement above, replacing the BLAHs and THISCOURSE to match your actual directory.
Okay, it is running: you have data uploaded, a project open and your library available within the project. The next thing I would do is click on the Help menu (honest) and start reading whatever interests you, like Getting Started. Unlike most documentation, which reads like someone pasted a web page into Babelfish, this is actually easy to follow, well written and less boring than watching paint dry.
I now have Enterprise Miner working on THREE computers, two using the on-demand version and one with Enterprise Miner for Desktop. Someone should bring me a prize. But no one did.