So, thanks to the wonderful Heidi Johnson of SAS Tech Support, I have finally installed SAS Enterprise Miner 6.1 successfully.
I had major problems installing Enterprise Miner on any virtual machine. So, I did what I always do in these situations: I went back to the most plain-vanilla, no-frills setup I could possibly get and started with that. It took me a while to get a machine with 1GB of RAM on which I could install Windows XP. I compromised with Dennis by having it dual-boot with Ubuntu so he did not feel that our house had been completely desecrated by Microsoft. And the little one was told not to play on “Mom’s computer in the living room” lest she be corrupted. Since she has an iMac in her room and a PowerBook, she has no need for it, other than the conviction of all children everywhere that other toys are always more fun than yours.
For those hardy souls among you, here is what I found out to date:
Step 1: Get a computer running Windows XP, NOT a virtual machine.
Step 2: Download and install Service Pack 3 (required for SAS 9.2).
Step 3: Use the DVDs to create a SOFTWARE DEPOT on your computer. You cannot do a planned installation of Enterprise Miner from the DVDs.
Step 4: Go to the directory c:\Windows\system32\drivers\etc and open the hosts file.
Step 5: The file will have a line with 127.0.0.1 and localhost. Hit the tab key after localhost and type in the name of your computer. It should be something like joe57-4.living.room
Step 6: Open up the folder where you have saved your SAS Software Depot and right-click on setup.exe.
Step 7: There will be a number of windows where it is pretty obvious that you click Next. WHEN YOU GET TO THE WINDOW THAT ASKS WHAT YOU WANT TO DO (PLANNED INSTALLATION, which is the first option, INSTALL SAS SOFTWARE, etc.), click on PLANNED INSTALLATION.
Step 8: Now, at this point you might think that you would browse to the plan files folder on your DVD and select a plan file. That won’t work. The plan files folder is empty. Click the button next to USE A STANDARD PLAN FILE.
Step 9: If, like me, you are installing Enterprise Miner and Text Miner on one machine, you may think that you should select the file that says “Enterprise Miner and Text Miner, one machine”. That would be wrong. Keep scrolling. What I actually wanted was Enterprise Miner FOR DESKTOP and Text Miner, one machine.
Step 10: Click Next a few more times; your system requirements are updated, about 70 different things are installed, and SUCCESS! I now have Enterprise Miner 6.1 installed and working on one computer.
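Step 5 is the only step where you edit a file by hand, so here is that edit spelled out. This is a hypothetical Python helper, not anything that ships with SAS: it works on the text of a hosts file, finds the active line mapping 127.0.0.1 to localhost, and appends your computer’s name after a tab. The function name and the sample hostname are made up for illustration; if you adapt it to the real file, back that file up first.

```python
def add_hostname(hosts_text: str, hostname: str) -> str:
    """Append hostname (after a tab) to the active 127.0.0.1 localhost line.

    Hypothetical helper that works on a string; it never touches a real file.
    Line endings in the result are normalized to LF.
    """
    out_lines = []
    for line in hosts_text.splitlines():
        # Split off any trailing comment so the name is not added inside it
        body, sep, comment = line.partition("#")
        fields = body.split()
        if (
            len(fields) >= 2
            and fields[0] == "127.0.0.1"
            and "localhost" in fields[1:]
            and hostname not in fields  # don't add the name twice
        ):
            line = body.rstrip() + "\t" + hostname + (" " + sep + comment if sep else "")
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


sample = "# sample hosts file\n127.0.0.1       localhost\n"
print(add_hostname(sample, "joe57-4.living.room"), end="")
```

Comment lines and lines for other addresses pass through unchanged, and running it a second time does not add the name again.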
After that, I took the next step, which was to try to reproduce these same steps on another computer. Since I did not have another Windows XP machine around, I tried to install it on a virtual machine on my Mac. That did not work. The installer just quit after the first couple of screens. That VM is pretty flaky anyway. The only Vista machine I had handy was 64-bit (I don’t have a license for EM on that operating system), and the rest of the Windows machines were running Windows 7, where I could not find a system32 folder. So, for now, I have managed to install Enterprise Miner on one machine, running 32-bit Windows XP. I have just started using it. So far, everything seems to be working fine. So fine, in fact, that I am a bit pouty about not being able to spend more time playing with it.
It is really funny that, now that I have Enterprise Miner installed, structural equation modeling with AMOS, which is what I am going to be working on this afternoon, seems weak by comparison! It seems that what my husband says about me is true: I am never satisfied. (Don’t read anything into that!)
Cluster analysis is one of those techniques I don’t get to use very often. About once every couple of years, someone will be doing a study of types of companies, patients or clients and need a cluster analysis. The best description of cluster analysis I have read came from a book by Kaufman & Rousseeuw, many years ago, that began,
“Cluster analysis is the art of finding groups in data.”
It falls into that gray area between descriptive statistics, which asks how many people like programming and Twizzlers, and inferential statistics, which asks whether programmers and non-programmers differ in their daily consumption of Twizzlers by more than we would expect by chance alone (say, at the .05 level).
Cluster analysis is usually an exploratory method and is part of what the young ‘uns now call data mining.
However, it can also be confirmatory, in a hypothesis-testing sort of way. Say I hypothesize that there are three groups of people with eating disorders (anorexics, bulimics and anorexic-bulimics) and that they differ in their treatment. I can classify people into those three groups using a cluster analysis, then do an ANOVA or MANOVA on the clusters to see if there are in fact significant differences among clusters in days hospitalized, total inpatient costs, total outpatient costs or other variables of interest.
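For the one-outcome case, that confirmatory step is a plain one-way ANOVA with cluster membership as the grouping factor. Here is a minimal Python sketch of the F statistic (between-cluster mean square divided by within-cluster mean square), assuming each cluster’s values on one outcome, such as days hospitalized, are already in a list. The numbers are made up, and you would still look up the p-value in an F table or let your statistical package do it.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, (df_between, df_within)) for a one-way ANOVA over the groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of clusters
    n = len(all_values)   # total number of people
    # Variation of cluster means around the grand mean, weighted by cluster size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individuals around their own cluster mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


# Made-up days hospitalized for three hypothetical clusters
anorexic = [10, 12, 11]
bulimic = [4, 5, 6]
both = [15, 14, 16]
f, df = one_way_anova_f([anorexic, bulimic, both])
print(f"F = {f:.2f} with df = {df}")
```

With several outcomes at once you would move to MANOVA, which this sketch does not attempt.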
Personally, when I think of cluster analysis, the first type that always comes to mind is k-means, a partitioning method. I suppose the fact that a type actually comes to mind at all says a little about my level of weirdness. The second type, in case you were dying to know, is fuzzy clustering, because it is something I have been pondering lately. Fuzzy clusters are NOT, contrary to the vicious rumors spread by my enemies, what can be found under my bed because I last cleaned sometime during the Mesozoic era, but rather a method where observations are allowed to belong to more than one cluster at once. Can people be only anorexic or only bulimic, or can they fall in both groups? Fuzzy clustering says they can fall in both. K-means partitioning says no, although you can perhaps have a third group of people who are anorexic-bulimic.
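To make the difference concrete, here is a Python sketch using fuzzy c-means, one common fuzzy clustering method (I am not claiming any particular package uses it by default). Given fixed cluster centers, each observation gets a membership between 0 and 1 in every cluster, computed as u_j = 1 / Σ_k (d_j / d_k)^(2/(m−1)) with fuzzifier m = 2, and the memberships sum to 1. The k-means style assignment, by contrast, gives each observation to its single nearest center. The centers and points are made-up two-variable examples.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster
    # (split equally if it sits on several identical centers).
    on_center = [i for i, d in enumerate(dists) if d == 0]
    if on_center:
        return [1.0 / len(on_center) if i in on_center else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]


def hard_assignment(point, centers):
    """K-means style: index of the single nearest center (ties go to the first)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


centers = [(0.0, 0.0), (4.0, 0.0)]              # two hypothetical cluster centers
print(fuzzy_memberships((1.0, 0.0), centers))   # mostly cluster 0, a little cluster 1
print(fuzzy_memberships((2.0, 0.0), centers))   # halfway between: equal membership
print(hard_assignment((1.0, 0.0), centers))     # k-means: all or nothing
```

The point halfway between the two centers is the interesting one: fuzzy clustering says it is half in each group, while k-means has to put it in one or the other.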
Rambling note: When Maria came home for Christmas after getting her first job as a sportswriter, someone asked her if she had a favorite sport. She responded:
“Well, since Sports Illustrated is paying me to write about football, this week my favorite sport is football.”
Similarly, since I am meeting with someone tomorrow on how to do a cluster analysis with Stata, it has now become my favorite software for cluster analysis.
How to do it:
Either from the STATISTICS menu select MULTIVARIATE ANALYSIS > CLUSTER ANALYSIS > CLUSTER DATA > KMEANS
and then from the pop-up window select the number of groups and the variables
OR type the following in the command window
cluster kmeans list-of-variables, k(#) measure(type) start(start_option)
FOR EXAMPLE, if I wanted to use data on anorexia and bulimia to look for two groups, I would type this:
cluster kmeans fast binge vomit purge hyper mens weight, k(2) measure(L2) start(krandom)
L2 is Euclidean distance, which is the default dissimilarity measure, and krandom starts with k observations chosen at random as the initial group centers. The output of cluster analysis in Stata might be disconcerting to some people by virtue of the fact that there really isn’t any. It will come back and say something singularly unenlightening like “cluster name: _clus_1” and that’s it.
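For anyone wondering what is going on behind that terse output, here is a minimal Python sketch of the basic k-means idea. It picks k observations at random as the starting centers (roughly what start(krandom) describes), assigns every observation to the nearest center by Euclidean (L2) distance, recomputes each center as the mean of its members, and repeats until no one changes cluster. This is a textbook version, not Stata’s actual implementation, and the data are made-up scores on two variables.

```python
import math
import random


def kmeans(data, k, seed=None, max_iter=100):
    """Basic k-means with Euclidean distance; returns (labels, centers)."""
    rng = random.Random(seed)
    # Starting centers: k observations drawn at random without replacement
    centers = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest center
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # nobody changed cluster, so we have converged
        labels = new_labels
        # Update step: each center becomes the mean of its current members
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Made-up scores on two variables for six people
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centers = kmeans(data, k=2, seed=42)
print(labels)
print(centers)
```

The centers returned at the end are the per-cluster means of each variable, the same kind of information the keepcenters option appends to your Stata data. Because the starting centers are random, different seeds can number the clusters differently.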
The first thing I recommend adding to your cluster analysis command is the keepcenters option, so my command looks like this:
cluster kmeans fast binge vomit purge hyper mens weight, k(2) measure(L2) start(krandom) keepcenters
I am assuming you have a relatively modest number of observations, like I do, so you can open up your Stata data file in the Data editor and take a look at it. When I scroll down to the bottom I find that the last two observations are the means for each of the variables for the two clusters. My first cluster has very low values for body weight, moderate values for absence of menses, and high values for fasting, binging, vomiting, purging and hyperactivity. My second cluster has medium body weight, medium scores on hyperactivity, low values for amenorrhea and fasting, and high values for binging, vomiting and purging.
It is starting to appear that I have two groups, anorexics and bulimics. My next step in this exploration would be to use the tabstat command with the by() option, grouping on the cluster variable, to see if other expected (or unexpected) differences emerge.
Currently, there has been an unexpected emergence of my daughter, Jenn, who is neither anorexic nor bulimic, in the lobby downstairs, so we are off to conduct our own experiment to explore whether Chardonnay is best grouped with angel hair pasta primavera or whether Pinot Noir fits better in this cluster.
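In Stata, that step would be something like tabstat with by() naming the cluster variable (for example, _clus_1) and statistics(n mean sd). The same by-group idea as a short Python sketch: collect an outcome by cluster label and report n, mean and sample standard deviation for each cluster. The outcome (age at onset) and all the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_cluster(labels, values):
    """n, mean and sample SD of `values` within each cluster label."""
    groups = defaultdict(list)
    for lab, v in zip(labels, values):
        groups[lab].append(v)
    return {
        lab: {
            "n": len(vs),
            "mean": mean(vs),
            # A one-person cluster has no sample SD
            "sd": stdev(vs) if len(vs) > 1 else float("nan"),
        }
        for lab, vs in sorted(groups.items())
    }


# Hypothetical cluster labels and age at onset for six people
clusters = [1, 1, 1, 2, 2, 2]
age_onset = [14, 15, 16, 19, 21, 23]
for lab, summary in summarize_by_cluster(clusters, age_onset).items():
    print(lab, summary)
```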
I’ll let you know.
When you’re a student arguing with a professor about some topic in the field, there is always the presumption that you’re wrong and the professor is right. While statistically, I would say the odds do favor that position, it seems dramatically unfair to the student at the time, especially if she is not inclined to grant the authority to the professor’s position.
While I generally had a terrific educational experience, undergraduate and graduate, and ungratefully took it all for granted until much later in life when I realized this was not everyone’s experience, there is still one area that sticks out just because it was unusual. The last time I remember arguing about this was probably my senior year of college. The professor repeated the same party line that we had been given throughout our business education – a good manager can manage anybody. You don’t need to know the business to manage; you only need to know how to motivate people. The analogy we were given over and over was of a man driving a carriage in Central Park. He has never been the horse; he couldn’t do the horse’s job. That is not what matters. What matters is that he uses a carrot or stick to motivate the horse. He is oh so much smarter and more talented than the horse. The man does everything else – marketing, accounting – and, of course, is entitled to all the profits other than the minimal amount needed to sustain the horse.
Although I eventually learned the futility of arguing with my professors, I did not buy this argument then and I certainly don’t buy it now.
It seems no coincidence to me that all of the software companies that continue to be successful these days – SAS, Microsoft, Apple, Google, Twitter – continue to be run by people who could debug a program, write design specifications, replace the memory in their computer and more – ON THEIR OWN. That is the really key phrase: not that they do those things all on their own any more, but that when they ask someone to do it, they understand what they are asking. Over the years, I have had the good fortune to work with some managers who had vast knowledge of technology, software and/or statistics. If I had a problem I could not solve, I could go to them and sometimes they had an answer. Even if not, they understood the question. I have also worked with people who put both replacing memory and writing an application in the same grey area of “computers”. The nadir of these was a manager who would regularly come to our offices and plead with us to “program faster”. This just made me laugh, although more than one of my colleagues took to either profanity or drinking.
Tip for those who don’t know – to replace the memory, you open up the computer, take out the old memory and put in the new one. To write an application you meet with people who will use it, get their input, design a prototype, run that by them, code the prototype, debug it and swear, show the results to the end user, make the changes they forgot to tell you about the first time, debug it and swear some more, show them the end result, walk around the building trying to come up with a way to do the seemingly impossible thing they want now, followed by another episode of coding, debugging and swearing. Eventually, you have your application where users do something and the computer runs a report, produces a graphic, throws up a web page or emails you a video of hula dancers.
I hate that line,
“I may not know how to produce our product or service X, but I know people.”
What the hell does that MEAN exactly? That you can distinguish a person from, say, a naked mole rat or a zebra dressed up in a person costume?
I can barely abide those senior managers who in a meeting try to show their personal knowledge of their employees by asking me about my family and how my daughter is doing training for her third Olympics. So far, I have resisted the temptation to respond,
“She is doing fine, but my other daughter had a relapse after her 47th time in rehab and mowed down 14 people in Starbucks with a chain saw.”
Yes, that is mature of me.
I would be far more impressed if upper manager Y had any idea what Project X entails, because then it would be immediately obvious that at least 60% of the other people in the meeting added nothing, e.g., the project manager whose only task is to see that the meeting has an agenda and minutes that are emailed to upper management as proof that he/she is doing something. At this point, upper manager Y could send the various extraneous people off to do something useful and, if they aren’t capable of actually doing anything useful, get rid of them, thus making the unit or company far more profitable.
So, no, I still don’t agree with what I was taught in business school. I think people who know software inside and out are better at running a software company. People like me who live and breathe statistics, who can tell you how to do a mixed model in SAS, an ordered logit model in Stata and how to find odds ratios in SPSS are better at running a statistical consulting company. They are better at identifying, valuing and retaining the people who are the base of their company’s profits because they are those people.
I stole that name from Chris Hemedinger for “The Missing Manual” because I thought it was hilarious. If you don’t program in SAS much then you probably did not think immediately, “Oh, . is the symbol for missing numeric data, how funny.”
In fact, you are probably more like my daughter, Maria Burns Ortiz. When she had a problem with her computer, my husband asked her,
“Did you try zapping the PRAM, because that’s the first thing I would have thought of.”
“No, the first thing I thought of was drop-kicking it.”
Of course, Maria now works for ESPN doing interviews about Olympic snowboarders who get girls to kiss their medals while we are here discussing SAS. So, we may debate who was the wiser.
The unpredictability of XPT files
XPT files, a.k.a. transport files, used to be the way to move SAS files between operating systems, say, from a Windows machine to Unix. Nowadays, you don’t need to do a thing. You can just use FileZilla to download your SAS file and it pops right open. My computer complains that the file was created on another operating system, but everything works just fine, so I ignore it. There may still be cases when you want to use XPT, though.
One thing XPT files are really cool for: let’s say you are like me and you DON’T have a metadata server set up, but you would still like to use the SAS files you have in JMP. Simply do this:
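/* Folder that holds the SAS data set you want to export */
LIBNAME in "/home/directory/subdirectory/" ;
/* XPORT engine writes a V5 transport file; data set and variable names are limited to 8 characters */
LIBNAME outfile xport "/home/directory/subdirectory/filename.xpt" ;
PROC COPY IN = in OUT = outfile MEMTYPE = DATA ;
SELECT filename ;
RUN ;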
You can download that .xpt file and open it up in JMP. Pops right open and all is hunky-dory.
Incidentally, you can also open it in Stata using the fdause command.
XPT decides to be difficult
Very oddly, XPT files do NOT pop right open in SAS Enterprise Guide, nor in SAS. I actually needed to write a few lines of code, taking up time that I could have used to eat a jelly bean.
I am quite upset by this because I am inordinately fond of jelly beans. Here is what I needed to write:
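/* Point the XPORT engine at the transport file */
libname in xport "e:\filename.xpt" ;
/* allmem is the member (data set) name stored inside the transport file */
data newfile ;
set in.allmem ;
run ;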
Note something here. The member that you want may not be the same as the name of the file. What if, as often happens to me, the dataset was left on your desk in a brown paper bag labeled “Beware of the Leopard”, or created by your former colleague who was escorted from the building on what he was told was his last day murmuring that you jerks would be first against the wall when the revolution came. What then?
If you are using SAS 9.whatever it is easy. First, run just that first line to assign your library, then click on the Explorer tab, double-click on the Library named IN and you’ll see the member name.
If you are using Enterprise Guide, you may need to hunt a little to find the library, because it was misplaced while its author was lying on the beach drinking wine and not watching it better.
From the PROGRAM menu select NEW PROGRAM and run the same LIBNAME statement.
In the menu at the bottom left, click on the third button to bring up your servers. In my case, the file is on the local server, so I click on Libraries, then the library name, and I spot the name of my file.
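If clicking through the Explorer or the server list isn’t an option, the member names can also be read straight from the file. This Python sketch is not a SAS feature, and it assumes the file uses the classic version 5 transport layout, in which each data set is introduced by an 80-byte “DSCRPTR HEADER RECORD” line and the data set name occupies bytes 9 to 16 of the 80-byte record right after it. Files written in the newer V8 transport format use different headers and will not be found by this sketch. The function name and the file path are hypothetical.

```python
DSCRPTR = b"HEADER RECORD*******DSCRPTR HEADER RECORD"
RECORD = 80  # V5 transport header records are 80 bytes long


def xpt_member_names(raw: bytes):
    """Return the data set (member) names found in the bytes of a V5 transport file."""
    names = []
    pos = raw.find(DSCRPTR)
    while pos != -1:
        # The descriptor record follows the DSCRPTR header record
        descriptor = raw[pos + RECORD: pos + 2 * RECORD]
        # Bytes 9-16 (1-based) hold the 8-character, blank-padded data set name
        names.append(descriptor[8:16].decode("latin-1").strip())
        pos = raw.find(DSCRPTR, pos + RECORD)
    return names


# Usage with a hypothetical path:
# with open(r"e:\filename.xpt", "rb") as f:
#     print(xpt_member_names(f.read()))
```

The name it prints is the one you would put after in. in the SET statement.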
Weird facts about SAS 9.2 on Windows 64
Extremely strange but true: if you are running SAS 9.2 x64 on Windows 7 on a virtual machine created with VMware, it only works if the VM was created using the “more isolated” option. If your VM was created using the “more seamless” option, SAS will not run. (A less ungrateful person than me would remember the name of the person at SAS Tech Support who told me this.)
If you already created your VM using the more seamless option, you have only two choices. Option A: if, as in my case, there wasn’t really anything of importance on your virtual machine, just delete it and create a new one using the more isolated option. Install SAS 9.2 on that and it runs just fine. Option B is to keep your existing VM, create a second virtual machine using the more isolated option and install SAS on that, because SAS is NOT going to install on the one you have, so just deal with it. It really is not a big deal having two VMs for different uses. I have six, with six different operating systems.
Oh, speaking of SAS 9.2 for Win 64, there is a known problem with this version on 64-bit Windows, both Vista and Windows 7: the Import Data option on the File menu does NOT read in Excel files. You can still import Excel files with code like this:
/* Workaround: import an Excel sheet with code instead of the Import Data wizard */
PROC IMPORT OUT = sasdatasetname
DATAFILE = "drive:\directory\filename.xls"
DBMS = XLS REPLACE ;
SHEET = "sheet-name" ;
RUN ;
Exporting works similarly, using PROC EXPORT. You can find more information at http://support.sas.com/kb/33/228.html
Disappearing CFG file: Another note on SAS on VM
If you have two hard drives defined, say D: and E:, you may find that your VM won’t recognize certain things, like a USB drive, because it assumes that drive will be E:, but E: is already taken. So, trying to be clever, you rename your drive to something else, say H:.
However, the next time you start your computer, SAS doesn’t work. In fact, it gives you a message like E:\ .. \SASV9.CFG is unreadable.
Of course it is, because this drive is now named H! The simplest fix (other than not having named it E: in the first place or only having one virtual disk, but it is too late for that now) is to go to Control Panel, select Administrative Tools and then Computer Management. Select Storage and then Disk Management. Right-click on the drive you need to change and select Change Drive Letter and Paths. Change it back to the letter you had changed from. If you don’t remember which letter that was, look at the error message you got. Whatever drive it names, in this case E:, is the one you changed from. Change it back and restart your computer.
Don’t change the drive letter for the disk that SAS is installed on again.