SAS Enterprise Miner – Free, on a Mac – Bet You Didn’t See That Coming … but how the hell do you get your data on it?

I wanted to test SAS Text Miner and was surprised to find the university did not have a license. No problem – and it really was, astoundingly, no problem – I had SAS On-Demand Enterprise Miner on a virtual machine using VMware.

I had installed it thinking, “This probably won’t work but what the hell.”

Here are all the links on this blog on getting SAS Enterprise Miner to work in all of its different flavors, because I am helpful like that.

Let me emphasize that you just better have the correct version of the Java Runtime Environment (JRE). Don’t say I didn’t warn you. And after you have it running, whenever Java asks if you want to update, give it a resounding, “NO!”

So, surprisingly, running Windows 8.1 Pro on a 4GB virtual machine, it pops open no problem.

Okay, now how to find your data.

Turns out that even if you have SAS Enterprise Miner you need to use SAS Studio to upload your data. So, you go to SAS Studio, on the top left hand side of your screen, you see an UP arrow. Click on that arrow and you will be prompted to upload your data.

Not so fast …. where do you want to put your data?

You can only upload the data if you are a professor, but since I am, that should be no problem. There is also a note on my login page that says:

The directory for all of your courses will be “courses/lalal123/”.

The LIBNAME for your courses should be

LIBNAME mydata "courses/lalal123" access=readonly;

Except that it isn’t. In fact, my course directory is something like

“courses/lalal123/c_1223”

I found that out only by calling tech support a few months ago. Now, when I look on the left window pane I see several directories, most of which I created, and a few I did not. One of the latter is named my_content. If I click on the my_content directory I see two subdirectories

c_1223

and

c_7845

These are the directories for my two courses. How would you have known to look there if you didn’t call SAS tech support or read this blog? Damned if I know, but hey, you did read it, so good for you.

If you leave off the subdirectory … say you actually followed the instructions on your login page and in your start code had this:

LIBNAME mydata "courses/lalal123" access=readonly;
run ;

Why, it would run without an error but it would show your directory is empty of data sets, which is kind of true because they are all in those subdirectories whose name you needed to find out.
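Here is a minimal sketch of the fix, assuming a course directory laid out like the one above. The paths are the made-up ones from this post, so substitute your own. Listing the members of the library is a quick way to check that the LIBNAME points somewhere that actually holds data sets.

```sas
/* Include the course subdirectory that shows up under my_content.  */
/* Both parts of the path are placeholders; use your own.           */
libname mydata "courses/lalal123/c_1223" access=readonly;

/* List the data sets in the library without printing the variable  */
/* details for each one. If nothing is listed, the path is probably */
/* missing the subdirectory.                                        */
proc contents data=mydata._all_ nods;
run;
```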

So …. to recap

1. Use SAS Studio to upload the data to the directory you are using for your SAS Enterprise Miner course. (Seems illogical but it works, so just go with it.)

2. In the start code for your SAS Enterprise Miner project, have the LIBNAME statement including the subdirectory which is under the my_content directory.

Once you know what to do, it runs fine. You can access your data, create a diagram, drag the desired nodes to it.

I’ve only been using this for testing purposes for use in a future course. For that it works fine. It is convenient to be able to pull it up on a virtual machine on my Mac. It is pretty slow but nowhere near as bad as the original version years ago, which was so slow as to be useless.

If you teach data mining – or want to – and your campus doesn’t have a SAS Enterprise Miner license, which I believe is equivalent to the cost of the provost’s first born and a kidney – you definitely want to check out SAS On-demand. It’s a little quirky, but so far, so good.

 

 

Captain Obvious wearing her obvious hat


Maybe this is obvious, but I have often found that what is obvious to some people is not so obvious to others, so here are a few random tips.

1. Enterprise Miner can take a REALLY long time to load during which you wonder if anything is happening at all.

task manager

Open up the task manager and look for something that says javaw.exe *32. You can see it near the bottom in the image above. The number next to it should be going up, from 30,000 to 50,000, etc. If it is, you should probably be patient for a few more minutes and your session will start.

2. Let’s say you want to change the properties of something. For example, I don’t want the data set to be partitioned into Training, Validation and Test in a 40, 30, 30 split. I want it to be 50, 50, 0.  So, I right-click on the DATA PARTITION node, get a drop-down menu and

diagram window with properties at left

 

there is all of this stuff, from Edit Variables all the way down to Disconnect Nodes. Where the hell are the properties to change? They’re on the left, in that window with the title Property! Funny, but it’s so easy to focus on the diagram window and completely forget about everything else. Click on a node and its properties will show up in the window.

3. While the three screens you see when you run the StatExplore node are pretty interesting, it would be nice to have a more detailed look at your data. Just go to the VIEW menu and you can get more statistics, like the cell chi-square values and descriptive statistics of numeric variables broken down by the levels of your target variable.

Menu with window options

Now that you are starting to see some of what you can do with Enterprise Miner, you’ll be wondering what MORE you can do, like decision trees, for example. I’m glad you asked that question ….

After all of the effort to get Enterprise Miner installed, I thought it better do something good. It is interesting to use. Unlike programming where you can get a program to run but give you errors or unexpected results, so far (key phrase!), with Enterprise Miner I have found the problem to be knowing exactly what to select, for example, with CREATE DATA sources. Once you know that, however, it seems pretty hard to make an error.

Goat on a mountain

Enterprise Miner does do some pretty cool stuff, which makes it worth the pain of getting it installed. Even way cooler, unlike back in the day when no one could get their hands on it without paying approximately $4,893,0893.16, their first born child, their left kidney and an albino goat, if you are an instructor or a student, you can get it for free through SAS On-Demand for Academics.

(And, yes, for the record, I *am* aware that said goat is not an albino. I was fresh out of pictures of albino goats. Deal with it.) 

Speaking of Enterprise Miner,  I thought I would ramble on about the good parts for a few posts, since I’m getting ready to teach data mining in the fall and I hate to do anything at the last minute.

One of the good parts is StatExplore. At first glance, it looks good, but at second glance, it looks better.

All you need to do is create a diagram by going to the FILE menu, then selecting NEW and then DIAGRAM.

You can start by dragging a data source on to the diagram. In this example, I used the heart data set from the Framingham Heart Study, which happens to ship with Enterprise Miner in the SASHELP library.

I drag the data set from data sources to the diagram window.

Next, I click on the EXPLORE tab just above the diagram window. This gives you a bunch of icons. Enterprise Miner is just rife with icons. Never fear, though, if you have no idea what this bunch of colored boxes is supposed to mean versus  that bunch, just hover over the icon with your mouse and it will tell you.

diagram

Here is my diagram. Simple, no?  It gives you a bunch of cool stuff. First, you have the plot of chi-square values for all nominal variables.

Chi-square plot

You can see that sex has the highest chi-square (as in gender, not as in frequency of), followed by cholesterol status, smoking status and weight status.  I find this rather surprising. I knew women lived longer than men, but with all of the discussion of obesity, I thought weight would be higher up there.
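For readers who would rather see the numbers behind the plot, here is a rough sketch in plain SAS code. It assumes the target is the Status variable in SASHELP.HEART and that the chi-square test from PROC FREQ is a fair stand-in for what StatExplore charts. I have not checked that the node computes it exactly this way.

```sas
/* Cross each nominal input with the target and request chi-square */
/* tests. The four inputs are the ones named in the plot above.    */
proc freq data=sashelp.heart;
  tables (sex chol_status smoking_status weight_status) * status
         / chisq nocol nopercent;
run;
```

Comparing the chi-square values across the four tables gives the same kind of ranking the plot shows.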

The next chart gives me the worth of each variable in predicting my target, which in this example is death.

plot of variables in order of predictive value

The variable on the far left is age at start. Not surprisingly, the older people are when you start following them, the more likely they are to die in a given period of time. The next variable is Age at CHD Diagnosis, followed by two blood pressure measures, their cholesterol, then cholesterol status – weight status is down at the end.

statistics

 

This analysis produces A LOT of statistics. This, I found interesting because despite some people arguing Enterprise Miner allows analysis by someone without extensive programming or statistics background, certainly in the case of statistics, the more knowledge you have, the better you could make use of the results.

For example, in the top right (all three of the screen shots above are one screen, I broke them up in an attempt at legibility), the output pane gives descriptive statistics broken down by each level of the target variable. I can see how many people who died had missing data for age at CHD diagnosis, skewness and kurtosis values for variables by status, living or dead, the mode for weight status for people who were living or dead, and a whole lot more. Interestingly, 68% of the whole sample was overweight.

Scrolling through the statistics output I can get a good idea of the data quality – is it skewed, is it missing, is it missing at random.
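If you want to reproduce a slice of that output by hand, something like the following would do it. This is a sketch, not what the StatExplore node runs internally. The list of variables is my own pick from SASHELP.HEART.

```sas
/* Counts, missing counts, skewness and kurtosis for a few numeric */
/* variables, broken down by the target (Status).                  */
proc means data=sashelp.heart n nmiss mean std skewness kurtosis maxdec=2;
  class status;
  var ageatstart agechddiag diastolic systolic cholesterol;
run;

/* Weight status by target level. The largest row percent in each  */
/* row is the mode for that group.                                 */
proc freq data=sashelp.heart;
  tables status * weight_status / nocol nopercent;
run;
```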

Without some background in statistics, that’s probably no more than a bunch of numbers. Personally, I found it very helpful. That’s another assignment for the students, to write a brief summary of their data, including any concerns. There weren’t any real problems with these data except for the obvious fact that variables like cholesterol and cholesterol status, smoking and smoking status are going to be highly correlated. It would be a good idea to include one of those as input in any predictive analyses and reject the other to prevent multicollinearity problems.
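One quick way to see how redundant each pair is would be to look at the range of the numeric variable within each level of its status variable. This is only a sketch, and it assumes the status variables in SASHELP.HEART were derived from the numeric ones, which is my guess from the names.

```sas
/* If the min and max of Cholesterol barely overlap across the     */
/* levels of Chol_Status, the two carry nearly the same            */
/* information and only one should be used as an input.            */
proc means data=sashelp.heart n min max mean maxdec=1;
  class chol_status;
  var cholesterol;
run;

/* Same check for cigarettes smoked versus smoking status. */
proc means data=sashelp.heart n min max mean maxdec=1;
  class smoking_status;
  var smoking;
run;
```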

(NOTE to self: Make sure to explain variable roles, changing variable roles in EM and multi-collinearity.)

You might think this is adequate for running just one node, but, in fact, there is much more here than meets the eye. More on that tomorrow because, speaking of overweight, I have been at a computer for 13 hours today and I want to hop on the bike and get some exercise in before I knock out the last task I need to do today. Although @sammikes just pointed out on Twitter that round is a shape, it is not the one I want to be in.

I’m putting this here for my students this fall, but I’m sure there are two or three other people in the world who would like to know how to use Enterprise Miner. I’m assuming you read some of my other posts or received an email from your professor or in other ways got Enterprise Miner installed and running.

If not, you should read the documentation. Or, you are welcome to poke around on this blog and find out what I did. Just type “miner” into the search box.

To proceed:

 

  1. Start Enterprise Miner
  2. Create a new project
  3. Give it a name
  4. Create a new library so you have some data – File > New > Library
  5. Type in a name and your course library, something like “/courses/yourschool.edu1/a_123/b_456”
  6. Create a new diagram – File > New > Diagram
  7. Create a data source. (This strikes me as counter-intuitive, since I have the data source in the library, but whatever.) Here is how you do it:

data sources tab

  • Right-click on the data sources tab.
  • It will come up with a drop-down menu with one option, Create Data Source.
  • Pick that.
  • It will come up with this window.

select table

  • Select SAS table, which is the exact same thing as a SAS data set.
  • Click Next and it will bring up the list of libraries available, including the one you just added in the last step.

libraries

  • Double-click to select your library.
  • Select your dataset and then click OK.
  8. The next few screens give you information on your data. In my course, the first assignment is for the students to use these to answer:
  • How many variables are in the data set?
  • How many observations?
  • How many of these are nominal variables?

 

  •  Select one of the variables that is NOT nominal. Click the explore tab.
  • Write one paragraph describing these results. Include a screen shot of your results
  • Click the COMPUTE SUMMARY STATISTICS tab

your data

  • Write a one paragraph summary of these results, only hitting the high(low)lights such as 98% of the data for variable v_1980 are missing.

Obviously this isn’t a feasible assignment if you have 6,000 variables, but I try to have courses that increase gradually in order of difficulty, starting with a relatively small data set and then going to gradually larger and more complex ones.
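Students who want to double-check their answers in code could run something like the sketch below. MYDATA and MYTABLE are placeholder names, and it assumes a LIBNAME for MYDATA has already been assigned to the course directory.

```sas
/* The header of the PROC CONTENTS output reports the number of    */
/* observations and variables. The variable list shows which are   */
/* character, often a first hint about which ones are nominal.     */
proc contents data=mydata.mytable;
run;

/* N and NMISS for every numeric variable, to spot columns that    */
/* are mostly missing.                                             */
proc means data=mydata.mytable n nmiss min max mean;
run;
```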

 

Most likely, you, too, have experienced homicidal urges when confronted with a problem you have spent five hours trying to solve on your computer, only to call tech support and have them report,

Well, it works fine on my computer.

You’d think if that solved the problem that they would offer to box up their computer and send it over to your house but, alas, they never do.

This is the reason that any software I use for class I test on several computers under different conditions. After having initially failed to get SAS On-Demand for Enterprise Miner to work with boot camp on the Mac, I tried it on a Lenovo machine running Windows 8. I had to install the JRE and ignore a few security warnings, but after that it worked.

[For how I did eventually get it working with boot camp, click here, and thank Jason Kellogg from SAS. ]

Next, I needed to upload some data. The SAS instructions say to use your favorite FTP client and, coincidentally, I do have a favorite FTP client (FileZilla), so I downloaded it to the testing machine.

Only the professor can upload data to the class directory, and most professors probably have an FTP program on their personal computer (or maybe not, do you?) Even if you normally do, you may, like me, have borrowed a machine to use for testing or have a new computer. Whatever, this just reinforces my argument that you should never, never plan to use any kind of software in a class unless you have ample time to prepare.

I know that there are schools that ask adjuncts to teach on a week or two notice. That seems to me a recipe for disaster for both the professor and students. Unless maybe you are doing something that hasn’t changed in 50 years and requires no technology, like reading Chaucer, I recommend you follow the advice of Nancy Reagan and “Just say no.”

Here are my first few hints:

  1. Test the software on multiple machines and multiple operating systems.
  2. Make sure one of those machines is on the older, under-powered end of the spectrum, as students often don’t have a lot of extra cash and may not have the shiniest, newest machine like you have on your desk.
  3. Test it on the latest operating system. It may turn out that the version your school has does not work with Windows 8. (I did not have that problem with Enterprise Miner this time, but I’ve had it with other software in the past, so it is a good idea.)
  4. Find out what other software you might need, for example, some kind of FTP program in this case, and install it on your computer, if necessary.
  5. Give yourself plenty of time to do all of the above.

You might think these types of things would be handled by the information technology department at your university, and you may be really lucky and that will be so. In many schools, the IT department basically helps re-set passwords, assigns school email addresses, helps to get discounts on software and upload files to Blackboard and not much else.

For years, I have been trying to figure out where the $50,000 a year or so tuition goes. It isn’t to adjunct professors and it isn’t to the IT staff. It also isn’t  to buying the latest technology because, more and more often, students are expected to bring their own device.

You may think that none of the above should be your job and you may be right, but I am just saying if you want to anticipate the frustrations your students will experience and be able to solve their problems during the lecture by directing them to a link on your class website/ blog your life and theirs will both be a lot easier.

 

Thanks to Jason Kellogg from SAS Technical Support, SAS On-Demand Enterprise Miner is now running on my Mac using Windows 8.1 with boot camp. Here were his instructions.

Note that this is after you have a SAS profile, have registered a course and have changed the security settings in Java. Now you are here.

The steps are:
  1. Download and save jre-6u24-windows-i586.exe.
          http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase6-419409.html#jre-6u24-oth-JPR
  2. Open the Windows Run window and run
"C:\users\[userid]\Downloads\jre-6u24-windows-i586.exe" STATIC=1
          where [userid] is your user account name
  3. Click OK to start the installation
  4. After finishing the installation, on the desktop, 
right click empty area and select “Create Shortcut”
(NOTE: on Windows 8.1 this was NEW and then SHORTCUT)
  5. In the location, Browse to Desktop and click Next
  6. In the next screen provide name of shortcut, 
for example “Enterprise MinerJWS”
  7. Once the shortcut is created, Right Click and select Properties.
 In the Target enter the following:
"C:\Program Files (x86)\Java\jre1.6.0_24\bin\javaws.exe"

 https://academic93.oda.sas.com/SASEnterpriseMinerJWS/main.jnlp
  8. Click Apply

You now have a clickable shortcut to Enterprise Miner. Please use it when starting Enterprise Miner.

This worked and I now have SAS Enterprise Miner working on my laptop, which is going to be extremely convenient.

PLEASE NOTE THAT ALL OF THE QUOTATION MARKS NEED TO BE THERE OR IT WILL GIVE YOU AN ERROR.

ALSO, under #7 that is all one command. I had to break it into two lines on this blog to be legible.

 

Although it was still a huge pain in the ass to get started, it is leaps and bounds ahead of the first time I tried Enterprise Miner years ago.

 

chicken

Back then, it required back flips and sacrificing a chicken (okay, finding a machine running Windows XP, installing a bunch of files – just take my word it was a pain in the ass).  As for the on-demand version, it was so slow as to be useless.

In contrast, once I got up and running, it was not bad at all, and that was running off the wireless in the office. Now, our internet speed is good here, so your mileage may vary, but at least under good conditions it runs fine using a small dataset.

So, I just uploaded a dataset with 10,000 records and 6,000 variables. We’ll see what it does with that.


 

A few years ago, when I was at USC, I tried to get a desktop version of Enterprise Miner to run on a virtual machine on my Mac and that never happened, although I did get it working on a Windows machine I had at home.

Last week, I wrote about my failed attempt to get SAS Enterprise Miner from SAS On-demand to run on a Mac running Windows under boot camp.

Since then, I have successfully installed Enterprise Miner and started it using a Windows native machine.

Sadly, the same cannot be said for my Macs. Using boot camp on two different Macs, one running Windows 7 and another with Windows 8.1 I have had the same problems.

Be aware that if you are going to run Enterprise Miner on any operating system you are going to need at least some idea of what a C: prompt is and feel comfortable poking around things like .dll files.

You might think that this can be assumed and goes without saying if you are teaching, or even taking, a course in data mining. You would be wrong. Nothing can be assumed or goes without saying. Trust me on this.

I am not going to assume that you checked your configuration and the appropriate Java Runtime Environment is installed. If that is not the case, or you are not sure, go here and take care of that now. (See how this not assuming thing works?)

If that is taken care of, regardless of operating system, you will probably have a problem with Java security blocking the application from starting. For me, changing the Java security setting to medium fixed that on all 3 machines. I tried several other things that did NOT fix it. To find your Java security settings, you can go to the control panel (in Windows 8, search for control panel first) and then search for Java within the control panel. Click on Java, then the security tab to find the slider to move to medium.

At this point, the Windows machine worked, even though I had to click on several boxes where Java asked me was I ***SURE*** I wanted to do this.

With the Mac, though, after I click on Start SAS On Demand Software, Enterprise Miner, it downloads a main.jnlp file. When I open it, I eventually get a message that an error exists in the user services configuration. You can see screenshots here. The exact same problems occurred with both Mac computers running boot camp.

The ever-helpful Rebecca Ottesen said that two of her students using Macs last semester had the same problem and sent me an email directing me to this site.

So, I did a PROC OPTIONS in SAS, which I had loaded on my desktop, and verified that the .dll file was located where expected.

This led me to thinking, wait a minute, my students aren’t going to have SAS loaded on their computers, so what are THEY going to do to troubleshoot?
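For anyone who does have SAS installed and wants to try the same check, this is roughly what it looks like. I did not record which option I looked at, so treat the choice of JREOPTIONS here as an assumption. It is the system option where SAS keeps its Java settings, and it is a reasonable place to start looking for file locations.

```sas
/* Write the current value of the JREOPTIONS system option to the  */
/* log. Look through it for the Java paths SAS is using.           */
proc options option=jreoptions;
run;
```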

That was kind of a moot point, though, because …

When I got to step 3, I typed in the command as directed, in the exact directory directed:

C:\Program Files (x86)\Java\jre1.6.0_24>java -fullversion

I got the error message: ‘java’ is not recognized as an internal or external command, operable program or batch file.

Now, there could be any number of other things to try but the fact is, I have other things to do and the course is not for a few months. I will keep plugging away and keep you abreast here. If I do decide to go with Enterprise Miner in the fall, I am sure these posts will be helpful references for students.

I do want to advise anyone who is thinking about using the on-demand version of Enterprise Miner to be aware that you are definitely going to have at least a few problems with getting it installed, for example, the security thing, and if you have any students using boot camp, they are going to most likely hate you.

New semester coming up when I will be teaching data mining.  Because I never do anything at the last minute, I’m registering my course and testing the SAS on-demand for Enterprise Miner now.

I have learned from experience not to ignore it when the instructions say to check your configuration. You should find how to do that here.

http://support.sas.com/ondemand/emconfig.html

The first step is to open a command window and see if you have the appropriate Java Runtime Environment installed (JRE). Haven’t had to do anything from a C prompt in a while. On Windows 7, go to the start window at the bottom left of the screen and type in Command in the search box. Command prompt should pop right up.

I followed the instructions and it seemed my JRE was hunky-dory but the first time I started SAS On-Demand with Enterprise Miner it told me my Java was out of date. I went ahead and downloaded the latest version and installed it.

Your mileage may vary but total time getting up and running was about 2 minutes – but don’t get excited yet.

I clicked on start SAS Enterprise Miner and I got this message

Security warning from Java

I clicked RUN anyway but it was blocked from running.

So … I went into the control panel, typed in Java to search the control panel and in the Java security settings added the SAS on-demand login site as an exception. Still no luck.

Next, I went and changed the Java security settings from high to medium. A lot of people would not feel comfortable doing this, but I at least wanted to get Enterprise Miner to work. I could always set it back later.

At this point, I actually got Enterprise Miner to sort of start. That is, there were a couple of screens of security warnings I had to accept and then I got this error message.

error1

After this message, I got another saying the components failed to load.

failed to load

Perhaps, I thought, I should not have updated Java. So, I went back to the configuration instructions, checked to make sure I had a 32-bit JRE even though I have a 64-bit computer (check).

I went and downloaded the recommended version of the JRE from the Oracle site which required me to create an Oracle ID.

After that was downloaded and installed, I went to the Java control panel and disabled the later version so only one version of Java was installed …

and I got the same error!

 

I checked the SAS documentation and it recommended clearing the Java cache. I did that. I got further that time with lots of messages about downloading the application and verifying the application, but just when I was getting excited, it came up and asked me if I wanted to run with an older version of Java. I picked to continue with the older version. After some more security warnings, I got the same error messages as before.

So … I went through the whole circle again, cleared the cache, started again, selected the newer version of Java – and still the same messages.

It’s past midnight on Memorial Day weekend, so I’m not going to bother calling SAS technical support. I have a laptop running Windows 8, so I’m going to try installing it on that tomorrow and see if I have any better luck.

One thing is pretty clear – my students in the fall better have a lot more familiarity with computers than just pointing and clicking or they are going to have a really hard time – and I still haven’t gotten Enterprise Miner to run!

P.S. At one time, years ago, I had gotten SAS Enterprise Miner to run with version 1.6.0_18, so I tried that and it also failed to start with the same error messages.

I’m running on boot camp on a Mac.  I have a Windows laptop, so I’ll try it on that tomorrow also and see what happens.

I am suspecting this is going to be a disaster, but I’m hoping to be mistaken.

I’m in North Carolina this week at a class for professors on Advanced Predictive Modeling using SAS Enterprise Miner. This is the sort of statement that causes The Spoiled One & Co. to make faces that look like this:

14-year-olds

Despite my failure to impress fourteen-year-olds, I think the class has been well worth it.

I’m not naive, I realize that taking a class on a SAS product by a SAS instructor at the SAS facility represents the best case scenario. You know those IT people you always want to smack who say,

Well, I don’t know what’s wrong. It works on my computer.

Yeah, because your computer has maxed out RAM, the latest software and is in the same building as the server. Given that, if it doesn’t work on the IT computer, you’re totally hosed.

So, if someone else has installed it for you, you have it running on a powerful computer with the latest version (7.1) and NOT the scaled down SAS On-Demand version and it doesn’t work,  I would say Enterprise Miner is hopeless.

It’s not hopeless. It worked fine. All the problems I had were due to my own stupidity, like using a factor of .068 instead of 0.68 or forgetting to type SUM when writing a sum function and stuff like that.

On the other hand, those are the kind of problems that are quick and easy to find and fix.

There is a lot to like about SAS Enterprise Miner. Take this nifty little example from VARIABLE CLUSTERING.

Think of it as a graph of a principal components analysis.

diagram of variable clusters

You can see which variables load together very easily. Imagine explaining this to a client over a rotated factor pattern.
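If you want to poke at the same idea outside the point-and-click interface, SAS/STAT has PROC VARCLUS, which groups variables into clusters based on their correlations. The sketch below uses SASHELP.HEART, not the data from the class, and I am assuming rather than asserting that it mirrors what the Variable Clustering node does. The choice of three clusters is arbitrary.

```sas
/* Cluster the numeric variables. SHORT trims the printed output   */
/* down to the cluster summaries; MAXCLUSTERS= caps how far the    */
/* splitting goes.                                                 */
proc varclus data=sashelp.heart maxclusters=3 short;
  var ageatstart height weight diastolic systolic mrw cholesterol smoking;
run;
```

The output is tables rather than a picture, which is exactly why the diagram is the easier thing to show a client.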

The two best things about Enterprise Miner are diagrams like the one above and the enormous number of data mining procedures it offers.

The two worst things about Enterprise Miner are related to one of the best. Because it can do so much, learning it is quite complicated. I had thought I might use it in my class this fall but it is clearly going to be too much on top of what I have already planned. I am sticking with SAS Enterprise Guide. I am still toying with the idea of teaching a business school class in the spring using data mining, so if I do that Enterprise Miner is a possibility.

The worst thing about EM used to be the installation.  You had to have 1,024 GB of RAM, sacrifice a flamingo and get your registry updated by seven of the twelve apostles before it would install. I just today met the first person who told me he had the damn thing installed – and he now works at SAS, which is probably why they hired him.

Amazingly, at least for the SAS On-Demand version, that seems to be a non-issue. I was in a hotel with a crummy wireless, using my old laptop that has 32-bit Windows 7 installed on bootcamp on a MacBook Pro. Enterprise Miner 7.1 started up and ran fine once I had the latest version of Java (and only one version of Java) installed. This incidentally isn’t the Java version they say is guaranteed to work, but it did.

Besides the complication of learning and installation, I think the other big drawback of EM if you are not at a university or college (for which it is free) is that it costs about a scadzillion dollars (that’s a zillion squared plus a bushel, 40 leagues and a peck, for you Europeans reading this).

The SAS On-Demand offerings are a good first step, but I think SAS is missing an opportunity to market to people and companies who don’t have more money than a Romney Super-PAC.

So that’s the good and the bad in a nutshell. The neutral is that more SAS programming was involved than I had expected. This did not bother me in the slightest but it might perturb you if you are used to SAS Enterprise Guide where you can go pretty much forever without programming (whether you should or not is a different issue).

The class was fun, so fun that I am now seriously wondering if I can clear my schedule enough to teach a class in the spring just so that I can inflict it on some unsuspecting graduate students. I think that might be fun, too.


In the past, when I had to do any type of parsing of text, I wrote my own code with a zillion SUBSTR functions and IF statements and it did the job but it was so-o-o ugly and painful that I never even considered including text mining in any courses I taught.

I looked into SAS Enterprise Miner years ago but the commercial version costs (and this is approximate) $1,278,544,899,711,315  and your left kidney.

The SAS On-Demand version sucked. You know how some programs you can get a cup of coffee while waiting for them to run? With the original SAS On-Demand for Enterprise Miner you could fly to Colombia, work as a day laborer to earn the money to buy land, start your own plantation, breed a strain of genetically superior coffee beans and skip the country on the last plane out just before the latest government coup nationalizes your business – and your results STILL wouldn’t be available when you got back.

Having had such good luck with SAS On-Demand for Enterprise Guide last semester, I thought I’d give Enterprise Miner another look.

Oh.My.God.

Last year, The Spoiled One was in the living room with her boring parents, complaining they were watching The Daily Show with boring news when it turned out that Justin Bieber was the guest.

She must have felt like this.

The latest version is unbelievably faster. I cannot tell you if it is better because it ran so slow in the past it was impossible to tell. It is easy to use.  Let me give an example.

First, you register with SAS On-Demand and register a course for use with Enterprise Miner. This is really easy.

Second, you start Enterprise Miner, which requires nothing more than clicking on the Get Software link on your login page.

Login page. Click on Get Software

Next, create a project. Just go to FILE > NEW > PROJECT and click Next a lot. Along the way you give it a name. It’s pretty obvious.

It may not be obvious that you need to have a data source available and create a diagram. Again, it’s pretty easy to figure out, though.

Creating a data source – go to FILE > NEW > DATA SOURCE

A window pops up and the default is SAS TABLE, which is what you want if your data is in a SAS dataset (they now call them tables. I blame the damn SQL people.). Click Next.

In the next window, you browse to where your data are. Because I am just testing this for use in a class, I used the abstract data set in the Sampsio library.

So, you have a project, a blank diagram and a data source. Now what?

Text parsing:
1. Drag the icon under Data Sources onto your diagram.
2. Click on the Text Mining tab.
3. Click on the Text Parsing tab (hovering over each tab with the mouse will give you its name) and drag it to the diagram.
4. Click on the little grey stem sticking out of the end of your data source and drag it to the Text Parsing box.
5. Now, right-click on the Text Parsing box and from the drop-down menu, select RUN.

First 4 steps in text parsing illustrated

After a bit, it will come up with a window that has two choices, OK and Results. Click on Results. The most interesting bit in the results, I think, is the table of frequency for each word. You can see which words are most common in your documents.
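If you are wondering what a word frequency table boils down to, here is a minimal sketch in Python. To be clear, this is not what Enterprise Miner runs under the hood; the function name and the two sample sentences are made up for illustration and are not the Sampsio abstracts.

```python
import re
from collections import Counter

def term_frequencies(documents):
    """Count how often each lowercased word appears across all documents."""
    counts = Counter()
    for text in documents:
        # Treat runs of letters (optionally joined by apostrophes) as words.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))
    return counts

# Hypothetical stand-ins for the abstracts, invented for this example.
docs = [
    "SAS makes the data easy to analyze.",
    "The data are in a SAS table.",
]

freq = term_frequencies(docs)

# Print the most common words first, like the frequency table in the results.
for word, n in freq.most_common(3):
    print(word, n)
```

Even in this toy version, words like "the" float straight to the top, which is exactly the problem the next section deals with.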

STOP WORDS AND OTHER OPTIONS

This is just the beginning, of course. As you can imagine, if you had to actually write a program to read every word separately, that would take a bit of time. It would take far more time to have it ignore words that are useless, like “the”, “that” and “there”. These are called stop words. Enterprise Miner has a stop list and you can add or delete words from it.

Click on the thing that looks like a page to add a row and type in another stop word. For example, these abstracts come from the SAS Global Forum proceedings, so words like data and SAS probably occur in every one of them, which makes those words pretty useless for telling the documents apart. You can add those to your stop list.

If there is a word you want to keep, you can remove it from the stop list by selecting it and clicking that X at the top (right next to the thing that looks like a new page). You’ll be asked if you are sure you want to delete that row.
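For the curious, the idea of a stop list can be sketched in a few lines of Python. This is only an illustration of the concept, assuming a stop list is just a set of words to throw away; the function names and the sample sentence are hypothetical, and Enterprise Miner's own stop list is a table you edit in the window described above.

```python
import re

def tokenize(text):
    """Split text into lowercased words made of letters only."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens, stop_list):
    """Keep only the tokens that are not on the stop list."""
    return [t for t in tokens if t not in stop_list]

# Start with a tiny stop list (a real one has hundreds of entries).
stop_list = {"the", "that", "there"}

# Adding rows: words that show up in every document and tell you nothing.
stop_list |= {"sas", "data"}

# Deleting a row: a word you decide you want to keep after all.
stop_list.discard("there")

tokens = tokenize("The SAS data show that there is a trend.")
kept = remove_stop_words(tokens, stop_list)
print(kept)
```

Using a set makes the "is this word on the list?" check fast, and adding or deleting a word is a one-liner, which is about as close as code gets to clicking the page icon or the X.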

 

How do you get to the stop list, you may ask, quivering with excitement.


Text parsing options window

If you have clicked on the Text Parsing box, making it active, you’ll see in the left window pane a number of options.

These include:

the language to use,

a list of multi-word terms, everything from “a lot” to “keep in mind” to “zero in”,

parts of speech to ignore, like adjectives, and, of course,

the stop list.

To modify any of these, just click on the three dots next to it and a window will pop up, like the one shown above for the stop list.
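To see why a list of multi-word terms matters, here is a rough Python sketch of treating phrases like “keep in mind” as one term instead of three separate words. It is an assumption-laden toy, not Enterprise Miner's algorithm: the function name is made up, and the simple longest-match-first rule is just one reasonable way to do it.

```python
import re

def tokenize_with_multiword(text, multiword_terms):
    """Tokenize text, merging any listed multi-word term into a single token."""
    words = re.findall(r"[a-z]+", text.lower())
    # Try longer phrases first so a long term wins over a shorter overlapping one.
    phrases = sorted(
        (tuple(t.lower().split()) for t in multiword_terms if t.split()),
        key=len,
        reverse=True,
    )
    tokens = []
    i = 0
    while i < len(words):
        for phrase in phrases:
            if tuple(words[i:i + len(phrase)]) == phrase:
                tokens.append(" ".join(phrase))
                i += len(phrase)
                break
        else:
            # No phrase starts at this position, so keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens

# The three example terms mentioned in the list of options.
terms = ["a lot", "keep in mind", "zero in"]
result = tokenize_with_multiword(
    "Keep in mind that we zero in on a lot of words.", terms
)
print(result)
```

Without the term list, “a lot” would be counted as “a” plus “lot”, and neither of those tells you anything.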

If you haven’t actually had to analyze text data before, you have NO IDEA how amazingly awesomely cool this all is.

When I was in graduate school, we would actually print out multiple copies of the documents, cut the pages into paragraphs and sort them into categories.

More recently, I started using Ruby for this because parsing text was much easier in Ruby than in SAS. There were some cheaper and open source solutions that I looked at, but their documentation was non-existent and the interfaces were clear as mud.

The Bad and Good News 

Speaking of unclear interfaces … I’m not sure I would have guessed that the page with the corner folded meant “add new row”.  Also, there is a LOT of stuff on the Enterprise Miner screen. You have all of these different panes in the window and the options in them are completely different depending on whether you have clicked on the text mining tab, the text parsing box or something else. I’ve read a couple of data mining books, one specifically on Enterprise Miner, and they still were very sparse, particularly in their treatment of text mining, which is what I was most interested in.

That’s the bad news. The good news is that when I was at SAS Global Forum, I picked up a copy of Practical Text Mining. I almost didn’t buy it because it’s over 1,000 pages and my suitcase was already pretty full, which meant I’d have to lug it through the airport. Even worse, it did not have an electronic version, which is tough for me because even with contacts and glasses worn OVER my contacts, I still have difficulty reading some of the screen shots in it. (I expect if I had normal eyesight, I’d be fine.)

All that being said, this book is really useful. I know I got a discount at the conference, but still, it was about $70, which for a textbook like this is super-cheap. A thousand pages sounds like a lot, but that’s because it starts with the very basics and is a bit redundant. That’s not so terrible, though, because that makes it easy to read. I was lying in bed sick this morning and read the first 120 pages in about two hours.

This is a godsend to anyone doing a qualitative dissertation. The real tragedy is that a lot of people in areas that do qualitative research – education, psychology, nursing, social work, to name a few – probably won’t even be aware that Enterprise Miner exists, much less that they can get it for free to use in teaching their courses.

Seriously, people, this is a huge opportunity for you to teach your students about text mining and it’s really not that hard.
