Why SAS Enterprise Miner on demand is like Jennifer & other thoughts on learning data mining
June 17, 2010
A miracle has occurred: I have finally had time to evaluate two things that have been on my to-do list forever, JMP and SAS Enterprise Miner. Both products are made by SAS, and the first interesting point is that knowing SAS won’t really help with either of them. That isn’t to say that some knowledge of programming logic won’t help. I am also taking a data mining course, just for fun. It is very interesting because, while I have taken plenty of workshops and short courses, I haven’t been a student in a regular class for over a decade.

Jenn after finishing her M.A.
Until she went to college, every parent-teacher conference ever held about my daughter, Jennifer, went like this:
“Jennifer has great potential. She is obviously brilliant and if she just exerted some effort, she could do anything she wanted. Jennifer makes A’s on all of the work that she turns in.”
As it turned out, Jenn dropped out of high school, took her GED, went to community college, finished her B.A. at 21, taught school for a while and had her master’s degree from USC by 24. That is my general view of SAS Enterprise Miner On Demand. I think it has great potential and is worth keeping around. When it grows up, it will do impressive stuff and be a really good teacher.
In learning data mining, whether with JMP or Enterprise Miner, background knowledge makes a difference. Because I have decades of experience with both programming and statistics, when I see something in JMP like FORMULA > Conditional, it makes perfect sense to me as an IF statement. Some people reading this are probably thinking, “Of course.” If you are one of those people, you are probably proficient with SAS or SPSS syntax, or any number of programming languages. In Enterprise Miner, when I right-click on the Partition node and see options like Cluster and Stratification, again, I think, “Of course.” This is why my fellow students hate me.
It’s not just me. There were a few posts on the On-Demand version of SAS Enterprise Miner in this cool blog, Bzst:
SAS Enterprise Miner
http://blog.bzst.com/2009/10/sas-on-demand-enterprise-miner-update.html
http://blog.bzst.com/2010/05/sas-on-demand-take-3-success.html
I agreed with pretty much all of her points. Enterprise Miner is cool, and the current On-Demand version is a great improvement. It is much easier to install than the desktop version. As for the client-server version, it involves over 340,000 steps to install, one of which (and I may be imagining this) requires a band of marching flamingos.

Bring in the Flamingos
So… points in favor of Enterprise Miner On Demand:
1. Way easier to install than previous versions
2. Free for students and faculty for teaching purposes
3. Students like it better than sitting in the lab. They can download it and use it on their own computers.
4. Just the generally cool options: you can use the Partition node to create training, validation and test data sets. When you first read in a data source, you can set prior probabilities and include the costs of decisions. It is really cool. I was going to include screen shots of some of the really cool output from the cluster analysis I did earlier today, but SAS EM kept giving me this error:
“The load balancing object spawner timed out. Please check your Enterprise Miner license.”
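For readers who haven’t met the Partition node: its job is to split one data set into training, validation and test sets, optionally stratified on the target so each piece keeps the same mix of outcomes. The Python sketch below shows that idea using only the standard library. It is not how Enterprise Miner implements the node; the function name, the 40/30/30 default split and the seed value are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_partition(rows, target, fractions=(0.4, 0.3, 0.3), seed=12345):
    """Split rows into (training, validation, test) lists.

    Rows are grouped by their value of `target`; each group is shuffled and
    cut by the same fractions, so every partition keeps roughly the same
    class mix as the full data set.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    strata = defaultdict(list)
    for row in rows:
        strata[row[target]].append(row)

    train, valid, test = [], [], []
    for group in strata.values():
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_valid = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        valid.extend(group[n_train:n_train + n_valid])
        test.extend(group[n_train + n_valid:])  # leftovers from rounding land here
    return train, valid, test


if __name__ == "__main__":
    # Toy data: 100 records, one in four flagged as a "bad" outcome.
    data = [{"id": i, "bad": i % 4 == 0} for i in range(100)]
    train, valid, test = stratified_partition(data, target="bad")
    print(len(train), len(valid), len(test))
```

Fixing the seed makes the split reproducible from run to run, so a model comparison isn’t muddied by a different random split each time.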
Disadvantages
1. It is not easy. Very little of it is self-evident, and even less so if you have never used SAS or JMP. As Dr. Shmueli said in her posts, most MBA students probably aren’t going to be thrilled by the need to download, install and learn another piece of software. On the other hand, those really interested in statistics, software or data mining will probably be pumped about that part.
2. As the BZST blog also noted, if you don’t have some knowledge of statistics and a general idea of program logic, you are going to have a hard time using Enterprise Miner. Some people, and I can’t say I wholly disagree, will say this is not a disadvantage. You should know what the heck you are using.
3. It can be excruciatingly slow. Sometimes it pops up in a minute. Other times, once you add up the delays in opening EM, adding a new data source, creating a new diagram, dragging the data source onto the diagram, creating a sample and running an analysis, it can take 15 minutes from the time it opens until a single analysis runs and gives results. When using it at my desk, I usually read a book while waiting for each step to execute. For teaching in a lab, it is just about useless from what I have observed. [And kudos to those brave souls who tried.]
4. It is unreliable. Even while writing this post about the cool stuff it does, I could not get it to come up to do the cool stuff.
So… EM is like Jennifer because:
1. It will no doubt be awesome when it is all grown up,
2. It is worth waiting around for, and
3. The growing pains in the meantime can be REALLY irritating (oh, you have no idea).
SAS ENTERPRISE MINER NOT WORKING? HERE’S WHY (maybe)
June 2, 2010
If I had time, which I don’t, I would start a series of how-to articles for statistical software and copy the Car Talk scale they use as a guide for whether or not you should attempt a job yourself, from
a. There are two kinds of screwdrivers?
to
e. I have built a working nuclear reactor out of wood
I was very excited when I heard that SAS On-Demand was going to offer a cloud version of Enterprise Miner for use in teaching, and for free, even. “Was” is the key word in that sentence. Should you do this yourself? Well, it depends. This is very far past an “e” on the do-it-yourself scale. Do you remember the part in Iron Man where the guy builds the Iron Man superhero suit out of spare parts while trapped in a cave? Well, if you’re that guy, you can do it yourself.
Sigh. I can tell from the fact that you are still reading this that you are not going to listen to me and you are going to try anyway. Yeah, I didn’t listen to me either. There are approximately 3,476 steps in getting Enterprise Miner to work. Let’s assume you have a SAS profile, you have logged in, you have a user name and password for SAS on-demand and you have set up a course or someone has registered you for one. If you are lost already, go here:
http://support.sas.com/ondemand/index.html
This page is pretty straightforward and has all of the information you need to set up your account. If you set up your account and Enterprise Miner does not work, as in, it fails to start, your problem may be that you have the wrong version of Java enabled. You may have been fooled by the system requirements for Enterprise Miner, which said: {Warning: incorrect information between the lines}
==========================================================================
“System Requirements for SAS® Enterprise Miner™
Operating System(s): Any system that supports the Sun Microsystems Java Runtime Environment (JRE). Typically, this includes Unix, Linux, and various Microsoft Windows operating systems, such as Microsoft Windows XP and Microsoft Windows Vista.
Macintosh operating systems are not officially supported. For information about a possible workaround that you could test, see SAS Usage Note 18131.
Java Runtime Environment: Java Runtime Environment (JRE) version 1.6.0_15 or greater.”
=========================================================================
NOT EXACTLY!!!
Do not be fooled into thinking that everything you need to know about system requirements is on the page you get when you follow the system requirements link.
After you log in to your SAS on-demand account and click on a link to install your software you will see a link about configuring your system for Enterprise Miner. CLICK ON THIS LINK AND READ EVERYTHING OR YOU WILL BE SORRY.
http://support.sas.com/ondemand/emconfig.html
- You may be tempted to skip over the part about the Java Runtime Environment because you just read the system requirements above and you met those. Do not do that.
- You may be tempted to go to the Sun site and download the latest JRE version. Do not do that either.
Do ALL of the stuff on the page linked above.
Open a command prompt (cmd) and type javaws -viewer.
If you don’t have JRE 1.6.0_18 enabled (and who does?), go to the link on that page and download it. It is NOT the latest version.
Follow the directions on the page that I told you to read every word of and uncheck all of the other versions you may have installed, so that only 1.6.0_18 is enabled.
Now … try starting SAS OnDemand for Academics: Enterprise Miner by going back to that page and clicking on the second link. It should start.
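Because the configuration page wants exactly 1.6.0_18, not “1.6.0_15 or greater,” an “is it new enough?” check is the wrong test. The Python sketch below compares the version reported by java -version against that one string. It is a hypothetical helper, not a SAS tool, and it only looks at whichever java comes first on your PATH; the Java Control Panel you open with javaws -viewer is still where you enable and disable versions.

```python
import re
import shutil
import subprocess

REQUIRED_JRE = "1.6.0_18"  # the exact version the EM configuration page asked for


def parse_java_version(output):
    """Return the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def is_required_jre(output, required=REQUIRED_JRE):
    """True only for an exact match; a newer JRE does NOT count."""
    return parse_java_version(output) == required


def current_java_output():
    """Run `java -version` if java is on the PATH (it prints to stderr)."""
    if shutil.which("java") is None:
        return ""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr or result.stdout


if __name__ == "__main__":
    out = current_java_output()
    version = parse_java_version(out)
    if version is None:
        print("No java found on the PATH (or its output was not recognized).")
    elif is_required_jre(out):
        print(f"OK: default java is {version}")
    else:
        print(f"Default java is {version}, not {REQUIRED_JRE}")
```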
Patience is a virtue.
Enterprise Miner can be really slow. At first, I thought it wasn’t working. I switched to a better connection and a faster computer (it wasn’t hard; I had to roll my desk chair a few feet, but being the finely tuned athletic machine that I am, I managed). My advice: if you have several computers, use the best one for this. For a lot of things, the speed of your connection and how much RAM you have may not make a difference. This is not one of those things.
Getting Your Data into SAS Enterprise Miner
But… you have no data, do you? Your problem may be that you are not an instructor. Only instructors can upload data to the course. If that’s your problem, there’s not much I can do for you. If you are an instructor, go to the instructor home page > course information. Scroll down and you will find, about the middle of the page, instructions on how to upload your data. You can use any FTP program. In fact, even though Enterprise Miner does not run on the Macintosh, my data happened to be on a Mac and I uploaded it using Fetch. It worked fine.
If your data DON’T upload fine, check the settings in your FTP program. A lot of organizations have set the default to SFTP. SAS didn’t seem to like this. I changed it to FTP and my data uploaded happily away.
If you upload a SAS data set, then you and your students will be able to access the data using the LIBNAME statement shown below. You’ll want to include the access=readonly parameter to prevent your students from modifying the data.
libname mydata "/courses/BLAH/BLAHBLAH/THISCOURSE/saslib" access=readonly;
Replace the BLAHs with your course-specific information. If you are teaching more than one course, be sure you use the directory for THISCOURSE both when you upload your data and in the LIBNAME statement. Otherwise, you won’t be the first professor to upload the data for the wrong course and have a class of very confused students. You won’t be the last, either.
Okay, you have uploaded your data to your directory and opened Enterprise Miner. Now what?
Create a new project. Go to FILE > NEW > PROJECT. Give it a name. I named mine Joe. On second thought, I should have named it Bob, because when you spell it backwards, it’s still Bob.
Open the program editor window. I thought that when I went to the FILE menu and picked NEW, I would have an option for a program, code or something. No.
See that little thing that looks like the program editor window that you wouldn’t have noticed if you weren’t specifically looking for it? That’s it.
Run the Libname statement above, replacing the BLAHs and THISCOURSE to match your actual directory.
Okay, it is running: you have data uploaded, a project open and your library available within the project. The next thing I would do is click on the Help menu (honest) and start reading whatever interests you, such as Getting Started. Unlike most documentation, which reads as if someone pasted a web page into Babelfish, this is actually easy to follow, well written and less boring than watching paint dry.
I now have Enterprise Miner working on THREE computers, two using the on-demand version and one with Enterprise Miner for Desktop. Someone should bring me a prize. But no one did.
Installing SAS Enterprise Miner – Success 1.0
March 23, 2010
So, thanks to the wonderful Heidi Johnson of SAS Tech Support, I have finally successfully installed SAS Enterprise Miner 6.1.
I had major problems installing Enterprise Miner on any virtual machine. So, I did what I always do in these situations: I went back to the most plain vanilla, no-frills thing I could possibly get and started with that. It took me a while to get a machine with 1 GB of RAM on which I could install Windows XP. I compromised with Dennis by having it dual-boot with Ubuntu, so he did not feel that our house had been completely desecrated by Microsoft. And the little one was told not to play on “Mom’s computer in the living room” lest she be corrupted. Since she has an iMac in her room and a PowerBook, she has no need for it, other than the conviction of all children everywhere that other people’s toys are always more fun than yours.
For those hardy souls among you, here is what I found out to date:
Step 1: Get a computer running Windows XP, NOT a virtual machine.
Step 2: Download and install Service Pack 3 (required for SAS 9.2).
Step 3: Use the DVDs to create a SOFTWARE DEPOT on your computer. You cannot do a planned installation with Enterprise Miner from DVDs.
Step 4: Go to the directory C:\Windows\system32\drivers\etc and open the hosts file in a text editor.
Step 5: The file will have a line with an IP address (normally 127.0.0.1) followed by localhost. Hit the Tab key after localhost and type in the name of your computer. It should be something like joe57-4.living.room
Step 6: Open up the folder where you have saved your SAS Software Depot and right-click on setup.exe
Step 7: There will be a number of windows where it is pretty obvious that you click Next. WHEN YOU GET TO THE WINDOW THAT ASKS WHAT YOU WANT TO DO (PLANNED INSTALLATION, the first option; INSTALL SAS SOFTWARE; etc.), click PLANNED INSTALLATION.
Step 8: Now, at this point you might think that you would browse to the plan file folder on your DVD and select a plan file. That won’t work. The plan files folder is empty. Click the button next to USE A STANDARD PLAN FILE.
Step 9: If, like me, you are installing Enterprise Miner and Text Miner on one machine, you may think that you should select the file that says “Enterprise Miner and Text Miner, one machine”. That would be wrong. Keep scrolling. What I actually wanted was Enterprise Miner FOR DESKTOP and Text Miner, one machine.
Step 10: Click Next a few more times. Your system requirements are updated, about 70 different things are installed and SUCCESS! I now have Enterprise Miner 6.1 installed and working on one computer.
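Steps 4 and 5 amount to a one-line text edit. Here is a Python sketch of the same edit, done on a string so you can try it without touching the real file. The function name is hypothetical and the hostname is the example from Step 5; writing the result back to C:\Windows\system32\drivers\etc\hosts would need administrator rights, and keeping a copy of the original first is a good idea.

```python
def add_hostname_to_localhost(hosts_text, hostname):
    """Return hosts_text with `hostname` appended to the first localhost line.

    Comment lines are skipped, an inline "# ..." comment is kept after the
    new name, and running it twice does not add the name twice.
    """
    lines = hosts_text.splitlines()
    for i, line in enumerate(lines):
        # Separate any trailing comment so the name is not added inside it.
        body, sep, comment = line.partition("#")
        fields = body.split()
        # A mapping line is "address name [name ...]"; skip blanks and comments.
        if len(fields) < 2 or "localhost" not in fields[1:]:
            continue
        if hostname not in fields[1:]:
            new_line = body.rstrip() + "\t" + hostname
            if sep:
                new_line += " " + sep + comment
            lines[i] = new_line
        break  # only the first localhost line is changed
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "# Sample hosts file\n127.0.0.1       localhost\n"
    print(add_hostname_to_localhost(sample, "joe57-4.living.room"), end="")
```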
After that, I tried to reproduce the same steps on another computer. Since I did not have another Windows XP machine around, I tried to install it on a virtual machine on my Mac. That did not work. The installer just quit after the first couple of screens. That VM is pretty flaky anyway. The only Vista machine I had handy was 64-bit (I don’t have a license for EM on that operating system), and the rest of the Windows machines were running Windows 7. So, for now, I have managed to install Enterprise Miner on one machine, Windows XP 32-bit. I have just started using it. So far, everything seems to be working fine. So fine, in fact, that I am a bit pouty about not being able to spend more time playing with it.
It is really funny that, now that I have Enterprise Miner installed, structural equation modeling with AMOS, which is what I am going to be working on this afternoon, seems weak by comparison! Maybe what my husband says about me is true: I am never satisfied. (Don’t read anything into that!)
Enterprise Miner: The software for those with infinite time & no fish
February 11, 2010
A few days ago, I tried installing Enterprise Miner for, I think, the third time. The first time, I could not get it to work. I saw that we needed something called a planned install, which required a plan file that I was supposed to get from my SAS administrator, who happens to be me, and I did not have one. Since I did not really need Enterprise Miner, I went ahead, figured out how to do the basic install and got busy with other things.
The second time, I really was interested in data mining and thought it might be nice to figure this out. I looked up a few documents on it but still did not have a plan file, even when I asked myself very nicely if I could have one. I found some things on standard plan files and tried downloading some stuff, but it didn’t work and I got busy.
In between, I was copied on a couple of emails from someone else on campus complaining that they had never actually gotten it to work. Since the emails weren’t directed to me, I was busy, and the person writing them is an extremely experienced SAS programmer who I figured could get by fine without any help from me, I didn’t give it much thought.
Lately, though, three related events piqued my interest. First, a notice came across my desk that the department paying for Enterprise Miner was canceling the license. Second and third, two people in two completely different departments asked about using Enterprise Miner. So, having a little bit of time, I decided I would try to install it.
1. I could not install it from the DVDs we distribute for installation. I call the helpful people at SAS Tech support and they tell me that I need to install it from a SAS software depot. Unfortunately (long story I will skip) we no longer have a SAS software depot and cannot download it again.
2. We want SAS 9.2 Maintenance Release 2 anyway, so super-nice people in SAS contracts help me get a new download order and I download that. I decide to hedge my bets by installing this on the most stable computer I have, a 32-bit Windows Vista machine, as plain vanilla as they come. This is not a machine used for testing; it is one I actually do my work on. Yes, that was stupid of me.
3. I create a software depot, start a planned install and select what appears to be the appropriate plan file, which is now included as a choice. Everything seems to be fine until the 10th step, where it gives me a message about an error with the Object Spawner. I decide to go ahead with the install anyway. After a few hours of downloading and installing, I have Enterprise Miner on my computer, but it doesn’t work. Doesn’t work as in I can’t create a project.
4. There was a long interlude here, with me reading numerous documents and spending two and a half hours with an extremely kind and patient person from SAS Tech Support named Heidi Johnson, who deserves a raise for not screaming. In some extremely bizarre way, when I semi-installed SAS Enterprise Miner and related SAS stuff on my computer, it revoked my administrator privileges, so a lot of the reasonable suggestions made by Heidi have not worked. Also, SAS 9.2, which I need to do my work, no longer works on my computer, as in, when I try to import a file from Excel, for example, it gives me an error message.
After all of this my first thought was to either:
a.) Delete everything with the word SAS in it off of my computer, or
b.) Completely delete the virtual machine.
In an uncharacteristic burst of maturity, though, I realized that it would be difficult to be the SAS administrator on campus without using SAS. Although we also support Stata and SPSS, dropping SAS because I was annoyed would probably not be a decision I could justify to the people who might ask me to justify it. Besides, three questions about SAS programs came into my inbox while I was on the phone. I already answered two of them. I suppose I could just tell anyone who calls tomorrow, “I’m sorry, we don’t support SAS on Thursdays. Call back on Friday.”
Unfortunately, I cannot uninstall the (non-working) software from my computer because I don’t have administrator privileges due to the software I cannot uninstall. That part, at least, I fixed. If for some unexplainable reason you have an urge to install Enterprise Miner and get stuck, here is what you can do.
Restart your computer in safe mode. Create a new account. Call it Administrator1. Give it administrator privileges. Log in with that account and uninstall SAS, including Enterprise Miner.
Now, when you start your computer again and log in as yourself you will once again have administrative privileges.
You may find that your uninstall did not completely uninstall SAS and when you try to reinstall it, you get some kind of error. At this point, you need to show your computer who is in charge.
1. Go to Add/Remove Programs and remove everything with the word “SAS” in it.
2. Search your computer to find out whether anything is left with the word SAS in it. In Program Files, almost everything was still there in a folder labeled SAS. Apparently the uninstall in Windows did not uninstall. Move all that stuff to the Recycle Bin and empty the Recycle Bin.
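Step 2 is easy to do badly by hand. The Python sketch below only lists files and folders whose names contain “SAS”, so you can look before you delete anything; it removes nothing itself. The function name and the C:\Program Files starting folder are assumptions for illustration, and the match is case-sensitive so that names like “Kansas” are not swept up.

```python
import os


def find_leftover_paths(root, word="SAS"):
    """List files and folders under `root` whose names contain `word`.

    Matching is case-sensitive. Nothing is deleted: review the list and
    remove items yourself once you are sure they belong to the old install.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if word in name:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical starting point; point it at your own Program Files folder.
    # os.walk simply yields nothing if the folder does not exist.
    for path in find_leftover_paths(r"C:\Program Files"):
        print(path)
```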
After this, I reinstalled SAS 9.2 TS2M2 and it seemed to work. I am not 100% sure: just as I was leaving, the install finished without errors, but when I tried to import an Excel file, nothing happened. It may be that I didn’t wait long enough.
In all of this, my daughter, the perfect and patient Jennifer, had been sitting in the lobby downstairs for the past half hour waiting for me to give her a ride home. I realized that I had not fed my fish, which by this time was swimming against the side of the tank, trying to attract my attention.
So, I fed my betta fish, Beta, along with Type I and Type II, the frogs in the other tank, answered a couple of questions on multicollinearity and how to code an infile statement to skip over hundreds of “header” lines, and headed out the door.
After spending so much time trying unsuccessfully to install Enterprise Miner (I will skip the previous problems with 9.1.3), I have a backlog of questions to answer on everything from the significance of increases in chi-square to how Stata processes large datasets (short answer: inefficiently).
I will only be in two days next week, as I am presenting at a conference in Minneapolis. The topic is analysis of ethics, an interesting enough subject to almost make me forget that I am going to be in Minneapolis in February. Almost.
As for Enterprise Miner, I asked Justin The Hardware Guy to see if he could find me a computer running Windows XP, since maybe the problem is that I have a virtual machine. Maybe it is that I am using Vista. Hell, maybe Enterprise Miner was designed by UCLA fans. All I know is that some people somewhere have it working, just no one here. He actually did have one computer running XP, but it only had 512 MB of RAM, so I can’t imagine that would work.
In a few weeks, I may try again, when I have caught up, analyzed ethics data, found a Windows XP machine with a minimum of 1 GB of RAM and thawed out, and when Heidi can see my name pop up on the caller ID again without being tempted to feign her own death to get out of talking to me. (I have actually done that. It helps that few people expect their statistician to be a Latina grandmother. “Dr. De Mars? No, he just left. You probably passed him on the way in: really old guy, white-haired, balding, pot belly, yeah, that was him.”)
Too bad for Heidi that I know what she sounds like now. As for the people who asked me about Enterprise Miner, I will give them my honest opinion. In the last week, I have spent more time with this thing than with my husband, and he gives me money, helps raise the children, does the laundry and has sex with me. Unless they expect Enterprise Miner to do more for them than that, it probably isn’t worth the effort. But, if I ever find out differently, I will let them know.
The first things a statistical consultant needs to know
January 19, 2020
I’ll be speaking about being a statistical consultant at SAS Global Forum in D.C. in March/April. While I will be talking a little bit about factor analysis, repeated measures ANOVA and logistic regression, that comes at the end of my talk. The first things a statistical consultant should know don’t have much to do with statistics.
A consultant has paying clients.
In History of Psychology (it was a required course, don’t judge me), one of my fellow students chose to give her presentation as a one-woman play, with herself as Sigmund Freud. “Dr. Freud” began his meeting with a patient by discussing his fee. In fact, Freud did not accept charity patients. He charged everyone. There’s a winning Trivial Pursuit fact for you.*
Why am I starting by telling you this? Because I have had plenty of graduate students whose goal is “to be a consultant,” but who seem to think their biggest problem when they start out is going to be whether they should do propensity score matching using the nearest neighbor or caliper method.
Here are the biggest problems you’ll face:
- Getting your first clients
- Getting paid
- Getting your data into shape
- Communicating results to your clients.
Let’s start with getting clients. I can think of four ways to do this: through referrals, as part of a consulting company, through your online presence and through an organization. I’ve done three of them. The first, and I think the most effective, is referrals. I got my first two clients when professors who did consulting on the side recommended me. I do this myself. If someone can’t afford my fees or I am just booked at the moment, I will refer potential clients to students, former students or other professionals I know who are getting started as consultants. They are not competing with my business. I am never going to work for $30 an hour again, and if that’s all that’s in your budget, I understand. If all you need is someone to do a bunch of frequency distributions and a chi-square for you, you don’t need me, although I’m happy to do that as part of a larger contract.
Lesson number one: Don’t be a jerk.
Referrals mean I’m using my own reputation to help you get a job and so I’m going to refer students who are good statisticians and who I think will be respectful and honest with the client. Don’t underestimate the latter half of that statement.

Lesson number two: It helps if you really love data analysis.
I’d be the first to say that I’m a much nicer person now than when I was in graduate school. Yes, it took me a while to learn lesson one, I am embarrassed to say. However, I really did love statistics and if any of my fellow students had trouble, I was the first person they asked and I was really happy to help. When those students later became superintendents of schools or principal investigators of grants, they thought of me and became some of my earliest clients. Some of my professors also became clients, although those were after I’d had several years of experience.
Lesson number three: Don’t think you are smarter than your clients.
A young relative who has a Ph.D. in math asked me, “No offense, but isn’t what you do relatively easy, like anyone who understood statistics could do it? Why are you so in demand?”
Corollary to this lesson: If you find yourself saying, “No offense” just stop talking right then.
One reason a lot of would-be consultants go bankrupt or have to find another line of work is that they do think they are smarter than their clients. This manifests itself in a lot of ways, so we’ll return to it later, but one way is that they charge much more than the work is worth.
How do you know how much your work is worth?
Lesson number four: Ask yourself, “If I had twice as many grants/contracts as I could do and I was paying someone to do this work, what would I be willing to pay?”
That’s a good place to start.
I’ve met a lot of people over the years who charged much more than me and bragged to me about it. In the long run, though, I’m sure I made a lot more money. Clients talk. They find out that you are charging them three times as much as their friend down the block is getting charged by their consultant. You may think you’re getting away with it, but you won’t. You may get paid on those first few contracts but you’ll have a very hard time getting work in the future.
Lesson number five: Know multiple languages, multiple packages
I’ve had discussions with colleagues on whether it is better to be a generalist or a specialist.
I have had a few jobs where they just needed propensity score matching or just a repeated measures ANOVA but those have been the small minority over the past 30 years.
I would argue that even those who consider themselves specialists actually have a wide range of skills. Maybe they are only an expert in SAS but that includes data manipulation, macros, IML and most statistical procedures.
In my case, I would not claim to be the world’s greatest authority on anything, but if you need data entry forms created in JavaScript/HTML/CSS, a database back end with PHP and MySQL, and your data read into SAS, cleaned and analyzed in a logistic regression, I can do it all from end to end. No, I’m not equally good at all of those. It’s been so long since I used Python that I’d have to look everything up all over again.
I’ve used SPSS, Stata, JMP and Statistica, depending on what the client wanted. I think I might have even had a couple of clients using RapidMiner. For the last few years, though, the only packages I’ve used have been SAS and Excel. Why Excel? Because that’s what the clients were familiar with and wanted to use, and it worked for their purposes. (See lesson three.)
I was really surprised to read Bob Muenchen saying SPSS surpassed R and SAS in scholarly publications. Almost no one I know uses SPSS any more, but, of course, my personal acquaintances are hardly a random sample. I suppose it depends on the field you are in.
I have never used R.
Some people think this is a political statement about being a renegade. Others think it’s because I’m too old to learn new things or in subservience to corporate overlords or some other interesting explanation. (The Invisible Developer, who has been reading over my shoulder, says he never got past C, much less D through Q.)
Since I fairly often get asked why not, I will tell you the real reasons, which is a complete digression but this is my blog so there.
- In my spare time that I don’t have, I teach Multivariate Statistics at a university that uses SAS. Since I’m using SAS in my class anyway and need real life data for examples, when a client has licenses for multiple packages and doesn’t care what I use (almost always the case), I use SAS.
- About the time that R was taking off, my company was also taking off in a different direction. The Invisible Developer and I own the majority of 7 Generation Games which is an application of a lot of the research done by The Julia Group. When we started developing math games, we needed to learn Unity, C#, PHP, SQL, JavaScript, HTML/CSS. We also needed to analyze the data to assess test reliability, efficacy, etc. I called the analysis piece and told The Invisible Developer I was interested in all of it so I’d do whatever was left. He was really interested in 3D game programming so he did the Unity/C# part. I did everything else. Then, after a few years, I moved to Chile, where the language I most had to improve was my Spanish.

It worked out for me. We have a dozen games available from 7 Generation Games and now we’re coming out with a new line on decision-making.
I mention all this because I want to emphasize there isn’t a single path to succeeding as a consultant. There isn’t a specific language or package you have to learn. There is one thing you absolutely must have, though, and that’s the next post.
* See Warner, S. L. (1989, Winter). Sigmund Freud and money. Journal of the American Academy of Psychoanalysis, 17(4), 609–622.
SAS Global Forum started out as planned …
April 9, 2018
The first time I went to SAS Global Forum, over 30 years ago, it was actually called SUGI (SAS Users Group International) and it was in Reno, NV. I was a just-divorced single mom and there was no such thing as a Working Mothers Room (which I noticed signs for here in Denver). I paid for a bonded sitter, on contract with the hotel, to come to my room and watch my toddler. That toddler is now CEO of 7 Generation Games. So, yeah, it’s been a minute.
Having been to these events for over 30 years, not to mention a dozen or so at WUSS (Western Users of SAS Software), I thought I might need to put some effort into learning new stuff. My plan was to pick one product that I wanted to learn more about and make my own little personal strand on that. I picked SAS Enterprise Miner. I hadn’t used it a lot, and not at all lately, and I thought it might be a good choice for introducing students to more data mining – a topic I just touch on in my multivariate statistics course.
The first session was 10 Tips Learned in 20 Years of Enterprise Miner, by Melodie Rush. Did you realize that the nodes in EM are in alphabetical order? No, me neither. I also didn’t know that the Reporter node could automatically generate documentation. If you are registered for the conference, you can download the presentation from the app, even if you didn’t attend.
There wasn’t another Enterprise Miner presentation in the morning, so I wandered over to The Quad and talked to Tom Grant in SAS Global Academic, who told me that now you can download a tiny little 26 KB file and run SAS Enterprise Miner on the SAS server, whether you use Windows or Mac. I remembered something like this from years ago, but it was deathly slow and it sucked. Your other option was to install SAS EM on your desktop, which did not exactly require sacrificing a goat, taking your computer apart and putting it back together with each piece bathed in goat’s blood – but it wasn’t all that much easier.
Well, times have changed! I already had a SAS On-Demand for Academics account, so I clicked to get Enterprise Miner. A file called main.jnlp downloaded, and when I double-clicked it, my Mac said it was from an unidentified developer – so I went into the preferences and chose to open it anyway.
Then I got a message that my version of Java was out of date. I clicked to update it and was directed to download and install the new version.
Did that, clicked on the main.jnlp again and will you look at that …
The whole process took less than five minutes …
leaving me time to head over to the convention center and see what Scott Leslie and Tricia Aanderude have to say about health outcomes and visual analytics.
How fast does EM in the cloud run, you ask? Well, I am in a hotel where the wi-fi is about the same as in my apartment in Santiago – that is, somewhere midway between Santa Monica and North Dakota speed. It runs fine. I can see using it as a demo in a class or making instructional videos with it. Screens don’t pop up as fast as they would on a regular web page, but so far the delay is not enough to annoy students using it for analyses or teachers using it to demonstrate.
Up to that point, today’s Enterprise Miner strand plan was a success. After that, things definitely did not go according to plan, but they still went great. I’ll have more on that in my next post.
Speaking of not according to plan … I’m giving a presentation at SAS Global Forum at 11 am, Tuesday, April 10, in room 207. I’ll talk about the connections between SAS and building games with JavaScript, how I got from Santa Monica, California, to Santiago, Chile, and where SAS can take you in the most unexpected ways.
Mar
8
Teaching Statistics and Epidemiology with SAS Studio
March 8, 2015
In case you don’t know, SAS On-Demand is the FREE, as in free beer, offering of SAS for academic use. How good is it? There really can’t be one answer to that.
First of all, there are multiple options – SAS Studio, SAS Enterprise Miner, SAS Enterprise Guide, JMP, etc., so some may be better than others. I have a fair bit of experience with two of them, so let’s just look at one of those today.
SAS STUDIO
I mostly use SAS Studio with my students, and over the past few courses I have been really pleased with the results. I selected SAS Studio over Enterprise Guide because I strongly believe it is useful for students to learn to code, and many students, yes, even in an area like biostatistics, need a little encouragement to learn. While they don’t end up expert SAS programmers after two or three courses, they can at least code a DATA step, read in raw data, aggregate data and data from external files, produce a variety of statistics and graphics and interpret the results.
Let’s be frank about this … it’s going to require a bit of work up front. You need to create a course with SAS On-Demand. You need to notify your students that they need to create accounts. If you are not going to use only the data sets in the sashelp library, you’re going to have to upload your own data.
Please don’t tell me you plan on solely using the sashelp data sets! These are really helpful for the first assignment or two while students get their feet wet but unless you expect your students to have careers where all of their files to be analyzed are going to be shipped with the software they use, you’re going to move to reading in other types of data sooner or later.
Your data are going to be stored on the SAS server (so you can tell people who ask that yes, you are ‘computing in the cloud’ – instead of what I usually tell people who ask stupid questions like that, which is shut the hell up and quit bothering me – but I digress. Even more than usual.)
No matter what software you use, you’re going to have to select some data sets for students to analyze, have some sort of codebook and make sure your data is reasonably clean (but not so clean that students won’t learn something about data quality problems). So, the only real additional time is figuring out how to get it on the SAS server.
None of these steps takes much time, but adding them all up – getting a SAS profile, creating a course, creating an email to send to all of your students with the correct LIBNAME, uploading your data – it all adds up to maybe a couple of extra hours.
My challenge is always how to shoehorn additional content into the very limited class time I have with students. One tool I’ve been using lately is LiveBinders, an application that lets you put together an online binder of web pages, videos and material you write yourself.
Here is an example of a LiveBinder I use for my graduate course in epidemiology. It has SAS assignments that progress from simply copying code to modifying it. Links to the relevant SAS documentation are included, as are videos that show step by step how to use SAS Studio for computing relative risk, population attributable risk, etc. I have a similar LiveBinder for my biostatistics course.
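For anyone who wants to check the arithmetic behind those assignments, here is a minimal Python sketch of the two measures, using a hypothetical 2x2 cohort table (the counts and the `TwoByTwo` class are made up for illustration; the students themselves do this in SAS Studio). It assumes the common textbook definitions: relative risk is the risk in the exposed divided by the risk in the unexposed, and population attributable risk is the risk in the whole population minus the risk in the unexposed.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a hypothetical cohort study, laid out as a 2x2 table."""
    exposed_cases: int
    exposed_noncases: int
    unexposed_cases: int
    unexposed_noncases: int

    def risk_exposed(self) -> float:
        return self.exposed_cases / (self.exposed_cases + self.exposed_noncases)

    def risk_unexposed(self) -> float:
        return self.unexposed_cases / (self.unexposed_cases + self.unexposed_noncases)

    def risk_total(self) -> float:
        cases = self.exposed_cases + self.unexposed_cases
        total = cases + self.exposed_noncases + self.unexposed_noncases
        return cases / total

    def relative_risk(self) -> float:
        # Risk among the exposed divided by risk among the unexposed
        return self.risk_exposed() / self.risk_unexposed()

    def population_attributable_risk(self) -> float:
        # Excess risk in the whole population over the unexposed baseline
        return self.risk_total() - self.risk_unexposed()

    def population_attributable_fraction(self) -> float:
        # Share of the population's risk attributable to the exposure
        return self.population_attributable_risk() / self.risk_total()


table = TwoByTwo(exposed_cases=30, exposed_noncases=70,
                 unexposed_cases=10, unexposed_noncases=90)
print(round(table.relative_risk(), 3))                     # 3.0
print(round(table.population_attributable_risk(), 3))      # 0.1
print(round(table.population_attributable_fraction(), 3))  # 0.5
```

Students can compare these hand-checkable numbers with their SAS output before moving on to real study data.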
You might think this is a bit of hand-holding to walk the students through it, but I would disagree. Every time I have found myself thinking,
“Well, this is a little too easy”,
I have been wrong.
If you have been doing something for a decade or, in my case, a few decades, it’s hard to remember how confusing concepts were the very first time. Even things that you do automatically, like downloading your results as an HTML file, were a mystery at one time in your life. Making the videos takes some time initially – you have to do a screencast and then the voiceover. Sometimes I do both at once, using QuickTime and GarageBand simultaneously. Other times, I import the screencast into iMovie and record a voiceover.
Either way, a 7-minute video usually takes me half an hour to record, once you add in screwing up the first time, editing out the part where The Spoiled One came in and asked for money to go shopping, etc. So, you’re adding maybe 3-4 hours to the time you spend on your course. On the other hand, you only have to do it once, so if you teach the same course a few times, it pays off. I cannot tell you how many times students have told me that the videos were helpful. Unlike when I am lecturing in class, they can slow the video down and play it over. Students end the course with experience coding, using data from actual studies and interpreting data to answer problems that matter.
My point is that it is a little more work to teach using SAS Studio, but it is worth it.
Jun
30
Text Mining with SAS – class notes
June 30, 2014
More notes from the text mining class. …
This is the article on Singular Value Decomposition that I mentioned in the last post:
ftp://ftp.sas.com/techsup/download/EMiner/TamingTextwiththeSVD.pdf
Contrary to expectations, I did find time to read it on the ride back from Las Vegas, and it is surprisingly accessible, even to people who don’t have a graduate degree in statistics, so I am going to include it in the optional reading for my course.
Many of these concepts, like start and stop lists, apply to any text mining software, but it just happens that the class I’m teaching this fall uses SAS.
———
In Enterprise Miner, you can only have 1 project open at a time, but you can have multiple diagrams and libraries, and of course, zillions of nodes, in a single project.
In Enterprise Miner, you can use text or text location as a type of variable. Documents smaller than 32K can be stored in the project as a text variable. If a document is larger than 32K, give a text location instead.
Dictionaries
- start lists – often used for technical terms
- stop lists, e.g., articles like “the” and pronouns. These appear so frequently in documents that they don’t contribute to our goal, which is to distinguish between documents. A stop list may also include words that are high frequency in your particular data, for example, “mathematics” in our data, because it appears in almost every document we are analyzing.
Synonym tables
Multi-word term tables – e.g., “standard deviation” is a multi-word term
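To make these dictionaries concrete, here is a small Python sketch of how a stop list, an optional start list, a synonym table and a multi-word term table might be applied while parsing a document. This is not how SAS Text Miner implements parsing; the tiny word lists and the `parse` function are hypothetical, and the tokenizer is just a bare regular expression.

```python
import re

# Hypothetical, tiny dictionaries; real ones would be much longer
STOP_LIST = {"the", "a", "an", "and", "of", "it", "they", "mathematics"}
SYNONYMS = {"stdev": "standard deviation", "avg": "average"}
MULTI_WORD_TERMS = {("standard", "deviation")}


def parse(document, start_list=None):
    """Turn a document into a list of terms using the dictionaries above.

    If start_list is given, only terms on it are kept.
    """
    words = re.findall(r"[a-z]+", document.lower())
    terms = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in MULTI_WORD_TERMS:
            term, i = " ".join(pair), i + 2  # keep the phrase as one term
        else:
            # Map a synonym to its canonical term; otherwise keep the word
            term, i = SYNONYMS.get(words[i], words[i]), i + 1
        if term in STOP_LIST:
            continue  # too common to help tell documents apart
        if start_list is not None and term not in start_list:
            continue  # a start list keeps only the terms it names
        terms.append(term)
    return terms


text = "The stdev and the standard deviation of the mathematics scores"
print(parse(text))                         # ['standard deviation', 'standard deviation', 'scores']
print(parse(text, start_list={"scores"}))  # ['scores']
```

The synonym “stdev” and the phrase “standard deviation” end up as the same term, while “mathematics” disappears because it is on the stop list.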
Importing a dictionary: go to the properties. Click the … next to the dictionary (start or stop) you want to import. When the window comes up, click IMPORT.
Select the SAS library you want. Then select the data set you want. If you don’t find the library that you want, try this:
- Close your project.
- Open it again.
- Click on the 3 dots next to PROJECT START CODE in the property window.
- Write a LIBNAME statement that gives the directory where your dictionaries are located.
- Close and reopen your project.
[Note: Re-read that last part on start code. This applies to any time you aren’t finding the library you are looking for, not just for dictionaries. You can also use start code for any SAS code you want to run at the start of a project. I can see people like myself, who are more familiar with SAS code than Enterprise Miner, using that a lot.]
Filter viewer – you can specify the minimum number of documents a term must appear in to be included.
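A minimum-document cutoff like this can be sketched in a few lines of Python. The input format (a list of term lists, one per document) and the name `filter_terms` are assumptions for illustration; the point is that a term is counted once per document that contains it, not by its total number of occurrences.

```python
from collections import Counter


def filter_terms(docs_terms, min_docs):
    """Drop terms that appear in fewer than min_docs documents.

    docs_terms is a list of term lists, one list per document.
    """
    doc_freq = Counter()
    for terms in docs_terms:
        doc_freq.update(set(terms))  # count each term once per document
    keep = {term for term, n in doc_freq.items() if n >= min_docs}
    return [[term for term in terms if term in keep] for terms in docs_terms]


docs = [["algebra", "quotient"],
        ["algebra", "statistics"],
        ["statistics", "algebra", "algebra"]]
print(filter_terms(docs, min_docs=2))
# [['algebra'], ['algebra', 'statistics'], ['statistics', 'algebra', 'algebra']]
```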
—————-
Speaking of Las Vegas, blogging has been a little slow lately since we took off to watch The Perfect Jennifer get married. It was a very small wedding, officiated by Hawaiian Elvis. Darling Daughter Number Three doubled as bartender and bridesmaid then stayed in Las Vegas because she has a world title fight in a few days.
Given the time crunch, I was particularly glad I’d attended this course, which gave me the opportunity to draft at least one week’s worth of lectures for the fall. When I finish these notes, my plan is to edit them and turn them into the last lecture in the data mining course. If they’re helpful to you, feel free to use whatever you like. I’ll try to remember to post a more final version in the fall. If you have teaching resources for data mining yourself, please let me know.
My crazy schedule is the reason I start everything FAR ahead of time.
Jun
25
Text Mining Notes from Awesome Free Course from SAS (Yeah, you read that right)
June 25, 2014
Hot tip: If you are a professor, you have access to some major benefits from SAS. The main ones that jump to mind are:
- Free classes that are worth FAR more than you paid for them.
- Free software via SAS On-Demand.
- Free books – up to two per semester.
- Free teaching materials.
You can get more information on the SAS Global Academic Program here.
Crazy, but true. I went to San Diego for two days (yes, I had to pay my own travel expenses, but with a Prius that’s $10 in gas plus a night in a hotel room) and took a free course on SAS Enterprise Miner. I have SAS Enterprise Miner free for a class I am teaching in the fall, and unlike desk copies, it’s not just free for the professor but for all of the students. I’m teaching data mining in the fall, and although I really doubt we will get into text mining much, I think I may cover an introduction in the last lecture. So, to remind myself, and for anyone else who might be teaching the same course, here are some of my notes.
General
The term-document matrix is a key concept in understanding SAS Text Miner (and probably any other text mining software). The columns are the documents and the rows are the terms, like algebra, quotient, statistics.
Of course, you are going to have plenty of 0 cells, where the document does not include the word, say, “statistics”, and plenty of rows for terms that appear in many, many documents, like, say, the word “mathematics”.
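Here is a minimal Python sketch of building a term-document matrix with terms as rows and documents as columns. The three toy documents are hypothetical, and splitting on whitespace stands in for real parsing.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix): one row per term, one column per document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    # A Counter returns 0 for a missing term, which fills the empty cells
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


docs = ["algebra quotient mathematics",
        "statistics mathematics",
        "mathematics algebra algebra"]
terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(term, row)
# algebra [1, 0, 2]
# mathematics [1, 1, 1]
# quotient [1, 0, 0]
# statistics [0, 1, 0]
```

Even this tiny example has zeros in most rows, plus a “mathematics” row with a count in every column, which is exactly the kind of term you would put on a stop list.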
According to the instructor, text mining is a subset of text analytics. I always used the two terms synonymously, and we didn’t get into the distinction. Feel free to comment if you have an opinion, like that I should be burned at the stake for such text mining/analytics incest.
Using the filter in text mining works just like a WHERE statement in a SAS analysis; that is, it does not delete any observations from your data set, but going forward in the analysis it only uses the records that match the filter (WHERE condition).
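The WHERE-statement analogy can be sketched in Python: filtering builds a new subset for the rest of the analysis and leaves the original data alone. The `documents` records and the `where` helper are hypothetical.

```python
documents = [
    {"id": 1, "category": "algebra", "text": "solve for x"},
    {"id": 2, "category": "statistics", "text": "standard deviation"},
    {"id": 3, "category": "statistics", "text": "F ratio"},
]


def where(rows, condition):
    """Return the rows that satisfy condition, like a SAS WHERE statement.

    A new list is built, so nothing is deleted from the input data.
    """
    return [row for row in rows if condition(row)]


stats_docs = where(documents, lambda row: row["category"] == "statistics")
print([row["id"] for row in stats_docs])  # [2, 3]
print(len(documents))                     # 3
```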
Two general goals of data mining
- Pattern discovery – you don’t have a response variable; you’re trying to find variables that cluster together.
- Prediction
Kind of makes me think of statistics in general, where you have things like cluster analysis, factor analysis on one end and techniques like regression on the other.
People can manipulate a few inputs, but not everything, which is one way text mining can be used to identify fraud, by using large numbers of variables and looking for suspicious clusters. The whole fraud detection discussion of the course was pretty interesting, even though I’m not involved in credit card or insurance industries or other areas where it is such a big deal. I just found it intellectually interesting.
If you like matrix algebra (which I do), there was an interesting discussion of Singular Value Decomposition and the term-document matrix. It seemed very much like principal components analysis, multiplying a vector of weights by a set of responses. An article was mentioned that distinguishes between SVD and PCA, but to be truthful, I probably won’t find the time to read it. I did end up discussing it with The Invisible Developer, though, who got a math degree at UCLA “because I thought as long as I was getting a degree in physics, I might as well”. We are well matched. This is the kind of career planning we go in for at The Julia Group.
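For fellow matrix-algebra fans, here is a pure-Python sketch of finding the largest singular value of a small matrix, and its singular vectors, by power iteration on A^T A. It only recovers the first component of a truncated SVD, and it assumes a nonzero matrix with nonnegative entries (true of raw term counts, which keeps the all-ones starting vector safe); real work would use a library routine such as `numpy.linalg.svd`. On the PCA resemblance: PCA can be computed as the SVD of a column-centered data matrix, whereas the text-mining SVD is typically applied to the (weighted) term-document matrix without centering.

```python
import math


def mat_vec(a, v):
    """A v for a matrix stored as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def mat_t_vec(a, u):
    """A^T u for a matrix stored as a list of rows."""
    return [sum(a[i][j] * u[i] for i in range(len(a))) for j in range(len(a[0]))]


def norm(x):
    return math.sqrt(sum(x_i * x_i for x_i in x))


def leading_singular_triplet(a, iterations=200):
    """Approximate (sigma, u, v): the largest singular value of a and its
    left and right singular vectors, by power iteration on A^T A."""
    v = [1.0] * len(a[0])  # safe start for a nonzero, nonnegative matrix
    for _ in range(iterations):
        w = mat_t_vec(a, mat_vec(a, v))  # w = A^T A v
        length = norm(w)
        v = [w_j / length for w_j in w]
    av = mat_vec(a, v)
    sigma = norm(av)  # A v = sigma u, with u a unit vector
    u = [x / sigma for x in av]
    return sigma, u, v


# Rows are terms, columns are documents (a made-up term-document matrix)
tdm = [[1, 0, 2],
       [1, 1, 1],
       [1, 0, 0],
       [0, 1, 0]]
sigma, u, v = leading_singular_triplet(tdm)
print(round(sigma, 4))
```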
Topics vs terms
Terms help define a topic.
Topic and category are not the same.
A document can be in only one category (cluster)
A topic can appear in multiple documents & a document can contain multiple topics
topic = concept, used interchangeably (at least as far as Text Miner documentation is concerned)
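The category/topic distinction can be illustrated in Python with hypothetical topic scores: a category is a single label per document, while topics are every label whose score clears a cutoff. The scores, the 0.3 cutoff and the use of the top-scoring topic as a stand-in for a cluster assignment are all simplifying assumptions, not how SAS Text Miner assigns clusters or topics.

```python
# Hypothetical topic scores for three documents
scores = {
    "doc1": {"algebra": 0.8, "statistics": 0.1, "geometry": 0.05},
    "doc2": {"algebra": 0.4, "statistics": 0.5, "geometry": 0.0},
    "doc3": {"algebra": 0.0, "statistics": 0.2, "geometry": 0.7},
}


def category(doc_scores):
    """One category per document: here, simply its best-scoring topic."""
    return max(doc_scores, key=doc_scores.get)


def topics(doc_scores, cutoff=0.3):
    """Any number of topics per document: every topic at or above the cutoff."""
    return sorted(t for t, s in doc_scores.items() if s >= cutoff)


for doc, doc_scores in scores.items():
    print(doc, category(doc_scores), topics(doc_scores))
# doc1 algebra ['algebra']
# doc2 statistics ['algebra', 'statistics']
# doc3 geometry ['geometry']
```

Document 2 lands in exactly one category but carries two topics, and the algebra topic appears in more than one document.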
Types of data sets
Training, test and validation data sets are all based on historical data. You actually know what the value of the target variable is.
With a scoring data set, you don’t know the value of the target variable; that is what you are trying to predict.
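Here is a rough Python sketch of the distinction: historical rows, where the target is known, get partitioned into training, validation and test sets, while scoring rows have no target at all. The 60/20/20 split, the seed and the `partition` function are hypothetical, and this is a simple random split rather than a stratified one.

```python
import random


def partition(rows, train=0.6, validate=0.2, seed=42):
    """Randomly split historical rows into training, validation and test sets.

    Whatever is left after the training and validation shares goes to test.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * validate)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


# Historical data: the target is known for every row
historical = [{"id": i, "target": i % 2} for i in range(10)]
training, validation, test = partition(historical)
print(len(training), len(validation), len(test))  # 6 2 2

# Scoring data: same inputs, but no target; that is what the model predicts
scoring = [{"id": 100}, {"id": 101}]
```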
General
Options for transforming text to numbers
- Boolean count – shows up or not
- Frequency counting
- Information theoretic counting (log of frequency counts)
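The three transformations listed above can be sketched as a Python function applied to one cell of the term-document matrix. The `log` option here uses log base 2 of (count + 1), which is an assumption about the exact formula; adding 1 keeps zero counts at zero.

```python
import math


def transform(count, method):
    """Convert one raw term count into a number for analysis."""
    if method == "binary":
        return 1 if count > 0 else 0  # Boolean: does the term show up or not
    if method == "frequency":
        return count                  # raw frequency count
    if method == "log":
        return math.log2(count + 1)   # dampens large counts; 0 stays 0
    raise ValueError(f"unknown method: {method}")


counts = [0, 1, 3, 7]
for method in ("binary", "frequency", "log"):
    print(method, [transform(c, method) for c in counts])
# binary [0, 1, 1, 1]
# frequency [0, 1, 3, 7]
# log [0.0, 1.0, 2.0, 3.0]
```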
Adjust for document size & corpus (number of documents) size -> term weights
- Entropy weights (Shannon information theory)
- Inverse document frequency weights
- Target-based weights
- Others
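Here is a Python sketch of two of those term weights, using common textbook formulas that are not necessarily the exact variants SAS Text Miner uses: inverse document frequency, log(n / document frequency), and the log-entropy weight 1 + Σ p log p / log n, where p is the share of a term's occurrences falling in each document and n is the number of documents. Each function takes one row of the term-document matrix and assumes at least two documents and a term that appears at least once.

```python
import math


def idf_weight(row):
    """Inverse document frequency for one term: log(n / df)."""
    n_docs = len(row)
    doc_freq = sum(1 for f in row if f > 0)
    return math.log(n_docs / doc_freq)


def entropy_weight(row):
    """Log-entropy weight for one term: 1 + sum(p * log p) / log n.

    p is the share of the term's occurrences in each document. A term
    spread evenly over all documents gets a weight of (about) 0; a term
    found in a single document gets 1.
    """
    n_docs = len(row)
    total = sum(row)
    entropy_sum = sum((f / total) * math.log(f / total) for f in row if f > 0)
    return 1 + entropy_sum / math.log(n_docs)


mathematics = [1, 1, 1]  # appears in every document
quotient = [1, 0, 0]     # appears in one document only
print(idf_weight(mathematics), entropy_weight(mathematics))
print(idf_weight(quotient), entropy_weight(quotient))
```

Both weights push “mathematics” toward zero and reward the rarer “quotient”, which is the whole point of adjusting for corpus size.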
Can combine traditional data mining inputs with text mining inputs in a predictive model
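Combining the two kinds of inputs can be as simple as putting them side by side in one input row. A minimal Python sketch, in which the field names (`age`, `claim_amount`, `svd1`, `svd2`) are hypothetical and the `text_` prefix is just one way to make name collisions less likely:

```python
def model_inputs(structured, text_scores):
    """Put conventional inputs and text-mining inputs into one input row.

    Text-derived fields get a "text_" prefix to keep the two sets apart.
    """
    row = dict(structured)
    for name, value in text_scores.items():
        row[f"text_{name}"] = value
    return row


claim = {"age": 34, "claim_amount": 1200.0}  # hypothetical structured fields
svd_scores = {"svd1": 0.42, "svd2": -0.13}   # hypothetical text-mining scores
print(model_inputs(claim, svd_scores))
# {'age': 34, 'claim_amount': 1200.0, 'text_svd1': 0.42, 'text_svd2': -0.13}
```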
… I’ll post more on the specifics of how to use SAS Text Miner in my next post, but I wanted to point out two advantages for professors of taking a course, any course:
- It’s good to take courses to remind yourself what it’s like to not be the expert. So often, we get used to knowing all of the little nuances of a field and forget what it’s like to not find it obvious that the F value is the ratio of two estimates of variance, one obtained from between group differences and one from within groups. Back when I had slightly more time, I tried taking one course a year in something I knew nothing about, like microbiology. I learned interesting stuff and maintained more empathy.
- If you are lucky, you get to see good teaching modeled, and you can steal the instructor’s ideas. For example, in this class, it started out pretty slowly, but that was good because people who were not as familiar with data mining could get some understanding. It also was good that he defined a lot of the terms and basic concepts because I am just lifting some of that straight out of my notes for one of my lectures. (SAS not only allows this but they will encourage it and send you, free, instructional resources. If you are a professor, you only need to ask.) It was also good because by the afternoon of the first day, everyone was chomping at the bit to get their hands on the software and start running things, which would not have been the case if we’d started out using it right away. The less experienced people would have been lost and the more experienced people would have been bored after three hours of using it in the morning. I’m definitely stealing that idea for my class in the fall.
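Since the F value came up in the first point above, here is a short Python sketch of a one-way ANOVA F computed exactly as described there: the between-groups estimate of variance divided by the within-groups estimate. The example data are made up.

```python
def one_way_f(groups):
    """One-way ANOVA F: between-groups variance estimate over within-groups."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)  # estimate from differences between group means
    ms_within = ss_within / (n - k)    # pooled estimate from within the groups
    return ms_between / ms_within


print(one_way_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```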
Here’s the other benefit of courses I have found, for professionals in general. Yes, you could maybe get all of the materials and read them in your spare time without going to San Diego or Cary or wherever. The fact is that I would NEVER sit down and spend 16 hours in a week studying anything. I would get interrupted, have meetings, answer email, return calls.
Of course, if you are going to get a real benefit, you need to use it when you get back, which I have pretty much failed at. I will explain why next week (how is that for an air of mystery). In the meantime, the best I can do is review my notes so I’m ready to jump in next week.
Oh, and for those people who say that SAS only gives you free things because they want organizations to pay to use their software that students will be trained on – I’m sure that’s true. So?
May
26
A couple of days ago, I ended my post with
If you have a 25% probability of a job developing into something better, and you consistently have a job for years because you have no choice, then the odds are in your favor that you will eventually improve your situation unless … but that’s my next post …
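The arithmetic behind that quoted claim, as a Python sketch under two assumptions of mine: each job independently has a 25% chance of developing into something better, and you go through one job after another. The chance that at least one of n jobs works out is 1 - 0.75^n, which passes 50% at the third job.

```python
def chance_of_improvement(p, n_jobs):
    """Probability that at least one of n_jobs independent jobs, each with
    probability p of developing into something better, actually does."""
    return 1 - (1 - p) ** n_jobs


for n in (1, 2, 3, 5, 10):
    print(n, round(chance_of_improvement(0.25, n), 3))
```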
I lied. My next post was on trying to get SAS Enterprise Miner to work, but that is actually related to my point. Alice in Wonderland is one of my absolute favorite books, and not just because it was written by a mathematician. From the Red Queen’s race in its sequel, Through the Looking-Glass:
“Well, in our country,” said Alice, still panting a little, “you’d generally get to somewhere else—if you run very fast for a long time, as we’ve been doing.”
“A slow sort of country!” said the Queen. “Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”
What does this have to do with your career? If you aren’t constantly learning new information, you’ll fall behind. That’s why my last post was on SAS Enterprise Miner. I haven’t tried the SAS On-Demand version in well over a year, and now I am trying to install a newer version. For the past few months, I’ve mostly been working with JavaScript, with a bit of PHP, CSS, HTML and SQL thrown in, working on our latest games. The Invisible Developer does the 3-D part and I do almost everything else.
The main reason I teach (it sure as hell isn’t the money!) is that it forces me to stay up to date on the latest software and statistical methods.
Some people do teach from the same yellowed notes every year – I knew a professor who joked that he wrote his notes on yellow legal pads so that students couldn’t tell when he’d been using them for years. However, it’s a big mistake, for you and your students. I am shocked by the number of schools using Windows XP – they’re educating (and I use the word loosely) their students to use an operating system that doesn’t even resemble what they’ll be expected to use on the job.
Here is the unless … unless you fail to ACTIVELY seek out opportunities to learn and increase your skills and knowledge. It is so, so easy to fall into the “I’m so busy” trap. I have been really busy. A few months ago, I bought a new laptop and installed Windows 8, because even if you could buy an older operating system (and people do), that’s a mistake. You might as well announce, “I’m too lazy to learn.”
When I went to the schools that had just gotten Windows 8, I at least knew enough to install and test our games on their computers. Because I had tested on my new laptop, I could state positively that the games were compatible with Windows 8.
Realizing I hadn’t updated my Mac desktop operating system in a long time (remember, I’m so busy), I finally bit the bullet and did it, and then some of my other software – GarageBand, iMovie, Office – was out of date. So, I updated that, too. Realizing I was using Graphic Converter 6 when version 9 is available, I updated that also. Much swearing ensued as options I was used to using were no longer there and menus were different. I can’t even say that I found any of the changes to be improvements for my uses. That’s not the point. The world isn’t changing for me, and three years from now, if I am working with a school, student or client, whether they have Windows 8, iMovie 11, SAS 9.3 or Office 2008, I will have enough familiarity to work with them.
I made my first website with Netscape Composer (anyone remember that?). Then it was Adobe GoLive, later replaced with Dreamweaver. Now, I switch between Dreamweaver, WebStorm and TextWrangler. At one point, frames were the thing, then templates, now CSS – and that’s just websites.
I tried using Ruby for some programming tasks, but I really needed to do more text mining, I thought, so I tried out a couple of data mining packages – Enterprise Miner is the latest, and I’m looping back to that after having looked at it and decided it didn’t fit what I needed a couple of years ago.
After a problem with Dropbox, I signed our company up for Google Apps for Business, and we have been using Google Hangouts for meetings, Google Drive for document sharing and backup, etc. We just signed up for a trial of Basecamp for a couple of projects to decide if that would be a good addition for project management.
I’m testing out both Fargo.io and evrybit (as an alpha tester).
Here’s the take away message – no one told me to do any of this. No contract required it. I actually agreed to teach the data mining course because I knew it would force me to evaluate different tools on different operating systems. I keep a stack of technical books under my bed and try to read every morning as I have my first cup of coffee.
It’s not enough just to do whatever your job is – you need to know how to do what your job is becoming.
Blogroll
- Andrew Gelman's statistics blog - is far more interesting than the name
- Biological research made interesting
- Interesting economics blog
- Love Stats Blog - How can you not love a market research blog with a name like that?
- Me, twitter - Thoughts on stats
- SAS Blog for the rest of us - Not as funny as some, but twice as smart. If this is for the rest of us, who are those other people?
- Simply Statistics, simply interesting
- Tech News that Doesn’t Suck
- The Endeavour - John D. Cook - Another statistics blog