If I had a clone, all of my code would be beautiful.

St. Paul's butte

Last week, I was a speaker at the Tribal Disability Conference in Turtle Mountain, where I spoke on starting a business. Then, I went for a site visit at Spirit Lake Vocational Rehabilitation followed by another talk on self-employment at the Tribal Disability Awareness conference. In a nutshell, I talked about how having a disability often teaches people to persevere, to not accept when told they can’t do something, to find different ways of meeting goals and solicit other people to help them – and pointed out that all of these traits can be an advantage in starting a business.

Along the way, I was working on a couple of grants, edited a couple of papers – and just this second remembered I have to finish editing a paper I co-authored for something – crap!

There was also the usual matter of approving payroll and invoices, answering email and reviewing work people did while I was gone – new teaching videos to go into the game, artwork, animation, sound files, documentation, bug fixes. I haven’t nearly finished with that.

I’m super-stoked to be on a panel on Monday at the National Council of La Raza conference, “Economic Empowerment in a Wireless World”. I’m planning on going Sunday as well, to a lot of the sessions on education.

Heidi Heitkamp

I got to hear Heidi Heitkamp speak at Turtle Mountain last week and with any luck I’ll be able to attend Elizabeth Warren’s talk on Sunday. Must be my week for Democratic senators.

Somewhere in all of that, I finished my slides and video for the Serious Play conference, which is also this week and which I am excited to attend.

Then there was meeting people for lunch, stopping in to check on my daughter, who had surgery, and all of the other general life things. There is a board meeting I have to get up and go to in about nine hours, which I am definitely NOT excited about, but I’m the chair, so I kind of have to show up.

In the midst of all of this, there are 77 fixes and improvements to make in the Fish Lake game, from “add a better message when the pretest is completed” to “Revise quiz code for re-routing students. This is replicated in many quizzes. Make external file ref & just call it in all of those”. Some of those are crucial – for example, I never wrote the quiz for one spot, so that spot is a dead end.

There are another 47 improvements for Spirit Lake. All of those are to make the game better. For example, we recorded voices from kids at Spirit Lake, and when a student gets a problem wrong, I want to add a video clip that shows one of the game characters saying something like,

“No, 7 x 8 = 56. Now your village burned down.”

The kids did a great job and I think those clips will really help players remember their multiplication tables.

burning village

But … back to my missing quiz. It has to be on mixed fractions, with questions answered using both improper fractions and mixed fractions. There should also be a question asking for the two whole numbers that the mixed fraction falls between, and at least two word problems with whole-number answers.
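The answer-checking rules for a quiz like this can be sketched with Python’s standard `fractions` module. This is a minimal sketch, not the game’s actual code: the function names, the typed-answer formats it accepts (“2 3/4”, “11/4”, “3”) and the form check are all assumptions for illustration.

```python
from fractions import Fraction
import math


def parse_answer(text: str) -> Fraction:
    """Parse a typed answer: '2 3/4' (mixed), '11/4' (improper) or '3' (whole)."""
    parts = text.split()
    if len(parts) == 2:                   # mixed number: whole part + fraction
        return int(parts[0]) + Fraction(parts[1])
    if len(parts) == 1:                   # improper fraction or whole number
        return Fraction(parts[0])
    raise ValueError(f"cannot parse answer: {text!r}")


def to_mixed(value: Fraction) -> tuple[int, Fraction]:
    """Split a non-negative fraction into (whole part, proper fractional part)."""
    whole, remainder = divmod(value.numerator, value.denominator)
    return whole, Fraction(remainder, value.denominator)


def whole_number_bounds(value: Fraction) -> tuple[int, int]:
    """The two consecutive whole numbers a non-integer fraction falls between."""
    lower = math.floor(value)
    return lower, lower + 1


def is_correct(text: str, expected: Fraction, form: str = "any") -> bool:
    """Check the value and, optionally, that it was written in the requested form."""
    try:
        value = parse_answer(text)
    except (ValueError, ZeroDivisionError):
        return False                      # unparseable input counts as wrong
    if value != expected:
        return False
    parts = text.split()
    if form == "mixed":                   # e.g. '2 3/4'
        return len(parts) == 2 and Fraction(parts[1]) < 1
    if form == "improper":                # a single fraction, e.g. '11/4'
        return len(parts) == 1 and "/" in parts[0]
    return True
```

With this design, one expected value (`Fraction(11, 4)`) serves every question about that number, and only the `form` argument changes between the “answer as a mixed fraction” and “answer as an improper fraction” versions.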

As each question is answered, the program needs to determine whether it is the right answer and, if so, add to the total score, then show a slightly more difficult problem. At the end of the quiz, the student sees a success message, the student’s data are written to our database, and the student is routed back to the game. If an answer is wrong, the student sees a failure message and is routed to the appropriate page to study.
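The flow described above can be sketched as one reusable routine, the kind of shared routing code the Fish Lake to-do list asks for. This is a hypothetical Python sketch, not the game’s code: `Question`, `QuizResult`, `run_quiz`, the `study_page` field and the `get_response`/`save_result` callbacks are invented names, and it assumes the questions are already ordered from easier to harder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Question:
    prompt: str
    answer: str        # expected answer, compared as a trimmed string
    study_page: str    # hypothetical: where to send a student who misses it


@dataclass
class QuizResult:
    score: int
    passed: bool
    next_page: str


def run_quiz(
    questions: list[Question],
    get_response: Callable[[str], str],
    save_result: Callable[[int], None],
    game_page: str = "game",
) -> QuizResult:
    """Ask questions in order (easier to harder); stop at the first wrong answer."""
    score = 0
    for question in questions:
        response = get_response(question.prompt)
        if response.strip() != question.answer:
            print("Not quite. Let's review this before trying again.")
            return QuizResult(score, False, question.study_page)
        score += 1  # right answer: add to the total, move on to the next one
    print("Great job! You passed the quiz.")
    save_result(score)  # e.g. write the student's data to the database
    return QuizResult(score, True, game_page)
```

Because the routing lives in one function, each quiz only has to supply its own list of questions instead of repeating the re-routing logic.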

In the process of writing this, by the way, I noticed that one of the links on the study page is wrong, so I need to fix that. Apparently, I meant to write something involving turtle eggs. Also, there is a video Diana did on mixed fractions which I have yet to review because I got back at midnight on Wednesday and dived into everything else.

So … back to my no-longer-missing quiz. It is done. I even put in a few comments. As I was writing it, I was thinking, “some of this code is duplicated” and “I bet I could re-write some of these functions so they were more general and then not have so many functions” and a whole lot of other ideas for making it just a better program.

I KNOW the world is full of code that was written to be fixed “another day” and is still sitting there six years later. In my defense, I will say that I do often loop back around and fix that code – although it might be a year or two later.

Here is my compromise – when I am in town, I try, come hell or high water, to make at least one substantive improvement on one of the games every day – a new video clip, a new quiz. At worst, I may not get any more done than fixing a broken link or touching up a graphic or sound file, but I really try to do more than that. Those 124 fixes are down from 266. It is not perfect but it is progress and it is 1 a.m. In addition to writing this post, I did review one more instructional video and sent feedback, finished the first draft of editing the paper and added improving the code in this quiz as a lower priority game fix.

My code is not perfect but it works, and I will come back and try to do better tomorrow because, at the end of the day, there’s another day. That’s how time works.




When we started the Dakota Learning Project to evaluate our educational games, I wondered if we had bitten off more than we could chew. We proposed to develop the games, pilot them in schools, collect data and analyze the data to see if the games had any impact. We were also going to go back and revise the games based on feedback from the students and teachers.

Some people told us this was far too much and we should just do a qualitative study observing the students playing the game and having them “think aloud”. Another competition we applied to for funding turned us down and one of the reasons they gave is that we were proposing too much.

We ended up doing a mixed methods design, collecting both qualitative and quantitative data and I’m very glad I did not listen to any of these people telling me that it was too much.

There is no substitute for statistics.

When I observed the students in the labs, I thought that perhaps the grade level assigned to specific problems was inconsistent with what the students could really do. For example:

Add and subtract within 1000 … is at the second-grade level 

Multiply one-digit numbers  … is at the third-grade level

It seemed to me that students were having a harder time with the supposedly second-grade problem, but I wasn’t sure if that was really true. Maybe I was seeing the same students miss it over and over. After all, we had 591 students play Spirit Lake in this round of beta testing. It was certainly possible I saw the same students more than once. It is definitely the case that students who were frustrated and just could not get a problem stuck in my mind.

So … I went back to the data. These data do double duty because I’m teaching a statistics class this fall, and I am a HUGE advocate of graduate students getting their hands on real data, and here was some actual real data to hand them. (I always analyze the data in advance so it is easy to grade the students’ papers, so I have examples to give in class and so I don’t get students complaining that I am trying to get them to do my work for me, although they still do. Ha! As if.)

We had 1,940 problems answered, so, obviously, many students answered more than one problem. Of those problems, 1,053, or 54.3%, were answered correctly on the first attempt. This made me quite happy because it is close to an ideal item difficulty level. Too easy and students get bored. Too hard and they get frustrated.
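The 54.3% figure is what classical test theory calls item difficulty: the proportion of attempts answered correctly. A minimal Python sketch of that arithmetic follows; it is not the SAS analysis used in the study, and the `(problem_id, correct_on_first_try)` record layout in `first_try_rates` is a hypothetical stand-in for the real data.

```python
from collections import defaultdict


def item_difficulty(correct: int, attempted: int) -> float:
    """Classical item difficulty (p): proportion of attempts answered correctly."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return correct / attempted


def first_try_rates(records):
    """Per-problem difficulty from (problem_id, correct_on_first_try) pairs."""
    attempts = defaultdict(int)
    correct = defaultdict(int)
    for problem_id, ok in records:
        attempts[problem_id] += 1
        correct[problem_id] += int(ok)
    return {pid: correct[pid] / attempts[pid] for pid in attempts}


print(f"{item_difficulty(1053, 1940):.1%}")  # 54.3%
```

The per-problem version is what distinguishes an easy item from a hard one; the overall figure only says the set as a whole sits near the middle of the difficulty range.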

I used SAS Enterprise Guide to produce the chart below:

chart showing subtraction in the middle of difficulty range

You can see that the subtraction problem showed up about mid-range in difficulty. Now, it should be noted that the group gets more selective as you move along. That is, you don’t get to the multiplication problems unless you passed the subtraction problem. Still, it is worth noting that only 70% of fourth- and fifth-grade students in our sample answered a supposedly second-grade problem correctly on the first try.

Because we want students to start the game succeeding, I added a simpler problem at the beginning. That’s the first bar with 100% of the students answering it correctly. I won’t get too excited about that yet, as I added it later in the study and only a few students were presented that problem. Still, it looks promising.

So, what did I learn that I couldn’t learn without statistics? Well, it reinforced my intuition that the subtraction problem was harder than the multiplication ones and told me that a substantial proportion of students were failing it on the first try. It was not the same students failing over and over.

The second question, then, was whether the instructional materials made any difference. I’m pleased to tell you that they did. On the second (or higher) attempt, 85% of the students answered correctly. If you add 85% of the 30% who failed the first go-round (0.85 × 30% = 25.5%) to the 70% who passed on the first attempt, you get 95.5% of the students continuing on in the game. This made me happy because it shows that we are beginning at an appropriate level of difficulty. I would have liked 100%, but you can’t have everything.
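Written out, the calculation combines two groups: students who pass on the first try, plus the share of first-try failures who pass on a later attempt. With the figures reported above (70% and 85%), a short Python check gives 0.70 + 0.30 × 0.85 = 0.955. It assumes every student who failed the first time made at least one more attempt, which the 85% figure suggests but does not state outright.

```python
def eventual_pass_rate(first_try: float, later_try: float) -> float:
    """Share who continue: pass on the first try, or fail first and pass later."""
    failed_first = 1 - first_try
    return first_try + failed_first * later_try


rate = eventual_pass_rate(0.70, 0.85)
print(f"{rate:.1%}")  # 95.5%
```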

I should note that the questions are NOT multiple choice, and in fact, the answer to that particular problem is 599, so it is not likely the student would have just guessed it on the second attempt.



Here is a little note for people who work in customer service:

Every company I have ever worked with that has terrible customer service apologizes a lot and makes soothing noises in lieu of actually doing anything.

When your company fucks up, it does NO good to say how sorry you are and that you empathize. I really don’t care if you are going back in your office and doing the evil scientist laugh moo-ha-ha while you dance around and spit on my bank statement. I just want you to fix it.

I went to the bank yesterday to deposit a check and it seemed my balance was lower than I expected. When I got home, I found a letter saying my last deposit had been reversed because it didn’t match the name on the account. This was very weird since the check was written to me. I called the 800 number and was told that yes, NINE YEARS AGO I had come into the bank and shown them proof I had changed my name and that was noted on my account, but it was not noted in the right spot on my account, so I would need to go to the branch that had reversed the deposit in person and show them my ID and proof of name change.

I pointed out that I had done exactly that and had been depositing checks at that exact branch for the past nine years with the same name and never had a problem before. The person on the humorously named customer service number told me that I would have to go back to the branch in person, show them my ID again and have them write in some other place on the record that I had changed my name.

I asked if, that being the case, he could at least note somewhere in their files that this was a HUGE inconvenience and, in fact, impossible for several days, as I’m writing this from an airplane, from which it is not feasible to leap out and parachute into my local branch. He said it wouldn’t make any difference because no one would read it. He said if I wanted to have anyone read it I should have the bank manager write a letter.

This morning, on the way to the airport, I called my local branch, where I was told that it was NOT up to them, that the deposit had been reversed by some other central office that handles ATM transactions and that I could go into any branch. Also, the branch manager told me that if I was unhappy with the way it was handled, I should call customer service (read preceding paragraph) and that she couldn’t change it over the phone because she couldn’t see my ID and had no idea who I was, since she had never seen me. I pointed out that a) I was going to be in the airport, so going to any branch was not feasible, b) they had been cashing my checks for the last nine years with the same name and they had a record in my file that I had changed my name, so why couldn’t they just un-reverse their reversal of my deposit, c) what the branch was telling me was the exact OPPOSITE of what customer service had told me, d) I had ID nine YEARS ago when I did this exact same thing, e) every single person I talked to agreed, “Oh, yes, we see that you came into the bank and told us you changed your name and it is written into the record here, but IT’S NOT WRITTEN IN THE RIGHT SPOT,” and f) I had been coming into the Santa Monica branch for SEVENTEEN YEARS.

Here is the story of my account with US Bank – I started with a small bank in North Dakota, which then got bought by another bank, which then got bought by another bank. All along, I have had my same account but would get a cheery letter saying, “We’ve merged with so-and-so bank.”

Eventually it was US Bank and then they bought another bank and moved the closest local branch. I don’t recall how long I have been going to this particular branch but I know it is over nine years because I remember going in there and having my married name added to my account.

How hard is it to make a company wiki or something so people in your company give out accurate information? This isn’t the first time this has happened to me at all (read my post here on Microsoft’s laughably misnamed customer service).

More than that, though, why, when every person told me that they could SEE in the record that I had come in years ago and notified them of my name change did no one have the authority to say,

“Yes, I see this is a mistake on our part. You have been banking here for years. I even see the date when you informed us of the name change. We’ll take care of this.”

Here is what I have decided and I urge you to join me in it. When I get terrible customer service from an organization, I take my business elsewhere. I will never use Budget Rent-A-Car in Las Vegas ever again (see post here).

Although it will be a huge pain in the ass, over the next several weeks, I will close all of the accounts I have with U.S. Bank and go elsewhere.

With companies whose products I am forced to use, like Microsoft, because some of what we use only runs on Windows, I buy the minimum amount possible and put it off for as long as possible. The fact that they have a large captive market of people like me may explain why Microsoft’s customer service blows.

If enough people do this, perhaps companies will have an incentive to improve customer service. If not, at least I will have better experiences.

Over 14 years ago, when The Spoiled One was barely old enough to walk, the flight attendant was unbelievably rude to us on a cross-country Southwest Airlines flight. I have flown that airline once in the last 14 years and only then because there was no other flight I could take. It hasn’t hurt their bottom line as far as I can tell, but I’ve been pretty happy flying on other airlines. The highlight of my customer service experience flying with children was the time Northwest Airlines ran out of lunches on a flight and, with The Spoiled One crying, the flight attendant went back and got her own lunch and gave it to my toddler! How happy do you think I was to be flying on Northwest that day?

So, that’s my advice to you. Don’t support rotten customer service. Even if it makes no difference to the organization you leave, at least your life will be easier.

The positive side of every experience I have like this is I realize that it is not a very high bar for our company to give people better service than they are used to. Every time something like this happens, I take it as a lesson of how we can try to do better than the average.


That is one of the competitive advantages of small businesses. They really do care whether you are there or not. It makes me hopeful for 7 Generation Games, because, unlike a lot of the monolithic educational companies, if something is a problem for a school district, it will get fixed, even if I have to fly to North Dakota in the middle of a blizzard or drive to downtown LA and work with your IT staff to get our game through your impenetrable firewall.

Having said all of that, I think I’m going to pop into a small bank I know and see how we can do business.

Our story so far … I suggested that people with disabilities who are successful in education, jobs or self-employment don’t define themselves as disabled, and neither do the people around them.

It can’t be that simple, right? Have a positive attitude, look on the sunny side of life and next thing you know, people are driving by and throwing bags of cash in through the open door of your mansion.

sun moneyhouse

What about those people who say they want to get a job or start a business but never make an effort?

There are two major reasons for this lack of connection between attitudes and actions. One is, and I know this may come as a shock to those of you of more tender years, but …

People Lie.

Yes, they do. They tell you they want to get a job when they really have no such desire whatsoever. What they really want to do is to continue to live in your house, eat your food, watch your TV and have you quit nagging at them to go get a job.

Pick up any social psychology textbook (e.g., Myers, 2013) to read about it in technical terms. People have a social desirability bias – they say what they think you want to hear, what makes them look like a good person, or, as one of my lovely children once explained herself:

I said whatever I thought would get you to stop yelling at me at the time.

People may tell you that they want to graduate from school or get a job, but they really don’t care whether they do or not.

The second reason is that not all attitudes are created equal.

Two people who made careers out of proving this point are Icek Ajzen and Martin Fishbein. It is an urban legend that you are guaranteed at least a C- in social psychology if you can pronounce their names.

General attitudes – I’d like to have more money – are really bad predictors of people’s behavior.

Specific attitudes – I’d like to get a job at the casino so I could earn money this summer and buy a car to drive to school – are far better predictors of behavior.

This is one reason why it bothers me when I review files for a vocational rehabilitation program and see vague vocational goals like “Get a job” or “Go to school.”

Attitudes we hold strongly predict behavior better than attitudes we have just sort of adopted. If you asked me whether I was in favor of research on endangered plants, I would say, “Yes.” Plants are good, right? I mean, what weirdo doesn’t like plants?


Would I really go to any major effort to ensure that plant research was funded? Nope.

On the other hand, I care quite a bit about funding vocational rehabilitation, small business and Native American programs. I have written on my blog and to federal agencies on those issues. I’ve been a grant reviewer for competitions in those areas of research.

One way to strengthen attitudes is to have people actually think about them. This is where vocational counseling can be useful, if the counseling session is actually a discussion of what the person wants to do. This is also why I said at the beginning that while my giving a lecture one day won’t make much difference, teachers, parents and counselors repeatedly talking to people with disabilities about their goals DOES matter.

One last point: it is easier to predict behavior in the aggregate from attitudes than to predict any single, specific behavior.

What exactly does that mean? Let’s say I honestly, truly and very much want to succeed at self-employment – which I do, by the way. Take the first behavior you might consider: did I get up early in the morning today to start work? I hate mornings, so no matter what day you asked that question, the answer would probably be no. Let’s take a whole list of behaviors, though:

  • Working late
  • Working weekends
  • Working more than 8 hours a day
  • Being willing to travel for work
  • Working on holidays
  • Learning new skills so that I can be better at my job
  • Traveling to conferences so I can learn more
  • Attending events to meet people who might be customers for my company

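Across a whole list like that, my attitude predicts what I actually do far better than it predicts any single morning. This does loop back to self-employment (I promise), but let’s recap: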

Having an attitude that you can succeed does predict success when:

  1. It’s honest
  2. It’s specific
  3. It’s strongly held
  4. It’s well thought out

If you are a person with a disability, that may give you some clues about what types of people you want in your life – friends, teachers, counselors who challenge you to set goals that are honest and specific, and who remind you of those goals regularly. For parents, teachers and others, it gives you some direction in what to encourage on a daily basis.

You might think my talk for the conference is done at this point, but you would be wrong. I have to talk for another 20 minutes and, besides that, we haven’t gotten back to the main point of self-employment. Remember self-employment? That’s what this talk was supposed to be about.


I’m looking forward to speaking at the Turtle Mountain Disabilities Conference and not just because they have one of the best conference logos I have ever seen.

Turtle with world inside it



This topic reminds me of a joke my friend told me. His specialty is geriatrics, and one day one of his patients came to him and asked for Viagra, which, if you have watched late-night TV in the past 10 years, you might know is a, um, performance-enhancing drug for men. Jake said to him,

“Sir, you don’t need medication. I’ve met your ex-wife, and believe me, she didn’t turn me on, either.”

What does this have to do with self-employment? Someone asked me why I am always talking about employment for people with disabilities. Don’t I know how high the unemployment rate is? Don’t I know that it’s far higher on reservations? Don’t I know that many people, maybe most, aren’t interested in working? And that reminds me of this clip from a podcast I heard recently.

Imagine this offer:

You will spend 8 hours a day doing some task that could be done equally well by a machine – handing a cup of coffee to strangers, mopping a floor. It will present few opportunities for you to grow, physically, emotionally, mentally. You won’t make enough money to buy a home or many of the other things you might like to own. You’ll probably take public transportation to get there and back because you won’t be able to afford a reliable car. After working all day, you won’t have much time or energy on the weekends to do the things you like, whether it is hunting or going to the movies and you won’t have much money to do those things either, so you’ll probably just watch TV. 

That will be your life – hand out cups of coffee, ride the bus, watch TV – and you’ll continue doing that until you die or they find a machine that can do your job cheaper and fire you.

How can anyone turn down an offer like that?


It’s common to hear that people with disabilities don’t want to work, or that youth don’t want to work, or whatever group we are putting down today, and to blame that on lack of work ethic. I don’t think so. Because, you know what, that offer doesn’t turn me on, either.

I want to talk about self-employment from a personal perspective. I’m not funded by any grant that promotes starting a business so I’m not going to pretend that it’s any easier than it is. On the other hand, I started my first business in 1985, as R & R Consulting, and recently incorporated my fourth company, 7 Generation Games, so it is possible.

What I hope to achieve is to convince more people that self-employment is a very realistic goal for many people with disabilities, although it’s not for every person with a disability, just like it isn’t for every person without a disability. Just me getting up and saying something once is probably not going to make a difference for many people, if anyone. What IS going to make a difference is the people they see every day, their parents, friends, relatives, counselors or teachers. Those are the people I hope to convince as much as the people with disabilities themselves.

The first thing you need to start a business is … Stop and think to yourself, what do you really need? Go ahead. I can wait. Email or text a friend and ask. You’re already reading this on some sort of electronic thing-a-ma-jig anyway. I’m going to tell you a few stories, during which time you will probably conclude I have forgotten my point entirely.


When I was young, my whole life was partying and sports, occasionally interrupted by school. Back then, I wondered what old people did when they had too much common sense, responsibilities and aches and pains to be running around. Now I know. They sit around with their friends and talk about life, the problems in the world and how all of them could be solved if people just listened to us. At least, that’s what Willie Davis and I do.

Lately, we’ve been discussing why it is that some people with disabilities become so successful while others are far from meeting their goals. We don’t know the answer to that question, but I’m scheduled for 45 minutes, so I’m going to talk about it anyway.

Willie Davis

Who has a disability? Is it one of those things like art, where you know it when you see it? Not so much. This was brought home to me by a couple of examples. Willie and I were discussing the lack of disability advocates on reservations and how that may be due to few people with disabilities having the education and experience to be involved in activities such as running a conference like this. We were trying to think of someone at Spirit Lake, and after a few minutes we realized, oh yeah, Erich Longie is a vocational rehabilitation “success story”. Now, maybe if you just walked by Erich in the airport when he was walking with two canes because he had a really long way to go through the terminal, you’d think, “There goes a person with a disability.”


However, I can guarantee you would never think of that if you knew him. Willie and I have both known Erich for well over 20 years. He’s one of my best friends, I was at his graduation when he was the first enrolled member to receive a doctorate, he was my boss when he was tribal college president and we founded a company together, and yet when asked to name someone on the Spirit Lake Nation who had the education and experience to be a disability advocate, I didn’t think of him. Neither did Willie, so it’s not just me.


If you know Erich, when you think of him, probably one of the first things that comes to mind is that he’s very family-oriented. He was a single father for many years, and now he’s raising his grandchildren. He was a major force in the fight against the Sioux nickname. He’s been quite politically involved over the years, particularly in education, as school board president and as a member of the tribal college board. He’s been immensely involved in American Indian education – adult basic education instructor, Even Start Director, elementary school teacher, college academic vice-president – and he has written a master’s thesis and a dissertation on issues in Indian education and published articles in academic journals. He’s an avid pool player, drives like a stunt double for the Dukes of Hazzard (or Grand Theft Auto, if you’re too young to remember that), and he’s survived the Marine Corps, cancer, alcoholism, the death of his son and an exceptional number of ex-wives. All of this maybe explains why it took Willie and me about twenty minutes of trying to think of someone with a disability to say, “Oh, yeah, Erich was in a car accident and walks with a cane, sometimes two.”

Erich and teachers

Erich is unusual, but he’s not unique.


A few months ago, an old friend came to one of my daughter’s fights. Tina took the 4 a.m. bus from Los Angeles instead of flying out the night before because her mom had the flu and she wanted to make sure she was all right before she left town. Let me tell you a few things about Tina. She is a vocational rehabilitation counselor, has a black belt in judo, is going back to school to get her PhD – and she is always late. I’m always late, too, so if I KNOW that you are always late, it means you are getting there after me, so you must always be REALLY late, and she is.


Not long ago, someone asked me who I was waiting for, and I said,

“My friend, Tina. You know her right? You know what it is about her?”

The other person just went off,

“Oh, yes, she’s blind, right? I just think she is so inspirational. It’s so amazing how she lives in her own home, travels the world, graduated from college. She’s just so inspiring. Is that what you were going to say?”

And I said,

“No, I was going to say that she can be kind of a pain in the ass how she complains about everything and she’s always late, but you’ve kind of ruined it now.”

I told Tina this later and she agreed,

“Yeah, I do complain a lot, don’t I? I should work on that.”


I didn’t find it particularly inspirational that Tina got on a plane and went wherever she wanted to go any more than I found it inspirational that Willie Davis is part of a group that organized a conference that is now a model for events on other reservations or that Erich Longie earned a doctorate. They are all smart, hard-working people. Why wouldn’t they do these things? I need to take a plane to get to the conference that’s held in Belcourt, North Dakota and no one was inspired by me and said, “Ooh, look at her, she managed to get to the airport.”

There is a point here, other than that possibly I’m not the best friend.

So, here is my other question … What is a major factor in successful employment of people with disabilities?

One of the reasons all of the people I mentioned are successful is that they are surrounded by people who don’t just expect them to be successful but take it for granted.

Whether it is South Dakota, Washington, D.C. or Alaska, if Erich and I are doing a presentation together, I just assume he will show up and do his part. When I came to Belcourt, I had no doubt that I would be on the program, the conference would be well-organized, I would have my hotel room reserved. I got mad at Tina for being late all of the time because I knew she could do better.

These examples show two parts to being a success. One is not letting your disability define you, and the other is being around people who don’t let it define you either.

You might say education is a major factor in success. Yes, everyone I mentioned has a degree, but they didn’t start out with an education. They DID start out with the assumption that they could go and get a college degree.

So, am I saying that all anyone – with or without a disability – needs in order to be successful is a positive attitude?

That’s certainly not all they need, but it’s a good place to start.

If you majored in psychology and were in an argumentative mood, you might argue that there is a lot of research showing that attitudes do not predict behavior very well. You could point to people who say they want to lose weight, get a job or meet any of a number of other goals and yet take no steps toward meeting those goals. What about THAT, Dr. Smartypants?

I’m glad you asked that question. Tune in tomorrow for the answer.

More notes from the text mining class. …

This is the article I mentioned in the last post, on Singular Value Decomposition.


Contrary to expectations, I did find time to read it, on the ride back from Las Vegas and it is surprisingly accessible even to people who don’t have a graduate degree in statistics, so I am going to include it in the optional reading for my course.

Many of these concepts, like start and stop lists, apply to any text mining software, but it just happens that the class I’m teaching this fall uses SAS.

In Enterprise Miner, you can only have one project open at a time, but you can have multiple diagrams and libraries, and of course, zillions of nodes, in a single project.

In Enterprise Miner, you can use text or text location as a type of variable. Documents smaller than 32K can be stored in the project as a text variable. If a document is larger than 32K, give its text location instead.


  • start lists – often used for technical terms
  • stop lists, e.g. articles like “the”, and pronouns. These appear so frequently in documents that they don’t contribute to our goal, which is to distinguish between documents. A stop list may also include words that are high frequency in your particular data. For example, in our data, “mathematics” goes on the stop list because it is in almost every document we are analyzing.
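To make the idea concrete outside of Enterprise Miner, here is a minimal Python sketch of what start and stop lists do to a tokenized document. This is not how SAS Text Miner implements them; the tokenizer, function names and list contents are all hypothetical, and I’m assuming a start list means “keep only these terms.”

```python
import re

def tokenize(text):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def apply_lists(tokens, stop_list=None, start_list=None):
    """Keep only start-list terms (if a start list is given), then drop stop-list terms."""
    if start_list is not None:
        tokens = [t for t in tokens if t in start_list]
    if stop_list is not None:
        tokens = [t for t in tokens if t not in stop_list]
    return tokens

# Hypothetical stop list: a few articles and pronouns, plus "mathematics",
# which (in this made-up corpus) appears in nearly every document.
STOP_LIST = {"the", "a", "an", "it", "they", "mathematics"}

doc = "The mathematics of the quotient: it shows up in algebra."
print(apply_lists(tokenize(doc), stop_list=STOP_LIST))
# ['of', 'quotient', 'shows', 'up', 'in', 'algebra']
```

Notice that “of” and “in” survive only because this toy stop list is tiny; a real one would be much longer.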

Synonym tables
Multi-word term tables – standard deviation is a multi-word term
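Here is a rough sketch of what synonym and multi-word term tables accomplish: variants get collapsed into one canonical term, and “standard deviation” is counted as one term instead of two. The tables and function names are hypothetical, and this toy version only handles two-word terms.

```python
# Hypothetical synonym table: each variant maps to its canonical term.
SYNONYMS = {"stats": "statistics", "stdev": "standard deviation"}
# Hypothetical multi-word term table (two-word terms only in this sketch).
MULTI_WORD_TERMS = {("standard", "deviation")}

def merge_multi_word(tokens, multi_word_terms):
    """Join adjacent tokens that form a listed two-word term into one term."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in multi_word_terms:
            merged.append(" ".join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def apply_synonyms(tokens, synonyms):
    """Replace each token with its canonical form, if the table has one."""
    return [synonyms.get(t, t) for t in tokens]

tokens = ["stats", "uses", "the", "standard", "deviation"]
print(apply_synonyms(merge_multi_word(tokens, MULTI_WORD_TERMS), SYNONYMS))
# ['statistics', 'uses', 'the', 'standard deviation']
```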

Importing a dictionary: go to the properties. Click the … (three dots) next to the dictionary (start or stop) you want to import. When a window comes up, click IMPORT.

Select the SAS library you want. Then select the data set you want. If you don’t find the library that you want, try this:

  1. Close your project.
  2. Open it again.
  3. Click on the 3 dots next to PROJECT START CODE in the property window.
  4. Write a LIBNAME statement that gives the directory where your dictionaries are located.
  5. Open your project again.

[Note: Re-read that last part on start code. It applies any time you aren’t finding the library you are looking for, not just for dictionaries. You can also use start code for any SAS code you want to run at the start of a project. I can see people like me, who are more familiar with SAS code than with Enterprise Miner, using that a lot.]

Filter viewer – you can specify the minimum number of documents a term must appear in to be included.
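The minimum-number-of-documents setting is easy to picture in code. This sketch (plain Python with hypothetical function names, not SAS’s implementation) counts how many documents each term appears in and keeps only terms at or above a threshold. Note that “quotient” appears twice in the third document, but that still counts as only one document.

```python
from collections import Counter

def document_frequencies(docs):
    """Count how many documents each term appears in (not total occurrences)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def filter_terms(docs, min_docs):
    """Keep only terms that appear in at least min_docs documents."""
    df = document_frequencies(docs)
    return {term for term, n in df.items() if n >= min_docs}

docs = [["algebra", "quotient"],
        ["algebra", "statistics"],
        ["algebra", "quotient", "quotient"]]
print(sorted(filter_terms(docs, min_docs=2)))
# ['algebra', 'quotient']
```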



Jenn and Chris

Speaking of Las Vegas, blogging has been a little slow lately since we took off to watch The Perfect Jennifer get married. It was a very small wedding, officiated by Hawaiian Elvis. Darling Daughter Number Three doubled as bartender and bridesmaid, then stayed in Las Vegas because she has a world title fight in a few days.

Given the time crunch, I was particularly glad I’d attended this course, which gave me the opportunity to draft at least one week’s worth of lectures for the fall. When I finish these notes, my plan is to edit them and turn them into the last lecture in the data mining course. If they’re helpful to you, feel free to use whatever you like. I’ll try to remember to post a more final version in the fall. If you have teaching resources for data mining yourself, please let me know.

My crazy schedule is the reason I start everything FAR ahead of time.


Hot tip: If you are a professor, you have access to some major benefits from SAS. The main ones that jump to mind are:

  1. Free classes that are worth FAR more than you paid for them.
  2. Free software via SAS On-Demand.
  3. Free books – up to two per semester.
  4. Free teaching materials.

You can get more information on the SAS Global Academic Program website.

Crazy, but true. I went to San Diego for two days (yes, I had to pay my own travel expenses, but with a Prius that’s $10 in gas plus a night in a hotel) and took a free course on SAS Enterprise Miner. I have SAS Enterprise Miner free for a class I am teaching in the fall, and unlike desk copies, it’s free not just for the professor but for all of the students. I’m teaching data mining in the fall, and although I really doubt we will get into text mining much, I may cover an introduction in the last lecture. So, to remind myself, and for anyone else who might be teaching the same course, here are some of my notes.


The term-document matrix is a key concept in understanding SAS Text Miner (and probably any other text mining software). The columns are the documents and the rows are the terms, like algebra, quotient and statistics.

Of course, you are going to have plenty of 0 cells, where the document does not include the word, say, “statistics”, and plenty of rows for terms, like “mathematics”, that appear in many, many documents.
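If it helps to see one, here is a minimal sketch that builds a term-document matrix from a tiny, made-up corpus, with rows as terms and columns as documents. The function name is hypothetical. You can see both situations just described: zero cells (“statistics” is missing from two documents) and a row (“mathematics”) that is nonzero in every column. Real software stores this as a sparse matrix rather than a list of lists.

```python
def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j.

    Rows are terms and columns are documents.
    """
    terms = sorted({t for tokens in docs for t in tokens})
    index = {t: i for i, t in enumerate(terms)}
    matrix = [[0] * len(docs) for _ in terms]
    for j, tokens in enumerate(docs):
        for t in tokens:
            matrix[index[t]][j] += 1
    return terms, matrix

docs = [
    ["mathematics", "algebra", "quotient"],
    ["mathematics", "statistics"],
    ["mathematics", "algebra"],
]
terms, tdm = term_document_matrix(docs)
for term, row in zip(terms, tdm):
    print(f"{term:12s} {row}")
```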

According to the instructor, text mining is a subset of text analytics. I always used them synonymously, and we didn’t get into the distinction. Feel free to comment if you have an opinion, like that I should be burned at the stake for such text mining/analytics incest.

Using the filter in text mining works just like a WHERE statement in a SAS analysis: it does not delete any observations from your data set, but from that point on, the analysis only uses the records that match the filter.

Two general goals of data mining

  • Pattern discovery – no response variable. You are trying to find variables that cluster together.
  • Prediction

It kind of makes me think of statistics in general, where you have techniques like cluster analysis and factor analysis on one end and techniques like regression on the other.

People can manipulate a few inputs, but not everything. That is one way text mining can be used to identify fraud: by using large numbers of variables and looking for suspicious clusters. The whole fraud detection discussion in the course was pretty interesting, even though I’m not involved in credit cards, insurance or other industries where fraud is such a big deal. I just found it intellectually interesting.

If you like matrix algebra (which I do), there was an interesting discussion of Singular Value Decomposition and the term-document matrix. It seemed very much like principal components analysis, multiplying a vector of weights by a set of responses. An article was mentioned that distinguishes between SVD and PCA, but to be truthful, I probably won’t find the time to read it. I did end up discussing it with The Invisible Developer, though, who got a math degree at UCLA “because I thought as long as I was getting a degree in physics, I might as well”. We are well matched. This is the kind of career planning we go in for at The Julia Group.
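For the matrix algebra fans, here is a small NumPy sketch of SVD applied to a toy term-document matrix (the counts are made up). It factors the matrix, keeps the k strongest dimensions, and describes each document with k numbers instead of one per term. One way to see the resemblance to PCA: PCA can be computed as an SVD of the mean-centered data matrix, while text mining typically applies SVD to the (usually weighted) term-document matrix without centering. This is a sketch of the general technique, not of what SAS Text Miner does internally, and k = 2 is an arbitrary choice.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents), hypothetical counts.
A = np.array([
    [1, 0, 1],   # algebra
    [1, 1, 1],   # mathematics
    [1, 0, 0],   # quotient
    [0, 1, 0],   # statistics
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, with singular values sorted largest first.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # keep only the k strongest dimensions
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation of A

# Each document (column) is now described by k numbers instead of one per term.
doc_coords = np.diag(s[:k]) @ Vt[:k, :]
print(np.round(s, 3))
print(doc_coords.shape)  # (2, 3)
```

The reduced document coordinates are what you would feed into clustering or a predictive model.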

Topics vs terms

Terms help define a topic.

Topic and category are not the same.

A document can be in only one category (cluster)

A topic can appear in multiple documents & a document can contain multiple topics

topic = concept, used interchangeably (at least as far as the Text Miner documentation is concerned)

Types of data sets

Training, test and validation data sets are all based on historical data. You actually know what the value of the target variable is.

In a scoring data set, you don’t know the value of the target variable; that is what you are trying to predict.
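Here is a minimal sketch of splitting historical data (where the target is known) into training, validation and test sets. The proportions default to the 40/30/30 split I mention later in these notes, and the second call shows a 50/50/0 split. The function and its parameters are hypothetical; this is a simple random split, not how Enterprise Miner’s Data Partition node works internally. A scoring data set never goes through this step, because it has no known target values.

```python
import random

def partition(rows, train=0.4, validate=0.3, test=0.3, seed=12345):
    """Randomly split historical rows into training, validation and test sets."""
    if abs(train + validate + test - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    shuffled = rows[:]                    # copy so the original data are untouched
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validate)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

historical = list(range(100))                          # rows whose target is known
tr, va, te = partition(historical)                     # 40/30/30
tr2, va2, te2 = partition(historical, 0.5, 0.5, 0.0)   # 50/50/0
print(len(tr), len(va), len(te))     # 40 30 30
print(len(tr2), len(va2), len(te2))  # 50 50 0
```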


Options for transforming text to numbers

  • Boolean count – shows up or not
  • Frequency counting
  • Information theoretic counting (log of frequency counts)
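Here is how the three options look for one term’s counts across four made-up documents. I’m assuming the log transform is log base 2 of (count + 1), so that a zero count stays zero; that is a common choice, but check the Text Miner documentation for the exact formula it uses.

```python
import math

counts = [0, 1, 2, 7]  # hypothetical raw counts of one term in four documents

boolean = [1 if c > 0 else 0 for c in counts]   # shows up or not
frequency = counts[:]                            # raw frequency count
log_freq = [math.log2(c + 1) for c in counts]    # log of frequency; 0 stays 0

print(boolean)                          # [0, 1, 1, 1]
print([round(x, 3) for x in log_freq])  # [0.0, 1.0, 1.585, 3.0]
```

The log version keeps a term that shows up 7 times from counting 7 times as much as one that shows up once.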

To adjust for document size and corpus size (number of documents), use term weights:

  • Entropy weights (Shannon information theory)
  • Inverse document frequency weights
  • Target-based weights
  • Others
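Two of these weights are easy to compute by hand. The sketch below uses the textbook log-entropy formula, 1 + Σ p·log2(p) / log2(n), where p is the share of the term’s total count that falls in each document and n is the number of documents, plus one common form of inverse document frequency, log2(n / documents containing the term). SAS may use a different variant (an added constant, a different base), so treat these formulas as assumptions. Either way, a term spread evenly over every document gets a weight of 0, and a term concentrated in one document gets a high weight.

```python
import math

def entropy_weight(row):
    """Log-entropy weight for one term (row of counts); assumes n > 1 docs and a nonzero total."""
    n = len(row)
    g = sum(row)                       # total count of the term in the corpus
    total = sum((f / g) * math.log2(f / g) for f in row if f > 0)
    return 1 + total / math.log2(n)

def idf_weight(row):
    """Inverse document frequency: log2(number of docs / docs containing the term)."""
    n = len(row)
    d = sum(1 for f in row if f > 0)
    return math.log2(n / d)

# Rows of a hypothetical term-document matrix with 4 documents.
mathematics = [2, 2, 2, 2]  # evenly spread: says nothing about which document is which
quotient = [0, 0, 5, 0]     # concentrated in one document
print(round(entropy_weight(mathematics), 3), round(idf_weight(mathematics), 3))  # 0.0 0.0
print(round(entropy_weight(quotient), 3), round(idf_weight(quotient), 3))        # 1.0 2.0
```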

Can combine traditional data mining inputs with text mining inputs in a predictive model

… I’ll post more on the specifics of how to use SAS Text Miner in my next post, but I wanted to point out two advantages for professors of taking a course, any course:

  1. It’s good to take courses to remind yourself what it’s like not to be the expert. So often, we get used to knowing all of the little nuances of a field and forget what it’s like not to find it obvious that the F value is the ratio of two estimates of variance, one obtained from between-group differences and one from within groups. Back when I had slightly more time, I tried to take one course a year in something I knew nothing about, like microbiology. I learned interesting stuff and maintained more empathy.
  2. If you are lucky, you get to see good teaching modeled, and you can steal the instructor’s ideas. For example, this class started out pretty slowly, but that was good because people who were not as familiar with data mining could get some understanding. It was also good that he defined a lot of the terms and basic concepts, because I am lifting some of that straight out of my notes for one of my lectures. (SAS not only allows this, they encourage it and will send you instructional resources for free. If you are a professor, you only need to ask.) The slow start also helped because by the afternoon of the first day, everyone was chomping at the bit to get their hands on the software and start running things, which would not have been the case if we’d started using it right away. The less experienced people would have been lost, and the more experienced people would have been bored after three hours of it in the morning. I’m definitely stealing that idea for my class in the fall.
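For anyone who, unlike us experts, doesn’t find the F ratio obvious, here it is spelled out: the between-group estimate of variance divided by the within-group estimate, for a one-way ANOVA. The function name and the three small groups are made up for illustration.

```python
def f_ratio(groups):
    """One-way ANOVA F: between-group variance estimate / within-group variance estimate."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Sums of squares: group means around the grand mean, and values around their group mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)  # variance estimate from between-group differences
    ms_within = ss_within / (n - k)    # variance estimate from within groups
    return ms_between / ms_within

print(f_ratio([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```

If the group means didn’t really differ, both estimates would be estimating the same variance and the ratio would hover around 1.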

Here’s the other benefit I have found of courses, for professionals in general. Yes, you could maybe get all of the materials and read them in your spare time without going to San Diego or Cary or wherever. The fact is that I would NEVER sit down and spend 16 hours in a week studying anything. I would get interrupted, have meetings, answer email, return calls.

Of course, if you are going to get a real benefit, you need to use it when you get back, which I have pretty much failed at. I will explain why next week (how is that for an air of mystery). In the meantime, the best I can do is review my notes so I’m ready to jump in next week.

Oh, and for those people who say that SAS only gives you free things because they want organizations to pay to use their software that students will be trained on – I’m sure that’s true. So?

Captain Obvious wearing her obvious hat


Maybe this is obvious, but I have often found that what is obvious to some people is not so obvious to others, so here are a few random tips.

1. Enterprise Miner can take a REALLY long time to load, during which you wonder whether anything is happening at all.

task manager

Open up the Task Manager and look for something that says javaw.exe *32. You can see it near the bottom in the image above. The number next to it should be going up, from 30,000 to 50,000, etc. If it is, you should probably be patient for a few more minutes and your session will start.

2. Let’s say you want to change the properties of something. For example, I don’t want the data set to be partitioned into Training, Validation and Test in a 40/30/30 split. I want it to be 50/50/0. So, I right-click on the DATA PARTITION node, get a drop-down menu and

diagram window with properties at left


there is all of this stuff from Edit Variables all the way down to Disconnect Nodes. Where the hell are the properties to change? They’re on the left, in that window with the title Property! Funny, but it’s so easy to focus on the diagram window and completely forget about everything else. Click on a node and its properties will show up in that window.

3. While the three screens you see when you run the StatExplore node are pretty interesting, it would be nice to have a more detailed look at your data. Just go to the VIEW menu and you can get more statistics, like the cell chi-square values and descriptive statistics of numeric variables broken down by the levels of your target variable.

Menu with window options

Now that you are starting to see some of what you can do with Enterprise Miner, you’ll be wondering what MORE you can do, like decision trees. I’m glad you asked that question ….

After all of the effort to get Enterprise Miner installed, I thought it had better do something good. It is interesting to use. In programming, you can get a program to run and still get errors or unexpected results. So far (key phrase!), with Enterprise Miner I have found the problem to be knowing exactly what to select, for example, when creating data sources. Once you know that, however, it seems pretty hard to make an error.

Goat on a mountain

Enterprise Miner does do some pretty cool stuff, which makes it worth the pain of getting it installed. Even way cooler, unlike back in the day when no one could get their hands on it without paying approximately $4,893,0893.16, their firstborn child, their left kidney and an albino goat, if you are an instructor or a student, you can get it for free through SAS On-Demand for Academics.

(And, yes, for the record, I *am* aware that said goat is not an albino. I was fresh out of pictures of albino goats. Deal with it.) 

Speaking of Enterprise Miner, I thought I would ramble on about the good parts for a few posts, since I’m getting ready to teach data mining in the fall and I hate to do anything at the last minute.

One of the good parts is StatExplore. At first glance, it looks good, but at second glance, it looks better.

All you need to do is create a diagram by going to the FILE menu, then selecting NEW and then DIAGRAM.

You can start by dragging a data source onto the diagram. In this example, I used the heart data set from the Framingham Heart Study, which happens to ship with Enterprise Miner in the SASHELP library.

I drag the data set from data sources to the diagram window.

Next, I click on the EXPLORE tab just above the diagram window. This gives you a bunch of icons. Enterprise Miner is just rife with icons. Never fear, though: if you have no idea what this bunch of colored boxes is supposed to mean versus that bunch, just hover over the icon with your mouse and it will tell you.


Here is my diagram. Simple, no? It gives you a bunch of cool stuff. First, you have a plot of the chi-square values for the association of each nominal variable with the target.

Chi-square plot

You can see that sex (as in gender, not as in frequency of) has the highest chi-square, followed by cholesterol status, smoking status and weight status. I find this rather surprising. I knew women lived longer than men, but with all of the discussion of obesity, I thought weight would be higher up there.
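If you want your students to see where a number like that comes from, here is the Pearson chi-square for a contingency table computed by hand. The counts are hypothetical, NOT the Framingham data. Each term in the sum, (observed − expected)² / expected, is the contribution of one cell, which is what the cell chi-square values in the VIEW menu refer to.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if independent
            stat += (observed - expected) ** 2 / expected  # this cell's contribution
    return stat

# Hypothetical counts: rows = sex, columns = (alive, dead).
table = [[30, 20],
         [40, 10]]
print(round(chi_square(table), 3))  # 4.762
```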

The next chart gives me the worth of each variable in predicting my target, which in this example is death.

plot of variables in order of predictive value

The variable on the far left is age at start. Not surprisingly, the older people are when you start following them, the more likely they are to die in a given period of time. The next variable is Age at CHD Diagnosis, followed by two blood pressure measures, their cholesterol, then cholesterol status – weight status is down at the end.



This analysis produces A LOT of statistics. I found this interesting because, although some people argue that Enterprise Miner allows someone without an extensive programming or statistics background to do analyses, certainly in the case of statistics, the more knowledge you have, the better use you can make of the results.

For example, in the top right (all three of the screenshots above are one screen; I broke them up in an attempt at legibility), the output pane gives descriptive statistics broken down by each level of the target variable. I can see how many people who died had missing data for age at CHD diagnosis, the skewness and kurtosis values for variables by status (living or dead), the mode for weight status for people who were living or dead, and a whole lot more. Interestingly, 68% of the whole sample was overweight.

Scrolling through the statistics output I can get a good idea of the data quality – is it skewed, is it missing, is it missing at random.
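To show students what is behind those columns, here is a sketch that counts missing values and computes the mean, skewness and excess kurtosis for each level of the target. The rows are hypothetical, not the Framingham data. It uses simple population-moment formulas; as far as I know SAS applies small-sample adjustments, so its numbers will differ somewhat. The function assumes each group has at least one non-missing value and nonzero variance.

```python
def describe(values):
    """Count missing values and compute mean, skewness and excess kurtosis of the rest."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = sum(present) / n
    m2 = sum((v - mean) ** 2 for v in present) / n  # population moments
    m3 = sum((v - mean) ** 3 for v in present) / n
    m4 = sum((v - mean) ** 4 for v in present) / n
    return {
        "missing": len(values) - n,
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,
    }

# Hypothetical rows of (status, age at CHD diagnosis); None means missing.
rows = [("Dead", 60), ("Dead", 72), ("Dead", None), ("Alive", None),
        ("Alive", 55), ("Alive", None), ("Dead", 81), ("Alive", 58)]

for status in ("Alive", "Dead"):
    print(status, describe([age for s, age in rows if s == status]))
```

Comparing the missing counts across levels of the target is a quick first check on whether data are missing at random.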

Without some background in statistics, that’s probably no more than a bunch of numbers. Personally, I found it very helpful. That’s another assignment for the students: write a brief summary of their data, including any concerns. There weren’t any real problems with these data except for the obvious fact that pairs of variables like cholesterol and cholesterol status, or smoking and smoking status, are going to be highly correlated. It would be a good idea to include one variable from each pair as an input in any predictive analyses and reject the other to prevent multicollinearity problems.
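A quick way to confirm that suspicion is to correlate the two versions of the variable. This sketch computes a Pearson correlation between hypothetical cholesterol values and a cholesterol status coded 0 = Desirable, 1 = Borderline, 2 = High; both the values and that coding are assumptions for illustration. A simple correlation is only a first screen; variance inflation factors are the more thorough check for multicollinearity.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values; status coded 0 = Desirable, 1 = Borderline, 2 = High.
cholesterol = [180, 195, 210, 230, 250, 270]
status_code = [0, 0, 1, 1, 2, 2]
r = pearson(cholesterol, status_code)
print(round(r, 3))
```

A correlation this close to 1 says the two variables carry nearly the same information, so keep one as an input and reject the other.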

(NOTE to self: Make sure to explain variable roles, changing variable roles in EM and multi-collinearity.)

You might think this is adequate for running just one node, but, in fact, there is much more here than meets the eye. More on that tomorrow because, speaking of overweight, I have been at a computer for 13 hours today and I want to hop on the bike and get some exercise in before I knock out the last task I need to do today. Although @sammikes just pointed out on Twitter that round is a shape, it is not the one I want to be in.

I’m putting this here for my students this fall, but I’m sure there are two or three other people in the world who would like to know how to use Enterprise Miner. I’m assuming you read some of my other posts or received an email from your professor or in other ways got Enterprise Miner installed and running.

If not, you should read the documentation. Or, you are welcome to poke around on this blog and find out what I did. Just type “miner” into the search box.

To proceed:


  1. Start Enterprise Miner
  2. Create a new project
  3. Give it a name
  4. Create a new library so you have some data – File > New > Library
  5. Type in a name and your course library, something like “/courses/yourschool.edu1/a_123/b_456”
  6. Create a new diagram – File > New > Diagram
  7. Create a data source (this strikes me as counter-intuitive, since I have the data source in the library, but whatever). Here is how you do it:

data sources tab

  • Right-click on the data sources tab.
  • It will come up with a drop-down menu with one option, Create Data Source.
  • Pick that.
  • It will come up with this window.
  • Select SAS table, which is the exact same thing as a SAS data set.
  • Click Next and it will bring up the list of libraries available, including the one you just added in the last step.
  • Double-click to select your library.
  • Select your data set and then click OK.
  8. The next few screens give you information on your data. In my course, the first assignment is for the students to use these to answer:
  • How many variables are in the data set?
  • How many observations are there?
  • How many of these are nominal variables?


  • Select one of the variables that is NOT nominal. Click the explore tab.
  • Write one paragraph describing these results. Include a screenshot of your results.

your data

  • Write a one-paragraph summary of these results, only hitting the high(low)lights, such as “98% of the data for variable v_1980 are missing.”

Obviously this isn’t a feasible assignment if you have 6,000 variables, but I try to design courses that increase gradually in difficulty, starting with a relatively small data set and then moving to larger and more complex ones.

