After all of the effort to get Enterprise Miner installed, I thought it had better do something good. It is interesting to use. Unlike programming, where you can get a program to run and still get errors or unexpected results, so far (key phrase!) the only problem I have found with Enterprise Miner is knowing exactly what to select, for example, with CREATE DATA sources. Once you know that, however, it seems pretty hard to make an error.

Goat on a mountain

Enterprise Miner does do some pretty cool stuff, which makes it worth the pain of getting it installed. Even way cooler, unlike back in the day when no one could get their hands on it without paying approximately $4,893,0893.16, their first born child, their left kidney and an albino goat, if you are an instructor or a student, you can get it for free through SAS On-Demand for Academics.

(And, yes, for the record, I *am* aware that said goat is not an albino. I was fresh out of pictures of albino goats. Deal with it.) 

Speaking of Enterprise Miner,  I thought I would ramble on about the good parts for a few posts, since I’m getting ready to teach data mining in the fall and I hate to do anything at the last minute.

One of the good parts is StatExplore. At first glance, it looks good, but at second glance, it looks better.

All you need to do is create a diagram by going to the FILE menu, then selecting NEW and then DIAGRAM.

You can start by dragging a data source on to the diagram. In this example, I used the heart data set from the Framingham Heart Study, which happens to ship with Enterprise Miner in the SASHELP library.

I drag the data set from data sources to the diagram window.

Next, I click on the EXPLORE tab just above the diagram window. This gives you a bunch of icons. Enterprise Miner is just rife with icons. Never fear, though. If you have no idea what this bunch of colored boxes is supposed to mean versus that bunch, just hover over the icon with your mouse and it will tell you.

diagram

Here is my diagram. Simple, no?  It gives you a bunch of cool stuff. First, you have the plot of chi-square values for all nominal variables.

Chi-square plot

You can see that sex has the highest chi-square (as in gender, not as in frequency of), followed by cholesterol status, smoking status and weight status.  I find this rather surprising. I knew women lived longer than men, but with all of the discussion of obesity, I thought weight would be higher up there.

The next chart gives me the worth of each variable in predicting my target, which in this example is death.

plot of variables in order of predictive value

The variable on the far left is age at start. Not surprisingly, the older people are when you start following them, the more likely they are to die in a given period of time. The next variable is Age at CHD Diagnosis, followed by two blood pressure measures, their cholesterol, then cholesterol status – weight status is down at the end.

statistics

 

This analysis produces A LOT of statistics. I found this interesting because, despite some people arguing that Enterprise Miner allows analysis by someone without an extensive programming or statistics background, certainly in the case of statistics, the more knowledge you have, the better use you can make of the results.

For example, in the top right (all three of the screen shots above are one screen, I broke them up in an attempt at legibility), the output pane gives descriptive statistics broken down by each level of the target variable. I can see how many people who died had missing data for age at CHD diagnosis, skewness and kurtosis values for variables by status, living or dead, the mode for weight status for people who were living or dead, and a whole lot more. Interestingly, 68% of the whole sample was overweight.

Scrolling through the statistics output, I can get a good idea of the data quality: is it skewed, is it missing, is it missing at random?
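If you want to see what is behind a couple of those numbers, here is a minimal sketch in JavaScript (the language I use for the game code on this blog). The function name describe is made up for illustration and is not anything Enterprise Miner exposes. The skewness is the simple population formula, so a statistics package that applies a small-sample adjustment may report a slightly different value.

```javascript
// Hypothetical helper for illustration only: summarizes one numeric column.
// Missing values are assumed to be stored as null, undefined or NaN.
function describe(values) {
  const present = values.filter(
    (v) => v !== null && v !== undefined && !Number.isNaN(v)
  );
  const n = present.length;
  const mean = present.reduce((sum, v) => sum + v, 0) / n;
  // Second and third central moments of the non-missing values
  const m2 = present.reduce((sum, v) => sum + Math.pow(v - mean, 2), 0) / n;
  const m3 = present.reduce((sum, v) => sum + Math.pow(v - mean, 3), 0) / n;
  return {
    n: n,
    missing: values.length - n,
    mean: mean,
    // Population skewness: 0 for symmetric data, positive for a long right tail
    skewness: m3 / Math.pow(m2, 1.5),
  };
}

// Made-up ages, NOT values from the Framingham data
const summary = describe([45, 50, null, 55, 60, 65]);
```

For these made-up values, summary reports one missing value and a skewness of zero because the five remaining ages are symmetric around 55.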

Without some background in statistics, that’s probably no more than a bunch of numbers. Personally, I found it very helpful. That’s another assignment for the students, to write a brief summary of their data, including any concerns. There weren’t any real problems with these data except for the obvious fact that variables like cholesterol and cholesterol status, or smoking and smoking status, are going to be highly correlated. It would be a good idea to include one of each pair as input in any predictive analyses and reject the other to prevent multicollinearity problems.

(NOTE to self: Make sure to explain variable roles, changing variable roles in EM and multi-collinearity.)
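To make the multicollinearity point concrete, here is a rough sketch. Everything in it is an assumption for illustration: the cholesterol readings are made up, the cutoffs of 200 and 240 used to create the status variable are just example values, and pearson is my own helper name. The idea is only that a status variable derived from a measurement will correlate very highly with that measurement.

```javascript
// Pearson correlation between two equal-length numeric arrays.
function pearson(x, y) {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Made-up cholesterol readings, NOT values from the Framingham data
const cholesterol = [160, 185, 210, 230, 250, 290];

// Hypothetical coding for illustration: 1 = lowest band, 2 = middle, 3 = highest
const cholStatus = cholesterol.map((c) => (c < 200 ? 1 : c < 240 ? 2 : 3));

// r comes out close to 1, which is the warning sign
const r = pearson(cholesterol, cholStatus);
```

With a correlation that high, the two variables carry nearly the same information, which is why I would set one to input and the other to rejected.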

You might think this is adequate for running just one node, but, in fact, there is much more here than meets the eye. More on that tomorrow because, speaking of overweight, I have been at a computer for 13 hours today and I want to hop on the bike and get some exercise in before I knock out the last task I need to do today. Although @sammikes just pointed out on twitter that round is a shape, it is not the one I want to be in.

Most likely, you, too, have experienced homicidal urges when confronted with a problem you have spent five hours trying to solve on your computer, only to call tech support and have them report,

Well, it works fine on my computer.

You’d think if that solved the problem that they would offer to box up their computer and send it over to your house but, alas, they never do.

This is the reason that any software I use for class I test on several computers under different conditions. After having initially failed to get SAS On-Demand for Enterprise Miner to work with boot camp on the Mac, I tried it on a Lenovo machine running Windows 8. I had to install the JRE and ignore a few security warnings, but after that it worked.

[For how I did eventually get it working with boot camp, click here, and thank Jason Kellogg from SAS. ]

Next, I needed to upload some data. The SAS instructions say to use your favorite FTP client and coincidentally, I do have a favorite FTP client (Filezilla), so I downloaded it to the testing machine.

Only the professor can upload data to the class directory, and most professors probably have an FTP program on their personal computer (or maybe not, do you?) Even if you normally do, you may, like me, have borrowed a machine to use for testing or have a new computer. Whatever, this just reinforces my argument that you should never, never plan to use any kind of software in a class unless you have ample time to prepare.

I know that there are schools that ask adjuncts to teach on a week or two notice. That seems to me a recipe for disaster for both the professor and students. Unless maybe you are doing something that hasn’t changed in 50 years and requires no technology, like reading Chaucer, I recommend you follow the advice of Nancy Reagan and “Just say no.”

Here are my first few hints:

  1. Test the software on multiple machines and multiple operating systems.
  2. Make sure one of those machines is on the older, under-powered end of the spectrum, as students often don’t have a lot of extra cash and may not have the shiniest, newest machine like you have on your desk.
  3. Test it on the latest operating system. It may turn out that the version your school has does not work with Windows 8. (I did not have that problem with Enterprise Miner this time, but I’ve had it with other software in the past so it is a good idea.)
  4. Find out what other software you might need, for example, some kind of FTP program in this case, and install it on your computer, if necessary.
  5. Give yourself plenty of time to do all of the above.

You might think these types of things would be handled by the information technology department at your university, and you may be really lucky and that will be so. In many schools, the IT department basically helps re-set passwords, assigns school email addresses, helps to get discounts on software and upload files to Blackboard and not much else.

For years, I have been trying to figure out where the $50,000 a year or so tuition goes. It isn’t to adjunct professors and it isn’t to the IT staff. It also isn’t  to buying the latest technology because, more and more often, students are expected to bring their own device.

You may think that none of the above should be your job, and you may be right. I am just saying that if you want to anticipate the frustrations your students will experience, and be able to solve their problems during the lecture by directing them to a link on your class website or blog, your life and theirs will both be a lot easier.

 

Thanks to Jason Kellogg from SAS Technical Support, SAS On-Demand Enterprise Miner is now running on my Mac using Windows 8.1 with boot camp. Here were his instructions.

Note: this is after you have created a SAS profile, registered a course and changed the security settings in Java. Now you are here.

The steps are:
  1. Download and save jre-6u24-windows-i586.exe.
          http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase6-419409.html#jre-6u24-oth-JPR
  2. Open the Windows Run window and run
          "C:\users\[userid]\Downloads\jre-6u24-windows-i586.exe" STATIC=1
          where [userid] is your user account name
  3. Click OK to start the installation
  4. After finishing the installation, on the desktop, right click an empty area and select “Create Shortcut”
          (NOTE: on Windows 8.1 this was NEW and then SHORTCUT)
  5. In the location, Browse to Desktop and click Next
  6. In the next screen provide the name of the shortcut, for example “Enterprise MinerJWS”
  7. Once the shortcut is created, Right Click and select Properties. In the Target enter the following:
          "C:\Program Files (x86)\Java\jre1.6.0_24\bin\javaws.exe"
          https://academic93.oda.sas.com/SASEnterpriseMinerJWS/main.jnlp
  8. Click Apply

You now have a clickable shortcut to Enterprise Miner. Please use it when starting Enterprise Miner.

This worked and I now have SAS Enterprise Miner working on my laptop, which is going to be extremely convenient.

PLEASE NOTE THAT ALL OF THE QUOTATION MARKS NEED TO BE THERE OR IT WILL GIVE YOU AN ERROR.

ALSO,  under #7 that is all one command.  I had to break into two lines on this blog to be legible.

 

Although it was still a huge pain in the ass to get started, it is leaps and bounds ahead of the first time I tried Enterprise Miner years ago.

 

chicken

Back then, it required back flips and sacrificing a chicken (okay, finding a machine running Windows XP, installing a bunch of files – just take my word it was a pain in the ass).  As for the on-demand version, it was so slow as to be useless.

In contrast, once I got up and running, it was not bad at all, and that was running off the wireless in the office. Now, our internet speed is good here, so your mileage may vary, but at least under good conditions it runs fine using a small dataset.

So, I just uploaded a dataset with 10,000 records and 6,000 variables. We’ll see what it does with that.


 

A few years ago, when I was at USC, I tried to get a desktop version of Enterprise Miner to run on a virtual machine on my Mac and that never happened, although I did get it working on a Windows machine I had at home.

Last week, I wrote about my failed attempt to get SAS Enterprise Miner from SAS On-demand to run on a Mac running Windows under boot camp.

Since then, I have successfully installed Enterprise Miner and started it using a Windows native machine.

Sadly, the same cannot be said for my Macs. Using boot camp on two different Macs, one running Windows 7 and another with Windows 8.1 I have had the same problems.

Be aware that if you are going to run Enterprise Miner on any operating system you are going to need at least some idea of what a C: prompt is and feel comfortable poking around things like .dll files.

You might think that this can be assumed and goes without saying if you are teaching, or even taking, a course in data mining. You would be wrong. Nothing can be assumed or goes without saying. Trust me on this.

I am not going to assume that you checked your configuration and the appropriate Java Runtime Environment is installed. If that is not the case, or you are not sure, go here and take care of that now. (See how this not assuming thing works?)

If that is taken care of, regardless of operating system, you will probably have a problem with Java security blocking the application from starting. For me, changing the Java security setting to medium fixed that on all 3 machines. I tried several other things that did NOT fix it. To find your Java security settings, you can go to the control panel (in Windows 8, search for control panel first) and then search for Java within the control panel. Click on Java, then the security tab to find the slider to move to medium.

At this point, the Windows machine worked, even though I had to click on several boxes where Java asked me was I ***SURE*** I wanted to do this.

With the Mac, though, after I click on Start SAS On Demand Software, Enterprise Miner, it downloads a main.jnlp file. When I open it, I eventually get a message that an error exists in the user services configuration. You can see screenshots here. The same exact problems occurred with both Mac computers running boot camp.

The ever-helpful Rebecca Ottesen said that two of her students using Macs last semester had the same problem and sent me an email directing me to this site.

So, I did a PROC OPTIONS in SAS, which I had loaded on my desktop and verified that the .dll file was located where expected

— and this led me to thinking, wait a minute, my students aren’t going to have SAS loaded on their computers so what are THEY going to do to troubleshoot.

That was kind of a moot point, though, because …

When I got to step 3 and typed in the command as directed, in the exact directory directed,

C:\Program Files (x86)\Java\jre1.6.0_24>java -fullversion

I got the error message ‘java’ is not recognized as an internal or external command, operable program or batch file.

Now, there could be any number of other things to try but the fact is, I have other things to do and the course is not for a few months. I will keep plugging away and keep you abreast here. If I do decide to go with Enterprise Miner in the fall, I am sure these posts will be helpful references for students.

I do want to advise anyone who is thinking about using the on-demand version of Enterprise Miner to be aware that you are definitely going to have at least a few problems with getting it installed, for example, the security thing, and if you have any students using boot camp, they are going to most likely hate you.

New semester coming up when I will be teaching data mining.  Because I never do anything at the last minute, I’m registering my course and testing the SAS on-demand for Enterprise Miner now.

I have learned from experience not to ignore it when the instructions say to check your configuration. You should find how to do that here.

http://support.sas.com/ondemand/emconfig.html

The first step is to open a command window and see if you have the appropriate Java Runtime Environment installed (JRE). Haven’t had to do anything from a C prompt in a while. On Windows 7, go to the start window at the bottom left of the screen and type in Command in the search box. Command prompt should pop right up.

I followed the instructions and it seemed my JRE was hunky-dory but the first time I started SAS On-Demand with Enterprise Miner it told me my Java was out of date. I went ahead and downloaded the latest version and installed it.

Your mileage may vary but total time getting up and running was about 2 minutes – but don’t get excited yet.

I clicked on start SAS Enterprise Miner and I got this message

Security warning from Java

I clicked RUN anyway but it was blocked from running.

So … I went into the control panel, typed in Java to search the control panel and in the Java security settings added the SAS on-demand login site as an exception. Still no luck.

Next, I went and changed the Java security settings from high to medium. A lot of people would not feel comfortable doing this, but I at least wanted to get Enterprise Miner to work. I could always set it back later.

At this point, I actually got Enterprise Miner to sort of start. That is, there were a couple of screens of security warnings I had to accept and then I got this error message.

error1

After this message, I got another saying the components failed to load.

failed to load

Perhaps, I thought, I should not have updated Java. So, I went back to the configuration instructions, checked to make sure I had a 32-bit JRE even though I have a 64-bit computer (check).

I went and downloaded the recommended version of the JRE from the Oracle site which required me to create an Oracle ID.

After that was downloaded and installed, I went to the Java control panel and disabled the later version so only one version of Java was enabled …

and I got the same error!

 

I checked the SAS documentation and it recommended clearing the Java cache. I did that. I got further that time with lots of messages about downloading the application and verifying the application, but just when I was getting excited, it came up and asked me if I wanted to run with an older version of Java. I picked to continue with the older version. After some more security warnings, I got the same error messages as before.

So … I went through the whole circle again, cleared the cache, started again, selected the newer version of Java – and still the same messages.

It’s past midnight on Memorial Day weekend, so I’m not going to bother calling SAS technical support. I have a laptop running Windows 8, so I’m going to try installing it on that tomorrow and see if I have any better luck.

One thing is pretty clear – my students in the fall better have a lot more familiarity with computers than just pointing and clicking or they are going to have a really hard time – and I still haven’t gotten Enterprise Miner to run!

P.S. At one time, years ago, I had gotten SAS Enterprise Miner to run with version 1.6.0_18, so I tried that and it also failed to start with the same error messages.

I’m running on boot camp on a Mac.  I have a Windows laptop, so I’ll try it on that tomorrow also and see what happens.

I am suspecting this is going to be a disaster, but I’m hoping to be mistaken.

During the time since I started this series of posts on a little thing I knocked out one evening to illustrate long division, I’ve probably done a dozen other somewhat interesting pieces of code. I am sad that Java has co-opted the use of the word codelet because it is such a nice term for a bit of programming that is more than a function but not a real application. If anybody has a good word, let me know. While we’re on the subject of words, what exactly is the difference in Dreamweaver between an extension and a widget?

Anyway …. our games include hundreds of bits like this, where if a student misses a problem, he or she gets routed to a page to pick an option to study.

So … here is the rest of the story. Yes, it could have been done more beautifully, and when I go back and revise it, I think I will replace the two ANSWER buttons with a single button whose behavior changes after the first onClick.

The DOCTYPE (html5) and title are pretty obvious.

All of our web pages have a container ID that is set in the style sheet. That makes all of the content fall within a defined window size, regardless of the screen size.

The w class is just so the background is white in the spot where the problem is. The Invisible Developer wanted some type of background and he liked the specky one.

You might wonder why something like w is a class instead of an ID if it is only used once. In fact, I simplified this example for the blog. Actually the w class is in an external style sheet so there could be pages with more than one element using this same style.

As a commenter on an earlier post pointed out (thank you!) it would really be better practice to give these more descriptive names like white_back because in the future I’ll probably be looking at this page and wondering what the hell ‘w’ was supposed to do. Of course, I can look in the style sheet, but it still is better to name things something descriptive.

You can see that the input field for the second digit of the answer is hidden, as is the button for getting another problem.

The forms have an ANSWER button because we found that students in this age group (9- 12 years) often type something by accident or as their first impulse. This forces them to think, at least for a second, whether or not they really meant that and gives them a chance to change  their mind. We added this at the request of several teachers after our first year of beta testing.

The table width is set at 40% and since the container width is defined, the table will always be the same size.
The q class (again, should be renamed and shame on me), has a border at the bottom of the cell. That is used to give the top part of the division problem and used again when each digit of the quotient is found and multiplied by the divisor. The product is then put in a cell with a line underneath.

The first input field is where the first digit of the quotient will be entered. On click, this will be hidden and the correct answer shown in the element yans1. If the student entered an incorrect answer, he or she will also get a message saying it is incorrect. All of this is handled by the javascript.

For the remaining rows of the table – the left cell is underneath the divisor, so it will remain empty. The right cell will have each step in the division problem entered, as the student enters the first digit and then the second.  Again, this is handled by the javascript, all I need to do is make sure the id values for each cell match what is in the script.

Once the problem is finished, the div with the id fin will be shown, as will the button for trying another problem. The student now can select one of three choices:

Get another problem (button3), go back and select another option for studying division, or take a quiz to go back to the game. Five correct answers and he or she can go back to playing Spirit Lake.

<!DOCTYPE html>

<html>
<head>
<title>Practice division</title>

</head>

<body>
<div id="container">
<div class="w">

<h3>PRACTICE LONG DIVISION</h3>
<p></p>
<h3 id="hd1"> Enter the FIRST digit in the answer</h3>
<h3 id="hd2" class="hidden"> Enter the SECOND digit in the answer</h3>
<input type="button" class="hidden" value="ANOTHER PROBLEM" size="5" name="button3" id="button3" onclick="window.location.reload()">
<p></p>

<p></p>
<form name="formx" id="formx">

<input type="button" value="ANSWER" name="button1" id="button1" size="5" onClick="checkProb(1)">
<input type="button" value="ANSWER" size="5" name="button2" id="button2" class="hidden" onClick="checkProb(2)">
<table width="40%" border="0" cellpadding="0">
<tr>
<td width="20%">&nbsp;</td>
<td width="20%" class="q"><input type="text" name="ans1" id="ans1" size="3"><span id="yans1" class="hidden"></span>
<input type="text" name="ans2" id="ans2" size="3" class="hidden"><span id="yans2" class="hidden"></span></td>
</tr>
<tr>
<td></td>

<td></td>
</tr>
<tr>
<td id="c"></td>

<td id="divide">&nbsp;</td>
</tr>
<tr>
<td></td>

<td id="d" class="d">&nbsp;</td>
</tr>
<tr>
<td></td>

<td id="e">&nbsp;</td>
</tr>
<tr>
<td></td>

<td id="f" class="d">&nbsp;</td>
</tr>
</table>

</form>
<div id="fin" class="hidden">

<p></p>
<a href="../learndividelong.html"><img src="../scenephotos/arrowhead_point_left.gif" width="130" height="70" alt="back arrow" />
Go back to study more</a>
<img src="../scenephotos/smalls/handblue.jpg" alt="blue hand" /> <img src="../scenephotos/smalls/handyellow.jpg" alt="yellow hand" />
<a href="../quizzes/dividelongerquiz.html">Take a quiz to go back to the game<img src="../scenephotos/arrowhead_point_right.gif" alt="next arrow" /></a></div>
<p></p>
</div>
</div>
</body>
</html>

 

Yesterday I posted the code to get the problem, now here is where we check it. As I said yesterday, my way of programming is to knock out something that works and then go back and make it work better, like a first draft for a journal article. So, this is my first draft.

Keep in mind the point here is NOT a quiz but for them to review and see how long division works. So, if they get the wrong answer, they get an alert message that this is the wrong answer, but then the correct answer is shown. This happens for both the first and second digit of the quotient.  There are two digits in the quotient in these problems. We are trying to show students that when you do long division, you find the first digit, multiply that by the divisor, write the product below the dividend and subtract. Then, you do the same thing again for the next digit.

Also, I showed using alert here, but we actually use a function we wrote in our game because there are problems with using multiple alert boxes in the same page with Unity. The alert is included here for generalizability. This post is the second half of the javascript. You also need a bit of css and html that I’ll put up next.

You can see the final product here.

This is one of hundreds of applets we have written that are just auxiliary to the main game. You get sent here to study if you miss one of the math challenges in Spirit Lake: The Game.

Here is what this code does in order …

When they type in an answer, it is one digit at a time. If it is the first digit, the function checkProb hides the input box and answer button for the first digit and shows the correct first digit. It also shows the input box and button for answering the second digit.

The product of the divisor and that first digit is computed, set to a value for a new variable d1, and that is shown.

The product is subtracted from the dividend, and that result, e1, is shown, with a space included so the digits line up correctly.

If their answer is wrong, a message is shown telling them it is wrong and what the correct answer is. Actually, that message comes up first so once they click OK they can see the correct answer, product, etc.

Then, they enter the second digit and all of the steps execute again. After they have done a complete problem, the instructions on how to complete the problem are hidden and two new options are shown, to either get a new problem or go back to the game.

function checkProb(num){
if (num == 1)
{
var theirs = document.formx.ans1.value ;

// Swap the first-digit prompt, box and button for the second-digit ones
$("#ans1").hide() ;
$("#hd1").hide() ;
$("#button1").hide() ;
$("#hd2").show() ;
$("#ans2").show() ;
$("#button2").show() ;
// Show the correct first digit of the quotient
document.getElementById("yans1").innerHTML = rightans1 ;
$("#yans1").show() ;
// The first digit is in the tens place, so its product is multiplied by 10
var d1 = rightans1*divisor*10 ;
var e1 = dividend - d1 ;
document.getElementById("d").innerHTML = d1 ;
$("#d").show() ;
// The leading space lines the digits up under the dividend
document.getElementById("e").innerHTML = '&nbsp;' + e1 ;
if (theirs != rightans1){
alert("Sorry, the correct answer is " + rightans1) ;
}

}
else if (num == 2)
{
var theirs = document.formx.ans2.value ;
var d2 = rightans2*divisor ;
// Show the correct second digit and its product with the divisor
document.getElementById("yans2").innerHTML = rightans2 ;
document.getElementById("f").innerHTML = d2 ;
$("#f").show() ;
$("#ans2").hide() ;
$("#yans2").show() ;
$("#fin2").hide() ;
// Problem finished: hide the instructions, show the next choices
$("#fin").show() ;
$("h3").hide() ;
$("#button3").show() ;
$("#button2").hide() ;
if (theirs != rightans2){
alert("Sorry, the correct answer is " + rightans2) ;
}

}
}
</script>
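If you want to check the arithmetic without loading a page, the numbers checkProb writes into the table can be pulled out into a plain function. This is only a sketch, not what runs in the game. The name divisionSteps and the returned field names first, second and remainder are hypothetical, while d1, e1 and d2 match the variables in the script.

```javascript
// Hypothetical helper (not part of the page's script): returns the numbers
// shown at each step of a long division problem with a two-digit quotient.
function divisionSteps(dividend, divisor) {
  const quotient = dividend / divisor;
  if (!Number.isInteger(quotient) || quotient < 10 || quotient > 99) {
    throw new RangeError("Expected a whole, two-digit quotient");
  }
  const first = Math.floor(quotient / 10); // first digit of the quotient
  const second = quotient % 10;            // second digit of the quotient
  const d1 = first * divisor * 10;         // product for the first digit, in the tens place
  const e1 = dividend - d1;                // what is left after subtracting d1
  const d2 = second * divisor;             // product for the second digit
  return { first, second, d1, e1, d2, remainder: e1 - d2 };
}

// 588 divided by 7 is 84: 8 x 7 x 10 = 560, 588 - 560 = 28, 4 x 7 = 28
const steps = divisionSteps(588, 7);
```

Because the dividend is always built as quotient times divisor, the remainder at the end should be zero every time.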

You’d think it would be easy to find source code for a simple applet to demonstrate long division with a one-digit divisor and two-digit quotient. I wanted it to show the steps in long division, with the product for the first digit in the quotient shown, then that subtracted from the dividend, and the next digit in the quotient shown.

I had one in Flash I wanted to replace and I found all kinds of applets but none with the source code, so, here, as a public service, is what I did this evening while drinking beer.

You can see the end product here 

In case you are dying to know, here is how I write a program, regardless of the language:

  1. Get something to work
  2. Clean it up to make it better

To me, trying to get your code perfect on the first try is like expecting your first draft of an article to be perfect. I find it much easier to dash something off and then go back and rewrite. I know not everyone does it that way but it works for me.

I’m going to use jquery so let’s start with that

<script type="text/javascript" src="../javascript/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="../javascript/jquery-ui.min.js"></script>

<script type="text/javascript">

// First you need a random number function

function randnum(min,max)   {
// Math.floor with (max-min+1) gives every integer from min to max the same chance
var num=Math.floor(Math.random()*(max-min+1))+min;
return num;
}

// Set up to get a new problem when the window loads. Create variables ;

window.onload = getProb ;
var rightans1 ;
var rightans2 ;
var dividend ;
var quotient ;
var divisor ;

 

function getProb()
{

// A quotient from 10 to 99 always has exactly two digits
quotient=randnum(10,99);
divisor=randnum(1,9) ;
// Clear anything left over from the previous problem
document.getElementById("ans1").value = "" ;
document.getElementById("ans2").value = "" ;
document.getElementById("yans1").innerHTML = "" ;
document.getElementById("yans2").innerHTML = "" ;
// Building the dividend from the quotient guarantees no remainder
dividend = quotient *divisor ;
var w = quotient + "" ;
rightans1 = w.charAt(0) ;
rightans2 = w.charAt(1) ;
document.getElementById("c").innerHTML = divisor ;
document.getElementById("divide").innerHTML = dividend ;

}

The function above creates a problem with a two-digit quotient, between 10 and 99, so there are always exactly two digits to enter.

The divisor will be between 1 and 9.

I set the values in the form all to empty so that when a student reloads the page and gets a new problem the answer from the previous problem is not still there.

I made the dividend equal the quotient times the divisor to make sure that the divisor went into the dividend evenly with no remainder. I made a  local string variable, w, and then created two variables that were the first and second characters of that variable.

The final two lines in this function write the problem to the page.
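If you want to convince yourself the generator behaves, one option is a version with the page updates stripped out, so it can be called in a loop. This is a sketch under my own assumptions: makeProblem and randomInt are hypothetical names, and I am assuming the quotient should always have exactly two digits, so the upper bound here is 99.

```javascript
// Uniform random integer from min to max, inclusive.
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Hypothetical page-free version of the problem generator.
function makeProblem() {
  const quotient = randomInt(10, 99); // assumed: always exactly two digits
  const divisor = randomInt(1, 9);
  const digits = String(quotient);
  return {
    dividend: quotient * divisor, // built from the quotient, so no remainder
    divisor: divisor,
    quotient: quotient,
    rightans1: digits.charAt(0), // first digit, as a string
    rightans2: digits.charAt(1), // second digit, as a string
  };
}

// Generate a batch of problems and count any that break the rules
let badProblems = 0;
for (let i = 0; i < 1000; i++) {
  const p = makeProblem();
  const twoDigits = String(p.quotient).length === 2;
  if (p.dividend % p.divisor !== 0 || !twoDigits) {
    badProblems++;
  }
}
```

If badProblems is anything other than zero after the loop, something is wrong with the ranges.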

 

— Checking the problem is step 2.  Since I don’t like horrendously long blog posts, I’ll put that up tomorrow.

Getting ready to teach a data mining course at the end of the year, I started looking through data sets I have on my desktop. Not sure what I will end up using. My first lesson, no matter what, is going to be on data quality.

The very first thing I did was a series of PROC FREQs. Then, I thought maybe that was a mistake. Perhaps I ought to start off with SAS Enterprise Guide or Enterprise Miner.  Here is how I did the first peek at data quality with Base SAS. I’m going to do the same thing with Enterprise Guide tomorrow and see if it would be easier. After that, I’ll try Enterprise Miner. I know I downloaded the SAS On Demand version a while back and haven’t done much with it lately.

(There is a new SAS for the Web offering but from what I have seen (admittedly, a while back), it requires you to set up a virtual machine with VMware and I did not have the time to do it nor could I find my Windows 8 or Windows 7 install disk. Must clean office.)

The first thing I did was pull out a data set with a couple of thousand student quiz records. Yes, I know in data mining we will get to data sets in the millions but this is the first exercise of the first class.

I did not expect to have 2,000 quiz records because we only have around 1,200 beta testers and about 200 of those are teachers, who I would expect to get all of the in-game problems correct and so never be routed to a quiz. I also know from observation that some of the students never made it to the part of the game where they could do the quizzes. The first challenge page requires students to be able to read simple words and subtract two-digit numbers.

I ran a super-simple PROC FREQ

proc freq data = in.realquiz ;
tables username*quiztype ;
run ;

and found that a few of the users had supposedly taken the same quiz 40 or more times. One student showed having taken the quiz 70 times and another 91 times. While that is theoretically possible, I was suspicious because after those three, the highest number was 7.
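With over a thousand users, the full crosstab is a lot to scroll through. Here is a minimal sketch of one way to surface only the extreme counts. The output data set name attempts and the cutoff of 10 are my own assumptions for illustration, not anything from the original analysis.

```sas
/* Count attempts per user and quiz type without printing the big table. */
/* The OUT= data set holds one row per combination, with COUNT and PERCENT. */
proc freq data = in.realquiz noprint ;
   tables username*quiztype / out = attempts ;
run ;

/* Put the largest counts at the top. */
proc sort data = attempts ;
   by descending count ;
run ;

/* List only the combinations above an arbitrary cutoff (assumed: 10). */
proc print data = attempts ;
   where count > 10 ;
   var username quiztype count ;
run ;
```

A cutoff just above the largest plausible number of attempts keeps the listing short enough to inspect by eye.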

I went into the data set and looked at those particular records and the time stamp showed them coming in tenths of a second apart. Clearly, the student was not answering 5-7 questions in less than a tenth of a second.

We tracked these down to a particular school that was having issues with the firewall. It appeared that when the program couldn’t connect to the server, it tried again and again. When there was a connection, all of those records went through at once.

LESSONS LEARNED

  1. Always look at the outliers. Don’t just toss them out. They can tell you things. In this case, taking a closer look at that PHP code is on my list of program fixes. If it happens at one school it can happen at others.
  2. Time stamps are your friend. I try to include them whenever I can. Yes, they might take up a bit of time and space but there is nothing like them for detecting duplicate records – and fraud.
  3. Just because data has supposedly been cleaned up, never, never assume that it is problem-free.
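The time stamp check can be automated rather than done by eye. This is only a sketch under assumptions: it supposes the data set has a numeric SAS datetime variable, which I am calling timestamp here (a hypothetical name, so substitute whatever yours is called), and it treats anything arriving less than one second after the previous record for the same user and quiz as suspect.

```sas
/* Assumption: timestamp is a numeric SAS datetime (hypothetical name). */
proc sort data = in.realquiz out = quiz_sorted ;
   by username quiztype timestamp ;
run ;

data suspect ;
   set quiz_sorted ;
   by username quiztype ;
   /* DIF returns the current value minus the previous row's value, in seconds. */
   gap = dif(timestamp) ;
   /* The first record in each group has no earlier record to compare with. */
   if first.quiztype then gap = . ;
   /* Keep only records that arrived less than one second after the previous one. */
   if . < gap < 1 ;
run ;

proc print data = suspect ;
   var username quiztype timestamp gap ;
run ;
```

The one-second threshold is a judgment call. The records described above came in tenths of a second apart, so a tighter cutoff would also catch them.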

 

At the moment, we are interested in knowing the most common failure point in the games. Do we need to add in more teaching and problems earlier? The games are designed to teach and test students in mathematics at the fourth and fifth grade levels. The teachers we work with often tell us that their average student is below grade level. So here was my next series of steps.

proc sort data = in.realquiz ;
by username quiztype ;
run ;

data test ;
set in.realquiz ;
by username quiztype ;
if first.quiztype ;
run ;

*** These first steps sort the dataset by username and type of quiz and then only retain the first instance of each. So, if a student actually did take the same quiz seven times, I am only interested in the fact that at the beginning of the game, he or she could not do multiples of 3, not that it took seven tries to get there.

proc freq data = test ;
tables quiztype*pass / out=quizfreq ;
run;

*** This step shows both the quizzes students took and the result.

This is the point at which I began to become concerned, not about data quality but about what the data was beginning to reveal.

Table of quizzes by passing

Over half of the students failed and the quizzes they were failing seemed to be at the lower levels – around third-grade math.
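To rank the quizzes by how often students failed them, the row percentages can be written to a data set and sorted. This is a sketch, not the step actually run for the table above. The data set name quizpct is hypothetical, and because I am not assuming how pass is coded, the listing is grouped by each value of pass rather than filtered to failures.

```sas
/* OUTPCT adds PCT_ROW and PCT_COL to the OUT= data set. */
/* PCT_ROW is the percentage within each quiz type. */
proc freq data = test noprint ;
   tables quiztype*pass / out = quizpct outpct ;
run ;

/* Within each value of pass, list quiz types from highest to lowest row percent. */
proc sort data = quizpct ;
   by pass descending pct_row ;
run ;

proc print data = quizpct ;
   var quiztype pass count pct_row ;
run ;
```

Reading the block for the failing value of pass from the top down shows which quizzes trip up the largest share of the students who reach them.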

LESSON LEARNED

You can get some very valuable information from some very simple statistics. A lot more about that, tomorrow, though, since I have to get back to work ….

I can imagine the type of person served by an expensive, intensive programming bootcamp – someone with money (or, at least, good credit) and several weeks of free time. That has never described me in my life. The last time I had six weeks free was in the summer after tenth grade, before I started working full-time, and at age 14 I had neither money nor credit.

It doesn’t require a pile of money and an uninterrupted summer’s worth of time to keep up or catch up on technology. If you fall behind, you have no one to blame but yourself.

My whole life, I have been interested in learning more about everything. (Well, except about literary and film criticism because, well, it sucks. Just try reading any of it and you’ll see I am right.) That’s included lots of graduate coursework. For years, I took one class a year – in something – microbiology, matrix algebra – just to learn something new. Now, I try to teach a course a year. Last year it was biostatistics. This year, I think I will teach both biostatistics and data mining. I always learn something new when trying to come up with good examples and activities for classes, and I have to keep up on the latest software and operating systems. It isn’t just free, but they pay me – not a lot, which is why I only teach once or twice a year.

Someone recently tweeted,

I hope to never learn the meaning of the word “webinar”.

Webinars aren’t all bad (just most of them!). However, I was on one this morning that Yakov Fain did on building HTML5 applications, hosted by O’Reilly Media, that was definitely worth an hour of my time. It was free, by the way. Now, I’m sure it was just a way for them to sell books – which worked, since I bought one – but it is also a way to get a lecture by experts on a topic. I probably get 80 invitations to webinars for each one I attend. It’s not the most exciting format so I don’t recommend signing up unless it’s a subject you are really interested in learning.

Virtual conference – this will be a first for me, so I will tell you how it works out. I signed up for one on health analytics sponsored by SAS. It looked interesting and it was free. There was a virtual conference I was interested in a while back, on JavaScript, but it was several hundred dollars for what appeared to me to be the equivalent of watching YouTube videos. Maybe I missed out on something amazing. I’ll never know.

YouTube – You are mistaken if you’ve only thought of it as videos of cute kittens and teenage rock star wanna-bes. (Are they actually called rock stars any more? Are they all rap star wanna-bes?) I actually watch YouTube videos on jQuery and JavaScript on the TV while riding the exercise bike. This habit has caused The Perfect Jennifer to wonder aloud more than once,

Exactly how is it that you people don’t die of boredom around here?

Sadly, the public library hasn’t been a very good source of programming books for me. The books they have tend to be far out of date. It makes me sad because I love libraries and have cards for both the Santa Monica and Los Angeles libraries as well as a couple of university ones.

If your university or company offers you an account on the Safari library, I would jump on that because you get unlimited access to all of their books, videos and courses. The individual price of $43 a month seems a little much to me. If I didn’t have a free license, I’d just buy the ebooks I needed. We already have a LOT of technical books, though. If you don’t, maybe it’s worth it.

Just for questions, answers and randomly poking around, stackoverflow.com is awesome.

It reminds me of when I was first learning SAS over 25 years ago. I was on the SAS-L mailing list and would just read every day what the really smart people were talking about.

I have to get back to work but there are lots more resources out there, both that I didn’t have time to list and others that I’m sure I don’t know about. Have a favorite?  Please share in the comments. I’m always looking for new places to learn cool stuff.
