Lately, I’ve been working on a report that uses eight datasets, all of which have the same problems with usernames.
In addition to removing every username that contained the word “test” or “intern”, we needed to delete the specific usernames of the classroom teachers who had played the game, and we needed to correct names that were misspelled.
Here are a few examples of a very long list of statements:
if username in ("MSDMARIA","1ANSWERKEY","MSDELAPAZ","MSCARRINGTON") then delete ;
if username = "GRETBUFFALO" then username = "GREYBUFFALO" ;
else if username = "HALFHORES" then username = "HALFHORSE" ;
else if username = "TTCARLSON18TTCARLSON18" then username = "TTCARLSON18" ;
These problems occurred in every dataset.
A second problem, found when looking at the contents of each of the eight datasets, was that the username variable was not the same length in all of them, which could cause problems later when they were merged or concatenated. Also, now that all of the usernames have been cleaned up, none should be over 12 characters in length.
Wouldn’t it be nice if there was a way to just get the first n characters of a string?
Enter our character function, substr, which returns a substring of a variable beginning at any position and for as many characters as you like. Problem solved.
newid = substr(username, 1, 12) ;
It seems pretty inefficient to write this set of statements eight times in eight different data sets. Also, next year we will have another eight data sets, and some will have these same students’ usernames and same problems. Wouldn’t it be a lot easier to have these statements in one place and add to the “fixnames.sas” file whenever we find a new problem?
So, now we have the write once, use anywhere solution of %INCLUDE.
What %INCLUDE does
The %INCLUDE statement references lines from an external file and processes them immediately. It has almost exactly the same effect as if you had copied and pasted those lines right into your program. The exception that makes it “almost” is that a %INCLUDE statement must begin at a statement boundary. That is, it has to be either the first statement in your program or come after the semicolon ending a previous statement, as in this example.
data studentsf ;
infile inf delimiter = "," missover ;
attrib teacher length = $12 username length = $16 ;
input username $ age sex $ grade school $ teacher $ ;
%include "/courses/u_mine.edu1/wuss14/fixnames.sas" ;
Also, you need to think about it as if you had copied and pasted those lines into your program. Is it still valid code? Whenever using %INCLUDE, you should make sure the code runs in your program as expected, with no errors, before cutting it out and making it an external file.
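Pulled together, the external file might look something like this. This is just a sketch assembled from the statements shown earlier; the real fixnames.sas would carry the full list of corrections and would grow as new problems turned up:

```sas
/* fixnames.sas : assumes it is %INCLUDEd inside a data step that has username */
username = compress(upcase(username),". ") ;
if index(username,"TEST") > 0 or index(username,"INTERN") > 0 then delete ;
if username in ("MSDMARIA","1ANSWERKEY","MSDELAPAZ","MSCARRINGTON") then delete ;
if username = "GRETBUFFALO" then username = "GREYBUFFALO" ;
else if username = "HALFHORES" then username = "HALFHORSE" ;
else if username = "TTCARLSON18TTCARLSON18" then username = "TTCARLSON18" ;
newid = substr(username, 1, 12) ;
```

Because the file contains only statements, not a complete step, it runs as part of whatever data step you %INCLUDE it in.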
To source or not to source
The default is not to show the statements that were in the included file. Generally, this is desirable. This is code you have already debugged and if you are using it multiple times (otherwise, why bother with the %INCLUDE), having the same 20 lines repeated 8 times in your log just makes it harder to debug.
Professors might want to use real data but hide all of the messy data handling from the students initially for fear they would run screaming for the door. I mean, professors might want to gradually introduce SAS statements and functions for data handling.
In either case, students could use the %INCLUDE statement as shown in the example above. Seeing the included code in your log is quite simple: just add the SOURCE2 option as shown.
%include “/courses/u_mine.edu1/wuss14/fixnames.sas” /source2 ;
It will appear in your log as follows:
NOTE: %INCLUDE (level 1) file "/courses/u_mine.edu1/wuss14/fixnames.sas" is file "/courses/u_mine.edu1/wuss14/fixnames.sas".
419 +username = compress(upcase(username),". ") ;
420 +if (index(username,"TEST") > 0 or index(username,"INTERN") > 0
and so on.
The + signs next to each statement denote it was in the included file.
If you want to know why I think it is so important for new SAS users to learn about the %INCLUDE statement, you should come to the Western Users of SAS Software conference in San Jose next month. Especially if you’re a student, you should come, because they cut you a really good deal.
If you’re not a student and you have a real professional job – well, then, you should be able to afford it. There will be funny hats, beer, coding and cookies. What more could one ask?
A few years ago, I was at the Western Users of SAS Software conference and renowned statistician Stanley Azen played the piano and sang at the closing ceremony.
Briefly, very briefly, I considered beginning my presentation on 10 SAS steps to an annual report, by writing a song, These are a few of my favorite PROCs, and then singing it to the tune of “These are a few of my favorite things.”
This plan was dismissed a nanosecond later when I was reminded by The Perfect Jennifer that my singing bears an uncanny resemblance to the sound Beijing the cat used to make in the middle of the night when fighting with the cat next door.
To make up for my disappointment over my lack of musical rendition, I decided to do a few posts on my favorite PROCS, in no particular order. Today’s contestant is … drum roll please ….
Whenever possible, I try reading in the data using the IMPORT procedure because it is very simple, and my goal in programming is not to impress people with my brilliance – it is to get the job done with maximum efficiency and minimum effort.
As can be seen in the example below, there is no need to declare variable lengths, type or names. Only three statements are required.
PROC IMPORT OUT = work.studentsf
    DATAFILE = "/courses/u_mine.edu1/wuss14/Fish_students.csv"
    DBMS = CSV REPLACE ;
GETNAMES = YES ;
RUN ;
This PROC IMPORT statement gives the location of the data file, specifies that its format is CSV (comma-separated values), and names the output dataset studentsf in the work library. The REPLACE option says that if the dataset specified in the OUT= option already exists, I want it replaced.
The second statement will cause SAS to get the variable names from the first row in the file. Since the variable names are in the first row of the file, the data begins in row 2.
Limitations of PROC IMPORT
As handy as it can be, PROC IMPORT has its limitations. Three we ran into in this project are:
- Excel files cannot be uploaded via FTP to the SAS server, so there is no PROC IMPORT with Excel if you are using the SAS Web Editor.
- If the data that you want to import is a type that SAS does not support, PROC IMPORT attempts to convert the data, but that does not always work.
- For delimited files, the first 20 rows are used to determine the variable attributes. You can increase the number of rows scanned using the GUESSINGROWS statement, but you may have no idea what that higher value should be. For example, the first 300 rows may all have numbers, and then the class in records 301-324 entered their grade as “4th” instead of the number 4.
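If you do decide to scan more rows, the statement goes right in the PROC IMPORT step. This is a sketch using the file from the earlier example; the value 500 is just for illustration, not a recommendation:

```sas
PROC IMPORT OUT = work.studentsf
    DATAFILE = "/courses/u_mine.edu1/wuss14/Fish_students.csv"
    DBMS = CSV REPLACE ;
GETNAMES = YES ;
GUESSINGROWS = 500 ; /* scan 500 rows instead of the default 20 */
RUN ;
```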
Although PROC IMPORT is the first thing I always try, one of my pet peeves about instructors and textbooks is when it is the only thing they teach. It’s smart to try the simplest solution first. It’s dumb not to have a backup plan for the instances when that doesn’t work.
For more on what to do in those cases, you can come to WUSS in San Jose. Just a reminder – regular registration closes August 25. After that date, you’ll have to register on site.
Finishing up my second paper for WUSS next month, I have been thinking about the usefulness of character functions in a world where it sometimes seems like everyone was just put on this earth to irritate the hell out of me.
Take this problem, for example,
In analyzing the data for our games, we have all sorts of beta testers – teachers, staff, interns – who played the game, but their data should be deleted from the datasets for the annual report. We asked them to use the word TEST in their username so it would be easy to pull them from the data. Some of them did, and some apparently feel that I just say these things as exercise for my mouth.
There is also a problem with data entry errors. The subjects in this study were children in grades three through six and they frequently mistyped their usernames.
SAS has a wealth of character functions and this is a first opportunity to get to know and love four of them.
The UPCASE function, not surprisingly, changes the value of a variable to upper case. The COMPRESS function, if you give it only the variable as an argument, will remove blanks from a value. You can, however, include additional characters to remove. Since many of the students entered their names on some days as JohnDoe and on others as John.Doe, we are removing both blanks and periods using the COMPRESS function, after we have converted them to upper case.
username = COMPRESS(UPCASE(username),'. ') ;
Then there is the INDEX function. Here is a general tip. Any time you find yourself thinking,
“Gee it would be nice if SAS did thing X”,
it is a pretty good bet that someone else thought the same idea and there is a function for it. The INDEX function is a perfect example of that. Our testers played the games many, many times and used usernames like “tester1”, “this.test”, “skippy the tester” or “intern7”.
“Wouldn’t it be nice if there were a way to find out whether a given string appeared anywhere in a value?”
Enter the INDEX function, which does exactly that. This function is case-sensitive, but since we already converted the username to upper case above, that is no problem for us.
IF INDEX(username, "TEST") > 0 or INDEX(username, "INTERN") > 0 THEN DELETE ;
will do exactly what we want. The INDEX function returns a number that is the starting position in the string of the substring we are trying to find. So, in “skippy the tester”, the value is 12, in “tester1” it is 1. If the string is not found, the value is 0.
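If you want to see that for yourself, a toy data step like this one will do it. The usernames are the examples above, upcased first since INDEX is case-sensitive:

```sas
data check ;
    input username $20. ;
    position = index(upcase(username), "TEST") ;
    datalines ;
skippy the tester
tester1
greybuffalo
;
run ;
/* position is 12, 1 and 0, respectively */
```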
A problem I found when looking at the contents of each of the 8 datasets used for my research project was that the username variable was not the same length in all of them, which could cause problems later when they were to be merged together or concatenated. All of the usernames should have been a maximum of 12 characters but there were data entry problems when students would type mister_rogers instead of mr_rogers.
When the data are read in using PROC IMPORT, “For delimited files, the first 20 rows are scanned to determine the variable attributes. You can increase the number of rows scanned by using the GUESSINGROWS data source statement.”
Wouldn’t it be nice if there was a way to just get the first n characters of a string?
newid = SUBSTR(username, 1, 12) ;
will create a new variable with the first 12 characters of the username, now that we have gone and fixed the problems with it.
SAS is chock full of functions and options to make your life easier. If you are just beginning to work with SAS and you spend time working with messy data, you probably couldn’t spend your time much better than taking a few hours to read up on SAS character functions. In fact, I think for someone new to SAS, becoming familiar with a large number of all types of functions – character, statistical, date and time – is probably the fastest way to improve one’s productivity. (Ron Cody’s book, SAS Functions by Example, is a great resource). I’ve lost count of the number of times when reviewing a student’s program I’ve seen many lines of completely unnecessary code that could have been replaced by a SAS function – if the student only knew that it existed.
The second time I taught statistics, I supplemented the textbook with assignments using real data, and I have been doing it in the twenty-eight years since. The benefits seem so obvious to me that it’s hard to believe that everyone doesn’t do the same. The only explanation I can imagine is that they are not very good instructors or not very confident. You see, the problem with real data is you cannot predict exactly what the problems will be or what you will learn.
For example, the data I was planning on using for an upcoming class came from eight tables in two different MySQL databases. Four datasets had been read into SAS in the prior year’s analysis, and now four new files, exported as CSV files, were going to be read in.
Easy enough, right? This requires some SET statements and a PROC IMPORT, a MERGE statement and we’re good to go. What could go wrong?
Any time you find yourself asking that question you should do the mad scientist laugh like this – moo wha ha ha .
Here are some things that went wrong -
The PROC IMPORT did not work for some of the datasets. No problem, I replaced that with an INFILE statement and INPUT statement. It’s all good. They learned about FILENAME and file references and how to code an INPUT statement. Of course, being actual data, not all of the variables had the same length or type in every data set, so they learned about an ATTRIB statement to set attributes.
Reading in one dataset just would not work; it had some special characters in it, like an obelus (which is the name for the division symbol – ÷ now you know). Thanks to Bob Hull and Robert Howard’s PharmaSUG paper, I found the answer.
DATA sl_pre ;
SET mydata.pretest (ENCODING='ASCIIANY') ;
RUN ;
Every data set had some of the same problems – usernames with data entry errors that were then counted as another user, data from testers mixed in with the subjects. The logical solution was a %INCLUDE of the code to fix this.
In some datasets the grade variable was numeric and in others it was ‘numeric-ish’. I’m copyrighting that term, by the way. We’ve all seen numeric-ish data. Grade is supposed to be a number, and in 95% of the cases it is, but in the other 5% they entered something like 3rd or 5th. The solution is here:
nugrade=compress(upcase(grade),'ABCDEFGHIJKLMNOPQRSTUVWXYZ ') + 0 ;
and then here
data allstudents ;
set test1 (drop = grade rename = (nugrade = grade)) test2 ;
run ;
This gives me an opportunity to discuss two functions – COMPRESS and UPCASE, along with data set options in the SET statement.
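If the +0 trick for converting character to numeric feels too sneaky, an alternative I would suggest (my suggestion, not the code from the project) is the INPUT function with the ?? modifier, which does the conversion explicitly and suppresses the error messages for values that will not convert:

```sas
/* suggested alternative, not the original project code */
nugrade = input(compress(upcase(grade),'ABCDEFGHIJKLMNOPQRSTUVWXYZ '), ?? 8.) ;
```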
I do start every class with back-of-the-book data because it is an easy introduction and since many students are anxious about statistics, it’s good to start with something simple where everyone can succeed. By the second week, though, we are into real life.
Not everyone teaches with real data because, I think, there are too many adjunct faculty members who get assigned a course the week before it starts and don’t have time to prepare. (I simply won’t teach a course on short notice.) There are too many faculty members who are teaching courses they don’t know well and reading the chapter a week ahead of the students.
Teaching with real, messy data isn’t easy, quick or predictable – which makes it perfect for showing students how statistics and programming really work.
I’m giving a paper on this at WUSS 14 in San Jose in September. If you haven’t registered for the conference, it’s not too late. I’ll post the code examples here this week so if you don’t go you can be depressed about what you are missing.
If I had a clone, all of my code would be beautiful.
Last week, I was a speaker at the Tribal Disability Conference in Turtle Mountain, where I spoke on starting a business. Then, I went for a site visit at Spirit Lake Vocational Rehabilitation followed by another talk on self-employment at the Tribal Disability Awareness conference. In a nutshell, I talked about how having a disability often teaches people to persevere, to not accept when told they can’t do something, to find different ways of meeting goals and solicit other people to help them – and pointed out that all of these traits can be an advantage in starting a business.
Along the way, I was working on a couple of grants, edited a couple of papers – and just this second remembered I have to finish editing a paper I co-authored for something – crap!
There was also the usual matter of approving payroll and invoices, answering email and reviewing work people did while I was gone – new teaching videos to go into the game, artwork, animation, sound files, documentation, bug fixes. Haven’t nearly finished with that.
I’m super-stoked to be on a panel on Monday at the National Council of La Raza conference, “Economic Empowerment in a Wireless World”. I’m planning on going Sunday as well, to a lot of the sessions on education.
I got to hear Heidi Heitkamp speak at Turtle Mountain last week and with any luck I’ll be able to attend Elizabeth Warren’s talk on Sunday. Must be my week for Democratic senators.
Somewhere in all of that, I finished my slides and video for the Serious Play conference, also this week, which I am also excited to attend.
Then, there was the meeting people for lunch, stopping in on my daughter who had surgery and checking on her and all of the other general life things. There is a board meeting I have to get up and go to in about nine hours, which I am definitely NOT excited about, but I’m the chair, so I kind of have to show up.
In the midst of all of this, there are 77 fixes and improvements in the Fish Lake game, from “add a better message when the pretest is completed” to “Revise quiz code for re-routing students. This is replicated in many quizzes. Make external file ref & just call it in all of those”. Some of those are crucial – like I never wrote the quiz for one spot and so that is a dead end.
There are another 47 improvements for Spirit Lake. All of those are to make the game better. For example, we recorded voices from kids at Spirit Lake, and when a student gets a problem wrong, I want to add a video clip that shows one of the game characters and says something like,
“No, 7 x 8 = 56. Now your village burned down.”
The kids did a great job and I think those clips will really help players remember their multiplication tables.
But … back to my missing quiz. It has to be on mixed fractions, with questions answered using both improper fractions and mixed fractions. There also should be a question with two answers for the numbers that the mixed fraction falls between. Also, at least two word problems, with answers that are whole numbers.
As each question is answered, the program needs to determine if it is the right answer, and, if so, add to the total score, then show a slightly more difficult problem. At the end of the quiz, the student is shown a success message and the student data written to our database and routed back to the game. If it is the wrong answer, the student is shown a failure message and routed to the appropriate page to study.
In the process of writing this, by the way, I noticed that one of the links on the study page is wrong, so I need to fix that. Apparently, I meant to write something involving turtle eggs. Also, there is a video Diana did on mixed fractions which I have yet to review because I got back at midnight on Wednesday and dived into everything else.
So … back to my no-longer-missing quiz. It is done. I even put in a few comments. As I was writing it, I was thinking, “some of this code is duplicated” and “I bet I could re-write some of these functions so they were more general and then not have so many functions” and a whole lot of other ideas for making it just a better program.
I KNOW that the world is full of code that was written to be fixed “another day” and is still sitting there six years later. In my defense, I will say that I do often loop back around and fix that code – although it might be a year or two later.
Here is my compromise – when I am in town, I try, come hell or high water, to make at least one substantive improvement on one of the games every day – a new video clip, a new quiz. At worst, I may not get any more done than fixing a broken link or touching up a graphic or sound file, but I really try to do more than that. Those 124 fixes are down from 266. It is not perfect but it is progress and it is 1 a.m. In addition to writing this post, I did review one more instructional video and sent feedback, finished the first draft of editing the paper and added improving the code in this quiz as a lower priority game fix.
My code is not perfect but it works, and I will come back and try to do better tomorrow because, at the end of the day, there’s another day. That’s how time works.
More notes from the text mining class. …
This is the article I mentioned in the last post, on Singular Value Decomposition.
Contrary to expectations, I did find time to read it on the ride back from Las Vegas, and it is surprisingly accessible even to people who don’t have a graduate degree in statistics, so I am going to include it in the optional reading for my course.
Many of these concepts, like start and stop lists, apply to any text mining software, but it just happens that the class I’m teaching this fall uses SAS.
In Enterprise Miner, you can only have one project open at a time, but you can have multiple diagrams and libraries, and of course, zillions of nodes, in a single project.
In Enterprise Miner, you can use text or text location as a variable type. Documents under 32K in size can be contained in the project as a text variable. If greater than 32K, give a text location.
- start lists – often used for technical terms
- stop lists, e.g. articles like “the”, pronouns. These appear with such frequency in documents they don’t contribute to our goal which is to distinguish between documents. May also include words that are high frequency in your particular data. For example, mathematics, in our data, because it is in almost every document we are analyzing
Multi-word term tables – standard deviation is a multi-word term
Importing a dictionary — go to properties. Click the … button next to the dictionary (start or stop) you want to import. When the window comes up, click IMPORT.
Select the SAS library you want. Then select the data set you want. If you don’t find the library that you want, try this:
- Close your project.
- Open it again
- Click on the 3 dots next to PROJECT START CODE in the property window
- Write a LIBNAME statement that gives the directory where your dictionaries are located.
- Open your project again
[Note: Re-read that last part on start code. This applies to any time you aren't finding the library you are looking for, not just for dictionaries. You can also use start code for any SAS code you want to run at the start of a project. I can see people like myself, who are more familiar with SAS code than Enterprise Miner, using that a lot.]
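For what it’s worth, the LIBNAME statement you put in the project start code is just an ordinary one. Assuming your dictionary datasets lived in a directory like the course directory used earlier (the path here is made up for illustration), it might be:

```sas
libname dict "/courses/u_mine.edu1/wuss14/dictionaries" ;
```

Once the project reopens, the dict library should show up when you browse for your start and stop list datasets.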
Filter viewer – can specify minimum number of documents for term inclusion
Speaking of Las Vegas, blogging has been a little slow lately since we took off to watch The Perfect Jennifer get married. It was a very small wedding, officiated by Hawaiian Elvis. Darling Daughter Number Three doubled as bartender and bridesmaid then stayed in Las Vegas because she has a world title fight in a few days.
Given the time crunch, I was particularly glad I’d attended this course that gave me the opportunity to draft at least one week’s worth of lectures for the fall. When I finish these notes, my plan is to edit them and turn them into the last lecture in the data mining course. If it’s helpful to you, feel free to use whatever you like. I’ll try to remember to post a more final version in the fall. If you have teaching resources for data mining yourself, please let me know.
My crazy schedule is the reason I start everything FAR ahead of time.
Maybe this is obvious, but I have often found that what is obvious to some people is not so obvious to others, so here are a few random tips.
1. Enterprise Miner can take a REALLY long time to load during which you wonder if anything is happening at all.
Open up the task manager and look for something that says javaw.exe *32. You can see it near the bottom in the image above. The number next to it should be going up, from 30,000 to 50,000 and so on. If it is, you should probably be patient for a few more minutes and your session will start.
2. Let’s say you want to change the properties of something. For example, I don’t want the data set to be partitioned into Training, Validation and Test in a 40, 30, 30 split. I want it to be 50, 50, 0. So, I right-click on the DATA PARTITION node, get a drop-down menu and
there is all of this stuff about Edit Variables all the way down to Disconnect Nodes, but where the hell are the properties to change? They’re on the left, in that window with the title Property! Funny, but it’s so easy to focus on the diagram window and completely forget about everything else. Click on a node and its properties will show up in the window.
3. While the three screens you see when you run the StatExplore node are pretty interesting, it would be nice to have a more detailed look at your data. Just go to the VIEW menu and you can get more statistics, like the cell chi-square values, descriptive statistics of numeric variables broken down by the levels of your target variable.
After all of the effort to get Enterprise Miner installed, I thought it had better do something good. It is interesting to use. Unlike programming, where you can get a program to run but still get errors or unexpected results, so far (key phrase!) with Enterprise Miner I have found the problem to be knowing exactly what to select, for example, when creating data sources. Once you know that, however, it seems pretty hard to make an error.
Enterprise Miner does do some pretty cool stuff, which makes it worth the pain of getting it installed. Even way cooler, unlike back in the day when no one could get their hands on it without paying approximately $4,893,0893.16 , their first born child, their left kidney and an albino goat, if you are an instructor or a student, you can get it for free through SAS On-Demand for Academics.
(And, yes, for the record, I *am* aware that said goat is not an albino. I was fresh out of pictures of albino goats. Deal with it.)
Speaking of Enterprise Miner, I thought I would ramble on about the good parts for a few posts, since I’m getting ready to teach data mining in the fall and I hate to do anything at the last minute.
One of the good parts is StatExplore. At first glance, it looks good, but at second glance, it looks better.
All you need to do is create a diagram by going to the FILE menu, then selecting NEW and then DIAGRAM.
You can start by dragging a data source on to the diagram. In this example, I used the heart data set from the Framingham Heart Study, which happens to ship with Enterprise Miner in the SASHELP library.
I drag the data set from data sources to the diagram window.
Next, I click on the EXPLORE tab just above the diagram window. This gives you a bunch of icons. Enterprise Miner is just rife with icons. Never fear, though, if you have no idea what this bunch of colored boxes is supposed to mean versus that bunch, just hover over the icon with your mouse and it will tell you.
Here is my diagram. Simple, no? It gives you a bunch of cool stuff. First, you have the plot of chi-square values for all nominal variables.
You can see that sex has the highest chi-square (as in gender, not as in frequency of), followed by cholesterol status, smoking status and weight status. I find this rather surprising. I knew women lived longer than men, but with all of the discussion of obesity, I thought weight would be higher up there.
The next chart gives me the worth of each variable in predicting my target, which in this example is death.
The variable on the far left is age at start. Not surprisingly, the older people are when you start following them, the more likely they are to die in a given period of time. The next variable is Age at CHD Diagnosis, followed by two blood pressure measures, their cholesterol, then cholesterol status – weight status is down at the end.
This analysis produces A LOT of statistics. This, I found interesting because despite some people arguing Enterprise Miner allows analysis by someone without extensive programming or statistics background, certainly in the case of statistics, the more knowledge you have, the better you could make use of the results.
For example, in the top right (all three of the screen shots above are one screen, I broke them up at an attempt at legibility), the output pane gives descriptive statistics broken down by each level of the target variable. I can see how many people who died had missing data for age at CHD diagnosis, skewness and kurtosis values for variables by status, living or dead, the mode for weight status for people who were living or dead, and a whole lot more. Interestingly, 68% of the whole sample was overweight.
Scrolling through the statistics output I can get a good idea of the data quality – is it skewed, is it missing, is it missing at random.
Without some background in statistics, that’s probably no more than a bunch of numbers. Personally, I found it very helpful. That’s another assignment for the students, to write a brief summary of their data, including any concerns. There weren’t any real problems with these data except for the obvious fact that variables like cholesterol and cholesterol status, smoking and smoking status are going to be highly correlated. It would be a good idea to include one of those as input in any predictive analyses and reject the other to prevent multicollinearity problems.
(NOTE to self: Make sure to explain variable roles, changing variable roles in EM and multi-collinearity.)
You might think this is adequate for running just one node, but, in fact, there is much more here than meets the eye. More on that tomorrow because, speaking of overweight, I have been at a computer for 13 hours today and I want to hop on the bike and get some exercise in before I knock out the last task I need to do today. Although @sammikes just pointed out on Twitter that round is a shape, it is not the one I want to be in.
Most likely, you, too, have experienced homicidal urges when confronted with a problem you have spent five hours trying to solve on your computer, only to call tech support and have them report,
Well, it works fine on my computer.
You’d think if that solved the problem that they would offer to box up their computer and send it over to your house but, alas, they never do.
This is the reason that any software I use for class I test on several computers under different conditions. After having initially failed to get SAS On-Demand for Enterprise Miner to work with boot camp on the Mac, I tried it on a Lenovo machine running Windows 8. I had to install the JRE and ignore a few security warnings, but after that it worked.
[For how I did eventually get it working with boot camp, click here, and thank Jason Kellogg from SAS. ]
Next, I needed to upload some data. The SAS instructions say to use your favorite FTP client and coincidentally, I do have a favorite FTP client (Filezilla), so I downloaded it to the testing machine.
Only the professor can upload data to the class directory, and most professors probably have an FTP program on their personal computer (or maybe not, do you?) Even if you normally do, you may, like me, have borrowed a machine to use for testing or have a new computer. Whatever, this just reinforces my argument that you should never, never plan to use any kind of software in a class unless you have ample time to prepare.
I know that there are schools that ask adjuncts to teach on a week or two of notice. That seems to me a recipe for disaster for both the professor and the students. Unless you are doing something that hasn’t changed in 50 years and requires no technology, like reading Chaucer, I recommend you follow the advice of Nancy Reagan and “Just say no.”
Here are my first few hints:
- Test the software on multiple machines and multiple operating systems.
- Make sure one of those machines is on the older, under-powered end of the spectrum, as students often don’t have a lot of extra cash and may not have the shiniest, newest machine like you have on your desk.
- Test it on the latest operating system. It may turn out that the version your school has does not work with Windows 11. (I did not have that problem with the Enterprise Miner this time, but I’ve had it with other software in the past so it is a good idea.)
- Find out what other software you might need, for example, some kind of FTP program in this case, and install it on your computer, if necessary.
- Give yourself plenty of time to do all of the above.
You might think these types of things would be handled by the information technology department at your university, and you may be really lucky and that will be so. In many schools, though, the IT department basically helps reset passwords, assigns school email addresses, helps to get discounts on software and upload files to Blackboard, and not much else.
For years, I have been trying to figure out where the $50,000 a year or so tuition goes. It isn’t to adjunct professors and it isn’t to the IT staff. It also isn’t to buying the latest technology because, more and more often, students are expected to bring their own device.
You may think that none of the above should be your job, and you may be right, but I am just saying that if you anticipate the frustrations your students will experience and can solve their problems during the lecture by directing them to a link on your class website/blog, your life and theirs will both be a lot easier.
Thanks to Jason Kellogg from SAS Technical Support, SAS On-Demand Enterprise Miner is now running on my Mac using Windows 8.1 with Boot Camp. Here were his instructions.
The steps are:
1. Download and save jre-6u24-windows-i586.exe from http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase6-419409.html#jre-6u24-oth-JPR
2. Open the Windows Run window and run "C:\users\[userid]\Downloads\jre-6u24-windows-i586.exe" STATIC=1 where [userid] is your user account name.
3. Click OK to start the installation.
4. After finishing the installation, right-click an empty area of the desktop and select “Create Shortcut” (NOTE: on Windows 8.1 this was NEW and then SHORTCUT).
5. In the location, browse to Desktop and click Next.
6. In the next screen, provide the name of the shortcut, for example “Enterprise MinerJWS”.
7. Once the shortcut is created, right-click it and select Properties. In the Target, enter the following: "C:\Program Files (x86)\Java\jre1.6.0_24\bin\javaws.exe" https://academic93.oda.sas.com/SASEnterpriseMinerJWS/main.jnlp
8. Click Apply. You now have a clickable shortcut to Enterprise Miner. Please use it when starting Enterprise Miner.
This worked and I now have SAS Enterprise Miner working on my laptop, which is going to be extremely convenient.
PLEASE NOTE THAT ALL OF THE QUOTATION MARKS NEED TO BE THERE OR IT WILL GIVE YOU AN ERROR.
ALSO, under #7, that is all one command. I had to break it into two lines on this blog to be legible.
Although it was still a huge pain in the ass to get started, it is leaps and bounds ahead of the first time I tried Enterprise Miner years ago.
Back then, it required back flips and sacrificing a chicken (okay, finding a machine running Windows XP, installing a bunch of files – just take my word it was a pain in the ass). As for the on-demand version, it was so slow as to be useless.
In contrast, once I got up and running, it was not bad at all, and that was running off the wireless in the office. Now, our internet speed is good here, so your mileage may vary, but at least under good conditions it runs fine using a small dataset.
So, I just uploaded a dataset with 10,000 records and 6,000 variables. We’ll see what it does with that.
==== Random shameless plug =====
When I’m not playing around with statistical software, I’m running a company that makes adventure games to teach math. If you want your children to do something educational this summer, you can buy a copy here for $9.99.