I was reminded today how useful a SAS log can be, even when it doesn’t give you any errors.

I’m analyzing data from a study on educational technology in rural schools. The first step is to concatenate 10 different data sets. I want to keep the source of the data, that is, which data set each record came from, so that if there are issues with the data (outliers, etc.) I can more easily pinpoint where they occurred.

I used the IN= option for each data set when I read them in and then some IF statements to assign a source.

DATA mydata.all_users18 ;
    SET sl_pre_users18          (in=slp)
        aztech_pre_clean        (in=azp)
        AZ_maya_students18      (in=azms)
        fl_pretest_new18        (in=flpn)
        fl_pretest_old18        (in=flpo)
        ft_users18              (in=ft)
        mydata.fl_students18    (in=fls)
        mc_bilingual_students18 (in=mcb)
        mc_users18              (in=mc)
        mydata.sl_students18    (in=sls)
    ;

After I run the data step, I see that 425 observations do not have a value for “source”. How would you spot the error?

Of course, there is more than one way, but I thought the simplest thing was to search in the SAS log and see which of the data sets had exactly 425 observations. Yep. There it is. Took me 2 seconds to find.

147 PROC IMPORT DATAFILE=REFFILE
148 DBMS=XLSX
149 OUT=WORK.MC_bilinguaL_students18 replace;
150 GETNAMES=YES;

NOTE: The import data set has 425 observations and 2 variables.

So, I looked at the code again and, sure enough, I had misspelled “source”:

IF slp THEN source = "Spirit Pre" ;
    else if azp then source = "Az Pre" ;
    else if fls then source = "Fish Studn" ;
    else if mcb then sourc = "M.Camp.Bil" ;

You might think I could have just read through the code, and you are right, but there were a lot of lines of code. In this case, I could immediately tell that the problem had something to do with that specific data set, which significantly reduced the amount of code I needed to look at. I just started with the last place that data set was referenced and worked backward. Fortunately for me, it was in the very last place I referenced it.

The fact is, you will probably spend as much time debugging code as you do writing it. The log and logic are your friends. Also, no matter how long you have been programming, you still make typos.

Want to play one of the games from this study? Have a computer? Go ahead, maturity is over-rated.

I know people who are so obsessive about testing and validating their code that they spend more time testing it than actually writing it and analyzing the output. I said I know people like that; I didn’t say I was one of them. However, it is good practice to validate your SAS code and, despite false rumors spread by my enemies, I do it sometimes.

Here is a simple example. I believed that using the COMPRESS function with “l” for lower case or “I” for case-insensitive gave the same results. I wanted to test that. So, I ran two data steps.

DATA USE_L;
    set mydata.aztech_pre ;
    q3 = compress(Q3,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','l');
    q5 = compress(Q5,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','l');

… and a whole bunch more statements like that.

Then, I ran the exact same data step but with an “I” instead of an “l”.
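In case it helps to see it, a minimal sketch of what that second data step would look like, with the same input data set and the same statements, only the modifier changed:

DATA USE_I;
    set mydata.aztech_pre ;
    q3 = compress(Q3,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','i');
    q5 = compress(Q5,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','i');
    /* ... and the same whole bunch of statements, each with 'i' in place of 'l' */
run;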

Finally, I ran a PROC COMPARE step

PROC COMPARE base=USE_L compare=USE_I ;
    Title "Using l for lowercase vs I for insensitive" ;
run;

PROC COMPARE RESULTS SHOW NO DIFFERENCES

But, hey, maybe PROC COMPARE just doesn’t work. Is the data step really removing everything, whether it is upper or lower case? To test this, I ran the procedure again, comparing the data set with the compressed results to the original data set.

PROC COMPARE base=mydata.aztech_pre compare=USE_I ;
    Title "Comparing with and without compress function" ;
run;

The result was a whole lot of output, which I am not going to reproduce here, but some of the most relevant was:

Values Comparison Summary

Number of Variables Compared with All Observations Equal: 24.
Number of Variables Compared with Some Observations Unequal: 16.
Number of Variables with Missing Value Differences: 10.
Total Number of Values which Compare Unequal: 694.

Looking further in the results, I can see comparison of the results for each variable by observation number

          ||  q5
          ||  Base Value      Compare Value
      Obs ||  q5              q5
 ________ ||  ____________    ____________
          ||
        5 ||  150m            150
        6 ||  42 miles        42
       10 ||  one thousand
       12 ||  200 MILES       200

So, I can see that the data step is doing what I want, which is removing all of the text from the responses and leaving only the numbers. This is important because the next step is comparing the responses to the answer key, and I don’t want any mismatches to occur just because a student wrote ‘200 miles’ instead of 200.

In case you are interested, this is the pretest for two games that are used to teach fractions and statistics. You can find Aztech: The Story Begins here and play it for free on your iPad, Mac, Windows or Chromebook computer.

Mayan god
Play Aztech !

Forgotten Trail can be played in a browser on any Mac, Windows or Chromebook computer.

Some people believe you can say anything with statistics. I don’t believe that is true, unless you flat out lie, but if you are a big fat liar, I am sure you would lie just as much without statistics.

However, a point was made today when Marshall and I were discussing, via email, our presentation for the National Indian Education Association. One point we made was that, while most vocational rehabilitation projects serve relatively few youth, the number at Spirit Lake has risen dramatically. He said,

You said the percentage of youth increased from 2% to 20% and then you said the percentage of youth served tripled. Which was it?

It depends on how you slice your data

There is more decision-making in even basic statistics than most people realize. We are looking at a pretty basic question here, “Did the percentage of the caseload that was youth age 25 and under, increase?”

The first question is, “Increase from when to when?” That is, what year is the cutoff? In this case, the answer is easy. We had observed that the percentage of youth served was decreasing, and changes were undertaken in 2015 to reverse that trend. So, the decision was to compare 2015 and later with 2014 and earlier.

Percent of Youth on Caseload, by Year

How much of an increase is found depends on the year used for comparison and whether we use one year or an average.

The discrepancy between the 10x improvement and the 3x improvement comes about because the percentage of youth served by the project varied from year to year, although the overall trend was downward. If we wanted to make ourselves look really good, we could compare the lowest year, 2013 at 2%, with the highest year, 2015 at 20%, and say the increase was 10x. I don’t think that is the best representation, although it is true. One reason is that the changes we discussed in the paper weren’t implemented until 2015, so there is no justification for using 2013 as the basis.

The second question is how you compute the baseline. If we use all of the data from 2008-2014 to get a baseline, youth comprised 7% of the new cases added. If we compare that 7% to the 20.2% in 2015, the percentage of youth served almost tripled.

However, we only started using the current database system in the 2012 fiscal year, and the only people from prior years in the data were those who had been enrolled prior to 2012 and were still receiving services. The further back in time we went, the fewer people there were in the system, and they were definitely a non-representative sample. Typically, people don’t continue receiving vocational rehabilitation services for three or four years.

You can see the number by year below. The 2018 figure is only through June of this year, which is when I took a snapshot of the database.

If we use 2013-2014  as a baseline, the percentage of youth among those served was 4%. If we use 2012-2014, it’s 6%. 

To me, it makes more sense to compare it to an aggregate over a few years. I averaged 2012 through 2014 because it gave a larger sample size and representative data, and also because I didn’t feel comfortable using the absolute lowest year as a baseline. Maybe it was just a bad year. As any good psychometrician knows, the more data points you have, the more reliable your measure.

The third question is how to select the years for comparison. I combined 2015-2018, also because it gave a larger sample size and, again, I did not want to just pick the best year as a comparison. Over that period, 18% of those served by the project were youth.
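If you are curious what slicing the data like this looks like in SAS, here is a rough sketch. The data set and the age variable are stand-ins for the real ones (age_at_application is just a made-up name here):

data slices ;
    set mydata.vr2018 ;
    app_year = year(app_date) ;             ** Year the case was added ;
    youth = (age_at_application <= 25) ;    ** 1 if youth (25 and under), else 0 ;
    if 2012 <= app_year <= 2014 then period = "Baseline   2012-2014" ;
    else if app_year >= 2015 then period = "Comparison 2015-2018" ;
run;

proc freq data=slices ;
    where period ne " " ;
    tables period*youth / nocol nopercent ; ** Row percents give the percent youth in each period ;
run;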

So … what have we learned? Depending on how you select the baseline and comparison years, we have either improved 10 times (from 2% to 20%), improved 2.6 times (from 7% to 18%), tripled (from 6% to 18%) or quintupled (from 4% to 20%), and there are some other permutations possible as well.

Notice something here, though. No matter how we slice it, after 2014, the percentage of youth increased, and substantially so. This increase was maintained year after year. 

I thought this was an interesting example of being able to come up with varying answers in terms of the specific statistic, but, no matter what, you come to the same conclusion: the changes in outreach and recruitment had a substantial impact.

According to that source of all knowledge on the interwebz, Wikipedia,  “Gaslighting is a form of psychological manipulation that seeks to sow seeds of doubt in a targeted individual or in members of a targeted group, making them question their own memory, perception, and sanity.”

Have you ever had a brilliant, super-competent friend who doubted her own competence?

I’ve often seen this happen to women in technical jobs, and it’s happened to me. Here’s what happens. You work with a man or a couple of men. (In theory it could be women, or men could do this to other men but I personally have only seen men do this, and usually to women). No one knows everything (duh!). You are an expert in Python, Ruby, JavaScript, PHP and Objective C. You’ve developed some pretty cool iOS apps, been part of some successful teams.  Bob suggests that the team really needs an Android app, but, 

You don’t know Java, do you, Joan?

You suddenly realize,

“Oh, my God, I don’t! How did I miss learning Java?”

Part of gaslighting is “using what’s important to you as ammunition”. If you’re a woman who has been in software development, mathematics, statistics or science for a long time,  it’s no doubt important to you and you’ve overcome a lot to stick it out and get where you are.  It’s important to you to be competent and knowledgeable and having someone question that is disconcerting. 

Gaslighters wear you down. It’s the death of a thousand cuts. Bob will insist that the prototype of the next app has to be built for Android because it’s the largest market share, “Of course, that leaves you out of the prototype build  because we need experienced Java developers.” “I’ll bet you’ve never used Android Studio.”

Gaslighters are also experts at reframing things, so much so that you don’t think of the fact that the last five prototypes were done for iOS and there was no problem porting to Android.

Gaslighters can also be good at getting other people to go along with them. If Bob repeatedly tells Sam that Joan isn’t a good fit for this project because we really need an Android developer for this prototype and Joan has no expertise in that area, “she mostly just does testing on iPhones”, Sam may believe him. After all, she’s admitted she has no expertise with Java. So, Sam is not going to consult with Joan on any technical issues, which wears Joan down even further.

I agree with Stephanie Sarkis that some gaslighters may do this unintentionally and subconsciously. They are, in my experience, trying to make up for feelings of inferiority by making themselves look better by comparison and getting other people to depend on them. 

It doesn’t matter whether it is deliberate or not. The effects are insidious.

I used to think, “Suck it up, buttercup. If some clowns don’t think you have the technical chops, prove them wrong.”

I still think that to some extent, but I can see that it can be really difficult if you are constantly pricked with an endless series of whispered questions about your competence, both behind your back and to your face. It’s exhausting to always be trying to prove your abilities in the areas where you are knowledgeable while at the same time explaining that no, you have never used (insert any language here, because no one has used all of them). I’ve seen women who really enjoyed coding move into marketing or project management, giving the reason,

It just wasn’t fun any more.

You may already be the solution.

Three of us, mutual friends, were at lunch one day and one woman mentioned she had been offered a terrific job but it was for “an expert in the field” and she didn’t consider herself an expert. Her other friend and I immediately interrupted her,

What? Are you nuts? You are the very definition of an expert!

Then, we proceeded to list all of her amazing accomplishments because she really is incredible. 

Stick with people who see you in the best possible light

I have a great advantage in protection against gaslighters in that I married the right person. Recently, we were drinking beer with a friend who referred to me as “testing the games” and The Invisible Developer corrected, 

She doesn’t just test the games. She makes them, too.

It’s not often someone questions my technical ability around my husband, but when it does happen, he speaks up for me 100% of the time. That’s a big deal because he is not at all one to draw attention to himself. He’s not called the Invisible Developer for nothing. 

It’s not just him. I’m super fortunate to have a group of friends and colleagues who are really supportive and collaborative people who always have my back. 

If you are the problem, you have a problem

Maybe you are scoffing dismissively at this point that if Joan was any good none of this would have bothered her. You are making a snide comment over your cubicle that real developers don’t need anyone to tell them they’re good. People often feel uncomfortable around gaslighters, even if they can’t give a reason. They are right, too, because once Joan leaves, you’ll need someone else to disparage to make yourself feel superior, maybe Sam.

If Sam has a choice of his next project, it’s probably not going to be one with you. If he does get stuck working with you, after all of your comments about Joan, Sam is going to expect that you are God’s gift to Android development, that, in fact, your middle name is Java and the language was named after you. Imagine his response when you turn out to be nothing special.

What I’ve seen happen to the gaslighters eventually is that no one wants to work with them. Even though Bob thinks he’s a 10X software developer, for some reason no one wants him on their team. He tells himself it’s because they’re jealous. 

In the meantime, though, Joan is now managing the marketing department.

Don’t end up like Joan

Years ago, on the More than Ordinary podcast, I had my lovely daughter, Julia, as a guest to talk about what it’s like in boarding school. After saying, “First of all, it’s nothing like Hogwarts … ” she went on to add,

No matter where you are, you can find people to study with, to help support you to reach your goals. And, if not, well, just be that person for yourself.

So, if you find yourself being questioned so much that you start questioning yourself, try finding friends and colleagues who support you and remind you of your awesomeness. If, for some reason, that’s not an option, I suggest this: remind yourself. Sit down and write down all of your accomplishments. Then, the next time Bob questions you, tell him,

“Shut up you little prick. I’ve done amazing things, and I’m going to be here long after you’re gone.”

Okay, well, maybe you shouldn’t say that out loud at work, but if you do, I won’t blame you. 

In my day job, I make educational games, like this one where a Mayan god thing drags you into the past. Yes, it teaches math.

Mayan guy
Aztech Games: Not made with Java – yet – but that’s just a coincidence

About ROUND(100) years ago, I took a couple of COBOL courses. I never coded anything in COBOL after the classes, but the concept of a table look up has stuck with me.

Just like it sounds, this is used when you want to look something up in a table. For example, if I have ICD-10 codes, which are used to classify diagnoses, I could look up J09.X2 in a table and see that it is “influenza”.

There are several ways to do this with SAS, including using PROC FORMAT, a one-to-many merge with either the data step or PROC SQL, a bunch of IF statements and a macro.
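Just to make the merge option concrete, here is a rough sketch of what the PROC SQL version might look like. The data set and column names (claims, icd_lookup, icd10_code, diagnosis) are made up for illustration:

proc sql ;
    create table labeled as
    select a.*
         , b.diagnosis                 /* label pulled from the lookup table */
    from claims as a                   /* records to label, e.g. one row per visit */
         left join icd_lookup as b     /* lookup table with columns code and diagnosis */
         on a.icd10_code = b.code ;
quit ;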

One way I learned very early on to do this in SAS was to use a PROC FORMAT.

Say I have a bunch of possible codes for outcomes for my program and they are coded from 0 = “referral” all the way up to 32 = “post-employment services”.

PROC FORMAT;
    VALUE stats
    0  = "referral"
    2  = "applicant"
    6  = "evaluation"
    8  = "closed"
    10 = "eligible"
    12 = "IPE complete"
    14 = "counseling"
    18 = "training"
    20 = "service complete"
    24 = "service interrupted"
    22 = "employed"
    26 = "successful closure"
    28 = "closed after service"
    30 = "closed before service"
    32 = "post-employment"
    ;
run;

Another option, if I wanted to combine categories, like those who had a successful or unsuccessful outcome, is to do it like this:

PROC FORMAT;
    VALUE stats
    0, 2, 6, 10, 12, 14, 18, 20 = "open case"
    24, 28, 30 = "unsuccessful"
    22, 26, 32 = "successful closure"
    ;
run;

In either case, if I just wanted to have the type of service printed, I could use a FORMAT statement, like this.

FORMAT status_type stats. ;
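For example, a sketch of how that might look inside a procedure, assuming the variable is named status_type as in the data step below:

proc freq data=mydata.vr_codes ;
    tables status_type ;
    format status_type stats. ;   ** Prints the format labels instead of the raw codes ;
run;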

If I wanted to create a new variable, I could use the PUT function to put the original value into a new variable using the format.

DATA testy ;
    SET mydata.vr_codes ;
    recoded_status = PUT(status_type, stats.) ;
run;

Is there any advantage of PROC FORMAT over doing 20 or 50 IF statements?

I can think of several. First of all, you can use the same PROC FORMAT repeatedly. If you need to do the same transformation with several different data sets, you can just run the format procedure once, include one PUT or FORMAT statement in each data step and you are done. Second, since you can store formats permanently, if you haven’t gotten around to learning macros yet, this can be one method of using the same code over and over in different programs. Third, it’s just less messy to type, which seems trivial until you have 300 values to recode.

Some day I might write a post on user-defined formats, especially how to store and re-use them. Today is not that day. In the meantime, while you are waiting, I highly recommend reading this paper on building and using user-defined formats by Art Carpenter.


I live in opposite world. I blog on SAS and statistics for fun and make games for a living. Check out Making Camp Premium. Learn about Ojibwe culture, brush up your math skills, learn more English and have fun. All for under two bucks.

Making Camp scene with buffalo, deer and rabbit

 

“I don’t document my code because if you really understood the language, it should be obvious.”

– Bob

Bob is an arrogant little prick.

Here are just a few reasons to document your code.

  1. Other people may need to modify it because, despite your assumed brilliance, there may be other people in the universe capable of maintaining your code when you get a promotion, take another job or get your sorry ass fired.
  2. Six months from now, you may need to look at this code again. After 11 other projects have intervened, you’ll be trying to figure out what the hell the prev_grant_yrs variable was supposed to measure. Every time I add comments to a project, I say to myself, “Future me will thank me for this.”
  3. If you use Title and Label statements, there will be additional clarity not just for you as a programmer but also for the users.

Here is an example.

This comes from a longitudinal analysis of a vocational rehabilitation project. There are only a few comment statements in this snippet; however, there is a LABEL statement which explains that the prev_grant_yrs variable is the number of years a consumer was served under the previous grant. There was a significant change in operations in the current grant cycle, but when this five-year cycle started there were a number of people already on the caseload who had been determined eligible under the previous administration.

data by_year ;
    set mydata.vr2018 ;

    length prev_grant_yrs $ 3 ;   ** SET THE LENGTH SO "2-4" AND "0-1" ARE NOT TRUNCATED ;

    ** USES YEAR FUNCTION TO GET THE YEAR OF INDIVIDUAL PLAN OF EMPLOYMENT ;
    ** AND OF APPLICATION TO THE PROGRAM ;
    ipe_year = year(ipe_date) ;
    app_year = year(app_date) ;

    if ipe_year ne . and ipe_year < 2008 then prev_grant_yrs = "5+" ;
    else if ipe_year ne . and ipe_year < 2012 then prev_grant_yrs = "2-4" ;
    else if ipe_year > 2011 then prev_grant_yrs = "0-1" ;

    LABEL prev_grant_yrs = "Years Under Previous Grant"
          ipe_year = "Year IPE written"
          app_year = "Year applied"
    ;
run;

In the first procedure, I simply wanted to get a closer look at the people who had been getting services for more than five years under the previous grants. It’s important to add that second title line so readers know this isn’t ALL long-term consumers but those who had been long-term users coming into the current grant cycle.

TITLE "Check long-term consumers";

TITLE2 "Getting services 5+ under the previous grant" ;
proc print data= by_year ;
where prev_grant_yrs = "+5";
id username ;
var ipe_date app_year prev_grant_yrs ;
format ipe_date mmddyy8. ;

In the second procedure, I wanted to see how consumers served in the current year were doing. Why do I have grantyear as a variable in the VAR statement when it is clear from the WHERE statement that only people from 2018 will be included? Because the person who gets the output won’t see that WHERE statement. Just having “current year” in the title is not enough, because next January someone looking at this might think it was for 2019. I could have included 2018 in the title, but including it as a variable on the output both acts as a validity check for me and lets the user, my customer, know that the data are correct.

TITLE "Current year consumers" ;
proc print data=by_year ;
where grantyear = 2018 ;
id username ;
var grantyear status status_type ipe_year;
format ipe_year mmddyy8. ;

A few of the individuals served by this project did not have an Individual Plan of Employment. I wanted to see if the people missing an IPE just hadn’t had time to complete it yet or if they never came back and did it. An IPE is the first step in getting project services, so if they had a missing date for a year or more, then they had just dropped out. Again, the second title line tells the users what I’m trying to do here.

TITLE "IPE YEAR by Application Year";
TITLE2 "Note: Missing IPE consumers had ample time to complete IPE";
proc freq data=by_year ;
tables ipe_year*app_year/missing ;

So, you get the idea. Elegant code is nice, correct code is essential.

You know what is essential?

A young person once asked me,

“No offense, but why are your services so much in demand? It’s not as if there aren’t a lot of people who can do what you do.”

Okay, first tip, young people, when you find yourself saying, “no offense” you should probably just stop talking and then you definitely won’t offend anyone. Actually, I was pretty amused. It’s true that lots of people can do frequency distributions, if-then-else statements and cross-tabulations (although, in my defense, that’s not ALL I did on this project).

One essential skill is making your analyses easily understood by your co-workers and customers.

As a wise person once said,

“Mystery novels should be figured out. Code should be read.”


Wonder what else I’m writing these days?

You can get A Different Kind of Textbook, our family group text, for $2.99 as an ebook.  We definitely are a different kind of family.

Contacts : bios of family members

 

I’m back with another SAS Tip of the Day. Like a lot of people, I work with dates very often.

  • How many days is it from when a client applies to when he or she is determined eligible?
  • How many days until the average client is employed?

You get the idea. Inconveniently, in this particular case, I received the data in an Excel file and when I uploaded the file all of the dates were in character format. Here is a simple fix.

  1. Create an array of the character dates. Takes one statement. Note that you need that $ to indicate character variables.
  2. Create an array of your numeric dates. Takes one statement. Leave OFF the $ to indicate these are NOT character variables.
  3. Use a DO loop to fix any data problems, read into the new numeric variable and subtract 21,916. This is the number of days difference between the reference date for SAS and for Excel. You can read more about that here.
  4. Not required, but good practice: since I was not going to use the character date values, I dropped those from the data set, as well as the j subscript variable.

data fixdata ;
    set fix1 ;
    array chardates {9} $ birthdate date_app date_assess date_eligible date_ipe date_closed
                          date_employ date_int_completed date_last_contact ;
    array numdates  {9} date_birth app_date assess_date eligible_date ipe_date closed_date
                        employ_date int_completed_date last_contact_date ;

    ** Change all date variables to exclude invalid dates ;
    ** And from Excel to SAS date format ;
    do j = 1 to 9 ;
        if chardates{j} = "0000-00-00" then numdates{j} = . ;
        else numdates{j} = input(chardates{j}, 12.) - 21916 ; /* explicit conversion avoids the automatic character-to-numeric NOTE */
    end ;

    drop j birthdate date_app date_assess date_eligible date_ipe date_closed date_employ
         date_int_completed date_last_contact ;
run;
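And once the dates are real SAS dates, the questions at the top of this post are just subtraction. A quick sketch using the same variable names (the new variable names and the PROC MEANS are mine):

data days ;
    set fixdata ;
    ** Days from application to eligibility determination ;
    days_to_eligible = eligible_date - app_date ;
    ** Days from application to employment ;
    days_to_employ = employ_date - app_date ;
run;

proc means data=days n mean median ;
    var days_to_eligible days_to_employ ;
run;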

 



 

Contrary to appearances, this is not an abandoned blog. I’ve been super-busy with 7 Generation Games, where we released two new games and a customized app for a client this month! At the same time, I’m in Santiago, Chile, piloting games for our Spanish language brand, Strong Mind Studios. You can read some of my blog in Spanish here.

Santa Lucia

Santa Lucia, next door

I decided to get back to blogging with a SAS tip of the day. Today’s tip is about the _character_ array.

If you didn’t know, now you know: All character variables are in the _character_ array

Often, I want to do something to every character variable in a data set, for example, set all of the values to upper case so that “diabetes”, “Diabetes” and “DIABETES” are not counted as three different disabilities. Because I hate to expend unnecessary effort, I don’t want to list the names of every character variable and I don’t want to count how many there are, because I’ll probably count wrong and then end up with errors.

Here is an example using the _character_ array.

data fixdata ;
    set fix1 ;
    array fixchars {*} _character_ ;
    ** Change all character values to upper case ;
    do i = 1 to dim(fixchars) ;
        fixchars{i} = upcase(fixchars{i}) ;
    end ;
run;

Just use an ARRAY statement, give your array a name and, in the {} where the number of elements usually goes, put a *, which SAS interprets as “however many character variables there happen to be.”

You might think you’d have to use the $ to specify that the _character_ array consists of character variables, but that’s kind of overkill and you actually don’t. It will work either way.

In my DO statement, I use the DIM function, which returns the dimension of the array. That is, DO I = 1 to DIM(array_name) will execute the statements for every variable in the array, from the first one to however many there happen to be.

As you might guess, the UPCASE function returns the value in all upper case.


Have a kid? Like kids? Feel like a kid yourself? Check out our new game, Making Camp Premium, because maturity is over-rated.

 

 

Are you still re-ordering your factor pattern by sorting columns in Excel? Well, do I have a tip or two for you.

The cool thing about some large conferences is that even the things you hadn’t planned on attending can be worthwhile. For example, during one time slot, I didn’t have anything particular scheduled and Diane Suhr was doing a talk on factor analysis and cluster analysis. Now, I published my first paper on factor analysis in 1990, so I was mostly interested in the cluster analysis part.

After all of those years, how did I not know that PROC FACTOR had an option to flag factor loadings over a certain value? Somehow, I missed that, can you believe it?

I also missed the REORDER option that reorders the variables in the output from largest to smallest on their loading on the first factor, then in order of their loading on the second factor and so on.

It’s super-simple. Use FLAG=value to flag loadings and REORDER to reorder them, like so.

proc factor data=principal n=3 rotate=varimax scree FLAG=.35 REORDER ;
    var x1 x2 x3 x4 ;
run;

You can see the results below. With a small number of variables like this example, it doesn’t make much difference, but in an analysis with 40 or 50 variables this can make it much easier to identify patterns in your data.

output with reordered factors


I am backwards woman. I write about statistics and statistical software in my spare time and my day job is making video games. In my defense, the latest series of those games teaches statistics – in Spanish and English.

Aztech Games

The first time I went to SAS Global Forum, over 30 years ago, it was actually called SUGI (SAS Users Group International) and it was in Reno, NV. I was a just-divorced single mom and there was no such thing as a Working Mothers Room (which I noticed signs for here in Denver). I paid for a bonded sitter, on contract with the hotel, to come to my room and watch my toddler. That toddler is now CEO of 7 Generation Games. So, yeah, it’s been a minute.

Having been to these events for over 30 years, not to mention a dozen or so at WUSS (Western Users of SAS Software), I thought I might need to put some effort into learning new stuff. My plan was to pick one product that I wanted to learn more about and make my own little personal strand on that. I picked SAS Enterprise Miner. I hadn’t used it a lot, and not at all lately, and I thought it might be a good choice for introducing students to data mining – a topic I just touch on in my multivariate statistics course.

The first session was 10 Tips Learned in 20 Years of Enterprise Miner, by Melodie Rush. Did you realize that the nodes in EM are in alphabetical order? No, me neither. I also didn’t know that the Reporter node could automatically generate documentation. If you are registered for the conference, you can download the presentation from the app, even if you didn’t attend.

There wasn’t another Enterprise Miner presentation in the morning, so I wandered over to The Quad and talked to Tom Grant in SAS Global Academic, who told me that now you can download a tiny little 26 KB file and run SAS Enterprise Miner on the SAS server, whether you use Windows or Mac. I remembered something like this from years ago, but it was deathly slow and it sucked. Your other option was to install SAS EM on your desktop, which did not exactly require sacrificing a goat, taking your computer apart and putting it back together with each piece bathed in goat’s blood – but it wasn’t all that much easier.

Well, times have changed! Since I already had a SAS On-demand for Academics account, I clicked to get Enterprise Miner. A file called main.jnlp downloaded and, when I double-clicked it, my Mac said it was from an unidentified developer – so I went into the preferences and selected to open it anyway.

Then, I got a message that my version of Java was out of date, so I clicked through to download and install the update.

Did that, clicked on the main.jnlp again and will you look at that …

 

SAS Enterprise Miner

The whole process took less than five minutes …

leaving me time to head over to the convention center and see what Scott Leslie and Tricia Aanderude have to say about health outcomes and visual analytics.

How fast does the EM in the cloud run, you ask? Well, I am in a hotel where the wi-fi is about the same as my apartment in Santiago – that is, somewhere mid-way between Santa Monica and North Dakota speed. It runs fine. I can see using it as a demo in a class or making instructional videos with it. Screens don’t pop up as fast as if it was a regular web page but so far the minimal delay is not enough to be annoying to students using it for analyses or teachers using it to demonstrate.

So far, today’s Enterprise Miner strand plan was a success. After that, things definitely did not go according to plan, but they were still great. I’ll have more on that in my next post.

Speaking of not according to plan … I’m giving a presentation at SAS Global Forum at 11 am, Tuesday, April 10, in room 207. I’ll talk about the connections between SAS and building games with JavaScript, how I got from Santa Monica, California to Santiago, Chile and where SAS can take you in the most unexpected ways.

 
