When I was an undergraduate, I argued with my professor about the management theory that said it was not necessary to understand the business to be an effective manager but only to understand management. The example he gave (and that I never accepted) was of a carriage in the park. The driver of the carriage has never been the horse and he cannot pull the carriage like the horse. He just needs to understand the horse’s motivation and either use the carrot or the stick to get the horse to do the job.
Here is where that analogy falls down – the carriage driver knows exactly where he wants to go, how fast he wants to go, where he wants to turn and so on. This is not so true with managers of programmers.
I was thinking about this topic today as I rewrote a part of our game. We had originally used SurveyMonkey, because we had a deadline to deliver and test a prototype. That being done, we sat down to rewrite a portion of the testing to make it better.
The design is as follows:
- The student is given a problem of moderate difficulty.
- If incorrect, the student gets the easiest problem in the domain (Q1)
- If the student gets the easiest problem incorrect, the student exits the test and gets sent to study. There is no point in attempting several more difficult problems if he or she has failed on the medium and easy problems.
- If correct at the beginning problem, the student gets sent to one of slightly greater difficulty.
- If the student gets the more difficult problem correct, the next is of even greater difficulty, and so on
- Any time the student misses two consecutive problems, the test exits.
- The student must get five problems correct, increasing in difficulty up to the last problem. When it becomes mathematically impossible to get five problems correct, the test exits.
- If the student answers five problems correctly, in increasing difficulty, including the last problem, it is a success and the student can go on to the next part of the game.
There are three advantages to putting the scoring logic in one function in a single scoring file:
- When we add new tests, it will be very fast and easy.
- Documentation of all of the test files is easier, because all that is left in those files is just some forms that call the function in this one scoring file.
- I only need to test the code and get it working once.
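As a rough illustration of why one shared scoring function is attractive, the routing rules listed above fit in a few lines. This is a sketch in Python, not the game’s actual code. The names run_test and answer_is_correct, the nine difficulty levels and the starting level are all made up for the example. What happens after a single miss later in the test (here, moving on to the next harder problem) is an assumption, since the design does not spell it out.

```python
def run_test(answer_is_correct, num_levels=9, start_level=5, needed=5):
    """Route a student through problems ordered from easiest (1) to hardest.

    answer_is_correct(level) is a hypothetical callback that presents the
    problem at that difficulty level and returns True for a correct answer.
    Returns "pass" (go on with the game) or "study" (exit the test).
    """
    level = start_level
    correct = 0
    misses_in_a_row = 0
    first_problem = True

    while True:
        if answer_is_correct(level):
            correct += 1
            misses_in_a_row = 0
            if level == num_levels:
                # The last problem was answered correctly.
                return "pass" if correct >= needed else "study"
            level += 1
        else:
            misses_in_a_row += 1
            if misses_in_a_row == 2 or level == num_levels:
                # Two consecutive misses, or a miss on the last problem.
                return "study"
            # A miss on the opening problem drops to the easiest one (Q1).
            # Assumption: any later single miss moves on to the next level.
            level = 1 if first_problem else level + 1
        first_problem = False

        # Exit once the required number of correct answers is out of reach.
        problems_left = num_levels - level + 1
        if correct + problems_left < needed:
            return "study"


# Example calls with stand-in students.
print(run_test(lambda level: True))        # answers everything correctly
print(run_test(lambda level: level != 5))  # misses only the opening problem
```

With these defaults, a student who answers everything correctly is done after five problems, and one who misses the opening problem and then the easiest one is sent to study after two. Each new test would only need to supply its own problems and call the function.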
We only have about 40 tests at this point but within the next few months we will have hundreds. The extra time I spent today rewriting the code, from the first test that worked into something more generalizable, will pay off greatly. However, if I had a non-programming manager, how would he or she know that unless I explained it? To what extent is it reasonable to expect the people you supervise to explain to you what they did and why? Would I even have thought of explaining it? Would the supervisor even have thought to ask? I completed what needed to be done to get the tests to work as diagrammed, within the time frame, but one way of doing it was far superior to another.
I’m not complaining mind you. I did the original design because I knew we had a short deadline, and then I did the re-design and coded it. This isn’t a rhetorical question. I really am wondering about this, because I have worked for pretty good people almost all of my life, including a couple of really terrific bosses. Both had technical backgrounds, but while one had a degree in computer science, the second came from mechanical engineering and couldn’t write an IF statement. What the latter did, though, was find the technical staff with the very best reputations, recruit them to come work for him and spend much of his time sailing or playing golf. I remember his recruitment pitch went something like this:
Everyone says you are really, really good. I figure most technical people in this company are under-valued so I’m on a mission to get the very best people to work in my group. They make me look good and I will do whatever I can to make them happy – higher salary, travel budget, better equipment – anything short of whisky and hookers, you name it and I’ll get it for you. Well, you’re a woman so I guess you wouldn’t want hookers, would you?
He was very successful for a long time and moved up in management quite rapidly. I eventually moved away and lost track of him.
Yet again today, I spent a while trying to figure out an error that had me smacking my forehead at the end.
Here was the problem: I am testing a fairly simple database – adds records, updates, selects, does some error checking as you enter the data, typical stuff. To test it, I have a small table with a few records.
I enter the value for record #1 and it retrieves the data fine and everything looks perfect.
I enter the value for record #2 and it retrieves the data for record #1 ! Obviously, I have, in my testing, hard-coded the value for record #1, right? Wrong.
I have this:
$select_query = "SELECT * FROM clients WHERE id = " . $id ;
I used this on the test data set previously and it was perfectly fine. I had used this for a different application and it worked fine. Finally, I tried a third record and found the error.
Here is what happened:
1. The previous application where I had used this had numeric ID values 111, 112 etc.
2. The current client has no specific requirements for an ID except that it must be unique. Some employees enter customers as 111. Others enter AAA or Bob. IDs are permanent and cannot be re-used. (Hey, this was not MY idea!)
3. If a customer is dropped for any reason and becomes inactive, then next year becomes active again, they are counted as a “new” customer, for the purpose of recording how many new customers were added this year. However, we still want to have some way of matching them with their history. So, they get a new record with -01 added to their ID. If they drop out and come back into the system again, the next time would have a -02 and so on.
It just so happened that the first record had an ID of 1123 and the second was 1124-01
Since $id was not in quotes, the database read 1124-01 as arithmetic, 1124 minus 01, which resolved to 1123. If not for the coincidence of these two being entered consecutively, I probably would have spotted the error much more quickly. When I got an error again when I entered AAA for an id, but not when I entered 1111, the problem was obvious. I changed my code as below, and life was good again.
$select_query = "SELECT * FROM clients WHERE id = '$id'";
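The same trap can be reproduced in a few lines, and doing so also shows a sturdier fix than adding quotes by hand: pass the ID as a bound parameter so the database never reads it as part of the SQL. The sketch below uses Python’s built-in sqlite3 module as a stand-in for the PHP and MySQL code in this post. The table contents and the function names lookup_concatenated and lookup_bound are made up for the illustration, and SQLite’s type conversion rules differ in detail from MySQL’s.

```python
import sqlite3


def lookup_concatenated(conn, client_id):
    """Build the query by gluing the ID into the SQL text (the buggy way)."""
    query = "SELECT id FROM clients WHERE id = " + client_id
    return [row[0] for row in conn.execute(query)]


def lookup_bound(conn, client_id):
    """Send the ID separately as a bound parameter (the safer way)."""
    query = "SELECT id FROM clients WHERE id = ?"
    return [row[0] for row in conn.execute(query, (client_id,))]


# Hypothetical test table holding the kinds of IDs described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id TEXT)")
conn.executemany(
    "INSERT INTO clients (id) VALUES (?)",
    [("1123",), ("1124-01",), ("AAA",)],
)

# Unquoted, 1124-01 is evaluated as 1124 minus 1 before the comparison.
print(lookup_concatenated(conn, "1124-01"))
# Bound as a parameter, the ID is compared as the text it really is.
print(lookup_bound(conn, "1124-01"))
```

In this sketch the concatenated version finds the record for 1123 when asked for 1124-01, and raises an error for AAA because the database takes the bare word for a column name. The bound version also copes with IDs that contain a quote character, which hand-written quoting does not.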
Now I can put this aside for a while and go back to working on the game.
In my defense, the actual program is longer than this …
// Create variables ;
$id = strtoupper(trim($_REQUEST['id'])) ;
$application_date = $_REQUEST['yr_of_apply'] . "-" . $_REQUEST['month_of_apply'] . "-" . $_REQUEST['day_of_apply'] ;
$assessment_date = $_REQUEST['yr_of_assess'] . "-" . $_REQUEST['month_of_assess'] . "-" . $_REQUEST['day_of_assess'] ;
$eligibility_date = $_REQUEST['yr_eligible'] . "-" . $_REQUEST['month_eligible'] . "-" . $_REQUEST['day_eligible'] ;
$ipe_date = $_REQUEST['yr_ipe'] . "-" . $_REQUEST['month_ipe'] . "-" . $_REQUEST['day_ipe'] ;
$notify_rights = $_REQUEST['notify_rights'] ;
$vocational_goal = $_REQUEST['vocational_goal'] ;
// Get user info ;
$result = mysql_query("UPDATE clients SET notify_rights = '$notify_rights', vocational_goal = '$vocational_goal' ,
application_date = '$application_date' , assessment_date = '$assessment_date'
WHERE id = '$id'")
or die(mysql_error()) ;
echo "<p>Your record was updated successfully.</p>" ;
So, it didn’t work. All I wanted to do was connect to the SQL database, find the client’s id and update that record with the application information.
First problem I found was that I had tested this with a much smaller file with just a few columns and in my UPDATE statement it still had the table ‘test’ instead of ‘clients’. I was getting an error that said there was “an error in my SQL code”. Which is true. Since the table ‘test’ did exist, I wasn’t getting an error saying it wasn’t found. Ok, I fixed that.
Second problem, the dates were actually entered in three different fields to make it easier for error checking – your year has to be within the current fiscal year, month between 1 and 12, no entering April 31. However, I needed those to BE dates for analyses we plan later. So, I just created the date variables. Problem solved.
Third problem, I realized I really did not want the password and other information required for connection in this program where anyone could see it. There is no personally identifiable information in here, but it’s just a bad habit to have your passwords and other data hanging out there. Hence the statement to require the connect.php script .
Fourth problem, the ID variable is not a number. It can be something like ABJ-001, so it doesn’t match if the case is not the same or if there are spaces. The strtoupper and trim functions fixed that.
Fifth problem, some of the dates were blank. I looked at them over and over to see if I had mismatched quotes . I even actually deleted and re-typed the statements. Nope, still, some of the dates were there and others were blank. Maybe they weren’t date format in the table definition? Nope, I checked and the missing dates and the dates that updated properly were all defined as the same format.
Well, maybe you spotted it already … As I said in the first problem, I had originally created and tested the script with a table named ‘test’ that had just a few columns. When I switched to my client table, I had forgotten to add all of the columns to the UPDATE statement. The problem wasn’t in creating the eligibility and ipe date fields. The problem was that I left those fields off of the update statement so they were never getting updated. Everything went through fine, I got a message saying my record was updated – because as far as PHP was concerned, it wasn’t an error. Maybe I only wanted to update some of the columns.
The moral of the story is this: Sometimes the problem is the code you DIDN’T write.
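One way to make this kind of omission harder to commit is to build the SET clause from the same collection of fields that was read from the form, so a column cannot be collected and then silently left out of the UPDATE. A minimal sketch, in Python with the built-in sqlite3 module standing in for PHP and MySQL. The function update_client and the CLIENT_COLUMNS list are hypothetical, with column names borrowed from the script above.

```python
import sqlite3

# Columns the form is expected to update (names taken from the script above).
CLIENT_COLUMNS = (
    "notify_rights", "vocational_goal", "application_date",
    "assessment_date", "eligibility_date", "ipe_date",
)


def update_client(conn, client_id, fields):
    """Update every expected column for one client; return rows changed."""
    unknown = set(fields) - set(CLIENT_COLUMNS)
    if unknown:
        raise ValueError(f"unexpected columns: {sorted(unknown)}")
    missing = set(CLIENT_COLUMNS) - set(fields)
    if missing:
        raise ValueError(f"columns left out of the update: {sorted(missing)}")

    # Column names come from the fixed list above; values are bound parameters.
    assignments = ", ".join(f"{column} = ?" for column in fields)
    sql = f"UPDATE clients SET {assignments} WHERE id = ?"
    # Normalize the ID the same way the PHP does (trim, then upper case).
    cursor = conn.execute(sql, [*fields.values(), client_id.strip().upper()])
    return cursor.rowcount
```

Checking the returned row count is a cheap extra guard: zero means the ID matched nothing, even though the database reported no error.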
I was going to use SAS Enterprise Guide 4.3 with SAS On-Demand to do my mixed model analysis, but it did not quite work out.
First of all, if like me you are used to doing PROC GLM where each subject is one record, you have to change your dataset to be one where each score has one record. You can do this in SAS Enterprise Guide using the Query Builder task but I frankly find this more trouble than it’s worth, so I wrote a short program instead. Right click on your dataset and from the drop down menu select “PROPERTIES”. This will give you the name SAS has assigned to your data set. Use that name in your SET statement. The code below creates two data sets named pretest and posttest. It renames the pretest and posttest scores to the same variable, “testscore”.
data pretest ;
set SASUSER.QUERY_FOR_MATCHEDMATH_SAS7BDAT ;
rename pretotal = testscore ;
testtype = "pre-test" ;
run ;
data posttest ;
set SASUSER.QUERY_FOR_MATCHEDMATH_SAS7BDAT ;
rename posttotal = testscore ;
testtype = "posttest" ;
run ;
This concatenates the two data sets to make one.
data matched ;
set pretest posttest ;
run ;
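For readers who do not have SAS at hand, the reshaping done by the two DATA steps and the concatenation can be sketched in plain Python. The function name wide_to_long and the sample record are invented for the illustration; only the variable names pretotal, posttotal, testscore and testtype come from the code above. Unlike the SAS steps, this sketch drops the leftover score column from each row.

```python
def wide_to_long(records):
    """Turn one record per subject into one record per test score."""
    score_columns = (("pre-test", "pretotal"), ("posttest", "posttotal"))
    long_rows = []
    for testtype, score_column in score_columns:
        for record in records:
            # Copy everything except the two score columns, then add the
            # single score and a label saying which test it came from.
            row = {
                key: value
                for key, value in record.items()
                if key not in ("pretotal", "posttotal")
            }
            row["testscore"] = record[score_column]
            row["testtype"] = testtype
            long_rows.append(row)
    return long_rows


# One hypothetical subject with a pretest and a posttest total.
subjects = [{"username": "student1", "group": "control", "pretotal": 3, "posttotal": 5}]
for row in wide_to_long(subjects):
    print(row)
```

Each subject ends up with two rows, one per test, which is the layout the repeated measures analysis expects.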
For some reason, I could not get the SAS Enterprise Guide windows to let me do the nested effect, so I finally gave up (okay, I spent about 10 minutes trying, because I was busy) and coded it like this:
proc mixed data= matched ;
class group testtype username ;
model testscore = group testtype group*testtype / ddfm = kr ;
repeated testtype / type = un subject= username(group) ;
run ;
This identifies three variables, group (control or experimental), testtype (pre or post) and username as categorical.
The dependent variable is testscore and I wanted to test for main effects of testtype, group and an interaction effect of testtype and group. I requested the Kenward-Roger method of calculating the degrees of freedom.
Testtype is a repeated effect. My subjects are identified by username and usernames are nested within groups, that is, each of the users was in either the experimental or control group.
Group = the students who played our math game or not (check it out here, it’s cool). Testtype is, obviously, pre or post, and you can see that there is a significant difference, even with the modest number of subjects in our pilot study.
Honestly, looking at how short the code really is and thinking about how much faster the SAS Web Editor is and the fact it works on the Mac as well as PC, I’m thinking it may not be worth the trouble of using Enterprise Guide. In this case it certainly wasn’t, since all I did was open the PROGRAM window and type.
We’re waiting by the phone, um, computer to find that our Kickstarter project is (we hope!) approved. In the meantime, our Chief Marketing Officer asked me why there was a need for documentation, which is one of the expenses that would be funded from the crowd.
It’s like this – we started writing a game and we had certain deadlines to meet. We’d promised to have a playable game in the school by October. And we did. And the kids really liked it. And their math skills improved.
In the process, though, we did everything that had to get done to meet those deadlines and other things had to be put off until later.
Version control – later. User’s manual – later. FAQ – later. Anything that wasn’t promised was put on a backburner while we kept our promises.
We have scripts that validate the answers students give to each question and then route them to the appropriate place – either back to play the game or to a site to study whatever math concept they missed. I know what those scripts are named and what they do – but it’s not written down anywhere. The Rocket Scientist has written a lot of C# code for the 3-D world and I know where the latest version is – but it’s not written down anywhere. I also do NOT know where the previous version is (although I could probably guess, or I could go upstairs and ask him).
My point is that we can’t go past the current level of complexity without documenting everything we have done up to this point or we are going to be royally screwed because:
- The further in time we get from when we wrote the first few game levels, the more likely we are to FORGET where stuff is and have one hell of a time fixing any bugs that come up because we won’t be sure what exactly File X does or where the script that does Y is saved.
- As we write new levels, we probably want to re-use code that does certain things like making the character do a little dance to pow-wow music, but we won’t quite remember why we wrote that bit in the middle, so, best leave it in – this is how one ends up with spaghetti code that is impossible to really maintain as no one knows exactly what it does.
- We are getting to the point of scaling up. This means bringing on another programmer or two. That future person is going to need to know where stuff is and what it does.
The truth is that we need to stop and write stuff down before we go any further. It’s kind of like when you are in college and you forego cleaning your apartment because you have to work and study for finals. Well, after finals are over, there is a point where you need to buckle down and clean your place before the fungus on the dishes in the sink evolves, forms its own government and evicts you.
What will anyone who donates to our eventual (we hope!) Kickstarter project get out of this? In front-end visible terms, they will get an on-line, downloadable user’s manual in pdf and html that explains how to play the game and all the levels, with an index to find answers to questions. We think you could probably figure everything out from playing the game, but some people like to read, and lots of people get stuck sometimes. We’ll have a Frequently Asked Questions page up on our site, with answers (since just the questions aren’t too helpful), for those people like me who don’t want to read a whole manual but just want an answer.
We’ll have a technical support wiki that you won’t see directly but which will help you indirectly because if you do run into a question not covered on our FAQ or a bug that needs to be fixed, one of us will be able to answer it a whole lot faster and better than, “Your guess is as good as mine” or, the one that always drives me crazy, “Well, it works on my machine.”
The other way documentation will help our supporters is that we really won’t be able to progress much farther nearly as fast without it – again, think about your post-finals trashed apartment. Eventually you realize that you are spending more time looking for your keys and cell phone than it would take to clean the place up.
So…. that is why we need funding for documentation.
I have colleagues who hate Excel with a passion. Why, they demand to know, would ANYONE use Excel for statistics when there are so many options that are so much better? Actually, I don’t find the Excel add-on for statistics that terrible, but that isn’t even the topic of this post.
I use Excel because sometimes:
- The data sent by the client is in Excel
- I can use Excel to answer the question in less time than it takes me to open another application.
Here is an example from today: the client needed to know the average weekly income for a few categories. They were also concerned that the employees doing data entry might have, in some cases, inadvertently entered weekly or monthly income instead of hourly. Relevant fact – the organization did not have any CEOs making thousands of dollars per hour.
First, find the averages:
This takes the average of all cells in that column from the second to the 139th and then multiplies that average hourly wage by 40. Click on the corner of that cell and drag across to get the average for each of the columns.
Second, find the standard deviation
Click on the corner of that cell and drag across to get the standard deviation for each of the columns.
Now, if your standard deviation is something like $2 or $4 per hour, you’re fine. If it is $43 per hour, then someone entered the weekly salary for that column. For the one column where that was the case, I sorted it and, of course, was immediately able to spot the person with the incorrect value.
The whole process took me about a minute to give them the means for the different categories and say, “Oh, by the way, record 47 was incorrect, I fixed it.”
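The same two-step check translates directly into a short script for anyone working outside Excel. This is a sketch in Python using only the standard library. The wage figures, the function name summarize_hourly and the $20 cutoff are made up for illustration and are not the client’s data.

```python
from statistics import mean, stdev


def summarize_hourly(wages, hours_per_week=40, sd_cutoff=20.0):
    """Summarize one column of hourly wages.

    Returns the average weekly income, the standard deviation of the
    hourly wages, and the position of the largest value if the spread
    looks too big to be hourly pay (None otherwise).
    """
    spread = stdev(wages)
    suspect = wages.index(max(wages)) if spread > sd_cutoff else None
    return mean(wages) * hours_per_week, spread, suspect


# Hypothetical columns: one clean, one with a weekly figure typed in by mistake.
print(summarize_hourly([10.0, 12.0, 14.0]))
print(summarize_hourly([12.0, 13.0, 520.0, 11.0]))
```

Sorting the flagged column, as described above, does the same job as looking up the largest value; the script just points at the record to check.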
Obviously, no one sends me a data set just to get the means for a few columns and this was just one of 60 different questions they needed answered. The objection to Excel I have heard is that it is all some people know and so they use it for everything – “When your only tool is a hammer, every problem looks like a nail.”
That may be, but sometimes, you really DO just need to pull out a nail.
And that realization many years ago is how I overcame my prejudice against Excel.
Go to your local users group. If you don’t know if you have a local users group in your area, check the sascommunity.org page that lists bunches of them. There are six in California listed on their site and I heard of two others that started very recently that aren’t listed.
LABSUG is the Los Angeles Basin SAS Users Group and it is pretty typical. It only meets once a year, organized by the FABULOUS Kim Le Bouton. If you live around LA, you should go. It is super-cheap at $35 for early registration, you get to meet about 100 people who are interested in SAS and statistics and the speakers are good.
If you are a hyper-critical type of person, well you can find something to criticize, but don’t sit next to me. Since there is only one session at a time, you may find some sessions too advanced for you and others too basic. I have two suggestions in this case:
- Try to benefit in some way. Even if it is way too advanced, you can probably glean something. Then, when a year or so down the road you run into that concept or procedure again, it won’t be completely unfamiliar. If it is too basic, if you have forgotten more SAS/ statistics than most people will ever know, there may be something in there you have forgotten. If you are that advanced, you probably present or teach a lot yourself. Personally, I’m always on the look out for good tips, from references to visuals to organization, that help get a point across and keep the audience from falling asleep.
- Do something else during that talk. Come late or leave early – the agenda is published in advance. If you need to step out and check your text messages, send an email to the office or catch up on work, no one is going to get upset. It’s not middle school. You don’t have to go to every class. We’re all adults and understand everyone has multiple responsibilities.
On the flip side, although not every talk will meet everyone’s interest or need, almost everyone will find at least ONE topic that is useful. It’s something for everybody and the great advantage of local users groups is their accessibility to everyone. You don’t need thousands of dollars in your travel budget.
What you missed, in tweets
(Not only is this an extremely lazy way to do a blog post, but it also accomplishes the main purpose of this blog which is to remind me of stuff I thought and then forgot. For example, the COMPARE statement and looking up what a segment is.)
In reverse chronological order…
Ods HTML gpath=”something” – will save your graphs in the specified directory. nice
I’m thinking of making a bubble chart that looks like soap bubbles because
Ods journal style good for graphs that are going to be printed in black and white
With Compare statement with sgscatter you can, for example, have side by side plots of your experimental and control groups
I created each of these plots with just 3 statements – & 1 of the statements was “run” – Lora Delwiche
Lora Delwiche just made everyone in the room a believer in SGPLOT
Renato at LABSug worth knowing the difference between if-then & SELECT statement
Take-away from GTL presentation – you can make any kind of graph you can imagine- whether you should or not is a different issue
If I was doing a talk on graphics I would interleave program statement slides with slides of what this does on the graph
Proc gproject projects data into a Cartesian coordinate space – who knew?
I don’t know what segment does in the maps data set. Must find out
I think if I needed a graph as fancy as some of those in the GTL examples I’d have an artist draw it vs use SAS
I understand the R comment – GTL looks more like “real programming” than typical SAS code. not sure that is good
Interesting population pyramid example at LABsug comparing population distribution of Qatar & US convinced me of use of GTL
Just overheard someone comment that GTL looked like R
Use proc sgrender to put data and template together
GTL = graphics template language to make spiffy graphs A reason to get SAS 9.3
Ods graphics editor – stand alone free install from SAS website? Must check on this
Off to LABSUG I’d call my mom on the drive in but if she hears from me at this hour she’ll wonder who died
If you want to learn about LABSUG, you can find out more on the sascommunity.org site Los Angeles Basin SAS Users Group page
If you have a mad desire to do logistic regression with SAS On-Demand with SAS Enterprise Guide, here is a movie that shows how to do it. It is a .avi file so you may want to just download it and run it on your PC.
Here is why the movie is not all that good. Grrr – SAS On-Demand does not run on a Mac. Unfortunately, Quicktime does screen capture video in the Mac version, but on Windows only the professional version does that. I used Debut Video Capture on Windows, which I actually paid for. I made one movie, made a mistake in the middle of it and the guinea pigs were raising a ruckus because they wanted parsley. You could hear them squeaking all through it. On the second try, when I was doing logistic regression, the sound track was about 15 or 20 seconds ahead of the video! So, as you were listening to the video, you were seeing something different on the screen! That was annoying. So … this third video is a bit sparse.
I also ran tasks before I did the video so I did not have to wait forever for them to run. I ran it on this old, old windows machine we use for testing because I did not want to take the time to re-boot my shiny new 12GB RAM Mac into Windows. That was stupid. It would have been quicker to re-boot the Mac than re-do the movie twice. Also, my Mac has a wired Internet connection so it is much faster all the way around.
Lessons learned today in addition to logistic regression.
1. When using SAS On-Demand, use the fastest computer
2. When using SAS On-Demand, use the fastest Internet connection
3. Get the Windows version of Quicktime to replace Debut Video Capture (there are other reasons I don’t like it, chief among them being the default format is .avi and if you change it to some other format, it does NOT remember that)
I have had SO much more of a positive experience with the SAS Web Editor – runs on a Mac, faster, no install problem – that I wonder why I ever used SAS Enterprise Guide to begin with. Actually, if you are running it in your office or home with a good Internet connection on a good computer, it’s not too bad. Not only is the lack of programming attractive to many students but from a learning statistics standpoint the fact that it kind of is in your face with the “Dependent variable”, “Classification variable”, “Quantitative variable” distinctions is kind of nice.
Most of all, though, I remembered how clunky SAS Enterprise Guide for the desktop was when it first came out and now I find it very useful, so I am HOPING this will be the direction for SAS On-Demand EG as well. Personally, the single biggest improvement I hope for is that it starts to run on the Mac. The simplest way for that to happen would be if it just ran as a client like the web editor does. Here’s hoping.
Here we have analysis of open data using free software with – uh, SAS?
Click the links below and watch the videos. Seriously. They are too large to embed in the post. Sorry.
Yes, you might think of SAS as the choice of multinational corporations with unlimited software budgets. You now have two options, if you are a student or faculty member, and those are either
- SAS web editor – which is fast and runs on both Windows and Mac (hurray!) but does require more knowledge of programming, OR
- SAS Enterprise Guide – which is MUCH slower in the typical university environment where it seems to be an accrediting body requirement that your wireless speed has to blow, but EG doesn’t require much programming, is much more pointy and click-y, which makes some people very happy. It also includes a process flow diagram which is like a security blanket for people in management who have some weird kind of Freudian attachment to Microsoft Project imitators.
If you haven’t seen the new SAS web editor, I highly recommend you take a peek at this video on how to do a regression analysis with the SAS web editor.
I did it for my class but it nicely demonstrates how easy it is to get a quick view of your data with the web editor. This is a decent size data set of actual data from the 2007 TIMSS study. I did reduce it down to a few dozen variables. It’s really good because it has actual problems like user-written formats, missing data, non-obvious coding. This is good because my biggest complaint in hiring new graduates is they have only used data in the back of the statistics textbook and they have no idea how to work with data collected from actual human beings.
You can compare this first video to doing the same analysis with SAS On-Demand for SAS Enterprise Guide, another video I made for the same class. You can see that SAS Enterprise Guide takes longer and this was recorded in my office where we have an extremely good Internet connection. I was NOT using the wireless which seems to be pathetically slow at every educational institution where I have ever been. One of the reasons that I record these for the class is that with SAS Enterprise Guide it just takes so-o-o long. As I say on the video, I could sing Christmas carols while waiting for the results, if I could sing.
So, this semester I have used both options, but presuming it gets out of beta and is available next year, I’m thinking about using the SAS web editor for my next class. Even though it does require some programming, I think the increase in speed, use across all operating systems and lack of problems in installation make up for it.
Anyone else who has used one or both of these, please chime in with your opinions.
I may expand this into a series on software products in general. Years ago, I wrote a post on the similarities between the Rocket Scientist and SAS Enterprise Guide. Neither made a great first impression, both revealed their brilliance over time, and I am still with both lo these many years later.
Experiencing both SAS Web Editor and SAS On-demand this semester, I have to say this …. (well, I don’t HAVE to, but that’s never stopped me before) …
SAS On-demand is like a guy I dated who was on the Olympic team. You’d look at him and think,
“Gee, he should be a winner. He’s certainly got good genes and when he shows up, he looks nice.”
He was a nice guy, too, but just thick as a brick. Just like SAS On-Demand for Enterprise Guide, he meant well, but I couldn’t count on him to keep up. I don’t have to worry he’ll be reading my blog and get his feelings hurt because that sentence involved both “reading” and “blog”. Actually, I could have stopped at the word, “reading”. So, it came down to this,
“You know, I think we should see other people. We’re just not compatible. I’m really interested in a faster lifestyle.”
The positive thing about dumping people who are really dumb is that they don’t realize you’ve dumped them until a couple of months later, if ever. Just like my old boyfriend, SAS On-demand for Enterprise Guide might turn out okay and it is certainly fine for some people. I happen to know he married a very nice, not-too-bright woman and they have charming, athletic, attractive, moderately intelligent children who will probably run for Congress some day.
SAS Web Editor is like the promising young men I’d like to introduce to my daughters - intelligent, quick-witted, good-looking and seems likely to have a very good future. Really, it looks to be exactly SAS running on Linux with just a shiny front-end. Kind of like that guy getting his Ph.D. in computer science, but he’s cute.
(Of course, none of said daughters are actually getting introductions to men from me. They have told me in no uncertain terms that they don’t need my help in getting dates. Random picture of sample daughter presented below as support for this hypothesis. Also, a camel family.)
If you would like to compare your most (least) favorite product to a person of the opposite gender, please feel free to chime in.