I did a random sample of presentations at SAS Global Forum today, if random is defined as “of interest to me”, which, let’s be honest, is pretty damn random most of the time.

Tip #1 Stalk Interesting People

I don’t mean in a creepy, showing-up-at-their-hotel-room way. If you see someone presenting, either in person or referenced on Twitter, blogs, etc., check out what else that person has freely available on the web, in published proceedings and so on.

Let me give you an example that applies even if you are not into logistic regression. (You’re not? Feel shame.)

The first session I attended was a Super Demo in the exhibit hall, which, for some reason I don’t understand, is always called the Quad.

In a nutshell, logistic regression is usually 

  • binary, which is where I started out modeling mortality studies: you’re either dead or alive
  • multinomial, that is, multiple unordered categories, like college major or religion
  • ordinal, like someone is a subscriber, contributor, editor or administrator on a group blog, which are progressively higher levels of involvement

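The usual model for an ordinal outcome is the proportional odds model, which assumes that each explanatory variable has the same effect on the cumulative odds at every cut-point between categories. What if the data fit the proportional odds model for some of the explanatory variables and not others? You can fit a partial proportional odds model.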
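Bob’s paper covers this properly, but as a rough sketch of what a partial proportional odds model can look like in PROC LOGISTIC, here is one using the UNEQUALSLOPES= option of the MODEL statement (available in recent SAS/STAT releases). The data set BLOG and the variables LEVEL, POSTS and TENURE are hypothetical, simulated with made-up coefficients just so the code runs.

```sas
/* Simulated, purely hypothetical data: LEVEL is an ordinal involvement
   score (1=subscriber, 2=contributor, 3=editor, 4=administrator). */
data blog;
   call streaminit(2017);
   do id = 1 to 500;
      posts  = 10 * rand('uniform');
      tenure =  5 * rand('uniform');
      /* latent score plus noise, cut into four ordered categories */
      z = 0.5*posts + 0.8*tenure + rand('normal');
      level = 1 + (z > 3) + (z > 5) + (z > 7);
      output;
   end;
   drop z;
run;

ods output ParameterEstimates=pe;
proc logistic data=blog;
   /* An ordinal response gets a cumulative logit model by default.
      UNEQUALSLOPES=posts gives POSTS a separate slope for each
      cumulative logit; TENURE keeps one common slope, so the
      proportional odds assumption is kept for TENURE only. */
   model level = posts tenure / unequalslopes=posts;
run;
```

If the proportional odds assumption held for both variables, you would drop the UNEQUALSLOPES= option and get a single slope for each; comparing the two fits is one way to check the assumption, alongside graphing the data.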

[Slide: line plots of the data]
Graphing your data is a great way to see if the proportional odds model makes sense. You can see that it does for the variable on the right, but for the left, not so much.

Unfortunately, the Super Demos do not have papers published in the app or proceedings. However, the presenter, Bob Derr from SAS, mentioned that he had presented a paper on this topic in 2013 (way to play hard to get, Bob – not!).

[Slide: reference to the paper, linked below]

I skipped the next presentation to read it (and to write this post). If you are at all interested in multinomial and ordinal logistic regression, you should read it, too. You can find it in the SAS Global Forum 2013 proceedings: http://support.sas.com/resources/papers/proceedings13/446-2013.pdf

It’s an outstanding paper and I am going to require it for my course next year. I think the students will find it far more accessible than some of the readings we have been using. They don’t complain loudly, but I know, I know. 

Tip #2 Read the Documentation (No, seriously, keep reading)

To people who answer questions with LMGTFY (let me Google that for you) or RTFM (read the fucking manual): just so you know, that quit being funny around 1990. However, SAS documentation really is a treasure trove. It’s not just SAS; the same could be said about the jQuery documentation or the WordPress Codex, but we’re not talking about those today, are we? Please try to stay on topic.

The SAS documentation runs many, many thousands of pages, and it’s far better and more detailed than you would think. Let me give you an example that a very helpful person named Michael pointed out in the Quad (what the hell is it with that name?) today. As I’ve mentioned several times lately, my students often struggle with repeated measures ANOVA. He suggested checking out the page on longitudinal data analysis.

http://support.sas.com/rnd/app/stat/procedures/LongitudinalAnalysis.html

It gives four different procedures (none of which are GLM, I noted, but that’s a discussion for another day). 

Related to that, when you are learning a procedure, I recommend just running some of the code examples. For example, here is one for repeated measures with PROC MIXED: http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_mixed_examples02.htm&docsetVersion=15.1&locale=en (Yes, I really do have that on my mind lately.)

Think about this, though. Once you graduate from whatever your last degree turns out to be, you don’t have anyone checking your work and telling you if it is right or not. You just write your code and hope for the best. That sucks, huh?

When you are learning a new procedure, you can write code using the data shown in the SAS documentation and see if your results match. Like an answer key for life! I always wanted one of those.

Since the last few posts detailed errors in repeated measures with PROC GLM, I thought I should acknowledge that people seem to struggle just as much with PROC MIXED.

Forgetting that the data need multiple rows per subject

This is one of the first points of confusion for students. When you do a PROC MIXED, you need multiple records for each person. So, going back to my previous example with three time points and two treatments, for PROC MIXED my dataset needs to look like this:

Subject   Exam     Treatment   Score
1         Pre      Talk        43
1         Post     Talk        46
1         Follow   Talk        45
2         Pre      Drug        39

With GLM, you’d have 3 variables, named Pre, Post and Follow, for example (you *did* read the last post, right?). In PROC MIXED, your dataset has to be structured so that you have one variable, in this case, named “exam”, and it takes on one of three possible values.
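If your data are already in the wide layout that PROC GLM wants, a DATA step with an array is one way to reshape them for PROC MIXED. This is a minimal sketch: the data set names WIDE and LONG are hypothetical, and subject 2’s post-test and follow-up scores are made up. I coded EXAM as 1, 2, 3 rather than as text because, by default, PROC MIXED sorts CLASS levels by their formatted values, so text values would come out as Follow, Post, Pre, and coefficients like exam 1 -1 0 would compare the wrong exams.

```sas
/* Hypothetical wide data: one row per subject, one variable per exam.
   Subject 1 and subject 2's pretest come from the table above;
   subject 2's POST and FOLLOW scores are made up for illustration. */
data wide;
   input subject treatment $ pre post follow;
   datalines;
1 Talk 43 46 45
2 Drug 39 44 42
;
run;

/* Reshape to long: one row per subject per exam.
   EXAM is coded 1=pre, 2=post, 3=follow-up so the CLASS level
   order matches CONTRAST coefficients such as "exam 1 -1 0". */
data long;
   set wide;
   array scores{3} pre post follow;
   do exam = 1 to 3;
      score = scores{exam};
      output;   /* write one record per exam inside the loop */
   end;
   keep subject treatment exam score;
run;
```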

Let’s start with the simplest case. Just like before, I’d like to know if there was a change from pretest to post-test and if it was maintained at follow-up. In other words, my question is, “Was there a difference between the pretest and post-test, and a difference between the pretest and the follow-up six months later?” I am not particularly interested in the post-test/follow-up difference as such. Here is one way to code it:

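/* ORDER=DATA keeps the EXAM levels as Pre, Post, Follow (the order they appear for the first subject); the default would sort them alphabetically */
proc mixed data = example order = data ;
class subject exam;
model score = exam ;
random subject ;
contrast "pre vs post" exam 1 -1 0 ;
contrast "pre vs follow" exam 1 0 -1 ;
run ;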


The PROC GLM code from my last post tests the same hypotheses as the code above, but it only works if your data are structured so that you have three variables instead of three records for each person.

I have a lot to say about CONTRAST statements, which I love, random effects, about which I am neutral, and nested effects, which are not relevant to this example but could be. However, I am trying not to work past 9 pm and it’s already an hour later, so … until next time.

Also, if you’re at SAS Global Forum, be sure to meet up and say “Hey!”

This is my day job …

Check it out. I make games that teach math, including, of course, statistics. Play AzTech: Meet the Maya – the only statistics game with Honduran fruit bats.

As I said in my last post, repeated measures ANOVA seems to be one of the procedures that confuses students the most. Let’s go through two ways to do an analysis correctly and the most common mistakes.

In our first example, people take an exam three times (a pretest, a posttest and a follow-up), and we want to see if the pretest differs from the other two time points.

proc glm data = example ;
model pre post follow = /nouni ;
repeated exams 3 contrast (1) /summary printm ;
run ;
quit ;

Among other things, this will give you a table of Type III Sums of Squares that tells you whether there is a significant difference across time. It will also give you contrasts between the first time point (the pretest) and each of the other two.

You can see all of the output produced here.

This is using PROC GLM, so it requires that you have multiple VARIABLES, one for each of the times you measured people. This is in contrast to PROC MIXED, which requires multiple records for each subject. We’ll get into that another day.

One thing that throws people all of the time is they ask, “Where did you get the exams variable?” In fact, I could have used any valid SAS name. It could have been “Nancy” instead of “exams” and that would have worked just as well. It’s a label we use for the factor measured multiple times. So, as counterintuitive as it sounds, there is NO variable named “exams” in your data set.

Let’s try a different example. This time, I have a treatment variable. I have administered two different treatments to my subjects. I want to see if treatment has any effect on improvement.

proc glm data =example ;
class treatment ;
model pre post follow = treatment/ nouni ;
repeated exams 3 /summary ;
run ;
quit ;

The fixed effect does *not* go in your REPEATED statement

In this case, I do need a CLASS statement to specify my fixed effect of treatment. A really common mistake that students make is to code the REPEATED statement like this:

repeated treatment 3 /summary ; *WRONG! ;

It seems logical, right? Why would you use a completely made up name instead of one of your variables? If you think about it for a minute, though, treatment wasn’t repeated. Each subject only received one type of treatment.

When you are asking whether one group improved more than the other(s), what you are asking is, “Is there an interaction effect?” You can see from the table of Type III Sums of Squares produced below that there was no interaction effect.


A significant effect for the repeated measure does not mean your treatment worked!

A common mistake is to look at the significance of the repeated measure and, because a significant change was found between times 1 and 3, conclude that the treatment had an effect. In fact, though, the non-significant interaction effect shows that treatment had no impact: the change in exam scores did not differ across the levels of treatment.

There are a lot of other common mistakes but I need to go back to work so those will have to wait for another blog.

When I teach students how to use SAS to do a repeated measures Analysis of Variance, it reminds me of those crazy foreign language majors I knew in college who were learning Portuguese and Italian at the same time.

I teach how to do a repeated measures ANOVA using both PROC GLM and PROC MIXED, because it seems very likely that my students will run into both general linear models and mixed models in their careers. The problem is that they confuse the two, and the result is buggy code.

Let’s start with mistakes in PROC GLM today. Next time we can discuss mistakes in PROC MIXED.

Let’s say I have the simplest possible analysis – I’ve given the same students a pre- and a post-test and want to see if there has been a significant increase from time one to time two.

This will work just fine:

proc glm data =mydata.fl_pre_post ;
model pretest posttest = /nouni ;
repeated time 2 ;
run ;
quit ;

Coding the repeated statement like this will also work:

repeated time 2 (1 2) ;

So will

repeated time ;

It almost seems as if anything or nothing after the factor name will work. That’s not true. First of all,

repeated time 2 (a b) ; *WRONG! ;

… and will give you an error – Syntax error, expecting one of the following: a numeric constant, a datetime constant.

“Levels gives the number of levels associated with the factor being defined. When there is only one within-subject factor, the number of levels is equal to the number of dependent variables. In this case, levels is optional. When more than one within-subject factor is defined, however, levels is required.”

SAS/STAT 9.2 User’s Guide, PROC GLM REPEATED statement

So, this explains why you can be happily using your REPEATED statement without bothering to specify the number of levels for a factor, and then one day it doesn’t work. WHY? Because now you have two within-subject factors, so you need to specify the number of levels, but you don’t know that. This is why, when teaching, I always include the number of levels. As long as the number is correct, it will never cause your program to fail, even if it is sometimes unnecessary.
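Here is a sketch of the case where the number of levels stops being optional: two within-subject factors. The data set TWO_FACTOR and its variables are hypothetical and simulated. As I read the REPEATED statement documentation, the dependent variables must be listed so that the levels of the last factor named (TIME here) change fastest, and the product of the levels (2 x 2) must equal the number of dependent variables.

```sas
/* Simulated, hypothetical data: each subject is tested before and after
   (TIME) under two conditions (COND). Variable order matters: TIME,
   the last factor in REPEATED, changes fastest, so the order is
   c1_pre, c1_post, c2_pre, c2_post. */
data two_factor;
   call streaminit(123);
   do subject = 1 to 20;
      base    = 50 + rand('normal', 0, 5);
      c1_pre  = base     + rand('normal', 0, 2);
      c1_post = base + 3 + rand('normal', 0, 2);
      c2_pre  = base     + rand('normal', 0, 2);
      c2_post = base + 6 + rand('normal', 0, 2);
      output;
   end;
   drop base;
run;

proc glm data=two_factor;
   model c1_pre c1_post c2_pre c2_post = / nouni;
   /* Two within-subject factors, so the number of levels is required;
      2 x 2 must equal the four dependent variables above. */
   repeated cond 2, time 2 / summary;
run;
quit;
```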

One more cool thing about the REPEATED statement in PROC GLM: you can do a planned contrast super easily. Let’s say I have done 3 tests, a pretest, a post-test and a follow-up. I want to compare the post-test and follow-up to the pretest.

proc glm data =mydata.fl_tests ;
model pretest posttest follow = /nouni ;
repeated test_time 3 contrast (1) /summary ;
run ;
quit ;

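This compares each of the other time points to the first one. A common mistake students make is to use a CONTRAST statement here with test_time. This will NOT work, because test_time is just a name for the repeated factor, not a variable in your data set, so it is not an effect in the model that a CONTRAST statement can refer to. A CONTRAST statement does work in PROC MIXED, but that is a story for another day.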

I cannot believe that it’s been over two months since I’ve written a post. That is the longest I’ve gone in the ten years I have been writing this blog. I read somewhere that the average blog has the lifespan of a fruit fly – after 31 days most people give it up.

That seems to lead to a cottage industry in taking over dormant sites. This site isn’t exactly stagnant even when I am not blogging because people use it for reference.

I started getting emails about “a somewhat embarrassing page”. At first I was aghast, afraid that hackers had redirected clients to a porn site.

Fortunately, no, it was just a failed redirect attempt that ended up breaking a link, so you got a 404 page that literally says, “Well, this is somewhat embarrassing.”

The Invisible Developer spent a good bit of time while we were in New York deleting malware from the site. At first, I felt very guilty because I thought my cavalier attitude toward security issues with PHP was the reason, but we did clean up most of the problems pointed out in those comments years ago, so that wasn’t the culprit. I should admit here that Paul and Clint were right and I was wrong. Although we have no data of particular value to anyone on this site, hackers are interested in redirecting sites to get links and for other nefarious purposes.

As near as we can tell, it was a plugin on another site we hosted that had not been updated in years. We had several more or less abandoned domains of content we had created for clients over the years. They paid us, we created the content for their course or other purpose, and then we just left it up. Kind of like all of that stuff in your closets that you just shove to the back because you have room.

That’s all cleaned up now. The site, not the closets. Those are still chaos. For all I know, there is an entire new civilization developing in that closet under the stairs. Or maybe Harry Potter lives there.

As for me, I have been teaching two courses during the past 3 months, whereas I usually teach only one a year. After landing back in the U.S. in February, I have been criss-crossing the country. Since the beginning of the year, I think I’ve been in 11 cities, 3 states and 2 countries, but I may have forgotten a few.

We also released two new games, Fish Lake Adventure for the iPad and a new version of AzTech: Meet the Maya, also for the iPad.


My lovely daughter, Ronda, headlined this show called WrestleMania, which is why we were in New York. We have chosen very different careers, my daughters and I. The Perfect Jennifer, or, as she likes to call herself, “the normal one”, is a middle school history teacher, in case you were wondering. The Spoiled One is currently doing a semester abroad in London. She will be back in the U.S. next month and needs a summer internship. Her talents include Instagram, shopping and soccer. If your company doesn’t need any of those skills, she’s also a good writer. Darling Daughter Number One is CEO of 7 Generation Games; she’s also a good writer, having co-authored a New York Times best seller, but she’s not looking for an internship.

So, anyway, I am back, well, for a couple of weeks. Next, I head to SAS Global Forum in Texas for a few days to give a couple of presentations on biostatistics and career advice. You’d think my career advice might be to study biostatistics but, maybe not…

Then, I come home for a couple more weeks and am off to a Tech Inclusion conference in Melbourne, Australia. My talk there is going to be, well – different than most – and that’s all I’m going to say about that.

So, now, I’m back to blogging. I have a few things to say about the infinite number of ways people can incorrectly code a repeated measures ANOVA, subdomains and number needed to treat. Between the next game, a new website, two conferences and two grant proposals all coming due before June, I’m sure I’ll fit it in there somewhere.