I’m giving a talk on Preparing Students for the Real World of Data at SAS Global Forum next month.

You’d think 50 minutes would be long enough for me to talk, but that just goes to show you don’t know me as well as you think you do. One point made in the template for papers is that you should not try to tell every single thing you know about the DATA step, for example, because it will bore your audience to death.

Random Tips That Didn’t Make it Into the Paper

1. CATS removes blanks and concatenates

While I did give a few shout outs to character functions, it was not possible to put in every function that is worth mentioning. One that didn’t make the cut is the CATS function.

The CATS function concatenates strings, removing all leading and trailing blanks.

Let’s say that I want to have each category renamed with a leading “F” to distinguish all of the variables from the Fish Lake game. I also want to add a ‘_’ to problems 10-14 so that when I chart the variables, 11 comes just before 12, not before 2 (which is what would happen in alphabetical order). So, I include these statements in my DATA step.

IF problem_num IN(11,10,12,13,14) THEN probname = CATS('F','_',probname);
ELSE probname = CATS(game,probname) ;
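If you have not used CATS before, a quick made-up example shows what it buys you over plain concatenation (the values here are invented, not from my data):

DATA _null_ ;
probname = '  prob11 ' ;
newname = CATS('F','_',probname) ; /* leading and trailing blanks are stripped, giving F_prob11 */
PUT newname = ;
RUN ;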

Now when I chart the results you can see the drop off in correct answers as the game gets more difficult.

[Graph: results by variable name]

2. Not all export files are created equal

Nine of the ten datasets I needed I was able to download as an EXCEL file and open up in SAS Enterprise Guide. It was a piece of cake, as I mentioned last time. Unfortunately, the third file was downloaded from a different site and it had special characters in it, like division signs, and the data had commas in the middle of it. When I opened it up in SAS Studio it looked like this.

[Screenshot: the ugly data]

Fixing it was actually super simple. This was an Excel file. I simply did a Replace ALL and changed the division signs to “DIV” and the commas to spaces. The whole thing took FIVE lines to read in after that.
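Just to give a sense of how little code that is, a sketch of the sort of five-line read-in would be something like this (assuming the cleaned-up file is saved out as a CSV; the path and variable names here are made up, not the actual file):

data answers ;
infile '/myfolder/cleaned_data.csv' firstobs = 2 dlm = ',' dsd ;
input username $ problem_num correct $ ;
run ;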

3. Listen to Michelle Homes and know your data

filename fred "/courses/abc123add/sgf15/sl_pretest.csv" ;

Data pretest_keyed ;
LENGTH item9 $ 38 ;
infile fred firstobs = 2 dlm = "," ;
input started $ ended $ username $ (item1 - item24) ($) ;

Thank you to the lovely Michelle Homes for catching this! As she pointed out in the comments, the INPUT statement assumes that the variables are character data with a length of 8. This is true for 26 of the 27 variables. However, ONE of the 24 items on the test is a question that can be answered with something like “Four million, four thousand and twelve.”

That, as you can see, is over 8 characters. So, I added a LENGTH statement. That brought up another issue, but that is the next post …

I’ll have a lot more to talk about in Dallas. Hope to see you there.

============

Want to be even smarter? Back us on Kickstarter! We make games that make you smarter. The latest one, Forgotten Trail, is going to be great! You can get cool prizes and great karma.


If you came into my office and watched me work today, just before I had you arrested for stalking me, you might notice me doing some things that are the absolute opposite of best practices.

I need about 10 datasets for some analyses I’ll be doing for my SAS Global Forum paper. I also want these data sets to be usable as examples of real data for courses I will teach in the future. While I’m at it, I could potentially use most of the same code for a research project.

The data are stored in an SQL database on our server. I could have accessed these in multiple ways but what I did was

1. Went into phpMyAdmin and chose EXPORT as an ODS spreadsheet.

2. Opened the spreadsheet using OpenOffice, inserted a row at the top and manually typed the name of each variable.

Why the hell would I do that when there are a dozen more efficient ways to do it?

In the past, I have had problems with exporting files as CSV, even as Excel files. A lot of our data comes from children and adolescents who play our games in after-school programs. If they don’t feel like entering something, they skip it. That missing data has wreaked havoc in the past, with all of the columns ending up shifted over by 1 after record 374 and shifted over again after record 9,433. For whatever reason, OpenOffice does not have this problem and I’ve found that exporting the file as ODS, saving it as an xls file and then using the IMPORT DATA task or PROC IMPORT works flawlessly. The extra ODS > Excel step takes me about 30 seconds. I need to export an SQL database to SAS two or three times a year, so it is hard to justify troubleshooting the issue to save myself 90 seconds.

IF YOU DIDN’T KNOW, NOW YOU KNOW

You can export your whole database as an ODS spreadsheet. It will open with each table as a separate sheet. When you save that as an XLS file, the structure is preserved with individual sheets.

You can import your data into SAS Enterprise Guide using the IMPORT DATA task and select which sheet you want to import. Doing this 2, 3 or however-many-sheets-you-have times will give you that number of data sets.
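If you would rather code it than point and click, a PROC IMPORT sketch looks something like this (the path, sheet name and output data set name are placeholders, not my actual files):

proc import datafile = '/myfolder/gamedata.xls'
out = work.answers
dbms = xls
replace ;
sheet = 'answers' ;
getnames = yes ;
run ;

Running it once per sheet gives you one data set per sheet, which is what repeating the IMPORT DATA task accomplishes.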

WHY TYPE IN THE VARIABLE NAMES?

Let me remind you of Eagleson’s law:

“Any code of your own that you haven’t looked at for six or more months might as well have been written by someone else.”

It has been a few months since I needed to look at the database structure. I don’t remember the name of every table, what each one does or all of the variables. Going through each sheet and typing in variable names to match the ones in the table is far quicker than reading through a codebook and comparing it to each column. I’ll also remember it better.

If I do this two or three times a year, though, wouldn’t using a DATA step be a time saver in the long run? If you think that, back up a few lines and re-read Eagleson’s law. I’ll wait.

Reading and understanding a data step I’d written would probably only take me 30 seconds. Remembering what is in each of those tables and variables would take me a lot longer.

I’ve already found one table that I had completely forgotten. When a student reads the hint, the problem number, username and whether the problem was correctly answered are written to a table named learn. I can compare the percentage correct from this dataset with the rest of the total answers file, of which it is a subset. Several other potential analyses spring to mind – on which questions are students most likely to use a hint? Do certain students ask for a hint every time while others never do?

Looking at the pretest for Fish Lake, I had forgotten that many of the problems have two-part answers, because the answer is a fraction, so the numerator and denominator are recorded separately. This can be useful in analyzing the types of incorrect answers that students make.

The whole point of going through these two steps is that they cause me to pause, look at the data and reflect a little on what is in the database and why I wanted each of these variables when I created these tables a year or two ago. Altogether, it takes me less time than driving five miles in Los Angeles during rush hour.

This wouldn’t be a feasible method if I had 10,000,000 records in each table instead of 10,000, or 900 variables instead of 90, but I rather think that if that were the case I’d be doing a whole heck of a lot of things differently.

My points, and I do have two, are

  • Often when working with small and medium-sized data sets, which is what a lot of people do a lot of the time, we make things unnecessarily complicated
  • No time spent getting to know your data is ever wasted

Some people may have said that hackathons are a stupid-ass idea where a bunch of people who can’t afford to buy their own pizza spend 48 hours with a bunch of strangers and no showers.

Okay, well, maybe that was me.

I take it all back.


We kicked off our hackathon at noon on Monday and wrapped up at 8 pm on Tuesday. The rules were simple – everyone who was working those days was to wipe their schedule completely for 8 hours each day and do nothing but work on the game. No emails, no blog posts, no meetings except for a kick off meeting each day to assign and review tasks. Jessica, Dennis, Samantha and I worked on the game for (at least) 16 hours. Any emails or interviews got done before the hackathon hours or after they were over. (I did pause for a brief interview with the Bismarck State College paper.)

Maria came in from maternity leave and worked 8 hours on Monday, baby in tow.

Gonzalo and Eric each worked their regular shifts on Monday and Tuesday, respectively, doing nothing but writing code, creating sprites and editing audio. Sam even pitched in a few hours early in the morning from Canada. Our massively talented artist, Justin, completed all of the new artwork before the meeting so we had it in hand to drop into all of the spots where there had been placeholders.

So, in two days a total of 100 hours were devoted just to game development. We made a giant leap forward.

[Screenshot: rocks]

The 3-D portion of the first level is nearly done.

[Screenshot: player needing help]

The new characters are being dropped into each scene.

Why did it work so well? For one thing, we were all in the same spot for a long time. Although the original plan was to meet and then people would go their separate ways, on Monday, five of the six people working stayed at my house. Three of us even slept there. That had two positive impacts.

First of all, whenever anyone needed something – whether it was a piece of artwork modified, a question answered about whether we had a sound file of footsteps in the woods, or being shown how to do a voiceover in iMovie – there was someone else to provide that assistance right on the spot. Very often, you can spend hours searching on Google, watching YouTube videos and reading manuals trying to figure out how to do X, when someone else can come up and say – Click on Window, pick record voiceover, click on the microphone in the middle of the left side of the window.

There are also those questions that CANNOT be found on Google, like where the hell was the new background image saved and what is it called.

The second positive impact was that we got around to tasks that had needed doing for a long time. While it may have seemed to keep us from getting real progress done on the game, the fifth time Sourcetree complained about not tracking those damned Dreamweaver .idea files, I’d HAD it and we removed those from the repository forever. When something bugs you every now and then you may think, “I’ll do it later”, but the fifth time it happens in one day …

Anyway, I would share more of the awesomeness of the hackathon experience with you but it is now 9 pm and we are taking the team out for sushi.

In case you don’t know, SAS On-Demand is the FREE, as in free beer, offering of SAS for academic use. How good is it? There really can’t be one answer to that.

First of all, there are multiple options – SAS Studio, SAS Enterprise Miner, SAS Enterprise Guide, JMP, etc. – so some may be better than others. I have a fair bit of experience with two of them, so let’s just look at one of those today.


SAS STUDIO

I mostly use SAS Studio with my students and over the past few courses I have been really pleased with the results. I selected SAS Studio over Enterprise Guide because I strongly believe it is useful for students to learn to code, and many students, yes, even in an area like biostatistics, need a little encouragement to learn. While they don’t end up expert SAS programmers after two or three courses, they at least can code a DATA step, read in raw data and data from external files, aggregate data, produce a variety of statistics and graphics and interpret the results.

Let’s be frank about this … it’s going to require a bit of work up front. You need to create a course with SAS On-Demand. You need to notify your students that they need to create accounts. If you are not going to use solely the sashelp library data sets, you’re going to have to upload your own data.

Please don’t tell me you plan on solely using the sashelp data sets! These are really helpful for the first assignment or two while students get their feet wet but unless you expect your students to have careers where all of their files to be analyzed are going to be shipped with the software they use, you’re going to move to reading in other types of data sooner or later.

Your data are going to be stored on the SAS server (so you can tell people who ask that yes, you are ‘computing in the cloud’ – instead of what I usually tell people who ask stupid questions like that, which is shut the hell up and quit bothering me – but I digress. Even more than usual.)

No matter what software you use, you’re going to have to select some data sets for students to analyze, have some sort of codebook and make sure your data is reasonably clean (but not so clean that students won’t learn something about data quality problems). So, the only real additional time is figuring out how to get it on the SAS server.

None of these steps take much time, but adding them all up – getting a SAS profile, creating a course, creating an email to send to all of your students with the correct LIBNAME, uploading your data – it all maybe adds up to a couple of extra hours.
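For what it’s worth, the LIBNAME in that email looks something like this (the path below is invented for illustration; SAS generates the real one when you create your course):

libname mydata "/courses/u_myschool.edu1/i_123456/c_7890/saslib" access = readonly ;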

My challenge always is how I shoehorn additional content into the very limited class time I have with students. One tool I’ve been using lately is LiveBinders. This is an application that lets you put together an online binder of web pages, videos and material you write yourself.

Here is an example of a livebinder I use for my graduate course in epidemiology. It has SAS assignments ranging from simply copying code to modifying it. Links to the relevant SAS documentation are included, as are videos that show step by step how to use SAS Studio for computing relative risk, population attributable risk, etc. I have a similar livebinder for my biostatistics course.

You might think this is a bit of hand-holding to walk the students through it, but I would disagree. Every time I have found myself thinking,

“Well, this is a little too easy”,

I have been wrong.

If you have been doing something for a decade or, in my case, a few decades, it’s hard to remember how confusing concepts were the very first time. Even things that you do automatically, like downloading your results as an HTML file, were a mystery at one time in your life. Making the videos takes some time initially – you have to do a screencast, and then the voice over. Sometimes I do them at once, using QuickTime and GarageBand simultaneously. Other times, I import the screencast into iMovie and record a voiceover.

Either way, a 7-minute video usually takes me half an hour to record, when you add in screwing up the first time, editing out the part where The Spoiled One came in and asked for money to go shopping, etc. So, you’re adding maybe 3-4 hours to the time you spend on your course. On the other hand, you only have to do it once, so, if you teach the same course a few times, it pays off. I cannot tell you how many times students tell me that the videos were helpful. Unlike when I am lecturing in class, they can slow the video down, play it over. Students end the course with experience coding, using data from actual studies and interpreting data to answer problems that matter.

My point is that it is a little more work to teach using SAS Studio, but it is worth it.


Kappa is a useful measure of agreement between two raters. Say you have two radiologists looking at X-rays, rating them as normal or abnormal and you want to get a quantitative measure of how well they agree. Kappa is your go-to coefficient.

How do you compute it? Well, personally, I use SAS because this is the year 2015 and we have computers.

Let’s take this table, where 100 X-rays were rated by two different raters, as an example:

                    Physician 2
               Abnormal    Normal
Physician 1
  Abnormal        40          20
  Normal          10          30

So … the first physician rated 60 X-rays as Abnormal. Of those 60, the second physician rated 40 abnormal and 20 normal, and so on.
If you received the data as a SAS data set like this, with an abnormal rating = 1 and normal = 0, then life is easy and you can just do the PROC FREQ (a sketch follows the listing below).

 

Rater1   Rater2
  1        1
  1        1

and so on, one line for each of the 100 X-rays.
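With the data in that one-record-per-X-ray form, the whole analysis is just a couple of lines. A minimal sketch, assuming the data set is named ratings:

PROC FREQ DATA = ratings ;
TABLES rater1*rater2 / AGREE ;
RUN ;

The AGREE option is what requests the kappa statistics.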

 

However, I very often get not an actual data set but a table like the one above. In this case, it is still relatively simple to code:

DATA compk ;
INPUT rater1 rater2 nums ;
DATALINES ;
1 1 40
1 0 20
0 1 10
0 0 30
;

 

So, there were 40 x-rays coded as abnormal by both rater1 and rater2.  When rater1 = 1 (abnormal) and rater2 = 0 (normal), there were 20,  and so on.

The next part is easy

PROC FREQ DATA = compk ;
TABLES rater1*rater2 / AGREE ;
WEIGHT nums ;

 

That’s it.  The WEIGHT statement is necessary in this case because I did not have 100 individual records, I just had a table, so the WEIGHT variable gives the number in each category.
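If you want to check the arithmetic by hand (my back-of-the-envelope check, not SAS output): observed agreement = (40 + 30)/100 = .70, chance-expected agreement = (.60)(.50) + (.40)(.50) = .50, so kappa = (.70 - .50)/(1 - .50) = .40.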

This will work fine for a 2 x 2 table. If you have a table that is more than 2 x 2, at the end, you can add the statement

TEST WTKAP ;

This will give you the weighted Kappa coefficient. If you include this with a 2 x 2 table nothing happens, because the weighted kappa coefficient and the simple Kappa coefficient are the same in this case.
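Just to show where that statement goes, here is a sketch with a hypothetical 3 x 3 table (the counts below are made up for illustration):

DATA compk3 ;
INPUT rater1 rater2 nums ;
DATALINES ;
0 0 20
0 1 5
0 2 2
1 0 4
1 1 15
1 2 6
2 0 1
2 1 7
2 2 25
;

PROC FREQ DATA = compk3 ;
TABLES rater1*rater2 / AGREE ;
WEIGHT nums ;
TEST WTKAP ;
RUN ;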

See, I told you it was simple.

 

Our Project Manager, Jessica, made a very insightful comment at lunch the other day,

No one cares how hard it was for you to make. When people are looking to buy your product, all they want to know is what it will do for them.

That young woman has a bright future in marketing. Unfortunately for those who read this blog, I do not, so I am going to tell you how hard it is to make that last push to the finish line.


I quit counting the number of hours I worked this week when I got to 80. I’m sure The Invisible Developer had put in even more, because many nights (mornings?) I have gone to bed at 2 a.m. and when I wake up and check the latest build in the morning I find it was put up at 5 or 6 that morning. There hasn’t been much blogging going on lately and I only have a bit of a minute now because I’m waiting to get the latest latest latest build so that I can make the Windows installer.

I’ve blogged before on the great value I place on “details” people and this week is a prime example of the importance of details.

You’d think that down to and past the wire – the last build of the game was supposed to be today and we have negative 68 minutes left in today – we would be moving forward pretty quickly. Um, not so much.

At the beginning of development, you can easily find the problems – the question is what fraction of the fish are over one foot long when you caught 125 fish last summer and 25 were over a foot long. The correct answer is 1/5. However, 25/125 is also a correct answer, as is 5/25. Finding those problems is easy. You can check the answer while you are creating the pages, have it write the correct answer to the console, step through the logic. No problem.

Same thing with playing the 3-D part of the game. If you are at the part where you are supposed to be spearing the fish and there is no spear, then it is an easy enough fix.

HOWEVER, now we are supposedly at the end. So…

  1. We make a version of the build for Mac and another for Windows.
  2. We zip the Windows file because many systems block .exe files downloaded from the Internet to prevent malware installation.
  3. We upload the zipped file to our server.
  4. We download it.
  5. We play the game from beginning to end on Mac.
  6. We play the game from beginning to end on Windows.

That is, we go through every step that a user would — and somewhere along the way we find an error that we somehow missed in all of our earlier testing. Maybe something we fixed in a later stage of the game was a script that was used in an earlier level and now that doesn’t work.

So … we go through all of the steps all over again. Yes, we do have debugging capabilities where we can skip to level 6 and test that, for example, but at the very end, you NEED to go through all of the steps your users will. Trust me. You can put in every unit test you want but it will not let you know that Microsoft or Chrome or some other organization put on this earth to try my patience now has a security feature that blocks the game from installing. You won’t see that three problems and all of the accompanying instructional material were left out. If you start at level 6 you will miss the fact that there is a problem in the transition from level 5 to level 6. And so on ad infinitum until you go to speaking in Latin and wanting to tear out your eyeballs.

We go through all of the details so that when you play it all you see is a game that works.

My high school English teacher told me,

If something is easy to read, you can damn sure believe that it was hard to write.

I think this is also true,

Any kind of software that is easy to use, you can damn sure believe it was hard to make.

 

I think descriptive statistics are under-rated. One reason I like Leon Gordis’ Epidemiology book is that he agrees with me. He says that sometimes statistics pass the “inter-ocular test”. That is, they hit you right between the eyes.


I’m a big fan of eye-balling statistics and SAS/GRAPH is good for that. Let’s take this example. It is fairly well established that women have a longer life span than men in the United States. In other words, men die at a younger age. Is that true of all causes?

To answer that question, I used a subset of the Framingham Heart Study and looked at two major causes of death, coronary heart disease and cancer. The first thing I did was round the age at death into five year intervals to smooth out some of the fluctuations from year to year.

data test2 ;
set sashelp.heart ;
ageatdeath5 = round(ageatdeath,5) ;
proc freq data=test2 noprint;
tables sex*ageatdeath5*deathcause / missing out= test3 ;
/* NOTE THAT THE MISSING OPTION IS IMPORTANT */

THE DEVIL IS IN THE DETAILS

Then I did a frequency distribution by sex, age at death and cause of death. Notice that I used the MISSING option. That is super-important. Without it, instead of getting what percentage of the entire population died of a specific cause at a certain age, I would get a percentage of those who died. However, as with many studies of survival, life expectancy, etc., a substantial proportion of the people were still alive at the time data were being collected. So, percentage of the population and percentage of people who died were quite different numbers. I used the NOPRINT option on the PROC FREQ statement simply because I had no need to print out a long, convoluted frequency table I wasn’t going to use.

I used the OUT = option to output the frequency distribution to a dataset that I could use for graphing.

More details: The symbol statements just make the graphs easier to read by putting an asterisk at each data point and by joining the points together. I have very bad eyesight so anything I can do to make graphics more readable, I try to do.
symbol1 value = star ;
symbol1 interpol = join ;

Here I am just sorting the data set by cause of death and only keeping those with Cancer or Coronary Heart Disease.
proc sort data=test3;
by deathcause ;
where deathcause in ("Cancer","Coronary Heart Disease");

 

Even more details.  You always want to have the axes the same on your charts or you can’t really compare them. That is what the UNIFORM option in the PROC GPLOT statement does. The PLOT statement requests a plot of percent who died at each age by sex. The LABEL statement just gives reasonable labels to my variables.

proc gplot data = test3 uniform;
plot percent*ageatdeath5 = sex ;
by deathcause ;
label percent = "%"
      ageatdeath5 = "Age at Death" ;
run ;

[Graphs: cause of death by age by gender]

When you look at these graphs, even if your eyes are as bad as mine you can see a few things. The top chart is of cancer and you can conclude a couple of  things right away.

  1. There is not nearly the discrepancy in the death rates of men and women for cancer as there is for heart disease.
  2. Men are much more likely to die of heart disease than women at every age up until 80 years old. After that, I suspect that the percentage of men dying off has declined relative to women because a very large proportion of the men are already dead.

So, the answer to my question is “No.”

Frequently, I hear adults who should know better argue against learning something, whether it is algebra, analysis of variance or learning a programming language. They say,

I’m 47 years old and I’ve never used (insert thing I didn’t learn here).

Yes, that is true. However, if you had learned it, there is a good chance that you would have used it. For those of you protesting, “Hey, I learned algebra!”, maybe you did and maybe you didn’t. (Read my post on number sense.)

Let’s take this morning as an example. In keeping with my New Year’s Literary resolution, I started out the day reading the jQuery cookbook. There were two things I learned that I expect to use this year. One of them was very simple, but I didn’t know it.

var t1 = +new Date ;

This returns the current time in milliseconds as a number. Yes, you could use the JavaScript Number() function but this saves you a step.

Now you can use this useful bit of code which I am planning on applying next week when I get done with what I’m working on now. I can use it to see how long a student worked on each individual problem and how long he or she took for the whole test.

(function() {
  var log = [], first, last ;
  // time() logs the seconds elapsed since the last call (or since "since", if given)
  time = function(message, since) {
    var now = +new Date ;
    var seconds = (now - (since || last)) / 1000 ;
    log.push(seconds.toFixed(3) + ':' + message + '<br/>') ;
    return last = +new Date ;
  };
  // time.done() logs the total elapsed time and writes the log into the selected element
  time.done = function(selector) {
    time('total', first) ;
    $(selector).html(log.join('')) ;
  };
  first = last = +new Date ;
})() ;

Now, the author’s interest was in seeing how long each bit of code took to run. However, I can see how this could be really useful in the pretest and posttests we use for our games to see how long the student spent on each problem. We could call this function each time the student clicks on the next arrow to go to the next problem.

One of the Common Core Standards for mathematical practice is “Make sense of problems and persevere in solving them.”

How do you know if a student is “persevering”? One way would be to measure how long he or she spent on a particular problem before going on to the next. We cannot know for a fact that the student spent time thinking about it rather than staring off into space, but we can at least set a maximum amount of time the student spent thinking about it before going on to the next thing.

This takes me to the point Morton Jervens was making: not everything that counts can be counted, and data does not always equal statistics.

While there is truth in that, I would say that much more of what counts can be counted, and much more data can be turned into statistics, if you know how to do it.

Learn how.

I read a lot. This year, I finished 308 books on my Kindle app, another dozen on iBooks, a half-dozen on ebrary and 15 or 20 around the house. I don’t read books on paper very often any more. It’s not too practical for me. I go through them at the rate of about a book a night, thanks to a very successful speed reading program when I was young (thank you, St. Mary’s Elementary School). Don’t be too impressed. I don’t watch TV and I read a lot of what a colleague called “junk food for the brain”. I read a bunch of Agatha Christie novels, three Skulduggery Pleasant books and several of the Percy Jackson and the Olympians books. Yes, when it comes to my fiction reading, I have the interests of a fourteen-year-old girl. Trying to read like a grown-up, I also read a bunch of New York Times bestseller novels and didn’t like any of them.

So, I decided to do my own “best books list” based on a random sample of one, me, and make up my own categories.

Because I taught a course on multivariate statistics,  I read a lot of books in that area, and while several of them were okay, there was only one that I really liked.

The winner for best statistics book I read this year is …

Applied logistic regression, 3rd Edition, by David Hosmer, Stanley Lemeshow and Rodney Sturdivant.

I really liked this book. I’m not new to logistic regression, but I’m always looking for new ideas, new ways to teach, and this book was chock full of them. What I liked most about it is that they used examples with real data, e.g., when discussing multinomial logistic regression, the dependent variable was type of placement for adolescents, and one of the predictor variables was how likely the youthful offender was to engage in violence against others. It is a very technical book and if you are put off by matrix multiplication and odds ratios, this isn’t the book for you. On the other hand, if you want an in-depth understanding of logistic regression from a practical point of view, read it from beginning to end.

Best SAS book  I read this year …

Let me start with the caveat that I have been using SAS for over 30 years and I don’t teach undergraduates, so I have not read any basic books at all. I read a lot of books on a range of advanced topics and most of them I found to be just – meh. Maybe it is because I had read all the good books previously and so the only ones I had left unread lying around were just so-so. All that being said, the winner is …

Applied statistics and the SAS programming language (5th Ed), by Ronald Cody and Jeffrey Smith

This book has been around for eight years and I had actually read parts of it a couple of years ago, but this was the first time I read through the whole book. It’s a very readable intermediate book. Very little mathematics is included. It’s all about how to write SAS code to produce a factor analysis, repeated measures ANOVA, etc. It has a lot of random stuff thrown in, like a review of functions and working with date data. If you have a linear style of learning and teaching, you might hate that. Personally, I liked that about it. This book was published eight years ago, which is an eon in programming time, but chi-square and ANOVA have been around for 100 years, so that wasn’t an issue. While I don’t generally like the use of simulated data for problems in statistics, for teaching this was really helpful because when students were first exposed to a new concept they didn’t need to get a codebook or fix the data. For the purpose of teaching applied statistics, it’s a good book.

Best Javascript programming book I read this year

I read a lot of Javascript books and found many of them interesting and useful, so this was a hard choice.

The jQuery cookbook, edited by Cody Lindley

was my favorite. If you haven’t gathered by now, I’m fond of learning by example, and this book is pretty much nothing but elaborate examples along the lines of, “Say you wanted to make every other row in a table green”. There are some like that I can imagine wanting to do and others I cannot think of any need for, ever. However, those are famous last words. When I was in high school, I couldn’t imagine I would ever use the matrix algebra we were learning.

Best game programming book I read this year

Again, I read a lot of game programming books. I didn’t read a lot of mediocre game programming books. They all were either pretty good or sucked. Choosing the best of the good ones was difficult, but I did.

Building HTML5 Games by Jesse Freeman

This is a very hands-on approach to building 2-D games with Impact, with, you guessed it, plenty of examples. I was excited to learn that he has several other books out. I’m going to read all of them next year.

So, there you have it …. my favorite technical books that I read this year. Feel free to make suggestions for what I ought to read next year.

 

What if you wanted to turn your PROC MIXED into a repeated measures ANOVA using PROC GLM? Why would you want to do this? Well, I don’t know why you would want to do it, but I wanted to do it because I wanted to demonstrate for my class that both give you the same fixed effects F value and significance.

I started out with the Statin dataset from the Cody and Smith textbook. In this data set, each subject has three records, one each for drugs A, B and C. To do a mixed model with subject as a random effect and drug as a fixed effect, you would code it like so. Remember to include both the subject variable and your fixed effect in the CLASS statement.

Proc mixed data = statin ;
class subj drug ;
model ldl = drug ;
random subj ;

To do a repeated measures ANOVA with PROC GLM you need three variables for each subject, not three records.

First, create three data sets for Drug A, Drug B and Drug C.

Data one two three ;
set statin ;
if drug = 'A' then output one ;
else if drug = 'B' then output two ;
else if drug = 'C' then output three ;

Second, sort these datasets and as you read in each one, rename LDL to a new name so that when you merge the datasets you have three different names. Yes, I really only needed to rename two of them, but I figured it was just neater this way.

proc sort data = one (rename= (ldl =ldla)) ;
by subj ;

proc sort data= two (rename = (ldl = ldlb)) ;
by subj ;
proc sort data=three (rename =(ldl = ldlc)) ;
by subj ;

Third, merge the three datasets by subject.

data mrg ;
merge one two three ;
by subj ;

Fourth, run your repeated measures ANOVA .

Your three LDL measurements are the dependent variables. It seems weird not to have an independent variable on the other side of the equation, but that’s the way it is. In your REPEATED statement you give a name for the repeated variable and the number of levels. I used “drug” here to be consistent but actually, this could be any name at all. I could have used “frog” or “rutabaga” instead and it would have worked just as well.

proc glm data = mrg ;
model ldla ldlb ldlc = /nouni ;
repeated drug 3 (1 2 3) ;
run ;

Compare the results and you will see that both procedures give you the same numerator and denominator degrees of freedom, F-statistic and p-value for the fixed effect of drug.

Now you can be happy.
