Today, I finished up a bonus Easter egg for the game Aztech: Meet the Maya. You are taken to play it when you click to see what Jose is thinking. You can see a rough version of it here. It plays better on a desktop or laptop because the iPad blocks autoplay for sound, but once it’s packaged for the app store, sound will work on the iPad as well.

This game uses several functions, all of which I wrote my little old self.

  1. Switches sound file played from English to Spanish
  2. Switches text from English to Spanish
  3. Takes you to the bonus game when you click on the sound bubble
  4. When the sound file ends, replaces the talking gif with a static image  and shows the arrow to continue.
  5. For each item on the screen, performs an action when clicked – anything from text describing its use to the Maya to a jumping and howling monkey. Also, removes that item name from the list of things to find, increases the number of found items by 1 and checks to see if all items are found.

There are probably some other things I forgot.
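The bookkeeping in item 5 can be sketched roughly like this. It is a minimal sketch, not the game's actual code: the item names, `itemsToFind`, `foundCount` and `itemFound` are all made up for illustration, and the on-screen action (text, howling monkey) is left out.

```javascript
// Hypothetical item names; the real game keeps its own list.
var itemsToFind = ["monkey", "jaguar", "drum"];
var foundCount = 0;

// Called when an item on the screen is clicked.
// Returns true once every item has been found.
function itemFound(name) {
  var position = itemsToFind.indexOf(name);
  if (position === -1) {
    // Already found (or never on the list): nothing to update.
    return itemsToFind.length === 0;
  }
  itemsToFind.splice(position, 1); // remove the name from the list of things to find
  foundCount += 1;                 // one more item found
  return itemsToFind.length === 0; // are all items found?
}
```

Clicking the same item twice does not count it twice, because the name is already gone from the list the second time around.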

 

monkey

You might wonder how I got from SAS to here. Well, it all started with SAS macros. A macro is no more than a user-written function. When I was first exposed to this idea in graduate school back in the 1980s (yes, literally) my mind was blown! You mean, I could write my own functions?!

You might think this SAS macro that I wrote a couple of years ago

%macro sched(the_day,start1,finish1,teacher1,start2=0, finish2=0, teacher2=" ", start3=0,finish3=0, teacher3=" ");
if date_data = &the_day then do ;
if minutes > &start1 & minutes < &finish1 then tclass = &teacher1 ;
else if (&start2 > 0) & (minutes > &start2) & minutes < &finish2 then tclass = &teacher2 ;
else if &start3 > 0 & (minutes > &start3) & minutes < &finish3 then tclass = &teacher3 ;
end ;
%mend sched ;

doesn’t look like this JavaScript function

// Section to include sound. ;

function playJungleAudio(scene,langs) {
audio_e2 = new Audio();
audio_s2 = new Audio();

if(langs ===2){
audio_e2.src = "sounds/" + scene + "_eng.mp3";
audio_s2.src = "sounds/" + scene + "_sp.mp3";
if ($("#span_button").hasClass("noshow")) {
audio_e2.play();
} else {
audio_s2.play();
}
}
else {
audio_e2.src = "sounds/" + scene + ".mp3";
audio_e2.play();
}
}

If you look closely, though, these are identical in purpose and structure. Both are intended to package a set of statements that will then be executed when called. For both types of functions, SAS (macros) or JavaScript, parameters are optional. Both of these examples just happen to have parameters. Both have a defined start and stop.

In SAS it is

%macro macro-name (parameters) ;

in JavaScript it is

function function-name(parameters) {

Both have a defined end, with SAS it is

%mend macroname ;

with JavaScript it is simply

}

 

Both are named functions (JavaScript also has anonymous functions), and when you call the function it executes.

It just so happens that both of these functions contain if-then-else statements.

To call the SAS macro, you give the macro name with a % in front of it, and include all the parameters in parentheses, separated by commas.

%sched(19292,790,840,"Elmo",start2= 840, finish2=900, teacher2= "Bert", start3=940,finish3=990, teacher3= "Snuffleupagus");

To call the JavaScript function, you give the name, and include all parameters in parentheses, separated by commas.

 playJungleAudio("howler_monkey",1);

These parameters are then passed to the macro/ function and their values are plugged into the code between the beginning and end.
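To make the parallel concrete, here is a rough JavaScript take on the sched macro. It is only a sketch: it returns the teacher instead of assigning tclass, it drops the date check, and the default values stand in for the macro's keyword parameters (start2=0 and so on). One difference to keep in mind is that JavaScript matches parameters by position, not by name.

```javascript
// Returns the teacher for a given minute of the day.
// The second and third periods are optional, like start2=0 in the macro.
function sched(minutes, start1, finish1, teacher1,
               start2 = 0, finish2 = 0, teacher2 = " ",
               start3 = 0, finish3 = 0, teacher3 = " ") {
  if (minutes > start1 && minutes < finish1) {
    return teacher1;
  } else if (start2 > 0 && minutes > start2 && minutes < finish2) {
    return teacher2;
  } else if (start3 > 0 && minutes > start3 && minutes < finish3) {
    return teacher3;
  }
  return null; // no class in session at that minute
}

// Only the required parameters are given; the rest fall back to their defaults.
var firstPeriod = sched(800, 790, 840, "Elmo");
// The second period is supplied too.
var secondPeriod = sched(850, 790, 840, "Elmo", 840, 900, "Bert");
```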

I have a lot more to say about this but it is getting close to 1 am and I have a plane to catch tomorrow so I’ll have to pick it up next time.

Speaking of  games – check out Making Camp, you can get it here for free. Play it and learn stuff because maturity is overrated.

wigwam

If you want to learn even more stuff, you can get a bilingual version of Making Camp for your iPad for only $1.99 and brush up on your Spanish like you always said you were going to do but didn’t

So you want to be a successful software developer / consultant?

If you are in any kind of quantitative field you have a VAST range of options, from working at some of the largest companies in the world in marketing research to performing efficacy studies for non-profits whose staff members can be counted on one hand.

This broad range of opportunities requires, at most, five building blocks:

  1. Programming concepts – You need to understand scope, do-loops, arrays, functions
  2. Data management – The thousand ways that users can enter data, and how to keep it from screwing up your results
  3. Working in a software development team – this is the part “self-taught” programmers are often not taught – documentation, testing and debugging
  4. Statistics – I come from the age when we inverted matrices by hand with a piece of paper and a pencil (not kidding). SAS, SPSS, R, Statistica, JMP and even Excel have made this a hundred times easier than when I started in the field
  5. Domain specific knowledge – by that, I mean if you are working in aerospace know something about what a transmitter and receiver are, know that a male and a female plug is a thing. If you are in biostatistics, understand survival analysis, relative risk.

(Yes, I know I said four in the previous post but then I thought about the importance of being able to work as part of a software development team and it’s my blog, so hush up.)

Since I started (mostly) with SAS, I’m going to talk for the next few posts about how starting as a SAS programmer can be like a Dr. Seuss book – “Oh, the places you’ll go!” My main point, as I have said before (weren’t you listening?) is that it doesn’t matter what language you use in the beginning. Eventually, I will tell you why SAS is a great place to start – better than many others – but it is not eventually yet. Patience is a virtue.

Let’s start with programming concepts. Now, I’d had a bit of BASIC, Fortran and COBOL before I got to SAS (yes, shut up, I’m old and in fact, yes I DID use a keypunch machine with punched cards like those women in Hidden Figures.  When the movie came out, one of our interns, in all seriousness, asked me if I was in it. I’m not quite that old.)

The basic concepts I use almost every day:

Arrays – I’ve written about those on this blog a dozen times. One of the most frequent uses I make of SAS is to score tests, which requires creating an array of answers from a respondent and a second array of items scored correct or incorrect. Our game, Making Camp, that teaches multiplication and division, has a virtual trading post and a wigwam, both of which make extensive use of arrays. All of the items you can “buy” with the points you earned from solving math and history problems are in an array.

SOME SAS ARRAYS

Data scored ;
set mydata.data2013 ;
array ans{70} q1 - q70 ;
array correct{70} c1 - c70 ;
array scored{70} sc1 - sc70 ;

SOME  JAVASCRIPT ARRAYS

var things = [
"art/tomahawk.png", "art/dog.jpg", "art/pottery.png", "art/deer_skin_sm.png",
"art/bass_side.png", "art/arrows_and_quiver.png", "art/turtle.jpg", "",
"art/parfleche.png", "art/feather_sm_side.png",
"art/plate.png"
];

var things_name = [
"TOMAHAWK", "DOG", "POTTERY", "DEER SKIN", "BASS", "ARROWS AND QUIVER", "TURTLE", "", "PARFLECHE", "FEATHER", "PLATE"
];

Yes, they look a little different but the basic concept is the same.

In the SAS example, I’m matching three arrays – the answer the students gave, the correct answer and the item scored correct or incorrect.

In the JavaScript example, I am matching up two arrays, with the source for the image file and the alternate text for that element.
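Matching arrays by position works the same way in either language. As a rough illustration, here is the test-scoring idea from the SAS example written in JavaScript. The three answers and the answer key are made up, and I have shortened the arrays from 70 items to three.

```javascript
// Hypothetical answers and answer key: three items instead of 70.
var ans = ["b", "d", "a"];
var correct = ["b", "c", "a"];
var scored = [];

// The same index in each array refers to the same test item.
for (var i = 0; i < ans.length; i++) {
  scored[i] = ans[i] === correct[i] ? 1 : 0;
}

// The total score is the sum of the scored items.
var total = scored.reduce(function (sum, item) {
  return sum + item;
}, 0);
```

The things and things_name arrays are matched up the same way: things[0] is the image for the name in things_name[0].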

In her paper presented in 2010 at SAS Global Forum, Jennifer Waller says,

A SAS ARRAY is a set of variables of the same type that you want to perform the same operation on. The set of variables is then referenced in the DATA step by the array name. The variables in the array are called the “elements” of the array.

Every word of that applies in JavaScript except for “of the same type”. In JavaScript you can have mixed type arrays and if SAS would add that, it would make me very, very happy.

Arrays are a fundamental concept to any programming language, so mastering that concept is a step forward.

Truly understanding variables is another foundational idea – not just that they are not constants, but the concepts of type, format and scope – but that is a whole different post and The Invisible Developer is reminding me it’s almost 11 pm on Sunday night, so that will be my next digression.


Last post, I talked about bricolage, the fine art of throwing random stuff together to make something useful. This is something of a philosophy of life for me.

Seems rambling but it’s not …

Over 30 years ago, I was the first American to win the world judo championships. A few years ago, I co-authored a book on judo, called Winning on the Ground. 

Winning on the ground cover

When it came to judo, although I was better than the average person, I was not the best at the fancy throws – not by a long shot. I didn’t invent any new judo techniques.  I wanted to call our book The Lego Theory of Judo but my co-author said, “That’s stupid” and the editor, more tactfully, said, “Nobody will know what you are talking about unless they read the book and you want a title that will get them to buy the book”. So, I lost that argument.

What I was really good at was putting techniques together. I could go from a throw to a pin to an armbar and voila – world champion! Well, it took a long time and a lot of work, too.

How does this apply to statistics?

Let’s start with Fisher’s exact test. Last year, I wrote about using this test to compare the bureaucratic barriers to new educational technology in rural versus urban school districts. Just in case you have not memorized my blog posts, Fisher’s exact test can be used when you have a 2 x 2 matrix that fails to meet the chi-square minimum of an expected count of five per cell. In that instance, with only 17 districts, chi-square would not be appropriate. If you have a 2 x 2 table and request the CHISQ option, SAS automatically computes the Fisher exact test, as well as several others. Here is the code:

PROC FREQ DATA = install ;
TABLES rural*install / CHISQ ;
RUN ;
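If you are curious what SAS is adding up when it reports a two-sided Fisher exact test, here is a minimal JavaScript sketch of the usual approach: hold the row and column totals fixed, compute the hypergeometric probability of every possible table, and sum the ones that are no more likely than the table you observed. The function names are mine, and this is an illustration of the technique, not what PROC FREQ runs internally.

```javascript
// Natural log of n!, computed by summing logs (fine for small tables).
function logFactorial(n) {
  var total = 0;
  for (var i = 2; i <= n; i++) {
    total += Math.log(i);
  }
  return total;
}

// Probability of one 2 x 2 table with fixed margins (hypergeometric).
function tableProbability(a, b, c, d) {
  var n = a + b + c + d;
  return Math.exp(
    logFactorial(a + b) + logFactorial(c + d) +
    logFactorial(a + c) + logFactorial(b + d) -
    logFactorial(n) - logFactorial(a) - logFactorial(b) -
    logFactorial(c) - logFactorial(d)
  );
}

// Two-sided p-value: add up every table with the same margins that is
// no more likely than the observed one.
function fisherExact(a, b, c, d) {
  var observed = tableProbability(a, b, c, d);
  var row1 = a + b, col1 = a + c, n = a + b + c + d;
  var low = Math.max(0, col1 - (n - row1)); // smallest possible top-left cell
  var high = Math.min(row1, col1);          // largest possible top-left cell
  var p = 0;
  for (var x = low; x <= high; x++) {
    var prob = tableProbability(x, row1 - x, col1 - x, n - row1 - col1 + x);
    // Small tolerance so ties with the observed table are not lost to rounding.
    if (prob <= observed * (1 + 1e-7)) {
      p += prob;
    }
  }
  return p;
}

// Made-up counts for a 2 x 2 table, not the district data from the post.
var pValue = fisherExact(8, 2, 1, 5);
```

With those made-up counts, the tables at least as extreme as the observed one account for 400 of the 11,440 equally weighted arrangements, so the p-value is about 0.035.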

Ten years ago, I was using this exact test in a very different context, as a statistical consultant working with a group of physicians who wanted to compare the mortality rates between  a department that had staff with a specific training program and a similar department where physicians were equally qualified except for participation in the specialized program. Fortunately for the patients but unfortunately for statistical analysis purposes, people didn’t die that often in either department. Exact same problem. Exact same code except for changing the variable names and data set name.

In 35 years, I have gone from using SAS to predict which missiles will fail at launch to which families will place their child with a disability in a residential facility to which patient in a hospital will die to which person in a vocational program will get employed to which student will quit playing an educational game. ALL of these applications have used statistics and in some cases, like the examples above, the identical statistics applied in very diverse fields.

Where do the Legos come in?

In pretty much every field, you need four building blocks: statistics, foundational programming concepts, an understanding of data management and subject specific knowledge. SAS can help you with three of these and if you acquire the fourth, you can build just about anything.

More on those building blocks next post.

Support my day job AND get smarter. Buy Fish Lake for Mac or Windows. Brush up on math skills and canoe the rapids.

girl in canoe

For random advice from me and my lovely children, subscribe to our youtube channel 7GenGames TV

 

I thought the title of Al Franken’s book, The Truth (with jokes), was great and I wanted to do something just like it. Unfortunately, I’m not that funny.

Often, the discussion comes up among colleagues whether it is better for one’s career to be a specialist or a generalist. It’s a little (a lot) too late for me to become the world’s foremost authority on PROC REPORT (that’s Kim Le Bouton, isn’t it?). Right after wondering whether anyone uses PROC REPORT any more, I started thinking of all of the basic concepts I learned from it that I apply regularly, mostly with PHP.

My point, which you have by now despaired of me having, is that when it comes to starting a fascinating career, SAS is as good a starting place as any, and probably better than most.

If you decide to be a specialist, your career path probably looks something like this:

typical career path

You get a bachelor’s and a master’s in statistics, you become a data analyst and work your way up to managing the entire division. If you do that, work your way up the ranks to knowing absolutely everything about clinical trials of migraine drugs, you’ll probably end up with a nice house in the suburbs, a 401K and three weeks of vacation each year.

If that’s you, cool. To be honest, though, I look at friends who have spent 20, 30 or 40 years in the same company and think I would lose my mind.

My career path

 

But, you say,

What can I do? I have an M.S. in statistics (or business or sociology) and two years of experience using SAS. What options do I have?

Well, honey, you have come to the right blog. In my decidedly non-linear career, I have been an industrial engineer,  professor in schools of education, engineering, business, liberal arts and human services. I think the only one I have missed is fine art. I’ve been a consultant, programmer, statistician, consultant again and now am president/co-founder of a gaming start-up.

Over the next few posts, I’ll explain a dozen ways in which I have built a career by bricolage – that is, by building stuff (programs, companies) using whatever was lying about. All of my various careers have had their roots in statistics and SAS. Could I have learned the same concepts and gotten the same results using another programming language, taking a different path? Probably. But I didn’t.

 



Little known fact (because, seriously, how would you know): I write a lot of code while sitting on a plane and I can’t always connect to the Internet.

NOT ALL QUOTATION MARKS ARE CREATED EQUAL

Sometimes, when I copy and paste my code into SAS Studio, it doesn’t work.

if compress(q23) = “3/4” then q23 = “.75” ;

Just so you know, this does not work because some programs like Word, or even TextEdit on the Mac will replace quotes with some swirly shit (see above) that SAS and other languages don’t read as quotes.

This article from the University of Michigan gives some hints on how to prevent or fix this problem.
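If you would rather fix the quotes in code than by hand, a find-and-replace on the four curly quote characters does the job. Here is a minimal JavaScript sketch. The function name is made up, and it assumes the only damage is the left and right curly single and double quotes.

```javascript
// Replace curly ("smart") quotes with the straight quotes that code expects.
function straightenQuotes(text) {
  return text
    .replace(/[\u201C\u201D]/g, '"')  // curly double quotes
    .replace(/[\u2018\u2019]/g, "'"); // curly single quotes and apostrophes
}

// The broken statement from above, with its curly quotes written as escapes.
var broken = "if compress(q23) = \u201C3/4\u201D then q23 = \u201C.75\u201D ;";
var fixed = straightenQuotes(broken);
```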

How to tell if your quotations are a problem


SAS Studio is color-coded.
Note that the first two lines have the values shown in purple.

color coded

The next two lines don’t. If you look closely, those are the evil curly quotes. If you realize this, you can tell at a glance if there is a problem with your code.

Getting rid of text

Okay, I replaced the evil curly quotes, but I still have a problem. The questions are things like,

“What is the area of this shape in square feet”, and let’s say the answer is 240 .

Students answer all kinds of variations of that, like :

  • 240 square feet
  • The answer is 240
  • 240 sq ft

All of these answers are correct but if I just compared them to 240, they would not be equal and would be marked wrong. Enter the COMPRESS function.

q3 = compress(Q3,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','l');

The above statement will remove all alphabetic characters from the answer, leaving the digits along with any blanks or punctuation the student typed.

The COMPRESS function takes three arguments –

  • the source, which is the variable you want modified, in this case, Q3,
  • the characters you want removed (removal is the default; the ‘k’ modifier keeps them instead),
  • an optional modifier

In my case, I used the modifier ‘l’  – that is a lower-case L, not a number 1 – because I wanted all of those letters removed if they were lower-case, too. So, I don’t have to type all of the letters of the alphabet twice.

Getting rid of special characters

You can also use the COMPRESS function to get rid of special characters. Say the question is “If tickets are normally $100 and tickets are 50% off, how much does it cost Cassandra for a ticket to the Dead Fleas concert?” Students will enter answers like, $50 or 50.   To get rid of the $, simply do this:

q1 = compress(q1, '$') ;
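For comparison, here are rough JavaScript counterparts of the two COMPRESS calls above, using regular expressions. The function names are made up for this sketch. The one thing I add is trim(), which drops the blanks left over at either end of the answer once the letters are gone.

```javascript
// Remove every letter, upper- or lower-case.
function removeLetters(answer) {
  return answer.replace(/[A-Za-z]/g, "");
}

// Remove dollar signs.
function removeDollarSigns(answer) {
  return answer.replace(/\$/g, "");
}

// Removing the letters leaves the blanks behind, so trim them off
// before comparing to the correct answer.
var area = removeLetters("The answer is 240").trim();
var price = removeDollarSigns("$50");
```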

When I’m not teaching statistics or writing about SAS, I’m making video games. We’re doing a Kickstarter campaign to make our bilingual games available everywhere and if you backed us, that would be AMAZING ! Plus you will get cool prizes.

I read a comment online saying SAS probably would not disappear as an option for statistical analysis because “it’s good when you need to do a lot of data manipulation”.

I wonder what world those people live in that data comes all cleanly packaged and whether they have unicorns there.

Back on Planet Earth, I have a data set that has multiple records for the same date for the same students.  For some reason, the data were being sent at the end of each screen at one site, instead of at the end of the test. So, the data look like this:

kat123 4 5 18 11   2017-04-23 17:39:26

kat123 4 5 18 11   42 17 8 0 1 2017-04-23 17:41:12

and so on.

The students also took a post-test, months later, so …

I need the last record for each date, but my data has date and time

You might think doing

testday= datepart(date_entered);

would work and it would except for the fact that

My date is saved as a character format! What do I do?

You can read some suggestions here in SAS communities

https://communities.sas.com/t5/Base-SAS-Programming/how-to-convert-char-var-to-sas-date/td-p/45067

I could not find one that covered a value like

2016-02-03 19:41:26

and I spent a good hour trying different methods to get this to work. I will spare you the details and maybe I could have gotten some method to work (no, whatever you are considering, I probably already tried). However, this occurred to me …

Do you really need to change it to a date format?

In this case, I was not doing any calculations with the date value, I simply needed the day part as a unique value.

I could just use the first 10 characters like this

day_of_test = substr(date_entered,1,10) ;

If you figured this out in the first sentence or two you are probably laughing by now (shut up). Yes, it doesn’t matter if it is formatted as a date or not. So, that is what I did. After creating a variable that is just the day of the test, I sorted by username, day of test and date entered (which included the time value). Then, I read in the data using the BY statement in the DATA step so that a last. variable would be created, which flags whether or not this is the last record with that value in the BY group. I output the last record for each day by using a subsetting IF statement.

Data fixdata ;
set mydata.aztech_pre ;

*** CREATE day_of_test variable as characters 1-10  ;
day_of_test = substr(date_entered,1,10) ;

*** SORT by username, day of test and date entered (including time);
proc sort data=fixdata;
by username day_of_test date_entered ;

*** DATA step that only saves last record ;
Data mydata.aztech_pre ;
set fixdata ;

***  BY statement to define that the data is by username and day_of_test ;
*** NOTE:  If you didn’t do the PROC sort first, this won’t work. For shame! ;
by username day_of_test ;

*** Subsetting IF: keep only the last record for each day ;
if last.day_of_test  ;
run;
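The same idea carries over to JavaScript. This is only a sketch under a few assumptions: the records and the answers field are made up, and instead of sorting and then using a last. flag, it makes a single pass and keeps the latest record it has seen for each username and day. That shortcut works because timestamps laid out as year-month-day hour:minute:second compare correctly as plain strings.

```javascript
// Hypothetical records; date_entered is a string, as in the data set above.
var records = [
  { username: "kat123", date_entered: "2017-04-23 17:39:26", answers: 4 },
  { username: "kat123", date_entered: "2017-04-23 17:41:12", answers: 9 },
  { username: "kat123", date_entered: "2017-09-12 10:05:00", answers: 9 }
];

function lastRecordPerDay(rows) {
  var last = {};
  rows.forEach(function (row) {
    // Same idea as substr(date_entered,1,10): the day is the first 10 characters.
    var day_of_test = row.date_entered.substring(0, 10);
    var key = row.username + "|" + day_of_test;
    // Keep this row if it is the first or the latest one seen for this user and day.
    if (!(key in last) || row.date_entered > last[key].date_entered) {
      last[key] = row;
    }
  });
  return Object.keys(last).map(function (key) {
    return last[key];
  });
}

var kept = lastRecordPerDay(records);
```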

So, that worked perfectly. I included my missteps because it is easy when you are a newbie to believe that everyone is smarter than you and never makes bonehead mistakes. Not so. We all make them all of the time. The important thing is, figuring it out in the end. Sometimes the easy way is not so obvious.

Or, maybe it is and I’m a bonehead. Either way, it worked. Now on to step 2.

 

When I am not writing about SAS, I’m making games that teach math, social studies and language.

Check them out.

screen shots from our games

A while back, I wrote a post on getting your Excel data into SAS Studio the quick and easy way. However,  I hear you saying,

What about ME? What about MY needs? What if I don’t want my data written to the working directory? What if my file has the names at the top and I want to keep those names?

First of all, open a program file and run some code that assigns the LIBNAME to the directory where you want your data stored. It should look like this but whatever is in the quotation marks should be where your data are stored.

LIBNAME mydata "/courses/d1234566789" ;
run;

Second, upload your Excel File

sasexcel1

Under FILES, select the folder where you would like your data stored. Click on the UPLOAD FILES button (the arrow pointing up at the top of the screen) and then click CHOOSE FILES to go to where the file is stored on your computer. Select that file, click the button on the pop-up window that says UPLOAD. Now you have your Excel file uploaded, but you want a SAS file.

Go under TASKS and UTILITIES and click the arrow to select UTILITIES and then select IMPORT DATA.

 

sasExcel3

On the right, you’ll see this big window that says DRAG AND DROP YOUR FILE HERE.

file list

In the left pane, open the FILES directory and go to where you saved your Excel file. Drag it into the window. Once you’ve done that, this wi If you stopped here, you would have the file written to the working directory, and named import.

import option

If you want to change that, click the button that says CHANGE.

changing default name in boxes

This pops up. Don’t see the directory you want? Did you run the LIBNAME statement at the very beginning of this post to assign a library reference to that directory? For shame! You think I just make this stuff up? Go back and do it now.

Okay, should you be concerned that your library name is greyed out? No, you should not. That just means you cannot change the name of your library reference here. If you wanted to change that library name from “mydata” to “yourdata” you’d have to do it in the LIBNAME statement.

Type the name you want for the data set. Do not forget to click SAVE or you may as well have skipped this step.

Click the little running guy at the top of the window.

Before you go, notice that SAS also generates code for you. If, like me, you anticipate that your data may change and you may need to do this again, you can copy and paste the code generated by SAS and save it in a program file. Run it again to recreate your data set. How likely is that to happen?  Well, it happened to me today when I inadvertently (that’s a synonym for “stupidly”, right?) wrote over this exact data set.

/* Generated Code (IMPORT) */
/* Source File: az_pretest.xlsx */
/* Source Path: /home/annmaria.demars/data_analysis_examples/data2017 */
/* Code generated on: 7/31/17, 6:09 PM */

%web_drop_table(MYDATA.aztech_pre);
FILENAME REFFILE '/home/annmaria.demars/data_analysis_examples/data2017/az_pretest.xlsx';

PROC IMPORT DATAFILE=REFFILE
DBMS=XLSX
OUT=MYDATA.aztech_pre;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=MYDATA.aztech_pre; RUN;
%web_open_table(MYDATA.aztech_pre);
run;

Okay, there you go. With a few clicks, your Excel file is accessible in SAS Studio as a SAS data set and you have a copy of the code that did it.

Next post we’ll start whipping that data into shape.


 

 

Once every year, I teach an actual course, not a workshop or professional development, but a class with 20 – 40 students. One where I need to write a syllabus, have lectures, papers to grade, homework and exams.

Now, I’m not comparing teaching masters or doctoral students 3- 6 hours a week to my friends who teach middle school six hours a day. In fact, when I go for a day or two, as a guest speaker for six classes a day, and I need to stand on my feet and keep 40 teenagers’ attention for all of that time, I think yet again that teachers don’t get paid nearly enough.

There are several reasons that it is important to me to teach a course every year, and one is that I think it is super-important as someone who makes educational technology that I be in an actual classroom with students. It’s easy to forget how unbelievably BUSY teachers are if you are not in that situation day after day.

It’s also easy to overestimate the amount of time teachers have to investigate new technology. For example, for the course I am just finishing, I considered just two possible types of statistical software – SAS and SPSS.  The university had a license for one and it was available free (through SAS Studio) for the other. I knew R existed, of course,  but I did not consider it as an option for these students (long story I will skip). I had a short time to decide and someone suggested to me another option – JMP – that I had not considered, but by then I didn’t have time to research it, find a possible textbook and integrate it in my syllabus and lectures. If I’d had more time to look into it, that might have been a good choice.

I know there are other options out there- I had looked at Statistica at one point and it looked pretty cool. However, now that I have my syllabus done, lectures written, textbooks selected, model assignments and my students are generally doing pretty well, it is hard to see myself spending a lot of time researching new software applications for my engineering students.  (Social Science and Education might be a different issue).

My point is that one evident challenge for anyone who makes educational technology is the “good enough” problem. That is, if things are going good enough, teachers are not highly motivated to look for something better.

One of the things that drives me crazy is those teachers who think it’s “good enough” when the vast majority of their students are below grade level or not proficient – but that’s a rant for another day.

(If you’re fascinated by this topic – and who wouldn’t be – I wrote more about why teaching helps me run an ed tech startup on my other blog over on the 7 Generation  Games site)

 

When I am not writing about statistics, I’m making games that teach math, social studies and language.

Check them out.

screen shots from our games

It may not be the secret to great joy but it is certainly the secret to avoiding unhappiness and it is simply this:

The absence of self-ruminative thoughts.

I’d like to claim the idea was originally mine but the truth is I first heard this phrase over a decade ago in a talk by Albert Bandura (yes, THAT Albert Bandura) and he said one of the differences between people who are content with their lives and those who are unhappy is that the happy group have “an absence of self-ruminative thoughts”.

There is a phrase I use a lot,

Not my circus, not my monkeys.

monkeys

In other words, I don’t make everything about ME.

Here are tips for not ruminating too much.

  1. What people think about you is none of your business (I stole this one from Darling Daughter Number Three)

I do the best I can. When I meet with employees or students, I tell them what I think needs to be said, listen to what they have to say and then I don’t worry about whether I was too harsh or too wishy-washy, whether they respected my authority or thought I was incompetent. If random Joe on the Internet thinks I’m old and grey and should just shut up, well, as much as it pains me to have lost the good opinion of an anonymous person I have never met – oh, wait, no I don’t care.

2. Don’t take things personally

If I screw up,  I try to learn from it. If I don’t get a grant, or a person decides not to invest in our company or a school decides not to buy our games,  I listen to their reasons and if it is a reasonable suggestion for a change I can make, I try to do it. If not, I don’t worry about it. I still remember the astonishment I felt seeing a colleague throw a grant review in the trash without reading it.

What are you doing? Why didn’t you read the comments?

I asked. He responded,

Shit, why should I read it? They didn’t like me. They don’t think I’m a researcher.

It’s more than just not taking things personally, though. It’s also a matter of not making everything about how other people are not acting as YOU think they should behave.

3. Don’t make it about YOU when it’s not

Your adult children aren’t raising their kids the way you think they should? The neighbors don’t maintain their yard the way you think it should be?

Not my circus, not my monkeys.

4. Look out instead of in

A few months ago, we had a really fascinating guest on the More Than Ordinary podcast, Jonathan Shaw. He’d just finished writing his autobiography, Scab Vendor, and he encouraged me to go away for a month and write my own autobiography. Jonathan’s book was interesting and his idea was intriguing. I randomly happened to be in an area known as a writer’s retreat in Lopinot, Trinidad and I tried for a bit. I have had a long strange trip around the world and back again, that’s for sure.

I just don’t get excited about the idea of looking back through all of the things that happened in my life. Jonathan said,

You’ll grow from the experience, but it will probably hurt – and I only say ‘probably’ to be nice.

Maybe if I went back and hung out in the mountains I would find myself.

Lopinot

Instead, I went back to making games, looking forward instead of back. Feel free to buy some. They are fun and you’ll learn. Kind of like life should be.

screen shots from our games

So, after three posts of

we have arrived at MANOVA.  If you skipped those three posts, feel shame at trying to take shortcuts, go back and read them.

Before we dive into coding, let’s take a look at some basic background on MANOVA.

The difference between ANOVA and MANOVA is simple

  • With ANOVA you have one dependent variable
  • With MANOVA you have multiple dependent variables

How does that work? Think back to what you know about multiple correlation

In correlation, you are looking at the relationship between two variables, X and Y. You predict changes in Y from changes in X

Y = bX

In multiple correlation you are looking at the relationship between Y and MULTIPLE X variables.

You have an equation something like

Predicted Y = b0X0 + b1X1 + b2X2 + b3X3

And you are looking at how the Y variable changes in relation to the PREDICTED Y. Notice that predicted Y is a sum of all of your variables, each of which is multiplied by a regression coefficient.

The correlation between these predicted Ys and the actual Y is your multiple R and the multiple R-squared in ANOVA or regression is the square of the multiple R.

The multiple R-squared answers the question – how much of the variance in the dependent variable can be explained by variance in the independent variable (s) ?
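To see the link between predicted Y, multiple R and R-squared in actual numbers, here is a small JavaScript sketch. Everything in it is made up for illustration: two predictors, five cases, and coefficients that I picked so they are the least-squares values for these particular numbers (the residuals sum to zero against the constant and both predictors). It does not fit the regression, it only uses the result.

```javascript
// Hypothetical data: two predictors and an outcome for five cases.
var x1 = [1, 2, 3, 4, 5];
var x2 = [2, 1, 4, 3, 5];
var y = [4, 2, 6, 8, 10];

// Hypothetical regression coefficients: intercept, b1, b2.
var b = [0, 1, 1];

// Predicted Y is the weighted sum of the predictors.
var predicted = y.map(function (_, i) {
  return b[0] + b[1] * x1[i] + b[2] * x2[i];
});

function mean(values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

// Pearson correlation between two arrays of equal length.
function correlation(a, c) {
  var meanA = mean(a), meanC = mean(c);
  var cross = 0, squaresA = 0, squaresC = 0;
  for (var i = 0; i < a.length; i++) {
    cross += (a[i] - meanA) * (c[i] - meanC);
    squaresA += (a[i] - meanA) * (a[i] - meanA);
    squaresC += (c[i] - meanC) * (c[i] - meanC);
  }
  return cross / Math.sqrt(squaresA * squaresC);
}

// Multiple R is the correlation between predicted and actual Y.
var multipleR = correlation(predicted, y);
var rSquared = multipleR * multipleR;
```

For these numbers the residual sum of squares is 4 and the total sum of squares is 40, so R-squared works out to 0.9 either way you compute it.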

In the case of ANOVA, this variance is in group membership, so we are testing the null hypothesis that the mean of group1 = the mean of group 2 all the way to group N

With MANOVA, you have multiple variables on the Y side of the equation

The variable you are predicting/ explaining in this case is also a weighted sum

Dependent = w1Y1 + w2Y2 + w3Y3

Our null hypothesis is that the mean of this weighted combination is equal for groups 1, 2 and all the way up to group N

Instead of looking at a multiple R-squared in this case, we look at two other statistics, Wilks’ lambda and Pillai’s trace
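For reference, both statistics are usually defined from two matrices: the hypothesis (between-groups) and the error (within-groups) sums of squares and cross-products. I am calling them H and E here; the symbols are my own shorthand for this sketch.

```latex
\Lambda = \frac{\det(\mathbf{E})}{\det(\mathbf{H} + \mathbf{E})},
\qquad
V = \operatorname{tr}\!\left[\mathbf{H}\,(\mathbf{H} + \mathbf{E})^{-1}\right]
```

A small lambda or a large trace both point toward differences between groups on the weighted combination of dependent variables.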

Assumptions of MANOVA

  1. Independent, randomly sampled observations
  2. Variables follow a multivariate normal distribution
  3. Homoscedasticity – the population covariance matrices of the dependent variables are equal across groups
  4. Relationship of dependent variables is linear (because notice you made the dependent into a linear equation)

Also note that in the case of a repeated measures ANOVA certainly assumption 1 and possibly assumption 3 are violated

When you have conducted your MANOVA, the first thing you should look at is the multivariate tests – Wilks’ lambda, Pillai’s trace. Rejecting the null hypothesis that the model does not explain the difference in the VECTOR of means then leads you to examine the second logical question: which of these dependent variables differs? So, if you don’t have a significant lambda, trace, etc., STOP. If you do, move on and check out the univariate F-tests. If your F is significant, go on to post hoc tests.

ETA-squared is the variance accounted for IN THE LINEAR COMBINATION OF THE DEPENDENT VARIABLES by the model.

Mertler and Vannatta said it well.

“When the IV has only two categories, the F test for Pillai’s Trace, Wilks’ Lambda, and Hotelling’s Trace will be identical. When the IV has three or more categories, the F test for these three statistics will differ slightly but will maintain consistent significance or nonsignificance. Although these test statistics may vary only slightly, Wilks’ Lambda is the most commonly reported MANOVA statistic. Pillai’s Trace is used when homogeneity of variance-covariance is in question. If two or more IVs are included in the analysis, factor interaction must be evaluated before main effects.”

 

