I read a comment online saying SAS probably would not disappear as an option for statistical analysis because “it’s good when you need to do a lot of data manipulation”.

I wonder what world those people live in where data come all cleanly packaged, and whether they have unicorns there.

Back on Planet Earth, I have a data set that has multiple records for the same date for the same students.  For some reason, the data were being sent at the end of each screen at one site, instead of at the end of the test. So, the data look like this:

kat123 4 5 18 11   2017-04-23 17:39:26

kat123 4 5 18 11   42 17 8 0 1 2017-04-23 17:41:12

and so on.

The students also took a post-test, months later, so …

I need the last record for each date, but my data have both the date and the time.

You might think doing

testday= datepart(date_entered);

would work, and it would, except for the fact that …

My date is saved as a character variable! What do I do?

You can read some suggestions in this SAS Communities thread:

https://communities.sas.com/t5/Base-SAS-Programming/how-to-convert-char-var-to-sas-date/td-p/45067

I could not find an informat that matched the way my dates were stored, which looked like this:

2016-02-03 19:41:26

and I spent a good hour trying different methods to get this to work. I will spare you the details and maybe I could have gotten some method to work (no, whatever you are considering, I probably already tried). However, this occurred to me …

Do you really need to change it to a date format?

In this case, I was not doing any calculations with the date value, I simply needed the day part as a unique value.

I could just use the first 10 characters like this

day_of_test = substr(date_entered,1,10) ;

If you figured this out in the first sentence or two, you are probably laughing by now (shut up). Yes, it doesn’t matter whether it is formatted as a date or not. So, that is what I did. After creating a variable that is just the day of the test, I sorted by username, day of test and date entered (which included the time value). Then, I read the data back in a DATA step with a BY statement, so SAS would create a last. variable that indicates whether or not a record is the last one with that value in the BY group. I output the last record for each day by using a subsetting IF statement.

Data fixdata ;
set mydata.aztech_pre ;

*** CREATE day_of_test variable as characters 1-10 ;
day_of_test = substr(date_entered,1,10) ;
run;

*** SORT by username, day of test and date entered (including time) ;
proc sort data=fixdata;
by username day_of_test date_entered ;
run;

*** DATA step that only saves the last record ;
Data mydata.aztech_pre ;
set fixdata ;

*** BY statement to define that the data are grouped by username and day_of_test ;
*** NOTE: If you didn’t do the PROC SORT first, this won’t work. For shame! ;
by username day_of_test ;

*** Subsetting IF keeps only the last record for each day ;
if last.day_of_test ;
run;

So, that worked perfectly. I included my missteps because it is easy when you are a newbie to believe that everyone is smarter than you and never makes bonehead mistakes. Not so. We all make them all of the time. The important thing is figuring it out in the end. Sometimes the easy way is not so obvious.

Or, maybe it is and I’m a bonehead. Either way, it worked. Now on to step 2.
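Postscript: if you ever do need an actual SAS date value from a character string like this, an informat like ANYDTDTM. will usually read it. A sketch, which may or may not have been among the hour’s worth of methods I tried:

*** Read the character string as a datetime, then pull out the date part ;
test_datetime = input(date_entered, anydtdtm19.) ;
testday = datepart(test_datetime) ;
format testday yymmdd10. ;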

 

When I am not writing about SAS, I’m making games that teach math, social studies and language.

Check them out.


A while back, I wrote a post on getting your Excel data into SAS Studio the quick and easy way. However,  I hear you saying,

What about ME? What about MY needs? What if I don’t want my data written to the working directory? What if my file has the names at the top and I want to keep those names?

First of all, open a program file and run some code that assigns a LIBNAME (a library reference) to the directory where you want your data stored. It should look like this, but whatever is in the quotation marks should be the path where your data are stored.

LIBNAME mydata "/courses/d1234566789" ;
run;

Second, upload your Excel file.


Under FILES, select the folder where you would like your data stored. Click on the UPLOAD FILES button (the arrow pointing up at the top of the screen) and then click CHOOSE FILES to go to where the file is stored on your computer. Select that file and click UPLOAD in the pop-up window. Now you have your Excel file uploaded, but you want a SAS file.

Go under TASKS and UTILITIES and click the arrow to select UTILITIES and then select IMPORT DATA.

 


On the right, you’ll see this big window that says DRAG AND DROP YOUR FILE HERE.


In the left pane, open the FILES directory and go to where you saved your Excel file. Drag it into the window. If you stopped here, you would have the file written to the working directory and named “import”.


If you want to change that, click the button that says CHANGE.


A window pops up where you can choose the library and type a name for the data set. Don’t see the directory you want? Did you run the LIBNAME statement at the very beginning of this post to assign a library reference to that directory? For shame! You think I just make this stuff up? Go back and do it now.

Okay, should you be concerned that your library name is greyed out? No, you should not. That just means you cannot change the name of your library reference here. If you wanted to change that library name from “mydata” to “yourdata” you’d have to do it in the LIBNAME statement.
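For example, if you wanted a libref named “yourdata” pointing at the same (hypothetical) directory, you would just run a new LIBNAME statement before importing:

LIBNAME yourdata "/courses/d1234566789" ;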

Type the name you want for the data set. Do not forget to click SAVE or you may as well have skipped this step.

Click the little running guy at the top of the window.

Before you go, notice that SAS also generates code for you. If, like me, you anticipate that your data may change and you may need to do this again, you can copy and paste the code generated by SAS and save it in a program file. Run it again to recreate your data set. How likely is that to happen?  Well, it happened to me today when I inadvertently (that’s a synonym for “stupidly”, right?) wrote over this exact data set.

/* Generated Code (IMPORT) */
/* Source File: az_pretest.xlsx */
/* Source Path: /home/annmaria.demars/data_analysis_examples/data2017 */
/* Code generated on: 7/31/17, 6:09 PM */

%web_drop_table(MYDATA.aztech_pre);
FILENAME REFFILE '/home/annmaria.demars/data_analysis_examples/data2017/az_pretest.xlsx';

PROC IMPORT DATAFILE=REFFILE
DBMS=XLSX
OUT=MYDATA.aztech_pre;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=MYDATA.aztech_pre; RUN;
%web_open_table(MYDATA.aztech_pre);
run;

Okay, there you go. With a few clicks, your Excel file is accessible in SAS Studio as a SAS data set and you have a copy of the code that did it.

Next post we’ll start whipping that data into shape.

When I am not writing about SAS, I’m making games that teach math, social studies and language.

Check them out.


 

 

Once every year, I teach an actual course, not a workshop or professional development, but a class with 20-40 students: one where I need to write a syllabus, give lectures and have papers to grade, homework and exams.

Now, I’m not comparing teaching master’s or doctoral students 3-6 hours a week to my friends who teach middle school six hours a day. In fact, when I go for a day or two as a guest speaker for six classes a day, and I need to stand on my feet and keep 40 teenagers’ attention for all of that time, I think yet again that teachers don’t get paid nearly enough.

There are several reasons that it is important to me to teach a course every year, and one is that I think it is super-important as someone who makes educational technology that I be in an actual classroom with students. It’s easy to forget how unbelievably BUSY teachers are if you are not in that situation day after day.

It’s also easy to overestimate the amount of time teachers have to investigate new technology. For example, for the course I am just finishing, I considered just two possible types of statistical software – SAS and SPSS.  The university had a license for one and it was available free (through SAS Studio) for the other. I knew R existed, of course,  but I did not consider it as an option for these students (long story I will skip). I had a short time to decide and someone suggested to me another option – JMP – that I had not considered, but by then I didn’t have time to research it, find a possible textbook and integrate it in my syllabus and lectures. If I’d had more time to look into it, that might have been a good choice.

I know there are other options out there – I had looked at Statistica at one point and it looked pretty cool. However, now that I have my syllabus done, lectures written, textbooks selected and model assignments made, and my students are generally doing pretty well, it is hard to see myself spending a lot of time researching new software applications for my engineering students. (Social science and education might be a different issue.)

My point is that one evident challenge for anyone who makes educational technology is the “good enough” problem. That is, if things are good enough, teachers are not highly motivated to look for something better.

One of the things that drives me crazy is teachers who think it’s “good enough” when the vast majority of their students are below grade level or not proficient – but that’s a rant for another day.

(If you’re fascinated by this topic – and who wouldn’t be – I wrote more about why teaching helps me run an ed tech startup on my other blog over on the 7 Generation Games site.)

 

When I am not writing about statistics, I’m making games that teach math, social studies and language.

Check them out.


It may not be the secret to great joy but it is certainly the secret to avoiding unhappiness and it is simply this:

The absence of self-ruminative thoughts.

I’d like to claim the idea was originally mine but the truth is I first heard this phrase over a decade ago in a talk by Albert Bandura (yes, THAT Albert Bandura). He said one of the differences between people who are content with their lives and those who are unhappy is that the happy group has “an absence of self-ruminative thoughts”.

There is a phrase I use a lot,

Not my circus, not my monkeys.


In other words, I don’t make everything about ME.

Here are some tips for not ruminating too much.

1. What people think about you is none of your business (I stole this one from Darling Daughter Number Three)

I do the best I can. When I meet with employees or students, I tell them what I think needs to be said, listen to what they have to say and then I don’t worry about whether I was too harsh or too wishy-washy, whether they respected my authority or thought I was incompetent. If random Joe on the Internet thinks I’m old and grey and should just shut up, well, as much as it pains me to have lost the good opinion of an anonymous person I have never met – oh, wait, no I don’t care.

2. Don’t take things personally

If I screw up,  I try to learn from it. If I don’t get a grant, or a person decides not to invest in our company or a school decides not to buy our games,  I listen to their reasons and if it is a reasonable suggestion for a change I can make, I try to do it. If not, I don’t worry about it. I still remember the astonishment I felt seeing a colleague throw a grant review in the trash without reading it.

What are you doing? Why didn’t you read the comments?

I asked. He responded,

Shit, why should I read it? They didn’t like me. They don’t think I’m a researcher.

It’s more than just not taking things personally, though. It’s also a matter of not making everything about how other people are not acting as YOU think they should behave.

3. Don’t make it about YOU when it’s not

Your adult children aren’t raising their kids the way you think they should? The neighbors don’t maintain their yard the way you think it should be?

Not my circus, not my monkeys.

4. Look out instead of in

A few months ago, we had a really fascinating guest on the More Than Ordinary podcast, Jonathan Shaw. He’d just finished writing his autobiography, Scab Vendor, and he encouraged me to go away for a month and write my own autobiography. Jonathan’s book was interesting and his idea was intriguing. I randomly happened to be in an area known as a writer’s retreat in Lopinot, Trinidad and I tried for a bit. I have had a long strange trip around the world and back again, that’s for sure.

I just don’t get excited about the idea of looking back through all of the things that happened in my life. Jonathan said,

You’ll grow from the experience, but it will probably hurt – and I only said ‘probably’ to be nice.

Maybe if I went back and hung out in the mountains I would find myself.


Instead, I went back to making games, looking forward instead of back. Feel free to buy some. They are fun and you’ll learn. Kind of like life should be.


So, after three posts of preliminaries – recoding variables, creating scales and checking reliability – we have arrived at MANOVA. If you skipped those three posts, feel shame at trying to take shortcuts, go back and read them.

Before we dive into coding, let’s take a look at some basic background on MANOVA.

The difference between ANOVA and MANOVA is simple

  • With ANOVA you have one dependent variable
  • With MANOVA you have multiple dependent variables

How does that work? Think back to what you know about multiple correlation.

In correlation, you are looking at the relationship between two variables, X and Y. You predict changes in Y from changes in X

Y = bX

In multiple correlation you are looking at the relationship between Y and MULTIPLE X variables.

You have an equation something like

Predicted Y = b0X0 + b1X1 + b2X2 + b3X3

And you are looking at how the Y variable changes in relation to the PREDICTED Y. Notice that predicted Y is a sum of all of your variables, each of which is multiplied by a regression coefficient.

The correlation between these predicted Ys and the actual Y is your multiple R and the multiple R-squared in ANOVA or regression is the square of the multiple R.

The multiple R-squared answers the question – how much of the variance in the dependent variable can be explained by variance in the independent variable(s)?

In the case of ANOVA, this variance is in group membership, so we are testing the null hypothesis that the mean of group 1 = the mean of group 2, all the way to group N.

With MANOVA, you have multiple variables on the Y side of the equation

The variable you are predicting/ explaining in this case is also a weighted sum

Dependent = w1Y1 + w2Y2 + w3Y3

Our null hypothesis is that the mean of this weighted combination is equal for groups 1, 2 and all the way up to group N

Instead of looking at a multiple R-squared in this case, we look at two other statistics, Wilks’ lambda and Pillai’s trace.

Assumptions of MANOVA:

  • Independent, randomly sampled observations
  • Variables follow a multivariate normal distribution
  • Homoscedasticity – population covariances for the dependent groups are equal
  • Relationship of dependent variables is linear (because notice you made the dependent into a linear equation)

Also note that in the case of a repeated measures ANOVA, assumption 1 is certainly violated and assumption 3 possibly is as well.

When you have conducted your MANOVA, the first thing you should look at is the multivariate tests – Wilks’ lambda and Pillai’s trace. Rejecting the null hypothesis that the model does not explain the difference in the VECTOR of means then leads you to the second logical question: which of these dependent variables differs? So, if you don’t have a significant lambda, trace, etc., STOP. If you do, move on and check out the univariate F-tests. If your F is significant, go on to post hoc tests.
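Since the actual coding is coming in later posts, here is a minimal sketch of what a one-way MANOVA looks like in SAS. Assume a data set named example, a grouping variable named group and dependent variables y1, y2 and y3 – all hypothetical names:

proc glm data=example ;
*** CLASS identifies the grouping (independent) variable ;
class group ;
*** Multiple dependent variables go on the left side of the MODEL statement ;
model y1 y2 y3 = group ;
*** The MANOVA statement produces the multivariate tests - Wilks lambda, Pillai trace, etc. ;
manova h=group ;
run;

The same run also prints the univariate F-tests from the MODEL statement, which you only get to look at if the multivariate tests are significant.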

ETA-squared is the variance accounted for IN THE LINEAR COMBINATION OF THE DEPENDENT VARIABLES by the model.

Mertler and Vannatta said it well.

“When the IV has only two categories, the F test for Pillai’s Trace, Wilks’ Lambda, and Hotelling’s Trace will be identical. When the IV has three or more categories, the F test for these three statistics will differ slightly but will maintain consistent significance or nonsignificance. Although these test statistics may vary only slightly, Wilks’ Lambda is the most commonly reported MANOVA statistic. Pillai’s Trace is used when homogeneity of variance-covariance is in question. If two or more IVs are included in the analysis, factor interaction must be evaluated before main effects.”

 

When I am not writing about statistics, I’m making games that teach math, social studies and language.

Check them out.


Where is the Multivariate Analysis of Variance?

You promised there would be MANOVA! Now we’re in the third post!

First there was recoding of variables.

Then, there was creating scales. 

Now, we’re looking at reliability.

Patience is a virtue.

Before we get to doing a MANOVA we want to be sure that our dependent and independent variables are reliable and valid. Let’s move on to reliability.

I’m going to do a correlation matrix and a Cronbach alpha, which is a measure of internal consistency. The rationale is that if items all measure the same construct – say, knowledge of health practices, or autonomy, or acceptance of wife beating – then those items should be related to one another. An alpha of 0 would indicate that the covariances of the items in the scale are zero, so your scale sucks. An alpha of .95 would mean your scale is amazingly consistent.

So, I did three analyses for my three scales:

Title "Health Variables " ;
proc corr data=example alpha ;
var hbs1 hbs3-hbs7 ;

Title "Wife beating variables" ;
proc corr data=example alpha ;
var GR34 - GR39 ;

Title "Decision Variables" ;
proc corr data=example alpha ;
VAR D_GR1A GR2A D_GR3A D_GR4A GR5a GR6A D_GR7A GR8A
D_GR9A GR9F D_GR10A D_GR12A GR10F GR12F ;

Let’s skip the simple statistics, mean, etc. you get from these analyses and go to the alpha


The alpha for the health scale is pretty bad. The value for the raw scores is .31; for standardized items, it is still really bad at .32. When we look at how deleting a variable would improve the alpha, if we dropped the first variable, the alpha would go up to .34 – but that is still awful.

For the wife-beating scale, the alpha was .81 for both the raw and standardized values. So, that one was pretty good as far as reliability goes.

I put all of the decision variables together, the ones on whether the woman was involved in making decisions, could go places on her own, needed to ask permission to go places. The Cronbach alpha for the raw variables was .65, for standardized variables .81. Note that standardized variables are placed on the same metric, so my idea of some variables being much more important than others did not pan out.

So … I standardized the variables, then I read in that data set and created two scales, one that was a sum of the decision variables and the other that was the mean of the six wife-beating variables. There was no particular reason for using the mean of the six variables as opposed to just adding them up. I used both methods to show that either is an option.

BEWARE THE SUM FUNCTION – Note, I did not use the SUM function. If you add up the values with plus signs, as shown below, and one of the variables has a missing value, then the value of the sum is going to be missing. If you used the SUM function, the variables that have non-missing values would be added up, so the missing value would be treated as a zero. There are times when that is acceptable. This is not one of those times.
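Here is a minimal contrast, using hypothetical variables x1-x3, where x3 happens to be missing for some record:

*** If x3 is missing, total_plus will be missing, too ;
total_plus = x1 + x2 + x3 ;
*** SUM() adds only the non-missing values, so the missing x3 acts like a zero ;
total_sum = sum(of x1-x3) ;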

While I’m at it, I want to check whether the scales have approximately normal distributions. A perfectly normal distribution would have skewness and kurtosis values of 0.

proc standard data=example mean=0 std=1 out=MAN_data;

Data create_manova ;
set man_data ;
* I could have used the mean function here, but I didn't ;
decision = D_GR1A + GR2A + D_GR3A + D_GR4A + GR5a + GR6A + D_GR7A + GR8A +
D_GR9A + GR9F + D_GR10A + D_GR12A + GR10F + GR12F ;
beating = mean(of gr34-gr39);

proc univariate data=create_manova ;
var decision beating ;

The skewness values were relatively low, -1.3 and 0.2 for the two scales, and the kurtosis values were 2.0 and -1.2. Since my scales aren’t a radical departure from normality, I’m now going on to MANOVA – finally!

When I am not writing about statistics, I’m making games that teach math, social studies and language.

Check them out.


Last time, we saw how to recode variables to score answers correct or incorrect, on a rating scale and weighted by importance. Today, we’re going to look at creating some scales from those variables because for reasons I’m sure I have written about at some point in the past, single items are usually not very reliable. Whether you use SAS, SPSS, R or any other statistical package, you are still going  to need to follow the steps of recoding your variables and creating and validating your scales before you get into MANOVA. Or, at least, you will if you are smart.

First, I want to check that there are no obvious errors or other problems in my data.
PROC MEANS DATA=example ;
VAR gr2A -- gr39 hbs1 -- d_gr12a ;

You could type in the variable names but that is a lot of typing. The double dashes mean to include all the variables in the data set, in order, from the variable before the dashes to the one that comes after the dashes. How do you know what order the variables are in? Click on the OUTPUT DATA tab at the top and look to the left under COLUMNS.


If you didn’t just run a program creating your data and hence don’t have an OUTPUT DATA tab, you can find your data file by clicking the MY LIBRARIES tab and then clicking on the library (directory) where your data are kept and clicking on the dataset to open it. You can also use the PROC CONTENTS procedure but today we are being all pointy and clicky with SAS Studio.

Sometimes you will see something like:

VAR item1 - item12 ;

The single dash is used for variables that end in a number and if you don’t have item1, item2 all the way through item12, it will give you an error and not run. Then you will be sad.

PROC MEANS will give you the N, mean, standard deviation, minimum and maximum.

Here are a few things to consider.

  • Is the N substantially less than you had expected? If so, you have a lot of missing data and you should investigate that. The lowest N I have is 37,814 out of 39,430 people, so not bad, but I might want to look at that one item, since most of the items have close to 39,000 for an N.
  • Is your standard deviation zero? STOP RIGHT THERE!  On just what variable could 39,000 people give the same response? This likely shows a big problem with your data. I did not have that problem, so I continued.
  • Are your minimum and maximum the minimum and maximum possible scores for the item? Now, this may not always be the case. On a scale of 1 to 10, say, with a sample of 50 people, maybe no one will say 1. However, I have over 39,000 people and the items are 0 or 1, 0-2 or 1-3, so I should have people from the minimum to the maximum or something is wrong. Nothing is wrong, and I continue.
  • Are the means about what you expect? Well, I’m not really an expert on social structure and family relations in India, so I can’t say. About a third of the women said it was usual for a husband to beat his wife if her dowry was not what was expected. About three-fourths said they would be allowed to visit a family or friend’s home alone.

Okay, so my results from the means procedure looks okay. Now what?

Next, I’m going to do a factor analysis to see if my supposition of three scales – related to health, beating your wife and autonomy – is supported.

Here is the code for my factor analysis.

PROC FACTOR DATA=example SCREE ROTATE=VARIMAX NFACTORS=5;
VAR gr2A -- gr39 hbs1 -- d_gr12a ;

This is actually the second one I ran. In inspecting the results from the first, between the eigenvalues and the scree plot, I decided that at most I should retain five factors. I’ve written a lot about factor analysis on this blog previously, so I’m not going to go into detail here. In short, the decision-making variables mostly loaded on the first factor, with factor loadings of .70 and higher, and the median communality estimate for those items was about .67 – considerable evidence for a decision-making factor. The wife-beating variables loaded on the second factor. All but one loaded above .67, and even that variable (beating your wife if she had an extramarital affair – which 84% of the women said was accepted in their communities) loaded at .40. The variables regarding needing permission to go places loaded on the third factor and also had high communality estimates. The variables regarding going places by yourself loaded on the fourth factor and also had high communality estimates.

The health variables were a different story. Four out of six loaded between .47 and .67 on the fifth factor. The other two did not load on any factor.

At this point, it is starting to look like it is okay to retain the wife-beating items as a scale. The various measures of autonomy – decision-making, going places on your own and needing permission – seem to hang together within factors. I think it would be reasonable to put all three of these together in one scale. I talked about parceling in the past, and I could have done that as a step here, and then re-run the factor analysis to support (or not) my supposed autonomy factor. Since I have limited time and am simply doing this analysis for educational and illustrative purposes, I skipped over this to the next procedure, which is reliability analysis.

Since this post is pretty long already, I’ll save that for the next post.

When I am not writing about statistics, I’m making games that teach math, social studies and language.

Check them out.


Other people want to go see the new Wonder Woman movie. I’ve been wanting to talk about MANOVA, but first, we need some decent dependent and independent measures.

I have the India Human Development Survey data on over 39,000 women and my hypothesis is that education is related to women’s rights issues, especially autonomy, health practices knowledge and domestic violence. I also think that mobility might be related, as women who get out of their native village might be exposed to new ideas.

Before I can test out my (supposedly) brilliant hypotheses, I need to create some variables because it turns out when they were collecting data in India in 2011 they were not thinking about my convenience. (Yes, I, too, am appalled by this lack of consideration.)

Independent Variables

First, I will need to create my independent variables from

EW11 Differences in family by mobility

1= same village/ town

2= another village

3 = another town

4 = metro (since only 1% fall in here, I’m going to delete this category)

and education (see below)

Items that will go into dependent variables (maybe)

HEALTH QUESTIONS

HB1 Milk harmful

HB3. 1st milk good for baby 

Hb4 chulha smoke good

Hb5 child diarrhea drink more

Hb6 illness spread through water

Hb7 malaria spread

DECISIONS

The items below are scored 1 if the respondent decides, 0 if the respondent does not decide. (More than 1 person can decide, so if both husband and wife decide, the answer will be 1 for both. In this case, I just looked at if the wife had a say in the decision.)

  • GR1a Cooking
  • GR2A Expensive purchases.      
  • GR3A Decides number of children
  • GR4A Decides what to do if sick
  • GR5A Decides whether to buy land  
  • GR6A Decides wedding expense
  • GR7A Decides if child is sick
  • GR8A Decides who your children should marry

The items below are scored 1 if the woman is allowed to do these things alone and 0 if she is not.

  • GR9F Can visit health center alone
  • GR10F Can visit relative/ friend alone
  • GR12F. Can go short distance alone

These items relate to whether the woman needs to ask permission for activities, with 0 = no, 1 = must inform someone and 2 = yes

  • GR9A Ask permission to visit health center
  • GR10A Ask permission to visit relative
  • GR12A. Ask permission to travel by bus/train

 

WIFE BEATING QUESTIONS

GR34 – GR39 – All of these relate to under what circumstances it is acceptable, coded 1 = yes and 0 = no.

As you can see (well, I hope you can see), each of these presents a different data re-coding problem.

  • Mobility and education need to be coded into categories (there is a minor reason, which I will explain in a later post, why this is not necessary but is convenient), with the fourth mobility category deleted.
  • Health questions need to be scored as correct or incorrect.
  • Decision questions are all scored equally – so deciding what food  to cook and how many children you have are each scored a 1. I think that’s not right and I want to weight some decisions more than others.
  • Independence questions need to be reverse coded, so not asking permission is a 2 and asking permission is a 0
  • Wife-beating questions need no recoding

So … here we go. The first thing we’re going to do is create categories. Notice I don’t do anything with the category 4 for mobility, so those people will just have a missing value for MOBILITY and be dropped from the analysis.

Also, a note on ELSE as opposed to just IF statements.

I could just use all IF statements but that would be inefficient. It doesn’t really matter here with 39,000 records but if I had millions it would slow down processing. The ELSE statement is only processed if the preceding IF statement is false.

NOTE!!!  In the second set of IF- ELSE statements, I have

else if ew8 < 9 and ew8 ne . then education = "ELEM";

This statement is only executed IF the preceding IF statement was false.  Without the ELSE, everything less than 9, including those who had 0 years of education, would be set to ELEM.  Without the and ew8 ne .  in this statement, anyone that had missing data would be set to ELEM along with anyone who had 1-8 years of education.


data example ;
set mydata.india ;
If EW11 = 1 then Mobility = "None" ;
else if EW11 = 2 then mobility = "Vill" ;
else if EW11 = 3 then mobility = "TOWN";

if ew8 = 0 then education = "NONE" ;
else if ew8 < 9 and ew8 ne . then education = "ELEM";
else if ew8 > 8 then education = "HS +";

*** The statements below recode the health items ;

*** For hb1 the correct answer is 0, so  1-hb1   will score respondents who said 0 as correct (= 1) and those who said 1 as incorrect (=0);

*** For hb3 the correct answer is 1, so respondents who said 1 are scored as correct (= 1) and those who said any number higher than 1 as incorrect (=0);

*** For hb4 – hb7, the correct answer is scored as correct (=1) and any numbers in the incorrect set scored as incorrect (=0);
*** HEALTH QUESTIONS ;
hbs1 = 1 - hb1 ;

If hb3 = 1 then hbs3 = 1 ;
Else if hb3 > 1 then hbs3 = 0 ;
If hb4 = 2 then hbs4 = 1 ;
Else if hb4 in (1,3) then hbs4 = 0 ;
If hb5 = 2 then hbs5 = 1 ;
Else if hb5 in (1,3,4) then hbs5 = 0 ;
If hb6 = 2 then hbs6 = 1 ;
Else if hb6 in (1,3,4) then hbs6 = 0 ;

If hb7 = 3 then hbs7 = 1 ;
Else if hb7 in (1,2,4) then hbs7 = 0 ;

 

/* DECISION QUESTIONS */
/* ALSO INCLUDES ADDITIONAL ITEMS NOT RECODED */

**** Here, I multiplied items by a factor based on my estimation of importance ;
D_GR1A = GR1A* 0.5 ;
D_GR3A = GR3A * 10 ; * BECAUSE I THINK IT’S IMPORTANT ;
D_GR4A = GR4A *2 ;
D_GR7A = GR7A *2 ;

**** These items are subtracted from 3 so doesn’t have to tell anyone = 2 ;

****  Needs to inform someone = 1 and needs to ask permission = 0 ;
D_GR9A = 3 - GR9A ;
D_GR10A = 3 - GR10A ;
D_GR12A = 3 - GR12A ;

**** KEEPS THE VARIABLES I PLAN TO USE ;
Keep EW8 EW5  Ew6 EW10  EW14a   EW12a EW12b
HBS1 HBs3-HBS7 D_GR1A GR2A D_GR3A D_GR4A GR5a GR6A D_GR7A GR8A
D_GR9A GR9F D_GR10A D_GR12A GR10F GR12F GR34 - GR39 mobility education;

So, there we go. You might think I would dive into a Multivariate Analysis of Variance now but you would be wrong. The next thing I am going to do is check the validity of my scales through a combination of factor analysis, univariate statistics and reliability analysis. Only after  that step will I do the MANOVA.

I’m teaching a course on multivariate statistics and for some of the students it’s been a minute since their last inferential statistics course.

So, I have been doing a few videos here and there to refresh, for example, what is a repeated measures ANOVA and why you might want to do it.

 

Sometimes I use repeated measures ANOVA to test whether our games are effective in improving math scores (they are!). You can check out the games here.
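If you want a taste of what that looks like in SAS, here is a minimal pre/post sketch, where scores, pretest and posttest are all hypothetical names:

proc glm data=scores ;
*** Two measures of the same students, no between-subjects factor ;
model pretest posttest = / nouni ;
*** TIME is the within-subjects factor, with two levels ;
repeated time 2 ;
run;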


If you are interested in being a beta tester for our first bilingual game that teaches statistics, please email info@7generationgames.com

Since I had done a few YouTube videos on using SAS Studio, I thought I would add them to my blog. This one uses the Characterize Data task to take a quick look at the data, but I suppose you could have guessed that from the title.

 

Support my day job AND get smarter. Buy Fish Lake for Mac or Windows. Brush up on math skills and canoe the rapids.


For random advice from me and my lovely children, subscribe to our YouTube channel, 7GenGames TV.
