MANOVA beginning to end: Recoding Data is Part of the Process

Other people want to go see the new Wonder Woman movie. I’ve been wanting to talk about MANOVA, but first, we need some decent dependent and independent measures.

I have the India Human Development Survey data on over 39,000 women and my hypothesis is that education is related to women’s rights issues, especially autonomy, knowledge of health practices and domestic violence. I also think that mobility might be related, as women who get out of their native village might be exposed to new ideas.

Before I can test out my (supposedly) brilliant hypotheses, I need to create some variables because it turns out when they were collecting data in India in 2011 they were not thinking about my convenience. (Yes, I, too, am appalled by this lack of consideration.)

Independent Variables

First, I will need to create my independent variables from

EW11 Differences in family by mobility

1= same village/ town

2= another village

3 = another town

4 = metro (since only 1% fall in here, I’m going to delete this category)

and education (see below)

Items that will go into dependent variables (maybe)

HEALTH QUESTIONS

HB1 Milk harmful

HB3 1st milk good for baby

HB4 Chulha smoke good

HB5 Child diarrhea drink more

DECISIONS

The items below are scored 1 if the respondent decides, 0 if the respondent does not decide. (More than 1 person can decide, so if both husband and wife decide, the answer will be 1 for both. In this case, I just looked at whether the wife had a say in the decision.)

• GR1A Cooking
• GR2A Expensive purchases
• GR3A Decides number of children
• GR4A Decides what to do if sick
• GR5A Decides whether to buy land
• GR6A Decides wedding expense
• GR7A Decides if child is sick
• GR8A Decides who your children should marry

The items below are scored 1 if the woman is allowed to do these things alone and 0 if she is not.

• GR9F Can visit health center alone
• GR10F Can visit relative/ friend alone
• GR12F Can go short distance alone

These items relate to whether the woman needs to ask permission for activities, with 0 = no, 1 = must inform someone and 2 = yes.

• GR9A Ask permission to visit health center
• GR10A Ask permission to visit relative
• GR12A Ask permission to travel by bus/train

WIFE BEATING QUESTIONS

GR34 – GR39: All of these ask under what circumstances wife beating is acceptable, coded 1 = yes and 0 = no.

As you can see, well, I hope you can see, each of these presents a different data re-coding problem.

• Mobility and education need to be coded into categories (I will explain in a later post why this is convenient rather than strictly necessary), with the fourth mobility category deleted.
• Health questions need to be scored as correct or incorrect.
• Decision questions are all scored equally, so deciding what food to cook and how many children you have are each scored a 1. I think that’s not right and I want to weight some decisions more than others.
• Independence questions need to be reverse coded, so not asking permission is a 2 and asking permission is a 0.
• Wife-beating questions need no recoding

So … here we go. The first thing we’re going to do is create categories. Notice I don’t do anything with the category 4 for mobility, so those people will just have a missing value for MOBILITY and be dropped from the analysis.

Also, a note on ELSE as opposed to just IF statements.

I could just use all IF statements but that would be inefficient. It doesn’t really matter here with 39,000 records but if I had millions it would slow down processing. The ELSE statement is only processed if the preceding IF statement is false.

NOTE!!!  In the second set of IF- ELSE statements, I have

else if ew8 < 9 and ew8 ne . then education = "ELEM";

This statement is only executed IF the preceding IF statement was false. Without the ELSE, everything less than 9, including those who had 0 years of education, would be set to ELEM. Without the and ew8 ne . in this statement, anyone who had missing data would be set to ELEM along with anyone who had 1-8 years of education, because SAS treats a missing numeric value as smaller than any number, so missing is "less than 9".
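If you want to see the missing-value trap for yourself, here is a minimal sketch. The data set demo, the four made-up records and the variables wrong and right are all hypothetical, just for illustration; they are not part of the India file.

```sas
data demo ;
   input ew8 ;
   length wrong right $ 4 ;
   * No check for missing, so the missing record is labeled ELEM ;
   if ew8 = 0 then wrong = "NONE" ;
   else if ew8 < 9 then wrong = "ELEM" ;
   else if ew8 > 8 then wrong = "HS +" ;
   * With the check, the missing record keeps a blank value ;
   if ew8 = 0 then right = "NONE" ;
   else if ew8 < 9 and ew8 ne . then right = "ELEM" ;
   else if ew8 > 8 then right = "HS +" ;
datalines ;
0
5
.
12
;
run ;

proc print data = demo ;
run ;
```

The third record, where ew8 is missing, comes out as ELEM in wrong and blank in right. If your data use special missing values such as .A, the test ew8 ne . will not catch them; not missing(ew8) is a more general alternative.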

data example ;
set mydata.india ;
if EW11 = 1 then mobility = "None" ;
else if EW11 = 2 then mobility = "Vill" ;
else if EW11 = 3 then mobility = "TOWN" ;

if ew8 = 0 then education = "NONE" ;
else if ew8 < 9 and ew8 ne . then education = "ELEM" ;
else if ew8 > 8 then education = "HS +" ;

*** The statements below recode the health items ;

*** For hb1 the correct answer is 0, so  1-hb1   will score respondents who said 0 as correct (= 1) and those who said 1 as incorrect (=0);

*** For hb3 the correct answer is 1, so respondents who said 1 are scored as correct (= 1) and those who said any number higher than 1 as incorrect (=0);

*** For hb4 – hb7, the correct answer is scored as correct (=1) and any numbers in the incorrect set scored as incorrect (=0);
*** HEALTH QUESTIONS ;
hbs1 = 1- hb1 ;

If hb3 = 1 then hbs3 = 1 ;
Else if hb3 > 1 then hbs3 = 0 ;
If hb4 = 2 then hbs4 = 1 ;
Else if hb4 in (1,3) then hbs4 = 0 ;
If hb5 = 2 then hbs5 = 1 ;
Else if hb5 in (1,3,4) then hbs5 = 0 ;
If hb6 = 2 then hbs6 = 1 ;
Else if hb6 in (1,3,4) then hbs6 = 0 ;

If hb7 = 3 then hbs7 = 1 ;
Else if hb7 in (1,2,4) then hbs7 = 0 ;

/* DECISION QUESTIONS */
/* ALSO INCLUDES ADDITIONAL ITEMS NOT RECODED */

**** Here, I multiplied items by a factor based on my estimation of importance ;
D_GR1A = GR1A* 0.5 ;
D_GR3A = GR3A * 10 ; * BECAUSE I THINK IT IS IMPORTANT ;
D_GR4A = GR4A *2 ;
D_GR7A = GR7A *2 ;

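**** These items are subtracted from 3 so does not have to tell anyone = 2 ;

****  Needs to inform someone = 1 and needs to ask permission = 0 ;
**** NOTE: 3 - x gives 2, 1, 0 only if the raw items are coded 1 to 3. If they are coded 0 to 2, as listed above, use 2 - x instead ;
D_GR9A = 3 - GR9A ;
D_GR10A = 3 - GR10A ;
D_GR12A = 3 - GR12A ;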

**** KEEPS THE VARIABLES I PLAN TO USE ;
keep EW8 EW5 EW6 EW10 EW14a EW12a EW12b
HBS1 HBS3-HBS7 D_GR1A GR2A D_GR3A D_GR4A GR5A GR6A D_GR7A GR8A
D_GR9A GR9F D_GR10A D_GR12A GR10F GR12F GR34-GR39 mobility education ;
run ;
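Before moving on, it is worth a quick look at whether the recodes did what I intended. This is a minimal sketch, assuming the data step above ran and created the data set example. Using PROC FREQ with the MISSING and LIST options is just one convenient way to check, not the only one.

```sas
proc freq data = example ;
   * MISSING shows the records with no mobility or education category ;
   tables mobility ew8*education / missing list ;
   * The reverse-coded items should only take the values 0, 1 and 2 ;
   tables D_GR9A D_GR10A D_GR12A / missing ;
run ;
```

In the ew8 by education listing, every ew8 of 0 should line up with NONE, 1 through 8 with ELEM, and 9 or more with HS +. If the reverse-coded items show a 3, the original items were not coded the way the comments assume.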

So, there we go. You might think I would dive into a Multivariate Analysis of Variance now but you would be wrong. The next thing I am going to do is check the validity of my scales through a combination of factor analysis, univariate statistics and reliability analysis. Only after  that step will I do the MANOVA.