God spare me from the self-taught software developer who knows only the latest thing.


I’m not against the latest thing, whether it is React or Ember or Python games on Raspberry Pi or whatever it is today. My objection is to the fallacy that it is the only thing or even the most important thing. Let me enlighten you as to why I am loath to hire self-taught programmers no matter how many of the ‘most elegant’ techniques their example project showcases.

There are several things you learn as a grown-up programmer (which The Invisible Developer tells me I should not call myself because it sounds lower than software developer. Again, I ignore him. Do not be misled by this to believe he is not high on my list. He just brought me a martini with bleu-cheese-stuffed olives.)


What Self-Taught Programmers Aren’t Taught

If you taught yourself to code through some online coding school or by watching videos or reading books from Safari O’Reilly, that shows an admirable amount of motivation. If you already have some experience as a software developer and this is how you learned a new language, that’s great. Maybe we can hang out and work together. If, however, that is your ONLY source of knowledge and experience, probably not. There are a few things self-taught programmers are generally not taught simply because they are not working as part of a team.

  1. Testing. Testing. Testing. I said it three times because it is important. I think I will say it again. Testing. Testing. This is why I need the martini. If you are developing an application, you need to test EVERYTHING. If I had a dollar for every time someone told me, “I tested everything but …” I would never need to seek investor funding again, I would just pull money from the piles in every room in my house. However much you think you need to test your software, you are wrong. The answer is, “More.”

    You need to test it on other machines besides yours. I learned this from SAS code that ran on Mac (yes, there was SAS on Mac a very long time ago) but not on Windows, or on Windows but not on Unix. You can’t look down your nose at those people who aren’t running Windows 10, because that is only half of the people who run Windows and less than 20% of the total market. SAS is actually a good starting point for learning this because it runs on a lot of devices with few changes, but you do need to change the LIBNAME and FILENAME statements, for example. Similarly, we make games now that run on Mac, Windows, iOS and Android. At a minimum, you need to do a separate build, but sometimes you need to make major changes. For example, Android has some limitations on app size that iOS does not.

    Test whether your software installs. Test whether it opens. Test the most basic applications. For SAS, this would be creating a temporary data set, reading in data with a DATALINES statement and doing a PROC MEANS. For our educational games, it might be playing all the way through getting all of the answers correct.

    Test extreme cases. For SAS, this might be merging several enormous datasets, applying user-created formats, calling macros to manipulate the data and then performing a multivariate analysis of variance.

    For our games, it would mean getting every single problem wrong and quitting the game and logging back in many times, maybe after every problem. It would include entering completely illogical numbers, say, that you had picked 9,145,087 berries, and seeing if the program really tried to put over 9 million berries in the baskets.

    I’m sure you can think of some more extreme cases, but you get the idea.

    I can’t emphasize testing enough. The problem with someone who creates applications on his or her own is that person understands completely how the software is supposed to work. Real testing includes things like wandering off the path in a game with the path clearly marked, “just to see what would happen”. It is having people enter “as often as I can” instead of male or female for sex.

    I once asked someone how he managed to test a game where the image that showed the key for deciphering the message was missing and he said, “I knew what the image was supposed to be.” This was not the answer I was looking for.

  2. Debugging is most of your life as a software developer. Basically, you write code for a few minutes and then swear and debug it for hours. Once you have a little experience, you learn to test and debug as you go and never write huge blocks of code that you then find don’t work, leaving you to figure out where in there the bugs occurred. You will learn all types of tricks of the trade for debugging. These include printing out the first few records of a data set to make sure it looks like you expect. With JavaScript it might be writing the value of a variable to the console. Either way, the point is the same: you are testing little bits of code as you go and seeing that the result is what you expected. You also learn to debug all the way through. With SAS, you might apply the statements you have written to a data set in the documentation and verify that you got the same results. With a game, you might collect all of the objects in a scene and then check that the variable recording the number of objects is equal to what you expect.

    In any program that you are writing, you learn to break it into modules and test each of those modules. So you are debugging it in chunks by writing out the values of some number both in small steps, say even after each statement if you are really running into problems, and also in medium steps, say, at the end of each SAS data step or procedure, or after the execution of each function.

    I’m not saying that self-taught programmers don’t debug their code because obviously they do. No one always writes code that works perfectly the whole time. What I am saying is that if you are self-taught, you only know the debugging techniques that you have figured out for yourself, as opposed to picking up ideas from your colleagues.

  3. A third part of being a grown-up software developer often missed by those who are self-taught is how to document the software. Comments are your friend. I had a colleague who made fun of me for how many comments I would put in the code, but when the next year we had to do a similar project again I could turn to him and say, “who’s laughing now, bitches?” I have never met the programmer who enjoyed writing documentation. I have met a lot of programmers who were happy they had written it. If you are always chasing the latest thing, you might not be in that situation where you need to revisit something that you did a year or two ago. If you are not part of a team, you probably are not worrying whether some nonexistent team member can understand your code. On the contrary, you might be trying some really cool new ideas just because they’re interesting. I’m not against that, in fact, I completely understand. However, you need to document those cool new things. And if you take the attitude, “well, everyone should be expected to know the function call to integrate Lua with PHP”, come here a little closer so I can slap you.
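To make the “test extreme cases” point from item 1 concrete, here is a minimal sketch in JavaScript. It is not code from our games: the function name `parseBerryCount` and the limit of 100 berries are assumptions made up for illustration. The idea is simply to feed a function the inputs nobody is “supposed” to enter and see what comes back.

```javascript
// Hypothetical validator: the name and the 100-berry limit are assumptions.
const MAX_BERRIES = 100;

function parseBerryCount(input) {
  const text = String(input).trim();
  // Accept only non-negative whole numbers (commas allowed as thousands separators).
  if (!/^\d{1,3}(,\d{3})*$|^\d+$/.test(text)) {
    return { ok: false, reason: "not a whole number" };
  }
  const count = Number(text.replace(/,/g, ""));
  if (count > MAX_BERRIES) {
    return { ok: false, reason: "too many berries" };
  }
  return { ok: true, count: count };
}

// Test the normal case AND the cases nobody is "supposed" to enter.
const berryCases = ["12", "0", "9,145,087", "-3", "as often as I can", ""];
for (const c of berryCases) {
  console.log(JSON.stringify(c), "->", JSON.stringify(parseBerryCount(c)));
}
```

The loop at the bottom is the part that matters: the list of cases should keep growing every time a tester (or a fourth grader) finds a new way to break things.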

Here’s why being part of a software development team is usually a crucial aspect of your career progress – all of the things I mentioned, most people don’t really want to do. Testing isn’t nearly as fun as writing code. No one likes to write documentation. Everyone knows that debugging is crucial but it usually seems at the time as if putting in all of those statements to check every single variable’s value after every manipulation is so time-consuming when you are just sure it was correct anyway. When you’re on a team, you can’t get away with cutting corners and skipping the not fun parts nearly so much. You also realize how crucial those parts are when other people on the team have no idea what in the hell you were doing when you wrote that function or macro or nested do loop.

Sorry, but I don’t think a weekend hackathon is any substitute no matter how many prizes you won. Not unless you had to return to the same hackathon six months later and update the project with a completely new set of people.

I don’t want to leave you all depressed, though. So, I do have two pieces of advice. For the debugging part, there are plenty of software conferences you can attend and find sessions on tips for debugging software. You may also meet people at those conferences that you could end up working with on a team for some project that interests all of you.

Blogging is a great way to document what you have been doing. On this blog, and my other company blog, I often write down whatever I have been working on lately just so I remember when I run into a similar problem six months down the road. You’d be surprised how often I Google a question and one of the first answers that pops up is a blog post I wrote years ago.

Speaking of games – check out Making Camp, you can get it here for free. Play it and learn stuff because maturity is overrated.

If you want to learn even more stuff, you can get a bilingual version of Making Camp for your iPad for only $1.99 and brush up on your Spanish like you always said you were going to do but didn’t.

Today, I finished up a bonus Easter egg for the game Aztech: Meet the Maya, which you are taken to when you click to see what Jose is thinking. You can see a rough version of it here. This plays better on a desktop / laptop because the iPad blocks autoplay for sound, but when it’s packaged for the app store, that will work on the iPad as well.

This game uses several functions, all of which I wrote by my little old self.

  1. Switches sound file played from English to Spanish
  2. Switches text from English to Spanish
  3. Takes you to the bonus game when you click on the sound bubble
  4. When the sound file ends, replaces the talking gif with a static image  and shows the arrow to continue.
  5. For each item on the screen, performs an action when clicked – anything from text describing its use to the Maya to a jumping and howling monkey. Also, removes that item name from the list of things to find, increases the number of found items by 1 and checks to see if all items are found.
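The bookkeeping in item 5 can be sketched without any of the game’s art or sound. This is an illustration, not the game’s actual code: `createScene`, `itemClicked` and the item names are all made up for the example.

```javascript
// Hypothetical game state; item names are placeholders, not the game's real list.
function createScene(itemNames) {
  return { toFind: itemNames.slice(), found: 0, total: itemNames.length };
}

// Called when an item is clicked. Returns true once every item has been found.
function itemClicked(scene, name) {
  const position = scene.toFind.indexOf(name);
  if (position === -1) {
    return scene.found === scene.total; // already found or unknown: nothing changes
  }
  scene.toFind.splice(position, 1); // remove the name from the list of things to find
  scene.found += 1;                 // increase the number of found items by 1
  return scene.found === scene.total;
}

const jungle = createScene(["monkey", "jaguar", "cacao"]);
console.log(itemClicked(jungle, "monkey")); // false
console.log(itemClicked(jungle, "monkey")); // false, a second click does not count twice
console.log(itemClicked(jungle, "jaguar")); // false
console.log(itemClicked(jungle, "cacao"));  // true
```

Note the second click on the monkey. That is exactly the kind of thing a player will do and a developer who “knows how it is supposed to work” will not.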

There are probably some other things I forgot.

 


You might wonder how I got from SAS to here. Well, it all started with SAS macros. A macro is no more than a user-written function. When I was first exposed to this idea in graduate school back in the 1980s (yes, literally) my mind was blown! You mean, I could write my own functions?!

You might think this SAS macro that I wrote a couple of years ago

%macro sched(the_day,start1,finish1,teacher1,start2=0, finish2=0, teacher2=" ", start3=0,finish3=0, teacher3=" ");
   /* Called inside a DATA step: assigns tclass based on the day and the minute */
   if date_data = &the_day then do ;
      if minutes > &start1 & minutes < &finish1 then tclass = &teacher1 ;
      else if (&start2 > 0) & (minutes > &start2) & minutes < &finish2 then tclass = &teacher2 ;
      else if &start3 > 0 & (minutes > &start3) & minutes < &finish3 then tclass = &teacher3 ;
   end ;
%mend sched ;

doesn’t look like this JavaScript function

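// Section to include sound.

function playJungleAudio(scene, langs) {
    // No var/let here, so these two are global variables.
    audio_e2 = new Audio();
    audio_s2 = new Audio();

    if (langs === 2) {
        // Two sound files for this scene: English and Spanish.
        audio_e2.src = "sounds/" + scene + "_eng.mp3";
        audio_s2.src = "sounds/" + scene + "_sp.mp3";
        // Play English when #span_button has the noshow class, Spanish otherwise.
        if ($("#span_button").hasClass("noshow")) {
            audio_e2.play();
        } else {
            audio_s2.play();
        }
    } else {
        // Only one sound file for this scene.
        audio_e2.src = "sounds/" + scene + ".mp3";
        audio_e2.play();
    }
}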

If you look closely, though, these are identical in purpose and structure. Both are intended to package a set of statements that will then be executed when called. For both types of functions, SAS (macros) or JavaScript, parameters are optional. Both of these examples just happen to have parameters. Both have a defined start and stop.

In SAS it is

%macro macro-name (parameters) ;

in JavaScript it is

function function-name(parameters) {

Both have a defined end, with SAS it is

%mend macro-name ;

with JavaScript it is simply

}

 

Both are named functions (JavaScript also has anonymous functions), and when you call the function it executes.

It just so happens that both of these functions contain if-then-else statements.

To call the SAS macro, you give the macro name with a % in front of it, and include all the parameters in parentheses, separated by commas.

%sched(19292,790,840,"Elmo",start2= 840, finish2=900, teacher2= "Bert", start3=940,finish3=990, teacher3= "Snuffleupagus");

To call the JavaScript function, you give the name, and include all parameters in parentheses, separated by commas.

playJungleAudio("howler_monkey",1);

These parameters are then passed to the macro/function and their values are plugged into the code between the beginning and end.
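To push the comparison one step further, here is roughly what the %sched macro might look like if it were written as a JavaScript function. This is a sketch, not code I actually use: the `row` object standing in for the current observation and the options object standing in for the SAS keyword parameters (start2=, finish2= and so on) are both assumptions for the sake of the example.

```javascript
// A JavaScript counterpart to the %sched macro. The row object and the
// options-object style are assumptions made for illustration.
function sched(row, theDay, start1, finish1, teacher1, options = {}) {
  // Defaults play the role of start2=0, teacher2=" " etc. in the macro.
  const {
    start2 = 0, finish2 = 0, teacher2 = " ",
    start3 = 0, finish3 = 0, teacher3 = " ",
  } = options;

  if (row.date_data !== theDay) return row; // wrong day: leave the row alone

  if (row.minutes > start1 && row.minutes < finish1) {
    row.tclass = teacher1;
  } else if (start2 > 0 && row.minutes > start2 && row.minutes < finish2) {
    row.tclass = teacher2;
  } else if (start3 > 0 && row.minutes > start3 && row.minutes < finish3) {
    row.tclass = teacher3;
  }
  return row;
}

const observation = { date_data: 19292, minutes: 850 };
sched(observation, 19292, 790, 840, "Elmo", {
  start2: 840, finish2: 900, teacher2: "Bert",
  start3: 940, finish3: 990, teacher3: "Snuffleupagus",
});
console.log(observation.tclass); // "Bert"
```

One real difference worth keeping in mind: the SAS macro pastes text into a DATA step before it runs, while the JavaScript function receives values at run time. The packaging idea is the same.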

I have a lot more to say about this but it is getting close to 1 am and I have a plane to catch tomorrow so I’ll have to pick it up next time.


So you want to be a successful software developer / consultant?

If you are in any kind of quantitative field you have a VAST range of options, from working at some of the largest companies in the world in marketing research to performing efficacy studies for non-profits whose staff members can be counted on one hand.

This broad range of opportunities requires, at most, five building blocks:

  1. Programming concepts – You need to understand scope, do-loops, arrays and functions
  2. Data management – The thousand ways that users can enter data, and how to keep it from screwing up your results
  3. Working in a software development team – this is the part “self-taught” programmers are often not taught – documentation, testing and debugging
  4. Statistics – I come from the age when we inverted matrices by hand with a piece of paper and a pencil (not kidding). SAS, SPSS, R, Statistica, JMP and even Excel have made this a hundred times easier than when I started in the field
  5. Domain-specific knowledge – by that, I mean if you are working in aerospace know something about what a transmitter and receiver are, know that male and female plugs are a thing. If you are in biostatistics, understand survival analysis, relative risk.

(Yes, I know I said four in the previous post but then I thought about the importance of being able to work as part of a software development team and it’s my blog, so hush up.)

Since I started (mostly) with SAS, I’m going to talk for the next few posts about how starting as a SAS programmer can be like a Dr. Seuss book – “Oh, the places you’ll go!” My main point, as I have said before (weren’t you listening?) is that it doesn’t matter what language you use in the beginning. Eventually, I will tell you why SAS is a great place to start – better than many others – but it is not eventually yet. Patience is a virtue.

Let’s start with programming concepts. Now, I’d had a bit of BASIC, Fortran and COBOL before I got to SAS (yes, shut up, I’m old and in fact, yes I DID use a keypunch machine with punched cards like those women in Hidden Figures.  When the movie came out, one of our interns, in all seriousness, asked me if I was in it. I’m not quite that old.)

The basic concepts I use almost every day:

Arrays – I’ve written about those on this blog a dozen times. One of the most frequent uses I make of SAS is to score tests, which requires creating an array of answers from a respondent and a second array of items scored correct or incorrect. Our game, Making Camp, that teaches multiplication and division, has a virtual trading post and a wigwam, both of which make extensive use of arrays. All of the items you can “buy” with the points you earned from solving math and history problems are in an array.

SOME SAS ARRAYS

Data scored ;
set mydata.data2013 ;
array ans{70} q1- q70 ;
array correct{70} c1-c70 ;
array scored{70} sc1-sc70 ;

SOME  JAVASCRIPT ARRAYS

var things = [
"art/tomahawk.png", "art/dog.jpg", "art/pottery.png", "art/deer_skin_sm.png",
"art/bass_side.png", "art/arrows_and_quiver.png", "art/turtle.jpg", "",
"art/parfleche.png", "art/feather_sm_side.png",
"art/plate.png"
];

var things_name = [
"TOMAHAWK", "DOG", "POTTERY", "DEER SKIN", "BASS", "ARROWS AND QUIVER", "TURTLE", "", "PARFLECHE", "FEATHER", "PLATE"
];

Yes, they look a little different but the basic concept is the same.

In the SAS example, I’m matching three arrays – the answer the students gave, the correct answer and the item scored correct or incorrect.

In the JavaScript example, I am matching up two arrays, with the source for the image file and the alternate text for that element.
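To show that the matching-arrays idea really does carry over, here is the SAS test-scoring pattern sketched in JavaScript. The answers and the key are made up, and four items stand in for the seventy in the SAS example.

```javascript
// Made-up answers and key, just to show the parallel-array pattern.
const ans = ["b", "d", "a", "c"];      // what the student answered
const correct = ["b", "c", "a", "c"];  // the answer key
const scored = [];                     // 1 = correct, 0 = incorrect

// Same index, three arrays: answer i is compared to key i and stored in scored i.
for (let i = 0; i < ans.length; i++) {
  scored[i] = ans[i] === correct[i] ? 1 : 0;
}

const totalCorrect = scored.reduce((sum, s) => sum + s, 0);
console.log(scored, totalCorrect); // [ 1, 0, 1, 1 ] 3
```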

In her paper presented in 2010 at SAS Global Forum, Jennifer Waller says,

A SAS ARRAY is a set of variables of the same type that you want to perform the same operation on. The set of variables is then referenced in the DATA step by the array name. The variables in the array are called the “elements” of the array.

Every word of that applies in JavaScript except for “of the same type”. In JavaScript you can have mixed type arrays and if SAS would add that, it would make me very, very happy.

Arrays are a fundamental concept in any programming language, so mastering that concept is a step forward.

Truly understanding variables is another foundational idea – not just that they are not constants, but the concepts of type, format and scope – but that is a whole different post and The Invisible Developer is reminding me it’s almost 11 pm on Sunday night, so that will be my next digression.


Little known fact (because, seriously, how would you know): I write a lot of code while sitting on a plane, and I can’t always connect to the Internet.

NOT ALL QUOTATION MARKS ARE CREATED EQUAL

Sometimes, when I copy and paste my code into SAS Studio, it doesn’t work.

if compress(q23) = “3/4” then q23 = “.75” ;

Just so you know, this does not work because some programs, like Word or even TextEdit on the Mac, will replace quotes with some swirly shit (see above) that SAS and other languages don’t read as quotes.

This article from the University of Michigan gives some hints on how to prevent or fix this problem.
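If you would rather fix the quotes with code than by hand, a few lines of JavaScript will do it. This is a minimal sketch, and the function name `straightenQuotes` is made up. It only handles the four most common curly quote characters.

```javascript
// Replace typographic ("curly") quotes with the plain ones that code expects.
function straightenQuotes(code) {
  return code
    .replace(/[\u201C\u201D]/g, '"')  // left and right double quotes
    .replace(/[\u2018\u2019]/g, "'"); // left and right single quotes
}

// \u201C and \u201D are the curly double quotes a word processor sneaks in.
const pasted = "if compress(q23) = \u201C3/4\u201D then q23 = \u201C.75\u201D ;";
console.log(straightenQuotes(pasted));
// if compress(q23) = "3/4" then q23 = ".75" ;
```

Be a little careful running something like this over a whole file: it will also straighten any curly quotes that are legitimately part of your data or text.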

How to tell if your quotations are a problem


SAS Studio is color-coded.
Note that the first two lines have the values shown in purple.

[Screenshot: color-coded code in SAS Studio]

The next two lines don’t. If you look closely, those are the evil curly quotes. If you realize this, you can tell at a glance if there is a problem with your code.

Getting rid of text

Okay, I replaced the evil curly quotes, but I still have a problem. The questions are things like,

“What is the area of this shape in square feet”, and let’s say the answer is 240.

Students answer all kinds of variations of that, like:

  • 240 square feet
  • The answer is 240
  • 240 sq ft

All of these answers are correct, but if I just compared them to 240, they would not be equal and would be marked wrong. Enter the COMPRESS function.

q3 = compress(Q3,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','l');

The above statement will remove all alphabetic characters from the answer, leaving the digits along with any spaces or punctuation.

The COMPRESS function has three arguments –

  • the source, which is the variable you want modified, in this case, Q3,
  • the characters you want removed (or kept, if you use the ‘k’ modifier; the default is removal),
  • an optional modifier

In my case, I used the modifier ‘l’ – that is a lower-case L, not a number 1 – because I wanted all of those letters removed if they were lower-case, too. So, I don’t have to type all of the letters of the alphabet twice.

Getting rid of special characters

You can also use the COMPRESS function to get rid of special characters. Say the question is “If tickets are normally $100 and tickets are 50% off, how much does it cost Cassandra for a ticket to the Dead Fleas concert?” Students will enter answers like $50 or 50. To get rid of the $, simply do this:

q1 = compress(q1, '$') ;
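Since we also score answers in our JavaScript games, here is roughly how the same clean-up could look there. This is a stand-in for the two COMPRESS calls above, not SAS itself, and `cleanAnswer` is a name made up for this example.

```javascript
// Rough JavaScript stand-in for the two COMPRESS calls above (not SAS itself).
function cleanAnswer(answer) {
  return answer
    .replace(/[A-Za-z]/g, "") // drop upper- and lower-case letters
    .replace(/\$/g, "")       // drop dollar signs
    .trim();                  // drop leftover leading and trailing spaces
}

console.log(cleanAnswer("240 square feet"));   // "240"
console.log(cleanAnswer("The answer is 240")); // "240"
console.log(cleanAnswer("$50"));               // "50"
```

The trim at the end matters: removing the letters from “The answer is 240” leaves three spaces in front of the number.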

When I’m not teaching statistics or writing about SAS, I’m making video games. We’re doing a Kickstarter campaign to make our bilingual games available everywhere and if you backed us, that would be AMAZING! Plus you will get cool prizes.

A while back, I wrote a post on getting your Excel data into SAS Studio the quick and easy way. However,  I hear you saying,

What about ME? What about MY needs? What if I don’t want my data written to the working directory? What if my file has the names at the top and I want to keep those names?

First of all, open a program file and run some code that assigns the LIBNAME to the directory where you want your data stored. It should look like this but whatever is in the quotation marks should be where your data are stored.

LIBNAME mydata "/courses/d1234566789" ;
run;

Second, upload your Excel File


Under FILES, select the folder where you would like your data stored. Click on the UPLOAD FILES button (the arrow pointing up at the top of the screen) and then click CHOOSE FILES to go to where the file is stored on your computer. Select that file, then click the button on the pop-up window that says UPLOAD. Now you have your Excel file uploaded, but you want a SAS file.

Go under TASKS and UTILITIES and click the arrow to select UTILITIES and then select IMPORT DATA.

 


On the right, you’ll see this big window that says DRAG AND DROP YOUR FILE HERE.


In the left pane, open the FILES directory and go to where you saved your Excel file. Drag it into the window. Once you’ve done that, this wi If you stopped here, you would have the file written to the working directory, and named import.


If you want to change that, click the button that says CHANGE.

[Screenshot: changing default name in boxes]

This pops up. Don’t see the directory you want? Did you run the LIBNAME statement at the very beginning of this post to assign a library reference to that directory? For shame! You think I just make this stuff up? Go back and do it now.

Okay, should you be concerned that your library name is greyed out? No, you should not. That just means you cannot change the name of your library reference here. If you wanted to change that library name from “mydata” to “yourdata” you’d have to do it in the LIBNAME statement.

Type the name you want for the data set. Do not forget to click SAVE or you may as well have skipped this step.

Click the little running guy at the top of the window.

Before you go, notice that SAS also generates code for you. If, like me, you anticipate that your data may change and you may need to do this again, you can copy and paste the code generated by SAS and save it in a program file. Run it again to recreate your data set. How likely is that to happen?  Well, it happened to me today when I inadvertently (that’s a synonym for “stupidly”, right?) wrote over this exact data set.

/* Generated Code (IMPORT) */
/* Source File: az_pretest.xlsx */
/* Source Path: /home/annmaria.demars/data_analysis_examples/data2017 */
/* Code generated on: 7/31/17, 6:09 PM */

%web_drop_table(MYDATA.aztech_pre);
FILENAME REFFILE '/home/annmaria.demars/data_analysis_examples/data2017/az_pretest.xlsx';

PROC IMPORT DATAFILE=REFFILE
DBMS=XLSX
OUT=MYDATA.aztech_pre;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=MYDATA.aztech_pre; RUN;
%web_open_table(MYDATA.aztech_pre);
run;

Okay, there you go. With a few clicks, your Excel file is accessible in SAS Studio as a SAS data set and you have a copy of the code that did it.

Next post we’ll start whipping that data into shape.

When I am not writing about SAS, I’m making games that teach math, social studies and language.

Check them out.

screen shots from our games

 

 

Where is the Multivariate Analysis of Variance?

You promised there would be MANOVA ! Now we’re in the third post!

First there was recoding of variables.

Then, there was creating scales. 

Now, we’re looking at reliability.

Patience is a virtue.

Before we get to doing a MANOVA we want to be sure that our dependent and independent variables are reliable and valid. Let’s move on to reliability.

I’m going to do a correlation matrix and a Cronbach alpha, which is a measure of internal consistency. The rationale is that if items all measure the same construct – say, knowledge of health practices, or autonomy or acceptance of wife beating – then those items should be related to one another. An alpha of 0 would indicate the covariances among the items in the scale are zero, so, your scale sucks. An alpha of .95 would mean your scale is amazingly consistent.
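If you want to see what is inside the alpha, the raw coefficient is k/(k − 1) times one minus the ratio of the summed item variances to the variance of the total score, where k is the number of items. Here is a minimal sketch of that arithmetic in JavaScript with made-up data. It assumes no missing values, and the function names are mine, not anything from SAS.

```javascript
// Sample variance (divides by n - 1).
function variance(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const ss = values.reduce((a, v) => a + (v - mean) ** 2, 0);
  return ss / (values.length - 1);
}

// rows: one array of item scores per respondent, no missing values.
function cronbachAlpha(rows) {
  const k = rows[0].length; // number of items
  let itemVarianceSum = 0;
  for (let j = 0; j < k; j++) {
    itemVarianceSum += variance(rows.map((r) => r[j]));
  }
  const totals = rows.map((r) => r.reduce((a, b) => a + b, 0));
  return (k / (k - 1)) * (1 - itemVarianceSum / variance(totals));
}

// Made-up data: every respondent answers all three items identically,
// so the items are perfectly related and alpha comes out at (about) 1.
const consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5]];
console.log(cronbachAlpha(consistent));

// Made-up data: two items with zero covariance, so alpha comes out at (about) 0.
const unrelated = [[1, 1], [1, -1], [-1, 1], [-1, -1]];
console.log(cronbachAlpha(unrelated));
```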

So, I did three analyses for my three scales

Title "Health Variables " ;
proc corr data=example alpha ;
var hbs1 hbs3-hbs7 ;
run ;

Title "Wife beating variables" ;
proc corr data=example alpha ;
var GR34 - GR39 ;
run ;

Title "Decision Variables" ;
proc corr data=example alpha ;
VAR D_GR1A GR2A D_GR3A D_GR4A GR5a GR6A D_GR7A GR8A
D_GR9A GR9F D_GR10A D_GR12A GR10F GR12F ;
run ;

Let’s skip the simple statistics, mean, etc. you get from these analyses and go to the alpha.

The alpha for the health scale is pretty bad. The value for the raw scores is .31, for standardized items, still really bad at .32. When we look at how deleting a variable would improve the alpha, if we dropped the first variable, the alpha would go up to .34 – but that is still awful.

For the wife-beating scale, alpha was .81 for both the raw and the standardized values. So, that one was pretty good as far as reliability goes.

I put all of the decision variables together, the ones on whether the woman was involved in making decisions, could go places on her own, needed to ask permission to go places. The Cronbach alpha for the raw variables was .65, for standardized variables .81. Note that standardized variables are placed on the same metric, so my idea of some variables being much more important than others did not pan out.

So … I standardized the variables, then I read in that data set and created two scales, one that was a sum of the decision variables and the other that was the mean of the 6 wife-beating variables. There was no particular reason for using the mean of the six variables as opposed to just adding them up. I did both methods to show it was an option.
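In case “standardized” is fuzzy: each value has the variable’s mean subtracted and is then divided by the variable’s standard deviation, so every variable ends up with a mean of 0 and a standard deviation of 1. A minimal JavaScript sketch of that arithmetic, with made-up numbers and a function name of my own choosing:

```javascript
// Standardize to mean 0, standard deviation 1 (sample SD, dividing by n - 1).
function standardize(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const sd = Math.sqrt(values.reduce((a, v) => a + (v - mean) ** 2, 0) / (n - 1));
  return values.map((v) => (v - mean) / sd);
}

// Two made-up variables on very different scales end up on the same metric.
console.log(standardize([0, 1, 2]));    // [ -1, 0, 1 ]
console.log(standardize([10, 20, 30])); // [ -1, 0, 1 ]
```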

BEWARE THE SUM FUNCTION – Note, I did not use the sum function. If you add up the values, as shown below, and one of the variables has a missing value then the value of the sum is going to be missing. If you used the SUM function, the variables that have non-missing values would be added up, so the missing value would be treated as a zero. There are times where that is acceptable. This is not one of those times.
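Here is the difference spelled out in a small JavaScript sketch that imitates the SAS behavior, with `null` standing in for a SAS missing value. The function names are made up, and this is an imitation for illustration, not how SAS is implemented.

```javascript
// Emulating SAS semantics in JavaScript, with null standing in for a missing value.
function addWithPlus(values) {
  // Like a + b + c in SAS: any missing value makes the result missing.
  return values.some((v) => v === null) ? null : values.reduce((a, b) => a + b, 0);
}

function sumFunction(values) {
  // Like SUM(of ...) in SAS: missing values are skipped.
  const present = values.filter((v) => v !== null);
  return present.length === 0 ? null : present.reduce((a, b) => a + b, 0);
}

const oneMissing = [1, null, 1, 0];
console.log(addWithPlus(oneMissing)); // null
console.log(sumFunction(oneMissing)); // 2
```

With the plus signs, the respondent who skipped a question gets no scale score at all. With the SUM function, she gets a 2 that looks exactly like the 2 of someone who answered every question.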

While I’m at it, I want to check whether the scales have approximately normal distributions. A perfectly normal distribution would have skewness and kurtosis values of 0.

/* No VAR statement, so all numeric variables are standardized */
proc standard data=example mean=0 std=1 out=MAN_data;
run ;

Data create_manova ;
set man_data ;
* I could have used the mean function here, but I didn't ;
decision = D_GR1A + GR2A + D_GR3A + D_GR4A + GR5a + GR6A + D_GR7A + GR8A +
D_GR9A + GR9F + D_GR10A + D_GR12A + GR10F + GR12F ;
beating = mean(of gr34-gr39);
run ;

proc univariate data=create_manova ;
var decision beating ;
run ;

The skewness values were relatively low, -1.3 and 0.2 for the two scales, and the kurtosis values were 2.0 and -1.2. Since my scales aren’t a radical departure from normality, I’m now going on to MANOVA – finally!
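For anyone curious what skewness and kurtosis are measuring, here is a minimal JavaScript sketch using the plain moment formulas: the third and fourth central moments divided by the appropriate power of the variance, with 3 subtracted from the kurtosis so that a normal distribution scores 0. SAS applies small-sample adjustments to its versions, so do not expect these numbers to match PROC UNIVARIATE output exactly. The data are made up.

```javascript
// Plain moment-based skewness and excess kurtosis (no small-sample adjustment).
function shapeOf(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  // p-th central moment: average of (value - mean) raised to the power p
  const moment = (p) => values.reduce((a, v) => a + (v - mean) ** p, 0) / n;
  const m2 = moment(2);
  return {
    skewness: moment(3) / Math.pow(m2, 1.5),
    excessKurtosis: moment(4) / (m2 * m2) - 3,
  };
}

console.log(shapeOf([1, 2, 3, 4, 5]));  // symmetric data: skewness is 0
console.log(shapeOf([1, 1, 1, 2, 10])); // long right tail: skewness is positive
```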


Last time, we saw how to recode variables to score answers correct or incorrect, on a rating scale and weighted by importance. Today, we’re going to look at creating some scales from those variables because for reasons I’m sure I have written about at some point in the past, single items are usually not very reliable. Whether you use SAS, SPSS, R or any other statistical package, you are still going to need to follow the steps of recoding your variables and creating and validating your scales before you get into MANOVA. Or, at least, you will if you are smart.

First, I want to check that there are no obvious errors or other problems in my data.

PROC MEANS DATA=example ;
VAR gr2A -- gr39 hbs1 --d_gr12a ;
RUN ;

You could type in the variable names but that is a lot of typing. The double dashes mean to include all variables in the data set, in order, from the variable named before the dashes to the one named after the dashes. How do you know what order the variables are in? Click on the OUTPUT DATA tab at the top and look to the left under COLUMNS.

If you didn’t just run a program creating your data and hence don’t have an OUTPUT DATA tab, you can find your data file by clicking the MY LIBRARIES tab and then clicking on the library (directory) where your data are kept and clicking on the dataset to open it. You can also use the PROC CONTENTS procedure but today we are being all pointy and clicky with SAS Studio.

Sometimes you will see something like:

VAR item1-item12 ;

The single dash is used for variables that end in a number and if you don’t have item1, item2 all the way through item12, it will give you an error and not run. Then you will be sad.
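If the difference between the two kinds of lists is not obvious, this JavaScript sketch imitates them. It is an illustration of the idea, not how SAS works internally. The function names and the column order are made up, and unlike SAS it treats names as case-sensitive.

```javascript
// columns: variable names in the order they appear in the data set.

// Like first -- last: everything between the two names, by position.
function positionalRange(columns, first, last) {
  const start = columns.indexOf(first);
  const end = columns.indexOf(last);
  if (start === -1 || end === -1 || start > end) {
    throw new Error("Both names must exist, in data set order");
  }
  return columns.slice(start, end + 1);
}

// Like item1-item12: every numbered name must exist, or it is an error.
function numberedRange(columns, prefix, from, to) {
  const names = [];
  for (let i = from; i <= to; i++) {
    const name = prefix + i;
    if (!columns.includes(name)) throw new Error(name + " not found");
    names.push(name);
  }
  return names;
}

// Hypothetical column order, for illustration only.
const columnOrder = ["id", "gr2A", "gr5a", "gr39", "item1", "item2", "item3"];
console.log(positionalRange(columnOrder, "gr2A", "gr39")); // [ 'gr2A', 'gr5a', 'gr39' ]
console.log(numberedRange(columnOrder, "item", 1, 3));     // [ 'item1', 'item2', 'item3' ]
```

Asking for item1 through item4 with these columns throws an error, which is the JavaScript version of being sad.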

PROC MEANS will give you the N, mean, standard deviation, minimum and maximum.

Here are a few things to consider.

  • Is the N substantially less than you had expected? If so, you have a lot of missing data and you should investigate that. The lowest N I have is 37,814 out of 39,430 people so not bad, but I might want to look at that one item, since most of the items have close to 39,000 for an N.
  • Is your standard deviation zero? STOP RIGHT THERE! On just what variable could 39,000 people give the same response? This likely shows a big problem with your data. I did not have that problem, so I continued.
  • Are your minimum and maximum the minimum and maximum possible scores for the item? Now, this may not always be the case. On a scale of 1 to 10, say, with a sample of 50 people, maybe no one will say 1. However, I have over 39,000 people and the items are 0 or 1, 0-2 or 1-3, so I should have people from the minimum to the maximum or something is wrong. Nothing is wrong, and I continue.
  • Are the means about what you expect? Well, I’m not really an expert on social structure and family relations in India, so I can’t say. About a third of the women said it was usual for a husband to beat his wife if her dowry was not what was expected. About three-fourths said they would be allowed to visit a family or friend’s home alone.
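The first three checks are mechanical enough that you could automate them. Here is a sketch in JavaScript, with `null` standing in for a missing value. The function name is made up, and the 90% cut-off for “substantially less” is an arbitrary assumption for the example, not a rule. The fourth check, whether the means make sense, still needs a human who knows the subject.

```javascript
// Flag the problems described above for one variable (assumes at least two
// non-missing values). expectedMin / expectedMax are the possible score limits.
function checkVariable(values, expectedN, expectedMin, expectedMax) {
  const present = values.filter((v) => v !== null);
  const n = present.length;
  const mean = present.reduce((a, b) => a + b, 0) / n;
  const sd = Math.sqrt(present.reduce((a, v) => a + (v - mean) ** 2, 0) / (n - 1));
  const min = present.reduce((a, b) => Math.min(a, b));
  const max = present.reduce((a, b) => Math.max(a, b));

  const warnings = [];
  // The 90% threshold is an arbitrary choice for this sketch.
  if (n < 0.9 * expectedN) warnings.push("many missing values");
  if (sd === 0) warnings.push("standard deviation is zero");
  if (min !== expectedMin || max !== expectedMax) {
    warnings.push("observed range is not " + expectedMin + " to " + expectedMax);
  }
  return { n, mean, sd, min, max, warnings };
}

// Made-up data for a 0/1 item.
console.log(checkVariable([0, 1, 1, 0, 1, 0], 6, 0, 1).warnings);       // []
console.log(checkVariable([1, 1, 1, null, null, 1], 6, 0, 1).warnings); // three warnings
```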

Okay, so my results from the means procedure look okay. Now what?

Next, I’m going to do a factor analysis to see if my supposition is supported of three scales related to health, beating your wife and autonomy.

Here is the code for my factor analysis.

PROC FACTOR DATA=example SCREE ROTATE=VARIMAX NFACTORS=5 ;
VAR gr2A -- gr39 hbs1 --d_gr12a ;
RUN ;

This is actually the second one I ran. In inspecting the results for the first, between the eigenvalues and scree plot, I decided that at most I should retain five factors. I’ve written a lot about factor analysis on this blog previously, so I’m not going to go into detail here. In short, the decision-making variables mostly loaded on the first factor with factor loadings of .70 and higher. The median communality estimate for those items was about .67. So, there is considerable evidence for a decision-making factor. The wife-beating variables loaded on the second factor. All but one loaded above .67, and even that variable (Beating your wife if she had an extramarital affair – which 84% of the women said was accepted in their communities) loaded at .40. The variables regarding needing permission to go places loaded on the third factor and also had high communality estimates. The variables regarding going places by yourself loaded on the fourth factor and also had high communality estimates.

The health variables were a different story. Four out of six loaded between .47 and .67 on the fifth factor. The other two did not load on any factor.

At this point, it is starting to look like it is okay to retain the wife-beating items as a scale. The various measures of autonomy (decision-making, going places on your own and needing permission) seem to hang together within factors. I think it would be reasonable to put all three of these together in one scale. I have talked about parceling in the past, and I could have done that as a step here and then re-run the factor analysis to support (or not) my supposed autonomy factor. Since I have limited time and am simply doing this analysis for educational and illustrative purposes, I skipped over this to the next procedure, which is reliability analysis.

Since this post is pretty long already, I’ll save that for the next post.

When I am not writing about statistics, I’m making games that teach math, social studies and language.

Check them out.


Other people want to go see the new Wonder Woman movie. I’ve been wanting to talk about MANOVA, but first, we need some decent dependent and independent measures.

I have the India Human Development Survey data on over 39,000 women and my hypothesis is that education is related to women’s rights issues, especially autonomy, knowledge of health practices and domestic violence. I also think that mobility might be related, as women who get out of their native village might be exposed to new ideas.

Before I can test out my (supposedly) brilliant hypotheses, I need to create some variables because it turns out when they were collecting data in India in 2011 they were not thinking about my convenience. (Yes, I, too, am appalled by this lack of consideration.)

Independent Variables

First, I will need to create my independent variables from

EW11 Differences in family by mobility

1= same village/ town

2= another village

3 = another town

4 = metro (since only 1% fall in here, I’m going to delete this category)

and education (see below)

Items that will go into dependent variables (maybe)

HEALTH QUESTIONS

HB1 Milk harmful

HB3. 1st milk good for baby 

Hb4 chulha smoke good

Hb5 child diarrhea drink more

Hb6 illness spread through water

Hb7 malaria spread

DECISIONS

The items below are scored 1 if the respondent decides, 0 if the respondent does not decide. (More than one person can decide, so if both husband and wife decide, the answer will be 1 for both. In this case, I just looked at whether the wife had a say in the decision.)

  • GR1a Cooking
  • GR2A Expensive purchases.      
  • GR3A Decides number of children
  • GR4A Decides what to do if sick
  • GR5A Decides whether to buy land  
  • GR6A Decides wedding expense
  • GR7A Decides if child is sick
  • GR8A Decides who your children should marry

The items below are scored 1 if the woman is allowed to do these things alone and 0 if she is not.

  • GR9F Can visit health center alone
  • GR10F Can visit relative/ friend alone
  • GR12F. Can go short distance alone

These items relate to whether the woman needs to ask permission for activities, with  0 = no, 1 = must inform someone and 2 = yes

  • GR9A Ask permission to visit health center
  • GR10A Ask permission to visit relative
  • GR12A. Ask permission to travel by bus/train

 

WIFE BEATING QUESTIONS

GR34 – GR39  – All of these relate to under what circumstances it is acceptable, coded yes = 1 or 0 = no.

As you can see, well, I hope you can see, each of these presents a different data re-coding problem.

  • Mobility and education need to be coded into categories (there is a minor reason I will explain in a later post why this is not necessary but convenient), with the fourth category of mobility deleted.
  • Health questions need to be scored as correct or incorrect.
  • Decision questions are all scored equally – so deciding what food  to cook and how many children you have are each scored a 1. I think that’s not right and I want to weight some decisions more than others.
  • Independence questions need to be reverse coded, so that not having to ask permission gets the highest score and having to ask permission gets the lowest.
  • Wife-beating questions need no recoding

So … here we go. The first thing we’re going to do is create categories. Notice I don’t do anything with the category 4 for mobility, so those people will just have a missing value for MOBILITY and be dropped from the analysis.

Also, a note on ELSE as opposed to just IF statements.

I could just use all IF statements but that would be inefficient. It doesn’t really matter here with 39,000 records but if I had millions it would slow down processing. The ELSE statement is only processed if the preceding IF statement is false.

NOTE!!!  In the second set of IF- ELSE statements, I have

else if ew8 < 9 and ew8 ne . then education = "ELEM" ;

This statement is only executed IF the preceding IF statement was false. Without the ELSE, everything less than 9, including those who had 0 years of education, would be set to ELEM. Without the "and ew8 ne ." condition in this statement, anyone who had missing data would be set to ELEM along with anyone who had 1-8 years of education, because SAS treats a missing numeric value as smaller than any number.
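
If you want to see both problems for yourself, here is a minimal sketch. The data set demo and its four values are made up for illustration; only the variable name ew8 and the category labels come from the real code. CARELESS is built with plain IF statements and no check for missing, CAREFUL with the ELSE and the missing check.

```sas
/* Hypothetical mini data set: four made-up values of ew8, one per line. */
data demo ;
   input ew8 ;
   length careless careful $ 4 ;

   /* No ELSE and no missing-value check */
   if ew8 = 0 then careless = "NONE" ;
   if ew8 < 9 then careless = "ELEM" ;
   if ew8 > 8 then careless = "HS +" ;

   /* ELSE plus the check for missing */
   if ew8 = 0 then careful = "NONE" ;
   else if ew8 < 9 and ew8 ne . then careful = "ELEM" ;
   else if ew8 > 8 then careful = "HS +" ;
   datalines ;
0
5
12
.
;
run ;

proc print data=demo ;
run ;
```

In the printed output, the record with 0 years and the record with a missing value should both show ELEM for CARELESS, while CAREFUL shows NONE for the first and is left blank for the second.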


data example ;
set mydata.india ;
If EW11 = 1 then Mobility = "None" ;
else if EW11 = 2 then mobility = "Vill" ;
else if EW11 = 3 then mobility = "TOWN" ;

if ew8 = 0 then education = "NONE" ;
else if ew8 < 9 and ew8 ne . then education = "ELEM" ;
else if ew8 > 8 then education = "HS +" ;

*** The statements below recode the health items ;

*** For hb1 the correct answer is 0, so  1-hb1   will score respondents who said 0 as correct (= 1) and those who said 1 as incorrect (=0);

*** For hb3 the correct answer is 1, so respondents who said 1 are scored as correct (= 1) and those who said any number higher than 1 as incorrect (=0);

*** For hb4 – hb7, the correct answer is scored as correct (=1) and any numbers in the incorrect set scored as incorrect (=0);
*** HEALTH QUESTIONS ;
hbs1 = 1- hb1 ;

If hb3 = 1 then hbs3 = 1 ;
Else if hb3 > 1 then hbs3 = 0 ;
If hb4 = 2 then hbs4 = 1 ;
Else if hb4 in (1,3) then hbs4 = 0 ;
If hb5 = 2 then hbs5 = 1 ;
Else if hb5 in (1,3,4) then hbs5 = 0 ;
If hb6 = 2 then hbs6 = 1 ;
Else if hb6 in (1,3,4) then hbs6 = 0 ;

If hb7 = 3 then hbs7 = 1 ;
Else if hb7 in (1,2,4) then hbs7 = 0 ;

 

/* DECISION QUESTIONS */
/* ALSO INCLUDES ADDITIONAL ITEMS NOT RECODED */

**** Here, I multiplied items by a factor based on my estimation of importance ;
D_GR1A = GR1A * 0.5 ;
D_GR3A = GR3A * 10 ; * BECAUSE I THINK IT IS IMPORTANT ;
D_GR4A = GR4A * 2 ;
D_GR7A = GR7A * 2 ;

**** These items are subtracted from 3 to reverse them. With the raw coding of 0, 1 and 2 ;

**** does not have to tell anyone = 3, needs to inform someone = 2 and needs to ask permission = 1 ;
D_GR9A = 3 - GR9A ;
D_GR10A = 3 - GR10A ;
D_GR12A = 3 - GR12A ;

**** KEEPS THE VARIABLES I PLAN TO USE ;
Keep EW8 EW5  Ew6 EW10  EW14a   EW12a EW12b
HBS1 HBs3-HBS7 D_GR1A GR2A D_GR3A D_GR4A GR5a GR6A D_GR7A GR8A
D_GR9A GR9F D_GR10A D_GR12A GR10F GR12F GR34-GR39 mobility education ;
run ;

So, there we go. You might think I would dive into a Multivariate Analysis of Variance now but you would be wrong. The next thing I am going to do is check the validity of my scales through a combination of factor analysis, univariate statistics and reliability analysis. Only after  that step will I do the MANOVA.
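
Before any of that, it is worth a quick look at whether the recoding did what it was supposed to do. The step below is only a sketch of one way to do that, not something from the original analysis. It uses ew8 because that variable is in the KEEP list; EW11 is not, so MOBILITY can only be tabulated on its own.

```sas
/* Sketch only: checks the new categories against the raw values.    */
/* MISSING includes missing values as a level in the tables, so      */
/* anyone who fell through the IF-ELSE logic shows up in the output. */
proc freq data=example ;
   tables ew8*education / list missing ;
   tables mobility / missing ;
run ;
```

Each value of ew8 should appear with exactly one education category, and missing ew8 should appear only with a blank category.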

I’m teaching a course on multivariate statistics and for some of the students it’s been a minute since their last inferential statistics course.

So, I have been doing a few videos here and there to refresh, for example, what is a repeated measures ANOVA and why you might want to do it.

 

Sometimes I use repeated measures ANOVA to test whether our games are effective in improving math scores (they are!). You can check out the games here.


If you are interested in being a beta tester for our first bilingual game that teaches statistics, please email info@7generationgames.com

Since I had done a few YouTube videos on using SAS Studio, I thought I would add them to my blog. This one uses the Characterize Data task to take a quick look at the data, but I suppose you could have guessed that from the title.

 

Support my day job AND get smarter. Buy Fish Lake for Mac or Windows. Brush up on math skills and canoe the rapids.


For random advice from me and my lovely children, subscribe to our YouTube channel, 7GenGames TV.
