Where we left off, I had created some parcels and was going to do a factor analysis later. Now, it’s later. If you’ll recall, I had not found any items that correlated significantly with the food item and also made sense conceptually. For example, it correlated highly with attending church services, but that didn’t really have any theoretical basis. So, I left it as a single variable. Here is my first factor analysis.

proc factor data= parcels rotate= varimax scree ;
var socialp1 - socialp3 languagep spiritualp spiritual2 culturep1 culturep2 food ;
run ;

You can see from the scree plot here that there is one factor way at the top of the chart with the rest scattered at the bottom. Although the minimum eigenvalue of 1 criterion would have you retain two factors, I think that is too many, for both logical and statistical reasons. The eigenvalues of the first two factors, by the way, were 4.74 and 1.10.

[Image: scree plot]
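To put those two eigenvalues in perspective, here is a small sketch of the arithmetic. It assumes the PROC FACTOR defaults (principal component extraction from the correlation matrix), in which case the total variance equals the number of variables, 9 in this analysis. The variable names in the data step are made up for illustration.

```sas
/* Share of total variance for each of the first two factors.      */
/* Assumption: correlation matrix analyzed, so total variance = 9. */
data _null_;
   nvars  = 9;       /* parcels and items in the VAR statement */
   eigen1 = 4.74;    /* first eigenvalue reported above        */
   eigen2 = 1.10;    /* second eigenvalue reported above       */
   prop1  = eigen1 / nvars;
   prop2  = eigen2 / nvars;
   ratio  = eigen1 / eigen2;
   put prop1= percent8.1 prop2= percent8.1 ratio= 6.2;
run;
```

The first factor accounts for a little over half of the total variance (4.74 / 9) and the second for about 12% (1.10 / 9), so the first eigenvalue is more than four times the second.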

Even if you aren’t really into statistics or factor analysis, I hope that this pattern is pretty clear. You can see that every single thing except for the item related to food loads predominantly on the first factor.

[Image: factor pattern]

The median factor loading was .79, and the factor loadings ranged from .49 to .83.

These results are interesting in light of the discussion on small sample size. If you didn’t read it, the particular quote in there that is relevant here is

“If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used.”

Final Communality Estimates: Total = 5.845142

Variable      Communality
socialp1      0.67438366
socialp2      0.72223020
socialp3      0.64287274
languagep     0.80080260
spiritualp    0.34260318
spiritual2    0.46790413
culturep1     0.70885380
culturep2     0.69821549
food          0.78727573

These communality estimates are also relevant but it is nearly 1 am and I have to be up at 6:30 for a conference call, so I’ll ramble on about this some more next time.

First of all, what are parcels? Not the little packages your grandma left on the table in the hall when she came back from shopping. Well, not only that.

In factor analysis, parcels are simply the sum of a small number of items. I prefer using parcels when possible because both basic psychometric theory and common sense tell me that a combination of items will have greater variance and, c.p., greater reliability than a single item.

Just so you know that I learned my share of useless things in graduate school, c.p. is Latin for ceteris paribus, which translates to “other things being equal”. The word “etcetera”, meaning “and other things”, has the same root.

Now you know. But I digress. Even more than usual. Back to parcels.

As parcels can be expected to have greater variance and greater reliability, harking back to our deep knowledge of both correlation and test theory, we can assume that parcels would tend to have higher correlations than individual items. As factor loadings are simply correlations of a variable (be it item or parcel) with the factor, we would assume that, c.p. (there it is again), factor loadings of parcels would be higher.
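If you would rather see that than take my word for it, here is a minimal simulation sketch. Everything in it is hypothetical: four made-up items that each equal one common factor plus independent noise of the same variance, summed into two parcels. Under that setup the expected correlation between two single items is .50 and between the two parcels is about .67. The sample values will wobble around those numbers.

```sas
/* Simulated (not real) data: four items driven by one common factor */
data sim;
   call streaminit(12345);
   do id = 1 to 1000;
      f = rand('normal');               /* common factor      */
      item1 = f + rand('normal');       /* item = factor + noise */
      item2 = f + rand('normal');
      item3 = f + rand('normal');
      item4 = f + rand('normal');
      parcel_a = item1 + item2;         /* two-item parcels   */
      parcel_b = item3 + item4;
      output;
   end;
   drop f;
run;

/* Compare the item-to-item correlation with the parcel-to-parcel one */
proc corr data=sim nosimple;
   var item1 item3 parcel_a parcel_b;
run;
```

Each parcel carries twice the common factor but noise that only partly adds up, which is why the parcel correlation comes out higher than the item correlation.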

Jeremy Anglim, in a post written several years ago, talks a bit about parceling and concludes that it is less of a problem in a case, like today, where one is trying to determine the number of factors. Actually, he was talking about confirmatory factor analysis but I just wanted you to see that I read other people’s blogs.

The very best article on parceling was called To Parcel or Not to Parcel and I don’t say that just because I took several statistics courses from one of the authors.

 

To recap this post and the last one:

I have a small sample size and, due to the unique nature of a very small population, it is not feasible to increase it by much. I need to reduce the number of items to an acceptable subjects-to-variables ratio. The communality estimates are quite high (over .6) for the parcels. My primary interest is in the number of factors in the measure and finding an interpretable factor.

So… here we go. The person who provided me the data set went in and helpfully renamed the items that were supposed to measure socializing with people of the same culture ‘social1’, ‘social2’ etc, and renamed the items on language, spirituality, etc. similarly. I also had the original measure that gave me the actual text of each item.

Step 1: Correlation analysis

This was super-simple. All you need is a LIBNAME statement that references the location of your data and then:

PROC CORR DATA = mydataset ;
VAR firstvar -- lastvar ;
RUN ;

In my case, it looked like this

PROC CORR DATA = in.culture ;
VAR social1 -- art ;
RUN ;

The double dash is interpreted as ‘all of the variables in the data set located from firstvar to lastvar’. This saves you typing if you know all of your variables of interest are in sequence. I could have just used a single dash if they were named the same, like item1-item17, and then it would have used all of the variables named that, regardless of their location in the data set. The problem I run into there is knowing what exactly item12 is supposed to measure. We could discuss this, but we won’t. Back to parcels.
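A tiny sketch of the difference, using a made-up data set (demo, with a variable named age deliberately stored between item1 and item2):

```sas
/* Hypothetical data: variables are stored in the order item1, age, item2, item3 */
data demo;
   input item1 age item2 item3;
   datalines;
1 30 2 3
2 41 3 1
3 25 1 2
4 38 4 4
;
run;

/* Double dash: every variable positioned from item1 through item3, so age is included */
proc means data=demo n mean;
   var item1 -- item3;
run;

/* Single dash: only the numbered series item1, item2, item3, wherever they are stored */
proc means data=demo n mean;
   var item1 - item3;
run;
```

The first PROC MEANS reports four variables and the second reports three, which is worth checking before you trust either shortcut on a data set someone else built.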

Since you want to put together items that are related both conceptually and empirically (that is, the things you think should correlate actually do), you first want to look at the correlations.

Step 2: Create parcels

The items that were expected to assess similar factors tended to correlate from .42 to .67 with one another. I put these together in a very simple data step.

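data parcels ;
set out.factors ;
/* Each parcel is the sum of two or three items; */
/* a + sum is missing if any item in it is missing */
socialp1 = social1 + social5 ;
socialp2 = social4 + social3 ;
socialp3 = social2 + social6 + social7 ;
languagep = language2 + language1 ;
spiritualp = spiritual1 + spiritual4 ;
/* note: social2 also appears in socialp3 above */
culturep1 = social2 + dance + total ;
culturep2 = language3 + art ;
run ;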

There was one item that asked how often the respondent ate food from the culture, and there didn’t seem to be a justifiable reason for putting it with any other item in the measure.

Step 3: Conduct factor analysis

This was also super-simple to code. It is simply

proc factor data= parcels rotate= varimax scree ;
var socialp1 - socialp3 languagep spiritualp spiritual2 culturep1 culturep2 ;
run ;

I actually did this twice, once with and once without the food item. Since it loaded by itself on a separate factor, I did not include it in the second analysis. Both factor analyses yielded two factors that every item but the food item loaded on. It was a very nice simple structure.

Since I have to get back to work at my day job making video games, though, that will have to wait until the next post, probably on Monday.

—–

Be more than ordinary. Take a break. Play Forgotten Trail. I bet you have a computer!

Learn and have fun. More productive than fruit crush, candy ninja or whatever the heck else it is you or your kids are playing.

Someone handed me a data set on acculturation that they had collected from a small sample of 25 people. There was a good reason that the sample was small – think African-American presidents of companies over $100 million in sales or Latina neurosurgeons. Anyway, small sample, can’t reasonably expect to get 500 or 1,000 people.

The first thing I thought about was whether there was a valid argument for a minimum sample size for factor analysis. I came across this very interesting post by Nathan Zhao where he reviews the research on both a minimum sample size and a minimum subjects-to-variables ratio.

Since I did the public service of reading it so you don’t have to, (though seriously, it was an easy read and interesting), I will summarize:

  1. There is no evidence for any absolute minimum number, be it 100, 500 or 1,000.
  2. The minimum sample size depends on the number of variables and the communality estimates for those variables.
  3. “If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used.”
  4. There should be at least three measured variables per factor and preferably more.

This makes a lot of sense if you think about factor loadings in terms of what they are, correlations of an item with a factor. With correlations, if you have a very large correlation in the population, you’re going to find statistical significance even with a small sample size. It may not be precisely as large as your population correlation, but it is still going to be significantly different from zero.
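As a rough illustration of that point, here is a sketch that applies the usual t test for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r**2), with n = 25. Treating a factor loading like an ordinary correlation is a simplification on my part, and the three values of r are just examples.

```sas
/* Two-tailed p-values for sample correlations of different sizes, n = 25 */
data _null_;
   n = 25;
   do r = 0.3, 0.6, 0.8;
      t = r * sqrt(n - 2) / sqrt(1 - r**2);     /* t statistic with n - 2 df */
      p = 2 * (1 - probt(abs(t), n - 2));       /* two-tailed p-value        */
      put r= 4.2 t= 6.2 p= 8.4;
   end;
run;
```

With 25 people, a correlation of .30 does not reach the .05 level, while correlations of .60 and .80 clear it easily. That is the intuition behind being willing to interpret loadings above .60 in a small sample.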

So … this data set of 25 respondents that I received originally had 17 items. That seemed clearly too many for me.  I thought there were two factors, so I wanted to reduce the number of variables down to 8, if possible. I also suspected the communality estimates would be pretty high, just based on previous research with this measure.

Here is what I did next :

  • Parceled
  • Parallel analysis
  • Factor Analysis
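For anyone who has not met parallel analysis before, here is a minimal sketch of the general idea, not the code I ran. It assumes 25 subjects and 8 variables, generates random uncorrelated data of that size many times, and summarizes the eigenvalues you get from pure noise. The data set names, variable names, seed and number of replications are all made up.

```sas
/* 1,000 random data sets, each 25 subjects by 8 uncorrelated normal variables */
data random;
   call streaminit(2024);
   array x{8} x1-x8;
   do rep = 1 to 1000;
      do subject = 1 to 25;
         do j = 1 to 8;
            x{j} = rand('normal');
         end;
         output;
      end;
   end;
   drop j;
run;

/* Eigenvalues of the correlation matrix for every random data set */
proc princomp data=random outstat=stats noprint;
   by rep;
   var x1-x8;
run;

/* Keep only the eigenvalue rows: x1 holds the 1st eigenvalue, x2 the 2nd, ... */
data eigen;
   set stats;
   where _type_ = 'EIGENVAL';
run;

/* Mean and 95th percentile of each eigenvalue across the random data sets */
proc means data=eigen mean p95;
   var x1-x8;
run;
```

The usual decision rule is to retain a factor only if its eigenvalue from the real data is larger than the corresponding eigenvalue from the random data (the mean or the 95th percentile, depending on how strict you want to be).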

I can’t believe I haven’t written at all on parceling before and hardly any on the parallel analysis criterion, given the length of time I’ve been doing this blog. I will remedy that deficit this week. Not tonight, though. It’s past midnight, so that will have to wait until the next post.

Update: read post on parcels and the PROC FACTOR code here

—-

My day job is making games that make you smarter. Check out our latest game, Forgotten Trail. Runs on Mac or Windows in any browser. Be more than ordinary.

In a very random life event, I was asked a lot of questions recently by people exploring making a movie about my life. This is not the interesting part, because in Hollywood people are always talking about making movies that come to nothing …

The interesting thing was how many times the answer to a question was,

Sister Marion, my sixth-grade math teacher.

I was not a very prepossessing child.

“Prepossessing – Having qualities that people like: appealing or attractive.”

In fact, if there was such a word as anti-possessing (which there is not), that would have defined me well. I was short, overweight, often dressed in my brother’s too-big clothes because I was too lazy to look for my own uniform and didn’t care about my appearance. I was also the type of child who knew the definition of words like ‘prepossessing’ and mocked other children, and teachers, if they did not. It probably doesn’t surprise you to hear that I was not wildly popular.

My grades were not the best, partly because I often forgot my homework in the mad rush to get five kids out the door early enough that my mother could make it to work on time. Partly it was because I am EXTREMELY near-sighted, a fact no one discovered until the third or fourth grade (thank you, Lions Club vision screening!) and even after that I usually could not see the board because I could not manage to have a pair of glasses for more than a few weeks without losing them. Glasses were not cheap and my family didn’t have a lot of extra cash so it would usually be months between pairs.

Then, I got the chicken pox and was out of school for a week. Despite all of the bewailing about how stupid today’s children are compared to yesteryear, back then we learned fractions in sixth grade, not fifth, and I had missed the entire week when these were introduced. A petty teacher (and the world has too damn many of those) might have been gratified by the fact that a pain-in-the-ass, know-it-all kid was finally going to be put in her place.

I’d like to think that Sister Marion realized that the only thing I felt I had going for me was being smart and that’s why I had to rub everyone’s face in it. Maybe she realized I needed a friend, and a new perspective.

Whatever it was, she paired me up with another child in the class, Diane, who wasn’t a star student overall, but was very good at math, and told her to explain to me what we had learned while I was out. Not only did I get caught up on fractions, but I learned not to underestimate people based on appearances or first impressions. Just because a person wasn’t a great reader didn’t mean she couldn’t be good at math. Diane and I actually had conversations, and she introduced me to another friend of hers, also named Diane. I called one of the Dianes on the phone – it was the first time I had ever had another kid at school to call – and I was 11.

Sister Marion was nice to me. If you think every teacher is nice to every child then perhaps you need to go back and read the beginning of this post. When I think back, I can only think of two teachers I had before I got expelled from the public school system who were consistently nice to me, Sister Marion and Mr. Cartwright, my 8th grade algebra teacher.

It’s probably no coincidence that I’m good at math and made a career of it.

It’s funny how often when they asked me questions, Sister Marion’s name came up.

Did you have a teacher who you particularly admired?

Was there a teacher who interested you in mathematics?

What made you decide that you wanted to teach?

Who were your role models in life?

I’m not saying that she was the only person who was a role model or who made a difference. However, she was exactly what we try to be at 7 Generation Games – a change in the trajectory that made me shift from doing all right in school with no effort to doing better and better with more effort. She was a person that made me think I could be more than ordinary.

Of course I make an effort to encourage the students who show exceptional effort and ability. Then, I remember Sister Marion and make an extra effort to also encourage students who are annoying, rude, or don’t do their work.

When I think of Sister Marion, I am reminded yet again of the truth of that saying:

I touch the future. I teach.

———-

Want to see what I did with math once I grew up?

Since I already called my mom on Mother’s Day, I thought that I’d talk about another woman who was important in my life, a mentor, who I probably haven’t talked to in 20 years. (I know, I’m such an ungrateful bitch. )

Dr. Jane Mercer was not even in the same department as me. My dissertation was an analysis of the psychometric properties of the Wechsler Intelligence Scale for Children – Revised, Mexicano, and she was a sociologist renowned for her expertise on the impact of social and cultural factors on intelligence test scores.

Shortly after I finished the first draft of my dissertation, my advisor received some distressing news (no, it wasn’t that he was my advisor, he already knew that). He and his wife had begun dating as very young teenagers. Other than his military service during World War II, they had been together ever since. When she was diagnosed with cancer, he walked into the dean’s office and just said, simply,

I can’t.

… And went on sabbatical with about a four-minute notice. 

Everyone completely understood. His colleagues took over committee responsibilities. As his doctoral student who was furthest along, I taught his courses, like inferential statistics.

I was his only doctoral student writing a dissertation, and someone needed to step in to supervise my research. That was Dr. Jane Mercer. 

Not only did she read every draft of my dissertation, recommend articles to read and journals to submit to, and introduce me to people at conferences (not a gesture to be underestimated when one is looking for a position) but, more importantly, she provided advice on life.

Here are a few of the things I learned from Dr. Mercer just by observing her.

1. NO MATTER HOW FAR YOU HAVE GONE DOWN THE WRONG ROAD, TURN BACK! Taped over her desk, Dr. Mercer had a piece of paper with this proverb typed on it. No matter how far you’ve gone down the wrong road, turn back. We’re told in America that quitters never win, bloom where you’re planted, you can’t fight city hall, you’re never going to win against big corporations. Making a change in anything from your employer to your gym to the crowd you hang around with can be treated as an act of disloyalty. People stay in situations long, long after they should have left because they are ‘committed’, ‘invested’, ‘cannot leave now’. The unwillingness to turn back after going a long way down the wrong road is the second biggest barrier to most people’s happiness. The biggest is fear, which leads me to …

2. Have the courage to speak the truth as you see it. Being the most brilliant researcher in the world does no good to anyone if you are afraid to publish and publicize unpopular results. In the 1970s, many people thought intelligence tests were the answer to psychology’s long history of physics envy. At last, we were a real science with actual numbers, not this whacko dream interpretation stuff but measurement – hey, IQ even has a math word, quotient, in the name. Not to mention, companies like The Psychological Corporation and Educational Testing Service were big business (still are). Jane Mercer sincerely believed intelligence tests systematically underestimated the intelligence of low-income, minority children. In the case of Diana vs the State Board of Education, a lawsuit was filed on behalf of a few Mexican-American children, including a little girl who spoke Spanish as her first language, was tested in English and was determined to be mentally retarded. All of the big names (and big money) lined up on the side of the State Board of Education and Jane spoke up for the side of Diana. This may not seem like much now, but back then she had to stand up to a LOT of opposition. Those were not happy times. She did it anyway.

3. Yes, you CAN have a job and a family. Men do it all the time. Jane was older than me and of that generation that was told women could either have a career or children but not both. By the time I met her, her four sons were all adults. She and her husband got along fine and seemed to agree that since they were both parents of these children they could both engage in parenting them. We couch things in daunting terms: “Can women have it all?” Of course no one has it ALL. I’m finishing this blog post in the Denver airport. That empty spot you see at the end of the jetway is where the plane I am taking back to Los Angeles should be.

[Image: no plane]

I would like to have a non-eventful flight out of Denver airport, just once. You see, none of us can have it ALL but no one asks men whether they think they can manage a career and children.

4. Being the first or only woman in an area doesn’t mean you have to go along with that happy-to-be-here crap. Yes, she was a tenured professor at the University of California, which had damn few of them, but that didn’t mean she had to accommodate in any way because of her gender. Don’t take on female doctoral students because you don’t want to be type-cast as ‘only a good advisor for women’? Screw that! If they needed an advisor and she could help, she was on board. Don’t speak out about intelligence testing because people will think you are shrill or too emotional, not a real academic? Screw that twice!  As you can see, I have taken that lesson deeply to heart but with less of her limits on profanity.

Woo-hoo – plane boarding now – only 90 minutes late – gotta go. Happy Mother’s Day.

———–

If you forgot to give your mom anything for Mother’s Day, you can sponsor a school, classroom or individual license for any of our games, starting at just $4.99.

Or get one for your own kids (or yourself, maturity is over-rated).

At first, I was thinking it wasn’t right to have a favorite paper, but then I realized that was idiotic. It’s not like these papers (or their presenters) are my children.

My favorite paper was,

Statistical modeling for large complex data: Five new directions from SAS/STAT software

If you’re not a statistician, props to you for reading after that first sentence, especially since some of the lessons apply to any conference.

[Image: glm select]

  1. You don’t always have to present or attend presentations on whatever is shiny and new. The techniques he presented, like GLMSELECT, a method for selecting the best model, are not brand new. I remember when it was first added to SAS/STAT and thinking it was a way cool idea I should use – but, then, I didn’t. As you can see from the graph above, it can be pretty easy to select the best model. Looks a lot like a scree plot, doesn’t it? This also further supports my point that visual displays of data, like the one above, are everywhere and taking over. Now that I have been reminded of its existence, I’m looking for a use for it so I can really remember it. Unfortunately, this is a method for general linear models and what I am most interested in right now has a binomial outcome, whether a player finished a game or not.
  2. Don’t stop learning when you go home. I remembered that there was also an example in this paper that used HPGENSELECT for generalized linear models, including binomial distributions. So, I am going to try that out with this dataset. One of the areas where I am improving is actually reading all of those papers I mean to get around to when I get home. Whether it is a presentation you attended, but is now jumbled around in your brain with the other 25 sessions, or one you could not attend because it conflicted with something else, when you get home, you should read the paper. Conferences can be expensive and you want to get the most out of the time and money you spent.
  3. Of course, I learned about sparse regression, quantile regression, classification and regression trees and more, which you can, too if you follow my advice from #2.
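For the curious, here is a rough sketch of what trying HPGENSELECT on a binary outcome might look like. The data set (players), the predictors and the simulated relationship are all invented for illustration. This is not our game data and not the example from the paper.

```sas
/* Made-up data: did a player finish the game (1) or not (0)? */
data players;
   call streaminit(1);
   do id = 1 to 200;
      minutes_played = rand('uniform') * 60;
      pretest        = rand('normal', 50, 10);
      sessions       = 1 + floor(rand('uniform') * 10);
      /* finishing depends only on minutes_played in this fake data */
      p_finish = 1 / (1 + exp(-(-2 + 0.05 * minutes_played)));
      finished = rand('bernoulli', p_finish);
      output;
   end;
   drop p_finish;
run;

/* Stepwise selection for a generalized linear model with a binary response */
proc hpgenselect data=players;
   model finished(event='1') = minutes_played pretest sessions / distribution=binary;
   selection method=stepwise;
run;
```

Since only minutes_played drives the outcome in the simulated data, it is the predictor you would hope to see selected, though with 200 made-up players nothing is guaranteed.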

Okay, well there is a lot more to say about SAS Global Forum and my adventures with HPGENSELECT but we have a new game, Forgotten Trail, coming out for sale tomorrow, so back to work.
