Someone handed me a data set on acculturation that they had collected from a small sample of 25 people. There was a good reason the sample was small: think African-American presidents of companies with over $100 million in sales, or Latina neurosurgeons. Anyway, small sample, can't reasonably expect to get 500 or 1,000 people.
The first thing I thought about was whether there was a valid argument for a minimum sample size for factor analysis. I came across this very interesting post by Nathan Zhao in which he reviews the research on both a minimum sample size and a minimum subjects-to-variables ratio.
Since I did the public service of reading it so you don't have to (though seriously, it was an easy and interesting read), I will summarize:
- There is no evidence for any absolute minimum number, be it 100, 500 or 1,000.
- The minimum sample size depends on the number of variables and the communality estimates for those variables.
- "If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used."
- There should be at least three measured variables per factor and preferably more.
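The two loading-based rules in that list are easy to check mechanically once you have a loading matrix. Here is a minimal sketch in Python; the function `check_loading_rules` and the loading values are hypothetical, made up for illustration and not taken from the data set described in this post. It assigns each item to the factor on which it loads most strongly (by absolute value), which is a simplification that ignores cross-loadings.

```python
def check_loading_rules(loadings, threshold=0.60, min_strong=4, min_items=3):
    """Check each factor against two rules of thumb (hypothetical helper).

    loadings: one row per item, one column per factor.
    Returns one dict per factor with:
      strong_loadings: items whose |loading| on this factor exceeds threshold
      assigned_items: items whose largest |loading| is on this factor
      interpretable_any_n: strong_loadings >= min_strong (the ".60 rule")
      enough_items: assigned_items >= min_items (three-per-factor rule)
    """
    n_factors = len(loadings[0])
    # Factor on which each item loads most strongly, by absolute value
    primary = [max(range(n_factors), key=lambda j: abs(row[j])) for row in loadings]
    results = []
    for f in range(n_factors):
        strong = sum(1 for row in loadings if abs(row[f]) > threshold)
        assigned = primary.count(f)
        results.append({
            "factor": f + 1,
            "strong_loadings": strong,
            "assigned_items": assigned,
            "interpretable_any_n": strong >= min_strong,
            "enough_items": assigned >= min_items,
        })
    return results


# Hypothetical loadings for 8 items on 2 factors (made-up numbers)
loadings = [
    [0.82, 0.10],
    [0.75, 0.05],
    [0.68, 0.20],
    [0.71, 0.12],
    [0.15, 0.79],
    [0.08, 0.66],
    [0.22, 0.73],
    [0.11, 0.58],
]

for result in check_loading_rules(loadings):
    print(result)
```

In this made-up example, factor 1 has four loadings above .60 and meets the stronger rule; factor 2 has only three (its fourth item loads .58), so it satisfies the three-items-per-factor rule but not the four-above-.60 one.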
This makes a lot of sense if you think about factor loadings in terms of what they are: correlations of an item with a factor. With correlations, if the correlation in the population is very large, you're going to find statistical significance even with a small sample. Your sample estimate may not be precisely as large as the population correlation, but it is still going to be significantly different from zero.
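To make that concrete, here is a small sketch that treats a loading like an ordinary Pearson correlation and tests it against zero with n = 25. That is an assumption for illustration: factor analysis software computes standard errors for loadings differently. The function name `correlation_test` is hypothetical. It reports the usual t statistic, t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom, plus an approximate two-sided p-value from the Fisher z transformation, since the Python standard library has no t distribution.

```python
import math
from statistics import NormalDist


def correlation_test(r: float, n: int) -> tuple[float, float]:
    """Return (t statistic, approximate two-sided p-value) for H0: rho = 0.

    t uses t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
    The p-value uses the Fisher z approximation, z = atanh(r) * sqrt(n - 3),
    compared against a standard normal, so it is only approximate.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return t, p


# Compare a modest, a large and a very large correlation at n = 25
for r in (0.30, 0.60, 0.80):
    t, p = correlation_test(r, n=25)
    print(f"r = {r:.2f}, n = 25: t = {t:.2f}, approx. p = {p:.4f}")
```

With 23 degrees of freedom the two-tailed .05 critical value of t is about 2.07. An r of .60 gives t of about 3.6, comfortably past it, while an r of .30 falls short, which is the post's point: large correlations show up as significant even in small samples.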
So … this data set of 25 respondents that I received originally had 17 items. That clearly seemed like too many to me. I thought there were two factors, so I wanted to reduce the number of variables to 8, if possible. I also suspected, based on previous research with this measure, that the communality estimates would be pretty high.
Here is what I did next:
- Parallel analysis
- Factor analysis
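For readers who want a preview of the first step, here is a minimal sketch of Horn's parallel analysis in Python with NumPy. It is one common variant, not necessarily the one used for this data set: it compares principal-component eigenvalues of the observed correlation matrix with the 95th percentile of eigenvalues from random normal data of the same size, and keeps components until an observed eigenvalue no longer beats the random one. The function name `parallel_analysis` and the simulated data in the usage example are hypothetical; the simulated data are not the author's data.

```python
import numpy as np


def parallel_analysis(data, n_iter=1000, percentile=95, seed=0):
    """Horn's parallel analysis on principal-component eigenvalues (sketch).

    data: array of shape (n_cases, n_items).
    Returns (suggested number of components, observed eigenvalues,
    random-data threshold eigenvalues), eigenvalues in descending order.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # eigvalsh returns ascending eigenvalues; reverse to descending
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        sim = rng.standard_normal((n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    # Count leading components whose observed eigenvalue exceeds the threshold
    n_keep = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        n_keep += 1
    return n_keep, observed, threshold


# Hypothetical usage: simulated data (NOT the author's data set) with 25 cases,
# 8 items and two uncorrelated factors, each measured by 4 items loading 0.8
rng = np.random.default_rng(42)
factors = rng.standard_normal((25, 2))
pattern = np.zeros((2, 8))
pattern[0, :4] = 0.8
pattern[1, 4:] = 0.8
items = factors @ pattern + 0.6 * rng.standard_normal((25, 8))

n_keep, observed, threshold = parallel_analysis(items)
print("Suggested number of components:", n_keep)
```

Other variants use the mean random eigenvalue instead of the 95th percentile, or eigenvalues from a reduced (common factor) correlation matrix, and they can suggest different numbers of factors in small samples.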
Given how long I've been writing this blog, I can't believe I haven't written at all on parceling before, and hardly anything on the parallel analysis criterion. I will remedy that deficit this week. Not tonight, though. It's past midnight, so that will have to wait until the next post.