Parallel Analysis Criterion Simplified?

By AnnMaria De Mars, October 23, 2014

Am I missing something here? All of the macros I have seen for the parallel analysis criterion for factor analysis look pretty complicated, but, unless I am missing something, it is a simple deal.

The presumption is this:

There isn’t a number like a t-value or F-value to use to test if an eigenvalue is significant. However, it makes sense that the eigenvalue should be larger than if you factor analyzed a set of random data.

Random data is, well, random, so it’s possible you might have gotten a really large or really small eigenvalue the one time you analyzed the random data. So, what you want to do is analyze a set of random data with the same number of variables and the same number of observations a whole bunch of times.

Horn, back in 1965, proposed that the eigenvalue should be higher than the average eigenvalue you get when you analyze sets of random data. Now, people are suggesting it should be higher than the eigenvalue you get 95% of the time you analyze random data (which kind of makes sense to me).

Either way, it seems simple. Here is what I did and it seems right so I am not clear why other macros I see are much more complicated. Please chime in if you see what I’m missing.

Generate a set of random data with N variables and Y observations.
Factor analyze it and keep the eigenvalues.
Repeat 500 times.
Combine the 500 datasets (each will have only 1 record with N variables).
Find the 95th percentile of each eigenvalue.

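%macro para(numvars,numreps) ;
   /* numvars = number of variables, numreps = number of observations per random data set */
   %DO k = 1 %TO 500 ;
      data A ;
         array nums {&numvars} a1 - a&numvars ;
         do i = 1 to &numreps ;
            do j = 1 to &numvars ;
               nums{j} = rand("Normal") ;
               /* Round to whole numbers; the first variable is scaled up by 100 first */
               if j < 2 then nums{j} = round(100*nums{j}) ;
               else nums{j} = round(nums{j}) ;
            end ;
            drop i j ;
            output ;
         end ;
      run ;

      /* Factor analyze the random data and save the statistics in a&k */
      proc factor data= a outstat = a&k noprint ;
         var a1 - a&numvars ;
      run ;

      /* Keep only the record that holds the eigenvalues */
      data a&k ;
         set a&k ;
         if trim(_type_) = "EIGENVAL" ;
      run ;

   %END ;
%mend ;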

%para(30,1000) ;

/* Stack the 500 one-record data sets of eigenvalues */
data all ;
   set a1-a500 ;
run ;

/* 95th percentile of each eigenvalue across the 500 random data sets */
proc univariate data= all noprint ;
   var a1 - a30 ;
   output out = eigvals pctlpts = 95 pctlpre = pa1 - pa30 ;
run ;

/* You don't need the transpose, but I just find it easier to read */
proc transpose data= eigvals out=eigsig ;
run ;

Title "95th Percentile of Eigenvalues" ;
proc print data = eigsig ;
run ;

It runs fine and I have puzzled and puzzled over why a more complicated program would be necessary. I ran it 500 times with 1,000 observations and 30 variables and it took less than a minute on a remote desktop with 4GB RAM. Yes, I do see the possibility that if you had a much larger data set that you would want to optimize the speed in some way. Other than that, though, I can’t see why it needs to be any more complicated than this.
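To actually use the thresholds, the eigenvalues from your real data have to be lined up against them. Here is a minimal sketch of one way that might look. The data set name mydata and the variables x1-x30 are hypothetical stand-ins, not anything from the program above, and the sketch assumes the real data has the same 30 variables and 1,000 observations used for the random runs, and that eigsig already exists.

```sas
/* Sketch only: mydata and x1-x30 are hypothetical names */
proc factor data=mydata outstat=obs noprint ;
   var x1-x30 ;
run ;

/* Keep the one record that holds the observed eigenvalues */
data obs ;
   set obs ;
   if _type_ = "EIGENVAL" ;
   keep x1-x30 ;
run ;

/* One row per eigenvalue, in the same order as eigsig */
proc transpose data=obs out=obs_t (rename=(col1=observed)) name=varname ;
run ;

/* One-to-one merge: row 1 with row 1, row 2 with row 2, and so on */
data compare ;
   merge obs_t eigsig (rename=(col1=pctl95)) ;
   factor = _n_ ;
   exceeds = (observed > pctl95) ;   /* 1 if the observed eigenvalue beats random data */
   keep factor observed pctl95 exceeds ;
run ;

proc print data=compare ;
run ;
```

The usual reading is to keep the factors up to the first row where exceeds is 0, since anything after that is no bigger than what random data gives you.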

If you wanted to change the percentile, say, to 50, you would just change the 95 above. If you wanted to change the method from, say, Principal Components Analysis (the default, with communality of 1) to something else, you could just do that in the PROC FACTOR step above.
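As a rough sketch of both changes: PROC MEANS (swapped in here for PROC UNIVARIATE, purely as an alternative) can give Horn's average, the median and the 95th percentile in one step. The output names mean1-mean30, med1-med30 and p95_1-p95_30 are made up for illustration, and the step assumes the data set all from the program above.

```sas
/* Sketch: average (Horn's original criterion), median and 95th percentile */
proc means data=all noprint ;
   var a1-a30 ;
   output out=eigstats mean=mean1-mean30
                       median=med1-med30
                       p95=p95_1-p95_30 ;
run ;

/* To switch from principal components to principal factor analysis,
   the PROC FACTOR statement inside the macro could become:

      proc factor data=a method=principal priors=smc outstat=a&k noprint ;

   PRIORS=SMC uses squared multiple correlations as the prior communality
   estimates in place of the default of 1. */
```

Whatever method you use on the random data should match the method you use on your real data, or the comparison doesn't mean much.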

The above assumes a normal distribution of your variables, but if that was not the case, you could change that in the RAND function above.
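For example, the data step might be changed along these lines. This is a sketch with assumed settings: the 30 variables and 1,000 observations match the call above, the seed of 2014 is arbitrary, and the probabilities in the commented-out lines are invented for illustration. The CALL STREAMINIT line is an addition of mine, not part of the program above. It makes the random numbers repeat from run to run.

```sas
data a ;
   call streaminit(2014) ;   /* assumed seed, so results are reproducible */
   array nums {30} a1-a30 ;
   do i = 1 to 1000 ;
      do j = 1 to 30 ;
         nums{j} = rand("Uniform") ;   /* flat distribution between 0 and 1 */
         /* Other possibilities, one at a time:
            nums{j} = rand("Bernoulli", 0.3) ;                   0/1 items, 30% ones
            nums{j} = rand("Table", 0.1, 0.2, 0.4, 0.2, 0.1) ;   5-point scale, values 1 to 5 */
      end ;
      output ;
   end ;
   drop i j ;
run ;
```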

As I said, I am puzzled. Suggestions to my puzzlement welcome.

4 Comments

Abby Paden says:

December 18, 2016 at 11:27 am

I think a lot of the code out there in the ether is terribly optimized and unnecessarily complex. It’s as if some of the authors are trying to win an Obfuscated Code Challenge.

I will be using your PA program further into my degree.

AnnMaria says:

January 4, 2017 at 1:05 am

Ha ha ha, you win my best comment of the week award!

GM Jackson says:

January 4, 2017 at 3:57 pm

In my opinion, finding a simpler, faster, more efficient way is always better. If something seems overly complicated, it’s usually because the creator of that complicated mess doesn’t want anyone to figure out how it works. It’s easier to monopolize a given task if you are the only one who knows how to do it. Once you simplify it, practically anyone can do it.

AnnMaria says:

January 4, 2017 at 7:37 pm

I’m the opposite – if practically anyone can do it then I’m freed up to find a new, more interesting challenge.
