### Apr

#### 16

# Factor Analysis Tips: Unexpected Things I Learned at SAS Global Forum

April 16, 2018

Are you still re-ordering your factor pattern by sorting columns in Excel? Well, do I have a tip or two for you.

The cool thing about some large conferences is that even the things you hadn’t planned on attending can be worthwhile. For example, during one time slot, I didn’t have anything particular scheduled and Diane Suhr was doing a talk on factor analysis and cluster analysis. Now, I published my first paper on factor analysis in 1990, so I was mostly interested in the cluster analysis part.

After all of those years, how did I not know that PROC FACTOR had an option to flag factor loadings over a certain value? Somehow, I missed that, can you believe it?

I also missed the REORDER option that reorders the variables in the output from largest to smallest on their loading on the first factor, then in order of their loading on the second factor and so on.

It’s super-simple. Use FLAG = value to flag loadings and REORDER to reorder them, like so.

```sas
proc factor data=principal n=3 rotate=varimax scree FLAG=.35 REORDER ;
   var x1 x2 x3 x4 ;
run ;
```

You can see the results below. With a small number of variables, like this example, it doesn’t make much difference, but in an analysis with 40 or 50 variables this can make it much easier to identify patterns in your data.
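If you don't have SAS handy, here is a rough sketch in Python of what those two options do to a factor pattern. This is not PROC FACTOR output: the loadings are made-up numbers, and the grouping rule (put each variable with the factor where its absolute loading is largest, then sort within the group) is my reading of how REORDER behaves, so treat it as an assumption.

```python
# Hypothetical rotated loadings for four variables on two factors
# (made-up numbers, not output from the PROC FACTOR run above).
loadings = {
    "x1": [0.20, 0.71],
    "x2": [0.82, 0.10],
    "x3": [0.31, 0.64],
    "x4": [0.66, 0.38],
}
FLAG = 0.35  # same cutoff as FLAG=.35


def reorder(loadings):
    """Group variables by the factor with their largest absolute loading,
    then sort each group from largest to smallest loading on that factor."""
    def key(item):
        _, row = item
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        return (best, -abs(row[best]))
    return sorted(loadings.items(), key=key)


def format_row(name, row, flag=FLAG):
    """Print one variable's loadings, starring any loading above the cutoff."""
    cells = [f"{v:6.2f}{'*' if abs(v) > flag else ' '}" for v in row]
    return f"{name:<4}" + " ".join(cells)


ordered = reorder(loadings)
for name, row in ordered:
    print(format_row(name, row))
```

With four variables you could do this by eye. With 40 or 50, having the software group and star the loadings is the whole point.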

I am a backwards woman. I write about statistics and statistical software in my spare time and my day job is making video games. In my defense, the latest series of those games teaches statistics – in Spanish and English.

### May

#### 26

# Factor analysis of parcels: part 1

May 26, 2016

Where we left off, I had created some parcels and was going to do a factor analysis later. Now, it’s later. If you’ll recall, I had not found any items that correlated significantly with the food item and also made sense conceptually. For example, it correlated highly with attending church services but that didn’t really have any theoretical basis. So, I left it as a single variable. Here is my first factor analysis.

```sas
proc factor data= parcels rotate= varimax scree ;
   var socialp1-socialp3 languagep spiritualp spiritual2 culturep1 culturep2 food ;
run ;
```

You can see from the scree plot here that there is one factor way at the top of the chart with the rest scattered at the bottom. Although the minimum eigenvalue of 1 criterion would have you retain two factors, I think that is too many, for both logical and statistical reasons. The eigenvalues of the first two factors, by the way, were 4.74 and 1.10.

Even if you aren’t really into statistics or factor analysis, I hope that this pattern is pretty clear. You can see that every single thing except for the item related to food loads predominantly on the first factor.

The median factor loading was .79, and the factor loadings ranged from .49 to .83 .

These results are interesting in light of the discussion on small sample size. If you didn’t read it, the particular quote in there that is relevant here is

“If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used .”

Final Communality Estimates: Total = 5.845142

| socialp1 | socialp2 | socialp3 | languagep | spiritualp | spiritual2 | culturep1 | culturep2 | food |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.67438366 | 0.72223020 | 0.64287274 | 0.80080260 | 0.34260318 | 0.46790413 | 0.70885380 | 0.69821549 | 0.78727573 |
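A quick arithmetic check, sketched in Python rather than SAS: the nine communality estimates should add up to the printed total, and that total should also match the sum of the two retained eigenvalues (4.74 and 1.10) to rounding. The second part assumes the communalities come from a two-factor principal-component-style solution, which is what the output suggests but is my assumption.

```python
# Communality estimates copied from the PROC FACTOR output above.
communalities = {
    "socialp1": 0.67438366,
    "socialp2": 0.72223020,
    "socialp3": 0.64287274,
    "languagep": 0.80080260,
    "spiritualp": 0.34260318,
    "spiritual2": 0.46790413,
    "culturep1": 0.70885380,
    "culturep2": 0.69821549,
    "food": 0.78727573,
}

total = sum(communalities.values())

# Eigenvalues of the two retained factors, as reported in the post
# (rounded to two decimals, so the match is only approximate).
eigen_sum = 4.74 + 1.10

print(f"Sum of communalities: {total:.6f}")
print(f"Sum of retained eigenvalues: {eigen_sum:.2f}")

# Variables the two factors explain least well (communality under .5).
low = [name for name, h2 in communalities.items() if h2 < 0.5]
print("Communality below .5:", low)
```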

These communality estimates are also relevant but it is nearly 1 am and I have to be up at 6:30 for a conference call, so I’ll ramble on about this some more next time.

### May

#### 20

# Parceling Items in Factor Analysis

May 20, 2016

First of all, what are parcels? Not the little packages your grandma left on the table in the hall when she came back from shopping. Well, not only that.

In factor analysis, parcels are simply the sum of a small number of items. I prefer using parcels when possible because both basic psychometric theory and common sense tell me that a combination of items will have greater variance and, c.p., greater reliability than a single item.

Just so you know that I learned my share of useless things in graduate school, c.p. is Latin for ceteris paribus which translates to “other things being equal”. The word “etcetera” meaning other things, has the same root.

Now you know. But I digress. Even more than usual. Back to parcels.

As parcels can be expected to have greater variance and greater reliability, harking back to our deep knowledge of both correlation and test theory we can assume that parcels would tend to have higher correlations than individual items. As factor loadings are simply correlations of a variable (be it item or parcel) with the factor, we would assume that – there’s that c.p. again – factor loadings of parcels would be higher.
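To put a number on the "greater reliability" claim, here is a small Python sketch using the Spearman-Brown formula from classical test theory. It assumes the items in a parcel are parallel (equally reliable and measuring the same thing), and the single-item reliability of .50 is a made-up value, not something estimated from these data.

```python
def spearman_brown(item_reliability, k):
    """Reliability of the sum of k parallel items (Spearman-Brown formula)."""
    return k * item_reliability / (1 + (k - 1) * item_reliability)


single_item = 0.50  # hypothetical reliability of one item

for k in (1, 2, 3):
    print(f"{k} item(s) per parcel: reliability = {spearman_brown(single_item, k):.3f}")
```

Other things being equal (there it is again), more reliable measures are attenuated less, which is why parcel correlations and loadings tend to come out higher.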

Jeremy Anglim, in a post written several years ago, talks a bit about parceling and concludes that it is less of a problem in a case, like today, where one is trying to determine the number of factors. Actually, he was talking about confirmatory factor analysis but I just wanted you to see that I read other people’s blogs.

The very best article on parceling was called To Parcel or Not to Parcel and I don’t say that just because I took several statistics courses from one of the authors.

To recap this post and the last one:

- I have a small sample size and, due to the unique nature of a very small population, it is not feasible to increase it by much.
- I need to reduce the number of items to get an acceptable subjects-to-variables ratio.
- The communality estimates are quite high (over .6) for the parcels.
- My primary interest is in the number of factors in the measure and finding an interpretable factor.

So… here we go. The person who provided me the data set went in and helpfully renamed the items that were supposed to measure socializing with people of the same culture ‘social1’, ‘social2’ etc, and renamed the items on language, spirituality, etc. similarly. I also had the original measure that gave me the actual text of each item.

**Step 1: Correlation analysis**

This was super-simple. All you need is a LIBNAME statement that references the location of your data and then:

```sas
PROC CORR DATA = mydataset ;
   VAR firstvar -- lastvar ;
RUN ;
```

In my case, it looked like this

```sas
PROC CORR DATA = in.culture ;
   VAR social1 -- art ;
RUN ;
```

The double dashes are interpreted as ‘all of the variables in the data set located from firstvar to lastvar’. This saves you typing if you know all of your variables of interest are in sequence. I could have just used a single dash if they were named the same, like item1-item17, and then it would have used all of the variables named that regardless of their location in the data set. The problem I run into there is knowing what exactly item12 is supposed to measure. We could discuss this, but we won’t. Back to parcels.

Since you want to put together items that are related both conceptually and empirically (that is, the things you think should correlate actually do), you first want to look at the correlations.

**Step 2: Create parcels**

The items that were expected to assess similar factors tended to correlate from .42 to .67 with one another. I put these together in a very simple data step.

```sas
data parcels ;
   set out.factors ;
   socialp1 = social1 + social5 ;
   socialp2 = social4 + social3 ;
   socialp3 = social2 + social6 + social7 ;
   languagep = language2 + language1 ;
   spiritualp = spiritual1 + spiritual4 ;
   culturep1 = social2 + dance + total ;
   culturep2 = language3 + art ;
run ;
```

There was one item that asked how often the respondent ate food from the culture, and that didn’t seem to have a justifiable reason for putting with any other item in the measure.

**Step 3: Conduct factor analysis**

This was also super-simple to code. It is simply

```sas
proc factor data= parcels rotate= varimax scree ;
   var socialp1-socialp3 languagep spiritualp spiritual2 culturep1 culturep2 ;
run ;
```

I actually did this twice, once with and once without the food item. Since it loaded by itself on a separate factor, I did not include it in the second analysis. Both factor analyses yielded two factors that every item but the food item loaded on. It was a very nice simple structure.

Since I have to get back to work at my day job making video games, though, that will have to wait until the next post, probably on Monday.


### May

#### 16

Someone handed me a data set on acculturation that they had collected from a small sample of 25 people. There was a good reason that the sample was small – think African-American presidents of companies over $100 million in sales or Latina neurosurgeons. Anyway, small sample, can’t reasonably expect to get 500 or 1,000 people.

The first thing I thought about was whether there was a valid argument for a minimum sample size for factor analysis. I came across this very interesting post by Nathan Zhao where he reviews the research on both a minimum sample size and a minimum subjects to variables ratio.

Since I did the public service of reading it so you don’t have to, (though seriously, it was an easy read and interesting), I will summarize:

- There is no evidence for any absolute minimum number, be it 100, 500 or 1,000.
- The minimum sample size depends on the number of variables and the communality estimates for those variables
- “If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used .”
- There should be at least three measured variables per factor and preferably more.

This makes a lot of sense if you think about factor loadings in terms of what they are, correlations of an item with a factor. With correlations, if you have a very large correlation in the population, you’re going to find statistical significance even with a small sample size. It may not be precisely as large as your population correlation, but it is still going to be significantly different from zero.
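To make that concrete for a sample of 25, here is a minimal Python sketch of the usual t test for a correlation. The critical value 2.069 (two-tailed, alpha = .05, 23 degrees of freedom) is taken from a standard t table and hard-coded, and the correlation of .60 is just the loading threshold from the quote above used as an example.

```python
import math

N = 25
T_CRIT = 2.069  # two-tailed .05 critical t for df = 23, from a standard t table


def t_for_r(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def smallest_significant_r(t_crit, n):
    """Smallest |r| that reaches significance for a given critical t and n."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


print(f"t for r = .60, n = {N}: {t_for_r(0.60, N):.2f}")
print(f"Smallest significant r at n = {N}: {smallest_significant_r(T_CRIT, N):.3f}")
```

So a correlation of .60 clears the bar comfortably with 25 people, while small correlations never get close.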

So … this data set of 25 respondents that I received originally had 17 items. That seemed clearly too many for me. I thought there were two factors, so I wanted to reduce the number of variables down to 8, if possible. I also suspected the communality estimates would be pretty high, just based on previous research with this measure.

Here is what I did next :

- Parceled
- Parallel analysis
- Factor Analysis

I can’t believe I haven’t written at all on parceling before and hardly any on the parallel analysis criterion, given the length of time I’ve been doing this blog. I will remedy that deficit this week. Not tonight, though. It’s past midnight, so that will have to wait until the next post.

Update: read post on parcels and the PROC FACTOR code here


### Jul

#### 10

My doctoral advisor, the late, great Dr. Eyman, used to tell me that my psychometric theory lectures were,

A light treatment of a very serious subject.

Hmph. Well, with all due respect to a truly wonderful mentor, I still have to state unequivocally that the majority of students when looking at a factor analysis for the first (or second or third) time are thinking more like the title of this post.

Several days ago, I described how to point and click your way through factor analysis so that you got a bunch of output. Now what?

The questions to answer are:

- What exactly is a factor anyway?
- How many factors are present in these data?
- What does each factor that you extracted represent?

Conceptually, a factor is some underlying trait that is measured indirectly by the items you measured directly. For example, I want to measure a factor of “mathematical aptitude”. So, I ask a bunch of questions like, “What is 7 x 6?” and “If two trains left the station at the same time, going 100 miles an hour in opposite directions, how far apart would they be 45 minutes later?” I’m really not that interested in your ability to answer that specific question about trains.

Factor analysis is also referred to as a ‘dimension reduction technique’. It’s much simpler to understand a relationship between, say college GPA and two factors of quantitative aptitude and verbal aptitude than to explain the correlations among 120 separate questions and college GPA.

The measures could be anything – test scores, individual items on a test, measurements of various dimensions like height or weight, agricultural measures like yield of a rice field or economic ones like family income. You’re factor analyzing a correlation matrix of these measures (if your input data set was not a correlation matrix, it’s going to be transformed into one before it’s analyzed). In a correlation matrix, every measure is standardized to have a variance of 1.

One thing you want to look at is the eigenvalues. An eigenvalue is the amount of variance in the individual measures explained by the factor. (If you don’t believe me, square the loadings in the factor pattern and add them up. The total is the eigenvalue. Prediction: At least one person who reads this will do exactly that and be surprised that I am right. Contrary to appearances, I do not make this shit up.) So if the eigenvalue is 1.0 it has explained exactly as much variance as a single item. What good is that? It would take you 42 factors with an eigenvalue of 1.0 to explain all of the variance in a set of 42 measures. You’re not reducing the dimensions any. For that reason, a common criterion for deciding the number of factors is “Minimum eigenvalue greater than 1.”
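If you want to be that one person but don't feel like doing it by hand, here is the arithmetic as a few lines of Python. The loadings are hypothetical, not from any analysis in this post, and the identity applies to the unrotated factor pattern; after rotation the column sums are still "variance explained by each factor" but they are no longer the original eigenvalues.

```python
# Hypothetical unrotated factor pattern: four items, two factors.
factor_pattern = {
    "item1": [0.80, 0.10],
    "item2": [0.70, 0.20],
    "item3": [0.60, -0.30],
    "item4": [0.20, 0.75],
}


def eigenvalue(pattern, factor):
    """Sum of squared loadings down one column of the factor pattern."""
    return sum(row[factor] ** 2 for row in pattern.values())


def communality(row):
    """Sum of squared loadings across one row: variance in that item explained."""
    return sum(v ** 2 for v in row)


for f in range(2):
    print(f"Factor {f + 1} eigenvalue: {eigenvalue(factor_pattern, f):.4f}")

for name, row in factor_pattern.items():
    print(f"{name} communality: {communality(row):.4f}")
```

Squaring down a column gives you the eigenvalue; squaring across a row gives you the communality for that item.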

The problem is, and it has been documented many times over, this criterion, although it is the default for many software packages, tends to give you too many factors. I prefer two other methods. My favorite is the parallel analysis criterion which does many iterations of analysis of a dataset of random numbers. The idea is you should get factors that explain more than if you analyzed random data. There is a useful SAS macro for doing that.
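For the curious, here is a bare-bones sketch of the parallel analysis idea in Python, using only the standard library. It is not the SAS macro: the eigenvalue routine is a plain Jacobi rotation method, the baseline is the mean eigenvalue over 20 random data sets (many implementations use a percentile and far more replications), and the observed eigenvalues are the six from the Mplus example of May 22, 2013 (730 observations, six variables), borrowed purely for illustration.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix of a list of equal-length rows."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix by Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < 1e-18:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                diff = a[i][i] - a[j][j]
                sign = 1.0 if diff >= 0 else -1.0
                # Rotation angle in [-pi/4, pi/4] that zeroes a[i][j].
                theta = 0.5 * math.atan2(2 * a[i][j] * sign, abs(diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def random_eigenvalues(n_obs, n_vars, reps=20, seed=1):
    """Mean eigenvalues from `reps` data sets of pure random normal noise."""
    rng = random.Random(seed)
    totals = [0.0] * n_vars
    for _ in range(reps):
        noise = [[rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_obs)]
        for idx, value in enumerate(eigenvalues(correlation_matrix(noise))):
            totals[idx] += value
    return [t / reps for t in totals]


def n_to_retain(observed, baseline):
    """Count leading factors whose eigenvalue beats the random baseline."""
    count = 0
    for obs, rand in zip(observed, baseline):
        if obs <= rand:
            break
        count += 1
    return count


observed = [1.866, 1.262, 0.866, 0.750, 0.716, 0.539]
baseline = random_eigenvalues(n_obs=730, n_vars=6)
print("Random-data baseline:", [round(b, 3) for b in baseline])
print("Factors to retain:", n_to_retain(observed, baseline))
```

The logic is the whole point: even pure noise gives you a first eigenvalue a bit above 1, so a real factor has to beat the noise, not just beat 1.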

Or … you can just look at a scree plot, which, although not quite as accurate, involves no more effort than staring. Here is my scree plot from the 42 variables I analyzed from the 500 family study. As every good statistician (and Merriam-Webster) knows, scree is “an accumulation of loose stones or rocky debris lying on a slope or at the base of a hill or cliff”. The challenge is to distinguish which factors should be retained and which are just showing small random relationships among variables, like the bits of rubble.

Clearly, we want to keep our first factor, with an eigenvalue of 7.3. Our second, with an eigenvalue of 3.3 looks like a keeper as well. So-o-o , do we take the third factor with an eigenvalue of 2.2 or do we say that is just part of the scree-type random correlations? I’m saying we keep it. Were you hoping for something more scientific? Well, I guess you’re disappointed, then.

By the way, if we used the minimum eigenvalue of 1 criterion that would give us 12 factors which is just ridiculous. Liau et al. (2011) in a very serious paper for SAS Global Forum suggest not having less than 50% of the variance explained. That would mean your eigenvalues you keep add up to 21 at least, and not the 12.8 we have here (7.3 + 3.3 +2.2). To do that, instead of cutting the factors at our plot at 3, which I have so helpfully labelled Point A, we would instead cut it at Point B.
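The arithmetic behind that comparison, as a few lines of Python. Everything here comes from the numbers in the post: 42 standardized variables (so 42 units of total variance) and the three eigenvalues I decided to keep.

```python
n_variables = 42              # each standardized variable contributes variance 1
retained = [7.3, 3.3, 2.2]    # eigenvalues of the three factors kept

explained = sum(retained)
proportion = explained / n_variables
needed_for_half = 0.5 * n_variables

print(f"Variance explained by 3 factors: {explained:.1f} of {n_variables}")
print(f"Proportion explained: {proportion:.1%}")
print(f"Eigenvalue total needed for 50%: {needed_for_half:.0f}")
```

So the three-factor solution explains a bit under a third of the total variance, well short of the 50% guideline.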

What we are doing now is an exploratory factor analysis so I am going to do this:

1. Based on my scree plot request a 3-factor solution.

2. Inspect the factor pattern and see if that makes sense to me based on expertise in the content area which I am going to pretend to have. (Actually, if you’re familiar with Baumrind’s work, it is looking a bit like the control / warmth factors that she postulated so I am not completely pulling this out of my — um, head.)

3. Run the parallel analysis macro and see the number of factors recommended by that.

Check back here next time I can get some time away from my day job writing computer games to pontificate on analysis of random data. Hopefully, that will be tomorrow because our (relatively) new Chief Marketing Officer is going to the women entrepreneur meet-up in Pasadena instead of me because I’d rather write about Kabacoff’s parallel analysis macro. And THAT is why I hired someone to do marketing.

### Jun

#### 27

# Mama AnnMaria’s Point-y Click-y Guide to Factor Analysis

June 27, 2013

So, yesterday, if you were paying attention, we figured out WHY to do a factor analysis. Today’s post is about how. I’m using SAS Enterprise Guide because I had it open on my computer.

Here is what the completed project looks like:

Here is what I did, reading from the top — I opened a data set, ran a factor analysis and looked at it. When I looked at it, I saw that over 120 of the records were missing out of less than 500 people. I made a note of this – literally.

Thing to know: the default for SAS is to delete a record if it is missing ANY of the variables.

Next, I ran summary statistics to see if maybe there was one that 200 people were missing, say it was about how much input parents have into your job choices and most of the kids did not work. If that had been the case, I could have just dropped that one variable. It wasn’t.

So… I ran correlations of all the variables and then I factor analyzed the correlation matrix (WAY easier than it sounds!)

After I took a look at the results from this analysis, I thought I could do better, so I re-analyzed the data requesting only three factors.

With the overview out of the way, let’s take a look at each part.

Opening the data set is a piece of cake: go to File > Open > Data

Select the data set you want, just like you open a file in Microsoft Word or anything else.

To do the Factor Analysis, click TASKS then MULTIVARIATE and then select FACTOR ANALYSIS

A window will pop up where you select the variables you want to use in the analysis

Click on a variable and then click the arrow which I have so helpfully labeled as “A”. Notice that SAS Enterprise Guide in the box I have equally helpfully labeled “B” often gives you tips on what you are supposed to do in a given situation. You’re welcome. You can hold down the shift key, and select a bunch of variables at once, too.

You can leave most of the defaults but I would strongly suggest that you change two of them under ROTATION AND PLOTS. Generally, you’ll find a rotated factor pattern easier to interpret. I usually start with ORTHOGONAL VARIMAX rotation, which assumes that your factors are unrelated. I always want a scree plot, so I check that. Then, click RUN.

When you get your results, do NOT look at your results first. Be smarter than most people and look at your log. To do that you click on the tab that says LOG

When you do, you see this:

WARNING: 123 OF 465 OBSERVATIONS IN DATA SET WORK.SORTTEMTABLESORTED OMITTED DUE TO MISSING VALUES.

**If we didn’t have a lot of people missing data, we could skip the next few steps, but hey, that’s life. One of my big gripes about many statistics courses and textbooks is they pretend that data is always just pristine and perfect. There are very few times in real life that your data are like that, and this is not one of them.**

So …. before going any further, I decide to look at the descriptive statistics for the data. Normally I look at this before any other analyses to make sure the data are not out of range, there aren’t people who show an age of 999 or who scored 99 on a scale of 1 to 10. There aren’t variables that were skipped by 90% of the sample. I did that with these data but since now I am missing over one-fourth of the sample, I decide to look again.

To get descriptive statistics using SAS Enterprise Guide, go to TASKS > DESCRIBE > SUMMARY STATISTICS

A window will pop up and just as you did above, select the variables you want to analyze. When I look at the results, I can see that the data are fine. The variables are on a 0 (=Never) to 3 (=often) scale and that all looks right. The sample size is 431, 428, 429, 415. In other words, for each question, a few people overlooked it or skipped it, but if you add all of those people who missed one here or there together it comes out to 123 people.
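Here is a toy illustration in Python of how that happens. The six records and three questions are invented; the point is only that each question can be missing for just one person while listwise deletion (dropping any record with ANY missing value, which is what the factor analysis did) throws out half the sample.

```python
# Six made-up respondents answering three questions on a 0-3 scale.
# None marks a skipped question.
records = [
    {"q1": 2, "q2": 1, "q3": 3},
    {"q1": None, "q2": 2, "q3": 2},
    {"q1": 1, "q2": None, "q3": 0},
    {"q1": 3, "q2": 3, "q3": None},
    {"q1": 0, "q2": 1, "q3": 1},
    {"q1": 2, "q2": 2, "q3": 2},
]
items = ["q1", "q2", "q3"]

# N for each question on its own, as in the summary statistics.
n_per_item = {item: sum(r[item] is not None for r in records) for item in items}

# Listwise deletion: keep only records with no missing values at all.
complete_cases = [r for r in records if all(r[item] is not None for item in items)]

print("N per item:", n_per_item)
print("Complete cases:", len(complete_cases), "of", len(records))
```

Scale that up to 42 questions and 465 people and it is easy to see how a few skips per question turns into 123 omitted observations.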

Here is where you can factor analyze the correlation matrix. You see, a factor analysis is a look at which items on a questionnaire are related. We hope to find a group of items that are related to each other and then put them into a scale of say, parental supervision. What else looks at whether a bunch of items are related? Why, a correlation matrix.

Because I should get some actual work done for money, I’ll talk about how to do that in my next post, unless some other shiny thing catches my eye and I decide to write about something else.


### Jun

#### 23

Too often, when I look at the surveys some people design, I have the same thought as when I see my granddaughter with a lollipop bigger than her head –

Just what exactly do you think that you are going to DO with that?

The problem is that both may have metaphorically (or, in Eva’s case, literally) bitten off more than they can chew.

Okay, great, you asked 72 questions on your survey, received 1,873 surveys back and most people answered most of them. You could try throwing everything into data mining software with your 72 items and hope for the best but that presumes a) you have some data mining software handy and b) an understanding of test sets and validation. I’m going with the more likely scenario that the answer to either a) or b) is

Um – no.

Imagine yourself in this scenario – someone, maybe you, has collected survey data at great expense. Maybe you paid subjects to answer questions about themselves, gave students credit to participate in a study, and now you have dozens, perhaps hundreds, of variables on each person. How on earth do you analyze these data? You could just go through and start putting questions together to form subscales, but that is pretty arbitrary. Enter factor analysis to help you make sense of your data.

Factor analysis is extremely useful. Conceptually, it is relatively easy to understand – mathematically, um, not so much so.

*You take a large number of questions and find what few, underlying traits they represent, such as supervision, collaborative decision making and ambition.*

So, for example, the Wechsler Intelligence Scale has many, many items. These can be combined into subscales such as information, comprehension, object assembly and coding. The subscales can be further aggregated into two scores – a Verbal IQ and a Performance IQ.

This is based on the belief expressed by Wechsler who said that some people were good at reasoning with words and other people are good at reasoning with things but that both were types of intelligence. Writing a paper displays your intelligence, but so does putting together a computer or designing a part for it. So, said Wechsler, let’s have a bunch of items that measure those two factors, add up the scores on those items and get our two types of IQ.

Ever since I watched this TED talk by Conrad Wolfram on how math does not equal computation (and he is, of course, right), I’ve been thinking about how to apply it to the work we do here at The Julia Group.

Factor analysis is one example. The math behind it can be fairly daunting, but the actual concept is quite simple, and there are tools like SPSS and SAS Enterprise Guide that have now eliminated the need to learn programming.

Still …. how do you know the number of factors? How do you decide which survey item goes with which factor? Why would you rotate and which rotation would you use?

Stay tuned and … later this week I will explain the answer to those questions and more. I know you can hardly wait.

### May

#### 22

# A quick introduction to interpretation of Exploratory Factor Analysis: Mplus Example

May 22, 2013

Last week I wrote a bit about how to get an exploratory factor analysis using Mplus. The question now, is what does that output MEAN ?

First, you just get some information on the programming statements or defaults that produced your output:

INPUT READING TERMINATED NORMALLY

Exploratory Factor Analysis ;

SUMMARY OF ANALYSIS

Number of groups 1

Number of observations 730

Number of dependent variables 6

Number of independent variables 0

Number of continuous latent variables 0

Observed dependent variables

Continuous

Q1F1 Q2F1 Q3F1 Q1F2 Q2F2 Q3F2

Estimator ML

Rotation GEOMIN

Row standardization CORRELATION

Type of rotation OBLIQUE

This tells us we are analyzing all of the data as one group and not, for example, running separate analyses for males and females. We have 730 records and six variables, all of which are continuous and listed above. The maximum likelihood method (ML) of estimation is used, along with the default rotation, GEOMIN, which is an oblique method, that is, it allows the factors to be correlated.

Here we have a list of our eigenvalues

RESULTS FOR EXPLORATORY FACTOR ANALYSIS

EIGENVALUES FOR SAMPLE CORRELATION MATRIX

| 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- |
| 1.866 | 1.262 | 0.866 | 0.750 | 0.716 | 0.539 |

In this case, you could go ahead with the eigenvalue greater than one rule, but let’s take a look at a couple of other statistics. First, we have the results from the one factor solution. Here we have the chi-square testing the goodness of fit of the model

Chi-Square Test of Model Fit

Value 96.228

Degrees of Freedom 9

P-Value 0.0000

We want this test to be non-significant because our null hypothesis is there is no difference between the observed data and our hypothesized one-factor model. This null is soundly rejected.

Let’s take a look at the Chi-square for our two-factor solution

Chi-Square Test of Model Fit

Value 3.016

Degrees of Freedom 4

P-Value 0.5552

You can clearly see that the chi-square is much smaller and non-significant.

Let’s take a look at two other tests. The Root Mean Square Error of Approximation (RMSEA) for the one-factor solution is .115, as shown below. We would like to see an RMSEA less than .05 which is clearly not the case here.

RMSEA (Root Mean Square Error Of Approximation)

Estimate 0.115

90 Percent C.I. 0.095 0.137

Probability RMSEA <= .05 0.000

For the two factor solution, our RMSEA rounds to zero, as shown below

RMSEA (Root Mean Square Error Of Approximation)

Estimate 0.000

90 Percent C.I. 0.000 0.049

Probability RMSEA <= .05 0.954

Clearly, we are liking the two-factor solution here, yes? The eigenvalue > 1 rule (which should not be TOO emphasized) points there, as does the model fit chi-square and the RMSEA.
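If you wonder where the RMSEA comes from, it can be recovered from the chi-square, its degrees of freedom and the sample size. The sketch below, in Python, uses one common textbook form of the formula with N in the denominator; some sources use N - 1 instead, and I am assuming rather than asserting that this matches what Mplus computes internally. With 730 observations both versions round to the same three decimals.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square (one common form)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * n))


N_OBS = 730  # number of observations reported in the Mplus output

one_factor = rmsea(96.228, 9, N_OBS)
two_factor = rmsea(3.016, 4, N_OBS)

print(f"One-factor RMSEA: {one_factor:.3f}")
print(f"Two-factor RMSEA: {two_factor:.3f}")
```

Notice that whenever the chi-square is smaller than its degrees of freedom, as in the two-factor solution, the estimate is simply set to zero.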

In their course on factor analysis, Muthen & Muthen give this very nice example of a table comparing different factor solutions using the data

They also like the scree plot, which I do, too. I also agree with them that one should never blindly follow some rule but rather have some theory or expectation about how the factors should fall out. I also agree with them in looking at multiple indicators, for example, scree plot, chi-square, RMSEA and eigenvalues.

### May

#### 15

# Exploratory Factor Analysis with Mplus

May 15, 2013

Previously, I discussed how to do a confirmatory factor analysis with Mplus. What if you aren’t sure what variables should load on what factor? Then you are doing an exploratory factor analysis. Really, you should probably do the exploratory factor analysis first unless you have some very large body of research behind you saying that there should be X number of factors and these exact variables should load on them. If you’re analyzing the Wechsler Intelligence Scale, you probably could skip the exploratory step. For everyone else …. here is how you do an exploratory factor analysis with Mplus.

```
TITLE: Exploratory Factor Analysis ;
DATA: FILE IS 'values.dat' ;
VARIABLE: NAMES ARE q1f1 q2f1 q3f1 q1f2 q2f2 q3f2 ;
ANALYSIS: TYPE = EFA 1 3 ;
          ESTIMATOR = ML ;
```

When no rotation is specified using the ROTATION option of the ANALYSIS command, the default oblique GEOMIN rotation is used.

I explained the first three statements earlier this week.

The fourth statement is new. Like the other statements, you need to follow the ANALYSIS key word with a colon and end each statement in the command (or if you are familiar with SAS, think of it as a procedure) with a semi-colon.

TYPE = EFA 1 3 ;

Requests an exploratory factor analysis with a 1 factor solution, 2-factor solution and 3-factor solution. Of course, depending upon your own study, you can request whatever solutions you want. This is really useful because often in an exploratory study you aren’t quite sure of the number of factors. Maybe it is two or maybe three will work better. Mplus gives you a really simple way to request multiple solutions and compare them. I’ll talk more about that in the next post.

ESTIMATOR = ML ;

requests maximum likelihood estimation.

If you are interested in factor analysis at all, there is a really good video on the Mplus site. Far more of it discusses exploratory and confirmatory factor analysis – methods, goodness of fit tests, equations, interpretation of factor matrix – than Mplus, which as you can see, is pretty easy, so even if you are using some other software the video is definitely worth checking out.

### May

#### 15

Being able to find SPSS in the start menu does not qualify you to run a multi-nomial logistic regression.

This is the kind of comment statisticians find funny that leaves other people scratching their heads. The point is that it’s not that difficult to get output for some fairly complex statistical procedures.

Let’s start with the confirmatory factor analysis I mentioned in my last post. Once you get past the standard stuff that tells you that your model terminated successfully, the number of variables and factors, you see this:

Chi-Square Test of Model Fit

Value 8.707

Degrees of Freedom 8

P-Value 0.3676

The null hypothesis is that there is no difference between the patterns observed in these data and the model specified. So, unlike many cases where you are hoping to reject the null hypothesis, in this case I certainly do NOT want to reject the hypothesis that this is a good fit. As you can see from my chi-square value above, this model is acceptable.

Another measure of goodness of fit is the root mean square error of approximation (RMSEA).

RMSEA (Root Mean Square Error Of Approximation)

Estimate 0.011

90 Percent C.I. 0.000 0.046

Probability RMSEA <= .05 0.973

An acceptable model should have an RMSEA less than .05. You can see above that the estimate for RMSEA is .011, the 90 percent confidence interval is 0 – .046 and the probability that the population RMSEA is less than .05 is 97.3%. Again, consistent with our chi-square, the model appears to fit.

| Parameter | Estimate | S.E. | Est./S.E. | Two-Tailed P-Value |
| --- | --- | --- | --- | --- |
| **F1 BY** | | | | |
| Q1F1 | 1.000 | 0.000 | 999.000 | 999.000 |
| Q2F1 | 1.828 | 0.267 | 6.833 | 0.000 |
| Q3F1 | 1.697 | 0.235 | 7.231 | 0.000 |
| **F2 BY** | | | | |
| Q1F2 | 1.000 | 0.000 | 999.000 | 999.000 |
| Q2F2 | 1.438 | 0.291 | 4.943 | 0.000 |
| Q3F2 | 1.085 | 0.191 | 5.687 | 0.000 |

Here are the unstandardized estimates. By default the first variable for each factor is constrained to a value of 1, so, of course, there is no real standard error, test statistic or p-value for it, which is why the last two columns show 999.000. It isn’t really an estimate; it was set. Let’s look at the other two. Since they are unstandardized, the more useful measure for us is the estimate divided by the standard error of the estimate, for example 1.828 / .267. This is done for us in the column under Est./S.E. and in that case comes out to 6.833. You interpret these values in the same way as any z-score, with 1.96 as the critical value, and you can see in the last column that all of my variables loaded on the factor hypothesized with a p-value much less than .05.
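Here is that z-score logic as a small Python sketch, using the normal distribution via `math.erfc` from the standard library. Dividing the printed estimate by the printed standard error gives about 6.85 rather than 6.833; presumably the output was computed from unrounded values, but that is my guess, not something the output states.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


estimate, std_error = 1.828, 0.267  # Q2F1 row of the table above
z = estimate / std_error

print(f"Est./S.E. from the printed values: {z:.3f}")
print(f"Two-tailed p-value: {two_tailed_p(z):.2e}")
print(f"p-value at the critical value 1.96: {two_tailed_p(1.96):.4f}")
```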

The next thing I look at is the residual variances. At this point my only concern is that I *not* have a residual variance that is negative. It makes no sense that you would have a negative variance because (among other reasons) variance is a sum of squares and squares cannot be negative. Also, in such a case, the communality is greater than 1, meaning you have explained over 100% of the variance in this variable by its relation to the latent construct. This also makes no sense. These are referred to as Heywood cases and explained beautifully here (even though the linked documentation is from SAS it applies to any confirmatory factor analysis).

The final thing I want to look at, for right now, anyway, is the R-squared

R-SQUARE

| Observed Variable | Estimate | S.E. | Est./S.E. | Two-Tailed P-Value |
| --- | --- | --- | --- | --- |
| Q1F1 | 0.142 | 0.032 | 4.473 | 0.000 |
| Q2F1 | 0.475 | 0.065 | 7.256 | 0.000 |
| Q3F1 | 0.438 | 0.061 | 7.123 | 0.000 |
| Q1F2 | 0.174 | 0.045 | 3.883 | 0.000 |
| Q2F2 | 0.376 | 0.078 | 4.827 | 0.000 |
| Q3F2 | 0.179 | 0.044 | 4.057 | 0.000 |

You can see that the r-square is pretty decent overall. These are interpreted just like any other R-square values. I didn’t show the standardized factor loadings here but just take my word for it that the R-squared values are the standardized loadings squared. So this is the variance in q1f1, for example, explained by factor 1.
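Since I didn't show the standardized loadings, here is a short Python sketch that works backwards from the R-square values: the implied standardized loading is the square root, and the standardized residual variance is 1 minus R-square. Taking the positive root is an assumption on my part, justified only by the unstandardized estimates all being positive. The last check flags any Heywood case, which would show up as an R-square above 1 (or below 0).

```python
import math

# R-square estimates copied from the Mplus output above.
r_square = {
    "Q1F1": 0.142,
    "Q2F1": 0.475,
    "Q3F1": 0.438,
    "Q1F2": 0.174,
    "Q2F2": 0.376,
    "Q3F2": 0.179,
}

# Implied standardized loading (positive root assumed) and residual variance.
implied_loading = {name: math.sqrt(r2) for name, r2 in r_square.items()}
residual_variance = {name: 1 - r2 for name, r2 in r_square.items()}

# A Heywood case would be an R-square outside the 0 to 1 range.
heywood = [name for name, r2 in r_square.items() if r2 > 1 or r2 < 0]

for name in r_square:
    print(f"{name}: loading = {implied_loading[name]:.3f}, "
          f"residual variance = {residual_variance[name]:.3f}")
print("Heywood cases:", heywood)
```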

I started this whole thing working with Mplus to do a factor analysis and overall, I’d have to call it a pretty painless experience.
