### May

#### 15

Being able to find SPSS in the start menu does not qualify you to run a multinomial logistic regression.

This is the kind of comment statisticians find funny that leaves other people scratching their heads. The point is that it’s not that difficult to get output for some fairly complex statistical procedures.

Let’s start with the confirmatory factor analysis I mentioned in my last post. Once you get past the standard stuff that tells you that your model terminated successfully, the number of variables and factors, you see this:

Chi-Square Test of Model Fit

| Value | Degrees of Freedom | P-Value |
|---|---|---|
| 8.707 | 8 | 0.3676 |

The null hypothesis is that there is no difference between the patterns observed in these data and the model specified. So, unlike many cases where you are hoping to reject the null hypothesis, in this case I certainly do NOT want to reject the hypothesis that this is a good fit. With a chi-square of 8.707 on 8 degrees of freedom, the p-value is 0.3676, well above .05, so I cannot reject the null and this model is acceptable.
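If you want to check where that p-value comes from, here is a minimal Python sketch that recomputes it from the chi-square value and degrees of freedom reported above. The function name `chi2_sf_even_df` is my own, and the shortcut formula it uses only works when the degrees of freedom are even (as they happen to be here); it is not how Mplus does the calculation internally.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!,
    which is valid only when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles even, positive df")
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)

# Values taken from the Chi-Square Test of Model Fit output above.
p_value = chi2_sf_even_df(8.707, 8)
print(round(p_value, 4))  # 0.3676
```

A p-value above .05 here is the good outcome, because the null hypothesis is that the model fits.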

Another measure of goodness of fit is the root mean square error of approximation (RMSEA).

RMSEA (Root Mean Square Error Of Approximation)

| Estimate | 90 Percent C.I. | Probability RMSEA <= .05 |
|---|---|---|
| 0.011 | 0.000, 0.046 | 0.973 |

A common rule of thumb is that an acceptable model should have an RMSEA less than .05. You can see above that the estimate for RMSEA is .011, the 90 percent confidence interval runs from 0 to .046, and the probability that the population RMSEA is less than or equal to .05 is 97.3%. Again, consistent with our chi-square, the model appears to fit.
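For the curious, the RMSEA point estimate is built from the same chi-square and degrees of freedom, plus the sample size. The sketch below uses one common single-group formula; programs differ on whether they divide by N or N - 1, so treat the exact form as an assumption. The sample size is not reported in this post, so `hypothetical_n` is a made-up number for illustration and the result is not meant to reproduce the .011 above.

```python
import math

def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate for a single-group model.

    Assumes the form sqrt(max(chi2 - df, 0) / (df * n)); some programs
    use n - 1 in the denominator instead of n.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * n))

hypothetical_n = 500  # NOT from the post; purely illustrative
estimate = rmsea(8.707, 8, hypothetical_n)
print(f"RMSEA with n={hypothetical_n}: {estimate:.3f}")

# When chi-square is below its degrees of freedom, RMSEA is floored at 0.
print(rmsea(5.0, 8, hypothetical_n))
```

Notice that when the chi-square is close to its degrees of freedom, as it is here, the RMSEA has to be small for any reasonable sample size.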

| | Estimate | S.E. | Est./S.E. | Two-Tailed P-Value |
|---|---|---|---|---|
| F1 BY | | | | |
| Q1F1 | 1.000 | 0.000 | 999.000 | 999.000 |
| Q2F1 | 1.828 | 0.267 | 6.833 | 0.000 |
| Q3F1 | 1.697 | 0.235 | 7.231 | 0.000 |
| F2 BY | | | | |
| Q1F2 | 1.000 | 0.000 | 999.000 | 999.000 |
| Q2F2 | 1.438 | 0.291 | 4.943 | 0.000 |
| Q3F2 | 1.085 | 0.191 | 5.687 | 0.000 |

Here are the unstandardized estimates. By default, the loading of the first variable on each factor is fixed at 1, so, of course, there is no real standard error, z-value or p-value for it. It isn't really an estimate; it was set, and Mplus prints 999.000 as a placeholder in those columns. Let's look at the other two. Since the estimates are unstandardized, the more useful measure for us is the estimate divided by its standard error, for example 1.828 / .267. This is done for us in the column under Est./S.E., where it is reported as 6.833 (dividing the rounded figures shown here by hand gives a slightly different 6.846). You interpret these values in the same way as any z-score, with 1.96 as the critical value for a two-tailed test at the .05 level, and you can see in the last column that all of my variables loaded on the hypothesized factor with a p-value much less than .05.
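The Est./S.E. arithmetic is easy to redo yourself. Here is a small Python sketch that divides each printed estimate by its printed standard error and converts the result to a two-tailed p-value from the standard normal distribution. The names `loadings`, `z_scores` and `two_tailed_p` are mine. Because the inputs are the rounded values from the output, the ratios differ a little in the second decimal from what Mplus printed.

```python
import math

# (estimate, standard error) pairs copied from the output above.
# The first indicator of each factor is fixed at 1, so it is left out.
loadings = {
    "Q2F1": (1.828, 0.267),
    "Q3F1": (1.697, 0.235),
    "Q2F2": (1.438, 0.291),
    "Q3F2": (1.085, 0.191),
}

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

z_scores = {name: est / se for name, (est, se) in loadings.items()}

for name, z in z_scores.items():
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{name}: z = {z:.3f}, p = {two_tailed_p(z):.2g} ({verdict})")
```

Every ratio is far beyond 1.96, which is why the p-value column shows 0.000 for all of them.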

The next thing I look at is the residual variances. At this point my only concern is that I *not* have a residual variance that is negative. It makes no sense that you would have a negative variance because (among other reasons) variance is based on a sum of squares and squares cannot be negative. Also, when a residual variance is negative, the communality is greater than 1, meaning you have explained over 100% of the variance in that variable by its relation to the latent construct. This also makes no sense. These are referred to as Heywood cases and explained beautifully here (even though the linked documentation is from SAS, it applies to any confirmatory factor analysis).
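To make the Heywood logic concrete, here is a minimal Python sketch. I did not show my residual variances above, so the item names and numbers below are entirely hypothetical, and the helper names `find_heywood_cases` and `communality` are my own. The sketch assumes communality is one minus the share of a variable's total variance that is residual.

```python
def find_heywood_cases(residual_variances: dict) -> list:
    """Return the names of variables whose residual variance is negative."""
    return [name for name, var in residual_variances.items() if var < 0]

def communality(total_variance: float, residual_variance: float) -> float:
    """Share of a variable's variance explained by the factor(s)."""
    return 1.0 - residual_variance / total_variance

# Hypothetical residual variances for standardized items (total variance 1).
hypothetical_residuals = {"item_a": 0.60, "item_b": 0.35, "item_c": -0.04}

flagged = find_heywood_cases(hypothetical_residuals)
print("Heywood cases:", flagged)

for name in flagged:
    h2 = communality(1.0, hypothetical_residuals[name])
    print(f"{name}: communality = {h2:.2f} (greater than 1, which is impossible)")
```

The two symptoms are the same problem seen from two sides: a negative residual variance forces the communality above 1.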

The final thing I want to look at, for right now anyway, is the R-squared:

R-SQUARE

| Observed Variable | Estimate | S.E. | Est./S.E. | Two-Tailed P-Value |
|---|---|---|---|---|
| Q1F1 | 0.142 | 0.032 | 4.473 | 0.000 |
| Q2F1 | 0.475 | 0.065 | 7.256 | 0.000 |
| Q3F1 | 0.438 | 0.061 | 7.123 | 0.000 |
| Q1F2 | 0.174 | 0.045 | 3.883 | 0.000 |
| Q2F2 | 0.376 | 0.078 | 4.827 | 0.000 |
| Q3F2 | 0.179 | 0.044 | 4.057 | 0.000 |

You can see that the R-square is pretty decent overall. These are interpreted just like any other R-square values. I didn't show the standardized factor loadings here, but just take my word for it that the R-squared values are the standardized loadings squared. So the R-square for Q1F1, for example, is the proportion of variance in Q1F1 explained by factor 1.
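Since each item here loads on a single factor, you can run that relationship backwards: the square root of each R-square gives the size of the standardized loading it implies. Here is a short Python sketch using the values from the R-SQUARE output; the variable names are mine, and the sign of a loading cannot be recovered this way, so these are absolute values only.

```python
import math

# R-square estimates copied from the R-SQUARE output above.
r_square = {
    "Q1F1": 0.142,
    "Q2F1": 0.475,
    "Q3F1": 0.438,
    "Q1F2": 0.174,
    "Q2F2": 0.376,
    "Q3F2": 0.179,
}

# Absolute value of the standardized loading implied by each R-square.
implied_loadings = {name: math.sqrt(r2) for name, r2 in r_square.items()}

for name, loading in implied_loadings.items():
    print(f"{name}: R-square = {r_square[name]:.3f}, |standardized loading| = {loading:.3f}")
```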

I started this whole thing working with Mplus to do a factor analysis and overall, I’d have to call it a pretty painless experience.

# Comments

4 Comments so far

Is it possible to have acceptable overall model fit indices (e.g., CFI 0.96, RMSEA 0.04-0.07) while some items have non-significant loadings, yet R-square is significant for all of them?

Thanks for the beautiful explanation. Just to confirm whether I have understood completely: when judging the model fit, p > 0.05 is desirable because of the way the null hypothesis is framed; however, when looking at a factor loading's estimate (say, for Q2F1), a p-value < 0.05 is desirable, the same as in most regression analyses.

Am I correct?

If my chi-square test of model fit is significant (so I failed to reject the Ho) but my RMSEA is over 0.05 and my estimates are acceptable, does that mean the model is OK? or because my chi-square results are not good then I cannot accept the model?

Thank you.

Miriam: if the chi-square test of model fit is significant, this simply means you cannot accept the model as an exact fit; however, it does not mean you cannot accept the model. If the RMSEA is greater than 0.05 then it is not a close fit, and if the SRMR is greater than 0.08 then again it is not a close fit, hence poor. Beyond the model fit you need to look at the standardised residuals and normalised residuals; if these are very large (e.g. > 2.5) then there may be an unmodelled/unobserved factor we have not included in the model. Furthermore, if the R-squared values are very, very small then, substantively speaking, our model is not doing a good job of explaining the data or that particular variable/indicator. ** I AM NOT AFFILIATED WITH THIS WEBSITE I AM JUST A PASSER BY COMMENTING **