From time to time I get asked, “Can you recommend a book like Structural Equation Modeling for Dummies?” My unspoken thought is always, “You’re f***ing kidding me, right?” SEM isn’t the sort of thing done by dummies. Well, ask no more. If you want a straightforward, basic treatment of CALIS, the SAS procedure for structural equation modeling, you should definitely check out Yiu-Fai Yung’s presentation on CALIS and missing data.

Of course, in a 50-minute or so presentation you can’t do a comprehensive discussion of anything. Well, except maybe mangoes. I don’t think there is more than 50 minutes’ worth of stuff to say about mangoes.

Dr. Yung did not talk about mangoes, though. He talked about missing data.

As you well know, SAS, along with most other common statistical packages, uses pairwise deletion for missing data when it creates a correlation or covariance matrix. So, if 3 people are missing question 1 and 6 different people are missing question 2, the correlation between question 1 and question 2 will be based on (N-9) people. Let’s say you have no missing data for question 46. Then the N for the correlation of question 1 and question 46 is (N-3), and the N for the correlation of question 2 and question 46 is (N-6). You know this. Everyone knows this. I thought it was a very good idea to start an SEM presentation with information everyone knows.
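If you want to see that arithmetic spelled out, here is a small Python sketch (Python rather than SAS, just so you can run it anywhere). The respondent IDs and the total N of 100 are made up for illustration; the function simply subtracts, for each pair of questions, everyone who is missing either one.

```python
from itertools import combinations


def pairwise_n(n_total, missing):
    """Return the N used for each pair of variables under pairwise deletion.

    n_total: number of respondents in the data set.
    missing: dict mapping a variable name to the set of respondent IDs
             who skipped that variable (hypothetical example data).
    """
    return {
        # drop anyone who is missing either variable in the pair
        (a, b): n_total - len(missing[a] | missing[b])
        for a, b in combinations(sorted(missing), 2)
    }


# The example from the text: 100 respondents, 3 missing question 1,
# 6 *different* people missing question 2, nobody missing question 46.
missing = {"q1": {1, 2, 3}, "q2": {4, 5, 6, 7, 8, 9}, "q46": set()}
ns = pairwise_n(100, missing)
print(ns)  # {('q1', 'q2'): 91, ('q1', 'q46'): 97, ('q2', 'q46'): 94}
print("smallest pairwise N:", min(ns.values()))
```

Every cell of the correlation matrix ends up with its own N, which is exactly what makes the next problem possible.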

One problem with pairwise deletion is that you may end up with a matrix that is not positive definite. This is a bad thing. I wrote a blog post a while back on the sadness of non-positive definite matrices. This page from Ed Rigdon’s structural equation modeling site explains a little more. If the matrix is singular, getting its inverse involves division by zero, and you don’t need a huge amount of advanced mathematics to know that won’t end well. Pairwise deletion can also give you a matrix with negative eigenvalues, which is kind of the same as a negative variance, i.e., stupid.
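Here is a made-up, deliberately extreme Python example of how pairwise deletion can produce an impossible correlation matrix. Each pair of variables is only ever observed together in a different group of people, so x goes up with y, x goes up with z, but y goes down with z. No real set of three variables can do that. A negative determinant in a 3 x 3 symmetric matrix means the eigenvalues multiply to a negative number, so at least one of them is negative. (Listwise deletion would leave zero records here, which is its own kind of sad.)

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_r(rows, a, b):
    """Correlation of columns a and b using only rows where both are present."""
    pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical data: columns are (x, y, z); None means missing.
rows = [
    (1, 1, None), (2, 2, None), (3, 3, None),  # x and y go up together
    (1, None, 1), (2, None, 2), (3, None, 3),  # x and z go up together
    (None, 1, 3), (None, 2, 2), (None, 3, 1),  # but y and z go opposite ways
]

r_xy = pairwise_r(rows, 0, 1)
r_xz = pairwise_r(rows, 0, 2)
r_yz = pairwise_r(rows, 1, 2)
R = [[1.0, r_xy, r_xz],
     [r_xy, 1.0, r_yz],
     [r_xz, r_yz, 1.0]]
print("pairwise correlations:", r_xy, r_xz, r_yz)
print("determinant:", det3(R))  # negative, so R cannot be positive definite
```

Real data are rarely this dramatic, but the same thing can happen quietly when different subsets of people feed different cells of the matrix.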

If you try to use a matrix with unequal Ns as input to PROC FACTOR, it will give you a warning and use the minimum N for any pair as the N. It is a sad reflection on what I do in my free time that I know this. The next possibility is listwise deletion, where every record that has even one of your variables missing is deleted. That way you should end up with a positive definite matrix, but you may have lost a huge proportion of your data. Let’s assume, says Dr. Yung, that you have a small number of people missing data for each variable. If you have a large number of people missing one variable, say 50% didn’t answer question 3, that’s an issue, and you should look into what’s wrong with question 3. What the heck are you asking that half the people didn’t answer, their bra size?
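How fast does listwise deletion eat your data? Under the simplifying, hypothetical assumption that each variable is missing independently with the same small probability p, the share of complete records is (1 - p) raised to the number of variables. A quick Python sketch:

```python
def complete_case_fraction(p_missing, n_vars):
    """Expected share of records that survive listwise deletion.

    Assumes (hypothetically) that each of n_vars variables is missing
    independently with the same probability p_missing.
    """
    return (1 - p_missing) ** n_vars


for k in (5, 10, 20, 40):
    kept = complete_case_fraction(0.05, k)
    print(f"{k:2d} variables, 5% missing each: keep {kept:.0%} of records")
```

With 5% missing on each of 20 variables, that works out to keeping only about 36% of your records, so "a small number of people missing each variable" is not as comforting as it sounds once you have a lot of variables.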

Another possibility is mean imputation. For each variable, wherever someone is missing data, you substitute the mean of all of the other people. The problem with this is that it overstates your certainty because it understates your standard error. You are pretending you had those data, but you didn’t. The standard error is a function of N and of the variability in your data. For example, the standard error of the mean is the standard deviation divided by the square root of N. I know you already knew that, too. So, when you make N larger than it really is, you are dividing by a larger number, which means your resulting standard error will be smaller. [Think about this for a moment and you will realize it makes perfect sense. Take your time. I’ll wait.]
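Here is the same point in a few lines of Python, with eight made-up observed values and four made-up missing ones. Notice that mean imputation hits the standard error twice: N goes up, and the standard deviation goes down, because every imputed value sits exactly at the mean and adds nothing to the sum of squared deviations.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(N)."""
    return statistics.stdev(values) / math.sqrt(len(values))


observed = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical: 8 people answered
n_missing = 4                         # hypothetical: 4 people skipped it
# Mean imputation: plug the observed mean in for everyone who is missing.
imputed = observed + [statistics.mean(observed)] * n_missing

print("observed only:", len(observed), "cases, SE =", round(standard_error(observed), 3))
print("mean-imputed: ", len(imputed), "cases, SE =", round(standard_error(imputed), 3))
```

The mean doesn’t budge, but the standard error drops by about a third, with no new information whatsoever.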

With PROC CALIS, if you do this:

PROC CALIS METHOD = ML ;
MSTRUCT VAR = x1-x5 ;
RUN ;

then it will use the maximum likelihood method to arrive at a solution. The MSTRUCT statement gives the variables for which you want it to estimate the means and covariances; the default is to use all variables. The maximum likelihood method uses listwise deletion.

If you do PROC CALIS METHOD = FIML ; it will use the full information maximum likelihood method, which uses all of the information and does NOT do listwise deletion. You should read his paper to get a complete explanation. My best analogy is this: if you are familiar with multiple imputation, FIML is highly similar to doing a multiple imputation using PROC MI and then running your analysis. [If you’re not familiar with multiple imputation, this multiple imputation FAQ page is a quick and easy way to get to know it better.] The multiple imputation route takes multiple steps, though.
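For the curious, here is a rough Python sketch of the idea behind FIML. It is not CALIS code and not how CALIS is implemented internally; it assumes two normally distributed variables, and the function names and numbers are hypothetical. Each record contributes the likelihood of just the variables it actually has, so nobody gets deleted. CALIS then searches for the parameter values that maximize this total; the sketch only evaluates it at values you hand it.

```python
import math


def uvn_logpdf(v, mean, var):
    """Log density of a univariate normal."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (v - mean) ** 2 / var


def bvn_logpdf(x, y, mu, cov):
    """Log density of a bivariate normal with mean mu and 2x2 covariance cov."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mu[0], y - mu[1]
    # Quadratic form d' * inverse(cov) * d, using the closed-form 2x2 inverse.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * q


def fiml_loglik(rows, mu, cov):
    """Casewise (FIML-style) log-likelihood: every row uses what it has."""
    total = 0.0
    for x, y in rows:
        if x is not None and y is not None:
            total += bvn_logpdf(x, y, mu, cov)        # both observed
        elif x is not None:
            total += uvn_logpdf(x, mu[0], cov[0][0])  # only x observed
        elif y is not None:
            total += uvn_logpdf(y, mu[1], cov[1][1])  # only y observed
        # a row with nothing observed adds nothing
    return total


# Hypothetical data and parameter values, just to show every row being used.
rows = [(1.2, 0.8), (0.4, None), (None, -0.3), (2.0, 1.5)]
mu = (1.0, 0.5)
cov = ((1.0, 0.6), (0.6, 1.0))
complete = [r for r in rows if None not in r]
print("rows kept by listwise deletion:", len(complete))
print("rows contributing to FIML:", len(rows))
print("FIML log-likelihood:", fiml_loglik(rows, mu, cov))
```

The partial records still carry information about the means and variances of the variables they did answer, which is the "full information" part.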

So, if you wanted to do a regression and impute your variables, you would do PROC MI, then PROC REG, and then PROC MIANALYZE. PROC CALIS does the same thing all in one step. To see this for yourself, you could do those three steps and then go ahead and do the same thing with CALIS. You did know that CALIS does things besides SEM, right? Like regular path analysis models and regression. Confirmatory factor analysis, too.
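The last of those three steps, PROC MIANALYZE, combines the results from the imputed data sets using Rubin's rules. Here is a minimal Python sketch of that pooling for a single coefficient. The numbers are hypothetical, and MIANALYZE also reports degrees of freedom, confidence limits, and tests, which this skips.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed data sets (Rubin's rules).

    estimates: the coefficient from each imputed-data analysis
    variances: its squared standard error from each analysis
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # spread of estimates across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical slope for x2 from five imputed data sets.
est = [0.52, 0.47, 0.55, 0.50, 0.49]
var = [0.010, 0.011, 0.009, 0.010, 0.012]
q, t = pool_rubin(est, var)
print(f"pooled slope = {q:.3f}, pooled SE = {t ** 0.5:.3f}")
```

The between-imputation piece is what keeps multiple imputation honest about the uncertainty that single mean imputation pretends isn’t there.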

Of course you did, because if you think about it, a structural equation model is just all of those pieces put together. Do the regression with MI and MIANALYZE and then, with the same dataset, try this, and you’ll see what I mean:

PROC CALIS METHOD = FIML ;
PATH x1 <--- x2 x3 x4 x5 ;
RUN ;

So FIML is very much like doing a whole bunch of multiple imputations and then running your model. It uses all of the information, so you do not delete any observations. So he says, and I believe him, but I am still going to try it when I get home. You should read the paper. It was really good. Seriously, there was a lot more to it, and it was all extremely clear, with less discussion of mangoes than found here. It is number one on my list from here on out when people ask me whether a clear explanation of at least one aspect of SEM exists. Also, full information maximum likelihood is a relatively new concept to most people, so having it explained clearly was most helpful.



Despite the title, this is not the new show in Las Vegas. John Sall is Mr. JMP. In fact, it is said behind his back that JMP actually stands for John’s Macintosh Project, this being the product SAS started about the time they moved away from running SAS on Mac OS. That move forced everyone to run out and buy VMware or Parallels, unless they wanted to start running their statistics in Excel or with a slate and a flat rock, which is pretty much the same thing.

I met John Sall, and he was incredibly nice. He talked to me about structural equation modeling for a while and recommended two people whose presentations I should attend, who, he said, knew more about it than he did. I went to one of them, by Yiu-Fai Yung, and John was right: it was awesome. I did not think it was possible to do a simple, straightforward presentation on SEM.

The other person was Wayne Watson, who everyone has mentioned. He’s presenting tomorrow.

It wasn’t until hours later that I realized that this nice man is worth about a billion dollars and could afford to buy my children. Well, not my children (despite their annoying faults, which are legion, I am actually quite fond of them), but probably somebody’s children.

I had ignored JMP 9 when it came out because I figured it was pretty much the same thing as JMP 8 with a different digit at the end, that being the case with most new versions of software.

I was wrong.

Three cool things in JMP 9 (and thank you to the amazingly helpful Eric from JMP for demonstrating them all for me):

The amazingly helpful Eric from JMP, in case you wondered what he looks like.

  1. JMP now integrates with R. If you want to analyze your data in R, turn the results into some output (say, factor scores, or whatever floats your boat), and then pass it back to JMP without ever taking the 42 seconds for your virtual machine to start up and run SAS, you can. That leaves you more time to drink Chardonnay and eat jelly beans.
  2. There is now a map option in Graph Builder. This was actually way more interesting to me than R, because just last week I had said to myself how useful it would be if JMP had a map option, since that was the only reason I needed to use SAS for a presentation I was doing. Now JMP has a map option. In your good old Graph Builder window, in the bottom left corner, you now have an option that says SHAPE. Drag a variable, say STATE, to that corner and you can create a map. (See the example below of a heat map of movie data that took Eric a few seconds to create. A few seconds is the kind of response time I’m talking about, because I have the attention span of an ant and the patience of a two-year-old.)
  3. A really, really cool example of the new mapping feature was given by Eric using a cheetah. He had gotten the cheetah data from an organization doing research on endangered species. I include this information on the off chance that you find yourself asking, “Where can I get information on cheetah movements?” The answer is that you should ask Eric from JMP to hook you up. His cheetah data included the latitude and longitude of a specific cheetah (we’ll call him Bob the Cheetah) on specific dates. Using these data, he could create a bubble plot that showed the cheetah’s location over time. Since a plot of this with just an X axis and a Y axis is kind of boring, he overlaid it on a map, which let him produce, in just a few seconds, an animated bubble plot showing the movement of his endangered cheetah on a map over a period of time. He mentioned this to someone else, who contacted the organization, which then provided that person with data on FIVE cheetahs. This unnamed JMP person is doing a presentation at SAS Global Forum showing the movements of five cheetahs across time. Eric seemed a bit put out about having been outdone in the cheetah data realm. So, if you contact the same organization and they give you data on ten cheetahs to put on your JMP bubble plot/map, I recommend you don’t tell him.

This is a picture of the movie map. The moving cheetah one was cooler but I did not have the data to create the animation nor do I have JMP 9.

(Note to self: find client who loves me enough to buy me JMP 9)

I guess if this were more of a full-service blog, I would have taken a video of it.

Oh well.

The fourth thing in JMP that everyone has been talking about is the SEM interface. What Mr. JMP, a.k.a. John Sall, did tell me is that it is an interface to the SAS CALIS procedure, which both makes sense and makes me happier about it, since CALIS is a fairly well-documented and well-tested procedure.

Speaking of CALIS, the presentation I went to today by Dr. Yung absolutely rocked. I’ll write more about it tomorrow, but my hot tip is this: if you are interested in not just SEM but other things CALIS can do, like FIML (full information maximum likelihood) for regression, download his paper as soon as the SAS Global Forum 11 proceedings are out.



In the three minutes before the next statistics session, here’s some more on the opening session last night.

SAS Chief Marketing Officer Jim Davis made the comment that for every SAS product they are asking the question, “Is there a mobile application for this, and if so, what does it look like?”

He also showed some really cool applications of Social Media Analytics on an iPad 2. (Note to self: find out whether he used something like GoToMyPC or whether there is a way to run SMA on an iPad.)

The social media applications are really interesting here but I’m going to duck into the SEM sessions instead. Tough choice.
