The government is extremely fond of amassing great quantities of statistics. These are raised to the nth degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases.
Any time you do anything with any data, your first step is to consider the wisdom of Sir Josiah Stamp and check the validity of your data. One quick way to start is the Summary Tables task in SAS Enterprise Guide. If you are not familiar with SAS Enterprise Guide, it is a menu-driven application for using SAS for data analysis. You can open a program window and write code if you like, and I do that every now and then, but that’s another post. In my experience, SAS Enterprise Guide works much better with smaller data sets, defined by me, as the blog owner, as fewer than 400,000 records or so. Your mileage may vary depending upon your system.
How to do it:
1. Open SAS Enterprise Guide.
2. Open your data set (FILE > OPEN > DATA).
3. From the TASKS menu, select DESCRIBE and then SUMMARY TABLES. The window below will pop up.
4. Drag the variables to the roles you want for each. Since I have fewer than 450 usernames here, I just want a quick look at whether there are duplicates or errors (e.g., ‘gret bear’ is really the same kid as ‘grey bear’, with a typo). I also want to find out the number of problems each student attempted and the percent correct. So, I drag ‘username’ under CLASSIFICATION VARIABLES and ‘correct’ under ANALYSIS VARIABLES. You can have more than one of each, but it just so happens I only have one classification variable and one analysis variable I’m interested in right now.
5. Next, click on the tab at left that says SUMMARY TABLES and drag your variables and statistics where you want them. I want ‘username’ as the row, so I drag it to the side, and ‘correct’ as the column. N is already filled in as a statistic if you drag your classification variable to the table first. I also want the mean, so I drag that next to the N. Then, click RUN.
Wait a minute! Didn’t I say I wanted the percent correct for each student? Why would I select mean instead of percent?
Because PctN will simply tell me what percent of the total N the responses from this username make up. I don’t want that. Since the answers are scored 0 = wrong, 1 = right, the mean will tell me what proportion of the questions each student answered correctly. Hey, I know what I’m doing here.
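For readers who prefer code, a rough equivalent of the table built so far is sketched below with PROC TABULATE. This should be the procedure the Summary Tables task generates behind the scenes, but check the task’s code view to confirm in your version. The data set name `work.answers` and its rows are made up for illustration. Only the variable names `username` and `correct` come from this post.

```sas
/* Made-up illustration data: one row per problem attempted,
   correct = 1 (right) or 0 (wrong). Replace with your own data set. */
data work.answers;
  infile datalines dlm=',';
  length username $ 20;
  input username $ correct;
  datalines;
grey bear,1
grey bear,0
grey bear,1
gret bear,1
red fox,1
red fox,1
;
run;

/* One row per username: N = problems attempted,
   mean of a 0/1 variable = proportion answered correctly */
proc tabulate data=work.answers;
  class username;
  var correct;
  table username, correct*(n mean);
run;
```

In this made-up data, ‘grey bear’ has N = 3 and a mean of 2/3, about 0.67, which is exactly the share of problems answered correctly. The typo ‘gret bear’ shows up as a separate one-problem student, which is the kind of error this step is meant to catch.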
6. Look at the data! In looking at the raw data, I see that there are two erroneous usernames that shouldn’t be there. These data have been cleaned pretty well already, so I don’t find much to fix. Now, I want to re-run the analysis, deleting these two usernames.
7. At the top of your table, you’ll see an option that says “Modify Task”. Click that.
8. Under TASK FILTER, pull down the first box to show the variable ‘username’. Pull down the second box to show the option NOT EQUAL TO, and then click the three dots next to the third box. This will pull up a list of all of your values for usernames. You can select the one you want to exclude and click OK. Next to the three dots, pull down to select AND, then go through this again to select the second username you want to delete. You can also just type in the values, but I tend to do it this way because I’m a bad typist with a bad short-term memory.
11. From the DESCRIBE menu, again select SUMMARY STATISTICS.
12. Drag ‘correct_mean’ under ANALYSIS VARIABLES and click RUN.
The resulting table gives me my answer: the mean is .838, with a standard deviation of .26, for N = 424 subjects. So … the average subject answered 84% of the problems correctly. This, however, is just the first step. There are a couple more interesting questions to be answered with this data set before moving on.
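The filter from step 8 and the two-stage calculation in steps 11 and 12 can be sketched in code with PROC MEANS. The first stage computes one mean per student, and the second stage computes summary statistics on those means. The data, the output name `work.by_student` and the excluded username ‘test user’ are made-up placeholders. The real analysis excluded two usernames that are not named in this post.

```sas
/* Made-up illustration data: one row per problem attempted */
data work.answers;
  infile datalines dlm=',';
  length username $ 20;
  input username $ correct;
  datalines;
grey bear,1
grey bear,0
grey bear,1
red fox,1
red fox,1
test user,0
;
run;

/* Stage 1: one mean per student, excluding bad usernames
   ('test user' is a hypothetical placeholder). NWAY keeps only the
   per-username rows and drops the overall summary row. */
proc means data=work.answers noprint nway;
  where username not in ('test user');
  class username;
  var correct;
  output out=work.by_student n=n_attempted mean=correct_mean;
run;

/* Stage 2: describe the per-student proportions correct */
proc means data=work.by_student n mean std;
  var correct_mean;
run;
```

With these made-up rows, the two remaining students have proportions of 2/3 and 1. The mean of the student means is therefore about 0.833, while the pooled proportion over all five remaining responses is 4/5 = 0.80. The two numbers differ whenever students attempt different numbers of problems. The per-student approach weights every student equally, which matches a statement about “the average subject.”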
It’s been a good week for the darling daughters.
The Spoiled One graduated summa cum laude, also president of the senior class, and is heading to the east coast to attend a small liberal arts college where she has an academic scholarship and a spot on the soccer team.
The book co-authored by Darling Daughter One and Darling Daughter Three won International Sports Biography of the Year, and the two lovelies pictured above flew to London to receive the award.
The Perfect Jennifer has tenure now and is finishing out another year of being an outstanding teacher.
A couple of years ago, there was a book with the thesis that Chinese mothers are superior and all Americans are raising a bunch of lazy slackers. It irritated me, and I wrote a blog post with the title “Why American mothers are superior” because that seemed more professional than “Go Fuck Yourself”. And no, in all seriousness, I really don’t think that one race or country has better mothers, but I also think the idea that if we don’t regiment our children in lock-step for 18 years straight into MIT we are a bunch of losers is irritating as fuck.
You might think this is my rubbing-it-in post to say, “How you like me now? My kids are doing awesome.”
You’d be wrong. To paraphrase Erma Bombeck yet again, no mother should ever be arrogant because she can’t be sure that at any moment the principal won’t call to tell her that one of her children rode a motorcycle through the gymnasium.
I wanted to talk about something different – definitions of success that Tiger Mom Lady probably would not understand at all.
A friend of mine has a son in his mid-twenties who lives at home. He earned a degree from a two-year college. He is not crushing it as a hedge fund manager, but rather has a regular job with benefits. I’m sure Tiger Mom would be dismayed if he were her kid.
My friend was distraught over the situation at work. The company had been acquired and reorganized. Her new boss was a nightmare, and she came home in tears more often than not. Despite over a decade of good performance, she was afraid she was going to be laid off and was becoming depressed and stressed. They couldn’t afford to make the payments on their house on one income, and they had already lost a home back in 2008 when the housing market imploded. They were the collateral damage of those hedge fund managers.
It was at this point that her son (remember him?) stepped up. He had been living at home to save money for a down payment on a house of his own. Since he is single, has no children and gets along well with his parents, it seemed like a good arrangement, and he was paying them rent, but a lot less than it would cost to go out and get his own apartment. Plus, there were those home-cooked meals. He said something like this:
Look, you took care of me for 26 years. I make enough money now to cover the mortgage. If you are that unhappy about your job, quit. Even if you don’t quit your job, at least quit worrying about being laid off. I’ll pick up any slack. Between Dad and me, we got you covered.
Look at this family – they all love each other, the mom, dad and son. They get along well enough that he feels comfortable living at home to save money. Her son is hard-working and appreciates the fact that his parents have done what they could to support him. He can take the perspective of another person, see the stress his mother is experiencing and offer to do what he can to alleviate it out of appreciation for what they have done for him.
In my view, my friend is a success as a mother and her son is a success as a human being.
Where we left off, I had created some parcels and was going to do a factor analysis later. Now, it’s later. If you’ll recall, I had not found any items that correlated significantly with the food item and also made sense conceptually. For example, it correlated highly with attending church services, but that didn’t really have any theoretical basis. So, I left it as a single variable. Here is my first factor analysis.
proc factor data=parcels rotate=varimax scree;
  var socialp1-socialp3 languagep spiritualp spiritual2 culturep1 culturep2 food;
run;
You can see from the scree plot here that there is one factor way at the top of the chart, with the rest scattered at the bottom. Although the minimum eigenvalue of 1 criterion would have you retain two factors, I think that is too many, for both logical and statistical reasons. The eigenvalues of the first two factors, by the way, were 4.74 and 1.10.
Even if you aren’t really into statistics or factor analysis, I hope that this pattern is pretty clear. You can see that every single thing except for the item related to food loads predominantly on the first factor.
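To have PROC FACTOR keep the single factor that the scree plot suggests, the NFACTORS= option overrides the default eigenvalue-of-1 rule, which retained two factors here. The sketch below assumes the `parcels` data set created by the parcel data step in this series. ROTATE= is dropped because a single factor leaves nothing to rotate.

```sas
/* Assumes the parcels data set from the parcel data step in this series.
   NFACTORS=1 overrides the default minimum-eigenvalue-of-1 rule;
   ROTATE= is omitted because one factor cannot be rotated. */
proc factor data=parcels scree nfactors=1;
  var socialp1-socialp3 languagep spiritualp spiritual2 culturep1 culturep2 food;
run;
```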
These results are interesting in light of the discussion on small sample size. If you didn’t read it, the particular quote in there that is relevant here is:
“If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used.”
Final Communality Estimates: Total = 5.845142
These communality estimates are also relevant but it is nearly 1 am and I have to be up at 6:30 for a conference call, so I’ll ramble on about this some more next time.
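A quick consistency check on those numbers assumes that PROC FACTOR’s defaults were used. Those defaults are principal components with prior communalities of 1, retaining factors with eigenvalues above 1, which here means two. Under those defaults, the final communalities sum to the retained eigenvalues. The nine variables in the analysis also contribute a total variance of 9:

$$\sum_{j=1}^{9} h_j^2 = \lambda_1 + \lambda_2 = 4.74 + 1.10 = 5.84 \approx 5.845, \qquad \frac{5.845}{9} \approx 0.65$$

So the two retained factors account for roughly 65% of the total variance.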
First of all, what are parcels? Not the little packages your grandma left on the table in the hall when she came back from shopping. Well, not only that.
In factor analysis, parcels are simply the sum of a small number of items. I prefer using parcels when possible because both basic psychometric theory and common sense tell me that a combination of items will have greater variance and, c.p., greater reliability than a single item.
Just so you know that I learned my share of useless things in graduate school, c.p. is short for the Latin ceteris paribus, which translates to “other things being equal.” The word “etcetera,” meaning other things, has the same root.
Now you know. But I digress. Even more than usual. Back to parcels.
As parcels can be expected to have greater variance and greater reliability, harking back to our deep knowledge of both correlation and test theory we can assume that parcels would tend to have higher correlations than individual items. As factor loadings are simply correlations of a variable (be it item or parcel) with the factor, we would assume that – there’s that c.p. again – factor loadings of parcels would be higher.
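Two classical test theory results make the reasoning above concrete. Both assume parallel items (equal true-score variance and equal error variance), which is roughly what “c.p.” is standing in for. The numbers below are illustrative and do not come from this data set.

The Spearman-Brown formula gives the reliability of a sum of $k$ parallel items that each have reliability $\rho$. The attenuation relation shows that an observed correlation equals the true-score correlation shrunk by the square root of the two reliabilities:

$$\rho_{k} = \frac{k\rho}{1 + (k-1)\rho}, \qquad r_{XY} = \rho_{T_X T_Y}\sqrt{\rho_{XX'}\,\rho_{YY'}}$$

For example, with $\rho = .50$ per item, a two-item parcel and then the resulting correlations for a true-score correlation of .80 work out as:

$$\rho_{2} = \frac{2(.50)}{1 + .50} = \frac{2}{3} \approx .67$$
$$\text{items: } .80\sqrt{.50 \times .50} = .40, \qquad \text{parcels: } .80\sqrt{\tfrac{2}{3}\times\tfrac{2}{3}} = .80 \times \tfrac{2}{3} \approx .53$$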
Jeremy Anglim, in a post written several years ago, talks a bit about parceling and concludes that it is less of a problem in a case, like today, where one is trying to determine the number of factors. Actually, he was talking about confirmatory factor analysis but I just wanted you to see that I read other people’s blogs.
The very best article on parceling was called To Parcel or Not to Parcel and I don’t say that just because I took several statistics courses from one of the authors.
To recap this post and the last one:
I have a small sample size, and due to the unique nature of a very small population, it is not feasible to increase it by much. I need to reduce the number of items to an acceptable subjects-to-variables ratio. The communality estimates are quite high (over .6) for the parcels. My primary interest is in the number of factors in the measure and finding an interpretable factor.
So… here we go. The person who provided me the data set went in and helpfully renamed the items that were supposed to measure socializing with people of the same culture ‘social1’, ‘social2’ etc, and renamed the items on language, spirituality, etc. similarly. I also had the original measure that gave me the actual text of each item.
Step 1: Correlation analysis
This was super-simple. All you need is a LIBNAME statement that references the location of your data and then:
PROC CORR DATA = mydataset ;
VAR firstvar -- lastvar ;
RUN ;
In my case, it looked like this:
PROC CORR DATA = in.culture ;
VAR social1 -- art ;
RUN ;
The double dash is interpreted as ‘all of the variables in the data set located from var1 to var2’. This saves you typing if you know all of your variables of interest are in sequence. I could have just used a single dash if they were named the same, like item1-item17, and then it would have used all of the variables with those names regardless of their location in the data set. The problem I run into there is knowing what exactly item12 is supposed to measure. We could discuss this, but we won’t. Back to parcels.
Since you want to put together items that are related both conceptually and empirically (that is, the things you think should correlate actually do), you first want to look at the correlations.
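The small check below shows the difference between the two kinds of variable list, using made-up variable names. A single dash is a numbered range list, so it picks up item1, item2 and item3 wherever they are stored. A double dash is a name range list, so it picks up everything stored between the two names, in data set order.

```sas
/* Variables are stored in the order item1, other, item3, item2 */
data ranges;
  input item1 other item3 item2;
  /* Numbered range list: item1 + item2 + item3 = 1 + 2 + 3 */
  s_single = sum(of item1-item3);
  /* Name range list: item1 through item2 in storage order,
     which sweeps in other and item3 as well: 1 + 9 + 3 + 2 */
  s_double = sum(of item1--item2);
  datalines;
1 9 3 2
;
run;

proc print data=ranges noobs;
run;
```

PROC PRINT should show s_single = 6 and s_double = 15. The double-dash list silently includes `other` because it happens to sit between item1 and item2.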
Step 2: Create parcels
The items that were expected to assess similar factors tended to correlate from .42 to .67 with one another. I put these together in a very simple data step.
data parcels ;
set out.factors ;
/* Each parcel is the sum of items expected to measure the same thing */
socialp1 = social1 + social5 ;
socialp2 = social4 + social3 ;
socialp3 = social2 + social6 + social7 ;
languagep = language2 + language1 ;
spiritualp = spiritual1 + spiritual4 ;
culturep1 = social2 + dance + total;
culturep2 = language3 + art ;
run ;
There was one item that asked how often the respondent ate food from the culture, and that didn’t seem to have a justifiable reason for putting with any other item in the measure.
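A parcel data step like the one above has one detail worth checking. The + operator returns a missing value if any item is missing, while the SUM function adds whatever values are not missing. Which behavior you want is a judgment call. SUM quietly mixes parcels built from different numbers of items, so MEAN, or leaving the parcel missing, may be safer. The values below are made up for illustration.

```sas
data check_missing;
  input social1 social5;
  plus_version = social1 + social5;      /* missing if either item is missing */
  sum_version  = sum(social1, social5);  /* adds the non-missing items only */
  datalines;
3 4
3 .
;
run;
```

The first row gives 7 for both versions. In the second row, plus_version is missing while sum_version is 3.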
Step 3: Conduct factor analysis
This was also super-simple to code. It is simply
proc factor data=parcels rotate=varimax scree;
  var socialp1-socialp3 languagep spiritualp spiritual2 culturep1 culturep2;
run;
I actually did this twice, once with and once without the food item. Since it loaded by itself on a separate factor, I did not include it in the second analysis. Both factor analyses yielded two factors that every item but the food item loaded on. It was a very nice simple structure.
Since I have to get back to work at my day job making video games, though, that will have to wait until the next post, probably on Monday.
Someone handed me a data set on acculturation that they had collected from a small sample of 25 people. There was a good reason that the sample was small: think African-American presidents of companies with over $100 million in sales, or Latina neurosurgeons. Anyway, small sample; you can’t reasonably expect to get 500 or 1,000 people.
The first thing I thought about was whether there was a valid argument for a minimum sample size for factor analysis. I came across this very interesting post by Nathan Zhao where he reviews the research on both a minimum sample size and a minimum subjects to variables ratio.
Since I did the public service of reading it so you don’t have to, (though seriously, it was an easy read and interesting), I will summarize:
- There is no evidence for any absolute minimum number, be it 100, 500 or 1,000.
- The minimum sample size depends on the number of variables and the communality estimates for those variables.
- “If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used.”
- There should be at least three measured variables per factor and preferably more.
This makes a lot of sense if you think about factor loadings in terms of what they are: correlations of an item with a factor. With correlations, if you have a very large correlation in the population, you’re going to find statistical significance even with a small sample size. It may not be precisely as large as your population correlation, but it is still going to be significantly different from zero.
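To put numbers on that, here is the usual t test for a Pearson correlation with n = 25. The illustrative value r = .60 is the loading threshold in the quote above, and a loading is treated like an ordinary correlation, as the paragraph above does. The critical value of about 2.069 is the two-tailed .05 cutoff for 23 degrees of freedom.

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{.60\sqrt{23}}{\sqrt{1-.36}} = \frac{.60 \times 4.796}{.80} \approx 3.60 > 2.069$$
$$r_{\text{crit}} = \frac{t_{\text{crit}}}{\sqrt{t_{\text{crit}}^2 + n - 2}} = \frac{2.069}{\sqrt{4.281 + 23}} \approx .40$$

With 25 respondents, then, any correlation above about .40 is significant at the .05 level.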
So … this data set of 25 respondents that I received originally had 17 items. That seemed clearly too many for me. I thought there were two factors, so I wanted to reduce the number of variables down to 8, if possible. I also suspected the communality estimates would be pretty high, just based on previous research with this measure.
Here is what I did next :
- Parallel analysis
- Factor Analysis
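Parallel analysis compares the eigenvalues from your data with eigenvalues from random data of the same size, and it retains factors only while the real eigenvalue beats the random one. Below is a minimal SAS sketch of the random half under several illustrative assumptions. It uses 25 respondents, 8 variables and 100 replications, all set as adjustable macro variables. It also assumes that PROC FACTOR’s OUTSTAT= data set stores the eigenvalues in a row with _TYPE_='EIGENVAL', the largest in the first variable column. Check that layout in your SAS version.

```sas
/* Illustrative settings: match these to your own data */
%let nobs  = 25;   /* respondents */
%let nvars = 8;    /* variables (items or parcels) */
%let nreps = 100;  /* number of random data sets */

/* Random normal data: &nreps data sets of &nobs rows by &nvars columns */
data random;
  call streaminit(2016);
  array v{&nvars} v1-v&nvars;
  do rep = 1 to &nreps;
    do obs = 1 to &nobs;
      do j = 1 to &nvars;
        v{j} = rand('normal');
      end;
      output;
    end;
  end;
  drop obs j;
run;

/* Eigenvalues of each random correlation matrix, written to OUTSTAT=
   (default principal-components method, prior communalities of 1) */
proc factor data=random outstat=randstat nfactors=&nvars noprint;
  by rep;
  var v1-v&nvars;
run;

/* Mean and 95th percentile of the k-th random eigenvalue:
   column v1 holds the largest eigenvalue, v2 the second, and so on */
proc means data=randstat(where=(_type_='EIGENVAL')) mean p95;
  var v1-v&nvars;
run;
```

Compare the mean (or the 95th percentile) for v1, v2, … with the first, second, … eigenvalues from a PROC FACTOR run on the real data. Keep factors only while the observed eigenvalue is larger. This is Horn’s method in its simplest form; other implementations differ in details such as which percentile they use.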
I can’t believe I haven’t written at all on parceling before and hardly anything on the parallel analysis criterion, given the length of time I’ve been doing this blog. I will remedy that deficit this week. Not tonight, though. It’s past midnight, so that will have to wait until the next post.
In a very random life event, I was asked a lot of questions recently by people exploring making a movie about my life. This is not the interesting part, because in Hollywood people are always talking about making movies that come to nothing …
The interesting thing was how many times the answer to a question was,
Sister Marion, my sixth-grade math teacher.
I was not a very prepossessing child.
In fact, if there was such a word as anti-possessing (which there is not), that would have defined me well. I was short, overweight, often dressed in my brother’s too-big clothes because I was too lazy to look for my own uniform and didn’t care about my appearance. I was also the type of child who knew the definition of words like ‘prepossessing’ and mocked other children, and teachers, if they did not. It probably doesn’t surprise you to hear that I was not wildly popular.
My grades were not the best, partly because I often forgot my homework in the mad rush to get five kids out the door early enough that my mother could make it to work on time. Partly it was because I am EXTREMELY near-sighted, a fact no one discovered until the third or fourth grade (thank you, Lions Club vision screening!) and even after that I usually could not see the board because I could not manage to have a pair of glasses for more than a few weeks without losing them. Glasses were not cheap and my family didn’t have a lot of extra cash so it would usually be months between pairs.
Then, I got the chicken pox and was out of school for a week. Despite all of the bewailing about how stupid today’s children are compared to yesteryear, back then we learned fractions in sixth grade, not fifth, and I had missed the entire week when these were introduced. A petty teacher (and the world has too damn many of those) might have been gratified by the fact that a pain-in-the-ass, know-it-all kid was finally going to be put in her place.
I’d like to think that Sister Marion realized that the only thing I felt I had going for me was being smart and that’s why I had to rub everyone’s face in it. Maybe she realized I needed a friend, and a new perspective.
Whatever it was, she paired me up with another child in the class, Diane, who wasn’t a star student overall, but was very good at math, and told her to explain to me what we had learned while I was out. Not only did I get caught up on fractions, but I learned not to underestimate people based on appearances or first impressions. Just because a person wasn’t a great reader didn’t mean she couldn’t be good at math. Diane and I actually had conversations, and she introduced me to another friend of hers, also named Diane. I called one of the Dianes on the phone – it was the first time I had ever had another kid at school to call – and I was 11.
Sister Marion was nice to me. If you think every teacher is nice to every child then perhaps you need to go back and read the beginning of this post. When I think back, I can only think of two teachers I had before I got expelled from the public school system who were consistently nice to me, Sister Marion and Mr. Cartwright, my 8th grade algebra teacher.
It’s probably no coincidence that I’m good at math and made a career of it.
It’s funny how often when they asked me questions, Sister Marion’s name came up.
Did you have a teacher who you particularly admired?
Was there a teacher who interested you in mathematics?
What made you decide that you wanted to teach?
Who were your role models in life?
I’m not saying that she was the only person who was a role model or who made a difference. However, she was exactly what we try to be at 7 Generation Games: a change in the trajectory that made me shift from doing all right in school with no effort to doing better and better with more effort. She was a person who made me think I could be more than ordinary.
Of course, I make an effort to encourage the students who show exceptional effort and ability. Then, I remember Sister Marion and make an extra effort to also encourage students who are annoying, rude and don’t do their work.
When I think of Sister Marion, I am reminded yet again of the truth of that saying:
I touch the future. I teach.
Want to see what I did with math once I grew up?
Since I already called my mom on Mother’s Day, I thought that I’d talk about another woman who was important in my life, a mentor, who I probably haven’t talked to in 20 years. (I know, I’m such an ungrateful bitch.)
Dr. Jane Mercer was not even in the same department as me. My dissertation was an analysis of the psychometric properties of the Wechsler Intelligence Scale for Children-Revised, Mexicano, and she was a sociologist renowned for her expertise on the impact of social and cultural factors on intelligence test scores.
Shortly after I finished the first draft of my dissertation, my advisor received some distressing news (no, it wasn’t that he was my advisor, he already knew that). He and his wife had begun dating as very young teenagers. Other than his military service during World War II, they had been together ever since. When she was diagnosed with cancer, he walked into the dean’s office and just said, simply,
… And went on sabbatical with about a four-minute notice.
Everyone completely understood. His colleagues took over his committee responsibilities. As the doctoral student of his who was furthest along, I taught his courses, like inferential statistics.
I was his only doctoral student writing a dissertation, and someone needed to step in to supervise my research. That was Dr. Jane Mercer.
Not only did she read every draft of my dissertation, recommend articles to read and journals to submit to, and introduce me to people at conferences (not a gesture to be underestimated when one is looking for a position), but, more importantly, she provided advice on life.
Here are a few of the things I learned from Dr. Mercer just by observing her.
1. NO MATTER HOW FAR YOU HAVE GONE DOWN THE WRONG ROAD, TURN BACK! Taped over her desk, Dr. Mercer had a piece of paper with this proverb typed on it. No matter how far you’ve gone down the wrong road, turn back. We’re told in America that quitters never win, bloom where you’re planted, you can’t fight city hall, you’re never going to win against big corporations. Making a change in anything from your employer to your gym to the crowd you hang around with can be treated as an act of disloyalty. People stay in situations long, long after they should have left because they are ‘committed’, ‘invested’, ‘cannot leave now’. The unwillingness to turn back after going a long way down the wrong road is the second-biggest barrier to most people’s happiness. The biggest is fear, which leads me to …
2. Have the courage to speak the truth as you see it. Being the most brilliant researcher in the world does no good to anyone if you are afraid to publish and publicize unpopular results. In the 1970s, many people thought intelligence tests were the answer to psychology’s long history of physics envy. At last, we were a real science with actual numbers, not this whacko dream interpretation stuff but measurement. Hey, IQ even has a math word, quotient, in the name. Not to mention, companies like The Psychological Corporation and Educational Testing Service were big business (still are). Jane Mercer sincerely believed intelligence tests systematically underestimated the intelligence of low-income, minority children. In the case of Diana v. State Board of Education, a lawsuit was filed on behalf of a few Mexican-American children, including a little girl who spoke Spanish as her first language, was tested in English and determined to be mentally retarded. All of the big names (and big money) lined up on the side of the State Board of Education, and Jane spoke up for the side of Diana. This may not seem like much now, but back then she had to stand up to a LOT of opposition; those were not happy times. She did it anyway.
3. Yes, you CAN have a job and a family. Men do it all the time. Jane was older than me and of that generation that was told women could either have a career or children but not both. By the time I met her, her four sons were all adults. She and her husband got along fine and seemed to agree that since they were both parents of these children, they could both engage in parenting them. We couch things in daunting terms: “Can women have it all?” Of course no one has it ALL. I’m finishing this blog post in the Denver airport. That empty spot you see at the end of the jetway is where the plane I am taking back to Los Angeles should be.
I would like to have an uneventful flight out of Denver airport, just once. You see, none of us can have it ALL, but no one asks men whether they think they can manage a career and children.
4. Being the first or only woman in an area doesn’t mean you have to go along with that happy-to-be-here crap. Yes, she was a tenured professor at the University of California, which had damn few of them, but that didn’t mean she had to accommodate in any way because of her gender. Don’t take on female doctoral students because you don’t want to be type-cast as ‘only a good advisor for women’? Screw that! If they needed an advisor and she could help, she was on board. Don’t speak out about intelligence testing because people will think you are shrill or too emotional, not a real academic? Screw that twice! As you can see, I have taken that lesson deeply to heart, but with fewer of her limits on profanity.
Woo-hoo – plane boarding now – only 90 minutes late – gotta go. Happy Mother’s Day.
At first, I was thinking it wasn’t right to have a favorite paper, but then I realized that was idiotic. It’s not like these papers (or their presenters) are my children.
My favorite paper was,
If you’re not a statistician, props to you for reading after that first sentence, especially since some of the lessons apply to any conference.
- You don’t always have to present or attend presentations on whatever is shiny and new. The techniques he presented, like PROC GLMSELECT, a procedure for selecting the best model, are not brand new. I remember when it was first added to SAS/STAT and thinking it was a way cool idea I should use, but then I didn’t. As you can see from the graph above, it can be pretty easy to select the best model. Looks a lot like a scree plot, doesn’t it? This also further supports my point that visual displays of data, like the one above, are everywhere and taking over. Now that I have been reminded of its existence, I’m looking for a use for it so I can really remember it. Unfortunately, this is a method for general linear models, and what I am most interested in right now has a binomial outcome: whether a player finished a game or not.
- Don’t stop learning when you go home. I remembered that there was also an example in this paper that used HPGENSELECT for generalized linear models, including binomial distributions. So, I am going to try that out with this dataset. One of the areas where I am improving is actually reading all of those papers I mean to get around to when I get home. Whether it is a paper you attended, but is now jumbled around in your brain with the other 25 sessions, or one you could not attend because it conflicted with something else, when you get home, you should read it. Conferences can be expensive and you want to get the most out of that time and money you spent.
- Of course, I learned about sparse regression, quantile regression, classification and regression trees and more, which you can, too, if you follow my advice from #2.
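Below is a minimal PROC GLMSELECT run for anyone who wants to try the procedure mentioned in the first point. It uses the SASHELP.BASEBALL sample data that ships with SAS, not the presenter’s data. Stepwise selection with SBC as the criterion is just one illustrative choice. PLOTS=CRITERIA draws fit criteria by selection step, which can look rather like a scree plot.

```sas
ods graphics on;

/* Stepwise selection, using SBC to decide which effects enter or leave;
   PLOTS=CRITERIA plots the fit criteria at each step */
proc glmselect data=sashelp.baseball plots=criteria;
  class league division;
  model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                    crAtBat crHits crHome crRuns crRbi crBB
                    league division nOuts nAssists nError
        / selection=stepwise(select=sbc);
run;
```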
Okay, well there is a lot more to say about SAS Global Forum and my adventures with HPGENSELECT but we have a new game, Forgotten Trail, coming out for sale tomorrow, so back to work.
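The sketch below shows how HPGENSELECT might handle a binary outcome like finishing a game. The data are simulated and made up. Only `minutes` drives the outcome, and `pretest` and `noise` are decoys. All variable names are hypothetical. Stepwise selection is one of several METHOD= choices. DISTRIBUTION=BINARY with the default logit link gives a logistic regression.

```sas
/* Simulated, made-up data: FINISHED = 1 if the player finished the game.
   Only MINUTES affects the outcome; PRETEST and NOISE are decoys. */
data work.players;
  call streaminit(123);
  do id = 1 to 500;
    minutes  = 60 * rand('uniform');
    pretest  = rand('normal', 50, 10);
    noise    = rand('normal');
    finished = rand('bernoulli', logistic(-2 + 0.08 * minutes));
    output;
  end;
run;

/* Logistic regression (binary distribution, default logit link)
   with stepwise effect selection */
proc hpgenselect data=work.players;
  model finished(event='1') = minutes pretest noise / distribution=binary;
  selection method=stepwise;
run;
```

With data like these, stepwise selection would be expected to keep `minutes` and drop the decoys, though with random data that is not guaranteed.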
In the past month, I have been in five cities and three states and met a lot of people. Lately, I’ve had conversations with several families that have adult “children” living at home in their 30s, 40s and even 50s. By then, the children’s girlfriend/boyfriend and children are also living in the home. Often, this whole inverted pyramid is supported by one person in his or (usually) her sixties or seventies.
It’s not often that I don’t speak up, but when I don’t, it is because the incongruous situation just leaves me at a loss for words.
Seriously, when you are 72 and calling your 50-year-old son’s probation officer, what the hell are you thinking?
One of my daughters had just come back from the Olympics. She had not finished high school and was scheduled to take the GED exam, but she was driving across country and ran into some bad weather. I called the GED center to reschedule the exam.
The nice lady on the phone asked me,
Excuse me, ma’am, but how old is your daughter?
Startled, I replied,
The woman said,
Then, let me tell you something. If she’s 21 years old, you should not be calling for her. She’s old enough to make these calls herself.
The excellent point that this woman made was that we make excuses for our children. She was training for the Olympics. She was caught in a snow storm.
The fact is, Ronda was perfectly capable of making arrangements herself. She came home, rescheduled the exam and passed it with flying colors, took a couple of college courses, worked a couple of jobs and ended up being very successful in multiple chosen careers.
There are two questions here:
- Why do our children rely on us long after they are capable of relying on themselves?
- Why do we do for our children long after they are capable of doing for themselves?
The answer to the first is easy, I think. It is comfortable, convenient and they are accustomed to it. I’ve heard more than one adult respond with outrage to the suggestion they should start paying rent,
“I’ve lived here for 29 years without paying rent. Why should I start now? Just to make it easier on my mom?”
Honestly, I want to slap those people. What the hell is wrong with you that you cannot see that you are making life more difficult than it needs to be for someone you supposedly love? Not all of those people are evil. They have fallen into a pattern where they live in someone else’s house, drive their car, eat their food and they consider it all the same as if they earned it themselves. They’ve never grown up.
What about those parents? Why do they allow this? Why are they still calling the dentist, unemployment office, probation officer, community college counselor long after their “children” are adults?
Like the younger generation, part of it is habit. They have been taking care of Johnny since he came out of the womb, cleaning up his messes, solving his problems, and it is hard to break that habit. I’m sure they love their lazy, inconsiderate offspring and don’t want to see him miss the classes he needs, get sent to jail or lose his general assistance check.
On the other hand, there is a good dose of guilt handed out to parents who kick their children to the curb. People have told me that I was an awful mother for telling all of my children that they had three choices when they turned 18: get a job, get into college or get out.
Especially in Ronda’s case, people have questioned whether I really loved her or believed in her if I told her she had a year to make it with this mixed martial arts bullshit and that was as far as I was willing to go. Note that Ronda herself has never thought this.
Frankly, I don’t get it. As I told all of my daughters,
I’m an old woman. I shouldn’t have to be supporting an able-bodied adult.
None of them expected that I would, but other people have called me heartless – and a lot worse. What about supporting their dreams?
Here is where some of the parents of Johnny Leech need some tough love. So, if that describes you, Mom or Dad, listen up!
Parents are people, too. They are allowed to have their own dreams.
I have wanted to make educational games since I was in graduate school 30 years ago. Now that my children are on their own, I have the freedom to take the risks that running a start-up entails.
Maybe your dream is just to sit on your porch drinking iced tea and not have to get up and go to work. That is perfectly fine. Do it!
What about Johnny? What will he do if you kick him out? He’ll end up living on the street! He’ll starve!
Okay, probably not. He’ll figure it out. And, if he doesn’t, that’s his choice. Yes, it will probably be hard at first. Oh, well.
You’re entitled to your own life.
The nice thing about going to SAS Global Forum is that it’s the gift that keeps on giving. Long after I have gone home, there are still points to ponder.
Visual analytics is big, and not just in the sense that there is a product by that name (which I have never used). Every presentation, no matter how ‘tech-y’, now makes very effective use of graphics. If I were the type of person to say I told you so, I would mention that I predicted this six years ago, after I went to SAS Global Forum in 2010.
Richard Cutler’s presentation on PROC HPSPLIT, which was really excellent, made extensive use of graphics to illustrate fairly complex models.
You can create classification and regression trees (the model you can’t see in this tiny graphic on the left) and you can drill down into sub-trees for further analysis.
Sometimes your classification tree is very easily interpretable. For example, in this case here from the same presentation, each split represents a different type of vegetation/ land surface – water, two different species of tree, etc.
Speaking of classification, regression and PROC HPSPLIT ….
If you didn’t know, now you know
PROC HPSPLIT is a high-performance procedure, now available in SAS/STAT, for fitting classification and regression trees, which makes it useful for data sets where relationships are non-linear. It includes options for pruning trees and a whole lot more. It is now available on a single computer, not limited to high-performance computing clusters. So, yay!
A regression tree is what you get when your dependent variable is continuous, and a classification tree when it is categorical, as in the vegetation example above.
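Below is a minimal classification tree with PROC HPSPLIT. It uses the SASHELP.HEART sample data that ships with SAS rather than anything from the presentation. The choice of inputs, the entropy splitting rule and cost-complexity pruning are illustrative. Listing the target `status` in the CLASS statement is what makes this a classification tree.

```sas
ods graphics on;

/* Classification tree: the categorical target STATUS is listed in CLASS */
proc hpsplit data=sashelp.heart;
  class status sex smoking_status;
  model status = sex smoking_status ageatstart cholesterol systolic weight;
  grow entropy;          /* splitting criterion for a categorical target */
  prune costcomplexity;  /* grow a large tree, then prune it back */
run;
```

For a regression tree, a continuous target (cholesterol, say) would be left out of the CLASS statement. GROW VARIANCE is the usual splitting criterion for a continuous target; check the documentation for your SAS/STAT version.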
On a semi-related note, graphics can even be used to show when a data set is not suited to a linear model, as in the example below, also from Cutler’s presentation. You can see that all of the 1’s are in two quadrants and all of the 0’s are in the other two. Yes, you COULD fit a regression line to these data, but that is not the best fit.
Also, on the related point that visualizing data, like all of statistics, really, is an iterative process: I think this would be more obvious if the quadrants were color-coded.
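The code below builds a made-up data set with the same pattern as the example above, where y = 1 in two opposite quadrants and 0 in the other two. It then draws a scatter plot that colors points by outcome and adds reference lines at zero so the four quadrants stand out. Coloring the points stands in for shading whole quadrants, which would take more work. The seed and sample size are arbitrary.

```sas
/* Made-up data: y = 1 in two opposite quadrants, 0 in the other two */
data xor;
  call streaminit(42);
  do i = 1 to 200;
    x1 = 2 * rand('uniform') - 1;
    x2 = 2 * rand('uniform') - 1;
    y  = ((x1 > 0) ne (x2 > 0));
    output;
  end;
  drop i;
run;

/* Color points by outcome and mark the quadrant boundaries */
proc sgplot data=xor;
  scatter x=x1 y=x2 / group=y;
  refline 0 / axis=x;
  refline 0 / axis=y;
run;
```

No single straight line separates the two colors. That is why a linear model fits these data poorly while a tree, splitting on x1 and then on x2, can describe them exactly.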
I have a lot more to say on this but I am in North Dakota speaking at the ND STEM conference this weekend and a kind soul gave me tickets to the hockey game in the president’s box, so, peace, I’m out.