
Finding Groups in Data

By AnnMaria De Mars, September 26, 2008

Today, Dr. De Mars is — happy.

One of the fun things about my job is that I get to do lots of different things. That can be a bit troubling some days, because “statistical software consultant” covers a wide range: different types of models, coding, various operating systems, and all of the non-parametric, parametric, Bayesian and other statistics that I cannot remember at the moment.

Because the range of people I work with continually increases, I now run into more questions I cannot answer off the top of my head. I do know how Mahalanobis’ distance is used, even though I had not thought about it in years until someone asked me a question yesterday. I do know the calculation for pooled variance, which should be used when Levene’s test is not rejected, that is, when the group variances can be treated as equal. Still, once a day or so, someone asks me a question I have to look up. Sometimes these are on techniques I have not used before, and just as often the question relates to something that I KNOW can be done, because I personally have used that statistic or written that code before. I just can’t remember how.

You know that saying,

“I have forgotten more about statistics than you’ll ever know.”

Well, that is my problem. I keep forgetting it. Fortunately for me, and this is why I am happy, I get to consult on a lot of different projects each week that remind me of things I used to know. For example, cluster analysis, as the Stata multivariate statistics guide so poetically says, is used for finding groups in data. You can use it to identify or validate specific diagnostic groups, or you can try to group just about anything. Most often, cluster analysis is used as an exploratory technique, which is my favorite type of statistics, where you are turning a bunch of numbers into knowledge.

The most common way to use cluster analysis is the k-means technique. You assume there are k groups (with k being a number you specify) and the program iterates to a solution. The program starts with k “seeds”, which serve as the initial means for each group. Every observation is assigned to the group whose mean is closest to it. New group means are calculated based on the observations in each group. If an observation is now closer to a different group’s mean, it is moved into that group. Then, group means are calculated again. This continues until a step is reached where none of the observations change groups. And that is one way to do cluster analysis.
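If it helps to see those steps written out, here is a minimal sketch in plain Python. It is only an illustration of the loop described above, not what Stata or any other package actually runs. The function name `kmeans`, the `max_iter` safety cap, the use of Euclidean distance, and the six made-up data points are all my own assumptions for the example.

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Illustrative k-means: assign to nearest mean, recompute means, repeat."""
    means = [tuple(s) for s in seeds]  # the k seeds are the starting group means
    labels = None
    for _ in range(max_iter):  # max_iter is an assumed safety cap
        # Assign every observation to the group whose mean is closest
        # (Euclidean distance is an assumption; other distances are possible).
        new_labels = [
            min(range(len(means)), key=lambda g: math.dist(p, means[g]))
            for p in points
        ]
        # Stop at the step where no observation changes groups.
        if new_labels == labels:
            break
        labels = new_labels
        # Recalculate each group mean from the observations now in that group.
        for g in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # keep the old mean if a group ends up empty
                means[g] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, means

# Made-up data: two loose clumps of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, means = kmeans(points, seeds=[points[0], points[3]])
print(labels)
print(means)
```

Here k is simply the number of seeds you pass in. In real work the choice of seeds matters, since different starting points can lead to different final groups.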
