Last year, one of my very young doctoral students (who was single) commented in class that she was sure women with more education were less likely to get married. Two older women in the class agreed that was probably true because women with more education were less likely to settle for just any man who came along.

The male students disagreed (as did the rocket scientist). They said that these women were looking at the demand side of the equation in deciding whether to marry, but that the SUPPLY side, the number of men who wanted to marry them, increased as women’s education increased.

A simple way to answer this question is to download the American Community Survey and look at what percentage of women are married by level of education. I talked about these results before but did not explain how I got them.

First, go to the U.S. Census website and download the ACS data.
Second, subset the data to include only women over age 18. The age cut-off is up to you, but you certainly don’t want to include all women: children of 8 or 10 can’t get married and have less than a high school education, so including them will throw off your results. I also created a categorical variable with four categories: less than high school, high school graduate, college degree and graduate degree. I think the relationship between education and other outcomes is not linear. Finishing your fourth year of college and getting your degree is a much bigger step up from your third year than going from two to three years of college is. If you don’t agree with me, do your own analysis. (No, really, I’m serious.)

In our study, we used only African-American women, simply because the three women who had started the discussion were all African-American and they said,

“Hey, could we look at African-American women?”

Sure, why not? That’s the great thing about open data and doing your own statistics: you can look at whatever you want.
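The post doesn’t show how the subsetting and recoding were done, so here is a minimal Python sketch of those steps, for illustration only. It assumes ACS PUMS variable names and codes (AGEP for age, SEX = 2 for female, RAC1P = 2 for Black or African American alone, SCHL coded 1 to 24 for educational attainment, MAR = 1 for married, PWGTP for the person weight); check these against the data dictionary for the survey year you download. Putting associate’s degrees in the high school graduate group is an assumption made for this sketch, not something the post specifies.

```python
# Minimal sketch: subset ACS person records and recode education into four groups.
# ASSUMPTIONS (verify against the ACS PUMS data dictionary for your year):
#   AGEP = age, SEX == 2 is female, RAC1P == 2 is Black or African American alone,
#   SCHL = educational attainment coded 1-24, MAR == 1 is married, PWGTP = person weight.

MIN_AGE = 18  # the age cut-off is up to you; this keeps women 18 and older


def education_group(schl):
    """Collapse SCHL codes into four ordered categories (cut points are assumptions).

    Numeric prefixes keep the categories in order when sorted.
    """
    if schl <= 15:  # no high school diploma
        return "1 Less than high school"
    if schl <= 20:  # diploma/GED, some college, associate's (assumed to belong here)
        return "2 High school graduate"
    if schl == 21:  # bachelor's degree
        return "3 College degree"
    return "4 Graduate degree"  # master's, professional, doctorate


def prepare(records):
    """Keep adult African-American women; add education and married fields."""
    subset = []
    for r in records:
        if r["SEX"] == 2 and r["RAC1P"] == 2 and r["AGEP"] >= MIN_AGE:
            subset.append({
                "education": education_group(r["SCHL"]),
                "married": "Married" if r["MAR"] == 1 else "Not married",
                "PWGTP": r["PWGTP"],
            })
    return subset
```

Each output record keeps only what the table analysis needs: the education group, the married flag and the weight.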

Now that we have the data set, the rest is quite easy. We are going to do two tasks: a Table Analysis and a Bar Chart.

To do the Table Analysis

First, open SAS Enterprise Guide using SAS On-Demand

Second, open the data file you created, which is as simple as going to the FILE menu, selecting OPEN and picking DATA. Then select your data set. It’s no different than opening a file in Excel or Word or anything else. (You youngsters saying, “So?” don’t realize that is not the way SAS has always worked.)

Next, go to Tasks, then Describe, then Table Analysis.

Menus to get Table Analysis

The window you see below will pop up. In a table analysis, and much of SAS Enterprise Guide in general, the concept of Task roles is important.


In this case, there are only two roles to worry about. One is Frequency count, which is the weight variable. Using the weight gives an estimate of the total population frequencies rather than a count of survey respondents. Drag the variable PWGTP under here.

Task roles

The only other role we need for this task is Table variables. Drag the variables married and education under here.
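Conceptually, assigning a Frequency count variable means each record contributes its weight, rather than 1, to the total for its category. Here is a minimal Python sketch of that idea; it is not how Enterprise Guide is implemented, and the records in the usage check are made up.

```python
from collections import Counter


def frequencies(rows, var, weight=None):
    """Tally rows by the value of `var`.

    Without a weight, each row counts as 1 (a count of respondents).
    With a weight such as PWGTP, each row adds its weight, giving an
    estimate of the number of people in the population.
    """
    totals = Counter()
    for row in rows:
        totals[row[var]] += row[weight] if weight else 1
    return dict(totals)
```

With three made-up records (Married with weight 50, Not married with weight 120, Married with weight 30), the unweighted counts are 2 and 1, while the weighted estimates are 80 and 120, which flips which group looks larger.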


Next, click on the TABLES tab in the window on your left. Drag married and education under Variables permitted in table.

Select table variables



Now drag married to the top of the table and education to the side

Table with variables assigned




Next, click on the CELL STATISTICS tab. Select Row percentages, Column Percentages, Cell frequencies and Include percentages in the data set

Cell Statistics Window




Finally, click on Cell Stat Results under the RESULTS tab. Under the Select tables for cell statistics tab, click the button next to the one table shown.

Cell Stat results window

Click RUN.

This will produce the table below. It is an RTF file because that just happened to be the default for results in SAS. I highlighted the relevant numbers when I opened the output file in Word. (I exported the output file to my desktop.)

Education by marriage table

As you can see, in fact, the more educated these women were, the MORE likely they were to be married.
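For readers who want to see what the cell statistics actually are, here is a minimal Python sketch of a weighted two-way table with cell frequencies, row percentages and column percentages. It illustrates the arithmetic only and is not SAS’s implementation; the field names education, married and PWGTP follow the post, and any records you feed it are your own. With education down the side and married across the top, the row percentage in the Married column is the percentage of women at that education level who are married, which is the comparison the conclusion above rests on.

```python
from collections import defaultdict


def weighted_crosstab(rows, row_var, col_var, weight):
    """Weighted cell frequencies with row and column percentages for a two-way table."""
    cells = defaultdict(float)
    row_totals = defaultdict(float)
    col_totals = defaultdict(float)
    for r in rows:
        w = r[weight]
        cells[(r[row_var], r[col_var])] += w
        row_totals[r[row_var]] += w
        col_totals[r[col_var]] += w

    table = {}
    for (rv, cv), freq in cells.items():
        table[(rv, cv)] = {
            "frequency": freq,                        # estimated population count
            "row_pct": 100 * freq / row_totals[rv],   # share of the row total
            "col_pct": 100 * freq / col_totals[cv],   # share of the column total
        }
    return table
```

Calling `weighted_crosstab(rows, "education", "married", "PWGTP")` returns one entry per (education, married) cell.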

Next, I wanted to create a bar chart from these results. This is why I sent the output to a data set, which is what the last two windows did: checking Include percentages in the data set and selecting the table under Cell Stat Results.

However, I have three daughters and two granddaughters waiting to go to brunch with me, since I fall into that graduate degree / married cell on the far left.

Tomorrow, assuming my prayers to the god of in-flight wi-fi are answered, I’ll post how to use this data set to create a bar chart. As my granddaughter says, it is easy-peasy lemon squeezy.




A few years ago, taking testimony in a court case, an attorney asked me,

“Tell me, doctor, have you heard the saying, ‘Lies, damned lies and statistics’? Isn’t it true what they say, that you can lie with statistics?”

I answered,

“Not to me, you can’t.”

My point that day was that if the person evaluating your statistical argument knows their stuff, you are not going to be able to use statistics to prove a false argument. This past week provided a prime example of that.

On March 3rd, darling daughter number three fought for the world title in mixed martial arts in the 135-pound division. Prior to the fight, several websites had picked her opponent to win, “based on the numbers”. They argued that the sports book odds (which favored Ronda three to one) were influenced by hype, trash-talking, looks, you name it, and that if you looked at the actual numbers, her opponent would win.

What were these numbers? They used statistics like this:

According to these “statistical analyses”, on every dimension but one, Ronda was the weaker fighter and thus, they predicted, she would lose. They pointed out that she had won all of her matches the same way and was therefore clearly a limited fighter. They advised the readers of their blogs and websites to take advantage of these ridiculous odds and place some serious money on the opponent.

There is only one statistic that matters

There were a couple of problems with this analysis. Foremost is that not all statistics are created equal. A submission ends the match and gives you a win. So, even if Ronda’s opponent manages to land 6 punches to her 4 before the submission occurs, once Ronda dislocates the other woman’s elbow and wins by submission, the number of punches is irrelevant. The percentage of Ronda’s matches that have gone to a decision? 0%.

One reason Ronda has not landed a bunch of kicks and punches is that she had ended every one of her matches up to this point in under a minute. That doesn’t give a lot of time to punch or kick. What about the percentage of successful punches? Surely that is relevant, no? The number of punches, kicks and takedowns only matters if the fight goes to a decision. Ronda has been criticized because she is willing to “eat a couple of punches” on her way in to get into the clinch and throw her opponent, transitioning into a submission. She does this deliberately, figuring that, hey, she may get hit in the face once, but after she does, she is going to be close enough to grab you, throw you and break your arm. It is a calculated risk. She gets a lot of press for her looks and athletic accomplishments, but when the writer from Sports Illustrated asked me to tell her something most people don’t know about Ronda, I told her,

“She’s really good at math.”

One way to understand why these armchair statisticians were so far wrong is to realize that theirs was an implied conditional probability, and they failed to treat it as one. We all know that, as this lovely site from Yale University points out:

“If events A and B are not independent, then the probability of the intersection of A and B (the probability that both events occur) is defined by P(A and B) = P(A)P(B|A).”

On the condition that Ronda had not already won by armbar, these other variables could predict a decision or a technical knockout. A and B are definitely not independent.

So maybe the probability of her opponent winning by decision was 60% if the fight went to a decision, making the 3-to-1 odds the bookies were giving on Ronda winning look outrageous, as some sites called them. However, if the probability of her opponent losing to Ronda by submission was 80%, then the probability of her opponent actually winning a decision was 12%: 60% of the 20% of the time it went to a decision. Now, I just made up those numbers of 80%, 60% and so on. The point is that you need to recognize that the probability of Ronda winning by submission is considerably higher than 0%, and calculate the probability of her opponent winning conditional on the complement of that event, that is, on the fight not ending in a submission.
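The arithmetic above is the Yale formula with A = “the fight goes to a decision” and B = “the opponent wins”. Here is a minimal Python sketch using the made-up numbers from the post and the same simplification the post makes: every fight that doesn’t end in a submission is assumed to go to a decision.

```python
def prob_opponent_wins_by_decision(p_submission, p_win_given_decision):
    """P(A and B) = P(A) * P(B|A), where A = 'fight goes to a decision'."""
    # Simplification from the post: any fight not ended by submission goes to a decision.
    p_decision = 1 - p_submission
    return p_decision * p_win_given_decision


# The "by the numbers" analysis implicitly assumed Ronda never wins by submission.
print(round(prob_opponent_wins_by_decision(0.0, 0.6), 2))  # 0.6
# With the made-up 80% submission probability from the post:
print(round(prob_opponent_wins_by_decision(0.8, 0.6), 2))  # 0.12
```

The only change between the two calls is P(A); ignoring it is what made the opponent look like a good bet.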

Ronda on fight night

In case you were wondering, Ronda won the fight in the first round by arm barring her opponent into submission.

Congratulations to Ronda Rousey, 135 lb champion of the world.



Every day, I get email asking me if I am interested in some affiliate program, link exchange, blah, blah, blah. Obviously, these people do not read my blog because I personally would question the wisdom of affiliating oneself with a person who has called a congressman a lying ass mother-fucker (I was right, too) and once used the likelihood of a colleague having a one-night stand with a man she met in a bar as a dependent variable.

The common suspicion that this blog is written with no adult supervision whatsoever is dangerously close to the truth. The rocket scientist hardly counts as an adult. Before we went to The Spoiled One’s high school admission interview, she commanded,

“Dad is not allowed to wear anything that has a cartoon character on it, has holes in it or is tie-dyed.”

This led the wise oldest daughter to conjecture (correctly),

“So, you’re going shopping, right?”

I am not sure why parents of prospective high school students are interviewed, because I imagine the only answer of real interest is the one to the question, “Can you afford the tuition?”

Which is the kind of thing you could answer on Twitter, without the need for new clothes. But I digress. Actually, this entire blog is a digression, so I suppose this is digression squared.

I was talking about the reason I ignore all of these affiliate/sponsorship/linkbait requests. These are all tacky, irrelevant and/or useless. I started this blog because I often forget whatever it was I was thinking about on a given day, like back in 1985 when I finally figured out the difference between part and partial correlation, so I thought I would write things down.

I have negative interest in most of the products I get queries about. Not only am I not interested in their product, but its mere existence causes me to be less interested in life. There will never come a time when I write a review of a romance novel, because that would require me to read one, and I can feel brain cells dying at the mere thought. Occasionally, there are products that I am interested in (generally any kind of software), but following up is not a priority. I have software for coding (TextWrangler on the Mac, Notepad++ on Windows), for data analysis (SAS, JMP, SPSS and occasionally Stata) and for all that office stuff (Microsoft Office and OpenOffice). I am BUSY. While something better may exist, what I already have is working, and I have a client on Line 1 and Line 2.

Unlike all of these other invitations, appsmitten actually does something I could use and don’t already have. Click here to register for your very own appsmitten newsletter for free.

Here’s why you might want to sign up.

It goes through the more than 1,000,000 apps out there and gives reviews and recommendations. Sign up for their newsletter, and not only because they said they’d pay me if you did. I signed up for it and so far have downloaded two apps they recommended, NPR News and Dropbox. They recommended Pinterest, which is supposedly the new application / social network that everyone who has lady parts must have. It didn’t look like something I needed: I already have Delicious to bookmark websites and Evernote for everything else.

I got three things from their newsletter and site. One was referrals to apps I thought I’d like, which I downloaded. A second was referrals to apps I already use, like Evernote, that it doesn’t hurt to be reminded to use more frequently. The third was information on apps like Pinterest, which I might (theoretically) want but decided against for a variety of reasons; in this case, lots of reviews on the app store said it crashed their iPads.

So far, I’m loving Dropbox, especially since I am travelling for the next two weeks. I’ll write later about some of the cool apps I found through appsmitten.

However, my not-so-little any more girl is fighting for the 135 lb world title tonight, so it’s about time for me to start freaking out.



It’s not often that you read a paragraph and it sticks in your mind for months. That this particular paragraph came not from some great literary work but rather from the proceedings of the annual meeting of the Association of Small Computer Users in Education is even less expected, but there it is. Douglas Kranch wrote:

“Expertise develops in three stages. In the first stage, novices focus on the superficial and knowledge is poorly organized. During the end of the second stage, students mimic the instructor’s mastery of the domain. In the final stage, true experts make the domain their own by reworking their knowledge to meet the personal demands that the domain makes of them.”

This idea kept coming back to me in a lot of ways. I have a thirteen-year-old daughter who is now learning the basics of chemistry, algebra and physics. I teach students statistics and often have them use SAS for data analysis. I’m in the middle of using javascript for a much larger application than any I have built with it in the past.

Julia falling asleep over homework

I get irritated by the frequent use of the phrase “STEM education” for science, technology, engineering and mathematics, as if there were no difference among organic chemistry, javascript and calculus, but in this case I really did see a common thread.

Existing mathematics (and statistics and science) education programs are too limited. They either focus ONLY on drill and practice, never progressing past the first stage, or they try to skip the first stage or two entirely, an overreaction that, while having the laudable goal of teaching “higher order thinking skills”, often leaves students frustrated and discouraged because they do not have the basis for the tasks required. Part of the problem comes from, I think, having subjects taught by people who were not experts themselves.

Let me give two examples, one horrid and one good.

It’s common for middle school teachers to give students assignments that are supposed to be “relevant”, for example, “Make up your own periodic table”. The students did not, however, come up with a new way of arranging elements. No, they did a periodic table of football players or TV shows. I suggested to The Spoiled One that perhaps she could do Disney Channel shows and put the shows that had a character move from one show to another in one group, just like elements that lose an electron are in one group. Similarly, shows that shared a character, like if Miley Cyrus also appeared on The Suite Life, could be a different group, like elements that share an electron. I was out of town at the time (if you follow this blog, you know that my children contend most stories of their childhood begin this way), but when the project was due, the world’s most spoiled thirteen-year-old turned in something she drew in pencil on a piece of paper and got a 50% on it. When I quizzed the rocket scientist about how this happened, he answered unrepentantly,

“I didn’t make her put any effort into it because she said it was stupid and I agreed.”

While I did not have this precise conversation with the school – Seriously… What. The. Fuck – you want a kid to learn about the periodic table, covalent and ionic bonding? You teach them that, NOT relate it to stupid TV shows we’d just as soon she not watch anyway. We spent many hours going over with her the idea of electron shells, what happens when a shell is not full and the number of electrons in each shell. You want a kid to know that NaCl is sodium chloride? You explain that Na is the symbol for sodium and Cl is the symbol for chlorine, and you put the two together and you get sodium chloride. There’s actually some really interesting stuff you can throw in there about how it’s kind of weird that when you combine these two elements you get something that really isn’t very similar to either one individually. You want to get kids interested in chemistry? Do experiments. Few things are more motivating to the average eighth-grader than the possibility (however slim) that they might get to see the school blow up with the teachers in it.

I’ve been a teacher. I started out, like most people, as a not particularly good teacher, and then, with years of experience, I got better. I recognized that all of that stuff, like the periodic table, the electron shells, multiplication tables and how to read an ANOVA table, you need to learn. Even if you don’t get it at first, if you “… focus on the superficial and your knowledge is poorly organized”, you still learn that p-values, df and sums of squares should be in there. At first, you don’t know what df stands for, and when you find out it is degrees of freedom, that doesn’t really tell you much. After a while, you vaguely start to get it. It’s frustrating, it really is, going through those motions you don’t really understand – but there isn’t any alternative.
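To make df and sums of squares a little less abstract, here is a minimal Python sketch that computes the numbers in a one-way ANOVA table by hand for small made-up groups. It stops at the F statistic, because the p-value needs the F distribution, which Python’s standard library doesn’t provide (SciPy’s `scipy.stats.f.sf(F, df_between, df_within)` is one option).

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F for a one-way ANOVA.

    df_between = number of groups - 1
    df_within  = number of observations - number of groups
    """
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]

    # Between groups: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: how far each value is from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between  # mean square = SS / df
    ms_within = ss_within / df_within
    return {
        "SS_between": ss_between, "df_between": df_between,
        "SS_within": ss_within, "df_within": df_within,
        "F": ms_between / ms_within,
    }
```

For the groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], SS between is 54 with 2 df, SS within is 6 with 6 df, and F is 27.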

Let me go on to a different example. I wanted to use javascript to write an extremely complex application. So I needed to learn javascript better. I read a couple of books. I did a bunch of codecademy exercises, watched some videos. I wrote small programs that did bits and pieces of what I wanted. Then, I took someone else’s program that was pretty complicated. Not thousands of lines of code but hundreds. I went through and typed in their whole program line by line trying to figure out what each part did as I copied it. After I got it to run, I made some changes just for my own amusement. Then I did the same with a few other programs.

After a while, I could see where those “master programmers” had made mistakes. I’d notice they’d left a semicolon off the end of a statement, left out the period by typing Mathrandom instead of Math.random, or used a semicolon instead of a comma between arguments when calling a function. In short, I got better at understanding the superficial – how the syntax has to look. At the same time, though, I started to see how the logic worked: how one could use a loop inside a function to draw a deck of cards, for example. In the end, I had a game that worked. Then I changed it to be a different game, more like what I had in mind. I’m not as expert as I’d like to be in javascript yet, but I’m getting there.
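The javascript from that game isn’t shown, so here is a rough Python analogue (not the author’s code) of the idea of using a loop inside a function to build and deal a deck of cards. The suit and rank lists and the function names are hypothetical, and `random.shuffle` stands in for the `Math.random` calls a javascript version would typically rely on.

```python
import random

SUITS = ["hearts", "diamonds", "clubs", "spades"]
RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]


def build_deck():
    """Nested loops inside a function build all 52 (rank, suit) cards."""
    deck = []
    for suit in SUITS:
        for rank in RANKS:
            deck.append((rank, suit))
    return deck


def deal(deck, n, rng=random):
    """Shuffle a copy of the deck and deal n cards, leaving the original untouched."""
    cards = deck[:]
    rng.shuffle(cards)
    return cards[:n]
```

Passing a seeded `random.Random` as `rng` makes a deal repeatable, which is handy while debugging a game.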

Yes, in this process, I drew a lot of connections to other programs I had written in other languages. What I did not do is draw a parallel with the time we got lost and went driving around Miami trying to find somewhere it was legal to make a u-turn. (Let me just say that Florida has commitment issues. If you are going south they think you should just keep going and if you change your mind and want to go north, forget it.)

What I also did not do is one mindless fill-in-the-blank or multiple-choice exercise after another ad infinitum. I didn’t memorize rules until I could pass some arbitrary test at 100% accuracy. Although I did start with that, I didn’t finish with it. In fact, I did the bare minimum until I could move on to imitating experts and making it my own.

If you want to learn programming, statistics or chemistry, then DO that. Don’t just read about how to do it, and for the love of God, don’t do something else, like stupid charts of TV shows or biographies of women mathematicians, and pretend you’re doing STEM education.
