Let me say right off the bat that open data is awesome and there should be more of it available. This semester, I have been using SAS On-Demand in my statistics class and creating data sets to match my students’ interests. Despite the aspersions I have read on Twitter that some statisticians know nothing more than which PROC to use to get a p-value, building those data sets is, unfortunately, not all that easy.

I was going to write about adjusted survival curves and log-log curves with PHREG tonight, but it is already past 1 a.m., and both my time and my Chardonnay have been exhausted by creating analytic data sets for my students.

I did hear back from the helpful folks at the National Center for Education Statistics (thank you!) and downloaded the School Survey on Crime and Safety for a group of students interested in bullying. It is an awesome public-use data set. Check it out!

[Photo: my cousin]

After that, I had another group of students interested in testing the hypothesis that the more education African-American women have, the less likely they are to get married. Conveniently, I had the American Community Survey data for California on my desktop from some earlier analyses, so I pulled out the subset of people they were interested in: native-born African-American women over 15 years of age. (Actually, the woman in the picture is my cousin, who has never, as far as I know, been to America. But hey, Ashelle, if you’re reading this, come and visit. It’s nice here.)
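The subsetting step itself is just a filter on four conditions. The post doesn’t show the code (the work was done in SAS), so here is a minimal Python sketch of the same idea. The field names are modeled on ACS PUMS person-file variables (AGEP, SEX, RAC1P, NATIVITY), but the numeric codes are assumptions on my part; check them against the codebook for the survey year you download.

```python
from dataclasses import dataclass


# Hypothetical person record. Field names mirror ACS PUMS variables, but the
# codes below are ASSUMPTIONS: verify them in the codebook for your year.
@dataclass
class Person:
    agep: int      # age in years
    sex: int       # assumed: 1 = male, 2 = female
    rac1p: int     # assumed: 2 = Black or African American alone
    nativity: int  # assumed: 1 = native-born, 2 = foreign-born


FEMALE = 2
BLACK_ALONE = 2
NATIVE_BORN = 1


def study_subset(people):
    """Keep only native-born African-American women over 15 years of age."""
    return [
        p for p in people
        if p.sex == FEMALE
        and p.rac1p == BLACK_ALONE
        and p.nativity == NATIVE_BORN
        and p.agep > 15  # "over 15" means 16 and older
    ]


if __name__ == "__main__":
    sample = [
        Person(agep=34, sex=2, rac1p=2, nativity=1),  # kept
        Person(agep=15, sex=2, rac1p=2, nativity=1),  # dropped: not over 15
        Person(agep=40, sex=1, rac1p=2, nativity=1),  # dropped: male
        Person(agep=28, sex=2, rac1p=2, nativity=2),  # dropped: foreign-born
        Person(agep=52, sex=2, rac1p=1, nativity=1),  # dropped: other race code
    ]
    print(len(study_subset(sample)))  # 1
```

In SAS the same thing would be a WHERE or subsetting IF statement in a DATA step; the logic is identical.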

I downloaded the data, created a few new variables to fit the students’ interests, and emailed them the file and documentation. For example, they wanted to break education down into categories, thinking, rightly I believe, that whether someone has a high school diploma or a college degree is a better way of categorizing education than years of schooling, because education does not have a linear relationship with most other variables.
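Recoding years of schooling into credential-based categories looks roughly like this. It is a minimal Python sketch, not the recode I actually sent the students: the cutpoints (12 years for a high school diploma, 16 for a bachelor’s degree) and the category labels are hypothetical, and real survey files often record attainment as codes rather than years.

```python
from bisect import bisect_right

# Hypothetical cutpoints in completed years of schooling:
# 12 = high school diploma, 13 = any college, 16 = bachelor's degree.
CUTPOINTS = [12, 13, 16]
LABELS = [
    "less than high school",
    "high school diploma",
    "some college",
    "bachelor's degree or higher",
]


def education_category(years: int) -> str:
    """Collapse years of schooling into credential-based categories."""
    if years < 0:
        raise ValueError("years of schooling cannot be negative")
    # bisect_right counts how many cutpoints are <= years, which is
    # exactly the index of the matching label.
    return LABELS[bisect_right(CUTPOINTS, years)]


if __name__ == "__main__":
    for y in (11, 12, 14, 16, 20):
        print(y, education_category(y))
```

Entered into a model as a categorical variable (a CLASS variable, in SAS terms) rather than as years, each credential gets its own effect instead of the model forcing a straight line through years of education.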

I did run some of the analyses myself out of curiosity, and I will say that the preliminary results are very, very interesting. I am looking forward to their presentation.

So, that is the plus of open data: real data, real experience, and questions the students really want to answer.

The minus? It took me a lot of time to locate and download the data. I already had the data set for the study on African-American women on my computer, but I had to track down the one on school crime, and it still wasn’t exactly what the students originally planned, although it ended up working perfectly.

A second minus is that SAS On-Demand is SLOW. It is several times better than it was originally; when first released, it was so slow as to be useless. Now, depending on I don’t know what (sunspots?), there are times when it works perfectly, just a tiny bit slower than SAS on my desktop, and other times when it is really tedious. I’m sticking with it this semester because it is a) free, b) used in lots of organizations where my students may work one day, and c) showing the potential to be really useful.

A third minus is that one of the students has not been able to get it to install and run, for reasons I cannot figure out. I referred him to SAS Technical Support today.

For a professor teaching a statistics, research methods or data mining course without a lot of SAS programming experience, I think using SAS On-Demand would be a challenge.

So, why bother? I think it comes back to the studies themselves: one group is studying African-American women and marriage, another is studying bullying in school, and a third is studying the relationship between arts education and academic achievement using the National Education Longitudinal Study.

Years ago, when my daughter, Maria Burns Ortiz, was a little girl, I asked her how science class was at her new school. We had recently moved and she had gone from a magnet school for gifted children to a regular parochial school. She said, “We don’t have science.”

I corrected her, “Mija, you must have science. You got an A in it on your progress report.”

She said, “No, we don’t have science at this school. We just read about it.”

So, that is why I am putting together data sets at 1 a.m. My students have statistics; they don’t just read about it.



3 Comments so far

  1. Peter Norvig on October 25, 2011 6:17 am

    1 a.m.?? Luxury!! I’m putting together quizzes for my students, and it’s 4 a.m. 🙂

    But seriously, I’m glad you’re doing this … keep it up.

  2. josh on November 21, 2011 12:17 pm

    Wish there were more professors like you out there

  3. AnnMaria on November 21, 2011 12:31 pm

    Aaaw, thank you. What a nice thing to say.


