So, this is day 13 of the 20-day blogging challenge, and I skipped over day 12 (although I may go back to it). The prompt was

“Tell about a favorite book to share or teach. Provide at least one example of an extension or cross-curricular lesson.”

My favorite resource is not actually a book; it is a magazine, Mathematics Teaching in the Middle School, published by the National Council of Teachers of Mathematics. One of my favorite parts of the magazine is the Palette of Problems section, which is a bit odd because I often find myself thinking that a problem has no point. For example,

“How many birth dates in a century have the property that the sum of the month and the day equal the value of the last two digits of the birth year?”

I do realize that some students will be interested just in the challenge of solving a problem. However, for many students, the apparent lack of application can be very de-motivating. Most of the problems, though, can be adapted to our games with really simple modifications or may just give me ideas for a problem that would fit right in. For example, this is an extension of a problem in this month’s issue:



Zoongey Gniw is looking for a wife.  He is from the Catfish clan and people from the same clan are not allowed to marry. His uncles are going to trade with two different bands. In the first band, 12% are from the Marten clan, 20% from the Crane clan, 64% from the Bear and Loon clans and the rest from the Catfish clan. His other uncle is going to trade with a band where 11% are from the Catfish clan. It is going to be a hard decision which uncle to accompany, says his father.

Not at all, says Zoongey Gniw, and he steps over to the first uncle. How did he decide?
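The arithmetic behind his choice can be sketched in a few lines of Python. This is only an illustration of the percentages given in the problem; the variable names are made up, and the second band's other clans are not broken out because the problem does not give them.

```python
# Percentages for the first band, as stated in the problem.
band1_other_clans = {"Marten": 12, "Crane": 20, "Bear and Loon": 64}

# "The rest" of the first band is Catfish.
catfish_band1 = 100 - sum(band1_other_clans.values())

# The second band is 11% Catfish, as stated in the problem.
catfish_band2 = 11

# He cannot marry within the Catfish clan, so everyone else is eligible.
eligible_band1 = 100 - catfish_band1
eligible_band2 = 100 - catfish_band2

print(f"Band 1: {catfish_band1}% Catfish, {eligible_band1}% eligible")
print(f"Band 2: {catfish_band2}% Catfish, {eligible_band2}% eligible")
```

The first band has the smaller share of his own clan, so a larger share of the people he meets there could marry him.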

This fits perfectly in our game. There is a video clip on clans, narrated by the inimitable Debbie Gourneau from Turtle Mountain. The prohibition on marrying within clans is historically accurate. As for the interest of our students today, not only are many of them from tribes that have the clan system described, but, like most middle school students, they are also interested in the opposite sex and in having a boyfriend or girlfriend, so the topic is inherently interesting.

I like this magazine, and I call it that deliberately, rather than an academic journal. All of the journals I read and nearly all of the academic texts talk in theory about what needs to be done and why but not nearly enough on how to effectively do it, whether the topic is teaching mathematics or running a company. Mathematics Teaching in the Middle School is all about how to do things.

When I was in graduate school, it was common for professors to mock teachers who “aren’t interested in anything longer-range or deeper than what am I going to do on Monday.”

That’s the attitude you have the luxury of having if you don’t have to actually show up and teach on Monday.



As I mentioned yesterday, banging away at 7 Generation Games has led to less time for blogging and a whole pile of half-written posts shoved into cubbyholes of my brain. So, today, I reached into the random file and  coincidentally came out with a second post on open data …


The question for Day 11 of the 20-day blogging challenge was,

“What is one website that you can’t do without? Tell about your favorite features and how you use it in teaching.”

Well, I’m a big open data fan and I am a big believer in using real data for teaching. I couldn’t limit it to one. Here are four sites that I find super-helpful:

The Inter-university Consortium for Political and Social Research has been a favorite of mine for a long time.  From their site,

“An international consortium of more than 700 academic institutions and research organizations, ICPSR provides leadership and training in data access, curation, and methods of analysis for the social science research community.

ICPSR maintains a data archive of more than 500,000 files of research in the social sciences.”

I like ICPSR but it is often a little outdated. Generally, researchers don’t hand over their data to someone else to analyze until they have used it as much as their interest (or funding) allows. On the other hand, it comes with good codebooks and often a bibliography of published research. As such, it’s great for students learning statistics and research methods, particularly in the social sciences.

For newer data, my three favorites are the U.S. census site, CHIS and CDC.

The census data resources section is enough to make you drool when it comes to open data. They have everything from data visualization tools to enormous data files. Whether you are teaching statistics, research methods, economics or political science, and whether you’re teaching middle school or graduate school, you can find resources here.


Yes, that’s nice, but what if you are teaching courses in health care – biostatistics, nursing, epidemiology? Whatever your flavor of health-related interests, and in whatever form you want your data and statistics, from raw data to publication, the Centers for Disease Control Data & Statistics section is your answer.

Last only because it is more limited in scope is the California Health Interview Survey site where you can get public use files to download for analysis (my main use) as well as get pre-digested health statistics.

It all makes me look forward to diving back into teaching data mining  this fall.



drinking for science

There are multiple reasons that I haven’t gotten around to Day 10 of the 20-day blogging challenge. In part, it is because I have been really busy, and in part it is because I read this topic,

“Share ideas that your classroom uses for brain breaks and/or indoor recess”

and I thought

I got nothin’

Anyone who knows me well can tell you that I am NOT a very fun person. I like to think that I have some good qualities, but playfulness is not among them. Ph.D., world champion, founded/ co-founded a few companies, publishes scientific articles – does this sound like I spend a lot of time playing frisbee in the park? No, I didn’t think so. About the closest I come to this in class is on the first day having everyone introduce themselves and talk about their research interests – which is not really very close, I must admit.

For the last SAS assignment of the Public Health Research Methods course, I decided to make a video and upload it to YouTube. For one of the dependent variables, I used how often in a year a person engaged in binge drinking, defined as 4 or more drinks per day. I’ve probably had four drinks in a day a few times in my LIFE so I was surprised to find that the respondents (over 40,000 of them) said they did this an average of 2.4 times per year.

Today has been a really frustrating day. Yesterday, after a margarita at dinner, I came home and was working on our newest game, Fish Lake, and everything was progressing smoothly. Today, for both The Invisible Developer and me, it has been just beating our heads against the wall. For example, I have this PHP script that ran intermittently today – I have three records written to the database – and all of the rest of the times, it failed with an error. The I.D. has been having similar problems.

I took a break and made a video on how to do simple statistics with SAS to test the hypothesis that I could do a screen recording with Quicktime, write a program using SAS On-Demand in Firefox, record the audio in Garageband and drink Chardonnay all at the same time because Von’s had a half-price sale on wine over $20 a bottle and, well, you know – science.

You can determine if my hypothesis – whatever the hell it was – was supported. Bizarrely, the equals signs do not show up in the video. How weird is that?



Today I’m getting around to day nine of the 20-day blogging challenge while I wait for The Invisible Developer to get out of the shower where he is curled in a fetal position whining about having to go outside when it is 14 below zero. Actually, he is probably just taking a shower, but lots of whining has taken place this week, let me tell you.

Today’s question is what I did this week that I would do again in teaching, or what I would not do again. I think I’ll answer both. Coincidentally (or maybe not, since I’ve been working on a course re-design to incorporate SAS programming), both of those things have to do with SAS. Short version: if your data are in a form amenable to SAS, it is a godsend for teaching statistics. If your data are not in a very SAS-compatible format, it just blows. If, God forbid, you are limited to using SAS On-Demand, as I am this week because I have yet to receive the Windows 8 compatible version from the university, and I am in North Dakota, working on my laptop, well then, your life is about to suck, I am sorry to say.

The thing I would totally do again, if I were teaching an epidemiology course, is PROC STDRATE. I love everything about this procedure. The documentation explains the procedure in very plain language, which I did not have to rewrite at all for the students. I just included the overview in my livebinder.

“Two commonly used event frequency measures are rate and risk:

  • A rate is a measure of the frequency with which an event occurs in a defined population in a specified period of time. …

  • A risk is the probability that an event occurs in a specified time period.”

It also includes datasets that can be used as examples, and they are easily typed or copied and pasted into your SAS program. Further, these data are very similar in format to the types of data that students will usually come across. Most important, this is one of the most useful procedures for students beginning to learn epidemiology, providing a lot of statistics in one place: population attributable risk, population attributable fraction, standardized morbidity ratio and more. It will save loads of time over computing statistics on a calculator to answer homework questions – which I think is just silly, because it is 2014 and we have computers. Also, the syntax is relatively easy.

You can read one example of using STDRATE for crude risks, reference risk and attributable fractions here.
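To make those terms concrete, here is a minimal sketch in Python of the textbook definitions of crude risk, reference risk and attributable fractions. It is not what PROC STDRATE does internally (the procedure also handles standardization and confidence limits), and the function names and counts are hypothetical.

```python
def risk(cases, population):
    """Crude risk: proportion of the population that had the event."""
    return cases / population


def attributable_fraction(risk_exposed, risk_reference):
    """Share of risk among the exposed that is attributable to exposure."""
    return (risk_exposed - risk_reference) / risk_exposed


def population_attributable_fraction(risk_overall, risk_reference):
    """Share of risk in the whole population attributable to exposure."""
    return (risk_overall - risk_reference) / risk_overall


# Hypothetical counts, made up for illustration only.
exposed_cases, exposed_n = 30, 200
reference_cases, reference_n = 20, 400

risk_exposed = risk(exposed_cases, exposed_n)
risk_reference = risk(reference_cases, reference_n)
risk_overall = risk(exposed_cases + reference_cases, exposed_n + reference_n)

af = attributable_fraction(risk_exposed, risk_reference)
paf = population_attributable_fraction(risk_overall, risk_reference)
print(round(risk_exposed, 3), round(risk_reference, 3), round(af, 3), round(paf, 3))
```

The reference risk here is simply the crude risk in the unexposed group, which is the comparison the attributable fractions are built on.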

So, that was the good part. What I would never do again, if I had any choice at all, is

a) Use SAS to create maps, or really, analyze in any way, data that were neither already in a SAS dataset nor in a very easy-to-read format (e.g., no missing data, no variable-length variables), and

b) Use the SAS Enterprise Guide version of SAS On-Demand for anything, ever

There are some significant drawbacks of the SAS Web Editor as well but they pale in comparison with the slowness of SAS Enterprise Guide in the on-demand version. While with some programs you could maybe get a cup of coffee while waiting for them to run, with the on-demand version of SAS EG you can drive to Starbucks, wait in line, buy your coffee, drive back to the office, park, take the elevator to your floor and STILL get there just about when your cross-tabulation completes. It’s ridiculous, which is sad because if it ran ten times faster it would be a really great tool. It’s terrific on my desktop.

Someone on Twitter commented that they hated SAS because it did not play well with open data. Ain’t that the truth! The exception is if you can get your data in a SAS dataset format. Then it’s wonderful. Well, I was using HIV prevalence data from  – great site, by the way – and it took me an HOUR to get it read by SAS Web Editor. You can only upload csv files or SAS files to the web editor, so I couldn’t use PROC IMPORT to read in the Excel file. The data I had used country name as the ID and that didn’t match the ID in the SAS map files. It’s a long sad story with the moral that if I had the option of not using SAS for maps I would certainly be looking into that right now, and if I never have to use SAS Enterprise Guide again (which only seems to have the US map in the On-Demand version anyway) it will be too soon.

Yes, in the end, I did get my world HIV map. The computer will not defeat me!





Day eight of the 20-day blogging challenge was to write about a professional read – a book, article or blog post that has had an impact on me. To be truthful, I would have to say that the SAS documentation has had a profound impact on me. SAS documentation is extremely well-written (to be fair, so is SPSS) in contrast to most operating system documentation which is written as if feces-flinging monkeys were somehow given words instead, which they flung onto a page which then became a manual. But I digress – more than usual. It’s not reasonable to suggest that someone read the entire SAS documentation, which is several thousand pages by now. Instead, I’d recommend Jennifer Waller’s paper on arrays and do-loops. This isn’t the paper where I first learned about arrays – that was before pdf files, and I have met Jennifer Waller and she was probably barely in elementary school at the time. It’s a good paper though and if you are interested in arrays, you should check it out.

Here is what I did today, why and how. I wanted to score a dataset that had hundreds of student records. I had automatically received the raw score for each student, percent correct and what answer they gave for each multiple choice question. I wanted more than that. I wanted to know for each question whether or not they got it correct so that I could do some item analyses, test reliability and create subtests. This is a reasonable thing for a teacher to want to know – did my students do worse on the regression questions, say, than the ones on probability, or vice-versa?  Do the data back up that the topics I think are the hardest are the ones that my students really score worst on?  Of course, test reliability is something that would be useful to know and most teachers just assume but don’t actually assess. So, that’s what I did and why. Here is how.
filename sample "my-directory/data2013.csv";
libname mydata "mydirectory" ;
data mydata.data2013 ;
infile sample firstobs = 2 dsd missover ;
input group_type $ idnum $ raw pct_correct qa qb qc q1- q70 ;

** These statements read in the raw data, which was an Excel file I had saved as a csv file ;
** The first line was the header and I forgot to delete it so I used FIRSTOBS = 2 ;
*** That way, I started reading at the actual data. ;
*** The dsd specifies comma-delimited data. dlm="," would also read the commas, but without dsd two commas in a row are not read as a missing value ;
*** Missover instructs it to leave any data missing if there are no values, rather than skipping to the next line ;

Data scored ;
set mydata.data2013 ;
array ans{70} q1 - q70 ;
array correct{70} c1 - c70 ;
array scored{70} sc1 - sc70 ;

*** Here I created three arrays. One is the actual responses ;
*** The second array is the correct answer for each item ;
*** The third array is where I will put the scored right or wrong answers ;

if _N_ = 1 then do i = 1 to 70 ;
correct{i} = ans{i} ;
end ;

*** If it is the first record (the answer key) then c1 – c70 will be set to whatever the value for the correct answer is ;

else do i = 1 to 70 ;
if ans{i} = correct{i} then scored{i} = 1 ;
else scored{i} = 0 ;
end ;

**** If it is NOT the first record, then if the answer = the correct answer from the key, it is 1 , otherwise 0 ;

retain c1 - c70 ;

**** We want to retain the correct answers that were in the key for all of the records in the data set ;
**** Since we never put a new value in c1 - c70, they will stay the correct answers ;

raw_c = sum(of sc1 - sc70) ;
*** This sums the raw score ;

pct_c = raw_c/70 ;
*** This gives the proportion correct (multiply by 100 for a percentage score) ;

proc means data=scored ;
var sc1-sc10 c1 -c10 ;

*** This is just a spot check. Does the mean for the scored items fall between 0 and 1? Is the minimum 0 and the maximum 1 ;
*** The correct answers should have a standard deviation of 0 because every record should be the same ;
*** Also, the mean should either be 1, 2, 3, 4 or 5 ;

proc corr data = scored ;
var raw_c pct_c raw pct_correct ;

*** Spot check 2. The raw score I calculated, the percent score I calculated ;
*** The original raw score and percent score, all should correlate 1.0 ;

data mydata.scored ;
set scored ;
if idnum ne "KEY" ;
drop c1-c70 q1-q70 ;

*** Here is where I save the data set I am going to analyze. I drop the answer key as a record. I also drop the 70 correct answer fields and the original answers, just keeping the scored items ;
proc corr alpha nocorr data= mydata.scored ;
var sc1 - sc70 ;
run ;

*** Here is where I begin my analyses, starting with the Cronbach alpha value for internal consistency reliability ;
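
For readers without SAS, here is a minimal sketch of the same scoring logic and the Cronbach alpha calculation in plain Python. It is not a translation of PROC CORR ALPHA, just the standard raw-alpha formula using sample variances; the function names and the tiny four-student, three-item data set are made up for illustration.

```python
from statistics import variance


def score_responses(key, responses):
    """Return 1/0 scores: 1 where a response matches the answer key."""
    return [[1 if answer == correct else 0
             for answer, correct in zip(row, key)]
            for row in responses]


def cronbach_alpha(scored):
    """Raw Cronbach alpha from rows of item scores (sample variances)."""
    n_items = len(scored[0])
    item_variances = [variance([row[j] for row in scored])
                      for j in range(n_items)]
    total_variance = variance([sum(row) for row in scored])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answer key and student responses (answers coded 1-5).
key = [1, 2, 3]
responses = [
    [1, 2, 3],
    [1, 2, 1],
    [1, 4, 5],
    [2, 1, 1],
]

scored = score_responses(key, responses)
raw_scores = [sum(row) for row in scored]
alpha = cronbach_alpha(scored)
print(raw_scores, round(alpha, 2))
```

As in the SAS program, the original responses are left untouched and the scored items go into a separate structure, so the actual answers are still there if you want to look at them later.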

I want to point something out here, which is where I think the professional statisticians are maybe distinguished from others. It’s second nature to check and verify. Even though this program should work perfectly – and it did – I threw in reality checks at a couple of different points. Maybe I spelled a variable name wrong, maybe there was a problem with data entry.

One thing I did NOT do was write over that original data. Should I decide I need to look at what the actual answers were, I still can. Say I wanted to see whether students were selecting chi-square instead of t-test (my hypothetical correct answer); that would alert me to some confusion.

Incidentally, for those who think that all of the time they save grading is taken up by entering individual scores, I would recommend having your students take tests on the computer if you possibly can. I was at a school today where we had a group of fourth graders taking two math tests using Google Chrome to access the test and type in answers. They had very little difficulty with it. I wrote the code for one of those tests, but the other was created using SurveyMonkey and it was super easy.

I’d love to include pictures or video of the kids in the computer lab but the school told me it was not allowed )-:



Story Questions!

January 16, 2014

In the 20-day blogging challenge, the prompt for day seven was

“Share a classroom management tip. What is one thing you do that works?”

My initial thought was that I teach graduate school and if you are still having classroom management problems by then, either you or the student have a real issue. It happens, but rarely. Then it occurred to me – story questions!

If you’ve attended many college classes, conference presentations or business meetings, you’ve run into these people with the “question that is really intended to show how smart I am rather than to elicit any information”.  Let me give you an example,

“Well, doctor, that may be true in your academic area but as someone who has spent some time in the real world, I would have to ask if sums of squares would ever be useful in my field where I’m managing a team of highly trained sales people who travel the world convincing customers to use water purification systems that will make the world a better place and safer for our children because there will be lower infant mortality and therefore people will have fewer children and the over-population problem that contributes to unrest will be decreased.”

On the off chance that this may be one of those unlikely events where the story questioner actually wants to hear my answer, I will answer a question like this – once – and as briefly as possible. The false implication that I am not actually working “in the real world”, I ignore because I do not have infinite time to argue with people.

The answer is,

“Yes. There are a great many uses of Analysis of Variance, for example, to see if sales are higher in one country than another, to see if regions where your widget is sold really do have a lower rate of unrest, controlling for other factors. You’ve made a lot of statements about differences that occur, and policymakers might want to see some documentation that those claims can be supported.”

I will answer one question during a talk. If the story questioner follows up with a second, depending on the question, I respond:

Don’t let the story questioners monopolize your class or presentation. The other people came to hear you, or maybe to learn from their fellow students as a group, not to be lectured by Mr or Ms SQ. Hence, my rule of thumb – in an average class period, any individual student can take up the whole class’s time for a total of five minutes. In a 50-minute class period, that’s 10% of the class. Since some of class time is spent in groups or working on problems individually, that’s more than 10% of the whole group time. This point was brought home to me years ago when I was teaching an undergraduate course where the Empress of Story Questioners was enrolled. There was literally nothing that could be discussed that she did not have an experience to relate. It got to the point where one day my teaching assistant came to me after class and said,

“You know that woman who sits in the front row and tells all the stories disguised as questions? Well, me and the other three people who sit around her talked it over and if she’s not here next week, it’s because we all got together and killed her.”





I’m thinking I’m going to need to create a new category on my blog – here is Day 6 of the 20-day blogging challenge, which, if you are just now tuning in, is (surprise!) 20 days of prompts on teaching, a challenge I decided to undertake for the hell of it, the same reason I do most things in life. Given that this is a particular crunch month for work, it’s kind of amazing I’ve done six of these already.

Today’s prompt was,

What is one thing you wish you were better at? Just one! Why? What can you do about it?

Sort of one and a half – the one thing I wish I was better at in teaching statistics is pacing. I never feel as if I have enough time at the end of the course. On the other hand, I feel at the beginning of the course that I need to spend ample time on the basics. How can you understand explained variance if you don’t understand variance in the first place? The main thing I wish I had spent more time on in the past couple of courses I taught was discussing what we mean by explained variance and residual variance.

Let’s say that we know nothing about each person who walks into a room and we are trying to predict his or her IQ. The mean population IQ is 100, with a standard deviation of 15, which means the variance is 225 (15 squared). With nothing else to go on, our best guess for everyone is the mean, so the variance of our errors will be 225. This is the error variance, which is the same as the population variance, since we had no predictors.

Let’s say now we get everyone’s college GPA and we find that the correlation between GPA and IQ is .707 (it’s lower than that in real life, but just pretend). So, now, when Bob comes into the room and I know he has a GPA of 4.0, two standard deviations above the mean, I am going to predict that he has an IQ 1.414 standard deviations above the mean (2 × .707).

My equation is Y = a + bX, where a = the mean of Y (with X measured as a deviation from its own mean) and b = the regression coefficient

On the average, now, my predictions will be more accurate. In fact, the error variance of my predictions is now about 112.5, which is 225 × (1 − .707²). Instead of being off by 15 points in my prediction, I’m off by about 10.6 points (the square root of 112.5).
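
Here is a minimal sketch of that arithmetic in Python, using only the numbers from the example (mean 100, standard deviation 15, correlation .707). The function names are made up for illustration, and it assumes GPA is expressed as a z-score.

```python
import math

POP_MEAN = 100.0   # mean population IQ
POP_SD = 15.0      # standard deviation of IQ
R = 0.707          # pretend correlation between GPA and IQ


def predicted_iq(gpa_z):
    """Predicted IQ for someone whose GPA is gpa_z SDs above the mean."""
    return POP_MEAN + R * gpa_z * POP_SD


def residual_variance():
    """Variance left unexplained after using GPA as a predictor."""
    return POP_SD ** 2 * (1 - R ** 2)


total_variance = POP_SD ** 2
explained_variance = total_variance - residual_variance()
typical_error = math.sqrt(residual_variance())

print(round(predicted_iq(2.0), 1))
print(round(residual_variance(), 1), round(explained_variance, 1))
print(round(typical_error, 1))
```

With a correlation of .707, about half of the 225 points of variance is explained and about half is left over as error.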

Notice that:

There’s a lot more I have to say about explained variance, but it is past 1 a.m. and I’m trying to get to bed earlier and get up earlier so I won’t die when I’m in North Dakota next week and get up at what is the equivalent of 7:30 a.m. Pacific time. (Because there is no way on God’s green earth I’m going to make it into those schools before 10 a.m. and that’s just a fact. Maybe it’s on God’s white earth, since I believe everything in North Dakota is under a blanket of snow for at least three more months.)

buffalo in the snow



It’s now day five of the 20-day blogging challenge, which, if you are late to the party, FYI is an idea of Kelly Hines to blog 20 days in a month on topics related to teaching.

“Share any tips for designing/ grading/ giving assessments.”

I have two really good ideas for assessment, one of which I always use and the other I’m kicking myself because I have not done it lately but I’m really thinking of using it again for my next class.

1. Analysis paper with real data. I teach statistics, multivariate methods, data mining, stuff like that. As a business owner, I used to say that I would not hire anyone fresh out of graduate school because they could never DO anything. I don’t need someone to prove the Central Limit Theorem for me,  calculate sums of squares with a calculator or look up degrees of freedom in a table in the back of some book. It’s 2014 and we have computers here at The Julia Group. It occurred to me at some point that since I teach graduate students each year, I am part of the problem. Now, I always require students to pose a research question, analyze data to answer it and write a paper discussing their method and results.

Details may vary from course to course but what the assignment always includes is REAL data, which means some of it is missing and some is impossible, either because of data entry errors or because the person just wrote down the wrong information. No one is 992 years old. Data may not comply with distributional assumptions. Your measure may turn out to be unreliable. In all of that, you need to figure out how to compute an Analysis of Variance or logistic regression and interpret the output without an answer in the back of the book.

I require the paper to be submitted in pieces, first a draft of the descriptive statistics, then a draft of inferential statistics, then a final draft. One reason for this, unfortunately, is that cheating is rampant at universities, and if you have to turn in lots of drafts, it is going to be difficult and expensive for you to get one of those storefront paper mills in West L.A. to write them all for you.

There is a more positive reason, though. You may (probably will) forget 90% of what was on a multiple-choice final exam that you crammed for. You’re a lot more likely to remember how you solved a problem that you posed yourself, because it’s likely to be of interest to you. For example, a young woman in a recent course wanted to do research on the relationship between obesity and health problems such as diabetes and cardiovascular disease because her family had several members who were obese and had health problems. When I asked whether she’d like to look at BMI or break down the sample in different ways, she was emphatic that she was interested in obesity. She used the data from the California Health Interview Survey. Because it was something meaningful to her, I believe she’ll retain a lot more of what she learned than if we had just had chapter tests.

2. Class notebook. This is an assignment I used to give and I would run into students years later who told me they still had theirs and used it. This wasn’t just notes but more of a lab notebook detailing in your own words just how you did each part of the project. They could also copy and paste in anything that would help them. The purpose was for them to have something they could use if they ran into this problem on the job. The notebook was THEIRS. How did you compute descriptive statistics with SAS? How did you compute reliability? How did you compute an ANOVA, what type of post hoc test did you use and why? How did you compute a MANOVA? What did each of those numbers on the printout mean? Because students wrote it for themselves, when they needed to do one of these procedures even a year or two later, they could pick it up and use it.

What both of these assessments have in common is that they allow the students to personalize their learning.

They also both take a really long time to grade because I read every page. I think this would be really hard for a large class unless you had a teaching assistant or grader. Even for graduate courses that tend to have fewer than 25 students, it’s a lot.



Amazingly, given my current schedule, I have made it to Day 4 of the 20-day blogging challenge. This was the  brain child of Kelly Hines as a way to get herself to blog more regularly. Today’s prompt was :

Share a topic/ idea from class this week. What’s one thing you did with students this week that you will (or will not) do again. Why?

I’m not teaching a course right now but I am revising the curriculum for the biostatistics course. The topic students had the most trouble with was hypothesis testing. Even though all of them had a previous course in statistics, many had it back when they were undergraduates, and let’s be honest, how much do you remember from any class you took five or six years ago?

One thing I would do differently is go back to an idea I had when I first started teaching statistics. I noticed that some students only had a vague idea what an exponent was. A few times I got asked why I wrote that V thing next to numbers (students who had never heard of a square root). I could go on, but you get my point. Students in a GRADUATE program. It turns out you can get a degree in some fields with very, very, very little mathematics. I was teaching the first course in the statistics sequence and I started each term with a 20-item algebra test on the first day of class. It was not part of the course grade, but I told students if they did not get over 85% they were going to have great difficulty in the course. The questions were things like find A when A² = 9 or identify the coordinates of a given point on a plot.

Usually, I would have one or two students who scored below 60%. Almost always, those students dropped the class, which was, I think, for the best. HOWEVER, important point coming up here …. I did not tell them they couldn’t pass the class. I told them that it would be very much to their benefit to take a course in algebra and come back the following term. I would show them some of the problems later in the course and emphasize that they would be able to do this much, much easier later on if they went and took another math class first. Most students saw my point, dropped the class and took a prerequisite course. Even though it wasn’t an official university prerequisite, it was prerequisite information. The few students who did not knew what they were getting themselves into and planned to meet with me during office hours every single week, and allocated their time for a LOT of extra studying.

The course I am teaching now is not as basic as that one, but I do think the students could benefit from having some assessment of their understanding of basic concepts. Do you know what a z-score is? A normal curve? Percentile? Yes, I give quizzes, but I don’t mean exactly that. I mean a test of basic concepts, like what a probability of .45 means.

So, that is what I am ruminating on tonight. What are the absolute basics of statistics that you need to comprehend before forging ahead?




The question for Day 3 is :

“What is a website that you cannot live without? Tell about your favorite features and how you use it in your teaching and learning.”

The first part is easy. Oh my God, I love, love, LOVE stackoverflow, a site where all of your programming questions are answered. It’s free and you don’t have to register. You can just go there and search for an answer to why your css is not properly aligning 5 pixels from the left margin of the container, or whatever is bothering you at the moment. Normally, when I type a question into Google one of the first few hits will be on stackoverflow and I go read whatever it is. Even if my question isn’t answered, I’ll learn something and I can usually search the site or look at the related topics in the sidebar and find what it is I was trying to learn.

I can’t really say that I use stackoverflow for teaching, except for indirectly. One of The Julia Group companies, 7 Generation Games, is games to teach kids math and many of the problems I encounter are related to game development.

There are sites I use for teaching and I was going to list more here but I peeked ahead and saw this question comes up again in the 20-day challenge so I’ll save those for later. There are a few other good sites, including a couple of blogs, that I like for statistics, SAS and SPSS. But to answer the first part of the question: what site, if I woke up tomorrow and it wasn’t there, would you find me screaming NO- O – O – O !!! and searching for the nearest lake to drown myself in? Definitely, stackoverflow.


lake for drowning in
