At first, I was thinking it wasn’t right to have a favorite paper, but then I realized that was idiotic. It’s not like these papers (or their presenters) are my children.

My favorite paper was,

Statistical modeling for large complex data: Five new directions from SAS/STAT software

If you’re not a statistician, props to you for reading after that first sentence, especially since some of the lessons apply to any conference.

glm select

  1. You don’t always have to present or attend presentations on whatever is shiny and new. The techniques he presented, like GLMSELECT, a method for selecting the best model, are not brand new. I remember when it was first added to SAS/STAT and thinking it was a way cool idea I should use – but, then, I didn’t. As you can see from the graph above, it can be pretty easy to select the best model. Looks a lot like a scree plot, doesn’t it? This also further supports my point that visual displays of data, like the one above, are everywhere and taking over. Now that I have been reminded of its existence, I’m looking for a use for it so I can really remember it. Unfortunately, this is a method for general linear models and what I am most interested in right now has a binomial outcome: whether a player finished a game or not.
  2. Don’t stop learning when you go home. I remembered that there was also an example in this paper that used HPGENSELECT for generalized linear models, including binomial distributions. So, I am going to try that out with this dataset. One of the areas where I am improving is actually reading all of those papers I mean to get around to when I get home. Whether it is a paper you attended, but is now jumbled around in your brain with the other 25 sessions, or one you could not attend because it conflicted with something else, when you get home, you should read it. Conferences can be expensive and you want to get the most out of that time and money you spent.
  3. Of course, I learned about sparse regression, quantile regression, classification and regression trees and more, which you can, too, if you follow my advice from #2.

Okay, well there is a lot more to say about SAS Global Forum and my adventures with HPGENSELECT but we have a new game, Forgotten Trail, coming out for sale tomorrow, so back to work.

———-

7 GENERATION GAMES

Sam and Angie planning their journey

BETTER GAMES, BETTER MATH

 

In the past month, I have been in five cities and three states and met a lot of people. Lately, I’ve had conversations with several families that have adult “children” living at home in their 30s, 40s and even 50s. By then, the children’s girlfriend/boyfriend and children are also living in the home. Often, this whole inverted pyramid is supported by one person in his or (usually) her sixties or seventies.

It’s not often that I don’t speak up, but when I stay quiet it is because the incongruous situation just leaves me at a loss for words.

Seriously, when you are 72 and calling your 50-year-old son’s probation officer, what the hell are you thinking?

One of my daughters had just come back from the Olympics. She had not finished high school and was scheduled to take the GED exam, but she was driving across country and ran into some bad weather. I called the GED center to reschedule the exam.

The nice lady on the phone asked me,

Excuse me, ma’am, but how old is your daughter?

Startled, I replied,

She’s 21.

The woman said,

Then, let me tell you something. If she’s 21 years old, you should not be calling for her. She’s old enough to make these calls herself.

The excellent point that this woman made was that we make excuses for our children. She was training for the Olympics. She was caught in a snow storm.

The fact is, Ronda was perfectly capable of making arrangements herself. She came home, rescheduled the exam and passed it with flying colors, took a couple of college courses, worked  a couple of jobs and ended up being very successful in multiple chosen careers.

There are two questions here:

  1. Why do our children rely on us long after they are capable of relying on themselves?
  2. Why do we do for our children long after they are capable of doing for themselves?

The answer to the first is easy, I think. It is comfortable, convenient and they are accustomed to it. I’ve heard more than one adult respond with outrage to the suggestion they should start paying rent,

“I’ve lived here for 29 years without paying rent. Why should I start now? Just to make it easier on my mom?”

Honestly, I want to slap those people. What the hell is wrong with you that you cannot see that you are making life more difficult than it needs to be for someone you supposedly love? Not all of those people are evil. They have fallen into a pattern where they live in someone else’s house, drive their car, eat their food and they consider it all the same as if they earned it themselves. They’ve never grown up.

What about those parents? Why do they allow this? Why are they still calling the dentist, unemployment office, probation officer, community college counselor long after their “children” are adults?

As with the younger generation, part of it is habit. They have been taking care of Johnny since he came out of the womb, cleaning up his messes, solving his problems, and it is hard to break that habit. I’m sure they love their lazy, inconsiderate offspring and don’t want to see him miss out on the classes he needs, get sent to jail or lose his general assistance check.

On the other hand, there is a good dose of guilt handed out to parents who kick their children to the curb. People have told me that I was an awful mother for telling all of my children that they had three choices when they turned 18: get a job, get into college or get out.

Especially in Ronda’s case, people have questioned whether I really loved her or believed in her if I told her she had a year to make it with this mixed martial arts bullshit and that was as far as I was willing to go. Note that Ronda herself has never thought this.

Frankly, I don’t get it. As I told all of my daughters,

I’m an old woman. I shouldn’t have to be supporting an able-bodied adult.

None of them expected that I would, but other people have called me heartless – and a lot worse. What about supporting their dreams?

Here is where some of the parents of Johnny Leech need some tough love. So, if that describes you, Mom or Dad, listen up!

Parents are people, too. They are allowed to have their own dreams.

I have wanted to make educational games since I was in graduate school 30 years ago. Now that my children are on their own, I have the freedom to take the risks that running a start-up entails.

Maybe your dream is just to sit on your porch drinking iced tea and not have to get up and go to work. That is perfectly fine. Do it!

What about Johnny? What will he do if you kick him out? He’ll end up living on the street! He’ll starve!

Okay, probably not. He’ll figure it out.  And, if he doesn’t, that’s his choice. Yes, it will probably be hard at first. Oh, well.

You’re entitled to your own life.

The nice thing about going to SAS Global Forum is that it’s the gift that keeps on giving. Long after I have gone home, there are still points to ponder.

Visual analytics is big, and not just in the sense that there is a product out by that name (which I have never used), but in that every presentation, no matter how ‘tech-y’, now makes very effective use of graphics. If I were the type of person to say I told you so, I would mention that I predicted this six years ago after I went to SAS Global Forum in 2010.

In my last post, I mentioned the propensity score graphic with mustaches.

Richard Cutler’s presentation on PROC HPSPLIT, which was really excellent, made extensive use of graphics to illustrate fairly complex models.

Nodes in subtree

You can create classification and regression trees (the model you can’t see in this tiny graphic on the left) and you can drill down into sub-trees for further analysis.

Sometimes your classification tree is very easily interpretable. For example, in this case here from the same presentation, each split represents a different type of vegetation/ land surface – water,  two different species of tree, etc.

Classification tree

Speaking of classification, regression and PROC HPSPLIT ….

If you didn’t know, now you know

PROC HPSPLIT is a high performance procedure for fitting and classification now available in SAS/STAT which is useful for data sets where relationships are non-linear. It produces classification and regression trees, includes options for pruning trees and a whole lot more. It is now available on a single computer, not limited to high performance computing clusters. So, yay!

A regression tree is what you get when your dependent variable is continuous, and a classification tree when it is categorical, as in the vegetation example above.

On a semi-related note, graphics can even be used to show when a data set is not suited to a linear model as in the example below, also from Cutler’s presentation. You can see that all of the 1’s are in two quadrants and all of the 0’s in two other quadrants. Yes, you COULD use a regression line to fit this but that is not the best fit of the data.

Also, on the related point that visualizing data, like all of statistics, really, is a process of iterations: I think the pattern in the example below would be more obvious if the quadrants were color coded.


classify
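If you want to convince yourself of the quadrant point without SAS open, here is a minimal sketch in plain Python. The 16 points are toy data I made up to mimic the pattern (1’s in two opposite quadrants, 0’s in the other two); they are not Cutler’s data, and the two-level split rule is a hand-written stand-in for a classification tree, not output from PROC HPSPLIT.

```python
# Toy data (an assumption, not the presentation's data): the label is 1 in
# the upper-right and lower-left quadrants and 0 in the other two.
points = [(x1, x2) for x1 in (-2, -1, 1, 2) for x2 in (-2, -1, 1, 2)]
labels = [1 if x1 * x2 > 0 else 0 for x1, x2 in points]


def mean(values):
    return sum(values) / len(values)


def slope(xs, ys):
    """Least-squares slope of ys on xs (simple regression)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


x1s = [p[0] for p in points]
x2s = [p[1] for p in points]

# On this balanced grid x1 and x2 are uncorrelated, so the two simple
# slopes are the same as the multiple-regression slopes.
b1 = slope(x1s, labels)
b2 = slope(x2s, labels)
b0 = mean(labels)  # intercept, because both predictors have mean zero

# Classify with the linear fit, using 0.5 as the cutoff.
linear_pred = [1 if b0 + b1 * x1 + b2 * x2 > 0.5 else 0 for x1, x2 in points]
linear_accuracy = mean([int(p == y) for p, y in zip(linear_pred, labels)])


def tree_predict(x1, x2):
    """Two-level tree: split on x1 first, then on x2 within each branch."""
    if x1 > 0:
        return 1 if x2 > 0 else 0
    return 1 if x2 < 0 else 0


tree_pred = [tree_predict(x1, x2) for x1, x2 in points]
tree_accuracy = mean([int(p == y) for p, y in zip(tree_pred, labels)])

print("slopes:", b1, b2, "intercept:", b0)
print("linear accuracy:", linear_accuracy, "tree accuracy:", tree_accuracy)
```

On data like these the fitted line is flat, so it predicts the same thing for everybody and gets half of the cases right, while two splits classify every point correctly.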

I have a lot more to say on this but I am in North Dakota speaking at the ND STEM conference this weekend and a  kind soul gave me tickets to the hockey game in the president’s box, so, peace, I’m out.

If you did not go to SAS Global Forum this week, here are some things you missed:

Me, rambling on about the 13 techniques all biostatisticians should know, including the answer to:

If McNemar and Kappa are both statistics for handling correlated, categorical data, how can they give you completely different results?

The answer is that the two test different hypotheses, apply different formulas and are coded differently.

McNemar tests whether the marginal probabilities are the same. For example, when you switched your patients from drug one to drug two, was there a decrease in the number who experienced side effects? These are correlated data because they are the same people. Can’t get much more correlated than that.

Kappa tests whether the level of agreement of two raters is greater than would be expected by chance. I’ve rambled on about it here before, using it to test the level of agreement that our 7 Generation Games raters have when scoring the pretest and post-test we use to assess whether kids are improving as a result of playing our games. Quick answer: Yes.
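To see how the two can disagree, here is a small sketch in plain Python using the textbook formulas for a 2 x 2 table: McNemar’s chi-square without continuity correction, and Cohen’s kappa. Both tables are made-up numbers chosen for illustration, and the function names are mine, not anything from SAS.

```python
def mcnemar_chi_square(table):
    """McNemar chi-square (no continuity correction) for a 2 x 2 table.

    Only the discordant cells b and c are used.
    """
    (a, b), (c, d) = table
    return (b - c) ** 2 / (b + c)


def cohen_kappa(table):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    (a, b), (c, d) = table
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical tables: rows are the first measurement (yes, no),
# columns are the second measurement (yes, no).
balanced = [[40, 10], [10, 40]]   # disagreements split evenly
lopsided = [[40, 20], [0, 40]]    # every disagreement goes one way

for name, table in (("balanced", balanced), ("lopsided", lopsided)):
    print(name,
          "McNemar:", round(mcnemar_chi_square(table), 2),
          "kappa:", round(cohen_kappa(table), 2))
```

Both tables have 80% raw agreement and a kappa of about 0.6, but McNemar is 0 for the first table and 20 for the second, far above the 3.84 cutoff for one degree of freedom. Kappa only cares how much the two measurements agree; McNemar only cares whether the disagreements lean in one direction.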

You also missed Lucy D’Agostino McGowan’s talk on propensity score matching integrating SAS and R.

Random notes from that presentation:

Why would you want to do this? Well, it would be lovely if you could do a randomized control trial and send your subjects randomly off to a treatment or control group.

However, what if your subjects tell you to drop dead, they’re not going to be in your stupid treatment group?

In my experience, propensity scores have been commonly used when evaluating special programs that do not randomly receive patients. For example, patients sent to an Intensive Care Unit tend to be sicker than non-ICU patients. How then, do you decide if an ICU has any benefit when people in it are more likely to die?

Observational studies can use propensity scores to get a more unbiased estimate of treatment effects.

Propensity score matching assumes:

  1. There are no unmeasured confounders.
  2. Every subject has a non-zero probability of receiving treatment.

Propensity scores are simply predicted values from a logistic regression predicting treatment.

Useful rule of thumb:
Use a caliper of 0.2 * the pooled standard deviation

Only match people from treatment group to control group if their distance is within the caliper.
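The caliper rule can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the propensity scores are invented, the pooled standard deviation is taken as the square root of the average of the two group variances, and the matching is greedy nearest-neighbor without replacement. This is not the code from the presentation.

```python
from math import sqrt
from statistics import variance

# Hypothetical propensity scores (predicted probability of treatment).
treated = {"t1": 0.62, "t2": 0.35, "t3": 0.90}
control = {"c1": 0.60, "c2": 0.33, "c3": 0.10, "c4": 0.58}

# Assumed definition of the pooled SD: root of the mean of the two variances.
pooled_sd = sqrt(
    (variance(list(treated.values())) + variance(list(control.values()))) / 2
)
caliper = 0.2 * pooled_sd

matches = {}
available = dict(control)  # controls not yet used (matching without replacement)
for t_id, t_score in treated.items():
    if not available:
        break
    # Closest remaining control to this treated subject.
    c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
    # Keep the pair only if the distance is within the caliper.
    if abs(available[c_id] - t_score) <= caliper:
        matches[t_id] = c_id
        del available[c_id]

print("caliper:", round(caliper, 3))
print("matches:", matches)
```

With these made-up scores, two treated subjects find a control within the caliper and the third (score 0.90) stays unmatched, because nobody in the control group looks enough like it. That dropped subject is the price of the caliper: fewer matches, but better ones.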

Also, I have slide envy because she thought to use mustaches and fedoras in illustrating propensity scores.

Propensity Scores with mustaches

Also with really cool slides, which I was not quick enough to take a picture of before he moved on …

Using Custom Tasks with In-memory statistics and SAS Studio by Steve Ludlow

I was able to find the slides from a related presentation he gave in the UK last year. I linked to that one because it gave a little more detail on what SAS in-memory statistics is, how to use it and examples. If you had gone to his presentation, you probably would have wanted to learn more about this PROC IMSTAT and custom tasks of which he speaks.

Three points you might have come away with:

  1. Creating custom tasks is really easy
  2. Custom tasks could be really useful for teams sharing a large database. Say, for example, you are on a longitudinal project studying the development of at-risk youth from age 12 to 25. You might have all kinds of people doing similar analyses, maybe looking at predictors of high school dropout, say. You could save your task and re-run it with next year’s data, only for females or in a hundred other ways.
  3. Custom tasks could be super-useful for teaching. Have the students run and inspect tasks you create and then modify these for their own analyses.

Fish lake woman

Okay, off to more sessions. Just a reminder, if you are here and feeling guilty that you left your children/grandchildren at home, you can buy Fish Lake or Spirit Lake for them to play while you are gone. They’ll get smarter and you will get brownie points from their mom / dad / teacher.

Whenever you find yourself overworked or tired out, it’s easy to get upset out of all proportion when something goes wrong.

The usual advice for avoiding stress – “Be sure you get enough sleep. Work-life balance is important.” – can be like telling runners at a track meet to run faster. It is correct but not helpful.

This month, I’ll be in Tampa, San Francisco,  Las Vegas,  Grand Forks,  Devils Lake and Tulsa. I’m giving three papers at three conferences, meeting with a lot of people, writing an annual report, working on the bugs to get out our latest game by the end of this month, working on a Chromebook version of our two current games, expanding our resources for instructors, meeting with potential investors, visiting school beta sites. It’s exhausting just reading it, no?

I arrive in Los Angeles from Las Vegas at 10 pm and leave at 1 am for Grand Forks. I’m not at private jet level yet, so I had to catch whatever flights were available to get from Nevada to North Dakota because those selfish bastards who organized SAS Global Forum did not consult me on which days on the calendar fit my personal schedule. I know, I’m shocked, too! The North Dakota STEM conference, same thing! No one called me and said,

“Hi, AnnMaria, we’re thinking of having a conference. What day is good for you?”

My point is that when Delta airlines doesn’t consider you in its flight routes or conferences don’t schedule around your convenience, you don’t get outraged because you don’t take it personally.

That last phrase is the major stress relief strategy that is within your control. You can’t always get enough sleep, do your yoga, only associate with positive people (or, based on some of my recent flights, people with adequate hygiene).

You can, however, try to avoid taking personally the inevitable slips in your best-laid plans. I’ll give you an example that happens to me more often than I would like – I am supposed to go to a school that has expressed an interest in using our games to teach math. Sometimes they have made an appointment for me to present our research to the staff, observe students playing in the computer lab or meet with a focus group of students and then, at the last minute cancel – sometimes, literally, as I am getting into my car to drive to meet with them.

It’s easy to get upset about this, or a hundred other things that don’t go perfectly in the average day. I was talking to The Perfect Jennifer about one of these today and she said,

Jenn studying

You know, Mom, as much as I am always on your side, I don’t think this has anything to do with you.

And she is right. I could have said,

“What do you mean? It was ME they had agreed to meet with/ invest in/ buy games from/ have as the chief belly dancer (okay, maybe not that last one).”

The fact is, though, people change their minds, say something they didn’t mean, double-book, forget appointments or have conflicts for all kinds of reasons and a very, very small proportion of them have to do with you. They just plain forgot that Wednesday was a half-day and the students would not be there at 2 pm. THEY are overworked, too, and were called in to handle a crisis with a student who had attempted suicide or assaulted a teacher, the school is on lock-down, the superintendent dropped by to ask why their test scores were so low, their budget was cut, their last investment lost a bucket of money and they can no longer afford to invest.

So … the next time you find your stress level rising along with your desire to throttle someone, don’t count to ten, start counting all of the plausible explanations for their behavior that don’t involve you. I’m pretty sure people are not wandering around plotting to make your life stressful. In fact, once you start thinking about the potential problems other people have to deal with, your life starts looking pretty good by comparison.

———

Games for Mac and Windows that make you smarter

Fish lake splash screen

Chromebook versions coming soon

 

It’s almost 6 am here on the east coast, and after flying all day, during which I worked on a final report for a grant to develop our latest educational game and made bug fixes on same, I landed and wrote a report for a client, because that pays the bills.

In the meantime, over on our 7 Generation Games blog, Maria wrote a post where she called bullshit on venture capitalists who claim not to be interested in educational games because they aren’t a billion dollar business but then fund other enterprises that no way in hell are a billion dollar business.

She seems to have touched a nerve because now we are getting comments from people saying no one wants to fund you because your games are bad and you are mean.

That is part of the start-up life, really. You have this idea for a business that you think is wonderful, it is your baby. Like a baby, you get too little sleep, because you are working all of the time, but you think it’s worth it.

kid acting ugly

And every day, you run into people who are essentially telling you that your baby is ugly.

People like to believe they are reasonable and give reasons for their belief in your baby’s ugliness. I think you should consider those explanations because they could be right. Maybe your baby IS ugly.

For example, someone said, “Maybe venture capitalists don’t want to invest in your games because they aren’t as good as the PS4, Wii and Xbox games and kids don’t want to play them.”

I answered that he was correct: our games, which cost schools an average of $2 – $3 per student and cost individuals $9.99, are NOT as good as games that cost $40 – $60. If you have 200 kids in your school playing our games, you probably can’t afford to pay us $10,000. I know this is true. Could I be wrong about the price of the games to which he was comparing ours? I went and checked on Amazon, which is probably one of the cheapest places to buy games, and I was correct.

I have a Prius. My daughter has a BMW that costs four times as much. Her car looks much cooler than mine and goes much faster. Does that mean Prius sucks and no one should invest in them? Obviously, no.

Actually, we have thousands of kids playing our games and they sincerely seem to like them, and upper elementary and middle school kids are usually pretty honest about what they think sucks.

People sometimes point out that our graphics could be cooler or our game world could be larger or other really, really great ideas that I completely agree with. The fact is, though, that we want our games to be an option for schools, parents across the income spectrum, after-school programs and even nursing homes, in some cases. (There is a whole group of “silver gamers”.) These markets often do NOT have the type of hardware that hard-core gamers do. In fact, the minimal hardware requirement we aim to support is Chromebooks and we are building web-based versions that will run in areas that don’t have high-speed Internet access.

Did you ever have that experience where you call tech support for a problem and the person on the other end says,

Well, it works on my computer.

What good does that do me?

So, we are trying to make games that work on a lot of people’s computers. Believe me, I do get it. I play games on my computer and I have a really nice desktop in an area with high-speed Internet and I would LOVE to do some way cooler things. We made the decision to try to provide games people could play even if the only computer they can access is some piece of junk computer that most of us would throw out. Don’t get me started on the need to upgrade our schools and libraries, that is a rant for another day.

A teacher commented the other day that while she really liked the educational quality of our games, what she really wanted for her classroom were Xbox quality games for free. I would like a free computer, too, but those bastards at Apple keep charging me when I want a new one. I guess that is a rant for another day, too.

My whole point is that running a start-up is a lot of hard work and a lot of rejection. Almost like being an aspiring actor or author or raising a teenager. You have to consider the criticisms without being discouraged. Maybe they are correct that Shakespeare wouldn’t have said,

Like, you know, to be or not.

On the other hand, I remember that publishers rejected Harry Potter, and just about every successful company over the last few decades has had more detractors than supporters when it got started. And let it be noted I was right about that jerk I told you not to date, too.

In the meantime, check out our games, they really are fun and DO make you smarter!

 

 

 

Esteemed statistics guru, Dr. Nathaniel Golden has some sobering news for Democrats. His latest models predict a Republican blow out. As can be seen by the map below, the Republican front-runner has tapped into the mood of resentment in the country’s non-elites. When the dust has settled, only the two highest earning states in the country will remain in the blue column, Maryland and New Jersey (seriously, New Jersey). Code used in creating this map and the statistics behind it can be found below.

Map in all red but 2 states

Step 1: Create a data set

Oh, and April Fool’s! I just made up these data. If you really do need a data set with state data aligned to SAS maps, though, you can do what I did and pull it from the UCLA Stats Site. If you had real data, say percent of people who use methamphetamine, or whatever, you could just replace the last column there with your data. Since I did not have actual data, I just created a variable that was 40,000 for every income less than 51,000, and 51,000 for everything else. I’m going to use that in the PROC FORMAT below.

Also, even though my data are not nicely aligned here, note that the statename variable has a width of 20 so make sure you align your data like that so that state comes in column 22 or after.

DATA income2000;
INPUT statename $20. state income ;
IF income < 51000 THEN vote = 40000 ;
ELSE vote = 51000 ;
DATALINES ;
Maryland 24 51695
Alaska 2 50746
New Jersey 34 51032
Connecticut 9 50360

— a bunch more data

;

Here’s how you set up a PROC FORMAT for the two categories.

*** Labels for the two vote categories ;
PROC FORMAT ;
VALUE votfmt low-50000="Republican"
50001-high="Democrat";
RUN ;

*** Making the patterns red and blue ;

pattern1 value=msolid color=red;
pattern2 value=msolid color=blue;

*** Making the map ;

proc gmap data = income2000 map=maps.us;
id state;
choro vote;
format vote votfmt.;
run;
quit;

The important thing to keep in mind, if you want a U.S. map with the states, is that maps.us is a data set in a SAS library named maps. Like the sashelp library, it’s already there: you don’t need to create it or assign it in a LIBNAME statement, you can just reference it. Go look under your libraries. See, I was right.
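If the recode-then-format logic is hard to follow in SAS, here is the same idea sketched in plain Python, using only the four data lines shown above. The function names recode_vote and votfmt_label are mine, made up to mirror the IF/ELSE in the DATA step and the votfmt format; this is an illustration of the logic, not a replacement for the SAS code.

```python
# The four (made-up) data lines shown in the DATA step: name, FIPS code, income.
rows = [
    ("Maryland", 24, 51695),
    ("Alaska", 2, 50746),
    ("New Jersey", 34, 51032),
    ("Connecticut", 9, 50360),
]


def recode_vote(income):
    """Mirror of the DATA step: 40000 if income < 51000, otherwise 51000."""
    return 40000 if income < 51000 else 51000


def votfmt_label(vote):
    """Mirror of the format for the two values used here.

    The SAS ranges are low-50000 and 50001-high; nothing between 50000
    and 50001 ever occurs because vote is always 40000 or 51000.
    """
    return "Republican" if vote <= 50000 else "Democrat"


colors = {name: votfmt_label(recode_vote(income)) for name, _, income in rows}
for name, label in colors.items():
    print(name, label)
```

Of these four rows, only Maryland and New Jersey come out in the Democrat category, which is what the April Fool’s map claims.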

And don’t forget to vote.  I don’t care how busy you are. You don’t want this, do you?

There are some things in life that I just have difficulty wrapping my brain around, and one of those is how some people can be so incompetent that they don’t know they’re incompetent.

Let’s take the example of people earning doctorates. You’d think that would be a pretty select crowd, right?

From 1960 – 69, about 16,000 Ph.D.’s were awarded annually in the United States.

From 1990 – 99, there were about 40,000 annual Ph.D. graduates.

That seems like a pretty steep jump in 30 years, but maybe science, technology, etc. was increasing at a rapid rate, we were in a race to space, make up whatever explanation you want because, are you ready for this …. in 2013, we awarded over 125% of the number of degrees awarded a mere 14 years earlier – and that is following on pretty steep trends up to that decade.

There has been a dramatic increase in the number of institutions awarding doctorates.

So, here is a question for you …. who are the people educating all of these doctoral students?

At the risk of sounding like an old curmudgeon, even more than usual, I’d like to point out that it used to be that a professor supervised only a few doctoral students at a time. You worked closely with that person on your research for a year or two. Prior to that, you had 3-5 years of coursework, often with only a dozen or fewer students in a class. When I enrolled in the doctoral program, I had to agree not to work more than 20 hours a week during the term because being a doctoral student was a full-time job. All but two of my statistics courses were six hours a week, a three-hour lecture and a three-hour lab. For one of the two that didn’t have a lab, structural equation modeling, you were just expected to spend that lab time figuring it out on your own, and believe me, it took more than an extra three hours.

When I look at what doctoral students are required to know in most institutions, I wonder – who is going to replace the people who are retiring?

If someone poses a statistical problem to me – say, determining whether three groups receiving different treatments improved from pretest to post-test, I can perform all of the steps required to answer the problem – pose the relevant hypotheses and post hoc tests, evaluate the reliability and validity of the measures used, clean the data in preparation for analysis. Not only can I lay out the research design and necessary steps, but I can code it, in SAS preferably but in SPSS or Stata if someone prefers. Everyone I knew in graduate school was expected to be able to do this, it wasn’t the special AnnMaria program.

Now, many people use consultants. I have friends that make their living full time consulting on dissertations for doctoral students.

This leads me to the question, “What are their advisors doing if these students need a consultant?”

Isn’t that what your professors in your program are supposed to be doing, consulting with you?

The fact is that the vast majority of professors now are adjuncts, teaching a course here or there. I’m not bashing adjuncts per se. I teach as an adjunct now and then myself, and it is fine if you need a course on, say, programming or statistics, but if all you get is courses taught by someone tangentially tied to the university, you are missing out on the in-depth research and study that used to be required for a Ph.D.

The really alarming thing to me is that now we have whole waves of students who are being educated by people who don’t know any other system. So, we have people who cannot conduct a complete research project on their own, who have only vague concepts of what a ‘mixed model’ is – and they are teaching doctoral students!  Now, if you are in French literature or something, maybe that’s cool and mixed models aren’t very applicable. That’s not my point.

My point is this whole cutting costs by reducing full-time faculty to a tiny fraction has resulted in people who are poorly educated and don’t even know it! They don’t know what they don’t know and now they are passing their ignorance on to the next generation.

I came out of my Ph.D. program knowing one hell of a lot, simply because, if I wanted to graduate, there was no other option. The University of California didn’t give a damn if I had three kids (I did), or needed to work (I did) or that it costs one hell of a lot to provide that level of individual supervision (it did). The powers that be figured you needed this body of knowledge to get a Ph.D. and that was that. And now, that isn’t that. That worries me.

 

I can’t believe I haven’t written about this before – I’m going to tell you an easy (yes, easy) way to find and communicate to a non-technical audience standardized mortality rates and relative risk by strata.

It all starts with PROC STDRATE. No, I take that back. It starts with this post I wrote on age-adjusted mortality rates which many cohorts of students have found to be – and this is a technical term here – “really hard”.

walnut

Here is the idea in a nutshell – you want to compare two populations, in my case, smokers and non-smokers, and see if one of them experiences an “event”, in my case, death from cancer, at a higher rate than the other. However, there is a problem. Your populations are not the same in age and – news flash from Captain Obvious here – old people are more likely to die of just about anything, including cancer, than are younger people. I say “just about anything” because I am pretty sure that there are more skydiving deaths and extreme sports-related deaths among younger people.

Captain Obvious wearing her obvious hat

So, you compute the risk stratified by age. I happened to have this exact situation here, and if you want to follow along at home, tomorrow I will post how to create the data using the sashelp library’s heart data set.

The code is a piece of cake

cake

PROC STDRATE DATA=std4
REFDATA=std4
METHOD=indirect(af)
STAT=RISK
PLOTS(STRATUM=HORIZONTAL);
POPULATION EVENT=event_e TOTAL=count_e;
REFERENCE EVENT=event_ne TOTAL=count_ne;
STRATA agegroup / STATS;
RUN;

The first statement gives the name of the data set that holds your exposed sample data, e.g., the smokers, and your reference data set of non-exposed records, in this example, the non-smokers. You don’t need these data to be in two different data sets and, in this example, they happen to be in the same one. The method used for standardization is indirect. If you’re interested in the different types of standardization, check out this 2013 SAS Global Forum paper by Yang Yuan.

STAT = RISK will actually produce many statistics,  including both crude risk estimates and estimates by strata for the exposed and non-exposed groups, as well as standardized mortality rate – just, a bunch of stuff. Run it yourself and see.  The PLOTS option is what is of interest to me right now. I want plots of the risk by stratum.

The POPULATION statement names the variable that holds the number of people in the exposed group who had the event, in this case, death by cancer, and the variable that holds the total count in the exposed group.

The REFERENCE statement names the variable that holds the value of the number in the non-exposed group who had the event, and the total count in the non-exposed group (both those who died and those who didn’t).

The STRATA statement gives the variable by which to stratify. If you don’t need your data set stratified because there are no confounding variables – lucky you – then just leave this statement out.
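If you are wondering what indirect standardization is actually doing with those four variables, the arithmetic can be sketched by hand. The plain-Python example below uses the usual textbook definition: expected events are the reference group’s stratum risks applied to the exposed group’s stratum totals, and the standardized mortality ratio (SMR) is observed over expected. The counts are invented for illustration; only the variable names event_e, count_e, event_ne and count_ne come from the code above. This is a sketch of the idea, not a reproduction of everything PROC STDRATE computes.

```python
# Hypothetical counts per age group (not the sashelp heart data):
# (event_e, count_e, event_ne, count_ne)
strata = {
    "<45":   (10, 1000,  5, 1000),
    "45-54": (20,  500, 20, 1000),
    "55+":   (30,  300, 50, 1000),
}

observed = sum(event_e for event_e, _, _, _ in strata.values())

# Expected events if the exposed group had the reference group's
# stratum-specific risks: (event_ne / count_ne) * count_e, per stratum.
expected = sum(
    event_ne * count_e / count_ne
    for _, count_e, event_ne, count_ne in strata.values()
)

smr = observed / expected

# Crude (unstratified) risks, for comparison.
crude_exposed = observed / sum(count_e for _, count_e, _, _ in strata.values())
crude_reference = (
    sum(event_ne for _, _, event_ne, _ in strata.values())
    / sum(count_ne for _, _, _, count_ne in strata.values())
)

# Indirectly standardized risk: SMR times the reference crude risk.
standardized_risk = smr * crude_reference

print("observed:", observed, "expected:", expected, "SMR:", smr)
print("crude risk ratio:", round(crude_exposed / crude_reference, 3))
print("indirectly standardized risk:", round(standardized_risk, 4))
```

In this made-up example the exposed group is younger than the reference group, so the crude risk ratio (about 1.33) understates the difference, while the SMR of 2 says the exposed group had twice as many deaths as you would expect at the reference group’s age-specific risks. That gap is the whole reason for stratifying.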

Below is the graph

risks by strata
The PLOTS option produces plots of the crude estimate of the risk by strata, with the reference group risk as a single line. If you look at the graph above you can see several useful measures. First, the blue circles are the risk estimates for the exposed group at each age group and the vertical blue bars represent the 95% confidence limits for that risk. The red crosses are the risk for the reference group at each age group. The horizontal, solid blue line is the crude estimate for the study group, i.e., smokers, and the dashed, red line is the crude estimate of risk for the reference group, in this case, the non-smokers.

Several observations can be made at a glance.

  1. The crude risk for non-smokers is lower than for smokers.
  2. As expected, the younger age groups are below the overall risk of mortality from cancer.
  3. At every age group, the risk is lower for the non-exposed group.
  4. The difference between exposed and non-exposed is significant for the two younger age groups only. For the other two groups, the non-smokers, although having a lower risk, do fall within the 95% confidence limits for the exposed group.
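The eyeball check in point 4 can be sketched in a few lines of Python. The counts here are hypothetical, the function name is mine, and I am assuming a plain normal-approximation confidence interval for a proportion, which may not be exactly how PROC STDRATE computes its limits. It is a quick look at whether the reference risk lands inside the exposed group's interval, not a formal test.

```python
import math


def risk_with_ci(events, total, z=1.96):
    """Risk (proportion) with a normal-approximation 95% confidence interval."""
    risk = events / total
    se = math.sqrt(risk * (1 - risk) / total)
    return risk, risk - z * se, risk + z * se


# Hypothetical single age group: 40 deaths among 500 smokers,
# 60 deaths among 1000 non-smokers
exposed_risk, lower, upper = risk_with_ci(events=40, total=500)
reference_risk = 60 / 1000

# True when the reference risk falls within the exposed group's limits
inside = lower <= reference_risk <= upper
print(inside)
```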

There are also a lot more statistics produced in tables but I have to get back to work so maybe more about that later.

I live in opposite world

Speaking of work — my day job is that I make games for 7 Generation Games and for fun I write a blog on statistics and teach courses in things like epidemiology. Actually, though, I really like making adventure games that teach math and since you are reading this, I assume you like math or at least find it useful.

Mom and kid

Share the love! Get your child, grandchild, niece or nephew a game from 7 Generation Games.

One of my favorite emails was from the woman who said that after playing the games several times while visiting her house, her grandson asked her suspiciously,

Grandma, are these games on your computer a really sneaky way to teach me math?

You can check out the games here and if you have no children to visit you or to send one as a gift, you can give one to a school – good karma. (But, hey, what’s with the lack of children in your life? What’s going on?)

SENSITIVITY AND SPECIFICITY – TWO ANSWERS TO “DO YOU HAVE A DISEASE?”

Both sensitivity and specificity address the same question – how accurate is a test for disease – but from opposite perspectives. Sensitivity is defined as the proportion of those who have the disease that are correctly identified as positive. Specificity is the proportion of those who do not have the disease who are correctly identified as negative.

Students and others new to biostatistics often confuse the two, perhaps because the names are somewhat similar. If I was in charge of naming things, I would have named one ‘sensitivity’ and the other something completely different like ‘unfabuloso’. Why I am never consulted on these issues is a mystery to me, too.

Specificity and sensitivity can be computed simultaneously, as shown in the example below using a hypothetical Disease Test. The results are in and the following table has been obtained:

 

                 Disease    No Disease
Test Positive      240          40
Test Negative       60         160

Results from Hypothetical Screening Test
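Before getting to SAS, the two definitions can be checked by hand against the table above. Here is a minimal Python sketch; the function and argument names are my own (tp = tested positive with disease, fp = tested positive without disease, fn = tested negative with disease, tn = tested negative without disease).

```python
def sensitivity_specificity(tp, fp, fn, tn):
    # Sensitivity: of those WITH the disease, the proportion testing positive
    sensitivity = tp / (tp + fn)
    # Specificity: of those WITHOUT the disease, the proportion testing negative
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Counts from the hypothetical screening test table
sens, spec = sensitivity_specificity(tp=240, fp=40, fn=60, tn=160)
print(sens, spec)  # 0.8 0.8
```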

COMPUTING SENSITIVITY AND SPECIFICITY USING SAS

Step 1 (optional): Reading the data into SAS. If you already have the data in a SAS data set, this step is unnecessary.

The example below demonstrates several SAS statements in reading data into a SAS dataset when only aggregate results are available. The ATTRIB statement sets the length of the result variable to be 10, rather than accepting the SAS default of 8 characters. The INPUT statement uses list input, with a $ signifying character variables.

DATALINES; a statement on a line by itself, precedes the data. (Trivial Pursuit fact: CARDS; will also work, dating back to the days when this statement was followed by cards with the data punched on them.) A semicolon on a line by itself denotes the end of the data.

DATA diseasetest ;
  ATTRIB result LENGTH=$10 ;
  INPUT result $ disease $ weight ;
  DATALINES ;
positive present 240
positive absent 40
negative present 60
negative absent 160
;

Step 2: PROC FREQ

PROC FREQ DATA=diseasetest ORDER=FREQ ;
  TABLES result*disease ;
  WEIGHT weight ;
RUN ;

Yes, plain old boring PROC FREQ. The ORDER=FREQ option, which sorts the levels of each variable by descending frequency, is not required but it makes the data more readable, in my opinion, because with these data the first cell will now be those who had a positive result and did, in fact, have the disease. This is the numerator for the formula for sensitivity, which is:

Sensitivity = (Number with disease who tested positive) / (Total with disease).

 

TABLES variable1*variable2   will produce a cross-tabulation with variable1 as the row variable and variable2 as the column variable.

WEIGHT weightvariable will weight each record by the value of the weight variable. The variable was named ‘weight’ in the example above but any valid SAS name is acceptable. Leaving off this statement will result in a table that only has 4 subjects, 1 subject for each combination of result and disease, corresponding to the data lines above.
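To see what the WEIGHT statement is doing, here is a small Python sketch that tallies the same four data lines both ways. This is an illustration of the idea, not of how PROC FREQ is implemented; the variable names are mine.

```python
from collections import Counter

# The four aggregate records from the DATALINES block
records = [
    ("positive", "present", 240),
    ("positive", "absent", 40),
    ("negative", "present", 60),
    ("negative", "absent", 160),
]

weighted = Counter()    # each record counts as `weight` subjects
unweighted = Counter()  # each record counts as 1 subject (no WEIGHT statement)
for result, disease, weight in records:
    weighted[(result, disease)] += weight
    unweighted[(result, disease)] += 1

print(sum(weighted.values()))    # 500
print(sum(unweighted.values()))  # 4
```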

Results of the PROC FREQ are shown below. The bottom value in each box is the column percent.

Because the first row happens to be “tested positive” and the first column is “disease present”, the column percent for the first box in the cross-tabulation (positive test result, disease is present) is the sensitivity, 80%. This is the proportion of those who have the disease (the disease present column) who had a positive test result.

 

Table of result by disease

Each cell lists, from top to bottom: Frequency, Percent, Row Pct, Col Pct.

result       disease
             present     absent      Total
positive     240         40          280
             48.00       8.00        56.00
             85.71       14.29
             80.00       20.00
negative     60          160         220
             12.00       32.00       44.00
             27.27       72.73
             20.00       80.00
Total        300         200         500
             60.00       40.00       100.00

Output from PROC FREQ for Sensitivity and Specificity

The column percentage for the box corresponding to a negative test result and absence of disease is the value for specificity. In this example, the two values, coincidentally, are both 80%.

Three points are worthy of emphasis here:

  1. While the location of specificity and sensitivity in the table may vary based on how the data and PROC FREQ are coded, the values for sensitivity and specificity will always be diagonal to one another.
  2. This exact table produces four additional values of interest in evaluating screening and diagnostic tests: positive predictive value, negative predictive value, false positive probability and false negative probability. Further details on each of these, along with how to compute the confidence intervals for each, can be found in Usage Note 24170 (SAS Institute, 2015).
  3. The same exact procedure produces six different statistics used in evaluating the usefulness of a test. Yes, that is pretty much the same as point number 2, but it bears repeating.
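The four additional values from point 2 come from the same four counts. Here is a minimal Python sketch, assuming the common definitions (false positive probability as the proportion of disease-free people who test positive, false negative probability as the proportion of diseased people who test negative). The variable names are mine; check the Usage Note for the exact definitions and confidence intervals SAS reports.

```python
# Counts from the table: tp = positive/present, fp = positive/absent,
# fn = negative/present, tn = negative/absent
tp, fp, fn, tn = 240, 40, 60, 160

ppv = tp / (tp + fp)                  # positive predictive value (row pct, first row)
npv = tn / (tn + fn)                  # negative predictive value (row pct, second row)
false_positive_prob = fp / (fp + tn)  # 1 - specificity
false_negative_prob = fn / (fn + tp)  # 1 - sensitivity

print(round(ppv, 4), round(npv, 4))              # 0.8571 0.7273
print(false_positive_prob, false_negative_prob)  # 0.2 0.2
```

The two predictive values match the row percents in the PROC FREQ output, 85.71 and 72.73.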

Speaking of that SAS Usage Note, you should really check it out.
