Apr

30

This is why:
A. I support affirmative action
B. I think some kids succeed in math and science while most don’t.

For the past several days, this call has been heard in our house at least once every five minutes,

“Mom! Dad! I need help!”

Hard at Work?


It is science project time for the sixth grade at St. Anne’s School. This year, the world’s most spoiled twelve-year-old has gotten on sciencebuddies.org and decided to do her project on how the density of a solution can be determined by the index of refraction. Plus, it involves lasers, so it is hard to beat that. So, every five minutes we hear,

“Mom! Dad! I need help!”

“Yes, what is it?”

“I found on the internet that Snell’s Law is sine(theta1) divided by sine(theta2). What’s a sine function?”

So, I sat down with the white board on the living room floor (WHY do we have a white board on our floor? Who put it there? I don’t know.) and wrote:

y = f(x)

She asked,

“So you multiply f and x, right?”

We realized then that she had only gotten so far in school and to her the notation f(x) meant you multiply f by whatever is inside the parentheses. So, I explained the idea of a function, drew out a linear function, a curvilinear function and a sine function.

A few minutes later,

“Mom! Dad! I need help!”

“Yes?”

“What is theta1 and theta2?”

So, we find an article on Wikipedia that explains Snell’s law and has a diagram showing theta1 and theta2. We explain that theta is a Greek letter, that in math people use Greek letters a lot to stand for things. We point out which one is theta1 and which is theta2 on the diagram in the article. Satisfied, she writes up the first part of her science project – her question and hypothesis.
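For anyone else fuzzy on what theta1 and theta2 are doing, here is a minimal Python sketch of Snell’s law in its usual form, n1 × sin(theta1) = n2 × sin(theta2), solved for the index of the second medium. The function name and the sample angles are made up for illustration; they are not Julia’s measurements.

```python
import math

def refractive_index(theta1_deg, theta2_deg, n1=1.0):
    """Solve Snell's law, n1*sin(theta1) = n2*sin(theta2), for n2.

    theta1_deg: angle of incidence in the first medium (degrees)
    theta2_deg: angle of refraction in the second medium (degrees)
    n1: index of the first medium (about 1.0 for air)
    """
    t1 = math.radians(theta1_deg)
    t2 = math.radians(theta2_deg)
    return n1 * math.sin(t1) / math.sin(t2)

# Hypothetical angles: light enters from air at 45 degrees and bends to 32 degrees.
n2 = refractive_index(45.0, 32.0)
print(round(n2, 2))
```

The more the beam bends toward the normal (the smaller theta2 is for the same theta1), the larger the index comes out, which is the relationship the project is trying to tie to sugar concentration.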

While she’s at work, Dennis gets online and orders prisms and lasers from MiniScience.com. In an amazing burst of restraint, he only orders two prisms and four lasers and nothing else. Realizing we need something to measure density, I walk down to Sur La Table and pick up a measuring cup that has a digital readout in the handle that tells the weight in grams of what you put into the cup.

She spends a good bit of Sunday afternoon setting up her apparatus and taking measurements. The first part of that is messing with the cup: putting in sugar, checking the weight in grams, dumping in water, calculating the percentage. It involves making a mess and not getting yelled at, a combination hard to beat. Calculating ratios and density is secondary. Being only slightly higher on the maturity scale, Dad helps.

Eventually, he holds the laser while she marks the spot it hits on the ubiquitous white board. They repeat this with solutions varying in density.

Julia's Laser Project

“Mom! Dad! I need help!”

“What is it?”

“On this paper it says I am supposed to write down which are my dependent and independent variables and which are my controls. Which is it?”

“Well, the thing that you changed would be your independent variable -”

“The density, how much sugar was in the solution. So the independent variable is the thing that changed and the dependent variable is the thing that stayed the same?”

“No, variables change. That’s what vary means, to change…”

A discussion of variables versus constants ensues.

Over the days that Julia works on her science project she learns about math, including trigonometry, measurement in grams and centimeters, refraction and more. She uses the Internet and finds some sites that interest her. She spends a lot of time on sciencebuddies.org, looking at other projects she doesn’t choose. She reads about Snell’s Law and refraction on Wikipedia. Does she understand it all, even with explanation? Nope, but she understands a lot more than she did last month. In making her project board, she uses OpenOffice, decides she doesn’t like that and switches to Microsoft Office. She tries the chart feature within PowerPoint, decides that sucks and does her chart in Excel. She learns how to edit a chart in Excel….

“Mom! Dad! I need help!”

“Yes?”

“How do I fix this chart?”

“Right-click on it. Pick select data. Click where it says X axis category labels.”

We have another discussion about X axis and Y axis, categorical data versus numeric data. And so it goes until the project is done.

Every year, Julia is required to do a science project because every child at her school is required to do a science project. Similarly, every child in her school takes Algebra in the eighth grade because that is the only math class that is offered. A teacher at a public school district bragged to me recently that her district did the same, every child took Algebra in the eighth grade. I found that fascinating because a few years ago I had done an evaluation for a program at the high school that was addressing the problem that 65% of the NINTH-GRADERS were failing Algebra. So, the solution, apparently, was to teach Algebra in the eighth grade.

This is like back in the 1970s when the solution to children from lower-income families entering kindergarten behind those from middle-class families was to have mobiles over the crib and other accoutrements of the typical suburban nursery.

What Julia has that those children don’t have is both a school that requires more of her and a home environment that provides the support to meet those requirements. There are three computers within reach of where I am sitting, with Unix, Macintosh and Windows operating systems all with either Open Office or Microsoft Office. There is a wireless network in the house. While the stuff makes it easier to do her project, it is not just the stuff and it is not just the requirement to do a science project.

She also has two parents sitting around who are willing (albeit grudgingly at times) to drop what they are doing and explain anything from the concept of f(x) to how to label the X categories on a graph in Excel. While I am writing this, because a documentary on the financial market is on TV, Julia and her father are arguing about economic theories based on rational behavior versus Shiller’s theory of irrational economic behavior. It involves some rather immature discussions of what he might do to the stuffed monkey he is offering to buy from her and tossing of the monkey back and forth.

In the past couple of weeks alone, Julia has probably received 20 hours of tutoring in math and science. Vygotsky would be pleased. Two years from now, she’ll be taking exams to get into high school and I am > 99% sure that she will get into the high school that we have already picked out. Is that arrogant? Nope. With nine years of the advantage of a good private school and day after day of patient (usually!) explanation of functions, sine(theta), X axis, angles of refraction and more I expect she will do well. Currently, she also has an older sister in the house who was a history teacher and is living at home while she finishes her masters. She makes sure to check Julia’s social studies homework and quiz her on that.

Why am I in favor of affirmative action? Because I am not stupid. The world’s most spoiled twelve-year-old has had years of individual tutoring, just about every resource money can buy and excellent, caring teachers every single day. I realize that any child that comes from a low-income home with parents who have never graduated from college and does just as well as Julia on the high school or college entrance exams is probably more motivated, smarter or in some way exceptional.

That’s the way the world is, right? I’ve never been too happy with that answer. So, I am sending an email to the urban schools program at the university offering to teach their teachers how to use SAS On-demand for Academics (hey, it will be free beginning in August). Yes, it is a small thing. I am pretty sure, though, that big changes come from a combination of small things added together.

Maybe you could do something to help. Probably you have your own spoiled twelve (or ten or eight) year old that hollers every five minutes that she needs help, but maybe there’s some little bit you could do to help someone else’s, too.

Apr

25

I have been trying to get ready for two workshops this summer. One is called Visual Data with SPSS (pretty obvious what it is about). The second one is statistics using SAS Enterprise Guide. I was going to call the first course Statistics without Numbers and the second one Statistics without Programming. A colleague pointed out that what students want is not statistics without programming but statistics without pain. I never quite see statistics as painful in the same way that some students do, but I conceded his point and so that is now the official name on the schedule.

It will, in fact, be a course without programming because I have spent half the weekend so far beating the data into shape. Unfortunately, this is going to be a bit misleading for students because in real life data don’t come nicely packaged. There are a couple of things that I have not found a way to do in SAS, SPSS or Stata using a point-and-click (GUI) interface.

Chief among these is array processing. Say, for example, I want to recode all 90 questions on a survey to have both -1 and 8 as missing values. The closest you can get to this is in SPSS with the TRANSFORM > RECODE menu options, which do remember the previous values you entered for old and new values. Still, it’s much quicker to just write the syntax for it. Same with Stata and SAS. If there’s a way to do it quickly through the menus, I have yet to discover it.

One idea I stole from Dreamweaver is snippets, little bits of code you store to do specific little tasks, like create a form button. Probably the most common “snippet” I use in SAS is the array/do-loop:

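data in.visualdata ;
   set in.visual ;
   /* every numeric variable: -1 means missing */
   array redo{*} _numeric_ ;
   /* only the questions where 8 means "no opinion / don't know" */
   array nxt{*} q1 -- q920a ;
   do i = 1 to dim(redo) ;
      if redo{i} = -1 then redo{i} = . ;
   end ;
   do j = 1 to dim(nxt) ;
      if nxt{j} = 8 then nxt{j} = . ;
   end ;
   drop i j ;
run ;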

Above, the data used -1 for missing data for all of the numeric variables, so it was easy enough to take care of that. However, for some questions, 8 was coded “no opinion/ don’t know” so I wanted that to be missing also, but for other questions 8 was a valid value. So, I needed two array statements and two do-loops.
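For readers who don’t have SAS handy, the same two-rule recode can be sketched in plain Python. This is an illustration of the logic only, not a translation of the snippet above: the function name, the row format (a list of dicts) and the example variables are all assumptions made up for this sketch.

```python
def recode_missing(rows, eight_is_missing):
    """Return copies of the rows with missing-value codes replaced by None.

    rows: list of dicts, one per respondent
    eight_is_missing: names of the questions where 8 means "no opinion / don't know"
    """
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and value == -1:
                value = None  # -1 is missing for every numeric variable
            elif name in eight_is_missing and value == 8:
                value = None  # 8 is missing only for the listed questions
            new_row[name] = value
        cleaned.append(new_row)
    return cleaned

# Hypothetical survey rows: q1 and q2 use 8 for "don't know"; hours can validly be 8.
survey = [
    {"id": "a", "q1": 8, "q2": 3, "hours": 8},
    {"id": "b", "q1": -1, "q2": 8, "hours": -1},
]
cleaned = recode_missing(survey, eight_is_missing={"q1", "q2"})
print(cleaned)
```

The point is the same as with the two arrays: one rule applies to everything numeric, the other only to a named subset, so a valid 8 elsewhere is left alone.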

I did not see any way to do this without programming other than 90+ pointy-clicky things. Not.

I have similar “snippets” that do the exact same thing for Stata and SPSS.

Another disappointment in Enterprise Guide in particular is the lack of a convenient WHERE clause. I would like to only analyze cases where the respondent selected Obama or McCain as the likely candidate in the election. I could easily use the QUERY feature in SAS EG, create a computed column, recode into a new column called vote2008 and now have three values: missing, Obama and McCain. However, if I wanted results only on those who had selected Obama or McCain, I would have to use the Filter & Sort feature and create a new dataset. I thought perhaps there was a WHERE clause and I had missed it.
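What a WHERE clause buys you is a filter applied at analysis time, with no second dataset to keep track of. A rough Python sketch of that idea follows; the `where` helper and the respondent rows are hypothetical, and only the vote2008 column name comes from the paragraph above.

```python
from collections import Counter

def where(rows, condition):
    """Return a lazy iterator over the rows that satisfy condition.

    The original list is not copied or modified.
    """
    return (row for row in rows if condition(row))

# Hypothetical respondents; vote2008 is None when no candidate was selected.
respondents = [
    {"id": 1, "vote2008": "Obama"},
    {"id": 2, "vote2008": "McCain"},
    {"id": 3, "vote2008": None},
    {"id": 4, "vote2008": "Obama"},
]

def picked_candidate(row):
    return row["vote2008"] in ("Obama", "McCain")

counts = Counter(r["vote2008"] for r in where(respondents, picked_candidate))
print(counts)
print(len(respondents))  # the full dataset is still all there
```

The next analysis can pass a different condition to the same data, which is the convenience the menus were missing.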

So, I googled “SAS Enterprise Guide” WHERE clause and was linked to a post that ironically mentioned my blog saying that I can’t see a lot of experienced programmers switching to Enterprise Guide. [In an aside here, I should mention I did not get as much hate mail as from the R people, just some snippy comments from the SAS EG folks about how I am “old”. Having survived the adolescence of three daughters and a fourth now on the brink I have developed immunity to all such comments. Moo ha ha <---- Evil scientist laugh, in case you didn't recognize it.]

Eva, Supergenius Baby


Besides, when you’re old, you get to have grandchildren as compensation. So, it’s all good.

In the comments on the blog that commented on my blog (are you lost yet?) was a discussion of the disappointing absence of the WHERE clause.

I overcame this disappointment quickly because SPSS actually does have something like what I wanted. Go to the DATA menu and choose SELECT CASES, throw in an IF clause and you have the dataset you want to analyze. Then, when you go to the next analysis, you can select different cases if your little heart desires. In another, fleeting, disappointment, SAS Enterprise Guide does not seem to have an option to export to SPSS (or Stata, for that matter), although SAS 9.2 does export to both SPSS and Stata (and about damn time, too). No big deal. I exported it to Excel which will pop open in SPSS no problem.

In yet another disappointment (is the title of this post not “Life is full of disappointments”?) I could not find a way to make SAS EG do the graph I wanted, which was the mean income of people who voted for McCain and people who voted for Obama. The bar chart options kept giving me percentage, cumulative percentage, frequency and cumulative frequency as the only options. Yes, I KNOW I could code it in PROC GCHART but have you ever actually written anything in SAS/Graph? Yuck! It reminds me of when I used to have to write things using Tell-A-Graf to produce plots on our plotters at General Dynamics. (And if you remember any of that, you really ARE old!)
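The numbers behind that bar chart are just a mean per group. As a sketch of the calculation (not of anything SAS EG does), here it is in plain Python; the function name, the column names other than vote2008 and the incomes are invented for illustration.

```python
from collections import defaultdict

def mean_by_group(rows, group_key, value_key):
    """Return {group: mean of value_key}, skipping rows where either field is None."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group, value = row.get(group_key), row.get(value_key)
        if group is None or value is None:
            continue
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up incomes, purely to show the shape of the calculation.
voters = [
    {"vote2008": "Obama", "income": 40000},
    {"vote2008": "Obama", "income": 60000},
    {"vote2008": "McCain", "income": 55000},
    {"vote2008": "McCain", "income": None},
    {"vote2008": None, "income": 30000},
]
means = mean_by_group(voters, "vote2008", "income")
print(means)
```

Each bar in the chart would be one entry in that dictionary.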

Of course, the course IS entitled Visual Data with SPSS and I was only cleaning up the dataset in SAS EG because it happened to be open.

In the final disappointment, one that has been going on for a while, actually, I haven’t been able to read in the formats with a .stc file from ICPSR. I contacted them and they suggested running with options nofmterr. This is one of those pieces of advice like yelling “Run faster!” to a runner in a race. It is correct but not really very helpful. My problem is that I wanted to have the formats created so I could use them. Usually ICPSR provides you code in SAS, SPSS or Stata with the formats/data labels. Not this time. Oh well, that is something a helpful young assistant can do on Monday. Thankfully I will only be using 16 of the bazillion variables.

Anyway, I am over all of it. Tomorrow, after judo practice, I am going to the Renaissance Faire for Mother’s Day. We are going tomorrow because, for the umpteenth year in a row I will be out of town on Mother’s Day. This time, though, it is NOT for work but to watch my next-to-youngest baby compete in Tunisia. Some people, like those that watched her winning this final at the Valentine’s Day Massacre, say she is not such a baby, but she still is to me.

So, yeah, my software isn’t perfect but the weather is lovely and my kids are pretty good. As for my husband, he just brought me up a glass of Chardonnay and it is time to kick back, drink it and read the New York Times (yes, even though I live in LA, I read both papers every day).

Just read a tweet from some young starlet saying,
“You don’t marry someone because you can live with them, but because you can’t live without them.”

And my thought was,
“Honey, you are obviously single. (And put some more clothes on, too.)”
Yes. I AM old.

So, despite the disappointments, I guess I will survive to teach the summer workshops. Who knows, I may even get time to go to the beach.

Apr

21

When Data is Not Art

April 21, 2010 | 2 Comments

I failed art in junior high school. When I tell people that, people who actually have artistic talent, they look at me in disbelief and say,

“No one fails art. That’s one of the great things about art. How could you possibly fail art?”

The answer is that I was very, very bad at it. Part of this might have to do with the fact that I am extremely near-sighted and was constantly losing my glasses and then going without for weeks or months until somehow my mom found the money to buy me yet another pair. The other part, to be truthful, is that I was just very, very bad at it.

Narratives 2.0 has awesome pictures of music tracks, which maybe mean something if you are a musician. Then again, maybe not.

Then there is Synesketch, which is “.. a generative painting system of imaginary colliding particles, inspired by graphics created by particle colliders”.

FlowingData is more what I am talking about in terms of data visualization. While some of the graphics are just plain funny (the one on love, for example), the message of this map, on mortality rate under five, should be obvious to almost anyone.

Often, when I am looking at data, it is something far less artistic. I’ve done a lot of program evaluations over the years, sometimes of programs that were not exactly above board. After all, the staff members reason, I’m flying in from thousands of miles away. They’ll just enter some names and test scores in their database. How will I possibly know?

Here is an SPSS dataset that happened to be lying around. It has the actual data from a project that was supposed to be providing staff training. There was an experimental group, which received the training, and a control group that did not. The first thing I do is select out the control group and plot the pre-test by the post-test. If this is a reliable test, there should be a high correlation between pre- and post-test for the control group, fitting pretty close to a straight line.

Plot of Control Group Pre-test by Post-Test
The next thing I do is SELECT CASES (found under the DATA menu) for the experimental group. If the training was effective at all, there should be a correlation between the pretest and post-test for the experimental group, but there should be more scatter around the line. Why? Because some people benefit more from training than others. Some come late, leave early and fall asleep in between. Others pay rapt attention and read more about the topic on-line when they get home. Some people with really high scores may have known all of the information in the training and not gained a point. Other people with average pre-test scores may have learned a lot and moved up to a higher score. People who had a very high pre-test score should still have a high post-test score. Hopefully, your training didn’t make them dumber. (Although I think I have attended a training session or two that felt like that.)

So, this is the pattern I am looking for – more scatter on the experimental group, tighter in the control group and those with high scores are more likely to stay high than low or moderate scores are to stay in the same place.
Pre-test by Post-test for Experimental Group
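The eyeball check on those two scatterplots can be backed up with a correlation per group. Below is a small Python sketch under stated assumptions: the column names (group, pre, post), both function names and every score are invented, and the real check was done with plots in SPSS, not with this code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pre_post_r_by_group(rows):
    """Return {group: correlation of pre with post} for each group in rows."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row["group"], []).append(row)
    return {
        group: pearson_r([r["pre"] for r in members], [r["post"] for r in members])
        for group, members in by_group.items()
    }

# Invented scores: control barely moves, experimental gains vary by person.
scores = [
    {"group": "control", "pre": 10, "post": 11},
    {"group": "control", "pre": 20, "post": 19},
    {"group": "control", "pre": 30, "post": 31},
    {"group": "control", "pre": 40, "post": 40},
    {"group": "experimental", "pre": 10, "post": 30},
    {"group": "experimental", "pre": 20, "post": 22},
    {"group": "experimental", "pre": 30, "post": 45},
    {"group": "experimental", "pre": 40, "post": 41},
]
r = pre_post_r_by_group(scores)
print(r)
```

With data shaped like this, the control correlation sits close to 1 and the experimental one is positive but noticeably lower, which is the pattern described above. A correlation alone will not show that high scorers stayed high, so the plot is still worth looking at.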

If this is NOT the pattern observed, then you and I are going to have a little chat and try to explore these data further. Personally, most of the time I have found more confusion than corruption. For example, once I was looking at graphs like those above but the relationship for the control group was not quite as strong as I expected and for the experimental group there was more of a relationship than I expected. I said,

“It looks as if possibly someone entered people as being in the experimental group who were actually in the control group and vice versa.”

A couple of the staff members looked guiltily at one another and then one spoke up,

“You know, I never really understood which was which.”

So… with only a minimal amount of sighing and eye-rolling, and a significant addition to our eventual bill, one of our young staff members checked all of the files, sorted them into the correct piles, corrected the database and we re-did the analyses.

A second situation in which, more than once, we have seen a different pattern than expected is when the amount of intervention varies greatly. I may see a clump of people who seem more like the control group. Their scores are pretty much the same as when they started. When you and I have our discussion about your program and I ask about those people it turns out that they were in the group that received therapy, after-school tutoring or whatever but they only came to one or two sessions and then dropped out. On the other hand, those people who actually did come to all 15 or 25 or 40 sessions showed significant improvement. When we find these patterns, we split the clients the project served into two groups – and it is usually easy to see a naturally occurring break – and analyze those who came to X number of sessions or more versus the control group. We also take a look at the people who dropped out of the program to see what information we can provide on the people who are not being reached.

We often expect that the more of a treatment an individual gets – therapy, training, tutoring – the better he or she will do. We don’t consider very often a second factor in there. The more of an intervention a person gets the more the therapist has given and, presumably, the better he or she gets at it. This is going to be especially true with a new program. Of course, if you have been tutoring for 20 years, an extra two years of experience isn’t going to make near as much difference as if you have six months of experience. Starting off my career as an industrial engineer back in the early 1980s (yes, they did have engineers back then), I was more familiar with learning curves than I wanted to be and it surprises me that we don’t think of these in social science very often.

Let’s take a look at this with our same training data. We have training delivered for four groups over a two year period.


For simplicity, I have included only the experimental group above. The first group that was trained, the green line, shows the least improvement, the middle two groups, which were trained in the middle of the project showed more improvement and by far the most improvement was shown by the fourth group (the purple line), trained when the staff had nearly two years of experience on this project.

[For those who want to know, yes, I did do a repeated measures Analysis of Variance with time (pre-test/post-test) as the repeated factor and group (experimental versus control) and training cohort as the between-subjects factors. Yes, I did test for a three-way interaction and yes, it was statistically significant at p < .001. Yes, there was also a significant interaction of time by group, with the experimental group improving significantly more than the control group, also at p < .001.]

Apr

19

Being a professor can build humility. About twenty years ago, I was teaching the third course in the statistics sequence required of all graduate students. The second course had been taught by an adjunct professor, which was FAR less common then than it is now (that’s a whole different post). The first day I started out talking about multivariate statistics (that being the name of the course) and was almost immediately stopped by a student,

“What is that F-statistic you mentioned? We didn’t learn that in the last course!”

Another student interjected,

“Mean square error? We didn’t learn anything about that!”

And so it went. I told the students this material should have been covered in the previous course, but since it obviously had not been, we would back up and start with Analysis of Variance and other topics they should have learned in the previous course. At the end of the lecture, one young woman stayed behind. She said,

“I just had to say this in defense of Professor X. They may not have learned all of those things last quarter, but I was in that class and he sure the heck TAUGHT all of those things!”

The truth of the matter is that the great majority of our students remember very little of the information we teach them, no matter what we teach or who we are. Try this exercise with some friends who have been out of college a few years or more. Ask them to name all the courses that they took. If they are like most people, they can’t even remember the NAMES of every course much less what they learned in them. Even better, pick a random course, outside of that person’s major, and ask what was taught in it. If you are lucky, they can tell you one, maybe two facts.

Even more humbling: after several years as a professor, I moved from academia to what my mother refers to as “a real job” in the corporate world. When I was doing what everyone does to move up in rank, get tenure and generally prove their worth as a human being in the university world, i.e., publish articles in refereed journals, my colleagues and I were convinced that this would work its way down to those in practice in our fields, be it business, education or whatever. When I mention this to friends who are in business, they laugh in my face (strangers feel they need to be polite to you). I have to admit that in over twenty years running a business, the number of business articles I have read that helped my business has been startlingly small.

I have given hundreds of lectures on probability, p-values, power, the normal distribution, multiple regression, standardized beta weights, etc. The vast majority of people in those lectures were graduate students in social work, public policy, education, psychology, history, speech pathology and other majors but definitely NOT mathematics and statistics. In a development of which I disapprove, many graduate students can now get a masters or even a Ph.D. with only one course in statistics and research methods. Incredibly, my disapproval has NOT caused universities throughout the country to reverse this trend. Yes, I can hardly believe it either.

These graduate students, many of them long since graduated, are now making policy, managing programs, running companies and awarding grants. Many of them are very intelligent, logical and extremely knowledgeable about their content area, whether it be speech disorders or social security. What they are not is statisticians. They need to be able to make sense of statistical data and they need to be able to do it better than most of them can now.

Ironically, given the amount of time we spend on probability and hypothesis testing in most statistics courses, many of their decisions do not hinge on generalization to a population. It reminds me of a story someone told me recently about a presentation to a local school district. The speaker discussed the differences between schools in low income and higher income areas, correlations between various school factors and income, and talked at length about p-values and generalization to the population. The superintendent asked,

“What are you talking about? You have all of the records of all of the students in our district – or over 98% of them, anyway – this IS the population. I’m not interested in the rest of the U.S. or the world. You HAVE the population.”

In one or two semesters, I cannot make a person a statistician. I can’t even convince them to regularly read articles in refereed journals, and, if I am honest about it, most of those articles were written just so people could get tenure and really aren’t that useful anyway. What I hope I can do is enable him or her to take data and tell a useful and correct story. Here is a very simple example from a real project. The BP Project (not its real name and not affiliated with British Petroleum) received $1.5 million to recruit and educate teachers for bilingual students. The first thing the Project Director wanted was a picture of the type of students being recruited for the project. At the time when this snapshot of the data was taken they had admitted 408 teacher candidates over a five-year period.

Distribution by Gender

The first thing we can see is that the students are overwhelmingly female. That isn’t surprising since most teachers are female. Although increasing the number of males in teaching isn’t one of the project goals, the director is concerned about these results. She feels that many of the students in the schools where her graduates teach don’t have enough male role models and a discussion ensues with her staff about whether they should be trying to recruit more male students.

The university also has both a joint credential – masters degree program that students may elect and a regular teaching credential option. In looking at the proportion of students who choose the Masters in Education option we can see that it is a minority, less than 15%, who select the masters program. Although this is more than the 10% at the university as a whole, the director feels that her students could do better and discusses with her staff options for increasing the number of students who make this choice.
Masters-Credential Candidates

The next question she wants to ask is why some students select the masters program. Are those who choose the masters program better academically, as measured by their GPA? Are the ones who did not choose the masters program academically sub-par? As we can see from this chart, there is a difference between the two groups. In general, when I tell clients,

“The F-statistic for Levene’s test of equality of variances was significant at p < .01. Therefore, you have a t-statistic of 7.3 with 135 degrees of freedom with a p-value of less than .001."

They do not find it very helpful, even though I personally think that is very useful information. (Sadly, I have found that people are willing to pay for what THEY personally find useful and aren’t that interested in whether or not it is crystal clear to me!)
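For the curious, the “therefore” in that sentence is the switch from the pooled-variance t-test to the unequal-variance version, which SPSS labels “Equal variances not assumed” and textbooks usually call Welch’s t-test. A minimal Python sketch of that arithmetic follows; the function name and the GPAs are invented, and it makes no attempt to reproduce the 7.3 reported above.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom for two samples."""
    n_a, n_b = len(a), len(b)
    # Sample variance divided by n is each group's squared standard error.
    se2_a = statistics.variance(a) / n_a
    se2_b = statistics.variance(b) / n_b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Invented GPAs, only to show the arithmetic.
masters = [3.6, 3.4, 3.7, 3.5, 3.3]
credential = [3.0, 3.4, 2.8, 3.2, 2.6, 3.6]
t, df = welch_t(masters, credential)
print(round(t, 2), round(df, 1))
```

None of which changes the point: a picture of two means is what the client can actually use.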


So, I produced this graph using SPSS, which shows that the average student who is in the regular credential program has a GPA of 3.11, above the 3.0 minimum for admission to the graduate program. The director found this very interesting. She wanted to know why, when the average student who was getting a credential could qualify for the masters program, more did not choose this route. I have no idea. She decided to meet with her students and former students and ask them. On the other hand, it is clear that the masters-credential students do have a higher GPA – nearly 3.5 versus 3.1 – and that this is probably significant in the practical as well as the statistical sense.

Given that this is a program designed to serve disadvantaged students, there have been mutterings from other faculty members that these students do not meet the university standards. This irritates the director for many reasons, not the least of which is that the program is for teachers to serve disadvantaged students, not necessarily teacher candidates who are disadvantaged themselves. She wants to look at the overall grade distribution.

GPA Distribution

This graph tells her several things. First, it tells her there are probably a few people who had data entered incorrectly because it is very unlikely anyone was admitted with a 1.0 GPA. It looks like maybe 1 to 1.5% of the data have errors. I recommend she check that out. Second, the average student has a GPA of 3.16, substantially above the cut-off of 2.50 for the credential program and even above the 3.0 cut-off for the masters program. Further, the GPA distribution is very skewed, which is a good thing, in this case, and expected for a selective program. The overwhelming majority of the students exceed the minimum GPA cut-off.
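Checking for entry errors like that 1.0 is easy to automate once you decide what counts as implausible. Here is a small Python sketch; the function name and the GPAs are invented, the 2.50 floor is the credential cut-off mentioned above, and the 4.0 ceiling is an assumption about the grading scale. A flagged value is only a record to go look up, not proof of an error.

```python
def flag_out_of_range(gpas, low, high=4.0):
    """Return (index, value) pairs for GPAs outside the range low..high."""
    return [(i, g) for i, g in enumerate(gpas) if g < low or g > high]

# Invented GPAs; 2.50 is the credential program cut-off, 4.0 an assumed maximum.
gpas = [3.2, 3.6, 1.0, 2.9, 3.4, 4.3, 2.5]
suspects = flag_out_of_range(gpas, low=2.50)
share = 100 * len(suspects) / len(gpas)
print(suspects, round(share, 1))
```

The index lets whoever is checking the paper files find the exact record to pull.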

The final question the director had was about ethnic distribution of her teacher candidates. The university is predominantly white, non-Hispanic but she wondered whether a program designed to prepare teachers of bilingual students might attract more Latino and Asian-American students. This chart produced an interesting picture and one that suggested something wrong with the data.


The most common category is “did not specify” and the second is “other”. Looking into data collection problems, it was found that, in the initial year of the project, ethnicity was not asked on the student information form. So, to the extent that the first year ethnic distribution may have been different from later years, these data are biased. Was ethnic distribution different the first year? We don’t know.

Even in years the question was asked, many elected not to answer. A lot of hypotheses were offered as to why this was the case. Possibly non-Hispanic students felt they would have less chance of being admitted to the program and did not answer this question. When the staff followed up with some alumni they were told that, in fact, students did expect to experience “reverse discrimination” in the program and were pleasantly surprised this did not occur. Why did so many students list “other” as ethnicity? Some were mixed ethnicity, e.g., African-American and White, and did not feel either category fit. Some were Native American or Filipino. For others, we have no idea why they put down “other”. Given the questionable validity of the data, the director was cautioned against using these results to draw any kind of conclusions about the ethnic makeup of the program participants.

One course I do remember from graduate school over twenty years ago was Questioning and Teaching, taught by J. T. Dillon, the author of a book by the same name. I haven’t seen him since and I doubt he’d know me if he tripped over me. I do remember a question he asked though, and my answer. He wanted to know:

“How do we know if we have taught someone something?”

I said,

“I think I have taught a person if I have given them the answer to a question they have been wondering about. If it isn’t THEIR question, they probably won’t pay any attention to it and I’m sure they won’t remember. If I don’t answer their questions, I’m just a person who stands in front of a room and talks a lot.”

He repeated,
“A person who stands in front of a room and talks a lot. Young lady, do you realize you’ve just given the description of most of the teaching that occurs in this country?”

I didn’t have an answer to that.

Apr

13

The Next Big Thing

April 13, 2010 | 67 Comments

I’m in Seattle this week, at SAS Global Forum, and it is even greater than usual. I go to several conferences each year, some because I am presenting, some because there is a topic that particularly interests me, but there are three I go to every year. Of these, SAS Global Forum is the one I would absolutely not miss. It is not for those on a limited budget, but it is worth it. You get the chance to meet A LOT of the smartest people in the world. Seriously. And I have a basket of degrees and am married to an honest-to-God rocket scientist so my bar for “smartest people in the world” is pretty high.

One of the other two I always attend is the Western Users of SAS Software conference. You learn a lot, it’s relatively inexpensive and it’s not far to travel. Lots of bang for the buck. The other is the SPSS Directions conference.

At ALL of these, and in general, in the back of my mind all of the time, I am looking for “the next big thing”. Whether as an individual, a university or a company, I think to stay competitive in the long run you need to be ahead of the learning curve or, as people who want to be smart-asses refer to it, on the “bleeding edge”. Think about it: if you were teaching statistics twenty years ago, you had the choice of having your students learn SPSS, SAS, SYSTAT, BMDP or Minitab. Of those, BMDP, which was “for real statisticians”, kind of like the R of the day, is one I haven’t seen used in years. I thought SYSTAT was off the market but I did see an ad for it recently and was surprised to learn it still existed.

If you had taught your students SAS twenty years ago and they stuck with it, they are much more marketable now than if you had made the other choices. My definition of marketable is based on how many jobs are available requiring SAS as a skill, and how extensible those skills are. For example, Stata is not really feasible to use for running a company’s entire data management and data analysis. If you are an individual economist and you just need to do some specific econometric procedures, you don’t care about that, but if you are looking for “the next big thing”, something that will be around and used by millions of people twenty years from now, Stata is probably not it. Actually, I don’t think that’s their plan, anyway. I think their plan is to be a very good choice for high-level statistical analysis and stay in business as a profitable company.

Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. Note: For those of you who were so unhappy with the example I used previously, here is a new snippet of code from the site R by example.

Below is an example of R code:

# Goal: Simulate a dataset from the OLS model and
# obtain OLS estimates for it.

x <- runif(100, 0, 10)       # 100 draws from U(0,10)
y <- 2 + 3*x + rnorm(100)    # beta = [2, 3] and sigma = 1

# You want to just look at OLS results?
summary(lm(y ~ x))

# Suppose x and y were packed together in a data frame --
D <- data.frame(x,y)
summary(lm(y ~ x, D))

# Full and elaborate steps --
d <- lm(y ~ x)
# Learn about this object by saying ?lm and str(d)
# Compact model results --
print(d)
# Pretty graphics for regression diagnostics --
par(mfrow=c(2,2))
plot(d)

Follow this link for the rest of the program.

I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

There are two developments that I see coming as The Next Big Thing.

1. Data visualization. I am teaching a workshop this summer on this topic. This isn’t an ad, it is not open to the public so you can’t come anyway. I’m teaching it because I have seen more and more professors AND students frustrated by the fact that the average graduate student has trouble really understanding statistics. They may be able to get the correct answer on a multiple choice test that asks about a critical p-value. I have lived over half a century now and discovered that life holds very few multiple choice tests. We need statistical thinking, data literacy or whatever cool catch phrase someone can coin. This is the wave of the future. I am going to use examples from SPSS, SAS Enterprise Guide and JMP in this course because all of them can be used by pointing and clicking AND, for those who want to go further, all have a coding option, which gives you that extensibility thing.

2. Analyzing enormous quantities of unstructured data. First, let me explain structured data. That is data that is in a set format. Say you have your annual expenditures. The first column is date of expense, the second column is check number, the third is the amount. That’s structured data. It can span more than one row and vary in all sorts of other ways, but the main point is that you have some sort of definite structure. The overwhelming majority of data (forum posts, blogs, comments on customer service cards, websites, etc.) is unstructured data. People start wherever they want, finish wherever they want, change subjects and just basically do it however the hell they want. And there is a ginormous amount of this stuff. The Next Big Thing is going to be finding meaning from this data. Google and its imitators are doing it with their search engines. Every company that has a clue is mining for market information.

So, for the next year, those are the eggs I am putting in my basket. I am sure the shape of those two fields will change over the years, but I guarantee that neither will go the way of BMDP, MUMPS and COBOL.
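To make the structured versus unstructured distinction concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the expense rows, the customer comments and the little stop-word list are hypothetical, and word counting is only the crudest possible stand-in for real text mining.

```python
import csv
import io
import re
from collections import Counter

# Structured: every row has the same three fields (date, check number, amount),
# so a question like "how much did we spend?" is one line of code.
expenses_csv = """date,check_number,amount
2010-01-05,101,250.00
2010-01-19,102,75.50
2010-02-02,103,120.25
"""
rows = list(csv.DictReader(io.StringIO(expenses_csv)))
total_spent = sum(float(row["amount"]) for row in rows)

# Unstructured: free text with no fixed fields. Even a crude summary
# needs the text split into words and counted first.
comments = [
    "Shipping was slow but the support staff were great.",
    "Great product!! Shipping took forever though...",
    "support never answered my email",
]
stop_words = {"the", "was", "but", "were", "my", "took", "though", "never"}
words = [
    word
    for comment in comments
    for word in re.findall(r"[a-z']+", comment.lower())
    if word not in stop_words
]
top_terms = Counter(words).most_common(3)

print(total_spent)
print(top_terms)
```

The structured half needs no interpretation at all, because the columns already say what each value means. The unstructured half has to guess at meaning from the words themselves, which is where the hard (and interesting) work is.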

Apr

4

Happy

April 4, 2010 | 3 Comments

Easter.

It being Easter weekend, I have tried not to do too much work. The world’s most spoiled twelve-year-old has assisted in this goal by insisting she needed to be taken shopping for books, nail polish and things to put in the Easter baskets. Since two of her sisters are in their twenties, we need more than just chocolate. This included trouble dolls from South America (you whisper your troubles to them, put them under your pillow and they are supposed to solve them while you sleep), room freshener from L’Occitane for the sister who smokes and bath salts for the sister who doesn’t.

Shortly I will have to begin administering death threats (and you WON’T be resurrected) to get everyone dressed to go to mass. Even though every single person in this family attended years of Catholic schools, most of them are more Catholic in theory than practice. The sun has come out and the two youngest want to go hiking in the mountains to see the waterfalls. Dennis wants to drink a martini and Jenn wants to watch 1940s movies while she writes a paper that’s due tomorrow.

Taking a break from work, I’ve been reading The Geography of Bliss, by Eric Weiner, who spent a year and tens of thousands of miles trying to find the happiest place on earth. (Hint: It isn’t Disneyland, no matter what they tell you.)

The best part of the book, I think, was his conclusion at the end about the nature of happiness:

“Money matters, but less than we think and not in the way we think. Family is important. So are friends. Envy is toxic. So is excessive thinking. Beaches are optional. Trust is not. Neither is gratitude.”

There are two other points I would add.

1. We find time for what is important to us. There are certain projects I have been planning to do for months but have not gotten finished, or, to be honest, worked on very much at all. Yet, next week I am spending an entire week at SAS Global Forum in Seattle. I am even flying in early to attend a class on data visualization techniques. Why? Because I am convinced it will be worthwhile. I know I will learn new things, meet fascinating, brilliant people and no one I have ever met has said to me, “I wish I hadn’t learned so much.”

At my age I am too impatient with bureaucracy to get another degree (besides, I already have four). I want to take from the smorgasbord of learn this, not interested in that, this looks useful, this may not be useful but sounds fascinating nonetheless. I am not interested in being force-fed a curriculum to get a degree. Not putting that down. It’s necessary at a certain point in one’s life. We didn’t let my two-year-old granddaughter choose whatever she wanted to eat when she stayed with us (well, there was that chocolate birthday cake for breakfast incident) but I am a grown-up now and can choose for myself.

2. While some places may be happier than others overall, it is how you FIT in that place that matters for your happiness. I have lived in Halifax, Nova Scotia; Tokyo, Japan; Minneapolis; Minot, North Dakota; Los Angeles, California; and St. Louis, Missouri (this is a partial list). I’ve visited the Bahamas, Baja California, Mexico, Winnipeg, Toronto, Athens, Beijing, Costa Rica, Caracas, Paris, London, Zurich and over 40 of the 50 states (another partial list). It is true, as psychologists say, that “going geographic” seldom helps because you take your problems with you. There is one exception, though, and that is when being in that place is what is making you unhappy. I could never be happy in Minneapolis because I hate cold winters with a passion. I don’t mind a few days of snow now and then that has the good manners to melt, but this sticking around for months on end is my definition of hell. In fact, I am considering skipping mass just to hedge my bet on being warm in the hereafter as well.

Happy Easter

Apr

1

Occasionally it has seemed to me that I see things a little differently than other people. Several years ago, I was co-authoring a paper on outcomes for married people with mental retardation. My co-author was an extremely intelligent person with far more knowledge of the issue than I had. Looking at the first set of figures, which was a 2 x 2 cross-tabulation of had children (yes/no) and living independently without state services (yes/no), I stated,

“It’s obviously a significant relationship.”

Sandi asked in surprise,

“How do you know that before you look at the table that shows the chi-square and significance?”

I said, equally surprised,

“Because you can see it. I mean, it’s obvious, isn’t it? Chi-square is the square of observed minus expected divided by the expected frequency. If there was no relationship you’d see those with and without children equally likely to be in the good or bad outcomes categories. That’s what you’d expect. It’s very far from equal and when you square it, the difference will be enormous.”

She said,

“All I see are a bunch of numbers. But I can see from the table on the next page that you’re right.”
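For anyone who would rather see the arithmetic than take my word for it, here is a minimal sketch in Python of the calculation described above: for each cell, square the difference between observed and expected, divide by expected, and add up the cells. The counts are invented for illustration (they are NOT the figures from the paper), and the function name `chi_square` is just a name picked for this example.

```python
def chi_square(table):
    """Pearson chi-square for a table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

# Hypothetical counts. Rows: had children (yes, no).
# Columns: living independently without state services (yes, no).
lopsided = [[10, 40],
            [35, 15]]
balanced = [[25, 25],
            [25, 25]]

chi2_lopsided = chi_square(lopsided)
chi2_balanced = chi_square(balanced)

# 3.841 is the critical value for 1 degree of freedom at the .05 level
print(chi2_lopsided, chi2_lopsided > 3.841)
print(chi2_balanced, chi2_balanced > 3.841)
```

In the lopsided table every cell is 12.5 away from what you would expect under no relationship, and squaring that is what makes the statistic blow up. In the balanced table observed equals expected everywhere, so the statistic is zero.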

In the book Born on a Blue Day, an autobiography by a man with autism who also has a remarkable facility with numbers, Daniel Tammet talks about seeing numbers as different shapes and colors. Well, I certainly don’t see anything like that, but I do wonder if, by spending lots of time just staring at numbers, you come to see things differently. For example, today I was looking at an example of coding a structural equation model using PROC CALIS, a SAS procedure. Reading the equations, it was obvious to me that the first few equations looked like this:

[Figure: path diagram of the model]

I grabbed someone who was walking by, who happens to be a very smart person and said,

“Look at these equations. What do you see?”

She said,

“Well, I don’t know. Maybe it is sales data?”

I drew out the figure above and demanded,

“Do you see something like this?”

She said,

“No, not at all. If you are teaching or using this, you need to document it more, I think.”

My point, and I do have one, is that if you spend a LOT of time, and I have spent a good bit of the past 27 years, staring at numbers, things may start to look different to you and that is very useful and helpful. In Daniel Tammet’s case it sounds like it is genetic but for people like me I think a lot of it is environment, specifically experience. I met with someone later in the day and when I was explaining to her how structural equation modeling can be viewed as a combination of confirmatory factor analysis and path analysis she said,

“I see.”

And as we were talking, it was clear to me that, very literally, she did SEE it.

One way that I think this happens was demonstrated when I got home tonight. I was going to take my little Julia to judo practice. In my spare time, of which I have none, I am the president of the United States Judo Association. During my misspent youth I was the first American to win the world championships in judo (it’s true, you can look it up). However, when I got home Julia was on her laptop writing her science project on Index of Refraction, which I decided was more important, so I let her be. Julia turned 12 two weeks ago. Her project involves measuring the angle of refraction in solutions with varying concentrations of sugar. Tonight she looked up definitions of density, index of refraction and how to calculate the angle of refraction using Snell’s law and sine functions. Both her father and I gave her definitions of density (I _did_ start out as an engineer). Dennis explained a little bit about refraction, the index of refraction and angles. I showed her a picture of the sine function.
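For the curious, the calculation she was looking up can be sketched in a few lines of Python. Snell’s law says n1 × sin(theta1) = n2 × sin(theta2), where n1 and n2 are the indices of refraction of the two materials. The values for air and plain water below are approximate, the value for the sugar solution is invented purely for illustration, and the function name `refraction_angle` is just a name chosen for this example. None of it comes from Julia’s actual project.

```python
import math

def refraction_angle(theta1_deg, n1, n2):
    """Angle of refraction in degrees, from n1*sin(theta1) = n2*sin(theta2)."""
    sin_theta2 = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(sin_theta2) > 1:
        # Can only happen going from a higher index to a lower one
        raise ValueError("no refracted ray: total internal reflection")
    return math.degrees(math.asin(sin_theta2))

N_AIR = 1.000    # approximate
N_WATER = 1.333  # approximate, plain water
N_SUGAR = 1.40   # hypothetical value for a sugary solution, illustration only

# A laser hitting the surface at 30 degrees from the perpendicular
angle_water = refraction_angle(30.0, N_AIR, N_WATER)
angle_sugar = refraction_angle(30.0, N_AIR, N_SUGAR)

print(round(angle_water, 2), round(angle_sugar, 2))
```

The beam bends to roughly 22 degrees in plain water, and to a smaller angle in the liquid with the higher index. That difference in angle is the thing a project like this measures.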

Last year, she did her project on the impact of multi-tasking on performance and did a pre-test and post-test with a cross-over design. She found the information on the Internet and told her parents to mind their own business; it was her project and she was going to do it herself, which she did, punctuated only by a bit of crying when she could not figure out how to do the graphs. She eventually figured it out, after the evil mother told her she would spend the rest of her life at that kitchen table working on it if necessary but she WOULD figure it out. Fifteen minutes later it was done.

Lest one misinterpret this as Julia being some type of math/science prodigy, it should be noted that although she made all A’s last year, she is currently not allowed to do video chat because she got a C+ in science after doing poorly on a test (she gets nervous and she has no concept of test-taking strategies. She will spend all of the time on a question if she doesn’t know the answer, rather than doing the ones she knows and going back to it later). After finishing her research for the evening we had to go to the Third Street Promenade in Santa Monica because her life would apparently have ended had she not gotten Vans tennis shoes in two different colors, blue nail polish, two new pairs of “skinny” jeans from PacSun and a new jacket from American Apparel. If the economy is not rebounding, it is not Julia’s fault.

AND YET …  I am fairly certain that by the time she is in high school, Julia will, literally, SEE sine functions and a great deal more.

This is good. In a way.

Five days a week, I look down on William Jefferson Clinton Middle School, which is a few hundred feet from the building where I work. It is in a very disadvantaged neighborhood in Los Angeles. On Friday, I attended a presentation at the university where brilliant, well-meaning people talked about how they were going to leverage technology to raise the achievement of students like these.

Maybe.

Still, I can’t shake the feeling that three years from now, when they are all in high school, Julia is going to be seeing a sine function when all those students see is a bunch of numbers.

And, as Robert Frost said, that may make all the difference.
