I wanted to learn how to use smashwords, for reasons completely unrelated to this blog, SAS or statistics, but I thought it would go much faster if I had an actual project to work on.

I considered writing a serious book for researchers on SAS Enterprise Guide, but then I decided that I did not want to do it badly enough to put in the amount of work required.  If you are doing something pointy-and-clicky like SAS Enterprise Guide, you need screenshots, and at a lot higher resolution than for the web.

Also, my experience with publishers and editors has revealed that they have actual standards and they ask questions like,

“Why do you have a picture of two naked mole rats on page 42?”

and are seldom satisfied with extremely logical answers like,

“I had a picture of one naked mole rat, but he looked lonely. Or she looked lonely. I’m not really clear on the gender differences.”

So, enter Plan B …. I got to thinking (yes, it does happen), that I have written several things on moving beyond the basics with SAS and I already have the screen shots and the output. Furthermore, I think the resolution does not need to be any higher than what I already have. So, my plan is to edit and put together several papers I have written into an approximately 200 page book. It is theoretically possible that people might buy it, but even if they don’t, it is highly probable (p > .79) that I will learn how to use Smashwords. My next step after that is learning how to publish for Kindle on the Amazon site.

The title of the book is “Beyond SAS Basics: Tips, Statistics and a Naked Mole Rat”

Here is my table of contents

Part I – After the DATA Step

Beyond the Basics: Statements, formats and functions

Beyond the Basics: New procedures and some new twists on old procedures

Beyond the Basics: Making your output look like it was done by a grown-up

Part 2 – Statistics OR Math is not hard, well, it’s easier than unemployment

Logistic Regression – for when your data really do fit in neat little boxes

Repeated Measures Analysis of Variance – control group, experimental group, yada yada yada

Part 3 – Data Quality : Garbage in- garbage out & no one wants to pay you for garbage

Checking if your data blow with SAS Enterprise Guide

1, 2, 3 , 4 Data quality procedures

Part 4 – A little bit of macro

An introduction to macros with the STRT macro that I just made up

Super-duper useful data quality macro

Part 5 – Where do you go from here

A rambling map of SAS resources

A naked mole rat

 

What do you think?

 

I was very busy this weekend working on the semi-annual site update (I am SO getting last place in the Search Engine Optimization contest) and starting on my book – Beyond SAS Basics: Tips, Statistics and a Naked Mole Rat and on TOP of all of that, I had to take the world’s most spoiled 13-year-old shopping because there are apparently some items of clothing and footwear existing in Santa Monica that she does not own yet.

I’m also working on a proposal for math education software and I got to thinking that there is SO much out there, how can there possibly be the need for any more? In (very) partial payment for the shopping spree, I had The Spoiled One review math games and websites for me. Since I don’t see the need to call out any particular resource just because she happened to randomly land on that one today, the names have been omitted to protect the guilty.

As background, I should tell you that she was recently accepted for a summer program for high-achieving girls, scores above average on standardized tests for math (not as above as WE would like) and has never made a grade below a B in anything. (Because in our house a C means you are grounded until the next report card.) On the other hand, homework is sometimes accomplished only as a means to effect the return of all of her confiscated electronics. In other words, she is a little better on achievement and motivation than the average student, but hardly a paragon of mathematics virtue. And here were her reviews:

 

Video of Rap Song on Mathematics Topics (Because, you know, you kids these days like that)

… Um, distracting. I learned nothing because I couldn’t understand the lyrics.

Place Value Video Lecture

Not really for someone my age (13). Kind of stupid anyway.

Pre-Algebra Game

Sucks! (She drew a picture here to indicate how much she hated it.) BORING. Doesn’t really work. (Punctuated by another picture)

Game with Word Problems

The game was good I guess … (a few minutes later…. ) Never mind. It didn’t give you the right answer after. I HATE THIS SITE.

Game on Factors and Multiples

OK. Not creative or fun. (Another picture, roughly two dots over a straight line.)

Sites on Math in Every Day Life/ Real Life Math

Eewww  NO!!  Doesn’t make me like math!

Mucho Math- The only one that didn’t suck

“That one with the Hispanic math teacher and the kid. That one was okay and kind of funny even though the topic it was on wasn’t really at my level.”

I found this last comment extremely interesting because I knew who she meant. I had sat my daughter down at a computer on a web page with over 1,000 videos, games and other math resources and she came up with the same option that I thought was one of the best ones I’d reviewed when I was doing  the same thing a couple of months ago.  The teacher is Lawrence Perez. The innovation he has included is really quite simple – he has a student in his video.

Having reviewed numerous other options myself, I have to say I agree with my daughter on much of it. The absolute WORST thing you can do in designing mathematics software is have it get the wrong answer, for example, when it asks :

If Y = 5 + x**2    and Y = 14    what is X

and you put  -3   and it says

WRONG!  The answer is 3

Of course, -3   is also a valid answer and then you have a student who says,

“I hate this program. It sucks!”

Not as bad, but also frustrating are those programs that don’t tell you the answer, but simply come up with the next question.

If you say that both of these problems are examples of poor design, well, I agree with you, but poor design seems to be rampant.
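Neither problem is hard to avoid. Here is a minimal sketch, in Ruby, of an answer checker for a question like the one above. The method names are made up for illustration and it only handles questions of the form y = constant + x**2, but it accepts any valid root and always hands back the full set of answers as feedback.

```ruby
# Hypothetical answer checker for questions of the form y = constant + x**2.
# All method names here are invented for illustration.

# Returns every real x that satisfies y = constant + x**2.
def valid_answers(y, constant)
  square = y - constant
  return [] if square < 0        # no real solution
  return [0.0] if square.zero?   # exactly one solution
  root = Math.sqrt(square)
  [root, -root]
end

# Marks the student correct if the answer matches ANY valid solution,
# and always returns the complete list of solutions as feedback.
def check_answer(student_answer, y, constant, tolerance = 1e-9)
  answers = valid_answers(y, constant)
  correct = answers.any? { |answer| (answer - student_answer).abs <= tolerance }
  { correct: correct, all_answers: answers }
end

result = check_answer(-3, 14, 5)
puts "Correct? #{result[:correct]}"
puts "All valid answers: #{result[:all_answers].inspect}"
```

The design choice that matters is comparing against the whole list of solutions rather than one stored "right answer".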

Having a game or video that is too basic is not a problem with the software itself, of course, but MAYBE with whoever marketed it as being at the middle school level. Or, it may just be that there is wide variation among students and it was not appropriate for this particular student.

Yes, I’m generalizing from an N of 1 (well, 3 actually, if you include me and my brother, who is a math teacher and has had generally the same responses), but from what I have seen so far, there is a whole lot of math education software out there that is not effective in interesting students enough to use it. Sometimes the game doesn’t even do the minimal job of providing the right answer, something any parent could accomplish with a $1.29 stack of index cards by writing the question on one side and the answer on the other.

Every time I have done this experiment, whether with me, my daughter or someone else, the outcome has been equally underwhelming. Even more underwhelming is the fact that almost NONE of the designers/ producers of these resources even MENTION the thought that perhaps one would evaluate  the software and see if it has any impact at all. The attitude seems to be “Here you go”.  Period. Kind of depressing.

I guess the good news is that there are about a bazillion more games, videos and other resources out there to try.

“Can you explain multicollinearity statistics?”

she asked.

Why, yes, yes I can.

First of all, as noted in the Journal of Polymorphous Perversity,

“Multicollinearity is not a life-threatening condition except when a depressed graduate student employs multiple, redundant measures.”

What is multicollinearity, then, and how do you know if you have it?

Multicollinearity is a problem that occurs with regression analysis when there is a high correlation of at least one independent variable with a combination of the other independent variables. The most extreme example of this would be if you included two completely overlapping variables. Say you were predicting income from the Excellent Test for Income Prediction (ETIP). Unfortunately, you are a better test designer than statistician so your two independent variables are Number of Answers Correct (CORRECT)  and Number of Answers Incorrect (INCORRECT). Those two are going to have a perfect negative correlation of -1.  Multicollinearity. You are not going to be able to find a single least squares solution. For example, if you have these two equations:

Income = .5*Correct + 0*Incorrect

or

Income = 0*Correct -.5*Incorrect

You will get the exact same prediction, once the intercept absorbs the constant difference between the two equations.  Now that is a pretty trivial example, but you can have a similar problem if you use two or more predictors that are very highly correlated. Let’s assume you’re predicting income from high school GPA, college GPA and SAT score. It may be that high school GPA and SAT score together have a very high multiple correlation with college GPA.
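If you want to see the CORRECT / INCORRECT problem in numbers, here is a small Ruby sketch. The 20-item test length and the scores are made up; the only thing it assumes is that every question is either right or wrong, so INCORRECT = 20 - CORRECT.

```ruby
# Made-up illustration: every item is right or wrong, so
# incorrect = total_items - correct for each person.

def prediction_one(correct, incorrect)
  0.5 * correct + 0.0 * incorrect
end

def prediction_two(correct, incorrect)
  0.0 * correct - 0.5 * incorrect
end

total_items = 20   # hypothetical test length
[8, 12, 15, 20].each do |correct|
  incorrect = total_items - correct
  one = prediction_one(correct, incorrect)
  two = prediction_two(correct, incorrect)
  # The gap between the two equations is the same for every person.
  puts "correct=#{correct} eq1=#{one} eq2=#{two} difference=#{one - two}"
end
```

The difference is the same constant for everybody, and a constant is exactly what the intercept soaks up. That is why the regression has no way of choosing between the two sets of coefficients.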

For more about why multicollinearity is a bad thing, read this very nice web page by a person in Michigan who I don’t know. Let’s say you already know multicollinearity is bad and you want to know how to spot it, kind of like cheating boyfriends. Well, I can’t help you with THAT (although you can try looking for lipstick on his collar), but I can help you with multicollinearity.

One suggestion some people give is to look at your correlation matrix and see if you have any independent variables that correlate above some level with one another. Some people say .75, some say .90, some say potato. I say that looking at your correlation matrix is fine as far as it goes, but it doesn’t go far enough. Certainly if I had variables correlated above .90 I would not include both in the equation. Even if it was above .75, I would look a bit askance, but I might go ahead and try it anyway and see the results.

The problem with just looking at the correlation matrix is what if you have four variables that together explain 100% of the variance in a fifth independent variable. You aren’t going to be able to tell that by just looking at the correlation matrix. Enter the Tolerance Statistic, wearing a black cape, here to save the day. Okay, I lied, it isn’t really wearing a black cape  – it’s a green cape. ( By the way, if you have a mad urge to buy said green cape, or a Viking tunic, you can fulfill your desires here. I am not affiliated with this website in any way. I am just impressed that they seem to be finding a niche in the Pirate Garb / Viking tunic / cloak market .)

In complete seriousness now, ahem ….

To compute a tolerance statistic for an independent variable to test for multicollinearity, a multiple regression is performed with that variable as the new dependent and all of the other independent variables in the model as independent variables. The tolerance statistic is 1 – R² for this second regression. (R-square, just to remind you, is the amount of variance in a dependent variable in a multiple regression explained by a combination of all of the independent variables). In other words, Tolerance is 1 minus the amount of variance in the independent variable explained by all of the other independent variables. A tolerance statistic below .20 is generally considered cause for concern. Of course, in real life, you don’t actually compute a bunch of regressions with all of your independent variables as dependents, you just look at the collinearity statistics.
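For the curious, here is what "regress each independent variable on all the others" looks like if you do it by hand. This is a rough Ruby sketch, not what SPSS does internally: the method names are mine, the data are made up, and the least squares fit uses plain normal equations with Gaussian elimination, which is fine for a toy example but not something to trust on badly scaled data.

```ruby
# Solve a * x = b by Gaussian elimination with partial pivoting.
def solve_linear_system(a, b)
  n = b.size
  # Augmented matrix [a | b], copied so the caller's arrays are untouched.
  m = a.each_with_index.map { |row, i| row.map(&:to_f) + [b[i].to_f] }
  n.times do |col|
    pivot = (col...n).max_by { |r| m[r][col].abs }
    raise ArgumentError, "the other predictors are perfectly collinear" if m[pivot][col].abs < 1e-12
    m[col], m[pivot] = m[pivot], m[col]
    ((col + 1)...n).each do |r|
      factor = m[r][col] / m[col][col]
      (col..n).each { |c| m[r][c] -= factor * m[col][c] }
    end
  end
  x = Array.new(n, 0.0)
  (n - 1).downto(0) do |i|
    known = ((i + 1)...n).sum { |j| m[i][j] * x[j] }
    x[i] = (m[i][n] - known) / m[i][i]
  end
  x
end

# R-square from regressing y on the predictor columns, with an intercept.
def r_square(y, predictors)
  n = y.size
  design = (0...n).map { |i| [1.0] + predictors.map { |column| column[i].to_f } }
  k = design.first.size
  xtx = Array.new(k) { |r| Array.new(k) { |c| design.sum { |row| row[r] * row[c] } } }
  xty = Array.new(k) { |r| (0...n).sum { |i| design[i][r] * y[i] } }
  beta = solve_linear_system(xtx, xty)
  mean_y = y.sum.to_f / n
  ss_total = y.sum { |value| (value - mean_y)**2 }
  ss_residual = (0...n).sum { |i| (y[i] - design[i].zip(beta).sum { |x, b| x * b })**2 }
  1.0 - ss_residual / ss_total
end

# Tolerance for one predictor: 1 - R-square from regressing it on the others.
def tolerance(predictors, index)
  others = predictors.each_index.reject { |i| i == index }.map { |i| predictors[i] }
  1.0 - r_square(predictors[index], others)
end

# Made-up data: three predictors, eight cases each.
x1 = [2, 4, 5, 3, 6, 7, 1, 5]
x2 = [1, 3, 6, 2, 5, 8, 2, 4]
x3 = [9, 4, 6, 8, 3, 5, 7, 2]
[x1, x2, x3].each_index do |i|
  puts "tolerance for predictor #{i + 1}: #{tolerance([x1, x2, x3], i).round(3)}"
end
```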

Let’s take a look at an example in SPSS, shall we?

The code is below or you can just pick REGRESSION from the ANALYZE menu. Don’t forget to click on the STATISTICS button and select COLLINEARITY STATISTICS.

Here I have a dependent variable that is the rating of problems a person has with sexual behavior, sexual attitudes and mental state. The three independent variables are ratings of symptoms of anorexia, symptoms of bulimia and problems in body perception

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT problems
/METHOD=ENTER anorexic perceptprob bulimia.


Let’s just take a look at the first variable “anorexic”. It has a Tolerance of .669.  What does that mean? It means that if I ran a multiple regression with anorexic as the dependent, and perceptprob and bulimia as the independent variables, I would get an R-square value of .331. Don’t take my word for it. Let’s try it. Notice that now anorexic is the dependent variable.

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT anorexic
/METHOD=ENTER perceptprob bulimia.

Now, look at that. When we do a regression with anorexia as the dependent variable and bulimia and perceptprob as the two independent variables the R-square is .331 . If we take 1 – .331 we get .669 which is exactly the Tolerance Statistic for anorexia in the previous regression analysis above. Don’t you just love it when everything works out?

So WHY is a tolerance below .20 considered a cause for concern? It means that at least 80% of the variance of this independent variable is shared with some other independent variables. It means that the multiple correlation of the other independent variables with this independent variable is at least about .89 (because .894 * .894 is approximately .80).

Another statistic sometimes used for multicollinearity is the Variance Inflation Factor, which is just the reciprocal of the tolerance statistic. A VIF of greater than 5 is generally considered evidence of multicollinearity. If you divide 1 by .669 you’ll get 1.495, which is exactly the same as the VIF statistic shown above.
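The arithmetic connecting tolerance, R-square, the multiple correlation and VIF is short enough to check yourself. A quick Ruby sketch, using only the numbers already mentioned in this post (the method names are just for illustration):

```ruby
# VIF is the reciprocal of tolerance.
def vif(tolerance)
  1.0 / tolerance
end

# Tolerance is 1 - R-square, so the multiple correlation R is sqrt(1 - tolerance).
def multiple_correlation(tolerance)
  Math.sqrt(1.0 - tolerance)
end

puts vif(0.669).round(3)                   # 1.495, the VIF for anorexic
puts (1.0 - 0.669).round(3)                # 0.331, the R-square from the second regression
puts multiple_correlation(0.20).round(3)   # 0.894, the multiple correlation at a tolerance of .20
puts vif(0.20).round(1)                    # 5.0, which is why tolerance < .20 and VIF > 5 are the same rule
```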

And you thought I just made this sh*t up as I went along, didn’t you?

I’ve been reviewing a number of options for students to learn mathematics.

A lot of sources kind of sucked. At best, these sites were just the same old thing, flash cards but on a computer screen, for example. There is nothing terribly wrong with that, but it is hard to imagine that they have any greater benefit than just using index cards you picked up at any store and writing 2 x 3  on one side and 6  on the other, which is how I think everyone has learned multiplication since we quit writing on slates with a piece of chalk. Hey, maybe we should go back to that. It probably involves less waste. Green math! But I digress …

At worst, these sites were just plain wrong. This was more often true for those that dealt with less basic mathematics, where they would, for example, give a definition for a chi-square that was really for a t-test or say that the median was the most common score in a distribution (It isn’t. That’s the mode.)
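In case anyone writing one of those sites is reading, here is the difference in a few lines of Ruby. The scores are made up, and the two methods are a bare-bones sketch (the mode method just returns the first value it finds if there is a tie).

```ruby
# Median: the middle score once the scores are sorted.
def median(scores)
  sorted = scores.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Mode: the most common score. With ties, this returns the first one found.
def mode(scores)
  scores.tally.max_by { |_score, count| count }.first
end

scores = [2, 2, 2, 5, 7, 8, 10]   # made-up test scores
puts "median = #{median(scores)}"  # 5, the middle value
puts "mode   = #{mode(scores)}"    # 2, the most common value
```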

 

Other sites were better, including videos of short lectures and the explanation of whatever the topic they were teaching was correct. (AnnMaria’s first rule of teaching – have something non-stupid to say).

Two examples are:

  • Khan Academy site, which is free, has over 2,000 videos and Bill Gates as its BFF.
  • Cool Math Guy website has some free samples, for others, you have to pay. The videos I saw are good explanations of topics such as trigonometry.

There are hordes of math game sites out there, many of which are just a computerized version of asking your child over and over what is 47 + 52 until his brain crawls out his left ear and runs away just to escape the boredom.

Then, there are sites like Gamequarium, which offers a LOT of different math games for every topic, most of which look like they would be fun if you were immature, which I am.

ALL of the resources I found suffer from the same fatal flaw which is that they begin with the presumption that the student has some interest in learning math. This seems a reasonable, some might even say ‘sane’, assumption based on the fact that the person has come to a site that is for teaching mathematics. For those people who seek out these sites, they might work.

The problem is with the vast majority of people who WON’T ever voluntarily go to these sites because they really don’t give a rat’s ass if they ever learn math or not. Sometimes, as this excellent article “The Education of Jose Pedrazza” points out, they are much more concerned about whether they are going to be homeless, how their family is going to eat.

Given those circumstances, it’s really hard to focus on if you learn this math, you’ll be able to do next year’s math and so on for the next 10 years until you graduate from college and get a good-paying job. It’s all well and good to talk about delayed gratification when you are sitting here like me drinking Chardonnay at an expensive oak desk, and quite another when your mom is collecting cans to come up with money for dinner.

Some of it, the odds are great that you will NEVER use. I just came across this statement in a publication on research in teaching and learning mathematics.

“Across all age levels, the best estimates are made in temperature situations and the most difficult estimates involve acreage situations.”

ACREAGE? Okay, I’m 52 years old, I use math for a living, I’ve bought and sold four houses in my life, including one that had five acres of land with it and was in North Dakota AND NEVER IN MY LIFE HAVE I NEEDED TO ESTIMATE ACREAGE!!

Yes, I am sure there are farmers and landscape architects and people doing surveillance for homeland security applications who may need to estimate acreage. Every  time I write something like this, I get hate mail from people telling me this is why they will never hire me to work for them at Google Maps. (Of course, when I look up these people, they never actually work for Google, or anybody. They are invariably some embittered graduate student teaching Mathematics of Acreage Estimation at Boo-hoo U. )

My point is that most of math is taught completely out of context with no real thought to application other than answering a question on the SAT. For some students, like the most spoiled 13-year-old in America, who happens to live in my house, that is adequate enough incentive. One reason is that for her, and many of her peers, it is NOT gratification delayed ten years. At the end of the school year, many neighborhood parents trek to the Apple Store to buy the iPhone 4 or the gadget du jour for Buffy and Justin who got an A in math. In eighth grade, the kids will all take their high school entrance exams, and when the test scores come and acceptance letters come out, there will be ANOTHER round of iPhone -buying and trips to The Grove. A couple of years after that, many of those same kids will get  their first car, with the stern admonition that, “Your grades better stay up or you will be walking to St. Alphonso’s Catholic High School “.

I was a little depressed after I read this article on the Los Gatos Patch, where the mother happily admits that she could not do her 14-year-old son’s Algebra class. It tells me not only that we find it perfectly acceptable not to know math (while it is NOT okay to say that you forgot how to read) but also that the mom obviously has no need for Algebra in her daily life. On the other hand, I was majorly impressed that she got her son to make dinner and to clean up – twice.

Some people just like math – I did and I still do. That’s only incentive, though, to study the parts that interest you. For example, I watched a video on trigonometry for about five minutes. Then I was bored. It was exactly like the movie, Freaky Friday, where the middle-aged mother changes bodies with her teenage daughter, and in algebra class tells the teacher, “No, believe me, I will NEVER use this.”

I use algebra nearly every day of my life. I use matrix algebra, not every day, but certainly weekly, and calculus fairly often, too. On the other hand, I have NEVER and I do mean, NEVER, needed to know a sine, cosine, tangent, arctangent for any reason whatsoever, not even when I was an industrial engineer. This isn’t to say that no one ever uses these. I asked the house rocket scientist when was the last time he used any of these and he said that everyone in the real world uses all of these every day.  Well, EXCUSE ME!

Perhaps we have it backwards. Instead of railing about the poor performance of our kids on tests and teaching to the test, maybe we should turn things around. Perhaps we should start with why they need to know how to calculate acreage, t-tests or cosines. Give them some projects where this information is applied. Maybe then not only will they actually give a rat’s ass if they learn it or not, but they’ll also still remember it when they have 14-year-old kids of their own and be able to use that information on the job when people like me hire them.

Wouldn’t that be a nice change of pace?

 

Question authority.

Whenever I hear authoritative statements made that don’t fit with the world I see around me, I try to follow up.

How many times have we been told that the U.S. is just terrible in math, we are falling behind educationally, China and India are eating our lunch – deservedly so, because our students are all fat, lazy pie-eating, Wii-playing slackers, we’re far behind where we used to be and therefore our teachers need to be fired, unions disbanded and everyone who came to this country after 1918 sent back to wherever the hell they came from?

And yet, I wonder how that can be when many countries from China and India to Ethiopia and Uruguay have a much higher rate of poverty than the U.S.

I asked a very pompous businessman who told me how poorly we are doing in international comparisons if the 80% of the world’s population living on less than $10 a day, including the one-quarter of the children in the developing world who are underweight, the 1.6 billion people who don’t have electricity, THEY’RE all doing better than our kids in Santa Monica. He said,

“Yes! They’re studying by candlelight without enough to eat, living in a hut and they’re still doing better than our kids at math!”

I thought I would check. So, I looked at the Trends in International Mathematics and Science Study report and here is what I found:

The 2007 study only included 36 countries for Grade 4 and 48 for Grade 8. Neither India nor China were included. Only one country from central America was in the study, one from South America and almost no African countries outside of northern Africa / middle east nations.

In short, it was a non-representative sample that was definitely skewed toward the more affluent nations, I presume because those are the ones that chose to participate.

Among this group of predominantly fairly well off Asian and European countries, the U.S. average was above the median mathematics score both in fourth grade and eighth grade (which were the two grades tested). The median for both grade levels was 500. The average for U.S. fourth graders was 529 and for U.S. eighth graders, 508.

In eighth grade, only Korea, Japan, Singapore, Hong Kong and Taiwan scored significantly higher than the U.S. No disrespect to the people of Hong Kong, but it is, after all, a city. So, four actual countries scored higher than the United States in a sample of relatively affluent countries. Four. Rather than being in the bottom of the national rankings, we are in about the top 10% of a selective sample. The reports of our sucking are seeming a bit questionable.

At fourth grade, Korea was not included in the study. Again, Singapore, Hong Kong and Taiwan scored higher than the U.S., as did England and Russia. There were two eastern European countries that also scored higher but they did not meet the TIMSS criteria for national target population. In short, if you have sampled from, say only 50% of your schools, then the results are called into question because that may be the higher performing half of the population. Even if we include those two, the U.S. came in tied for ninth out of 36 – so, in the top quarter of a group of countries that are richer and more industrialized by far than the world population.

But our students DROPPED in test scores from fourth to eighth grade, didn’t they? How about THAT? They did. So did Hong Kong, Singapore, England and Russia (four countries that were ahead of us in the fourth grade).  In fact, in eighth grade we are tied for fifth out of 48; in international rankings, we had moved UP.

In the twelve years the study has been conducted, from 1995 to 2007, the U.S. average mathematics score has increased 11 points for fourth graders and 16 points for eighth grade students. In 1995, U.S. eighth graders’ average mathematics score was 8 points below the median while in 2007 it was 8 points ABOVE the median.

So, to recap, even among a sample of comparatively well-off countries, the U.S. comes out above average on all overall measures, and, in fact, above 3/4 or more of the other countries. The ‘countries’ that consistently score above the U.S. are Taiwan (population 23 million), Singapore (population 5 million), Hong Kong, which is actually a city (population 7 million) and Japan (127 million). South Korea (48 million) only had data for 8th graders. By comparison, the U.S. has a population of 307 million. So, all of those countries put together add up to about two-thirds the size of the U.S. and about 3% of the world population of 6 billion and rapidly increasing.

This is NOT to argue against looking at what these countries are doing to try to see if it could work in the United States, or to see what other explanation there could be for the differences. Yes, there may be other countries not included that score higher than us. Yes, the U.S. does need to try to do better, especially in Geometry, which was the one area of the test where we were below average.

However, this picture is a far cry from we’re near dead last and racing toward the bottom. Next time someone tells you how terrible mathematics education in the U.S. is, point him or her toward the TIMSS report and suggest looking at the data.

After all, our eighth-graders scored an average of 531, compared to an international mean of 500, in the area of data and probability.

NOTES

As did the TIMSS researchers, I considered countries tied if the difference in scores were not statistically significant. You can download the TIMSS report here.

By the way, statistics on poverty, etc. came from the Poverty Facts and Statistics page.

Now, I’m not as extreme as the person who created a SAS Tetris game, (Richard DeVenezia, in case you were wondering) but I still try to do just about everything in SAS,  if only to prove that I can do it.

Why would someone who has been using SAS for 28 years decide to pick up Ruby?  One legitimate reason is just for the hell of it, which is why I do most things. The other reason, though, is you will find that some things in Ruby are very easy that are not so easy in SAS, which makes it a very good complementary language.

For example, how many times have you had problems with a variable that somehow ended up defined as a string, even though that wasn’t what you intended at all? This often happens to me when importing Excel files where someone has typed spaces for missing data or before a number in a field I want to be read as numeric. However it happens, it’s annoying. You can use the INPUT function in SAS to create numeric data from character data. You can also create a new variable by adding 0 to the character variable, and SAS will do the conversion and create a numeric variable. So, there are ways to get around it. In Ruby, it’s really simple:

varname.to_s  -  returns the value of varname as a string

varname.to_i  -  returns the value of varname as an integer
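Here is a short sketch of those conversions on the kind of messy values you get out of a spreadsheet. One thing to watch: to_i is very forgiving and turns anything it can't read, including a blank, into 0. The helper to_integer_or_nil is my own made-up name for a stricter version built on Ruby's Integer() method, so that blanks come back as nil (missing) rather than as a real zero.

```ruby
raw = " 42"              # a number typed with a leading space
puts raw.to_i            # to_i skips the leading space and returns the integer 42
puts raw.to_i + 1        # it is a real number now, so arithmetic works
puts 42.to_s + " items"  # to_s goes the other way, integer to string

puts "   ".to_i          # careful: a blank "missing" value silently becomes 0
puts "n/a".to_i          # and so does text that is not a number

# Hypothetical stricter helper: returns nil instead of 0 for blanks and junk.
def to_integer_or_nil(text)
  Integer(text.strip, 10)
rescue ArgumentError
  nil
end

puts to_integer_or_nil(" 42").inspect
puts to_integer_or_nil("   ").inspect
puts to_integer_or_nil("n/a").inspect
```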

Then, there are some things that SAS just won’t allow you to do, like mix character and numeric variables in the same array. What if you WANT to do that? Well, Ruby is cool with it.
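For instance, here is one array holding a name, an age, a GPA and a missing value, all at once. The record is made up and numeric_values is just an illustrative helper for pulling the numbers back out.

```ruby
# One array holding a string, an integer, a float and a missing value.
record = ["Smith", 52, 3.7, nil]

record.each do |value|
  puts "#{value.inspect} is a #{value.class}"
end

# Pick out only the numeric entries, whatever position they are in.
def numeric_values(values)
  values.select { |value| value.is_a?(Numeric) }
end

puts numeric_values(record).inspect
```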

On the other hand, I haven’t found an easy way to get Ruby to do a logistic regression. So … I am thinking if I find the time to learn Ruby well enough, this may be a match made in heaven, read in and manipulate data easily, then send it out to a file to be analyzed by SAS.
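A rough sketch of the Ruby half of that plan is below. Everything in it is an assumption for illustration: the data, the file name clean_scores.csv, and the helper names. It cleans a text column into numbers, leaves blanks as empty fields, and writes a plain comma-separated file, which is the sort of thing SAS can read with PROC IMPORT or a DATA step.

```ruby
require "tmpdir"

# Strict conversion: blanks and junk become nil instead of 0.
def parse_score(text)
  Integer(text.strip, 10)
rescue ArgumentError
  nil
end

# Join fields with commas, writing nil as an empty field.
# Note: this naive version does not quote fields that contain commas.
def to_csv_line(fields)
  fields.map { |field| field.nil? ? "" : field.to_s }.join(",")
end

# Made-up raw data, as it might come out of a messy spreadsheet.
raw_rows = [["A01", " 17"], ["A02", "   "], ["A03", "23"]]

lines = ["id,score"]
raw_rows.each { |id, text| lines << to_csv_line([id, parse_score(text)]) }

path = File.join(Dir.tmpdir, "clean_scores.csv")   # hypothetical output file
File.write(path, lines.join("\n") + "\n")
puts "wrote #{lines.size - 1} rows to #{path}"
```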

Yes, there are other languages that would do this equally well, or perhaps better, but it’s pretty obvious that Ruby has much wider applicability than just data management. If I am going to go to a great deal of trouble to learn something in depth, it better be usable in a variety of situations. After all, I wouldn’t use SAS that much if it didn’t have a whole lot to offer besides Analysis of Variance.

“How do you get business?”

I get asked this a lot, and since I doubt random people I meet at conferences are truly interested in my life, I think what they really want to know is:

“How can I get business as a consultant?”

I’ve been working as a consultant for over thirty years. Here are the steps, in order of how I got started in the business. Hope this helps.

1. Be able to DO something that has an impact your client can SEE. Leave the client with a concrete product  – a grant proposal submitted, database that now can be used to produce reports, statistical analysis and final report to a funding agency, a program that automates your monthly charts. The consultants I know who provide “coaching” or “strategy” have a harder time finding work because no one is quite sure what you’re going to do, if you did it well or how to tell when you’re done.

2. Go to graduate school and stand out. I started out with two projects that were referred to me by a professor who had more consulting work than he could do. When I went looking for more work, I could point to those successful projects. I have been hired on several occasions by former professors. By “stand out”, I mean academically. I published articles while a graduate student, presented at conferences, received two fellowships and a small grant. In my statistics classes, I was at the top of the curve. Be genuinely interested in your field and try to learn as much as you possibly can, more than is just on the test. Work on campus. While you may get noticed if you are working full-time and just on campus for class, it is certainly less likely.

3. Work at a university. I was a professor full-time for seven years and part-time, on and off, for the next 14. There is a lot of questioning in the media currently about the value of a higher education. I think it depends on the type and quality of your education. Certainly there are a lot of really smart people at universities. Most people don’t meet a lot of statisticians and as my students graduated, moved on in  their careers and came to the point where they needed a program evaluator or statistician or statistical programmer, they thought of me. Others didn’t know me but they called their alma mater and talked to someone, a business professor, Multicultural Student Programs, or whoever they happened to know and that person recommended me.

4. Write the proposals yourself. Several of my first major contracts came from proposals I helped write, either as part of my duties at the university, or on a consulting contract. The client needed an evaluator, they didn’t have anyone in mind, so I offered to throw in my resume. (By the way, I get LOTS of people asking if I will write the grant for free and then they’ll give me the evaluation contract if it gets funded. I always say, “No”.  That is the topic for another post.)

5. Find partners who bring you business. After about ten years, I started working with partners who were very good at finding business. I don’t know how they do it. Magic, I think. If you’re good at the technical side and you find some partners who are good at the business development side, count your blessings and don’t EVER, EVER regret the share of the money they take. I have acquaintances who don’t do a fraction of the business I do and a big reason is that they don’t see why they should share any part of “their” money. I just shake my head.

6. Be generous with your referrals. Many times, I have referred clients to graduate students or other professors who were interested in starting out as consultants. Partly I did this to pay back the help I got early on. My less altruistic reason is that it is not a good use of my time or the client’s money to be doing t-tests in SPSS or correlations with Excel. It’s one thing if that’s part of a project, but if it’s the whole project, they’d be better off paying someone $40 an hour and I’d be better off working on a contract for $20,000 or $200,000 instead of one for $2,000. It would be nice to say that these people paid me back by referring work to me later. They didn’t, in most cases because I was much further along in my career and they don’t really have enough business to let go of any. I guess there is a philosophy of only helping people who might be in a position to help you back, but I don’t hold to that. Interestingly, in most cases, those CLIENTS came back to me when they had larger projects. Other consultants have referred work to me when they were booked. I haven’t paid back the favor yet, but I remember it. I think what goes around comes around. Karma.

7. Don’t ever rip anybody off. There have been times when a client was really over a barrel. I once had someone who called two days before a grant proposal was due because they had forgotten the research design section. Everyone on the project thought someone else would do it. I came in on another project that was 18 months past due on their evaluation reports and the agency was pulling their funding if they didn’t get it in six weeks. You get the idea. There are times when people are desperate. I DO charge overtime on those occasions, and it may be as much as double our usual fee because I may have to have people doing data entry, editing and other tasks and they all have to be paid overtime. It’s the law. I’ve known people who charge triple or more in those situations. The client will pay it. But they’ll never work with you again. If you can pull out all the stops and get an incredible job done at a reasonable price, they’ll remember you and come back under normal circumstances. Related to that is charging a fair price for your regular work. I have friends who brag about their consulting rate – which is much higher than mine – but they have no work. Yeah, well, when I’m not working, I charge a million dollars an hour.

8. The biggie is referrals from other clients, and this is how most of our business has come about over the last dozen years, in addition to numbers 4 & 5, but you can’t get client referrals until you have clients. The question was about how you get started.

I haven’t mentioned LinkedIn, CCR (Central Contractor Registry), presenting at conferences, meetups or any of the networking that is supposed to bring you business. That’s because it hasn’t DIRECTLY brought us much business. Some of those things have brought us some business, and maybe my magic partners have met a lot of people that way (I’m pretty sure not).

I say “directly” because what these venues have done sometimes is thrown us in the path of people we already knew, when a conversation would come up that, “Hey, we need a statistician- ” or “Could you do training on — ”

Also, LinkedIn didn’t come up until we had been in business for decades, and I don’t think I heard of CCR until about a dozen years ago. The size of contracts we get from these sources would probably be a much bigger proportion of the business for someone just starting out.

Well, good luck!

I’m off to Grand Forks in the morning and I had better pack. I kind of like North Dakota in the summer, but winter is a whole different ball game. Which comes to the subject of another post – are you insane to be considering going into the consulting business?

Lately, I’ve been missing some of my former colleagues at the USC Medical School. This is not just because they are super-nice people, which they are, but also because they used to ask for different types of statistics, and I do think variety is the spice of life – except for in marital relationships where it is the spice of divorce courts.

Many of the physicians I’ve worked with deal with small sample sizes, especially if they are just looking at their own practices. Not wanting to violate any confidentiality agreements here, let’s make up a disease, say, fear of naked mole rats, or nakedmoleratophobia . In the normal course of one’s practice, you may only see a couple dozen people a year who have this malady.

Thus, many of the medical studies on which I have been a consultant involve small sample statistics. I haven’t done a lot of that lately, so I was coveting a Mann-Whitney U (used in place of an independent t-test) or a Wilcoxon signed rank.

I ran through what I had that could be a small sample and produce an answer to a question that interested me, and here is what I have been thinking about:

I’ve heard most of my life from most of the experts that allowing gifted children to skip grades and attend school with older children is a bad idea. Short version – my brother and I both started college at 16. I thought it was a terrific advantage and two of my daughters began college before age 18. My brother thought it was a bad idea and neither of his children began college early.

I got to wondering specifically about males who were accelerated, since you hear that boys mature later. Another common belief is that boys are less verbal. While I was wondering, I noticed in the TIMSS data that there were 31 males who were younger than the typical age range – that is younger than 13.5 at the time of the test.

I wondered whether, given the bias against promoting children, and especially boys, these young boys would be exceptionally advanced. I also wondered if they would be doing relatively better in mathematics than in science since, based on my completely casual observations, it seemed like middle school science requires a lot more reading than middle school mathematics does.

Both mathematics and science on the TIMSS are measured on a scale with a mean of 500, so I thought I could compare these using a Wilcoxon signed rank test. In case you didn’t know, this is a non-parametric test used with small sample sizes with related measures. Kind of like a paired t-test for non-normal data.

There are all sorts of statistical packages you could use to do this, and with small sample sizes like the one I have, you can even do it by hand. I happened to use SAS. I was going to try it with SPSS also but that would have required moving at least four feet to the computer and desk behind me. (Yes, my office does have two desks and two computers. What of it?)

It’s quite simple, really. You create a difference score by subtracting one variable from the other and then do a PROC UNIVARIATE. (This page from the University of Delaware gives a few other ways to do it. It also has a picture of a turkey’s head, which is something you don’t see that often. You also don’t hear much from Delaware. They are awfully quiet there. They are probably up to something.)
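data smartboys ;
set sm ;
/* boys (itsex = 2) whose age rounds to 13 */
where itsex = 2 and round(bsdage) = 13 ;
/* difference score: mathematics minus science */
diff = BSMMAT01 - BSSSci01 ;
run ;
proc univariate data = smartboys ;
var   diff ;
run ;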

This gives me the following:

Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  -0.07961    Pr > |t|    0.9371
Sign           M      -1.5    Pr >= |M|   0.7201
Signed Rank    S       -24    Pr >= |S|   0.6458

Plus a bunch of other stuff.
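
If you are curious where the Sign and Signed Rank lines come from, here is a minimal by-hand sketch, in Python rather than SAS. It assumes the centered definitions described in the SAS documentation: M = (n_pos - n_neg) / 2 and S = T_plus - n(n + 1) / 4, where zero differences are dropped, n is the number of nonzero differences and T_plus is the sum of the ranks of the positive ones. The scores are made up for illustration and are not TIMSS data, the function name is my own, and the sketch stops at the statistics without computing p-values.

```python
def sign_and_signed_rank(diffs):
    """Return (M, S) for a list of paired differences, by hand."""
    nonzero = [d for d in diffs if d != 0]      # zero differences are dropped
    n = len(nonzero)

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1          # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_pos = sum(1 for d in nonzero if d > 0)
    n_neg = n - n_pos
    t_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)

    m = (n_pos - n_neg) / 2                     # sign statistic
    s = t_plus - n * (n + 1) / 4                # centered signed rank statistic
    return m, s


# Hypothetical mathematics and science scores for six students.
math_scores = [540, 512, 498, 560, 530, 505]
science_scores = [535, 520, 498, 548, 541, 509]
diffs = [m - s for m, s in zip(math_scores, science_scores)]

m_stat, s_stat = sign_and_signed_rank(diffs)
print(m_stat, s_stat)   # -0.5 -0.5
```

A negative M just means more students scored higher in science than in mathematics, and a negative S means the positive differences carried less than half of the total rank sum.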

Well, clearly, there is a non-significant difference between their scores in mathematics and science. This isn’t very surprising when you learn that their average score in mathematics is 535.7 and in science 537.2. So, it is a really small difference and not at all what I expected. Also, from looking at the PROC UNIVARIATE output for the mathematics and science scores, it was obvious that the distributions were quite normal and I could have gone ahead and used a paired t-test. The t statistic, shown above and helpfully included as part of the univariate output, shows that the difference is even less significant by that test.

HOWEVER – and here is where it is useful and highly recommended to know something about your data – it turns out that the mean scores for the U.S. are anything BUT identical. In fact, the mean for U.S. students in mathematics is 508 with a standard deviation around 77, while the mean for science is about 520 with a standard deviation of 84. So, these young boys are about .37 standard deviations above their peers in mathematics and about .23 standard deviations higher in science. In fact, when I compared them to the other students, these boys WERE significantly higher than their peers in mathematics but not in science.

data testboys ;
set lib.statsfile ;
/* smb = 1 flags the young boys, 0 is everyone else */
if itsex = 2 and round(bsdage) = 13 then smb = 1 ;
else smb = 0 ;
run ;
proc ttest data = testboys ;
class smb ;
var bsmmat01 bsssci01 ;
run ;
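
The "standard deviations above their peers" comparison is just a subtraction and a division, sketched here in Python. The function name is hypothetical, and the inputs are the rounded means and standard deviations quoted in the text, so the results land near, but not exactly on, the .37 and .23 reported above, which presumably came from unrounded values.

```python
def standardized_difference(group_mean, reference_mean, reference_sd):
    """How far a group mean sits from a reference mean, in reference SD units."""
    return (group_mean - reference_mean) / reference_sd


# Rounded figures from the text: young boys vs. all U.S. students.
math_gap = standardized_difference(535.7, 508, 77)
science_gap = standardized_difference(537.2, 520, 84)

print(round(math_gap, 2), round(science_gap, 2))   # 0.36 0.2
```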

I had thought, given that there seems to be a prejudice against starting school early or skipping grades, both in general and especially for boys, that these boys would have to be amazingly ahead of their peers. As you can see, that isn’t the case. Yes, they were ahead, and yes, in mathematics it was statistically significant, but they weren’t far out there on the right of the normal curve.

On the other hand, most of them were doing quite fine, thank you, and being youngest in their classes didn’t seem to be affecting them in any negative way, at least, not academically.

Of course, since it did turn out that the data were quite normal, I could have just simply done a paired t-test, like so:

proc ttest data = smartboys ;
paired BSMMAT01 * BSSSci01  ;
run ;

This will give me the EXACT same result as the t-test in the univariate output above, with one fewer step because I don’t need to use a data step and create a variable which is the difference between the two.

However, I got to do my Wilcoxon signed rank test, I got an answer to my questions, in fact, for the question of math vs science, I got two answers, and they both agreed. On top of it all, the world’s most spoiled 13-year-old received a letter today telling her that she was accepted for the Summer Scholars program, despite not being 12, or a boy, (which since it is a program for high-achieving girls, actually worked in her favor).

So, I am satisfied and fulfilled. It’s just another sunny day in paradise.

I’m doing a workshop at the San Diego SAS users group meeting on Wednesday and  had suggested opening the session with a clip of my daughter’s last amateur fight.  Someone politely commented,

“Uh, I guess that would be okay, if it was, uh, relevant.”

Fair question, how can martial arts be related to statistics or to programming?

I was world judo champion, so I think I can claim a bit of knowledge of martial arts. In teaching over the years, I have seen thousands of up and coming young players, what I would consider the programming equivalent of those at the intermediate level – no longer a novice but not quite at the expert level yet, either. What the most promising of those martial artists have in common with the most promising young programmers and statisticians is, unfortunately, too often the same thing. They are in a hurry. They believe their own press.

They are enamored of the latest technique someone is doing in the Olympics or they want to do whatever the newest form of complex sampling – Rasch – IML – hierarchical – neural network model is without nailing down the basics first.

Here is what I have learned:

  1. Get off to a good start – make sure that you have the correct data set. Seems pretty obvious, doesn’t it? About once a year, someone sends me the wrong data, data from the previous year or month, the data set that was not corrected for invalid data, etc.
  2. Nail down the basics – make sure you completely understand the data you will be using. Do a reality check. Does an average income of $120,000 a year make sense to you? It’s amazing to me the number of times  that people think not having ERROR show up in the log means that there are no errors in the program. Don’t just count on automated rules like there should be a non-negative minimum for age, weight, height, etc. Some of the biggest screw-ups I have seen are because the programmer did not reverse code the items before scoring. It wasn’t that the person didn’t know to do this, he or she just didn’t think of doing it. Just like in martial arts, the things that are fundamental should be over-learned until they are a reflex.
  3. Automate what you can – I did the same “boring” matwork drills 100 times a night for year after year until I did them almost as a reflex. When my daughter hits certain positions, she will automatically spin out and land on her feet or rotate into an armbar. With programming, it’s even easier. If you do the same thing over and over, turn it into a macro.
  4. Automation takes time – just like the boring drills, people resist writing macros because it takes time, and it seems, when you are doing it, to take time from the really important things that are going to make you better. (I already KNOW that armbar, Sensei, why are we doing it again?) I’d be embarrassed to tell you for how many projects I wrote essentially the same code before sucking it up, taking the time to turn it into a macro and rarely thinking about it again.
  5. There are many ways to the same goal. Whether you are using SAS, Ruby, SPSS or whatever your flavor of the month is, there are multiple ways to parse text, test relationships, validate your data.
  6. Size matters. What works on an opponent (or data) that is really big may be inefficient or inappropriate in a smaller situation.
  7. You can’t learn it all from a book. This is a rather discouraging fact since I am just now writing a book on training champions in martial arts. The fact is, though, most statisticians I have met came out of graduate school unprepared for the real world. I hate that term by the way. I worked at universities for many years and if they really are an alternate universe, I think they should have flying cars and a unicorn or two. Still, one way in which universities do resemble an alternate universe is that data are all perfect and you’re often told what statistical test you need to use. It’s really very weird to me – you’re asked a lot to prove theorems and equations, which you can look up, but the stuff you can’t look up, like handling missing data or drawing conclusions based on incomplete and imperfect data, doesn’t come up nearly as often as it should.
  8. People can train you, but once you’re an expert, you’re on your own. When you’re out there on the mat fighting, you need to figure out the right thing to do all on your own. Many years ago, I was visiting my former advisor. I showed him an article I was working on at the time and asked his opinion if the analysis and conclusions were correct. I was a little dismayed when he said, “Probably. Your guess is as good as mine. What are you asking me for? You know this stuff as well as I do. Look, there comes a point when you aren’t a student any more. You can consult with other people, you can read books, but in the end, you find the answers for yourself and they’re as right as you know how to make them. That’s it. No one has the answer key for the whole field, you know.”

As they say in martial arts – and then the student becomes the teacher. That can be exhilarating in many ways, but I must confess that both in statistics and in martial arts, there are days when I say to myself, “Damn, I wish I could find that answer key for the whole field!”
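
Going back to points 2 and 3 in the list above, reverse coding is exactly the sort of thing worth writing once and reusing. In SAS that would be a macro; what follows is a Python analog, a sketch only. The item names, the 1 to 5 scale and the function names are all assumptions made up for illustration. It also builds in a reality check, refusing any answer outside the scale rather than scoring it quietly.

```python
def reverse_code(responses, items_to_reverse, low=1, high=5):
    """Return a copy of one respondent's answers with the listed items flipped.

    Missing answers (None) are left alone; out-of-range answers raise an error.
    """
    scored = dict(responses)
    for item, value in scored.items():
        if value is None:
            continue
        if not low <= value <= high:
            raise ValueError(f"{item} = {value} is outside {low}-{high}")
        if item in items_to_reverse:
            scored[item] = low + high - value     # 1 <-> 5, 2 <-> 4, 3 stays 3
    return scored


def scale_score(responses, items_to_reverse, low=1, high=5):
    """Mean of the answered items after reverse coding, or None if all are missing."""
    scored = reverse_code(responses, items_to_reverse, low, high)
    answered = [v for v in scored.values() if v is not None]
    return sum(answered) / len(answered) if answered else None


# Hypothetical respondent; q2 is a negatively worded item.
respondent = {"q1": 5, "q2": 1, "q3": 4, "q4": None}
print(reverse_code(respondent, {"q2"}))
print(round(scale_score(respondent, {"q2"}), 2))   # 4.67
```

Forgetting the reversal here would have scored this respondent at 3.33 instead of 4.67, with no ERROR anywhere in the log.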
