There may come a day (shudder) when I am called upon to find what my mother refers to as “a real job”. I’m not sure how I would go about it. For the past 30 years, here is how my career has gone.
- I was walking in the door at the university where I had dropped in to visit a friend. A professor I’d known as a student years ago was walking out the other door. She said, “Hey, AnnMaria, I need a statistical consultant. Are you available?”
- I was walking in the door of my obstetrician’s office when I ran into someone else I had known years ago. She had been seeing another doctor in the same office. She said, “Hey, I just got a grant and need a statistician.”
- I was bored one day and applied for a job on Dice.com. They called up, interviewed me and hired me.
- I was bored one day and applied for a job I saw advertised in the Chronicle of Higher Ed. They interviewed me and hired me.
I think one can clearly detect a pattern here, mainly that I should spend more time walking in doors to buildings.
When getting a new job, I’ve generally been in the work equivalent of “married but looking”. I know that sounds horrible but what I mean is that I have had a job that I was considering getting out of, but I didn’t necessarily want the people at the job to know that.
This problem is common to people in any field, but I think those in analytic jobs have another problem.
Enter analyst finder. This is the new company started by Art Tabachneck of SAS fame. If you’ve been using SAS for any length of time at all, you’ve run across his papers and if you live in Canada and drive a car you have been affected by his work. He uses SAS to set automobile insurance rates.
I checked out the site and it takes less than 15 minutes to fill out a form to be included in their database. The really cool thing is that it asks about so many areas of expertise – which industries you have experience in, whether you are familiar with SAS, SQL, ANCOVA … it is a very, very long list – but you can just check off the boxes that apply to you.
If I were actually looking for a job, I might have spent a little time filling in the “essay questions” that allow you to expand on your credentials as well.
How it works
Currently, Art is compiling a database of analysts. Once this is of reasonable size, employers will be able, for a very modest fee – around $300 – to submit position descriptions. Analysts who match those descriptions will be contacted and asked if they are interested. The 20 names with the closest match who have expressed interest will be sent to the employer with contact information.
As an employer, it sounds like a great service. If I’m ever in the market for a “real job”, as an employee, it’s the first place I would hit up.
So … go check it out. It’s totally free to analysts, which is very broadly defined. If you’re interested, download the form, fill it out and send it back.
It’s a more scientific method than running around the city walking through doors, hoping you run into someone who offers you a job.
Speaking of which, I need to be walking in the door of my office in less than 8 hours, so I guess I’ll call it a night.
This is the most depressing chart I have seen in a long time. Below are the results of our pretest on knowledge of fraction operations of 322 students in grades 3 through 7, attending schools on and adjacent to American Indian reservations.
These are questions like,
“Drag 6/1 to the correct spot on the number line.”
Which was one of only two questions that at least 50% of the children answered correctly.
Identify the letter that marks 7/8 on a number line
14% of the children answered that right.
Then there are the word problems,
“Bob and Ted painted a wall. Bob painted 1/5 of the wall and Ted painted 2/5 of the wall. How much of the wall is left to paint?”
38% of the children answered that correctly.
Looks like they did better on item 7, which asks which of these statements is true:
5/ < 3/4
2/8 < 1/4
3/6 = 6/
2/ = 4/5
26% of them got that correct. Guess what? That was one of the few multiple choice items on the test, so random guessing would have gotten it correct 25% of the time.
This is a test of what is ostensibly third- through fifth-grade math. Two-thirds of the test is at the fourth-grade level or below. As our results indicate, the majority of the students who took the test would not understand what that statement means.
For the 163 fifth-graders who took our pretest, the mean score was 28%.
For the 114 fourth-graders, the mean was a dismal 14.7%.
It wasn’t that the students didn’t try. I looked and there were very few places they left the items blank. They simply did not know.
These students came from several different schools, and while there may be differences between schools, there is nothing to suggest one school with abysmal results pulled down all of the others.
I called our lead cultural consultant, Dr. Erich Longie, out at Spirit Lake, and told him I was concerned that if I presented these results to the schools, they might want to shoot the messenger. After all, it is important to us that these schools continue to provide us their input and guidance. He told me not to worry about it too much.
“They know,” he told me, “As someone who has been a teacher and administrator in schools on the reservations, I’m not surprised by the results and I can’t imagine these schools will be, either. What we all ought to be worried about is making sure that the post-test scores don’t look like this.”
So … students will start playing Fish Lake in the schools next month. No pressure here.
Excuse me while I get back to work.
I was going to call this new category for my blog
“Mama AnnMaria’s advice on not getting your ass fired” but it turned out to be too long to fit in the box.
It may surprise young people in the workplace to find out that people who admit to having screwed up are often valued more as employees than those who are blameless.
Who cares whose fault it is?
One of the things that drives me crazy is when the first thing (and sometimes the second and third thing) an employee does in response to a problem is to find proof that it was not his or her fault. There are a whole lot of reasons why this is stupid, bad and will eventually get your ass fired.
Are you exclaiming, “What? Why would you fire the one person who never makes a mistake?”
Well, for starters, you are clearly delusional. Everybody makes mistakes so if you are convinced you NEVER make mistakes, it is never your fault, then you have a tenuous grasp on reality that you may suddenly lose one day and begin mowing down your co-workers with an Uzi, convinced that they are evil demon zombies out to eat your non-mistake-making perfect brain. As a responsible employer, I cannot take that chance.
Next is the fact that you are wasting time and energy. You could have found the missing data and gotten it to Dr. Cflange. Instead, you put your effort into finding that email from seven months ago where Bob said we didn’t need to worry about sending the data to Dr. Cflange, to prove that it wasn’t your fault that the data was not sent to our collaborator. After all, Bob told you not to bother. So, here we are, three hours later and Dr. C still hasn’t gotten the data. Besides, the fact that Bob told you that seven months ago when Dr. Cflange was in Uzbekistan does not absolve you of responsibility for sending out that data any time until the end of the world. Plus, Bob hates you now.
Which brings me to my next point – if you are always claiming you are blameless, then by implication, you are blaming someone else. Your boss is not stupid.
It’s like that time when my mom came home and the front window was broken. She asked what happened and we all swore up and down that we had nothing to do with it. She asked,
“So, you were all just standing around and the glass just fell out of the window?”
We all swore that yes, it had happened exactly like that.
(Mom, if you are reading this, it wasn’t me that pushed one of the Slattery boys into the window. Just so you know.)
Unlike me, who did not throw said sibling under the bus, if you are pointing at Bob and saying,
“It was him, it’s his fault, not me!”
Then, guess how likely Bob is to be inclined to help you out in the future. So … people who are always blaming everyone around them are not going to have as good teamwork with their co-workers.
Listen carefully here, because this next part is really important. Let’s assume the people you work with are not idiots, that there is a reason you are working for them instead of them working for you. Let’s call that reason “experience”. Not being idiots, your bosses realize that everyone makes mistakes.
Employers are not looking for people who never make mistakes. Those people don’t exist. They are looking for people who can fix problems.
The final two reasons that never taking responsibility for any mistake is going to eventually get your ass fired –
If every time an issue comes up it’s like an argument before the Supreme Court to get you to address it because you are so involved in gathering evidence that it was not your fault, eventually people will quit pointing out problems to you because it’s just not worth the hassle.
If you never believe that any problem is your fault, then you will never get any better at preventing them, because none of the problems that occur have anything to do with you.
The most impressive interactions I have with employees often begin like this:
“That was my mistake that X happened. I would like to take the responsibility of fixing it by doing Y.”
Those people are probably never going to get their asses fired.
Now you know. Act accordingly.
What if you wanted to turn your PROC MIXED into a repeated measures ANOVA using PROC GLM? Why would you want to do this? Well, I don’t know why you would want to do it but I wanted to do it because I wanted to demonstrate for my class that both give you the same fixed effects F value and significance.
I started out with the Statin dataset from the Cody and Smith textbook. In this data set, each subject has three records, one each for drugs A, B and C. To do a mixed model with subject as a random effect and drug as a fixed effect, you would code it like so. Remember to include both the subject variable and your fixed effect in the CLASS statement.
Proc mixed data = statin ;
class subj drug ;
model ldl = drug ;
random subj ;
run ;
To do a repeated measures ANOVA with PROC GLM you need three variables for each subject, not three records.
First, create three data sets for Drug A, Drug B and Drug C.
Data one two three ;
set statin ;
if drug = 'A' then output one ;
else if drug = 'B' then output two ;
else if drug = 'C' then output three ;
run ;
Second, sort these datasets and as you read in each one, rename LDL to a new name so that when you merge the datasets you have three different names. Yes, I really only needed to rename two of them, but I figured it was just neater this way.
proc sort data = one (rename = (ldl = ldla)) ;
by subj ;
run ;
proc sort data = two (rename = (ldl = ldlb)) ;
by subj ;
run ;
proc sort data = three (rename = (ldl = ldlc)) ;
by subj ;
run ;
Third, merge the three datasets by subject.
data mrg ;
merge one two three ;
by subj ;
run ;
Fourth, run your repeated measures ANOVA.
Your three LDL measurements are the dependent variables. It seems weird to not have an independent variable on the other side of the equation, but that’s the way it is. In your REPEATED statement you give a name for the repeated variable and the number of levels. I used “drug” here to be consistent but actually, this could be any name at all. I could have used “frog” or “rutabaga” instead and it would have worked just as well.
proc glm data = mrg ;
model ldla ldlb ldlc = / nouni ;
repeated drug 3 (1 2 3) ;
run ;
quit ;
Now you can be happy.
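If the three-dataset shuffle in steps one through three feels like a lot of typing, PROC TRANSPOSE can probably do the same reshaping in one pass. This is only a sketch, assuming drug is a character variable with the values A, B and C, as in the code above. The dataset names statin_sorted and mrg2 are made up for illustration.

```sas
/* Sort first so that BY subj works in PROC TRANSPOSE */
proc sort data = statin out = statin_sorted ;
  by subj ;
run ;

/* One row per subject, one LDL column per drug.              */
/* PREFIX= plus the ID values should give ldlA, ldlB and ldlC */
/* (SAS variable names are not case sensitive).               */
proc transpose data = statin_sorted
               out = mrg2 (drop = _name_)
               prefix = ldl ;
  by subj ;
  id drug ;
  var ldl ;
run ;

/* Same repeated measures ANOVA as before, on the reshaped data */
proc glm data = mrg2 ;
  model ldla ldlb ldlc = / nouni ;
  repeated drug 3 (1 2 3) ;
run ;
quit ;
```

The sort-and-merge version has the advantage that you can see exactly what happens at each step, which is handy in a classroom.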
I have been teaching at the post-secondary level since 1987, at schools ranging from a small liberal arts college in North Dakota to the second-largest non-profit university in the country. I’ve taught at private schools and public ones, and courses ranging from first year undergraduate to doctoral students. In all of those situations, some students aced the courses and some students failed. The difference between those students was NOT, as some might believe, that the students with A’s had some sort of magical math gene the others didn’t. Nope. Here are seven tips on how not to fail a college math class.
1. Have the textbook when the class starts. Textbooks are required for a reason. That reason is primarily that the instructor does not have the time to tell you in the lecture everything that might be useful. Every course I have taught, at least one student tells me that he or she does not have the textbook yet. This makes me wonder, “Did you not know you were going to take this course?”, because I am pretty certain that I told the university the book that would be required two months ago. Even if you have an excellent reason for not having the textbook, falling a week behind in the reading makes the class more difficult.
2. Read the assigned readings. You are supposed to read them. That is what “assigned” means. See #1. Also, some of the stuff you learn might not be so easy. This is why it is good to go over it twice, once in the lecture and once by reading it.
3. Attend all of the lectures. It can’t hurt. See #2. Very few professors are so terrible that you cannot learn anything from them. If you think the professor is difficult to understand, perhaps it is because you did not read the assigned readings before the class so this is the first time you have been exposed to this material. Maybe you missed the last lecture where he or she explained the information that is PREREQUISITE to understanding the information covered in this lecture.
4. If you still don’t understand, read the textbook again. I was an excellent student in statistics. It is what I specialized in for my Ph.D. (along with Tests & Measurement). The only statistics courses I did not get an A in, I got an A+. And still … there were many times when I read the textbook, thought I understood it, tried the problems at the end of the chapter and realized I didn’t understand it so well after all. So, I read the chapter again. Sometimes for a third time.
5. Don’t try to cram at the last minute. Math builds on itself. If you did not understand chapter two, you are going to have a hard time with chapter three. If you just read it for the first time at 3 a.m. the night before the final exam, I’m guessing you didn’t understand chapter two very well.
6. Ask for help as soon as you don’t understand something. How to ask for help is a whole post in itself.
7. Don’t study drunk or high. This may sound like really unnecessary advice but I see people doing it. Most often it is because they are young and stupid, so drinking and getting high is part of what they do in college. Sometimes, they have fallen behind, are stressed out about not doing well in their math classes (often due to numbers 1 through 6 above), so they have a drink or smoke a joint so they can relax a little before tackling the books. “Hey, you know what would improve my ability to estimate variance? The same substance that so impairs my ability to estimate distance that they make it illegal to use while driving!”
A common factor in the first six of these is that math is cumulative. In a literature course, you can have messed up on the section on whatever it is you were supposed to learn about Jane Eyre, pick up the next assigned book, Great Expectations, and still get an A on the test on that book. (I don’t say this from personal experience, having avoided English courses like the plague, but I have witnessed it done by other people.)
So … the next time you take a math class, try the tips above and see what happens. Maybe it is hard. Maybe it takes you a lot more work than you had anticipated. That is good, because when you graduate from college you will learn that the hard stuff is what people pay you to do. You can read Jane Eyre on your own time. (Sorry, English teachers).
Any time you learn anything new it can be intimidating. That is true of programming as well as anything else. It may be even more true of using statistical software because you combine the uneasiness many people have about learning statistics with learning a new language.
To a statistician, this error message makes perfect sense:
ERROR: Variable BP_Status in list does not match type prescribed for this list.
but to someone new to both statistics and SAS it may be clear as mud.
Here is your problem.
The procedure you are using, such as PROC UNIVARIATE or PROC MEANS, is designed ONLY for numeric variables. You have tried to use it for a categorical variable.
This error means you’ve used a categorical variable in a list where only numeric variables are expected. For example, bp_status takes the values “High”, “Normal” and “Optimal”.
You cannot find the mean or standard deviation of words, so your procedure has an error.
So … what do you do if you need descriptive statistics?
Go back to your PROC UNIVARIATE or PROC MEANS and delete the offending variables. Re-run it with only numeric variables.
For your categorical variables, use PROC FREQ for a frequency distribution and/or PROC GCHART for a chart.
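As a concrete sketch, the SASHELP.HEART sample data set that ships with SAS has a BP_Status variable with exactly these values, along with numeric variables such as Systolic and Diastolic. Whether that is the data behind the error message above is an assumption here, so substitute your own data set and variable names.

```sas
/* Numeric variables only: means and standard deviations make sense here */
proc means data = sashelp.heart n mean std ;
  var systolic diastolic ;
run ;

/* Categorical variable: ask for counts and percentages instead */
proc freq data = sashelp.heart ;
  tables bp_status ;
run ;
```

Putting bp_status on the VAR statement of the PROC MEANS step is what would bring the error back.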
I’ve been busy my whole life. Right now, I’m finishing the last week of a course I’m teaching on biostatistics, writing a lecture for a course on multivariate statistics that starts next week, fixing bugs in our next game, Fish Lake, working on a new project for free resources for teachers, and working on a final grant report. Writing this, I just remembered a couple of things I needed to do.
Driving 90 miles to take The Spoiled One back to school and then turning right around and driving 90 miles home seemed like a waste of time that I did not have. The Invisible Developer pointed out that he had work to do also on the spear fishing part of the game and that he had picked her up on Friday.
So … away we went, and since she recently got her learner’s permit, The Spoiled One drove on the freeway for the first time. This was interesting in itself, since the 101 regularly makes the list of 10 most congested freeways in America.
Not only did she get nearly two hours of practice in driving, but I also got filled in on all of the latest news on her soccer team, college fairs, the campuses she was interested in visiting and life in general. If your child is 16 and still talks to you in a civil tone for two hours straight, count yourself among a lucky minority of parents.
Having raised four daughters, I know whereof I speak.
When we got to the school, she immediately began complaining (she’s not called The Spoiled One for nothing). According to her, she is living in “hell”. (See picture below for what hell looks like. It is surprisingly more scenic than I had imagined.)
What is so infernal about her school, I asked. They make her study. Even on Sundays. There is a study hall from 7 to 9 pm and she has to walk across the yard to get to the building. Yes, like prison.
Just as she was telling me this, I saw something in front of her dorm. It was a deer! I said we should go take pictures of it and she said we’d never be able to get close enough, and besides we were wasting time. She had to get to study hall and put away her clothes and books in her dorm room. Besides, her religion teacher had told the students to stay away from the deer because coyotes track them and students who got too close could get attacked by coyotes. (You would think a nun wouldn’t just go around making shit up, now wouldn’t you? Having spent a good bit of the last twenty-five years in North Dakota, I’m justifiably skeptical of the deer-coyote-mauled prep school student triumvirate.)
Just then, the deer walked through the gate onto the baseball field and I spotted a second one in there. So, we sneaked up on them and took pictures.
That’s when it occurred to me that sometimes the best use of my time is to “waste it”. Really, what better way to spend my time than talking to my daughter and watching deer grazing as the sun sets in the mountains?
But now, I really do need to finish that lecture.
In August, I attended a class at Unite 2014 (on Unity game development) and the presenter said,
“And some of you, your code won’t run and you’ll swear you did exactly what was shown in the examples. But, of course, all of the rest of us will know that is not true.”
This perfectly describes my experience teaching. For example, the problem with the LIBNAME.
I tell students,
Do not just copy and paste the LIBNAME from a Word document into your program. Often, this will cause problems because of extra formatting codes in the word processor. You may not see the code as any different from what you typed in, but it may not work. Type your LIBNAME statement into the program.
Apparently, students believe that when I say,
Do not just copy and paste the LIBNAME statement.
either, that what I really mean is,
Sure, go ahead and copy and paste the LIBNAME statement
or, that I did mean it but that is only because I want to force them to do extra typing, or because I am so old that I am against copying and pasting as a new-fangled invention and how the hell would I know if they copied and pasted it anyway.
Then their program does not work.
Very likely, their log looks something like this:
58 LIBNAME mydata “/courses/d1234455550/c_2222/” access=readonly;
59 run ;
NOTE: Library MYDATA does not exist.
Not all quotation marks are created equal.
What you see above, if you look very closely, is that the end quote at the end of the path for the LIBNAME statement does not exactly match the beginning quote. Therefore, your reference for your library was not
/courses/d1234455550/c_2222/
but rather, something like
/courses/d1234455550/c_2222/ access=readonly run ;
Which is not what you had in mind, and, as SAS very reasonably told you, that directory does not exist.
The simplest fix: delete the quotation marks and TYPE in quotes.
LIBNAME mydata '/courses/d1234455550/c_2222/' access=readonly;
If that doesn’t work, do what I said to begin with. Erase your whole LIBNAME statement and TYPE it into the program without copying and pasting.
Contrary to appearances, I don’t just make this shit up.
Computing confidence intervals is one of the areas where beginning statistics students have the most trouble. It is not as difficult if you break it down into steps, and if you use SAS or other statistical software.
Here are the steps:
1. Compute the statistic of interest – that is, the mean, proportion, or difference between means
2. Compute the standard error of the statistic
3. Obtain the critical value. Do you have 30 or more in your sample and are you interested in the 95% confidence interval?
- If yes, use 1.96 as your critical value
- If no (fewer people), look up the t-value for your sample size for .95
- If no (different alpha level), look up the z-value for your alpha level
- If no (different alpha level AND fewer than 30), look up the t-value for your alpha level.
4. Multiply the critical value you obtained in step #3 by the standard error you obtained in #2
5. Subtract the result you obtained in step #4 from the statistic you obtained in #1 . That is your lower confidence limit.
6. Add the result you obtained in step #4 to the statistic you obtained in #1. That is your upper confidence limit.
Simplifying it with SAS
Here is a homework problem:
The following data are collected as part of a study of coffee consumption among undergraduate students. The following reflect cups per day consumed:
3 4 6 8 2 1 0 2
A. Compute the sample mean.
B. Compute the sample standard deviation.
C. Construct a 95% confidence interval
I did this in SAS as so
data coffee ;
input cups @@ ;
datalines ;
3 4 6 8 2 1 0 2
;
proc means data = coffee mean std stderr ;
var cups ;
run ;
I get the following results.
Analysis Variable : cups
| Mean | Std Dev | Std Error |
| 3.25 | 2.66 | 0.94 |
These results give me A and B. Now, all I need to do to compute C is find the correct critical value. With 8 observations there are 7 degrees of freedom, so I look up the t-value and find that it is 2.365.
3.25 – 2.365 * .94 = 1.03
3.25 + 2.365 * .94 = 5.47
That is my confidence interval (1.03, 5.47)
If you want to verify it, or just don’t want to do any computations at all, you can do this
Proc means data = coffee clm mean stddev ;
var cups ;
run ;
You will end up with the same confidence intervals.
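If you would rather not look up the critical value in a table either, the QUANTILE function will return it. Here is a minimal sketch, plugging in the rounded mean and standard error from above. The data set name ci and the variable names are just for illustration.

```sas
data ci ;
  n    = 8 ;      /* sample size */
  xbar = 3.25 ;   /* sample mean from PROC MEANS */
  se   = 0.94 ;   /* standard error from PROC MEANS, rounded */
  /* 97.5th percentile of t with n - 1 degrees of freedom, */
  /* which is the critical value for a two-sided 95% interval */
  t_crit = quantile('T', 0.975, n - 1) ;
  lower  = xbar - t_crit * se ;
  upper  = xbar + t_crit * se ;
  /* write the three values to the log */
  put t_crit= 8.3 lower= 8.2 upper= 8.2 ;
run ;
```

The log should show a critical value of about 2.365 and limits of about 1.03 and 5.47, matching the hand computation.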
Prediction: At least one person who reads this won’t believe me, will run the analysis and be surprised when I am right.
A recent tweet about mixed martial arts decisions set me to thinking about probability. @Fight_ghost tweeted that a TV commentator made no sense when she said that she thought a fighter should have won by split, not unanimous decision. Others on Twitter agreed with him that it was a stupid comment, and asked whether she thought judges should say the other fighter only 2/3 won, or what.
I thought it did make sense in statistical terms. Think of it this way:
The “true score” of the population in this case is the mean of what an infinite number of judges would rate a fighter’s performance. Of course, there is going to be variation around that mean. Some judges may tend to weight take downs a tiny bit more. Judges vary in their definition of a significant strike. Some judges are just going to be clueless or inattentive and give a score that is far from accurate. On the average, though, these balance out and the mean of all of those infinite judges’ scores should be the true score. Let’s say our fighter, Bob, had a true score of 27. The most common score we should see a judge give him is 27, but a 26 or 28 would not be totally unexpected. Given that the standard deviation of fight scores is low, we would be surprised to see him given a score of 25 or 29 and completely floored if he received a 24 or a 30.
Let’s say we have a second fighter, Fred. His true score is 29. The most common score we should see for him is a 29, but again, a 28 or a 30 would not be unexpected because there is variation in our sample of judges.
Here is the point … when fighters are far apart in the true score of their performance, judges should very seldom have a difference of opinion in who won. Even when Bob is scored high, for him, at 28 and Fred is scored his average of 29, Fred still wins. Let’s say the standard deviation of judges’ scores is 1. I believe it is really lower than that and I do know that the winner of a round has to get 10 points, but for ease of computation, just go with me.
For Bob to win, he must be rated at least two standard deviations above his true score (which occurs 2.5% of the time) and Fred must be rated below his true score, which occurs half the time. Since the scores for Bob and Fred are independent probabilities the probability of BOTH of these events happening is .025 x .5 = .0125
The other way for Bob to win is if Fred scores two standard deviations below his true score, which will occur 2.5% of the time AND for Bob to score above his true score. Again, the combined probability is .0125. SO …. only 2.5% of the time (.0125 + .0125) would Bob win. Since judges’ scores are independent, the probability of any one scoring it for him, causing a split decision is .025 + .025 + .025 = 7.5%
(If all three judges scored it for Bob, that would be a very, very low probability of .025 * .025 * .025 because, again, the judges’ scores are assumed independent of one another. In only about 0.0016% of the cases would this occur. We should probably subtract that and the probability of two of them scoring it for Bob to be exact, but I have to finish grading papers tonight so we’ll just acknowledge that it is not exactly 7.5% and move on.)
Let’s go back to the fight that actually happened. I didn’t see it so I am going to take some people’s word that it was a close fight. They might be lying but let’s assume not.
In this case, Bob, who has a true score of 27, is not fighting Fred, but rather, Ignatz, who has a true score of 27.3 (with three judges, he’d get a 27, 27, 28 score). There is great overlap in Bob and Ignatz’s scores. To outscore Ignatz’s average score, Bob would need a score of 27.4 – well, a z-score of .4 or higher occurs about 35% of the time. Half of the time Ignatz is going to score 27.3 or lower so the probability of him both having an average or below score AND Bob having a 27.4 or higher score is .5 * .35 or .175. So 17.5% of the time, a judge would give Bob a higher score. Since there are three judges, the probability of ONE of them giving him a higher score would be roughly .175 + .175 + .175 = 52.5% (again an overestimate, since simply adding ignores the cases where more than one judge does so).
There is also the small probability that it could go unanimous the other way, but that’s not really pertinent to our argument.
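For anyone who wants the exact figure instead of the add-them-up shortcut: if each judge independently scores it for Bob with probability p, the chance that at least one of the three judges does so is 1 - (1 - p)^3. Here is a quick sketch using the two per-judge probabilities worked out above (.025 for Bob against Fred, .175 for Bob against Ignatz). The variable names are made up.

```sas
data _null_ ;
  do p = 0.025, 0.175 ;
    /* probability that none of the three judges scores it for Bob */
    none = (1 - p) ** 3 ;
    /* complement: at least one judge scores it for Bob */
    at_least_one = 1 - none ;
    put p= at_least_one= 6.4 ;
  end ;
run ;
```

That works out to about 7.3% against Fred and about 43.8% against Ignatz. The shortcut overstates things more as p grows, but the conclusion does not change: a dissenting judge is far more likely when the fighters are close.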
The point is simply this … if two fighters’ true scores are close, it is much less likely that you will see a unanimous decision than if their true scores are really far apart. The closer they are, the more that statement holds. So, no, it is not a stupid comment to say that you believe someone warranted a split decision rather than a unanimous decision. It may simply mean that you think the fighters were so close that you were surprised there was not any variance in favor of the only slightly better fighter.
Really, I think most people would find that a reasonable statement.
Extra credit points:
Give one reason why the Central Limit Theorem does not apply in the above scenario.
Answer this question:
Does the fact that the distribution of errors is necessarily non-symmetric in Fred’s case (cannot score above 30) negate the application of the Central Limit Theorem?