(There may even be a part two, if I get around to it.)
Let me ask you a couple of questions:
1. Do you have more than just one dependent variable and one independent variable?
2. If you said yes, do you have a CATEGORICAL or ORDINAL dependent variable? If so, use logistic regression. I have written several posts on it. You can find a list of them here. Some involve Euclid, marriage, SAS and SPSS. Alas, none involve a naked mole rat. I shall have to remedy that.
3. You said yes to #1, multiple variables, but no to #2, so I am assuming you have multiple variables in your design and your dependent variable is interval or continuous, something like sales for the month of December, average annual temperature or IQ. The next question is: do you have only ONE dependent variable, and is it measured only ONCE per observation? For example, you have measured average annual temperature of each city in 2013, or sales in December 2012. In this case, you would do either Analysis of Variance or multiple regression. It doesn’t matter much which you do if you code it correctly. Both are specific cases of the general linear model and will give you the same result. You may also want to do a general linear MIXED model, where you have city as a random effect and something else, say, whether the administration was Democratic or Republican, as a fixed effect. In this case I assume that you have sales as your dependent variable because, contrary to the beliefs of some extremists, political parties do not determine the weather. Generally, whether you use a mixed model or an Ordinary Least Squares (OLS) plain vanilla ANOVA or regression will not have a dramatic impact on your results, unless the result is a grade in a course where the professor REALLY wants you to show that you know that school is a random effect when comparing curricula.
4. Still here? I’m guessing you have one of two other common designs. That is, you have measured the same subjects, stores, cities, whatever, more than once. Most commonly, it is the good old pretest posttest design and you have an experimental and control group. You want to know if it works. If you have only tested your people twice, you are perfectly fine with a repeated measures ANOVA. If you have tested them more than twice, you are very likely to have grossly violated the assumption of compound symmetry and I would recommend a mixed model.
5. All righty then, you DO have multiple variables, they are NOT categorical or ordinal, your dependent variable is NOT repeated, so you must have multiple dependent variables. In that case, you would do a multivariate Analysis of Variance.
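If it helps to see that checklist as logic, here is a toy sketch of it in Python. The function and argument names are all made up for illustration, and it is obviously no substitute for understanding your design:

```python
def suggest_analysis(multiple_ivs, dv_categorical,
                     repeated_measures=0, multiple_dvs=False):
    """Walk the decision checklist above and name an analysis.

    repeated_measures is how many times each subject was measured
    (0 means the dependent variable was measured only once).
    """
    if not multiple_ivs:
        return "simple bivariate statistics"          # question 1
    if dv_categorical:
        return "logistic regression"                  # question 2
    if repeated_measures == 0 and not multiple_dvs:
        return "ANOVA or multiple regression (general linear model)"  # 3
    if repeated_measures == 2:
        return "repeated measures ANOVA"              # question 4
    if repeated_measures > 2:
        return "mixed model"                          # question 4
    return "multivariate analysis of variance (MANOVA)"  # question 5
```

For example, `suggest_analysis(True, False, repeated_measures=4)` points you toward a mixed model, for exactly the compound symmetry reason in #4.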
Some might argue that logistic regression is not a multivariate design. Other people would argue with them that, assuming your data are multinomial, you need multiple logit functions so that really is a type of multivariate design. A third group of people would say it is multivariate in the ordinal or multinomial case because there are multiple possible outcomes.
Personally, I wonder about all of those types of people. I wonder about the amount of time in higher education spent in forcing students to learn answers to questions that have no real use or purpose as far as I can see.
On the other hand, while knowing whether something falls in the multivariate category or not probably won’t impact your life or analyses, if you treat time as an independent variable and analyze your repeated measures ANOVA with experiment and condition as a 2 x 2 ANOVA, you’re screwed.
Know your research designs.
There may come a day (shudder) when I am called upon to find what my mother refers to as “a real job”. I’m not sure how I would go about it. For the past 30 years, here is how my career has gone.
- I was walking in the door at the university where I had dropped in to visit a friend. A professor I’d known as a student years ago was walking out the other door. She said, “Hey, AnnMaria, I need a statistical consultant. Are you available?”
- I was walking in the door of my obstetrician’s office when I ran into someone else I had known years ago. She had been seeing another doctor in the same office. She said, “Hey, I just got a grant and need a statistician.”
- I was bored one day and applied for a job on Dice.com. They called up, interviewed me and hired me.
- I was bored one day and applied for a job I saw advertised in the Chronicle of Higher Ed. They interviewed me and hired me.
I think one can clearly detect a pattern here, mainly that I should spend more time walking in doors to buildings.
When getting a new job, I’ve generally been in the work equivalent of “married but looking”. I know that sounds horrible but what I mean is that I have had a job that I was considering getting out of, but I didn’t necessarily want the people at the job to know that.
This problem is common to people in any field, but I think those in analytic jobs have another problem.
Enter Analyst Finder. This is the new company started by Art Tabachneck of SAS fame. If you’ve been using SAS for any length of time at all, you’ve run across his papers, and if you live in Canada and drive a car you have been affected by his work: he uses SAS to set automobile insurance rates.
I checked out the site and it takes less than 15 minutes to fill out a form to be included in their database. The really cool thing is that it asks about so many areas of expertise – which industries you have experience in, whether you are familiar with SAS, SQL, ANCOVA … it is a very, very long list – but you can just check off the boxes that apply to you.
If I were actually looking for a job, I might have spent a little time filling in the “essay questions” that allow you to expand on your credentials as well.
How it works
Currently, Art is compiling a database of analysts. Once this is of reasonable size, employers will be able, for a very modest fee – around $300 – to submit position descriptions. Analysts who match those descriptions will be contacted and asked if they are interested. The 20 names with the closest match who have expressed interest will be sent to the employer with contact information.
For an employer, it sounds like a great service. And if I’m ever in the market for a “real job”, as a job seeker it’s the first place I would hit up.
So … go check it out. It’s totally free to analysts, a category that is very broadly defined. If you’re interested, download the form, fill it out and send it back.
It’s a more scientific method for running around the city walking through doors hoping you run into someone who offers you a job.
Speaking of which, I need to be walking in the door of my office in less than 8 hours, so I guess I’ll call it a night.
This is the most depressing chart I have seen in a long time. Below are the results of our pretest on knowledge of fraction operations of 322 students in grades 3 through 7, attending schools on and adjacent to American Indian reservations.
These are questions like,
“Drag 6/1 to the correct spot on the number line.”
That was one of only two questions that at least 50% of the children answered correctly.
“Identify the letter that marks 7/8 on a number line.”
14% of the children answered that right.
Then there are the word problems,
“Bob and Ted painted a wall. Bob painted 1/5 of the wall and Ted painted 2/5 of the wall. How much of the wall is left to paint?”
38% of the children answered that correctly.
Looks like they did better on item 7, which asks which of these statements is true:
5/ < 3/4
2/8 < 1/4
3/6 = 6/
2/ = 4/5
26% of them got that correct. Guess what? That was one of the few multiple-choice items on the test, so random guessing would have gotten it right 25% of the time.
This is a test of what is ostensibly third- through fifth-grade math. Two-thirds of the test is at the fourth-grade level or below. As our results indicate, the majority of the students who took the test would not understand what that statement means.
For the 163 fifth-graders who took our pretest, the mean score was 28%.
For the 114 fourth-graders, the mean was a dismal 14.7%.
It wasn’t that the students didn’t try. I looked and there were very few places they left the items blank. They simply did not know.
These students came from several different schools, and while there may be differences between schools, there is nothing to suggest one school with abysmal results pulled down all of the others.
I called our lead cultural consultant, Dr. Erich Longie, out at Spirit Lake, and told him I was concerned that if I presented these results to the schools, they might want to shoot the messenger. After all, it is important to us that these schools continue to provide us their input and guidance. He told me not to worry about it too much.
“They know,” he told me. “As someone who has been a teacher and administrator in schools on the reservations, I’m not surprised by the results, and I can’t imagine these schools will be, either. What we all ought to be worried about is making sure that the post-test scores don’t look like this.”
So … students will start playing Fish Lake in the schools next month. No pressure here.
Excuse me while I get back to work.
I was going to call this new category for my blog
“Mama AnnMaria’s advice on not getting your ass fired” but it turned out to be too long to fit in the box.
It may surprise young people in the work place to find out that people who admit to having screwed up are often valued more as employees than those who are blameless.
Who cares whose fault it is?
One of the things that drives me crazy is when the first thing (and sometimes the second and third thing) an employee does in response to a problem is to find proof that it was not his or her fault. There are a whole lot of reasons why this is stupid, bad and will eventually get your ass fired.
Are you exclaiming,
What? Why would you fire the one person who never makes a mistake?
Well, for starters, you are clearly delusional. Everybody makes mistakes so if you are convinced you NEVER make mistakes, it is never your fault, then you have a tenuous grasp on reality that you may suddenly lose one day and begin mowing down your co-workers with an Uzi, convinced that they are evil demon zombies out to eat your non-mistake-making perfect brain. As a responsible employer, I cannot take that chance.
Next is the fact that you are wasting time and energy. You could have found the missing data and gotten it to Dr. Cflange. Instead, you put your effort into digging up that email from seven months ago where Bob said we didn’t need to worry about sending the data to Dr. Cflange, to prove that it wasn’t your fault our collaborator never got it; after all, Bob told you not to bother. So, here we are, three hours later, and Dr. C still hasn’t gotten the data. Besides, the fact that Bob told you that seven months ago when Dr. Cflange was in Uzbekistan does not absolve you of the responsibility of sending out that data any time until the end of the world. Plus, Bob hates you now.
Which brings me to my next point – if you are always claiming you are blameless, then by implication, you are blaming someone else. Your boss is not stupid.
It’s like that time when my mom came home and the front window was broken. She asked what happened and we all swore up and down that we had nothing to do with it. She asked,
“So, you were all just standing around and the glass just fell out of the window?”
We all swore that yes, it had happened exactly like that.
(Mom, if you are reading this, it wasn’t me that pushed one of the Slattery boys into the window. Just so you know.)
Unlike me, who did not throw said sibling under the bus, if you are pointing at Bob and saying,
“It was him, it’s his fault, not me!”
Then, guess how likely Bob is to be inclined to help you out in the future. So … people who are always blaming everyone around them are not going to have as good teamwork with their co-workers.
Listen carefully here, because this next part is really important. Let’s assume the people you work with are not idiots, that there is a reason you are working for them instead of them working for you. Let’s call that reason “experience”. Not being idiots, your bosses realize that everyone makes mistakes.
Employers are not looking for people who never make mistakes. Those people don’t exist. They are looking for people who can fix problems.
Here are the final two reasons why never taking responsibility for any mistake is eventually going to get your ass fired:
If every time an issue comes up it’s like an argument before the Supreme Court to get you to address it because you are so involved in gathering your evidence why it was not your fault, eventually people will quit pointing out problems to you because it’s just not worth the hassle.
If you never believe that any problem is your fault, then you will never get any better at preventing them, because none of the problems that occur have anything to do with you.
The most impressive interactions I have with employees often begin like this:
“That was my mistake that X happened. I would like to take the responsibility of fixing it by doing Y.”
Those people are probably never going to get their asses fired.
Now you know. Act accordingly.
What if you wanted to turn your PROC MIXED into a repeated measures ANOVA using PROC GLM? Why would you want to do this? Well, I don’t know why you would want to, but I wanted to demonstrate for my class that both give you the same fixed-effects F value and significance.
I started out with the Statin dataset from the Cody and Smith textbook. In this data set, each subject has three records, one each for drugs A, B and C. To do a mixed model with subject as a random effect and drug as a fixed effect, you would code it like so. Remember to include both the subject variable and your fixed effect in the CLASS statement.
proc mixed data = statin ;
class subj drug ;
model ldl = drug ;
random subj ;
To do a repeated measures ANOVA with PROC GLM you need three variables for each subject, not three records.
First, create three data sets for Drug A, Drug B and Drug C.
data one two three ;
set statin ;
if drug = 'A' then output one ;
else if drug = 'B' then output two ;
else if drug = 'C' then output three ;
Second, sort these datasets and as you read in each one, rename LDL to a new name so that when you merge the datasets you have three different names. Yes, I really only needed to rename two of them, but I figured it was just neater this way.
proc sort data = one (rename= (ldl =ldla)) ;
by subj ;
proc sort data= two (rename = (ldl = ldlb)) ;
by subj ;
proc sort data=three (rename =(ldl = ldlc)) ;
by subj ;
Third, merge the three datasets by subject.
data mrg ;
merge one two three ;
by subj ;
Fourth, run your repeated measures ANOVA.
Your three LDL measurements are the dependent variables. It seems weird not to have an independent variable on the other side of the equation, but that’s the way it is. In your REPEATED statement you give a name for the repeated variable and the number of levels. I used “drug” here to be consistent but actually, this could be any name at all. I could have used “frog” or “rutabaga” instead and it would have worked just as well.
proc glm data = mrg ;
model ldla ldlb ldlc = /nouni ;
repeated drug 3 (1 2 3) ;
Now you can be happy.
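For what it’s worth, the split, sort, and merge above is nothing more than a long-to-wide reshape. Here is the same logic sketched in Python, with invented LDL values, in case seeing it in a second language makes it clearer:

```python
# Long format: one record per subject-drug combination, like the Statin
# data. The LDL values here are made up purely for illustration.
long_records = [
    {"subj": 1, "drug": "A", "ldl": 182}, {"subj": 1, "drug": "B", "ldl": 172},
    {"subj": 1, "drug": "C", "ldl": 164}, {"subj": 2, "drug": "A", "ldl": 199},
    {"subj": 2, "drug": "B", "ldl": 190}, {"subj": 2, "drug": "C", "ldl": 177},
]

# Wide format: one row per subject, with ldla, ldlb, ldlc as variables.
wide = {}
for rec in long_records:
    row = wide.setdefault(rec["subj"], {"subj": rec["subj"]})
    row["ldl" + rec["drug"].lower()] = rec["ldl"]

wide_rows = sorted(wide.values(), key=lambda r: r["subj"])
```

Each subject ends up as one row with ldla, ldlb and ldlc, which is exactly what the MERGE by subj produces.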
I have been teaching at the post-secondary level since 1987, at schools ranging from a small liberal arts college in North Dakota to the second-largest non-profit university in the country. I’ve taught at private schools and public ones, and courses ranging from first-year undergraduate to doctoral level. In all of those situations, some students aced the courses and some students failed. The difference between those students was NOT, as some might believe, that the students with A’s had some sort of magical math gene the others didn’t. Nope. Here are seven tips on how not to fail a college math class.
- Have the textbook when the class starts. Textbooks are required for a reason. That reason is primarily that the instructor does not have the time to tell you in the lecture everything that might be useful. In every course I have taught, at least one student has told me that he or she does not have the textbook yet. This makes me wonder, “Did you not know you were going to take this course?”, because I am pretty certain that I told the university which book would be required two months ago. Even if you have an excellent reason for not having the textbook, falling a week behind in the reading makes the class more difficult.
- Read the assigned readings. You are supposed to read them. That is what “assigned” means. See #1. Also, some of the stuff you learn might not be so easy. This is why it is good to go over it twice, once in the lecture and once by reading it.
- Attend all of the lectures. It can’t hurt. See #2. Very few professors are so terrible that you cannot learn anything from them. If you think the professor is difficult to understand, perhaps it is because you did not read the assigned readings before the class so this is the first time you have been exposed to this material. Maybe you missed the last lecture where he or she explained the information that is PREREQUISITE to understanding the information covered in this lecture.
- If you still don’t understand, read the textbook again. I was an excellent student in statistics. It is what I specialized in for my Ph.D. (along with Tests & Measurement). The only statistics courses I did not get an A in, I got an A+. And still … there were many times when I read the textbook, thought I understood it, tried the problems at the end of the chapter and realized I didn’t understand it so well after all. So, I read the chapter again. Sometimes for a third time.
- Don’t try to cram at the last minute. Math builds on itself. If you did not understand chapter two, you are going to have a hard time with chapter three. If you just read it for the first time at 3 a.m. the night before the final exam, I’m guessing you didn’t understand chapter two very well.
- Ask for help as soon as you don’t understand something. How to ask for help is a whole post in itself.
- Don’t study drunk or high. This may sound like really unnecessary advice but I see people doing it. Most often it is because they are young and stupid, so drinking and getting high is part of what they do in college. Sometimes, they have fallen behind, are stressed out about not doing well in their math classes (often due to numbers 1 through 6 above), so they have a drink or smoke a joint so they can relax a little before tackling the books. “Hey, you know what would improve my ability to estimate variance? The same substance that so impairs my ability to estimate distance that they make it illegal to use while driving!”
A common factor in the first six of these is that math is cumulative. You can have messed up on the section of a literature course covering whatever it is you were supposed to learn about Jane Eyre, pick up the next assigned book, Great Expectations, and still get an A on the test on that book. (I don’t say this from personal experience, having avoided English courses like the plague, but I have witnessed it done by other people.)
So … the next time you take a math class, try the tips above and see what happens. Maybe it is hard. Maybe it takes you a lot more work than you had anticipated. That is good, because when you graduate from college you will learn that the hard stuff is what people pay you to do. You can read Jane Eyre on your own time. (Sorry, English teachers).
Shameless plug – It’s Small Business Saturday
Learn about math AND support small business
Any time you learn anything new it can be intimidating. That is true of programming as well as anything else. It may be even more true of using statistical software because you combine the uneasiness many people have about learning statistics with learning a new language.
To a statistician, this error message makes perfect sense:
ERROR: Variable BP_Status in list does not match type prescribed for this list.
but to someone new to both statistics and SAS it may be clear as mud.
Here is your problem.
The procedure you are using, whether PROC UNIVARIATE or PROC MEANS, is designed ONLY for numeric variables, and you have tried to use it with a categorical variable.
In other words, you’ve used a categorical variable in a list where only numeric variables are expected. For example, bp_status takes the values “High”, “Normal” and “Optimal”.
You cannot find the mean or standard deviation of words, so your procedure has an error.
So … what do you do if you need descriptive statistics?
Go back to your PROC UNIVARIATE or PROC MEANS and delete the offending variables. Re-run it with only numeric variables.
For your categorical variables, use PROC FREQ for a frequency distribution and/or PROC GCHART.
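If you would rather not eyeball your variable list, the triage can be automated. Here is a hypothetical little helper, sketched in Python rather than SAS: it calls a variable numeric only if every one of its values is a number, and sends everything else to the categorical pile.

```python
def split_variables(rows, varnames):
    """Split variable names into (numeric, categorical) lists.

    A variable counts as numeric only if every value is an int or
    float; anything with words in it, like bp_status, is categorical.
    """
    numeric, categorical = [], []
    for name in varnames:
        values = [row[name] for row in rows]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in values):
            numeric.append(name)
        else:
            categorical.append(name)
    return numeric, categorical
```

Run your means procedure on the first list, and your frequency tables on the second.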
I’ve been busy my whole life. Right now, I’m finishing the last week of a course I’m teaching on biostatistics, writing a lecture for a course on multivariate statistics that starts next week, fixing bugs in our next game, Fish Lake, working on a new project for free resources for teachers, and working on a final grant report. Writing this, I just remembered a couple of things I needed to do.
Driving 90 miles to take The Spoiled One back to school and then turning right around and driving 90 miles home seemed like a waste of time that I did not have. The Invisible Developer pointed out that he had work to do also on the spear fishing part of the game and that he had picked her up on Friday.
So … away we went, and since she recently got her learner’s permit, The Spoiled One drove on the freeway for the first time. This was interesting in itself, since the 101 regularly makes the list of 10 most congested freeways in America.
Not only did she get nearly two hours of practice in driving, but I also got filled in on all of the latest news on her soccer team, college fairs, the campuses she was interested in visiting and life in general. If your child is 16 and still talks to you in a civil tone for two hours straight, count yourself among a lucky minority of parents.
Having raised four daughters, I know whereof I speak.
When we got to the school, she immediately began complaining (she’s not called The Spoiled One for nothing). According to her, she is living in “hell”. (See picture below for what hell looks like. It is surprisingly more scenic than I had imagined.)
What is so infernal about her school, I asked. They make her study. Even on Sundays. There is a study hall from 7 to 9 pm and she has to walk across the yard to get to the building. Yes, like prison.
Just as she was telling me this, I saw something in front of her dorm. It was a deer! I said we should go take pictures of it and she said we’d never be able to get close enough, and besides we were wasting time. She had to get to study hall and put away her clothes and books in her dorm room. Besides, her religion teacher had told the students to stay away from the deer because coyotes track them and students who got too close could get attacked by coyotes. (You would think a nun wouldn’t just go around making shit up, now wouldn’t you? Having spent a good bit of the last twenty-five years in North Dakota, I’m justifiably skeptical of the deer-coyote-mauled prep school student triumvirate.)
Just then, the deer walked through the gate on to the baseball field and I spotted a second one in there. So, we sneaked up on them and took pictures.
That’s when it occurred to me that sometimes the best use of my time is to “waste it”. Really, what better way to spend my time than talking to my daughter and watching deer grazing as the sun sets in the mountains.
But now, I really do need to finish that lecture.
In August, I attended a class at Unite 2014 (on Unity game development) and the presenter said,
“And some of you, your code won’t run and you’ll swear you did exactly what was shown in the examples. But, of course, all of the rest of us will know that is not true.”
This perfectly describes my experience teaching. Take, for example, the problem with the LIBNAME statement.
I tell students,
Do not just copy and paste the LIBNAME from a Word document into your program. Often, this will cause problems because of extra formatting codes in the word processor. You may not see the code as any different from what you typed in, but it may not work. Type your LIBNAME statement into the program.
Apparently, students believe that when I say,
Do not just copy and paste the LIBNAME statement.
either, that what I really mean is,
Sure, go ahead and copy and paste the LIBNAME statement
or, that I did mean it but that is only because I want to force them to do extra typing, or because I am so old that I am against copying and pasting as a new-fangled invention and how the hell would I know if they copied and pasted it anyway.
Then their program does not work.
Very likely, their log looks something like this:
58 LIBNAME mydata “/courses/d1234455550/c_2222/” access=readonly;
59 run ;
NOTE: Library MYDATA does not exist.
Not all quotation marks are created equal.
What you see above, if you look very closely, is that the end quote at the end of the path for the LIBNAME statement does not exactly match the beginning quote. Therefore, your reference for your library was not
/courses/d1234455550/c_2222/
but rather, something like
/courses/d1234455550/c_2222/ access=readonly run ;
Which is not what you had in mind, and, as SAS very reasonably told you, that directory does not exist.
The simplest fix: delete the quotation marks and TYPE in quotes.
LIBNAME mydata '/courses/d1234455550/c_2222/' access=readonly;
If that doesn’t work, do what I said to begin with. Erase your whole LIBNAME statement and TYPE it into the program without copying and pasting.
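If you absolutely insist on pasting, you could at least scrub what you paste first. Here is a sketch of that idea in Python; the mapping covers the four curly quotation marks word processors usually substitute:

```python
# Map curly ("smart") quotes to the plain ASCII quotes SAS expects.
SMART_QUOTES = {
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
}

def scrub_quotes(statement):
    """Return the statement with smart quotes replaced by straight ones."""
    for smart, plain in SMART_QUOTES.items():
        statement = statement.replace(smart, plain)
    return statement
```

Paste your statement through something like this before running it, and the “Library MYDATA does not exist” mystery goes away.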
Contrary to appearances, I don’t just make this shit up.
Computing confidence intervals is one of the areas where beginning statistics students have the most trouble. It is not as difficult if you break it down into steps, and if you use SAS or other statistical software.
Here are the steps:
1. Compute the statistic of interest – that is, the mean, proportion, or difference between means
2. Compute the standard error of the statistic
3. Obtain the critical value. Do you have 30 or more in your sample, and are you interested in the 95% confidence interval?
- If yes, use a critical value of 1.96.
- If no (fewer people), look up the t-value for your degrees of freedom at the .95 level.
- If no (different alpha level), look up the z-value for your alpha level.
- If no (different alpha level AND fewer than 30), look up the t-value for your alpha level.
4. Multiply the critical value you obtained in step #3 by the standard error you obtained in #2
5. Subtract the result you obtained in step #4 from the statistic you obtained in #1 . That is your lower confidence limit.
6. Add the result you obtained in step #4 to the statistic you obtained in #1. That is your upper confidence limit.
Simplifying it with SAS
Here is a homework problem:
The following data are collected as part of a study of coffee consumption among undergraduate students. The following reflect cups per day consumed:
3 4 6 8 2 1 0 2
A. Compute the sample mean.
B. Compute the sample standard deviation.
C. Construct a 95% confidence interval
I did this in SAS like so. (Note the DATALINES statement to read in the raw numbers, and the @@ so all eight values can go on one line.)

data coffee ;
input cups @@ ;
datalines ;
3 4 6 8 2 1 0 2
;
proc means data=coffee mean std stderr ;
var cups ;
run ;
I get the following results (rounded to two decimal places).

|Analysis Variable : cups|
|Mean|Std Dev|Std Error|
|3.25|2.66|0.94|
These results give me A and B. Now, all I need to do to compute C is find the correct critical value. I look it up and find that it is 2.365
3.25 - 2.365 * .94 = 1.03
3.25 + 2.365 * .94 = 5.47
That is my confidence interval (1.03, 5.47)
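If you want to check that arithmetic without SAS, the six steps can be run in a few lines of Python using only the standard library (2.365 is the same two-tailed t critical value for 7 degrees of freedom as above):

```python
import math
import statistics

cups = [3, 4, 6, 8, 2, 1, 0, 2]

mean = statistics.mean(cups)                 # step 1: the statistic
sd = statistics.stdev(cups)                  # sample standard deviation
se = sd / math.sqrt(len(cups))               # step 2: standard error
t_crit = 2.365                               # step 3: t for df = 7, 95% CI
margin = t_crit * se                         # step 4
lower, upper = mean - margin, mean + margin  # steps 5 and 6
```

The result matches the hand computation: a mean of 3.25 with a confidence interval of about (1.03, 5.47).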
If you want to verify it, or just don’t want to do any computations at all, you can do this:

proc means data=coffee clm mean stddev ;
var cups ;
run ;
You will end up with the same confidence intervals.
Prediction: At least one person who reads this won’t believe me, will run the analysis and be surprised when I am right.