My work day started with a call on research design and ended ten hours later after I fixed a program that wasn’t working. I just resigned from my position as senior statistical consultant at a major research university so that I could concentrate on research. I’m on the technical staff of several projects, have a Ph.D. and a record of scientific publication, and am frequently an invited speaker on assessment, methodology and SAS programming. So, what am I whining about?
Who the fuck are you to say that I am whining?
That, my dear, is probably one of the reasons that I have been successful in this field, and one of the problems women in technology face.
I’m the size of the average twelve-year-old, female, Hispanic and over fifty to boot. Despite all of these disadvantages, I am doing well in this field, thank you, because I have a few …
Qualities that I don’t think should be necessary for women in technical fields, but they are….
1. I can be a straight-A dyed in the wool bitch when the situation warrants.
One day, I was sitting in a faculty meeting with the suspicion that women in our department were not being taken seriously. As a statistician, I decided to collect a little data. I drew a cross-tabulation. The rows were the gender of the speaker and the columns were whether the next speaker responded – questioned, followed up, elaborated – or ignored the comment as if the speaker hadn’t even said anything. Of the speakers, 80% were male (the department was about 50% female) and of the remaining 20%, most of the comments were made by me. Near the end of the meeting, I made a comment and again, a male member of the faculty made a remark as if I hadn’t spoken. I pounded on the table and said,
“I said something and God damn it, you are all going to listen to me!”
Then, I mentioned the data I had been collecting during the meeting (believe me, the chi-square was highly significant). The two department chairs present were somewhat embarrassed but no one argued with my data. We discussed whatever the topic was – I think it was reducing our mathematics requirement for general education.
Personally, I don’t have a problem pounding on the table and swearing if that’s what it takes. Three points:
- The men in the room didn’t need to be that way.
- Not all women are like me.
- Not all women should HAVE to be like me. I have a pretty high self-esteem but not so high that I think everyone must be like me because I am so perfect.
Women who support Arrington’s view on Tech Crunch that it isn’t men’s fault that there aren’t more women in tech because “After all, look at me, I’m not complaining and I’m doing great” are perhaps missing the point that they are doing well because they have certain characteristics that men don’t need to have.
In my copious spare time, of which I have none, I teach judo. In 1984, I was the first American to win the world judo championships. One very important lesson it took me a while to learn as a coach is that not every athlete is me. Not every world class athlete is me. I would have been a better coach if I had learned that lesson earlier.
2. I aggressively seek out mentors and figure everything out all by myself if I can’t find them.
I was discussing this with a young woman today. She’s probably my daughter’s age. We were working on a program and I commented to her that I had noticed she did not get the off-handed kind of help that the male staff members got. The men tell one another excitedly about new apps, new functions, bug fixes and other interesting and useful information they come across. She said,
“Now, I don’t understand this program at all, but you are explaining it to me where the guys would just be like – here, you’re not interested in this, let me do it. Or, you don’t know how to do this, so just go away.”
I know she is telling the truth because I have seen just exactly that happen to her many times. Maybe I should be more of a mentor. I feel a little bad about that, but she doesn’t work for me, and hey, I am busy. I told her,
“Well, of course you don’t understand it! No one comes out of the womb knowing this shit. But you’re smart, you’ll get it. Just keep plugging away. If you have any questions, ask!”
The program she wrote in the end was very good. Women, much more than men, in my experience, need to be immune to subtle and not so subtle discouragement, to disrespect. While Arrington says that Tech Crunch goes out of their way to invite women, these are the women who have already made it. Where men generally don’t go out of their way, and in fact, don’t even think about it, is in the unexamined assumptions and treatment of women. Most of the men this woman works with are very nice people who like women in general and are married to one or would very much like to be some day. They don’t treat me like this because …
3. I got all the credentials
I have a Ph.D., two master’s degrees, 28 years of programming experience, articles published in academic journals and so on. There is an enormous body of literature in social psychology on bias. In brief, the same study done over and over runs like this.
The identical resumes are sent out. Half of them have experience but no degree. Half have a degree but no experience. Of the entire sample of resumes, half of them have a name (or picture) that shows them to be female (or black). The other half are male (or white). The two resumes, experience/degree and white/black or male/female, are then sent out randomly to a group of college students, personnel managers or whatever group.
The results are always the same. Overwhelmingly, the male (or white) candidate is selected. Those who choose the male candidate swear it had nothing to do with gender, he had experience. However, the managers/students/whatever who had the reversed resumes swear it had nothing to do with gender, he had a degree.
This is why all those people who loudly proclaim “I’m not a racist” or “I’m not sexist” have me wanting to slap them.
I have experience and I have the degrees. I work with men who don’t have nearly the educational qualifications I have. These guys are SMART and they’re fun and I like working with them. I truly don’t believe any woman would get the jobs they have without a graduate degree – and guess what, there are very, very few.
The one you hear next, of course is, “You just wouldn’t fit in with our team.”
Despite the impression I might give, I actually believe that most people are genuinely good at heart and well-meaning, that the false assumptions and subtle discrimination are not intentional and that they really would try to change most of them if they were pointed out. Some people are just jerks, though. There are people who would never want me working for them because I “am not a team player”.
Let me give you an example of a person who said he would never have me work for him.
I had written a program that was, if I do say so myself, a pretty kick-ass awesome piece of work. As with most things that are that awesome, there were other people who helped, who came up with design suggestions, reviewed the results and made recommendations for improvement. All of the coding was done by me. I don’t get the chance to just write code that much and I was justly proud of this product.
4. Have the luck to have awesome bosses and mentors
We had a matrix management model at the time and the project manager, who was not my boss, came to me and wanted to have UMF review all of my work and “check that it is correct”. Now UMF is male and fits the stereotype of what a programmer should look like, which, I could gauge from this, is not a Latina grandmother. UMF also is a complete waste of oxygen as a programmer. Think of the absolutely stupidest code you have ever seen written and that is UMF. I did not make up the acronym UMF. This is how he is referred to by the other programmers. The U stands for useless.
Short version of long story, the project manager weenie went to my boss and told him that,
“AnnMaria says she’s not going to do this.”
to which my boss responded,
“Well, I guess that means she’s not going to do it.”
Dr. Richard Eyman was my doctoral advisor. He spent endless hours teaching me statistics. When everyone but me dropped one of the upper level doctoral courses, he taught it to me as an independent study. He introduced me to his friends who were profoundly competent in psychometrics, people like Jane Mercer. He made sure I took courses from people like Keith Widaman and Lew Petronovich.
It was just luck. I attended UC Riverside because I was pregnant with my second child, my husband had just taken a job at Rohr Aircraft in Riverside and I didn’t want a blank spot on my resume while I was out of the job market having a baby (which turned out to be two babies in thirteen months).
5. I’m not bothered that no one in the room looks like me.
Being in judo probably helped my career. I’m startled by the number of female judo competitors I meet who are in the tech field. It’s kind of ironic that a non-male, non-Japanese American would be the first from the U.S. to win the world championships because that is certainly not the demographic of U.S. judo competitors. I’ve spent so much of my life being the only woman in the room that I am used to it. It’s actually gotten better. I remember 28 years ago when I was pregnant (hence needing a bathroom every 30 minutes) at a meeting in an aerospace plant where NO ONE knew where there was a women’s restroom because all of the people I was meeting with were male engineers. I finally spotted another woman, grabbed her and said I KNOW you know! Turns out she was just visiting, and, in fact, did not!
Whether it should bother you or not that no one is like you (that you “just don’t fit in”) is a separate issue.
My point is that there are a number of characteristics that women must have that men don’t need to be successful in technology.
These are but PART of the reasons I see that there are few women in technical fields. And why, exactly, is pointing this out called whining?
I probably hadn’t thought about canonical correlation in twenty years, but then a problem came up this week where it was the exact technique I needed. What made me laugh, though, is the particular problem I was dealing with twenty years ago had school achievement measures – tests of English, Mathematics, Science and Social Studies – as the dependent variables and the problem I was dealing with this week had, you guessed it, school achievement measures as the dependent variables.
So, I thought I’d ramble on about canonical correlation for a while…
Canonical correlation is used when you want to maximize the correlation between a set of X variables and a set of Y variables. For example, you might want to know how much teachers can affect student performance. You have a set of teacher factors: years of experience, percentage of time spent in hands-on activities, percentage of time spent on classroom discipline, minutes per week spent on preparation, minutes per week spent on grading. You have a set of student outcome variables: math achievement, reading achievement and science achievement.
You could do three multiple regression equations and maximize the explained variance in each dependent variable individually. However, hypothetically speaking, what if you found that increasing classroom structure increased achievement in science but decreased it in math? If you’re an elementary teacher, it’s probably hard to relax and increase the classroom rules during the day depending on the subject. School achievement is one of the few good candidates that leaps to mind for canonical correlation because you have multiple dependent variables and it is hard to argue one is more important than the other. We want kids who can read AND do math.
In a simple linear regression, we are calculating the covariance between two variables, X and Y. (The standardized form of the covariance is correlation.)
In a regular multiple regression equation we are trying to select the set of regression coefficients that maximize the covariance between a set of X variables and a single variable. To get a multiple correlation, we apply those regression coefficients to the X variables for each individual. We get a predicted score for that individual, the Y-hat. The correlation between the predicted Y and the actual Y is the multiple correlation.
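If you want to see that last step with actual numbers, here is a minimal sketch in Python (picked just for illustration, it is not the SAS used elsewhere in this post). The data are made up, the helper names are hypothetical, and the two-predictor normal equations are solved by hand so nothing outside the standard library is needed.

```python
from math import sqrt

# Made-up toy data, for illustration only: two X variables and one Y.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 8, 13, 12]

def mean(v):
    return sum(v) / len(v)

def centered(v):
    m = mean(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pearson(u, v):
    """Plain Pearson correlation between two lists of numbers."""
    cu, cv = centered(u), centered(v)
    return dot(cu, cv) / sqrt(dot(cu, cu) * dot(cv, cv))

c1, c2, cy = centered(x1), centered(x2), centered(y)

# Normal equations for two centered predictors: a 2x2 system, solved directly.
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
s1y, s2y = dot(c1, cy), dot(c2, cy)
det = s11 * s22 - s12 * s12
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

# Apply the coefficients to each individual's X values to get Y-hat.
y_hat = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

# The multiple correlation is just the correlation between Y-hat and Y.
multiple_r = pearson(y_hat, y)
print(round(multiple_r, 4))
```

The multiple correlation can never come out smaller than the correlation of Y with either X variable alone, which is a handy sanity check if you try this on your own data.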
In a canonical correlation, we go one step further. We have a set of X variables and a set of Y variables, and we are trying to maximize the covariance between TWO matrices. (If you remember your normal equations from college, with multiple regression you had an X matrix and a Y vector – a vector, in statistical terms, being just a column of numbers, not to be confused with the geometric term of the same name, just like one should not confuse a three-way interaction in ANOVA with the event of the same name in pornography. I hear a great deal more of the latter than the former can be found on the Internet. Extremely odd when you consider that the initial motivation for development of the Internet was assistance of scientific research and not distribution of pornography. It’s true. You can look it up.)
ANYWAY … multiple regression maximizes the covariance between the X matrix and the Y vector while canonical correlation maximizes the covariance between the X matrix and the Y matrix.
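For the curious, here is a rough Python sketch of what that maximizing amounts to in the smallest interesting case, two X variables and two Y variables. It leans on a standard result: the squared canonical correlations are the eigenvalues of Sxx⁻¹ Sxy Syy⁻¹ Syx, where the S matrices hold sums of cross-products. The scores are made up and the 2x2 helper functions are hypothetical names written just for this example. A real analysis would use a procedure like the ones shown in this post.

```python
from math import sqrt

# Made-up toy scores, for illustration only: two X variables, two Y variables.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y1 = [1, 3, 2, 5, 4, 7, 6, 9]
y2 = [5, 3, 6, 2, 7, 1, 8, 4]

def center(v):
    avg = sum(v) / len(v)
    return [a - avg for a in v]

def cross(a_cols, b_cols):
    """Sums of cross-products between two lists of centered columns."""
    return [[sum(p * q for p, q in zip(a, b)) for b in b_cols] for a in a_cols]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

X = [center(x1), center(x2)]
Y = [center(y1), center(y2)]
sxx, syy, sxy, syx = cross(X, X), cross(Y, Y), cross(X, Y), cross(Y, X)

# Eigenvalues of Sxx^-1 Sxy Syy^-1 Syx are the squared canonical correlations.
prod = mul2(mul2(inv2(sxx), sxy), mul2(inv2(syy), syx))
trace = prod[0][0] + prod[1][1]
det = prod[0][0] * prod[1][1] - prod[0][1] * prod[1][0]
disc = sqrt(max(trace * trace - 4 * det, 0.0))
eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]

# Largest first: the first canonical correlation, then the second.
canonical_r = [sqrt(max(e, 0.0)) for e in eigenvalues]
print([round(r, 3) for r in canonical_r])
```

The first canonical correlation is at least as large as the correlation between any single X variable and any single Y variable, because picking one variable from each set is just one of the weightings it gets to choose from.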
I was going to say more about this but I have to finish my second WUSS paper on procedures. Speaking of SAS, if you wanted to do a canonical correlation with SAS, it’s very easy. You simply type:
proc cancorr data = datasetname ;
var first-list-of-variable-names ;
with second-list-of-variable-names ;
run ;
Of course, there are a ton of options. One point really worth making is that you can analyze the covariance matrix, correlation matrix and other types of matrices instead of the raw data. This is useful because listwise deletion is a common problem in analyses with a large number of variables, that is, if a person is missing just one out of ten variables he or she is dropped from the analysis. A correlation matrix computed ahead of time with pairwise deletion can use every available pair of observations, so fewer subjects are thrown away. So, if you have ten variables each of which is missing only 4% of the data you can easily end up with 20-40% of your subjects dropped from the analysis. (It would be a little odd if it was 40%, but that’s another topic.)
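Where does a figure like 20-40% come from? Here is a quick back-of-the-envelope sketch in Python. It assumes, and this is only an assumption made for the arithmetic, that each of the ten variables is missing independently of the others. The variable names are just for this example.

```python
# Assumption: each of ten variables is missing for 4% of subjects,
# independently of the others.
n_vars = 10
p_missing = 0.04

# A subject survives listwise deletion only if all ten values are present.
p_complete = (1 - p_missing) ** n_vars
p_dropped_independent = 1 - p_complete

# Worst case: no subject is missing more than one variable,
# so the missing percentages simply add up.
p_dropped_max = min(1.0, n_vars * p_missing)

print(f"dropped if independent: {p_dropped_independent:.1%}")  # 33.5%
print(f"dropped at most: {p_dropped_max:.1%}")                 # 40.0%
```

So under independence about a third of the subjects disappear. You only hit the full 40% if nobody is ever missing two variables at once, which is why that number would be a little odd to see in real data.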
Speaking of SPSS, even though we weren’t: although there are usually pointy-clicky things for just about every statistical procedure in SPSS, I could not find one for canonical correlation. No big deal, just open a syntax window and use a MANOVA statement, like so (this uses the example from the anorexic data set included in the SPSS samples).
MANOVA weight mens fast with binge vomit purge
/discrim all alpha(1)
/Print = sig(eigen dim) .
I would like to say a lot more about this but I promised to have a paper on procedures novice programmers need to learn and I am kind of guessing that the conference organizers would give me “THAT LOOK” if I suggested that CANCORR was one of those procedures. I know the exact look. It is the one Maria gave Dennis when he asked her if she had thought of re-setting the programmable random access memory on her computer when she had a problem. She said,
A friend of mine mentioned that a woman had invited him to her apartment. Let me just say that my friend does not exactly rival Mother Teresa for celibacy. Astounded, I asked him why he hadn’t taken her up on the offer. He answered that his son had died recently and he didn’t want to spend the rest of his life having one-night stands. My friend is Dakota Sioux and they believe that what one does in the year after the death of a significant person – a spouse, a child – is how you will be the rest of your life.
I think the Sioux have a great deal of traditional wisdom when it comes to dealing with death. Maybe there is something to this part of it. I know after my husband died I worked from when I got up until I fell asleep from exhaustion. I taught psychometrics and developmental psychology at one college, drove 80 miles and taught evening courses in market research (I have an MBA along with an MA & Ph.D.). Then, late into the night, I did analyses of data for evaluations, research projects, needs assessments for grant proposals. I paid off the medical bills, funeral bills, put two children through college, watched a third compete in two Olympics, and kept working 70-80 hours a week.
Finally, this year, my daughter’s coach asked me out of curiosity,
“Do you really need the money? Because you have a little girl at home and you are teaching until past her bedtime and then doing whatever you do at the university on the computer all day and flying around the country doing that other thing you do.”
And I realized, no, I didn’t really need the money and hadn’t for quite some time. Three of my daughters are living on their own, my (not-so-new any more) husband has a job and if I quit working completely the main difference in our lives might be that there would be less useless crap that I have to box up and give to Goodwill every month. If the economy really is stalling, it’s not our fault. In China, there’s probably an entire Factory of Useless Plastic and Electronic Crap named after my family.
So, I politely declined the opportunity to teach at two graduate schools this year. Then, I was on Twitter one day and Eric Greenspan, who I would not know if he came to my house and fixed my plumbing, asked the question, to the world in general,
“If I could do what I really wanted ___________ “
I thought about it for a minute and I almost wrote back, if I could do what I really wanted to do, I would just work on research projects that interest me. I would write papers on aspects of programming that I happened to be interested in that day. I’d learn as much as I could about everything I could in statistics and satisfy my curiosity about some things. For example, I really am skeptical that generalized models or mixed models are that superior to a plain old GLM if you have a million data points or if there are very large differences among your groups. (Okay, well maybe you wonder about the meaning of life, but as my daughters always tell me, I’m not THAT interesting.) I’d only work on projects that I thought had the potential to make a difference – or that I felt like doing just for the hell of it. And I would telecommute because what’s the point of living by the beach if you are away at work all the time. Besides, my house is where all my stuff is.
I didn’t write him back, first of all, because he wouldn’t know me if I spilled my Starbucks coffee on him, secondly because that is way the hell over 140 characters and thirdly, as I thought about this, my jaw dropped open because I realized there was not a single reason in the world that I couldn’t do exactly that this moment.
Within the next 48 hours, I was asked to work on not one but two separate projects that I thought would be awesome (oh, I’m charging them money. I’m not stupid). I turned in my letter of resignation. Wednesday, I leave from my last day at the university to Boston, where I have a meeting on research design that I think will be totally awesome. (If you like that sort of thing, which I really, really do.) I’ll see my daughter for her birthday the next day and take my granddaughter to the aquarium and then fly home.
Don’t take everything you read on Twitter too seriously though. According to @vwadhwa, companies prefer to hire younger engineers and tech people. This may be true in general, but my two weeks of notice isn’t even up yet and I am already booked for the next six months.
Vwadhwa seems to keep working, too, and from his teeny little picture on Twitter he looks to be slightly older than Justin Bieber, if you ask me, which you shouldn’t because I haven’t the faintest idea what Justin Bieber looks like, but I’m going with what it would be like if Abercrombie & Fitch sold people.
Statisticians are good at lots of things but naming is not one of them. If Carl Linnaeus had been a statistician the name for camel would be Horse With Hump and for elephant Really Big Horse with Nose based on the fact that both have four legs and people ride them.
Such is the case with the Generalized Linear Model.
Let’s start with the General Linear Model, because it is easier. As I said before, the General Linear Model is general (and also linear and a model). Then I said this: “Almost all questions that can be stated:
Is there a relationship between this thing and this other thing? …”
… and rambled on a bunch.
So, now that we have that down pat … there is the GENERALIZED Linear Model because the general linear model was not general enough for us. You see, GLM (well, both are GLM so let’s call it the horse) was based on the assumption that the errors follow a multivariate normal distribution. So a Generalized Linear Model (let’s call it the camel) generalizes from our basic GLM (the horse) to other types of distributions.
Then there are General Linear MIXED models (maybe those are zebras?).
The basic Analysis of Variance, going back to the original apparently not-so-general-as-originally-believed model, is quite simple. You have one (or more) independent variables that can be broken down into two (or more) groups. Let’s say gender. You have people in two groups, male and female. This is not a sample of all genders. It’s all the genders there are. The same is true if you put people in an experimental and a control group. Effects like these, where your groups are all of the levels you care about, are fixed effects. This is NOT a mixed model, because you only have one type of effect. Hurray you.
You can have a random effects only model. This example from Stanford looks at whether there is variability across brands of beer. With brands as the random effect, eight different measures are taken across six brands of beer. This is a very worthwhile study as it involves drinking 48 bottles of beer. Oh, and repeated measures, too.
What if you had taken 20 people and put ten in one group that got to drink beer and ten in another group and tested them four times, with the first group getting to drink two beers between each testing and the second group having to watch Sarah Palin videos? Now you have a random effect. You do not have all possible levels of people. You have randomly sampled twenty out of the population. There will be an effect of person and an effect of group. So, this is a random effect AND a fixed effect in your model. I presume your dependent variable will be stupidity with the research question what makes you become stupider, listening to Palin or getting drunk. I’m agnostic on this question.
If you have a MIX of fixed and random effects, then it is a mixed model.
Generalized Linear Mixed Models are perfectly cool with heterocatanomic multivariate distributions, that is, when you have some response variables that have one distribution, say a Poisson distribution, and others that have a normal distribution.
Then, just when you thought it was safe to go read something else, there are Nonlinear mixed models.
My husband asked if there was such a thing as model mixers, where statisticians got to go to parties and mix with models like Tyra Banks.
Lovely daughter number two mutters,
“Have you seen what the people Mom hangs out with LOOK LIKE? The best you could say for any of them is they are good-looking for old people who work on computers all day. So, for model mixers, I’m going with – No.”
Does moving from being a novice to a not-so-novice programmer mean knowing everything there is to know about PROC TABULATE? Well, yes and no.
It would be hard to call someone who knew everything there was to know about SAS ODS or every possible regression procedure in Stata from regress to nlogit a newbie. However, how useful is that person when you need to perform calculations, format your data differently, merge files or write more efficient, readable code? In the long-term, whether going deep or going wide is better is a debatable topic. For a beginning programmer, though, my recommendation is to spread out and increase your knowledge across a range of topics from programming to presentation to new procedures.
So, you are just learning a programming language and, like all of the eager young people I have worked with over the years, you are excited to branch out. Yay, you!
There are a lot of possible next steps. One is to start taking advantage of all of the resources for learning. With SAS you want to check out the mailing list SAS-L, sasCommunity.org, SAS publications, of course, and some of the many blogs by both SAS Institute employees and SAS users. For Linux, I like the Ubuntu forums. The SPSS India website has a lot of free tutorials on statistics (of course with SPSS) and it really surprised me how good it was because I hadn’t thought of SPSS as offering a lot of resources. It may be the effect of the IBM purchase, or maybe it was always that way. Raynald Levesque’s site has cool macros, scripts and syntax for SPSS. Yes, I realize it’s not a blazing insight to say that you can find some good stuff on the Internet, but I mentioned those because I think all of them are very friendly for the programmer who is just beginning and wants to move ahead (and I mean that in the knowledge sense not in the stepping over the bloodied bodies of your co-workers to get to the top sense).
A second direction is to become familiar with more than just your current limited area of expertise. If you are a SAS programmer, you could learn more SAS products, such as SAS Enterprise Guide, Enterprise Miner, and smaller features like the Power and Sample Size application. Or, maybe you want to learn to use Stata on Linux and start learning about the Linux operating system. SAS and SPSS both have interfaces with R. SPSS was pushing Python procedures at one time. I’m not sure if they’re still heading in that direction. It doesn’t matter whether they are or not, really, if it is something that interests YOU.
Whichever direction you take shows that you have become interested enough to search for new knowledge and that is always a great sign. I was asked today who I would recommend for a new position that just opened up and why. I said,
“There are several people I can think of who could learn what you need. What I would look for is someone who is genuinely, sincerely interested, and not just in a $10 an hour raise, but in statistics, because that person will learn on their own, grow and develop into the kind of person you want.”
I was at the Predictive Perspectives seminar today and met someone who was excited to be there, not because she was looking for a job but because she wanted to learn. I gave her my card and asked her to contact me. It may be a cliche but it’s still true. You can’t buy passion (well, maybe in Las Vegas, but it’s legal there).
The biggest step forward, though, I think is – PLAY! As you learn more about programming languages, statements, functions, procedures and products you’ll invariably like some more than others. For example, my brain tries to crawl out of my skull to escape being melted by boredom whenever I have to look at SQL code. You may love SQL (unlikely as that may be). The more time you spend working on different projects, the more you’ll learn and the more you’ll discover what really interests you. I don’t have a profound thought to help you decide which direction to go except this. Remember the movie, Pleasantville, where his mother said
“…I had the right house. I had the right car. I had the right life.”
And her son answered,
“There is no right house. There is no right car.”
Learning SAS (or any programming language) is very much like that. There is no right choice. There are a lot of choices. The more you learn, the greater the number of choices you get to make.
How cool is that?
Years ago, I read a science fiction story about a future where all plays were performed by robots that had been programmed with the combined characteristics of the world’s best actors. An aspiring actor sadly asked the technician working on the computers to run these robots:
“What would you do if they invented a little black box to do your job?”
The technician paused thoughtfully and responded,
“Well, I guess I’d get me a job making them little black boxes.”
Remember “word processor”? You probably don’t, but thirty years ago it was one of the fastest growing careers in America. Women were moving out of the typing pool to “technical” jobs using word processing machines. And then, applications like WordPerfect, Word, Wordstar and Applewrite got to be easy enough that no one needed a word processing department and all of those jobs evaporated practically overnight.
I think that as soon as you have down DATA steps, PROC SORT and other basic steps, it’s really time for you to look around for new challenges. The world is changing, whether you like it or not. SAS Enterprise Guide, Stata, SPSS, JMP, S-plus and even Excel/Open Office to some extent, are all making it easier and easier for people to merge, create and analyze data sets all on their own using pre-written procedures that can be selected from a menu and populated with choices dragged and dropped in a pop-up window.
When you start out as a SAS programmer, you use the pre-written formats to make your data read “August 18, 2010” instead of 8/18/10 and you think you are cool. Then you find out you can create your own formats and have it print “Dennis’ birthday” for the 18th and “Naked Mole Rat Appreciation Day” for the 19th and so on and you are convinced that you are super-cool.
If you liked PROC FORMAT allowing you to write your own formats, you’ll really love the SAS macro language, a major extension of Base SAS that allows you to write a program to write programs. But … learning to write macros is a big leap toward being an expert programmer, especially for someone new who is just getting used to PROC MEANS.
There are a couple of baby steps you can take to get started in that direction. The first is the use of %INCLUDE. I usually introduce this to new programmers first because it is relatively easy.
%INCLUDE is a gentle way to get a novice introduced to the concept of SAS macros, which can be a bit intimidating.
When you include something, unlike with a macro, you don’t need to change any of the variables, statements or functions. Debugging is a snap because you can run the code exactly as is, copy it into another file once it is bug free and then run it from there. Yet, it leads you into the idea of having code from somewhere other than within your program run over and over.
So, what does %INCLUDE do?
It includes programming statements from another file outside of your program. There are two reasons to use this very frequently. One is to save copying or typing the same code over and over. A common example is acknowledgements of funding, legal disclaimers, legends or other text that might always go at the bottom of every page on a website, such as:
Footnote1 "Material under the Creative Commons license for this site" ;
Footnote2 "can be freely produced as it was created by pixies who live on air" ;
Footnote3 "Other brand and product names are trademarks of their respective companies." ;
Footnote4 "Which are probably owned by the devil and run by communists" ;
Footnote5 "Except for SAS" ;
Footnote6 "and SPSS (now owned by IBM) on Thursdays" ;
Footnote7 "and Stata which is in Texas and run by cowboys." ;
If you have 10 lines of this, you can copy and paste it every time you write a new program, or you can simply save the footnotes to a file and whenever you need these use a line something like this:
****** This is the text for footnotes **** ;
%include "c:\myfiles\mysasfiles\footers.sas" ;
You really don’t need those ten lines of footnotes getting in your way every time you’re trying to read the program and de-bug it or document it. A major advantage over copying and pasting the same lines in every program is that if you get a new grant, or SAS gets bought by IBM, you don’t have to go find every program and change that code. I can just change it in the footers.sas file and I’m done.
A second common use for %include is, again, to make the code more readable and get rid of distractions. While it is kind that ICPSR includes PROC FORMAT code with many of the SAS data sets available on its site, these can get in the way.
You may get a program from ICPSR or another source that has hundreds of lines of Proc Format, Label and Format statements (this happens to me all the time).
value statnum 1='(1) Alabama'
And a hundred more lines of the same …
Followed by :
rec_bh = 'bh:record type'
statnum = 'numeric state code'
ori = 'originating agency identifer' ;
format state statnum. division divisn. ;
And a hundred more lines …
Moving the Proc format, labels and formats cuts the length of the program in this example that I had received from 387 lines to 160. There are several reasons I may want to do this. The first is debugging. Before I am sure I read in the code exactly the way I want it, the formats are pretty irrelevant. Often, for whatever reason, the file is not in the exact format I expected. I don’t want to slog through a lot of useless formats and labels before the data are actually read in. When I do have a permanent data set, I may want to use those formats but not have to see hundreds of extra lines in my code.
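Important point: %INCLUDE statements are executed as they are read. When SAS hits a %INCLUDE statement it is as if the code was copied into your program at that spot. Think about this. PROC FORMAT is a step of its own and has to run before the DATA step that uses the formats, while LABEL and FORMAT statements have to land inside the DATA step. What this means is that I had to move the FORMAT procedure into one file and the LABEL and FORMAT statements into another file.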
My program now looks like this:
***** Proc Format to create user-written formats ***** ;
%include "c:\myfiles\mysasfile\crimefmt.sas" ;
Data libref.crimes ;
Infile datasetname ;
Input variables ;
<bunch of other statements >
******* Labels and Formats *********************** ;
%include "c:\myfiles\mysasfile\labelfmt.sas" ;
Note that if I have the labels and formats after the run statement, it will give me an error.
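To see why the split matters, here is a minimal sketch of what SAS effectively sees once both %include lines are expanded in place. The contents shown for crimefmt.sas and labelfmt.sas are assumptions pieced together from the fragments above, and the data set name, the two variables and the single data line are made up for illustration.

```sas
/* Assumed contents of crimefmt.sas: a complete PROC step,
   so it has to run before the DATA step that uses the format. */
proc format ;
   value statnum 1 = '(1) Alabama' ;
run ;

data crimes ;                          /* hypothetical WORK data set */
   input rec_bh $ state ;              /* hypothetical variables */
   /* Assumed contents of labelfmt.sas: bare statements that are
      only valid inside a DATA step, so they sit before RUN. */
   label rec_bh = 'bh:record type'
         state  = 'numeric state code' ;
   format state statnum. ;
   datalines ;
A 1
;
run ;

proc print data=crimes label ;         /* shows the label and format applied */
run ;
```

Because the included text is dropped in exactly where the %include line sits, the second include has to appear before the run statement that closes the DATA step.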
Try it. It’s very non-threatening and once you get used to seeing those %____ in your program you are ready to move on to the next baby step, assigning macro variables.
Don’t believe that crap about how hard programming is or how hard statistics is. Everything is hard when you first start doing it. As a good friend of mine, who is a very good coach, said at practice one day,
“When you learned to walk it was hard. You fell down on your butt. You screamed, you cried, you were really frustrated. But you didn’t give up and now you walk around all the time and walking, that’s just nothing to you.”
Yeah, it’s like that.
Read a great line in Seth Godin’s book, Linchpin,
“It’s not an effort contest, it’s an art contest.”
The point being that no one cares how hard you worked, they care how great your product is. Of course, great products tend to result from hard work along the “necessary-but-not-sufficient-condition” lines, but that’s a whole different topic.
This fit with what I have been thinking about statistics a lot lately,
“It’s not an IQ contest, it’s a knowledge contest.”
I read a lot of statistics articles, attend a lot of presentations on statistics – and I see a huge disconnect between these really brilliant people and those who are making decisions on policy, funding and management. This is our fault. Well, maybe not yours and mine personally, but the fault of the discipline as a whole. We spend much of our time researching new statistics, writing papers on the latest developments. Some of these decrease the standard error, or at least give us a more accurate estimate. That’s all good, right?
We spend FAR less time on making sure our results are interpretable to people other than ourselves. In my previous life, I led faculty development workshops on teaching mathematics and science. Many of my colleagues were dismissive of the idea that we ought to work to make our ideas discernible to the average person. One of them actually sniffed dismissively (I didn’t know people did that outside of Victorian novels). He told me that he had gone to the effort to get a Ph.D. and these students, if they didn’t have the intelligence and/or weren’t willing to work hard enough to learn from him, then they all deserved to fail and that people like me were ruining academia.
SIGH. It gave me empathy with a school psychologist I knew who once wrote as a child’s diagnosis “Dyspedagogia”. (Latin for “bad teaching”.) After he was caught out, he refused to change it, too, saying he stood by his professional opinion.
Anyway…. I don’t know if that professor was as brilliant as he thought – he did obtain a doctorate in a scientific field from an Ivy League school – but I really don’t care. It’s not an IQ contest. As for everyone else who is not a statistician, as Sheila Tobias said in her wonderful book by that title, “They’re not dumb, they’re different”.
As a statistician, I KNOW that I see things differently and my job is to explain what I see, not necessarily in the way that makes sense to me, but in a way that makes sense to other people.
For example, when I look at this:
When I see the table, the formula below automatically pops into my head.
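The formula in question is presumably the usual Pearson chi-square statistic. In standard notation, with O and E standing for the observed and expected count in each cell (the symbols are an assumption here, not necessarily the ones in the original figure):

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```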
Then I think that there should be about the same number of males and females saying “yes”.
The total sample is (roughly) evenly divided by gender and about 300 people said they planned to join the military after high school, so the expected value is 150 women.
Subtracting 72 from the 150 one would expect gives a value of about 80, which squared is 6,400.
It is already obvious this is significant.
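Spelled out for just that one cell, using the figures above and the standard 0.05 critical value for a two-by-two table (one degree of freedom), the arithmetic looks roughly like this:

```latex
\frac{(O - E)^2}{E} = \frac{(72 - 150)^2}{150} = \frac{6084}{150} \approx 40.6 > 3.84
```

One cell out of four is already more than ten times the cutoff, and every other cell only adds to the total.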
Really, I don’t get nearly that explicit. What I’m more likely to think is,
“150 – 72 squared is a lot. That’s significant.”
Then I run it in JMP or SAS or SPSS and see that the chi-square is 110, p < .001 and I am right.
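For anyone who wants to run the check rather than eyeball it, here is a minimal PROC FREQ sketch. Only the 72 women and the total of roughly 300 people saying yes come from the text; the group sizes of 1,100 each and the data set and variable names are invented for illustration, so the output is not expected to reproduce the 110 quoted above.

```sas
/* Hypothetical cell counts: only 72 and the total of about 300 "Yes"
   answers come from the text; everything else is assumed. */
data military ;
   input gender $ plan $ count ;
   datalines ;
Female Yes 72
Female No 1028
Male Yes 228
Male No 872
;
run ;

proc freq data=military ;
   weight count ;                          /* each row is a cell count, not one person */
   tables gender*plan / chisq expected ;   /* chi-square test plus expected counts */
run ;
```

The EXPECTED option prints the expected count for each cell, which makes it easy to compare against the 150 worked out by hand.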
Occasionally someone shows me a table like this in their dissertation and the chi-square value is 1.06 and it is non-significant. They tell me this is what the computer gave them. I look at it and tell them,
“Well, then, the computer is wrong.”
The uncomprehending shock on someone’s face when I say this always strikes me as a bit odd, as is their complete amazement when I turn out to be right.
Of course, that just means that somewhere along the line they incorrectly programmed the computer. It happens to everybody.
Yet, people go away shaking their heads in disbelief, and I have a reputation as “that scary smart person in the corner”. Truly, here was the great dramatic insight that popped into my head that led to the conclusion that their chi-square value was incorrectly calculated. I looked at the table for two seconds and this was my exact thought.
“6,400 is a big number.”
Not very profound, is it? You might be thinking my point here is that I should work back to the steps that led me to that insight. If you are thinking that, you weren’t paying attention to the “different” part. In fact, this is what I did:
“Take a look at this chart. You can see that about three times as many men plan to enter the military as women. That might not be significant if you had five or ten people, but you can see (on the Y axis), you’ve got over 1,000 people in each group.
When you get a fairly large number of people and the difference between the two groups is a factor of three to one then you can be pretty certain there is a relationship between being in that group and whatever the other variable is, in this case the decision to join the military.
Now, this fits with what you already were pretty sure of in this case, which is that men are significantly more likely to enter military service than women.
However, it’s not necessarily the case that the results will always bear out your pre-conceived notions. For example, the next set of results, on race and ethnicity, did not fit at all what we had expected….”
While this might not make me sound as intelligent as going through the chi-square formula (which is really a pretty simple formula, after all), it does accomplish three things:
- I am pretty certain that everyone in the room when I explained this understood exactly what I was saying and could follow my logic.
- Because they had the confidence that they understood exactly what I was saying and how it fit with what they already knew, they agreed with me, they didn’t just go along with me. This is crucially important because if later on I want them to support some initiative based on this information they’re a hell of a lot more likely to do it than if they’re not really sure what I was talking about.
- Having gained some confidence that they can understand what is going on here and that it is not completely random, when we come to the next set of results, which just happens to be contrary to what was expected, I have their attention and I am half-way to having their agreement and support. Come on now, admit it, aren’t you just a little curious to find out what we learned about the relationship of military service to race and ethnicity?
I don’t know if any of this shows how intelligent I am. If anything, I think I might have more claim to intelligence based on the fact that I figured out how to do the chart in JMP only a few minutes after starting to use it. I wanted “male/female” and “Yes/No” instead of ones and zeros and reasoned that since SPSS has value labels and SAS has proc format, there must be some way to do it. (There is: right-click on the column you want, click on COLUMN INFO, then select COLUMN PROPERTIES and then select VALUE LABELS.)
I remember a quote on the definition of intelligence that,
“The essential difference between a genius and a moron is the ability to generalize.”
So, is any of this proof that I’m a genius? I don’t know. I do know that nobody cares.
The next time your boss asks what you do all day or why it takes you so long to answer a question, show him or her this …
Now, unlike this blog, where I basically drink Chardonnay and say whatever the hell I feel like, when people are paying me for answers, I take my work pretty seriously. Years ago, on the very cool resource SAS-L, there was a (thankfully, short) trend where people would post answers with the note “Code not tested”, which is a polite way of saying, “I don’t know if this will work.”
When anyone asks me a question, unless it is something like “Do you end SAS statements with a period or a semi-colon?”, I test my answer before I send it. For one thing, with changes in versions, different operating systems and other variations, something that may work in one situation may not work in another.
So… the very simple question asked was:
“How do I renew SAS for Linux 64?”
First problem: I do not have SAS installed on a 64-bit Linux system to renew, so I decide to install it on this Ubuntu VM I happen to have, and it doesn’t work at all. Well, I had ASSUMED that since it is a VM running on a 64-bit Mac that also has a 64-bit Windows 7 VM and a 64-bit Vista VM that it must be a 64-bit Ubuntu VM.
The first time the install failed I thought, gee, maybe I should check. Well, I had created this VM about a year ago to test some things on the 32-bit version of Ubuntu for someone so …
Delete VM I no longer need.
Download the 64-bit iso.
Curious that it says not recommended for daily desktop usage. Read several posts speculating on why it said that but nothing to convince me it was a big problem.
Downloaded and installed anyway.
Second problem: COULD NOT GET PAST LOGIN SCREEN! Ubuntu would not take my password. It was as if it did not recognize the keyboard. Turns out this is a known problem with VMware/Mac/Ubuntu combination.
Must be relatively new because I did not have it before.
I’ve had the keyboard problem with Ubuntu 10.04 in VMware Fusion 3.0.2 on a MacBook, US keyboard layout. I got around the problem as follows:
- At the logon screen, go to the Accessibility Preferences at the bottom of the screen, and tick on screen keyboard.
- You may have to reboot if the virtual keyboard doesn’t start.
- You can now type in your password using the virtual keyboard on the logon screen. Once logged in, your physical keyboard works.
To fix the problem with your keyboard
- Open a terminal, and reconfigure your console using the command:
sudo dpkg-reconfigure console-setup
Once logged in, I went to
cd to the external hard drive where all my SAS deployment folders are (we have MANY versions of SAS for many different operating systems)
cp -R sas92Linux64 ~/sasinst
This copied all the files and directories in the deployment folder to the ~/sasinst folder on my hard drive
I changed the default shell (/bin/sh) to bash
sudo rm /bin/sh
sudo ln -s /bin/bash /bin/sh
I then went to the folder where I had copied the deployment files, typed
Third problem: The deployment wizard started …. and then stopped
Now, at this point I have created a new VM, gotten around the keyboard problem, copied over the files and still nothing.
I then found this FABULOUS web page from the National Center for Ecological Analysis and Synthesis
VERY IMPORTANT ADVICE …
install required packages:
sudo apt-get install xauth x11-apps libstdc++5 ia32-libs libxp6
Then, of course, comes the question, where do you find libstdc++5?
So…. I downloaded libstdc++5 and ran the command above to get everything I need installed.
I went back and typed
sudo bash setup.sh
Getting SAS to work
First, create a directory to act as your working directory. I called mine tmp
I went to /usr/local and typed
Give everyone access to write to the tmp directory
sudo chmod 777 /usr/local/tmp
To get SAS to start the first time I had to type
/usr/local/SAS/SASFoundation/9.2/sas -work /usr/local/tmp
———– One more thing —-
Since I always like to tie up the loose ends, I took one final bit of advice from the wonderful NCEAS site and typed the following :
sudo ln -s /usr/local/SAS/SASFoundation/9.2/sas /usr/bin/sas
Now when I start SAS all I need to do is type
sas -work /tmp
I like that better.
Fourth problem: I get an error message telling me that there is a mismatch between my license and the version of SAS I have installed. I talk to a few people at SAS and get several other things I need, like an electronic software download for the latest version of SAS for Linux 64, the SID file to renew Linux 64 for a different site, and I have a long conversation with someone who tells me that what I have installed is Linux 32. Since the current (expired) SID file says Linux 64 and when I run a PROC SETINIT it says Linux 64, and the original ESD from a year ago says Linux 64, I tell her that while it is theoretically possible that it is, in fact, Linux 32 which was mislabeled in four different ways, I kind of doubt it. She finally gives in to my superior logic, tracks down the SID file for this version and emails it to me.
Once you do actually have the correct SID file …
Here is the answer I actually sent on how to renew SAS on Ubuntu (Remember, that was the original question)
1. Open up a terminal window (under applications > accessories)
2. cd to where you have your sas software installed
3. Start the renewal utility by typing
4. Hit enter to continue
5. Type 1 for Run Setup Utilities
6. Type 1 for Renew SAS Software
7. You’ll be prompted for the file containing the SAS installation data file. If you downloaded it from an email sent to you by your SAS Administrator, it will probably be something like
(by the way, when I tried ~/Downloads/SAS92_Linux64.txt it didn’t work)
The Setup Utilities Menu will pop up again.
You have now renewed.
I owe Victoria Brookhart an apology. One day, we were in the graduate student lounge discussing research methods and she burst out,
“Isn’t this great? Here we are warming ourselves at the fires of knowledge. These are the times we’ll remember our whole lives as the good old days.”
The rest of us threw spitballs at her, I think. We were too cool and cynical to think like that, or, if we did, to say it out loud. On top of which, Vicky was one of the very few in the group to pursue qualitative research, or, as we quantitative types referred to it, creative writing.
It might have been me that wrote on the white board outside her office,
“Victoria’s sex life is participant observation.”
Actually, I think it was Dio, but it might have been me. Of course, a few years later, when I was working in North Dakota and doing research on Native Americans with developmental disabilities, I had to call up and grovel and ask for her advice on how to get started in an area where there was NO published literature. That’s not why I owe her an apology, though.
Vick was right. If you are fortunate like we were, and like so few people are any more, that you can devote full time to learning, that is an incredible blessing. Yes, we felt more broke than blessed at the time, plus there is that whole thing about being a professor’s slave, or at least indentured servant. And yet ….
We had the opportunity to delve into subjects that interested us, to read books about them, collect data, analyze data, formulate theories, test them, swear, talk things out with people who had whopping years more experience than us who could steer us in a better direction, be mentored, read the latest articles, hang out at the computing center until 10 p.m. when maybe we could talk somebody into loading OUR tapes on the tape drives since no one else would be there until morning and running OUR data right now so we could get the results and …
I don’t know if I was smarter back then but I sure THOUGHT I was smarter. It made me smile several years later when the department chair showed me his evaluation of a new member of our department which began,
“Like most new Ph.D.’s, X thinks he is smarter than God, but we expect he’ll get over it …”
I have heard a lot of comments like that over the years. I remember when the dean of a state college came to campus recruiting. Someone (it could have been me) asked him how many articles he had published in the 20 years he had been there. He said he hadn’t published any but he had received over $30 million in grants and he was proud of that. After he left, my friends and I discussed with a superior attitude how WE were not impressed because WE already each had a couple of articles in press. WE obviously were so much smarter than him.
Fast-forward twenty-five years and Victoria was right. One of the reasons that dean didn’t publish is it takes an enormous amount of work to write a grant and then to administer it. I’m sure that grant money funded a lot of graduate students over the years and a lot of demonstration and service programs. He was right to be proud of it.
I think of that gentleman from time to time when I make a mistake and ask myself how I could have been so stupid or careless. Well, really, I’m not particularly stupid or careless but what I am is pulled in many directions. Some things, whether it is serving on departmental search committees or putting together a justification for a major software purchase or analyzing data on the number of students applying so we can project the need for courses – they just have to be done.
Several years ago, when I was doing a lot more grants management, I know some of my students secretly classified me as one of those old professors who understood statistics but had to have other people write programs for them.
I’m very happy that I get a chance to do more programming these days. Now that three of my children are grown, I can afford to choose to do more of the work I want to do, a luxury I haven’t had as much since graduate school.
I listen to doctoral students talk about how behind the times and overrated their professors are, with that attitude of superiority I remember we all had. Now I laugh at it realizing that part of the reason that our professors weren’t up to our standards of research productivity and error-free equations and code is that they had other shit to do. Sad, but true.
Every decade or so, I step back and work in an academic setting, and it is always amazing to me the new developments, new statistics, new software, that I haven’t had time to learn, and it takes me a year or two to catch up.
Coming from a business into academia, I’m not nearly as convinced of the greater practical benefit of some of these. We often give ourselves far too much credit in the universities. I hear my colleagues say,
“This article is in an academic journal but eventually it gets used by the people in the field.”
And I think to myself,
“No, usually, it doesn’t.”
It never occurred to me at the time, but now I wonder what that dean thought of us. I am guessing he was partly wistful and partly amused. Probably like most people in the field, I remember the first computer program I ever wrote, back in 1975, in BASIC. It was a program to create poems; as a basis, it started with one by Ogden Nash,
“Behold the hippopotamus
We laugh at how he looks to us
And yet in moments dank and grim
I wonder how we look to him
Peace, peace, thou hippopotamus
We really look all right to us
As you no doubt delight the eye
Of other hippopotami”
Here’s to you, Victoria. You were right all along.
Vista on VMware was running slow. I mean painfully slow. Like, bamboo shoots under your fingernails painful. As in banana slug slow.
SAS Enterprise Guide was running SO slow on VMware I had gotten to the point where I would read a book while waiting for it to open, or to view results in a project. These are results from tasks that I had run previously, so we’re talking just moving from one window to another.
On the positive side, I read a few chapters in
Decision Trees for Business Intelligence and Data Mining, by Barry De Ville, a book I recommend.
I checked the Task Manager and it said CPU usage was 100% which seemed very odd. I had SAS running on several virtual machines on four other Macs and it ran fine. The one I happened to be using lately though was a laptop my wonderful husband gave me to replace the 17-inch one I dropped. You know those commercials where they show the laptop being dropped in the airport, the traveler panicking and then everything is fine due to the Titanium case?
Yeah, well, that didn’t happen.
So… while it was getting a new screen I was using a 13-inch laptop, which really did not work since I have terrible vision. I hooked it up to a 28 inch monitor and all was well except …
When I tried to run SAS, it was unbelievably slow. It wasn’t that not enough memory was allocated – I had given it 2 GB which was the same as three of my other computers, so it should have had enough memory. I had VMware installed on all of them, but the others were all running Windows XP, Windows 7, or Windows Vista 64. I came to the (erroneous, it turns out) conclusion that on Vista 32, SAS Enterprise Guide sucked as bad as when it first came out ten years or so ago when it was glacially slow. I was so frustrated I thought I would go with option A, use S+, which I had been meaning to do more with for a while. Option B involved moving four feet to the desk behind me and copying the data over.
When S+ took forever to start, the light dawned. Obviously, it’s not you, it’s me.
Coincidentally, I had just been thinking today about how there is no time to check ALL of the things we hear or assume we know. I’d been told at some point that you shouldn’t allocate more than half the RAM for a virtual machine because then memory swapping would occur. When I customized the settings for the VM the pop-up menu suggests not using more than 3GB. 2 is less than 3 (see how good I am at math?) so I should be fine, right?
Then, I read on the macrumors forum a post by someone who reported their performance slowing considerably once they went over a third of the memory. So… contrary to what you would normally expect, that allocating more memory would make your machine run faster, this is, in fact, a curvilinear relationship and at some point it makes your machine run slower.
This makes perfect sense when you think about it and I knew this. What finally dawned on me was that the computer that was so slow had 4 GB of RAM and the other three each had either 8 or 16 GB. Also, when running on my 17 inch laptop, I didn’t use two monitors because I could see it okay. Not great, but good enough.
So … I reduced the RAM allocated to the VM to slightly over 1 GB, closed the laptop and just used the external monitor instead of having two monitors and used VMware in full-screen mode.
And … all is well with the world. S+ popped right open. SAS EG is behaving again.
So, I have been reminded of a valuable lesson, which is that when it comes to software problems, sometimes, it isn’t you, it’s me.
I do want to add, though, that the same does not extend to relationships and if we ever break up, it’s definitely you.