Sep
4
Knowing Code Doesn’t Make You an Expert Programmer Any More than Knowing English Makes You Hemingway
So, I am writing these papers on moving from novice to intermediate programmer and Kim Le Bouton has to go apply logic to it and ask,
“Just how do you define a novice programmer, anyway?”
I was tempted to be a smart ass about it and answer that it was anyone who didn’t come to my papers, but was overcome by an uncharacteristic burst of maturity.
First of all, my definition of a novice programmer, having been elected the word chooser of this blog unanimously by a nationally representative random sample of all of the people who are me, would say this:
“Being a novice, as distinct from an expert programmer, is not merely a function of years of experience; it also reflects the quality and results of that experience. A novice programmer is a person who is limited in knowledge of the field.”
Recently, someone told me there was a surplus of programmers and a shortage of managers. As evidence, he cited some report he had seen where a couple of programmers knew all sorts of programming languages but couldn’t get a job.
I told him,
“I don’t believe that. I believe there are people who know a programming language who can’t find a job but taking a course in a language doesn’t make you an expert programmer any more than writing in English makes you Hemingway. There’s never been a surplus of excellence and I don’t believe there ever will be. Managers who consider everyone who knows a programming language to be interchangeable are going to find that out to their detriment.”
One difference between novice and expert programmers is hours. I loved the book Outliers, by Malcolm Gladwell. His main point was that people who are outstanding in a field spend much, much more time in practice than people who are simply very good.
A while back, Mark Stevens posted a blog on Zero to SAS Certification in Ninety Days. Now, Mark Stevens seems to be a pretty smart guy, who started out with the education, motivation and experience that would make him derive the maximum benefit from this training and it is theoretically possible that I am dumber than a rock, but I seriously question what exactly one is being certified as in three months.
After 28 years of working with SAS, I would like to believe I have learned more than could be picked up in 90 days of study. So, back to Kim’s question, what would that be?
WHAT: A novice programmer is one who knows a fairly limited set of procedures or solutions for most problems. For example, given the need to aggregate categories, he or she might consider several IF-THEN, ELSE statements and probably an ARRAY statement with a DO loop. A more experienced programmer would consider other options such as PROC FORMAT or PROC FREQ, to name just two. An example of the former… I am using the 2008 Uniform Crime Reporting data on hate crimes. These are coded in infinite detail. I’d like to combine all crimes against races other than black or white, since there are very few in each category. I’d like to combine the categories “Anti-homosexual male”, “Anti-homosexual female”, “Anti-homosexual- both sexes” etc. into a single category. Below was my solution:
WHEN: The solution isn’t always to use, or even learn, proc format. Perhaps I wanted to aggregate in a different way. I would like to learn more about the locations in which hate crimes occur. There are 25 categories for location but only a few of them occur as often as 5% of the time. The following few statements will pull out only those locations that occur more than 4% of the time and give me a frequency distribution of those locations along the way.
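* Frequency of each location, keeping only those above 4 percent ;
proc freq data = in.hatecrime ;
   tables loccod1 / out = location (where = (percent > 4)) ;
run ;
* Sort both data sets by location so they can be merged ;
proc sort data = location ;
   by loccod1 ;
run ;
proc sort data = in.hatecrime ;
   by loccod1 ;
run ;
* Keep only the records whose location is in the common list ;
data common ;
   merge location (in = a) in.hatecrime ;
   by loccod1 ;
   if a ;
run ;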
WHY: There is an almost magnetic attraction between software and one-upmanship. Someone might say the above solution is not efficient, that there is a better way to do this without two sort steps. Maybe. I can give a reason why I did it this way.
Total processing time (real time, not CPU time) was 78 seconds. It took me another minute to type those statements. So, in terms of both processing and programming time, it was efficient. Most of all, it is easy to read, so if I need to explain it to someone or turn the program over to someone else because I am leaving a project where I was brought in as a consultant for a short period, it is a simpler transition.
I did the frequency procedure selecting those locations that had a percent of > 4, I sorted by those locations and then created a new dataset from the original dataset that excluded those with low frequency.
When someone presents me with a more complex solution to a problem like this, I am the opposite of impressed [that would not be unimpressed. Unimpressed is null. People like that score negative on my impressed scale]. I’ve had people tell me, very condescendingly, that code like the above is wrong because it is inefficient and doesn’t minimize CPU usage. And I sit there thinking that CPU time was 39 seconds, so why do I care?
HOW: This is the Hemingway part, I think. An expert programmer is able to put together those different pieces of knowledge, the what and when and why, apply what they know, integrating information on some subject area – be it marketing, statistics, genetics or what have you – and come up with a solution that is greater than the sum of the parts.
A novice programmer just hasn’t put in the hours yet to learn a wide array of techniques that can be useful in solving a variety of problems. This in NO WAY implies the person is dumb or incapable of learning to be a fantastic programmer. He or she just hasn’t become that yet.
This is usually because the person is new to the field, but it can also be a result of a lack of interest or a lack of time. I don’t buy that it is due to a lack of opportunity. If you are anywhere with an Internet connection and you have a few bucks to buy a trial or learning sample of the software there are tons of resources out there for you to learn. There are even open source offerings like Linux and R that you can get for free.
The secret is to just hack away at it, and the deeper secret than that is to love it. Without going into boring details (unlike how I usually do) – when my hotel turned out to be more one star than four star, I was extremely upset and frustrated last night. My solution was to sit up until 4 a.m. reading up on generalized linear models, link functions, canonical variates, response bias, and trying different things with proc format.
When I do this kind of work, I’m happy and content (Mihaly Csikszentmihalyi would call it “Flow”) and so I work a lot.
I think I am a damn good programmer and statistician and I think that is the reason why. There isn’t a secret decoder ring. Sorry.
Aug
31
Yes, Virginia, There IS Discrimination against Women in Technology
My work day started with a call on research design and ended ten hours later after I fixed a program that wasn’t working. I just resigned from my position as senior statistical consultant at a major research university so that I could concentrate on research. I’m on the technical staff on several projects, have a Ph.D., a record of scientific publication, am frequently an invited speaker on assessment, methodology and SAS programming. So, what am I whining about?
Who the fuck are you to say that I am whining?
That, my dear, is probably one of the reasons that I have been successful in this field, and one of the problems women in technology face.
I’m the size of the average twelve-year-old, female, Hispanic and over fifty to boot. Despite all of these disadvantages, I am doing well in this field, thank you, because I have a few …
Qualities that I don’t think should be necessary for women in technical fields, but they are….
1. I can be a straight-A dyed in the wool bitch when the situation warrants.
One day, I was sitting in a faculty meeting with the suspicion that women in our department were not being taken seriously. As a statistician, I decided to collect a little data. I drew a cross-tabulation. The rows were gender of the speaker and the columns were whether the next speaker responded – questioned, followed up, elaborated – or ignored the comment as if the speaker hadn’t even said anything. Of the speakers, 80% were male (the department was about 50% female) and of the remaining 20%, most of the comments were made by me. Near the end of the meeting, I made a comment and again, a male member of the faculty made a remark as if I hadn’t spoken. I pounded on the table and said,
“I said something and God damn it, you are all going to listen to me!”
Then, I mentioned the data I had been collecting during the meeting (believe me, the chi-square was highly significant). The two department chairs present were somewhat embarrassed but no one argued with my data. We discussed whatever the topic was – I think it was reducing our mathematics requirement for general education.
Personally, I don’t have a problem pounding on the table and swearing if that’s what it takes. Three points:
- The men in the room didn’t need to be that way.
- Not all women are like me.
- Not all women should HAVE to be like me. I have a pretty high self-esteem but not so high that I think everyone must be like me because I am so perfect.
Women who support Arrington’s view on Tech Crunch that it isn’t men’s fault that there aren’t more women in tech because “After all, look at me, I’m not complaining and I’m doing great” are perhaps missing the point that they are doing well because they have certain characteristics that men don’t need to have.
In my copious spare time, of which I have none, I teach judo. In 1984, I was the first American to win the world judo championships. One very important lesson it took me a while to learn as a coach is that not every athlete is me. Not every world class athlete is me. I would have been a better coach if I had learned that lesson earlier.
2. I aggressively seek out mentors and figure everything out all by myself if I can’t find them.
I was discussing this with a young woman today. She’s probably my daughter’s age. We were working on a program and I commented to her that I had noticed she did not get the off-handed kind of help that the male staff members got. The men tell one another excitedly about new apps, new functions, bug fixes and other interesting and useful information they come across. She said,
“Now, I don’t understand this program at all, but you are explaining it to me where the guys would just be like – here, you’re not interested in this, let me do it. Or, you don’t know how to do this, so just go away.”
I know she is telling the truth because I have seen just exactly that happen to her many times. Maybe I should be more of a mentor. I feel a little bad about that, but she doesn’t work for me, and hey, I am busy. I told her,
“Well, of course you don’t understand it! No one comes out of the womb knowing this shit. But you’re smart, you’ll get it. Just keep plugging away. If you have any questions, ask!”
The program she wrote in the end was very good. Women, much more than men, in my experience, need to be immune to subtle and not so subtle discouragement, to disrespect. While Arrington says that Tech Crunch goes out of their way to invite women, these are the women who have already made it. Where men generally don’t go out of their way, and in fact, don’t even think about it, is in the unexamined assumptions and treatment of women. Most of the men this woman works with are very nice people who like women in general and are married to one or would very much like to be some day. They don’t treat me like this because …
3. I got all the credentials
I have a Ph.D., two master’s degrees, 28 years of programming experience, articles published in academic journals and so on. There is an enormous body of literature in social psychology on bias. In brief, the same study, done over and over, runs like this.
Otherwise identical resumes are sent out. Half of them have experience but no degree. Half have a degree but no experience. Of the entire sample of resumes, half have a name (or picture) that shows the candidate to be female (or black). The other half are male (or white). The two resumes, experience versus degree and male versus female (or white versus black), are then sent out randomly to a group of college students, personnel managers or whatever group.
The results are always the same. Overwhelmingly, the male (or white) candidate is selected. Those who choose the male candidate swear it had nothing to do with gender, he had experience. However, the managers/students/whatever who had the reversed resumes swear it had nothing to do with gender, he had a degree.
This is why all those people who loudly proclaim “I’m not a racist” or “I’m not sexist” have me wanting to slap them.
I have experience and I have the degrees. I work with men who don’t have nearly the educational qualifications I have. These guys are SMART and they’re fun and I like working with them. I truly don’t believe any woman would get the jobs they have without a graduate degree – and guess what, there are very, very few.
The one you hear next, of course is, “You just wouldn’t fit in with our team.”
Despite the impression I might give, I actually believe that most people are genuinely good at heart and well-meaning, that the false assumptions and subtle discrimination are not intentional and that they really would try to change most of them if it was pointed out. Some people are just jerks, though. There are people who would never want me working for them because I “am not a team player”.
Let me give you an example of a person who said he would never have me work for him.
I had written a program that was, if I do say so myself, a pretty kick-ass awesome piece of work. As most things that are that awesome, there were other people who helped, who came up with design suggestions, reviewed the results and made recommendations for improvement. All of the coding was done by me. I don’t get the chance to just write code that much and I was justly proud of this product.
4. Have the luck to have awesome bosses and mentors
We had a matrix management model at the time and the project manager, who was not my boss, came to me and wanted to have UMF review all of my work and “check that it is correct”. Now UMF is male and fits the stereotype of what a programmer should look like, which, I could gauge from this, is not a Latina grandmother. UMF also is a complete waste of oxygen as a programmer. Think the absolutely stupidest code you have ever seen written and that is UMF. I did not make up the acronym UMF. This is how he is referred to by the other programmers. The U stands for useless.
I said,
“No.”
Short version of long story, the project manager weenie went to my boss and told him that,
“AnnMaria says she’s not going to do this.”
to which my boss responded,
“Well, I guess that means she’s not going to do it.”
Dr. Richard Eyman was my doctoral advisor. He spent endless hours teaching me statistics. When everyone but me dropped one of the upper level doctoral courses, he taught it to me as an independent study. He introduced me to his friends who were profoundly competent in psychometrics, people like Jane Mercer. He made sure I took courses from people like Keith Widaman and Lew Petronovich.
It was just luck. I attended UC Riverside because I was pregnant with my second child, my husband had just taken a job at Rohr Aircraft in Riverside and I didn’t want a blank spot on my resume while I was out of the job market having a baby (which turned out to be two babies in thirteen months).
5. I’m not bothered that no one in the room looks like me.
Being in judo probably helped my career. I’m startled by the number of female judo competitors I meet who are in the tech field. It’s kind of ironic that a non-male, non-Japanese American would be the first from the U.S. to win the world championships because that is certainly not the demographic of U.S. judo competitors. I’ve spent so much of my life being the only woman in the room that I am used to it. It’s actually gotten better. I remember 28 years ago when I was pregnant (hence needing a bathroom every 30 minutes) at a meeting in an aerospace plant where NO ONE knew where there was a women’s restroom because all of the people I was meeting with were male engineers. I finally spotted another woman, grabbed her and said I KNOW you know! Turns out she was just visiting, and, in fact, did not!
Whether it should bother you or not that no one is like you (that you “just don’t fit in”) is a separate issue.
My point is that there are a number of characteristics that women must have that men don’t need to be successful in technology.
These are but PART of the reasons I see that there are few women in technical fields. And why, exactly, is pointing this out called whining?
Aug
18
Years ago, I read a science fiction story about a future where all plays were performed by robots that had been programmed with the combined characteristics of the world’s best actors. An aspiring actor sadly asked the technician working on the computers to run these robots:
“What would you do if they invented a little black box to do your job?”
The technician paused thoughtfully and responded,
“Well, I guess I’d get me a job making them little black boxes.”
Remember “word processor”? You probably don’t, but thirty years ago it was one of the fastest growing careers in America; women were moving out of the typing pool to “technical” jobs using word processing machines. And then, applications like WordPerfect, Word, Wordstar and Applewrite got to be easy enough that no one needed a word processing department and all of those jobs evaporated practically overnight.
I think that as soon as you have down DATA steps, PROC SORT and other basic steps, it’s really time for you to look around for new challenges. The world is changing, whether you like it or not. SAS Enterprise Guide, Stata, SPSS, JMP, S-plus and even Excel/Open Office to some extent, are all making it easier and easier for people to merge, create and analyze data sets all on their own using pre-written procedures that can be selected from a menu and populated with choices dragged and dropped in a pop-up window.
When you start out as a SAS programmer, you use the pre-written formats to make your data read “August 18, 2010” instead of 8/18/10 and you think you are cool. Then you find out you can create your own formats and have it print “Dennis’ birthday” for the 18th and “Naked Mole Rat Appreciation Day” for the 19th and so on and you are convinced that you are super-cool.
If you liked PROC FORMAT allowing you to write your own formats, you’ll really love the SAS macro language, a major extension of Base SAS that allows you to write a program to write programs. But … learning to write macros is a big leap toward being an expert programmer, especially for someone new who is just getting used to PROC MEANS.
There are a couple of baby steps you can take to get started in that direction. The first is the use of %INCLUDE. I usually introduce this to new programmers first because it is relatively easy.
%INCLUDE is a gentle way to get a novice introduced to the concept of SAS macros, which can be a bit intimidating.
When you include something, unlike with a macro, you don’t need to change any of the variables, statements or functions. Debugging is a snap because you can run the code exactly as is, copy it into another file once it is bug free and then run it. Yet, it leads you into the idea of having code from somewhere other than within your program run over and over.
So, what does %INCLUDE do?
It includes programming statements from another file outside of your program. There are two reasons to use this very frequently. One is to save copying or typing the same code over and over. A common example is acknowledgements of funding, legal disclaimers, legends or other text that might always go at the bottom of every page on a website, such as:
Footnote1 "Material under the Creative Commons license for this site" ;
Footnote2 "can be freely produced as it was created by pixies who live on air" ;
Footnote3 "Other brand and product names are trademarks of their respective companies." ;
Footnote4 "Which are probably owned by the devil and run by communists" ;
Footnote5 "Except for SAS" ;
Footnote6 "and SPSS (now owned by IBM) on Thursdays" ;
Footnote7 "and Stata which is in Texas and run by cowboys." ;
If you have 10 lines of this, you can copy and paste it every time you write a new program, or you can simply save the footnotes to a file and whenever you need them use a line something like this:
****** This is the text for footnotes **** ;
%include "c:\myfiles\mysasfiles\footers.sas" ;
You really don’t need those ten lines of footnotes getting in your way every time you’re trying to read the program and de-bug it or document it. A major advantage over copying and pasting the same lines in every program is that if you get a new grant, or SAS gets bought by IBM, you don’t have to go find every program and change that code. I can just change it in the footers.sas file and I’m done.
A second common use for %include is, again, to make the code more readable and get rid of distractions. While it is kind that ICPSR includes PROC FORMAT code with many of the SAS data sets available on its site, these can get in the way.
You may get a program from ICPSR or another source that has hundreds of lines of Proc Format, Label and Format statements (this happens to me all the time).
proc format;
value statnum 1='(1) Alabama'
2='(2) Arizona'
3='(3) Arkansas'
And a hundred more lines of the same …
Followed by :
label
rec_bh = 'bh:record type'
statnum = 'numeric state code'
ori = 'originating agency identifer' ;
format state statnum. division divisn. ;
And a hundred more lines …
Moving the Proc format, labels and formats cuts the length of the program in this example that I had received from 387 lines to 160. There are several reasons I may want to do this. The first is debugging. Before I am sure I read in the data exactly the way I want it, the formats are pretty irrelevant. Often, for whatever reason, the file is not in the exact format I expected. I don’t want to slog through a lot of useless formats and labels before the data are actually read in. When I do have a permanent data set, I may want to use those formats but not have to see hundreds of extra lines in my code.
Important point: %INCLUDE statements are executed as they are read. When SAS hits a %INCLUDE statement it is as if the code was copied into your program at that spot. Think about this. PROC FORMAT is a step of its own and has to run before the DATA step that uses the formats, while LABEL and FORMAT statements have to sit inside the DATA step. What this means is that I had to move the FORMAT procedure into one file and the LABEL and FORMAT statements into another file.
My program now looks like this:
***** Proc Format to create user-written formats ***** ;
%include "c:\myfiles\mysasfile\crimefmt.sas" ;
Data libref.crimes ;
Infile datasetname ;
Input variables ;
<bunch of other statements >
******* Labels and Formats *********************** ;
%include "c:\myfiles\mysasfile\labelfmt.sas" ;
run;
Note that if I have the labels and formats after the run statement, it will give me an error.
Try it. It’s very non-threatening and once you get used to seeing those %____ in your program you are ready to move on to the next baby step, assigning macro variables.
Don’t believe that crap about how hard programming is or how hard statistics is. Everything is hard when you first start doing it. As a good friend of mine, who is a very good coach said at practice one day,
“When you learned to walk it was hard. You fell down on your butt. You screamed, you cried, you were really frustrated. But you didn’t give up and now you walk around all the time and walking, that’s just nothing to you.”
Yeah, it’s like that.
Aug
5
Ubuntu 64, SAS 9.2 & What I Do All Day
The next time your boss asks what you do all day or why it takes you so long to answer a question, show him or her this …
Now, unlike this blog, where I basically drink Chardonnay and say whatever the hell I feel like, when people are paying me for answers, I take my work pretty seriously. Years ago, on the very cool resource SAS-L, there was a (thankfully, short) trend where people would post answers with the note “Code not tested”, which is a polite way of saying, “I don’t know if this will work.”
When anyone asks me a question, unless it is something like “Do you end SAS statements with a period or a semi-colon?”, I test my answer before I send it. For one thing, with changes in versions, different operating systems and other variations, something that may work in one situation may not work in another.
So… the very simple question asked was:
“How do I renew SAS for Linux 64?”
First problem: I do not have SAS installed on a 64-bit Linux system to renew, so I decide to install it on this Ubuntu VM I happen to have, and it doesn’t work at all. Well, I had ASSUMED that since it is a VM running on a 64-bit Mac that also has a 64-bit Windows 7 VM and a 64-bit Vista VM that it must be a 64-bit Ubuntu VM.
The first time the install failed I thought, gee, maybe I should check. Well, I had created this VM about a year ago to test some things on the 32-bit version of Ubuntu for someone so …
Delete VM I no longer need.
http://www.ubuntu.com/desktop/get-ubuntu/download
Download the 64-bit iso.
Curious that it says not recommended for daily desktop usage. Read several posts speculating on why it said that but nothing to convince me it was a big problem.
http://ubuntuforums.org/showthread.php?p=9370540
Downloaded and installed anyway.
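A quick check along these lines, run inside the VM, settles the 32-bit versus 64-bit question before any install attempt. This is only a sketch and not part of the original troubleshooting: uname -m is a standard command, but the function name bits_for_machine and the short list of machine names it recognizes are assumptions made up for illustration.

```shell
#!/bin/sh
# Map a machine name, as printed by `uname -m`, to a word size.
# Only the common PC names are listed here; anything else is "unknown".
bits_for_machine() {
    case "$1" in
        x86_64|amd64)        echo 64 ;;
        i386|i486|i586|i686) echo 32 ;;
        *)                   echo unknown ;;
    esac
}

# Report what the running system says about itself.
machine=$(uname -m)
echo "uname -m reports $machine, word size: $(bits_for_machine "$machine")"
```

If the word size comes back as 32, a Linux 64 deployment is not going to install, no matter what the host machine is running.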
Second problem: COULD NOT GET PAST LOGIN SCREEN! Ubuntu would not take my password. It was as if it did not recognize the keyboard. Turns out this is a known problem with VMware/Mac/Ubuntu combination.
Must be relatively new because I did not have it before.
Searching Google, source of all knowledge, I found the answer on the source of all knowledge on Ubuntu, the Ubuntu forums:
http://ubuntuforums.org/showthread.php?t=1467905
I’ve had the keyboard problem with Ubuntu 10.04 in VMware Fusion 3.0.2 on a MacBook, US keyboard layout. I got around the problem as follows:
- At the logon screen, go to the Accessibility Preferences at the bottom of the screen, and tick on screen keyboard.
- You may have to reboot if the virtual keyboard doesn’t start.
- You can now type in your password using the virtual keyboard on the logon screen. Once logged in, your physical keyboard works.
To fix the problem with your keyboard
- Open a terminal, and reconfigure your console using the command:
sudo dpkg-reconfigure console-setup
Once logged in, I went to
cd /media
cd to the external hard drive where all my SAS deployment folders are (we have MANY versions of SAS for many different operating systems)
Then did
cp -R sas92Linux64 ~/sasinst
This copied all the files and directories in the deployment folder to the ~/sasinst folder on my hard drive
I changed the system shell, /bin/sh, to bash
sudo rm /bin/sh
sudo ln -s /bin/bash /bin/sh
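Before swapping the link, it can help to see what /bin/sh points to in the first place (on Ubuntu it is normally a link to dash). The sketch below is an illustration rather than a step from the original install; link_target_name is a made-up helper, and it assumes the GNU version of readlink, which is what Ubuntu ships.

```shell
#!/bin/sh
# Print the base name of whatever a path finally resolves to,
# following any chain of symbolic links (readlink -f is GNU coreutils).
link_target_name() {
    basename "$(readlink -f "$1")"
}

echo "/bin/sh currently resolves to: $(link_target_name /bin/sh)"
```

If you do decide to make the change, sudo ln -sf /bin/bash /bin/sh replaces the link in a single command, so there is never a moment when /bin/sh does not exist.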
I then went to the folder where I had copied the deployment files, typed
sudo ./setup.sh
Third problem: The deployment wizard started …. and then stopped
Nothing.
Now, at this point I have created a new VM, gotten around the keyboard problem, copied over the files and still nothing.
I then found this FABULOUS web page from the National Center for Ecological Analysis and Synthesis
http://help.nceas.ucsb.edu/Install_SAS_on_Ubuntu
VERY IMPORTANT ADVICE …
install required packages:
sudo apt-get install xauth x11-apps libstdc++5 ia32-libs libxp6
Then, of course, comes the question, where do you find libstdc++5?
http://packages.debian.org/stable/base/libstdc++5
So…. I downloaded libstdc++5 and ran the command above to get everything I needed installed.
THEN …
I went back and typed
sudo bash setup.sh
Everything installed.
Getting SAS to work
First, create a directory to act as your working directory. I called mine tmp
I went to /usr/local and typed
mkdir tmp
Give everyone access to write to the tmp directory
sudo chmod 777 /usr/local/tmp
To get SAS to start the first time I had to type
/usr/local/SAS/SASFoundation/9.2/sas -work /usr/local/tmp
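The directory setup above can also be written so that it is safe to run more than once. The sketch below is a variation, not what was actually typed during the install: it swaps in mode 1777 (777 plus the sticky bit, the same mode the system /tmp uses) so that users can delete only their own files, and make_work_dir is a hypothetical helper name. It is demonstrated on a throwaway directory so it can be tried without sudo.

```shell
#!/bin/sh
# Create a scratch directory that everyone can write to.
# mkdir -p does not complain if the directory already exists.
# Mode 1777 is 777 plus the sticky bit, so users can delete only their own files.
make_work_dir() {
    mkdir -p "$1" && chmod 1777 "$1"
}

# Try it on a throwaway location, look at the result, then clean up.
demo_parent=$(mktemp -d)
make_work_dir "$demo_parent/saswork"
ls -ld "$demo_parent/saswork"
rm -rf "$demo_parent"
```

For the real location the equivalent would be sudo mkdir -p /usr/local/tmp followed by sudo chmod 1777 /usr/local/tmp.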
———– One more thing —-
Since I always like to tie up the loose ends, I took one final bit of advice from the wonderful NCEAS site and typed the following :
sudo ln -s /usr/local/SAS/SASFoundation/9.2/sas /usr/bin/sas
Now when I start SAS all I need to do is type
sas -work /tmp
I like that better.
Fourth problem: I get an error message telling me that there is a mismatch between my license and the version of SAS I have installed. I talk to a few people at SAS and get several other things I need, like an electronic software download for the latest version of SAS for Linux 64, the SID file to renew Linux 64 for a different site, and I have a long conversation with someone who tells me that what I have installed is Linux 32. Since the current (expired) SID file says Linux 64 and when I run a PROC SETINIT it says Linux 64, and the original ESD from a year ago says Linux 64, I tell her that while it is theoretically possible that it is, in fact, Linux 32 which was mislabeled in four different ways, I kind of doubt it. She finally gives in to my superior logic, tracks down the SID file for this version and emails it to me.
Once you do actually have the correct SID file …
Here is the answer I actually sent on how to renew SAS on Ubuntu (Remember, that was the original question)
================
1. Open up a terminal window (under applications > accessories)
2. cd to where you have your sas software installed
cd /usr/local/SAS/SASFoundation/9.2
3. Start the renewal utility by typing
sudo ./sassetup
4. Hit enter to continue
5. Type 1 for Run Setup Utilities
6. Type 1 for Renew SAS Software
7. You’ll be prompted for the SAS installation data (SID) file. If you downloaded it from an email sent to you by your SAS Administrator, it will probably be something like
/home/ademars/Downloads/SAS92_Linux64.txt
(by the way, when I tried ~/Downloads/SAS92_Linux64.txt it didn’t work)
The Setup Utilities Menu will pop up again.
Type q
You have now renewed.
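About the ~ path that didn’t work in step 7: a likely explanation (an assumption, not something confirmed with SAS) is that ~ is expanded by the shell, and text typed at a program’s own prompt never passes through the shell. A small helper like the one below prints the full path to paste in instead. The name expand_tilde is made up for illustration, and it only handles the plain ~ and ~/ forms.

```shell
#!/bin/sh
# Rewrite a leading "~/" as an absolute path under $HOME.
# Anything else is printed unchanged.
expand_tilde() {
    case "$1" in
        '~')   printf '%s\n' "$HOME" ;;
        '~/'*) rest=${1#'~/'}
               printf '%s\n' "$HOME/$rest" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

# The single quotes stop the shell from expanding ~ itself, so the
# function sees the same literal text a prompt would receive.
expand_tilde '~/Downloads/SAS92_Linux64.txt'
```

For a user whose home directory is /home/ademars, this prints the same /home/ademars/Downloads/SAS92_Linux64.txt path shown in step 7.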
Aug
1
It’s not you, it’s me: Getting SAS EG, S+ to run on VMWare
Vista on VMware was running slow. I mean painfully slow. Like, bamboo shoots under your fingernails painful. As in banana slug slow.
SAS Enterprise Guide was running SO slow on VMware I had gotten to the point where I would read a book while waiting for it to open, or to view results in a project. These are results from tasks that I had run previously, so we’re talking just moving from one window to another.

On the positive side, I read a few chapters in Decision Trees for Business Intelligence and Data Mining, by Barry De Ville, a book I recommend.
I checked the Task Manager and it said CPU usage was 100%, which seemed very odd. I had SAS running on several virtual machines on four other Macs and it ran fine. The one I happened to be using lately though was a laptop my wonderful husband gave me to replace the 17 inch one I dropped. You know those commercials where they show the laptop being dropped in the airport, the traveler panicking and then everything is fine due to the Titanium case?
Yeah, well, that didn’t happen.
So… while it was getting a new screen I was using a 13-inch laptop, which really did not work since I have terrible vision. I hooked it up to a 28 inch monitor and all was well except …
When I tried to run SAS, it was unbelievably slow. It wasn’t that not enough memory was allocated – I had given it 2 GB which was the same as three of my other computers, so it should have enough memory. I had VMWare installed on all of them, but the others were all running Windows XP, Windows 7, or Windows Vista 64. I came to the (erroneous, it turns out) conclusion that on Vista 32, SAS Enterprise Guide sucked as bad as when it first came out ten years or so ago when it was glacially slow. I was so frustrated I thought I would go with option A, use S+, which I had been meaning to do more with for a while. Option B involved moving four feet to the desk behind me and copying the data over.
When S+ took forever to start, the light dawned. Obviously, it’s not you, it’s me.
Coincidentally, I had just been thinking today about how there is no time to check ALL of the things we hear or assume we know. I’d been told at some point that you shouldn’t allocate more than half the RAM for a virtual machine because then memory swapping would occur. When I customized the settings for the VM the pop-up menu suggests not using more than 3GB. 2 is less than 3 (see how good I am at math?) so I should be fine, right?
Then, I read on the macrumors forum a post by someone who reported their performance slowing considerably once they went over a third of the memory. So… contrary to the normal expectation that allocating more memory makes your virtual machine run faster, this is, in fact, a curvilinear relationship: past some point, allocating more makes it run slower.
This makes perfect sense when you think about it and I knew this. What finally dawned on me was that the computer that was so slow had 4 GB of RAM and the other three each had either 8 or 16 GB. Also, when running on my 17-inch laptop, I didn’t use two monitors because I could see it okay. Not great, but good enough.
So … I reduced the RAM allocated to the VM to slightly over 1 GB, closed the laptop and just used the external monitor instead of having two monitors and used VMware in full-screen mode.
And … all is well with the world. S+ popped right open. SAS EG is behaving again.
So, I have been reminded of a valuable lesson, which is that when it comes to software problems, sometimes, it isn’t you, it’s me.
I do want to add, though, that the same does not extend to relationships and if we ever break up, it’s definitely you.
Jul
24
JMP: Three shiny things catch my eye
Filed Under Software, Technology | 2 Comments
Hmm … so, Liz, our finance person is incomparably efficient and unfailingly nice, where I am usually efficient and have a reputation for being correct 97.6% of the time (as someone commented on twitter, if it has decimals in it, it must be true).
Between the two of us we just accomplished the impossible task of adding another statistical package for the university-wide license. Getting anything approved at a large institution requires something like the following:
recommendation and agreement to provide technical support (me), request from finance (Liz), approval from person in charge of the budget, approval from person in charge of person in charge of the budget, approval from legal department, sacrifice of a live chicken, dancing naked in the network operations center, signing of the contract with the blood of a unicorn executed by a troll under a full moon.
Well, it might be simpler than that, but not much. Since we have just agreed to increase the number of statistical packages installed by 33% with a 0% expansion in staff (what was I thinking?) it seemed like a good idea to drive down to Carlsbad and check out the JMP Explorer Seminar and see if I could steal any ideas to put up on the JMP website and FAQ which I now need to create (seriously, what WAS I thinking?).
First cool things I will put on the site are a description of the Graph Builder and a discussion of export to flash.
The graph builder is drag and drop on meth.
Here, I want to compare the correlation between the pretest and post-test by experimental and control group. I drag pretest to X, post-test to Y and Group to “Group X”.

As I was reducing the size of this graph in Graphic Converter (amazing deal at $34.95 and no I don’t get a kickback from them. I mean seriously, with as much as I talk shit about everything here do you honestly think anyone would PAY me to write about them?) to post here it occurred to me that it would be helpful to have a line that showed the pretest mean so I added that. The whole graph took about 30 seconds.
From my really cool chart here you can easily see that the majority of people in the experimental group scored above the pretest mean (that line) while the control group scored noticeably lower than the experimental group. You can also see that there is, as there should be, a stronger correlation between pre- and post-test for the control group than there is for the experimental group.

This next chart took just another few seconds to create, but as I looked at it, I realized three things. First, it would be better if I had put the sites in chronological order rather than alphabetical order because the difference between experimental and control was greatest on the last one we did (V) and least on the first one (I). Second, it would have been better if I had grouped by Group (uncreative name) on the X axis and site on the Y axis so it would be much easier to compare them side by side as in the chart above. Third,
**** AND THIS IS A VERY IMPORTANT POINT WHICH SELDOM HAPPENS HERE SO PAY ATTENTION ***
I think there is such a thing as visual literacy. Just like experienced statisticians can look at a cross-tabulation and in their heads estimate (observed – expected) and get a quick appraisal of likely size of a relationship, it takes some staring at visual data, too. The more graphical displays of data I look at, the more I see and the more ideas I get for how to do it better. While this may seem like a blinding flash of the obvious, I mention it here because I have read so many books and articles that say data visualization should not need any explanation. On one level, yes, well, maybe.
However, I think, as with statistics in general, the more you study it, the more you DO see.

Back to JMP, one of the reasons we felt it was important to add it to our campus offerings is that it allows you easily to do those explorations, to look at data from one side and then another (literally). I could have re-done the chart above in seconds. Of course, then I would have had to have opened JMP again, saved the chart, and uploaded it to this site, which would have taken me possibly two minutes. But, I have a quota of three graphics per post so I ate jelly beans for two minutes instead and then included the bubble plot as the last one because it moves, has colors and pointy-clicky things.
You laugh and sneer but lo I say to you that YouTube and Facebook each have hundreds of millions of users and all of Scientific Software International’s Item Response Theory programs put together are used by fewer people each year than the number of pigs sold in one day for Farmville. (Incidentally, Eric Greenspan of Make it Work is my hero for having bought the URL www.ihatefarmville.com which redirects to a site with information on him and his company.)
The Bubble Chart — simply include an X value, a Y value and a time value. You can also, like I did, choose a value to color by, and (as I didn’t) a value for the size of the bubble.
Here I have the different test sites (X axis), months of product testing, and score. Since these were just data I had on my computer while I was sitting in the seminar and not something like stock prices or median home prices by state the chart does not look as cool as examples that would apply to this type of visualization. What I want to illustrate here, though, is the fact that in under a minute you can drag in a few variables, then, click on the ubiquitous red arrow. One of the options is to export as flash. Now you have your chart in flash.
Click on it and you can label bubbles, zoom in, zoom out, change the speed, size and other interactive options. Did I mention it took me about 30 seconds? Almost makes me want to re-do it with something other than data I just had lying around.
Now THAT is some kick-ass statistical software when it makes you want to go out and find reasons to use it.
That kind of reaction to software is usually limited to applications that involve shooting people or pornography. However, unlike in those other options, a three-way interaction in JMP will get you neither dead nor a sexually-transmitted disease.
Jul
20
Behind the Door Marked ‘Beware of the Leopard’: Importing Excel 2007 into SAS 9.2 on Windows 7 x64
Filed Under Software, Technology | 4 Comments

Sometimes documentation can be a little hard to find…
You may be aware of the fact that, if you are running SAS 9.2 on a 64-bit Vista or Windows 7 machine, the Import Data option from the File menu does not work for Excel files.
Per SAS Usage Note 33228: (Courtesy of Peter Ruzsa in SAS Technical Support.)
You are running into this issue here,
“An error occurs when you use SAS® 9.2 to import or export Microsoft Excel or Access files in the Windows x64 and Windows Vista 64 environments.”
(Yes, we know that.)
When you use SAS 9.2 to import or export Microsoft Excel or Microsoft Access files in the Windows X64, Windows Vista 64, and Windows 2003 64-bit server environments, you can receive the following message:
ERROR: DBMS type EXCEL (ACCESS) not valid for import.
In addition, when you use the Import and Export wizards, the Excel engine is not presented as a selection.
(Yes, and this makes us sad because people insist on continuing to email us files in Excel format, and Access, too, but we have these shiny new computers running SAS 9.2 that we want to use and, on top of it all, we are out of doughnuts. They keep buying that raspberry arugula crap instead. Why do we always modernize the wrong things?)
You could save your Excel 2007 files as .csv and import them that way but that is pretty inefficient.
So, let’s read on in Pete’s note… well, actually, let’s not because it had some code in it that probably works for some people in certain situations. I was not one of those people. However, maybe you are, so you can go to the SAS knowledge base and read it here.
http://support.sas.com/kb/33/228.html
When that didn’t work, I tried swearing. Next, I went to the documentation for PC Files Server, specifically, this page
http://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#/documentation/cdl/en/acpcref/63184/HTML/default/a003353773.htm
which gives the exact correct code for running Proc Import, assuming you have the PC Files Server installed. Which, it turns out, I did not.
So …, from a different helpful person at Tech Support, I received the following:
“Note if you have an existing 9.1 or 9.2 pc file server you should uninstall it first.
1. Download the PC file server from the following location to your windows pc that is going to run the application. You can find it at this location:
ftp://ftp.sas.com/techsup/download/base/zqjpcfileserver92m3.zip
you can simply save the file to any location on the pc where you are going to install the SAS PC File Server
2. For more information on the PC file server go to this link here.
http://support.sas.com/documentation/cdl/en/acpcref/61891/HTML/default/a002645029.htm
3. Unzip the zqjpcfileserver92M3.zip file on your pc, it will unzip to the pcfilesrv__92130__prt__xx__sp0__1 sub directory where you stored the zip file.
4. In the unzipped directory named pcfilesrv__92130__prt__xx__sp0__1 double click on the setup.exe
5. This will start the install
a. The setup.exe will install the pc file server in the C:\Program Files\SAS\PCFilesServer\9.2 directory. If you are installing this on an X64 box it will install in C:\Program Files (x86)\SAS\PCFilesServer\9.2 directory because this is a 32 bit application.
b. You will have a choice to install the pc file server as a service. The checkbox selection is Start Service Now and When Windows Starts.
c. Note that if you install it as a service you must read network drive names with their Universal Naming Convention names such as \\servername\directory\filename.xls.”
After I installed the PC Files server, everything worked absolutely lovely to import Excel files, whether using the Import Data option in the File menu or Proc Import in my code ON WINDOWS 7 x64. So, my advice is that if you have a shiny new computer and a shiny new SAS 9.2 Maintenance 3 and you want to import the latest in Excel files or Access, download and install the PC Files server and you will be happy. Someone might even bring you doughnuts. But don’t count on that.
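For anyone who would rather see code than a menu, here is a minimal sketch of the kind of Proc Import call that goes through the PC Files Server. Treat the details as assumptions: the file path and sheet name are made up, the server is assumed to be running on the same machine, and 8621 is the port I believe the 9.2 server uses by default, so check your own installation.

```sas
/* Sketch only. Assumptions: the PC Files Server is running on this   */
/* machine (localhost) and listening on port 8621; the path and sheet */
/* name below are hypothetical.                                       */
proc import out=work.mydata
            datafile="C:\data\survey2010.xlsx"
            dbms=excelcs
            replace;
   server="localhost";
   port=8621;
   sheet="Sheet1";
run;
```

If the server was installed as a service and the file lives on a network drive, the datafile= value would need the UNC form mentioned in step c above.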
When I tried the same exact steps in Vista 64 I received a message: “Connection failed. See log for details.” The “details” were that SAS stopped processing this step because of errors.
Bad computer! No doughnut !

Jul
20
You May Be a Novice Programmer if ….
Filed Under Software | 2 Comments
I am writing a paper on moving from novice to intermediate programmer and got to thinking about the sort of things that people say that identify someone as a novice programmer.
NOTE: No one is allowed to feel bad for having made these mistakes. Everyone you meet will admit to having made the exact same errors at one time, except for a very few people. Those very few people are probably lying. Try to avoid having coffee with them. They are a bad influence.
( Not long ago I was on the phone with someone and they said to type something like “ls pipe command” and I actually typed the word “pipe” instead of ls | command.
as in ….
ls | mail annmaria@thejuliagroup.com
Fortunately, I did not actually hear the person say, ‘What a moron.’ A fact I attribute to the helpful invention of the mute button. In my defense, I was only on my 4th cup of coffee recovering from a conference call at 6:30 a.m. that morning with a group that apparently believed that the entire world is on Eastern Standard Time.)
These characteristics DO generally reveal you as a newbie:
- Thinking that just because your program ran and there are no messages that say ERROR in your log that your results are correct.
- Not reading your log.
- Thinking that just because your program ran with the perfectly cleaned up test data, or with the first 1,000 records, that all is now well and there will be no problems with it.
- Writing your own code for common functions like mean, log, random numbers. I don’t mean to be rude (no more than usual, anyway), but did you really think that in the previous decades no one thought about this and included it as part of the language?
- Copying and pasting the same lines over and over. If you are doing that, I’ll bet your code is almost screaming at you MACRO! or DO-LOOP or maybe ROSEBUD! (Well, the latter is the least likely, actually.)
- Not using comments, which is proof of your unfamiliarity with “Eagleson’s Law: Any code of your own that you haven’t looked at for six or more months, might as well have been written by someone else.” (I did not know that had a name until recently.)
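To make the copy-and-paste point concrete, here is a small sketch of what the DO-loop version might look like. The dataset, the variable names and the 9 = "refused" code are all invented for illustration, so read it as one possible pattern and not the only way to do it.

```sas
/* Hypothetical survey data: five items where 9 was keyed for "refused" */
data survey;
   input id q1-q5;
   datalines;
1 1 0 9 1 0
2 9 9 0 0 1
;
run;

/* One loop instead of five pasted IF statements */
data survey_clean;
   set survey;
   array q{5} q1-q5;
   do i = 1 to dim(q);
      if q{i} = 9 then q{i} = .;   /* recode "refused" to missing */
   end;
   drop i;                         /* the loop counter is not data */
run;
```

With five items the savings are small. With fifty, you change one number in the array statement instead of pasting forty-five more lines.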
There are several more but I am going to call it a night, as I have a meeting at 7 a.m. because, as the individual on the East Coast who scheduled it logically concluded, “It’s 10 a.m. somewhere.” What IS IT with you people?
Jul
6
Are you a programmer or aren’t you?
Filed Under Dr. De Mars General Life Ramblings, Software, Technology | Leave a Comment
So, I am writing a paper on how you know you (or someone else) is a “real” programmer. That is, they don’t fit in that “new user” box any more. But how do you make that decision?
Is it like pornography, you just know it when you look at it? (Not that I ever personally looked at any of course, but I have heard you can find it on the Internet if you try really hard.)
Yesterday, Rob Meekings made a comment about design decisions. That is certainly a distinction, when you get to the point that you are actually thinking that way. For example, I often will merge everything together in one long dataset, a habit that makes those who love SQL and the star schema just cringe. The REASON I do this is that most of the people I work with are researchers using very powerful computers with datasets of a few thousand observations, or, at most, a few hundred thousand. Even on a desktop, an analysis with SAS, Stata or SPSS takes seconds. It isn’t worth taking an extra hour or two to make a program run in one second instead of two. It also may make the program more difficult for the user to maintain him/herself.
HOWEVER, when I am running a program that runs against a 100GB dataset and can take hours to run because the researcher cannot use a supercomputer, e.g., due to security classification, I’ll spend a good bit of time trying to make it run as efficiently as possible.
If there isn’t a pressing reason not to do it, I’d recommend that someone with a large dataset consider running it on a cluster to take advantage of parallel processing capabilities. This means changing your code slightly to run on a different OS, often Linux or some other Unix version.
I do a lot of “throw away programming”. That’s not to say it’s garbage. Sometimes I think my work is quite good, in fact, but it’s not production code that runs every day to produce reports on 500 different stores. When I DO write production code, I do several things differently. One is that I make good use of %include statements. For example, if there is a footnote that is going to be in every single output that says, “Funding provided by National Science Foundation Rural Systemic Initiative Grant #1234-2010” and several more lines about the university, address for contact, etc., I am going to have a small file that I just include. Yes, I could copy and paste it or have that as a template for when I create a new program. BUT what happens when we get another grant and we want to recognize both funding agencies in everything we publish?
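Here is a rough sketch of the %include idea. In real life the footnote file would be a permanent file that every program points to; I write it to a temporary fileref here only so the example runs on its own. The fileref name and the second footnote line are placeholders.

```sas
/* Stand-in for the small shared file. FILENAME ... TEMP is used only */
/* to keep this sketch self-contained; the real file would live in a  */
/* permanent location that every production program can reach.        */
filename fnote temp;

data _null_;
   file fnote;
   put 'footnote1 "Funding provided by National Science Foundation Rural Systemic Initiative Grant #1234-2010";';
   put 'footnote2 "University name and contact address go here";';   /* placeholder text */
run;

/* Each production program needs only this one line */
%include fnote;

proc means data=sashelp.class;
   var height weight;
run;
```

When the second grant arrives, you edit the one included file and every program that includes it picks up the change the next time it runs.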
My point, and you may be surprised by this point to find that I do, in fact, have one, is that a distinction between novice and non-novice programmers is that the non-novices have the luxury of thinking about a design because they know more than one way to do something.
Jul
4
Signs you’re not a novice programmer
Filed Under Software, Technology | 3 Comments

I'll get this down eventually
Writing a presentation for WUSS, I had to fill out the usual check box for the intended audience:
Level of programming expertise:
___ Novice __ Intermediate __ Advanced
and I started wondering when exactly does someone stop being a novice? One answer is that your programming no longer LOOKS like it was written by a novice. That’s kind of circular reasoning, though, isn’t it? To be more specific, here are a few of those signs, generated from a survey of a random sample of 1.
(Note, if your programming does not always show all of the characteristics mentioned below, you are forbidden to feel bad. All but a very exceptional few programmers will admit to having made every ‘newbie’ mistake when they started, and on occasion, they still do when they are rushed, tired or distracted by three fighting children or after their third martini. As for that exceptional few – they’re chronic liars. Stay away from them.)
Five signs you’re no longer a novice, in no particular order ….
1. Good use of functions
AvgQtr = (Jan + Feb + Mar) / 3 ;
is a sign of a novice
AvgQtr = Sum(Jan, Feb, Mar) / 3 ;
is better
AvgQtr = Mean(Jan, Feb, Mar) ;
is what an intermediate programmer would do.
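One place the three versions part company is missing data. The sketch below uses two made-up rows, one complete and one with February missing, and computes all three so you can compare them in the output. The dataset and the new variable names are mine, invented for the example.

```sas
data qtr;
   input Jan Feb Mar;
   AvgPlus = (Jan + Feb + Mar) / 3;    /* missing whenever any month is missing     */
   AvgSum  = sum(Jan, Feb, Mar) / 3;   /* skips missing months, still divides by 3  */
   AvgMean = mean(Jan, Feb, Mar);      /* divides by the count of non-missing months */
   datalines;
10 20 30
10 . 30
;
run;

proc print data=qtr;
run;
```

On the complete row all three agree. On the row with the missing month, the plus-sign version returns missing, the Sum version divides two months' worth of data by three, and only Mean gives the average of the months that were actually recorded.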
2. You know options of options
3. You understand how the particular language you are using processes data.
For example, in SAS, let’s say you have two datasets
Pretest has the following variables: Id Age Gender Testscore
Where testscore is (obviously) the pretest score.
Posttest has the same variables: Id Age Gender Testscore
Where testscore is (obviously) the posttest score.
If you do this (bad!)
Proc sort data = libref.pretest ;
By id ;
Run ;
Proc sort data = libref.posttest ;
By id ;
Run ;
Data libref.alltests ;
Merge libref.pretest libref.posttest ;
By id ;
Run ;
You have just created a dataset in which, for every id that appears in both files, testscore holds the posttest score, because the value from the second dataset named copies over the value from the first. The pretest scores for those ids are gone.
Try this:
Proc sort data = libref.pretest out = pre (rename = (testscore = pretest)) ;
By id ;
Run ;
Proc sort data = libref.posttest out = post (rename = (testscore = posttest)) ;
By id ;
Run ;
Data libref.alltests ;
Merge pre post ;
By id ;
Run ;
Yes, you COULD have done this by at least one data step where you renamed the testscore variable, but adding an extra step is inefficient.
A good, short article on going beyond the basics in proc sort was written by Kelsey Basset.
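If you want to try the sort-and-rename approach without touching real data, here is a self-contained sketch with two invented ids. The toy datasets and the gain variable are my own additions for illustration.

```sas
/* Hypothetical toy data: two ids, each with a pretest and a posttest score */
data pretest;
   input id testscore;
   datalines;
1 50
2 60
;
run;

data posttest;
   input id testscore;
   datalines;
1 70
2 85
;
run;

/* Sort and rename in the same step, so no extra data step is needed */
proc sort data=pretest out=pre (rename=(testscore=pretest));
   by id;
run;

proc sort data=posttest out=post (rename=(testscore=posttest));
   by id;
run;

/* Both scores survive the merge under their own names */
data alltests;
   merge pre post;
   by id;
   gain = posttest - pretest;
run;

proc print data=alltests;
run;
```

Because each score now has its own name, nothing gets overwritten and you can compute a gain score in the same data step that does the merge.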
4. Use your knowledge of functions in your programming logic.
5. Don’t forget about missing values.
For example, a researcher wants to categorize people who have ANY positive response to five questions on raising taxes, “Would you vote to raise taxes if … the state budget isn’t balanced?” “Would you raise taxes if … the option was to cut social services?” and so on.
A novice response would be:
If q1 = 1 then taxes = 1 ;
Else If q2 = 1 then taxes = 1 ;
Else If q3 = 1 then taxes = 1 ;
Else If q4 = 1 then taxes = 1 ;
Else If q5 = 1 then taxes = 1 ;
Else taxes = 0 ;
Better
If sum(of q1-q5) > 0 then taxes = 1 ;
Else if sum(of q1-q5) = 0 then taxes = 0 ;
The reason for having the second IF in there is that if you do not then all of those with missing values get set to zero, which may result in throwing off your results by a great deal, depending on how frequent missing data is.
There are a variety of ways, some better, some worse. However, one statement that does exactly what we want is:
Taxes = Max(of q1-q5) ;
If any of the questions were answered 1, the value of taxes is 1. If all were answered 0, the value is 0 and if all were missing, the value is missing.
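A quick way to convince yourself is to run the Max version on a handful of made-up response patterns. This sketch assumes the questions are coded 1 for yes and 0 for no; the dataset name and the rows are invented.

```sas
data votes;
   input q1-q5;
   taxes = max(of q1-q5);   /* largest non-missing answer, or missing if there are none */
   datalines;
0 0 1 0 0
0 0 0 0 0
. . . . .
0 . 0 . 0
;
run;

proc print data=votes;
run;
```

The first row comes out 1, the second 0 and the third missing, just as described above. The fourth row, with some zeros and some missing answers, comes out 0. Whether a partly answered set of questions should count as a definite "no" is a decision for the researcher, not the function.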
I saw a similar example from SPSS on Douglas Smith’s page. Although Recode is actually a command and not a function, my point is the same. Once you proceed from being a novice, you are naturally seeing the ways you can make your program more efficient.
“Another example of using recode might be to invert the order of the values for a subjective evaluation variable. For instance, the variable “happy” has three valid response categories:
1 = Very Happy
2 = Pretty Happy
3 = Not Too Happy
You might want to change the order to go from least happy to most happy. To do this, all you need to do is swap the values 1 and 3. The recode statement that will accomplish this is:
recode happy (1=3) (3=1).”
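For the SAS users in the audience, a rough equivalent of that SPSS recode might look like the sketch below. It writes the flipped value to a new variable, which is my own choice and not something from the page quoted above, so that the original coding stays available for checking.

```sas
data happy_rev;
   input happy;
   /* Swap 1 and 3; leave 2 and missing values as they are */
   if happy = 1 then happy_r = 3;
   else if happy = 3 then happy_r = 1;
   else happy_r = happy;
   datalines;
1
2
3
.
;
run;
```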
Oh, and if you don’t use the command window, much less the Do-file editor in Stata, you are definitely a novice. Same goes for anyone who doesn’t write syntax for SPSS or hasn’t found a use for the Program window in SAS Enterprise Guide.
That isn’t to say that there will never come a day when one can be considered a programmer by simply being very good at pointing and clicking.
Just sayin’ …. today is not that day.