Meta-blogging, social media & naked mole rats
July 31, 2010 | 1 Comment
This is a blog about blogging. On the fun scale, meta-blogging probably falls midway between metadata and meta-analysis but I am going to do it anyway. In stages, because, just like if it has bullet points it must be serious business, if it has stages, numbers and units you’ve never heard of, like petapixels, it must be serious research:
Stage 1: Claim Guru Status
I am a social media expert because I have three blogs and accounts on Facebook, LinkedIn, twitter and Bebo. I’m also a member of sasCommunity.org and a bunch of things like Second Life I randomly signed up for and then forgot about.
Stage 2: Scoff at other self-styled gurus
Seriously, what is it with these people who claim to be experts on social media, anyway? As someone said, I must listen to them because they have days of experience. How long does something have to exist before you can claim to be an expert on it? What makes you an expert, anyway?
I have been told over and over by self-proclaimed media gurus that everyone must have an account on Facebook or they’re missing out because half a billion, 700 million, 5 teraflops or a petabyte of shakes of a lamb’s tail of people are on Facebook.
We make money at The Julia Group by doing survey design, evaluation research, quantitative and qualitative analysis. We have some consultants who are terrific statistical programmers and others who are experts in qualitative research. Now, twice in the past twenty years, I have gotten substantial contracts because I was walking out a door as someone I knew was walking in and she (both times it was a she) said,
“I can’t believe I ran into you! I really need a statistician. Are you available?”
So, given that…. it is theoretically possible that someone will see my Facebook page (well, actually, they won’t because it’s private) and say,
“Hey, you rock at Mafia Wars! How about doing $150,000 worth of data analysis for us over the next five years to evaluate our $3 million project?”
I think it more likely, though, that we will continue to get business by submitting grant proposals, bidding on contracts and having people I have worked with in the past call and say,
“We need someone with your expertise on this project. Can you work with us?”
Before the economy went south it seemed to me like the only thing on LinkedIn was headhunters and job seekers. Since I was neither, I never had any use for it. Lately, I’ve given it a second look and it seems mildly interesting. I can’t say I’ve seen any huge benefits but I haven’t been using it much either, so I’d say the jury is still out.
What about Second Life, the place where, a few years ago, any serious company had to be, and if you weren’t there you were a dinosaur missing out on the “new Internet”? This article, conveniently written this week, is entitled “How to save Second Life”. I guess the title gives a clue how that’s going.
Stage 3: Pontificate
What does work? I have run into interesting people all over the U.S. and in a few other countries who say they read my blog, which has gotten me a few free beers. I have occasionally gotten work or hired people I met at conferences, so, it is possible that some of these people who read my blog may eventually become clients or employees. If not, hey, I got free beer, so hurray blog!
Someone said that Facebook is the people you went to school with while Twitter is the people you wish you went to school with. For me, personally, twitter has been the most useful social media in that it has given me a lot of references to useful websites. To paraphrase Douglas Adams, the Internet is big, really big. Having people with interests like mine who pass along sites of interest to them has yielded a number of very useful leads on development in fields I’m interested in, latest research results or just new ideas.
“The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, ‘hmm… that’s funny…'”
– Isaac Asimov
Twitter has delivered more than its fair share of those “that’s funny” moments.
Stage 4 (optional): Say something remotely sensible. If that fails you, just rant
This is the third blog I started. My earlier two blogs were focused on judo (I was the world judo champion 26 years ago and still coach) and for Spirit Lake Consulting, which was our parent company before The Julia Group was spun off as a separate company.
I write this blog because I feel like it. The main benefit for me, in addition to the free beer, is that I have forgotten more about statistics and statistical software than most people know. (Part of this is simply a function of being older than most people.) The problem is, that I have forgotten it. So, I post it here. Suppose I vaguely remember that I did something with AGGREGATE in SPSS that added the number of times a person appeared in the database to each case, or that I did install SAS Enterprise Miner for Desktop and here I am a year later wondering how the hell was it that I did that. I can go search my blog and it pops up.
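For the record, here is a minimal sketch of that "add the number of times a person appeared to each case" idea. It is not SPSS syntax, just the same logic in Python, and the records, the `person_id` field and the `n_records` field are all made up for illustration.

```python
from collections import Counter

# Made-up records: one row per visit, so a person can appear several times.
records = [
    {"person_id": "A01", "score": 12},
    {"person_id": "B02", "score": 7},
    {"person_id": "A01", "score": 15},
    {"person_id": "C03", "score": 9},
    {"person_id": "A01", "score": 11},
    {"person_id": "B02", "score": 8},
]


def add_occurrence_count(rows, key="person_id", new_field="n_records"):
    """Return copies of the rows with the per-key record count attached to each."""
    counts = Counter(row[key] for row in rows)
    # Build new dictionaries so the original rows are left untouched.
    return [{**row, new_field: counts[row[key]]} for row in rows]


with_counts = add_occurrence_count(records)
for row in with_counts:
    print(row)
```

Every case keeps its own values and gains the count for its person, which is the part I can never remember how to do a year later.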
Since I write it for me, I am often surprised that other people read it, and I am really surprised when I Google a phrase looking for information that my blog comes up on the first page. Sometimes it reminds me of a book I liked when I was about ten years old, called The Pushcart War. At one point, there is a public TV show on the topic of traffic problems with various experts and a movie star. During the show, the movie star states that the problem with traffic is “The trucks are too big and there are too many of them.” And, the author says, since no one understood what the experts said and they all understood what the movie star said, that was the part they paid attention to.
I think it is because I am writing for me that much of what I say is very accessible. I wonder how many statisticians when they are writing notes for themselves do like me and explain
AUTOREGRESSIVE – autoregressive means something is regressed on itself
As this very cool page from the National Institute of Standards and Technology says
An autoregressive model is simply a linear regression of the current value of the series against one or more prior values of the series.
… and how many of them write as if they are trying to fit a mold.
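If "regressed on itself" still sounds abstract, here is a rough sketch of the simplest case, a regression of each value on the one before it. The simulated series, the seed and the function name `fit_ar1` are assumptions made up for this example; real software fits these models with more care than a plain least-squares line.

```python
import random
from statistics import mean


def fit_ar1(series):
    """Least-squares fit of x[t] = intercept + slope * x[t-1]."""
    prior = series[:-1]    # the predictor: each value's predecessor
    current = series[1:]   # the outcome: the value itself
    mean_prior = mean(prior)
    mean_current = mean(current)
    sxy = sum((p - mean_prior) * (c - mean_current) for p, c in zip(prior, current))
    sxx = sum((p - mean_prior) ** 2 for p in prior)
    slope = sxy / sxx
    intercept = mean_current - slope * mean_prior
    return intercept, slope


# Simulate a series where each value depends on the previous one plus noise.
rng = random.Random(2010)
series = [5.0]
for _ in range(2000):
    series.append(2.0 + 0.6 * series[-1] + rng.gauss(0, 1))

intercept, slope = fit_ar1(series)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The only trick is that the predictor and the outcome are the same variable, shifted by one time point.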
Of course, now that I find more people are actually reading my blog I have given some thought to being (slightly) more serious and not say things like PROC MI is used to make up data, the MI standing for “my imagination”. I haven’t actually done anything. I’ve just thought about it. Sort of like the world’s most spoiled twelve-year-old’s view on cleaning her room.
[If you have a mad desire for a serious blog on multiple imputation, I did write one. Your life is now complete. You’re welcome.]
The rant part
If we were a bigger company, and if I wasn’t the majority owner, this blog would be very different. There would be no profanity. There would be no mention of other companies’ products unless they paid us. There would definitely be no mention of any company’s products having all the attractiveness of a sexual experience with a naked mole rat for fear they might sue us. There would be a legal review, an editorial review, a graphic designer to tell me that my blog too often features pictures of naked mole rats.
Someone would tell me that I cannot end sentences with prepositions, that “cannot” is actually one word, that the word count for the average post exceeds what research says people will read and that I don’t write in the tense recommended by the Microsoft Style Guide. All of these people would be paid. My blog would cost many, many times what it currently costs to produce (which is the value of my time between midnight and 1 a.m. and a glass of Chardonnay) and take at least 60 times longer from beginning to actual publication (which is currently from the time I get an idea to the end of the hour it takes me to slowly sip a glass of white wine, ramble and locate pictures of naked mole rats).
I don’t know how blogging went from being an individual’s musings on life to the property of corporate weenies. This isn’t to say there aren’t some corporate blogs that don’t suck. Jon Peck did a good one for SPSS that I hope will eventually be reinvigorated once the whole IBM takeover gets sorted out. Chris Hemedinger does a cool one on SAS Enterprise Guide.
Most corporate blogs, though, fit the cartoon by Hugh MacLeod in his awesome book, Ignore Everybody,
“Welcome to Nobody Cares. population 6 billion.”
I started this blog when a web site editor for an organization I was working for told me that my web pages on statistics could not be approved until they were rewritten because,
“Our entire website has to have one voice and that is the CEO. You cannot use examples that might offend anyone. You must speak in a formal tone. Your pictures are too big and often inappropriate. Your headers are not the right size. You need to conform with the Microsoft Style Guide (attached). I am sure you put a lot of work into this and some people (sniff) seem to think you are very bright but I am sorry we cannot post anything that might cause anyone to call and complain. I suggest that if you want to have your own voice and express your own opinions, you write a blog.”
So, I did. And the Microsoft Style Guide (whatever that is) can bite me.
In case you were wondering, no naked mole rats were molested in the writing of this post.
JMP: Three shiny things catch my eye
July 24, 2010 | 4 Comments
Hmm … so, Liz, our finance person is incomparably efficient and unfailingly nice, where I am usually efficient and have a reputation for being correct 97.6% of the time (as someone commented on twitter, if it has decimals in it, it must be true).
Between the two of us we just accomplished the impossible task of adding another statistical package for the university-wide license. Getting anything approved at a large institution requires something like the following:
recommendation and agreement to provide technical support (me), request from finance (Liz), approval from person in charge of the budget, approval from person in charge of person in charge of the budget, approval from legal department, sacrifice of a live chicken, dancing naked in the network operations center, signing of the contract with the blood of a unicorn executed by a troll under a full moon.
Well, it might be simpler than that, but not much. Since we have just agreed to increase the number of statistical packages installed by 33% with a 0% expansion in staff (what was I thinking?) it seemed like a good idea to drive down to Carlsbad and check out the JMP Explorer Seminar and see if I could steal any ideas to put up on the JMP website and FAQ which I now need to create (seriously, what WAS I thinking?).
First cool things I will put on the site are a description of the Graph Builder and a discussion of export to flash.
The graph builder is drag and drop on meth.
Here, I want to compare the correlation between the pretest and post-test by experimental and control group. I drag pretest to X, post-test to Y and Group to “Group X”.
As I was reducing the size of this graph in Graphic Converter (amazing deal at $34.95 and no I don’t get a kickback from them. I mean seriously, with as much as I talk shit about everything here do you honestly think anyone would PAY me to write about them?) to post here it occurred to me that it would be helpful to have a line that showed the pretest mean so I added that. The whole graph took about 30 seconds.
From my really cool chart here you can easily see that the majority of people in the experimental group scored above the pretest mean (that line) while the control group scored noticeably lower than the experimental group. You can also see that there is, as there should be, a stronger correlation between pre- and post-test for the control group than there is for the experimental group.
This next chart took just another few seconds to create, but as I looked at it, I realized three things. First, it would be better if I had put the sites in chronological order rather than alphabetical order because the difference between experimental and control was greatest on the last one we did (V) and least on the first one (I). Second, it would have been better if I had grouped by Group (uncreative name) on the X axis and site on the Y axis so it would be much easier to compare them side by side as in the chart above. Third,
**** AND THIS IS A VERY IMPORTANT POINT WHICH SELDOM HAPPENS HERE SO PAY ATTENTION ***
I think there is such a thing as visual literacy. Just like experienced statisticians can look at a cross-tabulation and in their heads estimate (observed – expected) and get a quick appraisal of the likely size of a relationship, it takes some staring at visual data, too. The more graphical displays of data I look at, the more I see and the more ideas I get for how to do it better. While this may seem like a blinding flash of the obvious, I mention it here because I have read so many books and articles that say data visualization should not need any explanation. On one level, yes, well, maybe.
However, I think, as with statistics in general, the more you study it, the more you DO see.
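For anyone who has not yet learned to do the (observed – expected) estimate in their head, here is a small sketch of the arithmetic. The 2 x 2 table is invented for illustration; the expected count for each cell is the usual row total times column total divided by the grand total.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Made-up cross-tabulation: rows are experimental / control, columns are passed / failed.
observed = [
    [30, 20],
    [15, 35],
]

expected = expected_counts(observed)
differences = [
    [o - e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for obs_row, exp_row, diff_row in zip(observed, expected, differences):
    print(obs_row, exp_row, diff_row)
```

Big differences relative to the expected counts are what the experienced eye is picking up at a glance.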
Back to JMP, one of the reasons we felt it was important to add it to our campus offerings is that it allows you easily to do those explorations, to look at data from one side and then another (literally). I could have re-done the chart above in seconds. Of course, then I would have had to have opened JMP again, saved the chart, and uploaded it to this site, which would have taken me possibly two minutes. But, I have a quota of three graphics per post so I ate jelly beans for two minutes instead and then included the bubble plot as the last one because it moves, has colors and pointy-clicky things.
You laugh and sneer but lo I say to you that Youtube and Facebook each have hundreds of millions of users and all of Scientific Software International’s Item Response Theory programs put together are used by fewer people each year than the number of pigs sold in one day for Farmville. (Incidentally, Eric Greenspan of Make it Work is my hero for having bought the url www.ihatefarmville.com which redirects to a site with information on him and his company.)
The Bubble Chart — simply include an X value, a Y value and a time value. You can also, like I did, choose a value to color by, and (as I didn’t) a value for the size of the bubble.
Here I have the different test sites (X axis), months of product testing, and score. Since these were just data I had on my computer while I was sitting in the seminar and not something like stock prices or median home prices by state the chart does not look as cool as examples that would apply to this type of visualization. What I want to illustrate here, though, is the fact that in under a minute you can drag in a few variables, then, click on the ubiquitous red arrow. One of the options is to export as flash. Now you have your chart in flash.
Click on it and you can label bubbles, zoom in, zoom out, change the speed, size and other interactive options. Did I mention it took me about 30 seconds? Almost makes me want to re-do it with something other than data I just had lying around.
Now THAT is some kick-ass statistical software when it makes you want to go out and find reasons to use it.
That kind of reaction to software is usually limited to applications that involve shooting people or pornography. However, unlike in those other options, a three-way interaction in JMP will get you neither dead nor a sexually-transmitted disease.
Behind the Door Marked ‘Beware of the Leopard’: Importing Excel 2007 into SAS 9.2 on Windows 7 x64
July 20, 2010 | 6 Comments
Sometimes documentation can be a little hard to find…
You may be aware of the fact that, if you are running SAS 9.2 on a 64-bit Vista or Windows 7 machine the Import Data option from the file menu does not work for Excel files.
Per SAS Usage Note 33228: (Courtesy of Peter Ruzsa in SAS Technical Support.)
You are running into this issue here,
“An error occurs when you use SAS® 9.2 to import or export Microsoft Excel or Access files in the Windows x64 and Windows Vista 64 environments.”
(Yes, we know that.)
When you use SAS 9.2 to import or export Microsoft Excel or Microsoft Access files in the Windows X64, Windows Vista 64, and Windows 2003 64-bit server environments, you can receive the following message:
ERROR: DBMS type EXCEL (ACCESS) not valid for import.
In addition, when you use the Import and Export wizards, the Excel engine is not presented as a selection.
(Yes, and this makes us sad because people insist on continuing to email us files in Excel format, and Access, too, but we have these shiny new computers running SAS 9.2 that we want to use and, on top of it all, we are out of doughnuts. They keep buying that raspberry arugula crap instead. Why do we always modernize the wrong things?)
You could save your Excel 2007 files as .csv and import them that way but that is pretty inefficient.
So, let’s read on in Pete’s note… well, actually, let’s not because it had some code in it that probably works for some people in certain situations. I was not one of those people. However, maybe you are, so you can go to the SAS knowledge base and read it here.
http://support.sas.com/kb/33/228.html
When that didn’t work, I tried swearing. Next, I went to the documentation for PC Files Server, specifically, this page
http://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#/documentation/cdl/en/acpcref/63184/HTML/default/a003353773.htm
which gives the exact correct code for running Proc Import, assuming you have the PC Files Server installed. Which, it turns out, I did not.
So …, from a different helpful person at Tech Support, I received the following:
“Note if you have an existing 9.1 or 9.2 pc file server you should uninstall it first.
1. Download the PC file server from the following location to your windows pc that is going to run the application. You can find it at this location:
ftp://ftp.sas.com/techsup/download/base/zqjpcfileserver92m3.zip
You can simply save the file to any location on the pc where you are going to install the SAS PC File Server.
2. For more information on the PC file server go to this link here.
http://support.sas.com/documentation/cdl/en/acpcref/61891/HTML/default/a002645029.htm
3. Unzip the zqjpcfileserver92M3.zip file on your pc, it will unzip to the pcfilesrv__92130__prt__xx__sp0__1 sub directory where you stored the zip file.
4. In the unzipped directory named pcfilesrv__92130__prt__xx__sp0__1 double click on the setup.exe
5. This will start the install
a. The setup.exe will install the pc file server in the C:\Program Files\SAS\PCFilesServer\9.2 directory. If you are installing this on an X64 box it will install in C:\Program Files (x86)\SAS\PCFilesServer\9.2 directory because this is a 32 bit application.
b. You will have a choice to install the pc file server as a service. The checkbox selection is Start Service Now and When Windows Starts.
c. Note that if you install it as a service you must read network drive names with their Universal Naming Convention names such as \\servername\directory\filename.xls.”
After I installed the PC Files server, everything worked absolutely lovely to import Excel files, whether using the Import Data option in the File menu or Proc Import in my code ON WINDOWS 7 x64. So, my advice is that if you have a shiny new computer and a shiny new SAS 9.2 Maintenance 3 and you want to import the latest in Excel files or Access, download and install the PC Files server and you will be happy. Someone might even bring you doughnuts. But don’t count on that.
When I tried the same exact steps in Vista 64 I received a message: “Connection failed. See log for details.” The “details” were that SAS stopped processing this step because of errors.
Bad computer! No doughnut!
You May Be a Novice Programmer if ….
July 20, 2010 | 3 Comments
I am writing a paper on moving from novice to intermediate programmer and got to thinking about the sort of things that people say that identify someone as a novice programmer.
NOTE: No one is allowed to feel bad for having made these mistakes. Everyone you meet will admit to having made the exact same errors at one time, except for a very few people. Those very few people are probably lying. Try to avoid having coffee with them. They are a bad influence.
(Not long ago I was on the phone with someone and they said to type something like “ls pipe command” and I actually typed the word “pipe” instead of ls | command.
as in ….
ls | mail annmaria@thejuliagroup.com
Fortunately, I did not actually hear the person say, ‘What a moron.’ A fact I attribute to the helpful invention of the mute button. In my defense, I was only on my 4th cup of coffee recovering from a conference call at 6:30 a.m. that morning with a group that apparently believed that the entire world is on Eastern Standard Time.)
These characteristics DO generally reveal you as a newbie:
- Thinking that just because your program ran and there are no messages that say ERROR in your log that your results are correct.
- Not reading your log.
- Thinking that just because your program ran with the perfectly cleaned up test data, or with the first 1,000 records, that all is now well and there will be no problems with it.
- Writing your own code for common functions like mean, log, random numbers. I don’t mean to be rude (no more than usual, anyway), but did you really think that in the previous decades no one thought about this and included it as part of the language?
- Copying and pasting the same lines over and over. – If you are doing that, I’ll bet your code is almost screaming at you MACRO! or DO-LOOP or maybe ROSEBUD! (Well, the latter is the least likely, actually.)
- Not using comments, which is proof of your unfamiliarity with “Eagleson’s Law: Any code of your own that you haven’t looked at for six or more months, might as well have been written by someone else.” (I did not know that had a name until recently.)
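The first two items on the list are the easiest to act on, so here is a rough sketch of a log check that looks past the word ERROR. The log excerpt is written for this example in the style of a SAS log, not copied from a real run, and the list of patterns is an assumption about what is worth flagging, not a complete one.

```python
import re

# Hypothetical log excerpt, invented for this example. Note that it contains
# no line starting with ERROR, yet the results would almost certainly be wrong.
sample_log = """\
NOTE: There were 1000 observations read from the data set WORK.SCORES.
NOTE: Variable pretset is uninitialized.
NOTE: Missing values were generated as a result of performing an operation on missing values.
WARNING: Multiple lengths were specified for the variable site.
NOTE: The data set WORK.RESULTS has 1000 observations and 6 variables.
"""

# Patterns worth a second look (an illustrative, incomplete list).
SUSPICIOUS = [
    re.compile(r"^ERROR"),
    re.compile(r"^WARNING"),
    re.compile(r"^NOTE:.*uninitialized", re.IGNORECASE),
    re.compile(r"^NOTE:.*missing values were generated", re.IGNORECASE),
]


def flag_log_lines(log_text):
    """Return (line number, text) for every log line matching a suspicious pattern."""
    flagged = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS):
            flagged.append((number, line))
    return flagged


for number, line in flag_log_lines(sample_log):
    print(f"line {number}: {line}")
```

A misspelled variable name and a pile of generated missing values both show up as polite NOTEs, which is exactly why "no ERROR in the log" proves nothing.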
There are several more but I am going to call it a night, as I have a meeting at 7 a.m. because, as the individual on the East Coast who scheduled it logically concluded, “It’s 10 a.m. somewhere.” What IS IT with you people?
Why Middle Managers Hate the Numerati
July 16, 2010 | 3 Comments
There are two kinds of people in organizations: those who can count and those who claim to have “people skills”.
When David Wechsler created the most commonly used intelligence test in America, the results gave two IQ scores, Verbal and Performance Intelligence. Dr. Wechsler said he had noticed that there are some people who were good at using words and some people who were good at solving problems with things, and that those were both types of intelligence. Stephen Baker wrote an interesting book about one subset of those people, those who control and analyze data. He called them the numerati. I used that term here to describe everyone high on Wechsler’s second intelligence score, because it was simpler than saying “technologists, mathematicians, statisticians, engineers, scientists and people like them”. Besides, I liked the term and it’s my blog.
In a great many areas, from the BP oil spill, to global warming, to curing diseases like AIDS or cancer, to genetic engineering to technology start-ups, many people in American society can be heard to say, “Scientists can certainly find a solution for this”, sometimes prefaced by “If we could send a man to the moon… ”
Listening to the news, my husband, an actual rocket scientist type, has responded sardonically more than once to these comments,
“Well, your faith is touching but… ”
Yes, it would seem we LOVE scientists, engineers, mathematicians, statisticians, computer programmers, all of those people who are going to figure this stuff out, right? I consider myself to be very fortunate to often be right in the thick of machinery that powers science. I get to help people create propensity scores to quantify mortality risk, write macros to create simulated data for parallel analyses, modify programs so they run on a supercomputer and a lot more fun stuff. Some nights I leave the building hours later than I had planned or go home and work into the morning because I am chasing a problem and lose track of time. What’s not to love about that?
And yet, when I look at who is supervising our technical staff, the engineers, physicians, and scientists, it is often a different story. You would THINK, that the bright young people coming up would be the ones you want to encourage. And yet ….
TRUE STORY #1:
There once was a technical support center with some very savvy technical staff. The kind of people who took computers apart just to see if they could put them back together again or who would run thirteen virtual machines at a time just to see what would happen. Their department supervisor was pretty decent with Unix and even better hacking into the Windows operating system. When he left, some of the staff applied for his position as well as some very good technical people from outside. The new supervisor had no technical expertise but “people skills”. The training to teach the staff more about Unix, more about systems administration was limited to guest lecturers. Recently, I was copied on an email to the staff regarding the proper “phraseology” for answering the phone and telling people how happy you are they called.
This troubled me enough that I mentioned to an executive for that organization how misguided I thought it was. I pointed out that when people call technical support they want their technical question answered. Further, since this is an entry point for many people, it was a great opportunity to train and develop people who already have some skills and talent to be successful. I was told that, yes, for people like ME this was true, but these technical people did not have MY abilities (whatever those might be) and thus what they really needed was not an explanation of the difference between a 32-bit and 64-bit operating system or parallel versus serial processing. What they really needed was signs saying, “Smile when you answer the phone.”
For a while, when people from tech support would call me, I would answer the phone with,
“Hi, this is AnnMaria, I’m very fucking happy-ology you called.”
Then I would answer their questions. They seemed to be very fucking happy-ology about it, too.
Except for one middle manager type who overheard me and told me I was wildly inappropriate and asked me what if it had been the president calling me. I pointed out that I have caller ID and unless Barack Obama happened to be visiting technical support, borrowed someone’s phone and called me just to ask a question about logistic regression, it wasn’t very likely to be an issue.
So, we’re not mentoring those with potential to be up-and-comers. What about the existing “numerati”?
At the university level – sadly, for the last thirty years, the number of tenured professors in all fields has been dropping dramatically. The proportion of classes taught by full-time professors has been dropping. There is a rising new group called “clinical professors” who are paid only to teach and don’t do any research at all. Then, there are the for-profit universities, a rapidly rising group that takes up almost a quarter of all federal student aid. They don’t support any research at all.
This article from the Chronicle of Higher Ed discusses both the fact that the tenure track isn’t all it’s cracked up to be “you can’t speak your mind for seven years” and the number of positions is declining anyway.
From what I have seen, in technology companies SOME SUBSET of the numerati are well-treated. A software company may esteem its programmers but disregard the market research staff that can hold some whiz-bang statisticians. A pharmaceutical company may treat very well the clinical researchers but completely ignore the programmers who run their accounting and inventory systems.
Doesn’t this make sense? Isn’t it the old cliche about staff versus line positions we learned about in business school? Maybe, maybe not but certainly it is stupid. Those professors, I would think, would be “line jobs”. As for the accounting, market research and inventory folks, if you let them apply some of those equations they might make or save you millions of dollars. Why do we generally think that science and technology are the answers to all of our national ills but overlook those skills in specific situations?
TRUE STORY #2
An organization planned to expand the software it licensed. A new purchase, available to all researchers, for a very modest fee, would have given them the capability to easily do decision trees, neural networks, survival analysis and more. The purchase was stopped because the vendor’s attorney and the client’s attorney could not agree on a phrase in the contract. This was reviewed by two managers and two attorneys, none of whom actually knew what the software could do for the organization.
As I hear these stories, and many, many more like them, I wonder what exact “people skills” these middle managers bring to organizations. If the skill is to develop people, you’d think they would bring in people to train them. Maybe they would look at data that showed the greatest areas of need. If it was to support existing researchers, you’d think they’d ask them what it is they need and try to ACTIVELY promote new technologies rather than “Say no and see if anyone screams”.
Some of the behavior (think the phraseology example and the fact that this individual was hired) shows an active distrust, disrespect and dislike for the technical staff.
I cannot state for sure why this happens in some organizations (certainly not all), but this distinction between “people skills” and “research skills” got me thinking of the difference in security.
What are technical skills? The ability to conduct an experiment, diagnose a patient, write a program. Generally, these are very portable. As a consultant, when I leave one client and go to the next >95% of what makes me valuable goes with me. Yes, the next client may have some specific system I need to learn, but the definition of a training dataset, how to select a stratified random sample and all the programming languages I know go with me. The same is true of anyone in a technical or scientific field. The more you apply your skills, the more value you have and you take that value with you wherever you go.
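As an example of how portable this stuff is, here is a minimal sketch of selecting a stratified random sample. The rows, the `site` field, the 10% fraction and the seed are all invented for illustration; the point is only that the same few lines work at any client.

```python
import random
from collections import defaultdict


def stratified_sample(rows, stratum_key, fraction, seed=None):
    """Draw the same fraction at random, without replacement, from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[stratum_key]].append(row)

    sample = []
    for stratum in sorted(strata):
        members = strata[stratum]
        # Take at least one case so that small strata are never dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up data: 40 cases at site I, 30 at site II and 10 at site V.
sites = ["I"] * 40 + ["II"] * 30 + ["V"] * 10
rows = [{"id": i, "site": site} for i, site in enumerate(sites)]

sample = stratified_sample(rows, "site", 0.10, seed=42)
for site in ("I", "II", "V"):
    print(site, sum(1 for row in sample if row["site"] == site))
```

Sampling within each site, rather than from the whole pile, guarantees that the small site is represented in proportion instead of being left to luck.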
I’m a bit confused by the “people skills” that some middle managers supposedly have. As a friend of mine commented about the manager for his department,
“They say he was hired for his ‘people skills’ and not his expertise. Well, we’re all people in this department and we all think he’s a dork.”
People skills include the ability to motivate and communicate. Those are a lot harder skills to document. How do you know your staff didn’t succeed despite you? For middle managers, a good deal of success seems to depend on connections. It’s not what you know, it’s who you know. I say that not in a pejorative sort of way but because I have noticed that many middle managers LOVE meetings. The point is, as I have been told many times,
“So we can all get to know each other.”
and I wonder,
“Why?”
One reason middle managers and the numerati don’t get along is they seem to think differently. Take meetings. My view on most meetings with a middle manager with a Gantt chart is
“Why are you here?”
I’m not talking about the person from the department we are supposed to serve who can tell me about how the data are stored, what questions they hope the data can help them answer and problems with data quality. I totally get why she is there.
I also understand people at a higher level of management who have an enormous project and need to parcel out parts of it to different teams, who need to set priorities for resources. I understand what they are doing and why we need them.
What I DON’T get is the guy in the middle who organizes meetings and requires agendas and minutes so they can be forwarded to “upper upper management”.
Here is what I am thinking:
“My team and I are going to do the absolute best we can. Tell us how much money is available and when you need it done. Then, go away.”
I really, really don’t know what the middle managers are thinking. What I deeply suspect, though, is that when I read about people in the New York Times who used to have a job that paid $60,000 or $80,000 or $100,000 a year and now they have been unemployed for two years, that I am reading about THEM.
Jul
11
Nine out of ten businesses owned by trapeze artists are not in this survey
July 11, 2010 | 2 Comments
Back when I was in college, there was a group advocating burning rock albums. A major investigative journalist wrote a story on their motivation (I think he either wrote for Rolling Stone or Playboy, the latter of which, yes, I really did read for the articles. Despite having competed on my college track team and the U.S. judo team, worked as a programmer and played rugby, I am actually not a lesbian, a fact which frequently surprises people. But I digress. Even more than usual.)
One question that I remember was how the group came by their figure that 80% of out-of-wedlock babies were conceived by listening to rock music. The founder said they had heard this figure cited by an evangelist during a revival in their town. The reporter followed up with the question,
“So, do you have any data to support your album burning other than the traveling evangelist poll?”
There were many things wrong with this study, the first of which being, I suspect, that it didn’t exist. Beyond that, there is the sampling issue. Is 80% a high number? Perhaps rock is simply the music listened to by women of child-bearing age, the Big Band and Lawrence Welk fans being primarily post-menopausal and thus not at risk of pregnancy on either side of wedlock.
A causal relationship is at least implied, otherwise what was the whole burning point? To test this hypothesis, I turned on Blinded by the Light by Bruce Springsteen at full volume. Two unmarried daughters of child-bearing age were in the house, as was my husband.
No pregnancies ensued. Said husband remained downstairs building a robot with the world’s most spoiled twelve-year-old, although he did come up momentarily to ask if I minded if he turned down the music.
One daughter announced she was going to the apartment of the other daughter because, and I quote,
“No offense, but you people are boring.”
Which brings me to my tangentially related point… Lately I have been trying to come to the source of the frequently stated “facts” that
A. Small businesses produce the jobs that lead the economy out of a recession
B. Most jobs are created by start-ups
C. What small businesses really need are credit and counseling. Business plans always feature in there big.
I have no idea whether A and B are true or not. I rather suspect A is true in part because there are way more small businesses than any other type. It goes back to the Traveling Evangelist Poll (whether it existed or not). If there are way more people working in small businesses, then a 10% increase in their number is going to be more jobs than a 10% increase in the smaller number of people who work for large businesses.
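To make the base-size argument concrete, here is a minimal Python sketch. The head counts are made up purely for illustration (they are not from any survey), and `jobs_added` is a hypothetical helper name.

```python
def jobs_added(current_jobs, percent_growth):
    """Absolute number of jobs added by a given percentage growth."""
    return current_jobs * percent_growth // 100

# Hypothetical head counts, chosen only to show the effect of the base size.
small_business_jobs = 1_000_000
large_business_jobs = 200_000

small_gain = jobs_added(small_business_jobs, 10)
large_gain = jobs_added(large_business_jobs, 10)

print(small_gain)  # 100000
print(large_gain)  # 20000
```

The growth rate is identical, yet the larger group "creates" five times as many jobs simply because it started with five times as many people.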
As far as C, I am a bit confused. Vivek Wadhwa, who is a pretty interesting writer on this topic, had this article on Tech Crunch on July 10, with which I agree completely. The title is “You’re no Steve Jobs” and his main point is that the problem with many start-ups is that no one wants to buy their crap. He said it way more nicely than that, though.
Years ago, I used to spend some time on a forum for small businesses. One of the reasons I quit was because in the start-up section, no one ever said,
“What? Are you crazy?”
Instead, there were always supportive comments like,
“Live your dream, baby! I’m sure your business making hand-knit sweaters for turtles will make millions by next summer.”
These people are always saying that if they only had the money, they could have this amazingly amazing life but that the big bad banks would just not lend them the money to go out and buy a building to turn into a turtle sweater making factory.
The very odd thing, odder even than the turtle sweaters, is that the same week I read an article, by the same Vivek Wadhwa, with which I mostly disagreed! (No, I am not stalking him, it was coincidence. I swear.)
Well, those aren’t 100% contradictory … they don’t start because of lack of knowledge and financing but they DO fail after they start because no one buys their products. (Maybe some of that fear of failure is realism.)
Having been in business 25 years and never once been part of a survey (hey, I’m WORKING here!) I was curious as to the source of these figures.
Being a good academic, Wadhwa did provide references, and the first was to a study of 549 entrepreneurs in high-growth industries.
I don’t doubt that one might find for this specific group that access to capital is a big barrier. To a small company becoming a big company in a short period of time, the capital to buy a building, working capital to meet a growing payroll, all are important.
What percentage of jobs are those, though? I don’t know but I don’t think it is a lot. Google and Yahoo both have offices in Santa Monica. Geocities had its headquarters a few blocks from where I’m sitting. Even in our relatively tech-heavy part of the world, the number of “high-growth” employees is dwarfed by the number working at the restaurants, hotels, liquor stores, car dealers and in the movie and TV industry.
In other areas where I work frequently, like North Dakota, and Washington, D.C., the proportion of “high growth” industry personnel is even smaller.
What about the “more jobs are created by start-ups”? I looked into that, too. There was a really interesting study by the Kauffman Foundation that pointed out that start-ups can ONLY create jobs. Their definition of a start-up is a business that started this year.
Jobs created = Jobs This Year – Jobs Last Year
Since the second part for a start-up is zero, it can ONLY add to the number of jobs.
Existing companies may hire ten people (cool for the ten people hired) but have 15 who retired, were laid off, fired or had a heart attack due to having sex while listening to rock music. Even though ten people were hired, the company has a net loss of five jobs.
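The arithmetic above can be sketched in a few lines of Python. The function name `net_jobs_created` and the head counts are assumptions for illustration only; the hire and loss figures for the existing company are the ones from the example above.

```python
def net_jobs_created(jobs_this_year, jobs_last_year):
    """Jobs created = Jobs This Year - Jobs Last Year."""
    return jobs_this_year - jobs_last_year

# Hypothetical start-up: it did not exist last year, so last year's count is 0
# and the result can never be negative.
startup = net_jobs_created(jobs_this_year=4, jobs_last_year=0)

# Hypothetical existing company with 100 employees last year that hired 10
# people and lost 15 (retired, laid off, fired and so on).
existing = net_jobs_created(jobs_this_year=100 + 10 - 15, jobs_last_year=100)

print(startup)   # 4
print(existing)  # -5
```

By this definition a start-up is compared against zero while an existing company is compared against its own payroll, which is why the two numbers are not measuring the same thing.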
I am not convinced, though, that the answer to economic malaise is to have a massive number of start-ups as many of them (like turtle-sweater lady) may be negative on the job-creation number by the next year.
Where do they get this idea that what small businesses really need is credit, so the government should give the banks more money to lend?
I went to Google, the source of all knowledge, and typed in “Small Business Survey.” The first several that came up were places like the North Texas Small Business Development Center, Citibank and the Huffington Post Survey on the Credit Crunch.
The last of these asks:
“Small business owners: have you applied for business credit? Was it approved, or turned down? Have you not applied because you didn’t think you’d have a chance?”
I’m just sayin’ that perhaps organizations whose main function is to give credit, help you obtain credit and polls asking you if you have applied for credit might be a bit biased in the proportion of those reporting credit is an issue compared to say, the general population of small businesses.
On Monday, the world’s most spoiled 12-year-old is starting trapeze school.

Supporting the Economy
I don’t think what the Trapeze School (which is a small business, not vulnerable to out-sourcing) needed was a line of credit or a business plan. From their perspective, what they needed was to swipe my credit card.
I have some more thoughts on representativeness (or lack thereof) in surveys but the world’s most spoiled twelve-year-old is asking to be tucked into bed and my husband is suggesting that perhaps The Rolling Stones would be better than Bruce Springsteen.
I doubt it. We already have a fifteen-year gap between the oldest child and the youngest. A few years ago, I thought I might be pregnant again (we DID go to a Rolling Stones concert around that time). He was very cool about it until we got the results that said I wasn’t pregnant and then he exclaimed,
“THANK GOD!”
So… I punched him.
Jul
7
What Small Businesses Need to Create Jobs
July 7, 2010 | 1 Comment
I’ve been in business for over twenty years. All of that time, I have run a small business, by choice. During those twenty years, I have had a sick husband, been widowed, had four children – so I had some reasons that becoming the next Oracle was not my priority. However, I have made a profit every year, some years more than others, and have increased and decreased my number of employees as necessary.
The more articles I read on small business in general and women-owned businesses in particular, the more I wonder how many of those organizations talking about helping small business owners create jobs include people who have actually run a small business.
There seems to be a great concern about the disparity in access to venture capital. Now, that may be a concern for some small businesses but most of the people I know own consulting companies, hair salons, restaurants, retail stores or manufacture products like t-shirts. They are not attractive to VCs because they are not going to have exponential growth.
Many of these small business owners, like me and my friends, are going to be in business for ten, twenty years or more, and pay corporate taxes, payroll taxes and everything else our accountant says we have to pony up every few months.
What about jobs?
I think everyone trying to create jobs through small business should read the insightful article Andrew Grove, Intel co-founder, wrote on this subject. Those high-flying tech firms create a lot of jobs – overseas! One problem with the VC-find-the-next-Apple approach, of course, is that those jobs may help investors but they don’t help the U.S. unemployment rate. Many, many of the high tech, high ROI jobs end up in China and India. (Seriously, read Grove’s article. It’s great.)
Twenty years ago, my business partners and I decided against outsourcing because we did not want to employ fewer Americans and pay someone in another country a sub-minimum wage so we could be richer. I know that sounds un-American, but part of our motivation in founding a business, which still derives much of its revenue from work on reservations, was to make life better for people. Obviously, we are privately owned, so we can make those choices.
The other thing I don’t need that every agency and company seems to want to sell me is a business plan. I have a business plan. Like most companies, the gist of it is to have revenues exceed expenses. Okay, it is a little more than that, BUT – after 20 years most of the business owners I know are not kept from hiring by lack of a plan. In fact, their plan is to add workers to meet the demand. It is certainly NOT to take out loans (guaranteed or not) so they can expand and hire more workers.
If anyone really seriously wanted to help small business create jobs they would make it easier for them to get business.
I had to laugh. Several times, representatives from the same “small business services” company have called me telling me,
“We’ll help you get YOUR money from the federal government. After all, it’s YOUR money.”
and then went on to promise me we could get on the GSA schedule and agencies would be falling over themselves to just pull our name up and order a million dollars of consulting services from us. I told their representative that’s not the way it works and he assured me it was and they had done that for lots of companies. I told him to email me the name of one. I’m still waiting.
I am not sure where the stimulus money went. I see some signs that the roads are being upgraded with Recovery Act funds, so that is a good thing. I don’t actually know anyone who got any of that $200 million that went to the NIH in grants, although I know a lot of people who applied, but that all went to universities anyway.
I may actually bite the bullet and complete the section 8(a) application this year, although it grates on me to do it. The time I spend on that will take away from billable hours so it will actually COST me money. I’m still debating on it.
Don’t get the idea that we’re sitting around here whining. We have work enough to keep the people we have employed and I am now looking for new contracts. We’ve already turned down a few over the last year, which may sound inconsistent, but it’s not.
We submitted one proposal in May, a second in June. We have too much current work to take time away to do a proposal this month. I’ll submit one or two in August and September, depending on how tired I feel.
Taking a six-month or shorter contract that takes up all of your time and keeps you from bidding on multi-year contracts is not good business. Just bidding on everything that comes down the pike isn’t too bright, either. We look for a match between our capabilities and what the client needs, for areas we can really do excellent work. That way, they are happy and come back to us again and again. After a quarter-century in business, we DO kind of know what we are doing.
I hear a lot about tax breaks for small business. Well, we pay a hell of a lot of taxes and that would be nice. Even though we will probably be exempt from the requirement to provide health care, I have always offered that as an option to employees and our costs may go up a little. Taxes and health care costs are not what keep me from adding employees.
It seems like the people aiming to help small businesses are sincere. However, it’s like the old cliche that when the only tool you have is a hammer, every problem is a nail. Because most of these organizations have people who know how to write business plans, fill out loan applications, apply for certifications of some status or another and lobby on Capitol Hill, that’s what they see as the way to help small businesses.
Most people who have been in business for decades don’t need some consultant to help them develop a business plan before they can add jobs. If their business has been around a long time, they already have a line of credit. I’m not sure what they need is tax cuts or worse health care coverage for their employees.
What they need is work.
I’m surprised I have to explain this to you.
Jul
6
Are you a programmer or aren’t you?
July 6, 2010
So, I am writing a paper on how you know that you (or someone else) are a “real” programmer. That is, you don’t fit in that “new user” box any more. But how do you make that decision?
Is it like pornography, you just know it when you look at it? (Not that I ever personally looked at any of course, but I have heard you can find it on the Internet if you try really hard.)
Yesterday, Rob Meekings made a comment about design decisions. That is certainly a distinction, when you get to the point that you are actually thinking that way. For example, I often will merge everything together in one long dataset, a habit that makes those who love SQL and the star schema just cringe. The REASON I do this is that most of the people I work with are researchers using very powerful computers with datasets of a few thousand observations, or, at most, a few hundred thousand. Even on a desktop, an analysis with SAS, Stata or SPSS takes seconds. It isn’t worth taking an extra hour or two to make a program run in one second instead of two. It also may make the program more difficult for the user to maintain him/herself.
HOWEVER, when I am running a program that runs against a 100GB dataset and can take hours to run because the researcher cannot use a supercomputer, e.g., due to security classification, I’ll spend a good bit of time trying to make it run as efficiently as possible.
If there isn’t a pressing reason not to do it, I’d recommend that someone with a large dataset consider running it on a cluster to take advantage of parallel processing capabilities. This means changing your code slightly to run on a different OS, often Linux or some other Unix version.
I do a lot of “throw-away programming”. That’s not to say it’s garbage. Sometimes I think my work is quite good, in fact, but it’s not production code that runs every day to produce reports on 500 different stores. When I DO write production code, I do several things differently. One is that I make good use of %include statements. For example, if there is a footnote that is going to be in every single output that says, “Funding provided by National Science Foundation Rural Systemic Initiative Grant #1234-2010” and several more lines about the university, address for contact, etc., I am going to have a small file that I just include. Yes, I could copy and paste it or have that as a template for when I create a new program. BUT what happens when we get another grant and we want to recognize both funding agencies in everything we publish?
My point, and you may be surprised by this point to find that I do, in fact, have one, is that a distinction between novice and non-novice programmers is that the latter have the luxury of thinking about a design because they know more than one way to do something.
Jul
4
Signs you’re not a novice programmer
July 4, 2010 | 3 Comments

I'll get this down eventually
Writing a presentation for WUSS, I had to fill out the usual check box for the intended audience:
Level of programming expertise:
___ Novice __ Intermediate __ Advanced
and I started wondering when exactly does someone stop being a novice? One answer is that your programming no longer LOOKS like it was written by a novice. That’s kind of circular reasoning, though, isn’t it? To be more specific, here are a few of those signs, generated from a survey of a random sample of 1.
(Note, if your programming does not always show all of the characteristics mentioned below, you are forbidden to feel bad. All but a very exceptional few programmers will admit to having made every ‘newbie’ mistake when they started, and on occasion, they still do when they are rushed, tired or distracted by three fighting children or after their third martini. As for that exceptional few – they’re chronic liars. Stay away from them.)
Five signs you’re no longer a novice, in no particular order ….
1. Good use of functions
AvgQtr = (Jan + Feb + Mar) / 3 ;
is a sign of a novice
AvgQtr = Sum(Jan, Feb, Mar) / 3 ;
is better
AvgQtr = Mean(Jan, Feb, Mar) ;
is what an intermediate programmer would do.
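The post does not spell out why the three versions differ, but one practical reason is how SAS treats missing values: the + operator returns missing if any operand is missing, Sum() skips missing values, and Mean() skips them and also divides by the number of values actually present. Here is a rough Python imitation of those three behaviors, using None for a missing value. The function names are made up for this sketch.

```python
def plus_avg(jan, feb, mar):
    """Like (Jan + Feb + Mar) / 3: one missing month makes the result missing."""
    if None in (jan, feb, mar):
        return None
    return (jan + feb + mar) / 3

def sum_avg(jan, feb, mar):
    """Like Sum(Jan, Feb, Mar) / 3: missing months are skipped, but we still divide by 3."""
    present = [v for v in (jan, feb, mar) if v is not None]
    if not present:
        return None
    return sum(present) / 3

def mean_avg(jan, feb, mar):
    """Like Mean(Jan, Feb, Mar): divide by the number of non-missing months."""
    present = [v for v in (jan, feb, mar) if v is not None]
    if not present:
        return None
    return sum(present) / len(present)

# February is missing in this made-up record.
print(plus_avg(30, None, 60))  # None
print(sum_avg(30, None, 60))   # 30.0
print(mean_avg(30, None, 60))  # 45.0
```

With complete data all three agree. They only part ways when a month is missing, which is exactly when a novice's version quietly gives a different answer.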
2. You know options of options
3. You understand how the particular language you are using processes data.
For example, in SAS, let’s say you have two datasets
Pretest has the following variables: Id Age Gender Testscore
Where testscore is (obviously) the pretest score.
Posttest has the same variables: Id Age Gender Testscore
Where testscore is (obviously) the posttest score.
If you do this (bad!)
Proc sort data = libref.pretest ;
By id ;
Run ;
Proc sort data = libref.posttest ;
By id ;
Run ;
Data libref.alltests ;
Merge libref.pretest libref.posttest ;
By id ;
Run ;
You have just created a dataset in which, for every id that appears in both files, the testscore from the second dataset named has copied over the first. The pretest scores for those people are gone.
Try this:
Proc sort data = libref.pretest out = pre (rename = (testscore = pretest)) ;
By id ;
Run ;
Proc sort data = libref.posttest out = post (rename = (testscore = posttest)) ;
By id ;
Run ;
Data libref.alltests ;
Merge pre post ;
By id ;
Run ;
Yes, you COULD have done this by adding at least one data step where you renamed the testscore variable, but adding an extra step is inefficient.
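For readers who think more easily in a general-purpose language, here is a simplified Python sketch of the same overwrite problem. It only imitates a one-to-one match-merge by id using dictionaries. The helper names `merge_by_id` and `rename`, and all the scores, are invented for illustration.

```python
def merge_by_id(first, second):
    """Combine two {id: record} tables; when a variable name clashes, the second table wins."""
    merged = {}
    for pid in sorted(set(first) | set(second)):
        row = dict(first.get(pid, {}))
        row.update(second.get(pid, {}))
        merged[pid] = row
    return merged

def rename(table, old, new):
    """Return a copy of the table with one variable renamed in every record."""
    return {
        pid: {(new if name == old else name): value for name, value in record.items()}
        for pid, record in table.items()
    }

# Made-up data: same ids and same variable names in both tables.
pretest = {
    1: {"age": 10, "gender": "F", "testscore": 55},
    2: {"age": 11, "gender": "M", "testscore": 60},
}
posttest = {
    1: {"age": 10, "gender": "F", "testscore": 70},
    2: {"age": 11, "gender": "M", "testscore": 82},
}

bad = merge_by_id(pretest, posttest)
good = merge_by_id(
    rename(pretest, "testscore", "pretest"),
    rename(posttest, "testscore", "posttest"),
)

print(bad[1])   # {'age': 10, 'gender': 'F', 'testscore': 70}
print(good[1])  # {'age': 10, 'gender': 'F', 'pretest': 55, 'posttest': 70}
```

In the bad version the pretest score of 55 disappears without any error message. Renaming before the merge keeps both scores.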
A good, short article on beyond the basics in proc sort was written by Kelsey Basset.
4. Use your knowledge of functions in your programming logic.
5. Don’t forget about missing values.
For example, a researcher wants to categorize people who have ANY positive response to five questions on raising taxes, “Would you vote to raise taxes if … the state budget isn’t balanced?” “Would you raise taxes if … the option was to cut social services?” and so on.
A novice response would be:
If q1 = 1 then taxes = 1 ;
Else If q2 = 1 then taxes = 1 ;
Else If q3 = 1 then taxes = 1 ;
Else If q4 = 1 then taxes = 1 ;
Else If q5 = 1 then taxes = 1 ;
Else taxes = 0 ;
Better
If sum(of q1-q5) > 0 then taxes = 1 ;
Else if sum(of q1-q5) = 0 then taxes = 0 ;
The reason for having the second IF in there is that if you do not then all of those with missing values get set to zero, which may result in throwing off your results by a great deal, depending on how frequent missing data is.
There are a variety of ways to do this, some better, some worse. However, one statement that does exactly what we want is:
Taxes = Max(of q1-q5) ;
If any of the questions were answered 1, the value of taxes is 1. If every question that was answered was answered 0, the value is 0, and if all were missing, the value is missing.
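Here is a small Python sketch comparing the three approaches on made-up answers, with None standing in for a missing value and answers assumed to be coded 0 or 1. The function names are hypothetical. The point is only to show where the novice version and the Max version disagree.

```python
def novice(answers):
    """If/else chain: anything that is not a 1, including missing, ends up as 0."""
    for a in answers:
        if a == 1:
            return 1
    return 0

def sum_version(answers):
    """Like the two IF statements on sum(of q1-q5): stays missing when nothing was answered."""
    present = [a for a in answers if a is not None]
    if not present:
        return None
    total = sum(present)
    if total > 0:
        return 1
    if total == 0:
        return 0
    return None

def max_version(answers):
    """Like Max(of q1-q5): largest non-missing answer, or missing if there are none."""
    present = [a for a in answers if a is not None]
    return max(present) if present else None

some_yes = [0, 0, None, 1, 0]
all_no = [0, 0, 0, 0, 0]
no_answers = [None, None, None, None, None]

print(novice(some_yes), sum_version(some_yes), max_version(some_yes))        # 1 1 1
print(novice(all_no), sum_version(all_no), max_version(all_no))              # 0 0 0
print(novice(no_answers), sum_version(no_answers), max_version(no_answers))  # 0 None None
```

Only the last row differs: the novice version counts a person who answered nothing as a "no", while the other two leave that person missing.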
I saw a similar example from SPSS on Douglas Smith’s page. Although Recode is actually a command and not a function, my point is the same. Once you proceed from being a novice, you are naturally seeing the ways you can make your program more efficient.
“Another example of using recode might be to invert the order of the values for a subjective evaluation variable. For instance, the variable “happy” has three valid response categories:
1 = Very Happy
2 = Pretty Happy
3 = Not Too Happy
You might want to change the order to go from least happy to most happy. To do this, all you need to do is swap the values 1 and 3. The recode statement that will accomplish this is:
recode happy (1=3) (3=1).
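One reason the single recode is nicer than a pair of IF statements: each value is looked up once against its original value, so the swap cannot undo itself. Here is a minimal Python sketch of that idea. The function names and the sample responses are invented for illustration, and this imitates the behavior rather than reproducing SPSS itself.

```python
def recode(values, mapping):
    """Map each original value through all old -> new pairs at once; unlisted values are unchanged."""
    return [mapping.get(v, v) for v in values]

def sequential_ifs(values):
    """The tempting two-step version: the second IF undoes the first."""
    result = []
    for v in values:
        if v == 1:
            v = 3
        if v == 3:
            v = 1
        result.append(v)
    return result

happy = [1, 2, 3, 3, 1]  # made-up responses

print(recode(happy, {1: 3, 3: 1}))  # [3, 2, 1, 1, 3]
print(sequential_ifs(happy))        # [1, 2, 1, 1, 1]
```

The two-step version turns every 1 into a 3 and then straight back into a 1, so all the "Not Too Happy" people vanish from the data.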
Oh, and if you don’t use the command window, much less the Do-file editor in Stata, you are definitely a novice. Same goes for anyone who doesn’t write syntax for SPSS or hasn’t found a use for the Program window in SAS Enterprise Guide.
That isn’t to say that there will never come a day when one can be considered a programmer by simply being very good at pointing and clicking.
Just sayin’ …. today is not that day.