Jul 24

Hmm … so, Liz, our finance person, is incomparably efficient and unfailingly nice, where I am usually efficient and have a reputation for being correct 97.6% of the time (as someone commented on Twitter, if it has decimals in it, it must be true).

Between the two of us we just accomplished the impossible task of adding another statistical package to the university-wide license. Getting anything approved at a large institution requires something like the following:

recommendation and agreement to provide technical support (me), request from finance (Liz), approval from person in charge of the budget, approval from person in charge of person in charge of the budget, approval from legal department, sacrifice of a live chicken, dancing naked in the network operations center, signing of the contract with the blood of a unicorn executed by a troll under a full moon.

Well, it might be simpler than that, but not much. Since we have just agreed to increase the number of statistical packages installed by 33% with a 0% expansion in staff (what was I thinking?) it seemed like a good idea to drive down to Carlsbad and check out the JMP Explorer Seminar and see if I could steal any ideas to put up on the JMP website and FAQ which I now need to create (seriously, what WAS I thinking?).

The first cool things I will put on the site are a description of the Graph Builder and a discussion of export to Flash.

The Graph Builder is drag and drop on meth.

Here, I want to compare the correlation between the pretest and post-test by experimental and control group. I drag pretest to X, post-test to Y and Group to “Group X”.

[Image: jmpgraphbuilder]

As I was reducing the size of this graph in Graphic Converter (amazing deal at $34.95 and no I don’t get a kickback from them. I mean seriously, with as much as I talk shit about everything here do you honestly think anyone would PAY me to write about them?) to post here it occurred to me that it would be helpful to have a line that showed the pretest mean so I added that. The whole graph took about 30 seconds.

From my really cool chart here you can easily see that the majority of people in the experimental group scored above the pretest mean (that line) while the control group scored noticeably lower than the experimental group. You can also see that there is, as there should be, a stronger correlation between pre- and post-test for the control group than there is for the experimental group.
[Image: prepostjmp]

This next chart took just another few seconds to create, but as I looked at it, I realized three things. First, it would be better if I had put the sites in chronological order rather than alphabetical order because the difference between experimental and control was greatest on the last one we did (V) and least on the first one (I). Second, it would have been better if I had grouped by Group (uncreative name) on the X axis and site on the Y axis so it would be much easier to compare them side by side as in the chart above. Third,

**** AND THIS IS A VERY IMPORTANT POINT WHICH SELDOM HAPPENS HERE SO PAY ATTENTION ***

I think there is such a thing as visual literacy. Just as experienced statisticians can look at a cross-tabulation, estimate (observed – expected) in their heads and get a quick appraisal of the likely size of a relationship, it takes some staring at visual data, too. The more graphical displays of data I look at, the more I see and the more ideas I get for how to do it better. While this may seem like a blinding flash of the obvious, I mention it here because I have read so many books and articles that say data visualization should not need any explanation. On one level, yes, well, maybe.
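For anyone who has not done the (observed – expected) trick in their head, here is a minimal sketch in Python of what the experienced statistician is estimating. The counts are made up purely for illustration (they are not my data), and the function names are my own invention, not from any package.

```python
# Hypothetical 2x2 cross-tabulation (invented counts, not real data).
# Rows are groups, columns are outcomes.
observed = [
    [30, 20],  # experimental: passed, failed
    [15, 35],  # control: passed, failed
]

def expected_counts(table):
    """Expected cell counts if rows and columns were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def residuals(table):
    """Observed minus expected for every cell."""
    expected = expected_counts(table)
    return [
        [o - e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]

print(expected_counts(observed))
print(residuals(observed))
```

The bigger the residuals relative to the expected counts, the stronger the relationship is likely to be, which is the quick appraisal described above.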

However, I think, as with statistics in general, the more you study it, the more you DO see.
[Image: groupsjmp]

Back to JMP, one of the reasons we felt it was important to add it to our campus offerings is that it allows you to do those explorations easily, to look at data from one side and then another (literally). I could have re-done the chart above in seconds. Of course, then I would have had to open JMP again, save the chart, and upload it to this site, which would have taken me possibly two minutes. But, I have a quota of three graphics per post so I ate jelly beans for two minutes instead and then included the bubble plot as the last one because it moves, has colors and pointy-clicky things.

You laugh and sneer but lo I say to you that YouTube and Facebook each have hundreds of millions of users and all of Scientific Software International’s Item Response Theory programs put together are used by fewer people each year than the number of pigs sold in one day for Farmville. (Incidentally, Eric Greenspan of Make it Work is my hero for having bought the URL www.ihatefarmville.com which redirects to a site with information on him and his company.)

The Bubble Chart — simply include an X value, a Y value and a time value. You can also, like I did, choose a value to color by, and (as I didn’t) a value for the size of the bubble.

Here I have the different test sites (X axis), months of product testing, and score. Since these were just data I had on my computer while I was sitting in the seminar, and not something like stock prices or median home prices by state, the chart does not look as cool as examples better suited to this type of visualization. What I want to illustrate here, though, is the fact that in under a minute you can drag in a few variables, then click on the ubiquitous red arrow. One of the options is to export as Flash. Now you have your chart in Flash.

Click on it and you can label bubbles, zoom in, zoom out, change the speed, size and other interactive options. Did I mention it took me about 30 seconds? Almost makes me want to re-do it with something other than data I just had lying around.

Now THAT is some kick-ass statistical software when it makes you want to go out and find reasons to use it.

That kind of reaction to software is usually limited to applications that involve shooting people or pornography. However, unlike in those other options, a three-way interaction in JMP will get you neither dead nor a sexually-transmitted disease.

Jul 20

In the bottom of a file cabinet, in an unused lavatory, behind a door marked Beware of the Leopard

Sometimes documentation can be a little hard to find…

You may be aware of the fact that, if you are running SAS 9.2 on a 64-bit Vista or Windows 7 machine, the Import Data option from the File menu does not work for Excel files.

Per SAS Usage Note 33228: (Courtesy of Peter Ruzsa in SAS Technical Support.)
You are running into this issue here,

“An error occurs when you use SAS® 9.2 to import or export Microsoft Excel or Access files in the Windows x64 and Windows Vista 64 environments.”
(Yes, we know that.)

When you use SAS 9.2 to import or export Microsoft Excel or Microsoft Access files in the Windows X64, Windows Vista 64, and Windows 2003 64-bit server environments, you can receive the following message:

ERROR: DBMS type EXCEL (ACCESS) not valid for import.
In addition, when you use the Import and Export wizards, the Excel engine is not presented as a selection.

(Yes, and this makes us sad because people insist on continuing to email us files in Excel format, and Access, too, but we have these shiny new computers running SAS 9.2 that we want to use and, on top of it all, we are out of doughnuts. They keep buying that raspberry arugula crap instead. Why do we always modernize the wrong things?)

You could save your Excel 2007 files as .csv and import them that way but that is pretty inefficient.

So, let’s read on in Pete’s note… well, actually, let’s not because it had some code in it that probably works for some people in certain situations. I was not one of those people. However, maybe you are, so you can go to the SAS knowledge base and read it here.

http://support.sas.com/kb/33/228.html

When that didn’t work, I tried swearing. Next, I went to the documentation for PC Files Server, specifically, this page

http://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#/documentation/cdl/en/acpcref/63184/HTML/default/a003353773.htm

which gives the exact correct code for running Proc Import, assuming you have the PC Files Server installed. Which, it turns out, I did not.

So …, from a different helpful person at Tech Support, I received the following:

“Note if you have an existing 9.1 or 9.2 pc file server you should uninstall it first.

1. Download the PC file server from the following location to your windows pc that
is going to run the application. You can find it at this location:

ftp://ftp.sas.com/techsup/download/base/zqjpcfileserver92m3.zip

you can simply save the file to any location on the pc where you are going to install the SAS PC File Server

2. For more information on the PC file server go to this link here.

http://support.sas.com/documentation/cdl/en/acpcref/61891/HTML/default/a002645029.htm

3. Unzip the zqjpcfileserver92M3.zip file on your pc, it will unzip to
the pcfilesrv__92130__prt__xx__sp0__1 sub directory where you stored the zip file.
4. In the unzipped directory named pcfilesrv__92130__prt__xx__sp0__1 double click on
the setup.exe
5. This will start the install

a. The setup.exe will install the pc file server in the C:\Program Files\SAS\PCFilesServer\9.2 directory.
If you are installing this on an X64 box it will install in C:\Program Files (x86)\SAS\PCFilesServer\9.2 directory because
this is a 32 bit application.
b. You will have a choice to install the pc file server as a service. The checkbox selection is

Start Service Now and When Windows Starts.

c. Note that if you install it as a service you must read network drive names with their Universal Naming Convention names such
as \\servername\directory\filename.xls.”

After I installed the PC Files server, everything worked absolutely lovely to import Excel files, whether using the Import Data option in the File menu or Proc Import in my code ON WINDOWS 7 x64. So, my advice is that if you have a shiny new computer and a shiny new SAS 9.2 Maintenance 3 and you want to import the latest in Excel files or Access, download and install the PC Files server and you will be happy. Someone might even bring you doughnuts. But don’t count on that.

When I tried the same exact steps in Vista 64 I received a message: “Connection failed. See log for details.” The “details” were that SAS stopped processing this step because of errors.

Bad computer! No doughnut!

Doughnuts for all!

Jul 20

I am writing a paper on moving from novice to intermediate programmer and got to thinking about the sort of things that people say that identify someone as a novice programmer.

NOTE: No one is allowed to feel bad for having made these mistakes. Everyone you meet will admit to having made the exact same errors at one time, except for a very few people. Those very few people are probably lying. Try to avoid having coffee with them. They are a bad influence.

(Not long ago I was on the phone with someone and they said to type something like “ls pipe filename” and I actually typed the word “pipe” instead of ls | filename. Fortunately, I did not actually hear the person say, ‘What a moron.’ A fact I attribute to the helpful invention of the mute button. In my defense, I was only on my 4th cup of coffee recovering from a conference call at 6:30 a.m. that morning with a group that apparently believed that the entire world is on Eastern Standard Time.)

These characteristics DO generally reveal you as a newbie:

  1. Thinking that just because your program ran and there are no messages that say ERROR in your log that your results are correct.
  2. Not reading your log.
  3. Thinking that just because your program ran with the perfectly cleaned up test data, or with the first 1,000 records, that all is now well and there will be no problems with it.
  4. Writing your own code for common functions like mean, log, random numbers. I don’t mean to be rude (no more than usual, anyway), but did you really think that in all the previous decades no one thought about this and included it as part of the language?
  5. Copying and pasting the same lines over and over. – If you are doing that, I’ll bet your code is almost screaming at you MACRO! or DO-LOOP or maybe ROSEBUD! (Well, the latter is the least likely, actually.)
  6. Not using comments, which is proof of your unfamiliarity with “Eagleson’s Law: Any code of your own that you haven’t looked at for six or more months, might as well have been written by someone else.” (I did not know that had a name until recently.)
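To make points 4 through 6 concrete, here is a small sketch. It is in Python rather than SAS only to keep it short; the site names and scores are invented, and `site_means` is a made-up helper of mine, not anything built in.

```python
import statistics

# Hypothetical test scores for three sites (invented numbers).
scores_by_site = {
    "site_a": [72, 84, 90],
    "site_b": [64, 70, 88],
    "site_c": [91, 79, 82],
}

def site_means(data):
    """Return the mean score for each site."""
    # One loop replaces a copied-and-pasted block per site, and the
    # built-in statistics.mean replaces a hand-written averaging routine.
    return {site: statistics.mean(scores) for site, scores in data.items()}

print(site_means(scores_by_site))
```

The comments are there for the version of me who opens this file in six months and has no idea who wrote it.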

There are several more but I am going to call it a night, as I have a meeting at 7 a.m. because, as the individual on the East Coast who scheduled it logically concluded, “It’s 10 a.m. somewhere.” What IS IT with you people?

Jul 16

There are two kinds of people in organizations: those who can count and those who claim to have “people skills”.

When David Wechsler created the most commonly used intelligence test in America, the results gave two IQ scores, Verbal and Performance Intelligence. Dr. Wechsler said he had noticed that there are some people who are good at using words and some people who are good at solving problems with things, and that those were both types of intelligence. Stephen Baker wrote an interesting book about one subset of those people, those who control and analyze data. He called them the numerati. I use that term here to describe everyone high on Wechsler’s second intelligence score, because it is simpler than saying “technologists, mathematicians, statisticians, engineers, scientists and people like them”. Besides, I like the term and it’s my blog.

In a great many areas, from the BP oil spill, to global warming, to curing diseases like AIDS or cancer, to genetic engineering to technology start-ups, many people in American society can be heard to say, “Scientists can certainly find a solution for this”, sometimes prefaced by “If we could send a man to the moon… ”

Listening to the news, my husband, an actual rocket scientist type, has responded sardonically more than once to these comments,
“Well, your faith is touching but… ”

Yes, it would seem we LOVE scientists, engineers, mathematicians, statisticians, computer programmers, all of those people who are going to figure this stuff out, right? I consider myself to be very fortunate to often be right in the thick of the machinery that powers science. I get to help people create propensity scores to quantify mortality risk, write macros to create simulated data for parallel analyses, modify programs so they run on a supercomputer and a lot more fun stuff. Some nights I leave the building hours later than I had planned or go home and work into the morning because I am chasing a problem and lose track of time. What’s not to love about that?

And yet, when I look at who is supervising our technical staff, the engineers, physicians, and scientists, it is often a different story. You would THINK that the bright young people coming up would be the ones you want to encourage. And yet ….

TRUE STORY #1:
There once was a technical support center with some very savvy technical staff. The kind of people who took computers apart just to see if they could put them back together again or who would run thirteen virtual machines at a time just to see what would happen. Their department supervisor was pretty decent with Unix and even better hacking into the Windows operating system. When he left, some of the staff applied for his position as well as some very good technical people from outside. The new supervisor had no technical expertise but “people skills”. The training to teach the staff more about Unix, more about systems administration was limited to guest lecturers. Recently, I was copied on an email to the staff regarding the proper “phraseology” for answering the phone and telling people how happy you are they called.

This troubled me enough that I mentioned to an executive for that organization how misguided I thought it was. I pointed out that when people call technical support they want their technical question answered. Further, since this is an entry point for many people, it was a great opportunity to train and develop people who already have some skills and talent to be successful. I was told that, yes, for people like ME this was true, but these technical people did not have MY abilities (whatever those might be) and thus what they really needed was not an explanation of the difference between a 32-bit and 64-bit operating system or parallel versus serial processing. What they really needed was signs saying, “Smile when you answer the phone.”

For a while, when people from tech support would call me, I would answer the phone with,

“Hi, this is AnnMaria, I’m very fucking happy-ology you called.”

Then I would answer their questions. They seemed to be very fucking happy-ology about it, too.

Except for one middle manager type who overheard me and told me I was wildly inappropriate and asked me what if it had been the president calling me. I pointed out that I have caller ID and unless Barack Obama happened to be visiting technical support, borrowed someone’s phone and called me just to ask a question about logistic regression, it wasn’t very likely to be an issue.

So, we’re not mentoring those with potential to be up-and-comers. What about the existing “numerati”?

At the university level – sadly, for the last thirty years, the number of tenured professors in all fields has been dropping dramatically. The proportion of classes taught by full-time professors has been dropping. There is a rising new group called “clinical professors” who are paid only to teach and don’t do any research at all. Then, there are the for-profit universities, a rapidly rising group that takes up almost a quarter of all federal student aid. They don’t support any research at all.

This article from the Chronicle of Higher Ed discusses both the fact that the tenure track isn’t all it’s cracked up to be (“you can’t speak your mind for seven years”) and the fact that the number of positions is declining anyway.

From what I have seen, in technology companies SOME SUBSET of the numerati are well-treated. A software company may esteem its programmers but disregard the market research staff that can hold some whiz-bang statisticians. A pharmaceutical company may treat very well the clinical researchers but completely ignore the programmers who run their accounting and inventory systems.

Doesn’t this make sense? Isn’t it the old cliche about staff versus line positions we learned about in business school? Maybe, maybe not but certainly it is stupid. Those professors, I would think, would be “line jobs”. As for the accounting, market research and inventory folks, if you let them apply some of those equations they might make or save you millions of dollars. Why do we generally think that science and technology are the answers to all of our national ills but overlook those skills in specific situations?

TRUE STORY #2
An organization planned to expand its licensed software. A new purchase, available to all researchers, for a very modest fee, would have given them the capability to easily do decision trees, neural networks, survival analysis and more. The purchase was stopped because the vendor’s attorney and the client’s attorney could not agree on a phrase in the contract. This was reviewed by two managers and two attorneys, none of whom actually knew what the software could do for the organization.

As I hear these stories, and many, many more like them, I wonder what exact “people skills” these middle managers bring to organizations. If the skill is to develop people, you’d think they would bring in people to train them. Maybe they would look at data that showed the greatest areas of need. If it was to support existing researchers, you’d think they’d ask them what it is they need and try to ACTIVELY promote new technologies rather than “Say no and see if anyone screams”.

Some of the behavior (think the phraseology example and the fact that this individual was hired) shows an active distrust of, disrespect for and dislike of the technical staff.

I cannot state for sure why this happens in some organizations (certainly not all), but this distinction between “people skills” and “research skills” got me thinking of the difference in security.

What are technical skills? The ability to conduct an experiment, diagnose a patient, write a program. Generally, these are very portable. As a consultant, when I leave one client and go to the next, >95% of what makes me valuable goes with me. Yes, the next client may have some specific system I need to learn, but the definition of a training dataset, how to select a stratified random sample and all the programming languages I know go with me. The same is true of anyone in a technical or scientific field. The more you apply your skills, the more value you have and you take that value with you wherever you go.

I’m a bit confused by the “people skills” that some middle managers supposedly have. As a friend of mine commented about the manager for his department,

“They say he was hired for his ‘people skills’ and not his expertise. Well, we’re all people in this department and we all think he’s a dork.”

People skills include the ability to motivate and communicate. Those are a lot harder skills to document. How do you know your staff didn’t succeed despite you? For middle managers, a good deal of success seems to depend on connections. It’s not what you know, it’s who you know. I say that not in a pejorative sort of way but because I have noticed that many middle managers LOVE meetings. The point is, as I have been told many times,

“So we can all get to know each other.”
and I wonder,
“Why?”

One reason middle managers and the numerati don’t get along is they seem to think differently. Take meetings. My view on most meetings with a middle manager with a Gantt chart is

“Why are you here?”

I’m not talking about the person from the department we are supposed to serve who can tell me about how the data are stored, what questions they hope the data can help them answer and problems with data quality. I totally get why she is there.

I also understand people at a higher level of management who have an enormous project and need to parcel out parts of it to different teams, who need to set priorities for resources. I understand what they are doing and why we need them.

What I DON’T get is the guy in the middle who organizes meetings and requires agendas and minutes so they can be forwarded to “upper upper management”.

Here is what I am thinking:

“My team and I are going to do the absolute best we can. Tell us how much money is available and when you need it done. Then, go away.”

I really, really don’t know what the middle managers are thinking. What I deeply suspect, though, is that when I read about people in the New York Times who used to have a job that paid $60,000 or $80,000 or $100,000 a year and now they have been unemployed for two years, that I am reading about THEM.

Jul 11

Back when I was in college, there was a group advocating burning rock albums. A major investigative journalist wrote a story on their motivation (I think he either wrote for Rolling Stone or Playboy, the latter of which, yes, I really did read for the articles. Despite having competed on my college track team and the U.S. judo team, worked as a programmer and played rugby, I am actually not a lesbian, a fact which frequently surprises people. But I digress. Even more than usual.)

One question that I remember was how the group came by their figure that 80% of out-of-wedlock babies were conceived while listening to rock music. The founder said they had heard this figure cited by an evangelist during a revival in their town. The reporter followed up with the question,

“So, do you have any data to support your album burning other than the traveling evangelist poll?”

There were many things wrong with this study, the first of which, I suspect, is that it didn’t exist. Beyond that, there is the sampling issue. Is 80% a high number? Perhaps rock is simply the music listened to by most women of child-bearing age, the Big Band and Lawrence Welk fans being primarily post-menopausal and thus not at risk of pregnancy on either side of wedlock.

A causal relationship is at least implied, otherwise what was the whole burning point? To test this hypothesis, I turned on Blinded by the Light by Bruce Springsteen at full volume. Two unmarried daughters of child-bearing age were in the house, as was my husband.

No pregnancies ensued. Said husband remained downstairs building a robot with the world’s most spoiled twelve-year-old, although he did come up momentarily to ask if I minded if he turned down the music.

One daughter announced she was going to the apartment of the other daughter because, and I quote,
“No offense, but you people are boring.”

Which brings me to my tangentially related point… Lately I have been trying to come to the source of the frequently stated “facts” that

A. Small businesses produce the jobs that lead the economy out of a recession

B. Most jobs are created by start-ups

C. What small businesses really need are credit and counseling. Business plans always feature in there big.

I have no idea whether A and B are true or not. I rather suspect A is, in part because there are way more small businesses than any other type. It goes back to the Traveling Evangelist Poll (whether it existed or not). If there are way more people working in small businesses, then a 10% increase in them is going to be more jobs than a 10% increase in the smaller number of people who work for large businesses.

As far as C, I am a bit confused. Vivek Wadhwa, who is a pretty interesting writer on this topic, had this article on TechCrunch on July 10, with which I agree completely. The title is “You’re no Steve Jobs” and his main point is that the problem with many start-ups is that no one wants to buy their crap. He said it way more nicely than that, though.
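Here is the arithmetic as a tiny Python sketch. The head counts are hypothetical, picked only to show the effect of a bigger base; I am not claiming these are the real employment figures, and `jobs_added` is a helper I made up for this example.

```python
def jobs_added(current_jobs, percent_growth):
    """Jobs added when employment grows by percent_growth percent."""
    return current_jobs * percent_growth // 100

# Hypothetical head counts, chosen only to illustrate the arithmetic.
small_business_jobs = 1_000_000
large_business_jobs = 400_000

small_gain = jobs_added(small_business_jobs, 10)
large_gain = jobs_added(large_business_jobs, 10)
print(small_gain, large_gain)
```

Same growth rate, very different number of jobs, simply because one group started out bigger.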

Years ago, I used to spend some time on a forum for small businesses. One of the reasons I quit was because in the start-up section, no one ever said,

“What? Are you crazy?”

Instead, there were always supportive comments like,
“Live your dream, baby! I’m sure your business making hand-knit sweaters for turtles will make millions by next summer.”

These people are always saying that if they only had the money, they could have this amazingly amazing life but that the big bad banks would just not lend them the money to go out and buy a building to turn into a turtle sweater making factory.

The very odd thing, odder even than the turtle sweaters, is that the same week I read an article with which I mostly disagreed by the same Vivek Wadhwa! (No, I am not stalking him, it was coincidence. I swear.)

In the Business Week article he says that what keeps Americans from starting companies is lack of knowledge, lack of financing and fear of failure.

Well, those aren’t 100% contradictory … they don’t start because of lack of knowledge and financing but they DO fail after they start because no one buys their products. (Maybe some of that fear of failure is realism.)

Having been in business 25 years and never once been part of a survey (hey, I’m WORKING here!) I was curious as to the source of these figures.

Being a good academic, Wadhwa did provide references, and the first was to a study of 549 entrepreneurs in high-growth industries.

I don’t doubt that one might find for this specific group that access to capital is a big barrier. To a small company becoming a big company in a short period of time, the capital to buy a building, working capital to meet a growing payroll, all are important.

What percentage of jobs are those, though? I don’t know but I don’t think it is a lot. Google and Yahoo both have offices in Santa Monica. Geocities had its headquarters a few blocks from where I’m sitting. Even in our relatively tech-heavy corner of the world, the number of “high-growth” employees is dwarfed by the number working at the restaurants, hotels, liquor stores and car dealers and in the movie and TV industry.

In other areas where I work frequently, like North Dakota, and Washington, D.C., the proportion of “high growth” industry personnel is even smaller.

What about the “more jobs are created by start-ups”? I looked into that, too. There was a really interesting study by the Kauffman Foundation that pointed out that start-ups can ONLY create jobs. Their definition of a start-up is a business that started this year.

Jobs created = Jobs This Year – Jobs Last Year

Since the second part for a start-up is zero, it can ONLY add to the number of jobs.

Existing companies may hire ten people (cool for the ten people hired) but have 15 who retired, were laid off, fired or had a heart attack due to having sex while listening to rock music. Even though ten people were hired, the company has a net loss of five jobs.
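The formula is simple enough to sketch in a few lines of Python. The ten hired and fifteen lost come from the example above; the start-up's three jobs and the existing company's starting head count of 100 are numbers I made up, as is the function name.

```python
def net_jobs_created(jobs_this_year, jobs_last_year):
    """Jobs created = jobs this year - jobs last year."""
    return jobs_this_year - jobs_last_year

# A start-up had zero jobs last year, so its result can never be negative.
startup = net_jobs_created(jobs_this_year=3, jobs_last_year=0)  # 3 is hypothetical

# An existing company: hires 10 but loses 15.
existing_last_year = 100  # hypothetical starting head count
hired, lost = 10, 15
existing = net_jobs_created(existing_last_year + hired - lost, existing_last_year)

print(startup, existing)
```

The starting head count drops out of the subtraction, so the existing company shows a net loss of five whatever its size.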

I am not convinced, though, that the answer to economic malaise is to have a massive number of start-ups as many of them (like turtle-sweater lady) may be negative on the job-creation number by the next year.

Where do they get this idea that what small businesses really need is credit, so the government should give the banks more money to lend?

I went to Google, the source of all knowledge, and typed in “Small Business Survey.” The first several that came up were places like the North Texas Small Business Development Center, Citibank and the Huffington Post Survey on the Credit Crunch.

The latter asks:
“Small business owners: have you applied for business credit? Was it approved, or turned down? Have you not applied because you didn’t think you’d have a chance?”

I’m just sayin’ that perhaps organizations whose main function is to give credit, help you obtain credit and polls asking you if you have applied for credit might be a bit biased in the proportion of those reporting credit is an issue compared to say, the general population of small businesses.

On Monday, the world’s most spoiled 12-year-old is starting trapeze school.

Supporting the Economy

I don’t think what the Trapeze School (which is a small business, not vulnerable to out-sourcing) needed was a line of credit or a business plan. From their perspective, what they needed was to swipe my credit card.

I have some more thoughts on representativeness (or lack thereof) in surveys but the world’s most spoiled twelve-year-old is asking to be tucked into bed and my husband is suggesting that perhaps The Rolling Stones would be better than Bruce Springsteen.

I doubt it. We already have a fifteen-year gap between the oldest child and the youngest. A few years ago, I thought I might be pregnant again (we DID go to a Rolling Stones concert around that time). He was very cool about it until we got the results that said I wasn’t pregnant and then he exclaimed,

“THANK GOD!”

So… I punched him.

Jul 7

I’ve been in business for over twenty years. All of that time, I have run a small business, by choice. During those twenty years, I have had a sick husband, been widowed, had four children – so I had some reasons that becoming the next Oracle was not my priority. However, I have made a profit every year, some years more than others, and have increased and decreased my number of employees as necessary.

The more articles I read on small business in general and women-owned businesses in particular, the more I wonder how many of those organizations talking about helping small business owners create jobs include people who have actually run a small business.

There seems to be a great concern about the disparity in access to venture capital. Now, that may be a concern for some small businesses but most of the people I know own consulting companies, hair salons, restaurants, retail stores or manufacture products like t-shirts. They are not attractive to VCs because they are not going to have exponential growth.

Many of these small business owners, like me and my friends, are going to be in business for ten, twenty years or more, and pay corporate taxes, payroll taxes and everything else our accountant says we have to pony up every few months.

What about jobs?

I think everyone trying to create jobs through small business should read the insightful article Andrew Grove, Intel co-founder, wrote on this subject. Those high-flying tech firms create a lot of jobs – overseas! One problem with the VC-find-the-next-Apple approach, of course, is that those jobs may help investors but they don’t help the U.S. unemployment rate. Many, many of the high tech, high ROI jobs end up in China and India. (Seriously, read Grove’s article. It’s great.)

Twenty years ago, my business partners and I decided against outsourcing because we did not want to employ fewer Americans and pay someone in another country a sub-minimum wage so we could be richer. I know that sounds un-American, but part of our motivation in founding a business, which still derives much of its revenue from work on reservations, was to make life better for people. Obviously, we are privately owned, so we can make those choices.

The other thing I don’t need that every agency and company seems to want to sell me is a business plan. I have a business plan. Like most companies, the gist of it is to have revenues exceed expenses. Okay, it is a little more than that, BUT – after 20 years most of the business owners I know are not kept from hiring by lack of a plan. In fact, their plan is to add workers to meet the demand. It is certainly NOT to take out loans (guaranteed or not) so we can expand and hire more workers.

If anyone really seriously wanted to help small business create jobs they would make it easier for them to get business.

I had to laugh. Several times, representatives from the same “small business services” company have called me telling me,

“We’ll help you get YOUR money from the federal government. After all, it’s YOUR money.”

and then went on to promise me we could get on the GSA schedule and agencies would be falling over themselves to just pull our name up and order a million dollars of consulting services from us. I told their representative that’s not the way it works and he assured me it was and they had done that for lots of companies. I told him to email me the name of one. I’m still waiting.

I am not sure where the stimulus money went. I see some signs that the roads are being upgraded with Recovery Act funds, so that is a good thing. I don’t actually know anyone who got any of that $200 million that went to the NIH in grants, although I know a lot of people who applied, but that all went to universities anyway.

I may actually bite the bullet and complete the section 8(a) application this year, although it grates on me to do it. The time I spend on that will take away from billable hours so it will actually COST me money. I’m still debating on it.

Don’t get the idea that we’re sitting around here whining. We have work enough to keep the people we have employed and I am now looking for new contracts. We’ve already turned down a few over the last year, which may sound inconsistent, but it’s not.

We submitted one proposal in May, a second in June. We have too much current work to take time away to do a proposal this month. I’ll submit one or two in August and September, depending on how tired I feel.

Taking a six-month or shorter contract that takes up all of your time and keeps you from bidding on multi-year contracts is not good business. Just bidding on everything that comes down the pike isn’t too bright, either. We look for a match between our capabilities and what the client needs, for areas we can really do excellent work. That way, they are happy and come back to us again and again. After a quarter-century in business, we DO kind of know what we are doing.

I hear a lot about tax breaks for small business. Well, we pay a hell of a lot of taxes and that would be nice. Even though we will probably be exempt from the requirement to provide health care, I have always offered that as an option to employees and our costs may go up a little. Taxes and health care costs are not what keep me from adding employees.

It seems like the people aiming to help small businesses are sincere. However, it’s like the old cliche that when the only tool you have is a hammer, every problem is a nail. Because most of these organizations have people who know how to write business plans, fill out loan applications, apply for certifications of some status or another and lobby on Capitol Hill, that’s what they see as the way to help small businesses.

Most people who have been in business for decades don’t need some consultant to help them develop a business plan before they can add jobs. If their business has been around a long time, they already have a line of credit. I’m not sure what they need is tax cuts or worse health care coverage for their employees.

What they need is work.

I’m surprised I have to explain this to you.

Jul

6

So, I am writing a paper on how you know you (or someone else) is a “real” programmer. That is, they don’t fit in that “new user” box any more. But how do you make that decision?

Is it like pornography, you just know it when you look at it? (Not that I ever personally looked at any of course, but I have heard you can find it on the Internet if you try really hard.)

Yesterday, Rob Meekings made a comment about design decisions. That is certainly a distinction, when you get to the point that you are actually thinking that way. For example, I often will merge everything together in one long dataset, a habit that makes those who love SQL and the star schema just cringe. The REASON I do this is that most of the people I work with are researchers using very powerful computers with datasets of a few thousand observations, or, at most, a few hundred thousand. Even on a desktop, an analysis with SAS, Stata or SPSS takes seconds. It isn’t worth taking an extra hour or two to make a program run in one second instead of two. It also may make the program more difficult for the user to maintain him/herself.

HOWEVER, when I am running a program that runs against a 100GB dataset and can take hours to run because the researcher cannot use a supercomputer, e.g., due to security classification, I’ll spend a good bit of time trying to make it run as efficiently as possible.

If there isn’t a pressing reason not to do it, I’d recommend that someone with a large dataset consider running it on a cluster and taking advantage of parallel processing capabilities. This means changing your code slightly to run on a different OS, often Linux or some other Unix version.

I do a lot of “throw away programming”. That’s not to say it’s garbage. Sometimes I think my work is quite good, in fact, but it’s not production code that runs every day to produce reports on 500 different stores. When I DO write production code, I do several things differently. One is that I make good use of %include statements. For example, if there is a footnote that is going to be in every single output that says, “Funding provided by National Science Foundation Rural Systemic Initiative Grant #1234-2010” and several more lines about the university, address for contact, etc., I am going to have a small file that I just include. Yes, I could copy and paste it or have that as a template for when I create a new program. BUT what happens when we get another grant and we want to recognize both funding agencies in everything we publish?

My point, and you may be surprised by this point to find that I do, in fact, have one, is that a distinction between novice and non-novice programmers is that the non-novices have the luxury of thinking about a design because they know more than one way to do something.

Jul

4

I'll get this down eventually

Writing a presentation for WUSS, I had to fill out the usual check box for the intended audience:

Level of programming expertise:

___ Novice __ Intermediate __ Advanced

and I started wondering when exactly does someone stop being a novice? One answer is that your programming no longer LOOKS like it was written by a novice. That’s kind of circular reasoning, though, isn’t it? To be more specific, here are a few of those signs, generated from a survey of a random sample of 1.

(Note, if your programming does not always show all of the characteristics mentioned below, you are forbidden to feel bad. All but a very exceptional few programmers will admit to having made every ‘newbie’ mistake when they started, and on occasion, they still do when they are rushed, tired or distracted by three fighting children or after their third martini. As for that exceptional few – they’re chronic liars. Stay away from them.)

Five signs you’re no longer a novice, in no particular order ….

1. Good use of functions

AvgQtr = (Jan + Feb + Mar) /3

is a sign of a novice

AvgQtr = Sum(Jan, Feb, Mar) /3

is better

AvgQtr = Mean(Jan,Feb, Mar)

is what an intermediate programmer would do.
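Those three versions only agree when no month is missing. A rough sketch of the difference, written in Python rather than SAS so it can be run anywhere: None stands in for a SAS missing value, and the three function names are made up for this illustration. The assumption being modeled is that + gives a missing result if any value is missing, while Sum and Mean skip missing values.

```python
from typing import Optional, Sequence

Number = Optional[float]  # None stands in for a SAS missing value


def plus_avg(values: Sequence[Number]) -> Number:
    """Mimics (Jan + Feb + Mar) / 3: one missing month makes the result missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def sum_over_n(values: Sequence[Number]) -> Number:
    """Mimics Sum(Jan, Feb, Mar) / 3: skips missing months but still divides by 3."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(values)


def mean_nonmissing(values: Sequence[Number]) -> Number:
    """Mimics Mean(Jan, Feb, Mar): averages only the months that are present."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


quarter = [10.0, None, 20.0]  # February is missing
print(plus_avg(quarter), sum_over_n(quarter), mean_nonmissing(quarter))
```

With February missing, the first version gives up, the second quietly treats February as zero, and only the third averages the two months that exist.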

2. You know options of options
3. You understand how the particular language you are using processes data.

For example, in SAS, let’s say you have two datasets

Pretest has the following variables: Id Age Gender Testscore
Where testscore is (obviously) the pretest score.
Posttest has the same variables: Id Age Gender Testscore
Where testscore is (obviously) the posttest score.

If you do this (bad!)

Proc sort data = libref.pretest ;
By id ;
run ;
Proc sort data = libref.posttest ;
By id ;
run ;
Data libref.alltests ;
Merge libref.pretest libref.posttest ;
By id ;
run ;

You have just created a dataset that is a copy of posttest because the testscore from the second dataset named will copy over the first.

Try this:

Proc sort data = libref.pretest out = pre (rename = (testscore = pretest)) ;
By id ;
run ;
Proc sort data = libref.posttest out = post (rename = (testscore = posttest)) ;
By id ;
run ;
Data libref.alltests ;
Merge pre post ;
By id ;
run ;

Yes, you COULD have done this by at least one data step where you renamed the testscore variable, but adding an extra step is inefficient.
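If you want to see the overwrite happen without firing up SAS, here is a minimal sketch in Python. It is a simplification, not a re-implementation of the SAS merge: it assumes one row per id, and merge_by_id and rename are hypothetical helpers invented for this example.

```python
def merge_by_id(left, right):
    """Match-merge two lists of row dicts on 'id'.

    Fields with the same name in `right` overwrite the ones from `left`.
    Assumes at most one row per id in each list.
    """
    merged = {}
    for row in left:
        merged[row["id"]] = dict(row)
    for row in right:
        merged.setdefault(row["id"], {}).update(row)
    return [merged[key] for key in sorted(merged)]


def rename(rows, old, new):
    """Return copies of the rows with the field `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


pretest = [{"id": 1, "testscore": 50}, {"id": 2, "testscore": 60}]
posttest = [{"id": 1, "testscore": 70}, {"id": 2, "testscore": 85}]

# Same variable name in both datasets: the pretest scores are lost.
bad = merge_by_id(pretest, posttest)

# Rename first, then merge: both scores survive.
good = merge_by_id(
    rename(pretest, "testscore", "pretest"),
    rename(posttest, "testscore", "posttest"),
)
print(bad)
print(good)
```

The first result has only the posttest scores; the second has a pretest and a posttest value for every id.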

A good, short article on beyond the basics in proc sort was written by Kelsey Basset.

4. Use your knowledge of functions in your programming logic.
5. Don’t forget about missing values.

For example, a researcher wants to categorize people who have ANY positive response to five questions on raising taxes, “Would you vote to raise taxes if … the state budget isn’t balanced?” “Would you raise taxes if … the option was to cut social services?” and so on.

A novice response would be:

If q1 = 1 then taxes = 1 ;
Else If q2 = 1 then taxes = 1 ;
Else If q3 = 1 then taxes = 1 ;
Else If q4 = 1 then taxes = 1 ;
Else If q5 = 1 then taxes = 1 ;
Else taxes = 0 ;

Better

If sum(of q1 - q5) > 0 then taxes = 1 ;
Else if sum(of q1 - q5) = 0 then taxes = 0 ;

The reason for having the second IF in there is that if you do not then all of those with missing values get set to zero, which may result in throwing off your results by a great deal, depending on how frequent missing data is.

There are a variety of ways to do this, some better, some worse. However, one statement that does exactly what we want is:

Taxes = Max(of q1 - q5) ;

If any of the questions were answered 1, the value of taxes is 1. If all were answered 0, the value is 0 and if all were missing, the value is missing.
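The same logic as a small Python sketch, again with None standing in for a missing answer. The function names are hypothetical, and the second one assumes, as described above, that Max skips missing values and returns missing only when every answer is missing.

```python
def taxes_novice(responses):
    """The if / else-if chain: anything that is not a 1 falls through to 0."""
    return 1 if any(r == 1 for r in responses) else 0


def taxes_max(responses):
    """Mimics Taxes = Max(of q1 - q5): missing answers are skipped."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return None  # all five questions missing, so taxes is missing too
    return max(answered)


nobody_home = [None, None, None, None, None]
print(taxes_novice(nobody_home))  # 0, which looks like a "no" on every question
print(taxes_max(nobody_home))     # None, which is what we actually know
```

Note that a respondent who answered 0 to four questions and skipped the fifth still gets a 0 from the Max version, which may or may not be what the researcher wants.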

I saw a similar example from SPSS on Douglas Smith’s page. Although Recode is actually a command and not a function, my point is the same. Once you proceed from being a novice, you are naturally seeing the ways you can make your program more efficient.

“Another example of using recode might be to invert the order of the values for a subjective evaluation variable. For instance, the variable “happy” has three valid response categories:

1 = Very Happy
2 = Pretty Happy
3 = Not Too Happy

You might want to change the order to go from least happy to most happy. To do this, all you need to do is swap the values 1 and 3. The recode statement that will accomplish this is:

recode happy (1=3) (3=1).
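What makes that one-liner work is that both swaps are applied to the original value at once. A hedged Python sketch of the same idea follows; the mapping and function names are invented for illustration and this is not how SPSS implements it.

```python
HAPPY_SWAP = {1: 3, 3: 1}  # 2 is not listed, so it stays as it is


def recode(values, mapping):
    """Look each original value up once, so 1 -> 3 is never turned back into 1."""
    return [mapping.get(v, v) for v in values]


def recode_in_sequence(values):
    """The trap: two separate passes undo each other."""
    step1 = [3 if v == 1 else v for v in values]
    return [1 if v == 3 else v for v in step1]


happy = [1, 2, 3, 3, 1]
print(recode(happy, HAPPY_SWAP))
print(recode_in_sequence(happy))
```

The first call reverses the scale; the second leaves every 1 and 3 as a 1, which is the kind of bug that is easy to write with two separate IF statements.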

Oh, and if you don’t use the command window, much less the Do-file editor in Stata, you are definitely a novice. Same goes for anyone who doesn’t write syntax for SPSS or hasn’t found a use for the Program window in SAS Enterprise Guide.

That isn’t to say that there will never come a day when one can be considered a programmer by simply being very good at pointing and clicking.

Just sayin’ …. today is not that day.

Jun

28

Sabbaticals?

June 28, 2010 | 2 Comments

When I was in high school I had a very defined career path. I told anyone who asked me (which was very few people since no one cares what a high school kid thinks) that my career goal was to be president of General Motors. I even applied to the General Motors College. (Bet you didn’t know they had their own college!)

They were my first choice but due to a glitch in getting the materials in from my school, my paperwork was not complete by the deadline and their rules were carved in stone.

So, I went to Washington University in St. Louis, where I received a great education despite the fact that I attended slightly more parties than classes. Wash U is generally known more for its pre-med program than as a party school but I didn’t let that stop me.

At 19, I graduated from college. Worked full-time all through college, worked full-time while getting my MBA and as an engineer for several years after that. For a while, I taught math and got a second masters at the same time.

At 29, I quit working full-time so I could finish my Ph.D.
After I graduated, I started working as a professor and expanded the consulting business I had started in 1985 into a full-time operation. (Yes, that’s two jobs.)

At 39, I quit working full-time and took a post-doctoral position for a year. (Having my fourth baby at 39 slowed me down a bit.) Then, I took a position at a consulting company and continued the company I had started in 1985, taking on more and more business. (Yes, that’s two jobs.)

At 49, I quit working full-time, briefly retired and then took a position at a university. By then, the consulting company had split into two companies, which form The Julia Group. (Yes, that’s two jobs. In fact, for a while it was three as I was teaching statistics for the graduate division of another university.)

Someone noticed this recently and asked me,

“Do you deliberately take sabbaticals? That is supposed to be something only done in universities. And what are you going to do now?”

Well, it hasn’t been completely coincidental that I have done reverse sabbaticals and gone to a university every decade.

I find that as a consultant, I get paid for what I know and what I do. So, I may get asked to do a repeated measures Analysis of Variance over and over, for six projects in a row. Or, I may find myself repeatedly getting contracts to write grants for the Department of Education, because I have already gotten several funded.

Business is like that. There may be a few rare jobs where you get paid to learn things but those are mostly jobs where you learn things AFTER you have already put in a 40 hour week and those are your other 20 hours.

When I was an undergraduate, back when I attended classes with Fred Flintstone and Barney Rubble (if you even recognize that reference you are old!), there was a saying,

“No one ever got in trouble for buying IBM.”

Business hasn’t really changed all that much. Plenty of people buy Microsoft products because that is what they have always bought. If you’re going to hire a consultant to do X, you are pretty safe hiring a consultant who has already done X seven times for satisfied clients. That way, even if the person screws up, no one can blame you, it’s a reasonable choice.

So, every ten years or so, I get tired of doing X and I decide to do Y or 7 or purple.

I decide,

“You know, programming in SAS is cool, but I think I’ll take a look at what this Enterprise Guide thing will do, or maybe JMP or data mining or see what they’ve been developing at SPSS. Or, what the hell, maybe I’ll just go to Beijing and Tunisia.”

More than once, I have been called “insane” for giving up a great opportunity. The irony of that is that the second and third time I was giving up insanely great opportunities that I wouldn’t have had if I had not been “insane” enough to give up the first one.

I’ve been an engineer, math teacher, professor, statistician, programmer, consultant – and for thirty years have run a business while raising four daughters.

And yet, the bizarre fact is that it has all turned out okay. After every “sabbatical” (which, incidentally, has always entailed a HUGE cut in pay because university salaries * blow * compared to the corporate sector), I’ve stepped into a new stream that paid much better and was more challenging than when I left.

Not only have I ignored every bit of career advice I was ever given from, “Stick to one thing,” to “Dress for success” to “Don’t have pictures of your children on your desk or you won’t be taken seriously” to “Always show up at work before the boss” to “Don’t express your own opinions”.

but .. to most of it I have replied,

“Bite me!”

It occurred to me that I have not so much had a career path as a career random walk.

Yet, it has turned out okay, as measured on The Julia Group scale, which is a factor score consisting of (unequally weighted) jelly beans, Chardonnay, time spent lying on tropical beaches, how much I love my children, terabytes and years of marriage to someone who brings me coffee in bed at 9 a.m.

So, what now? Well, I have a contract under review with a federal agency, six papers I’m committed to write and the family wants to go to Hawaii.

After that? I haven’t the faintest idea. But I’m sure I’ll like it. Because if I don’t, I won’t do it.

Jun

22

A large part of my day is spent playing with new software and trying to break it. Yes, there are actually grown-ups who get paid to do this for a living.

I find it hard to believe myself.

The theory, which actually works well, is that whenever someone has a question about something he or she wants to do, no matter how esoteric, I will have tried it at some point, based on my general philosophy of life which is, “What the hell… let’s see what happens.”

My inappropriately named desktop, since it is actually under my desk, runs Mac OS 10.6 and has five virtual machines with Vista, Windows 7 (32 & 64 bit), XP and Ubuntu. There is a supercomputer over my head that I can tap into from here directly that also runs SAS and Stata. So, why would I need JMP?

Besides, what really annoyed me was that all the JMP events I went to (an N of 3) were all about “look at these pretty pictures we got with JMP” and nothing on how to do it. Finally, I went to one at SAS Global Forum which was by Wayne Levin of Predictum and was excellent (full disclosure: I probably wouldn’t recognize Wayne Levin again if I tripped over him. I only know the name because it is on a handout on my desk, which has not been cleaned since I got back from SGF, and he’s never given me so much as a jelly bean. It was still excellent).

JMP is one of the many things that has been lying around here for the last couple of years that I’d look at every now and then, and think maybe I should do something with this. Lately, three things occurred to me.

1. It runs on a Mac, thus sparing me the 30 seconds of opening a virtual machine, that could then be used for such extremely important tasks as getting jelly beans out of my drawer.
2. It makes pictures, which fits well into my current interest in visual data analysis.
3. It gives me an answer for people who call up and say,
“SAS doesn’t run on a Mac? What the hell am I supposed to do now?”

I am actually married to one of those people who doesn’t believe anyone should buy software unless it AT LEAST runs on a Mac and preferably Linux, too. Learning JMP turned out to be less trouble than finding another husband as good as the one I already have, so I decided to go with that.

I had a dataset that I had downloaded from ICPSR and done lots of work on in SAS. I was working on a project with someone who only uses JMP. So, I saved the dataset as a JMP file. We were working on a project to predict who would enlist in the military. I had a sample of > 2,500 high school sophomores who had been asked their plans after graduation. In JMP, I selected ANALYZE from the main menu and then DISTRIBUTION. I moved the two variables into the Y column and clicked OK.

JMP TIP —-> NOTICE THE ARROWS —>

Those little red arrows next to almost everything do stuff. For example, when the results window first came up, I didn’t like the looks of it. No, it wasn’t rolling its eyes at me. It had the histogram vertically oriented and a table of Quantiles I had no interest in. Grey arrows expand and contract things. Red arrows give you options. If a grey arrow is pointing down and you click it, it hides what is underneath. Conversely, if it is pointing sideways it has hidden stuff underneath and you can click it to expand and see what that is. So, I got rid of the quantiles.

Clicking on the red arrow next to each variable gives a whole list of options and some options of the options. I clicked HISTOGRAM OPTIONS and then I clicked on VERTICAL which had been selected by default. Then I selected SHOW PERCENTS. Here is my first picture and my first conclusion. People are a bunch of liars.
distro_army1

Curt Gilroy, who was cited in the Army Times and has the impressive title of Director of Accessions for the Pentagon (which does not, despite what may have been implied by Sister Marion in the seventh-grade, have anything to do with the Virgin Mary going to heaven. That was the Ascension, or the Assumption. Either way, it definitely did not involve the Pentagon.)

Anyway, Gilroy says that 12% of military eligible youth show an interest in military service. So, if we put together the 4% who said they “definitely will” (=4) and the 9% who said they “probably will” (=3) join the armed services after high school, we get 13%, which sounds about right.

However, 89% say that they definitely or probably will go to a four-year college. Uh, no. First of all, the percentage of freshman students who will graduate is only 73% according to the National Center for Education Statistics and of those only 69% will enroll in a four-year school. So, .73 * .69 = .504, or about 50.4%, and even given that some of the high school dropout has already occurred by the spring of tenth grade, uh, how about no, 89% of you are not going to four-year schools, I am sorry to say.
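For anyone who wants to check my multiplication, here it is as a few lines of Python. The two rates are the ones quoted above; the variable names are mine.

```python
graduation_rate = 0.73            # share of freshmen who finish high school
four_year_enrollment_rate = 0.69  # share of those graduates who enroll in a four-year school
stated_plans = 0.89               # share of sophomores who say they will go

expected_share = graduation_rate * four_year_enrollment_rate
gap = stated_plans - expected_share

print(f"{expected_share:.1%}")  # prints 50.4%
print(f"{gap:.1%}")
```

That gap of well over a third of the sample is the “people are a bunch of liars” part.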

I think race is a factor in military service. The data I used included race as 1 = African-American, 2 = White, 3 = Everyone else. I thought that third category doesn’t really make much sense for analysis. So, I created a new variable African-American which was 1 if race = 1 and 0 if race = 2 or 3. Here is how:
Select COLS then NEW COLUMN. In the pop-up window, give it a name and then select FORMULA under column properties.
In the functions list, select CONDITIONAL and pick IF.
A formula box will pop up and it should be pretty obvious. You can just click on RACE to have it moved into your formula, then type = and put a 1 in the first box and a 1 in the second box for
If RACE = 1 then the new variable = 1.
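If you think better in code than in pop-up windows, the formula amounts to the following, sketched in Python rather than JMP's own formula language. The function name is made up, and returning None for a missing race code is my assumption here, not something JMP is guaranteed to do.

```python
def african_american(race):
    """1 if race is coded 1, 0 if race is coded 2 or 3, missing stays missing."""
    if race is None:
        return None  # assumption: do not turn a missing race into a 0
    return 1 if race == 1 else 0


race_codes = [1, 2, 3, None, 1]
print([african_american(r) for r in race_codes])
```

Collapsing 2 and 3 into a single 0 is exactly what the new column does; it does not fix the “everyone else” category, it just gets it out of the way.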

Next, I can go to ANALYZE, MODELING, PARTITION and click on SPLIT a few times and I get my decision tree. It’s a start. I still think race should factor in there and I think the reason it doesn’t is because of that “garbage category” of three for everyone else – Asian, Native American, people who didn’t say. My hypothesis is that if I change that, race will become a factor.

decisiontree

So, what would I do with JMP? I guess since I should have left for home an hour ago, the answer is “get immersed in questions I’m interested in and lose track of time.”

Essentially, the same thing I do every day.

