I was debating whether or not I should go to the Tech Coast Angels Fast Pitch competition last night since I had to catch a cross-country flight out of LAX less than two hours after it ended. In the end, I decided I would be crazy not to go. It’s less than ten miles away, the largest group of angel investors in southern California and a chance to see the ten finalists each pitch their companies in ninety seconds. On top of that, there was a moderated panel with six angel investor/ venture capitalist members of the Tech Coast Angels. As icing on the cake, there was a second panel with founders of several companies in early stage funding. And, for a cherry on top, I guess, they had a fireside chat (without the fireside) with Mike Jones, CEO of MySpace.
The angels were asked what characteristics they looked for in their “perfect” candidate for investment, recognizing of course that perfect candidates don’t exist in the real world.

Advantages they looked for included being a serial entrepreneur, someone who had founded and sold at least one company already, and previous experience in the same industry. Another advantage was being in a dynamic industry. They noted that the advantage of an industry with a lower growth rate was that a start-up would not face as many new companies as competitors, but a major disadvantage was that there might not be a buyer out there seeking to acquire the company once it became (more) profitable.

Mike Jones said that a disadvantage to him would be if the company outsourced its technology. If you are supposed to be a technology company and you don’t have a technology person at founder/partner level, that would be a red flag to him.

They all agreed that hiding anything was a deal-breaker. They gave the joking example of a felony conviction, but I presume they meant more things like a previous failed company, bankruptcy, being fired, poor credit or other non-criminal problems, as well as technical difficulties.

There was some discussion on whether ‘exaggerating’ was a deal breaker. Jim Adelman definitely felt it was (I think I like this guy). Others argued that it was human nature to over-sell yourself a little on your resume, but things like claiming a degree from Harvard that you did not have would be a definite out. Adelman disagreed. He wanted complete honesty.

On another topic, Jim Adelman commented,

“One thing we know about all operating plans is that they are wrong. The operations will not turn out the way you think. HOWEVER, your operations plan should be realistic given the drivers in the market.”

Let’s return to his first statement for a minute because it was a point that was made over and over throughout the evening. Scott Sangster put it like this,

“One thing for sure is that the company you end up with will not be the one you started out with. [You need the confidence…] to make that journey.”

Another of the angels (or maybe it was Scott again) said, “the business you end up with is never the one you envisioned as you started out.”

Mike Jones said that his company, UserPlane, originally intended to sell to the education market but found that the sales cycle was just too long for them. Next, they considered the health care market, but that too did not pan out. Finally, they found that dating sites and social media sites really were interested in their product, so they went after that market. The product was eventually sold to AOL for $40 million.

There was a great deal more insight I would have missed if I had skipped this event. For me, I would have to say that by far the most interesting and informative part of the evening was the questions that were asked of the presenters by the angels after each pitch.
It was SO worth going, even if it did mean that I had to fly all night to be in Florida by the next morning.

I’ll have to write about that later, though, because I am pretty sure if I don’t get coffee within the next fifteen minutes I am going to die.

I’ve just been hired on a project that uses SAS and I know nothing but SPSS, but I know that really well. Can I come over to your office tomorrow?

… so began a very interesting afternoon, where the new employee would show me code that she had written, say VALUE LABELS and ask, “Can you translate that to SAS?” and I would tell her about PROC FORMAT.  In a few hours, we breezed through the basic statements and finished up with regression.

I was confused when, in my first year of college, professors and classmates alike assured me that it really did not matter what programming language I took, because they’re all pretty much alike. Of course not all languages are interchangeable but I do now get the point – if you’ve never programmed before, a lot of what you have to learn, like re-usable code – whether you call it a macro, a method or a subroutine – is going to transfer to any other language. You’ll have variable types, arrays, functions, loops. I think I had pretty smart professors who refused to let us learn computer science in the abstract. You had to use a language, but they were right, it really didn’t matter which one.

The most fun part of Ruby, if you ask me, which you didn’t, but when has that ever stopped me, is defining methods, which, if you’ve ever coded a SAS macro, you already understand. Defining a method looks like this:

def methodname(parameters)
  # lines of whatever you want the method to do
end

To call the method, you type


methodname(parameter values)
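Filled in with actual names, a minimal working sketch might look like this (the method name, parameter and greeting are all made up by me for illustration):

```ruby
# A hypothetical method: takes a name, returns a greeting
def greet(name)
  "Hello, " + name.capitalize + "!"
end

# Calling it works exactly like the template above
puts greet("annmaria")   # prints Hello, Annmaria!
```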

If you are used to writing macros, you are already translating this in your head to something like


%macro readfile(dsn,outfile) ;
data lib.&outfile ;
set &dsn ;
keep id total ;
run ;
%mend readfile ;

%readfile(lib2.myfile, outfile) ;

Of course, if you have never written a SAS macro you may have no idea what I am rambling on about. Join the club.

I’ve been having a lot of fun with Ruby. While there are lots more upsides, two downsides need to be pointed out:

1. I was wrong. I hate that. Last year, Mark Stevens wrote a blog post, Zero to SAS Certified in 3 Months, and I questioned what good a certification you could get in three months was. He explained that he had already learned Fortran, C and VBA. I do now understand how you could learn one programming language at a basic level in 50 hours of study if you came in with a good understanding of how things work.

2. There are some things in Ruby (like the ubiquitous curly brackets) that are going to throw off SAS users. Two-level names mean something completely different in Ruby.

page1.length does NOT indicate a two-level name with page1 as the directory and length as the file. When you see a construction like this, the first thing is an OBJECT and the second is the action being performed on that object. So page1.length will (not surprisingly) return the length of the object page1. Even though it is a switch from the syntax you are used to, I mean, come on, it’s so friggin’ obvious: fname.capitalize is going to capitalize the first letter of fname, and name.upcase is going to put name in uppercase.
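To make that concrete, here is a tiny sketch (page1, fname and name are just strings I made up):

```ruby
# Each of these is an object; the word after the dot is the method run on it
page1 = "disciplinary report"
fname = "annmaria"
name  = "ruby"

puts page1.length      # 19 -- the length of the object, not a file in a directory
puts fname.capitalize  # Annmaria
puts name.upcase       # RUBY
```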

Overall, I think there ought to be more of a push for people to branch out and learn new languages just for the hell of it, preferably one that is somewhat different than what you are used to. The similarities make it easy to learn while the differences, like the way Ruby handles arrays and variable definitions, make it worth learning.
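Here is the kind of thing I mean about arrays and variable definitions; no LENGTH or ATTRIB statements anywhere (the data below are made up):

```ruby
# Variables spring into existence when assigned, and an array
# keeps track of its own size
scores = [85, 92, 78]
scores << 100             # << appends an element, no index bookkeeping
puts scores.length        # 4
puts scores.max           # 100
puts scores.sort.inspect  # [78, 85, 92, 100]
```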

Speaking of learning, I read an interesting paper by Gongwei Chen about SAS certification, where he discussed the benefit of getting certified as an advanced programmer even though he had ten years of experience.

I’m debating the idea of taking a Ruby certification exam, not because I think it will get us more business or impress my friends, but for the reasons that Gongwei mentioned in his paper. He said that it forced him to concentrate the time he spent studying and trying new programming techniques, because he had a target, and that, even though he had been programming for ten years, he used a defined subset of procedures and taking the exams forced him to broaden his view. I’d never really thought of certification in those terms before, and it was an intriguing perspective.

The first programming language I learned, 36 years ago, was BASIC, followed by Fortran the next semester. A couple of years later, my employer had me learn COBOL, FORESIGHT and M (the successor to MUMPS). I haven’t touched any of them in decades and, of the three, COBOL is the only one that I’m sure still exists.

In 1982, I had my first SAS class, and I was around when PROC CATMOD (for categorical models) was a new idea, and lo-o-ng before things like structural equation models with SAS (that’s crazy talk! Learn LISREL for that.) I learned SAS macro language and forgot the % on %TO about 735,333 times. All the %eval and quoting functions drove me crazy …

… but in the end, sooner or later, I managed to bash whatever I needed into shape and my programs would run.

I decided I needed to do something new. In part, because I like new challenges. There have been points in my career when I would get so bored with what I was doing that I would write statements starting with the semi-colon and working backwards. You know then that your work has gotten too redundant.

Also, SAS has been around a LONG time now and that means that much of the way we do things is because whatever new statements, functions and procedures are written have to be integrated with the million lines or so of existing code. Much of that code was written before web pages, Twitter and other essential fibers of our digital lives were a consideration. The result is that you CAN parse text with SAS, but it’s not really optimal.

So… after much soul-searching, or about as much soul-searching as could be done in the time it takes to drink one glass of Chardonnay, I settled on Ruby. Why Ruby? I wanted something relatively new that would not be encumbered by a mountain of legacy code. I was interested primarily in analyzing data on the web. The projects I had in mind were not going to be using petabytes of data but rather focused on a well-defined set, so the ability to handle ‘big data’ didn’t even figure in there.

And, I saw a couple of Ruby books, like Peter Cooper’s Beginning Ruby: From novice to professional, and The Little Book of Ruby that were very simple to follow. There were also a lot of videos available on-line, many so amateurishly produced they were unintentionally funny, but I wasn’t looking for Macbeth, I was looking for information, so that was fine.

Could I just as easily have decided to pick up Python or Perl? Sure. But I started with Ruby for no particular reason and it was just SO easy to learn. Your mileage may vary. Some people who have never programmed before found Ruby, and the same resources I used to learn it, to be completely incomprehensible.

Now we come to the whole point of this post… If you’ve used SAS and you are thinking of picking up another language, you’ll find Ruby to be a piece of cake. Take this simple example from Cooper’s book:

line_count = 0   # (these two variables need to exist before the loop)
text = ""
File.open("text.txt").each do |line|
  line_count += 1
  text << line
end

Of course, you recognize the File.open as opening a text file (similar to the good old FILENAME statement). Then there is your DO loop with the do and the end. The line_count += 1 is not identical, but similar enough to the syntax of line_count + 1 used in SAS that you could probably figure out that it serves the same purpose of incrementing the variable by 1.

Or this example, from The Little Book of Ruby. If you aren’t familiar with SAS macro language it probably looks like something an Ewok would say.

puts( "\n\t#{(1 + 2) * 3}" )

On the other hand, if you are familiar with all there is to learn about masking special characters and specifying when a value passed to an argument is to be interpreted as anything but text, you probably said,

“I’ll bet that goes to a new line, tabs over and then interprets (1+2) *3 and prints it out.”

You would be exactly right.
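The same #{ } interpolation works with variables, too, which has roughly the feel of macro variable resolution (the variable names and path are mine, made up for illustration):

```ruby
# #{ } interpolates any Ruby expression into a double-quoted string,
# a bit like & resolving a macro variable inside SAS code
yr   = 2011
mnth = "01"
path = "c:/annmaria/incidents/school/#{yr}/#{mnth}"
puts path   # c:/annmaria/incidents/school/2011/01
```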

If you’ve worked with ODS much, you are familiar with the idea of inheritance, with different types of tables having certain characteristics and subclasses of that type inheriting all of those characteristics, even if you haven’t used those precise terms.  PROC TEMPLATE, which I avoid like the plague, even though Cynthia Zender tells me it will catch me eventually, is another example, where you can create a new template based on an existing one and it inherits all of the characteristics of the parent template.

Ironically, it was working with macros and ODS that drove me to look for a second language. I just thought there must be an easier way to work with text and I sure found it.

With books out there like R for SAS and SPSS Users, Stata for SAS and SPSS users, etc., etc., I can’t believe no one has written a Ruby for SAS Users book. It seems like it would be a natural fit. You could do the text processing with Ruby, which is free, easy to learn, and, more than that, does not take a staff of Jedi Knights to install, unlike SAS Business Intelligence solutions. Then, you could pass the file, now structured data, back to SAS for statistical analysis.

After that, as Lewis Carroll said, it’s “Oh frabjous day!” and your life is beamish.

Remember how you felt when you were a little kid at Disneyland and you got those lollipops as big as your face? Yeah, when it all works, you feel like that.

We invest in our 401(k)s; we need to think about investing in our government, too. A new way of thinking about investment, from Alan Silverman


That’s a great idea, especially from the point of view of parents who want a better world for their children. There is the opportunity to invest locally, in your neighborhood association, all the way up to internationally. Opportunities for ‘investment’ range from social media – blogs, citizen journalism – to data analysis, organization à la Wikipedia editors, and more.
Can there be “Facebook Science”?
A new way of thinking about data, from Jean Holm – NASA/ JPL
Jean Holm was the reason I came to Gov20LA, because I am very interested in open data. With Egypt’s president resigning the previous day, there was some talk about the politics of open data. She said that Twitter enabled the transformation of government rather than causing it. I agree. She also said that within the U.S.,
Government data philosophy is moving from a “need to know basis” to a “need to share”.
Personally, though, I am very interested in the use of open data for research and skeptical about the uses of twitter, mash-ups and other social media for scientific research. I’m very skeptical of anyone’s ability to think deeply in 140 characters. Not that there cannot be brilliant tweets -  I just don’t think these will replace refereed journals.
Holm gave some really interesting uses of social media. These included finding NASA experts via social networks: they have combined expert attributes pulled from existing systems – e.g., the publications database, what people charge their time to – into SpaceBook (get it? groan). So if I am a NASA researcher looking for someone else doing similar research, finding a colleague is at my fingertips – and it’s objective, not like those people who claim to be experts at everything and then it turns out they couldn’t find a derivative with a flashlight and a map.
There were a lot of really impressive ideas and brilliant people, more than I expected. What did I expect? Well, I expected that the one or two speakers I knew would be really knowledgeable, maybe a few other people there would be fascinating and most people would be just like most people you meet everywhere. It was kind of the opposite of that. I’d say the brilliant, fascinating and competent distribution in the room is skewed very, very far to the right of the normal curve.
It wasn’t all perfect. There were a couple of comments both during presentations and in conversations afterward that made me want to join the Tea Party.
Hint: If you are telling members of the public in essence “Our agency is perfect and if you had trouble finding the data you need you must be a moron”, um, the odds are your agency isn’t perfect and if there is a moron in the room well…
Still, those were by FAR the exception. If you want to get excited about government (yeah, sounds crazy, I know) or if you are a government and want to do better, you should be here.
And hey, city of Santa Monica, it’s still going on today and your citizens are wondering why the governments of Canada, the United Kingdom, the United States and Los Angeles are represented and we’re not. It’s not too late. In fact, if you leave now, you can meet me there.

Remember that old saying that 1,000,000 monkeys on a typewriter would eventually produce Shakespeare?

After the equivalent of more than a 1,000,000 monkey-years of text published on the web, so far, no Shakespeare. (For a superb, in-depth discussion of this point, read Jaron Lanier’s book, “You Are Not a Gadget”.)

In very, very brief, Lanier says that crowd-sourcing is NOT a panacea, that comments from 250,000 Introductory Physics students do not even equal a mediocre physicist, much less Einstein.

Looking at the movement for Open Data, I strongly agree with his concerns. There are two reasons for my skepticism:

  1. Even with considerable knowledge of statistics, programming and the field in question, analyzing data correctly takes a long time.
  2. Without that knowledge, your results may be not only wrong but harmful.

Let’s start with the second problem. Some days I wonder if any of the people hyperventilating at the prospect of 100 million citizens doing their own analyses have ever heard of a Type I error. Let me burst their bubble …


Let’s take death from car accidents as a variable, since it is pretty easy to tell if a person is dead and it is pretty hard to misdiagnose getting run over by a car. We could hypothesize that people who live on the west side of town are more likely to die in car accidents than people who live on the east side. It would be unlikely that the percentage of people dying in a given year would be EXACTLY the same, with 1.67845% of the people on both sides of town dying; maybe it will be 1.70000% on the east side.

We understand that slight differences are expected, even when there really is no “real” difference in the population but it would be unusual to get a large difference by chance. How unusual? A statistically significant result is one that would occur by chance less than five times out of a hundred.

Of course, if you have very small sample sizes, for example, looking at the people who live on the east and west sides of Zap, North Dakota, you might find substantial differences in percentages, with 25% of the east side of Zap passing away compared to 0% on the west side. (Last I knew, Zap had a population of 8.) That difference, though large in percentages, would not be unusual at all.

EVEN IF everyone knew exactly what they were doing and EVEN IF every analysis was done exactly right… 25,000,000 analyses would STILL give us 1,250,000 statistically significant results JUST BY CHANCE. Somewhere among those results that occurred by chance are the real, honest-to-goodness differences. But how do we know which are real?
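If you doubt that arithmetic, here is a quick simulation sketch (in Ruby, since this blog is on a Ruby kick; the random seed and trial count are mine, and these are pretend analyses, not real data):

```ruby
# Simulate analyses of data where the null hypothesis is TRUE,
# so every "significant" result is a false positive (Type I error)
rng    = Random.new(42)   # seeded so the run is repeatable
trials = 100_000
false_positives = trials.times.count { rng.rand < 0.05 }
puts false_positives      # about 5,000, i.e. about 5% of trials
puts 25_000_000 * 0.05    # the same rate scaled up to 25 million analyses
```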

There was that very, very important qualifier,

“Even IF everyone knew what they were doing”

which as my lovely daughter puts it

“has the same probability as flying monkeys coming out of my butt”.

Many of the people who are asking questions are just throwing up a bar chart … Let’s skip over the innumerable mistakes and over-generalizations that can occur and assume instead that we get thousands of sociologists, demographers, statisticians,  knowledgeable people who majored in various fields or studied them as a hobby and now want to apply their knowledge. Way to go! Wikipedia of data! Woot! Woot!

What would THEY do? Maybe …

  • Look for a pattern of results,  to see if we find that in every city in the country the people on the east side are more likely to have a close encounter of the fatal kind with a Toyota.
  • Or … see if we find the same pattern five years in a row
  • Or …split the data randomly into two datasets and look for the same pattern in each one

I’m not saying it couldn’t happen. I love wikipedia, and appearances to the contrary, I love the idea of open data. Here’s the thing … it will take a lot of time, effort and knowledge. When I download data from the data.gov site it invariably takes many hours of my  time to read the codebooks to understand what each variable represents, read the technical summary on how the data were sampled, stratified and weighted. Then, I run the programs to read in the data, often requiring merging multiple datasets from data.gov or from data.gov and other sites to include additional variables I might want, say economic variables for each census tract merged with test scores.

If I’m lucky, reading what I need to know to understand the data, running the programs and doing some basic analyses will take me a day or two. Often it can take as much as a week. And I have been using SAS (which many of the programs use) and working in applied statistics for over 25 years, I have a Ph.D. and I am almost always looking at data in areas where I’ve done research for a decade or more.

If I had an interest in, say global warming, and wanted to examine climate data, it would take me MUCH longer because there would be so much I would have to learn before I could begin to understand it.

Having said all of that, I think there may be a great possibility for open data, but perhaps not in the crowd-sourcing way. What I would love to see is NOT apps, NOT tweets but actual study. There is a model for this.

For years, the Inter-University Consortium for Political and Social Research (ICPSR) has offered data to anyone at those universities that paid an institutional fee to belong. It was often not used

a) because in many schools people didn’t even know it was available and

b) it required someone with very good skills with SAS, SPSS or Stata (the syntax, not the pointing and clicking kind).

For those who did use it, the data were a gold mine. If you wanted to make the argument that in counties with higher rates of unemployment people were more likely to apply for disability benefits – bingo! You could find right there datasets on disability claims and unemployment.

I used the ICPSR data for many, many examples in courses. If a student was interested in a subject, I would help him or her obtain the data, put it into a dataset that was easily analyzed and provide assistance with the statistics.  Most results were on the lines of showing that children from low-income families were more likely to attend schools that had low API scores. I can think of a couple of students who came up with results I thought were revelations.

Now, data.gov and other open data resources are offering up data similar to ICPSR on a grand scale. What I’d love to see (and I’ve mentioned it before) is a repository for the RESULTS. Just like wikipedia has references and a format, I’d like to see that created for open data, with a link to the data, a link to the program, brief results and a contact person for more information.

I’d like it to be edited, where if someone posts results showing that women are more likely to die in childbirth than men and asserts that is proof that obstetricians are sexist that it be taken down – no, not a competing explanation of the results given, but actually deleted. Yes, care would have to be taken to be sure this didn’t end up like just a very, very large refereed journal (don’t even get me STARTED ranting on that!).

I just went in her room and asked the world’s most spoiled twelve-year-old what she was up to, she said,

“I’m plotting world domination. I’m going to start with the violent overthrow of the U.S. government. Since they’d probably ground the flights if I brought down the government, I’m going to do it after you get back from your trips in March, so you don’t get stuck in an airport, but before April 15 so you don’t have to pay taxes. See how thoughtful I am? Now, can I have an iPhone 4 and go see the Justin Bieber movie with my friends tomorrow?”

(I think she was being sarcastic.) Despite having an actual quote, and the fact that her current events report for social studies was on Mubarak’s resignation, I don’t think wikipedia would allow me to put up an article “Egypt youth protests spark pre-teen uprisings in America”. You NEED editing.

A good start for a wikipedia of data might be to provide encouragement and support for universities and professionals to use open data for their course assignments, dissertations, theses, blogs, conference presentations and then put up the results. Most of the work done by academics, students and other professionals never sees the light of day outside of a one-time presentation to a room of 30 people at a conference or class. Sometimes that’s just as well, but a lot of times what is presented is as good as most articles on wikipedia, and we’ve seen how many people that has helped.

Yes, it will be hard and a lot of work, but I think an open data wikipedia can be created and would be extraordinarily useful if done right. I just wouldn’t leave it up to monkeys.


My favorite comic is the one where Dilbert is pointing to a number like 7,345,897 on a slide and saying that he did not have any real numbers so he just made some up because statistics show that numbers you make up are just as good as real ones. A member of the audience asks how many studies have shown that and he immediately responds, “87.”

The truth is, I don’t remember if the actual numbers were 7,345,897 and 87. See, it works!

In an unsavory combination of pseudo-intellectual bullying and hucksterism, there seem to be a lot of numbers thrown around lately that I am just not buying. Here is one I read in several contexts, including forum discussions on what all small businesses must do, a business coaching site (where I was never quite clear, even after a careful reading, what they actually would coach us to do) and spam emails about the latest product/service I must buy. They all said,

“And if you don’t do this, you will FAIL and be unemployed because there are 1.2 billion Indians and 1.3 billion Chinese who are working harder and have more technological expertise.”

There are some really smart people in India and China, I’m sure, but I was a bit skeptical about whether all 1.2 billion people in India were out for my job. I thought I would check some actual statistics.

Just as I suspected, it turns out that there are old people and children in India! In fact, 30.5% of Indians are under 15, so they are not going to be finished with graduate school until after I retire. Another 5% are over 65, and while some no doubt still work, I’m going to guess half are unable or unwilling. This reduces the figure by a third to a still considerable 800,000,000.

I’m required to be fairly literate for my work and the literacy rate in India is variously reported to be 61 – 65%. This brings down the number of people out for my job to 500,000,000 or so, which is still a lot but also a lot less than 1.2 billion.
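For anyone who wants to check me, here is the back-of-the-envelope arithmetic in Ruby (the percentages are the rough ones from above, so the answers are rough too):

```ruby
# Rough arithmetic for the whittling-down above; all percentages approximate
population  = 1_200_000_000
working_age = population * (1 - 0.305 - 0.025)  # drop under-15s and half the over-65s
literate    = working_age * 0.63                # midpoint of the 61-65% literacy figures
puts working_age.round  # roughly 800 million
puts literate.round     # roughly 500 million
```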

Most of the work I have done over the years has required a Ph.D. or at a minimum a masters and 5-10 years of experience. Let’s just give shouting business coach guy and the spam people the benefit of the doubt and say that at least an adequate job could be done by a really good college graduate.

Wikipedia, usually a more trustworthy source than random people on a forum calling me a fat lazy American, says that the very well-respected Indian Institutes of Technology enroll about 8,000 students annually. That is a very far cry from 1.2 billion, and I’m guessing some of those students want to go to graduate school, teach at universities in India and do other things than take any American’s job.

According to research by Vivek Wadhwa, when we hear that India graduates 600,000 or 1,000,000 or 1.2 billion engineers a year, it just flat is not true. Yes, there are a lot of people who graduate with a diploma that says engineering on it somewhere. However, he says, that is similar to counting as an engineer everyone who has a B.S. in Electrical Engineering from MIT, a two-year associate of science degree, a certificate from DeVry Institute, an Information Systems Management degree from the University of Phoenix or a Social Media certification from the American Institute of Social Media. I believe I just made up that last one, but I wouldn’t be surprised if it turns out to really exist.

Yes, we face international competition more than in the past, both because it is easier to outsource work due to technological advances and because their own educational progress has made some countries more competitive.

However, I think that the assertion that every one of the 1.2 billion men, women, elementary school children, new born babies and grandmas is a direct competitor for a high-level technical career is an estimate that is off by about 400,000%.

Ironically, the last one may turn out to be the closest number to accurate in this whole post.

SAS macros (well, any macros, really – Excel, whatever) can be a great thing. While at first glance they may look a bit hairy, if you have repetitive tasks they make it easier to debug your code. That may sound nuts at first glance. How can adding something with the %global, %do &var and call symput(‘fname’,filename) kind of stuff make your code MORE readable?

It really can, though. For example, with a project I am working on now, I have to run the same 30 lines of code 80,000 times, reading in 80,000 different files. I don’t know which is worse, the thought of having to write over two million lines of code or the thought of debugging it.

Even though I cannot imagine ever doing this without using macros, there are a couple things you need to keep in mind when using macros with the FILENAME statement.

Let’s just pretend that I am doing an analysis of all of the disciplinary records of all of the schools in the Las Tortugas Unified School district over several years. Every time a student whacks another student on the head with a math book and gets sent to the office, there is a report filed. At the end of each day, each school uploads its disciplinary report files to the main server. The first question is, how do I get the names of all of those 80,000 files?  If there was some logical file structure, like school 1 to school 1700, I could do a %DO loop, but schools actually have names (unless you are New York City), so that option is ruled out. Well, I’m sure as hell not typing them in! And, yes, I can think of a whole lot better ways for people to collect data than this also, but my clients aren’t usually interested in hearing how much easier my job would be if they had been collecting and entering their data differently for the past ten years.

Here’s my solution. This is in Windows but you could do the same thing easily enough with Linux.

Go to the C:\ prompt

Type

dir /s > filenames.txt

This will output all of the directories and subdirectories into the file filenames.txt. Here is one of those examples where you can be too smart for your own good. The file will have several lines of information at the top which are not file names but directories. Then, it lists the first directory. Then, every file in that directory. Then, the next directory, and every file in that, and so on.

I briefly considered a solution with FIRSTOBS = and counting the number of lines until the first directory, then doing an INPUT statement with @@ to keep it on the same line and ….

Then I realized that all the lines that had a file were in this format:

01/12/2011 10:00 AM 33789   filename200901_1.txt

That date was the date that the files were received from the client and copied on to my hard drive, so it was the same for every file.  All I needed was the file name. As is common in projects like this, the files were saved each month to a separate subdirectory, with all of the files for a given year in one subdirectory and each month in a subdirectory under that, so the full reference for the file above would be:

c:\annmaria\ltusd\2009\01\filename200901_1.txt

So, I did this:

FILENAME names "C:\annmaria\ltusd\filenames.txt" ;
DATA in.files ;
ATTRIB a LENGTH = $10. filename LENGTH = $40. ;
INFILE names ;
INPUT a $ b $ c $ d $ filename $ ;
yr = SUBSTR(filename,9,4) ;
mnth = SUBSTR(filename,13,2) ;
KEEP filename yr mnth ;
IF a = "01/12/2011" THEN OUTPUT ;
RUN ;
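For what it’s worth, the same slice-and-keep logic is a one-liner in most languages. Here it is in Python, with the slice positions matched to the SUBSTR calls above (remember SAS positions are 1-based, Python’s are 0-based):

```python
def parse_name(fname):
    """Pull year and month out of a name like 'filename200901_1.txt'.
    SUBSTR(filename,9,4) -> fname[8:12]; SUBSTR(filename,13,2) -> fname[12:14]."""
    yr = fname[8:12]
    mnth = fname[12:14]
    return yr, mnth
```

So `parse_name("filename200901_1.txt")` gives back `("2009", "01")` — the same yr and mnth the DATA step keeps.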

You don’t need to do this next part, but just to test a few things before I really get into the macros and run it 80,000 times, I do this:

%let fname = RIOSECO_201101.txt ;
%let yr = 2011 ;
%let mnth = 01 ;
filename test "c:\annmaria\incidents\school\&yr\&mnth\&fname" ;

If you aren’t terribly familiar with macros and you run this, you may think it didn’t work to resolve the macro variables because you see this in your log:

filename test "c:\annmaria\incidents\school\&yr\&mnth\&fname" ;

What went wrong? Nothing went wrong. SAS is merely echoing back the statement you gave it. To see if it worked, you can do something like this:

DATA CHECK ;
INFILE test ;
INPUT v1 $ ;
RUN ;

and you will find that your notes in your SAS log tell you that test is

"c:\annmaria\incidents\school\2011\01\RIOSECO_201101.txt"

Okay, nice, looks like it all works, but I’m not going to change those three %LET statements and re-run the code 80,000 times. I need a macro!
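Incidentally, the macro-variable substitution in that FILENAME statement is nothing more exotic than string formatting. In Python terms it would look something like this (the function name and default root are my own, purely for illustration):

```python
import ntpath  # joins with backslashes, Windows-style, on any platform

def incident_path(yr, mnth, fname, root=r"c:\annmaria\incidents\school"):
    """Build the full file path the way the FILENAME statement does once
    &yr, &mnth and &fname are substituted in."""
    return ntpath.join(root, yr, mnth, fname)
```

Given `("2011", "01", "RIOSECO_201101.txt")` it hands back the same full path the resolved FILENAME statement shows in the log.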

This creates the incidents dataset I am going to build. My log tells me it has zero observations, but I create it here because I am going to add every file, one by one, and if I don’t have a dataset on the first execution, I will get an error message telling me that the dataset in.incidents does not exist. And I will be sad.

data in.incidents ;
infile test length=len  ;
run;

Here is the first macro, to read in each file as a temporary SAS dataset and add it to the permanent incidents dataset.

%macro readem(yr,mnth,fname,num) ;
/* Reads the file for one year, month and incident file name */
filename school "c:\annmaria\incidents\school\&yr\&mnth\&fname" ;
data check&num ;
infile school ;
*** All the statements to read in and process your data go here ;
run ;
**** Now I am going to build one long file from those 80,000 records ;
data in.incidents ;
set in.incidents check&num ;
run ;
proc datasets lib = work kill memtype = data ;
***** Without the PROC DATASETS you end up with 80,000 temporary datasets ;
run ;
quit ;
%mend readem ;
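The shape of readem — parse one file, tack its records onto the running master — is the same in any language. A Python sketch, where the comma-delimited assumption and both function names are hypothetical stand-ins for whatever your real files need:

```python
import csv

def read_incidents(path):
    """Hypothetical reader -- assumes each incident file is comma-delimited
    text. Whatever really reads and processes one file goes here."""
    with open(path, newline="") as f:
        return list(csv.reader(f))

def append_to_master(master, records):
    """The analogue of DATA in.incidents; SET in.incidents check&num; --
    tack this file's records onto the running combined list."""
    master.extend(records)
    return master
```

One nice difference: a Python list is garbage-collected for you, so there is no PROC DATASETS step to keep 80,000 temporary datasets from piling up.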

The macro below calls the macro above and runs it 80,000 times.

%macro readall ;
%global yr mnth fname nmbr ;
%do i = 1 %to 80000 ;
data test5 ;
set in.files ;
if _n_ = &i then do ;
* SYMPUTX (rather than SYMPUT) trims the trailing blanks that would otherwise end up in the path ;
call symputx('fname', filename) ;
call symputx('mnth', mnth) ;
call symputx('yr', yr) ;
call symputx('nmbr', _n_) ;
output ;
* No point reading the rest of the 80,000 records once we have our row ;
stop ;
end ;
run ;
%readem(&yr,&mnth,&fname,&nmbr) ;
%end ;
%mend readall ;
%readall ;

Why the &nmbr? I don’t really need that, do I? I mean, I’m deleting the temporary dataset after each step, so why bother changing the number? I just did it for error checking. If there was a problem with a file, the message would say something like CHECK9876 not found. Since this was data collected over several years, and people (I am sure just to annoy me personally) have a habit of changing how they record their data over time, if I saw that, say, from record 17,000 on there were errors, I could be pretty sure that some field was added that month, and I’d have to go rewrite my program to account for that. Miraculously, there weren’t any errors; everything ran.
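That index-for-error-checking idea translates directly. A Python sketch of the whole loop, where `read_one` stands in for whatever parses a single file (both names are mine, for illustration only):

```python
def read_all(paths, read_one):
    """Loop over every file, keeping a sequence number so that when one file
    blows up you know WHICH one -- the same job &nmbr does in the macro above.
    read_one is whatever function parses a single file."""
    combined, errors = [], []
    for nmbr, path in enumerate(paths, start=1):
        try:
            combined.extend(read_one(path))
        except Exception as err:
            errors.append((nmbr, path, str(err)))
    return combined, errors
```

If the errors list shows failures starting at some sequence number, that is your clue the recording format changed that month.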

So, there you go. I did not need to write 2,400,000 lines of code.

With the time I saved, I did research on Chardonnay. The house rocket scientist is also the resident wine procurer because he disapproves of my method, which consists of buying whatever has the weirdest label. I bought Smoking Loon for a long time because, hey, it has a duck on the label, with a cigar. That is hard to top. However, now in my extra spare time I have discovered there is a label named Barking Lizard. How have I missed this all of my life? I am SO picking up a bottle of this tomorrow! See how useful the time you save can be.

I started to write a blog on this topic, but it was too negative (even for me, and I can be pretty cranky), so I deleted it. Then, on Twitter, Jesse Luna asked me what specifically it was that I had a problem with about small business development centers, and that set me off all over again.

Here is my sanitized version of why programs to help small business suck (except SBIR).

1. The section 8a application is more complicated than running Linux on a supercomputer and doing my taxes combined. After a couple of failed efforts involving two other staff members and our CPA with our previous company, last year I decided to bite the bullet and complete the application, come hell or high water. After the THIRD workshop I attended (remember, I am now the fourth person to have worked on this), I found out we were not eligible. If you were under the misimpression that this is a program for minority businesses, or women-owned businesses, or people who had disadvantages in starting a business, well, you are not quite right. I am sure I would have qualified for this program years ago, but years ago I did not have the time to devote to the application project because I was starting a business. My husband and I started saving for our retirement when we finished graduate school (which was a LONG time ago) and now our savings put us over the net worth limit. I understand that budgets are tight and there are people more needy than us, so I am not objecting, but it would have been very helpful to know much sooner that this was not only a program aimed at diversity, and that it had limits on what your pension and 401k could be, so we didn’t waste so much time. I’m also not objecting because, when I asked for evidence that 8a certification was related to increased revenue, nobody could point me to any data.

2. With only a single exception, for YEARS (I have been in business over 20 years) the only two pieces of advice I ever got from ANY small business program were to get 8a certified (see above) and to write a business plan. I have a business plan, and I don’t believe for a minute that revising it is going to bring me the slightest bit of business. Now that I can tell them we aren’t 8a eligible, I wonder what they will say.

3. Lately there is a lot about helping small businesses get credit. I have a line of credit with my bank. I don’t need a loan.

4. What I do need, what every small business owner I know needs, is WORK. The major help that would assist small business is to reduce the barriers to doing business with the government. The barriers are NOT knowledge of the requirements. The barriers ARE the requirements. Let me give an example: at the federal level, I am registered in CCR, ORCA, grants.gov and eRA Commons. I know I need to do all of that. I am certified as a small business for the state of California, Los Angeles County and the LA Metro Authority. Twice recently, I have been invited to bid on state contracts. One required that we have FIVE CURRENT contracts doing the exact type of survey that was proposed. Not five in three years, or five years (which we had), but five RIGHT NOW. Why? How is this in any way related to our ability to do the work? A second bid required FIVE previous contracts doing the EXACT type of data analysis that was in the contract. Not just, say, Analysis of Variance, or program evaluation, but (I am changing the details here) “program evaluation using Analysis of Variance of data on substance abuse prevention programs for children in foster care”. When contracts are written this specifically, it makes me wonder whether they were written for a specific business, or whether the agency realizes that most small businesses won’t have five simultaneous contracts for the identical type of survey. On top of all of this, when I do get government contracts and subcontracts, I need to fulfill all types of requirements I don’t need for my commercial clients. I need commercial insurance, workers compensation insurance, a written sexual harassment policy and a whole bunch of other things my accountant handles. I just know they cost me money and aggravation. A written sexual harassment policy? I was the world judo champion and our research assistant is the number two ranked amateur woman in the world in mixed martial arts.
You’d have to have a death wish to sexually harass anyone in this office.

The ONLY help that I have EVER gotten from any small business development program was many years ago. A small business incubator in North Dakota, where my previous company was founded, was incredible. They put us in touch with the accountant who my current company uses to this day. My previous company still uses her, too. (Donna Remer – she is a godsend.) Thanks to her, our payroll taxes, corporate taxes, taxes on taxes and whatever else it is get paid on time and we stay out of jail.

The second thing I have gotten, which totally does not suck, is several Small Business Innovation Research (SBIR) awards. This is the best program for small business – ever. First of all, completely unlike the 8a application, if your SBIR proposal is approved, they actually give you money to do work. The first phase is a prototype, and if you do well, you can get a second phase of funding, for a total of about three years of funding. Secondly, in doing the work, we have built capabilities that allow us to do better work in future proposals and for commercial contracts. Third, while the business plan SBDCs are always pushing focuses on the “staff” functions of business – marketing, balance sheets – the SBIR funding is focused on the “line” part of the business, the part that makes us money.

Now I hear that the various small business programs have a new “Women Owned Small Business” program that is supposed to somehow address the fact that far less than 10% of federal business goes to women-owned companies. So, all my problems are solved, right?

How much you want to bet that it translates into not one dollar more business except for those people offering services as consultants to get your business certified?

All I can say is that if I get one bit of paper from one agency asking me to prove that The Julia Group is a woman owned business, I am going to strip naked, sprawl across the fax machine and hit SEND.

(When I started as a statistical consultant in 1985 that might have been considered a bribe, now it’s more likely classed under abuse of a government official. And I am sure it will violate our written sexual harassment policy.)