The first things a statistical consultant needs to know
January 19, 2020
I’ll be speaking about being a statistical consultant at SAS Global Forum in D.C. in March/April. While I will be talking a little bit about factor analysis, repeated measures ANOVA and logistic regression, that comes at the end of my talk. The first things a statistical consultant should know don’t have much to do with statistics.
A consultant has paying clients.
In History of Psychology (it was a required course, don’t judge me) one of my fellow students chose to give her presentation as a one-woman play, with herself as Sigmund Freud. “Dr. Freud” began his meeting with a patient by discussing his fee. In fact, Freud did not accept charity patients. He charged everyone. There’s a winning Trivial Pursuit fact for you.*
Why am I starting with telling you this? Because I have had plenty of graduate students whose goal is “to be a consultant” but they seem to think their biggest problem when they start out is going to be whether they should do propensity score matching using the nearest neighbor or caliper method.
Here are the biggest problems you’ll face:
- Getting your first clients
- Getting paid
- Getting your data into shape
- Communicating results to your clients
Let’s start with getting clients. I can think of four ways to do this: referrals, as part of a consulting company, through your online presence and through an organization. I’ve done three of them. First, and most effective, I think, is through referrals. I got my first two clients when professors who did consulting on the side recommended me. I do this myself. If someone can’t afford my fees or I am just booked at the moment, I will refer potential clients to students, former students or other professionals I know who are getting started as consultants. It’s not competing with my business. I am never going to work for $30 an hour again and if that’s all that’s in your budget, I understand. If all you need is someone to do a bunch of frequency distributions and a chi-square for you, you don’t need me, although I’m happy to do it as a part of a larger contract.
Lesson number one: Don’t be a jerk.
Referrals mean I’m using my own reputation to help you get a job and so I’m going to refer students who are good statisticians and who I think will be respectful and honest with the client. Don’t underestimate the latter half of that statement.

Lesson number two: It helps if you really love data analysis.
I’d be the first to say that I’m a much nicer person now than when I was in graduate school. Yes, it took me a while to learn lesson one, I am embarrassed to say. However, I really did love statistics and if any of my fellow students had trouble, I was the first person they asked and I was really happy to help. When those students later became superintendents of schools or principal investigators of grants, they thought of me and became some of my earliest clients. Some of my professors also became clients, although those were after I’d had several years of experience.
Lesson number three: Don’t think you are smarter than your clients.
A young relative, who has a Ph.D. in math, asked me, “No offense but isn’t what you do relatively easy, like anyone who understood statistics could do it? Why are you so in demand?”
Corollary to this lesson: If you find yourself saying, “No offense” just stop talking right then.
One reason a lot of want-to-be consultants go bankrupt or have to find another line of work is they do think they are smarter than their clients. This manifests itself in a lot of ways so we’ll return to it later, but one way is that they charge much more than the work is worth.
How do you know how much your work is worth?
Lesson number four: Ask yourself, if I had twice as many grants/contracts as I could do and I was paying someone to do this work, what would I be willing to pay?
That’s a good place to start.
I’ve met a lot of people over the years who charged much more than me and bragged to me about it. In the long run, though, I’m sure I made a lot more money. Clients talk. They find out that you are charging them three times as much as their friend down the block is getting charged by their consultant. You may think you’re getting away with it, but you won’t. You may get paid on those first few contracts but you’ll have a very hard time getting work in the future.
Lesson number five: Know multiple languages, multiple packages.
I’ve had discussions with colleagues on whether it is better to be a generalist or a specialist.
I have had a few jobs where they just needed propensity score matching or just a repeated measures ANOVA but those have been the small minority over the past 30 years.
I would argue that even those who consider themselves specialists actually have a wide range of skills. Maybe they are only an expert in SAS but that includes data manipulation, macros, IML and most statistical procedures.
In my case, I would not claim to be the world’s greatest authority on anything but if you need data entry forms created in JavaScript/HTML/CSS, a database back end with PHP and MySQL, your data read into SAS, cleaned and analyzed in a logistic regression, I can do it all from end to end. No, I’m not equally good at all of those. It’s been so long since I used Python that I’d have to look everything up all over again.
I’ve used SPSS, Stata, JMP and Statistica, depending on what the client wanted. I think I might have even had a couple of clients using RapidMiner. For the last few years, though, the only packages I’ve used have been SAS and Excel. Why Excel? Because that’s what the clients were familiar with and wanted to use and it worked for their purposes. (See lesson three.)
I was really surprised to read Bob Muenchen saying SPSS surpassed R and SAS in scholarly publications. Almost no one I know uses SPSS any more, but, of course, my personal acquaintances are hardly a random sample. I suppose it depends on the field you are in.
I have never used R.
Some people think this is a political statement about being a renegade. Others think it’s because I’m too old to learn new things or in subservience to corporate overlords or some other interesting explanation. (The Invisible Developer, who has been reading over my shoulder, says he never got past C, much less D through Q.)
Since I fairly often get asked why not, I will tell you the real reasons, which is a complete digression but this is my blog so there.
- In my spare time that I don’t have, I teach Multivariate Statistics at a university that uses SAS. Since I’m using SAS in my class anyway and need real life data for examples, when a client has licenses for multiple packages and doesn’t care what I use (almost always the case), I use SAS.
- About the time that R was taking off, my company was also taking off in a different direction. The Invisible Developer and I own the majority of 7 Generation Games which is an application of a lot of the research done by The Julia Group. When we started developing math games, we needed to learn Unity, C#, PHP, SQL, JavaScript, HTML/CSS. We also needed to analyze the data to assess test reliability, efficacy, etc. I called the analysis piece and told The Invisible Developer I was interested in all of it so I’d do whatever was left. He was really interested in 3D game programming so he did the Unity/C# part. I did everything else. Then, after a few years, I moved to Chile, where the language I most had to improve was my Spanish.

It worked out for me. We have a dozen games available from 7 Generation Games and now we’re coming out with a new line on decision-making.
I mention all this because I want to emphasize there isn’t a single path to succeeding as a consultant. There isn’t a specific language or package you have to learn. There is one thing you absolutely must have, though, and that’s the next post.
* See Warner, S. L. (1989). Sigmund Freud and Money. Journal of the American Academy of Psychoanalysis, 17(4), 609-622.
From PHPMyAdmin to SAS Studio for lazy people
January 6, 2020
I need to get a data set into SAS for a course I’m teaching in March. Students like real data and some kind folks were willing to allow their de-identified data to be used. Win-win.
How did I get this data? In a SAS data set with a handy code book? Oh, very funny!

I received a login to PHPMyAdmin, where the data, which were definitely not created for my personal convenience, reside.
First, I downloaded the data as CSV for Excel. This gave me a file where everything was like this.
re_apply;”consumer_id”;”email”;”counselor”;”gender”;”date_of_birth”;”age”;”primary_disability”;”secondary_disability”;”education”;”member”;”tribe”;”district”;”job_when_entered”;”if_job_earnings”;”earnings_type_before”;”referral”;”other_refer”;”application_date”;”assessment_date”;”eligibility_date”;”ipe_date”;”notify_rights”;”vocational_goal”;”state_vr”;”status”;”closure_date”;”status_type”;”employment_date”;”type_employ”;”start_job_earnings”;”post_job_earnings”;”earnings_type_after”;”intermediate_goal”;”semesters”;”int_completed”;”intermediate_date”;”last_contact_date”;”comment”
Yes, one long line with everything in quotes and every column separated by a semicolon. These are the column names but all of the data are in this exact same format as well.
Of course, you COULD upload this file and read it into SAS but that would take time and effort.
OR you can download a regular CSV or ODS spreadsheet file (just pick one of the other export options), and then all your data will be in nice columns but you will have no header row. Of course, it’s pretty easy to write an input statement in SAS. You just need to type in a hundred or so variable names and be sure to have the formats correctly specified. Not hard, but it will probably take you more than a minute.

And, as some of the posters noted on that forum, not everyone has access to SQL connect.
Or, you could be a completely lazy person like me and fix it all in about 12 clicks.
Here is how:
1. Download the file as CSV for Excel.
2. Copy and paste the first line, the header, into Word.
3. Replace all the quotes with nothing, using Replace from the EDIT menu.
4. Under the TABLE command select “CONVERT TEXT TO TABLE”. For “Separate text at”, click on Other and put in a semicolon.
5. Now you have all of your column names in nice columns as a table. Copy that.
6. Download the file again, this time as regular CSV (the version with no header row).
7. Open that file, insert a row at the top and paste in the column names you copied in Step 5.
8. Save that file as an Excel file.
9. Upload it into SAS Studio.
10. Under TASKS and UTILITIES, select IMPORT DATA, drag the file you uploaded to the window and click on the little running guy.
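If you would rather not click at all, the same header-pasting trick can be sketched in a few lines of Python. This is only a rough sketch, not what I actually did: the function names are made up for illustration, and it assumes the exported header uses plain straight double quotes and that the regular CSV export is comma-delimited.

```python
import csv
import io


def header_from_excel_export(first_line):
    """Split the semicolon-separated, quoted header line into column names."""
    reader = csv.reader(io.StringIO(first_line), delimiter=";", quotechar='"')
    return [name.strip() for name in next(reader)]


def add_header(header_line, headerless_csv_text):
    """Return comma-delimited CSV text with the column names as the first row."""
    names = header_from_excel_export(header_line)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    # Copy every data row from the headerless export unchanged.
    for row in csv.reader(io.StringIO(headerless_csv_text)):
        writer.writerow(row)
    return out.getvalue()


# Tiny made-up example: three columns, two rows of data.
header = '"consumer_id";"gender";"age"'
data = "1,F,34\n2,M,51\n"
print(add_header(header, data))
```

In practice you would read the header line from the "CSV for Excel" download and the data from the regular CSV download, write the result to a new file, and upload that to SAS Studio as in the last two steps.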

