Lately, there’s been a lot of talk about how to make college students, or younger ones, feel as if they are really getting the same education online as they would in the classroom.

As someone who has taught online since 1997 (yes, you read that right) and has taught the same classes both in the classroom and online, I have a few suggestions.

Online Classes Can be Better than Face to Face

Record Your Lectures

The very first suggestion I have is to record your lectures and make those downloadable. The university where I teach has Blackboard and this is an option. If your school does NOT have that option for whatever web meeting software you have, and you have a Mac, you can make a screen recording with QuickTime and upload it to a YouTube account.

Share Data Libraries

I teach multivariate statistics and we use some methods that require at least a modest sample size. Having students type in hundreds of records is ridiculous. Even better, I can download and clean data from sites like ICPSR or the California Health Interview Survey.

I upload the codebooks to the class website.

I upload these files to a class directory using SAS Studio. I give my students the LIBNAME with read-only access and they have a data set with thousands or tens of thousands of records all set to analyze.

For assignments where data cleaning is part of it, I give them access to the original data.

Yes, you can get SAS for free

Students can get a SAS Studio account for free, run their programs, then download and send me both their log and their output.

Make Cheating Less Tempting

Friends who are new to teaching online say cheating is a real problem. I try to remove the temptation by making it harder. If I give you a dataset with 500 variables and ask you to pick 20, run a factor analysis, write up your results and send me your log and output I can at least see it was run under your account and it’s not going to be the same exact variables as someone else. 

That doesn’t mean a student can’t have paid someone to do it for them or had a relative do it. [I was shocked to read on a forum all the women who said they did their husband’s master’s degree homework “Because the degree will help our household income and he works all day.”]

This type of cheating isn’t something you can prevent in face-to-face classes either unless you have the student write all of their papers in front of you.

One way to make cheating less tempting is to have assignments that students can individualize. A change I made for the fall semester is to give two different data sets for each assignment. One is the Monitoring the Future study with survey data from youth, the other is the California Health Interview Survey. 

I try to update these datasets fairly frequently, so I just replaced the 2009 CHIS with the 2018 data set.

So, if you are interested in social science or health analytics, you can pick whatever interests you. Sometimes the most hard core engineering majors pick the MTF study of youth because they have an adolescent at home and are curious about national norms, how adolescents rate their communication with parents, etc.

Still, I would like a third data set with something more marketing or engineering focused. If anyone has a suggestion, hit me up in the comments please.

Have Online Discussion Boards and Don’t Make Them Stupid

These boards should not be just a waste of time. Again, related to preventing cheating, I often ask questions about their papers, like,

“What variables are you thinking of using for your factor analysis assignment? Do you see any possible problems with those variables?”

The second part of each question is to ask another question for the next student to answer. 

I’m fortunate that I often have students who are in the same cohort, so they know each other and will comment on something related to other students’ work or interests.

Get to Know Your Students

I taught middle school students this summer in a Game Design Course and it was a blast. (We’re doing it again this fall, if you have a middle schooler you’d like to sign up, click here to get info and put GAME DESIGN in the message). 

Whether they are middle schoolers or adults, I ask them to turn on the camera and say hi the first day, just so I know what they look like and what their voice sounds like.

Just like in an in-person class, I start by asking everyone where they are from, making sure I know how to pronounce their names correctly and asking them to tell me one interesting thing. For the middle school students it might be the name of their dog or that they play saxophone. For the adults it might be that they work at CDC or really want to do research on infant mortality in Nigeria, where they grew up.

If you’re not a jerk, online classes can be better for your students

I have heard of instructors who insist that all students keep their cameras on at all times, stay off mute, dress appropriately and have no distracting background. That’s just stupid. My adult students may have small children running around, they may be making dinner. I don’t care. Why should I? If they miss something, they can replay the video later. This is one way online classes are BETTER for adult students.

I asked everyone to turn their cameras on for this picture

For my middle school students, maybe they are embarrassed about their room, their looks. As someone who has taught middle school, I can tell you that there is almost nothing a middle school student can’t be embarrassed about. Maybe they are lounging on their bed while listening to me. So what? This is a way online classes might be better for younger students.

Also, don’t be a jerk about the chat.

I do read all the chat messages that go on while I’m talking. If it is a question to me, I answer it. Some students feel more comfortable typing/texting than talking.

My adult students never veer too far off task. With the middle school ones, I might need to drop into the chat from time to time and say “Enough with the poop emojis”. Usually, though, their classmates do it for me.

Well, I have lots more ideas but it’s Saturday and I have to finish writing my assignments for next month.

If you’ve been wondering why I haven’t been blogging for four months –

Well, there’s a pandemic and demand for educational software has spiked. Our 7 Generation Games company has upped both users and employees 50%, and The Julia Group has more demand for online training, analysis and app development. So, yeah, been busy.

Before the pandemic happened, I was planning on speaking at the SAS Global Forum on things I had learned as a statistical consultant. I wanted to call it “This is a hill I will die on” but one of my students suggested “This is a hill I will not die on” was a better title. However, by the time I had this idea the deadline for changing anything in your paper had already passed so the title was

From Santiago to the Spirit Lake Nation: 30 things I learned in 30 years as a statistical consultant

You can click the link above and read it.

My point is that I am a serious person doing serious things – some of the time and tomorrow I will write about statistics. However … since there is a blogging challenge going on

Today, Eva and I decided to write about quarantine clothes

I am hardly a fashion plate at the best of times. My bio for The Family Textbook, which is hilarifying and which you can purchase for the measly sum of $2.99, mentions my proclivity for collecting weird socks, which is true. It also notes that I have never sent a dick pic. Also true.

Family textbook biographies

The first rule of web meetings is to wear clothes

The Invisible Developer, also the Chief Technology Officer of 7 Generation Games, contrary to popular belief, is very seldom bossed around by me. However, here is where I draw the line. When he proposed that he could be on time for a daily morning meeting – incidentally, at 11 am – if he attended in his bathrobe, I declared the meeting could start late and he would be clothed. We do, after all, have a sexual harassment policy around here and I am pretty sure showing up in video calls in your bathrobe under which you may or may not be wearing underwear violates it.

Rule #1 Does Not Apply if Your Camera is Turned Off

Gonzalo, a senior software developer, almost never appears with the web camera turned on and when he does, he is wearing a mask. He was wearing one before it was cool. No, not like an N95 mask but like an “I’m-a-member-of-the-horde-from-World-of-Warcraft” mask.

If you think I am kidding, check out this video on designing video games which includes Gonzalo and his very cool mask.

When I mentioned the clothing required rule he said,

Wait, what? You can’t attend the meeting in your pajamas?

I told him that rule only applied if your camera was turned on, and then he calmed down considerably.

Rule #2: Only what can be seen on camera matters

Which is why, today, it was perfectly appropriate for me to attend three meetings wearing a plain, long-sleeved blue shirt, a hoodie, long underwear pants and sock monkey slippers.

Rule #3 All quarantine outfits can be improved by well-chosen socks

I have socks with flamingos, sushi, my granddaughter’s face, multi-colored chihuahuas and World War II female welders.

Variety in foot attire is an important part of the optimal quarantine outfit

Rule #4: Some meetings are so stupid, they require special socks

Yes, I have socks that say, “This meeting is bullshit”. I am prepared

I try to avoid useless meetings that should have been an email but sometimes these are unavoidable. In this case, it is extremely important to have the correct socks because you can look down and appear to be studiously considering whatever dumb ass suggestion the other person has just made.

Rule #5 For people who say you need to dress professionally for web meetings, see rule #4

My granddaughter was bored.

She had been home for three weeks, in Minnesota, which meant much of her time was spent indoors because it is cold outside and she lives in a city.

Not the most fun walk – Minnesota city streets in winter

This week was even worse because it was spring break and she said,

Me and my friends used to think that if we had no school and we could just stay home all the time it would be great but really it’s HORRIBLE.

Making it even worse, she and her sister were supposed to be spending spring break in Santa Monica with us, chilling by the beach and meeting up with friends from her old school.

Where my grandchildren were supposed to be

Recently, we’d created a WordPress site for her but it had nothing but the sample pages that came with it. She said she couldn’t think of anything to write. So, I said:

I challenge you to The Blog Hour!

Every day now, at 7 pm Pacific Time, we call each other and start blogging. There are no rules except that we need to start at (about) 7 pm and blog for no more than one hour. At 8 pm, promptly, we both stop.

You are welcome to join us

If you do, send me a link to your blog.

Eva’s first post was on Quarantine Ideas

Mine was Everything is NOT fine

Yesterday, she wrote on Quarantine Food

And I wrote about ideas to De-stress during a Pandemic

Something I have learned about blogging over the years …

There is no difference between the blogs you wrote because you felt inspired and those you wrote because of a challenge to write X number of words/ posts

I’ve been writing this blog for a dozen years, I did a judo blog pretty regularly for over a decade and I write posts on the 7 Generation Games blog, sometimes on life and sometimes on math.

When I look back over the years, I find it impossible to pick out the posts I did because there was some kind of public or personal challenge and those I wrote because I really felt strongly about what I had to say that day.

Eva thinks you can’t hang with us – prove her wrong!

SO … if you are stuck in the house and need a challenge, Eva and I are throwing it down. Join us!

Check out the follow up post on fashion advice from me. Those of you who have met me in person are already rolling your eyes.

Another thing to do if you are bored, download Making Camp Premium or play it on the web

As it probably has for many of you who read this blog, this pandemic has lasted longer for me than for most people. Statistics is my thing. I teach it, I make games about it, I code statistical analyses and I provide statistical consulting.

A few weeks ago, there were 1.9 cases of Coronavirus per million people in the United States. I remember looking at the growth curves in the U.S. and around the world, thinking to myself,

“Oh, no, this is not going to be good.”

We’re now at about 3,000 times the infection rate we had then. It’s no wonder we’re all stressed.

Checking death statistics 10 times a day isn’t good for you

Initially, I checked the Worldometer site several times a day, thinking it could not possibly be as bad as I thought. No one else seemed to be that worried.

When everything started shutting down and more people were seriously concerned, I still spent my first hour every morning browsing the news on the virus. It was all bad and I found it hard to concentrate on work. Little things annoyed me.

I was already staying inside, not seeing my friends and family, working from home. Did my knowing exactly how much the death rate had climbed since yesterday do any good?

No, of course it didn’t. That was a rhetorical question.

What you should do instead

Start the morning with something you want to do.

For some people it might be a jog or a bike ride. Good for you. I did enough training when I was young to last until I’m 200. (I’m serious. Google it.)

Mine may sound really dorky but on my list for a long time has been wanting to get better at WordPress. I write this blog and one on the 7 Generation Games site. I wrote a blog on mostly judo and life for a dozen years, though I rarely update that any more.

I took some courses on lynda.com for a month and then I got busy for 8 months and did nothing. So, now I am back at it.

Every morning, I lie in bed, drink a cup of coffee and watch videos or read a book on WordPress.

Whatever you’ve been WANTING to do, do that thing

Notice I said “wanting”, not “felt you should do”. No one looks forward to the next morning when they are going to clean out the junk drawer in the kitchen or do their taxes.

Three of the things I like most are coffee, sleeping late and programming. So, now, every morning, that is how I start my day.

Even better, my husband usually gets up, grinds the coffee beans and brings me up a cup so I don’t even have to get out of my warm bed.

Tell the people who think you should start your day with the things you have to do that they should go eat a frog

You’re at home. You’re going to be home ALL fucking day! You can start off by playing a video game for an hour.

Get library card

Seriously, libraries are amazing. Before you start whining that the libraries are closed, know this …

Many libraries allow you to apply for a card online during the current pandemic

I have a card for the Los Angeles Public Library, the Santa Monica Public Library and, as a faculty member, I also have access to the National University library.

Through the Los Angeles library, I can download 15 ebooks a month using the Hoopla app. I can also download ebooks owned by the library and read these on a Kindle or iBooks app.

There is an app called Kanopy through which I can get six movies a month free.

I really like documentaries, so here is a place I’ve found a lot of interesting ones.

The Santa Monica Public Library only allows 6 downloads a month with Hoopla, which is why I needed two library cards!

There is just a lot of cool stuff, from apps to learn languages to checking out newspapers. Don’t want to subscribe just to read that one article? Use your library card.

Okay, so there are my two recommendations for today:

  • Start your day with something you want to do.
  • Check out the free books, movies, magazines, newspapers and apps from your public library.

You can also play AzTech: The Story Begins, on your iPhone or iPad while you are waiting to hear my next post recommendation. It’s an interesting idea. You’ll see.

I’ve read a lot of cheery tweets that said something like,

“Buffy, Biff and I are isolated at home with our terrier, Boo. Here’s a picture. Isn’t he cute? We played card games, then I baked this three-course meal I saw on Pinterest. Biff is taking this time to finally become proficient in Mandarin with a course he is taking online.”

Seriously, what is WRONG with you people?

Now, those are the people we all want to slap, but there is another group that is more worrying. If working remotely is your usual mode, you are still drawing a paycheck and no one in your family is seriously ill, you may feel as if you should be going about life as usual.

I was in that group. After all, I have an office in my house where I usually work when I’m not traveling. My husband works upstairs. I’ve taught online for years. So, I’m in the same place, doing the same thing. Other people have real problems. Everything is fine.

Everything is NOT fine

A very sensible tweet I read said something like,

If you haven’t eliminated at least one student assignment, you are doing it wrong. Students are having to do their classes online, have lost jobs, have jobs for which demand has skyrocketed overnight, have children or siblings at home interrupting them, have to share a computer, don’t have Internet access. They can’t go to the beach or the gym to de-stress. Some are home with abusive parents or partners. Expecting the same level of work is clueless.

I thought, “Well, yeah, I am sure that is true for students who are living in poverty, who are in elementary or middle school, but I teach graduate students who are professionals.”

Then … I got the assignments that were due after everything began locking down. Now, I should preface this by saying I have taught the same course for the same university for seven years. Over the past couple of years, the admission requirements for the program have been tightened, so the average student is more prepared.

My highly qualified graduate students made mistakes that I know they would not normally make

How do I know this? Before Coronavirus was an everyday word, their work was as good as or better than that of the average class. As the country began to shut down, they began to make mistakes at a far higher rate than my previous classes did. The mistakes were especially common on problems that required close attention to detail. One example is looking at the data, seeing that every subject number appears more than once, and recognizing that this calls for a repeated measures analysis.
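That duplicated-subject check takes only a few lines. Here is a minimal sketch in Python rather than SAS, just so it runs anywhere. The function name, the `subject_id` field and the data are all made up for illustration; they are not from my course files.

```python
from collections import Counter

def repeated_subjects(records, id_field="subject_id"):
    """Return {subject_id: count} for every subject that appears more than once."""
    counts = Counter(row[id_field] for row in records)
    return {sid: n for sid, n in counts.items() if n > 1}

# Made-up data: each subject was measured at two time points.
records = [
    {"subject_id": 101, "time": 1, "score": 12},
    {"subject_id": 101, "time": 2, "score": 15},
    {"subject_id": 102, "time": 1, "score": 9},
    {"subject_id": 102, "time": 2, "score": 11},
]

dupes = repeated_subjects(records)
if dupes:
    n_subjects = len({row["subject_id"] for row in records})
    print(f"{len(dupes)} of {n_subjects} subjects appear more than once; "
          "the observations are not independent.")
```

If that message prints, an analysis that assumes independent observations is probably the wrong choice, and it is worth asking whether the design is repeated measures.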

I made mistakes that I would normally never make

One thing I am usually scrupulous about is data quality and data integrity. In fact, it was a major part of the paper I was supposed to give at SAS Global Forum – which was cancelled. The whole conference was cancelled, that is, not just my paper. Yet, I uploaded the wrong data set to the course directory, didn’t do any descriptive statistics and barely glanced at the PROC CONTENTS. Of course I know better!

The first step in solving a problem is admitting you have a problem

If you’ve read this blog for a long time you may know that I’m not a particular fan of poetry. However, I do know there was a poem with the title, “No man is an island.” (See, not as completely uncivilized as you thought!)

Even if you are healthy, have a safe place to live and a paycheck, you probably know people who don’t

Even if everyone you know – lucky you – is healthy, wealthy and wise, there is the probability that any one of you can get hit tomorrow. Your dad, grandmother or child can become sick. Someone in your family or a close friend can lose a job.

Your daily routine has been disrupted

You can’t go to the gym, church, the library, the mall. Maybe, like many of my friends, your judo club or church is where you used to spend many hours every week and now you can’t go there. People who were important in your life you can’t see any more. Maybe you can’t see your family and friends because they are at high risk due to health problems and have to self-isolate.

Yes, you aren’t living in a slum with no running water, so maybe you feel as if you should be “just shaking it off” and finding some “quarantine project” like Biff and Buffy.

Let me tell you this, Biff and Buffy are assholes. It’s perfectly normal to be anxious. The DEFINITION of anxiety is

A feeling of worry, nervousness, or unease, typically about an imminent event or something with an uncertain outcome.

Oxford Dictionary

We are definitely living in uncertain times.

So, now that we have admitted that it’s normal to feel anxious, the next post is some tips on what to do about it, without sounding TOO much like Buffy.

Last week, I mentioned that successful consultants have five categories of skills: communication, testing, statistics, programming and generalist.

COMMUNICATION

Communication is the number one most important skill. All five are necessary to some extent, but a terrific communicator with mediocre statistical analysis skills will get more business than a stellar statistician who can’t communicate. Communication is a lot more than explaining results to clients or making small talk at meet ups.

Documentation

Communication includes documentation, both in your code and in internal documents such as codebooks or an internal wiki. It includes letting clients know what you’re going to do, what it’s going to cost, what that cost includes, what your results were and what those results mean. If you’re good at communicating with clients, colleagues and your future self, you’re half-way to success.

An example of the critical nature of communication can be found in the following retraction:

“The identified programming error was in a file used for preparation of the analytic data sets for statistical analysis and occurred while the variable referring to the study “arm” (ie, group) assignment was recoded. The purpose of the recoding was to change the randomization assignment variable format of “1, 2” to a binary format of “0, 1.” However, the assignment was made incorrectly and resulted in a reversed coding of the study groups.”

Aboumatar and Wise (2019, p. 1417)

Because of this incorrect coding, the reported results were the exact opposite of what actually occurred.
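To make that concrete, here is a minimal sketch of how a recode like the one in the retraction could be written down and checked. It is in Python rather than SAS so anyone can run it. Everything in it is hypothetical: the names `ARM_RECODE` and `recode_arm`, the arm labels and the data are my own illustration, not anything from the retracted study.

```python
from collections import Counter

# Documented mapping: original randomization code -> analysis code.
# Assumption for illustration: 1 = control, 2 = intervention.
ARM_RECODE = {1: 0, 2: 1}
ARM_LABELS = {0: "control", 1: "intervention"}

def recode_arm(original_codes):
    """Recode study arm with the documented mapping; fail loudly on unexpected values."""
    recoded = []
    for code in original_codes:
        if code not in ARM_RECODE:
            raise ValueError(f"Unexpected arm code: {code!r}")
        recoded.append(ARM_RECODE[code])
    return recoded

def crosstab(original_codes, recoded_codes):
    """Count each (original, recoded) pair, like a two-way frequency table."""
    return Counter(zip(original_codes, recoded_codes))

original = [1, 2, 2, 1, 2, 1, 1, 2]
recoded = recode_arm(original)
table = crosstab(original, recoded)

# Every original 1 must land on 0 and every original 2 on 1, with no other cells.
assert set(table) == {(1, 0), (2, 1)}
for (old, new), n in sorted(table.items()):
    print(f"original={old} -> recoded={new} ({ARM_LABELS[new]}): n={n}")
```

The point is less the code than the habit: the mapping is written down in one place, and the cross-tabulation of old against new values is checked before any analysis is run.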

Document coding!

Here is an example from a current research project where the CES-D depression scale was used, which requires several items to be reverse-coded before scoring.

In the HTML file where the user enters data that’s written to the database there is this comment:

    <h5 >I felt that I was just as good as other kids.</h5>
    <!-- This is reverse-coded. Don’t you dare change it. -->
    <div class="row mb-3">
    <button id="cesd4_1" data-src="3" class="cesd4 btn btn-light shadow-box col-5 my-3 mx-auto">Not at all</button>

 In the original file to read in the data to SAS, there is a comment:

*** NOTE: CESD IS ALREADY REVERSE-CODED. DOES NOT NEED CODING!;

FILENAME REFFILE2 '/home/directory3/data_analysis_examples/crossroads/cesd.xlsx';

In the internal wiki, there is this note:

Tables in Acme Project Database

CESD – Center for Epidemiologic Studies Depression Scale – NOTE: The data are reverse coded at data entry. There is no need to reverse code these. There are 25 columns in this table; ID, username, session number, questions 1 through 20 of the CESD scale, the CESD total which is the sum of the 20 questions, named item21 for some odd reason, and a time stamp.

Document everything! Document how items are coded and how subscales or totals are computed.
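One way to keep that documentation from drifting away from the analysis is to put the coding decision in a small machine-readable codebook and have the scoring code read it. The sketch below is in Python, not SAS, and it is an illustration only. The field names are invented, and the list of reverse-scored items (4, 8, 12 and 16) is an assumption based on the standard 20-item CES-D, so check it against the version of the scale you actually use.

```python
# Hypothetical codebook entry; the field names are made up for illustration.
CESD_CODEBOOK = {
    "scale": "CES-D",
    "n_items": 20,
    "response_range": (0, 3),
    # Assumption: items 4, 8, 12 and 16 are the positively worded items,
    # as in the standard 20-item CES-D. Verify against your own instrument.
    "reverse_items": (4, 8, 12, 16),
    # In the project described here, reverse coding happens at data entry,
    # so the analysis code must not do it a second time.
    "reverse_coded_at_entry": True,
}

def score_cesd(responses, codebook=CESD_CODEBOOK):
    """Return the CES-D total for one respondent.

    `responses` maps item number (1..20) to the stored value. Items are
    reverse-coded here only if the codebook says it was not done at entry.
    """
    low, high = codebook["response_range"]
    if sorted(responses) != list(range(1, codebook["n_items"] + 1)):
        raise ValueError("Expected exactly one response per item")
    total = 0
    for item, value in responses.items():
        if not low <= value <= high:
            raise ValueError(f"Item {item} out of range: {value}")
        if item in codebook["reverse_items"] and not codebook["reverse_coded_at_entry"]:
            value = low + high - value
        total += value
    return total

# One made-up respondent whose stored value is 1 on every item.
stored = {item: 1 for item in range(1, 21)}
print(score_cesd(stored))
```

With a flag like `reverse_coded_at_entry`, the question "do I need to reverse-code these or was it already done?" has exactly one answer, and it lives next to the code that depends on it.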

This may seem like overkill, but how many retractions could be prevented by this level of documentation? If you are a consultant, it’s probable that at some point someone else will be looking at these data, or that you may be called back a year later to do a longitudinal analysis. Your colleagues and future you will thank you. A year or two from now, I don’t want to be looking at this data set and wondering if I need to reverse-code those items or if it was already done. I want to KNOW!

I deeply suspect that there are more erroneous results published due to incorrect coding of data than to incorrect analyses. After all, the peer reviewers, editors and readers see how you analyzed your data. No one sees how you coded it but you and, possibly, the person who has your position after you.

A few weeks ago, I ended my post with “there is one thing a statistical consultant absolutely must have” and promised to say what that is in the next post. Maria and I had just picked up our rental car at the Detroit airport when she turned to me and asked:

So, what is the one thing a statistical consultant has to have?

I told her,

“I have absolutely no idea what I was thinking last month!”

In my defense, I have been in five states and 22 cities in the past 21 days. Maria says it is only 16 because I was in Minneapolis, Fargo and Denver twice each. She also says I can’t count Denver, Chicago or San Francisco since I only changed planes there. Poo!

In Long Beach thinking about statistical consultants

Now that I am back in Los Angeles and my brain has unfrozen I think there are actually five things you must have but one of these is the most important. In my not at all humble opinion, though, you need ALL FIVE.

Actually, the five skills a statistical consultant must have

Drum roll, please
  1. COMMUNICATION – This is the number one most important skill. If you don’t have the rest, you’ll still suck and be unemployed, but a terrific communicator with mediocre statistical analysis skills will get more business than a stellar statistician who can’t communicate. I don’t just mean shaking hands and small talk at conferences, either. Communication includes documentation, both in your code and in codebooks, an internal wiki, etc. It includes letting clients know what you’re going to do, what it’s going to cost, what that cost includes, what your results were and what those results mean. If you’re good at communicating with clients, colleagues and your future self, you’re half-way to success.
  2. TESTING – I’ve ranted on this blog a lot about testing because it is one of the areas where people often seem to fall short early in their careers. I got a lot of hate for this post when I said I don’t hire self-taught developers because there are things they don’t teach themselves adequately, like testing.
  3. Statistics – Well, duh. Props to the person in the Chronicle of Higher Education forum whose signature read, “Being able to find SPSS in the start menu does not qualify you to run a multinomial logistic regression.” Your clients may not know what power, quasi-complete separation or multicollinearity mean in interpreting an analysis. They do trust that you understand whatever is necessary to be understood for the work. Don’t let them down.
  4. Programming – When I was a graduate student, Very Important Professors had lowly peon graduate students and programmers to write their code for them. All of those people had started their careers using punched cards (honest, it was that long ago). All of the statistical consultants I know do programming, or can code their own analyses if necessary. Even if you aren’t doing it all yourself – I’m certainly not these days – you need to know enough to review the code your minions wrote or help said minions when they get stuck. Sometimes, it’s just quicker to do it yourself than to explain it to someone else, especially if you need to fix a bug in code that a client is waiting on.
  5. Be a generalist – I’ll have more to say about this in future posts. In brief, even the consultants I know who are well-known specialists in one language know and use others. If you think your career is going to be you sitting on a mountain or in a penthouse office, pontificating to others about sums of squares, the computation of Wilks’ lambda or options for PROC GLMSELECT, you are going to be sadly disappointed. On the other hand, if you do know of a job like that, I would consider taking it for a sufficiently large quantity of money.

I’ll be speaking about being a statistical consultant at SAS Global Forum in D.C. in March/April. While I will be talking a little bit about factor analysis, repeated measures ANOVA and logistic regression, that comes at the end of my talk. The first things a statistical consultant should know don’t have much to do with statistics.

A consultant has paying clients.

In History of Psychology (it was a required course, don’t judge me) one of my fellow students chose to give her presentation as a one-woman play, with herself as Sigmund Freud. “Dr. Freud” began his meeting with a patient discussing his fee. In fact, Freud did not accept charity patients. He charged everyone. There’s a winning trivial pursuit fact for you.*

Why am I starting with telling you this? Because I have had plenty of graduate students whose goal is “to be a consultant” but they seem to think their biggest problem when they start out is going to be whether they should do propensity score matching using the nearest neighbor or caliper method.

Here are the biggest problems you’ll face:

  • Getting your first clients
  • Getting paid
  • Getting your data into shape
  • Communicating results to your clients.

Let’s start with getting clients. I can think of four ways to do this: referrals, as part of a consulting company, through your online presence and through an organization. I’ve done three of them. First, and most effective, I think, is through referrals. I got my first two clients when professors who did consulting on the side recommended me. I do this myself. If someone can’t afford my fees or I am just booked at the moment, I will refer potential clients to students, former students or other professionals I know who are getting started as consultants. It’s not competing with my business. I am never going to work for $30 an hour again and if that’s all that’s in your budget, I understand. If all you need is someone to do a bunch of frequency distributions and a chi-square for you, you don’t need me, although I’m happy to do it as a part of a larger contract.

Lesson number one: Don’t be a jerk.

Referrals mean I’m using my own reputation to help you get a job and so I’m going to refer students who are good statisticians and who I think will be respectful and honest with the client. Don’t underestimate the latter half of that statement.

Lesson number two: It helps if you really love data analysis.

I’d be the first to say that I’m a much nicer person now than when I was in graduate school. Yes, it took me a while to learn lesson one, I am embarrassed to say. However, I really did love statistics and if any of my fellow students had trouble, I was the first person they asked and I was really happy to help. When those students later became superintendents of schools or principal investigators of grants, they thought of me and became some of my earliest clients. Some of my professors also became clients, although those were after I’d had several years of experience.

Lesson number three: Don’t think you are smarter than your clients.

A young relative, who has a Ph.D. in math, asked me, “No offense but isn’t what you do relatively easy, like anyone who understood statistics could do it? Why are you so in demand?”
Corollary to this lesson: If you find yourself saying, “No offense” just stop talking right then.

One reason a lot of want-to-be consultants go bankrupt or have to find another line of work is they do think they are smarter than their clients. This manifests itself in a lot of ways so we’ll return to it later, but one way is that they charge much more than the work is worth.

How do you know how much your work is worth?

Lesson number four: Ask yourself, if I had twice as many grants/contracts as I could do and I was paying someone to do this work, what would I be willing to pay?

That’s a good place to start.

I’ve met a lot of people over the years who charged much more than me and bragged to me about it. In the long run, though, I’m sure I made a lot more money. Clients talk. They find out that you are charging them three times as much as their friend down the block is getting charged by their consultant. You may think you’re getting away with it, but you won’t. You may get paid on those first few contracts but you’ll have a very hard time getting work in the future.

Lesson number five: Know multiple languages, multiple packages

I’ve had discussions with colleagues on whether it is better to be a generalist or a specialist.

I have had a few jobs where they just needed propensity score matching or just a repeated measures ANOVA but those have been the small minority over the past 30 years.

I would argue that even those who consider themselves specialists actually have a wide range of skills. Maybe they are only an expert in SAS but that includes data manipulation, macros, IML and most statistical procedures.

In my case, I would not claim to be the world’s greatest authority on anything but if you need data entry forms created in JavaScript/HTML/CSS, a database back end with PHP and MySQL, your data read into SAS, cleaned and analyzed in a logistic regression, I can do it all from end to end. No, I’m not equally good at all of those. It’s been so long since I used Python that I’d have to look everything up all over again.

I’ve used SPSS, Stata, JMP and Statistica, depending on what the client wanted. I think I might have even had a couple of clients using RapidMiner. For the last few years, though, the only packages I’ve used have been SAS and Excel. Why Excel? Because that’s what the clients were familiar with and wanted to use and it worked for their purposes. (See lesson three.)

I was really surprised to read Bob Muenchen saying SPSS surpassed R and SAS in scholarly publications. Almost no one I know uses SPSS any more, but, of course, my personal acquaintances are hardly a random sample. I suppose it depends on the field you are in.

I have never used R.

Some people think this is a political statement about being a renegade. Others think it’s because I’m too old to learn new things or in subservience to corporate overlords or some other interesting explanation. (The Invisible Developer, who has been reading over my shoulder, says he never got past C, much less D through Q.)

Since I fairly often get asked why not, I will tell you the real reasons, which is a complete digression but this is my blog so there.

  1. In my spare time that I don’t have, I teach Multivariate Statistics at a university that uses SAS. Since I’m using SAS in my class anyway and need real life data for examples, when a client has licenses for multiple packages and doesn’t care what I use (almost always the case), I use SAS.
  2. About the time that R was taking off, my company was also taking off in a different direction. The Invisible Developer and I own the majority of 7 Generation Games which is an application of a lot of the research done by The Julia Group. When we started developing math games, we needed to learn Unity, C#, PHP, SQL, JavaScript, HTML/CSS. We also needed to analyze the data to assess test reliability, efficacy, etc. I called the analysis piece and told The Invisible Developer I was interested in all of it so I’d do whatever was left. He was really interested in 3D game programming so he did the Unity/C# part. I did everything else. Then, after a few years, I moved to Chile, where the language I most had to improve was my Spanish.
Games in Spanish, English and Lakota

It worked out for me. We have a dozen games available from 7 Generation Games and now we’re coming out with a new line on decision-making.

I mention all this because I want to emphasize there isn’t a single path to succeeding as a consultant. There isn’t a specific language or package you have to learn. There is one thing you absolutely must have, though, and that’s the next post.

* (See Warner, S. L. Sigmund Freud and Money. (1989) Journal of the American Academy of Psychoanalysis. Winter;17(4):609-22)

I need to get a data set into SAS for a course I’m teaching in March. Students like real data and some kind folks were willing to allow their de-identified data to be used. Win-win.

How did I get this data? In a SAS data set with a handy code book? Oh, very funny!

A bunch of people laughing
This blog will pause while we all laugh at your naivete

I received a login to phpMyAdmin, where the data, which were definitely not created for my personal convenience, reside.

First, I downloaded the data as CSV for Excel. This gave me a file where everything looked like this:

re_apply;"consumer_id";"email";"counselor";"gender";"date_of_birth";"age";"primary_disability";"secondary_disability";"education";"member";"tribe";"district";"job_when_entered";"if_job_earnings";"earnings_type_before";"referral";"other_refer";"application_date";"assessment_date";"eligibility_date";"ipe_date";"notify_rights";"vocational_goal";"state_vr";"status";"closure_date";"status_type";"employment_date";"type_employ";"start_job_earnings";"post_job_earnings";"earnings_type_after";"intermediate_goal";"semesters";"int_completed";"intermediate_date";"last_contact_date";"comment"

Yes, one long line with everything in quotes and every column separated by a semicolon. These are the column names but all of the data are in this exact same format as well.

Of course, you COULD upload this file and read it into SAS but that would take time and effort.

OR you can download a regular CSV or ODS spreadsheet file (just pick one of the other export options). Then all your data will be in nice columns, but you will have no header row. Of course, it’s pretty easy to write an INPUT statement in SAS. You just need to type in a hundred or so variable names and be sure to have the formats correctly specified. Not hard, but it will probably take you more than a minute.

You COULD theoretically download the data as an SQL file and use SQL connect according to some smart and less lazy people on SAS Communities.

And, as some of the posters noted on that forum, not everyone has access to SQL connect.

Or, you could be a completely lazy person like me and fix it all in about 12 clicks.

Here is how:

  1. Download the file as CSV for Excel
  2. Copy and paste the first line, the header, into Word
  3. Replace all the quotes with nothing, using Replace from the EDIT menu
  4. Under the TABLE menu select “CONVERT TEXT TO TABLE”. For “Separate text at”, click on Other and put in a semicolon.
  5. Now you have all of your column names in nice columns as a table. Copy that.
  6. Download the file again as CSV
  7. Insert a row at the top and paste the column names you copied in Step 5
  8. Save that file as an Excel file
  9. Upload it into SAS Studio
  10. Under TASKS and UTILITIES, select IMPORT DATA, drag the file you uploaded to the window and click on the little running guy.
Converting from text to table
Click 8 or So
Tasks and utilities
All that’s left is to drag the file
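
If you would rather script it than click, here is a minimal sketch of the same idea in Python. It is not what I did. It assumes the "CSV for Excel" export uses ordinary straight double quotes and semicolons, and the function name and sample rows are made up for illustration.

```python
import csv
import io


def semicolon_export_to_csv(src, dst):
    """Copy a semicolon-delimited, double-quoted export to a plain CSV.

    src and dst are open text file objects. The first row (the header)
    is carried over as-is, so the column names land on row 1.
    Returns the list of column names.
    """
    reader = csv.reader(src, delimiter=";", quotechar='"')
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(header)
    for row in reader:
        writer.writerow(row)
    return header


# Tiny made-up sample in the same shape as the export (not the real data).
sample = io.StringIO(
    '"consumer_id";"gender";"age";"comment"\n'
    '"1";"F";"34";"called; left message"\n'
)
converted = io.StringIO()
columns = semicolon_export_to_csv(sample, converted)
print(columns)
print(converted.getvalue())
```

With real files you would open the export and the output with `open(..., newline="")` and pass those in instead of the `StringIO` objects, then upload the result to SAS Studio and import it as in step 10.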

Anyone who uses SAS (or doesn’t) probably has their own reasons. I have a few but a major one is the ease of importing just about any type of data.

Mo’ clients, Mo’ problems

There are multiple types of consultants. I’m the type who is, literally, all over the map. I’ve been in five countries this year and I think 11 states plus the District of Columbia, but I might have left off a couple. I said 9 in a post on a different blog where I occasionally write about my life and judo, but then I remembered I’d been in Texas for SAS Global Forum where I gave a talk on biostatistics and also in New Mexico speaking on transition from school to work for tribal youth with disabilities.

What that means is that I work with a wide range of organizations and their data is not all in the same format.

If you work with a wide range of clients, ease of data import matters

If you’re a consultant who works consistently with one client, data formats may not be your biggest issue. You probably wrote a program to read in that data, no matter what messy format it was in and you’re good to go. In my case, though, every dataset, every project is different.

All the data, all the time

In the previous post, I mentioned reading in the IPEDS data, which is a relatively small public data set (around 7,000 x 60). Fantastically, that came with a SAS program so all I needed to do was upload the raw data file and change the INFILE statement.

PROC IMPORT does not a consultant make

Maybe when you were a student you imported your data sets with a PROC IMPORT step. This isn’t terrible. You should use this procedure when you can. However, you’re going to need to go several steps further.

Even worse, if you’ve been getting your data by simply using the LIBNAME statement your professor provided you or doing some pointy-clicky thing with SAS Studio or Enterprise Guide (or SPSS), you have a lot to learn.

Every year, I have graduate students who tell me they are going to become consultants. More often than not, I shake my head and think,

“You have no idea what you are getting into.”

– Me

If you are going to be working as a statistical consultant for a variety of clients, you are going to spend far more of your time in the DATA step than in PROC LOGISTIC or PROC GLIMMIX.

It’s not just a matter of data formatting or missing data, but of creating the data you need that isn’t there. What do I mean by that? Ha ha, that is a future blog post that I may write next time I’m on a plane somewhere and have a spare moment. Probably tomorrow.
