Lately, there’s been a lot of talk about how to make college students, and younger ones, feel they are getting the same education online as they would in the classroom.

As someone who has taught online since 1997 (yes, you read that right) and has taught the same classes both in the classroom and online, I have a few suggestions.

Online Classes Can be Better than Face to Face

Record Your Lectures

The very first suggestion I have is to record your lectures and make them downloadable. The university where I teach has Blackboard, and this is an option. If your school does NOT have that option in whatever web meeting software you use, and you have a Mac, you can make a screen recording with QuickTime and upload it to a YouTube account.

Share Data Libraries

I teach multivariate statistics and we use some methods that require at least a modest sample size. Having students type in hundreds of records is ridiculous. Instead, I download and clean data from sites like ICPSR or the California Health Interview Survey.

I upload the codebooks to the class website.

I upload these files to a class directory using SAS Studio. I give my students the LIBNAME with read-only access and they have a data set with thousands or tens of thousands of records all set to analyze.

For assignments where data cleaning is part of it, I give them access to the original data.

Yes, you can get SAS for free

Students can get a SAS Studio account for free, run their programs, then download and send me both their log and their output.

Make Cheating Less Tempting

Friends who are new to teaching online say cheating is a real problem. I try to remove the temptation by making it harder. If I give you a dataset with 500 variables and ask you to pick 20, run a factor analysis, write up your results and send me your log and output, I can at least see it was run under your account, and it’s not going to be the exact same variables as someone else’s.

That doesn’t mean a student can’t have paid someone to do it for them or had a relative do it. [I was shocked to read on a forum all the women who said they did their husbands’ master’s degree homework “Because the degree will help our household income and he works all day.”]

This type of cheating isn’t something you can prevent in face-to-face classes either unless you have the student write all of their papers in front of you.

One way to make cheating less tempting is to have assignments that students can individualize. A change I made for the fall semester is to give two different data sets for each assignment. One is the Monitoring the Future study with survey data from youth, the other is the California Health Interview Survey. 

I try to update these datasets fairly frequently, so I just replaced the 2009 CHIS with the 2018 data set.

So, if you are interested in social science or health analytics, you can pick whatever interests you. Sometimes the most hard core engineering majors pick the MTF study of youth because they have an adolescent at home and are curious about national norms, how adolescents rate their communication with parents, etc.

Still, I would like a third data set with something more marketing or engineering focused. If anyone has a suggestion, hit me up in the comments please.

Have Online Discussion Boards and Don’t Make Them Stupid

These boards should not be just a waste of time. Again, related to preventing cheating, I often ask questions about their papers, like,

“What variables are you thinking of using for your factor analysis assignment? Do you see any possible problems with those variables?”

The second part of each question is to ask another question for the next student to answer. 

I’m fortunate that I often have students who are in the same cohort, so they know each other and will comment on something related to another student’s work or interests.

Get to Know Your Students

I taught middle school students this summer in a Game Design Course and it was a blast. (We’re doing it again this fall, if you have a middle schooler you’d like to sign up, click here to get info and put GAME DESIGN in the message). 

Whether they are middle schoolers or adults, I ask students to turn on the camera and say hi the first day, just so I know what they look and sound like.

Just like in an in-person class, I start by asking everyone where they are from, making sure I know how to pronounce their names correctly and asking them to tell me one interesting thing. For the middle school students it might be the name of their dog or that they play saxophone. For the adults it might be that they work at CDC or really want to do research on infant mortality in Nigeria, where they grew up.

If you’re not a jerk, online classes can be better for your students

I have heard of instructors who insist that all students keep their cameras on at all times, stay off mute, be dressed appropriately and have no distracting background. That’s just stupid. My adult students may have small children running around, or they may be making dinner. I don’t care. Why should I? If they miss something, they can replay the video later. This is one way online classes are BETTER for adult students.

I asked everyone to turn their cameras on for this picture

For my middle school students, maybe they are embarrassed about their room, their looks. As someone who has taught middle school, I can tell you that there is almost nothing a middle school student can’t be embarrassed about. Maybe they are lounging on their bed while listening to me. So what? This is a way online classes might be better for younger students.

Also, don’t be a jerk about the chat.

I do read all the chat messages that go on while I’m talking. If it is a question to me, I answer it. Some students feel more comfortable typing/ texting than talking.

My adult students never veer too far off task. With the middle school ones, I might need to drop into the chat from time to time and say “Enough with the poop emojis”. Usually, though, their classmates do it for me.

Well, I have lots more ideas but it’s Saturday and I have to finish writing my assignments for next month.

If you’ve been wondering why I haven’t been blogging for four months –

Well, there’s a pandemic and demand for educational software has spiked. Our 7 Generation Games company has upped both users and employees 50%, and The Julia Group has more demand for online training, analysis and app development. So, yeah, been busy.

Before the pandemic happened, I was planning on speaking at the SAS Global Forum on things I had learned as a statistical consultant. I wanted to call it “This is a hill I will die on” but one of my students suggested “This is a hill I will not die on” was a better title. However, by the time I had this idea the deadline for changing anything in your paper had already passed so the title was

From Santiago to the Spirit Lake Nation: 30 things I learned in 30 years as a statistical consultant

You can click the link above and read it.

My point is that I am a serious person doing serious things – some of the time – and tomorrow I will write about statistics. However … since there is a blogging challenge going on

Today, Eva and I decided to write about quarantine clothes

I am hardly a fashion plate at the best of times. My bio for The Family Textbook, which is hilarifying and which you can purchase for the measly sum of $2.99, mentions my proclivity for collecting weird socks, which is true. It also notes that I have never sent a dick pic. Also true.

Family textbook biographies

The first rule of web meetings is to wear clothes

The Invisible Developer, also the Chief Technology Officer of 7 Generation Games, contrary to popular belief, is very seldom bossed around by me. However, here is where I draw the line. When he proposed that he could be on time for a daily morning meeting – incidentally, at 11 am – if he attended in his bathrobe, I declared the meeting could start late and he would be clothed. We do, after all, have a sexual harassment policy around here and I am pretty sure showing up in video calls in your bathrobe under which you may or may not be wearing underwear violates it.

Rule #1 Does Not Apply if Your Camera is Turned Off

Gonzalo, a senior software developer, almost never appears with the web camera turned on and when he does, he was wearing a mask before it was cool. No, not like an N95 mask but like a “I’m-a-member-of-the-horde-from-World-of-Warcraft” mask.

If you think I am kidding, check out this video on designing video games which includes Gonzalo and his very cool mask.

When I mentioned the clothing required rule he said,

Wait, what? You can’t attend the meeting in your pajamas?

I told him that rule only applied if your camera was turned on, and then he calmed down considerably.

Rule #2: Only what can be seen on camera matters

Which is why, today, it was perfectly appropriate for me to attend three meetings wearing a plain, long-sleeved blue shirt, a hoodie, long underwear pants and sock monkey slippers.

Rule #3 All quarantine outfits can be improved by well-chosen socks

I have socks with flamingos, sushi, my granddaughter’s face, multi-colored chihuahuas and World War II female welders.

Variety in foot attire is an important part of the optimal quarantine outfit

Rule #4: Some meetings are so stupid, they require special socks

Yes, I have socks that say, “This meeting is bullshit”. I am prepared

I try to avoid useless meetings that should have been an email but sometimes these are unavoidable. In this case, it is extremely important to have the correct socks because you can look down and appear to be studiously considering whatever dumb ass suggestion the other person has just made.

Rule #5 For people who say you need to dress professionally for web meetings, see rule #4

My granddaughter was bored.

She had been home for three weeks, in Minnesota, which meant much of her time was spent indoors because it is cold outside and she lives in a city.

Not the most fun walk – Minnesota city streets in winter

This week was even worse because it was spring break and she said,

Me and my friends used to think that if we had no school and we could just stay home all the time it would be great but really it’s HORRIBLE.

Making it even worse, she and her sister were supposed to be spending spring break in Santa Monica with us, chilling by the beach and meeting up with friends from her old school.

Where my grandchildren were supposed to be

Recently, we’d created a WordPress site for her but it had nothing but the sample pages that came with it. She said she couldn’t think of anything to write. So, I said:

I challenge you to The Blog Hour!

Every day now, at 7 pm Pacific Time, we call each other and start blogging. There are no rules except that we need to start at (about) 7 pm and blog for no more than one hour. At 8 pm, promptly, we both stop.

You are welcome to join us

If you do, send me a link to your blog.

Eva’s first post was on Quarantine Ideas

Mine was Everything is NOT fine

Yesterday, she wrote on Quarantine Food

And I wrote about ideas to De-stress during a Pandemic

Something I have learned about blogging over the years …

There is no difference between the blogs you wrote because you felt inspired and those you wrote because of a challenge to write X number of words/ posts

I’ve been writing this blog for a dozen years. I did a judo blog pretty regularly for over a decade, and I write posts on the 7 Generation Games blog, sometimes on life and sometimes on math.

When I look back over the years, I find it impossible to tell the posts I did because there was some kind of public or personal challenge from those I wrote because I really felt strongly about what I had to say that day.

Eva thinks you can’t hang with us – prove her wrong!

SO … if you are stuck in the house and need a challenge, Eva and I are throwing it down. Join us!

Check out the follow-up post on fashion advice from me. Those of you who have met me in person are already rolling your eyes.

Two Ojibwe girls in the woods 500 years ago
Another thing to do if you are bored, download Making Camp Premium or play it on the web

Probably like many of you who read this blog, I feel as if this pandemic has lasted longer for me than for most people. Statistics is my thing. I teach it, I make games about it, I code statistical analyses and I provide statistical consulting.

A few weeks ago, there were 1.9 cases of Coronavirus per million people in the United States. I remember looking at the growth curves in the U.S. and around the world, thinking to myself,

“Oh, no, this is not going to be good.”

We’re now at about 3,000 times the rate of infection we had then. It’s no wonder we’re all stressed.

Checking death statistics 10 times a day isn’t good for you

Initially, I checked the Worldometer site several times a day, thinking it could not possibly be as bad as I thought. No one else seemed to be that worried.

When everything started shutting down and more people were seriously concerned, I still spent my first hour every morning browsing the news on the virus. It was all bad and I found it hard to concentrate on work. Little things annoyed me.

I was already staying inside, not seeing my friends and family, working from home. Did my knowing exactly how much the death rate had climbed since yesterday do any good?

No, of course it didn’t. That was a rhetorical question.

What you should do instead

Start the morning with something you want to do.

For some people it might be a jog or a bike ride. Good for you. I did enough training when I was young to last until I’m 200. (I’m serious. Google it.)

Mine may sound really dorky, but getting better at WordPress has been on my list for a long time. I write this blog and one on the 7 Generation Games site. I wrote a blog on mostly judo and life for a dozen years, though I rarely update that any more.

I took some courses on lynda.com for a month and then I got busy for 8 months and did nothing. So, now I am back at it.

Coffee

Every morning, I lie in bed, drink a cup of coffee and watch videos or read a book on WordPress.

Whatever you’ve been WANTING to do, do that thing

Notice I said “wanting”, not “felt you should do”. No one looks forward to the next morning when they are going to clean out the junk drawer in the kitchen or do their taxes.

Three of the things I like most are coffee, sleeping late and programming. So, now, every morning, that is how I start my day.

Even better, my husband usually gets up, grinds the coffee beans and brings me up a cup so I don’t even have to get out of my warm bed.

Tell the people who think you should start your day with the things you have to do that they should go eat a frog

You’re at home. You’re going to be home ALL fucking day! You can start off by playing a video game for an hour.

Get a library card

Seriously, libraries are amazing. Before you start whining that the libraries are closed, know this …

Many libraries allow you to apply for a card online during the current pandemic

I have a card for the Los Angeles Public Library, the Santa Monica Public Library and, as a faculty member, I also have access to the National University library.

Through the Los Angeles library, I can download 15 ebooks a month using the Hoopla app. I can also download ebooks owned by the library and read these on a Kindle or iBooks app.

There is an app called Kanopy through which I can get six movies a month free.

I really like documentaries, so here is a place I’ve found a lot of interesting ones.

The Santa Monica Public Library only allows 6 downloads a month with Hoopla, which is why I needed two library cards!

There is just a lot of cool stuff, from apps to learn languages to checking out newspapers. Don’t want to subscribe just to read that one article? Use your library card.

Okay, so there are my two recommendations for today:

  • Start your day with something you want to do.
  • Check out the free books, movies, magazines, newspapers and apps from your public library.

You can also play AzTech: The Story Begins, on your iPhone or iPad while you are waiting to hear my next post recommendation. It’s an interesting idea. You’ll see.

I’ve read a lot of cheery tweets that said something like,

“Buffy, Biff and I are isolated at home with our terrier, Boo. Here’s a picture. Isn’t he cute? We played card games, then I baked this three-course meal I saw on Pinterest. Biff is taking this time to finally become proficient in Mandarin with a course he is taking online.”

Seriously, what is WRONG with you people?

Now, those are the people we all want to slap, but there is another group that is more worrying. If working remotely is your usual mode, you are still drawing a paycheck and no one in your family is seriously ill, you may feel as if you should be going about life as usual.

I was in that group. After all, I have an office in my house where I usually work when I’m not traveling. My husband works upstairs. I’ve taught online for years. So, I’m in the same place, doing the same thing. Other people have real problems. Everything is fine.

Everything is NOT fine

A very sensible tweet I read said something like,

If you haven’t eliminated at least one student assignment, you are doing it wrong. Students are having to do their classes online, have lost jobs, have jobs for which demand has skyrocketed overnight, have children or siblings at home interrupting them, have to share a computer, don’t have Internet access. They can’t go to the beach or the gym to de-stress. Some are home with abusive parents or partners. Expecting the same level of work is clueless.

I thought, “Well, yeah, I am sure that is true for students who are living in poverty, who are in elementary or middle school, but I teach graduate students who are professionals.”

Then … I got the assignments that were due after everything began locking down. Now, I should preface this by saying I have taught the same course for the same university for seven years. Over the past couple of years, the admission requirements for the program have been tightened, so the average student is more prepared.

My highly qualified graduate students made mistakes that I know they would not normally make

How do I know this? Before Coronavirus was an everyday word, their work was as good as or better than the average class. As the country began to shut down, they began to make mistakes at a far higher rate than my previous classes. These mistakes were particularly common on problems that required detailed attention, for example, looking at the data to see that the subject numbers were all duplicated and then identifying this as a problem that requires repeated measures analysis.
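The duplicated-subject check mentioned above takes only a few lines. This is a rough sketch in Python rather than the SAS used in the course, and the records and field names (`subject_id`, `score`) are made up for illustration. The idea is simply to count how many rows each subject contributes before choosing an analysis.

```python
from collections import Counter

# Hypothetical records: each subject appears twice (e.g. pretest and posttest).
records = [
    {"subject_id": 101, "score": 12},
    {"subject_id": 102, "score": 15},
    {"subject_id": 101, "score": 9},
    {"subject_id": 102, "score": 14},
]

def rows_per_subject(rows, id_field="subject_id"):
    """Count how many rows each subject contributes."""
    return Counter(row[id_field] for row in rows)

counts = rows_per_subject(records)
repeated = sorted(sid for sid, n in counts.items() if n > 1)

if repeated:
    print(f"{len(repeated)} of {len(counts)} subjects have more than one row;")
    print("the observations are not independent, so consider a repeated measures analysis.")
```

If every subject number shows up more than once, that is the cue that the rows are repeated measurements on the same people.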

I made mistakes that I would normally never make

One thing I am usually scrupulous about is data quality and data integrity. In fact, it was a major part of the paper I was supposed to give at SAS Global Forum – which was cancelled. The whole conference was cancelled, that is, not just my paper. Yet, I uploaded the wrong data set to the course directory, didn’t do any descriptive statistics and barely glanced at the PROC CONTENTS. Of course I know better!

The first step in solving a problem is admitting you have a problem

If you’ve read this blog for a long time you may know that I’m not a particular fan of poetry. However, I do know there was a poem with the title, “No man is an island.” (See, not as completely uncivilized as you thought!)

Even if you are healthy, have a safe place to live and a paycheck, you probably know people who don’t

Even if everyone you know – lucky you – is healthy, wealthy and wise, there is a chance that any one of you can get hit tomorrow. Your dad, grandmother or child can become sick. Someone in your family or a close friend can lose a job.

Your daily routine has been disrupted

You can’t go to the gym, church, the library, the mall. Maybe, like many of my friends, your judo club or church is where you used to spend many hours every week and now you can’t go there. People who were important in your life you can’t see any more. Maybe you can’t see your family and friends because they are at high risk due to health problems and have to self-isolate.

Yes, you aren’t living in a slum with no running water, so maybe you feel as if you should be “just shaking it off” and finding some “quarantine project” like Biff and Buffy.

Let me tell you this, Biff and Buffy are assholes. It’s perfectly normal to be anxious. The DEFINITION of anxiety is

A feeling of worry, nervousness, or unease, typically about an imminent event or something with an uncertain outcome.

Oxford Dictionary

We are definitely living in uncertain times.

So, now that we have admitted that it’s normal to feel anxious, the next post is some tips on what to do about it, without sounding TOO much like Buffy.

Last week, I mentioned that successful consultants have five categories of skills: communication, testing, statistics, programming and generalist.

COMMUNICATION

Communication is the number one most important skill. All five are necessary to some extent, but a terrific communicator with mediocre statistical analysis skills will get more business than a stellar statistician who can’t communicate. Communication is a lot more than explaining results to clients or making small talk at meet ups.

Documentation

Communication includes documentation, both in your code and in internal documents such as codebooks or an internal wiki. It includes letting clients know what you’re going to do, what it’s going to cost, what that cost includes, what your results were and what those results mean. If you’re good at communicating with clients, colleagues and your future self, you’re half-way to success.

An example of the critical nature of communication can be found in the following retraction:

“The identified programming error was in a file used for preparation of the analytic data sets for statistical analysis and occurred while the variable referring to the study “arm” (ie, group) assignment was recoded. The purpose of the recoding was to change the randomization assignment variable format of “1, 2” to a binary format of “0, 1.” However, the assignment was made incorrectly and resulted in a reversed coding of the study groups.”

Aboumatar and Wise (2019, p. 1417)

Because of this incorrect coding, the reported results were the exact opposite of what actually occurred.
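A recode like the one described in the retraction is a one-line mapping, and it can be checked just as cheaply. The sketch below is Python, not the code from the study. The arm labels, the sample values and the assumption that original code 1 means the intervention group are all invented for illustration.

```python
from collections import Counter

# Assumption for illustration only: in the original file, 1 = intervention, 2 = control.
# Documented target coding: 1 = intervention, 0 = control.
RECODE_ARM = {1: 1, 2: 0}

original_arm = [1, 2, 2, 1, 1, 2]  # hypothetical randomization codes
binary_arm = [RECODE_ARM[a] for a in original_arm]

# Cross-tabulate old codes against new codes; each old code must map to one new code.
crosstab = Counter(zip(original_arm, binary_arm))
for (old, new), n in sorted(crosstab.items()):
    print(f"original={old} -> recoded={new}: {n} rows")

# Fail loudly if the recoded data ever disagree with the documented mapping.
assert set(crosstab) == {(1, 1), (2, 0)}, "arm recode does not match the documented mapping"
```

Under this assumed mapping, a shortcut such as subtracting 1 from every code would also produce 0 and 1 but would flip the groups. That is the kind of slip the crosstab and the assertion are there to catch. How the error in the retracted study actually arose is not something this sketch claims to reproduce.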

Document coding!

Here is an example from a current research project where the CES-D depression scale was used, which requires several items to be reverse-coded before scoring.

In the HTML file where the user enters data that’s written to the database there is this comment:

    <h5>I felt that I was just as good as other kids.</h5>
    <!-- This is reverse-coded. Don’t you dare change it. -->
<div class="row mb-3">
    <button id="cesd4_1" data-src="3" class="cesd4 btn btn-light shadow-box col-5 my-3 mx-auto">Not at all</button>

In the original file to read in the data to SAS, there is a comment:

*** NOTE: CESD IS ALREADY REVERSE-CODED. DOES NOT NEED CODING!;

FILENAME REFFILE2 '/home/directory3/data_analysis_examples/crossroads/cesd.xlsx';

In the internal wiki, there is this note:

Tables in Acme Project Database

CESD – Center for Epidemiologic Studies Depression Scale – NOTE: The data are reverse coded at data entry. There is no need to reverse code these. There are 25 columns in this table: ID, username, session number, questions 1 through 20 of the CESD scale, the CESD total which is the sum of the 20 questions, named item21 for some odd reason, and a time stamp.

Document everything! Document how items are coded and how subscales or totals are computed.
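One way to make that documentation hard to ignore is to put the coding decision into the scoring code itself. Here is a minimal sketch in Python, not the project's actual code. It assumes items are scored 0 to 3 and that the positively worded items are 4, 8, 12 and 16; check both against the scoring manual for the version of the scale you are using. The function name `cesd_total` is made up for this example.

```python
# Assumptions (verify against the scale's manual): items scored 0-3,
# positively worded items that need reversing are 4, 8, 12 and 16.
REVERSE_ITEMS = (4, 8, 12, 16)
MAX_ITEM_SCORE = 3

def cesd_total(responses, already_reversed):
    """Sum 20 item responses given as a dict of item number -> score.

    already_reversed=True means the data entry form stored reversed values
    (as in the project described above), so no item is flipped here.
    """
    if sorted(responses) != list(range(1, 21)):
        raise ValueError("expected items 1 through 20")
    total = 0
    for item, score in responses.items():
        if not 0 <= score <= MAX_ITEM_SCORE:
            raise ValueError(f"item {item} out of range: {score}")
        if item in REVERSE_ITEMS and not already_reversed:
            score = MAX_ITEM_SCORE - score
        total += score
    return total

# Hypothetical respondent with a raw score of 0 on every item.
raw = {item: 0 for item in range(1, 21)}
print(cesd_total(raw, already_reversed=False))  # 4 reversed items x 3 = 12
print(cesd_total(raw, already_reversed=True))   # nothing flipped = 0
```

Making `already_reversed` a required argument means nobody can compute a total without stating, in the code, whether the reversal has already happened.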

This may seem like overkill, but how many retractions could be prevented by this level of documentation? If you are a consultant, it’s probable that at some point someone else will be looking at these data, or that you may be called back a year later to do a longitudinal analysis. Your colleagues and future you will thank you. A year or two from now, I don’t want to be looking at this data set and wondering if I need to reverse-code those items or if it was already done. I want to KNOW!

I deeply suspect that there are more erroneous results published due to incorrect coding of data than to incorrect analyses. After all, the peer reviewers, editors and readers see how you analyzed your data. No one sees how you coded it but you and, possibly, the person who has your position after you.

A few weeks ago, I ended my post with “there is one thing a statistical consultant absolutely must have” and promised to say what that is in the next post. Maria and I had just picked up our rental car at the Detroit airport when she turned to me and asked:

So, what is the one thing a statistical consultant has to have?

I told her,

“I have absolutely no idea what I was thinking last month!”

In my defense, I have been in five states and 22 cities in the past 21 days. Maria says it is only 16 because I was in Minneapolis, Fargo and Denver twice each. She also says I can’t count Denver, Chicago or San Francisco since I only changed planes there. Poo!

In Long Beach thinking about statistical consultants

Now that I am back in Los Angeles and my brain has unfrozen I think there are actually five things you must have but one of these is the most important. In my not at all humble opinion, though, you need ALL FIVE.

The actual five skills a statistical consultant must have

Drum roll, please
  1. COMMUNICATION – This is the number one most important skill. If you don’t have the rest, you’ll still suck and be unemployed but a terrific communicator with mediocre statistical analysis skills will get more business. I don’t just mean shaking hands and small talk at conferences, either. Communication includes documentation, both in your code and in codebooks, an internal wiki, etc. It includes letting clients know what you’re going to do, what it’s going to cost, what that cost includes, what your results were and what those results mean. If you’re good at communicating with clients, colleagues and your future self, you’re half-way to success.
  2. TESTING – I’ve ranted on this blog a lot about testing because it is one of the areas where people often seem to fall short early in their careers. I got a lot of hate for this post when I said I don’t hire self-taught developers because there are things they don’t teach themselves adequately, like testing.
  3. Statistics – Well, duh. Props to the person in the Chronicle of Higher Education forum whose signature read, “Being able to find SPSS in the start menu does not qualify you to run a multinomial logistic regression.” Your clients may not know what power, quasi-complete separation or multicollinearity mean in interpreting an analysis. They do trust that you understand whatever is necessary to be understood for the work. Don’t let them down.
  4. Programming – When I was a graduate student, Very Important Professors had lowly peon graduate students and programmers to write their code for them. All of those people had started their careers using punched cards (honest!), it was that long ago. All of the statistical consultants I know do programming, or can code their own analyses if necessary. Even if you aren’t doing it all yourself – I’m certainly not these days – you need to know enough to review the code your minions wrote or help said minions when they get stuck. Sometimes, it’s just quicker to do it yourself than to explain it to someone else, especially if you need to fix a bug in code that a client is waiting on.
  5. Be a generalist – I’ll have more to say about this in future posts. In brief, even the consultants I know who are well-known specialists in one language know and use others. If you think your career is going to be you sitting on a mountain or in a penthouse office, pontificating to others about sums of squares, the computation of Wilks’ lambda or options for PROC GLMSELECT, you are going to be sadly disappointed. On the other hand, if you do know of a job like that, I would consider taking it for a sufficiently large quantity of money.

I need to get a data set into SAS for a course I’m teaching in March. Students like real data and some kind folks were willing to allow their de-identified data to be used. Win-win.

How did I get this data? In a SAS data set with a handy code book? Oh, very funny!

A bunch of people laughing
This blog will pause while we all laugh at your naivete

I received a login to PHPMyAdmin, where the data, which were definitely not created for my personal convenience, reside.

First, I downloaded the data as CSV for Excel. This gave me a file where everything was like this.

re_apply;"consumer_id";"email";"counselor";"gender";"date_of_birth";"age";"primary_disability";"secondary_disability";"education";"member";"tribe";"district";"job_when_entered";"if_job_earnings";"earnings_type_before";"referral";"other_refer";"application_date";"assessment_date";"eligibility_date";"ipe_date";"notify_rights";"vocational_goal";"state_vr";"status";"closure_date";"status_type";"employment_date";"type_employ";"start_job_earnings";"post_job_earnings";"earnings_type_after";"intermediate_goal";"semesters";"int_completed";"intermediate_date";"last_contact_date";"comment"

Yes, one, long line with everything in quotes and every column separated by a semi-colon. These are the column names but all of the data are in this exact same format as well.

Of course, you COULD upload this file and read it into SAS but that would take time and effort.

OR you can download a regular CSV or ODS spreadsheet file (just pick one of the other options) and then all your data will be in nice columns, but you have no header row. Of course, it’s pretty easy to write an input statement in SAS. You just need to type in a hundred or so variable names and be sure to have the format correctly specified. Not hard, but it will probably take you more than a minute.

You COULD theoretically download the data as an SQL file and use SQL connect according to some smart and less lazy people on SAS Communities.

And, as some of the posters noted on that forum, not everyone has access to SQL connect.

Or, you could be a completely lazy person like me and fix it all in about 12 clicks.

Here is how:

  1. Download the file as CSV for Excel
  2. Copy and paste the first line, the header, into Word
  3. Replace all the quotes with nothing, using Replace from the EDIT menu
  4. Under the TABLE command select “CONVERT TEXT TO TABLE”. For “Separate text at”, click on Other and put in a semi-colon.
  5. Now you have all of your variable names in nice columns as a table. Copy that.
  6. Download the file again, this time as a regular CSV
  7. Insert a row at the top and paste the variable names you copied in Step 5
  8. Save that file as an Excel file
  9. Upload it into SAS Studio
  10. Under TASKS and UTILITIES, select IMPORT DATA, drag the file you uploaded to the window and click on the little running guy.
Converting from text to table
Click 8 or So
Tasks and utilities
All that’s left is to drag the file
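If you would rather script the cleanup than click through it, the semicolon-and-quotes layout can be read directly. This is a sketch in Python, not part of the SAS Studio workflow above. The column names come from the header shown earlier, but the data row is invented, and the sample assumes the real export uses plain double quotes.

```python
import csv
import io

# Hypothetical two-line sample in the same layout as the export:
# every value in double quotes, columns separated by semicolons.
export = (
    '"consumer_id";"gender";"age";"status"\n'
    '"1001";"F";"34";"open"\n'
)

# Tell the csv module about the semicolon delimiter and the quote character.
reader = csv.DictReader(io.StringIO(export), delimiter=";", quotechar='"')
rows = list(reader)
print(reader.fieldnames)

# Write the same rows back out as ordinary comma-separated text with a header row.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

With a real file you would swap the `io.StringIO` objects for `open(...)` calls. The result is a plain CSV with a header row, which is what the import step expects.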

First of all, I want to draw your attention to this retraction in the Journal of the American Medical Association and mad props to Drs. Aboumatar and Wise and Johns Hopkins for doing the right thing in publicly retracting it.

For the TL; DR crowd

Someone who is probably now unemployed miscoded the study groups in this randomized clinical trial of self-management of Chronic Obstructive Pulmonary Disease. What does that mean? In this case, it meant that the reported results were the exact opposite of what was really observed because the treatment groups were coded incorrectly. Also, read the seven tips at the end of this post.

When I talk about statistical analysis, I focus 80% or more of my time and attention on the basics of knowing your data, cleaning your data and examining your data some more. To some, mostly younger, statisticians, that is not the sexy stuff. Why am I not talking about neural nets or generalized linear mixed models? Don’t I know that improving your prediction by .3% can result in millions of dollars in profit for a corporation that has 38 million customers?

What I know is that problems like the one in that JAMA article occur more often than we like to admit.

Recently, a student sent thesis results and then the next day sent an email saying, “Oops, I meant to use the DESCENDING option in PROC LOGISTIC but I didn’t, so the results are the exact opposite of what I said.”
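
To see why modeling the wrong level gives "the exact opposite," here is a minimal sketch in Python rather than SAS. With a single two-group predictor, the logistic regression coefficient equals the log odds ratio from the 2x2 table, so the table is enough to show the flip. The counts are invented for illustration and the function name is hypothetical.

```python
import math

# Hypothetical counts, invented for illustration only.
counts = {
    "treatment": {"died": 10, "lived": 90},
    "control": {"died": 30, "lived": 70},
}


def log_odds_ratio(table, event, non_event):
    """Log odds of `event` for treatment relative to control."""
    odds_treatment = table["treatment"][event] / table["treatment"][non_event]
    odds_control = table["control"][event] / table["control"][non_event]
    return math.log(odds_treatment / odds_control)


# Same data, two choices of which outcome is the modeled event.
modeling_died = log_odds_ratio(counts, "died", "lived")
modeling_lived = log_odds_ratio(counts, "lived", "died")

print(round(modeling_died, 3), round(modeling_lived, 3))
```

The two numbers have the same size and opposite signs. Nothing about the data changed, only which outcome was treated as the event, and that is enough to turn "treatment lowers the odds" into "treatment raises the odds" if you misread which one you modeled.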

A couple of years ago, I did an analysis with a depression scale for which the standardized coding is 0 to 3, but the application had used 1 to 4. The first analysis showed that every single person in the sample was clinically depressed. Fortunately, I caught this before it was published. Even when I re-analyzed the data with the correct scoring the mean score was extremely high. This was not a random sample of the population, but rather, children with a family member addicted to methamphetamine. The original (incorrect) analysis wasn’t in the opposite direction but it did somewhat overstate the problem.
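
The arithmetic behind that mistake is easy to sketch. The example below assumes a hypothetical 20-item scale with a clinical cutoff of 16 on the 0 to 3 scoring; the post does not say which scale or cutoff was actually involved, so treat both numbers as placeholders.

```python
N_ITEMS = 20  # hypothetical scale length
CUTOFF = 16   # hypothetical clinical cutoff, defined on the 0-3 scoring


def total_score(responses, offset=0):
    """Sum item responses after subtracting the coding offset from each item."""
    return sum(r - offset for r in responses)


# A respondent who picked the lowest option on every item,
# as the application stored it (coded 1 to 4).
lowest_possible = [1] * N_ITEMS

naive_total = total_score(lowest_possible)              # 1-4 treated as 0-3
correct_total = total_score(lowest_possible, offset=1)  # recoded to 0-3 first

print(naive_total >= CUTOFF, correct_total >= CUTOFF)
```

Under these assumptions, the lowest score anyone can get with the 1 to 4 coding already clears the cutoff, which is how an entire sample can come out "clinically depressed."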

Several years before that, I worked for a client who had a previous consultant with no knowledge of their particular field but who was a very good programmer. In reviewing some of that person’s code to understand the data and how it had been scored, I found that NONE of the items that should have been reverse-coded had been. The consultant had simply taken the sum of all of the items. This research had been published, by the way. I mentioned this to the client and suggested that a retraction was in order. That retraction never happened and I never worked for that client again.

My Six Tips for Saving Your Ass

  • Learn to code. I don’t mean you need to be the greatest SAS/ R/ Python whatever guru in the world but you should be able to read through the code someone else wrote and understand it. This means you should be able to read an IF-THEN statement, a loop re-coding all the items in an array and the statistical procedures used in your analysis.
  • Understand that the DESCENDING option in PROC LOGISTIC means that the probability modeled is reversed. By default, PROC LOGISTIC models the probability of the response level with the lower Ordered Value, so if you have death (coded 0 = lived, 1 = died) as the dependent variable, the procedure is predicting who lived. If you use the DESCENDING option, it’s going to predict who died.
  • Know how many people should be in each group: control, experimental condition 1, experimental condition 2. Do a PROC FREQ and see if it matches what you expect.
  • Know the range for each item in your analysis and do a PROC MEANS with mean, minimum, maximum and standard deviation. Even if you have 500 or 600 variables it shouldn’t take you all that long to scan through that many lines and see if anything is out of range.
  • Know which items should have been reverse-coded and check if that was done.
  • Compute reliabilities for each scale in an analysis. While the reliability would not have been changed in the depression example where 1 was added to every response, it would have picked up those cases where the variables were not re-coded by showing very low reliabilities.
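
The checks in the list above can be sketched outside of SAS, too. This is a rough Python stand-in, not PROC FREQ, PROC MEANS or PROC CORR themselves: the function names are hypothetical, the six respondents are invented, and the last item is assumed to be the negatively worded one.

```python
from collections import Counter
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def out_of_range(rows, low, high):
    """Return (row, item) positions whose value falls outside low..high."""
    return [(r, i) for r, row in enumerate(rows)
            for i, v in enumerate(row) if not low <= v <= high]


def reverse_code(rows, items, low, high):
    """Reverse-score the listed item positions on a low..high scale."""
    return [[(low + high - v) if i in items else v for i, v in enumerate(row)]
            for row in rows]


# Hypothetical data: 4 items scored 0-3; item position 3 is negatively worded.
groups = ["control", "control", "treatment", "treatment", "treatment", "control"]
raw = [
    [0, 1, 0, 3],
    [1, 1, 0, 3],
    [2, 2, 1, 2],
    [2, 3, 2, 1],
    [3, 3, 3, 0],
    [1, 0, 1, 2],
]

group_sizes = Counter(groups)       # do the group sizes match the design?
bad_values = out_of_range(raw, 0, 3)  # anything outside the codebook range?

fixed = reverse_code(raw, {3}, 0, 3)
shifted = [[v + 1 for v in row] for row in fixed]  # the 1-4 miscoding

alpha_raw = cronbach_alpha(raw)          # reverse-coding forgotten
alpha_fixed = cronbach_alpha(fixed)      # reverse-coding done
alpha_shifted = cronbach_alpha(shifted)  # constant added to every item

print(group_sizes, bad_values)
print(round(alpha_raw, 2), round(alpha_fixed, 2), round(alpha_shifted, 2))
```

With this made-up data, forgetting to reverse-code drags alpha way down, while adding 1 to every response leaves it untouched. That is why reliability catches the reverse-coding problem but the range check is what catches the 1 to 4 problem.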

A seventh, extra bonus tip

If you can’t understand the code that someone has written, not because you are a moron (can’t help you there), but because they are one of those people who never write comments in code, don’t believe in documentation and write code that includes an unnecessary number of macro variables, user-written macros and overly complicated solutions, fire their sorry ass and hire someone less pompous. I’m not saying you shouldn’t have macros or that because a person uses a DATA step and you prefer PROC SQL you should get rid of them. What I am saying is if you ask a person what decisions they made in writing that code and what was the reason for, say, using a generalized linear model instead of a general linear model, they should be able to tell you.

Never fear, I’m not going to post all 30 things in this post. This is a series. A LONG series. Get excited.

I was invited to speak at SAS Global Forum next year and it occurred to me after thinking about it for 14.2 seconds that there are plenty of people at SAS and elsewhere that are more likely to have new statistics named after them than me.

While I can code mixed models, path analysis and factor analysis without much trouble, I’d be the first to admit that there are plenty of new procedures and ideas I see every year that I never really master. I mean to, I really do, but then I get back to the office and get attacked by work. So, the person to introduce you to every facet of the bleeding edge, nope, that’s probably not me, either.

If you think this is where I experience impostor syndrome and say “I couldn’t possibly have anything worth saying”, we have obviously never met.

I’m the old person on the left. The youngest of many daughters is on the right.

Okay, there’s the most current picture of me, so now you sort of know who I am. I figured I better post a current one because I had not updated my LinkedIn photo in so long that I connected with someone who said,

“Oh, I have met your mom.”

And I had to reply,

“No, you have met me. My mom is 86 years old and retired to Florida, as federal law requires. Florida state motto: Your grandparents live here.”

So, when do you get to these 30 things?

Now. I decided to divide everything I learned into four categories.

  1. Getting clients
  2. Getting data into shape
  3. Getting answers
  4. Getting people to understand you.

I picked four because if I had five or six categories, people would expect there to be an even number of points in each because 30 divides evenly by five and six. See? I am good at math.

The money part: Getting clients

First, decide what kind of statistical consultant you want to be.

Are you a specialist or a generalist?

You can be like my friend, Kim Lebouton, who specializes in SAS administration for the automotive industry and seems intent on keeping the same clients until she or they die, whichever comes first. I linked to her Twitter because she is too cool to have a web page.

You could be like Jon Peltier of Peltier Tech and specialize in Excel. Basically, if there is anything Jon doesn’t know about Excel, it’s not worth knowing. Personally, I feel as if most things about Excel are not worth knowing, which is why I’m not that kind of consultant.

I do love that the Microsoft Store carries our games for Windows, though, so woohoo for Microsoft.

Canoe the rapids and learn fractions, with your kids or by yourself because maturity is overrated

I’m the kind of statistician that doesn’t have a time zone.

A few years ago, I was at a conference when people were trying to coordinate their schedule for an online meeting. They were saying what time zone they were in and someone asked me,

“You’re on Pacific Time, right?”

My friend interrupted and said,

“She doesn’t have a time zone.”

It’s true. I was on Central Time last week, in North Dakota. I’m in California this week. Next week, I’m back on Central Time in Minnesota and South Dakota. The following week, I’m on Eastern Time in Boston.

In the winter here (which was summer there), I was in Chile. During the spring here (which was fall there), I was in Australia, and I’m in the U.S. now.

BUT HOW DO YOU FIND CLIENTS?

This is probably the question I get the most and I have an odd answer.

Get really good at something and the clients will find you.

Jon’s really good with Excel. Kim is superb at SAS administration. What am I good at? I’d say I am excellent at taking something that a client may only be vaguely aware is a statistical problem and solving it from beginning to end, in a way that makes sense to them.

If you try mansplaining me in the comments that what I do is called applied statistics, I will find where you live and slap you upside the head. I teach at National University in the Department of Applied Engineering. It’s in the fucking department name. I KNOW.

In response to a question on stats.stackexchange about the difference between mathematical statistics and applied statistics, there was this answer:

Mathematical statistics is concerned about statistical problems, while applied statistics about using statistics for solving other problems.

– Random person I don’t know on the Internet

Mathematical statistics often involves simulated (that is, fake) data, and nearly always uses data that is cleaned of data entry errors – in other words, not very representative of real life.

If you ask me, and even if you don’t, many data scientists act as if data issues can be fixed by having big enough data. This always seems to me similar to those startups that are losing money on every sale but aren’t worried because they are going to make it up on volume. Since data is key, let’s talk about that in the next post.

But wait! How do you get those first clients?

There is never a surplus of excellence – unless maybe you are an English professor, but they’re not reading this blog.

Network.

Let your professors know that you are interested in consulting. I got my first consulting contracts by referrals from professors who had more work than they could do. Similarly, I have referred several potential clients to students and junior professionals either because I was too busy, not interested or they could not afford my rates.

Go to conferences

I’ve had clients referred by other consultants who met me at a conference and a particular contract was not in their area of expertise but they thought it might be in mine. Similarly, I’ve referred clients to other people because I don’t really do that thing but maybe this person will be available.

Most jobs come by word of mouth

There is an evaluation consultant organization. I don’t know who the hell belongs to it. Much of the work that I do, someone’s job is on the line. That is, if they can’t demonstrate results, they may lose their funding and everyone in the building loses their job. In almost all of it, at some point the project director or manager or whoever is going to go present these results to a federal agency, tribal council or upper management, trusting that everything they say is true because I said so.

In that type of high-stakes situation, they’re not going to get someone from an ad on Craigslist. If that sounds like bad news, the good news is that after you have been around for a while and done good work, the jobs come to you.

Since a big difference between mathematical statisticians and applied statisticians is the messiness of the data, I’m going to address that in the next few posts. Expect more swearing. Because data.
