Last week, I mentioned that successful consultants have five categories of skills: communication, testing, statistics, programming and generalist.

COMMUNICATION

Communication is the single most important skill. All five are necessary to some extent, but a terrific communicator with mediocre statistical analysis skills will get more business than a stellar statistician who can’t communicate. Communication is a lot more than explaining results to clients or making small talk at meetups.

Documentation

Communication includes documentation, both in your code and in internal documents such as codebooks or an internal wiki. It includes letting clients know what you’re going to do, what it’s going to cost, what that cost includes, what your results were and what those results mean. If you’re good at communicating with clients, colleagues and your future self, you’re halfway to success.

An example of the critical nature of communication can be found in the following retraction:

“The identified programming error was in a file used for preparation of the analytic data sets for statistical analysis and occurred while the variable referring to the study “arm” (ie, group) assignment was recoded. The purpose of the recoding was to change the randomization assignment variable format of “1, 2” to a binary format of “0, 1.” However, the assignment was made incorrectly and resulted in a reversed coding of the study groups.”

Aboumatar and Wise (2019, p. 1417)

Because of this incorrect coding, the reported results were the exact opposite of what actually occurred.

Document coding!
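As a rough illustration of what a documented recode might look like, here is a minimal Python sketch (not the code from the retracted study, and not SAS). The names `ARM_RECODE`, `ARM_LABELS`, `recode_arm` and `crosstab`, and the group labels, are all hypothetical. The idea is simply to write the mapping down once, refuse any value the mapping does not cover, and cross-tabulate old against new codes so a reversal is visible before any analysis is run.

```python
# Hypothetical codebook entry: names and labels are illustrative only,
# not taken from the retracted study.
ARM_RECODE = {1: 0, 2: 1}  # original randomization code -> analysis code
ARM_LABELS = {0: "control", 1: "intervention"}  # assumed meaning of each analysis code


def recode_arm(original_codes):
    """Recode arm assignments with an explicit mapping; fail on unknown codes."""
    recoded = []
    for code in original_codes:
        if code not in ARM_RECODE:
            raise ValueError(f"Unexpected arm code: {code!r}")
        recoded.append(ARM_RECODE[code])
    return recoded


def crosstab(original_codes, recoded_codes):
    """Count each (original, recoded) pair so the mapping can be checked by eye."""
    counts = {}
    for pair in zip(original_codes, recoded_codes):
        counts[pair] = counts.get(pair, 0) + 1
    return counts


original = [1, 2, 2, 1, 1, 2]
recoded = recode_arm(original)
print(crosstab(original, recoded))  # every original 1 should pair with 0, every 2 with 1
```

If the cross-tabulation ever shows a pair such as (1, 1) or (2, 0), the recode is reversed, and you find out before the results go anywhere near a journal.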

Here is an example from a current research project that used the CES-D depression scale, which requires several items to be reverse-coded before scoring.

In the HTML file where the user enters data that’s written to the database, there is this comment:

    <h5>I felt that I was just as good as other kids.</h5>
    <!-- This is reverse-coded. Don’t you dare change it. -->
    <div class="row mb-3">
        <button id="cesd4_1" data-src="3" class="cesd4 btn btn-light shadow-box col-5 my-3 mx-auto">Not at all</button>

In the original SAS program that reads in the data, there is a comment:

*** NOTE: CESD IS ALREADY REVERSE-CODED. DOES NOT NEED CODING!;

FILENAME REFFILE2 '/home/directory3/data_analysis_examples/crossroads/cesd.xlsx';

In the internal wiki, there is this note:

Tables in Acme Project Database

CESD – Center for Epidemiologic Studies Depression Scale – NOTE: The data are reverse-coded at data entry. There is no need to reverse-code these. There are 25 columns in this table: ID, username, session number, questions 1 through 20 of the CESD scale, the CESD total (the sum of the 20 questions, named item21 for some odd reason), and a time stamp.

Document everything! Document how items are coded and how subscales or totals are computed.
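One way to keep that documentation honest is to put it in the scoring code itself. The following is a minimal Python sketch under stated assumptions, not the project's actual program. It assumes the 20 questions are stored as `item1` through `item20` (only `item21` is named in the wiki note) and that responses range from 0 to 3. Item 4 is flagged as reverse-coded in the HTML above; the other item numbers in `REVERSE_SCORED` are an assumption based on standard CES-D scoring. The function names are hypothetical.

```python
# Assumptions for illustration: column names item1..item20, a 0-3 response
# range, and the item numbers in REVERSE_SCORED are not confirmed project facts.
N_ITEMS = 20
MAX_RESPONSE = 3
REVERSE_SCORED = (4, 8, 12, 16)
REVERSED_AT_ENTRY = True  # per the wiki note: do NOT reverse these again


def cesd_total(row, reversed_at_entry=REVERSED_AT_ENTRY):
    """Sum items 1-20, reversing the reverse-scored items only if still needed."""
    total = 0
    for i in range(1, N_ITEMS + 1):
        value = row[f"item{i}"]
        if not 0 <= value <= MAX_RESPONSE:
            raise ValueError(f"item{i} out of range: {value!r}")
        if i in REVERSE_SCORED and not reversed_at_entry:
            value = MAX_RESPONSE - value
        total += value
    return total


def stored_total_matches(row):
    """Return True if the stored total (item21) equals the recomputed sum."""
    return row["item21"] == cesd_total(row)


# Made-up example record: every item answered 1, stored total of 20.
row = {f"item{i}": 1 for i in range(1, N_ITEMS + 1)}
row["item21"] = 20
print(stored_total_matches(row))
```

The flag `REVERSED_AT_ENTRY` states the coding decision in one place, and the check against `item21` catches the case where someone reverses the items a second time.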

This may seem like overkill, but how many retractions could be prevented by this level of documentation? If you are a consultant, it’s probable that at some point someone else will be looking at these data, or that you may be called back a year later to do a longitudinal analysis. Your colleagues and future you will thank you. A year or two from now, I don’t want to be looking at this data set and wondering if I need to reverse-code those items or if it was already done. I want to KNOW!

I strongly suspect that more erroneous results are published because of incorrectly coded data than because of incorrect analyses. After all, the peer reviewers, editors and readers see how you analyzed your data. No one sees how you coded it but you and, possibly, the person who has your position after you.
