Anyone who uses SAS (or doesn’t) probably has their own reasons. I have a few but a major one is the ease of importing just about any type of data.

Mo’ clients, Mo’ problems

There are multiple types of consultants. I’m the type who is, literally, all over the map. I’ve been in five countries this year and I think 11 states plus the District of Columbia, but I might have left off a couple. I said 9 in a post on a different blog where I occasionally write about my life and judo, but then I remembered I’d been in Texas for SAS Global Forum where I gave a talk on biostatistics and also in New Mexico speaking on transition from school to work for tribal youth with disabilities.

What that means is that I work with a wide range of organizations and their data is not all in the same format.

If you work with a wide range of clients, ease of data import matters

If you’re a consultant who works consistently with one client, data formats may not be your biggest issue. You probably wrote a program to read in that data, no matter what messy format it was in, and you’re good to go. In my case, though, every dataset, every project is different.

All the data, all the time

In the previous post, I mentioned reading in the IPEDS data, which is a relatively small public data set (around 7,000 x 60). Fantastically, that came with a SAS program so all I needed to do was upload the raw data file and change the INFILE statement.

Proc import does not a consultant make

Maybe when you were a student you imported your data sets by a PROC IMPORT step. This isn’t terrible. You should use this procedure when you can. However, you’re going to need to go several steps further.

Even worse, if you’ve been getting your data by simply using the LIBNAME statement your professor provided you or doing some pointy-clicky thing with SAS Studio or Enterprise Guide (or SPSS) you have a lot to learn.

Every year, I have graduate students who tell me they are going to become consultants. More often than not, I shake my head and think,

“You have no idea what you are getting into.”

– Me

If you are going to be working as a statistical consultant for a variety of clients, far more than PROC LOGISTIC or PROC GLIMMIX, your time is going to be spent in the DATA step.

It’s not just a matter of data formatting or missing data, but of creating the data you need that isn’t there. What do I mean by that? Ha ha, that is a future blog post that I may write next time I’m on a plane somewhere and have a spare moment. Probably tomorrow.

First of all, I want to draw your attention to this retraction in the Journal of the American Medical Association and mad props to Drs. Aboumatar and Wise and Johns Hopkins for doing the right thing in publicly retracting it.

For the TL; DR crowd

Someone who is probably now unemployed miscoded the study groups in this randomized clinical trial of self-management of Chronic Obstructive Pulmonary Disease. What does that mean? In this case, it meant that the reported results were the exact opposite of what was really observed because the treatment groups were coded incorrectly. Also, read the seven tips at the end of this post.

When I talk about statistical analysis, I focus 80% or more of my time and attention on the basics of knowing your data, cleaning your data and examining your data some more. To some, mostly younger, statisticians, that is not the sexy stuff. Why am I not talking about neural nets or generalized linear mixed models? Don’t I know that improving your prediction by .3% can result in millions of dollars in profit for a corporation that has 38 million customers?

What I know is that problems like the one in that JAMA article occur more often than we like to admit.

Recently, a student sent thesis results and then the next day sent an email saying, “Oops, I meant to use the DESCENDING option in PROC LOGISTIC but I didn’t, so the results are the exact opposite of what I said.”

A couple of years ago, I did an analysis with a depression scale for which the standardized coding is 0 to 3, but the application had used 1 to 4. The first analysis showed that every single person in the sample was clinically depressed. Fortunately, I caught this before it was published. Even when I re-analyzed the data with the correct scoring the mean score was extremely high. This was not a random sample of the population, but rather, children with a family member addicted to methamphetamine. The original (incorrect) analysis wasn’t in the opposite direction but it did somewhat overstate the problem.

Several years before that, I worked for a client who had a previous consultant with no knowledge of their particular field but who was a very good programmer. In reviewing some of that person’s code to understand the data and how it had been scored, I found that NONE of the items that should have been reverse-coded had been. The consultant had simply taken the sum of all of the items. This research had been published, by the way. I mentioned this to the client and suggested that a retraction was in order. That retraction never happened and I never worked for that client again.

My Six Tips for Saving Your Ass

  • Learn to code. I don’t mean you need to be the greatest SAS/ R/ Python whatever guru in the world but you should be able to read through the code someone else wrote and understand it. This means you should be able to read an IF-THEN statement, a loop re-coding all the items in an array and the statistical procedures used in your analysis.
  • Understand that the DESCENDING option in PROC LOGISTIC means that the probability modeled is reversed. So, by default, PROC LOGISTIC models the probability of response levels with lower Ordered Value, and if you have death (coded 0= lived, 1= died) as the dependent, the procedure is predicting who lived. If you use the DESCENDING option, it’s going to predict who died.
  • Know how many people should be in each group: control, experimental condition 1, experimental condition 2. Do a PROC FREQ and see if it matches what you expect.
  • Know the range for each item in your analysis and do a PROC MEANS with mean, minimum, maximum and standard deviation. Even if you have 500 or 600 variables it shouldn’t take you all that long to scan through that many lines and see if anything is out of range.
  • Know which items should have been reverse-coded and check if that was done.
  • Compute reliabilities for each scale in an analysis. While the reliability would not have been changed in the depression example where 1 was added to every response, it would have picked up those cases where the variables were not re-coded by showing very low reliabilities.
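Here is a minimal sketch of what tips 2 through 6 might look like in practice. The data set and every variable name in it (study, group, died, q1-q4) are made up for illustration, and I'm assuming a 1-to-4 response scale where q3 and q4 are the reverse-worded items. Swap in your own data and your own scoring rules.

```sas
/* Hypothetical toy data: all names and values are invented for illustration */
data study;
   input id group $ died q1-q4;
   datalines;
1 control 0 1 2 4 3
2 control 1 2 1 3 4
3 treat   0 4 3 1 2
4 treat   0 3 4 2 1
5 treat   1 1 1 4 4
6 control 0 2 2 3 3
;
run;

/* Tip 3: do the group counts match what the design says they should be? */
proc freq data=study;
   tables group died group*died;
run;

/* Tip 4: is every item inside its documented range (assumed 1 to 4 here)? */
proc means data=study n mean min max std;
   var q1-q4;
run;

/* Tip 5: reverse-code the items that need it BEFORE summing */
data scored;
   set study;
   array rev{*} q3 q4;           /* assumed reverse-worded items */
   do i = 1 to dim(rev);
      rev{i} = 5 - rev{i};       /* on a 1-4 scale: 1<->4, 2<->3 */
   end;
   drop i;
   total = sum(of q1-q4);
run;

/* Tip 6: Cronbach's alpha; a very low value is a hint that reverse-coding was missed */
proc corr data=scored alpha nomiss;
   var q1-q4;
run;

/* Tip 2: DESCENDING makes the procedure model died = 1 instead of died = 0 */
proc logistic data=scored descending;
   class group;
   model died = group;
run;
```

Six records is far too few for a real model, of course. The point is the order of operations: counts, ranges, recoding and reliability come before the procedure you actually care about. Check the "Probability modeled is" line in the PROC LOGISTIC output to confirm which outcome is being predicted.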

A seventh, extra bonus tip

If you can’t understand the code that someone has written, not because you are a moron (can’t help you there), but because they are one of those people who never write comments in code, don’t believe in documentation and write code that includes an unnecessary number of macro variables, user-written macros and overly complicated solutions, fire their sorry ass and hire someone less pompous. I’m not saying you shouldn’t have macros or that because a person uses a DATA step and you prefer PROC SQL you should get rid of them. What I am saying is if you ask a person what decisions they made in writing that code and what was the reason for, say, using a generalized linear model instead of a general linear model, they should be able to tell you.

The famous statistician, F.N. (for Florence Nightingale) David was a professor at UC Riverside, where I earned my doctorate. My advisor told this story about her:

We were on this dissertation committee – I forget if it was for biology or what, back then, this was a small campus so if you were in statistics you could end up on any committee. So, he gets to the end of his defense, and F.N. David pulls the cigar out of her mouth and says,

“Young man, you believe your numbers far too much.”

The point Dr. Eyman was trying to make to me was that even if you have done every single computation perfectly …

“The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases.”

– Josiah Stamp

What is a conscientious statistical consultant to do?

Start with getting to know your data better than God knows the Bible. Let’s start with analyzing secondary data, for example, IPEDS, that has already been collected. I’ll talk about collecting your own data later. Let me just put in a plug for doing it electronically if possible. Also, make sure your data entry staff know which is the intervention and which is the control group. (You think I’m kidding but I’m not.)

Secondary data analysis: Read the documentation!

You think that is obvious, do you? IPEDS is the Integrated Postsecondary Education Data System, collected by the National Center for Education Statistics. It is my favorite type of data set and the type you almost never get. It includes pretty much the entire population of interest.

If you don’t know these things, you don’t know your data:

  • Is it a sample or the entire population?
  • If it’s a sample, what proportion of the population was sampled and how? Randomly? Stratified random?
  • Does the data set have sampling weights? What is the variable name for those weights (You’re going to use them, aren’t you? Please say yes.)
  • How were the data recorded?
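To show why the sampling-weight question matters, here is a small sketch comparing an unweighted mean with a weighted one. Everything in it is hypothetical: the data set survey_sample, the variables stratum, income and sampwt, and the values themselves. Your documentation will tell you the real names of the weight and stratum variables.

```sas
/* Hypothetical stratified sample: each rural respondent stands for more people */
data survey_sample;
   input stratum $ income sampwt;
   datalines;
urban 52000 120
urban 61000 120
urban 48000 120
rural 39000 300
rural 42000 300
;
run;

/* Unweighted: treats every row as if it represented one person */
proc means data=survey_sample mean;
   var income;
run;

/* Weighted, and with the design taken into account for the standard errors */
proc surveymeans data=survey_sample mean;
   strata stratum;
   weight sampwt;
   var income;
run;
```

The two means will differ because the rural records carry more weight. If the documentation describes clusters as well as strata, there is a CLUSTER statement for that, too.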

This isn’t all you need to know. We’ll talk about specific variables next.

One reason I like IPEDS is that you can be pretty sure everyone reported data because it’s mandatory for any institution that gets federal financial aid. It also includes the U.S. service academies, which are about the only post-secondary institutions that don’t. It also gives you a SAS program for reading the data after you upload it. There are also SPSS and STATA programs.

Another thing I like about IPEDS is it is, inside and out, one of the best documented data sets I’ve seen. I’d recommend it as an example of how to do things if you are going to be creating data sets for secondary analysis yourself. Don’t get used to it, though, because most of what you’ll find in your career is far worse than this. Here is just a simple example from one data set.

*** Created:    October 2, 2018                                ***;
 *** Modify the path below to point to your data file.        ***;
 ***                                                          ***;
 *** The specified subdirectory was not created on            ***;
 *** your computer. You will need to do this.                 ***;

If you want to analyze it using SAS Studio, now you know that once you’ve uploaded the data, you do need to change the INFILE statement. If you don’t know the full path, ctrl-click (Mac) or right-click (Windows) on the data file and select PROPERTIES.

Select Properties to get the path to your file

Change the INFILE statement to what you see in the path, so now it looks like this:

infile '/home/your_directory/IPEDS/hd2017.csv' delimiter=',' DSD MISSOVER firstobs=2 lrecl=32736;

You won’t necessarily have the delimiter, etc. It depends on your file. Okay, run it, you have data. Awesome!
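If you are wondering what those INFILE options actually do, here is a small self-contained sketch. It writes a three-record CSV to a temporary file and reads it back with the same options. The file contents and the variable names (id, name, state) are invented for the demonstration and have nothing to do with the real IPEDS layout.

```sas
filename demo temp;   /* temporary file standing in for the uploaded CSV */

/* Write a tiny CSV with a header row, a quoted comma and two short records */
data _null_;
   file demo;
   put 'id,name,state';
   put '1,"Example College, Main Campus",CA';
   put '2,Sample University,';
   put '3,Test Institute';
run;

data institutions;
   /* delimiter=',' : fields are separated by commas                        */
   /* DSD           : a comma inside quotes is data; empty fields are missing */
   /* MISSOVER      : a short record gives missing values instead of        */
   /*                 pulling data from the next line                       */
   /* firstobs=2    : skip the header row                                   */
   infile demo delimiter=',' dsd missover firstobs=2 lrecl=32736;
   length id 8 name $ 50 state $ 2;
   input id name $ state $;
run;

proc print data=institutions;
run;
```

Try removing MISSOVER or DSD and rerunning to see how the third record or the quoted name gets mangled. That is exactly the kind of thing you want to find out on three records, not on 7,000.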

When I run frequencies for the IPEDS data, I get 7,153 institutions but the IPEDS methodology report says there are 6,642. What the hell? Looking through the data, I find that 287 institutions closed in 2017 or earlier. Another 38 were combined with another institution or flagged “out of scope”, meaning not to be included for some unspecified reason. There were 41 that were “not primarily post-secondary institutions”. Since I’m only interested in individual, active institutions for the research I’m doing, I’m dropping all of those.

There were 88 institutions that were new in 2017 or had their Title IV (financial aid) eligibility restored. After debating back and forth, I decided to drop those, too. My interest is in developing a baseline of enrollment and retention, which these new institutions will only have for one year.
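The pattern I follow here can be sketched in a few lines: count first, then drop, then count again. The data set ipeds_demo, the variable status and its values are hypothetical stand-ins I made up for this example. The real IPEDS file uses its own variable names and codes, which you get from the documentation.

```sas
/* Hypothetical stand-in for the institution file */
data ipeds_demo;
   length status $ 12;
   input unit_id status $;
   datalines;
101 active
102 closed
103 active
104 combined
105 notpostsec
106 new
107 active
;
run;

/* Step 1: see how many records fall into each category before touching anything */
proc freq data=ipeds_demo;
   tables status / missing;
run;

/* Step 2: drop the categories that do not fit this particular research question */
data analysis_set;
   set ipeds_demo;
   if status in ('closed', 'combined', 'notpostsec', 'new') then delete;
run;

/* Step 3: confirm what is left */
proc freq data=analysis_set;
   tables status / missing;
run;
```

Keeping the exclusions in one commented DATA step, rather than scattered WHERE clauses, means that when someone asks why your N doesn't match the methodology report, the answer is in one place.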

My point is that I’ve gotten one of the best data sets you could ever find and 7% of the data is inappropriate for my purpose. Does it matter as long as 93% of the data are correct? Well, I definitely think that my results would be less accurate.

My second point is that there is not anything “wrong” with the IPEDS data. I can imagine plenty of circumstances in which one would want to have the data on closed institutions.

These may seem like details, but I am pretty convinced that if you are not a “detail person” you are never going to make it in the long run as a statistical consultant. These details add up fast.

One last thought – if 7% of the data needed to be tossed out before we even got started, and this is an extremely well-funded, well-designed data set, what do you think the average secondary analysis is going to be like?

Never fear, I’m not going to post all 30 things in this post. This is a series. A LONG series. Get excited.

I was invited to speak at SAS Global Forum next year and it occurred to me after thinking about it for 14.2 seconds that there are plenty of people at SAS and elsewhere that are more likely to have new statistics named after them than me.

While I can code mixed models, path analysis and factor analysis without much trouble, I’d be the first to admit that there are plenty of new procedures and ideas I see every year that I never really master. I mean to, I really do, but then I get back to the office and get attacked by work. So, the person to introduce you to every facet of the bleeding edge, nope, that’s probably not me, either.

If you think this is where I experience impostor syndrome and say “I couldn’t possibly have anything worth saying”, we have obviously never met.

I’m the old person on the left. The youngest of many daughters is on the right.

Okay, there’s the most current picture of me, so now you sort of know who I am. I figured I better post a current one because I had not updated my LinkedIn photo in so long that I connected with someone who said,

“Oh, I have met your mom.”

And I had to reply,

“No, you have met me. My mom is 86 years old and retired to Florida, as federal law requires. Florida state motto: Your grandparents live here.”

So, when do you get to these 30 things?

Now. I decided to divide everything I learned into four categories.

  1. Getting clients
  2. Getting data into shape
  3. Getting answers
  4. Getting people to understand you.

I picked four because if I had five or six categories, people would expect there to be an equal number of points in each because 30 divides evenly by five and six. See? I am good at math.

The money part: Getting clients

First, decide what kind of statistical consultant you want to be.

Are you a specialist or a generalist?

You can be like my friend, Kim Lebouton, who specializes in SAS administration for the automotive industry and seems intent on keeping with the same clients until she or they die, whichever comes first. I linked to her twitter because she is too cool to have a web page.

You could be like Jon Peltier of Peltier Tech and specialize in Excel. Basically, if there is anything Jon doesn’t know about Excel, it’s not worth knowing. Personally, I feel as if most things about Excel are not worth knowing, which is why I’m not that kind of consultant.

I do love that the Microsoft Store carries our games for Windows, though, so woohoo for Microsoft.

Canoe the rapids and learn fractions, with your kids or by yourself because maturity is overrated

I’m the kind of statistician that doesn’t have a time zone.

A few years ago, I was at a conference when people were trying to coordinate their schedule for an online meeting. They were saying what time zone they were in and someone asked me,

“You’re on Pacific Time, right?”

My friend interrupted and said,

“She doesn’t have a time zone.”

It’s true. I was on Central Time last week, in North Dakota. I’m in California this week. Next week, I’m back on Central Time in Minnesota and South Dakota. The following week, I’m on Eastern Time in Boston.

In the winter here (which was summer there), I was in Chile. During the spring here (which was fall there), I was in Australia, and I’m in the U.S. now.

BUT HOW DO YOU FIND CLIENTS?

This is probably the question I get the most and I have an odd answer.

Get really good at something and the clients will find you.

Jon’s really good with Excel. Kim is superb at SAS administration. What am I good at? I’d say I am excellent at taking something that a client may only be vaguely aware is a statistical problem and solving it from beginning to end, in a way that makes sense to them.

If you try mansplaining me in the comments that what I do is called applied statistics, I will find where you live and slap you upside the head. I teach at National University in the Department of Applied Engineering. It’s in the fucking department name. I KNOW.

In response to the question in stats.stackexchange regarding the difference between mathematical statistics and applied statistics, there was this answer:

Mathematical statistics is concerned about statistical problems, while applied statistics about using statistics for solving other problems.

– Random person I don’t know on the Internet

Mathematical statistics often involves simulated (that is, fake) data, and nearly always uses data that is cleaned of data entry errors – in other words, not very representative of real life.

If you ask me, and even if you don’t, many data scientists act as if data issues can be fixed by having big enough data. This always seems to me similar to those startups who are losing money on every sale but aren’t worried because they are going to make it up on volume. Since data is key, let’s talk about that in the next post.

But wait! How do you get those first clients?

There is never a surplus of excellence – unless maybe you are an English professor, but they’re not reading this blog.

Network.

Let your professors know that you are interested in consulting. I got my first consulting contracts by referrals from professors who had more work than they could do. Similarly, I have referred several potential clients to students and junior professionals either because I was too busy, not interested or they could not afford my rates.

Go to conferences

I’ve had clients referred by other consultants who met me at a conference and a particular contract was not in their area of expertise but they thought it might be in mine. Similarly, I’ve referred clients to other people because I don’t really do that thing but maybe this person will be available.

Most jobs come by word of mouth

There is an evaluation consultant organization. I don’t know who the hell belongs to it. Much of the work that I do, someone’s job is on the line. That is, if they can’t demonstrate results, they may lose their funding and everyone in the building loses their job. In almost all of it, at some point the project director or manager or whoever is going to go present these results to a federal agency, tribal council or upper management, trusting that everything they say is true because I said so.

In that type of high stakes situation, they’re not going to get someone from an ad on Craig’s list. If that sounds like bad news, the good news is that after you have been around for a while and done good work, the jobs come to you.

Since a big difference between mathematical statisticians and applied statisticians is the messiness of the data, I’m going to address that in the next few posts. Expect more swearing. Because data.

A twitter storm erupted recently in response to one person’s thread about how to find a 10x engineer. Since I started programming FORTRAN with punched cards back in 1974, was an industrial engineer in the 1980s and now run a software company, I’ve worked with a few people, rightly or wrongly considered to fall into that category. So, I thought I’d weigh in on the original author’s points.

10X Engineers hate meetings

There are only two types of software developers who don’t dislike meetings. New developers don’t mind meetings too much because they have a lot of questions like “who do I talk to if I need access to this repository” or “What version of Unity was used to develop this game I’m supposed to update?” They also have specific questions about why the sound function they wrote is not working and Bob, who wrote similar functions for another game is sitting right there. Another type of developer actually likes meetings because he is complete shit at his job and it gives him an excuse not to be expected to do it.

Every other engineer I have ever met either dislikes meetings or actively hates them. The ones you think don’t dislike meetings are just pretending.

We have a 10-minute meeting every morning at 7 Generation Games. People not in the office drop in online. Everyone complains about it but we do it anyway. Why? Because, for example, I can find out that Adekola actually finished the teacher reports for Making Camp Premium before he left and see an example. Then, I can tell the people in marketing to include that in their discussions with schools. I can also tell one of the developers to take that code and modify it for Tribu Matemática, the Spanish version of Making Camp. In 3 minutes, everyone knows what the reports look like, that they are available and who is working on the next one. This leaves seven minutes for José to ask Bob about the sound function.

10X Engineers have irregular hours and work when other people aren’t around

I can’t think of any software developers who work better when other people are around. Writing code for anything complex requires having a mental model in your head of at least the part you are writing and, hopefully, some of the larger project in which it is used.

I’ve worked with a few people who hit it out of the park better than anyone else. One definitely was a late night person and preferred to get to work when he got there. However, when crunch time came, he could work 8am to 10pm and code all that time if he had to do it. He wasn’t going to like it, though. Who would?

On the other hand, about half of the really top engineers I know – both software and hardware – choose to work 9 to 5, even when telecommuting. The main reason they give is that those hours allow them to spend time with their children or spouse. Contrary to popular belief, the 10x engineers I’ve known tended to be married, although they did seem to get married a little older than the average.

10X Engineers know every line of code that has gone into production

This is just nonsense. I remember when SAS was rewritten in C (yes, I am that old) and hearing that it was something like 3,000,000 lines of code. I am assuming the author meant that these 10x engineers know every line of code WRITTEN BY THEM that has gone into production.

I don’t believe that, either, assuming what he means is that they can recall it immediately and say,

“Yes, in that function beginning on line 683, I pause the audio that’s playing, change the source file to the audio for the next scene, change the image file for the image for the next scene, increment the counter by one and restart the audio”.

If what he means is that they kind of recognize it like that person you met at a conference two years ago and are trying to remember their name, I might faintly agree.

We wrote Spirit Lake: The Game in 2012-2014. NO ONE who worked on that game remembers all of the code in it. I can say this because it was all done by me and The Invisible Developer and he is as good as you’ll ever find.

Here is an experience I share with every software engineer I have ever met, including the very best ones. I look at code and think,

“Who wrote this crap? Please don’t let it be me three years ago.”

10x engineers laptop screen background color is typically black (they always change defaults). Their keyboard keys such as i, f, x are usually worn out faster than of a, s, and e (email senders).

The “they always change defaults” part is true. One thing that all of the best engineers I ever met had in common, for sure, is that they like to mess with things. I only knew two people who had black backgrounds, ever. When I have time I’ll have to post about pseudo-10x engineers. Anyway, neither of those guys was anything special unless weirdness is a category.

Most of the best people I know have either pictures of their family or their favorite activity, like soccer or hiking, or a vacation photo as a background. Usually the e key gets worn out first because it is the most common letter in the English language. People usually name directories, datasets and variables something comprehensible.

My kids and one of my kid's dog. What real 10x engineers laptop backgrounds look like
Their laptop backgrounds look like this, except with their own kids, not my kids, because that would be weird

Is there anything true about a 10x engineer?

Since my 10x merit badge hasn’t come in the mail yet, I don’t have time to address all 10 points from the original thread. There were two points he made that were consistent with my experience.

Most really good engineers aren’t really good interviewers

I could only speculate about why that is true, so I will leave it as that is what I’ve observed. Maybe it’s because they are uncomfortable with exaggeration or with being asked to prove their competence.

10x engineers rarely job hunt

I have found this totally to be true and it makes sense. If you have someone that good in your organization and your management is not made up of complete morons, they are doing all they can to hang on to their best people. Usually, unless they work for morons, people that good are hard to hire away, too, because their current company is doing its best to keep them.

How would I find a 10x engineer?

I wouldn’t, because we are a small company and we can’t afford to pay what someone like that is worth. On rare occasions, we have been super lucky to be able to catch someone great for a short term contract that they just wanted to take for personal reasons.

We find good people and we develop them to be at the top of their field. I think the best way to identify a good software developer in an interview is to take a look at their code. Ask them to bring something to the interview and explain how they solved particular problems in the code. Ask why they made the choices they did. If it is a project they know well and are proud of, you’ll get a lot of information. If they say, “I don’t know” a lot, that’s a bad sign. I’ve also found that people who typically “don’t interview well” forget about the interview part, focus on the project and become interested in telling you all about it.

Oh, and for all those people on twitter who said, “I wish you all got as exercised about diversity and inclusion as you do about 10x engineers”

Well, I am way ahead of you, sister. I have a lot to say about women in tech and over on our 7 Generation Games blog, too.

One thing I like about our company a lot is none of our developers fit the stereotype of the 10x coder asshole. Don’t get me wrong, we have more than our share of people who are absolutely great at their jobs. What we don’t have is the arrogant attitude of:

What do you mean you don’t know how to integrate 3-D scenes from Unity with web pages? I knew how to do that when I was in the third grade!

First of all, I bet most of those people are liars and if you found their third-grade teacher she would say,

Oh, little Larry? I vaguely remember him. Wasn’t the brightest crayon in the box, now was he?

The fact is that all programmers make mistakes, including really dumb ones. As we get older, we may catch these before we hit the commit or run button, but, then again, maybe not.

Why is this code not working?

data fixdata ;
set fix1;

*** FIXES RECORDS WITH WRONG USERNAME ;
username = trim(username);
if username = "" then delete;
pos = index(username,'-') + 1;
username2 = substr(username,pos);
if upcase(index(username,'test)' > 0 then delete ;

Three mistakes with SAS in the code

I made all three of these mistakes lately (though not all the same day).

The first two SAS will catch for you if you read your log. Also, I cheated you here by not including the color coding that you’ll see in the SAS editor just to make it harder for you.

First of all, I have an unmatched parenthesis.

if upcase(index(username,'test)' ) > 0 then delete ;

This still doesn’t work because I have the closing quote in the wrong place.

if upcase(index(username,'test')) > 0 then delete ;

The hardest errors to find are when your code is running but still wrong

Now my log shows no errors but it’s still not working. I still have more users in the file than I should. The hardest kind of errors to find are logic errors. SAS will usually find the coding ones for you.

What I want to do is delete any usernames that have the word “test” in them, whether written as TEST, Test or test – or any other weird combination people might come up with, like TestAM or TestGarbanzoBeans12.

The problem here is that I used the UPCASE function after I had already searched for the value of “test” in the username. The INDEX function returns a number, which is the position at which the first character of the first occurrence of a string occurs. Passing that number to UPCASE, which expects a character string, is what produces this note in the log:

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 74:12

Here is what I really need to do

if index(upcase(username),'TEST') > 0 then delete ;

I needed to change a few things. First, I need to put the UPCASE function inside the arguments passed to the INDEX function so that the first thing that happens is that username is set to upper case. Next, I need to change the string it’s matching to “TEST” since I set the username to uppercase. Now, finally, I pass that to an INDEX function and see if the string exists. If so, I delete the record.
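If you want to convince yourself, here is a tiny test you can run. The usernames are invented for the example, and the data set names fix1 and fixdata are just borrowed from the code above. I've also included the FIND function with the 'i' (ignore case) modifier as an alternative I did not use in the original fix; it does the same job without UPCASE.

```sas
/* Made-up usernames covering the cases described above */
data fix1;
   length username $ 30;
   input username $;
   datalines;
school-maria
TEST-admin
school-TestAM
lab-jose
Test-Garbanzo12
;
run;

data fixdata;
   set fix1;
   /* Upper-case the username FIRST, then search for the upper-case string */
   if index(upcase(username), 'TEST') > 0 then delete;
   /* Alternative: case-insensitive search with the FIND function          */
   /* if find(username, 'test', 'i') > 0 then delete;                      */
run;

proc print data=fixdata;
run;
```

With this toy data, only school-maria and lab-jose should survive. Running a fix against a handful of records where you already know the right answer is a cheap way to catch logic errors that the log will never flag.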

This is a relatively simple bug to fix when presented like this. However, when it is hidden in hundreds of lines of code and all I know is that the number of records doesn’t match what I should have, it’s not so obvious, particularly when there are no errors showing in your SAS log. Did the username2 throw you off? Well, imagine hundreds more lines like that.

My points, and I do have more than one …

  1. Everyone makes mistakes, no matter how experienced they are.
  2. A bug fix may seem obvious after you have found it or if it’s been pointed out to you, but buried in hundreds or thousands of lines of code, it’s not so easy. An argument for modular development.
  3. READ THE NOTES! If you are deluded into thinking that if there is nothing in red in your SAS log all is good, get over that notion now.
  4. The enhanced editor is your friend. The incorrect quotation mark you could have picked up in the editor if you noticed that the closing parenthesis was in purple. In this particular case, it wasn’t so obvious because it was only off by one character. However, if you see a whole bunch of purple, meaning it is quoted, or green, for comments, it can be a tip right away that something is off.

One last thing – the SAS Enhanced Editor with color coding – I remember when it was brought up as a new, improved version of SAS and I laughed.

“Really? That’s what you’ve got? Color-coding? I mean, I’m sure it might help someone their first week on the job but we are all professionals here .. ha ha ha.”

Yeah, that was stupid, too. That enhanced editor has helped me catch so many bugs in code as I was writing it!

Random fact about me – I host a podcast

Yes, yes I do. It’s called More than Ordinary. This week’s co-host is Drew Kim who co-founded and runs an esports company with his mom. (Yes, you read that right.) Next week will be the continuation of a discussion of all things judo with Jason Harai of Ippon Dojo, in Washington. Basically, it’s all about having guests who are doing something out of the ordinary.

What are reusable blocks and why do you want to use them?

This can best be explained by an example. Over at 7 Generation Games, we have a new project under way to organize the hundreds of videos, presentations and activities we’ve developed with our games into a teacher resource site. Most of these fall into one of a few categories. For example, we have 19 math videos from Fish Lake.

Whenever a lot of posts have the exact same structure, you have a use for reusable blocks.

Take a look at this post on the Fractions on a Number Line video.

  1. It has a subheading (h3 tag) with the main point of the video.
  2. This is followed by a short paragraph describing the video, with a background color of light blue.
  3. Next is the video. IMPORTANT – although Gutenberg does allow you to just enter a url and hit return for a video to be shown in a regular post, I found this did NOT work for reusable blocks. When I used embed instead, it worked fine.
  4. After the video is a heading (h2 tag) telling you this video is from an awesome game we make.
  5. Next are two links, one for getting the game for computers,
  6. Another link for the app store for iPads.
  7. Then there is an image from the game and
  8. A short paragraph describing the game.

How to create a reusable block

Select everything you want in the reusable block. In my case, it is all 8 of those blocks listed above. Then, click on the 3 dots at the top of the block menu and in the drop down menu select ADD TO Reusable blocks.

Give it a name, save it and now you have a reusable block.

How to use a reusable block

A reusable block as “copy-and-paste”

You have two options. The first is to just use the block as-is. Say, I just wanted to include an ad in blog posts, or some call to action, like signing up for our newsletter. Then, I could just insert that block like I do any other block – paragraph, image, etc. and this would be pre-populated with the content. Done.

Reusable blocks as templates

The more common option for me is going to be to modify that block, using it as a template. So, I insert the block just like I do any other block. Then, I click on the block I just inserted and select convert to regular block.

Don’t forget to convert to regular block or your edits will be made everywhere you used that block!

Now that I have it converted to a regular block, I can change the first heading, the description and paste in the url for the new video. My post is done. Not only does this save me time, but if I want to hand the task off to someone else, say a new intern, they have a ready-made format.

Another advantage is if I do need to change something everywhere, I can do it with one click. A few years ago, the site we had been using to sell our Mac and Windows games went out of business. It would have been really helpful to have had something like this so that I did not have to go in and change every page where there was a link to the old site.

So, yeah, reusable blocks have converted me to Gutenberg. (Converted, get it? Oh, never mind.)

Fish Lake fractions game with Native American girl stepping on stones across a creek

Like math? You’ll love this game.

Get Fish Lake here for Mac or Windows

or … want Fish Lake for iPad ? Get it in the app store

Around our office, there are a lot of haters of the Gutenberg editor. However, I’ve found quite a few new features that are hard not to like. Here are just a few of them.

The cover block type

Say you’d like to have a background image for your text, like the one below. Just use the COVER block. Well, when I do that and put in the text and hyperlink it, the text is blue which is a hard color to read against that background color. So, I select the text and in the Color Settings in the right block settings, I pick white. Voila! Click the box below and check out AzTech: The Story Begins.

If you change your mind and transform the cover block back to a regular old image, then the text overlaid becomes the caption for the image, but where’s the fun in that?

You can change all kinds of attributes of the background images, for example, the opacity here was set to 60% and for the next image below to 40%.

The button block type

The button block does exactly what you might guess: it creates a button, with an optional link. You can easily change the background and text color just by selecting from the right panel under block settings (as in panel on your right, not as in correct panel, well, actually, that, too). I think this would be even more useful if you could combine it with the cover block, but, as of now, alas, that is not an option.

Sadly, buttons are not allowed here )-:

The columns block type

It’s funny given how much people made fun of me for using tables in my websites way back when we were making them with GoLive, that now we are back to something approximating tables. Also, you can see from the example below that you can get some pretty slick designs just with the Gutenberg editor out of the box. In the past, you’d need to do backflips with additional CSS to get the layout exactly how you wanted it.

Here is the new teacher resource site I’m working on

With the columns block, you can specify the number of columns, and you can even split some of the columns. If I wanted, I could split my English as a Second Language block above into two, Spanish and Lakota.

Of course, the nice thing about columns is that they, hopefully, are more responsive than tables. I say hopefully because the theme I am working with turns this into a very nice layout with all the blocks underneath each other for phones, but for smaller iPads it shows two sets of two boxes and then the last two underneath each other with a bunch of white space on the side. Oh well, nothing is perfect – yet.

Change is hard, but the new editor promises to be worth it

There are quite a few other new types in Gutenberg that I haven’t needed to use yet, but I am sure I will in the future, and some older features that seem to be significantly improved. If, like most of us in the office, you found yourself swearing at WordPress because your existing site was working just fine and now you have to learn all this new $#@! when you really don’t have time, you might want to re-think that position.

I had more than the two tips on becoming a better programmer that I gave in the last post, but I had run out of margarita. Now, being replenished with tequila and fresh lime by The Invisible Developer, here are two more. He often tells me that I should refer to myself as a developer and not a programmer because that is beneath me. I have never pretended to be cool. I started with punched cards as a programmer and a programmer I will remain. At least until the second margarita.

margarita
It’s Friday!

If you aren’t familiar with github, you could have gone to Chris Hemedinger’s super demo at SAS Global Forum. We use github for version control and it is indispensable for that. When you have several people working on the same program, I can edit files, you can, too, and we all upload and download the latest versions without copying over each other’s code. If you are on a project with more than one developer, once you have used a git repository, you’ll fight anyone who tries to take it away.

Because it is so good for sharing, github is used a lot for open source projects and for people just making their code publicly available.

The main thing I learned that I didn’t know is that SAS has its own page on github: https://github.com/sassoftware

I had just assumed since SAS is a private company and definitely not open source that there would not be much available. I was wrong.

Whatever language you use, there is probably a github for it.

Here is a funny thing. When I first started learning JavaScript, I scavenged github to find examples of people making simple games like tic-tac-toe, Memory or mazes. I’d modify the code to do what I wanted and I thought all of these people were so much smarter than me.

After I learned a bit more, sometimes I saw functions or libraries in the code that didn’t do anything and I realized that a lot of these people had done the exact same thing as me – copied someone else’s code and modified it for their purposes.

Start by copying code from github, but don’t stop there

If you ask me – and even if you don’t, I’m going to tell you anyway – it is absolutely fine to download code from someone else’s repository on github and tweak it a little for your own purposes. However, don’t stop there! Dive into it. Figure out what each function does, try to understand their logic.

A better person than me would have their own public git repository. Oh well, I have a bucket of private ones for work and I’ve been writing this blog for 11 years, so that will have to do. YOU should definitely have a public repository, though. Changing the subject here …

Git Repositories that are NOT python, R or Viya

The top repositories almost all entail either integrating SAS and Python (not surprising because it is open source) or Viya or Visual Analytics (presumably because it is expensive and SAS wants to promote it). There is also a smattering of SAS-and-R repositories in the top hits, as well as repositories for SAS and iOS and SAS and Android. I’m not interested in any of that at the moment.

Right now, I am super-swamped but I should have some free time over the summer, so here are my personal interests I am marking for later. With 116 repositories, any SAS aficionado should find something of interest, and remember, this is just the sassoftware repository. There are additional repositories of individual users, like the last one I noted below.

SAS Studio Tasks is an area I’d like to learn more about, as in writing your own custom tasks.

Data mining is an area I am ALWAYS wanting to brush up on more. This library of flow diagrams for specific data mining topics looks really cool.

Not a SAS Institute repository, this one from Michael Friendly is on macros and looks super cool.

As I mentioned above, I started using github for JavaScript code and there are TONS of repositories for just about any language that would tickle your fancy (what exactly IS a fancy, anyway?)

I have more tips but it will have to wait for another margarita and since my grandchildren are spending the weekend and just invaded my office, that will have to wait.

I did a random sample of presentations at SAS Global Forum today, if random is defined as of interest to me, which let’s be honest, is pretty damn random most of the time. 

Tip #1 Stalk Interesting People

I don’t mean in a creepy showing-up-at-their-hotel-room way. If you see someone presenting in person or referenced on twitter, blogs, etc., check out what else that person has freely available on the web, in published proceedings, etc.

Let me give you an example that applies even if you are not into logistic regression. (You’re not? Feel shame.)

The first session I attended was a Super Demo in the exhibit hall which for some reason I don’t understand is always called the Quad. 

In a nutshell, logistic regression is usually:

  • binary, which is where I started out, modeling mortality studies, you’re either dead or alive
  • multinomial, that is, multiple categories, like college major or religion, or
  • ordinal, like someone is a subscriber, contributor, editor or administrator on a group blog, which are progressively higher levels of involvement

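For ordinal outcomes, the usual starting point is the proportional odds model, which assumes each explanatory variable has the same effect at every cutpoint between categories. What if the data fit the proportional odds model for some of the explanatory variables and not others? You can do a partial proportional odds model.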

Line plots on slide
Graphing your data is a great way to see if the proportional odds model makes sense. You can see that it does for the variable on the right, but for the left, not so much.
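If the terms are new to you, a small sketch may help. This is my own illustration in Python, not code from the demo or the paper. It assumes the cumulative logit form, logit P(Y <= k) = intercept_k + slope_k * x, with one explanatory variable. The function names and all the numbers are made up. Under proportional odds every cutpoint shares the same slope. Letting the slope differ at some cutpoints for a variable is what makes the model "partial" for that variable.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(x, intercepts, slopes):
    """Return P(Y = k) for an ordinal outcome with len(intercepts) + 1 levels.

    Assumed model: logit P(Y <= k) = intercepts[k] + slopes[k] * x.
    Proportional odds means every entry of slopes is identical.
    """
    # Cumulative probabilities at each cutpoint; the last level reaches 1.
    cumulative = [logistic(a + b * x) for a, b in zip(intercepts, slopes)]
    cumulative.append(1.0)
    # Differences of neighboring cumulative probabilities give each category.
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs


# Made-up values: four ordered levels, so three cutpoints.
intercepts = [-1.0, 0.5, 2.0]
proportional = category_probabilities(1.0, intercepts, [0.8, 0.8, 0.8])
non_proportional = category_probabilities(1.0, intercepts, [0.8, 0.8, 0.3])
```

One caution about the unequal-slopes version: nothing in this sketch stops the cumulative curves from crossing at extreme values of x, which would produce a negative "probability", so treat it as a way to see the idea and not as a fitting routine.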

Unfortunately, the super demos do not have a paper published in the app or proceedings. However, the presenter, Bob Derr from SAS, mentioned he had presented a paper on this topic in 2013 (way to play hard to get, Bob – not!)

Paper reference on slide (also below in blog)

I skipped the next presentation to read it (and to write this post). If you are at all interested in multinomial and ordinal logistic regression, you should, too. You can find it here in the SAS Global Forum 2013 proceedings. http://support.sas.com/resources/papers/proceedings13/446-2013.pdf

It’s an outstanding paper and I am going to require it for my course next year. I think the students will find it far more accessible than some of the readings we have been using. They don’t complain loudly, but I know, I know. 

Tip #2 Read the Documentation (No, seriously, keep reading)

People who answer comments with LMGTFY (let me Google that for you) or RTFM (read the fucking manual), just so you know, that quit being funny around 1990. However, SAS documentation really is a treasure trove. It’s not just SAS, the same could be said about jQuery documentation or the WordPress Codex but we’re not talking about those today, are we? Please try to stay on topic. 

The SAS documentation runs many, many thousands of pages. It’s far better and more detailed than you would think. Let me give you an example a very helpful person named Michael pointed out in the Quad (what the hell is it with that name?) today. As I’ve mentioned several times lately, my students often struggle with repeated measures ANOVA. He suggested checking out the page on longitudinal data analysis.

http://support.sas.com/rnd/app/stat/procedures/LongitudinalAnalysis.html

It gives four different procedures (none of which are GLM, I noted, but that’s a discussion for another day). 

Related to that, when you are learning procedures, I recommend just running some of the code examples. For example, here is one for repeated measures with PROC MIXED. http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_mixed_examples02.htm&docsetVersion=15.1&locale=en. (Yes, I really do have that on my mind lately)

Think about this, though. Once you graduate from whatever your last degree turns out to be, you don’t have anyone checking your work and telling you if it is right or not. You just write your code and hope for the best. That sucks, huh?

When you are learning a new procedure, you can write code using the data shown in the SAS documentation and see if your results match. Like an answer key for life! I always wanted one of those.
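Checking against the answer key does not have to be done by eye. Here is a minimal sketch in Python, assuming you have typed the published estimates and your own into two dictionaries. The function name `compare_to_published`, the parameter names and the numbers are placeholders I made up, not values from any SAS documentation page.

```python
import math


def compare_to_published(mine, published, abs_tol=1e-4):
    """Return names of estimates that are missing or differ by more than abs_tol."""
    mismatches = []
    for name, expected in published.items():
        actual = mine.get(name)
        # Published output is rounded, so compare with a tolerance, not ==.
        if actual is None or not math.isclose(actual, expected, abs_tol=abs_tol):
            mismatches.append(name)
    return mismatches


# Placeholder numbers only, standing in for a documentation table and your run.
published = {"intercept": 1.2345, "slope": 0.6789}
my_results = {"intercept": 1.23451, "slope": 0.6789}
problems = compare_to_published(my_results, published)
```

An empty list means your output agrees with the documentation to the number of decimals it prints. Anything else tells you which estimate to go look at.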
