Sep 8

You wouldn’t think there would be that much to say about scree plots. That is because you are like me and sometimes wrong.

The problem I often have teaching is that I assume people know a lot more than is reasonable to expect for someone coming into a course. Sometimes, I’m like a toddler who thinks that because she knows what color hat the baby was wearing yesterday that you do, too.

Toddler with baby

 

So … a scree plot is a plot of the eigenvalues by the factor number. In the chart below, the first factor has an eigenvalue of about 5.5 while the eigenvalue of the second factor is around 1.5. (If you don’t know what an eigenvalue is, read this post.)

scree plot with bend in plot after second factor

 

As I mentioned in the previous post, an eigenvalue greater than 1 explains more than a single item, but as you can tell by looking at the plot, some of those eigenvalues are barely higher than one. Should you keep them? Or not?

What is scree, anyway? Scree is a pile of debris at the base of a cliff. In a scree plot, the real factors are at the top of the cliff and the scree is the random factors at the bottom you should discard. So, based on this, you might decide you only have one real factor.

The idea is to discard all of the factors after the line starts to flatten out. But is that after 1 factor? It kind of flattens out after four?  Maybe?
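To make the two rules of thumb concrete, here is a minimal sketch. Only the first two eigenvalues (about 5.5 and 1.5) come from the plot described above; the rest of the array and all three function names are made up for illustration. It counts the eigenvalues greater than 1 and then looks at how much the line drops from one factor to the next.

```javascript
// Hypothetical eigenvalues: only 5.5 and 1.5 come from the plot described
// in the post; the remaining values are invented for illustration.
const eigenvalues = [5.5, 1.5, 1.2, 1.05, 0.9, 0.8, 0.7, 0.6];

// Eigenvalue-greater-than-1 rule: count the factors above 1.
function countAboveOne(values) {
  return values.filter((v) => v > 1).length;
}

// Drop from each factor to the next. A big drop followed by small ones
// is the "cliff" followed by the "scree".
function successiveDrops(values) {
  return values.slice(1).map((v, i) => values[i] - v);
}

// Crude elbow finder: keep the factors that come before the largest drop.
function factorsBeforeLargestDrop(values) {
  const drops = successiveDrops(values);
  return drops.indexOf(Math.max(...drops)) + 1;
}

console.log("Eigenvalues above 1:", countAboveOne(eigenvalues));
console.log("Drops:", successiveDrops(eigenvalues).map((d) => d.toFixed(2)));
console.log("Factors before largest drop:", factorsBeforeLargestDrop(eigenvalues));
```

With numbers like these, the two rules disagree, which is exactly the ambiguity described above. Neither function is a substitute for looking at the plot.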

Sometimes a scree plot is really clear, but this one, not as much. So, what should you do next?

Hmm … maybe I should write another post on that.

Oct 1

One definition of insanity is doing the same thing over and over, expecting different results. One thing that can drive programmers insane is doing the same thing over again and GETTING different results.

In a past life, working in tech support, I learned that whenever anyone calls and says,

I did it exactly like your example and it didn’t work for me.

– they are lying.

In my experience, when you have the same programming statements but get different results, something else is always different and that something is often the demands put on the system.

How can that be if your statements are the same? Let me give two examples, one using JavaScript and one using SAS.

JavaScript

I had made a game using canvas and HTML5. The game had three layers. The background was the bottom layer, some game objects that mostly stayed in the same place were the middle layer and the top layer was a game object that changed with each move. The init function on load drew all the layers. On update, all three layers were updated by calling one function. All was well.

function drawAll() {
  // Redraw all three layers, bottom to top, on every update.
  draw1();
  draw2();
  draw3();
}

Then, I made another game the exact same way and I could not get rid of the screen flicker every time the player piece on the top layer moved. I tried clearing the canvas between each re-draw, which had solved the problem in the past. Nope. What finally did work, in case you run into this problem yourself, is that I only drew the background in the init function and never re-drew it.

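function init() {
  // Look up the three stacked canvases and their 2D drawing contexts.
  layer3 = document.getElementById("layer3");
  layer2 = document.getElementById("layer2");
  layer1 = document.getElementById("layer1");
  ctx = layer1.getContext("2d");
  ctx2 = layer2.getContext("2d");
  ctx3 = layer3.getContext("2d");
  window.addEventListener("keydown", getkeyAndMove, false);
  startwall();
  // Draw every layer once on load, including the background (draw1).
  draw1();
  draw2();
  draw3();
}

function drawAll() {
  // On update, redraw only the layers that change; the background stays put.
  draw2();
  draw3();
}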

Problem solved. My conclusion was that the second program involved a lot more complicated drawing of objects. Instead of just placing an image here or there, the program needed to compute collisions, read lines from an array and draw objects, and the time it took was noticeable.
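If you want to see the idea without a browser, here is a minimal sketch. The helper name makeLayerDrawer and the isStatic flag are hypothetical, not part of either game, and the draw functions are stand-ins that only count how often they are called.

```javascript
// Hypothetical helper: wraps per-layer draw functions and, on update,
// calls only the ones that are not marked static.
function makeLayerDrawer(layers) {
  return {
    // Draw every layer once, for example on page load.
    init() {
      layers.forEach((layer) => layer.draw());
    },
    // Redraw only the layers that change between moves.
    update() {
      layers
        .filter((layer) => !layer.isStatic)
        .forEach((layer) => layer.draw());
    },
  };
}

// Stand-ins for draw1, draw2 and draw3 that just count calls.
const calls = { background: 0, objects: 0, player: 0 };
const drawer = makeLayerDrawer([
  { isStatic: true, draw: () => { calls.background += 1; } },
  { isStatic: false, draw: () => { calls.objects += 1; } },
  { isStatic: false, draw: () => { calls.player += 1; } },
]);

drawer.init();   // page load
drawer.update(); // first move
drawer.update(); // second move
console.log(calls);
```

After one load and two moves, the background has been drawn once and the other two layers three times each. The expensive layer is paid for once instead of on every key press.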

SAS

Several times I have written a program that worked wonderfully on a high performance computing cluster but crashed on my laptop, or failed on SAS On-Demand but worked beautifully on my desktop. The difference in all of those cases was that the processing requirements exceeded the capabilities of the machine. All is not lost in those cases. One pretty obvious but not always feasible solution is to use a different machine. When that isn’t an option, there are workarounds. For example, if I wanted students to analyze an enormous dataset, I could have them analyze the correlation matrix instead of trying to load a 100 GB dataset – but that is another post.

Apr 5


In a Dilbert cartoon, the pointy-haired boss tells Dilbert,

We need to give our customers what they want.

To which Dilbert replies,

What our customers want is better products for free.

Upon reflection, Dilbert and the boss agree to give them a fish bowl screensaver.

It has been said before that SAS is just offering on-demand for free to compete with R in the educational market. That may be true, but Microsoft and Adobe want to compete in the educational market, too, and they aren’t offering me free stuff so I say, “Hurray for SAS”.

The three biggest problems I would say SAS had in attracting student and faculty use were:

  1. It was a pain in the ass to install and update
  2. It was too expensive
  3. It only ran on Windows and Unix machines

SAS On-Demand was the beginning, with a free version of SAS Enterprise Guide and SAS Enterprise Miner. It was pathetically slow over wireless, though, so much so that I took to recording the instructions in my office, where I had a wired connection, and putting the movie online for students to watch, or playing it in class. Students would try to do the assignments in class, but again, the wireless connection was a major bottleneck. Also, many of my students had Macs and SAS On-Demand with SAS Enterprise Guide only ran on Windows NATIVE (not virtual machines). It was less of a pain in the ass to install but there were occasional problems.

Enter SAS Web Editor. The drawback is that you need to learn programming, but personally, I have come full circle to considering that an advantage rather than a drawback, and so, I believe, will my students.

Not only is the Web Editor free but there is nothing to install. It runs in a browser. Before you get all excited, let me point out that the version I am using is free to FACULTY AND STUDENTS IN HIGHER EDUCATION. Registering as a professor took me a few minutes and I was approved that afternoon.

If you are a student, once you register and log in here

https://support.sas.com/ctx3/sodareg/index.html

You can select the course at your university for which you need a SAS On-Demand license. If your professor selected SAS Web Editor, once you have registered all you need to do is click


Run Client and SAS Web Editor opens in a new window. Not only does it run on a Mac or Windows machine but it also runs on an iPad.

Not one to take anyone’s word for anything, when I was stuck in the theater yesterday, I pulled out my iPad and tried it. The Spoiled One and her friends had gone to an R-rated movie and since she needed a parent to get in, I paid for a ticket, walked in with her and walked out back to the lobby before the movie had a chance to rot my brain. (Suffice it to say that we have different tastes.)

So, here I am with no wi-fi and the original iPad. I figured if it would work on this it would work on anything. First, I started the web editor, just by logging into my account at the link above and then clicking on Run Client. Popped up fine. At first you’ll see your list of projects.

I opened a project I had run before. See below.

Click on the BROWSE button at top left of the screen to see your list of projects again.

I clicked on the little running guy to run my project. It ran in a few seconds and the results popped up. This was a very small job, as you can see, with only 634 records.  I did two frequency procedures, a proportional random sample by strata with proc surveyselect and a proc print – not exactly high intensity programming, but very similar to the type of assignment a student might be doing.


Since I was still sitting there waiting for The Spoiled One’s movie to be over, I used the Web Editor to analyze some dummy data similar to a problem a student was working on, run a one-sample t-test

proc ttest h0 = 11 ;

var score ;

run ;

in case you were wondering and answer her question.
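For anyone curious what that procedure is computing, here is a minimal sketch of the one-sample t statistic for a null hypothesis mean of 11. The function name oneSampleT and the scores are made up for illustration; they are not the student’s data, and this sketch stops at the t statistic and degrees of freedom rather than looking up a p-value.

```javascript
// One-sample t statistic for the null hypothesis that the mean equals mu0.
function oneSampleT(scores, mu0) {
  const n = scores.length;
  const mean = scores.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance uses n - 1 in the denominator.
  const variance =
    scores.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  const standardError = Math.sqrt(variance / n);
  return { n, mean, t: (mean - mu0) / standardError, df: n - 1 };
}

// Made-up scores for illustration only.
const scores = [10, 12, 14, 16, 18];
const result = oneSampleT(scores, 11);
console.log(result);
```

Here the mean is 14, the standard error is the square root of 2, and t is 3 divided by that, a little over 2.1 on 4 degrees of freedom.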

So, for what the average student would need to do and what the average professor would need to help them, yes, SAS Web Editor is a better product, for free.

To my disappointment, no fish bowl screen saver was included.

 

Jan 6

As I mentioned previously, this week’s posts have been inspired by John D. Cook who generously does ELEVEN twitter tip accounts, Monday through Friday.

Don’t get used to it, but since this is the beginning of the year, I thought I would start out all saintly and enthusiastic with some basic tips for people new to SAS On-Demand. Yes, I can see you are excited.

Julia in autopia

Yesterday, I gave an example of a table analysis using the Kaiser-Permanente study of the oldest old. If you are doing a table analysis with a file like this one that has hundreds of variables with enlightening names like FA3_0X1, it will be a lot easier to use the drag and drop menus if you use the labels instead of variable names.

Window with drop down menu on variable names, show labels clicked

What you want to do is right-click in the pane that has the variable names. A drop-down menu will appear and SHOW NAMES will be clicked. Click on SHOW LABELS, right below it, and the variable labels will show up so you can see that AXDGF7 is something like “Drinks wine with prostitutes”.

(Disclaimer: In fact, the Kaiser-Permanente study of the Oldest Old did not ask patients how frequently they drank wine with prostitutes. Whether or not they should have is a different issue.)

Random tip #2: Finding your site number

Let’s say you are having a problem with SAS On-Demand and you want to contact SAS technical support. You go to fill out the form online and one of the REQUIRED fields is your site number. Normally, you will find your site number in your SAS log. So, you run a program or task in SAS On-Demand and guess what, no site number.

So you can’t submit the form because you can’t find the site number and you can’t ask how to find the site number because you can’t submit the form.

Do this:

Open SAS On-Demand

Option A: Click on HELP > About SAS Enterprise Guide

You should get a window that gives your site number, like 0060061234

You can also click on a link and get some more cool information about your configuration, if you are into that sort of thing, which it just so happens, I am.

Option B:

What if that doesn’t work? I haven’t had any problems with the Help menu with SAS On-Demand at Pepperdine but I did once work somewhere that did not have the help files installed for SAS because someone who only appeared to have been put on this earth by God specifically to annoy me, had decided it took up too much space (don’t even ask). So, I was wondering if there was a Plan B.  Why, yes, yes there is.

Go to File > New > Program

Type this:

%put &syssite ;

run;

Click on the green run button.

Look in the log that results and under

%put &syssite ;

You will see a number like 0060061234

That is your site number.

Now, wasn’t that exciting?

Aug 23

Lately, there’s been a lot of talk about making college, or younger, students feel as if they are really getting the same education when teaching online versus in the classroom.

As someone who has taught online since 1997 (yes, you read that right) and has taught the same classes both in the classroom and online, I have a few suggestions.

Online Classes Can be Better than Face to Face

Record Your Lectures

The very first suggestion I have is to record your lectures and make those downloadable. The university where I teach has Blackboard and this is an option. If your school does NOT have that option for whatever web meeting software you have and you have a Mac you can make a screen recording with QuickTime and upload it to a YouTube account.

Share Data Libraries

I teach multivariate statistics and we use some methods that require at least a modest sample size. Having students type in hundreds of records is ridiculous. Even better, I can download and clean data from sites like ICPSR or the California Health Interview Survey.

I upload the codebooks to the class website.

I upload these files to a class directory using SAS Studio. I give my students the LIBNAME with read-only access and they have a data set with thousands or tens of thousands of records all set to analyze.

For assignments where data cleaning is part of it, I give them access to the original data.

Yes, you can get SAS for free

Students can get a SAS Studio account for free, run their programs, download and send me both their log and their output.

Make Cheating Less Tempting

Friends who are new to teaching online say cheating is a real problem. I try to remove the temptation by making it harder. If I give you a dataset with 500 variables and ask you to pick 20, run a factor analysis, write up your results and send me your log and output, I can at least see it was run under your account and it’s not going to be the same exact variables as someone else.

That doesn’t mean a student can’t have paid someone to do it for them or had a relative do it. [I was shocked to read on a forum all the women who said they did their husband’s master’s degree homework “Because the degree will help our household income and he works all day.”]

This type of cheating isn’t something you can prevent in face-to-face classes either unless you have the student write all of their papers in front of you.

One way to make cheating less tempting is to have assignments that students can individualize. A change I made for the fall semester is to give two different data sets for each assignment. One is the Monitoring the Future study with survey data from youth, the other is the California Health Interview Survey. 

I try to update these datasets fairly frequently, so I just replaced the 2009 CHIS with the 2018 data set.

So, if you are interested in social science or health analytics, you can pick whatever interests you. Sometimes the most hard core engineering majors pick the MTF study of youth because they have an adolescent at home and are curious about national norms, how adolescents rate their communication with parents, etc.

Still, I would like a third data set with something more marketing or engineering focused. If anyone has a suggestion, hit me up in the comments please.

Have Online Discussion Boards and Don’t Make Them Stupid

These boards should not be just a waste of time. Again, related to preventing cheating, I often ask questions related to their papers, like,

“What variables are you thinking of using for your factor analysis assignment? Do you see any possible problems with those variables?”

The second part of each question is to ask another question for the next student to answer. 

I’m fortunate that I often have students who are in the same cohort so they know each other and will comment on something related to the other students’ work or interests.

Get to Know Your Students

I taught middle school students this summer in a Game Design Course and it was a blast. (We’re doing it again this fall, if you have a middle schooler you’d like to sign up, click here to get info and put GAME DESIGN in the message). 

Whether they are middle school students or adults, I ask them to turn on the camera and say hi the first day just so I know what they look like and what their voice sounds like.

Just like an in-person class, I start by asking everyone where they are from, making sure I know how to pronounce their names correctly and ask them to tell me one interesting thing. For the middle school students it might be the name of their dog or that they play saxophone. For the adults it might be that they work at CDC or really want to do research on infant mortality in Nigeria, where they grew up.

If you’re not a jerk, online classes can be better for your students

I have heard of instructors who insist all students have on their camera at all times, not on mute, be dressed appropriately, no distracting background. That’s just stupid. For my adult students, they may have small children running around, they may be making dinner. I don’t care. Why should I? If they miss something, they can replay the video later. This is one way online classes are BETTER for adult students.

I asked everyone to turn their cameras on for this picture

For my middle school students, maybe they are embarrassed about their room, their looks. As someone who has taught middle school, I can tell you that there is almost nothing a middle school student can’t be embarrassed about. Maybe they are lounging on their bed while listening to me. So what? This is a way online classes might be better for younger students.

Also, don’t be a jerk about the chat.

I do read all the chat messages that go on while I’m talking. If it is a question to me, I answer it. Some students feel more comfortable typing/ texting than talking.

My adult students never veer too far off task. With the middle school ones, I might need to drop into the chat from time to time and say “Enough with the poop emojis”. Usually, though, their classmates do it for me.

Well, I have lots more ideas but it’s Saturday and I have to finish writing my assignments for next month.

If you’ve been wondering why I haven’t been blogging for four months –

Well, there’s a pandemic, and demand for educational software has spiked, our 7 Generation Games company has upped both users and employees 50%, The Julia Group has more of a demand for online training, analysis and app development so, yeah, been busy.

Jan 19

I’ll be speaking about being a statistical consultant at SAS Global Forum in D.C. in March/April. While I will be talking a little bit about factor analysis, repeated measures ANOVA and logistic regression, that comes at the end of my talk. The first things a statistical consultant should know don’t have much to do with statistics.

A consultant has paying clients.

In History of Psychology (it was a required course, don’t judge me) one of my fellow students chose to give her presentation as a one-woman play, with herself as Sigmund Freud. “Dr. Freud” began his meeting with a patient discussing his fee. In fact, Freud did not accept charity patients. He charged everyone. There’s a winning Trivial Pursuit fact for you.*

Why am I starting with telling you this? Because I have had plenty of graduate students whose goal is “to be a consultant” but they seem to think their biggest problem when they start out is going to be whether they should do propensity score matching using the nearest neighbor or caliper method.

Here are the biggest problems you’ll face:

Let’s start with getting clients. I can think of four ways to do this: referrals, as part of a consulting company, through your online presence and through an organization. I’ve done three of them. First, and most effective, I think, is through referrals. I got my first two clients when professors who did consulting on the side recommended me. I do this myself. If someone can’t afford my fees or I am just booked at the moment, I will refer potential clients to either students, former students or other professionals I know who are getting started as consultants. It’s not competing with my business. I am never going to work for $30 an hour again and if that’s all that’s in your budget, I understand. If all you need is someone to do a bunch of frequency distributions and a chi-square for you, you don’t need me, although I’m happy to do it as a part of a larger contract.

Lesson number one: Don’t be a jerk.

Referrals mean I’m using my own reputation to help you get a job and so I’m going to refer students who are good statisticians and who I think will be respectful and honest with the client. Don’t underestimate the latter half of that statement.

Lesson number two: It helps if you really love data analysis.

I’d be the first to say that I’m a much nicer person now than when I was in graduate school. Yes, it took me a while to learn lesson one, I am embarrassed to say. However, I really did love statistics and if any of my fellow students had trouble, I was the first person they asked and I was really happy to help. When those students later became superintendents of schools or principal investigators of grants, they thought of me and became some of my earliest clients. Some of my professors also became clients, although those were after I’d had several years of experience.

Lesson number three: Don’t think you are smarter than your clients.

A young relative, who has a Ph.D. in math, asked me, “No offense but isn’t what you do relatively easy, like anyone who understood statistics could do it? Why are you so in demand?”
Corollary to this lesson: If you find yourself saying, “No offense” just stop talking right then.

One reason a lot of want-to-be consultants go bankrupt or have to find another line of work is they do think they are smarter than their clients. This manifests itself in a lot of ways so we’ll return to it later, but one way is that they charge much more than the work is worth.

How do you know how much your work is worth?

Lesson number four: Ask yourself, if I had twice as many grants/ contracts as I could do and I was paying someone to do this work, what would I be willing to pay?

That’s a good place to start.

I’ve met a lot of people over the years who charged much more than me and bragged to me about it. In the long run, though, I’m sure I made a lot more money. Clients talk. They find out that you are charging them three times as much as their friend down the block is getting charged by their consultant. You may think you’re getting away with it, but you won’t. You may get paid on those first few contracts but you’ll have a very hard time getting work in the future.

Lesson number five: Know multiple languages, multiple packages.

I’ve had discussions with colleagues on whether it is better to be a generalist or a specialist.

I have had a few jobs where they just needed propensity score matching or just a repeated measures ANOVA but those have been the small minority over the past 30 years.

I would argue that even those who consider themselves specialists actually have a wide range of skills. Maybe they are only an expert in SAS but that includes data manipulation, macros, IML and most statistical procedures.

In my case, I would not claim to be the world’s greatest authority on anything but if you need data entry forms created in JavaScript/HTML/CSS, a database back end with PHP and MySQL, your data read into SAS, cleaned and analyzed in a logistic regression, I can do it all from end to end. No, I’m not equally good at all of those. It’s been so long since I used Python that I’d have to look everything up all over again.

I’ve used SPSS, STATA, JMP and Statistica, depending on what the client wanted. I think I might have even had a couple of clients using RapidMiner. For the last few years, though, the only packages I’ve used have been SAS and Excel. Why Excel? Because that’s what the clients were familiar with and wanted to use and it worked for their purposes. (See lesson three.)

I was really surprised to read Bob Muenchen saying SPSS surpassed R and SAS in scholarly publications. Almost no one I know uses SPSS any more, but, of course, my personal acquaintances are hardly a random sample. I suppose it depends on the field you are in.

I have never used R.

Some people think this is a political statement about being a renegade. Others think it’s because I’m too old to learn new things or in subservience to corporate overlords or some other interesting explanation. (The Invisible Developer, who has been reading over my shoulder, says he never got past C, much less D through Q.)

Since I fairly often get asked why not, I will tell you the real reasons, which is a complete digression but this is my blog so there.

  1. In my spare time that I don’t have, I teach Multivariate Statistics at a university that uses SAS. Since I’m using SAS in my class anyway and need real life data for examples, when a client has licenses for multiple packages and doesn’t care what I use (almost always the case), I use SAS.
  2. About the time that R was taking off, my company was also taking off in a different direction. The Invisible Developer and I own the majority of 7 Generation Games which is an application of a lot of the research done by The Julia Group. When we started developing math games, we needed to learn Unity, C#, PHP, SQL, JavaScript, HTML/CSS. We also needed to analyze the data to assess test reliability, efficacy, etc. I called the analysis piece and told The Invisible Developer I was interested in all of it so I’d do whatever was left. He was really interested in 3D game programming so he did the Unity/C# part. I did everything else. Then, after a few years, I moved to Chile, where the language I most had to improve was my Spanish.
Games in Spanish, English and Lakota

It worked out for me. We have a dozen games available from 7 Generation Games and now we’re coming out with a new line on decision-making.

I mention all this because I want to emphasize there isn’t a single path to succeeding as a consultant. There isn’t a specific language or package you have to learn. There is one thing you absolutely must have, though, and that’s the next post.

* (See Warner, S. L. Sigmund Freud and Money. (1989) Journal of the American Academy of Psychoanalysis. Winter;17(4):609-22)

Jan 6

I need to get a data set into SAS for a course I’m teaching in March. Students like real data and some kind folks were willing to allow their de-identified data to be used. Win-win.

How did I get this data? In a SAS data set with a handy code book? Oh, very funny!

A bunch of people laughing
This blog will pause while we all laugh at your naivete

I received a login to phpMyAdmin, where the data, which were definitely not created for my personal convenience, reside.

First, I downloaded the data as CSV for Excel. This gave me a file where everything was like this.

"re_apply";"consumer_id";"email";"counselor";"gender";"date_of_birth";"age";"primary_disability";"secondary_disability";"education";"member";"tribe";"district";"job_when_entered";"if_job_earnings";"earnings_type_before";"referral";"other_refer";"application_date";"assessment_date";"eligibility_date";"ipe_date";"notify_rights";"vocational_goal";"state_vr";"status";"closure_date";"status_type";"employment_date";"type_employ";"start_job_earnings";"post_job_earnings";"earnings_type_after";"intermediate_goal";"semesters";"int_completed";"intermediate_date";"last_contact_date";"comment"

Yes, one long line with everything in quotes and every column separated by a semi-colon. These are the column names but all of the data are in this exact same format as well.

Of course, you COULD upload this file and read it into SAS but that would take time and effort.

OR you can download a regular CSV or ODS spreadsheet file, just pick one of the other options, and then all your data would be in nice columns but you have no header row. Of course, it’s pretty easy to write an INPUT statement in SAS. You just need to type in a hundred or so variable names and be sure to have the format correctly specified. Not hard, but it will probably take you more than a minute.

You COULD theoretically download the data as an SQL file and use SQL connect according to some smart and less lazy people on SAS Communities.

And, as some of the posters noted on that forum, not everyone has access to SQL connect.

Or, you could be a completely lazy person like me and fix it all in about 12 clicks.

Here is how:

  1. Download the file as CSV for Excel
  2. Copy and paste the first line, the header, into Word
  3. Replace all the quotes with nothing, using Replace from the EDIT menu
  4. Under the TABLE command select “CONVERT TEXT TO TABLE”. For “Separate text at”, click on Other and put in a semi-colon.
  5. Now you have all of your column names in nice columns as a table, copy that.
  6. Download the file again as CSV
  7. Insert a row and paste the column names you copied in Step 5
  8. Save that file as an Excel file
  9. Upload it into SAS Studio
  10. Under TASKS and UTILITIES, select IMPORT DATA, drag your file you uploaded to the window and click on the little running guy.
Converting from text to table
Click 8 or So
Tasks and utilities
All that’s left is to drag the file
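If you would rather script the header clean-up than do it in Word, here is a minimal sketch of the same idea as steps 2 through 4: strip the quotes and split on the semi-colons. The function name splitQuotedLine is hypothetical, the sample header is shortened to the first few column names, and it assumes no value contains a semi-colon or an embedded quote.

```javascript
// Turn one semicolon-delimited, quoted line into an array of clean values.
// Assumes no field contains a semicolon or an embedded quote.
function splitQuotedLine(line) {
  return line
    .split(";")
    // Strip straight or curly quotes from both ends of each field.
    .map((field) => field.trim().replace(/^["“”]+|["“”]+$/g, ""));
}

// Shortened sample using the first few column names from the export.
const header = '"re_apply";"consumer_id";"email";"counselor";"gender"';
const names = splitQuotedLine(header);

// Rejoin with commas to get an ordinary CSV header row.
console.log(names.join(","));
```

The same function would work on the data rows, under the same assumptions. A free-text column like the comment field is exactly where those assumptions are most likely to break, so check a few rows by eye.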

Sep 30

The famous statistician, F.N. (for Florence Nightingale) David, was a professor at UC Riverside, where I earned my doctorate. My advisor told this story about her:

We were on this dissertation committee – I forget if it was for biology or what, back then, this was a small campus so if you were in statistics you could end up on any committee. So, he gets to the end of his defense, and F.N. David pulls the cigar out of her mouth and says,

“Young man, you believe your numbers far too much.”

The point Dr. Eyman was trying to make to me was that even if you have done every single computation perfectly …

“The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases.”

– Josiah Stamp

What is a conscientious statistical consultant to do?

Start with getting to know your data better than God knows the Bible. Let’s start with analyzing secondary data, for example, IPEDS, that has already been collected. I’ll talk about collecting your own data later. Let me just put in a plug for doing it electronically if possible. Also, make sure your data entry staff know which is the intervention and which is the control group. (You think I’m kidding but I’m not.)

Secondary data analysis: Read the documentation!

You think that is obvious, do you? IPEDS is the Integrated Postsecondary Education Data System, collected by the National Center for Education Statistics. It is my favorite type of data set and the type you almost never get. It includes pretty much the entire population of interest.

If you don’t know these things, you don’t know your data:

This isn’t all you need to know. We’ll talk about specific variables next.

One reason I like IPEDS is that you can be pretty sure everyone reported data because it’s mandatory for any institution that gets federal financial aid. It also includes the U.S. service academies, which are about the only post-secondary institutions that don’t. It also gives you a SAS program for reading the data after you upload it. There are also SPSS and STATA programs.

Another thing I liked about IPEDS is it is, inside and out, one of the best documented data sets I’ve seen. I’d recommend it as an example of how to do things if you are going to be creating data sets for secondary analysis yourself. Don’t get used to it, though, because most of what you’ll find in your career is far worse than this. Here is just a simple example from one data set.

*** Created:    October 2, 2018                                ***;
 *** Modify the path below to point to your data file.        ***;
 ***                                                          ***;
 *** The specified subdirectory was not created on            ***;
 *** your computer. You will need to do this.                 ***;

If you want to analyze it using SAS Studio, now you know that once you’ve uploaded the data, you do need to change the INFILE statement. If you don’t know the full path, ctrl-click (Mac) or right-click (Windows) on the data file and select PROPERTIES.

Select Properties to get the path to your file

Change the INFILE statement to what you see in the path, so now it looks like this:

infile '/home/your_directory/IPEDS/hd2017.csv' delimiter=',' DSD MISSOVER firstobs=2 lrecl=32736;

You won’t necessarily need the delimiter and the other options. It depends on your file. Okay, run it, you have data. Awesome!
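If you are wondering what those INFILE options are doing, here is a rough sketch in Python (not SAS) of the same behavior. It assumes the usual meaning of the options: firstobs=2 skips the header line, DSD reads two commas in a row as a missing value and allows commas inside quoted strings, and MISSOVER fills in missing values when a line runs short. The sample rows and the function name read_like_infile are made up for illustration. They are not real IPEDS records.

```python
import csv
import io

# Made-up sample rows for illustration only, not real IPEDS data.
SAMPLE = (
    "UNITID,INSTNM,CITY\n"
    '100001,"Example College, Main Campus",Springfield\n'
    "100002,,Shelbyville\n"
    "100003,Short Row College\n"
)


def read_like_infile(text, n_columns):
    """Read comma-delimited text roughly the way the INFILE options would."""
    reader = csv.reader(io.StringIO(text), delimiter=",")
    next(reader)  # skip the header line, like firstobs=2
    rows = []
    for fields in reader:
        # Empty field between two delimiters becomes missing, like DSD.
        fields = [value if value != "" else None for value in fields]
        # Short line: pad the remaining variables with missing, like MISSOVER.
        fields += [None] * (n_columns - len(fields))
        rows.append(fields)
    return rows


rows = read_like_infile(SAMPLE, 3)
for row in rows:
    print(row)
```

The quoted comma in the first row stays inside the institution name instead of splitting it into two fields. That is the main reason you would want DSD on a file like this.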

When I run frequencies for the IPEDS data, I get 7,153 institutions but the IPEDS methodology report says there are 6,642. What the hell? Looking through the data, I find that 287 institutions were closed in 2017 or earlier. Another 38 were combined with another institution or were not to be included for some unspecified “out of scope” reason. Since I’m only interested in individual, active institutions for the research I’m doing, I’m dropping those. There were also 41 that were “not primarily post-secondary institutions”, so I dropped those as well.

There were 88 institutions that were new in 2017 or had their Title IV (financial aid) eligibility restored. After debating back and forth, I decided to drop those, too. My interest is in developing a baseline of enrollment and retention, which these new institutions will only have for one year.
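It is worth adding these counts up before moving on. Here is a quick tally, sketched in Python rather than SAS. The numbers are the ones quoted above and the category labels are just shorthand for this example. They are not IPEDS variable names.

```python
# Counts quoted in this post.
TOTAL_IN_FILE = 7153
METHODOLOGY_REPORT_COUNT = 6642

# Shorthand labels for the exclusions described above (not IPEDS variable names).
excluded = {
    "closed in 2017 or earlier": 287,
    "combined or out of scope": 38,
    "not primarily post-secondary": 41,
    "new or Title IV eligibility restored in 2017": 88,
}

itemized = sum(excluded.values())
gap = TOTAL_IN_FILE - METHODOLOGY_REPORT_COUNT

print(f"Itemized exclusions: {itemized} ({itemized / TOTAL_IN_FILE:.1%})")
print(f"Gap vs. methodology report: {gap} ({gap / TOTAL_IN_FILE:.1%})")
```

The four categories listed sum to 454 institutions, while the gap between the file and the methodology report is 511, so the categories listed here don't cover all of the difference.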

My point is that I’ve gotten one of the best data sets you could ever find and 7% of the data are inappropriate for my purpose. Does it matter as long as 93% of the data are correct? Well, I definitely think that my results would be less accurate if I left that 7% in.

My second point is that there is not anything “wrong” with the IPEDS data. I can imagine plenty of circumstances in which one would want to have the data on closed institutions.

These may seem like details, but I am pretty convinced that if you are not a “detail person” you are never going to make it in the long run as a statistical consultant. These details add up fast.

One last thought – if 7% of the data needed to be tossed out before we even got started, and this is an extremely well-funded, well-designed data set, what do you think the average secondary analysis is going to be like?

Jul

16

A Twitter storm erupted recently in response to one person’s thread about how to find a 10x engineer. Since I started programming FORTRAN with punched cards back in 1974, was an industrial engineer in the 1980s and now run a software company, I’ve worked with a few people who were, rightly or wrongly, considered to fall into that category. So, I thought I’d weigh in on the original author’s points.

10X Engineers hate meetings

There are only two types of software developers who don’t dislike meetings. New developers don’t mind meetings too much because they have a lot of questions like “Who do I talk to if I need access to this repository?” or “What version of Unity was used to develop this game I’m supposed to update?” They also have specific questions about why the sound function they wrote is not working and Bob, who wrote similar functions for another game, is sitting right there. Another type of developer actually likes meetings because he is complete shit at his job and it gives him an excuse not to be expected to do it.

Every other engineer I have ever met either dislikes meetings or actively hates them. The ones you think don’t dislike meetings are just pretending.

We have a 10-minute meeting every morning at 7 Generation Games. People not in the office drop in online. Everyone complains about it but we do it anyway. Why? Because, for example, I can find out that Adekola actually finished the teacher reports for Making Camp Premium before he left and see an example. Then, I can tell the people in marketing to include that in their discussions with schools. I can also tell one of the developers to take that code and modify it for Tribu Matemática, the Spanish version of Making Camp. In 3 minutes, everyone knows what the reports look like, that they are available and who is working on the next one. This leaves seven minutes for José to ask Bob about the sound function.

10X Engineers have irregular hours and work when other people aren’t around

I can’t think of any software developers who work better when other people are around. Writing code for anything complex requires having a mental model in your head of at least the part you are writing and, hopefully, some of the larger project in which it is used.

I’ve worked with a few people who hit it out of the park better than anyone else. One definitely was a late night person and preferred to get to work when he got there. However, when crunch time came, he could work 8am to 10pm and code all that time if he had to do it. He wasn’t going to like it, though. Who would?

On the other hand, about half of the really top engineers I know – both software and hardware – choose to work 9 to 5, even when telecommuting. The main reason they give is that those hours allow them to spend time with their children or spouse. Contrary to popular belief, the 10x engineers I’ve known tended to be married, although they did seem to get married a little older than the average.

10X Engineers know every line of code that has gone into production

This is just nonsense. I remember when SAS was rewritten in C (yes, I am that old) and hearing that it was something like 3,000,000 lines of code. I am assuming the author meant that these 10x engineers know every line of code WRITTEN BY THEM that has gone into production.

I don’t believe that, either, assuming what he means is that they can recall it immediately and say,

“Yes, in that function beginning on line 683, I pause the audio that’s playing, change the source file to the audio for the next scene, change the image file for the image for the next scene, increment the counter by one and restart the audio”.

If what he means is that they kind of recognize it, like that person you met at a conference two years ago whose name you are trying to remember, I might faintly agree.

We wrote Spirit Lake: The Game in 2012-2014. NO ONE who worked on that game remembers all of the code in it. I can say this because it was all done by me and The Invisible Developer and he is as good as you’ll ever find.

Here is an experience I share with every software engineer I have ever met, including the very best ones. I look at code and think,

“Who wrote this crap? Please don’t let it be me three years ago.”

10x engineers laptop screen background color is typically black (they always change defaults). Their keyboard keys such as i, f, x are usually worn out faster than of a, s, and e (email senders).

The “they always change defaults” part is true. One thing all of the best engineers I ever met had in common, for sure, is that they like to mess with things. I only knew two people who had black backgrounds – ever. When I have time I’ll have to post about pseudo-10x engineers. Anyway, neither of those guys is anything special unless weirdness is a category.

Most of the best people I know have either pictures of their family or their favorite activity, like soccer or hiking, or a vacation photo as a background. Usually the e key gets worn out first because it is the most common letter in the English language. People usually name directories, datasets and variables something comprehensible.

My kids and one of my kid's dog. What real 10x engineers laptop backgrounds look like
Their laptop backgrounds look like this, except with their own kids, not my kids, because that would be weird

Is there anything true about a 10x engineer?

Since my 10x merit badge hasn’t come in the mail yet, I don’t have time to address all 10 points from the original thread. There were two points he made that were consistent with my experience.

Most really good engineers aren’t really good interviewers

I could only speculate about why that is true, so I will just say that it is what I’ve observed. Maybe it’s because they are uncomfortable with exaggeration or with being asked to prove their competence.

10x engineers rarely job hunt

I have found this totally to be true and it makes sense. If you have someone that good in your organization and your management is not made up of complete morons, they are doing all they can to hang on to their best people. Usually, unless they work for morons, people that good are hard to hire away, too, because their current company is doing its best to keep them.

How would I find a 10x engineer?

I wouldn’t, because we are a small company and we can’t afford to pay what someone like that is worth. On rare occasions, we have been super lucky to be able to catch someone great for a short term contract that they just wanted to take for personal reasons.

We find good people and we develop them to be at the top of their field. I think the best way to identify a good software developer in an interview is to take a look at their code. Ask them to bring something to the interview and explain how they solved particular problems in the code. Ask why they made the choices they did. If it is a project they know well and are proud of, you’ll get a lot of information. If they say, “I don’t know” a lot, that’s a bad sign. I’ve also found that people who typically “don’t interview well” forget about the interview part, focus on the project and become interested in telling you all about it.

Oh, and for all those people on Twitter who said, “I wish you all got as exercised about diversity and inclusion as you do about 10x engineers”

Well, I am way ahead of you, sister. I have a lot to say about women in tech here, and over on our 7 Generation Games blog, too.

May

27

What are reusable blocks and why do you want to use them?

This can best be explained by an example. Over at 7 Generation Games, we have a new project under way to organize the hundreds of videos, presentations and activities we’ve developed with our games into a teacher resource site. Most of these fall into one of a few categories. For example, we have 19 math videos from Fish Lake.

Whenever a lot of posts have the exact same structure, you have a use for reusable blocks.

Take a look at this post on the Fractions on a Number Line video.

  1. It has a subheading (h3 tag) with the main point of the video.
  2. This is followed by a short paragraph describing the video, with a background color of light blue.
  3. Next is the video. IMPORTANT – although Gutenberg does allow you to just enter a url and hit return for a video to be shown in a regular post, I found this did NOT work for reusable blocks. When I used embed instead, it worked fine.
  4. After the video is a heading (h2 tag) telling you this video is from an awesome game we make.
  5. Next are two links, one for getting the game for computers,
  6. Another link for the app store for iPads.
  7. Then there is an image from the game and
  8. A short paragraph describing the game.

How to create a reusable block

Select everything you want in the reusable block. In my case, it is all 8 of those blocks listed above. Then, click on the 3 dots at the top of the block menu and in the drop down menu select ADD TO Reusable blocks.

Give it a name, save it and now you have a reusable block.

How to use a reusable block

A reusable block as “copy-and-paste”

You have two options. The first is to just use the block as-is. Say, I just wanted to include an ad in blog posts, or some call to action, like signing up for our newsletter. Then, I could just insert that block like I do any other block – paragraph, image, etc. and this would be pre-populated with the content. Done.

Reusable blocks as templates

The more common option for me is going to be to modify that block, using it as a template. So, I insert the block just like I do any other block. Then, I click on the block I just inserted and select convert to regular block.

Don’t forget to convert to regular block or your edits will be made everywhere you used that block!

Now that I have it converted to a regular block, I can change the first heading, the description and paste in the url for the new video. My post is done. Not only does this save me time, but if I want to hand the task off to someone else, say a new intern, they have a ready-made format.

Another advantage is if I do need to change something everywhere, I can do it with one click. A few years ago, the site we had been using to sell our Mac and Windows games went out of business. It would have been really helpful to have had something like this so that I did not have to go in and change every page where there was a link to the old site.

So, yeah, reusable blocks have converted me to Gutenberg. (Converted, get it? Oh, never mind.)

Fish Lake fractions game with Native American girl stepping on stones across a creek

Like math? You’ll love this game.

Get Fish Lake here for Mac or Windows

or … want Fish Lake for iPad? Get it in the app store
