# When are you done? The big question for entrepreneurs

October 7, 2014

Every day, every week, I face the same question that all entrepreneurs ask themselves –

“How do you know when you are done?”

Most days, I start work around 10 am and finish about 14 hours later. Usually, I take off an hour for lunch and an hour for dinner, or take a few hours in the middle of the day to get away from the office. Sunday, it was taking my grandchildren to the Natural History Museum and the park. I average 10-11 hours a day, seven days a week. Even then, there is no end in sight to the tasks I want to accomplish, goals I want to achieve. When there’s no time clock to punch, no boss looking over your shoulder, how do you decide when it’s time to hang it up for the day?

One answer is when you are just exhausted and making more mistakes than you are progress. Frankly, the prospect of just working every night until I fall asleep from exhaustion isn’t very appealing. I did that in the year after my husband died, and even though it was probably a preferable (and more profitable) way of coping than drinking myself into a stupor every night, I can tell you that it’s not a lifestyle I would recommend. The reality is that there is never, ever going to be a day at the office when I say,

Okay, that’s it. No more work to do here. Time to head to the beach.

Some people (who are not me) would say that you should take off to celebrate achievements. For example, last week, I

- found out that a project we had worked on for a client had been wildly successful,
- submitted a grant proposal to create a game for English language learners, including receiving written agreement from teachers in three school districts in three different states to assist with development,
- finished 1/4 of the lectures for a course I will be teaching soon,
- made major improvements in one level of the Fish Lake game, which we will be able to use for Spirit Lake as well,
- found out that a huge school district is now using Spirit Lake,
- renewed a consulting contract,
- created CSS to improve our web pages in the Fish Lake game,
- did the usual stuff of attending meetings, approving payroll, answering email, reviewing staff tasks on Basecamp, updating a few things in the company wiki and approving a couple of employment contracts.

And all of this was accomplished despite having spent all of Monday in airports and on planes, flying back from Kansas City, where I had been coaching a judo team of seven students from Gompers Middle School. So … did I take off early? No, because I still needed to

- submit a revised budget for a contract,
- submit another revised budget for a grant,
- rewrite the PHP for a client database,
- get ready for an investors’ meeting,
- figure out what is wrong with the gravity in one level where the player is literally walking on air.

My unhelpful point here is that I DON’T necessarily take off to celebrate, and I definitely don’t take off when I have something that could be very important to our company, like a meeting with potential investors during which I want to get as much information as possible (and not look like an idiot) for that time down the road when we do need to bring in outside investors.

What I DO try to do is stop working by midnight every night. There just seems to be something dysfunctional about not leaving the office the same day you came in, even if you come in at 10 a.m. I don’t take off to celebrate so I can take off when I feel that I need a break.

One thing I can guarantee you for an absolute fact is that you will be less effective if you don’t get enough sleep. You’ll make mistakes you never would have made if you were not so tired. Knowing this, another reason that I try to quit working at midnight is so I can be asleep by 2 a.m. That gives me 8 hours to sleep before I get up and hit it again at 10 a.m.

Staying up until 5:30 a.m., as The Invisible Developer sometimes does, strikes me as counter-productive. You’re just going to sleep later the next day, so why not just go to sleep now and start up again when you are rested enough to be more effective? Even if I do say so myself, this post I wrote about doing one more thing before you go to bed is worth reading. Often, that one more thing will be to make the list of the things that are a priority for tomorrow. I then can knock off with confidence that I’ll get on those things first thing the next day.

I work hard, I work a lot, but I have learned not to make myself crazy trying to get everything done, because … **at the end of the day, there’s another day. That’s how time works.**

# How much matrix algebra do statistics students REALLY need?

October 6, 2014

Following a discussion using matrix algebra to show computation in a Multivariate Analysis of Variance, a doctoral student asked me,

“Professor, when will I ever use this? Why do I need to know this?”

He had a valid point. I’m always asking myself why I’m teaching something. Is it because it interests me personally, because it is in the textbook or because students really need to know it?

Let’s take some things about matrix algebra we always teach students in statistics.

*What conformable means and why it might matter*

Two matrices are conformable if they can be multiplied together, that is, if the number of columns in the first matrix equals the number of rows in the second. When you multiply two matrices, the first row of the first matrix is multiplied, element by element, by the first column of the second matrix. You sum the products and that is the first element in the product matrix. You repeat this until you have multiplied all of the rows in the first matrix by all of the columns in the second.

So, you can multiply a 3 x 2 matrix by a 2 x 4 matrix but not vice versa.

Multiplying a matrix of dimension a x b by a matrix of dimension c x d (which requires b = c) will give you a resulting matrix with a rows and d columns, that is, of dimensions a x d .

This can give you results that sometimes seem counter-intuitive, like that the product of a 3 x 1 matrix and a 1 x 3 matrix is a 3 x 3 matrix, while the product of a 1 x 3 matrix and a 3 x 1 matrix is a single number (a 1 x 1 matrix).

It may seem weird that the result of matrix multiplication can either be a larger matrix than both of the matrices you multiplied, or smaller than both of them, but there it is.

If both matrices are square, that is, of dimension n x n, then the resulting product will also be an n x n matrix.

And, of course, any matrix can be multiplied by its transpose because the transpose of an m x n matrix will always be n x m .
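Here is a minimal JavaScript sketch of the conformability rule and the row-by-column multiplication described above. The function name `multiply` and the sample numbers are made up for illustration; matrices are assumed to be stored as arrays of rows.

```javascript
// Multiply two matrices stored as arrays of rows.
// Throws if they are not conformable (columns of a !== rows of b).
function multiply(a, b) {
  const aRows = a.length, aCols = a[0].length;
  const bRows = b.length, bCols = b[0].length;
  if (aCols !== bRows) {
    throw new Error(
      `Not conformable: ${aRows} x ${aCols} times ${bRows} x ${bCols}`
    );
  }
  const product = [];
  for (let i = 0; i < aRows; i++) {
    const row = [];
    for (let j = 0; j < bCols; j++) {
      // Sum of products of row i of a and column j of b
      let sum = 0;
      for (let k = 0; k < aCols; k++) {
        sum += a[i][k] * b[k][j];
      }
      row.push(sum);
    }
    product.push(row);
  }
  return product;
}

const rowVector = [[1, 2, 3]];        // 1 x 3
const columnVector = [[4], [5], [6]]; // 3 x 1

const small = multiply(rowVector, columnVector); // 1 x 1 result
const large = multiply(columnVector, rowVector); // 3 x 3 result
console.log(small, large);
```

The same two matrices give a result smaller than both of them in one order and larger than both in the other, which is the counter-intuitive part.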

If a square matrix is of full rank, it means that none of the rows are linearly dependent. If you DO have linear dependence, it means you have redundant measures. Now, I could go on to prove this mathematically and all of it is very interesting to me.

I question, though, whether you really need to know anything about matrix algebra to understand that redundant measures are a bad thing.

Do you need matrix algebra to explain that we are going to apply coefficients (do you even need to refer to it as a vector?) to the values of each variable for each record and get a predicted score such that

predicted score = b0 + b1X1 + b2X2 + … + bnXn
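To show how little machinery that takes, here is a sketch in JavaScript with no matrices in sight. The function name `predictedScore` and the coefficient values are hypothetical, chosen only to make the arithmetic easy to follow.

```javascript
// Predicted score = b0 + b1*X1 + ... + bn*Xn for a single record.
// coefficients: [b0, b1, ..., bn]; values: [X1, ..., Xn]
function predictedScore(coefficients, values) {
  const [intercept, ...slopes] = coefficients;
  if (slopes.length !== values.length) {
    throw new Error("Need exactly one coefficient per variable, plus b0");
  }
  return slopes.reduce((sum, b, i) => sum + b * values[i], intercept);
}

// Hypothetical coefficients b0 = 2, b1 = 0.5, b2 = 3 and one record
const exampleCoefficients = [2, 0.5, 3];
const exampleRecord = [10, 1];
console.log(predictedScore(exampleCoefficients, exampleRecord)); // 2 + 5 + 3
```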

When I was in graduate school, calculators that did statistical analyses, even as simple as regression, cost a few hundred dollars which was the equivalent of three months of my car payment. Computer time was charged to your department by the hour. So … my first few courses, I did all of my homework problems using a pencil and paper, transposing and inverting matrices – and it was a huge pain in the ass.

Then, I got a job as a research assistant and one of the perks was hours of computer time. I thought I’d died and gone to heaven. It took me less than half an hour to get all of my homework done using SAS (which ran on a mini-computer and spit out printouts that I had to walk across campus to pick up).

My students are learning in a completely different environment. So … do they need to learn the same things in the same way I did? This is a question I ponder a lot.

# USDA is the biggest proponent of women in tech

October 3, 2014

**I’m pretty certain that I’m a woman in technology.**

Last night, I was using SAS on a virtual machine through a remote desktop connection to prepare data from the National Hospital Discharge Survey for use in examples of MANOVA and multinomial logistic regression.

Today, I was working on improving animation in the JavaScript for a browser-based game that leads into the 3-D portion of an adventure game I designed to teach fractions.

Next week, I will start on a contract to completely re-do the PHP/MySQL database for a client to bring it to something more secure and up to date.

Oh, and I also was reviewing my notes for the graduate courses in biostatistics and advanced multivariate statistics that I’m teaching this fall.

Pretty certain that by any standard – writing code, founding companies, graduate degrees, university appointment, successful Kickstarter – I am definitely a woman in tech/STEM, whatever the day’s buzzword.

I read SO many articles, blog posts, tweets about the need for women in tech, women-led start-ups, women entrepreneurs.

*If you ask me, the U.S. Department of Agriculture is the greatest proponent of women in tech that there is, because they have actually put up money and funded us to do a prototype of an adventure game that teaches math.*

When results from that were positive, they funded us again with a Phase II Small Business Innovation Research award to develop the games for commercialization.

I have written here before about the troubling nature of the Black Girls Code, Latina Girls Code emphasis that seems to completely overlook the grown women who are here now. I am NOT saying those aren’t good programs. I assume they are but I have no personal experience. What I am saying is pretty much what I said in January.

It seems to me that when people are looking at minorities or women to develop in their fields, they are much more interested in the hypothetical idea of that cute 11-year-old girl being a computer scientist some day than of that thirty-something competing with them for market share or jobs. If there are venture capitalists or conference organizers or others out there that are sincerely trying to promote WOMEN who code, not girls, I’ve never met any.

(Since then, I have met a couple of conference organizers.)

I suppose Ada Lovelace was cool – my two-year-old granddaughter has a shirt with her picture on it. Still, I don’t think a trending hashtag of #fuckyeahadalovelace did anything for me as a woman in tech.

**You know what helped me as a woman in tech? Seed money from the USDA.** You can see what we did with it here at our 7 Generation Games site.

One thing Sheryl Sandberg got right in her book, Lean In, was that women tend to be judged on their accomplishments where men are judged on their potential. Of course, you also don’t want to be “too old” to be an innovator so by the time women have those accomplishments, they are past their prime as entrepreneurs according to those VCs who believe that people over 30 are too old to do a start-up.

It’s hard for me to complain about my life when my morning starts out with reading technical books with lines like, *“Figure 1 shows the sprite with the red and green blood particles for player and zombie”.*

My point is that our company is in the situation we are in not because of any “help minorities code” program but because USDA and our backers on Kickstarter gave us cold, hard cash to develop our products.

Want to help women in tech? Back them on Kickstarter. Buy their products. Tweet about their products and companies to help their marketing. Invest in their companies.

**USDA got it right.**

Thank you.

# SAS Tricks for Massaging Data into Shape

October 3, 2014

Today, I was thinking about using data from the National Hospital Discharge Survey to try to predict type of hospital admission. Is it true that some people use the emergency room as their primary method of care? Mostly, I wanted to poke around with the NHDS data and get to know it better for possible use for teaching statistics. Before I could do anything, though, I needed to get the data into a usable form.

I decided to use as my dependent variable the type of hospital admission. There were certain categories, though, that were not going to be dependent on much else, for example – if you are an infant born in a hospital, your admission type is newborn. I also deleted the people whose admission type was not given.

The next question was what would be interesting predictor variables. Unfortunately, some of what I thought would be useful had less than perfect data. For example, for discharge status, about 5% of the patients had a status of “Alive, disposition not stated”.

I also thought either diagnostic group or primary diagnosis would be a good variable for prediction. When I did a frequency distribution for each it was ridiculously long, so I thought I would be clever and only select those diagnoses where it was .05% or more, which is over 60 people. Apparently, there is more variation in diagnosis than I thought because in both cases that was over 330 different diagnoses.

Here is a handy little tip, by the way –

```sas
/* Save the frequency table to a dataset instead of printing it */
PROC FREQ DATA = analyze1 NOPRINT ;
  TABLES dx1 / OUT = freqcnt ;
RUN ;

/* Print only the rows above the cutoff percentage */
PROC PRINT DATA = freqcnt ;
  WHERE PERCENT > 0.05 ;
RUN ;
```

This will print only the diagnoses that occurred more than the specified percentage of the time.

Then I thought, what about the diagnoses that were at least .5% of the admissions? So, I re-ran the analyses with 0.5 and came up with 41 DRGs. I didn’t want to type in 41 separate DRGs, especially because I thought I might want to change the cutoff point later, so I used a SAS format, like this. Note that in a CNTLIN dataset, which I am creating, the variables MUST have the names fmtname, label and start.

Also, note that the RENAME statement doesn’t take effect until you write out the new dataset, so your KEEP statement has to have the old variable name, in this case, drg.

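```sas
Data fmtc ;
  set freqcnt ;
  if percent > 0.5 ;          /* keep only DRGs above the cutoff */
  retain fmtname 'drgf' ;     /* name of the format to create */
  retain label "in5" ;        /* formatted value for every kept DRG */
  rename drg = start ;        /* CNTLIN needs the value in START */
  keep fmtname drg label ;    /* old name here; RENAME applies on output */
run ;
```

Okay, so, what I did here was create a dataset that assigns the formatted value of in5 to every one of my diagnosis related groups that occurs in .5% of the discharges or more.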

To actually create the format, I need one more step

```sas
proc format cntlin = fmtc ;
run ;
```

Then, I can use this format to pull out my sample

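```sas
DATA analyze2 ;
  SET nhds.nhds10 ;
  IF admisstype in (1,2,3) ;   /* drop newborns and admission type not given */
  /* Flag records with a usable discharge status and a common DRG */
  IF dischargestatus in (1,3,4,6) & PUT(drg,drgf.) = "in5" then insample = 1 ;
  ELSE insample = 0 ;
RUN ;
```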

I could have just selected the sample that met these criteria, but I wanted to have the option of comparing those I kept in and those I dropped out. Now, I have 71,869 people dropped from the sample and 59,743 that I kept. (I excluded the newborns from the beginning because we KNOW they are significantly different. They’re younger, for one thing.)

So, now I am perfectly well set up to do a MANOVA with age and days of care as dependent variables. (You’d think there would be more numeric variables in this data set than those two, but surprisingly, even though many variables in the data set are stored as numeric they are actually categorical or ordinal and not really suited to a MANOVA.)

Anyway …. I think that MANOVA will be one of the first analyses we do in my multivariate course. It’s going to be fun.

# Matrix Algebra, Just Because

October 2, 2014

I was talking to a friend of mine today who had taken a test for a new job recently and he had a hard time with the math portion of it. We were in college about the same time and he did perfectly fine in math, but it had been a while. This got me to thinking that I should review things like matrix algebra from time to time, just because it has been a while since I had any need to multiply a matrix without a computer. Well, actually, I can’t imagine that I will ever have such a need but since I’m teaching multivariate statistics and the textbooks generally have a lot of matrix algebra, I thought I should brush up on it whether I ever need it or not.

I had the normal equations for regression drilled into my brain in graduate school, and there was a time in my life, back when I actually had spare time, when I found solving systems of linear equations an amusing thing to do. All of that was a very long time ago.

So …. as I sit here thinking what do my students need to know, I run into the Goldilocks problem yet again. Nothing seems just right. Teaching multiplying a scalar by a matrix seems a waste of time, no matter how brief. All you do is multiply every number in the matrix by that value. Okay, got it.

They should know what an identity matrix is. This could actually have some useful implications in statistics. If your correlation matrix is close to an identity matrix, with 1s on the diagonal and 0s everywhere off the diagonal, then it tells you that your variables are uncorrelated. If you analyzed a matrix of random data, this is roughly what you would expect to get.

If you multiply a matrix by the identity matrix, I, you are going to get the original matrix as a result, hence the name, identity matrix.

IA = A

This is analogous to the identity property of scalar (that is, regular numbers, not matrices) multiplication that 1X = X

For a 2 x 2 matrix of this form

a b

c d

the determinant is equal to

(ad – bc)

To find the inverse of that same 2 x 2 matrix, multiply the reciprocal of the determinant, that is, 1 / (ad – bc), by the following matrix

d -b

-c a
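For anyone who would rather see the 2 x 2 case as code, here is a minimal JavaScript sketch. The function names `determinant2x2` and `inverse2x2` and the sample matrix are made up for illustration; a real statistics package handles larger matrices and near-zero determinants far more carefully.

```javascript
// Determinant of a 2 x 2 matrix [[a, b], [c, d]] is ad - bc.
function determinant2x2([[a, b], [c, d]]) {
  return a * d - b * c;
}

// Inverse of a 2 x 2 matrix: swap a and d, negate b and c,
// then multiply everything by 1 / determinant.
function inverse2x2(matrix) {
  const det = determinant2x2(matrix);
  if (det === 0) {
    throw new Error("Determinant is zero: this matrix has no inverse");
  }
  const [[a, b], [c, d]] = matrix;
  return [
    [d / det, -b / det],
    [-c / det, a / det],
  ];
}

const sampleMatrix = [[4, 7], [2, 6]];   // determinant is 24 - 14 = 10
const sampleInverse = inverse2x2(sampleMatrix);
console.log(sampleInverse);
```

Multiplying `sampleMatrix` by `sampleInverse` gives back the identity matrix (allowing for floating point rounding), which is a quick way to check the result by hand.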

Here is a really good Khan Academy video on finding the inverse of a matrix.

This is particularly important in statistics because you will occasionally get a message on your output that the “determinant is zero” and it would be helpful to you if you understood what that meant and why it was important.

One important point here is that you need the determinant to find the inverse of a matrix. For example, to find the vector of regression coefficients you would use this equation

b = (X′X)⁻¹ X′y

Notice here that you need to take the inverse of the product of the transpose of the X matrix and the X matrix. What if the determinant is zero? Well, you can’t divide by zero – SO THERE IS NO SOLUTION.

At this point, you want to start to chase down why the determinant is zero. Do you have redundant measures? Is there no variance in the sample?
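A small JavaScript sketch makes both of those culprits concrete. It assumes a design matrix with just two columns, so that X′X is 2 x 2 and its determinant can be computed by hand; the function name `crossProductDeterminant` and all of the data are made up for illustration.

```javascript
// For an n x 2 matrix X (array of [x1, x2] rows), compute the
// determinant of the 2 x 2 matrix X'X.
function crossProductDeterminant(x) {
  let s11 = 0, s12 = 0, s22 = 0;
  for (const [x1, x2] of x) {
    s11 += x1 * x1;  // sum of squares of column 1
    s12 += x1 * x2;  // sum of cross-products
    s22 += x2 * x2;  // sum of squares of column 2
  }
  return s11 * s22 - s12 * s12;
}

// Intercept column of 1s plus a predictor that actually varies
const healthy = [[1, 2], [1, 4], [1, 7]];

// Intercept column plus a predictor with no variance in the sample
const noVariance = [[1, 5], [1, 5], [1, 5]];

// Two redundant measures: the second is just twice the first
const redundant = [[2, 4], [3, 6], [5, 10]];

console.log(crossProductDeterminant(healthy));    // non-zero, inverse exists
console.log(crossProductDeterminant(noVariance)); // zero
console.log(crossProductDeterminant(redundant));  // zero
```

These examples use whole numbers so the determinant comes out exactly zero. With real data you would more often see a determinant that is extremely close to zero rather than exactly zero, and that deserves the same suspicion.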

All of this is very interesting to me personally, but aside from that, I keep asking myself whether the students really need an in-depth understanding of matrix algebra when it is all done by a computer. I really don’t know the answer to that, which is why I keep thinking about it.

# Flickering screens, stalled machines and not working like it used to

October 1, 2014

One definition of insanity is doing the same thing over and over, expecting different results. One thing that can drive programmers insane is doing the same thing over again and GETTING different results.

In a past life, working in tech support, I learned that whenever anyone calls and says,

I did it exactly like your example and it didn’t work for me.

– they are lying.

In my experience, when you have the same programming statements but get different results, something else is always different and that something is often the demands put on the system.

How can that be if your statements are the same? Let me give two examples, one using JavaScript and one using SAS.

**JavaScript**

I had made a game using canvas and HTML5. The game had three layers. The background was the bottom layer, some game objects that mostly stayed in the same place were the middle layer and the top layer was a game object that changed with each move. The init function on load drew all the layers. On update, all three layers were updated by calling one function. All was well.

```javascript
// On update, redraw all three layers
function drawAll() {
  draw1();
  draw2();
  draw3();
}
```

Then, I made another game the exact same way and I could not get rid of the screen flicker every time the player piece on the top layer moved. I tried clearing the canvas between each re-draw which had solved the problem in the past. Nope. What finally did work, in case you run into this problem yourself, is that I only drew the background in the init function and never re-drew it.

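```javascript
// Canvas elements and their 2D contexts, shared by the draw functions
let layer1, layer2, layer3, ctx, ctx2, ctx3;

function init() {
  layer3 = document.getElementById("layer3");
  layer2 = document.getElementById("layer2");
  layer1 = document.getElementById("layer1");
  ctx = layer1.getContext('2d');
  ctx2 = layer2.getContext('2d');
  ctx3 = layer3.getContext('2d');
  window.addEventListener('keydown', getkeyAndMove, false);
  startwall();
  // Draw all three layers once, on load
  draw1();
  draw2();
  draw3();
}

// On update, redraw only the layers that change; the background stays put
function drawAll() {
  draw2();
  draw3();
}
```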

Problem solved. My conclusion was that the second program involved a lot more complicated drawing of objects. Instead of just placing an image here or there, the program needed to compute collisions, read lines from an array, draw objects and the time it took was noticeable.
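If you have more than a couple of layers, the same idea can be written so that each layer says whether it ever needs redrawing. This is only a sketch of that pattern, not the code from either game: `createLayerRenderer`, the `isStatic` flag and the counters standing in for real canvas drawing are all hypothetical.

```javascript
// Build a renderer for stacked canvas layers.
// Each layer is { draw: function, isStatic: boolean }.
function createLayerRenderer(layers) {
  return {
    // Call once on load: draws every layer, including the background.
    init() {
      layers.forEach((layer) => layer.draw());
    },
    // Call on every move: skips layers that never change.
    update() {
      layers
        .filter((layer) => !layer.isStatic)
        .forEach((layer) => layer.draw());
    },
  };
}

// Counters stand in for real drawing code so the effect is visible
const drawCounts = { background: 0, objects: 0, player: 0 };
const renderer = createLayerRenderer([
  { draw: () => { drawCounts.background += 1; }, isStatic: true },
  { draw: () => { drawCounts.objects += 1; }, isStatic: false },
  { draw: () => { drawCounts.player += 1; }, isStatic: false },
]);

renderer.init();    // page load
renderer.update();  // first move
renderer.update();  // second move
console.log(drawCounts);
```

After one load and two moves, the background has been drawn once and the other two layers three times each, so the expensive background work happens only at start-up.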

**SAS**

Several times I have written a program that worked wonderfully on a high performance computing cluster but crashed on my laptop, or failed on SAS OnDemand but worked beautifully on my desktop. The difference in all of those cases was that the processing requirements exceeded the capabilities of the machine. All is not lost in those cases. One pretty obvious but not always feasible solution is to use a different machine. When that isn’t an option, there are workarounds. For example, if I wanted students to analyze an enormous dataset, I could have them analyze the correlation matrix instead of trying to load a 100 GB dataset – but that is another post.
