Dec 31

I read a lot. This year, I finished 308 books on my Kindle app, another dozen on iBooks, a half-dozen on ebrary and 15 or 20 around the house. I don't read books on paper very often any more. It's not too practical for me. I go through them at the rate of about a book a night, thanks to a very successful speed reading program when I was young (thank you, St. Mary's Elementary School). Don't be too impressed. I don't watch TV and I read a lot of what a colleague called "junk food for the brain". I read a bunch of Agatha Christie novels, three Skulduggery Pleasant books and several of the Percy Jackson and the Olympians books. Yes, when it comes to my fiction reading, I have the interests of a fourteen-year-old girl. Trying to read like a grown-up, I also read a bunch of New York Times bestseller novels and didn't like any of them.

So, I decided to do my own “best books list” based on a random sample of one, me, and make up my own categories.

Because I taught a course on multivariate statistics, I read a lot of books in that area, and while several of them were okay, there was only one that I really liked.

The winner for best statistics book I read this year …

Applied logistic regression, 3rd Edition, by David Hosmer, Stanley Lemeshow and Rodney Sturdivant.

I really liked this book. I'm not new to logistic regression, but I'm always looking for new ideas and new ways to teach, and this book was chock full of them. What I liked most about it is that they used examples with real data, e.g., when discussing multinomial logistic regression, the dependent variable was type of placement for adolescents, and one of the predictor variables was how likely the youthful offender was to engage in violence against others. It is a very technical book and if you are put off by matrix multiplication and odds ratios, this isn't the book for you. On the other hand, if you want an in-depth understanding of logistic regression from a practical point of view, read it from beginning to end.
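(In case you have never fit one, a multinomial model of that general form looks something like this in SAS. This is only a rough sketch, not the book's own example; the dataset and variable names here (adolescents, placement, violence_risk, age) are made up for illustration.)

/* Hypothetical multinomial logistic regression sketch, names invented */
proc logistic data = adolescents ;
model placement(ref='Home') = violence_risk age / link=glogit ; /* generalized logit for a nominal outcome */
run ;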

Best SAS book I read this year …

Let me start with the caveat that I have been using SAS for over 30 years and I don’t teach undergraduates, so I have not read any basic books at all. I read a lot of books on a range of advanced topics and most of them I found to be just – meh. Maybe it is because I had read all the good books previously and so the only ones I had left unread lying around were just so-so. All that being said, the winner is …

Applied statistics and the SAS programming language (5th Ed), by Ronald Cody and Jeffrey Smith

This book has been around for eight years and I had actually read parts of it a couple of years ago, but this was the first time I read through the whole book. It's a very readable intermediate book. Very little mathematics is included. It's all about how to write SAS code to produce a factor analysis, repeated measures ANOVA, etc. It has a lot of random stuff thrown in, like a review of functions and working with date data. If you have a linear style of learning and teaching, you might hate that. Personally, I liked that about it. Eight years is an eon in programming time, but the chi-square and ANOVA have been around for 100 years, so that wasn't an issue. While I don't generally like the use of simulated data for problems in statistics, for teaching this was really helpful, because when students were first exposed to a new concept they didn't need to track down a codebook and fix up the data before they could try the analysis. For the purpose of teaching applied statistics, it's a good book.

Best JavaScript programming book I read this year

I read a lot of JavaScript books and found many of them interesting and useful, so this was a hard choice.

The jQuery cookbook, edited by Cody Lindley

was my favorite. If you haven't gathered by now, I'm fond of learning by example, and this book is pretty much nothing but elaborate examples along the lines of, "Say you wanted to make every other row in a table green." There are some like that I can imagine wanting to do and others I cannot think of any need to use ever. However, those are famous last words. When I was in high school, I couldn't imagine I would ever use the matrix algebra we were learning.

Best game programming book I read this year

Again, I read a lot of game programming books. I didn't read a lot of mediocre game programming books; they were all either pretty good or they sucked. The best of the good ones was difficult to choose, but I did.

Building HTML5 Games by Jesse Freeman

This is a very hands-on approach to building 2-D games with the Impact JavaScript engine, with, you guessed it, plenty of examples. I was excited to learn that he has several other books out. I'm going to read all of them next year.

So, there you have it … my favorite technical books that I read this year. Feel free to make suggestions for what I ought to read next year.

 

Dec 24

My Success Is Personal

December 24, 2014

One of the studies cited in Sheryl Sandberg's book, Lean In, that really resonated with me was the one that said we judge women in the workplace based on their accomplishments but men on their potential.

Women are judged on accomplishments, men on potential.

This phenomenon is one of the (many) reasons the whole bewailing of the lack of women in tech rubs me the wrong way.

Methinks you all protest too much.

While the main tenet of Sandberg's book is that women need to do more – raise their hands, speak up – she seems to gloss over, a little, the reasons women do not. I am not disagreeing with her, nor am I accusing her of ignoring barriers. I just think that with her privileged background she does not see them as clearly and personally as I do.

Let's take the Heidi/Howard study, one of dozens of similar analyses, where subjects are given identical resumes but with the name of either Heidi or Howard. Although the candidates are identical on paper and the judges have never met them (because they don't actually exist!), the judges are significantly more likely to rate Howard as someone they would hire compared to Heidi.

Let that sink in a moment – even when men and women have identical accomplishments, people are more likely to hire a man. This is the exact opposite of affirmative action.

While we hear a lot about how women's own lower expectations hold them back – they are less likely to aspire to the C-suite, less likely to apply for jobs unless they meet 100% of the qualifications, less likely to assume they can just "learn on the job" – we hear very little about why women might have these expectations.

In fact, from the research Sandberg and others cite, it appears their lower expectations are a realistic assessment. I am the last person in the world to say that we should ACCEPT those barriers to women, but accepting them and denying they exist are two different animals.

In my case, now add being female to being Hispanic and way over 30. This is in an environment that calls itself a meritocracy but still feels it is perfectly acceptable to say that anyone over 32 is too old to start a company. Evidence of ageism in Silicon Valley is not hard to find.

Yet, we continually blame those "others", who are not white or Asian, not male, not under 30, didn't drop out of Stanford or Harvard, for not having the qualifications, for not leaning in or for 'choosing a lifestyle that includes family'.

I call bullshit on all of this.

Want to hear some accomplishments? I graduated from Washington University in St. Louis at age 19. Then I went on to an MBA at the University of Minnesota, which I chose because it had an outstanding judo team in the area. I finished my MBA at 21. By 26 I had become the first American to win the world judo championships, while working full-time as an engineer at General Dynamics. I also had my first child. By 31 I had finished another master's and a Ph.D., written my first computer game to teach math, dropped that idea since computer games were mostly text-based, and started my first statistical consulting company. Over the next 25 years, I founded or co-founded three more companies, received tens of millions of dollars in grant funding, wrote a dozen articles for scientific journals, wrote a few book chapters, and presented at so many technical conferences that I have lost track, on everything from logistic regression to analysis of data from non-random stratified surveys to the effectiveness of blended instruction methods with non-traditional students.

When I wanted to start a company to make educational games, no investors, incubators or accelerators were interested, so I went out and obtained around $600,000 through grant-funding, Kickstarter and now sales of our first game.

This article by Carlos Bueno on the clique of the tech industry describes our experiences to a T.

Even with traction, a product on the market, we didn’t see any interest from investors or accelerators so we went ahead and developed a second game, due for release next month, and we are just starting on our third game. We also applied for another grant, which is under review, and we are planning a Kickstarter campaign for early next year. One reason we do all of this is because I KNOW that our company will be judged on accomplishments instead of potential.

Tell me what you do in 30 seconds or I’m moving on.

Our first prototype of our first game needed SO much additional work. They all do. We made hundreds of improvements and all of those went into our second game. That second game is much better than the first, and that pretty much always happens, too.

Our next games are web and mobile apps. We had reasons for doing it this way. One is that we started out working with schools where we had connections, and they did not have iPads and they weren't in Silicon Valley. They were on Indian reservations in North Dakota and in low-income communities in southern California. Thanks to the support from these institutions we were able to collect data on market needs and effectiveness.

I could tell it all to you in two minutes, but I often see people shutting down before I even start talking. I don't look or dress like Mark Zuckerberg. If anything, I look like Mark Zuckerberg's maid, or maybe the person that checks him in at the hotel. It is all my own fault, of course, because I have been known to wear a suit.

Being successful is personal to me.

I intend to prove that we can make games that are fun to play for any age and that make you smarter. I intend to prove that you can make a lot of money meeting real needs for the vast majority of people: not sites where people post pictures of how wasted they got last night, but things that actually improve their lives.

I understand completely those women who "give up", "lower their expectations", because they are told directly and indirectly 100 times a day that they are not the chosen ones. Who do they think they are? Their game design has flaws. They are using Unity 3D instead of (insert anything else here), they are not creating a multi-player game, they need a warm introduction to a venture capitalist. No matter how qualified you are, there will be a base you haven't covered, a programming language you haven't learned, some reason that you aren't qualified. Basically, this is our table in the lunchroom and you can't sit with us.

This may make me sound crabby and grinch-like but it is the opposite, because it is personal and I am going to succeed.

Just watch me.

==========

WANT TO SEE OUR FIRST GAME?

You can purchase Spirit Lake: The Game here, give it as a gift or donate it to a classroom or school. Anyone who purchases this week can buy Fish Lake, our next game, at 50% off. We'll send you a discount code in your email next month. You can download a free demo of the game here.

Dec 18

[Image: two girls talking]

(There may even be a part two, if I get around to it.)

Let me ask you a few questions:

1. Do you have more than just one dependent variable and one independent variable?

2. If you said yes, do you have a CATEGORICAL or ORDINAL dependent variable? If so, use logistic regression. I have written several posts on it. You can find a list of them here. Some involve Euclid, marriage, SAS and SPSS. Alas, none involve a naked mole rat. I shall have to remedy that.

3. You said yes to #1, multiple variables, but no to #2, so I am assuming you have multiple variables in your design and your dependent variable is interval or continuous, something like sales for the month of December, average annual temperature or IQ. The next question is: do you have only ONE dependent variable, and is it measured only ONCE per observation? For example, you have measured the average annual temperature of each city in 2013, or sales in December 2012. In this case, you would do either Analysis of Variance or multiple regression. It doesn't matter much which you do if you code it correctly. Both are specific cases of the general linear model and will give you the same result. You may also want to do a general linear MIXED model, where you have city as a random effect and something else, say, whether the administration was Democratic or Republican, as a fixed effect. In this case I assume that you have sales as your dependent variable because, contrary to the beliefs of some extremists, political parties do not determine the weather. Generally, whether you use a mixed model or an ordinary least squares (OLS) plain vanilla ANOVA or regression will not have a dramatic impact on your results, unless the result is a grade in a course where the professor REALLY wants you to show that you know that school is a random effect when comparing curricula.

4. Still here? I’m guessing you have one of two other common designs. That is, you have measured the same subjects, stores, cities, whatever, more than once. Most commonly, it is the good old pretest posttest design and you have an experimental and control group. You want to know if it works. If you have only tested your people twice, you are perfectly fine with a repeated measures ANOVA. If you have tested them more than twice, you are very likely to have grossly violated the assumption of compound symmetry and I would recommend a mixed model.

5. All righty then, you DO have multiple variables, they are NOT categorical or ordinal, your dependent variable is NOT repeated, so you must have multiple dependent variables. In that case, you would do a multivariate Analysis of Variance. (There is a minimal SAS sketch of each of these branches right after this list.)
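To make those branches concrete, here is a minimal SAS sketch of each one. These are not from any particular study; the dataset and variable names (mydata, mylong, outcome, group, age, score, score1, score2, subj, time) are all made up for illustration, so swap in your own.

/* #2: categorical or ordinal dependent variable: logistic regression.
   This sketch assumes a binary outcome; add LINK=GLOGIT (nominal) or
   LINK=CLOGIT (ordinal) as a MODEL statement option otherwise. */
proc logistic data = mydata ;
class group ;
model outcome = group age ;
run ;

/* #3: one continuous dependent variable, measured once:
   ANOVA / regression, both cases of the general linear model */
proc glm data = mydata ;
class group ;
model score = group age ;
run ;

/* #4: the same subjects measured more than twice: mixed model,
   with one record per subject per time point */
proc mixed data = mylong ;
class subj group time ;
model score = group time group*time ;
random subj ;
run ;

/* #5: multiple continuous dependent variables: MANOVA */
proc glm data = mydata ;
class group ;
model score1 score2 = group ;
manova h = group ;
run ;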

Some might argue that logistic regression is not a multivariate design. Other people would argue with them that, assuming your data are multinomial, you need multiple logit functions, so that really is a type of multivariate design. A third group of people would say it is multivariate in the ordinal or multinomial case because there are multiple possible outcomes.

Personally, I wonder about all of those types of people. I wonder about the amount of time in higher education spent in forcing students to learn answers to questions that have no real use or purpose as far as I can see.

On the other hand, while knowing whether something falls in the multivariate category or not probably won't impact your life or analyses, if you treat time as just another independent variable and analyze your repeated measures design as a 2 x 2 ANOVA with experimental condition and time as factors, you're screwed.

Know your research designs.

Dec 12

There may come a day (shudder) when I am called upon to find what my mother refers to as “a real job”. I’m not sure how I would go about it. For the past 30 years, here is how my career has gone.

I think one can clearly detect a pattern here, mainly that I should spend more time walking in doors to buildings.

When getting a new job, I’ve generally been in the work equivalent of “married but looking”. I know that sounds horrible but what I mean is that I have had a job that I was considering getting out of, but I didn’t necessarily want the people at the job to know that.

This problem is common to people in any field, but I think those in analytic jobs have another problem.

Most of us did not come here by the approved route that Human Resource offices think we should. Personally, I have a B.S. in Business Administration, an MBA, and an M.A. and a Ph.D. in Educational Psychology, where I specialized in Applied Statistics and Psychometrics (Tests & Measurement). Along the way, I have had more courses in statistics than anyone with a master's in the subject, and I have worked for thirty years programming in multiple languages – SAS, PHP and JavaScript mostly, with a few early years of Fortran, BASIC and some defunct languages. I've taught courses in statistics and programming at all levels from undergraduates through doctoral students. Yet … I do not have a "B.S. in Computer Science" or whatever the requirement du jour is.

Enter Analyst Finder. This is the new company started by Art Tabachneck of SAS fame. If you've been using SAS for any length of time at all, you've run across his papers, and if you live in Canada and drive a car you have been affected by his work. He uses SAS to set automobile insurance rates.

I checked out the site and it takes less than 15 minutes to fill out a form to be included in their database. The really cool thing is that it asks about so many areas of expertise – what industries you have experience in, whether you are familiar with SAS, SQL, ANCOVA … it is a very, very long list – but you can just check off the boxes that apply to you.

If I were actually looking for a job, I might have spent a little time filling in the "essay questions" that allow you to expand on your credentials as well.

How it works

Currently, Art is compiling a database of analysts. Once this is of reasonable size, employers will be able, for a very modest fee – around $300 – to submit position descriptions. Analysts who match those descriptions will be contacted and asked if they are interested. The 20 names with the closest match who have expressed interest will be sent to the employer with contact information.

From an employer's point of view, it sounds like a great service. And if I'm ever in the market for a "real job" as an employee, it's the first place I would hit up.

So … go check it out. It's totally free to analysts, and "analyst" is very broadly defined. If you're interested, download the form, fill it out and send it back.

It's a more scientific method than running around the city walking through doors hoping you run into someone who offers you a job.

Speaking of which, I need to be walking in the door of my office in less than 8 hours, so I guess I’ll call it a night.

 

Dec 9

This is the most depressing chart I have seen in a long time. Below are the results of our pretest on knowledge of fraction operations of 322 students in grades 3 through 7, attending schools on and adjacent to American Indian reservations.

[Chart: fractions pretest results]

These are questions like,

“Drag 6/1 to the correct spot on the number line.”

Which was one of only two questions that at least 50% of the children answered correctly.

or

Identify the letter that marks 7/8 on a number line

14% of the children answered that right.

Then there are the word problems,

“Bob and Ted painted a wall. Bob painted 1/5 of the wall and Ted painted 2/5 of the wall. How much of the wall is left to paint?”

38% of the children answered that correctly.
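(For the record: Bob and Ted together painted 1/5 + 2/5 = 3/5 of the wall, so 2/5 of the wall is left to paint.)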

Looks like they did better on item 7, which asks which of these statements is true:

5/6 < 3/4
2/8 < 1/4
3/6 = 6/12
2/3 = 4/5

26% of them got that correct. Guess what? That was one of the few multiple choice items on the test, so random guessing would have gotten it correct 25% of the time.

This is a test of what is ostensibly third- through fifth-grade math. Two-thirds of the test is at the fourth-grade level or below. As our results indicate, the majority of the students who took the test would not understand what that statement means.

For the 163 fifth-graders who took our pretest, the mean score was 28%.

For the 114 fourth-graders, the mean was a dismal 14.7%.

It wasn’t that the students didn’t try. I looked and there were very few places they left the items blank. They simply did not know.

These students came from several different schools, and while there may be differences between schools, there is nothing to suggest one school with abysmal results pulled down all of the others.

I called our lead cultural consultant, Dr. Erich Longie, out at Spirit Lake, and told him I was concerned that if I presented these results to the schools, they might want to shoot the messenger. After all, it is important to us that these schools continue to provide us their input and guidance. He told me not to worry about it too much.

"They know," he told me. "As someone who has been a teacher and administrator in schools on the reservations, I'm not surprised by the results and I can't imagine these schools will be, either. What we all ought to be worried about is making sure that the post-test scores don't look like this."

So … students will start playing Fish Lake in the schools next month. No pressure here.

[Image: pike]

Excuse me while I get back to work.

Dec 3

I was going to call this new category for my blog

"Mama AnnMaria's advice on not getting your ass fired" but it turned out to be too long to fit in the box.

It may surprise young people in the workplace to find out that people who admit to having screwed up are often valued more as employees than those who are blameless.

Who cares whose fault it is?

One of the things that drives me crazy is when the first thing (and sometimes the second and third thing) an employee does in response to a problem is to find proof that it was not his or her fault. There are a whole lot of reasons why this is stupid, bad and will eventually get your ass fired.

Are you exclaiming,

What? Why would you fire the one person who never makes a mistake?

Well, for starters, you are clearly delusional. Everybody makes mistakes so if you are convinced you NEVER make mistakes, it is never your fault, then you have a tenuous grasp on reality that you may suddenly lose one day and begin mowing down your co-workers with an Uzi, convinced that they are evil demon zombies out to eat your non-mistake-making perfect brain. As a responsible employer, I cannot take that chance.

Next is the fact that you are wasting time and energy. You could have found the missing data and gotten it to Dr. Cflange. Instead, you put your effort into finding that email from seven months ago where Bob said we didn't need to worry about sending the data to Dr. Cflange, to prove that it wasn't your fault that the data was not sent to our collaborator; after all, Bob told you not to bother. So, here we are, three hours later, and Dr. C still hasn't gotten the data. Besides, the fact that Bob told you that seven months ago, when Dr. Cflange was in Uzbekistan, does not absolve you of the responsibility for sending out that data from now until the end of the world. Plus, Bob hates you now.

Which brings me to my next point – if you are always claiming you are blameless, then by implication, you are blaming someone else. Your boss is not stupid.

[Image: broken window]

It’s like that time when my mom came home and the front window was broken. She asked what happened and we all swore up and down that we had nothing to do with it. She asked,

“So, you were all just standing around and the glass just fell out of the window?”

We all swore that yes, it had happened exactly like that.

(Mom, if you are reading this, it wasn’t me that pushed one of the Slattery boys into the window. Just so you know.)

Unlike me, who did not throw said sibling under the bus, if you are pointing at Bob and saying,

“It was him, it’s his fault, not me!”

Then guess how inclined Bob is going to be to help you out in the future. So … people who are always blaming everyone around them are not going to have good teamwork with their co-workers.

Listen carefully here, because this next part is really important. Let's assume the people you work with are not idiots, that there is a reason you are working for them instead of them working for you. Let's call that reason "experience". Not being idiots, your bosses realize that everyone makes mistakes.

Employers are not looking for people who never make mistakes. Those people don’t exist. They are looking for people who can fix problems.

Here are the final two reasons that never taking responsibility for any mistake is eventually going to get your ass fired:

If every time an issue comes up it's like an argument before the Supreme Court to get you to address it, because you are so involved in gathering your evidence as to why it was not your fault, eventually people will quit pointing out problems to you because it's just not worth the hassle.

If you never believe that any problem is your fault, then you will never get any better at preventing them, because none of the problems that occur have anything to do with you.

The most impressive interactions I have with employees often begin like this:

“That was my mistake that X happened. I would like to take the responsibility of fixing it by doing Y.”

Those people are probably never going to get their asses fired.

Now you know. Act accordingly.

 

Dec 2

What if you wanted to turn your PROC MIXED into a repeated measures ANOVA using PROC GLM? Why would you want to do this? Well, I don't know why you would want to do it, but I wanted to do it because I wanted to demonstrate for my class that both give you the same fixed effects F value and significance.

I started out with the statin dataset from the Cody and Smith textbook. In this data set, each subject has three records, one each for drugs A, B and C. To do a mixed model with subject as a random effect and drug as a fixed effect, you would code it as shown below. Remember to include both the subject variable and your fixed effect in the CLASS statement.

proc mixed data = statin ;
class subj drug ;
model ldl = drug ; /* drug is the fixed effect */
random subj ; /* subject is the random effect */
run ;

To do a repeated measures ANOVA with PROC GLM you need three variables for each subject, not three records.

First, create three data sets for Drug A, Drug B and Drug C.

data one two three ;
set statin ;
if drug = 'A' then output one ;
else if drug = 'B' then output two ;
else if drug = 'C' then output three ;
run ;

Second, sort these datasets and as you read in each one, rename LDL to a new name so that when you merge the datasets you have three different names. Yes, I really only needed to rename two of them, but I figured it was just neater this way.

proc sort data = one (rename = (ldl = ldla)) ;
by subj ;
run ;

proc sort data = two (rename = (ldl = ldlb)) ;
by subj ;
run ;

proc sort data = three (rename = (ldl = ldlc)) ;
by subj ;
run ;

Third, merge the three datasets by subject.

data mrg ;
merge one two three ;
by subj ;
run ;

Fourth, run your repeated measures ANOVA.

Your three LDL measurements are the dependent variables. It seems weird not to have an independent variable on the other side of the equation, but that's the way it is. In your REPEATED statement you give a name for the repeated variable and the number of levels. I used "drug" here to be consistent, but actually, this could be any name at all. I could have used "frog" or "rutabaga" instead and it would have worked just as well.

proc glm data = mrg ;
model ldla ldlb ldlc = / nouni ; /* NOUNI suppresses the separate univariate ANOVAs */
repeated drug 3 (1 2 3) ; /* name, number of levels and level values of the repeated factor */
run ;

Compare the results and you will see that both approaches give you the same numerator and denominator degrees of freedom, F-statistic and p-value for the fixed effect of drug.
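By the way, the same reshaping from three records per subject to one record per subject can also be done in a single step with PROC TRANSPOSE instead of splitting and merging three datasets. This is just a sketch, assuming the statin dataset described above; the output dataset name (wide) is made up.

proc sort data = statin ;
by subj ;
run ;

proc transpose data = statin out = wide prefix = ldl ;
by subj ;
id drug ; /* the values A, B and C become the variables ldlA, ldlB and ldlC */
var ldl ;
run ;

Since SAS variable names are not case-sensitive, the PROC GLM step above works the same on this dataset; just change data = mrg to data = wide.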

Now you can be happy.
