It’s still technically the weekend so I’m not blogging about statistics until tomorrow.  After some debate, I do think I have a multivariate stat textbook selected, though, so that’s good.

I got an invitation to go to a luncheon an old friend of mine sets up every few months for a bunch of us that used to work out together back in the 1980s. Almost all of those who attend are retired and three of the guys (they are all guys except for me) passed away in the past year or so. We’re getting to “that age”.

Ten or fifteen years ago, it was everyone’s parents that were dying. I competed in judo for 14 years, and taught it for another 30, so I have a lot of friends and acquaintances who are Japanese. At Japanese Buddhist funerals, at least here in Los Angeles, there is a song they always sing in Japanese. One day, after a half-dozen or more of our friends’ parents had passed away in a pretty short period of time, my friend, Hayward, leaned over to me during the service and said,

You know, I’m getting a little worried. I’m starting to know all of the words to that song.

Now, though, it is our friends and acquaintances who are dying, not their parents. That’s the sort of thing that gives you pause. Every time I go to one of those luncheons, we are talking about the people who we miss who aren’t there any more.

I’ve been working a lot of hours – as always. The Spoiled One talked me into going out with her twice this weekend, once to have dinner at the end of a pier in Malibu at sunset, and once to go hiking in the Santa Monica mountains. She can be a good influence sometimes. She goes to boarding school during the week and she asked,

I’m only home on the weekends. You have all week to work. Why do you have to work while I’m here?

I didn’t have a really good answer to that.

The Invisible Developer said to me,

You know, I think we’re getting to that age where I don’t think we should have to do much other than what we want to do.

Ignoring the fact for a moment that a) he may be correct and b) that does reflect that we have a privileged life that most people in the world don’t attain, I spent Sunday until midnight writing the budget justification for a grant which was decidedly something I did NOT want to be doing. I have made an adjustment now that at midnight, I try to quit working, no matter what.

I made a major decision to write, at most, one more grant. If we get the Phase I that I’m writing now, I’ll write the Phase II. Other than that, I’m done. Over it. Seriously, after you’ve brought in $30 million or so, is getting $30,100,000 going to make  a difference in your overall accomplishments? That’s why I quit keeping track of grants funded after the first $30 million and just put down the latest ten or so. No, I don’t get to keep that money, either. It gets paid out over the years in salaries, rent, supplies, student scholarships and all of the other stuff the grants were written to do.

Once this grant is done, I will be working on the games and doing nothing else in September and October. Then, there are a few months of teaching classes and another six months after that of just working on the games.

I’m saying, “No” a lot.

  • No, I am not interested in another consulting contract.
  • No, I don’t want to work on a journal article with you.
  • No, I’m not writing another grant.
  • No, I’m not teaching any more classes.
  • No, I will not present at your conference.

I’m throwing all of my eggs in one basket, working on making our games better and better. We’re taking a risk, focusing just on this and hiring more people to work on the games to boot. There is actually a lot of statistics in it, too, both analyzing the data we’re in the middle of collecting and in our next game, under design, which teaches statistics.

Maybe someone else would retire and lie on a beach, but I’ve tried that a few times and I’m a complete failure at it.

The Invisible Developer said that someone asked Bob Dylan why he was still making music and playing shows when he didn’t need the money and Dylan replied,

What else would I do? This is what I do.

It seems like a good answer. As for me, what I do is game design, coding, statistics. I’m just going to do that. It occurs to me that I have just written either the last or next-to-last grant budget I am ever going to write. And that makes me very, very happy.

 

 

 

 

The new common core standards have statistics first taught in the sixth grade, or so they say. I disagree with this statement because as I see it, much of the basis of statistics is taught in the earlier grades, although not called by that name.  Here are just a few examples:

  • Bar graphs
  • Line plots
  • X,Y coordinates
  • Fractions and decimals (since the mean is rarely going to be an integer)
  • Ratios and proportions – in summarizing a data set, it’s pretty common to point out, for example, that the ratio of game fish to non-game fish was 3:2. We are often asking if the percentage of something observed is disproportionate to the percentage in the population.

It doesn’t bother me that these topics are not called statistics, I’m just pointing it out. Whether a line is considered a regression line or simply points in two-dimensional space is a matter of context and nothing else.

Speaking of lines and graphs, the very basis of describing a distribution starts with graphing it. So, those second grade bar graphs? Precursors to sixth grade requirements to summarize and describe a set of data.

You might say I’m going to an extreme including fractions in there because I may as well throw in addition and division. After all, you need to add up the individual scores and divide by N. Actually, I wouldn’t argue too much with that view.

You can’t even compute a standard deviation without understanding the concepts of squares and square roots, so it would be easy to argue that is at least a prerequisite to statistics.
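For reference, here are the usual formulas for the sample mean and standard deviation, which show where the addition, division, squares and square roots come in. Dividing by N - 1 rather than N is one common convention and is an assumption here, not something specified above.

```latex
\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i , \qquad s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2} \]
```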

While I’ve heard a lot of people hating on the common core, personally, I’m interested in seeing how it plays out.

What I expect will continue to happen is that many children will be turned off of math by the third grade because it is generally taught SO abysmally. That isn’t all the fault of teachers – the books they are given to use are often deathly boring. This isn’t to say I am not bothered by the situation. It bothers me a lot.

Working mostly in low-performing schools, I see students who are not very proficient with fractions, proportions, exponents or mathematical notation. We are trying to design our games  to teach all of those prerequisites and then start showing students different distributions, having them collect and interpret data.

Lacking prerequisites is one of the three biggest barriers I see in teaching statistics, or any math, to students. The other two are related: low expectations for what students should be able to learn at each grade, and the fiction on the part of teachers and students that everything should be easy.

People were all up in arms years ago because there was a Barbie doll that said, “Math is hard.”

Guess what? Math is hard sometimes and that is why you have to work hard at it. Even if you really like math and do well at it in school, even if it’s your profession, there are times when you have to spend hours studying it and figuring something out.

Today, I was reviewing textbooks for a course I’ll be teaching on multivariate statistics. I didn’t like any of the three I read for the course, although I found one of them pretty interesting just from a personal perspective. The one I liked had page after page of equations in matrix algebra and it would be a definite stretch for most master’s students. I’m really debating using it because I know, just like with the middle school students, there will be many lacking prerequisites and it will take a LOT of work on my part to explain vectors and determinants before we can even get to what they are supposed to be learning.

Last week, I had someone seriously ask me if we could make our games “look less like math so that students are learning it without realizing it”. No, we cannot. There’s nothing so wrong with learning math that you need to disguise it to look like something else.

Whenever I catch myself thinking in designing a game, “Will the students be able to do X?” and I think they will not because they are lacking the prerequisites, I build an earlier level to teach the prerequisites and go ahead and include X anyway.

Here is why — I’m sitting at the other end teaching graduate students where the text begins like this:

root mean square residual (RMR) For a single-group analysis, the RMR is the root of the mean squared residuals:

\[ \mbox{RMR} = \sqrt{\frac{1}{b} [ \sum_{i}^{p} \sum_{j}^{i} (s_{ij} - \hat{\sigma}_{ij})^2 + \delta \sum_{i}^{p} (\bar{x}_i - \hat{\mu}_i)^2 ]} \]

where

\[ b = \frac{p(p+1+2\delta)}{2} \]

is the number of distinct elements in the covariance matrix and in the mean vector (if modeled).

For multiple-group analysis, PROC CALIS uses the following formula for the overall RMR:

\[ \mbox{overall RMR} = \sqrt{\sum_{r=1}^{k} \frac{w_r}{\sum_{r=1}^{k} w_r} [ \sum_{i}^{p} \sum_{j}^{i} (s_{ij} - \hat{\sigma}_{ij})^2 + \delta \sum_{i}^{p} (\bar{x}_i - \hat{\mu}_i)^2 ]} \]

Okay, actually I just pulled that from the SAS PROC CALIS documentation because I was too lazy to copy all of the equations that were in the book I was reading, which went on for pages and pages in this vein, but you get the idea.

Now, these 6th graders are 11 years from being in my course. What I want to know is in what grade do we magically go from having it “not look like math” to reading sentences like,

“The probability distribution (density) of a vector y denoted by f(y) is the same as the joint probability distribution of y1, …, yp.”

or

“It is easy to verify that the correlation coefficient matrix, R, is a symmetric positive definite matrix in which all of the diagonal elements are unity.”

If those two sentences don’t make absolute perfect sense to you, well, you’re fucked because those were the two easiest sentences in the chapter that I just pulled out because I didn’t need to go to the effort of typing in a lot of matrices. I’m teaching at Point B and I want to know how, if students’ Point A is not doing anything that looks like math, they are ever going to get here.

I think the answer is pretty obvious, which is why I’m insisting on teaching every bit of math every chance I can get.

I’m working on a section of a game that teaches fractions. If a player misses the question about where to meet up with the returning hunter, he or she gets sent to study. There is a movie that plays before this about needing to get back to the camp before dark.

Here is the question,

“The sisters begin to worry their brothers won’t make it back by dark. They start down the trail to meet them. They decide to stop and wait at the spot where their brothers will be 3/4 of the way back to camp. How far FROM the camp will the girls be?”

[Image: trail]

 

I used this question because I want students to think about a few ideas:

  • Distance between two points can be thought of as a whole.
  • If you are a/b distance FROM point X, the remaining distance TO point X is 1 - a/b. Of course, I don’t expect them to state it like that.
  • 1/4 = 2/8
  • Number lines can be numbered in either direction. You can have 0 on the left or 0 on the right. The distance will be the same. The size of each interval will be the same.
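Applied to the question above, with the whole trail counted as 1, a worked version of the arithmetic would be the line below. The 2/8 form assumes the trail picture is marked off in eighths, which the equivalence bullet suggests but the text does not state outright.

```latex
\[ 1 - \frac{3}{4} = \frac{1}{4} = \frac{2}{8} \]
```

So the sisters wait 1/4 (or 2/8) of the trail's length from the camp.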

These are kind of important ideas in math – equivalence, the arbitrary nature of labeling points on a line.

Students can click on GIVE ME A HINT, and a hints page pops up that explains, among other things, why you were wrong if you answered that the sisters would be 3/4 of the distance to the hunting grounds FROM the camp. If they still get the problem wrong after reading the hints (or after skipping the hints and just guessing; we’re talking kids, after all), the player is sent to watch a video clip explaining the problem, and then has to take a quiz to get back to the game.

SO … I had the thought that instead of writing the quiz questions out of thin air, I might read what some more experienced teachers were giving to students in this grade as math problems. After all, I haven’t taught middle school math since the 1980s. I went to several sites and even purchased some things like “One year of fifth-grade homework problems”.

When I looked at page after page of what students are being given as homework assignments, the only thing I could think was “Are you fucking kidding me? No wonder kids hate math.”

All of the homework was like this:

1/4 + 1/3 =   ?

For FIFTY problems. That’s it! Then, the next day, it would be another fifty problems like this:

5/6 – 1/4 =  ?

Okay, you need to learn to add and subtract fractions, but is that ALL you need  to learn? Obviously not. How boring must it be to sit and just calculate answers to the same type of problem over and over? This stuff made me start to hate math and I LOVE math.

How can you possibly think that is teaching kids math? That’s like making them copy down all of the words in the dictionary and pretending you taught them literature.

Don’t even get me started on teaching statistics – wait, too late. I’m started. That is my rant for tomorrow.

Sometimes the benefits of attending a conference aren’t so much the specific sessions you attend as the ideas they spark. One example was at the Western Users of SAS Software conference last week. I was sitting in a session on PROC PHREG and the presenter was talking about analyzing the covariance matrix when it hit me –

Earlier in the week, Rebecca Ottesen (from Cal Poly) and I had been discussing the limitations of directory size with SAS Studio. You can only have 2 GB of data in a course directory. Well, that’s not very big data, now, is it?

It’s a very reasonable limit for SAS to impose. They can’t go around hosting terabytes of data for each course.

If you, the professor, have a regular SAS license, which many professors do, you can create a covariance matrix for your students to analyze. Even if you include 500 variables, that’s going to be a pretty tiny dataset but it has the data you would need for a lot of analyses – factor analysis, structural equation models, regression.

Creating a covariance data set is a piece of cake. Just do this:

proc corr data=sashelp.heart cov outp=mydata.test2 ;
var ageatdeath ageatstart ageCHDdiag ;
run ;

The COV option requests the covariances and the OUTP option has those written to a SAS data set.

If you don’t have access to a high performance computer and have to run the analysis on your desktop, you are going to be somewhat limited, but far less than just using SAS Studio.

So — create a covariance matrix and have them analyze that. Pretty obvious and I don’t know why I haven’t been doing it all along.
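Here is a minimal sketch of what the students' side might look like. It assumes the OUTP= data set created above is stored in a permanent library, so it keeps the TYPE=CORR attribute that PROC REG needs in order to read summary data instead of raw records. The particular model is just an illustration, not an analysis from the course.

```sas
/* Regression run from the summary data set rather than the raw records. */
/* mydata.test2 is the OUTP= data set created by PROC CORR above.        */
/* The choice of dependent and independent variables is an assumption    */
/* made only for illustration.                                           */
proc reg data=mydata.test2 ;
   model ageatdeath = ageatstart agechddiag ;
run ;
quit ;
```

Two things to keep in mind. Anything that needs the individual records, such as residuals, is not available from a summary data set. Also, PROC CORR uses pairwise deletion by default, so if the raw data have missing values you may want to add the NOMISS option when creating the matrix so that every statistic is based on the same observations.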

What about means, frequencies and chi-square and all that, though?

Well, really, the output from a PROC FREQ can condense your data down dramatically. Say I have 10,000,000 people and I want age at death, blood pressure status, cholesterol status, cause of death and smoking status. I can create an output data set like this. (Not that the heart data set has 10,000,000 records but you get the idea.)

proc freq data=sashelp.heart ;
tables AgeAtDeath
*BP_Status
*Chol_Status
*DeathCause
*Sex
*Smoking / noprint out=mydata.test1 ;
run ;

This creates a data set with a count variable, which you can use in your WEIGHT statement in just about any procedure, like

proc means data = mydata.test1 ;
weight count ;
var ageatdeath ;
run ;

 

Really, you can create “cubes” and analyze your big data on SAS Studio that way.
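One hedged variation: since COUNT here is a number of people rather than a precision weight, a FREQ statement is another option in PROC MEANS. With FREQ, each row is treated as COUNT identical observations, so N and the standard deviation reflect the number of people rather than the number of rows. The sketch below assumes the mydata.test1 data set written by the OUT= option above.

```sas
/* COUNT is the cell count that PROC FREQ writes to the OUT= data set.  */
/* FREQ treats each row as COUNT identical observations.                */
proc means data=mydata.test1 n mean std ;
   freq count ;
   var ageatdeath ;
run ;
```

The mean comes out the same with WEIGHT or FREQ. The difference shows up in N and in statistics that depend on the degrees of freedom.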

Yeah, obvious, I know, but I hadn’t been doing it with my students.

On my way home from the 2014 Western Users of SAS Software conference. When I was younger, I would go to every basic session trying to find something I could use that wasn’t over my head. As I got older, I went to the statistics sessions to see if there was anything new or more advanced I had not mastered yet.

Now that I’m really old, I just do my own presentations and then spend the rest of the conference wandering around to anything that looks interesting. Sometimes, the most interesting stuff is the questions after a session or just the random people I run into in the hallways.

Interesting stuff: Part 1 Data coolness

I had used the California Health Interview as example data for classes but I was not aware of the huge breadth of data available there. Also, if you are a researcher and ask them nicely they will create data sets for you, as long as the data are available and it can be done without violating confidentiality requirements. Check them out here.

http://data.ca.gov/

Say you wanted to chart the number of amputations per 100,000 workers over the past six years. The state of California has you covered.

[Image: amputation chart]

That was pretty random, yes? Want Pneumoconiosis hospitalizations? Just check it out if you ever need health data, death data, politics – anyway, good resource.

Interesting stuff: Parts 2, 3 & 4, which I hope to write about this week

Another random idea that I have certainly had before but never implemented … eek, I have to go check out but remind me it has to do with getting around the SAS Studio limit on ginormous data.

Also, F-test, p-value and r-square

And permutations, random data, bootstrapping and creating your own version of F-tests, t-values and p-values

I’m just heading off to the Western Users of SAS Software meeting that starts tomorrow.  After the keynote, during which I have promised not to swear even once, I’m doing a SAS Essentials talk on Thursday, where I teach students 10 basic steps that allow them to complete an entire annual report project.

One of these is PROC DATASETS. It is used twice in the project. First, they get a list of all of the datasets in the directory. We’re using SAS Studio, which runs on the SAS server. Since students don’t have access to issue Unix commands directly, and most likely don’t know any, we use PROC DATASETS.

libname mydata  "/courses/u_mine.edu1/i_1234/c_7890/wuss14/";
proc datasets library= mydata ;
quit ;

This gives me the output below.

#  Name           Member Type  File Size  Last Modified
1  SLPOST_SCORED  DATA            208896  26Jun14:04:00:40
2  SLPRE_SCORED   DATA            487424  26Jun14:04:00:41
3  SL_ANSWERS     DATA            619520  26Jun14:03:59:42
4  SL_PRE_POST    DATA            196608  26Jun14:04:00:03

 

Once we have cleaned up the data in every data set, we are not quite ready to start merging them together. A common problem is that data sets have different names, lengths or types for the same variable. You’d be wise to check the variable names, types and lengths of all the variables.  So, here is where we use PROC DATASETS a second time.

proc datasets library= work ;
contents data = _all_ ;
quit ;

This time, we added another statement. The “contents data = _all_ “ will print the contents of all of the data sets. In perusing the contents, I see that grade is entered as character data in one – 5th, 4th and so on, while it is numeric data in another. This is the sort of thing you never run into in “back of the textbook” data, but that shows up often in real life.
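One way the grade mismatch could be fixed before merging is sketched below. The data set names and the variable name GRADE_CHAR are hypothetical, and the sketch assumes the character version holds values such as "5th" and "4th" as described above.

```sas
/* Hypothetical data set names. Assumes GRADE holds values such as      */
/* "5th" or "4th" in the data set where it is character.                */
data work.pre_fixed ;
   set work.slpre_scored (rename=(grade=grade_char)) ;
   /* Keep only the digits, then read them as a number. */
   grade = input(compress(grade_char, , 'kd'), 8.) ;
   drop grade_char ;
run ;
```

Renaming the original variable on the way in lets the new numeric variable take the name GRADE, so it matches the data set where grade was already numeric.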

Those are two super simple steps that allow you to do useful things.

You can do more with PROC DATASETS – append, compare – but my plane is boarding so more about that some other time.

 

 

I’m upset that I’m not perfect and I’m also very tired.


The Invisible Developer asked me tonight if I had a list of all of the grants that I’d had funded in my career.  For some reason, he thought I should have kept track of that. I told him that no one cared, not even me. The first few years, I would list in my c. v., “Over two million in funded projects.” “Over three million in funded projects.” Eventually, I felt like one of those McDonald’s signs that keep changing, “Over 427 billion served.”

So, if I mentioned grant writing at all, I would just list a half-dozen or so funded projects. Really, once you’ve brought in over $7.5 million, people don’t care so much about the details.  Dr. Erich Longie, who was president of Cankdeska Cikana Community College when I worked for them, used to say we had gotten over $30 million. I honestly forget. I sat down and wrote whatever I could remember and came up with about $19 million, but I’m sure there are some from 15 or 20 years ago that I forgot. Of that $19 million, $15 million were grants I wrote with no help whatsoever. That is, I sat in front of a computer, swore, wrote, added numbers, wrote some more, swore some more and in the end produced 100 or 200 pages that were good enough that someone gave the funding to run a program and pay people’s salaries for five more years. The other $4 million, I wrote large parts of but other people helped with budget or some other part.

I know I have forgotten a bunch, and I don’t even care to look. When I was trying to come up with a list, I saw one grant for $1.5 million in my list of examples and thought, “Oh yeah, I had totally forgotten about that.”

And yet, today I made a mistake on a grant and I kept thinking what an idiot I am.

I used to make a lot of money writing grants for people. There was one year when every single proposal I wrote got funded. I was the flavor of the month. Everyone wanted me to write grants for them. Of course, the first time I wrote one that didn’t get funded, the client was pretty upset. How could that happen?


In an average year, 85% of the proposals I wrote got funded. I’m not as great as that makes me sound. I was selective in what competitions I chose. If there wasn’t at least a one in seven chance of getting funded, I didn’t apply. When I started, my cut-off was one in five, but things have gotten more competitive. (That is, I would look at the number of proposals they funded last year and the number of applications they received and calculate the odds. Some competitions fund as few as 3% of the applicants. I wouldn’t bother with these.) Still, hitting 85% when only 14-20% get funded is a pretty good track record.

I don’t think the mistake that I made will keep the grant from getting funded. It was possible to submit a revision, since it was before the deadline, and I did that, on time. Still, I felt like an idiot.

Confession: I don’t really like grant writing

I’ve never liked grant writing and I’ve only ever met one person who did. He was very good at it but he’s retired now and probably nearly 80. It’s tedious work. You read 100+ pages of instructions, write 100 pages to fit a formula – Needs Assessment, Objectives, Project Design, Evaluation, Adequacy of Resources, Personnel. Fill in every box and bubble. Cite research to back up everything.

This is probably why I’ve never been very sympathetic when my children complained about their schoolwork being boring or hard. The fun, easy stuff we do for free. The boring or hard stuff, you need to pay. Grant writing is both boring AND hard. I did it for years because I was a widow with three young children and I needed the money. I’m grateful that I was able to support my children through private schools, good universities and the Olympics.

The only grants I write any more are for people I have worked with for years. Don’t call and ask me if  I will write one for you because the answer is “No.”

No matter how many millions I hit, I feel terrible when I miss

Often, I’m writing grants for institutions where people are on soft money. That means, if the grant I write isn’t good enough, people lose their jobs. So, people somewhere else who did get the grant keep their jobs, but I don’t know THOSE people.

My point, and I do have one

I was going to write about PROC DATASETS today, but I wrote about this instead because it has been on my mind.

I think I have this in common with a lot of successful people – no matter how much money I bring in, how many good papers I write, no matter how many keynotes “knock ‘em dead”, no matter how many grants get funded – if I slip once, I think I’m a failure.

Realistically, I know this is not true. If this particular grant doesn’t get funded, I’ll still have written tens of millions of dollars in successful proposals. This struck me the LAST time I had something not funded, over a year ago.  I was feeling bad about it, and happened to be looking for something in a filing cabinet. (Yes, we have filing cabinets.) Going through those, I came upon file after file of data, reports, budget reports, from one funded project after another. It occurred to me that I had a LOT of successful projects over the years. It’s like this proposal I just finished. I think it was excellent work but there were one or two mistakes, and they weren’t even fatal mistakes (I hope!)

Here is my advice – successful people tend to immediately forget their successes and focus on the next challenge. That may be part of what makes them successful but it can also be a bit depressing. If you forget the successes of the past, then when you are confronted with a failure in the present, it looms unrealistically large. So, yes, fix your mistakes, learn from them, but also take some time every day to pat yourself on the back for the many, many mountains you’ve climbed in life.

 

 

Last week was very productively spent at Unite 2014 learning about all things Unity.

In case you are not into game development, Unity claims to be used by over a million game developers around the world. While I rather suspect those statistics are up there with Second Life and Twitter counting everyone who ever signed up for an account, there is no denying that one whole hell of a lot of people use Unity for game development, including us. I have to say all of my major objectives in attending were met.

[Image: 2D game]

The first thing I wanted to achieve was make a definite decision whether to go with Unity for the 2D game for the iPad that we are going to dive into next month. We have some artwork, a rough design, but we’re coming up on the first point of no return decision. Well, there’s always a return, but if we start with Unity and then switch to solution X it may take us quite a bit of time to re-tool.

The decision was, yes, we definitely want to use Unity. My concerns about performance on lower powered devices were addressed. First, I spoke to some helpful folks from Unity who pointed out that you can set your game’s graphic quality ranging from Fastest through Beautiful to Fantastic. Yeah, those are actually the last two settings. I also attended a session on tips for working with mobile devices that gave me some good ideas, like if we have a character that has a sword, instead of having two images, a sword and a character, have that be drawn as one image.

Two other clinchers for Unity were:

  • the number of platforms to which we could expand eventually – PlayStation, Xbox, Android phones, smart TVs. Unity works with all of those. The same would not be true of code we wrote in JavaScript for the web. (Speaking of JavaScript, even though Unity supports both C# and JavaScript, I noted that the examples in the presentations were overwhelmingly C# ones and there seems to be a definite lean in that direction.)
  • the number of vendors with integrated add-on packages, everything from SpeedTree, which makes drawing trees fascinating, to Mixamo, which offers a much simpler way of making 3-D animated characters. I was so impressed with Mixamo that I texted one of our fabulous artists from the presentation, “This is something we need to start using, and by we, I mean you, because we both know I suck at art.”

The second thing I wanted to achieve was to get more familiar with Unity. That was achieved. I was able to follow the examples in the Training Day and do the Nightmares game, which was pretty fun. The next couple of days, in my spare time, I made another much simpler game from scratch for my grandchildren to play. It won’t win any awards for originality or anything else, but my Unity knowledge definitely spiked up in a week.

[Image: screen shot of the Nightmares game]

One reason I insist on going to events like this, even though people tell me that I am the CEO and should be doing CEO things, is that I would never, ever find 40 hours in a week just to learn  if I stayed back in the office. I’ve written before about the Red Queen’s Race in technology, where you need to run as fast as you can to stay in the same place. I turned 56 last week and more opportunities are coming my way than ever before, which I attribute to refusing to equate age with stagnation.

No brogrammer culture in sight

Speaking of age – I usually go to conferences on statistics – the Joint Statistical Meetings, SAS Global Forum, etc. Sometimes I go to start-up events. This was my first game developer conference and I had heard horrid things about the game industry, that women are sexually harassed, assaulted, disrespected.

As far as horrid brogrammer culture – didn’t see it, and I looked. The demographics were overwhelmingly male, somewhere between 90-95%, I would guess. None of the sessions I attended had a female presenter. On the other hand, I didn’t submit a paper. I suggested it to The Invisible Developer and he didn’t want to do it, and I was too busy with everything else. We decided next year, for sure we would co-author one and submit it. Should be fun.

My point is, I don’t think they received many submissions from women, just based on the number of women attending.

Despite all of the people who claim to have started coding in the womb and how much VCs supposedly drool over twenty-somethings, I saw about as many people under 20 as I did over 60. That’s based on me eye-balling it; I didn’t actually go around carding people. Given the amount of grey hair and balding, my guess is the crowd was overwhelmingly in their thirties and forties.

While there were far fewer women than at statistics conferences, there were more than the zero African-Americans and Latinos you usually see at statistics events, although it was clear from the eavesdropping during the coffee breaks (I call it qualitative data collection) that many of these folks were actually from Latin America attending the conference. It was FAR more international than SAS Global Forum or JSM, even though both of those have a smattering of international folks.

As far as the whole sexual harassment, mansplaining, unwelcome thing – didn’t see it. Nada. Zip. Zilch. Every single person we met was nice, polite and interested in talking about game development. No one treated me like I was a second-class citizen and the only person who insisted on explaining stuff to me that I already knew was The Invisible Developer, but he has lots of other non-annoying traits that make up for it, so it doesn’t bother me.

It may be that I am old, plus I was there with my husband, so no one would bother me. However, I really did look, whether it was at cocktails in the evening, at lunch, during the coffee breaks, at the young women sitting around me in conference sessions  - and I did not see a single hint of the kind of bad behavior I’ve been hearing about. I’m a small person and at this conference, I was just there to hang out and learn stuff, so I was wearing jeans and a hoodie most days, my point being, there wasn’t any reason people would be on their best behavior around me.

I’m not saying it doesn’t happen. I’m saying I didn’t see it happen here.

All I can say is — you should go to the next Unite conference. Learn stuff about game development and people will be nice to you. What more can you want? Well, if you want more, I should add that Seattle had some awesome restaurants.

If you want to go next year, do jump on it right away when you see it advertised though, because everything sold out – the conference, training day, nearby hotels.

 

Lately, I’ve been working on a report that uses eight datasets that all have the same problems with the usernames.

In addition to needing to remove every username that contained the word “test” or “intern”, we also needed to delete specific names of the classroom teachers who had played the game and to correct names that were misspelled.

Here are a few examples of a very long list of statements:

if username in("MSDMARIA","1ANSWERKEY","MSDELAPAZ","MSCARRINGTON") then delete ;

if username = "GRETBUFFALO" then username = "GREYBUFFALO" ;

else if username = "HALFHORES" then username = "HALFHORSE" ;

else if username = "TTCARLSON18TTCARLSON18" then username = "TTCARLSON18" ;

 

These problems occurred in every dataset.

A second problem found when looking at the contents of each of the 8 datasets was that the username variable was not the same length in all of them, which could cause problems later when they were to be merged together or concatenated. Also, now that all of the usernames have been cleaned up, none should be over 12 characters in length.

Wouldn’t it be nice if there was a way to just get the first n characters of a string?

What did I tell you before about when you find yourself saying that with SAS?

Enter our character function, substr, which returns a substring of a variable beginning at any position and for as many characters as you like. Problem solved.

newid = substr(username, 1, 12) ;
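One thing to watch: if newid has not been given a length before this assignment, SAS gives it the same length as username, so the lengths would still differ across datasets. Here is a minimal sketch of one way around that, declaring the length first. The dataset name fixed and the sample usernames are made up for illustration.

```sas
/* Sketch only: the dataset name and sample values are hypothetical */
data fixed ;
   /* Declare newid as 12 characters BEFORE assigning it; otherwise
      it inherits the length of username */
   length username $ 20 newid $ 12 ;
   input username $ ;
   newid = substr(username, 1, 12) ;
   datalines ;
TTCARLSON18
GREYBUFFALO
;
run ;

/* Check the variable lengths in the output */
proc contents data = fixed ;
run ;
```

With the LENGTH statement in the included code, newid comes out the same length in every dataset that uses it.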

 


It seems pretty inefficient to write this set of statements eight times in eight different data sets. Also, next year we will have another eight data sets, and some will have these same students’ usernames and same problems. Wouldn’t it be a lot easier to have these statements in one place and add to the “fixnames.sas” file whenever we find a new problem?

So, now we have the write once, use anywhere solution of %INCLUDE.

What %INCLUDE does

The %INCLUDE statement references lines from an external file and processes them immediately. It has almost the exact same effect as if you had copied and pasted those lines right into your program. The exception that makes it “almost” is that a %INCLUDE statement must begin at a statement boundary. That is, it has to be either the first statement in your program or occur after a semicolon ending a statement, as in this example.

data studentsf ;

infile inf delimiter = "," missover ;

attrib teacher length = $12. username length = $ 16. ;

input username $ age sex $ grade school $ teacher $ ;

%include "/courses/u_mine.edu1/wuss14/fixnames.sas" ;

Also, you need to think about it as if you had copied and pasted those lines into your program. Is it still valid code? Whenever using %INCLUDE, you should make sure the code runs in your program as expected, with no errors, before cutting it out and making it an external file.
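To make the copy-and-paste idea concrete, here is a small sketch that runs on its own. It is not the actual fixnames.sas. The three statements are pieced together from ones quoted in this post, and the fileref fixtmp, the dataset name students and the sample usernames are all hypothetical. The file is written to a temporary location first so there is something to include.

```sas
/* Hypothetical stand-in for fixnames.sas, written to a temporary file */
filename fixtmp temp ;

data _null_ ;
   file fixtmp ;
   put 'username = compress(upcase(username), ". ") ;' ;
   put 'if index(username, "TEST") > 0 or index(username, "INTERN") > 0 then delete ;' ;
   put 'if username = "GRETBUFFALO" then username = "GREYBUFFALO" ;' ;
run ;

/* The %INCLUDE sits after a semicolon, inside the DATA step, exactly
   where the statements would go if they were pasted in by hand */
data students ;
   length username $ 16 ;
   input username $ ;
   %include fixtmp ;
   datalines ;
gretbuffalo
test01
half.horse
;
run ;

proc print data = students ;
run ;
```

The included statements only make sense inside a DATA step that has a username variable, which is why it pays to test them in place before moving them to an external file.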

To source or not to source

The default is not to show the statements that were in the included file. Generally, this is desirable. This is code you have already debugged and if you are using it multiple times (otherwise, why bother with the %INCLUDE), having the same 20 lines repeated 8 times in your log just makes it harder to debug.

Professors might want to use real data but hide all of the messy data handling from the students initially in fear they would run screaming for the door. I meant, professors might want to gradually introduce SAS statements and functions for data handling.

In either case, students could use the %INCLUDE statement as shown in the example above. Seeing the included code in your log is quite simple: just add the source2 option as shown.

%include "/courses/u_mine.edu1/wuss14/fixnames.sas" /source2 ;

It will be in your log as follows

NOTE: %INCLUDE (level 1) file “/courses/u_mine.edu1/wuss14/fixnames.sas is file “/courses/u_mine.edu1/wuss14/fixnames.sas.

419 +username = compress(upcase(username),". ") ;

420 +if (index(username,"TEST") > 0 or index(username,"INTERN") > 0

 

and so on.

The + signs next to each statement denote it was in the included file.

If you want to know why I think it is so important for new SAS users to learn about the %INCLUDE statement, you should come to the Western Users of SAS Software conference in San Jose next month. Especially if you’re a student, you should come, because they cut you a really good deal.

If you’re not a student and you have a real professional job – well, then, you should be able to afford it. There will be funny hats, beer, coding and cookies. What more could one ask?

A few years ago, I was at the Western Users of SAS Software conference and renowned statistician Stanley Azen played the piano and sang at the closing ceremony.

Briefly, very briefly, I considered beginning my presentation on 10 SAS steps to an annual report, by writing a song, These are a few of my favorite PROCs, and then singing it to the tune of “These are a few of my favorite things.”

This plan was dismissed a nanosecond later when I was reminded by The Perfect Jennifer that my singing bears an uncanny resemblance to the sound Beijing the cat used to make in the middle of the night when fighting with the cat next door.

Beijing the Cat

To make up for my disappointment over my lack of musical rendition, I decided to do a few posts on my favorite PROCS, in no particular order. Today’s contestant is … drum roll please ….

PROC IMPORT

Whenever possible, I try reading in the data using the IMPORT procedure, because it is very simple and my goal in programming is not to impress people with my brilliance – it is to get the job done with maximum efficiency and minimum effort.

As can be seen in the example below, there is no need to declare variable lengths, type or names. Only three statements are required.

PROC IMPORT OUT= work.studentsf DATAFILE= "/courses/u_mine.edu1/wuss14/Fish_students.csv" DBMS = CSV REPLACE;

GETNAMES = YES ;

DATAROW=2;

This PROC IMPORT statement gives the location of the data file, specifies that its format is CSV (comma-separated values), names the output dataset studentsf in the work library and says that if the dataset specified in the OUT= option already exists, I want it to be replaced.

The second statement will cause SAS to get the variable names from the first row in the file. Since the variable names are in the first row of the file, the data begins in row 2.

Limitations of PROC IMPORT

As handy as it can be, PROC IMPORT has its limitations. Three we ran into in this project are:

  • Excel files cannot be uploaded via FTP to the SAS server, so no PROC IMPORT with Excel if you are using the SAS Web Editor.
  • If the data that you want to import is a type that SAS does not support, PROC IMPORT attempts to convert the data, but that does not always work.
  • For delimited files, the first 20 rows are used to determine the variable attributes. You can give a higher value for the number of rows scanned using the GUESSINGROWS statement, but you may have no idea what that higher value should be. For example, the first 300 rows may all have numbers and then the class in records 301-324 entered their grade as “4th” instead of the number 4.
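For that last limitation, here is a sketch of the same import with a GUESSINGROWS statement added. It assumes your SAS release accepts MAX as a value, which tells SAS to scan every row. If yours does not, substitute a number at least as large as the number of rows in the file.

```sas
/* Sketch: same import as above, but scan every row before deciding
   variable types and lengths. GUESSINGROWS = MAX is assumed to be
   supported by your release; otherwise use a large number. */
PROC IMPORT OUT= work.studentsf
     DATAFILE= "/courses/u_mine.edu1/wuss14/Fish_students.csv"
     DBMS = CSV REPLACE ;
   GETNAMES = YES ;
   DATAROW = 2 ;
   GUESSINGROWS = MAX ;
RUN ;
```

Scanning the whole file takes longer on big files. It also means a column with a stray “4th” in it will come in as character, so you would still need to clean up that column before treating grade as a number.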

Although PROC IMPORT is the first thing I always try, one of my pet peeves about instructors and textbooks is when it is the only thing they teach. It’s smart to try the simplest solution first. It’s dumb not to have a backup plan for the instances when that doesn’t work.

For more on what to do in those cases, you can come to WUSS in San Jose. Just a reminder – regular registration closes August 25. After that date, you’ll have to register on site.

 
