You’d think it would be easy to find source code for a simple applet to demonstrate long division with a one-digit divisor and two-digit quotient. I wanted it to show the steps in long division, with the product for the first digit in the quotient shown, then that subtracted from the dividend, and the next digit in the quotient shown.

I had one in Flash I wanted to replace and I found all kinds of applets but none with the source code, so, here, as a public service, is what I did this evening while drinking beer.

You can see the end product here 

In case you are dying to know, here is how I write a program, regardless of the language:

  1. Get something to work
  2. Clean it up to make it better

To me, trying to get your code perfect on the first try is like expecting your first draft of an article to be perfect. I find it much easier to dash something off and then go back and rewrite. I know not everyone does it that way but it works for me.

I’m going to use jQuery, so let’s start with that:

<script type="text/javascript" src="../javascript/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="../javascript/jquery-ui.min.js"></script>

<script type="text/javascript">

// First you need a random number function.
// Returns a whole number from min to max, inclusive, with every value equally likely.
function randnum(min,max)   {
var num=Math.floor(Math.random()*(max-min+1))+min;
return num;
}

// Set up to get a new problem when the window loads, and declare the variables.

window.onload = getProb ;
var rightans1 ;
var rightans2 ;
var dividend ;
var quotient ;
var divisor ;

 

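function getProb()
{
// Two-digit quotient (10-99) and one-digit divisor (1-9)
quotient = randnum(10,99);
divisor = randnum(1,9);
// Clear the answer boxes and feedback left over from the previous problem
document.getElementById("ans1").value = "";
document.getElementById("ans2").value = "";
document.getElementById("yans1").innerHTML = "";
document.getElementById("yans2").innerHTML = "";
// Build the dividend from the quotient so the division comes out even
dividend = quotient * divisor;
// The right answers are the two digits of the quotient, as strings
var w = quotient + "";
rightans1 = w.charAt(0);
rightans2 = w.charAt(1);
// Write the problem to the page
document.getElementById("c").innerHTML = divisor;
document.getElementById("divide").innerHTML = dividend;
}
</script>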

The function above creates a problem with a divisor between 1 and 9 and a two-digit quotient.

I set all the values in the form to empty so that when a student reloads the page and gets a new problem, the answer from the previous problem is not still there.

I made the dividend equal to the quotient times the divisor to make sure that the divisor went into the dividend evenly, with no remainder. I made a local string variable, w, and then created two variables that were the first and second characters of that string.

The final two lines in this function write the problem to the page.
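Step 2 isn’t covered here, but the intermediate numbers the applet is meant to display can be computed directly from the dividend and divisor: the first product, the subtraction, and the next quotient digit. Here is a minimal sketch. It assumes an even division with a two-digit quotient, which getProb() guarantees. The function name longDivisionSteps is hypothetical and not part of the original applet.

```javascript
// Hypothetical helper: intermediate steps of a long division with a
// one-digit divisor and a two-digit quotient (assumes it divides evenly).
function longDivisionSteps(dividend, divisor) {
  const quotient = dividend / divisor;           // exact, because the problem is built that way
  const tensDigit = Math.floor(quotient / 10);   // first digit of the quotient
  const onesDigit = quotient % 10;               // second digit of the quotient
  const firstProduct = tensDigit * divisor;      // written under the dividend, one column to the left
  const remainder = dividend - firstProduct * 10; // what is left after the first subtraction
  const secondProduct = onesDigit * divisor;     // should use up the remainder exactly
  return {
    tensDigit, firstProduct, remainder,
    onesDigit, secondProduct,
    finalRemainder: remainder - secondProduct,   // 0 for an even division
  };
}
```

For 588 ÷ 7 this gives 8, then 56 (for 560), leaving 28, then 4 and 28, with nothing left over.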

 

Checking the answer is step 2. Since I don’t like horrendously long blog posts, I’ll put that up tomorrow.

Getting ready to teach a data mining course at the end of the year, I started looking through data sets I have on my desktop. Not sure what I will end up using. My first lesson, no matter what, is going to be on data quality.

The very first thing I did was a series of PROC FREQs. Then, I thought maybe that was a mistake. Perhaps I ought to start off with SAS Enterprise Guide or Enterprise Miner.  Here is how I did the first peek at data quality with Base SAS. I’m going to do the same thing with Enterprise Guide tomorrow and see if it would be easier. After that, I’ll try Enterprise Miner. I know I downloaded the SAS On Demand version a while back and haven’t done much with it lately.

(There is a new SAS for the Web offering but from what I have seen (admittedly, a while back), it requires you to set up a virtual machine with VMware and I did not have the time to do it nor could I find my Windows 8 or Windows 7 install disk. Must clean office.)

The first thing I did was pull out a data set with a couple of thousand student quiz records. Yes, I know in data mining we will get to data sets in the millions but this is the first exercise of the first class.

I did not expect to have 2,000 quiz records, because we only have around 1,200 beta testers and about 200 of those are teachers, who I would expect to get all of the in-game problems correct and so never be routed to a quiz. I also know from observation that some of the students never made it to the part of the game where they could do the quizzes. The first challenge page requires students to be able to read simple words and subtract two-digit numbers.

I did a super-simple PROC FREQ:

proc freq data = in.realquiz ;
tables username*quiztype ;
run ;

and found that three users had supposedly taken the same quiz 40 or more times. One student showed having taken the quiz 70 times and another 91 times. While that is theoretically possible, I was suspicious because after those three, the highest number was 7.
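If you want to reproduce that PROC FREQ check outside SAS, here is a rough JavaScript sketch. It counts records per username and quiz type and lists the pairs above a cutoff. It assumes each record is an object with username and quiztype fields. The function names and the cutoff are illustrative assumptions, not part of the original analysis.

```javascript
// Count records for each (username, quiztype) pair, roughly the cell
// counts of a username*quiztype frequency table.
function countAttempts(records) {
  const counts = new Map();
  for (const r of records) {
    const key = `${r.username}|${r.quiztype}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Return [key, count] pairs whose count exceeds the cutoff, largest first.
function flagHighCounts(counts, cutoff) {
  return [...counts.entries()]
    .filter(([, n]) => n > cutoff)
    .sort((a, b) => b[1] - a[1]);
}
```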

I went into the data set and looked at those particular records and the time stamp showed them coming in tenths of a second apart. Clearly, the student was not answering 5-7 questions in less than a tenth of a second.

We tracked these down to a particular school that was having issues with the firewall. It appeared that when the program couldn’t connect to the server, it tried again and again. When there was a connection, all of those records went through at once.

LESSONS LEARNED

  1. Always look at the outliers. Don’t just toss them out. They can tell you things. In this case, taking a closer look at that PHP code is on my list of program fixes. If it happens at one school it can happen at others.
  2. Time stamps are your friend.  I try to include them whenever I can. Yes, it might take up a bit of time and space but there is nothing like it for detecting duplicate records – and fraud.
  3. Just because data has supposedly been cleaned up, never, never assume that it is problem-free.
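Lesson 2 is easy to automate. Here is a sketch that flags any record arriving less than a chosen gap after the previous record for the same student and quiz, which is the pattern the firewall retries produced. It assumes each record carries username, quiztype and a timestamp in milliseconds; those field names are assumptions.

```javascript
// Flag records that arrive less than minGapMs after the previous record
// for the same username and quiztype (likely duplicate submissions).
function flagNearDuplicates(records, minGapMs) {
  const sorted = [...records].sort((a, b) =>
    a.username.localeCompare(b.username) ||
    String(a.quiztype).localeCompare(String(b.quiztype)) ||
    a.timestamp - b.timestamp);
  const flagged = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const cur = sorted[i];
    if (cur.username === prev.username &&
        cur.quiztype === prev.quiztype &&
        cur.timestamp - prev.timestamp < minGapMs) {
      flagged.push(cur);
    }
  }
  return flagged;
}
```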

 

At the moment, we are interested in knowing the most common failure point in the games. Do we need to add in more teaching and problems earlier? The games are designed to teach and test students in mathematics at the fourth and fifth grade levels. The teachers we work with often tell us that their average student is below grade level. So here was my next series of steps.

proc sort data = in.realquiz ;
by username quiztype ;
run ;

data test ;
set in.realquiz ;
by username quiztype ;
if first.quiztype ;
run ;

*** These first steps sort the dataset by username and type of quiz and then only retain the first instance of each. So, if a student actually did take the same quiz seven times, I am only interested in the fact that beginning the game, he or she could not do multiples of 3, not that it took seven tries to get there.
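For comparison, here is the same keep-the-first-record idea sketched in JavaScript. One assumption differs from the SAS step. The sketch sorts by a hypothetical timestamp field, so "first" means earliest. PROC SORT with only username and quiztype in the BY statement instead keeps records in their original order within each group.

```javascript
// Keep only each student's first record for each quiz type, similar to
// sorting by username and quiztype and keeping first.quiztype.
function firstAttempts(records) {
  const sorted = [...records].sort((a, b) =>
    a.username.localeCompare(b.username) ||
    String(a.quiztype).localeCompare(String(b.quiztype)) ||
    a.timestamp - b.timestamp);
  const seen = new Set();
  return sorted.filter(r => {
    const key = `${r.username}|${r.quiztype}`;
    if (seen.has(key)) return false; // a later attempt at the same quiz
    seen.add(key);
    return true;
  });
}
```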

proc freq data = test ;
tables quiztype*pass / out=quizfreq ;
run;

*** This step shows both the quizzes students took and the result.
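Here is a rough JavaScript equivalent of that quiztype*pass table. It counts passes and failures per quiz and adds the percent passing. It assumes pass is coded 1 for pass and 0 for fail, which is an assumption about the data.

```javascript
// Cross-tabulate quiz type by pass/fail and compute percent passing.
function passTable(records) {
  const table = {};
  for (const r of records) {
    const row = table[r.quiztype] || (table[r.quiztype] = { pass: 0, fail: 0 });
    if (r.pass) row.pass += 1;
    else row.fail += 1;
  }
  for (const row of Object.values(table)) {
    row.pctPass = (100 * row.pass) / (row.pass + row.fail);
  }
  return table;
}
```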

This is the point at which I began to become concerned, not about data quality but about what the data was beginning to reveal.

Table of quizzes by passing

Visually impaired – click here for HTML files of tables instead of png

Over half of the students failed and the quizzes they were failing seemed to be at the lower levels – around third-grade math.

LESSON LEARNED

You can get some very valuable information from some very simple statistics. A lot more about that tomorrow, though, since I have to get back to work ….

I can imagine the type of person served by an expensive, intensive programming bootcamp: someone with money (or, at least, good credit) and several weeks of free time. That has never described me in my life. The last time I had six weeks free was the summer after tenth grade, before I started working full-time, and at age 14, I had neither money nor credit.

It doesn’t require a pile of money and an uninterrupted summer’s worth of time to keep up or catch up on technology. If you fall behind, you have no one to blame but yourself.

My whole life, I have been interested in learning more about everything. (Well, except about literary and film criticism because, well, it sucks. Just try reading any of it and you’ll see I am right.) That’s included lots of graduate coursework. For years, I took one class a year in something, such as microbiology or matrix algebra, just to learn something new. Now, I try to teach a course a year. Last year it was biostatistics. This year, I think I will teach both biostatistics and data mining. I always learn something new when trying to come up with good examples and activities for classes, and I have to keep up on the latest software and operating systems. It isn’t just free; they pay me (not a lot, which is why I only teach once or twice a year).

Someone recently tweeted,

I hope to never learn the meaning of the word “webinar”.

Webinars aren’t all bad (just most of them!). However, this morning I attended one by Yakov Fain on building HTML5 applications, hosted by O’Reilly Media, that was definitely worth an hour of my time. It was free, by the way. Now, I’m sure it was just a way for them to sell books (which worked, since I bought one), but it is also a way to get a lecture by experts on a topic. I probably get 80 invitations to webinars for each one I attend. It’s not the most exciting format, so I don’t recommend signing up unless it’s a subject you are really interested in learning.

Virtual conference – this will be the first one I attend, so I will tell you how it works out. I signed up for one on health analytics sponsored by SAS. It looked interesting and it was free. There was a virtual conference I was interested in a while back, on JavaScript, but it was several hundred dollars for what appeared to me to be the equivalent of watching YouTube videos. Maybe I missed out on something amazing. I’ll never know.


YouTube – You are mistaken if you’ve only thought of it as videos of cute kittens and teenage rock star wanna-bes. (Are they actually called rock stars any more? Are they all rap star wanna-bes?) I actually watch YouTube videos on jQuery and JavaScript on the TV while riding the exercise bike. This habit has caused The Perfect Jennifer to wonder aloud more than once,
“Exactly how is it that you people don’t die of boredom around here?”
Sadly, the public library hasn’t been a very good source of programming resources for me. The books they have tend to be far out of date. It makes me sad because I love libraries and have cards for both the Santa Monica and Los Angeles libraries as well as a couple of university ones.

If your university or company offers you an account on the Safari library, I would jump on that, because you get unlimited access to all of their books, videos and courses. The individual price of $43 a month seems a little much to me. If I didn’t have a free license, I’d just buy the ebooks I needed. We already have a LOT of technical books, though. If you don’t, maybe it’s worth it.

For questions, answers and random poking around, stackoverflow.com is awesome.

It reminds me of when I was first learning SAS over 25 years ago. I was on the SAS-L mailing list and would just read every day what the really smart people were talking about.

I have to get back to work but there are lots more resources out there, both that I didn’t have time to list and others that I’m sure I don’t know about. Have a favorite?  Please share in the comments. I’m always looking for new places to learn cool stuff.


 

Since there is a blizzard and I’m inside analyzing data, I thought it was a good time for another random SAS tip.

The default output type for SAS 9.4 is HTML, which is nice for presentation and sharing, but sometimes I would like a plain text output, especially if I’m going to be doing something like copying and pasting the output into my program.

You can easily toggle between plain text and HTML output by doing this

Go to the TOOLS menu, select OPTIONS and then PREFERENCES.

In the Preferences window, click on the RESULTS tab.

window with listing preference

 

Check the box next to Create listing and uncheck the box next to Create HTML. Click OK.

Now your results will be in plain text output instead of HTML. To switch back, just go through the same steps, uncheck the listing box and check the HTML box.

Speaking of copying your output into your program ….

I wrote previously about a problem where I needed to do, among other things, a PROC CONTENTS with the variables in the order they occur in the data set, not alphabetical order (use VARNUM).

 

If you just copy and paste the output of a typical CONTENTS procedure and all you want is the variable names, you will have a lot of stuff about data type, length and label that you need to delete. Also, my life experience has been that the more keystrokes you make (including the backspace key), the more likely you are to make mistakes. When it comes to the keyboard, less is more.

What I really want is the variables output in the order they appear and nothing more. The SHORT option does this for me.

proc contents data= in.sl_pretest varnum short;
run;

which produces this output:

Which_choice_is_the_same_as_the What_is_five_time_six__ Fred_walked_for_one_hour__How_ma How_is_nine_thousand__thirty_sev There_are_124_students_making_th
Valerie_has_225_pennies__She_div Joe_did_this_division_problem Which_sign_goes_in_the_space_to

What possible good is that mess? Well, I copy and paste it under the RENAME keyword and then, after each variable name, type = q1 (or whatever the number is), like so:

RENAME

Which_choice_is_the_same_as_the = q1

What_is_five_time_six__  = q2

Fred_walked_for_one_hour__How_ma = q3
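If you do this often, the typing can be scripted too. Here is a minimal sketch, assuming you have pasted the SHORT output into a string. It splits on whitespace and numbers the names in order. buildRename is a hypothetical helper, and its output is just text to paste into a DATA step.

```javascript
// Turn the whitespace-separated names from PROC CONTENTS ... VARNUM SHORT
// into the text of a SAS RENAME statement: name1 = q1, name2 = q2, ...
function buildRename(shortOutput, prefix = "q") {
  const names = shortOutput.trim().split(/\s+/);
  const pairs = names.map((name, i) => `${name} = ${prefix}${i + 1}`);
  return "rename\n" + pairs.join("\n") + " ;";
}

// Example with the first few names from the output above
console.log(buildRename(
  "Which_choice_is_the_same_as_the What_is_five_time_six__ Fred_walked_for_one_hour__How_ma"));
```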

 

As I mentioned in the previous post, I could not do this using an ARRAY statement because the data were of mixed type, character and numeric, and a SAS array cannot hold variables of mixed types. I also mentioned how to get around that, so if you are interested, go back and read that post.

This week I had one of those pain-in-the-ass problems. I had a test with 24 items but they were of mixed types. That is, for some the answer was multiple choice and for others it was numbers.

The data was received as an Excel file.

Now, I could have opened it with SAS Enterprise Guide and specified data types for each variable, but the problem is, I am going to get this particular data set over and over, so I want code I can write once and run every time.

As if that wasn’t bad enough, the variables all had names like:

which_choice_is_the_same_as_the_

I wanted to rename these all to something sane like q1, q2 etc.

The first step was an option I don’t think I’ve ever used before, oddly enough.

 

proc contents data= annoying varnum ;
run ;

 

Normally, SAS gives you the variables in a data set in alphabetical order when you do a PROC CONTENTS. The VARNUM option lists the variables in the order they appear in the data set. This was immensely helpful because it spared me going through the data trying to figure out which was the first question, which was the second, and so on.

I just copied the variables in order after a RENAME statement and tacked on an =q1, q2, etc. like so

Data better ;
set annoying ;
rename
which_choice_is_the_same_as_the_ = q1
what_is_five_time_six__ = q2
/* etc. */ ;
run ;

proc contents data= better ;
run ;

I could have combined this with the previous step, but the fact is that unless the data set is really gigantic, the time that needs to be preserved is not computer processing time but my time. This way was quicker because I didn’t have to write out those ridiculous variable names and worry about the program failing because I used _ in the name instead of __.

SAS does have a function to detect variable type, but that wouldn’t really have helped me because I still need to write all of these variables into a single array of item1 – item24 for later use, and you cannot have mixed type arrays. So, I did this

 

data mo_better ;
set better ;
/* numeric questions, and the character variables they will be converted into */
array qs{*} q2-q6 q10 q12 q14-q16 q19-q21 ;
array itemN {*} $12 item2-item6 item10 item12 item14-item16 item19-item21 ;
/* character questions, and the character variables they will be copied into */
array qsA {*} $12 q1 q7-q9 q11 q13 q17 q18 q22-q24 ;
array itemA {*} $12 item1 item7-item9 item11 item13 item17 item18 item22-item24 ;
do i = 1 to dim(qs) ;
 itemN{i} = put(qs{i},12.) ;
end ;
do j = 1 to dim(qsa) ;
 itemA{j} = qsa{j} ;
end ;
drop i j q1-q24 ;
run ;

I have 4 arrays. The first consists of the numeric variable type questions. I couldn’t use _numeric_ to create an array of all numeric variables because there were others in the data set that were NOT test questions but were numeric and I did not want them in my array. I had to actually list each variable individually or in a range like q14-q16.

The next array is the one I am going to recode the variables into as character variables. Notice that character arrays need a $ and a length. The next two arrays are the character variables and the variables I’m going to copy them into. I could have just renamed the character variables in a RENAME statement and then changed the length in an ATTRIB statement, but it would have taken more typing.

The DIM function returns the number of elements in the array, so each loop runs from 1 to however many variables are in the array, because I didn’t feel like counting them.

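The PUT function converts each numeric value to a character string using the 12. format, and that string is stored in the corresponding character variable in the itemN array. The original numeric variable itself is not changed. The 12. format right-aligns the number, so the string has leading blanks. That’s fine here because the answer key goes through exactly the same conversion.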

The next loop just copies all the character variables into other character variables with the names item1, item7, etc. Now I have variables that are all the same type and length, named item1-item24. I can do things with them like compare each student’s response on each item to the answer key, score it right or wrong, and sum up the scored items, like this (1ANSWER is the first username, and its record holds the answer key):

 

Data in.pre_scored ;
set mo_better ;
by username ;
Array scored {24} sc1-sc24 ;
Array items {24} $12 item1-item24 ;
Array ans{24} $12 ans1-ans24 ;
/* the first record (1ANSWER) is copied into the retained key; later records are scored against it */
if _n_ = 1 then do i = 1 to 24 ;
ans{i} = items{i} ;
end ;
else do i = 1 to 24 ;
if ans{i} = items{i} then scored{i} = 1 ;
else scored{i} = 0 ;
end ;
Retain ans1-ans24 ;
total = sum(of sc1-sc24) ;
run ;
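The scoring logic is easier to see without the array syntax. Here is a JavaScript sketch of the same idea. It assumes the answer key arrives as the first record (just as 1ANSWER sorts first) and that each record has an items array of strings. The field names are assumptions.

```javascript
// Score every record after the first against the first record's answers.
// Each item scores 1 if it matches the key exactly, otherwise 0.
function scoreResponses(records) {
  const [key, ...students] = records;
  return students.map(s => {
    const scored = key.items.map((ans, i) => (s.items[i] === ans ? 1 : 0));
    const total = scored.reduce((sum, x) => sum + x, 0);
    return { username: s.username, scored, total };
  });
}
```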

Since this is part of a two-year grant and I am going to receive these same test data sets many times, I am now finished with reading in and scoring the data for the next two years. After this, I just need to import the excel file and click run. I am happy.

I am also curious, because I noticed that this year’s pre-test scores are 1.5 standard deviations higher than the previous year’s. I suspect this is because we have many more fifth-graders in this sample. So … with the scoring done automatically, I can now go on to interesting stuff.

If you want to check out the game these results came from, you can read about it here

 

I read a blog post where the author said the women who dropped out of programming “should have been discouraged” because it’s not for everyone and many women try to use smiles and flattery to get men to do their work for them.

I actually have had the experience the author cites, but with both men and women. It’s true there are some people in the tech field who are very introverted or socially inept. They are willing to help you with your technical problems if you will just stop by and have a cup of coffee and chat with them.

I’m not that person. I have a husband and four daughters. Interestingly, The Invisible Developer, who is so introverted as to be never seen in public, is also not that person. He has me and the aforementioned four daughters. That is enough for him.

Clearly, people who want you to do their work for them are annoying; however, I haven’t found them to be limited to one gender at all. Lately, I’ve been wondering whether, in SOME cases, they are like that because they don’t believe they can learn to do it themselves. I don’t know the answer to that.

What I do know, though, is that over the years I have seen many people succeed in areas where I would not have given them a chance. Two very fine physicians that I know didn’t attend the best high schools or have the best grades as undergraduates, and honestly, I didn’t think they had a prayer of getting into medical school, much less succeeding. Neither got accepted to medical school the first year they tried. People I would not have given a prayer in hell of becoming elite athletes have often gone on to surprise me, including a couple who won Olympic medals.

Life discourages people enough. Don’t add to it!

That advice is particularly true for programming. The last couple of days have been discouraging. We had our next install almost ready and then I found some bugs in it. Then we thought it was done, and I found some more bugs in it.

Yes, he's a man and we work together to fix stuff

The Invisible Developer is upstairs fixing those and testing the latest version. I am downstairs fixing his code on the next game (so much for women wanting men to do their work for them, and he is definitely a man. I can point to the fact of having collaborated in producing The Spoiled One as irrefutable proof of said manliness. Photograph attached.). Actually, he’s brilliant and totally capable of fixing it himself, but he was already working on the other game.

Everyone’s code, if it is the least bit complicated, is going to have bugs in it. Sometimes it can take you days to find them.

Some days we succeed in writing quizzes where students can drag and drop answers, video clips with sound and animation play in response to correct answers with dialogue in English and Dakota, and then the student is transported back to a 3-D virtual world to continue playing.

Other days, nothing happens. Just nothing. There are no errors in our consoles, just a screen looking obstinately back at us refusing to do what it’s supposed to do.

Programming is discouraging some days on its own and the LAST thing you need those days is someone saying,

“Maybe you’re just not cut out for this.”

I was complaining about how today had just not been productive, that I wanted to have the latest fixes on Spirit Lake in the hands of the teachers today but it wasn’t saving the game state frequently enough. While The Invisible Developer worked on that I found that some of the quizzes in the next build of Fish Lake were telling the student the answer was wrong even when it was right.

The Spoiled One said,

“Don’t worry, Mom. You’ll figure it out. You have time. Life is long.”

You know what? She was right. We figured it all out today. People should be encouraged. I’m proud that she has figured this out at not-quite-sixteen.

One of the many questions on start-up accelerator applications that make me go “Hmm” is this one:

How many lines of code have you written?

I have heard of, but thankfully never worked at, organizations that evaluated their technical staff by the lines of code written.

Let me give you two stories that illustrate why this is a bad measure.


Once upon a time ….

Many years ago, I worked at an organization that decided the programming staff was overpaid and generally had a bad attitude. (No, this wasn’t due solely to me. In fact, unbelievably, I was one of the easier to get along with people on the technical staff).  So … they hired some people at low salaries who had, I believe, a three-month training course in SAS. Most of the senior people avoided the cube farm where these new hires were housed, believing that it would be apparent soon enough that you get what you pay for.

I would generally come in around 10:30 or 11 and leave the office around 8 pm. I couldn’t help but notice several times that some of these new programmers were still there when I left. Leaving one evening, I saw one woman in tears in her cubicle, so I stopped and asked what was the matter. She said she had come into the office at 6 a.m. and was still waiting for her program to run. I sat down with her and looked at her program, which was a simple thing to create a few total and subtotal scores and get statistics on these by state. Her program looked like this:

 

LIBNAME in "directory";

Data Alabama ;
set in.us ;
If var1 = . then var1 = 0 ;
If var2 = . then var2 = 0 ;
If var3 = . then var3 = 0 ;
Total = var1 + var2 + var3;
If state= "Alabama" ;

run;
Proc means data = alabama ;
var total ;
run;

REPEATED 51 TIMES (50 states + Washington, DC) for a total of 562 lines of code (there is only one LIBNAME statement).

The reason it was taking so long is that she was reading in this dataset with millions of records 51 times. There are many ways this could be fixed. Since I was on my way home, I sat down and did this.

libname mydata "directory" ;
data test ;
set mydata.us ;
total = sum(var1,var2,var3) ;
keep total state ;

Proc tabulate data= test ;
class state ;
var total ;
Table state , (total*(n*f=comma12.0 (mean std)*f=comma8.2)) ;
run ;

My program was 10 lines, read the dataset in once and produced a nicely formatted table.
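The point of the rewrite is to read the data once and accumulate every state’s statistics in that single pass. Here is a JavaScript sketch of the same idea. It assumes rows are objects with state, var1, var2 and var3. Missing values (null or undefined) count as 0, as in the recodes in the original program. It uses Welford’s running-variance update and the n - 1 divisor, which is SAS’s default for STD.

```javascript
// One pass over the rows: n, mean and standard deviation of
// total = var1 + var2 + var3 for every state.
function statsByState(rows) {
  const val = v => (v === null || v === undefined ? 0 : v); // missing -> 0
  const acc = new Map();
  for (const r of rows) {
    const total = val(r.var1) + val(r.var2) + val(r.var3);
    const s = acc.get(r.state) || { n: 0, mean: 0, m2: 0 };
    // Welford's update: running mean and sum of squared deviations
    s.n += 1;
    const delta = total - s.mean;
    s.mean += delta / s.n;
    s.m2 += delta * (total - s.mean);
    acc.set(r.state, s);
  }
  const result = {};
  for (const [state, s] of acc) {
    // sample standard deviation (n - 1 divisor); undefined for a single row
    const std = s.n > 1 ? Math.sqrt(s.m2 / (s.n - 1)) : NaN;
    result[state] = { n: s.n, mean: s.mean, std };
  }
  return result;
}
```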

So, was she 60 times more productive? I don’t think so.

Story number two happened in the last week. I have been working on improving our two games, Spirit Lake and, particularly, Fish Lake. A major improvement has been merging multiple scripts into one.

Here is what we did with our prototype, since we had to meet a deadline:

  • Wrote a script to handle multiple choice tests.
  • Wrote another script to handle tests that had an integer or decimal answer.
  • Wrote a third script to handle tests that had a fraction as an answer, like 4/5, to be sure it also accepted 8/10, etc.
  • Wrote a fourth script to handle tests where the answer was dragged and dropped.

etc.

Now obviously, de-bugging would be simpler if we have only one or two scripts. So, this week, I have been taking a couple of scripts and making them more generalizable and deleting many others.
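As an illustration of what a merged script can look like, one function can branch on the question type. This is a hypothetical sketch, not the actual code in Spirit Lake or Fish Lake. The fraction case accepts equivalent answers by cross-multiplying, so 8/10 matches 4/5. Drag-and-drop checking is left out because it depends on how the page reports what was dropped.

```javascript
// Parse "a/b" into [a, b]; returns null for anything else or a zero denominator.
function parseFraction(text) {
  const m = /^\s*(-?\d+)\s*\/\s*(\d+)\s*$/.exec(text);
  if (!m || Number(m[2]) === 0) return null;
  return [Number(m[1]), Number(m[2])];
}

// One checker for several question types (type names are assumptions).
function checkAnswer(type, given, correct) {
  switch (type) {
    case "choice": // multiple choice: ignore case and surrounding spaces
      return given.trim().toUpperCase() === correct.trim().toUpperCase();
    case "number": // integer or decimal: 3.50 matches 3.5
      return given.trim() !== "" && Number(given) === Number(correct);
    case "fraction": { // a/b equals c/d exactly when a*d === c*b
      const g = parseFraction(given);
      const c = parseFraction(correct);
      return g !== null && c !== null && g[0] * c[1] === c[0] * g[1];
    }
    default:
      throw new Error(`Unknown question type: ${type}`);
  }
}
```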

Another thing I’ve done is create a CSS style sheet for each game and link to it from each page, instead of defining the common classes in every page.

 

The number of lines of code in the project has gone DOWN by hundreds, but I think the ease of maintenance and documentation has gone UP.

Now, if you asked me how many lines of code I have written in my life, that might be a relevant question. (True story, I once worked on a job where I did repeated measures ANOVA so many times for so many projects, I got so bored, I started writing statements backward beginning with the semi-colon.)

Well, I better get to bed since it is well past midnight, I have seven teenagers sleeping over at my house and I have to get up in the morning and take them all to Disneyland for The Spoiled One’s birthday.


BTW – You can buy Spirit Lake: The Game here 

The Invisible Developer had commented that I write an awful lot about SAS and maybe I should write about some other language. For Christmas last year, someone gave me an impact.js license, so I made a little game where players drop snares to catch rabbits and collect berries. This doesn’t have much educational value; I was just playing around. I thought it would be amusing to have the food items they collect in the game be equal in value to the number of calories in that item.

If you have Impact and want to do this yourself, here is what you would do.

1. Basic stuff: include game.entities.berry, game.entities.rabbit and any other food items in your main.js script. They go right at the beginning, with any other entities you require.

 

ig.module(
  'game.main'
)
.requires(
  'game.entities.berry',
  'game.entities.rabbit'
  // , ... more stuff ...
)

2. Create the food score in the GameInfo function that stores your game information.

GameInfo = new function(){
  this.food = 0; // caloric value of everything the player has collected
  // ... other stuff you want to initialize ...
};

 

3. When you extend the game to add your own cool stuff include an addFood function

MyGame = ig.Game.extend({

  // ... init and other functions ...

  addFood: function(amt){
    // pick up an item: add its caloric value to the food score
    GameInfo.food += amt;
  },

  // ... draw and other functions ...
});

 

4. To each entity script, add a function that defines how the player gets the food. Here are two examples.

Collecting berries

In the case of the berries, the player will just walk by the bushes and collect the berries. Think Pac Man!

In your berry.js file add a check function like this

EntityBerry = ig.Entity.extend({

  // ... other stuff (check() is only called for entities matching checkAgainst) ...

  check: function(other){
    if (other.name == 'player'){
      ig.game.addFood(5);
      this.kill();
    }
  }
});

So …. it is about 5 calories per berry. When the player walks by a bush and comes into contact with a berry (picks the berry), the berry disappears and the player’s food count goes up by 5.

Snaring rabbits

Here is a second example. In this one, players drop snares around the virtual woods, and when they snare a rabbit they get 1,000 points, which is the approximate calorie content of a dressed rabbit, according to the USDA Nutrient Database. I assumed this yielded an average of 2 pounds of meat.

For my rabbit I have extended the rabbit.js script as follows

EntityRabbit = ig.Entity.extend({

  // ... other stuff ...

  kill: function(){
    ig.game.addFood(1000); // a rabbit is worth about 1,000 calories
    this.parent();         // then do the normal kill
  }
});

But what is going to kill my rabbits? The snares, of course, so I added this into my snare.js script

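EntitySnare = ig.Entity.extend({

  // ... other stuff ...

  check: function(other){
    if (other.name == 'rabbit'){
      // 100 damage kills the rabbit; its kill() already adds the 1,000 calories,
      // so calling addFood() here too would count the rabbit twice
      other.receiveDamage(100, this);
    }
  }
});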

Since the rabbit only has 100 health points, that kills it off so your rabbit disappears and your food value goes up by 1,000.

As you can see, you could easily add shooting deer, buffalo and other food in the same way.
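One way to make that even easier is to keep the calorie values in a single table keyed by entity name. This is a plain JavaScript sketch that doesn’t depend on Impact. The names foodCalories, gameInfo and collectFood are hypothetical, and only the two values from this post are filled in.

```javascript
// Calorie value of each food source, keyed by entity name.
const foodCalories = {
  berry: 5,     // about 5 calories per berry, as above
  rabbit: 1000, // approximate calories of a dressed rabbit, as above
};

const gameInfo = { food: 0 };

// Add the item's calories to the food score; returns false for non-food names.
function collectFood(entityName) {
  const calories = foodCalories[entityName];
  if (calories === undefined) return false;
  gameInfo.food += calories;
  return true;
}
```

In the Impact version, the berry’s check() and the rabbit’s kill() could then look up their own values in the table instead of hard-coding 5 and 1000.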

——————————————–

After I had played around with this for a bit, I thought it was a waste to just trash it, so I put it into our upcoming game, Fish Lake, in between levels. When players finish Level 3, they play this game and then go on to Level 4. Our main game is 3-D; this is just a little interlude. I like to throw surprises into the game so kids like it and keep playing.

—————————————-

Someone in Los Angeles was very upset by our Spirit Lake game where players shoot wolves and buffalo. She said she just could not kill animals. (The Invisible Developer asked me if she was aware that they were virtual animals and not real.) I told her that our games are based on Native American history and history is what happened, not what you think should have happened or wanted to happen. In fact, there is a very touching story in Fish Lake narrated by Debbie Gourneau of the Turtle Mountain reservation on how many people died of starvation and how many more would have died were it not for the jackrabbits.

——————————

Click here to get Spirit Lake: The Game for $9.99

 

P.S. The amount of information produced by the USDA is nothing short of amazing, and I don’t say that just because they funded our grant. They really are incredible.

 

 

From the random file: I’ve been super-busy working on our new startup, 7 Generation Games, and Darling Daughter Number Three had to defend her world title again, which distracted me a bit. So I have a bunch of half-written posts that I thought I’d just put up at random, for the same reason I do everything else on this blog: for the hell of it.

902q798q467453q965pq86-34q9e’w5wi34ytrsghsf.ksfbcmn  - random!

I spend some time playing with other people’s data for a whole lot of reasons – for students to analyze as a learning experience, because I’m interested in a problem addressed by the data, to create presentations for elementary schoolchildren showing what one can learn from statistics.

Here are a few tips that may make your life easier:

Read the user’s guide. Most of all, check whether this is a random sample. If you are just using the data to teach your students how to compute a t-test, then it really doesn’t matter whether it is a completely random sample or not. However, if you are going to draw any conclusions based on these results, make sure you know whether the data should be weighted or stratified, or really shouldn’t be used to generalize to the population at all. If your sample consists of actuaries who are also equestrian competitors, I’m afraid not too much generalization should occur. (Don’t write and tell me about your horse, Beau, and how the two of you are exactly representative of the state of Vermont. You’re not and I don’t care anyway.)

Much of the open data I work with consists of very large data sets, and I spend several hours trying to get a feel for the data before I do much with it. If I’m going to use the same data set for a course with a lot of students, I’d like it to have lots of variables, many of them numeric, so the students can combine them into scales, do a factor analysis or put them to other quantitative uses without all ending up using the same few numeric variables. They can have a little individuality in their research question and design.

Here is one way to find the number of numeric variables in a data set using SAS:

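data testmiss ;
set in._500family (obs=1) ; /* one record is enough to count the variables */
array allnums {*} _numeric_ ; /* x is defined after this, so it is not counted */
x = dim(allnums) ;
keep x ;
run ;
proc means data = testmiss ;
var x ;
run ;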

 ++ Equally Random +++


If you buy the beta for Spirit Lake now for $9.99 you’ll get our version 2.0 for free in May. It will be good.  I’ve been working on the newest game, Fish Lake for the last two weeks, but soon I’m going to swap with The Invisible Developer and do nothing but work on Spirit Lake for another few weeks.

 


There are multiple reasons that I haven’t gotten around to Day 10 of the 20-day blogging challenge. Part of it is that I have been really busy, and the other part is that I read this topic:

“Share ideas that your classroom uses for brain breaks and/or indoor recess”

and I thought

I got nothin’

Anyone who knows me well can tell you that I am NOT a very fun person. I like to think that I have some good qualities, but playfulness is not among them. Ph.D., world champion, founded or co-founded a few companies, publishes scientific articles: does this sound like I spend a lot of time playing frisbee in the park? No, I didn’t think so. About the closest I come to this in class is on the first day, having everyone introduce themselves and talk about their research interests, which is not really very close, I must admit.

For the last SAS assignment of the Public Health Research Methods course, I decided to make a video and upload it to YouTube. For one of the dependent variables, I used how often in a year a person engaged in binge drinking, defined as 4 or more drinks per day. I’ve probably had four drinks in a day a few times in my LIFE, so I was surprised to find that respondents (over 40,000 of them) reported doing this 2.4 times per year on average.

Today has been a really frustrating day. Yesterday, after a margarita at dinner, I came home and was working on our newest game, Fish Lake, and everything was progressing smoothly. Today, for both The Invisible Developer and me, it has been just beating our heads against the wall. For example, I have this PHP script that ran intermittently today (it wrote three records to the database), and the rest of the time it failed with an error. The I.D. has been having similar problems.

I took a break and made a video on how to do simple statistics with SAS to test the hypothesis that I could do a screen recording with Quicktime, write a program using SAS On-Demand in Firefox, record the audio in Garageband and drink Chardonnay all at the same time because Von’s had a half-price sale on wine over $20 a bottle and, well, you know – science.

You can determine if my hypothesis – whatever the hell it was – was supported. Bizarrely, the equals signs do not show up in the video. How weird is that?
