I’m never going to understand the idea that start-ups are for young people. It is like the ads you see where people want a 25-year-old with 15 years of experience, you know,

“Expert in C++, systems administration, Linux, Windows, SAS, R, Hadoop, Ruby, Python and Java. Must have 5-plus years experience in development for mobile platforms.”

That’s where you get all this nonsense with people claiming to have been programming since they were nine. That’s funny because I have seen plenty of nine-year-old boys, usually on skateboards, and none of them were programming.

If you really do have experience programming in multiple languages, at least within your group, if not you personally, it’s a big advantage because some languages are better at certain tasks than others.  It’s not that you can’t do pretty much anything in any language. Hell, I’m sure I could do structural equation models using Excel if I wanted to badly enough. It’s just more effort.

For example, our game is going great in terms of collecting data, but there is one slight problem. The problem is that we want to track the time students in each class spend playing it, the correct and incorrect answers, and a lot more.  When students take pop-up quizzes their username is captured, but there were some parts in the game where that was not happening. This is a beta-test after all. We give the school the update in a week or so that will have this fix in it. What to do with the first three weeks of data that we have collected?

Simple – there is a PHP script that writes the data to a MySQL database. One of the options is to download the data as an OpenOffice spreadsheet. I could import that into SAS and it has a field with the timestamp. Since we know what hours each class is scheduled to use the game, I assigned the records to a class based on time of day and am now whipping out a nice little PDF report for the teachers and administration. I’m also merging that with other data that includes the username, grade, class and scores on the quizzes so we can give each teacher and the administration a complete picture of how the students are doing.

Even easier, though, was to download it as a csv file and do some of the simple data things – typing, variable names, breaking the long-ass string into columns – in SPSS. That was especially easy since my main computer is a Mac (I do have Boot Camp on it and a Windows machine on the other desk behind me). I then output the file as a SAS dataset and away I went.
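A minimal sketch of the time-of-day assignment described above, not the actual program. The data set name (gamedata), the datetime variable (ts), the class labels and the class hours are all made-up placeholders; the real schedule would go in their place.

```sas
/* Sketch only: gamedata, ts, class and the hours below are hypothetical */
data by_class ;
   set gamedata ;                /* one record per logged game event       */
   length class $ 12 ;
   hr = hour(ts) ;               /* ts is assumed to be a SAS datetime     */
   if hr = 9 then class = "Class A" ;
   else if hr = 10 then class = "Class B" ;
   else class = "Unscheduled" ;  /* anything outside the scheduled hours   */
run ;
```

Records that fall outside every scheduled hour are labeled rather than dropped, so they can be checked by hand before the report goes out.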

(Side note: Although the SAS Web Editor for academics is a great stride in the right direction, I still think they need to get moving with a Mac native version.)

Then there are all the little things, like adding the narrative in Dakota,  editing movies, the 3-D programming, the 2-D animation, Javascript for the logic and some of the auxiliary games, etc. etc.

Our game is not an “app”.  It is not something we did in a week with an SDK. While there is no question that a lot of start-ups are done by people who are very young, there is also the fact that a huge percentage of them fail. Something that only uses one language or one area of expertise (and how many things CAN you be an expert in at 25 years old?) may be far more easily replicated or replaced than something more complex.

That is right-hecetu

Tasina

Hey, kids, wanna come over after school and study math? The two students we gave this tempting offer stayed until the office closed and we had to leave. It’s been like that all week.

This is what kids playing our new math game look like. I would insert a video but it is not so interesting to watch in some ways. They all wear headphones, because there are usually 14 or 15 kids playing at a time and they quickly get to very different levels. What you see is kids at a computer clicking a mouse, typing numbers, staring at a screen – engrossed. Our scheduled 30-minute sessions today both morphed into hour sessions.

Don’t get the wrong idea – we are coming home with a HUNDRED fixes, small and large. Some are actual problems – the main one being that children naturally want to explore the virtual world and they wander off into the woods and get lost. We need to add more barriers in more places. Most are ideas to make it better – add more hints, add more places to click and get stories on Dakota culture, more supplemental materials for teachers, new levels, more complete compatibility across browsers and operating systems.

Unlike a lot of companies creating educational software, we spend days in classrooms with students using the game. We ask teachers their opinions and then we go back and make those changes. We ask kids what they would like to see. We also look at the Common Core Standards and the state standards. Let me say something about state standards that is probably not a revelation to anyone very familiar with them – the percentage of kids who meet state standards varies WILDLY by socioeconomic status. The proportion of kids who can divide a three-digit number by a one-digit number is vastly different depending on the average income of your zip code.

We intend to change that.

Yesterday, I mentioned this problem

For 17 girls diagnosed with anorexia, weight change after family therapy was as follows:

11,11, 6, 9, 14, -3, 0, 7, 22, -5 , -4, 13, 13, 9, 4 , 6, 11

Partial results are shown below. Fill in the missing results:

And we had gotten the table completed as far as this. We also along the way found out that the mean was 7.29

Lower C.L.   Upper C.L.   t-value   df   2-tail Sig
3.60         10.98        ____      16   .0007

#1 CHILL !

I mean this most seriously. This really is the first step.

#2 UNDERSTAND!

What is it you are asked to do in the problem? All that is left is to find the t-value. Here is where several of the students went wrong. So many of them went wrong I would have thought they had cheated, but they were sitting all around the room. Barring some secret hand signals, that was not possible.

Many of the students obtained a value of around 2.12, which is very much NOT correct.  I was confused and then I realized that while *I* knew that the problem was asking for the obtained t-value, what the students had computed was the critical t-value with 16 degrees of freedom. The problem did not specify and the textbook author, like me, just assumed that you would know that the value shown on a print-out was the obtained t-value, not the critical t-value.

Well, sure you would know that if, like me, and no doubt like the author of the textbook, you had been looking at printouts from statistical programs for the past 30 years. These students could not be expected to know that, so I ended up giving them full credit if that is what they answered.

What you should know now

  • The t-value referenced in the print-out is the OBTAINED t, not the critical t-value for that number of degrees of freedom.
  • The formula for obtaining t  is (obtained mean – hypothesized mean)/ standard error
  • Your hypothesized mean is 0
  • Your obtained mean is 7.29
  • The standard error is the standard deviation divided by the square root of N
  • The critical value for t for 16 degrees of freedom when p < .05 is 2.12
  • The lower confidence limit is the mean MINUS the CRITICAL t times the standard error
  • The lower confidence limit is 3.6
  • The difference between the mean and the lower confidence limit is 3.69
  • The standard deviation is the square root of the sum of squared deviations from the mean divided by n -1

#3 SELECT A STRATEGY

There are a number of ways to find the t-value. All involve subtracting the hypothesized mean from the obtained mean and dividing by the standard error. Some ways are harder than others. You could compute the standard deviation and divide by the square root of N but that is a lot of work. Here is what I think is the easiest way

  • Divide 3.69 by 2.12  — that will give us the standard error
  • Subtract 0 from 7.29
  • Divide 7.29 by  the standard error

In this case, it was this step and the previous one where people ran into trouble. What is interesting is that they did not realize what they DIDN’T understand. That is, they didn’t understand that the t-value they were expected to produce was the obtained t-value, not the critical t-value.

You could (and many people did) compute the standard deviation, then divide it by the square root of N to get the standard error and it would give the correct answer, but it just seems more work than dividing 3.69 by 2.12.

#4 DO IT

Carry out your strategy.

  • 3.69/ 2.12   —- The standard error is 1.74
  • 7.29 -0  = 7.29
  • Divide 7.29 by 1.74 = 4.19
That’s your answer. As in the previous example, the actual doing it part is pretty easy.
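The same shortcut can be sketched in a DATA step, assuming nothing beyond the numbers already given in the problem. The variable names are placeholders; TINV is the SAS function that returns the critical t, so there is no need to look up 2.12 in a table.

```sas
/* Sketch only: variable names are placeholders for the numbers in the text */
data _null_ ;
   mean  = 7.29 ;                    /* obtained mean                     */
   lower = 3.60 ;                    /* lower confidence limit            */
   tcrit = tinv(.975, 16) ;          /* two-tailed critical t, about 2.12 */
   se    = (mean - lower) / tcrit ;  /* standard error                    */
   t     = (mean - 0) / se ;         /* obtained t, hypothesized mean = 0 */
   put tcrit= se= t= ;               /* write the three values to the log */
run ;
```

The value of t written to the log should land close to the 4.19 worked out by hand; any small difference comes from rounding 2.12 and 1.74.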

#5 TEST IT

Do a reality check. No one in the class asked which t-value it should be and it never occurred to me that people would not automatically know that it was the obtained t-value that is of interest. I mean, seriously, what’s the purpose of doing a study to find a critical value of t that was established a hundred years ago? I’m not surprised though, that people who are not experienced statisticians don’t immediately think of that. Probably a lot of what statisticians do doesn’t seem very obvious so maybe it’s just another of those weird things.

So, I guess it is up to me on Thursday to explain to the class that you have a critical value for a test statistic and an obtained value.

A lot of #5 comes from experience. For example, immediately, when I saw t-values of around 2 that the students had obtained, I thought that can’t be right, because even with 17 people, 7 pounds is pretty far from 0, it seemed like it ought to be significant.

So …. this brings me to number 6

#6 PRACTICE

The more problems you do, the better you get at solving them. People often get the impression that people who are good at math have some kind of special math brain. It’s not true. If you are telling yourself that you are just not good at math, cut it out right now before I come over there and smack you.

I married a rocket scientist – literally – someone whose idea of the way to a woman’s heart was to write a program to generate fractals and email her a pink fractal for Valentine’s Day. It worked, too.  And yet — I can guarantee you that he, and I, both ran into the same obstacles in learning mathematics that anyone else does. The only difference between us and our friends who quit school and ended up working at Wal-Mart is that we spent hours and hours and hours learning programming, statistics(well, I learned the statistics), Calculus, Physics (well, he learned the physics).

Last week, more than one student said to me, with some frustration:

“Dr. De Mars, I studied for HOURS for this class.”

Yes!

I was grading the quizzes from my Advanced Quantitative Data Analysis class. This is a class of really smart people in a doctoral program at a selective university. And yet, some of them still had problems with the quiz. Therefore, in however many parts I feel like doing, I am going to discuss how to solve any statistics problem.

#1 CHILL !

I mean this most seriously. Often, I see people make mistakes because they panic, think they can’t do it, underestimate themselves and think, “The problem cannot be that easy”.

Here is an example:

For 17 girls diagnosed with anorexia, weight change after family therapy was as follows:

11,11, 6, 9, 14, -3, 0, 7, 22, -5 , -4, 13, 13, 9, 4 , 6, 11

Partial results are shown below. Fill in the missing results:

 

Lower C.L.   Upper C.L.   t-value   df     2-tail Sig
3.60         ____         ____      ____   .0007

#2 UNDERSTAND!

What is it you are asked to do in the problem? You need to find the upper confidence limit for the mean, the t-value and the degrees of freedom.

What are the degrees of freedom for a t-test?

“A single sample: There are n observations. There’s one parameter (the mean) that needs to be estimated. That leaves n-1 degrees of freedom for estimating variability.”

The degrees of freedom when you are estimating the mean with one sample are N-1, or 17-1, which is 16.

To understand a problem, look at the numbers you DO have.

  • You have the lower confidence limit.
  • You have all of the individual scores
  • You know the number of scores (17)

Think about what you DO know (or can look up in a textbook)

  • The mean is the sum of the scores divided by the number of scores
  • The lower confidence limit is the obtained mean MINUS (t * standard error).
  • The UPPER confidence limit is the obtained mean PLUS (t * standard error).

 

#3 SELECT A STRATEGY

There are a number of ways to find the upper confidence limit but all involve adding the value of (t*standard error)  to the mean. With what you have from #2, I’d think the easiest strategy is

  • Find what the mean is
  • Find the difference between the lower confidence limit and the mean
  • Add that number to the mean

This is often the step where people have trouble. I think it comes from three missteps. One is that they are too stressed out. The second is they don’t relax a minute and think about what they DO know first. The third is that they don’t relax a minute and think about what is the right strategy. In short, I think most people (and I am as guilty of this as anyone) don’t spend enough time on the first three steps before jumping right to number four.

#4 DO IT

Carry out your strategy.

  • The mean is 7.29
  • 7.29 -3.6 = 3.69
  • Add 3.69 to 7.29  to get 10.98

That’s your answer.

#5 TEST IT

Do a reality check. The mean is 7.29 . If it doesn’t fall between your upper and lower confidence limits, you did something wrong.
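One way to run that reality check is to let SAS produce the whole print-out from the raw scores. This is a sketch: the data set name (anorexia) and the variable name (change) are labels picked for the example, not anything from the textbook.

```sas
/* Sketch only: anorexia and change are hypothetical names */
data anorexia ;
   input change @@ ;      /* @@ reads several scores from one line */
   datalines ;
11 11 6 9 14 -3 0 7 22 -5 -4 13 13 9 4 6 11
;
run ;

/* Mean, standard error, 95% confidence limits, t and its p-value */
proc means data = anorexia n mean stderr lclm uclm t prt alpha = .05 ;
   var change ;
run ;
```

The limits should come out close to the hand-calculated 3.60 and 10.98, with the mean of 7.29 sitting between them. Small differences in the last decimal place are rounding.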

Check back tomorrow for further proof that these steps can be applied to any statistics problem (and any math problem – maybe any problem in life. )

Twenty-eight years ago, I won the world judo championships. Unlike almost everyone else who accomplishes that feat, I did NOT go into running a judo school, selling martial arts supplies, or, more recently, mixed martial arts.

Photo courtesy of Hans Gutknecht of the Los Angeles Daily News

On the contrary, I immediately went into a doctoral program at the University of California, where I specialized in Applied Statistics and Psychometrics. After several years as a professor, I went into the consulting business full-time.

So, was that 14 years of competing and training a waste, as far as my career is concerned? I would say no. Thinking about it lately, I see some important lessons I learned from martial arts.

1. To succeed, you don’t need to be like the other successful people. No one is a less likely world judo champion than me. I wasn’t Japanese, I had no money, and oh, yes, I wasn’t male. The first women’s world championships were still eight years in the future when I started judo. Japan has the largest number of international medalists, followed by France and the former Soviet Union. I never even had an instructor from any of those countries.

Of course, this has been an extraordinarily useful lesson. Being female, over 50 and Hispanic, and having not only not dropped out of an Ivy League school but actually graduated with a Ph.D., no one looks less like Silicon Valley than me.

2. To succeed you don’t need to be in “the right place” or with “the right people”. It is NOT all who you know. I started at Alton YMCA in the middle of nowhere, Illinois, and trained there my first few years. While many people travel to Japan or Europe to train 20 or 30 times, I went to Japan once, for my junior year abroad, because it was all I could afford. While I was there, Margot Sathay taught me. She was, at the time, the highest ranking non-Asian woman in the world. No one else wanted to bother with me.

This is, of course, a good thing, because I am not in Silicon Valley, Boston or New York City. I’m in Los Angeles and don’t intend to move because I like my life.

3. Success requires effort. Amazing success requires amazing effort. I work 7 days a week. My “off-days” I only work six hours or so. Other days, I usually work from 11 a.m. to 2 a.m. with an hour or so break for lunch or dinner.

4. Working hard doesn’t mean not enjoying life. Just like when I was competing in judo, I am enjoying what I am doing very much, so it is not all that difficult. One side benefit from all of those years of training – I never got into the habit of watching TV. I watch maybe 4 hours of TV a week – frees up a whole lot of time compared to the typical American’s schedule. Half the time I’m watching TV I’m probably riding the exercise bike in the living room at the same time.

5. Focus on what you can do, not what you can’t. I tore the ligaments and cartilage in my right knee in an accident when I was 17 years old. Knee replacements and even arthroscopic surgery were years in the future. Any reasonable person would have quit competing. Instead, I focused on being best in the world at matwork and won international tournaments on four continents. I couldn’t train with top athletes twice a day because I had a job as an industrial engineer. I couldn’t quit my job because I was a divorced mom with a baby that needed stuff. So, I got up every morning, ran or lifted weights, went out on my lunch hour and ran or lifted weights and then did judo after work every night.

I have a lot of advantages right now. We have every possible piece of hardware and software for development and testing. After years in business, we have a stable customer base and a sizable 401k. I can afford to invest a lot of my unpaid time and company funds in design and development. Yes, some of those customers mean I can’t always spend as much time on development as I want. On the flip side, that gives us money and stuff. We’re not part of any incubator, accelerator or co-working arrangement, but The Rocket Scientist, the person who, when we were dating, my research assistants referred to as “Computer God”, is right upstairs.

6. Persistence is probably the biggest lesson I learned from martial arts. When I was 12 years old, there were thousands of kids in this country as good at judo as me, if not much better. When I was 16 years old, there were probably hundreds of kids in this country as good as me, if not better. When I was 21 years old, there were certainly less than 100 people in this country as good at judo as me. When I was 26 years old, I was the best in the world. The biggest difference between me and the thousands of other kids is that I just kept at it.

The first draft of the game was pretty ragged – just like my technique when I started judo. BUT – it gave our team something to work with. For the last few weeks, we have all been working on fixing every part of it, changing the intro, making all the graphics the same size, creating a theme for the web pages, adding levels to the 3-D portion, adding a sound track – adding, changing, fixing.

It did not look very good at all when we began, but I have learned that how you look at the beginning, or even in the middle, is not the important part. It’s how you end up.

Several times on this blog, I have mentioned that the most common errors I make, and most programmers* make, are the simple things like typos or forgetting to close a bracket or tag. Many of those errors are now automatically fixed by the intellisense of various IDEs (Integrated Development Environments), like WebStorm, but they still pop up.

Well, as I have said before (after nearly five years, almost everything on this blog I have said before), I started writing this blog because I wanted to remember solutions to problems for when I had the same problem six months later and was in a different state using a different computer at some random hotel. This is one of those problems.

We are developing a game that uses mostly javascript but since it runs on the web there is also html involved and some PHP (not written by me) that is used for writing the data from the game to our database.

Everything was working until Friday when all of a sudden the input forms quit inputting. I was able to look at the files on the server and see that a few of them had been changed on Friday. Obvious suspect, no?

Unfortunately, I had updated my local machine using the other person’s code from the server, so I had the non-working files on my machine also.

Of course, I had back-ups on my local machine. So, I copied those files over the newer ones on my local machine. This should bring me back to the pre-update state, where everything was working, yes? No.

How could this be? Files on local machine worked. Copied over by files on server. Copied original files that worked back on local machine (yes, they really were the correct files) and they did not work. How can that be?

Well, I gave it away in the title of this post. One of the javascript files had the ABSOLUTE path to the PHP file, not the relative path. So, when I ran it on my local machine, it accessed the non-working, newer, file on the server.

A billion thank-yous to the wonderful tech support staff at pair.com , our web host, who helped me figure this out on a Saturday night. They could see that there was an attempt to connect to the database, and so we knew that the javascript was working and it must be something in the PHP file. At which point I said,

“I can see the error in the PHP file on the remote server but that would only be a problem if the javascript had that file name coded in to call instead of a relative path …”

and then a lightbulb went off and it was all fixed in a minute.

Take away message – if things aren’t working in any way that makes sense, perhaps you are looking at the wrong files.

* The Rocket Scientist says that I should not call myself a programmer because programmers are lower-level and it is only a slightly more impressive title than code monkey. I am ignoring him for now, but if anyone has suggestions for new titles, I am listening. My business cards say “President”,  which last week led a child just old enough to read, but not old enough to follow politics obviously, to ask me if I was the president of the country. He was very disappointed when I said no.

 

I’ve been somewhat of a fan of SAS On-Demand for Academics, but there are two problems. One is that it runs slow and the other is that it doesn’t run native on a Mac at all. Enter the SAS Web Editor.

I just started using it yesterday and so far, I love it. It is in beta, so if you are interested, I suggest you contact the people in the academic division of SAS and get right on it.

One glitch so far to be aware of -

If you re-load the page, it reloads everything. I wanted to clear the results, and when I reloaded the page, everything was cleared, including the code. Since I was just messing around it was no big deal, but if it was your assignment, you might be unhappy.

Solution: Save your code often

This is something you just ought to be doing anyway, anywhere, as a good habit.

Here is my sample code

/* Enter your code here */
data test ;
input group $ wt ;
cards ;
aa 207
ca 55
;
proc freq ;
tables group / binomial (p=.422 ac) ;
weight wt ;
run;

and here is a link to my output which I just downloaded by clicking on the download button. The code above tests whether the proportion of African-Americans stopped in 262 routine traffic stops is different than the population proportion of 42.2%. Throw in confidence intervals for good measure.

As I said, I’ve only been using this for less than 24 hours and I started with solving statistics homework problems. So far, it is awesome. Whether it will be equally awesome once I upload some data sets and try to run more complex statistics, who knows. So far, though

  • Runs on a Mac – check
  • Can save programs on-line and access them later – check
  • Fast response time  – check
  • Does basic statistics – check
  • Produces presentation-quality output that can be easily downloaded – check

 

For reasons I may explain later – or maybe not – I decided to analyze the TIMSS data, which is Trends in International  Mathematics and Science Study.

Use a colon: Nifty tip #1
*** Ran this first ;

libname LIB 'C:\TIMSS2007\Data' ;

proc contents data = lib.G4_ACHIEVE07 ;
run ;

*** Modified to only keep math items ;
data lib.G4_ACHIEVE07 ;
set lib.G4_ACHIEVE07 ;
drop s: ;
run ;

I was only interested in the mathematics items, not the science ones, and since I did not want 170+ items cluttering up my data set, I used the statement below

DROP S: ;

This statement drops all of the variables beginning with S. You should be careful with this, because if there is a variable with a name like STUDENT_ID, that will be dropped also. This is why I ran the PROC CONTENTS first and verified that all of the science items and only those items began with an S.
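If scrolling through PROC CONTENTS output for 170+ names is not appealing, a query against DICTIONARY.COLUMNS can list just the variables the colon would catch. This is a sketch that assumes the LIB libref and data set name from the code above; note that libname and memname are stored in upper case.

```sas
/* Sketch only: lists every variable that DROP S: would remove */
proc sql ;
   select name
   from dictionary.columns
   where libname = 'LIB'
     and memname = 'G4_ACHIEVE07'
     and upcase(name) like 'S%' ;
quit ;
```

If anything other than a science item shows up in that list, name the variables to drop explicitly instead of using the prefix.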

Nifty tip #2  – use a %INCLUDE statement

It only appears that the point of today’s blog is to include all possible special characters. That is merely a fringe benefit.

The %INCLUDE statement essentially copies and pastes code from another file into your program in the spot where you inserted it. I like it for things like 400 lines of formats because just like I don’t like extra variables cluttering up my data set, I don’t like extraneous lines cluttering up my code. I do need to use the PROC FORMAT but I don’t need to see it every time I run the program and I do not want to store it permanently.

%include "c:\timss2007\programs\achievefmts.sas" ;

Problem solved.

Nifty tip #3  Use the LINESIZE option to see all of your results on one line

I am easily annoyed. If you read my blog often, you know that this has been established. If I have 200 variables and the minimum and maximum do not fit on the same line as the mean because the label is

“This is that question where we asked the student about long division which involves dividing a two- digit number into a three-digit number and has a remainder”

and then you have the mean and need to scroll down 200 lines to see the minimum and another 200 lines to see the maximum, well it’s annoying.

Do this:

OPTIONS LINESIZE = 255 ;

or whatever large number you like. No, I don’t have paper that is that wide, but I’m not planning on printing this out, I just want to scroll through and see that the minimum, maximum, mean and standard deviation are reasonable.

Nifty tip #4 Use a temporary data step to find the number of variables

No, smart ass, PROC CONTENTS would NOT do this. I want to know how many math items there are, not how many total. The math items (now that I deleted the science ones above) are in order. I run this statement, look in my log and it tells me there are 178 variables in the data set.

data test ;
set lib.G4_ACHIEVE07 ;
keep M031106 -- M041191 ;
run ;
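For anyone who would rather not hunt through the log, here is a sketch of an alternative that was not part of the original program: put the same name range in an array and ask DIM how big it is. It assumes every variable in the range is numeric, since an array cannot mix types.

```sas
/* Sketch only: counts the variables between M031106 and M041191 */
data _null_ ;
   set lib.G4_ACHIEVE07 (obs = 1) ;   /* one record is enough        */
   array m{*} M031106 -- M041191 ;    /* positional name range list  */
   n_items = dim(m) ;                 /* number of array elements    */
   put n_items= ;                     /* write the count to the log  */
   stop ;
run ;
```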

Nifty tips #5  and #6 – Create an array and use the VVALUE function to score data

TIMSS has formats (remember the %INCLUDE ) that are things like 98 = “NOT ADMIN.” , 10 = “CORRECT RESPONSE”.

data lib.G4_scored ;
set lib.G4_ACHIEVE07 ;
array ans{*} M031106 -- M041191 ;
array sc{*} $18 tmp1 - tmp178 ;

I created an array of the mathematics items and the ans{*} says to create an array of dimension however many variables there are between M031106 and M041191. The double dashes signify between as in “between locations in the data set” with M031106 coming first. If you use one dash, SAS assumes the variables are numbered M031106, M031107, all the way to M041190, M041191. Which they are not. Do double dashes count as two special characters or only one?

I could have used 178 instead of * since I actually knew there were 178 variables, but I wanted to throw in another special character. Yes, I am immature. That was established long ago. The $18 denotes this as an array of character variables and assigns them all a length of 18 which is the length of the maximum formatted response. Also, a $ is another special character.

do i = 1 to 178 ;
sc{i} = trim(vvalue(ans{i})) ;
if sc{i} in ("INCORRECT RESPONSE", "NOT REACHED", "OMITTED") then ans{i} = 0 ;
else if sc{i} = "CORRECT RESPONSE" then ans{i} = 1 ;
else if sc{i} = "PARTIAL RESPONSE" then ans{i} = .5 ;
else if sc{i} = "NOT ADMIN." then ans{i} = . ;
end ;
drop tmp1 - tmp178 M031002 M031223 ;
run ;

Here I have my handy do-loop and a VVALUE function. You can use VVALUE when you don’t know the variable format, or, as in my case, are too lazy to look it up and type it in. The formatted value of ans{i} , whatever that format might be, is put into sc{i}. I also used the TRIM function to trim trailing blanks while I was at it.

Now that I have scored all of the items to suit my nefarious purposes, I drop the temporary variables as well as two variables that it turned out are questions not administered to anyone.
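To see what VVALUE does without touching the TIMSS files, here is a tiny self-contained sketch. The format name (respfmt) and its two labels are invented for the illustration and are not the TIMSS formats.

```sas
/* Sketch only: respfmt is a made-up format, not one from TIMSS */
proc format ;
   value respfmt 0 = "INCORRECT RESPONSE"
                 1 = "CORRECT RESPONSE" ;
run ;

data _null_ ;
   x = 1 ;
   format x respfmt. ;
   length txt $ 18 ;
   txt = vvalue(x) ;   /* formatted value of x under its current format */
   put x= txt= ;       /* compare the two in the log                    */
run ;
```

The point is that VVALUE never needs to be told the format name. It uses whatever format is attached to the variable at that moment, which is why it works inside a loop over 178 differently formatted items.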

And those are my nifty SAS tips of the night.