I’m sick of that bullshit about not being able to find women in tech: part 2
February 29, 2016
Yesterday, I wrote about Cindy Gallop’s talk on finding diverse talent.
The Rest of the Story …
There were two additional points in her presentation I want to address, but first …
Play it for a few minutes and come back here for the rest of the story.
Did you find yourself saying,
“Yes, but your group of minority/ female developers and artists did not have good enough graphics/ CSS that perfectly centered video/ all of the Spanish language translations done …”
The fact is, I gave you the link to a prototype for a reason. It emphasizes two of the truest points Cindy Gallop makes in her presentation.
We hire men based on their potential but we hire women based on their demonstrated ability to do the work.
Did I mention that the link you reviewed was a prototype? Yes, I did. Ever since we started 7 Generation Games, our start-up arm that is distributing our educational games, we have heard the same refrain from investors.
- We don’t think this idea will work. Come back when you have a prototype
- We don’t think you can make a commercial game for that price. Come back when you have a completed game.
- We don’t think schools will use these games. Come back when you have 1,000 users.
- We don’t think these games will work. Come back when you have data.
- We don’t think there is a market for games that need to be installed on the desktop. Come back when you have a version in the cloud.
- We don’t think there is a market for web-based games. Come back when you have an iPad version.
Are we seeing a pattern here? I’m actually not whining. Well, not whining any more than usual. We’re still here while most of those companies that received funding two or three years ago when we were just starting have since disappeared.
We’ve received over $600,000 in federal grants, and we’ve had two successful crowd-funding campaigns.
We were part of the Boom Startup Ed Tech Accelerator. We just closed our first angel investor round, late in 2015, where we raised $240,000. My point is that we did that MUCH later in the game than I think we would have if we were co-founded by a couple of white or Asian males from Stanford. We don’t look the part of a start-up team.
Funny, I believe my experience as a non-male, non-Japanese competing in judo back in those pre-Title IX days has been great preparation for co-founding a startup. I had 14 years of experience as a competitor with people denying me funding because I wasn’t good enough, didn’t do things right, didn’t run with the right group to get coaching to succeed. Then, I was the first American to win the world judo championships and this weekend I’m getting inducted into the International Sports Hall of Fame.
I actually appreciate the haters and the doubters as they do point out areas we can improve our products and we are continually working on that. We have come very far with relatively little funding for making games and we will go much farther yet.
I’m not sure how much more we have to demonstrate before we attract the attention of
<sarcasm> those accelerators and investors who are looking so-o hard for women-owned startups </sarcasm>
If you’re interested in our desktop games, check out the demos here.
If you are interested in games that run on the web, those are in beta and will be done in a few months. Email info@7generationgames.com if you’d like more information on those.
What you should NOT do is tell me how you are trying so hard to find women in tech to support because I am seriously, seriously tired of hearing that bullshit.
Check back tomorrow for what you really shouldn’t say about women in tech if you don’t want me to slap you.
Seriously, Don’t Say That Shit about Wanting Women in Tech Ever Again
February 28, 2016
First of all, you should all watch this video by the brilliant Cindy Gallop. Everything she says about recruiting women for jobs as Executive Creative Directors applies exactly to women, black, Hispanic or Native American men applying for jobs in technology or for investor funding.
Did you watch it? Good! Let me reinforce one of her points.
- If you do not have diversity in your team or portfolio it is BECAUSE YOU DON’T REALLY WANT IT. If you cannot find women/ Latinos/ Native Americans/ African-Americans it is because you are not looking hard enough.
The last software intern we hired was Native American, which I discovered when her tribal enrollment card was one of the documents she presented on the first day of work. The two software developers we hired before her were both Latino. One of our artists is Native American which I discovered when I said we had hired him in part because we were so impressed with the paintings he did of scenes with Native American subjects and he mentioned that he is Ojibwe.
We found good people by reaching out to the people we knew for recommendations. We posted on our company and personal Facebook pages, posted on our company blog, tweeted on our company and personal accounts. See the number of times I said “personal” in there?
We did not make any major efforts to have a technology company that is 66% minority employees. I gave a presentation on a panel at East Los Angeles College, and we have hired two people from there since.
A couple of our employees were referred by mutual acquaintances who knew them and knew what we needed and forwarded our position announcement.
We aren’t prejudiced against white males any more than I am going to assume that you are prejudiced against African-American women or Latinas. The question is, how many do you know? My best friend is Latino and so, not coincidentally, is his son. We hired his son as art director because his work is a perfect fit for the games we are creating. See below.
If the people in your network are mostly white men, that is probably going to be most of the people you get as applicants.
Try reaching out to people outside of your network.
- Post your internship opportunities at East Los Angeles College
- Contact Sabio coding bootcamp and recruit their graduates
I know there are many, many places you can find diverse talent. Those are just two I thought of off the top of my head, both places from which we have recruited people. I know you have access to some electronic device, since you are reading this. It’s not that hard to find people, if you really want to do it.
Come back tomorrow for “I’m sick of that bullshit about not being able to find women in tech: part 2”
SAS Studio: Finding prevalence with pointing and clicking
February 24, 2016
Policy makers have very good reason for wanting to know how common a condition or disease is. It allows them to plan and budget for treatment facilities, supplies of medication and rehabilitation personnel. There are two broad answers to the question, “How common is condition X?”, prevalence and incidence, and, interestingly, both of these use the exact same SAS procedures. Prevalence is the number of persons with a condition divided by the number in the population. It’s often given per thousand or per 100,000, depending on how common the condition is. Prevalence is often referred to as a snapshot: it’s how many people have a condition at any given time.
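Written as a formula, the definition above looks like this, where the multiplier 10^k is whatever base suits the condition (1,000, 100,000 and so on). For example, a proportion of 0.11 works out to 110 per 1,000.

```latex
\text{prevalence per } 10^{k}
  = \frac{\text{number of persons with the condition at a given time}}
         {\text{number of persons in the population}} \times 10^{k}

\text{e.g. } 0.11 \times 10^{3} = 110 \text{ per } 1{,}000
```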
Just for fun, let’s take a look at how to compute prevalence with SAS Studio.
Step 1: Access your data set
First, assign a libname so that you can access your data. To do that, you create a new SAS program by clicking on the first tab in the top menu and selecting SAS Program.
libname mydata "/courses/number/number/" access=readonly;
(Students only have readonly access to data sets in the course directory. This prevents them from accidentally deleting files shared by the whole class. As a professor with many years of experience, let me just tell you that this is a GREAT idea.)
Click on the little running guy at the top of your screen and, voila, your LIBNAME is assigned and the directory is now available for access.
(Didn’t believe me there is a little running guy that means “run”? Ha!)
Next, in the left window pane, click on Tasks and in the window to the right, click on the icon next to the data field.
From the drop down menu of directories, select the one with your data and then click on the file you need to analyze.
Step 2: Select the statistic that you want and then select the variable. In this case, I selected one-way frequencies, and one cool thing is that SAS will automatically show you ONLY the roles you need for a specific test. If you were doing a two-sample t-test, for example, it would ask for your groups variable and your analysis variable. Since I am doing a one-way frequency, there is only an analysis variable.
When you click on the plus next to Analysis Variables, all of the variables in your data set pop up and you can select which you want to use. Then, click on your little running guy again, and voila again, results.
So … the prevalence of diabetes is about 11% of the ADULT population in California, or about 110 per 1,000.
You can also code it very simply if you would like:
libname mydata "/courses/number/number/" access=readonly;
PROC FREQ DATA = mydata.datasetname ;
TABLE variable ;
RUN ;
Of course, all of this assumes that your data is cleaned and that you have a binary variable coded has disease/ doesn’t have disease, which is a pretty large assumption.
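If that binary variable is coded 1 = has the condition and 0 = does not, the mean of the variable is the prevalence, so you can also get the count, the percentage and the rate per 1,000 in one step. This is a minimal sketch, assuming a hypothetical 0/1 numeric variable named diabetes in the mydata.datasetname placeholder used above; records with a missing value drop out of both the numerator and the denominator.

```sas
/* Hypothetical variable DIABETES: 1 = has the condition, 0 = does not.
   COUNT, SUM and MEAN all skip missing values. */
proc sql;
  select count(diabetes)        as n_with_data,
         sum(diabetes)          as n_cases,
         mean(diabetes)         as prevalence format=percent8.1,
         mean(diabetes) * 1000  as per_1000   format=8.1
    from mydata.datasetname;
quit;
```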
Now, curiously, the code above is the exact SAME code we used to compute incidence of Down syndrome a few weeks ago. What’s up with that and how can you use the exact same code to compute two different statistics?
Patience, my dear. That is a post for another day.
Urban vs Rural Barriers to Ed Tech: An example of Fisher’s Exact Test
February 16, 2016
Who was it that said asking a statistician about sample size is like asking a jeweler about price? If you have to ask, you can’t afford it.
We all know that the validity of a chi-square test is questionable if the expected count in any cell is less than five. Well, what do you do when, as happened to me recently, ALL of your cells have an expected count less than five?
The standard answer might be to collect more data, and we are in the process of that, but having the patience of the average toddler, I wanted that data analyzed NOW because it was very interesting.
It was our hypothesis that rural schools were less likely to face obstacles in installing software than urban schools, due to the extra layers of administrative approval required in the latter (some might call it bureaucracy). On the other hand, we could be wrong (horrors!). Maybe rural schools had more problems because they had more difficulty finding qualified personnel to fill information technology positions. We had data from 17 schools, 9 from urban school districts and 8 from rural districts. To participate in our study, schools had to have a contact person who was willing to attempt to get the software installed on the school computers. This was not a survey asking them whether it would be difficult or how long it would take. We actually wanted them to get software that was not already on their school computers (7 Generation Games) installed. To make sure that cost was not an issue, all 17 schools received donated licenses.
You can see the full results here.
In short, 8 of the 9 urban schools had barriers to installation of the games which delayed their use in the classroom by a median of three months. I say median instead of mean because four of the schools STILL have not been able to get the games installed. The director of one after-school program that wanted to use the games decided it was easier for his program to go out and buy their own computers than to get through all of the layers of district approval to use the school computer labs, so that is what they did.
For the rural schools, 7 out of 8 reported no policy or administrative barriers to installation. The median length of time from when they received the software to installation was two weeks. In two of the schools, the software was installed the day it was received.
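Here is where the small-sample problem bites. With these counts (urban: 8 with barriers, 1 without; rural: 1 with, 7 without; 17 schools in all), the expected count in each cell under independence is the row total times the column total divided by N, and every one of them comes out below five:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad N = 17

E_{\text{urban, barrier}} = \frac{9 \times 9}{17} \approx 4.76 \qquad
E_{\text{urban, none}}    = \frac{9 \times 8}{17} \approx 4.24

E_{\text{rural, barrier}} = \frac{8 \times 9}{17} \approx 4.24 \qquad
E_{\text{rural, none}}    = \frac{8 \times 8}{17} \approx 3.76
```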
Here is a typical comment from an urban school staff member,
“I needed to get it approved by the math coach, and she was all on board. Then I got it approved at the building level. We had new administration this year so it took them a few weeks to get around to it, and then they were all for it. Then it got sent to the district level. Since your games had already been approved by the district, that was just a rubber stamp but it took a few weeks until it got back to us, then we had all of the approvals so we needed to get it installed but the person who had the administrator password had been laid off. Fortunately, I had his phone number and I got it from him. Then, we just needed to find someone who had the spare time to put the game on all of the computers. All told, it took us about three months, which was sad because that was a whole semester lost that the kids could have been playing the games. “
And here is a typical comment from a rural staff member.
“It took me, like, two minutes to get approval. I called the IT guy and he came over and installed it.”
The differences sound pretty dramatic, but are they different from what one would expect by chance, given the small sample size? Since we can’t use a chi-square, we’ll use Fisher’s exact test. Here is the SAS code to do just that:
PROC FREQ DATA = install ;
TABLES rural*install / CHISQ ;
RUN ;
Wait a minute! Isn’t that just a PROC FREQ and a chi-square? How the heck did I get a Fisher’s exact test from that?
Well, it turns out that if you have a 2 x 2 table, SAS automatically computes the Fisher exact test, as well as several others. I told you that you could see the full results here but you didn’t look, now, did you?
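If you only have the counts rather than one record per school, a minimal sketch like this rebuilds the 2 x 2 table from the numbers reported above and lets a WEIGHT statement stand in for the individual records. The data set and variable names (install_counts, district, barrier, count) are made up for illustration, not the ones from the study. The EXACT FISHER statement requests the test explicitly, which is what you would need for a table larger than 2 x 2, where the CHISQ option alone does not produce it.

```sas
/* Illustrative names; counts are the ones reported in this post:
   urban 8 with barriers / 1 without, rural 1 with barriers / 7 without */
data install_counts;
  input district $ barrier $ count;
  datalines;
Urban Yes 8
Urban No 1
Rural Yes 1
Rural No 7
;
run;

proc freq data=install_counts;
  tables district*barrier / chisq;  /* 2 x 2 table: Fisher's exact test is included */
  weight count;                     /* each row stands for COUNT schools */
  exact fisher;                     /* ask for Fisher's exact test explicitly */
run;
```

The two-sided Fisher p-value from this should come out at about .0034.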
You can see the full results here.
In case you still didn’t look, the two-sided p-value from Fisher’s exact test, that is, the probability under the null hypothesis of no difference in administrative barriers between urban and rural districts of getting a table at least as extreme as this one, is .0034.
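As a hand check on that number: with the row and column totals fixed, the number of urban schools with barriers, X, follows a hypergeometric distribution under the null hypothesis, and the observed table has X = 8. The two-sided p-value adds up the probabilities of every table no more likely than the observed one, which here are X = 1, 8 and 9 (X = 2 already has a probability of 288/24310, larger than the observed 72/24310):

```latex
P(X = k) = \frac{\binom{9}{k}\binom{8}{9-k}}{\binom{17}{9}}, \qquad \binom{17}{9} = 24310

P(X = 8) = \frac{9 \times 8}{24310} = \frac{72}{24310} \approx 0.00296 \quad \text{(the observed table)}

P(X = 9) = \frac{1}{24310}, \qquad P(X = 1) = \frac{9}{24310}

p_{\text{two-sided}} = \frac{72 + 1 + 9}{24310} = \frac{82}{24310} \approx 0.0034
```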
If you think these data suggest it is easier to adopt educational technology in rural districts than in urban ones, well, not exactly. Rural districts have their own set of challenges, but that is a post for another day.
Watch me work: Finishing the test scoring with more SAS character functions
February 9, 2016
Recall that in the last post we were using SAS functions to score a test that had been completed by middle school and upper elementary students. Since we wanted to make it as easy as possible for students to enter their answers, we accepted just about any format.
Picking up where we left off …
SUBSTR FUNCTION – READING ONLY PART OF AN ANSWER
In one question, the correct answer is 1/8. Students entered 1/8, 1/8 cup, 1/8 cup of beans, and so on. To score these, we use the SUBSTR function to read the first 3 characters and score the problem correct if those are “1/8”.
if substr(q22,1,3) = "1/8" then q22 = 1 ;
else q22 = 0 ;
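To see exactly what SUBSTR(x, 1, 3) is comparing, here is a small illustration with made-up answers. One thing it shows: an answer typed with a leading space fails the raw comparison, and wrapping the variable in LEFT, which moves leading blanks to the end, is one way to catch it. That LEFT step is a suggested variation, not part of the scoring code above.

```sas
/* Made-up answers to show what SUBSTR(x, 1, 3) compares */
data _null_;
  length answer $ 20;
  do answer = "1/8", "1/8 cup", "1/8 cup of beans", " 1/8", "1/4";
    raw_score  = (substr(answer, 1, 3) = "1/8");        /* first 3 characters as stored       */
    left_score = (substr(left(answer), 1, 3) = "1/8");  /* LEFT removes leading blanks first */
    put answer= raw_score= left_score=;
  end;
run;
```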
MISSING FUNCTION – TO CHECK FOR MISSING DATA
For q27, students clicked on which of the equations are correct. If they clicked the correct equation, the variable was set to 1.
When they didn’t click on anything, it was missing. I wanted that to be changed to a zero, so I used the missing function, like this:
if missing(q27) then q27 = 0 ;
ARRAY, DO-LOOP and INPUT FUNCTION
Now, SAS is not the newest kid on the block, and I can relate, because neither am I, not even if I’m on a relatively old block.
The problem with being an older language is that you have static types and you cannot have mixed arrays. What does that mean? It means that if you have defined q1 as a character variable because it might have a $ in it then by God it is going to stay a character variable and you can’t be doing any funny stuff like finding the mean and standard deviation of it. Also, if you are going to have an array, everything in it better be either all character variables or all numeric variables.
Well, fine, then, here is how I change all those now scored questions to numeric. First, I created a new numeric array of 32 items. You can tell it is numeric because there is no $.
Second, I used a DO loop and the INPUT function. The INPUT function reads a character value with the informat you give it and returns the result, in this case a number read with the 8.0 informat (up to 8 characters wide, no implied decimal places).
I dropped the index variables j and i, which I mentioned in a previous post.
Now that I have my variables all nicely numeric, I can use the SUM function and add up all of the scored items into a total score.
array items {32} item1-item32 ;
do j = 1 to 32 ;
items{j} = input(qs{j},8.0);
end;
drop i j ;
total = sum(of item1-item32);
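Why SUM rather than adding the items with plus signs? A tiny illustration with made-up values: the SUM function skips missing values, while the + operator returns missing if any operand is missing. That is also why recoding unanswered items to 0 with the MISSING function, as with q27, matters if you ever total with + instead.

```sas
/* Made-up values to compare SUM with the + operator */
data _null_;
  a = 1;
  b = .;                       /* a missing item, like an unanswered question */
  c = 1;
  total_sum  = sum(of a b c);  /* SUM ignores missing values: 2 */
  total_plus = a + b + c;      /* any missing operand makes the result missing */
  put total_sum= total_plus=;  /* writes both results to the log */
run;
```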
Now that I have my data, the fun stuff begins, but that’s for another post because I need to get back to making games.
This is my day job. Check it out. Buy a game. Maturity is over-rated!
Or donate a game to a school for good karma.
Watch me work: Compress Function for Test Scoring
February 5, 2016
Did you ever fill out one of those online forms where you kept trying to submit it and got messages like,
“You need to enter your phone number in the format 311-234-12234”
or
You cannot have any special characters in this field.
That one really irritates me because, in fact, my last name has a space in it and many websites refuse to accept it. Take it up with The Invisible Developer, or his ancestors.
Have you ever just said the hell with it, and skipped filling out the form? Preventing users from entering all but the expected data type saves problems when you analyze your data, but it can also cause people to give up on your stupid web form.
So … when I created the pretest for Forgotten Trail and Aztech, I made it accept just about anything. If you wanted to write in 6, six, 9R6, 6 left over — any and all of those would be accepted and recorded.
You can get the first two games we developed here.
What now? I have to score that test, but I’d rather the difficulty be on me than 150 or so middle school students who are our first test group.
So… how to fix it, with SAS character functions. Here is me, scoring the first half of the test:
First, I read the data into a new data set because I want to preserve the original data and not write over it. I may want to look at the exact incorrect answers later.
I create a character array of all 32 items on the test, and then I use a DO loop to change all of the questions to upper case.
Data in.recode ;
set in.pretestGMS ;
array qs{32} $ q1-q27 q28a q28b q28c q29 q30 ;
do i = 1 to 32 ;
qs{i} = upcase(qs{i}) ;
end ;
Now, on to the questions. I eventually need all of these items to be scored 1 = correct, 0 = incorrect.
q1 is a question about money. People put all kinds of answers: wrong ones like $35 and $40, as well as the correct answer, 100 or $100. I used the COMPRESS function to remove the ‘$’, then set q1 to equal 1 if the answer was 100, and 0 otherwise.
q1 = compress(q1,"$") ;
if q1 = 100 then q1 = 1 ;
else q1 = 0 ;
The second use of the COMPRESS function removes blanks: if you don’t put any second parameter in the COMPRESS function, it removes all blanks, not just trailing ones, so “FOUR FROGS” becomes “FOURFROGS”. In q2, the answer was 4 but the students put “four”, “four frogs”, “4/14” and so on. All of these are correct. You can have a list in an IF statement and, if the variable matches any of the values in the list, then do something, in this case, set the answer as correct.
q2 = compress(q2) ;
if compress(q2) in ("4","FOUR","FOURFROGS","4/14","4OUTOF","4FROGS") then q2 = 1;
else q2 = 0 ;
*** How to keep only numeric data using a simple SAS function (take that all you regular expression fetishists!)
The third use of the compress function KEEPS the characters that are the second parameter, because I added an optional third parameter of “k”, to KEEP the characters in the second parameter instead of discard those. So, this keeps numbers and deletes everything else from the answer. If it is 150, it is scored correct, otherwise, it’s wrong.
if compress(q5,"0123456789","k") = 150 then q5 = 1;
else q5 = 0 ;
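COMPRESS also has a d modifier that adds the digits to the character list, so instead of spelling out 0123456789 you can combine k and d. A small comparison with a made-up answer is below; both calls should keep just the digits. One caution with keeping digits only: an answer like “1 of 50” would also collapse to 150.

```sas
/* Made-up answer to compare two ways of keeping only digits */
data _null_;
  length answer $ 20;
  answer = "150 marbles";
  kept_list = compress(answer, "0123456789", "k");  /* keep the characters listed, as for q5 */
  kept_d    = compress(answer, , "kd");             /* k = keep, d = add digits to the list */
  put kept_list= kept_d=;
run;
```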
A lot of the items were similar, so that is half of scoring the test. I’ll try to write up the rest from the airport tomorrow, but for now, I need to write a couple of emails, finish this scoring program and pack before 2 am, and that only gives me about 40 minutes.
Watch me work: Data Project
February 3, 2016
On Twitter, there were a few comments from people who said they didn’t like to take interns because “More than doing work, they want to watch me work.”
I see both sides of that. You’re busy. You’re not Netflix. I get it. On the other hand, that’s a good way to learn.
So, here you go. I’m starting on a data analysis project today and I thought I’d give you the blow by blow.
It just so happens that the first several steps are all point-y and click-y. You could do it other ways but this is how I did it today. So, step one, I went to phpMyAdmin on the server where the data were saved and clicked Export.
Step 2: For the format to export, I selected CSV and then clicked on the Go button. Now I have it downloaded to my desktop.
Step 3: I opened SAS Enterprise Guide and selected Import Data. I could have done this with SAS and written code to read the file, but, well, I didn’t. Nope, no particular reason, just this isn’t a very big data set so I thought, what the heck, why not.
Step 4: DO NOT ACCEPT THE DEFAULTS! Since I have a comma-delimited file with no field names, I need to uncheck the box that says File contains field names on record number. SAS conveniently shows you the data below, so I can see that it is comma-delimited. I know I selected CSV, but it’s always good practice to check. I can also see that the data start at the first record, so I want to change the value in Data records start at record number to 1.
Step 5: Change the names – I am never going to remember what F1, F2 etc. are, so for the first 5 , I click on the row and edit the names to be the name and label I want.
That’s it. Now I click the next button on the bottom of the screen until SAS imports my data.
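For the record, the code route mentioned in Step 3 would look something like this minimal sketch. The file path is a placeholder, not the one used here, and PROC IMPORT with GETNAMES=NO typically names the columns VAR1, VAR2 and so on rather than the F1, F2 that Enterprise Guide assigned, so the array code below would need those names adjusted.

```sas
/* Sketch only: the path is a placeholder for wherever the CSV was saved */
proc import datafile="/path/to/pretest.csv"
    out=pretest
    dbms=csv
    replace;
  getnames=no;   /* the file has no field names in the first row */
  datarow=1;     /* so the data start at record 1 */
run;
```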
I could have continued changing all of the variable names, because I KNOW down the line I am not going to remember that F6 is actually the first question or that F25 is question 28a. However, I wanted to do some other things that I thought would be easier to code, so I opened up a program file in SAS Enterprise Guide and wrote some code.
/* THIS CREATES TWO ARRAYS BECAUSE I AM TOO LAZY
TO RENAME 32 QUESTIONS INDIVIDUALLY
THE PRETEST DATA SET WAS CREATED BY THE STEPS ABOVE USING IMPORT DATA */
data pretest2 ;
set pretest ;
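** NOTE THAT THERE IS A $ AFTER THE NUMBER OF ELEMENTS IN THE ARRAY
** BECAUSE THIS IS A CHARACTER ARRAY ;
array ren{32} $ f6-f37 ;
** Q1-Q30 ARE NEW VARIABLES, SO WITHOUT A LENGTH THEY DEFAULT TO 8 CHARACTERS
** AND LONGER ANSWERS GET TRUNCATED. 50 IS AN ASSUMED MAXIMUM, SO SET IT TO YOUR LONGEST ANSWER ;
array qs {32} $ 50 q1-q27 q28a q28b q28c q29 q30;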
do i = 1 to 32 ;
qs{i} = ren{i} ;
end ;
** YOU CAN ALSO USE A RENAME STATEMENT TO RENAME THE SAME VARIABLES ;
rename f38 = date_test ;
*** SINCE I NO LONGER NEED THE VARIABLES F6-F37 OR THE INDEX VARIABLE FOR THE
ARRAY, I DROP THEM HERE ;
drop f6-f37 i ;
run ;
*** SOME STUDENTS SAVED THE TEST MORE THAN ONCE BECAUSE THEY SAVED BEFORE THEY WERE
DONE AND AGAIN AT THE END. SO, I SORT BY USERNAME AND TEST DATE. WE WILL ONLY KEEP THE LAST ONE ;
proc sort data=pretest2 ;
by username date_test ;
run ;
*** THIS KEEPS JUST THE LATEST TEST DATE. ALSO, WE TESTED THIS 45 TIMES IN
THE PROCESS OF GETTING READY FOR USE IN THE SCHOOLS. ALL OF OUR STAFF USED USERNAMES WITH "TEST" SO I USED THE INDEX FUNCTION TO FIND IF THERE WAS A "TEST" IN THE USERNAME AND, IF SO, DELETED THAT RECORD ;
data pretest2 ;
set pretest2;
by username date_test ;
if last.username ;
if index(username,'TEST') > 0 then delete;
run;
Okay, that’s it. Now I have my data all ready to analyze. Pretty painless, isn’t it?
Want to learn more about SAS?
Here is a good paper on Arrays made easy .
If you’re interested in character functions like index, here is a good paper by Ron Cody.