Feb
26
Whipping your data into shape with SAS : Day 2 Fixing Errors & Identifying Input Datasets
February 26, 2018 | Leave a Comment
Last post, we happily uploaded our data, read it into SAS using a combination of SAS utilities and coding, decided all was lovely and used this code to concatenate the 4 datasets.
DATA allplants ;
set import1 – import4 ;
IF you get an error at this point, what should you do?
Let’s say you get the error below?
118
119 Data allplants ;
120 set import1 – import4 ;
ERROR: Variable Finance_Commission___Interest_Co has been defined as both character and numeric.
121 run ;
This is one of those examples where you can be too clever. We aren’t going to use this variable in the analysis so let’s just drop it. Ask yourself, do I need this variable? If the answer is , as in this case, no you don’t, just drop it.
- The (drop =) after the dataset name will drop the variable you list.
- The (in = a) creates a temporary variable, a, that is true of the record comes from the dataset import1 and false otherwise.
- Since both options go in parentheses after the data set name you include both of these in the same set of parentheses.
- Now that you have the variables denoting the source dataset , you can use those in IF-THEN-ELSE statements like any other variable.
Data allplants ;
set import1 (drop =Finance_Commission___Interest_Co in=a)
import2 (drop =Finance_Commission___Interest_Co in=b)
import3 (drop =Finance_Commission___Interest_Co in=c)
import4 (drop =Finance_Commission___Interest_Co in = d);
if a then group = “student” ;
else if b then group = “control” ;
else if c then group = “devloper” ;
else if d then group = “testcase” ;
run ;
Now we’ve dropped the troublesome variable and have a group variable based on the source.
So, this code SEEMS like it should work and the data are all good. We look at the log and see no errors, but maybe we should take some more steps just to be safe.
If you’d like a whole lot easier statistics and to take a brief break from maturity while learning about Latin American history and culture, check out AzTech: The Story Begins
Feb
24
Whipping your data into shape with SAS : Part 1 for Today
February 24, 2018 | Leave a Comment
I’m sure I’ve written about this before – after all, I’ve been writing this blog for 10 years – but here’s something I’ve been thinking about:
Most students don’t graduate with nearly enough experience with real data.
You can use government websites with de-identified data from surveys, and I do, but I teach primarily engineering and business students so it would be helpful to have some business data, too. Unfortunately, businesses aren’t lining up to hand me their financial, inventory and manufacturing data (bunch of jerks!)
So, I downloaded this free app, Medica Scientific from the app store and ran a simulation of data for a medical device company. Some friends did the same and this gave me 4 data sets, as if from 4 different companies.
Now, that I have 4 Excel files with the data, before you get to uploading the file, I’m going to give you a tip. By default, SAS is going to import the first worksheet. So, move the worksheet you want to be first. In this case, it’s a worksheet named “Financials”. Since SAS will use the first worksheet, it could just as well be named “A whale ate my sandwich”, but it wouldn’t be as obvious.
While you are at it, take a look at the data, variable names in the first row. ALWAYS give your data at least a cursory glance. If it is millions of records, opening the file isn’t feasible and we cover other ‘quick looks’ in class.
These steps and the next few use SAS Studio, which is super-duper helpful for online courses.
1. Upload the file into the desired directory
2. Under Tasks and Utilities select Utilities and then Import Data
3. Click select file and then navigate to the folder where your file is and click open
4. You’ll see a bunch of code but nothing actually happens until you click on the little running guy.
Okay, now you have your code. Not only has SAS imported your data file into SAS, it’s also written the code for you.
FILENAME REFFILE '/home/annmaria/examples/simulation/Tech2Demo.xlsx';
PROC IMPORT DATAFILE=REFFILEDBMS=XLSX OUT=WORK.IMPORT1;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.IMPORT1;
RUN;
Now, if you had a nice professor who only gave you one data set, you would be done, which is why I showed you the easy way to do it.
However, very often, we want to compare several factories or departments or whatever it is.
Also, life comes with problems. Sigh.
One of your problems, which you’d notice if you opened the data set is that the variables have names like “Simulation Day” . I don’t want spaces in my variable names.
My second problem is that I need to upload all of my files and concatenate them so I have one long file.
Let’s attack both of these at once. First, upload the rest of your files.
Now, open a new SAS program and at the top of your file, put this:
OPTION VALIDVARNAME=V7 ;
It will make life easier in general if your variable names don’t have spaces in them. The option above automatically recodes the variables to valid variable names without spaces.
Now, to import the next 3 files, just create a new SAS program and copy and paste the code created by your IMPORT procedure FOUR TIMES (yes, four).
From Captain Obvious:
Although you’d think this would be obvious, experience has shown that I need to say it.
- Do NOT copy the code in this blog post. Copy the code produced by your own IMPORT procedure, it will have your own directory name.
- Do NOT name every output data set IMPORT1 because if you do, each step will replace the data set and you will end up with one dataset and be sad.
Since I want to replace the first file, I’m going to need to add the REPLACE option in the first PROC IMPORT statement.
OPTION VALIDVARNAME=V7 ;
FILENAME REFFILE '/home/annmaria/examples/simulation/Tech2Demo.xlsx';
PROC IMPORT DATAFILE=REFFILEDBMS=XLSX
REPLACE
OUT=WORK.IMPORT1;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.IMPORT1;
RUN;
FILENAME REFFILE '/home/annmaria/examples/simulation/Tech2Demo2.xlsx';
PROC IMPORT DATAFILE=REFFILEDBMS=XLSX
REPLACE OUT=WORK.IMPORT2;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.IMPORT2;
RUN;
Do that two more times for the last two datasets
Did you need to do the utility? Couldn’t you just have done the code from the beginning? Yes. I just wanted to show you that the utility existed. If you only had one file and it had valid filenames, which is a very common situation, you would be done at that point.
In a real-life scenario, you would want to merge all of these into one file so you could compare clinics, plants, whatever. Super easy.
[IF you have write access to a directory, you could create a permanent dataset here using a LIBNAME statement, but I’m going to assume that you are a student and you do not. The default is to write to the working directory. ] ;
DATA allplants ;
set import1 - import4 ;
IF you get an error at this point, what should you do?
There are a few different answers to that question and I will answer them in my next post.
Feb
15
How SAS Helped Me Make Our Best-Selling Educational Game: Part 2
February 15, 2018 | Leave a Comment
Last time, I gave a bit about the requirements of a game to match the most synonyms in one minute, and how what I learned using SAS was a basis for several parts of the game. This activity is going into Making Camp Premium, which will be a paid version of our best-selling game, Making Camp Ojibwe. I don’t know if you can call it best-selling because you can download it for free, and Spirit Lake has been around longer so has more players, but Making Camp gets more new downloads each month than any of our other games. This is surprising since the game is written in JavaScript and we have other games made with Unity that have way cooler effects. Just goes to show you can’t predict perfectly what kids will like.
While you are waiting for me to finish this game, head to the app store and get Making Camp Ojibwe , free, for your iPad.
Now, back to the synonyms game. We’d finished the timer, which, when it ended, showed your title points and a happy or sad image.
Okay, this first part is boring, just initializing a bunch of variables I will use later.
var thisone = 0; var boxmove = 0; var thisel; var thesepts = 0; var question ; var correct = 0 ; // This is the array of words. The first is displayed as the word to match ; // The next three words are synonyms and the last four words are incorrect answers ; var words = [ ["large", "big","enormous","gigantic","awkward","introspective","sane", "bulbous"], ["fast", "rapid","quick","speedy","awkward","boring","dull", "bulbous"], ["fat", "stout", "thick", "overweight", "thin", "unprofitable", "sense", "dazzling"], ["bad", "terrible", "not good", "awful", "couch", "sad", "ugly", "usual"], ["angry", "mad", "furious", "livid", "happy", "simple", "connected", "personal"], ["tale", "story", "fable", "yarn", "hind leg", "hippo", "newspaper", "earnest"], ["little", "small", "tiny", "itty bitty", "large", "thoughtless", "sleek", "perturbed"], ["strange", "odd", "queer", "weird", "couch", "sad", "ugly", "happy"], ["rare", "uncommon","unusual","not typical","irate","musical","aromatic", "within"] ];
I need more rows in this array. If you feel creative and want to help a sister out, post a word and 3 synonyms in the comments. Getting back to SAS, I have used SAS arrays since they first came out and were implicitly indexed. In other words, it’s been a minute. If one-dimensional arrays were great, two-dimensional arrays were great-squared. Some people will tell you that JavaScript does not have two-dimensional arrays and rather, you have an array of arrays. To those people, I say, “Bah, humbug!”
Systematic Random Sampling Saves the Day
Alrighty, then, on to creating the synonym problem. Sometimes you can be too clever. My challenge was to make sure that the choices were put in random order so that the first 3 boxes weren’t always the correct answer. I went through a lot of possible solutions where I tried to splice the array to pull out a word randomly used, then pull another random choice from the shortened array, using the length attribute.
After all of that, I realized there was a really simple solution. Pull out a random number. Take that and the rest of the items in the row, then start at the beginning again. Systematic random sampling. Yep. Super simple. Every useful programming language on earth has a random number function, including SAS, of course. First, we randomly pull a row out of the array. Then, we start with the n+1 word in that array, when n is a random number between 1 and 7. (Look at qnum to see how we get that). We pull the word that is in the n+1 position in the row and assign it to the first box. Then, the next box gets the next word in order. When we get to the end of the row, the next box will have the first synonym. So, if my random number is 5, the boxes for the choices are words # 5, 6, 7, 1 , 2, 3, 4 and boxes 4-6 are the correct answers.
Next, we have a
for (var i=1; i < 8; i++) {
some code
}
Really it is the exact same as
DO i = 1 to 7 ;
*** some code ;
END ;
After that, there are some IF- THEN – ELSE and assignment type statements. The only thing not really applicable to SAS is draggable function and appending some divs to the page.
I started this post writing about how everything in SAS made it easy for me to develop games using JavaScript but now that I think of it, it would work just as well the other way and if you know some JavaScript, learning SAS would be a piece of cake. You can check out the code below. It’s getting late here in Santiago, Chile and I still want to call my infinitely patient husband back in California so I’ll pick up next time on scoring the answers right or wrong.
/* THIS CREATES THE PROBLEM. A word is selected randomly from the array, then the start point in the list of synonyms is randomly selected. This is systematic random sampling. The words are put in boxes for the divs starting with the random number and when it gets to 7, it goes back to the beginning of the word list (but after the word you are finding the synonym for, that's why you need the 1+ ) Divs that get the first 3 synonyms in the array are assigned a class of 'right' and the others are assigned a class of 'wrongb'. Draggable function is assigned to each of the choice boxes created. If the choice is correct, the variable thisone is assigned the value of 1 when the box is dragged; */ function createProblem() { question = Math.floor(Math.random()* words.length); $("#segment2").text(words[question][0]); var qnum =1 +Math.floor((Math.random() * 7)) ; // Start at random number ; for (var i=1; i < 8; i++) { divid = "#div" + i ; var boxid = "#box" + i ; if (qnum < 4) { $(divid).append('<div class="smallbox draggable right" id="' + boxid + '">'+ words[question][qnum] + '</div>'); } else { $(divid). append('<div class="smallbox draggable wrongb" id="' + boxid + '">' + words[question][qnum] + '</div>');} $('.draggable').draggable({ start: function (event, ui) { if ($(this).hasClass('right')) { thisone = 1; thisel = this; } else { thisone = 0; } } }); if (qnum < 7) {qnum++;} else {qnum = 1; } } } // END CREATION OF WORD BANK PROBLEM ;
Feb
14
SAS taught me how to make best-selling games
February 14, 2018 | 1 Comment
I’m going to be speaking at SAS Global Forum about the places you can go starting your career with SAS, for example …
If you know anything about SAS, you might think from the title that I used my mad data analysis skills to figure out what works and what doesn’t for games. While that is somewhat true, it is not at all what this post is about. In fact, learning SAS first helped me a lot when it came to actually MAKING games. No, there is not a lick of SAS code in our games, but the concepts and ideas came to me fairly easily because of my experience using SAS.
(If you read this and start to post a comment saying I could have learned everything here from Python or C or whatever your favorite language is, I am sure you are right. The fact is, though, I didn’t. )
Let me give you an example:
The object of the game is to match as many synonyms as possible in one minute. This is what has to happen:
- On loading the page, randomly select a word to display on the screen, start the timer and music
- Show the number of seconds on the page, going down every second
- On the page, show 7 other words, 3 that are synonyms and 3 that are not synonyms, making sure that the correct and incorrect words show up in random order.
- If the player drags a correct word into the box, it turns green and adds 1 point to the score.
- If the player drags an incorrect word, the box turns red
- If all three choice boxes are filled, all the boxes are cleared and a new word and choice boxes are shown
- When the time is up, if the player has a perfect score, show a happy image and appropriate text.
- If the player doesn’t have a perfect score, show a less happy image and appropriate text.
- When time is up, show a button the player can click to play again.
What in the heck does all of that have to do with SAS? It’s all written in JavaScript, the reason for that is a post for another day, but let’s look at some code:
<script type="text/javascript"> $(document).ready(function () { //Timer script ; var time = 60000; var timer ;
This first bit just starts a script, and the beginning of a function that will execute when the document is ready. That is, I don’t want JavaScript to try acting on elements that aren’t loaded yet. My first exposure to writing functions was in the 1980s. It was a very significant event. I swear, I even remember the cramped graduate assistant office at the University of California, Riverside where I read my first book of SAS macros. I think it was a book of macros written by users. This is how we distributed things before the Internet. I thought the idea of writing my own functions was the coolest thing I had ever heard.
Now, for the timer. Everyone knows what a variable is, or you do if you did anything with any language. Here, I am initializing the time to 60,000 milliseconds. Initializing a variable, another basic idea I learned from SAS. I’m going to use that other variable, timer, later to execute the myTimer function. Just wait.
//Timer script ;
function myTimer() { if (time > 0) { var nowTime = time/1000 ; document.getElementById("timer").innerText = nowTime ; time = time - 1000; } else if (time <= 0) { document.getElementById("timer").innerText = "0"; clearInterval(timer); $("#form1").hide(); //IF ALL OF YOUR ANSWERS WERE CORRECT ; if (correct === boxmove) { $("#correct").text("PERFECT! You answered " + thesepts + " correctly."); $("#correcto").slideDown('slow'); playAudioLocal("../../sounds/correct1"); } else { $("#wrongo").show(); $("#incorrect").text("You answered " + thesepts + " correctly.").slideDown('slow'); playAudioLocal("../../sounds/flute"); } $("#redo").show(); }
Except for a few specific details, everything in the script above, I learned or improved from using SAS.
IF- THEN-DO-END – instead of DO and END , I have an opening { and a closing } but it’s the same thing.
If the time is greater than 0, the variable nowTime is going to be set to time divided by 1,000 since most people would prefer to see their time in seconds rather than milliseconds. By the way, nowTime is a local variable, defined within a function. Local variables is another idea I first learned from SAS macros, thank you very much. The text of the element in the page named ‘timer’ is now set to whatever the number of seconds remaining is (nowTime). We deduct another milliseconds from time.
ELSE – DO is another common SAS bit of code . If there is no time left, do all of this stuff, e.g., set the time value to 0, stop calling the timer function.
You can have nested IF-THEN-DO code in SAS, as I do here in my JavaScript.
While SAS didn’t introduce me to text functions, it’s where I learned a lot of them. Here, we have a JavaScript text function where I’m concatenating a string with a variable and then another string.
So, we’ve knocked off numbers 2, 7, 8 and 9. All of the showing and hiding elements had nothing to do with SAS . That part was straight jQuery but that was the easy part. Actually, this whole part was pretty easy. A few tricky bits show up later on. Maybe I’ll get to them in my next post. While you are waiting with bated breath …..
Check out Making Camp because maturity is overrated. Learn Ojibwe history, brush up on your math skills and build out your virtual wigwam.
Blogroll
- Andrew Gelman's statistics blog - is far more interesting than the name
- Biological research made interesting
- Interesting economics blog
- Love Stats Blog - How can you not love a market research blog with a name like that?
- Me, twitter - Thoughts on stats
- SAS Blog for the rest of us - Not as funny as some, but twice as smart. If this is for the rest of us, who are those other people?
- Simply Statistics, simply interesting
- Tech News that Doesn’t Suck
- The Endeavor -John D Cook - Another statistics blog