|

Reading JSON data saved in a column and other SAS string function tricks

Sometimes data changes shape and type over time. In my case, we had a game that was given away free as a demo. We saved the player’s game state – that is, the number of points they had, objects they had earned in the game, etc. as a JSON object. Happy Day! The game became popular. Schools started using it. We came out with a premium version with lots more activities, a bilingual Spanish-English version, a bilingual Lakota- English version. Life is good.

Making Camp Premium on Google Play 

Once schools started using our games, they wanted data on how much students played, how many problems they answered. This is when life started to get complicated. We added more fields to the JSON object to show which activities they had completed and whether they had won.

Data for one person might look like this, or much, much longer.

“{“”points””:””8″”,””first_trade””:””false””,””first_visit””:””true””, “”has_wigwam””:””true””,””inventory””:””6,3,4,14″”,””inventory_position””:””[{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:176,\””top\””:4},{\””left\””:-309,\””top\””:-254},{\””left\””:0,\””top\””:0},{\””left\””:619,\””top\””:-45},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:293,\””top\””:-44},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0},{\””left\””:0,\””top\””:0}]””,””milestone””:””EarnPage””}”

The JSON engine didn’t work (not surprisingly)

I actually didn’t expect it would work because we didn’t actually have a JSON object but rather a JSON object converted to a string and saved in an SQL database. However, if something would only take a minute, I try it first

libname testmcb JSON fileref =reffile ;

Okay, so what now? We have a variable record length, the values we want might be in any column. It must be really difficult to figure out, right? Not really.

This next bit looks complicated but it actually took very little time to code. Before we get into the code, let’s talk about what I needed to do and why. The “why” is that I want to know how much kids played the games. There are three milestones – logging in, getting a wigwam and trading for items. However, some kids just liked playing the mini-games and they played A LOT without pausing to trade in their points. Also, you can’t just look at how many points they have because they may have traded some points for items. Fortunately, all items cost two points, so I need to compute points + (items in inventory/2).

  • First, there were some records that did NOT start with points, they started with “milestone” because we were coding the JSON object one way and then we switched. (Pause to swear under my breath at people here.) So, I need to decide what type of record it was.
  • Second, I want to read in those variables that are milestones in the game play. That is, is this their first time playing, did they get points, did they get a wigwam and did they trade for anything for their wigwam. I need to keep in mind that a variable might be coded 1 or true if the player passed that milestone and 0, false or null if not. (Pause for more swearing.)
  • Third, I want to create Boolean variables that show whether or not a player passed a particular milestone.
  • Fourth, find out the NUMBER of objects in the inventory.
  • Fifth, divide the number in inventory by two and add to the points

This actually took me very little time. Let’s look at it bit by bit (ha ha).

The first line just assigns a file reference for where my text file is located.

FILENAME reffile '/home/annmaria.demars/data_analysis_examples/data2018/exceldata/mc_bilingualtest.txt';

Data mcb_json ;
INFILE reffile DLM="," LRECL = 1337 ;

*** READ IN FILE, VARIABLES ARE SEPARATED (DELIMITED) BY A COMMA. LENGTH OF THE RECORD IS 1,337 ;

INFORMAT first_trade $40. first_visit $40. has_wigwam $40. ;
INPUT  firstcol $ 15-16 @@ ;
IF firstcol = '":' THEN  INPUT @43 points $ first_trade $ first_visit $ has_wigwam $ ;
ELSE INPUT @15 points $ first_trade  first_visit  has_wigwam  ;

***** READ IN THE VARIABLES FOR TRADING, FIRST VISIT AND WIGWAM AS CHARACTER WITH A LENGTH OF 40 ;
****  READ COLUMNS 15-16 AND STAY ON THAT LINE ;
**** IF THE VALUE IN COLUMNS 15-16 = “:  THEN IT HAD MILESTONE AT THE BEGINNING;
***** THE VARIABLES WE WANT START AT COLUMN 43, OTHERWISE, THEY START AT COLUMN 15;
*** THESE ARE SEPARATED BY COMMAS SO WE CAN JUST LIST THE VARIABLE NAMES;
*** NOTE THAT I NEED TO SPECIFY THAT POINTS IS CHARACTER DATA SINCE IT’S NOT IN MY INFORMAT STATEMENT ;

IF INDEX(first_trade,"true") > 0 or index(first_trade,"1") > 0 then traded =0 ;
ELSE IF INDEX(first_trade,"false") or index(first_trade,"0") then traded =1 ;
IF INDEX(first_visit,"false") or index(first_visit,"0") then play_twice = 1 ;
ELSE IF INDEX(first_visit,"true") or index(first_visit,"null") or index(first_visit,"1") then play_twice = 0 ;
IF INDEX(has_wigwam,"true") then wigwam = 1 ;
ELSE IF INDEX(has_wigwam,"null") > 0 or index(has_wigwam,"0") > 0 then wigwam = 0 ;

*** ABOVE I JUST USE A FEW IF THEN STATEMENTS TO CREATE MY NEW VARIABLES ;
**** The INDEX function returns the position in the variable in the first argument where it finds the value in the second argument;
**** If the value isn’t found, it returns a value of 0 ;

This last step isn’t strictly necessary, except that it is. Here I do a cross-tabulation to make sure that all of the variables were assigned correctly and they were.  For example, you can see that if the value of “has_wigwam” was 0 or null the wigwam variable was set to 0. If has_wigwam was equal to “true” the wigwam variable was set to 1.

PROC FREQ ;
TABLES  traded*first_trade first_visit*play_twice wigwam*has_wigwam ;

Screen shot of table of has_wigwam by wigwam values

This would have worked for the whole JSON object except for the commas in the inventory. If you look at just a piece of the data, you can see that after the variables denoting milestones there is a variable that is actually an array, separated by commas.

“first_visit””:””true””, “”has_wigwam””:””true””,””inventory””:””6,3,4,14″”,”

We’ll look at how to handle that in the next post.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *