Fixing Data: Part 1 of a zillion – duplicate dates

I read a comment online saying SAS probably would not disappear as an option for statistical analysis because “it’s good when you need to do a lot of data manipulation”.

I wonder what world those people live in, where data comes all cleanly packaged, and whether they have unicorns there.

Back on Planet Earth, I have a data set that has multiple records for the same date for the same students.  For some reason, the data were being sent at the end of each screen at one site, instead of at the end of the test. So, the data look like this:

kat123 4 5 18 11   2017-04-23 17:39:26

kat123 4 5 18 11   42 17 8 0 1 2017-04-23 17:41:12

and so on.

The students also took a post-test, months later, so …

I need the last record for each date, but my data has date and time

You might think doing

testday= datepart(date_entered);

would work and it would except for the fact that

My date is stored as a character variable! What do I do?

You can read some suggestions here in the SAS Communities forum

https://communities.sas.com/t5/Base-SAS-Programming/how-to-convert-char-var-to-sas-date/td-p/45067

I could not find one for a value formatted like

2016-02-03 19:41:26

and I spent a good hour trying different methods to get this to work. I will spare you the details and maybe I could have gotten some method to work (no, whatever you are considering, I probably already tried). However, this occurred to me …

Do you really need to change it to a date format?

In this case, I was not doing any calculations with the date value, I simply needed the day part as a unique value.

I could just use the first 10 characters like this

day_of_test = substr(date_entered,1,10) ;

If you figured this out in the first sentence or two you are probably laughing by now (shut up). Yes, it doesn’t matter if it is formatted as a date or not. So, that is what I did. After creating a variable that is just the day of the test, I sorted by username, day of test and date entered (which includes the time value). Then I read in the data with a BY statement in the DATA step, which creates a last. variable that flags whether or not this is the last record with that value in the BY group. I output the last record for each day by using a subsetting IF statement.

Data fixdata ;
set mydata.aztech_pre ;

*** CREATE day_of_test variable as characters 1-10  ;
day_of_test = substr(date_entered,1,10) ;
run;

*** SORT by username, day of test and date entered (including time);
proc sort data=fixdata;
by username day_of_test date_entered ;
run;

*** DATA step that only saves last record ;
Data mydata.aztech_pre ;
set fixdata ;

***  BY statement to define that the data is by username and day_of_test ;
*** NOTE:  If you did not do the PROC sort first, this will not work. For shame! ;
by username day_of_test ;

*** Subsetting IF keeps only the last record for each username and day_of_test ;
if last.day_of_test  ;
run;
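
To see the BY-group logic in isolation, here is a minimal sketch on made-up data. The data set names demo and demo_last and the three rows are hypothetical, not taken from the real file; the October record stands in for a post-test. Because the timestamps are zero-padded year-month-day text, sorting them as characters also puts them in time order, which is why a character sort is enough here.

```sas
*** Hypothetical toy data: two partial records and one later record ;
data demo ;
   infile datalines dsd dlm='|' truncover ;
   length username $ 8 date_entered $ 19 ;
   input username $ date_entered $ ;
   *** Day of test is just the first 10 characters of the timestamp ;
   day_of_test = substr(date_entered, 1, 10) ;
   datalines ;
kat123|2017-04-23 17:39:26
kat123|2017-04-23 17:41:12
kat123|2017-10-02 09:15:40
;
run ;

*** Sort so the latest timestamp is last within each username and day ;
proc sort data=demo ;
   by username day_of_test date_entered ;
run ;

*** Keep only the last record in each username and day_of_test group ;
data demo_last ;
   set demo ;
   by username day_of_test ;
   if last.day_of_test ;
run ;

proc print data=demo_last ;
run ;
```

With these three rows, demo_last should end up with two records: the 17:41:12 record for 2017-04-23 and the single 2017-10-02 record.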

So, that worked perfectly. I included my missteps because it is easy when you are a newbie to believe that everyone is smarter than you and never makes bonehead mistakes. Not so. We all make them all of the time. The important thing is figuring it out in the end. Sometimes the easy way is not so obvious.

Or, maybe it is and I’m a bonehead. Either way, it worked. Now on to step 2.
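
If you ever do need a true SAS date from a field like this (say, to count days between pre-test and post-test), one thing to try is INPUT() with a datetime informat and then DATEPART(). This is only a sketch: it assumes the YMDDTTM informat is available in your SAS release and that every value has the yyyy-mm-dd hh:mm:ss layout, and it has not been run against the data in this post. The names date_check, test_dt and testday are made up for illustration.

```sas
*** Sketch only: convert a character timestamp to numeric SAS values ;
data date_check ;
   date_entered = '2017-04-23 17:41:12' ;      *** hypothetical sample value ;
   test_dt = input(date_entered, ymddttm19.) ; *** character to SAS datetime ;
   testday = datepart(test_dt) ;               *** SAS datetime to SAS date ;
   format test_dt datetime19. testday yymmdd10. ;
   put test_dt= testday= ;                     *** write both values to the log ;
run ;
```

If test_dt comes out missing in the log, the informat did not match the text, and the SUBSTR shortcut is still the simpler route.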

 

When I am not writing about SAS, I’m making games that teach math, social studies and language.

Check them out.
