Finding Consecutive Dates and Years in SAS
In the blog post Working With Consecutive Events in SAS, I demonstrate a few approaches of how to work with consecutive same-group events. In that case, ‘consecutive’ meant observations that belong to the same group after each other. Today, I will investigate consecutive dates and years in SAS. Often programmers are required to identify periods in the data where a consecutive streak of n days or years take place. Ths problem is different from the one in the post above because a consecutive streak does not consist of same-value observations. Rather, it constist of observations that increase by a fixed interval (usually 1) for each observation. Today, I will demonstrate two approaches to the problem.
A Simple Data Step Approach
First let ud consider the following problem. I want to identify dates that consist of a consecutive sequence of dates of any length greater or equal to two. Obviously, I do not want the sequences to cross groups. Consider the data below. In the following data step, I use a Set Statement with a By Statement after it. I do this only to initialize the first.id and last.id variables. Next, I simulate a lead function by using the Firstobs=2 Option in the second read of have. Now, I have the next date for each observation. Regardless of the ID.
Now, I have the next date for each observation in the data. Next, I use the Dif Function to find the difference between the current and the previous date. Finally, I set the flag variable. I initialize it to zero. There are two situations where a date is a part of a consecutive sequence of dates:
- If the next date is equal to the current date plus one and the current observation in not the last in the By-Group (last.id=0)
- If the current date minus the previous date is equal to one (dif(date)=1) and the current observation in not the first in the By-Group (first.id=0)
If any of these conditions are true, I set flag=1. Run the code below and verify the desired results.
data have; input id date :ddmmyy10.; format date ddmmyy10.; datalines; 1 20/11/2019 1 21/11/2019 1 22/11/2019 1 25/11/2019 1 27/11/2019 1 29/11/2019 2 20/11/2019 2 21/11/2019 2 23/11/2019 2 25/11/2019 2 26/11/2019 2 28/11/2019 3 29/11/2019 3 01/12/2019 ; data want(drop=_:); set have; by id; merge have have (firstobs=2 rename=(keep=_date rename=date=_date)); _dtdif=dif(date); flag=0; if date+1=_date & last.id=0 then flag=1; if _dtdif=1 & first.id=0 then flag=1; run;
A SAS Hash Object Approach
Consider the following problem. I want to identify all observations within the same ID that is part of a four year streak of consecutive years. That is, if year is equal to 2004, I want to flag the observation only if there are three years around it in a sequence. For example, I want to flag it if 2003, 2005 and 2006 exists. But 2003, 2005, 2007 does not count because it does not create a four year consecutive streak. Be aware, that in this particular problem, I do not care about the order of years. As long as the relevant years exist within the same ID, I will flag the observation.
I use a hash object scheme to approach this problem. First, I declare the object. Next, I read the have data set sequentially. Next, I iterate y from -3 to 3. I use the check method for each y on year+y. Consecuently, I look three years back and forward sequentially. If the search is successful, I increase the counter c by 1. If not, I set c to 0. This means that if c equals 4 at come point, I have identified a sequence of 4 years, which the current observation is a part of. If that is the case, I set flag to one, and exit the loop.
Run the code and verify the results. For ID=1, the first four observations form the desired sequence. Furthermore, they appear right after each other. They do not have to though. For ID=2, I identify 2003-2006 as a 4 year consecutive sequence. However, these years are not ordered ascending within the ID.
data have; input id year; datalines; 1 2003 1 2004 1 2005 1 2006 1 2008 1 2009 2 2006 2 2003 2 2004 2 2009 2 2005 2 2008 3 2003 3 2004 ; data want(drop=y c); if _N_=1 then do; declare hash h(dataset:'have'); h.definekey('id', 'year'); h.definedone(); end; set have; do y=-3 to 3; if h.check(key:id, key:year+y)=0 then c=sum(c, 1); else c=0; if c=4 then do; flag=1; leave; end; end; run;
In this post, we consider two problems regarding consecutive dates and years in SAS. I present two approaches. A plain data step approach and a hash object approach. The two solutions attack the problem in widely different manners.
Naturally, these kind of problems exist in many different forms. And there are many good solutions out there. The approaches in this post are not among the sommon ones. Browse the SAS Community to find more approaches to problems like these.
You can download the entire code from this post here.