Dynamic Lags in SAS with the Hash Object

In the blog post The SAS Lag Function Explained By Example, I introduce the SAS Lag Function. The Lag function sets up a static queue. Each time the function is called, SAS returns the oldest element in the queue and inserts the current value at its end. The queue is static in two ways: its length is fixed by the function (two values for LAG2), and it is updated only when the function call actually executes. Today, I will introduce the idea of lagging variables with the hash object. We will see that the dynamic nature of the hash object makes it well suited for creating a dynamic queue for this purpose.
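The small sketch below makes the static queue concrete. The data set name lag_demo is hypothetical and not part of the original post. LAG2 keeps a queue of two values that starts out missing. The first two rows therefore get a missing lag2x, and rows 3 and 4 get the values 1 and 2.

```sas
/* Hypothetical illustration: LAG2 keeps a fixed-length queue of two */
/* values. Each call returns the oldest value in the queue and then  */
/* pushes the current value of x onto the end of the queue.          */
data lag_demo;
   input x;
   lag2x = lag2(x);   /* queue starts as (., .) */
   datalines;
1
2
3
4
;
run;
```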

In the examples to come, I will use the simple data set below.

data have;
input id x;
datalines;
1 1
1 2
1 3
2 4
2 5
2 6
;


Let us see how to use the hash object to create lags. I have divided the code into five steps, which are numbered in the code comments.

  1. First, I create two hash objects, h1 and h2. I use h1 to keep track of how many times each id has been visited so far. I use h2 to hold the lagged values of x.
  2. Next, I read the data set from above. For each observation, I initialize seq to 0 and lagx to missing.
  3. In step 3, I use the Find() Method to look up how many times the current id has already been visited (seq). If the id is not in h1 yet, Find() fails and seq keeps the value 0. Next, I add 1 to seq. This gives the correct visit number, regardless of whether it is the first visit or not. Finally, I use the Replace() Method to write the updated number back into h1.
  4. Next, I use the Add() Method to insert the current value of x into h2, keyed by id and seq. Consequently, h2 knows what value x had on the first, second, third and every later visit within the same id.
  5. Finally, I look up the lagged value in h2 with explicit key argument tags. The first key is simply the id. The second key is seq minus the number of lags I want, in this case two. If the lookup succeeds, Find() copies the stored value into lagx; otherwise, lagx keeps the missing value from step 2. This works because steps 3 and 4 explicitly keep track of seq and store each value of x under the correct key in h2.

Run the code example below and verify that the result matches what the LAG2 function gives when it is applied within each id.

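data want(drop=rc seq);

   if _N_=1 then do;
      declare hash h1 ();                    /*  1  visit counter per id    */
      h1.definekey('id');
      h1.definedata('seq');
      h1.definedone();

      declare hash h2 ();                    /*     history of x per id     */
      h2.definekey('id', 'seq');
      h2.definedata('lagx');
      h2.definedone();
   end;

   set have;
   seq=0; lagx=.;                            /*  2  reset for each row      */

   rc = h1.find();                           /*  3  previous visit count    */
   seq + 1;                                  /*     number of this visit    */
   h1.replace();                             /*     store the updated count */

   h2.add(key : id, key : seq, data : x);    /*  4  remember x for (id,seq) */

   rc = h2.find(key : id, key : seq-2);      /*  5  x from two visits ago;  */
                                             /*     lagx stays missing if   */
                                             /*     no such entry exists    */
run;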
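For the comparison, a minimal LAG2 version could look as follows. How the By Groups are handled here is my own assumption, and the names want_lag and n are hypothetical. LAG2 is called on every row so that its queue stays in sync. Values that would be lagged across an id boundary are then reset to missing. PROC COMPARE checks the result against want from the code above, and both data sets should have lagx missing for the first two rows of each id and equal to 1 and 4 on the third rows.

```sas
/* Hypothetical comparison with the static LAG2 function.          */
/* LAG2 knows nothing about BY groups, so values lagged across an  */
/* id boundary must be reset by hand. Requires have sorted by id.  */
data want_lag(drop=n);
   set have;
   by id;
   lagx = lag2(x);           /* call LAG2 on every row to keep the queue in sync */
   if first.id then n = 0;   /* n counts the rows within the current id          */
   n + 1;
   if n <= 2 then lagx = .;  /* first two rows of an id have no lag-2 value      */
run;

proc compare base=want compare=want_lag;
run;
```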

Discussion

So why bother using the hash object to create lags in SAS when the Lag Function is available? Obviously, it takes more coding. However, the approach above has two advantages over the static Lag Function. Firstly, we do not have to take explicit action to handle By Groups. Since id is a key variable in both hash objects, values can never be lagged across ids. Secondly, the data do not need to be sorted or grouped by id. With the Lag Function, observations from different ids must not be interleaved, or the lagged values mix ids. With the hash object, each id effectively has its own queue, and the lags simply follow the order in which that id's observations appear. This is a direct consequence of the dynamic nature of the hash object.
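To illustrate the second point, the sketch below runs the same hash code on a hypothetical data set have_mixed, in which the two ids are interleaved and not sorted. There is no BY statement and no sort. Still, lagx is missing for the first two rows of each id, and the third rows of id 2 and id 1 get the values 4 and 1. A plain LAG2 on the same data would instead return values from the other id on several rows.

```sas
/* Hypothetical data set: the two ids are interleaved, not grouped. */
data have_mixed;
   input id x;
   datalines;
1 1
2 4
2 5
1 2
2 6
1 3
;
run;

data want_mixed(drop=rc seq);
   if _N_=1 then do;
      declare hash h1 ();                 /* visit counter per id */
      h1.definekey('id');
      h1.definedata('seq');
      h1.definedone();

      declare hash h2 ();                 /* history of x per id  */
      h2.definekey('id', 'seq');
      h2.definedata('lagx');
      h2.definedone();
   end;

   set have_mixed;                        /* no BY statement, no prior sort */
   seq=0; lagx=.;

   rc = h1.find();
   seq + 1;
   h1.replace();

   h2.add(key : id, key : seq, data : x);

   rc = h2.find(key : id, key : seq-2);   /* lag 2 within the same id */
run;
```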

If you work with very large data sets, you may run into memory trouble. Eventually, h2 holds one entry for each observation in the input data set, and h1 holds one entry for each distinct id. However, the article referenced at the end of this post shows a clever workaround.
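One way to limit memory use, sketched here under my own assumptions (it is not necessarily the workaround from that article), is to delete each entry from h2 as soon as it has been used. Once the row with visit number seq has fetched the value stored under seq-2, no later row of that id can ask for it again, because later rows have larger visit numbers. After each row, h2 then holds at most two entries per id, while h1 still holds one entry per distinct id. The names want_small and items are hypothetical.

```sas
data want_small(drop=rc seq items);
   if _N_=1 then do;
      declare hash h1 ();
      h1.definekey('id');
      h1.definedata('seq');
      h1.definedone();

      declare hash h2 ();
      h2.definekey('id', 'seq');
      h2.definedata('lagx');
      h2.definedone();
   end;

   set have end=last;
   seq=0; lagx=.;

   rc = h1.find();
   seq + 1;
   h1.replace();

   h2.add(key : id, key : seq, data : x);

   rc = h2.find(key : id, key : seq-2);
   /* Entry (id, seq-2) is never requested again, because later rows */
   /* of this id have a larger seq. Delete it to keep h2 small.      */
   if rc = 0 then rc = h2.remove(key : id, key : seq-2);

   if last then do;
      items = h2.num_items;   /* 4 for the data set have: 2 ids x 2 entries */
      put 'Items left in h2: ' items;
   end;
run;
```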

Summary

In this post, I demonstrate how to create dynamic and flexible lags in SAS with the hash object. The method handles By Groups without any extra code, and it does not require the data to be sorted or grouped by id. However, memory may be an issue, because every record is eventually stored in the hash object h2.

The inspiration for this post comes from the article Leads and Lags: Static and Dynamic Queues in the SAS DATA STEP by Mark Keintz. I highly recommend reading it.
