SAS Lead Function Example with Proc Expand

Granted, there is no such thing as a LEAD function in SAS. Due to the way that SAS processes the data step, it does not support a LEAD function. However, looking at threads in the SAS Community and Google search suggestions, many SAS users could use a LEAD function from time to time. In the following, I will present three examples of simulating a LEAD function in SAS. Two data step solutions and my favorite, using PROC EXPAND at the end. The three different solutions all create the same output data.

First, let us create some example data. For demonstration purposes, I will keep it very simple

data Sample;
   do time = 1 to 10;
      value = time * 2;
      output;
   end;
run;

The First Naive Approach: Sort-LAG-Sort

Since the logical reverse of lagging is leading, it makes sense to reverse (sort it descending) the data, use the LAG Function and reverse it back again. And for a simple dataset as above, it works. The method is presented below

proc sort data = Sample out=Sample_Sort;  /* Sort data ascending        */
   by descending time;
run;
 
data Lead_DataStep1;                      
   set Sample_Sort;
   lead1 = lag1(value);                   /* Look one observation back  */
   lead2 = lag2(value);                   /* Look one observations back */
run;
 
proc sort data = Lead_DataStep1;          /* Sort data back             */
   by time;
run;

This approach is intuitive, but for large datasets it is very performance costly to sort your data twice. Also, this method does not work well if you have a more complicated data structure and have to lead within by groups in the data.

Merge and Use FIRSTOBS Option

Merging the same dataset multiple times and using the Firstobs Option to control what observation is read first is another way of simulating a Lead Function in a SAS data step. Obviously, we do not want all the variables to be read multiple times. Only the variable we want to lead, which we then rename accordingly.

data Lead_DataStep2;
   merge Sample                                                /* Read in Sample data                                   */
         Sample(firstobs=2 keep=value rename = (value=lead1))  /* Merge it with itself, but start reading at second obs */
         Sample(firstobs=3 keep=value rename = (value=lead2)); /* Merge it with itself, but start reading at third obs  */
run;

This is a more elegant solution and avoids having to sort the data twice. Though we still have to read in the data more than once. And it is still hard to implement this solution if we want to lead within by groups (though it can be done, an example can be found in the article Leads and Lags in SAS by Mark Keintz). I use this technique in the post Working With Consecutive Dates and Years in SAS.

The Right Approach: PROC EXPAND

My absolute favorite method of simulating a LAG function in SAS is using PROC EXPAND, which is part of SAS/ETS. With PROC EXPAND we avoid reading the data more than once because it has a LEAD option. The code below creates the same data as the previous data step solutions.

proc expand data=Sample out=Lead_Proc_Expand method=none;
   id time;
   convert value=lead1 / transformout=(lead 1);
   convert value=lead2 / transformout=(lead 2);
run;

By default, PROC EXPAND interpolates using cubic splines. Since we do not want any interpolation we specify METHONE=NONE in the PROC EXPAND options. In the ID statement, we specify time to be the variable that identifies the observations. By default, PROC EXPAND interprets this as a date or datetime variable. In the convert statement, we specify the variable we want to lead and use the TRANSFORMOUT option with the LEAD option and how many observations we wish to look ahead.

PROC EXPAND By-Group Example

One of the huge advantages of using PROC EXPAND to simulate a LEAD function in SAS is that it easily handles leading within groups. Simply specify the BY statement and the name of the variable, that defines the groups.

data Group_Sample;
   do group = 1 to 4;
      do time = 1 to 4;
         value = time * 2;
         output;
      end;
   end;
run;
 
proc expand data = Group_Sample out=Lead_ProcExpand_Group method=none;
   by group;
   id time;
   convert value=lead1 / transformout=(lead 1);
   convert value=lead2 / transformout=(lead 2);
run;

Summary

From reading this post, you may have guessed that PROC EXPAND is one of my favorite procedures in SAS. I guarantee you that this is not the last time that I recommend using it for data manipulation purposes on my blog. I encourage you to take the time and explore the many possibilities that PROC EXPAND holds in the documentation here and in the great article Stupid Human Tricks with PROC EXPAND by David L. Cassell.

Also, check out the related posts Moving Averages In SAS, Use Temporary Arrays to Store Lagged Values in SAS and Investigating the Lag Function in SAS. Furthermore, I demonstrate a dynamic lagging approach in the blog post Dynamic Lags with the Hash Object, that can easily be extended to lookaheads.

You can download the entire code from this post here.

LEAD Function in SAS

The First Naive Approach: Sort-LAG-Sort

Merge and Use FIRSTOBS Option

The Right Approach: PROC EXPAND

PROC EXPAND By-Group Example

Summary