LEAD Function in SAS
Granted, there is no such thing as a LEAD function in SAS. Due to the way that SAS processes the data step, it does not support a LEAD function, though looking at various SAS Communities and Google search suggestions, many SAS users could use a LEAD function from time to time. In the following, I will present three examples of simulating a LEAD function in SAS. Two data step solutions and my favorite, using PROC EXPAND, at the end. The three different solutions all create the same output data.
First, let us create some sample data. For demonstration purposes, I will keep it very simple
data Sample; do time = 1 to 10; value = time * 2; output; end; run;
The First Naive Approach: Sort-LAG-Sort
Since the logical reverse of lagging is leading, it makes sense to reverse (sort it descending) the data, use the LAG function and reverse it back again. And for a simple dataset as above, it works. The method is presented below
proc sort data = Sample out=Sample_Sort; /* Sort data ascending */ by descending time; run; data Lead_DataStep1; set Sample_Sort; lead1 = lag1(value); /* Look one observation back */ lead2 = lag2(value); /* Look one observations back */ run; proc sort data = Lead_DataStep1; /* Sort data back */ by time; run;
This approach is intuitive, but for large datasets it is very performance costly to sort your data twice. Also, this method does not work well if you have a more complicated data structure and have to lead within by groups in the data.
The Second Naive Approach: Merge and Use FIRSTOBS Option
Meriging the same dataset multiple times and using the FIRSTOBS option to control what observation is read first is another way of simulating a LEAD function in a SAS data step. Obviously, we do not want all the variables to be read in multiple times, only the variable we want to lead, which we then rename accordingly.
data Lead_DataStep2; merge Sample /* Read in Sample data */ Sample(firstobs=2 keep=value rename = (value=lead1)) /* Merge it with itself, but start reading at second obs */ Sample(firstobs=3 keep=value rename = (value=lead2)); /* Merge it with itself, but start reading at third obs */ run;
This is a more elegant solution, and avoids having to sort the data twice. Though we still have to read in the data more than once. And it is still hard to implement this solution if we want to lead within by groups (though it can be done, an example can be found in the article Leads and Lags in SAS by Mark Keintz)
The Right Approach: PROC EXPAND
My absolute favorite method of simulating a LAG function in SAS is using PROC EXPAND, which is part of SAS/ETS. With PROC EXPAND we avoid reading the data more than once because it has a LEAD option. The code below creates the same data as the previous data step solutions.
proc expand data=Sample out=Lead_Proc_Expand method=none; id time; convert value=lead1 / transformout=(lead 1); convert value=lead2 / transformout=(lead 2); run;
By default, PROC EXPAND interpolates between data point using cubic splines. Since we do not want any interpolation we specify METHONE=NONE in the PROC EXPAND options. In the ID statement, we specify time to be the variable that identifies the observations. By default, PROC EXPAND interprets this as a date or datetime variable. In the convert statement, we specify the variable we want to lead and use the TRANSFORMOUT option with the LEAD option and how many observations we wish to look ahead.
PROC EXPAND By-Group Example
One of the huge advantages of using PROC EXPAND to simulate a LEAD function in SAS is that it easily handles leading within groups. Simply specify the BY statement and the name of the variable, that defines the groups.
data Group_Sample; do group = 1 to 4; do time = 1 to 4; value = time * 2; output; end; end; run; proc expand data = Group_Sample out=Lead_ProcExpand_Group method=none; by group; id time; convert value=lead1 / transformout=(lead 1); convert value=lead2 / transformout=(lead 2); run;
From reading this post, you may have guessed that PROC EXPAND is one of my favorite procedures in SAS. I guarantee you that this is not the last time that I recommend using it for data manipulation purposes on my blog. I encourage you to take the time and explore the many possibilities that PROC EXPAND holds in the documentation here and in the great article Stupid Human Tricks with PROC EXPAND by David L. Cassell.
You can download the entire program with the different approaches here