Moving Average in SAS

When you are working with transactions or time series data, it is often convenient to create a moving average I have seen many different ways of computing moving averages in SAS programming. Some more efficient than others. In this post, I will present three different ways of computing moving averages using the data step, the IML language and finally PROC EXPAND. First of all, let us create a simple test data set for our examples.

data testdata;
input time value @@;
0 100 1 101 2 103 3 107 4 116 5 108 6 113 7 106 8 110 9 115 10 121
11 117 12 123 13 127 14 121 15 126 16 131 17 125 18 125 19 130 20 135

The test data set contains 21 ordered observations with a time variable and a value variable. This is a typical time series data set.

Data Step

First, let us consider creating a moving average using the data step. First, I set the number of periods n, I want to calculate my moving average on. Then I use array processing to calculate the desired number of lags, calculate the average over the current observation and the calculated lags, and finally drop the lagged variables again. You should be cautious about the tail when you create a moving average. In the first or second observations, I do not want a three period backwards looking moving average. That would not make sense. Therefore, I handle the cases where the observation number is less than the desired number of periods separately, and calculate the average only on the periods available. The result is seen on the right.

%let n=3;
data DataStepMoveAv;
   set testdata;
   array cum_{&n};
   do i = &n to 2 by -1;
   if n < &n then MovAv = sum(of cum_[*]) / n;
   else MovAv = sum(of cum_[*]) / &n;
   retain cum_:;
   drop i n;
Exponentially Weighted Moving Average from PROC EXPAND

The Data Step way is in my opinion a bit overly complicated, when you about the next ways of creating a rolling average I will present. It requires many complicated features of the data step and it is hard to implement by-group processing, though this is done beautifully in the macro from the book Cody’s Collection of Popular SAS Programming Tasks and How to Tackle Them. You can find the macro in the free code examples from the book in the link.


Next we look at an example using PROC IML. First, I read in the data created above. Then I set the number of periods I want to look bank (n=3). Next, I allocate a vector to hold the Moving Average values. Then I use n to calculate the range of values to be used for my rolling average. Again, remember the tails. In the data step solution, I used a separate if statement to deal with this. Here, I use max(1, i-n+i) to ensure that we do not use observations less than the first one.

proc iml;
   use testdata;
      read all var _ALL_; 
   close testdata;
   MovAv = j(nrow(value), 1, .);
   do i = 1 to nrow(value);
      idx = max(1,i-n+1):i;
      MovAv[i] = mean(value[idx]);
   title "Backward Moving Average from SAS/IML";
   call series(time, MovAv);

The IML way is in my opinion preferred over the Data Step method, because of the simplicity of the code. Also it is easier to calculate rolling averages for different groups and much easier to calculate both weighted and exponentially weighted rolling averages. Furthermore, it is much easier to adjust to a centered or forward looking moving average in a vectorized language such as IML than in the data step, where we would also have to incorporate leads of observations in the code. For examples of creating Weighted Moving Averages and Exponentially Weighted Moving Averages in SAS/IML, see the article Rolling Statistics in SAS/IML by Rick Wicklin.


Exponentially Weighted Moving Average from PROC EXPANDLastly, let us consider my favorite way of creating moving statistics in SAS, PROC EXPAND. In the blog post LEAD function in SAS, I said that is would not be my last post recommending PROC EXPAND. Below, I have written an Expand Procedure. In the procedure options, I specify the name of the output data set and the METHOD=NONE option to suppress the fitting of cubic splines between the data points. I have specified four Convert Statements in the procedure. The first one creates a simple three period backwards moving averageas in the previous examples. In the next one, I specify transformout=(movave(1 2 3)) in the statement options. This creates a Weighted Moving Average, standardizing the weights 1, 2 and 3 so that the variable WMovAv is given as (1 \cdot value_{t-2} + 2 \cdot value_{t-2} + 3 \cdot value_{t}) / (1+2+3). The third convert statement creates the variable CMovAv and assigns a three period centered moving average to it. That means that it takes the average of the previous, present and next observation in the time series data. Finally, the fourth convert statement creates an Exponentially Weighted Moving Average with smoothing weight number 0.3. The smoothing weight number (SWM) is a number between 0 and 1, which calculates the next value in the EWMA like this: EWMA_t = SWM \cdot value_t + (1-SWM) EWMA_{t-1}. This means that the Exponentially Weighted Moving Average is a 1 period Rolling Statistic with a single parameter between 0 and 1. You can see the resulting curves created by PROC EXPAND to the right.

proc expand data=testdata out=MoveAv method=none;
   id time;
   convert value=MovAv / transformout=(movave 3);
   convert value=WMovAv / transformout=(movave(1 2 3));
   convert value=CMovAv / transformout=(cmovave 3);
   convert value=EWMovAv / transformout=(ewma .4);

There are many different ways to create moving averages in SAS. In this post it is clear, that PROC EXPAND is preferred over other alternatives. The only reason for which I would use other methods than PROC EXPAND is if I did not have access to SAS/ETS. In fact, it it not only rolling averages, but rolling statistics in general that PROC EXPAND creates swiftly. Consult the Transformation Operations part of the PROC EXPAND Documentation to see how many rolling statistics the procedure can create easily. Also, check out Easy Rolling Statistics with PROC EXPAND for a great article on how to think of and calculate rolling statistics.

You can download the entire program from this post, including the code creating the plots here.