Replace Missing Values With Previous Non Missing

A frequently asked question on the SAS Communities is how to replace missing values with the previous/most recent non missing value. In this post, I will demonstrate how to do this with the UPDATE Statement. The update statement is a bit overlooked, but in situations like this, it is a life saver.

A SAS Example

First, let us create some example data. I create a very simple data set with ID and value as variables, where ID has three levels: 1, 2 and 3 and value is a numeric variable with a few missing values.

data MyData;
input ID$ Value;
datalines;
1 2
1 .
1 .
1 4
2 .
2 9
2 .
2 .
3 3
3 .
3 0
;

Obviously, this data is for demonstration purposes only. As you can see, Value has a few missing values. I want to replace these missing values with the previous non-missing value. Furthermore, I want to do this By Group. Meaning, I want to “drag” non missing values down over missing values, but not crossing IDs.

I use the UPDATE Statement with MyData as both the master data set and the transaction data set. I specify ID in the BY Statement, so naturally I assume that the data is sorted by the ID variable.

data Want;
   update MyData(obs=0) MyData;
   by ID;
   output;
run;
 
proc print data=Want;
run;

The key to this trick is the UPDATE Statement. I use MyData(obs=0) as the master data set to load the relevant variables and no observations into the PDV. Next, I also use MyData (all observations included) as the transaction data set and update the master data set from this. Also, I have to use an explicit OUTPUT Statement, because by default SAS only updates the first observation in each by group and outputs that.

You can see the resulting data to the right. As desired, we have replaced the missing values with the most recent non missing value. Also, we are not crossing over any By Groups, which is rarely desired to do in this kind of problems.

Summary

This post demonstrates how to replace missing values with previous non-missing values. I make heavily use of the update statement using the same data set (with different observation counts) as both the master and transaction data set. If you are not familiar with the UPDATE Statement and the syntax of the master – and transaction data set, I strongly encourage you to consult the SAS Documentation on the statement and the great examples included.

I have previously written posts about how to Replace Missing Values With Mean and Replace Missing Values With Zero.

You can download the entire code from this post here.