Replace Missing Values With Mean

In the blog post Replace Missing Values With Zero, I demonstrate two different methods to replace missing values with zero. Now, we generalize this concept a bit. For various reasons, you may want to replace missing values with different quantities or statistics. For example, you may want to replace missing values in a data set with the group mean for one or more numeric variables.

First, let us create a small example data set and see how we can do this efficiently in SAS.

data Missing_Values;
input ID$ var1 var2 var3 @@;
datalines;
1 . 3 4 1 4 . 2 1 . . . 1 2 8 .
2 2 0 . 2 5 . 1 2 . 4 . 3 . . 3
3 5 . 7 3 3 1 7 3 . . 2 3 3 . 7
3 . 1 9 
;
SAS Code Example

First we sort the data after the group variable ID.

proc sort data=Missing_Values;
   by ID;
run;
 
Next, I use PROC STDIZE to replace the missing values with the group mean. I specify the data= and out= options to be the desired data set names. Then I use the REPONLY option to specify that I do not want any standardization done. By default PROC STDIZE standardizes data. Finally I specify the missing=mean option to specify that I want to replace the missing values with variable mean values. I use a by statement in the procedure and specify ID as the by variable to replace missing values with the group means and not the overall mean values. 
 
proc stdize data=Missing_Values out=Missing_Values_Mean reponly missing=mean;
   by ID;
run;

I specify no variables in the procedure. Consequently, I replace missing values with group means for all numeric variables in the data set.

Furthermore, this method of replacing values with a group statistic is not limited to mean values. Consult the documentation for PROC STDIZE to see what other statistics you can replace missing values with.

Summary

Missing values are part of the game when you are dealing with data in SAS. Replacing these missing values can be the solution to your problem. But you should be aware, that you should only replace missing values when it actually makes sense. There is a reason the value is missing.

You can download the entire code from this blog post here.