Replace Missing Values with Zero in SAS
Missing values are part of the game when you are dealing with data in SAS. Sometimes you want to replace the missing values with some other quantity. This short blog post shows you how to replace missing values with zeros in SAS.
First of all, let us create some small example data set. The code below creates a SAS data set with five observations and quite a few missing values. I will use this data set in the examples to come.
data Missing_Values; input ID$ var1 var2 var3; datalines; 1 . 3 4 2 2 0 . 3 . . 3 4 . 8 . 5 5 . . ;
SAS Data Step Method Example
First off, let us look at an example of replacing the missing values with a simple Data Step. After reading in the data set above, I create an array NumVar to hold all numeric variables in the data set. Next I loop over all objects in the NumVar array and use simple if-then logic to set missing values to zero.
data DataStepMethod; set Missing_Values; array NumVar _numeric_; do over NumVar; if NumVar=. then NumVar=0; end; run;
An easy to understand approach to a simple data problem. Furthermore, you can easily modify the code to replace missing values for only the desired variables by inserting only the those variables into the array.
PROC STDIZE Method Example
Even though the data step method above works just fine, there is a simpler way. PROC STDIZE lets you easily replace missing values with zero. Furthermore, it does so with much less coding than in the data step example. The example below creates the Exact Same Data Set as the data step aproach.
proc stdize data=Missing_Values out=ProcStdizeMethod reponly missing=0; run;
By default, PROC STDIZE standardizes the input data by some location and scale parameter. However, we can suppress the standardization with the REPONLY Option. This option ensures that we are only to replace missing values and nothing else. Furthermore, we specify the MISSING=0 option to specify that missing values are to be replaced with zeros. If we omit this option, missing values are replaced by the location measure by default.
In the above example, we replace with zero for all numeric variables. This happens when no VAR statement is specified in PROC STDIZE. However, it can be convenient to replace missing values for specific variables only. To do this, simply specify the relevant variables in the var statement as below.
proc stdize data=Missing_Values out=StdizeMethod_Var reponly missing=0; var var1; run;
I encourage you to run the above code examples and see how the data sets differ. Also, you should take the time to play around with the STDIZE procedure. What happens when you omit the MISSING=0 option? And can you set the MISSING= option to other quantities that make sense?
Missing values are part of the data game. And you should be careful how you treat these. You should only replace missing values with other quantities if it makes sense. For example, you should only replace a missing value with a zero, if the missing value implies a lack of the particular variable value.
Zeros are not the only value that make sense to insert instead of missing values. In future blog posts, I will demonstrate how to Replace Missing Values With Groups Means and how to Replace Missing Values With The Previous Non Missing Value.
You can download the entire code from this blog post here.