This post investigates the behavior of the data step when a new iteration takes place. Specifically, we explore when SAS sets variable values to missing and when the values are retained across iterations. In the examples to come, I will use the simple data set below.

data have;
input x @@;
datalines;
1 2 3 4
;

By default, SAS sets the values of variables created by INPUT and assignment statements to missing at the beginning of each data step iteration. The SAS documentation lists the exceptions to this rule in the section When Variable Values Are Automatically Set to Missing by SAS. Below, I go through each exception with a simple example, so you get an idea of when the data step retains values across iterations. First, consider the default behavior in the example below. SAS sets y to 1 in the second iteration of the data step. In all other observations y is missing, because SAS resets it to missing at the start of each iteration.

data want;
   set have;
   if _N_ = 2 then y = 1;
run;
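To make the reset visible, here is a small variation of the step above. It is a sketch that assumes the have data set from the top of the post, and y_at_top is a hypothetical helper variable. The step copies y at the very start of each iteration, before the IF statement can assign to it.

```sas
data want;
   set have;
   /* Copy y before anything in this iteration assigns to it.     */
   /* Because y is created by an assignment statement, SAS resets */
   /* it to missing at the top of every iteration.                */
   y_at_top = y;
   if _N_ = 2 then y = 1;
run;
```

In the result, y_at_top is missing in every observation, even right after the iteration in which y was set to 1.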

Variables named in a RETAIN statement

The first exception is the RETAIN statement. When a variable is named in a RETAIN statement, SAS does not set its value to missing at each iteration. Instead, the value is (not surprisingly) retained across iterations. See the small example below. The only change from the example above is the RETAIN statement. Now y is missing in the first observation and equal to 1 from the second observation onward.

data want;
   set have;
   if _N_ = 2 then y = 1;
   retain y;
run;
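To see the retained value arrive at the top of each iteration, the sketch below adds a hypothetical helper variable, y_at_top, that copies y before the IF statement runs. It assumes the have data set from the top of the post. RETAIN is a declarative statement that SAS processes at compile time, so placing it at the end of the step works just as well as placing it at the top.

```sas
data want;
   set have;
   /* Value carried over from the previous iteration (missing at first) */
   y_at_top = y;
   if _N_ = 2 then y = 1;
   /* Compile-time statement: its position in the step does not matter */
   retain y;
run;
```

Here y_at_top is missing in the first two observations and equal to 1 in the last two, while y is 1 from the second observation onward.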

Variables created in a SUM statement

Next, consider the sum statement. As the sum statement documentation states, “The variable’s value is retained from one iteration to the next, as if it had appeared in a RETAIN statement.” Therefore, the values are not reset to missing. In the example below, y starts at 0 and accumulates x, so it takes the values 1, 3, 6 and 10.

data want;
   set have;
   y + x;
run;
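The sum statement documentation describes y + x; as equivalent to an explicit RETAIN with an initial value of 0, combined with the SUM function, which ignores missing values. A sketch of that longhand version follows. It assumes the have data set from the top of the post, and want_explicit is just an illustrative name.

```sas
data want_explicit;
   set have;
   retain y 0;      /* retained across iterations, starts at 0           */
   y = sum(y, x);   /* SUM ignores a missing x, like the sum statement   */
run;
```

It produces the same running total, 1, 3, 6 and 10, as the sum statement above.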

Data elements in a _TEMPORARY_ SAS array

Strictly speaking, the elements of a temporary array are not variables, but the documentation still lists them as an exception. SAS does not reset the elements of a temporary array when a new data step iteration begins. In the code below, SAS initializes the array elements at compile time, and because nothing in the step assigns new values to them, they hold the same values (10, 20 and 30) in every iteration. This makes sense: temporary array elements are not variables, so there is nothing to reset at each iteration. Quite a few temporary array tricks rely on the fact that the values persist across data step iterations, for example the Array Lag Trick.

data want;
   set have;
   array a{3} _temporary_ (10 20 30);
   t = catq('d', '|', of a[*]);
   put t=;
run;
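The Array Lag Trick itself is not spelled out here, so the sketch below is only my own minimal illustration of the underlying idea, not the trick as published. It stores the current x in a one-element temporary array and reads it back in the next iteration. The array name prev and the variable lag_x are hypothetical, and the step assumes the have data set from the top of the post.

```sas
data want;
   set have;
   array prev{1} _temporary_;   /* no initial value, so it starts missing   */
   lag_x   = prev{1};           /* x as stored in the previous iteration    */
   prev{1} = x;                 /* keep the current x for the next one      */
run;
```

Because the temporary element survives the start of each new iteration, lag_x takes the values missing, 1, 2 and 3. For this simple case, that matches what lag(x) returns.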

Data elements that are initialized in an ARRAY statement

The next exception concerns regular (non-temporary) data step arrays. Variables whose initial values are set in the ARRAY statement are automatically retained across data step iterations. Quite a few SAS programmers are unaware of this, so be careful when you set initial values for variables in a regular array; you may need to account for the retained values later in your program. In the small example below, the ARRAY statement creates the variable a1 with the initial value 1, and a1 keeps that value in every observation.

data want;
   set have;
   array a{1} (1);
run;
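The pitfall shows up as soon as the program assigns to such a variable. In the sketch below (which assumes the have data set from the top of the post; the array name c is hypothetical), c1 gets the initial value 0 from the ARRAY statement. The assignment therefore accumulates across iterations, even though the step contains no RETAIN statement.

```sas
data want;
   set have;
   array c{1} (0);   /* creates c1 with initial value 0, so c1 is retained */
   c1 = c1 + x;      /* builds a running total: 1, 3, 6, 10                */
run;
```

If each observation should start fresh instead, reset the elements explicitly, for example with call missing(of c[*]) or a plain assignment, at the point where the fresh start is needed.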

Variables created with options on statements that read or write data

This exception is quite broad. Variables created with options on data step statements that read data are not set to missing at each iteration of the data step. This applies to the FILE, INFILE, SET, MERGE, UPDATE and MODIFY statements (strictly speaking, FILE writes rather than reads data). For a small example, see the data step below. Here, I use the three SET statement options NOBS=, INDSNAME= and CUROBS=. Consult the documentation if you want to know what they do; in this context, it does not matter. What matters is that their values are not set to missing at the beginning of a new data step iteration. Because SAS does not write these variables to the output data set, I copy them into new variables.

data want;
   set have nobs=nobs indsname=indsname curobs=curobs;
   _nobs     = nobs;
   _indsname = indsname;
   _curobs   = curobs;
run;
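The END= option on the SET statement belongs to the same family. Its variable is 0 until SAS reads the last observation, and then it becomes 1. The sketch below assumes the have data set from the top of the post; eof and is_last are arbitrary names. Like the options above, the END= variable is not written to the output data set, so the step copies it.

```sas
data want;
   set have end=eof;
   is_last = eof;   /* eof itself is dropped from want, so keep a copy */
   if eof then putlog 'NOTE: Last observation reached at ' _N_=;
run;
```

Here is_last is 0 in the first three observations and 1 in the fourth.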

Automatic variables

Finally, SAS never sets automatic variables to missing at the start of a data step iteration. Consider the two familiar automatic variables _N_ and _ERROR_ in the example below. SAS does not write automatic variables to the output data set, so I simply assign their values to new variables to keep them. You can see in the resulting data that the values are never missing: n counts the iterations, and err is 0 because no error occurs.

data want;
   set have;
   n   = _N_;
   err = _ERROR_;
run;

Summary

This post clarifies when the data step sets variable values to missing at the start of each iteration, and when it does not. The default behavior is that the data step sets values to missing at the start of each iteration, and we explored six exceptions to it. Some exceptions are more natural than others.

You can download the entire code from this post here.