Most SAS programmers are familiar with the automatic _N_ variable in the Data Step. However, far from all of them fully understand it. In this post, I will introduce the automatic _N_ variable. I will demonstrate when the variable is incremented and, just as important, when it is not. Finally, I will shed light on some common misconceptions about the variable.
In the code snippets to come, I will use the very simple data set below. The data set has one variable x and three observations.
data have;
   input x @@;
   datalines;
1 2 3
;
A Simple Introduction
According to the SAS Data Step Documentation, two automatic variables are created in a data step: the _ERROR_ variable and the _N_ variable. The _N_ variable is commonly used to keep track of the number of times the data step has iterated. It is also very common to use it when we want to take some action only on the first or last observation in a data step. A good example is declaring a hash object.
Take a look at the data step below. We know that the have data set has three observations. This means that the data step iterates three times. Consequently, the values _N_=1, 2 and 3 are written to the log. Check it out for yourself.
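As a minimal sketch of the hash object case, the step below declares the object only when _N_=1, so the declaration runs once rather than on every iteration. It assumes the have data set from above; the data set name lookup_demo and the variable found are made up for illustration.

```sas
data lookup_demo;
   /* _N_ is 1 only in the first iteration, so this block runs once */
   if _N_ = 1 then do;
      declare hash h(dataset: 'have');   /* load have into memory */
      h.defineKey('x');
      h.defineDone();
   end;

   set have;

   /* check() returns 0 when the current value of x is a key in the hash */
   found = (h.check() = 0);
run;
```

Without the _N_=1 condition, the data step would declare and load the hash object again on every iteration.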
data test;
   set have;
   put _N_=;
run;
Next, let us take a look at when the _N_ variable value increments. A common misunderstanding about the _N_ variable is that it holds the number of the current observation in the data set being read. That is not true. As the Documentation says: “The _N_ variable is initially set to 1”. That means that the _N_ variable is equal to one before any observation is read into the PDV, i.e. before any Set Statement. The _N_ variable increments by one each time the data step passes by the Data Statement.
Take a look at the data step below. For demonstration, I put the value of the _N_ variable in the log before and after the Set Statement. You can see that they are equal before and after. Also, we can verify that the _N_ variable increments each time the internal data step loop passes by the Data Statement. Note that the log ends with a single _N_=4: the data step starts a fourth iteration, prints _N_, and then stops when the Set Statement finds no more observations to read.
data test2;
   put _N_=;
   set have;
   put _N_=;
run;
Manually Altering the _N_ Value
Now let us take things a bit further. The _N_ variable does increment each time the data step passes by the Data Statement, but that is not the entire truth. Not many SAS programmers are aware of this, but we can alter the value of _N_ ourselves. Take a look at the data step below. Just like above, I read the three observations from have. This time, however, I print the value of _N_, set it to zero, and print it again. Run the code and check the log.
data test3;
   set have;
   put _N_=;
   _N_=0;
   put _N_= //;
run;
As you can see, the first Put Statement in each iteration gives the same values as above (1, 2 and 3), even though I set _N_ to zero in every iteration. This illustrates two important facts about the SAS _N_ variable:
- We are able to manually alter the value of the automatic _N_ variable.
- While the data step does increment the value of _N_ each time it passes by the Data Statement, it does not simply add one to whatever value the variable currently holds. The data step keeps track of the iteration count itself and sets _N_ from that count at the top of each iteration. This makes the _N_ variable safe to alter: our change lasts only until the end of the current iteration, and the count of passes by the Data Statement is not lost.
Putting It All Together
Why would we want to manually alter the _N_ variable? First of all, it is automatically dropped, so it never ends up in the output data set. If we want a temporary counter variable in a data step that does not carry its value over to the next iteration, the _N_ variable is safe to use.
A quite common use of the _N_ variable as a counter is in the DoW Loop. Take a look at the example below. Here, I use a DoW Loop to read the data set have instead of the implicit data step loop. Is the value of _N_ ever incremented while the observations are read? No! _N_ remains one for every observation, because all three observations are read within a single iteration of the implicit data step loop. The data step does not pass by the Data Statement between reads.
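A minimal sketch of _N_ as a throwaway counter, assuming the have data set from above. The data set name repeated and the variable copy are made up for illustration.

```sas
data repeated;
   set have;

   /* _N_ serves as the loop index, so no extra variable has to be dropped */
   do _N_ = 1 to 2;
      copy = _N_;   /* keep the counter value in a regular variable */
      output;       /* write two rows per input observation */
   end;
run;
```

This should give six observations with the variables x and copy, and no _N_ column. The implicit loop still reads all three observations, because the value left in _N_ by the Do Loop does not affect how many times the data step iterates.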
data test4;
   do until (lr);
      set have end=lr;
      put _N_=;
   end;
run;
However, it does not take much work to manually increment the _N_ variable to our liking. Take a look at the data step below. Here, I use _N_ as a counter variable and manually increment its value each time the explicit Do Loop iterates. Consequently, the data step below prints the same values to the log as the first data step in the post. The difference is that here I increment the value myself, while in the first data step the data step increments _N_ internally. I use this technique in the post Working with Consecutive Events in SAS.
data test5;
   do _N_=1 by 1 until (lr);
      set have end=lr;
      put _N_=;
   end;
run;
In this post, I have presented the _N_ variable. I have discussed what the _N_ variable is and, perhaps more important, what it is not. I have discussed a few common misconceptions among SAS programmers and demonstrated a few examples where the automatic variable comes in handy. Next week, I will blog about the less commonly known _IORC_ Variable. Also, you can read about the automatic _I_ variable in the post Implicit Vs Explicit Array in SAS.
You can download the entire code from this post here.