As a SAS programmer, you probably know about the Do Loop, the Do While and Do Until Loops. But do you know about the DoW Loop? If not, you are in for a real treat. In its essence, the DoW loop takes control from the implicit loop structure of the data step to an explicit do loop with the Set Statement inside the loop. This post is devoted to exploring the DoW Loop by example and the flexibility that it holds.
In the following, I will use the example data below.
data MyData; input ID var @@; datalines; 1 10 1 30 1 50 2 40 2 60 2 70 2 40 3 80 ;
The name “Dow Loop” is not an official SAS term. It is not documented. Rather, the name originates from the SAS-L Community where the user Ian Whitlock used the technique flawlessly. The technique gained popularity in the community and not long after that, the loop was referred to as a abbreviation of the two words Do and Whitlock. The DoW Loop.
The DoW Loop
Let us take a look at a simple DoW Loop example. In the data step below I use a Do Until construct with a Set Statement inside. The set statement inside a do loop was what confused me the most, when I first saw this technique. Therefore, let us take a closer look at the DoW Loop below.
- Here, I use the Do Until Loop to repeat until last.ID. I use the automatic variable _N_ as the iterator variable to control how many times the loop iterates for each ID. _N_ is safe to use because it merely controls how many times the internal data step loop has iterated. Furthermore, the variable is automatically dropped, so I don’t need to specify and drop a new variable this way.
- The Set Statement is now specified. Placing the Set Statement inside the loop forces the control of the data step implicit loop structure to the explicit Do Until Loop. The trick here is to remember that each time the data step execution hits the Set Statement, SAS reads the next observation of the data set. This continues until SAS encounters an empty data set buffer.
- A simply By Statement with the ID Variable specified ensured that we can use the last.ID variable in step (1). This is a classic use of the DoW loop.
- Here, I create a simple sum variable where I add the value of the var variable each time the loop iterates.
data DowLoop; do _N_=1 by 1 until (last.ID); /* 1 */ set MyData; /* 2 */ by ID; /* 3 */ sum=sum(sum, var); /* 4 */ end; run;
Be aware that this does not create one observation for each input observation. Instead, it creates an observation for each ID. Remember, there is still an implicit Output – and Return Statement at the bottom of the Data Step.
The Double DoW Loop
The data step example above creates a summary data set with three observations. However, we can create this data set so many other ways and even simpler and more efficient. Why should we resort to this technique? Let us take things a bit further and find the answer with and example of the Double DoW Loop.
Let us assume that I do not want to create a summary data set. I want every observation from the original data set, but for each by group, I want the average value in a variable.
The first part of the code below is identical to the simple DoW Loop above. Next, I create the average variable. Again, I can use the _N_ variable, which I iterate fir each observation of the current ID. Now comes a second DoW Loop. It looks quite a lot like the first one. However, I use an explicit Output Statement to output each observation from the input SAS data set. Furthermore, the second DoW Loop retains the average variable throughout the iterations. Consequently, the average for each ID will appear in every single observation.
A word of caution. In the double DoW Loop, we have two set statements with the same data set specified. However, under the hood SAS does not actually point to the same data set twice. Behind the scenes, SAS sets up two distinct data set buffers and accesses them accordingly. If this was not the case, SAS would read the first by-group in the first loop and iterate through the next by-group in the second one. This is clearly not desirable.
data Double_DowLoop; do _N_=1 by 1 until (last.ID); set MyData; by ID; sum=sum(sum, var); end; avg=divide(sum, _N_); do until (last.id); set MyData; by id; output; end; run;
The Double Dow Loop creates exactly what we want. A data set with every observation from the original data set present, and with an average variable for each ID. The exact same result can be achieved with the following SQL Procedure
proc sql; create table dow_SQL as select *, sum(var) as sum, mean(var) as avg from MyData group by id. order by id; quit;
So once again: Why the DoW Loop, when we can achieve the same with other tools? There are several reasons why.
First and foremost, the DoW Loop offers flexibility and control. It lets you logically separate By-Group processing and Between-Group processing. It is easy to set up a program where you do some preparation before processing a by-group. Then process the by-group and afterwards do some work that processes the final result. Furthermore, in many cases the DoW loop will be more efficient than the SQL approach. It may seem like the SQL Procedure reads the data only once, but this is not the case. In fact, the SQL Procedure may slow things down compared to the data step due to many random accesses. However, we know that the data step read the data sequentially exactly twice.
In this post, we have seen a few examples of the DoW Loop in SAS. An amazingly flexible tool if used with care. We have seen how programming tasks that most programmers may solve with other tools can be overcome with the Dow and Double DoW Loop. Furthermore we have discussed why the DoW Loop is a good idea to add to your SAS toolkit, which is not obvious at first. I demonstrate a classic use of the DoW loop in the posts A SAS Hash Object Of Hash Objects (Hash Of Hash) and Using a Temporary Array to Store Lagged Values.
This post is means as an introduction by example. There are several articles out there, that explains the technique in more depth. My favorite one is The DOW-Loop Unrolled by Paul Dorfman.
You can download the entire code from this post here.