Why Arrays Should be _TEMPORARY_ When Possible

Most SAS programmers know about arrays and how to use them in various situations in the data step. However, too few programmers know about the true power of the temporary array and how it differs from the ‘usual’ one. This post is devoted to the temporary array, the _TEMPORARY_ Keyword and why you should use it more in your data step programs. Furthermore, I present an example of a lookup method.

Normal vs Temporary

When we use the Array Statement, SAS creates corresponding variables in the PDV. So in the first declaration below, SAS creates ten additional variables in the PDV: NonTemp1, NonTemp2, …, NonTemp10. Naturally, this creation comes at a cost: creating the variables uses more disk space, CPU and time.

Now, in many applications, we do not need the actual variables in the PDV. In those cases, you should use the _TEMPORARY_ keyword to declare the array as a temporary construct. This means that we can only reference the array elements by their index, such as Temp[1], Temp[2] and so on. Run the code below and confirm in the log that only the first array statement creates variables in the PDV.

data _null_;
   array NonTemp{10};
   array Temp{10} _temporary_;
   put _all_;
run;
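
To see index-only access in a slightly fuller setting, here is a minimal sketch. It assumes a hypothetical output data set work.demo and made-up initial values; the temporary array supplies the numbers, but none of its elements end up as variables in the result.

```sas
/* Sketch only: work.demo and the initial values are made up for illustration */
data work.demo;
   array Temp{3} _temporary_ (10 20 30);  /* initial values in parentheses */
   total = 0;
   do i = 1 to dim(Temp);
      total = total + Temp[i];            /* elements are reachable by index only */
   end;
   drop i;                                /* keep only total in the output */
run;
```

After this step, work.demo should hold a single observation with one variable, total. There is no Temp1, Temp2 or Temp3 to drop, because those variables never existed.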

A Matter of Speed

As mentioned, creating variables in the PDV costs CPU and time. As a small example, let us first create an array with one million character elements of length 100.

data test;
   array NonTemp{1000000} $ 100;
run;

This takes about 20 seconds to run. It is not surprising that it takes time to create one million variables in the program data vector. Now, let us see what happens if we create the same array as a temporary construct with the _TEMPORARY_ keyword.

data test2;
   array Temp{1000000} $ 100 _temporary_;
run;

This code runs in 0.1 seconds, a massive time reduction. Although the _TEMPORARY_ keyword is seemingly the only thing that separates the conventional and the temporary array, they are very different structures. The elements of a normal array are real variables that are written to the output data set, which takes up disk space. The temporary array is held only in memory, just like the hash object. It sounds fast already, right?
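
Timings like these depend heavily on the system, so it is worth measuring on your own machine. The sketch below is one way to do that, under two assumptions of mine: the array size is scaled down to 100,000 elements so the slow step stays manageable, and data _null_ is used so that no output data set is written. The FULLSTIMER system option asks SAS to write more detailed time and memory statistics to the log.

```sas
options fullstimer;      /* more detailed resource statistics in the log */

/* Conventional array: 100,000 variables are created in the PDV */
data _null_;
   array NonTemp{100000} $ 100;
run;

/* Temporary array: same size, but no variables are created */
data _null_;
   array Temp{100000} $ 100 _temporary_;
run;

options nofullstimer;    /* switch back to the standard statistics */
```

Compare the real time, CPU time and memory notes that follow each step in the log.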

A Practical Example: The Array Lookup

A classic use of the temporary construct is the Array Table Lookup. Here, I use two DoW Loops to first read the data and fill a temporary array with values. Next, I read the data again and look up the data from the temporary array. When applicable, this method is blazingly fast due to the in-memory construct using the _TEMPORARY_ keyword. Also, see the technique put to use in the post Bitmapping in SAS.

You can copy and paste the data from the example here.

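data ArrayLookup;
   /* One in-memory slot per possible empid (assumes integer empid in 1-99999) */
   array EmpHours{99999} _temporary_;
   /* First DoW loop: load hours into the array, indexed by empid */
   do until(eof1);
      set emphours end=eof1;
      EmpHours[empid]=hours;
   end;
   /* Second DoW loop: look up hours for each employee by empid */
   do until(eof2);
      set employees end=eof2;
      hours=EmpHours[empid];
      output ArrayLookup;
   end;
   /* Everything is read within a single iteration of the step, so stop here */
   stop;
run;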

The lookup takes less than a second on my system. If you omit the _TEMPORARY_ keyword, the lookup is much slower (if your system can even handle creating 99999 new variables in the PDV).
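
If you do not have the example data at hand, a small stand-in is enough to try the lookup step above. This is only a sketch: the rows are made up, and the name variable is a hypothetical addition. Only the data set names emphours and employees and the variables empid and hours come from the lookup code.

```sas
/* Hypothetical stand-in data, not the data from the original example */
data emphours;
   input empid hours;
   datalines;
101 40
205 37.5
310 20
;
run;

data employees;
   input empid name $;   /* name is a made-up variable for illustration */
   datalines;
101 Anna
205 Brian
310 Carla
999 Dana
;
run;
```

With this data, employee 999 has no row in emphours, so the corresponding array element is never filled and the lookup returns a missing value for hours.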

Summary

In this post, we have seen by example how the temporary array differs from the conventional one and why it is usually the better choice. The points raised in the post underline the fact that whenever you can, you should make your array temporary with the _TEMPORARY_ keyword. This saves disk space, time and CPU power.

For a more thorough walkthrough of the topic of SAS arrays, see the article Arrays Made Easy: An Introduction to Arrays and Chapter 3.10 in Carpenter’s Guide to Innovative SAS Techniques. For another handy use, see the posts Using Temporary Arrays to Store Lagged Values in SAS and Random Sampling Without Replacement.

You can download the entire code from this post here.