Last week, I introduced the SAS Hash Object in PROC FCMP. Today, I will show you how to use the hash object as a caching mechanism to reduce the number of calculations in PROC FCMP, and possibly save time and CPU.
In the examples to come, I will use the very simple data set below.
data have;
   do i=1 to 1e6;
      x=rand('integer', 1, 1000);
      output;
   end;
run;
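Caching only pays off when input values repeat. As an optional check that is not part of the original post, the sketch below counts the distinct values of x in the data set. Because x is drawn from the integers 1 to 1000, the count can be at most 1000 across the one million rows. The column alias n_distinct is just an illustrative name.

```sas
/* Count the distinct values of x in work.have.              */
/* n_distinct is a hypothetical alias chosen for this sketch. */
proc sql;
   select count(distinct x) as n_distinct
   from have;
quit;
```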
A Classic FCMP Function
First, let us use PROC FCMP to write a very simple function. The function takes the input variable x and applies the INT function to it twice, repeating this 1,000 times in a loop. Obviously, you would not do this in real life. It is just a simple stand-in for a computationally heavy function call.
proc fcmp outlib=work.functions.fun;
   function somefunction(x);
      do i=1 to 1000;
         y=int(int(x));
      end;
      return (y);
   endsub;
run;
Now let us call the function on each value in the data set above. This takes about 16 seconds on my laptop.
options cmplib=work.functions;

data test1;
   set have;
   y=somefunction(x);
run;
Using the Hash Object to Reduce the Number of Calculations
Next, let us see how we can make the function a bit more complicated, but way more efficient. Below, I define the function and declare a hash object h as the first action. I use the input value x as the key and the calculated y as the data value. I do the same calculation as in the previous function definition. However, before I do the computation, I check whether the result already exists in the hash object. If it does, there is no need to do the calculation again. I simply return the value that is in the hash object cache. If it does not exist in the hash object, I do the computation, add the result to the cache and return y. This works because the hash object keeps its contents between calls to the function within the same step.
proc fcmp outlib=work.functions.fun;
   function somefunction_withcache(x);
      declare hash h();
      rc=h.defineKey('x');
      rc=h.defineData('y');
      rc=h.definedone();

      /* If already calculated, return value */
      if h.find()=0 then return(y);

      /* Else do the computation, return it and add it to the hash object */
      do i=1 to 1000;
         y=int(int(x));
      end;
      rc=h.add();
      return (y);
   endsub;
run;
Now, I create the same data set as above, but with the new function. This takes less than a second.
options cmplib=work.functions;

data test2;
   set have;
   y=somefunction_withcache(x);
run;
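To compare time and CPU for the two functions on your own machine, one option is to turn on the FULLSTIMER system option, which makes the log report real time and CPU time for each step. The sketch below is an addition to the original post. It assumes both functions are already stored in work.functions, and it uses data _null_ so that only the function calls are timed and no output data set is written. Timings will vary between machines.

```sas
/* Assumes somefunction and somefunction_withcache exist in work.functions */
options cmplib=work.functions fullstimer;

/* Without cache: the loop runs for every one of the rows */
data _null_;
   set have;
   y=somefunction(x);
run;

/* With cache: the loop runs only the first time each value of x is seen */
data _null_;
   set have;
   y=somefunction_withcache(x);
run;

/* Switch the detailed timing notes off again */
options nofullstimer;
```

Compare the real time and CPU time notes that the log prints after each of the two steps.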
You can run the code below to verify that the two data sets are identical.
proc compare base=test1 comp=test2;
run;
In this post, we have seen how to use the SAS hash object as a caching mechanism in PROC FCMP. I provide a very simple example and demonstrate that the technique can save both time and CPU. The example is easily extendable to more sophisticated situations.
You can see another example of caching in PROC FCMP in the article Hashing in PROC FCMP to Enhance Your Productivity. Also, see an alternative to the hash object in the posts An Array Hashing Scheme in SAS and A SAS Macro Approach to Array Hashing.
You can download the entire code from this post here.