Three Basic Techniques to Reduce SAS Hash Object Size

Last week, I demonstrated how to Measure the Memory Footprint of a Hash Object in SAS. The hash object is an in-memory data construct. Computer memory is expensive, and as SAS programmers, we should not read unnecessary data into memory. In this post, I will demonstrate three basic techniques to reduce your hash object size.

In the examples to come, I will use a made-up data set. You can copy/paste the data set into your editor from here.

As a reference point, I simply read the data set into a hash object as is. The consequence is a memory footprint of 230 MB.

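data _null_;
   if 0 then set HashData;                  /* add the variables to the PDV without reading data */
   before=input(getoption('xmrlmem'),20.);  /* available memory before loading */
      if _N_ = 1 then do;
         declare hash h(dataset:"HashData");
         h.defineKey('id1');
         h.defineData(all:"Y");             /* every variable goes into the data portion */
         h.defineDone();
      end;
   after=input(getoption('xmrlmem'),20.);   /* available memory after loading */
 
   hashsize=before-after;                   /* the difference approximates the hash object size */
 
   put "Hash Object Takes Up:" hashsize sizekmg10.2;
run;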

1. Read in the necessary variables only

Not surprisingly, the size of a hash object increases with the number of variables read into it. In fact, when a variable is defined as both a key and a data variable, it is stored in both the key portion and the data portion, so it takes up memory twice. Therefore, we should read in only the necessary variables and be careful not to overuse the tempting ALL:"Y" argument of the DefineData method. Below, I read only three variables into the data portion of the hash object.

data _null_;
   if 0 then set HashData;
   before=input(getoption('xmrlmem'),20.);
      if _N_ = 1 then do;
         declare hash h(dataset:'HashData');
         h.defineKey('id1');
         h.defineData('first_name', 'last_name', 'state');
         h.defineDone();
      end;
   after=input(getoption('xmrlmem'),20.);
 
   hashsize=before-after;
 
   put 'Hash Object Takes Up:' hashsize sizekmg10.2;
run;

The result is a memory footprint of 166 MB, a nice saving at a reasonable cost.
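To see the effect of defining a variable as both key and data, one option is to compare the ITEM_SIZE attribute of two hash objects. The following is a minimal sketch, assuming the HashData data set from above is available. The names h1, h2, size1 and size2 are my own, and the reported sizes may include internal overhead, so the difference need not match the length of id1 exactly.

```sas
data _null_;
   if 0 then set HashData;   /* put the variables in the PDV */

   /* id1 as key only */
   declare hash h1();
   h1.defineKey('id1');
   h1.defineData('first_name', 'last_name', 'state');
   h1.defineDone();

   /* id1 as both key and data */
   declare hash h2();
   h2.defineKey('id1');
   h2.defineData('id1', 'first_name', 'last_name', 'state');
   h2.defineDone();

   /* ITEM_SIZE returns the size in bytes of a single item */
   size1 = h1.item_size;
   size2 = h2.item_size;

   put 'Item size, id1 as key only:     ' size1 'bytes';
   put 'Item size, id1 as key and data: ' size2 'bytes';
   stop;
run;
```

No data is loaded here, because the item size depends only on the key and data definitions.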

2. Read in the necessary items only

The size of a hash object increases with the number of items. For each item added to the object, the size increases roughly with the sum of the lengths of the input variables. Therefore, it is desirable to read only the necessary items into the object. When we use the dataset: argument in the Declare statement, we can use the Where= data set option to control which observations are read into the SAS hash object.

data _null_;
   if 0 then set HashData;
   before=input(getoption('xmrlmem'),20.);
      if _N_ = 1 then do;
         declare hash h(dataset:"HashData(where=(gender='F'))");
         h.defineKey('id1');
         h.defineData('first_name', 'last_name', 'state');
         h.defineDone();
      end;
   after=input(getoption('xmrlmem'),20.);
 
   hashsize=before-after;
 
   put 'Hash Object Takes Up:' hashsize sizekmg10.2;
run;

Above, I use the Where= data set option to specify that I am only interested in female observations in the input data set. Not surprisingly, this cuts the memory consumption of the SAS hash object roughly in half: the memory footprint drops to 83 MB.
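The relationship between the number of items and the size can also be approximated from inside the step with the NUM_ITEMS and ITEM_SIZE attributes. This is a rough sketch, assuming the same HashData data set. The variables items, itemsize and estimate are names I made up. The product ignores the hash object's fixed overhead, so treat it as an approximation rather than an exact measurement.

```sas
data _null_;
   if 0 then set HashData;   /* put the variables in the PDV */

   declare hash h(dataset:"HashData(where=(gender='F'))");
   h.defineKey('id1');
   h.defineData('first_name', 'last_name', 'state');
   h.defineDone();

   items    = h.num_items;      /* number of items loaded */
   itemsize = h.item_size;      /* bytes per item */
   estimate = items*itemsize;   /* approximate size in bytes */

   put 'Number of items:   ' items comma12.;
   put 'Item size (bytes): ' itemsize;
   put 'Estimated size:    ' estimate sizekmg10.2;
   stop;
run;
```

Running it with and without the Where= option shows how the item count, and with it the estimate, changes.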

3. Minimize the length of hash variables

As I stated earlier, for each item added to the object, the size increases roughly with the sum of the lengths of the input variables. Consequently, it is beneficial to control the lengths of the input variables. Remember, hash variables (both key and data) inherit the attributes of the PDV variables associated with them, and once the length of a variable is set, it cannot be changed later in the step. Therefore, you should only reduce the length of the variables read into the hash object if it makes sense. Do not create a copy of the original data set with smaller lengths just to load the hash object; in real-life situations, this is rarely beneficial. Instead, take control of the variable lengths at the source, in the original data set.

Now, I update the data step that creates the example data and reduce the lengths of the input variables so they still make sense. Then I read the data set into a hash object again. This results in a memory footprint of merely 45 MB.
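As an illustration of setting lengths at the source, here is a hypothetical sketch and not the actual step that creates HashData. The data set name HashDataSmall, the character lengths, the numeric type of id1 and the three sample rows are all assumptions chosen for the example.

```sas
/* Hypothetical lengths; choose values that fit your own data */
data HashDataSmall;
   length first_name $ 15 last_name $ 20 state $ 2 gender $ 1;
   input id1 first_name $ last_name $ state $ gender $;
   datalines;
1 Anna Jensen TX F
2 Peter Hansen CA M
3 Maria Olsen NY F
;
run;

/* Verify the lengths that the hash variables would inherit */
proc contents data=HashDataSmall;
run;
```

Because the Length statement comes before the Input statement, the variables are created with the shorter lengths from the start, and a hash object that reads this data set inherits them.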

Summary

In this post, I have demonstrated three basic techniques to reduce the size of a SAS hash object. Together, the techniques reduced the hash object size from 230 MB to 45 MB, all with tools that most SAS programmers are familiar with. Since computer memory is expensive, it is desirable to minimize its use.

In a future blog post, I will demonstrate Two Advanced Techniques to Reduce SAS Hash Object Size.

You can download the entire code from this post here.