In the blog post Three Basic Techniques to Reduce SAS Hash Object Size, I demonstrated how to reduce the memory footprint of a SAS hash object with familiar programming tools. Today, I will show two more techniques, though these are a bit more complicated. As in the previous post, I will use the example data set HashData. You can copy/paste the data into your editor from here.

First, I simply read in the data set as is, with the five key variables id1 to id5 as keys and every variable in the data set as a data variable:

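data _null_;
   /* Define the host variables in the PDV without reading any rows */
   if 0 then set HashData;

   /* Available memory before the hash object is loaded */
   before=input(getoption('xmrlmem'),20.);
   if _N_ = 1 then do;
      declare hash h(dataset:"HashData");
      h.defineKey('id1', 'id2', 'id3', 'id4', 'id5');
      h.defineData(all:"Y");
      h.defineDone();
   end;
   /* Available memory after the hash object is loaded */
   after=input(getoption('xmrlmem'),20.);

   hashsize=before-after;

   put "Hash Object Takes Up:" hashsize sizekmg10.2;
run;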

This leaves a 346 MB memory footprint.
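As a rough cross-check of the measured footprint, the hash object can report its own item size and item count. The following is a minimal sketch, assuming the same HashData data set and key variables as above. The variable names itemsize, items and approx are my own, and the product is only an estimate because it ignores the fixed overhead of the hash object itself.

```sas
data _null_;
   /* Define the host variables without reading any rows */
   if 0 then set HashData;

   declare hash h(dataset:"HashData");
   h.defineKey('id1', 'id2', 'id3', 'id4', 'id5');
   h.defineData(all:"Y");
   h.defineDone();

   /* Bytes per item and number of items stored in the hash object */
   itemsize = h.item_size;
   items    = h.num_items;

   /* Estimate only: excludes the hash object's fixed overhead */
   approx = itemsize * items;

   put itemsize= items= approx= sizekmg10.2;

   /* End the step explicitly, since no rows are ever read */
   stop;
run;
```

If the estimate and the xmrlmem difference are in the same range, that suggests the measurement is picking up the hash object rather than unrelated memory use.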

Use Metadata to Specify Data Variables

Sometimes you want to specify data variables that match certain patterns or meet certain requirements. We can use the SAS Dictionary Tables or the Sashelp Metadata Views to find those variables. Above, I specify id1 to id5 as key variables. However, I may not want them to be data variables as well. I can exclude them from the data part of the hash object like this:

%let key='id1', 'id2', 'id3', 'id4', 'id5';
proc sql;
   select compbl(quote(strip(name))) into :datavars separated by ','
   from dictionary.columns
   where upcase(libname)=upcase('work')
     and upcase(memname)=upcase('hashdata')
     and find(name, 'id') eq 0;
quit;
%put &datavars.;
 
data _null_;
   if 0 then set HashData;
   before=input(getoption('xmrlmem'),20.);
      if _N_ = 1 then do;
         declare hash h(dataset:"HashData");
         h.defineKey(&key.);
         h.defineData(&datavars.);
         h.defineDone();
      end;
   after=input(getoption('xmrlmem'),20.);
 
   hashsize=before-after;
 
   put "Hash Object Takes Up:" hashsize sizekmg10.2;
run;

Here, I simply extract all the variables from the HashData data set that do not have the string ‘id’ in their name. Then I put these variables into a format that the hash object can read: a comma-separated list of quoted variable names. This technique results in a memory footprint of 269 MB.
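The name filter above is a substring match, so it would also drop a data variable that merely contains 'id' (a hypothetical variable called paid, for example), and find() is case-sensitive. Below is a minimal sketch of a stricter variant that excludes exactly the five key variables. It uses the sashelp.vcolumn view rather than dictionary.columns, and the macro variable name datavars_exact is my own.

```sas
proc sql noprint;
   /* Build a quoted, comma-separated list of every variable in   */
   /* WORK.HASHDATA except the five key variables, matched by     */
   /* exact name rather than by substring                          */
   select quote(strip(name)) into :datavars_exact separated by ','
   from sashelp.vcolumn
   where libname = 'WORK'
     and memname = 'HASHDATA'
     and upcase(name) not in ('ID1', 'ID2', 'ID3', 'ID4', 'ID5');
quit;

/* Write the resulting list to the log for inspection */
%put &=datavars_exact;
```

The list can then be passed to h.defineData() in the same way as &datavars. above. For the example data the result should be the same, provided no other variable name contains 'id'.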

Create MD5 Hash Representation of Key Variables

When we work with large quantities of data, we may want to specify many key variables. I have seen examples of up to 30 key variables. Instead of specifying all those key variables, we can create an MD5 hash representation of the concatenation of the key variables. This can vastly reduce the SAS hash object size if we have many key variables with long lengths. In the example above, I have five key variables with a length of 20 each, for a combined length of 100. We can represent the key variables with a single MD5 hash key with a length of 16, which is unique for all practical purposes, like this:

data HashData;
   set HashData;
   length key $16;
   key=md5(cats(of id1-id5));
run;
 
data _null_;
   if 0 then set HashData;
 
   before=input(getoption('xmrlmem'),20.);
      if _N_ = 1 then do;
         declare hash h(dataset:"HashData");
         h.defineKey('key');
         h.defineData(&datavars.);
         h.defineDone();
      end;
   after=input(getoption('xmrlmem'),20.);
 
   hashsize=before-after;
 
   put "Hash Object Takes Up:" hashsize sizekmg10.2;
run;

This reduces the memory footprint of the SAS hash object to 192 MB. Be aware that in this example, I overwrote the HashData data set simply to demonstrate the creation of the hash representation. In a real-life situation, you would create the hash variable as part of the original data rather than rewrite the data set in a separate step.
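One caveat with concatenating keys before hashing: cats() strips blanks and adds no separator, so different key combinations can collapse into the same string and therefore the same MD5 value. The sketch below illustrates this with made-up literal values and shows catx() with a delimiter as one possible safeguard. The choice of '|' as delimiter is an assumption and only helps if that character never occurs in the key values.

```sas
data _null_;
   length k1 k2 k3 k4 $16;

   /* Without a delimiter, both calls hash the same string 'abc' */
   k1 = md5(cats('ab', 'c'));
   k2 = md5(cats('a', 'bc'));

   /* With a delimiter, the strings 'ab|c' and 'a|bc' differ */
   k3 = md5(catx('|', 'ab', 'c'));
   k4 = md5(catx('|', 'a', 'bc'));

   /* 1 = the two digests are identical, 0 = they differ */
   same_cats = (k1 = k2);
   same_catx = (k3 = k4);

   put same_cats= same_catx=;   /* same_cats=1 same_catx=0 */
run;
```

Applied to the example above, the key assignment would read key=md5(catx('|', of id1-id5));. Note that catx() skips blank arguments entirely, so keys that can be missing would need additional handling.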

Summary

In this post, I have demonstrated two advanced techniques to reduce the size of a hash object in SAS. Together, they reduced the memory footprint of the hash object from 346 MB to 192 MB. The size reduction you get from these techniques depends on the lengths of the variables in question. Both techniques are especially handy when the number of key variables is large.

As a related post, read about The Hash Object Memrc Argument in Definedone. Also, the approach in the post Using the SAS Hash Object and Point= Option can reduce the memory footprint quite a lot.

You can download the entire code from this example here.