The SAS hash object is a very flexible data structure. It can contain numeric and character data variables. But did you know that it can contain other hash objects as well? Not many programmers do. This post demonstrates the Hash of Hash method through a few examples.
Disclaimer: If you have not yet read the blog post Declaring and Instantiating the Hash Object in SAS, I recommend that you do so before taking on the Hash of Hash technique.
The Hash of Hash Technique
The Hash of Hash technique is best introduced by example. In the data step below, I create a hash object to contain (or rather point to) instances of other hash objects. Finally, I loop through the hash object and perform some action on the instances that it points to. The approach goes as follows:
- First, I declare the hash object h. Be aware that I do not instantiate the object. This means I create a variable of type ‘Hash’ in the PDV. However, no instance of the object exists in memory yet.
- Next, I declare and instantiate the hash object HoH. I define the variable ds as the only key variable. Then I define the data variables. This is where the magic happens: I define the object h as a data variable. I also define ds as a data variable. Finally, I declare a hash iterator to later loop through the HoH object.
- Here, I create the first instance of h. I let ds='sashelp.class'. Then I use the _NEW_ Operator to create an instance of h. The definition of keys and data variables is done as usual and the instantiation is complete.
- This step is important. Here, I use the Add Method to add the current PDV values of the variables ds and h to the HoH hash object. This means that the PDV variable h acts as a pointer to the active instance of h. The current values are the values we created in step 3, i.e. the sashelp.class data set and the newly created instance of h.
- This step repeats steps 3 and 4 for the sashelp.iris data set. This means that we create a new instance of the hash object h and add it to HoH. Consequently, this instance is now active and the PDV variable h now points to it. Remember, two instances of h now exist in memory. One is active (the one that contains the sashelp.iris data set). One is inactive (the one that contains the sashelp.class data set). The PDV variable h points only to the active one.
- Finally, I use a Do While Loop and the Hash Iterator Next() Method to loop over each element in HoH. Each time the Next() Method is called, the values of ds and h for the next item are copied into the PDV, so that h points to that item's instance in memory. When no more elements are present in HoH, the loop and the data step terminate.
    data _null_;
       if 0 then set sashelp.class sashelp.iris;

       declare hash h;                                /* 1 */

       declare hash HoH(ordered:'Y');                 /* 2 */
       HoH.definekey ('ds');
       HoH.definedata('h','ds');
       HoH.definedone();
       declare hiter HoHiter("HoH");

       ds='sashelp.class';                            /* 3 */
       h = _new_ hash(dataset:ds);
       h.definekey('name');
       h.definedata(all:'Y');
       h.definedone();

       HoH.add();                                     /* 4 */

       ds='sashelp.iris';                             /* 5 */
       h = _new_ hash(dataset:ds, multidata:'Y');
       h.definekey('species');
       h.definedata(all:'Y');
       h.definedone();
       HoH.add();

       do while (HoHiter.next() = 0);                 /* 6 */
          numrows=h.num_items;
          put (ds numrows)(=);
       end;
    run;
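To make the pointer idea above more concrete, here is a minimal sketch that builds the same two instances and then uses the Find() Method on HoH to switch the PDV variable h back to the first one. The explicit length statement and the literal key passed to Find() are my own additions for illustration; they are not part of the example above.

```sas
data _null_;
   if 0 then set sashelp.class sashelp.iris;    /* host variables for the inner hashes */
   length ds $ 13;                              /* assumption: long enough for both names */

   declare hash h;                              /* declared only, no instance yet */
   declare hash HoH();
   HoH.definekey('ds');
   HoH.definedata('h', 'ds');
   HoH.definedone();

   ds = 'sashelp.class';
   h = _new_ hash(dataset: ds);
   h.definekey('name');
   h.definedata(all: 'Y');
   h.definedone();
   HoH.add();

   ds = 'sashelp.iris';
   h = _new_ hash(dataset: ds, multidata: 'Y');
   h.definekey('species');
   h.definedata(all: 'Y');
   h.definedone();
   HoH.add();

   /* At this point h points to the sashelp.iris instance */
   numrows = h.num_items;
   put 'Before Find(): ' ds= numrows=;

   /* Find() copies the stored ds and h back into the PDV, */
   /* so h now points to the sashelp.class instance again  */
   if HoH.find(key: 'sashelp.class') = 0 then do;
      numrows = h.num_items;
      put 'After Find():  ' ds= numrows=;
   end;
run;
```

If this runs as intended, the second line in the log should report the row count of sashelp.class rather than that of sashelp.iris, even though no new instance was created in between. Only the pointer in the PDV changed.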
A Practical Application
In the blog post Split Dataset by Group with the SAS Hash Object, I demonstrate how to use the hash object to split a data set into multiple data sets based on distinct variable values. I do so in a simple and easy to follow manner. However, the simplicity comes at a cost: the data set has to be sorted by the variable of interest. We can overcome this if we use the Hash of Hash technique to split the data. The example below goes as follows.
- First, I declare and instantiate the hash object HoH to point to other hash object instances in memory. Also, I declare (but do not instantiate) the hash object h. This is equivalent to steps 1 and 2 above.
- Next, I use a DoW Loop to explicitly process each observation in the input data set sashelp.class.
- This step is important. First, I use the Find() Method to search for the current value of sex in the HoH object. Remember that the Find() Method copies the hash object data values into the corresponding PDV variables. If the Find() Method succeeds (returns 0), then only the replacing of PDV values is done and we continue to step 4. If the Find() Method fails (returns a nonzero value), then a new instance of h is created and added to the HoH object.
- Now, the data set observation is added to the hash object h. Remember that the active instance of h matches this observation exactly due to the Find() Method above. If Find() succeeds, the appropriate instance is placed in the PDV (the PDV points to the appropriate instance). If Find() does not succeed, the appropriate instance is created and the PDV points to it.
- Finally, we iterate through HoH. HoH contains as many items as there are distinct values for the variable in the data set. Each time we call the Next() Method, the PDV points to a new instance of h. For each instance of h, I use the Output() Method to output the data part of the hash object to a distinct data set.
    data _null_;
       declare hash HoH();                            /* 1 */
       HoH.definekey ('sex');
       HoH.definedata('h','sex');
       HoH.definedone();
       declare hiter HoHiter("HoH");
       declare hash h;

       do until (eof);                                /* 2 */
          set sashelp.class end=eof;

          if HoH.find() ne 0 then do;                 /* 3 */
             h=_new_ hash(dataset:'sashelp.class(obs=0)', multidata:'Y');
             h.definekey('sex');
             h.definedata(all:'Y');
             h.definedone();
             HoH.add();
          end;

          h.add();                                    /* 4 */
       end;

       do while(HoHiter.next() = 0);                  /* 5 */
          h.output(dataset:cats('data_',sex));
       end;
    run;
This creates a distinct data set for each distinct value of sex in sashelp.class. Not surprisingly, this creates two data sets, data_F and data_M. The Hash of Hash method is handy and very flexible. However, since the hash object is an in-memory construct, the entire data set will eventually reside in memory. The size of the input data set is therefore limited by available memory rather than by disk space.
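Because memory is the limiting factor, it can be useful to get a rough feel for how much each instance holds before writing anything out. The sketch below is a variation of the example above, not part of the original code: instead of calling the Output() Method, it reads the NUM_ITEMS and ITEM_SIZE attributes of each instance. The variable names items, itemsize and approx_bytes are my own.

```sas
data _null_;
   declare hash HoH();
   HoH.definekey('sex');
   HoH.definedata('h', 'sex');
   HoH.definedone();
   declare hiter HoHiter('HoH');
   declare hash h;

   /* Same split logic as above: one instance of h per value of sex */
   do until (eof);
      set sashelp.class end=eof;

      if HoH.find() ne 0 then do;
         h = _new_ hash(dataset: 'sashelp.class(obs=0)', multidata: 'Y');
         h.definekey('sex');
         h.definedata(all: 'Y');
         h.definedone();
         HoH.add();
      end;

      h.add();
   end;

   /* Report item count and approximate size instead of writing data sets */
   do while (HoHiter.next() = 0);
      items        = h.num_items;      /* number of items in this instance */
      itemsize     = h.item_size;      /* size in bytes of a single item   */
      approx_bytes = items * itemsize; /* rough estimate only              */
      put sex= items= itemsize= approx_bytes=;
   end;
run;
```

Treat the result as an approximation. As far as I know, ITEM_SIZE does not account for the initial overhead of the hash object itself, so the real memory footprint will be somewhat larger.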
This post demonstrates the Hash of Hash technique through a few examples. The Hash of Hash technique is an advanced, but very flexible technique in the hash object toolkit. The key to mastering it is to realize that the PDV can contain variables of type ‘hash’, which act as pointers to hash object instances in memory. Finally, we have seen a practical example, where we split a SAS data set by distinct variable values. No sort needed.
The book Data Management Solutions Using SAS Hash Table Operations devotes the entire chapter 9 to the Hash of Hash method.
You can download the entire code from this example here.