In a recent thread on the SAS Community, a user asked how to find minimum and maximum values within groups using the hash object. One might ask why not simply use an appropriate procedure such as PROC SUMMARY. This is a reasonable question. However, doing it with the hash object highlights many of the hash object's interesting features. Therefore, I will go through two approaches: one that requires grouped input, and a hash of hashes approach that does not.

In the examples to come, I will use the example data below.

data have;
input k d;
datalines;
1 10
1 20
1 30
2 5 
2 10
2 15
3 20
3 30
3 40
;
run;
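
For reference, here is a minimal sketch of the procedure-based route mentioned in the introduction. It is handy as a baseline to compare the hash object results against. The output data set name check and the variable names min_d and max_d are my own choices for illustration.

```sas
/* Baseline: min and max of d within each k, using PROC SUMMARY.   */
/* check, min_d and max_d are illustrative names (assumptions).    */
proc summary data = have nway;
   class k;                       /* grouping variable              */
   var d;                         /* analysis variable              */
   output out = check (drop = _type_ _freq_)
          min = min_d
          max = max_d;
run;
```

With the example data, the minimum values should come out as 10, 5 and 20 for k = 1, 2 and 3.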

Minimum Value with Hash Object

First, let us assume that the data above is actually sorted. Or at least grouped. I split the code below into three overall steps. Let us look at them one by one.

    1. First, I declare the hash object h. At first, I do not read any data into it. I simply declare it and make sure that I use the Multidata argument tag, so that items with duplicate values of d are kept. More importantly, I use the Ordered : 'A' argument tag to make sure that the items we insert into h are sorted in ascending order. Be aware that I use d as the key, not k. Finally, I declare a hash iterator object and link it to h.
    2. Next, I use a DoW loop to read data into the hash object, one group at a time.
    3. In step three, I know that a single key group is in the hash object. Furthermore, I know that the items are sorted by d. Therefore, I can get the minimum value for that group simply by accessing the first item in the hash object. I do so with a Next() iterator call, which copies the data values from the hash object into the PDV. Finally, I release the iterator by calling the Prev() method, which moves it off the item list, and then clear the hash object. This is important: if I do not release the iterator, the item is locked by it, and while an item is locked I cannot clear the hash object. And if the hash object is not cleared at the end of each pass through the step, the previous group's items will still be in the object when we add the next group. The implicit output at the end of the step then writes one observation per group.
data want;
   if _N_ = 1 then do;                                                              /* 1 */
      declare hash h (dataset : 'have(obs=0)', ordered : 'A', multidata : 'Y');
      h.definekey ('d');
      h.definedata(all : 'Y');
      h.definedone();
      declare hiter hi ('h');
   end;
 
   do until (last.k);                                                               /* 2 */
      set have;
      by k;
      h.add();
   end;
 
   _N_ = hi.next();                                                                 /* 3 */
   _N_ = hi.prev();
   h.clear(); 
run;

The approach above is a simplified version of the technique in 3 Ways to Select Top N By Group in SAS.

Obviously, there is always more than one way of doing things. I present the approach above mainly because it highlights some interesting features of the hash object. Another approach could be this one. It is a bit more memory heavy, since it keeps an item for every group in memory at once, but quite slick as well.

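data _null_;                            /* results are written by h.output() */
   dcl hash h(ordered : 'Y');
   h.definekey('k');
   h.definedata('k', 'min');
   h.definedone();

   do until (z);
      set have end = z;
      if h.find() then min = d ;        /* k not in h yet: d is the first candidate */
      else do;
         if d < min then min = d;       /* k found: keep the smaller value */
      end;
      h.replace();                      /* add or update the item for k */
   end;

   h.output(dataset : 'test');          /* one observation per group */
   stop;
run;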
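
The same idea extends to tracking the maximum alongside the minimum. The sketch below is my own variation, not part of the original thread. The variable names min_d and max_d and the output data set name minmax are assumptions made for illustration.

```sas
/* Sketch: min and max of d per k in a single hash object.         */
/* min_d, max_d and minmax are illustrative names (assumptions).   */
data _null_;
   dcl hash h (ordered : 'A');
   h.definekey ('k');
   h.definedata ('k', 'min_d', 'max_d');
   h.definedone ();

   do until (z);
      set have end = z;
      if h.find() ne 0 then do;          /* first time we see this k */
         min_d = d;
         max_d = d;
      end;
      else do;                           /* k already in h: update   */
         if d < min_d then min_d = d;
         if d > max_d then max_d = d;
      end;
      h.replace();                       /* add or update the item   */
   end;

   h.output (dataset : 'minmax');        /* one observation per k    */
   stop;
run;
```

Because the lookup is by key, this version does not depend on the order of the input.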

Unsorted input – Hash of Hashes

In the example above, we assume that the input is sorted, or at least grouped, by the key variable. Suppose that is not the case, but we still wish to find the minimum or maximum value by group with the hash object. The Hash of Hashes approach is a real treat here. Again, I split up the code below into three main parts. However, I will not go into much detail, so to fully understand the process, read A SAS Hash Object Of Hash Objects first.

  1. First, I declare the HoH. I want this object to point to instances of other hash objects. Furthermore, I want those objects to have iterators attached to them. Therefore, I specify h and hi in the data portion of HoH. Finally, I declare an iterator object for HoH.
  2. In step 2, I read the entire input data. For each observation, I call HoH.find(). If a hash object already exists for the current value of k, zero is returned and we go straight to the h.add() call. If it does not exist, we declare a new instance of h, along with its iterator hi, and add them to HoH. This works because k is the key in HoH. Like above, it is very important that we use the Ordered : 'A' argument when declaring h.
  3. Finally, I use the iterator attached to HoH to enumerate HoH. For each item in the object, an instance of h (and its iterator hi) becomes active. Which instance becomes active is determined by the value of k. For each active instance, we copy the data values of the first item into the PDV and output. Just like above, I make sure to release the iterator again with the Prev() method call.
data have;
input k d;
datalines;
1 10
3 40
2 10
1 30
3 20
2 5 
1 20
2 15
3 30
;
 
data want;
   dcl hash HoH(ordered : 'A');                                                     /* 1 */
   HoH.definekey('k');
   HoH.definedata('h', 'hi', 'k');
   HoH.definedone();
   dcl hiter HoHiter('HoH');
 
   do until (lr);
      set have end=lr;
 
      if HoH.find() ne 0 then do;                                                   /* 2 */
         dcl hash h(dataset : 'have(obs=0)', multidata : 'Y', ordered : 'A');
         h.definekey('d');
         h.definedata(all : 'Y');
         h.definedone();
         dcl hiter hi('h');
         HoH.add();
      end;
 
      h.add();
   end;
 
   do while(HoHiter.next() = 0);                                                    /* 3 */
      _N_ = hi.next();
      _N_ = hi.prev();
      output;
   end;
run;
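
Since each inner hash object is ordered by d, the same structure can return both ends of each group. The sketch below is my own variation on the code above. It assumes the hash iterator First() and Last() methods to jump to the smallest and largest item. The names want_minmax, min_d and max_d are illustrative assumptions.

```sas
/* Sketch: min and max per group from unsorted input with a HoH.   */
/* want_minmax, min_d and max_d are illustrative names.            */
data want_minmax (keep = k min_d max_d);
   dcl hash HoH (ordered : 'A');
   HoH.definekey ('k');
   HoH.definedata ('h', 'hi', 'k');
   HoH.definedone ();
   dcl hiter HoHiter ('HoH');

   do until (lr);
      set have end = lr;
      if HoH.find() ne 0 then do;        /* no hash object for this k yet */
         dcl hash h (dataset : 'have(obs=0)', multidata : 'Y', ordered : 'A');
         h.definekey ('d');
         h.definedata (all : 'Y');
         h.definedone ();
         dcl hiter hi ('h');
         HoH.add();
      end;
      h.add();
   end;

   do while (HoHiter.next() = 0);
      _N_ = hi.first();   min_d = d;     /* first item: smallest d */
      _N_ = hi.last();    max_d = d;     /* last item: largest d   */
      output;
   end;
   stop;
run;
```

Nothing is cleared or deleted here, so the sketch does not bother to release the iterator after the Last() call.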

Summary

In this post, we have seen how to find minimum and maximum values by group with the hash object in SAS. In the iterator-based examples, I find minimum values. If we want to find maximum values instead, simply set Ordered : 'D' instead of 'A' when declaring h.
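
As a minimal sketch of that switch, here is the grouped-input approach with descending order. Because the latest version of have is unsorted, I sort it first. The data set names have_sorted and want_max are assumptions made for illustration.

```sas
/* Sketch: maximum of d per k, using descending order in h.        */
/* have_sorted and want_max are illustrative names (assumptions).  */
proc sort data = have out = have_sorted;
   by k;
run;

data want_max;
   if _N_ = 1 then do;
      declare hash h (dataset : 'have_sorted(obs=0)', ordered : 'D', multidata : 'Y');
      h.definekey ('d');
      h.definedata (all : 'Y');
      h.definedone ();
      declare hiter hi ('h');
   end;

   do until (last.k);                    /* load one group into h        */
      set have_sorted;
      by k;
      h.add();
   end;

   _N_ = hi.next();                      /* first item is now largest d  */
   _N_ = hi.prev();                      /* release the iterator         */
   h.clear();
run;
```

With the example data, want_max should hold 30, 15 and 40 for k = 1, 2 and 3.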

For related posts, read A Few Hash of Hashes Examples, Calculate Sums With the SAS Hash Object and Top N By Group with Hash of Hashes in SAS.

You can download the entire code from this post here.