The SAS Hash Object Removedup Method allows you to delete single items in a hash object, in contrast to the regular Remove() Method, which removes entire key groups. However, the usage of the method is not as straightforward as it may seem. This post aims to clarify the usage of the Removedup Method by example.
In the examples to come, I will use the simple SAS data set below.
```sas
data have;
   input k d;
   datalines;
1 10
1 20
1 30
2 5
2 10
2 15
3 20
3 30
3 40
;
```
A Simple Removedup Example
First, a simple example of how the Removedup Method works. In the code below, I create a simple hash object with the key variable k and the data variables k and d. I use the Find() Method to find the first item in the same-key group for k=2. Then the Find_Next() call points to the second item in the k=2 group. Finally, I use the Removedup Method to remove the item that we currently point to, and only that item. At the bottom, I print the content of the hash object. Notice that SAS successfully deletes the second item (d=10) in the k=2 group.
Notice that we must specify the multidata:"Y" argument tag in the Declare Statement. This will always be the case, since otherwise each key holds only one item and you could simply use Remove().
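```sas
data _null_;
   /* Load HAVE into a hash object that allows multiple items per key */
   declare hash h(dataset : "have", multidata : "Y", ordered : "A");
   h.definekey("k");
   h.definedata("k", "d");
   h.definedone();
   declare hiter hi("h");
   k = .; d = .;         /* create the host variables in the PDV */

   h.find(key : 2);      /* point at the first item of the k=2 group   */
   h.find_next();        /* move on to the second item in the group    */
   h.removedup();        /* remove only the item we currently point at */

   /* Print the remaining content of the hash object */
   do while (hi.next() = 0);
      put (k d)(=);
   end;
run;
```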
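For contrast with the Remove() Method mentioned above, here is a minimal sketch that calls Remove() for k=2 on the same hash object. Because the hash object holds multiple items per key, Remove() deletes the entire k=2 group rather than a single item. It assumes the HAVE data set from above, and want_remove is a hypothetical output data set name used only so the result can be inspected with the OUTPUT Method.

```sas
data _null_;
   declare hash h(dataset : "have", multidata : "Y", ordered : "A");
   h.definekey("k");
   h.definedata("k", "d");
   h.definedone();
   declare hiter hi("h");
   k = .; d = .;

   /* Remove() takes only the key: every item with k=2 is deleted */
   rc = h.remove(key : 2);

   /* want_remove is a hypothetical name, used only for inspection */
   rc = h.output(dataset : "want_remove");

   /* Print what is left: only the k=1 and k=3 groups */
   do while (hi.next() = 0);
      put (k d)(=);
   end;
run;
```

The log should list the three k=1 items and the three k=3 items, with no k=2 items left.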
Selectively Deleting Multiple Items
Next, let us see how to use the SAS Removedup Method to delete multiple items based on some criterion. Suppose we want to delete all items for which d = 20 in the k=1 and k=3 groups. This is fairly easy with a simple loop over the desired key values and Find()/Find_Next() logic. Run the code below and verify that it removes the desired items. This is simple because only one item in each same-key group fulfills the criterion.
Next, suppose that we want to do the same thing, only this time we want to remove all items for which d is either 20 or 30. Looking at the data set, this is true for two consecutive items in each of the k=1 and k=3 groups. In the code below, comment out the d in (20) line and uncomment the line with the new criterion. This does not yield the desired result: the items for which d=30 are still in the hash object. This happens because after a successful Removedup call, SAS unsets the item list that the Find()/Find_Next() methods currently traverse, so the following Find_Next() call does not move on to the d=30 item. Consequently, we would have to traverse the list again, starting from the first item with a Find() call. Furthermore, we would have to keep track of whether the enumeration currently dwells on the last item in the group with a Has_Next() call. This is the approach of the example in the documentation and on pages 88-92 of the book Data Management Solutions Using SAS® Hash Table Operations. However, it is overly complicated. Read the next section to see a simpler way.
```sas
data _null_;
   declare hash h(dataset : "have", multidata : "Y", ordered : "A");
   h.definekey("k");
   h.definedata("k", "d");
   h.definedone();
   declare hiter hi("h");
   k = .; d = .;

   do k = 1, 3;
      /* Traverse the same-key group with Find()/Find_Next() */
      do rc = h.find() by 0 while (rc = 0);
         if d in (20) then rc2 = h.removedup();
         *if d in (20, 30) then rc2 = h.removedup();
         rc = h.find_next();
      end;
   end;

   do while (hi.next() = 0);
      put (k d)(=);
   end;
run;
```
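For comparison, here is a minimal sketch of the restart idea described above. It is not the documentation's code, and it assumes, as observed in this post, that a successful Removedup call unsets the item list. After each successful removal, the loop calls Find() again to start over from the first item in the group; otherwise it moves on with Find_Next(). This variant does without Has_Next(), because Find_Next() simply returns a nonzero code at the end of the group. It assumes the HAVE data set from above, and want_restart is a hypothetical output data set name.

```sas
data _null_;
   declare hash h(dataset : "have", multidata : "Y", ordered : "A");
   h.definekey("k");
   h.definedata("k", "d");
   h.definedone();
   declare hiter hi("h");
   k = .; d = .;

   do k = 1, 3;
      rc = h.find();                 /* first item of the k group           */
      do while (rc = 0);
         if d in (20, 30) then do;
            rc2 = h.removedup();     /* remove the current item only        */
            rc  = h.find();          /* assumed unset list: restart group   */
         end;
         else rc = h.find_next();    /* no match: move to the next item     */
      end;
   end;

   /* want_restart is a hypothetical name, used only for inspection */
   rc = h.output(dataset : "want_restart");

   do while (hi.next() = 0);
      put (k d)(=);
   end;
run;
```

Items that were already checked are re-checked after every restart. That is harmless here because they did not match, but the extra Find() calls are part of why the Do_Over approach in the next section reads more easily.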
The SAS Do_Over Method
Not long ago, this was a topic on the SAS Community. Read the thread REMOVEDUP and REPLACEDUP in HASH Object. It seems that the Do_Over Method prevents SAS from unsetting the item list. Run the code below and check the log: it successfully removes the items that we want. Though the documentation does not mention it, the Do_Over call works much better than the Find()/Find_Next() approach above. I am unsure whether the unsetting of the item list actually does not happen or whether the various enumerations and checks simply happen behind the scenes. Anyway, the code is much cleaner and easier to understand.
I learned this trick from the great FreelanceReinhard.
```sas
data _null_;
   declare hash h(dataset : "have", multidata : "Y", ordered : "A");
   h.definekey("k");
   h.definedata("k", "d");
   h.definedone();
   declare hiter hi("h");
   k = .; d = .;

   do k = 1, 3;
      /* Do_Over steps through the same-key group for the current k */
      do while (h.do_over() = 0);
         if d in (20, 30) then rc2 = h.removedup();
      end;
   end;

   do while (hi.next() = 0);
      put (k d)(=);
   end;
run;
```
Simplifying a SAS Documentation Example
The Removedup documentation has a single example. Here, the second item in each item list is removed with the Removedup Method. This is a classic example of using Find()/Find_Next() and Has_Next() to do the single-item removal. However, as we saw above, there is a simpler way.
```sas
data testdup;
   length key_id value 8;
   input key_id value;
   datalines;
1 10
2 11
1 15
3 20
2 16
2 9
3 100
5 5
1 5
4 6
5 99
;

data _null_;
   length r 8;
   dcl hash h(dataset:'testdup', multidata: 'y', ordered: 'y');
   h.definekey('key_id');
   h.definedata('key_id', 'value');
   h.definedone();
   call missing (key_id, value);

   do key_id = 1 to 5;
      rc = h.find();                  /* first item for this key_id      */
      if (rc = 0) then do;
         h.has_next(result: r);       /* r ne 0: a second item exists    */
         if (r ne 0) then do;
            h.find_next();            /* move to the second item         */
            h.removedup();            /* and remove only that item       */
         end;
      end;
   end;

   dcl hiter i('h');
   rc = i.first();
   do while (rc = 0);
      put key_id= value=;
      rc = i.next();
   end;
run;
```
Therefore, let us rewrite the code from the example to use the Do_Over Method. Take a look at the code below. It creates the same result in the log as the code above, but with less and simpler code, simply because Do_Over lets us stay within the item list when a successful Removedup call takes place. The counter c tracks the position of the current item in its same-key group, so c = 2 marks the second item.
```sas
data _null_;
   dcl hash h(dataset:'testdup', multidata: 'y', ordered: 'y');
   h.definekey('key_id');
   h.definedata('key_id', 'value');
   h.definedone();
   dcl hiter i('h');
   call missing (key_id, value);

   do key_id = 1 to 5;
      /* c counts the position of the current item in the key group */
      do c = 1 by 1 while (h.do_over() = 0);
         if c = 2 then h.removedup();
      end;
   end;

   do while (i.next() = 0);
      put key_id= value=;
   end;
run;
```
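The same counter generalizes easily. As a minimal sketch, assuming Do_Over keeps its place in the item list after each successful Removedup call as demonstrated above, changing the condition to c > 1 keeps only the first item in each key group of testdup and removes all later duplicates. It assumes the testdup data set from above, and testdup_first is a hypothetical output data set name used only for inspection.

```sas
data _null_;
   dcl hash h(dataset:'testdup', multidata: 'y', ordered: 'y');
   h.definekey('key_id');
   h.definedata('key_id', 'value');
   h.definedone();
   dcl hiter i('h');
   call missing (key_id, value);

   do key_id = 1 to 5;
      /* c is the position of the current item in the key group */
      do c = 1 by 1 while (h.do_over() = 0);
         if c > 1 then h.removedup();   /* keep item 1, drop 2, 3, ... */
      end;
   end;

   /* testdup_first is a hypothetical name, used only for inspection */
   rc = h.output(dataset : 'testdup_first');

   do while (i.next() = 0);
      put key_id= value=;
   end;
run;
```

With c = 2 the loop removes exactly one item per group, while c > 1 removes every item after the first, which turns the hash object into a one-item-per-key lookup table.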
In this post, we explored the SAS Hash Object Removedup Method. The Removedup Method lets us remove single items in hash objects with multiple items per key. However, we saw that the usage is not always straightforward. The fact that the method unsets the current item list after a successful call makes it tricky. However, we learned that the Do_Over Method simplifies the process and saves us typing and calls.
In a future post, I will write about the Replacedup Method.