I have previously written the two blog posts SAS Array Lookup Example and SAS Proc Format Lookup Example. I demonstrate examples of how to perform table lookups with arrays and formats. While these are both valid and efficient search techniques, they do have limitations. These limitations are eliminated if we use the powerful SAS Hash Object. In this post, I will briefly demonstrate an example of a simple lookup. Furthermore, I will discuss how the hash object overcomes the limitations of the lookup technique described in the two posts above.

In the examples to come, I will use almost the same example data as in the Format Example post, generated with the following code

data employees(drop=i);
   length empid $20;
   array first_names{20} $15 _temporary_ ("Paul", "Allan", "Thomas", "Michael", "Chris", "David", "John", "Jerry", "James", "Robert",
                                          "William", "Richard", "Bob", "Daniel", "Paul", "George", "Larry", "Eric", "Charles", "Stephen");
   array last_names{20} $15 _temporary_ ("Smith", "Johnson", "Williams", "Jones", "Brown", "Miller", "Wilson", "Moore", "Taylor", "Hall",
                                        "Anderson", "Jackson", "White", "Harris", "Martin", "Thompson", "Robinson", "Lewis", "Walker", "Allen");
   call streaminit(123);
   do i=1 to 50e6;
      empid=compress(uuidgen(), '-');
proc surveyselect data=employees out=temp(keep=empid) seed=123 noprint                
data emphours;
   set temp;
   hours=round(rand('Uniform', 10, 100), 0.01);
proc datasets lib=work nolist;
   delete temp;

The only difference is that the lookup table emphours contains three lookup variables instead of one.

A Simple SAS Hash Object Lookup

The fact that the lookup table emphours contains more than one search variable outlines the shortcomings of the other search techniques well. They can only lookup one value for a given key (not entirely true, but is takes a lot of effort to search for multiple values). For example, if we use the format lookup, we would have to create as many formats and as many CNTLIN Data Sets as the number of lookup variables. This is certainly not desirable.

However, we can easily lookup multiple variable values for a given key with the hash object. To do so, we simply declare the hash object in the first iteration of the data step (if _N_=1). I specify empid as the key variable and hours, sickdays and seniority as data variables in the object. Finally, I set the data variables as missing. Otherwise, they are retained from the previous observation in the data set. Finally, I set the Employees data set and use the FIND() Method assigned to perform the lookup.

data HashLookup(drop=rc);
   if 0 then set work.emphours;
   if _N_ = 1 then do;
      declare hash h(dataset:'work.emphours');
      h.defineData('hours', 'sickdays', 'seniority');
      call missing(hours, sickdays, seniority);
   set Employees;

The if 0 then set statement at the top of the program ensures that the relevant variables exist in the PDV. The Hash Lookup data set contains the variables from Employees and the three data variables from EmpHours. Thus, we successfully performed an efficient table search of multiple data values without creating any new data sets or catalogs (formats).


We have seen that this lookup method easily overcomes the shortcomings of other lookup techniques in SAS. However, we have only seen the tip of the iceberg. The hash object is extremely flexible and efficient as it exists only in memory. I will leave this post as a simple teaser as many more posts on the topic are to come. Stay tuned!

If you want to learn more about the rich capabilities of the SAS Hash object, I recommend the two books Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study and SAS® Hash Object Programming Made Easy. Also, see the blog post 5 Tips to Learn and Understand the Hash Object in SAS.

In this post, I show how to look up a single data value for a given key. In the blog post SAS Hash Object Do_Over Method Example, I demonstrate how to lookup several data values for a given key. Also, I have previously blogged about Split Datasets By GroupRun Time Effect Of Hash Object HASHEXP Argument Size and using the Hash Object as a Dynamic Placeholder. Furthermore, you can even use the Hash Object in PROC FCMP and create an Array Hashing Scheme in SAS.

You can download the entire code from this article here.