Use SAS Hash Object in PROC FCMP

The SAS hash object is a flexible and efficient data structure. Most SAS programmers who are familiar with the hash object know it from the Data Step. However, the hash object is also available in PROC FCMP. In this post, I will demonstrate how to use the hash object in PROC FCMP and how the FCMP version can extend the capability of the ‘regular’ hash object. I will also discuss a few differences between the two versions.

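In the examples to come, I will use the following example data. The employees data set contains data about made-up employees, each identified by an empid. The ids data set contains a list of ids: the first 500,000 empids from employees plus 100,000 newly generated ids. So only some of the empids in employees are present in ids, and not every id in ids belongs to an employee.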

data employees(drop=i);
   length empid $12;
   array first_names{20} $15 _temporary_ ("Paul", "Allan", "Thomas", "Michael", "Chris", "David", "John", "Jerry", "James", "Robert",
                                          "William", "Richard", "Bob", "Daniel", "Paul", "George", "Larry", "Eric", "Charles", "Stephen");
   array last_names{20} $15 _temporary_ ("Smith", "Johnson", "Williams", "Jones", "Brown", "Miller", "Wilson", "Moore", "Taylor", "Hall",
                                        "Anderson", "Jackson", "White", "Harris", "Martin", "Thompson", "Robinson", "Lewis", "Walker", "Allen");
   call streaminit(123);
   do i=1 to 1e6;
      first_name=first_names[ceil(rand("Uniform")*20)];
      last_name=last_names[ceil(rand("Uniform")*20)];
      empid=compress(uuidgen(), '-');   /* empid is $12, so only the first 12 characters are kept */
      output;
   end;
run;
 
data ids(keep=empid);
   length empid $12;
   do p=1 to 5e5;
      set employees(keep=empid) point=p;
      output;
   end;
   do i=1 to 1e5;
      empid=compress(uuidgen(), '-');
      output;
   end;
   stop;
run;

The Common Data Step Approach

The task here is to select all the observations in employees whose empid is present in ids. The common hash object approach in the SAS data step is as follows. I declare a hash object in the first iteration of the data step and load the ids data set into it. Next, I read the employees data set sequentially. Finally, I use the Check() method to search for the current key value in the hash object. If the key is present, the method returns zero and the observation is output. If not, the observation is dropped.

data test;
   if _N_=1 then do;
      declare hash h(dataset:'ids');
      h.definekey('empid');
      h.definedone();
   end;
 
   set employees;
 
   if h.check()=0;
run;

Next, let us use PROC FCMP to create a function that performs the search for us. In the code below, I create the function hashcheck(). I declare the hash object in the same way as above. However, in PROC FCMP I have to assign the return code of each method call to a variable. This is one of the differences between PROC FCMP and the Data Step. Finally, I use the Check() method and return the rc variable in the Return Statement.

In the following data step, I put the hashcheck() function to use. The crucial point here is that I can use the function directly in a Where expression. This is not possible with ordinary data step hash objects. In the code below, I use hashcheck() and read only the observations where the function returns zero, that is, where the empid is present in the ids data set.

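proc fcmp outlib=work.functions.fun;
   function hashcheck(empid $);
      /* Load ids into a hash object keyed on empid */
      declare hash h(dataset: 'ids');
      rc=h.definekey('empid');
      rc=h.definedone();
      /* Check() returns zero when empid is found in the hash object */
      rc=h.check();
      return (rc);
   endsub;
run;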
 
options cmplib=work.functions;
data test2;
   set employees(where=(hashcheck(empid)=0));
run;
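
As a quick sanity check, the two approaches should select the same observations. The sketch below compares test and test2 with PROC COMPARE and then uses hashcheck() in a Where Statement inside a procedure. The choice of PROC FREQ and of last_name as the table variable is only for illustration and is my own assumption, not part of the approach above.

```sas
/* Assumes test, test2 and the hashcheck() function from above exist */
options cmplib=work.functions;

/* Both data sets are read from employees in the same order, */
/* so an observation-by-observation comparison is meaningful */
proc compare base=test compare=test2;
run;

/* The same function used in a Where Statement of a procedure */
proc freq data=employees;
   where hashcheck(empid)=0;
   tables last_name;
run;
```

If PROC COMPARE reports differences, the function and the data step hash object are not doing the same search and the function definition is the first place to look.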

Lookup Functions in PROC FCMP

Next, let us use the hash object in PROC FCMP to look up and retrieve data values for a given empid. Here, I use the Find() method to retrieve the first name that corresponds to a given empid. Finally, I use the Return Statement to return first_name.

In the following data step, I use the findem() function on empid and assign the returned value to first_name. This works exactly like a regular hash object lookup, except that the code is shorter and can easily be reused.

proc fcmp outlib=work.functions.fun;
   function findem(empid $) $;
      length first_name $ 100;
 
      declare hash h(dataset:'employees');
      rc = h.definekey('empid');
      rc = h.definedata('first_name');
      rc = h.definedone();
      call missing(first_name);
 
      rc = h.find();
      return (first_name);
   endsub;
run;
 
options cmplib=work.functions;
data test3;
   set ids;
   length first_name $ 100;
   first_name=findem(empid);
run;
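
For comparison, here is a sketch of the regular data step lookup that findem() stands in for. The data set name test3_ds is made up for this illustration. Note one difference from test3: here first_name inherits its length from employees rather than being declared as $ 100.

```sas
data test3_ds(drop=rc);
   /* Put first_name in the PDV with the attributes it has in employees */
   if 0 then set employees(keep=first_name);

   if _N_=1 then do;
      declare hash h(dataset:'employees');
      h.definekey('empid');
      h.definedata('first_name');
      h.definedone();
   end;

   set ids;

   /* Find() returns zero on a match and fills first_name.        */
   /* Otherwise, clear the value retained from the previous match */
   rc=h.find();
   if rc ne 0 then call missing(first_name);
run;
```

The declaration, the host variable setup and the return code handling all have to be repeated in every data step that needs the lookup. The FCMP function keeps them in one place.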

The function above is limited to looking up a single value for a key value. This is in the nature of a function: it can return only one value. We can overcome this by writing a subroutine instead, because a subroutine can return multiple values. In a future post, I will write about the difference between a function and a subroutine in PROC FCMP. The key difference between the code above and the code below is that I define findem2 as a subroutine and use the Outargs Statement to list the variables that I want to return from the routine.

In the following data step, I use a Call Statement to call the findem2 subroutine. Here, I must specify both the key and the data variables that I want to retrieve.

proc fcmp outlib=work.functions.fun;
   subroutine findem2(empid $, first_name $, last_name $);
      outargs first_name, last_name;
      length first_name $ 100 last_name $ 100;
 
      declare hash h(dataset:'employees');
      rc = h.definekey('empid');
      rc = h.definedata('first_name', 'last_name');
      rc = h.definedone();
      call missing(first_name, last_name);
 
      rc = h.find();
 
   endsub;
run;
 
options cmplib=work.functions;
data test4;
   set ids;
   length first_name $ 100 last_name $ 100;
   call findem2(empid, first_name, last_name);
run;
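
To see how many ids actually found a match, a short query on the result can help. This is a minimal sketch; the column names n_ids and n_unmatched are made up for the illustration. Because findem2 sets both variables to missing before calling Find(), an id with no match in employees should come back with a blank first_name.

```sas
/* Assumes test4 from the step above exists */
proc sql;
   select count(*)                 as n_ids,
          sum(missing(first_name)) as n_unmatched
   from test4;
quit;
```

Given how ids was built, I would expect the unmatched count to be close to the number of newly generated ids.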

Summary

In this post, we have seen how to use the hash object in PROC FCMP. We have seen examples of how to implement search and retrieve operations, and even how to perform hash searches directly in a Where expression in the Data Step. This post is meant as a teaser on the subject. In a future post, I will write about how to use the hash object as a cache in PROC FCMP. The capabilities extend far beyond the scope of this post. However, the documentation on the subject is quite short. Below, I have listed a few articles that I find informative. If you want to learn more, go to lexjansen.com and search for PROC FCMP Hash Object. You will find far more information there than in the documentation.

Hashing in PROC FCMP to Enhance Your Productivity
Sample 47224: Load a SAS data set into a Hash Object using PROC FCMP

Also, see this post about Arrays in PROC FCMP.

You can download the entire code from this post here.