When a SAS hash object is created, two distinct operations take place: a declaration and an instantiation. To fully understand the hash object, you must know when each operation happens and how the two differ. In this blog post, we will examine both. As we will see, more than one instance of the same hash object can be created.
Declare Vs Instantiate
We use the Declare Statement to declare a hash object in SAS. We outline two different approaches in the code below. The first approach declares and instantiates the hash object in two separate steps. The second approach does the same thing in a single statement. Let us take a closer look at the two.
The first approach uses the declare statement, followed by the keyword hash and the name of the object. This step declares the hash object and is considered only during compilation of the data step. Next comes an assignment statement with the _NEW_ operator followed by hash(). This step instantiates the object and is considered only during execution of the data step. During instantiation, the hash object is actually placed in memory.
The second approach does exactly the same thing as the two steps above, but in a single statement. Notice that in this declare statement, the name h is followed by a set of parentheses. This is not the case in the first approach, where the parentheses appear in the assignment statement that instantiates the hash object. This gives us a good rule of thumb: the parentheses in the declaration of a SAS hash object belong to the instantiation.
data _null_;
   declare hash h;      /* h is declared as a PDV variable of type hash. Compile time only. */
   h = _new_ hash();    /* An instance of the hash object h is created. Run time only. */
run;

data _null_;
   declare hash h();    /* This statement does exactly the same as the two statements above */
run;
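As a small sketch of the rule of thumb above: anything written inside the parentheses is passed to the instantiation, so it moves along with the parentheses when the two operations are split. The ordered argument tag is used here purely as an illustration, and the names h1 and h2 are my own; the step does nothing else with the two objects.

```sas
data _null_;
   /* One statement: the argument tag sits in the parentheses of the declaration */
   declare hash h1(ordered: 'a');

   /* Two statements: the same argument tag moves to the instantiation */
   declare hash h2;
   h2 = _new_ hash(ordered: 'a');
run;
```

Both h1 and h2 should end up as equivalent, empty hash objects. Only the place where the parentheses appear differs.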
Declare a Hash Object – What Happens in the PDV?
To better understand the two phases in the creation of a hash object, let us consider what happens in the Program Data Vector (PDV) during declaration and instantiation. During declaration, a variable with the name of the hash object is inserted into the PDV. This variable has the type hash and is never written to any data set, because only scalar variables (numeric and character) can be written to a data set. Since the declaration creates a variable in the PDV, the same hash object can be declared only once. Consider the code below. The second declaration of the hash object h1 results in a compile-time error.
data _null_;
   declare hash h1;
   declare hash h2;
   declare hash h1;   /* A compile-time error is produced here */
run;
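However, remember that data step compilation does not evaluate execution-time logic such as loops and conditions. The compiler encounters the declare statement below only once, even though it sits inside a loop with two iterations. Therefore, the following declaration does not yield any error during compilation.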
data _null_;
   do i = 1, 2;
      declare hash h;
   end;
run;
When the data step instantiates the hash object, it first looks for the relevant variable of type hash in the PDV. If this variable does not exist, a run-time error is produced. If it does exist, the instantiation proceeds and an instance of the object is created in memory. Along with it, a distinct value of the type hash is created, which can be thought of as a pointer to the newly instantiated object. Finally, this value is assigned to the variable of type hash in the PDV, effectively activating the object.
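The sequence described above can be sketched as follows. This is only an illustration under my own assumptions: the key variable k and the return code rc are hypothetical names, and the remark about a failing method call reflects my understanding that a hash variable without an instance cannot be used.

```sas
data _null_;
   length k 8;
   call missing(k);

   declare hash h;           /* Compile time: h enters the PDV, no instance exists yet */

   /* A method call such as h.defineKey('k') placed here should fail at run time,
      because h does not yet point to an instance. */

   h = _new_ hash();         /* Run time: an instance is created and assigned to h */
   rc = h.defineKey('k');    /* h now points to an instance, so methods can be called */
   rc = h.defineDone();
run;
```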
Multiple Instances of the Same Hash Object
A natural question arises from the discussion above: if the declaration and instantiation can be done in a single statement, why should we bother doing so in two? The answer is that the two-step approach offers far more flexibility. I will leave it at that for now and elaborate in a future blog post. However, as a small teaser, let me emphasize that separating the two operations makes it possible to create more than one instance of the same hash object.
Consider the following code.
data _null_;
   declare hash h;
   h = _new_ hash();   /* Instance 1 */
   h = _new_ hash();   /* Instance 2 */
run;
Here, we create two instances of the same hash object h. The instance created last is the one that h refers to, and it is the active one when the data step terminates. This is a very powerful technique if we use it with care. You can read more in the blog posts A Hash Object Of Hash Objects (Hash Of Hashes) and Count Distinct Values in SAS With the Hash Object.
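To make the point about separate instances a little more concrete, here is a minimal sketch that adds an item to the first instance and then inspects the second one. The key variable k and the variables rc, n1 and n2 are hypothetical names I introduce for illustration only.

```sas
data _null_;
   length k 8;
   declare hash h;

   h = _new_ hash();         /* Instance 1 */
   rc = h.defineKey('k');
   rc = h.defineDone();
   k = 1;
   rc = h.add();             /* Instance 1 now holds one item */
   n1 = h.num_items;

   h = _new_ hash();         /* Instance 2: h now points to a new, empty object */
   rc = h.defineKey('k');
   rc = h.defineDone();
   n2 = h.num_items;

   put n1= n2=;              /* Compare the item counts of the two instances */
run;
```

If the two instances are indeed independent, the log should show one item for the first instance and none for the second, since the item was added before h was pointed at the new object.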
In this post, I have demonstrated the two phases in creating a hash object in SAS: the declaration phase and the instantiation phase. We have seen that these can be carried out either in two separate statements or in a single one. This is a crucial concept if we want to fully understand the hash object. The same goes for the Hash Iterator Object.
As usual, my favorite reference is the book Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study. It is worth every penny and much more.
Also, read the related post An interesting PDV Application of the SAS Hash Object.
You can download the entire code from this post here.