5 Tips to Learn and Understand the Hash Object in SAS
I have written quite a lot about the SAS hash object. Hash objects are becoming increasingly popular. Almost all of the questions that I receive on the blog are about hash objects. And the popularity is justified. Hash objects solve problems that no other programming facility in SAS does. Also, the internal construction of the hash object makes it very efficient and quite easy to maintain as well.
In this post, I will focus on the learning side of hash objects, because the learning curve can be quite steep. To the intermediate Base SAS programmer, the syntax doesn’t look like regular data step syntax. Below, I will present five tips to learn and understand the hash object in SAS.
1. The Proper Literature
If you are serious about learning about hash objects in SAS, there are two books you have to acquire:
- SAS Hash Object Programming Made Easy: A very nice introduction to the hash object. The book covers the hash object mainly as a lookup tool. It explains the object in easy-to-understand language. Even the beginner/intermediate SAS programmer will have no problem following the pace of the book.
- Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study: While the book above covers the hash object mainly as a lookup tool, this book takes it further. It takes a deeper dive into how and why the hash object is so efficient and covers quite a lot of topics unfamiliar to most programmers.
I recommend that you read both books. If you have some knowledge of the hash object already, I recommend that you skip the first one and go straight to the second.
2. The PDV and Host Variable Interaction
The key to understanding the SAS hash object is to understand the interaction between the PDV and the hash object itself. The hash object can transfer data to and from the PDV. No other interactions between the data step and the hash object exist.
There are three types of methods. Some transfer data from the PDV to the hash object. Some transfer data from the hash object to the PDV (such as in a Hash Object Lookup). Finally, some methods do not transfer data at all.
data _null_;
   declare hash h();
   h.defineKey('k');
   h.defineData('d');
   h.defineDone();

   k=1; d=2; h.add();   /* PDV --> Hash Object */
   k=2; d=4; h.add();   /* PDV --> Hash Object */

   k=1; rc=h.find();    /* Hash Object --> PDV */

   put k= +2 d=;
run;
You should familiarize yourself with the different types of hash object methods available. In the code above, I use the Add() Method to add the current values of the host variables in the PDV to the hash object. Furthermore, I use the Find() Method to search for the current PDV host variable value (k=1) in the hash object. If the value exists, the associated data values are transferred into the PDV host variables. I have assembled a list of hash object methods and how they should be called in the blog post Assigned Vs Unassigned Hash Object Method Call in SAS.
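The third kind of method, the one that transfers no data, is easiest to see next to Find(). Below is a minimal sketch using the Check() Method, which only tests whether the current key value exists in the hash object. The variable names and values are just illustrative, and the explicit reset of d to missing is my own addition to make the difference visible in the log.

```sas
data _null_;
   length k d 8;
   declare hash h();
   h.defineKey('k');
   h.defineData('d');
   h.defineDone();

   k=1; d=2; h.add();      /* PDV --> Hash Object */

   k=1; d=.;               /* reset d so any transfer is visible */

   rc=h.check();           /* no transfer: only tests whether k=1 exists */
   put 'After check: ' rc= d=;

   rc=h.find();            /* Hash Object --> PDV: d is overwritten */
   put 'After find:  ' rc= d=;
run;
```

Both calls return rc=0 because the key exists, but only Find() changes the value of d in the PDV.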
3. The Hash Function
If you want to understand why hash objects are fast, you have to learn about hash functions. A hash function takes an input and produces a fixed-length series of bytes. The same input always yields the same output, and different inputs almost always yield different outputs. In a hash object context, this series of bytes is converted into a number that points to an AVL tree. Then, when a hash search is performed, SAS uses the same hash function to direct attention only to the relevant search tree. That is the reason that the hash search algorithm is so fast and scales well. I compare the hash search to the index search in the blog post Comparing SAS Hash Object And Index Search Algorithm.
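data _null_;
   string="Hash This!";
   one=md5(string);      /* 16-byte digest */
   two=sha256(string);   /* 32-byte digest */
   /* two hex characters per byte, so SHA256 needs a width of 64 */
   put one $hex32. / two $hex64.;
run;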
Run the code above and check the log. That is what a hexadecimal representation of a hash function's output looks like. I have previously written an entire post on the topic in MD5 and SHA256 Hash Function Example.
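To make the step from "series of bytes" to "a number that points to a search tree" a bit more concrete, here is a rough sketch. It is not how SAS implements the hash object internally. I simply assume a hypothetical bucket count of 2**8, read the first four bytes of an MD5 digest as an unsigned integer, and take the remainder. The keys are made up for illustration.

```sas
data _null_;
   length key $20 digest $16;
   nbuckets = 2**8;                  /* hypothetical number of buckets */

   do key = 'alpha', 'beta', 'gamma', 'alpha';
      digest = md5(strip(key));      /* fixed-length series of bytes */
      /* read the first 4 bytes as an unsigned integer, map to a bucket */
      bucket = mod(input(substr(digest, 1, 4), pib4.), nbuckets);
      put key= bucket=;
   end;
run;
```

The point to notice in the log is that 'alpha' lands in the same bucket both times. A search for a key only has to look in the one bucket the hash function points to, no matter how many keys are stored in total.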
4. An In-Memory Structure
The hash object resides in memory. That is (partly) why it is so efficient. However, this also means that the amount of data you can store in a SAS hash object is no longer bounded by disk space. Instead, it is limited by the memory available to your SAS session.
This means that you have to be careful about the data you put into a hash object. Read in only the data that is absolutely necessary. I have previously written about the memory consumption of a hash object in the blog post How Much Memory Does SAS Hash Object Occupy?
Furthermore, I present a few techniques to limit the hash object size in the blog posts Three Basic Techniques to Reduce SAS Hash Object Size and Two Advanced Techniques to Reduce SAS Hash Object Size.
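As a small illustration of reading in only what you need, the sketch below loads just two variables from sashelp.class with a keep= data set option and then makes a rough size estimate from the item_size and num_items attributes. The choice of data set and variables is mine, and the product is only an approximation of the memory the hash object actually occupies.

```sas
data _null_;
   /* define host variables in the PDV without reading any data */
   if 0 then set sashelp.class(keep=name age);

   /* load only the key and data variables that are needed */
   declare hash h(dataset: 'sashelp.class(keep=name age)');
   h.defineKey('name');
   h.defineData('age');
   h.defineDone();

   items = h.num_items;    /* number of items stored */
   size  = h.item_size;    /* size of one item in bytes */
   approx_bytes = items * size;

   put items= size= approx_bytes=;
   stop;
run;
```

Adding more variables to defineData() increases item_size, so running the step with and without an extra variable is a quick way to see what each one costs.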
5. The Ability to Grow and Shrink at Run Time
The hash object is the only data structure in SAS that can grow and shrink dynamically at run time. Unlike a SAS array, we do not have to specify the number of entries or the amount of memory at compile time. This is quite extraordinary and enables some very powerful applications, some of which I present in the blog post The SAS Hash Object as a Dynamic Placeholder.
As a small teaser, consider the code below.
data _null_;
   declare hash h();
   h.defineKey('k');
   h.defineData('d');
   h.defineDone();

   /* grow: add ten items one at a time */
   do k=1 to 10;
      d=k*2;
      h.add();
      num=h.num_items;
      put num=;
   end;

   /* shrink: remove the items again, one at a time */
   do k=10 to 1 by -1;
      h.remove();
      num=h.num_items;
      put num=;
   end;
run;
Here, I set up a hash object h. Then I fill it up with items one at a time and put the number of items in the SAS log. Next, I remove the items one at a time and log the count again. As you can see, I can add and remove items from the hash object directly while the program executes. Quite cool.
In this post, I have presented five tips to learn and understand the SAS hash object. The hash object can be quite hard to grasp at first. Also, the syntax does not look like ordinary data step programming. But once you get the hang of it, it unlocks a world of opportunities not offered by any other programming facility in SAS.
Did I miss something? Feel free to reach out if you have some learning materials or tips on the hash object that I missed here.
You can download the entire code from this post here.