In the blog posts 8 Array Functions You Should Know and Search for Value in Numeric or Character Array, I introduce various functions to search an array. However, if an array is sorted, none of them exploit this information. In this post, I will demonstrate how to perform a binary search of an array in SAS.

In the examples to come, I will not elaborate much on the theory or code. Instead, I will leave it for the reader to explore. The code in this post is largely based on the article Array Lookup Techniques by Paul Dorfman.

A Simple Binary Search of SAS Array

First, let us see a classic example of how to do a binary search. In the snippet below, the two arrays k and d are sorted ascending by the values in k. Suppose, we want to search for the key value 3. First, I set l and h to the lower and upper bound of the array respectively. In this case, that means l=1 and h=10. Now, we perform the search. The search continues until l and h crosses each other or we find the key value.

We set m to the mid-point between l and h. If the desired key is less than the midpoint of k, we consider only the lower part of the remaining array. If the desired key is greater than the midpoint of k, we turn our attention to the upper part. Suppose none of these criteria are true, then key = k[m] and we have a match. In that case, we read the m’th entry of d and terminate the search successfully.

data _null_;
   array k {10} _temporary_     ( 1    2    3    4    5    6    7    8    9    10);
   array d {10} $ 1 _temporary_ ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j');
 
   key = 3;
 
   l = lbound(k);
   h = hbound(k);
 
   do until (l > h);
      m = floor((l + h) * .5);
      if      key < k[m] then h = m - 1; else if key > k[m] then l = m + 1;
      else do;
         data = d[m];
         leave;
      end;
   end;
 
   put key = / data =;
run;

Sequential Search in a SAS Array

Let us take a step backwards. Let us perform a manual sequential search of an array in SAS. Instead, this time we have 10 Million entries instead of 10 in k and d. The arrays are still sorted ascending by k. Let us search for the key value 654321. Simply by starting with entry 1. Check if k[1] = 654321. If not, then consider k[2] and so on. To make the results comparable as we proceed, I do the search 1000 times.

The search for key = 654321 is successful and takes about 70 sec on my system.

%let n = 10000000;
 
/* A matter of speed */
data _null_;
   array k {&n.} _temporary_;
   array d {&n.} _temporary_;
   call streaminit(123);
 
   do x = 1 to &n.;
      k [x] = x;
      d [x] = rand('integer', 1, 1e4);
   end;
 
   key = 654321;
 
   /* sequential search */
   t = time();
   do _N_ = 1 to 10000;
      do x = lbound(k) to hbound(k);
         if k[x] = key then do;
            data = d[x];
            leave;
         end;
      end;
   end;
   et = time() - t;
   put et=;
 
   put key = / data =;
 
run;

However, with a little rewriting, we can perform a quicker sequential search. The search below takes about 60 sec.

data _null_;
   array k {%sysevalf(&n. + 1)} _temporary_;
   array d {&n.}                _temporary_;
   call streaminit(123);
 
   do x = 1 to &n.;
      k [x] = x;
      d [x] = rand('integer', 1, 1e4);
   end;
 
   key = 654321;
 
   /* sequential search */
   t = time();
   do _N_ = 1 to 10000;
      k [hbound(k)] = key;
      x = lbound(k) + 1;
      do until (k[x] = key);
         x ++ 1;
      end;
      data = d[x];
   end;
   et = time() - t;
   put et=;
 
   put key = / data =;
 
run;

Binary Search in a SAS Array

Now, we have a benchmark in terms of run-time from the sequential searches above. Next, let us turn to the binary search. In the code below, we create the same arrays and search for the same key value. The code is implemented almost exactly as the introductory example. This code runs in about 1 second on my system. Quite the time saver! However, this is not surprising, since the search requires much fewer comparisons than the sequential search.

data _null_;
   array k {&n.} _temporary_;
   array d {&n.} _temporary_;
   call streaminit(123);
 
   do x = 1 to &n.;
      k [x] = x;
      d [x] = rand('integer', 1, 1e4);
   end;
 
   key = 654321;
 
   t = time();
   do _N_ = 1 to 10000;
      l = lbound(k);
      h = hbound(k);
      x = .;
      do until (l > h);
         m = floor((l + h) * .5);
         if      key < k[m] then h = m - 1; else if key > k[m] then l = m + 1;
         else do;
            data = d[m];
            leave;
         end;
      end;
   end;
   et = time() - t;
   put et=;
 
   put key = / data =;
 
run;

Let us see if we can do even better than above. The midpoints of interest are pre-determined from the size of the array. Therefore, we can compute them before any search takes place. We do so with a fixed sized array b with 50 entries. 50 is enough for practically all purposes. Now, instead of computing the midpoints during the search, we simply read them from the array. This process is called a uniform binary search. The code snippet below takes about half a second to run on my system.

data _null_;
   array k {0 : &n.} _temporary_;
   array d {    &n.} _temporary_;
   call streaminit(123);
 
   do x = 1 to &n.;
      k [x] = x;
      d [x] = rand('integer', 1, 1e4);
   end;
 
   key = 654321;
 
   array b {50} ;
   diff = dim(k) - 1;
   do p = 1 to log2(&n.);
     diff = diff * .5; 
     b[p] = floor(diff + .5);
   end;
 
   k[0] = .;
 
   t = time();
   do _N_ = 1 to 10000;
      x    = .;
      p    = 1;
      m    = b[1];
      diff = b[1];
      do while (diff > 0);
         p + 1;
         diff = b[p];
         if      key < k[m] then m +- diff; else if key > k[m] then m +  diff;
 
         else do;
            data = d[m];
            leave;
         end;
 
      end;
   end;
   et = time() - t;
   put et=;
 
   put key = / data =;
 
run;

All the examples above rely on the fact that the arrays are sorted in ascending order. Read the blog posts 4 Techniques To Sort An Array in SAS and A Closer Look at the Quicksort Algorithm in SAS to learn more about sorting arrays.

Summary

In this post, we explore how to perform a binary search of an array in SAS. We learn that the binary search is far superior to the sequential search if the array is sorted ascending. Throughout the post, explanations are sparse. However a quick Google Search will teach you more about the fundamentals of binary searching. Rather, this post is meant as a template of how to implement it in SAS. I highly encourage you to read the article at the top. And to edit and play around with the code examples above.

In a future blog post, I will demonstrate how to Use Binary Search of Array in SAS Table Lookup.

You can download the entire code from this post here.