Last week, I demonstrated 4 Techniques to Sort an Array in SAS. One of the four was the qsort macro, which invokes the quicksort algorithm. Today, I will take a closer look at the quicksort algorithm in SAS. I will go through the basic theory. Next, I will show you how to implement a simple quicksort in SAS with Proc Fcmp. Finally, I will compare the basic quicksort to the qsort macro.

Quicksort Algorithm – The Basics

First, let us see how the quicksort algorithm works by a simple example. Consider the table below as an array. Suppose we want to sort it ascending. The first part of the algorithm is choosing the pivot. There are many ways to do this. For simplicity, let us choose the right-most element. I highlight the pivot in red below. Now, I want to insert the pivot in the correct position in the final array. That means that all elements to the left of the pivot should be less than or equal to the pivot value and all the elements to the right must be greater than or equal to it. I’ll get back to the specifics of how we do this in the SAS code later.

81047251936

Now, the number 6 is in the right position and will stay there. Next, I have two partitions of the original array to consider. First, I focus on the left part of it. Again, I choose the right-most element to be the pivot and arrange the values in the partition so that the pivot can be inserted in the correct position.

42513679108

Again, the same thing happens. We focus on the lower partition, choose the pivot and insert it in the correct position.

21345679108

Now, there is no more lower partitions to consider, so we take the upper partition and perform the same operation.

12345679108

We have now successfully sorted the lower partition of the original SAS array. From here, we go all the way back and consider the initial upper partition.

12345679108

The quicksort algorithm is popular for a number of reasons. First of all, it sorts in-place. Meaning that we do not need to allocate some extra buffer to hold anything temporarily. We simply move elements around. Secondly, it scales well. In big O notation it scales in O(n log(n)) time, which is the best possible scenario for a comparison sort. Finally, it is simple. You do not need to know much about algorithms to get a good understanding of it.

Obviously, I only scratch the surface of the theory here. There are many ways to optimize the algorithm by choosing the pivot differently and so on. If you want to learn more about the quicksort algorithm, see the explanation at geeksforgeeks. Also, read the discussions about Quicksort Vs. Mergesort and Choosing the Pivot at StackOverflow.

A SAS Proc Fcmp Implementation

The recursive logic of the example above calls for recursive SAS code. Luckily, a Subroutine written in Proc Fcmp can call itself. There are many ways to implement the quicksort algorithm in recursive code. However, I will go with the most common approach you will find online.

Below, I write two SAS subroutines. The partition subroutine does the actual sorting of the considered partition. First, we select the pivot as the right-most element. Next, I initialize two index variables i and j and let them both run from right to left. As a side note, this differs from the approach you will see in the paper at the bottom with the qsort macro. I let j increment all the way up to the element before the pivot. However, i only increments when j is less than the pivot. In that case, I swap the i’th and j’th element. Because I know that this will eventually separate all the values less than and greater than the pivot value. Finally, I can insert the pivot into the i+1’th place and the partition is done.

The second SAS routine is the recursive part. Here, I call the partition subroutine if low is less than high. I do this to ensure that we only call the partition routine if the partition has more than one element. If it holds only one value, we know that it is correctly placed.

proc fcmp outlib=work.f.f;
 
   subroutine partition (low, high, i, arr[*]);
      outargs low, high, i, arr;
 
      pivot = arr [high];
 
      *put / "Pivot: " pivot;;
      *put "Before Partition: " arr;
 
      i = low-1;
 
      do j = low to high-1;
         if arr [j] < pivot then do;
            i       = i+1;
            t       = arr [i];
            arr [i] = arr [j];
            arr [j] = t;
         end;
      end;
 
      t          = arr[i+1];
      arr [i+1]  = arr[high];
      arr [high] = t;
 
      i=i+1;
 
     *put "After Partition:  " arr /;
 
   endsub;
 
   subroutine quicksort (low, high, arr[*]);
      outargs arr;
 
      if low < high then do;
         call partition (low, high, i, arr);
 
         call quicksort (low, i-1,  arr);
         call quicksort (i+1, high, arr);
      end;
   endsub;
 
quit;

Next, let us see the quicksort SAS routine at work. I store and use the subroutine above. Then, I create the array x, which is identical to the one in the example at the top. Finally, I call the subroutine.

options cmplib=(work.f);
data _null_;
   array x {10} _temporary_ (8 10 4 7 2 5 1  9 3 6);
 
   call quicksort(lbound(x), hbound(x), x);
run;

If you comment out the Put Statements in Proc Fcmp, you can see each step in the SAS log. This is identical to the steps in the example above.

Pivot:  6
Before Partition:  8 10 4 7 2 5 1 9 3 6
After Partition:   4 2 5 1 3 6 7 9 10 8
 
Pivot:  3
Before Partition:  4 2 5 1 3 6 7 9 10 8
After Partition:   2 1 3 4 5 6 7 9 10 8
 
Pivot:  1
Before Partition:  2 1 3 4 5 6 7 9 10 8
After Partition:   1 2 3 4 5 6 7 9 10 8
 
Pivot:  5
Before Partition:  1 2 3 4 5 6 7 9 10 8
After Partition:   1 2 3 4 5 6 7 9 10 8
 
Pivot:  8
Before Partition:  1 2 3 4 5 6 7 9 10 8
After Partition:   1 2 3 4 5 6 7 8 10 9
 
Pivot:  9
Before Partition:  1 2 3 4 5 6 7 8 10 9
After Partition:   1 2 3 4 5 6 7 8 9 10

Comparing to the Qsort SAS Macro

In the SAS code above and in the algorithm, I consider only simplicity. Not performance. Let us compare the subroutine to the Qsort macro from the article at the bottom. First, I create a large, temporary array. Next, I fill it with random values and sort it. First, I do so with the Call Routine from above. Next, I do it with the Qsort macro.

options cmplib=(work.f);
 
data _null_;
   array x {30000000} _temporary_;
 
   do _N_=1 to dim(x);
      x [_N_] = rand('integer', 1, 100000);
   end;
 
   t = time();
   call quicksort (lbound(x), hbound(x), x);
   _t = time()-t;
 
   put _t=;
run;
 
data _null_;
   array x {30000000} _temporary_;
 
   do _N_=1 to dim(x);
      x [_N_] = rand('integer', 1, 100000);
   end;
 
   t = time();
   %qsort (Arr=x);
   _t = time()-t;
 
   put _t=;
run;

The log reveals that the Qsort macro runs about 3 times faster than the call routine. There are a few reasons for this. First of all the pivot is chosen carefully, so that there is a high probability of getting even partitions. Secondly, the Qsort macro is actually not a pure quicksort algorithm. If the partition holds less than 9 elements, the macro invokes an insertion sort for the partition instead. Which is faster than a quicksort for small partitions. Finally, the macro is absolutely beautifully written and way better tested than the simple approach of mine above.

Summary

In this post, we consider the quicksort algorithm from a SAS point of view. I briefly go through the basic theory of the algorithm. Next, I demonstrate a simple quicksort implementation in the FCMP Procedure and replicate the example in SAS. Finally, I compare the performance to the qsort macro. It turns out, that the qsort macro is quite fast. There is plenty of more material and good discussions on the quicksort macro out there. I have linked to a few StackOverflow threads that are worth reading.

You can read the article supporting the Qsort Macro by Paul Dorfman here QuickSorting An Array.

You can download the entire code from this post here.