When you want to sort a SAS data set, PROC SORT is the obvious procedure choice. In most cases it is also the most efficient. However, alternatives do exist. In this post, I will present example code of three alternatives to the SORT Procedure. Furthermore, I will discuss when to use alternative approaches.
In the three examples to come, I will do the exact same sort operation using the sashelp.class example data set as the Sort Procedure
proc sort data=sashelp.class out=ProcSort; by name; run;
PROC SQL Order By Clause
The obvious alternative to PROC SORT is the Order By Clause in PROC SQL. Simply specify the variable by which you want to sort the data set in the Order By Clause.
proc sql noprint; create table SortSQL as select * from sashelp.class order by name; quit;
The Order By Clause is in most cases as efficient as using a PROC SORT Step. When you have an SQL query and you want to return the processed data set in some sorted order, you should use the Order By Clause. Do not return the data in random order and later use PROC SORT unless you have a very good reason to do so. Remember, you can see what the SQL optimizer does with the _Method Option in PROC SQL. The Order By Clause does have an advantage because it accepts expressions. Not just variables. This means that you can create Custom Sort Orders easily.
Hash Object Ordered: Argument
The next example is using the Ordered: argument in the Declare Hash Statement in a SAS hash object. The Ordered: argument specifies in what order SAS retrieves observations from the hash object based on the key variables. Therefore, we specify the Name variable in the definekey method and ude the output method to create the sorted data set SortHash.
data _null_; if 0 then set sashelp.class; declare hash h(dataset:"sashelp.class", multidata:"y", ordered:"y"); h.definekey("name"); h.definedata(all:"y"); h.definedone(); h.output(dataset:"SortHash"); run;
Using a hash table may seem a bit complicated, especially when looking at all the messy code. However, it can be useful if you need to both sort data and process it in a data step. Since a hash object can only reside in memory, the sorted data is read and loaded in memory once. While this seems memory efficient, it does restrict the data set to be able to fit in memory.
SAS Index By Group Processing
Lastly, let us look at how we can use a SAS index to return a data set in sorted order. First, I simply copy the sashelp.class data set into the Work Folder. I can not add an index to the Sashelp data sets. Then I use PROC DATASETS to add a simple index on the variable by which I want to sort the data, in this case, Name. Next, I can create a sorted data set by using the simple index by using a BY Statement in the last data step.
proc copy in=sashelp out=work memtype=data; select class; run; proc datasets library=work nolist; modify class; index delete _all_; index create name; run;quit; data SortIndex; set class; by name; run;
If your only goal is to sort a data set, it is not your best choice to create an index. However, if you also aim to Increase Performance Using An Index by making a small subset of a large data set, this may be worth looking into. Furthermore, if an appropriate index already exists on the data set, the index sort approach will almost always be the fastest choice. This is due to the fact that the index already holds the variable values in a sorted order in a tree structure.
In this post, I have demonstrated three alternatives to the Sort Procedure. In most cases the Sort Procedure is the best choice, but not always. However, the power of the alternatives depend on the specific problem.
Also keep in mind that PROC SORT and PROC SQL are the only sort options available that sets the Sorted and Validated Flags. You can read about these flags and why they matter in my previous post The Importance of the SAS SORTED and VALIDATED Flag.
You can download the entire code from this post here.