In the blog post Three PROC SORT Options You Should Know, I demonstrate how to use the three options Sortsize, Tagsort and Presorted. Sorting data is usually the most CPU- and time-consuming part of a data flow. Therefore, you should familiarize yourself with PROC SORT and how to tweak it to your advantage. Consequently, I devote a second post to three other options that control the Sort Procedure in SAS, namely the Details, Noequals and Nouniquekey Options.
In the following, I will use the example data below. The data is for demonstration purposes only.
/* Set relevant options */
options fullstimer msglevel=i threads cpucount=4;

/* Create example data */
data MyData(drop=i);
   do i=1 to 5e7;
      ID=rand('integer', 1, 1e4);
      val=rand('integer', 1, 1e5);
      output;
   end;
run;
The Details option writes information about the progress of the sort algorithm to the log, such as:
- Whether the sort completes in memory (an internal sort) or requires a utility file.
- Attributes of the utility file(s) and information about the merge process of utility files.
- Whether SAS uses multithreaded processing.
proc sort data=MyData details; by ID; run;
When you run the Sort Procedure above, you may not get the same results from the Details Option as I do. The sort algorithm depends heavily on various settings, such as allocated virtual memory, allocated RAM and whether threaded processing is permitted.
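To see which settings apply in your own session, you can print the relevant system options to the log before sorting. The snippet below is a small sketch and not part of the original code. The four options listed are my pick of those most likely to influence the sort; other options may matter on your installation.

```sas
/* Print the current values of some options that influence PROC SORT */
proc options option=(sortsize memsize threads cpucount);
run;
```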
Not surprisingly, most programmers mainly use the Details Option as a performance tester or debugging tool. The Details Option is rarely used in production code or batch jobs.
When SAS performs a threaded sort, each thread sorts a portion of the data. Finally, SAS merges the sorted portions back together. By default, the relative order of observations within the same by-group is preserved. This is CPU and memory costly and sometimes unnecessary. If the order within by-groups is not of interest, you can use the Noequals Option to speed things up. Be aware, though, that with the Noequals Option you cannot be sure that each PROC SORT run returns data in the same order.
Needless to say, the Noequals Option has no effect on single-threaded processes.
proc sort data=MyData noequals; by ID; run; /*proc sort data=MyData equals; by ID; run;*/
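One way to check what Noequals does on your own system is to sort the same data twice and compare the results. The sketch below is an illustration and not part of the original code; the output data set names SortedEquals and SortedNoequals are made up for this example. With the fullstimer option set, the log shows the resource usage of each run, and PROC COMPARE reports whether any observations ended up in a different position. Depending on your setup, the two outputs may well be identical.

```sas
/* Sort with the default behavior: order within by-groups is preserved */
proc sort data=MyData out=SortedEquals equals;
   by ID;
run;

/* Sort without the guarantee on within-group order */
proc sort data=MyData out=SortedNoequals noequals;
   by ID;
run;

/* Compare the two results observation by observation.          */
/* ID always matches; differences in val indicate that rows     */
/* within a by-group were returned in a different order.        */
proc compare base=SortedEquals compare=SortedNoequals;
run;
```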
The Nouniquekey Option has the opposite effect of the Nodupkey Option. While the Nodupkey Option removes observations with duplicate by-values and keeps only the first observation of each by-group, the Nouniquekey Option removes any by-group that contains exactly one observation. The effect of the option is best demonstrated with an example. Consider the following small data set.
data MyData;
   input ID $ var;
   datalines;
1 10
1 20
3 10
3 20
3 10
2 30
;
run;
The data above has three distinct IDs (1, 2 and 3). Two of the ID groups have more than one item (ID=1 and ID=3). The group ID=2 has a single item. Now, consider the PROC SORT call.
proc sort data=MyData nouniquekey; by ID; run;
Not surprisingly, the Sort Procedure sorts the data and removes the single-item by-group from the data set. Consequently, the resulting data set has 5 observations and two distinct by-groups. The option is not nearly as frequently used as its counterpart Nodupkey. However, situations may arise where you do not want single-row groups in your data. Here, the Nouniquekey Option is the way to go.
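If you want to keep the removed observations rather than discard them, PROC SORT also has a Uniqueout= option that writes them to a separate data set. The sketch below is an addition to the original example. It assumes that MyData still holds the six observations from the data step above, and the data set names Multiples and Singles are made up for illustration.

```sas
/* Keep by-groups with more than one observation in Multiples    */
/* and write the observations removed by Nouniquekey to Singles. */
proc sort data=MyData out=Multiples nouniquekey uniqueout=Singles;
   by ID;
run;
```

With the small example data, Multiples should contain the five observations with ID=1 and ID=3, and Singles the one observation with ID=2. Because Out= is specified, the input data set MyData is left unchanged.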
In this post, we have seen three small examples of PROC SORT options that may come in handy. Sorting data is necessary but costly. Therefore, it is important to do so properly and without spending unnecessary time or computer resources. I have previously written about Three Alternatives to PROC SORT in SAS, The Importance of the SORTED and VALIDATED Flags and The Difference Between NodupKey and NoDup in PROC SORT.
If you want a comprehensive examination of PROC SORT from a performance perspective, read chapter 4 of High Performance SAS Coding by Christian Graffeuille.
You can download the entire code from this post here.