It is a commons task in SAS to sort data. Either ascending or descending for one or multiple key variables. Most SAS programmers are comfortable with PROC SORT to sort their data. However, if we want to sort data according to some custom rule, things get a little more complicated. Say, we want to sort a list of key values 1, 2, 3 with 2 first, then 1, then 3. Or Sort a list of countries in descending order, but always keep the United States and China at the top. Then things get a little more complicated. Luckily, it is not as complicated as it may sound. In this post, we will investigate how to achieve this with the SQL Procedure and PROC FORMAT.
In the examples to come, I will use the two example data sets below.
data numbers; input x @@; datalines; 1 3 2 2 3 4 2 3 1 4 ; data country; input country $ 1-25; datalines; Denmark United States of America Germany France Canada China Russia Mexico ;
A PROC FORMAT Approach
When this sort of problem arises in the SAS Online Community or at SAS-L a popular approach is this. Write a custom format or informat that maps the values to ordered numbers. Use this to map each value in the data and sort the data by the mapped value.
proc format; invalue $ c 'United States of America' = 1 'China' = 2 other = 3 ; run; data temp / view=temp; set country; s = input (country, $c.); run; proc sort data=temp out=want (drop=s); by s; run;
Using PROC SQL Order By Clause
Next, let us see how to utilize the flexibility of the Order By Clause in PROC SQL to create custom sort orders. The flexibility lies in the fact that the Order By Clause is not restricted to variables or formatted values thereof. It accepts expressions in general. This means that we can use both function logic and conditional logic. Without having to write a preceding format. Consider the example from above. I can produce the same result with a Case-When block in the Order By Clause like this.
proc sql; create table want as select country from country order by case (country) when 'United States of America' then 1 when 'China' then 2 else 3 end ; quit;
Obviously, this is not the only way to do this in the Order By Clause. I can also use the SAS Whichc Function like below. This creates the same data as the two preceding examples.
proc sql; create table want as select country from country order by whichc(country, 'China', 'United States of America') descending; quit;
Naturally, I can do the same thing with the SAS Whichn Function for numeric variables like this. The Whichn/Whichc trick is not very common. Perhaps because the logic is not as straight forward as the conditional pendant above. However, remember that they return the index of the first matching value in the list. Consequently, in the code below, the function returns 1 for x=2, 2 for x=1 and so on.
proc sql; create table want as select x from numbers order by whichn(x, 2, 1, 3, 4); quit;
In this post, I have demonstrated examples of how to create custom sort orders in SAS data sets with PROC FORMAT and PROC SQL. It seems that the Order by Clause in the SQL Procedure is all we need to create the exact sorting logic we want. Without the preceding format, even though this approach is quite common. This is because it accepts general expressions in form of conditional and function logic directly. This fact makes it extremely flexible and able to create advanced order logic easier than any other tool in SAS. As a side note, PROC SQL also raises the Sorted and Validated Flag. Next week, I demonstrate 4 Ways to Sort a SAS Data Set in Random Order.
You can download the entire code from this example here.