In the blog post Create Custom Sort Order in SAS, I demonstrate how to sort data by custom logic rather than simply in ascending or descending order. Today, I will use some of that knowledge to demonstrate different approaches to randomizing a data set in SAS, in other words, sorting the data in random order. I will demonstrate four approaches: Proc Sort, Proc SQL, Proc Surveyselect and a Data Step method.

In the examples to come, I will use the sashelp.class data set for demonstration.

Proc Sort

First, let us see how to use Proc Sort. I cannot do this in Proc Sort directly, because the By statement accepts only variables. I have to do some preparation first. Therefore, I create a SAS view classv. In it, I generate a uniform random number s. Next, in the Proc Sort step, I simply sort by s and drop it again. This approach is quite common. It is easy to understand and fairly efficient.

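/* The name in view= must match a data set name in the Data statement */
data classv / view=classv;
   set sashelp.class;
   s = rand('uniform');   /* random sort key, uniform on (0,1) */
run;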
 
proc sort data=classv out=want(drop=s);
   by s;
run;
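
If the random order has to be reproducible between runs, one option is to set a seed with Call Streaminit before the first call to Rand. The following is a minimal sketch of that idea. The seed value 123 and the view name classv_seeded are arbitrary choices for illustration.

```sas
/* Sketch: random sort with a fixed seed so the order can be reproduced. */
/* The seed 123 and the name classv_seeded are arbitrary assumptions.    */
data classv_seeded / view=classv_seeded;
   set sashelp.class;
   if _n_ = 1 then call streaminit(123);   /* seed the stream before the first rand call */
   s = rand('uniform');
run;

proc sort data=classv_seeded out=want(drop=s);
   by s;
run;
```

Running the two steps again with the same seed should give the same order. Changing the seed gives a different one.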

Proc SQL

Next, let us look at the Proc SQL approach. The SQL Procedure lets me create a random sort order in a single step. This is because the Order By clause accepts general expressions, not just variables as the By statement in Proc Sort does. Consequently, I can generate a random number directly in the Order By clause and order by that. It is very simple and easy to understand.

proc sql;
   create table want as
   select * from sashelp.class
   order by rand('uniform');
quit;
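
A variation on the same idea is to give the random number a name in the Select clause, order by that column, and drop it from the output table. This is only a sketch. The column name s is an arbitrary choice, and the result is equivalent to the query above. One reason to consider it is that ordering by an expression that is not in the Select clause typically makes SAS write a note about it to the log.

```sas
/* Sketch: name the random key, sort by it, and drop it from the result. */
/* The column name s is an arbitrary assumption.                         */
proc sql;
   create table want(drop=s) as
   select *, rand('uniform') as s
   from sashelp.class
   order by s;
quit;
```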

Proc Surveyselect

Next, let us see how to use Proc Surveyselect. Usually, we use the Surveyselect Procedure for random sampling, either with or without replacement. However, we can treat this problem as a random sample without replacement where the sample is the entire SAS data set. The Samprate=1 option in the procedure statement specifies that the sample is the entire data set. The Outrandom option does the rest of the work and returns the data in random order.

proc surveyselect data=sashelp.class out=want noprint outrandom
     method=srs 
     samprate=1;
run;
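
When the same random order is needed on every run, the Seed= option on the procedure statement can be added. Below is a minimal sketch with an arbitrary seed value of 123.

```sas
/* Sketch: same step with a fixed seed. The value 123 is an arbitrary assumption. */
proc surveyselect data=sashelp.class out=want noprint outrandom
     method=srs
     samprate=1
     seed=123;
run;
```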

While the Proc Sort and Proc SQL approaches above consider the entire SAS data set and randomize that, the Surveyselect Procedure can handle more complex cases quite nicely. Consider the code below. Here, I first sort the data by sex so that it is grouped. Next, I use the Strata Sex statement in the Surveyselect Procedure. The result is that the data is still grouped by sex. However, the observations are in random order within each group.

proc sort data=sashelp.class out=class;
   by sex;
run;
 
proc surveyselect data=class out=want noprint outrandom
     method=srs 
     samprate=1;
   strata sex;
run;
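
For comparison, the same grouped result can probably be reached with the Proc Sort technique by putting the group variable ahead of the random key in the By statement. This is a sketch under that assumption, not a replacement for the Surveyselect code. The names classg and s are arbitrary.

```sas
/* Sketch: randomize within groups using a random key and Proc Sort. */
/* The names classg and s are arbitrary assumptions.                 */
data classg / view=classg;
   set sashelp.class;
   s = rand('uniform');   /* random key used to order rows within each group */
run;

proc sort data=classg out=want(drop=s);
   by sex s;              /* group by sex first, then random order inside each group */
run;
```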

Data Step

Finally, let us see how to randomize with the SAS Data Step alone. This can be done, though the three approaches above are probably better suited for the job. In the code below, I use a temporary array s. To begin with, the array holds the observation numbers 1 to 19, and h is the number of observations not yet picked. In each iteration, I generate a random position i between 1 and h, read the observation number stored at that position and output that observation. Then, I overwrite the picked element with the last unpicked element of the array and reduce h by one, so the picked observation number cannot be drawn again. I do so until there are no more observations to pick.

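data want(drop=h i);
   /* s holds the observation numbers not yet picked; sashelp.class has 19 rows */
   array s {19} _temporary_ (1:19);
   h = n;                               /* n is set by nobs= at compile time */
   do _n_ = 1 to 19;
      i = rand("integer", h);           /* random position among the first h elements */
      p = s[i];
      set sashelp.class point=p nobs=n; /* read observation number p directly */
      output;
      s[i] = s[h];                      /* replace the picked number with the last unpicked one */
      h = h - 1;                        /* shrink the active part of the array */
   end;
   stop;                                /* needed with point= so the step ends */
run;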
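
The code above hard-codes the 19 observations in sashelp.class. One way to avoid that, sketched below, is to store the number of observations in a macro variable first and use it for the array dimension. The macro variable name nobs is an arbitrary choice. The sketch assumes the data set has at least one observation and that the SAS release supports the Integer distribution in the Rand function.

```sas
/* Sketch: store the observation count in a macro variable.  */
/* The macro variable name nobs is an arbitrary assumption.  */
data _null_;
   if 0 then set sashelp.class nobs=n;   /* never executes; nobs= is set at compile time */
   call symputx('nobs', n);
   stop;
run;

data want(drop=h i);
   array s {&nobs} _temporary_ (1:&nobs);   /* observation numbers not yet picked */
   h = n;
   do _n_ = 1 to n;
      i = rand('integer', 1, h);            /* random position among the first h elements */
      p = s[i];
      set sashelp.class point=p nobs=n;
      output;
      s[i] = s[h];                          /* replace the picked number with the last unpicked one */
      h = h - 1;
   end;
   stop;
run;
```

On a release without the Integer distribution, ceil(h * rand('uniform')) is a common substitute for the call to Rand.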

Summary

In this post, we have investigated four different ways to sort a SAS data set in random order: Proc Sort, Proc SQL, Proc Surveyselect and the Data Step alone. Some are simpler and more intuitive than others, while Proc Surveyselect handles more complex cases such as randomizing within groups. For a plain shuffle of the whole data set, which one to use is largely a matter of preference.

For related posts, read SAS Data Step Permutations with Randperm and Allperm.

You can download the entire code from this post here.