Split Dataset By Group in SAS

When you have a SAS data set with multiple levels of a class variable, you may want to split up that dataset by group. This is fairly easy if you know the different levels of the class variable. For example, I know the famous sashelp.iris dataset well. So I know that the variable species has exactly three levels, namely Setosa, Versicolor and Virginica. This means that I can simply use a data step, create a dataset for each of the three species, and use if-then logic to output to the relevant datasets. This is done in the example below.

data Setosa Versicolor Virginica;
	set sashelp.iris;
	if Species = 'Setosa' then output Setosa;
	else if Species = 'Versicolor' then output Versicolor;
	else if Species = 'Virginica' then output Virginica;
Hash Object By Group Example

But what if I do not know the levels of the class variable, i.e. you do not know the different by groups? Then the above approach is no good because you do not know how many different datasets to create or the names of the different levels of the variable. Consequently, we have to use other methods. My preferred method is using the Hash Object as in the example below.

/* Sort data */
proc sort data = sashelp.iris out=SortIris;                     /* Sort data by split-up variable       */
   by species;
/* Create a SAS dataset for each unique appearance of the input variable */
data _null_;
   if _n_=1 then do;                                            /* Need only declare Hash Object once   */
      if 0 then set SortIris;                                   /* Populate PDV                         */
      declare hash h(dataset:"SortIris(obs=0)", multidata:'y'); /* Declare Hash Object                  */
      h.definekey(all:'y');                                     /* Define Key                           */
      h.definedata(all:'y');                                    /* Define Data                          */
      h.definedone();                                           /* Define Done                          */
   do until(last.species);                                     
      set SortIris;                                             /* Read in variables from by group      */
      by species;
      h.add();                                                  /* Add to Hash Object                   */
   h.output(dataset:species);                                   /* Output from Hash Object to data set  */
   h.clear();                                                   /* Clear Hash Object                    */

The example above shows how this can be done with a Hash Object approach. The method goes as follows:

  • Sort the data by the class variable
  • In the first iteration of the data step, declare hash object
  • For each by group, add all observations to the Hash Object.
  • Finally, output from the Hash Object to a data set with the same name as the relevant level of the class variable and clear it again.

An alternative SQL solution to this problem is presented by Chris Hemedinger at The SAS Dummy.

Do you know any alternatives to this approach or do you have questions? Contact me through the Contact Form.