When you have a SAS data set with multiple levels of a class variable, you may want to split up that dataset by group. This is fairly easy if you know the different levels of the class variable. However, if you do not, it becomes more complicated. However, not so complicated that a little hash object wisdom can’t do the trick. This post shows you a simple example of how.
I know the famous sashelp.iris dataset well. So I know that the variable species has exactly three levels, namely Setosa, Versicolor and Virginica. This means that I can simply use a SAS data step, create a dataset for each of the three species, and use if-then logic to output to the relevant datasets. This is done in the example below.
data Setosa Versicolor Virginica; set sashelp.iris; if Species = 'Setosa' then output Setosa; else if Species = 'Versicolor' then output Versicolor; else if Species = 'Virginica' then output Virginica; run;
Hash Object By Group Example
But what if I do not know the levels of the class variable, i.e. you do not know the different by-groups? Then the above approach is no good because you do not know how many different datasets to create or the names of the different levels of the variable. Consequently, we have to use other methods. My preferred method is using the Hash Object as in the example below.
/* Sort data */ proc sort data = sashelp.iris out=SortIris; /* Sort data by split-up variable */ by species; run; /* Create a SAS dataset for each unique appearance of the input variable */ data _null_; if _n_=1 then do; if 0 then set SortIris; /* Populate PDV */ declare hash h(dataset:"SortIris(obs=0)", multidata:'y'); /* Declare Hash Object */ h.definekey(all:'y'); /* Define Key */ h.definedata(all:'y'); /* Define Data */ h.definedone(); /* Define Done */ end; do until(last.species); set SortIris; /* Read in variables from by group */ by species; h.add(); /* Add to Hash Object */ end; h.output(dataset:species); /* Output from Hash Object to data set */ h.clear(); /* Clear Hash Object */ run;
The example above shows how this can be done with a Hash Object approach. The method goes as follows:
- Sort the data by the class variable
- In the first iteration of the data step, declare the object
- For each by group, add all observations to the Hash Object inside a DoW Loop.
- Finally, output from the Object to a data set with the same name as the relevant level of the class variable and clear it again.
An alternative SQL solution to this problem is presented by Chris Hemedinger at The SAS Dummy. Also, I demonstrate a more advanced, but more flexible approach in the blog post A SAS Hash Object Of Hash Objects (Hash Of Hash).
This is just one of many useful applications of the SAS object. See the following related posts for more inspiration: SAS Hash Object Lookup Example, SAS Do_Over Method Example and Run Time Effect Of Hash Object HASHEXP Argument Size.
Do you know any alternatives to this approach or do you have questions? Contact me through the Contact Form.