Using the Modify Statement in the SAS Data Step

In the blog post Using the Update Statement in the SAS Data Step, I demonstrate how to use the UPDATE Statement to update a Master file with a transaction data set in SAS. In this post, I take it a step further and demonstrate by example how to use the MODIFY Statement in the SAS Data Step.

The Modify Statement in the SAS Data Step has multiple applications. I will not cover all of them here. Instead, I will focus on the application that I find most useful. Namely Updating a Master data set with observations from a transaction data set using the Key= Option. As you can see in the Modify Statement Documentation, this is referred to as Form 2. However, I do encourage you to check out Form 1, 3 and 4 as well. I will show an example below.

In the examples to come, I use the following data sets.

data Master;
input ID $ Value;
datalines;
1 1
2 2
3 3
4 4
5 5
;
 
data Trans;
input ID $ Value;
datalines;
1 10
2 20
4 10
4 20
4 30
5 .
6 60
;

Update a Master File with the Modify Statement with the Key= Option

SAS Modify Statement Data Step ExampleWhen we update a Master data set with observations from a transaction data set with the Modify Statement and the Key= Option, the Key= Option points to a SAS Index. Consequently, let us first create a simple index in the Master data set on the ID Variable.

proc datasets library=work nolist;
   modify Master;
      index delete _all_;
      index create ID / nomiss;
run;quit;

Next, the actual update happens. Below, I will go over the main steps in the Modify process

  1. First, I read the Trans data set sequentially with a Set Statement. I rename the Value variable to TransValue to be able to distinguishe it in the further process.
  2. Next, I use the Modify Statement with the Key= Option. I specify the Master data set in the Modify Statement and the simple ID Index in the Key= Option. This means that SAS takes the current ID value read in the Set Statement and searches for it with a binary index search in the Master data set.  Consequently, non of the input data sets need to be sorted. The index takes care of it. This is in contrast to Update, which must be followed by a By Statement and where both input data sets must be sorted by the appropriate variables. Finally, I use the Unique Option to handle duplicates in the transaction data set.
  3. Next, I use a Select-When Statement to conditionally process the automatic _IORC_ Variable. The _IORC_ Variable is a numeric variable that indicates the status of the I/O operation. Now, to make things even more complicated, the _IORC_ values changes as SAS version changes. Therefore, we check the mnemonic values of _IORC_ with the %SYSRC Macro.
  4. The mnemonic value _sok means that there is a match. In this case, I check if the value in the Transaction data set is missing, since I do not want to overwrite with missing data in the Master data set. If it is not missing, I use the Replace Statement to overwrite the current value in the Master data set with the transactions value. Most programmers do not know about the Replace Statement. You should only use it when you use the Modify. Check the documentation.
  5. Next, we handle the case of a non-match. This has the mnemonic value _dsenom. In this case, when the transaction key is not already in the Master data set, I simply add it with the Output Statement.
data Master;
   set Trans(rename=(Value=TransValue));     /* 1 */
   modify Master key=ID / unique;            /* 2 */
   select (_iorc_);                          /* 3 */
      when (%sysrc(_sok)) do;                /* 4 */
         if TransValue ne . then do;
            Value=TransValue;
            replace;
         end;
      end;
      when (%sysrc(_dsenom)) do;             /* 5 */
         Value=TransValue;
         output;
         _error_=0;
      end;
      otherwise do;
         put 'Unexpected I/O Error: ' errormessage;
         _error_=0;
         /*stop;*/
       end;
   end;
run;

It is not hard to see that the Modify Statement is very flexible. The conditional processing of the _IORC_ variable lets you take any action required based on whether there is a match or not. Furthermore, it is more efficient than the Update Statement, given that the two input data sets are not already sorted. The index search ensures that no sort needs to take place.

Summary

In this post, I introduce the Modify Statement with the Key= Option. This approach lets you flawlessly update Master data sets with transactions. The flexibility of the Modify Statement lets you take any action you want based on whether an index search succeeds or not. This post only covers a small part of what the Modify Statement can do. For a nice introduction to the Modify Statement and the pros and cons of it, see the article MODIFY: The Most Under-Appreciated of the Data Step File Handling Statements.

If you want to learn more about the Key= Option in the Modify Statement, read chapter 12 in Michael Raithel great book The Complete Guide to SAS Indexes.

You can download the entire code  with examples from this post here.