Check If Two Data Sets Are Identical

It is often convenient for the SAS Data Scientist to compare data sets and check whether two data sets are identical. For this, PROC COMPARE is invaluable. The procedure compares two data sets and reports any differences between them. It also lets you check whether two data sets are exactly identical, which is important if you move data sets between servers or libraries and want to keep exact copies of your production data sets. This example page demonstrates how to check if two data sets are exactly identical.

SAS PROC COMPARE Example

First, let us create an exact copy of the sashelp.class data set in the work library. To do this, I use PROC COPY as demonstrated in Copy And Move Data Sets Between Libraries.

proc copy in=sashelp out=work memtype=data;
   select class;
run;

Next, I use PROC COMPARE and specify sashelp.class as the base data set and work.class as the compare data set.

proc compare base=sashelp.class compare=work.class;
run;

To the right, you can see the procedure output. The output contains two parts. At the top is a high-level part with a short summary of the two data sets. Below it is an observation summary with the number of observations read from each data set and the number that compared equal and unequal. At the bottom, a note confirms that the two data sets are exactly identical.

The SYSINFO Macro Variable

The equality of the compared data sets can also be verified with the automatic macro variable SYSINFO, which PROC COMPARE sets as a return code. The macro variable has the value zero if the two data sets are exactly identical. We can check the value with the following data step.

data _null_;
   if &sysinfo.=0 then put "The two data sets are identical";
   else put "The two data sets are not identical";
run;
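The same check can also be done in macro code. SAS resets SYSINFO at each step boundary, so it is safest to copy the value into a macro variable of your own right after the PROC COMPARE step. The following is only a minimal sketch: the macro name check_identical, its parameters, and the local variable rc are hypothetical names chosen for illustration.

```sas
%macro check_identical(base=, compare=);
   /* NOPRINT suppresses the printed report; SYSINFO is still set. */
   proc compare base=&base compare=&compare noprint;
   run;

   /* Save the return code right away, before another step can reset it. */
   %local rc;
   %let rc = &sysinfo;

   %if &rc = 0 %then %put NOTE: &base and &compare are identical.;
   %else %put WARNING: &base and &compare differ (SYSINFO=&rc).;
%mend check_identical;

%check_identical(base=sashelp.class, compare=work.class);
```

Wrapping the step in a macro keeps the PROC COMPARE call and the test of its return code together, so no other step can slip in between them.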

However, keep in mind that SYSINFO is equal to zero only when the two data sets are exactly identical. This means that no differences at all exist between them. Not even labels or formats are allowed to differ. If we modify the work.class data set by giving a variable a label that differs from the original data set, the value changes.

proc datasets lib=work;
   modify class;
   label height="Student Height";
run;
quit;
 
proc compare base=sashelp.class compare=work.class;
run;
 
%put &sysinfo.;

We can see in the log that the SYSINFO value is now 32. You can see what the different values of the macro variable mean in the Macro Return Codes (SYSINFO) part of the documentation. Exact comparison may not be what you are looking for when you compare data sets. I have previously written a blog post about how to tweak the COMPARE Procedure: How Identical Is Identical in PROC COMPARE?
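The SYSINFO value is a sum of individual condition codes, so several kinds of differences can be reported at once. Each code can be tested with the bitwise BAND function. The sketch below checks only a handful of the documented codes (8 for a different format, 16 for a different length, 32 for a different label, 4096 for an unequal value); the macro variable rc and the message texts are my own illustrative choices.

```sas
proc compare base=sashelp.class compare=work.class noprint;
run;

/* Copy the return code before the next step boundary resets SYSINFO. */
%let rc = &sysinfo;

data _null_;
   rc = &rc;
   /* BAND performs a bitwise AND, so each test isolates one condition. */
   if rc = 0         then put "No differences were found";
   if band(rc, 8)    then put "A variable has a different format";
   if band(rc, 16)   then put "A variable has a different length";
   if band(rc, 32)   then put "A variable has a different label";
   if band(rc, 4096) then put "A value comparison was unequal";
run;
```

With the modified label from the example above, I would expect only the label message to appear in the log.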

Summary

The COMPARE Procedure is an extremely important tool for the SAS Data Scientist. If you are not yet familiar with it, I highly encourage you to browse the documentation. The article DARE TO COMPARE is also a great read on how PROC COMPARE can be used to check similarities and differences between data sets.

You can download the entire code from this example here.