Load SAS Data Set Into Memory With SASFILE Statement
In the blog post Using the BUFSIZE and BUFNO Options in SAS, I demonstrate how to set the size and number of the input/output buffers in memory. Sometimes, however, we want to load an entire data set into memory. We could set the BUFNO and BUFSIZE system options so that the entire data set fits into the buffers, but there is an easier and more efficient way: the SASFILE Statement.
The SASFILE Statement
The SASFILE Statement is a global statement, which means we can use it in open code, outside of any data step or procedure. When we use the SASFILE Statement, SAS opens the data set and allocates enough buffers to hold it entirely in memory. Each buffer holds one page of the data set, so the number of buffers SAS needs follows from the data set's page size, which the BUFSIZE system option controls when the data set is created.
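To get a feel for how much memory a data set will need, we can look at its page size and page count. The following is a minimal sketch, not a definitive recipe: the data set work.demo and the 64k page size are made up for illustration, and the exact labels in the output can differ between SAS versions and hosts.

```sas
/* Hypothetical data set, created only for this illustration.       */
/* BUFSIZE= as a data set option sets the page size at creation.    */
data work.demo (bufsize=64k);
    do x = 1 to 100000;
        output;
    end;
run;

/* The engine/host section of the PROC CONTENTS output reports the  */
/* data set page size and the number of data set pages.             */
proc contents data=work.demo;
run;
```

Multiplying the page size by the number of pages gives a rough estimate of the memory that SAS has to set aside to hold the whole data set.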
The SASFILE Statement has three important options, of which we must specify exactly one: Load, Open and Close. When we specify Load, SAS allocates the buffers and immediately reads the data set into memory. When we specify Open, SAS allocates the necessary buffers but does not read the data set into memory until a SAS procedure or a data step uses it. The Close option frees the memory that the data set occupied and closes the file.
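The difference between Open and Load is easiest to see in a small example. This is only a sketch under assumptions: work.demo is a hypothetical data set created for the illustration, and PROC MEANS simply stands in for any step that reads the data.

```sas
/* Hypothetical small data set, used only for this illustration */
data work.demo;
    do x = 1 to 100000;
        output;
    end;
run;

/* OPEN: allocate the buffers, but defer reading until first use */
sasfile work.demo open;

/* The first step that reads work.demo brings its pages into memory */
proc means data=work.demo mean;
    var x;
run;

/* Later steps read from the buffers that are already in memory */
proc means data=work.demo max;
    var x;
run;

/* CLOSE: free the buffers and close the file */
sasfile work.demo close;
```

Replacing Open with Load would read the data set into memory right away instead of waiting for the first PROC MEANS. In both cases the data set stays open until the Close statement, so it is a good habit to pair every Open or Load with a Close.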
Not surprisingly, the SASFILE Statement is most powerful when SAS reads or writes the same data set many times. A nice example is resampling from the same data set many times. First, let us see how to do this with the data step, without loading the data set into memory.
data test;
    do x=1 to 10e6;
        output;
    end;
run;

/* 1,000 samples of 1,000 random observations */
data Resample;
    do Sample=1 to 1000;
        do i=1 to 1000;
            point=rand('integer', 1, nobs);
            set test point=point nobs=nobs;
            output;
        end;
    end;
    stop; /* POINT= never reaches end of file */
run;
As a reference point, this process took about 35 seconds on my system. Now, let us use the SASFILE Statement and load the data set into memory first. This will save us a lot of I/O operations.
/* Read test into memory once */
sasfile test load;

data Resample;
    do Sample=1 to 1000;
        do i=1 to 1000;
            point=rand('integer', 1, nobs);
            set test point=point nobs=nobs;
            output;
        end;
    end;
    stop;
run;

/* Free the buffers */
sasfile test close;
The process now takes under a second, an enormous time saving for so little code. It goes to show that you should always consider limiting I/O operations wherever possible.
This post demonstrated the use of the SASFILE Statement. We also saw that the statement reaches its full potential when we read the same data set over and over. Used correctly, it can save a substantial amount of time.
If you want to know more about buffers in memory, see the blog post A SAS Case Study of the BUFSIZE and BUFNO Options.
You can download the entire code from this post here.