Using the BUFSIZE and BUFNO System Options in SAS
When SAS reads or writes a data set, it does so in chunks. We call these chunks “pages”. Like a book, SAS reads or writes a data set one page at a time. Each such transfer is an Input/Output (I/O) operation. I/O operations between disk and memory are usually the slowest part of a data process in SAS and should be kept to a minimum. In this post, I will introduce the system options BUFSIZE and BUFNO. These let you take control of the size and number of data set buffers used to move data.
Before we go further and explore the two options, let us review the current values of the two system options with PROC OPTIONS.
proc options option=(bufsize bufno); run;
When a Data Step or SAS Procedure reads a data set, it allocates blocks of memory to hold the data. These blocks are called buffers. The BUFSIZE System Option sets the size of each of these buffers. BUFSIZE has a default value of 0. This means that SAS chooses the buffer size for each individual data set. Be aware, though, that SAS bases this choice on the record length, not the number of observations. We can set BUFSIZE globally like this:
options bufsize=32k;
proc options option=(bufsize bufno);
run;
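The Options Procedure reveals that the buffer size has indeed changed from 0 to 32k. This means that for data sets created from here on, the amount of data that can be transferred in one I/O operation to one buffer is 32k. The buffer size is stored as a permanent attribute of a data set when it is created, so it also applies when the data set is read later.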
Increasing the buffer size can reduce program run time considerably, because it reduces the number of times SAS has to move data between memory and disk. As noted above, this is usually the slowest part of a data flow. Keep in mind, though, that the reduction in elapsed time comes at the expense of increased memory consumption.
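One way to see the trade-off for yourself is to let SAS report resource usage in the log. The sketch below is only an illustration: the data set names t_default and t_large and the 128k page size are my own assumptions, and the timings you get depend entirely on your hardware and operating environment.

```sas
/* Ask SAS to write detailed timing and memory statistics to the log */
options fullstimer;

/* Hypothetical test data set written with the session's BUFSIZE */
data t_default;
   do x=1 to 5e6;
      output;
   end;
run;

/* Same data, but with a larger page size set as a data set option */
data t_large(bufsize=128k);
   do x=1 to 5e6;
      output;
   end;
run;

/* Switch back to the shorter log notes */
options nofullstimer;
```

Compare the real time and memory lines that the log prints for the two steps. A single run is noisy, so repeat it a few times before drawing conclusions.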
The BUFNO System Option lets you control the number of buffers that SAS allocates when reading or writing a data set. The default value is 1. This means that SAS allocates a single buffer in memory of size BUFSIZE. We set the BUFNO system option like this:
options bufno=5;
proc options option=(bufsize bufno);
run;
Just as with BUFSIZE, increasing the number of buffers can reduce the elapsed time of your process, for the same reasons as above. Still, this is not free. It comes at the expense of increased memory consumption.
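BUFNO can also be given as a data set option, so that it applies to a single read or write instead of the whole session. Here is a minimal sketch of that. The data set name demo is just an assumption for illustration.

```sas
/* Hypothetical data set used only for this illustration */
data demo;
   do x=1 to 1e6;
      output;
   end;
run;

/* BUFNO= as a data set option: requests 20 buffers for this read only */
data _null_;
   set demo(bufno=20);
run;

/* The session-wide BUFNO value is not changed by the data set option */
proc options option=bufno;
run;
```

I prefer the data set option when only one or two large data sets need special treatment, since the global setting then stays as it is for everything else.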
A Small Example
Let us consider a small example. In the code below, I create two data sets, a and b. The two data sets are identical in terms of size, record length and so on. The only difference is that for the first data set, I do nothing about the BUFNO and BUFSIZE options. For the second data set, I set BUFSIZE=256k and BUFNO=20. This means that I allocate 20 buffers in memory. Also, I specify that a single I/O operation to one of these buffers can carry 256k of data. Together, the buffers hold 20*256k, which is roughly 5MB of data.
If we compare the PROC CONTENTS output from the two data sets, we see that the data sets are identical, except for the way that SAS divides the data into pages. Data set a is divided into 2482 pages. Data set b is divided into 310 pages. This is because of the BUFSIZE option. For data set a, we rely on the OPTIONS BUFSIZE=32k specified above. For data set b, we override this in the data set options and set BUFSIZE=256k. This means that the pages are 8 times as large. This makes sense, since 310*8 is 2480, very close to the original page count of 2482.
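/* Data set a: uses the session settings (BUFSIZE=32k from above) */
data a;
   do x=1 to 10e6;
      output;
   end;
run;

proc contents data=a;
run;

/* Data set b: overrides page size and number of buffers */
data b(bufsize=256k bufno=20);
   do x=1 to 10e6;
      output;
   end;
run;

proc contents data=b;
run;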
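The numbers quoted in this example can be checked with a bit of arithmetic. The sketch below simply plugs in the page counts and option values from the text. The variable names are mine and nothing here reads the actual data sets.

```sas
data _null_;
   /* Figures taken from the text above */
   pages_a    = 2482;   /* pages in data set a (32k pages)  */
   pages_b    = 310;    /* pages in data set b (256k pages) */
   bufsize_kb = 256;    /* BUFSIZE=256k                     */
   bufno      = 20;     /* BUFNO=20                         */

   /* Ratio of page counts, to compare with 256k/32k = 8 */
   page_ratio = pages_a / pages_b;

   /* Total memory covered by the buffers, in KB and MB */
   buffer_kb = bufsize_kb * bufno;
   buffer_mb = buffer_kb / 1024;

   put page_ratio= buffer_kb= buffer_mb=;
run;
```

The ratio comes out just above 8, and the buffers add up to 5120 KB, that is 5 MB.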
The small example above raises the question: “Can I read or write the entire data set in a single I/O operation?” The answer is boring: it depends. Technically, yes. You can set BUFSIZE and BUFNO to values such that the number of pages is less than the number of allocated buffers in memory. This means that when SAS reads the data set, there is one buffer available for each page, so SAS can process the entire data set in one operation.
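To judge whether the buffers could cover a whole data set, you need its page count and page size. PROC CONTENTS shows both, but they can also be queried. The sketch below assumes that data set b from the example exists in the WORK library, and that the DICTIONARY.TABLES columns npage and bufsize are available in your SAS release, which is how I recall them.

```sas
/* Look up the number of pages and the page size of WORK.B */
proc sql;
   select memname, npage, bufsize
   from dictionary.tables
   where libname='WORK' and memname='B';
quit;
```

If npage is smaller than the BUFNO value you plan to use, there is a buffer available for every page of the data set.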
However, the BUFNO and BUFSIZE options are not the right tools for reading an entire data set into memory. If this is your goal, then use the SASFILE Statement. I demonstrate this in the blog post Load Data Set into Memory with SASFILE Statement.
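For completeness, here is a minimal sketch of what the SASFILE approach looks like. It assumes data set b from the example above exists in the WORK library and fits in available memory. The two steps in the middle are placeholders for whatever processing you need.

```sas
/* Open the data set and load it into memory */
sasfile work.b load;

/* Steps that read WORK.B now use the in-memory copy */
proc means data=work.b;
run;

data _null_;
   set work.b;
run;

/* Release the memory when done */
sasfile work.b close;
```

The benefit shows up when the same data set is read several times, since the data is read from disk only once.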
The BUFNO and BUFSIZE system and data set options are important when moving large amounts of data. They control the amount of data processed in a single I/O operation. This post is meant as an introduction to the two options and how to use them. In the next post, I will do a case study to show you how much these options matter. See the blog post A SAS Case Study of the BUFSIZE and BUFNO Options.
You can download the entire code from this post here.