In a recent thread at the SAS Community, a user asked how to generate all possible sums of a list of numbers in SAS. This is not a trivial task, and the list must not be too long: each number is either included in a sum or not, so N numbers yield 2^N - 1 non-empty combinations. For N = 20, that is already more than one million (1,048,575) sums. In this post, I demonstrate how to approach the problem and provide solutions using Proc Summary and the Data Step.
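To get a feel for how quickly the count grows, here is a minimal sketch that prints 2^N - 1 for a few list sizes. The sizes and the variable names `n` and `n_sums` are my own choices for illustration.

```sas
/* Number of non-empty combinations (and hence sums) for a few list sizes */
data _null_;
   do n = 5, 10, 20, 30;
      n_sums = 2 ** n - 1;          /* every subset except the empty one */
      put n= n_sums= comma20.;
   end;
run;
```

The log shows 31 for N = 5 and 1,048,575 for N = 20. Every extra number doubles the work, which is why N should stay moderate.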

In the examples to come, I will use the simple SAS data set below. It holds the numbers 1, 2, 3, 4 and 5, one per observation.

    data have;
       input x @@;
       datalines;
    1 2 3 4 5
    ;

### Using Proc Summary

First, let us see how to use Proc Summary to create all possible sums of the numbers 1, 2, 3, 4 and 5 in SAS. Five numbers form 2^5 - 1 = 31 non-empty combinations, so we expect 31 sums (some combinations share the same value, for example 1 + 4 and 2 + 3). The first step is to prepare the data. In this case, we want a wide data set, that is, one variable for each number. Below, I do this in a data step, which gives me the variables a1-a5 in a single observation.

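    /* Store the number of observations in the macro variable nobs */
    data _null_;
       if 0 then set have nobs=n;
       call symputx('nobs',n);
       stop;
    run;

    %put nobs = &nobs.;

    /* Transpose the values of x into one wide observation */
    data temp;
       array a {&nobs.};
       do i = 1 to &nobs.;
          set have;
          a[i] = x;
       end;
    run;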

We want a wide data set because Proc Summary computes statistics for every combination of the Class variables, from no class variables at all up to all of them together. Usually, we suppress this with the Nway Option, which keeps only the rows where all class variables contribute. However, that is not what we want here, so we omit the Nway Option. The Ways Option adds the variable _WAY_, the number of class variables behind each row, which lets us drop the overall row (_WAY_ = 0) afterwards.

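    /* No Nway Option: one output row per combination of class variables */
    proc summary data = temp chartype descendtypes completetypes;
       class a:;
       var a:;
       output out = temp2 sum = / ways;
    run;

    /* Drop the overall row (_WAY_ = 0) and sum the non-missing values */
    data want;
       set temp2(drop = _TYPE_ _FREQ_ where = (_WAY_));
       s = sum(of a:);
    run;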

Run the SAS code above and verify that we create 31 sums (2^5 – 1).
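One way to check the output is to tabulate _WAY_. This is a sketch that assumes temp2 from the step above still exists in the Work library. With five class variables and a single input observation, I would expect the row counts for _WAY_ = 0 to 5 to follow the binomial coefficients 1, 5, 10, 10, 5, 1. Dropping the row with _WAY_ = 0 then leaves 31.

```sas
/* Count the output rows per _WAY_ in temp2 (created by Proc Summary above) */
proc freq data = temp2;
   tables _WAY_ / nocum nopercent;
run;
```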

### Using The Data Step – Method 1

Next, let us use the data step to create the same result. The first step is to find the number of observations in the input data. In the following data step, I create the desired result. First, I create an array with the number of entries specified by nobs, in this case 5. Next, I read the input data set and place the values of x in the array. Finally, I loop backwards from 2^5-1 to 1. Remember, this is the number of ways to combine the numbers from the input data. For each i, I find the binary representation using the Binary Format and set sum equal to zero. Then, I use the Char Function to retrieve the individual bits, and each bit serves as a 0/1 coefficient for the corresponding number. Run the code and verify that the result is identical to the one above.

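    /* Store the number of observations in the macro variable nobs */
    data _null_;
       if 0 then set have nobs=n;
       call symputx('nobs',n);
       stop;
    run;

    %put nobs = &nobs.;

    data want;
       array aa {&nobs.};
       /* Load the values of x into the array */
       do i = 1 to &nobs.;
          set have;
          aa[i] = x;
       end;
       /* Each i from 2**N - 1 down to 1 encodes one combination */
       do i = 2 ** &nobs. - 1 to 1 by -1;
          s = put(i, binary&nobs..);
          sum = 0;
          do k = 1 to &nobs.;
             coef = char(s, k);      /* bit k as '0' or '1' */
             sum + coef * aa[k];     /* implicit character-to-numeric conversion */
          end;
          output;
       end;
       /* Width follows nobs instead of a hard-coded 5 */
       format i binary&nobs..;
       keep aa: i sum;
    run;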
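To see the bit extraction in isolation, here is a small sketch for the single value i = 5. The explicit Input Function is my own variation. The code above relies on implicit conversion instead, which works but writes a conversion note to the log.

```sas
/* Decompose i = 5 into its bits, read left to right */
data _null_;
   i = 5;
   s = put(i, binary5.);                /* the string '00101' */
   do k = 1 to 5;
      coef = input(char(s, k), 1.);     /* explicit character-to-numeric conversion */
      put k= coef=;
   end;
run;
```

The coefficient is 1 for k = 3 and k = 5 only, so with the values 1 to 5 in the array, this i corresponds to the sum 3 + 5 = 8.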

### Using The Data Step – Method 2

Finally, let us check out a third method. The logic in the code below resembles the one above. However, instead of relying on the Binary Format and implicit conversion, I create a second array with pre-computed bitmasks. Next, I run the same overall loop as above, but this time I use the bitmasks and the Band (bitwise AND) Function to check whether each coefficient should be 0 or 1. This logic is explained in the blog post Bitmap Search Technique in the SAS Data Step. This process is probably the fastest of the three.

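    /* nobs holds the upper bound of the zero-based arrays (N - 1) */
    data _null_;
       if 0 then set have nobs=n;
       call symputx('nobs', n - 1);
       stop;
    run;

    %put nobs=&nobs;

    data want;
       array bi {0 : &nobs.} _temporary_;
       array aa {0 : &nobs.};
       /* Load the values of x and pre-compute the bitmasks 1, 2, 4, ... */
       do i = 0 to &nobs.;
          set have;
          aa[i] = x;
          bi[i] = 2 ** i;
       end;
       do i = 2 ** (&nobs. + 1) - 1 to 1 by -1;
          sum = 0;
          do j = 0 to &nobs.;
             coef = (band(i, bi[j]) > 0);   /* 1 if bit j of i is set */
             sum + coef * aa[&nobs. - j];
          end;
          output;
       end;
       /* Width follows N instead of a hard-coded 5 */
       format i binary%eval(&nobs. + 1).;
       keep aa: i sum;
    run;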
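The bitmask test can also be tried on its own. The sketch below uses i = 5 again, with a variable `mask` that I introduce purely for illustration. It shows which bits the Band Function reports as set.

```sas
/* Test each of the five lowest bits of i = 5 (binary 00101) */
data _null_;
   i = 5;
   do j = 0 to 4;
      mask = 2 ** j;                    /* 1, 2, 4, 8, 16 */
      coef = (band(i, mask) > 0);       /* 1 if bit j is set, else 0 */
      put j= mask= coef=;
   end;
run;
```

Here coef is 1 for j = 0 and j = 2. Because the code above reads the array in reverse (index nobs - j), these bits pick the values 5 and 3, giving the same sum of 8 as in Method 1.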

## Summary

In this post, we explore different approaches to calculating all possible sums of N given numbers in SAS: one using Proc Summary and two data step solutions. The Proc Summary approach is the simplest and probably the way to go for reasonably sized N, although Proc Summary limits the number of class variables. The first data step solution is quite intuitive and surprisingly fast. However, the fastest of the three is probably the data step solution using bitmasks and the Band Function.

I encourage readers to play around with the code above. Try the code snippets on N = 20. Which method is fastest? The inspiration for this post comes from this thread at the SAS Community.
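For the N = 20 experiment, a test data set could be generated along these lines. This is only a sketch: the seed, the value range of 1 to 100 and the use of random integers are my own assumptions, not part of the original thread.

```sas
/* Write detailed timing information to the log for comparing the methods */
options fullstimer;

/* Replace have with 20 random integers between 1 and 100 */
data have;
   call streaminit(123);                 /* arbitrary seed for reproducibility */
   do _n_ = 1 to 20;
      x = ceil(100 * rand('uniform'));
      output;
   end;
run;
```

Rerun each method against this data set and compare the real and CPU times reported in the log.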

You can download the entire code from this post here.