Binomial Distribution

The Binomial distribution is a discrete probability distribution closely related to the Bernoulli Distribution. It models the number of successes in a series of n independent Bernoulli trials, each with the same success probability p. If n=1, i.e. we have a single trial, the Binomial and Bernoulli distributions are the same. Letting k be the number of successes in the n independent Bernoulli trials, the Probability Mass Function of the distribution is

    \begin{equation*} P(X=k) = p(k) = {{n}\choose{k}} p^k (1-p)^{n-k}, \quad k=0, 1, 2, \dots, n \end{equation*}

If a stochastic variable X follows a Binomial distribution with parameters n and p, we write X \sim Bi(n, p). Breaking the above formula down, p^k is the probability that k specific trials all result in success, and (1-p)^{n-k} is the probability that the remaining n-k trials all result in failure. Finally, {{n}\choose{k}} = \frac{n!}{k!(n-k)!} expresses the number of ways you can choose k distinct elements from a larger set of n elements, i.e. the number of different positions the k successes can take among the n trials. As a result, multiplying these gives the probability of observing exactly k successes in n Bernoulli trials with success probability p. The key to understanding the Binomial PMF is to understand the Binomial coefficient, so you should take the time to understand it. For a great explanation of the coefficient, go to Understanding the Binomial Coefficient at Khan Academy.
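To make the formula concrete, here is a small worked example. The values n=4, p=0.5 and k=2 are chosen purely for illustration and do not appear elsewhere in this post.

```latex
\begin{align*}
P(X=2) &= {{4}\choose{2}} (0.5)^2 (1-0.5)^{4-2} \\
       &= \frac{4!}{2!\,2!} \cdot 0.25 \cdot 0.25 \\
       &= 6 \cdot 0.0625 \\
       &= 0.375
\end{align*}
```

The coefficient 6 counts the orderings with exactly two successes (S) and two failures (F): SSFF, SFSF, SFFS, FSSF, FSFS and FFSS. Each of these orderings has probability 0.0625, so the total is 0.375.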

SAS Code Example

Let us plot the Probability Mass Function for the distribution. First of all, I create the PMF data, specifying the probability of success in the individual Bernoulli trials and the number of trials to be performed. Then I use the PDF function to calculate the PMF values. Finally, I use a needle plot to create the graph to the right of the Probability Mass Function.

/* Generate PMF Data */
%let p=0.5;
%let n=20;
data Bino_PMF;
   do k=0 to &n;
      PMF=pdf('Binomial', k, &p, &n);
      output;
   end;
run;
 
/* Plot PMF */
title "Binomial PMF with p=&p and n=&n";
proc sgplot data=Bino_PMF noautolegend;
   needle x=k y=PMF / lineattrs=(color=red);
   xaxis values=(0 to &n) label='k' labelattrs=(size=12 weight=Bold);
   yaxis display=(nolabel);
   keylegend / position=NE location=inside across=1 noborder valueattrs=(Size=12 Weight=Bold);
run;
title;

Finally, this distribution is one of the first distributions you will meet in your statistics class. Therefore, I encourage you to play around with the p and n parameters in the above SAS code example to familiarize yourself with how the distribution changes with the parameters.
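As one possible starting point for that experimentation, the sketch below computes the PMF for several values of p at once and draws one panel per value. The data set name Bino_PMF_Multi and the values 0.2, 0.5 and 0.8 are my own choices for illustration and are not part of the original program.

```sas
/* Generate PMF data for several success probabilities (illustrative values) */
%let n=20;
data Bino_PMF_Multi;
   do p=0.2, 0.5, 0.8;
      do k=0 to &n;
         PMF=pdf('Binomial', k, p, &n);
         output;
      end;
   end;
run;

/* Plot one panel per value of p */
title "Binomial PMF with n=&n for several values of p";
proc sgpanel data=Bino_PMF_Multi;
   panelby p / columns=3;
   needle x=k y=PMF;
   colaxis label='k';
   rowaxis display=(nolabel);
run;
title;
```

With p=0.5 the distribution is symmetric around n*p=10, while p=0.2 and p=0.8 shift the mass towards the low and high values of k respectively.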

In addition, you can download the entire program supporting this example here.