Normal Distribution

The normal distribution is one of the most widely used probability distributions. It is a continuous distribution that appears throughout statistics and related fields because of its convenient mathematical properties. Here, I give a brief introduction to the normal distribution along with some SAS code examples. The Probability Density Function (PDF) is given by

    \begin{equation*} f(x\;|\;\mu ,\sigma ^{2})={\frac {1}{\sqrt {2\pi \sigma ^{2}}}}\;e^{-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}} \end{equation*}

where \mu is the mean of the distribution and \sigma^2 is the variance, that is, the square of the standard deviation \sigma. If a random variable X follows a normal distribution with mean \mu and variance \sigma^2, this is written as X \sim \mathcal{N}(\mu,\,\sigma^{2}).

To the right, I have plotted the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF) for three normal distributions with different values of \mu and \sigma. The blue line is the Standard Normal (or Standard Gaussian) distribution, with \mu = 0 and \sigma^2 = 1. You can see that increasing \sigma^2 makes the distribution flatter, while decreasing \sigma^2 makes it steeper around the mean \mu, where the density curve reaches its highest point. You can download the entire program creating these plots here.
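To put numbers on the flattening effect, the density above can be evaluated at x = \mu, which gives the height of the peak. The three values of \sigma below are chosen only for illustration and are not necessarily the ones used in the plots.

```latex
\begin{align*}
f(\mu\;|\;\mu,\sigma^{2}) &= \frac{1}{\sqrt{2\pi\sigma^{2}}}\;e^{0} = \frac{1}{\sigma\sqrt{2\pi}} \\
\sigma = 0.5 &: \quad f(\mu\;|\;\mu,\sigma^{2}) \approx 0.798 \\
\sigma = 1   &: \quad f(\mu\;|\;\mu,\sigma^{2}) \approx 0.399 \\
\sigma = 2   &: \quad f(\mu\;|\;\mu,\sigma^{2}) \approx 0.199
\end{align*}
```

Doubling \sigma halves the peak height, and because the total area under the curve must stay equal to 1, the curve spreads out accordingly.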

The normal distribution is used in many different contexts in statistics. A typical use is to assume normality, compute a test statistic, and then compare that statistic against the normal distribution to decide whether to reject or fail to reject the null hypothesis. See the post Linear Regression in SAS for a simple example of this.
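As a minimal sketch of that workflow, the data step below takes a test statistic and evaluates it against the standard normal distribution. The value z = 2.1 and the data set name z_test are made up for illustration; in practice the statistic would come from your own analysis.

```sas
%let z     = 2.1;    /* Hypothetical test statistic (assumed value) */
%let alpha = 0.05;   /* Significance level                          */

data z_test;
   z       = &z;
   p_value = 2 * (1 - cdf('normal', abs(z), 0, 1));    /* Two-sided p-value          */
   z_crit  = quantile('normal', 1 - &alpha/2, 0, 1);   /* Two-sided critical value   */
   reject  = (abs(z) > z_crit);                        /* 1 = reject H0, 0 = do not  */
run;

proc print data = z_test noobs;
run;
```

With these illustrative inputs the p-value is roughly 0.036, which is below 0.05, so the null hypothesis would be rejected at the 5% level.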

[Figure: Normal distribution Probability Density Function (PDF)]
[Figure: Normal distribution Cumulative Distribution Function (CDF)]

The Central Limit Theorem

Furthermore, the normal distribution is popular because of the Central Limit Theorem, one of the most fundamental results in statistics. In short, the theorem states that even if we draw independent samples from a non-normal distribution (with finite variance), the sampling distribution of the sample mean tends toward a normal distribution as the sample size increases. For a basic introduction, I recommend watching Khan Academy's Introduction to the Central Limit Theorem. I have also written a blog post about how to Visualize the Central Limit Theorem in SAS.
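The following is a small simulation sketch of the idea, not the code from the post mentioned above. It draws repeated samples from an exponential distribution, which is clearly skewed, and plots the distribution of the sample means. The sample size, number of repetitions, seed and data set name are all assumptions chosen for illustration.

```sas
%let n    = 30;      /* Observations per sample (assumed value)    */
%let reps = 5000;    /* Number of simulated samples (assumed value) */

data sample_means(keep = xbar);
   call streaminit(12345);                      /* Fix the seed for reproducibility */
   do rep = 1 to &reps;
      total = 0;
      do i = 1 to &n;
         total = total + rand('exponential');   /* Skewed draws with mean 1         */
      end;
      xbar = total / &n;                        /* Mean of one simulated sample     */
      output;
   end;
run;

title "Sampling Distribution of the Mean (n = &n)";
proc sgplot data = sample_means;
   histogram xbar;
   density xbar / type = normal;                /* Overlay a fitted normal curve    */
run;
title;
```

Try lowering n to 2 or 5 to see the skewness of the underlying distribution come through, then raise it again to watch the histogram move toward the normal curve.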

SAS Code Example

It is important to have a basic understanding of the normal distribution and of how its shape changes with its parameters. Below, I have written a SAS code example for you to play around with. Paste it into your SAS editor and change the three values defined at the top of the code to see how they affect the shape of the distribution and the shaded critical regions.

%let alpha = 0.05;   /* Set alpha value   */
%let mu    = 0;      /* Set mean value    */
%let sigma = 1;      /* Set st. dev value */
data normal_PDF(drop = lower_q upper_q);
   lower_q = quantile('normal', &alpha/2, &mu, &sigma);                     /* Set lower quantile         */
   upper_q = quantile('normal', (1 - &alpha/2), &mu, &sigma);               /* Set upper quantile         */
   do x = &mu - 3*&sigma to &mu + 3*&sigma by 0.01;
      density = pdf('normal', x, &mu, &sigma);                              /* Normal Density Function    */
      output;
   end;
   x = .; density = .;
   x_line = upper_q; line = pdf('normal', x_line, &mu, &sigma); output;     /* Line for upper quantile    */
   x_line = lower_q; line = pdf('normal', x_line, &mu, &sigma); output;     /* Line for lower quantile    */
   x_line = .; line = .;
   do lower_x_band = &mu - 3*&sigma to lower_q by 0.01;
      lower_band = pdf('normal', lower_x_band, &mu, &sigma);                /* Lower critical region      */
      output;
   end;
   lower_x_band = .; lower_band = .;
   do upper_x_band = upper_q to &mu + 3*&sigma by 0.01;
      upper_band = pdf('normal', upper_x_band, &mu, &sigma);                /* Upper critical region      */
      output;
   end;
   upper_x_band = .; upper_band = .;
run;
title 'Normal Probability Density Function';
title2 'With Critical Regions Shaded';
proc sgplot data = normal_PDF noautolegend;
   series x = x y = density / lineattrs = (color = black thickness = 2);    /* Density curve              */
   dropline x = x_line y = line / lineattrs = (color = black);              /* Lines at the quantiles     */
   band x = lower_x_band upper = lower_band lower = 0;                      /* Shade lower critical region */
   band x = upper_x_band upper = upper_band lower = 0;                      /* Shade upper critical region */
   yaxis offsetmin = 0 min = 0 label = "Density";
   xaxis label = 'x';
run;
title;                                                                      /* Clear titles               */