where $\mu$ is the mean and $\sigma^2$ is the variance, that is, the squared standard deviation $\sigma$. A random variable $X$ can follow a normal distribution with mean $\mu$ and standard deviation $\sigma$. We write this as $X \sim N(\mu, \sigma^2)$.
To the right, I have plotted the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF) for three normal distributions with different values of $\mu$ and $\sigma$. The blue line is the Standard Normal, or Standard Gaussian, distribution with $\mu = 0$ and $\sigma = 1$. You can see that increasing $\sigma$ makes the distribution flatter, and decreasing $\sigma$ makes it steeper around the mean $\mu$, which is the highest point on the density curve. You can download the entire program creating these plots here.
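To put numbers on the flatter/steeper effect, here is a minimal sketch that tabulates the height of the density at the mean and the probability of falling within one unit of the mean. The three values of sigma (0.5, 1 and 2) and the data set name normal_shape are my own illustrative choices, not the ones used in the plots.

```sas
/* Illustrative values only: compare the peak height and the probability */
/* mass within one unit of the mean for three choices of sigma.          */
data normal_shape;
   mu = 0;
   do sigma = 0.5, 1, 2;
      /* Height of the density at the mean, equal to 1/(sigma*sqrt(2*pi)) */
      peak = pdf('normal', mu, mu, sigma);
      /* P(mu - 1 < X < mu + 1), a difference of two CDF values */
      p_within1 = cdf('normal', mu + 1, mu, sigma)
                - cdf('normal', mu - 1, mu, sigma);
      output;
   end;
run;

proc print data = normal_shape noobs;
run;
```

Doubling sigma halves the peak (about 0.80, 0.40 and 0.20 for the three values) and moves probability mass away from the mean (about 0.95, 0.68 and 0.38 within one unit).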
The normal distribution is used in many different contexts in statistics. A typical use is to assume normality, compute a test statistic, and then compare that statistic against the normal distribution to decide whether to reject or fail to reject your null hypothesis. See the post Linear Regression in SAS for a simple example of this.
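As a rough sketch of that workflow, the data step below takes a test statistic and evaluates it against the standard normal distribution. The value z = 2.1 is a made-up number for illustration; in practice it would come from your own test.

```sas
/* Hypothetical example: two-sided z-test at the 5% level */
data _null_;
   z     = 2.1;    /* made-up test statistic, for illustration only */
   alpha = 0.05;   /* significance level */

   /* Critical value from the standard normal (mean 0, st. dev 1 by default) */
   z_crit = quantile('normal', 1 - alpha/2);

   /* Two-sided p-value: probability of a statistic at least this extreme */
   p_value = 2 * (1 - cdf('normal', abs(z)));

   /* 1 if the null hypothesis is rejected, 0 otherwise */
   reject = (abs(z) > z_crit);

   put z= z_crit= p_value= reject=;
run;
```

The results are written to the SAS log. Comparing the absolute value of z with z_crit and comparing p_value with alpha are two ways of making the same decision.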
The Central Limit Theorem
Furthermore, the normal distribution is popular because of the Central Limit Theorem, which is considered one of the most fundamental and profound results in statistics. In short, the Central Limit Theorem states that even when we draw independent samples from a non-normal distribution (with finite variance), the sampling distribution of the mean tends to normality as the sample size increases. For a basic introduction to the Central Limit Theorem, I recommend watching this Introduction to the Central Limit Theorem by Khan Academy. Previously, I have written a blog post about how to Visualize the Central Limit Theorem in SAS.
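A small simulation can make the theorem concrete. The sketch below is not the code from the blog post mentioned above; it is a minimal illustration under my own assumptions (exponential draws, 1000 samples of size 50, a fixed seed, and the data set name sample_means).

```sas
%let n_samples = 1000; /* Number of simulated samples (assumed value) */
%let n         = 50;   /* Size of each sample (assumed value)         */

data sample_means(keep = sample_mean);
   call streaminit(12345);                       /* Fixed seed for reproducibility */
   do sample = 1 to &n_samples;
      total = 0;
      do i = 1 to &n;
         total = total + rand('exponential');    /* Skewed, non-normal draws with mean 1 */
      end;
      sample_mean = total / &n;                  /* Mean of this sample */
      output;
   end;
run;

proc sgplot data = sample_means;
   histogram sample_mean;                        /* Empirical sampling distribution */
   density sample_mean / type = normal;          /* Fitted normal curve for comparison */
run;
```

Although each individual draw is strongly right-skewed, the histogram of the sample means should look roughly bell-shaped and centered near 1. Lowering n to 2 or 5 shows how the skewness reappears for small samples.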
Normal Distribution SAS Code Example
It is important to have a basic understanding of the normal distribution and how its shape changes with its parameters. Below, I have written a SAS code example for you to play around with. Paste it into your SAS editor and change the three values defined at the top of the code to see how they affect the shape of the distribution.
%let alpha = 0.05; /* Set alpha value */
%let mu    = 0;    /* Set mean value */
%let sigma = 1;    /* Set st. dev value */

data normal_PDF(drop = lower_q upper_q);
   lower_q = quantile('normal', &alpha/2, &mu, &sigma);       /* Set lower quantile */
   upper_q = quantile('normal', (1 - &alpha/2), &mu, &sigma); /* Set upper quantile */

   do x = &mu - 3*&sigma to &mu + 3*&sigma by 0.01;
      density = pdf('normal', x, &mu, &sigma);                /* Normal Density Function */
      output;
   end;
   x = .; density = .;

   x_line = upper_q; line = pdf('normal', x_line, &mu, &sigma); output; /* Line for upper quantile */
   x_line = lower_q; line = pdf('normal', x_line, &mu, &sigma); output; /* Line for lower quantile */
   x_line = .; line = .;

   do lower_x_band = &mu - 3*&sigma to lower_q by 0.01;
      lower_band = pdf('normal', lower_x_band, &mu, &sigma);  /* Lower critical region */
      output;
   end;
   lower_x_band = .; lower_band = .;

   do upper_x_band = upper_q to &mu + 3*&sigma by 0.01;
      upper_band = pdf('normal', upper_x_band, &mu, &sigma);  /* Upper critical region */
      output;
   end;
   upper_x_band = .; upper_band = .;
run;

title 'Normal Probability Density Function';
title2 'With Critical Regions Shaded';
proc sgplot data = normal_PDF noautolegend;
   series x = x y = density / lineattrs = (color = black thickness = 2);
   dropline x = x_line y = line / lineattrs = (color = black);
   band x = lower_x_band upper = lower_band lower = 0;
   band x = upper_x_band upper = upper_band lower = 0;
   yaxis offsetmin = 0 min = 0 label = "Density";
   xaxis label = 'x';
run;
title;
Finally, check out the blog post Fit Normal, Weibull and Lognormal Distribution to see how to fit the normal distribution in SAS.