F Distribution

The F Distribution is a continuous probability distribution. It is commonly used to test hypotheses about the equality of two population variances and to compare linear regression models. You can see an example of the latter in the blog post Linear Regression in SAS. The F Distribution has the Probability Density Function

    \begin{equation*} f(x) = \frac{\sqrt{\frac{ (d_1 x)^{d_1} d_2^{d_2} } { (d_1 x + d_2)^{d_1+d_2} } }} {x B\left( \frac{d_1}{2}, \frac{d_2}{2} \right)}, \quad x \geq 0 \end{equation*}

where d_1 and d_2 are positive integers indicating the degrees of freedom and B is the Beta Function. At first sight, the Probability Density Function does not offer much intuition. Therefore, I usually think of the distribution as the ratio of two independent Chi Squared random variables, each divided by its degrees of freedom. That is, a random variable X is F distributed if it can be written as

    \begin{equation*} X = \frac{\chi^2(d_1)/d_1}{\chi^2(d_2)/d_2} \sim F(d_1, d_2) \end{equation*}
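
The ratio representation can be checked by simulation. Below is a minimal sketch, assuming the two Chi Squared variables are drawn independently. The data set name F_sim, the seed, the number of draws and the evaluation point x = 1 are arbitrary choices made for illustration. The simulated proportion of draws with X <= 1 should land close to the value returned by the CDF function.

```sas
%let d1=3;
%let d2=8;

/* Simulate X = (U/d1)/(V/d2) with U and V independent Chi Squared draws */
data F_sim;
   call streaminit(12345);            /* arbitrary seed, for reproducibility */
   do i=1 to 100000;
      u=rand('CHISQUARE', &d1);       /* numerator Chi Squared draw */
      v=rand('CHISQUARE', &d2);       /* denominator Chi Squared draw */
      x=(u/&d1)/(v/&d2);
      below_one=(x<=1);               /* 1 if X <= 1, otherwise 0 */
      output;
   end;
run;

/* The mean of below_one estimates P(X <= 1) */
proc means data=F_sim mean;
   var below_one;
run;

/* Theoretical value from the F distribution for comparison */
data _null_;
   p=cdf('F', 1, &d1, &d2);
   put "Theoretical P(X <= 1) = " p 6.4;
run;
```

Increasing the number of draws should bring the two numbers closer together.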

To the right, I have plotted Probability Density Functions and their corresponding Cumulative Distribution Functions for three different distributions. You can download the code creating these plots here. For a great introduction to the distribution, see the video An Introduction to the F distribution by jbStatistics.

F Probability Density Function (PDF) with different degrees of freedom
F Cumulative Distribution Function (CDF) with different degrees of freedom
SAS Code Example

Below, I have written a small SAS program that lets you set the degrees of freedom in the numerator and denominator respectively. I encourage you to play around with these and see how they affect the density drawn in the program. What happens when they are both large? Or small? And what happens when one is much bigger than the other?

%let d1=3;   /* numerator degrees of freedom */
%let d2=8;   /* denominator degrees of freedom */

/* Evaluate the F density on a grid of x values */
data F_PDF;
      do x=0 to 3 by 0.01;
         F_pdf=pdf('F', x, &d1, &d2);
         output;
      end;
run;

/* Plot the density */
title "F Probability Density Function for d1=&d1 and d2=&d2";
proc sgplot data = F_PDF noautolegend;
   series x=x y=F_pdf / lineattrs=(thickness=3);
   yaxis label="PDF" labelattrs=(size=12 weight=Bold);
   xaxis label='x' labelattrs=(size=12 weight=Bold);
run;
title;
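
The same pattern should work for the Cumulative Distribution Function by swapping the PDF function for the CDF function. This is a minimal sketch rather than the code behind the plots above. The data set name F_CDF is an illustrative choice.

```sas
%let d1=3;
%let d2=8;

/* Evaluate the F cumulative distribution function on a grid of x values */
data F_CDF;
   do x=0 to 3 by 0.01;
      F_cdf=cdf('F', x, &d1, &d2);
      output;
   end;
run;

title "F Cumulative Distribution Function for d1=&d1 and d2=&d2";
proc sgplot data=F_CDF noautolegend;
   series x=x y=F_cdf / lineattrs=(thickness=3);
   yaxis label="CDF" labelattrs=(size=12 weight=Bold);
   xaxis label='x' labelattrs=(size=12 weight=Bold);
run;
title;
```
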
The F Test

A frequent use of the distribution is in the F Test. We have seen that the t Distribution is used to compare means to hypothesized values and between samples in the One Sample T Test and the Two Sample T Test. In contrast, we use the F Test to compare variances among samples or groups. You can see examples of how the test is constructed and used in the Statistics Examples page F Test.
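
To make the idea concrete, here is a minimal sketch of an F Test for equality of two variances, computed by hand in a data step. It assumes two independent samples from normal populations, so that under the null hypothesis of equal variances the ratio of the sample variances follows an F Distribution with n1-1 and n2-1 degrees of freedom. The sample sizes and sample variances are hypothetical numbers chosen only for illustration, and the data set and variable names are my own.

```sas
data F_test;
   /* Hypothetical summary statistics, for illustration only */
   n1=21; s1_sq=12.5;                 /* sample 1: size and sample variance */
   n2=16; s2_sq=5.0;                  /* sample 2: size and sample variance */

   F=s1_sq/s2_sq;                     /* test statistic: ratio of sample variances */
   d1=n1-1;                           /* numerator degrees of freedom */
   d2=n2-1;                           /* denominator degrees of freedom */

   p_lower=cdf('F', F, d1, d2);       /* P(X <= F) under the null hypothesis */
   p_two_sided=2*min(p_lower, 1-p_lower);

   crit_upper=quantile('F', 0.975, d1, d2);   /* upper critical value at the 5% level */
run;

proc print data=F_test noobs;
   var F d1 d2 p_two_sided crit_upper;
run;
```

A small two-sided p-value, or a test statistic above the upper critical value, speaks against equal variances.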