Descriptive Statistics With PROC MEANS and SAS/IML

Knowing your data well is crucial for any statistical analysis. You should therefore always start by plotting your data and calculating basic descriptive statistics to get an idea of how it is distributed. This example page first shows how to compute basic descriptive statistics with PROC MEANS. Next, I demonstrate how these statistics are computed ‘manually’ in SAS/IML.


First, we use PROC MEANS to calculate descriptive statistics for the heights of the students in the sashelp.class dataset. In the PROC MEANS statement, I request several common statistics. Note that the t and prt keywords test the null hypothesis that the population mean is zero.

proc means data = sashelp.class n max min range Mean Std stderr t prt maxdec=2;
   var Height;
run;
[Figure: PROC MEANS descriptive statistics output]

The figure shows the result from the MEANS procedure.


Next, let us take a look at how the statistics from PROC MEANS are actually calculated. The calculations in the IML Procedure below yield exactly the same results as in the Means Procedure above.

proc iml;
use sashelp.class;                                    /* Open dataset for reading              */                                 
   read all var {Height};                             /* Read variable Height into vector      */
close sashelp.class;                                  /* Close dataset                         */  
mu0      = 0;                                         /* Hypothesised mean (PROC MEANS uses 0) */
N        = nrow(Height);                              /* Number of observations                */
min      = min(Height);                               /* Minimum Value                         */
max      = max(Height);                               /* Maximum Value                         */
range    = max - min;                                 /* The difference between min and max    */
Mean     = 1/n * sum(Height);                         /* Sample mean                           */
Std      = sqrt(1/(n-1) * sum((Height - Mean)##2));   /* Standard Deviation                    */
Std_Err  = Std / sqrt(n);                             /* Standard error of the mean            */
t_stat   = (Mean - mu0) / Std_Err;                    /* T statistic                           */
p_value  = (1-cdf('t',abs(t_stat),n-1))*2;            /* P value associated with t-statistic   */
print N max min range Mean Std Std_Err t_stat p_value; /* Print selected descriptive statistics */
quit;
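
PROC MEANS only tests against a mean of zero. For a different hypothesised mean, one option is PROC TTEST with its H0= option. The following is a minimal sketch, assuming a hypothesised mean of 60 (an illustrative value only):

```sas
/* Sketch: one-sample t test of Height against an assumed mean of 60 */
proc ttest data = sashelp.class h0 = 60;
   var Height;
run;
```

The t statistic reported by this step should correspond to (Mean - 60) / Std_Err in the notation of the IML code above, with n-1 degrees of freedom.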