Most SAS programmers have used a format at some point. If not, they should. Formats let you display data in ways that no other facility provides. They let you reliably bin data ranges into categories for later use. You can even do table lookups with PROC FORMAT. In future posts, I will cover user-defined SAS formats and how flexible and reliable they are. This post serves as a basic introduction, by example, to writing simple formats in SAS with PROC FORMAT.
Replace If-Then-Else Statements
Here is a classic introductory example of using PROC FORMAT. Consider the following if-then-else block.
data test1;
   set sashelp.baseball;
   if salary = . then incomeclass="N/A";
   else if salary < 500 then incomeclass="Low";
   else if 500 <= salary < 1500 then incomeclass="Middle";
   else if salary >=1500 then incomeclass="High";
run;
Here, we create the class variable incomeclass. There are several things wrong with the code above. First of all, it is messy, and it gets messier as the number of categories grows. Furthermore, the length of incomeclass is set the first time the compiler encounters the variable, which is the assignment of “N/A”. The variable length is therefore 3, and longer values such as “Middle” are truncated.
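To see the truncation for yourself, a quick check along these lines should do. It is only a sketch and assumes the test1 data set from the step above has already been created. With a length of 3, I would expect “Middle” and “High” to show up as “Mid” and “Hig”.

```sas
/* Inspect the stored length of incomeclass (expected to be 3) */
proc contents data=test1(keep=incomeclass);
run;

/* List the distinct values actually stored in incomeclass */
proc freq data=test1;
   tables incomeclass / missing;
run;
```

A LENGTH statement placed before the first assignment would avoid the truncation, but the if-then-else logic would be just as messy.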
Instead, let us write our own numeric format. Here, I use the same logic as above in a PROC FORMAT step. This approach takes care of the shortcomings of the if-then-else logic. Furthermore, if you are working with large data sets and many categories, the format approach is typically faster, because formats are looked up with an internal binary search. Finally, I create a data set with the same categories as the one above by using the PUT function with the user-defined format income.
proc format;
   value income
      low -< 500  = "Low"
      500 -< 1500 = "Middle"
      1500 - high = "High"
      .           = "N/A";
run;

data test2;
   set sashelp.baseball;
   incomeclass=put(salary, income.);
run;
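The -< operator excludes the upper endpoint of a range, so it is worth checking how values on the boundaries are classified. The step below is a small sketch for doing that. The test values are my own picks and not part of the original example, and it assumes the income format above has been defined. The results are written to the log. I would expect 500 to map to Middle, 1500 to High, and the missing value to N/A.

```sas
/* Assumes the income format from the PROC FORMAT step above exists */
data _null_;
   /* Hypothetical test values chosen to sit on and around the range boundaries */
   do salary = ., 499.99, 500, 1499.99, 1500;
      incomeclass = put(salary, income.);
      put salary= incomeclass=;   /* write each value and its label to the log */
   end;
run;
```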
Binning Data Categories in PROC FREQ
Next, let us see how to use a user-defined format directly in a statistical procedure. A common question on the SAS Communities is how to calculate frequencies for specific ranges. Formats hold the answer. The sashelp.baseball data set contains salary information for various baseball players. Let us see how many are in the Low, Middle, and High categories defined by the PROC FORMAT step above. We simply use a FORMAT statement and specify the numeric format income. This way, we do not need a separate data step that reads the entire data set to create the categories. We can do it directly in the procedure.
You can see the result to the right.
proc freq data=sashelp.baseball;
   tables salary;
   format salary income.;
run;
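One thing to be aware of: by default, PROC FREQ leaves missing values out of the table body and reports them separately. As far as I know, that also holds for the missing salaries labelled “N/A” here. If you want them counted as a category of their own, the MISSING option on the TABLES statement is one way to do it. A minimal sketch, assuming the income format above is defined:

```sas
proc freq data=sashelp.baseball;
   tables salary / missing;   /* treat missing salaries as a level of their own */
   format salary income.;
run;
```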
A Simple Picture Format Example
Picture formats are used to display numbers in SAS exactly the way you want them. Here is a very small example of creating a picture format that displays numeric values with commas as thousands separators and a leading dollar sign. In the sashelp.baseball data set, salaries are stored in thousands of dollars. We can use the MULT= option to display the actual amount in reports without altering the underlying data.
proc format;
   picture dollarsal
      low-high='000,000,000' (mult=1e3 prefix='$');
run;

proc print data=sashelp.baseball(obs=10);
   format salary dollarsal.;
   var name team salary;
run;
I use the dollarsal format in the PROC PRINT step exactly like any other format, and the numbers are displayed as specified. I demonstrate more picture tricks and techniques in the blog post 5 SAS Picture Format Options You Should Know.
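If you want to try the picture format on a few values without printing the data set, a sketch like the following writes the formatted results to the log. The test values are hypothetical and chosen only for illustration, and the step assumes the dollarsal format above has been defined. A stored salary of 475 should come out as $475,000.

```sas
/* Assumes the dollarsal picture format from the step above exists */
data _null_;
   /* Hypothetical salaries, in thousands of dollars as in sashelp.baseball */
   do salary = 75, 475, 2412.5;
      shown = strip(put(salary, dollarsal.));   /* strip removes leading blanks */
      put salary= shown=;
   end;
run;
```

Note that the stored value of salary is unchanged. Only its displayed form is multiplied by 1,000.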
SAS provides a wealth of predefined formats, and you should familiarize yourself with them. Furthermore, you can roll your own with PROC FORMAT. This post serves as a basic introduction to the creation and use of user-defined formats. In future posts, I will blog about the flexibility and usefulness of formats in SAS. For example, see the post Creating Multilabel Formats in SAS with PROC FORMAT.
If you want an in-depth examination of PROC FORMAT, the book The Power of PROC FORMAT by Jonas V. Bilenas is an excellent reference.
You can download the entire code from this example here.