The Danish Social Security Number CPR in SAS

The Danish social security number is a 10-digit number that every Danish citizen receives at birth. In Danish, it is called a CPR (Central Person Register) number. In this post, I demonstrate how to handle the CPR number in SAS and how to extract information from it, such as gender and birth date. If you are not familiar with how the CPR number is constructed, I encourage you to read the CPR Nummer Wiki Page. I will not go into detail about its construction and allocation here. Rather, I will focus on how to tackle it in SAS.

Male or Female?

Extracting the gender of the CPR number holder is simple, because it is determined by the 10th digit alone. If that digit is even, the number belongs to a female. If it is odd, it belongs to a male. The Char Function picks out the digit, as shown below.

data _null_;
   cpr='2308792308';
   if      char(cpr, 10) in ('0', '2', '4', '6', '8') then gender='Female';
   else if char(cpr, 10) in ('1', '3', '5', '7', '9') then gender='Male';
 
   put gender=;
run;
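
The same rule can also be written as a parity test on the digit itself. The step below is a minimal sketch of that alternative, not a replacement for the code above. The variable name last_digit is my own, and the ?? modifier is used on the assumption that a non-digit character should give a blank gender rather than an error in the log.

```sas
data _null_;
   cpr='2308792308';
   length gender $6;

   /* Read the 10th character as a number; ?? suppresses notes for non-digits */
   last_digit=input(char(cpr, 10), ?? 1.);

   if      missing(last_digit)  then gender=' ';       /* not a digit */
   else if mod(last_digit, 2)=0 then gender='Female';  /* even digit  */
   else                              gender='Male';    /* odd digit   */

   put gender=;
run;
```

For the made-up number above, the 10th digit is 8, so the log shows gender=Female.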

Calculate Birthdate and Age From CPR Number

Calculating an individual's date of birth from the social security number is more complicated. Part of it is straightforward. The first two digits represent the day of the month. The next two represent the month number. Digits 5 and 6 represent the two-digit year of birth. However, the century of birth can only be determined by combining that two-digit year with the 7th digit. You can see how the century is determined on the Wiki Page linked at the top.

In the code below, I calculate the date of birth from a CPR number. I do so with a 2 Dimensional Array Lookup, where the 7th digit selects the row and digits 5 and 6 select the column that holds the century. I then use the MDY Function to calculate the actual date of birth. Try to input your own CPR number (if you have one).

data _null_;
    cpr='2902001308';
 
    day=substr(cpr, 1, 2);
    mon=substr(cpr, 3, 2);
    _56=substr(cpr, 5, 2);
    _7 =substr(cpr, 7, 1);
 
    array _ {0:9, 0:99} _temporary_  (4*(37*19   21*19   42*19)          
                                      1*(37*20   21*19   42*19)
                                      4*(37*20   21*20   42*18)
                                      1*(37*20   21*19   42*19));
 
    cen  = _[input(_7, 1.), input(_56, 2.)];
    year = input(cats(cen, _56), 4.);
 
    birthdate=mdy(input(mon, 2.), input(day, 2.), year);
 
    put "The individual's birthdate is " birthdate date9.;
 
run;
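
The initial values of the lookup array above encode three century rules, which can be easier to read as explicit conditions. The step below is a sketch of that same logic, derived from the array values in this post. The variable names yy and d7 are my own, and it uses a different made-up number so that the result is easy to check.

```sas
data _null_;
    cpr='2307891556';

    yy=input(substr(cpr, 5, 2), ?? 2.);   /* two-digit year, digits 5 and 6 */
    d7=input(substr(cpr, 7, 1), ?? 1.);   /* century indicator, digit 7     */

    if missing(yy) or missing(d7) then cen=.;                  /* not numeric         */
    else if d7 in (0, 1, 2, 3)    then cen=19;                 /* always 1900-1999    */
    else if d7 in (4, 9)          then cen=ifn(yy<=36, 20, 19);/* 2000-2036, else 19xx*/
    else                               cen=ifn(yy<=57, 20, 18);/* d7 5-8: 2000-2057,  */
                                                               /* else 1858-1899      */
    year=cen*100+yy;
    put cen= year=;
run;
```

Here the 7th digit is 1 and the two-digit year is 89, so the log shows cen=19 year=1989. The array lookup is more compact, while the conditions make the cut-off years 36 and 57 visible.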

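The code above does not guard against invalid CPR numbers, such as dates that do not exist. The example number above is in fact such a case: it resolves to 29 February 1900, and since 1900 was not a leap year, MDY returns a missing value and writes a note to the log. The custom function below checks the date first and returns a missing value for invalid CPR numbers. Feel free to copy the code and use it.

Code-wise, I have to make a few alterations compared to the code above. This is because arrays are not as flexible in PROC FCMP as in the data step: the array is indexed from 1, which is why 1 is added to each subscript, and the initial values are written out in full. You can read more about that in the PROC FCMP Array Documentation.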

proc fcmp outlib=work.functions.fun;
    function cprbirthdate(cpr $);
 
       day=substr(cpr, 1, 2);
       mon=substr(cpr, 3, 2);
       _56=substr(cpr, 5, 2);
       _7 =substr(cpr, 7, 1);
 
       array _ {10, 100} / nosymbols (19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
                                      19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
                                      19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
                                      19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
                                      20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
                                      20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
                                      20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
                                      20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
                                      20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
                                      20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19);
 
       cen  = _[input(_7, 1.)+1, input(_56, 2.)+1];
       year = input(cats(cen, _56), 4.);
 
       if prxmatch('/^(((0[1-9]|[12][0-9]|3[01])[-](0[13578]|1[02])|(0[1-9]|[12][0-9]|30)[-](0[469]|11)|(0[1-9]|1\d|2[0-8])[-]02)[-]\d{4}|29[-]02[-](\d{2}(0[48]|[2468][048]|[13579][26])|([02468][048]|[1359][26])00))$/', cats(day, '-', mon, '-', put(year, 4.))) = 0 then call missing(birthdate); 
       else birthdate=mdy(input(mon, 2.), input(day, 2.), year);
 
       return (birthdate);
    endsub;
run;quit;

Finally, we put the custom cprbirthdate function to the test. See the examples below. The first three made-up social security numbers are valid. The last two are not: one has day and month 00, and the other claims 29 February 2005, which was not a leap year. The function returns a missing birth date for both.

data cprnumbers;
   cpr='2307891556'; output;
   cpr='2902004165'; output;
   cpr='0904193498'; output;
   cpr='0000000000'; output;
   cpr='2902055165'; output;
run;
 
options cmplib=work.functions;
data test;
   set cprnumbers;
   bd=cprbirthdate(cpr);
   format bd date9.;
run;
 
proc print data=test;run;
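
With the birth date in hand, age follows directly. The step below is a minimal sketch that adds age in completed years to the test data set from above. The data set name ages and the variable name age are my own, and the age is measured as of the day the code runs, which is an assumption about what "age" should mean here.

```sas
data ages;
   set test;

   /* YRDIF with the 'AGE' basis returns the age in years as a decimal number. */
   /* FLOOR keeps the completed years. Age stays missing when bd is missing.   */
   if not missing(bd) then age=floor(yrdif(bd, today(), 'AGE'));
run;

proc print data=ages; run;
```

The two invalid CPR numbers keep a missing age, because the function returned a missing birth date for them.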

Summary

In this post, we looked at the Danish Social Security Number in SAS. We saw examples of how to extract information from it, such as gender and date of birth. Furthermore, we discussed the pitfalls involved, such as leap years and invalid dates.

Organizations treat CPR numbers as sensitive personal information. I have previously blogged about how to mask certain parts of a character variable, with the CPR number as an example, in the post Write Picture Formats For Character Variables in SAS.

You can download the entire code from this post here.