Thursday, December 10, 2020

Estimating birth date from age

This code demonstrates an algorithm for estimating a birth date from an age and the date on which that age was measured. We cannot know the exact birth date, but we can get close: the maximum error is about half a year, and the typical (mean) error is about one quarter of a year.
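The half-year and quarter-year figures follow from a simple model, sketched here under two assumptions: the true birth date B is uniformly distributed over a window of width W (about one year), and the estimate is the midpoint of that window. The symbols B, W, and D are introduced only for illustration and do not appear in the code.

```latex
B \sim \mathrm{Uniform}(0, W), \qquad D = \left| B - \tfrac{W}{2} \right|
\;\Longrightarrow\;
D \sim \mathrm{Uniform}\!\left(0, \tfrac{W}{2}\right), \qquad
\max D = \tfrac{W}{2}, \qquad \mathbb{E}[D] = \tfrac{W}{4}
```

With W of roughly 365 days, this gives a maximum error of about 183 days and a mean error of about 91 days.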

/* The %age macro was taken from the Internet (exact source uncertain). */
%macro age(date,birth);
floor ((intck('month',&birth,&date) - (day(&date) < day(&birth))) / 12)
%mend age;
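
The data step that generates the fake people calls a %randbetween macro that is not defined in this post. A minimal sketch of such a helper, assuming it is meant to return a uniformly distributed random integer between min and max inclusive (the definition here is an assumption, not the author's original):

```sas
/* Assumed helper, not part of the original post: */
/* returns a random integer between &min and &max, inclusive. */
%macro randbetween(min, max);
(&min + floor((1 + &max - &min) * rand("uniform")))
%mend randbetween;
```

Like %age, this macro expands to an expression, so it can be used on the right-hand side of an assignment in a data step.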

Generate 10,000 fake people with random birth dates and random perspective dates
(submit_date) on which their age was measured. Then calculate each person's age
as of that perspective date. In reality, births are somewhat seasonal (e.g., more
births in July), but here we assume births are equally likely on every day of the year.
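data person;
	format birth_date submit_date yymmdd10.;
	do i = 1 to 10000;
		/* SAS date values count days since 1960-01-01 */
		birth_date = %randbetween(19000,20500);
		submit_date = birth_date + %randbetween(0,100*365);
		age = %age(submit_date, birth_date);
		/* write one observation per person */
		output;
	end;
	drop i;
run;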

/* Work in reverse from age to estimated birth date. */
data reverse;
	set person;
	format birth_date_min birth_date_max yymmdd10.;
	birth_date_min = intnx('years', submit_date, -1 * (age+1), 's') - 1;
	birth_date_max = intnx('years',birth_date_min,1,'s') + 1;

	/* flag true birth dates that fall outside the estimated range (1 = error) */
	min_error = (birth_date < birth_date_min);
	max_error = (birth_date > birth_date_max);

	/* estimate birth date as the middle of the range */
	birth_date_avg = mean(birth_date_min, birth_date_max);
	/* calculate the absolute error in days */
	abs_days_error = abs(birth_date - birth_date_avg);
run;
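
To make the range calculation concrete, here is a small sketch that applies the same intnx logic to a single hypothetical case: an age of 30 measured on 2020-12-10. The input values are made up for illustration.

```sas
/* Hypothetical single case: age 30 measured on 2020-12-10. */
data _null_;
	submit_date = '10DEC2020'd;
	age = 30;
	/* same range logic as in the reverse data step */
	birth_date_min = intnx('years', submit_date, -1 * (age+1), 's') - 1;
	birth_date_max = intnx('years', birth_date_min, 1, 's') + 1;
	birth_date_avg = mean(birth_date_min, birth_date_max);
	/* write the three dates to the log */
	put birth_date_min= yymmdd10. birth_date_max= yymmdd10. birth_date_avg= yymmdd10.;
run;
```

Working through the date arithmetic by hand, the estimated range runs from 1989-12-09 to 1990-12-10, with a midpoint of 1990-06-10. A person who was 30 on 2020-12-10 was born between 1989-12-11 and 1990-12-10, so the range covers every possible birth date with a day or two of slack at the lower end.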

/* Both errors should always be zero. */
proc freq data=reverse;
	tables min_error max_error;
run;

/* Errors of the estimates range from 0 to 183.5 days, with a median of 92 and a mean of 91. */
proc means data=reverse n nmiss min median mean max;
	var abs_days_error;
run;

/* The distribution of errors is approximately uniform. */
proc sgplot data=reverse;
	histogram abs_days_error;
run;

Tested with SAS 9.4M6
