Thursday, December 10, 2020

Estimating birth date from age

This code demonstrates an algorithm for estimating birth date from age. We cannot know the exact birth date, but we can get close: the maximum error is half a year, and the typical error is one quarter of a year.

/* The %age macro was taken from the Internet---maybe from here ? */
%macro age(date,birth);
floor ((intck('month',&birth,&date) - (day(&date) < day(&birth))) / 12)
%mend age;

Generate 10000 fake people with random birth dates and random perspective days
on which their age was measured. Then, calculate age from that perspective date.
In reality, there is some seasonality to births (e.g., more births in July), but 
here we assume each day of the year has an equal distribution of births.
data person;
	format birth_date submit_date yymmdd10.;
	do i = 1 to 10000;
		birth_date = %randbetween(19000,20500);
		submit_date = birth_date + %randbetween(0,100*365);
		age = %age(submit_date, birth_date);
	drop i;

/* Work in reverse from age to estimated birth date. */
data reverse;
	set person;
	format birth_date_min birth_date_max yymmdd10.;
	birth_date_min = intnx('years', submit_date, -1 * (age+1), 's') - 1;
	birth_date_max = intnx('years',birth_date_min,1,'s') + 1;

    /* check range of estimates for errors */
	min_error = (birth_date > birth_date_min);
	max_error = (birth_date < birth_date_max);

    /* estimate birth date as the middle of the range */
	birth_date_avg = mean(birth_date_min, birth_date_max);
    /* calculate variance */
	abs_days_error = abs(birth_date - birth_date_avg);

/* Both errors should always be zero. */
proc freq data=reverse;
	table min_error max_error;

/* Error of estimates range from 0 to 183.5 with a median of 92 and average of 91.*/
proc means data=reverse n nmiss min median mean max;
	var abs_days_error;

/* Distribution of errors is uniform */
proc sgplot data=reverse;
	histogram abs_days_error;

Tested with SAS 9.4M6

Monday, July 22, 2019

How to connect from Linux to BleemSync 1.1

After installing BleemSync 1.1 on my PlayStation Classic, I could connect to the BleemSync UI from Windows but not from Linux (Ubuntu 19.04). Google Chrome reported "unable to connect," and ping to reported a network error.

dmesg showed that Linux identified the device, and RNDIS networking started

[23945.137399] usb 1-3: new high-speed USB device number 31 using xhci_hcd
[23945.286069] usb 1-3: New USB device found, idVendor=04e8, idProduct=6863, bcdDevice=ff.ff
[23945.286075] usb 1-3: New USB device strings: Mfr=3, Product=4, SerialNumber=5
[23945.286079] usb 1-3: Product: classic
[23945.286082] usb 1-3: Manufacturer: BleemSync
[23945.290633] rndis_host 1-3:1.0 usb0: register 'rndis_host' at usb-0000:00:14.0-3, RNDIS device, 8a:04:6f:1c:f9:72
[23945.291271] cdc_acm 1-3:1.2: ttyACM0: USB ACM device
[23945.339113] rndis_host 1-3:1.0 enp0s20f0u3: renamed from usb0

However, ifconfig showed it did not have an IPv4 address. This indicates a DHCP failure.

$ ifconfig enp0s20f0u3
enp0s20f0u3: flags=4163  mtu 1500
        inet6 fe80::adf4:a447:4b1d:f96c  prefixlen 64  scopeid 0x20
        ether 72:b7:b8:ae:c8:00  txqueuelen 1000  (Ethernet)
        RX packets 8  bytes 536 (536.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 81  bytes 12347 (12.3 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I bypassed DHCP by manual IP configuration.

sudo route add -net netmask metric 1024 dev enp0s20f0u3
sudo ifconfig enp0s20f0u3

Now Google Chrome, ping, and even telnet worked.

Note: your interface name may vary. Mine was enp0s20f0u3.

This solution worked until rebooting Ubuntu. Later I found a permanent solution: BleemSync 1.0 on Ubuntu thanks to DDFoster96.

Monday, January 28, 2019

SAS Message Log with ODBC: COMMIT performed on connection #

When using SAS to develop high-performance queries against remote SQL databases, it is helpful to see the exact ODBC messages that SAS passes to the driver. Sometimes the implicit SQL poorly translates a query, which can be optimized. To see these message, enable the SAS trace like this:

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

However, when closing the SAS process, there can be a pop-up dialog window with the title "SAS Message Log" with entries like this:

ODBC: COMMIT performed on connection #6.
ODBC: COMMIT performed on connection #5.
ODBC: COMMIT performed on connection #4.
ODBC: COMMIT performed on connection #3.
ODBC: COMMIT performed on connection #2.
ODBC: COMMIT performed on connection #1.
ODBC: COMMIT performed on connection #0.

When running SAS interactively, this is a minor nuisance. When running SAS in an automated batch, this can be a serious problem because the dialog will wait indefinitely for human interaction, so the sas.exe process will never terminate.

Sunday, January 28, 2018

Type I error rates in two-sample t-test by simulation

What do you do when analyzing data is fun, but you don't have any new data? You make it up.

This simulation tests the type I error rates of two-sample t-test in R and SAS. It demonstrates efficient methods for simulation, and it reminders the reader not to take the result of any single hypothesis test as gospel truth. That is, there is always a risk of a false positive (or false negative), so determining truth requires more than one research study.

A type I error is a false positive. That is, it happens when a hypothesis test rejects the null hypothesis when in fact it is not true. In this simulation the null hypothesis is true by design, though in the real world we cannot be sure the null hypothesis is true. This is why we write that we "fail to reject the null hypothesis" rather than "we accept it." If there were no errors in the hypothesis tests in this simulation, we would never reject the null hypothesis, but by design it is normal to reject it according to alpha, the significance level. The de facto standard for alpha is 0.05.


First, we run a simulation in R by repeatedly comparing randomly-generated sets of normally-distributed values using the two-sample t-test. Notice the simulation is vectorized: there are no "for" loops that clutter the code and slow the simulation.

Wednesday, January 10, 2018

Condition execution on row count

Use this code as a template for scenarios when you want to change how a SAS program runs depending on whether a data set is empty or not empty. For example, when a report is empty, you may want to not send an email with what would be a blank report. In other words, the report sends only when it has information.

On the other hand, you may want to send an email when a data set is empty if that means an automated SAS program had an error that requires manual intervention.

In general, it's good practice in automated SAS programs to check the size of a data sets in case they are empty or otherwise have the wrong number of observations. With one easy tweak, you could check for a specific minimum number of observations that is greater than zero. (This is left as an exercise for the reader.)

Tuesday, August 30, 2016

SAS ERROR: Cannot load SSL support. on Microsoft Windows

When using SAS with HTTPS or FTPS, which requires SSL/TLS support, you may see this error message in the SAS log.

ERROR: Cannot load SSL support.

Here is an example of code that can trigger the error.

filename myref url "";
data _null_; 
infile myref; 

The cause was that

Thursday, June 23, 2016

SAS error "insufficient memory" on remote queries with wide rows

SAS can give the error The SAS System stopped processing this step because of insufficient memory when querying a single, wide row from a remote SQL Server. The following code fully demonstrates the problem and shows a workaround. Also, I eliminate the explanation that SAS data sets in general do not support rows this wide.