Monday, February 22, 2016

ISO 3166-1 alpha-2 (two-letter country code) format for SAS

Here is the widely-used ISO 3166-1 alpha-2 format for use in SAS. It is commonly called the two-letter country code format.

The PROC FORMAT code generates a character format, so where the raw data contains a code, such as US, it expands it to the pretty name, such as United States. As with any SAS format, applying the format does not change the underlying data.

Wednesday, February 3, 2016

Undocumented SAS feature: Bulkloading to Netezza with ODBC interface

The SAS/ACCESS Interface to ODBC in SAS 9.4M4 states it supports bulk loading only to "Microsoft SQL Server data on Windows platforms." However, in practice on the Windows platform it also supports bulk loading to Netezza.

Bulk loading is amazingly fast. In some of my benchmarks the duration of the whole bulk loading operation is independent of the number of rows inserted!

By default on Netezza the bulk loading interface delimits values using a pipe character, and for cases where the values contain a pipe, SAS Access Interface to ODBC unofficially supports the BL_DELIMITER option to specify an alternate delimiter. For the ODBC interface, this option is undocumented.

However, there are nuances with the BL_DELIMITER option. According to the SAS Access Interface to Netezza:

Friday, January 29, 2016

Frequency of individual characters from SAS data set

This script counts the frequencies of individual ASCII characters in a single column in a SAS data set and then prints an easy-to-read report.

My initial motivation relates to delimiters. By default bulkloading data from Netezza to SAS (which is very fast) uses the pipe character as a delimiter, but my data set contained values with the pipe character, so this macro identifies alternative delimiters.

Another potential use is cracking a message encrypted using a simple letter substitution cipher.

To begin, this code creates an example data set courtesy of William Shakespeare.

Wednesday, September 2, 2015

SAS macros always have a global scope

SAS allows the programmer to declare the scope of macro variables using %LOCAL or %GLOBAL, but the macros themselves are always created in the global scope.

Say you have a macro that in another language, say Python, would be considered a function. Within the macro you want a sub-macro (i.e., sub-function) to be used only within the outer macro.

%macro outer;
%put NOTE: outer;

/* This "sub-macro" is defined within the outer macro and is
   intended only for use within the outer macro. */
 %macro inner(foo);
 %put NOTE: inner &foo;
 %mend;

%inner(1);
%inner(2);
%mend;

%outer;

/* If the "sub-macro" has a local scope, the next step would fail */
%inner(3);

/* However, it succeeds */

This can lead to conflicts if the macro %inner is defined somewhere else in the same session. One way of dealing with this is to be careful to give the inner macro a unique name like __outer_inner where the underscores in the prefix suggest a local scope, and adding outer to the macro name indicates the macro is to be used only in the outer macro.

Monday, August 31, 2015

Gotcha with SAS, regular expressions, and end-of-line matching

Regular expressions are essential for sophisticated text processing, and it is generally easy to transfer knowledge of Perl regular expressions to the SAS functions prxparse, prxmatch, prxposn, etc. However, use caution with the end of line character ($) because of how SAS treats whitespace.

For demonstration I will run what looks like an equivalent use of regular expressions in Python, JavaScript, and SAS, but notice that only SAS does not match the string.

# Python 2.7
import re
first_name ='Andrew  '
first_name = first_name.strip()
if re.search(r"^Andrew$", first_name):
    print 'match' # it does match
else:
    print 'no match'
/*JavaScript */
first_name ='Andrew  ';
first_name = first_name.trim();
if (first_name.match(/^Andrew$/)) 
    alert('match'); /* it does match */
    else alert('no match');

(JavaScript fiddle for this code.)

data x;
 first_name='Andrew  ';
 first_name=strip(first_name);
 match=prxmatch('/^Andrew$/', first_name); /* it does not match (match=0) */
run;

In SAS ignore the trailing whitespace using the trim() function:

data x;
 first_name='Andrew  ';
 match=prxmatch('/^Andrew$/', trim(first_name)); /* it does match */
run;

SAS, however, does not distinguish a string that was inserted with trailing spaces from a string that was inserted without trailing spaces. In the following SAS-only example imagine the table was created and populated using a non-SAS system like MySQL or Microsoft SQL Server.

proc sql;
 create table names (
  first_name varchar(8)
 );

 insert into names values ('Andrew'); /* no trailing spaces */
 insert into names values ('Andrew '); /* one trailing space */
 insert into names values ('Andrew  ');
quit;

data names;
 set names;
 length=length(first_name);
 match1=prxmatch('/^Andrew$/', first_name);
 match2=prxmatch('/^Andrew$/', trim(first_name));
run;
Screenshot of SAS data set

This was tested with SAS 9.4M3 on Microsoft Windows 7.

Tuesday, June 9, 2015

List of user-installed R packages and their versions

This R command lists all the packages installed by the user (ignoring packages that come with R such as base and foreign) and the package versions.

ip <- as.data.frame(installed.packages()[,c(1,3:4)])
rownames(ip) <- NULL
ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
print(ip, row.names=FALSE)

Example output

       Package   Version
        bitops     1.0-6
 BradleyTerry2     1.0-6
          brew     1.0-6
         brglm     0.5-9
           car    2.0-25
         caret    6.0-47
          coin    1.0-24
    colorspace     1.2-6
        crayon     1.2.1
      devtools     1.8.0
     dichromat     2.0-0
        digest     0.6.8
         earth     4.4.0
      evaluate       0.7
[..snip..]

Tested with R 3.2.0.

This is a small step towards managing package versions: for a better solution, see the checkpoint package. You could also use the first column to reinstall user-installed R packages after an R upgrade.

Tuesday, March 3, 2015

SAS 9.4 crash with MySQL ODBC pass-through queries

SAS 9.4 (TS1M2) on X64_DS08R2 (Windows Server 2008 64-bit) always crashes with certain pass-through queries using MySQL Connector/ODBC 5.3.4. When it crashes, the SAS log shows some red messages, but SAS closes immediately.

The crash is not reproducible with other ODBC drivers, on SAS 9.3 64-bit, or SAS 9.4 32-bit.

Workarounds include: using an ODBC DSN instead of the connection string, not using pass-through queries, or using SAS 9.3.

SAS agreed to fix the bug.

Benchmarking the TP-Link TL-WPA7510 Powerline kit

Real-World Powerline Networking Benchmarks A practical look at HomePlug AV2 performance across outlets, with iperf3 & ping TL;DR: Po...