Friday, January 29, 2016

Frequency of individual characters from SAS data set

This script counts the frequencies of individual ASCII characters in a single column in a SAS data set and then prints an easy-to-read report.

My initial motivation relates to delimiters. By default bulkloading data from Netezza to SAS (which is very fast) uses the pipe character as a delimiter, but my data set contained values with the pipe character, so this macro identifies alternative delimiters.

Another potential use is cracking a message encrypted using a simple letter substitution cipher.

To begin, this code creates an example data set courtesy of William Shakespeare.

Estimating token counts from character length

When programatically using an AI chatbot API, it is easy to run up big bills. To avoid this, carefully moniter token usage, but resist the u...