tag:blogger.com,1999:blog-76660668160411906612024-03-16T12:49:34.082-06:00Heuristic AndrewR, SAS, machine learning, and statisticsAndrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.comBlogger31125tag:blogger.com,1999:blog-7666066816041190661.post-68826440824128485582024-02-14T17:08:00.002-07:002024-02-14T17:09:47.762-07:00<p>I bought a dashcam from <a href="https://temu.to/m/u5u13gok3nx">Temu</a> for $6.31 January 2024, and here is sample footage that includes three scenes: daytime, dusk, and daytime.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/mWml49xdrSA?si=otf1FwYKVf6GWCtQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<p>Product benefits</p>
<ul>
<li>Easy mounting with suction cup</li>
<li>Easy power with 12V cigarette plug adapter</li>
<li>Battery lasts a few moments after car turns off</li>
<li>MicroSD card included</li>
<li>Cheap price</li>
</ul>
<p>Problems</p>
<ul>
<li>Cheap quality
<li>Terrible video quality (despite product description)
<li> Narrow field of view (despite product description)
</ul>
<p>Notes</p>
<ul>
<li>It was a weird choice for it record in .avi instead of .mp4 container.</li>
<li>The product was <A href="https://www.temu.com/goods.html?_bg_fs=1&goods_id=601099516698795">discontinued</a> on <a href="https://temu.to/m/u5u13gok3nx">Temu</a>.</li>
</ul>
<p>Model (box): Y320
<br>Manufacturer (box): shenzhen Hengxin Weiye Digital Co., LTD
<br>Product title (Temu):
<br>Dash Camera For Cars With 32G Memory Cards Wide Angle Full 1080P Driving Record...
</p>
<p>Below is the product menu (PDF on Temu):
<br>Intelligent voice reminder, built-in multinational voice pronunciation, no need to worry about language barriers
1080P highdefinition night vision, even in the weak light environment, can also shoot clearly
Loop recording, no missing seconds, segmented storage, automatic monitoring of sto
rage space, when the memory is full, automatically delete the earliest recorded video
and save the new video
Builtin gravity sensor, when a sudden brake or collision is sensed, the current video is instantly locked to prevent overwriting important files during loop recording
Supported languages: English, French, German, Russian, Japanese, etc.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-47044513647429553442023-07-12T16:17:00.005-06:002023-07-12T17:38:24.232-06:00Timestamp precision in Snowflake<p>Timestamps in Snowflake have precisions 0 to 9 with a default of 9, which is a nanosecond, but the Snowflake documentation is not clear on precisions 0 to 8.</p>
<h2>Storage difference</h2>
<p>I did an empircal test by creating tables. Each table had one million rows and one column with random timestamps. The values have an original precison of one nanosecond, and I used random values because otherwise Snowflake would compress down any number of rows with the same values to a few KB.
<pre class="prettyprint lang-sql">
create or replace table zzz_timestamp9 as
select dateadd(nanosecond, uniform(1,3e17, random()), current_timestamp())::timestamp(9) as time1
from TABLE(GENERATOR(ROWCOUNT => 1e6))
;
create or replace table zzz_timestamp0 as
select dateadd(nanosecond, uniform(1,3e17, random()), current_timestamp())::timestamp(0) as time1
from TABLE(GENERATOR(ROWCOUNT => 1e6))
;
</pre>
<p>The storage difference for 1 million rows was 3.5MB vs 7.0MB.</p>
<h2>Precision difference</h2>
<p>Again, I generated random rows and then copied the value into columns with varied precisions. </p>
<pre class="prettyprint lang-sql">
select
dateadd(nanosecond, uniform(1,3e17, random()), current_timestamp())::timestamp(9) as "precision 9",
"precision 9"::timestamp(3) as "precision 3",
"precision 9"::timestamp(2) as "precision 2",
"precision 9"::timestamp(1) as "precision 1",
"precision 9"::timestamp(0) as "precision 0",
datediff(ms, "precision 3", "precision 9") as "Precisions 9 vs 3 in milliseconds", /* alwayz zero */
datediff(second, "precision 0", "precision 9") as "Precisions 9 vs 0 in seconds" /* alwayz zero */
from TABLE(GENERATOR(ROWCOUNT => 10))
</pre>
<p>Precision 0 is 1 second, precision 1 is 100 ms, precision 2 is 10 ms, precision 3 is 1 ms, etc.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-78162677844295766252023-05-20T23:14:00.002-06:002023-05-20T23:14:44.871-06:00openwrt ssh connection refused<h2>Symptom</h2>
Normal connection attempt
<pre>
$ ssh root@192.168.1.1
Connection to 192.168.1.1 closed.
</pre>
End of log with verbose ssh
<pre>
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering public key: /home/z/.ssh/id_ed25519 ED25519 SHA256:XX//XX/XX agent
debug1: Server accepts key: /home/z/.ssh/id_ed25519 ED25519 SHA256:XX//XX/XX agent
Authenticated to 192.168.1.1 ([192.168.1.1]:22) using "publickey".
debug1: channel 0: new [client-session]
debug1: Entering interactive session.
debug1: pledge: filesystem
debug1: Sending environment.
debug1: channel 0: setting env LANG = "en_US.UTF-8"
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 1
Connection to 192.168.1.1 closed.
Transferred: sent 2488, received 1012 bytes, in 0.0 seconds
Bytes per second: sent 346691.3, received 141017.5
debug1: Exit status 1
</pre>
System log snippet in OpenWRT's luci interface
<pre>
Sat May 20 23:02:33 2023 authpriv.notice dropbear[7690]: Pubkey auth succeeded for 'root' with ssh-ed25519 key SHA256:XX//XX/XX from 192.168.1.X:X
Sat May 20 23:02:33 2023 authpriv.info dropbear[7691]: Exit (root) from <192.168.1.X:X>: Child failed
Sat May 20 23:02:33 2023 authpriv.info dropbear[7690]: Exit (root) from <192.168.1.X:X>: Disconnect received
</pre>
<h2>Background</h2>
<p>A few weeks ago, I installed OpenWRT 22.03.3 on my Belkin RT3200. SSH and everything else was working fine until I updated it to OpenWRT 22.03.5, and then immedietly SSH refused to connect.<p>
<h2>Solution</h2>
<p>I read a <a href="https://github.com/openwrt/luci/issues/2414#issuecomment-451338690">GitHub conversation</a> about a similar problem. In their case, zsh was missing, and I remembered that earlier I installed Bash. I reinstalled it via LuCI, and then SSH worked again.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-76923050660251243482022-12-04T15:06:00.024-07:002023-02-01T11:13:01.646-07:00Enable 5G standalone (NR SA) on Samsung Galaxy A13<p>I bought two Samsung Galaxy A13 5G phones from Google Fi for $100 each on promo, but by default, they connected in NR NSA (non-standalone) mode, which requires an LTE anchor. Performance was okay in NR NSA except in few geographic areas where LTE bands 2, 4, and 66 had high packet loss of 50-100% because of low signal or interference.</p>
<p>With a little effort, Samsung Galaxy phones can be put into NR SA ("pure 5G") mode. The process is similar to the Galaxy S22 (see guide by <a href="#guidecredit">peacey8</a> at end), but there are a few differences. In particular, the A13 uses a Mediatek MT6769V/CU Helio G80 chipset instead of the Qualcomm SM8450 Snapdragon 8 Gen 1 on the Galaxy S22, so the serivce menus differ.</p>
<h2>Pros and cons</h2>
<p>Pros</p>
<ul>
<li>Lower latency and packet loss (sometimes/generally).</li>
<li>Latest generation of cell network technology.</li>
</ul>
<p>Cons</p>
<ul>
<li>Requires compatiable carrier. (In the USA, it's only T-Mobile and their MVNOs such as Google Fi and Mint Mobile.)</li>
<li>Requires somewhat modern phone.</li>
<li>Requires compatible cell tower. (In my area, 5G coverage is excellent.)
<li>T-Mobile does not support voice yet on NR SA mode. If band locking is set to NR SA, phone calls will not work. If band locking allows LTE and NR NSA, it will automatically fall back to LTE/NR NSA.</li>
<li>Allowing NR SA mode requires some initial technical set up.</li>
<li>Without band locking, the phone may often favor NR NSA over NR SA, even when NR SA has better network performance.</li>
<li>Band locking settings (optional) gets lost after phone reboot.</li>
<li>Set up varies by chipset.</li>
</ul>
<h2>Guide</h2>
<p>This guide assumes your phone has the XAA (unlocked) CSC profile. If not, <a href="guidecredit">see peacey8's guide</a> at the end. If you have TMB profile, then NR SA mode should already work. Once you have XAA, proceed here.</p>
<ol>
<li>On the phone, enable developer mode: go to settings (gear) - About Phone - Software Information - tap seven times on <i>build</i>.</li>
<li>In phone settings (Gear) - Developer Options - enable USB debugging.</li>
<li>Install Samsung USB driver for Windows. (Sorry, you need a laptop or desktop.)</li>
<li>Launch the <a href="https://samfw.com/blog/samfw-frp-tool-1-0-remove-samsung-frp-one-click">SamFW FRP tool</a> on Windows.</li>
<li>Connect phone by USB cable to Windows machine.</li>
<li>In SamFW, enable secret code for Verizon. (You do not need to use Verizon as a carrier.)</li>
<li>SamFW should stop at "Waiting for DIAG."</li>
<li>In Samsung Phone app (with the blue icon), call <tt>*#0808#</tt>. (It does not work for me in the Google Phone app with the green icon.) After hitting the last pound (and before hitting the call button), the USB settings menu should appear.</li>
<li>If you have two options, then you did not enable USB debugging. Start over.</li>
<li>Tap <i>DM+ACM+ADB</i> and tap <i>OK</i>. (The Galaxy S22 has way more USB debugging options, by the way.)</li>
<li>SamFW should stop at "Disabling DIAG." </li>
<li>Then call <tt>*#0808#</tt>, and switch back to <i>MTP</i>.</li>
<li>Phone can be disconnected from USB, if you wish.</li>
<li>In the Samsung phone, app use the dialer code *#27663368378#.</li>
<li>UE SETTING AND INFO -SETTING - PROTOCOL- NR -ALLOW LIST- ALLOW LIST OFF
<li>Three dots (top right), back</li>
<li>NR5G SA / NSA mode control - SA / NSA enable</li>
<li>Reboot phone.</li>
</ol>
<p>Once it is working, you can disable the USB debugging and developer options menu on the phone, and you can uninstall SamFW.</p>
<h2 id="troubleshoot">Troubleshooting</h2>
<p>Unlike the Galaxy S22, the <tt>*#2263#</tt> menu will not look any different on the A13. Even when NR SA is active, this service menu looks the same. It doesn't distinguish NR NSA and NR SA, so to confirm you are on SA,
some options are:</p>
<ul>
<li>Dial *#0011#. Pick relevant SIM. Check fourth line "Serving PLMN." If you see Nr5G, it's SA. If LTE, then it's not.</li>
<li>Use app such as CellMapper, Network Cell Info Lite, or NetMonster. They will all show the network type on the main page.</li>
</ul>
<p>If you are still on LTE or NR NSA, some options:<p>
<ul>
<li>Check your carrier supports NR SA. Right now, T-Mobile does, while Verizon and AT&T do not.</li>
<li>Check you have a 5G SIM and a 5G plan.</li>
<li>Turn off wifi. (It may use LTE only for power savings.)</li>
<li>Move closer to the tower. </li>
<li>Move to another tower.</li>
<li>Reset the connection like this: phone settings (gear) - Connections - Mobile networks - LTE/3G/2G. Wait a moment and set back to 5G/LTE/3G/2G. On our two A13, S22, and S22+ phones, we often need to reset the connection like this, and airplane mode does not seem to help. Otherwise, it may stick to LTE when NR NSA or NR SA are available.</li>
<li>Disable LTE like this: in the Samsung Phone app (green app icon), call <tt>*#2263#</tt>. Tap the relevant SIM. Tap CLEAR ALL BANDS. Tap "BLOCK SET BY AP," so the asterisk goes away. Tap NR menu to enter it. Select NR ALL. Go back to main. Apply selection. This may reset after rebooting the phone.</li>
<li>Alternate way to force NR: in Google Phone app (blue app icon), call <tt>*#*#4636#*#*</tt>- Phone Information - Set preferred network Type - NR only.</li>
</ul>
<p>I tested this two Galaxy A13 phones running Android 12 with T-Mobile via Google FI. In case you want to join Google Fi, here's a referral code to get a $20 credit when you join: <a href="https://g.co/fi/r/2RD2V5">2RD2V5</a>.</p>
<p>On the Galaxy S22, voice calls do not work with NR SA: incoming calls go straight to voicemail, and outgoing calls stop a moment after dialing because VoNR apprently is not enabled. It is probably the same on VoNR.</p>
<p id="guidecredit">Thank you much to the <a href="https://forum.xda-developers.com/t/how-to-enable-all-bands-through-service-menu-on-us-ca-s22-series-including-sub-6-and-mmwave.4488435/">NR SA for Galaxy S22 guide from peacey8</a>
and <a href="https://forum.xda-developers.com/t/please-help-me-to-get-5g-connection-on-galaxy-5g-mobile-wi-fi-scr01.4469827/">molexs's comment about the SCR01 hotspot</a>.</p>
<div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com2tag:blogger.com,1999:blog-7666066816041190661.post-91226769287869879342022-02-03T10:03:00.002-07:002022-02-03T10:03:52.049-07:00Generate random names and addresses from SAS<p>For testing data processing systems (e.g., CRM, record linkage), you may need to generate fake people. SAS makes it
uniquely easy to generate an unlimited count of fake US residents because it comes with a data set of
US zip codes, which include the city and state name.</p>
<p>The system uses four data sets: first names, last names, street names, and US zip codes. Initials are randomly generated from letters. The street addresses probably do not exist in the given zip codes. </p>
<p>You could extend this by:</p>
<ul>
<li>Add street directions (i.e., N, S, E, W)</li>
<li>Add street post type (e.g., Dr., Ct.)</li>
<li>Add units (e.g., Apt B, Ste 101)</li>
<li>Add post office boxes and private mail boxes</li>
<li>Spell out the middle name</li>
<li>Add name prefix (e.g., Dr., Mr.)</li>
<li>Add name suffix (e.g., Jr., Sr.)</li>
</ul>
<a name='more'></a>
<pre class="prettyprint">
%let mv_person_count = 10000; /* how many people to make */
%let mv_max_street_num = 20000; /* largest street number */
/* https://www.ssa.gov/OACT/babynames/decades/century.html */
data first;
format first_name $20.;
input first_name $;
first_name_id = _n_;
datalines;
James
Robert
John
Michael
William
David
Richard
Joseph
Thomas
Charles
Christopher
Daniel
Matthew
Anthony
Mark
Donald
Steven
Paul
Andrew
Joshua
Kenneth
Kevin
Brian
George
Edward
Ronald
Timothy
Jason
Jeffrey
Ryan
Jacob
Gary
Nicholas
Eric
Jonathan
Stephen
Larry
Justin
Scott
Brandon
Benjamin
Samuel
Gregory
Frank
Alexander
Raymond
Patrick
Jack
Dennis
Jerry
Tyler
Aaron
Jose
Adam
Henry
Nathan
Douglas
Zachary
Peter
Kyle
Walter
Ethan
Jeremy
Harold
Keith
Christian
Roger
Noah
Gerald
Carl
Terry
Sean
Austin
Arthur
Lawrence
Jesse
Dylan
Bryan
Joe
Jordan
Billy
Bruce
Albert
Willie
Gabriel
Logan
Alan
Juan
Wayne
Roy
Ralph
Randy
Eugene
Vincent
Russell
Elijah
Louis
Bobby
Philip
Johnny
Mary
Patricia
Jennifer
Linda
Elizabeth
Barbara
Susan
Jessica
Sarah
Karen
Nancy
Lisa
Betty
Margaret
Sandra
Ashley
Kimberly
Emily
Donna
Michelle
Dorothy
Carol
Amanda
Melissa
Deborah
Stephanie
Rebecca
Sharon
Laura
Cynthia
Kathleen
Amy
Shirley
Angela
Helen
Anna
Brenda
Pamela
Nicole
Emma
Samantha
Katherine
Christine
Debra
Rachel
Catherine
Carolyn
Janet
Ruth
Maria
Heather
Diane
Virginia
Julie
Joyce
Victoria
Olivia
Kelly
Christina
Lauren
Joan
Evelyn
Judith
Megan
Cheryl
Andrea
Hannah
Martha
Jacqueline
Frances
Gloria
Ann
Teresa
Kathryn
Sara
Janice
Jean
Alice
Madison
Doris
Abigail
Julia
Judy
Grace
Denise
Amber
Marilyn
Beverly
Danielle
Theresa
Sophia
Marie
Diana
Brittany
Natalie
Isabella
Charlotte
Rose
Alexis
Kayla
Homer
Marge
Bart
Lisa
Maggie
;
/* https://www.thoughtco.com/most-common-us-surnames-1422656 */
data last;
format last_name $20.;
input last_name $;
last_name_id = _n_;
datalines;
Smith
Johnson
Williams
Brown
Jones
Garcia
Miller
Davis
Rodriguez
Martinez
Hernandez
Lopez
Gonzales
Wilson
Anderson
Thomas
Taylor
Moore
Jackson
Martin
Lee
Perez
Thompson
White
Harris
Sanchez
Clark
Ramirez
Lewis
Robinson
Walker
Young
Allen
King
Wright
Scott
Torres
Nguyen
Hill
Flores
Green
Adams
Nelson
Baker
Hall
Rivera
Campbell
Mitchell
Carter
Roberts
Gomez
Phillips
Evans
Turner
Diaz
Parker
Cruz
Edwards
Collins
Reyes
Stewart
Morris
Morales
Murphy
Cook
Rogers
Gutierrez
Ortiz
Morgan
Cooper
Peterson
Bailey
Reed
Kelly
Howard
Ramos
Kim
Cox
Ward
Richardson
Watson
Brooks
Chavez
Wood
James
Bennet
Gray
Mendoza
Ruiz
Hughes
Price
Alvarez
Castillo
Sanders
Patel
Myers
Long
Ross
Foster
Jimenez
Simpson
;
/* https://www.nlc.org/resource/most-common-u-s-street-names/ */
data street;
format street_name $20.;
input street_name $;
street_name_id = _n_;
datalines;
Second
Third
First
Fourth
Park
Fifth
Main
Sixth
Oak
Seventh
Pine
Maple
Cedar
Eighth
Elm
View
Washington
Ninth
Lake
Hill
Evergreen
;
data person0;
do i = 1 to &mv_person_count;
first_name_id = %RandBetween(1, 205);
last_name_id = %RandBetween(1, 101);
street_name_id = %RandBetween(1, 21);
zip_code_id = %RandBetween(1, 40000);
output;
end;
drop i;
run;
data zip;
set sashelp.zipcode(keep=zip city statecode);
zip_code_id = _n_;
run;
proc sql;
create table person1 as
select
f.first_name,
l.last_name,
s.street_name,
z.city '',
z.statecode as state '',
z.zip as zip_numeric ''
from person0 as p
join first as f on
f.first_name_id = p.first_name_id
join last as l on
l.last_name_id = p.last_name_id
join street as s on
s.street_name_id = p.street_name_id
join zip as z on
z.zip_code_id = p.zip_code_id;
quit;
data person2;
format name street city state zip $50.;
set person1;
initial = byte(int(65+26*ranuni(0)));
name = catx(' ', first_name, initial, last_name);
/* RandBetween from https://blogs.sas.com/content/iml/2015/10/05/random-integers-sas.html */
street_num = put(%RandBetween(1,&mv_max_street_num),10.);
street = catx(' ',street_num,street_name);
zip = put(zip_numeric, z5.);
drop zip_numeric street_name street_num first_name initial last_name;
run;
</pre>
<p>Example output table with ten randomly generated fake people:</p>
<style>.mmt-table,.mmt-table-custom{width:100%;table-layout:fixed;}.mmt-table tr td, .mmt-table thead tr th{height:24px!important;padding:4px;border:1px solid #aaa;overflow: auto;} .mmt-table thead tr th {background: #e8a647; color: #fff; text-align: inherit;}.mmt-table-custom-parent{margin:0;padding:4px}.mmt-table tr:nth-child(odd) {background: #ededed} .mmt-table tr:nth-child(even) {background: #cdcdcd} .mmt-table td:nth-child(odd) {color: #545454} .mmt-table td:nth-child(even) {color: #000000}.mmt-table tr td, .mmt-table thead tr th {border : 1px solid #aaaaaa}.mmt-table thead tr th {background: #e8a647; color: #fff; text-align: inherit;}</style><div class="mmt-custom-content col-sm-12 mmt-table-custom-parent col-lg-12">
<table id="TABLE-1" class="mmt-table mmt-table-custom selected">
<thead><tr><th class="mmt-selected-col">name</th><th>street</th><th>city</th><th>state</th><th>zip</th></tr></thead>
<tbody>
<tr class="">
<td class="mmt-selected-col">Steven I Murphy</td><td>8206 Fifth</td><td>Frankfort</td><td>KY</td><td>40619</td></tr><tr><td class="mmt-selected-col">Mary M Williams</td><td>5076 Seventh</td><td>Evensville</td><td>TN</td><td>37332</td></tr><tr><td class="mmt-selected-col">Jeffrey Y Lopez</td><td>3485 Third</td><td>Henning</td><td>IL</td><td>61848</td></tr><tr><td class="mmt-selected-col">Richard Z Sanders</td><td>5500 Sixth</td><td>Kimball</td><td>NE</td><td>69145</td></tr><tr><td class="mmt-selected-col">Russell M Smith</td><td>16425 Sixth</td><td>Lexington</td><td>KY</td><td>40515</td></tr><tr><td class="mmt-selected-col">Johnny R Carter</td><td>11949 Eighth</td><td>Mount Hope</td><td>OH</td><td>44660</td></tr><tr><td class="mmt-selected-col">Raymond V Green</td><td>4659 Park</td><td>West Helena</td><td>AR</td><td>72390</td></tr><tr><td class="mmt-selected-col">Megan N Anderson</td><td>8437 Third</td><td>Chico</td><td>CA</td><td>95927</td></tr><tr><td class="mmt-selected-col">Isabella A Ross</td><td>8151 Evergreen</td><td>Barstow</td><td>MD</td><td>20610</td></tr><tr><td class="mmt-selected-col">Sharon Q Flores</td><td>3022 Lake</td><td>Poth</td><td>TX</td><td>78147</td></tr></tbody></table></div>
<p>Want to generate people names with Python instead of SAS? See <a href="https://github.com/az0/entity-metadata/blob/master/code/combine_people_names.py">combine_people_names.py</a> for a system that generates random people using <a href="https://sourceforge.net/projects/entity-metadata/files/wikidata/">Wikidata biographies</a>.</p> <div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-86224278290762592722020-12-10T17:56:00.005-07:002020-12-10T18:00:42.144-07:00Estimating birth date from age<p>This code demonstrates an algorithm for estimating birth date from age. We cannot know the exact birth date, but we can get close: the maximum error is half a year, and the typical error is one quarter of a year.</p>
<pre>
/* The %age macro was taken from the Internet---maybe from here http://support.sas.com/kb/24/808.html ? */
%macro age(date,birth);
floor ((intck('month',&birth,&date) - (day(&date) < day(&birth))) / 12)
%mend age;
/*
Generate 10000 fake people with random birth dates and random perspective days
on which their age was measured. Then, calculate age from that perspective date.
In reality, there is some seasonality to births (e.g., more births in July), but
here we assume each day of the year has an equal distribution of births.
*/
data person;
format birth_date submit_date yymmdd10.;
do i = 1 to 10000;
birth_date = %randbetween(19000,20500);
submit_date = birth_date + %randbetween(0,100*365);
age = %age(submit_date, birth_date);
output;
end;
drop i;
%runquit;
/* Work in reverse from age to estimated birth date. */
data reverse;
set person;
format birth_date_min birth_date_max yymmdd10.;
birth_date_min = intnx('years', submit_date, -1 * (age+1), 's') - 1;
birth_date_max = intnx('years',birth_date_min,1,'s') + 1;
/* check range of estimates for errors */
min_error = (birth_date > birth_date_min);
max_error = (birth_date < birth_date_max);
/* estimate birth date as the middle of the range */
birth_date_avg = mean(birth_date_min, birth_date_max);
/* calculate variance */
abs_days_error = abs(birth_date - birth_date_avg);
%runquit;
/* Both errors should always be zero. */
proc freq data=reverse;
table min_error max_error;
quit;
/* Error of estimates range from 0 to 183.5 with a median of 92 and average of 91.*/
proc means data=reverse n nmiss min median mean max;
var abs_days_error;
quit;
/* Distribution of errors is uniform */
proc sgplot data=reverse;
histogram abs_days_error;
quit;
</pre>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXORffS0VsJYmAHrwOVEMA3cwgGBC7RcI_3VSwmKLLYqUIPdxNCN6Uw08OO5jgAA-JPB0lxcFJCgpql_t-fDH4-slMB2cUvOScK4fRw62CBcCDkdzb4ZihyphenhyphenLorOibNyLah_XJQyZycYgZJ/s0/SGPlot.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="480" data-original-width="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXORffS0VsJYmAHrwOVEMA3cwgGBC7RcI_3VSwmKLLYqUIPdxNCN6Uw08OO5jgAA-JPB0lxcFJCgpql_t-fDH4-slMB2cUvOScK4fRw62CBcCDkdzb4ZihyphenhyphenLorOibNyLah_XJQyZycYgZJ/s0/SGPlot.png"/></a></div>
<p>Tested with SAS 9.4M6<div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-66286597385658939312019-07-22T22:44:00.000-06:002019-07-22T22:45:10.728-06:00How to connect from Linux to BleemSync 1.1<p>After installing BleemSync 1.1 on my PlayStation Classic, I could connect to the BleemSync UI from Windows but not from Linux (Ubuntu 19.04). Google Chrome reported "unable to connect," and ping to 169.254.215.100 reported a network error.</p>
<p>dmesg showed that Linux identified the device, and RNDIS networking started</p>
<pre>
[23945.137399] usb 1-3: new high-speed USB device number 31 using xhci_hcd
[23945.286069] usb 1-3: New USB device found, idVendor=04e8, idProduct=6863, bcdDevice=ff.ff
[23945.286075] usb 1-3: New USB device strings: Mfr=3, Product=4, SerialNumber=5
[23945.286079] usb 1-3: Product: classic
[23945.286082] usb 1-3: Manufacturer: BleemSync
[23945.290633] rndis_host 1-3:1.0 usb0: register 'rndis_host' at usb-0000:00:14.0-3, RNDIS device, 8a:04:6f:1c:f9:72
[23945.291271] cdc_acm 1-3:1.2: ttyACM0: USB ACM device
[23945.339113] rndis_host 1-3:1.0 enp0s20f0u3: renamed from usb0
</pre>
<p>However, ifconfig showed it did not have an IPv4 address. This indicates a DHCP failure.</p>
<pre>$ ifconfig enp0s20f0u3
enp0s20f0u3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::adf4:a447:4b1d:f96c prefixlen 64 scopeid 0x20<link>
ether 72:b7:b8:ae:c8:00 txqueuelen 1000 (Ethernet)
RX packets 8 bytes 536 (536.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 81 bytes 12347 (12.3 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
</pre>
<p>I bypassed DHCP by manual IP configuration.</p>
<pre>sudo route add -net 169.254.215.0 netmask 255.255.255.0 metric 1024 dev enp0s20f0u3
sudo ifconfig enp0s20f0u3 169.254.215.2
ping 169.254.215.100</pre>
<p>Now Google Chrome, ping, and even telnet worked.</p>
<p>Note: your interface name may vary. Mine was enp0s20f0u3.</p>
<p>This solution worked until rebooting Ubuntu. Later I found a permanent solution: <a href="https://www.reddit.com/r/classicmods/comments/aj311i/bleemsync_10_on_ubuntu/">BleemSync 1.0 on Ubuntu</a> thanks to DDFoster96.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-68142783651821582612019-01-28T10:14:00.000-07:002019-01-28T10:14:35.657-07:00SAS Message Log with ODBC: COMMIT performed on connection #<p>When using SAS to develop high-performance queries against remote SQL databases, it is helpful to see the exact ODBC messages that SAS passes to the driver. Sometimes the implicit SQL poorly translates a query, which can be optimized. To see these message, enable the SAS trace like this:</p>
<pre>
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
</pre>
<p>However, when closing the SAS process, there can be a pop-up dialog window with the title "SAS Message Log" with entries like this:</p>
<pre>
ODBC: COMMIT performed on connection #6.
ODBC: COMMIT performed on connection #5.
ODBC: COMMIT performed on connection #4.
ODBC: COMMIT performed on connection #3.
ODBC: COMMIT performed on connection #2.
ODBC: COMMIT performed on connection #1.
ODBC: COMMIT performed on connection #0.
</pre>
<p>When running SAS interactively, this is a minor nuisance. When running SAS in an automated batch, this can be a serious problem because the dialog will wait indefinitely for human interaction, so the sas.exe process will never terminate.</p>
<a name='more'></a>
<p>This isn't exactly a bug, but it can feel like it. Sadly, SAS provides no convenient options like these:</p>
<ul>
<li>Never show the SAS message pop-up dialog when the SAS editor has closed.</li>
<li>Automatically close the pop-up dialog after 60 seconds of inactivity.</li>
<li>Filter all traces with the text "ODBC commit."</li>
</ul>
<p>The SAS developer has these options:</p>
<ul>
<li>Disable the SAS trace.</li>
<li>Send the SAS trace to a file like this: <pre>options sastrace=',,,d' sastraceloc=file 'c:\sastest\mytrace.log' nostsuffix;</pre>
</li>
<li>Manually close the SAS Message Log whenever it appears.</li>
<li>Use <a href="https://gist.github.com/az0/06be331b83ba82f6bdbe8763a3cdeb4a">my Python script to automatically close the "SAS Message Log" dialog</a> whenever it appears.</li>
</ul>
<p>Tested on SAS 9.4M5 and Python 3.7 on Windows 10 and Windows Server 2008.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-13682260196422501612018-01-28T14:51:00.001-07:002018-01-28T15:26:16.925-07:00 Type I error rates in two-sample t-test by simulation <p>What do you do when analyzing data is fun, but you don't have any new data? You make it up.</p>
<p>This simulation tests the type I error rates of two-sample t-test in R and SAS. It demonstrates efficient methods for simulation, and it reminders the reader not to take the result of any single hypothesis test as gospel truth. That is, there is always a risk of a false positive (or false negative), so determining truth requires more than one research study.<p>
<p>A type I error is a false positive. That is, it happens when a hypothesis test rejects the null hypothesis when in fact it is not true. In this simulation the null hypothesis is true by design, though in the real world we cannot be sure the null hypothesis is true. This is why we write that we "fail to reject the null hypothesis" rather than "we accept it." If there were no errors in the hypothesis tests in this simulation, we would never reject the null hypothesis, but by design it is normal to reject it according to <i>alpha</i>, the significance level. The de facto standard for alpha is 0.05.</p>
<h2>R</h2>
<p>First, we run a simulation in R by repeatedly comparing randomly-generated sets of normally-distributed values using the two-sample t-test. Notice the simulation is vectorized: there are no "for" loops that clutter the code and slow the simulation.</p>
<a name='more'></a>
<pre class="prettyprint lang-r">
# type I error
alpha.p <- 0.05
# number of simulations
n.simulations <- 1000
# number of observations in each simulation
n.obs <- 100
# a vector of test results
type.one.error<-replicate(n.simulations, t.test(rnorm(n.obs),rnorm(n.obs),
var.equal=TRUE)$p.value)<alpha.p
# type I error for the whole simulation
mean(type.one.error)
# Store cumulative results in data frame for plotting
sim <- data.frame(
n.simulations = 1:n.simulations,
type.one.error.rate = cumsum(type.one.error) / seq_along(type.one.error))
# alternative plot using ggplot2
require(ggplot2)
ggplot(sim, aes(x=n.simulations, y=type.one.error.rate)) +
geom_line() +
xlab('Number of simulations') +
ylab('Cumulative type I error rate') +
ggtitle('Simulation of type I error in t-test') +
geom_abline(intercept = alpha.p, slope=0, col='red') +
theme_bw()
</pre>
<div class="separator" style="clear: both; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrEh_hiU1LlPja3-mAisRhfUsnmdkMI4S9tEaMxIU5sdAngvMsv9U4v5RjdL7cgYFL-EdqXvMf9jYrchzcX9jyGnZRAuSGCGDS8Xz1moUFz3YcJo0DgiNAFQ93f-TmlC5u8uV8sDW-QYIR/s1600/t_test_2sample_r.png" data-original-width="550" data-original-height="550" /></div>
<h2>SAS</h2>
<p>Likewise, here is the equivalent code to do the same in SAS. Notice the simulation is implemented not as a slow SAS macro. Instead, it uses the BY statement in PROC TTEST.</p>
<pre class="prettyprint lang-sas">
/*
Create a data set with 1000 simulations. Each simulation
has 100 observations in each of two groups.
*/
data normal;
length simulation 4 i 3; /* save space and time */
do simulation = 1 to 1000;
do i = 1 to 100;
group='A';
/* The values are normally distributed */
x = rand('normal');
output;
group='B';
x = rand('normal');
output;
end;
end;
run;
/*
Run two-sample t-test once for each simulation, and output to
a data set called ttests.
*/
ods _all_ close;
ods output ttests=ttests;
proc ttest plots=none data=normal;
by simulation;
class group;
var x;
run;
data ttests;
set ttests;
/* Limit the rows */
if variances='Equal';
/* Define the error as a boolean */
type_one_error = probt<0.05;
/* cumulative error */
retain cumulative_error_count;
format cumulative_error_rate percent10.2;
label cumulative_error_rate = 'Cumulative error rate';
if simulation eq 1 then cumulative_error_count = 0;
cumulative_error_count+type_one_error;
cumulative_error_rate = cumulative_error_count /simulation;
run;
/* Summarize the type I error rates for this simulation */
ods html;
proc freq data=ttests;
table type_one_error/nocum;
run;
/* Draw a line plot */
proc sgplot data=ttests;
series x=simulation y=cumulative_error_rate;
refline 0.05 /axis=y lineattrs=(color=red);
run;
</pre>
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjb2XD-5tfMB8unjUV_XNAmXS49v7QinRqHosp1OnTKN-6ERtHsHL6VP3P_YTVWpDMfchWf9ZiAxV4aGUJHhZ8uOJBtgwICH_-KEq4v1PzpjtmxI0pB4u80FEJHlfC5Jz79bbwmXBuSvVww/s1600/sas_type_i_two_sample_t_test.png" data-original-width="640" data-original-height="480" />
<h2>Sawtooth</h2>
<p>Did you notice the sawtooth pattern in the error rate? The incidence of a false positive is relatively rare, and when it happens, there is a spike in the error rate. Then for each simulation in which there is no false positive, the rate drops by a steady rate because the count of simulations (the denominator) is an integer.</p>
<h2>Conclusion</h2>
<p>This article was developed on Ubuntu 16.04 with R 3.4 and Windows 7 with SAS 9.4.</p>
<p>See also the article: <a href="http://heuristicandrew.blogspot.com/2014/02/type-i-error-rates-in-test-of-normality.html"> Type I error rates in test of normality by simulation </a>.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com1tag:blogger.com,1999:blog-7666066816041190661.post-15634471371669628062018-01-10T15:06:00.000-07:002018-01-12T13:04:13.150-07:00Condition execution on row count<p>Use this code as a template for scenarios when you want to change how a SAS program runs depending on whether a data set is empty or not empty. For example, when a report is empty, you may want to not send an email with what would be a blank report. In other words, the report sends only when it has information.</p>
<p>On the other hand, you may want to send an email when a data set is empty if that means an automated SAS program had an error that requires manual intervention.</p>
<p>In general, it's good practice in automated SAS programs to check the size of a data sets in case they are empty or otherwise have the wrong number of observations. With one easy tweak, you could check for a specific minimum number of observations that is greater than zero. (This is left as an exercise for the reader.)</p>
<a name='more'></a>
<pre class="prettyprint">
/*
This creates a sample data set with one record.
*/
data mydata;
input x;
datalines;
1
;
/*
This creates a sample data set with zero records.
It has the same name as above, so if you want to test the scenario
with a non-empty data set, simply do not run this step.
*/
data mydata;
input x;
datalines;
;
/*
Count the number of observations, and store the count in a
macro variable.
*/
data _null_;
if 0 then set mydata nobs=record_count;
call symput('mv_record_count', put(record_count, 20.));
stop;
run;
/* Print count to the log. */
%put &=mv_record_count;
/* Define a macro */
%macro conditional_run;
%if &mv_record_count gt 0 %then %do;
%put NOTE: this runs when the data set is not empty;
%end;
%else %do;
%put NOTE: this runs when the data set is empty;
%end;
%mend;
/* Run macro */
%conditional_run;
</pre>
<p>I tested this on SAS 9.4 on Windows 7, though it should work on practically all SAS systems.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-62393259808289034962016-08-30T11:52:00.000-06:002016-08-30T11:52:56.330-06:00SAS ERROR: Cannot load SSL support. on Microsoft Windows<p>When using SAS with HTTPS or FTPS, which requires SSL/TLS support, you may see this error message in the SAS log.</p>
<pre class="prettyprint">
ERROR: Cannot load SSL support.
</pre>
<p>Here is an example of code that can trigger the error.
<pre class="prettyprint">
filename myref url "https://www.google.com";
data _null_;
infile myref;
run;
</pre>
<p>The cause was that
<a name='more'></a>
<em>SAS/Secure Client Components</em> was not installed, so I resolved the issue by running the SAS Deployment Wizard to install <em>SAS/Secure Client Components</em>.</p>
<p>Tested with SAS 9.4 M3 on Microsoft Windows 7. The error may also happen with encrypted SMTP, but I did not test SMTP.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-74811397327626646592016-06-23T13:04:00.000-06:002016-06-23T13:06:53.689-06:00SAS error "insufficient memory" on remote queries with wide rows<p>SAS can give the error <b>The SAS System stopped processing this step because of insufficient memory</b> when querying a single, wide row from a remote SQL Server. The following code fully demonstrates the problem and shows a workaround. Also, I eliminate the explanation that SAS data sets in general do not support rows this wide.</p>
<a name='more'></a>
<pre class="prettyprint">
proc sql;
/* Connect to Microsoft SQL Server */
connect using sbox;
execute (
/* Create the table */
create table dbo.zzz_varchar_max (
id int,
txt1 varchar(max),
txt2 varchar(max)
);
/* Insert a single row */
insert into zzz_varchar_max values (1, 'foo', 'bar');
) by sbox;
quit;
/* This step triggers the error */
data _null_;
set sbox.zzz_varchar_max;
run;
/* This step does NOT trigger the error */
data _null_;
set sbox.zzz_varchar_max(drop=txt2);
run;
/* ERROR: The SAS System stopped processing this step because of insufficient memory. */
/* This step inspects the metadata. SAS considers each character column as having
the width 32767, which is the maximum string size for a SAS data set according to
https://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001336069.htm
*/
proc contents data=sbox.zzz_varchar_max;
run;
/* This step shows that SAS allows creating a data set that is even
wider than the query that fails, so the error isn't a fundamental limitation
of SAS. */
data wide;
id =1;
format txt1 txt2 txt3 $32767.;
txt1='foo';
txt2='bar';
txt3='';
run;
</pre>
<p>If the data set has a single column with NVARCHAR(MAX) or VARCHAR(MAX), there is no error. It happens only when there are (at least) two such wide columns.</p>
<p>Another workarounds include: use the KEEP option on the data set to KEEP only one of the wide columns, use a PROC SQL statement to query only one of the columns, or use a remote SQL query (maybe with SUBSTR) to truncate the columns.</p>
<p>Another workaround is to switch from the modern Microsoft ODBC driver (<tt>driver=ODBC Driver 11 for SQL Server</tt>) to the ancient driver (<tt>driver=sql server</tt>) by changing the ODBC connection string or DSN.<p>
<p>I tested with SAS 9.4 TS1M3 32-bit, Microsoft SQL Server 2012 (11.0 SP2), and the ODBC Driver 11 for SQL Server (2014.120.2000.08).</p>
<div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-20785613602467047292016-06-08T08:29:00.001-06:002016-06-08T10:26:21.795-06:00Reusing calculated columns in Netezza and SAS queries<p>Netezza and SAS allow a query to reference a <i>calculated</i> column by name in the SELECT, WHERE, and ORDER BY clauses. Based on the DRY principle, this reduces code and makes code easier to read and maintain.</p>
<p>Some people call calculated columns <i>derived</i> or <i>computed</i> columns.</p>
<p>In Microsoft SQL Server, SQLite, and other RDBMSs you cannot exactly do this: a workaround is to reference a subquery or view. In Microsoft SQL Server, you can also define a computed column on a table.</p>
<p>Below is an example tested with Netezza 7.2. Notice <tt>height_m</tt> is used in the SELECT clause, and <tt>bmi</tt> is used in the WHERE and ORDER BY clauses.</p>
<a name='more'></a>
<pre class="prettyprint lang-sql">
CREATE TEMP TABLE people (weight_kg INT, height_m float);
INSERT INTO people
VALUES (50, 1.6);
INSERT INTO people
VALUES (70, 1.8);
INSERT INTO people
VALUES (150, 1.8);
SELECT weight_kg
,height_m
,height_m*height_m as height_m_squared
,weight_kg/(<span style="background:yellow">height_m_squared</span>)::int as bmi
FROM people
WHERE <span style="background:yellow">bmi</span> < 30
ORDER BY <span style="background:yellow">bmi</span>;
</pre>
<p>Below is an example tested with SAS 9.4.</p>
<pre class="prettyprint">
data people;
input weight_kg height_m;
datalines;
50 1.6
70 1.8
150 1.8
;
proc sql;
select
weight_kg,
height_m,
height_m*height_m as height_m_squared,
weight_kg/(calculated height_m_squared) as bmi
from
people
where
calculated bmi < 30
order by
calculated bmi;
quit;
</pre><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-29907743668078706252016-03-15T16:47:00.000-06:002016-03-18T13:27:10.059-06:00In case of error in SAS program, send email and stop<p>Any automated program should check for errors and unexpected conditions, such as inability to access a resource and presence of invalid values. Unlike traditional programming languages such as Python and C# that stop processing when an error occurs, SAS barrels ahead through the rest of the program. Therefore, carelessly-written SAS programs can create unwanted side effects, such as overwriting an output data set with bad data.</p>
<p>Previously I wrote about a <a href="https://heuristically.wordpress.com/2012/02/09/return-codes-errors-sas/">robust solution for checking SAS error codes</a> which wraps the entire program in a macro and invokes %GOTO EXIT in case an error. This is still the ideal solution when some part of the program must continue, but it comes at a cost: wrapping SAS code in a macro disables syntax highlighting in the SAS Enhanced Editor (though not in SAS Studio). Also, it can be awkward to work with
the large code block delimited by the macro, so this post focuses on two alternatives.</p>
<a name='more'></a>
<p>The easiest method is to enable <tt>OPTIONS ERRORABEND</tt>, which exits the program in case of an error. The major benefit is it requires only a single line of code to enable for the rest of the program, regardless of how many steps there are. For example:</p>
<pre class="prettyprint">
OPTIONS ERRORABEND;
/* Misspelled data set */
data female;
set sashelp.clss;
if sex='F';
run;
</pre>
<p>However, ERRORABEND can be painful while developing a program in interactive mode because it immediately exits the SAS session, and it cannot push error notifications.</p>
<p>This motivates the following approach. After each DATA STEP or procedure, call a macro that checks the sanity of preceding steps. In case of an error, the universal error handler can send a notification email, clean up, etc. Like ERRORABEND, syntax highlighting still works. Like the other options, it can check errors for DATA STEPs and various procedures. Though it requires code after each step, it is less code than the first solution proposed in the <a href="https://heuristically.wordpress.com/2012/02/09/return-codes-errors-sas/">original article</a>.</p>
<pre class="prettyprint">
/* This macro is called from two other macros below. It defines any
common error handling code, and it should be customized.
In some programs, you may want to clean up here.
*/
%macro email_and_abort;
/* Send an email */
filename mymail email "andrew@example.com" subject="error in SAS program";
data _null_;
file mymail;
put 'Check the SAS logs';
run;
/* Stop further processing */
%abort cancel;
%mend;
/* This macro asserts the last procedure did not throw an error
or a warning. We will invoke this macro after each step. */
%macro expect_no_syserr;
/* If there is an error or warning... */
%if &syserr ne 0 %then %do;
/* Write the error code to the SAS log */
%put ERROR: &=syserr;/
%email_and_abort;
%end;
%mend;
/* Assuming SASHELP.CLASS exists and contains the character
variable SEX, this DATA STEP will succeed. */
data female;
set sashelp.class;
if sex='F';
run;
%expect_no_syserr;
/* Likewise, the macro works with procedures such as PROC FREQ. */
proc freq data=sashelp.class noprint;
table age/out=age_freq;
run;
%expect_no_syserr;
/* This macro asserts PROC SQL creates at least one observation. */
%macro expect_any_obs;
%if &sqlobs eq 0 %then %do;
%put ERROR: no observations;
%email_and_abort;
%end;
%mend;
/* This PROC SQL shows that the assertions working with PROC
SQL and that two assertions can be combined. */
proc sql;
create table age_count as
select
age,
count(1) as count
from
sashelp.class
group by
age;
quit;
%expect_no_syserr;
%expect_any_obs;
/* Misspelled data set. This will fail to demonstrate the macros. */
data female;
set sashelp.clss;
if sex='F';
run;
%expect_no_syserr;
</pre>
<p>Similar assertion macros can check that a data set exists, variables exist in a a data set, observations exist in a data set, a path is writable, a file exists, and so on.</p>
<p>Tested with SAS 9.4M3.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com3tag:blogger.com,1999:blog-7666066816041190661.post-35460068890983619952016-03-10T10:42:00.000-07:002016-03-10T10:43:47.572-07:00R: InternetOpenUrl failed: 'The date in the certificate is invalid or has expired'<p>Today the two-year-old TLS security certificate for <a rel="external nofollow" href="https://cran.r-project.org">cran.r-project.org</a> expired, so suddenly in R you are getting errors running <b>install.packages</b> or <b>update.packages</b>.</p>
<p>The error looks like this:</p>
<pre class="prettyprint lang-r">
> update.packages()
--- Please select a CRAN mirror for use in this session ---
Error in download.file(url, destfile = f, quiet = TRUE) :
cannot open URL 'https://cran.r-project.org/CRAN_mirrors.csv'
In addition: Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
InternetOpenUrl failed: 'The date in the certificate is invalid or has expired'
</pre>
<p>The workaround is simple: choose another repository! For example:<p>
<a name='more'></a>
<pre class="prettyprint lang-r">
options("repos"="https://cran.revolutionanalytics.com/")
update.packages(ask=T)
install.packages('gbm')
</pre>
<p>This is bad timing with the release of R 3.2.4 today. If you need to download R using your web browser, visit a mirror, such as <a href="https://cran.revolutionanalytics.com" rel="external nofollow">cran.revolutionanalytics.com</a>.</p>
<p>Tested with R 3.2.3 on Windows 7 and Windows Server 2012.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com3tag:blogger.com,1999:blog-7666066816041190661.post-81296235930403431272016-02-22T13:13:00.001-07:002016-02-22T13:13:58.166-07:00ISO 3166-1 alpha-2 (two-letter country code) format for SAS<p>Here is the widely-used ISO 3166-1 alpha-2 format for use in SAS. It is commonly called the two-letter country code format.</p>
<p>The PROC FORMAT code generates a character format, so where the raw data contains a code, such as US, it expands it to the pretty name, such as United States. As with any SAS format, applying the format does not change the underlying data.</p>
<a name='more'></a>
<pre class="prettyprint">
proc format;
/* ISO 3166-1 alpha-2 two letter country codes */
value $ iso3166alphatwo
'AF' = 'Afghanistan'
'AX' = 'Åland Islands'
'AL' = 'Albania'
'DZ' = 'Algeria'
'AS' = 'American Samoa'
'AD' = 'Andorra'
'AO' = 'Angola'
'AI' = 'Anguilla'
'AQ' = 'Antarctica'
'AG' = 'Antigua and Barbuda'
'AR' = 'Argentina'
'AM' = 'Armenia'
'AW' = 'Aruba'
'AU' = 'Australia'
'AT' = 'Austria'
'AZ' = 'Azerbaijan'
'BS' = 'Bahamas'
'BH' = 'Bahrain'
'BD' = 'Bangladesh'
'BB' = 'Barbados'
'BY' = 'Belarus'
'BE' = 'Belgium'
'BZ' = 'Belize'
'BJ' = 'Benin'
'BM' = 'Bermuda'
'BT' = 'Bhutan'
'BO' = 'Bolivia, Plurinational State of'
'BQ' = 'Bonaire, Sint Eustatius and Saba'
'BA' = 'Bosnia and Herzegovina'
'BW' = 'Botswana'
'BV' = 'Bouvet Island'
'BR' = 'Brazil'
'IO' = 'British Indian Ocean Territory'
'BN' = 'Brunei Darussalam'
'BG' = 'Bulgaria'
'BF' = 'Burkina Faso'
'BI' = 'Burundi'
'KH' = 'Cambodia'
'CM' = 'Cameroon'
'CA' = 'Canada'
'CV' = 'Cape Verde'
'KY' = 'Cayman Islands'
'CF' = 'Central African Republic'
'TD' = 'Chad'
'CL' = 'Chile'
'CN' = 'China'
'CX' = 'Christmas Island'
'CC' = 'Cocos (Keeling) Islands'
'CO' = 'Colombia'
'KM' = 'Comoros'
'CG' = 'Congo'
'CD' = 'Congo, the Democratic Republic of the'
'CK' = 'Cook Islands'
'CR' = 'Costa Rica'
'CI' = 'Côte d''Ivoire'
'HR' = 'Croatia'
'CU' = 'Cuba'
'CW' = 'Curaçao'
'CY' = 'Cyprus'
'CZ' = 'Czech Republic'
'DK' = 'Denmark'
'DJ' = 'Djibouti'
'DM' = 'Dominica'
'DO' = 'Dominican Republic'
'EC' = 'Ecuador'
'EG' = 'Egypt'
'SV' = 'El Salvador'
'GQ' = 'Equatorial Guinea'
'ER' = 'Eritrea'
'EE' = 'Estonia'
'ET' = 'Ethiopia'
'FK' = 'Falkland Islands (Malvinas)'
'FO' = 'Faroe Islands'
'FJ' = 'Fiji'
'FI' = 'Finland'
'FR' = 'France'
'GF' = 'French Guiana'
'PF' = 'French Polynesia'
'TF' = 'French Southern Territories'
'GA' = 'Gabon'
'GM' = 'Gambia'
'GE' = 'Georgia'
'DE' = 'Germany'
'GH' = 'Ghana'
'GI' = 'Gibraltar'
'GR' = 'Greece'
'GL' = 'Greenland'
'GD' = 'Grenada'
'GP' = 'Guadeloupe'
'GU' = 'Guam'
'GT' = 'Guatemala'
'GG' = 'Guernsey'
'GN' = 'Guinea'
'GW' = 'Guinea-Bissau'
'GY' = 'Guyana'
'HT' = 'Haiti'
'HM' = 'Heard Island and McDonald Mcdonald Islands'
'VA' = 'Holy See (Vatican City State)'
'HN' = 'Honduras'
'HK' = 'Hong Kong'
'HU' = 'Hungary'
'IS' = 'Iceland'
'IN' = 'India'
'ID' = 'Indonesia'
'IR' = 'Iran, Islamic Republic of'
'IQ' = 'Iraq'
'IE' = 'Ireland'
'IM' = 'Isle of Man'
'IL' = 'Israel'
'IT' = 'Italy'
'JM' = 'Jamaica'
'JP' = 'Japan'
'JE' = 'Jersey'
'JO' = 'Jordan'
'KZ' = 'Kazakhstan'
'KE' = 'Kenya'
'KI' = 'Kiribati'
'KP' = 'Korea, Democratic People''s Republic of'
'KR' = 'Korea, Republic of'
'KW' = 'Kuwait'
'KG' = 'Kyrgyzstan'
'LA' = 'Lao People''s Democratic Republic'
'LV' = 'Latvia'
'LB' = 'Lebanon'
'LS' = 'Lesotho'
'LR' = 'Liberia'
'LY' = 'Libya'
'LI' = 'Liechtenstein'
'LT' = 'Lithuania'
'LU' = 'Luxemourg'
'MO' = 'Macao'
'MK' = 'Macedonia, the Former Yugoslav Republic of'
'MG' = 'Madagascar'
'MW' = 'Malawi'
'MY' = 'Malaysia'
'MV' = 'Maldives'
'ML' = 'Mali'
'MT' = 'Malta'
'MH' = 'Marshall Islands'
'MQ' = 'Martinique'
'MR' = 'Mauritania'
'MU' = 'Mauritius'
'YT' = 'Mayotte'
'MX' = 'Mexico'
'FM' = 'Micronesia, Federated States of'
'MD' = 'Moldova, Republic of'
'MC' = 'Monaco'
'MN' = 'Mongolia'
'ME' = 'Montenegro'
'MS' = 'Montserrat'
'MA' = 'Morocco'
'MZ' = 'Mozambique'
'MM' = 'Myanmar'
'NA' = 'Namibia'
'NR' = 'Nauru'
'NP' = 'Nepal'
'NL' = 'Netherlands'
'NC' = 'New Caledonia'
'NZ' = 'New Zealand'
'NI' = 'Nicaragua'
'NE' = 'Niger'
'NG' = 'Nigeria'
'NU' = 'Niue'
'NF' = 'Norfolk Island'
'MP' = 'Northern Mariana Islands'
'NO' = 'Norway'
'OM' = 'Oman'
'PK' = 'Pakistan'
'PW' = 'Palau'
'PS' = 'Palestine, State of'
'PA' = 'Panama'
'PG' = 'Papua New Guinea'
'PY' = 'Paraguay'
'PE' = 'Peru'
'PH' = 'Philippines'
'PN' = 'Pitcairn'
'PL' = 'Poland'
'PT' = 'Portugal'
'PR' = 'Puerto Rico'
'QA' = 'Qatar'
'RE' = 'Réunion'
'RO' = 'Romania'
'RU' = 'Russian Federation'
'RW' = 'Rwanda'
'BL' = 'Saint Barthélemy'
'SH' = 'Saint Helena, Ascension and Tristan da Cunha'
'KN' = 'Saint Kitts and Nevis'
'LC' = 'Saint Lucia'
'MF' = 'Saint Martin (French part)'
'PM' = 'Saint Pierre and Miquelon'
'VC' = 'Saint Vincent and the Grenadines'
'WS' = 'Samoa'
'SM' = 'San Marino'
'ST' = 'Sao Tome and Principe'
'SA' = 'Saudi Arabia'
'SN' = 'Senegal'
'RS' = 'Serbia'
'SC' = 'Seychelles'
'SL' = 'Sierra Leone'
'SG' = 'Singapore'
'SX' = 'Sint Maarten (Dutch part)'
'SK' = 'Slovakia'
'SI' = 'Slovenia'
'SB' = 'Solomon Islands'
'SO' = 'Somalia'
'ZA' = 'South Africa'
'GS' = 'South Georgia and the South Sandwich Islands'
'SS' = 'South Sudan'
'ES' = 'Spain'
'LK' = 'Sri Lanka'
'SD' = 'Sudan'
'SR' = 'Suriname'
'SJ' = 'Svalbard and Jan Mayen'
'SZ' = 'Swaziland'
'SE' = 'Sweden'
'CH' = 'Switzerland'
'SY' = 'Syryan Arab Republic'
'TW' = 'Taiwan, Province of China'
'TJ' = 'Tajikistan'
'TZ' = 'Tanzania, United Republic of'
'TH' = 'Thailand'
'TL' = 'Timor-Leste'
'TG' = 'Togo'
'TK' = 'Tokelau'
'TO' = 'Tonga'
'TT' = 'Trinidad and Tobago'
'TN' = 'Tunisia'
'TR' = 'Turkey'
'TM' = 'Turkmenistan'
'TC' = 'Turks and Caicos Islands'
'TV' = 'Tuvalu'
'UG' = 'Uganda'
'UA' = 'Ukraine'
'AE' = 'United Arab Emirates'
'GB' = 'United Kingdom'
'US' = 'United States'
'UM' = 'United States Minor Outlying Islands'
'UY' = 'Uruguay'
'UZ' = 'Uzbekistan'
'VU' = 'Vanuatu'
'VE' = 'Venezuela, Bolivarian Republic of'
'VN' = 'Vietnam'
'VG' = 'Virgin Islands, British'
'VI' = 'Virgin Islands, U.S.'
'WF' = 'Wallis and Futuna'
'EH' = 'Western Sahara'
'YE' = 'Yemen'
'ZM' = 'Zambia'
'ZW' = 'Zimbabwe'
;
quit;
/* Example usage */
data country;
format country_code $iso3166alphatwo.;
country_code = 'US';
output;
country_code='GB';
output;
run;
proc print data=country;
run;
</pre>
<p>This list is <a href="https://support.cloudflare.com/hc/en-us/articles/205072537-What-are-the-two-letter-country-codes-for-the-Access-Rules-">from Cloudflare</a> published 2015.
<p>Tested with SAS 9.4M3 on Microsoft Windows.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-48506702183320023752016-02-03T14:01:00.000-07:002016-02-03T14:08:10.739-07:00Undocumented SAS feature: Bulkloading to Netezza with ODBC interface<p>The SAS/ACCESS Interface to ODBC in SAS 9.4M4 states it supports <a rel="external nofollow" href="https://support.sas.com/documentation/cdl/en/acreldb/68028/HTML/default/viewer.htm#n0qddeciyqg3qfn1tosrb42y47os.htm">bulk loading</a> only to "Microsoft SQL Server data on Windows platforms." However, in practice on the Windows platform it also supports bulk loading to Netezza.</p>
<p>Bulk loading is amazingly fast. In some of my benchmarks the duration of the whole bulk loading operation is independent of the number of rows inserted!<?p>
<p>By default on Netezza the bulk loading interface delimits values using a pipe character, and for cases where the values contain a pipe, <i>SAS Access Interface to ODBC</i> unofficially supports the BL_DELIMITER option to specify an alternate delimiter. For the ODBC interface, this option is undocumented.</p>
<p>However, there are nuances with the BL_DELIMITER option. According to the <a rel="external nofollow" href="https://support.sas.com/documentation/cdl/en/acreldb/68028/HTML/default/viewer.htm#n04jpcu1ws9wjrn15vii00ifx5v6.htm">SAS Access Interface to Netezza</a>:</p>
<a name='more'></a>
<blockquote>You can use any 7-bit ASCII character as a delimiter. The default is the pipe symbol (ǀ). To use a printable ASCII character, enclose it in quotation marks (for example, BL_DELIMITER="|"). However, to use an extended character, use the three-digit decimal number representation of the ASCII character for this option. For example, set BL_DELIMITER=202 to use ASCII character 202 as a delimiter. You must specify decimal number delimiters as three digits even if the first two digits would be zero. For example, specify BL_DELIMITER=003, not BL_DELIMITER=3 or BL_DELIMITER=03. </blockquote>
<p>First, notice a contradiction in the documentation. Because 7-bit characters are in the range 1-127 implies that 8-bit characters in the range 128-256 are not supported, but the documentation gives an example in this range (BL_DELIMITER=202).</p>
<p>Second, the syntax for the ODBC interface (which is not covered by the documentation for the ODBC interface) supports only a single character, so specifying a delimiter using the three-digit decimal notation will always cause an error.</p>
<p>For example, this works with the ODBC interface</p>
<pre class="prettyprint">
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
/*
Because a data set with one variable does not require
a delimiter, this data set has two variables.
*/
data has_pipe;
/*
Because the first character in 122 used below is a 1,
here we test the number 1.
*/
number=1;
/*
Because by default the delimiter is a pipe, one of the
values has a pipe.
*/
char='I|have|a|pipe.';
output;
run;
/* Use a lowercase z as the alternate delimiter */
data nz.has_pipe(bulkload=yes bl_delimiter='z');
set has_pipe;
run;
</pre>
<p>However, the decimal representation fails.</p>
<pre class="prettyprint">
/* 122 is the decimal representation of the lowercase z */
data nz.has_pipe(bulkload=yes bl_delimiter=122);
set has_pipe;
run;
</pre>
<p>The SAS log shows SAS treats the decimal representation as a literal character and truncates it to the first character.</p>
<pre class="prettyprint">
ODBC_25: Executed: on connection 8
CREATE EXTERNAL TABLE EXT_HAS_PIPE SAMEAS ADMIN.HAS_PIPE USING
(DATAOBJECT('\\.\pipe\BL_HAS_PIPE_3') DELIMITER '1' REMOTESOURCE 'ODBC' )
</pre>
<p>If you have it licensed, the <i>SAS/ACCESS Interface to Netezza</i> should support the decimal notation: in this case, I would suggest using a tab delimiter with BL_DELIMITER=009.</p>
<p>If not, you must either disable bulkloading or use a single, printable ASCII character as a delimiter. If your data set requires a full range of characters but never all characters on the same row (for example, some rows have a pipe while other rows have a caret), split your data set into two data sets, and then bulk load each data set using separate delimiters.</p>
<div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com5tag:blogger.com,1999:blog-7666066816041190661.post-34943995675146043022016-01-29T13:31:00.000-07:002016-02-08T12:44:21.513-07:00Frequency of individual characters from SAS data set<p>This script counts the frequencies of individual ASCII characters in a single column in a SAS data set and then prints an easy-to-read report.</p>
<p>My initial motivation relates to delimiters. By default bulkloading data from Netezza to SAS (which is very fast) uses the pipe character as a delimiter, but my data set contained values with the pipe character, so this macro identifies alternative delimiters.</p>
<p>Another potential use is cracking a message encrypted using a simple letter substitution cipher.</p>
<p>To begin, this code creates an example data set courtesy of William Shakespeare.</p>
<a name='more'></a>
<pre>
data sonnet18;
input line $60.;
datalines;
Shall I compare thee to a summer's day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer's lease hath all too short a date:
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimmed,
And every fair from fair sometime declines,
By chance, or nature's changing course untrimmed:
But thy eternal summer shall not fade,
Nor lose possession of that fair thou ow'st,
Nor shall death brag thou wander'st in his shade,
When in eternal lines to time thou grow'st,
So long as men can breathe, or eyes can see,
So long lives this, and this gives life to thee.
;
</pre>
<p>Next, here is the macro that counts all the printable ASCII characters in all rows of the data set and makes a new data set with total counts by ASCII character.</p>
<pre class="prettyprint">
%macro character_histogram(dataset, column);
data histogram_tmp;
set &dataset;
/* Characters 32 through 126 are printable ASCII. */
%do i = 32 %to 126;
/* Count the number of characters in the column. */
/* Store each count in separate column. */
count_chr_&i = count(trim(&column), byte(&i));
%end;
run;
/* Sum the character counts from all the rows. */
proc means data=histogram_tmp noprint;
var count_chr_:;
output out=histogram_wide sum=sum/autoname;
run;
/* Clean up */
proc sql;
drop table histogram_tmp;
quit;
/* Switch from long to wide. */
proc transpose
data=histogram_wide(keep=count_chr:)
out=histogram_long
;
run;
/* Make pretty. */
data histogram_long;
set histogram_long;
character_decimal = input(compress(_name_, , 'kd'), 3.);
drop _name_;
character = byte(character_decimal);
rename col1=count_characters;
run;
%mend;
</pre>
<p>Finally, this code invokes the macro and prints the report.</p>
<pre class="prettyprint">
/* Run the histogram macro */
%character_histogram(sonnet18, line);
/* Print the final report as a table. */
proc print data=histogram_long noobs;
var character_decimal character count_characters ;
run;
/* Barchart, what some people would call a histogram of the letters. */
proc sgplot data=histogram_long;
hbar character/freq=count_characters;
run;
</pre>
<p>This is the final report.</p>
<table class="table" cellspacing="0" cellpadding="5" rules="all" frame="box" bordercolor="#C1C1C1" summary="Procedure Print: Data Set WORK.HISTOGRAM_LONG">
<thead>
<tr>
<th class="r header" scope="col">character_decimal</th>
<th class="l header" scope="col">character</th>
<th class="r header" scope="col">count_characters</th>
</tr>
</thead>
<tbody>
<tr>
<td class="r data">32</td>
<td class="l data"> </td>
<td class="r data">100</td>
</tr>
<tr>
<td class="r data">33</td>
<td class="l data">!</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">34</td>
<td class="l data">"</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">35</td>
<td class="l data">#</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">36</td>
<td class="l data">$</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">37</td>
<td class="l data">%</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">38</td>
<td class="l data">&</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">39</td>
<td class="l data">'</td>
<td class="r data">6</td>
</tr>
<tr>
<td class="r data">40</td>
<td class="l data">(</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">41</td>
<td class="l data">)</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">42</td>
<td class="l data">*</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">43</td>
<td class="l data">+</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">44</td>
<td class="l data">,</td>
<td class="r data">12</td>
</tr>
<tr>
<td class="r data">45</td>
<td class="l data">-</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">46</td>
<td class="l data">.</td>
<td class="r data">1</td>
</tr>
<tr>
<td class="r data">47</td>
<td class="l data">/</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">48</td>
<td class="l data">0</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">49</td>
<td class="l data">1</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">50</td>
<td class="l data">2</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">51</td>
<td class="l data">3</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">52</td>
<td class="l data">4</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">53</td>
<td class="l data">5</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">54</td>
<td class="l data">6</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">55</td>
<td class="l data">7</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">56</td>
<td class="l data">8</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">57</td>
<td class="l data">9</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">58</td>
<td class="l data">:</td>
<td class="r data">3</td>
</tr>
<tr>
<td class="r data">59</td>
<td class="l data">;</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">60</td>
<td class="l data"><</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">61</td>
<td class="l data">=</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">62</td>
<td class="l data">></td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">63</td>
<td class="l data">?</td>
<td class="r data">1</td>
</tr>
<tr>
<td class="r data">64</td>
<td class="l data">@</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">65</td>
<td class="l data">A</td>
<td class="r data">3</td>
</tr>
<tr>
<td class="r data">66</td>
<td class="l data">B</td>
<td class="r data">2</td>
</tr>
<tr>
<td class="r data">67</td>
<td class="l data">C</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">68</td>
<td class="l data">D</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">69</td>
<td class="l data">E</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">70</td>
<td class="l data">F</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">71</td>
<td class="l data">G</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">72</td>
<td class="l data">H</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">73</td>
<td class="l data">I</td>
<td class="r data">1</td>
</tr>
<tr>
<td class="r data">74</td>
<td class="l data">J</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">75</td>
<td class="l data">K</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">76</td>
<td class="l data">L</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">77</td>
<td class="l data">M</td>
<td class="r data">1</td>
</tr>
<tr>
<td class="r data">78</td>
<td class="l data">N</td>
<td class="r data">2</td>
</tr>
<tr>
<td class="r data">79</td>
<td class="l data">O</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">80</td>
<td class="l data">P</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">81</td>
<td class="l data">Q</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">82</td>
<td class="l data">R</td>
<td class="r data">1</td>
</tr>
<tr>
<td class="r data">83</td>
<td class="l data">S</td>
<td class="r data">4</td>
</tr>
<tr>
<td class="r data">84</td>
<td class="l data">T</td>
<td class="r data">1</td>
</tr>
<tr>
<td class="r data">85</td>
<td class="l data">U</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">86</td>
<td class="l data">V</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">87</td>
<td class="l data">W</td>
<td class="r data">1</td>
</tr>
<tr>
<td class="r data">88</td>
<td class="l data">X</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">89</td>
<td class="l data">Y</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">90</td>
<td class="l data">Z</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">91</td>
<td class="l data">[</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">92</td>
<td class="l data">\</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">93</td>
<td class="l data">]</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">94</td>
<td class="l data">^</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">95</td>
<td class="l data">_</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">96</td>
<td class="l data">`</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">97</td>
<td class="l data">a</td>
<td class="r data">37</td>
</tr>
<tr>
<td class="r data">98</td>
<td class="l data">b</td>
<td class="r data">3</td>
</tr>
<tr>
<td class="r data">99</td>
<td class="l data">c</td>
<td class="r data">9</td>
</tr>
<tr>
<td class="r data">100</td>
<td class="l data">d</td>
<td class="r data">20</td>
</tr>
<tr>
<td class="r data">101</td>
<td class="l data">e</td>
<td class="r data">63</td>
</tr>
<tr>
<td class="r data">102</td>
<td class="l data">f</td>
<td class="r data">10</td>
</tr>
<tr>
<td class="r data">103</td>
<td class="l data">g</td>
<td class="r data">10</td>
</tr>
<tr>
<td class="r data">104</td>
<td class="l data">h</td>
<td class="r data">31</td>
</tr>
<tr>
<td class="r data">105</td>
<td class="l data">i</td>
<td class="r data">26</td>
</tr>
<tr>
<td class="r data">106</td>
<td class="l data">j</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">107</td>
<td class="l data">k</td>
<td class="r data">1</td>
</tr>
<tr>
<td class="r data">108</td>
<td class="l data">l</td>
<td class="r data">23</td>
</tr>
<tr>
<td class="r data">109</td>
<td class="l data">m</td>
<td class="r data">22</td>
</tr>
<tr>
<td class="r data">110</td>
<td class="l data">n</td>
<td class="r data">31</td>
</tr>
<tr>
<td class="r data">111</td>
<td class="l data">o</td>
<td class="r data">44</td>
</tr>
<tr>
<td class="r data">112</td>
<td class="l data">p</td>
<td class="r data">4</td>
</tr>
<tr>
<td class="r data">113</td>
<td class="l data">q</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">114</td>
<td class="l data">r</td>
<td class="r data">28</td>
</tr>
<tr>
<td class="r data">115</td>
<td class="l data">s</td>
<td class="r data">38</td>
</tr>
<tr>
<td class="r data">116</td>
<td class="l data">t</td>
<td class="r data">39</td>
</tr>
<tr>
<td class="r data">117</td>
<td class="l data">u</td>
<td class="r data">13</td>
</tr>
<tr>
<td class="r data">118</td>
<td class="l data">v</td>
<td class="r data">5</td>
</tr>
<tr>
<td class="r data">119</td>
<td class="l data">w</td>
<td class="r data">4</td>
</tr>
<tr>
<td class="r data">120</td>
<td class="l data">x</td>
<td class="r data">1</td>
</tr>
<tr>
<td class="r data">121</td>
<td class="l data">y</td>
<td class="r data">8</td>
</tr>
<tr>
<td class="r data">122</td>
<td class="l data">z</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">123</td>
<td class="l data">{</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">124</td>
<td class="l data">|</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">125</td>
<td class="l data">}</td>
<td class="r data">0</td>
</tr>
<tr>
<td class="r data">126</td>
<td class="l data">~</td>
<td class="r data">0</td>
</tr>
</tbody>
</table>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAY2EFbvRPo8y9yhOWXrqunY_VrW-jnakHpTWdNx4PymaqxBcbEwwStLyy8s8FaXVZRegHIA609m0vllosiYZNojFp3KsGBjFZLM56CKxAlBnwrqvG36ZwIOkRnvOLSLCiKdBXoA4ZESDK/s800-Ic42/histogram_characters.png" alt="Histogram of individual characters made with SAS SGPLOT" width="640" height="480">
<p>This script was tested with SAS 9.4M3 on Windows 7.</p>
<div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-19282710871606224682015-09-02T10:29:00.002-06:002016-01-29T14:37:39.366-07:00SAS macros always have a global scope<p>SAS allows the programmer to declare the scope of macro variables using <tt>%LOCAL</tt> or <tt>%GLOBAL</tt>, but the macros themselves are always created in the global scope.</p>
<p>Say you have a macro that in another language, say Python, would be considered a function. Within the macro you want a sub-macro (i.e., sub-function) to be used only within the outer macro.</p>
<pre class="prettyprint">
%macro outer;
%put NOTE: outer;
/* This "sub-macro" is defined within the outer macro and is
intended only for use within the outer macro. */
%macro inner(foo);
%put NOTE: inner &foo;
%mend;
%inner(1);
%inner(2);
%mend;
%outer;
/* If the "sub-macro" has a local scope, the next step would fail */
%inner(3);
/* However, it succeeds */
</pre>
<p>This can lead to conflicts if the macro <tt>%inner</tt> is defined somewhere else in the same session. One way of dealing with this is to be careful to give the inner macro a unique name like <tt>__outer_inner</tt> where the underscores in the prefix suggest a local scope, and adding <tt>outer</tt> to the macro name indicates the macro is to be used only in the <tt>outer</tt> macro.</p>
<a name='more'></a>
<p>Another option is to use the <tt>%sysmacdelete</tt> to delete the inner macro:</p>
<pre class="prettyprint">
%macro outer;
%put NOTE: outer;
%macro inner(foo);
%put NOTE: inner &foo;
%mend;
%inner(1);
%inner(2);
/* Delete the inner macro */
%SYSMACDELETE inner;
%mend;
%outer;
/* This fails because of SYSMACDELETE */
%inner(3);
</pre>
<p>Tested with SAS 9.4M3 on Windows 7.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-46289311495100508102015-08-31T14:40:00.001-06:002015-08-31T14:41:25.024-06:00Gotcha with SAS, regular expressions, and end-of-line matching<p>Regular expressions are essential for sophisticated text processing, and it is generally easy to transfer knowledge of Perl regular expressions to the SAS functions prxparse, prxmatch, prxposn, etc. However, use caution with the end of line character ($) because of how SAS treats whitespace.</p>
<p>For demonstration I will run what looks like an equivalent use of regular expressions in Python, JavaScript, and SAS, but notice that only SAS does not match the string.</p>
<pre class="prettyprint lang-py">
# Python 2.7
import re
first_name ='Andrew '
first_name = first_name.strip()
if re.search(r"^Andrew$", first_name):
print 'match' # it does match
else:
print 'no match'
</pre>
<pre class="prettyprint lang-js">
/*JavaScript */
first_name ='Andrew ';
first_name = first_name.trim();
if (first_name.match(/^Andrew$/))
alert('match'); /* it does match */
else alert('no match');
</pre>
<p>(<a rel="external nofollow" href="https://jsfiddle.net/rwj2fp7k/">JavaScript fiddle for this code</a>.)</p>
<pre class="prettyprint">
data x;
first_name='Andrew ';
first_name=strip(first_name);
match=prxmatch('/^Andrew$/', first_name); /* it does not match (match=0) */
run;
</pre>
<p>In SAS ignore the trailing whitespace using the trim() function:</p>
<pre class="prettyprint">
data x;
first_name='Andrew ';
match=prxmatch('/^Andrew$/', trim(first_name)); /* it does match */
run;
</pre>
<p>SAS, however, does not distinguish a string that was inserted with trailing spaces from a string that was inserted without trailing spaces. In the following SAS-only example imagine the table was created and populated using a non-SAS system like MySQL or Microsoft SQL Server.</p>
<pre class="prettyprint">
proc sql;
create table names (
first_name varchar(8)
);
insert into names values ('Andrew'); /* no trailing spaces */
insert into names values ('Andrew '); /* one trailing space */
insert into names values ('Andrew ');
quit;
data names;
set names;
length=length(first_name);
match1=prxmatch('/^Andrew$/', first_name);
match2=prxmatch('/^Andrew$/', trim(first_name));
run;</pre>
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvke4q7NvYYzNVmuBO0-HUevTofAydTbeCuA34jjvWugzQo_6QSu_goOWdeU3N4wflz5I2pHCu5cSViDH9SXNYmhVHQiDJw7poSGVWz1HXhYOIC8lWmpR6p7g2-DG6MyIvk6fvHabsce-M/s1600/sas_names.png" alt="Screenshot of SAS data set"/>
<p>This was tested with SAS 9.4M3 on Microsoft Windows 7.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com3tag:blogger.com,1999:blog-7666066816041190661.post-90323432501550011152015-06-09T14:02:00.002-06:002015-06-09T14:02:36.699-06:00List of user-installed R packages and their versions<p>This R command lists all the packages installed by the user (ignoring packages that come with R such as <em>base</em> and <em>foreign</em>) and the package versions.</p>
<pre class="prettyprint lang-r">
ip <- as.data.frame(installed.packages()[,c(1,3:4)])
rownames(ip) <- NULL
ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
print(ip, row.names=FALSE)
</pre>
<p>Example output</p>
<pre>
Package Version
bitops 1.0-6
BradleyTerry2 1.0-6
brew 1.0-6
brglm 0.5-9
car 2.0-25
caret 6.0-47
coin 1.0-24
colorspace 1.2-6
crayon 1.2.1
devtools 1.8.0
dichromat 2.0-0
digest 0.6.8
earth 4.4.0
evaluate 0.7
[..snip..]
</pre>
<p>Tested with R 3.2.0.</p>
<p>This is a small step towards managing package versions: for a better solution, see the <em>checkpoint</em> package. You could also use the first column to reinstall user-installed R packages after an R upgrade.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-42599446120622864692015-03-03T09:26:00.002-07:002016-01-29T14:37:50.330-07:00SAS 9.4 crash with MySQL ODBC pass-through queries<p>SAS 9.4 (TS1M2) on X64_DS08R2 (Windows Server 2008 64-bit) always crashes with certain pass-through queries using MySQL Connector/ODBC 5.3.4. When it crashes, the SAS log shows some red messages, but SAS closes immediately. </p>
<p>The crash is not reproducible with other ODBC drivers, on SAS 9.3 64-bit, or SAS 9.4 32-bit.</p>
<p>Workarounds include: using an ODBC DSN instead of the connection string, not using pass-through queries, or using SAS 9.3.</p>
<p>SAS agreed to fix the bug.</p>
<a name='more'></a>
<p>This shows how to procedure it
<pre class="prettyprint">
/* Trace log */
options sastrace=',,d,d' sastraceloc=file 'c:\temp\mytracefile.log';
/* This does not crash */
libname ensembl odbc
required="Driver={MySQL ODBC 5.3 Unicode Driver};Server=ensembldb.ensembl.org;Database=aedes_aegypti_core_48_1b;Uid=anonymous;interactive=1;";
/* This does not crash */
proc sql;
create table x as
select *
from ensembl.analysis;
quit;
/* This crashes */
proc sql;
connect using ensembl;
create table x as
select *
from connection to ensembl (
show databases;
);
quit;
</pre>
<p>This bug was present in SAS 9.4M2, and it was fixed in SAS 9.4M3.</p><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-57751969354243938352015-02-10T16:07:00.000-07:002015-02-13T10:07:10.042-07:00Autocommit with ceODBC is slow<p>You already know that in Python it is faster to call executemany() than repeatedly calling execute() to INSERT the same number of rows because executemany() avoids rebinding the parameters, but what about the effect of autocommit on performance? While this is probably not specific to ceODBC, using autocommit is astonishingly slow. Here is how slow.</p>
<p>First, the Python code to run the benchmark:</p>
<pre class="prettyprint lang-python">
import ceODBC
import datetime
import os
import time
connection_string="driver=sql server;database=database;server=server;"
print connection_string
conn = None
cursor = None
def init_db():
import ceODBC
global conn
global cursor
conn = ceODBC.connect(connection_string)
cursor = conn.cursor()
def table_exists():
cursor.execute("select count(1) from information_schema.tables where table_name='zzz_ceodbc_test'")
return cursor.fetchone()[0] == 1
def create_table():
print('create_table')
create_sql="""
CREATE TABLE zzz_ceodbc_test (
col1 INT,
col2 VARCHAR(50)
) """
try:
cursor.execute(create_sql)
assert(table_exists())
except:
import traceback
traceback.print_exc()
rows = []
for i in xrange(0,10000):
rows.append((i,'abcd'))
def log_speed(start_time, end_time, records):
elapsed_seconds = end_time - start_time
if elapsed_seconds > 0:
records_second = int(records / elapsed_seconds)
# make elapsed_seconds an integer to shorten the string format
elapsed_str = str(
datetime.timedelta(seconds=int(elapsed_seconds)))
print("{:,} records; {} records/sec; {} elapsed".format(records, records_second, elapsed_str))
else:
print("counter: %i records " % records)
def benchmark(bulk, autocommit):
init_db()
global conn
global cursor
conn.autocommit=True
cursor.execute('truncate table zzz_ceodbc_test')
conn.autocommit = autocommit
insert_sql = 'insert into zzz_ceodbc_test (col1, col2) values (?,?)'
start_time = time.time()
if bulk:
cursor.executemany(insert_sql, rows)
else:
for row in rows:
cursor.execute(insert_sql, row)
conn.commit()
end_time = time.time()
cursor.execute("select count(1) from zzz_ceodbc_test")
assert cursor.fetchone()[0] == len(rows)
log_speed(start_time, end_time, len(rows))
conn.autocommit=True
del cursor
del conn
return end_time - start_time
def benchmark_repeat(bulk, autocommit, repeats=5):
description = "%s, autocommit=%s" % ('bulk' if bulk else 'one at a time', autocommit)
print '\n******* %s' % description
results = []
for x in xrange(0, repeats):
results.append(benchmark(bulk, autocommit))
print results
benchmark_repeat(True, False)
benchmark_repeat(True, True)
benchmark_repeat(False, True)
</pre>
<p>And to graph the results in R:</p>
<pre class="prettyprint lang-r">
results_table <- 'group seconds
bulk_manual 0.6710000038146973
bulk_manual 0.6710000038146973
bulk_manual 0.9830000400543213
bulk_manual 0.7330000400543213
bulk_manual 0.6710000038146973
bulk_auto 8.486999988555908
bulk_auto 8.269000053405762
bulk_auto 8.980999946594238
bulk_auto 8.453999996185303
bulk_auto 8.480999946594238
one_at_a_time 24.391000032424927
one_at_a_time 23.70300006866455
one_at_a_time 71.66299986839294
one_at_a_time 23.58899998664856
one_at_a_time 37.18400001525879'
results <- read.table(textConnection(results_table), header = TRUE)
closeAllConnections()
library(ggplot2)
ggplot(results, aes(group, seconds)) + geom_boxplot()
</pre>
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCBHXcV0YSMgmS_kc2Xy8GKQlB1xJKebqaRJnCIP4bpZfZsmsplh6c094lYlAvC2cgbir4tdT-_AFDmNGGIFx3V4sVYX6a4ck_boA_GS1lYgNPVg_9ohhlvl0zc-vP3DFfrDvB0tKhQFB-/s1600/ceodbc_autocommit.png" />
<p><b>Conclusion: executemany() with autocommit is 76% faster than execute(), and executemany() without autocommit is 91% faster than executemany() with autocommit.</b> Also, executemany() gives more consistent performance.</p>
<p>Ran on Windows 7 Pro 64-bit, Python 2.7.9 32-bit, ceODBC 2.0.1, Microsoft SQL Server 11.0 SP1, R 3.1.2.</p>
<div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com2tag:blogger.com,1999:blog-7666066816041190661.post-67287999903663485652015-02-04T14:35:00.001-07:002015-02-04T14:48:42.289-07:00Party like it's 19999 (SAS)<p>On 03OCT2014 I must have missed the party in Cary, NC.</p>
<pre class="prettyprint">
data _null_;
format date date9.;
date = 19999;
put date=;
run;
</pre>
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJ3fne8ivdLCUjBVhH3y22HtNdZKIPo77r5pB2LF-fk-fTVq7hUpy-Wz66C8yaVa4VyAiQ5mwaf_2gdZHfCQp5bDZs3opcVDHdaNQ26cojsOkR1X6s-S-XLeC1feNQDaYepanuNsVEVg1m/s1600/prince19999.jpg" />
<p>So the next party is 03JUN2568?</p>
<pre class="prettyprint">
data _null_;
format date date9.;
date = 222222;
put date=;
run;
</pre><div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0tag:blogger.com,1999:blog-7666066816041190661.post-47463425977750388642015-01-29T16:26:00.001-07:002015-03-03T09:27:55.121-07:00LimeSurvey is allergic to Cloudflare Rocket Loader<p>In case you use LimeSurvey with Cloudflare, you may want to disable Rocket Loader, which "automatically asynchronously load all JavaScript resources." In LimeSurvey it causes problem saving questions (the save button does not do anything), disables tooltips for buttons in the administrative interface (so the icons are hard to interpret), and maybe causes other problems.</p>
<p>If you are not sure whether Rocket Loader is enabled, just look at the HTML source. If it is enabled, you will see "rocketloader" in the HTML source.</p>
<p>Cloudflare's Auto Minify seems safe to use.</p>
<p>I tested with LimeSurvey Version 2.05+ Build 141229, Firefox 35, and Google Chrome 40.<div class="blogger-post-footer"><p>For more posts like this, see <a href="http://heuristicandrew.blogspot.com/">Heuristic Andrew</a>.</p></div>Andrew Zhttp://www.blogger.com/profile/10108637160465346326noreply@blogger.com0