No Slide Title

Download Report

Transcript No Slide Title

Publishing SAS metadata
using macros, PROC SQL and
dictionary tables
John Fahey and Barry Campbell
Reproductive Care Program of Nova Scotia
[email protected]
Introduction
• Motivating Scenario
– Health Data Registry
– Denormalization  Wide Data
– Expert, frequent and casual users
– Mind the knowledge gap
– Variables with a history  need for date
information in metadata
Variables with a History
• All DJIA stocks – 30 companies, Stock price – high, low,
close values for every traded company: date of trade,
when this stock became a component
• Market share for NSAIDs – drugs come and go (e.g.
Vioxx)
• Medicaid claims: date of claim, amount
• Climate record – tree rings, ice cores, fossil records,
explicit temperature records – capture stops and starts
depending on medium
• Modelling with deletion of any row with Nas
– available
Gantt Chart
The Problem …
• Meet my monster
• Watch carefully for discharge date
Monster Movie
Ingredient List
• MACRO: Parameter passing & Code Generation
• Dictionary Tables: dynamic, plays well w/ SQL
• PROC SQL: creative, selective, insertive
• intnx function
• Serving Suggestion
Remember The Code
%start_end
(Atlee.monster1,
_num_,
DLAdmsD8,
1,
0,
Work.MetaTabl);
Assumptions
Meet your Macro Parameters
• DataSet Name – a valid SAS dataset name
• VarList – a list of variables in valid SAS
syntax, e.g. Var1 Var2 Var3 | Var1 – Var3
| Var1 -- Var3 | _numeric_ …
• DateVar – the name of the variable from
which you want to want the macro to
generate date ranges
Parameters (cont’d)
• Zero (flag) – pick out zeroes, or SASdefined missing values
• FY – fiscal year (flag) do you want to
assume that data that first appears in the
middle of a fiscal actually started being
available at the beginning?
• RslTbl – Result Table, the name of the
dataset into which the metadata is placed
Dictionary
• The VColumn view from SASHelp
• Source of metadata for all datasets
PROC SQL
Create table using column definitions
create table Work._MSSWork_
(VName char(32), VType char(4),
Vlabel char(256),
FirstD8
num format = YYMMDD10.,
LastD8
num format = YYMMDD10.,
NonMiss
num, NTotal
num,
RowNum
num);
– builds the framework for a dataset
PROC SQL - select :into
• into :hostvariable populates a macro variable
• Concatenates observations into one long delimited string
• &SQLObs “number of rows that were processed by an
SQL procedure statement”
proc sql;
select name into :AllNames separated by '|'
from sashelp.class;
%put &AllNames; %put &SQLObs;
Alfred|Alice|Barbara|Carol|Henry|James|Jane|Janet|Jeffrey|Jon|
Joyce|Judy|Louise|Mary|Philip|Robert|Ronald|Thomas|William
19
PROC SQL – insert tab A in slot B
insert into Work._MSSWork_
select "&ThisVar”, "&ThisType”, "&ThisLbl"
, %if &FY %then intnx ( 'Year.4', min ( &DateVar ), 0 , 'Beginning' );
%else min ( &DateVar );
, %if &FY %then intnx ( 'Year.4', max ( &DateVar ), 0 , 'End' );
%else max ( &DateVar );
, . , . , &I
from Work._Subset_
%if ( ( &Zero ) and ( &ThisType = num ) ) %then %do;
where &ThisVar NE 0 ; %end ;
%else %do;
where not ( missing ( &ThisVar ) ) ; %end;
MACRO oni
%do – iterates over &VarList (string of names)
%scan – picks out the &ith variable name and
type, assigns to &ThisVar &ThisType via %let
%if %then %else – distinguishes between num
and char and also processes flags FY and Zero
Thank God We %do Not Have To TYPE This
insert into Work._MSSWork_ select "Var1" ,
"char" , "" , min (ObsDate) , max (ObsDate) ,
. , . , 1 from Work._Subset_ where not
(missing (Var1)) ;
insert into Work._MSSWork_ select "Var2" , "num"
, "" , min (ObsDate) , max (ObsDate) , . , . ,
2 from Work._Subset_ where Var2 NE 0 ;
insert into Work._MSSWork_ select "Var3" ,
"char" , "" , min (ObsDate) , max (ObsDate) ,
. , . , 3 from Work._Subset_ where not
(missing (Var2)) ;
…
insert into Work._MSSWork_ select "Var1000" ,
"num" , "" , min (ObsDate) , max (ObsDate) , .
, . , 1000 from Work._Subset_ where not
(missing (Var1000)) ;
Meta Monster
Suggested Improvements
• Make the flags per-variable instead of global
• Generalize FY flag to allow rounding dates to
other time units, such as month or day
• Support Datetime as well as Date
• Refactor to avoid potentially long macro
variables, e.g. &VLabels, since SAS has limit of
216 - 2 characters
Resources
• Other talks, e.g. Kirk Lafler’s Sunday Workshop
that is already over and you didn’t go to …
• PROC SQL User’s Guide
• Ron Fehd’s %FreqAll and its offspring
• DaDaBik (www.dadabik.org)
• Macro Quoting Bug Killer:
– …sesug/2008/CS-049.pdf
• See references in the paper
Remember The Code
• Current version of code is “better, faster,
stronger”
• Available with the proceedings or from the
authors
Contact Info + TM
• [email protected][email protected]
TRADEMARK:
SAS ® and all other SAS Institute Inc. product or service names are registered trademarks of SAS Institute Inc. in the USA and other
countries. ® indicates USA registration. Other products and brand names are trademarks or registered trademarks of their respective
owners.