Getting data

Download Report

Transcript Getting data

The first step in data analysis
Learning Objective
Using SAS/BASE® to connect to third-party relational data
base software to extract data needed for
 program evaluation
 research using administrative data
 operational reports e.g. routine surveillance
1. What is a relational database?
2. Contact your DBA for how to connect to your
database(s)?
3. How to write queries using PROC SQL
SHRUG, 2014-05-02
1
What is a relational database?
 Set of tables
 tables made up of rows and columns
 Trade names of relational databases (RDB):
 Oracle, Teradata, SQL Server, DB2, Access
 RDB is software which is designed to retain large amounts
of data
 transactional DB
 reporting/warehousing DB
SHRUG, 2014-05-02
2
What is a relational database?
 Transactional DB designed to increase the speed for front-
end users
 complex table and table join structures
 Warehousing DB designed for efficient storage and
retrieval for reporting
 simpler table designs and table join structures
 Queries for either design use same syntax (code)
 queries for warehouses will be simpler to write
SHRUG, 2014-05-02
3
What is a relational database?
 Why use relational databases?
 relational databases use a concept called “normalization”
 Normalization reduces the amount of redundant data and
allows for updates to data with less error
 There are degrees of normalization



first degree
second degree
third degree and higher degrees
SHRUG, 2014-05-02
4
First degree normalization
 First degree normalization


each row pertains to a single entity: a patient, an encounter, a
physician
each column pertains to a characteristic of the entity: e.g. date of
birth, sex, date of encounter, etc
Table 1: Subjects with demographic information
ID
FirstName
Gender
BirthCity
BirthCountry
0001
John
M
Moncton
Canada
0002
Devbani
F
Kolkata
India
SHRUG, 2014-05-02
5
Violation of first degree normalization
Table 1: Subjects with improper 1NF
SubjID
FirstName
Gender
BirthCity
BirthCountry
0001
John
43
Moncton
New Brunswick
0002
Raha
F
West Bengal
India
 What impact does violating the first degree
normalization have on your query
 if you want all patients born in Canada?
 if you want all male patients?
SHRUG, 2014-05-02
6
Second degree normalization
 Table 2 has employer information about rows in Table 1
Table 2: Business addresses
Name
City
Prov
PostalCode
John
Halifax
NS
B3K 6R8
Devbani
Halifax
NS
B3H 2Y9
 The table above has some redundant information:
 name is repeated from Table 1, province is embedded in
the postal code
 Better design – two or even 3 tables
SHRUG, 2014-05-02
7
Second degree normalization
Table 2: Revised with 2NF
SubjID
PostalCode
0001
B3K 6R8
0002
B3H 2Y9
Table 3: Creating a secondary table for 2NF
PostalCode
City
Prov
B3K 6R8
Halifax
NS
B3H 2Y9
Halifax
NS
SHRUG, 2014-05-02
8
Second degree normalization
 Table 2 now no longer contains name – it’s replaced with
the subject ID
 to get the subject’s name we link the table to the table in
the first example, using SUBJID/ID column
 we get the province and city by linking Table 2 and 3 using
the POSTALCODE column
 SUBJID is a primary key in Tables 1 and 2
 POSTALCODE is a foreign key in Table 2, but a primary key in
Table 3
SHRUG, 2014-05-02
9
Primary/Foreign Keys
 primary key – a column or combination of columns that
uniquely identify each row in the table
 e.g. patient medical record needs at least 3 columns to
identify a unique record: patient ID, date of encounter, and
provider ID
 foreign key – a column or combination of columns that is
used to link data between two tables
SHRUG, 2014-05-02
10
Questions about 2NF?
 Can you see the advantage of splitting the data into
different tables?
 share examples of your data where normalization is used
 higher degrees of normalization work similarly to the
examples above
 you have to go through more tables for higher levels of
normalization in order to link to the data that you need
SHRUG, 2014-05-02
11
Getting access to data: What do
you need from DBA?
 Explain to DBA that you need to query data, but have no
need to write to the database
 this helps them to determine where you belong on a user
matrix
 DBA or IT install necessary software on your machine
 Google has lots of information on SAS Connect
 SAS Connect documentation
SHRUG, 2014-05-02
12
How SAS authenticates
• User name is provided by DBA/IT
• In this example the password is
held in the macro DBPASS
• Statement to have Oracle print
any messages to the SAS log
proc sql;
connect to oracle
(user = <userid>
password="&dbpass”
path = prod );
%put &sqlxmsg;
This is an example of “pass-through” code
SHRUG, 2014-05-02
13
Using a LIBNAME to connect
 Recall that slide 13 showed pass-through facility in SAS
 most of the query is done on the database
 Can use libname statement to connect instead of pass-
through
 advantage to this method is that you are programming in
SAS (using SAS functions and formats)
 SAS determines which program (SAS or RDB) will handle
statements more efficiently
SHRUG, 2014-05-02
14
Using a LIBNAME to connect
Example using a libname statement:
1.
2.
3.
libname onco odbc dsn='Oncolog'
schema=dbo;
1. The name of the library
2. Tells SAS that you are using an ODBC engine
3. DSN – use the name of the database that was used to
set up the odbc connection
NOTE: schema statement is not always required
SHRUG, 2014-05-02
15
Seeing your data - Views
 Once view is created, you use
the EXPLORER tab in SAS and
use as normal dataset
SHRUG, 2014-05-02
16
Seeing your data - Views
Using the “view columns” in SAS EXPLORER
SHRUG, 2014-05-02
17
Seeing your data - Views
• Double click on table to get to see the data
• NOTE: columns that identify personal information have been
removed from this screen shot
SHRUG, 2014-05-02
18
Other ways to view data
 You may have software from the RDB:
 TOAD (for Oracle)
 SQL Developer (for Oracle)
 SQL Server
 Teradata
 All vendors may have some limited function
“development” software that allows:
 Viewing data
 Viewing the “type” of a column: char, num, date, etc.
 Writing SQL queries
SHRUG, 2014-05-02
19
Sample view from SQL Developer
SHRUG, 2014-05-02
20
Syntax: Single table - 1 of 2
PROC SQL
DATA STEP
proc sql;
create <table/view> <name>
as
select <var1>
, <var2>
, etc
from <table/view>
where <apply data filters>
quit;
data <dataset name>;
set <dataset> (
keep= <list of variables>
where=(<apply filters>));
run;
Example: Create a dataset (table) with men aged 50 to 74.
Assume the source table is called “demographics” and
contains variables: subjectID, age and sex
SHRUG, 2014-05-02
21
Syntax: Single table – 2 of 2
PROC SQL
proc sql;
create table
men5074 as
select subjectID
, age
from
work.demographics
where sex=‘M’
and age
between 50 and 74
;
quit;
SHRUG, 2014-05-02
DATA STEP
data men5074
(drop=sex);
set
work.demographics
(keep=subjectid sex
age
where=(sex='M' and
50<=age<=74));
run;
22
Syntax: Multiple tables
Request for report received
Realize as you look through data elements needed for
complete the request that relevant columns reside in two
or more tables
Background Information
1. Need to know which tables and which columns are
relevant. Useful to have data dictionary, otherwise 3rd
party software very helpful
2. Need to know what filters to apply: sex, time period of
interest, diagnosis codes, etc are all commonly applied
filters
SHRUG, 2014-05-02
23
Map: Multiple tables
• Create a map to guide your query
• names of tables that go in ‘FROM’ statement of
SQL or ‘SET’ statement in DATA step
• names of columns that you need
• use meaningful arrows to connect
work. table1
work.table2
uniqueID
uniqueID
filter 1
filter
filter 2
<other columns>
SHRUG, 2014-05-02
<other columns>
24
Syntax: Multiple table DATA step –
1 of 2
*** sort the first dataset;
proc sort data=<dataset1>;
by <var(s)>;
run;
*** sort the second dataset;
proc sort data=<dataset2>;
by <var(s)>; *** same var(s) as first sort;
run;
SHRUG, 2014-05-02
25
Syntax: Multiple table DATA step 2 of 2
*** find records common to both tables;
data <result dataset>;
merge <dataset1> (in=in_a)
<dataset2> (in=in_b);
by <var(s)>;
*** we only want a list of records with
data in table a AND in table b;
if in_a and in_b;
run;
SHRUG, 2014-05-02
26
Syntax: Multiple table PROC SQL using
temporary tables
proc sql feedback;
create table <result table name> as
select <columns from either or both tables below>
from
*** temporary table from table1;
(select <column(s)>
from <table 1>
where <apply filter(s)>) a
inner join
*** temporary table from table2;
(select <column(s)>
from <table 2>
where <apply filter(s)>) b
on a.<pk>=b.<pk>
;
quit;
SHRUG, 2014-05-02
27
Syntax: Multiple table PROC SQL. “Oraclestyle (PL/SQL)”
proc sql feedback;
create table <result table> as
select <columns from one or more tables>
from <table1> a
, <table2> b
where a.pk=b.pk
<apply additional filters>
;
quit;
SHRUG, 2014-05-02
28
Types of joins
RIGHT join – join Table A to B only if an
observation exists in Table B
SHRUG, 2014-05-02
29
Compare syntax
PROC SQL
DATA STEP
 no need to sort temporary
 needed to sort data by
tables
 needed to think about type of
join
 in this case wanted patients
only if they were in both tables
 join columns need to be same
type but can have different
names (slides 6 and 9)
subjectID
 key variable to join
demographic to cancersite
table
 “by” variables need to have
same name and type
What would you do if you found out that one record in table 1 matched to multiple
records in table 2?
SHRUG, 2014-05-02
30
Issues arising from multiple table
queries
 Table relationships are important:
 one-to-one: each record in first table has a maximum of
one record in the second table (through primary key)
 one-to-many: each record in one table may have multiple
rows in second table. Example:


Table 1 contains all patients with a flag indicating whether or not
they are “active”
Table 2 contains all GP appointments for each patient
 many-to-many
SHRUG, 2014-05-02
31
Task 1 - single table
Task 2 – two tables
Task 3 – multiple tables
Task 4 – reusing a table multiple times
SHRUG, 2014-05-02
32
Task 1a - Participants
You are asked to provide a count of the female participants
in a cancer screening program who are aged 50 years as of
May 31, 2013. Break down the birth dates by month
Approach 1
Create view of the table required and use SAS to analyze
data
SHRUG, 2014-05-02
33
Task 1a – background information
 demographic data for participants is stored in table
“PARTICIPANTS”
 sex_cd is a coded variable: 222=F, 223=M, 240=U
 birth_dt is the column containing birth dates
 although birth_dt appears as a date type column, in SAS
Oracle dates are datetime types in SAS
 For a participant to be considered 50 years of age on May
31, 2013, their birthday must occur between June 1, 1962
and May 31, 1963
SHRUG, 2014-05-02
34
Task 1a – Create view using pass-through
code
proc sql feedback noprint;
connect to oracle as myconn (user=&userid password=&pw
path=&path);
create view participant as
select *
from connection to myconn
(select *
from csprod.participant
where sex_cd=222
and trunc(birth_dt) between
to_date('19620601','YYYYMMDD‘) and
to_date('19630501','YYYYMMDD‘)
and del_dt is null
);
disconnect from myconn;
quit;
SHRUG, 2014-05-02
35
Task 1a – Create view: Parsing the
code. Slide 1 of 3
 Create view participant
 this syntax translates to “Create a view called ‘participant’
 select *
 ‘*’ is a wildcard and means select all
 where
 “%”, “_” – see Task 1b

multiple (%) or single (_) byte of data, in contrast to the entire
column. Only used to scan a column.
SHRUG, 2014-05-02
36
Task 1a – Create view: Parsing the
code. Slide 2 of 3
 trunc()
 recall that SAS will treat Oracle, Teradata, SQL dates as
DATETIME
 trunc() is an Oracle function that looks only at the DATE
part of the column
SHRUG, 2014-05-02
37
Task 1a – Create view: Parsing the
code. Slide 3 of 3
 to_date(‘<yourdate>’, ‘<yourdate format>’)
 in this example to_date(‘19620601’, ‘YYYYMMDD’)
 take the string 19620601 and treat it as a date with the
format YYYYMMDD
 could use other formats: YYYYMONDD, MM-DD-YYYY, etc
 BETWEEN operator – works as you would expect, includes
both lower and upper limits specified
SHRUG, 2014-05-02
38
Task 1a – Contents of view
#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Variable
Variables in Creation Order
Type Len Flags Format
Informat
Label
PARTICIPANT_ID
Num
8 P--
FIRST_NAME
Char
LAST_NAME
Char
SEX_CD
Num
BIRTH_DT
21.
21.
PARTICIPANT_ID
100 P--
$100.
$100.
FIRST_NAME
100 P--
$100.
$100.
LAST_NAME
8 P--
21.
21.
SEX_CD
Num
8 P--
DATETIME20.
DATETIME20.
BIRTH_DT
MIDDLE_NAME
Char
100 P--
$100.
$100.
MIDDLE_NAME
PRFD_NAME
Char
20 P--
$20.
$20.
PRFD_NAME
PRFD_LANG_CD
Num
8 P--
21.
21.
PRFD_LANG_CD
NO_FUTURE_CONTACT_IND
Char
1 P--
$1.
$1.
NO_FUTURE_CONTACT_IND
EMAIL
Char
100 P--
$100.
$100.
EMAIL
PRFD_CONTACT_METHOD_CD
Num
8 P--
21.
21.
PRFD_CONTACT_METHOD_CD
PART_STATUS_CD
Num
8 P--
21.
21.
PART_STATUS_CD
STATUS_DT
Num
8 P--
DATETIME20.
DATETIME20.
STATUS_DT
STATUS_SOURCE_CD
Num
8 P--
21.
21.
STATUS_SOURCE_CD
TWIN_IND
Char
1 P--
$1.
$1.
TWIN_IND
COMMENT_TXT
Char
200 P--
$200.
$200.
COMMENT_TXT
ROW_DT
Num
8 P--
DATETIME20.
DATETIME20.
ROW_DT
ROW_USER_ID
Char
20 P--
$20.
$20.
ROW_USER_ID
DEL_DT
Num
8 P--
DATETIME20.
DATETIME20.
DEL_DT
DEL_USER_ID
Char
20 P--
$20.
$20.
DEL_USER_ID
DEL_REASON_CD
Num
8 P--
21.
21.
DEL_REASON_CD
VERSION_NUM
Num
8 P--
21.
21.
VERSION_NUM
RELIGION_CD
Num
8 P--
21.
21.
RELIGION_CD
SHRUG, 2014-05-02
NOTE: The
contents show
exactly the
same columns
as slide 31
39
Task 1a – Count of birth month
dob Frequency Percent
June
1184
7.56
July
1207
7.71
August
1241
7.92
September
1232
7.87
October
1232
7.87
November
1104
7.05
December
1053
6.72
January
3657
23.35
February
1113
7.11
March
1327
8.47
April
1266
8.08
May
45
0.29
Cumulative
Frequency
1184
2391
3632
4864
6096
7200
8253
11910
13023
14350
15616
15661
Cumulative
Percent
7.56
15.27
23.19
31.06
38.92
45.97
52.70
76.05
83.16
91.63
99.71
100.00
• More detailed analysis uncovered that missing month/day combinations were
defaulted to January 1
SHRUG, 2014-05-02
40
Task 1b - Like operator (1 of 4)
REQUEST
Find a list of CCI codes for hysterectomy
 use of single table
 example of PROC SQL with SAS data
 filter using “like” operator
SHRUG, 2014-05-02
41
Task 1b – Like operator (2 of 4)
SHRUG, 2014-05-02
42
Task 1b – Like operator (3 of 4)
proc sql feedback;
create table hystcd as
select a.f1 as cci_code
, a.f3 as long_desc
, substr(a.f1,1,5) as rubric
from work.cci_raw as a
where upcase(a.f3) like '%HYSTERECTOMY%'
or (upcase(a.f3) like '%EXCISION%'
and upcase(a.f3) like '%UTERUS%')
or (upcase(a.f3) like '%EXCISION%'
and upcase(a.f3) like '%CERVIX%')
order by a.f1
;
quit;
SHRUG, 2014-05-02
43
Task 1b – Like operator (4 of 4)
RESULTING DATASET
SHRUG, 2014-05-02
44
Task 2 – Participants in the CRC
program
 Report the number of men and women who turned 60 as
of May 31, 2013, enrolled in the colorectal cancer
screening program. Do not include participants with
unknown sex
 participant table contains demographic information: sex,
birth date
 participant_program table contains data for participants
and screening program


program_id=1 indicates colorectal cancer screening program
program_status_cd=263 indicates that a participant is active in
the program
SHRUG, 2014-05-02
45
Task 2 – Mapping your query
CSPROD.PARTICIPANT
participant_id
sex_cd ≠ 240
birth_dt between 01Jun1952
and 31May1953
CSPROD.PARTICIPANT_PROGRAM
participant_id
program_status_cd=263
program_id=1
del_dt is null
del_dt is null
SHRUG, 2014-05-02
46
Translating your map into sql code
- 1 of 3
*** METHOD 2 - Oracle pass through. Simple code;
proc sql feedback noprint;
connect to oracle as myconn (user=&userid
password=&pw path=&path);
create table part60 as
select *
from connection to myconn
(select ptc.gender
, count(*)
from (select participant_id
, sex_cd
SHRUG, 2014-05-02
47
Translating your map into sql code
– 2 of 3
*** METHOD 2 - Oracle pass through. Simple code;
, case
when sex_cd=222 then 'F'
else 'M'
end as gender
from csprod.participant
where trunc(birth_dt) between
to_date('19520601','YYYYMMDD') and
to_date('19530531','YYYYMMDD')
and sex_cd <> 240
and del_dt is null) ptc
SHRUG, 2014-05-02
48
Translating your map into sql code
– 3 of 3
inner join
(select participant_id
from csprod.participant_program
where program_id=1
and program_status_cd=263
and del_dt is null) pp
on ptc.participant_id=pp.participant_id
group by ptc.gender
;
disconnect from myconn;
quit;
SHRUG, 2014-05-02
49
Parsing Code – 1 of 3
proc sql feedback
noprint;
connect to oracle
 Create a SAS dataset
as myconn
called “part60”
(user=&userid
password=&pw
 select all columns from
path=&path);
the query (seen after
create table
connection statement)
part60 as
select *
from connection to
myconn
SHRUG, 2014-05-02
50
Parsing the code - 2 of 3
(select ptc.gender
, count(*)
from (select participant_id
, sex_cd
, case
when sex_cd=222
then 'F'
else 'M'
end as gender
from csprod.participant
where trunc(birth_dt)
between
to_date('19520601','Y
YYYMMDD') and
to_date('19530531','Y
YYYMMDD‘)
and sex_cd <> 240
and del_dt is null)
ptc
SHRUG, 2014-05-02
Put these columns in
the SAS dataset part60
Create a
temporary table
called ‘ptc’
Table PTC
contains
columns as
listed
from the
PARTICIPANT
table, with the
restrictions
shown in the
WHERE clause
51
Parsing the code – 3 of 3
inner join
(select participant_id
from csprod.participant_program
where program_id=1
and
program_status_cd=263
and del_dt is null)pp
on
ptc.participant_id=
pp.participant_id
group by ptc.gender
;
disconnect from myconn;
quit;
SHRUG, 2014-05-02
Create temporary
table, ‘PP’ from
PARTICIPANT_PROGRAM
with restrictions
defined in the
WHERE clause
52
Joins – for joining two or more
tables
 This example shows an inner join: want participants, and
the # males and females participating in CRC screening
program age 60 as of May 31, 2013
PTC
C
SHRUG, 2014-05-02
PP
• Area C is the result of the
inner join
• Temporary table PTC: a
subset of
csprod.participant
• Temporary table PP: a
subset of
csprod.part_program
53
Task 2 - Results
 What will be the query result?
 What’s the table/dataset name?
 How many rows?
 How many columns?

What are the columns called?
SHRUG, 2014-05-02
54
Task 2 - Results
SHRUG, 2014-05-02
55
Task 3 – Patients with kidney cancer
REQUEST
 Find number of patients with invasive kidney cancer (ICD-O3=C64.9) diagnosed between 2008 and 2010. Breakdown
counts by age and sex. Interested in age < 60 and age ≥ 60
BACKGROUND
 remove any patients who were deleted
 remove any tumors that were deleted
 diagnoses are in table called “oldiagnostic”
 sex is in table called “olpatient”
 birth date in table called “person”
SHRUG, 2014-05-02
56
Task 3 - Map
onco.oldiagnostic
personser ≠ deleted
diagnosticser ≠ deleted
onco.olpatient
personser
olsex
dxstate=‘NS’
substr(icdohistocode,6,1)='3'
year(dateinitial..) in (2008,
2009, 2010)
onco.person
personser
persontype=‘patient’
datepart(dateofbirth)
SHRUG, 2014-05-02
57
Task 3 – Code (1 of 5)
proc sql feedback;
create table onco_coh as
select a.*
, b.olsex
, f.birth_dt
, floor(yrdif(f.birth_dt,a.initdx_dt,'act/act'))
as ageatdx
from
SHRUG, 2014-05-02
58
Task 3 – Code (2 of 5)
/*** get cases ***/
(select o.personser
, o.diagnosticser
, datepart(o.DateInitialDiagnosis) as initdx_dt
format=date9.
, o.icdositecode
from onco.oldiagnostic o
where o.icdositecode in ('C64.9')
/*** only invasive cancers ***/
and substr(o.icdohistocode,6,1)='3'
and year(o.dateinitialdiagnosis) between 2008
and 2010
and o.dxstate='NS’
SHRUG, 2014-05-02
59
Task 3 – Code (3 of 5)
/*** patient not deleted ***/
and o.personser not in
(SELECT ps1.Personser
FROM onco.OlPatientSup ps1
WHERE ps1.PersonSer = o.PersonSer
and ps1.identifier = 'CCRPatientReportingStatu'
AND ps1.String IN ('04','05')
and ps1.FieldSeq = 0)
SHRUG, 2014-05-02
60
Task 3 – Code (4 of 5)
/*** diagnosis not deleted ***/
and o.diagnosticser not in
(SELECT ds1.diagnosticser
FROM onco.OLdiagnosticsup ds1
WHERE ds1.PersonSer = o.PersonSer
and o.diagnosticser = ds1.diagnosticser
and ds1.identifier = 'CCRPrimaryReportingStatu'
AND ds1.String IN ('04','05')
and ds1.FieldSeq = 0)) a
SHRUG, 2014-05-02
61
Task 3 – Code (5 of 5)
/*** get patient's sex ***/
left join
(select personser
, olsex
from onco.olpatient) b
on a.personser=b.personser
/*** get birth date ***/
left join
(select personser
, datepart(DateOfBirth) as birth_dt format=date9.
from onco.person
where lowcase(persontype)='patient') f
on a.personser=f.personser
;
quit;
SHRUG, 2014-05-02
62
Task 3 - Results
Sex
Age at diagnosis
Under 60
60 and older
Total
M
128
247
375
F
76
147
223
204
394
598
Total
SHRUG, 2014-05-02
63
• Self-join
• Correlated sub-query
• Outer from and where
• UNION
SHRUG, 2014-05-02
64
What is the sound of one table joining?
77 /* select candidates for babes becoming mothers */
78
proc
sql
79
;
80
create table
Candidates as
81
select B1.BrthDate
82
, B1.BirthID
83
, B2.DLMBDate
84
, B2.ContctID
85
from SASDM.DelnBrth as B1 /* babes */
86
, SASDM.DelnBrth as B2 /* mums */
87
where
B1.BrthDate = B2.DLMBDate;
NOTE: Table WORK.CANDIDATES created, with 855040 rows and 4
columns.
SHRUG, 2014-05-02
65
Correlated Sub-query
 OB/Research has data in:
 Clinical ultrasound db
 Maternal serum screening db
 Objective: find all mothers with abnormal screening and
see if the ultrasound indicated risk for restricted growth
(small baby)
SHRUG, 2014-05-02
66
Correlated Sub-Query
create table
Work.WithAtlee as
/* VP data only available after 2003 not
2000 */
select
One18.*
/* 18-wk US */
, M.MO365/*perinatal data*/
, M.Wgt4Age
/* 45 lines omitted here */
SHRUG, 2014-05-02
67
The Sub-Query
, M.DLPrvNND
, M.DLPrvFTD /* no such variable as IUGR in
*/
, M.MotherID in /* previous pregnancy - back link */
( select
Prev.MotherID
from
SASDM.DelnBrth as Prev
where
Prev.MotherID = M.MotherID
and Prev.Wgt4Age in ( 1, 2)/* pick<5th*/
and Prev.BrthDate < M.BrthDate )
as Previous_IUGR
, M.DLPrvLBW
, M.Prev_PTD
SHRUG, 2014-05-02
68
Outer From and Where
from
Work.VP1USper18 as One18
, SASDM.Monster as M /*inner join-only want */
where
(One18.ContctID = M.ContctID)
/* matches, in both databases, and linkable */
and ( M.DLDschD8 between
'01Oct2003'D and '30Sep2008'D ) )
/* 5 full years, 10/2003 on */
and ( not Major_Anom );
/* clarified this as complete
exclusion */
SHRUG, 2014-05-02
69
For the UNION makes us strong
create table Work.NoPlace as
select VS.*
, . as GA
, . as CountyNum
from
Work.VS_2302 as VS
where
VS.VS_Deaths_ID not in
( select
VS_Deaths_ID
from
Work.FADBPlace
union
select
VS_Deaths_ID
from
Work.NSAPDPlace )
and ( VS.Age_Code in ( 1, 2 ) )
and (BrthD8 between '01Jan2010'D and '31Dec2010'D);
SHRUG, 2014-05-02
70
Review code, functions, operators
Recap learning objectives
SHRUG, 2014-05-02
71
Code/Functions/Operators
 proc sql skeleton
 libname to connect to database
 pass-through to connect to database
 trunc(): an Oracle function
 to_date(): an Oracle function
 case-when: SQL statements
 is null/is not null
 wildcards: “*”, “_”
SHRUG, 2014-05-02
72
Recap of objectives
RDB
Connecting in SAS
PROC SQL
tables
ODBC
compare to datastep
normalization
Oracle client server
single table
keys: primary/foreign
info from IT
multiple tables
joins
connecting: libname vs
pass-through
self-joins
http://support.sas.com/documentation/cdl/en/acreldb/65247/PDF/default/acreldb.pdf
SHRUG, 2014-05-02
73
John Fahey
Biostatistician, Halifax
[email protected]
Devbani Raha
Cancer Care Nova Scotia
[email protected]
74
Photograph taken by Ken DeBacker, 2011
SHRUG, 2014-05-02