Correlated Subqueries in PROC SQL

Download Report

Transcript Correlated Subqueries in PROC SQL

Correlated Subqueries
in PROC SQL
Barry Campbell
Reproductive Care Program of Nova Scotia
Overview
• Quick review of PROC SQL
• Subqueries and examples
• Correlated subqueries and examples
Quick review of PROC SQL
•
•
•
•
SQL: Structured Query Language
Talk to relational databases
PROC SQL is SAS’s implementation SQL
Integrates elements of SAS language: e.g.,
functions, formats, labels
• Alternative to Data Step when you want to
think relationally
Terminology
PROC SQL Syntax
What is a subquery?
• SELECT statement embedded in a SQL query
instead of a column name, table name or
expression
• Use in SELECT, FROM, WHERE, or HAVING
clauses
• Scalar or vector depending on context
• Can refer to same or different table
• AKA “Nested” or “Inner” queries
Subquery in SELECT clause
SELECT
student_id,
(SELECT COUNT(*) FROM courses
WHERE student_id = S.student_id) num_courses,
(SELECT MAX(grade) FROM grades
WHERE student_id = S.student_id) best_grade
FROM
students S
Pull number of courses and best
grade from other tables. Could
also be done with a JOIN.
Subquery in FROM clause
Create inline table C to join
with A and B using IDs
A
(admits)
B
(patients)
C
(DISTINCT
doctors)
Subquery in WHERE clause
Systolic blood pressure
What is a correlated subquery
• Results of inner query constrained by outer
• Uses a common variable or key to correlate
inner and outer
• Typically in the WHERE clause
What’s it good for?
• Selection depends on aggregate results from
the same table
• Selection depends on data about same entity
in a different table
• Combine multiple steps into one
• Pass-through queries (execute on DBMS)
What’s it good for? Examples
• Improving grades:
– Show students whose average grade improved at
least 10% over last year’s
• Flooding events:
– List cities and dates on which rainfall was at least
10x the city’s daily average
Correlated Subquery Example
SELECT
student_id, name
FROM
students S
WHERE
(SELECT MEAN(grade) FROM grades
WHERE student_id = S.student_id AND year = 2011) >
(SELECT MEAN(grade) FROM grades
WHERE student_id = S.student_id AND year = 2010) * 1.1
Who improved their
average by at least 10%?
Correlated Subquery Example
SELECT
city_id, rainfall_mm, event_date
FROM
rainfall_data R
WHERE
rainfall_mm >
(SELECT MEAN(rainfall_mm) * 10
FROM rainfall_data
WHERE city_id = R.city_id)
Where and when was the
heavy rain?
Correlation: inner
to outer
Correlated Subquery Example
From patient visit registry, list all
patients and the maximum amount
charged for each one
Correlated Subquery Example
Which patients had vital signs
taken more than once in the
same visit?
Correlated Subquery Example
Show me profits on CPUs
from top performing
suppliers
Correlated Subquery Example
Build dataset from X but
exclude observations found in Y
Summary
• Subqueries and correlated subqueries:
compact way to write complex data
manipulation.
• Combine selection with aggregation
• Think relationally rather than procedurally.
• Good way to improve understanding of
relationships among database tables
Resources
•
PROC SQL for DATA Step Die-Hards, Christianna S. Williams
http://www2.sas.com/proceedings/forum2008/185-2008.pdf
Various data manipulation tasks using Data Step and SQL in an increasingly complex
series of examples, including sub-queries and correlated sub-queries.
•
Advanced SQL Processing, Destiny Corporation
http://www.nesug.org/proceedings/nesug02/hw/hw007.pdf
Advanced topics in PROC SQL including HAVING, FULL JOINs and creation of Views,
Indexes, and Data sets. Later examples of sub-queries and correlated sub-queries.
•
Working With Subquery in the SQL Procedure Lei Zhang, Domain Solutions Corp
http://www.nesug.org/proceedings/nesug98/dbas/p005.pdf - Advanced examples of
subqueries and correlated subqueries in all clauses of the SQL statement.
•
An Animated Guide: Knowing SQL Internal Processes makes SQL Easy - Russ Lavery
http://www.phuse.eu/download.aspx?type=cms&docID=597
A graphical representation of the SQL process and some rules for describing/predicting the SQL
process. Detailed examples with good explanations of pros and cons of subqueries.
•
http://beginner-sql-tutorial.com/sql-subquery.htm