Transcript Data Modeling - Hiram College
Query Optimization (CB Chapter 23.1-23.3) CPSC 356 Database Ellen Walker Hiram College (Includes figures from
Database Systems: An Application Oriented Approach 2ed
by Kifer, Bernstein & Lewis, © Addison Wesley 2005)
SQL • Widely used (only?)
standard
query language for relational databases • Once SEQUEL (Structured English QUEry Language), now Structured Query Language • Objectives – Easy to learn, easy to use – Create and modify the database and query from it • DDL
defines
• DML
manipulates
SQL is Declarative, RA is Procedural • SQL Statements describe the desired results, but do not specify a sequence of operations to get those results • Relational Algebra expressions describe a specific sequence of operations to perform • To evaluate a SQL statement, it needs to be translated into (a computer implementation of) RA first!
Query Processing • Query Processing is the translation of SQL into RA like nested function calls • One query can have multiple translations SELECT roomNo, HotelName FROM Room, Hotel WHERE HotelName = ‘Savoy’ and Room.hotelNo=Hotel.hotelNo; roomNo,HotelName Hotel) ) ( hotelName=Savoy and Room.hotelNo=Hotel.hotelNo (Room x roomNo,HotelName Room) ) ( Room.hotelNo=Hotel.hotelNo ( ( hotelName=Savoy (Hotel)) x
Query Optimization • Choose the translation that minimizes resource use (time, space) • The second translation below is better. (Why?) SELECT roomNo, HotelName FROM Room, Hotel WHERE HotelName = ‘Savoy’ and Room.hotelNo=Hotel.hotelNo; roomNo,HotelName Hotel) ) ( hotelName=Savoy and Room.hotelNo=Hotel.hotelNo (Room x roomNo,HotelName ( ( hotelName=Savoy (Hotel)) Room.hotelNo=Hotel.hotelNo
Room)
Query Processing SQL User Result (table) Decom position Relational Algebra Optimi zation Processing Engine Efficient Rel. Algebra
More Detail of Query Processing
Steps in Query Processing • Query Decomposition (create relational algebra expression) • Query Optimization (create execution plan) • Code Generation • Query Execution
Parts of Optimization • Query Plan Generator – Comes up with viable relational algebra expressions to improve the initial naïve one • Cost Estimator – Estimates the cost (time / space) of each plan • Optimization – Choosing the plan with the lowest cost, or at least “reasonably cheap”
Query Decomposition • Check Syntax • Build a relational algebra tree Room.hotelNo=Hotel.hotelNo
X hotelName=Savoy Hotel Room
Query Transformation (Selection) • Select with multiple AND conditions can be sequence of selects hotelName=Savoy and Room.hotelNo=Hotel.hotelNo
( …) = hotelName=Savoy ( Room.hotelNo=Hotel.hotelNo ( …)) • Order of Select operations doesn’t matter = Room.hotelNo=Hotel.hotelNo ( hotelName=Savoy ( …))
Query Transformation (Projection) • Extra intermediate projections don’t matter Name ( Name, Status (Student))= Name (Student) • Order of select and project doesn’t matter Status=‘SR’ ( Status (Student)) = Status ( Status=‘SR’ (Student))
Query Transformation (Join) • Push Select through Join – Replace a select on a cross-product with a join – Joins can be implemented at the lowest level more efficiently than “materialized cross-product” • Push Select through Product – If the attributes of the condition all belong to one table of the join, put the select on only the one table, so a smaller table is joined – Joins on smaller tables are faster than on larger ones • More rules pp. 640-642
Pushing Select Example • Find all seniors that take CPSC356 stu_id=id & crs=‘CPSC356’ (Student x Transcript) • Separate the selects stu_id=id ( crs=‘CPSC356’ (Student x Transcript)) • Push the inner select stu_id=id (Student x ( crs=‘CPSC356’ (Transcript))) • Replace Select/Product by Join Student |x| stu_id=id ( crs=‘CPSC356’ (Transcript)))
Query Processing Example
Query Processing Example
Execution Plans • Add specific algorithms to each relational algebra step • Determine whether/how indices will be used • Add pipelining (not storing intermediate data) where possible
Choosing Transformations • Estimate cost of each tree based on – Table sizes – Numbers of distinct attribute values – Average number of tuples for selection condition – Methods used for join (e.g. indexed, hashed) • Choose lowest cost tree • Because estimates aren’t perfect, the absolute best tree might not be chosen!
Heuristics (Rules of Thumb) • Perform Selection as early as possible – Unless doing it later lets you use an index • Combine X and Selection into join operation • Execute most restrictive Selections first • Perform Projection as early as possible • Compute common expressions once – Creating a view is a way to do this!
Consequences for SQL Programmer • Using RA operations in SQL (e.g. explicit Join) constrains optimization – Good when “programmer knows best” – Bad when programmer prevents a better optimization • Intermediate tables (i.e. views) can constrain optimization – Use views to compute common subexpressions • When performance is substandard, tweaking the SQL can help! (Remember the heuristics).