Data Modeling - Hiram College

Download Report

Transcript Data Modeling - Hiram College

Query Optimization (CB Chapter 23.1-23.3) CPSC 356 Database Ellen Walker Hiram College (Includes figures from

Database Systems: An Application Oriented Approach 2ed

by Kifer, Bernstein & Lewis, © Addison Wesley 2005)

SQL • Widely used (only?)

standard

query language for relational databases • Once SEQUEL (Structured English QUEry Language), now Structured Query Language • Objectives – Easy to learn, easy to use – Create and modify the database and query from it • DDL

defines

• DML

manipulates

SQL is Declarative, RA is Procedural • SQL Statements describe the desired results, but do not specify a sequence of operations to get those results • Relational Algebra expressions describe a specific sequence of operations to perform • To evaluate a SQL statement, it needs to be translated into (a computer implementation of) RA first!

Query Processing • Query Processing is the translation of SQL into RA like nested function calls • One query can have multiple translations SELECT roomNo, HotelName FROM Room, Hotel WHERE HotelName = ‘Savoy’ and Room.hotelNo=Hotel.hotelNo;  roomNo,HotelName Hotel) ) (  hotelName=Savoy and Room.hotelNo=Hotel.hotelNo (Room x  roomNo,HotelName Room) ) (  Room.hotelNo=Hotel.hotelNo ( (  hotelName=Savoy (Hotel)) x

Query Optimization • Choose the translation that minimizes resource use (time, space) • The second translation below is better. (Why?) SELECT roomNo, HotelName FROM Room, Hotel WHERE HotelName = ‘Savoy’ and Room.hotelNo=Hotel.hotelNo;  roomNo,HotelName Hotel) ) (  hotelName=Savoy and Room.hotelNo=Hotel.hotelNo (Room x  roomNo,HotelName ( (  hotelName=Savoy (Hotel))  Room.hotelNo=Hotel.hotelNo

Room)

Query Processing SQL User Result (table) Decom position Relational Algebra Optimi zation Processing Engine Efficient Rel. Algebra

More Detail of Query Processing

Steps in Query Processing • Query Decomposition (create relational algebra expression) • Query Optimization (create execution plan) • Code Generation • Query Execution

Parts of Optimization • Query Plan Generator – Comes up with viable relational algebra expressions to improve the initial naïve one • Cost Estimator – Estimates the cost (time / space) of each plan • Optimization – Choosing the plan with the lowest cost, or at least “reasonably cheap”

Query Decomposition • Check Syntax • Build a relational algebra tree  Room.hotelNo=Hotel.hotelNo

X  hotelName=Savoy Hotel Room

Query Transformation (Selection) • Select with multiple AND conditions can be sequence of selects  hotelName=Savoy and Room.hotelNo=Hotel.hotelNo

( …) =  hotelName=Savoy (  Room.hotelNo=Hotel.hotelNo ( …)) • Order of Select operations doesn’t matter =  Room.hotelNo=Hotel.hotelNo (  hotelName=Savoy ( …))

Query Transformation (Projection) • Extra intermediate projections don’t matter  Name (  Name, Status (Student))=  Name (Student) • Order of select and project doesn’t matter  Status=‘SR’ (  Status (Student)) =  Status (  Status=‘SR’ (Student))

Query Transformation (Join) • Push Select through Join – Replace a select on a cross-product with a join – Joins can be implemented at the lowest level more efficiently than “materialized cross-product” • Push Select through Product – If the attributes of the condition all belong to one table of the join, put the select on only the one table, so a smaller table is joined – Joins on smaller tables are faster than on larger ones • More rules pp. 640-642

Pushing Select Example • Find all seniors that take CPSC356  stu_id=id & crs=‘CPSC356’ (Student x Transcript) • Separate the selects  stu_id=id (  crs=‘CPSC356’ (Student x Transcript)) • Push the inner select  stu_id=id (Student x (  crs=‘CPSC356’ (Transcript))) • Replace Select/Product by Join Student |x| stu_id=id (  crs=‘CPSC356’ (Transcript)))

Query Processing Example

Query Processing Example

Execution Plans • Add specific algorithms to each relational algebra step • Determine whether/how indices will be used • Add pipelining (not storing intermediate data) where possible

Choosing Transformations • Estimate cost of each tree based on – Table sizes – Numbers of distinct attribute values – Average number of tuples for selection condition – Methods used for join (e.g. indexed, hashed) • Choose lowest cost tree • Because estimates aren’t perfect, the absolute best tree might not be chosen!

Heuristics (Rules of Thumb) • Perform Selection as early as possible – Unless doing it later lets you use an index • Combine X and Selection into join operation • Execute most restrictive Selections first • Perform Projection as early as possible • Compute common expressions once – Creating a view is a way to do this!

Consequences for SQL Programmer • Using RA operations in SQL (e.g. explicit Join) constrains optimization – Good when “programmer knows best” – Bad when programmer prevents a better optimization • Intermediate tables (i.e. views) can constrain optimization – Use views to compute common subexpressions • When performance is substandard, tweaking the SQL can help! (Remember the heuristics).