Transcript Document

SQL

• SQL is a very-high-level language, in which the programmer is able to avoid specifying a lot of data-manipulation details that would be necessary in languages like C++.

• What makes SQL viable is that its queries are “optimized” quite well, yielding efficient query executions.

Select-From-Where Statements

• The principal form of a query is:

SELECT

desired attributes

FROM

one or more tables

WHERE

condition about tuples of the tables

Our Running Example

• Our

SQL

schema.

queries will be based on the following database

Movie

(title, year, length, inColor, studioName, producerC)

StarsIn

(movieTitle, movieYear, starName)

MovieStar

(name, address, gender, birthdate)

MovieExec

(name, address, cert#, netWorth)

Studio

(name, address, cert#, netWorth) Find all movies produced by Disney Studios in 1990.

SELECT * FROM Movie WHERE studioName = 'Disney' AND year = 1990;

Meaning of Single-Relation Query

• Begin with the relation in the

FROM

clause.

• Apply the selection indicated by the

WHERE

clause.

• Apply the extended projection indicated by the

SELECT

clause.

(Extended) Projection in SQL

SELECT title, length FROM Movie WHERE studioName = 'Disney' AND year = 1990; SELECT title AS name, length AS duration FROM Movie WHERE studioName = 'Disney' AND year = 1990; SELECT title AS name, length*0.016667 AS lenghtInHours FROM Movie WHERE studioName = 'Disney' AND year = 1990; SELECT title AS name, length/60 AS length, 'hrs.' AS inHours FROM Movie WHERE studioName = 'Disney' AND year = 1990;

Selection in SQL

• The selection of the relational algebra is available through the WHERE clause of SQL. • We may build expressions by using the operators:

= <> < > <= >=

• The string constants are surrounded by single quotes . – studioName = 'Disney' • Numeric constants are for e.g.: -12.34, 1.23E45

• Boolean operators are:

AND

,

OR

,

NOT

.

SELECT title FROM Movie WHERE (year > 1970) AND NOT (inColor='C');

Selection in SQL (Cont.)

• Which Disney movies are after 1970 or have length greater than 90 mins?

SELECT title FROM Movie WHERE (year > 1970 OR length < 90) AND studioName='Disney'; • The parenthesis are needed because the precedence of OR is less than that of AND.

Comparision of strings

• Strings can as well be compared (lexicographically) with the same operators:

= <> < > <= >=

• For instance ‘fodder’<‘foo’ ‘bar’ < ‘bargain’

Patterns

WHERE clauses can have conditions in which a string is compared with a pattern, to see if it matches.

• General form:

LIKE

• •

NOT LIKE

Pattern is a quoted string with

%

= “any string”

_

= “any character.”

Examples

SELECT title FROM Movie . Suppose we remember a movie “Star WHERE title LIKE 'Star %';

something

”.

SELECT title FROM Movie WHERE title LIKE '%''s%'; Two consecutive apostrophes in a string represent itself and not the end of the string.

Comparison of Strings (Continued)

• What if the pattern we wish to use in a

LIKE

expression involves the characters

%

or

_

? • We should “escape” their special meaning proceeding them by some escape character. • • In UNIX and C we use backslash \ as the escape character.

SQL

allows us to use any character we like.

• s

LIKE

'x%%x%%'

ESCAPE

'x' –

x

will be the escape character.

– A string that is matched by this pattern is for example: %aaaa% SELECT title FROM Movie WHERE title

LIKE

'x%%x%%'

ESCAPE

'x';

Ordering the Input

• We may ask the tuples produced by a query to be presented in sorted order. •

ORDER BY

Example

. Find the Disney movies of 1990.

Movie(

title, year, length, inColor, studioName, producerC#

)

– To get the movies listed by length, shortest first, and among movies of equal length, sort alphabetically:

SELECT

*

FROM

Movie

WHERE

studioName = 'Disney'

ORDER BY

length, title; • Ordering is ascending, unless you specify the DESC keyword to

an attribute

.

• Ties are broken by the second attribute on the ORDER BY list, etc.

NULL Values

• Tuples in SQL relations can have NULL as a value for one or more components.

• Meaning depends on context. Two common cases: –

Missing value

: e.g., we know the

length

some value, but we don’t know what it is.

has –

Inapplicable

: e.g., the value of attribute

spouse

for an unmarried person.

Comparing NULL’s to Values

• The logic of conditions in SQL is really 3 valued logic: TRUE , FALSE , UNKNOWN .

• When any value is compared with NULL , the truth value is UNKNOWN .

• But a query only produces a tuple in the answer if its truth value for the WHERE clause is TRUE (not FALSE or UNKNOWN).

Three-Valued Logic

• To understand how AND, OR, and NOT work in 3-valued logic, think of – TRUE = 1, FALSE = 0, and UNKNOWN = ½.

• • •

AND = MIN OR = MAX NOT(x) = 1-x

• Example: TRUE

AND

(FALSE

OR NOT

(UNKNOWN)) = MIN(1, MAX(0, (1 - ½ ))) = MIN(1, MAX(0, ½ )) = MIN(1, ½ ) = ½.

Surprising Example

SELECT

*

FROM

Movie

WHERE

length <=120

OR

length > 120; • Suppose that we have some NULL values in the length.

• What’s the result?

Products and Joins in SQL

• SQL has a simple way to couple relations in one query: list each relation in the FROM clause.

– All the relations in the FROM clause are coupled through Cartesian product – Then we can put conditions in the WHERE clause in order to get the various kinds of join.

Example

. We want to know the name of the producer of

Star Wars

. • To answer we need the information from both of the relations: – –

Movie(

title, year, length, inColor, studioName, producerC

) MovieExec

(name, address, cert, netWorth) SELECT name FROM Movie, MovieExec WHERE title = 'Star Wars' AND producerC = cert;

Disambiguating Attributes

• When we involve

two or more

relations in a query, we can have attributes with the

same

name among these relations. • We solve the problem of

disambiguating

between them by putting the name of the relation followed by a dot and then the name of the attribute.

• Example. Suppose we wish to find pairs (star, movie executive) living in the same address.

– –

MovieStar(

name, address, gender, birthdate

) MovieExec

(name, address, cert, netWorth)

SELECT

MovieStar.name, MovieExec.name

FROM

MovieStar, MovieExec

WHERE

MovieStar.address = MovieExec.address;

Tuple Variables

• Sometimes we need to ask a query that involves two or more tuples from the same relation. We may list a relation R as many times we want in the from clause but

we need a way to refer to each occurrence of R

.

• SQL allows us to define, for each occurrence in the FROM clause, an alias which we call “

tuple variable

.” •

Example

. We like to know about two stars who share an address.

SELECT

Star1.name, Star2.name

FROM

MovieStar

AS

Star1, MovieStar

AS

Star2

WHERE

Star1.address = Star2.address

AND

Star1.name < Star2.name;

AS

is not supported in Oracle.

In Oracle

SELECT

Star1.name, Star2.name

FROM

MovieStar Star1, MovieStar Star2

WHERE

Star1.address = Star2.address

AND

Star1.name <> Star2.name;

Tuple Variables (Continued)

• Why we have the condition –

Star1.name < Star2.name

?

• Without this condition we would produce also pairs of identical star names. • This conditions forces us to produce each pair of stars with a common address

only

. • Why we used the operator

<

and not

<>

?

• If we had used <> the we would have produced pairs of married stars twice, like:

Star1.name

Alec Baldwin

Star2.name

Kim Basinger Kim Basinger Alec Baldwin

Conversion to Relational Algebra

• Another approach to interpret SQL queries is to relate them to relational algebra. • Start with the relations in the

FROM

clause and take their

Cartesian Product

.

• Having created the product, we apply a

selection

operator to it by converting the

WHERE

clause to a selection condition.

• Finally from the list of attributes in the

SELECT

clause we do a

projection

.

An Unintuitive Consequence of SQL semantics • Suppose

R

,

S

,

T

are

unary relations

• We wish to compute

R

(S

T)

. each having attribute

A

alone. • We might expect the following SQL query to do the job.

SELECT

R.A

FROM

R, S, T

WHERE

R.A = S.A

OR

R.A = T.A

• However, consider the situation in which

T

is empty. Since

R.A = T.A

never be satisfied, we might expect the query to produce exactly

R

T

. can • But using the interpretation the result is empty.

– If we use the conversion to RA, the Cartesian product

R x S x T is

 .

Union, Intersection, and Difference of Queries • If two SQL queries produce relations with the same set of attributes then we can combine the queries using the set operations:

UNION

,

INTERSECT

and

EXCEPT

. •

Example

. Suppose we want the names and addresses of all female movie stars who are also movie executives with a net worth over $1,000,000.

– –

MovieStar(

name, address, gender, birthdate

) MovieExec

(name, address, cert, netWorth) (SELECT name, address FROM MovieStar WHERE gender = 'F')

INTERSECT

(SELECT name, address FROM MovieExec WHERE netWorth > 1000000);

Union, Intersection, and Difference of Queries (Continued) • Example. Give the names and addresses of movie stars

who are not

also movie executives.

(SELECT name, address FROM MovieStar)

EXCEPT

(SELECT name, address FROM MovieExec); • In

ORACLE

the EXCEPT is

MINUS

.

• Example. We want all the titles and years of movies that appeared in either the Movie or StarsIn relation.

(SELECT title, year FROM Movie)

UNION

(SELECT title, year FROM StarsIn);