Object Relational Databases - West University of Timișoara

Download Report

Transcript Object Relational Databases - West University of Timișoara

Object Relational Databases
Ioan Despi
Motivation & Politics
In the early 80’s, it became clear that relational systems were not
robust enough for non-administrative data-intensive applications
of the day:
CAD/CAM
CASE
GIS
etc.
Two buzz-phrases began to emerge:
"Object-Oriented" and "Extensible"
Much vision & politics ensued:
Various data models (NF2, ER, Functional, Semantic)
Object-Oriented DB System Manifesto (OO-ness)
Third-Generation DB System Manifesto (Extensibility)
Many query languages proposed
Systems were built, companies started, etc.
Today, the field has settled down into two arenas:
Persistent OO PL systems
(e.g. EXODUS, ObjectStore,
Objectivity, Versant, etc.)
Query-based systems with OO features
(e.g. Starburst, Postgres, Illustra,
Informix & ORacle "Universal Servers", DB/2 UDB)
Almost nobody does both well
Few people continue to argue in terms of the paradigms.
Market:
$10 billion/year for RDBMS
$200 million/year for OODBMS
Obvious & common idea:
To extend an existing relational data base management system:
•
keep the basic relational tables & query language
•
add object flavours:
user - extensible type system
encapsulation
inheritance
polymorphism
dynamic binding of methods
complex objects (non-first normal form objects)
object identity
Terminology:
Extended Relational DBMS (ERDBMS) -- original term
Object - Relational DBMS (ORDBMS) -- more descriptive
Universal Server, Universal DBMS (UDBMS)-- recently
Oracle
Informix
have all extended their systems to become ORDBMSs
IBM
SQL3 standardize extensions to the relational model and query
language
Advantages:
resolves many of the weaknesses of the relational model
extends RDBMS with reuse and sharing
comes from the ability to extend the database server to
perform standard functionality centrally, rather than
having it coded in each application
preserves the knowledge and experience from relational model
Disadvantages:
complexity and associated increased costs
simplicity and purity of the relational model are lost
object-orinted purists: the used terminology is wrong
The Third - Generation Database Manifestos
1989 Atkinson et al. : Object -Oriented Database System Manifesto:
1. Complex objects must be supported
2. Object identity must be supported
3. Encapsulation must be supported
4. Types or classes must be supported
5. Types or classes must be able to inherit from ancestors
6. Dynamic binding must be supported
7. The DML must be computationally complete
8. The set of data types must be extensible
9. Data persistence must be provided
10. Support for very large databases must be provided
11. DBMS must support concurrent users
12. DBMS must be capable of recovery
13. DBMS must provide a simple way of querying data
1990 Stonebraker et al. : Committee for Advanced DBMS Function
The Third-Generation Database Systems Manifeso
1. Athird-generation DBMS must have a rich type system
2. Inheritance is a good idea
3. Functions, including database procedures, methods and
encapsulation, are a good idea
4. Unique identifiers for records should be assigned by the
DBMS only if a user-defined primary key is not available
5. Rules (triggers, constraints) will become a major feature.
They should not be associated with a specific function
or collection.
6. The programmatic access to a database should be through a
non-procedural high-level access language
7. At least two ways to specify collections:
using enumaration of members
using QL to specify membership
8. Updateable views are essential
9. Performance indicators must not appear in the data model
10. DBMS must be accessible from multiple languages
11. On top of the DBMS: a variety of high-level languages
12. For better or worse, SQL is ‘intergalactic dataspeak’
13. Queries and their resulting answers should be the lowest
level of communication between a server and a client
1995, 1998: Darwen & Date: attempt to defend the relational data
model, as described in their book (1992).
The Third Manifesto:
certain oo features are desirable but they must be orthogonal
to the relational model
the realtional model needs no extension, no correction, no
subsumtion, no perversion
SQL is the biggest perversion--> it is rejected from the model
instead: language D
a front-end layer is provided to D that allows SQL to be used
(a migration path for existing SQL users)
The D language is subject to:
1. Prescriptions that arise from the relational model
(RM Prescriptions)
2. Prescriptions that do not arise from the realtional model,
called Other Orthogonal prescriptions
(OO Prescriptions)
3. Proscriptions that arise from the relational model
(RM Proscriptions)
4. Proscriptions that do not arise from the relational model
(OO Proscriptions)
5. Very strong suggestions (RM and OO)
RM Prescriptions
1. Domanins
14. Base vs. derived relvars
2. Typed scalars
15. Database variables (dbvars)
3. Scalar operators
16. Transactions and dbvars
4. Actual representation
17. Create/destroy operations
5. Truth values
18. Relational algebra
6. Type constructor TUPLE
19. Relvar names and explicit values
7. Type constructor RELATION
20. Relation functions
8. Equality operator
21. Relation and tuple assignement
9. Tuples
22. Comparisons
10. Relations
23. Integrity constraints
11. Scalar variables
24. Relation and database predicates
12. Tuple variables
25. Catalog
13. Realtion variables (relvars)
26. Language design
RM Proscriptions
OO Proscriptions
1. No attribute ordering
1. Relvars are not domains
2. No tuple ordering
2. No object ids
no duplicate tuples
3. No ‘public instance variables’
4. No nulls
4. No ‘prottected instance variables’ or friends
5. No nullogical mistakes
6. No internal-level constructs
7. No tuple-level operations
8. No composite columns
9. No domain check override
10. Not SQL
OO Prescriptions
1. Compile-time type checking
2. Single inheritance ( conditional)
3. Multiple inheritance (conditional)
4. Computational completeness
5. Explicit transactions boundaries
6. Nested transactions
7. Aggregates and empty sets
RM Very strong
OO Very strong
suggestions
suggestions
1. Candidate keys for derived relvars
1. Typew inheritance
2. System-generated keys
2. Collection type constructors
3. Referential integrity
3. Conversion to/from relations
4. Candidate key inference
4. Single- level store
5. Quota queries
6. Transitive closure of a relation
7. Tuple and relation parameters
8. Default values
9. SQL migration
The primary object in the propopsal is the domain:
a named set of encapsulated values, of arbitrary complexity
equivalent to a data type or object class
Domain values:
scalars, can be manipulated only by means of
operators defined for the domain.
The language D comes with some build-in domains(ex: truth values)
The equals (=) comparison operator is defined for every domain,
returning a boolean value
Relations, tuples and tuple headings have their normal meaning with
the introduction of RELATION and TUPLE type constructors
for these objects
The following variables are defined:
•Scalar variable of type V--variable whose permitted values are
scalars from a specified domain V
•Tuple variable of type H-- variable whose permitted values are
tuples with a specified tuple heading H
•Relation variable (relvar) of type H -- variable whose permitted
values are relations with a specified
relation heading H
•Database variable (dbvar) -- a named set of relvars. Every dbvar is
subject to a set of named integrity
constraints and has an associated selfdescribing catalog
Stonebraker’s matrix: upper right is growing,
Query
RDBMS
ORDBMS
No
Query
File Sys.
OODBMS
Simple
Data
Complex
Data
Systems History
Three influential research systems:
Starburst (IBM Almaden)
POSTGRES (Berkeley)
EXODUS (Wisconsin)
Others include
O2 (Altair), ORION (MCC),
Iris (HP), Genesis (Texas)
EXODUS
EXODUS was intended to be both a persistent PL system and
a query system with "Toolkit" extensibility
Query processing engine never got built, though the EXODUS
optimizer architecture was influential (Graefe & DeWitt)
Ended up focusing on OODB stuff
SHORE (follow-on to EXODUS) is delivering on EXODUS
promises, but rather late.
Persistent C++
Query Processing (w/ GIS features): Paradise
Extensible Optimizer: Opt++
Parallelism
We'll see Exodus/SHORE work on
pointer swizzling, client-server caching
POSTGRES
Stonebraker, Rowe, a few staff and many students, 1986-1994.
Post-INGRES.
The Postgres Data Model
1.Co-opt the OO terminology
class = relation
instance = tuple
object-id = tuple-id
method = attribute or function of attributes
2.Support extensible ADTs
extensible procedures using C functions
binary operators, which interface to extensible AM
3.Support type constructors
trick: use queries
columns can be parameterized Postquel functions
(returns setof, or tuple)
queries can live in fields of a tuple (returns setof or tuple)
another exploitation of the view paradigm!
these derived objects can optionally be cached
(never implemented)
nested-dots used to traverse complex object structures
leverages EXISTING techniques for relational processing
added array support directly
4.Added class inheritance (gives method inheritance and
collection hierarchies)
Starburst
Original goal: build a nice playpen for whatever comes next.
Extensible "in-house". Not by users!
No one survey paper seems to capture the work they did. Best bet:
"Starburst Mid-Flight: As The Dust Clears", Haas, et al., TKDE 1990
Plumbing:
clean internal query representation (QGM).
Key to Query Rewrite!
non-normalized catalogs for efficiency –
normalized view for users
WAL instead of shadow pages
B+-tree compression
Buffer Pool Manager accepts hints from optimizer
(a la DBMIN)
Extensibility features:
•
User-defined functions:
table expressions: queries or C functions
scalar functions
no dynamic linking
•
Rule-based query rewrite engine
a little rule system with QGM as "working memory"
conditions and actions are C functions that check and
change QGM
some nifty rule control mechanisms (rule classes, rule
budgets, multiple conflict res.)
•
Extensible access methods (as in POSTGRES)
"Attachments": routines to be automatically called before/after
dealing with an access method
used by Starburst Rule System to generate transition logs
used to implement pre-computed joins
•
Complex objects implemented in a "wrapper" (SQL-XNF),
translated down to Starburst SQL
Hot Applications:
Web servers, full-text collections
Time-series data
"Asset Management"
GIS
image DB
Players:
Informix Universal Server (head of Illustra, body of Informix). Shipping now.
IBM DB2 UDB (head of Starburst, body of DB2). Extensibility features coming along.
UniSQL (Won Kim of ORION fame). Went out of business recently.
Oracle Universal Server (marketing-ware). Shipping now.
NCR (Teradata) bought Wisconsin's Paradise ORDBMS
(and DeWitt/Naughton/students)
Other big R vendors are late (Sybase, Tandem, etc.)
Object -Relational Databases Concepts
I. Nested relations
II. Complex types and object orientation
III. Querying with complex types
IV. Creation of complex values and objects
V. Comparison of OO and OR databases
Object -Relational Data Models
1. Extend the relational data model by including object
orientation and constructs to deal with added data types
2. Allow attributes of tuples to have complex types, including
non-atomic values such as nested relations
3. Preserve relational foundations (e.g. declarative access to
data) while extending modeling power
4. Upward compatibility with existing relational languges
I. Nested Relations
1. Permit non-atomic domains (e.g. set of integers, set of tuples,…)
--> violate 1NF (that all attributes have atomic (indivisible) domains)
2. Allow more intuitive modelling for applications with complex data
a complex object may be represented by a single tuple --->
1:1 correspondence between data items and objects
3. Retain mathematical foundation of relational model
Intuitive definition: Nested relations allow:
1. Relations wherever we allow atomic (scalar) values
2. Relations within relations
Example.
Suppose the information to be stored consists of:
1. document title
2. author_list (set of authors)
3. date (day, month, year)
4. key_word_list (list of keywords)
stored in a non-1NF document relation, doc:
title
author_list
date
keyword_list
day month year
salesplan
stat.report
{Smith, Jones}
17 april 93
{profit, strategy}
{Jones, Frick}
28 june 98
{personnel, profit}
title
author
day month year
keyword
salesplan
Smith
17
april
93
profit
salesplan
Jones
17
April
93
profit
salesplan
Smith
17
April
93
strategy
salesplan
Jones
17
April
93
strategy
stat report
Jones
28
June
98
personnel
stat report
Frick
28
June
98
Personnel
stat report
Jones
28
June
98
Profit
stat report
frick
28
June
98
Profit
The doc relation can be represented in 1NF as doc’ but awkward.
If we asssume the following MVDs hold:
title --> author
title --> keyword
title --> day month year
we can decompose the relation in the following 4NF relations:
R1(title, author)
R2(title, keyword)
R3(title, day, month, year)
A relation is in 5NF if no reamining nonloss projections are possible, except the trivial one in
which the key appears in each projection.
A relation is in 4NF iff it is in BCNF and there are no nontrivial MVD
A relation is in BCNF iff every determinant is a candidate key.
A relation is in 3NF iff it is in 2NF and no nonkey attribute is transitively dependent on the key
A relation is 2NF iff it is in 1NF and all the non-key attributes are fully FD on the key
A relation is 1NF iff every attribute is single-valued for each tuple.
The non-1NF representation may be an easier-to-understand model
(closer to user’s view)
The 4NF design would require users to include joins in their
queries, thereby complicating interaction with the system
We could define a view, but we lose the 1:1 correspondence
between tuples and documents.
II. Complex Types and Object Orientation
Again: extensions to relational model include:
nested relations
complex types
specialization (IS_A hierarchies)
inheritance
object identity
A. Structured and collection types
Define a relation doc with complex attributes: sets and
structured attributes:
create type MyString char varying
create type MyDate
(day integer, month char(10), year char(10))
create type Document
(name MyString, author_list setof(MyString),
date MyDate, keyword_list setof (MyString))
create table doc of type Document
•Unlike table definitions in ordinary relational databases, the doc table
definition allows attributes that are sets and structured attributes (see:
MyDate)
•Allow composite attributes and multivalued attributes of ER diagrams
to be represented directly.
•The types created using the above statements are recorded in the
schema stored in the database
•Can create tables directly:
create table doc
(name MyString, author_list setof (MyString), date MyDate, keyword_list setof (MyString))
Complex type systems usually support other collection types, as
arrays:
author_array MyString[10] //presents an ordered list of authors
multisets:
print_runs multiset(integer) //presents the number of copies in each
//printing run
B. Inheritance
Inheritance can be at the level of types or at the level of tables
1. Inheritance of types
create type Person
(name MyString,
social_security_no integer)
create type Student
create type Teacher
(degree MyString,
(salary integer,
department MyString)
department MyString)
under Person
under Person
1’. Multiple inheritance of types
Definiton of the type TeachingAssistant as a subtype of both
Teacher andStudent.
Since name and social_security_no are inherited from a common
source, Person, there is no conflict by inheriting them .
However, department is defined separately in Student and Teacher
and one can rename them to avoid conflict, by using an as clause:
create type TeachingAssistant
under Student with (department as student_dept),
under Teacher with (department as teacher_dept)
2. Inheritance of tables
To avoid creation of too many subtypes: one approach is to allow an
object to have multiple types without having a most specific type
OR databases can model such a feature by using inheritance at the
level of tables, rather than types and allowing an entity(object) to
exist in more than one table at once.
create table people
(name MyString,
social_security_no integer)
create table students
create table teachers
(degree MyString,
(salary integer,
department MyString)
department MyString)
under people
under people
2’. Table inheritance: Roles
Table inheritance permits an object to have multiple types, without
having a most-specific type (unlike type inheritance)
Example: an object can be in the students and teachers subtables
simultaneously, without having to be in a subtable
student_teachers that is under both students and teachers
Object can gain/ lose roles: corresponds to inserting /deleting object
from a subtable.
2’’. Table inheritance: Consistency Requirements
1. Each tuple of supertable people can correspond to (i.e. having the
same values for all inherited attributes as) at most one tuple in each of
the tables studentsand teachers (WHY?)
2. Each tuple in students and teachers must have exactly one
corresponding tuple in people. (WHY?)
Subtables can be stored in an efficient manner without replication
of all inherited fields:
inherited attributes other than the primary key of the
supertable need not be stored and can be derived by means
of a join with the supertable, based on the primary key
As with types, multiple inheritance is possible with tables:
a TeachingAssistant can simply belong to the table students
as well as to the table teachers. However, if ew want, we
can create a table for TeachingAssistant entities.
Based on the consistency requirements for subtables, if an entity is
present in the teaching_assistants table, it is also present in the
teachers and in the students table.
3 . Inheritance: Conclusion
Inheritance:
makes schema definition natural
ensures referential and cardinality constraints
enables the use of functions defined for supertypes on
objects belonging to subtypes
allows the orderly extension of a database system to
incorporate new types
C. Reference Types
Object - oriented languages provide the ability to create and refer to
objects.
1.To refer to objects = an attribute of a type can be a reference to an
object of a specified type.
Example: redefine the author_list field of the type Document as:
author_list setof ( ref (Person))
Now author_list is a set of references to Person objects
2. Tuples of a table can also have references to them.
Example:
ref(people)
It can be implemented using the primary key or tuple-id.
3. SQL3 uses identity (for tuples) and oid (for objects).
III. Quering with Complex Types
A. Relation-Valued Attributes
Extended SQL allows an expression evaluating to a relation to
appear anywhere the relation name may appear.==>
can take advantage of the structure of nested relations
Example: consider the following relation pdoc:
create table pdoc
name MyString
author_list setof (ref people)),
date MyDate,
keyword_list setof (MyString))
1. Find all documents having the word “database” as one of their
keywords:
select name
from pdoc
where “database” in keyword_list
2. Find all pairs of the form “doc_name, author_name” for each
document and each author of the document:
select B.name, Y.name
from pdoc as B, B.author_list as Y
3. Find the name and the number of authors for each document
(aggregate functions can be applied to any relation-valued expression)
select name, count(author_list)
from pdoc
B. Path Expressions
The dot notation for referring to composite attributes can be used
with references.
Example: student.advisor.name
References can be used to hide join operations --> they simplify the
query considerably.
Example: Consider the previous table people and a table phd_student:
create table phd_students
(advisor (ref(people))
under people
Find the names of the advisers of all PhD students:
select phd_student.advisor.name
from phd_students
In general, attributes used in a path expression can be a collection, such
as a set or a multiset.
For example, to get the names of all authors of documents in pdoc
relation:
select Y.name
from pdoc.author_list as Y
C. Nesting and Unnesting
Unnesting = the transformation of a nested relation into 1NF
converts a nested relation into a single flat relation with no
nested relations or structured types as attributes
create table doc
(name MyString,
author_list setof (MyString),
date MyDate,
author_list, keyword_list:
nested relations
name, date: are not nested
keyword_list setof (MyString))
select name, A as author, date.day, date.month, date.year, K as keyword
from doc as B, B.author_list as A, B.keyword_list as K
B is declared to range over doc, A ranges over the authors in
author_list for that document, K ranges over the keywords in the
keyword_list of the document
Nesting =
the reverse operation of transformation of a 1NF
relation into a nested relation
can be carried out by an extension of grouping in SQL
Example:
nest the relation flat_doc on the attribute keyword:
select title, author, (day, month, year) as date,
set (keyword) as keyword_list
from flat_doc
groupby title, author, date
title
author_list
date
keyword_list
day month year
salesplan
Smith
17 april 93
{profit, strategy}
salesplan
Jones
17 april 93
{profit, strategy}
stat.report
Jones
28 june 98
{personnel, profit}
stat_report
Frick
28 june 98
{personnel, profit}
D. Functions
Object - relational systems allow functions to be defined by users
Functions can be defined in a DML, such as extended SQL
Example:
define a function that, given a document, return the
count of the number of authors:
create function author_count (one_doc Document)
returns integer as
select count (author_list)
from one_doc
the function can be used in a query , to find the names of all
documents that have more than one author
select name
from doc
where author_count (doc) > 1
Notes:
1.Although doc refers to a relation in the from clause, it is treated as
tuple variable in the where clause, and can therefore be used as an
argument to the author_count function.
2. A select statement can return a collection of values.
If the return type of a function is a collection type, the result of the
function is the entire collection.
However, if the return type is not a collection type, the collection
generated by SQL shoul contain only one tuple.
Otherwise, a system may have two choices: flag an error or select
an arbitrary one from the collection.
Functions can also be defined in a programming language as C,
C++, Java. It can be more efficient and handle more complex
computations than that defined using SQL.
Since the code needs to be loaded and executed within the database
system code, it may carry the risk of:
integrity: a bug in the program can corrupt the database
internal structure
security: it can by-pass the access control functionality of
the database management system
Embedded SQL is different from C++ code functions: in SQL the
query is passed by the user program to the database system to run.
User-written code never needs to access to the database itself. The
operating system thus can protect the database from access by any
user process.
IV. Creation of Complex Values and Objects
1. Create and update tuples with complex types (tuple-valued and
set-valued):
insert into doc
values
(“salesplan”, set (“Smith”, “Jones”),
(19, “January”, 00), set (“profit”, “strategy”))
2. We can use complex values in queries: anywhere where a set is
expected, we can enumerate a set:
select name, date
from doc
where name in set (“salesplan”, “opportunities”,
“risks”)
3. Multiset values can be created by replacing set by multiset.
4. To create new objects, one can use constructor functions.
The constructor function for an object type T is T( ).
When it is invoked, it creates a new uninitialized object of
type T, fills in its oid field, and returns the object.
The fields of the object must then be initialized.
V. Comparison of OO and OR databases
1. OR databases are OO databases built on top of the relational model.
2. Persistent programming language-based OODBs target applications
that have high performance requirements( CAD dbs)
3. Relational systems: simple data types, powerful query languages,
high protection
4. Persistent programming language-based OODBs: complex data
types, integration with programming languages, high
performance
5. Object- relational systems: complex data types, powerful query
languages, high protection
Reference: A. Silberschatz, H.F. Korth, S. Sudarshan - Database System Concepts, McGraw-Hill, 3rd ed., 1997
T. Connolly, C. Begg, A. Strachan - Database Systems: A Practical Approach to Design, Implementation and Management. 2 nd ed., Addison-Wesley, 1999