Distributed Databases Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS Types of Distributed Databases Distributed Data.
Download
Report
Transcript Distributed Databases Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS Types of Distributed Databases Distributed Data.
Distributed Databases
Outline
Evolution of data processing
What is a DDBMS?
Motivation behind DDBMS
Types of Distributed Databases
Distributed Data Storage
Replication
Fragmentation
Transparencies in a DDBMS
Outline
Evolution of data processing
What is a DDBMS?
Motivation behind DDBMS
Types of Distributed Databases
Distributed Data Storage
Replication
Fragmentation
Transparencies in a DDBMS
Evolution of data processing
File-based system
Each application defined and maintained its data
Database technology (DBMS)
Data is defined and administered centrally.
A single logical database located at one site
managed/controlled by a single DBMS
Distributed database technology (DDBMS)
Decentralization
Allow users to access not only data at their own site
but also data stored at remote sites.
File-based system
a collection of application
programs that perform
services for the end users
such as the production of
reports.
Registrar
enrollment
Data entry of
courses
Data entry of
enrollment
prepared by:RdDB
Student
Courses
Faculty
File
EAF
Generation of
EAF
prepared by:RdDB
Student
Courses
Faculty
File-based system
Registrar
enrollment
Data entry of
courses
Data entry of
enrollment
struct course
{ char code[5];
char desc[20];
int units [3];
}
File
EAF
Each program in the
system defines and
manages its own data.
[CBS98]
Each user (with the assistance of
DP staff) defines and implements
(including storage and control) the
files needed for a specific
application. [CBS98, EN94]
Registrar
enrollment
Accounting
prepared by:RdDB
File-based systems
File
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can be observed?
prepared by:RdDB
Data redundancy
Student System (File-based)
Registrar
enrollment
Accounting
File
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can Student
be observed?
System
Separation and
(File-based)
prepared by:RdDB
isolation of data
Registrar
enrollment
Accounting
File
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can be observed?
Program-data dependence
Student System (File-based)
struct person
{ char first[20];
char middle[3];
Registrar
enrollment
char last[30];
} employees,
managers;
Accounting
File
prepared by:RdDB
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can be observed?
Incompatibility of files
Student System (File-based)
COBOL
Registrar
enrollment
File
prepared by:RdDB
Student
Courses
Faculty
EAF
C
Accounting
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can be observed?
Fixed
queries;
Student System (File-based)
proliferation
prepared by:RdDB
of application
programs
Registrar
enrollment
Accounting
File
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
Limitations of File-Based Systems
Separation and isolation of data
Duplication of data
Program-data dependence
Incompatibility of files (e.g C vs. COBOL)
Fixed queries / proliferation of application
programs
prepared by:RdDB
Factors that limit File-Based System
prepared by:RdDB
The definition of data is embedded in the
application programs, rather than being
stored separately and independently. [CBS98]
There is no control over the access and
manipulation of data beyond that imposed by
the application programs. [CBS98]
Database Approach
What is a database?
prepared by:RdDB
It is a shared collection of logically coherent
data with some inherent meaning;
It is designed , built and populated with data
for specific purpose such as meeting the
information needs of an organization;
Ex: student database, employee database,
library database, air flights database, hospital
database, etc.
What is a DBMS?
Database Management System
It is a software system that enables users to :
define, create and maintain the database
DDL
DML
provide controlled access to this database
Security
Integrity
Concurrency control
Recovery control
User-accessible catalogue (description of data)
A Simple Database System Environment
DATABASE
SYSTEM
Application
Programs/
Queries
prepared by:RdDB
DBMS Software
Software to
process
programs/
queries
Software to
access stored
data
Data
Definition
Database
University System: Database Environment
prepared by:RdDB
Registrar
DATABASE
SYSTEM
Accounting
Department
Enrollment
Payment
Grade processing
MySQL
Software to
process
programs/
queries
Software to
access stored
data
Data
Definition
University
database
Outline
Evolution of data processing
What is a DDBMS?
Motivation behind DDBMS
Types of Distributed Databases
Distributed Data Storage
Replication
Fragmentation
Transparencies in a DDBMS
What is a Distributed database?
A distributed database is a
logically interrelated collection of shared data (and
a description of this data),
physically distributed over a computer network.
What is a Distributed DBMS?
Distributed DBMS is the software system that
permits the management of the distributed
database and
makes the distribution transparent to users.
Topology of DDBMS
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
DB
Topology of Distributed Processing
Site 1
Site 4
Site 2
Computer
Network
A centralized database
that can be accessed
over a computer
network
Site 3
DB
Outline
Evolution of data processing
What is a DDBMS?
Motivation behind DDBMS
Types of Distributed Databases
Distributed Data Storage
Replication
Fragmentation
Transparencies in a DDBMS
Motivation behind DDBMS
1. To integrate the operational data and provide
controlled access to the data.
This may imply centralization but it is not the
intention
2. To adopt a decentralize approach to data
which mirrors the organizational structure of
many companies.
Each unit maintains its own data.
Data is stored proximate to the location which
would improve:
Shareability of the data
Efficiency of data access
Motivation behind DDBMS
3. To help resolve the islands of information
problem:
Geographical separation
Incompatible computer architectures
Incompatible communication protocols
Why Distribute??
Example: X Corp. has offices in London, New
York, and Hong Kong.
Employee data:
EMP(ENUM, NAME, TITLE, SALARY, …)
Where should the employee data table
reside?
(example from Stanford University)
X Corp. Data Access Pattern
Mostly, employee data is managed at the
office where the employee works
E.g., payroll, benefits, hire and fire
Periodically, X Corp needs consolidated
access to employee data
E.g., X Corp. changes benefit plans and that
affects all employees.
E.g., Annual bonus depends on global net profit.
(example from Stanford University)
New York
Payroll app
London
Payroll app
EMP
London
New York
Internet
Hong Kong
Payroll app
NY and HK payroll
apps run very slowly!
Hong Kong
(example from Stanford University)
New York
Payroll app
London
Payroll app
London
London
Emp
New York
NY
Emp
Internet
Hong Kong
Payroll app
Much better!!
Hong Kong
HK
Emp
(example from Stanford University)
Fundamental principle of DDBMS
The DDBMS is expected to make the
distribution transparent (invisible) to the user.
The objective of transparency:
To make the distributed system appear like a
centralized system.
Outline
Evolution of data processing
What is a DDBMS?
Motivation behind DDBMS
What is a distributed database system?
Types of Distributed Databases
Distributed Data Storage
Replication
Fragmentation
Transparencies in a DDBMS
Distributed Database System
Data is spread over multiple machines (also
A Distributed
System
referred to as
sites or nodes).
The computers may vary in size and function
A Distributed System
Network interconnects the machines
Database is stored on several sites.
A Distributed System
Data is shared by users on multiple machines
Account(actno,
br-name, Aki-01
balance)
Add 100 to account
Makati
Branch(br-name, br-city,assets)
Local transaction: If
transaction is initiated at
network
Makati branch
Q.C.
Account(actno, br-name, balance)
Main
Manila
Account(actno, br-name, balance)
Account(actno, br-name, balance)
Makati
Q.C.
Branch(br-name, br-city,assets)
Main
Global: if transaction is initiated
elsewhere network
Global Transaction Example:
transfer 100 pesos from account Manila
Aki-01 (Makati) to account Aki100 (Manila)
Account(actno, br-name, balance)
Account(actno, br-name, balance)
Local and Global Transactions
A local transaction accesses data in the
single site at which the transaction was
initiated.
A global transaction either:
accesses data in a site different from the
one at which the transaction was initiated or
(e.g. funds transfer)
accesses data in several different sites.
(e.g. summarization of deposits in all
branches)
Distributed Database System
Makati
Main
network
Q.C.
A distributed database system consistsManila
of loosely coupled sites that share no
physical component
Database systems that run on each site
are independent of each other
Transactions may access data at one or
more sites
Why Distributed Database Systems?
Sharing data – users at one site are able
to access the data residing at some
other sites.
Autonomy – each site is able to retain a
degree of control over data stored
locally.
Higher system availability through
redundancy — data can be replicated at
remote sites, and system can function
even if a site fails.
Trade-offs in Distributed Systems
Disadvantage: added complexity
required to ensure proper
coordination among sites.
Software development cost.
Greater potential for bugs.
Increased processing overhead.
Outline
Evolution of data processing
What is a DDBMS?
Motivation behind DDBMS
Types of Distributed Databases
Distributed Data Storage
Replication
Fragmentation
Transparencies in a DDBMS
Homogeneous Distributed Database
oracle
oracle
All sites have
identical database
management
system software
Sites are aware of
Makati
Main
network
Manila
Q.C.
each other and
agree to cooperate
in processing user
requests.
Each site
surrenders part of
its autonomy in
terms of right to
change schemas or
software
oracle
oracle
Appears to user as
a single system
Heterogeneous Distributed Database
SQl server
oracle
different schemas and
software (DBMS)
Makati
Main
network
Q.C.
Different sites may use
Manila
Difference in schema
is a major problem
for query processing
Difference in
software is a major
problem for
transaction
processing
Sites may not be aware
Sybase
DB2
of each other and may
provide only limited
facilities for cooperation
in transaction
processing
Outline
Evolution of data processing
What is a DDBMS?
Motivation behind DDBMS
Types of Distributed Databases
Distributed Data Storage
Replication
Fragmentation
Transparencies in a DDBMS
Approaches: Distributed Data Storage
1.
Data Replication
System maintains multiple copies of the relation, stored
in different sites, for faster retrieval and fault
tolerance.
2.
Data Fragmentation
Relation is partitioned into several fragments stored in
distinct sites
Note: Replication and fragmentation can be combined
Relation is partitioned into several fragments: system
maintains several identical replicas of each
fragment.
Allocation
Each fragment is stored at the site with optimal
distribution
Data Replication
Account(actno, br-name, balance)
Makati
Q.C.
A relation or fragment of a
relation is network
replicated if it is
stored redundantly in two or
more sites.
Account(actno, br-name, balance)
Main
Manila
Account(actno, br-name, balance)
Data Replication
Account(actno, br-name, balance)
Makati
Account(actno, br-name, balance)
Full replication of a relationMain
is
the case where the relation is
network
stored at all sites.
Q.C.
Account(actno, br-name, balance)
Manila
Account(actno, br-name, balance)
Bank database
Makati
Q.C.
Data Replication
Main
Fully redundant databases are those
in which everynetwork
site contains a copy of
the entire database.
Manila
Advantages of Data Replication
Availability: failure of site containing
relation r does not result in unavailability of
r if replicas exist.
Parallelism: queries on r may be
processed by several nodes in parallel.
Reduced data transfer: relation r is
available locally at each site containing a
replica of r.
Disadvantages Data Replication
Increased cost of updates: each replica
of relation r must be updated.
Increased complexity of concurrency
control: concurrent updates to distinct
replicas may lead to inconsistent data
unless special concurrency control
mechanisms are implemented.
Data Fragmentation
A relation may be divided into a number of sub-
relations called fragments, which are then
distributed.
Division of relation r into fragments r1, r2, …, rn
which contain sufficient information to
reconstruct relation r.
relation r
r1
r2
…
rn
Relation r
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
}
}
}
r1
r2
r3
Why fragment? : Advantages
Usage: Applications work with views rather
than entire relations
Efficiency: Data is stored to where it is most
frequently used
Parallelism: A transaction can be divided into
subqueries that operate on fragments
Security: Data not required by local
applications is not stored and consequently
not available to unauthorized users
Disadvantages of fragmentation
Performance: Performance of applications
that require data form several fragments
located at different sites may be slower
Integrity :May be more difficult to implement
Data Fragmentation
Types of Data Fragmentation
Horizontal fragmentation
Vertical fragmentation
Mixed fragmentation
Horizontal fragmentation
A horizontal fragment of a relation consists of a subset
of the tuples of a relation.
It is produced by specifying a predicate that performs
a restriction on the tuples in the relation
It is defined using the selection operation of relational
algebra
r1
R
r2
r3
Horizontal fragmentation
Given a relation R, a horizontal fragment
is defined as:
r1 = p (R )
Where p is the predicate based on
one or more attributes of the relation
Relation r
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
r1 = B=‘b1’ (r )
r2 = B=‘b2’ (r )
r3 = B=‘b3’ (r )
Relation r
r1 = B=‘b1’ (r )
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
Relation r
r2 = B=‘b2’ (r )
A
B
C
D
E
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
Relation r
r3 = B=‘b3’ (r )
A
B
C
D
E
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
Horizontal fragmentation
Given:
PROPERTY_FOR_RENT(pno, street, area, city,
pcode, type, rooms, rent, ono)
Horizontal fragmentation by property type of
PROPERTY_FOR_RENT
P1: type=‘House’ (PROPERTY_FOR_RENT )
P2: type=‘Flat’ (PROPERTY_FOR_RENT )
Vertical Data Fragmentation
A vertical fragment of a relation consists of a subset of the
attributes of a relation
Vertical fragmentation groups together attributes that are
used by some applications
It is defined using the projection operation of relational
algebra.
R
r1
r2
r3
Vertical Fragmentation
Given R, a vertical fragment is defined as:
r1 = a1, …., an (R )
Where a1, ….an are attributes of relation R
Relation r
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
r1 = A, B,C (r)
r2 = A, D (r)
r3 = A, E (r)
Relation r
A
B
C
a1
b1
c1
a2
b1
c2
a3
b1
c3
a4
b1
c4
a5
b2
c5
a6
b2
c6
a7
b2
c7
a8
b2
c8
a9
b3
c9
a10
b3
c10
a11
b3
c11
a12
b3
c12
r1 = A, B,C (r)
Relation r
A
D
a1
d1
a2
d2
a3
d3
a4
d4
a5
d5
a6
d6
a7
d7
a8
d8
a9
d9
a10
d10
a11
d11
a12
d12
r2 = A, D (r)
Relation r
A
E
a1
e1
a2
e2
a3
e3
a4
e4
a5
e5
a6
e6
a7
e7
a8
e8
a9
e9
a10
e10
a11
e11
a12
e12
r3 = A, E (r)
Vertical Fragmentation
Given:
STAFF(Sno, Fname, Lname, Address, Telno,
Position, Sex, DOB, Salary, NIN,Bno)
repeated
Vertical Fragmentation of staff
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
Mixed Fragmentation
A mixed fragment of a relation
consists of
either:
a horizontal fragment that is
subsequently vertically fragmented or
vertical fragment that is then
horizontally fragmented
It is defined using the selection and
projection operations of relational algebra
Mixed Fragmentation
Given a relation R, a mixed fragment is
defined as:
Vertical fragment that is then horizontally
fragmented
p(a1, …., an (R ))
Horizontal fragment that is then vertically
fragmented
a1, …., an (p(R ))
Data Fragmentation
Mixed fragmentation:
Vertical fragments,
horizontally fragmented
Mixed fragmentation:
Horizontal fragments,
vertically fragmented
Relation r
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
r1 = B=‘b1’ (r )
r2 = B=‘b2’ (r )
r3 = B=‘b3’ (r )
r3.1 = A, B,C (r)
r3.2 = A, D (r)
r3.3 = A, E (r)
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
A
B
C
D
E
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
A
B
C
D
E
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
r1 = B=‘b1’ (r )
r2 = B=‘b2’ (r )
r3 = B=‘b3’ (r )
A
B
C
a9
b3
c9
a10
b3
c10
a11
b3
c11
a12
b3
c12
A
D
a9
d9
a10
d10
a11
d11
a12
d12
A
E
a9
e9
a10
e10
a11
e11
a12
e12
r3.1 = A, B,C (r3)
r3.2 = A, D (r3)
r3.3 = A, E (r3)
Mixed Fragmentation
Given:
STAFF(Sno, Fname, Lname, Address, Telno,
Position, Sex, DOB, Salary, NIN,Bno)
Vertically fragment Staff into:
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
Mixed Fragmentation
Horizontally fragment S2 according to branch
number.
S2: Sno, fname, lname,address,telno,bno (Staff)
S21:
bno=‘B3’ (S2)
S22:
bno=‘B5’ (S2)
S22:
bno=‘B7’ (S2)
Rules: Correctness of fragmentation
Rule1: Completeness
Rule2: Reconstruction
Rule3: Disjointness
Rules: Correctness of fragmentation
Rule1: Completeness
If a relation instance R is decomposed into
fragments R1, R2, … Rn , each data item
that can be found in R must appear in at
least one fragment.
This is necessary to ensure that there is no
loss of data during fragmentation
Rule1: Completeness
R
r1
r2
If a relation instance R is decomposed
into fragment R1, R2, … Rn , each data
item that can be found in R must
appear in at least one fragment.
Relation r(A,B,C,D,E)
A
B
C
D
E
a1
b1
c1
d1
e1
A
B
C
D
E
a2
b1
c2
d2
e2
a1
b1
c1
d1
e1
a3
b1
c3
d3
e3
a2
b1
c2
d2
e2
a4
b1
c4
d4
e4
a3
b1
c3
d3
e3
A
B
C
D
E
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
A
B
C
D
E
a10
b3
c10
d10
e10
a9
b3
c9
d9
e9
a11
b3
c11
d11
e11
a10
b3
c10
d10
e10
a12
b3
c12
d12
e12
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
Rules: Correctness of fragmentation
Rule 2: Reconstruction
It must be possible to define a relational operation
that will reconstruct the relation R from the
fragments.
This rule ensures that functional dependencies are
preserved.
r1
Rule2: Reconstruction
R
r2
Relational
operation
It must be possible to define a relational
operation that will reconstruct the relation R
from the fragments.
Horizontal: Union r1 r2 = r
Vertical: Natural Join operation: S1
S2
A
B
C
D
E
a1
b1
c1
d1
e1
A
B
C
D
E
a2
b1
c2
d2
e2
a1
b1
c1
d1
e1
a3
b1
c3
d3
e3
a2
b1
c2
d2
e2
a4
b1
c4
d4
e4
a3
b1
c3
d3
e3
A
B
C
D
E
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
A
B
C
D
E
a10 b3
c10 d10 e10
a9
b3
c9
d9
e9
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
a10 b3
c10 d10 e10
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
R= r1 r2 r2
A
B
C
A
D
A
E
a1
b1
c1
a1
d1
a1
e1
a2
b1
c2
a2
d2
a2
e2
a3
b1
c3
a3
d3
a3
e3
a4
b1
c4
a4
d4
a4
e4
a5
b2
c5
a5
d5
a5
a6
b2
c6
a6
d6
a7
b2
c7
a7
a8
b2
c8
a9
b3
a10
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
e5
a3
b1
c3
d3
e3
a6
e6
a4
b1
c4
d4
e4
d7
a7
e7
a5
b2
c5
d5
e5
a8
d8
a8
e8
a6
b2
c6
d6
e6
c9
a9
d9
a9
e9
b3
c10
a10
d10
a10
e10
a7
b2
c7
d7
e7
a11
b3
c11
a11
d11
a11
e11
a8
b2
c8
d8
e8
a12
b3
c12
a12
d12
a12
e12
a9
b3
c9
d9
e9
R= r1
r2
r2
a10 b3
c10 d10 e10
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
Rules: Correctness of fragmentation
Rule 3: Disjointness
If a data item di appears in fragment Ri, then it
should not appear in any other fragment.
This rule ensures minimal data redundancy
Vertical fragmentation is the exception to this rule,
where the primary key attributes must be repeated
to allow reconstruction.
Rule3: Disjointness
R
r1
r2
If a data item di appears in fragment Ri, then it
should not appear in any other fragment,
except for the primary key in the case of
vertical fragmentation
A
B
C
D
E
a1
b1
c1
d1
e1
A
B
C
D
E
a2
b1
c2
d2
e2
a1
b1
c1
d1
e1
a3
b1
c3
d3
e3
a2
b1
c2
d2
e2
a4
b1
c4
d4
e4
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
R= r1 r2 r2
A
B
C
D
E
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
A
B
C
D
E
a10 b3
c10 d10 e10
a9
b3
c9
d9
e9
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
a10 b3
c10 d10 e10
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
No data item
appears in
more than
one fragment
A
B
C
A
D
A
E
a1
b1
c1
a1
d1
a1
e1
a2
b1
c2
a2
d2
a2
e2
a3
b1
c3
a3
d3
a3
e3
a4
b1
c4
a4
d4
a4
e4
a5
b2
c5
a5
d5
a5
a6
b2
c6
a6
d6
a7
b2
c7
a7
a8
b2
c8
a9
b3
a10
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
e5
a3
b1
c3
d3
e3
a6
e6
a4
b1
c4
d4
e4
d7
a7
e7
a5
b2
c5
d5
e5
a8
d8
a8
e8
a6
b2
c6
d6
e6
c9
a9
d9
a9
e9
b3
c10
a10
d10
a10
e10
a7
b2
c7
d7
e7
a11
b3
c11
a11
d11
a11
e11
a8
b2
c8
d8
e8
a12
b3
c12
a12
d12
a12
e12
a9
b3
c9
d9
e9
R= r1
r2
r2
No data item appears
in more than one
fragment except for
the primary key
a10 b3
c10 d10 e10
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
Check Correctness:
Horizontal Fragmentation
Given:
PROPERTY_FOR_RENT(pno, street, area, city,
pcode, type, rooms, rent, ono)
Horizontal fragmentation by property type of
PROPERTY_FOR_RENT
P1: type=‘House’ (PROPERTY_FOR_RENT )
P2: type=‘Flat’ (PROPERTY_FOR_RENT )
Check Correctness:
Horizontal Fragmentation
P1: type=‘House’ (PROPERTY_FOR_RENT )
P2: type=‘Flat’ (PROPERTY_FOR_RENT )
Completeness: Each tuple in the relation appears in
either fragment P1 or P2.
Reconstruction: The Property_For_Rent relation can
be reconstructed from the fragments using Union
operation:
S1 S2 = Property_For_Rent
Disjointness:
The fragments are disjoint;
There can be no property type that is both house and flat
Check Correctness:
Vertical Fragmentation
Given:
STAFF(Sno, Fname, Lname, Address, Telno,
Position, Sex, DOB, Salary, NIN,Bno)
Vertical Fragmentation of staff
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
Check Correctness:
Vertical Fragmentation
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
Completeness: Each attribute in Staff relation
appears in either fragment S1 or S2.
Reconstruction: The Staff relation can be
reconstructed from the fragments using
Natural Join operation: S1
S2
Disjointness: S1 and S2 are disjoint except for
the necessary duplication of the primary key.
Mixed Fragmentation
Given:
STAFF(Sno, Fname, Lname, Address, Telno,
Position, Sex, DOB, Salary, NIN,Bno)
Vertically fragment Staff into:
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
Mixed Fragmentation
Horizontally fragment S2 according to branch
number.
S2: Sno, fname, lname,address,telno,bno (Staff)
S21:
bno=‘B3’ (S2)
S22:
bno=‘B5’ (S2)
S22:
bno=‘B7’ (S2)
Check Correctness:
Mixed Fragmentation
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
S21:
bno=‘B3’ (S2)
S22:
bno=‘B5’ (S2)
S22:
bno=‘B7’ (S2)
Completeness
Each attribute in Staff relation appears either in fragment
S1 or S2
Each tuple (part) appears in fragment S1 and either
fragment S21, S22, or S23.
Check for correctness:
Mixed Fragmentation
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
S21:
bno=‘B3’ (S2)
S22:
bno=‘B5’ (S2)
S22:
bno=‘B7’ (S2)
Reconstruction
The Staff relation can be reconstructed from the
fragments using the Union and Natural Join
operations:
S1
( S21 S22 S23 ) = Staff
Check for correctness:
Mixed Fragmentation
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
S21:
bno=‘B3’ (S2)
S22:
bno=‘B5’ (S2)
S22:
bno=‘B7’ (S2)
Disjointness
The fragments are disjoint
There can be no staff member that works in more than
one branch
S1 and S2 are disjoint except for the necessary
duplication of the primary key.
Advantages of Horizontal Fragmentation
allows parallel processing on
fragments of a relation
allows a relation to be split so that
tuples are located where they are
most frequently accessed
Advantages of Vertical Fragmentation
allows tuples to be split so that each
part of the tuple is stored where it is
most frequently accessed
Basis of design: definition and allocation of
fragments
Analyze applications
Concentrate on the most important ones
80/20 rule may be used as a guideline
most active 20% of user queries account for 80% of the
total data access
Quantitative information (used in allocation)
Frequency with which an application is run
Site from which an application is run
Performance criteria for transactions and
applications
Basis of design: definition and allocation of fragments
Qualitative information (used in
fragmentation)
Transactions executed by the application
Type of access (read or write)
Predicates of read operations
Objectives of fragment definition and allocation
Locality of reference
Data should be stored close to where it is used
If a fragment is used at several sites, it may be beneficial to
store copies of the fragments at these sites
Improved reliability and availability
Made possible through replication
If one site fails, another copy is available at another site
Acceptable performance
Bad allocation may result in occurrence of bottlenecks;
Bad allocation may also result in under utilization of resources.
Objectives of fragment definition and allocation
Balanced storage capacities and costs
Availability and cost of storage at each site
Minimal communication costs
Consider cost of remote requests
Retrieval costs are minimized
when locality of reference is maximized or
when each site has its own copy of the data
Updating replicated data is costly
Strategies for Data allocation
Centralized
Partitioned (or fragmented)
Complete replication
Selective replication
Single database
Centralized
Workstation 1
Workstation 3
Workstation 2
LAN
Database
Server with
DBMS
Single DBMS
Users distributed
across the network
Reliability and
availability are low –
failure of central site
results in loss of the
entire database
Communication
costs are high
Partitioned
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
Partitioned
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
DB
Partitioned
Database is partitioned into
disjoint fragments
Each fragment is assigned
to one site
Site 1
Storage costs are low since
there are no replications
DB
Site 4
Site 2
Locality of reference is high
if data access frequently
occurs in the site where data
is located
Computer
Network
Site 3
DB
Availability and reliability are
low but higher than centralized
Performance should be
good
DB
DB
Communication costs low if
distribution is designed
properly
Complete Replication
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
Complete Replication
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
Complete Replication
Maintaining a complete
copy of the database at
each site
Site 1
DB
Site 4
Site 2
Computer
Network
Locality of reference,
reliability, performance and
availability are high
Storage costs are high
Communication costs for
updates are high
Site 3
DB
DB
Selective Replication
Selective replication
Site 1
is a combination of
replication, partitioning
and centralization.
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
Outline
Evolution of data processing
What is a DDBMS?
Motivation behind DDBMS
Types of Distributed Databases
Distributed Data Storage
Replication
Fragmentation
Transparencies in a DDBMS
Transparencies in a DDBMS
Transparency hide implementation details from the
user
Types of DDBMS transparencies
Distribution transparency
Transaction transparency
Performance transparency
DBMS transparency
Transparencies in a DDBMS
Distribution transparency
Transaction transparency
Performance transparency
DBMS transparency
Distribution transparency
Allows the user to perceive the database as a single
logical entity.
Types of distribution transparency
The user does not need to know :
that the data is fragmented (fragmentation transparency)
location of data items (location transparency)
The user is unaware of the replication of fragments
(replication transparency)
If the user needs to know about fragmented data and
location of fragments, then there is local mapping
transparency
The DBMS must ensure that no two sites create a database
object with the same name (naming transparency)
Transparencies in a DDBMS
Distribution transparency
Transaction transparency
Performance transparency
DBMS transparency
Transaction transparency
Ensures that all distributed transactions maintain
the distributed database’s integrity and consistency.
What is a distributed transaction?
Accesses data stored at more than one location
Each transaction is divided into a number of
subtransactions, one for each site that has to be accessed
Transaction transparency
Fragmented schema: S1, S2, S21, S22 , S23 ,
a transaction T that prints out the names of
all staff;
Subtransactions:
Ts3 : at site 3
Ts5 : at site 5
Ts7 : at site 7
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
S21: bno=‘B3’ (S2)
site 3
S22: bno=‘B5’ (S2)
site 5
S22: bno=‘B7’ (S2)
site 7
site5
Transaction transparency
Distributed transaction (transactions can
execute concurrently; inherent parallelism)
Time
Ts3
Ts5
Ts7
t1
begin
begin
t2
read(fname,lname)
read(fname,lname) read(fname,lname)
t3
print(fname,lname)
print(fname,lname) print(fname,lname)
t4
end
end
begin
end
Transaction transparency
The DBMS must ensure the indivisibility of each
subtransaction
It must ensure the synchronization of subtransactions
with other local transactions that are executing
concurrently at a site
It must ensure the synchronization of subtransactions
with global transactions that are running simultaneously
at the same or different sites
Note: Transaction transparency in a DDBMS is
complicated by the fragmentation, allocation and
replication schemas
Transaction transparency
Aspects of transaction transparency
Concurrency transparency
Failure transparency
Transaction transparency: Concurrency
transparency
Results of all concurrent transactions (distributed and nondistributed) :
execute independently
Logically consistent with the results that are obtained
if transactions are executed one at a time, in some
arbitrary serial order
Note: There is added complexity because the DDBMS
must ensure:
that both local and global transactions do not interfere with each other
The consistency of all subtransactions of the global transaction
Transaction transparency: Concurrency transparency
Strategies (for replication, which makes concurrency more
complex):
Propagate the changes
If one site holding a copy is not reachable, the
transaction is delayed until the site is reachable
Limit the update propagation to currently available sites;
remaining sites are updated when they become available
Allow the updates to copies to happen asynchronously
sometime after the original update
NOTE: there may be a delay in regaining consistency
and this may range from a few seconds to several
hours
Transaction transparency: Failure transparency
The DDBMS must provide for a recovery
mechanism:
Ensure subtransactions of a global transaction are
atomic, that is, all commit or all abort
Before recording a final commit for the global
transaction, ensure that all subtransactions
completed successfully
In addition to the above, it must cater for:
Loss of a message
Failure of a communication link
Failure of a site
Network partitioning
Transaction transparency: Failure transparency
Example: Given a global transaction
that has to update data at two sites,
S1 and S2
Site 1
DB
Site 4
Site 2
Computer
Network
Subtransaction at S2 is unable to
commit and rolls back the changes
to ensure local consistency
Site 3
DB
Subtransaction at S1 completes
successfully and COMMIT
Problem:
DB
The distributed database is now in an
inconsistent state.
We are unable to uncommit the data at
site S1 due to the durability of the
subtransaction at S1
Transparencies in a DDBMS
Distribution transparency
Transaction transparency
Performance transparency
DBMS transparency
Performance transparency
Requires a DDBMS to perform as if it were a
centralized DBMS.
Requires DDBMS to determine the most cost-
effective strategy to execute a request.
Transparencies in a DDBMS
Distribution transparency
Transaction transparency
Performance transparency
DBMS transparency
DBMS transparency
It hides the knowledge that the local DBMSs
may be different
It is applicable to heterogeneous DDBMSs.
One of the most difficult to provide.
Date’s Twelve Rules for a DDBMS
(0) Fundamental principle
To the user, a distributed system should look exactly like a
distributed system.
Local autonomy
(1)
•
Local data is locally owned and managed
•
Local operations remain purely local
•
All operations at a given site are owned by that site
(2)
No reliance on a central site
(3)
Continuous operation (no shutdown) in the case of:
•
Adding or removing a site from the system
•
Dynamic creation and deletion of fragments at one or more
site
Date’s Twelve Rules for a DDBMS
(4) Location independence
(5) Fragmentation independence
(6) Replication independence
(7) Distributed query processing
(8) Distributed transaction processing
(9) Hardware independence
(10) Operating system independence
(11) Network independence
(12) Database independence
Summary: Characteristics of a DDBMS
A collection of logically related data.
The data is split into a number of fragments.
Fragments may be replicated.
Fragments/replicas are allocated to sites.
The sites are linked by a communications network.
The data at each cite is under the control of a DBMS.
The DBMS at each site can handle local applications
autonomously.
Each DBMS participates in at least one global
application.
The End
Thank You!!!