Distributed Databases Outline  Evolution of data processing  What is a DDBMS?  Motivation behind DDBMS  Types of Distributed Databases  Distributed Data.

Download Report

Transcript Distributed Databases Outline  Evolution of data processing  What is a DDBMS?  Motivation behind DDBMS  Types of Distributed Databases  Distributed Data.

Distributed Databases
Outline
 Evolution of data processing
 What is a DDBMS?
 Motivation behind DDBMS
 Types of Distributed Databases
 Distributed Data Storage
Replication
Fragmentation
 Transparencies in a DDBMS
Outline
 Evolution of data processing
 What is a DDBMS?
 Motivation behind DDBMS
 Types of Distributed Databases
 Distributed Data Storage
Replication
Fragmentation
 Transparencies in a DDBMS
Evolution of data processing
 File-based system
Each application defined and maintained its data
 Database technology (DBMS)
Data is defined and administered centrally.
A single logical database located at one site
managed/controlled by a single DBMS
 Distributed database technology (DDBMS)
Decentralization
Allow users to access not only data at their own site
but also data stored at remote sites.
File-based system
a collection of application
programs that perform
services for the end users
such as the production of
reports.
Registrar
enrollment
Data entry of
courses
Data entry of
enrollment
prepared by:RdDB
Student
Courses
Faculty
File
EAF
Generation of
EAF
prepared by:RdDB
Student
Courses
Faculty
File-based system
Registrar
enrollment
Data entry of
courses
Data entry of
enrollment
struct course
{ char code[5];
char desc[20];
int units [3];
}
File
EAF
Each program in the
system defines and
manages its own data.
[CBS98]
Each user (with the assistance of
DP staff) defines and implements
(including storage and control) the
files needed for a specific
application. [CBS98, EN94]
Registrar
enrollment
Accounting
prepared by:RdDB
File-based systems
File
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can be observed?
prepared by:RdDB
Data redundancy
Student System (File-based)
Registrar
enrollment
Accounting
File
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can Student
be observed?
System
Separation and
(File-based)
prepared by:RdDB
isolation of data
Registrar
enrollment
Accounting
File
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can be observed?
Program-data dependence
Student System (File-based)
struct person
{ char first[20];
char middle[3];
Registrar
enrollment
char last[30];
} employees,
managers;
Accounting
File
prepared by:RdDB
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can be observed?
Incompatibility of files
Student System (File-based)
COBOL
Registrar
enrollment
File
prepared by:RdDB
Student
Courses
Faculty
EAF
C
Accounting
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
What can be observed?
Fixed
queries;
Student System (File-based)
proliferation
prepared by:RdDB
of application
programs
Registrar
enrollment
Accounting
File
Student
Courses
Faculty
EAF
File
Student
Fees
payment of fees
OR
Department
Processing of
grades
File
course
cards
Student
Courses
Grades
Faculty
Limitations of File-Based Systems
 Separation and isolation of data
 Duplication of data
 Program-data dependence
 Incompatibility of files (e.g C vs. COBOL)
 Fixed queries / proliferation of application
programs
prepared by:RdDB
Factors that limit File-Based System
prepared by:RdDB
 The definition of data is embedded in the
application programs, rather than being
stored separately and independently. [CBS98]
 There is no control over the access and
manipulation of data beyond that imposed by
the application programs. [CBS98]
Database Approach
What is a database?
prepared by:RdDB
 It is a shared collection of logically coherent
data with some inherent meaning;
 It is designed , built and populated with data
for specific purpose such as meeting the
information needs of an organization;
Ex: student database, employee database,
library database, air flights database, hospital
database, etc.
What is a DBMS?
 Database Management System
 It is a software system that enables users to :
define, create and maintain the database
 DDL
 DML
provide controlled access to this database
 Security
 Integrity
 Concurrency control
 Recovery control
 User-accessible catalogue (description of data)
A Simple Database System Environment
DATABASE
SYSTEM
Application
Programs/
Queries
prepared by:RdDB
DBMS Software
Software to
process
programs/
queries
Software to
access stored
data
Data
Definition
Database
University System: Database Environment
prepared by:RdDB
Registrar
DATABASE
SYSTEM
Accounting
Department
Enrollment
Payment
Grade processing
MySQL
Software to
process
programs/
queries
Software to
access stored
data
Data
Definition
University
database
Outline
 Evolution of data processing
 What is a DDBMS?
 Motivation behind DDBMS
 Types of Distributed Databases
 Distributed Data Storage
Replication
Fragmentation
 Transparencies in a DDBMS
What is a Distributed database?
 A distributed database is a
logically interrelated collection of shared data (and
a description of this data),
physically distributed over a computer network.
What is a Distributed DBMS?
 Distributed DBMS is the software system that
permits the management of the distributed
database and
makes the distribution transparent to users.
Topology of DDBMS
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
DB
Topology of Distributed Processing
Site 1
Site 4
Site 2
Computer
Network
A centralized database
that can be accessed
over a computer
network
Site 3
DB
Outline
 Evolution of data processing
 What is a DDBMS?
 Motivation behind DDBMS
 Types of Distributed Databases
 Distributed Data Storage
Replication
Fragmentation
 Transparencies in a DDBMS
Motivation behind DDBMS
1. To integrate the operational data and provide
controlled access to the data.
 This may imply centralization but it is not the
intention
2. To adopt a decentralize approach to data
which mirrors the organizational structure of
many companies.
 Each unit maintains its own data.
 Data is stored proximate to the location which
would improve:
 Shareability of the data
 Efficiency of data access
Motivation behind DDBMS
3. To help resolve the islands of information
problem:
 Geographical separation
 Incompatible computer architectures
 Incompatible communication protocols
Why Distribute??
 Example: X Corp. has offices in London, New
York, and Hong Kong.
 Employee data:
EMP(ENUM, NAME, TITLE, SALARY, …)
 Where should the employee data table
reside?
(example from Stanford University)
X Corp. Data Access Pattern
 Mostly, employee data is managed at the
office where the employee works
E.g., payroll, benefits, hire and fire
 Periodically, X Corp needs consolidated
access to employee data
E.g., X Corp. changes benefit plans and that
affects all employees.
E.g., Annual bonus depends on global net profit.
(example from Stanford University)
New York
Payroll app
London
Payroll app
EMP
London
New York
Internet
Hong Kong
Payroll app
NY and HK payroll
apps run very slowly!
Hong Kong
(example from Stanford University)
New York
Payroll app
London
Payroll app
London
London
Emp
New York
NY
Emp
Internet
Hong Kong
Payroll app
Much better!!
Hong Kong
HK
Emp
(example from Stanford University)
Fundamental principle of DDBMS
 The DDBMS is expected to make the
distribution transparent (invisible) to the user.
 The objective of transparency:
To make the distributed system appear like a
centralized system.
Outline
 Evolution of data processing
 What is a DDBMS?
 Motivation behind DDBMS
 What is a distributed database system?
 Types of Distributed Databases
 Distributed Data Storage
Replication
Fragmentation
 Transparencies in a DDBMS
Distributed Database System
Data is spread over multiple machines (also
A Distributed
System
referred to as
sites or nodes).
The computers may vary in size and function
A Distributed System
Network interconnects the machines
Database is stored on several sites.
A Distributed System
Data is shared by users on multiple machines
Account(actno,
br-name, Aki-01
balance)
Add 100 to account
Makati
Branch(br-name, br-city,assets)
Local transaction: If
transaction is initiated at
network
Makati branch
Q.C.
Account(actno, br-name, balance)
Main
Manila
Account(actno, br-name, balance)
Account(actno, br-name, balance)
Makati
Q.C.
Branch(br-name, br-city,assets)
Main
Global: if transaction is initiated
elsewhere network
Global Transaction Example:
transfer 100 pesos from account Manila
Aki-01 (Makati) to account Aki100 (Manila)
Account(actno, br-name, balance)
Account(actno, br-name, balance)
Local and Global Transactions
 A local transaction accesses data in the
single site at which the transaction was
initiated.
 A global transaction either:
 accesses data in a site different from the
one at which the transaction was initiated or
(e.g. funds transfer)
 accesses data in several different sites.
(e.g. summarization of deposits in all
branches)
Distributed Database System
Makati
Main
network
Q.C.
 A distributed database system consistsManila
of loosely coupled sites that share no
physical component
 Database systems that run on each site
are independent of each other
 Transactions may access data at one or
more sites
Why Distributed Database Systems?
 Sharing data – users at one site are able
to access the data residing at some
other sites.
 Autonomy – each site is able to retain a
degree of control over data stored
locally.
 Higher system availability through
redundancy — data can be replicated at
remote sites, and system can function
even if a site fails.
Trade-offs in Distributed Systems
 Disadvantage: added complexity
required to ensure proper
coordination among sites.
Software development cost.
Greater potential for bugs.
Increased processing overhead.
Outline
 Evolution of data processing
 What is a DDBMS?
 Motivation behind DDBMS
 Types of Distributed Databases
 Distributed Data Storage
Replication
Fragmentation
 Transparencies in a DDBMS
Homogeneous Distributed Database
oracle
oracle
 All sites have
identical database
management
system software
 Sites are aware of
Makati
Main
network
Manila
Q.C.
each other and
agree to cooperate
in processing user
requests.
 Each site
surrenders part of
its autonomy in
terms of right to
change schemas or
software
oracle
oracle
 Appears to user as
a single system
Heterogeneous Distributed Database
SQl server
oracle
different schemas and
software (DBMS)
Makati
Main
network
Q.C.
 Different sites may use
Manila
 Difference in schema
is a major problem
for query processing
 Difference in
software is a major
problem for
transaction
processing
 Sites may not be aware
Sybase
DB2
of each other and may
provide only limited
facilities for cooperation
in transaction
processing
Outline
 Evolution of data processing
 What is a DDBMS?
 Motivation behind DDBMS
 Types of Distributed Databases
 Distributed Data Storage
Replication
Fragmentation
 Transparencies in a DDBMS
Approaches: Distributed Data Storage
1.
Data Replication
System maintains multiple copies of the relation, stored
in different sites, for faster retrieval and fault
tolerance.
2.
Data Fragmentation
Relation is partitioned into several fragments stored in
distinct sites
Note: Replication and fragmentation can be combined
Relation is partitioned into several fragments: system
maintains several identical replicas of each
fragment.

Allocation
Each fragment is stored at the site with optimal
distribution
Data Replication
Account(actno, br-name, balance)
Makati
Q.C.
A relation or fragment of a
relation is network
replicated if it is
stored redundantly in two or
more sites.
Account(actno, br-name, balance)
Main
Manila
Account(actno, br-name, balance)
Data Replication
Account(actno, br-name, balance)
Makati
Account(actno, br-name, balance)
Full replication of a relationMain
is
the case where the relation is
network
stored at all sites.
Q.C.
Account(actno, br-name, balance)
Manila
Account(actno, br-name, balance)
Bank database
Makati
Q.C.
Data Replication
Main
Fully redundant databases are those
in which everynetwork
site contains a copy of
the entire database.
Manila
Advantages of Data Replication
 Availability: failure of site containing
relation r does not result in unavailability of
r if replicas exist.
 Parallelism: queries on r may be
processed by several nodes in parallel.
 Reduced data transfer: relation r is
available locally at each site containing a
replica of r.
Disadvantages Data Replication
 Increased cost of updates: each replica
of relation r must be updated.
 Increased complexity of concurrency
control: concurrent updates to distinct
replicas may lead to inconsistent data
unless special concurrency control
mechanisms are implemented.
Data Fragmentation
 A relation may be divided into a number of sub-
relations called fragments, which are then
distributed.
 Division of relation r into fragments r1, r2, …, rn
which contain sufficient information to
reconstruct relation r.
relation r
r1
r2
…
rn
Relation r
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
}
}
}
r1
r2
r3
Why fragment? : Advantages
 Usage: Applications work with views rather
than entire relations
 Efficiency: Data is stored to where it is most
frequently used
 Parallelism: A transaction can be divided into
subqueries that operate on fragments
 Security: Data not required by local
applications is not stored and consequently
not available to unauthorized users
Disadvantages of fragmentation
 Performance: Performance of applications
that require data form several fragments
located at different sites may be slower
 Integrity :May be more difficult to implement
Data Fragmentation
Types of Data Fragmentation
Horizontal fragmentation
Vertical fragmentation
Mixed fragmentation
Horizontal fragmentation
 A horizontal fragment of a relation consists of a subset
of the tuples of a relation.
 It is produced by specifying a predicate that performs
a restriction on the tuples in the relation
 It is defined using the selection operation of relational
algebra
r1
R

r2
r3
Horizontal fragmentation
 Given a relation R, a horizontal fragment
is defined as:
r1 = p (R )
Where p is the predicate based on
one or more attributes of the relation
Relation r
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
r1 = B=‘b1’ (r )
r2 = B=‘b2’ (r )
r3 = B=‘b3’ (r )
Relation r
r1 = B=‘b1’ (r )
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
Relation r
r2 = B=‘b2’ (r )
A
B
C
D
E
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
Relation r
r3 = B=‘b3’ (r )
A
B
C
D
E
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
Horizontal fragmentation
 Given:
PROPERTY_FOR_RENT(pno, street, area, city,
pcode, type, rooms, rent, ono)
 Horizontal fragmentation by property type of
PROPERTY_FOR_RENT
P1: type=‘House’ (PROPERTY_FOR_RENT )
P2: type=‘Flat’ (PROPERTY_FOR_RENT )
Vertical Data Fragmentation
A vertical fragment of a relation consists of a subset of the
attributes of a relation
Vertical fragmentation groups together attributes that are
used by some applications
It is defined using the projection operation of relational
algebra.
R

r1
r2
r3
Vertical Fragmentation
 Given R, a vertical fragment is defined as:
r1 = a1, …., an (R )
Where a1, ….an are attributes of relation R
Relation r
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
r1 = A, B,C (r)
r2 = A, D (r)
r3 = A, E (r)
Relation r
A
B
C
a1
b1
c1
a2
b1
c2
a3
b1
c3
a4
b1
c4
a5
b2
c5
a6
b2
c6
a7
b2
c7
a8
b2
c8
a9
b3
c9
a10
b3
c10
a11
b3
c11
a12
b3
c12
r1 = A, B,C (r)
Relation r
A
D
a1
d1
a2
d2
a3
d3
a4
d4
a5
d5
a6
d6
a7
d7
a8
d8
a9
d9
a10
d10
a11
d11
a12
d12
r2 = A, D (r)
Relation r
A
E
a1
e1
a2
e2
a3
e3
a4
e4
a5
e5
a6
e6
a7
e7
a8
e8
a9
e9
a10
e10
a11
e11
a12
e12
r3 = A, E (r)
Vertical Fragmentation
Given:
STAFF(Sno, Fname, Lname, Address, Telno,
Position, Sex, DOB, Salary, NIN,Bno)
repeated
Vertical Fragmentation of staff
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
Mixed Fragmentation
 A mixed fragment of a relation
consists of
either:
a horizontal fragment that is
subsequently vertically fragmented or
vertical fragment that is then
horizontally fragmented
 It is defined using the selection and
projection operations of relational algebra
Mixed Fragmentation
 Given a relation R, a mixed fragment is
defined as:
Vertical fragment that is then horizontally
fragmented
p(a1, …., an (R ))
Horizontal fragment that is then vertically
fragmented
a1, …., an (p(R ))
Data Fragmentation
Mixed fragmentation:
Vertical fragments,
horizontally fragmented
Mixed fragmentation:
Horizontal fragments,
vertically fragmented
Relation r
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
r1 = B=‘b1’ (r )
r2 = B=‘b2’ (r )
r3 = B=‘b3’ (r )
r3.1 = A, B,C (r)
r3.2 = A, D (r)
r3.3 = A, E (r)
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
A
B
C
D
E
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
A
B
C
D
E
a9
b3
c9
d9
e9
a10
b3
c10
d10
e10
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
r1 = B=‘b1’ (r )
r2 = B=‘b2’ (r )
r3 = B=‘b3’ (r )
A
B
C
a9
b3
c9
a10
b3
c10
a11
b3
c11
a12
b3
c12
A
D
a9
d9
a10
d10
a11
d11
a12
d12
A
E
a9
e9
a10
e10
a11
e11
a12
e12
r3.1 = A, B,C (r3)
r3.2 = A, D (r3)
r3.3 = A, E (r3)
Mixed Fragmentation
Given:
STAFF(Sno, Fname, Lname, Address, Telno,
Position, Sex, DOB, Salary, NIN,Bno)
 Vertically fragment Staff into:
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
Mixed Fragmentation
 Horizontally fragment S2 according to branch
number.
S2: Sno, fname, lname,address,telno,bno (Staff)
S21: 
bno=‘B3’ (S2)
S22: 
bno=‘B5’ (S2)
S22: 
bno=‘B7’ (S2)
Rules: Correctness of fragmentation
 Rule1: Completeness
 Rule2: Reconstruction
 Rule3: Disjointness
Rules: Correctness of fragmentation
 Rule1: Completeness
If a relation instance R is decomposed into
fragments R1, R2, … Rn , each data item
that can be found in R must appear in at
least one fragment.
This is necessary to ensure that there is no
loss of data during fragmentation
Rule1: Completeness
R
r1
r2
If a relation instance R is decomposed
into fragment R1, R2, … Rn , each data
item that can be found in R must
appear in at least one fragment.
Relation r(A,B,C,D,E)
A
B
C
D
E
a1
b1
c1
d1
e1
A
B
C
D
E
a2
b1
c2
d2
e2
a1
b1
c1
d1
e1
a3
b1
c3
d3
e3
a2
b1
c2
d2
e2
a4
b1
c4
d4
e4
a3
b1
c3
d3
e3
A
B
C
D
E
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
A
B
C
D
E
a10
b3
c10
d10
e10
a9
b3
c9
d9
e9
a11
b3
c11
d11
e11
a10
b3
c10
d10
e10
a12
b3
c12
d12
e12
a11
b3
c11
d11
e11
a12
b3
c12
d12
e12
Rules: Correctness of fragmentation
 Rule 2: Reconstruction
It must be possible to define a relational operation
that will reconstruct the relation R from the
fragments.
This rule ensures that functional dependencies are
preserved.
r1
Rule2: Reconstruction
R
r2
Relational
operation
It must be possible to define a relational
operation that will reconstruct the relation R
from the fragments.
Horizontal: Union r1  r2 = r
Vertical: Natural Join operation: S1
S2
A
B
C
D
E
a1
b1
c1
d1
e1
A
B
C
D
E
a2
b1
c2
d2
e2
a1
b1
c1
d1
e1
a3
b1
c3
d3
e3
a2
b1
c2
d2
e2
a4
b1
c4
d4
e4
a3
b1
c3
d3
e3
A
B
C
D
E
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
A
B
C
D
E
a10 b3
c10 d10 e10
a9
b3
c9
d9
e9
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
a10 b3
c10 d10 e10
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
R= r1 r2 r2
A
B
C
A
D
A
E
a1
b1
c1
a1
d1
a1
e1
a2
b1
c2
a2
d2
a2
e2
a3
b1
c3
a3
d3
a3
e3
a4
b1
c4
a4
d4
a4
e4
a5
b2
c5
a5
d5
a5
a6
b2
c6
a6
d6
a7
b2
c7
a7
a8
b2
c8
a9
b3
a10
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
e5
a3
b1
c3
d3
e3
a6
e6
a4
b1
c4
d4
e4
d7
a7
e7
a5
b2
c5
d5
e5
a8
d8
a8
e8
a6
b2
c6
d6
e6
c9
a9
d9
a9
e9
b3
c10
a10
d10
a10
e10
a7
b2
c7
d7
e7
a11
b3
c11
a11
d11
a11
e11
a8
b2
c8
d8
e8
a12
b3
c12
a12
d12
a12
e12
a9
b3
c9
d9
e9
R= r1
r2
r2
a10 b3
c10 d10 e10
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
Rules: Correctness of fragmentation
 Rule 3: Disjointness
If a data item di appears in fragment Ri, then it
should not appear in any other fragment.
This rule ensures minimal data redundancy
Vertical fragmentation is the exception to this rule,
where the primary key attributes must be repeated
to allow reconstruction.
Rule3: Disjointness
R
r1
r2
If a data item di appears in fragment Ri, then it
should not appear in any other fragment,
except for the primary key in the case of
vertical fragmentation
A
B
C
D
E
a1
b1
c1
d1
e1
A
B
C
D
E
a2
b1
c2
d2
e2
a1
b1
c1
d1
e1
a3
b1
c3
d3
e3
a2
b1
c2
d2
e2
a4
b1
c4
d4
e4
a3
b1
c3
d3
e3
a4
b1
c4
d4
e4
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
a9
b3
c9
d9
e9
R= r1 r2 r2
A
B
C
D
E
a5
b2
c5
d5
e5
a6
b2
c6
d6
e6
a7
b2
c7
d7
e7
a8
b2
c8
d8
e8
A
B
C
D
E
a10 b3
c10 d10 e10
a9
b3
c9
d9
e9
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
a10 b3
c10 d10 e10
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
No data item
appears in
more than
one fragment
A
B
C
A
D
A
E
a1
b1
c1
a1
d1
a1
e1
a2
b1
c2
a2
d2
a2
e2
a3
b1
c3
a3
d3
a3
e3
a4
b1
c4
a4
d4
a4
e4
a5
b2
c5
a5
d5
a5
a6
b2
c6
a6
d6
a7
b2
c7
a7
a8
b2
c8
a9
b3
a10
A
B
C
D
E
a1
b1
c1
d1
e1
a2
b1
c2
d2
e2
e5
a3
b1
c3
d3
e3
a6
e6
a4
b1
c4
d4
e4
d7
a7
e7
a5
b2
c5
d5
e5
a8
d8
a8
e8
a6
b2
c6
d6
e6
c9
a9
d9
a9
e9
b3
c10
a10
d10
a10
e10
a7
b2
c7
d7
e7
a11
b3
c11
a11
d11
a11
e11
a8
b2
c8
d8
e8
a12
b3
c12
a12
d12
a12
e12
a9
b3
c9
d9
e9
R= r1
r2
r2
No data item appears
in more than one
fragment except for
the primary key
a10 b3
c10 d10 e10
a11 b3
c11 d11 e11
a12 b3
c12 d12 e12
Check Correctness:
Horizontal Fragmentation
 Given:
PROPERTY_FOR_RENT(pno, street, area, city,
pcode, type, rooms, rent, ono)
 Horizontal fragmentation by property type of
PROPERTY_FOR_RENT
P1: type=‘House’ (PROPERTY_FOR_RENT )
P2: type=‘Flat’ (PROPERTY_FOR_RENT )
Check Correctness:
Horizontal Fragmentation
P1: type=‘House’ (PROPERTY_FOR_RENT )
P2: type=‘Flat’ (PROPERTY_FOR_RENT )
 Completeness: Each tuple in the relation appears in
either fragment P1 or P2.
 Reconstruction: The Property_For_Rent relation can
be reconstructed from the fragments using Union
operation:
S1  S2 = Property_For_Rent
 Disjointness:
 The fragments are disjoint;
 There can be no property type that is both house and flat
Check Correctness:
Vertical Fragmentation
Given:
STAFF(Sno, Fname, Lname, Address, Telno,
Position, Sex, DOB, Salary, NIN,Bno)
Vertical Fragmentation of staff
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
Check Correctness:
Vertical Fragmentation
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
 Completeness: Each attribute in Staff relation
appears in either fragment S1 or S2.
 Reconstruction: The Staff relation can be
reconstructed from the fragments using
Natural Join operation: S1
S2
 Disjointness: S1 and S2 are disjoint except for
the necessary duplication of the primary key.
Mixed Fragmentation
Given:
STAFF(Sno, Fname, Lname, Address, Telno,
Position, Sex, DOB, Salary, NIN,Bno)
 Vertically fragment Staff into:
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
Mixed Fragmentation
 Horizontally fragment S2 according to branch
number.
S2: Sno, fname, lname,address,telno,bno (Staff)
S21: 
bno=‘B3’ (S2)
S22: 
bno=‘B5’ (S2)
S22: 
bno=‘B7’ (S2)
Check Correctness:
Mixed Fragmentation
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
S21: 
bno=‘B3’ (S2)
S22: 
bno=‘B5’ (S2)
S22: 
bno=‘B7’ (S2)
Completeness

Each attribute in Staff relation appears either in fragment
S1 or S2

Each tuple (part) appears in fragment S1 and either
fragment S21, S22, or S23.
Check for correctness:
Mixed Fragmentation
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
S21: 
bno=‘B3’ (S2)
S22: 
bno=‘B5’ (S2)
S22: 
bno=‘B7’ (S2)
Reconstruction
 The Staff relation can be reconstructed from the
fragments using the Union and Natural Join
operations:
 S1
( S21  S22  S23 ) = Staff
Check for correctness:
Mixed Fragmentation
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
S21: 
bno=‘B3’ (S2)
S22: 
bno=‘B5’ (S2)
S22: 
bno=‘B7’ (S2)
Disjointness
 The fragments are disjoint
 There can be no staff member that works in more than
one branch
 S1 and S2 are disjoint except for the necessary
duplication of the primary key.
Advantages of Horizontal Fragmentation
 allows parallel processing on
fragments of a relation
 allows a relation to be split so that
tuples are located where they are
most frequently accessed
Advantages of Vertical Fragmentation
 allows tuples to be split so that each
part of the tuple is stored where it is
most frequently accessed
Basis of design: definition and allocation of
fragments
 Analyze applications
Concentrate on the most important ones
80/20 rule may be used as a guideline
 most active 20% of user queries account for 80% of the
total data access
 Quantitative information (used in allocation)
Frequency with which an application is run
Site from which an application is run
Performance criteria for transactions and
applications
Basis of design: definition and allocation of fragments
 Qualitative information (used in
fragmentation)
Transactions executed by the application
Type of access (read or write)
Predicates of read operations
Objectives of fragment definition and allocation
 Locality of reference
 Data should be stored close to where it is used
 If a fragment is used at several sites, it may be beneficial to
store copies of the fragments at these sites
 Improved reliability and availability
 Made possible through replication
 If one site fails, another copy is available at another site
 Acceptable performance
 Bad allocation may result in occurrence of bottlenecks;
 Bad allocation may also result in under utilization of resources.
Objectives of fragment definition and allocation
 Balanced storage capacities and costs
Availability and cost of storage at each site
 Minimal communication costs
Consider cost of remote requests
Retrieval costs are minimized
 when locality of reference is maximized or
 when each site has its own copy of the data
Updating replicated data is costly
Strategies for Data allocation
 Centralized
 Partitioned (or fragmented)
 Complete replication
 Selective replication
Single database
Centralized
Workstation 1
Workstation 3
Workstation 2
LAN
Database
Server with
DBMS
Single DBMS
Users distributed
across the network
Reliability and
availability are low –
failure of central site
results in loss of the
entire database
Communication
costs are high
Partitioned
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
Partitioned
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
DB
Partitioned
Database is partitioned into
disjoint fragments
Each fragment is assigned
to one site
Site 1
Storage costs are low since
there are no replications
DB
Site 4
Site 2
Locality of reference is high
if data access frequently
occurs in the site where data
is located
Computer
Network
Site 3
DB
Availability and reliability are
low but higher than centralized
Performance should be
good
DB
DB
Communication costs low if
distribution is designed
properly
Complete Replication
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
Complete Replication
Site 1
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
Complete Replication
Maintaining a complete
copy of the database at
each site
Site 1
DB
Site 4
Site 2
Computer
Network
Locality of reference,
reliability, performance and
availability are high
Storage costs are high
Communication costs for
updates are high
Site 3
DB
DB
Selective Replication
Selective replication
Site 1
is a combination of
replication, partitioning
and centralization.
DB
Site 4
Site 2
Computer
Network
Site 3
DB
DB
Outline
 Evolution of data processing
 What is a DDBMS?
 Motivation behind DDBMS
 Types of Distributed Databases
 Distributed Data Storage
Replication
Fragmentation
 Transparencies in a DDBMS
Transparencies in a DDBMS
 Transparency hide implementation details from the
user
 Types of DDBMS transparencies
Distribution transparency
Transaction transparency
Performance transparency
DBMS transparency
Transparencies in a DDBMS
 Distribution transparency
 Transaction transparency
 Performance transparency
 DBMS transparency
Distribution transparency
 Allows the user to perceive the database as a single
logical entity.
 Types of distribution transparency
The user does not need to know :
 that the data is fragmented (fragmentation transparency)
 location of data items (location transparency)
The user is unaware of the replication of fragments
(replication transparency)
If the user needs to know about fragmented data and
location of fragments, then there is local mapping
transparency
The DBMS must ensure that no two sites create a database
object with the same name (naming transparency)
Transparencies in a DDBMS
 Distribution transparency
 Transaction transparency
 Performance transparency
 DBMS transparency
Transaction transparency
 Ensures that all distributed transactions maintain
the distributed database’s integrity and consistency.
 What is a distributed transaction?
Accesses data stored at more than one location
Each transaction is divided into a number of
subtransactions, one for each site that has to be accessed
Transaction transparency
 Fragmented schema: S1, S2, S21, S22 , S23 ,
 a transaction T that prints out the names of
all staff;
 Subtransactions:
Ts3 : at site 3
Ts5 : at site 5
Ts7 : at site 7
S1: Sno, position,sex, dob, salary,nin (Staff)
S2: Sno, fname, lname,address,telno,bno (Staff)
S21:  bno=‘B3’ (S2)
site 3
S22:  bno=‘B5’ (S2)
site 5
S22:  bno=‘B7’ (S2)
site 7
site5
Transaction transparency
 Distributed transaction (transactions can
execute concurrently; inherent parallelism)
Time
Ts3
Ts5
Ts7
t1
begin
begin
t2
read(fname,lname)
read(fname,lname) read(fname,lname)
t3
print(fname,lname)
print(fname,lname) print(fname,lname)
t4
end
end
begin
end
Transaction transparency
 The DBMS must ensure the indivisibility of each
subtransaction
It must ensure the synchronization of subtransactions
with other local transactions that are executing
concurrently at a site
It must ensure the synchronization of subtransactions
with global transactions that are running simultaneously
at the same or different sites
Note: Transaction transparency in a DDBMS is
complicated by the fragmentation, allocation and
replication schemas
Transaction transparency
Aspects of transaction transparency
 Concurrency transparency
 Failure transparency
Transaction transparency: Concurrency
transparency
Results of all concurrent transactions (distributed and nondistributed) :
execute independently
Logically consistent with the results that are obtained
if transactions are executed one at a time, in some
arbitrary serial order
Note: There is added complexity because the DDBMS
must ensure:
 that both local and global transactions do not interfere with each other
 The consistency of all subtransactions of the global transaction
Transaction transparency: Concurrency transparency
 Strategies (for replication, which makes concurrency more
complex):
Propagate the changes
 If one site holding a copy is not reachable, the
transaction is delayed until the site is reachable
Limit the update propagation to currently available sites;
remaining sites are updated when they become available
Allow the updates to copies to happen asynchronously
sometime after the original update
NOTE: there may be a delay in regaining consistency
and this may range from a few seconds to several
hours
Transaction transparency: Failure transparency
 The DDBMS must provide for a recovery
mechanism:
Ensure subtransactions of a global transaction are
atomic, that is, all commit or all abort
Before recording a final commit for the global
transaction, ensure that all subtransactions
completed successfully
In addition to the above, it must cater for:
 Loss of a message
 Failure of a communication link
 Failure of a site
 Network partitioning
Transaction transparency: Failure transparency
Example: Given a global transaction
that has to update data at two sites,
S1 and S2
Site 1
DB
Site 4
Site 2
Computer
Network
Subtransaction at S2 is unable to
commit and rolls back the changes
to ensure local consistency
Site 3
DB
Subtransaction at S1 completes
successfully and COMMIT
Problem:
DB
The distributed database is now in an
inconsistent state.
We are unable to uncommit the data at
site S1 due to the durability of the
subtransaction at S1
Transparencies in a DDBMS
 Distribution transparency
 Transaction transparency
 Performance transparency
 DBMS transparency
Performance transparency
 Requires a DDBMS to perform as if it were a
centralized DBMS.
 Requires DDBMS to determine the most cost-
effective strategy to execute a request.
Transparencies in a DDBMS
 Distribution transparency
 Transaction transparency
 Performance transparency
 DBMS transparency
DBMS transparency
 It hides the knowledge that the local DBMSs
may be different
 It is applicable to heterogeneous DDBMSs.
 One of the most difficult to provide.
Date’s Twelve Rules for a DDBMS
(0) Fundamental principle

To the user, a distributed system should look exactly like a
distributed system.
Local autonomy
(1)
•
Local data is locally owned and managed
•
Local operations remain purely local
•
All operations at a given site are owned by that site
(2)
No reliance on a central site
(3)
Continuous operation (no shutdown) in the case of:
•
Adding or removing a site from the system
•
Dynamic creation and deletion of fragments at one or more
site
Date’s Twelve Rules for a DDBMS
(4) Location independence
(5) Fragmentation independence
(6) Replication independence
(7) Distributed query processing
(8) Distributed transaction processing
(9) Hardware independence
(10) Operating system independence
(11) Network independence
(12) Database independence
Summary: Characteristics of a DDBMS
 A collection of logically related data.
 The data is split into a number of fragments.
 Fragments may be replicated.
 Fragments/replicas are allocated to sites.
 The sites are linked by a communications network.
 The data at each cite is under the control of a DBMS.
 The DBMS at each site can handle local applications
autonomously.
 Each DBMS participates in at least one global
application.
The End
Thank You!!!