On Replication
Yin Chen
July 2006
Overview
• What is? Why need? Types?
• Investigation of existing technologies
– IBM SQL replication
– Sybase replication
– Oracle replication
– MySQL replication
– Globus DRS
– EGEE RMS
– SRB
• Our project
– Goals
– Solutions
– Features
What is replication?
• Copying of data & synchronization of updates
• Is not caching
– Client phenomenon
– Only for improving response time
• Is not a backup (a backup is not automatically overwritten when the original data is modified)
• Is not a replicated system
– Deals with when/where to copy
– Optimization (how many replicas are needed …)
– Growing or shrinking the replication tree
Why do we need it?
• Data consolidation (central audit & analyse)
• Data distribution (for branch offices)
• Performance
– Access efficiency (moving data near apps.)
– Load balance (distributing access load)
– Security (data protection)
– Availability (off-line access)
– Reliability (disaster recovery, avoiding single point of failure)
• Data Grid (to improve availability, response time, fault tolerance)
• Digital Library (copying digital doc, index … )
Replication types
• Synchronous Replication:
– What is: updating two storages at the same time; roll back if one fails
– Benefits: high availability / auto fail-over / minimal data loss
– Usages: disaster recovery
– Drawbacks: network efficiency / scalability / cost / less flexibility
• Asynchronous Replication:
– What is: changes are captured on the primary storage and propagated immediately or at timed intervals
– Benefits: low cost / scalability / flexibility
– Usages: load balance / off-line access / access efficiency
– Drawbacks: possible data loss / network bandwidth
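The difference between the two types can be illustrated with a minimal sketch. It assumes two in-memory key-value stores; the names Store, sync_write and AsyncReplicator are hypothetical and do not belong to any product discussed here. The synchronous path rolls back the primary when the replica fails, while the asynchronous path commits on the primary first and keeps a queue of pending changes, which is where data loss can occur if the primary is lost before propagation.

```python
class Store:
    """A toy key-value storage node that can be told to fail on write."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.data = {}

    def write(self, key, value):
        if self.fail:
            raise IOError(f"{self.name} is unavailable")
        self.data[key] = value


def sync_write(primary, replica, key, value):
    """Synchronous: update both stores, roll back the primary if the replica fails."""
    had_key = key in primary.data
    old = primary.data.get(key)
    primary.write(key, value)
    try:
        replica.write(key, value)
    except IOError:
        # Roll back so the two stores never diverge.
        if had_key:
            primary.data[key] = old
        else:
            del primary.data[key]
        return False
    return True


class AsyncReplicator:
    """Asynchronous: commit on the primary, queue the change, propagate later."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.pending = []  # captured changes not yet applied to the replica

    def write(self, key, value):
        self.primary.write(key, value)
        self.pending.append((key, value))

    def propagate(self):
        """Apply queued changes in order; a failing change stays queued."""
        while self.pending:
            key, value = self.pending[0]
            self.replica.write(key, value)  # may raise IOError
            self.pending.pop(0)
```

Calling propagate() right after every write approximates immediate propagation; calling it from a timer approximates scheduled propagation.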
Existing technologies
IBM Replication
• WebSphere Information Integrator V8.2
• Supports multivendor DBs
• Admin: create replication criteria -> control table
• Capture: use log/trigger to capture the changes -> temp table
• Apply: scheduled; applies accumulated transactions -> target DB
• Alert Monitor: monitor and notify users
• Supports: after-image copy / before-image copy (can roll back)
• Allows subset / simple view / complex joins & unions copy
• Asynchronous replication, allows specifying schedule
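The capture/apply split above can be sketched as follows. This is a toy model under stated assumptions, not the DB2 interface: the source table and target are plain dictionaries, the staging list stands in for the temp table, and Change, Source, apply_changes, rollback and the subset filter are hypothetical names introduced for illustration. Each captured change stores a before-image and an after-image, which is what makes a later rollback possible.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Change:
    key: str
    before: Optional[Any]  # before-image (None if the row did not exist)
    after: Optional[Any]   # after-image (None if the row was deleted)


class Source:
    """Source table plus a staging area filled by the capture step."""

    def __init__(self):
        self.rows = {}
        self.staging = []  # stands in for the temp (staging) table

    def upsert(self, key, value):
        self.staging.append(Change(key, self.rows.get(key), value))
        self.rows[key] = value

    def delete(self, key):
        self.staging.append(Change(key, self.rows.get(key), None))
        self.rows.pop(key, None)


def apply_changes(source, target, subset=lambda key: True):
    """Apply step: replay accumulated changes onto the target, then clear staging.

    `subset` is a hypothetical row filter standing in for replication criteria.
    Returns the applied changes so that they can be undone later.
    """
    applied = []
    for change in source.staging:
        if not subset(change.key):
            continue
        if change.after is None:
            target.pop(change.key, None)
        else:
            target[change.key] = change.after
        applied.append(change)
    source.staging.clear()
    return applied


def rollback(target, applied):
    """Undo applied changes in reverse order using the before-images."""
    for change in reversed(applied):
        if change.before is None:
            target.pop(change.key, None)
        else:
            target[change.key] = change.before
```

Running apply_changes from a scheduler corresponds to the scheduled, asynchronous apply described above; the target stays unchanged between runs while changes accumulate in staging.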
Sybase Replication
• Pioneer, since 1993
• “Publish-and-subscribe” approach
• Replication Agent: runs on each publisher, detects changes based on logs
• Replication Server: applies changes to target DBs (uses pre-configured intelligent routes)
• Replication Server Manager: GUI-based, manages/monitors P2P env.
• Stable Queues: temporary storage of data, ensure no data is lost
• Is advanced in providing high performance
Oracle Replications
• Multimaster Replication
– P2P structure
– Changes are pushed to every other site (synchronous / asynchronous)
– Conflicts may happen (update conflict / uniqueness conflict / delete conflict)
• Materialized View Replication
– One master site manages several non-master sites (keep one/partial copy)
– Updatable
– Refresh (fast refresh / complete refresh / force refresh)
• Hybrid Replication
MySQL Replications
• Basic replication services, using a lightweight master-slave model
• The master writes updates to logs; the slave reads and executes the queries from the master’s logs
• The slave checks results on both sites; replication stops if a query only succeeds on one site
• This simple structure can be combined arbitrarily to build complex architectures:
1. simple master/slave
2. one slave, two masters
3. dual masters
4. dual master with slaves
5. master ring
6. master ring with slaves
• In a slow network, it is difficult for a slave to catch up with the master; improved in 4.0 by adding relay logs
• Have to lock or restart the master for initial snapshot copy
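The log-shipping model above can be sketched as a toy simulation. It is not MySQL code: statements are modelled as (kind, key, value) tuples run against a dictionary, and Master, Slave, fetch and replay are hypothetical names. The slave first copies new events into a local relay log, then executes them, and stops as soon as a statement's outcome differs from the outcome recorded by the master.

```python
def execute(db, op):
    """Run one toy statement against a dict; return True on success."""
    kind, key, value = op
    if kind == "set":
        db[key] = value
        return True
    if kind == "incr" and key in db:
        db[key] += value
        return True
    return False  # e.g. incrementing a missing key


class Master:
    def __init__(self):
        self.db = {}
        self.log = []  # list of (op, succeeded_on_master)

    def run(self, op):
        ok = execute(self.db, op)
        self.log.append((op, ok))
        return ok


class Slave:
    def __init__(self, master):
        self.master = master
        self.db = {}
        self.relay_log = []  # local copy of the master's log
        self.fetched = 0     # how many master events have been copied
        self.executed = 0    # how many relay events have been executed
        self.running = True

    def fetch(self):
        """I/O step: copy new events from the master into the relay log."""
        new = self.master.log[self.fetched:]
        self.relay_log.extend(new)
        self.fetched += len(new)

    def replay(self):
        """SQL step: execute relay-log events; stop if the outcomes differ."""
        while self.running and self.executed < len(self.relay_log):
            op, master_ok = self.relay_log[self.executed]
            if execute(self.db, op) != master_ok:
                self.running = False  # query succeeded on only one site
                break
            self.executed += 1
```

Separating fetch from replay is the point of the relay log: on a slow network the slave can keep copying events even while execution lags behind.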
Globus DRS
• A client creates a request file (requested file name & target location) and sends it to DRS
• The Replicator checks the user’s credential and queries the RLI to find the LRCs that contain mappings for the requested file
• It also queries each remote LRC to get the physical file names, and selects the best one
• It then starts RFT to transfer the files
• Finally, it registers the new replica in its LRC. The LRC then updates the RLI to make the replica visible
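The replication steps above can be sketched as one function. This is an illustration under stated assumptions, not the Globus API: the RLI and the LRCs are modelled as dictionaries, the credential check is omitted, and replicate, cost and transfer are hypothetical names (transfer stands in for the RFT step, cost for whatever criterion picks the best source).

```python
def replicate(logical_name, target_site, rli, lrcs, cost, transfer):
    """Sketch of the replication steps for one requested file.

    rli:      dict logical name -> set of site names whose LRC knows the file
    lrcs:     dict site name -> dict logical name -> list of physical URLs
    cost:     function (physical URL) -> number; lower is better (hypothetical)
    transfer: function (source URL, target site) -> new physical URL
    """
    # 1. Ask the index which local catalogues hold mappings for the file.
    sites = rli.get(logical_name, set())
    # 2. Ask each of those catalogues for the physical file names.
    candidates = [url for site in sorted(sites)
                  for url in lrcs[site].get(logical_name, [])]
    if not candidates:
        raise LookupError(f"no replica found for {logical_name}")
    # 3. Select the best source and transfer the file.
    source = min(candidates, key=cost)
    new_url = transfer(source, target_site)
    # 4. Register the new replica locally, then update the index.
    lrcs.setdefault(target_site, {}).setdefault(logical_name, []).append(new_url)
    rli.setdefault(logical_name, set()).add(target_site)
    return new_url
```

Passing the selection criterion and the transfer step as functions keeps the sketch independent of any particular transfer service.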
EGEE RMS
• Designed for replicating large, read-only files among heterogeneous resources
• Implements file catalogues:
– Replica Location Service maps a replica’s Grid Unique ID to its physical location
– Local Replica Catalogues provide information on replicas for a single VO
– Replica Metadata Catalogue maps a file’s logical name to its Grid Unique ID
• LCG File Catalogue is used to address performance issues
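The two-level catalogue mapping above (logical name to Grid Unique ID, then Grid Unique ID to physical locations) can be sketched with two lookup tables. This is a minimal illustration, not the EGEE interface; ReplicaCatalogues, register and resolve are hypothetical names, and a random UUID stands in for the Grid Unique ID.

```python
import uuid


class ReplicaCatalogues:
    """Two toy lookup tables: logical name -> unique ID -> physical locations."""

    def __init__(self):
        self.metadata = {}   # logical file name -> unique ID
        self.locations = {}  # unique ID -> list of physical locations

    def register(self, logical_name, physical_location):
        """Record a replica, creating a unique ID on first registration."""
        guid = self.metadata.setdefault(logical_name, str(uuid.uuid4()))
        replicas = self.locations.setdefault(guid, [])
        if physical_location not in replicas:
            replicas.append(physical_location)
        return guid

    def resolve(self, logical_name):
        """Return all known physical locations for a logical file name."""
        guid = self.metadata.get(logical_name)
        return list(self.locations.get(guid, []))
```

Because replicas are keyed by the unique ID rather than the logical name, the logical name can be changed in one table without touching the replica locations.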
SRB
• MCAT: a database system storing metadata; enables file searching by attributes
• One or more Master daemon processes with SRB Agents running on them
• Dispatcher: monitors the input port and dispatches requests to a handler
• The dispatcher monitors incoming requests and passes them to the HLRH (High Level Request Handler, can retrieve metadata from MCAT) or the LLRH (Low Level Request Handler, can retrieve data from local/remote storage)
• Supports synch/asynch replication, MCAT replication
[Figure: SRB architecture. Labels: Application, Dispatcher, High Level Request Handler, Low Level Request Handler, MCAT, Remote SRB, DBMS drivers, File system drivers, DB2, Oracle, Illustra, ObjectStore, UNIX, Unitree, HPSS]
Our Goals
• Combining DB2 SQL Replication with OGSA-DAI technologies
• Grid-enabling DB2 Replication to provide a grid service interface for managing replication
• Supporting more scalable, secure, high-performance data access
• Extending OGSA-DAI to provide more powerful capabilities
• Explore metadata technologies
System architecture
[Figure: system architecture. Labels: GridFTP Transfer, Metadata Catalogue, Data Resource, Replication Control Service, Relational Database Replication Mechanism, Data Replica]
Workflows
[Figure: workflows. Labels: Request, Replication Control Service, Metadata Search Engine, Initiator, GridFTP Transfer, Starter, Relational Database Replication Mechanism, Data Resource, Metadata Catalogue, Selector, Metadata Register, Replication Target]
Features
• Keeping the features of relational database replication
• Adding Grid’s features
• Using Grid service discovery mechanism
• Supporting more replication scenarios
Summary
• Introduction of replication
• Introduction of existing technologies
– Relational database replication is advanced in flexibility, offering solutions for frequent updates, update-everywhere, data conflicts…
– Grid file replication is good at scalable, secure, and efficient file transfer
• We studied both models and combined the two structures to gain the benefits of both