Global Peer-to-Peer Replication Utilizing Sybase Replication Server


DM208
Global Peer-to-Peer Data
Replication Utilizing Sybase
Replication Server
Mich Talebzadeh
Consultant
Peridale Systems,
[email protected]
Aimed at an Audience with
Full Familiarity with Sybase ASE
Working knowledge of Sybase's Replication Server Products
Knowledge of UNIX and shell scripting useful.
Topics Covered
Reasons for Peer-to-Peer Replication
Project Issues to be Addressed
Planning for a Replication System
Issues to be Addressed
Peer-to-Peer Replication Implementation
Topics Covered
Tuning Replication Server for Better Performance
Monitoring the Latency and Delivery of Data
Use Replication Server to Measure the Volume of Replication
Traffic on a Daily Basis
Reasons for Peer-to-Peer
Replication
Need to Provide Business Solutions to a Community
Scattered Around the Globe
Limitations of Traditional Techniques Using a Central
Database Repository in a Major Location with Applications
Connected Remotely are:
Limited Application Performance Because of the
Geographical Distance
Degradation in Network Response When the Traffic over the
WAN Gets Heavy. For Example, Remote DSS Versus Local
Transaction Inputs
Reasons for Peer-to-Peer
Replication
The Data Server Becomes a Bottleneck As a Larger
Business Community Contends for Access
Data Becomes Unavailable When There is a Network
Failure.
Impact on Maintenance Tasks When Remote Users Are
Logged In.
Reasons for Peer-to-Peer
Replication
A typical third-party application may not access data in
the most efficient way. For example, if the Application
Makes a Large Number of Discrete Queries to the
Database, Connection Latency between the Application
and Data Server could Cause Start-Up Delays
If the Application were to Crash, the System Becomes
Unusable to Traders and Has an Unacceptable Business
Impact.
Project Issues to Be Addressed
The Package Eligibility for Replication
The Availability of Required Bandwidth
The Technical Knowledge of the Package or Application to
Be Replicated
Local DBA Support for Sybase Replication
Planning For A Replication
System
Start a Pilot Project with One-Way Replication Between Two
Sites (Not to be confused with Sybase Warm Standby)
Establish the volume of data growth and Deltas on a Daily
Basis on the Replicated Database
Establish the Bandwidth between the Two Replicate Sites.
May have to Improve the Bandwidth
Identify External Feeders to the Database
Planning For A Replication
System
Best Option is to Set Up the Pilot on the Local and the Remote
Sites, otherwise use two different Servers Locally linked via a
WAN Simulator
Decide on What to Replicate
Replicate Tables and Turn off Triggers for Replicated
Transactions
Do Not Replicate Tables and Let Triggers Insert the Records
Replicate Stored Procedures
Issues to Be Addressed
If Table Replication
Which Tables you are Replicating
Does the Table Being Replicated Meet the Criteria for
Replication
Does the Table Have a Primary Key
How is the Unique ID on the Table Generated
Determine Rows and Columns to Be Replicated
Issues To Be Addressed
Time Zone Dependency and Handling Time Stamps. Discuss
and Agree with Business on How to Handle These.
Remember Remote Locations Mean Different Time Zones.
Serious Business Implications
Managing Conflicts in Peer-to-Peer Replication
Managing Inserts
Conflicting Updates
Let Us Take a Look at These Site Specific Solutions.
Issues To Be Addressed
Managing Inserts
A typical local table will include inserts from the local
application in addition to inserts delivered via replicated
transactions. In a peer-to-peer set-up these flows are bidirectional.
Designers tend to use unique IDs to uniquely identify
records in a table. Primary keys or unique clustered
indexes are usually built on the unique ID.
Issues To Be Addressed
The unique ID for a given table tends to be a
monotonically incrementing number, a 32-bit integer,
stored in a table, the so-called table_next_row_id and
retrieved from this table.
Inter site conflicts occur when rows are inserted in a local
table and distributed to the remote table. If the remote
table already has a record with the same unique ID, the
replicated insert will be rejected, and the data at the
remote table will be inconsistent with the local table.
Possible solutions are:
Issues To Be Addressed
Add a location key to the tables if not already there.
Include the location key in the primary key for replicated
tables. This is useful if the application is at the design
stage. As replication implementation is normally an
afterthought, this approach may not be possible without a
substantial change to database schema and code. Beware
of Maintenance and Retrofitting.
Localise the table_next_row_id and DO NOT replicate it.
Allocate ranges for next_row_id column for each location.
A 32-bit integer provides the ability to store up to 2 billion
unique values. For the lifetime of a typical application this
range is more than enough to cater for all sites.
Issues To Be Addressed
For example, you can allocate the following ranges:
Location     Reserved range for next_row_id column
London       1 - 100,000,000
Hong Kong    > 100,000,000 and <= 150,000,000
Tokyo        > 150,000,000 and <= 200,000,000
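For illustration only, a minimal Transact-SQL sketch of the localised ID allocation this scheme implies; the table_name column and the procedure name are assumptions rather than part of the presentation:

-- Hypothetical sketch: table_next_row_id is local to each site and NOT replicated
create procedure get_next_row_id
    @table_name varchar(30),
    @next_id    int output
as
begin
    begin transaction
    -- increment and read back the local counter; the update's exclusive lock
    -- serialises concurrent callers at this site
    update table_next_row_id
    set    next_row_id = next_row_id + 1
    where  table_name = @table_name

    select @next_id = next_row_id
    from   table_next_row_id
    where  table_name = @table_name
    commit transaction
end
go

Each site seeds next_row_id at the start of its reserved range (1 for London, 100,000,001 for Hong Kong, 150,000,001 for Tokyo), so replicated inserts can never collide on the unique ID.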
Issues To Be Addressed
Conflicting Updates
The best way to handle conflicting updates from different
sites is to construct the application and the environment so
that conflicts cannot happen. However, in the majority of cases one
needs to rely on business rules to reduce/eliminate the
conflicts. These rules are application specific. For example
for a trading system these could be:
Issues To Be Addressed
On performing simultaneous new trades on the same
holding at different sites, problems will occur with calculated
fields such as quantity and P&L. Furthermore, there are no
signals to warn the users when the problem occurs.
The adopted solution is to recalculate these fields
(nightly) so that the following day, when the portfolio is
loaded, they will have the correct figures.
However, we have not yet encountered such a problem.
Issues To Be Addressed
On simultaneous updates to the same order: this could
happen due to a mistake. The business rule is that whoever
created the order should be the one who updates it.
If this happens the quantity (P&L) data will be out of
sync. Again there will be no warning message to indicate
this and it will be very difficult for IT support to detect it.
The traders/users will inform Application support that
P&L or quantity is wrong. The Application support group
needs to check the historical order and amend it
appropriately. Once this is carried out, the correct details
will be replicated to other sites and the databases will be
in sync again.
Issues To Be Addressed
Make Sure that the IT and Business Managers plus
Application Support are Aware of the Fact that Technology is
Not a Substitute for Business Rules and Ultimately they are
Responsible For Consistency of the Data.
Peer-to-Peer Replication
Implementation
For Simplicity We Illustrate the Peer-to-Peer Implementation
Among Three Sites. This Model Can Be Expanded Further to
Include Additional Sites.
We Have the Following:
Peer-to-Peer Replication
Implementation
Replication Server    Location      RSSD Location
lon_rep_server        London        London BCP Server
hk_rep_server         Hong Kong     HK Server
tyo_rep_server        Tokyo         Tokyo Server
Peer-to-Peer Replication Set Up
[Diagram: each of the three sites (London, Hong Kong, Tokyo) runs its own database, LTM, Replication Server, application server and clients. London also maintains a warm standby database. Direct routes and connections link the London, Hong Kong and Tokyo Replication Servers to one another.]
Peer-to-Peer Replication
Implementation
Basics
Create Replication Servers on all Three sites. Each
Replication Server Will Handle the Local RSSD and the
Local Database to Be Replicated
Choose the ID Server for Replication Where the
Replication and Support Knowledge is Highest
Ideally All Sites Should Have Sybase “sa” and root
Access to all Servers. As a Minimum the ID Server Site
Should Have Access to All These
Peer-to-Peer Replication
Implementation
Create the Diagnostics Run Files for Replication Servers so
the DBA May Observe Each Replicated Transaction
Performed by the Server (Invaluable in Identifying
Problems).
This is Achieved by Replacing the Repserver Binary with the
repserver.diag Binary in the Run File and Adding the
Following Entry to the Replication Server .cfg File
trace=DSI,DSI_CMD_DUMP
Peer-to-Peer Replication
Implementation
Create error class rs_sqlserver_error_class (default sql server
error class) in ID replication server only.
This will Handle Sql Server Errors in Replication Server. The
default error action for all errors returned by Sql Server is to
Stop Replication! You Can Assign Action to the Created Error
Class etc. Very Important!
Error Actions are Stored in Table rs_erroractions. Use
rs_helperror error_no, v to Get Information About the Errors.
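As a hedged sketch only (the error number and chosen action are illustrative, not taken from the presentation):

-- Run in the ID Replication Server only
create error class rs_sqlserver_error_class
go
-- e.g. downgrade ASE duplicate-key errors (2601) from stopping replication
-- to a logged warning; pick actions to suit your own application
assign action warn for rs_sqlserver_error_class to 2601
go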
Peer-to-Peer Replication
Implementation
Turn off trigger settings For Each Replicated Database in the
Repserver Controlling that Database. Use the configure
connection command with the 'dsi_keep_triggers' option set to
'off'. For example in lon_rep_server run the following
command:
configure connection to london_sql_server.db set
dsi_keep_triggers to ‘off’
Create Interfaces Files with All Sql Server and Repserver
Entries for All Sites. Ensure that the Application Servers also
have All the Relevant Information for All Sql Servers
Peer-to-Peer Replication
Implementation
Creating Direct Routes
In our triangle diagram, we need to create routes in order
for our three replication servers to send messages to
destination replication servers.
A route is a one-way message stream that sends requests
from one replication server to another, carrying data
modification commands, replicated functions and stored
procedures. In this design the routes are created as
follows:
Peer-to-Peer Replication
Implementation
Route type   Source            Destination
Direct       lon_rep_server    hk_rep_server
Direct       lon_rep_server    tyo_rep_server
Direct       hk_rep_server     lon_rep_server
Direct       hk_rep_server     tyo_rep_server
Direct       tyo_rep_server    lon_rep_server
Direct       tyo_rep_server    hk_rep_server
Peer-to-Peer Replication
Implementation
For example in lon_rep_server run the following
command to create route to hk_rep_server and
tyo_rep_server respectively:
create route to hk_rep_server
set username hk_rep_server_rsi
set password hk_rep_server_rsi_ps

create route to tyo_rep_server
set username tyo_rep_server_rsi
set password tyo_rep_server_rsi_ps
Peer-to-Peer Replication
Implementation
Where hk_rep_server_rsi is the RSI username already
created by rs_init when you created the hk_rep_server.
hk_rep_server_rsi_ps is the default password for such
user etc. Use rs_helproute in any RSSD to check the
status of the routes created.
Keep all passwords below 30 characters. If you want
to change the password for the RSI user etc., do so when
creating the repserver. It makes life easier later.
Peer-to-Peer Replication
Implementation
Loading the Database to Be Replicated
In order to perform the initial load of the database to be
replicated, you may consider the following steps:
Decide where you are going to load your initial
database. In our case we chose London as the starting
point. Otherwise you may have to merge databases.
Requires a good migration plan
dbcc the database and perform update statistics in
London.
Review all the primary keys for tables to be
replicated.
Peer-to-Peer Replication
Implementation
Turn off all replication flags in the user tables. Use
sp_setreptable table_name, false
Do dbcc settrunc(ltm,ignore) on the database
Dump transaction with truncate_only
Dump database to the dump directory.
Ftp the dump file to the remote servers. May consider
zipping the file etc.
On the BCP site load the database
Peer-to-Peer Replication
Implementation
Load the database from the dump file in remote
locations
Localise the so-called local tables. For example, if you
have table_next_row_id, set the next_row_id column to the
appropriate starting value for the location, as sketched below.
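A minimal sketch of this preparation and load sequence; the database name, dump paths and range value are illustrative:

-- In London, before dumping:
use trading_db
go
sp_setreptable customer, 'false'        -- repeat for each replicated table
go
dbcc settrunc(ltm, ignore)              -- ignore the secondary truncation point
go
dump transaction trading_db with truncate_only
go
dump database trading_db to '/dumps/trading_db.dmp'
go

-- In Hong Kong, after the dump file has been copied across:
load database trading_db from '/dumps/trading_db.dmp'
go
online database trading_db
go
-- localise the local tables, e.g. start next_row_id at the Hong Kong range
update table_next_row_id set next_row_id = 100000001
go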
Adding databases is quite straightforward. For example:
Add the London Production database to the replication
system using lon_rep_server
Add the Hong Kong database to the replication system
using hk_rep_server
Add the Tokyo database to the replication system
using tyo_rep_server
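As a hedged illustration, a database can be added through rs_init, or by hand with RCL along the following lines (server, database and maintenance user names are illustrative):

-- Run in hk_rep_server to add the Hong Kong copy of the database
create connection to hk_sql_server.trading_db
set error class rs_sqlserver_error_class
set function string class rs_sqlserver_function_class
set username trading_db_maint
set password trading_db_maint_ps
with log transfer on
go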
Peer-to-Peer Replication
Implementation
Creating Replication and Subscription Definitions
All the Information you need is in the database to be
replicated
Use the Local Repserver and SQL Server to Create
Replication Definitions for the Database to be Replicated
You will need to extract information from system tables:
sysobjects, syscolumns, systypes and syskeys, provided
that you have defined primary keys for your tables!
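For illustration, a replication definition for one hypothetical London table would look roughly as follows; the scripts described next automate this for every table:

-- Run in lon_rep_server; table and column names are illustrative
create replication definition customer_repdef
with primary at lon_sql_server.trading_db
with all tables named 'customer'
( cust_id    int,
  location   char(3),
  cust_name  varchar(40),
  upd_time   datetime )
primary key (cust_id, location)
go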
Peer-to-Peer Replication
Implementation
I have prepared shell scripts which will automatically
generate replication definitions for all tables. This script
takes the format:
genrepdef.ksh -R <REP_SERVER_NAME> -S
<SQL_SERVER_NAME> -D <DATABASE_NAME>
where the parameters refer to the repserver controlling
the database, the name of the SQL server and the
database name respectively.
In the same way, you can create subscription definitions
using script
gensubdef.ksh -R <REP_SERVER_NAME> -S <SUBSCRIBER_SERVER>
-D <DATABASE_NAME>
Peer-to-Peer Replication
Implementation
Where the parameters refer to the repserver in which
subscriptions are defined, the name of the SQL server
which subscribes to the replication definitions and the
database name respectively.
Use activate.ksh, validate.ksh and checksub.ksh to
complete the subscription process.
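Since the replicate databases are materialised from database dumps, the subscription scripts can follow the bulk-materialization route, roughly as in this illustrative sketch (names are assumptions):

-- Run in hk_rep_server: subscribe the Hong Kong database to a London repdef
define subscription customer_sub
for customer_repdef
with replicate at hk_sql_server.trading_db
go
activate subscription customer_sub
for customer_repdef
with replicate at hk_sql_server.trading_db
go
validate subscription customer_sub
for customer_repdef
with replicate at hk_sql_server.trading_db
go
-- confirm it has reached VALID at both primary and replicate
check subscription customer_sub
for customer_repdef
with replicate at hk_sql_server.trading_db
go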
Peer-to-Peer Replication
Implementation
In summary for replications we have:

Replication Definitions   Repserver Run Against   SQL Server
London Tables             lon_rep_server          London server
HK Tables                 hk_rep_server           HK server
Tokyo Tables              tyo_rep_server          Tokyo server
Peer-to-Peer Replication
Implementation
Likewise for subscriptions we have:

Subscription Definition   Repserver where subscriptions defined   Subscriber Server
London to HK              hk_rep_server                           HK server
London to Tokyo           tyo_rep_server                          Tokyo server
HK to London              lon_rep_server                          London server
HK to Tokyo               tyo_rep_server                          Tokyo server
Tokyo to London           lon_rep_server                          London server
Tokyo to HK               hk_rep_server                           HK server
Peer-to-Peer Replication
Implementation
At the end of subscription definitions you should have
two subscriptions for each replication
definition. In other words, running rs_helpsub for each
table should give you 3x2 subscriptions = 6 lines. This
should be shown in any RSSD database.
Use of Function Strings to Apply Local Timestamps
Replication Server converts functions to commands for
destination data servers, and submits them to these data
servers.
Peer-to-Peer Replication
Implementation
For example, a new row inserted in the source table
causes Replication Server to distribute an rs_insert
function specific to that table to the subscriber databases.
A possible solution for applying local timestamps at
replicate database would be to modify rs_insert for a
given source table to invoke an RPC at the destination
database. The RPC in turn inserts a local timestamp into the
subscribed table as shown below:
Peer-to-Peer Replication
Implementation
alter function string tran_history_db_rd.rs_insert for
rs_sqlserver_function_class output rpc
'execute ins_tran_history_sp
@time_stamp = ?time_stamp!new?,
@action = ?action!new?,
@table_name = ?table_name!new?,
@row_id = ?row_id!new?,
@host_name = ?host_name!new?,
@user_name = ?user_name!new?,
@pid = ?pid!new?'
go
Peer-to-Peer Replication
Implementation
create procedure ins_tran_history_sp
( @time_stamp datetime = null, @action int = null,
  @table_name varchar(24) = null, @row_id int = null,
  @host_name varchar(24) = null, @user_name varchar(24) = null,
  @pid numeric(35) = null)
as
begin transaction
insert into tran_history
( timestamp, action, table_name, row_id, host_name, user_name, pid)
values
( getdate(), @action, @table_name, @row_id,
  @host_name, @user_name, @pid)
commit transaction
go
Tuning Replication Server for
Better Performance
There are some configuration parameters that can be altered
in order to get better performance from the Replication
Servers
init_sqm_write_delay - stable queue manager waits for
at least init_sqm_write_delay milliseconds for a block to
fill before it writes the block to the correct queue on the
stable device (default is 1000). Try decreasing this
parameter.
init_sqm_max_write_delay - a flush to the queue is
guaranteed to happen after waiting for
init_sqm_max_write_delay, if the DSI or RSI thread
reading the queue is unable to connect to the target or has
been suspended (default 10000). Decrease this parameter
Tuning Replication Server for
Better Performance
sqt_max_cache_size - need to increase this value if there
are a lot of open transactions and/or large transactions.
Memory for sqt_max_cache_size is taken from the global
memory pool (default is 131072 bytes)
batch_sz - the larger the batch_sz the less often the
truncation point is updated (default 1000 commands). Try
increasing this value
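A hedged sketch of adjusting these with RCL; the values are illustrative and should be sized from your own workload:

-- Server-level parameters, set in each Replication Server
configure replication server set init_sqm_write_delay to '100'
go
configure replication server set init_sqm_max_write_delay to '1000'
go
configure replication server set sqt_max_cache_size to '2097152'
go
-- these are generally static parameters; a Replication Server restart may be
-- needed before they take effect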
Monitoring the Latency and
Delivery of Data
It is a common practice for DBAs to set up a DBA specific
table in the replicated database and try to check the latency
and health of the replication system by updating data in this
table. In its simplest form one can insert or update records in
this table and see if the data is being replicated to the other
sites. The time taken for data to get to the remote site will
give an indication of latency.
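A minimal sketch of such a DBA heartbeat table; names are illustrative, and a site column keeps peer-to-peer inserts from colliding:

create table dba_heartbeat
( site      char(3)  not null,   -- originating site, e.g. 'LON'
  beat_time datetime not null )  -- last time the row was touched at the source
go
create unique clustered index dba_heartbeat_idx on dba_heartbeat (site)
go

-- a cron job at each site updates only its own row
update dba_heartbeat set beat_time = getdate() where site = 'LON'
go

-- at a remote site, compare the replicated beat_time with the local clock
-- (after adjusting for any clock difference between the servers)
select site, latency_secs = datediff(ss, beat_time, getdate())
from dba_heartbeat
where site != 'HKG'
go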
Monitoring the Latency and
Delivery of Data
Although the above method can be used as a simple
monitoring tool it suffers from the following drawbacks:
It is not an indicator of whether the business
transactions arrive in remote sites in a timely manner
The DBA table is normally used by one application, for
example a cron job via a simple transaction. User
applications create a larger number of transactions and
the data delivery is impacted by the following:
Monitoring the Latency and
Delivery of Data
Concurrency
Size of the user tables
Locking mechanism employed by the application
It should be noted here that the maintenance user trying to
deliver the replicated data could be blocked by local users. If
the statistics on the user tables are not current then the
replicated data may take a longer time to be delivered
resulting in remote users being blocked waiting for locks to
be released.
Monitoring the Latency and
Delivery of Data
It is important to get a more realistic estimate of replication
delivery. A possible solution would be to look at the entries in
an audit or transact history table in the replicate database and
check the delivery timestamp. By comparing the timestamps
for records delivered and adjusting for server clock
differences, the latency can be estimated.
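For example, if the history table also recorded the delivery time at the replicate (a small, hypothetical extension of the ins_tran_history_sp RPC shown earlier, keeping both the source time_stamp and a delivered_at column), latency could be estimated with a query along these lines:

-- Hypothetical: assumes tran_history carries the originating time_stamp and a
-- delivered_at column set with getdate() when the row arrives at the replicate
select table_name,
       avg_latency_secs = avg(datediff(ss, time_stamp, delivered_at)),
       max_latency_secs = max(datediff(ss, time_stamp, delivered_at))
from tran_history
where delivered_at >= dateadd(hh, -1, getdate())   -- last hour only
group by table_name
go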
Use Replication Server to
Measure the Volume of
Replication Traffic
Use the Replication Server command admin who, sqm
Take the column First Seg.Block. This is the first
undeleted segment and block number in the queue. As we
go through the day the Seg.Block value increases. When
Replication Server is shut down these values are reset. Take
readings at different time intervals; the difference in
the values read gives an indication of the data passed to a
replication queue over that time.
Need to convert blocks read to MB
(Queue type: 0 = Received, 1 = Sent)

Queue   Server          Current        Previous       Start of Day   Received/Sent
                        Read(Seg.Blk)  Read(Seg.Blk)  Read(Seg.Blk)  today/MB
105:0   hk_server.db    40196.52       40196.35       40171.60       24.87
105:1   hk_server.db     8487.27        8487.24        8472.60       14.48
116:0   bcp_server.db   34872.64       34872.47       34844.20       28.69
116:1   bcp_server.db    5245.16        5245.13        5236.18        8.97
117:0   lon_server.db    3742.25        3742.25        3733.60        8.45
117:1   lon_server.db   45706.58       45705.62       45660.20       46.59
118:0   ny_server.db    21153.49       21153.32       21124.23       29.41
118:1   ny_server.db     2709.48        2709.45        2702.48        7.00
119:0   tyo_server.db   14081.53       14081.36       14053.30       28.36
119:1   tyo_server.db    2664.40        2664.37        2656.28        8.19
Use Replication Server to
Measure the Volume of
Replication Traffic
Stable queues are composed of segments stored on disk
partitions.
Each segment has a megabyte of message space, divided
into 64 blocks of 16K
Take the reading at start of day (say 11PM)

No_of_16K_Blocks_cur   = (int(Seg.Block_cur) - 1) * 64 + (Seg.Block_cur - int(Seg.Block_cur)) * 100
No_of_16K_Blocks_start = (int(Seg.Block_start) - 1) * 64 + (Seg.Block_start - int(Seg.Block_start)) * 100

Traffic today (MB) = (No_of_16K_Blocks_cur - No_of_16K_Blocks_start) / 64

where int() denotes the integer (segment) part of the Seg.Block reading.
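A small worked sketch in Transact-SQL, using the 105:0 readings from the table above (the batch itself is illustrative, not part of the presentation):

-- Seg.Block readings taken from admin who, sqm at two points in time
declare @cur numeric(12,2), @start numeric(12,2)
declare @blocks_cur int, @blocks_start int
select @cur = 40196.52, @start = 40171.60        -- queue 105:0, hk_server.db

-- each 1 MB segment holds 64 blocks of 16K; the fraction is the block number
select @blocks_cur   = (floor(@cur)   - 1) * 64 + (@cur   - floor(@cur))   * 100
select @blocks_start = (floor(@start) - 1) * 64 + (@start - floor(@start)) * 100

select 'Traffic today (MB)' = (@blocks_cur - @blocks_start) / 64.0   -- about 24.87 MB
go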
Total messages attributed to different servers up to now are:

London Production                 55.05 MB
Hong Kong                         39.36 MB
Tokyo                             36.55 MB
New York                          36.41 MB
London bcp                        37.66 MB

DBA specific traffic (estimate)   44.84 MB
Application traffic              160.17 MB
Sum total all sites              205.02 MB