Ecotools Scalability and Implementation Issues


Ecotools Case Study :
Database Monitoring at
BNP Paribas
Dennis Adams
BNP Paribas
What’s in a name :
Paribas Capital Markets
Now part of BNP Paribas
Investment Banking
€ Euro-Denominated Bonds.
Equities, Bonds and Derivatives Trading
Systems in most major financial centres
Infrastructure in territories and central
London Data Centre.
Finance and Reporting in Paris.
Databases and OS Types in Production
          HP/UX  DecUNIX  Solaris  NT 4  TOTAL
Sybase        8       16        8     5     37
Ingres        2       39        0     0     41
TOTALS       10       55        8     5     78
Tokyo, Singapore, Hong Kong, Paris, London, New York
Our Responsibility: Data
Management Group
Managing DBMS Servers
Based in London, remote Support to
territories (out-of-hours callouts !)
Liaise with London Operations 24h/day
Other Teams
Systems Management - hardware & OS
Networks - LAN & WAN
Application Teams - Application Support
Objectives when
purchasing Ecotools
Monitoring of Sybase DBMS
detail down to SQL statement
Ingres as an additional requirement
Reliable
WAN based
Event Alerting
Links to TNG for Central Alerting
Extract trend data for capacity planning
How we use Ecotools
[Diagram labels: GUI; Command Line; Control Files for Scenarios; Ecotools Repository; Alerts; Agent; Monitored Machines]
How we use Ecotools
Single Central Solaris Console (V6.2.1)
Continuous running 24 X 7
Minimal use of the Ecotools GUI
Scenarios controlled from ECOCLI
start/stop UNIX shell scripts.
Alerts picked up by “tail” of ecotools log
summary on daily in-house web page
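The "tail the log" approach could be sketched roughly as follows. The log path, the ALERT keyword, the position of the machine name in each line, and all function names are assumptions for illustration; the slides do not show the real Ecotools log format.

```shell
#!/bin/sh
# Hypothetical location of the Ecotools log; adjust to the real path.
ECO_LOG="${ECO_LOG:-/var/opt/ecotools/ecotools.log}"

# Keep only lines that look like alerts
# (assumed here to contain the word ALERT).
filter_alerts() {
    grep 'ALERT'
}

# Follow the log and print alerts as they arrive.
watch_alerts() {
    tail -f "$ECO_LOG" | filter_alerts
}

# Daily summary: number of alerts per machine, assuming the machine
# name is the third whitespace-separated field of each log line.
summarise_alerts() {
    filter_alerts |
        awk '{ count[$3]++ } END { for (m in count) print m, count[m] }' |
        sort
}
```

Running `summarise_alerts < "$ECO_LOG"` once a day and writing the output into a web page would give the kind of daily summary described on the slide.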
Configuring Domains
Domain = Logical grouping of Servers
Unit of “discovery” within Ecotools
Monitor “All Servers” together
Configuring Domains
Group by Business Unit / DBMS Type /
Territory ?
Lots of small Domains ?
speed up “discovery”
A few large Domains ?
easier to get to individual machines
COMPROMISE: DOMAINS OF MAX 25
ORGANISED BY APPLICATION TYPE
(ALMOST)
Creating Scenarios
Scenario = Basic Collection Task
One or more agents (cache hit, log size)
One or more machines (“All Servers”)
Time Interval (“10 Minutes”)
Either using the GUI or from .CTL file
CREATE USING GUI, SAVE AS A CTL
TEMPLATE... SED/AWK
START FROM UNIX SHELL
Managing Scenario
Control Files
Lots of Small Control files ?
One UNIX impf process per scenario
a single machine can have 15 “request”s
80 *15 impf processes on console (!!!)
ps -ef | more
A few Large Control files ?
Edit the ctl file with vi when a new machine is added.
Lots of machines per ctl file makes restart
a problem
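The process count on this slide is plain arithmetic: about 80 machines with 15 requests each gives 1200 impf processes on the console. A small sketch for comparing that estimate with what is actually running follows. The function names are made up, and matching on the string "impf" in ps output is an assumption about how the processes are listed.

```shell
#!/bin/sh
# Expected number of impf processes: machines multiplied by requests.
expected_impf() {
    machines=$1
    requests=$2
    echo $((machines * requests))
}

# Count lines of ps-style output on stdin that mention impf.
# The [i] bracket stops grep from matching its own command line.
count_impf() {
    grep -c '[i]mpf'
}

# Usage on the console:
#   expected_impf 80 15
#   ps -ef | count_impf
```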
Managing Control Files
EVERY MACHINE HAS ITS OWN
CONTROL FILES (MAXIMUM OF 3)
Use SED string substitution to create them
from standard Templates
Ecotools Startup logic...
for file in *.ctl
do
 ecocli
 eco: run $file
done
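Generating per-machine control files from templates with sed might look like the sketch below. The @MACHINE@ placeholder, the templates directory, the .tpl suffix, and the output naming are all hypothetical; they are not Ecotools conventions.

```shell
#!/bin/sh
# Substitute a machine name into a control-file template read from stdin.
# @MACHINE@ is a hypothetical placeholder chosen for this sketch.
render_ctl() {
    machine=$1
    sed "s/@MACHINE@/$machine/g"
}

# Write one control file per template for the given machine.
# Templates are assumed to live in ./templates as *.tpl files.
make_ctl_files() {
    machine=$1
    for tpl in templates/*.tpl; do
        [ -f "$tpl" ] || continue
        name=$(basename "$tpl" .tpl)
        render_ctl "$machine" < "$tpl" > "${machine}_${name}.ctl"
    done
}
```

With three templates, `make_ctl_files tokyo1` would produce three .ctl files for that machine, which matches the "maximum of 3" rule above.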
GUI Stability Issues
GUI “hangs” when Alerts arrive.
Determined by ECOCLI_BLINKALARM
ECOCLI_ALARMINTERVAL = 1200 (default)
= 2 * CliLogInterval
The GUI Crashes !
Corrupts master imdb - lose all data
Better in version 6.2.1 (latest patches)
SWITCH OFF ECOCLI_BLINKALARM,
USE UNIX SCRIPT TO TAIL LOG FILE
Managing Ecotools
“Views”
View = Repository for performance data
consists of an indexed / flat-file directory
Need X months data - capacity planning
Can keep views going for several weeks
but “imdb” = 12MB, .datalog = 1.8GB.
CREATE NEW VIEW EVERY MONTH
SET “NO OF DATA POINTS=1500” IN
CONTROL FILES (OR LOSE DATA)
Trend Data for Capacity
Planning
Several months' data required
Start a new GUI for every single view ??
SOLUTION : extract data to CSV files
for reading into database
There is no supported utility to do this.
SHELL SCRIPTS - GREP / SED / AWK
UNSUPPORTED HACK !
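A CSV extraction of this kind could be sketched with awk as below. The input layout (four whitespace-separated fields: machine, agent, timestamp, value) is purely an assumption for illustration, since the slides do not describe the real .datalog format, and the function name is made up.

```shell
#!/bin/sh
# Convert whitespace-separated records on stdin to CSV with a header.
# Assumed input fields: machine agent timestamp value.
# Lines that do not have exactly four fields are skipped.
to_csv() {
    awk 'BEGIN { OFS = ","; print "machine,agent,timestamp,value" }
         NF == 4 { print $1, $2, $3, $4 }'
}

# Usage:
#   to_csv < extracted_records.txt > trend.csv
```

The resulting file can then be bulk-loaded into a database table for capacity-planning queries.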
Unicenter/TNG
Integration
In-house implementation calling KSH
script from Ecotools Scenario Language
Can be implemented as “MgrAction” or
“AgtAction”
MgrAction = run on Solaris Console
GMRPY_CDBSTOP errors - not scalable
AgtAction = run on Agent Machine
NOT YET PROVEN
Summary : Ecotools
Experience so far
Evaluation of version 4.n in 1997
Monitoring of Euro changeover in 1998
Y2K tracking and Reporting 1999
Version 6.1 chosen as Y2K version
Soon replaced by 6.2.1 with patches.
Currently Monitoring nearly 80
Production Systems from London.
Looking at Ecotools V.7
Looks impressive on paper
SQL server storing collected data
Genuine 3-layer architecture
Better User interface
Concerns:
Support for Legacy Operating Systems
Dec UNIX versions
Ability to create own “shell script” agents.
Earlier Versions of Sybase
Ecotools Future at BNP
Paribas
Evaluate V7 for monitoring NT/Sybase
Production Systems.
Need UNIX shell agents for NT console
Objective: move to centralised NT
console - eventually
Keyword: STABILITY
Success Stories
Sybase Settlement System - Tokyo
Reserved log space low
caused by backup server failing and
transaction logs not being properly
dumped. Restarted backup server
Potential system hang averted
Delayed Settlement = We get Fined
Success Stories
Ingres Equity Derivative Trading - New
York
Large Table Approaching Ingres
Architectural Limit of 2GB
Planned outage to remodify table into
multiple locations to allow it to grow
Equity Trading Outage averted
Keep the Traders Trading
Success Stories
Ingres Forex Trading - Paris
Ingres Error Log reported potential
UNIX Filesystem Error one evening
Alerted Paris Unix Team, who decided
to ignore the warning
System Crashed overnight
But at least we could say “I told you so”
Final Proof
Accepted within DMG Group
Integrated with our way of working
We are now a more Pro-Active Team
Looks good on the CV, provided you
call it...

E-COTOOLS
Lunch Time ?