Monitoring and system management in distributed environment

S-38.310 Tietoverkkotekniikan diplomityöseminaari
16.3.2004
Monitoring and System Management in Distributed Environment
(Hajautettujen Tietojärjestelmien Hallinta ja Valvonta)
Mikko Uljas
Supervisor: Jorma Jormakka
Topics
• Research problem and methods
• Framework for distributed applications
management
• Measurement data warehouse
• Scalability and resource usage
calculations
• Management concepts and applications
• Conclusions
Research Problem
• How can monitoring of distributed systems
be arranged in an enterprise-level
architecture?
• What can management concepts gain
from a properly implemented monitoring
system?
Research Methods
• Literature research focusing on
– system monitoring and management
– data warehousing
– management concepts
• Theoretical calculations
• Case study
Bauer’s Framework for Distributed
Applications Management
• Three-tier architecture
– management applications
– management services
– managed nodes
• Management services play a central role
– repository subsystem
– configuration subsystem
– combined control and monitoring subsystem
• Adopted in this thesis as the common architecture
for distributed systems management
Bauer’s Framework for Distributed
Applications Management
[Figure: the three-tier architecture. Management applications: modelling, configuration management, performance management, service level management, fault management. Management service interface. Management services: configuration subsystem, combined monitoring and control subsystem (control subsystem, monitoring subsystem), repository subsystem. Managed nodes (three shown), each with a management agent. Legend: information flow.]
Measurement Data Warehouse
• A data warehouse solution for storing
monitoring data
• Four main components
– Collector
– Data integration component
– Data warehouse
– Data querying and reporting component
• Monitoring information data model
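The four components listed above can be sketched as a small pipeline. This is only an illustrative sketch, not the design described in the thesis: the `Measurement` record, the function and class names, and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical monitoring record; the field names are assumptions."""
    node: str
    metric: str
    timestamp: int  # seconds since some reference time
    value: float


def collect(raw_rows):
    """Collector: turn raw (node, metric, timestamp, value) tuples into records."""
    return [Measurement(*row) for row in raw_rows]


def integrate(measurements):
    """Data integration: drop exact duplicates and order the records by time."""
    return sorted(set(measurements), key=lambda m: (m.timestamp, m.node, m.metric))


class Warehouse:
    """Data warehouse plus a minimal querying and reporting function."""

    def __init__(self):
        self.rows = []

    def load(self, measurements):
        self.rows.extend(measurements)

    def average(self, metric):
        """Report the mean value of one metric per managed node."""
        totals = defaultdict(lambda: [0.0, 0])
        for m in self.rows:
            if m.metric == metric:
                totals[m.node][0] += m.value
                totals[m.node][1] += 1
        return {node: total / count for node, (total, count) in totals.items()}


# Made-up sample data; the second and third rows are duplicates on purpose.
raw = [
    ("node-a", "cpu_load", 0, 0.5),
    ("node-a", "cpu_load", 60, 0.7),
    ("node-a", "cpu_load", 60, 0.7),
    ("node-b", "cpu_load", 0, 0.2),
]
warehouse = Warehouse()
warehouse.load(integrate(collect(raw)))
report = warehouse.average("cpu_load")
```

Keeping collection, integration, storage and reporting in separate steps mirrors the component split on the slide; a real warehouse would of course use a database rather than an in-memory list.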
Measurement Data Warehouse
[Figure: the Measurement Data Warehouse placed in the framework. Management services: combined monitoring and control subsystem (control subsystem, monitoring subsystem), configuration subsystem, repository subsystem with the Management Information Repository (MIR), and the Measurement Data Warehouse (data querying & reporting component, data warehouse, data integration component). Managed nodes (three shown), with a collector, a measurement data source and a management agent shown for each.]
Scalability and Resource Usage
Calculations
• The aim is to show that the framework is
suitable for an enterprise-level architecture
• Three potential bottleneck points identified
– amount and accumulation of monitoring
information
– disk space requirements
– network traffic
• Other bottlenecks may exist; further studies are needed
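A back-of-the-envelope calculation for the three bottleneck points could look like the sketch below. All input figures (100 nodes, 50 metrics per node, one sample per minute, 100 bytes per sample, 30 days of retention) are hypothetical placeholders, not values from the thesis; only the 100 Mbit/s link speed echoes a figure mentioned elsewhere in the slides.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_bytes(nodes, metrics_per_node, sample_interval_s, bytes_per_sample):
    """Monitoring data accumulated per day across all managed nodes."""
    samples_per_day = SECONDS_PER_DAY // sample_interval_s
    return nodes * metrics_per_node * samples_per_day * bytes_per_sample


def disk_space_bytes(daily_bytes, retention_days):
    """Disk space needed to keep detailed data for the retention period."""
    return daily_bytes * retention_days


def transfer_time_s(num_bytes, link_bits_per_s, usable_fraction=1.0):
    """Time to move the data over a link, ignoring protocol overhead."""
    return num_bytes * 8 / (link_bits_per_s * usable_fraction)


# Hypothetical workload, chosen only to make the arithmetic easy to follow.
daily = daily_volume_bytes(nodes=100, metrics_per_node=50,
                           sample_interval_s=60, bytes_per_sample=100)
disk = disk_space_bytes(daily, retention_days=30)
transfer = transfer_time_s(daily, link_bits_per_s=100_000_000)

print(f"daily volume: {daily / 1e6:.0f} MB")
print(f"30-day disk space: {disk / 1e9:.1f} GB")
print(f"daily transfer at 100 Mbit/s: {transfer:.1f} s")
```

With these assumed inputs the daily volume is 720 MB, 30 days of detailed data take 21.6 GB, and one day of data needs about 57.6 s on a fully available 100 Mbit/s link.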
Scalability and Resource Usage
Calculations
Focus and points of observation:
• Managed node
– Consider carefully how long detailed historical
information needs to be kept in the local
measurement data source.
• Network
– The Measurement Data Warehouse should be
located at a site where the network transmission
capacity is at least 100 Mbit/s Ethernet.
– Spread data transfers from different managed
nodes over a longer time period.
• Centralized data warehouse
– The amount of data transferred to the data
warehouse has to be chosen carefully.
– Use pre-summarized data when possible.
– Consider carefully how long detailed data
needs to be kept in the warehouse.
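As an illustration of what pre-summarized data could mean in practice, the sketch below reduces raw samples to one summary row per hour before anything is sent to the warehouse. The hourly granularity, the choice of minimum, mean, maximum and count, and the function name are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean


def summarize_hourly(samples):
    """Reduce (timestamp_s, value) samples to one (min, mean, max, count) row per hour."""
    buckets = defaultdict(list)
    for timestamp_s, value in samples:
        hour_start = timestamp_s - timestamp_s % 3600
        buckets[hour_start].append(value)
    return {
        hour: (min(values), mean(values), max(values), len(values))
        for hour, values in sorted(buckets.items())
    }


# Made-up samples covering two hours.
samples = [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 50.0), (7199, 40.0)]
summary = summarize_hourly(samples)
```

Here five raw samples collapse into two rows; with one sample per minute the same step would cut the transferred row count by a factor of 60, at the cost of losing the detailed values in the warehouse.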
Management Concepts
• ITIL (IT Infrastructure Library) chosen as a
conceptual base
• Three example concepts
– Service level management
– Capacity management
– Incident management
• For each concept, a set of example metrics
• Motivation: to show what these concepts can gain from
a properly implemented monitoring system
Management Concepts
Gains of a properly implemented monitoring system, by management concept:
• Service level management
– When SLAs are first defined, historical
performance data and trends are needed.
– An early warning system that tells when SLA
goals are about to be breached or have
already been breached.
– Historical performance data of SLA metrics.
• Capacity management
– Historical performance data of the systems' key
parameters.
• Incident management
– Real-time system status monitoring can
provide a proactive way to deal with
incidents.
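The early warning idea for service level management can be sketched as a simple threshold check. This assumes a "lower is better" metric such as response time and a warning band starting at 90 % of the SLA goal; both the band and the function name are hypothetical choices for illustration.

```python
def sla_status(value, goal, warning_fraction=0.9):
    """Classify a 'lower is better' SLA metric against its goal.

    Returns "breached" above the goal, "warning" inside the band just
    below the goal, and "ok" otherwise.
    """
    if value > goal:
        return "breached"
    if value >= goal * warning_fraction:
        return "warning"
    return "ok"


# Hypothetical SLA goal: response time of at most 2.0 seconds.
statuses = [sla_status(v, goal=2.0) for v in (1.0, 1.9, 2.5)]
```

A monitoring system that evaluates such a check on every new measurement can raise the "about to be breached" signal before the goal is actually missed.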
Conclusions
• A three-tier architecture is proposed for an
enterprise-level monitoring system
• Measurement data warehouse solution
separates
– real-time updates made by management agents
– complex data analysis performed by management
applications
• No obvious bottleneck point found
• Management concepts gain greatly from a
properly implemented monitoring system
Further Studies
• Because the subject is broad, there has
been little room for detail
• Case studies and working solutions
– monitoring system fine tuning
– data warehouse solution concentrating on
monitoring information
– management applications using monitoring
information
The End
Questions?