From Consolidated Operations to Service
Management with the Netcool Suite
General Session
Doug McClure
Sr. Manager, Service and Technology
Monitoring, EarthLink
October 14, 2004
Copyright © 2004 Micromuse Inc. All rights reserved.
Agenda
> EarthLink Overview
> Innovation, Technology, and Change
> And the need for open, flexible, adaptable monitoring solutions
> IT Operations and Business Maturity
> Challenges facing EarthLink and Roadmap to Improvement
> EarthLink Service and Technology Monitoring
> Improving Service, Customer, and Business Performance and
Availability
> Enabling ITIL Best Practices with the Micromuse Suite
> Service Management Database, Change Management Dashboard
> Linking IT Operations with the Business
> Business Process/Activity Monitoring and Dashboards
> Continuous Improvement
EarthLink Overview
> One of the Nation’s Largest ISPs
> Headquarters in Atlanta, GA
> Key facilities in Dallas, Pasadena, San Jose, Knoxville, and Seattle
> Profitable, strong balance sheet
> Largest DSL footprint
> First-to-market with products that provide the best possible Internet experience
> Customer Advocacy: Fighting SPAM, Abuse, and Fraud (Phishers)
> Technical solutions
> Litigation
> Legislative support
> Industry collaboration
> Consumer education
> 10th Anniversary (1994-2004)
> http://www.redefineyourworld.com
EarthLink Overview
5.25M Customers
> ~4M Dialup (Premium ~3.5M, Value ~500K)
> ~1.2M Broadband (Cable, xDSL)
> ~160K Web Hosting (Unix, Windows)
> ~50K Wireless (Blackberry, PDA, Laptops, Wi-Fi)
> Dial Access Coverage > 90% of US Population
> ~16K Local Dial Access Numbers
> ~500K Active Modem Ports (~50% ELNK, ~50% Outsourced)
> ~250 PoPs (18 Core Backbone PoPs, four data centers)
> Broadband Coverage
> ~200 Markets with Broadband Offerings
Large and Diverse Infrastructure
> ~2300 Network Elements
> ~1600 Server Elements
> Thousands of Access Circuits, Hundreds of WAN Circuits
EarthLink Overview
Access Technology Innovation
> Premium and Value Dial-up
> Broadband (Cable, xDSL, Satellite)
> Voice (Converged Devices, VoIP, SIP)
> Wireless (WiFi, CDMA, Blackberry, PDA)
> Broadband over Power Lines (BPL)
> IP Services (Triple Play)
Value Added Service and Product Innovation
> Blocker Family: spamBlocker, POP-UP Blocker, ScamBlocker, Virus Blocker,
Spyware Blocker (www.blockoftheday.com)
> Parental Controls
> Webmail, Web Accelerator
EarthLink Overview
Exceptional Customer Service
> 2004 J.D. Power and Associates Customer Satisfaction Award for High-Speed and Dial-Up Internet Service
> 2003 PC Magazine Readers' Choice Awards for both high-speed and dial-up services
> 2003 highest ranking in customer satisfaction for the second year in a
row for high-speed Internet service by J.D. Power and Associates in its
Internet Service Provider Residential Customer Satisfaction Study(SM)
> 2003 CNET Editors' Choice award
Innovation, Technology, and Change
“A company can't outgrow its competitors unless it can out-innovate them.”
Source: Gary Hamel and Gary Getz, in ‘Funding Growth in an Age of Austerity’
Innovation = Constant Change
Drivers
> Customer Retention – Decrease Churn
> Speed to Market, Competition – Do more, faster
> Quality, Performance, Support Costs
> Compliance - Sarbanes-Oxley, Visa CISP
Operational Challenges
> Release Management
> Change Management
> Service Level Management
> Enterprise Security
Leading Edge Technology = Constant Change
Drivers
> Voice – SIP
> Broadband
> Wireless (WiFi, Regulated, Unregulated)
> Content, Rich Internet Applications
> End-to-End Services
> Custom Applications
Operational Challenges
> Fault, Performance, Availability, Utilization Monitoring
> Vendor Lag in Support
> Lack of a Standard Fault, Performance, Availability, Utilization API
IT Operations and Business Maturity
“It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change.”
Source: Charles Darwin
Operations Maturity: Growing Up, Focused
on Four Areas
Service Level Management
> All Tier 1, 2, 3 Support Groups in Operations
> Set and manage expectations internal/external to Operations related to
responsiveness and resolution of production issues
Change Management
> Provide oversight and control of the production environment
> Minimize risk and impact from change activities
Release Management
> Development → Operations
> Minimize poor quality production releases
Enterprise Security
> Compliance, control, audit
Operations Maturity: Common Language and
Best Practices
Production Improvement Program (PIP)
> Foundation in IT Service Management, ITIL, COBIT
> Focusing on four main areas: Service Level Mgmt, Change Mgmt,
Release Mgmt, and Production Security
> Over the past four months, 20% of Operations staff have now attended ITIL
Training
> 1 Master Level Certified (two more pending results)
> 12 Practitioner Level Trained in CCR Quadrant
> 8 Change Management Practitioner Certified (more pending results)
> 4 Configuration Management Practitioner Certified
> Over 130 Foundation Level Trained and Certified
Production Improvement Program
[Diagram: Request for Change (RFC) lifecycle across three tracks: Change Mgt, Release Mgt, and Prod Sec.
RFC sources: Corp Project, Ops Project, Non-Project.
Change Mgt: Status Change (1) Prioritization, Risk Assessment and Forward Schedule of Change; Status Change (2) Change Approval and Proj. Service Availability; Status Change (3) Final Change Approval and Implementation; Status Change (4) Review Changes; Closed RFC.
Release Mgt: Release Planning; Dev / Release; Procurement, Design, Build; Release Acceptance; Roll-out Planning; Comm, Prep, Training; Distribution/Installation.
Prod Sec: Policy, Procedures, Standards & Guidelines; Security Consulting; Security Assessment; Security Test & Sign off; Security Monitoring.
Also shown: Metrics & Reporting.]
Mutual Benefit from EarthLink’s Innovation and Advanced Use of Micromuse Products: Micromuse OMNIbus, Impact, Webtop, RAD
Source: EarthLink SLM Group
EarthLink Service and Technology Monitoring
“Creativity involves breaking out of established patterns in order to look at things in a different way.”
Source: Edward de Bono
EarthLink and Micromuse Facts
Very Early Netcool Adopter
> EarthLink (Mindspring) was Micromuse’s first US customer
> Began evaluating Micromuse Netcool in 1996, official customer April 1997
Early Innovation
> Early joint innovation and development helped build foundation for many of
Micromuse’s key products
Driving 3rd Party Vendor Integration & Partnerships
> Much more than just “sending SNMP TRAPs”: EarthLink requires in-depth
integration with the Micromuse suite
Current Deployment
> Netcool OMNIbus, Internet Service Monitors, SM Reporter, Desktop Clients,
Webtop, Impact, numerous Gateways, Probes, Data Source Adaptors
> Preparing for OMNIbus v7 migration, RAD 2.0
> Plan to evaluate Precision
Moving Beyond “MoM” and Apple Pie
EarthLink’s Early Micromuse Netcool Deployment
> Focused on Netcool as the “Manager of Managers” or “MoM”
> Needed during EarthLink’s rapid growth and expansion
> Enabled event management and eliminated the “swivel chair NOC”
“Apple Pie” is Event Correlation and Deduplication
> The Netcool sweet spot was providing EarthLink with event correlation and
deduplication
> Enables Tier 1 and Tier 2 break/fix support groups to operate efficiently
Focus now on End-to-End Service Management
> Netcool Suite allows EarthLink to manage entire service
> We can understand service relationships, service levels, and service impact;
perform service modeling and service discovery
> Enables impact assessment, prioritization, understanding full service delivery
chain
> Eliminate “needle in the haystack” approach of event management
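The deduplication idea on this slide can be sketched in a few lines. This is a minimal illustration, not Netcool's implementation: the `Event` fields and the `(node, summary)` key are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event record; field names are illustrative only.
    node: str          # element that raised the event
    summary: str       # human-readable description
    severity: int      # higher means more severe
    timestamp: float   # time of the latest occurrence
    count: int = 1     # how many raw events were collapsed into this row
    first_seen: float = 0.0


def deduplicate(events):
    """Collapse repeated events into one row per (node, summary) key."""
    table = {}
    for ev in events:
        key = (ev.node, ev.summary)
        row = table.get(key)
        if row is None:
            # First occurrence: create the row and remember when it started.
            table[key] = Event(ev.node, ev.summary, ev.severity,
                               ev.timestamp, count=1, first_seen=ev.timestamp)
        else:
            # Repeat: bump the counter, keep the worst severity and latest time.
            row.count += 1
            row.severity = max(row.severity, ev.severity)
            row.timestamp = max(row.timestamp, ev.timestamp)
    return list(table.values())
```

Feeding three raw events, two of them repeats from the same node, yields two rows, so an operator sees one line per distinct problem with a repeat count instead of a scrolling stream.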
The Service IS Important
End-to-End Service Management and Monitoring
> End-to-End service monitoring is my team’s #1 goal!
> Provided that all layers (L1-L7) of the infrastructure are thoroughly
instrumented, real-time monitoring of the true end-to-end service is possible
> Service discovery, topology, dependency mapping, and change control ARE
REQUIRED for highly accurate service monitoring
> “Intimate Service and Infrastructure Knowledge” can be instrumented
> Developers and support staff have deep understanding of how our services
operate and their unique operational characteristics and dependencies
> This knowledge can be programmatically instrumented and monitored, correlated,
analyzed, and presented in real-time
> Immediate notification to support groups when service infrastructure
capabilities or performance degrades
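As a rough illustration of why dependency mapping matters for service monitoring, the sketch below walks a dependency map to find every service affected by a failed component. The service names and the `DEPENDS_ON` map are hypothetical and do not describe EarthLink's actual topology.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on.
DEPENDS_ON = {
    "webmail": ["imap", "smtp", "auth"],
    "imap": ["mail-storage"],
    "smtp": ["mail-storage"],
    "auth": [],
    "mail-storage": [],
}


def impacted_services(failed, depends_on):
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a component up to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    impacted = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return sorted(impacted)
```

With this map, a storage failure reaches `imap`, `smtp`, and `webmail`, while an `auth` failure reaches only `webmail`. The result is only as accurate as the map, which is why discovery and change control are listed as requirements.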
Service Management Complexity
[Diagram: layered view of a mail service, asking "Good Customer Experience?" and "Performance?". Clients (any web browser, client applications, mail client, Palm client) connect over HTML to the Presentation Layer, which sits above the Application Services Layer, the Core Services Layer (SMTP, IMAP, POP3), and the Infrastructure Layer, connected through APIs 1-7. Service components are labeled S80-S112. Other labels: Tickets, Storage, To Other Systems, Infrastructure Events to Netcool.]
Source: EarthLink Product Group
Service Management Complexity
•Event information increases exponentially with the number of components, time (growth), and infrastructure changes
•Over 1500 Servers, 2300 Network Elements, and 20K Interfaces/Circuits
•Netcool/ObjectServer is a must-have for effectively managing the service event stream from end to end
•Impact 3.0’s cluster capability will greatly improve our ability to analyze, enrich, suppress, and manage the event stream regardless of our growth
[Chart: Infrastructure Events plotted against Number of Components, Time (24x7x365), and Infrastructure Changes]
Source: EarthLink Product Group
The Customer IS Important
Customer Experience Management and Monitoring
> The Micromuse Netcool Suite enables consolidation and understanding of proactive,
real-time monitoring of the customer’s experience for core EarthLink services
> Proactive, real-time monitoring of the customer’s experience
> Traditional Infrastructure Monitoring (SNMP, System Agents, Service Port Monitoring)
> Synthetic transaction monitoring
> Customer agent-based monitoring
> Agentless application, transaction, and customer performance monitoring (Emerging)
> Becomes the “glue” that ties infrastructure monitoring together
> Powerful information when customer experience and infrastructure monitoring data is
correlated, analyzed, and presented in real-time
> Immediate notification to support groups when customer’s experience degrades
The Business IS Important
Business Activity Monitoring and Management
> Expands IT Operations visibility vertically and horizontally
> Ties IT Operations data and Business data together
> System Downtime vs. Contact Center Call Volume
> Real-Time Customer Subscriptions vs. Sales Forecasts
> Almost any process can be instrumented and monitored in real-time,
have policies applied to it, and be presented in a dashboard or portal
> Enables Real Time Monitoring and Management of Business and
IT processes
> Change and Downtime Management
> Customer Registration Management
Enabling ITIL Best Practices with the Micromuse Suite
“If you always do what you've always done, you'll always get what you always got.”
Source: From a speech, unattributed
Enabling ITIL Best Practices
Incident and Problem Management
> IM: Low level event classification, service dependencies, full integration with
Remedy, Service Management DB (SMDB)
> PM: Long-term historical event database for trend research, Service Management
DB (SMDB)
Change and Release Management
> CM: Change Management System (CMS/RFC), Service Management DB (SMDB),
service dependencies, impact on infrastructure from changes or downtimes
> RM: Monitoring can greatly help in the development, test, and staging
environments PRIOR to release to production
Performance and Availability Management
> PM/AM: Continuous low-level element and system level testing and data
collection, trending, reporting, and alerting
Capacity Management
> CM: Continuous low-level element and system data collection, trending,
reporting, and alerting
Service Management Database: ITIL/PIP & Service
Management
SMDB
•Information about end-to-end service, service dependencies, relationships, topology,
elements, production status, etc.
•Self-serve customer interfaces into the service management and monitoring process
•Auto-provision monitoring on all applications to reduce administrative overhead
•Not a low-level configuration management database (CMDB), but could be the virtual
high-level CMDB
SMDB Modules
•Change Management System (CMS) / Downtime Request (DTR)
•All RFCs/DTRs managed from within the SMDB complex, full lifecycle management, full risk and
approval matrices, service management policies, interested parties
•Impact of changes/downtimes immediately known within infrastructure through Impact 3.0
integration, policy creation, and event management
•Element Management (network, server, application), ISM Creation, Agent Configuration, etc.
Service Management Policies
•Information about customer and business defined service management policies,
SLA/OLAs, etc.
Service Management Database: ITIL/PIP & Service
Management
Source: EarthLink Service and Technology Monitoring
Business Process Monitoring – ITIL Change Management
“What gets measured, gets done!”
Source: Tom Peters
Overview – Controlling Change and Benefits
Drivers
> Adoption of ITIL/COBIT Best Practices for Change Management
> Significant change for many groups – Fear, Uncertainty, Doubt (FUD)
> No Real-Time Visibility into Change/Downtime Management Activities
> Business Process
> Who, What, When, Where, Why, and How, Cost, Risk, and Impact
> Workflow – Monitor Lifecycle, SLAs, Bottlenecks – Is the process enabling
Operations or is it a bottleneck?
> Impact on Infrastructure – False Positives, Contact Center Call Volume
(COGS)
> Drive out False Positives from Production Monitoring Systems
> Huge burden on NOC and other support staff
> Desire to have Automated Remedy Trouble Ticket Creation
> Reduce time to address problems and reduce MTTR
Enabling Change Management with Netcool Suite
Solution
> Provide Real-Time Visibility into Change/Downtime Process
> Create Actionable Information
> Ensure Business Rules are Guiding/Enabling the Process – Not
Hindering It
> Eliminate FUD
> Report (dashboards, reports) on Process and Impact
> NOC and other support groups know what’s happening during
change and downtime windows
> Management has oversight and visibility
> Business understands impact of change and downtime activity
Business Activity Monitoring
RAD 2.0 Presentation
Netcool Event Management
Change/Downtime Request Events
Change / Downtime ID
Change / Downtime Status
Suppressed Change/Downtime Activity Events
Event Suppressed by Change / Downtime
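The labels above describe events carrying a change/downtime ID and events suppressed by an active change or downtime window. A minimal sketch of that matching logic follows; the `ChangeWindow` type, the dictionary keys, and the `CHG-1` identifier are hypothetical, and the real integration ran through Netcool/Impact policies that are not shown in the slides.

```python
from dataclasses import dataclass


@dataclass
class ChangeWindow:
    # Hypothetical approved change/downtime request.
    change_id: str
    node: str
    start: float
    end: float


def tag_events(events, windows):
    """Flag events that fall inside an approved window for the same node."""
    tagged = []
    for ev in events:
        match = next(
            (w for w in windows
             if w.node == ev["node"] and w.start <= ev["time"] <= w.end),
            None,
        )
        # Keep the event, but record whether and why it is suppressed.
        tagged.append({
            **ev,
            "suppressed": match is not None,
            "change_id": match.change_id if match else None,
        })
    return tagged
```

Events are tagged instead of dropped, so the NOC can still see what happened during the window and trace each suppressed event back to its change request.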
Future Enhancements
Planned Netcool/Impact Policies
> Impact on EarthLink
> COGS: Assess support cost impact due to change and downtime
activities within Operations and Customer Support in Real-Time
> Tier 1, 2, 3 Support Cycles
> Better Change and Release Management Planning
> Data Gap Management
> A common question: Why does my chart or graph have gaps?
> The solution: Annotate graphs, charts, portals, etc. with the reason
for data gaps caused by planned change/downtime activities
> How: Integrate change and downtime event information with all
performance, utilization, and capacity monitoring solutions via
Impact 3.0
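The data gap idea can be sketched as two small steps: find gaps in a sample series, then label each gap with any overlapping planned window. The function names, the dictionary keys, and the `DTR-42` reason string are assumptions made for this example, not part of any Micromuse product.

```python
def find_gaps(timestamps, interval):
    """Return (start, end) pairs where consecutive samples are too far apart."""
    return [
        (a, b)
        for a, b in zip(timestamps, timestamps[1:])
        if b - a > interval
    ]


def annotate_gaps(gaps, windows):
    """Attach the reason of the first overlapping planned window to each gap."""
    notes = []
    for start, end in gaps:
        reason = next(
            (w["reason"] for w in windows
             if w["start"] < end and w["end"] > start),
            "unexplained",
        )
        notes.append((start, end, reason))
    return notes
```

A charting layer could then render the reason text over the gap, so a reader asking "why does my graph have gaps?" gets the answer on the chart itself.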
RAD 2.0 Joint Development
Business Activity Monitoring:
Real-Time Customer Registration Dashboard
Continuous Improvement
“We have a ‘strategic plan’. It’s called doing things.”
Source: Herb Kelleher
Continuous Improvement
> Making Applications “Monitoring Aware and Netcool Ready”
> Work with developers on getting a monitoring API embedded into
applications
> Every application and tier linked into Netcool directly (not through server
agent)
> Discovery, Topology, Dependency Modeling
> Monitoring accuracy and root cause depend on this!
> Need solution for Layer 1-7, likely two solutions (L1-3 & L4-7)
> Application, Transaction and Customer Performance Monitoring
> Synthetic transactions only get us so far…but will continue to evolve
> Don’t forget about client-server – everything isn’t web enabled!
> Agentless technologies are emerging to accurately map out application and
transaction flows, relationships, and topology
> Next (2nd/3rd) Generation Quality, Performance, Capacity, Utilization solution
needed
> Services, Applications, Servers, Storage, Network
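One way to picture a "monitoring aware" application is a small hook that developers embed so the application reports its own events directly. The `Monitor` class, its method names, and the `sink` callable below are all hypothetical; the slides do not describe the actual API, and a real sink would forward events to the monitoring system by whatever integration is in place.

```python
import time
from contextlib import contextmanager


class Monitor:
    """Hypothetical in-application monitoring hook."""

    def __init__(self, app, tier, sink):
        self.app = app
        self.tier = tier
        self.sink = sink  # callable that forwards one event dict

    def emit(self, summary, severity=1):
        # Every event carries the application and tier that produced it.
        self.sink({
            "app": self.app,
            "tier": self.tier,
            "summary": summary,
            "severity": severity,
            "time": time.time(),
        })

    @contextmanager
    def watch(self, operation):
        """Report success with elapsed time, or failure with the error text."""
        start = time.monotonic()
        try:
            yield
        except Exception as exc:
            self.emit(f"{operation} failed: {exc}", severity=5)
            raise
        else:
            elapsed = time.monotonic() - start
            self.emit(f"{operation} ok in {elapsed:.3f}s", severity=1)
```

Wrapping an operation in `with monitor.watch("login"):` is enough to report both failures and timings from inside the application, without relying on a server agent to infer them.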
Continuous Improvement
Building better Network and Systems Management
> Founded Atlanta Network and Systems Management Technical User
Group (ANSMTUG) in January 2004
> http://www.ansmtug.org
> Metro-Atlanta Fortune 100, Service Providers, Enterprise, Media, and
Emerging Technology Companies
> BellSouth, The Home Depot, EarthLink, Southern Company, N2
Broadband, eDeltacom, Delta, CNN, Cingular, E*Trade, Knology
Broadband, Cox Communications
> Customers helping Customers
> Use Micromuse and other NSM products better
> Collectively drive product requirements and features into Micromuse
and other NSM vendors
Closing and Questions and Answers
> EarthLink is a happy Micromuse customer
> EarthLink depends on the Netcool suite’s openness,
flexibility and adaptability to keep up with innovation,
technology, and constant change
> EarthLink will continue to push the Netcool suite
beyond the sales and marketing slick
> EarthLink’s infrastructure, service, customer, and
business performance and availability continue to
improve because of our advanced use of the Netcool
suite
> Q&A