Service Management @ Colruyt

slide 1
Frank Waegeman
Team Manager
Service Management
slide 2
Assignment of Service Management
Service Management BP&S has the overall responsibility, together with all stakeholders, to ensure that the operations and support of the operational BP&S products and services meet, and continue to meet, the agreed service levels.
We keep SH.. out of ..IT
slide 3
Role of Service Management in the Service Life Cycle
- Solution Delivery: solutions deliver the new functional and non-functional requirements → fix the service levels
- Service Management (managed service): ensures that we keep the agreed SLEs
slide 4
Guidelines for Service Management
Standard used: ITIL (“Information Technology Infrastructure Library”)
= Goal
= Series of best practices (guidance) to set up the necessary operational processes for an (ICT) organisation
→ Service Management ensures that these processes can be incorporated within BP&S
slide 5
Reference model
The processes…
[Diagram: process reference model. Groups shown: Business IT Alignment, Operation bridge, Service Design & Management, Service development & deployment. Processes shown: Business Assessment, Customer Management, IT Strategy development, Incident, Event, Request Fulfilment, Configuration, Problem, Change, Service Planning, SLA Management, Availability, Cost, Continuity, Capacity, Build & test, Release to Production]
slide 6
The processes… (reference model diagram repeated)
slide 7
Why Service Management?
slide 8
For the Business: PRODUCTION IS CRUCIAL
slide 9
Operational ITIL
PRODUCTION
We make every effort to keep a stable production environment, today and tomorrow.
To achieve this we need to set up different processes.
slide 10
Operational ITIL
PRODUCTION
CHANGE
You can only have a stable production environment if you have control over the operational changes.
slide 11
Operational ITIL
[Diagram: PRODUCTION, CHANGE, ITChange, Change Calendar, ITCONFIG]
Having control over the operational changes means:
- planning and communicating each change
- knowing ALL the changes
- knowing the correct impact of a change
slide 12
Operational ITIL
[Diagram: PRODUCTION, CHANGE, ITChange, Change Calendar, ITCONFIG, ITASSET]
Asset management is mandatory for asset validation:
CHANGE_ASSET = INCIDENT_ASSET = EVENT_ASSET
slide 13
Operational ITIL
[Diagram: PRODUCTION, CHANGE, ITChange, Change Calendar, Unavailability, ITSERVICES, SLA & SLE, ITCONFIG, Change Window, ITASSET]
Having control over the impact means:
- knowing what an end user needs (inventory of assets)
- knowing the change window of an impacted asset
- communicating the changes for each ITService
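The change-window check in the list above could be sketched as follows. This is a minimal illustration, assuming each asset has one weekly change window given as a weekday plus start and end hour; the asset names, window values and the `fits_change_window` helper are hypothetical and do not describe the actual Colruyt tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeWindow:
    """Weekly slot in which changes to an asset are allowed (hypothetical model)."""
    weekday: int      # 0 = Monday ... 6 = Sunday
    start_hour: int   # window opens at this hour
    end_hour: int     # window closes at this hour


# Hypothetical inventory: asset name -> agreed change window.
CHANGE_WINDOWS = {
    "APP_SERVER_A": ChangeWindow(weekday=6, start_hour=2, end_hour=6),
    "DATABASE_B": ChangeWindow(weekday=6, start_hour=4, end_hour=8),
}


def fits_change_window(start: datetime, end: datetime, assets: list[str]) -> bool:
    """Return True only if the change lies inside the window of every impacted asset."""
    if end <= start or start.date() != end.date():
        return False  # keep the sketch simple: no changes spanning midnight
    start_min = start.hour * 60 + start.minute
    end_min = end.hour * 60 + end.minute
    for asset in assets:
        window = CHANGE_WINDOWS.get(asset)
        if window is None:
            return False  # unknown asset: the impact cannot be validated
        if start.weekday() != window.weekday:
            return False
        if start_min < window.start_hour * 60 or end_min > window.end_hour * 60:
            return False
    return True
```

A change that touches several assets is only acceptable in the overlap of their windows, which is one reason the windows shrink as the number of dependencies grows.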
slide 14
Change
Goal
Ensure that changes can happen within the agreed SLEs and without affecting the stability of production
slide 15
Change
How
• Having control over the changes:
– Each Change is communicated → ITChange
– Each Change is planned → Change Calendar
– The impact of each Change is known
– Each Change is authorised
The CAB (Change Advisory Board) manages all changes.
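The four controls listed above can be read as a checklist that every change has to pass. A minimal sketch, assuming a simple record with one flag per control; the `Change` class, its field names and the helper functions are hypothetical and not taken from ITChange.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical change record; field names are illustrative only."""
    title: str
    communicated: bool = False   # registered and announced
    planned: bool = False        # scheduled on the change calendar
    impact_known: bool = False   # impact analysis has been done
    authorised: bool = False     # approved (assumed here: via the CAB)


def missing_controls(change: Change) -> list[str]:
    """List the controls that the change does not yet satisfy."""
    checks = {
        "communicated": change.communicated,
        "planned": change.planned,
        "impact known": change.impact_known,
        "authorised": change.authorised,
    }
    return [name for name, done in checks.items() if not done]


def may_go_to_production(change: Change) -> bool:
    """A change may proceed only when no control is missing."""
    return not missing_controls(change)
```

Returning the list of missing controls, rather than a bare yes or no, gives the requester something concrete to fix before the change is reviewed again.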
slide 16
Configuration & Asset
Impact analysis & dependencies
- The environment becomes more and more complex
- The impact becomes bigger
- Extra availability becomes ‘normal’
- The change windows become smaller
How can we keep an overview of all these assets & relations?
slide 17
What is the impact?
When can I deploy this middleware service?
When can I install a new application server?
When can I upgrade the RAC Database?
When can I switch this cable?
When can I maintain the UPS System?
How can I move a datacenter?
slide 18
IMPACT
80% of all unavailabilities are due to changes (Gartner)
Today 99% of all changes run fine at Colruyt, but changes still generate more than 40% of all unavailabilities…
slide 19
Impact?
Which services are impacted when I pull the fibre cable connected to the director XFBS011102 on port 26 module 2?
slide 20
IMPACT?
The impact list of component XFBS011102-FC2/26 contains 1954 entries (result on 20/01/2010).
The ITService VERKOOP_FVS2000 has 1199 dependencies (result on 20/01/2010).
[Diagram: dependencies of ITSERVICE VERKOOP_FVS2000. Labels shown: TELLINGEN_ALIAS, BRANCHCOUNT001, DS-JDBC_BRANCHCOUNT, BRSTD001@ORACPC50, ORACPC50, ORACPC50_PROCESS, SVLIPC71 (Wilgenveld 1214B RACK AD41), Bootdisk, MAC SVLIPC71-001A64D32554, NETWORK XWBS013P21 – GI0/12, FIBERCARD1 SVLIPC71-500110A00016C17E, FIBERCARD2 SVLIPC71-50050763060005D4, SAN Director 1 (XFBS011101-FC2/26, XFBS011101-FC6/4, XFBS011101-FC9/4), SAN Director 2 (XFBS011102-FC2/26, XFBS011102-FC6/4, XFBS011102-FC9/4), DS8300W-50050763060005D4, DS8300W-50050763060B05D4, DS8300W-50050763061405D4, DS8300W-50050763061905D4]
slide 21
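An impact list like the one above can be thought of as everything reachable from a component in a graph of "is used by" relations. The sketch below is only an illustration of that idea: the relations are a small hypothetical chain that borrows names from the slide, the order of the chain is an assumption, and the `impact_list` helper is not the actual ITCONFIG implementation.

```python
from collections import deque

# Hypothetical "is used by" relations: component -> things that depend on it.
# The names come from the slide; the links between them are assumed.
USED_BY = {
    "XFBS011102-FC2/26": ["FIBERCARD2"],
    "FIBERCARD2": ["SVLIPC71"],
    "SVLIPC71": ["ORACPC50"],
    "ORACPC50": ["DS-JDBC_BRANCHCOUNT"],
    "DS-JDBC_BRANCHCOUNT": ["BRANCHCOUNT001"],
    "BRANCHCOUNT001": ["VERKOOP_FVS2000"],
}


def impact_list(component: str, used_by: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk that collects everything depending on `component`."""
    impacted: list[str] = []
    seen = {component}
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for dependant in used_by.get(current, []):
            if dependant not in seen:  # guards against cycles and duplicates
                seen.add(dependant)
                impacted.append(dependant)
                queue.append(dependant)
    return impacted
```

Walking the same relations in the opposite direction, from an ITService down to its components, would give the dependency count of that service.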
RELATIONS
[Diagram: relations between ITELEMENTS, ITFUNCTIONS and ITSERVICES. ITELEMENTS types shown: Country, Site, Building, Room, Rack, Physical server, MF, Blade Chassis, LPAR, ESX, Logical server, Windows, Linux, Unix, Load balancer, Network component, Fibercard, Bootdisk, Storage, STC’s, Physical Database, Logical Database, JDBC connection, WAS, CICS, IMSL, Windows Shares, Windows Services, Middleware services, Queue, Application, Others. Other labels: IRAP, Universe, Reports]
slide 22
ITService e.g. Finance
ITSERVICE                      MUST/SHOULD
AGENDA                         MUST
ARCHIVES                       MUST
ATST                           SHOULD
DIENSTINFO_SHARE               MUST
EXCEL                          SHOULD
INTERNET_CONNECTIVITY          MUST
IRAP                           MUST
FILT                           SHOULD
MICF                           SHOULD
ONKO                           MUST
PAFW                           SHOULD
PEOPLESOFT_HUMAN_RESOURCES     MUST
PERSONEELSDIENST_SHARE         MUST
PNPEPAFW_REPORTGROUP           MUST
TELEFONIE                      SHOULD
....
The 35 top levels defined by the FA give 685 dependencies for this ITService
slide 23
ASSET
Extra availability
Extra availability is a period outside the normal availability hours when you want to make use of the ITService, e.g.:
- Extra work needs to be done on Saturday
- No changes on related ITServices because the financial year closure takes place in the first 2 weeks of April
- Next week project H59A asks for full exclusivity for changes because of the size of the project
- A demo will take place at the fair this weekend
slide 24
Frozen period
During the whole month of December we reduce the number of changes to an absolute minimum for the complete Colruyt Group because:
- This period is too crucial for the Colruyt Group to take risks (each change is a risk…)
- We notice that a yearly ‘rest’ for our IT is good for stability
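Both the frozen period and extra availability act as calendar rules that block changes. A minimal sketch of such a check, assuming the frozen period is simply the month of December and that extra-availability periods are stored per ITService; the `FINANCE` entry, its dates and the `change_blocked` helper are hypothetical.

```python
from datetime import date
from typing import Optional

FROZEN_MONTH = 12  # December, as stated above

# Hypothetical extra-availability periods: ITService -> list of (first day, last day).
EXTRA_AVAILABILITY = {
    "FINANCE": [(date(2010, 4, 1), date(2010, 4, 14))],
}


def change_blocked(day: date, itservice: str) -> Optional[str]:
    """Return the reason a change is blocked on `day`, or None if it may be planned."""
    if day.month == FROZEN_MONTH:
        return "frozen period"
    for first, last in EXTRA_AVAILABILITY.get(itservice, []):
        if first <= day <= last:
            return "extra availability"
    return None
```

The sketch treats December as fully blocked, whereas the slide speaks of an absolute minimum of changes; an exception path for unavoidable changes would have to be added on top.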
slide 25
The processes… (reference model diagram repeated)
slide 26
Incident
What
• An incident is an event caused by a disruption or a reduction in quality of a service
slide 27
Incident
Goal
- Return as soon as possible to the ‘normal situation’ so the end user can continue doing his job
- Minimise the negative impact on the business operation
It is not the goal of incident management to fix the problem permanently → cost vs benefit
An incident is fixed when the end user (EU) can continue with his work and agrees with the proposed solution
slide 28
slide 29
Information Request
Information Requests are handled by the key user of the application on the business side
slide 30
Disaster
Escalation of an incident
• Prio 1 and 2 incidents can be escalated to disaster by the helpdesk
• Escalated incidents are evaluated by a disaster coordinator
• Not every escalated incident results in a disaster!
• The disaster coordinator coordinates the disaster until the incident is under control
• Tools: Adobe Connect, disastertel, disaster room
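The escalation rules above form a small state machine: the helpdesk may escalate, and the disaster coordinator decides. A minimal sketch under that reading; the state names, the `Incident` class and both functions are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    OPEN = "open"
    ESCALATED = "escalated"   # waiting for the disaster coordinator
    DISASTER = "disaster"


@dataclass
class Incident:
    reference: str
    priority: int             # 1 = highest
    state: State = State.OPEN


def escalate(incident: Incident) -> bool:
    """Helpdesk step: only priority 1 and 2 incidents can be escalated."""
    if incident.priority in (1, 2) and incident.state is State.OPEN:
        incident.state = State.ESCALATED
        return True
    return False


def evaluate(incident: Incident, declare_disaster: bool) -> None:
    """Coordinator step: an escalated incident may or may not become a disaster."""
    if incident.state is not State.ESCALATED:
        raise ValueError("only escalated incidents can be evaluated")
    incident.state = State.DISASTER if declare_disaster else State.OPEN
```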
slide 31
Request Fulfilment
What
• Handles standard IT requests (computer, keyboard, software, hardware, mobile devices, ...) of an end user
• Not the same as an INCIDENT!
slide 32
Request Fulfilment
Help
• Link @ Portal to Servicedesk
slide 33
Event Monitoring
What
Monitors all events that occur throughout the IT infrastructure, to monitor normal operation and to detect and escalate exception conditions
We have:
– Passive monitoring: detects operational events of a configuration item (asset)
– Active monitoring: actively tests the health status of a configuration item (asset)
slide 34
Event Management
How does ITO work?
Collecting
- SNMP traps
- Application & system log monitoring
- System messages
- Mail2ITO
Processing
- Filtering
- Priority
- Grouping
- Threshold
Acting
- Automatic actions
- Operator-initiated actions
- Incident Management
- Notification (SMS)
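The collect, process and act stages above could be sketched as a small pipeline. This is an illustration only and not how ITO is implemented: the `Event` class, the severity scale, the default limits and the action names are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str      # configuration item that raised the event
    message: str
    severity: int    # hypothetical scale: 1 = critical ... 5 = informational


def process(events: list[Event], max_severity: int = 3, threshold: int = 3) -> dict[str, str]:
    """Filter, group per source, apply a threshold and pick one action per source."""
    # Filtering: drop events that are less severe than max_severity.
    relevant = [e for e in events if e.severity <= max_severity]
    # Grouping: count the relevant events per source.
    counts = Counter(e.source for e in relevant)
    # Priority: remember the most severe (lowest number) event per source.
    worst: dict[str, int] = {}
    for e in relevant:
        worst[e.source] = min(e.severity, worst.get(e.source, e.severity))
    # Acting: choose an action based on priority and threshold.
    actions: dict[str, str] = {}
    for source, count in counts.items():
        if worst[source] == 1:
            actions[source] = "notify"            # e.g. SMS to the support team
        elif count >= threshold:
            actions[source] = "open incident"
        else:
            actions[source] = "automatic action"
    return actions
```

Sources whose events are all filtered out do not appear in the result at all, which mirrors the idea that filtering keeps noise away from operators.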
slide 35
Event Overview
[Diagram: event flow. Labels shown: Monitoring Strategy, Fixes, Filter, CHANGE, APPLICATIONS, MACHINES, Automatic Actions, ITO, SUPPORT TEAM, Operator Initiated Actions, OPERATIONS, INCIDENT, PROBLEM, Workarounds, CONFIG, HELPDESK, END USERS]
slide 36
Problem
What
Problem management is focused on:
• Solving the underlying cause of an incident: “How can we avoid this?”
• Ideas from the end user
• Managing problems that you deliberately decided not to fix (status REJECTED!)
Active & proactive
slide 37
Thanks
Questions
slide 38