Digital Service Efficiency: A New Management Scorecard (DCM 10.2)
Shekhar Dasgupta
Founder
GreenField Software
1
Digital Service Efficiency: A New Management Scorecard
This presentation defines and outlines management scorecards, including Kaplan
& Norton’s Balanced Scorecard. It then discusses why a supplementary scorecard
should be used to measure IT efficiencies with respect to specific Data Center
operational roles. Finally, it shows how next-gen DCIM software should
build a role-based DSE framework to achieve organizational objectives and goals.
2
Management Scorecards
What are they?
Organizational Performance Management frameworks
• Mix of financial & non-financial measures against benchmarks
• Started in late 1970s by Dr. Aubrey Daniels
• Goal: alignment of top management towards common organizational
objectives & positive outcomes through key measurement parameters
What do they measure?
Scorecard Examples
• People Performance
• Process Efficiency
• Systems Efficiency
3
Balanced Scorecard
• Developed by Kaplan & Norton in the 1990s
• Linked company strategy to Financial & Non-Financial KPIs
• Multiple variants, including industry-specific templates
• Technology recognized as enabler for business process efficiencies and driver for innovation and growth
Observations:
• Tool for objectively incentivizing executives on non-financial KPIs
• Practitioners have not evolved any IT infrastructure-related KPIs nor directly linked them to the BSC framework
4
Why a New Scorecard for Data Centers?
• For the Data Center to function effectively/compete better
• Who is responsible for making that happen?
• How does one measure whether the new processes are being managed effectively?
• What are the costs and how do they measure against benchmarks?
• How does one measure whether the innovations/new systems are delivering the desired outcomes?
5
Digital Service Efficiency: A New Scorecard for Data Centers
“Digital Service Efficiency (DSE) methodology is eBay’s miles-per-gallon (MPG)
equivalent for viewing the productivity and efficiency of technical
infrastructure across four key areas: performance, cost, environmental impact
and revenue.”
“The DSE methodology equips decision-makers to see the results of their
technical infrastructure choices to date (i.e., what MPG they achieved with
their design and operations), and serves as the flexible tool they need when
faced with making new decisions (i.e., what knobs to turn to achieve maximum
performance across all dimensions). Ultimately, DSE enables balance within the
technology ecosystem by exposing how turning knobs in one dimension affects
the others”.
Original Designer: Dean Nelson from eBay, Inc.
6
DSE Dashboard
eBay’s real-time dashboard is available at http://tech.ebay.com/dashboard
7
Next Gen DCIM
Delivering Role-Based DSE Scorecard
8
DCIM & Role-Based Digital Services Efficiency
Today’s DCIM
• Current DCIM measures in real time:
  - IT asset utilization
  - DC power usage (PUE, CUE)
  - Cooling requirements
  - Floor & rack space occupancy
• Data analyzed for improving energy efficiency & capacity planning
• Helps to predict & prevent failures
Next-Gen DCIM
• DCIM DSE will provide Role-Based
Scorecards
• DCIM DSE will provide granular
cost measurement across
complete IT infrastructure
• DCIM DSE scorecard will measure
Infrastructure Capability for
Process Improvements &
Technology Innovations.
9
Data Center Operations & Roles
Data Center Manager: Responsible for overall data center operations
Data Center Facility Staff: Responsible for data center facilities operations
Data Center IT Staff: Responsible for data center IT operations
[Org chart: Data Center Manager, Facility Staff, IT Staff]
10
DCIM DSE Scorecard
For Facility Staff
11
Data Center Facility Staff
Facility KRA (with associated Facility KPIs):
• Scheduled & Preventive Maintenance
• Incident and Problem Management
• Maintaining Energy efficiency
• Infrastructure monitoring & health check
• Uptime Reporting
12
Infrastructure Monitoring KRA & KPIs
KRA: Data Center infrastructure monitoring & health check
DCIM
DCIM provides better DC facility monitoring by:
• Real-time monitoring of power systems: electrical panels (HT & LT panels), UPS, PDUs (row & rack)
• Real-time monitoring of cooling systems: chillers, PACs, AHU
• Real-time monitoring of environmental statistics of the DC: temperature, humidity, water leak, smoke, fire
• Ability to monitor the above subsystems through a single dashboard and get alerts on abnormal conditions over email/SMS
Cooling KPIs
• Fan Runtime
• Supply Air Temperature
• Supply Air Humidity
• Rack Cooling Index
• Return Temperature Index
• Return Air Humidity
• Power Consumption (kW)
UPS KPIs
• Utility Line & Output Voltage
• Power Loss
• UPS Load
• Remaining Battery Capacity
• Internal UPS temperature
• UPS battery run time remaining before battery exhaustion
• The elapsed time since the UPS has switched to battery power
Environment KPIs
• Cabinet Internal Temperature
• Cabinet Internal Humidity
• Room Ambient Temperature/Humidity
• Smoke
• Water Leak Detection
• Cabinet Door Ajar
• Motion
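Illustration: the "alerts on abnormal conditions" idea on this slide can be sketched as a simple range check over incoming readings. This is a minimal sketch only. The metric names, the limits and the find_alerts helper are hypothetical and are not taken from any specific DCIM product; real limits would come from site policy and equipment specifications.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    low: float
    high: float


# Hypothetical limits, chosen only for illustration.
THRESHOLDS = {
    "supply_air_temp_c": Threshold(18.0, 27.0),
    "cabinet_humidity_pct": Threshold(20.0, 80.0),
    "ups_load_pct": Threshold(0.0, 80.0),
}


def find_alerts(readings, thresholds=THRESHOLDS):
    """Return (metric, value) pairs that fall outside their configured range."""
    alerts = []
    for metric, value in readings.items():
        limit = thresholds.get(metric)
        # Metrics without a configured threshold are ignored.
        if limit is not None and not (limit.low <= value <= limit.high):
            alerts.append((metric, value))
    return alerts


sample = {
    "supply_air_temp_c": 29.5,
    "cabinet_humidity_pct": 45.0,
    "ups_load_pct": 85.0,
}
alerts = find_alerts(sample)
for metric, value in alerts:
    print(f"ALERT: {metric} = {value}")
```

In practice the alert list would be handed to an email/SMS gateway rather than printed.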
13
Maintenance KRA & KPIs
KRA: Preventive & Breakdown Maintenance
DCIM
• Real-time monitoring & alerts help staff during routine checks and help prevent failures of facility equipment.
• Helps schedule routine maintenance for facility devices
• Breakdown maintenance analysis prevents similar failures or enables faster recovery
Scheduled Maintenance
• Age of Device
• Criticality of Device
• Date of Last Check-Up
• Check-Up Frequency
• Condition Based Maintenance %
• Spare Part Used Versus Availability
Breakdown Maintenance
• Failure Rate
• Mean Time Between Failures
• Mean Time To Repair
• Total Maintenance Cost / Asset Replacement Value
• Uptime / Required Time
• Immediate Corrective Maint. Time / Total DT Related to Maintenance
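Illustration: the Failure Rate, Mean Time Between Failures and Mean Time To Repair KPIs on this slide can be derived from a device's failure log. The sketch below uses the common textbook definitions and assumes that operating_hours counts only time in service (repair time excluded). The maintenance_kpis function and its field names are hypothetical.

```python
def maintenance_kpis(operating_hours, repair_hours):
    """Compute breakdown-maintenance KPIs for one device.

    operating_hours: total hours the device was in service (repairs excluded).
    repair_hours: list with one repair duration (in hours) per failure.
    """
    failures = len(repair_hours)
    if failures == 0 or operating_hours <= 0:
        raise ValueError("need positive operating hours and at least one failure")
    mtbf = operating_hours / failures      # Mean Time Between Failures
    mttr = sum(repair_hours) / failures    # Mean Time To Repair
    return {
        "failure_rate_per_hour": failures / operating_hours,
        "mtbf_hours": mtbf,
        "mttr_hours": mttr,
        # Steady-state availability implied by MTBF and MTTR.
        "availability": mtbf / (mtbf + mttr),
    }


kpis = maintenance_kpis(operating_hours=8700.0, repair_hours=[2.0, 4.0, 6.0])
print(kpis)
```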
14
Incident Management KRA & KPIs
KRA: Incident and Problem Management
DCIM
DCIM enforces an ITSM best-practice framework on data center facility operations and ensures that all incidents and service requests are tracked through to closure
Incident Measures
• Number of Incidents
• Breakdown of incidents at each stage (logged, WIP and closed)
• Number and % of major incidents
• Number of incidents reopened as % of total
• Breakdown of incidents by time of day
Resolution Measures
• Mean Response Time versus target response time
• Mean elapsed time for incident resolution (Turnaround Time)
• % of incidents resolved within target resolution time
• Number and % of incidents incorrectly assigned
• Number and % of incidents incorrectly categorized
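Illustration: several of the incident and resolution measures on this slide reduce to counts and averages over an incident log. The sketch below is one possible way to compute them. The Incident record, its fields and the 240-minute target are assumptions made for the example, not part of any ITSM tool.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    response_min: float      # minutes from logging to first response
    resolution_min: float    # minutes from logging to closure
    major: bool = False
    reopened: bool = False


def incident_kpis(incidents, target_resolution_min):
    """Summarize an incident log into a few scorecard measures."""
    total = len(incidents)
    if total == 0:
        raise ValueError("incident log is empty")
    within = sum(1 for i in incidents if i.resolution_min <= target_resolution_min)
    return {
        "count": total,
        "mean_response_min": sum(i.response_min for i in incidents) / total,
        "mean_resolution_min": sum(i.resolution_min for i in incidents) / total,
        "pct_major": 100.0 * sum(1 for i in incidents if i.major) / total,
        "pct_reopened": 100.0 * sum(1 for i in incidents if i.reopened) / total,
        "pct_within_target": 100.0 * within / total,
    }


log = [
    Incident(10, 120),
    Incident(20, 300, major=True),
    Incident(30, 60, reopened=True),
    Incident(20, 240),
]
kpis = incident_kpis(log, target_resolution_min=240)
print(kpis)
```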
15
DCIM: Helping facility staff with their PUE KPI
KPI: Maintain efficiency level (PUE)
Ensure a stable PUE for the data center
DCIM
DCIM monitors data center PUE in real time and analyzes historical PUE data to recommend ways to improve PUE
Other power management measures: watts per sq ft, RCI
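Illustration: PUE is the ratio of total facility energy to IT equipment energy, so a value closer to 1.0 means less overhead. The sketch below computes PUE and watts per square foot and applies a simple stability check. The pue_is_stable helper, its tolerance and the sample readings are assumptions for illustration only.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def watts_per_sq_ft(load_kw, floor_area_sq_ft):
    """Power density of the data center floor."""
    if floor_area_sq_ft <= 0:
        raise ValueError("floor area must be positive")
    return load_kw * 1000.0 / floor_area_sq_ft


def pue_is_stable(history, tolerance):
    """Hypothetical stability rule: spread of historical PUE stays within tolerance."""
    return max(history) - min(history) <= tolerance


# Sample (total facility kWh, IT equipment kWh) readings, made up for the example.
readings = [(1500.0, 1000.0), (1600.0, 1000.0), (1550.0, 1000.0)]
history = [pue(total, it) for total, it in readings]
print(history, pue_is_stable(history, tolerance=0.15))
```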
16
DCIM: Helping facility staff in their Uptime KPI
KPI: Facility Uptime as per SLA
Periodic reporting of Facility uptime, RTO & RPO statistics of Facility Services & subsystems.
DCIM
DCIM provides Facility Uptime and recovery metrics. Includes reporting on health & functional statistics of facility subsystems like power, cooling and environmental components.
DCIM provides dashboards, analytics and scheduled reports on facility uptime, DC energy efficiency (PUE) and incident management
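Illustration: "Facility Uptime as per SLA" can be checked by comparing measured uptime for a reporting period with the SLA percentage. This is a minimal sketch. The function names, the 720-hour (30-day) period and the 99.9% SLA are assumptions used only for the example.

```python
def uptime_pct(period_hours, downtime_hours):
    """Uptime as a percentage of the reporting period."""
    if period_hours <= 0 or not (0 <= downtime_hours <= period_hours):
        raise ValueError("invalid period or downtime")
    return 100.0 * (period_hours - downtime_hours) / period_hours


def allowed_downtime_hours(period_hours, sla_pct):
    """Downtime budget implied by an SLA percentage for the period."""
    return period_hours * (1.0 - sla_pct / 100.0)


def meets_sla(period_hours, downtime_hours, sla_pct):
    return uptime_pct(period_hours, downtime_hours) >= sla_pct


period = 720.0  # one 30-day month, in hours
print(uptime_pct(period, 0.5), meets_sla(period, 0.5, 99.9))
print(allowed_downtime_hours(period, 99.9))
```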
17
DCIM DSE Scorecard
For IT Staff
18
KRA: Data Center IT Staff
IT KRA (with associated IT KPIs):
• IT Monitoring
• IT Asset Management
• IT Vendor/Contract Management
• IT Hardware Maintenance
• Business Continuity
• Reporting
19
Monitoring & Provisioning KRAs & KPIs
KRA: IT Monitoring & Provisioning
DCIM
Real-time monitoring of resource utilization of IT devices: server CPU, memory, storage, network bandwidth.
Proactive monitoring enables alerts when thresholds are breached.
Auto Provisioning of Racks & Devices
Virtualization Planner identifies servers that can be virtualized. Also identifies under-utilized IT devices; recommends retirement, replacement.
Monitoring
• CPU Utilization
• Memory Utilization
• Power Consumption
• Storage Utilization versus Free Storage
• Server Uptime versus Target
• Failures Prevented Due to proactive monitoring
• Failures due to human errors
Provisioning
• Time to Harden a New Server
• Time to Provision a New Device
• Time to Provision New Rack space
• Time to Virtualize a new system
• Time to replace a legacy system
• Time to decommission a legacy system
• Time to install patches & updates
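Illustration: identifying under-utilized IT devices can be sketched as a filter over average utilization figures. The slide does not say how the Virtualization Planner makes this decision, so the rule below (average CPU and memory both under a cut-off) and the cut-off values are purely hypothetical.

```python
def underutilized(servers, cpu_max_pct=10.0, mem_max_pct=20.0):
    """Return names of servers whose average CPU and memory use are both low.

    servers: dict mapping server name -> (avg_cpu_pct, avg_mem_pct).
    The default cut-offs are illustrative assumptions, not vendor guidance.
    """
    return sorted(
        name
        for name, (cpu, mem) in servers.items()
        if cpu < cpu_max_pct and mem < mem_max_pct
    )


fleet = {
    "web-01": (55.0, 70.0),
    "batch-02": (4.0, 12.0),
    "legacy-03": (2.5, 8.0),
    "db-04": (8.0, 65.0),
}
candidates = underutilized(fleet)
print(candidates)
```

Servers on the resulting list would be candidates for virtualization, retirement or replacement, subject to review.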
20
IT Hardware Maintenance KRA & KPIs
KRA: IT Hardware Maintenance
DCIM
DCIM helps schedule preventive maintenance (PM) based on the following:
• Age of a device as recorded in DCIM
• Utilization/load of device as monitored by DCIM
DCIM helps IT staff understand cascading effect of temporary unavailability (due to PM) of a particular device: send prior notification
Scheduled Maintenance
• Age of Device
• Criticality of Device based on utilization and application hosted
• Date of Last Check-Up
• Date of last upgrade/nature of upgrade
• Condition Based Maintenance %
• Spare Part Used Versus Availability
Breakdown Maintenance
• Failure Rate
• Mean Time Between Failures
• Mean Time To Repair
• Total Maintenance Cost / Asset Replacement Value
• Uptime / Required Time
• Immediate Corrective Maint. Time / Total DT Related to Maintenance
21
IT Asset Management KRA & KPIs
KRA: IT Asset Management
DCIM
• DCIM serves as enterprise asset management software for both IT & Facilities.
• DCIM auto-discovers intelligent assets and creates asset database.
• DCIM helps manage IT asset relationships
• DCIM also maintains information about redundant assets in HA and DR setup
Asset Management
• Time taken to add or delete intelligent & non-intelligent asset
• Time taken to update due to MAC
• Time taken to add interdependencies between assets
• % accuracy of asset database
• % Over & Under Provisioned
Vendor/Contract Management KRA & KPIs
KRA: Vendor/Contract Management
DCIM
• DCIM tracks support renewal dates
• Tracks hardware vendor/supplier and services provider
Vendor Management
• % of systems out of support renewal
• % Uptime by device category and vendor
• % Contractor’s Compliance by SLA terms
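Illustration: the "% of systems out of support renewal" measure follows directly from tracked support end dates. The sketch below assumes a simple mapping of system name to support end date. The names, dates and the 30-day look-ahead window are hypothetical.

```python
from datetime import date


def pct_out_of_support(support_end, today):
    """Percentage of systems whose support ended before `today`."""
    if not support_end:
        raise ValueError("no systems tracked")
    expired = sum(1 for end in support_end.values() if end < today)
    return 100.0 * expired / len(support_end)


def renewals_due(support_end, today, within_days=30):
    """Systems still in support whose renewal falls within the look-ahead window."""
    return sorted(
        name
        for name, end in support_end.items()
        if 0 <= (end - today).days <= within_days
    )


contracts = {
    "server-1": date(2024, 1, 31),
    "server-2": date(2024, 6, 15),
    "san-1": date(2024, 12, 31),
    "switch-1": date(2023, 11, 1),
}
today = date(2024, 6, 1)
print(pct_out_of_support(contracts, today), renewals_due(contracts, today))
```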
23
Business Continuity KRA & KPIs
KRA: Business Continuity
DCIM
DCIM helps in better impact analysis of outages and in faster RCA of any incident and thereby helps in faster turnaround time
Business Continuity
• Recovery Time Objective (RTO)
• Recovery Point Objective (RPO)
• Actual versus RTO & RPO
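Illustration: "Actual versus RTO & RPO" can be evaluated per outage by comparing the measured recovery time with the RTO and the age of the last good backup at outage time with the RPO. The function name, field names and timestamps below are assumptions for the example.

```python
from datetime import datetime, timedelta


def recovery_vs_objectives(last_backup, outage_start, service_restored, rto, rpo):
    """Compare one outage's actual recovery figures with its RTO and RPO."""
    actual_recovery = service_restored - outage_start   # time to restore service
    data_loss_window = outage_start - last_backup       # work at risk since last backup
    return {
        "actual_recovery": actual_recovery,
        "data_loss_window": data_loss_window,
        "rto_met": actual_recovery <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


result = recovery_vs_objectives(
    last_backup=datetime(2024, 3, 1, 0, 0),
    outage_start=datetime(2024, 3, 1, 3, 30),
    service_restored=datetime(2024, 3, 1, 5, 0),
    rto=timedelta(hours=2),
    rpo=timedelta(hours=4),
)
print(result)
```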
24
DCIM: Helping IT staff in their Reporting KRA
KRA: Reporting
DCIM
DCIM provides superior reporting on IT infra availability, resource utilization and incident management
[Chart: Trend Comparison for Multiple Servers]
25
DCIM DSE Scorecard
For Data Center Manager
26
KRA: Data Center Manager
DC Manager KRA (with associated DC Manager KPIs):
• Increase profitability by controlling data center cost
• Minimize DC failure and improve availability
• Improve operational efficiency and meet business SLA
• Data center capacity planning
• Adopt ‘Green’ practices for sustainable DC operations
• Reports & Analytics
27
Data Center Manager Cost & Profitability KRA & KPIs
KRA: Increase profitability by controlling data center cost
DCIM
Control CapEx:
• Repurpose under-utilized servers
• Discover stranded capacities & defer costly upgrade
Reduce OpEx:
• Reduce cooling costs
• Reduce server footprint
28
Data Center Manager Availability KRA & KPIs
KRA: Minimize DC failure and improve availability
DCIM
• Ability to predict failures
• Better impact analysis in the event of subsystem/component failure
• Faster RCA and turnaround time capabilities
Actuals
• Number of Incidents/alarms
• Breakdown of alarms at each stage (logged, WIP and closed)
• Major alarms by type
  - Facilities: Fire, Temp, ….
  - IT: Server, Storage, Application…
SLA Benchmarks
• RTO
• RPO
29
Data Center Manager Operational Efficiency vs. SLA
KRA: Improve operational efficiency and meet business SLA
DCIM
• DCIM automates critical data center processes like Asset Management, Capacity Planning and Provisioning, thereby minimizing human error, increasing accuracy and data integrity and improving operational efficiency of the data center.
Actuals
• Asset DB Accuracy
• Time and Cost to Provision additional resources
• Availability by Servers, Storage and Applications
SLA Benchmarks
• Watt Per Rack and Watt per sq ft
• PUE & CUE
30
Data Center Manager: Capacity Planning KRA & KPIs
KRA: Data center capacity planning
Planning & Forecasting
• Incidents due to Capacity Shortages
• Exactness of Capacity Forecast
• Capacity Adjustments
• % reduction in panic buying
• Unplanned Capacity Adjustments
• % reduction in lost business due to inadequate capacity
• Resolution Time of Capacity Shortage
• Capacity Reserves
• Percentage of Capacity Monitoring
• Relative reduction in cost of production of Capacity Plan
DCIM
Monitoring
• Monitor current capacity utilization
• Forecast future capacity requirement accurately
• Design and implement critical capacities efficiently without under/over-provisioning
Sources:
1. Clemson Computing & Information Technology
2. IT Process Maps
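Illustration: forecasting future capacity and scoring the "Exactness of Capacity Forecast" can be sketched with a straight-line trend over past utilization. A least-squares line is only one of many possible forecasting methods and is used here as an assumption. The sample utilization figures and function names are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values observed at periods 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, periods_ahead):
    """Extrapolate the trend line `periods_ahead` periods past the last observation."""
    slope, intercept = linear_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


def forecast_error_pct(forecast_value, actual_value):
    """Absolute forecast error as a percentage of the actual value."""
    return 100.0 * abs(forecast_value - actual_value) / actual_value


rack_power_kw = [60.0, 62.0, 64.0, 66.0]   # monthly utilization, made up for the example
predicted = forecast(rack_power_kw, periods_ahead=3)
print(predicted, forecast_error_pct(predicted, actual_value=75.0))
```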
31
Data Center Manager ‘Green Practices’ KRA & KPIs
KRA: Adopt ‘Green’ practices for sustainable DC operations
DCIM
• Monitor energy consumption in the data center down to the lowest level
• Find ways to reduce energy consumption and improve efficiency
• Ensure that DC operations comply with the organization’s sustainability goals
32
Data Center Manager Reporting KRA & KPIs
KRA: Reports & Analytics
DCIM
Reports & Analytics on
- Uptime and availability
- Energy efficiency and health
- Data center costs and savings
- Capacity/Resource utilization and availability
- Operational efficiency and SLA Compliance
33
How Will DSE Scorecard Help Data Center Operations?
Next Gen DCIM w/ DSE Scorecard
• Are the Data Centre Infrastructure & Capital Costs aligned to process improvements?
• Have we been able to reduce Infrastructure OpEx?
• Are we maintaining a Risk-free Data Centre Infrastructure?
• Is the infrastructure delivering on the technology innovation?
Link Back to Organizational Vision & Strategy & BSC
34
Thank You
Shekhar Dasgupta
Mobile: 408-431-1044
35