Transcript Document
Digital Service Efficiency: A New Management Scorecard (DCM 10.2)
Shekhar Dasgupta
Founder, GreenField Software

This presentation defines and outlines management scorecards, including Kaplan & Norton's Balanced Scorecard. It then discusses why a supplementary scorecard should be used to measure IT efficiencies with respect to specific data center operational roles. Finally, it shows how next-generation DCIM software should build a role-based DSE framework to achieve organizational objectives and goals.

Management Scorecards
What are they?
• Organizational performance management frameworks
• A mix of financial & non-financial measures tracked against benchmarks
• Started in the late 1970s by Dr. Aubrey Daniels
• Goal: align top management toward common organizational objectives & positive outcomes through key measurement parameters
What do they measure? Scorecard examples:
• People performance
• Process efficiency
• Systems efficiency

Balanced Scorecard
• Developed by Kaplan & Norton in the 1990s
• Linked company strategy to financial & non-financial KPIs
• Multiple variants, including industry-specific templates
• Technology recognized as an enabler of business process efficiencies and a driver of innovation and growth
Observations:
• A tool for objectively incentivizing executives on non-financial KPIs
• Practitioners have not evolved any IT infrastructure-related KPIs, nor linked such KPIs directly to the Balanced Scorecard (BSC) framework

Why a New Scorecard for Data Centers?
For the data center to function effectively and compete better:
• Who is responsible for making that happen?
• How does one measure whether the new processes are being managed effectively?
• What are the costs, and how do they compare against benchmarks?
• How does one measure whether the innovations/new systems are delivering the desired outcomes?

Digital Service Efficiency: A New Scorecard for Data Centers
“Digital Service Efficiency (DSE) methodology is eBay's miles-per-gallon (MPG) equivalent for viewing the productivity and efficiency of technical infrastructure across four key areas: performance, cost, environmental impact and revenue.”

“The DSE methodology equips decision-makers to see the results of their technical infrastructure choices to date (i.e., what MPG they achieved with their design and operations), and serves as the flexible tool they need when faced with making new decisions (i.e., what knobs to turn to achieve maximum performance across all dimensions). Ultimately, DSE enables balance within the technology ecosystem by exposing how turning knobs in one dimension affects the others.”

Original designer: Dean Nelson, eBay, Inc.

DSE Dashboard
eBay's real-time dashboard, available at http://tech.ebay.com/dashboard

Next-Gen DCIM Delivering a Role-Based DSE Scorecard

DCIM & Role-Based Digital Service Efficiency
Today's DCIM measures, in real time:
• IT asset utilization
• DC power usage (PUE, CUE)
• Cooling requirements
• Floor & rack space occupancy
The data is analyzed to improve energy efficiency & capacity planning, and helps predict & prevent failures.
Next-Gen DCIM:
• DCIM DSE will provide role-based scorecards
• DCIM DSE will provide granular cost measurement across the complete IT infrastructure
• The DCIM DSE scorecard will measure infrastructure capability for process improvements & technology innovations
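The DCIM overview above lists PUE and CUE among the power metrics today's DCIM tracks, and the eBay quote frames DSE as an MPG-style ratio. The Python sketch below illustrates both ideas, assuming metered energy is available per reporting period. PUE (total facility energy divided by IT equipment energy) and CUE (CO2 from total facility energy divided by IT equipment energy) follow The Green Grid definitions; the `EnergyReading` structure, the function names and the four per-dimension ratios in `dse_ratios` are hypothetical illustrations, not eBay's published DSE formulas or any DCIM product's API.

```python
from dataclasses import dataclass


@dataclass
class EnergyReading:
    """Metered energy for one reporting period (hypothetical structure)."""
    total_facility_kwh: float   # utility-meter energy: IT load plus cooling, UPS losses, lighting
    it_equipment_kwh: float     # energy delivered to servers, storage and network gear
    grid_kg_co2_per_kwh: float  # carbon emission factor (CEF) of the supplied electricity

    def __post_init__(self) -> None:
        if self.it_equipment_kwh <= 0 or self.total_facility_kwh < self.it_equipment_kwh:
            raise ValueError("need 0 < IT energy <= total facility energy")


def pue(r: EnergyReading) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy (ideal 1.0)."""
    return r.total_facility_kwh / r.it_equipment_kwh


def cue(r: EnergyReading) -> float:
    """Carbon Usage Effectiveness in kgCO2 per IT kWh; equals CEF * PUE."""
    total_co2_kg = r.total_facility_kwh * r.grid_kg_co2_per_kwh
    return total_co2_kg / r.it_equipment_kwh


def dse_ratios(r: EnergyReading, transactions: int, cost_usd: float,
               revenue_usd: float) -> dict[str, float]:
    """Hypothetical MPG-style ratios, one per DSE dimension named in the quote."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    co2_kg = r.total_facility_kwh * r.grid_kg_co2_per_kwh
    return {
        "transactions_per_kwh": transactions / r.total_facility_kwh,  # performance
        "cost_per_transaction": cost_usd / transactions,              # cost
        "kg_co2_per_transaction": co2_kg / transactions,              # environmental impact
        "revenue_per_kwh": revenue_usd / r.total_facility_kwh,        # revenue
    }


if __name__ == "__main__":
    month = EnergyReading(total_facility_kwh=1_500_000.0, it_equipment_kwh=1_000_000.0,
                          grid_kg_co2_per_kwh=0.5)
    print(pue(month))  # 1.5
    print(cue(month))  # 0.75
    print(dse_ratios(month, transactions=3_000_000_000, cost_usd=4_500_000.0,
                     revenue_usd=90_000_000.0))
```

Expressing every ratio per unit of work (per transaction or per kWh) is what makes periods and sites comparable, in the same way MPG normalizes fuel use by distance; turning one "knob" (for example, raising supply-air temperature to lower PUE) then shows up directly in the other ratios.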
Data Center Operations & Roles
• Data Center Manager: responsible for overall data center operations
• Data Center Facility Staff: responsible for data center facilities operations
• Data Center IT Staff: responsible for data center IT operations

DCIM DSE Scorecard for Facility Staff

Data Center Facility Staff
Facility KRAs (key result areas), each tracked through KPIs:
• Infrastructure monitoring & health check
• Scheduled & preventive maintenance
• Incident and problem management
• Maintaining energy efficiency
• Uptime
• Reporting

Infrastructure Monitoring KRA & KPIs
KRA: Data center infrastructure monitoring & health check
DCIM provides better DC facility monitoring through:
• Real-time monitoring of power systems: electrical panels (HT & LT panels), UPS, PDUs (row & rack)
• Real-time monitoring of cooling systems: chillers, PACs, AHUs
• Real-time monitoring of DC environmental statistics: temperature, humidity, water leak, smoke, fire
• The ability to monitor all of the above subsystems through a single dashboard and get alerts on abnormal conditions over email/SMS
Cooling KPIs: fan runtime; supply air temperature; supply air humidity; Rack Cooling Index; return temperature; return air humidity; power consumption (kW)
UPS KPIs: utility line & output voltage; power loss; UPS load; remaining battery capacity; internal UPS temperature; UPS battery run time remaining before battery exhaustion; elapsed time since the UPS switched to battery power
Environment KPIs: cabinet internal temperature; cabinet internal humidity; room ambient temperature/humidity; smoke; water leak detection; cabinet door ajar; motion

Maintenance KRA & KPIs
KRA: Preventive & breakdown maintenance
DCIM:
• Real-time monitoring & alerts help staff during routine checks and prevent failures of facility equipment
• Helps schedule routine maintenance for facility devices
• Breakdown-maintenance analysis prevents similar failures or enables faster recovery
Scheduled Maintenance KPIs: age of device; criticality of device; date of last check-up; check-up frequency; asset replacement value; % uptime (of required time)
Breakdown Maintenance KPIs: failure rate; mean time between failures (MTBF); mean time to repair (MTTR); total maintenance cost; condition-based maintenance; spare parts used versus availability; immediate corrective maintenance time; total downtime related to maintenance

Incident Management KRA & KPIs
KRA: Incident and problem management
DCIM enforces an ITSM best-practice framework on data center facility operations and ensures that all incidents and service requests are tracked until closure.
Incident Measures: number of incidents; breakdown of incidents at each stage (logged, WIP and closed); number and % of major incidents; number of incidents reopened as % of total; breakdown of incidents by time of day
Resolution Measures: mean response time versus target response time; mean elapsed time for incident resolution (turnaround time); % of incidents resolved within target resolution time; number and % of incidents incorrectly assigned; number and % of incidents incorrectly categorized

DCIM: Helping Facility Staff with Their PUE KPI
KPI: Maintain the energy-efficiency (PUE) level, i.e., ensure a stable PUE for the data center
DCIM monitors data center PUE in real time and also runs analytics on historical PUE data to recommend ways to improve it.
Other power management measures: watts per sq ft, Rack Cooling Index (RCI)

DCIM: Helping Facility Staff with Their Uptime KPI
KPI: Facility uptime as per SLA, with periodic reporting of facility uptime and of RTO & RPO statistics for facility services & subsystems
DCIM provides facility uptime and recovery metrics, including reporting on the health & functional statistics of facility subsystems such as power, cooling and environmental components.
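The maintenance, incident and uptime KPIs above reduce to simple arithmetic over a device's failure log and the incident queue. The Python sketch below assumes each failure is recorded with its downtime in hours and each incident with logged, response and resolution timestamps; the `Incident` fields and the function names are hypothetical, not a DCIM or ITSM product API. It uses the standard relations MTBF = operating time / failures, MTTR = downtime / failures and availability = MTBF / (MTBF + MTTR).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def mtbf_mttr_availability(period_hours: float, repair_hours: list[float]):
    """MTBF, MTTR (hours) and availability for one device over an observation window.

    repair_hours holds one downtime duration per failure in the window.
    """
    failures = len(repair_hours)
    if failures == 0:
        return float("inf"), 0.0, 1.0  # no failures observed: fully available
    downtime = sum(repair_hours)
    mtbf = (period_hours - downtime) / failures  # operating hours per failure
    mttr = downtime / failures                  # repair hours per failure
    return mtbf, mttr, mtbf / (mtbf + mttr)     # equals uptime / period


@dataclass
class Incident:
    """One facility incident as it might be logged (hypothetical fields)."""
    logged: datetime
    responded: datetime
    resolved: datetime
    reopened: bool = False


def incident_kpis(incidents: list[Incident], target_response: timedelta,
                  target_resolution: timedelta) -> dict:
    """A few of the incident and resolution measures listed above."""
    n = len(incidents)
    if n == 0:
        raise ValueError("no incidents in period")
    responses = [i.responded - i.logged for i in incidents]
    resolutions = [i.resolved - i.logged for i in incidents]
    return {
        "number_of_incidents": n,
        "mean_response_time": sum(responses, timedelta()) / n,
        "pct_responded_within_target": 100 * sum(r <= target_response for r in responses) / n,
        "mean_resolution_time": sum(resolutions, timedelta()) / n,  # turnaround time
        "pct_resolved_within_target": 100 * sum(r <= target_resolution for r in resolutions) / n,
        "pct_reopened": 100 * sum(i.reopened for i in incidents) / n,
    }


if __name__ == "__main__":
    print(mtbf_mttr_availability(720.0, [2.0, 4.0]))  # 30-day window, two outages
    t0 = datetime(2024, 1, 1, 8, 0)
    log = [
        Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(hours=2)),
        Incident(t0, t0 + timedelta(minutes=30), t0 + timedelta(hours=6), reopened=True),
    ]
    print(incident_kpis(log, timedelta(minutes=15), timedelta(hours=4)))
```

Availability computed this way equals uptime divided by the observation window, so it can be compared directly with the "% uptime (of required time)" KPI and with the facility uptime SLA.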
DCIM provides dashboards, analytics and scheduled reports on facility uptime, DC energy efficiency (PUE) and incident management.

DCIM DSE Scorecard for IT Staff

Data Center IT Staff
IT KRAs, each tracked through KPIs:
• IT monitoring
• IT asset management
• IT vendor/contract management
• IT hardware maintenance
• Business continuity
• Reporting

Monitoring & Provisioning KRAs & KPIs
KRA: IT monitoring & provisioning
DCIM provides real-time monitoring of the resource utilization of IT devices: server CPU, memory, storage and network bandwidth. Proactive monitoring raises alerts when thresholds are breached.
Monitoring KPIs: CPU utilization; memory utilization; power consumption; storage utilization versus free; server uptime versus target; failures prevented due to proactive monitoring; failures due to human error
Provisioning KPIs: time to harden a new server; time to provision a new device; time to provision new rack space; time to virtualize a new storage system; time to replace a legacy system; time to decommission a legacy system; time to install patches & updates
• Auto-provisioning of racks & devices
• Virtualization Planner: identifies servers that can be virtualized; also identifies under-utilized IT devices and recommends their retirement or replacement

IT Hardware Maintenance KRA & KPIs
KRA: IT hardware maintenance
DCIM helps schedule preventive maintenance (PM) based on:
• The age of a device as recorded in DCIM
• The utilization/load of the device as monitored by DCIM
DCIM also helps IT staff understand the cascading effect of a particular device's temporary unavailability (due to PM), so that prior notification can be sent.
Scheduled Maintenance KPIs: age of device; criticality of device based on utilization and application hosted; date of last check-up; date of last upgrade/nature of upgrade; asset replacement value; uptime % (of required time)
Breakdown Maintenance KPIs: failure rate; mean time between failures (MTBF); mean time to repair (MTTR); total maintenance cost; condition-based maintenance; spare parts used versus availability; immediate corrective maintenance time; total downtime related to maintenance

IT Asset Management KRA & KPIs
KRA: IT asset management
DCIM serves as enterprise asset management software for both IT & facilities:
• Auto-discovers intelligent assets and creates the asset database
• Helps manage IT asset relationships
• Maintains information about redundant assets in HA and DR setups
Asset Management KPIs: time taken to add or delete intelligent & non-intelligent assets; time taken to update records after a MAC (move/add/change); time taken to add interdependencies between assets; % accuracy of the asset database; % over- & under-provisioned

Vendor/Contract Management KRA & KPIs
KRA: Vendor/contract management
DCIM tracks support renewal dates, hardware vendors/suppliers and services providers.
Vendor Management KPIs: % of systems out of support (past renewal); % uptime by device category and vendor; % contractor compliance with SLA terms

Business Continuity KRA & KPIs
KRA: Business continuity
DCIM enables better impact analysis of outages and faster RCA of any incident, and thereby a faster turnaround time.
Business Continuity KPIs: recovery time objective (RTO); recovery point objective (RPO); actual versus RTO & RPO

DCIM: Helping IT Staff with Their Reporting KRA
KRA: Reporting
DCIM provides superior reporting on IT infrastructure availability, resource utilization and incident management.
[Chart: Trend Comparison for Multiple Servers]

DCIM DSE Scorecard for Data Center Manager

KRA: Data Center Manager
• Increase profitability by controlling data center cost
• Minimize DC failure and improve availability
• Improve operational efficiency and meet business SLA
• Data center capacity planning
• Adopt ‘Green’ practices for sustainable DC operations
• Reports & analytics

Data Center Manager Cost & Profitability KRA & KPIs
KRA: Increase profitability by controlling data center cost
DCIM:
• Control CapEx: repurpose under-utilized servers; discover stranded capacity & defer costly upgrades
• Reduce OpEx: reduce cooling costs; reduce server footprint

Data Center Manager Availability KRA & KPIs
KRA: Minimize DC failure and improve availability
DCIM:
• Ability to predict failures
• Better impact analysis in the event of a subsystem/component failure
• Faster RCA and turnaround time
Actuals: number of incidents/alarms; breakdown of alarms at each stage (logged, WIP and closed); major alarms by type (Facilities: Fire, Temp, …; IT: Server, Storage, Application…)
SLA Benchmarks: RTO; RPO

Data Center Manager Operational Efficiency vs. SLA
KRA: Improve operational efficiency and meet business SLA
DCIM automates critical data center processes such as asset management, capacity planning and provisioning, thereby minimizing human error, increasing accuracy and data integrity, and improving the operational efficiency of the data center.
Actuals: asset DB accuracy; time and cost to provision additional resources; availability by servers, storage and applications
SLA Benchmarks: watts per rack and watts per sq ft; PUE & CUE

Data Center Manager: Capacity Planning KRA & KPIs
KRA: Data center capacity planning
Planning & Forecasting KPIs: incidents due to capacity shortages; exactness of capacity forecast; capacity adjustments; unplanned capacity adjustments; resolution time of capacity shortage; capacity reserves; percentage of capacity monitoring; % reduction in panic buying; % reduction in lost business due to inadequate capacity; relative reduction in the cost of producing the capacity plan
DCIM:
• Monitor current capacity utilization
• Forecast future capacity requirements accurately
• Design and implement critical capacity efficiently, without under- or over-provisioning
Sources: 1. Clemson Computing & Information Technology; 2. IT Process Maps

Data Center Manager ‘Green Practices’ KRA & KPIs
KRA: Adopt ‘Green’ practices for sustainable DC operations
DCIM:
• Monitor energy consumption in the data center down to the lowest level
• Find ways to reduce energy consumption and improve efficiency
• Ensure that DC operations comply with the organization's sustainability goals

Data Center Manager Reporting KRA & KPIs
KRA: Reports & analytics
DCIM provides reports & analytics on:
• Uptime and availability
• Energy efficiency and health
• Data center costs and savings
• Capacity/resource utilization and availability
• Operational efficiency and SLA compliance

How Will the DSE Scorecard Help Data Center Operations?
Questions addressed by next-gen DCIM with a DSE scorecard:
• Are the data center infrastructure & capital costs aligned to process improvements?
• Have we been able to reduce infrastructure OpEx?
• Are we maintaining a risk-free data center infrastructure?
• Is the infrastructure delivering on technology innovation?
Link back to the organizational vision & strategy and to the BSC.

Thank You
Shekhar Dasgupta
[email protected]
Mobile: 408-431-1044