From the Server to the Intranet
Seeing the Forest AND the Trees: Capacity Planning for a Large Number of Servers
Linwood Merritt Capital One Services, Inc.
February 2004 page 1
Introduction: Environment
• Capital One
– 4th largest card issuer in the United States
– Capital One to S&P 500 in 1998
– Fortune 500 company starting in 2000
– Managed loans at $71.2 billion as of Q4 2003
– Accounts at 47.0 million as of Q4 2003
– CIO 100 Award “Master of the Customer Connection”
– Information Week “Innovation 100” Award Winner
– ComputerWorld “Top 100 places to work in IT”
Categories of Issues
• Acquiring and using business data
• Exception detection
• Platform types and operating systems
• Data capture analysis
• Organization and reporting of server structure
• Bulk Capacity Planning
• Business driver based forecasting
• Visualization techniques
Acquiring Business Data
• Chaney, Bob, “The Capacity Performance Council, Start Yours Today,” CMG1999 Proceedings
• Chaney, Bob, “Divide and Conquer: Implementing the Capacity Performance Council in Pieces,” CMG2001 Proceedings
• Merritt, Linwood, “A Capacity Planning Partnership with the Business,” CMG2002 Proceedings and UKCMG 2003
Business Data Inputs
• “Pull”
– Capacity impact forms
– Interview meetings
– Phone calls
– e-mails
• “Push”
– Capacity Councils
Capacity Councils
• Cross-organizational structure
• Purpose: Bring together business and technical views of applications and supporting technologies.
• Evolution: Single Capacity Council, multiple Capacity Councils along business lines, multiple Technical Councils along types of platforms (mainframe “MVS,” Unix, NT, etc.)
Capacity Council Deliverables
• Monthly meeting
• Evaluation of Business Area Capacity Status (“stoplight” green/yellow/red color)
• Business driver mapping to servers
• Monthly report
“Stoplight” Status
• Green: Little concern about hitting capacity constraints in the next 6 months
• Yellow: Concern about capacity in the next 4-6 months
• Red: Very high concern about not meeting capacity needs in the next three months
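The stoplight rule can be made mechanical when an estimate of months until a capacity constraint is available. A minimal sketch follows. The slide describes levels of concern and does not say how to treat estimates between three and four months, so the cutoffs and the function name `stoplight_status` are assumptions.

```python
def stoplight_status(months_to_constraint: float) -> str:
    """Map estimated months until a capacity constraint to a status color.

    Assumed cutoffs (not stated on the slide): under 4 months is red,
    4 through 6 months is yellow, anything beyond 6 months is green.
    """
    if months_to_constraint < 4:
        return "red"
    if months_to_constraint <= 6:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # Hypothetical estimates for three business areas.
    for months in (2, 5, 12):
        print(months, stoplight_status(months))
```

A Capacity Council would still review the result; the function only gives a consistent starting color.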
Capacity Planners’ Responsibilities
[Figure: assignment matrix of Capacity Planners (PlannerA through PlannerF) to Capacity Councils. Council labels shown: Mashies, Tee shot, Fairway shots, Pitch Mgmt, Foursome, Putt Mgmt, Clubhouse Mgmt, Chip Mgmt.]
Business Drivers
• Capacity Councils: Business units responsible for capacity planning of “demand” side
• Capacity Planners: Build “supply side” projections based on business drivers and historical trending
Business Driver Based Forecasting
• Map business drivers to servers.
• Use historical data to correlate.
• Use business driver projections to build forecast.
• Technical approaches (Spreadsheet and SAS)
– Exponential forecast comparison
– Multivariate regression
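The correlate-then-forecast steps can be sketched in a few lines. This is an illustration under assumptions, not the presenter's spreadsheet or SAS code: the data values are invented, and `pearson_r` and `linear_forecast` are hypothetical helper names. `linear_forecast` takes its arguments in the same order as the spreadsheet `FORECAST(x, known_y's, known_x's)` function.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def linear_forecast(x_new, known_ys, known_xs):
    """Least-squares line through the known points, evaluated at x_new.

    Argument order mirrors the spreadsheet FORECAST(x, known_y's, known_x's).
    """
    mx, my = mean(known_xs), mean(known_ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mx) ** 2 for x in known_xs)
    return my + (sxy / sxx) * (x_new - mx)


if __name__ == "__main__":
    # Hypothetical history: monthly business driver volume and measured CPU.
    driver = [100, 120, 140, 160]
    cpu = [20.0, 24.0, 28.0, 32.0]
    print("correlation:", pearson_r(driver, cpu))
    print("CPU forecast at driver=200:", linear_forecast(200, cpu, driver))
```

A weak correlation suggests the driver is mapped to the wrong server or that another driver dominates, so it is worth checking before trusting the forecast.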
Business Driver and Resource Regression
[Chart: “Business Driver Regression.” X axis: Month, Jan-01 through Nov-03. Two value axes: 0 to 30,000,000 and 0 to 8,000. Series: BusDriver, Actual, Forecast.]
Forecast formula: =FORECAST(Cx,$B$3:$B$26,$C$3:$C$26)
Combination of Business Driver and Resource Date-Based Projections
[Chart: “Business Driver Correlation.” X axis: Date, Jan-01 through Jul-03. Two value axes: 0 to 30,000,000 and 0 to 8,000. Series: BusDriverTrend, BusDriverInput, SpecTrend, Forecast, Actual.]
Forecast Using Date and Business Driver
=FORECAST(A27,$B$7:$B$26,$A$7:$A$26)*C27/FORECAST(A27,$C$7:$C$26,$A$7:$A$26)

Row  A (Month)  B (Actual)  C (BusDriver)  D (Forecast)
23   Sep-02     6049        23133576       6350
24   Oct-02     6310        23421420       6371
25   Nov-02     6195        23645813       6373
26   Dec-02     6406        23829750       6368
27   Jan-03                 23966066       6348
28   Feb-03                 24246086       6368
29   Mar-03                 24837570       6475
30   Apr-03                 25144463       6501
31   May-03                 25418544       6522
32   Jun-03                 25946250       6605
33   Jul-03                 26254497       6635
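The spreadsheet formula scales the date-based trend of the resource by the ratio of the projected business driver to the driver's own date-based trend. A minimal sketch of the same arithmetic, with invented data and hypothetical function names:

```python
from statistics import mean


def linear_forecast(x_new, known_ys, known_xs):
    """Least-squares line through the known points, evaluated at x_new."""
    mx, my = mean(known_xs), mean(known_ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mx) ** 2 for x in known_xs)
    return my + (sxy / sxx) * (x_new - mx)


def combined_forecast(date_new, driver_new, dates, actuals, drivers):
    """Resource trend at date_new, scaled by driver input over driver trend.

    Mirrors FORECAST(date, actuals, dates) * driver / FORECAST(date, drivers, dates).
    """
    resource_trend = linear_forecast(date_new, actuals, dates)
    driver_trend = linear_forecast(date_new, drivers, dates)
    return resource_trend * driver_new / driver_trend


if __name__ == "__main__":
    # Hypothetical history; dates are month indexes rather than calendar dates.
    dates = [0, 1, 2, 3]
    actuals = [10.0, 12.0, 14.0, 16.0]
    drivers = [100.0, 110.0, 120.0, 130.0]
    # The business projects a driver value above its historical trend.
    print(combined_forecast(4, 154.0, dates, actuals, drivers))
```

When the projected driver equals its own trend, the result reduces to the plain date-based forecast.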
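SAS Regression
/* Fit CPU against the business drivers, one model per machine and shift. */
proc reg noprint data=forecast outest=regdata tableout;
   by machine shift;
   /* %CtReg supplies the regressor list; noint fits without an intercept. */
   model cpureg = %CtReg / selection=rsquare noint;
run;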
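For readers without SAS, the core of a multivariate regression without an intercept can be sketched in plain Python. This is an assumption-laden illustration: it fits a single model by solving the normal equations and does not reproduce the subset ranking done by `selection=rsquare`. The function name and data are hypothetical.

```python
def fit_no_intercept(rows, ys):
    """Least-squares coefficients for y = b1*x1 + ... + bk*xk (no intercept).

    rows: one list of driver values per observation; ys: observed CPU values.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for idx in range(k):
            if idx != col:
                factor = aug[idx][col] / aug[col][col]
                aug[idx] = [a - factor * b for a, b in zip(aug[idx], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical observations of two business drivers and resulting CPU.
    drivers = [[1, 2], [2, 1], [3, 3], [4, 1]]
    cpu = [8, 7, 15, 11]
    print(fit_no_intercept(drivers, cpu))
```

The coefficients can then be multiplied by projected driver values to build the forecast.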
Actual vs. Projected
Individual Actual vs. Projected Analysis
[Table: forecast error by server and application. Servers listed: ServA, ServB, ServC, ServD, ServD, ServE. Applications listed: Bus1, Bus2, Bus3, Bus4, Bus5, Bus7. Error values listed: 0.343, 0.585, 0.343, 0.233, 0.454, 0.596, 0.108. Count = 232; Avg Error = 0.25439.]
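A per-server error summary of this kind can be sketched as follows. The slide does not define its error measure, so absolute relative error is an assumption here, as are the function names and sample values.

```python
def relative_error(actual, projected):
    """Absolute relative error of a projection (assumed error measure)."""
    return abs(projected - actual) / actual


def summarize_errors(pairs):
    """Return (per-server errors, count, average error).

    pairs maps a server name to an (actual, projected) tuple.
    """
    errors = {name: relative_error(a, p) for name, (a, p) in pairs.items()}
    count = len(errors)
    average = sum(errors.values()) / count
    return errors, count, average


if __name__ == "__main__":
    # Hypothetical actual and projected CPU utilization for two servers.
    sample = {"ServA": (50.0, 60.0), "ServB": (40.0, 30.0)}
    print(summarize_errors(sample))
```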
Actual vs. Projected Graphical Analysis
Aggregate Actual vs. Projected (Relative)
Aggregate Actual vs. Projected (Average Actual vs. Average Projected)
Aggregate Actual vs. Projected (Absolute)
Exception Detection
• Statistical Analysis (standard deviations from mean)
• Exception Detection System developed by Igor Trubin, Ph.D., of Capital One
– “Exception Detection System, Based on the Statistical Process Control Concept,” CMG2001
– “Global and Application Level Exception Detection System, Based on MASF Technique,” CMG2002 and UKCMG 2003
• Reporting: E-mail, web reports
Statistical Process Control
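The basic idea of flagging values that fall several standard deviations from the mean can be sketched as below. This is a plain mean-and-sigma check, not the Exception Detection System or the MASF technique cited above. The three-sigma default, the minimum sample count, and all names are assumptions.

```python
from statistics import mean, stdev


def control_limits(history, sigmas=3.0):
    """Lower and upper control limits: mean minus/plus sigmas * std deviation."""
    center = mean(history)
    spread = stdev(history)
    return center - sigmas * spread, center + sigmas * spread


def find_exceptions(history_by_server, latest_by_server, sigmas=3.0, min_samples=5):
    """Return (servers outside their limits, servers with insufficient data)."""
    exceptions, insufficient = [], []
    for server, latest in latest_by_server.items():
        history = history_by_server.get(server, [])
        if len(history) < min_samples:
            insufficient.append(server)
            continue
        low, high = control_limits(history, sigmas)
        if latest < low or latest > high:
            exceptions.append(server)
    return exceptions, insufficient


if __name__ == "__main__":
    # Hypothetical daily CPU utilization history and latest readings.
    history = {
        "ServerA": [40, 42, 41, 39, 43, 40],
        "ServerB": [50, 52, 51, 49, 50, 51],
        "ServerG": [30, 31],
    }
    latest = {"ServerA": 75, "ServerB": 51, "ServerG": 33}
    print(find_exceptions(history, latest))
```

The two returned lists correspond to the "exception" and "insufficient DATA" sections of a daily report.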
Exception Detection E-Mail
Exception Detection Report for 05/02
_____________________________________________________
CPU_Utilization exception unix/unisys/tandem/MVS 5 boxes list:
ServerA ServerC ServerE ServerL ServerZ
_____________________________________________________
CPU_Utilization NULL DATA unix/unisys/tandem/MVS 0 boxes list:
_____________________________________________________
CPU_Utilization insufficient DATA unix/unisys/tandem/MVS 1 boxes list:
ServerG
===============================================
CPU utilization was greater than 50% yesterday for:
ServerA ServerD
Exception Reporting
Dept A Daily Exceptions
CPU: ServerA, ServerB, ServerI, ServerJ
CPU Queue: ServerD
Memory: ServerB, ServerF
Insufficient Data: ServerE, ServerG
Disk Busy: ServerC
Disk I/O: ServerC
Platform Types
• “MVS” mainframe
• “Flavors” of Unix
• NT servers
• Non-“standard” such as Unisys, Tandem, HP3000, native commands, etc.
• Different data formats and locations
Level of Detail Mainframe
• Global (by Sysplex)
• Partition (LPAR)
• Service Class
• SMF (use job names and account codes)
• Hardware may be shared among workloads and business units.
Level of Detail Distributed
• Global (by server)
• Application or Workload
– Assigned within data collection product
– Assigned within Capacity/Performance database code
– Processes by descending CPU%
• Overall view of utilization (for consolidation opportunities)
Integrated Products
• Products with integrated reporting capabilities
– Extract data, port to the existing Performance Database system.
– Interface directly with the product databases (e.g. with SAS ODBC).
– Link web HTML pages to product graphs.
Multiple Platform Types
[Diagram: data flow among platform types. Components shown: native commands (5 or 10 min samples, hourly CPU, hourly summaries, trace if available); remote servers with product databases and product extracts; non-standard platforms; other remote platforms; sequential files transferred by ftp and NRJE; network ftp server; “MVS” mainframe with Capacity Bridge and Capacity Programs; mainframe data; SNMP performance data; LAN/WAN; web files (HTML and graphics files); workstation graphics; reports.]
Data Capture Analysis
• Operational side to Capacity Planning: Automation of data collection, performance database population, and report creation
• Automated process to check the successful and timely completion of each step of the process
• Included in exception analysis mechanism
• Ongoing tuning effort as complexity and volume increases
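A completion check of this kind can be as simple as comparing each step's finish time against a deadline. The sketch below assumes the finish times have already been gathered (for example from file timestamps or job logs); the step names, the deadline, and the function name are hypothetical.

```python
from datetime import datetime


def late_or_missing_steps(completions, deadline):
    """Return (step, problem) pairs for steps that are missing or late.

    completions maps a step name to its completion datetime, or None if
    the step has not finished.
    """
    problems = []
    for step, finished in completions.items():
        if finished is None:
            problems.append((step, "missing"))
        elif finished > deadline:
            problems.append((step, "late"))
    return problems


if __name__ == "__main__":
    # Hypothetical nightly run with a 06:00 deadline.
    deadline = datetime(2004, 2, 2, 6, 0)
    completions = {
        "data collection": datetime(2004, 2, 2, 1, 30),
        "database load": datetime(2004, 2, 2, 7, 15),
        "report creation": None,
    }
    for step, problem in late_or_missing_steps(completions, deadline):
        print(f"{step}: {problem}")
```

The resulting list can feed the same e-mail and web reporting used for utilization exceptions.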
Organization and Reporting of Server Structure
• Database of server characteristics and assignments
– Business unit classifications
– Applications
– Capacity Planner assignments
– Configuration details
– Status color codes
Server Database
Dept   Application  platform  Database  Server   Planner   Status_Color  Manufacturer  CPUs  Model_Number           TPC-C
DeptA  App1         NT                  ServerA  Joe Blow  1             Compaq        1     Proliant 3000          8049.6
DeptA  App1         Unix                ServerB  Joe Blow  1             Sun           64    Ultra SPARC II 400MHz  156873
DeptB  App2         Unix      DB1       ServerC  Moe Toe   3             HP            2     rp7400                 26895
DeptB  App2         Unix      DB2       ServerC  Moe Toe   3             HP            2     rp7400                 26895
DeptC  App3         Unix                ServerD  Flo Doe   2             HP            2     rp7400                 26895
DeptD  App4         Unix                ServerE  Flo Doe   2             HP            2     rp7400                 26895
Use of Server Database
• Central repository of capacity information
• Data source to build browser pages on the company’s Intranet
– Color-coded view of servers by business area and application.
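Building a color-coded browser page from server database rows can be sketched as follows. The mapping of numeric status codes to colors, the field names, and the function name are assumptions made for illustration; the presentation does not show the actual page-building code.

```python
from html import escape

# Assumed mapping of numeric status codes to stoplight colors.
STATUS_COLORS = {1: "green", 2: "yellow", 3: "red"}


def server_table_html(servers):
    """Render server rows as an HTML table with color-coded server cells."""
    header = "<tr><th>Dept</th><th>Application</th><th>Server</th></tr>"
    rows = []
    for s in servers:
        color = STATUS_COLORS.get(s["status_color"], "white")
        rows.append(
            f'<tr><td>{escape(s["dept"])}</td><td>{escape(s["application"])}</td>'
            f'<td style="background-color:{color}">{escape(s["server"])}</td></tr>'
        )
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"


if __name__ == "__main__":
    # Hypothetical rows in the shape of the server database.
    servers = [
        {"dept": "DeptA", "application": "App1", "server": "ServerA", "status_color": 1},
        {"dept": "DeptB", "application": "App2", "server": "ServerC", "status_color": 3},
    ]
    print(server_table_html(servers))
```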
Matrix-Based Reporting
Application Mapping and Color Coding
[Figure: color-coded matrix mapping business areas (BusArea1 through BusArea3) to applications (ApplicationA through ApplicationH), databases (DB_1 through DB_7), and servers (ServerA through ServerJ, OS390A, OS390B).]
Bulk Capacity Planning
• Analyze large number of servers in a single pass.
• Import measured and trended CPU utilization of each server.
• Assign servers to business areas.
• Allow assignment of business drivers (with growth rates) to each server.
• Allow assignment of upgrade thresholds to each server.
“Bulk Capacity Planning” Projections
• For each server, calculate the month where projected CPU utilization crosses the upgrade threshold, for three growth rates.
• Use “conditional formatting” to flag server upgrade dates as red or yellow if the date is of concern.
Calculation of #Years Before Upgrade
Threshold = Base * (1 + AnnualGrowth)^(#Years)
Log(Threshold) = Log(Base) + (#Years) * Log(1 + AnnualGrowth)
(#Years) = ( Log(Threshold) - Log(Base) ) / Log(1 + AnnualGrowth)
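The calculation translates directly into code. In this sketch the function names are hypothetical, and rounding up to the first whole month at or above the threshold is an assumption about how the upgrade month is chosen.

```python
import math
from datetime import date


def years_before_upgrade(base, threshold, annual_growth):
    """Years until base utilization, compounding annually, reaches threshold."""
    return (math.log(threshold) - math.log(base)) / math.log(1 + annual_growth)


def upgrade_month(base_date, base, threshold, annual_growth):
    """First month (as a date on day 1) at or above the upgrade threshold."""
    months = math.ceil(years_before_upgrade(base, threshold, annual_growth) * 12)
    index = base_date.year * 12 + (base_date.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


if __name__ == "__main__":
    # Hypothetical server: 31.6% CPU in Oct-01, 75% threshold, 15% annual growth.
    print(years_before_upgrade(0.316, 0.75, 0.15))
    print(upgrade_month(date(2001, 10, 1), 0.316, 0.75, 0.15))
```

Running the same function with three growth rates per server gives the three upgrade dates that the bulk spreadsheet flags.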
Bulk Capacity Planning Spreadsheet
[Table: one row per server, grouped by business unit (Dept A through Dept D; ServerA through ServerI). Columns: Business Unit, Server, Upgrade%, Base Upgd, Scen Upgd, Trend Upgd, Notes, Business Driver, Base Time, Base CPU%. Every server has an Upgrade% of 75.0% and a Base Time of Oct-01; Base CPU% ranges from 19.0% to 65.0%; business drivers are Widgets, Gadgets, and Things; one note reads “Not impacted by Scenario.” First row: Dept A, ServerA, 75.0%, Base Upgd Nov-07, Scen Upgd Oct-06, Trend Upgd Jan-05, Widgets, Oct-01, 31.6%.]
Visualization Techniques
• Different views of the same data
• Web-based (HTML and Java) reports
• “Stoplight” (green/yellow/red) coded status
• Overlay presentation of trends and forecasts
• “Thumbnail” charts with drilldown capabilities
Visualization Techniques (Continued)
• Automatic generation of static HTML
• Dynamic HTML (CGI bin, web portal)
• Representation of servers as color-coded rectangles on a single page, where the area of each rectangle represents its capacity rating.
Different Views of Same Data
• Servers can appear more than once (multiple applications and assignments).
• “Production” vs. “All”
• “All Departments” (no duplicates) vs. “By Department”
HTML with Hyperlinks
Table of Contents
Storage Analysis
Presented below are workload-based DASD space projections. Additional detail can be found in the DASD Workload Profile.
Figure 4 - DASD Space Analysis by Workload
Web Graphs from SAS
TITLE2 F=SIMPLEX C=RED J=C H=1.3 "ServerA CPU BY DAY";
/* Daily CPU bars subgrouped by workload, stored in a graphics catalog. */
PROC GCHART GOUT=GOUT.DATE3;
   WHERE ( SYSTEM =: 'AVG_ServerA' );
   VBAR DATE / SUMVAR=CPU SPACE=0 SUBGROUP=WKLD DISCRETE
               CAXIS=BLUE CTEXT=RED NAME="DATE" DESCRIPTION="DATE GRAPH";
RUN;
/****************************************/
/* Replay a stored graph through the GIF driver for use on a web page. */
GOPTIONS DEVICE=GIF GSFNAME=GIFOUT GPROTOCOL=SASGPASC CBACK=BWH BORDER
         HSIZE=6 VSIZE=4 GSFMODE=REPLACE GSFLEN=128;
PROC GREPLAY IGOUT=GOUT.DATE2 NOFS;
   REPLAY 1;
QUIT;
Indexed Report
Anchored Links
Web Page 1
Business Area  Application  Server   CPU
BusA           APPA         ServerA  Month
               APPB         ServerB  Month

Web Page 2
Server name  Sub Business  Config                   Perform Charts
ServerA      SubA          Solaris7/Sun/E420/4/4/P  BMC MW
ServerB      SubA          Solaris7/Sun/E420/4/4/D  BMC MW
Thumbnail Graphs
Automatic Generation of HTML
• Driven by server database
• SAS or Visual Basic code - builds web pages and hyperlinks
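The presentation names SAS or Visual Basic for this step; the sketch below uses Python purely as an illustration of the idea. It builds an index page whose links point at per-server anchors on a second page. File names, function names, and the sample rows are hypothetical.

```python
from html import escape


def index_page(server_names, detail_file="servers.html"):
    """Page 1: one hyperlink per server, pointing at its anchor on page 2."""
    items = [
        f'<li><a href="{detail_file}#{escape(name)}">{escape(name)}</a></li>'
        for name in server_names
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def detail_page(configs):
    """Page 2: one anchored section per server; configs maps name to config."""
    sections = [
        f'<h2 id="{escape(name)}">{escape(name)}</h2>\n<p>{escape(config)}</p>'
        for name, config in configs.items()
    ]
    return "\n".join(sections)


if __name__ == "__main__":
    # Hypothetical rows taken from a server database.
    configs = {
        "ServerA": "Solaris7/Sun/E420/4/4/P",
        "ServerB": "Solaris7/Sun/E420/4/4/D",
    }
    print(index_page(configs))
    print(detail_page(configs))
```

Because both pages are generated from the same rows, links and anchors stay in step as servers are added or removed.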
Color-Coded Rectangles
“Treemap” paper by Ben Shneiderman, University of Maryland, http://www.cs.umd.edu/hcil/treemaps
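The layout idea, rectangles whose areas are proportional to capacity ratings, can be sketched with a single-level slice of one rectangle. This is only the simplest building block of a treemap, not the full algorithm from the cited paper; the function name and sample ratings are assumptions.

```python
def slice_layout(items, x, y, width, height):
    """Split a rectangle into strips whose areas are proportional to capacity.

    items: list of (name, capacity) pairs.
    Returns a list of (name, x, y, width, height) tuples. The split runs
    along the longer side of the enclosing rectangle.
    """
    total = sum(capacity for _, capacity in items)
    rects = []
    offset = 0.0
    for name, capacity in items:
        share = capacity / total
        if width >= height:
            rects.append((name, x + offset, y, width * share, height))
            offset += width * share
        else:
            rects.append((name, x, y + offset, width, height * share))
            offset += height * share
    return rects


if __name__ == "__main__":
    # Hypothetical capacity ratings for three servers on an 800 x 400 page.
    servers = [("ServerA", 8049.6), ("ServerB", 156873), ("ServerC", 26895)]
    for rect in slice_layout(servers, 0, 0, 800, 400):
        print(rect)
```

Each rectangle would then be filled with the server's status color, and applying the same split recursively within a business area's rectangle gives a nested view.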
Server Utilization Reporting (Maximum Hour)
Underutilization Analysis by Business Area
Maximum Hourly CPU Utilization
Summary Issues
• Capacity Planning becomes more complicated as the number of servers grows.
• Amount of effort does not grow linearly.
• Analyze and track many more servers at a time within a given business context.
Summary Issues (Continued)
• Processing and reporting operations
• Connection to the business increasingly important and more complicated as applications cross business units and servers
• Identification of trends and exceptions more difficult
• Reporting
Summary Recommendations
• One or more Capacity Councils
• Automation of exception detection, operations monitoring, and business driver-based forecasting
• A database of relevant server information
• Redesigned approaches for processing, analyzing, and reporting data for large numbers of servers
Thanks!
Linwood Merritt Technical Delivery Capacity Planning Capital One Services, Inc.