From the Server to the Intranet


Seeing the Forest AND the Trees: Capacity Planning for a Large Number of Servers

Linwood Merritt Capital One Services, Inc.

[email protected]

February 2004 page 1

Introduction: Environment
• Capital One
  – 4th largest card issuer in the United States
  – Added to the S&P 500 in 1998
  – Fortune 500 company starting in 2000
  – Managed loans at $71.2 billion as of Q4 2003
  – Accounts at 47.0 million as of Q4 2003
  – CIO 100 Award “Master of the Customer Connection”
  – Information Week “Innovation 100” Award Winner
  – ComputerWorld “Top 100 places to work in IT”

Categories of Issues
• Acquiring and using business data
• Exception detection
• Platform types and operating systems
• Data capture analysis
• Organization and reporting of server structure
• Bulk Capacity Planning
• Business driver based forecasting
• Visualization techniques

Acquiring Business Data
• Chaney, Bob, “The Capacity Performance Council, Start Yours Today,” CMG1999 Proceedings
• Chaney, Bob, “Divide and Conquer: Implementing the Capacity Performance Council in Pieces,” CMG2001 Proceedings
• Merritt, Linwood, “A Capacity Planning Partnership with the Business,” CMG2002 Proceedings and UKCMG 2003

Business Data Inputs
• “Pull”
  – Capacity impact forms
  – Interview meetings
  – Phone calls
  – e-mails
• “Push”
  – Capacity Councils

Capacity Councils
• Cross-organizational structure
• Purpose: Bring together business and technical views of applications and supporting technologies.
• Evolution: Single Capacity Council, multiple Capacity Councils along business lines, multiple Technical Councils along types of platforms (mainframe “MVS,” Unix, NT, etc.)

Capacity Council Deliverables
• Monthly meeting
• Evaluation of Business Area Capacity Status (“stoplight” green/yellow/red color)
• Business driver mapping to servers
• Monthly report

“Stoplight” Status
• Green: Little concern about hitting capacity constraints in the next 6 months
• Yellow: Concern about capacity in the next 4-6 months
• Red: Very high concern about not meeting capacity needs in the next three months

Capacity Planners’ Responsibilities

[Figure: matrix of Capacity Planners (PlannerA, PlannerB, PlannerC, PlannerD, PlannerE, PlannerF) against Capacity Councils (Mashies, Tee shot, Fairway shots, Pitch Mgmt, Foursome, Putt Mgmt, Clubhouse Mgmt, Chip Mgmt)]

Business Drivers
• Capacity Councils: Business units responsible for capacity planning of “demand” side
• Capacity Planners: Build “supply side” projections based on business drivers and historical trending

Business Driver Based Forecasting • Map business drivers to servers.

• Use historical data to correlate.

• Use business driver projections to build forecast.

• Technical approaches (Spreadsheet and SAS)
  – Exponential forecast comparison
  – Multivariate regression

Business Driver and Resource Regression

[Chart: “Business Driver Regression” by Month, Jan-01 through Nov-03. Axis scales: 0 to 30,000,000 and 0 to 8,000. Series: BusDriver, Actual, Forecast]

=FORECAST(Cx,$B$3:$B$26,$C$3:$C$26)
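The spreadsheet FORECAST function fits an ordinary least-squares line through the known (x, y) pairs and evaluates it at a new x. A minimal Python sketch of that calculation follows. The function name `forecast` and the sample numbers are illustrative assumptions, not data from the presentation.

```python
def forecast(x, known_ys, known_xs):
    """Least-squares linear prediction at x.

    Argument order mirrors the spreadsheet FORECAST(x, known_y's, known_x's).
    """
    n = len(known_xs)
    if n != len(known_ys) or n < 2:
        raise ValueError("need at least two matching (x, y) pairs")
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    if sxx == 0:
        raise ValueError("known_xs must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x


# Hypothetical history: business driver volume (x) and a CPU measure (y).
drivers = [10.0, 20.0, 30.0, 40.0]
cpu = [25.0, 45.0, 65.0, 85.0]

# Predict the CPU measure for a projected driver volume of 50.
print(forecast(50.0, cpu, drivers))
```

With this made-up history the points lie exactly on y = 5 + 2x, so the projection for a driver value of 50 is 105.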

Combination of Business Driver and Resource Date-Based Projections

[Chart: “Business Driver Correlation” by Date, Jan-01 through Jul-03. Axis scales: 0 to 30,000,000 and 0 to 8,000. Series: BusDriverTrend, BusDriverInput, SpecTrend, Forecast, Actual]

Forecast Using Date and Business Driver

=FORECAST(A27,$B$7:$B$26,$A$7:$A$26)*C27/FORECAST(A27,$C$7:$C$26,$A$7:$A$26)

Row  A (Month)  B (Actual)  C (BusDriver)  D (Forecast)
23   Sep-02     6049        23133576       6350
24   Oct-02     6310        23421420       6371
25   Nov-02     6195        23645813       6373
26   Dec-02     6406        23829750       6368
27   Jan-03                 23966066       6348
28   Feb-03                 24246086       6368
29   Mar-03                 24837570       6475
30   Apr-03                 25144463       6501
31   May-03                 25418544       6522
32   Jun-03                 25946250       6605
33   Jul-03                 26254497       6635
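The spreadsheet formula above takes the date-based trend of the resource and scales it by the ratio of the supplied business driver value to the date-based trend of the business driver. The Python sketch below restates that calculation. The function names and the sample history are assumptions made for illustration.

```python
def linear_forecast(x, known_ys, known_xs):
    """Least-squares line through (known_xs, known_ys), evaluated at x."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(known_xs, known_ys))
    slope = sxy / sxx
    return mean_y + slope * (x - mean_x)


def driver_adjusted_forecast(period, periods, actuals, drivers, driver_input):
    """Date trend of the resource, scaled by driver input / driver date trend."""
    resource_trend = linear_forecast(period, actuals, periods)
    driver_trend = linear_forecast(period, drivers, periods)
    return resource_trend * driver_input / driver_trend


# Hypothetical history for periods 1..4.
periods = [1, 2, 3, 4]
actuals = [100.0, 110.0, 120.0, 130.0]      # resource measure
drivers = [1000.0, 1100.0, 1200.0, 1300.0]  # business driver volume

# The business projects 1540 units for period 5, 10% above the driver trend.
result = driver_adjusted_forecast(5, periods, actuals, drivers, 1540.0)
print(result)
```

In this example the resource trend for period 5 is 140 and the driver trend is 1400, so a driver input 10% above trend raises the forecast by 10% as well, to 154. When the driver input equals its own trend, the result reduces to the plain date-based projection.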

SAS Regression

proc reg noprint data=forecast outest=regdata tableout;
  by machine shift;
  model cpureg = %CtReg / selection=rsquare noint;
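The PROC REG step above regresses CPU on a set of regressors without an intercept (the noint option). As a rough illustration of what a no-intercept multivariate fit computes, the Python sketch below solves the normal equations directly. It does not reproduce the selection=rsquare subset search or the by-group processing, and the function name and data are assumptions.

```python
def fit_no_intercept(rows, ys):
    """Least-squares coefficients for y = b1*x1 + ... + bk*xk (no intercept).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical data: two business drivers per observation, and CPU
# constructed as 2*driver1 + 3*driver2 so the fit can be checked by eye.
observations = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 1.0]]
cpu = [8.0, 7.0, 15.0, 11.0]

coefficients = fit_no_intercept(observations, cpu)
print(coefficients)
```

Because the sample CPU values were built from known weights, the fitted coefficients come out at approximately 2 and 3.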

Actual vs. Projected

Individual Actual vs. Projected Analysis

[Table: Server (ServA, ServB, ServC, ServD, ServD, ServE) and Application (Bus1, Bus2, Bus3, Bus4, Bus5, Bus7), with error values 0.343, 0.585, 0.343, 0.233, 0.454, 0.596, 0.108. Count = 232, Avg Error = 0.25439]

Actual vs. Projected Graphical Analysis

Aggregate Actual vs. Projected (Relative)

Aggregate Actual vs. Projected (Average Actual vs. Average Projected)

Aggregate Actual vs. Projected (Absolute)

Exception Detection
• Statistical Analysis (standard deviations from mean)
• Exception Detection System developed by Igor Trubin, Ph.D., of Capital One
  – “Exception Detection System, Based on the Statistical Process Control Concept,” CMG2001
  – “Global and Application Level Exception Detection System, Based on MASF Technique,” CMG2002 and UKCMG 2003
• Reporting: E-mail, web reports
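The statistical approach named above, flagging values that fall some number of standard deviations from the mean, can be sketched in a few lines of Python. This is a simplified illustration and not the Exception Detection System described in the cited papers. The three-sigma limit, the function name, and the sample data are assumptions.

```python
from statistics import mean, stdev


def find_exceptions(history, latest, sigmas=3.0):
    """Return servers whose latest value lies outside mean +/- sigmas * stdev."""
    flagged = []
    for server, values in history.items():
        if server not in latest or len(values) < 2:
            continue  # too little data to set control limits
        center = mean(values)
        spread = stdev(values)
        upper = center + sigmas * spread
        lower = center - sigmas * spread
        if latest[server] > upper or latest[server] < lower:
            flagged.append(server)
    return sorted(flagged)


# Hypothetical reference period of CPU utilization (%) per server.
history = {
    "ServerA": [40, 42, 41, 39, 43],
    "ServerB": [60, 61, 59, 62, 58],
}
# Hypothetical measurements for the day being checked.
latest = {"ServerA": 75, "ServerB": 60}

exceptions = find_exceptions(history, latest)
print(exceptions)
```

Here ServerA is reported because 75% is far above its usual range, while ServerB stays within its limits. A fuller version would keep separate limits per hour of the week so that normal daily cycles are not flagged.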

Statistical Process Control

Exception Detection E-Mail

Exception Detection Report for 05/02
_____________________________________________________
CPU_Utilization exception unix/unisys/tandem/MVS
5 boxes list: ServerA ServerC ServerE ServerL ServerZ
_____________________________________________________
CPU_Utilization NULL DATA unix/unisys/tandem/MVS
0 boxes list:
_____________________________________________________
CPU_Utilization insufficient DATA unix/unisys/tandem/MVS
1 boxes list: ServerG
===============================================
CPU utilization was greater than 50% yesterday for: ServerA ServerD

Exception Reporting

Dept A Daily Exceptions
CPU: ServerA ServerB ServerI ServerJ
CPU Queue: ServerD
Memory: ServerB ServerF
Insufficient Data: ServerE ServerG
Disk Busy: ServerC
Disk I/O: ServerC

Platform Types
• “MVS” mainframe
• “Flavors” of Unix
• NT servers
• Non-“standard” such as Unisys, Tandem, HP3000, native commands, etc.
• Different data formats and locations

Level of Detail: Mainframe
• Global (by Sysplex)
• Partition (LPAR)
• Service Class
• SMF (use job names and account codes)
• Hardware may be shared among workloads and business units.

Level of Detail: Distributed
• Global (by server)
• Application or Workload
  – Assigned within data collection product
  – Assigned within Capacity/Performance database code
  – Processes by descending CPU%
• Overall view of utilization (for consolidation opportunities)

Integrated Products
• Products with integrated reporting capabilities
  – Extract data, port to the existing Performance Database system.
  – Interface directly with the product databases (e.g. with SAS ODBC).
  – Link web HTML pages to product graphs.

Multiple Platform Types

[Diagram: data collection flow across platform types. Sources: native commands (5 or 10 minute samples, hourly CPU, hourly summaries, trace if available), product databases and product extracts, a remote server with product, non-standard platforms, other remote platforms (NRJE), SNMP performance data, and “MVS” mainframe data. Sequential files move by ftp over the LAN/WAN to a network server and, through a Capacity Bridge, to the capacity programs. Outputs: web files (HTML and graphics), workstation graphics, and reports]

Data Capture Analysis
• Operational side to Capacity Planning: Automation of data collection, performance database population, and report creation
• Automated process to check the successful and timely completion of each step of the process
• Included in exception analysis mechanism
• Ongoing tuning effort as complexity and volume increases
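One way to automate the completion check described above is to compare each step's recorded finish time against a deadline and report anything late or missing. The Python sketch below assumes hypothetical step names, deadlines, and completion times, and the presentation does not describe how its own checker was built.

```python
from datetime import datetime


def check_steps(deadlines, completions):
    """Classify each expected step as 'ok', 'late', or 'missing'."""
    status = {}
    for step, deadline in deadlines.items():
        finished = completions.get(step)
        if finished is None:
            status[step] = "missing"
        elif finished > deadline:
            status[step] = "late"
        else:
            status[step] = "ok"
    return status


# Hypothetical nightly schedule: step name -> latest acceptable finish time.
deadlines = {
    "collect": datetime(2004, 2, 2, 1, 0),
    "load_pdb": datetime(2004, 2, 2, 3, 0),
    "build_reports": datetime(2004, 2, 2, 5, 0),
}
# Hypothetical observed finish times; build_reports never ran.
completions = {
    "collect": datetime(2004, 2, 2, 0, 45),
    "load_pdb": datetime(2004, 2, 2, 3, 20),
}

status = check_steps(deadlines, completions)
problems = {step: s for step, s in status.items() if s != "ok"}
print(problems)
```

The `problems` dictionary can then feed the same e-mail or web report used for utilization exceptions, so that a failed collection is not mistaken for a quiet server.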

Organization and Reporting of Server Structure
• Database of server characteristics and assignments
  – Business unit classifications
  – Applications
  – Capacity Planner assignments
  – Configuration details
  – Status color codes

Server Database

Dept   Application  platform  Server   Planner   Status_Color  Manufacturer  CPUs  Model_Number            TPC-C
DeptA  App1         NT        ServerA  Joe Blow  1             Compaq        1     Proliant 3000           8049.6
DeptA  App1         Unix      ServerB  Joe Blow  1             Sun           64    Ultra SPARC II 400MHz   156873
DeptB  App2         Unix      ServerC  Moe Toe   3             HP            2     rp7400                  26895
DeptB  App2         Unix      ServerC  Moe Toe   3             HP            2     rp7400                  26895
DeptC  App3         Unix      ServerD  Flo Doe   2             HP            2     rp7400                  26895
DeptD  App4         Unix      ServerE  Flo Doe   2             HP            2     rp7400                  26895

Database column values: DB1, DB2

Use of Server Database
• Central repository of capacity information
• Data source to build browser pages on the company’s Intranet
  – Color-coded view of servers by business area and application.

Matrix-Based Reporting

Application Mapping and Color Coding

[Matrix: business areas (BusArea1, BusArea2, BusArea3) and applications (ApplicationA through ApplicationH) mapped to databases (DB_1 through DB_7) and servers (ServerA through ServerJ, OS390A, OS390B)]

Bulk Capacity Planning • Analyze large number of servers in a single pass.

• Import measured and trended CPU utilization of each server.

• Assign servers to business areas.

• Allow assignment of business drivers (with growth rates) to each server.

• Allow assignment of upgrade thresholds to each server.

“Bulk Capacity Planning” Projections • For each server, calculate the month where projected CPU utilization crosses the upgrade threshold, for three growth rates.

• Use “conditional formatting” to flag server upgrade dates as red or yellow if the date is of concern.
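The conditional-formatting rule can be expressed as a small function that turns a projected upgrade month into a color. In the Python sketch below, the cutoffs of 3 months for red and 6 months for yellow are assumptions borrowed from the “stoplight” definitions earlier in the deck, and the function names are illustrative.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (negative if end is earlier)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def upgrade_flag(upgrade_month, today, red_months=3, yellow_months=6):
    """Color for a projected upgrade date; the month cutoffs are assumed."""
    remaining = months_between(today, upgrade_month)
    if remaining <= red_months:
        return "red"      # includes dates already in the past
    if remaining <= yellow_months:
        return "yellow"
    return "green"


today = date(2004, 2, 1)
print(upgrade_flag(date(2004, 3, 1), today))   # one month away
print(upgrade_flag(date(2004, 8, 1), today))   # six months away
print(upgrade_flag(date(2007, 11, 1), today))  # years away
```

With a reference date of February 2004, the three sample dates come out red, yellow, and green respectively. A date that has already passed is also flagged red.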

Calculation of #Years Before Upgrade

Threshold = Base * (1 + AnnualGrowth)^(#Years)

Log(Threshold) = Log(Base) + (#Years) * Log(1 + AnnualGrowth)

(#Years) = (Log(Threshold) - Log(Base)) / Log(1 + AnnualGrowth)
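The derivation above translates directly into code. The Python sketch below is a minimal version. The handling of a base already at the threshold and of zero or negative growth is an added assumption, and the 20% growth rate in the example is made up.

```python
import math


def years_before_upgrade(base, threshold, annual_growth):
    """Years until base * (1 + annual_growth) ** years reaches threshold."""
    if base <= 0 or threshold <= 0:
        raise ValueError("base and threshold must be positive")
    if base >= threshold:
        return 0.0       # already at or past the upgrade threshold
    if annual_growth <= 0:
        return math.inf  # flat or shrinking load never reaches the threshold
    return (math.log(threshold) - math.log(base)) / math.log(1 + annual_growth)


# Example: 31.6% base CPU, 75% upgrade threshold, assumed 20% annual growth.
years = years_before_upgrade(31.6, 75.0, 0.20)
print(round(years, 2))
```

Adding the result to the base date gives the projected upgrade month. Running it once per growth rate gives the three upgrade dates per server used in the spreadsheet.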


Bulk Capacity Planning Spreadsheet

[Spreadsheet: one row per server for eight servers (ServerA through ServerG and ServerI), grouped under Dept A through Dept D. Columns: Business Unit, Server, Upgrade%, Base Upgd, Scen Upgd, Trend Upgd, Notes, Business Driver, Base Time, Base CPU%. Every server has an Upgrade% of 75.0% and a Base Time of Oct-01. Business drivers are Widgets, Gadgets, and Things. Base CPU% ranges from 19.0% to 65.0%. ServerA (Dept A) shows Nov-07, Oct-06, and Jan-05 in the three upgrade-date columns. Notes include “Not impacted by Scenario” and “Scen Upgd Feb-03”]

Visualization Techniques
• Different views of the same data
• Web-based (HTML and Java) reports
• “Stoplight” (green/yellow/red) coded status
• Overlay presentation of trends and forecasts
• “Thumbnail” charts with drilldown capabilities

Visualization Techniques (Continued)
• Automatic generation of static HTML
• Dynamic HTML (CGI bin, web portal)
• Representation of servers as color-coded rectangles on a single page, where the area of each rectangle represents its capacity rating.
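The rectangle view in the last bullet needs a layout step that gives each server an area proportional to its capacity rating. The Python sketch below shows the simplest such layout, a single row of vertical slices. It is an illustrative assumption, not the layout used in the presentation, and the ratings are made up.

```python
def slice_layout(ratings, width, height):
    """Split a width x height page into vertical strips, one per server.

    Each strip's area is proportional to the server's capacity rating.
    Returns {server: (x, y, strip_width, strip_height)}.
    """
    total = sum(ratings.values())
    if total <= 0:
        raise ValueError("ratings must sum to a positive number")
    rectangles = {}
    x = 0.0
    for server, rating in ratings.items():
        strip_width = width * rating / total
        rectangles[server] = (x, 0.0, strip_width, float(height))
        x += strip_width
    return rectangles


# Hypothetical capacity ratings (for example, a benchmark score per server).
ratings = {"ServerA": 1.0, "ServerB": 3.0}
layout = slice_layout(ratings, width=8, height=2)
print(layout)
```

ServerB has three times the rating of ServerA and so receives three times the area. Nesting the same split by business area, alternating between vertical and horizontal cuts, gives a basic treemap. The fill color of each rectangle would come from the server's status.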


Different Views of Same Data • Servers can appear more than once (multiple applications and assignments).

• “Production” vs. “All”

• “All Departments” (no duplicates) vs. “By Department”

HTML with Hyperlinks

[Screenshot: HTML report whose Table of Contents entries (Introduction, Storage Analysis) are hyperlinks to the report sections. Sample section follows.]

Storage Analysis

Presented below are workload-based DASD space projections. Additional detail can be found in the DASD Workload Profile.

Figure 4 - DASD Space Analysis by Workload

Web Graphs from SAS

TITLE2 F=SIMPLEX C=RED J=C H=1.3 "ServerA CPU BY DAY";
PROC GCHART GOUT=GOUT.DATE3;
  WHERE ( SYSTEM =: 'AVG_ServerA' );
  VBAR DATE / SUMVAR=CPU SPACE=0 SUBGROUP=WKLD DISCRETE
       CAXIS=BLUE CTEXT=RED NAME="DATE" DESCRIPTION="DATE GRAPH";
/****************************************/
GOPTIONS DEVICE=GIF GSFNAME=GIFOUT GPROTOCOL=SASGPASC CBACK=BWH
         BORDER HSIZE=6 VSIZE=4 GSFMODE=REPLACE GSFLEN=128;
PROC GREPLAY IGOUT=GOUT.DATE2 NOFS;
  REPLAY 1;

Indexed Report

Anchored Links

Web Page 1

Business Area  Application  Server   CPU
BusA           APPA         ServerA  Month
               APPB         ServerB  Month

Web Page 2

Server name  Sub Business  Config                   Perform Charts
ServerA      SubA          Solaris7/Sun/E420/4/4/P  BMC MW
ServerB      SubA          Solaris7/Sun/E420/4/4/D  BMC MW

Thumbnail Graphs

Automatic Generation of HTML
• Driven by server database
• SAS or Visual Basic code builds web pages and hyperlinks
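The presentation names SAS or Visual Basic for this step. As a language-neutral illustration, the Python sketch below builds a color-coded, hyperlinked table from server database rows. The field names, the use of color names rather than numeric status codes, and the one-page-per-server link scheme are all assumptions.

```python
from html import escape


def build_status_page(servers):
    """Return a static HTML page with one color-coded, linked row per server."""
    rows = []
    for server in servers:
        name = escape(server["name"])
        rows.append(
            '<tr><td bgcolor="{color}"><a href="{name}.html">{name}</a></td>'
            "<td>{app}</td></tr>".format(
                color=escape(server["status_color"]),
                name=name,
                app=escape(server["application"]),
            )
        )
    return ("<html><body><table>\n" + "\n".join(rows)
            + "\n</table></body></html>")


# Hypothetical rows as they might be read from the server database.
servers = [
    {"name": "ServerA", "application": "App1", "status_color": "green"},
    {"name": "ServerC", "application": "App2", "status_color": "red"},
]

page = build_status_page(servers)
print(page)
```

Regenerating the page whenever the server database changes keeps the Intranet view current without hand-editing HTML.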

Color-Coded Rectangles

“Treemap” paper by Ben Shneiderman, University of Maryland, http://www.cs.umd.edu/hcil/treemaps

Server Utilization Reporting (Maximum Hour)

Underutilization Analysis by Business Area
Maximum Hourly CPU Utilization

Summary Issues
• Capacity planning becomes more complicated as the number of servers grows.
• Amount of effort does not grow linearly.
• Analyze and track many more servers at a time within a given business context.

Summary Issues (Continued)
• Processing and reporting operations
• Connection to the business increasingly important and more complicated as applications cross business units and servers
• Identification of trends and exceptions more difficult
• Reporting

Summary Recommendations
• One or more Capacity Councils
• Automation of exception detection, operations monitoring, and business driver-based forecasting
• A database of relevant server information
• Redesigned approaches for processing, analyzing, and reporting data for large numbers of servers

Thanks!

Linwood Merritt Technical Delivery Capacity Planning Capital One Services, Inc.

[email protected]
