Transcript Slide 1

RADS Student Seminar 01 Mar 2006 Jing Xu page 1

Improving Software Performance through Configuration Tuning and Design Changes

Jing Xu

Carleton University

Mar 01, 2006

Introduction

Performance

Motivation is obvious

Approaches

Configuration tuning: capacity, deployment, priorities, etc.

Late stage of development, deployment time or runtime

Easy to do, low cost

Limited ability

Design changes: software architecture, patterns, communication mechanisms, etc.

Should be started in early stage of design time

Hard to do; if late, cost will be high

Once done, effect is dramatic

Research groups and their work

Configuration tuning

Daniel Menasce and Hassan Gomaa (UML-CLISSPE-QN)

Tao Zheng and Murray Woodside (Optimization of Scheduling and Allocation)

Risse et al. (two-step approach for finding a satisfactory configuration)

Design changes

Balsamo (CHAM + QNM on a multiphase compiler)

Petriu and Woodside (UCM + LQN on RADS online bookstore)

Gorton et al. (EJB patterns)

Hassan Gomaa, Daniel A. Menascé (component interconnection patterns)

Jing Xu, Murray Woodside (performance improvement on BSS)

Some general principles

Peter Tregunno (4 techniques for bottleneck mitigation)

Connie Smith's principles

Configuration tuning example 1 (1)

A Method for Design and Performance Modeling of Client/Server Systems (Daniel Menasce and Hassan Gomaa, IEEE Transactions on Software Engineering, Volume 26, November 2000)

UML model for system design

CLISSPE (Client/Server Software Performance Evaluation) describes transaction specification

Analytical QN model compiled from CLISSPE specifications

System being studied: a Recruitment and Training System of a major US Government agency
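To make the idea of an analytical QN model concrete, here is a minimal sketch of exact Mean Value Analysis (MVA) for a closed, single-class queueing network. MVA is a standard textbook technique chosen for illustration; the slides do not say which solver the CLISSPE tool chain uses, and the station demands, client count, and think time below are hypothetical.

```python
from typing import List, Tuple


def mva(demands: List[float], customers: int, think_time: float = 0.0) -> Tuple[float, float]:
    """Exact MVA for a closed single-class network of queueing stations.

    demands[i] is the total service demand (seconds) of one transaction at
    station i. Returns (throughput, response_time) at the given population.
    """
    queue = [0.0] * len(demands)  # mean queue length per station
    throughput, response = 0.0, 0.0
    for n in range(1, customers + 1):
        # An arriving customer sees the queue left by a population of n - 1.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        queue = [throughput * r for r in residence]  # Little's law per station
    return throughput, response


# Hypothetical demands: application CPU, database CPU, database disk (seconds).
x, r = mva([0.020, 0.035, 0.050], customers=50, think_time=5.0)
print(f"throughput = {x:.2f} tx/s, response time = {r:.3f} s")
```

Throughput can never exceed the reciprocal of the largest demand, which is why adding hardware away from the bottleneck station gives little benefit.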

Configuration tuning example 1 (2)

Configuration tuning example 1 (3)

Configuration tuning example 1 (4)

Performance results (resource configuration tuning):

resource tuning alone cannot achieve satisfactory performance

mD(p)nA(q)

m: # of DB servers

p: # of processors in each DB server

n: # of App Servers

q: # of processors for each App Server

Comb(p): combined, DB and App Server on the same node

p: # of processors in each node

Ability for guiding design changes:

redesign critical transactions

full caching of the most critical tables
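The configuration labels above can be read mechanically. The sketch below parses them into named counts; the exact written form of a label (for example `2D(4)3A(2)` or `Comb(8)`) is an assumption based on the notation on this slide, and `parse_config` and its field names are hypothetical.

```python
import re
from typing import Dict

# Assumed label shapes: "<m>D(<p>)<n>A(<q>)" and "Comb(<p>)".
_SEPARATE = re.compile(r"^(\d+)D\((\d+)\)(\d+)A\((\d+)\)$")
_COMBINED = re.compile(r"^Comb\((\d+)\)$")


def parse_config(label: str) -> Dict[str, int]:
    """Parse a label such as '2D(4)3A(2)' or 'Comb(8)' into named counts."""
    m = _SEPARATE.match(label)
    if m:
        db, db_cpu, app, app_cpu = map(int, m.groups())
        return {
            "db_servers": db,
            "db_processors": db_cpu,
            "app_servers": app,
            "app_processors": app_cpu,
            "total_processors": db * db_cpu + app * app_cpu,
        }
    m = _COMBINED.match(label)
    if m:
        return {"processors_per_node": int(m.group(1))}
    raise ValueError(f"unrecognized configuration label: {label!r}")


print(parse_config("2D(4)3A(2)"))
print(parse_config("Comb(8)"))
```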

Configuration tuning example 2 (1)

Heuristic Optimization of Scheduling and Allocation for Distributed Systems with Soft Deadlines (Tao Zheng and Murray Woodside, TOOLS 2003, based on work by Hesham El-Sayed)

Definition of soft deadline:

TaskWaitingMetric: (evaluates task waiting time)

Penalty function: (for performance checking)

ComGainMetric: (evaluates communication overhead)

Configuration tuning example 2 (2)

1. Initialize the configuration:

Use the Multifit-Com algorithm for the initial allocation.

Give higher initial priority to tasks with fewer threads, on the same processor. Break ties by the Proportional-Deadline-Monotonic algorithm.

2. Check termination; use the V(A,P) metric to estimate the solution quality. If it is zero, go to 5. Otherwise go to 3.

3. Use the TaskWaitingMetric to select a task with the largest value for priority increase. Estimate the solution quality. If V(A,P) is zero, go to 5. Otherwise go to 4.

4. Restore the configuration which has the smallest V(A,P) metric so far. Use the ComGainMetric defined in Eq. (4) to select a task with the largest value, and a new CPU for it to move to. Estimate the solution quality. If V(A,P) is zero, go to 5. If failure conditions (on maximum total steps or on running out of alternatives to try) are satisfied, go to 6. Otherwise, go to 3.

5. Stop. A feasible solution is found.

6. Failed.
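The control flow of steps 2 to 6 can be sketched as a loop. This is an illustration under stated assumptions, not the authors' implementation: the three metrics are passed in as hypothetical callables (`v`, `waiting`, `com_gain`), step 1 is left to the caller, a larger priority number is assumed to mean higher priority, and "running out of alternatives" is simplified to "every (task, CPU) move has been tried once".

```python
from typing import Callable, Dict, List, Optional, Tuple

Alloc = Dict[str, int]  # task -> CPU
Prio = Dict[str, int]   # task -> priority (larger is higher; an assumption)


def tune(alloc: Alloc, prio: Prio, cpus: List[int],
         v: Callable[[Alloc, Prio], float],
         waiting: Callable[[str, Alloc, Prio], float],
         com_gain: Callable[[str, int, Alloc, Prio], float],
         max_steps: int = 100) -> Optional[Tuple[Alloc, Prio]]:
    """Steps 2-6 of the heuristic; returns a feasible (alloc, prio) or None."""
    alloc, prio = dict(alloc), dict(prio)
    best = (v(alloc, prio), dict(alloc), dict(prio))
    if best[0] == 0:                                   # step 2
        return alloc, prio
    tried = set()                                      # moves already attempted
    for _ in range(max_steps):
        # Step 3: raise the priority of the task with the largest waiting metric.
        task = max(prio, key=lambda t: waiting(t, alloc, prio))
        prio[task] += 1
        score = v(alloc, prio)
        if score == 0:
            return alloc, prio                         # step 5
        if score < best[0]:
            best = (score, dict(alloc), dict(prio))
        # Step 4: restore the best configuration so far, then move one task.
        alloc, prio = dict(best[1]), dict(best[2])
        moves = [(t, c) for t in alloc for c in cpus
                 if c != alloc[t] and (t, c) not in tried]
        if not moves:
            return None                                # step 6: no alternatives
        t, c = max(moves, key=lambda m: com_gain(m[0], m[1], alloc, prio))
        tried.add((t, c))
        alloc[t] = c
        score = v(alloc, prio)
        if score == 0:
            return alloc, prio                         # step 5
        if score < best[0]:
            best = (score, dict(alloc), dict(prio))
    return None                                        # step 6: step limit reached


# Toy problem: two tasks collide on CPU 0; separating them removes the violation.
result = tune({"a": 0, "b": 0}, {"a": 0, "b": 0}, cpus=[0, 1],
              v=lambda al, pr: 0.0 if al["a"] != al["b"] else 1.0,
              waiting=lambda t, al, pr: 1.0,
              com_gain=lambda t, c, al, pr: 1.0)
print(result)
```

In the toy run, the priority increase in step 3 does not help, so step 4 restores the starting configuration and moves one task to the other CPU, which makes V(A,P) zero.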

Configuration tuning example 2 (3)

Applied to RADS Online Bookstore Model

Configuration tuning example 3 (1)

Configuration of Distributed Message Converter Systems using Performance Modeling (Thomas Risse, Karl Aberer, et al., Performance Evaluation, vol. 58, Oct 2004)

An efficient two-step approach for finding a satisfactory configuration:

Determine hardware configuration by QN analysis

Incrementally optimize software configuration by simulating LQN models

Applied systems: Electronic Data Interchange (EDI) Systems
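The two steps can be sketched as follows. This is a simplified illustration, not the method from the paper: step 1 uses only the utilization law rather than a full QN analysis, step 2 uses a greedy search that is an assumption, and the response-time function is a hypothetical stand-in for an LQN simulation. The component names and all numbers are made up.

```python
import math
from typing import Callable, Dict, List, Optional


def hosts_needed(arrival_rate: float, demand: float, max_utilization: float = 0.7) -> int:
    """Step 1: utilization law U = X * D, spread over enough identical hosts."""
    return max(1, math.ceil(arrival_rate * demand / max_utilization))


def tune_instances(components: List[str],
                   evaluate: Callable[[Dict[str, int]], float],
                   goal: float, max_instances: int = 16) -> Optional[Dict[str, int]]:
    """Step 2: add one instance at a time where it lowers response time most."""
    config = {c: 1 for c in components}
    while evaluate(config) > goal:
        trials = [{**config, c: config[c] + 1}
                  for c in components if config[c] < max_instances]
        if not trials:
            return None  # goal not reachable within the instance limit
        config = min(trials, key=evaluate)
    return config


# Hypothetical stand-in for an LQN simulation: demand is shared by instances.
DEMAND = {"parser": 4.0, "converter": 2.0}


def fake_response_time(config: Dict[str, int]) -> float:
    return sum(DEMAND[c] / n for c, n in config.items())


print(hosts_needed(arrival_rate=50.0, demand=0.03))
print(tune_instances(list(DEMAND), fake_response_time, goal=3.5))
```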

Configuration tuning example 3 (2)

Configuration tuning example 3 (2)

XML hardware configuration file (markup lost in extraction; surviving values: Small Message, 0.72, 100, ...; Bluenun, ...; 4.465, 1.35, 1.157, 1.87, ...)

Configuration tuning example 3 (3)

Result:

Find hardware and software configurations that could fulfill performance goals

Hardware configurations:

Workload distribution of different message types on each host type

Software configurations:

Number of instances for each software component

Design Changes Example 1 (1)

An Approach to Performance Evaluation of Software Architectures (S. Balsamo, P. Inverardi, C. Mangano, WOSP 1998)

CHAM (Chemical Abstract Machine) for software architecture specification (Molecules, Solutions, Transforms)

Queuing Network Model (QNM) for performance evaluation

Studied System: a Multiphase Compiler

Design Changes Example 1 (2)

Two competing software architectures

Sequential architecture

Concurrent architecture

Design Changes Example 1 (3)

CHAM specification of software architecture

Initial state:

S1 = text<>o(char), i(char)<>o(tok)<>lexer, i(tok)<>o(phr)<>parser, i(phr)<>o(cophr)<>semantor, i(cophr)<>o(cophr)<>optimizer, i(cophr)<>o(obj)<>generator

Reaction rules:

T1 = text<>o(char) -> o(char)<>text

T2 = i(d)<>m1, o(d)<>m2 -> m1<>i(d), m2<>o(d)

T3 = o(obj)<>generator<>i(cophr) -> i(char)<>o(tok)<>lexer, i(tok)<>o(phr)<>parser, i(phr)<>o(cophr)<>semantor, i(cophr)<>o(cophr)<>optimizer, i(cophr)<>o(obj)<>generator

Design Changes Example 1 (4)

Performance Models

Results: the maximum throughput supported by the concurrent model is 4-5 times that of the sequential model
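A back-of-the-envelope bound shows why a factor of this size is plausible for a five-phase compiler. The sketch assumes deterministic service times and, in the concurrent case, a dedicated processor per phase; it is not the QNM from the paper, and the per-phase times are hypothetical.

```python
from typing import Dict


def sequential_throughput(service: Dict[str, float]) -> float:
    """One input is fully compiled before the next one starts."""
    return 1.0 / sum(service.values())


def concurrent_throughput(service: Dict[str, float]) -> float:
    """Phases overlap as a pipeline; the slowest phase sets the pace."""
    return 1.0 / max(service.values())


# Hypothetical per-phase service times (seconds per compilation unit).
phases = {"lexer": 1.0, "parser": 1.2, "semantor": 1.0,
          "optimizer": 0.9, "generator": 1.0}
ratio = concurrent_throughput(phases) / sequential_throughput(phases)
print(f"concurrent / sequential = {ratio:.2f}")
```

With five phases the ratio is at most 5, reached only when all phases take equal time; unbalanced phases pull it lower.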

Design Changes Example 2 (1)

Analysing Software Requirements Specifications for Performance (Dorin Petriu, Murray Woodside, WOSP 2002)

Use Case Map (UCM) for software scenario description

LQN model for performance evaluation

Studied system: RADS online book store

Design Changes Example 2 (2)

UCM example:

Design Changes Example 2 (3)

LQN Performance Model

Obtained by UCM2LQN tool

Design Changes Example 2 (4)

Design Changes Example 2 (5)

Design Changes Example 3 (1)

Design-Level Performance Prediction of Component Based Applications (Yan Liu, Alan Fekete and Ian Gorton, IEEE Transactions on Software Engineering, Vol 31, November 2005)

Approach: Modeling -> Calibrating -> Characterizing -> Benchmarking -> Populating

Studied system: an EJB application - Stock-Online

Design Changes Example 3 (2)

Design Changes Example 3 (3)

3 competing architectures evaluated:

Common Container Managed Persistence (CMP)

Session Bean façade to the entity beans

Read-Mostly Pattern (RM)

Read-only and read-write operations are separated into two different entity beans.

Reads from read-only beans are not blocked, reducing the transactional overhead in synchronizing the cache with the persistent data store.

Optimistic Concurrency Control (OCC)

No lock is held during a transaction.

The container issues a predicated update clause to detect transaction conflict at commit time.

The transaction is rolled back if a conflict occurs.
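The predicated update used by OCC can be illustrated outside an EJB container. The sketch below uses Python's built-in `sqlite3` module and a version column, which is one common way to build the predicate and is an assumption here; the slides do not say what predicate the container generates. The `account` table and `occ_update` helper are hypothetical.

```python
import sqlite3


def occ_update(conn: sqlite3.Connection, account_id: int,
               new_balance: float, version_read: int) -> bool:
    """Write only if the row still has the version that was read earlier."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, version_read),
    )
    if cur.rowcount == 0:   # predicate failed: another transaction got there first
        conn.rollback()
        return False
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100.0, 0)")
conn.commit()

# Two transactions both read version 0; only the first commit succeeds.
first = occ_update(conn, 1, 90.0, version_read=0)
second = occ_update(conn, 1, 80.0, version_read=0)
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(first, second, balance)
```

No lock is held between the read and the write; the cost of a conflict is paid only by the transaction that loses, which has to retry.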

Design Changes Example 3 (4)

Design Changes Example 3 (5)

Design Changes Example 4 (1)

Design and Performance Modeling of Component Interconnection Patterns for Distributed Software Architecture (Hassan Gomaa, Daniel Menasce, WOSP 2000)

UML pattern for connectors

Performance annotation in XML-based files

QN models for evaluation

Design Changes Example 4 (2)

Performance annotation format (element names lost in extraction): string [ string external | internal [ real bytes real msg/sec ] [ real real sec real bytes string ]+ ]* [ string real sec real bytes ]*

Design Changes Example 4 (3)

Design Changes Example 5 (My previous work, TOOLS 2003)

[Figure: layered queueing (LQN) model. Labels include Users, VideoController, AcquireProc, AcquireProc2, BufferManager, BufMgr2, Buffer, CardReader, AccessController, StoreProc, Alarm, Lock, Network (infinite), DataBase (10 threads), and Disk (2 threads).]

Original Sequence Diagram

[Figure: sequence diagram with UML performance annotations. Objects: Video Controller, AcquireProc, BufferManager, StoreProc, Database {PAcapacity=10}. Messages: procOneImage(i), getBuffer(), allocBuf(b), getImage(i, b), passImage(i, b), storeImage(i, b), store(i, b), writeImg(i, b), freeBuf(b), releaseBuf(b).]

Feedback

Modified Sequence Diagram

[Figure: sequence diagram with UML performance annotations. Objects: Video Controller, AcquireProc {PAcapacity=3}, BufferManager, StoreProc {PAcapacity=6}, Database {PAcapacity=10}. Messages: procOneImage(i), getBuffer(), allocBuf(b), getImage(i, b), passImage(i, b), storeImage(i, b), store(i, b), writeImg(i, b), freeBuf(b), releaseBuf(b).]

Principles from Connie Smith (1)

Fixing-Point Principle:

early binding, e.g. Java vs C program

Locality-Design Principle:

Create actions, functions, and results that are "close" to physical computer resources.

Processing Versus Frequency Tradeoff Principle:

Seek min( ∑ (frequency * processing per execution))

Shared-Resource Principle:

Share resources when possible; less critical section or lock

Parallel Processing Principle:

Processing speedup vs. communication overhead
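The Processing Versus Frequency Tradeoff can be made concrete with a small calculation. The sketch compares fetching items one request at a time against fetching them in a single batch; the fixed overhead per request, the per-item cost, and the item count are hypothetical numbers chosen for illustration.

```python
from typing import List, Tuple


def total_cost(plan: List[Tuple[float, float]]) -> float:
    """Sum of frequency * processing-per-execution over all operations."""
    return sum(freq * cost for freq, cost in plan)


CALL_OVERHEAD_MS = 2.0  # hypothetical fixed cost per request
PER_ITEM_MS = 0.1       # hypothetical cost per item fetched
ITEMS = 100

one_by_one = [(ITEMS, CALL_OVERHEAD_MS + PER_ITEM_MS)]   # 100 small requests
batched = [(1, CALL_OVERHEAD_MS + ITEMS * PER_ITEM_MS)]  # 1 large request

best = min((one_by_one, batched), key=total_cost)
print(total_cost(one_by_one), total_cost(batched))
```

Batching makes each execution more expensive but cuts the frequency so sharply that the product is much smaller, which is the tradeoff the principle asks the designer to evaluate.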

Principles from Connie Smith (2)

Centering Principle:

folkloric “80-20” rule – work on the key problem

Instrumenting Principle:

Instrument systems as you build them to enable measurement and analysis of workload scenarios, resource requirements, and performance goal achievement. This is the foundation of our performance analysis and improvement.

Four software bottleneck mitigation techniques from Peter Tregunno

Practical Analysis of Software Bottlenecks (Thesis of Peter Tregunno )

Four software bottleneck mitigation techniques:

1) increasing threading levels at the bottleneck task

2) replicating the bottleneck task and its processor

3) reducing the phase-one processor demand of the bottleneck task

4) decreasing the number of interactions that the bottleneck task has with lower layers