RADS Student Seminar 01 Mar 2006 Jing Xu page 1
Improving Software Performance through Configuration Tuning and Design Changes
Jing Xu, Carleton University
Mar 01, 2006
Introduction
Performance
  Motivation is obvious
Approaches
  Configuration tuning: capacity, deployment, priorities, etc.
    Late stage of development, deployment time or runtime
    Easy to do, low cost
    Limited ability
  Design changes: software architecture, patterns, communication mechanisms, etc.
    Should be started early, at design time
    Hard to do; if done late, the cost is high
    Once done, the effect is dramatic
Research groups and their work
Configuration tuning
  Daniel Menasce and Hassan Gomaa (UML-CLISSPE-QN)
  Tao Zheng and Murray Woodside (optimization of scheduling and allocation)
  Risse et al. (two-step approach for finding a satisfactory configuration)
Design changes
  Balsamo et al. (CHAM + QNM on a multiphase compiler)
  Petriu and Woodside (UCM + LQN on the RADS online bookstore)
  Gorton et al. (EJB patterns)
  Hassan Gomaa and Daniel A. Menascé (component interconnection patterns)
  Jing Xu and Murray Woodside (performance improvement on BSS)
Some general principles
  Peter Tregunno (4 techniques for bottleneck mitigation)
  Connie Smith's principles
Configuration tuning example 1 (1)
A Method for Design and Performance Modeling of Client/Server Systems (Daniel Menasce and Hassan Gomaa, IEEE Transactions on Software Engineering, Volume 26, November 2000)
UML model for system design
CLISSPE (Client/Server Software Performance Evaluation) describes transaction specification
Analytical QN model compiled from CLISSPE specifications
System being studied: a Recruitment and Training System of a major US Government agency
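The analytical QN step can be illustrated with exact Mean Value Analysis (MVA), a standard solution technique for closed product-form queueing networks. This is a minimal sketch. It assumes a single transaction class and load-independent queueing stations. It is not the CLISSPE compiler, and the service demands in the usage lines are hypothetical.

```python
from typing import List, Tuple

def mva(demands: List[float], n_users: int, think_time: float = 0.0) -> Tuple[float, float]:
    """Exact MVA for a closed, single-class, product-form queueing network.

    demands[k]: total service demand (s) one transaction places on station k.
    think_time: mean user think time (s).
    Returns (throughput in transactions/s, response time in s) for n_users.
    """
    queue = [0.0] * len(demands)          # mean queue length at each station
    throughput, response = 0.0, 0.0
    for n in range(1, n_users + 1):
        # Arrival theorem: an arriving customer sees the queues of population n-1.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied at each station.
        queue = [throughput * r for r in residence]
    return throughput, response

# Hypothetical demands (s) at web, application and DB stations; 50 users thinking 5 s.
x, r = mva([0.010, 0.040, 0.080], n_users=50, think_time=5.0)
print(f"X = {x:.2f} tx/s, R = {r:.3f} s")
```

To compare configurations such as those in the mD(p)nA(q) notation, one would rerun the model with adjusted demands. For example, the DB demand could be halved on the assumption that load splits evenly over two DB servers; that split is itself an assumption.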
Configuration tuning example 1 (2)
Configuration tuning example 1 (3)
Configuration tuning example 1 (4)
Performance results (resource configuration tuning):
  Resource configuration tuning alone cannot achieve satisfactory performance
Configuration notation mD(p)nA(q):
  m: number of DB servers; p: number of processors in each DB server
  n: number of application servers; q: number of processors in each application server
  Comb(p): combined, with the DB and application server on the same node; p: number of processors in each node
Ability to guide design changes: redesign critical transactions; fully cache the most critical tables
Configuration tuning example 2 (1)
Heuristic Optimization of Scheduling and Allocation for Distributed Systems with Soft Deadlines (Tao Zheng and Murray Woodside, TOOLS 2003, based on work by Hesham El-Sayed)
Definition of soft deadline
TaskWaitingMetric: evaluates task waiting time
Penalty function: used for performance checking
ComGainMetric: evaluates communication overhead
Configuration tuning example 2 (2)
1. Initialize the configuration: use the Multifit-Com algorithm for the initial allocation; give higher initial priority to tasks with fewer threads on the same processor; break ties by the Proportional-Deadline-Monotonic algorithm.
2. Check termination: use the V(A,P) metric to estimate the solution quality. If it is zero, go to 5; otherwise go to 3.
3. Use the TaskWaitingMetric to select the task with the largest value for a priority increase. Estimate the solution quality. If V(A,P) is zero, go to 5; otherwise go to 4.
4. Restore the configuration with the smallest V(A,P) so far. Use the ComGainMetric (Eq. (4) in the paper) to select the task with the largest value and a new CPU for it to move to. Estimate the solution quality. If V(A,P) is zero, go to 5. If a failure condition holds (maximum total steps reached, or no alternatives left to try), go to 6. Otherwise go to 3.
5. Stop: a feasible solution is found.
6. Stop: failed.
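The control flow of steps 2 to 6 can be sketched as follows. This is a minimal sketch, not the authors' implementation. `V`, `pick_priority_task` and `pick_move` are hypothetical callables. They stand in for the V(A,P) penalty, the TaskWaitingMetric and the ComGainMetric. Step 1 (Multifit-Com allocation and initial priorities) is assumed to have produced the starting allocation and priorities. A larger number is assumed to mean a higher priority.

```python
from typing import Callable, Dict, Optional, Tuple

Alloc = Dict[str, int]   # task -> CPU index
Prio = Dict[str, int]    # task -> priority (larger = higher; an assumption)

def tune(alloc: Alloc, prio: Prio,
         V: Callable[[Alloc, Prio], float],
         pick_priority_task: Callable[[Alloc, Prio], str],
         pick_move: Callable[[Alloc, Prio], Optional[Tuple[str, int]]],
         max_steps: int = 100) -> Optional[Tuple[Alloc, Prio]]:
    """Return a configuration with V == 0 (step 5), or None on failure (step 6)."""
    alloc, prio = dict(alloc), dict(prio)
    best_v, best = V(alloc, prio), (dict(alloc), dict(prio))
    if best_v == 0:                        # step 2: already feasible
        return alloc, prio
    for _ in range(max_steps):
        # Step 3: raise the priority of the task chosen by the waiting metric.
        task = pick_priority_task(alloc, prio)
        prio[task] += 1
        v = V(alloc, prio)
        if v == 0:
            return alloc, prio             # step 5
        if v < best_v:
            best_v, best = v, (dict(alloc), dict(prio))
        # Step 4: restore the best configuration, then move one task to a new CPU.
        alloc, prio = dict(best[0]), dict(best[1])
        move = pick_move(alloc, prio)
        if move is None:
            return None                    # step 6: no alternatives left
        task, cpu = move
        alloc[task] = cpu
        v = V(alloc, prio)
        if v == 0:
            return alloc, prio             # step 5
        if v < best_v:
            best_v, best = v, (dict(alloc), dict(prio))
    return None                            # step 6: step limit reached

# Toy (hypothetical) penalty: number of extra tasks sharing CPU 0.
def toy_V(alloc: Alloc, prio: Prio) -> float:
    return max(0, sum(1 for c in alloc.values() if c == 0) - 1)

result = tune({"a": 0, "b": 0}, {"a": 1, "b": 1}, toy_V,
              pick_priority_task=lambda al, pr: "a",
              pick_move=lambda al, pr: ("b", 1) if al["b"] == 0 else None)
print(result)   # ({'a': 0, 'b': 1}, {'a': 1, 'b': 1})
```

In the toy run, the priority increase in step 3 does not reduce the penalty. Step 4 therefore restores the best configuration and moves task `b` off CPU 0, which makes V zero.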
Configuration tuning example 2 (3)
Applied to RADS Online Bookstore Model
Configuration tuning example 3 (1)
Configuration of Distributed Message Converter Systems using Performance Modeling (Thomas Risse, Karl Aberer, et al., Performance Evaluation, vol. 58, Oct 2004)
An efficient two-step approach for finding a satisfactory configuration:
Determine hardware configuration by QN analysis
Incrementally optimize software configuration by simulating LQN models
Applied to: Electronic Data Interchange (EDI) systems
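Here is a minimal sketch of the two-step idea. It rests on simplifying assumptions that are not from the paper:

- Step 1 sizes the hardware with a utilization target, as a crude stand-in for the QN analysis.
- Step 2 greedily adds component instances, using a hypothetical `evaluate` callable in place of LQN simulation.

The toy response-time model and all parameter values in the usage lines are also hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional

def size_hardware(msg_rate: float, cpu_demand: float, u_max: float = 0.7) -> int:
    """Step 1 (sketch): hosts needed so average CPU utilization stays <= u_max."""
    return math.ceil(msg_rate * cpu_demand / u_max)

def tune_instances(components: List[str],
                   evaluate: Callable[[Dict[str, int]], float],
                   goal: float, max_total: int = 32) -> Optional[Dict[str, int]]:
    """Step 2 (sketch): repeatedly add one instance to whichever component
    improves the evaluated response time most, until the goal is met."""
    inst = {c: 1 for c in components}
    while evaluate(inst) > goal:
        if sum(inst.values()) >= max_total:
            return None    # goal not reachable within the instance budget
        best = min(components, key=lambda c: evaluate({**inst, c: inst[c] + 1}))
        inst[best] += 1
    return inst

# Hypothetical: 50 messages/s, 40 ms of CPU per message.
print(size_hardware(50, 0.04))   # 3

# Hypothetical toy model: each component's delay shrinks with its instance count.
demand = {"parser": 0.2, "converter": 0.6}
toy = lambda inst: sum(demand[c] / inst[c] for c in inst)
print(tune_instances(list(demand), toy, goal=0.55))   # {'parser': 1, 'converter': 2}
```

The greedy choice in step 2 is one plausible reading of "incrementally optimize". A different search order could give a different configuration.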
Configuration tuning example 3 (2)
Configuration tuning example 3 (3)
XML Hardware configuration file
Configuration tuning example 3 (4)
Result:
  Find hardware and software configurations that fulfill the performance goals
Hardware configurations:
  Workload distribution of the different message types over each host type
Software configurations:
  Number of instances of each software component
Design Changes Example 1 (1)
An Approach to Performance Evaluation of Software Architectures (S. Balsamo, P. Inverardi, C. Mangano, WOSP 1998)
CHAM (Chemical Abstract Machine) for software architecture specification (Molecules, Solutions, Transforms)
Queuing Network Model (QNM) for performance evaluation
Studied system: a Multiphase Compiler
Design Changes Example 1 (2)
Two competing software architectures
Sequential architecture
Concurrent architecture
Design Changes Example 1 (3)
CHAM specification of the software architecture
Initial state:
S1 = text<>o(char),
     i(char)<>o(tok)<>lexer,
     i(tok)<>o(phr)<>parser,
     i(phr)<>o(cophr)<>semantor,
     i(cophr)<>o(cophr)<>optimizer,
     i(cophr)<>o(obj)<>generator
Reaction rules:
T1 = text<>o(char) -> o(char)<>text
T2 = i(d)<>m1, o(d)<>m2 -> m1<>i(d), m2<>o(d)
T3 = o(obj)<>generator<>i(cophr) -> i(char)<>o(tok)<>lexer,
                                   i(tok)<>o(phr)<>parser,
                                   i(phr)<>o(cophr)<>semantor,
                                   i(cophr)<>o(cophr)<>optimizer,
                                   i(cophr)<>o(obj)<>generator
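To make the reaction rules concrete, here is a toy interpreter for T1 and T2 only. It is a sketch under stated assumptions:

- Molecules are tuples of elements, with the tuple positions playing the role of the <> separators.
- Several reactions can sometimes fire. For example, the semantor output can react with either the optimizer or the generator, since both start with i(cophr). This sketch picks the first match in list order, whereas a CHAM chooses nondeterministically.
- T3 is not simulated. The run stops when no rule applies. At that point the generator molecule has the form of T3's left-hand side.

```python
from typing import List, Optional, Tuple

Molecule = Tuple[str, ...]

def parse(s: str) -> Molecule:
    """'i(char)<>o(tok)<>lexer' -> ('i(char)', 'o(tok)', 'lexer')"""
    return tuple(part.strip() for part in s.split("<>"))

S1: List[Molecule] = [parse(m) for m in [
    "text<>o(char)",
    "i(char)<>o(tok)<>lexer",
    "i(tok)<>o(phr)<>parser",
    "i(phr)<>o(cophr)<>semantor",
    "i(cophr)<>o(cophr)<>optimizer",
    "i(cophr)<>o(obj)<>generator",
]]

def step(sol: List[Molecule]) -> Optional[List[Molecule]]:
    """Apply one reaction (T1 before T2, first match); None if no rule applies."""
    sol = list(sol)
    # T1: text<>o(char) -> o(char)<>text
    for k, m in enumerate(sol):
        if m == ("text", "o(char)"):
            sol[k] = ("o(char)", "text")
            return sol
    # T2: i(d)<>m1, o(d)<>m2 -> m1<>i(d), m2<>o(d)
    for a, recv in enumerate(sol):
        if not recv[0].startswith("i("):
            continue
        d = recv[0][2:-1]
        for b, send in enumerate(sol):
            if b != a and send[0] == f"o({d})":
                sol[a] = recv[1:] + recv[:1]   # receiver rotates its head to the tail
                sol[b] = send[1:] + send[:1]   # sender rotates its head to the tail
                return sol
    return None

def run(sol: List[Molecule]) -> Tuple[List[Molecule], int]:
    """Apply reactions until the solution is inert; return it and the step count."""
    steps = 0
    while (nxt := step(sol)) is not None:
        sol, steps = nxt, steps + 1
    return sol, steps

final, n = run(S1)
for m in final:
    print("<>".join(m))
```

The data passes lexer -> parser -> semantor -> optimizer -> generator, one reaction at a time. This mirrors the sequential behaviour that the performance model has to capture.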
Design Changes Example 1 (4)
Performance Models
Results: the maximum throughput supported by the concurrent model is 4 to 5 times higher than that of the sequential model
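One way to see why concurrency raises the throughput ceiling is the standard asymptotic throughput bound from operational analysis. As a simplification, assume the sequential architecture processes one job at a time through all phases. Assume the concurrent one pipelines the phases on separate resources. The bounds are then 1/ΣD_k and 1/max D_k respectively. The per-phase demands below are hypothetical, not the paper's data.

```python
from typing import Sequence

def max_throughput_sequential(demands: Sequence[float]) -> float:
    """One job occupies the whole compiler: X <= 1 / sum of phase demands."""
    return 1.0 / sum(demands)

def max_throughput_concurrent(demands: Sequence[float]) -> float:
    """Phases overlap as a pipeline: X <= 1 / slowest phase demand."""
    return 1.0 / max(demands)

# Hypothetical demands (s) for lexer, parser, semantor, optimizer, generator.
phases = [2.0, 3.0, 2.0, 4.0, 3.0]
seq = max_throughput_sequential(phases)
conc = max_throughput_concurrent(phases)
print(f"sequential {seq:.4f}/s, concurrent {conc:.4f}/s, ratio {conc / seq:.2f}")
```

Under these assumptions the ratio is ΣD_k / max D_k. It is largest when the work is spread evenly across the phases.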
Design Changes Example 2 (1)
Analysing Software Requirements Specifications for Performance (Dorin Petriu, Murray Woodside, WOSP 2002)
Use Case Map (UCM) for software scenario description
LQN model for performance evaluation
Studied system: RADS online bookstore
Design Changes Example 2 (2)
UCM example:
Design Changes Example 2 (3)
LQN Performance Model
Obtained by UCM2LQN tool
Design Changes Example 2 (4)
Design Changes Example 2 (5)
Design Changes Example 3 (1)
Design-Level Performance Prediction of Component Based Applications (Yan Liu, Alan Fekete and Ian Gorton, IEEE Transactions on Software Engineering, Vol 31, November 2005)
Approach: Modeling -> Calibrating -> Characterizing -> Benchmarking -> Populating
Studied system: an EJB application - Stock-Online
Design Changes Example 3 (2)
Design Changes Example 3 (3)
Three competing architectures were evaluated:
Common Container Managed Persistence (CMP)
  Session bean façade to the entity beans
Read-Mostly Pattern (RM)
  Read-only and read-write operations are separated into two different entity beans.
  Reads from read-only beans are not blocked, which reduces the transactional overhead of synchronizing the cache with the persistent data store.
Optimistic Concurrency Control (OCC)
  No lock is held during a transaction.
  The container issues a predicated update clause to detect transaction conflicts at commit time.
  The transaction is rolled back if a conflict occurs.
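The predicated update used by OCC can be illustrated in plain SQL with a version column. This is a sketch that assumes a hypothetical `account` table and uses Python's built-in `sqlite3` module. In the EJB setting the container generates the equivalent check. Here two transactions are simulated by interleaving calls on one connection.

```python
import sqlite3
from typing import Tuple

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
    conn.execute("INSERT INTO account VALUES (1, 100, 0)")
    conn.commit()
    return conn

def read(conn: sqlite3.Connection, acct: int) -> Tuple[int, int]:
    """Read without holding any lock; remember the version that was seen."""
    return conn.execute("SELECT balance, version FROM account WHERE id = ?", (acct,)).fetchone()

def commit_update(conn: sqlite3.Connection, acct: int, new_balance: int, seen_version: int) -> bool:
    """Predicated update: matches the row only if nobody committed since our read."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, acct, seen_version))
    if cur.rowcount == 0:        # conflict detected at commit time
        conn.rollback()
        return False
    conn.commit()
    return True

conn = setup()
bal_a, ver_a = read(conn, 1)                       # transaction A reads (100, 0)
bal_b, ver_b = read(conn, 1)                       # transaction B reads (100, 0)
print(commit_update(conn, 1, bal_a - 30, ver_a))   # True: A commits, version becomes 1
print(commit_update(conn, 1, bal_b - 50, ver_b))   # False: B conflicts and is rolled back
print(read(conn, 1))                               # (70, 1)
```

A rolled-back transaction would normally re-read and retry. Under heavy write contention these retries are the cost that offsets OCC's freedom from locking.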
Design Changes Example 3 (4)
Design Changes Example 3 (5)
Design Changes Example 4 (1)
Design and Performance Modeling of Component Interconnection Patterns for Distributed Software Architecture (Hassan Gomaa, Daniel Menasce, WOSP 2000)
UML patterns for connectors
Performance annotations in XML-based files
QN models for evaluation
Design Changes Example 4 (2)
Design Changes Example 4 (3)
Design Changes Example 5 (my previous work, TOOLS 2003)
[Figure: LQN model of the BSS, with users arriving at 0.5/sec and tasks including VideoController, AcquireProc, AcquireProc2, CardReader, BufferManager, Buffer, StoreProc, AccessController, Alarm, Lock, Network, DataBase (10 threads) and Disk (2 threads)]
[Figures: Original Sequence Diagram -> Feedback -> Modified Sequence Diagram]
Principles from Connie Smith(1)
Fixing-Point Principle:
  Early binding, e.g., a Java vs. a C program
Locality-Design Principle:
  Create actions, functions, and results that are "close" to physical computer resources.
Processing Versus Frequency Tradeoff Principle:
  Minimize ∑ (frequency × processing per execution)
Shared-Resource Principle:
  Share resources when possible; keep critical sections and lock holding times short
Parallel Processing Principle:
  Weigh the processing speedup against the communication overhead
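Here is a small worked example of the Processing Versus Frequency Tradeoff. Batching lowers the frequency of a fixed per-call overhead, at the price of more processing per call. The cost model (a fixed overhead per call plus a per-item cost) is an assumption. The costs are hypothetical integers in microseconds.

```python
def total_cost_us(n_items: int, batch: int, overhead_us: int, per_item_us: int) -> int:
    """Sum over calls of frequency x processing: calls * overhead + items * per-item cost."""
    calls = -(-n_items // batch)          # ceiling division: number of calls made
    return calls * overhead_us + n_items * per_item_us

# Hypothetical: 1000 lookups, 500 us overhead per call, 100 us of real work per item.
print(total_cost_us(1000, 1, 500, 100))    # 600000 (one call per item)
print(total_cost_us(1000, 50, 500, 100))   # 110000 (20 batched calls)
```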
Principles from Connie Smith(2)
Centering Principle:
  The folkloric "80-20" rule: work on the key problem
Instrumenting Principle:
  Instrument systems as you build them to enable measurement and analysis of workload scenarios, resource requirements, and performance goal achievement. This is the foundation of our performance analysis and improvement.
Four software bottleneck mitigation techniques from Peter Tregunno
Practical Analysis of Software Bottlenecks (thesis of Peter Tregunno)
Four software bottleneck mitigation techniques:
1) increasing the threading level of the bottleneck task
2) replicating the bottleneck task and its processor
3) reducing the phase-one processor demand of the bottleneck task
4) decreasing the number of interactions the bottleneck task has with lower layers
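A rough capacity estimate shows how each technique acts on a software bottleneck. This is a simplified sketch, not Tregunno's LQN analysis. It assumes the task's throughput is limited either by its threads or by its own processor. Each thread is held for the phase-one CPU demand plus the time spent blocked on lower-layer calls. The sketch ignores phase-two work and any change in queueing at the lower layers. The `Task` fields and all parameter values are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Task:
    threads: int          # threading level of the task
    replicas: int         # copies of the task, each on its own processor
    phase1_cpu_ms: float  # phase-one CPU demand per request
    lower_calls: float    # mean synchronous calls to lower layers per request
    lower_resp_ms: float  # mean response time of one lower-layer call

    def max_throughput(self) -> float:
        """Upper bound in requests/s: the tighter of the thread and processor limits."""
        holding_ms = self.phase1_cpu_ms + self.lower_calls * self.lower_resp_ms
        thread_bound = 1000.0 * self.threads * self.replicas / holding_ms
        cpu_bound = 1000.0 * self.replicas / self.phase1_cpu_ms
        return min(thread_bound, cpu_bound)

base = Task(threads=2, replicas=1, phase1_cpu_ms=2.0, lower_calls=3, lower_resp_ms=6.0)
variants = {
    "base": base,
    "1) more threads": replace(base, threads=4),
    "2) replicate task+processor": replace(base, replicas=2),
    "3) less phase-one CPU": replace(base, phase1_cpu_ms=1.0),
    "4) fewer lower-layer calls": replace(base, lower_calls=1),
}
for name, t in variants.items():
    print(f"{name:28s} {t.max_throughput():7.1f} req/s")
```

In this toy case, technique 3 helps little because the thread holding time is dominated by waiting on lower layers. Raising the thread count stops helping once the processor bound is reached.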