Performance Engineering
Bob Dugan, Ph.D.
Computer Science Department
Rensselaer Polytechnic Institute
Troy, New York 12180
The Nightmare Scenario
- Product pre-sold by marketing as carrier-scalable
- Demos are flashy, fast, and successful
- Product is supposed to ship to big-name customers like GM, Fidelity, and AT&T a week after QA
- During QA the product is performance tested
- Performance tests uncover serious scalability problems
- Analysis shows a fundamental architecture flaw
- Months of redesign and testing are necessary to fix it
Overview
- Background
- Methodology
- Resources

Incorporate performance into software's entire life cycle to achieve performance goals.
Background
What is software performance?
- Response Time
- Throughput
- Resource Utilization
Background: Response Time
- How long does it take for a request to execute?
- Example: a web page takes 100 ms to return to the browser after a request.
- Interactive applications require response times of 2000 ms or less.
- Response time tells us a lot about how the system is performing.
- Response time has a big impact on the holy grail of performance: THROUGHPUT.
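
A minimal sketch of how a single request's response time might be measured, in Perl (the language of the deck's instrumentation examples); fetch_page here is a hypothetical stand-in for the operation under test:

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical request under test: simulate ~100 ms of work.
sub fetch_page { select(undef, undef, undef, 0.1); }

my $start = [gettimeofday];
fetch_page();
my $elapsed_ms = tv_interval($start) * 1000;
printf "response time: %.1f ms\n", $elapsed_ms;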
Background: Throughput
- How many requests per second can be processed?
- Example: a server has a throughput of 30 requests/sec
  - Supports roughly 1 million requests per 10-hour day
  - Assume the average user makes 10 requests/day
  - The server will support approximately 100,000 users
- Inverse of response time on a lightly loaded system.
- Combined with a user model, throughput can be used for performance requirements, capacity planning, sales, and marketing.
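
The example's arithmetic as a small Perl sketch (the 10-hour day and 10 requests per user per day are the slide's assumptions):

use strict;
use warnings;

my $throughput    = 30;     # requests/sec
my $hours_per_day = 10;     # business day from the example
my $reqs_per_user = 10;     # assumed requests per user per day

my $reqs_per_day = $throughput * $hours_per_day * 3600;  # 1,080,000
my $users        = $reqs_per_day / $reqs_per_user;       # 108,000

printf "%d requests/day -> roughly %d users supported\n", $reqs_per_day, $users;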
Background: Resource Utilization
- Resources consumed by code processing a request.
- Examples: CPU, memory, network, disk
- In a closed system, as load increases:
  - Throughput rises linearly
  - Resources are consumed
  - Response time remains near constant
- When a resource is completely consumed:
  - Throughput remains constant
  - Resource utilization remains near constant
  - Response time rises linearly with load
Background: Resource Utilization

Virtual Users | Response Time | Throughput | CPU Utilization
            1 |        100 ms | 10 req/sec | 25%
            2 |        110 ms | 19 req/sec | 53%
            4 |        130 ms | 38 req/sec | 96%
            8 |        300 ms | 37 req/sec | 98%
           16 |        640 ms | 39 req/sec | 99%

- Resource utilization is critical to determining throughput/response time relationships.
- During performance testing, resource utilization helps identify the cause of a performance problem.
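
A back-of-the-envelope reading of the table (an inference from the numbers above, not a figure from the deck): one virtual user drives 10 req/sec at 25% CPU, so each req/sec costs roughly 2.5% CPU, predicting a ceiling of

    100% CPU / 2.5% CPU per req/sec = 40 req/sec

which matches the observed plateau of 37-39 req/sec once the CPU saturates.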
Performance Engineering Methodology
Incorporate performance into software's entire life cycle to achieve performance goals.
Software Life Cycle
- Requirements
- Specification
- Design
- Implementation
- Integration
- Test
- Release
- Maintenance
Requirements
- Functional requirements identified.
- What are the performance requirements?
- Do any functional requirements interfere with performance requirements?
Performance Requirements
- What is the capacity planning guide for the system?
- How much is a customer willing to pay for performance and scalability?
  - Hardware
  - Software licensing (e.g. OS, Oracle, etc.)
  - System administration
Example: Internet Bank
- View accounts
- Search for a specific transaction
- Transfer money between accounts
- Export account to Quicken
- 10 million potential users
Performance Model
Make some assumptions (refine later):
- Three-tier system: browser, web farm, database server
- Database updated nightly with the day's transactions (i.e. read-mostly)
- User logs in once per 5-day work week, between 8AM-6PM EST
- Logins evenly distributed
- Typical user does 3 things, then logs off
- About 20% of customers will actually use online banking
Performance Model
10,000,000 users x 20% adoption rate   = 2,000,000 users/week
2,000,000 users x 3 requests per user  = 6,000,000 requests/week
6,000,000 / 5-day work week            = 1,200,000 requests/day
1,200,000 / 10-hour day                = 120,000 requests/hour
120,000 / 60 minutes per hour          = 2,000 requests/minute
2,000 / 60 seconds per minute          = 33 requests/second
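
The same model as a small Perl sketch, so each assumption can be refined later by changing one line (all figures are the slide's assumptions):

use strict;
use warnings;

my $potential_users = 10_000_000;
my $adoption        = 0.20;   # fraction who actually use online banking
my $reqs_per_visit  = 3;      # typical user does 3 things, then logs off
my $days_per_week   = 5;
my $hours_per_day   = 10;     # logins spread evenly over 8AM-6PM

my $reqs_per_week = $potential_users * $adoption * $reqs_per_visit;
my $reqs_per_sec  = $reqs_per_week / ($days_per_week * $hours_per_day * 3600);

printf "peak demand: %.0f requests/second\n", $reqs_per_sec;   # ~33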
Performance Requirements
- The customer wants to pay as little as possible for the system hardware.
- Your company wants the system to perform well, but there's a development cost.
- YOU must find the balance.
- What are reasonable service times and throughput for the web and database servers?
Performance Requirements

Description           | Time      | Throughput
Web/App Service Time  | < 1000 ms | 1 req/sec per processor
Database Service Time | < 100 ms  | 10 req/sec per processor
Total Response Time   | < 1100 ms | 33 req/sec
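
These targets imply a first-cut hardware estimate (an inference from the table, not a figure from the deck): at 1 req/sec per web/app processor, 33 req/sec calls for roughly 33 web/app processors, and at 10 req/sec per database processor it calls for about 4 database processors.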
Requirements
Goal: Identify and eliminate performance problems before they get into the functional/design/UI specifications.
Functional/Design/UI
Goal: Eliminate performance problems before writing a line of code.
Example:
- Requirements say that users should be able to search account activity using any combination of activity fields (e.g. date, payee, amount, check #).
- The functional/design specification describes an ad-hoc query mechanism, with pseudocode, that lets users conduct this search using a single database query.
- Performance analysis of a prototype ad-hoc query shows a throughput of 2 req/sec at 100% CPU utilization on a two-processor database server: a tenth of the required 10 req/sec per processor, caught before implementation.
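
A minimal sketch of what such an ad-hoc query mechanism might look like (hypothetical; the deck shows no code for it), building one parameterized query from whichever fields the user supplied:

use strict;
use warnings;

# Hypothetical helper: assemble a single parameterized SQL query
# from any combination of search fields.
sub build_activity_query {
    my (%criteria) = @_;    # e.g. (payee => 'ACME', check_no => 1041)
    my (@where, @binds);
    for my $field (sort keys %criteria) {
        push @where, "$field = ?";
        push @binds, $criteria{$field};
    }
    my $sql = "SELECT * FROM activity";
    $sql .= " WHERE " . join(" AND ", @where) if @where;
    return ($sql, @binds);
}

my ($sql, @binds) = build_activity_query(payee => 'ACME', check_no => 1041);
print "$sql\n";   # SELECT * FROM activity WHERE check_no = ? AND payee = ?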
Prototyping
- A great time to play
- Investigate competing architectures
- Don't forget performance!
Example: HTML tag processing engine for the Internet Bank
- Initial performance analysis showed 5 tags/sec at 100% web server CPU, with a dependency on page size.
- A second iteration improved this to 20 tags/sec. Still too slow! The allotted service time was completely consumed by tag processing.
- A third iteration reached 60 tags/sec, with no page-size dependency.
Implementation
Goal: Identify and eliminate performance problems before they are discovered in QA.
- Long duration
- Break into drops
- Performance assessment of each drop
- Track progress
- A maturing system increases in complexity and jeopardizes performance
- Use instrumentation!
Instrumentation
- Code must be instrumented by development
- Allows self-tuning
- Provides an execution trace for debugging
- Aids performance analysis in the lab
- Useful for monitoring the application in production
Example: Instrumentation
Sample code:

sub unitTest {
    eCal::Metrics->new()->punchIn();
    my $tableName;
    my $result = tableSelect("users");
    print $result."\n";
    eCal::Metrics->new()->punchOut();
}

Activating instrumentation:

eCal::Metrics->new()->setEnabled("true");
eCal::Metrics->new()->setShowExecutionTrace("true");
unitTest;
Sample instrumentation output
PUNCHIN eCal::Metrics::TableStatisticsDB::unitTest []
|PUNCHIN eCal::Metrics::TableStatisticsDB::tableSelect []
||PUNCHIN eCal::Oracle::prepare []
||PUNCHOUT eCal::Oracle::prepare [] 131.973028182983 msecs
|PUNCHOUT eCal::Metrics::TableStatisticsDB::tableSelect [] 642.809987068176 msecs
PUNCHOUT eCal::Metrics::TableStatisticsDB::unitTest [] 643.355011940002 msecs
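
eCal::Metrics is the author's internal library, and its implementation is not shown in the deck. A minimal sketch of the same idea, nested timers printing an indented trace, might look like this (the punchIn/punchOut names mirror the example; the body is an assumption):

package Metrics;   # hypothetical stand-in for eCal::Metrics
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my @stack;   # currently open punchIn frames

sub new { my $class = shift; return bless {}, $class }

sub punchIn {
    my $caller = (caller(1))[3] // 'main';      # sub that called punchIn
    print '|' x scalar(@stack), "PUNCHIN $caller []\n";
    push @stack, [$caller, [gettimeofday]];
}

sub punchOut {
    my ($caller, $start) = @{ pop @stack };
    my $msecs = tv_interval($start) * 1000;
    print '|' x scalar(@stack), "PUNCHOUT $caller [] $msecs msecs\n";
}

1;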
Testing
Goal: Identify and eliminate performance problems before they get into production.
- Performance testing and analysis must occur throughout development!!!
- In late-cycle QA, performance testing should be a formality with no surprises.
- A surprise at this point will delay the product release or potentially kill the product.
Maintenance
Goal: Identify and eliminate performance problems before they are detected by users.
- Management console for resource monitoring
- Metrics pages
- Instrumentation
Conclusion
Incorporate performance into software's entire life cycle to achieve performance goals.
Resources: Books
- Smith/Williams, "Software Performance Engineering"
- Jain, "The Art of Computer Systems Performance Analysis"
- Tanenbaum, "Modern Operating Systems"
- Elmasri/Navathe, "Fundamentals of Database Systems"
- Baase, "Computer Algorithms: An Introduction to Design and Analysis"
Resources: Software
- Resource monitoring:
  - Task Manager, Perfmon
  - sar/iostat/netstat/stdprocess, SE Toolkit
  - BMC Best/1, HP OpenView, Precise Insight
- Load generation:
  - LoadRunner, SilkPerformer
  - WebLoad
- Automated instrumentation:
  - NuMega TrueTime, JProbe
  - tkprof, Explain Plan, Precise InDepth for Oracle
Resources: Literature/Web
- www.perfeng.com - Dr. Connie Smith's website
- www.spec.org - benchmarks for computer hardware
- www.tpc.org - benchmarks for databases
- Computer Measurement Group - annual conference in December
- Workshop on Software and Performance - semi-annual conference in late summer/early fall
- ACM SIGMETRICS - annual conference in early summer
- ACM SIGSOFT/SIGMETRICS publications - periodically feature papers on performance engineering
Case Studies
Case Study: Microsoft VBScript
- Website uses IIS, Microsoft ASP, and VBScript
- A critical page takes 3000 ms and is CPU bound
- Instrumentation shows 2500 ms spent in a single subroutine
- The subroutine executes just before the HTML is returned to the browser
- The HTML page is approximately 64K

resp = resp & "<ul>"
I = 0
Do While I < MAX
    resp = resp & "<li>List Element " & I & oneKString
    I = I + 1
Loop
resp = resp & "</ul>"
Case Study: Microsoft VBScript

MAX   | Response Time | Average Time per Iteration
10    | 10 ms         | 1 ms
100   | 800 ms        | 8 ms
1000  | 50,000 ms     | 50 ms
10000 | 2,000,000 ms  | 200 ms

- The more the loop iterates, the longer each iteration takes.
- VBScript does not support in-place string concatenation.
- Each string operation results in a malloc(), copy, and free whose cost depends on the current size of the HTML string.
- Why is that so bad?
Case Study: Microsoft VBScript
Appending the nth one-K chunk forces a malloc() and copy of everything built so far, so the cost of the nth append is proportional to n. The total cost of n appends is:

    Sn  = 1 + 2 + ... + (n-1) + n
    Sn  = n + (n-1) + ... + 2 + 1
    2Sn = (n+1) + (n+1) + ... + (n+1) + (n+1) = n(n+1)
    Sn  = n(n+1)/2

Building the page is O(n^2), which is why the average time per iteration grows with MAX.
Case Study: Microsoft VBScript
Solutions?
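
The deck leaves the fix as a question. One standard remedy is to collect the pieces and concatenate once at the end, so each byte is copied only once; a sketch of that technique in Perl (the language of the deck's other samples; the VBScript equivalent builds an array and calls Join):

use strict;
use warnings;

my $MAX        = 1000;
my $oneKString = 'x' x 1024;

# Collect the pieces instead of growing one string:
# each push is cheap, and the single join copies every byte once.
my @parts = ('<ul>');
for my $i (0 .. $MAX - 1) {
    push @parts, "<li>List Element $i$oneKString";
}
push @parts, '</ul>';
my $resp = join '', @parts;   # O(n) total instead of O(n^2)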