Presentation
Download
Report
Transcript Presentation
TOP N PERFORMANCE TIPS
Adam Backman
[email protected]
Partner, White Star Software
Components of Performance
Application
Network
Disk
Memory
CPU
Wise men once said…
The best administrator in the world can be easily
defeated by incompetent programmers
-Adam Backman (2014)
No amount of tuning can defeat crap code
-Tom Bascom (2014)
Application Issues
Use indexes (Not: USE-INDEX)
Use NO-LOCK for reporting
Use NO-UNDO on variables
Use TEMP-TABLES
Use FIELD-LIST
Move Client/Server applications to use OpenEdge®
AppServers™
Methodology
After correcting obvious application issues you
want to push the performance bottleneck to the
fastest resource
Changes should be made one at a time so the
effect of each change can be measured prior to
the next change
The goal is to push the bottleneck to the next
faster resource
Network Disk Memory CPU
OpenEdge Architecture
OpenEdge Architecture
− Shared memory
− Server-less
− Multi-server
Networking
− Primary broker
− Secondary broker
OpenEdge Memory Architecture
Remote
Client
Shared Memory
Broker
Server
Background Processes
Database
7
OpenEdge Network Architecture
• Primary broker (–Mn –Mpb –Mi –Ma)
• Splitting clients across servers
• Single pool of servers for all users
• Secondary broker (–Mpb –Mi –Ma)
• Splitting clients across brokers
• Multiple pools of servers
•
•
•
Call Center users
General users
Reports…
OpenEdge Architecture
Client/Server Overview
• The OpenEdge Server
– A process that accesses the database for 1 or more remote
clients
Shared Memory
SQL-92
Server
4GL
Server
Database
9
Network I/O
Sources of Network I/O
− Temporary files
-T pointing to network
Users’ home directories on network
− Poorly constructed queries
Lock requests
Poor use of indexes
No use of field list on large tables
Solutions
− -T to local drive (C:\TEMP or /tmp)
− Use of AppServer for read-intensive queries
− Use –pls with program libraries stored on the network
Disks
This is where to spend your money
Goal: Use all disks evenly
Buy as many physical disks as possible
RAID 5 is still bad in many cases, improvements
have been made but test before you buy as there
is a performance wall out there and it is closer with
RAID 5
Planning – Careful Planning
Causes of Disk I/O
Database
− User requests (reads) - Usually 90% of total load
− Updates (This affects DB, BI and AI)
Temporary file I/O
Operating system - usually minimal provided
enough memory is installed
Other I/O
− Reporting to disk
− Other application file
− Archiving
Operating System Storage Considerations
Use RAID 10
Avoid RAID5 (There are exceptions but not many)
Use large stripe widths
Match OpenEdge and OS block size
Operating system Storage
Use RAID 10 (0+1) or Mirroring and Striping for
best protection of data with optimal performance
for the database
For the AI and BI RAID 10 still makes sense in
most cases. Exception: Single database
environments
RAID 10 vs. RAID 5 cache fill rate
fillTime = cacheSize / (requestRate – serviceRate)
• 4 disks
• RAID10 vs RAID5
• 4KB db blocks
• 4GB RAM cache (1048576 blocks)
Typical Production DB Example:
4GB / ( 200 io/sec – 800 io/sec ) = cache doesn’t fill!
Heavy Update Production DB Example:
4GB / ( 1200 io/sec – 800 io/sec ) = 2621 sec. (≈ 44 min.) (RAID10)
4GB / ( 1200 io/sec – 200 io/sec ) = 1049 sec. (≈ 17 min.) (RAID5)
Maintenance Example:
4GB / ( 5000 io/sec – 3200 io/sec ) = 583 sec. (≈ 10 min.) (RAID10)
4GB / ( 5000 io/sec – 200 io/sec ) = 218 sec (≈ 4 min.) (RAID5)
Database Storage Considerations
Database blocks
Type II storage areas
Database block size
Records per block
Fixed vs. variable extents
Enable large files
Database Blocks
Fixed extent
Cluster
Database block
Variable extent
Not yet
allocated
by O/S
Filled
Partly filled
Free
Not yet allocated
Block Layout
Block’s DBKEY
Type
Next DBKEY in Chain
Num
Dirs.
Free
Dirs.
Rec 2 Offset
Chain Backup Ctr
Block Update Counter
Free Space
Rec 0 Offset
Rec n Offset
Free Space
Used Data Space
Rec 1 Offset
Block’s DBKEY
Type
Next DBKEY in Chain
Top
Chain Backup Ctr
Block Update Counter
Bot
Index No.
Reserved
Num Entries
Bytes Used
Dummy Entry . . .
. . . Compressed Index Entries . . .
…….
row 1
. . . Compressed Index Entries . . .
row 2
Free Space
row 0
Data Block
Index Block
Performance Tuning
General tuning methodology
Get yourself in the ballpark
Get baseline timings/measurements
Change one thing at a time to understand value of each
change
This is the only thing where everyone agrees 100%
Remember: Tuning is easy just follow our simple plan
Type I Storage Areas
Data blocks are social
− They allow data from any table in the area to be stored
within a single block
− Index blocks only contain data for a single index
Data and index blocks can be tightly interleaved
potentially causing scatter
Type II Storage Areas
Like data is clustered together
A cluster will only contain records from a single
table
A cluster can contain 8, 64 or 512 blocks
Data scatter is reduced
Universally better performance
− Disk array read-ahead
− Better locality of data
− Better value per physical I/O
Type II Clusters
Fixed Extent
Cluster
Customer
Cluster
Order
Cluster
Order Index
Storage Areas Compared
Type I
Type II
Data Block(1)
Data Block(1)
Idx Block(1)
Data Block(3)
Data Block(2)
Data Block(1)
Idx Block(2)
Idx Block(1)
Data Block(3)
Idx Block(2)
Data Block(1)
Data Block(1)
Data Block(1)
Data Block(1)
Data Block(1)
Data Block(1)
Data Block(1)
Data Block(1)
Idx Block(1)
Idx Block(1)
Database Block Size
Generally, 8k works best for Unix/Linux
4k works best for Windows
Remember to build file systems with larger block
sizes (match if possible)
There are exceptions so a little testing goes a long
way but if in doubt use the above guidelines
Determining Records per Block
Determine “Mean” record size
− Use proutil <dbname> -C dbanalys
Add 20 bytes for record and block overhead
Divide this product into your database block size
Choose the next HIGHER binary number
− Must be between 1 and 256
Example: Records /Block
Mean record size = 90
Add 20 bytes for overhead (90 + 20 = 110)
Divide product into database blocksize
− 8192 ÷ 110 = 74.47
Choose next higher binary number 128
Default records per block is 64 in version 9
and 10
OpenEdge Memory Considerations
-B buffer pool
-B2 alternate buffer pool
BI cluster size
Before image buffers
Page writers
-spin
General Memory Rule - Locality of Reference
When data is referenced there is a high probability
that it will be referenced again soon
If data is referenced there is a high probability that
“nearby” data will be referenced soon
Locality of reference is why caching exists at all
levels of computing
OpenEdge Buffer Pool (-B)
OpenEdge buffer pool provides a way to reduce
operations to physical disks
A 90% buffer hit rate denotes 1 in 10 requests
being passed to disk
At 95% 1 in 20 requests are passed to disk
At 99% only 1 of each 100 requests are passed to
disk
At the high percentages small increases in buffer
hit percentage significantly reduce I/Os to disk
OpenEdge Alternate Buffer Pool (-B2)
An excellent way to reduce/eliminate in-memory overhead
associated with buffer pool operations
Tables that are candidates for the –B2 are:
− Fixed size (not rapidly growing)
− Very high utilization (master records)
The –B2 must be set larger than the storage requirement
of the tables and indexes that are placed in this structure
− Use a database analysis to determine size
− I generally add 10% for good measure
What is a Checkpoint?
A synchronization point between memory and disk
− The initial checkpoint schedules modified buffers (-B)
to be written prior to the next checkpoint
− Subsequent checkpoints first check to make sure all of
the previously scheduled buffers have been written and
then schedules
The goal is to have no buffers to be written at
checkpoint
Checkpoint Efficiency
The before image cluster size determines the
frequency of checkpoints
The ability APWs to write all of the modified blocks
determines the efficiency
Buffers flushed at checkpoint can be caused by a
disk bottleneck as well as improper settings of the
cluster size and number of APWs
If changing the number of APWs does not reduce
or eliminate buffers flushed at checkpoint it is
likely a disk issue
Checkpoint Efficiency
Goals:
− 1 Checkpoint per 120 seconds (2 min.) during highest human
update period of the day (BI cluster size)
− Start with a single APW and add APWs until you can get through
the highest update period of the day without buffers flushed at
checkpoint
− If you already have several APWs then look at writes by BIW and
reduce the number of APWs one at a time
Writes by BIW should increase
Partial writes should reduce
Still no buffers flushed at checkpoint
Before Image Buffers -bibufs
Buffers used by BIW (and APWs) to do the writes
to the before image file
The default value is too low
Generally, I set it to 50 for small to medium
systems and 150 for large systems
Additionally, I generally set –aibufs to the same
value as -bibufs
Page Writers
Every database that has updates should have a
Before image writes (BIW)
Every database that has after imaging journaling
enabled should have a after image writer (AIW)
As stated before:
− Start with a single asynchronous page writer (APW)
− After your cluster size is correct (1 CP/120 sec.)
− Add additional APWs to eliminate writes at checkpoint
in the activity screen (option 5) in promon
Setting –spin
The –spin parameter determines the number of
retries a client will attempt while getting shared
memory resources
The vast majority of installations can set it
between 2000 and 10000
Others will need to do in-depth analysis and these
sights are generally very high throughput
Putting highly read static tables in –B2 can provide
additional performance without changing -spin
Conclusion
Move the bottleneck to the next faster resource
Setup disks for optimal performance
−
−
−
−
No, RAID 5
Yes, RAID 10
Large stripe width/chunk size
Match DB and OS block size
Database block size (generally 8k blocks are best)
Remember to change number of records per block
Conclusion - continued
Use type II storage areas
Get the best buffer hit rate that you can (95%+)
High read tables with little or no growth could go in –B2
One checkpoint per 2 minutes at peak (human peak)
APWs, start with 1 and add as needed
No buffers flushed at checkpoint
Use –spin generally between 2000 and 10000 works for
most
Questions?