Code Analyzer - Active Error
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Maximizing Your SPARC T5 Oracle Solaris Application Performance
Darryl Gove
Senior Principal Software Engineer
Program Agenda
Hardware
Correctness
Performance
Parallelism
Oracle Solaris Studio
Compiler Suite
Analysis Suite
C, C++ Compilers utilize advanced code generation technology to optimize apps for highest performance on SPARC & x86
Performance Analyzer provides unparalleled insight into your app, allowing you to identify bottlenecks and improve performance by orders of magnitude
Fortran Compiler optimizes compute-intensive app performance
Code Analyzer (new) ensures app reliability by detecting app vulnerabilities, including memory leaks and memory access violations
Debugger ensures app stability with event handling & multi-thread support
Performance Library maximizes compute-intensive app performance using advanced numeric solver libraries
Thread Analyzer simplifies complex parallel programming errors by detecting hard-to-pinpoint race and deadlock conditions
Integrated Development Environment increases developer efficiency
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
Oracle Solaris Studio 12.3 Highlights
Accelerate Performance
Gain Extreme Observability
Improve Productivity
5x faster code on SPARC T5
1.5x faster code on Intel x86
New Code Analyzer for more reliable applications; reports common coding & memory access errors faster than competitive alternatives
Enhanced Performance Analyzer with system-wide performance analysis
Remote access to Solaris Studio tools from local desktop (Oracle Solaris, Linux, Microsoft Windows, Mac)
Streamlined Oracle DB application development
Simplify Oracle Tuxedo development with IDE plug-in
IPS distribution on Solaris 11 for simplified management
20% faster compile time
Oracle Solaris Studio 12.3, 1/13 PSE
Delivers compiler optimisations resulting in the fastest code on the new Oracle T5, Oracle M5 and Fujitsu M10 systems
Up to 5x faster than GCC
Up to 10% faster than Oracle Solaris Studio 12.3
IPS or SVR4 package update to Oracle Solaris Studio 12.3
Available for customers with the Solaris Development Tools Support contract
More information: Article ID 1519949.1 on My Oracle Support
SPARC T5 Hardware
SPARC T5 - Overview
Enhancement over T4
More threads
Faster clock speed
Larger third level cache
T5 and T4:
Unrelated to T1 – T3 (only share the T-series name)
Enhanced multithread throughput
Enhanced single thread performance
SPARC T5 - Details
1 to 8 chips per system
16 cores per chip
Dual issue
Out-of-order
8 threads per core
3.6 GHz clock
115 B instructions / sec / chip (3.6 GHz * 16 cores * 2 instructions per cycle)
SPARC T5 - Capacity
Chip capacity: 115 B instructions / sec
For fully active threads:
Single thread: 7.2 B instructions / sec
Each of eight threads: 0.9 B instructions / sec
Threads rarely fully active:
I/O wait
Processor stall (fetch from memory = 300-400 cycles)
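The capacity figures above follow from the clock speed, core count and issue width quoted on the slides. A minimal sketch of that arithmetic, assuming an even split of a core's issue slots between its fully active threads; the function names are hypothetical:

```c
#include <assert.h>

/* Peak instruction rate for one chip, in billions of instructions per
   second: clock (GHz) * cores * instructions issued per cycle. */
double peak_chip_rate(double clock_ghz, int cores, int issue_width)
{
    assert(cores > 0 && issue_width > 0);
    return clock_ghz * cores * issue_width;
}

/* Peak rate available to each fully active thread on one core, assuming
   the core's issue slots are shared evenly between the active threads. */
double per_thread_rate(double clock_ghz, int issue_width, int active_threads)
{
    assert(active_threads > 0);
    return clock_ghz * issue_width / active_threads;
}
```

peak_chip_rate(3.6, 16, 2) gives 115.2, per_thread_rate(3.6, 2, 1) gives 7.2 and per_thread_rate(3.6, 2, 8) gives 0.9, matching the slide. These are upper bounds; I/O waits and memory stalls reduce what a real thread achieves.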
Developing for T5
Make it correct
Remove obvious performance issues
Make it scale (correctly)
Application Correctness
Debug information
Always use -g
No optimisation flags:
Full debug
Lower performance
Optimised binaries:
Best effort debug
No/minimal performance impact
Debug what you ship!
Automatic Error Detection
Static/compile time error detection
Code Analyzer
Dynamic/runtime memory access error detection
Discover
Code Analyzer
Static analysis for common coding errors
Uninitialised variables, etc.
Compile with:
-xanalyze=code
View results with:
code-analyzer <a.out>
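As an illustration of the "uninitialised variables" class of error, here is a hypothetical function (not from the talk) written so that the variable is always set before it is read. Declaring `largest` without an initialiser and reading it in the loop would be the kind of defect static analysis is designed to catch; exactly what Code Analyzer reports for it is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest value in data[0..len-1].
   Build for analysis with the flags from the slide, for example:
   cc -g -xanalyze=code max.c   (max.c is a hypothetical file name) */
int max_value(const int *data, size_t len)
{
    assert(data != NULL && len > 0);
    int largest = data[0];   /* initialised before the loop reads it */
    for (size_t i = 1; i < len; i++) {
        if (data[i] > largest) {
            largest = data[i];
        }
    }
    return largest;
}
```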
Code Analyzer – example output
Memory Error Detection - discover
Common memory allocation and use errors:
Uninitialised memory
Access past bounds
Memory leaks
Usage:
discover <a.out>
<a.out>
Default = html output
Example of discover
$ ./a.out
ERROR 1 (ABR): reading memory beyond array bounds at address 0xffbff278 (8 bytes) on the stack at:
    average() + 0x228 <disc.c:8>
         6:    for (int i=1; i<=len; i++)
         7:    {
         8:=>    total+=array[i];
         9:    }
    _start() + 0xd8
...
double array[20];
...
printf(" Average = %f\n", average(array,20) );
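The report points at the loop bounds: with `i` running from 1 to `len` inclusive, the last iteration reads `array[len]`, one element past the end of a 20-element array. A corrected sketch follows; only the loop appears on the slide, so the signature and the final division are assumptions.

```c
#include <assert.h>

/* Average of array[0..len-1]. Indices run from 0 to len-1, so the loop
   never reads array[len], which is the access discover reported. */
double average(const double *array, int len)
{
    assert(array != 0 && len > 0);
    double total = 0.0;
    for (int i = 0; i < len; i++) {
        total += array[i];
    }
    return total / len;
}
```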
Application Performance
Optimisation – the Basics
No optimisation flags == no optimisation
Good optimisation: -O
Advanced optimisations:
Guided by profile of application
Knowledge of deployment systems
Profiling
Profiling with the performance analyzer
collect <a.out>
collect -P <pid>
analyzer test.1.er
Report generation with spot
spot <a.out>
spot -P <pid>
Performance Analyzer
Demo
Aggressive Optimisation
One stop flag: -fast
Enables multiple optimisations
Build machine = deployment machine
Floating point simplification and optimisation
Pointers to different types do not alias
Function inlining
Investigate performance gain
Profile Drives Flag Selection
Floating Point
Significant time in floating point computation:
Floating point simplification
-fsimple=2
Significant time in floating point library code:
Optimised floating point libraries
-xlibmopt, -xlibmil
Use FP optimisations if performance improves and FP optimisations are acceptable
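Whether floating point simplification is "acceptable" depends on whether the application tolerates results that change when operations are regrouped. The sketch below shows the underlying effect with two hypothetical functions that add the same values in different orders. It demonstrates that floating point addition is not associative; it does not show what any particular flag does to a given loop.

```c
#include <assert.h>
#include <stddef.h>

/* Sum v[0..n-1] from the first element to the last. */
double sum_forward(const double *v, int n)
{
    assert(v != NULL && n >= 0);
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Sum the same values from the last element to the first. */
double sum_backward(const double *v, int n)
{
    assert(v != NULL && n >= 0);
    double total = 0.0;
    for (int i = n - 1; i >= 0; i--) {
        total += v[i];
    }
    return total;
}
```

For the values {1e20, -1e20, 1.0}, sum_forward returns 1.0 and sum_backward returns 0.0, because 1.0 is lost when it is added to -1e20 first. An optimisation that reorders additions can move a program from one answer to the other.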
Profile Drives Flag Selection
Flat profile
Many hot small functions
At least -xO4 optimisation level
-xipo for cross-file optimisations
Conditional code or inlining
Profile feedback
-xprofile=collect:
Training run of application
-xprofile=use:
Profile Drives Flag Selection
Pointers
Pointers inhibit compiler optimisations
Compiler needs more information
restrict qualified pointers in C
Localised action
Flags:
-xrestrict (restrict qualified pointers passed into functions)
-xalias_level=std [C]
-xalias_level=compatible [C++]
Actions at file level
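The localised action, restrict-qualified pointers, can be sketched as follows. The function is a hypothetical example and needs C99 or later.

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] = a[i] + b[i]. The restrict qualifier on dst promises the
   compiler that the stores through dst do not overlap the memory read
   through a or b, so it is free to reorder or combine the loads and
   stores. Calling this with dst overlapping a or b is undefined. */
void vec_add(double *restrict dst, const double *restrict a,
             const double *restrict b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The qualifier applies to this one function only, which is what makes it a localised action; the flags listed above make a similar promise for every function in the file.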
Processor Specific Optimisations
Default: -xtarget=generic often good enough
T4/T5 have useful instructions:
Compare and branch
Floating point multiply add
One stop flag: -xtarget=T5
Schedules for T5, uses entire T4 and T5 instruction set
Only runs on T4, T5, or later processors
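A dot product is a typical loop where floating point multiply add can help: every iteration is a multiply whose result is immediately added to an accumulator. The function below is a hypothetical example; whether a compiler actually emits the fused instruction for it depends on the target and optimisation flags chosen, which is an assumption here rather than something the slides show.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of x[0..n-1] and y[0..n-1]. Each iteration computes
   acc + x[i] * y[i], the multiply-then-add pattern that a fused
   multiply add instruction performs in one operation. */
double dot(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}
```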
SPARC Instruction Sets
Multi-threaded Applications
Multi-thread or Multi-process
Multiprocess:
Isolation
Independence
Large virtual memory footprint
Potentially high synchronisation costs
Throughput
Multithread:
Low synchronisation costs
Minimal memory footprint
Latency
Multi-threaded Application Development
POSIX threads (C11, C++11)
Low level: Great control, significant complexity
OpenMP
High abstraction: Easy to use, flexible
Automatic parallelisation
Trivial to use: -xautopar -xreduction
Works best for loop-intensive code (typically FP)
OpenMP Parallel For
Distributes iterations across CPUs
#pragma omp parallel for
for (int i=0; i<length; i++)
{
    // Do work
}
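A complete function built around the same pragma might look like the sketch below. The function name and the work done per iteration are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every element of data[0..length-1] by factor. Each iteration
   touches only data[i], so the iterations are independent and can be
   divided between threads. If the code is built without OpenMP enabled,
   the pragma is ignored and the loop runs serially with the same result. */
void scale(double *data, int length, double factor)
{
    assert(data != NULL || length == 0);
    #pragma omp parallel for
    for (int i = 0; i < length; i++) {
        data[i] *= factor;
    }
}
```

With Oracle Solaris Studio, OpenMP is enabled with -xopenmp; GCC uses -fopenmp.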
OpenMP Tasks
Distributes work across CPUs
for (int i=0; i<length; i++)
{
    #pragma omp task
    {
        // Do work for task 'i'
    }
}
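The fragment above shows only the task-generating loop. For tasks to be shared between threads, the loop has to sit inside a parallel region, and usually inside a `single` construct so that only one thread creates the tasks. A minimal sketch with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-item work: square one element in place. */
static void process_item(int *items, int i)
{
    items[i] = items[i] * items[i];
}

/* Create one task per element; idle threads in the team pick them up. */
void process_all(int *items, int length)
{
    assert(items != NULL || length == 0);
    #pragma omp parallel              /* create the team of threads */
    {
        #pragma omp single            /* one thread generates the tasks */
        {
            for (int i = 0; i < length; i++) {
                #pragma omp task firstprivate(i)
                {
                    process_item(items, i);
                }
            }
        }   /* implicit barrier: all tasks complete before leaving */
    }
}
```

Each task gets its own copy of `i` through `firstprivate`, so a task still sees the right index after the loop has moved on.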
Parallel Program Correctness
Distributes work across CPUs
int total=0;
#pragma omp parallel for
for (int i=0; i<length; i++)
{
    total += i;
}
Data race: Multiple threads updating the same variable
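One common way to remove this race is an OpenMP reduction: each thread accumulates into a private copy of `total`, and the copies are combined when the loop ends. The wrapper function below is a hypothetical sketch around the same loop.

```c
#include <assert.h>

/* Sum of the integers 0 .. length-1. The reduction clause gives each
   thread a private total and adds the private totals together at the
   end of the loop, so no two threads update the same variable. */
int sum_below(int length)
{
    assert(length >= 0);
    int total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < length; i++) {
        total += i;
    }
    return total;
}
```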
Thread Analyzer
Instrument application
Compiler flag: -xinstrument=datarace
Binary instrumentation: discover -i datarace <a.out>
Gather data:
collect -r on <a.out>
View data:
tha tha.1.er
Thread Analyzer - Example
Demo
Scaling to Many Threads
Minimise serial code
Amdahl’s Law
Minimise lock contention
Minimise writes of shared data
Evenly distribute work
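Amdahl's Law makes the first point concrete: if a fraction p of the runtime can be parallelised over n threads, the speedup is 1 / ((1 - p) + p / n). A small sketch, with a hypothetical function name:

```c
#include <assert.h>

/* Amdahl's Law: speedup from running the parallel fraction of a program
   on `threads` threads while the remaining fraction stays serial. */
double amdahl_speedup(double parallel_fraction, int threads)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(threads > 0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}
```

With 95% of the runtime parallelised, 128 threads (one T5 chip: 16 cores * 8 threads) give a speedup of only about 17, which is why the remaining serial code matters so much at high thread counts.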
Scaling to Many Threads
Demo
Limits of Performance
Threads
vmstat
Instruction Issue Width
pgstat / cputrack / cpustat / ripc
Bandwidth
busstat / bw
Conclusion: Optimising for T5
Step 1: Profile and remove inefficient code
Step 2: Explore benefits of increased optimisation
Step 3: Identify opportunities for parallelisation
Step 4: Profile and tune parallel code
Step 5: Watch for hitting hardware limits