Instruction-Based Sampling and AMD CodeAnalyst

ISPASS 2010 poster session
Paul J. Drongowski | March 29, 2010
Instruction-Based Sampling (IBS)
• IBS is supported by AMD Family 10h processors.
• IBS monitors execution activity and fetch activity.
– Select and tag an execution micro-op at the issue stage.
– Retain the address of the parent x86/x86_64 instruction.
– Monitor the tagged op during execution.
– Generate an interrupt when the tagged op retires.
– Profiling software (AMD CodeAnalyst) takes a sample.
• Event attribution is precise because the address of the parent instruction is known and is reported.
• Unlike performance counter sampling (PCS), an IBS profile accurately identifies performance culprits.
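Seen from the profiling software, the steps above amount to charging each sample to exactly one instruction address. The following is a minimal sketch of that idea, not CodeAnalyst's implementation: the `ibs_op_sample` and `instr_profile` records and the `attribute_sample` function are hypothetical names, and the hardware registers that deliver a real sample are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one IBS op sample (not the hardware layout). */
typedef struct {
    uint64_t instr_addr;    /* address of the parent x86 instruction */
    int      is_mem_access; /* nonzero if the op performed a load or store */
    int      dcache_miss;   /* nonzero if the op missed in the data cache */
} ibs_op_sample;

/* Hypothetical per-instruction counters kept by the profiler. */
typedef struct {
    uint64_t instr_addr;
    unsigned retired_ops;
    unsigned mem_accesses;
    unsigned cache_misses;
} instr_profile;

/* Charge one sample to the table entry whose address matches exactly.
 * Returns the entry index, or -1 if the address is not in the table. */
int attribute_sample(instr_profile *table, size_t n, const ibs_op_sample *s)
{
    assert(table != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].instr_addr == s->instr_addr) {
            table[i].retired_ops++;
            if (s->is_mem_access) table[i].mem_accesses++;
            if (s->dcache_miss)   table[i].cache_misses++;
            return (int)i;
        }
    }
    return -1;
}
```

Because the sample carries the parent instruction's address, the lookup is an exact match; no neighboring instruction can be charged by mistake.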
2 | IBS and AMD CodeAnalyst | March 29, 2010
Example: Art benchmark (SPEC CPU2000)
• Art incurs DTLB misses due to long memory strides.
[Figure: layout of the two-dimensional array bus. Rows [0] through [5] are shown, each holding elements [0], [1], …, [87].]
for (ti = 0 ; ti < numf1s ; ti++)
{
    Y[tj].y += f1_layer[ti].P * bus[ti][tj] ;
}
…
bus = (double **)malloc(numf1s*sizeof(double *));
…
bus[i] = (double *)malloc(numf2s*sizeof(double));
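To see why this access pattern stresses the DTLB, the sketch below counts how often consecutive accesses land on a different page when the row index varies fastest, as in the loop over ti above, compared with the column index varying fastest. It is a simplified model under stated assumptions: rows are placed a fixed `row_bytes` apart (the separate malloc calls give no such guarantee), pages are 4096 bytes, and the function names and dimensions are illustrative rather than taken from Art.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size */

/* Modeled byte offset of element [row][col]; assumes rows are row_bytes apart. */
uint64_t elem_offset(size_t row, size_t col, size_t row_bytes)
{
    return (uint64_t)row * row_bytes + (uint64_t)col * sizeof(double);
}

/* Walk one column (fixed col, row = 0..rows-1), like the Art loop over ti,
 * and count accesses that fall on a different page than the previous one. */
size_t page_changes_column_walk(size_t rows, size_t col, size_t row_bytes)
{
    assert(rows > 0);
    size_t changes = 0;
    uint64_t prev = elem_offset(0, col, row_bytes) / PAGE_BYTES;
    for (size_t r = 1; r < rows; r++) {
        uint64_t page = elem_offset(r, col, row_bytes) / PAGE_BYTES;
        if (page != prev) changes++;
        prev = page;
    }
    return changes;
}

/* Walk one row (fixed row, col = 0..cols-1) for comparison. */
size_t page_changes_row_walk(size_t row, size_t cols, size_t row_bytes)
{
    assert(cols > 0);
    size_t changes = 0;
    uint64_t prev = elem_offset(row, 0, row_bytes) / PAGE_BYTES;
    for (size_t c = 1; c < cols; c++) {
        uint64_t page = elem_offset(row, c, row_bytes) / PAGE_BYTES;
        if (page != prev) changes++;
        prev = page;
    }
    return changes;
}
```

With an assumed row spacing of 4096 bytes, walking 8 rows of one column changes page on each of its 7 steps, while walking the 512 doubles of a single row never leaves its page.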
Example: PCS profile
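• This table is the PCS profile for an inner loop in Art.
• Events are not reliably attributed to the culprit instructions. For example, inc eax and add edx,40h access no memory, yet both are charged with memory accesses and cache misses.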
Address  Instruction                         Retired     Mem  Cache  L1 DTLB  L2 DTLB
                                         Instruction  Access   Miss     Miss     Miss
402520   mov esi,dword ptr [_bus]                805     541     65        2        0
402526   mov esi,dword ptr [esi+eax*4]           794     486     78        2        1
402529   fld qword ptr [esi+ebx*8]              1981     641     72        4        1
40252C   mov esi,dword ptr [_f1_layer]         36636   26068   3378      103       87
402532   fmul qword ptr [edx+esi+28h]            673     487     70        4        1
402536   inc eax                                4611    3278    538       12       29
402537   add edx,40h                             610     450     48        3        0
40253A   fadd qword ptr [ecx+edi]                614     442     57        7        0
40253D   fstp qword ptr [ecx+edi]              13515    9262   1076       36        3
402540   mov esi,dword ptr [_numf1s]            7226    4922    530       23        5
402546   cmp eax,esi                             924     560     69        1        2
402548   mov edi,dword ptr [_Y]                  824     529     64        3        1
40254E   jl 402520                               917     605     52        1        0
Example: IBS profile
• This table is the IBS profile for the same inner loop.
• Culprit instructions are clearly identified.
Address  Instruction                         Retired     Mem  Cache  L1 DTLB  L2 DTLB
                                                  Op  Access   Miss     Miss     Miss
402520   mov esi,dword ptr [_bus]               5430    5430     23        0        0
402526   mov esi,dword ptr [esi+eax*4]          5378    5378    530        6        7
402529   fld qword ptr [esi+ebx*8]              5341    5341   5340      180      285
40252C   mov esi,dword ptr [_f1_layer]          5296    5296     45        0        1
402532   fmul qword ptr [edx+esi+28h]           5381    5381   3464       49      113
402536   inc eax                                5328       0      0        0        0
402537   add edx,40h                            5250       0      0        0        0
40253A   fadd qword ptr [ecx+edi]               5353    5353     77        0        0
40253D   fstp qword ptr [ecx+edi]               5323    5323     32        0        0
402540   mov esi,dword ptr [_numf1s]            5355    5355     30        0        0
402546   cmp eax,esi                            5340       0      0        0        0
402548   mov edi,dword ptr [_Y]                 5426    5426     20        0        0
40254E   jl 402520                              5411       0      0        0        0
Information reported by IBS
• A wide spectrum of information is collected in a single experimental run.
• Miss latency, data operand (effective) address and locality flags enable NUMA analysis. [McCurdy/Vetter]
IBS fetch sampling
– Fetch address
– Completion status
– Fetch latency
– Instruction cache miss
– L1 instruction TLB (ITLB) miss
– L2 instruction TLB miss
– Translation page size
IBS op sampling
– Instruction address
– Load / store operation
– Data operand address
– Data cache miss latency
– Data cache miss
– L1 data TLB (DTLB) miss
– L2 data TLB miss
– Misaligned access
– Remote / local access
– Remote / local data source
– Translation page size
– Branch / return operation
– Branch prediction
– Branch taken
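As an illustration of how the op-sampling fields could feed a NUMA analysis, the sketch below averages data cache miss latency separately for local and remote accesses. The `numa_sample` record and the `mean_miss_latency` function are hypothetical stand-ins for however a profiler stores IBS op samples, and the latency unit is assumed to be cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the fields reported with one IBS op sample. */
typedef struct {
    uint64_t data_addr;     /* data operand (effective) address */
    unsigned miss_latency;  /* data cache miss latency (assumed: cycles) */
    int      dcache_miss;   /* nonzero if the access missed in the data cache */
    int      remote;        /* nonzero if the access was remote */
} numa_sample;

/* Mean miss latency over samples that missed in the data cache and whose
 * locality matches want_remote. Returns 0.0 when no sample qualifies. */
double mean_miss_latency(const numa_sample *s, size_t n, int want_remote)
{
    assert(s != NULL || n == 0);
    uint64_t total = 0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].dcache_miss && (s[i].remote != 0) == (want_remote != 0)) {
            total += s[i].miss_latency;
            count++;
        }
    }
    return count ? (double)total / (double)count : 0.0;
}
```

Comparing the two means, or grouping the samples by `data_addr` range, would be one way to spot data structures that are mostly served from a remote node.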
AMD CodeAnalyst™ Performance Analyzer
• CodeAnalyst collects and displays IBS-based profiles.
• IBS data are aggregated into derived event counts.
– A “derived event” is an abstract event defined in terms of one or more hardware flags or a stall/latency count.
– Derived events are treated like counter events.
– This approach allows reuse of existing infrastructure.
• CodeAnalyst is available for Windows and Linux. (Source is available for the Linux version.)
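The aggregation into derived events can be pictured as follows. This is a sketch under assumptions, not CodeAnalyst source: the flag names, the `derived_event` enumeration and `aggregate_sample` are invented for illustration. Each derived event is a predicate over the hardware flags in a sample, and the resulting counts can then be handled like ordinary counter events.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hardware flags carried by one IBS op sample. */
enum {
    FLAG_LOAD         = 1 << 0,
    FLAG_STORE        = 1 << 1,
    FLAG_DC_MISS      = 1 << 2,
    FLAG_L1_DTLB_MISS = 1 << 3,
    FLAG_L2_DTLB_MISS = 1 << 4
};

/* Hypothetical derived events, each defined in terms of the flags above. */
typedef enum {
    EV_ALL_OPS,
    EV_MEM_ACCESS,
    EV_DC_MISS,
    EV_L1_DTLB_MISS,
    EV_L2_DTLB_MISS,
    EV_COUNT
} derived_event;

/* Fold one sample's flags into the derived event counts. */
void aggregate_sample(unsigned flags, unsigned long counts[EV_COUNT])
{
    assert(counts != NULL);
    counts[EV_ALL_OPS]++;                       /* every sample is one op */
    if (flags & (FLAG_LOAD | FLAG_STORE)) counts[EV_MEM_ACCESS]++;
    if (flags & FLAG_DC_MISS)             counts[EV_DC_MISS]++;
    if (flags & FLAG_L1_DTLB_MISS)        counts[EV_L1_DTLB_MISS]++;
    if (flags & FLAG_L2_DTLB_MISS)        counts[EV_L2_DTLB_MISS]++;
}
```

Keeping one count array per instruction address would yield per-instruction columns of the kind shown in the IBS profile.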
Trademark Attribution
AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other
names used in this presentation are for identification purposes only and may be trademarks of their respective owners.
©2010 Advanced Micro Devices, Inc. All rights reserved.