The Return of Synthetic Benchmarks

Ajay M. Joshi (UT Austin), Lieven Eeckhout (Ghent University), Lizy K. John (UT Austin)

January 28, 2008
Laboratory of Computer Architecture, Department of Electrical & Computer Engineering, The University of Texas at Austin

Outline



• The Need for Synthetic Benchmarks
• BenchMaker Framework for Benchmark Synthesis
• Workload Characteristics Used in Synthesis
• Synthetic Benchmark Construction
• Evaluation of BenchMaker
• Applications
• Summary

Benchmark Spectrum

• Complete Application Code
• Application Suites, e.g. SPEC CPU
• Kernel Codes, e.g. Livermore Loops
• Synthetic Benchmarks, e.g. Dhrystone, Whetstone
• Microbenchmarks, e.g. STREAM
• Toy Benchmarks, e.g. Heap sort

Toward complete applications: More Development Effort, Less Scalable, Less Maintainable, More Representative
Toward toy benchmarks: Less Development Effort, More Scalable, More Maintainable, Less Representative

Focus on Simulation Time Reduction

• Benchmark Subsetting [Eeckhout et al., PACT’02] [Vandierendonck et al., CAECW’04] [Phansalkar et al., ISPASS’05] [Eeckhout et al., IISWC’05]
• Statistical Sampling [Conte et al., ICCD’96] [Wunderlich et al., ISCA’03]
• Representative Sampling [Sherwood et al., ASPLOS’02]
• Reduced Input Set [KleinOsowski, CAN’04]
• Statistical Simulation & Synthetic Workloads [Oskin et al., ISCA’00] [Eeckhout et al., ISPASS’00] [Nussbaum et al., PACT’01] [Bell et al., ICS’05]
• Analytical Modeling [Noonburg et al., MICRO’94] [Karkhanis et al., ISCA’04]
• Speedup Simulation [Schnarr et al., ASPLOS’98] [Loh et al., SIGMETRICS’01]

[Figure axes: Benchmark Run Length, Microprocessor Complexity]

Motivation: Benchmarking Challenges

• Using Real-World Applications as Benchmarks
• Proprietary Nature of Real-World Applications
• Single-Point Performance Characterization
• Application Benchmarks are Rigid
• Applications Evolve Faster than Benchmarks
• Benchmark Suites are Costly to Develop, Maintain, and Upgrade
• Studying Commercial Workload Performance
• Early Design Stage Power/Performance Studies

Usefulness of Synthetic Benchmarks Beyond Simulation Time Reduction

Resurgence of Synthetic Benchmarks…..

IEEE Computer, August 2003


Workload Synthesis: Central Idea

Just 40 workload characteristics act as ‘Knobs’ for Changing Program Characteristics across the Application Behavior Space.

Flow: Workload Characteristics → Workload Synthesizer (Workload Synthesis Algorithm) → Synthetic Benchmark → Compile and Execute on Real Hardware, RTL, or an Execution Driven Simulator

Example synthetic instruction sequence:

ADD R1, R2, R3
LD R4, R1, R6
MUL R3, R6, R7
ADD R3, R2, R5
DIV R10, R2, R1
SUB R3, R5, R6
STORE R3, R10, R20
ADD R1, R2, R3
LD R4, R1, R6
MUL R3, R6, R7
ADD R3, R2, R5
DIV R10, R2, R1
SUB R3, R5, R1
BEQ R3, R6, LOOP
SUB R3, R5, R6
STORE R3, R10, R20
DIV R10, R2, R1
…

Modeling Real-World Applications

• Microarchitecture-Independent Workload Profiling
• Modeling Workload Attributes into Synthetic Workload

Flow: Real World Proprietary Workload → Workload Profiler (Binary Instrumentation OR Simulation) → Workload Profile → Workload Synthesizer → Synthetic Benchmark Clone → Experiment Environment (Real Hardware or Execution Driven Simulator)

Workload Profile = Workload Attributes + Distribution of Attribute Values


Workload Characteristics as ‘Knobs’

Category                        Num.   Characteristic
instruction mix                 10     percentage of integer short latency
                                       percentage of integer long latency
                                       percentage of floating-point short latency
                                       percentage of floating-point long latency
                                       percentage of integer load
                                       percentage of integer store
                                       percentage of floating-point load
                                       percentage of floating-point store
                                       percentage of branches
instruction-level parallelism   8      register-dependency-distance: 8 distributions for register
                                       dependencies, i.e. the percentage of dependencies with a
                                       distance of 1 instruction, and of up to 2, 4, 6, 8, 16, 32,
                                       and greater than 32 instructions
data locality                   1      data footprint
                                10     distribution of local stride values
instruction locality            1      instruction footprint
branch predictability           10     distribution of branch transition rate

Capturing The Essence of Workloads



• Attributes to capture inherent workload behavior
  – Data Locality: Dominant strides of static Load/Store
  – Control Flow Predictability: Branch transition rate
• Modeling Locality & Control Flow Predictability
  – Data Locality of Integer, Scientific, and Embedded Workloads effectively modeled using circular streams
  – Replicating transition-rate of static branches
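The slides do not define "circular streams". One plausible reading is a strided walk that wraps around once it has covered its footprint. The sketch below illustrates that reading only; the function name and its parameters (`base`, `stride`, `footprint`, `count`) are hypothetical and are not taken from BenchMaker.

```python
def circular_stream(base, stride, footprint, count):
    """Return `count` addresses that advance by `stride` and wrap inside `footprint` bytes."""
    # Assumption: the stream restarts at `base` after covering `footprint` bytes.
    return [base + (i * stride) % footprint for i in range(count)]


# Hypothetical stream: 4-byte stride over a 16-byte footprint starting at 0x1000.
addresses = circular_stream(base=0x1000, stride=4, footprint=16, count=6)
print(addresses)  # [4096, 4100, 4104, 4108, 4096, 4100]
```

Under this reading, the stride controls spatial locality and the footprint bounds how much data the stream touches before it revisits an address.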

Modeling Data Access Pattern

• Identify streams of data references
• A Stream?
  – Sequence of memory addresses in an arithmetic progression
  – Elements of arrays A, B, and C form 3 streams

    for (ii = 0; ii < N; ii++)
        A[ii] = B[ii] + C[ii];

    Streams: 200, 204, 208, ...   320, 324, 328, ...   404, 408, 412, ...
    Issuing Sequence: 320, 404, 200, 324, 408, 204, ...

• Streams are interleaved and may contain noise
    4, 8, 12, 16, 1, 3, 20, 24, 5, 7, 2, 9, 11, 28, …
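The array example can be reproduced with a few lines of code. This sketch assumes the layout implied by the issuing sequence on the slide (B at 320, C at 404, A at 200, 4-byte elements) and assumes the two loads issue before the store in each iteration. The helper names are hypothetical.

```python
def stream(base, stride):
    """Yield an arithmetic progression of addresses: base, base + stride, ..."""
    addr = base
    while True:
        yield addr
        addr += stride


def interleave(streams, iterations):
    """Issue one reference from each stream per loop iteration, in the given order."""
    issued = []
    for _ in range(iterations):
        for s in streams:
            issued.append(next(s))
    return issued


# Assumed order per iteration: load B[ii], load C[ii], store A[ii].
sequence = interleave([stream(320, 4), stream(404, 4), stream(200, 4)], 3)
print(sequence)  # [320, 404, 200, 324, 408, 204, 328, 412, 208]
```

Viewed as a single address trace, the sequence looks irregular. Viewed per stream, each one is a simple stride-4 progression.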

Extracting Streams



• Reference pattern of static Load / Store Instructions
  – PC-correlated spatial locality
    - Dependence on address referenced by nearby Ld / St
    - Programs with pointer chasing codes
  – PC-correlated temporal locality
    - Dependence on previous address generated by same Ld / St
    - Programs with multidimensional arrays
• Could static Load / Store instructions be natural sources of streams?
• Profile every static Load / Store instruction
  – Number of different strides with which it accesses data
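Per-instruction stride profiling can be sketched as follows. This is a minimal illustration, not BenchMaker's profiler: it assumes a trace of `(pc, address)` pairs is already available from instrumentation or simulation, and the trace contents and PC values are hypothetical.

```python
from collections import Counter, defaultdict


def stride_profile(trace):
    """Map each static load/store PC to a Counter of the strides it exhibits.

    `trace` is an iterable of (pc, address) pairs in program order.
    """
    last_addr = {}
    strides = defaultdict(Counter)
    for pc, addr in trace:
        if pc in last_addr:
            # Stride = difference from the previous address issued by the same PC.
            strides[pc][addr - last_addr[pc]] += 1
        last_addr[pc] = addr
    return dict(strides)


# Hypothetical trace: three static memory instructions at PCs 0x10, 0x14, 0x18.
trace = [(0x10, 320), (0x14, 404), (0x18, 200),
         (0x10, 324), (0x14, 408), (0x18, 204),
         (0x10, 328), (0x14, 412), (0x18, 208)]
profile = stride_profile(trace)
for pc, counts in sorted(profile.items()):
    print(hex(pc), dict(counts))
```

Grouping by PC separates the interleaved streams without any explicit stream detection. The number of distinct keys in each `Counter` is the number of different strides for that instruction.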

Modeling Instruction Level Parallelism

Dependency Distance

ADD R1, R3, R4
MUL R5, R3, R2
ADD R5, R3, R6
LD  R4, (R1)
SUB R8, R2, R1

Read After Write Dependency Distance = 3 (the first ADD writes R1; the LD reads it three instructions later)

Measure Distribution of Dependency Distances: Up to 1, Up to 2, Up to 4, Up to 8, Up to 16, Up to 32, >32
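The distance measurement can be sketched in a few lines. The sketch assumes the first operand of each instruction is the destination register, takes the distance to the most recent writer of each source register, and counts each dependency in exactly one bucket (non-cumulative). The bucket bounds follow the eight categories in the characteristics table; the function names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the first seven buckets; the eighth bucket is ">32".
BOUNDS = [1, 2, 4, 6, 8, 16, 32]


def raw_distances(instructions):
    """Return the RAW distance for every source operand that has an earlier producer.

    `instructions` is a list of (dest, sources) tuples in program order; the
    distance is the difference in program-order position.
    """
    last_writer = {}
    distances = []
    for index, (dest, sources) in enumerate(instructions):
        for reg in sources:
            if reg in last_writer:
                distances.append(index - last_writer[reg])
        if dest is not None:
            last_writer[dest] = index
    return distances


def bucketize(distances):
    """Count distances per bucket: <=1, <=2, <=4, <=6, <=8, <=16, <=32, >32."""
    counts = [0] * (len(BOUNDS) + 1)
    for d in distances:
        counts[bisect_left(BOUNDS, d)] += 1
    return counts


# The five-instruction example from the slide, destination register first.
program = [("R1", ["R3", "R4"]), ("R5", ["R3", "R2"]), ("R5", ["R3", "R6"]),
           ("R4", ["R1"]), ("R8", ["R2", "R1"])]
distances = raw_distances(program)
print(distances)             # [3, 4]
print(bucketize(distances))  # [0, 0, 2, 0, 0, 0, 0, 0]
```

The LD yields the distance of 3 shown on the slide. The SUB also reads R1, which adds a second dependency at distance 4.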

Modeling Control Flow Predictability

• Capture behavior of easy and difficult to predict branches
• Inherent program feature that captures branch behavior
• Transition Rate [Haungs et al., HPCA’00] = # of Taken-Not Taken transitions / # of times executed
• Branches with low transition-rate (easier to predict): TTTTTTTTTN, NNNNNNNNNT
• Branches with high transition-rate (easier to predict): TNTNTNTNTN
• Branches with moderate transition-rate (tougher to predict)
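The transition-rate definition translates directly into code. This sketch assumes a branch's history is given as a string of `T` (taken) and `N` (not taken) outcomes; the function name is hypothetical.

```python
def transition_rate(outcomes):
    """Number of direction changes divided by the number of times the branch executed."""
    if not outcomes:
        return 0.0
    transitions = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return transitions / len(outcomes)


for pattern in ("TTTTTTTTTN", "NNNNNNNNNT", "TNTNTNTNTN"):
    print(pattern, transition_rate(pattern))
# TTTTTTTTTN 0.1
# NNNNNNNNNT 0.1
# TNTNTNTNTN 0.9
```

The two extremes are both regular patterns, which is why the slide calls them easier to predict. Rates in the middle correspond to less regular histories.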


Workload Synthesis (1)

Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities

[Figure: flow graph of basic blocks A, B, C, D ending in branches (BR), with transition probabilities 0.8, 0.2, 1.0, 1.0, 0.1, 0.9]

1 Big Loop: A B D A B D A C D A B D
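The first synthesis step (turning branch transition probabilities into one big loop of basic blocks) can be sketched as a random walk over a flow graph. The edge assignment below is an assumption made for illustration: the transcript lists the probabilities but does not say which edge each one belongs to, and `D` is simplified to always return to `A`. The names `FLOW_GRAPH` and `walk` are hypothetical.

```python
import random

# Assumed edges: A branches to B (0.8) or C (0.2); B and C fall through to D.
FLOW_GRAPH = {
    "A": [("B", 0.8), ("C", 0.2)],
    "B": [("D", 1.0)],
    "C": [("D", 1.0)],
    "D": [("A", 1.0)],  # simplification: D always loops back to A
}


def walk(graph, start, length, rng):
    """Generate a basic-block sequence by following edges with their probabilities."""
    sequence = [start]
    while len(sequence) < length:
        successors, weights = zip(*graph[sequence[-1]])
        sequence.append(rng.choices(successors, weights=weights, k=1)[0])
    return sequence


rng = random.Random(2008)  # fixed seed so the sketch is repeatable
blocks = walk(FLOW_GRAPH, "A", 12, rng)
print(" ".join(blocks))
```

With these assumed edges every generated sequence has the shape `A (B|C) D` repeated, the same shape as the `A B D A B D A C D A B D` loop on the slide.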

Workload Synthesis (2)

Same workload profile, flow graph, and big loop (A B D A B D A C D A B D) as in step 1, with one addition:

• Memory Access Model (Strides)

Workload Synthesis (3)

Same workload profile, flow graph, and big loop (A B D A B D A C D A B D) as in step 1, with two additions:

• Memory Access Model (Strides)
• Branching Model (Based on Transition Rate)
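A branching model based on transition rate needs some way to make a synthetic branch hit a target rate. The slides do not say how BenchMaker does this. The sketch below is one simple possibility, not the authors' method: it flips the branch direction at an even pace and uses exact fractions to avoid floating-point drift. The function name is hypothetical.

```python
from fractions import Fraction


def branch_outcomes(rate, executions, first="T"):
    """Produce a T/N string whose direction flips at an even pace set by `rate`.

    `rate` is a Fraction in [0, 1]; the direction flips whenever the running
    total i * rate crosses an integer.
    """
    outcome = first
    result = []
    for i in range(executions):
        if i > 0 and (i * rate) // 1 > ((i - 1) * rate) // 1:
            outcome = "N" if outcome == "T" else "T"
        result.append(outcome)
    return "".join(result)


# Target transition rate of 0.1 over 1000 executions of the branch.
pattern = branch_outcomes(Fraction(1, 10), 1000)
flips = sum(a != b for a, b in zip(pattern, pattern[1:]))
print(flips / len(pattern))
```

A target rate of 1 yields the alternating `TNTN…` pattern, and a target rate of 0 yields a branch that never changes direction.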

Workload Synthesis (4)

Same workload profile, flow graph, and big loop (A B D A B D A C D A B D) as in step 1, with all synthesis steps in place:

• Memory Access Model (Strides)
• Branching Model (Based on Transition Rate)
• Register Assignment
• C code with asm & volatile constructs


Evaluation of BenchMaker

• SPEC CPU2000, SPECjbb2005, and DBT2 workloads
• Validated Sim-Alpha Performance Model of Alpha 21264

Suite                  Benchmark   Input         SimPoint(s)
SPEC CPU2000 Integer   bzip2       graphic       553
                       crafty      ref           774
                       eon         rushmeier     403
                       gcc         166.i         389
                       gzip        graphic       389
                       mcf         ref           553
                       perlbmk     perfect-ref   5
                       twolf       ref           1066
                       vortex      lendian1      271
                       vpr         route         476
                       gcc         expr          8, 24, 47, 51, 56, 73, 87, 99
SPEC CPU95 Integer     gcc         expr          0, 3, 5, 6, 7, 8, 9, 10, 12

Performance Correlation

[Figure: Original Benchmark vs. Synthetic Benchmark; y-axis from 0.2 to 1.8]

Trade Accuracy for Flexibility: Average Error of 11%

Energy/Power Correlation

[Figure: Original Benchmark vs. Synthetic Benchmark; y-axis from 0 to 35]

Average Error of 13%


Altering Individual Program Characteristics

[Figure: x-axis: Percentage of References with Stride Value 0, from 0 to 100; y-axis from 0 to 1.4]

Interaction of Program Characteristics

[Figure: three series, Data Footprint 300K, 600K, and 900K; x-axis: Percentage of References with Stride Value 0, from 0 to 100; y-axis from 0 to 0.35]

Modeling Impact of Benchmark Drift

Increase in Code Footprint (hypothetical)

[Figure: x-axis: Factor by which code size is increased, from 1 to 8; y-axis from 0 to 1.2]

Increase in Data Footprint from SPEC CPU95 to SPEC CPU2000 for gcc (Model with 7% accuracy)

Summary



• Synthetic Benchmarks to Address Benchmarking Challenges
• Constructing Synthetic Benchmarks from Hardware-Independent Characteristics
• Applications of Synthetic Benchmarks
  - Altering Program Characteristics
  - Studying Interaction of Program Characteristics
  - Modeling Benchmark Drift

Questions?

Ajay’s email: [email protected]
