Transcript PPT-2
Cache-Conscious
Structure Definition
By Trishul M. Chilimbi, Bob Davidson,
and James R. Larus
Presented by Shelley Chen
March 10, 2003
Motivation
Processor-Memory Performance Gap
Growing at 53% per year!
Chilimbi and Larus previous work
Placed objects with high temporal locality
in the same cache block
Works best with objects < ½ cache block
Current paper proposes techniques for
larger data structures
Improving Cache Performance
Two Cache-Conscious
Definition Techniques
Structure Splitting
Split large data structures into two smaller
structures
Field Reordering
Group fields in a structure with high
temporal locality into the same cache block
Structure Splitting
Split large structures into two smaller
structures
“hot” structure contains frequently accessed fields
“cold” structure contains rarely accessed fields
Allow more “hot” structures to fit into the
cache
Has been done manually in the past, this
paper is first to automate
Class Splitting Overview
Experimental Setup
Ran compiled programs on Sun
Ultraserver E5000
2 GB of memory
L1 dcache, 16 KB DM, 16 byte blocks
L2 cache, 1 MB DM, 64 byte blocks
Results: Structure Splitting
Reduces L2 miss rates by 10-27%; improves execution
time by 10-20%
Field Reordering
Logical ordering of the program is not usually
consistent with its data access patterns
Frequently accessed fields may be coded next to
rarely accessed fields, putting them in the same
cache block
cause excessive cache misses
Reorder field definitions of structure
fields with high temporal affinity in same cache
block
bbcache
Tool that produces structure field reordering
recommendations
Construct a database containing both static and
dynamic information about the structure field
accesses
Process database to construct field affinity graphs
for each structure
Produce the structure field order
recommendations for the affinity graphs
Attempts to group fields with high temporal
affinity into the same cache block
Experimental Setup
4 processor 400MHz Pentium II Xeon system
1MB L2 cache/processor
4GB memory
Ran TPC-C on Microsoft SQL Server 7.0 to
collect traces as input to bbcache
Chose 5 structures which showed largest
potential from the benefits of reordering
Results: Field Reordering
Modified SQL Server was consistently better
by 2-3%
Cache block pressure =Σ (b1, …,bn)/n
Cache block utilization = Σ(f11, …, fnbn)/ Σ(b1, …,bn)
Conclusion
2 techniques to improve cache
performance
change the internal organization of fields in
a data structure
Structure splitting
Field reordering
Structure Layouts better left to compiler
Easier to determine field access order
Questions?
Class Splitting Algorithm
Structure Access Database
Building Field Affinity Graphs