Improving Cache Performance
of OCaml Programs
Case Study - MetaPRL
Alexey Nogin and Alexei Kopylov
April 15, 1999
Background Information
• OCaml is a dialect of the ML functional
language
• MetaPRL is the next generation of the
NuPrl Proof Development System.
• All measurements were done on a 400 MHz
Pentium-II Xeon with 512 KB L2 cache
running Linux 2.2.2
Overview
• What we tried to do
– Collect some data
– See if standard techniques (developed for Java
and C programs) can be applied
• Why it didn’t work
– OCaml programs (MetaPRL in particular) are
quite different from Java and C programs in
their cache behavior.
Memory Usage Statistics
• Most objects are really small:
– 60-90% of all allocated objects are 3 words (12
bytes) in size
• We allocate them really fast - 40-60 MB/sec
• Only 1-10% of allocated objects survive the first
garbage collection.
• L1 DCU miss rate is 1.7-2.1%
• L2 cache miss rate is 18-47%
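To put the allocation figures above in terms of object counts, here is a rough back-of-the-envelope sketch. It assumes 1 MB = 10^6 bytes and treats every object as 12 bytes, although the slide says only 60-90% are that size, so the result is an order-of-magnitude estimate. The function name objects_per_sec is hypothetical.

```c
#include <assert.h>

#define OBJECT_BYTES 12L /* 3 words of 4 bytes each, as on the slide */

/* Approximate number of objects allocated per second for a given
   allocation rate in MB/sec. Assumes 1 MB = 10^6 bytes and that all
   objects are OBJECT_BYTES long (a simplification). */
long objects_per_sec(long mb_per_sec)
{
    assert(mb_per_sec >= 0);
    return mb_per_sec * 1000000L / OBJECT_BYTES;
}
```

Under these assumptions, 40-60 MB/sec corresponds to roughly 3.3 to 5 million small objects allocated per second.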
Cache-Conscious Structure
Definition
Trushil M. Chilimbi
Bob Davidson
James R. Larus
Ideas
• Structure size << cache block size
– no action
• Structure size ≈ cache block size
– splitting structure into “hot” and “cold” portions
• Structure size >> cache block size
– field reordering
Structure Splitting
f1 f2 f3 f4
becomes
hot: f3
cold: f1 f2 f4
• Pros:
– pack more hot object fields per cache line
• Cons:
– cost of additional reference from hot to cold portion
– code bloat
– more objects in memory
– extra indirection to access fields in the cold portion
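The split shown on this slide can be sketched in C roughly as follows. This is an illustration only, not code from the talk or the cited paper: the type names (node, node_hot, node_cold), the helper functions, and the choice of int64_t fields are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Original layout: the hot field f3 sits among three cold fields. */
struct node {
    int64_t f1, f2, f3, f4;
};

/* Cold portion: fields the hot path rarely touches. */
struct node_cold {
    int64_t f1, f2, f4;
};

/* Hot portion: f3 plus the additional reference to the cold portion. */
struct node_hot {
    int64_t f3;
    struct node_cold *cold;
};

/* Allocate both portions; returns NULL if either allocation fails. */
struct node_hot *node_hot_new(int64_t f1, int64_t f2, int64_t f3, int64_t f4)
{
    struct node_hot *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->cold = malloc(sizeof *h->cold);
    if (h->cold == NULL) {
        free(h);
        return NULL;
    }
    h->f3 = f3;
    h->cold->f1 = f1;
    h->cold->f2 = f2;
    h->cold->f4 = f4;
    return h;
}

/* Release both portions of a split node. */
void node_hot_free(struct node_hot *h)
{
    if (h != NULL) {
        free(h->cold);
        free(h);
    }
}

/* Cold access: pays one extra indirection through h->cold. */
int64_t node_get_f1(const struct node_hot *h)
{
    assert(h != NULL && h->cold != NULL);
    return h->cold->f1;
}
```

With these field types, struct node is 32 bytes while struct node_hot is at most 16 bytes on common 32- and 64-bit ABIs, so more hot portions fit per cache line. The costs listed above are visible too: two allocations per node, an extra pointer, and the indirection in node_get_f1.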
Field Reordering
• Typically, fields in big structures are grouped
logically
– reorder fields to better match the program's access pattern
• Problems:
– C code may use pointer arithmetic to access fields
– existing file formats and protocol specifications may fix the field order
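A minimal C sketch of field reordering and of the pointer-arithmetic problem follows. The record types, the SAME_BLOCK macro, the rec_next helper, and the 32-byte block size are assumptions made for this example, as is the premise that a lookup loop reads key and next together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_BLOCK 32 /* assumed cache block size in bytes */

/* Fields grouped logically: key and next, which a hypothetical lookup
   loop reads together, are separated by a 64-byte name. */
struct rec_logical {
    int32_t key;
    char name[64];
    struct rec_logical *next;
    char notes[64];
};

/* Same fields, reordered so key and next sit next to each other. */
struct rec_reordered {
    int32_t key;
    struct rec_reordered *next;
    char name[64];
    char notes[64];
};

/* True if fields a and b start in the same cache block, assuming the
   structure itself starts on a block boundary. */
#define SAME_BLOCK(type, a, b) \
    (offsetof(type, a) / CACHE_BLOCK == offsetof(type, b) / CACHE_BLOCK)

/* Reads next by byte offset. Using offsetof keeps this correct after
   reordering; a hard-coded numeric offset would silently break. */
struct rec_reordered *rec_next(const struct rec_reordered *r)
{
    assert(r != NULL);
    const char *base = (const char *)r;
    return *(struct rec_reordered *const *)
        (base + offsetof(struct rec_reordered, next));
}
```

In the logical layout, key and next start in different 32-byte blocks, while in the reordered layout both start in the first block. Code that computes field addresses from fixed numeric offsets, or that writes the structure directly to a file or over a network, would need to change along with the layout.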