Improving Cache Performance of OCaml Programs
Case Study: MetaPRL
Alexey Nogin and Alexei Kopylov
April 15, 1999

Background Information
• OCaml is a dialect of the ML functional language.
• MetaPRL is the next generation of the NuPRL Proof Development System.
• All measurements were done on a 400 MHz Pentium II Xeon with a 512 KB L2 cache running Linux 2.2.2.

Overview
• What we tried to do
  – Collect some data
  – See whether standard techniques (developed for Java and C programs) can be applied
• Why it didn't work
  – OCaml programs (MetaPRL in particular) are quite different from Java and C programs in their cache behavior.

Memory Usage Statistics
• Most objects are really small:
  – 60-90% of all allocated objects are 3 words (12 bytes) big
• We allocate them really fast: 40-60 MB/sec
• Only 1-10% of allocated objects survive the first garbage collection.
• L1 DCU miss rate is 1.7-2.1%
• L2 cache miss rate is 18-47%

Cache-Conscious Structure Definition
Trishul M. Chilimbi, Bob Davidson, James R. Larus

Ideas
• Structure size ≪ cache block size
  – no action
• Structure size ≈ cache block size
  – split the structure into "hot" and "cold" portions
• Structure size ≫ cache block size
  – reorder fields

Structure Splitting
[Figure: a structure with fields f1, f2, f3, f4 becomes a hot portion holding f3 plus a reference to a cold portion holding f1, f2, f4]
• Pros:
  – pack more hot object fields per cache line
• Cons:
  – cost of the additional reference from the hot to the cold portion
  – code bloat
  – more objects in memory
  – extra indirection to access fields in the cold portion

Field Reordering
• Fields in big structures are typically grouped logically
  – reorder fields to better match the program's access pattern
• Problems:
  – C code may use pointer arithmetic to access fields
  – existing file formats and protocol specifications may dictate the field layout
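Both transformations from the Chilimbi et al. slides can be sketched in C, the language the pointer-arithmetic caveat refers to. This is a minimal illustration under stated assumptions, not the paper's implementation: the record `struct node`, the choice of `f3` as the hot field, the helpers `split_node`, `sum_hot` and `sum_cold_f1`, and the `big_logical`/`big_reordered` records are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-field record; assume profiling showed f3 is hot. */
struct node {
    long f1, f2, f3, f4;
};

/* Structure splitting: the cold fields move into their own object. */
struct node_cold {
    long f1, f2, f4;
};

/* The hot portion keeps f3 inline plus one extra reference
   to the cold portion. */
struct node_hot {
    long f3;
    struct node_cold *cold;
};

/* Split one record; the caller supplies storage for the cold part,
   so every record now occupies two objects instead of one. */
void split_node(const struct node *in, struct node_hot *hot,
                struct node_cold *cold)
{
    assert(in != NULL && hot != NULL && cold != NULL);
    cold->f1 = in->f1;
    cold->f2 = in->f2;
    cold->f4 = in->f4;
    hot->f3 = in->f3;
    hot->cold = cold;
}

/* Hot loop: reads only the small hot portions, so more of them
   fit in each cache line. */
long sum_hot(const struct node_hot *nodes, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += nodes[i].f3;
    return total;
}

/* Cold access: pays an extra indirection through ->cold. */
long sum_cold_f1(const struct node_hot *nodes, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += nodes[i].cold->f1;
    return total;
}

/* Field reordering for a large record (hypothetical fields). */
struct big_logical {          /* fields grouped logically */
    int  id;                  /* hot */
    char name[64];
    char address[64];
    int  count;               /* hot, used together with id */
};

struct big_reordered {        /* hot fields placed next to each other */
    int  id;
    int  count;
    char name[64];
    char address[64];
};
```

In `big_logical`, `id` and `count` are more than 128 bytes apart and so land in different cache blocks; in `big_reordered` they are adjacent. Note that the 12-byte objects that dominate MetaPRL's allocations are already smaller than a cache block, which would put them in the "no action" row of the Ideas slide.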