Data Latency
Rich Altmaier
Software and Services Group
CPU Architecture Contribution
• Data intensive means memory-latency bound
– Each cache line fetched sees minimal use and reuse
– Often pointer chasing, which is hard to prefetch
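As a rough illustration of why pointer chasing is hard to prefetch, the sketch below contrasts a dependent walk, where each address is known only after the previous load returns, with a sequential walk, where every address can be computed up front. The names `make_chain`, `chase`, and `sequential` are hypothetical and chosen for this example only; the chain is built with Sattolo's algorithm so that it forms a single random cycle.

```python
import random


def make_chain(n, seed=0):
    """Return a list nxt where nxt[i] is the successor of i.

    Sattolo's algorithm yields one random cycle covering 0..n-1,
    so a walk from any start visits every slot exactly once.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i is what guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, start, steps):
    """Dependent walk: the next index comes out of the current load."""
    visited = []
    i = start
    for _ in range(steps):
        visited.append(i)
        i = nxt[i]  # cannot be issued before the previous load finishes
    return visited


def sequential(n, start, steps):
    """Independent walk: all indices are known before any load is issued."""
    return [(start + k) % n for k in range(steps)]
```

Python's interpreter overhead hides the hardware effect, so this only shows the shape of the two access patterns. In a compiled language, with a chain larger than the cache, the dependent walk would be expected to pay close to a full memory latency per step.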
CPU Architecture Contribution
• Large instruction cache
– Captures a sophisticated code loop, especially in databases
• Share last-level cache across cores
– Nehalem added this for instructions and data (I & D)
– Without it, each core holds its own copy of the instructions, and data lines holding locks have to move between caches
• Integrated Memory Controller
– Big win for latency in Nehalem
• QPI (QuickPath Interconnect) for socket-to-socket cache line movement
– Introduced in Nehalem, faster than the front-side bus (FSB)
CPU Architecture Contribution
• Improvements in branch prediction
– Successful prediction of more complex branching structures
• Total number of outstanding cache line reads per socket
– Improved in Nehalem
– Exploited by out-of-order execution
– Exploited by Hyper-Threading (database benchmarks usually enable it and win)
– Opportunity to tune data structures for parallel reading
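The link between outstanding reads and throughput can be sketched with Little's law: sustained bandwidth is roughly the number of cache line reads in flight, times the line size, divided by the memory latency. The sketch below assumes 64-byte cache lines and uses a 100 ns latency purely as an illustrative figure, not a measured one. The function names are hypothetical. The second function shows one way a data structure could be tuned for parallel reading: split it into several independent chains and advance them in lockstep, so that each round contains loads that do not depend on each other.

```python
def bandwidth_gb_per_s(outstanding_reads, line_bytes=64, latency_ns=100.0):
    """Little's law estimate: bytes in flight divided by latency.

    Bytes per nanosecond equals GB/s, so no unit conversion is needed.
    latency_ns=100.0 is an assumed, illustrative value.
    """
    return outstanding_reads * line_bytes / latency_ns


def chase_interleaved(nxt, starts, steps):
    """Advance several independent chains one step per round.

    Within a round, no load depends on another, so the hardware is free
    to keep one read per chain outstanding at the same time.
    """
    cursors = list(starts)
    for _ in range(steps):
        cursors = [nxt[c] for c in cursors]
    return cursors
```

Under these assumptions a single dependent chain sustains about 0.64 GB/s, while ten independent reads in flight sustain about 6.4 GB/s, which is why a higher limit on outstanding reads only pays off when the code exposes independent loads.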
System Architecture Contribution
• Larger physical memory
• Faster memory (lower latency)
• Faster I/O, and more ports, for data movement
• SSDs – big boost to IOPS (I/Os per second)
– Filesystem read/write is usually small and scattered
– No big sequential ops
• Faster networking
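To make the IOPS point concrete, the arithmetic below estimates how long a batch of small, scattered reads takes at a given IOPS rate, and what byte throughput that rate implies. The device figures in the usage note (200 and 50,000 IOPS, 4 KiB per I/O) are assumed round numbers for illustration, not measurements of any particular drive, and the function names are hypothetical.

```python
def seconds_for_random_reads(n_reads, iops):
    """Time to complete n_reads small random reads at a fixed IOPS rate."""
    return n_reads / iops


def throughput_bytes_per_s(iops, io_bytes=4096):
    """Byte throughput implied by an IOPS rate and a fixed I/O size."""
    return iops * io_bytes
```

With these assumed figures, one million scattered 4 KiB reads take 5,000 seconds at 200 IOPS and 20 seconds at 50,000 IOPS. At small I/O sizes the IOPS rate, not the sequential bandwidth, sets the pace.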
Summary
• Large & shared cache
• Latency reduction with Integrated Memory Controller, and QPI socket to socket
• Total number of outstanding reads
• Branch prediction
• Storage configured for IOPS