Transcript 740_2.ppt

Resource Replication

6 Integer Units
4 FP units
8 Sets of architectural registers
100+100 Renaming registers (Int/FP)
HW Context (PC, Return Stack etc.)
Ports in I-cache
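
One way to picture the list above is as a split between state that is copied once per hardware context and resources that all threads draw on. The sketch below is only an illustration under stated assumptions: the class and field names are hypothetical, the 32-entry architectural register file is an assumed size, and treating the renaming registers as a shared pool is an assumption the slide does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThreadContext:
    """State replicated per hardware context (hypothetical layout)."""
    thread_id: int
    pc: int = 0
    return_stack: List[int] = field(default_factory=list)
    # Assumed size: 32 architectural registers per set.
    arch_regs: List[int] = field(default_factory=lambda: [0] * 32)


@dataclass
class SMTCore:
    """Counts from the slide, plus one ThreadContext per register set."""
    int_units: int = 6
    fp_units: int = 4
    int_rename_regs: int = 100  # assumed to be a pool shared by all threads
    fp_rename_regs: int = 100   # assumed to be a pool shared by all threads
    contexts: List[ThreadContext] = field(
        default_factory=lambda: [ThreadContext(t) for t in range(8)]
    )


core = SMTCore()
```

Each context owns its PC, return stack, and architectural registers, while the functional units are fields of the core itself.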
Replication (Contd.)

Per-thread mechanisms for:
Pipeline Flushing
Instruction Retirement
Trapping
Precise Interrupts
Thread Identifier in BTB, TLB
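
The thread identifier in the BTB and TLB can be sketched as an extra field in the lookup key, so that two threads using the same virtual page never see each other's entries. This is a minimal model, not the hardware design: `TaggedTLB` and its methods are hypothetical names, and a dictionary stands in for the associative lookup.

```python
class TaggedTLB:
    """Toy TLB whose entries are keyed by (thread_id, virtual_page)."""

    def __init__(self):
        # Maps (thread_id, virtual_page) -> physical_frame.
        self.entries = {}

    def insert(self, thread_id, vpage, frame):
        self.entries[(thread_id, vpage)] = frame

    def lookup(self, thread_id, vpage):
        # Returns the physical frame, or None on a TLB miss.
        return self.entries.get((thread_id, vpage))


tlb = TaggedTLB()
tlb.insert(0, 0x10, 0xA0)  # thread 0 maps virtual page 0x10
tlb.insert(1, 0x10, 0xB0)  # thread 1 maps the same page to a different frame
```

Without the thread tag, the second insert would overwrite the first and thread 0 would receive thread 1's translation.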
Inter-thread Interference

Increases with #threads
 1.4% (2 threads), 4.8% (4), 5.3% (8)
Does not hurt much
 0.1% performance degradation
Why?
 L1 misses covered by L2 misses
 Out-of-order execution, write buffer, multithreading
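
To see why interference grows with the thread count, here is a toy model of a shared direct-mapped L1 in which each thread repeatedly walks its own block of cache lines. Everything in it is an assumption made for illustration (the `miss_rate` function, the 64-set cache, the 24-set offset between threads, the round-robin interleaving). It has no L2, write buffer, or out-of-order core, so it shows only the rising miss rate and not the latency hiding listed above, and its magnitudes are not meant to match the slide's figures.

```python
def miss_rate(num_threads, num_sets=64, lines_per_thread=32, passes=10):
    """Miss rate of a shared direct-mapped cache under round-robin threads."""
    cache = [None] * num_sets  # one resident line address per set
    misses = 0
    accesses = 0
    for _ in range(passes):
        for i in range(lines_per_thread):
            for t in range(num_threads):  # threads take turns each step
                # Thread t's lines have distinct addresses and start
                # 24 sets after thread t-1's, so neighbours partly overlap.
                line = t * num_sets * 16 + 24 * t + i
                idx = line % num_sets
                if cache[idx] != line:
                    misses += 1
                    cache[idx] = line  # evict whatever was there
                accesses += 1
    return misses / accesses


rates = {n: miss_rate(n) for n in (1, 2, 4)}
```

With one thread only the cold misses remain; as threads are added, more sets are claimed by several threads at once and their lines keep evicting each other.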
Memory Requirement

Increases with number of threads
Mostly for L1 Bank Conflict
Memory requirement doubles as the number of threads goes from 1 to 8
Multiple Thread
Long L1 cache line
Longer cache line has better locality
Overall performance degrades by 3.4%