Transcript 740_2.ppt
Resource Replication
6 Integer Units
4 FP units
8 Sets of architectural registers
100+100 Renaming registers (Int/FP)
HW Context (PC, Return Stack etc.)
Ports in I-cache
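A rough sizing sketch for the register counts listed above. It assumes 32 architectural registers per set, which the slide does not state; the function name and that default are hypothetical and only illustrate the arithmetic.

```python
def physical_registers(thread_sets, renaming_regs, regs_per_set=32):
    """Total physical registers in one register file (integer or FP).

    regs_per_set=32 is an assumption, not a figure from the slide.
    """
    return thread_sets * regs_per_set + renaming_regs


# 8 sets of architectural registers plus 100 renaming registers per file
int_file = physical_registers(thread_sets=8, renaming_regs=100)
fp_file = physical_registers(thread_sets=8, renaming_regs=100)
print(int_file, fp_file)
```

Under that assumption each file grows linearly with the number of hardware contexts, while the renaming pool stays fixed.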
Resource Replication (Contd.)
Per-thread mechanism for
Pipeline Flushing
Instruction Retirement
Trapping
Precise Interrupts
Thread Identifier in BTB, TLB
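One way to picture the thread identifier in the TLB is a toy model where each entry is keyed by both the thread ID and the virtual page, so threads that use the same virtual page number do not alias. This is a minimal sketch, not the actual hardware organization; the class and method names are hypothetical.

```python
class ThreadTaggedTLB:
    """Toy TLB whose entries are keyed by (thread_id, virtual_page)."""

    def __init__(self):
        self.entries = {}

    def insert(self, thread_id, vpage, ppage):
        # Tagging with thread_id keeps each thread's translations separate.
        self.entries[(thread_id, vpage)] = ppage

    def lookup(self, thread_id, vpage):
        # Returns the physical page, or None on a miss.
        return self.entries.get((thread_id, vpage))

    def flush_thread(self, thread_id):
        # Per-thread flush: drop only the entries owned by this thread.
        self.entries = {
            key: ppage
            for key, ppage in self.entries.items()
            if key[0] != thread_id
        }


tlb = ThreadTaggedTLB()
tlb.insert(0, 0x10, 0xA0)
tlb.insert(1, 0x10, 0xB0)   # same virtual page, different thread
tlb.flush_thread(0)         # thread 1's entry survives the flush
```

The same tagging idea applies to the BTB: a prediction recorded by one thread is only matched when the lookup carries the same thread ID.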
Inter-thread Interference
Increases with #threads
1.4% (2 threads), 4.8% (4 threads), 5.3% (8 threads)
Does not hurt much
0.1% performance degradation
Why?
L1 misses covered by the L2 cache
Out-of-order execution, write buffer, multithreading
Memory Requirement
Increases with number of threads
Mostly due to L1 bank conflicts
Memory requirement doubles as the number of threads goes from 1 to 8
Multiple Threads
Long L1 cache line
Longer cache line has better locality
Overall performance degrades by 3.4%