Virtual Circuit Tree Multicasting:
A Case for On-Chip Hardware
Multicast Support
Natalie Enright Jerger,
Li-Shiuan Peh, and Mikko Lipasti
University of Wisconsin – Madison and Princeton University
Executive Summary
Demonstrate necessity of multicasting on-chip
State-of-the-art router insufficient
Significant number of proposals could leverage multicasting
Provide efficient multicasting solution using Virtual Circuit Trees
Overlay logical routing trees on mesh network
Reduces interconnect latency by up to 90%
Reduces switching activity by up to 53%
6/24/2008
Enright Jerger - ISCA 2008
Packet-Switched Unicast Router
3 stage packet-switched router
Based on most aggressive recent proposals
[Figure: router pipeline stages: Buffer Write; Virtual Channel/Switch Allocation; Switch Traversal; followed by Link Traversal between routers]
Aggressive baseline not well matched to all types of communication
Multicast is performed using multiple unicasts
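The cost of emulating a multicast with multiple unicasts can be illustrated with a minimal sketch. It assumes a 2D mesh with row-major node numbering and the X-Y dimension-order routing listed in the network configuration; the function names `xy_route` and `unicast_cost` are hypothetical and not taken from the slides. It counts how many link traversals the unicast copies make in total versus how many distinct links they actually need.

```python
def xy_route(src, dst, width):
    """Return the (from_node, to_node) links on the X-Y path from src to dst.

    Assumption: nodes are numbered row-major on a mesh `width` nodes wide.
    """
    x, y = src % width, src // width
    dx, dy = dst % width, dst // width
    links = []
    while x != dx:                      # travel along X first
        nx = x + (1 if dx > x else -1)
        links.append((y * width + x, y * width + nx))
        x = nx
    while y != dy:                      # then along Y
        ny = y + (1 if dy > y else -1)
        links.append((y * width + x, ny * width + x))
        y = ny
    return links


def unicast_cost(src, dests, width):
    """Return (total link traversals, distinct links used) for one unicast per destination."""
    routes = [xy_route(src, d, width) for d in dests]
    total = sum(len(r) for r in routes)
    distinct = len({link for r in routes for link in r})
    return total, distinct


# Node 0 sends the same payload to nodes 1, 2 and 3 on a 4-wide mesh.
print(unicast_cost(0, [1, 2, 3], 4))    # (6, 3)
```

The gap between the two numbers is the redundant link activity; the source must also inject one packet per destination.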
State-of-the-Art Router
[Figure: Interconnect Latency vs. Network Load (% of Link Capacity), for No MC, MC (1%), MC (5%), and MC (10%)]
Current router architecture poorly equipped to
handle even a low amount of multicast (MC) traffic
Outline
Motivation: Baseline router problems
VCTM Implementation: Example, Architecture
Multicasting Scenarios: Description, Characterization
Evaluation
Conclusion
Baseline Router Example
[Figure: mesh routers with nodes A, B, C, D and X; unicast packets 1A, 2A, 1B, 2B, 1C, 2C, 1D, 2D; router VCs marked Busy]
More resources to solve this problem?
More buffers, virtual channels, links?
Key Router Problems
Redundant (wasteful) use of resources: same payload occupying extra buffers, links
Injection Bandwidth: burst of messages at network interface
Alternative routing: improves throughput, but wastes power
Speculation Problems: predicated on low loads; burst of messages
[Figure: nodes A, B, C, D and X; packets 1A, 2A, 1B, 2B, 1C, 2C, 1D, 2D; VCs marked Busy]
Virtual Circuit Tree Multicasting
Overview
Builds on existing state-of-the-art router
Unicast performance is not impacted
Build multicast trees incrementally
Fewer packets improve speculation
Tree reuse is necessary for effectiveness
Significant temporal destination set reuse across all scenarios
Multicast from 0 to <2,4,5>
Build Tree Incrementally (Tree M)
3 Unicast Setup Packets (1 per destination)
3 Packets Injected into Network
[Figure: nodes 0 to 5 with setup packets A, B, C and Tree M routing entries M: <East>, M: <East, South>, M: <Eject, South>, M: <Eject>, M: <Eject>]
Link redundancy removed; injection problem solved
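The incremental tree build for the example "Multicast from 0 to <2,4,5>" can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a 3-wide, 2-row mesh (nodes 0 to 2 on the top row, 3 to 5 below), X-Y routing, and hypothetical helper names `setup_packet_hops` and `build_tree`. Each unicast setup packet adds its output port to the Tree M entry of every router it passes.

```python
from collections import defaultdict

WIDTH = 3  # assumed layout: nodes 0-2 on the top row, 3-5 on the bottom row


def setup_packet_hops(src, dst, width=WIDTH):
    """Yield (node, output_port) along the X-Y route, ending with Eject at dst."""
    x, y = src % width, src // width
    dx, dy = dst % width, dst // width
    while x != dx:
        yield y * width + x, ("East" if dx > x else "West")
        x += 1 if dx > x else -1
    while y != dy:
        yield y * width + x, ("South" if dy > y else "North")
        y += 1 if dy > y else -1
    yield y * width + x, "Eject"


def build_tree(src, dests, width=WIDTH):
    """Merge one setup packet per destination into per-router output-port sets."""
    tables = defaultdict(set)
    for dst in dests:
        for node, port in setup_packet_hops(src, dst, width):
            tables[node].add(port)      # a shared hop is recorded only once
    return dict(tables)


tree_m = build_tree(0, [2, 4, 5])
for node in sorted(tree_m):
    print(node, sorted(tree_m[node]))
```

Under these assumptions the entries come out as node 0: East; node 1: East, South; node 2: Eject, South; nodes 4 and 5: Eject, which agrees with the M: <...> labels on the slide. Once the tree exists, a later multicast to the same destination set needs a single injected packet that forks at nodes 1 and 2.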
VCTM Router Architecture
[Figure: router microarchitecture with Input Ports (VC 0 ... VC x, MVC 0), Virtual Channel Allocator, Switch Allocator, and Virtual Circuit Tree Table]
Virtual Circuit Tree Table fields: Src, VCTnum, Id, Ej, N, S, E, W, Fork
Implementation Details (1)
Destination Set Content Addressable Memory
[Figure: CAM of destination-set bit vectors, one entry per tree ID]
Destination Set <5,4,2>
Encode Tree ID 2 into multicast header
If not present, replace oldest tree and perform setup
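The destination set lookup can be sketched in software as below. This is only an illustrative model under stated assumptions: the class name `DestinationSetCAM`, the bit-vector encoding, and the reading of "oldest" as first-installed (FIFO) order are all assumptions, and a hardware CAM would match every entry in parallel rather than use a dictionary.

```python
from collections import OrderedDict


class DestinationSetCAM:
    """Software model of a per-node destination set CAM (hypothetical)."""

    def __init__(self, num_entries):
        self.entries = OrderedDict()            # bit vector -> tree ID, oldest first
        self.free_ids = list(range(num_entries))

    @staticmethod
    def to_bits(dests):
        """Encode a destination set as a bit vector (bit d set for node d)."""
        bits = 0
        for d in dests:
            bits |= 1 << d
        return bits

    def lookup(self, dests):
        """Return (tree_id, needs_setup) for this destination set."""
        key = self.to_bits(dests)
        if key in self.entries:                 # hit: reuse the existing tree
            return self.entries[key], False
        if self.free_ids:                       # miss with a free entry
            tree_id = self.free_ids.pop(0)
        else:                                   # miss: replace the oldest tree
            _, tree_id = self.entries.popitem(last=False)
        self.entries[key] = tree_id
        return tree_id, True                    # caller must send setup packets


cam = DestinationSetCAM(num_entries=2)
print(cam.lookup([5, 4, 2]))   # (0, True): first use, tree must be set up
print(cam.lookup([2, 4, 5]))   # (0, False): same set, tree ID reused
```

The returned tree ID is what would be encoded into the multicast header; a `needs_setup` result of True corresponds to the "perform setup" case.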
Implementation Details (2)
VCTs provide routing, not resources
VCTs do not pre-allocate resources
Multicast arbitration same as unicast
Multiple arbitration steps at tree branch
If one desired output is blocked, other tree branch outputs can still proceed
Longer buffer occupancy
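The behavior at a tree branch can be illustrated with a small sketch. It is a simplified model, not the router's allocator: it assumes every unblocked requested output is granted in the same cycle, ignores contention from other input ports, and uses a hypothetical function name `cycles_to_drain`. It shows that branches proceed independently while the flit stays buffered until the last branch is served.

```python
def cycles_to_drain(requested_ports, blocked_schedule):
    """Return (cycles the flit occupies its buffer, ports granted per cycle).

    requested_ports: output ports the multicast flit must be copied to.
    blocked_schedule: blocked_schedule[t] is the set of ports busy in cycle t;
    ports are assumed free once the schedule runs out.
    """
    pending = set(requested_ports)
    history = []
    cycle = 0
    while pending:
        blocked = blocked_schedule[cycle] if cycle < len(blocked_schedule) else set()
        granted = pending - blocked     # blocked branches do not stall the others
        history.append(sorted(granted))
        pending -= granted
        cycle += 1
    return cycle, history


# South is busy for two cycles: East proceeds at once, South waits.
print(cycles_to_drain({"East", "South"}, [{"South"}, {"South"}]))
# (3, [['East'], [], ['South']])
```

The three-cycle occupancy in this example, compared with one cycle when nothing is blocked, is the "longer buffer occupancy" cost noted on the slide.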
VCTM Overhead
Virtual Circuit Tree Routing Tables
Number of Entries    Area (mm^2)    Energy (nJ)
512                  0.024          0.002
1024                 0.041          0.002
2048                 0.078          0.003
Destination Set CAMs
Number of Entries    Area (mm^2)    Energy (nJ)
32                   0.018          0.007
64                   0.021          0.010
128                  0.029          0.017
Access Time < 1 cycle
Multicasting Scenarios (1)
Token Coherence [Martin, 2003]
TokenB: Broadcast for tokens
1 Token to read
All Tokens to write
SGI Origin Directory Protocol [Laudon, 1997]
Multicast invalidate requests
Opteron Protocol [Conway, 2007]
Coherence requests sent to ordering point and broadcast to all cores
Some filtering of destinations
Multicasting Scenarios (2)
Region Multicasting
Two level protocol
1st level: Multicast to sharers of address region
2nd level: Fall back on directory when no region information available
TRIPS [Sankaralingam, 2003]
Operand network
Multicast results of instructions to tiles containing dependent instructions
35% of dynamic instructions have 2 or more future uses
Multicasting Scenarios (3)
Uncorq [Strauss, 2007]
Unordered broadcast, ordered response network
Virtual Hierarchies [Marty, 2007]
1st level directory
2nd level global broadcast
Dynamic NUCA caches [Kim, 2002]
Multicast for cache hit
Characterizing Multicasts
Up to 13% of traffic is multicast
[Figure: Number of Destinations per multicast (bins 1, 2, 3-6, 7-10, 11-14, 15-16) for Directory, Token, Opteron, Region, and TRIPS]
TokenB and Opteron: Large destination sets
TRIPS and Directory: Small destination sets
Region Multicast: Wide variety of sizes of destination sets
[Figure: Multicast Coverage (0% to 100%) vs. Number of Unique Destination Sets for Token, Opteron, Directory, Region, and TRIPS]
Unique Destination Sets: combination of destinations in multicast
Token: 1 destination set for each node
Region and Directory: Much larger variety of destination sets
VCTM is an inexpensive solution to support multicasting
Simulation Methodology
Network traffic from 5 different scenarios
Detailed network simulator
Cycle-accurate modeling of router stages
Flexible, lightweight VCTM mechanism provides improvement for diverse scenarios
Many more results in paper
Network Configuration
Topology: 4-ary 2-mesh; 5-ary 2-mesh (TRIPS)
Routing: Dimension Order: X-Y Routing
Channel Width: 16 Bytes
Packet Size: 1 flit (Coherence request = Address + Command); 5 flits (Data); 3 flits (TRIPS)
Virtual Channels: 4
Buffers per port: 24
Router ports: 5
Virtual Circuit Trees: Varied from 16 to 4K (1 to 256 VCTs/core)
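The relation between the total tree count and the per-core figure follows from the mesh size, as the short sketch below works out. It assumes one core per node of the 4-ary 2-mesh and that trees are divided evenly among cores; the names `trees_per_core` and `packet_bytes` are hypothetical.

```python
K, N = 4, 2                     # 4-ary 2-mesh: 4 nodes per dimension, 2 dimensions
CORES = K ** N                  # assumed: one core per mesh node
CHANNEL_WIDTH_BYTES = 16        # one flit crosses a 16-byte channel per cycle


def trees_per_core(total_trees, cores=CORES):
    """Evenly divide the virtual circuit trees among the cores."""
    return total_trees // cores


def packet_bytes(flits, width=CHANNEL_WIDTH_BYTES):
    """Packet size in bytes for a given flit count."""
    return flits * width


print(CORES)                                      # 16
print(trees_per_core(16), trees_per_core(4096))   # 1 256
print(packet_bytes(1), packet_bytes(5))           # 16 80
```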
Power Savings
On-chip networks consume up to ~36% of chip power [Wang, 2002]
Links, buffers and crossbars consume nearly 100% of network power
Power saved through activity reduction
[Figure: % Usage Reduction of Link, Buffer, and Crossbar for Directory, TokenB, Region, TRIPS, and Opteron; legend: 16, 4096]
Performance Results Summary
[Figure: Normalized Interconnect Latency vs. Number of Virtual Circuit Trees (0, 16, 32, 64, 128, 512, 2048, 4096) for Directory, TRIPS, Opteron, Region Multicast, and TokenB; callouts: SPECweb: 12%, Art: 55%, TPC-H: 68%]
Small number of trees required for majority of benefit
Performance improvement depends on network pressure
VCTM vs. Aggressive Network
[Figure: Interconnect Latency for Opteron (FMM), TRIPS (bzip2), Region (TPC-H), Token (Barnes), and Dir (SPECweb); Wide Injection Port + Infinite VCs + Adaptive Routing vs. VCTM w/ 512 Trees]
VCTM outperforms aggressive (unrealistic) network
VCTM Summary (1)
Improves performance across a variety of scenarios
Reduces interconnect latency by up to 90%
Reduces switching activity by up to 53%
Small number of trees necessary
8 trees/core achieves substantial benefit
Dynamic table partitioning could further reduce total tree storage
VCTM Summary (2)
Outperforms aggressive router
No impact on unicast performance
Integrates with existing state-of-the-art router architecture
Easily extendable to more scalable topologies and routing algorithms
Opens door for new optimizations
Thank you
Questions