MIT Roofnet - The Center for Wireless Communications

Networks-on-Chip
ECE 111
Many-Core Processor Roadmap
• Number of cores quadrupling every 3 years
[Figure: projected cores per chip, research vs. industry]
Year   Research   Industry
'02    16         4
'05    64         16
'08    256        64
'11    1024       256
'14    4096       1024
Chips marked on the chart: Intel’s 80-core, Cisco’s 188-core, Tilera’s 100-core
Source: Agarwal, MIT
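Quadrupling every 3 years means cores(t) = c0 · 4^((t − t0)/3). A minimal Python sketch, assuming an exact 4x step per 3-year period and the 2002 starting points read from the chart (16 research cores, 4 industry cores); the function name `projected_cores` is hypothetical:

```python
def projected_cores(year: int, base_year: int, base_cores: int) -> int:
    """Cores per chip, assuming an exact 4x increase every 3 years after base_year."""
    if year < base_year:
        raise ValueError("year must not precede base_year")
    periods = (year - base_year) // 3  # whole 3-year periods elapsed
    return base_cores * 4 ** periods


# Research and industry starting points in 2002, read from the roadmap chart
for year in range(2002, 2015, 3):
    print(year, projected_cores(year, 2002, 16), projected_cores(year, 2002, 4))
```

Each printed row matches the chart: 2002 gives 16 and 4, and 2014 gives 4096 and 1024, i.e. 4^4 = 256 times the 2002 values after four periods.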
SoC & Many-Core Convergence
• Application-Specific Systems-on-Chips (SoCs) are evolving to look like many-core processors with custom hardware cores
Source: Arvind, MIT
[Figure: general-purpose processors, application-specific processing cores, and on-chip memory cores connected by a structured on-chip interconnection network]
The Need for On-Chip Networks
[Figure: array of compute or memory cores, each attached to a router]
• Scalable communication
• Efficient use of wires
• Modular design
• A new way to organize and build VLSI systems
Power and Performance Both Critical
[Figure: array of compute or memory cores, each attached to a router]
• For most applications, low energy consumption and high performance are both first-order design goals!
• E.g., 28% of the total power in Intel’s 80-core Teraflops chip is consumed by the interconnection network (routers + links)
• Network latency plays a central role in performance
Flits
• Packets decomposed into “flits”
– Basic data units
– Flit size is usually the same as the “bus width”, e.g., 128 bits
– Head flit carries destination address information
• Flow control
– For on-chip networks, data loss is usually not acceptable
– Credit-based flow control is used to ensure that the next-hop router has buffer space before a flit gets forwarded
• Flits proceed forward like a “train” or “worm”
through a path of routers
– No need for store-and-forward: a router can forward the head flit before the rest of the packet has arrived, which leads to much lower latencies
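The flit decomposition and credit-based flow control above can be sketched as follows. This is a toy model under stated assumptions: 128-bit flits, a single downstream buffer, and instant credit return (real links have a credit round-trip delay). The names `Flit`, `packetize`, and `CreditLink` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

FLIT_BITS = 128  # assumed flit width, following the 128-bit bus example


@dataclass
class Flit:
    kind: str                   # "head", "body", or "tail"
    payload: int                # one FLIT_BITS-wide slice of the packet
    dest: Optional[int] = None  # only the head flit carries the destination


def packetize(data: int, nbits: int, dest: int) -> list:
    """Split an nbits-wide packet into ceil(nbits / FLIT_BITS) flits."""
    n = max(1, -(-nbits // FLIT_BITS))  # ceiling division
    mask = (1 << FLIT_BITS) - 1
    flits = []
    for i in range(n):
        kind = "head" if i == 0 else ("tail" if i == n - 1 else "body")
        flits.append(Flit(kind, (data >> (i * FLIT_BITS)) & mask,
                          dest if i == 0 else None))
    return flits


class CreditLink:
    """Upstream side of a link: one credit per free slot in the downstream buffer."""

    def __init__(self, buffer_depth: int):
        self.credits = buffer_depth  # downstream buffer starts empty
        self.downstream = deque()    # the next-hop router's input buffer

    def try_send(self, flit: Flit) -> bool:
        if self.credits == 0:        # no free slot: stall, never drop the flit
            return False
        self.credits -= 1
        self.downstream.append(flit)
        return True

    def downstream_forward(self) -> Flit:
        """The next hop forwards a flit onward and returns one credit upstream."""
        flit = self.downstream.popleft()
        self.credits += 1
        return flit


flits = packetize(data=(1 << 300) - 1, nbits=300, dest=5)  # 3 flits
link = CreditLink(buffer_depth=2)
print([link.try_send(f) for f in flits])  # [True, True, False]
link.downstream_forward()                  # frees one slot, returns one credit
print(link.try_send(flits[2]))             # True
```

The third flit is held upstream rather than dropped, which is the lossless behavior on-chip networks require.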
Using Virtual Channels to Avoid Deadlocks
Source: Becker et al, Stanford
• Coupling between channels and buffers causes head-of-line
blocking as well as deadlocks
– Adds false dependencies between packets, possibly leading to deadlocks
– Limits channel utilization
– Increases latency
• Solution:
– Implement virtual channels (VCs)
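A toy model of the head-of-line blocking that VCs remove. It assumes a single input port where packet A waits for a congested output "E" while packet B targets a free output "N". The helper `forwardable` and the port names are hypothetical. With one FIFO, only the head packet is eligible, so B waits behind A. With one queue per VC, B can advance.

```python
from collections import deque


def forwardable(queues, blocked_outputs):
    """Packets that can advance this cycle: only the head of each queue is
    eligible, and it advances only if its output port is not blocked."""
    moved = []
    for q in queues:
        if q and q[0][1] not in blocked_outputs:
            moved.append(q[0][0])
    return moved


# Packets as (name, output_port); output "E" is congested downstream.
packets = [("A", "E"), ("B", "N")]

single_fifo = [deque(packets)]                        # one buffer per physical channel
two_vcs = [deque([packets[0]]), deque([packets[1]])]  # one buffer per virtual channel

print(forwardable(single_fifo, {"E"}))  # []: B is stuck behind A
print(forwardable(two_vcs, {"E"}))      # ['B']: B bypasses A on its own VC
```

The sketch covers only the blocking effect. Deadlock avoidance additionally depends on rules restricting which VCs a packet may use, which are not modeled here.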
VC Router Pipeline
Per packet (head flit only):
• Route Computation (RC)
– Determine candidate output port(s) and VC(s)
– Can be precomputed at the upstream router (lookahead routing)
• Virtual Channel Allocation (VA)
– Assign available output VCs to waiting packets at input VCs
Per flit:
• Switch Allocation (SA)
– Assign switch time slots to buffered flits
• Switch Traversal (ST)
– Send flits through the crossbar switch to the appropriate output
Source: Becker et al, Stanford
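The per-packet / per-flit split can be summarized in a few lines. In this sketch, the `Stage` enum and `stages_for` are hypothetical names. It assumes that body and tail flits reuse the route and output VC obtained by the head flit. With lookahead routing, this router's output port was already computed upstream, so RC is treated as leaving the head flit's path through this router.

```python
from enum import Enum


class Stage(Enum):
    RC = "route computation"
    VA = "virtual channel allocation"
    SA = "switch allocation"
    ST = "switch traversal"


PER_PACKET = (Stage.RC, Stage.VA)  # done once, by the head flit
PER_FLIT = (Stage.SA, Stage.ST)    # repeated for every flit


def stages_for(flit_kind: str, lookahead: bool = False) -> list:
    """Stages a flit passes through in one router."""
    if flit_kind == "head":
        per_packet = [s for s in PER_PACKET if not (lookahead and s is Stage.RC)]
        return per_packet + list(PER_FLIT)
    return list(PER_FLIT)  # body/tail flits reuse the head's route and output VC


print([s.name for s in stages_for("head")])  # ['RC', 'VA', 'SA', 'ST']
print([s.name for s in stages_for("body")])  # ['SA', 'ST']
```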
Allocation Basics
• Arbitration:
– Multiple requestors
– Single resource
– Request + grant vectors
• Allocation:
– Multiple requestors
– Multiple equivalent resources
– Request + grant matrices
• Matching:
– Each grant must satisfy a request
– Each requester gets at most one grant
– Each resource is granted at most once
Source: Becker et al, Stanford
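Arbitration turns a request vector into a grant vector. Allocation produces a grant matrix that must satisfy the three matching rules. A minimal sketch follows; round-robin is used only as one common arbitration policy, and the function names are hypothetical.

```python
def round_robin_arbiter(requests: list, last_grant: int) -> list:
    """Arbitration: grant one requester, searching circularly after last_grant."""
    n = len(requests)
    grants = [False] * n
    for offset in range(1, n + 1):
        i = (last_grant + offset) % n
        if requests[i]:
            grants[i] = True
            break
    return grants


def is_matching(requests: list, grants: list) -> bool:
    """Check the matching rules on [requestor][resource] request/grant matrices."""
    rows, cols = len(requests), len(requests[0])
    grants_requested = all(requests[i][j] or not grants[i][j]
                           for i in range(rows) for j in range(cols))
    one_per_requestor = all(sum(grants[i]) <= 1 for i in range(rows))
    one_per_resource = all(sum(grants[i][j] for i in range(rows)) <= 1
                           for j in range(cols))
    return grants_requested and one_per_requestor and one_per_resource


print(round_robin_arbiter([True, False, True, True], last_grant=2))
# [False, False, False, True]
req = [[1, 1], [1, 0]]
print(is_matching(req, [[0, 1], [1, 0]]))  # True
print(is_matching(req, [[1, 0], [1, 0]]))  # False: resource 0 granted twice
```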
Separable Allocators
• Matchings have at most one grant per row
and per column
• Implement via two phases of arbitration
– Column-wise and row-wise
– Perform in either order
– Arbiters in each stage are fully independent
• Fast and cheap
• But bad choices in the first phase can prevent the second phase from generating a good matching!
[Figure: input-first and output-first separable allocation examples]
Source: Becker et al, Stanford
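An input-first separable allocator can be sketched as two independent rounds of arbitration. This version uses fixed-priority arbiters (lowest index wins) only to keep the result deterministic; practical designs often use round-robin arbiters instead. The function names are hypothetical. The example reproduces the failure mode noted above: input 1 loses, even though the matching {(0, 1), (1, 0)} would have served both inputs.

```python
from typing import Optional


def pick_lowest(requests: list) -> Optional[int]:
    """Fixed-priority arbiter: grant the lowest-indexed active request."""
    for i, r in enumerate(requests):
        if r:
            return i
    return None


def input_first_allocate(req: list) -> list:
    """Separable input-first allocator. req[i][j] is True if input i wants output j."""
    n_in, n_out = len(req), len(req[0])
    # Phase 1 (row-wise): each input independently picks one requested output.
    choice = [pick_lowest(req[i]) for i in range(n_in)]
    # Phase 2 (column-wise): each output independently picks one input that chose it.
    grants = []
    for j in range(n_out):
        winner = pick_lowest([choice[i] == j for i in range(n_in)])
        if winner is not None:
            grants.append((winner, j))
    return grants


req = [[True, True],   # input 0 can use output 0 or 1
       [True, False]]  # input 1 can only use output 0
print(input_first_allocate(req))  # [(0, 0)]: a one-grant matching
```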
Oblivious Routing
• Route packets without
knowledge about the state
(e.g. congestion) of the
network
• Objectives
– Maximize worst-case and
average-case throughput
– Minimize latency (hop count)
Routing Algorithm Affects Channel Loads
Source: Seo et al, Purdue
DOR
• DOR = Dimension-Ordered Routing (XY routing)
[Figure: path from source to destination that first travels along X, then along Y]
• Minimal latency, but no path diversity; hence poor worst-case and average-case throughput
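XY routing on a 2D mesh is deterministic: correct the X coordinate fully, then the Y coordinate. A minimal sketch, assuming nodes are (x, y) tuples; `xy_route` is a hypothetical name. The path length always equals the Manhattan distance, which is why DOR is minimal. The fixed path is also why it has no path diversity.

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) route on a 2D mesh, as a list of (x, y) nodes."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 3)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```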
VAL
• VAL = Valiant’s routing algorithm
[Figure: path from source to a random intermediate node, then on to the destination]
• Average latency about twice that of DOR; optimal worst-case throughput, but poor average-case throughput
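Valiant's algorithm can be sketched as two DOR phases through a uniformly random intermediate node. In this sketch, a k × k mesh is assumed, and `xy_route` and `valiant_route` are hypothetical names. Because each phase covers an average-length path, the total hop count is typically about double the minimal distance.

```python
import random


def xy_route(src, dst):
    """Dimension-ordered (XY) route on a 2D mesh, as a list of (x, y) nodes."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def valiant_route(src, dst, k, rng=random):
    """Valiant: XY to a uniformly random node of the k x k mesh, then XY to dst."""
    mid = (rng.randrange(k), rng.randrange(k))
    return xy_route(src, mid) + xy_route(mid, dst)[1:]  # skip the repeated mid node


print(valiant_route((0, 0), (3, 3), k=4, rng=random.Random(1)))
```

In hardware, the two phases are usually kept on separate virtual channels to avoid deadlock. This sketch only computes the path.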
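ROMM
• ROMM = Randomized, Oblivious, Multi-phase Minimal routing
[Figure: path from source to a random intermediate node within the source-destination bounding box, then on to the destination]
• Minimal latency, good average-case throughput, but poor worst-case throughput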
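This sketch shows the two-phase variant of ROMM (the general scheme can use more phases). The intermediate node is drawn from the bounding box of source and destination, so both DOR phases move monotonically toward the destination and the route stays minimal. `xy_route` and `romm_route` are hypothetical names.

```python
import random


def xy_route(src, dst):
    """Dimension-ordered (XY) route on a 2D mesh, as a list of (x, y) nodes."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def romm_route(src, dst, rng=random):
    """Two-phase ROMM: XY to a random node in the src/dst bounding box, then XY to dst."""
    (sx, sy), (dx, dy) = src, dst
    mid = (rng.randint(min(sx, dx), max(sx, dx)),
           rng.randint(min(sy, dy), max(sy, dy)))
    return xy_route(src, mid) + xy_route(mid, dst)[1:]  # skip the repeated mid node


print(romm_route((0, 0), (3, 2), rng=random.Random(1)))
```

Every route has |Δx| + |Δy| hops (5 in the example), but different random intermediates spread traffic over different minimal paths.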
O1TURN
• O1TURN = Orthogonal One-Turn routing (XY and YX routing with equal probability)
[Figure: the two minimal one-turn paths (XY and YX) from source to destination]
• Minimal routing, optimal worst-case throughput, good average-case throughput
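O1TURN picks one of the two single-turn minimal routes at random. In this sketch, YX routing is obtained by running XY routing on coordinate-swapped nodes and swapping back. The names `xy_route` and `o1turn_route` are hypothetical.

```python
import random


def xy_route(src, dst):
    """Dimension-ordered (XY) route on a 2D mesh, as a list of (x, y) nodes."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def o1turn_route(src, dst, rng=random):
    """O1TURN: XY or YX with equal probability; both are minimal one-turn routes."""
    if rng.random() < 0.5:
        return xy_route(src, dst)
    swap = lambda p: (p[1], p[0])  # YX = XY in swapped coordinates
    return [swap(p) for p in xy_route(swap(src), swap(dst))]


print(o1turn_route((0, 0), (2, 2), rng=random.Random(1)))
```

To stay deadlock-free, XY and YX traffic are typically placed on separate virtual channels. The sketch only computes the path.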
Worst-Case Throughput Trends
Source: Seo et al, Purdue
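Worst-case throughput is limited by the most heavily loaded channel. If the busiest link carries γ units of traffic per unit injected by each node, the network saturates at an injection rate of about 1/γ, with link bandwidth normalized to 1. A minimal sketch, assuming a 4 × 4 mesh under transpose traffic where node (x, y) sends one unit to (y, x), counts per-link load for DOR (XY) and for O1TURN (half XY, half YX). All function names are hypothetical.

```python
from collections import Counter


def xy_path(src, dst):
    """XY route as a list of directed links ((x, y), (x', y'))."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def yx_path(src, dst):
    """YX route: XY routing on coordinate-swapped nodes, swapped back."""
    swap = lambda p: (p[1], p[0])
    return [(swap(a), swap(b)) for a, b in xy_path(swap(src), swap(dst))]


def max_load(k, routes):
    """Max link load on a k x k mesh under transpose traffic; routes = [(weight, path_fn)]."""
    load = Counter()
    for x in range(k):
        for y in range(k):
            if x == y:
                continue  # diagonal nodes send to themselves
            for weight, path_fn in routes:
                for link in path_fn((x, y), (y, x)):
                    load[link] += weight
    return max(load.values())


dor = max_load(4, [(1.0, xy_path)])
o1turn = max_load(4, [(0.5, xy_path), (0.5, yx_path)])
print(dor, o1turn)  # 3.0 1.5
```

Under transpose traffic, the XY and YX halves of O1TURN use disjoint sets of links, so splitting the traffic evenly halves the peak channel load relative to DOR in this example.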
Average Case Analysis
Source: Seo et al, Purdue
Comparison
                          DOR    VAL       ROMM   O1TURN
Minimal hop count         Yes    No        Yes    Yes
Worst-case throughput     Poor   Optimal   Poor   Optimal
Average-case throughput   Poor   Poor      Good   Good