Networks-on-Chip
Seminar contents
• The Premises
• Homogeneous and Heterogeneous Systems-on-Chip and their interconnection networks
• The Network-on-Chip approach
Slide from S. Tota and M. R. Casu [1]
The premises
• The System-on-Chip (SoC) today
  • Heterogeneous: ~10 IPs
  • Homogeneous (MP-SoC): ~10 uP (with exceptions)
  • On-chip bus (AMBA, CoreConnect, Wishbone, …)
  • IPs and uPs are sold with proprietary bus interfaces
• Near- and long-term forecast
  • ≈100 IPs/uPs: buses are not scalable!
• Physical design issues: signal integrity, power consumption, timing closure
• Clock issues: is it time for the Globally Asynchronous paradigm? (Still locally synchronous)
• Need for "more regular" design
Slide from S. Tota and M. R. Casu [1]
Today's heterogeneous SoC
[Figure: CPU, DSP, MEM, embedded FPGA, dedicated IP, and I/O blocks attached to an interconnection network (bus)]
Slide from S. Tota and M. R. Casu [1]
Maya (Rabaey’00)
Slide from S. Tota and M. R. Casu [1]
Homogeneous SoC (MP-SoC)
[Figure: eight CPU/MEM pairs attached to an interconnection network (bus, crossbar)]
Slide from S. Tota and M. R. Casu [1]
MP-SoC: Cisco CRS-1 Router
• The CRS-1 router uses 188 extensible network processors per "Silicon Packet Processor" chip
• 16 PPE clusters of 12 PPEs each
Slide from S. Tota and M. R. Casu [1]
Very long wires
• Year 2005: 1 ns (1 GHz)
• Year 2010: 0.1 ns (10 GHz)
[Figure: points A and B on the chip, shown for each year]
Slide from S. Tota and M. R. Casu [1]
Bus pros (☺) and cons (☹)
☹ Every unit attached adds parasitic capacitance, so electrical performance degrades with growth.
☹ Bus timing is difficult in a deep-submicron process.
☹ Bus arbiter delay grows with the number of masters. The arbiter is also instance-specific.
☹ Bandwidth is limited and shared by all units attached.
☺ The silicon cost of a bus is small.
☺ Any bus is almost directly compatible with most available IPs, including software running on CPUs.
☺ The concepts are simple and well understood.
Slide from S. Tota and M. R. Casu [1]
What are NoCs?
• According to Wikipedia:
  “Network-on-a-chip (NoC) is a new paradigm for System-on-Chip (SoC) design. NoC-based systems accommodate multiple asynchronous clocking that many of today's complex SoC designs use. The NoC solution brings a networking method to on-chip communications and claims roughly a threefold performance increase over conventional bus systems.”
Slide from S. Tota and M. R. Casu [1]
NoC example
[Figure: processors (masters), global memory and global I/O (slaves) attached to a network of routing nodes]
Slide from S. Tota and M. R. Casu [1]
Basic Ingredients of a NoC
• N Computational Resources
  • Processing Elements (PE)
• 1 Connection Topology
• 1 Routing technique
• M × N Switches
• N Network Interfaces
• 1 Addressing system
• 1 Communication Protocol
• 1 Programming model
  • Message passing
  • Shared Memory
Slide from S. Tota and M. R. Casu [1]
Problems
• Internal network contention causes (often unpredictable) latency.
• The network has a significant silicon area.
• Bus-oriented IPs need smart wrappers.
• Software needs clean synchronization in multiprocessor systems.
• System designers need reeducation for new concepts.
Slide from S. Tota and M. R. Casu [1]
Network on Chip (NoC)
• Adoption of a network-based packet communication paradigm.
• Use abstraction and layering to decouple communication from computation.
• Distribute the responsibility for reliable transmission evenly over higher and lower layers of abstraction.
[Figure: protocol stack abstraction (Benini & De Micheli, Computer 2002)]
  Software: application, systems
  Architecture and control: transport, network, data link
  Physical: wiring
Slide from L. Benini [2]
Physical layer - Synchronization
• Physical design:
  • Voltage levels
  • Driver design
  • Sizing
  • Physical routing
• Synchronization: how and when to sample the channel?
  • Avoid a clock: asynchronous communication
  • The clock travels with the data
  • The clock can be reconstructed from the data
• Synchronization recovery has a cost
  • Cannot be abstracted away
  • Can cause errors (e.g., metastability)
Slide from L. Benini [2]
Data-link layer
• Provide reliable data transfer on an unreliable physical channel
• Access to the communication medium
  • Dealing with contention and arbitration
• Issues
  • Fairness and safe communication
  • Achieve high throughput
  • Error resiliency
Slide from L. Benini [2]
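The contention, arbitration, and fairness points above can be illustrated with a round-robin arbiter, one common fair policy for a shared link. This is a minimal sketch, not a scheme taken from the slides: the class name `RoundRobinArbiter`, its methods, and the port numbering are all hypothetical.

```python
class RoundRobinArbiter:
    """Grants a shared link to one requester at a time, rotating priority.

    Hypothetical illustration: after a port is granted, it becomes the
    lowest-priority port, so no requester can be starved.
    """

    def __init__(self, num_ports):
        self.num_ports = num_ports
        # Start so that port 0 has the highest priority on the first call.
        self.last_grant = num_ports - 1

    def arbitrate(self, requests):
        """Return the granted port among `requests`, or None if none request."""
        pending = set(requests)
        # Scan the ports starting just after the most recently granted one.
        for offset in range(1, self.num_ports + 1):
            port = (self.last_grant + offset) % self.num_ports
            if port in pending:
                self.last_grant = port
                return port
        return None


arbiter = RoundRobinArbiter(4)
# Ports 0, 2 and 3 keep requesting; each is served in turn.
grants = [arbiter.arbitrate([0, 2, 3]) for _ in range(4)]
print(grants)  # [0, 2, 3, 0]
```

A fixed-priority arbiter would be simpler in logic but could starve low-priority ports under sustained load, which is the fairness issue the slide lists.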
Topologies
• Heritage of networks with new constraints
  • Need to accommodate interconnects in a 2D layout
  • Cannot route long wires (clock frequency bound)
[Figure: a) SPIN, b) CLICHÉ, c) Torus, d) Folded torus, e) Octagon, f) BFT]
Slide from S. Tota and M. R. Casu [1]
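To make the difference between two of the topologies above concrete, the sketch below compares minimal hop counts in a 2D mesh (the CLICHÉ layout) and in a torus of the same size. It is only an illustration under simple assumptions: nodes are addressed by hypothetical `(x, y)` coordinates, and the function names are invented for this example.

```python
from itertools import product


def mesh_hops(src, dst):
    """Minimal hop count between two nodes of a 2D mesh (Manhattan distance)."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])


def torus_hops(src, dst, cols, rows):
    """Minimal hop count in a 2D torus, where each row and column wraps around."""
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    # In each dimension, take the shorter way around the ring.
    return min(dx, cols - dx) + min(dy, rows - dy)


def average_hops(hop_fn, cols, rows):
    """Average hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(cols), range(rows)))
    total = sum(hop_fn(s, d) for s in nodes for d in nodes if s != d)
    return total / (len(nodes) * (len(nodes) - 1))


print(mesh_hops((0, 0), (3, 3)))          # 6
print(torus_hops((0, 0), (3, 3), 4, 4))   # 2
mesh_avg = average_hops(mesh_hops, 4, 4)
torus_avg = average_hops(lambda s, d: torus_hops(s, d, 4, 4), 4, 4)
print(torus_avg < mesh_avg)               # True
```

The wrap-around links shorten paths, but in a plain torus they are long wires, which is the constraint the slide raises and the motivation for the folded torus.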
Topologies
• Comparison of topologies according to different QoS parameters.
[Figure: throughput as a function of the number of IPs]
[Figure: drop probability as a function of the number of IPs]
[Figure: latency as a function of the number of IPs]
Switching
• Again, techniques inherited from computer and communication networks
• New constraints in silicon: area and power
  • Use as few buffers as possible
• Store & Forward and Virtual Cut-Through
  • Need buffers sized for an entire packet: unsuited!
• Limited buffer size in
  • Wormhole
  • Deflection routing, a.k.a. "Hot Potato"
• Virtual channels
  • Increase buffer size…
Slide from L. Benini [2]
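The buffering trade-off above can be sketched with a common first-order, zero-load model. The assumptions are stated loudly because they are not in the slides: every link carries one flit per cycle, every router adds a fixed `router_delay` in cycles, and there is no contention. All function names are hypothetical.

```python
def store_and_forward_latency(hops, packet_flits, router_delay=1):
    """Cycles to deliver a packet when every router waits for the whole packet."""
    return hops * (router_delay + packet_flits)


def wormhole_latency(hops, packet_flits, router_delay=1):
    """Cycles to deliver a packet when flits are pipelined through the routers."""
    return hops * router_delay + packet_flits


def min_buffer_flits(technique, packet_flits, flit_buffer=1):
    """Smallest per-port buffer (in flits) that each technique needs."""
    required = {
        "store_and_forward": packet_flits,    # whole packet is stored
        "virtual_cut_through": packet_flits,  # whole packet if blocked
        "wormhole": flit_buffer,              # only a few flits
    }
    return required[technique]


# A 16-flit packet crossing 4 routers.
print(store_and_forward_latency(4, 16))         # 68
print(wormhole_latency(4, 16))                  # 20
print(min_buffer_flits("store_and_forward", 16))  # 16
print(min_buffer_flits("wormhole", 16))           # 1
```

Under this model virtual cut-through has the same zero-load latency as wormhole switching, but it still needs packet-sized buffers, which is why the slide groups it with Store & Forward.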
Switching
[Figure: classification of switching techniques]
Routing
• Deterministic vs. adaptive
  • Simpler vs. more complex routing logic
  • Deadlock freedom is easy vs. hard to guarantee
  • Prone vs. robust to congestion
• 2D dimension-order routing (XY) is the most used static routing in NoCs (e.g., with wormhole switching on a mesh)
Slide from L. Benini [2]
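The XY scheme mentioned above can be sketched in a few lines: a packet first travels along the X dimension until its column matches the destination, then along Y. The `(x, y)` addressing and the function names are assumptions made for this illustration.

```python
def xy_next_hop(current, dest):
    """Return the next node on the XY route, or None once `dest` is reached."""
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        # Correct the X coordinate first.
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        # Only then correct the Y coordinate.
        return (cx, cy + (1 if dy > cy else -1))
    return None


def xy_route(src, dst):
    """List every node visited from `src` to `dst`, both included."""
    path = [src]
    node = xy_next_hop(src, dst)
    while node is not None:
        path.append(node)
        node = xy_next_hop(node, dst)
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the route depends only on source and destination, the routing logic stays simple, and forbidding turns from Y back to X is what makes deadlock freedom easy on a mesh. The price is that the route cannot steer around congested links.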
Routing
[Figure: classification of routing algorithms]
Transport layer
• Decompose and reconstruct information
• Important choices
  • Packet granularity
  • Admission/congestion control
  • Packet retransmission parameters (e.g., timeout)
• All these factors heavily affect energy and performance
• Application-specific schemes vs. standards
Slide from L. Benini [2]
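"Decompose and reconstruct information" can be illustrated by splitting a message into fixed-size packets and reassembling it at the destination. This is a minimal sketch: the packet fields (`src`, `dst`, `seq`, `total`, `payload`) and the function names are hypothetical, not a format defined in the slides.

```python
from math import ceil


def packetize(message, payload_size, src, dst):
    """Split `message` (bytes) into packets carrying at most `payload_size` bytes."""
    total = max(1, ceil(len(message) / payload_size))
    return [
        {
            "src": src,
            "dst": dst,
            "seq": seq,      # position of this packet in the message
            "total": total,  # number of packets in the message
            "payload": message[seq * payload_size:(seq + 1) * payload_size],
        }
        for seq in range(total)
    ]


def reassemble(packets):
    """Rebuild the message from packets that may arrive out of order."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("missing packets: retransmission would be needed")
    return b"".join(p["payload"] for p in ordered)


message = b"hello network on chip"
packets = packetize(message, payload_size=4, src=0, dst=5)
print(len(packets))                               # 6
print(reassemble(packets[::-1]) == message)       # True
```

Here `payload_size` is the packet granularity choice from the slide: smaller packets cost more header overhead per byte, while larger ones hold network resources longer.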
Flow control
Determines how resources are allocated to packets moving in the network.
[Figure: classification of flow control algorithms]
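As one concrete instance of allocating resources to moving packets, the sketch below models credit-based flow control on a single link. The slides do not name this scheme, so choosing it as the example is an assumption; the class `CreditLink` and its methods are hypothetical.

```python
from collections import deque


class CreditLink:
    """One link whose sender holds a credit per free downstream buffer slot."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # free slots known to the sender
        self.downstream = deque()    # flits waiting in the receiver's buffer

    def try_send(self, flit):
        """Send a flit if a credit is available; return whether it was sent."""
        if self.credits == 0:
            return False  # the sender stalls instead of overflowing the buffer
        self.credits -= 1
        self.downstream.append(flit)
        return True

    def consume(self):
        """Receiver forwards one flit and returns a credit to the sender."""
        if not self.downstream:
            return None
        flit = self.downstream.popleft()
        self.credits += 1
        return flit


link = CreditLink(buffer_slots=2)
print(link.try_send("a"), link.try_send("b"), link.try_send("c"))  # True True False
print(link.consume())       # a
print(link.try_send("c"))   # True
```

The sender never transmits into a full buffer, so no flit is dropped; the cost is that throughput depends on how quickly credits return.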
System software
• Programming paradigms
  • Shared memory
  • Message passing
• Middleware:
  • Layered system software
  • Should provide low communication latency
  • Modular, scalable, robust, …
Slide from L. Benini [2]
Who first had the idea?
• The most cited papers according to Google (number of citations)
  • Guerrier’00 (204), A Generic Architecture for On-Chip Packet-Switched Interconnections
  • Dally’01 (392), Route Packets, Not Wires: On-Chip Interconnection Networks
  • Benini’02 (417), Networks on Chips: A New SoC Paradigm
  • Kumar’02 (184), A Network on Chip Architecture and Design Methodology
Slide from S. Tota and M. R. Casu [1]
Some NoC References
• J. Rabaey et al., “A 1-V heterogeneous reconfigurable DSP IC for wireless baseband digital signal processing,” IEEE Journal of Solid State Circuits, Vol. 35, No. 11, Nov. 2000, pp. 1697-1704.
• P. Guerrier and A. Greiner, “A Generic Architecture for On-Chip Packet-Switched Interconnections,” Proc. Design and Test in Europe (DATE), pp. 250-256, Mar. 2000.
• A. Adriahantenaina et al., “SPIN: a Scalable, Packet Switched, On-chip Micro-network,” Proc. Design and Test in Europe (DATE), Mar. 2003.
• L. Benini and G. De Micheli, “Networks on Chips: A New SoC Paradigm,” Computer, vol. 35, no. 1, Jan. 2002, pp. 70-78.
• S. Kumar et al., “A network on chip architecture and design methodology,” in Proc. ISVLSI, 2002.
• W. J. Dally and B. Towles, “Route packets, not wires: on-chip interconnection networks,” in Proc. Design Automation Conf., 2001.
• K. Goossens et al., “Trade-offs in the design of a router with both guaranteed and best-effort services for networks on chip,” IEE Proc.-Comput. Digit. Tech., Vol. 150, No. 5, Sep. 2003, pp. 294-302.
• P.P. Pande et al., “Performance Evaluation and Design Trade-offs for Network-on-Chip Interconnect Architectures,” IEEE Trans. Computers, vol. 54, no. 8, Aug. 2005, pp. 1025-1040.
Slide from S. Tota and M. R. Casu [1]
References
1. Sergio Tota and Mario R. Casu, “Networks-on-Chip,” presentation. www.tlc.polito.it/~nordio/seminars/2006_05_05_Casu.ppt
2. L. Benini, “Networks on chip,” presentation, http://www.ida.liu.se/~petel/NoC/lecture-notes/lect2.pdf