Networks-on-Chip
Seminar contents
The Premises
Homogeneous and Heterogeneous Systems-on-Chip and their interconnection networks
The Network-on-Chip approach
Slide from S. Tota and M. R. Casu [1]
The premises
The System-on-Chip (SoC) today
Heterogeneous: ~10 IPs
Homogeneous (MP-SoC): ~10 uPs (with exceptions)
On-chip bus (AMBA, CoreConnect, Wishbone, …)
IPs and uPs are sold with a proprietary bus interface
Near and long-term forecast
100 IPs/uPs: buses are not scalable!
Physical design issues: signal integrity, power consumption, timing closure
Clock issues: is it time for the Globally Asynchronous paradigm? (Still locally synchronous)
Need for “more regular” design
Slide from S. Tota and M. R. Casu [1]
Today’s Heterogeneous SoC
[Figure: CPU, DSP, MEM, embedded FPGA, dedicated IP, and I/O blocks attached to an interconnection network (BUS)]
Slide from S. Tota and M. R. Casu [1]
Maya (Rabaey’00)
Slide from S. Tota and M. R. Casu [1]
Homogeneous SoC (MP-SoC)
[Figure: eight CPU/MEM pairs attached to an interconnection network (BUS, XBAR)]
Slide from S. Tota and M. R. Casu [1]
MP-SoC: Cisco CRS-1 Router
The CRS-1 router uses 188 extensible network processors per “Silicon Packet Processor” chip
16 PPE clusters of 12 PPEs each
Slide from S. Tota and M. R. Casu [1]
Very long wires
Year 2005: 1 ns (1 GHz)
Year 2010: 0.1 ns (10 GHz)
[Figure: chip diagrams with points A and B marked]
Slide from S. Tota and M. R. Casu [1]
Bus pros (+) and cons (-)
(-) Every unit attached adds parasitic capacitance, so electrical performance degrades as the bus grows.
(-) Bus timing is difficult in a deep submicron process.
(-) Bus arbiter delay grows with the number of masters. The arbiter is also instance-specific.
(-) Bandwidth is limited and shared by all units attached.
(+) The silicon cost of a bus is small.
(+) Any bus is almost directly compatible with most available IPs, including software running on CPUs.
(+) The concepts are simple and well understood.
Slide from S. Tota and M. R. Casu [1]
What are NoC’s?
According to Wikipedia:
“Network-on-a-chip (NoC) is a new paradigm for System-on-Chip (SoC) design. NoC-based systems accommodate multiple asynchronous clocking that many of today's complex SoC designs use. The NoC solution brings a networking method to on-chip communications and claims roughly a threefold performance increase over conventional bus systems.”
Slide from S. Tota and M. R. Casu [1]
NoC example
[Figure: block diagram with Processor (Master), Global Memory (Slave), Global I/O (Slave), and Routing Node blocks]
Slide from S. Tota and M. R. Casu [1]
Basic Ingredients of a NoC
N Computational Resources
Processing Elements (PE)
1 Connection Topology
1 Routing technique
M N Switches
N Network Interfaces
1 Addressing system
1 Communication Protocol
1 Programming model
Message passing
Shared Memory
Slide from S. Tota and M. R. Casu [1]
Problems
Internal network contention causes (often unpredictable) latency.
The network has a significant silicon area.
Bus-oriented IPs need smart wrappers.
Software needs clean synchronization in multiprocessor systems.
System designers need reeducation for new concepts.
Slide from S. Tota and M. R. Casu [1]
Network on Chip (NoC)
Adoption of a network-based packet communication paradigm.
Use abstraction and layering to decouple the communication issue from computation.
Distribute the responsibility of reliable transmission evenly over higher and lower layers of abstraction.
Protocol stack abstraction (Benini & De Micheli, Computer 2002):
Software
• Application
• System
Architecture and control
• Transport
• Network
• Data link
Physical
• Wiring
Slide from L. Benini [2]
Physical layer - Synchronization
Physical design:
Voltage levels
Driver design
Sizing
Physical routing
Synchronization: How and when to sample the channel?
Avoid a clock: asynchronous communication
The clock travels with the data
The clock can be reconstructed from the data
Synchronization recovery has a cost
Cannot be abstracted away
Can cause errors (e.g., metastability)
Slide from L. Benini [2]
Data-link layer
Provide reliable data transfer over an unreliable physical channel
Access to the communication medium
Dealing with contention and arbitration
Issues
Fairness and safe communication
Achieve high throughput
Error resiliency
Slide from L. Benini [2]
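The contention, arbitration, and fairness points above can be illustrated with a round-robin arbiter, one common way to share a channel fairly among requesters. This is a minimal sketch, not a scheme taken from the slides; the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Grants a shared channel to one requester at a time, in rotating order."""

    def __init__(self, num_requesters):
        self.num_requesters = num_requesters
        # Start so that requester 0 has the highest priority on the first grant.
        self.last_granted = num_requesters - 1

    def grant(self, requests):
        """requests[i] is True if requester i wants the channel.

        Returns the index of the granted requester, or None if nobody asks.
        The search starts just after the previous winner, so a requester
        that keeps asking cannot be starved.
        """
        for offset in range(1, self.num_requesters + 1):
            candidate = (self.last_granted + offset) % self.num_requesters
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(3)
grants = [arbiter.grant([True, True, False]) for _ in range(3)]
```

With requesters 0 and 1 both asking continuously, the grants alternate between them instead of always favouring requester 0.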
Topologies
Heritage of networks with new constraints
Need to accommodate interconnects in a 2D layout
Cannot route long wires (clock frequency bound)
a) SPIN
b) CLICHÉ
c) Torus
d) Folded torus
e) Octagon
f) BFT
Slide from S. Tota and M. R. Casu [1]
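To make the effect of topology on path length concrete, the sketch below compares hop distances in a 2D mesh (the CLICHÉ layout) with those in a torus of the same size. It only counts hops and ignores wire length, which is the reason the folded torus exists; the function names and the 4x4 size are assumptions chosen for illustration.

```python
from itertools import product


def mesh_hops(a, b):
    """Hop distance between nodes a and b, given as (x, y), in a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a, b, width, height):
    """Hop distance in a 2D torus, where each dimension can wrap around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


def diameter(width, height, hops):
    """Largest hop distance over all node pairs for the given metric."""
    nodes = list(product(range(width), range(height)))
    return max(hops(a, b) for a in nodes for b in nodes)


mesh_diameter = diameter(4, 4, mesh_hops)
torus_diameter = diameter(4, 4, lambda a, b: torus_hops(a, b, 4, 4))
print(mesh_diameter, torus_diameter)  # prints "6 4"
```

The wrap-around links shorten the worst-case path, at the cost of the long wires that a plain torus needs in a 2D layout.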
Topologies
Comparison of topologies according to different QoS parameters.
[Figure: throughput as a function of the number of IPs]
[Figure: drop probability as a function of the number of IPs]
[Figure: latency as a function of the number of IPs]
Switching
Again, techniques inherited from computer and communication networks
New constraints in silicon: area and power
Use as few buffers as possible
Store-and-forward and virtual cut-through
Need buffer space for an entire packet: unsuited!
Limited buffer size in:
Wormhole
Deflection routing, a.k.a. “Hot Potato”
Virtual channels
Increase buffer size…
Slide from L. Benini [2]
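The buffer argument above can be put into numbers with a textbook-style model. The sketch assumes one flit crosses a link per cycle, a fixed router delay per hop, and no contention; the function names and the example values (4 hops, 16-flit packet) are illustrative, not taken from the slides.

```python
def store_and_forward_latency(hops, packet_flits, router_delay=1):
    """Cycles when each router receives the whole packet before forwarding it."""
    return hops * (router_delay + packet_flits)


def wormhole_latency(hops, packet_flits, router_delay=1):
    """Cycles when body flits are pipelined behind the header flit."""
    return hops * router_delay + packet_flits


def min_input_buffer_flits(technique, packet_flits):
    """Smallest input buffer (in flits) that lets the technique operate at all.

    Store-and-forward and virtual cut-through must be able to hold a whole
    packet in one router; wormhole can, in principle, work with a single flit.
    """
    if technique in ("store_and_forward", "virtual_cut_through"):
        return packet_flits
    if technique == "wormhole":
        return 1
    raise ValueError(f"unknown technique: {technique}")


saf = store_and_forward_latency(hops=4, packet_flits=16)
wh = wormhole_latency(hops=4, packet_flits=16)
print(saf, wh)  # prints "68 20"
```

Under these assumptions wormhole switching needs less buffering and has lower zero-load latency, which is why it suits the area and power constraints listed above.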
Switching
[Figure: classification of switching techniques]
Routing
Deterministic vs. adaptive
Simple/complex routing logic
Easy/hard to make deadlock-free
Prone/robust to congestion
2D dimension-order routing (XY) is the most used static routing in NoCs (e.g., with wormhole switching and a mesh topology)
Slide from L. Benini [2]
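XY routing on a mesh can be sketched in a few lines: a packet first travels along the X dimension until its column matches the destination, then along Y. The coordinate convention and function names below are assumptions for illustration.

```python
def xy_next_hop(current, dest):
    """Return the next node on the XY route from current towards dest.

    Nodes are (x, y) mesh coordinates. X is corrected first, then Y.
    Returns current unchanged if the packet has already arrived.
    """
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return current


def xy_route(src, dest):
    """Full list of nodes visited from src to dest, both included."""
    path = [src]
    while path[-1] != dest:
        path.append(xy_next_hop(path[-1], dest))
    return path


route = xy_route((0, 0), (2, 1))
print(route)  # prints "[(0, 0), (1, 0), (2, 0), (2, 1)]"
```

Because the route depends only on source and destination, the logic is simple and the path is always the same, which also means the packet cannot steer around a congested link.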
Routing
[Figure: classification of routing algorithms]
Transport layer
Decompose and reconstruct information
Important choices
Packet granularity
Admission/congestion control
Packet retransmission parameters (e.g., timeout)
All these factors heavily affect energy and performance
Application-specific schemes vs. standards
Slide from L. Benini [2]
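The "decompose and reconstruct" role of the transport layer can be sketched as follows. The packet fields (src, dst, seq, total, payload) are hypothetical and chosen only for illustration; payload_size stands for the packet granularity choice mentioned above.

```python
def packetize(message: bytes, payload_size: int, src: int, dst: int):
    """Split a message into packets that carry at most payload_size bytes each."""
    chunks = [message[i:i + payload_size]
              for i in range(0, len(message), payload_size)]
    return [
        {"src": src, "dst": dst, "seq": n, "total": len(chunks), "payload": chunk}
        for n, chunk in enumerate(chunks)
    ]


def reassemble(packets):
    """Rebuild the message from packets that may have arrived out of order."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    if not ordered or len(ordered) != ordered[0]["total"]:
        # A real transport layer would wait or request retransmission here.
        raise ValueError("missing packets")
    return b"".join(p["payload"] for p in ordered)


packets = packetize(b"hello network on chip", payload_size=4, src=0, dst=5)
restored = reassemble(list(reversed(packets)))
```

A smaller payload_size means more packets and more header overhead per message, while a larger one holds network resources longer per packet.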
Flow control
Determines how resources are allocated to packets moving in the network.
[Figure: classification of flow control algorithms]
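As one example of allocating buffer resources to packets, the sketch below models credit-based flow control on a single link: the sender holds one credit per free slot in the receiver's buffer and may only send while it has credits. The slides do not name a specific scheme, so this choice and the class name are assumptions.

```python
from collections import deque


class CreditLink:
    """One sender and one receiver buffer, coupled by credit counting."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots   # free slots as seen by the sender
        self.buffer = deque()         # flits waiting at the receiver

    def try_send(self, flit):
        """Send a flit if a credit is available; return whether it was sent."""
        if self.credits == 0:
            return False              # receiver buffer is full: sender stalls
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain(self):
        """Receiver forwards one flit and returns a credit to the sender."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1
        return flit


link = CreditLink(buffer_slots=2)
results = [link.try_send(f) for f in ("a", "b", "c")]
```

The third send fails because both buffer slots are occupied; it succeeds only after the receiver drains a flit and a credit comes back.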
System software
Programming paradigms
Shared memory
Message passing
Middleware:
Layered system software
Should provide low communication latency
Modular, scalable, robust ….
Slide from L. Benini [2]
Who first had the idea?
The most cited papers according to Google (number of citations)
Guerrier’00 (204), A Generic Architecture for On-Chip Packet-Switched Interconnections
Dally’01 (392), Route Packets, Not Wires: On-Chip Interconnection Networks
Benini’02 (417), Networks on Chips: A New SoC Paradigm
Kumar’02 (184), A Network on Chip Architecture and Design Methodology
Slide from S. Tota and M. R. Casu [1]
Some NoC References
J. Rabaey et al., “A 1-V heterogeneous reconfigurable DSP IC for wireless baseband digital signal processing,” IEEE Journal of Solid State Circuits, vol. 35, no. 11, Nov. 2000, pp. 1697-1704.
P. Guerrier and A. Greiner, “A Generic Architecture for On-Chip Packet-Switched Interconnections,” Proc. Design, Automation and Test in Europe (DATE), pp. 250-256, Mar. 2000.
A. Adriahantenaina et al., “SPIN: a Scalable, Packet Switched, On-chip Micro-network,” Proc. Design, Automation and Test in Europe (DATE), Mar. 2003.
L. Benini and G. De Micheli, “Networks on Chips: A New SoC Paradigm,” Computer, vol. 35, no. 1, Jan. 2002, pp. 70-78.
S. Kumar et al., “A network on chip architecture and design methodology,” in Proc. ISVLSI, 2002.
W. J. Dally and B. Towles, “Route packets, not wires: on-chip interconnection networks,” in Proc. Design Automation Conf., 2001.
K. Goossens et al., “Trade-offs in the design of a router with both guaranteed and best-effort services for networks on chip,” IEE Proc.-Comput. Digit. Tech., vol. 150, no. 5, Sep. 2003, pp. 294-302.
P.P. Pande et al., “Performance Evaluation and Design Trade-offs for Network-on-Chip Interconnect Architectures,” IEEE Trans. Computers, vol. 54, no. 8, Aug. 2005, pp. 1025-1040.
Slide from S. Tota and M. R. Casu [1]
References
1. Sergio Tota and Mario R. Casu, “Networks-on-Chip,” presentation. www.tlc.polito.it/~nordio/seminars/2006_05_05_Casu.ppt
2. L. Benini, “Networks on chip,” presentation, http://www.ida.liu.se/~petel/NoC/lecture-notes/lect2.pdf