ICARO: Congestion Isolation in Networks-On-Chip


José Vicente Escamilla
José Flich
Pedro Javier García



Introduction / Motivation
ICARO overview
ICARO description
◦ Detection
◦ Notification
◦ Isolation



Results
Conclusions
Questions
[Figure: MPSoC and CMP examples]



CMPs and MPSoCs use a network to interconnect nodes.
Network performance degrades due to:
▪ Power-saving mechanisms (DVFS)
▪ Bursty traffic patterns
▪ Heterogeneous system designs
Performance degradation may lead to congestion.
Tile-Gx (72 cores)



ICARO does not remove congestion; it isolates it.
Two types of traffic:
▪ Congested
▪ Non-congested
Goal: isolate congested traffic from non-congested traffic in order to avoid head-of-line (HoL) blocking.

RCA, P. Gratz et al.
◦ Redirects traffic at each router based on congestion metrics.
◦ Metrics are piggybacked.
▪ Vicious cycles may be created.

“Prediction-based Flow Control for Network-on-Chip Traffic”, U. Ogras et al.
◦ Injection control based on prediction models.
◦ The prediction model uses link status sent through a dedicated network.
▪ Injection throttling may produce performance oscillations.

AVADA/FVADA, Yi Xu et al.
◦ Maps different flows to different queues based on the output port requested in the next router (lookahead routing).
▪ Requires lookahead routing and credit-based flow control.
▪ Congested and non-congested flows may still share queues, causing some degree of HoL blocking, since the mapping policy only considers one hop of the message path.

ICARO uses two types of Virtual Networks (VNs):
◦ Regular VN: non-congested traffic
◦ Extra VN: congested traffic

Three stages:
◦ Detection
▪ Congestion is detected at routers.
◦ Notification
▪ Routers notify all Network Interfaces (NIs).
◦ Isolation
▪ NIs isolate congested traffic from non-congested traffic.
[Figure: 4x4 mesh of switches SW0 to SW15 with network interfaces NI 0 to NI 15; legend: regular VN queue, extra VN queue]



Detection is performed at routers.
It detects congestion points (CPs), i.e. {router, port} pairs.
When a message arrives or leaves:
◦ Buffer saturation check
▪ If buffer.level > HIGH_THR, the buffer is marked as saturated.
▪ If buffer.level < LOW_THR, the buffer is marked as not saturated (hysteresis).
▪ If any buffer of an input port is marked as saturated, the whole input port is marked as saturated as well.
◦ Congestion check
▪ Requests from saturated input ports to each output port are counted.
▪ Each output port requested by more than one saturated input port is marked as congested.
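
The detection rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold values, the `Buffer` class, and the function names are hypothetical and do not come from the ICARO implementation.

```python
from dataclasses import dataclass

# Assumed threshold values (in flits); the slides do not give the real ones.
HIGH_THR = 6
LOW_THR = 2


@dataclass
class Buffer:
    level: int = 0          # current occupancy in flits
    saturated: bool = False

    def update(self) -> None:
        """Re-evaluate the saturation flag when a message arrives or leaves."""
        if self.level > HIGH_THR:
            self.saturated = True
        elif self.level < LOW_THR:
            self.saturated = False
        # Between the two thresholds the previous state is kept (hysteresis).


def input_port_saturated(buffers) -> bool:
    """An input port is saturated if any of its buffers is saturated."""
    return any(b.saturated for b in buffers)


def congested_outputs(requests, saturated_inputs):
    """Return the output ports requested by more than one saturated input port.

    requests: dict mapping an input port to the set of output ports it requests.
    saturated_inputs: iterable of input ports currently marked as saturated.
    """
    counts = {}
    for in_port in saturated_inputs:
        for out_port in requests.get(in_port, ()):
            counts[out_port] = counts.get(out_port, 0) + 1
    return {out for out, n in counts.items() if n > 1}
```

For example, if saturated input ports N and W both request output port E, then E (together with the router identifier) forms a congestion point.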


Segmented ring connecting routers and NIs
Network width (wires): log2(N) + p + 1
◦ N = number of nodes
◦ p = router radix
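
As a worked example of the width formula, the sketch below evaluates it for two mesh sizes. The radix of 5 (four mesh ports plus a local port) and the rounding up for node counts that are not powers of two are assumptions, not values stated in the slides.

```python
import math


def cnn_width(num_nodes: int, radix: int) -> int:
    """Wires in the notification ring: log2(N) + p + 1.

    Rounding log2(N) up is an assumption for non-power-of-two node counts.
    """
    return math.ceil(math.log2(num_nodes)) + radix + 1


# Assumed radix p = 5 for a 2D-mesh router.
print(cnn_width(64, 5))  # 8x8 mesh: 6 + 5 + 1 = 12 wires
print(cnn_width(16, 5))  # 4x4 mesh: 4 + 5 + 1 = 10 wires
```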

Process:
◦ Notifications are injected into the register (when it is free).
◦ Notifications move from one register to the next at each cycle.
◦ Notifications are discarded when they reach their origin register.
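
The three steps above can be modelled with a simple cycle-by-cycle simulation. This is a minimal sketch, assuming one register per node and a single ring direction; the `step` function and its data layout are hypothetical.

```python
def step(ring, pending):
    """Advance the notification ring by one cycle.

    ring: list with one slot per register; a slot is None (free) or a tuple
          (origin_node, payload).
    pending: dict mapping a node to the list of payloads waiting for injection.
    Returns (new_ring, seen), where seen lists the (node, payload) pairs that
    arrived at a register during this cycle.
    """
    n = len(ring)
    new_ring = [None] * n
    seen = []

    # Each notification moves to the next register.
    for i, note in enumerate(ring):
        if note is None:
            continue
        origin, payload = note
        nxt = (i + 1) % n
        if nxt == origin:
            continue  # back at its origin register: discard
        new_ring[nxt] = note
        seen.append((nxt, payload))

    # A node injects only when its register is free.
    for node, queue in pending.items():
        if queue and new_ring[node] is None:
            new_ring[node] = (node, queue.pop(0))

    return new_ring, seen
```

On a 4-register ring, a notification injected at node 1 is seen by nodes 2, 3, and 0 on the following three cycles and is then discarded.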
[Figure: segmented-ring CNN linking SW0 to SW15; detail of the SW 7 / NI 7 register with labels CNN in, CNN out, in1, in2, out, notification injection, and notification reception]


Notifications are stored in a cache memory.
SW    Port
5     E
10    S
Useless notifications are discarded
◦ Unreachable CPs
◦ Redundant notifications (merge)
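
One possible way to decide which notifications are useless at a given NI is sketched below. It assumes a 4x4 mesh with row-major switch numbering (SW0 at the top-left, port S leading to the next row) and XY routing. The redundancy test used here (a CP is dropped when every destination it affects is already affected by another stored CP) is an interpretation, not necessarily the rule ICARO implements; all names are hypothetical.

```python
WIDTH = 4  # assumed 4x4 mesh, switches numbered row by row


def xy_hops(src, dst, width=WIDTH):
    """Return the (switch, output_port) pairs crossed by the XY route src -> dst."""
    hops = []
    row, col = divmod(src, width)
    drow, dcol = divmod(dst, width)
    while col != dcol:                      # X dimension first
        port = "E" if dcol > col else "W"
        hops.append((row * width + col, port))
        col += 1 if port == "E" else -1
    while row != drow:                      # then Y dimension
        port = "S" if drow > row else "N"
        hops.append((row * width + col, port))
        row += 1 if port == "S" else -1
    return hops


def affected_destinations(ni, cp, width=WIDTH):
    """Destinations whose route from `ni` crosses congestion point `cp`."""
    return {d for d in range(width * width)
            if d != ni and cp in xy_hops(ni, d, width)}


def filter_cps(ni, cps, width=WIDTH):
    """Keep only the CPs worth storing in the cache of `ni`."""
    kept = {}
    for cp in cps:
        dests = affected_destinations(ni, cp, width)
        if not dests:
            continue  # unreachable CP: no route from this NI crosses it
        if any(dests <= other for other in kept.values()):
            continue  # redundant: already covered by a stored CP
        # Drop stored CPs that the new one covers, then store it.
        kept = {k: v for k, v in kept.items() if not v <= dests}
        kept[cp] = dests
    return list(kept)


print(filter_cps(0, [(5, "E"), (10, "S")]))  # [(10, 'S')]
print(filter_cps(4, [(5, "E"), (10, "S")]))  # [(5, 'E')]
```

Under these assumptions, NI 0 discards {SW5, Port E} as unreachable, while NI 4 keeps only {SW5, Port E} because its only route through {SW10, Port S} also crosses {SW5, Port E}.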
[Figure: 4x4 mesh (SW0 to SW15) with XY routing. The CP cache of NI 0 stores only {SW10, Port S}; the CP cache of NI 4 stores {SW5, Port E} and {SW10, Port S}.]
[Figure: 4x4 mesh (SW0 to SW15) with XY routing. At NI 4, the {SW5, Port E} and {SW10, Port S} notifications are MERGED: the {SW10, Port S} notification is IGNORED.]


Isolation is performed at NIs.
Process:
◦ Initially, all traffic is allocated to the regular VNs.
◦ At each cycle, the post-processor module checks the messages at the head of all regular VNs in parallel.
◦ If a message's route crosses any of the CPs stored in the CP cache, the message is reallocated to an extra VN.
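
The isolation step can be illustrated with a small model of an NI. This is a sketch under assumptions: a 4x4 row-major mesh with XY routing, messages represented only by their destination, and regular VN i paired with extra VN i. The class and method names are hypothetical.

```python
from collections import deque


def xy_hops(src, dst, width=4):
    """(switch, output_port) pairs crossed by the XY route on a row-major mesh."""
    hops = []
    row, col = divmod(src, width)
    drow, dcol = divmod(dst, width)
    while col != dcol:
        port = "E" if dcol > col else "W"
        hops.append((row * width + col, port))
        col += 1 if port == "E" else -1
    while row != drow:
        port = "S" if drow > row else "N"
        hops.append((row * width + col, port))
        row += 1 if port == "S" else -1
    return hops


class NetworkInterface:
    def __init__(self, node, num_vns=2, width=4):
        self.node = node
        self.width = width
        self.regular = [deque() for _ in range(num_vns)]  # regular-VN queues
        self.extra = [deque() for _ in range(num_vns)]    # extra-VN queues
        self.cp_cache = set()                             # stored (switch, port) CPs

    def inject(self, vn, dst):
        """All traffic starts in a regular VN."""
        self.regular[vn].append(dst)

    def post_process(self):
        """One cycle: check the head of every regular VN against the CP cache."""
        for vn, queue in enumerate(self.regular):
            if not queue:
                continue
            route = xy_hops(self.node, queue[0], self.width)
            if self.cp_cache.intersection(route):
                # Assumed pairing: regular VN i feeds extra VN i.
                self.extra[vn].append(queue.popleft())


ni = NetworkInterface(node=4)
ni.cp_cache.add((5, "E"))
ni.inject(0, 6)
ni.inject(1, 15)
ni.inject(1, 12)
ni.post_process()
print(list(ni.extra[0]), list(ni.extra[1]), list(ni.regular[1]))  # [6] [15] [12]
```

With {SW5, Port E} in the cache of NI 4, messages to destinations 6 and 15 move to the extra VNs, while the message to destination 12 stays in its regular VN because its route goes south from SW4.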
[Figure: Network Interface 4 attached to Router 4. Regular-VN queues hold messages with dst:6, dst:15, and dst:12; the post-processor, the CPs cache (entry: SW 5, Port E), the extra-VN queues, and the arbiter are shown.]

Simulation:
◦ NoC simulator developed in our research group.

Parameter       Value
Topology        8x8 2D mesh
Routing         XY
Switching       Wormhole (flit-level switching)
Flow control    Credits
Flit size       128 bits
Message size    5 flits
Traffic         0.3 f/c (background) + 1 f/c (hotspot 4-to-1, from cycle 10k to 20k)

Compared against FVADA/AVADA with different numbers of virtual queues:
◦ FVADA: Restricted to 4 VCs
◦ ICARO: Uses <x> VNs instead of VCs

Overheads analysis:
◦ Tools used:
▪ Synthesis: Design Vision (Synopsys)
▪ Place & Route: Encounter (Cadence)
▪ Library: 45nm Nangate Open Cell (typical conditions)
[Figure: results for the 2VC/VN, 4VC/VN, and 8VC/VN configurations]


Area overhead: ~6%.
Power overhead: varies from 6% to 10%.


Area overhead: varies from 3.8% to 6%.
Power overhead: varies from 4.5% to 5.4%.

Conclusions:
◦ A mechanism to avoid HoL blocking in networks-on-chip has been presented.
◦ ICARO isolates harmful traffic from non-harmful traffic by using VNs, achieving an overall latency improvement of up to 82%.

Future work:
◦ Analyze a hierarchical CNN to improve scalability.
◦ Implement in-order delivery support.
Questions?