Control Update - Applied Research Lab


Substrate Control: Overview
Fred Kuhns
[email protected]
Applied Research Laboratory
Washington University in St. Louis
Overview
• Last time
– control software architecture
– component responsibilities
– basic abstractions: meta-interface and tunnels; TCAM,
slice-oriented view of lookup filters and example IPv4
LPM entries
• This time
– SW architecture update
– assignments and current efforts
– assigning bandwidth and queue weights
– allocating code options and NPE resources
Fred Kuhns - 7/7/2015
System Block Diagram
[Figure: block diagram of the SPP node. Labels recoverable from the slide:]
• Abbreviations: Substrate Control Daemon (SCD), Boot and Configuration Control (BCC)
• Line Card (LC): RTM 10 x 1GbE, NPU-A, NPU-B, TCAM, SCD on xscale, NAT & Tunnel filters (in/out), flow stats (netflow), ARP Table, FIB
• NPE: NPU-A, NPU-B, TCAM, SCD on xscale
• GPE: NMP, RMP, vnet, user slivers, pl_netflow
• Hub: Fabric Ethernet Switch (10Gbps, data path), Base Ethernet Switch (1Gbps, control)
• Control Processor (CP): System Node Manager (SNM), System Resource Manager (SRM), BCC (tftp, dhcpd), routed*, sshd*, httpd*, Resource DB, Slivers DB, route DB, user info, flow stats, boot files, nodeconf.xml
• Shelf manager, I2C (IPMI); Power Control Unit (has own IP Address); Standalone GPEs
• External Interfaces: PLC, bootcd, cacert.pem, boot_server, plnote.txt, sppnode.txt
• Other labels: SPI interfaces, PCI, GE, RTM
• Annotations: "ReBoot how??"; "move pl_netflow to cp?"; "manage LC Tables"; "All flow monitoring done at Line Card"
The SPP Node
• Slice instantiation:
– Allocate VM instance on a GPE
– may request code option instance, NPE resources and interface bandwidth
• Share a common set of (global) IP addresses
– UDP/TCP port space shared across GPE/NPEs
• Line card TCAM Filters direct traffic
– send unregistered traffic originating outside the node to CP.
– unregistered traffic originating within node uses NAT (on line card)
– application may register server ports. Causes filter to be inserted in the line card directing traffic to specific GPE
– application must register ports (or tunnels) associated with fast path instances
• It is assumed that fast path instances will use tunnels (overlays) to send traffic between routing nodes.
– Currently we only support UDP tunnels but will extend to include GRE and possibly others.
[Figure: SPP node data and control paths. Labels: GPE (vmx, app, NMP, RMP, planetlab OS); NPE (FPx, TCAM, SRAM, mi-mux, SCD, code option); LC (Ingress and Egress lookup table, SCD (ARP, NAT)); CP (SNM, SRM, IP route and ARP); fabric; Internet; "local delivery and exceptions (UDP Tunnel)".]
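The filter rules above can be read as a small decision table. The following is a minimal sketch of that reading, not the line card's actual filter format: the class, its method names, the "forward" action for registered sources and the example address are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LineCardFilters:
    """Toy model of line card filter decisions (all names hypothetical)."""
    # (ipaddr, proto, port) -> board that registered the port, e.g. "GPE1"
    registered: dict = field(default_factory=dict)

    def register_port(self, ipaddr, proto, port, board):
        # Registering a server port inserts a filter for that board.
        self.registered[(ipaddr, proto, port)] = board

    def ingress_target(self, dst_ip, proto, dst_port):
        # Traffic from outside the node: registered ports go to their
        # board, unregistered traffic goes to the CP.
        return self.registered.get((dst_ip, proto, dst_port), "CP")

    def egress_action(self, src_ip, proto, src_port):
        # Traffic from inside the node: unregistered sources use NAT.
        # "forward" for registered sources is an assumption of this sketch.
        if (src_ip, proto, src_port) in self.registered:
            return "forward"
        return "NAT"


filters = LineCardFilters()
filters.register_port("192.0.2.1", "udp", 5000, "GPE1")
```

A dictionary keyed on (address, protocol, port) stands in for the TCAM here; a real TCAM entry would carry a key, mask and result.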
Key Software Control Components
[Figure: key software control components and their interactions. Labels recoverable from the slide:]
• Primary Hub (Logical Slot 1, Channel 1): Fabric SW, Base SW, vlan Tables, snmpd, SFP, XFP
• CP: SRM, SNM, Resource DB
• LC: SCD, MUX, TCAM
• NPE: SCD, SRAM, TCAM, FPk, FPkx
• GPE: NMP, RMP, vmx, control, root context, planetlab OS, vnet
• Annotations:
– Allocate VLANs, enable ports, stats
– node components not in hub (switch, external GPEs, Development Hosts)
– Assign slices to GPE. Boot management and PLC proxy.
– Resource allocations and slice bindings
– Filter management, BW allocations and stats
– Slice requests to allocate or free resources.
– Slice owned resource management
– Exception and Local delivery traffic. Includes shim header with RxMI.
Software Control Components
• Utilities: parts of BCC to generate config and distribution files
– Node configuration and management: generate config files, dhcp, tftp, ramdisk
– Boot CD and distribution file management (images, RPM and tar files) for GPEs and CP.
• Control processor:
– Boot and Configuration Control (BCC): GPEs retrieve boot images and scripts from BCC (Todo, fredk)
– System Resource Manager (SRM): Runtime central resource manager (Working, fredk)
– System Node Manager (SNM): Interface with PlanetLab Central (PLC) (Working, fredk)
– http daemon providing a node specific interface to netflow data (planetflow)
   • netflow runtime database and data management
– User authentication and ssh forwarding daemon
– Routing protocol daemon (BGP/OSPF/RIP) for maintaining FIB in Line Card
• General Purpose Element (GPE)
– Local Boot Manager (LBM): Modified BootManager running on the GPEs
– Resource Manager Proxy (RMP)
– Node Manager Proxy (NMP), the required changes to existing Node Manager software.
• Network Processor Element (NPE)
– Substrate Control Daemon (SCD)
   • TCAM Library
   • kernel module to read/write memory locations (wumod)
   • Command interpreter for configuring NPU memory (wucmd)
   • Modified Radisys and Intel source; ramdisk; Linux kernel
• Line Card
– SCD: LC version of the SCD
– ARP: protocol and error notifications. Lookup table entries either NH IP or enet addr
   • Sliver packets which cannot be mapped to an Ethernet address must receive error notifications.
– netflow-like stat collection and reporting to CP for web and PLC downloading
– NAT lookup entries for unregistered traffic originating from GPE or CP
[Status column, in slide order, with item alignment lost in extraction: Working, fredk; Done, fredk; Pending, ???; Pending, ???; Pending, ???; Starting, mike; Working, fredk; Starting, mart; Testing, jas; Done, fredk; Done, fredk; Todo, fredk; Todo, mart; Pending, ???; Pending, ???]
Slice-Centric View
• Allocate and free fast path: code option instance, NPE resources and interface BW
• Manage interfaces
– Get interface attributes: {{ifn, type, ipaddr, linkBW, availBW}, ...}
– If peering then get peer’s IP address
– Allocate aggregate interface bandwidth
• Allocate external port number(s)
• Define meta-interfaces
– Substrate adds line card filter(s)
– Slice may specify minimum BW
• Manage TCAM filters
– add, remove, update, get, lookup
– Substrate remaps slice ids (qid, fid, mi, stats) to global identifier
– Substrate has to map meta-interface numbers used in TCAM filters to the corresponding local addresses
• Associate queues with meta-interfaces
• Manage queue parameters, get queue length
– threshold, bandwidth (weight)
• One-time or periodic statistics
– Periodic uses polled or callback model
• Read and write SRAM
– Substrate verifies address and length
– Extended to also support DRAM memory
[Figure: slice-centric view. Labels: Slicex control application on the GPE; meta-interfaces MI1 := {myIP, Port1} ... MIn := {myIP, Portn}; fast path for Slicex with queues q0 ..., qj, qk, ql, qGPE served by wrr schedulers; MI1 (tunnel) ... MIn (tunnel) with BW1,min ... BWn,min; TCAM (Filters); SRAM block, DRAM block, stats block; substrate slice state: max Buffers, VLAN, max weights, qparams (qlen, threshold, weight).]
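The "substrate verifies address and length" step for slice memory access can be sketched as a bounds check against the slice's allocated block. This is an illustrative sketch only: the SliceMemory class, the -1/0 return codes and the use of a bytearray in place of NPE SRAM are assumptions; only the mem_write and mem_read names come from the slides.

```python
class SliceMemory:
    """Hypothetical per-slice memory block with bounds-checked access."""

    def __init__(self, size):
        # Stand-in for the SRAM (or DRAM) block allocated to one slice.
        self.block = bytearray(size)

    def _valid(self, offset, length):
        # The whole access must fall inside the slice's block.
        return offset >= 0 and length > 0 and offset + length <= len(self.block)

    def mem_write(self, offset, data):
        # Returns 0 on success, -1 if the range is rejected (assumed codes).
        if not self._valid(offset, len(data)):
            return -1
        self.block[offset:offset + len(data)] = data
        return 0

    def mem_read(self, offset, length):
        # Returns the bytes read, or None if the range is rejected.
        if not self._valid(offset, length):
            return None
        return bytes(self.block[offset:offset + length])


mem = SliceMemory(64)
```

Offsets are relative to the slice's block, so a slice never needs to know the absolute starting address.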
RMP Interface
Fast path:
retcode alloc_fast_path(copt, atype, attrs)
retcode free_fast_path()
Interfaces:
{entry,...} get_interfaces()
entry = {ifn, type, ipaddr, linkBW, availBW}
entry get_ifattrs(ifn)
ipaddr get_ifpeer(ifn)
retcode alloc_ifbw(ifn, bw)
Ports and meta-interfaces:
port alloc_port(ipaddr, port, proto)
mi add_endpoint(ep_type, params, BW)
mi add_udp_endpoint(ipaddr, port, BW)
{ep_type, params} get_endpoint(mi)
Queues:
retcode bind_queue(mi, list_type, qid_list)
retcode set_queue_params(qid, thresh, weight)
{threshold, weight} get_queue_params(qid)
qlen get_queue_len(qid)
Filters:
retcode write_fltr(fid, key[N], mask[N], result[M])
retcode update_result(fid, result[M])
fltr get_fltr(fid), fltr_t get_fltr(key[N])
result[M] lookup_fltr(key)
retcode rem_fltr(fid), retcode rem_fltr(key[N])
Statistics:
{uint32, tstamp} read_stats(index, location, what)
handle create_periodic(id,P,cnt,type,loc,what)
retcode delete_periodic(handle)
retcode set_callback(handle, xport)
stats_t get_periodic(handle)
Memory:
ret_t mem_write(offset[, len], data)
data mem_read(offset, len)
Short Term Milestones
• 21/09/07
• 28/09/07 (1 week delta)
– SCD: alloc_fp/free_fp using tcamLib; retest asynchronous copt_freed(). Includes configuration file with target search machines and default entries for both NPE and LC. mart, jonathon.
– SCD: simple client driven tests of tcam operations (add, remove, update, lookup). mart, jonathon.
– SCD-SRM: fp_freed(xsid). Asynchronous free when slice queues are non-empty. fred, mart.
– RMP-SRM-SCD: alloc_fp(...) and free_fp(). mike, mart, fred.
• 12/10/07 (1 week delta)
– SRM-SCD: rem_fp(xsid); does not include tcamLib. fred, mart.
– tcamLib: API tests; config file with search machine and multiple DBs; reasonably complex DB (say, using jdd’s configurations for SIGCOMM). jonathon.
– rudiments of SNMP interface to SRM. fred.
• 05/10/07 (1 week delta)
– SRM-SCD: alloc_fp(); does not include tcamLib. fred, mart.
– RMP-SRM: noop(). fred, mike.
– RMP: send commands from slice to RMP using UNIX domain sockets. Map slice to its planetlab id (PlabID). fred, mike.
– Configure HUB using snmp from srm: initialization, hardware discovery, add/remove VLAN. fred.
• 19/10/07 (1 week delta)
– IDT kernel module, locking. jonathon.
– SRM: Interface and bandwidth management. Verify interface management with simple client: get_interfaces(), get_ifattrs(), get_ifpeer(), alloc_ifbw(). fred.
– RMP-SCD: tcam operations: write_filter(), update_result(), get_filter(), lookup_fltr(), rem_fltr(). Must add code to map MI in filter to internal representation and prepend the VLAN tag. mike, jonathon, mart.
Example Outlining Slice Interface and Abstractions
Slice Interface and Queue Allocations:
{Port, BW, QList}; QList = {{qid, weight, threshold},...}
Physical Port (Interface)
Attributes: {ifn, type, ipaddr, linkBW, availBW}
ifn : Interface number
type: {Internet, Peering}
Operations:
get_interfaces()
get_ifattrs(ifn)
get_ifpeer(ifn)
alloc_ifbw(ifn,bw)
Bandwidth on a physical port is divided among fast paths: BW11 + BW21 = BW1
[Figure: NPE with FP slice1 (queues q10, q11, ... q1n’, qid in 0...n-1) and FP slice2 (queues q20, q21, ... q2m’, qid in 0...m-1), each served by a wrr scheduler; the LC carries FP1 (BW11), FP2 (BW21) and GPE traffic to a physical port with attributes ipAddr, linkBW and allocated bandwidth BW1.]
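The interface operations listed on this slide suggest simple per-interface bandwidth accounting. The sketch below is one possible reading, under the assumption that alloc_ifbw succeeds only while the request fits in availBW; the classes, the boolean return value and the example numbers are illustrative, not the SRM's actual data structures.

```python
class Interface:
    """Attributes from the slide: {ifn, type, ipaddr, linkBW, availBW}."""

    def __init__(self, ifn, if_type, ipaddr, link_bw):
        self.ifn = ifn
        self.type = if_type      # "Internet" or "Peering"
        self.ipaddr = ipaddr
        self.linkBW = link_bw
        self.availBW = link_bw   # nothing allocated yet


class InterfaceTable:
    """Hypothetical table backing get_interfaces/get_ifattrs/alloc_ifbw."""

    def __init__(self, interfaces):
        self.by_ifn = {i.ifn: i for i in interfaces}

    def get_interfaces(self):
        return list(self.by_ifn.values())

    def get_ifattrs(self, ifn):
        return self.by_ifn.get(ifn)

    def alloc_ifbw(self, ifn, bw):
        # Grant the request only if it fits in the remaining bandwidth.
        iface = self.by_ifn.get(ifn)
        if iface is None or bw <= 0 or bw > iface.availBW:
            return False
        iface.availBW -= bw
        return True


# Illustrative units: Mb/s on a 1 Gb/s interface.
table = InterfaceTable([Interface(1, "Internet", "192.0.2.1", 1000)])
```

With this accounting, the sum of all granted allocations on an interface can never exceed linkBW, which matches the BW11 + BW21 = BW1 relation on the slide when the port is fully allocated.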
• QM throughput estimates, up to 20 schedulers
– 5 schedulers per microengine. 4 uengines
– 2.5 Gbps per microengine
– 1 Gbps per scheduler
• Add scd commands
– initialize static code option/substrate memory/tables
   • parse block
   • header format
   • queue manager
   • ???
– load microengine code
• Use second VLAN tag to represent meta-interface
Single Interface Example
• LC Ingress
– One queue per slice with reserved bandwidth (really one per scheduler)
– One queue for best effort traffic to each GPE
– One scheduler for CP with queues for reserved traffic plus BE
• NPE
– Slice binds queues to meta-interfaces, hence physical interfaces
– Slice either reserves BW on a physical interface or it is assigned the minimum
– Substrate assigns a per interface maximum weight for each slice
• Substrate sets scheduler rates according to aggregate allocations
– Manage scheduler rates to control aggregate traffic to interfaces and boards.
• LC Egress
– At least one scheduler for each physical interface
– One queue for each active slice with MI defined for the associated scheduler
– One best effort queue for each board (GPE, CP, NPE?)
• Total weight for all slices (i) and queues (j) ≤ max weight for interface I1 (W_k):
$\sum_{i} \sum_{j} w_{i,j,I_1} \le W_{I_1}$
• Slice weight on interface I1, where $BW_{i,I_1}$ is the slice’s minimum allocated BW and the minimum weight is 1 MTU sized packet:
$w_{i,I_1} = \mathrm{MTU} \cdot \dfrac{BW_{i,I_1}}{BW_{I_1,\min}}$
[Figure: single interface example. LC Ingress: filters keyed on dst addr, proto, port/icmp feed SchedNPE1 (queues qxs1, qxs2, ... qxsn), SchedGPE1 (queues qps1, qps2, ... qpsn, qBE) and SchedCP, with rates BWNPE1,GPE1 and a VLAN for local delivery and exception traffic. NPE: FP slice1 (qid in 0...n-1) ... FP slicek (qid in 0...m-1) with queues q11 ... q1n and weights w11 ... w1n, qp1 ... qpm with weights wp1 ... wpm, served by wrr schedulers at BWNPE1,I1. LC Egress: filters keyed on src addr, proto, port/icmp; SchedI1 for interface 1 with queues qs1, qs2, ... qsn, qGPE, qCP at rate BWI1. Additional annotation: "scheduler rate".]
Two Interface Example; Setting Queue Weights
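Slice i, slice qid j and scheduler k.
• Total slice weight on scheduler k is bounded by the scheduler's maximum weight:
$\sum_{i} w_{i,k} \le W_k$
• Weights are proportional to bandwidth:
$\dfrac{w_{i,k}}{W_k} = \dfrac{BW_{i,k}}{BW_k}$, so $w_{i,k} = W_k \cdot \dfrac{BW_{i,k}}{BW_k}$
• The minimum weight of one MTU corresponds to the minimum bandwidth:
$\dfrac{\mathrm{MTU}}{W_k} = \dfrac{BW_{k,\min}}{BW_k}$, so $BW_{k,\min} = \mathrm{MTU} \cdot \dfrac{BW_k}{W_k}$
• Combining the two:
$w_{i,k} = \mathrm{MTU} \cdot \dfrac{BW_{i,k}}{BW_{k,\min}}$
• Per queue:
$w_{i,j,k} = \mathrm{MTU} \cdot \dfrac{BW_{i,j,k}}{BW_{k,\min}}$
[Figure: two interface example. NPE: FP slice1 (qid in 0...n-1) with queues q10, q11, ... q1n to interface 1 and q20, q21, ... q2m’ to interface 2; FP slice2 (qid in 0...m-1) with queues q10’, q11’, ... q1n’ to interface 1 and q20’, q21’, ... q2m’ to interface 2; wrr schedulers. LC: FP1, FP2 and GPE traffic per interface, with BW11 and BW12; interface 1 has BW1, IP1, linkBW; interface 2 has BW2, IP2, linkBW.]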
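The weight relations on this slide can be checked numerically. The sketch below assumes an MTU of 1500 bytes, weights expressed in bytes, and illustrative values for the scheduler rate BW_k and maximum weight W_k; none of these numbers come from the slides.

```python
MTU = 1500  # bytes; assumed value, the slides do not fix one


def min_bandwidth(bw_k, w_k, mtu=MTU):
    """BW_{k,min} = MTU * BW_k / W_k: the rate bought by a weight of one MTU."""
    return mtu * bw_k / w_k


def slice_weight(bw_ik, bw_k, w_k):
    """w_{i,k} = W_k * BW_{i,k} / BW_k."""
    return w_k * bw_ik / bw_k


def weights_fit(weights, w_k):
    """Check the constraint sum_i w_{i,k} <= W_k."""
    return sum(weights) <= w_k


BW_K = 1_000_000_000   # scheduler rate in bits/second (illustrative)
W_K = 150_000          # maximum total weight for scheduler k (illustrative)

bw_min = min_bandwidth(BW_K, W_K)               # 10 Mb/s per MTU of weight
w_a = slice_weight(100_000_000, BW_K, W_K)      # slice with 100 Mb/s
w_b = slice_weight(400_000_000, BW_K, W_K)      # slice with 400 Mb/s
```

Computing w_a as MTU * BW_{i,k} / BW_{k,min} gives the same value, which is the equivalence the slide derives.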
Allocating Code Option Instance
Slice to RMP:
npeIP alloc_fast_path(copt, atype, attrs)
uint16_t copt : NPE code option {IPv4 = 0, I3 = 1}
uint16_t atype : Reservation type {Shared = 0, Firm = 1}
uint32_t attrs[] : Array of resource allocation parameters:
attrib_t {uint32_t bw, pps; // bits/second, packets/second
uint32_t fltrs, queues, buffers, stats; // totals
uint32_t sram, dram; // memory block size in Bytes
}
RMP to SRM:
{xsid, npeIP} alloc_fast_path(PlabID, copt, atype, attrs)
uint32_t PlabID : GPE/PlanetLab slice identifier. The SRM allocates an internal
slice identifier (xsid) unique within the SPP node. All substrate operations use the xsid.
SRM to SCD:
set_fast_path(xsid, copt, VLAN, TParams, Mem)
uint16_t xsid : internal slice id
uint16_t VLAN
uint32_t TParams[] = {#Qs, #Fltrs, #Buffers, #Stats}
mem_t Mem[] = {SRAM:{Offset, Size}, DRAM:{Offset, Size}}
Allocating NPE (Creating Meta-Router)
[Figure: message flow for allocating a fast path, with numbered steps 1 to 6. Annotations recoverable from the slide:]
• Slice PlabID requests code option copt with resources {params}: allocate code option {copt, #fltrs, #Qs, #stats, #buffs, SRAM, DRAM}
• If sufficient resources available then assign internal slice identifier (xsid) and associate with allocation {Slice, VLAN, NPE:{copt, #fltrs, #Qs, #stats, #buffs, SRAM, DRAM}, EP {}, MI{}, GPE {IP, control Port}}
• Allocate and Enable VLAN to isolate internal slice traffic, VLANk
• Send message to SCD informing it of the new allocation {xsid, VLAN, {params}}
• Returns status and assigned global Port number
• Cache assigned xsid. Open local socket for exception and local delivery traffic; return to client vserver
[Components and tables shown:]
• GPE: client vserver, root context, RMP, NMP, planetlab OS, controlIP, meta-ifaces (mi:endpoint), endpoint references
• NPE: Control Interface, SCD, FP (fast path), TCAM, SRAM, lkup tbl, Per Slice Tables
• CP: SRM with Resource DB and sliver tbl; SNM; PLC; user login info
• SRM maps: usedMaps, xsidMap, availMap, BWmaps, servMap, resvMap; VLAN maps range:{start,end}, free {...}; endpoint (port) maps
• NPE Table: id:{addr,BW/Port,copts,fltrs,sram,Qs}
• Interfaces: ifn:{type,ipaddr,linkBW,availBW}
• Slices: vlan, xsid, plabID (plab sliceID), gpe (board ID), NPE (allocated): board id, sram {start,size}, #flts, #Qs, BW, #Stats
• Substrate: LC (mux, MI1); 1GbE (base, control); 10GbE (fabric, data); Host (located within node)
SRM: Allocating NPE Resources
• Actions required to allocate code option instance
and resources:
– Select NPE
• Load balance across available NPEs: of eligible NPEs, select the
one with the greatest “head room”.
– Eligible if sufficient resources (SRAM, TCAM space, queues, etc)
– Select NPE with greatest firm BW and PPS. If tie then select
greatest available soft resources, else pick lowest numbered.
– Either allocates requested resources or returns error
• Keeps memory map of SRAM (and DRAM) so can perform
allocation, though the absolute starting address is not required.
• If compaction is necessary then must communicate with SCD
directly.
– Allocate VLAN and configure switch.
– Send command to selected NPE’s SCD
• set_fast_path(xsid,copt,VLAN,Params)
• SCD updates tables and creates local mappings.
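The NPE selection rule above can be sketched as a filter followed by an ordered comparison. The Npe record, the request dictionary and the choice to compare firm BW before PPS are assumptions of this sketch; the slides only say "greatest firm BW and PPS", then soft resources, then lowest number.

```python
from dataclasses import dataclass


@dataclass
class Npe:
    """Hypothetical view of the available resources on one NPE."""
    npe_id: int
    firm_bw: int    # unreserved firm bandwidth
    firm_pps: int   # unreserved firm packets/second
    soft: int       # available soft (shared) resources
    sram: int
    queues: int
    fltrs: int


def select_npe(npes, req):
    """Return the eligible NPE with the most head room, or None."""
    # Eligible: enough of every requested resource.
    eligible = [
        n for n in npes
        if n.firm_bw >= req["bw"] and n.firm_pps >= req["pps"]
        and n.sram >= req["sram"] and n.queues >= req["queues"]
        and n.fltrs >= req["fltrs"]
    ]
    if not eligible:
        return None  # caller reports an allocation error
    # Greatest firm BW, then PPS, then soft resources; ties go to the
    # lowest numbered NPE.
    return min(eligible,
               key=lambda n: (-n.firm_bw, -n.firm_pps, -n.soft, n.npe_id))


npes = [
    Npe(1, firm_bw=500, firm_pps=100, soft=10, sram=4096, queues=64, fltrs=32),
    Npe(2, firm_bw=800, firm_pps=100, soft=10, sram=4096, queues=64, fltrs=32),
    Npe(3, firm_bw=800, firm_pps=100, soft=10, sram=4096, queues=64, fltrs=32),
    Npe(4, firm_bw=900, firm_pps=100, soft=10, sram=512, queues=64, fltrs=32),
]
request = {"bw": 100, "pps": 10, "sram": 1024, "queues": 8, "fltrs": 4}
choice = select_npe(npes, request)
```

In this example NPE 4 has the most bandwidth but too little SRAM, so it is not eligible; NPEs 2 and 3 tie and the lower number wins.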
Freeing Code Option Instance (Fast path)
• Slice to RMP and RMP to SRM:
– void free_fast_path(): requires asynchronous processing by SRM and SCD.
• SRM to SCD:
– {Success/Pending/Failed} rem_fast_path(xsid)
• SRM first sends a request to the SCD directing it to free all resources assigned to slice xsid.
– The SCD first disables the fast path and GPE (how?) so no new packets will be processed.
– Then it checks all queues assigned to xsid. If all are empty then resources are freed and a Success response is sent to the SRM.
– Else, if packets are in any of the queues, the SCD sends a Pending response to the SRM and periodically checks all queues assigned to xsid. When they are all empty the SCD sends an asynchronous successful-deallocation message (which includes the slice’s xsid) to the SRM, notifying it that all resources associated with xsid are now free.
• If the SCD returns Success then the SRM marks the resources as available and removes the slice from its internal xsid (fast path) tables.
• If the SCD returns Pending then the SRM registers a callback method which is called when the SCD sends the resource-freed message.
• Regardless of whether the resources are freed immediately or asynchronously, the SRM returns Success to the RMP.
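The Success/Pending exchange described above can be sketched as two small objects. This is a simplified model under stated assumptions: queue lengths are held in a plain dictionary, the SCD's periodic check is an explicit poll() call, the Failed case is omitted, and the class and attribute names are hypothetical. Only rem_fast_path, free_fast_path and fp_freed are names that appear in the slides.

```python
class Scd:
    """Toy SCD: tracks queue lengths per xsid (hypothetical structure)."""

    def __init__(self, queue_lengths):
        self.queue_lengths = queue_lengths  # xsid -> list of queue lengths
        self.disabled = set()

    def rem_fast_path(self, xsid):
        # Disable the fast path first so no new packets are processed.
        self.disabled.add(xsid)
        return self._try_free(xsid)

    def _try_free(self, xsid):
        if any(self.queue_lengths[xsid]):
            return "Pending"          # at least one queue still holds packets
        del self.queue_lengths[xsid]  # all queues empty: free the resources
        return "Success"

    def poll(self, xsid, notify):
        # Stand-in for the periodic check; notifies the SRM once drained.
        if xsid in self.queue_lengths and self._try_free(xsid) == "Success":
            notify(xsid)


class Srm:
    """Toy SRM: owns the allocation tables and the pending set."""

    def __init__(self, scd, allocated):
        self.scd = scd
        self.allocated = set(allocated)
        self.pending = set()

    def free_fast_path(self, xsid):
        status = self.scd.rem_fast_path(xsid)
        if status == "Success":
            self.allocated.discard(xsid)
        elif status == "Pending":
            self.pending.add(xsid)    # wait for the resource-freed message
        return "Success"              # the RMP always sees Success

    def fp_freed(self, xsid):
        # Callback run when the SCD reports the asynchronous free.
        self.pending.discard(xsid)
        self.allocated.discard(xsid)


scd = Scd({7: [0, 3], 9: [0, 0]})
srm = Srm(scd, allocated=[7, 9])
```

Freeing xsid 9 completes immediately, while xsid 7 stays in the pending set until a later poll finds its queues empty and invokes fp_freed.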
Comments
• Pace SCD message processing
• Drop threshold using packets or packets and length
• Limit BW over allocations
• Use fast path, not slice
• GPE traffic to NPE turned off when freeing fast path
• How long to wait for Q’s to drain
• Turn off FP using vlan table