Endsystem Support for Network Virtualization

Diversifying the Network Edge
Fred Kuhns
Washington
WASHINGTON UNIVERSITY IN ST LOUIS
Host and LAN Support for Network Diversification
• Motivation:
– solution to network ossification: difficult to field new protocols
or technologies which address limitations in current data
networks
– create common substrate layer over which new networking
protocols, services and technologies may be deployed
– common substrate layer provides virtualized links, routers and
end systems.
– key issue is how to realize virtualization with isolation at the
network edge (LAN and end system)
Fred Kuhns - 7/16/2015
Introduction - Diversified networking at the edge
• Isolating vNet traffic in the LAN
– Define substrate packet format and protocol
– Reserve portion of LAN bandwidth for vNet traffic
– Determining topology and available bandwidth
– Establish substrate links between end systems and substrate routers
– Establishing virtual links between virtual end systems and associated virtual routers
– Mechanisms for realizing BW reservation
• Isolating vNet traffic in the end system
– Supporting the common substrate layer: managing the network resource
    • network interface access control
    • bandwidth allocation and enforcement
    • delivering to neighbor
    • management and control interface (accounting, configuration)
– OS extensions to support new networking protocol instantiation and isolation
    • specifying and enforcing isolation
    • maintain kernel integrity
    • required safety and liveness properties of protocols
    • mechanisms to guard against ill-behaved vNet protocol instances: due to unsafe behavior (bugs, malicious) or excessive resource use
    • mechanisms for protocol developers to use for enforcing safety/security
    • optimizing performance
    • software development environment for protocols (TBD)
– User versus kernel space protocol implementation and necessary kernel support
Context: Network Diversification (vNets)
[Figure: substrate links and substrate routers; virtual routers, virtual links and virtual end-systems]
Concepts
• Intranet versus Internet:
– intranet (no routing) use existing model and protocols
– internet (routing) use diversified networking model
• Diversified Networking Model:
– multiple networks coexisting within common infrastructure (virtual networks or vNets)
– each distinct network instance operates as though it has dedicated resources (non-interfering)
– vNet specific routers (virtual routers) interconnected through simplex, point-to-point links (virtual links)
– common substrate layer used for delivering vNet packets to neighbor (provides a simple wire-like service)
• Current model:
– Dominant networking protocol: IP
– Shared, heterogeneous physical networks (ATM, Ethernet, Frame Relay, wireless, packet over SONET, etc.)
– Links interconnecting packet switches
– Interconnection links may be tunneled (Link Virtualization) through intermediate devices: ATM, Packet over SONET (or PPP-over-X), MPLS.
• Challenges at the network edge:
– partition LAN into virtual links and access routers
– end-system support for virtual networks
– isolation mechanisms for virtualized resources
– bind virtualized resources to network instances
Terminology
• Network Diversification:
– Virtual Network (vNet): distinct vNets coexist within a common physical network
– Diversification layer: common substrate layer, provides isolation and point-to-point link services
– vNet is composed of one or more virtual routers (VR) interconnected by virtual links. Virtual routers
and links are direct analogues of their physical counterparts … Network resources are virtualized.
– An end-system implements vNet protocols and provides connectivity services within a virtualized
network protocol environment (virtual end-system). The virtual end-system provides mechanisms for
protocol implementation, resource control and isolation.
• Diversification layer provides two levels of abstraction (i.e. two core services):
– Substrate: encapsulates existing layer 1 and layer 2 technologies and provides a single, consistent framework for implementing virtualized links and routers.
    • substrate link: abstraction that behaves like a point-to-point connection between communicating end points. Provides isolation services to different virtual networks using a common substrate link.
    • substrate router: a physical device which forwards network traffic based on its vNet membership. Provides sharing and isolation services to disparate vNets and hosts virtual routers.
– Virtual: framework providing a simple model and set of interfaces for implementing virtual networks. The model defines virtual routers, end-systems and links. The goal is for virtual links and routers to behave like their physical counterparts.
    • virtual link: simulates the behavior of a dedicated point-to-point link interconnecting virtual end points (virtual routers and/or virtual end systems). A virtual link is implemented by one or more substrate links.
    • virtual router: implements a particular vNet’s routing logic. The underlying substrate router provides the necessary isolation and resource management functions.
Related Work
• Current virtualization efforts on the end system are driven by the desire to
support many concurrently running, non-interfering, secure server
applications
– The goal is to completely isolate applications running on a common hardware
platform. It appears to each application as though it is running on a dedicated
platform (hardware and operating system).
– The framework enforces resource constraints and access controls
– In this model the isolation is complete and transparent
– each operational environment appears as a complete end system with independent
operating system instances.
– however this is too coarse-grained for our purposes, where we want to support
multiple networks per OS instance.
– Examples: VMware, Xen, Denali
• Network Protocol extension or composition:
– x-kernel, SPIN
• Protocol development environment and patterns
– ???
• Extensible Operating Systems (loading extensions into kernel), see next slide
Related Work: Extensible Operating Systems
• Issues:
– safety, liveness, performance
• Techniques:
– Safe Execution Environment/Virtual machines: Java, KaffeOS, packet
filters
– Language based (type safety): OKE, mobile code (STP), SPIN
– Proofs: proof carrying code (PCC)
– Software Fault Isolation (SFI): VINO
– Hardware Fault Isolation (HFI): kernel plugins, Denali, Xen, Exokernel,
Palladium, Nooks. See VMM next page.
• we focus on two approaches:
– kernel extension to support simple interpreted environment (packet
filtering) with protocols implemented in user space
– sandbox for in-kernel protocol implementations using a type safe
language and run-time support. In the spirit of OKE and mobile code (with
concepts from OKE)
Modeling the LAN Environment
• The effort to provide a simple, common infrastructure layer for creating new or
specialized networks has parallels in operating system and middleware research.
Both attempt to offer two key services[1]:
– Resource management: time and space sharing (multiplex resources);
synchronization and deadlock handling (buffers, link access, link BW,
non-preempted transmission of packet); accounting and status
– User friendliness: convenient and consistent operational environment (see the many
RFCs); error detection and handling; protection and security; fault tolerance and
failure recovery.
[1] Singhal, Shivaratri, Advanced Concepts in Operating Systems, McGraw-Hill, 1994
• A core technique is to export an extended, virtualized machine providing the
illusion of dedicated resources (though the level of abstraction and degree of
virtualization differ between systems)
– extended machine: abstraction to deal with complexity
– virtual machine: controlled sharing
• Define an administrative entity to represent clients (of the service).
– For example, operating systems define processes to represent resource ownership
and protection domains. An IP network may define flows, or flow aggregates, to
represent an abstract client to which resources (buffers and bandwidth) are assigned.
LAN Virtualization
• Goal: enable unrelated entities, vNets, to transparently share a common set of
underlying resources. Similar to how processes transparently share the
underlying computer platform.
• Abstract Resources (to create the extended Net): links, routers, end-systems
• Virtualization (make the virtual resources behave as though they were real,
physical devices):
– End-system: network subsystem interface, protocol implementation, device
interface for point-to-point links
– LAN: links, switches and buffers
– WAN/MAN: LANs, packet switches (beyond scope of this presentation)
• We would like to virtualize LAN resources such that registered vNets and local
traffic are isolated.
• As an example we consider an Ethernet LAN: we can realize this with Ethernet
and IEEE standards 802.1p/Q (VLANs and priorities):
– star topology
– tree topology
– layered tree (with priorities)
• If a virtual link must pass through an existing IP router (the vNet router is not
directly attached to the same LAN) then tunnels may be used: IPIP, GRE,
MPLS etc.
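As a minimal sketch of the Ethernet mechanism named above, the following builds the 4-byte IEEE 802.1Q tag that carries both the VLAN ID and the 802.1p priority. The function names `vlan_tag` and `tagged_header` are illustrative and not part of the slides; only the tag layout (TPID 0x8100, then 3 bits of priority, 1 DEI bit, 12 bits of VLAN ID) comes from the standard.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def vlan_tag(vlan_id: int, priority: int, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag: TPID followed by the TCI field."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority (PCP) must fit in 3 bits")
    # TCI layout: PCP (3 bits) | DEI (1 bit) | VID (12 bits)
    tci = (priority << 13) | ((dei & 1) << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def tagged_header(dst: bytes, src: bytes, vlan_id: int,
                  priority: int, ethertype: int) -> bytes:
    """Ethernet header with the 802.1Q tag inserted after the source address."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses are 6 bytes")
    return dst + src + vlan_tag(vlan_id, priority) + struct.pack("!H", ethertype)
```

A substrate layer could use the VLAN ID to select the substrate link and the priority bits to keep vNet traffic ahead of local traffic in the switch queues; how the slides' substrate header follows this tag is left open in the source.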
Simulates Star Topology for Substrate Links
Internetworking over a diversified network
Substrate function with Ethernet:
• Substrate links: use VLANs to provide the equivalent
of a virtualized “wire” connecting an endsystem to a
specific substrate router.
• Sharing and Isolation:
- All vNet traffic uses assigned VLANs
- Use priority queuing (802.1p/Q)
- All intranet traffic uses lower priority queues.
• Resource management:
- LAN: Use admission control (static or dynamic) to
provide bandwidth guarantees to vNet traffic.
- End system: Substrate layer on the end-system enforces
per VLAN and per vNet bandwidth constraints
• Virtual links: In this simple example there is exactly
one virtual link for each substrate link.
[Figure: star of VLANs (VLANX1, VLANX2, …, VLANXN) across a switched LAN to vNetX VR1]
• Each host to substrate router connection is
assigned a distinct VLAN. So N hosts implies
N VLANs on Ethernet.
• Alternative is to define one VLAN tree for
each protocol suite (i.e. vnet).
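The static admission control mentioned under resource management can be sketched as follows. This is a toy model under stated assumptions: the class name `LinkAdmission`, the `vnet_share` parameter and the Mb/s units are all hypothetical, and the slides do not say how reservations are signaled.

```python
class LinkAdmission:
    """Static admission control for one LAN link (capacities in Mb/s)."""

    def __init__(self, capacity_mbps: float, vnet_share: float = 0.8):
        # Only a fraction of the link is reservable by vNets; the rest is
        # left for local (intranet) traffic. The fraction is an assumption.
        self.vnet_budget = capacity_mbps * vnet_share
        self.reserved = {}  # vNet name -> reserved Mb/s

    def available(self) -> float:
        return self.vnet_budget - sum(self.reserved.values())

    def admit(self, vnet: str, mbps: float) -> bool:
        """Reserve bandwidth for a vNet, or refuse if the budget is exceeded."""
        if mbps <= 0 or vnet in self.reserved or mbps > self.available():
            return False
        self.reserved[vnet] = mbps
        return True

    def release(self, vnet: str) -> None:
        self.reserved.pop(vnet, None)
```

For example, on a 100 Mb/s link with half reserved for vNets, a 30 Mb/s request succeeds, a second 30 Mb/s request is refused, and a 20 Mb/s request fills the remaining budget.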
Traffic isolation with priority aware substrate
[Figure: Ethernet hub with High and Low priority TX queues; vNet traffic goes to High, otherwise Low. Labels: local control/management and legacy internet traffic; vNet traffic (internet); all vNet traffic to vNetX VR1; local traffic (intranet).]
Substrate Link as a VLAN Tree
…
Internetworking over a diversified network
Substrate function with Ethernet:
• Substrate links: The VLAN creates a tree
interconnecting all end-systems to the substrate router.
Substrate end-point then uses the VLAN tag and
source/destination address to realize the logical
point-to-point substrate link.
• Sharing and Isolation:
- no change from substrate star topology. The only
difference is the shared VLAN domain. Scheme
provides traffic isolation.
• Resource management:
- Same
• Virtual links: Same.
[Figure: end systems joined to the substrate router by a single VLAN tree (VLANX) over an Ethernet switched LAN]
Multiple Substrate Links
…
Internetworking over a diversified network
Substrate function with Ethernet:
• Substrate links: Three VLAN trees are used for all
virtual net traffic to/from a substrate router:
- Low priority: default for best-effort traffic
- Medium priority for virtual nets with soft
performance requirements (average bandwidth)
- High priority for isochronous or low-delay,
interactive applications
• Sharing and Isolation: See above.
• Resource management: See above
• Virtual links: Same.
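One way to picture the three-tree scheme is a small lookup from a vNet's service class to a VLAN tree and an 802.1p priority. This is a sketch only: the tree names are taken from the figure labels, the pairing of VLANdgram with the low-priority class is an assumption, and the priority numbers are illustrative (the actual queue mapping depends on switch configuration).

```python
from enum import Enum


class ServiceClass(Enum):
    BEST_EFFORT = "low"    # default for best-effort traffic
    SOFT = "medium"        # soft requirements (average bandwidth)
    LOW_DELAY = "high"     # isochronous or low-delay interactive traffic


# (VLAN tree, 802.1p priority). Priority values are hypothetical.
SUBSTRATE_TREE = {
    ServiceClass.BEST_EFFORT: ("VLANdgram", 3),
    ServiceClass.SOFT: ("VLANmed", 4),
    ServiceClass.LOW_DELAY: ("VLANhigh", 5),
}

LOCAL_TRAFFIC_PRIORITY = 0  # intranet traffic stays below all vNet classes


def select_tree(service_class: ServiceClass):
    """Return the VLAN tree and priority used for a vNet's service class."""
    return SUBSTRATE_TREE[service_class]
```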
[Figure: three VLAN trees (VLANdgram, VLANmed, VLANhigh) over an Ethernet switched LAN]
Multiple vNets per Host
The full model:
• Substrate link: connects end-system to substrate router.
Virtualization of a physical cable or wire. A packet
enters one end, exits the other and is opaque within.
- Simplex or Duplex?
• Substrate interface: end-system abstraction
- Ethernet: <interface, VLAN, dst_addr>
- tunnel: MPLS, IP, IPsec, L2TPv3, GRE, AToM
- Layer 2: ATM, others?
• Virtual link: Logical interconnection (virtual wire) of
adjacent vNet nodes.
- Point-to-point, Simplex or Duplex?
• Virtual interface: end-system abstraction representing
one end of a virtual link. Substrate defines mechanism
for multiplexing onto common substrate link. For
example a virtual link identifier (VLI) in a substrate
header
- Simplex or Duplex?
[Figure: a host whose virtual interfaces are multiplexed onto substrate interfaces (ether addr/vlan) on VLAN1, VLAN2 and VLAN3 over an Ethernet LAN. The VLAN tag and destination address identify the substrate router; the VLI tag is used to route the packet to the virtual interface of a virtual router (VR1).]
[Figures: end systems with substrate interfaces (ether addr/vlan) and substrate links SL1, SL2, SL3 over an Ethernet LAN, connecting vNet1, vNet2 and vNet3 to virtual routers (VR) hosted on substrate routers SR1 to SR6; virtual interfaces are identified by VLIs.]
Multiple next hop VRs
Multiple Next Hop Virtual Routers:
• Substrate link: per end-system, substrate router pair.
• Substrate interface: three substrate interfaces:
    SI1 = <eth0, VLANXA1, enetAddrSR1>
    SI2 = <eth0, VLANXA2, enetAddrSR2>
    SI3 = <eth0, VLANXA3, enetAddrSR3>
• Virtual link: Logical point-to-point connection between virtual end-system and access virtual router. Since we model a point-to-point link there is no need for link addresses.
• Virtual interface: Representation of virtual link on the end-system. The substrate assigns a per substrate link, virtual link identifier (VLI) for each virtual link.
    VI1 = <SI1, VLI1>
    VI2 = <SI1, VLI2>
    VI3 = <SI2, VLI1>
    VI4 = <SI3, VLI1>
[Figure: Host A (enetAddrA), member of vNetX and vNetY, on an Ethernet switched LAN. VLANA1 leads to substrate router 1 (enetAddrSR1) hosting vNetX VR1 (VLI1) and vNetY VR1 (VLI2); VLANA2 leads to substrate router 2 (enetAddrSR2) hosting vNetX VR2 (VLI1); VLANA3 leads to substrate router 3 (enetAddrSR3) hosting vNetX VR3 (VLI1).]
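The substrate-interface and virtual-interface tuples of this slide map naturally onto two small record types. The sketch below mirrors the SI and VI values given on the slide; the class names, the `VNET_OF` table and the `demux` helper are hypothetical bookkeeping added for illustration, with vNet membership read off the figure labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubstrateInterface:
    device: str     # local network device, e.g. eth0
    vlan: str       # VLAN carrying the substrate link
    peer_addr: str  # Ethernet address of the substrate router


@dataclass(frozen=True)
class VirtualInterface:
    si: SubstrateInterface
    vli: int        # virtual link identifier, unique per substrate link


SI1 = SubstrateInterface("eth0", "VLANXA1", "enetAddrSR1")
SI2 = SubstrateInterface("eth0", "VLANXA2", "enetAddrSR2")
SI3 = SubstrateInterface("eth0", "VLANXA3", "enetAddrSR3")

VI1 = VirtualInterface(SI1, 1)
VI2 = VirtualInterface(SI1, 2)
VI3 = VirtualInterface(SI2, 1)
VI4 = VirtualInterface(SI3, 1)

# Which vNet each virtual interface belongs to (from the figure labels).
VNET_OF = {VI1: "vNetX", VI2: "vNetY", VI3: "vNetX", VI4: "vNetX"}


def demux(device: str, vlan: str, peer_addr: str, vli: int) -> Optional[str]:
    """Map the fields of a received frame to the owning vNet, if any."""
    key = VirtualInterface(SubstrateInterface(device, vlan, peer_addr), vli)
    return VNET_OF.get(key)
```

Because the VLI is scoped to a substrate link, VI1 and VI3 can both use VLI1 without ambiguity: the substrate interface half of the tuple tells them apart.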
TCP/IP as an Example Protocol
destination prefix   gateway (router address)   interface                    ll_info
192.168.12.0/24      0.0.0.0                    eth0 (substrate interface)   ARP
* (default)          192.168.12.254             vint0 (virtual interface)    (eth0,VLAN,ethDst), VLI
[Figure: vNet protocol = IP. The vNet framework exposes vint0 over VLANX on eth0, next to the standard Ethernet interface eth0 (direct connect Ethernet device).]
Substrate Interface:
Directly connected: destination IP address + ARP = enet addr
Gateway: (Gateway’s IP + ARP = enet addr) + VLAN
Virtual Interface:
Directly connected: Not used, model only for internetworking
Gateway: VLI assigned by substrate.
How is this integrated into the current ARP/route interface?
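A minimal sketch of how the two table entries might be consulted, assuming ordinary longest-prefix matching: destinations on the local prefix resolve to eth0 with ARP, and everything else goes to the gateway through the virtual interface vint0. The `ROUTES` list and `lookup` function are illustrative and do not answer the integration question posed on the slide.

```python
import ipaddress

# (destination prefix, gateway, interface, link-layer info)
ROUTES = [
    (ipaddress.ip_network("192.168.12.0/24"),
     ipaddress.ip_address("0.0.0.0"), "eth0", "ARP"),
    (ipaddress.ip_network("0.0.0.0/0"),
     ipaddress.ip_address("192.168.12.254"), "vint0",
     "(eth0, VLAN, ethDst) + VLI"),
]


def lookup(dst: str):
    """Return the matching route with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [route for route in ROUTES if addr in route[0]]
    return max(matches, key=lambda route: route[0].prefixlen)
```

With these entries, `lookup("192.168.12.7")` selects the directly connected route on eth0, while `lookup("10.1.2.3")` falls through to the default route on vint0.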
[Figure: frame on the Ethernet LAN carrying Ethernet destination address, VLAN, VLI and IP, delivered to substrate router SR1]
Using Tunnels for the substrate layer
• Need to look into the various tunneling approaches/protocols.
How can we leverage these?
– MPLS and MPLS VPNs
– Generic Routing Encapsulation (GRE): RFC 2784
– Point-to-point tunneling protocol (PPTP)
– Secure VPN
– Any transport over MPLS (AToM)
– IP tunnel
– IPsec VPNs
– Layer 2 Tunneling Protocol version 3 (L2TPv3)
    • version 3 is a draft standard
    • RFC 2661: Layer 2 tunneling protocol
– 802.1Q Tunneling: Cisco 802.1Q-in-Q VLAN Extension Services
• What about MPLS over IP tunnels: what was done there?
Supporting Diversified Networking on the End System
• vNet framework
– substrate layer design and implementation on end system. Policies.
– integration with existing networking subsystem and isolation mechanisms
– packet processing and forwarding rules for both substrate and diversified networking layer. Includes address resolution rules and techniques.
– how do we coordinate substrate and vNet link establishment? VLAN label assignments, substrate router address (IP? ethernet?), VLI assignments?
– establishing links and assigning identifier and integrating with existing network infrastructure/tables.
– controlling bandwidth allocations and link access
– Supporting the common substrate layer: managing the network resource
– what accounting functions are needed?
– What control interface is exported?
• OS extensions to support new networking protocol instantiation and isolation
– to what degree do we “protect” the kernel?
– specifying and enforcing isolation
    • performance: interface access and bandwidth; CPU; buffer (buffer hoarding)
    • kernel integrity: corrupt data structs, exceptions, unauthorized access, improper interface use, other safety issues
    • other vNet protocol instance integrity: vNet instance may be able to corrupt another module but not the kernel
    • do we attempt to monitor network traffic to ensure one vNet instance is not masquerading as another? Or other types of abuse?
– techniques to require/enforce safety and liveness properties of protocols – or to detect violations (prevent or recover)
    • buggy code? malicious code? the more protection the greater the performance hit.
    • type-safe compiler and run-time checks
    • hardware fault isolation
    • software fault isolation
    • cross our fingers
– optimizing performance
    • for user space protocols use safe execution environment for interpreting packet filters
    • for kernel space protocols use ???
• Software development environment for protocols
– utility libraries and wrappers
– patterns and OO models
– compositional techniques or even automation
Background: Traditional Commodity OS Environments
• Traditional general purpose operating system
– Process model (resource ownership and execution context): associate programs with resource
usage (allocation, scheduling, access control, synchronization) and accounting (historical data).
– Isolation and accounting falls on this process boundary (or possibly thread)
– the process model as implemented is not good at capturing resource usage resulting from hidden
scheduling (kernel performing work for a process asynchronously such as when network packets
are received)
– likewise, the trust model assumes the OS kernel is trustworthy, which may not be true for
dynamically extensible systems
– the virtualization and scheduling of the CPU and memory is well developed (out of necessity)
however managing I/O access and bandwidth is a more recent concern
• With the increasing importance of networking and multimedia, new techniques have been
developed to manage I/O access and bandwidth
– Network transmit bandwidth is typically managed with the use of packet classifiers (map packet
to flow or flow aggregate) and queuing disciplines. This allocation and accounting model differs
from the process centric model.
– Disk I/O scheduling shifted from simply optimizing overall throughput to ensuring time critical
operations completed on time.
– For the majority of desktop systems network bandwidth is not a limiting factor (1Gbps
interfaces are common on new systems). Rather memory and disks remain the critical
performance bottleneck.
– Much research and design has been directed at managing either per process or per flow (or flow
aggregate) I/O usage. Neither is the correct approach for this effort, where we want per vNet
resource management.
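To make the per-vNet idea concrete, the sketch below keys a token bucket on the vNet rather than on a process or flow. The token bucket is a standard rate-limiting technique chosen here for illustration; the slides do not name a specific mechanism, and the class name, parameters and explicit `now` timestamps are assumptions.

```python
class TokenBucket:
    """Token bucket limiting one vNet's transmit bandwidth."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.burst = burst_bytes     # bucket depth in bytes
        self.tokens = burst_bytes    # start with a full bucket
        self.last = 0.0              # time of the previous update (seconds)

    def allow(self, size: int, now: float) -> bool:
        """Return True if a packet of `size` bytes may be sent at time `now`."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


# One bucket per vNet: the classifier key is the vNet, not the process or flow.
buckets = {"vNetX": TokenBucket(rate_bps=8000, burst_bytes=1500)}


def may_send(vnet: str, size: int, now: float) -> bool:
    return buckets[vnet].allow(size, now)
```

With a rate of 8000 bit/s (1000 bytes per second) and a 1500-byte burst, a full-size packet is admitted at t = 0, refused at t = 1.0 when only 1000 bytes of credit have accrued, and admitted again at t = 1.5.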
OS Kernel Block Diagram
[Figure: OS kernel block diagram. Labels: user space (applications); file interface, socket interface and basic I/O interface; FS management (open files, buffer cache); task management and util tasks; TCP module (TCP1 … TCPn), UDP, raw IP, IP, routes, TC/qdisc; scheduler, process accounting, time management, clock handler and callout queue; AST (software interrupt) processing and interrupt processing; hardware independent layer and device independent I/O; hardware dependent layer with device drivers (core Ethernet, Ethernet device tx/rx queues, UART, timer), configuration (registers, MMU, bus and peripherals), OS ISR demux and system exception handlers; hardware (HW interrupt/exception).]
End-System Support for Network Diversification
• What needs to change?
• Process model (applications and programs need not change): No
– process model is sufficient for application isolation
• Trust model (is the network subsystem trusted?): Yes
– current trust model is not adequate: need to dynamically load/unload new protocols which may not be trusted. Even user space applications will require mechanisms in the kernel to ensure non-interference
• Resource Management for the Network Subsystem: Yes
– Network subsystem degree of isolation is no longer adequate. vNet protocols must be separately contained, isolated, identifiable, preemptable and cancelable.
– Network and processor usage accounting is not adequate. We need to keep track of per vNet resource usage and constraints. Asynchronous network events (aka hidden scheduling) must be properly accounted for and scheduled (per vNet basis).
• User friendliness (for the Network Subsystem - vNets): Yes
– Provide simple mechanisms for adding and removing vNet protocol instances.
– Convenient environment for implementing, testing and debugging new protocols.
– Support per vNet protection boundaries
– mechanisms for implementing different security policies both within a given vNet and between different vNets.
– Ensure system as a whole is not adversely impacted by faulty or poorly implemented protocols
Virtual End System
• Comments and assumptions
– assume that the creation/deletion of new vNets is infrequent
– an application may open connections on one or more different vNets
– unrelated applications must be able to engage in IPC using any available
mechanism (pipes, shared memory, TCP/IP etc)
– continue support for IP. In fact, IP can be considered to be the least
common denominator network instance. We could use the existing IP
network for control to establish and/or manage vNets.
– support both user and kernel space protocol instances
– provide isolation and resource guarantees on a per vNet basis
– poorly behaved protocol instances (for a given vNet) will be detected,
stopped and expelled from an end system. Applications using this
protocol stack will be informed via a socket error return value.
– intra-VN, implementers should have the mechanisms to support QoS and
Security – what are they?
– simple mechanism for adding new protocols/VNs
Block Diagram
[Figure: applications use either the TCP/IP protocol stack or the vNet framework (vNet1, vNet2, vNet3); below them sit the vNet mux/demux, the proto mux/demux and the network device.]
User or kernel Space protocols?
• Each has pros and cons
• User space protocols:
– easier to implement and debug
– easier to introduce new protocols (not tightly dependent on socket layer
knowing about the new protocol)
– easier to isolate and protect protocols and apps from each other (leverage
process model)
• Kernel space protocols:
– easier to integrate into existing framework (simplifies support for system
interface functions like select/poll)
– simplifies intra-protocol security and protection (since protocol runs within
trusted kernel)
– simplifies (well, more direct) kernel demultiplexing to correct protocol
context (endpoint)
– increased efficiency
User Space Protocol Implementation
• Uncommon outside the high-performance community, which wants
zero-copy and specialized demux keys.
• Problems: asynchronous processing, life cycle, authentication and
demultiplexing to endpoints
– latency in delivering packets (i.e. acks) to user space
– increased overhead in per packet processing before a drop/keep decision is
made
– processing received acks
– timeouts and retransmissions
– establishing connections and security: snooping, masquerading
– supporting select and poll
– protocols where connection may outlive process (TCP’s TIME_WAIT)
– global routing and address resolution tables
– global connection tables
• need to know what other ports are being used (locally)
• accepting/rejecting new connections
user-space protocols: Global Issues
• Routing: Direct packets to/from correct endpoint/interface
– How is traffic demultiplexed and sent to the correct endpoint/process?
    • In-kernel filters
– Where are the routing tables and how are they maintained?
    • route fixed when connection established or located in shared memory
• Control: I use IPv4 as an example
– Address resolution protocols/tables?
– Other control protocols. For example ICMP, IGRP, others?
– Where are the routing protocols implemented?
• Management:
– Must manage a protocol's namespace (for example, port numbers in IPv4).
– Common programming technique: allow protocol instance to select local address part
    • specify port = 0 and addr = 0 then implementation will assign correct values
– Passive connect model?
    • In IPv4 a server listens on a port (host:port:proto) for a connection request. To establish a connection a unique (to the endsystem) port number is assigned and new socket allocated.
– socket-oriented system calls must be supported. On UNIX must support non-blocking I/O with select and poll.
– Connection lifetime may outlast process.
    • For example TCP TIME_WAIT or simply waiting for a final ack or resending if no ack received.
• Security: we must provide sufficient mechanisms for protocol developers
– implementations must be able to guard against masquerading and eavesdropping
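The namespace-management point can be illustrated with a small allocator of the kind a per-vNet control daemon might keep. This is a hedged sketch: the class name `PortNamespace` and the dynamic range bounds are assumptions, and only the "port = 0 means assign one for me" convention comes from the slide.

```python
class PortNamespace:
    """Tracks which local ports of one vNet protocol are in use on this host."""

    def __init__(self, first_dynamic: int = 49152, last: int = 65535):
        self.first_dynamic = first_dynamic
        self.last = last
        self.in_use = set()

    def bind(self, port: int = 0) -> int:
        """Bind a specific port, or pick a free dynamic one when port == 0."""
        if port == 0:
            port = next((p for p in range(self.first_dynamic, self.last + 1)
                         if p not in self.in_use), None)
            if port is None:
                raise OSError("no free ports in the dynamic range")
        elif port in self.in_use:
            raise OSError("port already in use")
        self.in_use.add(port)
        return port

    def release(self, port: int) -> None:
        self.in_use.discard(port)
```

Keeping this table in one place is what lets unrelated user-space protocol instances learn which ports are already taken locally.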
User Space: Configurations
• Given these global issues there are two likely
configurations:
– all traffic passes through common protocol daemon in user
space
– control daemon implements basic set of control functions while
user library implements majority of data path functions
– prior work has shown the latter approach to be superior.
• Having all traffic pass through a common protocol
daemon => at least one extra copy operation (kernel ->
daemon -> user process)
• A better solution is for a daemon to insert relatively
simple packet filters in kernel for established connections
which directs packets to/filters packets from endpoints.
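The daemon-plus-filters arrangement can be sketched as a table keyed by virtual link identifier and connection addresses. Everything here is illustrative: `FilterTable`, its methods and the string address format are hypothetical, and a real in-kernel filter would match on packet bytes rather than pre-parsed fields.

```python
from typing import NamedTuple, Optional


class FilterKey(NamedTuple):
    vli: int
    local: str
    remote: Optional[str]  # None acts as a wildcard (listening endpoint)


class FilterTable:
    """Connection filters a control daemon installs for one vNet."""

    def __init__(self):
        self._filters = {}

    def insert(self, vli: int, local: str, remote: Optional[str], endpoint: str):
        self._filters[FilterKey(vli, local, remote)] = endpoint

    def remove(self, vli: int, local: str, remote: Optional[str]):
        self._filters.pop(FilterKey(vli, local, remote), None)

    def demux(self, vli: int, local: str, remote: str) -> Optional[str]:
        """Prefer an exact connection match, then a wildcard on the local part."""
        exact = self._filters.get(FilterKey(vli, local, remote))
        if exact is not None:
            return exact
        return self._filters.get(FilterKey(vli, local, None))
```

Packets for established connections go straight to the application endpoint, avoiding the extra copy through the daemon; only connection requests that match the wildcard entry reach the daemon, and packets matching nothing return `None` and can be dropped.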
User-Space: Passive Open
[Figure: passive open. A vnetX application (protocol library) and the vnetX control daemon (namespace, lifecycle, connections) sit above the socket layer, the vnet demux with connection filters, and Ethernet.]
0. listen/accept (passive open)
1. connection request (in)
2. ack (out)
3. insert incoming and outgoing filters for vnetX connection
4. new connection
5. data, established connections (data copy)
Outgoing: compare against connection-specific outgoing filter.
Incoming: use VLI to access incoming filters and use to demux to filter set and/or socket.
User-Space: Active Open
[Figure: active open. A vnetX application (protocol library) and the vnetX control daemon (namespace, lifecycle, connections) sit above the socket layer, the vnet demux with connection filters, and Ethernet.]
0. connect
1. connection request (out)
2. ack (in)
3. insert incoming and outgoing filters for vnetX connection
4. new connection
5. data, established connections (data copy)
Outgoing: compare against connection-specific outgoing filter.
Incoming: use VLI to access incoming filters and use to demux to filter set and/or socket.
User-Space: Datagram (Connectionless)
Daemon fills in local address and binds to socket. No restrictions on destination.
[Figure: a vnetX application (protocol library) and the vnetX control daemon (namespace, lifecycle, connections) sit above the socket layer, the vnet demux with connection filters, and Ethernet.]
0. open(any)
1. insert incoming and outgoing filters for vnetX connection
2. new connection (local address)
3. data, established connections (data copy)
Outgoing: compare against “connection” specific outgoing filter.
Incoming: use VLI to access incoming filters and use to demux to socket. In this case only the local part is used.
User-Space: Datagram (Connectionless)
Daemon fills in both local and destination addresses. Destination restricted.
[Figure: a vnetX application (protocol library) and the vnetX control daemon (namespace, lifecycle, connections) sit above the socket layer, the vnet demux with connection filters, and Ethernet.]
0. open(local and remote addr)
1. insert incoming and outgoing filters for vnetX connection
2. new connection (local and remote)
3. data, established connections (data copy)
Outgoing: compare against “connection” specific outgoing filter.
Incoming: use VLI to access incoming filters and use to demux to socket.
User-Space: App exits
TCP enters TIME_WAIT after close.
[Figure: a vnetX application (protocol library) and the vnetX control daemon (namespace, lifecycle, connections) sit above the socket layer, the vnet demux with connection filters, and Ethernet; a "drop" path is shown at the demux.]
1. connection close (out)
2. ack (in/out)
3. remove filters
Considerations For Kernel Extensions
• Identified areas where modules may impact system behavior
– software bugs (implementation errors) which may result in the kernel or another vNet protocol stack becoming corrupted
• dereferencing an invalid pointer: corrupt kernel memory, cause exception (invalid address), read invalid data
• incorrect parameter usage
• indexing beyond end of an array
• incorrect locking protocol or deadlock
• overflowing stack (large local variables, recursion etc)
• memory management errors: using freed memory, memory leaks, incorrect allocation sizes
• not checking return values
– design errors leading to kernel corruption
• misuse of kernel interfaces
• improper control processing
• improper data output
– performance/efficiency errors: use too many resources (buffers, I/O bandwidth, CPU cycles, locks, time)
• adversely impacts kernel and application processes
• adversely impacts other vNet protocol stacks
• adversely impacts network traffic (remote hosts or network devices)
– security or protection violation either compromising confidentiality or altering data
• unauthorized read/write of kernel/user data
• unauthorized use of resource (invalid packets sent on network)
• unauthorized read/write on another vNet protocol stack environment
• Possible isolation mechanisms:
– static and dynamic enforcement of kernel module (interface) access restrictions
– bounded (deterministic or limited) resource use
• buffers: common buffer pool but thresholds on number that can be in use at any one time. Easy for tx, what about receive (do we drop packets)?
• Bandwidth
• Locks??
• other resources?
• hard/soft bounds? Deterministic or Statistical?
– ???
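The buffer bound listed above (a common pool with a threshold on how many buffers one vNet may hold at a time) can be sketched as below. This is a hedged illustration only: `BoundedBufferPool`, its parameters, and the string vNet ids are hypothetical, and a hard per-vNet cap is just one of the bounding policies the slide leaves open.

```python
class BoundedBufferPool:
    """Hypothetical shared pool with a per-vNet cap on buffers in use."""

    def __init__(self, total, per_vnet_limit):
        self.free = total
        self.limit = per_vnet_limit
        self.in_use = {}           # vnet id -> buffers currently held

    def alloc(self, vnet):
        """Return True if the vNet may take one more buffer."""
        held = self.in_use.get(vnet, 0)
        if self.free == 0 or held >= self.limit:
            return False           # refused: pool empty or vNet at its cap
        self.free -= 1
        self.in_use[vnet] = held + 1
        return True

    def release(self, vnet):
        """Give one buffer held by the vNet back to the shared pool."""
        held = self.in_use.get(vnet, 0)
        if held == 0:
            raise ValueError(f"vNet {vnet!r} holds no buffers")
        self.in_use[vnet] = held - 1
        self.free += 1


pool = BoundedBufferPool(total=4, per_vnet_limit=2)
print([pool.alloc("vnetA") for _ in range(3)])  # third request exceeds the cap
print(pool.alloc("vnetB"))                      # other vNets are unaffected
```

On the transmit path a refused allocation can simply make the sender wait. On the receive path there is no sender to block, so a refusal amounts to dropping the packet, which is the open question the slide raises.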
Pushing protocols into the Kernel
• Positives:
– All the issues associated with user-space protocols simply go
away: state lives in global tables with the lifetime of the kernel
– Performance, efficiency, existing code base
– Enhances intra-Protocol security
– Simplifies integration with existing network I/O subsystems and
interfaces
• Negatives:
– Isolation: More difficult to isolate system from protocol
instances. Inter-protocol isolation difficult.
– Security: Proving trust/security more difficult
– Implementation and debugging more difficult in kernel
Our Approach
• ???
Kernel-Space Protocols
[Figure: kernel-space protocol architecture, marked "Rework!" on the slide. Recoverable labels, by layer:]
– User space: Application(s); device files /dev/protoX and /dev/vnet; endpoints vnet:ep, tcp:port, udp:port, rawIP
– File interface: FS management, open files, buffer cache, I/O interface
– Socket interface and socket I/O interface: PF_VNET (vnet ops, per-vnet protocol state tables) and PF_INET (TCP/IP: TCP1, TCP2, …, TCPn, UDP, RAW IP, IP, routes, route to interface)
– SW interrupt: ethernet, vnet demux, VLAN
– HW interrupt/exception: eth device driver
– Hardware: eth0
User Space Protocols
• Chandramohan A. Thekkath, Thu D. Nguyen, Evelyn Moy, Edward D.
Lazowska, Implementing network protocols at user level, IEEE/ACM
Transactions on Networking (TON), v.1 n.5, p.554-565, Oct. 1993
• Chris Maeda, Brian Bershad, Protocol Service Decomposition for
High-Performance Networking, Proceedings of the 14th ACM Symposium on
Operating Systems Principles, pp. 244-255, December 1993
• Aled Edwards, Steve Muir, Experiences implementing a high
performance TCP in user-space, Proceedings of the conference on
Applications, technologies, architectures, and protocols for computer
communication, p.196-205, 1995
• Kieran Mansley, Engineering a User-Level TCP for the CLAN
Network, Proceedings of the ACM SIGCOMM workshop on Network-I/O
convergence: experience, lessons, implications, p.228-236, 2003
Extensible protocol frameworks in the kernel
• Parveen Patel, Andrew Whitaker, David Wetherall, Jay Lepreau,
Tim Stack, Upgrading Transport Protocols using Untrusted
Mobile Code, Proceedings of the 19th ACM Symposium on
Operating Systems Principles, Pages 1-14, October 2003.
• Herbert Bos, Bart Samwel, Safe Kernel Programming in the
OKE, Proceedings of the fifth IEEE Conference on Open
Architectures and Network Programming, June 2002
• Marc Fiuczynski, Brian Bershad, An Extensible Protocol
Architecture for Application-Specific Networking,
Proceedings of the Winter USENIX Technical Conference, pages
55-64, January, 1996
• Norman Hutchinson, Larry Peterson, The x-kernel: An
Architecture for Implementing Network Protocols, IEEE
Transactions on Software Engineering, 17(1):64-76, January
1991
Isolation Services
• Marko Zec, Implementing a Clonable Network Stack In the FreeBSD
Kernel, Proceedings of USENIX Technical Conference, pages 137-150,
June 9-14, 2003
• P. H. Kamp, R. N. M. Watson, Jails: Confining the omnipotent root,
Proceedings of the 2nd International SANE Conference, May 2000
• A Bavier, M Bowman, B Chun, D Culler, S Karlin, S Muir, L Peterson, T
Roscoe, T Spalink, M Wawrzoniak, Operating System Support for
Planetary-Scale Network Services, Proceedings of the 1st USENIX
Symposium on Networked Systems Design and Implementation, pages
253-266, March 2004
• G Back, W Hsieh, J. Lepreau, Processes in KaffeOS: Isolation, Resource
Management, and Sharing in Java, Proceedings of the 4th Symposium on
Operating Systems Design and Implementation, pages 333-346, October
2000
• R Wahbe, S Lucco, T Anderson, S Graham, Efficient Software-Based
Fault Isolation, Proceedings of the 14th Symposium on Operating Systems
Principles, pages 203-216, December 5-8, 1993
VMM
• P Barham, B Dragovic, K Fraser, S Hand, T
Harris, A Ho, R Neugebauer, I Pratt, A Warfield,
Xen and the Art of Virtualization, Proceedings
of the 19th Symposium on Operating System
Principles, pages 164-177, October 19-22, 2003
• A Whitaker, M Shaw, S Gribble, Scale and
Performance in the Denali Isolation Kernel,
Proceedings of the 5th Symposium on Operating
Systems Design and Implementation, pages 195-210, December 9-11, 2002