The Experiment Lifecycle and its Major Programs
1
Experiment Lifecycle:
the User Perspective
2
Creating an Experiment
• Done with `batchexp’ for both batch and
interactive experiments (see example below)
– “batch” is a historical name
• Can bring the experiment to three states
– swapped – pre-run only
– posted – queued experiment ready to run
– active – experiment swapped in
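• For example (flag names here are from memory and may differ; check
batchexp’s usage message):
– batchexp -p myproj -e myexp myexp.ns (queue as a batch experiment)
– batchexp -i -p myproj -e myexp myexp.ns (create an interactive experiment instead)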
3
Swapping An Experiment
• Done with `swapexp’
• Can effect several transitions
– swapped to active (swap in experiment)
– active to swapped (swap out experiment)
– active to active (modify experiment)
– posted to swapped (dequeue batch experiment)
4
Pre-run (tbprerun)
• Parse NS file (parse-ns and parse.tcl)
– Put virtual state in database (xmlconvert)
• Do visualization layout (prerender)
• Compute static routes (staticroutes)
5
swapped to active (tbswap in)
• Mapping: Find nodes for experimenter
– assign_wrapper
– assign
• Allocate nodes (nalloc)
– Set serial console access (console_setup)
• Set up NFS exports (exports_setup)
• Set up DNS names (named_setup)
• Reboot nodes and wait for them (os_setup)
– Load disks if necessary (os_load)
6
swapped to active (contd.)
• Start event system (eventsys_control)
• Create VLANs (snmpit)
• Set up mailing lists (genelists)
• Failure at any step results in swapout
7
active to swapped (tbswap out)
• Stop the event system (eventsys_control)
• Tear down VLANs (snmpit)
• Free nodes (nfree)
– Scheduled reservations (sched_reserve)
– Place in reloadpending experiment
– Revoke console access (console_setup)
• Reset DNS (named_setup)
• Reset NFS exports (exports_setup)
• Reset mailing lists (genelists)
8
active to active (tbswap modify)
• Purpose: experiment modification
– Get new virtual state (re-parse NS file)
– Bring physical mapping into sync with new state
• Leaves alone nodes whose physical mapping
matches the new virtual state
9
Important Daemons
• batch_daemon
– Picks up posted experiments
– Attempts a swapin
– One experiment at a time for each user
– Swaps out finished batch experiments
• reload_daemon
– Picks up nodes from reloadpending experiment
– Frees them when done reloading
10
Next, in More Depth
• Parsing
• Resource allocation
– Setup for the action: assign_wrapper
– The real brains: assign
• Serial console management
• Link shaping
• IP routing support
• Traffic generation
• Inter-node synchronization
• Event system
12
Parsing Experiment Configurations
13
Experiment Configuration Language
• General purpose OTcl scripting language
based on NS
• Exports an API nearly identical to that of NS,
albeit a subset
• Testbed specific actions via the tb-*
procedures
– We provide a compatibility script to include when
running under an NS simulation
• Define your own procedures / classes /
methods
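• A minimal NS file in this style (node, OS, and hardware names are
illustrative; tb_compat.tcl is the include that lets the same file also run
under a stock NS simulation):
set ns [new Simulator]
source tb_compat.tcl
set n0 [$ns node]
set n1 [$ns node]
set link0 [$ns duplex-link $n0 $n1 100Mb 10ms DropTail]
tb-set-node-os $n0 FBSD-STD
tb-set-hardware $n1 pc600
$ns run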
14
Making sense out of others’ code
• The parser is also written in OTcl
• It mirrors a subset of NS classes
• Implemented methods for the above classes
capture the user specified experiment attributes
• Convert experiment attributes to an intermediate
XML format
– Generic format makes it easy to add support for other
configuration languages
• Store the configuration in the virt_* tables such as
virt_nodes, virt_lans etc.
15
Implementation Quirks
• Capture top level resource names for later use
– E.g.: Use 'n0' to name the physical node when the user
asks for set n0 [$ns node]
• Rename resource names to work around restrictions
such as in DNS
– E.g.: Node 'n(0)' to 'n-0'
• Parser run on ops for security reasons
– Mixing trusted/untrusted OTcl code on main server (boss)
is dangerous
• Read tbsetup/ns2ir/README in the source tree for
details
16
Assign Wrapper (PG Version)
17
Assign Wrapper
• Perl frontend to assign
• Converts virtual DB representation to more
neutral “top” file format (input)
• Converts results from plain text format into
physical DB representation
• assign_wrapper is extremely testbed aware
• Moves information from virtual tables to
physical tables
18
Virtual Representation
• An experiment is really a set of tables in the
database
• Includes “virt_nodes” and “virt_lans” which
describe the nodes and the network topology
• Other tables include routes, program agents,
traffic generators, virtual types, etc.
19
Virtual Representation Cont.
• Example:
set n1 [$ns node]
set n2 [$ns node]
set link0 [$ns duplex-link $n1 $n2 100MB 10ms]
tb-set-hardware $n2 pc600
• Is stored in database tables:
virt_node ('n1', '10.1.1.1', 'pc850', 'FBSD-STD', ...)
virt_node ('n2', '10.1.1.2', 'pc600', 'RHL-STD', ...)
virt_lan ('link0', 'n1', '100MB', '5ms', ...)
virt_lan ('link0', 'n2', '100MB', '5ms', ...)
(the 10ms duplex-link delay shows up as 5ms on each member row; the two halves sum to the full link delay)
20
What’s a top file?
• Stands for "topology" file, but that's too many
syllables.
• Input file to assign specifying nodes, links,
desires.
• Conversion of DB format to:
node n2 pc600
node n1 pc850
link link0/n1:0,n2:0 n1 n2 100000 0 0
• Combine with current (free) physical
resources to come up with a solution.
21
Assign Results
• Assign maps n1 and n2 to pc1 and pc41
based on types and bandwidth.
Nodes
n1 pc1
n2 pc41
End Nodes
Edges
link0/n1:0,n2:0 intraswitch pc1/eth3 pc41/eth1
End Edges
• The above is a “simplified” version of actual
results. Gory details available elsewhere.
22
Assign Wrapper Continues
• Allocate physical resources (nodes) as
specified by assign
• Allocate virtual resources (vnodes) on
physical nodes (local and remote)
• If some nodes already allocated (someone
else got them before you), try again
• Keep trying until the maximum number of tries is
exceeded; assign might fail to find a solution on
the first N tries
23
Assign Wrapper Keeps Going …
• Insert set of “vlans” into database
– pc1/eth3 connected to pc41/eth1
• Update “interfaces” table with IP addresses
assigned by the parser
• Update “nodes” table with user specified
values from virt_nodes.
– Osids, rpms, tarballs, etc.
• Update “linkdelays” table with end node
traffic shaping configuration (from virt_lans)
24
And Going and Going
• Update “delays” table with delay node traffic
shaping configuration
• Update “tunnels” table with tunnel
configuration (wide-area nodes)
• Update “agents” table with location of where
events should be sent to control traffic
shaping
• Call exit(0) and rest!
25
Resource Allocation:
assign
26
assign’s job
• Maps virtual resources to local nodes and VLANs
• General combinatorial optimization approach to
NP-hard problem
• Uses simulated annealing
• Minimizes inter-switch links, the number of
switches used, and other costs
• Takes seconds for most experiments
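• A self-contained toy sketch of that idea in Tcl (assign itself is a C++
program; the topology, switch layout, and cooling schedule below are invented
for illustration). It places four virtual nodes on four hosts while minimizing
the number of virtual links that cross between switches:
# virtual links, and which switch each physical host hangs off (made up)
set vlinks {{a b} {b c} {c d} {a d}}
array set switch_of {pc1 sw0 pc2 sw0 pc3 sw1 pc4 sw1}
set virt {a b c d}
set phys {pc1 pc2 pc3 pc4}
# cost = number of virtual links whose endpoints land on different switches
proc cost {mapName} {
    upvar $mapName map
    global vlinks switch_of
    set c 0
    foreach l $vlinks {
        lassign $l u v
        if {$switch_of($map($u)) ne $switch_of($map($v))} { incr c }
    }
    return $c
}
foreach v $virt p $phys { set map($v) $p }   ;# arbitrary starting placement
for {set T 10.0} {$T > 0.01} {set T [expr {$T * 0.99}]} {
    # propose swapping the placement of two random virtual nodes
    set u [lindex $virt [expr {int(rand()*4)}]]
    set v [lindex $virt [expr {int(rand()*4)}]]
    set old [cost map]
    set tmp $map($u); set map($u) $map($v); set map($v) $tmp
    set new [cost map]
    # keep improvements; keep worse moves with probability exp(-delta/T)
    if {$new > $old && rand() > exp(-($new - $old) / $T)} {
        set tmp $map($u); set map($u) $map($v); set map($v) $tmp   ;# undo
    }
}
parray map   ;# final virtual-to-physical placement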
27
What’s Hard About It?
• Satisfy constraints
– Requested types
– Can’t go over inter-switch bandwidth
– Domain-specific constraints
• LAN placement for virtual nodes
• Subnodes
• Maximize opportunity for future mappings
– Minimize inter-switch bandwidth
– Avoid scarce nodes
28
What It Can Do
• Handle multiple types of nodes on multiple
switches
• Allow users to ask for classes of nodes
• Prefer/discourage use of certain nodes
• Map multiple virtual nodes to one physical
node
• Handle nodes that are 'hosted' in some other
node
• Partial solutions
29
What It Doesn't Do
• Map based on observed end-to-end network
characteristics
– Applicable to wide-area and wireless
– But, we have another program, wanassign, that
can
• Satisfy requests for specific link types
– But, we could approximate with subnodes
• Full node resource description
30
Issues
• Complicated
– Several authors
– Subject of paper evaluating many configurations
– Nature of randomized algorithm makes debugging hard
– Evolved over time to keep up with features
• Scaling
– Particularly with virtual and simulated nodes
• Not just scale (1000’s), it’s the type of node
– Pre-passes may help
• The good: it’s coped with a lot of new demands!
31
Remote Console Access
32
Executive Summary
• Allow user access to consoles via serial line
• Console proxy enables remote access
• Authentication and encryption
• All console output logged
• Requires OS support for serial consoles
• Utah Emulab: all nodes have serial lines
– Not required, but handy
33
Serial Consoles
• Can redirect console in three places
– BIOS: on most “server” motherboards
– Boot loader: easy on BSD and Linux
– OS: easy on BSD and Linux
• Boot loaders and OSes must be configured
– Generally via boot loader configuration
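• For example (device names and speeds are illustrative):
– FreeBSD /boot/loader.conf: console="comconsole"
– Linux kernel command line (set in the boot loader config): console=ttyS0,115200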
34
The serial line proxy
(capture)
• Original purpose was to log console output
– Read/write the serial line, log data, present a tty interface
– Use “tip” to access pty
• Enhanced to “remote” the console
– Present a socket interface
– Can be accessed from anywhere on the
network
• One capture process per serial line
35
Authentication
(capserver)
• Only users in an experiment can access
• Use a one-time key
– capture running on serial line host generates
new key for every “session”
• Sends key to capserver on the boss node
– capserver records key in DB, returns ownership
info
– capture uses info to protect ACL and log files
36
Clients
(console, tiptunnel)
• console is the replacement for tip
– Run on ops, obtains access info via ACL file
created by capture
– File permissions restrict user access
• tiptunnel is the remote version
– Binaries for Linux, BSD, Windows
– Run as a helper app from browser
– Access info passed via secure web connection
– All communication via SSL
37
Emulab Link Shaping
38
Executive Summary
• Emulab allows setting and modification of
bandwidth, latency, and loss rate on a per-link basis
• Interface through NS script or command
• Implemented either by dedicated “delay”
nodes or on end nodes
• Delay nodes work with any end node OS
• End node shaping for FreeBSD or Linux
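• A small NS fragment in that style (values are illustrative; tb-set-link-loss
is one of the tb-* procedures):
set link0 [$ns duplex-link $n1 $n2 10Mb 20ms DropTail]
tb-set-link-loss $link0 0.01   ;# 1% packet loss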
39
Delay nodes
• Run FreeBSD + dummynet + bridging
• FreeBSD kernel:
– Runs at 10000Hz to improve accuracy
– Uses polling device drivers to reduce overhead
• Nodes are dedicated to an experiment
• One node can shape multiple links
• Transparent to end nodes
• Not transparent to switch fabric
40
VLANs and Delay Nodes - Diagram
41
End node shaping
(“link delays”)
• Handle link shaping at both ends of the link
• Requires OS support on the end nodes
– FreeBSD: dummynet
– Linux: “tc” with modifications
• Conserves Emulab resources at potential
expense of emulation fidelity
• Works in environments where delay nodes
are not practical or possible
42
Dynamic control
• Link settings can be modified at “run time”
– at commands in the NS file
– tevc command
• Run a control agent (delay_agent) on all
nodes implementing shaping
• Listens for events, interacts with kernel to
effect changes
• OS specific
43
IP routing support in Emulab
44
Executive Summary
• Emulab offers three options for IP routing in
a topology: none, manual, or automatic
• Specified via the NS file
• Routes are set up automatically at boot time
• There is no agent for dynamic modification
of routes
45
User-specified routing
• “None”
– No experimental network routes will be set up
– Used for LANs and routing experiments
• “Manual”
– Explicit specification of routes in the NS file
– Routes become part of the experiment's DB state
– Passed to a node at boot, part of self-config
– Implies IP forwarding enabled
46
Emulab-provided routing
• “Static”
– Emulab calculates routes at experiment creation
(routecalc, staticroutes)
– Shortest path calculation between all pairs
– Optimized to coalesce into network routes
• “Session”
– Dynamic routing: runs gated/OSPF on all nodes
– Auto-generated config file uses only active
experimental interfaces
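• In the NS file this is a one-line choice, picking one of the options above,
e.g.:
$ns rtproto Static
$ns rtproto Session
– (manual routing instead lists explicit routes in the NS file, as noted
earlier)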
47
Routing Gotchas
• Node default route uses the control net
– Missing manual routes result in lost traffic
• Control net is visible to routing daemons
– Makes their job easy (one hop to anyone)
• NxN "Static" route computation and storage
do not scale as N increases, such as in
multiplexed virtual nodes
48
Traffic Generation in Emulab
49
Executive Summary
• Emulab allows experiments to run and
control background traffic generators
• Interface through NS script or command line
tool
• Constant Bit Rate traffic only right now
• UDP or TCP only right now
50
Implementation details
• Based on TG (http://www.postel.org/tg/)
– UDP or TCP, one-way, various distributions of
interarrival and length
• Modified to be an event agent
– Start and stop, change packet rate and size
• Interface:
– NS: standard syntax for traffic sources/sinks
– tevc command line tool
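• A minimal NS fragment for a one-way UDP CBR stream from $n0 to $n1, using
the standard NS syntax (rates, sizes, and times are illustrative):
set udp0 [new Agent/UDP]
$ns attach-agent $n0 $udp0
set cbr0 [new Application/Traffic/CBR]
$cbr0 set packetSize_ 500
$cbr0 set rate_ 100Kb
$cbr0 attach-agent $udp0
set null0 [new Agent/Null]
$ns attach-agent $n1 $null0
$ns connect $udp0 $null0
$ns at 60.0 "$cbr0 start"
$ns at 120.0 "$cbr0 stop"
• The start/stop times become static events; tevc can adjust the agent later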
51
Inter-node synchronization
in Emulab
52
Executive Summary
• Provides a simple inter-node barrier
synchronization mechanism for experiments
• Example: wait for all nodes to finish running
a test before starting the next one
• Not a centralized service (per-experiment
infrastructure), scales well
• Easy to use: can be scripted
53
History
• Originally implemented a single-barrier,
single-use “ready” mechanism:
– Allowed users to know when all nodes were “up”
– Used centralized TMCC to report/query status
– Network/server unfriendly: constant polling
• Users wanted a more general mechanism
– Multiple barriers, reusable barriers
• Tended to roll their own
– Often network unfriendly as well
54
Enter the Sync Server
• In NS file, declare a node as the server:
– set node1 [$ns node]
– tb-set-sync-server $node1
• When node boots, it starts up the sync server
automatically
• Nodes requiring synchronization use
emulab-sync application
• Use can be scripted using program agent
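• For example, assuming a second node $node2 defined elsewhere in the file,
a program agent can run the client at a scheduled time:
set waiter [$node2 program-agent -command "/usr/testbed/bin/emulab-sync"]
$ns at 60.0 "$waiter start"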
55
Example client use
• One node acts as barrier master, initializing
barrier and waiting for a number of clients:
– /usr/testbed/bin/emulab-sync -i 4
• All other client nodes contact the barrier:
– /usr/testbed/bin/emulab-sync
• emulab-sync blocks until the barrier count
is reached
56
Implementation
• Simple TCP-based server and client program
– UDP version in the works
• Client:
– Gets server info from a config file written at boot
– Connect to server and write a small record
– Block until a reply is read
• Server:
– Accept connections, read records from clients
– Write a reply when all clients have connected
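• A minimal Tcl sketch of that barrier logic (illustrative only: the real sync
server is a separate daemon, and the port number and wire format here are
made up):
set clients {}
set needed 0                                ;# set by the "master" client
proc accept {chan addr port} {
    global clients needed
    fconfigure $chan -buffering line
    gets $chan req                          ;# the client's small record
    if {[lindex $req 0] eq "init"} {
        set needed [lindex $req 1]          ;# master supplies the count
    }
    lappend clients $chan
    if {$needed > 0 && [llength $clients] >= $needed} {
        foreach c $clients { puts $c "go"; close $c }   ;# the reply
        set clients {}
        set needed 0
    }
}
socket -server accept 16534                 ;# port number is made up
vwait forever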
57
Issues
• Why not use the event system for
synchronization?
– Event system is a centralized service
– As we move to decentralization, may reconsider
• Authentication: none
– Local: uses the shared control net, so this is a
problem today; it won't be once control-net VLANs arrive
– Wide-area: wide open; add an HMAC as the event
system does, or just use the event system
58
The Emulab Event System
59
Emulab Control Plane
• Many of Emulab’s features are dynamically
controllable:
– Traffic generators: can be started, stopped, and
parameters altered
– Link shaping: links can be brought up and down,
characteristics can be modified
• Control is via the NS file, the web interface,
or a command line tool.
60
Example: A Link
• NS: create a shaped link:
– set link0 [$ns duplex-link $n1 $n2 50Mb 10ms DropTail]
• NS: control the link:
– $ns at 100 "$link0 modify DELAY=20 BANDWIDTH=25"
– $ns at 200 "$link0 down"
• Command line: control the link
– tevc -e tutorial/linktest +10 link0 down
61
What's really happening?
• A link “agent” runs on each (delay) node to
control all of the links for that node.
• The agent listens for “events” from the server
telling it what to do.
• A per-experiment scheduler doles out the
events at the proper time, sending them to
the agents.
• Other agents include the traffic generators,
program objects, link tester.
62
Come on, what's really happening?!
• Use Elvin (http://elvin.dstc.edu.au/)
– off-the-shelf publish-subscribe system
• Agents "listen" for events by "subscribing" to
those they care about.
• The per-experiment scheduler "publishes"
events as they come due.
• Events flow from the scheduler through the
Elvin daemon to the nodes, and ultimately to
the agents that wanted them.
63
Static/Dynamic event flow
64
Issues: Time
• What happens to “event time” when an
experiment is swapped?
– Run in real time: events could be lost
– Suspend time: dilation of experiment time
– Restart time: replay static event stream
• Timing for dynamic events
– tevc … +10 link0 down; tevc … +10 link1 up
– What is the latency between events?
• What latency do we need to guarantee?
65
Issues: Security
• Elvin mechanism is too heavyweight
– Requires encryption to protect authentication keys
– We have no reason to encrypt our events
• Don't want to tie ourselves to Elvin
– In principle
– Elvin has gone closed source
• Emulab past: no authentication, no wide-area
• Emulab current: use end-to-end HMAC
– Key transferred via TMCC
– Wide-area nodes supported, cannot inject events
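• A sketch of that check in Tcl using tcllib's sha1 package (the real agents
are C/C++ programs; the key file path below is hypothetical):
package require sha1
set f [open /var/emulab/boot/eventkey]      ;# key delivered via TMCC (path illustrative)
set key [string trim [read $f]]
close $f
set notification "link0 down"
set mac [::sha1::hmac -hex -key $key $notification]
# the sender ships the notification plus $mac; the receiver recomputes the
# HMAC with its own copy of the key and drops the event on a mismatch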
66
Issues: Scaling
• Open Elvin TCP connection for every agent
– Use per-node proxy
– But agents still send events directly to boss
– And there are still a lot of nodes
• Use UDP?
– What about lost events?
• Deliver static events to nodes early?
– Doesn't help dynamic (“now”) events
• Multicast, someday (not the current usage model)
• You’d think we could just find a better pub/sub
system, but we haven’t.
67