
Lightweight Remote Procedure Call (Bershad et al.)
Andy Jost
CS 533, Winter 2012
Introduction
• Preliminary definitions
– Monolithic kernels and microkernels
– Capability systems
– Remote procedure calls (RPC)
• Motivation
– Analysis of the common case
– Performance of RPC
– Sources of overhead
• Lightweight RPC (LRPC)
• Performance
OS Kernel Paradigms
• Monolithic OS
– All (or nearly all) services built into the kernel
– One level of protection, but typically no internal firewalls
– E.g., BSD UNIX (millions of LOC)
– HW is exposed to a great deal of complex software
– Hard to debug, extend
• Microkernel (or “small-kernel”) OS
– The kernel provides only the minimum services necessary
to support independent application programs (address
space management, thread management, IPC)
– Additional services are provided by user-space daemons
– Certain daemons are imbued with special permission (e.g.,
HW access) by the kernel
– Service requests utilize IPC
Separate address spaces are used to establish protection domains:
1. No cross-domain read/write
2. The kernel mediates IPC
3. RPC can be used to give a procedural interface to IPC
[Figure: monolithic vs. microkernel OS structure (http://en.wikipedia.org/wiki/File:OS-structure.svg)]
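To make the message-based service request concrete, here is a minimal sketch of a client asking a user-space file-server daemon for a block through kernel-mediated IPC. The primitives ipc_send()/ipc_receive(), the port number, and the message layout are assumptions for illustration only; they are not the API of the paper or of any particular microkernel.

/* Hypothetical message-based service request in a microkernel. */
#include <string.h>

struct msg {
    int  opcode;        /* which service operation is requested */
    int  arg;           /* e.g., a file handle or block number  */
    char data[512];     /* payload copied through the kernel    */
};

/* Assumed kernel-provided primitives (illustrative signatures only). */
int ipc_send(int port, const struct msg *m);
int ipc_receive(int port, struct msg *m);

#define FILE_SERVER_PORT 7   /* assumed port of the file-server daemon */
#define OP_READ_BLOCK    1

int read_block(int handle, char *buf)
{
    struct msg req = { .opcode = OP_READ_BLOCK, .arg = handle };
    struct msg rep;

    ipc_send(FILE_SERVER_PORT, &req);     /* trap into the kernel, enqueue message */
    ipc_receive(FILE_SERVER_PORT, &rep);  /* block until the daemon replies */
    memcpy(buf, rep.data, sizeof rep.data);
    return rep.arg;                       /* e.g., bytes read or an error code */
}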
Capability Systems
• A capability system is a security model
– A security model specifies and enforces a security policy
• Provides a fine-grained protection model
• A capability is a communicable, unforgeable token of authority
representing an access right
• In a capability system, the exchange of capabilities among mutually
untrusting entities is used to manage privileged access throughout the
system
[Diagram: two ways to authorize a write to /etc/passwd]
• One possibility (ACL check): the process calls write("/etc/passwd"); on whose authority do we write /etc/passwd? The kernel must consult an access control list.
• Capability system: the process first calls fd = open("/etc/passwd", O_RDWR), then write(fd); the open file descriptor proves write access was previously granted.
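The UNIX file descriptor is a concrete (if partial) example of this pattern: authority is checked once when the descriptor is created, and later operations simply present the descriptor. A minimal sketch using only standard POSIX calls:

/* The descriptor returned by open() acts like a capability: possessing it,
 * not a per-write ACL lookup, is what authorizes the subsequent write(). */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Authority is checked once, at open time. */
    int fd = open("/etc/passwd", O_RDWR);
    if (fd < 0)
        return 1;                        /* access was never granted */

    /* Later writes present the descriptor; no further ACL consultation. */
    const char note[] = "# example\n";
    write(fd, note, sizeof note - 1);

    close(fd);
    return 0;
}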
Remote Procedure Call
• An IPC mechanism that allows a program to
invoke a subroutine in another address space
– The receiver might reside on the same physical system or on a remote machine across a network
• Provides a large-grained protection model
• The call semantics make it appear as though an ordinary local procedure call was performed
– Stubs interface to a runtime environment, which
handles data marshalling; the OS handles low-level
IPC
– Protection domain boundaries are hidden by stubs
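To show how stubs hide the domain boundary, here is a sketch of what a client stub for a hypothetical remote procedure int add(int a, int b) might do. The transport calls rpc_send_request()/rpc_await_reply(), the message layout, and the procedure identifier are illustrative assumptions, not the API of any particular RPC runtime.

/* Client stub sketch: marshals arguments, hands the message to the runtime,
 * and unmarshals the reply, so the caller sees an ordinary procedure call. */
#include <stdint.h>

struct rpc_msg {
    uint32_t proc_id;     /* which server procedure to invoke */
    uint32_t args[2];     /* marshalled arguments             */
    uint32_t result;      /* filled in by the reply           */
};

/* Assumed runtime-library calls (illustrative signatures only). */
int rpc_send_request(int binding, struct rpc_msg *m);
int rpc_await_reply(int binding, struct rpc_msg *m);

#define PROC_ADD 3        /* assumed identifier for the add procedure */

int add(int binding, int a, int b)
{
    struct rpc_msg m = { .proc_id = PROC_ADD };
    m.args[0] = (uint32_t)a;       /* marshal the arguments into the message */
    m.args[1] = (uint32_t)b;

    rpc_send_request(binding, &m); /* runtime and OS move the message across domains */
    rpc_await_reply(binding, &m);  /* block until the server stub replies */
    return (int)m.result;          /* unmarshal the result */
}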
Steps in a Traditional RPC
[Diagram: on the sending path a call passes from the Client Application through the Client Stub, Client Runtime Library, and Client Kernel, across the transport layer to the Server Kernel, Server Runtime Library, Server Stub, and Server Application; the return path retraces the same layers. In a single-system RPC the kernel and transport layer are potentially shared.]
The Use of RPC in Microkernel Systems
• Small-kernel systems can and do use RPC to borrow its
large-grained protection model
– Separate components are placed in disjoint address spaces
(protection domains)
– Communication between components is mediated by RPC,
using messages
– Advantages include: modularity, design simplification,
failure isolation, and transparency (of network services)
• But this approach simultaneously borrows the control
transfer facilities of RPC
– Those are not optimized for same-machine control transfer
– This leads to an unnecessary loss of efficiency
The Use of RPC Systems (I)
Bershad argues that the common case for RPC:
– is cross-domain (not cross-machine)
– involves relatively simple parameters
– can be optimized
1. Frequency of Cross-Machine Activity

Frequency of Remote Activity
Operating System   Percentage of operations that cross machine boundaries
V                  3.0
Taos               5.3
Sun UNIX+NFS       0.6
The Use of RPC Systems (II)
2. Parameter Size and Complexity
– 1,487,105 cross-domain procedure calls were observed during one four-day period
– 95% were to 10 procedures; 75% were to 3 procedures
– None of them involved complex arguments
– Furthermore, most RPCs involve a relatively small amount of data transfer
The Use of RPC Systems (III)
3. The Performance of Cross-Domain RPC
– The theoretical minimum time for a null cross-domain operation includes time for
• Two procedure calls
• Two traps
• Two virtual memory context switches
– The cross-domain performance, measured across six systems using the Null RPC, varies from over 300% to over 800% of the theoretical minimum
Sources of Overhead in Cross-Domain RPC
• Stub Overhead: stubs are general enough for cross-machine RPC,
but inefficient for the common case of local RPC calls
• Message Buffer Overhead: client/kernel, kernel/server,
server/kernel, kernel/client
• Access Validation: the kernel must validate the message sender on call and again on return
• Message Transfer: messages are enqueued by the sender and
dequeued by the receiver
• Scheduling: separate, concrete threads run in client and server
domains
• Context Switch: in going from client to server
• Dispatch: the server must receive and interpret the message
Lightweight RPC (LRPC)
• LRPC aims to improve the performance of cross-domain
communication relative to RPC
• The execution model is borrowed from a protected
procedure call
– Control transfer proceeds by way of a kernel trap; the kernel
validates the call and establishes a linkage
– The client provides an argument stack and its own concrete
thread of execution
• The programming semantics and large-grained protection
model are borrowed from RPC
– Servers execute in private protection domains
– Each one exports a specific set of interfaces to which clients may
bind
– By allowing a binding, the server authorizes a client to access its
procedures
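As a concrete illustration of this execution model, here is a sketch of the client side of an LRPC call. The names lrpc_trap() and struct astack, the procedure index, and the stack layout are assumptions for illustration; the paper describes the mechanism, not a C API.

/* Client stub sketch for an LRPC "add" procedure. */
#include <stdint.h>

typedef unsigned long binding_t;   /* non-forgeable binding object obtained at bind time */

struct astack {                    /* argument stack shared between client and server */
    uint32_t words[64];
    uint32_t top;
};

/* Assumed kernel trap: validates the binding and linkage, pairs the A-stack with
 * an E-stack in the server domain, and runs the server procedure on the client's
 * own thread of execution before returning along the same path. */
void lrpc_trap(binding_t b, int proc_index, struct astack *as);

int lrpc_add(binding_t b, struct astack *as, int x, int y)
{
    /* Arguments go directly onto the shared A-stack; no message copies. */
    as->top = 0;
    as->words[as->top++] = (uint32_t)x;
    as->words[as->top++] = (uint32_t)y;

    lrpc_trap(b, /* proc_index = */ 1, as);   /* control transfer via a kernel trap */

    return (int)as->words[0];                 /* result comes back on the same A-stack */
}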
LRPC High-Level Design
[Diagram: the client's thread follows the sending path through a kernel trap into the server and comes back along the return path; the A-stack is mapped into both the client and server virtual address spaces and backed by the same physical memory, while the E-stack resides only in the server domain.]
Implementation Details
• Execution of the server procedure is made by way of a kernel trap
• The client provides the server with an argument stack and its own
concrete thread of execution
• The argument stacks (A-stacks) are shared between client and
server; the execution stacks (E-stacks) belong exclusively in the
server domain
– A-stacks and E-stacks are associated at call time
– Each A-stack queue is guarded by a single lock
• The client must bind to an LRPC interface before using it; binding:
– establishes shared segments between client and server
– allocates bookkeeping structures in the kernel
– returns a non-forgeable binding object to the client, which serves as
the key for accessing the server (recall capability systems)
• On multiprocessors, domains are cached on idle processors (to
reduce latency)
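The binding step summarized above can be sketched as follows. The kernel call lrpc_bind() and the interface name are hypothetical; only the behavior described in the bullets (shared segments, kernel bookkeeping, a non-forgeable binding object) is taken from the paper.

/* Binding sketch: obtain a binding object before making LRPC calls. */
typedef unsigned long binding_t;   /* non-forgeable key for accessing the server */

/* Assumed kernel entry point: establishes the shared A-stack segments between
 * client and server, allocates bookkeeping structures in the kernel, and returns
 * the binding object the client must present on every call (recall capabilities). */
binding_t lrpc_bind(const char *interface_name);

int main(void)
{
    binding_t b = lrpc_bind("file_server.read");   /* hypothetical interface name */
    if (b == 0)
        return 1;                                  /* server refused the binding */
    /* b would now be passed to client stubs such as lrpc_add(b, ...). */
    return 0;
}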
Performance
• The measurements below were taken across 100,000 cross-domain calls in
a tight loop
• LRPC/MP uses the domain-caching optimization for multiprocessors
• LRPC performs a context switch on each call
Table IV. LRPC Performance of Four Tests (in microseconds)

Test       Description                                              LRPC/MP   LRPC   Taos
Null       The Null cross-domain call                                   125    157    464
Add        A procedure taking two 4-byte arguments and
           returning one 4-byte argument                                130    164    480
BigIn      A procedure taking one 200-byte argument                     173    192    539
BigInOut   A procedure taking and returning one 200-byte argument       219    227    636
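The measurement methodology is simple enough to sketch: time a tight loop of cross-domain calls and divide the elapsed time by the iteration count. The null_call() stub below stands in for the Null test and is an assumption; the timing source shown is ordinary POSIX, not what the original experiments used.

/* Minimal sketch of per-call latency measurement over a tight loop. */
#include <stdio.h>
#include <time.h>

void null_call(void);                 /* assumed stub for the Null cross-domain call */

#define ITERS 100000                  /* the measurements above use 100,000 calls */

int main(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < ITERS; i++)
        null_call();                  /* one cross-domain call per iteration */

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double elapsed_us = (t1.tv_sec - t0.tv_sec) * 1e6
                      + (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("per-call latency: %.2f microseconds\n", elapsed_us / ITERS);
    return 0;
}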
Discussion Items
• When the client thread is executing an LRPC,
does the scheduler know it has changed
context?
• Who is the parent of the server process?
What is its main thread doing?