6.894: Distributed Operating System Engineering

Lecturers:
Frans Kaashoek
Robert Morris
TA:
Jinyang Li
www.pdos.lcs.mit.edu/6.894
Operating System
• Software that turns silicon into something
useful
– Provides applications with a programming
interface
– Manages hardware resources on behalf of
applications
Distributed Operating System
• The holy grail: transparency
– provide applications with a virtual machine consisting
of many processors distributed around the network.
• Distributed OS engineering is difficult:
– Failures
– High degree of concurrency
– Long latencies
– New classes of security attacks
Client/Server Architecture
• A modular architecture to structure
distributed systems
– Clients request services from servers
– Clients and servers communicate by exchanging messages
– Servers are typically trusted
• Other architectures
– Peer-to-peer (decentralized)
– Single address space
6.894 topics
• Client-server components
– Remote procedure call, threads, address spaces,
etc.
• Storage
– File systems, transactions
• Security
– Confidentiality, authentication, etc.
• Scalable servers
6.894 is an advanced 6.033
• Perform actual systems research
– Perform a research project
– Study recent research papers
• Design systems for real workloads
– New abstractions, protocols, data structures,
algorithms, etc.
• Build a real system (lab)
– Real enough that you can use it
Internet video-on-demand server
• An example to illustrate the issues and give an overview of 6.894
• Requirements:
– Low- and high-quality video
– Many users, spread around the Internet
– Last-mile bandwidth may be low
– Access control
Client and server structure
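Client() {
    fd = connect("server");
    write(fd, "video.mpg");          // request a video by name
    while (!eof(fd)) {
        read(fd, buf);               // receive the next block
        display(buf);
    }
    close(fd);
}
Server() {
    while (1) {
        cfd = accept();              // wait for the next client
        read(cfd, name);             // read the requested file name
        fd = open(name);
        while (!eof(fd)) {           // stream the file block by block
            read(fd, block);
            write(cfd, block);
        }
        close(cfd); close(fd);
    }
}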
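The server's inner copy loop maps directly onto the POSIX `read` and `write` calls. The sketch below assumes both descriptors are already open, and it omits socket setup and the file-name exchange. The helper name `copy_fd` and the 4096-byte block size are illustrative choices, not from the slides. One detail the pseudocode hides is that `write` may accept fewer bytes than requested, so the loop retries the remainder.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* illustrative block size */

/* Copy everything from in_fd to out_fd one block at a time, like the
 * server's inner loop. Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd)
{
    char block[BLOCK_SIZE];
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, block, sizeof block);
        if (n == 0)
            return total;                 /* end of file: eof(fd) */
        if (n < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            return -1;
        }
        /* write() may send only part of the block, so send the rest. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, block + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

In the server, `copy_fd(fd, cfd)` would replace the inner `while (!eof(fd))` loop. A `read` that returns 0 plays the role of `eof(fd)`.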
Performance “analysis”
• Server capacity:
– Network (100 Mbit/s)
– Disk (20 Mbyte/s)
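• Achieved performance: one client stream at a time
• The server is limited by its software structure, not its hardware:
it serves one client to completion before accepting the next
• If a video needs 200 Kbit/s, the hardware should be able to
support hundreds of concurrent clients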
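Here is a back-of-the-envelope check using the slide's numbers. It assumes decimal units, 8 bits per byte, and no protocol or seek overhead. The network carries 100 Mbit/s (100,000 Kbit/s), while the disk delivers 20 MByte/s (160 Mbit/s). At 200 Kbit/s per video, the code computes how many streams each resource can sustain and which one limits the server.

```c
#include <assert.h>

/* Number of streams a resource can sustain (integer division). */
static long streams(long capacity_kbit_s, long stream_kbit_s)
{
    return capacity_kbit_s / stream_kbit_s;
}

/* Upper bound on concurrent 200 Kbit/s streams for the example server,
 * using decimal units (1 Mbit = 1000 Kbit) and 8 bits per byte. */
long max_streams(void)
{
    long net_kbit_s   = 100L * 1000;      /* 100 Mbit/s network */
    long disk_kbit_s  = 20L * 8 * 1000;   /* 20 MByte/s disk = 160 Mbit/s */
    long video_kbit_s = 200;              /* one video stream */

    long by_net  = streams(net_kbit_s, video_kbit_s);    /* 500 */
    long by_disk = streams(disk_kbit_s, video_kbit_s);   /* 800 */
    return by_net < by_disk ? by_net : by_disk;          /* network limits */
}
```

The network is the bottleneck at roughly 500 streams, far above the single stream the simple server achieves.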
Better single-server performance
• Goal: run at the server’s hardware speed
– The disk or network should be the bottleneck
• Method:
– Pipeline blocks of each request
– Multiplex requests from multiple clients
• Two implementation approaches:
– Multithreaded server
– Asynchronous I/O
Multithreaded server
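server() {
    while (1) {
        cfd = accept();          // each thread accepts its own client
        read(cfd, name);
        fd = open(name);
        while (!eof(fd)) {       // blocking I/O: only this thread waits
            read(fd, block);
            write(cfd, block);
        }
        close(cfd); close(fd);
    }
}
for (i = 0; i < 10; i++)
    fork(server);                // start 10 threads running server()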
• When one thread waits for I/O,
the thread scheduler runs
another thread
• All shared data must be
protected by locks
• Release locks before
blocking
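Below is a runnable analogue of the `fork(server)` loop using POSIX threads. Real sockets are replaced by a hypothetical shared queue of request numbers that stands in for `accept()`. The names `run_pool` and `take_request`, and the 64-thread cap, are illustrative. The queue counters and the `served` counter are shared data, so every access holds `queue_lock`. No lock is held during the (potentially blocking) work of serving a request.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for incoming connections: request ids
 * 0..n_requests-1. The three counters are protected by queue_lock. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_request, n_requests, served;

/* Take the next request (like accept()), or return -1 if none remain. */
static int take_request(void)
{
    pthread_mutex_lock(&queue_lock);
    int r = next_request < n_requests ? next_request++ : -1;
    pthread_mutex_unlock(&queue_lock);
    return r;
}

static void *server(void *arg)
{
    (void)arg;
    while (take_request() >= 0) {
        /* Reading the file and writing the socket would block here;
         * no lock is held, so the scheduler can run other threads. */
        pthread_mutex_lock(&queue_lock);
        served++;
        pthread_mutex_unlock(&queue_lock);
    }
    return NULL;
}

/* Start n_threads server threads, wait for all of them, and return the
 * number of requests served. */
int run_pool(int n_threads, int requests)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;
    assert(n_threads > 0 && n_threads <= MAX_THREADS);

    next_request = 0;      /* no threads are running yet */
    n_requests = requests;
    served = 0;
    for (int i = 0; i < n_threads; i++)
        if (pthread_create(&tids[started], NULL, server, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return served;
}
```

Calling `run_pool(10, 100)` starts ten threads, as in the slide. It returns 100 once every request has been taken exactly once.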
Asynchronous I/O
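struct callback {
    bool (*is_ready)(void *arg);   // can the operation proceed without blocking?
    void (*handler)(void *arg);    // run once it can
    void *arg;
};
main() {
    while (1) {
        for (c = each callback) {
            if (c->is_ready(c->arg)) {
                remove c from callback list;   // one-shot: handlers re-register
                c->handler(c->arg);
            }
        }
    }
}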
• Code is structured as a
collection of handlers
• Handlers are nonblocking
• Create new handlers for
blocking operations
• When the operation
completes, its handler is called
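A runnable version of this loop can let the kernel's `poll()` perform the `is_ready` test for all registered descriptors in one call, instead of making one function call per callback. The sketch below assumes one-shot callbacks that wait only for readability. The names `on_readable`, `run_loop`, and the example handler `count_bytes`, along with the 16-entry table, are hypothetical. Ready callbacks are removed from the table before they run, so a handler can re-register itself without disturbing the scan.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_CALLBACKS 16

typedef void (*handler_fn)(int fd, void *arg);

/* A pending one-shot callback: run handler(fd, arg) once fd is readable. */
struct callback {
    int fd;
    handler_fn handler;
    void *arg;
};

static struct callback callbacks[MAX_CALLBACKS];
static int n_callbacks;

void on_readable(int fd, handler_fn handler, void *arg)
{
    assert(n_callbacks < MAX_CALLBACKS);
    callbacks[n_callbacks++] = (struct callback){ fd, handler, arg };
}

/* Run until no callbacks remain. Handlers must not block; to wait for
 * more input, a handler registers a new callback. */
void run_loop(void)
{
    while (n_callbacks > 0) {
        struct pollfd pfds[MAX_CALLBACKS];
        int n = n_callbacks;
        for (int i = 0; i < n; i++)
            pfds[i] = (struct pollfd){ .fd = callbacks[i].fd, .events = POLLIN };

        /* Sleep until at least one descriptor is ready (the is_ready test).
         * This sketch gives up on error; a real loop would retry on EINTR. */
        if (poll(pfds, (nfds_t)n, -1) < 0)
            return;

        /* Move ready callbacks out of the table, then run them. */
        struct callback ready[MAX_CALLBACKS];
        int n_ready = 0, kept = 0;
        for (int i = 0; i < n; i++) {
            if (pfds[i].revents != 0)        /* readable, hung up, or error */
                ready[n_ready++] = callbacks[i];
            else
                callbacks[kept++] = callbacks[i];
        }
        n_callbacks = kept;
        for (int i = 0; i < n_ready; i++)
            ready[i].handler(ready[i].fd, ready[i].arg);
    }
}

/* Example handler: add the bytes read to *(long *)arg and keep waiting
 * for more until end of file. */
void count_bytes(int fd, void *arg)
{
    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0) {
        *(long *)arg += n;
        on_readable(fd, count_bytes, arg);
    }
}
```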
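Asynchronous server
init() {
    on_accept(accept_cb);             // call accept_cb when a client connects
}
accept_cb() {
    cfd = accept();                   // will not block: a client is waiting
    on_readable(cfd, name_cb);
    on_accept(accept_cb);             // callbacks fire once, so re-register
}
on_readable(fd, fn) {
    c = new callback(test_readable, fn, fd);
    add c to callback list;
}
name_cb(cfd) {
    read(cfd, name);
    fd = open(name);
    on_readable(fd, read_cb);         // (cfd, fd) must travel with the callback
}
read_cb(cfd, fd) {
    if (eof(fd)) {                    // done: clean up this connection
        close(cfd); close(fd);
        return;
    }
    read(fd, block);
    on_writeable(cfd, write_cb);      // wait until the client socket can take data
}
write_cb(cfd, fd) {
    write(cfd, block);
    on_readable(fd, read_cb);
}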
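A handler returns before the next block is available, so the asynchronous server cannot keep a connection's descriptors and current block in local variables the way the threaded server does. The sketch below keeps them in an explicitly allocated record and expresses `read_cb` and `write_cb` as states of a small machine. The names `struct conn`, `conn_new`, and `conn_step`, and the state names, are hypothetical. The real event loop is left out; it would call `conn_step` when the relevant descriptor is ready.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

enum conn_state { READ_BLOCK, WRITE_BLOCK, DONE };

/* Per-connection state that a thread keeps on its stack but an
 * event-driven server must allocate and hand to each handler. */
struct conn {
    int cfd;                 /* client socket */
    int fd;                  /* open video file */
    enum conn_state state;
    char block[4096];
    ssize_t len;             /* bytes currently in block */
    ssize_t off;             /* bytes of block already written */
};

struct conn *conn_new(int cfd, int fd)
{
    struct conn *c = calloc(1, sizeof *c);
    if (c) {
        c->cfd = cfd;
        c->fd = fd;
        c->state = READ_BLOCK;
    }
    return c;
}

/* One non-blocking step: called when c->fd is readable (READ_BLOCK)
 * or when c->cfd is writable (WRITE_BLOCK). */
void conn_step(struct conn *c)
{
    switch (c->state) {
    case READ_BLOCK:                              /* read_cb */
        c->len = read(c->fd, c->block, sizeof c->block);
        c->off = 0;
        c->state = c->len > 0 ? WRITE_BLOCK : DONE;
        break;
    case WRITE_BLOCK: {                           /* write_cb */
        ssize_t w = write(c->cfd, c->block + c->off, (size_t)(c->len - c->off));
        if (w < 0) {
            c->state = DONE;
            break;
        }
        c->off += w;
        if (c->off == c->len)
            c->state = READ_BLOCK;                /* block sent: read the next */
        break;
    }
    case DONE:
        break;
    }
}
```

When `state` reaches `DONE`, the event loop would close both descriptors and `free` the record. A partial `write` leaves the state at `WRITE_BLOCK`, so the rest of the block is sent on the next writable event.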
Multithreaded vs. Async
Multithreaded:
• Hard to program
– Locking code
– Need to know what blocks
• Coordination explicit
• State stored on thread’s stack
– Memory allocation implicit
• Context switch may be expensive
• Can use multiprocessors
Asynchronous:
• Hard to program
– Callback code
– Need to know what blocks
• Coordination implicit
• State passed around explicitly
– Memory allocation explicit
• Lightweight context switch
• Uses only one processor
Coordination example
• Threaded server:
– Thread for network
interface
– Interrupt wakes up
network thread
– Shared buffer between
server threads and the
network thread, protected
by locks and condition
variables
• Asynchronous I/O
– Poll for packets
• How often to poll?
– Or, interrupt generates
an event
• Be careful: disable
interrupts when
manipulating callback
queue.
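The threaded side of this example is a bounded producer/consumer buffer. A minimal sketch with POSIX mutexes and condition variables follows. The eight-slot ring, the integer "packets", and the names `buf_put`, `buf_get`, and `network_thread` are hypothetical. `pthread_cond_wait` atomically releases the lock while the caller sleeps, which is how the threaded design avoids holding a lock while blocked.

```c
#include <assert.h>
#include <pthread.h>

#define NSLOTS 8

/* Buffer shared by the network thread (producer) and server threads
 * (consumers). All fields except the sync objects are protected by lock. */
struct pktbuf {
    int slots[NSLOTS];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void buf_init(struct pktbuf *b)
{
    b->head = b->tail = b->count = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->not_empty, NULL);
    pthread_cond_init(&b->not_full, NULL);
}

/* Called by the network thread when a packet arrives. */
void buf_put(struct pktbuf *b, int pkt)
{
    pthread_mutex_lock(&b->lock);
    while (b->count == NSLOTS)               /* full: sleep, releasing lock */
        pthread_cond_wait(&b->not_full, &b->lock);
    b->slots[b->tail] = pkt;
    b->tail = (b->tail + 1) % NSLOTS;
    b->count++;
    pthread_cond_signal(&b->not_empty);      /* wake one waiting server */
    pthread_mutex_unlock(&b->lock);
}

/* Called by a server thread; blocks until a packet is available. */
int buf_get(struct pktbuf *b)
{
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)                    /* empty: sleep, releasing lock */
        pthread_cond_wait(&b->not_empty, &b->lock);
    int pkt = b->slots[b->head];
    b->head = (b->head + 1) % NSLOTS;
    b->count--;
    pthread_cond_signal(&b->not_full);       /* wake the network thread */
    pthread_mutex_unlock(&b->lock);
    return pkt;
}

/* Example network thread: delivers packets 1..100 (hypothetical data). */
static void *network_thread(void *arg)
{
    for (int i = 1; i <= 100; i++)
        buf_put(arg, i);
    return NULL;
}
```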
Scheduling: polling vs. interrupts
• Maintain peak performance under heavy load
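– The interrupt model can lead to livelock: under heavy load the CPU
spends all its time servicing interrupts and makes no progress on requests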
• Solution:
– Use interrupts under low load (good latency)
– Use polling under heavy load (good throughput)
• Polling is typically more efficient than interrupts
– Fits naturally into asynchronous I/O model
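The hybrid policy can be expressed as a small mode controller driven by the length of the receive queue. The sketch below is one possible way to do it, not a specific system's mechanism, and the thresholds are hypothetical. Using two thresholds (hysteresis) keeps the server from switching modes on every small fluctuation in load.

```c
#include <assert.h>

enum rx_mode { INTERRUPTS, POLLING };

/* Hypothetical thresholds, in packets waiting in the receive queue. */
#define HIGH_WATER 64   /* switch to polling at or above this */
#define LOW_WATER  0    /* switch back to interrupts once drained */

/* Choose the receive mode for the next interval from the current mode
 * and the number of packets queued. */
enum rx_mode next_mode(enum rx_mode mode, int queued)
{
    if (mode == INTERRUPTS && queued >= HIGH_WATER)
        return POLLING;      /* heavy load: stop taking an interrupt per packet */
    if (mode == POLLING && queued <= LOW_WATER)
        return INTERRUPTS;   /* light load: interrupts give lower latency */
    return mode;             /* between thresholds: keep the current mode */
}
```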
Other design issues
• Disk scheduling
– Elevator algorithm
• Memory management
– File system buffer cache
• Address spaces (VM management)
– Isolate faults between different servers
• Efficient local communication?
• Efficient transfers between disk and network
– Avoid copies
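The elevator algorithm serves pending disk requests in the order the head sweeps across the disk, rather than in arrival order, which reduces seek distance. The sketch below orders a batch of track numbers, assuming the head is at track `head` and moving toward higher tracks. `elevator_order` is a hypothetical name. This variant reverses at the last pending request instead of at the edge of the disk (the variant often called LOOK).

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Write the requests in[0..n-1] to out[0..n-1] in elevator order: sweep
 * up through every track >= head in increasing order, then reverse and
 * serve the remaining tracks in decreasing order.
 * Returns 0 on success, -1 if memory runs out. */
int elevator_order(const int *in, int *out, size_t n, int head)
{
    if (n == 0)
        return 0;
    int *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        sorted[i] = in[i];
    qsort(sorted, n, sizeof *sorted, cmp_int);

    size_t split = 0;                     /* first track at or above head */
    while (split < n && sorted[split] < head)
        split++;

    size_t k = 0;
    for (size_t i = split; i < n; i++)    /* upward sweep */
        out[k++] = sorted[i];
    for (size_t i = split; i > 0; i--)    /* downward sweep */
        out[k++] = sorted[i - 1];

    free(sorted);
    return 0;
}
```

With the head at track 53 and requests for tracks 98, 183, 37, 122, 14, 124, 65, 67, the service order is 65, 67, 98, 122, 124, 183, 37, 14.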
More than one processor
• Problem: a single machine may not scale to enough
clients
• Solutions:
– Multiprocessors
• Helps when CPU is bottleneck
– Server clusters
• Helps when bandwidth between server and backbone is high
– Distributed server clusters
• Helps when bandwidth between client and distant server is
low
Clusters
• Naming transparency
– Server cluster transparent to client?
• Server selection
– Metrics: CPU load, presence of data
• Consistency
– Partition data
• Availability
– More processors can decrease reliability
– Replicate data (makes consistency more difficult)
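Server selection using the two listed metrics can be sketched as follows: prefer a server that already holds the requested data and, among equals, pick the one with the lowest CPU load. The `struct server` fields, the 0.0 to 1.0 load scale, and the rule that data location outranks load are assumptions for illustration. A real policy would likely weigh the two metrics against each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct server {
    double cpu_load;   /* hypothetical: fraction of CPU in use, 0.0 to 1.0 */
    bool has_video;    /* does this server already store the video? */
};

/* Return the index of the chosen server, or -1 if n == 0. */
int select_server(const struct server *s, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0) {
            best = (int)i;
            continue;
        }
        const struct server *b = &s[best];
        if (s[i].has_video != b->has_video) {
            if (s[i].has_video)          /* holding the data wins */
                best = (int)i;
        } else if (s[i].cpu_load < b->cpu_load) {
            best = (int)i;               /* otherwise, lower load wins */
        }
    }
    return best;
}
```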
Distributed clusters
• Replication policies
• Data distribution
• Consistency
• Network monitoring and modeling
• Global load balancing
– Tradeoff between accuracy, latency, and
network load
Making it secure: access control
• Design security in from the start; don’t add it on
– Firewalls: insecure and break many things
• CPU cycles are an issue
– A secure HTTP server can handle only about 10-20
connections a second
• Pulls in other global issues
– Name to key binding
– Key management infrastructure
Example summary
• Pipelining of disk and network requests
– Need a lot of sophisticated software
infrastructure
• Replication for reliability and performance
– Need sophisticated protocols
• Difficult: We did it for one application
– What if data changes rapidly?
– Lack of abstractions!
6.894 lab: real systems
• Multi-finger (due next week)
– Asynchronous I/O
• HTTP proxy
– High-performance proxy
– Cache, consistency, etc.
• Open-ended file system project
– Research