Introduction - University of Pennsylvania
Distributed Computing:
Servers
Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
January 23, 2015
© 2013 A. Haeberlen, Z. Ives
Today
• Review and further discussion about distributed systems issues
• Scale
• Availability
• Consistency
• Interoperability
• Location / Discovery
© 2004-15 A. Haeberlen, Z. Ives
Next
• Brief discussion of the Butler Lampson paper
• Server architecture (internal)
• If time: Web (HTTP) servers
• Read:
• “HTTP Made Really Easy” link on the Schedule page
• Krishnamurthy / Rexford Ch 4 (see Web page)
Some Context
• To this point, you’ve probably had significant experience designing programs
to solve specific, relatively small tasks
• It’s often a very difficult job to build a system
• (What is a computing system?)
• (Why is it harder to build?)
• We will consider in this course:
• Architectural aspects [Butler Lampson article]
• Algorithmic aspects [e.g., two-phase commit]
• Engineering aspects [e.g., build management]
Butler Lampson
(Abbreviated Biography from His Page)
• Butler Lampson is an Architect at Microsoft Corporation and an
Adjunct Professor of Computer Science and Electrical Engineering at
MIT.
• He was one of the designers of the SDS 940 time-sharing system,
the Alto personal distributed computing system, the Xerox 9700
laser printer, two-phase commit protocols, ...
• He received the ACM’s Software Systems Award in 1984 for his work
on the Alto, the IEEE Computer Pioneer award in 1996, and the
Turing Award in 1992.
Historical Note: Xerox Alto
1972-78
• Personal computer for
research
• The first GUI-based computer
(note the mouse!)
• 128KB RAM, 2.5MB hard disk
• Ethernet
In many ways, the forerunner
to the Xerox Star
• … Which begat the Apple Lisa,
and the rest is history!
Lampson’s Advice
Designing Servers: Systems for Handling Many Client Requests
Major issues:
• Concurrency
• How do we handle multiple simultaneous requests?
• Statefulness and sessions
• Are requests self-contained, or do they require the server to keep around state?
• Communication and consistency
• What state is shared across requests?
• Do all requests need the same view?
• … And, of course, security!!!
(Note that servers today are typically replicated)
Toy Example
• Suppose we want to build an “arithmetic” server
• Takes a request for a computation
• Parses the computation request
• Performs the computation
• Generates an HTML document with the result
• Returns the result to the requestor
• Suppose we can build on TCP…
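The per-request steps above might be sketched as follows, leaving the TCP part aside. This is only an illustration: the class name ArithmeticHandler, the method names, and the request format ("<number> <op> <number>") are assumptions, not part of the course material.

```java
// Hypothetical sketch of the toy "arithmetic" server's per-request logic.
// The request format ("<number> <op> <number>") is an assumption.
class ArithmeticHandler {

    // Parses a request such as "3 + 4" and performs the computation.
    static double compute(String request) {
        String[] parts = request.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected: <number> <op> <number>");
        }
        double a = Double.parseDouble(parts[0]);
        double b = Double.parseDouble(parts[2]);
        switch (parts[1]) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            case "/": return a / b;
            default:
                throw new IllegalArgumentException("unknown operator: " + parts[1]);
        }
    }

    // Generates an HTML document with the result.
    // Assumes the request was already validated by compute(), so it holds no markup.
    static String toHtml(String request, double result) {
        return "<html><body><p>" + request + " = " + result + "</p></body></html>";
    }

    public static void main(String[] args) {
        String request = "3 + 4";
        System.out.println(toHtml(request, compute(request)));
    }
}
```

A real server would read the request from a socket and write the HTML back to the requestor; those steps are omitted here.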
Concurrency
One approach: a separate server for each request
• Obviously this doesn’t work
Alternative: context-switching using shared resources
One, or a few, CPUs/disks/etc., multiplexing across jobs
• Threads and processes
• Events
• Cooperative scheduling
• Thread pools
Review: Threads and Processes
• Threads/processes are each written as if they are sequential programs
• But threads may also yield or wait on condition variables
• Preemptive switching, based on time slicing according to quanta (usu. 10-100 msec)
• States of threads: ready, running, and blocked
• Different levels of sharing and overhead between the two
Example with Threads
• “Arithmetic” server divided into several components
• Daemon thread:
• Takes a request for a computation
• Parses the computation request
• Handler thread invoked on the results:
• Performs the computation
• Generates an HTML document with the result
• Returns the result to the requestor
Necessary Java Constructs
• Each handler may subclass Thread
• Implement the run() method
• Invoke via Handler h = new Handler(); h.start()
• Or the handler may implement Runnable
• Implement the run() method
• Invoke via Thread t = new Thread(myHandler); t.start()
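Both styles can be put side by side in a minimal sketch. Handler and MyHandler follow the names on the slide; the enclosing HandlerDemo class and the shared counter are assumptions added so the example can run on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

class HandlerDemo {
    // Counts handled requests; a stand-in for real request processing.
    static final AtomicInteger handled = new AtomicInteger();

    // Style 1: the handler subclasses Thread and implements run().
    static class Handler extends Thread {
        @Override
        public void run() {
            handled.incrementAndGet();
        }
    }

    // Style 2: the handler implements Runnable and is wrapped in a Thread.
    static class MyHandler implements Runnable {
        @Override
        public void run() {
            handled.incrementAndGet();
        }
    }

    // Starts one handler of each style and waits for both to finish.
    static int runBoth() {
        int before = handled.get();
        Handler h = new Handler();
        h.start(); // start(), not run(): run() would execute in the caller's thread
        MyHandler myHandler = new MyHandler();
        Thread t = new Thread(myHandler);
        t.start();
        try {
            h.join();
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("handled " + runBoth() + " requests");
    }
}
```

Implementing Runnable is often the more flexible choice, since the handler class remains free to extend something else.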
Shared Resources
• Suppose we share a resource such as an output logfile across threads
• How do we ensure that each thread’s modifications to the file are compatible
(e.g., log interleaves one status msg at a time)?
• For shared resources, use synchronized to gain a monitor on an
object to be “locked”
• synchronized methods lock the entire object
• synchronized (obj) { … } blocks lock their argument
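A small sketch of both forms, using an in-memory list as a stand-in for the output logfile. The SharedLog class and its method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class SharedLog {
    // Stand-in for the shared logfile; ArrayList itself is not thread-safe.
    private final List<String> lines = new ArrayList<>();

    // synchronized method: acquires the monitor of this SharedLog object.
    public synchronized void log(String msg) {
        lines.add(msg);
    }

    public int size() {
        // synchronized block: acquires the monitor of its argument (here, this).
        synchronized (this) {
            return lines.size();
        }
    }

    // Starts several threads that write to one log, waits for them,
    // and returns the number of messages recorded.
    static int demo(int threads, int messagesPerThread) {
        SharedLog log = new SharedLog();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                for (int j = 0; j < messagesPerThread; j++) {
                    log.log("thread " + id + " msg " + j);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return log.size();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000) + " messages logged");
    }
}
```

Without the synchronized keyword on log(), concurrent add() calls could corrupt the list or lose messages.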
Issues with Threads and Shared Resources
• Deadlock: nothing happens because locks are held in a way that all threads
are waiting on other threads
• Livelock: system grinds to a halt because each thread is responding to
requests from the other threads, but not making progress
• Starvation: a thread never gets scheduled
Event Handlers
• Basically, a programmer-specified way of breaking up tasks
• You’ve probably seen it if you’ve done any sort of GUI programming
• But it’s also used to multitask
• Based on an event queue and a notion of an event handler loop
• Each task is broken into a series of events
• Each event has a handler that does some work and potentially enqueues another event
• “Local state” is generally kept in the event
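The event queue and handler loop might look roughly like this for the arithmetic example. This is a sketch under stated assumptions: the event types (PARSE, COMPUTE, RESPOND), the EventLoopDemo class, and the "a+b" request format are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class EventLoopDemo {
    // An event carries its own "local state": the request and any partial result.
    static class Event {
        final String type;    // "PARSE", "COMPUTE", or "RESPOND" (hypothetical names)
        final String request;
        final int result;

        Event(String type, String request, int result) {
            this.type = type;
            this.request = request;
            this.result = result;
        }
    }

    // Processes requests of the assumed form "a+b" and returns the responses.
    static List<String> run(List<String> requests) {
        Queue<Event> queue = new ArrayDeque<>();
        List<String> responses = new ArrayList<>();
        for (String r : requests) {
            queue.add(new Event("PARSE", r, 0));
        }
        // The event handler loop: dequeue one event, run its handler to
        // completion, and let the handler enqueue the follow-up event.
        while (!queue.isEmpty()) {
            Event e = queue.remove();
            switch (e.type) {
                case "PARSE":
                    queue.add(new Event("COMPUTE", e.request.trim(), 0));
                    break;
                case "COMPUTE": {
                    String[] p = e.request.split("\\+");
                    int sum = Integer.parseInt(p[0].trim()) + Integer.parseInt(p[1].trim());
                    queue.add(new Event("RESPOND", e.request, sum));
                    break;
                }
                case "RESPOND":
                    responses.add(e.request + " = " + e.result);
                    break;
                default:
                    throw new IllegalStateException("unknown event type: " + e.type);
            }
        }
        return responses;
    }

    public static void main(String[] args) {
        List<String> requests = new ArrayList<>();
        requests.add("1+2");
        requests.add("10+20");
        System.out.println(run(requests));
    }
}
```

Because a single loop runs every handler to completion, the steps of different requests interleave only at event boundaries.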
Shared Resources in Event Handlers
• Generally don’t need true synchronized blocks or the equivalent here
• We control when each event handler gives up a resource, hence we control interleaving
of requests and their modification to shared resource
• But still may need to maintain flags or other information for situations when a
resource is used across events
Example with Events
Thread Pools
• Very commonly used (e.g., in many Apache products including some versions
of the Web server)
• Fixed number of threads – say 100 or 200
• As requests come in, they’re put onto a queue
• Handler threads dequeue items and process them
A Key Aspect of the Thread Pool: the Queue
• The daemon thread doesn’t spawn threads: instead, it enqueues requests
• The handler threads dequeue and handle requests
• What to do when the system is not fully saturated, i.e., some threads in the
pool are idle?
• object.wait(), notify(), notifyAll()
• MUST call these while holding the object’s monitor, i.e., inside a method or block that is synchronized on that object!
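One way the queue and the idle-thread handling could fit together is sketched below. TinyThreadPool and its method names are assumptions; this is a teaching sketch, not how Apache or java.util.concurrent implement their pools.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

class TinyThreadPool {
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private final List<Thread> workers = new ArrayList<>();
    private boolean shutdown = false; // guarded by the queue's monitor

    TinyThreadPool(int nThreads) {
        for (int i = 0; i < nThreads; i++) {
            Thread t = new Thread(this::workerLoop);
            workers.add(t);
            t.start();
        }
    }

    // Called by the daemon thread: enqueue the request and wake one idle handler.
    void submit(Runnable task) {
        synchronized (queue) {
            if (shutdown) {
                throw new IllegalStateException("pool is shut down");
            }
            queue.add(task);
            queue.notify();
        }
    }

    private void workerLoop() {
        while (true) {
            Runnable task;
            synchronized (queue) {
                // wait() is called while holding the monitor, in a loop that
                // re-checks the condition after every wake-up.
                while (queue.isEmpty() && !shutdown) {
                    try {
                        queue.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (queue.isEmpty()) {
                    return; // shut down and fully drained
                }
                task = queue.remove();
            }
            task.run(); // run outside the lock so other handlers can dequeue
        }
    }

    // Lets the handlers finish all queued tasks, then waits for them to exit.
    void shutdownAndWait() {
        synchronized (queue) {
            shutdown = true;
            queue.notifyAll();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Runs nTasks trivial tasks on nThreads handlers; returns how many completed.
    static int demo(int nThreads, int nTasks) {
        AtomicInteger done = new AtomicInteger();
        TinyThreadPool pool = new TinyThreadPool(nThreads);
        for (int i = 0; i < nTasks; i++) {
            pool.submit(done::incrementAndGet);
        }
        pool.shutdownAndWait();
        return done.get();
    }
}
```

Idle handlers block in wait() instead of spinning, and notify() wakes one of them for each enqueued request.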
Example with Thread Pools
Other Ideas
• Cooperative scheduling
• “Non-preemptive multitasking”: threads execute for a while, save state, and explicitly
yield
• Examples of where used: old Mac OS, Windows 2.x
• Why is it bad?
Concurrency and Debugging
• A critical issue: how do we debug concurrent apps?
• Consider:
• Threads – pros and cons
• Events – pros and cons
• There’s no free lunch!
• What are some tricks?
Statefulness and Sessions
Very early HTTP
• Essentially stateless
• Make a request; the response is a page that is named by the URL
More recent HTTP, and other protocols:
• Some amount of state is maintained
• In HTTP, this requires cookies (more later)
• In many other protocols, the connection is kept open and all state is preserved
on both ends
Pros and cons of statefulness?
• (Does this look at all like the threads vs. events discussion?)
Communication and Consistency
• A key question: how much interaction is there among server processes /
requests?
• Let’s consider:
• Amazon.com
• eBay
• Blogger.com
• iTunes
• Google
Shared, Persistent State
• Generally a database back-end
• Recovery and reliability features
• Transaction support
• Simple query interface
• Often the database is on a different server from the executing code
• This is what Enterprise JavaBeans are designed to support: distributed transactions
• “Model view controller” pattern is the most common
[Figure: model-view-controller diagram; labels: AJAX game, Client-side JScript, Controller, View, XML view, Model, Database]
Web (HTTP) Servers
• Processes HTTP requests, generally over TCP port 80
• Response goes back over the same TCP connection, to the client’s own (ephemeral) port
• May involve:
• Returning a document, with its (MIME) type info
• e.g., HTML document, TXT document
• Invoking a program or module, returning its output
• Submitting form data to a program or module, returning its output
• Resources are described using URLs
[Figure: HTTP request → Port 80 → Processing → Response → Other port]
The URL
URL: Uniform Resource Locator
• A way of encoding protocol, login, DNS (or IP) address,
path info in one string
• Special case of Uniform Resource Identifier (URI)
URL is a URI for a location from which something can be retrieved
URN is a URI for a name
General syntax:
• {partition/protocol}://{userid}:{password}@{domain:port}/{path}
• http://me:[email protected]/index.html
• news://nntp.upenn.edu
• imap://email:[email protected]/folder1
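The pieces of the general syntax can be pulled apart with the standard java.net.URI class, as in this short sketch. The UrlParts class and the example URL in main are made up for illustration.

```java
import java.net.URI;

class UrlParts {
    // Returns a one-line summary of the components encoded in a URL string.
    static String describe(String url) {
        URI u = URI.create(url);
        return "scheme=" + u.getScheme()
                + " userInfo=" + u.getUserInfo()
                + " host=" + u.getHost()
                + " port=" + u.getPort()   // -1 when the URL names no port
                + " path=" + u.getPath();
    }

    public static void main(String[] args) {
        // Hypothetical URL; not one of the examples from the slide.
        System.out.println(describe("http://user:secret@www.example.com:8080/index.html"));
    }
}
```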
Handling a Web (HTTP) Request
1. Read and parse the request message
• Most commonly, GET the contents of a URL
2. Translate the URL
• Extract the “path” that is being requested
• Determine if this is:
• A “virtual directory” that’s an alias for something else
• A reference to a file (HTML or SSI)
• A reference to a script or servlet
3. Verify authorization / access rights
4. Generate the response (may be an error code)
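The first two steps might be sketched as below for the plain-file case. RequestTranslator, its method names, and the document root are assumptions; virtual directories, scripts, query strings, and percent-decoding are all left out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RequestTranslator {
    // Step 1: split the request line "METHOD path HTTP/version" into three parts.
    static String[] parseRequestLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        return parts;
    }

    // Step 2: map the requested path onto a file under docRoot.
    // Returns null if the path would escape docRoot (e.g. via "..").
    static Path translate(String docRoot, String urlPath) {
        Path root = Paths.get(docRoot).toAbsolutePath().normalize();
        String relative = urlPath.startsWith("/") ? urlPath.substring(1) : urlPath;
        Path target = root.resolve(relative).normalize();
        return target.startsWith(root) ? target : null;
    }

    public static void main(String[] args) {
        String[] parts = parseRequestLine("GET /~cis455/index.html HTTP/1.1");
        // "/var/www" is a hypothetical document root.
        System.out.println(parts[0] + " -> " + translate("/var/www", parts[1]));
    }
}
```

Rejecting paths that normalize to somewhere outside the document root is a small part of step 3; a real server would also check access rights for the resolved resource.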
HTTP: HyperText Transfer Protocol
A very simple, stateless protocol for sessionless exchanges
• Browser creates a new connection each time it wants to make a new request
(for a page, image, etc.)
• What are the benefits of this model? Drawbacks?
Exceptions:
• HTTP 1.1 added optional support for persistent connections and pipelining
• Clients + servers might keep state information
• Cookies provide a way of recording state
HTTP Overview
Requests:
• A small number of request types (GET, POST, PUT, DELETE)
• Request may contain additional information, e.g. client info, parameters for forms, etc.
Responses:
• Response codes: 200 (OK), 404 (not found), etc.
• Metadata: content’s MIME type, length, etc.
• The “payload” or data
A Simple HTTP Request
GET /~cis455/index.html HTTP/1.1
If-Modified-Since: Sun, 7 Jan 2007 11:12:23 GMT
Referer: http://www.cis.upenn.edu/index.html
• Requests data at a path using HTTP 1.1 protocol
• Example response:
HTTP/1.1 200 OK
Date: Sun, 7 Jan 2007 11:12:26 GMT
Last-Modified: Wed, 14 Jan 2004 8:30:00 GMT
Content-Type: text/html
Content-Length: 3931
…
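A response like the one above could be assembled as in the following sketch. The HttpResponseBuilder class is hypothetical, and only the status line plus two headers are produced; Date and Last-Modified are omitted.

```java
import java.nio.charset.StandardCharsets;

class HttpResponseBuilder {
    // Builds the bytes of a "200 OK" response carrying an HTML body.
    static byte[] ok(String html) {
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        // Header lines end in CRLF; an empty line separates headers from the body.
        String head = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html; charset=utf-8\r\n"
                + "Content-Length: " + body.length + "\r\n" // length in bytes, not chars
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, out, 0, headBytes.length);
        System.arraycopy(body, 0, out, headBytes.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        System.out.print(new String(ok("<p>hi</p>"), StandardCharsets.UTF_8));
    }
}
```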
Request Types
GET
Retrieve the resource at a URL
PUT
Publish the specified data at a URL
DELETE
(Self-explanatory; not always supported)
POST
Submit form content
Forms: Returning Data to the Server
• HTML forms allow assignments of values to variables
• Two means of submitting forms to apps:
• GET-style – within the URL:
GET /home/my.cgi?param=val&param2=val2
• POST-style – as the data:
POST /home/second.cgi
Content-Length: 34
searchKey Penn
where www.google.com
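On the server side, URL-encoded form data of the kind used in GET-style submissions (name=value pairs joined by "&") can be decoded roughly as follows. FormData is a hypothetical helper; repeated parameter names simply overwrite one another in this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormData {
    // Parses "name=value&name2=value2" into a map, decoding %XX and '+'.
    static Map<String, String> parse(String encoded) {
        Map<String, String> params = new LinkedHashMap<>();
        if (encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("param=val&param2=val2"));
    }
}
```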
Authentication and Authorization
• Authentication
• At minimum, user ID and password – authenticates requestor
• Client may wish to authenticate the server, too!
• SSL (we’ll discuss this more later)
• Part of SSL: certificate from a trusted authority, validating the machine
• Also: public key for encrypting client’s transmissions
• Authorization
• Determine what user can access
• For files, applications: typically, access control list
• If data from database, may also have view-based security
• We’ll talk about these in more detail later in the semester
Programming Support in Web Servers
• Several means of supporting custom code:
• CGI – Common Gateway Interface – the oldest:
• A CGI is a separate program, often in Perl, invoked by the server
• Certain info is passed from server to CGI via Unix-style environment variables
• QUERY_STRING, REMOTE_HOST, CONTENT_TYPE, …
• HTTP post data is read from stdin
• Interface to persistent process:
• In essence, how communication with a database is done – Oracle or MySQL is running “on the side”
• Communicate via pipes, APIs like ODBC/JDBC, etc.
• Server module running in the same process
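To make the CGI option concrete, here is a rough sketch of a CGI-style program in Java: it reads QUERY_STRING and CONTENT_LENGTH (both standard CGI variables) from the environment, reads the POST data from stdin, and writes a header block plus body to stdout. The CgiSketch class and the echo-style output are assumptions; handle() takes the environment and input stream as parameters so it can be exercised without a web server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class CgiSketch {
    // Produces the CGI output (headers, blank line, body) for one request.
    static String handle(Map<String, String> env, InputStream stdin) {
        String query = env.getOrDefault("QUERY_STRING", "");
        String lengthVar = env.get("CONTENT_LENGTH");
        int length = (lengthVar == null || lengthVar.isEmpty())
                ? 0 : Integer.parseInt(lengthVar);

        // POST data arrives on stdin; read exactly CONTENT_LENGTH bytes.
        byte[] body = new byte[length];
        int off = 0;
        try {
            while (off < length) {
                int n = stdin.read(body, off, length - off);
                if (n < 0) {
                    break; // input ended early
                }
                off += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String post = new String(body, 0, off, StandardCharsets.UTF_8);

        return "Content-Type: text/plain\r\n\r\n"
                + "query=" + query + "\n"
                + "body=" + post + "\n";
    }

    public static void main(String[] args) {
        // When invoked by a web server, the real environment and stdin are used.
        System.out.print(handle(System.getenv(), System.in));
    }
}
```

Starting a fresh process (and, for Java, a fresh JVM) per request is the main cost of CGI, which is one motivation for the persistent-process and in-process options listed above.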
Two Main Types of Server Modules
• Interpreters:
• Old JavaScript/JScript, PHP, ASP, …
• Often a full-fledged programming language
• Code is generally embedded within HTML, not stand-alone
• Custom runtimes/virtual machines/JIT compilers:
• Most modern Perl runtimes; Java servlets; ASP.NET; Node.js
• A virtual machine runs within the web server process
• Functions are invoked within that JVM to handle each request
• Code is generally written as usual, but may need to use HTML to create UI
rather than standard GUI APIs
• Most of these provide (at least limited) protection mechanisms
Interfacing with a Database
A very common operation:
• Read some data from a database, output in a web form
• e.g., postings on Slashdot, items for a product catalog, etc.
Three problems, abstracted away by ODBC/ADO/JDBC:
• Impedance mismatch from relational DBs to objects in Java (etc.)
• Standard API for different databases
• Physical implementation for each DB
(Cross-)Session State: Cookies
• Major problem with sessionless nature of HTTP: how do we keep info
between connections?
• Cookie: an opaque string associated with a web site, stored at the browser
• Created in an HTTP response with “Set-Cookie: xxx”
• Passed in an HTTP request header as “Cookie: xxx”
• Interpretation is up to the application
• Usually, name-value pairs; passed in HTTP header:
• Cookie: user=“Joe”; pwd=“blob” …
• Often have an expiration
• Very common: “session cookies”
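A server reading the Cookie request header might split it into name-value pairs along these lines. CookieParser is a hypothetical helper, and the sketch assumes pairs are separated by semicolons, which is how browsers send them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CookieParser {
    // Parses the value of a "Cookie:" header, e.g. user="Joe"; pwd="blob".
    static Map<String, String> parse(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : headerValue.split(";")) {
            String p = pair.trim();
            int eq = p.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or nameless entries
            }
            String name = p.substring(0, eq).trim();
            String value = p.substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            cookies.put(name, value);
        }
        return cookies;
    }

    public static void main(String[] args) {
        System.out.println(parse("user=\"Joe\"; pwd=\"blob\""));
    }
}
```

The server treats the values as opaque strings; what "user" or "pwd" means is entirely up to the application.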
Common Web Server Architectures
• How do we handle many concurrent requests?
• Approach 1 – use what the OS provides:
• Fork a separate process for each request
• Or spawn a separate thread
• Approach 2 – write your own task switcher
• Break every response into small steps
• Schedule with custom event-driven dispatcher
• Approach 3 – pool of handlers:
• Create a thread pool that switches among requests or steps
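Approaches 1 and 3 can be compared in a few lines using the standard java.util.concurrent executors. This is only a sketch: DispatchStrategies and its members are made-up names, the "request" is a trivial counter increment, and approach 2 (a custom event-driven dispatcher) is not shown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class DispatchStrategies {
    // Approach 1: spawn a separate thread for each request.
    static final Executor THREAD_PER_REQUEST = task -> new Thread(task).start();

    // Approach 3: a fixed pool of handler threads fed from an internal queue.
    static ExecutorService newPool(int nThreads) {
        return Executors.newFixedThreadPool(nThreads);
    }

    // Dispatches the given number of trivial requests and waits for all of them.
    static int handleAll(Executor executor, int requests) {
        AtomicInteger handled = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            executor.execute(() -> {
                handled.incrementAndGet(); // stand-in for real request handling
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(handleAll(THREAD_PER_REQUEST, 20) + " handled, thread per request");
        ExecutorService pool = newPool(4);
        System.out.println(handleAll(pool, 20) + " handled, pool of 4");
        pool.shutdown();
    }
}
```

The dispatching code is identical in both cases; only the executor changes, which bounds the number of live threads in the pooled version.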
Content Management Systems
• Generally, a “middleware” that runs under the web server (or provides its own)
• Provides content integration from multiple sources
• Perhaps SQL or XML databases
• Perhaps text files, RSS feeds, etc.
• Often provides content authoring & assembly tools
• Typically, provides templates or other similar features for describing how to assemble
the site
• Common examples:
• MS Content Management Server; Slash; Apache Cocoon
Ways of Handling Many Requests
• Web server “listens” on port 80 – “daemon” task
• Upon a request, it needs to invoke a response
• How should that response task get executed?
Readings
• Please read for further depth:
• “HTTP Made Really Easy”
• Rexford/Krishnamurthy chapter on HTTP servers
• You will need to learn:
• Enough about HTTP to handle GET, POST, cookies, etc.
• Enough about Java threads to write your own thread pools for a Web server
• Enough about servlets to run them (including sessions)