csci5211: Computer Networks and Data Communications


Internet Engineering Course
Web Servers
Introduction

A company needs to provide various web services
◦ Hosting intranet applications
◦ Company web site
◦ Various Internet applications

It therefore needs to run an HTTP server
◦ First we look at what the HTTP protocol is
◦ Then we discuss Web servers, and Apache as the leading Web server application
The World Wide Web (WWW)



Global hypertext system
Initially developed in 1989
◦ By Tim Berners-Lee at the European Laboratory for Particle Physics (CERN) in Switzerland.
◦ To give geographically dispersed groups of scientists an easy way to share and edit research documents.
In 1993, started to grow rapidly
◦ Mainly due to the NCSA developing a Web browser called Mosaic (an X Window-based application)
  – First widely used graphical interface to the Web
  – More convenient browsing
  – A flexible way for people to navigate worldwide resources on the Internet and retrieve them
Web Browsers
• Provides access to a Web server
• Basic components
◦ HTML interpreter
◦ HTTP client used to retrieve HTML pages
• Some also support
◦ FTP, NNTP, POP, SMTP, …
Web Servers

Definitions
◦ A computer, responsible for accepting HTTP
requests from clients, and serving them Web
pages.
◦ A computer program that provides the above
mentioned functionality.

Common features
◦ Accepting HTTP requests from the network
◦ Providing an HTTP response to the requester
  – Typically consists of an HTML document
◦ Usually capable of logging
  – Client requests/Server responses
Web Servers cont.

Returned content
◦ Static
 Comes from an existing file
◦ Dynamic
 Dynamically generated by some other
program/script called by the Web server.

Path translation
◦ Translate the path component of a URL into a local file system resource
  – The path specified by the client is relative to the server’s root directory
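
Path translation can be sketched in a few lines. This is only an illustration, not how any particular server does it: the document root /var/www/html and the function name translate_path are assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

DOC_ROOT = "/var/www/html"  # assumed document root, not taken from the slides


def translate_path(url_path, doc_root=DOC_ROOT):
    """Map the path component of a URL to a path under doc_root."""
    # Drop any query string or fragment, then decode %xx escapes.
    path = unquote(urlsplit(url_path).path)
    parts = []
    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            # Never climb above the document root.
            if parts:
                parts.pop()
            continue
        parts.append(part)
    return str(PurePosixPath(doc_root, *parts))


print(translate_path("/~ladani/index.htm"))
```

Ignoring ".." segments that would leave the root keeps a request such as /../../etc/passwd inside the document tree.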
Basic Client/Server Architecture in
WWW

Overall organization of the Web.
• Basic operation is to fetch documents
– Client issues requests, browser displays document
– Server responsible for retrieving document from local file system
• Client/server communications based on HTTP protocol
Dynamic Content
Parts of documents may be specified via scripts/programs
• Client-side (executed on the client machine, e.g., within the browser)
◦ Client-side script: script embedded in the HTML document
◦ Applet: precompiled program passed to the client
• Server-side (executed on the server machine)
◦ Server-side script embedded in the document
◦ Servlet: precompiled program executed within the server’s address space
◦ CGI scripts
Common Gateway Interface (CGI)

The principle of using server-side CGI programs.
• Allows documents to be generated dynamically “on-the-fly”
• Provides a standard way for a Web server to execute a program using user-provided data as input
• To the server, a CGI program appears as the program responsible for fetching the requested document
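
A CGI program can be sketched as follows. The server passes request data through environment variables (QUERY_STRING holds the part of the URL after "?"), and whatever the program writes to standard output becomes the response. The parameter called name and the greeting are hypothetical, chosen only for this example.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def render(environ):
    """Build the CGI output (header, blank line, body) from the environment."""
    # User-provided data arrives in QUERY_STRING, e.g. "name=Ali".
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]  # 'name' is a hypothetical parameter
    # Escape the value so user input cannot inject HTML into the page.
    body = "<html><body><h1>Hello, %s!</h1></body></html>\n" % escape(name)
    # A CGI program emits at least a Content-Type header, then a blank line.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(render(os.environ))
```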
Architectural Overview

Architectural details of a client and server in the Web.
• Document fetch (and possibly server-side script): 2b-3b
• Execute CGI Script (separate process): 2c-3c-4c
• Execute servlet program (run within server): 2a-3a-4a
http protocol




Defines the communication between a Web server and a client
Used to deliver virtually all files and other data (collectively called resources) on the World Wide Web
A browser is an HTTP client because it sends requests to an HTTP server (Web server)
The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.
Structure of http transactions
• Request/response, text-based protocol
• Format of an HTTP message:

<initial line, different for request vs. response>
Header1: value1
Header2: value2
Header3: value3
<optional message body goes here, like file contents
or query data; it can be many lines long, or even
binary data >
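
The message layout above can be taken apart with a few string operations. A minimal sketch, assuming the whole message is already in memory and that lines end in CRLF; parse_message is a name chosen for this example.

```python
def parse_message(raw):
    """Split a raw HTTP message into (initial line, headers, body)."""
    # A blank line (CRLF CRLF) separates the header block from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body


raw = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
print(parse_message(raw))
```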
The Format of a Request
method sp URL sp version cr lf          (request line)
header : value cr lf                    (header lines)
  ...
header : value cr lf
cr lf
Entity Body
Request Example
GET /index.html HTTP/1.1 [CRLF]
Accept: image/gif, image/jpeg [CRLF]
User-Agent: Mozilla/4.0 [CRLF]
Host: www.ui.ac.ir:80 [CRLF]
Connection: Keep-Alive [CRLF]
[CRLF]
Request Example
GET /index.html HTTP/1.1          (method, request URL, version)
Accept: image/gif, image/jpeg     (headers)
User-Agent: Mozilla/4.0
Host: www.ui.ac.ir:80
Connection: Keep-Alive
[blank line here]
The Format of a Response
version sp status code sp phrase cr lf  (status line)
header : value cr lf                    (header lines)
  ...
header : value cr lf
cr lf
Entity Body
Response Example
HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354
[blank line here]
<html>
<body>
<h1>Hello World</h1>
(more file contents) . . .
</body>
</html>
Response Example
HTTP/1.0 200 OK                        (version, status code, reason phrase)
Date: Fri, 31 Dec 1999 23:59:59 GMT    (headers)
Content-Type: text/html
Content-Length: 1354
[blank line here]
<html>                                 (message body)
<body>
<h1>Hello World</h1>
(more file contents) . . .
</body>
</html>
Initial line




A typical initial request line:
◦ GET /path/to/file/index.html HTTP/1.0
Initial response line:
◦ HTTP/1.0 200 OK
◦ HTTP/1.0 404 Not Found
Status code:
◦ 1xx indicates an informational message only
◦ 2xx indicates success of some kind
◦ 3xx redirects the client to another URL
◦ 4xx indicates an error on the client's part
◦ 5xx indicates an error on the server's part
Common status codes:
◦ 200 OK
◦ 404 Not Found
◦ 301 Moved Permanently
◦ 302 Moved Temporarily
◦ 303 See Other (HTTP 1.1 only)
◦ 500 Internal Server Error
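
Because the first digit of the status code gives its class, a client can react to a response without knowing every individual code. A small sketch; the function names and the wording of the class labels are chosen for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split an initial response line into (version, code, reason phrase)."""
    # The reason phrase may contain spaces, so split at most twice.
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase


def status_class(code):
    """Return the class of a status code based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


version, code, phrase = parse_status_line("HTTP/1.0 404 Not Found")
print(code, phrase, "->", status_class(code))
```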
Header lines

Typical request headers:
◦ From: email address of the requester
◦ User-Agent: for example User-Agent: Mozilla/3.0Gold

Typical response headers:
◦ Server: for example Server: Apache/1.2b3dev
◦ Last-Modified: for example Last-Modified: Sun, 19 Feb 2006 23:59:59 GMT
Message body



In a response, this is where the requested
resource is returned to the client (the most
common use of the message body), or perhaps
explanatory text if there's an error.
In a request, this is where user-entered data or
uploaded files are sent to the server.
If an HTTP message includes a body, there are
usually header lines in the message that
describe the body. In particular,
◦ The Content-Type: header gives the MIME-type of
the data in the body, such as text/html or
image/gif.
◦ The Content-Length: header gives the number of
bytes in the body.
MIME Media types
Multipurpose Internet Mail Extensions
• HTTP sends the media type of the file using the Content-Type: header
• Some important media types are
◦ text/plain, text/html
◦ image/gif, image/jpeg
◦ audio/basic, audio/wav
◦ model/vrml
◦ video/mpeg, video/quicktime
◦ application/*, application-specific data that does not fall under any other MIME category, e.g. application/octet-stream
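
A server usually derives the media type from the file name extension. The sketch below uses Python's standard mimetypes module for the lookup and falls back to application/octet-stream for unknown extensions; the helper name content_headers and the file names are assumptions made for this example.

```python
import mimetypes


def content_headers(filename, data):
    """Return the headers that describe a message body."""
    # guess_type() maps the extension to a media type, or None if unknown.
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"
    # Content-Length counts bytes, not characters.
    return {"Content-Type": mime_type, "Content-Length": str(len(data))}


print(content_headers("index.html", b"<html></html>"))
```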
Sample HTTP exchange



To retrieve the file at the URL
http://www.somehost.com/path/file.html
Request:
GET /path/file.html HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
[blank line here]
Response:
HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354
[blank line here]
<html> <body> <h1>Happy New Millennium!</h1> (more file contents) . . . </body> </html>
HTTP methods



GET: request a resource by URL
HEAD
◦ is just like a GET request, except it asks the server to return the
response headers only, and not the actual resource (i.e. no
message body).
◦ This is useful to check characteristics of a resource without
actually downloading it, thus saving bandwidth.
POST
◦ A POST request is used to send data to the server to be
processed in some way, like by a CGI script.
◦ There's a block of data sent with the request, in the message
body. There are usually extra headers to describe this message
body, like Content-Type: and Content-Length:.
◦ The request URI is not a resource to retrieve; it's usually a
program to handle the data you're sending.
◦ The HTTP response is normally program output, not a static file.
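
A POST request can be assembled by hand to see how the extra headers describe the body. This is a sketch only: the path /cgi-bin/order.cgi and the form fields are hypothetical, and the body is assumed to be ordinary URL-encoded form data.

```python
from urllib.parse import urlencode


def build_post(path, fields):
    """Build an HTTP/1.0 POST request carrying URL-encoded form data."""
    body = urlencode(fields).encode("ascii")
    head = (
        "POST %s HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % (path, len(body))
    )
    # The blank line ends the headers; the body follows immediately.
    return head.encode("ascii") + body


request = build_post("/cgi-bin/order.cgi", {"item": "book", "qty": "2"})
print(request.decode("ascii"))
```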
HTTP 1.1

It is a superset of HTTP 1.0. Improvements
include:
◦ Faster response, by allowing multiple transactions
to take place over a single persistent connection.
◦ Faster response and great bandwidth savings, by
adding cache support.
◦ Faster response for dynamically-generated pages,
by supporting chunked encoding, which allows a
response to be sent before its total length is
known.
◦ Efficient use of IP addresses, by allowing multiple
domains to be served from a single IP address.
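
Chunked encoding sends the body as a series of chunks, each preceded by its size in hexadecimal, and ends with a chunk of size zero. A minimal sketch of both directions, ignoring trailers; the function names are chosen for this example.

```python
def encode_chunked(chunks):
    """Encode an iterable of byte strings as a chunked message body."""
    out = b""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body early
            out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    # A chunk of size 0 marks the end of the body.
    return out + b"0\r\n\r\n"


def decode_chunked(data):
    """Reassemble the body from a chunked encoding (trailers ignored)."""
    body = b""
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The size line may carry ";extension" fields, which are skipped here.
        size = int(data[pos:eol].split(b";")[0], 16)
        if size == 0:
            return body
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


encoded = encode_chunked([b"Hello, ", b"World"])
print(encoded)
print(decode_chunked(encoded))
```

The server can emit each chunk as soon as a script produces it, which is why no Content-Length is needed up front.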
Manually Experimenting with HTTP
>telnet eng.ui.ac.ir 80
Trying 192.168.50.84…
Connected to eng.ui.ac.ir
Escape character is ‘^]’.
Sending a Request
> GET /~ladani/index.htm HTTP/1.0
[blank line]
The Response
HTTP/1.1 200 OK
Date: Fri, 29 Feb 2008 08:23:33 GMT
Server: Apache/2.0.52 (CentOS)
Last-Modified: Wed, 07 Nov 2007 12:27:44 GMT
ETag: "6ccb6-741c-43e55e05a5000"
Accept-Ranges: bytes
Content-Length: 29724
Connection: close
Content-Type: text/html; charset=WINDOWS-1256
[blank line]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
….
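Request: GET /~ladani/index.htm HTTP/1.0
Response: HTTP/1.1 200 OK, followed by HTML code

Request: GET /~ladani/no-such-page.htm HTTP/1.0
Response: HTTP/1.1 404 Not Found, followed by HTML code

Request: GET /index.html HTTP/1.1
Response: HTTP/1.1 400 Bad Request, followed by HTML code
Why is it a Bad Request?
◦ The HTTP/1.1 request was sent without a Host header, which HTTP/1.1 requires.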
Session-persistent State

What does session-persistent state mean?
◦ State information that is preserved between browsing
sessions.
◦ Information that is stored semi-permanently (i.e., on disk)
for later access.

Why was the calculator example not session-persistent?
◦ Sum, current display, etc. were not preserved if we went to a different website and back to the calculator.
Why session-persistence?

User-based customizations.
◦ MyYahoo, E*Trade, etc.

Long transactions.
◦ Electronic shopping carts.
◦ Order preparation

Server-side state maintenance.
◦ Large amounts of state info that you don’t
want to pass back and forth.
Cookie Overview
• HTTP cookies are a mechanism for creating and using session-persistent state.
• Cookies are simple string values that are associated with a set of URLs.
• Servers set cookies using an HTTP header.
• The client transmits the cookie as part of the HTTP request whenever an associated URL is visited in the future.

Anatomy of a cookie.

A cookie has 6 parts:
◦ Name
◦ Value
◦ Domain
◦ Path
◦ Expiration
◦ Security flag

Name and Value are required; the others have default values.
Setting a cookie.
• A cookie is set using the “Set-Cookie” header in an HTTP response.
• The string value of the Set-Cookie header is parsed into semicolon-separated fields that define the different parts of the cookie.
• The cookie is stored by the client.

Sending cookies
• Every time a client makes an HTTP request, it tests every cookie for a match.
• A cookie matches if…
◦ The cookie domain is a suffix of the URL's server name.
◦ The cookie expiration has not passed.
◦ The cookie path is a prefix of the URL path.
◦ The cookie security flag is off, or the connection is secure.
• If a match is made, the name/value pair of the cookie is sent in the “Cookie” header of the request.
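
The matching rules can be written as one predicate. This sketch makes two assumptions that are not spelled out on the slide: the cookie is held as a plain dictionary, and the security flag is treated as a restriction, so a cookie without the flag may be sent over any connection.

```python
from datetime import datetime


def cookie_matches(cookie, host, path, is_secure, now):
    """Decide whether a stored cookie should be sent with a request."""
    # Domain must be a suffix of the server name (a leading dot is ignored).
    domain = cookie["domain"].lstrip(".")
    if not (host == domain or host.endswith("." + domain)):
        return False
    # Cookie path must be a prefix of the requested path.
    if not path.startswith(cookie["path"]):
        return False
    # An expired cookie is never sent.
    if cookie["expires"] is not None and now >= cookie["expires"]:
        return False
    # A cookie with the security flag needs a secure connection.
    if cookie["secure"] and not is_secure:
        return False
    return True


cookie = {
    "name": "my_cookie",
    "value": "This is my cookie value",
    "domain": ".eng.ui.ac.ir",
    "path": "/~ladani",
    "expires": datetime(2008, 3, 6, 12, 0, 0),
    "secure": False,
}
print(cookie_matches(cookie, "eng.ui.ac.ir", "/~ladani/index.htm",
                     False, datetime(2008, 2, 29)))
```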
Setting a Cookie

Full cookie:
Set-Cookie: my_cookie=This is my cookie value; domain=.eng.ui.ac.ir; path=/~ladani; expires=Thu, 06-Mar-08 12:00:00 GMT

A response can have more than one Set-Cookie header, or can combine more than one cookie in one header by separating them with a comma (,)
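
Splitting a Set-Cookie value into its parts might look like the sketch below. It assumes each attribute is written as key=value (so the expiration appears as expires=...), handles a single cookie per header, and keeps the expiration as an unparsed string; parse_set_cookie is a name chosen for this example.

```python
def parse_set_cookie(header_value):
    """Split the value of a Set-Cookie header into a dictionary of parts."""
    fields = [field.strip() for field in header_value.split(";")]
    # The first field is always the name=value pair.
    name, _, value = fields[0].partition("=")
    cookie = {"name": name.strip(), "value": value.strip(), "secure": False}
    for field in fields[1:]:
        key, _, val = field.partition("=")
        key = key.strip().lower()
        if key == "secure":
            # The security flag has no value of its own.
            cookie["secure"] = True
        elif key:
            cookie[key] = val.strip()
    return cookie


print(parse_set_cookie(
    "my_cookie=This is my cookie value; domain=.eng.ui.ac.ir; "
    "path=/~ladani; expires=Thu, 06-Mar-08 12:00:00 GMT"
))
```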
Cookie Matching

Biggest misunderstanding:
◦ Servers do not RETRIEVE cookies!!!!
◦ Servers RECEIVE cookies previously planted.

Step 1:
◦ Some response by the server installs the cookie with a “Set-Cookie” header.
◦ Client saves cookie to disk.
Cookie Matching

Step 2:
◦ Browser goes to some page which matches
previously received cookie.
◦ Cookie name and value sent in request as
“Cookie” HTTP header.

Step 3:
◦ CGI program detects the presence of the cookie and uses it.
  – Where is the cookie info?
  – In the environment variable HTTP_COOKIE
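
Inside a CGI program, the received cookies can be read from HTTP_COOKIE. A short sketch using Python's standard http.cookies module; the cookie names session and theme are made up for the example.

```python
import os
from http.cookies import SimpleCookie


def cookies_from_environ(environ=os.environ):
    """Return the cookies received with the request as a name -> value dict."""
    jar = SimpleCookie()
    # HTTP_COOKIE holds the Cookie header, e.g. "session=abc123; theme=dark".
    jar.load(environ.get("HTTP_COOKIE", ""))
    return {name: morsel.value for name, morsel in jar.items()}


print(cookies_from_environ({"HTTP_COOKIE": "session=abc123; theme=dark"}))
```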
Where are cookies stored on client?
• Client-specific locations.
• No standard.
• Latest IE stores cookies in a folder called “Temporary Internet Files”
◦ Each cookie is stored in a separate file.
• Netscape stores cookies in “cookies.txt”
Typical Cookie Usages

Cookies as Database Index
◦ Most common use of cookies.
◦ State information is kept in some sort of
database and the cookie acts as an index.

Cookies as State Variables
◦ Name of cookie is like variable name.
◦ Value of cookie is state information.
Cookie Security

Security flag restricts when the browser will send a cookie back to the server.
◦ Requires a “secure” connection.
  – For example: HTTPS in effect.

What does this mean about when the cookie was set?
First Web Server

Berners-Lee wrote two programs
◦ A browser called WorldWideWeb
◦ The world’s first Web server, which ran on NeXTSTEP
  – The machine is on exhibition at CERN’s public museum
Most Famous Web Servers
• Apache HTTP Server from the Apache Software Foundation
• Internet Information Services (IIS) from Microsoft
• Google Web Server (GWS)
◦ Started from May 2007
• Lighttpd
◦ Powers several popular Web 2.0 sites like YouTube, Wikipedia and meebo
Web Servers Usage – Statistics

The most popular Web servers, used for public Web sites, are tracked by the Netcraft Web Server Survey
◦ Details given by Netcraft Web Server Reports

Apache has been the most popular since April 1996
Currently (February 2008) about
◦ Apache: 50.93%
◦ Microsoft (IIS, PWS, etc.): 35.56%
◦ Google: 5.16%
◦ Lighttpd: 0.99%
Web Servers Usage – Statistics cont.
[Figure: Total Sites Across All Domains, August 1995 - February 2008]
[Figure: Market Share for Top Servers Across All Domains, August 1995 - February 2008]
[Figure: Totals for Active Servers Across All Domains, June 2000 - February 2008]
Apache (A PAtCHy) Web Server




Origins: NCSA (Univ. of Illinois, Urbana-Champaign)
Now: Apache Software Foundation (www.apache.org), developers world-wide
Most widely used Web server today [Netcraft web survey, 2/2008]
Open source software
◦ Geographically distributed developers
◦ Needed a modular, extensible design in which third-party developers could override or extend basic characteristics
Web Server Processing Steps
◦ Accept Client Connection
◦ Read HTTP Request Header
◦ Find File
◦ Send HTTP Response Header
◦ Read File
◦ Send Data
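
The processing steps can be followed in a toy handler for a single connection. This is a sketch under simplifying assumptions, not a usable server: a socket pair stands in for an accepted client connection, only a plain file lookup is done (no protection against ".." in the path), and the names handle_connection and demo are invented for the example.

```python
import os
import socket
import tempfile


def handle_connection(conn, doc_root):
    """Serve one request on an already-accepted connection."""
    # Read the HTTP request header (everything up to the blank line).
    data = b""
    while b"\r\n\r\n" not in data:
        block = conn.recv(1024)
        if not block:
            break
        data += block
    request_line = data.split(b"\r\n", 1)[0].decode("iso-8859-1")
    method, url_path, version = request_line.split(" ")

    # Find the file under the document root.
    filename = os.path.join(doc_root, url_path.lstrip("/"))
    if not os.path.isfile(filename):
        conn.sendall(b"HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n")
        return

    # Send the response header, then read the file and send its data.
    size = os.path.getsize(filename)
    header = "HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % size
    conn.sendall(header.encode("ascii"))
    with open(filename, "rb") as f:
        conn.sendall(f.read())


def demo():
    """Run one request through the handler and return the raw response."""
    doc_root = tempfile.mkdtemp()
    with open(os.path.join(doc_root, "index.html"), "wb") as f:
        f.write(b"<h1>Hello World</h1>")
    # socketpair() stands in for accepting a client connection.
    client, server = socket.socketpair()
    client.sendall(b"GET /index.html HTTP/1.0\r\n\r\n")
    handle_connection(server, doc_root)
    server.close()
    response = b""
    while True:
        block = client.recv(1024)
        if not block:
            break
        response += block
    client.close()
    return response


print(demo())
```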
Apache HTTP Server

Apache Core
◦ Receives client request
◦ Typically, allocates a new process for each incoming request
◦ Allocates request record
◦ Invokes handlers on individual modules in sequence

Modules register handlers during configuration
Handler
◦ Request record passed as single parameter
◦ Each handler reads/modifies request record
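
The core-and-handlers arrangement can be modelled in a few lines. This is a toy model in Python, not Apache's actual module API: the class Core, the dictionary used as the request record, and the three handlers are all invented for illustration.

```python
class Core:
    """Toy core that runs registered handlers over a request record."""

    def __init__(self):
        self.handlers = []

    def register(self, handler):
        # Modules register their handlers at configuration time.
        self.handlers.append(handler)

    def process(self, uri):
        # Allocate a request record and pass it to each handler in sequence.
        record = {"uri": uri, "filename": None, "content_type": None, "log": []}
        for handler in self.handlers:
            handler(record)
        return record


def translate(record):
    # Hypothetical handler: map the URI to a local file name.
    record["filename"] = "/var/www/html" + record["uri"]


def type_check(record):
    # Hypothetical handler: choose a MIME type from the file extension.
    is_html = record["filename"].endswith(".html")
    record["content_type"] = "text/html" if is_html else "application/octet-stream"


def log_request(record):
    # Hypothetical handler: note what was served.
    record["log"].append("served %s" % record["filename"])


core = Core()
for handler in (translate, type_check, log_request):
    core.register(handler)
print(core.process("/index.html"))
```

Since every handler receives the same record, a later handler can rely on fields that an earlier one filled in.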
Web Server Phases
Apache core invokes a handler for each phase
• Resolve document reference (URI) to a local file name (or CGI program + parameters)
• Client authentication (verify client identity)
• Client access control (determine access rights)
• Request access control (check if access allowed)
• MIME type determination of the response
• General phase for handling leftovers (e.g., check syntax of returned response, build up user profile)
• Transmission of the response to client
• Logging data on the processing of the request
References






http://www.jmarshall.com/easy/http/
TCP/IP Tutorial and Technical Overview, Rodriguez, Gatrell, Karas, Peschke, IBM Redbooks, August 2001
Wikipedia, the free encyclopedia
Apache: The Definitive Guide, 2nd edition, Ben Laurie, Peter Laurie, O’Reilly, February 1999
Webmaster in a Nutshell, 1st edition, Stephen Spainhour, Valerie Quercia, O’Reilly, October 1996
Netcraft: February 2006 Web Server Survey