
Web Servers
Generic Overview
http://en.wikipedia.org/wiki/Web_servers
Web Servers

A web server can be:

A computer program
  Responsible for accepting HTTP requests from clients (web browsers)
  Returns HTTP responses with optional data contents
    Usually web pages
      HTML documents
      Linked objects (images, etc.)
A computer that runs a computer program which provides the above functionality
Common Features
Common Features

HTTP
  Accepts HTTP requests from a client
  Provides HTTP responses to the client
    Typically an HTML document
    Can also be:
      Raw text file
      Image
      Some other type of document, as defined by MIME types
  If an error is found in the client request or while trying to serve the request
    Web server has to send an error response
      May include custom HTML
      May have text messages to better explain the problem to end users
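As a rough illustration of the two points above, the following Python sketch picks a MIME type from a file name and builds a small custom HTML error body. The helper names content_type_for and error_page are made up for this example, and the fallback type application/octet-stream is an assumption about what a server might choose for unknown files.

```python
import mimetypes
from http import HTTPStatus


def content_type_for(filename):
    """Guess the MIME type for a file name (hypothetical helper)."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Assumed fallback for unknown extensions: generic binary data.
    return guessed or "application/octet-stream"


def error_page(code):
    """Return a status line and a small custom HTML body for an error code."""
    status = HTTPStatus(code)
    status_line = f"HTTP/1.1 {status.value} {status.phrase}"
    body = (
        f"<html><body><h1>{status.value} {status.phrase}</h1>"
        f"<p>{status.description}</p></body></html>"
    )
    return status_line, body


if __name__ == "__main__":
    print(content_type_for("index.html"))
    print(error_page(404)[0])
```

A real server would also send headers such as Content-Type and Content-Length with the error body; they are left out here to keep the sketch short.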
Common Features

Logging
  Web servers write detailed information to log files
    Client requests
    Server responses
  Allows the webmaster to collect data
    By running log analyzers on the log files
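A log analyzer can be very small. The sketch below assumes the log uses the Common Log Format (an assumption; the slides do not name a format) and counts responses per status code. The sample lines and the names parse_line and status_counts are invented for illustration.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "method path protocol" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def status_counts(lines):
    """Count how many logged responses carried each status code."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry["status"]] += 1
    return counts


# Invented sample data, not taken from a real server.
SAMPLE = [
    '192.0.2.1 - - [10/Oct/2007:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.7 - - [10/Oct/2007:13:55:40 -0700] "GET /missing.html HTTP/1.1" 404 209',
    '192.0.2.1 - - [10/Oct/2007:13:56:02 -0700] "GET /logo.png HTTP/1.1" 200 5120',
]

if __name__ == "__main__":
    print(status_counts(SAMPLE))
```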
Additional Features

Authentication
  Optional authorization before allowing access to some or all resources
    Requires a user name and password
Handle
  Static content
  Dynamic content
    Support one or more related interfaces
      SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET, Server APIs such as NSAPI, ISAPI, etc.
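One common way to carry the user name and password mentioned under Authentication is HTTP Basic authentication, where the client sends "Authorization: Basic" followed by the base64 form of "user:password". The sketch below decodes such a header value. The USERS table and the function names are hypothetical, and a real server would store hashed passwords rather than plain text.

```python
import base64


def parse_basic_auth(header_value):
    """Return (user, password) from an 'Authorization' header value, or None."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 as well as bytes that are not valid UTF-8.
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


# Hypothetical credential store, for illustration only.
USERS = {"alice": "s3cret"}


def is_authorized(header_value):
    """Check the decoded credentials against the illustrative USERS table."""
    creds = parse_basic_auth(header_value)
    return creds is not None and USERS.get(creds[0]) == creds[1]


if __name__ == "__main__":
    token = base64.b64encode(b"alice:s3cret").decode("ascii")
    print(is_authorized("Basic " + token))
```

Basic authentication only encodes the credentials, it does not encrypt them, so it is normally combined with HTTPS.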
Additional Features

HTTPS support
  Via SSL or TLS
  Allows secure (encrypted) connections
    Using port 443 instead of port 80
Content compression
  E.g. by gzip encoding
  Reduces the size of the responses
    Lower bandwidth usage, etc.
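The effect of gzip encoding can be sketched with Python's standard gzip module. The function name compress_body is made up, and the handling of the Accept-Encoding header is simplified: quality values such as "gzip;q=0" are ignored here, which a real server would need to honor.

```python
import gzip


def compress_body(body, accept_encoding):
    """Return (payload, headers), using gzip only if the client lists it."""
    offered = [
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    ]
    if "gzip" in offered:
        payload = gzip.compress(body)
        headers = {
            "Content-Encoding": "gzip",
            "Content-Length": str(len(payload)),
        }
        return payload, headers
    # Client did not advertise gzip: send the body unchanged.
    return body, {"Content-Length": str(len(body))}


if __name__ == "__main__":
    page = b"<p>hello</p>" * 200
    payload, headers = compress_body(page, "gzip, deflate")
    print(len(page), len(payload), headers)
```

Repetitive text such as HTML usually shrinks considerably, while already compressed content (most images, archives) gains little.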
Additional Features

Virtual hosting
  Serve many web sites using one IP address
Large file support
  Serve files greater than 2 GB
    Typical 32-bit OS restriction
Bandwidth throttling
  Limit the speed of responses
    Do not saturate the network
    Able to serve more clients
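Name-based virtual hosting, as listed above, relies on the Host header of each request. A minimal sketch, assuming a made-up table of host names and document roots (none of these paths come from the slides):

```python
# Hypothetical mapping from Host header value to document root.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "www.test.com": "/var/www/test",
}
# Assumed fallback site for unknown host names.
DEFAULT_ROOT = "/var/www/default"


def document_root_for(host_header):
    """Pick the document root for a request based on its Host header."""
    # Drop an optional ":port" suffix and ignore case.
    # IPv6 literals such as "[::1]:80" are not handled in this sketch.
    hostname = host_header.split(":")[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)


if __name__ == "__main__":
    print(document_root_for("www.example.com"))
    print(document_root_for("WWW.TEST.COM:8080"))
```

All sites share one IP address and one listening socket; only the lookup result differs per request.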

Origin of returned content
Where does it all come from?
Content Origin

The origin of the content may be:
  Static
    Comes from a pre-existing file in a file system
  Dynamic
    Dynamically generated by some other program
      Script
      Application Programming Interface (API) called by the web server
Static content is usually delivered much faster than dynamic content
  2 to 100 times
  Especially if the latter involves data pulled from a database
Path translation
How does it find it?
Path translation

Web servers map the path component of a Uniform Resource Locator (URL) into:
  Local file system resource
    Static requests
  Internal or external program name
    Dynamic requests
For a static request the URL path specified by the client is relative to the Web server's root directory
Path translation

Consider the following URL requested by a client:
  http://www.example.com/path/file.html
Client's web browser translates it into a connection to www.example.com with the following HTTP/1.1 request:
  GET /path/file.html HTTP/1.1
  Host: www.example.com
The web server on www.example.com then appends the given path to the path of its root directory
  On Unix machines, this is commonly /var/www/htdocs.
  The result would then be the local file system resource:
    /var/www/htdocs/path/file.html
Web server then reads the file, if it exists, and sends a response to the client's web browser
  Response will describe the content of the file and contain the file itself
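The translation step for a static request can be sketched in a few lines of Python. The function name translate_path is made up, and the normalization that stops ".." segments from climbing above the root directory is an addition of this sketch, not something stated in the slides.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def translate_path(url, root="/var/www/htdocs"):
    """Map the path component of a URL to a file under the server root."""
    url_path = unquote(urlsplit(url).path)
    # Normalize relative to "/" so ".." segments cannot escape the root.
    normalized = posixpath.normpath("/" + url_path.lstrip("/"))
    return posixpath.join(root, normalized.lstrip("/"))


if __name__ == "__main__":
    print(translate_path("http://www.example.com/path/file.html"))
```

With the URL from the example above this yields /var/www/htdocs/path/file.html. A full server would then check that the file exists and is readable before building the response.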
Performance
Performance

Web servers:
  Serve requests quickly
  From more than one TCP/IP connection at a time
Key performance parameters are:
  number of requests per second
    depends on the type of request, etc.
  latency (response time) in milliseconds for each new connection or request
  throughput in bytes per second
    depends on file size, cached or not cached content, available network bandwidth, etc.
Measured under:
  Varying load of clients
  Varying requests per client
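The three parameters above can be computed from a list of completed requests. The sketch below assumes each sample is a pair of latency in milliseconds and response size in bytes, collected over a known wall-clock interval; the function name summarize and the sample numbers are invented.

```python
def summarize(samples, wall_seconds):
    """Compute basic metrics from (latency_ms, response_bytes) samples."""
    if not samples or wall_seconds <= 0:
        raise ValueError("need at least one sample and a positive interval")
    count = len(samples)
    total_latency = sum(latency for latency, _size in samples)
    total_bytes = sum(size for _latency, size in samples)
    return {
        "requests_per_second": count / wall_seconds,
        "mean_latency_ms": total_latency / count,
        "throughput_bytes_per_second": total_bytes / wall_seconds,
    }


if __name__ == "__main__":
    # Two invented requests completed within a 2 second window.
    print(summarize([(10, 1000), (20, 3000)], wall_seconds=2.0))
```

Repeating the measurement with different numbers of clients and requests per client shows how the figures change with load.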
Performance

Performance parameters may vary noticeably depending on the number of active connections
  A fourth parameter is the concurrency level supported by a web server under a specific configuration
Specific server model used to implement a web server program can bias the performance and scalability level that can be reached under heavy load or when using high-end hardware
  many CPUs, disks, etc.
Load limits
Load limits

Web server (program) has defined load limits
  It can handle only a limited number of concurrent client connections per IP address (and TCP port)
    Usually between 2 and 60,000
    Default between 500 and 1,000
  Can serve only a certain maximum number of requests per second depending on:
    its own settings
    the HTTP request type
    content origin (static or dynamic)
    whether the served content is or is not cached
    the hardware and software limits of the native OS
When a web server is near to or over its limits
  It becomes overloaded and thus unresponsive
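A connection limit of the kind described above can be modeled with a counting semaphore. In this sketch the class name ConnectionLimiter, the handle function, and the choice to answer 503 when the limit is reached are all assumptions made for illustration; real servers differ in how they refuse excess connections.

```python
import threading


class ConnectionLimiter:
    """Admit at most max_connections concurrent connections."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_accept(self):
        """Return True if a slot was free, False if the limit is reached."""
        return self._slots.acquire(blocking=False)

    def release(self):
        """Give the slot back when the connection is closed."""
        self._slots.release()


def handle(limiter):
    """Return the status code a server might send for a new connection."""
    if not limiter.try_accept():
        return 503  # Service Unavailable: over the configured limit
    # The slot stays held until the caller invokes limiter.release().
    return 200


if __name__ == "__main__":
    limiter = ConnectionLimiter(2)
    print([handle(limiter) for _ in range(3)])
```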
Overload causes
Overload causes

Figure: A sample daily graph of a web server's load, indicating a spike in the load early in the day.
Overload causes

At any time web servers can be overloaded because of:
  Too much legitimate web traffic
    Thousands or even millions of clients hitting the web site in a short interval of time
  DDoS (Distributed Denial of Service) attacks
  Computer worms
    Abnormal traffic because of millions of infected computers (not coordinated)
  XSS viruses
    Millions of infected browsers and/or web servers
  Internet web robots
    Traffic not filtered / limited on large web sites with very few resources (bandwidth, etc.)
  Internet (network) slowdowns
    Client requests are served more slowly and the number of connections increases so much that server limits are reached
  Web servers (computers) partial unavailability
    Required / urgent maintenance or upgrade
    HW or SW failures
    Back-end (e.g. DB) failures, etc.
    Remaining web servers get too much traffic and they become overloaded
Overload symptoms
Overload symptoms

The symptoms of an overloaded web server are:
  Requests are served with (possibly long) delays
    from 1 second to a few hundred seconds
  500, 502, 503, 504 HTTP errors are returned to clients
    Sometimes also unrelated 404 error or even 408 error may be returned
  TCP connections are refused or reset (interrupted) before any content is sent to clients
  In very rare cases, only partial contents are sent
    This behavior may well be considered a bug
    Even if it stems from unavailable system resources
Anti-overload techniques
Anti-overload techniques

To partially overcome load limits and to prevent overload, use techniques like:
  Managing network traffic by using:
    Firewalls
      Block unwanted traffic from bad IP sources or having bad patterns
    HTTP traffic managers
      Drop, redirect or rewrite requests having bad HTTP patterns
    Bandwidth management and traffic shaping
      Smooth down peaks in network usage
  Deploying web cache techniques
  Using different domain names to serve different content (static and dynamic) by separate Web servers, e.g.:
    http://images.example.com
    http://www.example.com
Anti-overload techniques

Techniques continued:
  Use different domain names and/or computers to separate big files from small/medium files
    Be able to fully cache small and medium sized files
    Efficiently serve big or huge (over 10 - 1000 MB) files by using different settings
  Using many Web servers (programs) per computer
    Each bound to its own network card and IP address
  Use many Web servers that are grouped together
    Act or are seen as one big Web server
    See Load balancer
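The simplest way to make a group of web servers look like one is round-robin distribution. The sketch below is one possible policy, not the only one a load balancer might use; the class name and the backend addresses are invented.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical backend addresses.
    balancer = RoundRobinBalancer(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
    print([balancer.next_backend() for _ in range(4)])
```

Round-robin ignores how busy each backend is. Other policies weigh backends by capacity or by current connection count.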
Anti-overload techniques

Techniques continued:
  Add more hardware resources
    RAM, disks, NICs, etc.
  Tune OS parameters for
    Hardware capabilities
    Usage
  Use more efficient computer programs for web servers, etc.
  Use workarounds
    Especially if dynamic content is involved
Historical notes
Historical notes


Figure: The world's first web server.
In 1989 Tim Berners-Lee proposed to his employer CERN (European Organization for Nuclear Research) a new project
  Goal of easing the exchange of information between scientists by using a hypertext system
As a result of the implementation of this project, in 1990 Berners-Lee wrote two programs:
  A browser called WorldWideWeb
  The world's first web server, which ran on NeXTSTEP
Historical notes

The first web server in the U.S.A. was installed on December 12, 1991
  By Bebo White at SLAC
  After returning from a sabbatical at CERN
Between 1991 and 1994 the simplicity and effectiveness of early technologies used to surf and exchange data through the World Wide Web helped to
  Port them to many different operating systems
  Spread their use among lots of different social groups of people
    First in scientific organizations
    Then in universities
    Finally in industry
Historical notes

In 1994 Tim Berners-Lee decided to constitute the World Wide Web Consortium (W3C)
  Regulate the further development of the many technologies in a standardization process:
    HTTP
    HTML
    etc.
The following years saw an exponential growth of the number of web sites and servers
Software
Software

July 2007, the most common HTTP serving programs:
  Apache HTTP Server
  Microsoft
    Microsoft is the sum of sites running:
      Microsoft-Internet-Information-Server
      Microsoft-IIS, Microsoft-IIS-W
      Microsoft-PWS-95
      Microsoft-PWS
  Sun
    The sum of sites running:
      SunONE
      iPlanet-Enterprise
      Netscape-Enterprise
      Netscape-FastTrack
      Netscape-Commerce
      Netscape-Communications
      Netsite-Commerce
      Netsite-Communications
  lighttpd
Software

There are thousands of different web server programs available
  Many specialized for very specific purposes
The fact that a web server is not very popular does not necessarily mean that it has
  A lot of bugs
  Poor performance
See Category:Web server software for a longer list of HTTP server programs.
Statistics
Statistics

The most popular web servers, used for public web sites, are tracked by
  Netcraft Web Server Survey
Details given by
  Netcraft Web Server Reports
According to this site:
  Apache has been the most popular web server on the Internet since April of 1996
August 2007 Netcraft Web Server Survey:
  50.92% of web sites on the Internet use Apache
  34.28% of web sites use IIS
  Among the active sites
    48.42% running Apache
    36.21% running IIS
Popular Web Servers
Who’s running the show?
What are they?
The big two:
Apache
http://en.wikipedia.org/wiki/Apache_web_server
We’re number one!
Apache

Apache HTTP Server, referred to simply as Apache:
  A web server
  Notable for playing a key role in the initial growth of the World Wide Web
Apache

First viable alternative to the Netscape Communications Corporation web server
  Currently known as Sun Java System Web Server
Evolved to rival other Unix-based web servers
  Functionality and performance
Since April 1996 Apache has been the most popular HTTP server on the World Wide Web
  Since March 2006 however it has experienced a steady decline of its market share
  Lost mostly against Microsoft IIS and the .NET platform
  September 2007: Apache served 50% of all websites
Apache

Project's name was chosen for two reasons:
  Respect for the Native American Apache tribe
    Well-known for their endurance and their skills in warfare
  Project's roots as a set of patches to the codebase of NCSA HTTPd 1.3
    Making it "a patchy" server
Apache is developed and maintained by
  an open community of developers
  under the auspices of the Apache Software Foundation
Available for a wide variety of OSs
  Microsoft Windows
  Novell NetWare
  Unix-like operating systems: e.g. Linux and Mac OS X
Released under the Apache License
  Apache is free software / open source software.
Apache History
History

The first version of the Apache web server was created by Robert McCool
  Heavily involved with the National Center for Supercomputing Applications web server
    Known simply as NCSA HTTPd
When Rob left NCSA in mid-1994
  Development of httpd stalled
  Left a variety of patches for improvements circulating through e-mails
Rob McCool was not alone in his efforts
  Several other developers helped form the original "Apache Group":
    Brian Behlendorf, Roy T. Fielding, Rob Hartill, David Robinson, Cliff Skolnick, Randy Terbush, Robert S. Thau, Andrew Wilson, Eric Hagberg, Frank Peters, and Nicolas Pioch
History

Version 2 of the Apache server was a substantial rewrite of much of the Apache 1.x code
  Strong focus on further modularization and the development of a portability layer, the Apache Portable Runtime
Apache 2.x core: several major enhancements over Apache 1.x:
  UNIX threading
  Better support for non-Unix platforms
  New Apache API
  IPv6 support
First alpha release of Apache 2: March 2, 2000
First general availability release on April 6, 2002
Version 2.2 introduced a new authorization API that allows for more flexibility
  Also features improved cache modules and proxy modules
Features
Features

Apache supports a variety of features
  Many implemented as compiled modules which extend the core functionality
  Range from server-side programming language support to authentication schemes:
    Common language interfaces support
      mod_perl, mod_python, Tcl, and PHP
    Popular authentication modules include
      mod_access, mod_auth, and mod_digest.
Features

Other features include:
  SSL and TLS support
    mod_ssl
  A proxy module
  A useful URL rewriter
    also known as a rewrite engine, implemented under mod_rewrite
  Custom log files
    mod_log_config
  Filtering support
    mod_include
    mod_ext_filter
Apache logs can be analyzed through a web browser using free scripts
  AWStats/W3Perl
  Visitors
Features

Virtual hosting allows one Apache installation to serve many different actual websites
  For example, one machine, with one Apache installation could simultaneously serve:
    www.example.com
    www.test.com
    test47.test-server.test.com
    etc.
Apache features
  Configurable error messages
  DBMS-based authentication databases
  Content negotiation
It is also supported by several graphical user interfaces (GUIs)
  Permit easier, more intuitive configuration of the server
Usage
Usage

Apache is used to serve both static content and dynamic Web pages
  Many web applications are designed expecting the environment and features that Apache provides
Apache is the web server component of the popular XAMPP web server application stack
  Partners with
    MySQL
    PHP/Perl/(Python) programming languages
Usage

Apache is redistributed as part of various proprietary software packages including the
  Oracle Database
  IBM WebSphere application server
Mac OS X integrates Apache
  Its built-in web server
  Support for its WebObjects application server
It is also supported by Borland
  Kylix and Delphi development tools
Usage

Apache is included with Novell NetWare 6.5
  Default web server
Apache is used for many other tasks where content needs to be made available in a secure and reliable way
  Sharing files from a personal computer over the Internet
    A user who has Apache installed on their desktop can put arbitrary files in Apache's document root which can then be shared
  Programmers developing web applications
    Locally installed version of Apache
    Preview and test code as it is being developed
Usage

Microsoft Internet Information Services (IIS) is the main competitor to Apache
  Trailed by
    Sun Java System Web Server
    Host of other applications
      such as Zeus Web Server
License
License

The software license under which software from the Apache Software Foundation is distributed is a distinctive part of the Apache HTTP Server's history and presence in the open source software community
  The Apache License allows for the distribution of both open and closed source derivations of the source code
The Free Software Foundation does not consider the Apache License to be compatible with version 2 of the GNU General Public License (GPL)
  Software licensed under the Apache License cannot be integrated with software that is distributed under version 2 of the GPL
License

It is a free software license
  Incompatible with version 2 of the GPL
    Has a specific requirement that is not in the GPL
    Has certain patent termination cases that the GPL does not require
However, version 3 of the GPL includes a provision (Section 7e) which allows it to be compatible with licenses that have patent retaliation clauses, including the Apache License
The name Apache is a registered trademark and may only be used with the trademark holder's express permission
Microsoft IIS
http://en.wikipedia.org/wiki/IIS
We’re # 2…
IIS

Microsoft Internet Information Services (IIS)
  Formerly called Internet Information Server
  Set of Internet-based services for servers using Microsoft Windows
  World's second most popular web server in terms of overall websites
    September 2007: it served 34.94% of all websites and 36.63% of all active websites (Netcraft)
Services currently include:
  FTP
  SMTP
  NNTP
  HTTP/HTTPS
History of IIS

IIS was initially released as an additional set of Internet-based services for Windows NT 3.51
  IIS 2.0 added support for Windows NT 4.0
  IIS 3.0 introduced the Active Server Pages dynamic scripting environment
  IIS 4.0 dropped support for the Gopher protocol
    Bundled with Windows NT as a separate "Option Pack" CD-ROM
History of IIS

Current shipping versions of IIS:
  7.0 for Windows Vista
  6.0 for Windows Server 2003
    Added support for IPv6
  5.1 for Windows XP Professional
    IIS 5.1 for Windows XP is a restricted version of IIS that supports only 10 simultaneous connections and a single web site
History of IIS


FastCGI module available for IIS 5.1, IIS 6 and IIS 7
Windows Vista does not install IIS 7.0 by default
  Can be selected among the list of optionally installed components
  IIS 7.0 on Vista does not limit the number of connections allowed
    Restricts performance based on active concurrent requests
Security
Security

Earlier versions of IIS had many vulnerabilities
  Chief among them CA-2001-19
    Led to the infamous Code Red worm
Version 7.0 currently has no reported issues
In perspective, as of 11 September 2007, the free software Apache web server has one unpatched reported issue
  Affecting only MS Windows systems
  Rated "less critical"
Security

IIS 6.0 opted to change the behavior of pre-installed ISAPI handlers
  Many of which were culprits in the vulnerabilities of 4.0 and 5.0
  Reduced the attack surface of IIS
IIS 6.0 added a feature called "Web Service Extensions"
  Prevents IIS from launching any program without explicit permission by an administrator
Security

In IIS 7.0 the components were modularized
  Only the required components have to be installed
  Further reducing the attack surface
Security features such as URL filtering were added
  Rejects suspicious URLs based on a user-defined rule set
Security

In IIS 5.1 and lower:
  By default all websites were run
    In-process
    Under the System account
      a default Windows account with elevated rights
Security

In IIS 6.0 all request handling processes have been brought under a Network Service account
  Has significantly fewer privileges
  If there is an exploit in a feature or custom code
    Wouldn't necessarily compromise the entire system
    Given the sandboxed environment the processes run in
Contains a new kernel HTTP stack (http.sys)
  Stricter HTTP request parser and response cache for both static and dynamic content
Authentication mechanisms
Authentication mechanisms

IIS 5.0 and higher support the following authentication mechanisms:
  Basic access authentication
  Digest access authentication
  Integrated Windows Authentication
  .NET Passport Authentication
Internet Information Services 7.0
Internet Information Services 7.0

Debuting with Windows Vista
  To be included in Windows Server 2008
IIS 7.0 features a modular architecture
  Instead of a monolithic server which features all services
  IIS 7 has a core web server engine
  Modules offering specific functionality can be added to the engine to enable its features
  Advantages
    Only the features required need be enabled
    The functionalities can be extended by using custom modules
Internet Information Services 7.0

IIS 7 will ship with a handful of modules
  Microsoft will make other modules available online
  The following sets of modules are slated to ship with the server:
    HTTP Modules
    Security Modules
    Content Modules
    Compression Modules
    Caching Modules
    Logging and Diagnostics Modules that integrate with the new configuration store, as well as the new management environment
Internet Information Services 7.0

Writing extensions to IIS 7 using ISAPI has been deprecated in favor of the module API, with which modules can plug in anywhere in the request processing pipeline.
  Much of IIS's own functionality is built on this API
    Developers will have much more control over a request process than was possible in prior versions
  Modules can be written using C++ or using the IHttpModule interface from a .NET Framework language
  Modules can be loaded globally where the services provided by the module can affect all sites, or loaded on a per-site basis.
IIS 7 has an integrated mode application pool where .NET modules are loaded into the pipeline using the module API, rather than ISAPI.
  As a result ASP.NET code can be used with all requests to the server
For applications requiring strict IIS 6.0 compatibility, the Classic application pool mode loads ASP.NET as an ISAPI.
Internet Information Services 7.0

A significant change from previous versions:
  All web server configuration information is stored solely in XML configuration files
  Instead of in the metabase
The server has a global configuration file
  Provides defaults
Each virtual web's document root (and any subdirectory thereof) may contain a web.config
  Containing settings that augment or override the defaults
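The layering described above, global defaults overridden by more specific files, can be sketched with plain dictionaries. This is only a model of the lookup order: it does not read real IIS XML files, and the setting names and values below are hypothetical.

```python
from collections import ChainMap

# Hypothetical setting names and values, for illustration only.
GLOBAL_DEFAULTS = {"defaultDocument": "index.html", "directoryBrowse": False}
SITE_CONFIG = {"defaultDocument": "home.html"}
SUBDIR_CONFIG = {"directoryBrowse": True}


def effective_settings(*layers):
    """Merge layers, most specific first; earlier layers win over later ones."""
    return dict(ChainMap(*layers))


if __name__ == "__main__":
    # Subdirectory file first, then the site root, then the global file.
    print(effective_settings(SUBDIR_CONFIG, SITE_CONFIG, GLOBAL_DEFAULTS))
```

A setting missing from every specific layer falls through to the global default, which matches the "augment or override" behavior in the bullet above.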
Internet Information Services 7.0

Changes to these files take effect immediately
  Marks a significant departure from previous versions whereby web interfaces, or machine administrator access, were required to change simple settings such as default document, active modules and security/authentication
It also eliminates the need to perform metabase synchronization between multiple servers in a farm of web servers
Internet Information Services 7.0

Features a completely rewritten administration interface
  Takes advantage of modern MMC features such as
    Task panes
    Asynchronous operation
  Configuration of ASP.NET is more fully integrated into the administrative interface.
Internet Information Services 7.0

Other changes:
  PICS content ratings, support for Microsoft Passport, and server-side image maps are no longer included
  Executing commands via server-side includes is no longer permitted.
  IISRESET -reboot has been removed
  The CONVLOG tool, which converts IIS log files into NCSA format, has been removed
  Support for enabling a folder for "Web Sharing" via the Windows Explorer interface has been removed.
  IIS Media Pack, which allows IIS to be used as a bare-bones media server, without using Windows Media Services
  New FTP module that integrates with the new configuration store, as well as the new management environment
Summary



Concentrated on HTTP servers
Apache and IIS are the main web serving tools
  Apache still king
  IIS up and coming
Usage tracked
  Netcraft Web Server Survey