Web Servers: Generic Overview
http://en.wikipedia.org/wiki/Web_servers

Web Servers
A web server can be:
A computer program responsible for accepting HTTP requests from clients (web browsers) and returning HTTP responses with optional data contents, usually web pages such as HTML documents and linked objects (images, etc.).
A computer that runs a program which provides the above functionality.

Common Features: HTTP
A web server accepts HTTP requests from a client and provides HTTP responses to the client. The response is typically an HTML document, but it can also be a raw text file, an image, or some other type of document identified by its MIME type. If an error is found in the client request, or occurs while trying to serve the request, the web server has to send an error response. This may include custom HTML or text messages that better explain the problem to end users.

Common Features: Logging
Web servers keep detailed information about client requests and server responses in log files. This allows the webmaster to collect data by running log analyzers.

Additional Features
Authentication: optional authorization, requiring a user name and password, before allowing access to some or all resources.
Handling of both static content and dynamic content.
Support for one or more related interfaces: SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET, and server APIs such as NSAPI and ISAPI.
HTTPS support via SSL or TLS, allowing secure (encrypted) connections using port 443 instead of port 80.
Content compression (e.g. gzip encoding), which reduces the size of responses and therefore bandwidth usage.
Virtual hosting: serving many web sites from one IP address.
Large file support: serving files larger than 2 GB, a typical restriction on 32-bit operating systems.
Bandwidth throttling: limiting the speed of responses so as not to saturate the network and to be able to serve more clients.
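Bandwidth throttling, the last feature above, is commonly implemented with a token bucket. The following is a minimal Python sketch under that assumption (the slides do not say which algorithm any particular server uses); the class and method names are hypothetical, and the current time is passed in explicitly so the behaviour can be checked without waiting.

```python
class TokenBucket:
    """Token bucket: sustained `rate` units per second, bursts up to `capacity`.

    For bandwidth throttling one unit is one byte. Time is passed in
    explicitly for testability; a real server would use time.monotonic().
    """

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate            # tokens (bytes) added per second
        self.capacity = capacity    # largest burst allowed
        self.tokens = capacity      # start with a full bucket
        self.last = now             # time of the last refill

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_send(self, nbytes: int, now: float) -> bool:
        """Return True if `nbytes` may be sent now; otherwise the caller waits."""
        self._refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def wait_time(self, nbytes: int, now: float) -> float:
        """Seconds until `nbytes` could be sent (assumes nbytes <= capacity)."""
        self._refill(now)
        return max(0.0, (nbytes - self.tokens) / self.rate)
```

For example, with `rate=1000` and `capacity=4000` a response may burst 4000 bytes at once and is then held to roughly 1000 bytes per second; `wait_time` tells the sender how long to pause before the next chunk.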
Content Origin
The origin of the content may be:
Static: it comes from a file that already exists in a file system.
Dynamic: it is generated by some other program, a script, or an application programming interface (API) called by the web server.
Static content is usually delivered much faster than dynamic content (from 2 to 100 times faster), especially if the latter involves data pulled from a database.

Path Translation
How does the server find the content? Web servers map the path component of a Uniform Resource Locator (URL) into either a local file system resource (for static requests) or an internal or external program name (for dynamic requests). For a static request, the URL path specified by the client is relative to the web server's root directory.

Consider the following URL requested by a client:
http://www.example.com/path/file.html
The client's web browser translates it into a connection to www.example.com with the following HTTP/1.1 request:
GET /path/file.html HTTP/1.1
Host: www.example.com
The web server on www.example.com then appends the given path to the path of its root directory. On Unix machines this is commonly /var/www/htdocs, so the result would be the local file system resource:
/var/www/htdocs/path/file.html
The web server then reads the file, if it exists, and sends a response to the client's web browser. The response describes the content of the file and contains the file itself.

Performance
Web servers must serve requests quickly, from more than one TCP/IP connection at a time. The main key performance parameters are:
Number of requests per second (depending on the type of request, etc.).
Latency: the response time, in milliseconds, for each new connection or request.
Throughput, in bytes per second (depending on file size, cached or uncached content, available network bandwidth, etc.).
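The three performance parameters listed above can be computed from a record of served requests. This is a minimal sketch, assuming a hypothetical per-request record (arrival time, completion time, bytes sent) and taking latency to mean the full response time of each request; real benchmarking tools usually record more detail, such as time to first byte.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One served request, as a load-testing tool might record it (hypothetical format)."""
    start: float      # seconds: request received
    end: float        # seconds: last byte of the response sent
    bytes_sent: int


def summarize(samples: List[Sample]) -> Dict[str, float]:
    """Compute requests/s, mean latency (ms) and throughput (bytes/s) over the window."""
    if not samples:
        raise ValueError("no samples")
    window = max(s.end for s in samples) - min(s.start for s in samples)
    if window <= 0:
        raise ValueError("measurement window must be positive")
    return {
        "requests_per_second": len(samples) / window,
        # Latency here = full response time of each request, averaged.
        "mean_latency_ms": 1000 * sum(s.end - s.start for s in samples) / len(samples),
        "throughput_bytes_per_second": sum(s.bytes_sent for s in samples) / window,
    }


samples = [Sample(0.0, 0.1, 1000), Sample(0.5, 0.7, 3000), Sample(1.5, 2.0, 4000)]
print(summarize(samples))
```

Here three requests spread over a 2-second window give 1.5 requests per second and 4000 bytes per second; repeating the measurement with more clients or more requests per client shows how the parameters change under load.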
Performance is measured under a varying load of clients and a varying number of requests per client.

Performance parameters may vary noticeably depending on the number of active connections, so a fourth parameter is the concurrency level supported by a web server under a specific configuration. The specific server model used to implement a web server program can bias the performance and scalability that can be reached under heavy load or on high-end hardware (many CPUs, disks, etc.).

Load Limits
A web server (program) has defined load limits. It can handle only a limited number of concurrent client connections per IP address (and IP port), usually between 2 and 60,000 and by default between 500 and 1,000. It can serve only a certain maximum number of requests per second, depending on:
its own settings;
the HTTP request type;
the content origin (static or dynamic);
whether the served content is cached or not;
the hardware and software limits of the native OS.
When a web server is near to or over its limits, it becomes overloaded and thus unresponsive.

Overload Causes
(Figure: a sample daily graph of a web server's load, indicating a spike in the load early in the day.)
At any time, web servers can be overloaded because of:
Too much legitimate web traffic: thousands or even millions of clients hitting the web site in a short interval of time.
DDoS (Distributed Denial of Service) attacks.
Computer worms, which can cause abnormal traffic because of millions of infected computers (not coordinated among them).
XSS viruses, which can cause high traffic because of millions of infected browsers and/or web servers.
Internet web robots: traffic not filtered or limited on large web sites with very few resources (bandwidth, etc.).
Internet (network) slowdowns, so that client requests are served more slowly and the number of connections increases so much that server limits are reached.
Partial unavailability of web servers (computers), because of required or urgent maintenance or upgrades, hardware or software failures, back-end (e.g. database) failures, etc.
In these cases, the remaining web servers get too much traffic and become overloaded.

Overload Symptoms
The symptoms of an overloaded web server are:
Requests are served with (possibly long) delays, from 1 second to a few hundred seconds.
HTTP errors 500, 502, 503 or 504 are returned to clients; sometimes an unrelated 404 or even a 408 error may be returned.
TCP connections are refused or reset (interrupted) before any content is sent to clients.
In very rare cases, only partial contents are sent. This behavior may well be considered a bug, even if it stems from unavailable system resources.

Anti-Overload Techniques
To partially overcome load limits and to prevent overload, use techniques such as:
Managing network traffic by using firewalls to block unwanted traffic from bad IP sources or with bad patterns, HTTP traffic managers to drop, redirect or rewrite requests with bad HTTP patterns, and bandwidth management and traffic shaping to smooth down peaks in network usage.
Deploying web cache techniques.
Using different domain names to serve different content (static and dynamic) from separate web servers, e.g. http://images.example.com and http://www.example.com.
Using different domain names and/or computers to separate big files from small and medium-sized files, so that small and medium-sized files can be fully cached while big or huge files (over 10 to 1000 MB) are served efficiently with different settings.
Using many web servers (programs) per computer, each bound to its own network card and IP address.
Using many web servers (computers) grouped together so that they act, or are seen, as one big web server (see Load balancer).
Adding more hardware resources (RAM, disks, NICs, etc.).
Tuning OS parameters for the hardware capabilities and usage.
Using more efficient computer programs for web servers, etc.
Using other workarounds, especially if dynamic content is involved.

Historical Notes
(Figure: the world's first web server.)
In 1989 Tim Berners-Lee proposed a new project to his employer CERN (European Organization for Nuclear Research), with the goal of easing the exchange of information between scientists by using a hypertext system. As a result of implementing this project, in 1990 Berners-Lee wrote two programs: a browser called WorldWideWeb, and the world's first web server, which ran on NeXTSTEP.

The first web server in the U.S.A. was installed on December 12, 1991 by Bebo White at SLAC, after he returned from a sabbatical at CERN.

Between 1991 and 1994, the simplicity and effectiveness of the early technologies used to surf and exchange data through the World Wide Web helped to port them to many different operating systems and to spread their use among many different social groups: first in scientific organizations, then in universities, and finally in industry.

In 1994 Tim Berners-Lee decided to found the World Wide Web Consortium (W3C) to regulate the further development of the many technologies involved (HTTP, HTML, etc.) through a standardization process. The following years saw exponential growth in the number of web sites and servers.

Software
In July 2007, the most common HTTP serving programs were:
Apache HTTP Server
Microsoft (the sum of sites running Microsoft-Internet-Information-Server, Microsoft-IIS, Microsoft-IIS-W, Microsoft-PWS-95 and Microsoft-PWS)
Sun (the sum of sites running SunONE, iPlanet-Enterprise, Netscape-Enterprise, Netscape-FastTrack, Netscape-Commerce, Netscape-Communications, Netsite-Commerce and Netsite-Communications)
lighttpd

There are thousands of different web server programs available, many specialized for very specific purposes. The fact that a web server is not very popular does not necessarily mean that it has a lot of bugs or poor performance. See Category:Web server software for a longer list of HTTP server programs.
Statistics
The most popular web servers used for public web sites are tracked by the Netcraft Web Server Survey, with details given in the Netcraft Web Server Reports. According to this site, Apache has been the most popular web server on the Internet since April 1996. The August 2007 Netcraft Web Server Survey found that 50.92% of web sites on the Internet use Apache and 34.28% use IIS; among active sites, 48.42% run Apache and 36.21% run IIS.

Popular Web Servers
Who's running the show? What are they? The big two.

Apache
http://en.wikipedia.org/wiki/Apache_web_server
We're number one!

The Apache HTTP Server, referred to simply as Apache, is a web server notable for playing a key role in the initial growth of the World Wide Web. It was the first viable alternative to the Netscape Communications Corporation web server (currently known as Sun Java System Web Server), and it has since evolved to rival other Unix-based web servers in terms of functionality and performance. Since April 1996 Apache has been the most popular HTTP server on the World Wide Web. Since March 2006, however, it has experienced a steady decline in its market share, losing mostly to Microsoft IIS and the .NET platform. In September 2007, Apache served 50% of all websites.

The project's name was chosen for two reasons: out of respect for the Native American Apache tribe, well known for their endurance and their skills in warfare, and because of the project's roots as a set of patches to the codebase of NCSA HTTPd 1.3, making it "a patchy" server.

Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation. It is available for a wide variety of operating systems, including Microsoft Windows, Novell NetWare, and Unix-like operating systems such as Linux and Mac OS X. It is released under the Apache License and is free and open source software.
Apache History
The first version of the Apache web server was created by Robert McCool, who was heavily involved with the National Center for Supercomputing Applications (NCSA) web server, known simply as NCSA HTTPd. When Rob left NCSA in mid-1994, development of httpd stalled, leaving a variety of patches for improvements circulating through e-mails. Rob McCool was not alone in his efforts: several other developers helped form the original "Apache Group": Brian Behlendorf, Roy T. Fielding, Rob Hartill, David Robinson, Cliff Skolnick, Randy Terbush, Robert S. Thau, Andrew Wilson, Eric Hagberg, Frank Peters, and Nicolas Pioch.

Version 2 of the Apache server was a substantial rewrite of much of the Apache 1.x code, with a strong focus on further modularization and on the development of a portability layer, the Apache Portable Runtime. The Apache 2.x core has several major enhancements over Apache 1.x: UNIX threading, better support for non-Unix platforms, a new Apache API, and IPv6 support. The first alpha release of Apache 2 was on March 2, 2000, and the first general availability release on April 6, 2002. Version 2.2 introduced a new authorization API that allows for more flexibility; it also features improved cache and proxy modules.

Features
Apache supports a variety of features, many implemented as compiled modules that extend the core functionality. These range from server-side programming language support to authentication schemes. Common language interfaces include mod_perl, mod_python, Tcl, and PHP; popular authentication modules include mod_access, mod_auth, and mod_digest.
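Authentication modules such as mod_auth implement HTTP Basic access authentication, in which the browser sends the user name and password Base64-encoded in an `Authorization` header. The sketch below shows both sides of that exchange in Python. The in-memory user table and the function names are hypothetical stand-ins (real modules read password files or databases, normally storing hashes), and because Base64 is an encoding rather than encryption, Basic authentication is normally used together with HTTPS.

```python
import base64
import hmac


def basic_auth_header(username: str, password: str) -> str:
    """Client side: build the Authorization header value for Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def check_basic_auth(header: str, users: dict[str, str]) -> bool:
    """Server side: decode the header and compare against a (hypothetical) user table."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return False
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # malformed Base64 or text
        return False
    username, sep, password = decoded.partition(":")
    if not sep or username not in users:
        return False
    # Constant-time comparison; a real server would compare password hashes.
    return hmac.compare_digest(users[username].encode("utf-8"), password.encode("utf-8"))
```

The pair `Aladdin` / `open sesame` encodes to `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`, the example used in the HTTP Basic authentication specification; a server that rejects the credentials would answer with 401 and a `WWW-Authenticate` challenge.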
Features (continued)
Other features include SSL and TLS support (mod_ssl), a proxy module, a useful URL rewriter (also known as a rewrite engine, implemented under mod_rewrite), custom log files (mod_log_config), and filtering support (mod_include and mod_ext_filter). Apache logs can be analyzed through a web browser using free scripts such as AWStats, W3Perl or Visitors.

Virtual hosting allows one Apache installation to serve many different actual websites. For example, one machine with one Apache installation could simultaneously serve www.example.com, www.test.com, test47.test-server.test.com, etc.

Further features include configurable error messages, DBMS-based authentication databases, and content negotiation. Apache is also supported by several graphical user interfaces (GUIs) that permit easier, more intuitive configuration of the server.

Usage
Apache is used to serve both static content and dynamic web pages. Many web applications are designed expecting the environment and features that Apache provides. Apache is the web server component of the popular XAMPP web server application stack, alongside MySQL and the PHP/Perl/(Python) programming languages.

Apache is redistributed as part of various proprietary software packages, including Oracle Database and the IBM WebSphere application server. Mac OS X integrates Apache as its built-in web server and as support for its WebObjects application server. Apache is also supported by the Borland Kylix and Delphi development tools.

Apache is included with Novell NetWare 6.5, where it is the default web server. Apache is used for many other tasks where content needs to be made available in a secure and reliable way. One example is sharing files from a personal computer over the Internet: a user who has Apache installed on their desktop can put arbitrary files in Apache's document root, which can then be shared. Programmers developing web applications often use a locally installed version of Apache to preview and test code as it is being developed.

Microsoft Internet Information Services (IIS) is the main competitor to Apache, trailed by Sun
Java System Web Server and a host of other applications such as Zeus Web Server.

License
The software license under which software from the Apache Foundation is distributed is a distinctive part of the Apache HTTP Server's history and presence in the open source software community. The Apache License allows for the distribution of both open and closed source derivations of the source code.

The Free Software Foundation does not consider the Apache License to be compatible with version 2 of the GNU General Public License (GPL): software licensed under the Apache License cannot be integrated with software distributed under version 2 of the GPL. The Apache License is a free software license, but it is incompatible with GPLv2 because it has a specific requirement that is not in the GPL: certain patent termination cases that the GPL does not require. However, version 3 of the GPL includes a provision (Section 7e) that allows it to be compatible with licenses that have patent retaliation clauses, including the Apache License.

The name Apache is a registered trademark and may only be used with the trademark holder's express permission.

Microsoft IIS
http://en.wikipedia.org/wiki/IIS
We're #2...

Microsoft Internet Information Services (IIS), formerly called Internet Information Server, is a set of Internet-based services for servers using Microsoft Windows. It is the world's second most popular web server in terms of overall websites: in September 2007 it served 34.94% of all websites and 36.63% of all active websites (Netcraft). Services currently include FTP, SMTP, NNTP and HTTP/HTTPS.

History of IIS
IIS was initially released as an additional set of Internet-based services for Windows NT 3.51. IIS 2.0 added support for Windows NT 4.0, and IIS 3.0 introduced the Active Server Pages dynamic scripting environment. IIS 4.0 dropped support for the Gopher protocol and was bundled with Windows NT as a separate "Option Pack" CD-ROM.

Current shipping versions of IIS are 7.0 for Windows Vista, 6.0 for Windows Server 2003 (which added support for IPv6), and 5.1
for Windows XP Professional. IIS 5.1 for Windows XP is a restricted version of IIS that supports only 10 simultaneous connections and a single web site.

A FastCGI module is available for IIS 5.1, IIS 6 and IIS 7. Windows Vista does not install IIS 7.0 by default, but it can be selected from the list of optionally installed components. IIS 7.0 on Vista does not limit the number of connections allowed; instead, it restricts performance based on active concurrent requests.

Security
Earlier versions of IIS had many vulnerabilities, chief among them CA-2001-19, which led to the infamous Code Red worm. Version 7.0 currently has no reported issues. In perspective, as of 11 September 2007 the free software Apache web server has one unpatched reported issue, affecting only MS Windows systems and rated "less critical".

IIS 6.0 changed the behavior of pre-installed ISAPI handlers, many of which were culprits in the vulnerabilities of 4.0 and 5.0, thus reducing the attack surface of IIS. IIS 6.0 also added a feature called "Web Service Extensions", which prevents IIS from launching any program without explicit permission from an administrator.

In IIS 7.0 the components were modularized, so only the required components have to be installed, further reducing the attack surface. Security features such as URL filtering were added, which rejects suspicious URLs based on a user-defined rule set.

In IIS 5.1 and lower, all websites were run in-process by default, under the System account, a default Windows account with elevated rights. In IIS 6.0 all request handling processes have been brought under the Network Service account, which has significantly fewer privileges. If there is an exploit in a feature or in custom code, it would not necessarily compromise the entire system, given the sandboxed environment the processes run in. IIS 6.0 also contains a new kernel HTTP stack (http.sys) with a stricter HTTP request parser and a response cache for both static and dynamic content.

Authentication Mechanisms
IIS 5.0 and higher
support the following authentication mechanisms: Basic access authentication, Digest access authentication, Integrated Windows Authentication, and .NET Passport Authentication.

Internet Information Services 7.0
IIS 7.0 debuts with Windows Vista and is to be included in Windows Server 2008. IIS 7.0 features a modular architecture: instead of a monolithic server that includes all services, IIS 7 has a core web server engine, and modules offering specific functionality can be added to the engine to enable its features. The advantages are that only the required features need to be enabled, and that functionality can be extended by using custom modules.

IIS 7 will ship with a handful of modules, and Microsoft will make other modules available online. The following sets of modules are slated to ship with the server: HTTP modules, security modules, content modules, compression modules, caching modules, and logging and diagnostics modules.

Writing extensions to IIS 7 using ISAPI has been deprecated in favor of the module API, which lets modules plug in anywhere in the request processing pipeline. Much of IIS's own functionality is built on this API. Modules can be written in C++ or by implementing the IHttpModule interface in a .NET Framework language. Modules can be loaded globally, so that the services they provide affect all sites, or on a per-site basis.

IIS 7 has an integrated-mode application pool in which .NET modules are loaded into the pipeline using the module API rather than ISAPI. Developers therefore have much more control over request processing than was possible in prior versions, and as a result ASP.NET code can be used with all requests to the server. For applications requiring strict IIS 6.0 compatibility, the Classic application pool mode loads ASP.NET as an ISAPI extension.
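The modular design described above (a core engine that does little on its own, with modules attached to stages of the request pipeline and loaded either globally or per site) can be illustrated with a small Python model. This is not the IIS native C++ module API or the .NET IHttpModule interface; the stage names, the `Engine` class and the example modules are all hypothetical simplifications.

```python
from typing import Callable, Dict, List, Optional

Request = Dict[str, str]
Response = Dict[str, object]
# A module inspects/updates the request and response; returning False ends processing early.
Module = Callable[[Request, Response], bool]

STAGES = ("authenticate", "handle", "log")  # hypothetical, much-simplified pipeline stages


class Engine:
    """Core engine with no built-in features: everything comes from modules."""

    def __init__(self) -> None:
        self.global_modules: Dict[str, List[Module]] = {s: [] for s in STAGES}
        self.site_modules: Dict[str, Dict[str, List[Module]]] = {}

    def add(self, stage: str, module: Module, site: Optional[str] = None) -> None:
        if site is None:  # loaded globally: affects every site
            self.global_modules[stage].append(module)
        else:             # loaded for one site only
            stages = self.site_modules.setdefault(site, {s: [] for s in STAGES})
            stages[stage].append(module)

    def run(self, site: str, request: Request) -> Response:
        response: Response = {"status": 404, "body": ""}  # nothing handled yet
        per_site = self.site_modules.get(site, {})
        for stage in STAGES:
            for module in self.global_modules[stage] + per_site.get(stage, []):
                if not module(request, response):
                    return response
        return response


# Example modules (hypothetical).
def require_user(request: Request, response: Response) -> bool:
    if "user" not in request:
        response["status"] = 401
        return False
    return True


def static_file(request: Request, response: Response) -> bool:
    response.update(status=200, body="contents of " + request["path"])
    return True


access_log: List[str] = []


def logger(request: Request, response: Response) -> bool:
    access_log.append(f"{request['path']} {response['status']}")
    return True


engine = Engine()
engine.add("handle", static_file)                          # global
engine.add("log", logger)                                  # global
engine.add("authenticate", require_user, site="intranet")  # one site only
```

Because the authentication module is loaded only for the hypothetical `intranet` site, a request without a user gets 401 there while the same request to any other site gets 200; removing a feature is just a matter of not registering its module.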
Internet Information Services 7.0 (continued)
A significant change from previous versions is that all web server configuration information is stored solely in XML configuration files instead of in the metabase. The server has a global configuration file that provides defaults, and each virtual web's document root (and any subdirectory thereof) may contain a web.config file with settings that augment or override the defaults.

Changes to these files take effect immediately. This marks a significant departure from previous versions, in which web interfaces or machine administrator access were required to change simple settings such as the default document, active modules, and security/authentication. It also eliminates the need to perform metabase synchronization between multiple servers in a web farm.

IIS 7.0 features a completely rewritten administration interface that takes advantage of modern MMC features such as task panes and asynchronous operation. Configuration of ASP.NET is more fully integrated into the administrative interface.

Other changes:
PICS content ratings, support for Microsoft Passport, and server-side image maps are no longer included.
Executing commands via server-side includes is no longer permitted.
IISRESET -reboot has been removed.
The CONVLOG tool, which converts IIS log files into NCSA format, has been removed.
Support for enabling a folder for "Web Sharing" via the Windows Explorer interface has been removed.
IIS Media Pack allows IIS to be used as a bare-bones media server, without using Windows Media Services.
A new FTP module integrates with the new configuration store as well as the new management environment.

Summary
This overview concentrated on HTTP servers. Apache and IIS are the main web serving tools: Apache is still king, while IIS is up and coming. Usage is tracked by the Netcraft Web Server Survey.
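The way a web.config file augments or overrides the global defaults can be modelled as a layered merge, applied from the root directory down to the requested path. This Python sketch assumes flat key/value settings with hypothetical names and values; real IIS configuration is nested XML with section-specific merge rules (for example, collections can be added to or cleared), which the sketch does not attempt to reproduce.

```python
from typing import Dict, List

Settings = Dict[str, object]


def directories_on_path(path: str) -> List[str]:
    """'/admin/reports' -> ['/', '/admin', '/admin/reports']"""
    parts = [p for p in path.split("/") if p]
    return ["/"] + ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def effective_settings(global_config: Settings, web_configs: Dict[str, Settings],
                       path: str) -> Settings:
    """Start from the global defaults, then apply each directory's overrides, root first."""
    settings = dict(global_config)  # copy: the global configuration itself is not modified
    for directory in directories_on_path(path):
        settings.update(web_configs.get(directory, {}))  # deeper directories win
    return settings


# Hypothetical, flattened settings; real files are nested XML sections.
GLOBAL = {"defaultDocument": "default.htm", "directoryBrowse": False,
          "authentication": "anonymous"}
WEB_CONFIGS = {
    "/": {"defaultDocument": "index.html"},
    "/admin": {"authentication": "windows"},
}
```

With these values, a request under `/admin/reports` gets `index.html` as its default document (from the root override) and `windows` authentication (from `/admin`), while the global file still supplies `directoryBrowse`. Because the sketch recomputes the merge on every call, an edited override takes effect on the next lookup, loosely mirroring the immediate effect described above.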