Internet / Intranet Brandeis Continuing and Professional

Internet / Intranet
Fall 2000
Class 4
Web Server Technology
HTTP Protocol
Log Files
Class 4 Agenda
Discuss Homework
Milestone 2 Due Week 6
Mini-Homework Due Next Week
Overview of Web Servers and Server Technology
Presentations
HTTP
The Protocol For Communication Between Web Browser and Server
Log Files
Lab Work
HTTP
Log Files (Mini-Homework)
Brandeis University Internet/Intranet Spring 2000
Web Servers
A Basic Web Server is Just a File Server
Client Requests a File via HTTP Protocol
Server Delivers the File via HTTP Protocol
Server Maps URL to a Subdirectory
Web Server Needs Appropriate Permissions to Access Files/Directories
Supports Non-HTTP Protocols
FTP, Gopher, etc.
A Web Server is Not HTML Specific
Typically Identifies a Filetype by Extension
Or Directory Where File Exists
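The mapping described on this slide can be sketched roughly as follows. This is only an illustration: the document root /var/www/html and the function name resolve_request are assumptions, not taken from the slides or from any particular server.

```python
import mimetypes
import posixpath

DOC_ROOT = "/var/www/html"  # hypothetical document root, not from the slides


def resolve_request(url_path, doc_root=DOC_ROOT):
    """Map a URL path to (file path under the document root, MIME type)."""
    # Normalize "." and ".." segments so the path stays under the root.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    file_path = posixpath.join(doc_root, clean.lstrip("/"))
    # The file type is guessed from the extension, not from the content.
    mime_type, _encoding = mimetypes.guess_type(file_path)
    return file_path, mime_type or "application/octet-stream"
```

For example, resolve_request("/images/logo.gif") yields a path under the document root together with the type image/gif, which shows that the server is not HTML specific.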
Additional Common Web Server Features
Additional Security Beyond That Provided by O/S
Scripting
Ability to Dynamically Create a Web Page
Run a Program Instead of Returning a File (CGI)
Return the Program Output as the Requested File
Administration
Log Files
Performance Monitoring
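The scripting idea above (run a program and return its output as the requested file) can be sketched as below. The one-line program stands in for a real CGI executable and is purely hypothetical; a real CGI program also receives request details through environment variables and emits its own headers, which this sketch leaves out.

```python
import subprocess
import sys


def run_cgi(argv):
    """Run a program and return its standard output as the response body."""
    result = subprocess.run(argv, capture_output=True, check=True)
    return result.stdout


# Hypothetical "script": a one-line Python program standing in for a CGI executable.
script = [sys.executable, "-c", "print('<html><body>Hello</body></html>')"]
body = run_cgi(script)
```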
Advanced Web Server Features
Virtual Hosting
Allow Multiple URLs to Map to Same Computer
Performance Optimization
Caching
Reliability
Scalability
Proxy Servers (For Security and Performance)
Fetch Documents That are on Other Computers
Cache Them Locally
Allows for Easy Scalability
Multiple Proxy Servers Can Cache Documents From One Source Computer
Embedded Scripting
Server Side Includes
Custom Scripting Languages
Server API
Web Servers – Added Functionality
Database Connectivity
SQL, MySQL
Directory Listings
Icons, etc.
Built-In Search Engines
Built-In ImageMap Handling
Multimedia Support
Session Emulation
Streaming Multimedia
Advanced Security
Encrypted HTTP
S-HTTP (Secure HTTP) – CommerceNet
SSL (Secure Sockets Layer) - Netscape
Web Server “Add-Ons”
CGI Substitutes / CGI Optimizations
Cold Fusion
Web Server History
All Web Servers Have a Common Root
httpd (NCSA)
UNIX Orientation
Many Features are Essentially UNIX Features
Apache
Website (O’Reilly)
Netscape Enterprise Server
Microsoft Internet Information Server
A Slew of Others
Apache
UNIX Origins – Now Ported to NT
Evolved From httpd
Freeware
Typical UNIX Application
Public Source Code
Many Defaults, Conventions
BUT: All is Configurable
No GUI Interface
Configured via Scripts, Shell Commands, Config Files
Various “Flavors”
Many Optional Features
API
ApacheSSL
IIS / Netscape
Microsoft IIS
Not Strictly Derived From httpd/Apache
Windows NT
However: Functionally Very Similar to Apache
Emulates Many UNIX Conventions
E.g. Forward Slashes
Configuration via GUI
Personal Web Server
Peer Web Server
Netscape
Multi-Platform
UNIX is Preferred Platform
Less “Open” Than Apache
More Secure?
UNIX File Structure
Forward Slashes (/) to Separate Filenames, Directories
Case Sensitive File Names
Windows is Not
No Limit on Filename Size / Extensions
Extensions are by Convention
Root is “/”
User Home Directory is: “~/”
Symbolic Links / Aliases
Directories Can Be Spread Over Multiple Drives
Can Create Non-Hierarchical Structure
File Permissions
Read, Write, Execute
Separate Permissions for Owner, Group, All
Directories are Special Cases of Files
Execute Permission = Able to Enter (Traverse) the Directory; Read Permission = Able to List It
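A short sketch of how the owner/group/other permission bits read in practice, using Python's standard stat module. The octal modes are made-up examples and the helper name others_can_read is hypothetical.

```python
import stat

# Octal modes as the O/S stores them: file type bits plus owner/group/other bits.
print(stat.filemode(0o100644))  # regular file: -rw-r--r--
print(stat.filemode(0o040755))  # directory:    drwxr-xr-x


def others_can_read(mode):
    """True if users other than the owner and group may read the file."""
    return bool(mode & stat.S_IROTH)
```

A web server running as an unprivileged user typically falls into the "other" class, so a check like others_can_read is one way to reason about whether it could serve a given file.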
Web Server Configuration
Directory Structure
Virtual Document Tree
Access to User Directories
UNIX: ~user
Symbolic Links
Be Careful: May Link You Out of Directory Structure
Case Sensitivity
Ownership Access
Server is a Process Started by a User.
Has the Permissions of the User Who Started It.
Default Documents
Allow Directory Browsing
Scripting
Who is Allowed to Run Scripts?
How are Scripts Identified?
Web Server File Access Control / Security
Directory
O/S Level Security
IP, Domain Level Security
Spoofing
Directory Access
.htaccess
Microsoft Front-Page Extensions
Encryption
S-HTTP
Web Protocols Only
SSL
Sits Between TCP/IP and the Application Protocol (Not HTTP-Specific)
V1.0 – V2.X : Security Holes Found, Fixed
V3.0 Is Current
Uses Port 443
Microsoft PCT
Response to Holes in SSL 2.0
Now Use SSL
Server Administration
Need Sysadmin and O/S Expertise
Lots of “Holes” and Gotchas Whenever Scripts are Allowed
FTP
Who is Allowed to Change Documents?
Who is Allowed to Change Server Configuration?
How do They Get Access?
Direct Access
Remote Access (e.g. FTP)
Log Files
Accessibility
Directory Structure
Management
HTTP
The Protocol For Requesting and Delivering Web Pages
Not Restricted to Returning HTML Files
Client Server Model
Request / Response
TCP/IP Protocol Using Port 80
Supports Other Ports, Can Be Run Over Other Protocols
“Replaced” FTP as the Primary Method For Internet File Transfer
Stateless
Uses MIME Format to Encapsulate Data
Message Structure Similar to SMTP Mail Messages
Message Header (metadata)
Message Body (data)
Separated From Header by a Blank Line
Browser Only Displays Body, Not Header
No Restrictions on Message Size / Format (Unlike SMTP)
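The header/blank-line/body structure described above can be sketched as a small parser. The response bytes are a made-up example and split_message is a hypothetical helper name, not part of any library.

```python
def split_message(raw):
    """Split a raw HTTP message into (start line, header dict, body)."""
    # The first blank line (CRLF CRLF) separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")
status_line, headers, body = split_message(raw)
```

A browser would display only body; the status line and headers are metadata for the client software.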
HTTP Versions
HTTP 1.0 - Commonly Used Version
HTTP 1.1
Formalizes Many Extensions to Version 1.0
Supports Persistent Connections
Supports Compression/Decompression
Supports Virtual Hosting
Single Server (One IP Address) Hosting Multiple Site Names
Supports Multiple Languages
Supports Byte Range Transfers
Useful For Re-Sending Interrupted Data Transfers
Similar to Process Used By XMODEM, etc.
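A minimal sketch of the byte range idea, assuming a client that remembers how many bytes it already received. The helper names resume_header and serve_range are hypothetical; a real server would answer such a request with status 206 (Partial Content).

```python
def resume_header(bytes_already_received):
    """Build the Range header asking for everything after what we already have."""
    return {"Range": "bytes=%d-" % bytes_already_received}


def serve_range(content, first, last=None):
    """Return the slice a server would send for 'bytes=first-last' (inclusive)."""
    if last is None:
        last = len(content) - 1
    return content[first:last + 1]
```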
HTTP Overview
[Diagram: Client (Browser) Sends an HTTP Request to the Web Server and Receives an HTTP Response; the Web Server Returns HTML From the File System or From a Server Application via CGI]
HTTP Commands
Simple Structure
Main Methods
GET <URI> HTTP/1.0
Request the File Specified By the URL
URI Here is the URL Without Protocol/Host/Port (i.e. the Path)
HEAD
Request the HTTP Header Information Only
Don’t Return the File Itself
POST
Sends Data to The Server
Typically Data From a Form
Defined, But Not Widely Implemented
PUT
DELETE
LINK
UNLINK
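How a full URL turns into the first line of a request can be sketched as follows. The host www.example.com and the helper name request_line are illustrative assumptions.

```python
from urllib.parse import urlsplit


def request_line(method, url, version="HTTP/1.0"):
    """Build the first line of an HTTP request from a full URL."""
    parts = urlsplit(url)
    # Only the path (and query string, if any) goes on the request line;
    # the protocol, host, and port are dropped.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return "%s %s %s" % (method, target, version)
```

For instance, request_line("GET", "http://www.example.com:8080/index.htm") produces the line GET /index.htm HTTP/1.0.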
Common HTTP Header Fields
Additional “Parameters” to the HTTP Commands
Used in HTTP Requests:
Accept
Lists the MIME Types That Client Can Accept
E.g. Accept: text/plain, text/html or Accept: */*
Accept-Charset
Lists the Character Sets That Client Can Accept
ASCII, ISO-8859-1 Are Assumed
Accept-Encoding
Accept-Language
Authorization
Basic – UserName:Password (Base64 Encoding)
Cookie
From
E-mail Address of Requesting User
Not Typically Used For Privacy Reasons
Primarily Used By Automated Clients (e.g. Bots)
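The Basic scheme listed under Authorization above can be sketched with the standard base64 module. The user name and password are sample values, and both helper names are hypothetical. Note that the second function reverses the first without any key, which is why Basic authorization by itself offers no secrecy.

```python
import base64


def basic_authorization(username, password):
    """Build the value of an Authorization header for the Basic scheme."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("ascii"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value):
    """Recover (username, password); Base64 is an encoding, not encryption."""
    scheme, _, token = header_value.partition(" ")
    username, _, password = base64.b64decode(token).decode("ascii").partition(":")
    return username, password
```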
Common HTTP Header Fields (2)
Host
Virtual Host – One Server Handles Multiple Sites
If-Modified-Since
Only Return Data if it Has Been Modified Since This Date
Pragma
General Purpose For “Additional” Headers Not in Standard
Referer (Sic: Misspelled This Way in the HTTP Specification)
The URL That Referred One to This URL
User-Agent
Name/Version of the HTTP Client
Used in HTTP Responses:
Allow
Lists the Available Commands Supported by Server
Content-Encoding
Allows for Passing Data in Compressed Formats
Content-Language
Describes the Natural Language of the Intended Audience
Common HTTP Header Fields (3)
Content-Length
Size of the Message Body
Content-Type
The MIME Type For the Data
Date
Expires
HTTP Clients Should Not Cache Data After This Date
Last-Modified
Location
Used For Redirection
MIME-Version
Pragma
E.g. no-cache
Retry-After
When Server is Unavailable. Info On When to Try Back
Server
Name/Version of the HTTP Server
Common HTTP Header Fields (4)
Title
Descriptive Title of the File
WWW-Authenticate
When Authorization Denied, Tells Client Which Methods of Authentication are Supported
HTTP Status Codes
Returned By the Server In First Line of Response
Informational (100-199)
Successful (200-299)
Redirection (300-399)
Location in HTTP Header Specifies Redirection
Client Error (400-499)
Server Error (500-599)
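The five status ranges above follow directly from the first digit of the code, which the following sketch makes explicit. The helper name status_class is hypothetical.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(status_line):
    """Return the class name for a response's first line, e.g. 'HTTP/1.0 404 Not Found'."""
    code = int(status_line.split()[1])
    return STATUS_CLASSES.get(code // 100, "Unknown")
```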
Common Status Values
200 - OK
201 - Created (Post Request Was Fulfilled)
204 - No Content (OK. Nothing For Client to Display)
300 - Multiple Choices
Requested Resource Available From Multiple Locations.
List of Locations Returned in the Response.
301 - Moved Permanently
302 - Moved Temporarily
304 - Not Modified
Document Hasn’t Been Modified Since If-Modified-Since Date
400 - Bad Request
401 – Unauthorized
403 - Forbidden
404 – Not Found
500 – Internal Server Error
501 – Not Implemented (Server Does Not Support This Request)
502 – Bad Gateway (Invalid Response From Upstream Server)
503 – Service Unavailable
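As one illustration of how a status value gets chosen, the 200 versus 304 decision for a request carrying If-Modified-Since can be sketched as below. The dates are made up and conditional_status is a hypothetical helper; real servers apply further rules that are omitted here.

```python
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the file is unchanged since the client's copy, else 200."""
    if if_modified_since is None:
        return 200
    modified = parsedate_to_datetime(last_modified)
    client_copy = parsedate_to_datetime(if_modified_since)
    return 304 if modified <= client_copy else 200
```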
Cookies
Cookies Are Name Value Pairs
Stored by the Client
Passed in the HTTP Header
Cookies Have Associated Expiration
Session (Default)
Date / Time
Associated With a URL Path, Not a Page!
Allows Passing Parameters Between Web Pages
Thus Cookies are Used to Provide State
Information to a Stateless Protocol
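A small sketch of cookies as name/value pairs scoped to a URL path, using Python's standard http.cookies module. The cookie names, values, and the /shop path are hypothetical.

```python
from http.cookies import SimpleCookie

# What a server might send: a name/value pair scoped to a URL path.
outgoing = SimpleCookie()
outgoing["cart_id"] = "12345"          # hypothetical cookie name and value
outgoing["cart_id"]["path"] = "/shop"  # sent back for any page under /shop
set_cookie_header = outgoing.output()

# What the client sends back on later requests under that path.
incoming = SimpleCookie()
incoming.load("cart_id=12345; visits=3")
cart_id = incoming["cart_id"].value
```

Because no expiration is set here, this would be a session cookie, the default mentioned above.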
Web Server HTTP Functionality
Content Negotiation
Choose From Several Different Formats Based on Request
Language Negotiation
Choose From Versions of Same Document Based on Request
Support for HTTP-Put, HTTP-Delete
Keep-Alive
As-Is
Server Doesn’t Add HTTP Headers
Allows You to Create Specific Behavior
Redirect to Another Site
Never Saved in Browser’s Cache
Class Exercise: HTTP
http://www.mkat.com/brandeis/httplist.cfm
Viewhttp.exe
Server Log Files
Records Server Activity
Some Definitions
Hits
Each HTTP Request is a Hit
Accessing a Web Page May Result in Multiple Hits
E.g. Each Graphic is a Hit
Page Views
Accessing a Single Web Page is a Page View
E.g. Typing in a URL or Clicking on a Link
Visits
A Single Client’s Visit to Your Entire Site (Session)
May Include Multiple Page Views
What Constitutes a Second Visit From the Same Client?
Why is This Important?
Terms are Sometimes Used Interchangeably and Improperly
Compare Apples to Apples
Important for Commercial Web Sites
Advertising is Based on Site Access
Typically Sold on Page View Basis
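The difference between the three terms can be made concrete with a rough counting sketch. Two choices here are assumptions rather than standards: which file extensions count as a "page", and the 30-minute gap that starts a new visit from the same client. Analysis tools differ on both, which is why the numbers are hard to compare.

```python
from datetime import datetime, timedelta

PAGE_EXTENSIONS = (".htm", ".html", "/")   # assumption: what counts as a "page"
VISIT_TIMEOUT = timedelta(minutes=30)      # assumption: gap that starts a new visit


def summarize(requests):
    """requests: list of (client, datetime, path) tuples, sorted by time."""
    hits = len(requests)
    page_views = sum(1 for _, _, path in requests if path.endswith(PAGE_EXTENSIONS))
    visits = 0
    last_seen = {}
    for client, when, _ in requests:
        if client not in last_seen or when - last_seen[client] > VISIT_TIMEOUT:
            visits += 1
        last_seen[client] = when
    return hits, page_views, visits
```

One page with two graphics, requested again two hours later by the same client, would count as four hits, two page views, and two visits under these assumptions.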
Server Log Files
Many Variations to Web Server Log File Formats
Four Log Files
Access (Transfer) Log
Each Hit is Recorded
User, Date/Time, HTTP Request, etc.
Error Log
Date/Time, Error
Referrer Log
Referring Page, Destination Page
Agent (User) Log
Client’s Browser
Clearly a Need for Standardization
Linking the Four Log Files Together
Common Log Format
Host
IP Address (or Hostname) of Client
Some Servers Perform Lookup of IP Address
RFC931
Remote Login Name Reported by the Client’s Ident Service (RFC 931)
Seldom Used.
Authuser
HTTP Request: Authorization
UserName if Username Authorization is Required
Time Stamp
HTTP Response: Date
E.g. [10/Jun/1998:14:23:34 -0700]
Request
The Actual HTTP Request
E.g. GET /index.htm HTTP/1.1
Common Log Format (2)
Status
The HTTP Response Status Code
Transfer Volume
HTTP Response: Content-Length
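The seven Common Log Format fields can be pulled apart with a regular expression, as in this sketch. The sample line is made up (it reuses the timestamp and request shown above with a documentation IP address), and parse_clf is a hypothetical helper.

```python
import re
from datetime import datetime

CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" means no value was recorded for the field.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Made-up sample line reusing the timestamp and request shown above.
sample = '192.0.2.7 - - [10/Jun/1998:14:23:34 -0700] "GET /index.htm HTTP/1.1" 200 2326'
entry = parse_clf(sample)
```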
Extended Log File Format
Seven Common Log Format Fields Plus
Referrer
HTTP Request: Referer
User Agent
HTTP Request: User-Agent
Identifies Browser
Other Common Fields
Cookies
Can Help Identify Users
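The two extra fields can be read off an extended-format line by splitting on the quote characters. This sketch assumes the referrer and user agent follow the seven common fields as two quoted strings, in that order, and that no field contains an embedded quote; the sample line and the helper name parse_extended_tail are made up.

```python
def parse_extended_tail(line):
    """Pull the request, referrer, and user agent out of an extended-format line."""
    # Quoted fields make this easy: splitting on '"' leaves them at odd positions.
    parts = line.split('"')
    if len(parts) < 7:
        return None
    return {"request": parts[1], "referrer": parts[3], "user_agent": parts[5]}


sample = ('192.0.2.7 - - [10/Jun/1998:14:23:34 -0700] "GET /index.htm HTTP/1.1" '
          '200 2326 "http://www.example.com/links.htm" "Mozilla/4.5 [en] (WinNT; I)"')
fields = parse_extended_tail(sample)
```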
Issues
Client vs. User
Typically Don’t Have User Level Information
Only Record IP Address of Computer Used For Access
If Fixed IP Address For a Single User’s Machine
This Can Identify the User
Dynamically Assigned IP Addresses
Identifies the Overall Domain (e.g. AOL.com)
Proxy Servers
All Clients Appear to Have the IP Address of the Proxy Server
Multiple “Sessions” at Same Time
Impossible to Have Truly Accurate Information
Log File Analysis Software Has Algorithms to Identify Page Views, Visits
Client Level Caching Affects Logs
“ISP” Level Caching Affects Logs
E.g. AOL Maintains a Cache
No Requirement for Clients, ISPs to Follow Expiration Info
Log File Maintenance on Server
Log Files Grow Rapidly
Log Files Compress Very Nicely
Server Configurable
Generate Daily/Weekly/Monthly Logs
Maintenance Scripts to Clean Up Log Files
Compress
Archive
Cycle
E.g. Maintain Current Month’s Files
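Why log files compress so well can be seen in a quick sketch with the standard gzip module. The log content is made up, and because every line here is identical the ratio is far better than a real log would give; it only illustrates the effect of repetitive text.

```python
import gzip

# Made-up log content: log lines are highly repetitive, which is why they compress well.
line = b'192.0.2.7 - - [10/Jun/1998:14:23:34 -0700] "GET /index.htm HTTP/1.1" 200 2326\n'
log_data = line * 1000

compressed = gzip.compress(log_data)
ratio = len(compressed) / len(log_data)

# Round trip: the archive must restore the original bytes exactly.
restored = gzip.decompress(compressed)
```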
Log File Analysis
Big Business
Bread and Butter of Sites Driven By Advertising Revenue
Evaluation Factors
Log File Formats Supported
Ability to Link Multiple Logs
How Log Files are Accessed (e.g. via FTP)
Display Methodology
E.g. Available Via Web Pages
Lookup Capabilities
E.g. Map User-Agent to Browser
E.g. Resolve IP Addresses to Domains, Regions
Level of Analysis
E.g. Calculating Visits, Return Visitors
Configurability
Drill-Down Capabilities
Enterprise Capabilities
Ability to Manage Multiple Sites
Log File Analysis Options
Important to Understand the Core Log Files
Log File Analysis Programs Make Some Assumptions
Freeware
Commercial
Service Bureaus
In Class Exercise / Mini Homework
Download
http://www.mkat.com/brandeis/sample.log
View in Text Editor
Load Into Excel
Delimited / Spaces
Review the Log File in Detail
(Do Not Use Analysis Tools)
Describe What You Can Learn From the Log File
Add it To Your Homepage Along With In Class Exercises
Due Next Week
Resources
HTTP
Stein pp. 47-57
Server Comparison
http://webcompare.internet.com/chart.htm
Apache Server
www.apache.org
Website Server
http://website.ora.com
Microsoft IIS
http://www.microsoft.com/NTWorkstation/downloads/Recommended/ServicePacks/NT4OptPk/Default.asp