Transcript Document

The Internet

CT101 – Computing Systems

• The Internet – Architecture – Addressing – Protocols – DNS – E-Mail – WWW – Security

Contents

The Internet

• The Internet: An internet that spans the world – Original goal was to develop a means of connecting networks that would not be disrupted by local disasters.

– Today it has shifted from an academic research project to a commercial undertaking.

Internet Architecture

• Internet Service Provider (ISP) – Tier-1 – Tier-2 – Tier-1 and tier-2 ISPs are networks of routers that collectively provide the Internet’s communication infrastructure • Access ISP: Provides connectivity to the Internet – Traditional telephone (dial-up connection) – Cable connections – DSL – Wireless

Internet Composition

Internet Addressing

• IP address: pattern of 32 bits (IPv4) or 128 bits (IPv6); 32-bit addresses are often represented in dotted decimal notation – E.g. 140.203.8.22 represents an Internet address – 10001100 11001011 00001000 00010110 • Mnemonic address (alternative addressing system more suitable for humans) – Domain names – Top-Level Domains
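The relationship between the dotted decimal form and the underlying bit pattern can be checked with a few lines of Python. This is only a small sketch using the standard `ipaddress` module; the helper name `dotted_to_bits` is made up for illustration.

```python
import ipaddress


def dotted_to_bits(address: str) -> str:
    """Return the 32-bit pattern of an IPv4 address, one group per byte."""
    packed = ipaddress.IPv4Address(address).packed  # the 4 raw bytes
    return " ".join(f"{byte:08b}" for byte in packed)


# The address used on the slide.
print(dotted_to_bits("140.203.8.22"))
# prints: 10001100 11001011 00001000 00010110
```

Each decimal number between the dots is simply one byte of the 32-bit address written in base 10.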

Package-shipping example

Internet Software Layers

• Application: Constructs message with address • Transport: Chops message into packets • Network: Handles routing through the Internet • Link: Handles actual transmission of packets

Following a message through the Internet

Internet Protocols - TCP/IP Protocol Suite

• Transport Layer – TCP – UDP • Network Layer – IP (IPv4 and IPv6)

Choosing between TCP and UDP

DNS

• IP addresses are difficult to remember – [email protected] would be difficult to remember – If JohnDoe’s mail server moves to another machine, then his e-mail address would not be valid anymore – Something like [email protected] would be appropriate • A mechanism is needed to translate wuzwuz.ucg.ie to the IP address – DNS was invented to solve this problem • It is a hierarchical, domain-based naming scheme and a distributed database system for implementing this naming scheme • Usage: – To map a name onto an IP address, an application program calls a library procedure, called the resolver, passing it the name as a parameter (e.g. gethostbyname() is a resolver) – The resolver sends a UDP packet to a local DNS server, which looks up the name and returns the IP address to the resolver – The resolver returns the IP address to the application, which can then establish a TCP connection with the destination (or send it UDP packets)
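As a rough illustration of the usage steps above, the sketch below calls the system resolver from Python. It uses `socket.getaddrinfo`, a present-day counterpart of gethostbyname(); the wrapper name `resolve` is an assumption made for this example, and which DNS server is contacted depends entirely on the local machine's configuration.

```python
import socket


def resolve(name: str) -> list[str]:
    """Ask the system resolver for the IP addresses of a host name."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address in text form.
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Calling `resolve("localhost")` typically returns the loopback address; the application can then open a TCP connection to any of the addresses returned.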

DNS Name Space

• The Internet is divided into over 200 top-level domains – Each domain is divided into sub-domains, which are further partitioned, and so on – All domains can be represented by a tree • The leaves of the tree represent domains that have no sub-domains (but contain machines) • A leaf domain may contain a single host or represent a company and contain thousands of hosts • Top-level domains are either generic or country domains

Domain Names

• Can be either absolute (ends with a period, e.g. eng.sun.com.) or relative (doesn’t end with a dot) – Relative ones have to be interpreted in a context to find the true meaning – Both refer to a specific node in the tree and all the nodes under it • Are case insensitive (edu, Edu, EDU are the same thing) • Component names can be up to 63 characters and full names should not exceed 255 characters • There is no rule against registering under two top-level domains (sony.com and sony.nl) • Each domain controls how it allocates the domains under it – e.g. Japan has the domains ac.jp and co.jp that mirror edu and com – The Netherlands doesn’t make this distinction • To create a new domain, permission is required from the domain that will include it; once created, it can create sub-domains without having to ask permission from the higher-up domains.
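The naming rules above (absolute versus relative names, case insensitivity, the 63 and 255 character limits) can be expressed as a few small checks. This is a minimal sketch; the function names are invented for illustration and real resolvers apply further rules about which characters are allowed.

```python
MAX_LABEL = 63   # longest allowed component name
MAX_NAME = 255   # longest allowed full name


def is_absolute(name: str) -> bool:
    """Absolute names end with a period, e.g. 'eng.sun.com.'."""
    return name.endswith(".")


def is_valid(name: str) -> bool:
    """Check only the length limits on the components and the full name."""
    labels = name.removesuffix(".").split(".")
    return (len(name) <= MAX_NAME
            and all(0 < len(label) <= MAX_LABEL for label in labels))


def same_name(a: str, b: str) -> bool:
    """Domain names are compared without regard to case."""
    return a.lower() == b.lower()


print(is_absolute("eng.sun.com."), is_absolute("eng.sun.com"))
print(same_name("EDU", "edu"))
```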

Name servers

• In theory, one DNS server could service all requests – In practice it would be overloaded • To solve this, the DNS name space is divided into non-overlapping zones – Each zone contains some part of the tree and the name servers holding the zone information • A zone has a primary name server (gets its information from disk) and one or more secondary name servers (get their information from the primary)

Name Servers – Lookup mechanism

• In the example, a resolver on flits.cs.vu.nl is looking for the IP address of linda.cs.yale.edu – The resolver sends a query containing the domain name sought – The query is forwarded by the local name server to the name server for the edu domain, which is found in its database, and so on (using recursive query; some servers don’t implement recursive query and return the address of the next server to try) • Once the records get back to the cs.vu.nl name server, they are entered in a local cache, in case they are needed later
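The recursive lookup and the caching step can be imitated with a toy model. Everything in the table below is made up for illustration: the chain of servers after edu (yale.edu, cs.yale.edu) is an assumption about how the example continues, and 192.0.2.17 is a documentation address, not the real address of any host.

```python
# Toy name servers: each maps a name suffix it knows about either to an
# address ("A") or to the next server to ask ("NS"). All data is made up.
SERVERS = {
    "cs.vu.nl": {"edu": ("NS", "edu")},
    "edu": {"yale.edu": ("NS", "yale.edu")},
    "yale.edu": {"cs.yale.edu": ("NS", "cs.yale.edu")},
    "cs.yale.edu": {"linda.cs.yale.edu": ("A", "192.0.2.17")},
}


def suffixes(name):
    """'a.b.c' -> ['a.b.c', 'b.c', 'c'], most specific first."""
    labels = name.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def recursive_query(server, name, caches, trace):
    """Resolve name starting at server; trace records every server asked."""
    trace.append(server)
    cache = caches.setdefault(server, {})
    if name in cache:                      # answered from the local cache
        return cache[name]
    for suffix in suffixes(name):
        record = SERVERS[server].get(suffix)
        if record is None:
            continue
        kind, value = record
        if kind == "A":
            answer = value
        else:                              # forward the query on our behalf
            answer = recursive_query(value, name, caches, trace)
        cache[name] = answer               # remember it for later queries
        return answer
    raise LookupError(f"{server} cannot resolve {name}")


caches, first, second = {}, [], []
recursive_query("cs.vu.nl", "linda.cs.yale.edu", caches, first)
recursive_query("cs.vu.nl", "linda.cs.yale.edu", caches, second)
print(first)   # every server in the chain is asked once
print(second)  # the repeat query is answered by the cs.vu.nl cache
```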

Internet Corporation for Assigned Names & Numbers (ICANN)

• Allocates IP addresses to ISPs who then assign those addresses within their regions.

• Oversees the registration of domains and domain names.

Traditional Internet Applications

• Electronic Mail (email) – Domain mail server collects incoming mail and transmits outgoing mail – Mail server delivers collected incoming mail to clients via POP3 or IMAP • File Transfer Protocol (FTP) • Telnet and SSH • WWW (World Wide Web)

More Recent Applications

• Voice Over IP (VoIP) • Internet Radio – N-unicast – Multicast

E-Mail

• Architecture and services • User agent • Message formats • Message transfer agents – SMTP • Final delivery

E-Mail Architecture

• An e-mail system consists of two parts – User agents, which allow people to read and send e-mail • Local programs that provide a command-based or graphical method for interacting with the e-mail system – Message transfer agents, which move the messages from source to destination • Typically system daemons or processes that run in the background, with the job of moving messages

E-Mail functions

• E-mail system functions – Composition: refers to the process of creating messages and answers; although any text editor can be used for the text of the message, the system itself can provide assistance with addressing and the numerous header fields attached to each message – Transfer: refers to moving messages from the originator to the recipient; this requires establishing a connection to the destination or some intermediate machine, outputting the message and releasing the connection

E-mail functions

• E-mail system functions – Reporting: has to do with telling the originator what happened to the message; Was it delivered? Was it rejected? Was it lost? – Displaying: showing incoming messages is important, so that people can read their e-mail; sometimes conversion or a special viewer is required (e.g. if the message is a PostScript file or an audio file) – Disposition: what the recipient does after the message has been received; possibilities include throwing it away before reading it, throwing it away after reading it, saving it and so on.

E-mail architecture and functions

• Distinction between the envelope and its contents – The envelope encapsulates the message and contains the information needed for transporting the message, such as destination address, priority and security level – The message has two parts: headers (interpreted by the user agent) and body (information for the human recipient)

E-mail user agent

• Sending e-mail – The user must provide the message and the destination address (user@dns-address) – User agents may support mailing lists • Receiving e-mail – When a user agent is started, it looks at the user’s mailbox before displaying anything – Then it may announce the number of messages in the mailbox

E-mail message format

• Basic ASCII e-mail message using RFC 822 – A message consists of a primitive envelope (described in RFC 821), some number of header fields, a blank line and then the message body – Each header field (logically) consists of a single line of ASCII text containing the field name, a colon and, for most fields, a value – RFC 822 was designed long ago and doesn’t clearly distinguish between the envelope fields and the header fields • It was revised in RFC 2822; however, it wasn’t possible to completely redo it due to its widespread usage
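The layout described above (header fields, a blank line, then the body) can be seen by parsing a small message with Python's standard `email` package. The addresses and subject below are invented for illustration.

```python
from email import message_from_string

# Header fields, one per line, then a blank line, then the body.
RAW = (
    "From: elinor@abcd.com\n"
    "To: carolyn@xyz.com\n"
    "Subject: Lunch\n"
    "\n"
    "Are you free at noon?\n"
)

msg = message_from_string(RAW)
headers = dict(msg.items())         # field name -> value
body = msg.get_payload()            # everything after the blank line

for name, value in headers.items():
    print(f"{name!r}: {value!r}")
print(body.strip())
```

The envelope is not part of this text at all; it is what the transfer agents exchange when they move the message.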

Email message transfer

• The message transfer mechanism is concerned with relaying messages from the originator to the destination – This can be done by establishing a transport-level connection between the source and the destination and then just transferring the message • SMTP – Simple Mail Transfer Protocol – The source machine establishes a TCP connection to port 25 on the destination machine, where an SMTP daemon listens. This daemon accepts incoming connections and copies messages from them into the appropriate mailboxes – If a message can’t be delivered, an error report containing the first part of the undeliverable message is returned to the sender – It is a simple ASCII protocol

SMTP Protocol

• Connection establishment (on port 25) • Data exchange – The sending machine (operating as a client) waits for the destination machine (operating as a server) to talk first – The server begins by sending a line of text giving its identity and telling whether it is prepared to receive mail • If it is not, then the client releases the connection and tries again later – If the server is willing to accept mail, then the client announces whom the e-mail is coming from and whom it is going to – If such a recipient exists at the server end, then the client gets the go-ahead to send the message – The client sends the message and the server acknowledges it • Connection is released
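The exchange above can be sketched as a toy dialogue between a client function and an in-memory server object. This is not a network implementation: the class and function names, the mailbox addresses and the host name client.example are assumptions for illustration, although the command names and three-digit reply codes follow standard SMTP.

```python
class ToySmtpServer:
    """Answers SMTP-style command lines for a fixed set of mailboxes."""

    def __init__(self, name, mailboxes):
        self.name = name
        self.mailboxes = {box: [] for box in mailboxes}
        self.recipients = []

    def greeting(self):
        return f"220 {self.name} ready"           # server talks first

    def command(self, line):
        verb, _, arg = line.partition(" ")
        verb = verb.upper()
        if verb == "HELO":
            return f"250 {self.name} says hello to {arg}"
        if verb == "MAIL":                          # MAIL FROM:<address>
            return "250 sender ok"
        if verb == "RCPT":                          # RCPT TO:<address>
            box = arg[arg.find("<") + 1:arg.find(">")]
            if box not in self.mailboxes:
                return "550 no such user here"
            self.recipients.append(box)
            return "250 recipient ok"
        if verb == "DATA":
            return "354 send the message"
        if verb == "QUIT":
            return "221 closing connection"
        return "500 unknown command"

    def data(self, text):
        for box in self.recipients:                 # copy into mailboxes
            self.mailboxes[box].append(text)
        self.recipients = []
        return "250 message accepted"


def send_mail(server, sender, recipient, text):
    """Run the client side of the dialogue and return all server replies."""
    replies = [server.greeting()]
    if not replies[0].startswith("220"):
        return replies                              # try again later
    for line in ("HELO client.example",
                 f"MAIL FROM:<{sender}>",
                 f"RCPT TO:<{recipient}>"):
        replies.append(server.command(line))
        if not replies[-1].startswith("250"):       # refused: give up
            replies.append(server.command("QUIT"))
            return replies
    replies.append(server.command("DATA"))
    replies.append(server.data(text))
    replies.append(server.command("QUIT"))
    return replies


server = ToySmtpServer("xyz.com", ["carolyn@xyz.com"])
for reply in send_mail(server, "elinor@abcd.com", "carolyn@xyz.com", "Hi"):
    print(reply)
```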

SMTP typical problems

• Some old implementations can’t handle messages longer than 64 KB • If the server and client have different timeouts, one of them may give up while the other is still busy, unexpectedly terminating the connection • In some situations infinite mail storms can be triggered – If host 1 holds mailing list A and host 2 holds mailing list B, and each list contains an entry for the other one, then a message sent to either list could generate a never-ending amount of e-mail traffic unless it is checked • RFC 2821 defines ESMTP (Extended SMTP) – Clients wanting to use it should start with EHLO instead of HELO; if this is rejected, then the server is a regular SMTP server

Final delivery

• Assuming that all machines can send and receive mail all the time, the e-mail model described so far works • This model breaks down for people accessing the Internet over a dial-up connection – What happens when Elinor wants to send Carolyn e-mail and Carolyn is not currently online?

• One solution is to have a message transfer agent on an ISP machine; since this transfer agent can be online all the time, e-mail can be sent to it 24 hours a day – This solution creates another problem: how does the user get e-mail from the ISP’s message transfer agent? • The solution is to create another protocol that allows user transfer agents (on client PCs) to contact the message transfer agent (on the ISP’s machine) and allows e-mail to be copied from the ISP to the user • One such protocol is POP3 (Post Office Protocol Version 3), RFC 1939

Final delivery

(a) Sending and reading mail when the receiver has a permanent Internet connection and the user agent runs on the same machine as the message transfer agent. (b) Reading e-mail when the receiver has a dial-up connection to an ISP.

POP3

• Starts when the user starts the mail reader • The mail reader calls up the ISP (if there is no connection) and establishes a TCP connection with the message transfer agent on port 110; the protocol then goes through three states – Authorization • The user logs in by sending a username and password – Transactions • The user collects the e-mails and marks them for deletion – Update • Causes the marked e-mails to be deleted
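The three POP3 states can be modelled as a small state machine. The sketch below is a toy model held entirely in memory, not a POP3 client; the class name, the credentials and the mailbox contents are assumptions for illustration, while the method names echo the POP3 commands RETR, DELE and QUIT.

```python
class ToyPop3Session:
    """Walks through AUTHORIZATION -> TRANSACTION -> UPDATE."""

    def __init__(self, mailbox, user, password):
        self.mailbox = mailbox              # list of messages, edited in place
        self.credentials = (user, password)
        self.state = "AUTHORIZATION"
        self.marked = set()                 # message numbers marked for deletion

    def login(self, user, password):
        if self.state == "AUTHORIZATION" and (user, password) == self.credentials:
            self.state = "TRANSACTION"
            return True
        return False

    def _require_transaction(self):
        if self.state != "TRANSACTION":
            raise RuntimeError("not logged in")

    def retr(self, number):
        """Fetch a message; numbering starts at 1."""
        self._require_transaction()
        return self.mailbox[number - 1]

    def dele(self, number):
        """Only mark the message; nothing is removed yet."""
        self._require_transaction()
        self.marked.add(number)

    def quit(self):
        """Enter the update state, where marked messages are really deleted."""
        if self.state == "TRANSACTION":
            self.state = "UPDATE"
            self.mailbox[:] = [m for i, m in enumerate(self.mailbox, 1)
                               if i not in self.marked]


mailbox = ["first message", "second message"]
session = ToyPop3Session(mailbox, "carolyn", "secret")
session.login("carolyn", "secret")
print(session.retr(1))
session.dele(1)
session.quit()
print(mailbox)
```

Deleting only at the end means that a session which breaks off before QUIT leaves the mailbox untouched.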

IMAP

• POP3 works fine for users with one e-mail account with one ISP, accessed from one PC – If mail is accessed from different locations, the user may lose e-mails, security issues may appear, etc. • An alternative final delivery protocol is IMAP (Internet Message Access Protocol), defined in RFC 2060 – Instead of assuming that all messages will be downloaded and the user will work offline after that (like POP3), IMAP assumes that all e-mail will remain on the server indefinitely, in multiple mailboxes – Provides extensive mechanisms to read messages or parts of messages, and mechanisms to create, destroy and manipulate multiple mailboxes.

POP3 vs. IMAP

Web Mail

• Various companies (e.g. Hotmail and Yahoo) provide e-mail service using Web mail.

• Normal message transfer agents are listening on port 25 for incoming SMTP connections • Messages are delivered using special web pages; when the user goes to the e-mail Web page, a form is presented in which the user is asked for a login name and password.

WWW • Architectural Overview • Static Web Documents • Dynamic Web Documents • HTTP – The HyperText Transfer Protocol • Performance Enhancements • The Wireless Web

Architecture Overview

• The Web is a collection of web pages • Each page contains links to other pages • Hypertext – the idea of having one page point to another. It is text, displayed on a computer, with references (hyperlinks) to other text that the reader can immediately follow • Browser – program to view pages • Hyperlinks – strings of text that are links to other pages • Example: – Typical web page – The page reached by clicking on Department of Animal Psychology.

Architectural Overview

• The browser displays a page on the client machine – When the user clicks on a link, the browser sends a message to the abcd.com web server asking it for the page – When the page arrives, it is displayed; if it contains a hyperlink to a page on xyz.com and that link is clicked, then the browser sends a message to the xyz.com server and the process continues

Client side

• Pages are named using URLs (Uniform Resource Locators), e.g. http://www.abcd.com/products.html – Name of the protocol (http) – DNS name of the machine where the page is located (www.abcd.com) – The name of the file containing the page (products.html) • When the link is selected: – Browser determines the URL (by reading the input) – Browser asks the DNS server for the IP address of www.abcd.com – DNS replies with the IP address: 156.106.192.32 – Browser makes a TCP connection to port 80 on 156.106.192.32 – It sends a request asking for the file /products.html – Web server www.abcd.com sends the file /products.html – TCP connection is released – Browser displays all the text in /products.html

• Web pages are written in a standard language called HTML – A page may also consist of a formatted document in PDF format, an icon in GIF format, a video in MPEG format, a song in MP3 format, or any other format
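The first steps a browser takes with a URL, splitting it into its three parts and forming the request for the file, can be sketched with the standard `urllib.parse` module. The function name `http_request_for` is invented for this example, and the request shown is a bare-bones HTTP/1.0 request rather than what a real browser sends.

```python
from urllib.parse import urlsplit


def http_request_for(url: str) -> tuple[str, int, bytes]:
    """Return (host to look up in DNS, TCP port, request bytes to send)."""
    parts = urlsplit(url)
    host = parts.hostname            # DNS name of the machine
    port = parts.port or 80          # the default port for http
    path = parts.path or "/"         # name of the file on the server
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    return host, port, request


host, port, request = http_request_for("http://www.abcd.com/products.html")
print(host, port)
print(request.decode("ascii"))
```

What remains is the DNS lookup of `host`, the TCP connection to `port`, and sending `request` over that connection.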

Client side

• The browser may have problems interpreting all of these formats; rather than making browsers larger and larger, a more general solution is adopted • When a server returns a page, it usually returns some information about the page – MIME type of the page • Pages of type text/html are just displayed directly • If the MIME type is not one of the built-in types, then the browser consults an internal table of associations between MIME types and viewers • Two possibilities – Plug-ins: special modules that the browser loads into its memory space – Helper applications: separate processes that take the name of the file to display as a parameter
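The table lookup described above might look roughly like this. The set of built-in types, the viewer names and the fallback of saving to disk are all assumptions made for illustration; only the MIME type strings themselves are standard.

```python
# Types the (hypothetical) browser can display by itself.
BUILT_IN = {"text/html", "text/plain", "image/gif"}

# Hypothetical association table: MIME type -> (kind of viewer, viewer name).
VIEWERS = {
    "application/pdf": ("plug-in", "pdf-viewer"),
    "audio/mpeg": ("helper application", "music-player"),
}


def choose_viewer(mime_type):
    """Decide how a page of the given MIME type gets displayed."""
    if mime_type in BUILT_IN:
        return ("built-in", "browser")
    # Unknown types fall back to saving the file (an assumed policy).
    return VIEWERS.get(mime_type, ("none", "save to disk"))


for mime_type in ("text/html", "application/pdf", "video/mpeg"):
    print(mime_type, "->", choose_viewer(mime_type))
```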

Server side

• Typical web server operations: – Accept a TCP connection – Get the name of the file requested – Get the file (from disk); this can be a lengthy operation, since every disk access takes on average 5 ms (access time) plus the time to read the file (which depends on the file length) – Return the file to the client – Release the TCP connection • Improvements – Maintain a cache of the “n” most recently accessed files – Multithreaded server
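The first improvement, keeping the most recently accessed files in memory so that repeat requests avoid the disk, can be sketched as a small least-recently-used cache. The class name and the stand-in "disk" dictionary are assumptions for illustration; a real server would read actual files.

```python
from collections import OrderedDict


class FileCache:
    """Keeps the `capacity` most recently accessed files in memory."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk   # slow path: goes to the disk
        self.files = OrderedDict()             # oldest entry first
        self.disk_reads = 0

    def get(self, name):
        if name in self.files:
            self.files.move_to_end(name)       # mark as recently used
            return self.files[name]
        data = self.read_from_disk(name)
        self.disk_reads += 1
        self.files[name] = data
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)     # evict the least recently used
        return data


# A dictionary stands in for the disk in this example.
disk = {"/a.html": b"A", "/b.html": b"B", "/c.html": b"C"}
cache = FileCache(2, disk.__getitem__)
for name in ("/a.html", "/b.html", "/a.html", "/c.html"):
    cache.get(name)
print(cache.disk_reads, list(cache.files))
```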

Server side

• If too many requests come in each second, the CPU will not be able to handle the load, no matter how many disks are used in parallel – The solution is to add more nodes (computers), possibly with replicated disks (server farms) – A front end still accepts incoming requests and “sprays” them over multiple CPUs rather than multiple threads – Individual machines may be multithreaded and pipelined as before

URL – Uniform Resource Locator

• Some common URLs

Stateless and cookies

• The Web is stateless, with no concept of a login session; the browser sends a request to the server and gets back a file; the server then forgets that it has ever seen that particular client – A quick solution would be to observe clients’ IP addresses (not good, since they could be NAT-ed) • Cookies (introduced by Netscape) solve this problem by having the server supply additional information when a client requests a page; this information may include a cookie, which is a small (at most 4 KB) file or string – Contains up to five fields – When the browser sends a request for a page, it first checks whether it has a cookie associated with the domain the request is going to. If so, it appends this cookie to the request; the server gets it and interprets it any way it wants
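The browser side of the cookie mechanism can be sketched as a per-domain store that is consulted before each request. This is a simplified model: the class name, the cookie name `session` and its value are made up, and matching is done on the exact domain only, ignoring the path and expiry fields. Parsing uses the standard `http.cookies` module.

```python
from http.cookies import SimpleCookie


class ToyBrowser:
    """Remembers cookies per domain and attaches them to later requests."""

    def __init__(self):
        self.jar = {}                      # domain -> SimpleCookie

    def store(self, domain, set_cookie_value):
        """Record what the server supplied along with a page."""
        self.jar.setdefault(domain, SimpleCookie()).load(set_cookie_value)

    def request_headers(self, domain):
        """Headers for a new request; include the cookie if we hold one."""
        headers = {"Host": domain}
        cookie = self.jar.get(domain)
        if cookie:
            headers["Cookie"] = "; ".join(
                f"{name}={morsel.value}" for name, morsel in cookie.items())
        return headers


browser = ToyBrowser()
print(browser.request_headers("abcd.com"))       # first visit: no cookie
browser.store("abcd.com", "session=abc123; Path=/")
print(browser.request_headers("abcd.com"))       # cookie is sent back
print(browser.request_headers("xyz.com"))        # other domains get nothing
```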

HTML – HyperText Markup Language

• (a) HTML source code – Markup language containing explicit commands for formatting • (b) Formatted page

Hypertext Document Format

• Encoded as text file • Contains tags to communicate with browser – Appearance • <h1> to start a level one heading • <p> to start a new paragraph – Links to other documents and content • <a href = . . . > – Insert images • <img src = . . . >

A simple Web page

A simple Web page (continued)

An enhanced simple Web page

An enhanced simple Web page (continued)

Typical HTML tags

HTML versions

• HTML 1.0 was one-way • Large demand for two-way traffic – HTML 2.0 included forms; these contain boxes or buttons that allow users to fill in information or make choices and send the information back to the server

Forms

• (a) HTML source code (containing forms) • (b) Formatted page • Possible response from the user (request from the browser to the server, with filled-in values)

Dynamic Web documents

• So far, the model we have used is a static one – A client requests a file, the server sends it back • Content has become more and more dynamic (generated on demand) – Content generation can take place at both server side and client side.

Server side dynamic web page

• Steps in processing an HTML form

PHP form processing example

(a) A Web page containing a form.

(b) A PHP script for handling the output of the form. (c) Output from the PHP script when the inputs are "Barbara" and 24 respectively.

Client-Side Web Page Generation

• CGI, PHP, JSP and ASP solve the problem of handling forms and interaction from forms – They can accept incoming information from forms, look up information in one or more databases, and generate HTML pages with the results – They can’t respond to mouse movements or interact directly with users • Client-side web page generation – Scripts must be embedded into HTML pages and executed on the client’s machine; starting with HTML 4.0, such scripts are permitted using the <script> tag