Transcript Data Modeling - Hiram College
Applications
CPSC 363 Computer Networks
Ellen Walker, Hiram College
(Includes figures from Computer Networking by Kurose & Ross, © Addison Wesley 2002)
Review of Layers
• Layer 5: Application (messages)
  – Implemented on hosts
• Layer 4: Transport (segments)
  – Connection-oriented service (connects hosts)
• Layer 3: Network (datagrams / packets)
  – Routing service (packets between hosts)
• Layer 2: Link (frames)
• Layer 1: Physical (bits - electric or light)
  – Layers 2 & 1 in network card. Accurate transmission of bit sequences across physical links (wires, cable, radio, etc.).
Application Layer point of view
• Client requests/establishes a connection with server (via the transport layer)
• Client sends and receives messages across the connection
• Client (or server) closes the connection (again via the transport layer)
• A host can maintain multiple connections at the same time!
  – Multiple ports
  – Multiple processes
Application vs. Protocol
• Network application can consist of many components – Data storage & retrieval – Data formatting & presentation – Encoding / decoding encrypted data – Messages between client & server (or peers) • Only the last of these uses network application protocol
Not all standards are protocols
• Web services – Browser-server communication (HTTP) – Document formatting standard (HTML or XML or PDF or JPG or …) • Email services – Client-server communication (SMTP, IMAP, POP) – Message attachment formatting (MIME)
Protocols define…
• What types of messages are exchanged?
• How is the message formatted into fields? (Syntax) • What do the fields mean? (Semantics) • What are legal messages and responses to messages? (Rules)
Finding Internet Protocols
• Internet protocols are defined in RFC (Request for Comments) documents • A searchable database of these documents is available at http://www.rfc-editor.org
• Not all RFC documents are standards; some are informational, experimental, etc.
Client and Server
• Client – Where the “user” sits, usually – Initiates the conversation (in nearly all cases) • Server – Responds to client’s requests – Provides “services” such as file storage, database query, etc.
• Peer to Peer – Both hosts play both roles
Process Communication
• A process is a running program with its own program counter, registers and memory
• Most modern operating systems run many processes simultaneously
• Processes on different hosts communicate with each other through sockets (the application programming interface, or API, between the application and the network)
  – Limited control over the transport layer
  – Establish type of transport protocol and a few parameters (e.g. segment size) only
Message Addressing
• Which host is this message going to?
– IP address (e.g. 143.206.149.21) – We’ll discuss in detail later (network layer) • Which process on that host will receive the message?
– Port number (e.g. 80 for HTTP) – Standard port numbers have been assigned (see http://www.iana.org/assignments/port-numbers ) – We’ll discuss ports in detail later (transport layer)
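The (IP address, port) pair can be seen in action with a minimal loopback sketch. It assumes the local machine allows binding to 127.0.0.1; the upper-casing "service" and the function names are made up for illustration only.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, answer one message in upper case, then close."""
    conn, _client_addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def send_message(host: str, port: int, message: bytes) -> bytes:
    """Connect to the process listening at (host, port) and exchange one message."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(message)
        return sock.recv(1024)


# The server process is identified by an IP address plus a port number.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick any free port
listener.listen(1)
host, port = listener.getsockname()

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()
reply = send_message(host, port, b"hello")
worker.join()
listener.close()
print(host, port, reply)
```

A real service would listen on its assigned port (e.g. 80 for HTTP) rather than an OS-chosen one.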
Applications’ Demands on Transport Layer
• Reliable Data Transfer – How much loss is acceptable?
– None (e.g. financial applications) vs. some (e.g. multimedia) • Bandwidth – What transmission rate is necessary?
– Minimum requirements (e.g. streaming video) vs. use whatever is available (e.g. web, file transfer) • Timing – What end-to-end delay is acceptable?
– Some applications (e.g. telephony) have strict constraints
Transport Layer Services
• TCP – Connection-oriented (handshake) – Reliable transmission – Congestion control (throttling) – No guaranteed minimum transmission rate • UDP – Connectionless (no handshake) – Unreliable (no guarantee of receipt or ordering) – No guaranteed minimum transmission rate
Examples (fig. 2.5)
Application          App-Layer Protocol   Transport Protocol
E-mail               SMTP                 TCP
Remote terminal      Telnet               TCP
Web                  HTTP                 TCP
File Transfer        FTP                  TCP
Streaming MM         Proprietary          UDP or TCP
Internet Telephony   Proprietary          UDP
Telnet
• Perhaps the simplest protocol • Opens a TCP connection between two hosts through a specified port • Whatever you type is sent through the connection • Typically used for terminal connection (now superseded by SSH for secure connections) • Can telnet to any port – telnet cs.hiram.edu 23 (terminal connection; default) – telnet cs.hiram.edu 80 (web server connection)
The WorldWide Web
• Basis in hyperlinks and hypertext (documents) – Proposed by Vannevar Bush (Memex) 1945 – “Hypertext” coined by Ted Nelson 1965 – Hypertext in education at Brown (FRESS) 1966 - 198x?
– Hypercard (Apple) 1987 – See http://ei.cs.vt.edu/book/chap1/htx_hist.html
• Historical path… – FTP (file transfer protocol - files aren’t displayed) – Gopher (displays directories & text files) – Web (embedded links can link to any document) • Search engines • Multimedia indexes • Front ends to databases • ETC!
Web Application Vocabulary
• Web page (document) - collection of objects – Usually base HTML file + several referenced objects • Object - any file addressable by a single URL • URL (Uniform Resource Locator) - how to reach an object – Host address + object’s path name • Browser - user agent (client) for the web • Web server - houses objects
HyperText Transfer Protocol (HTTP)
• Request / response protocol – Client requests an object (URL) – Server provides the object(s) corresponding to that URL – Non-persistent (1 object only) vs persistent (explicit close) • Stateless protocol – The server doesn’t store any knowledge (state) of the client – E.g. server doesn’t remember what pages client looked at
HTTP Uses TCP
• Client initiates TCP connection to server, port 80 • Server accepts connection • Client sends HTTP message & Server responds (one or more times) • TCP connection closed
Non-persistent HTTP conversation
• Client to server: “[address] I’d like to talk to you ” • Server to client: “OK, I will talk with you” • Client to server: “Thanks. Please send me [path]” • Server to Client: “Here’s the object you asked for. [Object] Goodbye.” • Client to server: “Got it. Goodbye” • If the object contains embedded links, an additional complete conversation is needed for each!
How long does it take?
• Define RTT (Round Trip Time) as time for one message to travel from client to server & server to client (includes all delays) • Total time is 2*RTT+ file transmission – Beginning of handshake (1 RTT) – End of handshake + transmit request + first packet of response (1 RTT) – Transmit the rest of the file (depends on file size) • If file has 10 images, 22*RTT + file times
Persistent Transmission
• TCP connection remains open until explicitly closed.
• Previous example now takes 2 RTT for setup and the base file, plus 1 RTT per additional request, plus file time (12 RTT + file time)
• With pipelining, new requests are sent without waiting for earlier responses, so the server is never idle. Still 2 RTT for setup and the base file, but only about 1 RTT for all referenced objects. Example now takes 3 RTT + file time.
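The RTT counts for the 10-image example can be checked with a small sketch. It counts round trips only (file transmission time is added on top) and covers the non-persistent and the persistent, non-pipelined cases; the function names are illustrative.

```python
def nonpersistent_rtts(num_objects: int) -> int:
    """Every object needs its own connection: 1 RTT handshake + 1 RTT request/response."""
    return 2 * num_objects


def persistent_rtts(num_objects: int) -> int:
    """One handshake for the whole connection, then 1 RTT per object (no pipelining)."""
    return 1 + num_objects


objects = 1 + 10  # base HTML file plus 10 images
print(nonpersistent_rtts(objects))  # 22
print(persistent_rtts(objects))     # 12
```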
Message Format (Request)
• Request line (command, addr, protocol)
    GET /~walkerel/cs363/index.html HTTP/1.1
• Header lines (fields & values)
    Host: cs.hiram.edu
    Connection: close          (nonpersistent)
    User-agent: Mozilla/4.0    (browser id)
    Accept-language: en        (preferred lang.)
• Entity body (for POST) contains contents of forms filled out
• End with 2 CR/LFs
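As a rough sketch, a request in this format can be assembled as a string. The helper name build_request is hypothetical; the request line and header values are the ones from the slide.

```python
def build_request(method: str, path: str, headers: dict, body: str = "") -> str:
    """Assemble request line, header lines, a blank line, then the optional body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CR/LF; the header section ends with an extra CR/LF.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "GET",
    "/~walkerel/cs363/index.html",
    {
        "Host": "cs.hiram.edu",
        "Connection": "close",
        "User-agent": "Mozilla/4.0",
        "Accept-language": "en",
    },
)
print(request)
```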
HTTP Commands
• GET path – Get a file • GET path?var=val… – Get a file, specify values (from form) • POST path – Run a program that resides at the specified path – Program will generate a web page, which is the server’s response
Additional Commands
• HEAD – Requests header lines but not the actual file • PUT (1.1 only) – Uploads file to path specified in URL field • DELETE (1.1 only) – Deletes file specified by URL field
Message Format (Response)
• Status line (protocol, status code, message)
    HTTP/1.1 200 OK
• Header lines
    Connection: close
    Date: Sat, 25 Jan 2003 12:15:00 GMT
    Server: Apache/1.3.0 (Unix)
    Last-Modified: …
    Content-Length: …
    Content-Type: text/html
• Data
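A client has to split a response back into these three parts. The following is a minimal sketch, assuming the whole response is already available as one string; parse_response is a hypothetical helper and the HTML body is a placeholder.

```python
def parse_response(raw: str):
    """Split a response into (protocol, status code, message, headers, data)."""
    head, _, data = raw.partition("\r\n\r\n")  # a blank line separates headers from data
    status_line, *header_lines = head.split("\r\n")
    protocol, code, message = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return protocol, int(code), message, headers, data


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Connection: close\r\n"
    "Server: Apache/1.3.0 (Unix)\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>placeholder</html>"
)
protocol, code, message, headers, data = parse_response(raw)
print(code, message, headers["content-type"])
```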
Status Codes
• 200 OK • 301 Moved Permanently (new URL provided in Location: header) • 400 Bad Request (generic error message) • 404 Not Found • 505 HTTP Version Not Supported (on this server)
Practice HTTP
• Telnet to your favorite server, e.g. cs.hiram.edu, using port 80
    telnet cs.hiram.edu 80
• Enter HTTP message, followed by a blank line
    GET /~walkerel/cs363/index.html HTTP/1.1
    Host: cs.hiram.edu
Cookies
• Information stored by the client to identify the client to the server • The “cookie” is a unique identification number for the user. (e.g. to index purchase history) • It is stored in a “cookie file” on the client machine, and provided to the server as part of a request message • The server can then create “personalized” responses • Cookies can also authenticate, so you can “save your password”
Cookies in HTTP
• In the HTTP Response message (from the server) – Set-cookie: 1253261 • In future HTTP Request messages (from the client) – Cookie: 1253261
Web Cache (Proxy Server)
• Stores copies of recently requested items • Browser first requests item from Proxy Server – If item is stored, it is sent – Otherwise, item is retrieved from external server, stored, then sent • Proxy Server acts as both client and server • Proxy server can also refuse requests – Prevent browsing to “inappropriate” sites • Risk: page has changed since saved (stale page)
Advantage of Web Cache
• Reduces average response time – Response time of “hit” is very fast (item in cache) – Response time across network much slower than LAN response time – Hit rates 0.2 - 0.7 in practice (20%-70% of accesses are repeats) • Average response time = – Hit rate * LAN delay + (1-Hit rate) * net delay
Example: Cache Advantage
• Assumptions (Section 2.2.6) – LAN delay = 0.01 sec – Net delay = 2.01 sec – Hit rate = 40% (0.4) • No cache – 2.01 seconds delay • With cache – 0.4 * 0.01 + 0.6 * 2.01 = 0.004 + 1.206 = 1.21 seconds delay
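The same arithmetic as a small sketch, which also makes it easy to try other hit rates. The function name is illustrative; the delays are the assumed values from this slide.

```python
def average_delay(hit_rate: float, lan_delay: float, net_delay: float) -> float:
    """Average response time = hit rate * LAN delay + (1 - hit rate) * net delay."""
    return hit_rate * lan_delay + (1 - hit_rate) * net_delay


no_cache = average_delay(0.0, 0.01, 2.01)    # every request goes out over the network
with_cache = average_delay(0.4, 0.01, 2.01)  # 40% of requests are served from the cache
print(round(no_cache, 2), round(with_cache, 2))
```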
Protocol for Avoiding Stale Pages
• Proxy server requests page only if changed (Conditional GET)
    GET file HTTP/1.1
    Host: ipaddress
    If-modified-since: date
• Response if not changed:
    HTTP/1.1 304 Not Modified
    Date: date
    Server: server
• Response if changed is the same as before:
    HTTP/1.1 200 OK
    Additional headers + data
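The origin server's decision can be sketched as a date comparison. This is a simplified illustration, not how any particular server implements it; the function name and the example dates are made up.

```python
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified: str, if_modified_since: str) -> int:
    """Return 200 if the object changed after the cached copy's date, else 304."""
    changed = parsedate_to_datetime(last_modified) > parsedate_to_datetime(if_modified_since)
    return 200 if changed else 304


cached_date = "Sat, 25 Jan 2003 12:15:00 GMT"  # hypothetical date of the cached copy
print(conditional_get_status("Sat, 25 Jan 2003 12:15:00 GMT", cached_date))  # unchanged
print(conditional_get_status("Sun, 26 Jan 2003 08:00:00 GMT", cached_date))  # changed
```

On a 304 the cache serves its stored copy; on a 200 it stores the new object before sending it on.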
File Transfer (FTP)
• Send a file from one host to another – User can sit on “donor” or “recipient” host • User provides authentication information once for all transfers – Username & password, or ‘anonymous’ & email address • Connection is persistent until an explicit close • Example: ftp pub/reid.txt from rtfm.mit.edu
FTP Uses 2 Connections
• Control connection – Sends user id, password, commands – “Out of band” because not interspersed with data – Port 21 (TCP) • Data connection (TCP) – Sends actual files – A new data connection is created for each file
Unlike HTTP, State is maintained
• Server remembers which user is connected – vs. HTTP Authorization header in every message • Server remembers current directory – vs. HTTP full path in every message • Because state is maintained, the number of simultaneous connections is limited, relative to HTTP
FTP Commands
• USER username • PASS password • LIST (list the files in the current directory) • RETR filename (retrieve from remote host) • STOR filename (store onto remote host) • Client commands aren’t quite identical (e.g. GET, PUT) and may allow additional arguments
Electronic Mail
• User Agent – Allows user to send and receive email – Generally allows access to stored email – e.g. MS Outlook, Eudora • Mail Server – Delivers email, stores it in user’s mailbox (at least) until read – Sends off-site email; queues and retries if external host isn’t available
SMTP (Simple Mail Transfer Protocol)
• All messages (not just headers) restricted to 7-bit ASCII (must be encoded/decoded by user agent) • Transfers mail from origin host to destination host (no intermediate servers) • Commands include HELO, MAIL FROM, RCPT TO, and DATA • To try it: telnet serverName 25
SMTP vs. HTTP
• HTTP is “pull protocol”, SMTP is “push protocol” • SMTP requires 7-bit ASCII, even for data; HTTP allows any format • SMTP puts all data into one message – MIME encoding (Multipurpose Internet Mail Extensions) • Content-Type: and Content-Transfer-Encoding: headers
MIME Types
• Multipart/mixed – Look for part boundaries, content headers • Text/plain • Text/html • Image/gif or image/jpeg • Application/msword, application/pdf, etc.
Mail Headers
• Some from user (e.g. To, cc) • Some from user agent (e.g. Date, From, MIME headings) • Some from servers (e.g. Received) • Most user agents allow “full headers” to be viewed.
Mail Header Example
From: Thomas Bagley [email protected]
Subject: ACM Member Technical Interest Service January 2010
Date: January 26, 2010 5:46:47 PM EST
To: Ellen Walker [email protected]
Received: from mail.hiram.edu ([206.57.41.42]) by hiramr.hiram.edu with Microsoft SMTPSVC(6.0.3790.3959); Tue, 26 Jan 2010 17:46:54 -0500
Received: from smtp161.redcondor.net ([206.57.41.40]) by mail.hiram.edu with Microsoft SMTPSVC(6.0.3790.3959); Tue, 26 Jan 2010 17:46:54 -0500
Received: from acm26-2.acm.org ([199.222.69.107]) by smtp161.redcondor.net ({6c7b74fb-260e-4729-9476-f743470f315e}) via TCP (inbound) with ESMTP id 20100126224651153 for
Mail Header Example (Cont’d)
X-Rc-From: [email protected]
X-Rc-Rcpt: [email protected]
Message-Id: 10171514.34121264546007418.JavaMail.Administrator@acm28-8
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Mailer: ColdFusion 8 Application Server
Return-Path: [email protected]
X-Originalarrivaltime: 26 Jan 2010 22:46:54.0837 (UTC) FILETIME=[7568BA50:01CA9ED9]
Mail Access Protocols
• Protocols for conversation between user agent and mail server (SMTP is for mail server to mail server communication) • POP3 – Authorization, transaction, update (after client quit) • IMAP – Allows users to store mail in folders on server – Clients can access message components (e.g. headers only) • WebMail (HTTP) – Mail accessed through a web page using a browser; no application-specific client or protocol
Domain Name System (DNS)
• Translates between mnemonic hostnames and numeric IP addresses
  – www.hiram.edu = 206.57.41.47
  – Command “host” can look up an address
• Distributed database implemented in hierarchy of DNS servers
• Application-layer protocol that allows hosts to query this database
  – Used by other application-layer protocols such as HTTP for name-address translation
  – This adds a delay to each HTTP request
Database is Distributed
• Root servers (13) point to…
• Top-level domain servers (com, org, edu, uk,…) point to…
• Authoritative servers (per organization)
• Local servers (per ISP)
Root server locations (from figure):
  a  Verisign, Dulles, VA
  b  USC-ISI, Marina del Rey, CA
  c  Cogent, Herndon, VA (also LA)
  d  U Maryland, College Park, MD
  g  US DoD, Vienna, VA
  h  ARL, Aberdeen, MD
  i  Autonomica, Stockholm (plus 28 other locations)
  j  Verisign (21 locations)
  k  RIPE, London (also 16 other locations)
  l  ICANN, Los Angeles, CA
  m  WIDE, Tokyo (also Seoul, Paris, SF)
Translating a Domain (iterative)
• Ask local DNS • Local DNS asks root DNS • Root DNS responds with appropriate top level DNS • Local DNS asks top-level DNS • TL DNS responds with appropriate organization’s authoritative DNS • Local DNS asks Authoritative DNS, and receives address (which it probably caches)
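The iterative steps above can be sketched with toy lookup tables standing in for the servers. The server names here are hypothetical; only the www.hiram.edu address comes from these slides, and real DNS exchanges records rather than dictionary entries.

```python
# Toy tables standing in for the three levels of servers (names are made up).
ROOT = {"edu": "edu-tld-server"}
TLD = {"edu-tld-server": {"hiram.edu": "ns.hiram.edu"}}
AUTHORITATIVE = {"ns.hiram.edu": {"www.hiram.edu": "206.57.41.47"}}


def resolve_iteratively(hostname: str, cache: dict):
    """Return (address, queries made by the local DNS server)."""
    if hostname in cache:  # the local DNS server answers from its cache
        return cache[hostname], []
    labels = hostname.split(".")
    queries = []
    tld_server = ROOT[labels[-1]]  # root replies with the top-level server
    queries.append(f"root -> ask {tld_server}")
    auth_server = TLD[tld_server][".".join(labels[-2:])]  # TLD replies with the authoritative server
    queries.append(f"{tld_server} -> ask {auth_server}")
    address = AUTHORITATIVE[auth_server][hostname]  # authoritative server replies with the address
    queries.append(f"{auth_server} -> {address}")
    cache[hostname] = address
    return address, queries


cache = {}
first = resolve_iteratively("www.hiram.edu", cache)
second = resolve_iteratively("www.hiram.edu", cache)
print(first)
print(second)
```

The second lookup makes no queries because the local server cached the first answer.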
Translating a Domain (recursive)
• Ask local DNS • Local DNS asks root DNS • Root DNS asks top-level DNS • TL DNS asks organization’s authoritative DNS • Organization’s DNS responds to TL DNS, which forwards to Root DNS, which forwards to local DNS, which responds to original client • (Caching as appropriate throughout)
DNS Record Types
• A – Name = hostname, value = IP address • NS – Name = domain (hiram.edu), value = hostname of authoritative DNS • CNAME – Name = alias hostname (foo.com), value = canonical hostname (relay1.bar.foo.com) • MX – Name = alias mail name (gmail.com), value = canonical hostname of the mail server (gsmtp183.google.com)
Creating a new DNS Record
• Entity “registers” the site, by paying a registrar and providing authoritative DNS server IP addresses • Registrar verifies uniqueness of name and enters NS and A type records into database • You provide A and MX records for your own servers at your authoritative DNS servers
Why Not Central Name Server?
• Single point of failure • Too many requests • Does not scale !!!!!
P2P (Peer to Peer) Applications
• All content transferred directly between peers without passing through third-party servers – Does not rely on always-on (24/7) servers • When a client requests an object – Find a server, currently connected that has that object – Server transmits object to client • A given host can be both client and (transient) server • Applications include: – File distribution – VOIP (e.g. Skype)
File Sharing
• Goal: Distribute a file from a single server to many hosts • Client-server – Huge burden on source host (must connect to all recipients) • P2P – Any host that has received some portion of the file can redistribute to others (sharing the burden) – Most popular (2009): BitTorrent
How long does it take? (C/S)
• F bits in file, N clients, u = server upload rate, dmin = min download rate • 1 server -> N clients • Upload time = N*F/u • Max download time = F / dmin • Overall: max (N*F/u, F/dmin) • Assume N is big enough, result is N*F/u
How Long Does It Take? (P2P)
• To get file out, server must send every bit once (F/u) • Slowest recipient will take F/dmin time to get its complete file • Total upload capacity is sum of upload capacity of all uploaders: – uTotal = u + u1 + u2… • Overall time = max (F/u, F/dmin, N*F/uTotal) • When N gets bigger, so does uTotal!
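The two bounds can be compared numerically with a short sketch. The file size, rates, and number of peers below are made-up illustration values, and every peer is assumed to upload at the same rate.

```python
def client_server_time(F: float, N: int, u: float, d_min: float) -> float:
    """Server uploads N copies; the slowest client needs F / d_min."""
    return max(N * F / u, F / d_min)


def p2p_time(F: float, N: int, u: float, d_min: float, peer_uploads: list) -> float:
    """Peers add their upload capacity: uTotal = u + u1 + u2 + ..."""
    u_total = u + sum(peer_uploads)
    return max(F / u, F / d_min, N * F / u_total)


F = 10**9        # file size in bits (hypothetical)
u = 10**7        # server upload rate, bits/sec (hypothetical)
d_min = 2 * 10**6  # slowest download rate, bits/sec (hypothetical)
N = 100          # number of recipients (hypothetical)
peer_uploads = [10**6] * N  # each peer uploads at 1 Mbps (hypothetical)

cs = client_server_time(F, N, u, d_min)
p2p = p2p_time(F, N, u, d_min, peer_uploads)
print(cs, round(p2p, 1))
```

With these numbers the client-server time is dominated by N*F/u, while the P2P time is dominated by N*F/uTotal, which grows far more slowly with N.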
BitTorrent
• P2P Protocol for file distribution
• Torrent = collection of all peers involved in distribution of a single file
• Chunks = equal-sized pieces of file (e.g. 256K)
• Tracker = special infrastructure node (one tracker per torrent)
When a peer joins a torrent…
• Tracker sends peer N (say 50) random addresses of hosts in the torrent • Try to connect to all of them. (Successful connections called ‘neighbors’) • Ask each neighbor for lists of chunks they have • Request (from appropriate neighbor) each chunk you don’t have
Rarest First
• Ask for the chunk which is held by the fewest of my neighbors • Result: more copies of that chunk, roughly equalizing the availability of chunks
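A minimal sketch of the rarest-first choice, assuming each neighbor has reported the set of chunk numbers it holds. The data layout and the tie-breaking rule (lowest chunk number) are assumptions for the example, not part of the protocol description above.

```python
from collections import Counter


def rarest_first(my_chunks: set, neighbor_chunks: dict):
    """Pick the missing chunk held by the fewest neighbors (ties: lowest chunk number)."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)  # count only chunks we still need
    if not counts:
        return None  # no neighbor has anything we are missing
    return min(counts, key=lambda chunk: (counts[chunk], chunk))


my_chunks = {0}
neighbor_chunks = {
    "A": {0, 1, 2},
    "B": {1, 2, 3},
    "C": {1},
}
choice = rarest_first(my_chunks, neighbor_chunks)
print(choice)  # chunk 3 is held by only one neighbor
```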
Responding to Requests (Tit for Tat)
• Host receives many requests
• Respond to requests from 4 neighbors sending bits at highest rate (unchoked)
  – These are fastest
  – These are ‘most generous’
• Respond to requests from a fifth neighbor at random (optimistically unchoked)
  – Might become ‘top-4’ of this neighbor!
  – Allows more neighbors to get in on the action.
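The selection rule can be sketched as follows, assuming the host has measured the rate at which each neighbor is currently sending to it. The peer names and rates are invented; real clients also re-evaluate these choices periodically.

```python
import random


def choose_unchoked(rates: dict, top_n: int = 4, rng=random):
    """Return (the top_n fastest senders, one random other neighbor or None)."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    unchoked = ranked[:top_n]
    others = ranked[top_n:]
    optimistic = rng.choice(others) if others else None  # optimistically unchoked
    return unchoked, optimistic


# Hypothetical measured rates (bits/sec) from seven neighbors.
rates = {"p1": 900, "p2": 100, "p3": 750, "p4": 300, "p5": 820, "p6": 50, "p7": 400}
unchoked, optimistic = choose_unchoked(rates, rng=random.Random(0))
print(unchoked, optimistic)
```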
Distributed Hash Tables
• Need to maintain searchable index of (key, value) pairs • Cannot contain it on a single host (point of failure) – Napster did this in the early days of P2P • Distribute pairs among hosts – How to avoid having all hosts contain all pairs?
– How to avoid having all hosts contact all hosts?
Answer: Use the Hash Value!
• Assign an integer to each host (same range as hash values)
• Assign each (key, value) pair to the host whose integer is closest to hash(key)
  – Equal is closest, then successor (wraps around)
• Each host knows its successor (last -> first)
  – circular overlay network
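The "closest" rule can be sketched with a sorted list of host integers. The host IDs and the 0-15 hash range are made up for the example, and a global sorted list is used only to show the assignment; in the overlay itself each host knows just its successor.

```python
import bisect


def responsible_peer(peer_ids: list, key_hash: int) -> int:
    """Return the host whose ID equals key_hash, else the next ID on the circle."""
    ids = sorted(peer_ids)
    i = bisect.bisect_left(ids, key_hash)  # first ID >= key_hash
    return ids[i % len(ids)]               # past the last ID, wrap around to the first


peers = [1, 3, 4, 5, 8, 10, 12, 15]  # hypothetical host IDs in the range 0..15
print(responsible_peer(peers, 11))   # successor of 11
print(responsible_peer(peers, 8))    # exact match
print(responsible_peer(peers, 0))    # successor of 0
```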
To Add or Find
• Peer receives message with hash(key) • If ID is closer to hash(key) than successor’s ID, then peer responds directly to message’s sender – ID ≥ hash(key) – Successor ID < hash(key) • Else peer passes on the message to its successor
Evaluating Circular Network
• Advantage: – Every peer needs to keep track of only 2 neighbors (predecessor and successor) • Disadvantage: – When the circle gets big, messages take a long time to go around!
• Solution: – Add a few “shortcut” links across the circle – Trades off more neighbors vs. shorter travel time
Peer Churn
• Remember, hosts come & go, not 24/7 • By the original plan, if my successor is lost, I am disconnected!
• Instead – Each node tracks 2 successors – Periodically check both your successors are there – If one is gone, find the other’s successor so you still have 2 successors • To join, pass a message around the circle
P2P Case study: Skype
• inherently P2P: pairs of users communicate.
• proprietary application layer protocol (inferred via reverse engineering)
• hierarchical overlay with Supernodes (SNs)
• Index maps usernames to IP addresses; distributed over SNs
(Figure labels: Skype clients (SC), Supernode (SN), Skype login server)
From Kurose & Ross Slides v. 5