Data Modeling - Hiram College


Applications

CPSC 363 Computer Networks, Ellen Walker, Hiram College (Includes figures from Computer Networking by Kurose & Ross, © Addison Wesley 2002)

Review of Layers

• Layer 5: Application (messages) – Implemented on hosts • Layer 4: Transport (segments) – Connection-oriented service (connects hosts) • Layer 3: Network (datagrams / packets) – Routing service (packets between hosts) • Layer 2: Link (frames) • Layer 1: Physical (bits - electric or light) – Layers 2 & 1 in network card. Accurate transmission of bit sequences across physical links (wires, cable, radio, etc.).

Application Layer point of view

• Client requests/establishes a connection with server (via the transport layer) • Client sends and receives messages across the connection • Client (or server) closes the connection (again via the transport layer) • A host can maintain multiple connections at the same time!

– Multiple ports – Multiple processes

Application vs. Protocol

• Network application can consist of many components – Data storage & retrieval – Data formatting & presentation – Encoding / decoding encrypted data – Messages between client & server (or peers) • Only the last of these uses network application protocol

Not all standards are protocols

• Web services – Browser-server communication (HTTP) – Document formatting standard (HTML or XML or PDF or JPG or …) • Email services – Client-server communication (SMTP, IMAP, POP) – Message attachment formatting (MIME)

Protocols define…

• What types of messages are exchanged?

• How is the message formatted into fields? (Syntax) • What do the fields mean? (Semantics) • What are legal messages and responses to messages? (Rules)

Finding Internet Protocols

• Internet protocols are defined in RFC (Request for Comments) documents • A searchable database of these documents is available at http://www.rfc-editor.org

• Not all RFC documents are standards; some are informational, experimental, etc.

Client and Server

• Client – Where the “user” sits, usually – Initiates the conversation (in nearly all cases) • Server – Responds to client’s requests – Provides “services” such as file storage, database query, etc.

• Peer to Peer – Both hosts play both roles

Process Communication

• A process is a running program with its own program counter, registers and memory • Most modern operating systems run many processes simultaneously • Processes on different hosts communicate with each other through sockets (the API between the application and the transport layer) – Limited interaction with the transport layer – Establish type of transport protocol and a few parameters (e.g. segment size) only

Message Addressing

• Which host is this message going to?

– IP address (e.g. 143.206.149.21) – We’ll discuss in detail later (network layer) • Which process on that host will receive the message?

– Port number (e.g. 80 for HTTP) – Standard port numbers have been assigned (see http://www.iana.org/assignments/port-numbers ) – We’ll discuss ports in detail later (transport layer)

Applications’ Demands on Transport Layer

• Reliable Data Transfer – How much loss is acceptable?

– None (e.g. financial applications) vs. some (e.g. multimedia) • Bandwidth – What transmission rate is necessary?

– Minimum requirements (e.g. streaming video) vs. use whatever is available (e.g. web, file transfer) • Timing – What end-to-end delay is acceptable?

– Some applications (e.g. telephony) have strict constraints

Transport Layer Services

• TCP – Connection-oriented (handshake) – Reliable transmission – Congestion control (throttling) – No guaranteed minimum transmission rate • UDP – Connectionless (no handshake) – Unreliable (no guarantee of receipt or ordering) – No guaranteed minimum transmission rate

Examples (fig. 2.5)

Application | App-Layer Protocol | Transport Protocol
E-mail | SMTP | TCP
Remote terminal | Telnet | TCP
Web | HTTP | TCP
File Transfer | FTP | TCP
Streaming MM | Proprietary | UDP or TCP
Internet Telephony | Proprietary | UDP

Telnet

• Perhaps the simplest protocol • Opens a TCP connection between two hosts through a specified port • Whatever you type is sent through the connection • Typically used for terminal connection (now superseded by SSH for secure connections) • Can telnet to any port – telnet cs.hiram.edu 23 (terminal connection; default) – telnet cs.hiram.edu 80 (web server connection)

The World Wide Web

• Basis in hyperlinks and hypertext (documents) – Proposed by Vannevar Bush (Memex) 1945 – “Hypertext” coined by Ted Nelson 1965 – Hypertext in education at Brown (FRESS) 1966 - 198x?

– Hypercard (Apple) 1987 – See http://ei.cs.vt.edu/book/chap1/htx_hist.html

• Historical path… – FTP (file transfer protocol - files aren’t displayed) – Gopher (displays directories & text files) – Web (embedded links can link to any document) • Search engines • Multimedia indexes • Front ends to databases • ETC!

Web Application Vocabulary

• Web page (document) - collection of objects – Usually base HTML file + several referenced objects • Object - any file addressable by a single URL • URL (Uniform Resource Locator) - how to reach an object – Host address + object’s path name • Browser - user agent (client) for the web • Web server - houses objects

HyperText Transfer Protocol (HTTP)

• Request / response protocol – Client requests an object (URL) – Server provides the object(s) corresponding to that URL – Non-persistent (1 object only) vs persistent (explicit close) • Stateless protocol – The server doesn’t store any knowledge (state) of the client – E.g. server doesn’t remember what pages client looked at

HTTP Uses TCP

• Client initiates TCP connection to server, port 80 • Server accepts connection • Client sends HTTP message & Server responds (one or more times) • TCP connection closed

Non-persistent HTTP conversation

• Client to server: “[address] I’d like to talk to you ” • Server to client: “OK, I will talk with you” • Client to server: “Thanks. Please send me [path]” • Server to Client: “Here’s the object you asked for. [Object] Goodbye.” • Client to server: “Got it. Goodbye” • If the object contains embedded links, an additional complete conversation is needed for each!

How long does it take?

• Define RTT (Round Trip Time) as time for one message to travel from client to server & server to client (includes all delays) • Total time is 2*RTT+ file transmission – Beginning of handshake (1 RTT) – End of handshake + transmit request + first packet of response (1 RTT) – Transmit the rest of the file (depends on file size) • If file has 10 images, 22*RTT + file times
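
The RTT count above can be checked with a few lines of Python. This is a minimal sketch under the slide's assumptions (every object needs its own connection, and each connection costs one RTT for the handshake plus one RTT for the request and first packet of the response); the function and parameter names are made up for illustration.

```python
def nonpersistent_rtts(embedded_objects: int) -> int:
    """RTTs needed for a base file plus its embedded objects,
    using a separate TCP connection for every object."""
    total_objects = 1 + embedded_objects      # base HTML file + embedded objects
    rtts_per_object = 2                       # 1 RTT handshake + 1 RTT request/response
    return rtts_per_object * total_objects


def nonpersistent_time(embedded_objects: int, rtt: float, total_transmit_time: float) -> float:
    """Total time in seconds: RTT cost plus the time to transmit all the files."""
    return nonpersistent_rtts(embedded_objects) * rtt + total_transmit_time


print(nonpersistent_rtts(10))  # base file with 10 images
```

Passing a measured RTT and an estimate of the total file transmission time to `nonpersistent_time` gives the overall delay in seconds.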

Persistent Transmission

• TCP connection remains open until explicitly closed.

• Previous example now takes 2 RTT for setup and the base file, plus 1 RTT per further request, plus file time (12 RTT + file time) • With pipelining, new requests are sent as files are received, so server is never idle. Only 1 RTT for setup, plus 1 RTT for all objects. Example now takes 2 RTT + file time.
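
As a rough check of the persistent, non-pipelined count, the sketch below reuses the slide's accounting: 2 RTT to open the connection and fetch the base file, then 1 RTT for each further request on the same connection. It deliberately leaves pipelining out; the function name is hypothetical.

```python
def persistent_rtts(embedded_objects: int) -> int:
    """RTTs for a base file plus embedded objects over ONE persistent
    TCP connection, without pipelining."""
    setup_and_base_file = 2                   # handshake + request/response for the base file
    return setup_and_base_file + embedded_objects  # 1 RTT per further request


print(persistent_rtts(10))  # base file with 10 images
```

Compared with one connection per object, the saving is one handshake RTT for every embedded object.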

Message Format (Request)

• Request line (command, addr, protocol)
GET /~walkerel/cs363/index.html HTTP/1.1
• Header lines (fields & values)
Host: cs.hiram.edu
Connection: close (nonpersistent)
User-agent: Mozilla/4.0 (browser id)
Accept-language: en (preferred lang.)
• Entity body (for POST) contains contents of forms filled out
• End with 2 CR/LFs

HTTP Commands

• GET path – Get a file • GET path?var=val… – Get a file, specify values (from form) • POST path – Run a program that resides at the specified path – Program will generate a web page, which is the server’s response

Additional Commands

• HEAD – Requests header lines but not the actual file • PUT (1.1 only) – Uploads file to path specified in URL field • DELETE (1.1 only) – Deletes file specified by URL field

Message Format (Response)

• Status line (protocol, status code, message)
HTTP/1.1 200 OK
• Header lines
Connection: close
Date: Sat, 25 Jan 2003 12:15:00 EDT
Server: Apache/1.3.0 (Unix)
Last-Modified: …
Content-Length: …
Content-Type: text/html
• Data
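
To see how a client might take a response apart, here is a small parsing sketch. It assumes the whole response is already in memory and well formed; `parse_response` and the sample bytes are illustrative only.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status line fields, headers and data."""
    head, _, body = raw.partition(b"\r\n\r\n")       # blank line separates headers from data
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)   # e.g. "HTTP/1.1", "200", "OK"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")         # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
version, code, reason, headers, body = parse_response(sample)
print(code, reason, headers["content-type"], len(body))
```

Header names are lower-cased here because HTTP header field names are case-insensitive.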

Status Codes

• 200 OK • 301 Moved Permanently (new URL provided in Location: header) • 400 Bad Request (generic error message) • 404 Not Found • 505 HTTP Version Not Supported (on this server)

Practice HTTP

• Telnet to your favorite server, e.g. cs.hiram.edu, using port 80
telnet cs.hiram.edu 80
• Enter HTTP message, followed by a blank line
GET /~walkerel/cs363/index.html HTTP/1.1
Host: cs.hiram.edu

Cookies

• Information stored by the client to identify the client to the server • The “cookie” is a unique identification number for the user. (e.g. to index purchase history) • It is stored in a “cookie file” on the client machine, and provided to the server as part of a request message • The server can then create “personalized” responses • Cookies can also authenticate, so you can “save your password”

Cookies in HTTP

• In the HTTP Response message (from the server) – Set-cookie: 1253261 • In future HTTP Request messages (from the client) – Cookie: 1253261

Web Cache (Proxy Server)

• Stores copies of recently requested items • Browser first requests item from Proxy Server – If item is stored, it is sent – Otherwise, item is retrieved from external server, stored, then sent • Proxy Server acts as both client and server • Proxy server can also refuse requests – Prevent browsing to “inappropriate” sites • Risk: page has changed since saved (stale page)

Advantage of Web Cache

• Reduces average response time – Response time of “hit” is very fast (item in cache) – Response time across network much slower than LAN response time – Hit rates 0.2 - 0.7 in practice (20%-70% of accesses are repeats) • Average response time = – Hit rate * LAN delay + (1-Hit rate) * net delay

Example: Cache Advantage

• Assumptions (Section 2.2.6) – LAN delay = 0.01 sec – Net delay = 2.01 sec – Hit rate = 40% (0.4) • No cache – 2.01 seconds delay • With cache – 0.4 * 0.01 + 0.6 * 2.01 = 0.004 + 1.206 = 1.21 seconds delay
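
The same arithmetic as a small function, using the slide's numbers. The function name is made up; the formula is the weighted average given on the previous slide.

```python
def average_response_time(hit_rate: float, lan_delay: float, net_delay: float) -> float:
    """Hit rate * LAN delay + (1 - hit rate) * net delay, in seconds."""
    return hit_rate * lan_delay + (1 - hit_rate) * net_delay


with_cache = average_response_time(0.4, 0.01, 2.01)
no_cache = average_response_time(0.0, 0.01, 2.01)   # every request misses
print(f"{with_cache:.2f} s with cache, {no_cache:.2f} s without")
```

Trying hit rates of 0.2 and 0.7 shows the range of benefit for the hit rates quoted as typical in practice.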

Protocol for Avoiding Stale Pages

• Proxy server requests page only if changed (Conditional GET)
GET file HTTP/1.1
Host: Ipaddress
If-modified-since: date
• Response if not changed:
HTTP/1.1 304 Not Modified
Date: date
Server: server
• Response if changed is the same as before
HTTP/1.1 200 OK
Additional headers + data

File Transfer (FTP)

• Send a file from one host to another – User can sit on “donor” or “recipient” host • User provides authentication information once for all transfers – Username & password, or ‘anonymous’ & email address • Connection is persistent until an explicit close • Example: ftp pub/reid.txt from rtfm.mit.edu

FTP Uses 2 Connections

• Control connection – Sends user id, password, commands – “Out of band” because not interspersed with data – Port 21 (TCP) • Data connection (TCP) – Sends actual files – A new data connection is created for each file

Unlike HTTP, State is maintained

• Server remembers which user is connected – vs. HTTP Authorization header in every message • Server remembers current directory – vs. HTTP full path in every message • Because state is maintained, the number of simultaneous connections is limited, relative to HTTP

FTP Commands

• USER username • PASS password • LIST (list the files in the current directory) • RETR filename (retrieve from remote host) • STOR filename (store onto remote host) • Client commands aren’t quite identical (e.g. GET, PUT) and may allow additional arguments

Electronic Mail

• User Agent – Allows user to send and receive email – Generally allows access to stored email – e.g. MS Outlook, Eudora • Mail Server – Delivers email, stores it in user’s mailbox (at least) until read – Sends off-site email; queues and retries if external host isn’t available

SMTP (Simple Mail Transfer Protocol)

• All messages (not just headers) restricted to 7-bit ASCII (must be encoded/decoded by user agent) • Transfers mail from origin host to destination host (no intermediate servers) • Commands include HELO, MAIL FROM, RCPT TO, and DATA • To try it: telnet serverName 25

SMTP vs. HTTP

• HTTP is “pull protocol”, SMTP is “push protocol” • SMTP requires 7-bit ASCII, even for data; HTTP allows any format • SMTP puts all data into one message – MIME encoding (Multipurpose Internet Mail Extensions) • Content-Type: and Content-Transfer-Encoding: headers

MIME Types

• Multipart/mixed – Look for part boundaries, content headers • Text/plain • Text/html • Image/gif or image/jpeg • Application/msword, application/pdf, etc.
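
Python's standard `email` package can show how MIME keeps a message with binary content within 7-bit ASCII. This is a sketch with placeholder addresses and a few made-up bytes standing in for an image.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"        # placeholder addresses
msg["To"] = "recipient@example.com"
msg["Subject"] = "Report with picture"
msg.set_content("See the attached image.")           # becomes the text/plain part

# A few arbitrary bytes standing in for real image data (includes a non-ASCII byte).
fake_gif = bytes([0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0xFF, 0x00])
msg.add_attachment(fake_gif, maintype="image", subtype="gif", filename="picture.gif")

# Walk the parts: the container first, then each body part.
for part in msg.walk():
    print(part.get_content_type(), part.get("Content-Transfer-Encoding"))
```

Adding the attachment turns the message into a multipart/mixed container, and the binary part is base64-encoded so that the whole message can travel over SMTP as ASCII.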

Mail Headers

• Some from user (e.g. To, cc) • Some from user agent (e.g. Date, From, MIME headings) • Some from servers (e.g. Received) • Most user agents allow “full headers” to be viewed.

Mail Header Example

From: Thomas Bagley [email protected]
Subject: ACM Member Technical Interest Service January 2010
Date: January 26, 2010 5:46:47 PM EST
To: Ellen Walker [email protected]
Received: from mail.hiram.edu ([206.57.41.42]) by hiramr.hiram.edu with Microsoft SMTPSVC(6.0.3790.3959); Tue, 26 Jan 2010 17:46:54 -0500
Received: from smtp161.redcondor.net ([206.57.41.40]) by mail.hiram.edu with Microsoft SMTPSVC(6.0.3790.3959); Tue, 26 Jan 2010 17:46:54 -0500
Received: from acm26-2.acm.org ([199.222.69.107]) by smtp161.redcondor.net ({6c7b74fb-260e-4729-9476-f743470f315e}) via TCP (inbound) with ESMTP id 20100126224651153 for ; Tue, 26 Jan 2010 22:46:51 +0000
Received: from acm28-8 ([192.168.1.104]) by acm26-2.acm.org (IceWarp 9.4.2) with SMTP id IXU98141 for ; Tue, 26 Jan 2010 17:46:41 -0500

Mail Header Example (Cont’d)

X-Rc-From: [email protected]
X-Rc-Rcpt: [email protected]
Message-Id: 10171514.34121264546007418.JavaMail.Administrator@acm28-8
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Mailer: ColdFusion 8 Application Server
Return-Path: [email protected]
X-Originalarrivaltime: 26 Jan 2010 22:46:54.0837 (UTC) FILETIME=[7568BA50:01CA9ED9]

Mail Access Protocols

• Protocols for conversation between user agent and mail server (SMTP is for mail server to mail server communication) • POP3 – Authorization, transaction, update (after client quit) • IMAP – Allows users to store mail in folders on server – Clients can access message components (e.g. headers only) • WebMail (HTTP) – Mail accessed through a web page using a browser; no application-specific client or protocol

Domain Name System (DNS)

• Translates between mnemonic hostnames and numeric IP addresses – www.hiram.edu = 206.57.41.47 – Command “host” can look up an address • Distributed database implemented in hierarchy of DNS servers • Application-layer protocol that allows hosts to query this database – Used by other application-layer protocols such as HTTP for name-address translation – This adds a delay to each HTTP request

Database is Distributed

• Root servers (13) point to… • Top-level domain servers (com, org, edu, uk,…) point to… • Authoritative servers (per organization) • Local servers (per ISP)
Root server locations (from figure): a Verisign, Dulles, VA; b USC-ISI, Marina del Rey, CA; c Cogent, Herndon, VA (also LA); d U Maryland, College Park, MD; g US DoD, Vienna, VA; h ARL, Aberdeen, MD; i Autonomica, Stockholm (plus 28 other locations); j Verisign (21 locations); k RIPE, London (also 16 other locations); l ICANN, Los Angeles, CA; m WIDE, Tokyo (also Seoul, Paris, SF)

Translating a Domain (iterative)

• Ask local DNS • Local DNS asks root DNS • Root DNS responds with appropriate top level DNS • Local DNS asks top-level DNS • TL DNS responds with appropriate organization’s authoritative DNS • Local DNS asks Authoritative DNS, and receives address (which it probably caches)
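
The iterative steps can be simulated with a toy database. In this sketch the local DNS server does all the asking; the server names (`tld-edu`, `ns.hiram.edu`) and the dictionary layout are invented for illustration, and only the final address comes from the slides.

```python
# Toy data: each level only knows where to send the local DNS server next.
ROOT = {"edu": "tld-edu"}
TLD = {"tld-edu": {"hiram.edu": "ns.hiram.edu"}}
AUTHORITATIVE = {"ns.hiram.edu": {"www.hiram.edu": "206.57.41.47"}}


def resolve_iteratively(hostname: str, cache: dict):
    """Local DNS server's view: returns (address, list of servers it queried)."""
    if hostname in cache:
        return cache[hostname], []            # answered from cache, no queries
    labels = hostname.split(".")
    tld, domain = labels[-1], ".".join(labels[-2:])
    queried = []
    queried.append("root")
    tld_server = ROOT[tld]                    # root replies with the TLD server
    queried.append(tld_server)
    auth_server = TLD[tld_server][domain]     # TLD replies with the authoritative server
    queried.append(auth_server)
    address = AUTHORITATIVE[auth_server][hostname]
    cache[hostname] = address                 # local DNS caches the answer
    return address, queried


local_cache = {}
first = resolve_iteratively("www.hiram.edu", local_cache)
second = resolve_iteratively("www.hiram.edu", local_cache)
print(first, second)
```

The second lookup needs no queries at all, which is why caching at the local server matters so much for the delay DNS adds.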

Translating a Domain (recursive)

• Ask local DNS • Local DNS asks root DNS • Root DNS asks top-level DNS • TL DNS asks organization’s authoritative DNS • Organization’s DNS responds to TL DNS, which forwards to Root DNS, which forwards to local DNS, which responds to original client • (Caching as appropriate throughout)
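
In the recursive version each server asks the next one on the client's behalf, which maps naturally onto nested function calls. This is a toy model with hypothetical function names; real servers would be separate hosts, and only the local server's cache is modeled.

```python
AUTHORITATIVE_RECORDS = {"www.hiram.edu": "206.57.41.47"}


def authoritative_dns(hostname, trace):
    trace.append("authoritative")
    return AUTHORITATIVE_RECORDS[hostname]          # the only server that knows the answer


def top_level_dns(hostname, trace):
    trace.append("top-level")
    return authoritative_dns(hostname, trace)       # asks on behalf of the root server


def root_dns(hostname, trace):
    trace.append("root")
    return top_level_dns(hostname, trace)           # asks on behalf of the local server


def local_dns(hostname, trace, cache):
    trace.append("local")
    if hostname not in cache:
        cache[hostname] = root_dns(hostname, trace)  # answer travels back up the call chain
    return cache[hostname]


trace, cache = [], {}
address = local_dns("www.hiram.edu", trace, cache)
print(address, trace)
```

Each `return` corresponds to a response being forwarded one step back toward the original client.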

DNS Record Types

• A – Name = hostname, value = IP address • NS – Name = domain (hiram.edu), value = hostname of authoritative DNS • CNAME – Name = alias hostname (foo.com), value = canonical hostname (relay1.bar.foo.com) • MX – Name = alias mail name (gmail.com), value = canonical hostname of the mail server (gsmtp183.google.com)

Creating a new DNS Record

• Entity “registers” the site, by paying a registrar and providing authoritative DNS server IP addresses • Registrar verifies uniqueness of name and enters NS and A type records into database • You provide A and MX records for your own servers at your authoritative DNS servers

Why Not Central Name Server?

• Single point of failure • Too many requests • Does not scale !!!!!

P2P (Peer to Peer) Applications

• All content transferred directly between peers without passing through third-party servers – Does not rely on always-on (24/7) servers • When a client requests an object – Find a currently connected host (acting as server) that has that object – Server transmits object to client • A given host can be both client and (transient) server • Applications include: – File distribution – VOIP (e.g. Skype)

File Sharing

• Goal: Distribute a file from a single server to many hosts • Client-server – Huge burden on source host (must connect to all recipients) • P2P – Any host that has received some portion of the file can redistribute to others (sharing the burden) – Most popular (2009): BitTorrent

How long does it take? (C/S)

• F bits in file, u = server upload rate, dmin = min download rate • 1 server -> N clients • Upload time = N*F/u • Max download time = F / dmin • Overall: max (N*F/u, F/dmin) • Assume N is big enough, result is N*F/u
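
The client-server bound as a function, with made-up numbers (a 1000-bit file, rates in bits per second) to show how the upload term takes over as the number of clients grows.

```python
def client_server_time(file_bits: float, num_clients: int,
                       server_upload: float, min_download: float) -> float:
    """max(N*F/u, F/dmin): the server must upload N copies, and the
    slowest client must download one full copy."""
    upload_time = num_clients * file_bits / server_upload
    slowest_download = file_bits / min_download
    return max(upload_time, slowest_download)


# Illustrative values only: F = 1000 bits, u = 100 bits/s, dmin = 50 bits/s.
for n in (1, 10, 100):
    print(n, client_server_time(1000, n, 100, 50))
```

With one client the download rate is the limit; beyond a handful of clients the time grows linearly with N.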

How Long Does It Take? (P2P)

• To get file out, server must send every bit once (F/u) • Slowest recipient will take F/dmin time to get its complete file • Total upload capacity is sum of upload capacity of all uploaders: – uTotal = u + u1 + u2… • Overall time = max (F/u, F/dmin, N*F/uTotal) • When N gets bigger, so does uTotal!
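
The P2P bound can be evaluated the same way. This sketch uses illustrative rates (not taken from the slides) and assumes every peer contributes its full upload rate.

```python
def p2p_time(file_bits: float, server_upload: float,
             peer_uploads: list[float], min_download: float) -> float:
    """max(F/u, F/dmin, N*F/uTotal) for N peers."""
    n = len(peer_uploads)
    total_upload = server_upload + sum(peer_uploads)   # uTotal = u + u1 + u2 + ...
    return max(file_bits / server_upload,              # server sends every bit once
               file_bits / min_download,               # slowest peer downloads a full copy
               n * file_bits / total_upload)           # N copies over the total capacity


# Illustrative values only: F = 1000 bits, u = 100 bits/s, dmin = 50 bits/s,
# and each peer uploads at 50 bits/s.
for n in (10, 100):
    print(n, p2p_time(1000, 100, [50.0] * n, 50))
```

Because every new peer adds upload capacity, the third term levels off instead of growing linearly with N.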

BitTorrent

• P2P Protocol for file distribution
• Torrent = collection of all peers involved in distribution of a single file
• Chunks = equal-sized pieces of file (e.g. 256K)
• Tracker = special infrastructure node (one tracker per torrent)

When a peer joins a torrent…

• Tracker sends peer N (say 50) random addresses of hosts in the torrent • Try to connect to all of them. (Successful connections called ‘neighbors’) • Ask each neighbor for lists of chunks they have • Request (from appropriate neighbor) each chunk you don’t have

Rarest First

• Ask for the chunk which is held by the fewest of my neighbors • Result: more copies of that chunk, roughly equalizing the availability of chunks
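
Rarest-first selection can be sketched with a counter over the neighbors' chunk lists. The data layout (sets of chunk numbers keyed by neighbor name) and the tie-break on the lowest chunk number are assumptions for illustration, not part of the BitTorrent specification.

```python
from collections import Counter


def rarest_first(my_chunks: set, neighbor_chunks: dict):
    """Pick the missing chunk held by the fewest neighbors.
    Returns (chunk, neighbors that hold it), or None if nothing is missing."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)             # count only chunks I still need
    if not counts:
        return None
    chunk = min(counts, key=lambda c: (counts[c], c))  # fewest holders, then lowest number
    holders = sorted(name for name, chunks in neighbor_chunks.items() if chunk in chunks)
    return chunk, holders


neighbors = {"A": {0, 1, 2}, "B": {1, 2}, "C": {1, 3}}
print(rarest_first({0}, neighbors))
```

Here chunk 3 is requested first because only one neighbor holds it, even though chunks 1 and 2 are also missing.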

Responding to Requests (Tit for Tat)

• Host receives many requests • Respond to requests from 4 neighbors sending bits at highest rate (unchoked) – These are fastest – These are ‘most generous’ • Respond to requests from a fifth neighbor at random (optimistically unchoked) – Might become ‘top-4’ of this neighbor! – Allows more neighbors to get in on the action.
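
A sketch of the unchoking choice: the four neighbors currently sending to us fastest, plus one other neighbor picked at random. The rate table and the function name are hypothetical, and real clients re-evaluate this choice periodically.

```python
import random


def choose_unchoked(rates_from_neighbors: dict, top_n: int = 4, rng=random):
    """Return (top_n fastest senders, one randomly chosen other neighbor or None)."""
    ranked = sorted(rates_from_neighbors, key=rates_from_neighbors.get, reverse=True)
    unchoked = ranked[:top_n]                      # the 'most generous' neighbors
    others = ranked[top_n:]
    optimistic = rng.choice(others) if others else None   # optimistically unchoked
    return unchoked, optimistic


# Hypothetical rates (kbit/s) at which each neighbor is sending to this host.
rates = {"a": 90, "b": 10, "c": 70, "d": 55, "e": 80, "f": 5, "g": 30}
print(choose_unchoked(rates, rng=random.Random(0)))
```

The random fifth slot is what lets a newcomer with nothing to offer yet receive its first chunks.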

Distributed Hash Tables

• Need to maintain searchable index of (key, value) pairs • Cannot contain it on a single host (point of failure) – Napster did this in the early days of P2P • Distribute pairs among hosts – How to avoid having all hosts contain all pairs?

– How to avoid having all hosts contact all hosts?

Answer: Use the Hash Value!

• Assign an integer to each host (same range as hash values) • Assign each (key, value) pair to the host whose integer is closest to hash(key) – Equal is closest, then successor (wraps around) • Each host knows its successor (last -> first) – circular overlay network
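
The "equal, then successor, with wrap-around" rule can be written as a short lookup over the sorted host IDs. This sketch assumes a global view of all IDs, which a real peer does not have; it only shows which host ends up responsible for a given hash value. The IDs are illustrative.

```python
import bisect


def responsible_peer(peer_ids: list[int], key_hash: int) -> int:
    """Return the ID of the host that stores the pair whose key hashes to key_hash."""
    ring = sorted(peer_ids)
    index = bisect.bisect_left(ring, key_hash)   # first host with ID >= hash(key)
    return ring[index % len(ring)]               # past the last host: wrap to the first


peers = [1, 3, 4, 5, 8, 10, 12]                  # illustrative host IDs
for h in (8, 11, 14):
    print(h, "->", responsible_peer(peers, h))
```

A hash equal to a host ID maps to that host, a hash between two IDs maps to the next one up, and a hash above the largest ID wraps around to the smallest.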

To Add or Find

• Peer receives message with hash(key) • If ID is closer to hash(key) than successor’s ID, then peer responds directly to message’s sender – ID ≥ hash(key) – Successor ID < hash(key) • Else peer passes on the message to its successor

Evaluating Circular Network

• Advantage: – Every peer needs to keep track of only 2 neighbors (predecessor and successor) • Disadvantage: – When the circle gets big, messages take a long time to go around!

• Solution: – Add a few “shortcut” links across the circle – Trades off more neighbors vs. shorter travel time

Peer Churn

• Remember, hosts come & go, not 24/7 • By the original plan, if my successor is lost, I am disconnected!

• Instead – Each node tracks 2 successors – Periodically check both your successors are there – If one is gone, find the other’s successor so you still have 2 successors • To join, pass a message around the circle

P2P Case study: Skype

• Inherently P2P: pairs of users communicate • Proprietary application-layer protocol (inferred via reverse engineering) • Hierarchical overlay with supernodes (SNs) • Index maps usernames to IP addresses; distributed over SNs
Figure labels: Skype clients (SC), Supernode (SN), Skype login server
From Kurose & Ross Slides v. 5