Transcript Slide 1

IELM 231: IT for Logistics and Manufacturing
Course Agenda
Introduction
IT applications design: Human-Computer Interface
Fundamental IT tools: sorting, searching
The Client-Server architecture, Interacting applications
IT in logistics, Case study 1: web-search
Search robots
Data processing, Data storage/retrieval (DB, indexes)
Data presentation: page ranking techniques
IT in logistics, Case study 2: web-based auctions
How auctions work
Web issues: session tracking
Web issues: secure communications
Web issues: cash transactions
The Internet and Web searching
Part 1. The infrastructure of the web
- The architecture of the Internet
- The ISO/OSI 7-layer communication protocol
- Client-server architecture for applications
Part 2. How to use a web search engine [case study: Google]
- Basic searching
- Advanced searching (Booleans) and filtering (setting up local engines)
- Searching data not accessible to search engines
Part 3. How search engines work
- Crawling the internet
- Indexing of information, database design
- Page ranking
Unit of digital communication: Packets
All data (files, streamed data e.g. music, video) is transmitted using
wires/optic-cables/wireless channels
In most cases, there must be no error in the communication →
the data on the receiving computer is exactly the same as the data on the sending computer
Why zero-error? [hint: uploaded programs]
Zero error → some error detection and correction technology must be used
When is some error acceptable? [hint: video-conf, streaming]
Long message → probability (error) is high → Need to re-transmit message?
Long message → p(error) is high → Need to re-transmit [part] of message
Solution: Break message into small “packets” → send packets 1-by-1
[Figure: a long message is split into three parts (part 1 of 3, part 2 of 3, part 3 of 3). Each part becomes a packet carrying: To address, From address, sequence number (1/3, 2/3, 3/3), data (part n of 3) and an EDC (error detection/correction) field. The packets are transmitted, and the receiver re-constructs the message from the three parts received.]
Typical packet size: 2048 - 4096 Bytes
Question: Why do some web pages load in non-sequential fashion (some pictures load first, others later)
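The packet idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`to`, `from`, `seq`, `total`, `data`, `edc`), the tiny 8-byte packet size and the use of CRC-32 as the EDC field are choices made for this sketch, not something the slides prescribe. CRC-32 only detects errors; a damaged packet would have to be re-transmitted.

```python
import random
import zlib


def make_packets(message: bytes, to_addr: str, from_addr: str, size: int = 8):
    """Split message into packets holding at most `size` bytes of data each."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)] or [b""]
    total = len(chunks)
    return [
        {"to": to_addr, "from": from_addr, "seq": n, "total": total,
         "data": chunk, "edc": zlib.crc32(chunk)}  # edc: checksum of the data
        for n, chunk in enumerate(chunks, start=1)
    ]


def reassemble(packets):
    """Rebuild the message; raise ValueError on a damaged or missing packet."""
    for p in packets:
        if zlib.crc32(p["data"]) != p["edc"]:
            raise ValueError(f"packet {p['seq']} is corrupted, re-transmit it")
    ordered = sorted(packets, key=lambda p: p["seq"])
    if [p["seq"] for p in ordered] != list(range(1, ordered[0]["total"] + 1)):
        raise ValueError("a packet is missing")
    return b"".join(p["data"] for p in ordered)


packets = make_packets(b"A long message to send", "host-B", "host-A")
random.shuffle(packets)        # packets may arrive in any order
message = reassemble(packets)  # sequence numbers restore the order
print(message)
```

Only the damaged packet needs to be sent again, not the whole message, which is the point of splitting it up.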
Network terminology
LAN: Local Area Network
A network of communicating devices in a small area (e.g. a building,
a factory, etc.)
Common ways of physically connecting computers in a LAN:
Cables (wires), Bluetooth, Wi-Fi…
WAN: Wide Area Network
Two or more LAN’s connected to each other, over a large area,
e.g. international communication networks.
Common ways of connecting between LAN’s in a WAN:
Telephone networks, Long-distance cables, Satellites
Network topologies
Suppose N computers need to communicate with each other
Pairwise connections:
How many ?
Problems ?
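One way to answer "How many?": each of the N computers needs a link to the other N-1, and every link is shared by two computers, so N(N-1)/2 links are needed. A small sketch (the function name is just for illustration):

```python
def pairwise_links(n: int) -> int:
    """Number of direct links needed to connect every pair of n computers."""
    return n * (n - 1) // 2


for n in (4, 10, 100):
    print(n, "computers need", pairwise_links(n), "links")
```

The count grows with the square of N, which suggests why shared topologies are used instead of wiring every pair directly.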
Network topologies
Network topology describes how different devices are (physically) connected to each other.
[Figure: four network topologies drawn with numbered nodes: (a) Ring topology; (b) Star topology, all nodes connected to a Central Hub; (c) Mesh topology; (d) Bus topology, nodes attached to the Bus through Taps and Stubs, with a Terminator at the end of the bus]
What is the internet
• millions of connected computing devices: hosts = end systems
• running network applications
• communication links
– fiber, copper, radio, satellite
– transmission rate = bandwidth
• routers: forward packets
[Figure: routers, servers, workstations and mobile devices connected through a local ISP, a regional ISP and the UST network]
What is the internet..
• protocols control sending, receiving of msgs
– e.g., TCP, IP, HTTP, FTP, PPP
• Internet: “network of networks”
– public: Internet
– private: Intranet
• Internet standards
– RFC: Request for comments
– IETF: Internet Engineering Task Force
What is a protocol
protocols define format, order of msgs sent and received among network entities,
and actions taken on msg transmission, receipt
[Figure: two message exchanges drawn against a time axis]
a human protocol: Hi / Hi / What’s the time? / 2pm
a computer protocol: TCP connection req / TCP connection response / Get http://www.awl.com/kurose-ross / <file>
A closer look at network structure
• network edge: applications and hosts
• network core:
– routers
– network of networks
• access networks, physical media:
communication links
The network edge
End systems (hosts):
– run application programs
– e.g. Web, email
– at “edge of network”
Client/server model
– client host requests, receives
service from always-on server
– e.g. Web browser/server; email
client/server
Network edge: connection-oriented service
Goal: data transfer between end
systems
• handshaking: setup (prepare
for) data transfer ahead of time
– Hello, hello back human
protocol
– set up “state” in two
communicating hosts
e.g.
TCP service [RFC 793]
• reliable, in-order byte-stream data
transfer
– loss: acknowledgements and
retransmissions
• flow control:
– sender won’t overwhelm
receiver
• congestion control:
– senders “slow down sending
rate” when network congested
Network edge: connection-less service
Goal: data transfer between
end systems
e.g.
UDP - User Datagram Protocol:
connectionless
unreliable data transfer
no flow control
no congestion control
App’s using TCP:
• HTTP (Web), FTP (file transfer), Telnet (remote login), SMTP (email)
App’s using UDP:
• Streaming media, Teleconferencing, DNS, Internet telephony
Network core
Network core: a mesh of inter-connected routers
Basic methods to transfer data through the net:
Circuit switching
Dedicated circuit per call, e.g. telephone net
Packet-switching
Data sent through net in discrete “chunks”
Network core: circuit switching
End-end resources reserved for “call”
• link bandwidth, switch capacity
• dedicated resources: no sharing
• circuit-like (guaranteed)
performance
• call setup required
Network core: packet switching
Each end-end data stream divided into
packets
• user A, B packets share network
resources
• each packet uses full link bandwidth
• resources used as needed
Not used in packet switching (unlike circuit switching):
bandwidth division into “pieces”, dedicated allocation, resource reservation
Resource allocation:
• total resource demand can
exceed amount available
• congestion: packets queue, wait
for link use
• store and forward: packets move
one hop at a time
– Node receives complete
packet before forwarding
Packet switching: store and forward
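[Figure: a packet of L bits crosses three links, each of rate R bps, passing through the routers between them]
Packet length: L bits
Link transmission rate: R bps
Time to push packet onto a link: L/R sec
Entire packet must arrive at router before it
can be transmitted on next link: store and
forward
• delay = 3L/R (three links, propagation delay ignored)
Example:
• L = 7.5 Mbits
• R = 1.5 Mbps
• delay = 3 × 7.5/1.5 = 15 sec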
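The store-and-forward arithmetic can be checked with a short sketch. It assumes, as the slide does, that every link has the same rate and that propagation and queueing delays are ignored; the function name is illustrative.

```python
def store_and_forward_delay(packet_bits: float, rate_bps: float, links: int) -> float:
    """Seconds for one packet to cross `links` links when each node must
    receive the complete packet before forwarding it."""
    return links * packet_bits / rate_bps


# Slide example: L = 7.5 Mbits, R = 1.5 Mbps, three links
delay = store_and_forward_delay(7.5e6, 1.5e6, links=3)
print(delay, "sec")
```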
Access networks and physical media
Q: How to connect end systems to
edge router?
• residential access nets
• institutional access networks
(school, company)
• mobile access networks
Residential access: point to point access
• Phone modem
– up to 56Kbps direct access to router
(often less)
– Can’t surf and phone at same time: can’t
be “always on”
• ADSL: asymmetric digital subscriber line [similar to NOW Broadband]
– up to 1 Mbps upstream
– up to 8 Mbps downstream
Residential access: Cable modems
[Figure: cable headend connected to homes through the cable distribution network (simplified)]
Residential access: Cable modems..
Diagram: http://www.cabledatacomnews.com/cmic/diagram.html
Company access: local area networks
• company/univ local area network (LAN)
connects end system to edge router
• Ethernet:
– shared or dedicated link connects end
system and router
– 10 Mbps, 100 Mbps, Gigabit Ethernet
Wireless access networks
Shared wireless access network connects end system to router
– via base station aka “access point”
Wireless LANs:
– 802.11b (WiFi): 11 Mbps (good for networks)
– Bluetooth: 720 Kbps (good for device-to-device)
[Figure: mobile hosts connect by radio to a base station, which is connected to a router]
Home networks
Typical home network components:
• ADSL or cable modem
• router/firewall/NAT
• Ethernet
• wireless access point
[Figure: home network chain: to/from cable headend <-> cable modem <-> router/firewall <-> Ethernet <-> wireless access point <-> wireless laptops]
Internet structure: network of networks
• a packet passes through many networks!
[Figure: hierarchy of ISPs. Local ISPs and a Tier 3 ISP attach to Tier-2 ISPs; Tier-2 ISPs attach to Tier 1 ISPs; the Tier 1 ISPs are interconnected, including through a Network Access Point]
Protocol “Layers”
Networks are complex!
• many “pieces”:
– hosts
– routers
– links of various media
– applications
– protocols
– hardware, software
Analogy: Organization of air travel
ticket (purchase)              ticket (complain)
baggage (check)                baggage (claim)
gates (load)                   gates (unload)
runway takeoff                 runway landing
airplane routing (departure)   airplane routing (arrival)
airplane routing [intermediate air-traffic control points]
Layering of airline functionality
Layer             departure airport    intermediate air-traffic   arrival airport
                                       control centers
ticket            ticket (purchase)                               ticket (complain)
baggage           baggage (check)                                 baggage (claim)
gate              gates (load)                                    gates (unload)
takeoff/landing   runway (takeoff)                                runway (land)
airplane routing  airplane routing     airplane routing           airplane routing
Layers: each layer implements a service
– via its own internal-layer actions
– relying on services provided by layer below
Why layering?
Dealing with complex systems:
• explicit structure allows identification, relationship of complex
system’s pieces
– layered reference model for discussion
• modularization eases maintenance, updating of system
– change of implementation of layer’s service transparent to rest of
system
– e.g., change in gate procedure doesn’t affect rest of system
• layering considered harmful?
Internet protocol stack
• application: supporting network applications
– FTP, SMTP, HTTP
• transport: host-host data transfer
– TCP, UDP
• network: routing of datagrams from source to
destination
– IP, routing protocols
• link: data transfer between neighboring network
elements
– PPP, Ethernet
• physical: bits “on the wire”
[Figure: the protocol stack, top to bottom: application, transport, network, link, physical]
Encapsulation
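[Figure: encapsulation along the path source -> switch -> router -> destination]
Layer         Data unit   Contents
application   message     M
transport     segment     Ht M
network       datagram    Hn Ht M
link          frame       Hl Hn Ht M
Ht, Hn, Hl: headers added by the transport, network and link layers
source: each layer adds its header on the way down (application, transport, network, link, physical)
switch: link and physical layers only; handles the frame Hl Hn Ht M
router: network, link and physical layers; handles the datagram Hn Ht M inside the frame Hl Hn Ht M
destination: each layer removes its header on the way up and M reaches the application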
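Encapsulation can be sketched as each layer prepending its own header on the way down and stripping it on the way up. The header contents below (the literal byte strings `Ht|`, `Hn|`, `Hl|`) are placeholders chosen for this sketch; real headers are binary structures.

```python
def encapsulate(message: bytes) -> bytes:
    """Source host: wrap message M going down the stack."""
    segment = b"Ht|" + message    # transport layer adds Ht -> segment
    datagram = b"Hn|" + segment   # network layer adds Hn   -> datagram
    frame = b"Hl|" + datagram     # link layer adds Hl      -> frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Destination host: strip headers going up the stack."""
    data = frame
    for header in (b"Hl|", b"Hn|", b"Ht|"):   # link, network, transport
        if not data.startswith(header):
            raise ValueError(f"expected header {header!r}")
        data = data[len(header):]
    return data


frame = encapsulate(b"M")
print(frame)               # b'Hl|Hn|Ht|M'
print(decapsulate(frame))  # b'M'
```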
The Network Layer: Internet Protocol
• What’s inside a router
• Internet Protocol and IP addresses
• How packets are routed
Internet Protocol (IP)
The Internet Protocol (IP) is a network-layer (Layer 3) protocol that contains
addressing information and some control information that enables packets
to be routed.
Network layer functions: Forwarding and Routing
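Forwarding: determines which output link a packet
takes at a specific router
Routing: plans the series of forwarding
steps that take the packet from source
to destination
[Figure: the routing algorithm fills in the local forwarding table; the router looks up the value in the arriving packet’s header (0111) and sends the datagram out on the matching output link (links numbered 1, 2, 3)]
local forwarding table
header value   output link
0100           3
0101           2
0111           2
1001           1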
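Forwarding at a single router amounts to a table lookup. The sketch below uses the header values and output links from the slide's forwarding table; exact-match lookup in a dictionary is a simplification assumed here (real IP routers match on address prefixes).

```python
# Local forwarding table from the slide: header value -> output link
FORWARDING_TABLE = {"0100": 3, "0101": 2, "0111": 2, "1001": 1}


def forward(header_value: str) -> int:
    """Return the output link for the value in an arriving packet's header."""
    try:
        return FORWARDING_TABLE[header_value]
    except KeyError:
        raise LookupError(f"no entry for header value {header_value}") from None


print(forward("0111"))  # 2
```

Routing is the separate job of computing the table entries so that the chain of lookups, router after router, leads to the destination.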
IP datagram format
Each row is 32 bits wide:
| ver | head. len | type of service | length                  |
| 16-bit identifier                 | flgs | fragment offset  |
| time to live | upper layer        | Internet checksum       |
| 32 bit source IP address                                    |
| 32 bit destination IP address                               |
| Options (if any)                                            |
| data (variable length, typically a TCP or UDP segment)      |
Field notes:
– ver: IP protocol version number
– head. len: header length (bytes)
– type of service: “type” of data
– length: total datagram length (bytes)
– 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
– time to live: max number remaining hops (decremented at each router)
– upper layer: upper layer protocol to deliver payload to
– Options: e.g. timestamp, record route taken, specify list of routers to visit
how much overhead with TCP?
• 20 bytes of TCP
• 20 bytes of IP
• = 40 bytes + app layer overhead
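To make the 20-byte IP header concrete, here is a sketch that packs and unpacks the fixed header fields with Python's `struct` module. Assumptions for this sketch: no options, checksum left at 0, protocol number 6 (TCP) as the upper-layer value, and the function names are illustrative.

```python
import socket
import struct

IPV4_HEADER = "!BBHHHBBH4s4s"   # network byte order, 20 bytes in total


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      ttl: int = 64, proto: int = 6) -> bytes:
    version_ihl = (4 << 4) | 5          # version 4, header of 5 x 4 = 20 bytes
    total_length = 20 + payload_len     # header + data, in bytes
    return struct.pack(
        IPV4_HEADER,
        version_ihl, 0, total_length,   # ver/head. len, type of service, length
        0, 0,                           # identifier, flags + fragment offset
        ttl, proto, 0,                  # time to live, upper layer, checksum (0 here)
        socket.inet_aton(src), socket.inet_aton(dst),
    )


def parse_ipv4_header(header: bytes) -> dict:
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack(IPV4_HEADER, header[:20])
    return {
        "version": version_ihl >> 4,
        "header_len_bytes": (version_ihl & 0x0F) * 4,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


header = build_ipv4_header("223.1.1.1", "223.1.2.9", payload_len=20)
fields = parse_ipv4_header(header)
print(len(header), fields)
```

With a 20-byte TCP header as payload, `total_length` comes out as 40, matching the overhead figure on the slide.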
Datagram networks
Packets forwarded using destination host address
– packets between same source-dest pair may take different paths
[Figure: 1. Send data: goes down the sender's stack (application, transport, network, data link, physical); 2. Receive data: comes up the receiver's stack]
Main router functions:
• run routing algorithms/protocol (RIP, OSPF, BGP)
• forwarding datagrams from incoming to outgoing link
IP Addressing
IP Address is a locator to allow one IP device to ‘find’ another IP device
IP address: 32-bit identifier for host/router interface (IPv4; IPv6, supported in Vista and OS X, uses 128-bit addresses)
interface: connection between
host/router and physical link
– routers: 2 or more interfaces
– hosts: 1 or more interfaces
– each interface has an IP address
223.1.1.1 = 11011111 00000001 00000001 00000001 (223, 1, 1, 1)
[Figure: hosts and a router with interface addresses 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, 223.1.3.27]
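The dotted-decimal notation is just the 32 bits written as four 8-bit numbers. A small sketch of the conversion (function names are illustrative):

```python
def to_binary(dotted: str) -> str:
    """Write each of the four decimal parts as 8 bits."""
    return " ".join(f"{int(octet):08b}" for octet in dotted.split("."))


def to_int(dotted: str) -> int:
    """Combine the four parts into one 32-bit number."""
    value = 0
    for octet in dotted.split("."):
        value = (value << 8) | int(octet)
    return value


print(to_binary("223.1.1.1"))  # 11011111 00000001 00000001 00000001
print(to_int("223.1.1.1"))
```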
Access networks and physical media
IP address:
– subnet part (high order bits)
– host part (low order bits)
What’s a subnet ?
– device interfaces with same
subnet part of IP address
– can physically reach each
other without intervening
router
[Figure: network consisting of 3 subnets attached to one router: 223.1.1.1 to 223.1.1.4; 223.1.2.1, 223.1.2.2, 223.1.2.9; 223.1.3.1, 223.1.3.2, 223.1.3.27]
Subnets
To determine the subnets,
detach each interface
from its host or router,
creating islands of
isolated networks.
Each isolated network is
called a subnet.
[Figure: the three subnets 223.1.1.0/24, 223.1.2.0/24 and 223.1.3.0/24]
Subnet mask: /24
Subnets..
How many ?
[Figure: a network with interface addresses 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.6, 223.1.3.1, 223.1.3.2, 223.1.3.27, 223.1.7.0, 223.1.7.1, 223.1.8.0, 223.1.8.1, 223.1.9.1, 223.1.9.2]
Subnets…
Subnets allow us to create sub-collection of IP devices, e.g. a LAN
IP address is made of two parts:
- Network address
- Host (i.e. device) address
How many bits (and which ones) of the IP address are used for
Network address, and for Host ?
Depends on the LAN: e.g. only 4 devices → we may only use 2 bits for the Host.
Subnet mask specifies which bits are used for the Network address (mask bit = 1)
and which for the Host (mask bit = 0):
                       Dotted decimal   Binary
Full Network Address   192.168.5.10     11000000.10101000.00000101.00001010
Subnet Mask            255.255.255.0    11111111.11111111.11111111.00000000
Network Portion        192.168.5.0      11000000.10101000.00000101.00000000
Client Portion         0.0.0.10         00000000.00000000.00000000.00001010
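The network and client portions are obtained by a bitwise AND of the address with the mask and with the inverted mask. A sketch using Python's standard `ipaddress` module and the values from the slide:

```python
import ipaddress

iface = ipaddress.ip_interface("192.168.5.10/255.255.255.0")
addr = int(iface.ip)        # the 32-bit address as a number
mask = int(iface.netmask)   # the 32-bit subnet mask as a number

# Bits where the mask is 1 form the network portion
network_portion = ipaddress.ip_address(addr & mask)
# Bits where the mask is 0 form the client (host) portion
client_portion = ipaddress.ip_address(addr & ~mask & 0xFFFFFFFF)

print(network_portion)   # 192.168.5.0
print(client_portion)    # 0.0.0.10
print(iface.network)     # 192.168.5.0/24
```

The `/24` in the last line is the same mask written as a count of leading 1 bits.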
IP Address: how to get one?
How does host get IP address?
1. Hard-coded by system administrator in a file
Control-panel → Network → Configuration → tcp/ip → properties
or
2. DHCP: Dynamic Host Configuration Protocol:
Dynamically get address from a server (plug-and-play)
DNS: Domain Name System
People: many identifiers:
– HKID, name, passport #
Internet hosts, routers:
– IP address (32 bit) - used
for addressing datagrams
– “name”, e.g.,
www.yahoo.com - used by
humans
Q: how to map between IP addresses and
names?
Domain Name System:
• distributed database implemented
in hierarchy of many name servers
• application-layer protocol: hosts,
routers, name servers
communicate to resolve names
(address/name translation)
Suppose a client wants to connect to a host www.amazon.co.uk
(1) the “network” must tell us the IP address of a host www.amazon.co.uk
(2) the client sends a “connect” request to that IP address.
DNS..
How to find IP address of a named host?
A DB of {Name → IP address} is stored on computers, Name Servers
[Figure: hierarchy of Name Servers; each Name Server manages a zone of authority and stores resource records (name → IP, …)]
High level Name server
Lower level Name server, can allocate, store
names to computers below it in hierarchy
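The hierarchy of name servers can be imitated with a toy in-memory model. Everything here is an assumption for illustration: the `NameServer` class, the made-up names under the reserved `.example` domain and the documentation address 192.0.2.10. Real DNS uses network queries, caching and several record types.

```python
class NameServer:
    """Toy name server: stores records for its zone, delegates sub-zones."""

    def __init__(self, zone: str):
        self.zone = zone
        self.records = {}    # name -> IP address (resource records)
        self.children = {}   # sub-zone -> lower-level NameServer

    def delegate(self, zone: str) -> "NameServer":
        child = NameServer(zone)
        self.children[zone] = child
        return child

    def resolve(self, name: str) -> str:
        if name in self.records:
            return self.records[name]
        for zone, child in self.children.items():
            if name == zone or name.endswith("." + zone):
                return child.resolve(name)   # ask the lower-level server
        raise KeyError(f"unknown name: {name}")


root = NameServer("")                                    # high level server
shop = root.delegate("example").delegate("shop.example")
shop.records["www.shop.example"] = "192.0.2.10"

print(root.resolve("www.shop.example"))  # 192.0.2.10
```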
Transport services and Protocols
• provide logical communication
between app processes running on
different hosts
• Most common transport protocol: TCP
[Figure: logical end-end transport between two hosts; the two hosts run all five layers (application, transport, network, data link, physical), the routers between them run only network, data link and physical]
TCP provides a reliable, continuous stream of data
- protocol for automatically requesting missing data
- reordering IP packets that arrive out of order
- converting IP datagrams to a streaming protocol
- routing data within a computer to the correct application.
The Application Layer
• Principles of network applications
• Web and HTTP
• FTP
• Electronic Mail
– SMTP, POP3, IMAP
• DNS
• Socket programming with TCP
• Building a Web server
Creating a network application
Write programs that
– run on different end systems
and communicate over a
network
– Example: Web server software
communicates with browser
software
No software written for devices in
network core
– Network core devices do not
function at application layer
– This design allows for rapid
app development
[Figure: three end systems, each running all five layers (application, transport, network, data link, physical); application software runs only on these end systems]
Client-Server Architecture
server:
– always-on host
– permanent IP address
clients:
– communicate with server
– may be intermittently
connected
– may have dynamic IP
addresses
– do not communicate
directly with each other
Addressing processes
For a process to receive messages, it must have an identifier (i.e. address)
A host has a unique 32-bit IP address
Does the IP address of the host on which the process runs suffice for
identifying the process?
Ans: No, many processes can be running on same host
Problem: Most computers have only one “internet connection”, usually a
serial port. How to manage multiple processes (e.g. mail, internet, ftp, telnet)
sending/receiving packets of data through that line?
Ans: The Operating system must somehow separate out
one channel into multiple channels → Sockets
Sockets
• process sends/receives messages
to/from its socket
• socket analogous to door
– sending process shoves
message out door
– sending process relies on
transport infrastructure on other
side of door which brings
message to socket at receiving
process
[Figure: on each host or server, a process (controlled by app developer) talks through a socket to TCP with buffers, variables (controlled by OS); the two TCP endpoints communicate across the Internet]
Sockets..
→ Addressing Identifier includes:
IP address of host and
port number of the process on host
socket: a host-local, application-created, OS-controlled interface
(a “door”) into which an application process can both send and
receive messages to/from another application process
Default port numbers for common apps:
HTTP server: 80
FTP: 20, 21
SMTP: 25
Telnet: 23
Socket programming
Goal: how to build client/server applications that communicate using sockets
Socket API
- introduced in UNIX, 1981
- explicitly created, used, released by apps
- client/server paradigm
- two types of transport service via socket API:
unreliable datagram
reliable, byte stream-oriented
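In Python's standard `socket` module the two service types correspond to `SOCK_DGRAM` (UDP) and `SOCK_STREAM` (TCP). The sketch below creates one of each and sends a single datagram over the loopback interface; the message text and the use of port 0 (let the OS choose a free port) are choices made for this example.

```python
import socket

# reliable, byte stream-oriented service (TCP)
stream_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# unreliable datagram service (UDP)
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# A socket is addressed by (IP address, port number)
receiver.bind(("127.0.0.1", 0))    # port 0: OS picks any free port
receiver.settimeout(5)             # do not wait forever if the datagram is lost
host, port = receiver.getsockname()

sender.sendto(b"hello", (host, port))   # no connection setup needed
data, sender_addr = receiver.recvfrom(1024)
print(data, "from", sender_addr)

# Sockets are explicitly released by the application
for s in (stream_sock, receiver, sender):
    s.close()
```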
Socket-programming using TCP
Socket: a door between application process and end-end-transport protocol
(UDP or TCP)
TCP service: reliable transfer of bytes from one process to another
[Figure: on each host or server, the process and its socket are controlled by the application developer, and TCP with buffers, variables is controlled by the operating system; the two hosts are connected through the internet]
Socket programming with TCP
Client must contact server
• server process must first be
running
• server must have created socket
(door) that ‘listens’ for client’s
contact
Client contacts server by:
• creating client-local TCP socket
• specifying IP address, port
number of server process
• When client creates socket:
client TCP establishes
connection to server TCP
• When contacted by client, server
TCP creates new socket for
server process to communicate
with client
– allows server to talk with
multiple clients
– source port numbers used to
distinguish clients
application viewpoint
TCP provides reliable, in-order
transfer of bytes (“pipe”)
between client and server
Stream terminology
On a host, data can come to a port at any time
The receiving process only listens at the port intermittently (why?)
What happens to data if this process is not yet listening?
→ Streams
• A stream is a sequence of characters that flow into or out of a
process.
• An input stream is attached to some input source for the
process, eg, keyboard or socket.
• An output stream is attached to an output source, eg, monitor or
socket.
Socket programming with TCP
Example client-server app:
1) client reads line from standard input
(inFromUser stream), sends to
server via socket (outToServer
stream)
2) server reads line from socket
3) server converts line to uppercase,
sends back to client
4) client reads, prints modified line
from socket (inFromServer
stream)
[Figure: Client process. keyboard -> inFromUser (input stream) -> client process -> monitor. The process writes to the network through outToServer (output stream) and reads from the network through inFromServer (input stream); both are attached to the client TCP socket, clientSocket]
Client/server socket interaction: TCP
Server (running on hostid):
1. create socket, port=x, for incoming request:
   welcomeSocket = ServerSocket()
2. wait for incoming connection request:
   connectionSocket = welcomeSocket.accept()
3. read request from connectionSocket
4. write reply to connectionSocket
5. close connectionSocket
Client:
1. create socket, connect to hostid, port=x (TCP connection setup):
   clientSocket = Socket()
2. send request using clientSocket
3. read reply from clientSocket
4. close clientSocket
Note: For VB 6.0 on Windows, similar commands are in the Winsock control
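The client/server interaction and the uppercase example can be tried on one machine with Python's standard `socket` module. This is a sketch, not the course's reference code: the server runs in a thread of the same program, serves exactly one client on the loopback address, and uses port 0 so the OS picks a free port. Variable names follow the slides where possible.

```python
import socket
import threading


def serve_one_client(welcome_socket: socket.socket) -> None:
    """Wait for one connection, read one line, reply with it in uppercase."""
    connection_socket, _client_addr = welcome_socket.accept()
    with connection_socket, connection_socket.makefile("rb") as in_from_client:
        line = in_from_client.readline()          # read request
        connection_socket.sendall(line.upper())   # write reply


# Server: create the welcoming socket that listens for client contact
welcome_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
welcome_socket.bind(("127.0.0.1", 0))
welcome_socket.listen(1)
host, port = welcome_socket.getsockname()
server_thread = threading.Thread(target=serve_one_client, args=(welcome_socket,))
server_thread.start()

# Client: create socket, connect to the server's IP address and port
with socket.create_connection((host, port), timeout=5) as client_socket, \
        client_socket.makefile("rb") as in_from_server:
    client_socket.sendall(b"hello, server\n")     # send request
    reply = in_from_server.readline()             # read reply

server_thread.join()
welcome_socket.close()
print(reply)  # b'HELLO, SERVER\n'
```

Note the two sockets on the server side: the welcoming socket only accepts connections, and `accept()` returns a new socket dedicated to that client.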
References and Further Reading
Books:
Jim Kurose, Keith Ross, Computer Networking: A Top Down Approach Featuring
the Internet, 3rd ed., Addison-Wesley, July 2004.
Web sources:
1. Domain Name Systems: Wikipedia DNS
2. Registering your own Domain Name: ICANN, InterNIC, …
Next: Search engines, Google case study