
ECT 250: Survey of e-commerce technology
The Internet and the WWW
What is the Internet?
• A loosely configured global wide-area network.
• Includes more than 31,000 different networks in
over 100 different countries.
• Millions of people visit and contribute to the
Internet, through e-mail and the World Wide
Web.
• Began as a Department of Defense project.
• For detailed information about the history of the
Internet, see:
http://dir.yahoo.com/Computers_and_internet/Internet/History/
2
Early history of the Internet
• In the 1950s the U.S. Department of Defense
became concerned that a nuclear attack could
disable its computing (and thus planning and
coordinating) capabilities.
• By 1969 the Advanced Research Projects Agency
Network (ARPANet) had been constructed.
• The first computers to be connected were ones at
the University of California at Los Angeles, SRI
International, the University of California at
Santa Barbara, and the University of Utah.
3
The changing Internet
Early on researchers began to find new uses for the
Internet, beyond its original purpose of controlling
weapons systems.
These new applications included the following:
• Electronic mail
• File transfer protocol
• Telnet
• User’s News Network (Usenet)
4
The new uses
• In 1972 a researcher wrote a program that could
send and receive messages over the Internet.
E-mail was quickly adopted by Internet users.
• File transfer protocol (FTP) allowed researchers
using the Internet to transfer files easily across
great distances.
• Telnet allows users of the Internet to log into
their computer accounts from remote sites.
• All three of these applications are still widely
used. We will discuss them again later.
5
Usenet
• In 1979 a group of students and programmers at
Duke and the University of North Carolina
started Usenet, short for User’s News Network.
• Usenet allows anyone who connects to the
network to read and post articles on a variety
of subjects.
• Usenet survives today in what are called newsgroups.
6
Newsgroups
There are several thousand newsgroups covering a
highly varied group of subjects.
Examples:
– alt.cats
– comp.databases
– rec.climbing
– soc.penpals
The first part of the name of each group tells you
what type of group it is and the remaining parts
indicate the subject matter.
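As a minimal sketch of this naming scheme, a group name can be split at its first period. The function name split_newsgroup is illustrative only and is not part of any newsreader.

```python
def split_newsgroup(name):
    """Split a newsgroup name into its type (first part) and subject parts."""
    group_type, _, subject = name.partition(".")
    return group_type, (subject.split(".") if subject else [])


for group in ["alt.cats", "comp.databases", "rec.climbing", "soc.penpals"]:
    group_type, subject = split_newsgroup(group)
    print(group_type, subject)
```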
7
Accessing newsgroups
Newsgroups can be accessed in two ways:
1. Using special software (trn, rn, etc.)
2. Using a browser on the Web.
As an example, DejaNews is a web site that
provides access to a variety of newsgroups as
well as an archive of old postings to those
groups.
See http://www.deja.com/usenet/
8
Early use of the Internet
• From 1969 until the 1980s the Internet was used
primarily by government and university
researchers.
• The development of the Internet was funded in
part by the National Science Foundation (NSF)
and commercial network traffic was prohibited.
• As personal computers became more powerful
and affordable in the 1980s, companies created
their own networks. Users of those networks
wanted to communicate with people outside them.
9
Commercial use of the Internet
• In 1989 the NSF allowed two commercial e-mail
services (MCI Mail and CompuServe) to
establish limited connections to the Internet.
• These connections allowed an exchange of e-mail
between users of the commercial services and
users of the Internet.
• In 1991 the NSF further eased its restrictions on
Internet commercial activity and began planning
for the privatization of the Internet.
10
Privatization
• The privatization of the Internet was substantially
completed in 1995. At that point the NSF
decommissioned its backbone.
• The new structure of the Internet was based on
four network access points (NAPs), each
operated by a separate company.
• The network access providers sell Internet access
rights directly to larger customers and indirectly
to smaller customers through other companies
called Internet service providers (ISPs).
11
A growing Internet
• Researchers had long considered the Internet a
valuable tool.
• As the 1990s began, a larger variety of people
thought of the Internet as a useful resource.
• The Internet grew significantly in 20 years.
Year    # of computers
1969    4
1990    313,000
• The largest growth in the Internet was yet to
come.
12
A prehistory of the Web
• In 1945, Vannevar Bush wrote an article that
proposed a machine (called the Memex) to store
a person’s books, records, letters, and research
results on microfilm. The Memex would have
an index to help locate documents.
• In the 1960s, Ted Nelson described a similar
system in which text on one page would have
links to text on other pages. Nelson called
this page linking system hypertext.
• Douglas Engelbart (inventor of the mouse) created
the first experimental hypertext system.
13
CERN and hypertext
• In 1990, Tim Berners-Lee and Robert Cailliau were
working on overhauling the document handling
procedures at CERN, a laboratory for particle
physics in Geneva, Switzerland.
• CERN had been connected to the Internet for two
years, but its scientists wanted to find better ways
to circulate their scientific papers and data.
• Independently, Berners-Lee and Cailliau proposed
a hypertext development project.
14
The birth of the Web
• Over the next two years Berners-Lee developed
the code for a hypertext server program and
made it available on the Internet.
• He envisioned the set of links between computers
as a spider web, hence the name Web.
• The CERN site is considered the birthplace of the
World Wide Web.
The CERN site: http://cern.web.cern.ch/CERN/
15
Terminology
• A hypertext server is a computer that stores files
written in hypertext markup language (HTML)
and lets other computers connect to it and read
those files. It is now called a Web server.
• A hyperlink is a special tag that contains a pointer
to another location in the same or in a different
HTML document.
• HTML is based on Standard Generalized Markup
Language (SGML), which organizations have
used for many years to manage large document
filing systems.
16
Early Web browsers
• A Web browser is a software interface that lets
users read (or browse) HTML documents.
• Early web browsers were text based.
• Although the Web caught on quickly in the
research community, broader acceptance was
slow to materialize.
• Part of the problem was that the early browsers
were difficult to use.
17
GUI Web browsers
• In 1993, Marc Andreessen led a team of researchers
and developed the first software with a graphical
user interface for viewing pages over the Web.
• This first GUI browser was named Mosaic.
• Mosaic widened the appeal of the Web by making
access easier and adding multimedia capabilities.
• Andreessen later went on to develop the Netscape
Navigator browser.
18
The growth of the Internet
The Internet has grown, and continues to grow, at
a phenomenal rate.
Date       WWW Servers    Internet Hosts
12/1969    N/A            4
12/1979    N/A            188
12/1989    N/A            159,000
12/1993    623            2,056,000
12/1996    603,367        21,819,000
12/1999    9,560,866      56,218,000
07/2000    18,169,498     93,047,785
19
Factors behind growth
There are four main factors that led to the surge
in popularity of the Internet:
• The web-like ability to link from site to site.
• The ease of use provided by the browsers’
graphical user interface.
• The growth of personal computers and local
area networks that could be connected to
the Internet.
• The TCP/IP standard.
20
Control of the Internet
• No one organization currently controls the Internet.
• Several groups oversee aspects of the development
of the Internet.
– Internet Engineering Task Force (IETF)
Oversees the evolution of Internet protocols
– Internet Registries (InterNIC)
Maintain and allocate Internet domains
– World Wide Web Consortium (W3C)
Develops standards for the WWW
• See the Internet Standardization Organizations.
21
Internet 2
A project to develop another Internet, Internet2, is
being led by over 170 U.S. universities working
in partnership with industry and government.
This new network is designed to allow development
and deployment of advanced network applications
and technologies.
For more information see: http://www.internet2.edu/
22
A model for networking
• The world’s telephone companies were the early
models for networked computers because the
networks used leased telephone company lines.
• Telephone companies at the time established a
single connection between sender and receiver
for each telephone call.
• Once a connection was established, data traveled
along that path.
23
Circuit switching
• Telephone company switching equipment (both
mechanical and computerized) selected the
phone lines, or circuits, to connect in order to
create the path between caller and receiver.
• This centrally controlled, single connection
model is known as circuit switching.
• Using circuit switching does not work well for
sending data across a large network.
• Point-to-point connections for each sender/
receiver pair are expensive and hard to manage.
24
A different approach
• The Internet uses a less expensive and more easily
managed technique than circuit switching.
• Files and messages are broken down into packets
that are labeled with codes that indicate their
origin and destination.
• Packets travel from computer to computer along
the network until they reach their destination.
• The destination computer reassembles the data
from the packets it receives.
• This is called a packet switching network.
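The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not how any real network stack is written: the packet fields, the sequence number "seq" used to restore order, and the 8-character packet size are all invented for the example.

```python
import random


def to_packets(message, origin, destination, size=8):
    """Break a message into packets labeled with origin, destination, and order."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        {"origin": origin, "destination": destination, "seq": seq, "data": chunk}
        for seq, chunk in enumerate(chunks)
    ]


def reassemble(packets):
    """Rebuild the message at the destination, whatever order packets arrived in."""
    return "".join(p["data"] for p in sorted(packets, key=lambda p: p["seq"]))


message = "Files and messages are broken into packets."
packets = to_packets(message, "140.192.1.6", "140.192.1.100")
random.shuffle(packets)  # packets may take different paths and arrive out of order
print(reassemble(packets))
```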
25
Packet switching
• In a packet-switched network, (some of) the
computers that an individual packet encounters
determine the best way to move the packet to its
destination.
• Computers performing this determination are
called routers.
• The programs that the computers use to determine
the path are called routing algorithms.
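As a rough illustration of what a routing algorithm computes, the sketch below finds a path with the fewest hops using breadth-first search. Real routers use distributed algorithms and richer cost measures. The network map and the name fewest_hops are hypothetical.

```python
from collections import deque


def fewest_hops(links, source, destination):
    """Return a path with the fewest hops from source to destination, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for neighbor in links.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical network: each router lists the routers it is directly linked to.
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(fewest_hops(links, "A", "E"))
```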
26
Benefits of packet switching
There are benefits to packet switching:
• Long streams of data can be broken down into
small manageable data chunks, allowing the
small packets to be distributed over a wide
number of possible paths to balance traffic.
• It is relatively inexpensive to replace damaged
data packets after they arrive, since if a data
packet is altered in transit only a single
packet must be retransmitted.
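The second benefit can be sketched as follows. The example assumes each packet carries a CRC-32 checksum so the receiver can spot an altered packet. That choice of checksum, like the packet layout, is an assumption made for illustration.

```python
import zlib


def make_packet(seq, data):
    """Attach a CRC-32 checksum so the receiver can detect alteration."""
    return {"seq": seq, "data": data, "checksum": zlib.crc32(data)}


def damaged(packets):
    """Return the sequence numbers of packets whose data fails the checksum."""
    return [p["seq"] for p in packets if zlib.crc32(p["data"]) != p["checksum"]]


chunks = [b"Long str", b"eams of ", b"data"]
sent = [make_packet(seq, chunk) for seq, chunk in enumerate(chunks)]

received = [dict(p) for p in sent]
received[1]["data"] = b"eamz of "  # simulate one packet altered in transit
resent = damaged(received)

for seq in resent:
    received[seq] = dict(sent[seq])  # only the damaged packet is sent again

print(resent, b"".join(p["data"] for p in received))
```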
27
Open architecture
While developing ARPANet, its designers adhered
to the following principles:
1. Independent networks should not require any
internal changes in order to be connected.
2. The router computers do not retain information
about the packets that they handle.
3. Packets that do not arrive at their destinations
must be retransmitted from their source network.
4. No global control exists over the network.
28
Most popular Internet protocols
The most popular Internet protocols include:
• TCP/IP
• HTTP (Hypertext transfer protocol)
• E-mail protocols (SMTP, POP, IMAP)
• FTP (File transfer protocol)
Each protocol is used for a different purpose,
but all of them are important.
29
TCP/IP
• The protocols that underlie the basic operation of
the Internet are TCP (transmission control
protocol) and IP (Internet protocol).
• Developed by Internet pioneers Vinton Cerf and
Robert Kahn, these protocols establish rules
about how data are moved across networks and
how network connections are established and
broken.
CTI plug: The Distinguished Lecturer Series
30
Purposes of each protocol
• TCP controls the assembly of a message into
smaller packets before it is transmitted over
the network. It also controls the reassembly
of packets once they reach their destination.
• IP includes rules for routing
individual data packets from their source to
their destination. It also handles all addressing
details for each packet.
31
Network layers
The work done by communications software is
broken into multiple layers, each of which handles
a different set of tasks.
The layers work together as one unit when
delivering information over the Internet.
Each layer provides services for the layer above it.
32
TCP/IP architecture
There are five layers in the Internet model:
1. Application
2. Transport
3. Internet
4. Network interface
5. Hardware
The lowest layer is the hardware layer that handles
the individual pieces of equipment attached to the
network. The highest layer is the application layer
where various network applications run.
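One way to picture the layers is as wrappers: on the way down each layer adds its own header, and on the way up the matching layer removes it. The sketch below is a simplification under that assumption. The bracketed text headers are invented, and real headers are binary structures.

```python
LAYERS = ["Application", "Transport", "Internet", "Network interface", "Hardware"]


def send(data):
    """Pass data down the stack; each layer below the application adds a header."""
    for layer in LAYERS[1:]:
        data = f"[{layer}]{data}"
    return data


def receive(frame):
    """Pass a frame up the stack; each layer strips the header its peer added."""
    for layer in reversed(LAYERS[1:]):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"unexpected header at the {layer} layer")
        frame = frame[len(header):]
    return frame


frame = send("GET /index.html")
print(frame)
print(receive(frame))
```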
33
Positioning within the layers
A full discussion of the Internet model is beyond
the scope of this class.
It is, however, useful to know where each protocol
resides. TCP operates in the transport layer and IP
in the Internet layer. See Figure 2-2 on page 38.
Some of the application layer protocols include
HTTP, SMTP, POP, IMAP, and FTP. (Telnet
also operates in the application layer.)
34
HTTP
• HTTP (hypertext transfer protocol) is the protocol
responsible for transferring and displaying Web
pages.
• It has continued to evolve since being introduced.
• Like other Internet protocols, HTTP uses the client/
server model of computing. Thus, to understand
how HTTP works, we need to first discuss the
client/server model.
35
Client/server model
• In the client/server model there are two roles: the
client and the server.
• The client process makes requests of the server.
The client is only capable of sending a request
to the server and then waiting for the reply.
• The server satisfies the requests of the client. It
usually has access to a resource, such as data,
that the client wants. When that resource
becomes available, the server sends a
message to the client.
• This model simplifies communication.
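The two roles can be sketched with a pair of connected sockets in one program. The resource table and the function names serve and request are made up for the example. A real client and server would normally run on different machines.

```python
import socket
import threading

RESOURCE = {"greeting": "hello from the server"}  # data only the server holds


def serve(conn):
    """Server role: wait for one request, then reply with the resource."""
    with conn:
        key = conn.recv(1024).decode()
        conn.sendall(RESOURCE.get(key, "not found").encode())


def request(conn, key):
    """Client role: send a request, then wait for the reply."""
    with conn:
        conn.sendall(key.encode())
        return conn.recv(1024).decode()


client_end, server_end = socket.socketpair()
server = threading.Thread(target=serve, args=(server_end,))
server.start()
reply = request(client_end, "greeting")
server.join()
print(reply)
```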
36
HTTP and client/server
• With HTTP the client is the user’s Web browser
and the server is the Web server.
• To open a session, the browser sends a request
to the server that holds the desired web page.
• The server replies by sending back the page or an
error message if the page could not be found.
• After the client verifies that the response sent was
correct, the TCP/IP connection is closed and
the HTTP session ends.
• Each new page that is desired will result in a new
HTTP session and another TCP/IP connection.
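The session described above can be tried on a single machine. In this sketch a small Web server runs on the local host and a client opens a new connection for each page. The page table, the port choice, and the fetch helper are assumptions for the example, not part of HTTP itself.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGES = {"/index.html": b"<html><body>Hello</body></html>"}  # hypothetical site


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404)  # the page could not be found
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch(port, path):
    """Open a connection, request one page, read the reply, and close."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    response = conn.getresponse()
    status, body = response.status, response.read()
    conn.close()
    return status, body


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

found = fetch(port, "/index.html")
missing = fetch(port, "/missing.html")
server.shutdown()
server.server_close()
print(found[0], missing[0])
```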
37
One page, multiple requests
• If a Web page contains objects such as movies,
sound, or graphics, a client must make a
request for each object.
• For example, a Web page containing a background sound and three graphics will result in
five separate server request messages to retrieve
the four objects plus the page itself.
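The count in this example can be checked with a short sketch that scans a page for embedded objects. The sample page and the list of tags treated as objects are assumptions. Real pages embed objects in more ways than these.

```python
from html.parser import HTMLParser


class ObjectCounter(HTMLParser):
    """Collect the URLs of embedded objects that each need their own request."""

    def __init__(self):
        super().__init__()
        self.objects = []

    def handle_starttag(self, tag, attrs):
        # Assumption: only these tags embed objects.
        if tag in ("img", "bgsound", "embed"):
            src = dict(attrs).get("src")
            if src:
                self.objects.append(src)


page = """
<html><body>
<bgsound src="theme.wav">
<img src="a.gif"><img src="b.gif"><img src="c.gif">
</body></html>
"""
counter = ObjectCounter()
counter.feed(page)
requests_needed = 1 + len(counter.objects)  # the page itself plus each object
print(requests_needed)
```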
38
Mail protocols
• Electronic mail is managed and stored by programs
and hardware collectively known as mail servers.
• It is displayed and created using mail programs
such as Eudora, Pine, Unix mail, Outlook, etc.
• Protocols are necessary so that mail programs
can construct requests to receive e-mail and
mail servers can download and/or delete
e-mail.
• The e-mail protocols we will discuss use the client/
server model.
39
SMTP
• SMTP (simple mail transfer protocol) is used by
client mail programs to make requests about
mail delivery.
• A wide variety of mail programs such as Eudora,
Unix mail, and PINE, use SMTP to send mail
to a mail server.
• SMTP specifies the exact format of a mail
message and describes how mail is to be
administered.
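To see what a formatted mail message looks like, the sketch below builds one with Python's standard email package and prints it without sending anything. The addresses and subject are hypothetical.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"  # hypothetical addresses
msg["To"] = "recipient@example.com"
msg["Subject"] = "Homework question"
msg.set_content("When is assignment 2 due?")

# Headers come first, then a blank line, then the body.
print(msg.as_string())
```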
40
POP
• POP (post office protocol) is responsible for
retrieving e-mail from a server.
• It provides support for Multipurpose Internet Mail
Extensions (MIME), which allow the user to
attach binary files, such as spreadsheets, to messages.
• When e-mail is read, POP specifies that it must
be downloaded to your computer from the
server. Read e-mail is not kept on the server.
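A MIME attachment can be sketched with the same standard email package. The file name and the four stand-in bytes are invented. A real spreadsheet would be read from disk.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Budget spreadsheet"
msg.set_content("The spreadsheet is attached.")

spreadsheet = bytes([0, 1, 2, 255])  # stand-in for real binary file contents
msg.add_attachment(
    spreadsheet,
    maintype="application",
    subtype="octet-stream",
    filename="budget.xls",  # hypothetical file name
)

# The message now has several parts: a container, the text, and the attachment.
for part in msg.walk():
    print(part.get_content_type())
```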
41
IMAP
IMAP (Internet Message Access Protocol) is a
newer protocol that, like POP, can ask if there
is new mail, download e-mail, and delete e-mail.
IMAP has capabilities that POP does not:
• It defines how a client program asks a mail
server to present available mail.
• It can allow you to manipulate and manage
e-mail without downloading it from the server.
42
Internet addresses
Internet addresses are represented in several ways,
but all the formats are translated to a 32-bit number
called an IP address.
The increased demand for IP addresses will soon
make 32-bit addresses too small, and they will be
replaced with 128-bit addresses in the near future.
See the links page for more information.
How does increasing the number of bits in the
address help with increasing demand?
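One way to approach the question is to count the addresses each size allows. The arithmetic below shows that every added bit doubles the number of possible addresses.

```python
addresses_32 = 2 ** 32
addresses_128 = 2 ** 128

print(f"32-bit addresses:  {addresses_32:,}")
print(f"128-bit addresses: {addresses_128:,}")
# Going from 32 to 128 bits adds 96 bits, so the count grows by a factor of 2**96.
print(f"Growth factor:     {addresses_128 // addresses_32:,}")
```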
43
Dotted quads
• IP addresses are written as a series of four
numbers separated by periods.
• Examples:
students.depaul.edu: 140.192.1.100
condor.depaul.edu: 140.192.1.6
facweb.cs.depaul.edu: 140.192.33.6
• Each of the four numbers can range from 0 to
255, so the possible IP addresses range from
0.0.0.0 to 255.255.255.255
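Since each of the four numbers fits in 8 bits, a dotted quad is just another way of writing one 32-bit number. The helper dotted_quad_to_int below is an illustrative sketch. Python's standard ipaddress module performs the same conversion.

```python
import ipaddress


def dotted_quad_to_int(address):
    """Combine four 8-bit numbers into one 32-bit number."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted quad: {address!r}")
    value = 0
    for part in parts:
        value = value * 256 + part  # shift left by 8 bits, then add the next number
    return value


print(dotted_quad_to_int("140.192.1.6"))
# The standard library gives the same result:
print(int(ipaddress.IPv4Address("140.192.1.6")))
```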
44
Domain names
• Since IP numbers can be difficult for humans to
remember, domain names are associated with
each IP address.
• Examples:
students.depaul.edu: 140.192.1.100
facweb.cs.depaul.edu: 140.192.33.6
• A domain name server is responsible for the
mapping between domain names and IP
addresses.
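The mapping can be pictured as a lookup table. The toy table below uses only the examples from these notes. A real domain name server answers queries over the network, and a program would normally ask the operating system instead, for example with socket.gethostbyname.

```python
# Toy lookup table built from the examples in these notes.
NAME_TABLE = {
    "students.depaul.edu": "140.192.1.100",
    "condor.depaul.edu": "140.192.1.6",
    "facweb.cs.depaul.edu": "140.192.33.6",
}


def resolve(domain_name):
    """Map a domain name to its IP address, ignoring letter case."""
    return NAME_TABLE.get(domain_name.lower())


print(resolve("Students.DePaul.edu"))
```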
45
Uniform resource locator
• People on the Web use a naming convention
called the uniform resource locator (URL).
• A URL consists of at least two and as many as
four parts.
• A simple two part URL contains the protocol
used to access the resource followed by the
location of the resource.
Example: http://www.cs.depaul.edu/
• A more complex URL may have a file name
and a path where the file can be found.
46
A URL deconstructed
http://facweb.cs.depaul.edu/asettle/ect250/section602/hw/assign2.htm
• http: hypertext transfer protocol
• facweb.cs.depaul.edu: domain
• /asettle/ect250/section602/hw/: path that indicates the location of the
document in the host’s file system
• assign2.htm: document name
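The same breakdown can be reproduced with Python's standard urllib.parse module. This is a sketch of one way to split the URL, with the path divided into a directory and a document name.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://facweb.cs.depaul.edu/asettle/ect250/section602/hw/assign2.htm"
parts = urlsplit(url)
directory, document = posixpath.split(parts.path)

print(parts.scheme)  # protocol
print(parts.netloc)  # domain
print(directory)     # path to the document in the host's file system
print(document)      # document name
```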
47
Anatomy of an e-mail address
asettle@cs.depaul.edu
• asettle: handle
• cs: host/server (others: students, hawk, condor)
• depaul: domain
• edu: domain type
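The parts of the address can be pulled apart in code. The sketch assumes the server name has exactly three parts, as in the example above. Many real addresses have more or fewer, and the function name split_address is illustrative.

```python
def split_address(address):
    """Split an e-mail address into handle, host, domain, and domain type."""
    handle, _, server = address.partition("@")
    labels = server.split(".")
    if len(labels) != 3:
        # Assumption: host.domain.type, as in the example address.
        raise ValueError(f"expected host.domain.type after '@': {address!r}")
    host, domain, domain_type = labels
    return {"handle": handle, "host": host, "domain": domain, "type": domain_type}


print(split_address("asettle@cs.depaul.edu"))
```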
48
Domain types
• edu: educational
• com: commercial
• net: originally for telecommunications
• org: organizations (non-profit)
• gov: U.S. government
• jp, uk, de, … : Nations other than the U.S.
• New additions: info, biz, name, pro, museum,
coop, aero, tv. See links page for a related
news story.
49
Internet utility programs
TCP/IP supports a variety of utility programs that
allow people to use the Internet more efficiently.
These utility programs include:
• Finger
• Ping
• Talk
50
Finger
Finger is a program that allows a user to obtain
limited information about other network users.
The information that can be obtained includes:
• Which users are currently logged on
• Where each user logged onto the network from
• How long the user has been on the network
• When the user last logged onto the system
Finger is sometimes disabled for security reasons.
51
Ping
• Ping (Packet InterNet Groper) tests the connectivity
between two Internet hosts and determines if a
host is active on the network.
• It works by sending a packet to the specified address
and waiting for a reply.
• Ping is typically used to troubleshoot connections.
• To run ping, you simply type ping followed by the
IP address or domain name of the machine you are
interested in.
Example: ping students.depaul.edu
52
Talk
• Talk is a program that runs on Unix computers
and allows a user to interactively communicate
with another user on the network.
• To use it you type talk followed by the string
username@host where username is the person
you wish to talk to and the host is the machine
where they are logged on.
Example: talk [email protected]
• It is similar to instant messaging programs such
as AOL Instant Messenger, Yahoo! Pager, etc.
53