ECT 250: Survey of e-commerce technology
The Internet and the WWW

What is the Internet?
• A loosely configured global wide-area network.
• Includes more than 31,000 different networks in over 100 different countries.
• Millions of people visit and contribute to the Internet through e-mail and the World Wide Web.
• Began as a Department of Defense project.
• For detailed information about the history of the Internet, see: http://dir.yahoo.com/Computers_and_internet/Internet/History/

Early history of the Internet
• In the 1950s the U.S. Department of Defense became concerned that a nuclear attack could disable its computing (and thus planning and coordinating) capabilities.
• By 1969 the Advanced Research Projects Agency Network (ARPANet) had been constructed.
• The first computers to be connected were at the University of California at Los Angeles, SRI International, the University of California at Santa Barbara, and the University of Utah.

The changing Internet
Early on, researchers began to find new uses for the Internet beyond its original purpose of controlling weapons systems. These new applications included the following:
• Electronic mail
• File transfer protocol
• Telnet
• Usenet

The new uses
• In 1972 a researcher wrote a program that could send and receive messages over the Internet. E-mail was quickly adopted by Internet users.
• File transfer protocol (FTP) allowed researchers using the Internet to transfer files easily across great distances.
• Telnet allows users of the Internet to log into their computer accounts from remote sites.
• All three of these applications are still widely used. We will discuss them again later.

Usenet
• In 1979 a group of students and programmers at Duke and the University of North Carolina started Usenet, short for users’ network.
• Usenet allows anyone who connects to the network to read and post articles on a variety of subjects.
• Usenet survives today in what are called newsgroups.
Newsgroups
There are several thousand newsgroups covering a highly varied range of subjects. Examples:
– alt.cats
– comp.databases
– rec.climbing
– soc.penpals
The first part of each group’s name tells you what type of group it is, and the remaining parts indicate the subject matter.

Accessing newsgroups
Newsgroups can be accessed in two ways:
1. Using special software (trn, rn, etc.)
2. Using a browser on the Web. As an example, DejaNews is a web site that allows access to a variety of newsgroups and also provides an archive of old postings to each group. See http://www.deja.com/usenet/

Early use of the Internet
• From 1969 until the 1980s the Internet was used primarily by government and university researchers.
• The development of the Internet was funded in part by the National Science Foundation (NSF), and commercial network traffic was prohibited.
• As personal computers became more powerful and affordable in the 1980s, companies created their own networks. These users wanted to be able to communicate outside their own networks.

Commercial use of the Internet
• In 1989 the NSF allowed two commercial e-mail services (MCI Mail and CompuServe) to establish limited connections to the Internet.
• These connections allowed e-mail to be exchanged between users of the commercial services and users of the Internet.
• In 1991 the NSF further eased its restrictions on Internet commercial activity and began planning for the privatization of the Internet.

Privatization
• The privatization of the Internet was substantially completed in 1995. At that point the NSF decommissioned its backbone.
• The new structure of the Internet was based on four network access points (NAPs), each operated by a separate company.
• The network access providers sell Internet access rights directly to larger customers and indirectly to smaller customers through other companies called Internet service providers (ISPs).
A growing Internet
• Researchers had long considered the Internet a valuable tool.
• As the 1990s began, a wider variety of people came to see the Internet as a useful resource.
• The Internet grew significantly in roughly 20 years:

  Year   # of computers
  1969   4
  1990   313,000

• The largest growth in the Internet was yet to come.

A prehistory of the Web
• In 1945, Vannevar Bush wrote an article that proposed a machine (called the Memex) to store a person’s books, records, letters, and research results on microfilm. The Memex would have an index to help locate documents.
• In the 1960s, Ted Nelson described a similar system in which text on one page would have links to text on other pages. Nelson called this page-linking system hypertext.
• Douglas Engelbart (inventor of the mouse) created the first experimental hypertext system.

CERN and hypertext
• In 1990, Tim Berners-Lee and Robert Cailliau were working on overhauling the document-handling procedures at CERN, a laboratory for particle physics in Geneva, Switzerland.
• CERN had been connected to the Internet for two years, but its scientists wanted to find better ways to circulate their scientific papers and data.
• Independently, Berners-Lee and Cailliau proposed a hypertext development project.

The birth of the Web
• Over the next two years Berners-Lee developed the code for a hypertext server program and made it available on the Internet.
• He envisioned the set of links between computers as a spider web, hence the name Web.
• The CERN site is considered the birthplace of the World Wide Web. The CERN site: http://cern.web.cern.ch/CERN/

Terminology
• A hypertext server is a computer that stores files written in hypertext markup language (HTML) and lets other computers connect to it and read those files. It is now called a Web server.
• A hyperlink is a special tag that contains a pointer to another location in the same or in a different HTML document.
• HTML is based on Standard Generalized Markup Language (SGML), which organizations have used for many years to manage large document filing systems.

Early Web browsers
• A Web browser is a software interface that lets users read (or browse) HTML documents.
• Early Web browsers were text based.
• Although the Web caught on quickly in the research community, broader acceptance was slow to materialize.
• Part of the problem was that the early browsers were difficult to use.

GUI Web browsers
• In 1993, Marc Andreessen led a team of researchers that developed the first software with a graphical user interface for viewing pages over the Web.
• This first GUI browser was named Mosaic.
• Mosaic widened the appeal of the Web by making access easier and adding multimedia capabilities.
• Andreessen later went on to develop the Netscape Navigator browser.

The growth of the Internet
The Internet has grown, and continues to grow, at a phenomenal rate.

  Date      WWW Servers   Internet Hosts
  12/1969   N/A           4
  12/1979   N/A           188
  12/1989   N/A           159,000
  12/1993   623           2,056,000
  12/1996   603,367       21,819,000
  12/1999   9,560,866     56,218,000
  07/2000   18,169,498    93,047,785

Factors behind growth
There are four main factors that led to the surge in popularity of the Internet:
• The Web-like ability to link from site to site.
• The ease of use provided by the browsers’ graphical user interface.
• The growth of personal computers and local area networks that could be connected to the Internet.
• The TCP/IP standard.

Control of the Internet
• No one organization currently controls the Internet.
• Several groups oversee aspects of the development of the Internet.
– Internet Engineering Task Force (IETF): oversees the evolution of Internet protocols
– Internet Registries (InterNIC): maintain and allocate Internet domains
– World Wide Web Consortium (W3C): develops standards for the WWW
• See the Internet Standardization Organizations.
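To make the Terminology slide’s definition of a hyperlink concrete, here is a minimal sketch that uses Python’s standard html.parser module to pull the targets out of the hyperlink (<a href="...">) tags in a page. The page text and the class name LinkCollector are hypothetical; one link points to another location in the same document (#schedule) and one points to a different document.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href target of every <a> (hyperlink) tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page: one link within the same document, one to a different document
page = """
<html><body>
  <p>Jump to <a href="#schedule">the schedule</a> or read
  <a href="http://www.cs.depaul.edu/">the department home page</a>.</p>
  <h2 id="schedule">Schedule</h2>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['#schedule', 'http://www.cs.depaul.edu/']
```

A browser does essentially the same bookkeeping when it decides which text on a page is clickable and where each click should lead.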
Internet2
A project to develop another Internet, Internet2, is being led by over 170 U.S. universities working in partnership with industry and government. This new network is designed to allow development and deployment of advanced network applications and technologies. For more information see: http://www.internet2.edu/

A model for networking
• The world’s telephone companies were the early models for networked computers because the networks used leased telephone company lines.
• Telephone companies at the time established a single connection between sender and receiver for each telephone call.
• Once a connection was established, data traveled along that path.

Circuit switching
• Telephone company switching equipment (both mechanical and computerized) selected the phone lines, or circuits, to connect in order to create the path between caller and receiver.
• This centrally controlled, single-connection model is known as circuit switching.
• Circuit switching does not work well for sending data across a large network.
• Point-to-point connections for each sender/receiver pair are expensive and hard to manage.

A different approach
• The Internet uses a less expensive and more easily managed technique than circuit switching.
• Files and messages are broken down into packets that are labeled with codes that indicate their origin and destination.
• Packets travel from computer to computer along the network until they reach their destination.
• The destination computer reassembles the data from the packets it receives.
• This is called a packet-switching network.

Packet switching
• In a packet-switched network, (some of) the computers that an individual packet encounters determine the best way to move the packet to its destination.
• Computers performing this determination are called routers.
• The programs that the computers use to determine the path are called routing algorithms.
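The slides do not name a particular routing algorithm. One classic example is Dijkstra’s shortest-path algorithm, which link-state routing protocols such as OSPF build on. The sketch below assumes a hypothetical network of four routers named after the first ARPANet sites, with made-up link costs; it finds the cheapest path between two routers.

```python
import heapq


def shortest_path(graph, source, destination):
    """Dijkstra's algorithm: return (total_cost, path) from source to destination.

    graph maps each router name to a dict of {neighbor: link_cost}.
    """
    # Priority queue of (cost so far, router, path taken to reach it)
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, router, path = heapq.heappop(queue)
        if router in visited:
            continue  # already reached this router more cheaply
        visited.add(router)
        if router == destination:
            return cost, path
        for neighbor, link_cost in graph[router].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return float("inf"), []  # destination unreachable


# Hypothetical network with made-up link costs (lower is better)
network = {
    "UCLA": {"SRI": 2, "UCSB": 1},
    "SRI": {"UCLA": 2, "UCSB": 4, "Utah": 3},
    "UCSB": {"UCLA": 1, "SRI": 4, "Utah": 7},
    "Utah": {"SRI": 3, "UCSB": 7},
}

print(shortest_path(network, "UCSB", "Utah"))  # (6, ['UCSB', 'UCLA', 'SRI', 'Utah'])
```

Note that the best route here takes three hops (total cost 6) rather than the direct link (cost 7). That is the kind of decision a router makes on a packet’s behalf.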
Benefits of packet switching
There are benefits to packet switching:
• Long streams of data can be broken down into small, manageable chunks, allowing the packets to be distributed over a wide range of possible paths to balance traffic.
• Damaged data is relatively inexpensive to replace: if a packet is altered in transit, only that single packet must be retransmitted.

Open architecture
When ARPANet was being developed, the people working on it adhered to the following principles:
1. Independent networks should not require any internal changes in order to be connected.
2. The router computers do not retain information about the packets that they handle.
3. Packets that do not arrive at their destinations must be retransmitted from their source network.
4. No global control exists over the network.

Most popular Internet protocols
The most popular Internet protocols include:
• TCP/IP
• HTTP (Hypertext transfer protocol)
• E-mail protocols (SMTP, POP, IMAP)
• FTP (File transfer protocol)
Each protocol is used for a different purpose, but all of them are important.

TCP/IP
• The protocols that underlie the basic operation of the Internet are TCP (transmission control protocol) and IP (Internet protocol).
• Developed by Internet pioneers Vinton Cerf and Robert Kahn, these protocols establish rules about how data are moved across networks and how network connections are established and broken.
CTI plug: The Distinguished Lecturer Series

Purposes of each protocol
• TCP controls the assembly of a message into smaller packets before it is transmitted over the network. It also controls the reassembly of packets once they reach their destination.
• IP includes rules for routing individual data packets from their source to their destination. It also handles all addressing details for each packet.
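The packet-switching benefits and TCP’s job of splitting and reassembling messages can be sketched in a few lines. This is a simplified illustration, not TCP’s actual packet format: the dictionary fields, the 8-byte packet size, and the CRC-32 checksum (zlib.crc32) are assumptions chosen for clarity. Packets arrive out of order, one is damaged, and only that one is resent.

```python
import random
import zlib


def make_packets(message, source, destination, size=8):
    """Split a message into packets labeled with origin, destination,
    a sequence number, and a checksum of the payload (simplified, not real TCP)."""
    data = message.encode("utf-8")
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [
        {"src": source, "dst": destination, "seq": seq, "total": len(chunks),
         "payload": chunk, "checksum": zlib.crc32(chunk)}
        for seq, chunk in enumerate(chunks)
    ]


def damaged(packet):
    """True if the payload no longer matches the checksum computed at the source."""
    return zlib.crc32(packet["payload"]) != packet["checksum"]


def reassemble(packets):
    """Put packets back in sequence order and rebuild the message.

    Returns (message, seqs_to_resend); message is None until every
    packet has arrived intact.
    """
    good = {p["seq"]: p for p in packets if not damaged(p)}
    total = packets[0]["total"] if packets else 0
    bad = sorted(set(range(total)) - good.keys())
    if bad:
        return None, bad
    return b"".join(good[s]["payload"] for s in range(total)).decode("utf-8"), []


original = "Packets may arrive out of order."
packets = make_packets(original, "140.192.1.100", "140.192.33.6")
random.shuffle(packets)  # packets take different paths and arrive out of order
hit = packets[0]
hit["payload"] = bytes([hit["payload"][0] ^ 0x01]) + hit["payload"][1:]  # flip one bit in transit

message, resend = reassemble(packets)
print(message, resend)  # None and a list holding the one sequence number to resend

# The source retransmits only the damaged packet, not the whole message
fresh = make_packets(original, "140.192.1.100", "140.192.33.6")
repaired = [p for p in packets if not damaged(p)] + [fresh[s] for s in resend]
message, still_missing = reassemble(repaired)
print(message)  # Packets may arrive out of order.
```

Because each packet carries its own sequence number and checksum, the receiver can restore the original order and point to exactly which packet needs to be sent again.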
Network layers
The work done by communications software is broken into multiple layers, each of which handles a different set of tasks. The layers work together as one unit when delivering information over the Internet, and each layer provides services for the layer above it.

TCP/IP architecture
There are five layers in the Internet model:
1. Application
2. Transport
3. Internet
4. Network interface
5. Hardware
The lowest layer is the hardware layer, which handles the individual pieces of equipment attached to the network. The highest layer is the application layer, where various network applications run.

Positioning within the layers
A full discussion of the Internet model is beyond the scope of this class. It is, however, useful to know where each protocol resides. TCP operates in the transport layer and IP in the Internet layer. See Figure 2-2 on page 38. Some of the application layer protocols include HTTP, SMTP, POP, IMAP, and FTP. (Telnet also operates in the application layer.)

HTTP
• HTTP (hypertext transfer protocol) is the protocol responsible for transferring Web pages.
• It has continued to evolve since being introduced.
• Like other Internet protocols, HTTP uses the client/server model of computing. Thus, to understand how HTTP works, we first need to discuss the client/server model.

Client/server model
• In the client/server model there are two roles: the client and the server.
• The client process makes requests of the server. The client is only capable of sending a request to the server and then waiting for the reply.
• The server satisfies the requests of the client. It usually has access to a resource, such as data, that the client wants. When the resource that the client wants becomes available, the server sends a message to the client.
• This model simplifies communication.
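The two roles in the client/server model can be sketched with Python’s socket and threading modules, running both sides on the local machine (127.0.0.1) over a TCP connection. The RESOURCES dictionary and the one-word request format are hypothetical; this is a generic illustration of the model, not HTTP.

```python
import socket
import threading

# A hypothetical resource that the server has access to and the client wants
RESOURCES = {"greeting": "Hello from the server"}

# Create the listening socket first so the client cannot connect too early
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))  # port 0: let the operating system pick a free port
server_sock.listen(1)
host, port = server_sock.getsockname()


def serve_one_request():
    """Server role: wait for one request, look up the resource, send a reply."""
    conn, _addr = server_sock.accept()
    with conn:
        # One recv is enough for these tiny messages; real protocols need framing
        name = conn.recv(1024).decode("utf-8").strip()
        reply = RESOURCES.get(name, "ERROR: no such resource")
        conn.sendall(reply.encode("utf-8"))


def request(name):
    """Client role: send a request, then block waiting for the reply."""
    with socket.create_connection((host, port)) as client:
        client.sendall(name.encode("utf-8"))
        return client.recv(1024).decode("utf-8")


server_thread = threading.Thread(target=serve_one_request)
server_thread.start()
answer = request("greeting")
server_thread.join()
server_sock.close()
print(answer)  # Hello from the server
```

The client does nothing but send and wait, while all knowledge of the resource stays on the server. That separation is what makes the model simple to reason about.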
HTTP and client/server
• With HTTP the client is the user’s Web browser and the server is the Web server.
• To open a session, the browser sends a request to the server that holds the desired Web page.
• The server replies by sending back the page, or an error message if the page could not be found.
• After the client verifies that the response sent was correct, the TCP/IP connection is closed and the HTTP session ends.
• Each new page that is desired will result in a new HTTP session and another TCP/IP connection.

One page, multiple requests
• If a Web page contains objects such as movies, sound, or graphics, a client must make a request for each object.
• For example, a Web page containing a background sound and three graphics will result in five separate server request messages: four to retrieve the objects plus one for the page itself.

Mail protocols
• Electronic mail is managed and stored by programs and hardware collectively known as mail servers.
• It is displayed and created using mail programs such as Eudora, Pine, Unix mail, Outlook, etc.
• Protocols are needed so that mail programs can construct requests to receive e-mail and so that mail servers can handle downloading and/or deleting it.
• The e-mail protocols we will discuss use the client/server model.

SMTP
• SMTP (simple mail transfer protocol) is used by client mail programs to make requests about mail delivery.
• A wide variety of mail programs, such as Eudora, Unix mail, and PINE, use SMTP to send mail to a mail server.
• SMTP specifies the exact format of a mail message and describes how mail is to be administered.

POP
• POP (post office protocol) is responsible for retrieving e-mail from a server.
• It provides support for Multipurpose Internet Mail Extensions (MIME), which allow the user to attach binary files such as spreadsheets to messages.
• When e-mail is read, POP specifies that it must be downloaded to your computer from the server. Read e-mail is not kept on the server.
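The MIME attachments mentioned on the POP slide can be illustrated with Python’s standard email package. This sketch only builds a message with a binary attachment and parses it back, as a receiving mail program would; actually sending it (smtplib) or retrieving it (poplib) would need a real mail server, which is not assumed here. The addresses, subject, file name, and attachment bytes are all hypothetical.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

msg = EmailMessage()
msg["From"] = "student@example.com"  # hypothetical addresses
msg["To"] = "instructor@example.com"
msg["Subject"] = "Assignment 2 spreadsheet"
msg.set_content("The spreadsheet is attached.")

# Hypothetical binary data standing in for a spreadsheet file
spreadsheet_bytes = bytes(range(256))
msg.add_attachment(spreadsheet_bytes, maintype="application",
                   subtype="vnd.ms-excel", filename="grades.xls")

raw = msg.as_bytes()  # the text form that would travel between mail servers
print(msg.get_content_type())  # multipart/mixed

# The receiving mail program parses the MIME structure and recovers the attachment
received = message_from_bytes(raw, policy=default)
for part in received.iter_attachments():
    print(part.get_filename(), part.get_content_type(), len(part.get_content()))
# grades.xls application/vnd.ms-excel 256
```

Adding the attachment turns the message into a multipart/mixed MIME message: one text part for the body and one base64-encoded part for the binary file.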
IMAP
IMAP (Internet Message Access Protocol) is a newer protocol that, like POP, can ask if there is new mail, download e-mail, and delete e-mail. IMAP has capabilities that POP does not:
• It defines how a client program asks a mail server to present available mail.
• It can allow you to manipulate and manage e-mail without downloading it from the server.

Internet addresses
Internet addresses are represented in several ways, but all the formats are translated to a 32-bit number called an IP address. The increased demand for IP addresses will soon make 32-bit addresses too small, and they will be replaced with 128-bit addresses in the near future. See the links page for more information.
How does increasing the number of bits in the address help with increasing demand?

Dotted quads
• IP numbers appear as a series of four separate numbers delimited by periods.
• Examples:
  students.depaul.edu: 140.192.1.100
  condor.depaul.edu: 140.192.1.6
  facweb.cs.depaul.edu: 140.192.33.6
• Each of the four numbers can range from 0 to 255, so the possible IP addresses range from 0.0.0.0 to 255.255.255.255.

Domain names
• Since IP numbers can be difficult for humans to remember, domain names are associated with each IP address.
• Examples:
  students.depaul.edu: 140.192.1.100
  facweb.cs.depaul.edu: 140.192.33.6
• A domain name server is responsible for the mapping between domain names and IP addresses.

Uniform resource locator
• People on the Web use a naming convention called the uniform resource locator (URL).
• A URL consists of at least two and as many as four parts.
• A simple two-part URL contains the protocol used to access the resource followed by the location of the resource. Example: http://www.cs.depaul.edu/
• A more complex URL may have a file name and a path where the file can be found.
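The Internet addresses and Dotted quads slides can be tied together with a little arithmetic. Each of the four numbers in a dotted quad fills 8 bits of the 32-bit address. With n bits there are 2^n possible addresses, so moving from 32 to 128 bits multiplies the address space by 2^96, which answers the question posed above. The function names below are hypothetical; Python’s standard ipaddress module is used only as a cross-check.

```python
import ipaddress


def dotted_quad_to_int(address):
    """Convert a dotted quad such as '140.192.1.100' into its 32-bit number."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid dotted quad: {address!r}")
    value = 0
    for part in parts:
        value = value * 256 + part  # shift left 8 bits, then add the next number
    return value


def int_to_dotted_quad(value):
    """Inverse: split a 32-bit number back into four 8-bit numbers."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


n = dotted_quad_to_int("140.192.1.100")  # students.depaul.edu
print(n)                                 # 2361393508
print(int_to_dotted_quad(n))             # 140.192.1.100
print(n == int(ipaddress.IPv4Address("140.192.1.100")))  # True

# Why more bits help: each extra bit doubles the number of possible addresses
print(f"{2**32:,}")   # 4,294,967,296 possible 32-bit addresses
print(f"{2**128:,}")  # 340,282,366,920,938,463,463,374,607,431,768,211,456 possible 128-bit addresses
```

The same positional idea explains the slide’s range: 0.0.0.0 is 0 and 255.255.255.255 is 2^32 - 1, the largest 32-bit number.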
A URL deconstructed
http://facweb.cs.depaul.edu/asettle/ect250/section602/hw/assign2.htm
• http: hypertext transfer protocol
• facweb.cs.depaul.edu: domain
• /asettle/ect250/section602/hw/: path that indicates the location of the document in the host’s file system
• assign2.htm: document name

Anatomy of an e-mail address
asettle@cs.depaul.edu
• asettle: handle
• cs: host/server (others: students, hawk, condor)
• depaul: domain
• edu: domain type

Domain types
• edu: educational
• com: commercial
• net: originally for telecommunications
• org: organizations (non-profit)
• gov: U.S. government
• jp, uk, de, …: nations other than the U.S.
• New additions: info, biz, name, pro, museum, coop, aero, tv. See links page for a related news story.

Internet utility programs
TCP/IP supports a variety of utility programs that allow people to use the Internet more efficiently. These utility programs include:
• Finger
• Ping
• Talk

Finger
Finger is a program that allows a user to obtain limited information about other network users. The information that can be obtained includes:
• Which users are currently logged on
• Where each user logged onto the network from
• How long the user has been on the network
• When the user last logged onto the system
Finger is sometimes disabled for security reasons.

Ping
• Ping (Packet InterNet Groper) tests the connectivity between two Internet hosts and determines if a host is active on the network.
• It works by sending a packet to the specified address and waiting for a reply.
• Ping is typically used to troubleshoot connections.
• To run ping, you simply type ping followed by the IP address or domain name of the machine you are interested in. Example: ping students.depaul.edu

Talk
• Talk is a program that runs on Unix computers and allows a user to communicate interactively with another user on the network.
• To use it, you type talk followed by the string username@host, where username is the person you wish to talk to and host is the machine where they are logged on.
Example: talk [email protected]
• It is similar to instant messaging programs such as AOL Instant Messenger, Yahoo! Pager, etc.
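The “A URL deconstructed” and “Anatomy of an e-mail address” slides earlier in this section can be reproduced with a short sketch. It uses urllib.parse.urlsplit and posixpath.split from Python’s standard library. The email_anatomy helper is hypothetical and assumes the host/domain/domain-type layout shown on the slide; real addresses may have fewer or more dot-separated pieces.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://facweb.cs.depaul.edu/asettle/ect250/section602/hw/assign2.htm"
parts = urlsplit(url)
directory, document = posixpath.split(parts.path)

print(parts.scheme)  # http
print(parts.netloc)  # facweb.cs.depaul.edu
print(directory)     # /asettle/ect250/section602/hw
print(document)      # assign2.htm


def email_anatomy(address):
    """Label the pieces of an address like asettle@cs.depaul.edu.

    Hypothetical helper: assumes the layout from the slide, with at least
    a domain and a domain type after the @.
    """
    handle, _, host_part = address.partition("@")
    labels = host_part.split(".")
    return {
        "handle": handle,
        "host/server": ".".join(labels[:-2]),  # everything before the domain
        "domain": labels[-2],
        "domain type": labels[-1],
    }


print(email_anatomy("asettle@cs.depaul.edu"))
# {'handle': 'asettle', 'host/server': 'cs', 'domain': 'depaul', 'domain type': 'edu'}
```

Reading a domain name from right to left moves from the most general part (the domain type) to the most specific (the individual host).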