Transcript Document

Module 3 - Internet

Search Engines

- Search engine anatomy
- Different search engines
- Effective searching techniques

Search Engines

Need for it?

- A multitude of web pages exist on the web.
- How do you locate the ones most relevant to your needs?

Anatomy of a Search Engine

- Spider (a.k.a. robots, webbots)
  - A program that traverses the web and stores the contents of all searchable web pages.
  - Web sites can deny access to some resources using a robots.txt file, e.g. try http://www.usask.ca/robots.txt

User-agent: *
Disallow: /testing
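As a rough sketch of how a well-behaved spider might honour those two rules, Python's standard `urllib.robotparser` module can parse them and answer "may I fetch this page?". The page URLs below are hypothetical and are used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two rules shown on the slide, as they would appear in a robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /testing
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Hypothetical page URLs on the site, used only for illustration.
allowed = parser.can_fetch("*", "http://www.usask.ca/index.html")
blocked = parser.can_fetch("*", "http://www.usask.ca/testing/page.html")
print(allowed, blocked)
```

Note that robots.txt is a request, not an access control: a spider has to choose to check it.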

Anatomy…

- Spider…
- Indexing Software
  - Indexes the web pages into an easily searchable database collection
- Interface for queries
  - Allows users to enter keywords and other combinations.
  - Searches are performed within the indexed database
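One common way to make stored pages "easily searchable" is an inverted index, which maps each keyword to the pages that contain it. The sketch below assumes that structure and a handful of invented pages; the slides do not say which data structure a real engine uses.

```python
from collections import defaultdict

# Hypothetical pages a spider might have stored (URL -> page text).
PAGES = {
    "http://example.org/a": "bass fishing tips for the lake",
    "http://example.org/b": "bass guitar music lessons",
    "http://example.org/c": "vacation in paris",
}

def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, keyword):
    """Answer a one-keyword query from the index alone, not from the pages."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(PAGES)
print(lookup(index, "bass"))
```

The query interface never touches the web itself; it only reads the index built earlier.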

Different Search Engines

www.yahoo.com

- Directory listing organised into various categories, like the Yellow Pages in a phone book.
- All pages are hand linked.
- “Yet Another Hierarchical Officious Oracle”
- Gulliver’s Travels: ‘yahoo’

www.altavista.com

- “a view from above”
- First truly huge collection of indexed web pages

www.google.com

- “googol”: 1 followed by 100 zeros
- Top search engine today - over 100 million queries a day.

Why Google?

- Relevant results are ranked at the top (first) page of a query.
- Why is relevance important?
  - The typical user rarely goes beyond the first page.
- How is relevance measured?
  - By the number of links that point to the same page.
  - Not just by the number of times a keyword is repeated.
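The link-counting idea can be illustrated with a very small sketch. It simply counts how many distinct pages link to each page; this is an assumption-laden simplification and not Google's actual ranking formula. The link graph is invented.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.html": ["c.html", "b.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html", "c.html"],  # the same link repeated on one page
}

def inbound_counts(links):
    """Count, for each page, how many distinct pages link to it."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):      # a page's "vote" counts once
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts

def rank(pages, links):
    """Most linked-to pages first; ties broken alphabetically."""
    counts = inbound_counts(links)
    return sorted(pages, key=lambda page: (-counts[page], page))

ranking = rank(list(LINKS), LINKS)
print(ranking)
```

Repeating a link on one page does not help here, just as repeating a keyword should not, but many pages agreeing to link to the same target does, which is what makes a Googlebomb possible.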

Careful here: If enough people say a lie to be true, it becomes the truth. - Goebbelsian Lies

Googlebomb: “talentless hack”

Googlewhack: ‘the search for the one’!!

- E.g. ceremonial overstuffing

Effective Searching

- Composing the right keywords in the query
  - Saves time and frustration
- AND OR NOT
  - AND: combines two keywords
    - Specifies that both keywords should be found on the resulting web page
  - OR: combines two keywords
    - Specifies one or both keywords to be found on the web page
  - NOT: operates on a single keyword
    - Ensures that this keyword should not be found in any page returned.

Examples:

vacation london OR paris

bass AND fishing NOT music
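The three operators behave like set operations on "the pages containing a keyword": AND is intersection, OR is union, NOT is difference. The sketch below uses invented pages, and it reads the first example as vacation AND (london OR paris), which is an assumption about how an engine would group it.

```python
# Hypothetical pages (name -> text), invented for illustration.
PAGES = {
    "p1": "bass fishing on the lake",
    "p2": "bass guitar music",
    "p3": "fishing music festival with bass players",
    "p4": "vacation in paris",
    "p5": "vacation in london",
}

def pages_with(keyword):
    """Set of pages whose text contains the keyword as a whole word."""
    return {name for name, text in PAGES.items() if keyword in text.split()}

# bass AND fishing NOT music
bass_fishing = (pages_with("bass") & pages_with("fishing")) - pages_with("music")

# vacation london OR paris, read here as: vacation AND (london OR paris)
vacation = pages_with("vacation") & (pages_with("london") | pages_with("paris"))

print(sorted(bass_fishing), sorted(vacation))
```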

Effective Searching..

+/- signs

- The + sign indicates a keyword must be present in the result
- The - sign indicates a keyword must not be present
- The signs are usually stuck to the keyword
- Examples:

star wars episode +1

+bass +fish -music

Quotation marks “ ”

- Groups a set of keywords and the resulting page should have these in the exact same order
- Can be used in combination with other methods
- Examples:

“star wars episode 1”

“to kill a mocking bird” -movie
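A query box has to turn strings like these into rules. The following is a minimal sketch of one way to do it, assuming plain ASCII `+`, `-` and `"` characters (not the typographic quotes and dashes of the slides) and treating unsigned, unquoted words as optional; real engines differ in the details.

```python
import re

# An optional sign (+ or -), then either a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Return (must_have, must_not_have, other_words) for a query string."""
    must, must_not, other = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "-":
            must_not.append(term)
        elif sign == "+" or phrase:
            must.append(term)        # quoted phrases must match in exact order
        else:
            other.append(term)
    return must, must_not, other

def matches(text, query):
    """True if the text has every required term and no excluded term."""
    padded = " " + " ".join(text.lower().split()) + " "

    def contains(term):
        return " " + term + " " in padded

    must, must_not, _ = parse_query(query)
    return all(map(contains, must)) and not any(map(contains, must_not))

print(parse_query('+bass +fish -music'))
print(matches("To Kill a Mocking Bird the novel", '"to kill a mocking bird" -movie'))
```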

Networking and Telecommunication

Topics

- Linking Up: Network Basics
- Connecting to the Internet
- Networks: Near and Far
- Communication Software

Linking Up: Network Basics

- A computer network is any system of two or more computers that are linked together.
- How do networks impact systems?
  - People share computer hardware, thus reducing costs
  - People share data and software programs, thus increasing efficiency and production

Linking Up: Network Basics

- The Internet is a network of networks
  - A globally connected network that links various organisations and individuals.
- The Web is not the Internet.
  - WWW is one particular usage of the Internet.
  - Email and FTP (File Transfer Protocol) are other such uses.

Connecting to the Internet

- The amount of information that can be transmitted in a given amount of time is defined as the bandwidth
- Impacted by:
  - Physical media that make up the network
  - Amount of network traffic
  - Software protocols of the network

Communication á la Modem

- A modem is a hardware device that connects a computer’s serial port to a telephone line (for remote access).
  - Modulator-demodulator
  - May be internal on the system board, or an external modem sitting in a box linked to a serial port.
- Modem transmission speed is measured in bits per second (bps); modems generally transmit at 28,800 bps to 56K bps

Connecting to the Internet

- Direct connections using T1 or T3 lines.
  - 1.5 Mbps to 45 Mbps
- Dial-up connections
  - Modems
- Broadband connections
  - DSL (Digital Subscriber Line): 300 Kbps to 1.3 Mbps
  - Cable modems: 10 Mbps
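To get a feel for what these speeds mean, the sketch below estimates how long one file would take on each kind of connection. It assumes ideal conditions (the full nominal speed, no protocol overhead, no competing traffic) and an arbitrary 1 MB file, so real transfers would be slower.

```python
# Nominal line speeds in bits per second, in the ranges quoted for each
# connection type; "56K modem" stands for a typical dial-up modem.
SPEEDS_BPS = {
    "56K modem": 56_000,
    "DSL (1.3 Mbps)": 1_300_000,
    "T1 (1.5 Mbps)": 1_500_000,
    "Cable modem (10 Mbps)": 10_000_000,
    "T3 (45 Mbps)": 45_000_000,
}

def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: 8 bits per byte, no overhead or congestion."""
    return size_bytes * 8 / bits_per_second

FILE_BYTES = 1_000_000  # an assumed 1 MB file (10**6 bytes)

times = {name: transfer_seconds(FILE_BYTES, bps) for name, bps in SPEEDS_BPS.items()}
for name, seconds in times.items():
    print(f"{name:22s} {seconds:8.2f} s")
```

Speeds are quoted in bits per second while file sizes are in bytes, so the factor of 8 is easy to forget.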

Networks: Near and Far……

Networks Near and Far

Local-area network (LAN)

- Computers are linked within a building or cluster of buildings.
- Each computer and peripheral is an individual node on the network.
- Nodes are connected by cables, which may be either twisted pair (copper wires) or coaxial cable.

Wide-Area Networks

- A network that extends over a long distance.
- Each network site is a node on the WAN
- Made up of LANs linked by phone lines, microwave towers, and communication satellites.
- Data is transmitted over common pathways called a backbone.
- CANet3: http://www.canet3.net/stats/CAnet3map/CAnet3map.htm

CANet3: Canadian backbone

Protocols for Communication……

Communication Software

- Protocol - set of rules for the exchange of data between a terminal and a computer or between two computers
- TCP/IP: Transmission Control Protocol / Internet Protocol
  - Messages are broken into packets (1500 bytes)
  - Packets are numbered and sent over the network
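The "break into numbered packets" step can be sketched in a few lines. This is a simplification under stated assumptions: each packet here is just a (number, total, payload) tuple with up to 1500 bytes of payload, whereas real TCP/IP packets also carry headers and use byte-based sequence numbers.

```python
import random

PACKET_SIZE = 1500  # bytes per packet, the figure quoted on the slide

def to_packets(message: bytes, size: int = PACKET_SIZE):
    """Split a message into (number, total, payload) packets."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    total = len(chunks)
    return [(number, total, chunk) for number, chunk in enumerate(chunks, start=1)]

def reassemble(packets):
    """Rebuild the message, whatever order the packets arrived in."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, _, payload in ordered)

message = bytes(range(256)) * 20                   # 5120 bytes of sample data
packets = to_packets(message)
shuffled = random.sample(packets, len(packets))    # simulate out-of-order arrival
restored = reassemble(shuffled)
print(len(packets), restored == message)
```

Because every packet carries its number, the receiver can put them back in order even when the network delivers them shuffled.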

Communication Software

- IP defines the addressing system
  - 128.236.24.161 - 4 bytes, each 0 to 255
  - Every packet includes the source IP, destination IP and the packet number (7 of 13)
- TCP is an end-to-end protocol.
  - Packets are reliably transmitted from one computer to another.
  - Lost packets are re-transmitted.
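The "4 bytes, 0 to 255" description of an IP address can be checked directly with Python's standard `ipaddress` module. The helper name `is_valid_ipv4` is introduced here only for illustration.

```python
import ipaddress

addr = ipaddress.IPv4Address("128.236.24.161")   # the address on the slide
octets = list(addr.packed)                       # the 4 bytes, each 0..255
as_integer = int(addr)                           # the same 32 bits as one number

def is_valid_ipv4(text):
    """True if text is dotted-decimal with four values in 0..255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

print(octets, as_integer)
print(is_valid_ipv4("128.236.24.256"))
```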

Communication Software


- Communication software establishes a protocol that is followed by the computer’s hardware
- Different forms:
  - Client/server model - one or more computers act as dedicated servers and all the remaining computers act as clients
    - Web server and client browsers
  - Peer-to-peer model - every computer on the network is both client and server
    - Napster, Gnutella
  - Many networks are hybrids, using features of the client/server and peer-to-peer models

Client/Server Model

- Server software responds to client requests by providing data
- Client software sends requests from the user to the server, e.g. http://www.cs.usask.ca
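Both roles can be seen in one small program. The sketch below starts a tiny web server on the local machine and then acts as a client that sends it one request; the handler name, the reply text and the use of a local address are all choices made for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with a small text page."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch_once(path):
    """Start a local server, act as a client for one request, then stop."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # Client side: send a request and read the server's reply.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)
        reply = connection.getresponse()
        result = reply.status, reply.read().decode("utf-8")
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()

status, text = fetch_once("/index.html")
print(status, text)
```

A browser visiting a real web site plays the same client role, only with a remote server and a richer page.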

Internet Addresses…

Internet Addresses

- The host is named using DNS (domain name system), which translates between numeric IP addresses and host names.
  - Address: 128.233.130.63 is www.cs.usask.ca
  - Address: 216.239.51.101 is www.google.com
- It is easier to remember strings of letters than numbers.
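Conceptually, a DNS lookup is a table lookup from name to address (and sometimes back). The toy resolver below uses only the two name/address pairs listed above as a frozen illustration; it does not contact any DNS server, and real addresses change over time. In a real program, `socket.gethostbyname(name)` asks the system's resolver instead.

```python
# Name/address pairs from the slide; a frozen illustration, not live DNS data.
RECORDS = {
    "www.cs.usask.ca": "128.233.130.63",
    "www.google.com": "216.239.51.101",
}
REVERSE = {address: name for name, address in RECORDS.items()}

def resolve(name):
    """Forward lookup: host name -> IP address (None if unknown)."""
    return RECORDS.get(name.lower().rstrip("."))

def reverse_lookup(address):
    """Reverse lookup: IP address -> host name (None if unknown)."""
    return REVERSE.get(address)

print(resolve("www.cs.usask.ca"), reverse_lookup("216.239.51.101"))
```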

Internet Domains

Top level domains include:

- .edu - educational sites
- .com - commercial sites
- .gov - government sites
- .mil - military sites
- .net - network administration sites
- .org - nonprofit organizations
- .ca - Canada

Addressing Computers

- Unique IP numbers
  - Need for it? Similar to a house address
- DNS servers
  - Arranged in a hierarchy - 4 top level servers in US
  - Multiple computers can be mapped on to the same domain name, e.g. www.yahoo.com
- Gateways
  - Take care of routing packets in and out of a LAN
- Routers
  - Take care of routing packets across multiple network nodes

Addressing Persons

Examples:

[email protected]
User President whose mail is stored on the host whitehouse in the government domain of USA

[email protected]
User abc123 at the server for Computer Science, University of Saskatchewan, Canada.

Internet Email Addresses

An Internet address includes: [email protected]

- username is the person’s “mailbox”
- hostname is the name of the host computer and is followed by one or more domains separated by periods:
  - host.subdomain.domain : @mail.usask.ca
  - host.domain : @hotmail.com
  - host.subdom.subdom.domain : @finance.sk.gov.ca
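The username/host/domain breakdown can be done mechanically. In this sketch the function name `split_address` and the mailbox name `abc123` are illustrative choices paired with the three host patterns listed above; the check is deliberately loose and is not a full validator of email syntax.

```python
def split_address(address):
    """Split 'username@host.domain' into (username, host, domains)."""
    username, separator, hostname = address.rpartition("@")
    if not separator or not username or "." not in hostname:
        raise ValueError(f"not a user@host.domain address: {address!r}")
    host, *domains = hostname.split(".")
    return username, host, domains

# "abc123" is a made-up mailbox name paired with the hosts listed above.
examples = [
    "abc123@mail.usask.ca",
    "abc123@hotmail.com",
    "abc123@finance.sk.gov.ca",
]
parsed = [split_address(address) for address in examples]
for entry in parsed:
    print(entry)
```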

Web Addresses

Dissecting a Web page address:

http://www.vote-smart.org/help/database.html

- http:// - protocol for Web pages
- www.vote-smart.org/ - path to the host
- help/database.html - resource page
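The same dissection can be done in code with Python's standard `urllib.parse.urlsplit`, shown here as a small sketch on the address above. The variable names are chosen to mirror the slide's labels.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.vote-smart.org/help/database.html")

protocol = parts.scheme      # protocol for Web pages
host = parts.hostname        # path to the host
resource = parts.path        # resource page on that host

print(protocol, host, resource)
```

The same function splits addresses that use other protocols, such as ftp:// or telnet://.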

Addressing Resources

- URL: Uniform Resource Locator
- Web: http://www.cs.usask.ca/index.html
  - A Web server stores Web pages and sends pages to client Web browsers.
- FTP: ftp://ftp.cs.usask.ca
  - File transfer protocol (FTP) allows users to download files from remote servers to their computers and to upload files.
- Telnet: telnet://scrooge.usask.ca
  - Allows users to log in to remote computers.
- Other resources like Gopher, NNTP - newsgroups

Cookies

- Cookies: what are they?
  - Files created on your computer by a website to store information about you.
- To accept or not?
  - Benefits: stores some personal information (repeat info) and allows pages to be customised to your preferences, e.g. layouts, advertisements…
  - Privacy issues: do you want your browsing patterns to be used by a company/organisation?
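Behind the scenes a cookie is a small name=value pair that the website asks the browser to keep and send back later. The sketch below uses Python's standard `http.cookies` module to show both directions; the preference names `layout` and `lang` and their values are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: build the header that asks the browser to store a preference.
outgoing = SimpleCookie()
outgoing["layout"] = "compact"          # hypothetical preference name and value
outgoing["layout"]["path"] = "/"
outgoing["layout"]["max-age"] = 3600    # keep for one hour
set_cookie_header = outgoing.output()

# Later request: the browser sends the stored pairs back in a Cookie header.
incoming = SimpleCookie()
incoming.load("layout=compact; lang=en")
preferences = {name: morsel.value for name, morsel in incoming.items()}

print(set_cookie_header)
print(preferences)
```

Because the pairs come back with every later request to that site, the same mechanism that remembers a layout can also be used to follow browsing patterns.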

Email, Viruses and Internet Issues

Topics

- E-mail: Access Protocols
- Other Internet Applications: Chat, Newsgroups
- Netiquette: some tips
- Intranets and Extranets
- Viruses
- Internet: Ethical and Political issues

Email on the Internet

- Email formats include:
  - ASCII text - can be viewed by any mail client program
  - HTML - displays text formatting, pictures, and links to Web pages
- SMTP - Simple Mail Transfer Protocol
  - Asynchronous communication form
- UUCP - Unix to Unix Copy
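A single message can carry both formats, so that plain-text clients and HTML-capable clients each show what they can. This sketch builds such a message with Python's standard `email` package; the addresses are placeholders on the reserved example.com domain, and nothing is actually sent.

```python
from email.message import EmailMessage

msg = EmailMessage()
# All addresses below are placeholders on the reserved example.com domain.
msg["From"] = "sender@example.com"
msg["To"] = "reader@example.com"
msg["Subject"] = "Two formats, one message"

# Plain text part: readable in any mail client.
msg.set_content("Plain ASCII text: readable in any mail client.")

# HTML alternative: adds formatting and a link for clients that support it.
msg.add_alternative(
    "<p>HTML version with <b>formatting</b> and a "
    '<a href="http://www.cs.usask.ca">link</a>.</p>',
    subtype="html",
)

part_types = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type(), part_types)
```

Handing the finished message to an SMTP server (for example with `smtplib.SMTP(...).send_message(msg)`) is the separate delivery step that SMTP covers.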

Email on the Internet

 What appears on the screen depends on the type of Internet connection you have and the mail program you use.

 Popular graphical email programs include Eudora, Outlook and Netscape Communicator.

Email on the Internet

IMAP vs POP: Internet Message Access Protocol vs Post Office Protocol

- IMAP: messages remain on the email server. POP: messages are downloaded to your computer and deleted from the mail server.
- IMAP: online access. POP: offline access.
- IMAP: retrieve messages in any order. POP: “in-order” retrieval.
- IMAP: limit set by your e-mail server. POP: number of messages is limited by your hard-disk size.

Mailing Lists & Network News

- Mailing lists allow you to participate in email discussion groups on special-interest topics.
  - E-mails are sent to the whole group
- A newsgroup is a public discussion on a particular subject consisting of notes written to a central Internet site and redistributed through a worldwide newsgroup network called Usenet.
  - Protocol used: NNTP - Network News Transfer Protocol
  - Could be of local interest too: usask.forsale
- I-HELP is a similar application, more like a message board.

Real-Time Communication

- Users are logged in at the same time.
- Instant Messaging for exchanging instant messages with on-line friends and co-workers
- Chat Rooms for conversing with multiple people in real-time
- Internet telephony (IP telephony) for long distance toll-free telephone service
- Videoconferencing for two-way meetings

Rules of Thumb: Netiquette

- Say what you mean and say it with care.
- Keep it short and to the point.
- Proof-read your messages.
- Learn the “nonverbal” language of the Net. :)
- Keep your cool.
- Don’t be a source of spam (Internet junk mail).
- Lurk before you leap.
- Check your FAQs (Frequently Asked Questions)
- Give something back.

Intranets and Extranets

- Intranets are self-contained intra-organizational networks that offer email, newsgroups, file transfer, Web publishing and other Internet-like services.
- Firewalls prevent unauthorized communication and secure sensitive internal data
- Gateways, where the firewalls exist, act as the gatekeeper.

Intranets and Extranets

- Extranets are private TCP/IP networks designed for outside use by customers, clients and business partners of the organisation.
- Electronic data interchange (EDI) - a set of specifications for ordering, billing, and paying for parts and services over private networks

Viruses

- Viruses are programs that could damage your data and hinder a computer’s normal functioning.
  - Activate itself: executable files, boot sector, macros
  - Replicate itself: through e-mail attachments
  - Do “something”: destroy contents
- Trojan horses are malicious programs disguised as useful software.
- Worms are programs that could travel across the network and replicate themselves.
- Anti-virus programs check for known viruses
  - Strains are identified by “unique” strings and their actions.
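Identifying strains by "unique" strings amounts to searching files for known byte patterns. The sketch below shows only that string-matching idea; the signature names and byte strings are made up and harmless, and real anti-virus programs also look at a program's actions, which this does not attempt.

```python
# Made-up byte strings standing in for the "unique" strings of known strains.
SIGNATURES = {
    "demo-strain-a": b"\xde\xad\xbe\xef\x13\x37",
    "demo-strain-b": b"HARMLESS-DEMO-MARKER",
}

def scan(data: bytes, signatures=SIGNATURES):
    """Return the names of all known signatures found in the data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)

clean_file = b"just an ordinary document"
infected_file = b"header" + SIGNATURES["demo-strain-b"] + b"payload"

print(scan(clean_file), scan(infected_file))
```

A scanner like this can only find strains whose strings are already in its table, which is why such programs check for known viruses and need regular updates.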

Internet Issues: Ethical and Political Dilemmas

- Copyright laws: how do they apply to online content?
  - Especially across international boundaries.
- Filtering software to combat inappropriate content
  - Parental controls.
- Digital cash to make on-line transactions easier and safer
- Encryption software to prevent credit card theft
- Digital signatures to prevent email forgery
- Digital divide: separates the computer haves from the have-nots.

Next Class HTML

This text coded as HTML ..

Welcome to Computer Confluence

Publishing on the Web Appears like this on the screen …