LSU SLIS Computing Session 1 LIS 7008 Information Technologies Agenda • About the course • Looking backwards • What “computers” do • How they do it.



LSU SLIS Computing

Session 1 LIS 7008 Information Technologies

Agenda

• About the course
• Looking backwards
• What “computers” do
• How they do it

Course Description

• Course Website: http://www.csc.lsu.edu/~wuyj/Teaching/7008/fa09/index.html

• Read: Description, Syllabus, Project
• Project: think about it early and keep it in mind
• Slides are usually posted before 9pm on Wednesdays
• SLIS Web server account assignment
  – Please email me your preferred user ID and password
  – Jon Frosch (email: [email protected]) will set up your account
• FTP software
  – Download and install the FileZilla FTP Client (see the syllabus page, session 1)
  – Internet Explorer built-in FTP: sftp://slis.lsu.edu

Course Goals

• Conceptual goals
  – Understand computers and networks
  – Appreciate the effects of design tradeoffs
  – Evaluate the role of information technology
• Practical goals
  – Learn to use some common tools
  – Solve a practical problem
    • Create a Website for a real organization
  – Develop a personal plan for further study

Some Motivating Questions

• What are the technical implications for:
  – Privacy?
  – Copyright?
• How will digital repositories develop?
  – How will they interact with distance education?
  – What are the implications for archives?
• How might electronic dissemination impact:
  – Roles of authors, publishers, and readers?
  – Access by disenfranchised populations?

Some IT-Related Courses

• LIS 7409 HCI
• LIS 7410 Digital Libraries
• LIS 7510 Website Design and Management
• LIS 7610/CSC7481 Information Retrieval Systems
• LIS 7502 Network for Information Centers
• CSC 4402 Introduction to DBMS

Approach to Learning LIS7008

• Readings
  – Readings provide background and detail on each topic
  – Please do the required readings before reading the slides, so that you can understand my slides.
  – Do the readings under “Useful Resources” (on the Syllabus page) that interest you.
• Slides
  – Provide the conceptual structure of each class session
• Homework, project
  – Provide hands-on experience in learning and applying technologies
• Quiz, exams
  – Measure your progress

Workload

• At least 9 hours every session (for an average student)
  – 3 hours: do the required readings, discuss on Moodle
  – 3 hours: in class
  – 2-3 hours: do the homework
  – 1 hour: do extra readings (under Useful Resources)
• Technologically weak students should expect to invest more time.

The Grand Plan

(Diagram of course topics: Computers, CMC, Web, Databases, Networking, HCI, HTML/XML, Multimedia, Search, Programming, Policy, Life Cycle; milestones: Quiz, Midterm, Project, Final)

A Personal Approach to Learning

• Work ahead, so that you are never behind
• Ask questions about the readings and homework
  – In class, or post your questions and answers on Moodle
• Augment practical skills with outside resources
  – Especially the “Useful Resources” for each session (on the Syllabus page)
• Start thinking about your project soon
  – Read the project information on the Syllabus page
  – Pick partners with complementary skills
    • Project management, technical, design, communication skills…
  – Work in groups of 3, preferably (groups of 2 allowed; groups of 4 not allowed)
  – Contact your client early to solicit their information needs. They can be busy too.

Syllabus

• Master the tools in the first 8 sessions
  – 2-3 readings and one homework assignment in most sessions
• Explore integrating issues in the last 8 weeks
  – 1 reading each week + the project
  – Most students need to invest a significant amount of time in the project!

Grading

• 35-38% individual work
  – Midterm/Final exams: 25% for the better one, 10% for the other
  – Quiz: 3%
• 12-15% individual or group work, in any groups you like
  – 3% each for the best 5 of the 7 homework assignments/quiz
  – Quiz: not group work
• 40% group work, in 3-person project teams
  – The project has a higher weight than the project report
• 10% “class” participation
  – Your questions/answers and discussions in class and on Moodle are credited to your class participation grade.

Some Observations on Grading

• One exam is worth more than all the homework
  – Message: use the homework to learn the material
  – It is a very bad idea to skip any homework!
• Midterm grades predict final grades well
  – Message: develop sound study skills early
• You need not be good at everything to get an A
  – But you do need to be excellent at several things

The Fine Print

• Group work is encouraged on homework
  – You can discuss homework with classmates anywhere!
  – But you must personally write what you turn in.
  – Copying other people’s homework is plagiarism and will be reported to the Dean and the university
• All deadlines are firm and sharp
  – The pace of the course is very fast!
  – Allowances for individual circumstances are already included in the grading computation
• Academic integrity is a serious matter
  – The quiz and exams are open book.
    • You can check books, Moodle, and the Internet.
  – No group work during the exams or the quiz!
    • Do not consult any living person (except the teaching staff) during the exams or the quiz.

Course Materials

• Helps learning:
  – Textbooks
  – Course Website
    • Supplemental readings (recommended)
  – Daily access to a networked computer!
    • Check Moodle regularly.
    • A USB memory stick may be useful.

Goals of Session 1

• By the end of this class, you will…
  – Know what this course is about and what you need to do
  – Have a basic understanding of computers
  – Know how to think about “space,” “time” and “speed”
  – Understand how computers store data and move data around

A Very Brief History of Computing

• Computer = “a person who computes” (before the 1940s)
• Hardware: all developed for the government
  – Mechanical: essentially a big adding machine
  – Analog: designed for calculus, limited accuracy
  – Digital: early machines were large enough to fill a room
  – Microchips: designed for missile guidance
• Software: initial applications were military
  – Numeric: computing gun angles
  – Symbolic: code-breaking

Commercial Developments

• Mainframes (1960s)
  – IBM
• Minicomputers (1970s)
  – DEC
• Personal computers (1980s)
  – Apple, Microsoft
• Networked computers (1990s)
  – Web (who invented the Internet?)
• Ubiquitous and embedded computers: convergence (2000s)
  – Cell phones/PDAs, TV/computer, …
• Q: how many computers do you have in your room?

Hardware Processing Cycle

• Input comes from somewhere
  – Keyboard, mouse, microphone, camera, …
  – Computers fetch data from memory
• The system does something with the input
  – Use processor, memory, software, network, …
  – Add, subtract, multiply, etc.
• Output goes somewhere
  – Monitor, speaker, robot controls, …
  – Store data back into memory

What’s Special About Computers?

• Digital content
  – Computers make perfect copies of digital content
  – Human beings cannot do that well.
• Programmed behavior
  – Speed
    • Processing speed is programmed
  – Repetition
    • Being able to do things again and again is programmed
  – Complexity
    • Being able to do complex jobs is programmed.

Today’s Focus

• Storing and moving around data
  – Within a computer
  – Between computers
• Inside a single computer: connecting the processor with the memory
• Between multiple computers: computer networks (next session)

Thinking about Size

• What’s a bit?

• How much information can 8 bits represent?

• What’s the difference between decimal and binary?

• And octal?

• And hexadecimal?
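These questions can be answered with a few lines of Python. This minimal sketch (the function names are our own, for illustration) shows that n bits give 2^n distinct patterns, and writes one number in the four bases mentioned above:

```python
def distinct_values(n_bits: int) -> int:
    """Each extra bit doubles the number of patterns, so n bits give 2**n values."""
    return 2 ** n_bits


def show_bases(n: int) -> dict:
    """Write the same number in decimal, binary, octal, and hexadecimal."""
    return {
        "decimal": str(n),
        "binary": format(n, "b"),
        "octal": format(n, "o"),
        "hexadecimal": format(n, "X"),
    }


print(distinct_values(8))  # 256, i.e., the values 0..255
print(show_bases(200))     # {'decimal': '200', 'binary': '11001000', 'octal': '310', 'hexadecimal': 'C8'}
```

So 8 bits (one byte) can represent 256 different values. Each octal digit stands for 3 bits and each hexadecimal digit for 4 bits, which is why both are handy shorthands for binary.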

Units of Size

Unit        Abbreviation   Size (bytes)
bit         b              1/8
byte        B              1
kilobyte    KB             2^10 = 1,024
megabyte    MB             2^20 = 1,024 x 1,024 = 1,048,576
gigabyte    GB             2^30 = 1,073,741,824
terabyte    TB             2^40 = 1,099,511,627,776
petabyte    PB             2^50 = 1,125,899,906,842,624

1 MB = 1024 KB, 1 GB = 1024 MB

How do hard drive manufacturers “cheat” you? 100 GB = ? MB
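One way to see the “cheat”, assuming (as is common practice) that the drive maker counts 1 GB as 10^9 bytes while the table above counts 1 GB as 2^30 bytes. The function name is hypothetical:

```python
GB_DECIMAL = 10 ** 9  # what the label on the box usually means
MB_BINARY = 2 ** 20   # 1 MB in the table above
GB_BINARY = 2 ** 30   # 1 GB in the table above


def advertised_in_binary_units(advertised_gb: float) -> tuple:
    """Return (binary MB, binary GB) for a drive sold as advertised_gb decimal GB."""
    total_bytes = advertised_gb * GB_DECIMAL
    return total_bytes / MB_BINARY, total_bytes / GB_BINARY


mb, gb = advertised_in_binary_units(100)
print(f"100 GB advertised = {mb:,.0f} MB = {gb:.2f} GB (binary units)")
# 100 GB advertised = 95,367 MB = 93.13 GB (binary units)
```

The "missing" 7 GB is not lost space; the two numbers simply use different definitions of a gigabyte.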

Thinking About Time for Transferring Data on the Internet

• Total “transfer time” is what counts
  – Time for the first bit + time between the first and last bits
• For long distances, the first factor is important
  – California: 1/80 of a second (by optical fiber)
  – London: 1/4 of a second (by satellite)
• For large files, the second factor dominates
  – The number of bits per second is limited by physics
    • Telephone line, cable, optical fiber, and satellite all have physical limits

Latency: the amount of time it takes data to travel from source to destination

Bandwidth: the amount of data that can be transmitted in a fixed amount of time
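The two factors simply add. The sketch below assumes a hypothetical 1 Mbps link (a figure not taken from the slides) together with the 1/4-second satellite latency to London:

```python
def transfer_time(size_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Total transfer time = latency (time for the first bit)
    + size / bandwidth (time between the first and last bits)."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


LINK_BPS = 1_000_000      # hypothetical 1 Mbps link
LONDON_LATENCY_S = 1 / 4  # by satellite, from the slide

small = transfer_time(1_000, LONDON_LATENCY_S, LINK_BPS)        # 1,000-byte message
large = transfer_time(100_000_000, LONDON_LATENCY_S, LINK_BPS)  # 100,000,000-byte file
print(f"small file: {small:.3f} s, large file: {large:.2f} s")
# small file: 0.258 s, large file: 800.25 s
```

For the small message almost all of the time is latency; for the large file the latency is negligible and bandwidth dominates.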

Latency and Bandwidth Examples

• Latency
  – Chat room: a latency of 5 seconds is acceptable
  – Telephone call: a latency of 5 seconds is agonizing
  – Human perception time: a quarter of a second delay
• Bandwidth
  – Typing speed: 6 words a second (that is your typing bandwidth)
• Latency question:
  – I have a time-critical dataset that fills a whole 160 GB hard disk. I want to send the data to CA ASAP. I am given two options: use a modem with a bandwidth of 54 kbps (about 7 KB/s), or take a flight (5 hours). Which option should I choose?
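A back-of-the-envelope check of the latency question, assuming binary units for the 160 GB disk and ignoring modem protocol overhead and the time spent getting to and from the airport:

```python
DISK_BYTES = 160 * 2 ** 30  # 160 GB, binary units as in the size table
MODEM_BPS = 54_000          # 54 kbps = 54,000 bits per second (~6.75 KB/s)
FLIGHT_S = 5 * 60 * 60      # 5-hour flight, in seconds

modem_s = DISK_BYTES * 8 / MODEM_BPS  # bytes -> bits, then divide by bits/second
print(f"modem: about {modem_s / 86_400:.0f} days; flight: {FLIGHT_S / 3_600:.0f} hours")
```

The modem needs roughly 295 days. The flight has far higher latency than the modem line (hours versus a fraction of a second) but vastly more bandwidth, so for a dataset this large the plane wins.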

Discussion Point

• What are the latency and bandwidth requirements for the following applications?

  – Streaming audio (e.g., NPR broadcast over the Web)
  – Streaming video (e.g., CNN broadcast over the Web)
  – Audio chat (telephone)
  – Video conferencing

Thinking About Speed

• Speed can be expressed in two ways:
  – How long does it take to do something once?
    • Memory speed is measured as “access time”
  – How many things can you do in one second?
    • Processor speed is measured in “instructions per second” (IPS)
    • Bandwidth is measured in “bits per second” (bps)
• Convenient units are typically used
  – “10 microseconds” rather than “0.00001 seconds”
• When comparing speeds, convert to the same units first!
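Converting to a common unit before comparing can be sketched as below. The unit table and function names are our own, "us" is an ASCII stand-in for microseconds (µs), and the 10 ms and 10 ns figures are the disk and RAM access times from the storage hierarchy table in this session:

```python
# Seconds per unit; "us" is used here as an ASCII stand-in for µs.
SECONDS_PER_UNIT = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9, "ps": 1e-12}


def to_seconds(value: float, unit: str) -> float:
    return value * SECONDS_PER_UNIT[unit]


def times_longer(a_value: float, a_unit: str, b_value: float, b_unit: str) -> float:
    """How many times longer duration a is than duration b."""
    return to_seconds(a_value, a_unit) / to_seconds(b_value, b_unit)


# Disk access (~10 ms) versus RAM access (~10 ns): about a million times longer.
print(round(times_longer(10, "ms", 10, "ns")))  # 1000000
```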

Units of Time

Unit          Abbreviation   Duration (seconds)
second        sec/s          1
millisecond   ms             10^-3 = 1/1,000
microsecond   µs             10^-6 = 1/1,000,000
nanosecond    ns             10^-9 = 1/1,000,000,000
picosecond    ps             10^-12 = 1/1,000,000,000,000
femtosecond   fs             10^-15 = 1/1,000,000,000,000,000

A blink of an eye = 0.1 second = 100 million ns
1 computer operation: on the scale of ns

Units of Frequency

Unit        Abbreviation   Cycles (or operations) per second
hertz       Hz             1
kilohertz   kHz            10^3 = 1,000
megahertz   MHz            10^6 = 1,000,000
gigahertz   GHz            10^9 = 1,000,000,000

Correlate Time with Frequency

• Hertz: cycles (or operations) per second
• 1 GHz = 1 operation/ns
• 1 kHz = 1 operation/ms
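The relationship runs both ways: the time per cycle (the period) is 1 divided by the frequency. A small sketch, with a function name of our own choosing:

```python
def period_ns(frequency_hz: float) -> float:
    """Duration of one cycle in nanoseconds: period = 1 / frequency."""
    return 1e9 / frequency_hz


print(period_ns(1e9))    # 1 GHz -> 1.0 ns per operation
print(period_ns(1e3))    # 1 kHz -> 1000000.0 ns, i.e., 1 ms per operation
print(period_ns(3.8e9))  # 3.8 GHz -> about 0.26 ns per cycle
```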

Aside: The Gigahertz Race

• Intel Pentium 4: 3.80 GHz
• Intel Core Duo: 2.33 GHz
• What does it mean?

• Which is actually faster?

• Why is this important for consumers?
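A crude answer to “which is faster?” multiplies clock speed by the number of cores, under the simplifying assumption that each core completes one instruction per cycle. That assumption is ours, not a measured figure; real processors differ in how much work they do per cycle, which is exactly why GHz alone can mislead consumers. The Pentium 4 is treated as single-core and the Core Duo as dual-core:

```python
def peak_mips(clock_ghz: float, cores: int, instructions_per_cycle: float = 1.0) -> float:
    """Rough peak millions of instructions per second (MIPS).

    instructions_per_cycle = 1 is a simplifying assumption, not a measured value.
    """
    return clock_ghz * 1_000 * cores * instructions_per_cycle


pentium_4 = peak_mips(3.80, cores=1)  # single core
core_duo = peak_mips(2.33, cores=2)   # two cores
print(f"Pentium 4: ~{pentium_4:,.0f} MIPS, Core Duo: ~{core_duo:,.0f} MIPS")
```

This gives about 3,800 versus 4,660 MIPS, so the chip with the lower clock rate can still come out ahead. Under the same assumption, a 3 GHz Core 2 Duo gives peak_mips(3, cores=2) = 6,000 MIPS.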

Computer Hardware

• Central Processing Unit (CPU)
  – Intel Xeon, Motorola PowerPC, …
• Communications “bus”
  – FSB, PCI, ISA, USB, FireWire, …
• Storage devices
  – Cache, RAM, hard drive, floppy disk, DVD, …
• External communications
  – Modem, Ethernet, GPRS, 802.11 wireless, …

Extracted from Shelly, Cashman, and Vermaat, Discovering Computers 2004

System Architecture

(Diagram of the motherboard: keyboard, mouse, sound card, video card, input controller, system bus, front-side bus, CPU with L1 and L2 cache, RAM, hard drive, CD/DVD, USB port)

The CPU and the Memory

• CPU (Central Processing Unit)
  – Where the actual computation is performed
• Memory
  – Where the data used in the computation is stored
• Bus
  – Moves data between memory and the CPU
• Desiderata for memory:
  – Large
  – Fast
  – Cheap

The Storage Hierarchy

• The problem:
  – Fast memory (such as cache and RAM) is expensive
    • So large memory (such as a hard disk) is slow!
  – But fast access to large memories is needed!
• The solution:
  – Keep what you need often in small (fast) places
    • Keep the rest in large (slow) places
  – Get things to the fast place before you need them

Best of Both Worlds?

Small, but fast… + Large, but slow… = Large and fast?!

Think about your bookshelf and the library…

The Storage Hierarchy

Type         Speed     Size     Cost
Registers    ~300 ps   256 B    Very expensive
Cache        ~1 ns     4 MB     Expensive
RAM          ~10 ns    1 GB     Cheap
Hard drive   ~10 ms    100 GB   Very cheap

1 ps (picosecond) = 1/1000 ns
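Dividing the approximate speeds in the table shows how steep the hierarchy is. The figures below are the table's rough values converted to seconds; the names are our own:

```python
# Approximate access times from the storage hierarchy table, in seconds.
ACCESS_TIME_S = {
    "registers": 300e-12,
    "cache": 1e-9,
    "RAM": 10e-9,
    "hard drive": 10e-3,
}


def times_slower(level: str, baseline: str = "registers") -> float:
    return ACCESS_TIME_S[level] / ACCESS_TIME_S[baseline]


for level in ACCESS_TIME_S:
    print(f"{level:>10}: {times_slower(level):>12,.0f} x the register access time")
```

By these numbers a hard drive access is about a million times slower than a RAM access, which is why the hierarchy works so hard to keep the data the CPU needs out of the slowest level.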

Locality

• Spatial locality: if the system fetched data x, it is likely to fetch data located near x
• Temporal locality: if the system fetched data x, it is likely to fetch x again
• Insight behind the storage hierarchy: move important data from slow, large memory to fast, small memory for processing.
• Caching strategies: what’s the most effective strategy for moving data around?
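The slides leave the choice of caching strategy open. One common strategy is least-recently-used (LRU) replacement: when the fast memory is full, evict the item that has gone unused the longest. The sketch below is our own illustration, not a strategy prescribed by the slides; it models only temporal locality and counts how often a small cache already holds the requested item:

```python
from collections import OrderedDict


def count_hits(accesses, capacity):
    """Simulate a small, fast memory holding at most `capacity` items with
    least-recently-used (LRU) replacement, and count cache hits."""
    cache = OrderedDict()
    hits = 0
    for item in accesses:
        if item in cache:
            hits += 1
            cache.move_to_end(item)        # mark as most recently used
        else:
            cache[item] = True             # "fetch" from slow memory
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used item
    return hits


looping = [1, 2, 3] * 4     # reuses the same few items (temporal locality)
scattered = list(range(12))  # never reuses anything
print(count_hits(looping, capacity=4), count_hits(scattered, capacity=4))  # 9 0
```

With strong temporal locality, 9 of the 12 accesses are served from the fast memory; with no reuse, none are, and a cache cannot help.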


“Solid-State” Memory

• ROM
  – Does not require power to retain content
  – Used for the “Basic Input/Output System” (BIOS)
• Cache (fast, low-power “static” RAM)
  – Level 1 (L1) cache: small, single-purpose (dedicated)
  – Level 2 (L2) cache: larger, shared
• (“Dynamic”) RAM (slower, power hungry)
  – Reached over the “Front-Side Bus” (FSB)
• Flash memory (fast read, slow write EEPROM)
  – Electrically Erasable Programmable Read-Only Memory
  – Reached over a USB bus or SD socket
  – Used in memory sticks (“non-volatile” storage)
  – Q: how many photos (1 MB each) can a 4 GB SD card store?
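A quick answer to the SD card question, assuming binary units (1 GB = 1024 MB) and ignoring the space used by the card's file system:

```python
CARD_BYTES = 4 * 2 ** 30  # 4 GB card, binary units
PHOTO_BYTES = 2 ** 20     # 1 MB per photo

photos = CARD_BYTES // PHOTO_BYTES
print(photos)             # 4096

# If the "4 GB" on the label means 4 * 10**9 bytes, fewer photos fit.
photos_if_decimal = (4 * 10 ** 9) // PHOTO_BYTES
print(photos_if_decimal)  # 3814
```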

“Rotating” Memory

• Fixed magnetic disk (“hard drive”)
  – May be partitioned into multiple volumes
    • In Windows, referred to as C:, D:, E:, …
    • In Unix, referred to as /software, /homes, /mail, …
• Removable magnetic disk
  – Floppy disk, Zip drives, flash memory stick, …
• (Removable) optical disk
  – CD-ROM, DVD, CD-R, …
  – Rewritable: CD-RW, DVD+RW, …

How Disks Work

Also: http://en.wikipedia.org/wiki/File:Innansicht_Festplatte_512_MB_von_Quantum.jpg

Move the head → a cylinder on all platters → a sector

Extracted from Shelly, Cashman, and Vermaat, Discovering Computers 2004

Trading Speed for Space

• Hard disk is larger than RAM but much slower
  – Typical hard drive: 3-12 ms access time, 100 GB (at 5400 rpm)
    • About 100 times larger than 1 GB of RAM
    • 10 million times slower than the CPU!
• The initial access is the slow part
  – Subsequent bytes are sent at 17 MB/sec (60 ns/byte)
• Used as “virtual memory,” the disk makes RAM seem larger
  – But too little physical RAM results in “thrashing”
• Q about speed and space: your computer runs slowly, and an IT specialist suggests buying more RAM. Why?
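The split between the slow initial access and the faster stream of subsequent bytes can be modeled as access time + bytes × time per byte. A sketch using the slide's figures, with 10 ms picked from the 3-12 ms range as an assumption:

```python
ACCESS_S = 10e-3    # assumed initial access time (slide: 3-12 ms)
PER_BYTE_S = 60e-9  # subsequent bytes at ~17 MB/sec, i.e., ~60 ns/byte


def disk_read_time(n_bytes: int) -> float:
    """Seconds to read n_bytes: one initial access, then a steady stream."""
    return ACCESS_S + n_bytes * PER_BYTE_S


for n in (1, 1_000, 1_000_000):
    t = disk_read_time(n)
    print(f"{n:>9,} bytes: {t * 1e3:6.2f} ms ({ACCESS_S / t:.0%} of it is the initial access)")
```

Reading one byte costs essentially the full access time, while for a 1,000,000-byte read the stream of subsequent bytes takes most of the time. This is one reason disks are read in large blocks.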

Everything is Relative

• The CPU is the fastest part of a computer
  – 3 GHz Core 2 Duo = 6,000 MIPS (why?)
    • 3 operations per processor every nanosecond
• Cache memory is fast enough to keep up
  – 128 kB L1 cache on chip (dedicated, CPU speed)
  – 4 MB L2 cache on chip (shared, CPU speed)
• RAM is larger, but slower
  – 1 GB or more, about 6 ns access time
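Another way to see why the cache is needed: count how many 3 GHz clock cycles pass while waiting for one access. The 10 ms disk figure comes from the storage hierarchy table; the other numbers are from this slide, and the function name is our own:

```python
CLOCK_HZ = 3e9         # 3 GHz
RAM_ACCESS_S = 6e-9    # about 6 ns
DISK_ACCESS_S = 10e-3  # about 10 ms


def cycles_waiting(access_s: float, clock_hz: float = CLOCK_HZ) -> int:
    """Clock cycles that elapse during one memory access (rounded)."""
    return round(access_s * clock_hz)


print(cycles_waiting(RAM_ACCESS_S))   # 18
print(cycles_waiting(DISK_ACCESS_S))  # 30000000
```

Each trip to RAM costs a few dozen cycles at most, but a trip to the disk costs tens of millions, which is time the CPU could have spent computing.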

Moore’s Law

• Processing speed doubles every 18 months
  – Faster CPU, longer words, larger L1 cache
• Cost/bit for RAM drops 50% every 12 months
  – Small decrease in feature size has a large effect
• Cost/bit for disk drops 50% every 12 months
  – But transfer rates don’t improve much
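The doubling rules compound quickly. A sketch of the arithmetic, taking the slide's rates at face value (the function name is our own):

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Factor by which a quantity grows if it doubles every doubling_months."""
    return 2 ** (months / doubling_months)


years = 6
speedup = growth_factor(12 * years, 18)       # processing speed doubles every 18 months
ram_cost = 1 / growth_factor(12 * years, 12)  # RAM cost/bit halves every 12 months
print(f"After {years} years: {speedup:.0f}x faster, RAM cost/bit x {ram_cost}")
# After 6 years: 16x faster, RAM cost/bit x 0.015625
```

Six years is four 18-month doublings (2^4 = 16) but six 12-month halvings (1/2^6 = 1/64).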

Tape Backup

• Hard disk: files can be erased or modified; sectors can fail
  – Allows random access to data
• Tapes store data sequentially
  – Very fast transfer, but not “random access”
• Used as backup storage for fixed disks
  – Weekly incremental backup is a good idea
    • With a complete (“level zero”) monthly backup
• Used for archival storage
  – Higher data density than DVDs
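The difference between a full (“level zero”) backup and an incremental one comes down to which files get copied. A minimal sketch, with a hypothetical function name and made-up file names and dates:

```python
from datetime import datetime


def files_to_back_up(modified: dict, last_backup, full: bool = False) -> list:
    """Return the file names to copy in this backup run.

    modified maps file name -> last-modified time. A full ("level zero")
    backup copies every file; an incremental backup copies only files
    changed since the previous backup.
    """
    if full or last_backup is None:
        return sorted(modified)
    return sorted(name for name, when in modified.items() if when > last_backup)


files = {  # hypothetical example data
    "a.doc": datetime(2009, 9, 1),
    "b.html": datetime(2009, 9, 10),
    "c.jpg": datetime(2009, 9, 3),
}
print(files_to_back_up(files, last_backup=datetime(2009, 9, 1), full=True))  # ['a.doc', 'b.html', 'c.jpg']
print(files_to_back_up(files, last_backup=datetime(2009, 9, 5)))             # ['b.html']
```

Restoring then needs the latest full backup plus every incremental backup made after it, which is why the monthly level-zero backup matters.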

Discussion Point: Data Migration

• What format should old tapes be converted to?

– Newer tape?

– CD?

– DVD?

• How often must we “refresh” these media?

• How can we afford this?

Types of Software

• Application programs (e.g., Internet Explorer)
  – What you normally think of as a “program”
• Compilers and interpreters (e.g., JavaScript)
  – Allow programmers to create new behavior
• Operating system (e.g., Windows XP/Vista)
  – Moves data between disk and RAM (+ lots more!)
• Embedded program (e.g., BIOS)
  – Permanent software inside some device

Installing Applications

• Copy to a permanent place on your hard drive
  – From a CD, the Internet, …
• Install any other required programs
  – “.DLL” files can be shared by several applications
• Register the program’s location
  – Associate icons/Start menu items with it
  – Configure the uninstaller for later removal
• Configure it for your system
  – Where to find data files and other programs

Discussion Point: What’s a Virus?

• Characteristics
  – Initiation
    • Somebody sent you the virus and you launched it.
  – Behavior
    • It does something to your computer
  – Propagation
    • It spreads through your system and network
• Spyware
  – Is it a kind of virus?
• Detection
  – Virus detection software

Graphical User Interfaces

• Easy way to perform simple tasks
  – Used to start programs, manage files, …
  – Relies on a physical metaphor (e.g., a desktop)
• Built into most modern operating systems
  – Windows XP, Mac OS X, Unix X Window System
• Application programs include similar ideas
  – Point-and-click, drag and drop, …

Cursor-based Interfaces

• Useful for specifying complex operations
  – Or when a graphical display is impractical
• Available in most operating systems
  – telnet
  – ping (e.g., ping www.lsu.edu)
  – Manual FTP
  – Command window in Windows XP
• Initiate a cursor-based interface (an exercise)

Summary

• For computation to occur, data must be moved to and from memory
• Different types of memory represent different tradeoffs
• Speed, cost, and size:
  – You can easily get any 2, but not all 3
  – Computers use caching as a compromise strategy; caching strategies and the storage hierarchy give us the best of both worlds
• Hardware and software work synergistically
  – Our focus will be on software and the Internet
  – But understand hardware abilities and limitations

Homework 1 Goals

• Think about relative speed and relative size
• Interpret specifications for computer systems
• Try some “back of the envelope” calculations
• Some helpful hints:
  – There is a calculator in Windows Accessories
  – If you’re rusty on math, ask the TA or me for help, or post a question on Moodle.
  – I will prepare a short tutorial on this (as a narrated PPT)
• Homework is posted on the course Website (syllabus page)

Goals

• By the end of this class, you will…
  – Have a basic understanding of computers
  – Know how to think about “space,” “time” and “speed”
  – Understand how computers store data and move data around
• Have we reached these goals?

Reminder

• We will set up a SLIS server account for you.

• Do not forget to email me your preferred user id and password. Do this ASAP if you have not done so.

• TA lab hours: TBA

Questions/Comments?

• What was the muddiest point in this class session?

• You can send me questions and comments about this class session anonymously!

– Use an anonymous email account (such as Yahoo! or Hotmail)