Computer Architecture & Related Topics



Ben Schrooten, Shawn Borchardt, Eddie Willett, Vandana Chopra


Presentation Topics

Computer Architecture History
Single CPU Design
GPU Design (Brief)
Memory Architecture
Communications Architecture
Dual Processor Design
Parallel & Supercomputing Design

Part 1: History and Single CPU

Ben Schrooten

HISTORY!!!

One of the first computing devices to come about was . . . The ABACUS!

The ENIAC: 1946

• Completed: 1946
• Programmed: plug board and switches
• Speed: 5,000 operations per second
• Input/output: cards, lights, switches, plugs
• Floor space: 1,000 square feet

The EDSAC (1949) and the UNIVAC I (1951)

EDSAC
• Speed: 714 operations per second
• Technology: vacuum tubes, delay lines
• First practical stored-program computer

UNIVAC I
• Speed: 1,905 operations per second
• Memory type: delay lines, magnetic tape
• Technology: serial vacuum tubes, delay lines
• Input/output: magnetic tape, unityper, printer
• Floor space: 943 cubic feet
• Cost: F.O.B. factory $750,000 plus $185,000 for a high-speed printer

Intel 4004 1971

Progression of The Architecture

Vacuum tubes -- 1940 – 1950
Transistors -- 1950 – 1964
Integrated circuits -- 1964 – 1971
Microprocessor chips -- 1971 – present

Current CPU Architecture

• Basic CPU Overview

Single Bus = Slow Performance

Example of Triple Bus Architecture

Motherboards / Chipsets / Sockets

OH MY!

The chipset is in charge of:
• Memory Controller
• EIDE Controller
• PCI Bridge
• Real-Time Clock
• DMA Controller
• IrDA Controller
• Keyboard
• Mouse
• Secondary Cache
• Low-Power CMOS SRAM

• Socket 4 & 5
• Socket 7
• Socket 8
• Slot 1
• Slot A

Sockets

GPUs

• Allows for real-time rendering of graphics on a small PC
• GPUs are true processing units
• Pentium 4 contains 42 million transistors on a 0.18-micron process
• GeForce3 contains 57 million transistors on a 0.15-micron manufacturing process

More GPU

Sources

Source for DX4100 Picture: Oneironaut
http://oneironaut.tripod.com/dx4100.jpg

Source for Computer Architecture Overview Picture
http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf

Pictures of CPU Overview, Single Bus Architecture, Triple Bus Architecture: Roy M. Wnek, Virginia Tech CS5515, Lecture 5
http://www.nvc.cs.vt.edu/~wnek/cs5515/slide/Grad_Arch_5.PDF

Historical Data and Pictures: The Computer Museum History Center
http://www.computerhistory.org/

Intel Motherboard Diagram / Pentium 4 Picture: Intel Corporation
http://www.intel.com

The Abacus: Abacus-Online-Museum
http://www.hh.schule.de/metalltechnik-didaktik/users/luetjens/abakus/china/china.htm

Information also from Clint Fleri
http://www.geocities.com/cfleri/

Memory Functionality: Dana Angluin
http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html

Benchmark Graphics: Digital Life
http://www.digit-life.com/articles/pentium4/index3.html

Chipset and Socket Information: Motherboards.org
http://www.motherboards.org/articlesd/techplanations/17_2.html

AMD Processor Pictures: Tom's Hardware
http://www6.tomshardware.com/search/search.html?category=all&words=Athlon

GPU Info: 4th Wave Inc.
http://www.wave-report.com/tutorials/gpu.htm

NV20 Design Pictures: Digital Life
http://www.digit-life.com/articles/nv20/

Main Memory

Memory Hierarchy

DRAM vs. SRAM

• DRAM is short for Dynamic Random Access Memory
• SRAM is short for Static Random Access Memory

DRAM is dynamic in that, unlike SRAM, it needs to have its storage cells refreshed (given a new electronic charge) every few milliseconds. SRAM does not need refreshing because it operates on the principle of a current switched in one of two directions, rather than a storage cell that holds a charge in place.

Parity vs. Non-Parity

• Parity is error detection that was developed to notify the user of any data errors. A single bit is added to each byte of data; this bit is responsible for checking the integrity of the other 8 bits while the byte is moved or stored.
• Since memory errors are rare, much of today's memory is non-parity.
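The single-bit scheme above can be sketched in a few lines of Python (an illustration of even parity; the function names are ours, not from the slides — real parity is computed in hardware by the memory controller):

```python
def parity_bit(byte: int) -> int:
    """Even parity: choose the extra bit so the total count of 1s is even."""
    return bin(byte & 0xFF).count("1") % 2

def store(byte: int) -> tuple[int, int]:
    """Store a byte together with its parity bit (9 bits total)."""
    return byte, parity_bit(byte)

def verify(byte: int, parity: int) -> bool:
    """Re-check the parity when the byte is read back or moved."""
    return parity_bit(byte) == parity

data, p = store(0b1011_0010)        # four 1 bits, so the parity bit is 0
assert verify(data, p)
assert not verify(data ^ 0b100, p)  # any single flipped bit is detected
```

Note that parity only detects an odd number of flipped bits; two flips in the same byte cancel out, which is one reason servers moved on to ECC memory.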

SIMM vs. DIMM vs. RIMM?

• SIMM - Single In-line Memory Module
• DIMM - Dual In-line Memory Module
• RIMM - Rambus In-line Memory Module

• SIMMs offer a 32-bit data path, while DIMMs offer a 64-bit data path.
• SIMMs have to be used in pairs on Pentiums and more recent processors.
• RIMM is one of the latest designs. Because of the fast data transfer rate of these modules, a heat spreader (an aluminum plate covering) is used on each module.

Evolution of Memory

Year       Type          Speed
1970       RAM / DRAM    4.77 MHz
1987       FPM           20 MHz
1995       EDO           20 MHz
1997       PC66 SDRAM    66 MHz
1998       PC100 SDRAM   100 MHz
1999       RDRAM         800 MHz
1999/2000  PC133 SDRAM   133 MHz
2000       DDR SDRAM     266 MHz
2001       EDRAM         450 MHz

• FPM - Fast Page Mode DRAM: traditional DRAM
• EDO - Extended Data Output: increases the read cycle between memory and the CPU
• SDRAM - Synchronous DRAM: synchronizes itself with the CPU bus and runs at higher clock speeds

• RDRAM - Rambus DRAM: DRAM with a very high bandwidth (1.6 GB/s)
• EDRAM - Enhanced DRAM: dynamic (power-refreshed) RAM that includes a small amount of static RAM (SRAM) inside a larger amount of DRAM, so that many memory accesses will be to the faster SRAM. EDRAM is sometimes used as L1 and L2 memory and, together with Enhanced Synchronous DRAM, is known as cached DRAM.

Read Operation

On a read, the CPU first tries to find the data in the cache; if it is not there, the cache is updated from main memory and then returns the data to the CPU.

Write Operation

On a write, the CPU writes the information into both the cache and main memory.
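The read behavior (a miss fills the cache from main memory) together with this write behavior (update both cache and main memory) amounts to a write-through cache. A toy sketch, with illustrative names not taken from the slides:

```python
class WriteThroughCache:
    """Fully associative toy cache in front of a dict-based main memory."""

    def __init__(self, main_memory: dict):
        self.main = main_memory  # backing store: address -> value
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:               # miss: update the cache
            self.cache[addr] = self.main[addr]   # from main memory first,
        return self.cache[addr]                  # then return data to the CPU

    def write(self, addr, value):
        self.cache[addr] = value  # write goes into the cache...
        self.main[addr] = value   # ...and into main memory (write-through)

mem = {0x10: 7}
c = WriteThroughCache(mem)
assert c.read(0x10) == 7   # first access misses, second would hit
c.write(0x10, 99)
assert mem[0x10] == 99     # main memory is updated immediately
```

The alternative policy, write-back, delays the memory update until the cache line is evicted; the slides describe the simpler write-through variant.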


References

http://www-ece.ucsd.edu/~weathers/ece30/downloads/Ch7_memory(4x).pdf

http://home.cfl.rr.com/bjp/eric/ComputerMemory.html

http://aggregate.org/EE380/JEL/ch1.pdf

Defining a Bus

• A parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other


VESA - Video Electronics Standards Association

• 32-bit bus
• Found mostly on 486 machines
• Relied on the 486 processor to function
• People started to switch to the PCI bus because of this
• Otherwise known as VLB

ISA - Industry Standard Architecture

• Very old technology
• Bus speed: 8 MHz
• Speed of 42.4 Mb/s maximum
• Very few ISA ports are found in modern machines

MCA - Micro Channel Architecture

• IBM's attempt to compete with the ISA bus
• 32-bit bus
• Automatically configured cards (like Plug and Play)
• Not compatible with ISA

EISA - Extended Industry Standard Architecture

• Attempt to compete with IBM's MCA bus
• Ran at an 8.33 MHz cycle rate
• 32-bit slots
• Backward compatible with ISA
• Went the way of MCA

PCI - Peripheral Component Interconnect

• Speeds up to 960 Mb/s
• Bus speed of 33 MHz
• 32-bit architecture
• Developed by Intel in 1993
• Synchronous or asynchronous
• PCI popularized Plug and Play
• Runs at half of the system bus speed


PCI-X

• Up to 133 MHz bus speed
• 64-bit bus width
• 1 GB/sec throughput
• Backwards compatible with all PCI
• Primarily developed for the increased I/O demands of technologies such as Fibre Channel, Gigabit Ethernet, and Ultra3 SCSI

AGP - Accelerated Graphics Port

• Essentially a high-speed PCI port
• Capable of running at 4 times the PCI bus speed (133 MHz)
• Used for high-speed 3D graphics cards
• Considered a port, not a bus: only two devices involved, and it is not expandable

BUS         Width (bits)  Bus Speed (MHz)  Bus Bandwidth (MB/s)
8-bit ISA   8             8.3              7.9
16-bit ISA  16            8.3              15.9
EISA        32            8.3              31.8
VLB         32            33               127.2
PCI         32            33               127.2
AGP         32            66               254.3
AGP (X2)    32            66 x 2           508.6
AGP (X4)    32            66 x 4           1017.3
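The bandwidth column follows directly from width x clock, converted to binary megabytes per second. A quick sketch (the 33.33 and 66.66 MHz clock values are assumptions inferred from the table's rounded 33 and 66 entries):

```python
def bandwidth_mib_s(width_bits: int, clock_mhz: float, pumps: int = 1) -> float:
    """Peak bandwidth: bytes per transfer x transfers per second,
    expressed in binary megabytes (MiB) per second, as in the table."""
    return width_bits / 8 * clock_mhz * 1e6 * pumps / 2**20

# PCI: 32 bits at ~33 MHz -> ~127 MB/s, matching the table's 127.2
print(round(bandwidth_mib_s(32, 33.33), 1))
# AGP x4: 32 bits at ~66 MHz, quad-pumped -> ~1017 MB/s
print(round(bandwidth_mib_s(32, 66.66, pumps=4), 1))
```

AGP x2 and x4 keep the same 66 MHz clock but transfer data two or four times per cycle, which is what the `pumps` factor models.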

IDE - Integrated Drive Electronics

• Tons of other names: ATA, ATA/ATAPI, EIDE, ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA
• Good performance at a cheap cost
• Most widely used interface for hard disks

SCSI - Small Computer System Interface ("skuzzy")

• Capable of handling internal/external peripherals
• Speed anywhere from 80 to 640 Mb/s
• Many types of SCSI

TYPE              Bus Speed (MB/s, max.)  Bus Width (bits)  Max. Device Support
SCSI-1            5                       8                 8
Fast SCSI         10                      8                 8
Fast Wide SCSI    20                      16                16
Ultra SCSI        20                      8                 8
Ultra Wide SCSI   40                      16                16
Ultra2 SCSI       40                      8                 8
Wide Ultra2 SCSI  80                      16                16
Ultra3 SCSI       160                     16                16
Ultra320 SCSI     320                     16                16

Serial Port

• Uses a DB9 or DB25 connector
• Adheres to the RS-232C spec
• Capable of speeds up to 115 Kb/sec

USB

1.0
• Hot plug-and-play
• Full-speed USB devices signal at 12 Mb/s
• Low-speed devices use a 1.5 Mb/s subchannel
• Up to 127 devices chained together

2.0
• Data rate of 480 megabits per second


USB On-The-Go

• For portable devices
• Limited host capability to communicate with selected other USB peripherals
• A small USB connector to fit the mobile form factor

Firewire (i.e. IEEE 1394 and i.LINK)

• High-speed serial port
• 400 Mb/s transfer rate (30 times faster than USB 1.0)
• Hot plug-and-play

PS/2 Port

• Mini-DIN plug with 6 pins
• Mouse port and keyboard port
• Developed by IBM

Parallel port, i.e. the "printer port"

• Old type
• Two "new" types: ECP (Extended Capabilities Port) and EPP (Enhanced Parallel Port)
• Ten times faster than the old parallel port
• Capable of bi-directional communication

Game Port

• Uses a DB15 port
• Used for joystick connection to the computer

Need for High Performance Computing

• There is a need for tremendous computational capabilities in science, engineering, and business
• There are applications that require gigabytes of memory and gigaflops of performance

What is a High Performance Computer

Definition of a High Performance Computer: an HPC computer can solve large problems in a reasonable amount of time.

Characteristics:
• Fast computation
• Large memory
• High-speed interconnect
• High-speed input/output

How is an HPC computer made to go fast?

• Make the sequential computation faster
• Do more things in parallel

Applications

1. Weather prediction
2. Aircraft and automobile design
3. Artificial intelligence
4. Entertainment industry
5. Military applications
6. Financial analysis
7. Seismic exploration
8. Automobile crash testing

Who Makes High Performance Computers

* SGI/Cray: Power Challenge Array, Origin-2000, T3D/T3E
* HP/Convex: SPP-1200, SPP-2000
* IBM: SP2
* Tandem

Trends in Computer Design

Performance of the fastest computers has grown exponentially from 1945 to the present, averaging a factor of 10 every five years. The growth flattened somewhat in the 1980s but is accelerating again as massively parallel computers become available.

Real World Sequential Processes

Sequential processes we find in the world.

The passage of time is a classic example of a sequential process.

Day breaks as the sun rises in the morning.

Daytime has its sunlight and bright sky.

Dusk sees the sun setting in the horizon.

Nighttime descends with its moonlight, dark sky and stars.

Parallel Processes

Music: an orchestra performance, where every instrument plays its own part, and playing together they make beautiful music.

Parallel Features of Computers

Various methods available on computers for doing work in parallel:
• Computing environment
• Operating system
• Memory
• Disk
• Arithmetic

Computing Environment - Parallel Features

Using a timesharing environment, the computer's resources are shared among many users who are logged in simultaneously. Your process uses the CPU for a time slice, and then is rolled out while another user's process is allowed to compute. The opposite of this is to use dedicated mode, where yours is the only job running.

The computer also overlaps computation and I/O: while one process is writing to disk, the computer lets another process do some computation.

Operating System - Parallel Features

Using the UNIX background processing facility:
    a.out > results &
    man etime

Using the UNIX cron jobs feature, you submit a job that will run at a later time. Then you can play tennis while the computer continues to work. This overlaps your computer work with your personal time.

Memory - Parallel Features

Memory Interleaving

Memory is divided into multiple banks, and consecutive data elements are interleaved among them. There are multiple ports to memory: when the data elements that are spread across the banks are needed, they can be accessed and fetched in parallel.

Memory interleaving increases the memory bandwidth.
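Low-order interleaving maps consecutive addresses to consecutive banks. A minimal sketch, assuming four banks purely for illustration:

```python
NUM_BANKS = 4  # assumed bank count, for illustration only

def bank_and_offset(word_addr: int) -> tuple[int, int]:
    """Low-order interleaving: consecutive word addresses map to
    consecutive banks, cycling back to bank 0."""
    return word_addr % NUM_BANKS, word_addr // NUM_BANKS

# Eight consecutive data elements are spread across the four banks, so up
# to four of them can be fetched in the same memory cycle:
assert [bank_and_offset(a)[0] for a in range(8)] == [0, 1, 2, 3, 0, 1, 2, 3]
```

Because each bank has its own port, a sequential sweep through an array keeps all four banks busy at once, which is where the bandwidth gain comes from.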

Memory - Parallel Features(Cont)

Multiple levels of the memory hierarchy:
• Global memory, which any processor can access
• Memory local to a partition of the processors
• Memory local to a single processor:
  - cache memory
  - memory elements held in registers

Disk - Parallel Features

RAID disk: Redundant Array of Inexpensive Disks.

Striped disk: when a dataset is written to disk, it is broken into pieces which are written simultaneously to different disks in a RAID disk system. When the same dataset is read back in, the pieces of the dataset are read in parallel, and the original dataset is reassembled in memory.
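Striping and reassembly can be sketched as follows (the disk count and chunk size are arbitrary illustrative values; real RAID striping happens at the block-device level, not in application code):

```python
from itertools import zip_longest

def stripe(data: bytes, n_disks: int, chunk: int = 4) -> list[bytes]:
    """Deal fixed-size pieces of the dataset round-robin across the disks."""
    disks = [bytearray() for _ in range(n_disks)]
    for i, start in enumerate(range(0, len(data), chunk)):
        disks[i % n_disks] += data[start:start + chunk]
    return [bytes(d) for d in disks]

def unstripe(disks: list[bytes], chunk: int = 4) -> bytes:
    """Conceptually read all disks in parallel, then reassemble the dataset."""
    pieces = [[d[i:i + chunk] for i in range(0, len(d), chunk)] for d in disks]
    out = bytearray()
    for round_ in zip_longest(*pieces, fillvalue=b""):
        for piece in round_:
            out += piece
    return bytes(out)

data = b"The quick brown fox jumps over the lazy dog"
assert unstripe(stripe(data, 3)) == data  # round-trip recovers the dataset
```

With n disks, each disk stores roughly 1/n of the dataset, so the n reads (or writes) can proceed at the same time.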

Arithmetic - Parallel Features

We will examine the following features that lend themselves to parallel arithmetic:
• Multiple functional units
• Superscalar arithmetic
• Instruction pipelining

Parallel Machine Model (Architectures)

von Neumann Computer

MultiComputer

• A multicomputer comprises a number of von Neumann computers, or nodes, linked by an interconnection network
• In an idealized network, the cost of sending a message between two nodes is independent of both node location and other network traffic, but does depend on message length

Locality, Scalability, Concurrency

Distributed Memory (MIMD)

MIMD means that each processor can execute a separate stream of instructions on its own local data; distributed memory means that memory is distributed among the processors rather than placed in a central location.

• Difference between the idealized multicomputer and distributed-memory MIMD: the cost of sending a message is not independent of node location and other network traffic

Examples of MIMD machine

MultiProcessor or Shared Memory MIMD

• All processors share access to a common memory via a bus or a hierarchy of buses

Example for Shared Memory MIMD

 Silicon Graphics Challenge

SIMD Machines

• All processors execute the same instruction stream, each on a different piece of data
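The SIMD idea in miniature (a pure-Python illustration, not how real SIMD hardware is programmed): one instruction, applied in lockstep to many data elements.

```python
def simd_add(xs: list[int], ys: list[int]) -> list[int]:
    """One instruction (add), applied to every element pair in lockstep."""
    return [x + y for x, y in zip(xs, ys)]

# Each "processor" holds one element of each operand and executes the
# same instruction on its own piece of data:
assert simd_add([1, 2, 3, 4], [10, 20, 30, 40]) == [11, 22, 33, 44]
```

On a real SIMD machine the four additions happen simultaneously on four processing elements; here the list comprehension only models the single shared instruction stream.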

Example of SIMD machine :

 MasPar MP

Use of Cache

Why is cache used on parallel computers?

The advances in memory technology aren’t keeping up with processor innovations.

Memory isn’t speeding up as fast as the processors.

One way to alleviate the performance gap between main memory and the processors is to have local cache.

The cache memory can be accessed faster than the main memory.

Cache keeps up with the fast processors, and keeps them busy with data.

[Diagram: processors 1-3, each with its own cache memory, connected through a network to a shared memory]

Cache Coherence

What is cache coherence? Keeping a data element found in several caches current with the other copies and with the value in main memory.

Various cache coherence protocols are used:
• snoopy protocol
• directory-based protocol
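A toy model of the snoopy (write-invalidate) idea, assuming write-through caches for simplicity; all class and variable names here are illustrative, not from any real protocol specification:

```python
class Bus:
    """Shared bus: carries writes and lets every cache snoop them."""
    def __init__(self):
        self.caches = []
        self.memory = {}

class SnoopyCache:
    def __init__(self, bus: Bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from main memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Broadcast on the bus: every other cache snoops the write and
        # invalidates its (now stale) copy of this address.
        for other in self.bus.caches:
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = value
        self.bus.memory[addr] = value        # write-through keeps memory current

bus = Bus()
bus.memory[0] = 1
a, b = SnoopyCache(bus), SnoopyCache(bus)
assert a.read(0) == 1 and b.read(0) == 1
a.write(0, 2)          # b's stale copy is invalidated
assert b.read(0) == 2  # b misses and re-fetches the current value
```

A directory-based protocol achieves the same end without broadcasting: a directory tracks which caches hold each line and notifies only those, which scales better to many processors.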

Various Other Issues

• Data Locality Issue
• Distributed Memory Issue
• Shared Memory Issue