Computer Architecture & Related Topics
Ben Schrooten, Shawn Borchardt, Eddie Willett, Vandana Chopra
Presentation Topics
• Computer Architecture History
• Single CPU Design
• GPU Design (Brief)
• Memory Architecture
• Communications Architecture
• Dual Processor Design
• Parallel & Supercomputing Design
Part 1: History and Single CPU
Ben Schrooten
HISTORY!!!
One of the first computing devices to come about was... the ABACUS!
The ENIAC: 1946
• Completed: 1946
• Programmed: plug board and switches
• Speed: 5,000 operations per second
• Input/output: cards, lights, switches, plugs
• Floor space: 1,000 square feet
The EDSAC(1949) and The UNIVAC I(1951)
EDSAC
• Speed: 714 operations per second
• Technology: vacuum tubes
• First practical stored-program computer
UNIVAC I
• Speed: 1,905 operations per second
• Input/output: magnetic tape, unityper, printer
• Memory type: delay lines, magnetic tape
• Technology: serial vacuum tubes, delay lines
• Floor space: 943 cubic feet
• Cost: F.O.B. factory $750,000 plus $185,000 for a high speed printer
Intel 4004: 1971
Progression of The Architecture
• Vacuum tubes: 1940 – 1950
• Transistors: 1950 – 1964
• Integrated circuits: 1964 – 1971
• Microprocessor chips: 1971 – present
Current CPU Architecture
•Basic CPU Overview
Single Bus Slow Performance
Example of Triple Bus Architecture
Motherboards / Chipsets / Sockets
OH MY!
Chipset in charge of:
• Memory Controller
• EIDE Controller
• PCI Bridge
• Real Time Clock
• DMA Controller
• IrDA Controller
• Keyboard
• Mouse
• Secondary Cache
• Low-Power CMOS SRAM
Sockets
• Socket 4 & 5
• Socket 7
• Socket 8
• Slot 1
• Slot A
GPUs
• Allow for real-time rendering of graphics on a small PC
• GPUs are true processing units
• Pentium 4 contains 42 million transistors on a 0.18 micron process
• GeForce3 contains 57 million transistors on a 0.15 micron manufacturing process
More GPU
Sources
• DX4100 picture: Oneironaut, http://oneironaut.tripod.com/dx4100.jpg
• Computer architecture overview picture: http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf
• Pictures of CPU overview, single bus architecture, triple bus architecture: Roy M. Wnek, Virginia Tech, CS5515 Lecture 5, http://www.nvc.cs.vt.edu/~wnek/cs5515/slide/Grad_Arch_5.PDF
• Historical data and pictures: The Computer Museum History Center, http://www.computerhistory.org/
• Intel motherboard diagram / Pentium 4 picture: Intel Corporation, http://www.intel.com
• The abacus: Abacus-Online-Museum, http://www.hh.schule.de/metalltechnik didaktik/users/luetjens/abakus/china/china.htm
• Information also from Clint Fleri, http://www.geocities.com/cfleri/
• Memory functionality: Dana Angluin, http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html
• Benchmark graphics: Digital Life, http://www.digit-life.com/articles/pentium4/index3.html
• Chipset and socket information: Motherboards.org, http://www.motherboards.org/articlesd/tech planations/17_2.html
• AMD processor pictures: Tom's Hardware, http://www6.tomshardware.com/search/search.html?category=all&words=Athlon
• GPU info: 4th Wave Inc., http://www.wave-report.com/tutorials/gpu.htm
• NV20 design pictures: Digital Life, http://www.digit-life.com/articles/nv20/
Main Memory
Memory Hierarchy
DRAM vs. SRAM
• DRAM is short for Dynamic Random Access Memory
• SRAM is short for Static Random Access Memory
DRAM is dynamic in that, unlike SRAM, it needs to have its storage cells refreshed, or given a new electronic charge, every few milliseconds. SRAM does not need refreshing because it operates on the principle of moving current that is switched in one of two directions rather than a storage cell that holds a charge in place.
Parity vs. Non-Parity
Parity is an error-detection scheme that was developed to notify the user of any data errors. A single extra bit is added to each byte of data; this bit is responsible for checking the integrity of the other 8 bits while the byte is moved or stored. Since memory errors are so rare, much of today’s memory is non-parity.
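The parity idea above can be sketched in a few lines of Python. This is only an illustration: it assumes even parity (the slide does not say which convention is used), and the function names are made up for the example.

```python
def parity_bit(byte):
    """Return the even-parity bit for an 8-bit value (assumed convention)."""
    # The parity bit makes the total number of 1 bits (data + parity) even.
    return bin(byte & 0xFF).count("1") % 2


def passes_parity(byte, stored_parity):
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


original = 0b10110010                 # four 1 bits
stored = parity_bit(original)         # parity bit saved with the byte
corrupted = original ^ 0b00000001     # a single bit flips in memory

ok_before = passes_parity(original, stored)
ok_after = passes_parity(corrupted, stored)
```

A single flipped bit changes the count of 1 bits from even to odd, so the check fails. Two flipped bits would cancel out, which is why parity detects errors but cannot correct them.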
SIMM vs. DIMM vs. RIMM?
• SIMM: Single In-line Memory Module
• DIMM: Dual In-line Memory Module
• RIMM: Rambus In-line Memory Module
72-pin SIMMs offer a 32-bit data path while DIMMs offer a 64-bit data path. SIMMs have to be used in pairs on Pentiums and more recent processors, because those processors use a 64-bit data bus. RIMM is one of the latest designs. Because of the fast data transfer rate of these modules, a heat spreader (an aluminum plate covering) is used for each module.
Evolution of Memory
Year        Type          Speed
1970        RAM / DRAM    4.77 MHz
1987        FPM           20 MHz
1995        EDO           20 MHz
1997        PC66 SDRAM    66 MHz
1998        PC100 SDRAM   100 MHz
1999        RDRAM         800 MHz
1999/2000   PC133 SDRAM   133 MHz
2000        DDR SDRAM     266 MHz
2001        EDRAM         450 MHz
• FPM (Fast Page Mode DRAM): traditional DRAM
• EDO (Extended Data Output): speeds up the read cycle between memory and the CPU
• SDRAM (Synchronous DRAM): synchronizes itself with the CPU bus and runs at higher clock speeds
• RDRAM (Rambus DRAM): DRAM with a very high bandwidth (1.6 GBps)
• EDRAM (Enhanced DRAM): DRAM (dynamic or power-refreshed RAM) that includes a small amount of static RAM (SRAM) inside a larger amount of DRAM, so that many memory accesses will be to the faster SRAM. EDRAM is sometimes used as L1 and L2 memory and, together with Enhanced Synchronous DRAM, is known as cached DRAM.
Read Operation
• On a read, the CPU first tries to find the data in the cache. If it is not there, the cache is updated from main memory and then returns the data to the CPU.
Write Operation
• On a write, the CPU writes the information into both the cache and main memory (a write-through policy).
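The read and write behavior described above can be sketched as a toy write-through cache. This is a minimal model for illustration only: the class name, the dictionary-based "memory", and the hit/miss counters are assumptions, and real caches work on fixed-size lines with eviction, which is omitted here.

```python
class WriteThroughCache:
    """Toy cache in front of a dict that stands in for main memory."""

    def __init__(self, memory):
        self.memory = memory   # address -> value
        self.lines = {}        # cached copies, address -> value
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.lines:
            self.hits += 1
        else:
            # Miss: update the cache from main memory first.
            self.misses += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-through: update the cache and main memory together.
        self.lines[addr] = value
        self.memory[addr] = value


main_memory = {0x10: 7, 0x14: 9}
cache = WriteThroughCache(main_memory)
first = cache.read(0x10)    # miss, filled from main memory
second = cache.read(0x10)   # hit
cache.write(0x14, 42)       # lands in both cache and main memory
```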
References
http://www-ece.ucsd.edu/~weathers/ece30/downloads/Ch7_memory(4x).pdf
http://home.cfl.rr.com/bjp/eric/ComputerMemory.html
http://aggregate.org/EE380/JEL/ch1.pdf
Defining a Bus
A parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other.
VESA -
Video Electronics Standards Association
• 32-bit bus
• Found mostly on 486 machines
• Relied on the 486 processor to function
• People started to switch to the PCI bus because of this
• Otherwise known as VLB (VESA Local Bus)
ISA -
Industry Standard Architecture
• Very old technology
• Bus speed of 8 MHz
• Speed of 42.4 Mb/s maximum
• Very few ISA ports are found in modern machines
MCA -
Micro Channel Architecture
• IBM’s attempt to compete with the ISA bus
• 32-bit bus
• Automatically configured cards (like Plug and Play)
• Not compatible with ISA
EISA -
Extended Industry Standard Architecture
• Attempt to compete with IBM’s MCA bus
• Ran on an 8.33 MHz cycle rate
• 32-bit slots
• Backward compatible with ISA
• Went the way of MCA
PCI –
Peripheral Component Interconnect
• Speeds up to 960 Mb/s
• Bus speed of 33 MHz
• 32-bit architecture
• Developed by Intel in 1993
• Synchronous or asynchronous
• PCI popularized Plug and Play
• Runs at half of the system bus speed
PCI – X
• Up to 133 MHz bus speed
• 64-bit bus width
• 1 GB/sec throughput
• Backwards compatible with all PCI
• Primarily developed for the increased I/O demands of technologies such as Fibre Channel, Gigabit Ethernet and Ultra3 SCSI
AGP –
Accelerated Graphics Port
• Essentially a high-speed PCI port
• Capable of running at 4 times PCI bus speed (133 MHz)
• Used for high-speed 3D graphics cards
• Considered a port, not a bus
• Only two devices involved
• Is not expandable
Bus          Width (bits)   Bus Speed (MHz)   Bus Bandwidth (MBytes/sec)
8-bit ISA    8              8.3               7.9
16-bit ISA   16             8.3               15.9
EISA         32             8.3               31.8
VLB          32             33                127.2
PCI          32             33                127.2
AGP          32             66                254.3
AGP (X2)     32             66 X 2            508.6
AGP (X4)     32             66 X 4            1017.3
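The bandwidth column in the bus table can be reproduced as width in bytes times clock rate times transfers per clock. The sketch below is an assumption about how the figures were derived: it takes the clocks as 8.33, 33.33 and 66.67 MHz (the table rounds them to 8.3, 33 and 66) and reports megabytes as 2^20 bytes, which is what makes the numbers match. The function name is illustrative.

```python
def bus_bandwidth_mbytes(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in MBytes/sec, counting 1 MByte as 2**20 bytes."""
    bytes_per_transfer = width_bits / 8
    bytes_per_second = bytes_per_transfer * clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_second / 2**20


# (name, width in bits, clock in MHz, transfers per clock)
buses = [
    ("8-bit ISA", 8, 25 / 3, 1),
    ("16-bit ISA", 16, 25 / 3, 1),
    ("EISA", 32, 25 / 3, 1),
    ("PCI", 32, 100 / 3, 1),
    ("AGP", 32, 200 / 3, 1),
    ("AGP (X2)", 32, 200 / 3, 2),
    ("AGP (X4)", 32, 200 / 3, 4),
]

for name, width, clock, transfers in buses:
    print(f"{name:<11} {bus_bandwidth_mbytes(width, clock, transfers):7.1f} MBytes/sec")
```

These are theoretical peak figures; sustained throughput on a real bus is lower because of arbitration and protocol overhead.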
IDE -
Integrated Drive Electronics
• Tons of other names: ATA, ATA/ATAPI, EIDE, ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA
• Good performance at a cheap cost
• Most widely used interface for hard disks
SCSI -
Small Computer System Interface
• Pronounced “skuzzy”
• Capable of handling internal/external peripherals
• Speed anywhere from 80 – 640 Mb/s
• Many types of SCSI
Type               Bus Speed (MBytes/sec, max.)   Bus Width (bits)   Max. Device Support
SCSI-1             5                              8                  8
Fast SCSI          10                             8                  8
Fast Wide SCSI     20                             16                 16
Ultra SCSI         20                             8                  8
Ultra Wide SCSI    40                             16                 16
Ultra2 SCSI        40                             8                  8
Wide Ultra2 SCSI   80                             16                 16
Ultra3 SCSI        160                            16                 16
Ultra320 SCSI      320                            16                 16
Serial Port
• Uses a DB9 or DB25 connector
• Adheres to the RS-232C spec
• Capable of speeds up to 115 kb/sec
USB
1.0
• Hot plug-and-play
• Full-speed USB devices signal at 12 Mb/s
• Low-speed devices use a 1.5 Mb/s subchannel
• Up to 127 devices chained together
2.0
• Data rate of 480 megabits per second
USB On-The-Go
• For portable devices
• Limited host capability to communicate with selected other USB peripherals
• A small USB connector to fit the mobile form factor
Firewire
Also known as IEEE 1394 and i.LINK
• High-speed serial port
• 400 Mbps transfer rate
• 30 times faster than USB 1.0
• Hot plug-and-play
PS/2 Port
• Mini-DIN plug with 6 pins
• Mouse port and keyboard port
• Developed by IBM
Parallel port, i.e. “printer port”
• Old type
• Two “new” types: ECP (extended capabilities port) and EPP (enhanced parallel port)
• The new types are ten times faster than the old parallel port
• Capable of bi-directional communication
Game Port
• Uses a DB15 port
• Used for joystick connection to the computer
Need for High Performance Computing
• There’s a need for tremendous computational capabilities in science, engineering, and business
• There are applications that require gigabytes of memory and gigaflops of performance
What is a High Performance Computer?
Definition of a High Performance Computer: an HPC computer can solve large problems in a reasonable amount of time.
Characteristics:
• Fast computation
• Large memory
• High-speed interconnect
• High-speed input/output
How is an HPC computer made to go fast?
• Make the sequential computation faster
• Do more things in parallel
Applications
1> Weather Prediction
2> Aircraft and Automobile Design
3> Artificial Intelligence
4> Entertainment Industry
5> Military Applications
6> Financial Analysis
7> Seismic Exploration
8> Automobile Crash Testing
Who Makes High Performance Computers
* SGI/Cray: Power Challenge Array, Origin-2000, T3D/T3E
* HP/Convex: SPP-1200, SPP-2000
* IBM: SP2
* Tandem
Trends in Computer Design
• Performance of the fastest computer has grown exponentially from 1945 to the present, averaging a factor of 10 every five years
• The growth flattened somewhat in the 1980s but is accelerating again as massively parallel computers become available
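To put the "factor of 10 every five years" figure in perspective, here is a small back-of-the-envelope calculation. It assumes perfectly smooth exponential growth, which the slide itself says did not hold in the 1980s, and the function name is illustrative.

```python
def growth_factor(years, factor=10.0, period_years=5.0):
    """Total speedup after `years`, assuming `factor` growth every `period_years`."""
    return factor ** (years / period_years)


per_year = growth_factor(1)      # about 1.58, i.e. roughly 58% faster each year
per_decade = growth_factor(10)   # 100x
```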
Real World Sequential Processes
Sequential processes we find in the world.
The passage of time is a classic example of a sequential process.
Day breaks as the sun rises in the morning.
Daytime has its sunlight and bright sky.
Dusk sees the sun setting on the horizon.
Nighttime descends with its moonlight, dark sky and stars.
Parallel Processes
Music: an orchestra performance, where every instrument plays its own part, and playing together they make beautiful music.
Parallel Features of Computers
Various methods available on computers for doing work in parallel are:
• Computing environment
• Operating system
• Memory
• Disk
• Arithmetic
Computing Environment - Parallel Features
• Using a timesharing environment
The computer's resources are shared among many users who are logged in simultaneously.
Your process uses the CPU for a time slice, and then is rolled out while another user’s process is allowed to compute.
The opposite of this is to use dedicated mode, where yours is the only job running.
• The computer overlaps computation and I/O
While one process is writing to disk, the computer lets another process do some computation.
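The time-slice behavior described above can be illustrated with a tiny round-robin simulation. This is a sketch of the general idea, not of any particular operating system's scheduler; the job names, time units, and function name are made up.

```python
from collections import deque


def round_robin(jobs, time_slice):
    """Simulate timesharing: return the (job, time used) order of CPU turns.

    `jobs` maps a job name to the total CPU time it needs.
    """
    queue = deque(jobs.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        used = min(time_slice, remaining)
        schedule.append((name, used))
        if remaining > used:
            # Slice expired: roll the job out and put it at the back of the line.
            queue.append((name, remaining - used))
    return schedule


turns = round_robin({"alice": 3, "bob": 5}, time_slice=2)
```

In dedicated mode the schedule would contain a single job running start to finish, with no rolling in and out.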
Operating System - Parallel Features
• Using the UNIX background processing facility
    a.out > results &
    man etime
• Using the UNIX cron jobs feature
You submit a job that will run at a later time.
Then you can play tennis while the computer continues to work.
This overlaps your computer work with your personal time.
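The same background-job idea can be sketched from Python with the standard subprocess module. This is a rough analogue of the shell's `&`, not a replacement for it; the child command is a stand-in for a long-running `a.out`, and the example assumes a Python interpreter is reachable through sys.executable.

```python
import subprocess
import sys

# Start a child process without waiting for it, much like `a.out > results &`.
child = subprocess.Popen(
    [sys.executable, "-c", "print(sum(range(1000)))"],
    stdout=subprocess.PIPE,
    text=True,
)

# The parent keeps working while the child runs in the background.
foreground_result = sum(range(10))

# Collect the child's output only when it is actually needed.
background_output, _ = child.communicate()
```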
Memory - Parallel Features
Memory Interleaving
Memory is divided into multiple banks, and consecutive data elements are interleaved among them.
There are multiple ports to memory. When the data elements that are spread across the banks are needed, they can be accessed and fetched in parallel.
Memory interleaving increases the memory bandwidth.
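One common way to interleave is to use the low-order part of the address to select the bank. The sketch below assumes that simple modulo scheme and word-granularity addresses; the function name is illustrative and real hardware may map addresses differently.

```python
def locate(address, n_banks):
    """Map a word address to (bank, offset within bank) under modulo interleaving."""
    return address % n_banks, address // n_banks


# Eight consecutive addresses spread over four banks.
banks = [locate(addr, 4)[0] for addr in range(8)]
```

Because consecutive addresses fall in different banks, a sequential sweep through an array keeps all banks busy at once instead of queuing on one.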
Memory - Parallel Features(Cont)
Multiple levels of the memory hierarchy:
• Global memory which any processor can access
• Memory local to a partition of the processors
• Memory local to a single processor: cache memory, and memory elements held in registers
Disk - Parallel Features
RAID disk: Redundant Array of Inexpensive Disks
Striped disk: when a dataset is written to disk, it is broken into pieces which are written simultaneously to different disks in a RAID disk system.
When the same dataset is read back in, the pieces of the dataset are read in parallel, and the original dataset is reassembled in memory.
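Striping as described above can be sketched with in-memory lists standing in for disks. This only shows how pieces are dealt out round-robin and put back together; it is sequential, has no redundancy or parity, and the chunk size and function names are assumptions for the example.

```python
def stripe(data, n_disks, chunk_size):
    """Break `data` into chunks and deal them round-robin across the disks."""
    disks = [[] for _ in range(n_disks)]
    for start in range(0, len(data), chunk_size):
        disk_index = (start // chunk_size) % n_disks
        disks[disk_index].append(data[start:start + chunk_size])
    return disks


def reassemble(disks):
    """Rebuild the original data by taking one chunk from each disk in turn."""
    pieces = []
    rounds = max(len(disk) for disk in disks)
    for r in range(rounds):
        for disk in disks:
            if r < len(disk):
                pieces.append(disk[r])
    return b"".join(pieces)


dataset = b"ABCDEFGHIJ"
disks = stripe(dataset, n_disks=3, chunk_size=2)
restored = reassemble(disks)
```

On real hardware the chunks in each round go to different drives at the same time, which is where the bandwidth gain comes from.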
Arithmetic - Parallel Features
We will examine the following features that lend themselves to parallel arithmetic:
• Multiple functional units
• Superscalar arithmetic
• Instruction pipelining
Parallel Machine Model (Architectures)
von Neumann Computer
MultiComputer
A multicomputer comprises a number of von Neumann computers, or nodes, linked by an interconnection network. In an idealized network, the cost of sending a message between two nodes is independent of both node location and other network traffic, but does depend on message length.
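The idealized cost model above can be written as a function of message length only. The linear form (a fixed startup cost plus a per-word cost) and the constants used here are assumptions for illustration; the slide states only that the cost depends on length and not on location or traffic.

```python
def message_cost(src, dst, length_words, startup=100.0, per_word=2.0):
    """Idealized multicomputer: cost depends on message length only.

    `src` and `dst` are accepted but deliberately ignored, to show that
    node location plays no part in the model. The constants are hypothetical.
    """
    return startup + per_word * length_words


near = message_cost(0, 1, 50)
far = message_cost(0, 63, 50)
longer = message_cost(0, 1, 500)
```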
• Locality
• Scalability
• Concurrency
Distributed Memory (MIMD)
MIMD means that each processor can execute a separate stream of instructions on its own local data. Distributed memory means that memory is distributed among the processors rather than placed in a central location.
Difference between the multicomputer and distributed-memory MIMD: in a distributed-memory MIMD machine, the cost of sending a message between two nodes is not independent of node location and other network traffic.
Examples of MIMD machine
MultiProcessor or Shared Memory MIMD
All processors share access to a common memory via bus or hierarchy of buses
Example for Shared Memory MIMD
Silicon Graphics Challenge
SIMD Machines
All processors execute the same instruction stream on a different piece of data
Example of SIMD machine:
MasPar MP
Use of Cache
Why is cache used on parallel computers?
The advances in memory technology aren’t keeping up with processor innovations.
Memory isn’t speeding up as fast as the processors.
One way to alleviate the performance gap between main memory and the processors is to have local cache.
The cache memory can be accessed faster than the main memory.
Cache keeps up with the fast processors, and keeps them busy with data.
[Figure: processors 1, 2 and 3, each with its own cache memory, connected through a network to shared memory]
Cache Coherence
What is cache coherence? It keeps the copies of a data element found in several caches current with each other and with the value in main memory.
Various cache coherence protocols are used:
• Snoopy protocol
• Directory-based protocol
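The snoopy idea can be illustrated with a toy write-invalidate model: every cache watches writes on a shared bus and drops its own stale copy. This is a deliberately simplified sketch that assumes write-through caches and one value per address; real protocols such as MESI track per-line states, and the class and method names here are invented for the example.

```python
class SnoopyBus:
    """Toy shared bus with write-invalidate snooping over write-through caches."""

    def __init__(self, n_caches, memory):
        self.memory = dict(memory)                      # main memory
        self.caches = [dict() for _ in range(n_caches)]  # one cache per processor

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:
            # Miss: fetch the current value from main memory.
            cache[addr] = self.memory[addr]
        return cache[addr]

    def write(self, cpu, addr, value):
        # Every other cache "snoops" the write and invalidates its copy.
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.pop(addr, None)
        self.caches[cpu][addr] = value
        self.memory[addr] = value


bus = SnoopyBus(n_caches=2, memory={0: 1})
before = (bus.read(0, 0), bus.read(1, 0))   # both caches now hold address 0
bus.write(0, 0, 9)                          # processor 0 updates the value
stale_copy_gone = 0 not in bus.caches[1]
after = bus.read(1, 0)                      # processor 1 misses and refetches
```

A directory-based protocol reaches the same result without broadcasting: a central directory records which caches hold each line and notifies only those.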
Various Other Issues
• Data Locality Issue
• Distributed Memory Issue
• Shared Memory Issue