Modern Computer Architecture


COMPUTER ARCHITECTURE
Assoc. Prof. Stasys Maciulevičius
Computer Dept.
[email protected]
von Neumann architecture
The term von Neumann architecture derives from a computer architecture proposal by the mathematician and early computer scientist John von Neumann and others (1945), entitled "First Draft of a Report on the EDVAC".
It describes a design for an electronic digital computer made up of: a processing unit consisting of an arithmetic logic unit and processor registers; a control unit containing an instruction register and a program counter; a memory that stores both data and instructions; external mass storage; and input and output mechanisms.
2009-2014
©S.Maciulevičius
2
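To make the stored-program idea concrete, here is a minimal sketch of a toy von Neumann machine. The instruction set (LOAD, ADD, STORE, HALT), the tuple encoding and the single accumulator register are hypothetical, chosen only for illustration; what the sketch does show is that one memory holds both the program and its data, and that the control logic repeats a fetch-decode-execute cycle using a program counter and an instruction register.

```python
from typing import Any, List

def run(memory: List[Any]) -> List[Any]:
    """Execute a toy stored-program machine until HALT; return the final memory."""
    pc = 0   # program counter: address of the next instruction
    acc = 0  # accumulator: the only processor register in this toy machine
    while True:
        ir = memory[pc]        # fetch: copy the instruction into the instruction register
        pc += 1
        opcode, address = ir   # decode: operation code and operand address
        if opcode == "LOAD":   # execute
            acc = memory[address]
        elif opcode == "ADD":
            acc += memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r} at address {pc - 1}")

# Addresses 0-3 hold instructions, addresses 4-6 hold data, all in the same memory.
program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
print(run(program)[6])  # prints 5
```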
von Neumann architecture
The structure of such a computer looks like this:
[Figure: arithmetic logic unit, control unit, memory and input/output, all connected by a communication unit (bus)]
Processor
Today the arithmetic logic unit and the control unit (sometimes called the data processor and the instruction processor, respectively) are integrated into one unit – the central processing unit (CPU).
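[Figure: PROCESSOR. The control unit receives instructions (from memory) and information about the operation flow (from the arithmetic logic unit), and sends control signals to the arithmetic logic unit; the arithmetic logic unit receives data (from memory) and sends results (to memory)]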
Processor
• The control unit fetches instructions from memory, analyzes them and controls the operations in the functional unit;
• The arithmetic logic unit executes operations according to the current instruction;
• These two units work together: the control unit generates control signals according to the operation code, and the arithmetic logic unit sends condition signals back to the control unit, informing it about the running operation; these signals may affect the generation of subsequent control signals (e.g., the sign of an operand, the value of some bit, etc.)
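The feedback loop described above, in which condition signals from the ALU influence which control signals are generated next, can be sketched as follows. This is a hypothetical illustration: `alu` returns a result together with sign and zero flags, and the control loop uses the sign flag to decide whether a conditional jump (JNEG, an invented mnemonic) is taken.

```python
from typing import Dict, List, Tuple

Flags = Dict[str, bool]

def alu(op: str, a: int, b: int) -> Tuple[int, Flags]:
    """Perform one operation and report condition signals (sign, zero) about the result."""
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    else:
        raise ValueError(f"unsupported ALU operation {op!r}")
    return result, {"sign": result < 0, "zero": result == 0}

def control_unit(program: List[Tuple[str, int]], x: int) -> int:
    """Toy control unit: selects ALU operations from opcodes and lets the
    ALU's sign flag decide whether the conditional jump JNEG is taken."""
    acc, pc = x, 0
    flags: Flags = {"sign": False, "zero": False}
    while True:
        opcode, operand = program[pc]  # fetch and decode
        pc += 1
        if opcode in ("ADD", "SUB"):
            acc, flags = alu(opcode, acc, operand)  # control signals select the operation
        elif opcode == "NEG":
            acc, flags = alu("SUB", 0, acc)         # negate as 0 - acc
        elif opcode == "JNEG":
            if flags["sign"]:                       # condition signal fed back from the ALU
                pc = operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

# |x - 10|: subtract 10, then negate only if the ALU reported a negative result.
ABS_DIFF = [("SUB", 10), ("JNEG", 3), ("HALT", 0), ("NEG", 0), ("HALT", 0)]
print(control_unit(ABS_DIFF, 4), control_unit(ABS_DIFF, 15))  # prints 6 5
```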
Functional unit of processor
[Figure: Functional unit. Internal memory (registers, cache) exchanges operands and results with the operation-performing circuits; the circuits receive data (from memory) and control signals (from the control unit), and send results (to memory) and information about the running operation (to the control unit)]
Functional unit of processor
If we look inside the functional unit, its elements can be divided into two groups:
• Internal memory, which holds the data to be processed (operands); it consists of registers, separate flip-flops, cache memory and, in some cases, a stack;
• Circuits performing the operations: they carry out all actions needed to process the information (addition, logic operations, shifts, etc.).
Processor market
• Intel – the leader (about 80% market share in computers)
• AMD – the main competitor (about 19% market share)
• The remaining producers – about 1% of the market
• A new player in the mobile processor market: ARM
Intel processors
[Figure: family tree of early Intel processors: 4004, 8008, 8080, 8085, 8086, 8088, 80286, 80386, 80386SX, 80486, 80486SX, 80486DX2, 80486DX4]
Intel 4004
Intel 4004, the first microprocessor (November 1971):
• Designed for a calculator
• Data word - 4 bits
• 16 registers (4 bits each)
• Instruction length - 8 bits
• Number of instructions - 46
• Separate memories: 1 KB for data, 4 KB for program
• PC length - 12 bits
• 4-level stack for subroutine calls
• Frequency - 108 kHz
• 2300 transistors (fabrication process - 10 µm)
Intel 8086 (1978)
• Software compatible with the Intel 8080, with a similar register set
• Data word - 16 bits
• Instruction prefetch buffer length - 6 bytes
• Four 16-bit general-purpose registers
• Four 16-bit address registers
• Segment registers
• Addressable memory - 1 MB
• 29,000 transistors (fabrication process - 3 µm)
• Frequency - 4.77 MHz, price - $360
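With only 16-bit registers, the 8086 reaches its 1 MB (2^20 bytes) address space through the segment registers: the physical address is the 16-bit segment value shifted left by 4 bits (multiplied by 16) plus the 16-bit offset, kept to 20 bits. A minimal sketch of that calculation (the function name is ours):

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, truncated to 20 address lines."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(physical_address(0x1234, 0x0010)))  # prints 0x12350
print(hex(physical_address(0xFFFF, 0x0010)))  # prints 0x0 (wraps around at 1 MB)
```

Because segments overlap every 16 bytes, different segment:offset pairs can name the same byte, e.g. 1000:0235 and 1023:0005 both give 0x10235.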
Intel 8086
[Figure: Intel 8086 block diagram: address/status buffer (A19-A16, multiplexed with status signals), address/data buffer (AD15-AD0), address adder; segment registers CS, SS, DS, ES; instruction pointer IP; instruction queue; general registers AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL); SP, BP, SI, DI; ALU; flags register F; control and synchronisation unit]
8088:
• 8-bit data bus
• 4-byte instruction queue
Intel 80386 (1985)
• Extended addressing capability, adding an index multiplier: base register + index register × multiplier (1, 2, 4 or 8) + displacement (8- or 32-bit constant)
• Added memory management unit (MMU) and privilege levels (using protection rings)
• Addressable memory - 4 GB
• Virtual memory - 64 TB
• Transistor count - 275,000 (1.5 µm)
• Frequency - 16 MHz, price - $299
• 80386SX - with a 16-bit data bus:
  - addressable memory - 16 MB
  - virtual memory - 256 GB
• 80386SL (1990) - the first microprocessor for notebooks:
  - addressable memory - 4 GB, virtual memory - 64 TB
  - transistor count - 855,000 (1 µm), frequency - 20 MHz
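The 80386 addressing formula (base register + index register × multiplier + displacement) can be checked with a small sketch. The function name and register values are hypothetical, and the result is reduced modulo 2^32 on the assumption that 32-bit offset arithmetic wraps around.

```python
def effective_address(base: int, index: int, scale: int, displacement: int) -> int:
    """Effective address = base + index * scale + displacement, kept to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("the 80386 index multiplier must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF

# Element 5 of an array of 4-byte integers that starts 8 bytes past the address in EBX.
ebx, esi = 0x00400000, 5
print(hex(effective_address(ebx, esi, 4, 8)))  # prints 0x40001c
```

In Intel assembly syntax this corresponds to an operand such as `[ebx + esi*4 + 8]`; the scaled index lets array code step through 2-, 4- or 8-byte elements without a separate multiply.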
Intel 80486 (1989)
• Has an instruction pipeline
• Internal 8 KB cache shared by data and instructions
• Integrated FPU
• Addressable memory - 4 GB, virtual memory - 64 TB
• Transistor count - 1.2 million (1 µm; the 50 MHz version - 0.8 µm)
• Frequency - 25 MHz, price - $900
• Along with the basic variant (DX), the 80486DX2 (with frequency doubling) and the 80486DX4 (with frequency tripling) were developed
• 80486SX (1991) - without FPU
• 80486SL (1992) - for notebooks
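Frequency doubling and tripling mean that the core runs at a multiple of the external bus clock while the motherboard bus stays at the lower frequency. A minimal sketch; the 25 MHz bus value is an illustrative assumption, not a figure from the slide.

```python
# Core clock = bus clock x multiplier; the DX4 triples the clock despite its name.
MULTIPLIERS = {"80486DX": 1, "80486DX2": 2, "80486DX4": 3}

def core_frequency_mhz(model: str, bus_mhz: float) -> float:
    """Internal (core) frequency for a given external bus frequency."""
    return bus_mhz * MULTIPLIERS[model]

for model in MULTIPLIERS:
    print(model, core_frequency_mhz(model, 25.0), "MHz")
```

Only the on-chip work speeds up this way; every access that misses the internal cache still proceeds at the bus clock, which is one reason the internal cache matters more in the multiplied variants.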
Intel processors (2)
• P5: Pentium, Pentium MMX
• P6: Pentium Pro, Pentium II, Pentium III
• P7: Pentium 4
• P8: Core, Core Duo, Core ix
• IA-64: Itanium, Itanium 2
Intel Pentium (1993)
• The first superscalar x86 processor (with dual integer pipelines and a faster FPU)
• 5-stage pipeline
• Branch prediction
• Separate 8 KB instruction and data caches
• 64-bit external data bus
• Addressable memory - 4 GB
• Virtual memory - 64 TB
• 3.1 million transistors
• Fabricated in a 0.8 µm process
• Frequency - 60 MHz, price - $878
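With dual integer pipelines (known as the U and V pipes), two consecutive instructions can issue in the same clock only if they do not conflict. The sketch below captures just the core idea under a deliberately simplified model: each instruction lists the registers it reads and writes, flags are ignored, and a pair is rejected if the second instruction reads or writes a register that the first one writes. The real Pentium pairing rules are more detailed than this.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

@dataclass(frozen=True)
class Instr:
    text: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]
    simple: bool = True  # assumption: only "simple" instructions may pair

def ins(text: str, reads: Set[str], writes: Set[str]) -> Instr:
    return Instr(text, frozenset(reads), frozenset(writes))

def can_pair(first: Instr, second: Instr) -> bool:
    """Simplified check: both simple, and no RAW or WAW conflict on registers."""
    if not (first.simple and second.simple):
        return False
    return not (first.writes & (second.reads | second.writes))

def schedule(instrs: List[Instr]) -> List[Tuple[str, ...]]:
    """Group an in-order stream into issue slots of one or two instructions per clock."""
    slots: List[Tuple[str, ...]] = []
    i = 0
    while i < len(instrs):
        if i + 1 < len(instrs) and can_pair(instrs[i], instrs[i + 1]):
            slots.append((instrs[i].text, instrs[i + 1].text))
            i += 2
        else:
            slots.append((instrs[i].text,))
            i += 1
    return slots

code = [
    ins("add eax, 1", {"eax"}, {"eax"}),
    ins("add ebx, 2", {"ebx"}, {"ebx"}),
    ins("add ecx, eax", {"ecx", "eax"}, {"ecx"}),
    ins("add edx, ecx", {"edx", "ecx"}, {"edx"}),
]
for slot in schedule(code):
    print(" | ".join(slot))
```

Four instructions take three issue clocks here: the first two are independent and pair, while the last one needs the result of the third and must wait.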
Pentium 4
• 7th-generation processor (P7)
• NetBurst microarchitecture
• Designed for high clock frequency (1.4-1.5 times higher than in other processors)
• Reaching such frequencies required a much longer pipeline, which made the units more complex and therefore increased power consumption
• A new variant of this processor, Prescott (P4-E), supports 64-bit integer operations and EM64T addressing
Intel Pentium pipelines
AMD K5 (5k86)
• The K5 was AMD's first x86 processor developed entirely in-house; it was introduced in March 1996
• Its primary competitor was Intel's Pentium
• Its branch target buffer was four times the size of the Pentium's, and register renaming improved the parallel performance of the pipelines
• It has a 16 KB instruction cache, double that of the Pentium
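Register renaming removes false dependences (write-after-read and write-after-write) by giving every new result its own physical register, so later instructions no longer have to wait for unrelated earlier ones. A minimal sketch, assuming a simple map table and an unlimited pool of physical registers named p0, p1, ...; this models the principle, not the K5's actual renaming hardware.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str]]  # (destination register, source registers)

def rename(instrs: List[Instr], arch_regs: List[str]) -> List[Instr]:
    """Map architectural registers to physical ones; every result gets a fresh register."""
    mapping: Dict[str, str] = {r: f"p{i}" for i, r in enumerate(arch_regs)}
    next_free = len(arch_regs)
    renamed: List[Instr] = []
    for dest, srcs in instrs:
        phys_srcs = [mapping[s] for s in srcs]  # sources use the mapping before this write
        mapping[dest] = f"p{next_free}"         # fresh physical register for the result
        next_free += 1
        renamed.append((mapping[dest], phys_srcs))
    return renamed

# eax = ebx + ecx; ebx = eax + 1 (WAR on ebx); eax = ecx + ecx (WAW on eax); constants omitted
code = [("eax", ["ebx", "ecx"]), ("ebx", ["eax"]), ("eax", ["ecx", "ecx"])]
for dest, srcs in rename(code, ["eax", "ebx", "ecx"]):
    print(dest, "<-", ", ".join(srcs))
```

After renaming, the third instruction writes p5 and reads only p2, so it can execute in parallel with the first two instead of waiting behind them.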
AMD K7 (Athlon, Duron)
The original Athlon was the first 7th-generation x86 processor, and it held its initial performance lead over Intel's competing processors for a significant period of time.
• Superpipelined superscalar processor
• Has x86-to-RISC86 decoders
• FPU with three out-of-order (o-o-o), superscalar, superpipelined units, executing all x87 (floating-point), MMX and 3DNow! instructions
• Three o-o-o superscalar superpipelined integer ALUs
• Three o-o-o superscalar superpipelined address generation units
• Larger L1 caches (64 KB + 64 KB)
• 200 MHz system bus
• Enhanced dynamic branch prediction
• 37 million transistors
AMD K7 microarchitecture
AMD K7 pipelines
[Figure: AMD K7 pipelines. Stages 1-6, common to all instructions: Fetch, Scan, Align1, Align2, EDec, IDec. Integer and load pipeline, stages 7-10: Sched, Ex, Addr, DC. Floating-point pipeline, stages 7-15: Stack, Name, WSch, Sched, FReg, FX0, FX1, FX2, FX3]
In case of a branch misprediction, 10 clocks are lost.
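The 10-clock misprediction penalty of the K7 pipeline can be turned into a rough cost estimate with the usual relation CPI = base CPI + branch fraction × misprediction rate × penalty. The base CPI, branch fraction and misprediction rates below are hypothetical values chosen for illustration, not measured K7 figures.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_clocks: int) -> float:
    """Average clocks per instruction including branch misprediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_clocks

PENALTY = 10  # clocks lost per misprediction in the K7 pipeline
for rate in (0.10, 0.05):  # hypothetical misprediction rates
    cpi = effective_cpi(base_cpi=0.5, branch_fraction=0.2,
                        mispredict_rate=rate, penalty_clocks=PENALTY)
    print(f"mispredict rate {rate:.0%}: CPI = {cpi:.2f}")
```

Under these assumptions, halving the misprediction rate lowers the CPI from 0.70 to 0.60, which illustrates why deep pipelines need good branch prediction.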
Transmeta Crusoe
• For mobile systems
• Original VLIW instruction set
• 1 FPU, 2 ALUs, 1 LSU, 1 BU
• 64 registers
• Decoding of x86 instructions
• Code Morphing Software
• Enhanced power management
• Models:
  - TM3200 (only a 96 KB L1 cache)
  - TM5400 (256 KB L1 cache, 256 KB L2 cache)
  - TM5600 (512 KB L1 cache, 512 KB L2 cache)
  - TM5800 (512 KB L1 cache, 512 KB L2 cache)
• In 2002 - TM6000
• In 2003 - TM8000 (Efficeon)
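Code Morphing Software translates x86 code into the native VLIW instruction set in software and reuses the result instead of re-decoding on every execution. The core idea, a translation cache consulted before translating, can be sketched as below. The block representation and `fake_translate` are hypothetical stand-ins; the real software also profiles and re-optimizes frequently executed code, which is not modeled here.

```python
from typing import Callable, Dict, List

class CodeMorpher:
    """Toy dynamic binary translator with a translation cache keyed by x86 block address."""

    def __init__(self, translate_block: Callable[[int], List[str]]):
        self.translate_block = translate_block
        self.cache: Dict[int, List[str]] = {}
        self.translations = 0  # how many times the (slow) translator actually ran

    def native_code_for(self, x86_address: int) -> List[str]:
        if x86_address not in self.cache:  # miss: translate the block once
            self.cache[x86_address] = self.translate_block(x86_address)
            self.translations += 1
        return self.cache[x86_address]     # hit: reuse the stored VLIW code

def fake_translate(address: int) -> List[str]:
    """Hypothetical translator: pretends every x86 block becomes two VLIW molecules."""
    return [f"molecule_{address:#x}_0", f"molecule_{address:#x}_1"]

cms = CodeMorpher(fake_translate)
for address in [0x1000, 0x1040, 0x1000, 0x1000, 0x1040]:  # a loop revisiting two blocks
    cms.native_code_for(address)
print(cms.translations)  # prints 2
```

The translation cost is paid once per block, so code that loops (the common case) runs mostly from already translated VLIW code.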
Transmeta Crusoe
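128-bit instruction bundles (molecules):
[Figure: one molecule holding four operations, FADD, ADD, LD and BRCC, issued respectively to the FPU (floating-point unit), the ALU (integer ALU), the LSU (load/store unit) and the BU (branch unit)]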
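A molecule is a fixed-width bundle whose operations (atoms) are issued together, each to its own functional unit. The sketch below packs four 32-bit atoms into one 128-bit molecule and unpacks them again, mapping slot positions to units as in the figure. The opcode numbers, the fixed slot order and the omission of operand fields are hypothetical simplifications; real Crusoe encodings are not given here.

```python
from typing import List, Tuple

ATOM_BITS = 32
SLOTS = ("FPU", "ALU", "LSU", "BU")  # assumed slot-to-unit order, following the figure
OPCODES = {"FADD": 0x01, "ADD": 0x02, "LD": 0x03, "BRCC": 0x04}  # hypothetical encodings
NAMES = {code: name for name, code in OPCODES.items()}

def pack_molecule(atoms: List[str]) -> int:
    """Place four atoms side by side in a 128-bit integer, slot 0 in the lowest bits."""
    if len(atoms) != len(SLOTS):
        raise ValueError("a 128-bit molecule holds exactly four 32-bit atoms")
    molecule = 0
    for slot, name in enumerate(atoms):
        molecule |= OPCODES[name] << (slot * ATOM_BITS)
    return molecule

def dispatch(molecule: int) -> List[Tuple[str, str]]:
    """Split a molecule back into atoms and pair each with the unit that executes it."""
    mask = (1 << ATOM_BITS) - 1
    return [(unit, NAMES[(molecule >> (slot * ATOM_BITS)) & mask])
            for slot, unit in enumerate(SLOTS)]

m = pack_molecule(["FADD", "ADD", "LD", "BRCC"])
print(m.bit_length() <= 128, dispatch(m))
```

Because the compiler (here, the Code Morphing Software) has already decided which operations may run together, the hardware needs no dependency-checking logic to issue them, which is part of what keeps a VLIW core simple and low-power.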
Transmeta Crusoe TM8000
PowerPC processors
• The prototype was the IBM 801 minicomputer, which had a RISC instruction set
• The superscalar RISC System/6000 was introduced in early 1990
• Its architecture soon became known as POWER (Performance Optimization with Enhanced RISC)
• IBM then formed an alliance with Motorola (maker of the 68000) and Apple, which used Motorola processors in its Macintosh PCs
• This is how the PowerPC architecture was born
PowerPC processors
• 601: the first implementation of the PowerPC architecture, released in 1992
• 603: 32-bit processor for low-end desktops and notebooks
• 604: 32-bit processor for desktops and low-end servers
• 620: the first 64-bit processor, for high-end servers
PowerPC processor 604e
• RISC instruction set
• For servers and workstations
• The core uses a 1.9 V power supply
• Dispatch unit:
  - issues up to 4 instructions per clock
  - the issue buffer holds 8 instructions
• Completion unit:
  - in one clock completes up to 4 instructions, plus 1 store and 1 branch instruction
PowerPC processor 604e
• Load/store unit:
  - hardware support for unaligned little-endian access
  - hardware-controlled parallel access to several registers (loads and stores)
  - out-of-order (o-o-o) loads and stores
• Three integer units (IU):
  - two single-cycle integer units (SCIU)
  - one multiple-cycle integer unit (MCIU)
• FPU with IEEE-754 standard support
• Branch prediction:
  - 512-entry branch history table
  - 64-entry branch target address cache
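The 604e's branch prediction pairs a 512-entry branch history table with a 64-entry branch target address cache: the table predicts the direction (taken or not), and the cache supplies the target address so fetching can continue without waiting for decode. A minimal sketch, assuming 2-bit saturating counters indexed by low-order address bits and a direct-mapped target cache; only the two entry counts come from the slide, the remaining details are illustrative.

```python
from typing import Dict, List, Optional, Tuple

class BranchPredictor:
    """Toy BHT + BTAC: 2-bit counters predict direction, the BTAC supplies the target."""

    def __init__(self, bht_entries: int = 512, btac_entries: int = 64):
        self.bht: List[int] = [1] * bht_entries  # 0-1: predict not taken, 2-3: predict taken
        self.btac: Dict[int, Tuple[int, int]] = {}  # index -> (branch address, target)
        self.btac_entries = btac_entries

    def _bht_index(self, pc: int) -> int:
        return (pc >> 2) % len(self.bht)  # 4-byte instructions: drop the two low bits

    def _btac_index(self, pc: int) -> int:
        return (pc >> 2) % self.btac_entries

    def predict(self, pc: int) -> Optional[int]:
        """Predicted target if the branch is predicted taken and its target is cached."""
        if self.bht[self._bht_index(pc)] < 2:
            return None  # predicted not taken: keep fetching sequentially
        entry = self.btac.get(self._btac_index(pc))
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None  # predicted taken, but the target must wait for decode

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train the counter with the actual outcome and remember taken targets."""
        i = self._bht_index(pc)
        self.bht[i] = min(self.bht[i] + 1, 3) if taken else max(self.bht[i] - 1, 0)
        if taken:
            self.btac[self._btac_index(pc)] = (pc, target)

bp = BranchPredictor()
predictions: List[Optional[int]] = []
for taken in [True, True, True, False]:  # a loop branch: taken three times, then exits
    predictions.append(bp.predict(0x1000))
    bp.update(0x1000, taken, 0x0F00)
print(predictions)  # prints [None, 3840, 3840, 3840]
```

The 2-bit counter tolerates a single loop exit: after the final not-taken outcome it drops only to "weakly taken", so the next run of the loop is still predicted correctly from its first iteration.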
PowerPC 604e pipeline
[Figure: PowerPC 604e pipeline: Fetch, Decode, Dispatch, Execute (SCIU1, SCIU2, MCIU, FPU, BPU, LSU), Complete, Writeback]
PowerPC processor 7400 (G4)
• Superscalar RISC processor
• Issues up to 4 instructions per clock
• Executes up to 8 instructions in parallel
• Has 8 functional units and 3 register files:
  - IU1 and IU2 – two integer units with the general-purpose register file
  - FPU – floating-point unit with its register file
  - VPU and VALU – two vector units with the vector register file
  - BPU – branch processing unit
  - SRU – system register unit
  - LSU – load/store unit
• Separate 32 KB instruction and data caches
• L2 cache controller
• For servers and workstations
PowerPC processor 7400 (G4)
VPU and VALU - vector units and vector register file
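The vector units operate on 128-bit vector registers, so a single instruction processes several data elements at once. A minimal sketch of that idea for four 32-bit integer lanes with wrap-around (modulo) addition, emulated in Python; the function names are ours and the sketch models the principle only, not a specific G4 instruction.

```python
from typing import List

LANES, LANE_BITS = 4, 32
MASK = (1 << LANE_BITS) - 1

def to_vector(values: List[int]) -> int:
    """Pack four 32-bit unsigned values into one 128-bit 'register' (lane 0 in the low bits)."""
    if len(values) != LANES:
        raise ValueError("a 128-bit vector holds exactly four 32-bit lanes")
    reg = 0
    for lane, v in enumerate(values):
        reg |= (v & MASK) << (lane * LANE_BITS)
    return reg

def from_vector(reg: int) -> List[int]:
    return [(reg >> (lane * LANE_BITS)) & MASK for lane in range(LANES)]

def vector_add_modulo(a: int, b: int) -> int:
    """Add lane by lane; a carry out of one lane is discarded, never passed to the next."""
    return to_vector([x + y for x, y in zip(from_vector(a), from_vector(b))])

a = to_vector([1, 2, 3, 0xFFFFFFFF])
b = to_vector([10, 20, 30, 1])
print(from_vector(vector_add_modulo(a, b)))  # prints [11, 22, 33, 0]
```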
PowerPC processor 7400 (G4)
IU1 and IU2 – two integer units and general register file
7400 (G4) pipeline
Why are RISC processors not so popular?
• Incompatibility with the x86 instruction set. As a result, x86-based programs can be executed only through emulation, which reduces the advantages of RISC by several tens of percent
• Software. Initially, the traditional PC operating system was DOS, and many popular and effective programs were written for DOS and 16-bit versions of Windows. Meanwhile, the various RISC platforms used different, incompatible versions of Unix, for which few popular and effective programs were written; most programs were developed for workstations and servers
• A higher price for RISC processors. Although the original idea was that RISC chips would be simpler, in practice they were more expensive than Intel x86 chips. A wider RISC bus (128 or even 256 bits) requires more expensive and more complex control circuits, chipsets and boards, and solutions designed for workstations and servers were too expensive for a PC
• Passivity of RISC system manufacturers. "Serious" companies (Sun, DEC) felt there was no need to reduce the cost of RISC workstations because of their indisputable advantages
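The effect of the emulation penalty in the first point can be shown with simple arithmetic. The numbers below are hypothetical: a RISC machine assumed to run native code 1.3 times faster than a comparable x86 machine, and emulation losses of 20% and 40% (the slide only says "several tens of percent").

```python
def relative_speed_under_emulation(native_speedup: float, emulation_loss: float) -> float:
    """Speed of emulated x86 code on the RISC machine, relative to a real x86 machine (1.0)."""
    if not 0.0 <= emulation_loss < 1.0:
        raise ValueError("emulation_loss must be a fraction in [0, 1)")
    return native_speedup * (1.0 - emulation_loss)

for loss in (0.2, 0.4):
    print(f"loss {loss:.0%}: {relative_speed_under_emulation(1.3, loss):.2f}x")
```

Under these assumptions a 20% loss leaves only a 4% advantage (1.04x), and a 40% loss turns a 30% native advantage into a 22% disadvantage (0.78x) for existing x86 software.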
Comparing RISC and CISC
[Figure: comparing RISC and CISC processors: performance of some processors in SPEC2000 (SPECint2000 and SPECfp2000) versus frequency, MHz]