Modern Computer Architecture


COMPUTER
ARCHITECTURE
Assoc.Prof. Stasys Maciulevičius
Computer Dept.
Development of processor architecture
The main processor design and manufacturing companies, creating new processors for various market segments, seek to:
• enhance performance; to reach this goal they:
    • increase clock frequency
    • use a variety of microarchitecture enhancements
    • move to multi-core microarchitectures
• reduce energy consumption
2014
©S.Maciulevičius
2
Word length: from 32 to 64 bits



• A 32-bit processor can operate on integers up to 2^32, or about 4.3 billion
• A 64-bit processor reaches 2^64, or about 18.4 quintillion (18,400,000,000,000,000,000)
• 32-bit processors and operating systems can support up to 4 gigabytes of memory, of which only 2 gigabytes are available to applications; for CAD/CAM and scientific calculations this is no longer enough
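The figures on this slide follow directly from the word length. A minimal Python sketch of the arithmetic, assuming byte addressing (the helper name `distinct_values` is only illustrative):

```python
def distinct_values(bits: int) -> int:
    """Number of distinct bit patterns in a word of the given length."""
    return 2 ** bits

GIB = 2 ** 30  # bytes in one gibibyte

values_32 = distinct_values(32)  # about 4.3 billion
values_64 = distinct_values(64)  # about 18.4 quintillion

# Assumption: one address per byte, so a 32-bit address reaches 2**32 bytes.
addressable_gib_32 = values_32 // GIB

print(f"32-bit: {values_32:,} values, {addressable_gib_32} GiB addressable")
print(f"64-bit: {values_64:,} values")
```

The 4 GiB limit is a property of the address width alone; how much of it an application may use depends on the operating system.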
Data in processors
Data type               Register set  Functional unit  x86 word  x86-64 word
Integers                GPR           ALU              32        64
Addresses               GPR           ALU or AGU       32        64
Floating point numbers  FPR           FPU              64        64
Vectors                 VR            VPU              128       128
As can be seen, only the length of integers and addresses differs
x86-64 specification
• The x86-64 specification was designed by Advanced Micro Devices (AMD) as an extension of the x86 instruction set
• It allows far larger virtual and physical address spaces than x86, doubles the width of the integer registers from 32 to 64 bits, increases the number of integer registers, and provides other enhancements

Intel® EM64T
• Intel released its own “64-bit technology” to compete with AMD’s 64-bit technology
• Intel EM64T enhances system performance by enabling access to more than 4 GB of memory
• Intel EM64T supports:
    • 64-bit virtual address space
    • 64-bit pointers
    • 64-bit general purpose registers
    • 64-bit integers
EM64T (and x86-64) registers
Multi-core processors



• Raising the clock frequency to gain performance is becoming more and more difficult
• Instead, companies have focused their efforts on increasing parallelism: they developed dual-core processors and later moved to multi-core processors
• Intel, AMD, Motorola, Sun and other companies follow this path
Intel Core microarchitecture summary
Intel Nehalem microarchitecture

• Nehalem is the codename for an Intel processor microarchitecture, successor to the Core microarchitecture
• The first processor released with the Nehalem architecture was the desktop Core i7, which was released in November 2008
• Nehalem differs radically from NetBurst. Nehalem-based microprocessors use higher clock speeds and are more energy-efficient. Hyper-threading is reintroduced, along with a reduction in L2 cache size, as well as an enlarged L3 cache that is shared by all cores

Intel Nehalem microarchitecture







• 64 KB L1 cache/core (32 KB L1 Data + 32 KB L1 Instruction) and 256 KB L2 cache/core
• 4–12 MB L3 cache
• Native (all processor cores on a single die) quad- and octa-core processors
• Intel QuickPath Interconnect in high-end models replacing the legacy front side bus
• Integration of PCI Express and DMI into the processor, replacing the northbridge
• Integrated memory controller supporting two or three memory channels of DDR3 SDRAM or four FB-DIMM2 channels
• Second-generation Intel Virtualization Technology
Some Intel Nehalem processors
                     Core i7 (LGA 1366)   Core i7 (LGA 1156)   Core i5              Core 2 Quad
Processor Interface  LGA 1366             LGA 1156             LGA 1156             LGA 775
Number of Cores      4                    4                    4                    4
Turbo Boost          Yes                  Yes                  Yes                  No
Hyper-Threading      Yes                  Yes                  No                   No
L1 Cache             32KB/32KB per core   32KB/32KB per core   32KB/32KB per core   32KB/32KB per core
L2 Cache             256KB per core       256KB per core       256KB per core       Up to 12MB shared
L3 Cache             8MB shared           8MB shared           8MB shared           No
Memory Channels      3                    2                    2                    2
Max. Memory Rate     DDR3-1066            DDR3-1333            DDR3-1333            DDR3-1600
Chipset              X58                  P55                  P55                  X48
Price                $284-$999            $285-$555            $199                 $163-$316
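The memory channel count and memory rate listed for the Nehalem parts translate into a theoretical peak bandwidth. A rough sketch of that arithmetic, assuming a standard 64-bit (8-byte) DDR3 channel and ignoring all real-world overheads; the function and dictionary names are illustrative only:

```python
BYTES_PER_TRANSFER = 8  # assumption: standard 64-bit DDR3 channel

def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s / 1000

# (memory channels, DDR3 rate in MT/s) as listed for the Nehalem-based parts
CONFIGS = {
    "Core i7 (LGA 1366)": (3, 1066),
    "Core i7 (LGA 1156)": (2, 1333),
    "Core i5": (2, 1333),
}

for name, (channels, rate) in CONFIGS.items():
    print(f"{name}: {peak_bandwidth_gb_s(channels, rate):.1f} GB/s peak")
```

Under these assumptions the three-channel LGA 1366 part has the highest peak figure even though it runs slower memory, because the extra channel outweighs the lower per-channel rate.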
Intel’s strategy

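Intel introduces a new microprocessor architecture every 2 years as part of its “Tick-Tock” strategy: a “tick” moves the existing microarchitecture to a new manufacturing process, and a “tock” introduces a new microarchitecture on the existing process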
Intel’s Sandy Bridge
• Sandy Bridge is the codename for a microarchitecture that Intel began developing in 2005 to replace the Nehalem microarchitecture
• It was designed for the full range of applications, from mobile devices, laptops and desktop computers to large enterprise servers
• Intel demonstrated a Sandy Bridge processor in 2009 and released the first products based on the architecture in January 2011

Intel’s Sandy Bridge

Sandy Bridge main features:
• 32 nm fabrication process
• CPU clock rate 1.4–3.4 GHz, graphics clock rate 350–850 MHz (depending on the model)
• Turbo Boost 2.0 technology raises the clock rates up to 3.8 GHz and 1350 MHz respectively
• 32 kB data + 32 kB instruction L1 cache (3 clocks) and 256 kB L2 cache (8 clocks) per core
• Shared L3 cache of 3–8 MB (25 clocks)
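The cache latencies quoted for Sandy Bridge are in clock cycles; converting them to time depends on the clock rate. A small sketch, assuming every cache level runs at the core clock and using 3.4 GHz (the top of the quoted CPU clock range) purely as an example; the function name is illustrative:

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    return cycles / clock_ghz

# Latencies in clocks, as listed on the slide
LATENCY_CLOCKS = {"L1": 3, "L2": 8, "L3": 25}

EXAMPLE_CLOCK_GHZ = 3.4  # assumption: example core clock

for level, cycles in LATENCY_CLOCKS.items():
    ns = cycles_to_ns(cycles, EXAMPLE_CLOCK_GHZ)
    print(f"{level}: {cycles} clocks = {ns:.2f} ns at {EXAMPLE_CLOCK_GHZ} GHz")
```

At a lower clock rate the same cycle counts correspond to proportionally longer times.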
Intel’s Sandy Bridge

• Sandy Bridge has an integrated graphics controller and a specialized accelerator, which speed up multimedia content processing significantly
• Sandy Bridge supports DirectX 10.1 and OpenCL 1.1; its performance far exceeds that of the first-generation Core processors
• Advanced Vector Extensions (AVX): a 256-bit instruction set with wider vectors, new extensible syntax and rich functionality
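What "wider vectors" means in practice can be shown with simple arithmetic: a 256-bit AVX register holds twice as many elements as a 128-bit vector register. A minimal sketch (the `lanes` helper is hypothetical and not part of any Intel API):

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """How many elements of the given width fit into one vector register."""
    return vector_bits // element_bits

SSE_BITS = 128  # vector word length listed earlier for x86 and x86-64
AVX_BITS = 256  # AVX vector length on Sandy Bridge

for name, element_bits in (("32-bit float", 32), ("64-bit float", 64)):
    print(f"{name}: {lanes(SSE_BITS, element_bits)} per 128-bit register, "
          f"{lanes(AVX_BITS, element_bits)} per 256-bit register")
```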
Intel’s Sandy Bridge





• Decoded micro-operation cache and an enlarged, optimized branch predictor
• 256-bit/cycle ring bus interconnect between the cores, graphics, cache and System Agent domain
• Intel Quick Sync Video: hardware support for video encoding and decoding
• Up to 8 physical cores, or 16 logical cores through Hyper-Threading
• TDP of 35–95 W for desktop CPUs and 17–55 W for mobile CPUs
Intel’s Sandy Bridge caches
Sandy Bridge microarchitecture
Sandy Bridge: L0 cache
Sandy Bridge: ring bus
Each core, each slice of L3 (LLC) cache, the on-die GPU, the media engine and the system agent all have a stop on the ring bus.
The bus is made up of four independent rings: a data ring, a request ring, an acknowledge ring and a snoop ring. Each stop on each ring can accept 32 bytes of data per clock.
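As a rough illustration of what 32 bytes per clock means, the sketch below computes the peak data rate of one ring stop. It assumes the ring is clocked at the core frequency and uses 3.4 GHz purely as an example value; both are assumptions, not figures from the slide:

```python
BYTES_PER_CLOCK = 32  # per stop, as stated on the slide

def stop_bandwidth_gb_s(ring_clock_ghz: float) -> float:
    """Peak bytes accepted by one ring stop per second, in GB/s."""
    return BYTES_PER_CLOCK * ring_clock_ghz

# 32 bytes per clock is the same width as the 256-bit/cycle figure.
ring_width_bits = BYTES_PER_CLOCK * 8

EXAMPLE_CLOCK_GHZ = 3.4  # assumption: ring runs at an example core clock
print(f"{ring_width_bits} bits per clock, "
      f"{stop_bandwidth_gb_s(EXAMPLE_CLOCK_GHZ):.1f} GB/s per stop")
```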
Intel’s Ivy Bridge
• Ivy Bridge is the first chip to use Intel's 22 nm tri-gate transistors, which help scale frequency and reduce power consumption
• At a high level Ivy Bridge looks a lot like Sandy Bridge
• Ivy Bridge is considered a tick from the CPU perspective but a tock from the GPU perspective

Intel’s Ivy Bridge

Ivy Bridge introduces configurable TDP, which allows the platform to increase the CPU's TDP if given additional cooling, or decrease the TDP to fit into a smaller form factor

Ivy Bridge Configurable TDP
                cTDP Down  Nominal  cTDP Up
Ivy Bridge ULV  13W        17W      33W
Ivy Bridge XE   45W        55W      65W
Intel’s Ivy Bridge
• Sandy Bridge brought a completely redesigned GPU core onto the processor die itself
• With Ivy Bridge the GPU remains on die, but it grows more than the CPU does this generation
• The Ivy Bridge GPU adds support for OpenCL 1.1, DirectX 11 and OpenGL 3.1

From Nehalem to Haswell
Intel’s Haswell
• Haswell is the codename for the processor microarchitecture that succeeds the Ivy Bridge architecture
• Using the 22 nm process, Intel is expected to release CPUs based on this microarchitecture around June 2, 2013
• With Haswell, Intel will introduce a new low-power processor designed for convertible or 'hybrid' Ultrabooks
Intel’s Haswell
• The Haswell architecture is specifically designed to optimize the power savings and performance benefits
• Haswell is expected to launch in three major forms:
    • Desktop version (LGA1150 socket): Haswell-DT
    • Mobile/laptop version (PGA socket): Haswell-MB
    • BGA version:
        • 47 W and 57 W TDP classes: Haswell-H (for "all-in-one" systems, Mini-ITX form factor motherboards, and other small-footprint formats)
        • 13.5 W and 15 W TDP classes (SoC): Haswell-ULT (for Intel's Ultrabook platform)
        • 10 W TDP class (SoC): Haswell-ULX (for tablets and certain Ultrabook-class implementations)
Intel’s Haswell performance
Compared to Ivy Bridge:
• Twice the vector processing performance
• At least 10% higher sequential CPU performance (8 execution ports per core versus 6)
• Up to double the performance of the integrated GPU
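One way to see where the doubled vector performance could come from is to count peak floating-point operations per cycle per core. The sketch assumes one 256-bit add and one 256-bit multiply per cycle for Ivy Bridge, and two 256-bit fused multiply-add (FMA) units for Haswell; these unit counts are assumptions added for illustration and are not stated on the slide:

```python
VECTOR_BITS = 256  # AVX / AVX2 register width

def lanes(element_bits: int) -> int:
    """Elements of the given width in one 256-bit register."""
    return VECTOR_BITS // element_bits

def ivy_bridge_flops_per_cycle(element_bits: int = 32) -> int:
    # Assumption: one vector add plus one vector multiply per cycle,
    # each producing one operation per lane.
    return lanes(element_bits) + lanes(element_bits)

def haswell_flops_per_cycle(element_bits: int = 32) -> int:
    # Assumption: two FMA units; an FMA counts as two operations
    # (multiply and add) per lane.
    fma_units = 2
    ops_per_fma = 2
    return fma_units * ops_per_fma * lanes(element_bits)

for bits in (32, 64):
    ivb = ivy_bridge_flops_per_cycle(bits)
    hsw = haswell_flops_per_cycle(bits)
    print(f"{bits}-bit floats: {ivb} vs {hsw} FLOPs/cycle ({hsw // ivb}x)")
```

These are peak figures only; real code reaches them only when it can keep both FMA units busy every cycle.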
CPU Idle Power
AVX2 – FMA
Some models
CPU            Freq.    Turbo Boost  Cache memory  Cores / Threads  TDP
Core i7-4770K  3.5 GHz  3.9 GHz      8 MB          4/8              84 W
Core i7-4770   3.4 GHz  3.9 GHz      8 MB          4/8              84 W
Core i7-4770S  3.1 GHz  3.9 GHz      8 MB          4/8              65 W
Core i7-4770T  2.5 GHz  3.7 GHz      8 MB          4/8              45 W
Core i7-4765T  2.0 GHz  3.0 GHz      8 MB          4/8              35 W
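The listed models suggest that lower-TDP parts give up base frequency while keeping a comparatively high Turbo Boost frequency. A short sketch of that comparison, using only the values listed for these models (the names `MODELS` and `turbo_headroom_percent` are illustrative):

```python
# (base GHz, turbo GHz, TDP in W) as listed for each model
MODELS = {
    "Core i7-4770K": (3.5, 3.9, 84),
    "Core i7-4770": (3.4, 3.9, 84),
    "Core i7-4770S": (3.1, 3.9, 65),
    "Core i7-4770T": (2.5, 3.7, 45),
    "Core i7-4765T": (2.0, 3.0, 35),
}

def turbo_headroom_percent(base_ghz: float, turbo_ghz: float) -> float:
    """Turbo Boost frequency gain over the base frequency, in percent."""
    return (turbo_ghz - base_ghz) / base_ghz * 100

for name, (base, turbo, tdp) in MODELS.items():
    headroom = turbo_headroom_percent(base, turbo)
    print(f"{name}: {tdp} W, +{headroom:.1f}% over base")
```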