Processor Architecture Survey

Alpha
Itanium 2
Opteron
PA-RISC
Pentium Xeon
Power4/5/5+
UltraSPARC IV/T1/64
Ken Moreau
Solutions Architect, Hewlett-Packard
© 2006 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
Agenda
• Define terminology
• Processor architecture
• System architecture
• Extending the addressing of the x86 architecture
• Q&A
July 7, 2015
A Survey Of Processor Technologies
Slide 2 of 38
Terminology – Processor Components
[Diagram: a dual-core processor showing registers, execution units (Int, FP, etc.), the L1 instruction and data caches (I-cache and D-cache), and the L2 cache]
Terminology – Cache Components
• L1, L2 and L3 caches are successively larger but slower, yet all are still far faster than RAM
• In an SMP system, you must guarantee that what is in one processor’s cache is the same as what is in every other processor’s cache, and that it matches what is in RAM
− Also known as “cache coherency”
− Directory schemes maintain a directory recording which caches hold each block
− “Snooping” schemes have every cache monitor all other caches’ requests for memory
− In either case, when a write is detected, all other cached copies of that block are updated or invalidated
• The Translation Lookaside Buffer (TLB) maps virtual addresses to physical addresses by caching page table entries (PTEs)
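
The snooping, write-invalidate behavior described above can be sketched with a simplified three-state MSI protocol (Modified, Shared, Invalid). This is a minimal Python model under stated assumptions: a single shared bus, whole-value writes, no timing, and hypothetical `Bus` and `Cache` classes. It is not how any particular processor implements coherency, and directory schemes are not modeled.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class Bus:
    """A shared bus that every cache snoops; memory is a dict of address -> value."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> [state, value]
        bus.caches.append(self)

    def _others(self):
        return [c for c in self.bus.caches if c is not self]

    def read(self, addr):
        line = self.lines.get(addr)
        if line and line[0] is not State.INVALID:
            return line[1]  # cache hit
        # Miss: broadcast the read so a cache holding a Modified copy writes it back first.
        for other in self._others():
            other.snoop_read(addr)
        value = self.bus.memory.get(addr, 0)
        self.lines[addr] = [State.SHARED, value]
        return value

    def write(self, addr, value):
        # Broadcast the write so every other cached copy is invalidated.
        for other in self._others():
            other.snoop_write(addr)
        self.lines[addr] = [State.MODIFIED, value]

    def snoop_read(self, addr):
        line = self.lines.get(addr)
        if line and line[0] is State.MODIFIED:
            self.bus.memory[addr] = line[1]  # write the dirty data back to RAM
            line[0] = State.SHARED

    def snoop_write(self, addr):
        line = self.lines.get(addr)
        if line:
            line[0] = State.INVALID


memory = {0x40: 1}
bus = Bus(memory)
cpu0, cpu1 = Cache(bus), Cache(bus)
cpu0.read(0x40)          # both caches now hold the block as Shared
cpu1.read(0x40)
cpu0.write(0x40, 2)      # cpu1's copy is invalidated
print(cpu1.read(0x40))   # cpu0 writes 2 back to RAM first, so this prints 2
```
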
Terminology - Processor Packaging
[Diagram: processor packaging with a single processor, two processors, and four processors in one slot]
Terminology – Processor Packaging
• How do we refer to this new world of processors with multi-core and multi-threaded packaging?
• HP focuses on processors and cores as the most important factors:
− Processor: the number of physical pieces of silicon
− Core: the number of processing units contained in those processors
− Threads: the number of sequences of execution in each core
• Examples (p = processors, c = cores):
− ProLiant with Pentium Xeon: DL580, 4p/4c
− ProLiant with dual-core Opteron: DL585, 4p/8c
− HP 9000 with PA-8900: rp3440, 2p/4c
− Integrity with Itanium 2 Madison: rx4640, 4p/4c
− Integrity with Itanium 2 Montecito: rx6600, 4p/8c
• Other vendors are doing it differently
− IBM counts each core as a processor
− Sun counts each thread in each core as a processor
Terminology – Register Remapping
[Figure: Function Red() calls Yellow(), which calls Blue(), which calls Green(). Each function names its registers R1, R2, R3, and so on, and the hardware remaps those names onto a different block of physical registers for each call: Yellow’s onto R31 to R36, Blue’s onto R37 to R42, and Green’s onto R43 to R48.]
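
The remapping in the figure can be modeled as a stack of register frames: each call gets fresh physical registers instead of saving the caller’s registers to memory. This is a minimal sketch, assuming each call asks for a fixed-size frame placed right after its caller’s frame; the `RegisterStack` class and its parameters are hypothetical, and the numbering (starting at physical R31, frames of six registers) is chosen only to match the figure.

```python
class RegisterStack:
    """Maps each active call's logical registers R1..Rn onto a block of physical registers.

    Hypothetical model: frames are allocated contiguously starting at `first_physical`.
    Real register-stack hardware spills old frames to memory when it runs out;
    this sketch raises an error instead.
    """

    def __init__(self, first_physical=31, last_physical=127):
        self.first_physical = first_physical
        self.last_physical = last_physical
        self.frames = []  # (base physical register, frame size) for each active call

    def call(self, size):
        if self.frames:
            base, prev_size = self.frames[-1]
            base += prev_size  # the new frame starts right after the caller's frame
        else:
            base = self.first_physical
        if base + size - 1 > self.last_physical:
            raise RuntimeError("physical registers exhausted (hardware would spill to memory)")
        self.frames.append((base, size))

    def ret(self):
        self.frames.pop()  # the caller's mapping becomes current again

    def physical(self, logical):
        base, size = self.frames[-1]
        if not 1 <= logical <= size:
            raise IndexError(f"R{logical} is outside the current frame of {size} registers")
        return base + logical - 1


regs = RegisterStack()
regs.call(6)               # Yellow(): R1..R6 -> R31..R36
regs.call(6)               # Blue():   R1..R6 -> R37..R42
print(regs.physical(1))    # 37
regs.ret()                 # back in Yellow()
print(regs.physical(1))    # 31
```
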
Terminology – Predication
• Branch instructions really hurt performance when they are mispredicted
− The processor pipeline is flushed and must be refilled, and the instructions at the branch target may not even be in the instruction cache
• Every if-then statement is potentially a branch
− But only potentially: we might not take the branch
• Predication replaces the branch with a compare that sets two predicates; both paths are issued, and only the instructions whose predicate is true take effect

Typical:
    branch.eq (r1,r2)
    p1: instr 2
        instr 3
    p2: instr 4
        instr 5

Optimized:
    (p1,p2) <- cmp(r1,r2)
    if (p1) instr 2
    if (p2) instr 4
    if (p1) instr 3
    if (p2) instr 5
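
The two code sequences above can be compared with a small Python model. The arithmetic operations standing in for instr 2 to instr 5 are hypothetical; the sketch only shows that the predicated version issues every instruction, keeps just the results whose predicate is true, and computes the same answer as the branching version without any branch.

```python
def with_branch(r1, r2, x):
    """The 'Typical' sequence: a conditional branch selects one path."""
    if r1 == r2:        # branch.eq (r1, r2) -> path p1
        x = x + 1       # instr 2
        x = x * 2       # instr 3
    else:               # path p2
        x = x - 1       # instr 4
        x = x * 3       # instr 5
    return x


def predicated(r1, r2, x):
    """The 'Optimized' sequence: no branch; every instruction is issued and
    each result is kept only if its predicate is true."""
    preds = {"p1": r1 == r2, "p2": r1 != r2}   # (p1, p2) <- cmp(r1, r2)
    program = [
        ("p1", lambda v: v + 1),   # if (p1) instr 2
        ("p2", lambda v: v - 1),   # if (p2) instr 4
        ("p1", lambda v: v * 2),   # if (p1) instr 3
        ("p2", lambda v: v * 3),   # if (p2) instr 5
    ]
    issued = 0
    for pred, op in program:
        result = op(x)   # the instruction flows through the pipeline regardless...
        issued += 1
        if preds[pred]:
            x = result   # ...but commits its result only when the predicate is true
    return x, issued


print(with_branch(3, 3, 4), predicated(3, 3, 4))   # 10 (10, 4)
```
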
Terminology – Multi-Threading
[Figure: use of the INT and FP execution units over time for a superscalar processor, a dual-core chip (chip multi-threading, CMT), and a dual-threaded dual-core chip]
Terminology – Multi-Thread MultiCore
[Figure: four cores (Core 0 to Core 3), each with its own INT and FP units]
Multi-Threading Approaches
Arranged from non-shared resources to shared resources:
• Single thread per core: Opteron, PA-8800/8900 and UltraSPARC IV (chip multi-threading)
− Low utilization, low complexity, low latency hiding
• Event (coarse or switch-on-event multi-threading): Itanium 2 Montecito
− Low utilization, low complexity, high latency hiding
• Temporal (fine-grained multi-threading): UltraSPARC T1
− Medium utilization, low complexity, high latency hiding
• Multiple threads per cycle: Pentium Xeon (Hyper-Threading) and Power5 (simultaneous multi-threading)
− High utilization, high complexity, medium latency hiding
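
The utilization and latency-hiding trade-offs above can be explored with a toy cycle-by-cycle simulator. This is a sketch under strong assumptions: one single-issue core, a fixed cache-miss latency, a fixed cost for switching threads in the switch-on-event case, and no simultaneous multi-threading (which needs several issue slots per cycle and is not modeled). The `simulate` function and its parameters are hypothetical.

```python
def simulate(threads, policy, miss_latency=4, switch_penalty=2):
    """Cycles needed to finish all threads on one single-issue core.

    threads: strings of 'c' (ordinary instruction) and 'm' (a load that misses
    the cache and blocks its thread for `miss_latency` extra cycles).
    policy: "single"          - no multi-threading; threads run one after another
            "switch_on_event" - stay on one thread until it stalls, then pay
                                `switch_penalty` cycles to switch
            "fine_grained"    - issue from the next ready thread every cycle
    """
    n = len(threads)
    pc = [0] * n       # next instruction index per thread
    ready = [0] * n    # first cycle at which each thread may issue again
    cycle, current, last = 0, 0, n - 1
    while any(pc[i] < len(threads[i]) for i in range(n)):
        live = [i for i in range(n) if pc[i] < len(threads[i])]
        runnable = [i for i in live if ready[i] <= cycle]
        choice = None
        if policy == "single":
            if live[0] in runnable:
                choice = live[0]
        elif policy == "fine_grained":
            order = [(last + k) % n for k in range(1, n + 1)]  # round-robin after `last`
            choice = next((i for i in order if i in runnable), None)
        elif policy == "switch_on_event":
            if current in runnable:
                choice = current
            elif runnable:
                current = runnable[0]
                cycle += switch_penalty  # drain and refill the pipeline for the new thread
                continue
        else:
            raise ValueError(f"unknown policy {policy!r}")
        if choice is None:
            cycle += 1       # nothing can issue: an idle cycle
            continue
        op = threads[choice][pc[choice]]
        pc[choice] += 1
        ready[choice] = cycle + 1 + (miss_latency if op == "m" else 0)
        last = choice
        cycle += 1
    return cycle


for policy in ("single", "switch_on_event", "fine_grained"):
    print(policy, simulate(["cmcc", "cmcc"], policy))
```

With two threads of the form "cmcc", a 4-cycle miss latency and a 2-cycle switch penalty, this model gives 16 cycles when the threads run one at a time, 14 for switch-on-event and 11 for fine-grained interleaving; other latencies and penalties change the gaps.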
Multi-Core Software Licensing
Type of software (licensing methods): vendor status; issues and comments
• Operating system (processor, site): IBM, Microsoft, Novell, Red Hat and Sun license by processor; HP by core
• Development tools (processor, site, user): IBM and Sun by processor
• Database engines (processor, core, site, user): Microsoft SQL Server and DB2 by processor; Oracle by core, at 0.75 or 0.5 per core; for 1p, Oracle will round down to 1 from 1.25
• Management tools (processor, site, user): EMC, HP, IBM, Tivoli and Veritas offer many choices; Veritas has licensed by core for Opteron
• Licensing software (processor, core, user, site): HP, IBM and Sun offer many choices
• Virtualization (processor): Microsoft and VMware by processor
• Email server (processor, site, user): IBM and Microsoft by processor
• Web server (processor, site, user, transaction): Microsoft by processor
• Application server (processor, core, site, user): IBM by processor; BEA by processor, x1.25 for multi-core
• HPC software (processor, thread): everybody does it differently; licensing by thread will almost always cost more
• Open source (GPL, LGPL, BSD): everybody does it differently
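
To see why the counting method matters, here is a minimal sketch that computes license units for one server under per-processor, per-core and per-thread rules. The `licenses` function is hypothetical, and rounding a fractional core count up to a whole license is an assumption (the table notes that Oracle rounds one single-processor case down, which this sketch does not model).

```python
import math


def licenses(processors, cores_per_processor, threads_per_core, method, core_factor=1.0):
    """License units needed for one server under a hypothetical counting rule.

    method: "processor" - one license per physical processor
            "core"      - cores x core_factor, rounded up to a whole license (assumed)
            "thread"    - one license per hardware thread
    """
    cores = processors * cores_per_processor
    if method == "processor":
        return processors
    if method == "core":
        return math.ceil(cores * core_factor)
    if method == "thread":
        return cores * threads_per_core
    raise ValueError(f"unknown method {method!r}")


# DL585: 4 processors, 2 cores each, 1 thread per core
print([licenses(4, 2, 1, m, 0.75) for m in ("processor", "core", "thread")])   # [4, 6, 8]
```

For the DL585 this gives 4 licenses per processor, 6 per core at a factor of 0.75 (4 at 0.5) and 8 per thread; for a single UltraSPARC T1 (8 cores, 4 threads per core) it gives 1, 8 and 32.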
Terminology - Pipelining
• An extremely simple model has 5 stages:
− Instruction fetch
− Instruction decode
− Execution
− Memory access
− Writeback
• Most processors have many more stages
− Long pipelines allow higher clock rates and keep more instructions in flight, increasing instruction-level parallelism (ILP)
− Short pipelines reduce the chance of dependency stalls, and reduce the penalty each time the pipeline is completely emptied because of a branch
− Each processor designer makes different choices
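
The pipeline trade-off can be put into numbers with a simplified timing model: a k-stage pipeline needs k - 1 cycles to fill and then completes one instruction per cycle, and each mispredicted branch is assumed to throw away k - 1 cycles of work. This sketch ignores dependency stalls and the higher clock rate a deeper pipeline allows, so it shows only the cost side; the function names are hypothetical.

```python
def pipeline_cycles(instructions, depth, mispredicts=0):
    """Cycles to run `instructions` through an ideal `depth`-stage pipeline.

    Assumed model: one instruction enters per cycle, no dependency stalls,
    and each mispredicted branch flushes depth - 1 cycles of work.
    """
    if instructions == 0:
        return 0
    fill = depth - 1   # cycles before the first instruction completes
    return fill + instructions + mispredicts * (depth - 1)


def unpipelined_cycles(instructions, depth):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * depth


print(pipeline_cycles(100, 5, mispredicts=2))    # 112
print(pipeline_cycles(100, 20, mispredicts=2))   # 157
print(unpipelined_cycles(100, 5))                # 500
```
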
Virtualization Technologies
• Processors have different “privilege levels” (also known as rings), where functions such as memory mapping are allowed only at higher privilege levels
• But the operating system in each guest instance also needs to run at a higher privilege level
• Some virtual machine monitors “fix up” the guest operating systems, so they never run at the highest privilege level
• AMD, Intel and Sun implement an even higher privilege level, allowing strong isolation between the guest instances
[Diagram: several system instances running on a virtualization intermediary, on top of the physical system]
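
The extra privilege level is often used for trap-and-emulate: the guest kernel runs below the highest level, so a privileged operation traps to the virtualization intermediary, which applies it on that guest’s behalf. The following is a minimal sketch with hypothetical `Machine`, `Guest` and `Hypervisor` classes and a single stand-in privileged register; real hardware support is far more involved.

```python
HYPERVISOR, GUEST_KERNEL = 0, 1   # lower number = more privileged (assumed ring numbering)


class PrivilegeTrap(Exception):
    """Raised when code attempts a privileged operation without enough privilege."""


class Machine:
    def __init__(self):
        self.level = HYPERVISOR
        self.page_table_base = None   # stands in for a real privileged register

    def set_page_table_base(self, value):
        if self.level != HYPERVISOR:
            raise PrivilegeTrap("set_page_table_base")
        self.page_table_base = value


class Guest:
    def __init__(self, name):
        self.name = name
        self.page_table_base = None   # what this guest believes the register holds


class Hypervisor:
    def __init__(self, machine):
        self.machine = machine
        self.traps = 0

    def run_guest_instruction(self, guest, value):
        """Run a guest kernel's privileged instruction with the guest de-privileged."""
        self.machine.level = GUEST_KERNEL
        try:
            self.machine.set_page_table_base(value)
        except PrivilegeTrap:
            self.traps += 1
            self.machine.level = HYPERVISOR   # the trap enters the hypervisor
            guest.page_table_base = value     # update only this guest's virtual state
            # Hypothetical mapping: tag the real register with the guest's identity
            # so one guest's tables are never confused with another's.
            self.machine.set_page_table_base((guest.name, value))
        finally:
            self.machine.level = HYPERVISOR


machine = Machine()
hv = Hypervisor(machine)
hv.run_guest_instruction(Guest("a"), 0x1000)
print(hv.traps, machine.page_table_base)   # 1 ('a', 4096)
```
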
System Architecture
Bus based
• HP Integrity low-end
• HP PA-RISC low-end
• HP Pentium ProLiant
• IBM eServer xSeries low-end
• Sun UltraSPARC low-end
Switch based
• HP Alpha low-end and mid-range
• HP Integrity high-end
• HP PA-RISC high-end
• IBM System x servers high-end
• Sun UltraSPARC high-end
• Unisys ES7000 series
Mesh based
• HP Alpha high-end
• HP Opteron ProLiant
• IBM System i servers
• IBM System p servers
Note: the IBM p570 is ring based
Memory Access Comparison
[Diagram: four CPUs connected through a bus or switch to memory and I/O]
Bus or switch access
• HP PA-RISC, Intel Pentium and Itanium, Sun UltraSPARC III/IV/T1
• Uses an external switch (“front side bus”) to access memory
• Cell controller, Northbridge, address/data repeaters, etc.
Direct access
• AMD Opteron, HP Alpha EV7, IBM Power5/5+
• Uses the on-chip switch to access other processors, memory and I/O cards
• Uses inter-chip communication to access the memory of another processor
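
With direct access, latency depends on where the memory lives. A minimal sketch of the expected latency, assuming every access is either local or remote; the function name and the 90 ns local / 240 ns remote values are illustrative assumptions, not measurements.

```python
def average_latency_ns(local_ns, remote_ns, local_fraction):
    """Expected memory latency when `local_fraction` of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


print(round(average_latency_ns(90, 240, 0.8), 1))   # 120.0
```

With 80% of accesses local the average is about 120 ns, so placing data near the processor that uses it matters; a bus- or switch-based system instead sees roughly the same latency for every access, whichever processor issues it.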
Processor Comparison (part 1)
Columns: cores; registers (number@bits); cache; instructions per cycle; memory latency (ns); execution units (per core)
• Alpha EV7: 1 core (2 per slot); 32@64 Int, 32@64 FP; 64+64KB L1, 1.75MB L2(p); 4 inst/cycle; 60 ns (355 worst case); 4 Int, 2 FP, 2 Load/Store
• Itanium 2 Montecito: 2 cores; 128@64 Int, 128@64 FP, 64@1 Predicate; 32+32KB L1, 1M+256KB L2(p), 12+12MB L3(p); 6 inst/cycle; 185 ns (385 across cells); 6 Int, 3 Branch, 2 FP, 1 SIMD, 2 Load and 2 Store
• Opteron: 1 or 2 cores; 16@32/64 GPR, 8@64 Media, 16@128 Media; 64+64KB L1 (128K+128K L1 with 2 cores), 1MB L2(p); 3 inst/cycle; 90 ns (240 worst case); 3 Int, 3 FP, 1 SIMD, 2 Load/Store
• PA-RISC 8800/8900: 2 cores; 32@64 Int, 32@64 FP; 1.5+1.5MB L1, 32MB L2(s); 2 inst/cycle; 185 ns (385 across cells); 2 Int, 4 FP, 2 Load/Store
• Pentium Xeon: 1 or 2 cores; 8@32 GPR, 8@80 Int/FP, 8@128 Media; 16+12KB L1, 4MB L2(s); 3 inst/cycle; 130 ns; 3 Int, 3 FP, 1 SIMD, 2 Load/Store
• Power5/5+: 2 cores; 152@64 Int/FP (p4), 240@64 Int/FP (p5); 32+64KB L1, 1.92MB L2(s), 36MB L3(s); 4/5 inst/cycle; 116 ns; 1 Int, 1 Branch, 1 Control, 2 FP, 2 Load/Store (separate in p5)
• UltraSPARC IV: 2 cores; 96@64 GPR; 96KB L1, 8+8MB L2(s); 4 inst/cycle; 240 ns; 2 Int, 1 Branch, 2 FP, 1 Load/Store
• UltraSPARC T1: 8 cores; 96@64 GPR; 8+16KB L1(p), 3MB L2(s); 4 inst/cycle; 232 ns; 2 Int, 1 Branch, 1 FP, 1 Load/Store (per core except FP)
Processor Comparison (part 2)
Columns: pipeline depth; memory access type / width; inter-processor communication; threading model
• Alpha EV7: 7; direct / 128 bits; on-chip routing (4 ports) to mesh; 1 thread per core
• Itanium 2 Montecito: 8; cell controller (switch) / 128 bits; cell controller to cross-bar switch; switch-on-event multi-threading, 2 threads per core
• Opteron: 12; direct / 64 bits; HyperTransport (2 uni-directional ports) to mesh; 1 thread per core
• PA-RISC 8800/8900: 8; cell controller (switch) / 128 bits; cell controller to cross-bar switch; chip multi-threading, 1 thread per core
• Pentium Xeon: 20 (31 with EM64T); memory controller hub / 64 bits; memory controller hub; Hyper-Threading, 2 threads per core
• Power5/5+: 12; direct / 64 bits; on-chip routing (3 or 4 ports) to fabric; simultaneous multi-threading, 1 thread per core (2 threads per core with p5+ and AIX 5.3)
• UltraSPARC IV: 14; system board controller / 64 bits; system board controller to centerplane switch; chip multi-threading, 1 thread per core
• UltraSPARC T1: 6; system board controller / 64 bits; system board controller to centerplane switch; fine-grained multi-threading, 4 threads per core
Memory Error Checking and Correcting
• RAID memory
− Uses RAID-4 (striping with dedicated parity) or RAID-5 (striping with distributed parity)
− Catches and corrects memory errors, even the failure of an entire memory module
− Requires extra memory
− No single point of failure, but can slow down writes
• Parity
− An extra bit per block of memory, chosen so that the number of 1 bits in the block (including the parity bit) is always even (even parity) or always odd (odd parity)
− Example: the data 1 0 0 1 0 1 1 0 0 has four 1 bits, so its odd parity bit is 1
− Does not catch an even number of flipped bits (such as two transposed bits), and does not help correct the error
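
A minimal sketch of odd and even parity using the example bits above; the function names are hypothetical. It also shows the weakness just described: flipping two bits leaves the parity unchanged, so the error goes unnoticed.

```python
def parity_bit(bits, odd=True):
    """Bit that makes the total number of 1s odd (odd parity) or even (even parity)."""
    ones = sum(bits)
    if odd:
        return 0 if ones % 2 == 1 else 1
    return ones % 2


def parity_ok(bits, stored_parity, odd=True):
    """True if the stored parity bit still matches the data."""
    return parity_bit(bits, odd) == stored_parity


data = [1, 0, 0, 1, 0, 1, 1, 0, 0]   # four 1 bits
p = parity_bit(data)                 # odd parity bit is 1

one_flip = data.copy()
one_flip[0] ^= 1                     # single-bit error: detected
two_flips = data.copy()
two_flips[0] ^= 1
two_flips[1] ^= 1                    # double-bit error: parity still matches
print(p, parity_ok(one_flip, p), parity_ok(two_flips, p))   # 1 False True
```
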
Memory Error Checking and Correcting
• Single Error Correction – Double Error Detection (SEC-DED)
− Multiple error-checking bits, so you can determine the location of a failed single bit
− Example, in which data bit D3 was written as 0 but reads back as 1:
    P0 P1 P2 D1 P3 D2 D3 D4 P4 D5 D6 D7 D8
    0  1  1  1  1  0  1  1  0  1  1  0  0
    Parity bits (original) = 1 1 1 0
    Parity bits (computed) = 1 0 0 0
    Syndrome word = 0 1 1 0 (binary 6, the position of D3) and the P0 check failed, so the single bad bit can be corrected
• If the syndrome word is non-zero and the P0 parity check passes, you have a double-bit error, but you can’t tell which bits failed
• Gets more efficient in memory usage as word size increases
− 8 data bits need 4 parity bits, but 256 data bits need only 9 parity bits (plus P0 in each case)
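
The example above follows the classic Hamming layout: check bit Pk sits at position 2^(k-1) and covers every position whose binary index has that bit set, so the syndrome spells out the position of a single failed bit. Here is a minimal sketch for 8 data bits with hypothetical function names. P0 is computed as even parity over the whole codeword, an assumed convention, so its value differs from the slide’s P0, while P1 to P4 match the slide.

```python
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]   # D1..D8
PARITY_POSITIONS = [1, 2, 4, 8]                # P1..P4


def encode(data):
    """13-bit codeword: index 0 is P0; indexes 1-12 are P1 P2 D1 P3 D2 D3 D4 P4 D5..D8."""
    code = [0] * 13
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        # Each check bit covers the positions whose index has bit p set (even parity).
        code[p] = sum(code[i] for i in range(1, 13) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2   # P0: overall even parity (assumed convention)
    return code


def syndrome(code):
    """Position (1-12) of a single flipped bit, or 0 if all P1..P4 checks pass."""
    s = 0
    for p in PARITY_POSITIONS:
        if sum(code[i] for i in range(1, 13) if i & p) % 2:
            s |= p
    return s


def check(code):
    """Classify a received codeword, correcting a single-bit error."""
    s = syndrome(code)
    p0_ok = sum(code) % 2 == 0
    if s == 0 and p0_ok:
        return "ok", code
    if not p0_ok:                  # odd number of flips: assume one, at position s (0 = P0)
        fixed = code.copy()
        fixed[s] ^= 1
        return "corrected", fixed
    return "double error", code    # non-zero syndrome but P0 passes


def check_bits_needed(m):
    """Smallest r with 2**r >= m + r + 1 (single-error correction; add 1 for P0)."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r


good = encode([1, 0, 0, 1, 1, 1, 0, 0])   # D1..D8 from the example, with D3 = 0
bad = good.copy()
bad[6] ^= 1                               # D3 (position 6) flips to 1
print(syndrome(bad), check(bad)[0])       # 6 corrected
```

Encoding the example data gives P1 to P4 = 1 1 1 0, flipping D3 gives syndrome 6 with a failed P0 check, and flipping any two bits gives a non-zero syndrome with P0 passing, which is reported as a double error.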
Memory Error Checking and Correcting
[Diagram: SDRAMs 1 to n each deliver 4 data bits per access; each of those 4 bits goes to a different ECC word (ECC words 1 to 4, bit 1 to bit n), and the ECC words go to the CPU]
• No memory DRAM contributes more than one bit to each ECC word
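
With this arrangement, a completely failed DRAM costs each ECC word at most one bit, which single-error correction can repair. A minimal sketch comparing the interleaved layout with a hypothetical packed layout in which a chip’s 4 bits sit side by side in one word; the function names are hypothetical and 18 chips is an arbitrary example size.

```python
from collections import Counter


def interleaved_layout(num_chips, bits_per_chip=4):
    """Bit b of chip c goes to ECC word b, position c: one bit per chip per word."""
    return {(chip, b): (b, chip) for chip in range(num_chips) for b in range(bits_per_chip)}


def packed_layout(num_chips, bits_per_chip=4):
    """Hypothetical contrast: each chip's bits land next to each other in one word."""
    layout = {}
    for chip in range(num_chips):
        for b in range(bits_per_chip):
            flat = chip * bits_per_chip + b
            layout[(chip, b)] = (flat // num_chips, flat % num_chips)
    return layout


def worst_word_errors(layout, failed_chip):
    """Most bad bits any single ECC word receives when one whole chip fails."""
    hits = Counter(word for (chip, _), (word, _) in layout.items() if chip == failed_chip)
    return max(hits.values())


print(worst_word_errors(interleaved_layout(18), 0))   # 1: correctable by SEC
print(worst_word_errors(packed_layout(18), 0))        # 4: beyond SEC-DED
```
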
Dynamic Processor Resiliency (DPR) & Hot Spare Processor
[Diagram: CPUs 1 to 6 plus spare CPUs; a CPU that reports errors is replaced by a spare]
• No system crash
• No performance loss
• No resource loss
• Completely transparent to the end users
Processor RAS Differences
Processors compared: HP Alpha EV7, Intel Itanium 2, AMD Opteron, HP PA-RISC PA-8900, Intel Pentium Xeon, IBM Power5/5+, Sun UltraSPARC IV and T1
• Chip thermal sensors and management: EV7 Yes; Itanium 2 Yes; Opteron Yes; PA-8900 No; Xeon Yes; Power5/5+ Limited; UltraSPARC IV/T1 No
• Full cache parity/ECC: EV7 Yes; Itanium 2 Yes; Opteron Some; PA-8900 Yes; Xeon Some; Power5/5+ Yes; UltraSPARC IV/T1 Some
• Processor data bus ECC: EV7 Yes; Itanium 2 Yes; Opteron No; PA-8900 Yes; Xeon No; Power5/5+ Yes; UltraSPARC IV/T1 Yes
• Data poisoning and signaling for error recovery†: EV7 Yes; Itanium 2 Yes; Opteron No; PA-8900 No; Xeon No; Power5/5+ Yes; UltraSPARC IV/T1 No
• Enhanced MCA handling and error logging‡: EV7 No; Itanium 2 Yes; Opteron Limited; PA-8900 Proprietary method; Xeon Limited; Power5/5+ Proprietary method; UltraSPARC IV/T1 Limited
• Dynamic processor resilience: EV7 No; Itanium 2 Yes; Opteron No; PA-8900 Yes; Xeon No; Power5/5+ Yes; UltraSPARC IV/T1 Limited
† To prevent an unrecoverable error from propagating and corrupting data, the data is “poisoned”: marked as permanently bad, so that the system either forces a reread of that data or, in the worst case, forces a crash.
‡ Enhanced MCA (machine check architecture) handling and error logging are tools that allow more granular error containment. They keep errors from propagating and let recovery have the least possible impact.
System RAS Differences
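Systems compared: HP Alpha EV7, HP Itanium 2, HP Opteron, HP PA-RISC PA-8900, HP Pentium Xeon, IBM Power5/5+, Sun UltraSPARC IV and T1
• Data bus ECC protection: Yes on all systems
• Address bus parity protection: EV7 Yes; Itanium 2 Yes†; Opteron No; PA-8900 Yes†; Xeon No; Power5/5+ No; UltraSPARC IV/T1 No
• I/O bus ECC protection: EV7 Yes; Itanium 2 Yes; Opteron ?; PA-8900 Yes; Xeon ?; Power5/5+ Yes; UltraSPARC IV/T1 No
• Advanced ECC / double chip spare / chip kill: EV7 Yes; Itanium 2 Yes; Opteron No; PA-8900 Yes; Xeon Yes; Power5/5+ Yes; UltraSPARC IV/T1 No
• Page deallocation: EV7 Yes; Itanium 2 Yes; Opteron No; PA-8900 Yes; Xeon Yes; Power5/5+ No; UltraSPARC IV/T1 No
• Dynamic memory resilience†: EV7 No; Itanium 2 Yes; Opteron No; PA-8900 Yes; Xeon No; Power5/5+ No; UltraSPARC IV/T1 No
• Mirroring or RAID‡: EV7 Yes; Itanium 2 No; Opteron No; PA-8900 No; Xeon Yes; Power5/5+ Yes; UltraSPARC IV/T1 No
† Cellular systems only, with proprietary DIMMs
‡ Memory mirroring or RAID is an expensive way (in both cost and performance) to get the same protection as dynamic memory resilience (DMR)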
Processor Power Requirements
• Moore’s Law is alive and well, but as transistor features shrink toward atomic dimensions, leakage current and power are becoming critical
• If it costs you $1 to power the processor, it will cost you more than $1 to cool it while it runs
Processors compared: HP Alpha EV7; Intel Itanium 2 Madison & Montecito; AMD Opteron; HP PA-RISC PA-8900; Intel Pentium Xeon; IBM Power5/5+; Sun UltraSPARC IV and T1; Sun SPARC64 VI
Power figures shown: 80W (1 core), 130W (1 core), 95W (2 cores), 70W (2 cores), 89W (2 cores), 285W (2 cores), 90W (1 core), 120W (2 cores), 104W (2 cores), 79W (8 cores)
Extending The Addressing Of The x86 Processor: A Comparison Of The AMD Opteron And The Pentium Xeon with EM64T
How does a 32-bit platform address 64GBytes of memory?
• All Pentium processors since the Pentium Pro support 36-bit physical addresses
• The /PAE switch in BOOT.INI enables Intel Physical Address Extension (PAE), which uses the larger addresses
• Windows can then address all of physical memory, even though each 32-bit application can still use only 4GB
− 32-bit applications need no re-coding to run more instances
• If an application needs more, it can use the Microsoft Address Windowing Extensions (AWE) API to allocate large amounts of physical memory and map it into windows of its virtual address space
− Applications that need lots of memory can get it
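
As background on how the larger addresses are reached: with PAE, page table entries grow to 64 bits, and a 32-bit linear address for a 4KB page is split 2 + 9 + 9 + 12 bits across a three-level table walk, with the final entry supplying a physical frame number that can exceed 32 bits. A minimal sketch of that split, with hypothetical function names, assuming the 36-bit physical limit of the processors discussed here:

```python
def split_pae_address(linear):
    """Split a 32-bit linear address the way PAE paging does for 4KB pages."""
    if not 0 <= linear < 2 ** 32:
        raise ValueError("not a 32-bit address")
    pdpt_index = (linear >> 30) & 0x3    # 2 bits: one of 4 page-directory-pointer entries
    pd_index = (linear >> 21) & 0x1FF    # 9 bits: one of 512 page-directory entries
    pt_index = (linear >> 12) & 0x1FF    # 9 bits: one of 512 page-table entries
    offset = linear & 0xFFF              # 12 bits: byte within the 4KB page
    return pdpt_index, pd_index, pt_index, offset


def physical_address(frame_number, offset):
    """Combine a page frame number (at most 24 bits here) with the page offset."""
    if not 0 <= frame_number < 2 ** 24:
        raise ValueError("frame number does not fit in 36 physical address bits")
    return (frame_number << 12) | offset


print([hex(part) for part in split_pae_address(0xC0123456)])   # ['0x3', '0x0', '0x123', '0x456']
print(hex(physical_address(0xFFFFFF, 0xFFF)))                  # 0xfffffffff: the top of 64GB
```
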
x86 to x86-extensions - registers
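[Figure: x86 registers and their x86-64 extensions]
• General-purpose registers: the eight 32-bit registers EAX, EBX, ECX, EDX, ESP, EBP, ESI and EDI (with 16-bit forms ax, bx, cx, dx, sp, bp, si and di, and 8-bit ah and al for EAX) are extended to 64 bits (RAX and so on), and eight new registers R8 to R15 are added
• SSE and SSE2: the 128-bit registers XMM0 to XMM7 are joined by XMM8 to XMM15
• x87/MMX: the 80-bit registers MMX0/FPR0 to MMX7/FPR7 are unchanged
• Program counter: the 32-bit EIP (16-bit ip) is extended to 64 bits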
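
The narrower register names are views of the same register. A minimal Python model of one general-purpose register, with hypothetical class and property names; it includes the x86-64 rule that writing a 32-bit register (such as EAX) zero-extends into the full 64-bit register, while writing a 16-bit register leaves the upper bits alone.

```python
MASK64 = (1 << 64) - 1


class GPR:
    """One x86-64 general-purpose register and its narrower views (RAX/EAX/AX/AH/AL)."""

    def __init__(self, value=0):
        self.value = value & MASK64   # RAX

    @property
    def r32(self):                    # EAX
        return self.value & 0xFFFFFFFF

    @property
    def r16(self):                    # AX
        return self.value & 0xFFFF

    @property
    def high8(self):                  # AH
        return (self.value >> 8) & 0xFF

    @property
    def low8(self):                   # AL
        return self.value & 0xFF

    def write32(self, v):
        # In 64-bit mode, writing a 32-bit register clears the upper 32 bits.
        self.value = v & 0xFFFFFFFF

    def write16(self, v):
        # Writing a 16-bit register leaves the other 48 bits unchanged.
        self.value = (self.value & ~0xFFFF & MASK64) | (v & 0xFFFF)


rax = GPR(0x1122334455667788)
print(hex(rax.r32), hex(rax.r16), hex(rax.high8), hex(rax.low8))   # 0x55667788 0x7788 0x77 0x88
```
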
64-bit Extension Technology
Supported modes
• Legacy mode
− 32-bit OS
− 32-bit apps (no re-compile required)
− 32-bit drivers
• Compatibility mode
− 64-bit OS
− 32-bit apps (no re-compile required)
− 64-bit drivers
− 4 GB address space
− GPRs are 32-bit
• 64-bit mode
− 64-bit OS
− 64-bit apps (re-compile required)
− 64-bit drivers
− 64-bit flat virtual address space
− GPRs are 64-bit
• Under a 64-bit OS, the processor can switch between compatibility mode and 64-bit mode on a code-segment by code-segment basis
− Allows 32/16-bit applications to run under a 64-bit OS without recompiling
− Re-certification of the application may be required
Windows Server 64-bit Restrictions
• No mixed 64-bit/32-bit processes
− 64-bit programs cannot load and call 32-bit MDAC
− 64-bit Microsoft Internet Explorer cannot load 32-bit ActiveX controls
− The 64-bit shell cannot load 32-bit in-process (Inproc) shell extensions
− 32-bit installer programs cannot load and register 64-bit DLLs
• No 16-bit code
− No 16-bit code can run, except for recognized InstallShield and Acme installers (these are hard-coded to allow them to work)
− 16-bit Setup bootstraps are not supported
− 16-bit MS-DOS and Microsoft Windows 3.x utilities will not start
• No OS/2 or POSIX program support
• No kernel-mode 32-bit code
− 32-bit virus-detection or 32-bit file system filters
− 32-bit video adapter or 32-bit network adapter drivers
− 32-bit kernel-mode printer drivers
http://support.microsoft.com/default.aspx?scid=kb;en-us;282423
Memory limits for HW and SW
Hardware platform: hardware address bits; software platform: maximum qualified physical memory (EE = Enterprise Edition, DCE = Datacenter Edition)
• IA-32: 32 (4GB) or 36 (64GB)
− Windows 2003, 32-bit: 64GB (EE)
− Red Hat Enterprise Server 4: 64GB
• IA-32 Extended Memory 64 (EM64T): 40 (1TB)
− Windows 2003, 32-bit: 64GB (EE)
− Windows 2003, 64-bit: 1,024GB (EE)
− Red Hat Enterprise Server 4: 64GB
• Opteron: 40 (1TB)
− Windows 2003, 32-bit: 64GB (EE)
− Windows 2003, 64-bit: 1,024GB (EE)
− Red Hat Enterprise Server 4: 128GB
• Itanium: 50 (1PB)
− Windows 2003, 64-bit: 1,024GB (DCE)
− Red Hat Enterprise Server 4: 256GB
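
The capacities above follow from powers of two. A minimal sketch, with a hypothetical function name, converting address bits to binary units (1KB = 1024 bytes); it shows that 50 address bits reach 1PB and 64 bits would reach 16EB.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def addressable(bits):
    """Bytes reachable with `bits` address bits, in binary units (1KB = 1024 bytes)."""
    size, unit = 2 ** bits, 0
    while size >= 1024 and size % 1024 == 0 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size}{UNITS[unit]}"


print([addressable(b) for b in (32, 36, 40, 50, 64)])   # ['4GB', '64GB', '1TB', '1PB', '16EB']
```
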
Customer choice
• HP ProLiant servers: 1- to 8-way, x86 processor architecture
− Small- to medium-scale applications and databases
− Well-defined, less-complex workloads
− Primarily front-end/network edge and application tier
− Scale-out and small- to mid-size scale-up
• HP Integrity servers: 1- to 128-way, Intel® Itanium® architecture
− Medium- to large-scale applications and databases
− Complex workloads, technical and commercial
− Primarily back-end database and application tier
− Enterprise scale-up and scale-out
− Server consolidation
• The choice is driven by customer-specific needs
What is not important?
• 64 bits
− Intel Pentium Xeon with EM64T, AMD Opteron and Intel Itanium all use 64-bit memory addressing
− The operating systems running on those are fully 64-bit
− All major applications running on those are fully 64-bit
• Multi-core and multi-thread
− All architectures have multi-core and multi-threaded offerings
• Single-stream, processor-intensive benchmarks
− SPECint/SPECfp run inside the cache of the chip, and don’t tell you anything about memory performance, multiprocessor scalability or overall system throughput
• Comparing GHz between architectures is worse than useless
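
Why clock rate alone misleads: throughput is roughly the clock rate times the average number of instructions completed per cycle (IPC), and IPC varies widely between architectures. A minimal sketch with hypothetical chips and made-up numbers, not measurements of any real processor:

```python
def instructions_per_second(clock_ghz, ipc):
    """Throughput = clock rate x average instructions completed per cycle (IPC)."""
    return clock_ghz * 1e9 * ipc


fast_clock = instructions_per_second(clock_ghz=3.6, ipc=0.8)   # deep pipeline, high GHz
wide_core = instructions_per_second(clock_ghz=1.6, ipc=2.0)    # shorter pipeline, wider issue
print(f"{fast_clock:.2e} vs {wide_core:.2e}")                  # 2.88e+09 vs 3.20e+09
```
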
How do you decide?
• Pick your application
− But the major ISVs put their apps on many systems
• Pick your performance level
− But mid-range performance can be achieved by almost every platform, as very few people need 1M tpmC
• Pick your price point
− But vendors are all focused on the same price points
• So if you have too many choices (and you will)...
How do you decide?
• If your business requires, in a single system,
− Extreme reliability, availability and serviceability
− Scalability beyond 4 processors
− Scalability beyond 64GBytes of memory
− Scalability beyond 8 PCI slots
− Extreme performance under complex heavy loads
then you need Integrity
• If your business requires, in a single system,
− High reliability, availability and serviceability
− Scalability to 4 processors
− Scalability to 64GBytes of memory
− Scalability to 8 PCI slots
− High performance under single-function loads
then you need ProLiant
Resources
• http://www.mdronline.com/mpr – Microprocessor Report (subscription required)
• http://www.realworldtech.com – General processor information
• http://www.itaniumsolutionsalliance.com – Itanium information
Intel® Itanium® 2-based microarchitecture block diagram
[Block diagram: L1 instruction cache and fetch/pre-fetch engine, ITLB, branch prediction, instruction queue (8 bundles), 11 issue ports (B B B, M M M M, I I, F F), IA-32 decode and control, register stack engine/re-mapping, 128 integer registers, 128 floating-point registers, branch and predicate registers, branch units, integer and MM units, floating-point units, quad-port L1 data cache, DTLB, ALAT, scoreboard/predicate/NaTs/exceptions, quad-port L2 cache, L3 cache and bus controller. Features new for Montecito include Foxton, Pellston and added protection; structures are marked as ECC protected or parity protected.]
NOTE: new features reduce incidents of hard partition crashes due to CPU failures by 75%
Application Register Set
• 128 general registers (Gr0 to Gr127); Gr32 to Gr127 can be used with rotation
• 128 floating-point registers (FPr0 to FPr127); FPr32 to FPr127 can be used with rotation
• 64 one-bit predicate registers (P0 to P63)
• 8 branch registers (Br0 to Br7)
• Instruction pointer, current frame marker and user mask
• Application registers (Ar0 to Ar127), including kernel registers KR0 to KR7, Ar16 RSC, Ar17 BSP, Ar18 BSPSTORE, Ar19 RNAT, Ar21 FCR, Ar24 EFLAG, Ar25 CSD, Ar26 SSD, Ar27 CFLG, Ar28 FSR, Ar29 FIR, Ar30 FDR, Ar32 CCV, Ar36 UNAT, Ar40 FPSR, Ar44 ITC, Ar64 PFS, Ar65 LC and Ar66 EC
• Performance monitor data registers, processor identifiers, and the advanced load address table (ALAT)
IBM Inter-Processor Communication