Microprocessor system architectures – ARMv8
Jakub Yaghob
ARM architecture
RISC
Large uniform register file
Load/store architecture
Simple addressing modes
Execution states
AArch64 vs. AArch32
Architecture profiles
A – application profile
R – real-time profile
M – microcontroller profile
Execution states – AArch64
AArch64
31 64-bit general-purpose registers
X30 – procedure link
64-bit PC, SPs, ELRs (exception link registers)
32 128-bit SIMD registers
Single instruction set A64
Exception levels EL0-EL3
64-bit virtual addressing
Names each system register with a suffix that indicates the lowest EL that can access it
PSTATE (Process state)
Execution states – AArch32
AArch32
13 32-bit general-purpose registers
32-bit PC, SP, LR (link register)
Some registers banked for each execution mode
Single ELR (return from Hyp)
32 64-bit SIMD registers
A32 instruction set – fixed-length 32-bit encoding, compatible with ARMv7
T32 instruction set – variable-length (16-bit or 32-bit) encoding, compatible with ARMv7 Thumb
32-bit virtual addressing
CPSR (Current Program Status Register)
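Supported data types, cryptographic extension
Integer – B, H, W, D, Q (byte, halfword, word, doubleword, quadword: 8, 16, 32, 64, 128 bits)
Floating point – HP, SP, DP (half, single, double precision), IEEE 754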
Cryptographic extension
Operates on the vector register file
AES, SHA1, SHA2-256
Memory model
The ARM memory model supports
Generating an exception on an unaligned memory access
Restricting access by applications to specified areas of memory
Translating virtual addresses provided by executing instructions
into physical addresses
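AArch64 – 64-bit addressing; the TCR (Translation Control Register) determines the VA range; EL0+EL1 can use 2 independent VA ranges, each with its own translation table base register (TTBR0_EL1, TTBR1_EL1) and its own TCR fields
AArch32 – 32-bit addressing; the TCR determines the VA range; the OS can split the VA range into 2 subranges for EL0+EL1, each with its own translation table (TTBR0, TTBR1)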
Altering the interpretation of multi-byte data between big-endian
and little-endian
Controlling the order of accesses to memory
Controlling caches and address translation structures
Synchronizing access to shared memory by multiple processing elements (PEs)
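The two AArch64 VA ranges for EL0+EL1 can be sketched from the size fields of TCR_EL1. This is a minimal illustration, assuming the T0SZ and T1SZ fields (names not on the slide) and ignoring top-byte-ignore and larger-VA extensions: the TTBR0_EL1 range starts at address 0, the TTBR1_EL1 range ends at the top of the 64-bit space.

```python
from typing import Tuple

Range = Tuple[int, int]


def el10_va_ranges(t0sz: int, t1sz: int) -> Tuple[Range, Range]:
    """Inclusive (low, high) VA ranges translated via TTBR0_EL1 and TTBR1_EL1.

    Sketch: each TnSZ field describes a region of 2**(64 - TnSZ) bytes.
    """
    lower = (0, (1 << (64 - t0sz)) - 1)                      # TTBR0_EL1: from 0 upward
    upper = ((1 << 64) - (1 << (64 - t1sz)), (1 << 64) - 1)  # TTBR1_EL1: down from the top
    return lower, upper


print([tuple(hex(v) for v in r) for r in el10_va_ranges(16, 16)])
```

With T0SZ = T1SZ = 16 both ranges are 48 bits wide; an access to an address in the gap between them causes a translation fault.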
Application architecture –
AArch64
31 general-purpose registers R0-R30
64-bit GP registers X0-X30
32-bit GP registers W0-W30
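Register number 31 (1Fh) encodes the zero register (XZR/WZR) or SP, depending on the instruction
32 vector registers V0-V31
FPCR, FPSR – floating-point control and status registers
SP – 64-bit current stack pointer
WSP – low 32 bits of the current stack pointer
X30 – procedure link register
PC – 64-bit program counter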
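A toy model of the register aliasing above: Wn is the low half of Xn, and register number 31 reads as zero and ignores writes when it means ZR. The class name GPRegs is hypothetical, the SP interpretation of register 31 is not modelled, and the rule that writing Wn zeroes bits [63:32] of Xn comes from the A64 architecture rather than from the slide.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


class GPRegs:
    """Toy A64 general-purpose register file; register 31 is treated as XZR/WZR."""

    def __init__(self) -> None:
        self.x = [0] * 31  # X0-X30

    def read_x(self, n: int) -> int:
        return 0 if n == 31 else self.x[n]  # XZR always reads as zero

    def read_w(self, n: int) -> int:
        return self.read_x(n) & MASK32  # Wn is the low 32 bits of Xn

    def write_x(self, n: int, value: int) -> None:
        if n != 31:  # writes to XZR are discarded
            self.x[n] = value & MASK64

    def write_w(self, n: int, value: int) -> None:
        # A 32-bit write zero-extends into the whole 64-bit Xn.
        self.write_x(n, value & MASK32)
```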
Application architecture –
AArch64 – vector registers
Application architecture –
AArch64 – PSTATE
Process state for EL0
Data processing flags
N – negative
Z – zero
C – carry
V – overflow
Exception masking bits
D – debug mask
A – SError (system error) interrupt mask
I – IRQ mask
F – FIQ mask
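How the data processing flags are set can be shown for a 64-bit add that updates flags (ADDS). This is a sketch: the helper names are hypothetical, and the NZCV bit positions (N = bit 31 down to V = bit 28) come from the AArch64 NZCV register layout, not from the slide.

```python
from typing import Dict, Tuple

MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def adds64(a: int, b: int) -> Tuple[int, Dict[str, int]]:
    """Model a 64-bit ADDS: return the result and the N, Z, C, V flags."""
    a &= MASK64
    b &= MASK64
    full = a + b
    result = full & MASK64
    flags = {
        "N": int((result & SIGN64) != 0),  # result is negative (bit 63 set)
        "Z": int(result == 0),  # result is zero
        "C": int(full > MASK64),  # unsigned carry out of bit 63
        # Signed overflow: operands share a sign that differs from the result's.
        "V": int(((a ^ result) & (b ^ result) & SIGN64) != 0),
    }
    return result, flags


def nzcv_register(flags: Dict[str, int]) -> int:
    """Pack the flags as they appear in the NZCV system register (bits 31:28)."""
    return (flags["N"] << 31) | (flags["Z"] << 30) | (flags["C"] << 29) | (flags["V"] << 28)
```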
System registers
Register naming
<register_name>_ELx, x∈{0,1,2,3} – the suffix gives the lowest EL that can access the register
General system control registers
Debug registers
Generic timer registers
Performance monitor registers – optional
Trace registers – optional
Generic Interrupt Controller (GIC) CPU interface registers – optional
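The naming rule can be expressed as a small check. This sketch covers only the naming convention: the helper names are hypothetical, and a real access can still be trapped or disabled by configuration (for example, EL0 access to some _EL0 registers depends on EL1 settings).

```python
import re

_SUFFIX = re.compile(r"_EL([0-3])$")


def lowest_el(name: str) -> int:
    """Lowest exception level encoded in a system register's _ELx suffix."""
    match = _SUFFIX.search(name)
    if match is None:
        raise ValueError(f"{name!r} has no _ELx suffix")
    return int(match.group(1))


def may_access_by_name(name: str, current_el: int) -> bool:
    """Naming rule only: the register is visible at its own EL and above."""
    return current_el >= lowest_el(name)


print(may_access_by_name("TTBR0_EL1", 0), may_access_by_name("VBAR_EL2", 3))
```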
Software control and EL0
Exception handling
Interrupts
Memory system aborts
Undefined instructions
System calls
Secure monitor or Hypervisor traps
System instructions for control flow
WFI – Wait For Interrupt
WFE – Wait For Event
YIELD – hint
WFI and WFE can enter a low-power state
Cache management
Some cache maintenance instructions can be used at EL0 only if enabled by EL1
Debug events
BKPT (A32/T32), BRK (A64) – breakpoint instruction
DBG – hint to the debug system (A32/T32)
HLT – entry to Debug state
Caches and memory hierarchy
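Point of Unification (PoU)
The instruction cache, data cache and translation table walks of a PE see the same copy of a memory location
Point of Coherency (PoC)
All agents that can access memory are guaranteed to see the same copy of a memory location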
Memory types
Normal
Used for bulk memory operations, both read/write and read-only
Device
Speculative data reads forbidden
Additional attributes (Device memory)
Gathering (G/nG) – nG prevents aggregation of reads/writes into a single access
Reordering (R/nR) – nR preserves access order and synchronization requirements
Early write acknowledgement (E/nE) – with E, a write can be acknowledged other than at the end point
Shareability (Normal memory)
Non-shareable, inner shareable, outer shareable
Cacheability (Normal memory)
Non-cacheable, write-through cacheable, write-back cacheable
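In AArch64 these attributes are not stored directly in translation table entries: an entry holds a 3-bit index (AttrIndx) into the eight 8-bit fields of MAIR_ELx. The sketch below packs such a register value. MAIR_ELx and the encodings listed come from the ARMv8-A architecture rather than from the slide; the constant and function names are illustrative.

```python
from typing import List

# 8-bit MAIR_ELx attribute encodings (ARMv8-A).
DEVICE_nGnRnE = 0x00
DEVICE_nGnRE = 0x04
DEVICE_nGRE = 0x08
DEVICE_GRE = 0x0C
NORMAL_NC = 0x44  # Normal, inner and outer non-cacheable
NORMAL_WB = 0xFF  # Normal, inner and outer write-back, read/write-allocate


def mair_value(attrs: List[int]) -> int:
    """Pack up to eight attribute bytes; Attr<n> occupies bits [8n+7:8n]."""
    if len(attrs) > 8:
        raise ValueError("MAIR_ELx holds at most eight attributes")
    value = 0
    for index, attr in enumerate(attrs):
        value |= (attr & 0xFF) << (8 * index)
    return value


print(hex(mair_value([DEVICE_nGnRnE, DEVICE_nGnRE, NORMAL_NC, NORMAL_WB])))
```

A page mapped with AttrIndx = 3 would then use Attr3, Normal write-back memory.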
Alignment
Instruction alignment
A64 instructions must be word-aligned (4-byte boundary)
Data alignment
Unaligned access to any Device memory causes an Alignment fault
Normal memory
SCTLR_ELx.A – configures unaligned access behavior
A = 1: generate an Alignment fault
A = 0: perform the unaligned access (unaligned exclusive and load-acquire/store-release accesses still fault)
Unaligned access
Not guaranteed to be atomic
Takes a number of additional cycles
Can abort on more than one of its memory accesses
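The alignment rules can be checked with simple arithmetic. In this sketch natural alignment means the address is a multiple of the access size, and a 4 KB page size is assumed to show why an unaligned access can abort more than once: if it spans two pages, each page is translated and can fault separately. The helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KB translation granule


def is_aligned(address: int, size: int) -> bool:
    """True if an access of `size` bytes at `address` is naturally aligned."""
    return address % size == 0


def pages_touched(address: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of distinct pages an access spans; each one can abort on its own."""
    first = address // page_size
    last = (address + size - 1) // page_size
    return last - first + 1


def valid_a64_pc(pc: int) -> bool:
    """A64 instructions are 32 bits wide and must be word-aligned."""
    return pc % 4 == 0
```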
Endian support
Instruction endianness
A64 instructions are always little-endian
Data endianness
SCTLR_EL1.E0E – configures data endianness at EL0
SCTLR_ELx.EE – configures data endianness at EL1 or higher
Instructions for reversing the byte order of data in registers
REV16, REV32, REV64
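The effect of the byte-reversal instructions on a 64-bit register can be modelled directly. This is a sketch of the semantics: REV16 swaps bytes inside each halfword, REV32 inside each word, REV64 across the whole doubleword. The function names mirror the mnemonics but are plain Python helpers.

```python
MASK64 = (1 << 64) - 1


def _reverse_chunks(value: int, chunk_bytes: int) -> int:
    """Reverse byte order inside each chunk_bytes-sized chunk of a 64-bit value."""
    data = (value & MASK64).to_bytes(8, "big")
    out = b"".join(data[i:i + chunk_bytes][::-1] for i in range(0, 8, chunk_bytes))
    return int.from_bytes(out, "big")


def rev16(x: int) -> int:
    return _reverse_chunks(x, 2)  # bytes swapped within each 16-bit halfword


def rev32(x: int) -> int:
    return _reverse_chunks(x, 4)  # bytes reversed within each 32-bit word


def rev64(x: int) -> int:
    return _reverse_chunks(x, 8)  # all eight bytes reversed


print(hex(rev32(0x0011223344556677)))
```

A typical use is converting big-endian data (for example network headers) after loading it with a little-endian load.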
Synchronization and
semaphores
Load-exclusive instructions – LDXP, LDXR, LDXRH, LDXRB
Store-exclusive instructions – STXP, STXR, STXRH, STXRB
Clear-exclusive – CLREX
Designed to scale to multiprocessor systems
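The usual LDXR/STXR retry loop can be illustrated with a toy exclusive monitor. This is a simplification under stated assumptions: one PE, one tagged address, and a single external_write hook standing in for another observer (real hardware combines local and global monitors). The class and method names are hypothetical; the 0 = success, 1 = failure status matches what STXR writes to its status register.

```python
from typing import Dict, Optional


class ExclusiveMonitor:
    """Toy model of one PE's exclusive monitor plus a memory map."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}
        self.tagged: Optional[int] = None  # address marked by the last LDXR

    def ldxr(self, address: int) -> int:
        self.tagged = address
        return self.memory.get(address, 0)

    def stxr(self, address: int, value: int) -> int:
        """Return 0 if the store happened, 1 if it failed."""
        ok = self.tagged == address
        self.tagged = None  # the monitor is cleared whether or not the store succeeds
        if not ok:
            return 1
        self.memory[address] = value
        return 0

    def clrex(self) -> None:
        self.tagged = None

    def external_write(self, address: int, value: int) -> None:
        """Another observer writes the location, which breaks the reservation."""
        self.memory[address] = value
        if self.tagged == address:
            self.tagged = None


def atomic_increment(monitor: ExclusiveMonitor, address: int) -> int:
    """LDXR/STXR loop: retry until the store-exclusive reports success."""
    while True:
        old = monitor.ldxr(address)
        if monitor.stxr(address, old + 1) == 0:
            return old + 1
```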
Exception levels
Exception levels EL0-EL3
EL0 – unprivileged execution, applications
EL1 – OS kernel
EL2 – supports virtualization of non-secure operation,
hypervisor
EL3 – supports switching between two security states
(secure state, non-secure state), secure monitor
All implementations must include EL0 and EL1; EL2 and EL3 are optional
Stack pointer register selection
EL0 always uses SP_EL0; at ELx (x > 0), PSTATE.SP selects SP_EL0 or SP_ELx
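Stack pointer selection as a function. In this sketch spsel models PSTATE.SP and the function name is hypothetical. Taking an exception to ELx sets PSTATE.SP to 1, so a handler starts on SP_ELx.

```python
def selected_sp(el: int, spsel: int) -> str:
    """Name of the stack pointer in use at exception level `el`.

    `spsel` models PSTATE.SP: 0 selects SP_EL0, 1 selects SP_ELx.
    """
    if not 0 <= el <= 3:
        raise ValueError("exception level must be 0-3")
    if el == 0 or spsel == 0:
        return "SP_EL0"  # EL0 has no SP selection; it always uses SP_EL0
    return f"SP_EL{el}"
```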
Exception levels
Exception mechanism
Saved Program Status Register
Saves PE state on taking exceptions
SPSR_ELx for exception taken to ELx
When returning from an exception (ERET), PE state is restored from SPSR_ELx
Exception link registers
ELR_ELx holds preferred exception return address
Exception vectors
Vector Base Address Register (VBAR_ELx)
One for each of EL1, EL2 and EL3
Defines the base address of the vector table for that ELx
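With VBAR_ELx as the base, the handler address depends on where the exception came from and on its type. The offsets below are the AArch64 vector table layout (16 entries of 0x80 bytes, table aligned to 2 KB); they come from the architecture rather than the slide, and the dictionary keys and function name are illustrative.

```python
EXCEPTION_TYPE_OFFSET = {"sync": 0x000, "irq": 0x080, "fiq": 0x100, "serror": 0x180}

SOURCE_OFFSET = {
    "current_el_sp0": 0x000,  # from the current EL while using SP_EL0
    "current_el_spx": 0x200,  # from the current EL while using SP_ELx
    "lower_el_aarch64": 0x400,  # from a lower EL running in AArch64
    "lower_el_aarch32": 0x600,  # from a lower EL running in AArch32
}


def vector_address(vbar: int, source: str, kind: str) -> int:
    """Address of the vector table entry the PE jumps to."""
    if vbar & 0x7FF:
        raise ValueError("vector table base must be 2 KB aligned")
    return vbar + SOURCE_OFFSET[source] + EXCEPTION_TYPE_OFFSET[kind]
```

For example, an SVC executed at EL0 in AArch64 is a synchronous exception from a lower EL, so it lands at VBAR_EL1 + 0x400.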
System calls
SVC – Supervisor call exception
Lets EL0 call the OS at EL1
HVC – Hypervisor call exception, taken to EL2
For EL1 and higher
SMC – Secure monitor call exception, taken to EL3
For EL1 and higher
Virtual Memory System
Architecture
VMSA
Provides MMU
MMU translates VAs to PAs independently for each exception level and security state
In ARMv8.0, A64 supports VAs and PAs of up to 48 bits
Address translation system
VMSAv8-64
Translation Table Base Register (TTBR)
Translation Control Register (TCR)
Up to four levels of address lookup
IA of up to 48 bits
OA of up to 48 bits
A translation granule size of 4K, 16K, 64K
4K translation granule
16K translation granule
64K translation granule
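For each granule, a table fills exactly one granule with 8-byte entries, so every level resolves (granule bits − 3) VA bits and the page offset takes the granule bits. The sketch below splits a 48-bit VA into per-level table indices: 4K gives levels 0-3 with 9 bits each, 16K gives a 1-bit level 0, and 64K starts at level 1 with 6 bits. The function name and the return format are illustrative.

```python
from typing import Dict, Tuple

GRANULE_BITS = {"4K": 12, "16K": 14, "64K": 16}


def split_va(va: int, granule: str, va_bits: int = 48) -> Tuple[Dict[int, int], int]:
    """Split a VA into {level: table index} and the offset within the page."""
    page_bits = GRANULE_BITS[granule]
    stride = page_bits - 3  # one granule-sized table holds 2**stride 8-byte entries
    offset = va & ((1 << page_bits) - 1)
    indices: Dict[int, int] = {}
    level, shift = 3, page_bits
    while shift < va_bits:
        width = min(stride, va_bits - shift)  # the top level may resolve fewer bits
        indices[level] = (va >> shift) & ((1 << width) - 1)
        level -= 1
        shift += stride
    return indices, offset
```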
Translation table entries –
levels 0-2
Translation table entries –
level 3
Attribute fields
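The entry formats can be illustrated by a decoder for stage 1 descriptors. This sketch assumes the 4K granule, 48-bit output addresses and the ARMv8-A stage 1 bit positions (AttrIndx [4:2], AP [7:6], SH [9:8], AF [10], nG [11], PXN [53], UXN [54]); it does not check RES0 bits or the hierarchical attributes held in table descriptors, and the function name is hypothetical.

```python
from typing import Any, Dict

OA_MASK = 0x0000_FFFF_FFFF_F000  # output address bits [47:12]


def decode_descriptor(desc: int, level: int) -> Dict[str, Any]:
    """Decode a VMSAv8-64 stage 1 descriptor (4K granule, 48-bit OA; sketch)."""
    low_bits = desc & 0b11
    if (low_bits & 1) == 0:
        return {"type": "invalid"}
    if level == 3:
        if low_bits != 0b11:
            return {"type": "invalid"}  # 0b01 is reserved at level 3
        kind = "page"
    elif low_bits == 0b11:
        return {"type": "table", "next_table": desc & OA_MASK}
    elif level == 0:
        return {"type": "invalid"}  # no level 0 blocks with the 4K granule
    else:
        kind = "block"
    return {
        "type": kind,
        "output_address": desc & OA_MASK,  # assumes RES0 bits below a block are zero
        "attr_index": (desc >> 2) & 0b111,  # selects an Attr<n> field of MAIR_ELx
        "ap": (desc >> 6) & 0b11,  # access permissions
        "sh": (desc >> 8) & 0b11,  # shareability
        "af": (desc >> 10) & 1,  # access flag
        "ng": (desc >> 11) & 1,  # not global: entry is tagged with an ASID
        "pxn": (desc >> 53) & 1,  # privileged execute-never
        "uxn": (desc >> 54) & 1,  # unprivileged execute-never
    }
```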
MMU faults
Types of MMU faults
Alignment fault
Permission fault
Translation fault
Address size fault
Synchronous external abort on a translation table
walk
Access flag fault
TLB conflict abort
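These fault types reach the handler as a fault status code. As a sketch, assuming the ARMv8-A DFSC/IFSC encoding held in ESR_ELx.ISS bits [5:0] (not on the slide), the code maps a status code to a fault type and, for the per-level faults, the translation table level. Reserved encodings are not rejected, and the names are illustrative.

```python
from typing import Optional, Tuple

# Upper four bits of the status code -> fault type; the low two bits give the level.
_LEVEL_FAULTS = {
    0b0000: "Address size fault",
    0b0001: "Translation fault",
    0b0010: "Access flag fault",
    0b0011: "Permission fault",
    0b0101: "Synchronous external abort on a translation table walk",
}


def decode_fsc(fsc: int) -> Tuple[str, Optional[int]]:
    """Return (fault type, level or None) for a DFSC/IFSC value."""
    fsc &= 0x3F
    if fsc == 0b100001:
        return ("Alignment fault", None)
    if fsc == 0b110000:
        return ("TLB conflict abort", None)
    group = fsc >> 2
    if group in _LEVEL_FAULTS:
        return (_LEVEL_FAULTS[group], fsc & 0b11)
    return ("Other", None)
```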
Translation Lookaside Buffers
(TLB)
TLB
Caches results from translation table walks
Global pages
Process-specific pages – tagged with an Address Space Identifier (ASID)
ASID size is implementation defined: 8 or 16 bits
Virtual Machine Identifier (VMID)
Concept of locked entries
Optional for implementation
Maintenance instructions
TLBI <operation>{,Xt}
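A toy model of how the global bit, ASIDs and VMIDs affect TLB lookups and maintenance. The class and method names are hypothetical: tlbi_aside1 and tlbi_vmalle1 only mimic the effect of TLBI ASIDE1 and TLBI VMALLE1, and levels, page sizes and broadcast to other PEs are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TlbEntry:
    vmid: int
    asid: int
    va_page: int
    pa_page: int
    global_page: bool  # models nG == 0 in the translation table entry


class ToyTlb:
    def __init__(self) -> None:
        self.entries: List[TlbEntry] = []

    def fill(self, entry: TlbEntry) -> None:
        self.entries.append(entry)

    def lookup(self, vmid: int, asid: int, va_page: int) -> Optional[int]:
        """Global entries match any ASID; process-specific ones need the same ASID."""
        for e in self.entries:
            if e.vmid == vmid and e.va_page == va_page and (e.global_page or e.asid == asid):
                return e.pa_page
        return None

    def tlbi_aside1(self, vmid: int, asid: int) -> None:
        """Like TLBI ASIDE1: drop non-global entries for one ASID in this VMID."""
        self.entries = [
            e for e in self.entries
            if not (e.vmid == vmid and not e.global_page and e.asid == asid)
        ]

    def tlbi_vmalle1(self, vmid: int) -> None:
        """Like TLBI VMALLE1: drop all stage 1 entries for this VMID."""
        self.entries = [e for e in self.entries if e.vmid != vmid]
```

Tagging entries this way lets an OS switch processes by changing the ASID instead of invalidating the whole TLB.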