Transcript ARMv8

Microprocessor system
architectures – ARMv8
Jakub Yaghob
ARM architecture





RISC
Large uniform register file
Load/store architecture
Simple addressing modes
Execution states


AArch64 and AArch32
Architecture profiles



A – application profile
R – real-time profile
M – microcontroller profile
Execution states – AArch64

AArch64

31 64-bit general-purpose registers








X30 – procedure link register
64-bit PC, SPs, ELRs (exception link registers)
32 128-bit SIMD registers
Single instruction set A64
Exception levels EL0-EL3
64-bit virtual addressing
System register names carry a suffix that indicates
the lowest EL with access
PSTATE (Process state)
Execution states – AArch32

AArch32









13 32-bit general purpose registers
32-bit PC, SP, LR (link register)
Some registers banked for each execution mode
Single ELR (return from Hyp)
32 64-bit SIMD registers
A32 instruction set – fixed length encoding,
compatible with ARMv7
T32 instruction set – variable-length, compatible with
ARMv7 Thumb
32-bit virtual address
CPSR (current program state register)
Supported data types,
cryptographic extension

Integer

B, H, W, D, Q (byte, halfword, word, doubleword, quadword)

Floating point

HP, SP, DP (half, single, double precision)
IEEE 754
Cryptographic extension


Operates on the vector register file
AES, SHA1, SHA2-256
Memory model

The ARM memory model supports



Generating an exception on an unaligned memory access
Restricting access by applications to specified areas of memory
Translating virtual addresses provided by executing instructions
into physical addresses






AArch64 – 64-bit addressing; the Translation Control Register (TCR)
determines the VA range; EL0 and EL1 share two independent VA ranges,
each with its own translation controls
AArch32 – 32-bit addressing; the TCR determines the VA range; the OS can
split the VA range into two subranges for EL0 and EL1, each with its own
translation controls
Altering the interpretation of multi-byte data between big-endian
and little-endian
Controlling the order of accesses to memory
Controlling caches and address translation structures
Synchronizing access to shared memory by multiple PEs
Application architecture –
AArch64

31 general-purpose registers R0-R30

64-bit GP registers X0-X30
X30 – procedure link register
32-bit GP registers W0-W30
Encoding 1Fh for register used as ZR (zero register)

32 vector registers V0-V31
FPCR, FPSR – floating-point control and status
registers
SP 64-bit

WSP 32-bit
Current SP

PC 64-bit
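The relationship between the X and W views can be sketched as a toy register file. This is only an illustrative model, not part of the slides: the class and method names are made up, and it treats register number 31 purely as ZR (in real encodings the same number selects SP for some instructions). It assumes the architectural rule that Wn is the low 32 bits of Xn and that a write to Wn clears the upper 32 bits.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
ZR = 31  # register number 1Fh, modelled here only as the zero register


class RegisterFile:
    """Hypothetical toy model of the AArch64 general-purpose registers."""

    def __init__(self):
        self.r = [0] * 31  # R0-R30

    def read_x(self, n):
        # Reading the zero register always yields 0.
        return 0 if n == ZR else self.r[n]

    def read_w(self, n):
        # Wn is the low 32 bits of Xn.
        return self.read_x(n) & MASK32

    def write_x(self, n, value):
        if n != ZR:  # writes to the zero register are discarded
            self.r[n] = value & MASK64

    def write_w(self, n, value):
        # A write to Wn zeroes bits [63:32] of Xn.
        self.write_x(n, value & MASK32)


regs = RegisterFile()
regs.write_x(0, 0x1122334455667788)
low_half = regs.read_w(0)        # low 32 bits of X0
regs.write_w(0, 0xDEADBEEF)      # upper half of X0 is cleared by this write
```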
Application architecture –
AArch64 – vector registers
Application architecture –
AArch64 – PSTATE

Process state for EL0

Data processing flags





N – negative
Z – zero
C – carry
V – overflow
Exception masking bits




D – debug mask
A – system error mask
I – IRQ mask
F – FIQ mask
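How the four data processing flags are derived can be sketched with a 64-bit flag-setting add. The function name `adds64` and the dictionary layout are assumptions made for illustration; only the meaning of N, Z, C and V comes from the slide.

```python
def adds64(a, b):
    """Return (result, flags) for a 64-bit add that sets N, Z, C and V."""
    mask = (1 << 64) - 1
    sign = 1 << 63
    a &= mask
    b &= mask
    full = a + b            # unbounded Python integer, keeps the carry out
    result = full & mask
    flags = {
        "N": int(bool(result & sign)),   # result is negative when read as signed
        "Z": int(result == 0),           # result is zero
        "C": int(full > mask),           # unsigned carry out of bit 63
        # Signed overflow: both operands share a sign that the result lacks.
        "V": int(bool(~(a ^ b) & (a ^ result) & sign)),
    }
    return result, flags


# Largest positive signed value plus one overflows into the sign bit.
result, flags = adds64(0x7FFFFFFFFFFFFFFF, 1)
```

A conditional branch then only needs to inspect these four bits, which is why they live in PSTATE and are visible at EL0.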
System registers

Register naming

<register_name>_ELx, x∈{0,1,2,3}

General system control registers
Debug registers
Generic timer registers
Performance monitor registers (optional)
Trace registers (optional)
Generic Interrupt Controller (GIC) CPU interface registers (optional)
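The naming convention can be illustrated by splitting a register name into its base name and the exception level in its suffix. The helper `lowest_access_el` is hypothetical; the example names (SCTLR_EL1, TTBR0_EL1, VBAR_EL3) are real AArch64 system registers, and the sketch assumes every name of interest follows the suffix pattern.

```python
import re


def lowest_access_el(register_name):
    """Split 'NAME_ELx' into (NAME, x); x is the lowest EL that can access it."""
    match = re.fullmatch(r"([A-Z0-9_]+)_EL([0-3])", register_name)
    if match is None:
        raise ValueError(f"{register_name!r} does not follow the _ELx convention")
    return match.group(1), int(match.group(2))


for name in ("SCTLR_EL1", "TTBR0_EL1", "VBAR_EL3"):
    base, el = lowest_access_el(name)
    print(f"{base}: accessible from EL{el} upwards")
```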
Software control and EL0

Exception handling

Interrupts
Memory system aborts
Undefined instructions
System calls
Secure monitor or Hypervisor traps

System instructions for control flow

WFI – Wait For Interrupt
WFE – Wait For Event
YIELD – hint
Can enter low-power state

Cache management

Must be enabled by EL1

Debug events

BKPT – breakpoint
DBG – hint to the debug system
HLT – entry to Debug state
Caches and memory hierarchy

Point of Unification


Instruction and data caches (IC, DC) see the same copy of a memory location
Point of Coherency

All agents that can access memory are guaranteed to see the same copy of a memory location
Memory types

Normal


Bulk memory operations, R/W, R/O
Device


Speculative reads forbidden
Additional attributes

Gathering

Non-gathering prevents aggregation of R/W

Reordering

Non-reordering preserves access order and synchronization requirements

Early write acknowledgement

Write can be acknowledged other than at the end point

Shareability

Non-shareable, inner shareable, outer shareable

Cacheability

Non-cacheable, write-through cacheable, write-back cacheable
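The Gathering, Reordering and Early write acknowledgement attributes combine into the names of the Device memory types. The sketch below builds such a name from three booleans; the function name is an assumption, and the set of four valid combinations is taken from the ARMv8 architecture rather than from the slides.

```python
# The four Device memory types defined by ARMv8.
ARCHITECTED_DEVICE_TYPES = {
    "Device-nGnRnE",
    "Device-nGnRE",
    "Device-nGRE",
    "Device-GRE",
}


def device_type(gathering, reordering, early_write_ack):
    """Build the Device memory type name for a combination of attributes."""
    name = (
        "Device-"
        + ("G" if gathering else "nG")
        + ("R" if reordering else "nR")
        + ("E" if early_write_ack else "nE")
    )
    if name not in ARCHITECTED_DEVICE_TYPES:
        raise ValueError(f"{name} is not an architected Device memory type")
    return name


strictest = device_type(False, False, False)   # no gathering, no reordering, no early ack
most_relaxed = device_type(True, True, True)
```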
Alignment

Instruction alignment


A64 instructions must be word-aligned
Data alignment


Unaligned access to any Device memory causes an
Alignment fault
Normal memory

SCTLR_ELx.A – configures unaligned access behavior



Generate an Alignment fault
Perform an unaligned access
Unaligned access



Not guaranteed to be atomic
Takes a number of additional cycles
Can abort more than once, because it may be split into several memory accesses
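The data alignment rules above can be summarized as a small decision function. This is a simplified sketch: the function name, the string results and the `memory_type` values are illustrative assumptions, and special cases such as exclusive accesses are not modelled.

```python
def check_access(address, size, memory_type, sctlr_a):
    """Decide what happens to a data access of `size` bytes at `address`.

    memory_type is "Normal" or "Device"; sctlr_a models the SCTLR_ELx.A bit.
    """
    if address % size == 0:
        return "aligned access"
    if memory_type == "Device":
        # Unaligned access to Device memory always faults.
        return "alignment fault"
    if sctlr_a:
        # Alignment checking enabled for Normal memory.
        return "alignment fault"
    return "unaligned access"


outcome = check_access(0x1002, 4, "Normal", sctlr_a=False)
```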
Endian support

Instruction endianness


A64 instructions are always little-endian
Data endianness


SCTLR_EL1.E0E – selects data endianness for EL0,
configurable from EL1 or higher
Instructions for reversing byte order in registers

REV16, REV32, REV64
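What the three byte-reverse instructions do to a 64-bit register value can be sketched as follows. The Python function names mirror the mnemonics but are otherwise an illustration: REV16 is modelled as reversing the bytes inside each 16-bit halfword, REV32 inside each 32-bit word, and REV64 across the whole doubleword.

```python
def _reverse_in_chunks(value, chunk_bits, reg_bits=64):
    """Reverse the byte order inside each chunk_bits-wide slice of value."""
    chunk_bytes = chunk_bits // 8
    chunk_mask = (1 << chunk_bits) - 1
    out = 0
    for shift in range(0, reg_bits, chunk_bits):
        chunk = (value >> shift) & chunk_mask
        swapped = int.from_bytes(chunk.to_bytes(chunk_bytes, "little"), "big")
        out |= swapped << shift
    return out


def rev16(x):
    return _reverse_in_chunks(x, 16)


def rev32(x):
    return _reverse_in_chunks(x, 32)


def rev64(x):
    return _reverse_in_chunks(x, 64)


value = 0x0102030405060708
for op in (rev16, rev32, rev64):
    print(op.__name__, hex(op(value)))
```

Software typically uses these after loading data whose endianness differs from the current data endianness, for example network byte order on a little-endian system.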
Synchronization and
semaphores

Load-exclusive instructions

LDXP, LDXR, LDXRH, LDXRB

Store-exclusive instructions

STXP, STXR, STXRH, STXRB

Clear-exclusive

CLREX

Should scale on MPS
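The way load-exclusive and store-exclusive pair up can be sketched with a toy exclusive monitor. Everything here is a simplified assumption (one reservation per PE, exact-address matching, no reservation granule); it only shows the retry pattern: a store-exclusive succeeds only if the PE still holds the reservation it obtained from the matching load-exclusive, and reports 0 for success or 1 for failure.

```python
class ExclusiveMonitor:
    """Hypothetical toy model of exclusive access monitoring."""

    def __init__(self):
        self.memory = {}
        self.reservation = {}  # PE id -> reserved address

    def ldxr(self, pe, addr):
        # Load-exclusive: read the value and mark the address as reserved.
        self.reservation[pe] = addr
        return self.memory.get(addr, 0)

    def stxr(self, pe, addr, value):
        # Store-exclusive: 0 on success, 1 if the reservation was lost.
        if self.reservation.get(pe) != addr:
            return 1
        self.memory[addr] = value
        # A successful store clears every reservation on this address.
        for other in [p for p, a in self.reservation.items() if a == addr]:
            del self.reservation[other]
        return 0

    def clrex(self, pe):
        # Clear-exclusive: drop this PE's reservation.
        self.reservation.pop(pe, None)


def atomic_add(monitor, pe, addr, delta):
    """Retry loop: load-exclusive, modify, store-exclusive until it succeeds."""
    while True:
        old = monitor.ldxr(pe, addr)
        if monitor.stxr(pe, addr, old + delta) == 0:
            return old


mon = ExclusiveMonitor()
mon.ldxr(0, 0x100)                   # PE 0 reserves the location
atomic_add(mon, 1, 0x100, 5)         # PE 1 updates it first
status = mon.stxr(0, 0x100, 99)      # PE 0's store-exclusive now fails
```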
Exception levels

Exception levels EL0-EL3






EL0 – unprivileged execution, applications
EL1 – OS kernel
EL2 – supports virtualization of non-secure operation,
hypervisor
EL3 – supports switching between two security states
(secure state, non-secure state), secure monitor
All implementations must include EL0 and EL1
Stack pointer register selection

SP_ELx
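Stack pointer selection can be sketched as a function of the current exception level and a selector bit. The function name is made up; the rule it encodes is the architectural one: EL0 always uses SP_EL0, while a higher level uses either SP_EL0 or its own SP_ELx depending on the SPSel setting.

```python
def selected_sp(current_el, spsel):
    """Return the name of the stack pointer in use at current_el."""
    if current_el not in (0, 1, 2, 3):
        raise ValueError("exception level must be 0..3")
    if current_el == 0 or spsel == 0:
        return "SP_EL0"
    return f"SP_EL{current_el}"


kernel_sp = selected_sp(current_el=1, spsel=1)
```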
Exception levels
Exception mechanism

Saved Program Status Register




Saves PE state on taking exceptions
SPSR_ELx for exception taken to ELx
When returning from an exception, PE state is restored
to the state stored in SPSR_ELx
Exception link registers

ELR_ELx holds preferred exception return address
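The interplay of SPSR_ELx and ELR_ELx on exception entry and return can be sketched with a small state model. The class, field and function names are illustrative assumptions; the sketch also assumes the AArch64 behavior that the D, A, I and F mask bits are set on exception entry.

```python
from dataclasses import dataclass, field


def _clear_pstate():
    return {"N": 0, "Z": 0, "C": 0, "V": 0, "D": 0, "A": 0, "I": 0, "F": 0}


@dataclass
class PE:
    el: int = 0
    pc: int = 0
    pstate: dict = field(default_factory=_clear_pstate)
    spsr: dict = field(default_factory=dict)  # target EL -> saved (EL, PSTATE)
    elr: dict = field(default_factory=dict)   # target EL -> return address


def take_exception(pe, target_el, vector_address, return_address):
    if target_el < max(pe.el, 1):
        raise ValueError("exceptions are taken to EL1 or higher, never to a lower EL")
    pe.spsr[target_el] = (pe.el, dict(pe.pstate))  # save PE state
    pe.elr[target_el] = return_address             # preferred return address
    pe.el = target_el
    for bit in "DAIF":                             # mask further exceptions
        pe.pstate[bit] = 1
    pe.pc = vector_address


def exception_return(pe):
    source_el = pe.el
    pe.el, saved = pe.spsr[source_el]   # restore EL and PSTATE from SPSR_ELx
    pe.pstate = dict(saved)
    pe.pc = pe.elr[source_el]           # resume at ELR_ELx


pe = PE(el=0, pc=0x1000)
pe.pstate["Z"] = 1
take_exception(pe, 1, vector_address=0x80400, return_address=0x1004)
exception_return(pe)
```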
Exception vectors

Vector Base Address Register (VBAR)


One VBAR_ELx for each ELx that exceptions can be taken to (EL1-EL3)
Defines the base address of the vector table at that ELx
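How an entry is located relative to the vector base address can be sketched as below. The table layout (four groups of four entries, 0x80 bytes each, 2 KB base alignment) is taken from the AArch64 architecture rather than from the slides, and the string labels and function name are assumptions made for readability.

```python
VECTOR_ENTRY_SIZE = 0x80

# Where the exception came from, in table order.
SOURCES = ["current_el_sp0", "current_el_spx", "lower_el_aarch64", "lower_el_aarch32"]
# Exception kind within each group, in table order.
KINDS = ["sync", "irq", "fiq", "serror"]


def vector_address(vbar, source, kind):
    """Return the address of the vector entry for the given source and kind."""
    if vbar % 0x800 != 0:
        raise ValueError("vector table base must be 2 KB aligned")
    index = SOURCES.index(source) * len(KINDS) + KINDS.index(kind)
    return vbar + index * VECTOR_ENTRY_SIZE


# A synchronous exception (for example SVC) taken from a lower EL running AArch64.
entry = vector_address(0x80000, "lower_el_aarch64", "sync")
```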
System calls

SVC

Supervisor call exception
EL0 calls OS at EL1

HVC

Hypervisor call exception
For EL1 and higher

SMC


Secure monitor call exception
For EL1 and higher
Virtual Memory System
Architecture

VMSA



Provides MMU
MMU translates VAs to PAs independently for
ELx and security states
AArch64 supports VAs and PAs of up to 48 bits
Address translation system

VMSAv8-64






Translation Table Base Register (TTBR)
Translation Control Register (TCR)
Up to four levels of address lookup
IA of up to 48 bits
OA of up to 48 bits
A translation granule size of 4K, 16K, 64K
4K translation granule
16K translation granule
64K translation granule
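How the granule size determines the split of a 48-bit input address into per-level table indices can be sketched as follows. The function name is hypothetical; the sketch assumes 8-byte table descriptors, so each full level resolves log2(granule) - 3 bits, and the topmost level takes whatever bits remain.

```python
def split_va(va, granule_bytes, va_bits=48):
    """Return ({level: index}, page_offset) for a translation table walk."""
    if granule_bytes not in (4096, 16384, 65536):
        raise ValueError("granule must be 4K, 16K or 64K")
    offset_bits = granule_bytes.bit_length() - 1   # log2 of the granule size
    index_bits = offset_bits - 3                   # 8-byte descriptors per table
    offset = va & (granule_bytes - 1)
    indices = {}
    low, level = offset_bits, 3
    while low < va_bits:
        width = min(index_bits, va_bits - low)     # top level may be narrower
        indices[level] = (va >> low) & ((1 << width) - 1)
        low += width
        level -= 1
    return dict(sorted(indices.items())), offset


# An address built so that the 4K walk uses index 1, 2, 3, 4 at levels 0..3.
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
levels_4k, offset_4k = split_va(va, 4096)
levels_64k, offset_64k = split_va(va, 65536)
```

With a 4K granule this gives four levels of 9 bits each; with 64K only levels 1 to 3 are used, and with 16K level 0 resolves a single bit.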
Translation table entries –
levels 0-2
Translation table entries –
level 3
Attribute fields
MMU faults

All types of MMU exceptions







Alignment fault
Permission fault
Translation fault
Address size fault
Synchronous external abort on a translation table
walk
Access flag fault
TLB conflict abort
Translation Lookaside Buffers
(TLB)

TLB

Caches results from translation table walks
Global pages
Process-specific pages

Address Space Identifier (ASID)

Implementation defined size 8 or 16 bits

Virtual Machine Identifier (VMID)
Concept of locked entries

Optional for implementation
Maintenance instructions

TLBI <operation>{,Xt}
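The role of the ASID and VMID tags can be sketched with a dictionary-based TLB model. The class and its methods are illustrative assumptions, not an architectural interface: global entries match any ASID, process-specific entries match only their own ASID, and the by-ASID invalidation loosely corresponds to one kind of TLBI operation.

```python
class TLB:
    """Hypothetical toy TLB keyed by (VMID, ASID or None for global, virtual page)."""

    def __init__(self):
        self.entries = {}

    def insert(self, vmid, asid, vpage, ppage, is_global=False):
        key = (vmid, None if is_global else asid, vpage)
        self.entries[key] = ppage

    def lookup(self, vmid, asid, vpage):
        # Try the process-specific entry first, then a global one.
        for key in ((vmid, asid, vpage), (vmid, None, vpage)):
            if key in self.entries:
                return self.entries[key]
        return None  # miss: a translation table walk would be needed

    def invalidate_asid(self, vmid, asid):
        # Drop the process-specific entries of one address space only.
        self.entries = {
            key: ppage
            for key, ppage in self.entries.items()
            if not (key[0] == vmid and key[1] == asid)
        }


tlb = TLB()
tlb.insert(vmid=1, asid=7, vpage=0x10, ppage=0xA0)
tlb.insert(vmid=1, asid=8, vpage=0x10, ppage=0xB0)
tlb.insert(vmid=1, asid=0, vpage=0x20, ppage=0xC0, is_global=True)
tlb.invalidate_asid(vmid=1, asid=7)
```

Because entries are tagged, a context switch between processes does not require flushing the whole TLB; only the entries of a recycled ASID need to be invalidated.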