Hyperthread Support in OpenVMS V8.3

What to do about Montecito?
© 2003 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without
Pre-Summary
• We added some features to help you manage hyperthreads
  – SHOW CPU/BRIEF displays thread info
  – SET CPU/NOCOTHREAD
  – [SYSTEST]HTHREAD.EXE
• We added some features to reduce hyperthreads hurting or confusing you
  – Scheduler change
  – Accounting change
• You need to experiment with your own application mix to see if hyperthreads help you
July 20, 2015
page 2
Definitions of terms
• Processor
  – A chip or package
• Core
  – A ‘thing’ within a processor that physically executes programs
• Hyperthread
  – A ‘thing’ within a core that logically executes programs
• CPU
  – The OpenVMS abstraction for a ‘thing’ that executes programs
• Thread of execution
  – Software concept of what a CPU executes
page 3
What is “Hyperthreading” vs “Dual Core”?
• Both are features of new “Montecito” Itanium chips
• Both abstracted as CPUs on OpenVMS
• Very different in implementation
page 4
Dual Core
• Two (nearly) complete CPUs on one chip
• Think two older CPU chips glued together :-)
• Separate cache, separate processing units, separate state. (Share bus interface)
• Both cores executing simultaneously
page 5
Montecito Micrograph
[Figure: Montecito die micrograph with callouts: 1MB L2I; 2-way multi-threading; power management/frequency boost (Foxton); dual core; soft error detection/correction; 2x12MB L3 caches with Pellston; arbiter]
page 6
Dual Cores
[Figure: block diagram of the two cores. Each core contains: L1I cache (16KB), L1D cache (16KB), L2I cache (1MB), L2D cache (256KB), L3 cache (12MB), branch prediction, instruction TLB, data TLB, issue ports (B, I, M, F), branch unit, integer unit, memory/integer unit, floating point unit, ALAT, register stack engine / re-name, branch & predicate registers, integer registers, floating point registers, queues/control, synchronizer. The two cores share an arbiter and the system interface.]
page 7
Hyperthreading
• Hyperthread: A set of state (e.g. user registers, control registers, IP, etc) in a core
  – Shares execution resources with other threads
  – Only one hyperthread active (i.e. executing a program) at once on Montecito
• When hyperthread blocks, other hyperthread activates
• Also swaps on a timer
page 8
Montecito Multi-threading
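[Figure: execution timelines for two threads, A and B.
Serial execution: Ai, idle, Ai+1, then Bi, idle, Bi+1.
Montecito multi-threaded execution: Ai, idle, Ai+1 for thread A, with Bi and Bi+1 of thread B overlapping thread A's idle time.]
Multi-threading decreases stalls and increases performance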
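The timelines above can be approximated with a toy model. The Python sketch below assumes each thread is a list of compute bursts separated by fixed-length stalls, and that the core switches to the other thread as soon as the running one stalls. The function names, burst lengths, and stall length are illustrative assumptions, not Montecito measurements.

```python
def serial_time(bursts, stall):
    """Elapsed time when the threads run one after another on one core.

    bursts: one list of compute-burst lengths per thread.
    stall:  length of the stall between consecutive bursts of a thread.
    """
    total = 0
    for thread in bursts:
        total += sum(thread) + stall * (len(thread) - 1)
    return total


def multithreaded_time(bursts, stall):
    """Elapsed time when the core switches threads whenever one stalls."""
    time = 0
    next_burst = [0] * len(bursts)   # index of each thread's next burst
    ready_at = [0] * len(bursts)     # time at which each thread may run again
    while any(i < len(b) for i, b in zip(next_burst, bursts)):
        pending = [t for t in range(len(bursts))
                   if next_burst[t] < len(bursts[t])]
        # Run the unfinished thread that becomes ready first.
        t = min(pending, key=lambda k: ready_at[k])
        time = max(time, ready_at[t])        # core idles if nobody is ready
        time += bursts[t][next_burst[t]]     # execute one compute burst
        next_burst[t] += 1
        ready_at[t] = time + stall           # thread now waits out its stall
    return time


work = [[3, 3], [3, 3]]                      # threads A and B, two bursts each
print(serial_time(work, stall=2), multithreaded_time(work, stall=2))
print(serial_time(work, stall=0), multithreaded_time(work, stall=0))
```

In this model the multi-threaded run is shorter only when there are stalls to hide; with `stall=0` both schedules take the same time.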
page 9
Dynamic Thread Switching
• Speculate that a long latency event will stall execution
  – L3 miss
  – Uncached accesses
• Time outs ensure fairness
• hint@pause gives software control
• OS has no knowledge or control of hyperthread switches
page 10
Hyperthread Abstraction in VMS
• Reminder: 1 processor (or package or chip) has
    • 2 Cores
    • 4 Threads
• Each hyperthread appears in OpenVMS as a CPU
• CPUs that share the same core are called “Cothread CPUs”
• Note: Cores that share a processor (or package or chip) are not named or treated differently
page 11
Identifying CoThread CPUs on OpenVMS
$ show cpu/brief

System: XXXXXX, HP rx4640  (1.40GHz/12.0MB)

CPU 0    State: RUN          CPUDB: 8202A000    Handle: 00005D70
         Owner: 000004C8   Current: 000004C8    Partition 0
         Cothd: 8

CPU 1    State: RUN          CPUDB: 820FDF80    Handle: 00005E80
         Owner: 000004C8   Current: 000004C8    Partition 0
         Cothd: 9

CPU 2    State: RUN          CPUDB: 820FFC80    Handle: 00005F90
         Owner: 000004C8   Current: 000004C8    Partition 0
         Cothd: 10

CPU 3    State: RUN          CPUDB: 82101A80    Handle: 000060A0
         Owner: 000004C8   Current: 000004C8    Partition 0
         Cothd: 11
page 12
Tradeoffs with Hyperthreads: Basics
• One core with two threads MAY perform better than one core with one thread (but not always)
• One core with two threads NEVER performs as well as two cores
page 13
Montecito Multi-threading
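[Figure: execution timelines for two threads, A and B.
Serial execution: Ai, idle, Ai+1, then Bi, idle, Bi+1.
Montecito multi-threaded execution: Ai, idle, Ai+1 for thread A, with Bi and Bi+1 of thread B overlapping thread A's idle time.]
Multi-threading decreases stalls and increases performance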
page 14
Montecito Multi-threading (No Stalls)
[Figure: execution timelines for threads A and B when neither thread stalls.
Serial execution: Ai, Ai+1, Bi, Bi+1.
Montecito multi-threaded execution: Ai, Ai+1, Bi, Bi+1.]
page 15
Multi-threading vs Two Cores
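[Figure: execution timelines for threads A and B when neither thread stalls.
Execution on two cores: Ai, Ai+1 on one core in parallel with Bi, Bi+1 on the other.
Montecito multi-threaded execution: Ai, Ai+1, Bi, Bi+1 on a single core.]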
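The three configurations compared on these slides can be put side by side in a small model. The sketch below assumes two identical threads, each with `n_bursts` compute bursts of length `compute` separated by stalls of length `stall`, and a core that switches on stall. The closed-form expressions and all names are assumptions of this toy model, not a description of the hardware.

```python
def run_times(n_bursts, compute, stall):
    """Elapsed time for two identical threads under three configurations."""
    # Time for one thread running alone, including its stalls.
    one_thread = n_bursts * compute + (n_bursts - 1) * stall
    # One core, one hyperthread: thread A runs to completion, then thread B.
    serial = 2 * one_thread
    # Two cores: A and B run fully in parallel.
    two_cores = one_thread
    # One core, two hyperthreads, switch on stall: either the core is never
    # idle (stalls fully hidden), or B finishes one burst behind A.
    cothreads = max(2 * n_bursts * compute, one_thread + compute)
    return {"serial": serial, "cothreads": cothreads, "two_cores": two_cores}


print(run_times(n_bursts=2, compute=3, stall=2))
print(run_times(n_bursts=2, compute=3, stall=0))
```

With stalls, the cothread configuration lands between serial execution and two cores. Without stalls it matches serial execution, and in every case two cores finish first, which is the point of the "MAY" and "NEVER" bullets on the basics slide.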
page 16
VMS support for Hyperthreading
• Three categories of support
  – Managing/getting info
  – Reducing “waste” of hyperthread cycles
  – Scheduling
page 17
Managing/Getting Info
• Hyperthread to CPU mapping
  – First thread of all cores followed by second threads
    • Ex: 2 processor system. CPU 0,1,2,3 are all separate cores. CPU 4,5,6,7 are cothreads of 0,1,2,3
• SHOW CPU/BRIEF and /FULL
  – Notes CPU that is the Cothread of the displayed CPU
• SET CPU/[NO]COTHREAD
  – Stops one of the cothreads on the core associated with this CPU
• Accounting
  – Only charge a process ½ the CPU time if CPU's cothread is busy
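The numbering rule and the accounting change can be illustrated with a short Python sketch. Both functions are hypothetical helpers written for this transcript: the first assumes every core has exactly two threads enabled, and the second assumes the half-rate charge applies to whatever fraction of the interval the cothread was busy, which is one possible reading of the bullet above.

```python
def cothread_of(cpu, n_cores):
    """Return the cothread's CPU number.

    Assumes CPUs 0..n_cores-1 are the first thread of each core and
    CPUs n_cores..2*n_cores-1 are the second threads, in the same order.
    """
    if not 0 <= cpu < 2 * n_cores:
        raise ValueError("CPU number out of range")
    return cpu + n_cores if cpu < n_cores else cpu - n_cores


def charged_time(elapsed, cothread_busy_fraction):
    """Toy accounting: half rate while the cothread is busy, full rate otherwise."""
    busy = elapsed * cothread_busy_fraction
    return busy / 2 + (elapsed - busy)


# 2 processors x 2 cores = 4 cores: CPUs 4..7 are cothreads of CPUs 0..3.
print([cothread_of(cpu, 4) for cpu in range(8)])
print(charged_time(10.0, 1.0), charged_time(10.0, 0.0))
```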
page 18
Managing
• EFI command: cpuconfig threads on/off
  – Supported part of EFI
  – Requires two resets: one to get to EFI; one to make thread command take effect.
• [systest]hthread.exe
  – Like RADCHECK, an unsupported but helpful little utility
  – Check and modify firmware state of hyperthreading
  – $ hthread -show   $ hthread -on   $ hthread -off
  – Change after next reboot (i.e. only a single reset)
page 19
Reducing Hyperthread Cycle Waste
• Main point: A hyperthread spinning in halt or idle still uses cycles that its cothread might have used
• Idle loop
  – hint@pause between each check for busy
  – Power saver mode as usual
• STOP/CPU
  – hint@pause while halted
• Future possibilities:
  – hint@pause while spinning on locks?
  – Tradeoffs abound!
page 20
Scheduler Changes
• Two cores always better than two hyperthreads on the same core so:
  – Attempt to schedule processes on CPUs without a busy cothread
• Ties in with waste reduction since an idle hyperthread will give up its cycles to its cothread
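The preference described above can be sketched as a simple selection rule. This is only an illustration of the idea, not the OpenVMS scheduler: `pick_cpu` and its arguments are hypothetical, and the CPU numbering assumes first threads of all cores followed by second threads.

```python
def pick_cpu(idle_cpus, busy_cpus, n_cores):
    """Prefer an idle CPU whose cothread is not busy; else take any idle CPU."""
    def cothread(cpu):
        # First threads are 0..n_cores-1, second threads follow.
        return cpu + n_cores if cpu < n_cores else cpu - n_cores

    idle = sorted(idle_cpus)
    for cpu in idle:
        if cothread(cpu) not in busy_cpus:
            return cpu                  # this CPU has the core to itself
    return idle[0] if idle else None    # fall back to sharing a core


# Two cores: CPU 0 pairs with CPU 2, CPU 1 pairs with CPU 3.
print(pick_cpu({1, 2, 3}, {0}, n_cores=2))
print(pick_cpu({2, 3}, {0, 1}, n_cores=2))
```

In the first call CPU 1 is chosen because its cothread is idle; in the second every idle CPU has a busy cothread, so the rule falls back to the lowest-numbered idle CPU.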
page 21
Question you are too polite to ask
• Why didn’t you change the scheduler to make good use of hyperthreads?
Answer:
• We don’t know how.
• Seriously, it is VERY application mix dependent.
page 22
Tradeoffs with Hyperthreading
• Imagine you want to make best use of hyperthreads
  – What threads of execution do you run on same core?
page 23
Who shares a core?
• Threads that share the same memory space (e.g. kernel threads within a process)
  – They might share some cache and require fewer cache fills and thus perform better!
  – But if they stall less, hyperthreads are less advantageous!
• Threads that have nothing to do with each other
  – More cache misses so threads help more
  – But more cache misses means poorer individual performance!
• Clearly there is a tradeoff somewhere, but we can’t make it automatically
page 24
My recommendation
• Even without threads, Montecito works well
  – Try it with threads off; you will likely be happy
• Experiment with processes on threads
  – Use affinity to group different processes on cothreads, or to avoid cothreads
• Experiment with fastpath CPUs on threads.
  – Do you get better throughput spreading I/O across all threads or only using one thread per core?
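When planning such experiments it helps to write down the CPU sets for each case. The helper below is a hypothetical planning aid, assuming two threads per core and the numbering in which first threads of all cores come before second threads. It only builds lists of CPU numbers; applying them as affinity or fastpath settings is left to the usual OpenVMS tools.

```python
def experiment_cpu_sets(n_cores):
    """Return CPU number lists for the experiments suggested above."""
    first_threads = list(range(n_cores))                 # one thread per core
    second_threads = list(range(n_cores, 2 * n_cores))   # their cothreads
    return {
        "one_thread_per_core": first_threads,
        "all_threads": first_threads + second_threads,
        "cothread_pairs": list(zip(first_threads, second_threads)),
    }


for name, cpus in experiment_cpu_sets(4).items():
    print(name, cpus)
```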
page 25
Other features - Soon
• ar.ruc
• NUMA
• Power control
page 26
Other features further out
• User mode rfi
  – Might allow one to go to an instruction within a bundle
  – Useful for AST returns (maybe?)
page 27