NUMA aware heap memory manager
How to use our resources wisely in multi-threaded and
multi-processor systems
Michael Shteinbok
Shai Ben Nun
Supervisor: Dmitri Perelman
What’s the problem?
Increasing the number of cores/processors seems like it should
increase performance proportionally, but in practice the gain
is usually smaller, and sometimes adding cores even slows
us down.
Ok, But Why?
Some physical constraints can still cause a bottleneck, for
example memory access and management.
UMA – Uniform Memory Access
Solution - NUMA
Non-Uniform Memory Access: each CPU has its own nearby
memory with fast access.
A processor can still access the other memory units, but at
lower bandwidth.
The NUMA API lets the programmer manage where memory is placed.
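The cost of remote access can be put in rough numbers with a weighted average. This is only a back-of-the-envelope model, not something taken from the slides: the symbols p (fraction of accesses that hit local memory), T_local and T_remote (average time per local and remote access) are assumptions introduced here for illustration.

```latex
T_{\text{avg}} = p \, T_{\text{local}} + (1 - p) \, T_{\text{remote}}
\qquad\Longrightarrow\qquad
T_{\text{avg}} - T_{\text{local}} = (1 - p)\,\bigl(T_{\text{remote}} - T_{\text{local}}\bigr)
```

Under this model the penalty over the all-local case shrinks linearly as p grows, which is why a heap manager that keeps allocations local can help.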
How Is the Memory Managed?
When we allocate and free memory, we do not see the work done
in the background. That work is handled by a library allocator,
and several exist (glibc, TCMalloc, NUMA-aware TCMalloc).
The article addresses this management problem and
proposes the NUMA-aware TCMalloc heap manager.
Performance Change
The previous solutions (more bandwidth, more local access):
Project Goals
Improve the current NUMA-aware TCMalloc in
scenarios where it loses performance.
Manage the memory of different cores wisely and
offer a new “read” function that will be faster (TBD).
Problem In Current Solution
When a thread frees memory that was allocated by another
thread (on another NUMA node), we can get a performance loss.
[Diagram. Labels: Thread A: X = malloc; Thread A: Free(X); Thread A: X = Allocate; Thread B: Free(X); Free pool]
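The slides do not spell out the mechanism behind this loss. One plausible reading, sketched below as a toy model, is that a freed block gets cached on the node of the freeing thread, so a later allocation on that node hands out memory that physically lives on another node. Everything here (`Block`, `ToyNodeHeap`, `free_naive`, `free_to_home`, `next_alloc_is_remote`) is a hypothetical name made up for illustration. It is not TCMalloc's data structure or API, and no real NUMA calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: a "block" just remembers which NUMA node backs its memory.
struct Block {
    int home_node;  // node whose local memory holds this block
};

// One free list per NUMA node (hypothetical structure, not TCMalloc's).
class ToyNodeHeap {
public:
    explicit ToyNodeHeap(int nodes) : free_lists_(nodes) {}

    // Allocate for a thread running on `node`: reuse a cached block if one
    // exists, otherwise create a fresh block backed by that node's memory.
    Block allocate(int node) {
        assert(node >= 0 && node < static_cast<int>(free_lists_.size()));
        auto& list = free_lists_[node];
        if (!list.empty()) {
            Block b = list.back();
            list.pop_back();
            return b;
        }
        return Block{node};
    }

    // Naive free: cache the block on the node of the thread that frees it.
    void free_naive(Block b, int freeing_node) {
        free_lists_[freeing_node].push_back(b);
    }

    // Home-node free: return the block to the node that owns its memory,
    // regardless of which node the freeing thread runs on.
    void free_to_home(Block b, int /*freeing_node*/) {
        free_lists_[b.home_node].push_back(b);
    }

private:
    std::vector<std::vector<Block>> free_lists_;
};

// True when the next allocation on `node` would hand out remote memory.
bool next_alloc_is_remote(ToyNodeHeap& heap, int node) {
    Block b = heap.allocate(node);
    return b.home_node != node;
}
```

In this model, allocating on node 0 and calling `free_naive` from node 1 leaves a node-0 block in node 1's list, so the next allocation on node 1 is remote. With `free_to_home` the block goes back to node 0 and allocations on both nodes stay local. Whether the real allocator behaves like either variant is something the project would have to verify.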
Our Benchmark
Each pair of threads on a different NUMA node.
8 quad-core processors => 16 pairs in total.
[Diagram. Labels: Allocator Thread: Alloc, Alloc Rand List (5000); Queue; First-touch policy; Access Memory; Free Thread: Free]
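The allocator/free thread pair from the benchmark diagram can be approximated with standard C++ threads. This is a minimal sketch, not the authors' benchmark: `HandOffQueue`, `run_pair`, and the `blocks` and `max_size` parameters are hypothetical names, the random sizes are an assumption about what "Alloc Rand" means, and the threads are not pinned to NUMA nodes here (a real run would need to bind each thread to its node).

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <mutex>
#include <queue>
#include <random>
#include <thread>
#include <utility>

// Hypothetical hand-off queue between the allocator thread and the free thread.
struct HandOffQueue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::pair<void*, std::size_t>> items;  // (pointer, size)
    bool done = false;                                // allocator has finished
};

// Runs one allocator/free pair and returns how many blocks were freed.
std::size_t run_pair(std::size_t blocks, std::size_t max_size) {
    HandOffQueue q;
    std::size_t freed = 0;  // written only by the free thread, read after join

    std::thread allocator([&] {
        std::mt19937 rng(42);  // fixed seed so runs are repeatable
        std::uniform_int_distribution<std::size_t> size_dist(1, max_size);
        for (std::size_t i = 0; i < blocks; ++i) {
            std::size_t n = size_dist(rng);
            void* p = std::malloc(n);
            if (p == nullptr) break;  // stop early if allocation fails
            {
                std::lock_guard<std::mutex> lock(q.m);
                q.items.emplace(p, n);
            }
            q.cv.notify_one();
        }
        {
            std::lock_guard<std::mutex> lock(q.m);
            q.done = true;
        }
        q.cv.notify_one();
    });

    std::thread freer([&] {
        for (;;) {
            std::unique_lock<std::mutex> lock(q.m);
            q.cv.wait(lock, [&] { return !q.items.empty() || q.done; });
            if (q.items.empty()) break;  // allocator done and queue drained
            auto item = q.items.front();
            q.items.pop();
            lock.unlock();
            std::memset(item.first, 0xAB, item.second);  // access the memory
            std::free(item.first);                       // free on this thread
            ++freed;
        }
    });

    allocator.join();
    freer.join();
    return freed;
}
```

Calling `run_pair(5000, 4096)` mirrors the list size of 5000 from the diagram, while the 4096-byte size cap is an arbitrary illustrative value. Timing this call under different allocators would be the actual measurement, which is left out of the sketch.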
Achieving Project Goals
Step By Step
Learn how TCMalloc works.
Find scenarios that make the current TCMalloc lose
performance.
Sort the scenarios by how likely they are to occur in a real
environment.
Implement support for these scenarios.
Make the new TCMalloc perform better!