Seminar 1: Efficient Algorithms for Molecular Dynamics


Seminar 2:
Efficient Algorithms for Molecular Dynamics Simulations and Other Dynamic Spatial Join Queries
Andrew Noske
Thesis & code is available at:
>> http://manning.it.jcu.edu.au/~jc130551/thesis/ <<
About the Project
Build an “engine layer” for particle simulations, part of the TOMSK (Towards Molecular Structure Kinetics) Project.
Molecular Dynamics (MD): integrates the equations of motion of the atoms each timestep.
Cannot predict precisely what will happen: generates a statistical prediction.
Definitions
Range query: find all points (neighbour atoms) within the cutoff radius of a given point.
Self-spatial join query: execute a range query for all points (to determine the complete list of neighbour atoms).
[Figure: a single range query with cutoff radius (Rc), ignoring atoms beyond Rc, next to a spatial join query]
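The two definitions above can be sketched in brute-force form. This is only an illustration of what the queries compute, not the thesis code: the `Vec3` type and the function names are hypothetical, there is no grid index and no periodic wrapping, and the cost is O(N^2).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal point type (not taken from the thesis code).
struct Vec3 { double x, y, z; };

// Squared Euclidean distance between two points (no periodic wrapping).
inline double dist2(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Single range query: indices of all points within rc of points[i], excluding i.
std::vector<std::size_t> rangeQuery(const std::vector<Vec3>& points,
                                    std::size_t i, double rc) {
    assert(i < points.size());
    std::vector<std::size_t> result;
    for (std::size_t j = 0; j < points.size(); ++j)
        if (j != i && dist2(points[i], points[j]) <= rc * rc) result.push_back(j);
    return result;
}

// Self-spatial join: one range query per point gives the complete neighbour list.
std::vector<std::vector<std::size_t>> selfSpatialJoin(const std::vector<Vec3>& points,
                                                      double rc) {
    std::vector<std::vector<std::size_t>> neighbours(points.size());
    for (std::size_t i = 0; i < points.size(); ++i)
        neighbours[i] = rangeQuery(points, i, rc);
    return neighbours;
}
```

The techniques in the rest of the talk are ways of getting the same neighbour lists without comparing every atom against every other atom.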
Methods Used in Testing
Lennard-Jones pair potential:
I have used this to simulate atoms in a stable liquid.
*NOTE: Final code allows you to extend the system class & implement any interaction model & statistical analysis functions you choose.
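For reference, a minimal sketch of the standard Lennard-Jones 12-6 pair potential, V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6), written in terms of the squared distance so that no square root is required. The function names and the default reduced units (eps = sigma = 1) are assumptions for illustration; the thesis code may organise this differently.

```cpp
#include <cassert>

// Lennard-Jones 12-6 potential evaluated from the squared distance r2.
// eps and sigma default to 1 (reduced units); this default is an assumption.
inline double ljPotential(double r2, double eps = 1.0, double sigma = 1.0) {
    assert(r2 > 0.0);
    const double s2 = sigma * sigma / r2;  // (sigma/r)^2
    const double s6 = s2 * s2 * s2;        // (sigma/r)^6
    return 4.0 * eps * (s6 * s6 - s6);
}

// Returns F(r)/r, where F = -dV/dr. Multiply by the displacement vector
// (r_i - r_j) to get the force on atom i: positive values are repulsive.
inline double ljForceOverR(double r2, double eps = 1.0, double sigma = 1.0) {
    assert(r2 > 0.0);
    const double s2 = sigma * sigma / r2;
    const double s6 = s2 * s2 * s2;
    return 24.0 * eps * (2.0 * s6 * s6 - s6) / r2;
}
```

Returning F/r rather than F lets the caller scale the x, y and z components of the displacement directly, again without a square root.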
Methods Used in Testing
To simulate bulk liquid:
Have used periodic boundary conditions (PBC): boundaries wrap around.
[Figure: PBC on 2D box, with the box of atoms 1 to 5 repeated as periodic images; range search on box with PBC]
[Figure: Microscopic droplet (finite particles): surface particles have fewer neighbours]
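A minimal sketch of how wrap-around boundaries are commonly handled in a cubic box: positions are wrapped back into the box, and separations use the nearest periodic image. The function names and the choice of the interval [0, boxLen) are assumptions, not taken from the thesis code, and the minimum-image form is only valid when the cutoff radius is at most boxLen / 2.

```cpp
#include <cassert>
#include <cmath>

// Wrap one coordinate back into [0, boxLen).
inline double wrapCoord(double x, double boxLen) {
    assert(boxLen > 0.0);
    x -= boxLen * std::floor(x / boxLen);
    return (x >= boxLen) ? 0.0 : x;  // guard against rounding up to boxLen
}

// Minimum-image separation along one axis: the result lies in
// [-boxLen/2, boxLen/2], i.e. the distance to the nearest periodic image.
inline double minImage(double dx, double boxLen) {
    assert(boxLen > 0.0);
    return dx - boxLen * std::round(dx / boxLen);
}

// Squared distance between two atoms under PBC in a cubic box.
inline double dist2Periodic(double ax, double ay, double az,
                            double bx, double by, double bz, double boxLen) {
    const double dx = minImage(ax - bx, boxLen);
    const double dy = minImage(ay - by, boxLen);
    const double dz = minImage(az - bz, boxLen);
    return dx * dx + dy * dy + dz * dz;
}
```

With boxLen = 10, atoms at x = 0.5 and x = 9.5 are treated as 1 apart rather than 9 apart, which is what lets a range search near one face of the box see atoms near the opposite face.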
Spatial Data Structure: Fixed Grid
Fixed grid:
Most effective for uniform particle distribution.
Time to build (place all points into index) = O(N)
[Figure: fixed grid with cells per side (CPS) = 5, showing boxLen, cellLen and the cutoff radius (rc); original cell list technique]
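The O(N) build can be sketched as follows: each atom is dropped into the cell that contains it, with no comparison between atoms. The `FixedGrid` and `Vec3` types, the per-cell `std::vector` storage and the assumption that positions lie in [0, boxLen) are all illustrative choices, not the data layout of the thesis code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };  // hypothetical point type

// Fixed grid over a cubic box [0, boxLen)^3 with cps cells per side.
struct FixedGrid {
    double boxLen;
    int cps;
    double cellLen;
    std::vector<std::vector<std::size_t>> cells;  // atom indices stored per cell

    FixedGrid(double boxLen_, int cps_)
        : boxLen(boxLen_), cps(cps_), cellLen(boxLen_ / cps_),
          cells(static_cast<std::size_t>(cps_) * cps_ * cps_) {
        assert(boxLen_ > 0.0 && cps_ > 0);
    }

    // Cell coordinate along one axis, clamped so rounding at the faces is safe.
    int coordToCell(double x) const {
        int c = static_cast<int>(x / cellLen);
        if (c < 0) c = 0;
        if (c >= cps) c = cps - 1;
        return c;
    }

    // Flattened (row-wise) index of the cell containing p.
    std::size_t cellIndex(const Vec3& p) const {
        return (static_cast<std::size_t>(coordToCell(p.z)) * cps + coordToCell(p.y)) * cps
               + coordToCell(p.x);
    }

    // One pass over the atoms: O(N).
    void build(const std::vector<Vec3>& atoms) {
        for (auto& c : cells) c.clear();
        for (std::size_t i = 0; i < atoms.size(); ++i)
            cells[cellIndex(atoms[i])].push_back(i);
    }
};
```

For example, `FixedGrid grid(10.0, 5)` gives the CPS = 5 layout from the slide with cellLen = 2.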
Simulation Steps
Set-up:
#1) Set up grid structure.
#2) Set up atoms in offset lattice.
#3) Assign random velocities.
Iterate:
#1) Build grid (assign all atoms to cells).
#2) Build neighbor list (for each atom in each cell: find neighbors) → takes up to 95% of time.
#3) Calculate force and move atoms → implements interaction model (can change).
#4) Wrap atoms back into cell boundaries.
#5) Increment timestep.
Scientific Process
Used MS Visual C++ console application.
Testing process:
Run a series of simulations in batch…
Output results to CSV… including: grid parameters, clock ticks elapsed (→ main focus), distance calculations, & more
Analyze/graph CSV using Excel.
Was time consuming!
Atom List vs. Cell List
Atom list approach:
For each atom: check
which cells are within rc of
atom
Cell list approach:
Predetermine which cells are
within rc of each cell
[Figure: original (cubic) cell list, cubic atom list and minimum cell list, each drawn around the cutoff radius rc]
NOTE: Volume of a sphere = 52% of its bounding cube
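The 52% figure is the ratio (4/3)πr³ / (2r)³ = π/6 ≈ 0.52, which is why a cubic cell list checks many cells that can never hold a neighbour. Below is a sketch of how a minimum cell list might be predetermined: a cell offset is kept only if the closest possible pair of points in the two cells is within rc. The `CellOffset` type and function names are hypothetical, and the construction is one plausible reading of the slide rather than the thesis implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

struct CellOffset { int dx, dy, dz; };  // hypothetical relative cell coordinate

// Smallest possible squared distance between a point in the home cell and a
// point in the cell at offset (dx, dy, dz). Adjacent cells touch, so gap = 0.
inline double minCellDist2(int dx, int dy, int dz, double cellLen) {
    auto gap = [cellLen](int d) {
        return (std::abs(d) > 1 ? std::abs(d) - 1 : 0) * cellLen;
    };
    return gap(dx) * gap(dx) + gap(dy) * gap(dy) + gap(dz) * gap(dz);
}

// minimum == false: cubic cell list (every cell in the bounding cube).
// minimum == true:  only cells that can contain a point within rc.
std::vector<CellOffset> buildCellList(double rc, double cellLen, bool minimum) {
    assert(rc > 0.0 && cellLen > 0.0);
    const int reach = static_cast<int>(std::ceil(rc / cellLen));
    std::vector<CellOffset> offsets;
    for (int dz = -reach; dz <= reach; ++dz)
        for (int dy = -reach; dy <= reach; ++dy)
            for (int dx = -reach; dx <= reach; ++dx)
                if (!minimum || minCellDist2(dx, dy, dz, cellLen) <= rc * rc)
                    offsets.push_back({dx, dy, dz});
    return offsets;
}
```

With cellLen equal to rc the two lists are identical (27 cells); the saving only appears once there are several cell sides per cutoff radius, because only then can corner cells of the cube fall outside the sphere.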
Loaded vs. Unloaded Cell List
[Chart: Loaded cell list vs. unloaded cell list. x-axis: cellSidesDivRs (0 to 5); y-axis: avgClocksPerIt (0 to 12000). Series: unloaded (numAtoms=20000); loaded (numAtoms=20000); timeToLoadAdjList (numAtoms=200000). Annotation: initial cost to load. Labelled values: 7985, 875, 31.]
Parameters: numAtoms: 10000, 20000; rc: 0.9; boxLen: 10; CPS: *; timeStep: 0.004; timeStepsToExe: 1; useVerlet: 0
technique: half_min_cell_list
Half-Range Search Technique
Normal approach: search sphere
Faster approach: search upper hemisphere
[Figure: range search vs. half-range search between atoms i and j, using a minimum cell list vs. a HALF minimum cell list]
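One way to derive a half cell list from a full one is sketched below: keep the home cell plus every offset in the "upper" half-space, so that each pair of cells is visited from exactly one side. The ordering rule (z first, then y, then x) and the names are assumptions for illustration; the slide only states that the upper hemisphere is searched.

```cpp
#include <cassert>
#include <vector>

struct CellOffset { int dx, dy, dz; };  // hypothetical relative cell coordinate

// True for the home cell (0,0,0) and for offsets in the upper half-space.
// For any non-zero offset o, exactly one of o and -o passes this test.
inline bool inUpperHalf(const CellOffset& o) {
    if (o.dz != 0) return o.dz > 0;
    if (o.dy != 0) return o.dy > 0;
    return o.dx >= 0;
}

// Filter a full cell list down to a half cell list.
std::vector<CellOffset> halfCellList(const std::vector<CellOffset>& full) {
    std::vector<CellOffset> half;
    for (const CellOffset& o : full)
        if (inUpperHalf(o)) half.push_back(o);
    return half;
}
```

When a pair (i, j) is found through the half list, the interaction is applied to both atoms, and pairs inside the home cell are taken once (for example only when i < j). A 27-cell cubic list reduces to 14 cells this way.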
Size of Various Cell Lists
[Chart: Number of cells for different types of cell lists. x-axis: cell sides per search radius (0 to 4); y-axis: # cells (0 to 800). Series: cubic full; min full; cubic half; min half.]
Size of Various Cell Lists
[Chart: Size comparison of different types of cell lists. x-axis: cell sides per search radius (0 to 12); left y-axis: # cells (0 to 20000); right y-axis: ratio (0 to 1). Series: cubic full; min full; cubic half; min half; cells in full min list / full cubic list; cells in half min list / half cubic list.]
Half-Range Search Technique
[Chart: Full range search vs. half range search for minimum cell list over changing CPS. x-axis: cellSidesPerRs (0.70 to 4.04); left y-axis: avgTicsPerIt (0 to 12000); right y-axis: percentage improvement (-10% to 60%). Series: half min cell list; full min cell list; improvement. Labelled values: 2649, 2115.]
Parameters: numAtoms: 10,000; rc: 3; boxLen: 21.54; CPS: 5-30; timeStep: 0.01; timeStepsToExe: 20; useVerlet: 0; density=1
Half-Range Search Technique
[Chart: Full range search vs. half range search for atom list over changing CPS. x-axis: cellSidesPerRs (0.70 to 4.04); left y-axis: avgTicsPerIt (0 to 7000); right y-axis: percentage improvement (0% to 90%). Series: half cubic atom list; full cubic atom list; improvement. Labelled values: 3230, 2085.]
Parameters: numAtoms: 10,000; rc: 3; boxLen: 21.54; CPS: 5-30; timeStep: 0.01; timeStepsToExe: 200; useVerlet: 0; density=1
Atom List vs. Cell Lists
[Chart: Half minimum cell list vs. half atom list. x-axis: cellSidesPerRs (0 to 3.5); y-axis: avgTicsPerIt (0 to 800). Series: atoms=5000 cubic atom list; atoms=5000 unloaded min cell list; atoms=5000 (loaded) min cell list. Labelled values: 394.5, 306.4, 297.]
Parameters: numAtoms: 5000; rc: 0.1; boxLen: 1; CPS: *; timeStep: 0.01; timeStepsToExe: 20; useVerlet: 0
Improving Cache Hits through
Spatial Locality
Spatial locality principle: objects close to recently referenced ones are more likely to be requested in the near future.
Unsorted atoms → many cache misses.
[Figure: atoms 1 to 6 scattered in space in an order unrelated to their position]
Space-filling curve: A line passing
through every point in grid, in some order.
Row-wise
Gray curve
Z-ordering
Hilbert curve
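Of the curves listed, Z-ordering is the simplest to compute: the bits of the three cell coordinates are interleaved into a single key, and atoms (or cells) are then sorted by that key. The sketch below shows only the key computation; the function names and the 10-bits-per-axis limit are assumptions. It does not implement the Hilbert curve, which is more involved.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low 10 bits of v so that bit k moves to bit 3*k,
// leaving two zero bits between consecutive input bits.
inline std::uint32_t spreadBits(std::uint32_t v) {
    std::uint32_t out = 0;
    for (int bit = 0; bit < 10; ++bit)
        out |= ((v >> bit) & 1u) << (3 * bit);
    return out;
}

// Z-order (Morton) key of cell (cx, cy, cz); each coordinate must be < 1024.
inline std::uint32_t mortonIndex(std::uint32_t cx, std::uint32_t cy, std::uint32_t cz) {
    assert(cx < 1024u && cy < 1024u && cz < 1024u);
    return spreadBits(cx) | (spreadBits(cy) << 1) | (spreadBits(cz) << 2);
}
```

Sorting the atom array by `mortonIndex` of each atom's cell places atoms that share a cell, or sit in nearby cells, close together in memory.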
Improving Spatial Locality with
Space-Filling Curve
[Chart: Space-filling curve performance. x-axis: numAtoms (0 to 25000); y-axis: avgTicsPerIt (0 to 10000). Series: Random ordering; Z-order; Hilbert. Annotation: less than 3% performance improvement.]
Parameters: numAtoms: *; rc: 1; boxLen: 8; CPS: 8; timeStep: 0.004; timeStepsToExe: 3; useVerlet: 0
technique: cubic_atom_list
Improvements for a small number of atoms were negligible. However, improvements for 200,000+ atoms were significant → virtual paging.
Finding Optimal Cells Per Side
[Chart: Performance vs. cellSidesPerRs for different numAtoms for a cell list. x-axis: cellSidesPerRs (0 to 3); y-axis: avgTicsPerIt (0 to 100). Series: 1000, 2000, 3000, 4000 and 5000 atoms.]
Parameters: numAtoms: 1000-5000; rc: 1; boxLen: 10; CPS: *; useVerlet: 0; cellLen: 2.5; numCells: 64; boxVol: 1000; avgAtomsPerCell: *; cellSidesPerRc: *
technique: half_min_cell_list
Finding Optimal Cells Per Side
[Chart: Performance vs. avgAtomsPerCell using a cubic atom list. x-axis: averageAtomsPerCell (0 to 3.5); y-axis: avgTicsPerIt (0 to 250). Series: numAtoms=1000, 2000, 5000 and 10000, all with rc=1.]
Parameters: numAtoms: 1000-10,000; rc: 1; boxLen: *; CPS: *; useVerlet: 0; density: 1
technique: cubic_atom_list
EXTRA SLIDE
Finding Optimal Cells Per Side
[Chart: Performance vs. CPS using cubic atom list. x-axis: CPS (0 to 35); y-axis: avgTicsPerIt (0 to 300). Series: numAtoms=1000, 2000, 5000 and 10000, all with rc=1. Annotations: actual minimum; local minimum.]
Parameters: numAtoms: 1000-10,000; rc: 1; boxLen = numAtoms^(1/3); CPS: *; useVerlet: 0; density: 1
technique: cubic_atom_list
Verlet Neighbour List Technique
Choose a “verlet radius” greater than rc.
Build the “verlet neighbor list” using verlet radius
(use the fixed grid)
Next few iterations: update list by recalculating
distances & flag neighbour pairs inside rc.
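[Figure: atoms 1 to 7 around a central atom, showing the cut-off sphere (Rc) and the skin out to the skin/verlet radius (Rv, also labelled Rl); atoms 6 and 7 move to 6' and 7']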
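A minimal sketch of the verlet list idea. For brevity the list is built by brute force rather than with the fixed grid, and the rebuild test uses the common "half the skin" rule, where the list remains valid while no atom has moved more than (rv - rc) / 2 since the build. Both of these are assumptions, as are the type and member names; periodic boundaries are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };  // hypothetical point type

inline double dist2(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

struct VerletList {
    double rc;  // cutoff radius
    double rv;  // verlet radius (cutoff plus skin)
    std::vector<std::pair<std::size_t, std::size_t>> pairs;  // pairs within rv at build time
    std::vector<Vec3> posAtBuild;                            // positions when the list was built

    VerletList(double rc_, double rv_) : rc(rc_), rv(rv_) { assert(rc_ > 0.0 && rv_ >= rc_); }

    // Brute-force build for illustration; the slides build this via the fixed grid.
    void build(const std::vector<Vec3>& pos) {
        pairs.clear();
        for (std::size_t i = 0; i < pos.size(); ++i)
            for (std::size_t j = i + 1; j < pos.size(); ++j)
                if (dist2(pos[i], pos[j]) <= rv * rv) pairs.emplace_back(i, j);
        posAtBuild = pos;
    }

    // Conservative test: rebuild once any atom has moved more than half the skin.
    bool needsRebuild(const std::vector<Vec3>& pos) const {
        if (pos.size() != posAtBuild.size()) return true;
        const double halfSkin = 0.5 * (rv - rc);
        for (std::size_t i = 0; i < pos.size(); ++i)
            if (dist2(pos[i], posAtBuild[i]) > halfSkin * halfSkin) return true;
        return false;
    }

    // Update step: recalculate distances and keep the pairs currently inside rc.
    std::vector<std::pair<std::size_t, std::size_t>> pairsInRange(const std::vector<Vec3>& pos) const {
        std::vector<std::pair<std::size_t, std::size_t>> inRange;
        for (const auto& p : pairs)
            if (dist2(pos[p.first], pos[p.second]) <= rc * rc) inRange.push_back(p);
        return inRange;
    }
};
```

A larger verlet radius means fewer rebuilds but a longer list to re-check each iteration, which is the trade-off examined in the charts that follow.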
Verlet Neighbour List
[Chart: Performance vs. verlet radius. x-axis: verlet radius / cutoff radius (1 to 1.5); y-axis: avgTicsPerIt (0 to 350). Series: avgTicsPerIt; avgItsPerNeiRebuild. Labelled values: 328, 76.]
Annotation: This line represents the same simulation done with no verlet radius: only marginally worse than using a verlet radius of 1 (whereby displacements would be checked every timestep and an extra rebuild would obviously be needed).
Parameters: numAtoms: 5000; rc: 3; boxLen: 20; CPS: 20; timeStep: 0.01; timeStepsElap: 400; useVerlet: 1; rVerlet: 1.0-1.5; avgMaxVel = 0.0087
technique: half_cubic_atom_list
Verlet Neighbour List
[Chart: Optimal verlet performance for different particle velocities. x-axis: verlet radius / cutoff radius (1 to 1.4); left y-axis: avgTicsPerIt (0 to 350); right y-axis: number of verlet neighbours (0 to 250000). Series: maxVel=0.02; maxVel=0.04; maxVel=0.06; maxVel=0.08; avgNeiSize. Labelled values: 84, 97, 107, 119.]
Parameters: numAtoms: 1000; rc: 1; boxLen: 3; CPS: 9; timeStep: 1; timeStepsElap: 200; useVerlet: 1; rVerlet: *; density: 37; maxVel: 0.002-0.008
technique: half_cubic_atom_list
Finding the Optimal Verlet Radius
[Chart: avgTicsPerIt (0 to 250) vs. rvDivRc (1 to 1.4), with numbered points showing the order in which rvDivRc is changed using the algorithm.]
Starting value of verlet radius was 1.2 * cutoff radius.
Result obtained using the algorithm for 1400 timesteps: only slightly worse than optimal.
First "fluctuation" detected; three possible options at this point are to:
1) stop changing the verlet radius.
2) keep going, but decrease the rate of change per iteration to obtain a more accurate optimal value.
3) keep going at the same rate, in case particles change velocity. << chosen option
Parameters: numAtoms: 5000; rc: 2; boxLen: 17; CPS: 18; timeStep: 1; timeStepsToExe: 50; useVerlet: 100; rVerlet: 2-2.76; maxParticleVel = 0.001
technique: half_cubic_atom_list
Selective Checking of Verlet List
Using Atom Displacement
Don’t update distances between verlet neighbours
outside cut-off radius each iteration: calculate
safeDistance for these neighbours & decrement
that by the displacement of both atoms each
iteration.
[Figure: atoms inside Rc and inside the skin (Rl). safeDistance = (separation distance minus cutoff radius). Once the timestep displacements of both atoms exceed safeDistance, compute the neighbour distance and safeDistance again.]
[Chart: Fraction of atoms inside verlet radius but outside cutoff radius. x-axis: verlet radius / cutoff radius (1 to 1.4); y-axis: percent (0% to 100%).]
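The bookkeeping for a pair in the skin (inside the verlet radius, outside the cutoff) can be sketched as below. Two atoms can close their gap by at most the sum of their displacements, so while the accumulated displacement stays under safeDistance the pair cannot have entered the cutoff and its distance need not be recomputed. The `SkinPair` type and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// A verlet neighbour pair that was outside the cutoff when last measured.
struct SkinPair {
    std::size_t i, j;
    double safeDistance;  // separation minus cutoff radius at the last measurement
};

// Called every iteration with the distance each atom moved this timestep.
// Returns true when the pair may now be within the cutoff, i.e. when the
// real separation has to be recomputed.
inline bool consumeDisplacement(SkinPair& p, double dispI, double dispJ) {
    p.safeDistance -= dispI + dispJ;
    return p.safeDistance <= 0.0;
}

// Called after recomputing the real separation. Resets safeDistance and
// reports whether the pair is now inside the cutoff radius.
inline bool remeasure(SkinPair& p, double separation, double rc) {
    p.safeDistance = separation - rc;
    return separation <= rc;
}
```

The per-iteration cost for such a pair drops from a full distance calculation to one subtraction and one comparison, which matters most when the verlet radius is large and most listed pairs sit in the skin.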
Selective Checking of Verlet List
Using Atom Displacement
[Chart: Performance improvement of selective checking. x-axis: rvDivRc (1 to 1.5); y-axis: avgTicsPerIt (0 to 40). Series, total cost per iteration: no check; maxVel check; displacement check. Series, cost of updating list: no check; maxVel check; displacement check. Labelled values: 19.19, 18.82, 17.85.]
Parameters: numAtoms: 1000; rc: 2; boxLen: 10; CPS: 9; timeStep: 0.001; timeStepsElap: 1000; useVerlet: 1; rVerlet: *; maxVel: 4
technique: half_cubic_atom_list
Summary of Results & Contribution
Results for proposed techniques:
Minimum cell list → better than traditional cell list
Half-range search → great results!
Using MBRs in cells → not effective for evenly spread particles, but has potential for skewed data sets
Sub-grid cell list template guide → refines the # of cells in the loaded cell list you must check for each atom; results successful, but only when the avg. # of atoms per cell was high.
Early elimination of non-neighbours → good idea, yet made performance worse.
Selective checking of verlet list using atom displacement → makes verlet list cheaper to update (especially if the verlet radius is big)
Interesting ideas & results, but insufficient time to cover here.
Summary of Results & Contribution
Other findings:
Space-filling curves → if the simulation fits in memory, gives negligible improvement; if not, the curve can up to double performance! Hilbert curve = best curve.
Argued half-cubic atom list = best choice (most dynamic)
Investigated optimal # of cells per side & verlet radius
Proposed algorithms to find the optimal values above; worked quite well
Investigated performance vs. accuracy relationship for MD simulation (interesting results, but insufficient time to cover)
Thesis & code are a guideline to implementing/optimizing spatial join simulations
Future Work
Much opportunity to further investigate optimization techniques, including those proposed.
Some suggestions:
Test techniques for much larger simulations (that can’t fit in memory).
Investigate other optimal value finding algorithms: find optimal verlet radius & cells per side simultaneously!
Try different compilers/languages & platforms.
Test techniques for skewed data sets.
… and many more little technical suggestions, e.g. more intelligent ways to decide which cells to check.
Work may be carried out by future TOMSK students.
DEMONSTRATION OF CODE
QUESTION TIME
“Don’t tell anyone, but I fell
asleep after the first slide”
QUESTION TIME OVER…
Thank you for listening,
and thanks again to those who
helped with my thesis.
THESIS – first half
I’m not going to the next slide until everyone
has finished reading up to here
OTHER TECHNIQUES
Sub-Grid Cell List Template Guide
Sub-grid: break each cell into an (imaginary) sub-grid.
When considering an atom, check which sub-cell it belongs to; that sub-cell will refine which cells in the cell list to search.
Minimum cell list representation:
{ (-1,1), (0,1), (1,1),
(-1,0), (0,0), (1,0) }
Corresponding cell list template guide (for this sub-cell only):
{ (false), (false), (false),
(true), (true), (false) }
Sub-Grid Cell List Template Guide
[Chart: Effectiveness of sub-grid over varying search radius using a half minimum cell list. x-axis: cellSidesPerRs (0 to 3); y-axis: avg fraction of cells in min cell list in range of sub-cells (0 to 1). Series: CPS=2; CPS=3; CPS=4; CPS=5; CPS=6; CPS=10; CPS=20.]
Sub-Grid Cell List Template Guide
[Chart: Sub-grid performance. x-axis: cellSideDivRs (0 to 3.5); y-axis: avg clocks per iteration (0 to 1600). Series: 1000, 2000 and 3000 atoms, each with subgrid and without subgrid.]
Parameters: numAtoms: 1000, 2000, 3000; rc: *; boxLen: 1; CPS: 10; timeStep: 0.004; timeStepsToExe: 10; useSubGrid: 1; subGCPS: 4
technique: half_min_cell_list
Sub-Grid Cell List Template Guide
[Chart: Sub-grid performance. x-axis: cellSideDivRs (0 to 3); y-axis: avgTicsPerIt (0 to 1.6). Series: 1000 atoms (1 atom per cell); 2000 atoms (2 atoms per cell); 3000 atoms (3 atoms per cell); 10000 atoms (10 atoms per cell).]
Parameters: numAtoms: 1000, 2000, 3000; rc: 0.01-0.3; boxLen: 1; CPS: 10; timeStep: 0.004; timeStepsToExe: 10; useSubGrid: 1; subGCPS: 4
technique: half_min_cell_list
Use of MBRs in Cells
For each cell, maintain a Minimum Bounding Rectangle (MBR) around its atoms.
For any cell JUST tipped by rc, check the atom is inside rc of the MBR before considering atoms exhaustively.
[Figure: a cell that is “just tipped” by rc, but whose MBR is out of range]
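The MBR test amounts to a point-to-box distance check, sketched here in 3D with an axis-aligned box. The `Mbr` type and function names are hypothetical, periodic boundaries are ignored, and `computeMbr` assumes the cell holds at least one atom.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { double x, y, z; };  // hypothetical point type
struct Mbr { Vec3 lo, hi; };      // axis-aligned minimum bounding box of a cell's atoms

// Tightest box around a non-empty set of atoms.
Mbr computeMbr(const std::vector<Vec3>& atoms) {
    assert(!atoms.empty());
    Mbr b = {atoms[0], atoms[0]};
    for (const Vec3& a : atoms) {
        b.lo.x = std::min(b.lo.x, a.x); b.hi.x = std::max(b.hi.x, a.x);
        b.lo.y = std::min(b.lo.y, a.y); b.hi.y = std::max(b.hi.y, a.y);
        b.lo.z = std::min(b.lo.z, a.z); b.hi.z = std::max(b.hi.z, a.z);
    }
    return b;
}

// Distance from v to the interval [lo, hi] along one axis (0 if inside).
inline double axisGap(double v, double lo, double hi) {
    return v < lo ? lo - v : (v > hi ? v - hi : 0.0);
}

// Squared distance from point p to the nearest point of the box.
inline double dist2ToMbr(const Vec3& p, const Mbr& b) {
    const double gx = axisGap(p.x, b.lo.x, b.hi.x);
    const double gy = axisGap(p.y, b.lo.y, b.hi.y);
    const double gz = axisGap(p.z, b.lo.z, b.hi.z);
    return gx * gx + gy * gy + gz * gz;
}

// If this is false, no atom in the cell can be within rc of p, so the
// cell's atoms do not need to be checked one by one.
inline bool mbrInRange(const Vec3& p, const Mbr& b, double rc) {
    return dist2ToMbr(p, b) <= rc * rc;
}
```

The check pays off when atoms cluster in one part of a cell; for evenly spread atoms the MBR is close to the cell itself and little is excluded.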
Use of MBRs in Cells
[Chart: Performance of variations on atom list. x-axis: cellSidesPerRs (0 to 3.5); y-axis: avgTicsPerIt (0 to 120). Series: atoms=1000 min atom list using MBRs; atoms=1000 cubic atom list; atoms=1000 min atom list.]
Parameters: numAtoms: 1000; rc: 0.01-0.25; boxLen: 1; CPS: 10; timeStep: 0.01; timeStepsToExe: 20; useVerlet: 0
EXTRA SLIDE
Early Elimination Technique
Early elimination if distance between atoms along any dimension > cutoffRadius.
Don’t calculate sqrt: Lennard-Jones can be done using dist2.
NOTE: if (dist2 <= cutoffRadius2) → is in range
Determine optimal # of cells per side.
Is cell length = cutoff radius optimal?
[Figure: atoms i and j]
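Both ideas on this slide fit in one small function, sketched below with hypothetical names: reject a pair as soon as its separation along any single axis exceeds the cutoff, and otherwise compare squared distances so that no square root is taken. Periodic boundaries are omitted.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };  // hypothetical point type

// Returns true if b is within rc of a. Exits early when the separation
// along any one axis already exceeds rc, and never calls sqrt.
inline bool inRangeEarlyElim(const Vec3& a, const Vec3& b, double rc) {
    const double dx = std::fabs(a.x - b.x);
    if (dx > rc) return false;
    const double dy = std::fabs(a.y - b.y);
    if (dy > rc) return false;
    const double dz = std::fabs(a.z - b.z);
    if (dz > rc) return false;
    return dx * dx + dy * dy + dz * dz <= rc * rc;  // dist2 <= cutoffRadius2
}
```

The extra branches are not free: in the tests reported in this talk, early elimination made performance worse, while avoiding sqrt is the part worth keeping.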
EXTRA SLIDE
Early Elimination Technique
[Chart: Early Elimination Of Non-Neighbours. x-axis: cellSidesPerRc (0.2 to 2); y-axis: avgTicsPerIt (0 to 10000). Series: no early elim & using sqrt; no early elim; early elim.]
Parameters: numAtoms: 10000; rc: *; boxLen: 1; CPS: 10; timeStep: 0.004; timeStepsToExe: 5; useVerlet: 0
technique: half_min_cell_list
ADDITIONAL RESULTS
EXTRA SLIDE
Linear Growth
[Chart: numAtoms vs speed for const avgLatticeSpacing. x-axis: numAtoms (0 to 140000); y-axis: avgTimePerIt (0 to 4.5).]
Parameters: numAtoms: *(4^3-100^3); rc: 1; boxLen: *(4-100); CPS: *(4-100); timeStep: 0.004; timeStepsToExe: 5; useVerlet: 0; avgAtomsPerCell: 1; density: 1; avgLatticeSpacing: 1
EXTRA SLIDE
Linear Growth
[Chart: numAtoms vs speed for const rc and boxLen. x-axis: numAtoms (0 to 20000); left y-axis: avgTimePerIt (0 to 7); right y-axis: avgNumNeigsPerIt (0 to 600,000). Series: avgTimePerIt; avgNeiSize.]
Parameters: numAtoms: *; rc: 0.9; boxLen: 10; CPS: 10; timeStep: 0.004; timeStepsToExe: 5; useVerlet: 0; avgAtomsPerCell: *; density: *; avgLatticeSpacing: *
O(N) → where N is number of neighbors
Performance vs. Accuracy
Some techniques that will improve performance, but decrease “accuracy”:
Increasing timestep
Decreasing cutoff radius
I collected results for these by comparing all simulations with control simulations.
[Figure: small timestep vs. larger timestep; large rc vs. smaller rc]
EXTRA SLIDE
Optimal choice of cell length
[Chart: avAtomsPerCell vs avgTimePerIt. x-axis: cellSidesDivRc (0 to 3.5); y-axis: avgTimePerIt (0 to 1.4). Series: 1000, 2000, 3000, 4000 and 5000 atoms.]
Parameters: rc: 1; boxLen: 10; CPS: *; useVerlet: 0; cellLen: 2.5; numCells: 64; boxVol: 1000; avgAtomsPerCell: *; cellSidesPerRc: *
EXTRA SLIDE
Average Atoms per Cell
[Chart: avAtomsPerCell vs avgTimePerIt. x-axis: avgAtomsPerCell (0 to 20); y-axis: avgTimePerIt (0 to 1.4). Series: 1000, 2000, 3000, 4000 and 5000 atoms.]
EXTRA SLIDE
Performance Breakdown
EXTRA SLIDE
Adjacency List Savings
[Chart: Adjacency List Size Comparison. x-axis: cell sides per cutoff radius (0 to 14); y-axis: # cells (0 to 20000). Series: cubic full; min full; cubic half.]
Chosen Solution
Chosen approach: choose a cutoff radius, and ignore particles beyond this.
Involves: moving self-spatial join query (many range queries) → has numerous applications: GIS, computer graphics, etc.
[Figure: a single range query with cutoff radius (Rc), ignoring atoms beyond Rc, next to a spatial join query]
[Figure: argon atom (inert) with symmetrical attractive/repulsive forces vs. water molecule (O--, H+, H+) with a permanent dipole. NOTE: directional forces!]
Improving Cache Hits through
Spatial Locality
Spatial locality principle: objects close to recently referenced ones will probably be requested again in the near future.
Unsorted atoms → means many cache misses.
[Figure: atoms 1 to 6 scattered in space in an order unrelated to their position]
Space Filling Curves
Space-filling curve: a line passing through every point in a space, in some order (according to some algorithm).
Re-sort atoms periodically (group by cells & order using curve).
Improves CPU performance “>50% in 2D moving point query” → worth trying.
Row-wise
Gray curve
Z-ordering
Hilbert curve