Transcript MRG8
TTI 2008@ Tuscany, Italy.
Some New Random Number Generators
(Multi Recursive Generator)
Tested on CASINO
Ryo Maezono
Japan Advanced Institute of Science and Technology,
Kanazawa, Japan.
QuickTimeý Dz
T IF F Åià•
èkǻǵŠj êLí£ÉvÉçÉOÉâÉÄ
ǙDZÇà ÉsÉNÉ `ÉÉ Ç¾å©ÇÈǞǽDžÇÕïK óvÇÇ•
ÅB
Collaborators…
QuickTimeý Dz
êLí£Év ÉçÉ OÉ âÉÄ
ǙDZÇÃÉs ÉNÉ`ÉÉǾå©ÇÈǞǽDžÇÕïKóvÇÇ•
ÅB
Dr. Kenta Hongo
(Maezono group)
Prof. Ken-ichi Miura
(National Institute of Informatics)
QuickTimeý Dz
T IF F Åià•
èkǻǵŠj êLí£ÉvÉçÉOÉâÉÄ
ǙDZÇà ÉsÉNÉ `ÉÉ Ç¾å©ÇÈǞǽDžÇÕïK óvÇÇ•
ÅB
Summary
- A new Random Number Generator (RNG)
“MRG8”
developed by L’Eucuyer and Miura.
- Much Simpler (only 33 lines!) than recent fancy RNG.
Theoretically clear as well.
It Shows good performance...
- Seems equivalent to Ranlux4 with less cost than Ranlux3.
QuickTimeý Dz
T IF F Åià•
èkǻǵŠj êLí£ÉvÉçÉOÉâÉÄ
ǙDZÇà ÉsÉNÉ `ÉÉ Ç¾å©ÇÈǞǽDžÇÕïK óvÇÇ•
ÅB
Random Number Generator
(RNG)
- A kernel component of QMC.
- What is the point for us?
- Auto-correlation length and Bias, mostly.
- No “serious” RNG required such as,
- Physical RNG (based on thermal noise)
- Cryptographic RNG (based on thermal noise)
... These matter only on Cryptograph, Gambles etc.
©前園涼・本郷研太2008
Points for us
- Auto-correlation Length
Better RNG has less correlation.
- Bias
→ Shorter Blocking Length.
→ Effectively costless accumulation.
Worse RNG has less homogeneous distribution
(Sparse Lattice Structure)
→ Sampling would be biased.
→ Biased results.
(3N-dim in our simulation)
©前園涼・本郷研太2008
Representative RNG
(pseudo RNG)
- Linear Congruential Method
Xn a1 Xn1 mod p
:-) Simple, Costless and Fast.
:-( Sparse Hyper plane → Biased sampling
1) Recursive generation is just 1st. order.
2) Famous story about “RANDU” on IBM360/370.
... because of further reason of bad choice of a1 16807
(N.B., Not the genuine drawback of Congruential method in general)
- Feedback Shift Register Method
... using ‘ ’ with more than two Xn k
- Generalization for Higher Order
(Topics of the study here)
... using ‘ ’ with more than two Xn k
©前園涼・本郷研太2008
(Not the target of the study here, brief overview)
Further developments (1)
on Feedback Shift Register Method
... using ‘ ’ with more than two Xn k
Long period sequence with practically easy implementation.
In practical situation, however, ...
Part of it is used.
Wrong “period” appears...
Further improved GFSRs... (Generalized Feedback Shift Register)
include “Mersenne Twister”
(Matsumoto&Nishimura, 1996)
©前園涼・本郷研太2008
Further developments(2)
on Higher order congruential-based.
... using ‘ ’ with more than two Xn k
Include...
- “Fibonacci-based”
- “Subtract-with-Borrow” and its relatives
Further ‘tune-up’-ed → “Ranlux”
(Main target of the study here)
(implemented in CASINO)
- “Multiple Recursive Generator (MRG)”
Xn a1 Xn1 a2 Xn2 L ak Xnk mod p
has been known as a simple/powerful RNG, but...
How to choose coeffs
ak ?
(Knuth’s criteria)
it had been the practical obstacle.
©前園涼・本郷研太2008
Desired properties of RNG
“Multiplicative Recursive Generator”
has several
Desired properties :
- Costless and Fast.
N.B.) “Mersenne Twister” is said to be ‘Good’ as well.
- Theoretically simple
N.B.) Non-linear congruential methods have no firm theoretical background.
- Easy to be accelerated (Vector/Parallel, discussed later).
©前園涼・本郷研太2008
Choice of coefficients
“Knuth’s criteria”
Xn a1 Xn1 a2 Xn2 L ak Xnk mod p
Choosing coeffs ak
so that it has
the Longest possible Period
P pk 1
Characteristic Polynomial
f z z k a1z k 1 L ak 1z ak
Galois Field
is a primitive polynomial on GF(p)
Choose ak so that f z can be factorized (mod p) in proper way.
Random Search for ak
(Pierre L’Ecuyer@Univ. Montreal)
pk 1
~ Factorization of r
p 1
©前園涼・本郷研太2008
勘所は...
- 計算機による數値演算とは、いわば有限體上での元操作である。
- 疑似乱數生成とは、
有限體上での元から元への射影操作を生成する漸化式のうち
周期が恐ろしく長いものを實現するという事である。
- したがって其の本質は所与の有限體の代數構造で決まっている。
©前園涼・本郷研太2008
MRG8
Xn a1 Xn1 a2 Xn2 L ak Xnk mod p
A good choice obtained for k=8 and p 2 1
31
a5= 189748160
a6= 1984088114
a7= 626062218
a8= 1927846343
a1= 1089656042
a2= 1906537547
a3= 1764115693
a4= 1304127872
(Found/Chosen by P. L’Ecuyer@Univ. Montreal)
A RNG named “MRG8”
8
P 2 1 4.5 10 74
31
(implemented/tested by Prof. Miura)
Only 33 lines !
©前園涼・本郷研太2008
Point (1)
Quality of RNG is critically depending on Coef. Choice.
(Sparse Lattice structure)
a1= 1089656042
a2= 1906537547
a3= 1764115693
a4= 1304127872
a5= 189748160
a6= 1984088114
a7= 626062218
a8= 1927846343
L’Ecuyer’s group carefully choose/test them.
Again...
Famous horror about “RANDU” on IBM360/370.
- Recursive generation is just 1st. order.
- Bad choice of coefficients.
©前園涼・本郷研太2008
Point (2)
Xn a1 Xn1 a2 Xn2 L ak Xnk mod p
On 64-bit architecture
31
The choice of p 2 1 makes the implementation quite simple/fast!
(No dividing operation required)
Because...
z Z63 L Z32 Z31 L Z2 Z1 2 : z1·231 z2
then,
(binary description on 64-bit architecture)
z z1 ·2 31 z1 z1 z2 z1 2 31 1 z1 z2
∴
z mod 2 31 1 z1 z2 shift z,31 and z,2 31 1
(No dividing operation required)
∵)
z1 0L 0Z63 L Z32 2 shift z,31
z2 0L 0Z31 L Z2 Z1 2 Z63 L Z32 Z31 L Z2 Z1 2 .and.0L 01L 1 and z,231 1
©前園涼・本郷研太2008
Statistical Tests
Done by Miura or L’Ecuyer…
(To be completed)
©前園涼・本郷研太2008
QMC Test
(combined with CASINO-v1.8)
Tests using G1 set
1.02604152729604E- 04
Ground St. Ene. (a.u.)
SO2 (VMC)
-548.205
(16)
-548.206
-548.207
(# of step = 300 million)
RANLUX-0
(4)
-548.208
MRG8
(Blocking length)
(16)
-1
(1)
-2
(2)
-3
(2)
-548.209
RANLUX-4
©前園涼・本郷研太2008
Notes on Ranlux
“Tuned-up” version of SwB algorithm
(Martin & Luescher, 1993)
Plucking of sequence to reduce auto-correlation
“RANLUX-0” = “Subtract with Borrow”
“RANLUX-1”
More plucking, better performance in Spectrum Test
“RANLUX-2”
“RANLUX-3”
N.B.) “RANLUX-0” = “Subtract with Borrow”
~ “1st. order Linear congruential method”
(an effective implementation with very large prime number)
(Tezuka & L’Ecuyer, 1992)
Consistent with D.P. Landau’s work by Monte Carlo.
©前園涼・本郷研太2008
Tests using G1 set
1.02604152729604E- 04
Ground St. Ene. (a.u.)
SO2 (VMC)
-548.205
(16)
-548.206
-548.207
(# of step = 300 million)
RANLUX-0
(4)
(Blocking length)
(16)
-1
-548.208
MRG8
(1)
-2
(2)
-3
(2)
-548.209
RANLUX-4
... can be viewed as 1st. order linear congruential RNG
as well as the ‘Subtract with Borrow’ RNG.
©前園涼・本郷研太2008
1.02604152729604E- 04
SO2 (DMC)
-548.560
-548.564 (512)
RANLUX-0
(2048)
MRG8
# of config. = 10,000
(Blocking length)
(256)
(512)
-548.562
-548.566
# of step = 40,000
(8192)
-3
(2048)
-1
-2
RANLUX-4
©前園涼・本郷研太2008
Source of Different Bias in DMC/VMC
1) Generating Random Walk (VMC/DMC)
2) Metropolis reject/accept (VMC/DMC)
3) Branching
(DMC)
©前園涼・本郷研太2008
Timing Info
SO2, DMC.
User
MRG8
# of step = 1,000
System
# of config. = 10,000
Total CPU
% CPU
176.0448
189.3554
365.6764
0.42
RANLUX-0
101.2084
190.2169
290.9649
0.33
RANLUX-1
127.7751
192.9557
321.1352
0.36
RANLUX-2
176.5535
183.0453
359.7706
0.42
RANLUX-3
304.1154
182.4181
486.7478
0.57
RANLUX-4
470.0464
182.1101
652.6842
0.76
(second)
©前園涼・本郷研太2008
He & PH2 (VMC)
(Blocking length)
He
1.02604152729604E- 04
# of step = 1000 milion
-2.903686
-342.235
# of step = 150 milion
(1)
-342.236
-2.903688
-2.903690
PH2
1.02604152729604E- 04
(4)
-342.237
(2) (1)
(1)
(1)
-2.903692
MRG8
-0
(1)
-1 -2 -3
RANLUX
(1)
-342.238
-342.239
-4
(4) (2)
(16)
-342.240
MRG8
(1)
-0
-1 -2 -3
RANLUX
-4
©前園涼・本郷研太2008
He & PH2 (DMC)
He
1.02604152729604E- 04
-2.90371
-2.90372
PH2
# of step = 50,000
# of config. = 10,000
(1024)
(1024)
-2.90374
(Blocking length)
(2048)
MRG8
(1024)
-342.473
-342.475
(512)
(4096)
-0
# of step = 30,000
# of config. = 10,000
-342.474
(1024)
-2.90373
1.02604152729604E- 04
-1 -2 -3
RANLUX
(1024)
(512)(1024)
(2048)
(512)
-342.476
-342.478
-4
MRG8
-0
-1 -2 -3
RANLUX
-4
©前園涼・本郷研太2008
System dependence
- (Bias of results) ~ (Homogeneity of RNG)
gets worse in higher dim. of sampling space…
(3N-dim’ in our case)
Systems with larger # of electrons are interesting.
- Sampling space has nodal structure.
“All-electron systems” and “Pseudo Potential systems”
Should examine Both.
differ in its character.
©前園涼・本郷研太2008
SH4 (VMC/DMC, Pseudo Potential calc.)
(VMC)
-6.28928
-6.28944
(32)
(16)
(64)
(32)
(DMC)
-6.3055
(32)
(16)
-6.3060
-6.3065
-6.28952
E/H
E(SiH4-
-6.28936
(Blocking length)
(512)
(1024)
(2048)
(512)
-6.3070
(512)
-6.28960
-6.3075
-6.28968
-6.28976
-6.28984
-6.3080
MRG8
-0
-1
-2 -3
RANLUX
-4
# of step = 10 million
-6.3085
MRG8
(2048)
-0
-1
-2 -3
RANLUX
-4
# of step = 30,000
# of config. = 10,000
©前園涼・本郷研太2008
QMC Test
(combined with CASINO-v1.8)
“A Research Background”
RNG research people are interested in it.
c.f.) D.P. Landau’s test of RNG by QMC (Ising model)
Better RNG in spectrum tests gives
not always better performance on application.
RNG people start to consider
“harmony of RNG” depending on applications”
Discussions
1) MRG8 always give the same answer as Ranlux4
2) MRG8 is costless than Ranlux3.
3) MRG8 gives no significant improvement on Blocking length.
4) Difference of Bias appeared in DMC/VMC
1) Generating Random Walk
(VMC/DMC)
2) Metropolis reject/accept
(VMC/DMC)
3) Branching
(DMC)
©前園涼・本郷研太2008
Acceleration of RNG
Xn a1 Xn1 a2 Xn2 L ak Xnk mod p
can be written as
Generating Sequence
X1,L , X8 X9
X2 ,L , X8 , X9 X10
X2 ,L , X9 , X10 X11
i.e.,
a1 a2 L
Xn
X 0 1 0L
n 1
0
O
M M
M
X
n k 1
0 0 0L
ak
X n 1
0
Xn 2
M M
X n k
1
: A
X8
X
7 A X 9 A X10 L
M
X
1
(Normal Sequence)
©前園涼・本郷研太2008
Acceleration (cont’d)
a1 a2 L
Xn
X 0 1 0L
n 1
0
O
M M
M
X
n k 1
0 0 0L
ak
X n 1
0
Xn 2
M M
X n k
1
: A
Having evaluated A n in advance...
X10
X9
X8
X
X
X
9
8
2
A A 7
M
M
M
X
X
X
3
2
1
X8 n
X8
X
X
8 n 1
n
A 7
M
M
X
X
1 n
1
©前園涼・本郷研太2008
Acceleration (cont’d)
n
Evaluation of A can be made faster by ...
- Contemporary Compiler (Vectorization)
- Intra-node parallelization (“thread parallel”)
Then...
X8
X
7
M
X
1
A X9 A X10 A X11 A X12 L
(normal ‘serial’ generation)
A
A2
A3
A4
X9
A
X10
A2
X11
A3
X12
A4
X13
A
X14
A2
X15
A3
X16
A4
X17
X18
L
X19
X20
(Accelerated generation)
©前園涼・本郷研太2008