Conservative Simulation using Distributed-Shared Memory
Teo, Y. M., Ng, Y. K. and Onggo, B. S. S.
Department of Computer Science
National University of Singapore
PADS 2002
Objectives
Improve the performance of SPaDES/Java by reducing the overhead of:
  event synchronization
  distributed communication
Study the memory requirements of parallel simulations.
Presentation Outline
Parallel Simulation
Null Message Protocol
Performance Improvement
Memory Requirement
Conclusion
Parallel Simulation
Sequential simulations execute on a single thread on one processor.
Ideally, parallelizing the simulation should reduce its execution time, since the workload is distributed across processors.
However, causality must be maintained throughout a parallel simulation:
=> event synchronization protocols are needed,
=> which add to inter-process communication,
=> which becomes a new bottleneck!
Null Message Protocol
First proposed by Chandy and Misra (1979).
Prevents deadlock among logical processes (LPs).
At the end of every simulation pass, LPi sends a null message to each of its neighbours, with timestamp = local virtual time (LVT) of LPi.
A null message with timestamp T indicates that the source LP will not send any message with a timestamp earlier than T.
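The rule above can be sketched in Java as one simulation pass of a single LP. This is a minimal, single-threaded illustration under stated assumptions, not SPaDES/Java code: the LP keeps one clock per input channel (the largest timestamp received on it, real or null), treats an event as safe only if it is strictly earlier than every channel clock, and, following the slide, stamps its outgoing null messages with its LVT (no lookahead is added). The class `CmbLp` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class CmbLp {
    final long[] channelClock;                              // one clock per input channel (hypothetical layout)
    final PriorityQueue<Long> fel = new PriorityQueue<>();  // future event list (timestamps only)
    long lvt = 0;                                           // local virtual time
    final List<Long> processed = new ArrayList<>();

    CmbLp(int inputChannels) { channelClock = new long[inputChannels]; }

    // A real or null message arriving on channel c advances that channel's clock.
    void receive(int c, long timestamp, boolean isNull) {
        channelClock[c] = Math.max(channelClock[c], timestamp);
        if (!isNull) fel.add(timestamp);                    // only real messages become events
    }

    // One simulation pass: process all safe events, then return the null-message timestamp.
    long simulationPass() {
        long safeTime = Arrays.stream(channelClock).min().orElse(Long.MAX_VALUE);
        // An event is safe if it is strictly earlier than every channel's promise.
        while (!fel.isEmpty() && fel.peek() < safeTime) {
            lvt = fel.poll();
            processed.add(lvt);
        }
        // Per the slide, the null message sent to every neighbour carries the LVT.
        return lvt;
    }

    public static void main(String[] args) {
        CmbLp lp = new CmbLp(2);
        lp.receive(0, 3, false);
        lp.receive(0, 7, false);
        lp.receive(1, 5, true);                             // null message: nothing before 5 on channel 1
        long nullTs = lp.simulationPass();
        System.out.println(lp.processed + " null timestamp = " + nullTs);
    }
}
```

With events at 3 and 7 on channel 0 and a null message promising 5 on channel 1, only the event at 3 is safe, so the pass processes it and the LP's null messages carry timestamp 3.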
Null Message Protocol
[Figure: null message exchange between LPs at Clock = 4, with null messages timestamped 4; FEL = future event list.]
Performance Improvement
Chandy-Misra-Bryant's (CMB) protocol performs poorly due to its high null message overhead: it transmits null messages on every simulation pass, so the null message ratio (NMR) stays close to 1 over nearly all of [0, T).
Optimizations incorporated:
Carrier-null message scheme
Flushing mechanism
Demand-driven null message algorithm
Remote communications using JavaSpace
Carrier-Null Message Algorithm
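Problem: in cyclic topologies, null messages circulate repeatedly around the cycle, generating many redundant transmissions.
Solution: the carrier-null message algorithm (Wood and Turner, 1996), which avoids transmitting redundant null messages in such cycles.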
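The carrier-null algorithm itself is more involved than the slide can show. The Java sketch below illustrates only one ingredient, under an assumption: a null message that carries extra information (here, the route of LP IDs it has travelled) lets an LP recognise that the message has come full circle and decline to forward it again. `CarrierNull` and `shouldForward` are hypothetical names; this is not Wood and Turner's full algorithm.

```java
import java.util.ArrayList;
import java.util.List;

public class CarrierNullSketch {
    // Hypothetical null message that records every LP it has visited.
    record CarrierNull(long timestamp, List<Integer> route) {
        CarrierNull visit(int lpId) {
            List<Integer> next = new ArrayList<>(route);
            next.add(lpId);
            return new CarrierNull(timestamp, List.copyOf(next));
        }
    }

    // Forward only if this LP is not already on the route; otherwise the message
    // has looped around a cycle and forwarding it again would be redundant.
    static boolean shouldForward(int lpId, CarrierNull msg) {
        return !msg.route().contains(lpId);
    }

    public static void main(String[] args) {
        // Cycle LP0 -> LP1 -> LP2 -> LP0; LP0 originates the message.
        CarrierNull m = new CarrierNull(10, List.of(0));
        int forwarded = 0;
        int lp = 1;
        while (shouldForward(lp, m)) {
            m = m.visit(lp);
            forwarded++;
            lp = (lp + 1) % 3;
        }
        System.out.println("forwarded " + forwarded + " times, route " + m.route());
    }
}
```

On the three-LP cycle in `main`, the message is forwarded twice and then dropped when it returns to LP0, instead of circulating indefinitely.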
Performance Improvement
Demand-driven null messaging + flushing
[Figure: logical process (A) with output channel (A) and a flusher; logical process (B) sends a REQ to (A) via request channel (B); FEL = future event list.]
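A minimal Java sketch of the demand-driven idea, assuming a single channel from LP (A) to LP (B): (A) never sends null messages on its own, and (B) sends a REQ only when its next event cannot be proven safe; (A) answers with a null message stamped with its current LVT. The flushing mechanism is not modelled, and the classes `Upstream` and `Downstream` are hypothetical.

```java
import java.util.PriorityQueue;

public class DemandDrivenSketch {
    // Hypothetical upstream LP (A): sends a null message only in reply to a REQ.
    static final class Upstream {
        long lvt;                               // local virtual time of (A)
        int nullMessagesSent = 0;
        Upstream(long lvt) { this.lvt = lvt; }

        long onRequest() {                      // reply to REQ with a null message stamped LVT
            nullMessagesSent++;
            return lvt;
        }
    }

    // Hypothetical downstream LP (B) with a single input channel from (A).
    static final class Downstream {
        final PriorityQueue<Long> fel = new PriorityQueue<>();
        final Upstream upstream;
        long channelClock = 0;                  // latest promise received from (A)
        int requestsSent = 0;

        Downstream(Upstream upstream) { this.upstream = upstream; }

        // Process safe events; send a REQ only when the next event is blocked.
        int run() {
            int processed = 0;
            while (!fel.isEmpty()) {
                if (fel.peek() < channelClock) {
                    fel.poll();
                    processed++;
                } else {
                    requestsSent++;
                    long promise = upstream.onRequest();
                    if (promise <= channelClock) {
                        break;                  // (A) cannot raise the bound yet: stay blocked
                    }
                    channelClock = promise;
                }
            }
            return processed;
        }
    }

    public static void main(String[] args) {
        Upstream a = new Upstream(30);
        Downstream b = new Downstream(a);
        b.fel.add(10L);
        b.fel.add(20L);
        b.fel.add(40L);
        int done = b.run();
        System.out.println(done + " events processed, " + a.nullMessagesSent + " null messages");
    }
}
```

With (A) at LVT 30 and (B) holding events at 10, 20 and 40, (B) processes two events using two request/null-message exchanges, rather than receiving a null message after every pass of (A).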
Performance Evaluation
Experiments were conducted on a PC cluster of 8 nodes running Red Hat Linux 7.0. Each node has a 400 MHz Pentium II processor and 256 MB of memory; the nodes are connected through a 100 Mbps switch.
Two benchmark programs:
PHOLD system
Linear Pipeline
PHOLD (3x3, m)
Closed system
[Figure: 3 x 3 grid of nine nodes.]
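PHOLD is a closed system: jobs move between nodes but never leave, so the job population stays constant. The sequential Java sketch below illustrates this under assumptions not stated on the slide: the n x n nodes form a torus, m is the initial number of jobs per node, each job is forwarded to a randomly chosen neighbour, and delays are exponential with mean 1. `PholdSketch.run` is a hypothetical name.

```java
import java.util.PriorityQueue;
import java.util.Random;

public class PholdSketch {
    // An event: a job arriving at node `node` at simulated time `time`.
    record Event(double time, int node) {}

    // Sequential PHOLD-style run on an n x n torus with m jobs per node until endTime;
    // returns the number of jobs still pending in the future event list.
    static int run(int n, int m, double endTime, long seed) {
        Random rng = new Random(seed);
        PriorityQueue<Event> fel = new PriorityQueue<>((a, b) -> Double.compare(a.time(), b.time()));
        for (int node = 0; node < n * n; node++) {
            for (int j = 0; j < m; j++) {
                fel.add(new Event(exp(rng), node));
            }
        }
        while (!fel.isEmpty() && fel.peek().time() < endTime) {
            Event e = fel.poll();
            int row = e.node() / n, col = e.node() % n;
            // Forward the job to one of the four neighbours, wrapping at the edges.
            switch (rng.nextInt(4)) {
                case 0 -> row = (row + 1) % n;
                case 1 -> row = (row + n - 1) % n;
                case 2 -> col = (col + 1) % n;
                default -> col = (col + n - 1) % n;
            }
            fel.add(new Event(e.time() + exp(rng), row * n + col));
        }
        return fel.size();   // closed system: every job is still pending
    }

    // Exponentially distributed delay with mean 1 (an assumption).
    static double exp(Random rng) {
        return -Math.log(1.0 - rng.nextDouble());
    }

    public static void main(String[] args) {
        System.out.println(run(3, 4, 100.0, 42L));   // 3 x 3 grid, m = 4
    }
}
```

`run(3, 4, 100.0, 42L)` returns 36: every processed event schedules exactly one new event, so all 3 x 3 x 4 jobs remain in the system.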
Linear Pipeline (4, )
Open system
[Figure: customers from the customer population pass through four service centers in series and then depart.]
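The linear pipeline is an open system of service centers in series. A minimal Java sketch under assumed parameters: Poisson arrivals with rate 1, exponential service with mean `utilization` at every center (a hypothetical parameter standing for the per-center load), and FIFO single servers with unbounded queues. Each customer's times are computed with the standard tandem-queue recurrence: service at center k starts at the later of the customer's arrival at k and the previous customer's departure from k.

```java
import java.util.Random;

public class PipelineSketch {
    // Returns each customer's departure time from the last of `centers` FIFO stations.
    static double[] simulate(int centers, int customers, double utilization, long seed) {
        Random rng = new Random(seed);
        double[] lastDeparture = new double[centers];  // previous customer's departure, per center
        double[] exitTimes = new double[customers];
        double arrival = 0.0;
        for (int i = 0; i < customers; i++) {
            arrival += exp(rng, 1.0);                  // next external arrival
            double t = arrival;
            for (int k = 0; k < centers; k++) {
                double start = Math.max(t, lastDeparture[k]);  // wait until the server is free
                t = start + exp(rng, utilization);             // service completes
                lastDeparture[k] = t;                          // customer moves on to center k + 1
            }
            exitTimes[i] = t;                                  // customer departs the pipeline
        }
        return exitTimes;
    }

    // Exponentially distributed value with the given mean.
    static double exp(Random rng, double mean) {
        return -mean * Math.log(1.0 - rng.nextDouble());
    }

    public static void main(String[] args) {
        double[] out = simulate(4, 1000, 0.6, 1L);
        System.out.printf("last customer departs at t = %.2f%n", out[out.length - 1]);
    }
}
```

Because every center is FIFO, departure times from the pipeline come out in non-decreasing order, which is a quick sanity check on the recurrence.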
PHOLD (n x n, m)
[Figure: NMR versus problem size (4x4, 8x8, 16x16) for CMB, + carrier-null, + flushing and + demand-driven null messaging, each with m = 1, 8 and 16.]
Linear Pipeline (n, )
[Figure: NMR versus problem size n (4, 8, 12, 16) for CMB / carrier-null, + flushing and + demand-driven null messaging, each at parameter values 0.2, 0.4, 0.6 and 0.8.]
Performance Summary
Percentage reduction in NMR:
PHOLD system
  CMB + carrier-null           30%
  Flushing incorporated        42%
  Demand-driven null msg       55%
Linear Pipeline
  CMB + carrier-null            0%
  Flushing incorporated        23%
  Demand-driven null msg       35%
Distributed Communications
Originally, SPaDES/Java used the Java RMI library to transmit messages between remote LPs, but the serialization phase is a bottleneck.
Previous performance optimization effort: message deflation.
Only solution to overcome the remote communication overhead => send fewer messages. How?
Target the null messages.
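The slide does not describe message deflation in detail. Assuming it means compressing each serialized message before it is sent, a sketch using the standard `java.util.zip` streams could look like this; the `Message` record and its fields are hypothetical. Compression shrinks the bytes on the wire but still pays the serialization cost, which fits the slide's conclusion that sending fewer messages is what ultimately matters.

```java
import java.io.*;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class DeflationSketch {
    // Hypothetical message exchanged between remote LPs.
    record Message(int sender, int receiver, long timestamp, String payload) implements Serializable {}

    // Serialize, then compress, the message: these bytes would go over the wire.
    static byte[] deflate(Message m) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeObject(m);
        }   // closing finishes the deflater so all compressed bytes are flushed
        return bytes.toByteArray();
    }

    // Decompress, then deserialize, on the receiving side.
    static Message inflate(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new InflaterInputStream(new ByteArrayInputStream(data)))) {
            return (Message) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Message m = new Message(2, 1, 42L, "job ".repeat(100));
        byte[] wire = deflate(m);
        System.out.println("compressed bytes: " + wire.length + ", round trip ok: " + m.equals(inflate(wire)));
    }
}
```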
JavaSpaces
A Jini service developed by Sun Microsystems, Inc., built on top of Java RMI and modelled on a tuple space.
An abstract platform for developing complex distributed applications.
Distributed data persistence.
Holds objects, known as entries, whose attributes (public fields) may be of various types.
Key concept: matching entries against a template by type and attribute values.
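JavaSpaces looks entries up associatively: an entry matches a template if it is of the template's type (or a subtype) and every non-null public field of the template equals the corresponding field of the entry, while null fields act as wildcards. The Java sketch below mimics that rule with reflection on plain objects; it is not the JavaSpaces API (the real service compares field values in serialized form, not with `equals`), and `TemplateMatch`, `matches` and `ProcessEntry` are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

public class TemplateMatch {
    // Hypothetical entry type with public object-typed fields, as JavaSpaces entries use.
    public static class ProcessEntry {
        public Integer sender;
        public Integer receiver;
        public Long time;
        public ProcessEntry() {}
        public ProcessEntry(Integer sender, Integer receiver, Long time) {
            this.sender = sender; this.receiver = receiver; this.time = time;
        }
    }

    // The entry must be of the template's type (or a subtype), and every non-null
    // public instance field of the template must equal the entry's field; null is a wildcard.
    static boolean matches(Object template, Object entry) throws IllegalAccessException {
        if (!template.getClass().isInstance(entry)) return false;
        for (Field f : template.getClass().getFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isFinal(mods) || Modifier.isTransient(mods)) continue;
            Object wanted = f.get(template);
            if (wanted != null && !Objects.equals(wanted, f.get(entry))) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IllegalAccessException {
        ProcessEntry entry = new ProcessEntry(2, 1, 40L);
        ProcessEntry forLp1 = new ProcessEntry(null, 1, null);   // "anything addressed to LP1"
        ProcessEntry forLp2 = new ProcessEntry(null, 2, null);
        System.out.println(matches(forLp1, entry) + " " + matches(forLp2, entry));
    }
}
```

Here `main` prints `true false`: the entry addressed to LP1 matches the LP1 template and not the LP2 one.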
JavaSpaces
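Four generic operations: write, read, take and notify.
write places an entry in the space; read returns a copy of a matching entry; take returns a matching entry and removes it; notify asks the space to inform a listener when a matching entry is written.
[Figure: clients write, read and take entries in the space; a notifier delivers notify events to a registered client.]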
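To show what the four operations do, here is a small in-memory stand-in written in plain Java. It is a hypothetical sketch, not the `net.jini.space.JavaSpace` interface: matching uses a `Predicate` instead of an entry template, transactions and leases are omitted, and `notifyOnWrite` plays the role of notify (the name avoids clashing with `Object.notify`). `read` and `take` block until a match appears or the timeout expires, returning null on timeout as the JavaSpaces methods do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a space; NOT the JavaSpaces API.
public class TinySpace<E> {
    private final List<E> entries = new ArrayList<>();
    private final List<Predicate<? super E>> filters = new ArrayList<>();
    private final List<Consumer<? super E>> listeners = new ArrayList<>();

    // write: store the entry, wake blocked readers/takers, then fire matching listeners.
    public void write(E entry) {
        List<Consumer<? super E>> toCall = new ArrayList<>();
        synchronized (this) {
            entries.add(entry);
            for (int i = 0; i < listeners.size(); i++) {
                if (filters.get(i).test(entry)) toCall.add(listeners.get(i));
            }
            notifyAll();
        }
        for (Consumer<? super E> l : toCall) l.accept(entry);  // callbacks run outside the lock
    }

    // read: return a matching entry but leave it in the space (null on timeout).
    public E read(Predicate<? super E> match, long timeoutMillis) throws InterruptedException {
        return await(match, timeoutMillis, false);
    }

    // take: return a matching entry and remove it from the space (null on timeout).
    public E take(Predicate<? super E> match, long timeoutMillis) throws InterruptedException {
        return await(match, timeoutMillis, true);
    }

    // Plays the role of notify: register a callback for future matching writes.
    public synchronized void notifyOnWrite(Predicate<? super E> match, Consumer<? super E> listener) {
        filters.add(match);
        listeners.add(listener);
    }

    public synchronized int size() { return entries.size(); }

    private synchronized E await(Predicate<? super E> match, long timeoutMillis, boolean remove)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            for (int i = 0; i < entries.size(); i++) {
                E e = entries.get(i);
                if (match.test(e)) {
                    if (remove) entries.remove(i);     // remove by index
                    return e;
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return null;
            wait(remaining);                           // woken by notifyAll() in write
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TinySpace<String> space = new TinySpace<>();
        space.notifyOnWrite(s -> s.startsWith("null"), s -> System.out.println("notified: " + s));
        space.write("process:LP1");
        space.write("null:LP2");
        System.out.println(space.read(s -> s.startsWith("process"), 100));  // still in the space
        System.out.println(space.take(s -> s.startsWith("process"), 100));  // now removed
        System.out.println(space.take(s -> s.startsWith("process"), 100));  // null after the timeout
    }
}
```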
Distributed Communications
Replace the RMI communication module in SPaDES/Java with one running on a single JavaSpace.
Use a FrontEndSpace, which permits crash recovery of the entries in the space.
Processes and null messages transmitted between remote hosts go through the FrontEndSpace as space entries.
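To illustrate how routing through a shared space can cut null-message traffic, the sketch below models the space as a simple single-threaded in-memory store (not the JavaSpaces API). It assumes, as one reading of the slides, that process entries are addressed to one receiver and removed by it (a take), whereas a null message is written once by its sender and read non-destructively by every LP that needs it, so one write replaces one send per neighbour. `SProcess` and `NullMsg` echo the entry names in the slide figures; their fields and the class `SpaceComms` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SpaceComms {
    record SProcess(int sender, int receiver, long time) {}
    record NullMsg(int sender, long time) {}

    final List<SProcess> processes = new ArrayList<>();
    final Map<Integer, NullMsg> nullMsgs = new HashMap<>();  // latest null message per sender
    int writes = 0;

    void writeProcess(SProcess p) { processes.add(p); writes++; }

    // A newer null message from the same sender replaces the old one.
    void writeNull(NullMsg n) { nullMsgs.put(n.sender(), n); writes++; }

    // take: remove and return every process entry addressed to `lp`.
    List<SProcess> takeFor(int lp) {
        List<SProcess> mine = new ArrayList<>();
        for (Iterator<SProcess> it = processes.iterator(); it.hasNext(); ) {
            SProcess p = it.next();
            if (p.receiver() == lp) { mine.add(p); it.remove(); }
        }
        return mine;
    }

    // read: non-destructive, so any number of LPs can consult the same null message.
    NullMsg readNull(int sender) { return nullMsgs.get(sender); }

    public static void main(String[] args) {
        SpaceComms space = new SpaceComms();
        space.writeProcess(new SProcess(2, 1, 15));
        space.writeNull(new NullMsg(2, 12));                 // LP2 publishes its clock once
        for (int lp : new int[] {1, 3, 4}) {                 // three LPs consult it
            System.out.println("LP" + lp + " sees LP2 clock " + space.readNull(2).time());
        }
        System.out.println("LP1 takes " + space.takeFor(1) + "; writes = " + space.writes);
    }
}
```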
Space Communications: Processes
[Figure: LP1 and LP2 exchange SProcess entries through the space over time; each entry is tagged with sender and receiver LP IDs (e.g. sender = 2, receiver = 1).]
Space Communications: Null Messages
[Figure: LP1 to LP4 exchange NullMsg and Req entries through the space; the entries shown are tagged sender = 2.]
Performance Evaluation – PHOLD (n x n, m)
[Figure: NMR versus problem size (4x4, 8x8, 16x16) for RMI/JavaSpace on 1 processor, RMI on 4 and 8 processors, and JavaSpace on 4 and 8 processors, each with m = 1, 8 and 16.]
Overall Performance Evaluation – PHOLD (n x n, m)
[Figure: NMR versus problem size (4x4, 8x8, 16x16) for CMB, + carrier-null, + flushing, + demand-driven null messaging, and JavaSpace on 4 and 8 processors, each with m = 1, 8 and 16.]
Performance Summary
Percentage reduction in NMR:
  CMB + carrier-null           30%
  Flushing incorporated        42%
  Demand-driven null msg       55%
  JavaSpace (4 processors)     63%
  JavaSpace (8 processors)     74%
Memory Requirement
$M_{prob} = \sum_{i=1}^{n} \mathrm{MaxQueueSize}(LP_i)$
$M_{ord} = \sum_{i=1}^{n} \mathrm{MaxFELSize}(LP_i)$
$M_{sync} = \sum_{i=1}^{n} \mathrm{MaxNullMsgBufferSize}(LP_i)$
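The three sums can be collected with per-LP high-water marks. The Java sketch below assumes each LP reports its current queue, FEL and null-message buffer sizes whenever they change, and that the total space is M = Mprob + Mord + Msync, which agrees with the pipeline figures in the Space Usage table (for p = 0.2 under RMI, 98 + 50 + 331 = 479). `LpStats` and the helper methods are hypothetical.

```java
import java.util.List;

public class MemoryRequirement {
    // Hypothetical per-LP high-water marks, updated whenever a structure changes size.
    static final class LpStats {
        int maxQueueSize, maxFelSize, maxNullMsgBufferSize;

        void observe(int queueSize, int felSize, int nullMsgBufferSize) {
            maxQueueSize = Math.max(maxQueueSize, queueSize);
            maxFelSize = Math.max(maxFelSize, felSize);
            maxNullMsgBufferSize = Math.max(maxNullMsgBufferSize, nullMsgBufferSize);
        }
    }

    static int mProb(List<LpStats> lps) { return lps.stream().mapToInt(s -> s.maxQueueSize).sum(); }
    static int mOrd(List<LpStats> lps)  { return lps.stream().mapToInt(s -> s.maxFelSize).sum(); }
    static int mSync(List<LpStats> lps) { return lps.stream().mapToInt(s -> s.maxNullMsgBufferSize).sum(); }

    // Total space, assuming M = Mprob + Mord + Msync.
    static int total(List<LpStats> lps) { return mProb(lps) + mOrd(lps) + mSync(lps); }

    public static void main(String[] args) {
        LpStats a = new LpStats(), b = new LpStats();
        a.observe(3, 2, 1);
        a.observe(1, 4, 0);   // maxima for a: 3, 4, 1
        b.observe(2, 1, 5);   // maxima for b: 2, 1, 5
        List<LpStats> lps = List.of(a, b);
        System.out.println(mProb(lps) + " " + mOrd(lps) + " " + mSync(lps) + " " + total(lps));
    }
}
```

For the two toy LPs in `main` this prints `5 5 6 16`.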
Memory Requirement
Space Usage (M = Mprob + Mord + Msync)

PIPELINE (16, p)
p      Mprob   Mord   Msync (RMI)   Msync (JavaSpaces)   M (RMI)   M (JavaSpaces)
0.2       98     50           331                  305       479              453
0.4      192     52           341                  308       585              552
0.6      320     54           348                  311       722              685
0.8      740     56           352                  312      1148             1108

PHOLD (16x16, m)
m      Mprob + Mord   Msync (RMI)   Msync (JavaSpaces)   M (RMI)   M (JavaSpaces)
1               256           665                  347       921              603
8              2048           651                  332      2699             2380
16             4096           638                  317      4734             4413
Achievements & Conclusion
Enhanced the performance of SPaDES/Java through various synchronization protocols and JavaSpace-based communication, achieving an NMR below 30%.
Implemented a new discrete-event simulation library based on the concept of shared memory in a JavaSpace.
Implemented a TSA in SPaDES/Java that can serve as a testbed for memory usage studies in parallel simulations.
Acknowledgments
Port of Singapore Authority (PSA)
Ministry of Education, Singapore
Constructive feedback from referees