Reliable Multicasting with JGroups
Bela Ban, Jan 2004
http://www.jgroups.org
Overview
API, architecture
Protocols
Building Blocks
Performance
Future, Conclusion
EBIG, Oakland Jan 21 2004
What Is It?
Toolkit for reliable multicasting
  Fragmentation
  Message retransmission
  Ordering
  Group membership, membership change notification
LAN or WAN based
License
JGroups is a toolkit (JAR), to be linked against an application
Open Source under LGPL
  Commercial products can use JGroups without having to LGPL their code
  Modifications to JGroups itself need to be LGPL'ed (if distributed)
Dual licensing in the future
API
Channel: similar to java.net.MulticastSocket, plus group membership and reliability
Operations:
  Create a channel with a set of properties
  Connect to a group X. Everyone who connects to X will see each other
  Send a message to all members of X
  Send a message to a single member
API
Operations (continued):
  Receive a message
  Retrieve membership
  Be notified when members join or leave (including crashes)
  Disconnect from the group
  Close the channel
API
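    import org.jgroups.JChannel;
    import org.jgroups.Message;

    // Create a channel from a protocol stack configuration and join the group
    JChannel channel=new JChannel("file://home/bela/default.xml");
    channel.connect("demo-group");
    System.out.println("members are: " + channel.getView().getMembers());

    // A null destination sends the message to all members of the group
    Message msg=new Message(null, null, "Hello world");
    channel.send(msg);

    // Receive a message (timeout 0)
    Message m=(Message)channel.receive(0);
    System.out.println("received msg from " + m.getSrc() + ": " + m.getObject());

    // Leave the group and release the channel's resources
    channel.disconnect();
    channel.close();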
Group topology
Architecture of JGroups
Figure: three members, each running the same stack (top to bottom): Application, Building Blocks, Channel, GMS, UNICAST, NAKACK, FD, UDP. The stacks are connected by the Network.
Demo
Draw
ReplicatedTree: shared state
Stats
JGroups has ~90 KLOC
  30 KLOC protocols
  45 KLOC main + building blocks
  15 KLOC unit tests
~90 protocols shipped with JGroups
Set of well-tested stacks (in XML files)
Available protocols I
Transport: UDP, TCP, TCP_NIO, TUNNEL, JMS, LOOPBACK
Discovery: PING, TCPPING, TCPGOSSIP, UDPPING
Group membership
Reliable delivery & FIFO: NAKACK, SMACK, UNICAST
Available protocols II
Failure detection: FD, FD_SOCK, FD_PID, FD_SIMPLE, FD_PROB, VERIFY_SUSPECT
Security: ENCRYPT, SSL ConnectionTable (n/a)
Fragmentation (FRAG)
State transfer (STATE_TRANSFER)
Available protocols III
Ordering: FIFO, CAUSAL, TOTAL, TOTAL_TOKEN
Virtual Synchrony: FLUSH, QUEUE, VIEW_ENFORCER
Probabilistic Broadcast: PBCAST
Merging: MERGE(2), MERGEFAST
Available protocols IV
Distributed message garbage collection: STABLE
Debugging: PERF, TRACE, PRINTOBJS, SIZE, BSH
Simulation: SHUFFLE, DELAY, DISCARD, DEADLOCK, LOSS, PARTITIONER
Available protocols V
Dynamic configuration: AUTOCONF
Flow control: FLOW_CONTROL, FC
Misc: PIGGYBACK, COMPRESS
Transport
Task
  Send messages from above to all members in the group, or to a single member
  Receive messages from the network and pass them up the stack
UDP: multicast and multiple UDP unicasts
TCP: multicast done by multiple TCP unicasts
TUNNEL: send to external router, e.g. through a firewall
Discovery
Task
  Initial discovery of members
  Used by GMS to determine the coordinator to send the JOIN request to
Each member returns its own address, plus the address of the coordinator
Typical response: ({A,A}, {B,A}, {C,A})
Wait for n milliseconds or m responses
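The response format above can be illustrated with a small, self-contained sketch. It is not JGroups code: the class `DiscoverySketch`, the type `PingRsp`, and the rule of picking the coordinator named by most responses are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch (not JGroups code): pick a coordinator from discovery responses. */
class DiscoverySketch {

    /** One discovery response: the responder's own address and the coordinator it knows. */
    static final class PingRsp {
        final String ownAddr;
        final String coordAddr;

        PingRsp(String ownAddr, String coordAddr) {
            this.ownAddr = ownAddr;
            this.coordAddr = coordAddr;
        }
    }

    /**
     * Returns the coordinator named by the most responses (ties are broken arbitrarily).
     * An empty result means nobody answered, so the caller would become coordinator itself.
     */
    static Optional<String> determineCoordinator(List<PingRsp> responses) {
        Map<String, Integer> votes = new HashMap<>();
        for (PingRsp rsp : responses) {
            votes.merge(rsp.coordAddr, 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // The typical response from the slide: ({A,A}, {B,A}, {C,A})
        List<PingRsp> rsps = Arrays.asList(
                new PingRsp("A", "A"), new PingRsp("B", "A"), new PingRsp("C", "A"));
        System.out.println(determineCoordinator(rsps).orElse("no response, become coordinator"));
    }
}
```

With the typical response set, every member names A, so the JOIN request would go to A.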
Discovery - UDP
Multicast discovery request
Each member responds with a unicast UDP datagram (local-addr, coord-addr), back to the sender
Discovery - TCPGOSSIP
Can be used by both UDP and TCP
External GossipServer: org.jgroups.stack.GossipServer
Maintains table of
Discovery - TCPGOSSIP
To obtain the initial membership for a given group, TCPGOSSIP contacts the GossipServer
Membership info does not need to be accurate: the only goal is to determine the coordinator to send the JOIN request to
Discovery - TCPPING
Give a set of well-known members
For discovery, those members are pinged
If at least 1 responds, we can find the coordinator
Does not require an additional process
Group Membership
Task
  Maintain a list of members
  Notify members when a new member joins, or an existing member leaves (or crashes)
Each member has the same ordered list
List can be retrieved by Channel.getView()
First (= oldest) member is coordinator
If coord crashes, 2nd oldest takes over
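The ordered-list rule can be sketched in a few lines. This is a hypothetical illustration, not the GMS implementation: `ViewSketch` and its methods are invented names, and member addresses are plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch (not the GMS implementation): an ordered member list with a coordinator. */
class ViewSketch {
    private final List<String> members = new ArrayList<>(); // oldest member first

    /** A joining member is appended, so list order reflects age in the group. */
    void join(String member) {
        if (!members.contains(member)) {
            members.add(member);
        }
    }

    /** Covers both a graceful leave and a crash: the member is dropped from the list. */
    void remove(String member) {
        members.remove(member);
    }

    /** The first (= oldest) member is the coordinator; null if the group is empty. */
    String coordinator() {
        return members.isEmpty() ? null : members.get(0);
    }

    List<String> getMembers() {
        return Collections.unmodifiableList(members);
    }

    public static void main(String[] args) {
        ViewSketch view = new ViewSketch();
        view.join("A");
        view.join("B");
        view.join("C");
        System.out.println(view.getMembers() + " coord=" + view.coordinator()); // [A, B, C] coord=A
        view.remove("A"); // the coordinator crashes
        System.out.println(view.getMembers() + " coord=" + view.coordinator()); // [B, C] coord=B
    }
}
```

Because every member holds the same ordered list, each one can work out the new coordinator locally once the old one is removed.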
Group Membership - JOIN
New member uses discovery to find coord
If first member -> become coord
Else: sends JOIN to coord
Coord adds new member to list, multicasts new view (member list) to all members
If 2 initial members are started at the same time, MERGE protocol merges them into a single group
Group Membership - LEAVE
Member sends LEAVE to coord
Coord multicasts new view to all members
Group Membership - CRASH
Failure detection protocol sends up SUSPECT event
VERIFY_SUSPECT double-checks
GMS multicasts new view (not containing crashed member)
If member resurfaces, it will be shunned
  Has to leave and rejoin group
Failure detection
Task
  Detect if a member has crashed and send SUSPECT event up the stack (to be handled by GMS)
Logical ring over membership
Each member pings its neighbor to the right
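A minimal sketch of the ring rule follows, assuming that "the neighbor to the right" means the next entry in the ordered member list, wrapping around at the end. `RingSketch` and `pingTarget` are hypothetical names, not part of the FD protocol's API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not the FD implementation): who pings whom in a logical ring. */
class RingSketch {

    /**
     * Returns the member that 'self' should ping: the next entry in the ordered
     * member list, wrapping around. Returns null if 'self' is alone or not a member.
     */
    static String pingTarget(List<String> members, String self) {
        int idx = members.indexOf(self);
        if (idx < 0 || members.size() < 2) {
            return null;
        }
        return members.get((idx + 1) % members.size());
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B", "C", "D");
        for (String member : view) {
            System.out.println(member + " pings " + pingTarget(view, member));
        }
    }
}
```

When a member is removed from the view, recomputing `pingTarget` on the new list closes the ring again, so its former left neighbor simply starts pinging the next member.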
Failure detection - FD
Reliable delivery & FIFO
Lossless and FIFO delivery for multicast and unicast messages
Multicast: NAK and ACK
Unicast: ACK
Missing messages (gaps) are retransmitted
  Sender resends, or
  Receiver requests retransmission
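The receiver-driven case (gap detection followed by a retransmission request) can be sketched as below. This is a simplified illustration for a single sender, not the NAKACK implementation: `NakReceiverSketch`, the starting sequence number of 1, and the string payloads are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch (not NAKACK itself): FIFO delivery with gap detection for one sender. */
class NakReceiverSketch {
    private long nextToDeliver = 1;                               // lowest seqno not yet delivered
    private final TreeMap<Long, String> buffer = new TreeMap<>(); // messages received out of order
    private final List<String> delivered = new ArrayList<>();     // messages passed up, in FIFO order

    /**
     * Handles one received message and returns the sequence numbers that are
     * still missing, i.e. the ones a receiver would ask the sender to retransmit.
     */
    List<Long> receive(long seqno, String payload) {
        if (seqno >= nextToDeliver) {
            buffer.putIfAbsent(seqno, payload); // ignore duplicates and already delivered seqnos
        }
        // Deliver every message that is now contiguous with what was delivered before.
        while (buffer.containsKey(nextToDeliver)) {
            delivered.add(buffer.remove(nextToDeliver));
            nextToDeliver++;
        }
        // Anything between nextToDeliver and the highest buffered seqno is a gap.
        List<Long> missing = new ArrayList<>();
        if (!buffer.isEmpty()) {
            for (long s = nextToDeliver; s < buffer.lastKey(); s++) {
                if (!buffer.containsKey(s)) {
                    missing.add(s);
                }
            }
        }
        return missing;
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        NakReceiverSketch receiver = new NakReceiverSketch();
        receiver.receive(1, "m1");
        System.out.println("missing after #3: " + receiver.receive(3, "m3")); // [2]
        receiver.receive(2, "m2"); // the retransmission arrives and fills the gap
        System.out.println("delivered: " + receiver.delivered());             // [m1, m2, m3]
    }
}
```

Message 3 is held back until message 2 arrives, which is what keeps delivery FIFO even though the network reordered or dropped a packet.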
Encryption
Uses public/private key encryption to join a new member and get the shared group key
Shared key is used to encrypt all messages
Group key is recomputed on joins/leaves
SSL ConnectionTable
  As alternative, to be used in TCP
  Uses SSLSocket rather than Socket
Properties configuration
Plain string format
    "UDP(mcast_addr=228.8.8.8;mcast_port=45566;ip_ttl=32;" +
    "mcast_send_buf_size=64000;mcast_recv_buf_size=64000):" +
    "PING(timeout=2000;num_initial_members=3):" +
    "MERGE2(min_interval=5000;max_interval=10000):" +
    "FD_SOCK:" +
    "VERIFY_SUSPECT(timeout=1500):" +
    "pbcast.NAKACK(max_xmit_size=8096;gc_lag=50;retransmit_timeout=600,1200,2400):" +
    "UNICAST(timeout=600,1200,2400,4800):" +
    "pbcast.STABLE(desired_avg_gossip=20000):" +
    "FRAG(frag_size=8096;down_thread=false;up_thread=false):" +
    "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;" +
    "shun=false;print_local_addr=true)"
URL / XML
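To make the structure of the plain string format explicit (protocols separated by `:`, properties inside parentheses separated by `;`), here is a small stand-alone parser. It is an illustrative sketch, not the JGroups configurator: `StackStringSketch` and its methods are hypothetical, and the input is assumed to be well formed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not the JGroups configurator): parse "PROT1(k=v;k=v):PROT2:...". */
class StackStringSketch {

    /** Returns protocol name -> properties, in stack order. Assumes well-formed input. */
    static Map<String, Map<String, String>> parse(String spec) {
        Map<String, Map<String, String>> stack = new LinkedHashMap<>();
        for (String part : splitTopLevel(spec)) {
            int open = part.indexOf('(');
            String name = open < 0 ? part : part.substring(0, open);
            Map<String, String> props = new LinkedHashMap<>();
            if (open >= 0) {
                String body = part.substring(open + 1, part.lastIndexOf(')'));
                for (String kv : body.split(";")) {
                    if (kv.isEmpty()) {
                        continue;
                    }
                    int eq = kv.indexOf('=');
                    if (eq < 0) {
                        props.put(kv.trim(), ""); // property without a value
                    } else {
                        props.put(kv.substring(0, eq).trim(), kv.substring(eq + 1).trim());
                    }
                }
            }
            stack.put(name.trim(), props);
        }
        return stack;
    }

    /** Splits on ':' only when it appears outside parentheses. */
    static List<String> splitTopLevel(String spec) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < spec.length(); i++) {
            char c = spec.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ':' && depth == 0) {
                parts.add(spec.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(spec.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        String spec = "UDP(mcast_addr=228.8.8.8;mcast_port=45566):"
                + "PING(timeout=2000;num_initial_members=3):"
                + "FD_SOCK:"
                + "UNICAST(timeout=600,1200,2400,4800)";
        parse(spec).forEach((name, props) -> System.out.println(name + " " + props));
    }
}
```

The protocols come out in the order they were written, with the transport first, and a protocol such as FD_SOCK that has no parentheses gets an empty property map.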
Advantages of protocol stacks
Each property is implemented by 1 protocol
  Fragmentation, retransmission, ordering
Protocols are assembled into a stack
Stack has exactly the properties needed by the application / required by the network
Can't get this with java.net.Socket, which always comes with full TCP/IP
Advantages of protocol stacks
Small scope: a protocol does just one job, but does it well
Protocol stacks are fashionable:
  Servlet 2.3 filters
  Interceptors (CORBA, JBoss)
  AOP: separation of concerns, e.g. fragmentation should not be an application concern
Benefits
Same application code, different protocol stacks (deployment issue)
Application requirements reflected in protocol stack specification
App focuses on domain-specific issues
Building Blocks
Replicated Cache
NotificationBus
Group RPC
Replicated Cache
Shared state across a group
Any change is replicated to all members
New members acquire initial state from coord
Structures supported
  Tree
  Hashmap
  Queues
NotificationBus
Thin layer on Channel
Notifications sent to all members
Callback when notification is received
Hook for state sharing
Group RPC
Invoke a method call in all members
Get a list of responses
Wait for all responses, a majority, the first response, or none (with optional timeout)
Handles crashed members correctly (no blocking)
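The four waiting modes can be expressed as a simple count of required responses. The sketch below is hypothetical and does not use the JGroups building blocks: the enum `Mode`, the class `RpcModeSketch`, and the rule that suspected members are subtracted from the expected count are assumptions chosen to show why a crashed member need not block the caller.

```java
/** Hypothetical sketch (not the JGroups RPC building block): when is a group call complete? */
class RpcModeSketch {

    enum Mode { ALL, MAJORITY, FIRST, NONE }

    /** Number of responses to wait for, given the number of members expected to answer. */
    static int requiredResponses(Mode mode, int members) {
        if (members <= 0) {
            return 0;
        }
        switch (mode) {
            case ALL:
                return members;
            case MAJORITY:
                return members / 2 + 1;
            case FIRST:
                return 1;
            default:
                return 0; // NONE: fire and forget
        }
    }

    /**
     * True once enough responses have arrived. Members suspected of having crashed
     * are no longer expected to answer, so they cannot block the caller.
     */
    static boolean done(Mode mode, int members, int suspected, int received) {
        return received >= requiredResponses(mode, members - suspected);
    }

    public static void main(String[] args) {
        System.out.println(requiredResponses(Mode.MAJORITY, 4)); // 3
        System.out.println(done(Mode.ALL, 4, 0, 3));             // false: still waiting for one member
        System.out.println(done(Mode.ALL, 4, 1, 3));             // true: the fourth member crashed
    }
}
```

A timeout, as mentioned on the slide, would be an additional exit condition on top of this count.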
Serverless JMS
JMS based on JGroups
Peer-to-peer architecture rather than client/server
Client publishing to a topic
  Instead of sending the message to a server that then distributes it to multiple clients, the publisher multicasts the message
JMS Server is just another member
  Handles persistent messages (DB)
Serverless JMS
Figure: Client/Server model (Publisher, JMS Server, Subscribers; one subscriber discards). Cost: 4 unicasts
Figure: Serverless model (Publisher multicasts; JMS Server and Subscribers each accept or discard). Cost: 1 multicast
Serverless JMS
Clients are still able to publish even when the server is down
Caveat: works only where client and server are in the same multicast-reachable network
Status
  Topics/Queues available
  No TX/XA, no durable subscriptions, no persistent messages
  Download (standalone) beta at jboss.org
Where is JGroups used?
JBoss
  Clustering
    Replication of entity beans, SLSBs and SFSBs
    HA-JNDI
    Cache invalidation
    Session replication (integrated Tomcat, Jetty)
  Serverless JMS
  Cache
    Replicated transactional clustered cache
Where is JGroups used?
JOnAS appserver (clustering)
GroupPac (FT-CORBA impl)
GCT: port to .NET
Replicated Caching
  OpenSymphony OSCache
  Jakarta Turbine's JCS
  Swarmcache
Where is JGroups used?
Session replication
  Jetty
  Tomcat 4.x
  Work in progress on plugin architecture for Tomcat 5.x
Unofficial ones...
Performance
4 nodes, 1 or 2 senders
750MHz SunBlade 1000, 512MB, 100 Mbit/s switched Ethernet
JGroups 2.1
8000 10K msgs, in 200 bursts of 20 (2 senders), sleep after burst = 5ms
451 msgs/s == 4.5MB/s throughput
Resident heap size 35MB max (-Xmx128m)
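The figures on this slide are consistent with each other under the following reading, which is an assumption: each of the 2 senders sends 200 bursts of 20 messages, "10K" means 10 KB per message, and 1 MB is taken as 1000 KB.

```latex
\begin{align*}
\text{messages}   &= 200~\text{bursts} \times 20~\tfrac{\text{msgs}}{\text{burst}} \times 2~\text{senders} = 8000 \\
\text{throughput} &\approx 451~\tfrac{\text{msgs}}{\text{s}} \times 10~\tfrac{\text{KB}}{\text{msg}} = 4510~\tfrac{\text{KB}}{\text{s}} \approx 4.5~\tfrac{\text{MB}}{\text{s}}
\end{align*}
```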
Performance
1.4 billion messages total
4 nodes, 2 senders
Message size = 10K
Average msgs/s: 350
Max resident mem: 35M (-Xmx128m)
Tests available as part of JG distro
  Includes gnuplot scripts to generate graphs
Current and future projects
JBossCache, Serverless JMS
Port to J2ME (first version available on www.jgroups-me.org)
hsqldb (HyperSonic) database replication
JCache JSR 107 compliant impl (JBoss Cache)
Potential work on GroupComm JSR
  jcluster project on dev.java.net
Links
www.jgroups.org
"Papers and Articles": link to IBM developerWorks