Reliable Multicasting with JGroups Bela Ban, Jan 2004

http://www.jgroups.org

Overview

- API, architecture
- Protocols
- Building Blocks
- Performance
- Future, Conclusion

EBIG, Oakland Jan 21 2004

What Is It ?

- Toolkit for reliable multicasting
  - Fragmentation
  - Message retransmission
  - Ordering
  - Group membership, membership change notification
- LAN or WAN based

License

- JGroups is a toolkit (JAR), to be linked against an application
- Open Source under LGPL
  - Commercial products can use JGroups without having to LGPL their code
  - Modifications to JGroups itself need to be LGPL'ed (if distributed)
- Dual licensing in the future

API

- Channel: similar to java.net.MulticastSocket, plus group membership, reliability
- Operations:
  - Create a channel with a set of properties
  - Connect to a group X. Everyone that connects to X will see each other
  - Send a message to all members of X
  - Send a message to a single member

API

- Receive a message
- Retrieve membership
- Be notified when members join, leave (including crashes)
- Disconnect from the group
- Close the channel

API

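    import org.jgroups.JChannel;
    import org.jgroups.Message;

    // Create a channel from a protocol stack configuration and join the group
    JChannel channel=new JChannel("file://home/bela/default.xml");
    channel.connect("demo-group");
    System.out.println("members are: " + channel.getView().getMembers());

    // Send to all members (null destination), then block until a message arrives
    Message msg=new Message(null, null, "Hello world");
    channel.send(msg);
    Message m=(Message)channel.receive(0);
    System.out.println("received msg from " + m.getSrc() + ": " + m.getObject());

    // Leave the group and release resources
    channel.disconnect();
    channel.close();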

Group topology


Architecture of JGroups

[Figure: three members, each running the same stack, top to bottom: Application, Building Blocks, Channel, GMS, UNICAST, NAKACK, FD, UDP. The three stacks are connected through the Network.]

Demo

- Draw
- ReplicatedTree: shared state

Stats

- JGroups has ~ 90KLOC
  - 30KLOC protocols
  - 45KLOC main + building blocks
  - 15KLOC unit tests
- ~ 90 protocols shipped with JGroups
- Set of well-tested stacks (in XML files)

Available protocols I

- Transport
  - UDP, TCP, TCP_NIO, TUNNEL, JMS, LOOPBACK
- Discovery
  - PING, TCPPING, TCPGOSSIP, UDPPING
- Group membership
- Reliable delivery & FIFO
  - NAKACK, SMACK, UNICAST

Available protocols II

- Failure detection
  - FD, FD_SOCK, FD_PID, FD_SIMPLE, FD_PROB, VERIFY_SUSPECT
- Security
  - ENCRYPT, SSL ConnectionTable (n/a)
- Fragmentation (FRAG)
- State transfer (STATE_TRANSFER)

Available protocols III

- Ordering
  - FIFO, CAUSAL, TOTAL, TOTAL_TOKEN
- Virtual Synchrony
  - FLUSH, QUEUE, VIEW_ENFORCER
- Probabilistic Broadcast
  - PBCAST
- Merging
  - MERGE(2), MERGEFAST

Available protocols IV

- Distributed message garbage collection
  - STABLE
- Debugging
  - PERF, TRACE, PRINTOBJS, SIZE, BSH
- Simulation
  - SHUFFLE, DELAY, DISCARD, DEADLOCK, LOSS, PARTITIONER

Available protocols V

- Dynamic configuration
  - AUTOCONF
- Flow control
  - FLOW_CONTROL, FC
- Misc
  - PIGGYBACK, COMPRESS

Transport

- Task
  - Send messages from above to all members in the group, or to a single member
  - Receive messages from the network, pass up stack
- UDP: multicast and multiple UDP unicast
- TCP: mcast done by multiple TCP unicasts
- TUNNEL: send to external router, e.g. through firewall

Discovery

- Task
  - Initial discovery of members
  - Used by GMS to determine coordinator to send JOIN request to
- Each member returns its own addr, plus the addr of the coordinator
  - Typical response ({A,A}, {B,A}, {C,A})
- Wait for n milliseconds or m responses
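
A minimal sketch of how a joiner might pick the coordinator from the (own addr, coord addr) pairs it collected. The class and method names are hypothetical and not part of JGroups, and taking the most frequently named coordinator is an assumption made for illustration; the slides do not say how GMS resolves disagreeing responses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; these types are not part of the JGroups API.
class DiscoverySketch {

    // One discovery response: the responder's own address and the
    // address it believes to be the coordinator.
    static final class PingResponse {
        final String ownAddr;
        final String coordAddr;

        PingResponse(String ownAddr, String coordAddr) {
            this.ownAddr = ownAddr;
            this.coordAddr = coordAddr;
        }
    }

    // Returns the coordinator named by the most responses, or null when
    // nobody answered (the caller would then be the first member).
    static String pickCoordinator(List<PingResponse> responses) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (PingResponse r : responses) {
            votes.merge(r.coordAddr, 1, Integer::sum);
        }
        String best = null;
        int bestVotes = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) {
                best = e.getKey();
                bestVotes = e.getValue();
            }
        }
        return best;
    }
}
```

With the typical response ({A,A}, {B,A}, {C,A}) this yields A, so the JOIN request would go to A.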

Discovery - UDP

- Multicast discovery request
- Each member responds with a unicast UDP datagram (local-addr, coord-addr), back to the sender

Discovery - TCPGOSSIP

- Can be used by both UDP and TCP
- External GossipServer
  - org.jgroups.stack.GossipServer
  - Maintains a table of registered members per group
  - Each member registers (groupname, own addr)
  - Lease based - members have to periodically renew registration
  - Multiple GossipServers possible
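
The lease-based registration can be sketched as a small in-memory table. This is a hypothetical illustration, not the actual org.jgroups.stack.GossipServer code: the class name, the method names and the explicit `now` parameter (used instead of a real clock so the behaviour is easy to follow) are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a lease-based registry; not the real GossipServer.
class LeaseRegistrySketch {
    private final long leaseMillis;
    // group name -> (member address -> lease expiry time in ms)
    private final Map<String, Map<String, Long>> table = new HashMap<>();

    LeaseRegistrySketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Registers a member, or renews its lease if it is already registered.
    void register(String group, String addr, long now) {
        table.computeIfAbsent(group, g -> new HashMap<>())
             .put(addr, now + leaseMillis);
    }

    // Returns the members whose lease is still valid, dropping expired ones.
    List<String> getMembers(String group, long now) {
        List<String> live = new ArrayList<>();
        Map<String, Long> members = table.get(group);
        if (members == null) {
            return live;
        }
        members.entrySet().removeIf(e -> e.getValue() <= now);
        live.addAll(members.keySet());
        Collections.sort(live); // stable order for callers
        return live;
    }
}
```

A member that stops renewing simply disappears from the table once its lease runs out, which is why stale entries are tolerable for discovery.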

Discovery - TCPGOSSIP

- To obtain initial membership for a given group, TCPGOSSIP contacts the GossipServer
- Membership info does not need to be accurate - only goal is to determine coord to send JOIN request to

Discovery - TCPPING

- Give a set of well known members
- For discovery, those members are pinged
- If at least 1 responds, we can find the coordinator
- Does not require additional process

Group Membership

- Task
  - Maintain a list of members
  - Notify members when a new member joins, or an existing member leaves (or crashes)
- Each member has the same ordered list
- List can be retrieved by Channel.getView()
- First (= oldest) member is coordinator
- If coord crashes, 2nd oldest takes over
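
The coordinator rule follows directly from keeping the member list ordered by join time. The sketch below uses a hypothetical `ViewSketch` class (not the JGroups View class) to show that removing the oldest member automatically promotes the second oldest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an ordered membership list.
class ViewSketch {
    // Ordered by join time, oldest member first.
    private final List<String> members = new ArrayList<>();

    // A joining member is appended, so it becomes the youngest.
    void join(String addr) {
        if (!members.contains(addr)) {
            members.add(addr);
        }
    }

    // Used for both a graceful leave and a crash.
    void remove(String addr) {
        members.remove(addr);
    }

    // The first (oldest) member is the coordinator; null for an empty view.
    String coordinator() {
        return members.isEmpty() ? null : members.get(0);
    }

    List<String> getMembers() {
        return new ArrayList<>(members);
    }
}
```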

Group Membership - JOIN

- New member uses discovery to find coord
  - If first member -> become coord
  - Else: sends JOIN to coord
- Coord adds new member to list, multicasts new view (member list) to all members
- If 2 initial members are started at the same time, MERGE protocol merges them into a single group

Group Membership - LEAVE

- Member sends LEAVE to coord
- Coord multicasts new view to all members

Group Membership - CRASH

- Failure detection protocol sends up SUSPECT event
- VERIFY_SUSPECT double checks
- GMS multicasts new view (not containing crashed member)
- If member resurfaces, it will be shunned
  - Has to leave and rejoin group

Failure detection

- Task
  - Detect if a member has crashed and send SUSPECT event up the stack (to be handled by GMS)
- Logical ring over membership
  - Each member pings its neighbor to the right
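
The logical ring can be derived from the ordered membership list alone: the neighbor to the right is the next entry, wrapping around at the end. A minimal sketch, with a hypothetical helper that is not taken from the FD protocol source:

```java
import java.util.List;

// Hypothetical illustration of the ring used for failure detection.
class RingSketch {

    // Returns the member that 'self' should ping, i.e. its right-hand
    // neighbor in the membership list, wrapping around at the end.
    // Returns null if 'self' is not a member or is the only member.
    static String neighborToPing(List<String> members, String self) {
        int i = members.indexOf(self);
        if (i < 0 || members.size() < 2) {
            return null;
        }
        return members.get((i + 1) % members.size());
    }
}
```

For the view [A, B, C], A pings B, B pings C and C pings A, so every member is monitored by exactly one other member.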

Failure detection - FD


Reliable delivery & FIFO

- Lossless and FIFO delivery for multicast and unicast messages
  - Multicast: NAK and ACK
  - Unicast: ACK
- Missing messages (gaps) are retransmitted
  - Sender resends or
  - Receiver requests retransmission
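
The receiver-driven variant (the receiver detects a gap and requests retransmission) can be sketched with per-sender sequence numbers. This is a simplified illustration, not the NAKACK implementation: the class name is hypothetical, sequence numbers are assumed to start at 1, and the actual sending of retransmission requests is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical receiver for one sender's message stream.
class NakReceiverSketch {
    private long nextToDeliver = 1; // next seqno expected in FIFO order
    private final TreeMap<Long, String> buffer = new TreeMap<>(); // out-of-order messages
    private final List<String> delivered = new ArrayList<>();

    // Accepts a message, delivers everything that is now in order, and
    // returns the seqnos that are still missing (candidates for a NAK).
    List<Long> receive(long seqno, String payload) {
        if (seqno >= nextToDeliver) {
            buffer.putIfAbsent(seqno, payload); // duplicates are ignored
        }
        while (buffer.containsKey(nextToDeliver)) {
            delivered.add(buffer.remove(nextToDeliver));
            nextToDeliver++;
        }
        List<Long> missing = new ArrayList<>();
        if (!buffer.isEmpty()) {
            for (long s = nextToDeliver; s < buffer.lastKey(); s++) {
                if (!buffer.containsKey(s)) {
                    missing.add(s);
                }
            }
        }
        return missing;
    }

    // Messages handed to the application so far, in FIFO order.
    List<String> delivered() {
        return delivered;
    }
}
```

Receiving seqnos 1, 3, 5 delivers only message 1 and reports 2 and 4 as missing; once 2 and 4 arrive, messages 2 to 5 are delivered in order.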

Encryption

- Uses public/private encryption to join new member and get shared group key
- Shared key is used to encrypt all messages
- Group key is recomputed on joins/leaves
- SSL ConnectionTable
  - As alternative, to be used in TCP
  - Uses SSLSocket rather than Socket

Properties configuration

- Plain string format

    "UDP(mcast_addr=228.8.8.8;mcast_port=45566;ip_ttl=32;" +
    "mcast_send_buf_size=64000;mcast_recv_buf_size=64000):" +
    "PING(timeout=2000;num_initial_members=3):" +
    "MERGE2(min_interval=5000;max_interval=10000):" +
    "FD_SOCK:" +
    "VERIFY_SUSPECT(timeout=1500):" +
    "pbcast.NAKACK(max_xmit_size=8096;gc_lag=50;retransmit_timeout=600,1200,2400):" +
    "UNICAST(timeout=600,1200,2400,4800):" +
    "pbcast.STABLE(desired_avg_gossip=20000):" +
    "FRAG(frag_size=8096;down_thread=false;up_thread=false):" +
    "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;" +
    "shun=false;print_local_addr=true)"

- URL / XML
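
To make the structure of the plain string format explicit: protocols are separated by ":", listed from the bottom of the stack upwards, and each may carry ";"-separated key=value parameters in parentheses. The parser below is a hypothetical sketch for reading such a string, not the configurator JGroups itself uses, and it assumes that no parameter value contains ":" or ";".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the "PROT(k=v;k=v):PROT:PROT(k=v)" format.
class StackConfigSketch {

    // Returns protocol name -> parameters, in stack order (bottom first).
    static Map<String, Map<String, String>> parse(String props) {
        Map<String, Map<String, String>> stack = new LinkedHashMap<>();
        for (String layer : props.split(":")) {
            layer = layer.trim();
            if (layer.isEmpty()) {
                continue;
            }
            Map<String, String> params = new LinkedHashMap<>();
            String name = layer;
            int open = layer.indexOf('(');
            if (open >= 0 && layer.endsWith(")")) {
                name = layer.substring(0, open);
                String body = layer.substring(open + 1, layer.length() - 1);
                for (String kv : body.split(";")) {
                    int eq = kv.indexOf('=');
                    if (eq > 0) {
                        params.put(kv.substring(0, eq).trim(),
                                   kv.substring(eq + 1).trim());
                    }
                }
            }
            stack.put(name, params);
        }
        return stack;
    }
}
```

Note that values such as `600,1200,2400` stay intact because only ";" separates parameters.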

Advantages of protocol stacks

- Each property is implemented by 1 protocol
  - Fragmentation, retransmission, ordering
- Protocols are assembled into a stack
- Stack has exactly the properties needed by the application / required by the network
- Can't get this with java.net.Socket, always comes with full TCP/IP

Advantages of protocol stacks

- Small scope: a protocol does just one job, but does it well
- Protocol stacks are fashionable:
  - Servlet 2.3 filters
  - Interceptors (Corba, JBoss)
  - AOP: separation of concerns, e.g. fragmentation should not be an application concern
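
Fragmentation is a good example of a single job that a protocol can hide from the application. The sketch below shows only the core idea (split on the way down, reassemble on the way up); it is a hypothetical illustration rather than the FRAG protocol itself, and it leaves out the headers a real protocol needs to match fragments to their message.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of what a fragmentation layer does.
class FragSketch {

    // Splits a message into chunks of at most fragSize bytes.
    static List<byte[]> fragment(byte[] msg, int fragSize) {
        if (fragSize <= 0) {
            throw new IllegalArgumentException("fragSize must be positive");
        }
        List<byte[]> frags = new ArrayList<>();
        for (int off = 0; off < msg.length; off += fragSize) {
            int len = Math.min(fragSize, msg.length - off);
            byte[] f = new byte[len];
            System.arraycopy(msg, off, f, 0, len);
            frags.add(f);
        }
        return frags;
    }

    // Concatenates fragments, in order, back into the original message.
    static byte[] reassemble(List<byte[]> frags) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] f : frags) {
            out.write(f, 0, f.length);
        }
        return out.toByteArray();
    }
}
```

With a fragment size of 8096 bytes, a 20000-byte message is sent as three fragments of 8096, 8096 and 3808 bytes, and the application only ever sees the reassembled message.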

Benefits

- Same application code, different protocol stacks (deployment issue)
- Application requirements reflected in protocol stack specification
- App focuses on domain specific issues

Building Blocks

- Replicated Cache
- NotificationBus
- Group RPC

Replicated Cache

- Shared state across a group
- Any change is replicated to all members
- New members acquire initial state from coord
- Structures supported
  - Tree
  - Hashmap
  - Queues

NotificationBus

- Thin layer on Channel
- Notifications sent to all members
- Callback when notification is received
- Hook for state sharing

Group RPC

- Invoke a method call in all members
- Get a list of responses
- Wait for all responses, a majority, the first response, or none (with an optional timeout)
- Handles crashed members correctly (no blocking)
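
The four waiting modes, and why a crashed member must not block the caller, can be sketched as a simple completion check. The enum and method names are hypothetical and this is not the JGroups RPC implementation; in particular, lowering the target to the number of members still reachable is an assumption about how "no blocking" could be achieved.

```java
// Hypothetical illustration of response-collection modes for a group RPC.
class RpcWaitSketch {
    enum Mode { ALL, MAJORITY, FIRST, NONE }

    // Number of responses the caller wants, given how many members were asked.
    static int required(Mode mode, int members) {
        switch (mode) {
            case ALL:
                return members;
            case MAJORITY:
                return members / 2 + 1;
            case FIRST:
                return Math.min(1, members);
            default:
                return 0; // NONE: fire and forget
        }
    }

    // True once the call can return. Members suspected of having crashed
    // are no longer waited for, so the caller never blocks on them.
    static boolean done(Mode mode, int members, int responses, int suspected) {
        int reachable = members - suspected;
        return responses >= Math.min(required(mode, members), reachable);
    }
}
```

For example, with 4 members in ALL mode, 3 responses plus 1 suspected member is enough to return, whereas 2 responses plus 1 suspected member is not.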

Serverless JMS

- JMS based on JGroups
- Peer-to-peer architecture rather than C/S
- Client publishing to a topic
  - Instead of sending msg to server, and server distributes to multiple clients: publisher multicasts message
- JMS Server just another member
  - Handles persistent messages (DB)

Serverless JMS

[Figure: Client/Server Model (Publisher, JMS Server, Subscribers; cost: 4 unicasts) versus Serverless Model (Publisher multicasts to the JMS Server and the Subscribers, each of which accepts or discards the message; cost: 1 multicast).]

Serverless JMS

- Clients are still able to publish even when server is down
- Caveat: works in scenario where client and server are in same multicast-reachable network
- Status
  - Topics/Queues available
  - No TX/XA, no durable subscriptions, no persistent messages
  - Download (standalone) beta at jboss.org

Where is JGroups used ?

- JBoss
  - Clustering
    - Replication of entity beans, SLSBs and SFSBs
    - HA-JNDI
    - Cache invalidation
    - Session repl (integrated Tomcat, Jetty)
  - Serverless JMS
  - Cache
    - Replicated transactional clustered cache

Where is JGroups used ?

- Jonas appserver (clustering)
- GroupPac (FT-CORBA impl)
- GCT: port to .NET
- Replicated Caching
  - OpenSymphony OSCache
  - Jakarta Turbine's JCS
  - Swarmcache

Where is JGroups used ?

- Session replication
  - Jetty
  - Tomcat 4.x
  - Work in progress on plugin architecture for Tomcat 5.x
- Unofficial ones...

Performance

- 4 nodes, 1 or 2 senders
- 750MHz SunBlade 1000 512MB, 100Mbit/s switched Ethernet
- JGroups 2.1
- 8000 10K msgs, in 200 bursts of 20 (2 senders), sleep after burst = 5ms
  - 451 msgs/s == 4.5MB/s throughput
- Resident heap size 35MB max (-Xmx128m)
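
As a sanity check on these figures, assuming that "10K" means 10 KB per message, that 1 MB = 1000 KB, and that the 200 bursts of 20 messages are sent by each of the 2 senders:

```latex
\begin{align*}
200 \text{ bursts} \times 20 \text{ msgs} \times 2 \text{ senders} &= 8000 \text{ msgs} \\
451 \text{ msgs/s} \times 10 \text{ KB} &= 4510 \text{ KB/s} \approx 4.5 \text{ MB/s}
\end{align*}
```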

Performance

- 1.4 billion messages total
- 4 nodes, 2 senders
- Message size = 10K
- Average msgs/s: 350
- Max resident mem: 35M (-Xmx128m)
- Tests available as part of JG distro
  - Includes gnuplot scripts to generate graphs

Current and future projects

- JBossCache, Serverless JMS
- Port to J2ME (first version available on www.jgroups-me.org)
- hsqldb (HyperSonic) database replication
- JCache JSR 107 compliant impl (JBoss Cache)
- Potential work on GroupComm JSR
  - jcluster project on dev.java.net

Links

- www.jgroups.org
  - "Papers and Articles": link to IBM developerWorks

Questions ?