JGroups Protocol GMS
External
- JGroups Wiki http://community.jboss.org/wiki/JGroupsPbcastGMS
Internal
Overview
Group Membership Service (pbcast.GMS) is the protocol responsible for handling members that join and leave the group. It also handles suspected members by excluding them from the membership, and it creates a new view and sends it to all members whenever the membership changes.
Suspected Member Exclusion
If a SUSPECT message survives VERIFY_SUSPECT verification, it is passed up the stack and reaches the GMS layer, which computes and multicasts a new view. The new view does not contain the suspected member:
2016-02-29 09:25:24,110 [TRACE] [org.jgroups.protocols.pbcast.GMS] (ViewHandler,TEST-CLUSTER,host01/tcp) -- host01/tcp: joiners=[], suspected=[host02/tcp], leaving=[], new view: [host01/tcp|3] (2) [host01/tcp, host03/tcp]
2016-02-29 09:25:24,110 [TRACE] [org.jgroups.protocols.pbcast.GMS] (ViewHandler,TEST-CLUSTER,host01/tcp) -- host01/tcp: mcasting view [host01/tcp|3] (2) [host01/tcp, host03/tcp] (2 mbrs)
The log sequence above shows how "host02" is removed from the [host01, host02, host03] view.
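The computation behind the "new view" log line can be pictured with a small stand-alone model. This is a sketch under stated assumptions, not JGroups code: SimpleView and ViewSketch are hypothetical names, and the ordering rule (survivors keep their position, joiners are appended, the view counter goes up by one) is an assumption that happens to reproduce the log above when view 2 loses host02/tcp.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a view: creator, view counter, ordered members. Not org.jgroups.View. */
final class SimpleView {
    final String creator;
    final long id;
    final List<String> members;

    SimpleView(String creator, long id, List<String> members) {
        this.creator = creator;
        this.id = id;
        this.members = Collections.unmodifiableList(new ArrayList<>(members));
    }

    /** Mimics the "[creator|id] (size) [members]" format of the GMS log lines. */
    @Override
    public String toString() {
        return "[" + creator + "|" + id + "] (" + members.size() + ") " + members;
    }
}

final class ViewSketch {
    private ViewSketch() {}

    /**
     * Builds the next view on the coordinator (assumed rule): surviving members
     * keep their order, suspected and leaving members are dropped, joiners are
     * appended, and the view counter is incremented.
     */
    static SimpleView nextView(String coordinator, SimpleView current, List<String> joiners,
                               List<String> suspected, List<String> leaving) {
        Set<String> members = new LinkedHashSet<>(current.members);
        members.removeAll(suspected);
        members.removeAll(leaving);
        members.addAll(joiners);
        return new SimpleView(coordinator, current.id + 1, new ArrayList<>(members));
    }

    public static void main(String[] args) {
        SimpleView view2 = new SimpleView("host01/tcp", 2,
                List.of("host01/tcp", "host02/tcp", "host03/tcp"));
        SimpleView view3 = nextView("host01/tcp", view2,
                List.of(), List.of("host02/tcp"), List.of());
        System.out.println(view3); // prints [host01/tcp|3] (2) [host01/tcp, host03/tcp]
    }
}
```

Using a LinkedHashSet keeps the surviving members in their original order while making removals and duplicate joins harmless.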
This is how the VIEW message looks at the transport level:
2016-02-29 09:25:24,125 [TRACE] [org.jgroups.protocols.TCP] (ViewHandler,TEST-CLUSTER,host01/tcp) -- null: sending msg to null, src=host01/tcp, headers are GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=434687], TCP: [channel_name=TEST-CLUSTER]
The new view is installed at the GMS level on every member, including the coordinator that originated it, when the multicast message propagates up from the transport to the GMS protocol. The DEBUG message shown below should be visible in all members' logs:
2016-02-29 09:25:24,157 [TRACE] [org.jgroups.protocols.pbcast.GMS] (Cluster Dispatch) -- host01/tcp: received delta view [host01/tcp|3], ref-view=[host01/tcp|2], left=[host02/tcp]
2016-02-29 09:25:24,157 [DEBUG] [org.jgroups.protocols.pbcast.GMS] (Cluster Dispatch) -- host01/tcp: installing view [host01/tcp|3] (2) [host01/tcp, host03/tcp]
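The "received delta view ... ref-view=... left=..." line suggests that the VIEW message carried only the difference relative to a reference view, and that each receiver rebuilt the full membership from the view it already had installed. A minimal sketch of that reconstruction, assuming a hypothetical Vid/Delta representation (JGroups' own delta-view handling is not reproduced here) and assuming that a member whose installed view is not the reference view cannot apply the delta:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

/** Hypothetical view id, printed like the logs: [creator|counter]. */
final class Vid {
    final String creator;
    final long counter;

    Vid(String creator, long counter) {
        this.creator = creator;
        this.counter = counter;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Vid)) return false;
        Vid other = (Vid) o;
        return creator.equals(other.creator) && counter == other.counter;
    }

    @Override
    public int hashCode() {
        return Objects.hash(creator, counter);
    }

    @Override
    public String toString() {
        return "[" + creator + "|" + counter + "]";
    }
}

/** Hypothetical delta view: the new id, the reference view it is relative to, and the changes. */
final class Delta {
    final Vid viewId;
    final Vid refViewId;
    final List<String> left;
    final List<String> joined;

    Delta(Vid viewId, Vid refViewId, List<String> left, List<String> joined) {
        this.viewId = viewId;
        this.refViewId = refViewId;
        this.left = left;
        this.joined = joined;
    }
}

final class DeltaViewSketch {
    private DeltaViewSketch() {}

    /**
     * Rebuilds the member list of the new view from the locally installed view.
     * Returns empty when the installed view is not the delta's reference view,
     * because left/joined are only meaningful relative to that view.
     */
    static Optional<List<String>> apply(Vid installedId, List<String> installedMembers, Delta delta) {
        if (!installedId.equals(delta.refViewId)) {
            return Optional.empty();
        }
        Set<String> members = new LinkedHashSet<>(installedMembers);
        members.removeAll(delta.left);
        members.addAll(delta.joined);
        List<String> result = new ArrayList<>(members);
        return Optional.of(result);
    }
}
```

Applied to the log above, view [host01/tcp|2] with left=[host02/tcp] yields [host01/tcp, host03/tcp], the membership of [host01/tcp|3].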
Configuration
<pbcast.GMS print_local_addr="true" join_timeout="3000" shun="true" view_bundling="true" view_ack_collection_timeout="5000" resume_task_timeout="7500"/>
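The view_bundling="true" attribute lets GMS handle JOIN, LEAVE and SUSPECT requests that arrive within a short period as one membership change; the joiners/suspected/leaving lists in the TRACE line earlier on this page have exactly that shape. A minimal sketch of the bundling idea, assuming a hypothetical request type and a fixed bundling window (MembershipRequest, MembershipBundle and BundlingViewQueue are illustrative names, and the window length is an assumption, not a GMS default):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical membership request; GMS has its own internal request type. */
final class MembershipRequest {
    enum Type { JOIN, LEAVE, SUSPECT }

    final Type type;
    final String member;

    MembershipRequest(Type type, String member) {
        this.type = type;
        this.member = member;
    }
}

/** One bundled change, shaped like the joiners/suspected/leaving lists in the GMS TRACE output. */
final class MembershipBundle {
    final List<String> joiners = new ArrayList<>();
    final List<String> suspected = new ArrayList<>();
    final List<String> leaving = new ArrayList<>();

    @Override
    public String toString() {
        return "joiners=" + joiners + ", suspected=" + suspected + ", leaving=" + leaving;
    }
}

/** Hypothetical view-handler queue that bundles requests arriving within a time window. */
final class BundlingViewQueue {
    private final BlockingQueue<MembershipRequest> queue = new LinkedBlockingQueue<>();
    private final long windowMillis;

    BundlingViewQueue(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    void submit(MembershipRequest request) {
        queue.add(request);
    }

    /**
     * Blocks until a request arrives, then keeps collecting requests until the
     * window closes, so that all of them produce a single view change.
     */
    MembershipBundle nextBundle() {
        MembershipBundle bundle = new MembershipBundle();
        try {
            MembershipRequest req = queue.take(); // wait for the first request
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(windowMillis);
            while (req != null) {
                switch (req.type) {
                    case JOIN: bundle.joiners.add(req.member); break;
                    case LEAVE: bundle.leaving.add(req.member); break;
                    case SUSPECT: bundle.suspected.add(req.member); break;
                }
                long remaining = deadline - System.nanoTime();
                // poll returns null if the window elapses with no new request; stop once it has closed
                req = remaining > 0 ? queue.poll(remaining, TimeUnit.NANOSECONDS) : null;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        }
        return bundle;
    }
}
```

The trade-off is latency for fewer view changes: every bundled request waits up to one window, but a burst of joins or suspicions costs one view instead of many.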
!!!Implementation Details - What Happens on the Coordinator When It Receives a Join Request
Traced on JGroups 2.6.6.
The GMS protocol receives a JOIN_REQ event, sent by the member that wants to join the group. On receipt, the GMS protocol submits the event to the GMS' ViewHandler instance, where it is queued to be processed on a separate thread ("ViewHandler,<group-name>,127.0.0.1:50324").
The ViewHandler thread pushes the event (now a "Request") to the CoordGmsImpl instance.
During the processing, a new View instance is created, and it is immediately pushed up and down the protocol stack attached to a PREPARE_VIEW event.
Then GMS starts a flush, using the new view, but only if FLUSH is present in the protocol stack.
Then it sends a SUSPEND_STABLE event down the stack. The event propagates down and suspends the STABLE protocol, with a timeout of 30 seconds.
Then it sends a GET_DIGEST_EVT down the stack and waits for a GET_DIGEST_OK response or a timeout, whichever occurs first. The GET_DIGEST_EVT is handled by the NAKACK protocol.
Then a JoinRsp instance is created.
Then the GMS protocol creates a Message with a GmsHeader.VIEW header containing the new view and the digest, to be multicast to all members of the group except itself. Acknowledgments of the view change are expected from all members, and a dedicated mechanism (the AckCollector) collects them.
Then it sends a local TMP_VIEW event up and down the stack. Certain layers (NAKACK) need it to compute the correct digest in case the client's next request (e.g. getState()) reaches the coordinator before the coordinator's own view change multicast does. Check NAKACK's TMP_VIEW handling for details.
Then it sends the Message with the GmsHeader.VIEW header down the stack, and the ViewHandler thread blocks on the AckCollector instance until all view change acknowledgments have been received or a 2-second timeout expires.
Why does it wait on ack_collector twice?
Then it sends a RESUME_STABLE down the stack.
[...]
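The traced steps above, condensed into an ordered, checkable outline. JoinStep and CoordinatorJoinOutline are hypothetical names; the outline only records the order seen on 2.6.6, the steps hidden behind "[...]" are not modeled, and nothing here touches a real protocol stack.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical step names; each comment says which event or object the step stands for. */
enum JoinStep {
    PREPARE_VIEW,    // new View pushed up and down the stack with a PREPARE_VIEW event
    START_FLUSH,     // flush with the new view, only when FLUSH is in the stack
    SUSPEND_STABLE,  // SUSPEND_STABLE down the stack (30 s timeout in the trace)
    GET_DIGEST,      // GET_DIGEST_EVT down, wait for GET_DIGEST_OK or a timeout
    CREATE_JOIN_RSP, // JoinRsp instance created
    TMP_VIEW,        // local TMP_VIEW up and down, so NAKACK computes the right digest
    MCAST_VIEW,      // Message with GmsHeader.VIEW sent down the stack
    WAIT_FOR_ACKS,   // ViewHandler blocks on the AckCollector until all acks or timeout
    RESUME_STABLE    // RESUME_STABLE down the stack
}

final class CoordinatorJoinOutline {
    private CoordinatorJoinOutline() {}

    /** Returns the traced order of steps; START_FLUSH only appears if FLUSH is configured. */
    static List<JoinStep> stepsFor(boolean flushInStack) {
        List<JoinStep> steps = new ArrayList<>();
        for (JoinStep step : JoinStep.values()) {
            if (step == JoinStep.START_FLUSH && !flushInStack) {
                continue; // GMS skips the flush when FLUSH is absent
            }
            steps.add(step);
        }
        return steps;
    }
}
```

Note that TMP_VIEW precedes the VIEW multicast, which is the point of the NAKACK remark: the coordinator's own digest must already reflect the new member before the multicast goes out.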
If the join process takes a long time, messages sent down the coordinator's stack during the join also take a long time to be delivered to the new member. See the "How to free the stuck issue on the view change?" thread in NetBase.
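The wait on the AckCollector described in the join steps above follows a common pattern: track which members still owe an acknowledgment and let the sending thread block until that set is empty or a timeout expires. A minimal sketch of the pattern, assuming member addresses are plain strings; SimpleAckCollector is a hypothetical name and does not reproduce JGroups' own AckCollector.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/** Hypothetical ack collector for one view change at a time. */
final class SimpleAckCollector {
    private final Set<String> missing = new HashSet<>();

    /** Starts a new round: every member in expectedFrom owes an ack. */
    synchronized void reset(Collection<String> expectedFrom) {
        missing.clear();
        missing.addAll(expectedFrom);
    }

    /** Records an ack (typically called on another thread) and wakes the waiter when complete. */
    synchronized void ack(String member) {
        if (missing.remove(member) && missing.isEmpty()) {
            notifyAll();
        }
    }

    /** Blocks until all acks have arrived or the timeout elapses; true if all arrived. */
    synchronized boolean waitForAllAcks(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (!missing.isEmpty()) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // timed out with acks still missing
                }
                wait(remainingMillis); // releases the lock so ack() can run
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return missing.isEmpty();
        }
    }

    /** Members that have not acknowledged yet. */
    synchronized Set<String> missingAcks() {
        return new HashSet<>(missing);
    }
}
```

A caller would reset() with the new view's members, multicast the VIEW, then call waitForAllAcks(2000) to mirror the 2-second timeout in the trace; on timeout, missingAcks() shows which members never answered.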
!!!Member not allowed to become initial coordinator
Older JGroups versions allowed members to be configured so that they would NOT become coordinators on initial startup. This was done by setting the "disable_initial_coord" GMS configuration attribute to "true".
If "disable_initial_coord" was set to "true", that member's GMS protocol instance would simply keep looping when it received a null initial membership (meaning there were no other members in the group), waiting for a non-null initial membership. The constraint did not extend beyond startup, though: any member configured with disable_initial_coord=true could still become coordinator later, for example after the previous coordinator left the group.
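The startup loop described above can be sketched as a retry loop that refuses to proceed while discovery finds nobody. Everything here is hypothetical: discover stands in for GMS' discovery of the initial membership, and maxAttempts/retryDelayMillis are illustrative knobs (per the description above, the old GMS simply kept looping). Like the original option, the sketch only covers startup and says nothing about later coordinator changes.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical model of the old disable_initial_coord startup behavior. */
final class InitialMembershipWait {
    private InitialMembershipWait() {}

    /**
     * @param discover         hypothetical discovery call; null or empty means "no other members"
     * @param maxAttempts      illustrative bound on retries
     * @param retryDelayMillis pause between attempts
     * @return the first non-empty initial membership, or null if none was found
     */
    static List<String> awaitInitialMembers(Supplier<List<String>> discover,
                                            int maxAttempts, long retryDelayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<String> members = discover.get();
            if (members != null && !members.isEmpty()) {
                return members; // another member exists: join it instead of becoming coordinator
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(retryDelayMillis); // nobody found yet: wait and try again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null; // still alone after maxAttempts
    }
}
```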
Regardless, the configuration parameter has been deprecated since 2.4; it is ignored in 2.6.6 and later, and it was removed entirely in 3.0.
The motivation behind this decision was that "this forces JGroups into a client-server mode, and that's not right as JGroups is peer-to-peer".
So it is simply not possible to configure this behavior in 2.6.6 or any later release.
However, the behavior can be implemented at the application level (see [1] for an idea).