JGroups Protocol GMS

From NovaOrdis Knowledge Base

Latest revision as of 03:01, 9 August 2016

External

Internal

Overview

Group Membership Service (pbcast.GMS) is the protocol responsible for members joining and leaving the group. It also handles suspected members and excludes them from the membership, and it sends a new view to all members whenever the membership changes.

Current Group Membership

Find the latest group membership by looking in logs for:

grep "installing view" server.log

result:

2016-03-04 22:00:46,631 DEBUG [org.jgroups.protocols.pbcast.GMS] (Incoming-8,shared=tcp) jgh2/web: installing view [jgh1/web|2] (3) [jgh1/web, jgh2/web, jgh3/web]

This requires the org.jgroups.protocols.pbcast.GMS logging category to be set to DEBUG.
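
The lookup above can also be scripted. The following is a minimal sketch, assuming the log line layout shown in the sample above; the function name latest_view and the regular expression are illustrative and not part of JGroups, and other JGroups versions may format the line differently.

```python
import re

# Matches the tail of a GMS log line such as:
#   "... installing view [jgh1/web|2] (3) [jgh1/web, jgh2/web, jgh3/web]"
# The layout is an assumption taken from the sample log line.
VIEW_RE = re.compile(
    r"installing view \[([^|\]]+)\|(\d+)\] \(\d+\) \[([^\]]*)\]"
)

def latest_view(lines):
    """Return (coordinator, view_id, members) for the last matching line, or None."""
    result = None
    for line in lines:
        match = VIEW_RE.search(line)
        if match:
            members = [m.strip() for m in match.group(3).split(",") if m.strip()]
            result = (match.group(1), int(match.group(2)), members)
    return result
```

Passing an open file, for example latest_view(open("server.log")), scans the log line by line and keeps only the most recent view.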

Suspected Member Exclusion

If a SUSPECT message survives VERIFY_SUSPECT verification, it is passed up the stack and reaches the GMS layer, which computes and multicasts a new view. The new view does not contain the suspected member:

2016-02-29 09:25:24,110 [TRACE] [org.jgroups.protocols.pbcast.GMS] (ViewHandler,TEST-CLUSTER,host01/tcp) -- host01/tcp: joiners=[], suspected=[host02/tcp], leaving=[], new view: [host01/tcp|3] (2) [host01/tcp, host03/tcp]
2016-02-29 09:25:24,110 [TRACE] [org.jgroups.protocols.pbcast.GMS] (ViewHandler,TEST-CLUSTER,host01/tcp) -- host01/tcp: mcasting view [host01/tcp|3] (2) [host01/tcp, host03/tcp] (2 mbrs)

The log sequence above shows how "host02" is removed from the [host01, host02, host03] view.
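
The membership computation reported in the first TRACE line can be pictured as follows. This is a simplified sketch and not the JGroups implementation: the function name next_view and the tuple representation of a view are assumptions made for illustration. It relies on the first member of a view acting as coordinator and on the view id increasing by one, as in the sample views.

```python
def next_view(view_id, members, joiners=(), suspected=(), leaving=()):
    """Return (coordinator, new_view_id, new_members) after a membership change.

    Hypothetical helper: drops suspected and leaving members, appends joiners,
    and keeps the relative order of the surviving members.
    """
    gone = set(suspected) | set(leaving)
    new_members = [m for m in members if m not in gone]
    new_members += [j for j in joiners if j not in new_members]
    # The first member of the view acts as coordinator.
    coordinator = new_members[0] if new_members else None
    return coordinator, view_id + 1, new_members
```

With the previous view [host01/tcp|2] and suspected=[host02/tcp], the sketch yields host01/tcp as coordinator, view id 3, and the members host01/tcp and host03/tcp, matching the log.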

This is how the VIEW message looks at the transport level:

2016-02-29 09:25:24,125 [TRACE] [org.jgroups.protocols.TCP] (ViewHandler,TEST-CLUSTER,host01/tcp) -- null: sending msg to null, src=host01/tcp, headers are GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=434687], TCP: [channel_name=TEST-CLUSTER]

The new view is installed at the GMS level, including on the coordinator that originated it, when the multicast message propagates up from the transport to the GMS protocol. The DEBUG "installing view" message shown below should be visible in all members' logs:

2016-02-29 09:25:24,157 [TRACE] [org.jgroups.protocols.pbcast.GMS] (Cluster Dispatch) -- host01/tcp: received delta view [host01/tcp|3], ref-view=[host01/tcp|2], left=[host02/tcp]
2016-02-29 09:25:24,157 [DEBUG] [org.jgroups.protocols.pbcast.GMS] (Cluster Dispatch) -- host01/tcp: installing view [host01/tcp|3] (2) [host01/tcp, host03/tcp]
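
The TRACE line above shows that the view arrives as a delta against a reference view rather than as a full member list. A minimal sketch of how a receiver could rebuild the full view from such a delta is given below. The function name apply_delta, the dictionary keys, and the decision to raise an error when the reference view does not match are all assumptions made for illustration; they do not describe the actual JGroups code.

```python
def apply_delta(current, delta):
    """Rebuild a full view from a delta.

    current: (view_id, members) of the view installed locally.
    delta:   dict with keys "view_id", "ref_view_id", and optionally
             "left" and "joined" (hypothetical representation).
    """
    current_id, current_members = current
    if delta["ref_view_id"] != current_id:
        # Assumption: a delta is only usable against the view it references.
        raise ValueError("delta references view %s, but view %s is installed"
                         % (delta["ref_view_id"], current_id))
    left = set(delta.get("left", ()))
    members = [m for m in current_members if m not in left]
    members += [m for m in delta.get("joined", ()) if m not in members]
    return delta["view_id"], members
```

Applied to the reference view 2 with members host01/tcp, host02/tcp and host03/tcp, and a delta with left=[host02/tcp], this produces view 3 with host01/tcp and host03/tcp, which is the view installed in the DEBUG line.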

Configuration

<pbcast.GMS print_local_addr="true"
            join_timeout="3000"
            shun="true"
            view_bundling="true"
            view_ack_collection_timeout="5000"
            resume_task_timeout="7500"/>

Implementation Details

JGroups GMS - What Happens on Coordinator when it Receives a Join Request

Member not allowed to become initial coordinator

Old JGroups versions used to allow members to be configured so they do not become coordinators on initial startup. This was achieved by setting the "disable_initial_coord" GMS configuration attribute to "true".

If "disable_initial_coord" was set to "true", that member's GMS protocol instance would simply continue to loop upon receiving a null initial membership (meaning there were no other members in the group) and wait for a non-null initial membership. The constraint did not extend beyond startup, though: any member configured with disable_initial_coord=true could still become coordinator later, for example when the previous coordinator left the group.
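
The historical startup behavior described above can be sketched as a small decision loop. This is an illustration only, not JGroups code: the function name startup_role, the discover callable, and the max_attempts bound are hypothetical. The original loop waited for as long as the initial membership stayed empty; the bound is added here only so that the sketch terminates.

```python
def startup_role(discover, disable_initial_coord, max_attempts=10):
    """Decide what a starting member does, based on the initial membership.

    discover: callable returning the list of members found (empty if none).
    Returns "join", "coordinator", or None if the bounded loop gave up.
    """
    for _ in range(max_attempts):
        members = discover()
        if members:
            # Other members exist: join the existing group.
            return "join"
        if not disable_initial_coord:
            # Nobody else is there and the member is allowed to become coordinator.
            return "coordinator"
        # disable_initial_coord=true: keep looping until someone else shows up.
    return None
```

A member with disable_initial_coord=false and an empty initial membership becomes coordinator on the first pass, while a member with disable_initial_coord=true keeps retrying discovery until another member is found.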

Regardless, the configuration parameter has been deprecated since 2.4, is ignored in 2.6.6 and later, and was dropped completely in 3.0. The motivation behind this decision was that "this forces JGroups into a client-server mode, and that's not right as JGroups is peer-to-peer". It is therefore not possible to configure this behavior in 2.6.6 or any subsequent release. The behavior can be implemented at the application level (see http://sourceforge.net/p/javagroups/discussion/130427/thread/fd115532 for ideas on how to do that).

Additional details: https://issues.jboss.org/browse/JGRP-459