JGroups GMS - What Happens on Coordinator when it Receives a Join Request

From NovaOrdis Knowledge Base

Internal

Relevance

  • JGroups 2.6.6

Process

The GMS protocol receives a JOIN_REQ event, sent by the member that wants to join the group. Upon receipt, the GMS protocol submits the event to its ViewHandler instance, where it is queued for processing on a separate thread (named, for example, "ViewHandler,<group-name>,127.0.0.1:50324").

The ViewHandler thread pushes the event (now a "Request") to the CoordGmsImpl instance.
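
The hand-off described above (queue the request on the receiving thread, process it on a dedicated ViewHandler thread) can be illustrated with a minimal, self-contained sketch. This is not the JGroups implementation: the class ViewHandlerSketch, its nested Request type and the run() helper are hypothetical names introduced only for illustration, and the thread name omits the address part shown in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the GMS ViewHandler: a FIFO queue drained by one named thread.
final class ViewHandlerSketch {

    // Hypothetical stand-in for a queued join request; not the JGroups Request class.
    static final class Request {
        final String joiner;
        Request(String joiner) { this.joiner = joiner; }
    }

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    ViewHandlerSketch(String groupName, Consumer<Request> coordinator) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    // Blocks until a request is queued, then hands it to the coordinator logic.
                    coordinator.accept(queue.take());
                }
            } catch (InterruptedException e) {
                // stop() interrupts the worker; exit quietly.
            }
        }, "ViewHandler," + groupName);
        worker.setDaemon(true);
        worker.start();
    }

    // Called on the thread that received JOIN_REQ; returns immediately.
    void add(Request req) { queue.add(req); }

    void stop() { worker.interrupt(); }

    // Queues two join requests and returns "<thread name>:<joiner>" in processing order.
    static List<String> run() {
        List<String> processed = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(2);
        ViewHandlerSketch vh = new ViewHandlerSketch("demo-group", req -> {
            synchronized (processed) {
                processed.add(Thread.currentThread().getName() + ":" + req.joiner);
            }
            done.countDown();
        });
        vh.add(new Request("A"));
        vh.add(new Request("B"));
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        vh.stop();
        synchronized (processed) {
            return new ArrayList<>(processed);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because a single thread drains the queue, requests are processed one at a time in arrival order, and the thread that delivered the JOIN_REQ is never blocked by the view change itself.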

During processing, a new View instance is created and immediately passed up and down the protocol stack, attached to a PREPARE_VIEW event.

Then GMS starts a flush, using the new view, but only if FLUSH is present in the protocol stack.

Then it sends a SUSPEND_STABLE event down the stack. The event propagates down and suspends the STABLE protocol, with a timeout of 30 seconds.

Then it sends a GET_DIGEST_EVT down the stack and waits for a GET_DIGEST_OK response or a timeout, whichever occurs first. The GET_DIGEST_EVT is handled by the NAKACK protocol.
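
The "response or timeout, whichever occurs first" pattern can be sketched with the standard library alone. The example below is an assumption-laden illustration, not JGroups code: DigestWaitSketch and awaitDigest are hypothetical names, the digest is modeled as a plain String, and a CompletableFuture stands in for whatever mechanism GMS actually uses to receive GET_DIGEST_OK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: wait for a digest reply, giving up after a timeout.
final class DigestWaitSketch {

    // Returns the digest if it arrives within timeoutMs, otherwise null.
    static String awaitDigest(CompletableFuture<String> digestOk, long timeoutMs) {
        try {
            return digestOk.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // the timeout occurred first
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            return null; // the reply completed with an error
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        // Simulates the lower protocol answering from another thread.
        new Thread(() -> reply.complete("digest-for-new-view")).start();
        System.out.println(awaitDigest(reply, 2000));
    }
}
```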

Then a JoinRsp instance is created.

Then the GMS protocol creates a Message with a GmsHeader.VIEW header containing the new view and the digest, to be broadcast to all members of the group except itself. Acknowledgments of the view change are expected from all members, and a dedicated mechanism (the AckCollector) collects them.

Then it sends a local TMP_VIEW up and down the stack. Certain layers (NAKACK) need this to compute the correct digest in case the client's next request (e.g. getState()) reaches the coordinator before the coordinator's own view change multicast does. Check NAKACK's TMP_VIEW handling for details.

Then it sends the Message with the GmsHeader.VIEW header down the stack. The ViewHandler thread blocks on the AckCollector instance until all view change acknowledgments are received or a timeout of 2 seconds elapses.
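
A minimal sketch of the ack-collection idea, assuming members are identified by plain strings: the collector starts with the set of members expected to acknowledge, removes each member as its ack arrives, and lets the waiting thread block until the set is empty or the timeout elapses. AckCollectorSketch and its methods are hypothetical and written for illustration; they are not the JGroups AckCollector source.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an ack collector; member addresses are modeled as strings.
final class AckCollectorSketch {
    private final Set<String> missing = new HashSet<>();

    AckCollectorSketch(Collection<String> expected) {
        missing.addAll(expected);
    }

    // Records an ack; unknown or duplicate acks are ignored.
    synchronized void ack(String member) {
        if (missing.remove(member) && missing.isEmpty()) {
            notifyAll(); // wake the thread blocked in waitForAllAcks
        }
    }

    // Blocks until every expected ack has arrived (true) or timeoutMs has elapsed (false).
    synchronized boolean waitForAllAcks(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!missing.isEmpty()) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // timed out with acks still missing
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    // Snapshot of the members that have not acknowledged yet.
    synchronized Set<String> missingAcks() {
        return new HashSet<>(missing);
    }

    public static void main(String[] args) {
        AckCollectorSketch collector = new AckCollectorSketch(Arrays.asList("B", "C"));
        // Simulates acks arriving from other members on another thread.
        new Thread(() -> { collector.ack("B"); collector.ack("C"); }).start();
        System.out.println("all acks received: " + collector.waitForAllAcks(2000));
    }
}
```

The loop recomputes the remaining time after every wake-up, so a spurious wake-up or a partial set of acks does not extend the wait beyond the original timeout.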

Why does it wait on ack_collector twice?

Then it sends a RESUME_STABLE down the stack.
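
As a compact recap, the coordinator-side sequence described in this section can be written down as an ordered list. The sketch below only restates the steps from the text; CoordinatorJoinStepsSketch and steps() are hypothetical names, and the strings are descriptive labels, not JGroups API calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recap of the join handling order described in this page.
final class CoordinatorJoinStepsSketch {

    // Returns the ordered steps; the flush step is included only if FLUSH is in the stack.
    static List<String> steps(boolean flushPresent) {
        List<String> s = new ArrayList<>();
        s.add("PREPARE_VIEW up and down");
        if (flushPresent) {
            s.add("start flush");
        }
        s.add("SUSPEND_STABLE down");
        s.add("GET_DIGEST_EVT down, wait for GET_DIGEST_OK or timeout");
        s.add("create JoinRsp");
        s.add("build Message with GmsHeader.VIEW");
        s.add("TMP_VIEW up and down");
        s.add("send VIEW Message down, wait for acks or timeout");
        s.add("RESUME_STABLE down");
        return s;
    }

    public static void main(String[] args) {
        for (String step : steps(true)) {
            System.out.println(step);
        }
    }
}
```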

If the join process takes a long time, the messages sent down the coordinator's stack during the join process take a long time to be delivered to the new member.