JGroups Protocol FD ALL
Revision as of 03:07, 3 March 2016
External
- User Manual FD_ALL http://www.jgroups.org/manual/html/protlist.html#FD_ALL
Internal
Overview
FD_ALL is a multicast based failure detection protocol.
Failure Detection
Also See
Configuration
JGroups standalone
<FD_ALL interval="1000" timeout="3000"/>
interval
The period (in ms) at which a member sends a HEARTBEAT message to the cluster AND at which the response timestamps are checked. In the example above, each member sends a heartbeat and checks the response timestamps for previous heartbeats every second. If at any moment the difference between a specific HEARTBEAT event timestamp and the response timestamp from a member is larger than 'timeout', that member is suspected.
So, for the values defined above, if heartbeat H1 doesn't receive a response, the remote member will be suspected after timeout + interval = 3 * interval + interval (~4,000 ms), because the check at 3,000 ms finds a difference that is not yet larger than 'timeout'. If the timeout is set to 2,500 ms, the member will be suspected after 3,000 ms, at the first check that falls past the timeout.
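The arithmetic above can be sketched in a few lines of Java. This is only a simplified model of the check schedule described in this section, not JGroups code: it assumes checks run at exact multiples of 'interval' counted from the last heartbeat received, and that a member is suspected at the first check where the elapsed time is strictly larger than 'timeout'. The class and method names (FdAllTiming, suspicionDelayMs) are hypothetical.

```java
// Simplified model of the FD_ALL check schedule described above.
// FdAllTiming and suspicionDelayMs are illustrative names, not JGroups API.
final class FdAllTiming {

    /**
     * Returns the time (in ms, counted from the last heartbeat received) of the
     * first periodic check at which the elapsed time is strictly larger than
     * the timeout. Assumes checks run at interval, 2 * interval, and so on.
     */
    static long suspicionDelayMs(long intervalMs, long timeoutMs) {
        if (intervalMs <= 0 || timeoutMs < 0) {
            throw new IllegalArgumentException("interval must be > 0 and timeout >= 0");
        }
        // Integer division finds the last check at or before the timeout;
        // the next check after it is the first one strictly past the timeout.
        return (timeoutMs / intervalMs + 1) * intervalMs;
    }

    public static void main(String[] args) {
        System.out.println(suspicionDelayMs(1000, 3000)); // 4000
        System.out.println(suspicionDelayMs(1000, 2500)); // 3000
    }
}
```

In a real cluster the observed delay will vary with scheduling jitter and network latency, so treat these values as approximate.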
timeout
Timeout (in ms) after which a node is suspected by the current node if neither a heartbeat nor data was received from it. Also see the 'interval' definition above.
ergonomics
Enables ergonomics: dynamically finds the best values for properties at runtime.
level
Sets the logger level (see javadocs)
msg_counts_as_heartbeat
Treat messages received from members as heartbeats. Note that this means a value in a hashmap is updated every time a message passes up the stack through FD_ALL, which is costly. Default is false.
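To make the trade-off concrete, here is a minimal sketch of a per-member timestamp table. It is not the JGroups implementation; HeartbeatTable and its methods are hypothetical names, and members are identified by plain strings for simplicity. It only illustrates that, with the flag enabled, every regular message triggers a map update, and in exchange a member that keeps sending data is not suspected even if its heartbeats are lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only; HeartbeatTable is not a JGroups class.
final class HeartbeatTable {
    // Last time (ms) each member was heard from.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final boolean msgCountsAsHeartbeat;

    HeartbeatTable(boolean msgCountsAsHeartbeat) {
        this.msgCountsAsHeartbeat = msgCountsAsHeartbeat;
    }

    // A HEARTBEAT always refreshes the member's timestamp.
    void onHeartbeat(String member, long nowMs) {
        lastSeen.put(member, nowMs);
    }

    // A regular message refreshes the timestamp only when the flag is set;
    // this is the extra map update paid on every message passing up the stack.
    void onMessage(String member, long nowMs) {
        if (msgCountsAsHeartbeat) {
            lastSeen.put(member, nowMs);
        }
    }

    // Members whose last timestamp is older than the timeout are suspected.
    List<String> suspects(long nowMs, long timeoutMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

For example, with a 3,000 ms timeout, a member that sent a heartbeat at 0 ms and a data message at 3,000 ms is suspected at the 4,000 ms check when the flag is off, but not when it is on.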
stats
Determines whether to collect statistics (and expose them via JMX). Default is true.
Recommendations
These result from personal experimentation.
Both the FD_SOCK and FD failure detection protocols, which rely on directly pinging a neighbor, have proved unreliable under certain platform-specific circumstances. If you plan to use them, test response times with your specific JVM and platform. They may work very well on certain platforms and fail on others. I would use FD_ALL as the central failure detection protocol; its coverage seems to be the most generic.