WildFly HornetQ-Based Messaging Subsystem - Network Replication Implementation Details

Implementation

Replication is NOT synchronous all the way to the replicated persistent storage on the backup node.

The following race condition is possible: a message M is being replicated and is in transit on the network between the active and the backup node; M is written to the active node's local storage and successfully acknowledged to the client; then the active node crashes before M is fully transferred to the backup and persisted there. The client received confirmation that the message was persisted (send() returned), but after fail-over M is nowhere to be found in the backup node's journal. Technically, M is not "lost": it is in the active node's journal, which is temporarily inaccessible. I am not sure how probable this scenario is, but it is theoretically possible.
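For illustration, this is what the client side of the scenario can look like with the HornetQ core API. This is a minimal sketch; the queue name, the connector configuration and the blocking-send setting are assumptions made for the example, not details taken from the scenario above.

import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.core.client.ClientMessage;
import org.hornetq.api.core.client.ClientProducer;
import org.hornetq.api.core.client.ClientSession;
import org.hornetq.api.core.client.ClientSessionFactory;
import org.hornetq.api.core.client.HornetQClient;
import org.hornetq.api.core.client.ServerLocator;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;

public class BlockingSendExample
{
   public static void main(String[] args) throws Exception
   {
      ServerLocator locator = HornetQClient.createServerLocatorWithHA(
         new TransportConfiguration(NettyConnectorFactory.class.getName()));

      // make send() block until the active node acknowledges the durable message
      locator.setBlockOnDurableSend(true);

      ClientSessionFactory factory = locator.createSessionFactory();
      ClientSession session = factory.createSession();
      ClientProducer producer = session.createProducer("jms.queue.exampleQueue");

      ClientMessage message = session.createMessage(true); // durable
      message.getBodyBuffer().writeString("M");

      // When send() returns, M is in the active node's journal. Per the race
      // condition described above, the replication packet carrying M may still
      // be in transit to the backup at this point.
      producer.send(message);

      session.close();
      factory.close();
      locator.close();
   }
}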

In the course of sending a persistent message, the message (actually a ServerMessageImpl) is handed over to the active node's JournalStorageManager instance as the argument of the storeMessage(ServerMessage) method. See JournalStorageManager.java line 898. storeMessage() invokes appendAddRecord() on the messageJournal instance, line 920. For a HornetQ server configured with HA and message replication, the messageJournal is a ReplicatedJournal. See ReplicatedJournal.java line 108:

public void appendAddRecord(final long id,
                            final byte recordType,
                            final EncodingSupport record,
                            final boolean sync,
                            final IOCompletion completionCallback) throws Exception
{
   if (ReplicatedJournal.trace)
   {
      ReplicatedJournal.trace("Append record id = " + id + " recordType = " + recordType);
   }
   // (1) initiate replication towards the backup; note that no completion
   // callback is passed to this call
   replicationManager.appendUpdateRecord(journalID, ADD_OPERATION_TYPE.ADD, id, recordType, record);
   // (2) append to the local journal; the completion callback is attached here
   localJournal.appendAddRecord(id, recordType, record, sync, completionCallback);
}

The thread first invokes the replication manager, then writes to the local journal; note that the completionCallback is passed only to the local journal append, while the replication call gets no callback at all.
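For completeness, this is approximately what the upstream JournalStorageManager.storeMessage() (line 898) does before control reaches the code above. This is a simplified paraphrase, not the verbatim source: large-message handling, locking and error checking are omitted.

public void storeMessage(final ServerMessage message) throws Exception
{
   // the message record itself is appended with sync = false; syncing, where
   // applicable, is deferred to the add-reference record that is appended
   // immediately afterwards
   messageJournal.appendAddRecord(message.getMessageID(), ADD_MESSAGE, message,
                                  false, getContext(false));
}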

The ReplicationManager.appendUpdateRecord() implementation (ReplicationManager.java line 139) replicates the record with ReplicationManager.sendReplicatePacket(), line 360, which sends the packet on its replicationChannel with send(). The documentation says that send() returns true if the send was successful, but it does not specify more than that. "Successful" could simply mean "in transit", as long as there is no explicit, synchronous confirmation that the data reached the other end.
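To make the ambiguity concrete, here is a hypothetical sketch of such a fire-and-forget send() contract. This is NOT the actual ChannelImpl source; the field names (closed, connection, transportConnection) are assumptions, chosen to be consistent with the behavior described above.

public boolean send(final Packet packet)
{
   if (closed)
   {
      return false; // the only failure this contract can report
   }

   HornetQBuffer buffer = packet.encode(connection);

   // hand the encoded packet to the transport; flush = false, batched = false
   transportConnection.write(buffer, false, false);

   // "successful" here means "handed over to the transport", nothing more
   return true;
}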

Internally, send() writes to a transport connection, which is a Netty connection, but I do not see an acknowledgment of any kind. See ChannelImpl.java line 278. The message data ends up being sent by NettyConnection.java on a Netty channel with channel.write() (http://netty.io/4.0/api/io/netty/channel/Channel.html#write(java.lang.Object)) at NettyConnection.java line 233, with the "flush" flag set to false and the returned ChannelFuture ignored. This means replication is merely initiated: to my knowledge, the initiator does not wait for confirmation that the data reached the other end, let alone that it was persisted there.
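For comparison, this is how a caller that wanted delivery feedback would have to use the Netty 4 API. This is an illustrative sketch, not HornetQ code, and even here a successful ChannelFuture only means the bytes were handed to the local socket, not that the peer received them, let alone persisted them.

import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;

public class FlushAndListenExample
{
   // channel.write() only queues the buffer in the outbound pipeline; without
   // a flush it may not even reach the socket
   public static void sendWithFeedback(Channel channel, ByteBuf packet)
   {
      ChannelFuture future = channel.write(packet); // queued, not flushed
      channel.flush();                              // push queued writes to the socket

      future.addListener(new ChannelFutureListener()
      {
         @Override
         public void operationComplete(ChannelFuture f)
         {
            if (f.isSuccess())
            {
               // the bytes left this JVM; this still says nothing about
               // whether the backup persisted the record
            }
            else
            {
               f.cause().printStackTrace();
            }
         }
      });
   }
}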

Confirmed by Case 01631216.