TCP KeepAlive
External
Internal
Overview
TCP Keep-Alive is a mechanism that ensures small probe packets are periodically sent to the other end of the TCP connection. Under normal circumstances, if TCP Keep-Alive is enabled and the TCP stacks at both ends are up, the connection will stay up indefinitely, regardless of whether the application layers at both ends send data or stay idle. Note that the keep-alive mechanism has to be explicitly enabled; whether it is enabled by default is implementation dependent. The keep-alive probe logic is implemented in the TCP stack: if the keep-alive option is enabled, and no application data has been exchanged across the socket in either direction for the configured keepalive time, the stack sends a probe packet to the peer.
An ACK response is expected for each packet. Since ACK will only be returned if the other end of the connection is reachable and alive, the lack of acknowledgment is interpreted as failure and, after some retries, the OS will close the TCP end-point and will release the associated resources. The application listening on that particular socket will receive an error from the OS.
Aside from notifying the application when a connection fails, another benefit of enabling TCP Keep-Alive is that it keeps the connection "active": if the connection passes through a firewall that watches for inactivity, the probes prevent the firewall from dropping the connection.
The keep-alive protocol is implemented with a keepalive packet, which contains null data. In an Ethernet network, a keepalive frame length is 60 bytes, while the server response to this, also a null data frame, is 54 bytes.
When the keepalive option is set for a TCP socket and no data has been exchanged across the socket in either direction for 2 hours (NOTE: the actual value is implementation dependent), TCP automatically sends a keepalive probe to the peer. This probe is a TCP segment to which the peer must respond. One of three responses is expected:
1. The peer responds with the expected ACK. The application is not notified (since everything is OK). TCP will send another probe following another 2 hours of inactivity.
2. The peer responds with an RST, which tells the local TCP that the peer host has crashed and rebooted. The socket is closed.
3. There is no response from the peer. The socket is closed.
The purpose of this option is to detect if the peer host crashes. It is valid only for TCP sockets (SocketImpl).
Configuration
The Java networking API reports whether a given socket has its keep-alive mechanism turned on or off; see Socket SO_KEEPALIVE.
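As a minimal sketch of how the flag can be set and read back from Java, the example below opens a throwaway loopback connection, enables SO_KEEPALIVE through java.net.Socket.setKeepAlive, and returns what getKeepAlive reports. The class name KeepAliveFlagDemo and the loopback listener are illustrative assumptions, not part of any existing code base.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; the name is chosen for illustration only.
final class KeepAliveFlagDemo {

    /** Connects to a temporary loopback listener, enables SO_KEEPALIVE, and returns the flag as read back. */
    static boolean enableAndReport() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; a backlog of 1 is enough for one client.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket accepted = listener.accept()) {
            client.setKeepAlive(true);       // turn the option on for this socket only
            return client.getKeepAlive();    // ask the socket what it now reports
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + enableAndReport());
    }
}
```

Note that setKeepAlive only switches the mechanism on or off; the probe timing still comes from the operating system unless it is overridden separately.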
There are three parameters related to keepalive:
Keepalive time
The time of connection inactivity after which the first keepalive request is sent. In other words, it is the duration between two keepalive transmissions when the connection is idle. The default value on Linux is 2 hours (7,200 seconds). For more details, see TCP Keepalive on Linux.
Keepalive interval
The duration between two successive keepalive retransmissions, if acknowledgement to the previous keepalive transmission is not received.
Keepalive retry
The number of retransmissions to be carried out before declaring that the remote end is not available.
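The three parameters combine into a worst-case detection delay: the keepalive time plus one keepalive interval for every unanswered probe. The sketch below computes that delay and, where the platform supports it, applies the three values to a single socket through jdk.net.ExtendedSocketOptions (available since Java 11; support varies by operating system, so the code checks supportedOptions first). The class name KeepAliveTuning is hypothetical, and the interval of 75 seconds and retry count of 9 are the commonly cited Linux defaults, which this page does not state; treat them as assumptions.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketOption;
import java.time.Duration;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

// Hypothetical helper class; names and example values are assumptions for illustration.
final class KeepAliveTuning {

    /** Worst case from last traffic to failure: idle time plus one interval per unanswered probe. */
    static Duration worstCaseDetection(int timeSec, int intervalSec, int retries) {
        return Duration.ofSeconds((long) timeSec + (long) intervalSec * retries);
    }

    /** Applies the three parameters to one socket; returns false if the platform lacks the options. */
    static boolean tune(Socket socket, int timeSec, int intervalSec, int retries) throws IOException {
        Set<SocketOption<?>> supported = socket.supportedOptions();
        if (!supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
            return false;
        }
        socket.setKeepAlive(true);                                              // enable the mechanism
        socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, timeSec);          // keepalive time
        socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSec);  // keepalive interval
        socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, retries);         // keepalive retry
        return true;
    }

    public static void main(String[] args) throws IOException {
        // 7200 s is the Linux default quoted above; 75 s and 9 probes are assumed defaults.
        System.out.println("Worst-case detection: " + worstCaseDetection(7200, 75, 9));
        try (Socket socket = new Socket()) {   // unconnected socket, used only to try the options
            System.out.println("Per-socket tuning applied: " + tune(socket, 60, 10, 5));
        }
    }
}
```

With the assumed defaults, an idle connection to a dead peer would take 7200 + 75 × 9 = 7875 seconds (a little over 2 hours 11 minutes) to be reported as failed, which is why applications that need faster detection usually lower these values.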
O/S Specific Details
Whether TCP KeepAlive is enabled, and how it is configured, is OS-dependent.
TCP Keepalive on Linux
- The TCP KeepAlive Source of Record http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/
- Using TCP Keepalive to Detect Network Errors http://www.gnugk.org/keepalive.html
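On Linux, the system-wide values of the three parameters are exposed as files under /proc/sys/net/ipv4 (tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes). The following is a small sketch that reads them from Java; the class name LinuxKeepAliveDefaults is hypothetical, and on other operating systems, or if a file cannot be read, the method simply returns an empty result.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper class; the name is chosen for illustration only.
final class LinuxKeepAliveDefaults {

    /** Directory where Linux exposes the IPv4 TCP sysctl values. */
    static final Path SYSCTL_DIR = Path.of("/proc/sys/net/ipv4");

    /** Reads one integer setting from the given directory; empty if the file is missing or malformed. */
    static OptionalInt read(Path dir, String name) {
        try {
            return OptionalInt.of(Integer.parseInt(Files.readString(dir.resolve(name)).trim()));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        for (String name : List.of("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")) {
            OptionalInt value = read(SYSCTL_DIR, name);
            System.out.println(name + " = " + (value.isPresent() ? value.getAsInt() : "unavailable"));
        }
    }
}
```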
Questions and TODO
- Can keepalive be set per TCP connection, or is it a system-wide setting (all TCP/IP connections)?
- Is it true that, without keepalive, a write can block forever if the other end is suddenly powered off?