Java Non-Blocking I/O Concepts

From NovaOrdis Knowledge Base

External

Internal

Overview

Until NIO, the only I/O facility available in Java was Streams (java.io.*). All Stream operations are blocking: a thread waits until there is data to read from the Stream instance or until it can write to the Stream instance. The disadvantage of this approach is that multiple threads are required to handle multiple sources of data simultaneously, such as concurrent network connections. These threads usually spend most of their time blocked, waiting on I/O events. This is the I/O and threading model Tomcat is built on. Another particularity of the Stream API is that it is modeled around reading and writing data one byte at a time, which is not the most efficient way of doing I/O, because the O/S handles I/O in blocks. NIO addresses both of these issues: it offers a way of handling I/O in a non-blocking manner, referred to as the "multiplexed non-blocking I/O facility" and described below. It also comes with APIs that allow block-oriented I/O operations.

Multiplexed Non-Blocking I/O Facility

Selectors, selection keys and selectable channels work together to provide a multiplexed, non-blocking I/O facility, whose main advantage is that it is much more scalable than thread-oriented blocking I/O. It works as follows: a Selector instance is created using the Selector.open() static method. Selectable channels are then registered with the Selector instance, so the Selector acts as a multiplexor of selectable channels. This mechanism allows a single selector thread to be notified of, and to process, I/O events arriving from multiple sources.

The registration procedure requires specifying the set of channel I/O operations to be tested for readiness by the selector. This set of operations is referred to as the "interest set". There are four types of operations the selector notifies on:

  • READ: this notification is sent if the selector detects that the corresponding channel is ready for reading, has reached end-of-stream, has been remotely shut down for further reading, or has an error pending.
  • WRITE: this notification is sent if the selector detects that the corresponding channel is ready for writing, has been remotely shut down for further writing or has an error pending.
  • CONNECT: this notification is sent if the selector detects that the corresponding channel is ready to complete its connection sequence, or has an error pending.
  • ACCEPT: this notification is sent if the selector detects that the corresponding channel is ready to accept another connection, or has an error pending.

The registration procedure returns a selection key that represents the registration.
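The registration step can be sketched as follows. This is a minimal illustration, not a complete server: the class and method names are hypothetical, and binding to the loopback address on port 0 (any free port) is an assumption made only to keep the example self-contained.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical sketch class, not part of the original article.
final class RegistrationSketch {

    // Registers a listening channel for ACCEPT events and returns the
    // interest set recorded in the resulting selection key.
    static int registerForAccept() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the O/S for any free port (assumption for the demo).
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // A channel must be in non-blocking mode before it is registered.
            server.configureBlocking(false);
            // The interest set is passed as a bit mask of SelectionKey.OP_* constants.
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);
            // The key links the channel and the selector it was registered with.
            assert key.channel() == server && key.selector() == selector;
            return key.interestOps();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(registerForAccept() == SelectionKey.OP_ACCEPT);
    }
}
```

Note that the registration call is made on the channel (register(selector, ops)), with the selector passed as an argument. The interest set of an existing key can later be changed with SelectionKey.interestOps(int).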

After the registration, select() can be invoked on the selector to discover which channels, if any, have become ready to perform one or more of the operations in which interest was previously declared. The selection operation is blocking. The underlying operating system is queried for an update on the registered channels' readiness. If one or more channels are ready, the key returned when each of those channels was registered is added to the selector's "selected-key set". The set is returned by Selector.selectedKeys(). The keys of the selected-key set can be examined to determine the operations for which each channel is ready. The key also gives access to the channel instance, which can then be used to perform the I/O operation. Note that once a channel has been handled, its key must be explicitly removed from the selected-key set, otherwise it will stay there and will still be present in the set after the next select() operation.
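The selection loop described above could look roughly like the sketch below. It is deliberately simplified: a blocking client in the same thread generates the events, the message is assumed to fit in a 1024-byte buffer, and all class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical sketch class, not part of the original article.
final class SelectLoopSketch {

    // Sends 'message' over a loopback connection and returns what the
    // selector-driven side received.
    static String receiveOnce(String message) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // Blocking client, used only to produce ACCEPT and READ events.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                client.shutdownOutput(); // lets the reader observe end-of-stream

                ByteBuffer received = ByteBuffer.allocate(1024); // assumed large enough
                boolean done = false;
                while (!done) {
                    selector.select(); // blocks until at least one key is ready
                    Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                    while (keys.hasNext()) {
                        SelectionKey key = keys.next();
                        keys.remove(); // handled keys must be removed explicitly
                        if (key.isAcceptable()) {
                            SocketChannel peer = server.accept();
                            if (peer != null) {
                                peer.configureBlocking(false);
                                peer.register(selector, SelectionKey.OP_READ);
                            }
                        } else if (key.isReadable()) {
                            SocketChannel peer = (SocketChannel) key.channel();
                            int n = peer.read(received);
                            if (n == -1 || !received.hasRemaining()) {
                                peer.close(); // closing the channel cancels its key
                                done = true;
                            }
                        }
                    }
                }
                received.flip();
                return StandardCharsets.UTF_8.decode(received).toString();
            }
        }
    }
}
```

Removing each key through the iterator is what keeps an already handled key from showing up again in the selected-key set on the next pass.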

The selector thread can be used inside the main event loop to also perform the I/O operation on the channel (or channels) that have become ready, and then return to block on select(), or the task of performing the actual I/O operation can be delegated to a different thread from an auxiliary thread pool, so the selector thread immediately returns to and blocks on select().

java.nio.channels provides selectable channel classes corresponding to DatagramSocket, ServerSocket and Socket. Each of these classes has an open() method that creates the channel. If the channel needs an associated socket, the socket is created as a side effect of opening the channel.
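As a small illustration of the relationship between the channel classes and their sockets, the sketch below opens one channel of each kind and checks that the associated socket points back to its channel. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.channels.DatagramChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical sketch class, not part of the original article.
final class ChannelSocketSketch {

    // Returns true if every channel's socket() refers back to that channel.
    static boolean socketsAreAssociated() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             SocketChannel client = SocketChannel.open();
             DatagramChannel datagram = DatagramChannel.open()) {
            ServerSocket serverSocket = server.socket();     // counterpart of ServerSocket
            Socket socket = client.socket();                 // counterpart of Socket
            DatagramSocket datagramSocket = datagram.socket(); // counterpart of DatagramSocket
            return serverSocket.getChannel() == server
                    && socket.getChannel() == client
                    && datagramSocket.getChannel() == datagram;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(socketsAreAssociated());
    }
}
```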

Channels cannot be deregistered directly. Instead, the key representing their registration must be cancelled.
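A minimal sketch of deregistration through the key, with hypothetical class and method names. The channel is left unbound because binding is not needed to demonstrate cancellation.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical sketch class, not part of the original article.
final class KeyCancelSketch {

    // Returns true if cancelling the key invalidates it and the next
    // selection operation drops it from the selector's key set.
    static boolean cancelInvalidatesKey() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.configureBlocking(false);
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);
            boolean validBefore = key.isValid();

            key.cancel(); // requests deregistration; the key becomes invalid at once
            boolean validAfter = key.isValid();

            // Cancelled keys are removed from the key set during a selection operation.
            selector.selectNow();
            return validBefore && !validAfter && !selector.keys().contains(key);
        }
    }
}
```

Closing the channel or closing the selector also invalidates the key, so explicit cancellation is mainly useful when the channel must stay open.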

Java non-blocking I/O is built on top of the underlying O/S non-blocking I/O facilities. Non-blocking read and write support in the API, along with block access, allows Java applications to implement high-speed I/O without having to write native code to access those O/S-specific facilities.

Stream-Oriented vs. Block-Oriented I/O Operations

A stream-oriented I/O system deals with data one byte at a time: an input stream produces a byte of data and an output stream consumes a byte of data. Stream-oriented APIs allow data to be easily filtered. They also allow multiple streams to be easily chained. However, moving data this way is rather slow. A block-oriented I/O system deals with data in blocks: each operation produces or consumes a block of data in one step. This can move data faster, but the block-oriented APIs lack the elegance and simplicity of the stream-oriented APIs. NIO exposes block-level access via Channels and Buffers. Data can be read and written in blocks via Buffers into and from Channels. Caching is already done, efficiently, by the O/S. The data transfer between Channels and Buffers is done transparently, without requiring application threads to move bytes around. An application thread is notified via an I/O event once the transfer has completed. The NIO API does not do anything that the Stream API can't do (essentially reading and writing data from/to I/O devices) but it does it faster and using fewer threads.
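To make the block-oriented style concrete, the sketch below copies a file one buffer-sized block at a time through a pair of FileChannels. The class name, the method name and the 8192-byte block size are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch class, not part of the original article.
final class BlockCopySketch {

    // Copies 'source' to 'target' block by block and returns the byte count.
    static long copy(Path source, Path target) throws IOException {
        long total = 0;
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer block = ByteBuffer.allocate(8192); // block size is an arbitrary choice
            while (in.read(block) != -1) {   // fills the buffer with up to one block
                block.flip();                // switch the buffer from filling to draining
                while (block.hasRemaining()) {
                    total += out.write(block); // a write may consume only part of the block
                }
                block.clear();               // make the whole buffer available again
            }
        }
        return total;
    }
}
```

The same copy written against InputStream.read() and OutputStream.write(int) would make one call per byte, which is the per-byte model the paragraph above contrasts with.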

Java NIO and TCP Connections

A working example that shows various Java NIO primitives collaborating in establishing a bidirectional TCP connection and providing non-blocking and block-oriented access to it:

Java NIO and TCP Connections

Primitives

Selector


A multiplexor that allows registration of multiple selectable channels so they can be serviced by a single selector thread. The selector thread will block in select() and it will be notified by the selector's implementation only when an I/O event, such as data becoming available or a new connection being established, occurs.

Selection Key

A selection key represents a registration of a channel with a selector. When a selector notifies the calling thread of an incoming event, it does so by supplying a selection key that corresponds to the event. The selection key can also be used to deregister a channel from the selector.

Channel

A Channel represents an open connection to an entity such as a hardware device, a file, a network socket or a program component that is capable of performing I/O operations. The Channel is essentially a source of I/O events. All the data that goes in and out of an application must pass through the Channel. However, unlike in the case of the Stream API, where the application writes and reads to/from the Stream, in NIO's case the application does not read or write data from/to the Channel directly. It does so via Buffers, after being notified of data availability via a selector. Unlike Streams, which are uni-directional, Channels can be bi-directional: the same channel can be used to both read and write data. This behavior better reflects the reality of the underlying O/S channels. A channel is either open or closed. A channel is open upon creation and once closed it remains closed. Channels are in general intended to be safe for multithreaded access. A selectable channel is a special type of channel that can be put into non-blocking mode, and which has to be put in non-blocking mode if it is to be multiplexed under a selector.
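The channel life cycle and the non-blocking requirement can be checked with a short sketch. The class and method names are hypothetical. The SocketChannel is never connected, since a connection is not needed to show either behavior.

```java
import java.io.IOException;
import java.nio.channels.IllegalBlockingModeException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical sketch class, not part of the original article.
final class BlockingModeSketch {

    // A channel is open upon creation and stays closed once closed.
    static boolean closedStaysClosed() throws IOException {
        SocketChannel channel = SocketChannel.open();
        boolean openAtCreation = channel.isOpen();
        channel.close();
        return openAtCreation && !channel.isOpen();
    }

    // Selectable channels start in blocking mode; registering one in that
    // mode is rejected with IllegalBlockingModeException.
    static boolean blockingChannelIsRejected() throws IOException {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            try {
                channel.register(selector, SelectionKey.OP_READ);
                return false;
            } catch (IllegalBlockingModeException expected) {
                // Switching to non-blocking mode makes the registration legal.
                channel.configureBlocking(false);
                channel.register(selector, SelectionKey.OP_READ);
                return true;
            }
        }
    }
}
```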

For more details see Channel/Buffer Interaction below.

ServerSocketChannel


A selectable channel used to listen for incoming network connections and create new SocketChannels for each TCP connection. The ServerSocketChannel delegates to a ServerSocket to do the actual listening.
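A minimal sketch of a ServerSocketChannel producing a SocketChannel for an incoming connection. For brevity the channel is left in blocking mode and the client connects from the same thread. The names and the loopback/port 0 binding are assumptions.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical sketch class, not part of the original article.
final class AcceptSketch {

    // Returns true if the accepted channel's remote port matches the port
    // the client connected from.
    static boolean acceptOne() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // The connection is queued by the O/S until accept() picks it up.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) { // one new SocketChannel per connection
                int remotePort = ((InetSocketAddress) accepted.getRemoteAddress()).getPort();
                int clientPort = ((InetSocketAddress) client.getLocalAddress()).getPort();
                return accepted.isConnected() && remotePort == clientPort;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(acceptOne());
    }
}
```

In non-blocking mode accept() returns null when no connection is pending, which is why it is normally called only after the selector reports the ACCEPT operation as ready.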

SocketChannel


FileChannel


A working FileChannel example:

Playground FileChannel Example

Buffer


java.nio.Buffer is a container for a fixed amount of data: a linear, finite sequence of elements of a specific primitive type. From an implementation perspective, a Buffer is a primitive array that has state allowing it to keep track of what has been written to or read from it. The essential properties of a buffer are:

  • capacity - the number of elements the buffer contains. It never changes.
  • position - the index of the next element to be read or written.
  • limit - the index of the first element that should not be read or written.

The base class java.nio.Buffer defines these properties, and methods for clearing, flipping, rewinding, marking the current position and resetting the position to a previous mark. Subclasses define get() and put() methods for moving data out of and into the buffer, and methods for compacting, duplicating and slicing a buffer. Data is transferred to and from Channels via Buffers. An application does not transfer data directly into or from a Channel. It can only do so via a Buffer.
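The way position and limit move under these operations can be traced with a small sketch. The class and method names are hypothetical, and the 8-byte capacity is arbitrary.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class, not part of the original article.
final class BufferStateSketch {

    static String describe(Buffer b) {
        return "position=" + b.position() + " limit=" + b.limit() + " capacity=" + b.capacity();
    }

    // Records the buffer state after each operation.
    static List<String> walkThrough() {
        List<String> states = new ArrayList<>();
        ByteBuffer buffer = ByteBuffer.allocate(8);
        states.add(describe(buffer));   // position=0 limit=8 capacity=8

        buffer.put((byte) 1).put((byte) 2).put((byte) 3);
        states.add(describe(buffer));   // position=3 limit=8 capacity=8

        buffer.flip();                  // limit = old position, position = 0
        states.add(describe(buffer));   // position=0 limit=3 capacity=8

        buffer.get();                   // reads the element at index 0
        buffer.mark();                  // remembers position 1
        buffer.get();                   // position moves to 2
        buffer.reset();                 // position goes back to the mark
        states.add(describe(buffer));   // position=1 limit=3 capacity=8

        buffer.rewind();                // position = 0, limit unchanged
        states.add(describe(buffer));   // position=0 limit=3 capacity=8

        buffer.clear();                 // position = 0, limit = capacity; data is not erased
        states.add(describe(buffer));   // position=0 limit=8 capacity=8
        return states;
    }

    public static void main(String[] args) {
        walkThrough().forEach(System.out::println);
    }
}
```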

Byte buffers are special, in that they can:

  • be allocated as a direct buffer, in which case the JVM will make a best effort to perform native I/O operations directly on it.
  • be created by mapping a region of a file directly into memory.
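Mapping a file region into memory could be sketched as below. The class and method names are hypothetical, and the file is assumed to be non-empty and smaller than 2 GB, since a single mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch class, not part of the original article.
final class MappedFileSketch {

    // Maps the whole file read-only and sums its bytes without an explicit read() call.
    static int sumBytes(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get(); // served from the mapped region
            }
            return sum;
        }
    }
}
```

The mapping stays valid after the channel is closed. It is released only when the buffer itself is garbage collected.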

Buffer Allocation

ByteBuffer.allocate(capacity)
ByteBuffer.wrap(...)
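The difference between the two factory methods can be illustrated with a short sketch (hypothetical class and method names): allocate() creates a new zero-filled buffer, while wrap() shares the array it is given, so changes made through the buffer are visible in the array and vice versa.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch class, not part of the original article.
final class AllocationSketch {

    // A freshly allocated buffer has position 0 and limit == capacity.
    static int allocatedRemaining(int capacity) {
        return ByteBuffer.allocate(capacity).remaining();
    }

    // A wrapped buffer is backed by the caller's array.
    static boolean wrapSharesArray() {
        byte[] backing = {1, 2, 3, 4};
        ByteBuffer wrapped = ByteBuffer.wrap(backing);
        wrapped.put(0, (byte) 9); // absolute put: position is not changed
        return backing[0] == 9
                && wrapped.capacity() == backing.length
                && wrapped.position() == 0;
    }

    public static void main(String[] args) {
        System.out.println(allocatedRemaining(16));
        System.out.println(wrapSharesArray());
    }
}
```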

Types of Buffers

ByteBuffer

ByteBuffer is used for most standard I/O operations.

DirectBuffer

TODO ByteBuffer.allocateDirect(capacity)

Heap ByteBuffer

A ByteBuffer that is not direct is allocated on the heap.
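A quick way to tell the two kinds of byte buffer apart is sketched below, with hypothetical class and method names. Only the heap buffer is checked for a backing array, because whether a direct buffer exposes one is unspecified.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch class, not part of the original article.
final class HeapVsDirectSketch {

    // Returns { heap.isDirect(), heap.hasArray(), direct.isDirect() }.
    static boolean[] flags() {
        ByteBuffer heap = ByteBuffer.allocate(16);         // backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // memory outside the normal heap
        return new boolean[] { heap.isDirect(), heap.hasArray(), direct.isDirect() };
    }

    public static void main(String[] args) {
        boolean[] f = flags();
        System.out.println("heap.isDirect()   = " + f[0]);
        System.out.println("heap.hasArray()   = " + f[1]);
        System.out.println("direct.isDirect() = " + f[2]);
    }
}
```

Direct buffers are typically more expensive to allocate and release than heap buffers, so they tend to pay off for large, long-lived buffers used for native I/O.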

CharBuffer

ShortBuffer

IntBuffer

LongBuffer

DoubleBuffer

Channel/Buffer Interaction

To read from a channel use:

ReadableByteChannel channel = ...;
ByteBuffer buffer = ...;
channel.read(buffer);

The invocation of the read() method initiates an attempt to transfer data from the channel into the buffer. The read operation might not fill the buffer, and in fact might not read any bytes at all if none are available on the channel. In the best case, it fills all the space available in the buffer. The result is the number of bytes read, possibly zero, or -1 if the channel has reached end-of-stream. The buffer's internal accounting variables (position and limit) are modified correspondingly.

If the channel is in blocking mode, the method will block until at least one byte is read.
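A read loop that honors these return values could look like the sketch below. The class and method names are hypothetical. The 4-byte buffer is deliberately small so that several read() calls are needed, and the usage note obtains a blocking channel from Channels.newChannel() over an in-memory stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class, not part of the original article.
final class ChannelReadSketch {

    // Drains a blocking channel until end-of-stream and decodes the bytes as UTF-8.
    static String readAll(ReadableByteChannel channel) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(4); // small on purpose
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        while (channel.read(buffer) != -1) { // -1 signals end-of-stream
            buffer.flip();                   // prepare to read what was just written
            while (buffer.hasRemaining()) {
                collected.write(buffer.get());
            }
            buffer.clear();                  // make room for the next read
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For example, readAll(Channels.newChannel(new ByteArrayInputStream("hello nio".getBytes(StandardCharsets.UTF_8)))) returns the original string. With a non-blocking channel the same loop would spin whenever read() returns 0, which is why non-blocking reads are usually driven by a selector.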

TODO: https://docs.oracle.com/javase/8/docs/api/java/nio/channels/ReadableByteChannel.html

TODO: Address writing on a channel. What does a "ready for write" event mean on the selector loop?