Java Non-Blocking I/O Concepts
Latest revision as of 03:50, 15 September 2020
External
Internal
Overview
Until NIO, all that was available for I/O in Java were Streams (java.io.*). All operations with Streams are blocking: a thread waits until there is data to read from the Stream instance or until it can write to the Stream instance. The disadvantage of this approach is that multiple threads are required when we need to simultaneously handle multiple sources of data, such as concurrent network connections. These threads usually spend most of their time blocked waiting on I/O events. This is the I/O and threading model Tomcat was originally built on. Another particularity of the Stream API is that it is modeled around reading or writing data one byte at a time, which is not the most efficient way of doing I/O - the O/S handles I/O in blocks. NIO addresses both of these issues: it offers a method of handling I/O in an asynchronous manner, referred to as the "multiplexed non-blocking I/O facility", described below. It also comes with APIs that allow block-oriented I/O operations.
Multiplexed Non-Blocking I/O Facility
Selectors, selection keys and selectable channels work together to provide a multiplexed, non-blocking I/O facility, whose main advantage is that it is much more scalable than thread-oriented blocking I/O. Asynchronous I/O works as follows: a Selector instance is created using the Selector.open() static method. Selectable channels are then registered with the Selector instance, which acts as a multiplexor of selectable channels. This mechanism allows a single selector thread to be notified of, and to process, I/O events arriving from multiple sources.
The registration procedure requires specifying the set of channel I/O operations to be tested for readiness by the selector. This set of operations is also referred to as the "interest set". There are four types of operations the selector notifies on, each identified by a constant defined in SelectionKey:
- READ (SelectionKey.OP_READ): this notification is sent if the selector detects that the corresponding channel is ready for reading, has reached end-of-stream, has been remotely shut down for further reading, or has an error pending.
- WRITE (SelectionKey.OP_WRITE): this notification is sent if the selector detects that the corresponding channel is ready for writing, has been remotely shut down for further writing, or has an error pending.
- CONNECT (SelectionKey.OP_CONNECT): this notification is sent if the selector detects that the corresponding channel is ready to complete its connection sequence, or has an error pending.
- ACCEPT (SelectionKey.OP_ACCEPT): this notification is sent if the selector detects that the corresponding channel is ready to accept another connection, or has an error pending.
The registration procedure returns a selection key that represents the registration.
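The registration step described above can be sketched as follows. This is a minimal illustration, not a complete server: the class and method names (RegistrationSketch, describeRegistration) are hypothetical, and the channel is deliberately left unbound because registration itself does not require a bound socket.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

final class RegistrationSketch {

    /** Registers a server channel for ACCEPT and reports what the returned key refers to. */
    static String describeRegistration() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // A channel must be in non-blocking mode before it can be registered.
            server.configureBlocking(false);
            // The second argument is the interest set.
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);
            boolean sameChannel = key.channel() == server;
            boolean sameSelector = key.selector() == selector;
            boolean acceptOnly = key.interestOps() == SelectionKey.OP_ACCEPT;
            return "channel=" + sameChannel
                    + " selector=" + sameSelector
                    + " acceptOnly=" + acceptOnly;
        }
    }
}
```

The returned key ties together the channel, the selector and the interest set, which is why it is the object handed back later by the selection operation.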
After the registration, select() can be invoked on the selector to discover which channels, if any, have become ready to perform one or more of the operations in which interest was previously declared. The selection operation is blocking. The underlying operating system is queried for an update on the registered channels' readiness. If one or more channels are ready, the key returned when each of them was registered is added to the selector's "selected-key set". The set is returned by Selector.selectedKeys(). The keys of the selected-key set can be examined to determine the operations for which each channel is ready. The key also gives access to the channel instance, which then can be used to perform the I/O operation. Note that the selected key for a channel that has been handled must be explicitly removed from the set, otherwise it will stay there and still be present in the set after the next select() operation.
The selector thread can perform the I/O operations itself: inside the main event loop, it handles the channel (or channels) that have become ready, and then returns to block on select(). Alternatively, the task of performing the actual I/O operation can be delegated to a different thread from an auxiliary thread pool, so the selector thread immediately returns to block on select().
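A minimal sketch of the first variant, where the selector thread also performs the I/O, is shown below. It uses a java.nio.channels.Pipe instead of a network connection so that it is self-contained. The names SelectLoopSketch and roundTrip are hypothetical, and the sketch assumes a short message that fits in the pipe's internal buffer, since the writing end is left in blocking mode.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class SelectLoopSketch {

    /** Writes a short message into a pipe and reads it back when the selector reports readiness. */
    static String roundTrip(String message) throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            // Make the source end readable so that select() below has something to report.
            ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
            while (out.hasRemaining()) {
                sink.write(out);
            }

            ByteBuffer in = ByteBuffer.allocate(out.capacity());
            while (in.hasRemaining()) {
                selector.select(); // blocks until at least one registered channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove(); // a handled key must be removed explicitly
                    if (key.isReadable()) {
                        ReadableByteChannel channel = (ReadableByteChannel) key.channel();
                        channel.read(in);
                    }
                }
            }
            in.flip();
            return StandardCharsets.UTF_8.decode(in).toString();
        }
    }
}
```

Calling SelectLoopSketch.roundTrip("ping") is expected to return the same string. In the second variant, the body of the isReadable() branch would be handed to an executor instead of being run inline.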
java.nio.channels provides selectable channel classes corresponding to DatagramSocket, ServerSocket and Socket: DatagramChannel, ServerSocketChannel and SocketChannel. If the channel needs an associated socket, the socket will be created as a side effect of opening the channel.
Channels cannot be deregistered directly; instead, the key representing their registration must be cancelled.
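Deregistration through the key can be sketched as below. The names DeregistrationSketch and cancelAndObserve are hypothetical. The sketch relies on the documented behavior that a cancelled key becomes invalid immediately, while its removal from the selector's key set happens during the next selection operation.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class DeregistrationSketch {

    /** Cancels a registration and reports the key's state afterwards. */
    static String cancelAndObserve() throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            source.configureBlocking(false);
            SelectionKey key = source.register(selector, SelectionKey.OP_READ);

            key.cancel();
            boolean validAfterCancel = key.isValid();

            // Cancelled keys are flushed from the selector's key set by a selection operation.
            selector.selectNow();
            boolean inKeySet = selector.keys().contains(key);

            return "validAfterCancel=" + validAfterCancel + " inKeySet=" + inKeySet;
        }
    }
}
```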
Java asynchronous I/O is built on top of the underlying O/S non-blocking I/O facilities. Non-blocking read and write support in the API, along with block access, allows Java applications to implement high-speed I/O without having to write native code that would access those O/S-specific facilities.
Stream-Oriented vs. Block-Oriented I/O Operations
A stream-oriented I/O system deals with data one byte at a time: an input stream produces a byte of data and an output stream consumes a byte of data. Stream-oriented APIs allow data to be easily filtered. They also allow multiple streams to be easily chained. However, moving data this way is rather slow. A block-oriented I/O system deals with data in blocks - each operation produces or consumes a block of data in one step. This can move data faster, but the block-oriented APIs lack the elegance and simplicity of the stream-oriented APIs. NIO exposes block-level access via Channels and Buffers. Data can be read and written in blocks via Buffers into and from Channels. Caching is already done, efficiently, by the O/S. The data transfer between Channels and Buffers is done transparently, without requiring application threads to move bytes around. An application thread is notified via an I/O event once the transfer has completed. The NIO API does not do anything that the Stream API can't do - essentially reading and writing data from/to I/O devices - but it does it faster and using fewer threads.
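The difference between the two styles can be illustrated by counting read calls over the same data. This is only a sketch over an in-memory source, so it says nothing about real device throughput. The names StreamVsBlockSketch, readByteByByte and readInBlocks are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

final class StreamVsBlockSketch {

    /** Stream style: one read() call per byte. Returns the number of calls that produced data. */
    static int readByteByByte(byte[] data) throws IOException {
        int calls = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) {
                calls++;
            }
        }
        return calls;
    }

    /** Block style: each read() call may move up to blockSize bytes into the buffer. */
    static int readInBlocks(byte[] data, int blockSize) throws IOException {
        int calls = 0;
        ByteBuffer block = ByteBuffer.allocate(blockSize);
        try (ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(data))) {
            while (channel.read(block) != -1) {
                calls++;
                block.clear(); // make the whole buffer available for the next block
            }
        }
        return calls;
    }
}
```

For 1000 bytes, the stream version needs 1000 calls, while the block version with a 256-byte buffer needs only a handful.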
Java NIO and TCP Connections
A working example that shows various Java NIO primitives collaborating in establishing a bidirectional TCP connection and providing non-blocking and block-oriented access to it is available in the "Java NIO and TCP Connections" article.
Primitives
Selector
A multiplexor that allows registration of multiple selectable channels so they can be serviced by a single selector thread. The selector thread will block in select() and it will be notified by the selector's implementation only when an I/O event, such as data becoming available or a new connection being established, occurs.
Selection Key
A selection key represents a registration of a channel with a selector. When a selector notifies the calling thread of an incoming event, it does so by supplying a selection key that corresponds to the event. The selection key can also be used to deregister a channel from the selector.
Channel
A Channel represents an open connection to an entity such as a hardware device, a file, a network socket or a program component that is capable of performing I/O operations. The Channel is essentially a source of I/O events. All the data that goes in and out of an application must pass through the Channel. However, unlike in the case of the Stream API, where the application writes and reads to/from the Stream, in NIO's case the application does not read or write data from/to the Channel directly; it does so via Buffers, after being notified of data availability via a selector. Unlike Streams, which are uni-directional, Channels can be bi-directional: the same channel can be used to both read and write data. This behavior better reflects the reality of the underlying O/S channels. A channel is either open or closed. A channel is open upon creation and once closed it remains closed. Channels are in general intended to be safe for multithreaded access. A selectable channel is a special type of channel that can be put into non-blocking mode, and which has to be put in non-blocking mode if it is to be multiplexed under a selector.
For more details see Channel/Buffer Interaction below.
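The blocking-mode requirement for selectable channels can be checked with a short sketch. The names BlockingModeSketch and tryBothModes are hypothetical. The sketch relies on the fact that a newly created selectable channel starts in blocking mode, and that register() throws IllegalBlockingModeException for a channel in that mode.

```java
import java.io.IOException;
import java.nio.channels.IllegalBlockingModeException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class BlockingModeSketch {

    /** Attempts registration in blocking mode, then again in non-blocking mode. */
    static String tryBothModes() throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            boolean blockingRejected;
            try {
                // The channel is still in its initial, blocking mode here.
                source.register(selector, SelectionKey.OP_READ);
                blockingRejected = false;
            } catch (IllegalBlockingModeException expected) {
                blockingRejected = true;
            }

            source.configureBlocking(false);
            SelectionKey key = source.register(selector, SelectionKey.OP_READ);

            return "blockingRejected=" + blockingRejected
                    + " nonBlockingAccepted=" + key.isValid();
        }
    }
}
```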
ServerSocketChannel
A selectable channel used to listen for incoming network connections and create new SocketChannels for each TCP connection. The ServerSocketChannel delegates to a ServerSocket to do the actual listening.
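A minimal sketch of a ServerSocketChannel producing a SocketChannel for an incoming connection is shown below. It assumes a usable loopback interface and lets the O/S pick a free port. The names AcceptSketch and acceptOne are hypothetical, and the client side uses a plain blocking connect to keep the sketch short.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class AcceptSketch {

    /** Accepts a single loopback connection and reports whether a connected channel was produced. */
    static boolean acceptOne() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the O/S for any free port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // Blocking connect from the client side.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                selector.select(); // returns once the server channel is ready to accept
                try (SocketChannel accepted = server.accept()) {
                    return accepted != null && accepted.isConnected();
                }
            }
        }
    }
}
```

In non-blocking mode accept() returns null when no connection is pending, which is why the result is checked before use.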
SocketChannel
FileChannel
A working FileChannel example: https://github.com/NovaOrdis/playground/tree/master/java/nio/file-channel
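Separately from the example referenced above, a minimal sketch of writing to and reading from a FileChannel through a ByteBuffer could look like this. The names FileChannelSketch and writeThenRead are hypothetical, and a temporary file is used so that nothing outside the sketch is touched.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FileChannelSketch {

    /** Writes text to a temporary file through a channel, then reads it back the same way. */
    static String writeThenRead(String text) throws IOException {
        Path file = Files.createTempFile("nio-sketch", ".txt");
        try {
            try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
                // A single write() is not guaranteed to drain the buffer.
                while (buffer.hasRemaining()) {
                    out.write(buffer);
                }
            }
            try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer buffer = ByteBuffer.allocate((int) in.size());
                // Keep reading until the buffer is full or end-of-file is reached.
                while (buffer.hasRemaining() && in.read(buffer) != -1) {
                    // nothing to do here: read() advances the buffer position
                }
                buffer.flip(); // switch the buffer from being filled to being drained
                return StandardCharsets.UTF_8.decode(buffer).toString();
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```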
Buffer
java.nio.Buffer is a container for a fixed amount of data: a linear, finite sequence of elements of a specific primitive type. From an implementation perspective, a Buffer is a primitive array that has state allowing it to keep track of what has been written to or read from it. The essential properties of a buffer are:
- capacity - the number of elements the buffer contains. It is never negative and never changes.
- position - the index of the next element to be read or written.
- limit - the index of the first element that should not be read or written.
The base class java.nio.Buffer defines these properties, and methods for clearing, flipping, rewinding, marking the current position and resetting the position to a previous mark. Subclasses define get() and put() methods for moving data out of and into the buffer, methods for compacting, duplicating and slicing a buffer. Data is transferred to and from Channels via Buffers. An application does not transfer data directly into or from a Channel, it can only do it via a Buffer.
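How position and limit move as the methods mentioned above are called can be traced with a small sketch. The names BufferStateSketch, state and walkThrough are hypothetical. The comments show the state recorded at each step for an 8-byte buffer.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class BufferStateSketch {

    static String state(Buffer b) {
        return "pos=" + b.position() + " lim=" + b.limit() + " cap=" + b.capacity();
    }

    /** Records the buffer state after each of the basic operations. */
    static List<String> walkThrough() {
        List<String> states = new ArrayList<>();
        ByteBuffer buffer = ByteBuffer.allocate(8);
        states.add(state(buffer)); // pos=0 lim=8 cap=8

        buffer.put((byte) 1);
        buffer.put((byte) 2);
        buffer.put((byte) 3);
        states.add(state(buffer)); // pos=3 lim=8 cap=8

        buffer.flip();             // limit = old position, position = 0
        states.add(state(buffer)); // pos=0 lim=3 cap=8

        buffer.get();
        buffer.mark();             // remember position 1
        buffer.get();
        states.add(state(buffer)); // pos=2 lim=3 cap=8

        buffer.reset();            // back to the marked position
        states.add(state(buffer)); // pos=1 lim=3 cap=8

        buffer.rewind();           // position = 0, limit unchanged
        states.add(state(buffer)); // pos=0 lim=3 cap=8

        buffer.clear();            // position = 0, limit = capacity
        states.add(state(buffer)); // pos=0 lim=8 cap=8
        return states;
    }
}
```

Note that clear() only resets the accounting; it does not erase the bytes already stored in the buffer.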
Byte buffers are special, in that they can:
- be allocated as a direct buffer, in which case the JVM will make a best effort to perform native I/O operations directly on it.
- be created by mapping a region of a file directly into memory.
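The file-mapping case can be sketched as follows. The names MappedBufferSketch and mapAndRead are hypothetical. The temporary file is only marked for deletion on exit, because a mapping stays valid until the buffer itself is garbage-collected, and some platforms refuse to delete a file that is still mapped.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedBufferSketch {

    /** Maps a small temporary file into memory and decodes its content from the mapped buffer. */
    static String mapAndRead(String text) throws IOException {
        Path file = Files.createTempFile("nio-mapped", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));

        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // A mapped byte buffer is a direct buffer backed by the mapped file region.
            return "direct=" + mapped.isDirect()
                    + " text=" + StandardCharsets.UTF_8.decode(mapped);
        }
    }
}
```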
Buffer Allocation
ByteBuffer.allocate(capacity)
ByteBuffer.wrap(...)
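The allocation methods can be compared side by side in a short sketch, which also includes ByteBuffer.allocateDirect() for the direct case. The names BufferAllocationSketch and describe are hypothetical, and the capacities and byte values are arbitrary.

```java
import java.nio.ByteBuffer;

final class BufferAllocationSketch {

    /** Allocates a heap, a wrapped and a direct buffer and reports how they differ. */
    static String describe() {
        // Heap buffer: backed by a byte[] managed by the JVM.
        ByteBuffer heap = ByteBuffer.allocate(16);

        // Wrapped buffer: shares storage with an existing array.
        byte[] backing = {10, 20, 30, 40};
        ByteBuffer wrapped = ByteBuffer.wrap(backing);
        wrapped.put(0, (byte) 99); // writes through to backing[0]

        // Direct buffer: allocated outside the normal heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);

        return "heap.direct=" + heap.isDirect()
                + " heap.hasArray=" + heap.hasArray()
                + " backing[0]=" + backing[0]
                + " direct.direct=" + direct.isDirect();
    }
}
```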
Types of Buffers
ByteBuffer
ByteBuffer is used for most standard I/O operations.
DirectBuffer
TODO ByteBuffer.allocateDirect(capacity)
Heap ByteBuffer
A ByteBuffer that is not direct is allocated on the heap.
CharBuffer
ShortBuffer
IntBuffer
LongBuffer
DoubleBuffer
Channel/Buffer Interaction
To read from a channel use:
ReadableByteChannel channel = ...;
ByteBuffer buffer = ...;
channel.read(buffer);
The invocation of the read() method initiates an attempt to transfer data from the channel into the buffer. The read operation might not fill the buffer, and in fact might not read any bytes at all if none are available on the channel. In the best case, it fills all the space available in the buffer. The result is the number of bytes read, possibly zero, or -1 if the channel has reached end-of-stream. The buffer's internal accounting variables are modified correspondingly.
If the channel is in blocking mode, the method will block until at least one byte is read.
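Putting the return-value rules together, a read loop over a blocking channel can be sketched as below. The names ChannelReadSketch and readAll are hypothetical. The buffer is deliberately tiny to force several read() calls, and the sketch assumes a blocking channel: with a non-blocking one, a zero result would make this loop spin instead of waiting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

final class ChannelReadSketch {

    /** Drains a blocking channel to end-of-stream and decodes the bytes as UTF-8. */
    static String readAll(ReadableByteChannel channel) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(4);
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        while (channel.read(buffer) != -1) {  // -1 signals end-of-stream
            buffer.flip();                    // switch from filling to draining
            while (buffer.hasRemaining()) {
                collected.write(buffer.get());
            }
            buffer.clear();                   // make room for the next read
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

An in-memory channel such as Channels.newChannel(new ByteArrayInputStream(bytes)) is enough to try it out.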
TODO: https://docs.oracle.com/javase/8/docs/api/java/nio/channels/ReadableByteChannel.html
Address writing on a channel. What does a "ready for write" event mean in the selector loop?