Java 8 Streams API: Difference between revisions
Line 233: | Line 233: | ||
=Parallel Streams= | =Parallel Streams= | ||
A ''parallel stream'' is a stream that splits its elements into multiple chunks, processing each chunk with a different thread. This way the workload is automatically partitioned on all cores of a multicore processor | A ''parallel stream'' is a stream that splits its elements into multiple chunks, processing each chunk with a different thread. This way the workload is automatically partitioned on all cores of a multicore processor. | ||
The last call on <tt>parallel()</tt> or <tt>sequential()</tt> affects the pipeline globally, it is not possible to have parts of the pipeline executed in parallel and parts of it executed sequentially. | A parallel stream is produced by a collection by invoking its <tt>parallelStream()</tt> method. An existing stream can be parallelized by invoking <tt>parallel()</tt> on the stream itself. Calling <tt>parallel()</tt> on a sequential stream does not imply any concrete transformation on the stream itself. Internally, a boolean flag is set to signal that the subsequent operations should be run in parallel. A parallel stream can be turned into a sequential stream by invoking <tt>sequential()</tt>. The last call on <tt>parallel()</tt> or <tt>sequential()</tt> affects the pipeline globally, it is not possible to have parts of the pipeline executed in parallel and parts of it executed sequentially. | ||
Parallel stream implementations uses [[Java_7_Fork/Join_Framework#Overview|fork/join framework]] introduced in Java 7, specifically the ForkJoinPool. | Parallel stream implementations uses [[Java_7_Fork/Join_Framework#Overview|fork/join framework]] introduced in Java 7, specifically the ForkJoinPool. |
Revision as of 19:12, 6 April 2018
External
Internal
Overview
The Streams API provides a method to represent and process sequenced data in parallel, transparently taking advantage of multi-core architectures. The Streams API offers a higher level of abstraction than Java Collections, based on the concept of transforming a stream of objects into a stream of different objects, while delegating parallel processing concerns to the runtime. The Streams API obviates the need for explicitly programming Threads and the synchronized mechanism. This represents a shift to focusing on partitioning the data rather than coordinating access to it.
Stream
A stream is a sequence of data items of a specific type, which are conceptually produced one at a time by a source, and that supports sequential and parallel aggregate operations. There may be several intermediate operations, connected into a stream pipeline, and one terminal operation that executes the stream pipeline and produces a non-stream result.
All streams implement java.util.stream.Stream<T>.
Unlike collections, which are data structures for storing and accessing elements with specific time/space complexities, streams are about expressing computations on data. Unlike collections, which store all the values of the data structure in memory, and every element of the collection has to be computed before it can be added to collection, streams' elements are computed on demand. The Streams API uses behavior parameterization and expects code that parameterizes the behavior of its operations to be passed to the API. Collections are mostly about storing and accessing data, whereas the Streams API is mostly about describing computations on data.
A stream can be traversed only once. After it was traversed, a stream is said to be consumed. An attempt to consume it again throws:
java.lang.IllegalStateException: stream has already been operated upon or closed
The Streams API is meant as an alternative way of processing collections, in a declarative manner, by using internal iterations. Unlike in a collection's external iteration case, the loop over elements is managed internally inside the library. The API users provides a function specifying the computation that needs to be done.
Encounter Order
The encounter order specifies the order in which items logically appear in the stream. For example, a stream generated from a List will expose elements in the same order in which they appear in the List.
The fact that a stream has an encounter order or not depends on the source (List) and on the intermediate operations (sorted()). Some operations may render an ordered stream unordered (BaseStream.unordered()). unordered() makes sense because in cases where the stream has an encounter order, but the user does not care about it, explicit de-ordering may improve parallel performance for some stateful or terminal operations. For sequential streams, the presence or absence of an encounter order does not affect performance, only determinism. If a stream is ordered, repeated execution of identical stream pipelines on an identical source will produce an identical result; if it is not ordered, repeated execution might produce different results. For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. Most stream pipelines, however, still parallelize efficiently even under ordering constraints.
Ordered Stream
An ordered stream is a stream that has a defined encounter order.
If a stream is ordered, most operations are constrained to operate on the elements in their encounter order.
Source
Stream data sources are collections, arrays and I/O resources. For more details on how streams are created see Stream Creation.
Data elements generated by an ordered collection will have the same order in the stream.
Stream Pipeline
Stream operations are composed into a stream pipeline to perform a computation.
Stream Operation
Stream operations have two important characteristics:
- Pipelining. Most stream operations return a stream, allowing operations to be chained and form a larger pipeline and enabling certain operations such as laziness and short-circuiting and
- Internal Iteration.
The stream operations that return a stream, and thus can be connected, are called intermediate operations. The stream operations that close the stream and return a non-stream result are called terminal operations.
Ideally, stream operations must be based on functions that don't interact - the encapsulated functionality must not access shared state. For more details see Functional Programming.
Intermediate Operations
A stream operation that returns another stream, and thus can be connected to other stream operations to form a pipeline, is called intermediate operation. Intermediate operations do not consume from streams, their purpose is to serve as a processing element in a pipeline. Intermediate operations do not perform any processing until a terminal operation is invoked on the stream pipeline. It is said that the intermediate operations are lazy.
The idea behind a stream pipeline is similar to the builder pattern.
Filtering Data
Filtering in this context means dropping certain elements based on a criterion.
Transforming Data
Sorting Data
Stream<T> sorted();
This form applies to streams whose elements have a natural order (they implement Comparable). If the elements of this stream are not Comparable, a java.lang.ClassCastException may be thrown when the terminal operation is executed.
Stream<T> sorted(Comparator<? super T> comparator);
Sorting operations are stateful unbouned.
Terminal Operations
A stream operation that consumes and closes the stream and returns a non-stream result is called terminal operation.
Reduction
A reduction operation is an operation through which a stream is reduced to a value
Stream-Level Predicates
These "match" operations apply a predicate to all elements of the stream, subject to short-circuiting, and return a boolean result.
Find Methods
Other Terminal Operations
forEach() consumes each element from a stream and applies a lambda to each of them. The method returns void.
void forEach(Consumer<? super T> action);
Stateful Operations
Stream Creation
Empty Stream
Stream<String> empty = Stream.empty();
From Collections
Java 8 Collections expose a new stream() method that returns a stream.
Numeric Ranges
Specialized primitive interfaces IntStream and LongStream expose static range() and rangeClosed() that return a sequential ordered specialized stream with values in the specified range.
IntStream is = IntStream.range(1, 11);
From Values
The Stream class exposes static of() that builds a Stream instance from one argument or a variable list of arguments:
static <T> Stream<T> of(T t);
static <T> Stream<T> of(T... values);
In the first case we'll get one element stream, and in the second, the stream will has as many elements as arguments. Of course, this API is practical to use for a small number of elements. To create a stream for a larger array, or when we need a stream of primitive types, see From Arrays below.
From Arrays
Arrays.stream() creates a stream from an array.
public static <T> Stream<T> stream(T[] array);
Arrays.stream() is better than Stream.of() when we need a primitive stream, because it will produce a specialized stream instead of autoboxing primitives into Java objects.
From Nullable
Since Java 9.
static <T> Stream<T> ofNullable(T t);
Stream<String> values = Stream.of("config", "home", "user").flatMap(key -> Stream.ofNullable(System.getProperty(key)));
From Files
Many static methods in java.nio.file.Files return a stream.
Note that the associated I/O resources must be closed to avoid leaks, and the streams are auto-closable, so the following pattern may be used:
try(Stream<String> stream = Files.lines(new File(...).toPath())) {
stream.forEach(...);
}
catch(IOException e) {
// ...
}
From Functions
Streams can be created from functions, which may result in infinite streams.
Stream.iterate()
Stream.iterate() should be used when you need to produce a sequence of successive values, where a value depends on the previous one:
static <T> Stream<T> iterate(T seed, UnaryOperator<T> f);
static <T> Stream<T> iterate(T seed, Predicate<? super T> hasNext, UnaryOperator<T> next)
Stream.generate()
static <T> Stream<T> generate(Supplier<? extends T> s)
Short-Circuiting
Some operations do not need to process the whole stream to produce a result. For example, the evaluation of anyMatch may stop at the first element that matches.
Autoboxing and Specialized Interfaces
The Streams API supplies primitive stream specializations that support specialized method to work with primitive types. These interfaces eliminate the need for autoboxing.
These interfaces bring new methods to perform common numeric reductions such as sum() and max(). In addition they have methods to convert back to a stream of objects when necessary. Example:
IntStream intStream = ...;
Stream<Integer> s = intStream.boxed();
Examples of specialized API: mapping and flat-mapping.
Specialized Interface Numeric Ranges
IntStream and LongStream expose static methods that generate numeric ranges: range() and rangeClosed().
Also see Numeric Ranges above.
Parallel Streams
A parallel stream is a stream that splits its elements into multiple chunks, processing each chunk with a different thread. This way the workload is automatically partitioned on all cores of a multicore processor.
A parallel stream is produced by a collection by invoking its parallelStream() method. An existing stream can be parallelized by invoking parallel() on the stream itself. Calling parallel() on a sequential stream does not imply any concrete transformation on the stream itself. Internally, a boolean flag is set to signal that the subsequent operations should be run in parallel. A parallel stream can be turned into a sequential stream by invoking sequential(). The last call on parallel() or sequential() affects the pipeline globally, it is not possible to have parts of the pipeline executed in parallel and parts of it executed sequentially.
Parallel stream implementations uses fork/join framework introduced in Java 7, specifically the ForkJoinPool.
TODO
- Process https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
- Explain how data is partitioned.