Java 8 Streams API: Difference between revisions
Revision as of 21:26, 29 March 2018
External
Internal
Overview
The Streams API provides a method to represent and process sequenced data in parallel, transparently taking advantage of multi-core architectures. The Streams API offers a higher level of abstraction, based on the concept of transforming a stream of objects into a stream of different objects, while delegating parallel processing concerns to the runtime. The Streams API obviates the need for explicitly programming Threads and the synchronized mechanism. This represents a shift to focusing on partitioning the data rather than coordinating access to it.
Stream
A stream is a sequence of data items of a specific type, which are conceptually produced one at a time by a source, and that supports sequential and parallel aggregate operations. There may be several intermediate operations, connected into a stream pipeline, and one terminal operation that executes the stream pipeline and produces a non-stream result.
All streams implement java.util.stream.Stream<T>.
Unlike collections, which are data structures for storing and accessing elements with specific time/space complexities, streams are about expressing computations on data. A collection stores all of its values in memory, and every element has to be computed before it can be added to the collection; a stream's elements are computed on demand. The Streams API uses behavior parameterization: it expects the code that parameterizes the behavior of its operations to be passed to the API. In short, collections are mostly about storing and accessing data, whereas the Streams API is mostly about describing computations on data.
A stream can be traversed only once. After it has been traversed, a stream is said to be consumed. An attempt to consume it again throws:
java.lang.IllegalStateException: stream has already been operated upon or closed
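The single-traversal rule can be observed with a small sketch. The class name StreamReuseDemo, the method name and the sample data are illustrative assumptions, not part of the API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamReuseDemo {

    // Returns true if a second traversal of the same stream instance is rejected.
    static boolean secondTraversalFails() {
        List<String> names = Arrays.asList("a", "b", "c"); // hypothetical sample data
        Stream<String> stream = names.stream();
        stream.forEach(s -> { });     // first traversal consumes the stream
        try {
            stream.forEach(s -> { }); // second traversal of the same instance
            return false;
        } catch (IllegalStateException e) {
            return true;              // "stream has already been operated upon or closed"
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTraversalFails());
    }
}
```

To process the same data again, obtain a fresh stream from the source (names.stream() in this sketch).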
The Streams API is meant as an alternative, declarative way of processing collections, based on internal iteration. Unlike the external iteration used with collections, the loop over elements is managed inside the library. The API user provides a function specifying the computation that needs to be done.
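As an illustration of the difference, the following sketch computes the same result with an external loop and with a stream pipeline. The class and method names are hypothetical, chosen only for this example:

```java
import java.util.Arrays;
import java.util.List;

class IterationStyles {

    // External iteration: the caller writes the loop and mutates an accumulator.
    static int sumOfEvenSquaresExternal(List<Integer> numbers) {
        int sum = 0;
        for (int n : numbers) {
            if (n % 2 == 0) {
                sum += n * n;
            }
        }
        return sum;
    }

    // Internal iteration: the caller only describes what to compute;
    // the library manages the loop.
    static int sumOfEvenSquaresInternal(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .mapToInt(n -> n * n)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfEvenSquaresExternal(numbers));
        System.out.println(sumOfEvenSquaresInternal(numbers));
    }
}
```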
Encounter Order
The encounter order specifies the order in which items logically appear in the stream. For example, a stream generated from a List will expose elements in the same order in which they appear in the List.
Whether a stream has an encounter order depends on the source (for example, a List) and on the intermediate operations (for example, sorted()). Some operations may render an ordered stream unordered (BaseStream.unordered()). unordered() makes sense when the stream has an encounter order but the user does not care about it: explicit de-ordering may improve parallel performance for some stateful or terminal operations. For sequential streams, the presence or absence of an encounter order does not affect performance, only determinism. If a stream is ordered, repeated execution of identical stream pipelines on an identical source will produce an identical result; if it is not ordered, repeated execution might produce different results. For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. Most stream pipelines, however, still parallelize efficiently even under ordering constraints.
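A minimal sketch of the two cases, assuming a List source processed in parallel. The class and method names are made up for the example. Collecting an ordered parallel stream into a List keeps the encounter order, while the second method declares that order does not matter and collects into a Set:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class EncounterOrderDemo {

    // The List source is ordered, so the result follows the encounter order
    // even though the pipeline runs in parallel.
    static List<Integer> doubledInOrder(List<Integer> source) {
        return source.parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    // unordered() relaxes the ordering constraint; a Set result does not
    // depend on encounter order anyway.
    static Set<Integer> doubledUnordered(List<Integer> source) {
        return source.parallelStream()
                .unordered()
                .map(n -> n * 2)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(3, 1, 2);
        System.out.println(doubledInOrder(source));
        System.out.println(doubledUnordered(source));
    }
}
```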
Ordered Stream
An ordered stream is a stream that has a defined encounter order.
If a stream is ordered, most operations are constrained to operate on the elements in their encounter order.
Source
Stream data sources are collections, arrays and I/O resources.
Data elements generated by an ordered collection will have the same order in the stream.
Stream Pipeline
Stream operations are composed into a stream pipeline to perform a computation.
Stream Operation
Stream operations must be based on functions that don't interact - the encapsulated functionality must not access shared state. For more details see Functional Programming.
Stream operations have two important characteristics: 1) pipelining: most stream operations return a stream, which allows operations to be chained into a larger pipeline and enables optimizations such as laziness and short-circuiting, and 2) internal iteration. The stream operations that return a stream, and thus can be connected, are called intermediate operations. The stream operations that close the stream and return a non-stream result are called terminal operations.
Intermediate Operations
A stream operation that returns a stream, and thus can be connected to other stream operations to form a pipeline, is called an intermediate operation. Intermediate operations do not perform any processing until a terminal operation is invoked on the stream pipeline. For this reason, intermediate operations are said to be lazy.
The idea behind a stream pipeline is similar to the builder pattern.
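The laziness of intermediate operations can be made visible by counting how often a mapping function runs. This is only a sketch; the class name, the method name and the counter are assumptions introduced for the demonstration:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazinessDemo {

    // Returns the number of map() invocations before and after the terminal operation.
    static int[] countInvocations() {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline does not run the mapping function.
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .map(n -> { calls.incrementAndGet(); return n * 10; });
        int before = calls.get();

        // The terminal operation triggers the actual processing.
        pipeline.collect(Collectors.toList());
        int after = calls.get();

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = countInvocations();
        System.out.println(counts[0] + " -> " + counts[1]);
    }
}
```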
Filtering Data
Filtering in this context means dropping certain elements based on a criterion.
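A short sketch of filtering, using filter(), distinct() and limit() from the Stream interface. The class name, method name and sample values are illustrative only:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilteringDemo {

    // Keeps even numbers, drops duplicates, and truncates to the first two results.
    static List<Integer> firstTwoDistinctEvens(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0) // drop elements that fail the predicate
                .distinct()              // drop duplicates (based on equals())
                .limit(2)                // keep at most two elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstTwoDistinctEvens(Arrays.asList(1, 2, 2, 4, 6, 7)));
    }
}
```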
Transforming Data
Sorting Data
Stream<T> sorted();
This form applies to streams whose elements have a natural order (they implement Comparable). If the elements of this stream are not Comparable, a java.lang.ClassCastException may be thrown when the terminal operation is executed.
Stream<T> sorted(Comparator<? super T> comparator);
Sorting operations are stateful unbounded.
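Both forms of sorted() can be sketched as follows. The class and method names are hypothetical, and Comparator.comparingInt() is used here as one possible way to build the comparator:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortingDemo {

    // Natural order: String implements Comparable.
    static List<String> naturalOrder(List<String> words) {
        return words.stream()
                .sorted()
                .collect(Collectors.toList());
    }

    // Explicit comparator: shortest words first.
    static List<String> byLength(List<String> words) {
        return words.stream()
                .sorted(Comparator.comparingInt(String::length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana");
        System.out.println(naturalOrder(words));
        System.out.println(byLength(words));
    }
}
```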
Terminal Operations
A stream operation that closes the stream and returns a non-stream result is called a terminal operation.
Reduction
A reduction operation is an operation through which a stream is reduced to a value. In functional programming, such an operation is referred to as a fold because it can be viewed as repeatedly folding a long piece of paper - the stream - until it forms a small square. A reduction operation takes a sequence of input elements and combines them into a single summary result, either by repeated application of a combining operation, or accumulating elements into a list.
The traditional way of implementing reduction operations until the introduction of the Streams API was in external loops, as a mutative accumulation. It is in general a good idea to prefer a reduction operation over an iterative mutative accumulation: the reduction is "more abstract", it operates on the stream as a whole rather than on individual elements, and a properly constructed reduction is inherently parallelizable, as long as the function used to process the elements is associative and stateless. Reduction parallelizes well because the implementation can operate on subsets of the data in parallel, and then combine the intermediate results to get the final correct answer. The library can provide an efficient parallel implementation with no additional synchronization required.
A chain of map (transform) and reduce operations is commonly known as the map-reduce pattern. For more on map-reduce parallelism see:
Reduction operations are stateful bounded.
Collection
collect() reduces the stream to create a collection such as List or Map:
public interface Stream<T> ... {
<R> R collect(Supplier<R> supplier, BiConsumer<R, ? super T> accumulator, BiConsumer<R, R> combiner);
<R, A> R collect(Collector<? super T, A, R> collector);
}
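The two collect() forms can be sketched as below. The class and method names are assumptions made for the example; Collectors.groupingBy() stands in for any predefined collector. The three-argument form is written with lambdas and an explicit type argument to keep type inference simple:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectDemo {

    // Three-argument form: supplier creates the container, accumulator adds one
    // element, combiner merges two partial containers (used in parallel runs).
    static List<String> toListExplicit(Stream<String> words) {
        return words.<List<String>>collect(
                () -> new ArrayList<>(),
                (list, word) -> list.add(word),
                (left, right) -> left.addAll(right));
    }

    // Collector form: a predefined collector groups the words by their length.
    static Map<Integer, List<String>> groupByLength(Stream<String> words) {
        return words.collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        System.out.println(toListExplicit(Stream.of("a", "bb", "cc")));
        System.out.println(groupByLength(Stream.of("a", "bb", "cc")));
    }
}
```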
Single-Result Reduction
reduce() is the most generic form of a single-result reduction. More specialized versions such as max(), min() and count() are available.
reduce(identity, accumulator)
T reduce(T identity, BinaryOperator<T> accumulator);
In this form, the identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result:
(partial-result, next-element) -> { ... return next-partial-result; }
The accumulator function must be an associative, non-interfering and stateless function.
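A minimal sketch of this form, summing integers with 0 as the identity. The class and method names are illustrative:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ReduceWithIdentityDemo {

    // 0 is the identity for addition: it seeds the reduction and is also
    // the result for an empty stream.
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, (partial, next) -> partial + next);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4)));
        System.out.println(sum(Collections.<Integer>emptyList()));
    }
}
```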
reduce(accumulator)
Optional<T> reduce(BinaryOperator<T> accumulator);
In this form, we only need the accumulator function that takes the partial result and the next element and produces a new partial result:
(partial-result, next-element) -> { ... return next-partial-result; }
The accumulator function must be an associative, non-interfering and stateless function. No identity element is necessary: if the stream has just one element, the accumulator function is not invoked, and if it has more than one element, the accumulator is first invoked with the first two elements. If the stream is empty, the reduction results in an empty Optional.
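The identity-less form might be used as follows to find a maximum. The class and method names are hypothetical; Integer::max serves as the accumulator:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceOptionalDemo {

    // Without an identity there is no default value, so the result is an Optional.
    static Optional<Integer> max(List<Integer> numbers) {
        return numbers.stream().reduce(Integer::max);
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3, 9, 4)));
        System.out.println(max(Collections.<Integer>emptyList()));
    }
}
```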
reduce(identity, accumulator, combiner)
<U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner);
This is the most generic reduction method. The identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result:
(partial-result, next-element) -> { ... return next-partial-result; }
The accumulator function must be an associative, non-interfering and stateless function. The combiner function combines two partial results to produce a new partial result. It is necessary in parallel reductions, where the input is partitioned, a partial accumulation is computed for each partition, and the partial results are then combined to produce the final result.
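A sketch of the three-argument form, where the result type (Integer) differs from the element type (String). The class and method names are assumptions for the example, and a parallel stream is used so that the combiner may actually be invoked:

```java
import java.util.Arrays;
import java.util.List;

class ReduceWithCombinerDemo {

    // accumulator: adds the length of the next word to a partial total
    // combiner:    adds two partial totals computed on different partitions
    static int totalLength(List<String> words) {
        return words.parallelStream()
                .reduce(0,
                        (partial, word) -> partial + word.length(),
                        Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("stream", "api", "java")));
    }
}
```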
count()
count() returns the number of elements in the stream:
long count();
Stream-Level Predicates
These "match" operations apply a predicate to all elements of the stream, subject to short-circuiting, and return a boolean result.
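The match operations anyMatch(), allMatch() and noneMatch() can be sketched as follows. The class and method names are illustrative:

```java
import java.util.Arrays;
import java.util.List;

class MatchDemo {

    // True as soon as one element satisfies the predicate.
    static boolean anyNegative(List<Integer> numbers) {
        return numbers.stream().anyMatch(n -> n < 0);
    }

    // False as soon as one element fails the predicate.
    static boolean allPositive(List<Integer> numbers) {
        return numbers.stream().allMatch(n -> n > 0);
    }

    // False as soon as one element satisfies the predicate.
    static boolean noneZero(List<Integer> numbers) {
        return numbers.stream().noneMatch(n -> n == 0);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        System.out.println(anyNegative(numbers));
        System.out.println(allPositive(numbers));
        System.out.println(noneZero(numbers));
    }
}
```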
Find Methods
Other Terminal Operations
forEach() consumes each element from a stream and applies a lambda to each of them. The method returns void.
void forEach(Consumer<? super T> action);
Stateful Operations
Stream Creation
Empty Stream
Stream<String> empty = Stream.empty();
From Collections
Java 8 Collections expose a new stream() method that returns a stream.
Numeric Ranges
Specialized primitive interfaces IntStream and LongStream expose static range() and rangeClosed() that return a sequential ordered specialized stream with values in the specified range.
IntStream is = IntStream.range(1, 11);
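The difference between the two methods is the upper bound: range() excludes it, rangeClosed() includes it. A small sketch, with hypothetical class and method names:

```java
import java.util.stream.IntStream;

class RangeDemo {

    // range(1, 11) produces 1..10: the end value is exclusive.
    static int sumWithRange() {
        return IntStream.range(1, 11).sum();
    }

    // rangeClosed(1, 10) produces 1..10: the end value is inclusive.
    static int sumWithRangeClosed() {
        return IntStream.rangeClosed(1, 10).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumWithRange());
        System.out.println(sumWithRangeClosed());
    }
}
```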
From Values
The Stream class exposes static of() that builds a Stream instance from one argument or a variable list of arguments:
static <T> Stream<T> of(T t);
static <T> Stream<T> of(T... values);
In the first case we get a one-element stream, and in the second, the stream will have as many elements as there are arguments. This API is practical only for a small number of elements. To create a stream for a larger array, or when we need a stream of primitive types, see From Arrays below.
From Arrays
Arrays.stream() creates a stream from an array.
public static <T> Stream<T> stream(T[] array);
Arrays.stream() is better than Stream.of() when we need a primitive stream, because it will produce a specialized stream instead of autoboxing primitives into Java objects.
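The following sketch contrasts the two calls on an int[]. The class and method names are made up for the example. Stream.of() treats the whole array as a single element, while Arrays.stream() yields an IntStream over the values:

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ArraysStreamDemo {

    // Stream.of(int[]) yields a Stream<int[]> whose only element is the array itself.
    static long elementCountViaStreamOf(int[] values) {
        return Stream.of(values).count();
    }

    // Arrays.stream(int[]) yields an IntStream over the individual values, without boxing.
    static int sumViaArraysStream(int[] values) {
        return Arrays.stream(values).sum();
    }

    public static void main(String[] args) {
        int[] values = { 1, 2, 3 };
        System.out.println(elementCountViaStreamOf(values));
        System.out.println(sumViaArraysStream(values));
    }
}
```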
From Nullable
Since Java 9.
static <T> Stream<T> ofNullable(T t);
Stream<String> values = Stream.of("config", "home", "user").flatMap(key -> Stream.ofNullable(System.getProperty(key)));
From Files
Many static methods in java.nio.file.Files return a stream.
Note that the associated I/O resources must be closed to avoid leaks, and the streams are auto-closable, so the following pattern may be used:
try (Stream<String> stream = Files.lines(new File(...).toPath())) {
    stream.forEach(...);
}
catch (IOException e) {
    // ...
}
From Functions
Streams can be created from functions, which may result in infinite streams.
Stream.iterate()
static <T> Stream<T> iterate(T seed, UnaryOperator<T> f);
static <T> Stream<T> iterate(T seed, Predicate<? super T> hasNext, UnaryOperator<T> next)
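A sketch of both forms, generating powers of two. The class and method names are hypothetical. The two-argument form produces an infinite stream, so it is truncated with limit(); the three-argument form requires Java 9 or later and stops when the predicate fails:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IterateDemo {

    // Infinite stream 1, 2, 4, 8, ... truncated to its first five elements.
    static List<Integer> powersOfTwoLimited() {
        return Stream.iterate(1, n -> n * 2)
                .limit(5)
                .collect(Collectors.toList());
    }

    // Java 9+: the hasNext predicate ends the stream once a value reaches 20.
    static List<Integer> powersOfTwoBounded() {
        return Stream.iterate(1, n -> n < 20, n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwoLimited());
        System.out.println(powersOfTwoBounded());
    }
}
```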
Stream.generate()
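Stream.generate() builds an infinite, unordered stream by repeatedly calling a supplier. A minimal sketch follows; the class name, method name and the constant supplier are illustrative, and limit() is needed to make the stream finite:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GenerateDemo {

    // Each element is obtained by invoking the supplier; limit() truncates the stream.
    static List<String> threeCopies(String value) {
        return Stream.generate(() -> value)
                .limit(3)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(threeCopies("x"));
    }
}
```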
Short-Circuiting
Some operations do not need to process the whole stream to produce a result. For example, the evaluation of anyMatch may stop at the first element that matches.
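Short-circuiting is what allows a terminal operation to finish on an infinite stream. The sketch below counts how many elements are examined before anyMatch() stops. The class name, method name and the peek() counter are assumptions introduced for the demonstration:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class ShortCircuitDemo {

    // Counts the elements pulled from an infinite sequential stream
    // before anyMatch() finds the first value greater than 100.
    static int elementsExaminedUntilMatch() {
        AtomicInteger examined = new AtomicInteger();
        boolean found = Stream.iterate(1, n -> n + 1)      // 1, 2, 3, ... without end
                .peek(n -> examined.incrementAndGet())
                .anyMatch(n -> n > 100);                   // stops at the first match
        return found ? examined.get() : -1;
    }

    public static void main(String[] args) {
        System.out.println(elementsExaminedUntilMatch());
    }
}
```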
Autoboxing and Specialized Interfaces
The Streams API supplies primitive stream specializations that provide specialized methods for working with primitive types. These interfaces eliminate the need for autoboxing.
These interfaces bring new methods to perform common numeric reductions such as sum() and max(). In addition they have methods to convert back to a stream of objects when necessary. Example:
IntStream intStream = ...;
Stream<Integer> s = intStream.boxed();
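Conversions in both directions can be sketched as follows. The class and method names are hypothetical. mapToInt() turns a Stream<Integer> into an IntStream, which offers sum(), and boxed() goes back to a stream of objects:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class BoxingDemo {

    // Object stream -> primitive stream: sum() is available on IntStream.
    static int sum(List<Integer> numbers) {
        return numbers.stream()
                .mapToInt(Integer::intValue)
                .sum();
    }

    // Primitive stream -> object stream: boxed() wraps each int into an Integer.
    static List<Integer> firstN(int n) {
        return IntStream.rangeClosed(1, n)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3)));
        System.out.println(firstN(3));
    }
}
```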
Examples of specialized API: mapping and flat-mapping.
Specialized Interface Numeric Ranges
IntStream and LongStream expose static methods that generate numeric ranges: range() and rangeClosed().
Also see Numeric Ranges above.
Subjects
Collectors
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html
Loop Fusion
Interesting API Calls
TODO
- Process https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
- Explain how data is partitioned.
- Specialized primitive Streams (IntStream, ...)