Java 8 Streams API: Difference between revisions

From NovaOrdis Knowledge Base

Revision as of 21:00, 29 March 2018

External

Internal

Overview

The Streams API provides a method to represent and process sequenced data in parallel, transparently taking advantage of multi-core architectures. The Streams API offers a higher level of abstraction, based on the concept of transforming a stream of objects into a stream of different objects, while delegating parallel processing concerns to the runtime. The Streams API obviates the need for explicitly programming Threads and the synchronized mechanism. This represents a shift to focusing on partitioning the data rather than coordinating access to it.

Stream

https://docs.oracle.com/javase/10/docs/api/java/util/stream/Stream.html

A stream is a sequence of data items of a specific type, which are conceptually produced one at a time by a source, and that supports sequential and parallel aggregate operations. There may be several intermediate operations, connected into a stream pipeline, and one terminal operation that executes the stream pipeline and produces a non-stream result.

All streams implement java.util.stream.Stream<T>.

Unlike collections, which are data structures for storing and accessing elements with specific time/space complexities, streams are about expressing computations on data. A collection holds all of its values in memory, and every element has to be computed before it can be added to the collection, whereas the elements of a stream are computed on demand. The Streams API uses behavior parameterization: it expects the code that parameterizes the behavior of its operations to be passed to the API.

A stream can be traversed only once. After it has been traversed, a stream is said to be consumed. An attempt to traverse it again throws:

java.lang.IllegalStateException: stream has already been operated upon or closed
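A minimal sketch of the single-traversal rule. The class name, method name and sample data are hypothetical and chosen only for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class SingleUseStream {

    // Returns true if traversing an already consumed stream throws IllegalStateException.
    static boolean secondTraversalFails() {
        List<String> words = Arrays.asList("a", "b", "c");
        Stream<String> stream = words.stream();
        stream.forEach(w -> { });        // first traversal consumes the stream
        try {
            stream.forEach(w -> { });    // second traversal is not allowed
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTraversalFails()); // true
    }
}
```

To traverse the data again, obtain a new stream from the source, for example by calling words.stream() a second time.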

The Streams API is meant as an alternative way of processing collections, in a declarative manner, by using internal iteration. Unlike in the case of a collection's external iteration, the loop over elements is managed internally, inside the library. The API user provides a function specifying the computation that needs to be done.

Encounter Order

The encounter order specifies the order in which items logically appear in the stream. For example, a stream generated from a List will expose elements in the same order in which they appear in the List.

Whether a stream has an encounter order depends on the source (for example, a List) and on the intermediate operations (for example, sorted()). Some operations may render an ordered stream unordered (BaseStream.unordered()). Calling unordered() makes sense when the stream has an encounter order but the user does not care about it, because explicit de-ordering may improve parallel performance for some stateful or terminal operations. For sequential streams, the presence or absence of an encounter order does not affect performance, only determinism. If a stream is ordered, repeated execution of identical stream pipelines on an identical source will produce an identical result; if it is not ordered, repeated execution might produce different results. For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. Most stream pipelines, however, still parallelize efficiently even under ordering constraints.
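The effect of encounter order on a parallel pipeline can be sketched as follows. The class and method names are hypothetical; the example assumes a List source and the Collectors.toList() collector, which preserves the encounter order of an ordered stream:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class EncounterOrder {

    // The source List is ordered, so the result keeps the encounter order
    // even though the pipeline is executed in parallel.
    static List<Integer> doubledInOrder() {
        return Arrays.asList(3, 1, 2).parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubledInOrder()); // [6, 2, 4]
    }
}
```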

Ordered Stream

An ordered stream is a stream that has a defined encounter order.

If a stream is ordered, most operations are constrained to operate on the elements in their encounter order.

Source

Stream data sources are collections, arrays and I/O resources.

Data elements generated by an ordered collection will have the same order in the stream.

Stream Pipeline

Stream operations are composed into a stream pipeline to perform a computation.

Stream Operation

Stream operations must be based on functions that don't interact - the encapsulated functionality must not access shared state. For more details see Functional Programming.

Stream operations have two important characteristics: 1) pipelining: most stream operations return a stream, allowing operations to be chained into a larger pipeline and enabling optimizations such as laziness and short-circuiting, and 2) internal iteration. The stream operations that return a stream, and thus can be connected, are called intermediate operations. The stream operations that consume the stream and return a non-stream result are called terminal operations.

Intermediate Operations

A stream operation that returns a stream, and thus can be connected to other stream operations to form a pipeline, is called an intermediate operation. Intermediate operations do not perform any processing until a terminal operation is invoked on the stream pipeline. For this reason, intermediate operations are said to be lazy.

The idea behind a stream pipeline is similar to the builder pattern.
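The laziness of intermediate operations can be observed with a small sketch. The class name and the invocation counter are hypothetical; the counter is a side effect used only to make the evaluation visible and is not something a production pipeline should rely on:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipeline {

    // Returns { predicate calls before the terminal operation, predicate calls after it }.
    static int[] countInvocations() {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4)
                .filter(n -> {
                    calls.incrementAndGet();
                    return n % 2 == 0;
                });
        int before = calls.get();     // the pipeline is only assembled, nothing has run
        List<Integer> evens = pipeline.collect(Collectors.toList()); // terminal operation
        int after = calls.get();      // the predicate ran once per element
        return new int[] { before, after, evens.size() };
    }

    public static void main(String[] args) {
        int[] result = countInvocations();
        System.out.println(result[0] + " " + result[1] + " " + result[2]); // 0 4 2
    }
}
```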

Filtering Data

Filtering in this context means dropping certain elements based on a criterion.

Filtering Data with Java 8 Streams API

Transforming Data

Transforming Data with Java 8 Streams API

Sorting Data

Stream<T> sorted();

This form applies to streams whose elements have a natural order (they implement Comparable). If the elements of this stream are not Comparable, a java.lang.ClassCastException may be thrown when the terminal operation is executed.

Stream<T> sorted(Comparator<? super T> comparator);

Sorting operations are stateful and unbounded.
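Both forms of sorted() can be sketched as follows. The class name, method names and sample data are hypothetical:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortingExample {

    // Uses the natural order of String, which implements Comparable.
    static List<String> naturalOrder() {
        return Stream.of("pear", "apple", "fig")
                .sorted()
                .collect(Collectors.toList());
    }

    // Uses an explicit Comparator that orders the strings by length.
    static List<String> byLength() {
        return Stream.of("pear", "apple", "fig")
                .sorted(Comparator.comparingInt(String::length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(naturalOrder()); // [apple, fig, pear]
        System.out.println(byLength());     // [fig, pear, apple]
    }
}
```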

Terminal Operations

A stream operation that consumes the stream and returns a non-stream result is called a terminal operation.

Reduction

A reduction operation is an operation through which a stream is reduced to a value. In functional programming, such an operation is referred to as a fold because it can be viewed as repeatedly folding a long piece of paper - the stream - until it forms a small square. A reduction operation takes a sequence of input elements and combines them into a single summary result, either by repeated application of a combining operation, or accumulating elements into a list.

Until the introduction of the Streams API, reduction operations were traditionally implemented in external loops, as a mutative accumulation. It is in general a good idea to prefer a reduction operation over an iterative mutative accumulation: the reduction is "more abstract", it operates on the stream as a whole rather than on individual elements, and a properly constructed reduction is inherently parallelizable, as long as the functions used to process the elements are associative and stateless. Reduction parallelizes well because the implementation can operate on subsets of the data in parallel, and then combine the intermediate results to get the final correct answer. The library can provide an efficient parallel implementation with no additional synchronization required.

A chain of map (transform) and reduce operations is commonly known as the map-reduce pattern. For more on map-reduce parallelism see:

Streams API and Parallelism

Reduction operations are stateful and bounded.

Collection

collect() reduces the stream to create a collection such as List or Map:

public interface Stream<T> ... {

    <R> R collect(Supplier<R> supplier, BiConsumer<R, ? super T> accumulator, BiConsumer<R, R> combiner);

    <R, A> R collect(Collector<? super T, A, R> collector);

}
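The two collect() forms can be sketched as follows. The class name, method names and sample data are hypothetical; the three-argument call follows the supplier/accumulator/combiner shape of the first signature, and the Collector-based calls use the standard Collectors.toList() and Collectors.toMap() factories:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectExample {

    // collect(supplier, accumulator, combiner): builds the result container by hand.
    static List<String> withThreeArguments() {
        return Stream.of("a", "bb", "ccc")
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // collect(collector): delegates the same work to a predefined Collector.
    static List<String> toList() {
        return Stream.of("a", "bb", "ccc").collect(Collectors.toList());
    }

    // A Collector that reduces the stream to a Map from each string to its length.
    static Map<String, Integer> toLengthMap() {
        return Stream.of("a", "bb", "ccc")
                .collect(Collectors.toMap(s -> s, String::length));
    }

    public static void main(String[] args) {
        System.out.println(withThreeArguments()); // [a, bb, ccc]
        System.out.println(toList());             // [a, bb, ccc]
        System.out.println(toLengthMap().get("bb")); // 2
    }
}
```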

Single-Result Reduction

reduce() is the most generic form of a single-result reduction. More specialized versions such as max(), min() and count() are available.

reduce(identity, accumulator)
T reduce(T identity, BinaryOperator<T> accumulator);

In this form, the identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result:

(partial-result, next-element) -> { 
    ... 
    return next-partial-result;
}

The accumulator function must be an associative, non-interfering and stateless function.
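A minimal sketch of this form, summing integers with 0 as the identity element. The class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ReduceWithIdentity {

    // 0 is both the seed of the reduction and the result for an empty stream.
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, (partial, next) -> partial + next);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4)));          // 10
        System.out.println(sum(Collections.<Integer>emptyList()));   // 0
    }
}
```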

reduce(accumulator)
Optional<T> reduce(BinaryOperator<T> accumulator);

In this form, we only need the accumulator function that takes the partial result and the next element and produces a new partial result:

(partial-result, next-element) -> { 
    ... 
    return next-partial-result;
}

The accumulator function must be an associative, non-interfering and stateless function. No identity element is necessary: if the stream has just one element, the accumulator function is not invoked, and if it has more than one element, the accumulator is first invoked with the first two elements. If the stream is empty, the reduction results in an empty Optional.
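A minimal sketch of this form, computing a maximum without an identity element. The class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceWithoutIdentity {

    // Without an identity element, the result is wrapped in an Optional.
    static Optional<Integer> max(List<Integer> numbers) {
        return numbers.stream().reduce((partial, next) -> Math.max(partial, next));
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3, 7, 5)));                       // Optional[7]
        System.out.println(max(Collections.<Integer>emptyList()).isPresent()); // false
    }
}
```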

reduce(identity, accumulator, combiner)
<U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner);

This is the most generic reduction method. The identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result:

(partial-result, next-element) -> { 
    ... 
    return next-partial-result;
}

The accumulator function must be an associative, non-interfering and stateless function. The combiner function combines two partial results to produce a new partial result. It is necessary in parallel reductions, where the input is partitioned, a partial accumulation is computed for each partition, and the partial results are then combined to produce the final result.
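A sketch of the three-argument form, where the result type (Integer) differs from the element type (String). The class and method names are hypothetical, and the stream is made parallel only so that the combiner may actually be used:

```java
import java.util.Arrays;
import java.util.List;

class ReduceWithCombiner {

    // Sums the lengths of all words.
    static int totalLength(List<String> words) {
        return words.parallelStream()
                .reduce(0,
                        (partial, word) -> partial + word.length(), // accumulator: folds one element into a partial result
                        (left, right) -> left + right);             // combiner: merges two partial results
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("a", "bb", "ccc"))); // 6
    }
}
```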

count()

count() returns the number of elements in the stream:

long count();

Stream-Level Predicates

These "match" operations apply a predicate to all elements of the stream, subject to short-circuiting, and return a boolean result.
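The three match operations of Stream (anyMatch(), allMatch() and noneMatch()) can be sketched as follows. The class name, method names and sample data are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

class MatchPredicates {

    // True if at least one element satisfies the predicate.
    static boolean anyNegative(List<Integer> numbers) {
        return numbers.stream().anyMatch(n -> n < 0);
    }

    // True only if every element satisfies the predicate.
    static boolean allPositive(List<Integer> numbers) {
        return numbers.stream().allMatch(n -> n > 0);
    }

    // True only if no element satisfies the predicate.
    static boolean noneZero(List<Integer> numbers) {
        return numbers.stream().noneMatch(n -> n == 0);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, -2, 3);
        System.out.println(anyNegative(numbers)); // true
        System.out.println(allPositive(numbers)); // false
        System.out.println(noneZero(numbers));    // true
    }
}
```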

Stream-Level Predicates

Find Methods

Find Methods

Other Terminal Operations

forEach() consumes each element from a stream and applies a lambda to each of them. The method returns void.

void forEach(Consumer<? super T> action);

Stateful Operations

Stream Creation

Empty Stream

Stream<String> empty = Stream.empty();

From Collections

Java 8 Collections expose a new stream() method that returns a stream.
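A minimal sketch of obtaining a stream from a collection. The class name, method name and sample data are hypothetical:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

class FromCollection {

    // Any Collection can act as a stream source through stream().
    static List<String> upperCased(Collection<String> names) {
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperCased(Arrays.asList("ann", "bob"))); // [ANN, BOB]
    }
}
```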

Numeric Ranges

The specialized primitive interfaces IntStream and LongStream expose the static methods range() and rangeClosed(), which return a sequential ordered specialized stream with the values in the specified range. range() excludes the upper bound, while rangeClosed() includes it.

IntStream is = IntStream.range(1, 11);
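The difference between the two range methods can be sketched as follows. The class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class NumericRanges {

    // range(1, 11) excludes the upper bound and yields 1..10.
    static int[] halfOpen() {
        return IntStream.range(1, 11).toArray();
    }

    // rangeClosed(1, 10) includes the upper bound and also yields 1..10.
    static int[] closed() {
        return IntStream.rangeClosed(1, 10).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(halfOpen()));
        System.out.println(Arrays.equals(halfOpen(), closed())); // true
    }
}
```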

From Values

The Stream class exposes static of() that builds a Stream instance from one argument or a variable list of arguments:

static <T> Stream<T> of(T t);
static <T> Stream<T> of(T... values);

In the first case we get a one-element stream, and in the second, the stream has as many elements as there are arguments. This API is practical only for a small number of elements. To create a stream for a larger array, or when we need a stream of primitive types, see From Arrays below.

From Arrays

Arrays.stream() creates a stream from an array.

https://docs.oracle.com/javase/10/docs/api/java/util/Arrays.html#stream(T%5B%5D)
public static <T> Stream<T> stream(T[] array);

Arrays.stream() is better than Stream.of() when we need a primitive stream, because it will produce a specialized stream instead of autoboxing primitives into Java objects.
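The difference shows up when the source is a primitive array. In the sketch below, with hypothetical class and method names, Stream.of() treats the whole int[] as a single element, while Arrays.stream() produces an IntStream over the individual values:

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ArrayStreams {

    // Stream.of(int[]) yields a Stream<int[]> whose only element is the array itself.
    static long elementsViaStreamOf(int[] numbers) {
        return Stream.of(numbers).count();
    }

    // Arrays.stream(int[]) yields an IntStream, so numeric operations apply directly.
    static int sumViaArraysStream(int[] numbers) {
        return Arrays.stream(numbers).sum();
    }

    public static void main(String[] args) {
        int[] numbers = { 1, 2, 3 };
        System.out.println(elementsViaStreamOf(numbers)); // 1
        System.out.println(sumViaArraysStream(numbers));  // 6
    }
}
```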

From Nullable

Since Java 9.

static <T> Stream<T> ofNullable(T t);
Stream<String> values = Stream.of("config", "home", "user").flatMap(key -> Stream.ofNullable(System.getProperty(key)));

From Files

Many static methods in java.nio.file.Files return a stream.

Note that the associated I/O resources must be closed. Streams are auto-closable, so the try-with-resources pattern may be used:

try (Stream<String> stream = Files.lines(new File(...).toPath())) { ... }
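A runnable sketch of closing a Files.lines() stream with try-with-resources. The class name, method names and the temporary file are hypothetical and exist only to make the example self-contained:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

class FileLines {

    // The stream, and the file handle behind it, is closed when the try block exits.
    static long countLines(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.count();
        }
    }

    // Writes three lines to a temporary file, counts them and removes the file.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("lines", ".txt");
            try {
                Files.write(tmp, Arrays.asList("one", "two", "three"));
                return countLines(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```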

From Functions

Short-Circuiting

Some operations do not need to process the whole stream to produce a result. For example, the evaluation of anyMatch may stop at the first element that matches.
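Short-circuiting can be made visible by counting how many elements a sequential pipeline examines before anyMatch() returns. The class name and the counter are hypothetical; peek() is used here only for observation:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class ShortCircuit {

    // Returns the number of elements examined before anyMatch() found a match.
    static int elementsExamined() {
        AtomicInteger seen = new AtomicInteger();
        boolean found = Stream.of(1, 2, 3, 4, 5)
                .peek(n -> seen.incrementAndGet())  // counts every element that flows through
                .anyMatch(n -> n == 2);             // stops at the first match
        return found ? seen.get() : -1;
    }

    public static void main(String[] args) {
        System.out.println(elementsExamined()); // 2
    }
}
```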

Autoboxing and Specialized Interfaces

The Streams API supplies primitive stream specializations (IntStream, LongStream and DoubleStream) that provide specialized methods for working with primitive types. These interfaces eliminate the need for autoboxing.

These interfaces bring new methods to perform common numeric reductions such as sum() and max(). In addition they have methods to convert back to a stream of objects when necessary. Example:

IntStream intStream = ...;
Stream<Integer> s = intStream.boxed();
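A sketch of moving between object streams and primitive streams. The class name, method names and sample data are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PrimitiveStreams {

    // mapToInt() switches to an IntStream, which offers sum() without boxing.
    static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    // max() on an IntStream returns an OptionalInt, empty for an empty stream.
    static OptionalInt longest(List<String> words) {
        return words.stream().mapToInt(String::length).max();
    }

    // boxed() converts the IntStream back into a Stream<Integer>.
    static List<Integer> boxedRange() {
        return IntStream.rangeClosed(1, 3).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(totalLength(words));          // 6
        System.out.println(longest(words).getAsInt());   // 3
        System.out.println(boxedRange());                // [1, 2, 3]
    }
}
```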

Examples of specialized API: mapping and flat-mapping.

Specialized Interface Numeric Ranges

IntStream and LongStream expose static methods that generate numeric ranges: range() and rangeClosed().

Also see Numeric Ranges above.

Subjects

Collectors

https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html

Loop Fusion

Interesting API Calls

TODO