Java 8 Streams API: Difference between revisions

From NovaOrdis Knowledge Base
 
(41 intermediate revisions by the same user not shown)
Line 6: Line 6:


* [[Java#Java_8|Java]]
* [[Java#Java_8|Java]]
=TODO=
<font color=red>
* Process https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
</font>


=Overview=
=Overview=


The Streams API provides a method to represent and process sequenced data in parallel, transparently taking advantage of multi-core architectures. The Streams API offers a higher level of abstraction, based on the concept of transforming a stream of objects into a stream of different objects, while delegating parallel processing concerns to the runtime. The Streams API obviates the need for explicitly programming Threads and the [[Java synchronized mechanism#Inefficiencies|<tt>synchronized</tt> mechanism]]. This represents a shift to focusing on partitioning the data rather than coordinating access to it.
The Streams API provides a method to represent and process sequenced data in parallel, transparently taking advantage of multi-core architectures. The Streams API offers a higher level of abstraction than Java [[Java Collections|Collections]], based on the concept of transforming a stream of objects into a stream of different objects, while delegating parallel processing concerns to the runtime. The Streams API obviates the need for explicitly programming Threads and the [[Java synchronized mechanism#Inefficiencies|<tt>synchronized</tt> mechanism]]. This represents a shift to focusing on partitioning the data rather than coordinating access to it.


=Stream=
=Stream=
Line 41: Line 49:
=Source=
=Source=


Stream data sources are collections, arrays and I/O resources.
Stream data sources are collections, arrays and I/O resources. For more details on how streams are created see [[#Stream_Creation|Stream Creation]].


Data elements generated by an ordered collection will have the same order in the stream.
Data elements generated by an ordered collection will have the same order in the stream.
Line 48: Line 56:


[[#Stream_Operations|Stream operations]] are composed into a ''stream pipeline'' to perform a computation.
[[#Stream_Operations|Stream operations]] are composed into a ''stream pipeline'' to perform a computation.
=<span id='Empty_Stream'></span><span id='From_Collections'></span><span id='Numeric_Ranges'></span><span id='From_Values'></span><span id='From_Arrays'></span><span id='From_Nullable'></span><span id='From_Files'></span><span id='From_Functions'></span>Stream Creation=
{{Internal|Java 8 Streams API Stream Creation|Stream Creation}}


=Stream Operation=
=Stream Operation=


Stream operations must be based on functions that don't interact - the encapsulated functionality must not access shared state. For more details see [[Functional_Programming#Overview|Functional Programming]].
Stream operations have two important characteristics:
 
# '''Pipelining'''. Most stream operations return a stream, allowing operations to be chained into a larger [[#Stream_Pipeline|pipeline]] and enabling optimizations such as ''laziness'' and ''short-circuiting''.
# '''Internal Iteration'''. The iteration over the elements is managed inside the library rather than by the caller.


Stream operations have two important characteristics: 1) ''pipelining'': most stream operations return a stream, allowing operations to be chained and form a larger [[#Stream_Pipeline|pipeline]] and enabling certain operations such as ''laziness'' and ''short-circuiting'' and 2) ''internal iteration''. The stream operations that return a stream, and thus can be connected, are called [[#Intermediate_Operations|intermediate operations]]. The stream operations that close the stream and return a non-stream result are called [[#Terminal_Operations|terminal operations]].
The stream operations that return a stream, and thus can be connected, are called [[#Intermediate_Operations|intermediate operations]]. The stream operations that close the stream and return a non-stream result are called [[#Terminal_Operations|terminal operations]].
 
Ideally, stream operations should be based on functions that don't interact with each other: the encapsulated functionality must not access shared state. For more details see [[Functional_Programming#Overview|Functional Programming]].


==Intermediate Operations==
==Intermediate Operations==


A stream operation that returns a stream, and thus can be connected to other stream operations to form a [[#Stream_Pipeline|pipeline]], is called an ''intermediate operation''. Intermediate operations do not perform any processing until a [[#Terminal_Operations|terminal operation]] is invoked on the stream pipeline: intermediate operations are said to be lazy.
A stream operation that returns another stream, and thus can be connected to other stream operations to form a [[#Stream_Pipeline|pipeline]], is called an ''intermediate operation''. Intermediate operations do not consume from streams; their purpose is to serve as processing elements in a pipeline. Intermediate operations do not perform any processing until a [[#Terminal_Operations|terminal operation]] is invoked on the stream pipeline: intermediate operations are said to be lazy.


The idea behind a stream pipeline is similar to the [[Builder Pattern#Overview|builder pattern]].
The idea behind a stream pipeline is similar to the [[Builder Pattern#Overview|builder pattern]].
Line 83: Line 100:
==Terminal Operations==
==Terminal Operations==


A stream operation that closes the stream and returns a non-stream result is called a ''terminal operation''.
A stream operation that consumes and closes the stream and returns a non-stream result is called a ''terminal operation''.


===Reduction===
===Reduction===


A ''reduction operation'' is an operation through which a stream is reduced to a value. In functional programming, such an operation is referred to as a ''fold'' because it can be viewed as repeatedly folding a long piece of paper - the stream - until it forms a small square. A reduction operation takes a sequence of input elements and combines them into a [[#Single-Result_Reduction|single summary result]], either by repeated application of a combining operation, or [[#Collection|accumulating elements into a list]].
A ''reduction operation'' is an operation through which a stream is reduced to a value.


The traditional way of implementing reduction operations until the introduction of the Streams API was in external loops, as a mutative accumulation. It is in general a good idea to prefer a reduction operation over an iterative mutative accumulation: the reduction is "more abstract", it operates on the stream as a whole rather than on individual elements, and a properly constructed reduction is inherently parallelizable, as long as the functions used to process the elements are [[Functional_Programming#Associative_Function|associative]] and [[Functional_Programming#Stateless_Function|stateless]]. Reduction parallelizes well because the implementation can operate on subsets of the data in parallel, and then combine the intermediate results to get the final correct answer. The library can provide an efficient parallel implementation with no additional synchronization required.
{{Internal|Java 8 Streams API - Reduction#Overview|Stream Reduction}}
 
A chain of map ([[#Transforming_Data|transform]]) and [[#Reduction|reduce]] operations is commonly known as the map-reduce pattern. For more on map-reduce parallelism see: {{Internal|Parallelism#Streams_API_and_Parallelism|Streams API and Parallelism}}
 
Reduction operations are [[Functional_Programming#Stateful_Bounded|stateful bounded]].
 
====Collection====
 
<tt>collect()</tt> reduces the stream to create a collection such as List or Map:
 
<syntaxhighlight lang='java'>
public interface Stream<T> ... {
 
    <R> R collect(Supplier<R> supplier, BiConsumer<R, ? super T> accumulator,  BiConsumer<R, R> combiner);
 
    <R, A> R collect(Collector<? super T, A, R> collector);
 
}
</syntaxhighlight>
 
====Single-Result Reduction====
 
<tt>reduce()</tt> is the most generic form of a single-result reduction. More specialized versions such as <tt>max()</tt>, <tt>min()</tt> and <tt>[[#count|count()]]</tt> are available.
 
=====reduce(identity, accumulator)=====
 
T [https://docs.oracle.com/javase/10/docs/api/java/util/stream/Stream.html#reduce(T,java.util.function.BinaryOperator) reduce​](T identity, [[Java.util.function.BinaryOperator#Overview|BinaryOperator<T>]] accumulator);
 
In this form, the [[Functional_Programming#Identity_Value|identity element]] is both an initial seed value for the reduction and a default result if there are no input elements. The ''accumulator'' function takes a partial result and the next element, and produces a new partial result:
 
(''partial-result'', ''next-element'') -> {
    ...
    return ''next-partial-result'';
}
 
The accumulator function must be an [[Functional_Programming#Associative_Function|associative]], [[Functional_Programming#Non-Interfering_Function|non-interfering]] and [[Functional_Programming#Stateless_Function|stateless]] function.
 
=====reduce(accumulator)=====
 
Optional<T> [https://docs.oracle.com/javase/10/docs/api/java/util/stream/Stream.html#reduce(java.util.function.BinaryOperator) reduce​]([[Java.util.function.BinaryOperator#Overview|BinaryOperator<T>]] accumulator);
 
In this form, we only need the ''accumulator'' function that takes the partial result and the next element and produces a new partial result:
 
(''partial-result'', ''next-element'') -> {
    ...
    return ''next-partial-result'';
}
 
The accumulator function must be an [[Functional_Programming#Associative_Function|associative]], [[Functional_Programming#Non-Interfering_Function|non-interfering]] and [[Functional_Programming#Stateless_Function|stateless]] function. No [[Functional_Programming#Identity_Value|identity element]] is necessary: if the stream has just one element, the accumulator function is not invoked, and if it has more than one element, the accumulator is first invoked with the first two elements. If the stream is empty, the reduction results in an empty Optional.
 
=====reduce(identity, accumulator, combiner)=====
 
<&#85;> U [https://docs.oracle.com/javase/10/docs/api/java/util/stream/Stream.html#reduce(U,java.util.function.BiFunction,java.util.function.BinaryOperator) reduce​](U identity, [[Java.util.function.BiFunction#Overview|BiFunction<U, ? super T, U>]] accumulator, [[Java.util.function.BinaryOperator#Overview|BinaryOperator<&#85;>]] combiner);
 
This is the most generic reduction method. The [[Functional_Programming#Identity_Value|identity element]] is both an initial seed value for the reduction and a default result if there are no input elements. The ''accumulator'' function takes a partial result and the next element, and produces a new partial result:
 
(''partial-result'', ''next-element'') -> {
    ...
    return ''next-partial-result'';
}
 
The accumulator function must be an [[Functional_Programming#Associative_Function|associative]], [[Functional_Programming#Non-Interfering_Function|non-interfering]] and [[Functional_Programming#Stateless_Function|stateless]] function. The ''combiner'' function combines two partial results to produce a new partial result. It is necessary in parallel reductions, where the input is partitioned, a partial accumulation is computed for each partition, and the partial results are then combined to produce the final result.
 
=====count()=====
 
<span id='count'></span><tt>count()</tt> returns the number of elements in the stream:
 
<syntaxhighlight lang='java'>
long count();
</syntaxhighlight>


===Stream-Level Predicates===
===Stream-Level Predicates===
Line 179: Line 127:


==Stateful Operations==
==Stateful Operations==
=Stream Creation=
=Empty Stream=
<syntaxhighlight lang='java'>
Stream<String> empty = Stream.empty();
</syntaxhighlight>
==From Collections==
Java 8 Collections expose a new <tt>stream()</tt> method that returns a stream.
==Numeric Ranges==
Specialized primitive interfaces IntStream and LongStream expose static <tt>[https://docs.oracle.com/javase/10/docs/api/java/util/stream/IntStream.html#range(int,int) range()]</tt> and <tt>[https://docs.oracle.com/javase/10/docs/api/java/util/stream/IntStream.html#rangeClosed(int,int) rangeClosed()]</tt> that return a sequential ordered specialized stream with values in the specified range.
<syntaxhighlight lang='java'>
IntStream is = IntStream.range(1, 11);
</syntaxhighlight>
==From Values==
The Stream class exposes static <tt>of()</tt> that builds a Stream instance from one argument or a variable list of arguments:
<syntaxhighlight lang='java'>
static <T> Stream<T> of​(T t);
static <T> Stream<T> of​(T... values);
</syntaxhighlight>
In the first case we get a one-element stream, and in the second, the stream has as many elements as there are arguments. This API is practical only for a small number of elements. To create a stream for a larger array, or when we need a stream of primitive types, see [[#From_Arrays|From Arrays]] below.
==From Arrays==
Arrays.stream() creates a stream from an array.
{{External|https://docs.oracle.com/javase/10/docs/api/java/util/Arrays.html#stream(T%5B%5D)}}
<syntaxhighlight lang='java'>
public static <T> Stream<T> stream​(T[] array);
</syntaxhighlight>
<tt>Arrays.stream()</tt> is better than <tt>[[#From_Values|Stream.of()]]</tt> when we need a primitive stream, because it will produce a specialized stream instead of autoboxing primitives into Java objects.
==From Nullable==
Since Java 9.
<syntaxhighlight lang='java'>
static <T> Stream<T> ofNullable​(T t);
</syntaxhighlight>
<syntaxhighlight lang='java'>
Stream<String> values = Stream.of("config", "home", "user").flatMap(key -> Stream.ofNullable(System.getProperty(key)));
</syntaxhighlight>
==From Files==
Many static methods  in <tt>java.nio.file.Files</tt> return a stream.
Note that the associated I/O resources must be closed to avoid leaks, and the streams are [[Java_7_try-with-resources#AutoCloseable|auto-closable]], so the following [[Java 7 try-with-resources#Overview|pattern]] may be used:
<syntaxhighlight lang='java'>
try(Stream<String> stream = Files.lines(new File(...).toPath())) {
    stream.forEach(...);
}
catch(IOException e) {
    // ...
}
</syntaxhighlight>
==From Functions==


=Short-Circuiting=
=Short-Circuiting=
Line 281: Line 155:
Also see [[#Numeric_Ranges|Numeric Ranges]] above.
Also see [[#Numeric_Ranges|Numeric Ranges]] above.


=Subjects=
=Parallel Streams=
 
==Collectors==
 
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html
 
==Loop Fusion==
 
==Interesting API Calls==
 
=TODO=
 
<font color=red>
 
* Process https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
* Explain how data is partitioned.
* Specialized primitive Streams (IntStream, ...)


</font>
{{Internal|Java 8 Streams API - Parallel Streams|Parallel Streams}}

Latest revision as of 19:45, 6 April 2018

External

Internal

TODO

Overview

The Streams API provides a method to represent and process sequenced data in parallel, transparently taking advantage of multi-core architectures. The Streams API offers a higher level of abstraction than Java Collections, based on the concept of transforming a stream of objects into a stream of different objects, while delegating parallel processing concerns to the runtime. The Streams API obviates the need for explicitly programming Threads and the synchronized mechanism. This represents a shift to focusing on partitioning the data rather than coordinating access to it.

Stream

https://docs.oracle.com/javase/10/docs/api/java/util/stream/Stream.html

A stream is a sequence of data items of a specific type, which are conceptually produced one at a time by a source, and that supports sequential and parallel aggregate operations. There may be several intermediate operations, connected into a stream pipeline, and one terminal operation that executes the stream pipeline and produces a non-stream result.

All streams implement java.util.stream.Stream<T>.

Unlike collections, which are data structures for storing and accessing elements with specific time/space complexities, streams are about expressing computations on data. A collection holds all of its values in memory, and every element has to be computed before it can be added to the collection, whereas the elements of a stream are computed on demand. The Streams API uses behavior parameterization: it expects the code that parameterizes the behavior of its operations to be passed to the API.

A stream can be traversed only once. After it has been traversed, a stream is said to be consumed. An attempt to consume it again throws:

java.lang.IllegalStateException: stream has already been operated upon or closed
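
The single-traversal rule can be observed with a short sketch. The class and method names below are illustrative only and do not come from the Streams API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class SingleTraversalExample {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails() {
        List<String> names = Arrays.asList("a", "b", "c");
        Stream<String> stream = names.stream();
        stream.forEach(s -> { });       // first traversal consumes the stream
        try {
            stream.forEach(s -> { });   // second traversal of the same instance
            return false;
        } catch (IllegalStateException e) {
            return true;                // "stream has already been operated upon or closed"
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTraversalFails());
    }
}
```

To traverse the data again, a new stream has to be obtained from the source, for example by calling names.stream() a second time.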

The Streams API is meant as an alternative way of processing collections, in a declarative manner, by using internal iteration. Unlike a collection's external iteration, the loop over the elements is managed inside the library. The API user provides a function specifying the computation that needs to be done.
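
As a rough illustration of the difference, the sketch below computes the same result twice, once with an external loop and once with a stream. The class name, the method names and the length threshold are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class IterationStyles {

    // External iteration: the caller writes the loop and mutates an accumulator.
    static List<String> external(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (w.length() > 3) {
                result.add(w.toUpperCase());
            }
        }
        return result;
    }

    // Internal iteration: the caller only passes the functions, the library runs the loop.
    static List<String> internal(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```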

Encounter Order

The encounter order specifies the order in which items logically appear in the stream. For example, a stream generated from a List will expose elements in the same order in which they appear in the List.

Whether a stream has an encounter order depends on the source (for example, a List) and on the intermediate operations (for example, sorted()). Some operations may render an ordered stream unordered (BaseStream.unordered()). unordered() is useful when the stream has an encounter order but the user does not care about it: explicit de-ordering may improve parallel performance for some stateful or terminal operations.

For sequential streams, the presence or absence of an encounter order does not affect performance, only determinism. If a stream is ordered, repeated execution of identical stream pipelines on an identical source will produce an identical result; if it is not ordered, repeated execution might produce different results. For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. Most stream pipelines, however, still parallelize efficiently even under ordering constraints.
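
A minimal sketch of the two cases, with illustrative class and method names: a parallel stream over a List keeps the encounter order when collected into a list, while a stream marked unordered() only promises the same elements, which is why the second method collects into a Set.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class EncounterOrderExample {

    // The List source defines an encounter order, and collecting to a list
    // preserves it even though the stream is parallel.
    static List<Integer> doubledInOrder(List<Integer> source) {
        return source.parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    // After unordered() the runtime is free to ignore the encounter order,
    // so only the set of resulting elements should be relied upon.
    static Set<Integer> doubledUnordered(List<Integer> source) {
        return source.parallelStream()
                .unordered()
                .map(n -> n * 2)
                .collect(Collectors.toSet());
    }
}
```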

Ordered Stream

An ordered stream is a stream that has a defined encounter order.

If a stream is ordered, most operations are constrained to operate on the elements in their encounter order.

Source

Stream data sources are collections, arrays and I/O resources. For more details on how streams are created see Stream Creation.

Data elements generated by an ordered collection will have the same order in the stream.

Stream Pipeline

Stream operations are composed into a stream pipeline to perform a computation.

Stream Creation

Stream Creation

Stream Operation

Stream operations have two important characteristics:

  1. Pipelining. Most stream operations return a stream, allowing operations to be chained into a larger pipeline and enabling optimizations such as laziness and short-circuiting.
  2. Internal Iteration. The iteration over the elements is managed inside the library rather than by the caller.

The stream operations that return a stream, and thus can be connected, are called intermediate operations. The stream operations that close the stream and return a non-stream result are called terminal operations.

Ideally, stream operations should be based on functions that don't interact with each other: the encapsulated functionality must not access shared state. For more details see Functional Programming.

Intermediate Operations

A stream operation that returns another stream, and thus can be connected to other stream operations to form a pipeline, is called an intermediate operation. Intermediate operations do not consume from streams; their purpose is to serve as processing elements in a pipeline. Intermediate operations do not perform any processing until a terminal operation is invoked on the stream pipeline: intermediate operations are said to be lazy.
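
Laziness can be made visible by counting how often a predicate runs. The sketch below is an illustration only: the names are hypothetical, and the counter is a deliberate side effect that real stream functions should avoid.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazinessExample {

    // Returns { predicate calls before the terminal operation, calls after it }.
    static int[] predicateCalls() {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline only describes the computation.
        Stream<String> pipeline = Stream.of("a", "bb", "ccc")
                .filter(s -> {
                    calls.incrementAndGet();   // demonstration-only side effect
                    return s.length() > 1;
                });
        int before = calls.get();              // nothing has been processed yet

        // The terminal operation triggers the actual processing.
        List<String> kept = pipeline.collect(Collectors.toList());
        int after = calls.get();               // one call per source element

        return new int[] { before, after };
    }
}
```

With the three-element source used here, the predicate is not called at all before collect() and is called three times afterwards.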

The idea behind a stream pipeline is similar to the builder pattern.

Filtering Data

Filtering in this context means dropping certain elements based on a criterion.

Filtering Data with Java 8 Streams API
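
A minimal filtering sketch, assuming a list of integers as the source. The class and method names are illustrative:

```java
import java.util.List;
import java.util.stream.Collectors;

class FilterExample {

    // Keeps the elements that satisfy the predicate and drops the rest.
    static List<Integer> evens(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }
}
```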

Transforming Data

Transforming Data with Java 8 Streams API

Sorting Data

Stream<T> sorted();

This form applies to streams whose elements have a natural order (they implement Comparable). If the elements of this stream are not Comparable, a java.lang.ClassCastException may be thrown when the terminal operation is executed.

Stream<T> sorted(Comparator<? super T> comparator);

Sorting operations are stateful unbounded.
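
The two forms of sorted() can be compared in a short sketch. The class name, method names and sample data are illustrative:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortingExample {

    // Natural order: String implements Comparable.
    static List<String> naturalOrder() {
        return Stream.of("pear", "fig", "banana")
                .sorted()
                .collect(Collectors.toList());   // [banana, fig, pear]
    }

    // Explicit Comparator: order by string length.
    static List<String> byLength() {
        return Stream.of("pear", "fig", "banana")
                .sorted(Comparator.comparingInt(String::length))
                .collect(Collectors.toList());   // [fig, pear, banana]
    }
}
```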

Terminal Operations

A stream operation that consumes and closes the stream and returns a non-stream result is called a terminal operation.

Reduction

A reduction operation is an operation through which a stream is reduced to a value.

Stream Reduction
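
As a small sketch of reducing a stream to a value, the example below uses two forms of reduce(): one with an identity value and one that returns an Optional. The class and method names are illustrative:

```java
import java.util.List;
import java.util.Optional;

class ReductionExample {

    // reduce(identity, accumulator): 0 is both the seed and the result for an empty stream.
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // reduce(accumulator): no identity, so an empty stream yields an empty Optional.
    static Optional<Integer> max(List<Integer> values) {
        return values.stream().reduce(Integer::max);
    }
}
```

Both Integer::sum and Integer::max are associative and stateless, so the same code remains correct if the stream is made parallel.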

Stream-Level Predicates

These "match" operations apply a predicate to all elements of the stream, subject to short-circuiting, and return a boolean result.

Stream-Level Predicates
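
The three match operations can be sketched as follows. The class name, method names and predicates are chosen for illustration only:

```java
import java.util.List;

class MatchExample {

    // True if at least one element is greater than the limit.
    static boolean anyAbove(List<Integer> values, int limit) {
        return values.stream().anyMatch(n -> n > limit);
    }

    // True if every element is even.
    static boolean allEven(List<Integer> values) {
        return values.stream().allMatch(n -> n % 2 == 0);
    }

    // True if no element is negative.
    static boolean noneNegative(List<Integer> values) {
        return values.stream().noneMatch(n -> n < 0);
    }
}
```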

Find Methods

Find Methods

Other Terminal Operations

forEach() consumes each element of a stream and applies a lambda to it. The method returns void.

void forEach(Consumer<? super T> action);
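
A minimal forEach() sketch on a sequential stream, with illustrative names. Because forEach() returns void, any result has to be produced through a side effect, here a local StringBuilder; for parallel streams forEach() does not guarantee the encounter order.

```java
import java.util.List;

class ForEachExample {

    // Appends every element, followed by ';', to a local buffer.
    static String joinWithForEach(List<String> items) {
        StringBuilder sb = new StringBuilder();
        items.stream().forEach(item -> sb.append(item).append(';'));
        return sb.toString();
    }
}
```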

Stateful Operations

Short-Circuiting

Some operations do not need to process the whole stream to produce a result. For example, the evaluation of anyMatch may stop at the first element that matches.
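
Short-circuiting is what makes it possible to run a terminal operation on an infinite stream. The sketch below, with hypothetical names and a demonstration-only counter, counts how many elements of an infinite sequential stream are examined before anyMatch() stops:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class ShortCircuitExample {

    // Returns the number of elements examined before anyMatch() found a match.
    static int elementsExamined() {
        AtomicInteger examined = new AtomicInteger();
        boolean found = Stream.iterate(1, n -> n + 1)      // infinite stream 1, 2, 3, ...
                .peek(n -> examined.incrementAndGet())     // demonstration-only side effect
                .anyMatch(n -> n > 3);                     // stops at the first match, 4
        return found ? examined.get() : -1;
    }
}
```

Only the elements 1 to 4 are examined; the rest of the infinite stream is never generated.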

Autoboxing and Specialized Interfaces

The Streams API supplies primitive stream specializations that provide specialized methods for working with primitive types. These interfaces eliminate the need for autoboxing.

These interfaces bring new methods to perform common numeric reductions such as sum() and max(). In addition they have methods to convert back to a stream of objects when necessary. Example:

IntStream intStream = ...;
Stream<Integer> s = intStream.boxed();

Examples of specialized API: mapping and flat-mapping.
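
A short sketch of moving between object streams and primitive streams, using mapToInt(), sum() and boxed(). The class and method names are illustrative:

```java
import java.util.List;
import java.util.stream.Collectors;

class PrimitiveStreamExample {

    // mapToInt() switches to an IntStream, whose sum() works on unboxed ints.
    static int totalLength(List<String> words) {
        return words.stream()
                .mapToInt(String::length)
                .sum();
    }

    // boxed() converts the IntStream back into a Stream<Integer>.
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                .mapToInt(String::length)
                .boxed()
                .collect(Collectors.toList());
    }
}
```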

Specialized Interface Numeric Ranges

IntStream and LongStream expose static methods that generate numeric ranges: range() and rangeClosed().

Also see Numeric Ranges above.
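
The difference between the two range methods is the upper bound: range() excludes it and rangeClosed() includes it. A minimal sketch with illustrative names:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RangeExample {

    // range(1, 5) produces 1, 2, 3, 4: the upper bound is exclusive.
    static List<Integer> halfOpen() {
        return IntStream.range(1, 5).boxed().collect(Collectors.toList());
    }

    // rangeClosed(1, 5) produces 1, 2, 3, 4, 5: the upper bound is inclusive.
    static List<Integer> closed() {
        return IntStream.rangeClosed(1, 5).boxed().collect(Collectors.toList());
    }
}
```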

Parallel Streams

Parallel Streams
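
As a minimal sketch of handing parallelism over to the runtime, the same reduction can be run sequentially or in parallel by adding a single parallel() call. The class and method names are illustrative, and the example says nothing about the performance of either variant:

```java
import java.util.stream.LongStream;

class ParallelSumExample {

    // Sequential sum of 1..n.
    static long sequentialSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // The same pipeline, with partitioning and combining delegated to the runtime.
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }
}
```

Both methods return the same value because addition is associative, so partial sums can be combined in any grouping.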