Java 8 Streams API - Reduction

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

Internal

Overview

A reduction operation is an operation through which a stream is reduced to a value. In functional programming, such an operation is referred to as a fold because it can be viewed as repeatedly folding a long piece of paper - the stream - until it forms a small square.

A reduction operation takes a sequence of input elements and combines them into a single summary result, either by repeated application of a combining operation, or accumulating elements into a list.

The traditional way of implementing reduction operations until the introduction of the Streams API was in external loops, as a mutative accumulation. It is in general a good idea to use a reduction operation instead of an iterative mutative accumulation: the reduction is "more abstract", it operates on the stream as a whole rather than individual elements, and a properly constructed reduction is inherently parallelizable, as long as the function used to process the elements are associative and stateless. Reduction parallellizes well because the implementation can operate on subsets of the data in parallel, and then combine the intermediate results to get the final correct answer. The library can provide an efficient parallel implementation with no additional synchronization required.

Reduction operations are stateful bounded.

Map/Reduce

A chain of map (transform) and reduce operations is commonly known as the map-reduce pattern. For more on map-reduce parallelism see:

Streams API and Parallelism

Semantic Difference between Collection and Reduction

A reduction is meant to combine two values and produce a new one. From this perspective, it is immutable. Reduction is meant to use in conjunction with parallelization.

A collection is designed to mutate a container to accumulate results. While theoretically you can use Stream.reduce() to collect stream elements, it is not semantically correctly to do it.

Single-Result Reduction

reduce() is the most generic form of a single-result reduction. More specialized versions such as max(), min() and count() are available.

reduce(identity, accumulator)

T reduce​(T identity, BinaryOperator<T> accumulator);

In this form, the identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result:

(partial-result, next-element) -> { 
    ... 
    return next-partial-result;
}

The accumulator function must be an associative, non-interfering and stateless function.

reduce(accumulator)

Optional<T> reduce​(BinaryOperator<T> accumulator);

In this form, we only need the accumulator function that takes the partial result and the next element and produces a new partial result:

(partial-result, next-element) -> { 
    ... 
    return next-partial-result;
}

The accumulator function must be an associative, non-interfering and stateless function. No identity element is necessary: if the stream has just one element, the accumulator function is not invoked, and if has more than one element, the accumulator will be first invoked with the first two elements. If the stream is empty, the reduction will result into an empty Optional.

reduce(identity, accumulator, combiner)

<U> U reduce​(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner);

This is the most generic reduction method. The identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result:

(partial-result, next-element) -> { 
    ... 
    return next-partial-result;
}

The accumulator function must be an associative, non-interfering and stateless function. The combiner function combines two partial results to product a new partial result, and it is necessary in parallel reductions, where the input is partitioned, a partial accumulation computed for each partition, and then the partial results are combined to produce a final result.

count()

count() returns the number of elements in the stream:

long count();

The functionality is equivalent with Collectors.counting().


Collection

collect() reduces the stream to create a collection such as List or Map. From a generic perspective, a collection is a reduction operation parameterized by a Collector, which in some particular cases may also produces a single value.

public interface Stream<T> ... {

    <R> R collect(Supplier<R> supplier, BiConsumer<R, ? super T> accumulator,  BiConsumer<R, R> combiner);

    <R, A> R collect(Collector<? super T, A, R> collector);

}

java.util.stream.Collector Interface

https://docs.oracle.com/javase/10/docs/api/java/util/stream/Collector.html

A recipe for how to build a summary of the elements of a stream, as part of the final reduction operation.

A mutable reduction operation that accumulates input elements into a mutable result container, optionally transforming the accumulated result into a final representation after all input elements have been processed.

Custom collectors can be written, but the Collectors interface contains several static factory methods to conveniently create instances of the most common collectors that are ready to use.

More details page 163 Java 8 & 9 in Action.

java.util.stream.Collectors Interface

https://docs.oracle.com/javase/10/docs/api/java/util/stream/Collectors.html

This interface defines recipes for accumulating elements of the stream into a summary result. The Collectors interface contains several static factory methods to conveniently create instances of the most common collectors that are ready to use. Do not mistake with Collection, Collector and collect().

The collectors produced by Collectors' static methods offer there main functionalities:

Reduce and Summarize Stream Elements into a Single Value

Collectors.counting()

Collectors.counting()

Returns a Collector that counts the number of input elements. If no elements are present, the result is 0. The functionality is equivalent with Stream.count().

Minimum and Maximum

Collectors.minBy()
Collectors.maxBy()

Summation and Averaging and other Statistics

Collectors.summingInt() 
Collectors.summingLong()
Collectors.summingDouble()
Collectors.averagingInt() 
Collectors.averagingLong()
Collectors.averagingDouble()
Collectors.summarizingInt() 
Collectors.summarizingLong()
Collectors.summarizingDouble()

Join Strings

Various versions of Collectors.joining(...) concatenate the results of invoking toString() on stream elements.

Reduction

Collectors.reducing(...) offers the possibility to apply a generic reduction method. Stream.reduce() is also available.

Group Elements

Grouping only:

public static <T, K> Collector<T, ?, Map<K,List<T>>> groupingBy(Function<? super T, ? extends K> classifier)

Returns a Collector implementing a "group by" operation on stream elements of type T, grouping elements according to a classification function. We call it classification function because it is u sed to classify the elements of the stream into different groups. The result is a Map having as map key the values returned by the classification function, and as corresponding map value a list of all the items in the stream having that classified value.

public static <T, K, A, D> Collector<T, ?, Map<K, D>> groupingBy(Function<? super T, ? extends K> classifier,
                                                      Collector<? super T, A, D> downstream)
public static <T, K, D, A, M extends Map<K, D>> Collector<T, ?, M> groupingBy (Function<? super T, ? extends K> classifier,
                                                                   Supplier<M> mapFactory,
                                                                   Collector<? super T, A, D> downstream)

Usage examples "Java 8 & 9 in Action", page 150 6.3.1 Manipulating grouped elements.

Multilevel grouping "Java 8 & 9 in Action", page 152 6.3.2 Multilevel grouping.

"Java 8 & 9 in Action", page 153 6.3.3 Collecting data in subgroups.

Partition Elements

Partitioning is a special case of grouping, using a predicate. The predicate is called partitioning function and it is used as classification function. The result will be a Map with two keys (true and false), and will contain exactly two groups of data.