Java 8 Streams API - Reduction

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

Internal

Overview

A reduction operation is an operation through which a stream is reduced to a value. In functional programming, such an operation is referred to as a fold because it can be viewed as repeatedly folding a long piece of paper - the stream - until it forms a small square.

A reduction operation takes a sequence of input elements and combines them into a single summary result, either by repeated application of a combining operation, or accumulating elements into a list.

The traditional way of implementing reduction operations until the introduction of the Streams API was in external loops, as a mutative accumulation. It is in general a good idea to use a reduction operation instead of an iterative mutative accumulation: the reduction is "more abstract", it operates on the stream as a whole rather than individual elements, and a properly constructed reduction is inherently parallelizable, as long as the function used to process the elements are associative and stateless. Reduction parallellizes well because the implementation can operate on subsets of the data in parallel, and then combine the intermediate results to get the final correct answer. The library can provide an efficient parallel implementation with no additional synchronization required.

Reduction operations are stateful bounded.

Map/Reduce

A chain of map (transform) and reduce operations is commonly known as the map-reduce pattern. For more on map-reduce parallelism see:

Streams API and Parallelism

Collection

collect() reduces the stream to create a collection such as List or Map. From a generic perspective, a collection is a reduction operation parameterized by a Collector, which in some particular cases may also produces a single value.

public interface Stream<T> ... {

    <R> R collect(Supplier<R> supplier, BiConsumer<R, ? super T> accumulator,  BiConsumer<R, R> combiner);

    <R, A> R collect(Collector<? super T, A, R> collector);

}

java.util.stream.Collector Interface

https://docs.oracle.com/javase/10/docs/api/java/util/stream/Collector.html

A recipe for how to build a summary of the elements of a stream. It can be seen as the final reduction operation.

A mutable reduction operation that accumulates input elements into a mutable result container, optionally transforming the accumulated result into a final representation after all input elements have been processed.

Custom collectors can be written, but the Collectors interface contains several static factory methods to conveniently create instances of the most common collectors that are ready to use.

java.util.stream.Collectors Interface

https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html

This interface defines recipes for accumulating elements of the stream into a summary result. The Collectors interface contains several static factory methods to conveniently create instances of the most common collectors that are ready to use. Do not mistake with Collection, Collector and collect().

The collectors produced by Collectors' static methods offer there main functionalities:

Reduce and Summarize Stream Elements into a Single Value

Collectors.counting()

Collectors.counting()

Returns a Collector that counts the number of input elements. If no elements are present, the result is 0. The functionality is equivalent with Stream.count().

Minimum and Maximum

Collectors.minBy()
Collectors.maxBy()

Summation and Averaging and other Statistics

Collectors.summingInt() 
Collectors.summingLong()
Collectors.summingDouble()
Collectors.averagingInt() 
Collectors.averagingLong()
Collectors.averagingDouble()
Collectors.summarizingInt() 
Collectors.summarizingLong()
Collectors.summarizingDouble()

Join Strings

Various versions of Collectors.joining(...) concatenate the results of invoking toString() on stream elements.

Reduction

Collectors.reducing(...) offers the possibility to apply a generic reduction method. Stream.reduce() is also available.

Group Elements

Partition Elements

Partitioning is a special case of grouping, using a predicate.

Single-Result Reduction

reduce() is the most generic form of a single-result reduction. More specialized versions such as max(), min() and count() are available.

reduce(identity, accumulator)

T reduce​(T identity, BinaryOperator<T> accumulator);

In this form, the identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result:

(partial-result, next-element) -> { 
    ... 
    return next-partial-result;
}

The accumulator function must be an associative, non-interfering and stateless function.

reduce(accumulator)

Optional<T> reduce​(BinaryOperator<T> accumulator);

In this form, we only need the accumulator function that takes the partial result and the next element and produces a new partial result:

(partial-result, next-element) -> { 
    ... 
    return next-partial-result;
}

The accumulator function must be an associative, non-interfering and stateless function. No identity element is necessary: if the stream has just one element, the accumulator function is not invoked, and if has more than one element, the accumulator will be first invoked with the first two elements. If the stream is empty, the reduction will result into an empty Optional.

reduce(identity, accumulator, combiner)

<U> U reduce​(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner);

This is the most generic reduction method. The identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result:

(partial-result, next-element) -> { 
    ... 
    return next-partial-result;
}

The accumulator function must be an associative, non-interfering and stateless function. The combiner function combines two partial results to product a new partial result, and it is necessary in parallel reductions, where the input is partitioned, a partial accumulation computed for each partition, and then the partial results are combined to produce a final result.

count()

count() returns the number of elements in the stream:

long count();

The functionality is equivalent with Collectors.counting().