Java 8 Streams API - Parallel Streams

From NovaOrdis Knowledge Base
Revision as of 20:47, 6 April 2018 by Ovidiu (talk | contribs) (→‎Overview)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Internal

Java 8 Streams API

TODO

Explain how data is partitioned.

Overview

A parallel stream is a stream that splits its elements into multiple chunks, processing each chunk with a different thread. This way the workload is automatically partitioned on all cores of a multicore processor.

A parallel stream is produced by a collection by invoking its parallelStream() method. An existing stream can be parallelized by invoking parallel() on the stream itself. Calling parallel() on a sequential stream does not imply any concrete transformation on the stream itself. Internally, a boolean flag is set to signal that the subsequent operations should be run in parallel. A parallel stream can be turned into a sequential stream by invoking sequential(). The last call on parallel() or sequential() affects the pipeline globally, it is not possible to have parts of the pipeline executed in parallel and parts of it executed sequentially.

Parallel stream implementations uses fork/join framework introduced in Java 7, specifically the ForkJoinPool.

Some stream operations are more parallelizable than others. For example, a Stream.iterate() operation cannot be split in chunks that can be executed independently because the input of a function invocation is dependent on the result of the previous invocation. LongStream.rangeClosed(), on the other hand, produces ranges of numbers that can be split into independent chunks.

Configuring the Thread Pool Used by Parallel Streams

Parallel streams use internally the default ForkJoinPool, which by default has as many threads as the machine has processors, as returned by Runtime.getRuntime().availableProcessors(). The size of the pool can be changed globally per VM setting

System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "100");

This is a global setting that will affect all parallel streams in the code.