Should I always use a parallel stream when possible?


With Java 8 and lambdas it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Two examples from the docs, the second one using parallelStream:

myShapesCollection.stream()
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

myShapesCollection.parallelStream() // <-- This one uses parallel
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

As long as I don't care about the order, would it always be beneficial to use the parallel stream? One would think it is faster to divide the work across more cores.

Are there other considerations? When should parallel stream be used and when should the non-parallel be used?

(This question is asked to trigger a discussion about how and when to use parallel streams, not because I think always using them is a good idea.)

Think Twice Before Using Java 8 Parallel Streams — although parallelism was a main motivation behind Java 8, there are reasons to hesitate before using this feature. The problem is that all parallel streams share the common fork-join thread pool, and what's worse, you cannot specify a different thread pool for a particular parallel stream. A parallel stream also has much higher overhead than a sequential one: coordinating the threads takes a significant amount of time. I would use sequential streams by default and only consider parallel ones if I have a massive number of items to process, or if the processing of each item takes time and is parallelizable.
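The common-pool limitation mentioned above has a well-known partial workaround: if the terminal operation is started from inside a task already running in a ForkJoinPool, the stream's subtasks run in that pool rather than the common pool. A minimal sketch (this relies on an undocumented JDK implementation detail, not a guaranteed behavior; the class and method names are illustrative):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CustomPoolDemo {
    // Runs a parallel sum inside a dedicated pool instead of the common pool.
    // NOTE: that the subtasks stay in this pool is an implementation detail
    // of the JDK, not a documented guarantee.
    static long sumInPool(int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() ->
                LongStream.rangeClosed(1, 1_000_000).parallel().sum()
            ).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInPool(4)); // 500000500000
    }
}
```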

The Stream API was designed to make it easy to write computations in a way that was abstracted away from how they would be executed, making switching between sequential and parallel easy.

However, just because it's easy doesn't mean it's always a good idea, and in fact, it is a bad idea to just drop .parallel() all over the place simply because you can.

First, note that parallelism offers no benefits other than the possibility of faster execution when more cores are available. A parallel execution will always involve more work than a sequential one, because in addition to solving the problem, it also has to perform dispatching and coordinating of sub-tasks. The hope is that you'll be able to get to the answer faster by breaking up the work across multiple processors; whether this actually happens depends on a lot of things, including the size of your data set, how much computation you are doing on each element, the nature of the computation (specifically, does the processing of one element interact with processing of others?), the number of processors available, and the number of other tasks competing for those processors.

Further, note that parallelism also often exposes nondeterminism in the computation that is often hidden by sequential implementations; sometimes this doesn't matter, or can be mitigated by constraining the operations involved (i.e., reduction operators must be stateless and associative.)
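The forEach/forEachOrdered pair is a simple way to see this intentionally allowed nondeterminism; a small sketch (class and method names are illustrative):

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderDemo {
    // Encounter order is preserved by order-respecting terminals even on a
    // parallel stream: joining() merges the per-thread pieces in order.
    static String ordered() {
        return IntStream.range(0, 8).parallel().boxed()
                .map(String::valueOf)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // forEach on a parallel stream makes no ordering promise:
        IntStream.range(0, 8).parallel()
                .forEach(i -> System.out.print(i + " ")); // arbitrary order
        System.out.println();
        // forEachOrdered restores encounter order, at some coordination cost:
        IntStream.range(0, 8).parallel()
                .forEachOrdered(i -> System.out.print(i + " ")); // 0 1 2 3 4 5 6 7
        System.out.println();
    }
}
```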

In reality, sometimes parallelism will speed up your computation, sometimes it will not, and sometimes it will even slow it down. It is best to develop first using sequential execution and then apply parallelism where

(A) you know that there's actually a benefit to increased performance, and

(B) that it will actually deliver increased performance.

(A) is a business problem, not a technical one. If you are a performance expert, you'll usually be able to look at the code and determine (B), but the smart path is to measure. (And, don't even bother until you're convinced of (A); if the code is fast enough, better to apply your brain cycles elsewhere.)

The simplest performance model for parallelism is the "NQ" model, where N is the number of elements and Q is the computation per element. In general, you need the product NQ to exceed some threshold before you start getting a performance benefit. For a low-Q problem like "add up numbers from 1 to N", you will generally see breakeven somewhere between N=1000 and N=10000. With higher-Q problems, breakeven comes at smaller N.
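One rough way to see the breakeven for a low-Q problem is to time the sequential and parallel versions of exactly that "add up numbers" task. A quick, unscientific sketch (a serious comparison should use a harness such as JMH to account for warm-up and JIT effects; the names are illustrative):

```java
import java.util.stream.LongStream;

public class NqDemo {
    // Very rough wall-clock timing; good enough to see order-of-magnitude
    // effects, not for real benchmarking.
    static long time(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return (System.nanoTime() - start) / 1_000_000; // elapsed ms
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // large N, tiny Q per element
        System.out.println("sequential: " +
            time(() -> LongStream.rangeClosed(1, n).sum()) + " ms");
        System.out.println("parallel:   " +
            time(() -> LongStream.rangeClosed(1, n).parallel().sum()) + " ms");
    }
}
```

Whether the parallel version wins here depends entirely on the machine; with a small n (say, 100) the parallel version reliably loses to the sequential one.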

But the reality is quite complicated. So until you achieve experthood, first identify when sequential processing is actually costing you something, and then measure if parallelism will help.


I watched one of the presentations of Brian Goetz (Java Language Architect & specification lead for Lambda Expressions). He explains in detail the following 4 points to consider before going for parallelization:

Splitting / decomposition costs – Sometimes splitting is more expensive than just doing the work!

Task dispatch / management costs – A lot of work can be done in the time it takes to hand work to another thread.

Result combination costs – Sometimes combination involves copying lots of data. For example, adding numbers is cheap whereas merging sets is expensive.

Locality – The elephant in the room, and a point everyone may miss. You should consider cache misses: if a CPU is waiting for data because of cache misses, parallelization gains you nothing. That's why array-based sources parallelize best: the next indices (near the current index) are cached, so the CPU is less likely to experience a cache miss.

He also mentions a relatively simple formula to determine a chance of parallel speedup.

NQ Model:

N x Q > 10000

where N = number of data items and Q = amount of work per item.

There's also a much bigger chance of screwing up with a parallel stream, as your post shows. Read "Should I always use a parallel stream when possible?" for more arguments. Second, this code is not thread-safe at all, since it uses several concurrent threads to add to a thread-unsafe ArrayList. It can be safe if you use collect() to create the final list for you, instead of adding things to the list yourself from forEach().
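The asker's original code is not shown here, but the contrast the answer describes can be sketched like this (names are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CollectDemo {
    // UNSAFE: several worker threads mutate a non-thread-safe ArrayList
    // concurrently; elements can be lost, or the list can be corrupted.
    // Defined for illustration only -- do not use this pattern.
    static List<Integer> unsafe() {
        List<Integer> out = new java.util.ArrayList<>();
        IntStream.range(0, 10_000).parallel().forEach(out::add);
        return out; // size is often < 10_000, and add() may even throw
    }

    // SAFE: collect() builds a partial list per thread and merges them,
    // so no mutable state is ever shared between threads.
    static List<Integer> safe() {
        return IntStream.range(0, 10_000).parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("safe size: " + safe().size()); // always 10000
    }
}
```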

JB hit the nail on the head. The only thing I can add is that Java 8 doesn't do pure parallel processing; it does paraquential. Yes, I wrote the article, and I've been doing F/J for thirty years, so I do understand the issue.

According to my information, a parallel stream is the same as a sequential stream except that it is divided into multiple substreams. It is a question of speed. All operations are performed on the elements, and the results of the substreams are combined at the end. In the end, the result of the operations should be the same for parallel and sequential streams, in my opinion.

Other answers have already covered profiling to avoid premature optimization and overhead cost in parallel processing. This answer explains the ideal choice of data structures for parallel streaming.

As a rule, performance gains from parallelism are best on streams over ArrayList, HashMap, HashSet, and ConcurrentHashMap instances; arrays; int ranges; and long ranges. What these data structures have in common is that they can all be accurately and cheaply split into subranges of any desired sizes, which makes it easy to divide work among parallel threads. The abstraction used by the streams library to perform this task is the spliterator, which is returned by the spliterator method on Stream and Iterable.
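The splitting the spliterator performs can be observed directly; a small sketch using trySplit() on an array-backed list (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    // trySplit() hands roughly half of the remaining elements to a new
    // Spliterator; for array-backed sources the split is exact and O(1).
    static long[] splitSizes(List<Integer> list) {
        Spliterator<Integer> right = list.spliterator();
        Spliterator<Integer> left = right.trySplit(); // takes the first half
        return new long[] { left.estimateSize(), right.estimateSize() };
    }

    public static void main(String[] args) {
        long[] sizes = splitSizes(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        System.out.println(sizes[0] + " + " + sizes[1]); // 4 + 4
    }
}
```

This cheap, balanced splitting is exactly what a LinkedList cannot offer: finding its midpoint requires walking half the list, which is why linked structures parallelize poorly.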

Another important factor that all of these data structures have in common is that they provide good-to-excellent locality of reference when processed sequentially: sequential element references are stored together in memory. The objects referred to by those references may not be close to one another in memory, which reduces locality-of-reference. Locality-of-reference turns out to be critically important for parallelizing bulk operations: without it, threads spend much of their time idle, waiting for data to be transferred from memory into the processor’s cache. The data structures with the best locality of reference are primitive arrays because the data itself is stored contiguously in memory.

Source: Item #48 Use Caution When Making Streams Parallel, Effective Java 3e by Joshua Bloch

It is safe to use a non-concurrent collector in a collect operation of a parallel stream. In the specification of the Collector interface, in the section with half a dozen bullet points, is this: "For non-concurrent collectors, any result returned from the result supplier, accumulator, or combiner functions must be serially thread-confined."
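A short sketch of that guarantee in action: Collectors.groupingBy accumulates into ordinary, non-thread-safe HashMaps, yet is safe on a parallel stream, because each partial map is touched by only one thread at a time and the framework merges them serially (names are illustrative):

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CollectorDemo {
    // groupingBy is a non-concurrent collector: each worker thread fills its
    // own HashMap, and the combiner merges the maps one thread at a time,
    // so no synchronization inside the maps is needed.
    static Map<Integer, Long> countByMod3() {
        return IntStream.range(0, 9_000).parallel().boxed()
                .collect(Collectors.groupingBy(i -> i % 3, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByMod3()); // each of the keys 0, 1, 2 maps to 3000
    }
}
```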




  • Good answer. I would add that if you have a massive amount of items to process, that only increases the thread coordination issues; it's only when the processing of each item takes time and is parallelizable that parallelization might be useful.
  • @WarrenDew I disagree. The Fork/Join system will simply split the N items into, for example, 4 parts, and process these 4 parts sequentially. The 4 results will then be reduced. If massive really is massive, even for fast unit processing, parallelization can be effective. But as always, you have to measure.
  • I have a collection of objects that implement Runnable, on which I call start() to use them as threads; is it OK to change that to using Java 8 streams in a parallelized .forEach()? Then I'd be able to strip the thread code out of the class. But are there any downsides?
  • @JBNizet If the 4 parts process sequentially, then is there no difference between processing in parallel and sequentially? Please clarify.
  • @Harshana he obviously means that the elements of each of the 4 parts will be processed sequentially. However, the parts themselves may be processed simultaneously. In other words, if you have several CPU cores available, each part can run on its own core independently of the other parts, while processing its own elements sequentially. (NOTE: I don't know, if this is how parallel Java streams work, I'm just trying to clarify what JBNizet meant.)
  • This post gives further details about the NQ model:
  • @specializt: switching a stream from sequential to parallel does change the algorithm (in most cases). The determinism mentioned here is regarding properties your (arbitrary) operators might rely on (the Stream implementation can’t know that), but of course shouldn’t rely on. That’s what that section of this answer tried to say. If you care about the rules, you can have a deterministic result, just like you say, (otherwise parallel streams were quite useless), but there’s also the possibility of intentionally allowed non-determinism, like when using findAny instead of findFirst
  • "First, note that parallelism offers no benefits other than the possibility of faster execution when more cores are available" -- or if you're applying an action that involves IO (e.g. url -> downloadPage(url)).
  • @Pacerier That's a nice theory, but sadly naive (see the 30-year history of attempts to build auto-parallelizing compilers for a start). Since it is not practical to guess right enough of the time to not annoy the user when we inevitably get it wrong, the responsible thing to do was just to let the user to say what they want. For most situations, the default (sequential) is right, and more predictable.
  • @Jules: Never use parallel streams for IO. They are solely meant for CPU intensive operations. Parallel streams use ForkJoinPool.commonPool() and you don't want blocking tasks to go there.
  • Streams are not iterable because streams do internal iteration instead of external. That's the whole reason for streams anyway. If you have a problem with academic work, then functional programming might not be for you. Functional programming === math === academic. And no, J8-FJ is not broken; it's just that most people do not read the f****** manual. The Java docs say very clearly that it's not a parallel execution framework. That's the whole reason for all the spliterator stuff. Yes, it's academic; yes, it works if you know how to use it. Yes, it should be easier to use a custom executor.
  • Stream does have an iterator() method, so you can iterate them externally if you want. My understanding was that they don't implement Iterable because you can only use that iterator once, and nobody could decide whether that was OK.
  • To be honest: your entire paper reads like a massive, elaborate rant, and that pretty much negates its credibility... I'd recommend redoing it with a much less aggressive undertone; otherwise not many people will actually bother to fully read it... I'm just saying.
  • A couple of questions about your article... first of all, why do you apparently equate balanced tree structures with directed acyclic graphs? Yes, balanced trees are DAGs, but so are linked lists and pretty much every object-oriented data structure other than arrays. Also, when you say recursive decomposition only works on balanced tree structures and is therefore not relevant commercially, how do you justify that assertion? It seems to me (admittedly without really examining the issue in depth) that it should work just as well on array-based datastructures, e.g. ArrayList/HashMap.
  • This thread is from 2013; a lot has changed since then. This section is for comments, not detailed answers.