We are all well aware of all the fancy new features that JDK 8 brought to us and probably it’s hard to find a Java developer that doesn’t know what Java Streams, Lambdas or CompletableFutures are. So all these nice features came a few years back with JDK 8, but what happened a bit earlier with the release of JDK 7?
If we check the new features in the JDK 7 release, it’s still a bit difficult to spot one of the most important new features introduced in this release.
We’ll have to go under “Concurrency utilities” section to actually see what we’re looking for.
We’re of course talking about the release of the ForkJoinPool framework in JDK 7, something that in my opinion didn’t get the publicity that it actually deserved; I’d say that many of the Java developers nowadays are a bit unfamiliar with what ForkJoinPool is and where is used.
In this article we’re going to go over the internals of the ForkJoinPool framework and explain why it’s so important; trying to give some justice to this unfairly treated part of the JDK.
Let’s start then!
What is the ForkJoinPool framework then? The ForkJoinPool framework is a fine-grained task execution framework designed for an efficient parallel execution of tasks. As we mentioned earlier, it was introduced as part of JDK 7 release.
Although most of the times we don’t use it explicitly, ForkJoinPool framework is used under the covers in many well-known frameworks and components. For instance, CompletableFuture or parallel Streams make use of ForkJoinPool internally.
It’s worth mentioning that ForkJoinPool is also used in languages like Kotlin or Scala.
So what’s so interesting about ForkJoinPool? What does it give us that existing Executors couldn’t provide? We can reply to this question with just two words: work stealing!
The design of ForkJoinPool has been based on the work-stealing framework created for Cilk; if you’re interested about the original design you can read about it here.
Let’s see what is this about and how ForkJoinPool works!
How does ForkJoinPool work
The design of ForkJoinPool is actually very simple, but at the same time it’s very efficient. It’s based on the “Divide-And-Conquer” algorithm; each task is split into subtasks until they cannot be split anymore, they get executed in parallel and once they’re all completed, the results get combined.
Sounds familiar? If you read my article “Understanding Java Stream”, Java parallel Streams are executed very similarly.
So what we have is a framework which follows a computation model of divisible tasks where each of them follows a pattern similar to this chunk of code written in pseudo-code:
As you can see, a task can get divided into multiple subtasks recursively until it reaches a predefined condition where we consider that the task is small enough to be executed immediately.
This is basically what we call the Fork-Join model.
The Fork-Join model
As we just mentioned, the fork-join model is a method where we split each task (fork) and then wait for the completion (join) of all these subtasks; once they’re all completed we can combine them and return a result back.
In the above figure we can see how each task gets divided every time that fork is called; in the same way, when all tasks are completed, they get joined and combined to generate a final result.
Now that we understand how the model works, let’s get to the most critical part, how does ForkJoinPool execute these tasks internally?
ForkJoinPool, as any pool of threads, is composed by a predefined number of threads or workers. When we create a new ForkJoinPool, the default level of parallelism (number of threads) will be by default the number of available processors in our system, a number that gets returned by the method Runtime.availableProcessors(). Please be aware that nowadays with so much virtualisation in use (Cloud VMs and Docker), your JVM in many cases won’t have as many available processors as the number of available processors in the underlying machine.
You could also create your own ForkJoinPool specifying how many threads you need; what you should keep in mind is that for CPU-intensive tasks you’ll see no benefit in having a pool larger than the number of processors that you have available. However, if your tasks are IO-intensive tasks (what means that they’ll be frequently waiting for IO operations to complete) you could possibly benefit from a larger pool in some cases.
Every of these worker threads has its own worker queue, a double-ended queue of type WorkQueue. These local queues are normally called “deque“.
So the way it works is that each of these workers keeps scanning for available subtasks to be executed, the main goal is to keep worker threads as busy as possible and maximise the use of CPU cores; a thread will only block then when there are no available subtasks to run.
What happens when a worker cannot find tasks to run on its own queue? It will try to “steal” tasks from those workers that are busier!
This is where it gets interesting; how does the framework guarantees that the owner of the queue and the “stealer” don’t interfere with each other if they try to grab a task at the same time?
Well, to minimise the contention and make it in a more efficient way, both the owner of a queue and the “stealers” grab tasks from different parts of the queue.
To insert tasks in a queue push() method is used and the owner of the queue grabs a task by calling pop() method. So the queue is used as a stack by the owner of the queue; taking elements from the head of the stack. Something similar to what’s shown in the illustration below:
We can quickly notice that LIFO method (Last In, First Out) is used, why has it been designed in such a way? Wouldn’t it make more sense to process first the tasks that has been in the queue for a longer period of time?
Well, the answer is no. The main reason to do this is to improve performance; by always picking the most recent task, we increase the chances of having the task resources still allocated in the CPU caches; something that will boost performance considerably. This is commonly called locality of reference.
On the other hand, when a worker runs out of tasks to process it will always take tasks from other worker’s queue tail by calling poll() method.
In this case we follow a FIFO approach then; this is basically to reduce the contention needed to synchronise both the owner worker and the “stealer”.
Another very good reason to do this is because, due to the nature of these divisible tasks, the older tasks in the queue are the most likely to provide big chunks of work, as they probably haven’t been split yet.
You will probably notice that push and pop methods are only called by the owner of the queue and poll method is only called by the worker trying to “steal” work from a different worker.
Push and pop methods are wait-free CAS (Compare-And-Swap) operations, so they’re quite performant. However, poll method is not always lock free; it’ll block in those cases where the queue is almost empty as some synchronisation will be required in order to guarantee that only the owner or the stealer pick a given task, but not both.
It’s actually quite interesting how this framework has been designed, isn’t it?
Putting everything together
Considering what we’ve seen so far, it’s quite obvious that to take advantage of this framework we’d need tasks that are able to split themselves easily.
This is where Java collections fit well, remember when we talked about the Spliterator interface? Java collections can be split very easily since JDK 8, that’s why Java Streams and ForkJoinPool are perfect partners!
Parallel Java Streams can work very efficiently, optimising the use of the CPU cores available, thanks to the ForkJoinPool framework!
But you might be thinking, what if for the type of task I want to process it’s not possible to execute it with Java Streams? You can create your own “divisible” tasks simply by extending ForkJoinTask class.
What is a ForkJoinTask? ForkJoinTask is a Java class that behaves similarly to a Java Thread, but it’s much more lightweight; basically because it doesn’t maintain its own runtime stack or program counters.
There are three subtypes of ForkJoinTask: RecursiveAction, RecursiveTask and CountedCompleter. Picking one or another will depend on the kind of tasks you are writing, take a look at the docs to understand which one fits your needs best.
One thing worth mentioning is that these three classes are not functional interfaces, mainly because ForkJoinPool was released in JDK7, so unfortunately we cannot use Lambda expressions. However, the Java 8 parallel stream framework provides a functional API to transparently use the ForkJoinPool.
Let’s try implementing a simple task using ForkJoinTask; we’re going to write a Fibonacci calculator task using ForkJoinTask.
How will it look like?
That’s it, it’s quite simple, right? What will happen is that our task will get divided into multiple subtasks and the ForkJoinPool worker threads will work together to solve all of them.
A very good practice that we should follow when we write ForkJoinTasks is to always write them as pure functions, not sharing state and avoiding the mutation of objects; this is the best way to ensure that our subtasks can be executed safely and in an independent manner.
Also be aware that ForkJoinPool not only allows the submission of ForkJoinTasks, it also allows the submission of Callable or Runnable tasks, so you can use ForkJoinPool in the same way that you could use the existing Executors.
The only difference would be that your task won’t split itself, but you could benefit from work stealing performance improvements if multiple tasks are submitted and there are some threads with less work than others.
If you need to have a better understanding about concurrency and multi-threading and why we do things in the way we do them nowadays, I’d recommend that you read the book “Java Concurrency in Practice” by Brian Goetz.
So that’s it from me! I hope you’ve learned something about ForkJoinPool with today’s article!
As we’ve seen today, the Fork Join model is an efficient way of processing tasks efficiently using the “Divide-And-Conquer” method; this together with the work stealing feature makes of ForkJoinPool a powerful tool to parallelise our code in Java.
Nowadays it’s very easy to transparently use ForkJoinPool when we use Java Streams or CompletableFutures, so probably only in rare cases we’ll have to write our own divisible tasks; in any case, it’s always good to understand how it works and know that we also have this possibility.
I really hope you’ve enjoyed reading this article and please follow if you want to get notified when the next article gets published! Looking forward to see you soon!
Thank you very much for reading!