
Understanding Java Virtual Threads – The Death of Async Programming

In the JDK 19 release, we can find the first preview of something JDK developers have been working on for a long time: Project Loom. This first preview is available as part of JEP 425 and allows the creation of "virtual threads". As it is still a preview, you will have to enable preview features when you compile your program in Java 19. You can check how to enable preview features in Java in our article "How to enable previews in Java".

More recently, the second preview of virtual threads has been released as part of JEP 436. Some of the changes introduced in the first preview have been made final, so we are one step closer to gaining full access to virtual threads.

In this article we will try to give you a solid background on why Java virtual threads are much needed in the JVM ecosystem and, above all, provide you with the basics to understand how they work.


The Problem With Platform Threads

Parity Between OS Threads and Platform Threads

Currently, in the JDK, there is a one-to-one relationship between Java threads (also called “platform” threads) and OS threads.

This means that when a thread is waiting for the completion of an IO operation, the underlying OS thread will remain blocked and therefore, unused, until this operation completes. This has always been a big problem in terms of scalability in the Java ecosystem, because our application is limited by the available threads in the host.

In the last decade, we have tried to address this problem with asynchronous processing libraries and the use of futures. For example, using CompletableFuture we can achieve a non-blocking approach, although the readability of these models is, in many cases, not what we would expect. If you are interested in looking at some CompletableFuture examples, you can read our article "Multiple API calls with CompletableFuture".
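To illustrate the readability issue, here is a minimal sketch of callback-style code using CompletableFuture. The fetchUser and fetchOrderCount helpers are hypothetical stand-ins for remote calls:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncExample {
    // Hypothetical helpers standing in for remote calls.
    static CompletableFuture<String> fetchUser(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    static CompletableFuture<Integer> fetchOrderCount(String user) {
        return CompletableFuture.supplyAsync(user::length);
    }

    public static void main(String[] args) {
        // Each step is a callback chained onto the previous future;
        // control flow is no longer plain top-to-bottom sequential code.
        fetchUser(42)
                .thenCompose(AsyncExample::fetchOrderCount)
                .thenAccept(count -> System.out.println("orders: " + count))
                .join();
    }
}
```

Even in this tiny example, the business logic is spread across chained callbacks rather than read top to bottom.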

The Problem with Async Programming

Although async programming is a viable solution to the thread limitation, writing asynchronous code is more complex than writing sequential code. The developer has to define callbacks to apply operations based on the response to a given task, which makes the code difficult to follow and reason about.

Another big issue is that debugging these applications becomes hard. A given request can be handled by multiple threads, therefore debugging, logging or analysing stack traces gets really difficult.

In terms of flexibility and maintainability, asynchronous programming is very limited too. We have to give up certain sequential workflow structures like loops, which means that a piece of code written in a sequential style cannot easily be transformed into an asynchronous one. Exactly the same happens in the opposite direction.

Last but not least, writing explicit asynchronous code is more error-prone due to the complexities that come with it.

Expensive Creation of Threads

Another problem with platform threads is that they are heavy objects which are expensive to create, so we need to create them beforehand and store them in a thread pool to avoid creating a new thread every time we need one to run our code. Why are they expensive?

The creation of Java threads is expensive because it involves allocating memory for the thread, initialising the thread stack and also making OS calls to register the OS thread.

Considering both problems, the limit in OS threads and the cost of creating platform threads, we need bounded thread pools to be able to run our applications safely. If we don't use a bounded pool of threads, we risk running out of resources, with dramatic consequences for our system.
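As a minimal sketch of this practice, the snippet below uses a bounded pool of four platform threads; the pool size and task count are arbitrary choices for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    public static void main(String[] args) throws InterruptedException {
        // A bounded pool caps resource usage: at most 4 platform threads,
        // reused across all tasks instead of creating one thread per task.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10; i++) {
            int taskId = i;
            pool.submit(() ->
                    System.out.println("task " + taskId + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();                            // stop accepting new tasks
        pool.awaitTermination(5, TimeUnit.SECONDS); // wait for queued tasks to finish
    }
}
```

Ten tasks share four threads: the pool queues the excess tasks instead of spawning more threads.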

Expensive Context Switching

The other problem with this design is how expensive context switching is. When a context switch happens, a CPU core switches from running one thread to another: the OS has to save the local data and memory pointers of the current thread and load the ones for the new thread. Context switching is a very expensive operation, as it involves many CPU cycles; the OS takes care of pausing a thread, saving its stack and scheduling the new one, and this process is costly because it requires loading and unloading thread stacks.

So how do we solve these problems? This is where Project Loom and its virtual threads come into play. Let’s see how!

Virtual Threads to the Rescue

Virtual threads in Java get their name from an analogy with virtual memory: we are given the illusion of having an almost unlimited number of threads available, in a similar way to how virtual memory gives the illusion of almost unlimited memory.


Virtual threads resolve one of the main scalability problems in the JDK, but how? The answer is, mainly, by breaking the one-to-one association between Java threads and OS threads.

Many applications in the JVM ecosystem hit their limits way before reaching their CPU or memory capacity, mainly due to this parity between platform threads and OS threads. The creation of platform threads is very expensive, hence the need for thread pools, and we are always limited by the number of OS threads the host can support.

On the other hand, a virtual thread adds minimal overhead to the system, so we can have thousands of them in our application. Every virtual thread requires an OS thread to do some work, BUT it does NOT hold on to that OS thread while waiting for resources. This means that virtual threads can wait for IO, free the platform thread they are currently using so another virtual thread can do some work on it, and resume later once the IO operation completes.

What is the main advantage of this? One of the answers is cheap context switching! Let's see why!

Cheap Context Switching

As we mentioned earlier, context switching is very expensive in Java due to having to save and load thread stacks every time it happens.

The difference with virtual threads is that, because they are under the control of the JVM, their stacks are stored in heap memory rather than in a native thread stack. This means that resuming an awakened virtual thread becomes much cheaper. The process of loading a virtual thread's stack into the "carrier" thread's stack when it gets assigned to its carrier is called mounting; the process of unloading it is called unmounting.

Let’s take a brief look at thread scheduling now.

Scheduling

The traditional platform threads are scheduled by the operating system, whereas virtual threads are scheduled by the JDK runtime.

In the case of platform threads, the OS scheduler is responsible for assigning work to each OS thread. It does so by assigning time slices to each process; when the time is up, it is another process's turn to get CPU time. This is how the OS tries to ensure a fair(ish) distribution of CPU time among the existing processes.

On the other hand, virtual threads are scheduled directly by the JDK runtime. The implementation uses a dedicated ForkJoinPool internally as the virtual thread scheduler. This means that the common pool returned by ForkJoinPool.commonPool() is a different instance from this new thread pool.
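A quick way to see this scheduler at work is to print the current thread from inside a virtual thread; its string representation includes the ForkJoinPool carrier worker it is mounted on. This sketch assumes JDK 21, or JDK 19/20 with --enable-preview:

```java
public class CarrierDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() ->
                // Prints something like:
                // VirtualThread[#23]/runnable@ForkJoinPool-1-worker-1
                System.out.println(Thread.currentThread()));
        vt.join();
    }
}
```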

virtual threads - scheduling
Image Credit: Author

The JDK scheduler does not use time slices; instead, it is the virtual thread itself that yields and gives up its carrier thread when it is waiting for a blocking operation to complete. The main consequence of this is much better resource utilisation and, therefore, increased throughput in our application.

It’s worth mentioning that the underlying platform threads, also called carrier threads in terms of scheduling, are still managed by the OS scheduler. They’re now a layer of abstraction completely invisible to the developer writing concurrent code.

Another aspect to consider here is that virtual threads can give a false sensation of executing more work in parallel. What actually happens is that processing-unit time gets distributed more efficiently: each processing unit still runs one thread at a time, but switches from one virtual thread to another much more frequently and cheaply.

Now that we have seen how virtual threads work, there is one question we might be asking: will every application benefit from the introduction of virtual threads? Not really; let's see why.

IO-Bound Applications

Not every application will benefit from a big performance improvement after having adopted virtual threads. We will only observe a huge benefit when our application is IO-bound.

What does it mean? IO-bound applications are those which spend a considerable time waiting for the response of IO operations such as network calls, file or database access. These are the majority of applications nowadays.

The reason the benefit of virtual threads is huge in IO-bound applications is that virtual threads are very good at waiting: a virtual thread can suspend and resume very cheaply in terms of performance.

In this situation, the virtual thread blocks while it waits, but the platform thread won't. The platform thread will be assigned to a different virtual thread and continue doing useful work instead of waiting. This means much better resource utilisation in our system! In the example shown below, we have two platform threads, each mapped to a corresponding OS thread. We can see how the platform threads are always busy doing work, instead of waiting for the completion of IO.

Every time that a virtual thread waits for IO, it yields to free its carrier thread. Once the IO operation gets completed, the virtual thread is put back into the FIFO queue of the ForkJoinPool and will wait until a carrier thread is available.

virtual threads - JDK scheduling
Image Credit: Author

This also means that we can achieve a considerable increase in throughput in our applications. Virtual threads achieve this by increasing the number of tasks we can process concurrently, not by reducing latency. To make it clear, virtual threads are not faster than platform threads, they are just more efficient in how they wait and how the work gets distributed.

For CPU-bound applications we have other tools at hand, like parallel tasks or work stealing in ForkJoinPool, to improve performance; virtual threads will have minimal impact there. Please remember the difference between the two kinds of workload to avoid unexpected results!

What other benefits do virtual threads bring to our applications? There is a very important one: we can now write non-blocking concurrent code in a synchronous style.

Synchronous style for non-blocking operations

With the introduction of virtual threads in Java, writing concurrent code gets enormously simplified. Our code becomes easier to read and understand; this is one of the big problems of asynchronous programming nowadays, as its complexity can sometimes get out of hand.

We can now write concurrent code without having to orchestrate the different interactions that could happen asynchronously; the JDK runtime will deal with that for us, distributing the available OS threads among the existing virtual threads.
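As a small sketch of this style, the blocking fetchGreeting helper below is a hypothetical stand-in for a network call; the code reads top to bottom with no callbacks. It assumes JDK 21, or JDK 19/20 with --enable-preview:

```java
import java.time.Duration;

public class SequentialStyle {
    // Hypothetical blocking call standing in for a network request.
    static String fetchGreeting() throws InterruptedException {
        Thread.sleep(Duration.ofMillis(10)); // blocks only the virtual thread
        return "hello";
    }

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            try {
                // Plain sequential code: no callbacks, no thenCompose chains.
                String greeting = fetchGreeting();
                System.out.println(greeting + " from a virtual thread");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        vt.join();
    }
}
```

While fetchGreeting sleeps, the virtual thread unmounts and its carrier is free to run other virtual threads.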

What happens if we maintain an old application using the traditional concurrency mechanisms available in Java?

Backwards compatibility

If you were wondering what would happen to your code after migrating to Java 19 if it uses synchronized blocks or any of the traditional concurrency mechanisms, there’s good news. The old concurrent code will work with virtual threads without having to modify it at all. You probably won’t even need to recompile and build new artefacts in some cases, because all of this is dealt with by the JDK runtime. In other cases, the changes will be minimal to take full advantage of virtual threads.

There are currently some limitations with the use of synchronized blocks and thread locals; we won't go into much detail, but the general recommendation is to avoid blocking inside synchronized blocks, which can "pin" the virtual thread to its carrier, and to be careful with thread locals.
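One commonly suggested workaround, sketched below, is to guard critical sections with a ReentrantLock instead of a synchronized block, since a virtual thread blocked on a ReentrantLock can unmount from its carrier; the counter example itself is ours, for illustration only. It assumes JDK 21, or JDK 19/20 with --enable-preview:

```java
import java.util.concurrent.locks.ReentrantLock;

public class AvoidPinning {
    private static final ReentrantLock lock = new ReentrantLock();
    private static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        // The lock protects the shared counter without pinning the
        // virtual threads to their carrier threads.
        Runnable task = () -> {
            lock.lock();
            try {
                counter++;
            } finally {
                lock.unlock();
            }
        };
        Thread t1 = Thread.ofVirtual().start(task);
        Thread t2 = Thread.ofVirtual().start(task);
        t1.join();
        t2.join();
        System.out.println("counter = " + counter);
    }
}
```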

Let’s look at what the JDK API looks like now!

Programming with Virtual Threads

There have been a few changes in the JDK proposed under JEP 425. In terms of how to write code able to take advantage of virtual threads, we will see that it’s quite straightforward.

You can write code in the same way you always do; virtual threads are a built-in feature of the JDK, so you don’t need to do much to take advantage of them.

One of the good things is that virtual threads are still instances of the Thread class; there is no new thread class to learn.

The only change is in how we define whether a thread we create is a virtual or a platform thread. To achieve this, the JDK brings Thread.Builder to instantiate and configure both easily.

There are two factory methods to obtain these builders: Thread.ofPlatform() returns a builder for traditional platform threads, while Thread.ofVirtual() returns one for virtual threads.
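Here is a minimal sketch of both builders side by side; the thread names are arbitrary choices of ours. It assumes JDK 21, or JDK 19/20 with --enable-preview:

```java
public class BuilderDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () ->
                System.out.println("running in " + Thread.currentThread().getName());

        // Platform thread via the builder.
        Thread platform = Thread.ofPlatform().name("my-platform-thread").start(task);

        // Virtual thread via the builder, same API shape.
        Thread virtual = Thread.ofVirtual().name("my-virtual-thread").start(task);

        platform.join();
        virtual.join();
        System.out.println("virtual? " + virtual.isVirtual());
    }
}
```

Both builders share the same fluent API, so switching between thread kinds is a one-word change.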

Another change is the inclusion of a new ExecutorService, which can be obtained by calling the Executors.newVirtualThreadPerTaskExecutor() method.

Let’s see how it works with a couple of examples!

Executors.newVirtualThreadPerTaskExecutor()

The introduction of this new method allows transitioning from existing concurrent code to virtual threads very easily. Let’s see how in the following example:

        final LongAdder adder = new LongAdder();
        Runnable task = () -> {
            try {
                Thread.sleep(10);
                System.out.println("I'm running in thread " + Thread.currentThread());
                adder.increment();
            } catch (InterruptedException e) {
                // Restore the interrupt flag instead of clearing it
                Thread.currentThread().interrupt();
            }
        };
        long start = System.nanoTime();
        // ExecutorService implements AutoCloseable since JDK 19:
        // close() waits for submitted tasks to complete
        try (ExecutorService executorService = Executors.newCachedThreadPool()) {
            IntStream.range(1, 10000)
                    .forEach(number -> executorService.submit(task));
        }
        long end = System.nanoTime();
        System.out.println("Completed " + adder.intValue() + " tasks in " + (end - start)/1000000 + "ms");

You can see how the example above uses a cached thread pool to submit 9,999 tasks, each simulating a small IO operation that takes 10ms plus the time taken to print to the console and increment a counter.

If we run this code, we get the following results:

...
I'm running in thread Thread[#1271,pool-1-thread-1242,5,main]
I'm running in thread Thread[#1260,pool-1-thread-1231,5,main]
I'm running in thread Thread[#928,pool-1-thread-899,5,main]
I'm running in thread Thread[#275,pool-1-thread-246,5,main]
Completed 9999 tasks in 4740ms

We’ve only included the last few lines and the final result for the sake of brevity; you can see how we are using platform threads from a cached thread pool. It takes 4.7 seconds to run something as simple as this.

Let’s see what happens when we use the new virtual thread executor:

        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(1, 10000)
                    .forEach(number -> executor.submit(task));
        }
        long end = System.nanoTime();
        System.out.println("Completed " + adder.intValue() + " tasks in " + (end - start)/1000000 + "ms");

As you will notice, switching to virtual threads is as simple as switching to the new executor service, the rest of the code stays the same! This is awesome, right?

What about the performance using virtual threads? These are the results we got:

I'm running in thread VirtualThread[#10029]/runnable@ForkJoinPool-1-worker-10
I'm running in thread VirtualThread[#10031]/runnable@ForkJoinPool-1-worker-10
I'm running in thread VirtualThread[#10027]/runnable@ForkJoinPool-1-worker-10
I'm running in thread VirtualThread[#10028]/runnable@ForkJoinPool-1-worker-10
Completed 9999 tasks in 760ms

It only took 760ms with virtual threads! Why is that? As we saw earlier, platform threads don’t get blocked while virtual threads are waiting for IO operations, therefore additional tasks can be processed while virtual threads are waiting. This is huge for the JVM ecosystem!

Thread.ofVirtual()

Now let’s look at a similar example, in this case we will use Thread.ofPlatform() and Thread.ofVirtual() to specify what kind of threads we’ll be using in the test.

We will run an example using Thread.ofPlatform() first:

        long start = System.nanoTime();
        int[] numbers = IntStream.range(1, 10000).toArray();
        List<Thread> threads = Arrays.stream(numbers).mapToObj(num ->
                Thread.ofPlatform()
                        .start(task)
        ).toList();
        threads.parallelStream().forEach(thread -> {
            try {
                thread.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        long end = System.nanoTime();
        System.out.println("Completed " + adder.intValue() + " tasks in " + (end - start)/1000000 + "ms");

We start 9,999 threads to run the same task we used in our previous example, we then wait for their completion by using join().

If we run this test, it takes around 2-3 seconds to complete:

I'm running in thread Thread[#10023,Thread-9993,5,main]
I'm running in thread Thread[#10025,Thread-9995,5,main]
I'm running in thread Thread[#10026,Thread-9996,5,main]
I'm running in thread Thread[#10028,Thread-9998,5,main]
Completed 9999 tasks in 2394ms

What happens if we use the same example but just instantiating virtual threads?

...
List<Thread> threads = Arrays.stream(numbers).mapToObj(num ->
                Thread.ofVirtual()
                        .start(task)
        ).toList();
...

As we observed in the previous example, virtual threads provide a much better throughput, as seen in the results below.

I'm running in thread VirtualThread[#10029]/runnable@ForkJoinPool-1-worker-4
I'm running in thread VirtualThread[#10030]/runnable@ForkJoinPool-1-worker-4
I'm running in thread VirtualThread[#10031]/runnable@ForkJoinPool-1-worker-4
I'm running in thread VirtualThread[#9954]/runnable@ForkJoinPool-1-worker-5
Completed 9999 tasks in 722ms

Again, virtual threads beat platform threads with a considerable difference.

Please keep in mind that these timings are not accurate, in the sense that we’re not running proper benchmarks. We are not warming up the JVM to give the JIT compiler time to perform optimisations, and we’re only running a single execution. This is just to show you how much our throughput can improve with virtual threads!

One last thing we wanted to mention is that virtual threads also open the door to introducing structured concurrency in Java. This change will make Java code much safer when running multiple concurrent tasks at different nesting levels.

Java will soon introduce structured concurrency and its StructuredTaskScope as part of JEP 428; this is quite similar to what Kotlin does with its coroutine scopes.

Conclusion

In this article we have seen how virtual threads will solve one of the major problems in the Java ecosystem. The existing parity between OS threads and platform threads was a huge limitation factor for some applications due to the limit in the number of OS threads in a host.

Asynchronous programming has been our saviour for a long time; however, we see virtual threads as a big factor in what we think will be the death of asynchronous programming as we know it. Simpler concurrency paradigms will be adopted once this change is made final in one of the next JDK releases.

That’s all from us today! We really hope that you have enjoyed this article and learned something new about the JVM ecosystem. We think the future is bright for the JVM ecosystem and all the developers and languages that are part of this community after this upcoming change.

Looking forward to seeing you again very soon! Thanks for reading us!