As Java developers, most of us overlook the importance of understanding how the machine that runs our applications actually works. In the case of Java or Kotlin developers, that machine is the JVM (Java Virtual Machine), and one of the components with the biggest impact on its performance is the JIT compiler. As most of you will know, the JVM uses a JIT (Just-In-Time) compiler to optimise our code at execution time, and this is one of the main mechanisms for improving performance in our applications.
Java is a compiled language, but it gets compiled to an intermediate low-level representation called bytecode, which gives Java the portability to run on any kind of machine, as long as that machine runs a JVM.
The JVM is then capable of compiling this intermediate code into platform-specific binary code at execution time; that is precisely why we call this compiler a "Just-In-Time" compiler. Initially, many sections of the code are interpreted, which is faster than paying compilation plus execution time if a method is executed only once; however, if a method gets executed very frequently, compiling and optimising it becomes a big improvement.
That’s exactly what the JIT compiler does.
The JIT compiler design is based on a very simple principle: compile and optimise the sections of code that are executed most frequently. This is a very sensible and pragmatic approach: why optimise a section of code, even if it gets 80% faster, if that code is rarely executed?
Those frequently used code sections are called hot spots, which gives name to the HotSpot JVM.
Some of you might be asking: why not compile and optimise all our code before our app starts executing? The main reason is that the more a method gets executed, the more the JIT compiler knows about it, and the bigger the performance improvement will be. So the code needs to run before it can be optimised; additionally, the compiled code is stored in code caches that have a limited size.
A bit of history
There are two different types of compiler: the client compiler (also called the C1 compiler) and the server compiler (also called the C2 compiler).
Historically, the client compiler was used for short-lived applications or any application where startup time had to be as fast as possible; for long-lived applications, however, the server compiler was used, as it optimises the code for the long run.
Both compilers still exist in newer versions of Java, but the main difference is that in previous versions an application had to be started with just one of them, whereas in newer versions both compilers are always used, and the flags that used to select one of them are no longer in use.
This new way of running both compilers is what is called tiered compilation.
In versions earlier than JDK 8, we had to specify the -client flag to use the client compiler and -server or -d64 to use the server compiler.
Since JDK 8 these flags have no effect.
In tiered compilation, then, we use both compilers. The C1 compiler starts compiling first, while the C2 compiler waits and profiles some sections of code to gather information about them.
That allows the C1 compiler to improve performance sooner; the C2 compiler, however, will be able to make much better optimisations, as it has better information about the hot methods.
Tiered compilation is enabled by default since JDK 8, but if for some reason you want to disable it, the flag to use is -XX:-TieredCompilation.
Analyse JIT compilation logs
By default JIT compiler logs are disabled, so in order to be able to see them we’d have to use the following JVM flag: -XX:+PrintCompilation
Another way of analysing JIT logs is by enabling the logs to be written to a log file. We can do that passing the following JVM flags:
-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=jit_compiler.log
The format of the logs produced by the LogCompilation flag is very complex, which is why most of the time we'd use a tool like JITWatch to analyse them. For this reason, we'll explain only the logs enabled by the PrintCompilation flag, as they're much easier to read.
Even in the PrintCompilation format, the way these logs are printed is not intuitive at all, so you'll have to memorise or look up what each field means.
First of all, let's look at an example showing several lines of JIT compiler logs; then we'll go through what each element in a log line means, as there are several things to cover.
78 39 3 java.util.concurrent.ConcurrentHashMap::putIfAbsent (8 bytes)
79 42 n 0 jdk.internal.misc.Unsafe::compareAndSetLong (native)
79 40 ! 3 java.util.concurrent.ConcurrentHashMap::putVal (432 bytes)
79 44 n 0 jdk.internal.misc.Unsafe::compareAndSetObject (native)
80 41 3 java.util.concurrent.ConcurrentHashMap::addCount (289 bytes)
From left to right, these are the fields we can see in each log line:
- Time since application startup (in milliseconds)
- An incremental id assigned to each compiled method
- Attributes indicating the state of the code being compiled
- The level of compilation
- The method name
- The size of the Java bytecode
- A de-optimisation message, when a de-optimisation happens
Let’s explain some of them in more detail:
There are five different types of attributes to express the state of the compilation. These are:
- %: an On-Stack Replacement (OSR) occurred
- s: the method is synchronised
- !: the method contains an exception handler
- b: compilation occurred in blocking mode
- n: the compilation transformed a wrapper to a native method
You will never see the blocking mode, as it only happens when background compiling is disabled; background compiling is enabled by default and there's not really a good reason to disable it.
As a quick summary, an On-Stack Replacement (OSR) allows the JVM to replace the interpreted version of a method with its compiled version while the method is still running, typically in the middle of a long-running loop.
Although having two compilers might make us think that we'd have two tier levels, tiered compilation actually has five.
The different values for the tier level field are:
- blank: only when tiered compilation is disabled
- 0: the code has been interpreted
- 1: simple C1 compilation
- 2: limited C1 compilation
- 3: full C1 compilation
- 4: C2 compilation
In an ideal scenario, our code gets fully optimised by going directly from level 0 (interpreted, the initial state of any method) to level 3 and then level 4.
However, in some scenarios that for the sake of brevity we won't cover extensively in this article, the code will be compiled to levels 1 and 2.
Summarising, level 2 will be used when either the C1 or the C2 compiler queue is full.
With respect to level 1, some methods could end up there if they're too trivial to benefit from further profiling; those methods transition to level 1 from levels 3 and 4.
One thing that could happen is that the code gets de-optimised, which is basically like a rollback of a previous compilation.
In those cases where a de-optimisation happens, we'll see some text in the last column of our JIT logs. That field will normally have one of these two values: "made not entrant" or "made zombie".
In the case of not entrant code, there are two situations where this can happen. One of them is due to the way tiered compilation works and is totally expected; the other is basically due to "a change of mind" in the compiler.
Let's say the compiler made an optimisation based on the information it had gathered from the execution up to that point; afterwards, it gathered more information that made it roll back the previous optimisation, so it could compile the method again and re-optimise the code based on the new information.
With respect to zombie code, this is basically a clean-up mechanism. After a section of code is marked as not entrant, it will eventually be marked as zombie and become collectable, so that space can be freed from the code cache.
Let’s review and clarify what we’ve seen so far by going through a simple example. I hope that helps you settle all these new concepts in your mind.
We're going to use quite a simple piece of code where we have two implementations of a LogPrinter interface, and we loop through 1,000,000 elements, initially using the ConsoleLogPrinter implementation and then switching to the other implementation, named FileLogPrinter, after N iterations.
And now a main method that will be doing the iterations:
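A minimal sketch of what such a class might look like, consistent with the class names in the JIT logs below (the exact switch point of 700,000 iterations and the log file name are assumptions for illustration):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeOptimisationExample {

    interface LogPrinter {
        void printLog(String message);
    }

    static class ConsoleLogPrinter implements LogPrinter {
        @Override
        public void printLog(String message) {
            System.out.println(message);
        }
    }

    static class FileLogPrinter implements LogPrinter, AutoCloseable {
        private final BufferedWriter writer;

        FileLogPrinter(Path logFile) {
            try {
                this.writer = Files.newBufferedWriter(logFile);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public void printLog(String message) {
            try {
                writer.write(message);
                writer.newLine();
            } catch (IOException e) {
                // This try/catch is what produces the '!' attribute in the JIT logs
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public void close() {
            try {
                writer.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        final int totalIterations = 1_000_000;
        final int switchPoint = 700_000; // assumed switch point based on the iteration counts in the text

        LogPrinter consolePrinter = new ConsoleLogPrinter();
        try (FileLogPrinter filePrinter = new FileLogPrinter(Path.of("file-printer.log"))) {
            for (int i = 0; i < totalIterations; i++) {
                // Use ConsoleLogPrinter first, then switch to FileLogPrinter
                LogPrinter printer = i < switchPoint ? consolePrinter : filePrinter;
                printer.printLog("Iteration " + i);
            }
        }
    }
}
```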
Quite simple, right? Let's run this example passing -XX:+PrintCompilation as a VM flag. We'll omit many of the JIT logs for simplicity and show just the ones we're interested in.
528 145 % 3 com.theboreddev.examples.jvm.compiler.DeOptimisationExample::main @ 2 (80 bytes)
This first log shows that an OSR took place for the main method at tier level 3; this basically indicates that a compiled version of the main method's running loop replaced the initially interpreted version.
A few instants earlier, the printLog method belonging to ConsoleLogPrinter had been compiled at tier level 3:
464 110 3 com.theboreddev.examples.jvm.compiler.DeOptimisationExample$ConsoleLogPrinter::printLog (9 bytes)
A few milliseconds after the main method was compiled at level 3 (keep in mind that the first column is a timestamp in milliseconds since the start of the application), the same printLog method was compiled and optimised at tier level 4, and the previously compiled version at level 3 was made obsolete.
This is the situation we mentioned earlier as an expected de-optimisation under tiered compilation.
543 140 4 com.theboreddev.examples.jvm.compiler.DeOptimisationExample$ConsoleLogPrinter::printLog (9 bytes) 555 110 3 com.theboreddev.examples.jvm.compiler.DeOptimisationExample$ConsoleLogPrinter::printLog (9 bytes) made not entrant
What happens next is that the main method gets compiled at tier level 4, and the previous compilation is made obsolete by marking it as not entrant.
569 149 % 4 com.theboreddev.examples.jvm.compiler.DeOptimisationExample::main @ 2 (80 bytes) 628 145 % 3 com.theboreddev.examples.jvm.compiler.DeOptimisationExample::main @ 2 (80 bytes) made not entrant
What happens next is interesting: for the first time, we see code compiled at tier level 4, which was supposed to be fully optimised, being marked as not entrant! This is a de-optimisation that is not due to the normal workings of tiered compilation.
831 149 % 4 com.theboreddev.examples.jvm.compiler.DeOptimisationExample::main @ 2 (80 bytes) made not entrant
The compiler rolls back this version because, based on the new information it has gathered, it detects that the compiled code is no longer optimal.
Later on, the printLog method corresponding to FileLogPrinter is compiled at level 3. Please notice the exclamation mark, which indicates that the method contains an exception handler.
986 235 ! 3 com.theboreddev.examples.jvm.compiler.DeOptimisationExample$FileLogPrinter::printLog (305 bytes)
A few milliseconds later, the main method gets recompiled with a new version at tier level 3 for a second time.
1083 269 % 3 com.theboreddev.examples.jvm.compiler.DeOptimisationExample::main @ 2 (80 bytes)
Now more than half a second passes until the FileLogPrinter.printLog method gets compiled at tier level 4; this is probably because it's more complex than the ConsoleLogPrinter implementation.
1672 296 ! 4 com.theboreddev.examples.jvm.compiler.DeOptimisationExample$FileLogPrinter::printLog (305 bytes) 1834 235 ! 3 com.theboreddev.examples.jvm.compiler.DeOptimisationExample$FileLogPrinter::printLog (305 bytes) made not entrant
Then, more than three seconds later, the main method gets compiled at tier level 4 again, replacing the previously compiled version at level 3.
5153 306 % 4 com.theboreddev.examples.jvm.compiler.DeOptimisationExample::main @ 2 (80 bytes) 5224 269 % 3 com.theboreddev.examples.jvm.compiler.DeOptimisationExample::main @ 2 (80 bytes) made not entrant
In our code, the loop calls only ConsoleLogPrinter for the first 699,999 iterations, which makes the compiler optimise the call site for that single implementation, something called "monomorphic dispatch".
Later on, when the loop starts using FileLogPrinter, the JVM detects that the existing compiled version is no longer valid, so a de-optimisation occurs and the code has to be recompiled.
We’ve seen the two scenarios where de-optimisations can happen:
- The expected de-optimisation of level 3 compiled code in favour of level 4 compiled code in tiered compilation.
- The scenario where the compiler optimises code based on the information available at the time, but that information later changes at execution time.
Other things to look at
Although this is a topic that could be discussed extensively, some other things we should take into consideration when analysing our code are:
- Code cache sizes
Watch for issues with code cache sizes. For example, you can use JDK Mission Control (JMC) to inspect them under its "Memory" section.
There are three different types of code cache whose sizes you can check in JMC; explaining how these caches work and their different types is out of the scope of this article.
Some signs that there could be issues with the code cache size are: "COMPILE SKIPPED" log messages; multiple level 1 and level 2 compilations while level 3 and 4 compilations stay queued; and, lastly, a specific warning from the JIT compiler saying "CodeCache is full. Compiler has been disabled".
If you have problems with code cache sizes, you can check Oracle's documentation on how to tune the code caches.
- Compiler thresholds
Although normally we don't need to modify these flags, there are two flags that tell the JIT compiler after how many invocations a method should become a candidate for compilation.
The flags to modify these values are -XX:Tier3InvocationThreshold=N and -XX:Tier4InvocationThreshold=N. The default values are 200 and 5000 respectively.
- Compiler threads
The number of compiler threads is set based on the CPUs available on the machine, but it can be overridden with a flag; again, most of the time it won't need to be modified.
That flag sets the total number of compiler threads, and the way it works is that a third of the threads are assigned to the C1 compiler and the remaining threads to the C2 compiler. Of course, if a third is less than one, a minimum of one thread is assigned to that compiler.
This can be a very extensive topic, so I've tried not to overwhelm you with too much information in this article; however, there are still some interesting things to talk about. I'll cover those in the second part of this article, "Java JIT compiler explained – Part 2".
If you are interested in reading more Java articles, you can find our articles about Java here.
I really hope you’ve enjoyed this reading and that I made you feel a bit more interested in the internals of the JVM. If you like my articles click “Subscribe” to receive a notification when a new post is published.
Thank you for reading!