
End of Line blog

Thoughts on software development, by Adam Ruka

Graal Truffle tutorial part 0 – what is Truffle?

This article is part of a tutorial on GraalVM's Truffle language implementation framework.


I recently worked with GraalVM’s Truffle language implementation framework. I wanted to write a tutorial about it, as I’ve found the existing explanations of how it works difficult to follow (I had to piece together a lot of information from different sources to get the whole picture). Since it’s a very large topic, I’m breaking it up into multiple parts. In this first installment, I want to explain exactly what Truffle and Graal are, and how they work together to help you easily create a high-performance implementation of a programming language.

Truffle is very closely tied to a different research project from Oracle Labs – Graal. So, to understand Truffle, we need to start with Graal.

Graal

So, what is Graal? Fortunately, this first question has a very straightforward answer. Graal is a compiler. It’s written in Java, and can be called from Java like any other library. It takes Java Virtual Machine bytecode as its input, so let’s spend a second explaining what that is in case you’ve never seen it before.

Bytecode is the binary executable format that the JVM uses for programs running on it. It is what gets produced when you compile Java code, and it’s stored in files with the .class extension; the JAR files that Java libraries are commonly distributed as are simply archives of those bytecode files (plus some metadata).

For example, if you have a Java class like this:

public class Example {
    public static int increment(int argument) {
        return argument + 1;
    }
}

You can compile it with javac:

$ javac Example.java

That produces a file Example.class that contains the bytecode. We can look at it using the javap tool:

$ javap -c Example.class
public class Example {
  public Example();
    Code:
       0: aload_0
       1: invokespecial #1          // Method java/lang/Object."<init>":()V
       4: return

  public static int increment(int);
    Code:
       0: iload_0
       1: iconst_1
       2: iadd
       3: ireturn
}

This shows that bytecode is quite a bit lower-level than the Java language. Interestingly, we can also see the default constructor, which does nothing but call the constructor of Example's superclass, java.lang.Object. It was generated by the javac compiler, since we did not declare a constructor ourselves, and each class in Java has to have at least one constructor (otherwise, it would be impossible to instantiate or subclass it!).

So, Graal is a compiler that takes this bytecode as input. As its output, it produces machine code, as well as several intermediate representations that are useful for debugging purposes (we’ll see some of them in later articles in the series).

GraalVM

So, that is Graal. But what’s the difference between Graal, and GraalVM? To answer that, we first need to dig a little deeper into the JVM architecture.

The way the JVM achieves high performance is Just-In-Time compilation. When the JVM first starts executing code, it simply interprets the bytecode that we’ve seen above. However, as time passes, and the JVM gathers profiling information about the execution of the bytecode, it turns the bytecode that is invoked frequently into machine code, which is multiple orders of magnitude faster to execute. This process is called Just-In-Time (JIT) compilation, because the machine code is generated not during the process of compiling the source files (which would be Ahead-Of-Time, or AOT), but during the execution of the program.
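You can watch JIT compilation happen on a stock JVM. The sketch below is a toy example of mine (the class and method names are made up for illustration): it hammers one small method in a loop until it becomes "hot". Running it with the standard HotSpot flag `java -XX:+PrintCompilation HotLoop` prints a log line for each method as it gets compiled from bytecode to machine code.

```java
public class HotLoop {
    // A small method that becomes "hot" once it's called many times.
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        // After enough iterations, the JIT compiler replaces the
        // interpreted bytecode of square() with machine code.
        for (int i = 0; i < 1_000_000; i++) {
            sum += square(i);
        }
        System.out.println(sum);
    }
}
```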

The JIT compiler used in most distributions of the JVM is HotSpot. Its codebase dates back to the 1990s, and it’s written in C++. Since a JIT compiler is a very complex piece of technology, C++’s unmanaged nature means every potential bug in its code can lead to serious JVM issues, like runtime crashes, security vulnerabilities, or memory leaks. All of these factors make developing HotSpot so difficult that only a handful of experts in the world can realistically do it, and that slow pace of development means HotSpot lags behind the current state of the art when it comes to supporting the newest compiler optimizations.

Graal, since it’s written in Java, is much simpler and less risky to evolve. The whole point of Java as a language is to be higher-level than C++ and to make the developers using it more productive, and that holds just as much for a piece of software as complex as a JIT compiler.

So, GraalVM is a JVM distribution that uses Graal as its JIT compiler, instead of the HotSpot compiler. It’s possible because of the JVM Compiler Interface (JVMCI, JEP 243), a JDK extension introduced in Java 9 that allows you to swap out the JIT compiler implementation the JVM uses.

Truffle

Which leads me to Truffle. Truffle is a language implementation framework whose goal is to make it much easier to develop high-performance implementations of new programming languages.

Traditionally, there are 2 ways to implement a new programming language:

  1. An interpreter, which is (relatively, of course) easy to implement, but slow.
  2. A compiler that generates code, which is much more efficient to execute than the interpreter, but much, much more difficult to actually implement (especially if you want to support sophisticated optimizations).

Truffle’s goal is to combine the best traits of both approaches. It allows you to write a simple interpreter for your custom language in Java, using the framework it provides, which then gets JIT-compiled by GraalVM into efficient machine code. The end result (that is, the execution speed of your new language) is pretty much the same as what you would get if you wrote a sophisticated optimizing compiler yourself, but with a tiny fraction of the development effort.

For example, an interpreter for a simple language that allows only addition of integer literals, would look something like this:

import com.oracle.truffle.api.nodes.Node;

// Node is a class from Truffle
public abstract class MyNode extends Node {
    public abstract int executeInt();
}

public class IntLiteralNode extends MyNode {
    private final int value;

    public IntLiteralNode(int value) {
        this.value = value;
    }

    @Override
    public int executeInt() {
        return this.value;
    }
}

public class IntAddNode extends MyNode {
    private final MyNode left, right;

    public IntAddNode(MyNode left, MyNode right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public int executeInt() {
        int leftResult = this.left.executeInt();
        int rightResult = this.right.executeInt();
        return leftResult + rightResult;
    }
}

How does Truffle change that straightforward interpreter into efficient machine code? It uses something called partial evaluation. Partial evaluation is a very interesting topic in theoretical computer science – see the Wikipedia article on Futamura projections for some pretty mind-bending ideas. But for our case, it means compiling specific instances of a class, as opposed to the entire class, with aggressive optimizations, mainly inlining, constant folding and dead code elimination.
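To build some intuition before the Truffle example, here’s partial evaluation worked by hand on a toy example of my own (not Truffle code): specialize a generic function with respect to one argument that is known to be constant, and let loop unrolling, constant folding and dead code elimination do the rest.

```java
public class PartialEval {
    // A generic, "interpreter-like" method: the loop bound n is an input.
    static int power(int base, int n) {
        int result = 1;
        for (int i = 0; i < n; i++) {
            result *= base;
        }
        return result;
    }

    // Partially evaluating power() with respect to n == 3: the loop
    // bound is now a known constant, so the loop can be fully unrolled
    // and the loop counter eliminated entirely.
    static int power3(int base) {
        return base * base * base;
    }

    public static void main(String[] args) {
        // Both compute the same result, but power3 has no loop left.
        System.out.println(power(2, 3) + " " + power3(2));
    }
}
```

This is exactly the move Graal makes with the interpreter below: the node objects play the role of the constant argument, and the interpreter code gets specialized away with respect to them.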

Let’s see a concrete example. Let’s say you’re interpreting the following program in our simple “integer literal with additions” language from above: 12 + 34. After parsing and creating the correct Nodes from the parse tree, the interpreter looks something like this:

public int interpretMyLanguage() {
    MyNode node = new IntAddNode(
            new IntLiteralNode(12),
            new IntLiteralNode(34));
    return node.executeInt();
}

If the number of invocations of this code exceeds some predetermined threshold, Truffle sends it to Graal for compilation to machine code using partial evaluation (remember, Graal is a compiler written in Java, which means it exposes an API that can be invoked from other Java code). Even though this code operates on specific instances of classes like IntAddNode and IntLiteralNode, Graal has access to their classes’ original bytecode through standard JVM reflection. Graal aggressively inlines the executeInt methods and constant-folds the fields of these objects, and produces something like this (I’m showing the code in Static Single Assignment form):

int interpretMyLanguage();
    $tmp1 = 12            # left IntLiteralNode, inlined and constant-folded
    $tmp2 = 34            # right IntLiteralNode, inlined and constant-folded
    $tmp3 = $tmp1 + $tmp2 # inlined executeInt() from IntAddNode
    return $tmp3

(By the way, it’s very probable that Graal would keep applying constant folding until it reduced this code to just return 46; but for illustrative purposes, let’s leave it as-is.)

Looking at the code generated by Graal, we see that something incredible has happened: the interpreter is gone! There is no mention of classes like IntLiteralNode and IntAddNode anymore; in fact, there is not even an executeInt() method! All of the interpreter code has been eliminated by partial evaluation of the class instances.

What is even more incredible, the above code is exactly what you would expect a compiler for the language to produce for the program 12 + 34! But we never had to worry about writing a code generator, making sure it supports all of the different processor architectures the JVM runs on, etc. We just wrote a straightforward Java interpreter for our language, and Truffle took care of transforming it to efficient machine code.

Truffle to Graal hints

What is more, because Truffle knows that it will be Graal doing the compilation to native code, and not some generic compiler, it allows the interpreter to add hints to its code that will help Graal produce more efficient machine code when JITting.

We’ll see many of these hints used in later parts of the tutorial, but here’s a taste. Imagine we want to extend our simple “integer literals with addition” language with a built-in average function. It takes any number of arguments, and returns their arithmetic average. For example, avg(1, 2, 3) is (1 + 2 + 3) / 3 = 2.

The implementation of the IntAverageNode in the interpreter is fairly simple:

import java.util.List;

public class IntAverageNode extends MyNode {
    private final List<MyNode> args;

    public IntAverageNode(List<MyNode> args) {
        this.args = args;
    }

    @Override
    public int executeInt() {
        int sum = 0;
        for (int i = 0; i < args.size(); i++) {
            sum += args.get(i).executeInt();
        }
        return sum / args.size();
    }
}
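As a quick sanity check of the interpreter logic itself, here’s a Truffle-free sketch of mine that mirrors the classes above with a plain interface in place of Truffle’s Node class (the Simple* names are made up), so it runs on any JDK:

```java
import java.util.List;

interface SimpleNode {
    int executeInt();
}

// Mirrors IntLiteralNode from the tutorial, without the Truffle dependency.
final class SimpleIntLiteral implements SimpleNode {
    private final int value;
    SimpleIntLiteral(int value) { this.value = value; }
    @Override public int executeInt() { return value; }
}

// Mirrors IntAverageNode: sums the arguments, then divides by their count.
final class SimpleIntAverage implements SimpleNode {
    private final List<SimpleNode> args;
    SimpleIntAverage(List<SimpleNode> args) { this.args = args; }
    @Override public int executeInt() {
        int sum = 0;
        for (SimpleNode arg : args) {
            sum += arg.executeInt();
        }
        return sum / args.size();
    }
}

public class AvgDemo {
    public static void main(String[] args) {
        // avg(1, 2, 3) == (1 + 2 + 3) / 3 == 2
        SimpleNode avg = new SimpleIntAverage(List.of(
                new SimpleIntLiteral(1),
                new SimpleIntLiteral(2),
                new SimpleIntLiteral(3)));
        System.out.println(avg.executeInt());
    }
}
```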

However, there’s something interesting here. Even though the interpreter uses a loop, every instance of the IntAverageNode class will require a constant number of iterations to calculate its result, and that number will never change throughout the lifetime of a specific IntAverageNode object. For instance, our example from above, avg(1, 2, 3), will always require exactly 3 iterations of that loop, and that number can’t ever change.

So, in your interpreter, you can tell Graal that this is in fact the case by annotating your executeInt method with Truffle’s @ExplodeLoop annotation:

public class IntAverageNode extends MyNode {
    // same as above...

    @Override
    @ExplodeLoop
    public int executeInt() {
        // implementation as above...
    }
}

This way, when this code gets JITted, Graal will know to apply the loop unrolling optimization, which, when combined with inlining and constant folding, will produce the following code for the avg(1, 2, 3) case (again in SSA form):

    $sum = 0
    $tmp1 = $sum  + 1  # $tmp1 == 1
    $tmp2 = $tmp1 + 2  # $tmp2 == 3
    $tmp3 = $tmp2 + 3  # $tmp3 == 6
    $tmp4 = $tmp3 / 3  # $tmp4 == 2
    return $tmp4

The loop is completely gone, and the final executable code is (again) pretty much exactly what you would expect to be generated from a compiler for the language.

Summary

So, these are the basics of Truffle (and Graal). In later parts of the series, we’ll dive deeper into the various elements of writing the interpreter, and explore its performance.
