Graal Truffle tutorial part 4 – parsing, and the TruffleLanguage class
This article is part of a tutorial on GraalVM's Truffle language implementation framework.
- Part 0 – what is Truffle
- Part 1 – setup, Nodes, CallTarget
- Part 2 – introduction to specializations
- Part 3 – specializations with Truffle DSL, TypeSystem
- Part 4 – parsing, and the TruffleLanguage class
- Part 5 – global variables
- Part 6 – static function calls
- Part 7 – function definitions
- Part 8 – conditionals, loops, control flow
- Part 9 – performance benchmarking
- Part 10 – arrays, read-only properties
- Part 11 – strings, static method calls
- Part 12 – classes 1: methods, new
- Part 13 – classes 2: fields, this, constructors
- Part 14 – classes 3: inheritance, super
- Part 15 – exceptions
Parsing
Up to this point, when writing unit tests for our EasyScript implementation, we created the AST Nodes explicitly, like this:
EasyScriptNode exprNode = new AdditionNode(
        new IntLiteralNode(1),
        new DoubleLiteralNode(2.0));
But, of course, that’s not the way you write programming language code.
Normally, a program is written as text in a file; for example, the above expression would be written simply as 1 + 2.0.
The process of transforming that text into an abstract syntax tree is called parsing.
You might be surprised to learn that Truffle does not ship out of the box with any tools to parse your language. Since this is a task every language implementation will have to perform, that might seem like a mistake. However, when considering the problem a little deeper, I think there are good reasons for making that choice.
There are many different parsing algorithms (LL(n), recursive descent, LALR, etc.), each with different tradeoffs around performance, memory usage, context-free grammar constraints, etc. It makes sense for Truffle to not want to be overly prescriptive in the matter, and give the language implementer complete freedom in choosing the right tool for their particular circumstances.
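To make one of these algorithms concrete, here is a minimal sketch of a hand-written recursive descent parser for sums of number literals. It is not how EasyScript parses code (the series uses ANTLR), and every name in it (TinyAdditionParser, Literal, Add) is hypothetical; it only illustrates how text becomes a tree, and it is more lenient about number formats than a real lexer would be.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written recursive descent parser for sums of number literals (illustration only). */
public final class TinyAdditionParser {
    /** Marker type for the two kinds of tree nodes. */
    public interface Expr {}

    public static final class Literal implements Expr {
        final String text;
        Literal(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    public static final class Add implements Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    private final List<String> tokens;
    private int position = 0;

    private TinyAdditionParser(List<String> tokens) { this.tokens = tokens; }

    public static Expr parse(String program) {
        var parser = new TinyAdditionParser(tokenize(program));
        Expr result = parser.parseExpr();
        if (parser.position < parser.tokens.size()) {
            throw new IllegalArgumentException(
                    "unexpected token: " + parser.tokens.get(parser.position));
        }
        return result;
    }

    // expr : literal ('+' literal)* ; the loop builds a left-leaning tree
    private Expr parseExpr() {
        Expr left = parseLiteral();
        while (position < tokens.size() && tokens.get(position).equals("+")) {
            position++; // consume '+'
            left = new Add(left, parseLiteral());
        }
        return left;
    }

    private Expr parseLiteral() {
        if (position >= tokens.size() || tokens.get(position).equals("+")) {
            throw new IllegalArgumentException("expected a number at token " + position);
        }
        return new Literal(tokens.get(position++));
    }

    // Splits the input into '+' and number tokens, skipping whitespace.
    private static List<String> tokenize(String program) {
        var result = new ArrayList<String>();
        int i = 0;
        while (i < program.length()) {
            char c = program.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '+') {
                result.add("+");
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < program.length() && Character.isDigit(program.charAt(i))) i++;
                if (i < program.length() && program.charAt(i) == '.') {
                    i++; // simplified: does not insist on digits after the dot
                    while (i < program.length() && Character.isDigit(program.charAt(i))) i++;
                }
                result.add(program.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return result;
    }
}
```

Calling TinyAdditionParser.parse("1 + 2 + 3.0") yields a tree that prints as ((1 + 2) + 3.0), which shows the left-to-right grouping of repeated additions.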
Another factor in making that decision is that Java has a wealth of libraries to choose from for the task of parsing. This blog article gives a nice overview.
For this article series, I’ll be using ANTLR. It’s one of the oldest and most battle-tested of all the libraries, and I like that the result of executing it is the parse tree, instead of forcing you to build the AST yourself by inserting code directly into the grammar file, which I consider an anti-pattern.
I won’t be focusing too much in these posts on the parsing aspect of implementing a new language, nor on the details of using ANTLR, as both of those are deep topics, each worthy of its own article series. Feel free to use my code as the jumping-off point when implementing your own language, and consult the excellent ANTLR documentation as needed.
Here’s the ANTLR context-free grammar for our simple language from part 3 that allows addition of integer and double literals:
grammar EasyScript ;
@header{
package com.endoflineblog.truffle.part_04;
}
start : expr EOF ;
expr : left=expr '+' right=expr #AddExpr
| literal #LiteralExpr
;
literal : INT | DOUBLE ;
fragment DIGIT : [0-9] ;
INT : DIGIT+ ;
DOUBLE : DIGIT+ '.' DIGIT+ ;
// skip all whitespace
WS : (' ' | '\r' | '\t' | '\n' | '\f')+ -> skip ;
And here is the actual parser code. It works by first invoking ANTLR to get the parse tree, and then turns that parse tree into our Truffle AST:
import java.io.IOException;
import java.io.Reader;

import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.TerminalNode;

public final class EasyScriptTruffleParser {
    public static EasyScriptNode parse(String program) {
        return parse(CharStreams.fromString(program));
    }

    public static EasyScriptNode parse(Reader program) throws IOException {
        return parse(CharStreams.fromReader(program));
    }

    private static EasyScriptNode parse(CharStream inputStream) {
        var lexer = new EasyScriptLexer(inputStream);
        // remove the default console error listener
        lexer.removeErrorListeners();
        var parser = new EasyScriptParser(new CommonTokenStream(lexer));
        // remove the default console error listener
        parser.removeErrorListeners();
        // throw an exception when a parsing error is encountered
        parser.setErrorHandler(new BailErrorStrategy());
        EasyScriptParser.ExprContext context = parser.start().expr();
        return parseExpr(context);
    }

    // the grammar has only two labeled alternatives for 'expr',
    // so anything that is not an addition must be a literal
    private static EasyScriptNode parseExpr(EasyScriptParser.ExprContext expr) {
        return expr instanceof EasyScriptParser.AddExprContext
                ? parseAdditionExpr((EasyScriptParser.AddExprContext) expr)
                : parseLiteralExpr((EasyScriptParser.LiteralExprContext) expr);
    }

    private static AdditionNode parseAdditionExpr(EasyScriptParser.AddExprContext addExpr) {
        return AdditionNodeGen.create(
                parseExpr(addExpr.left),
                parseExpr(addExpr.right)
        );
    }

    private static EasyScriptNode parseLiteralExpr(EasyScriptParser.LiteralExprContext literalExpr) {
        TerminalNode intTerminal = literalExpr.literal().INT();
        return intTerminal != null
                ? parseIntLiteral(intTerminal.getText())
                : parseDoubleLiteral(literalExpr.getText());
    }

    private static EasyScriptNode parseIntLiteral(String text) {
        try {
            return new IntLiteralNode(Integer.parseInt(text));
        } catch (NumberFormatException e) {
            // it's possible that the integer literal is too big to fit in a 32-bit Java `int` -
            // in that case, fall back to a double literal
            return parseDoubleLiteral(text);
        }
    }

    private static DoubleLiteralNode parseDoubleLiteral(String text) {
        return new DoubleLiteralNode(Double.parseDouble(text));
    }
}
(EasyScriptLexer and EasyScriptParser are classes generated from the grammar by ANTLR; in my case, at build time using the ANTLR Gradle plugin.)
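The fallback in parseIntLiteral relies on Integer.parseInt throwing NumberFormatException for digit strings that do not fit in 32 bits. Here is a small standalone sketch of just that behavior, using plain boxed values instead of Truffle nodes; the class and method names (NumberLiteralDemo, parseNumber) are made up for illustration.

```java
public final class NumberLiteralDemo {
    /** Returns an Integer when the text fits in a 32-bit int, otherwise a Double. */
    public static Object parseNumber(String text) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            // too large for an int: keep the value, but as a double
            return Double.parseDouble(text);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("2147483647")); // prints 2147483647 (Integer.MAX_VALUE)
        System.out.println(parseNumber("2147483648")); // prints 2.147483648E9 (a Double)
    }
}
```

The consequence is that a literal in the source never fails to parse just because it is large; it simply starts life as a double.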
With this in place, we can write our first real EasyScript program!
import com.oracle.truffle.api.CallTarget;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class ParsingTest {
    @Test
    public void parses_and_executes_EasyScript_code_correctly() {
        EasyScriptNode exprNode = EasyScriptTruffleParser.parse("1 + 2 + 3.0 + 4");
        var rootNode = new EasyScriptRootNode(exprNode);
        CallTarget callTarget = rootNode.getCallTarget();

        var result = callTarget.call();

        assertEquals(10.0, result);
    }
}
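The test expects the double 10.0 rather than the integer 10 because ANTLR groups the left-recursive + rule from the left, so the program is evaluated as ((1 + 2) + 3.0) + 4, and the result becomes a double as soon as one operand is a double. The sketch below traces that with plain Java values; it assumes the addition rules from part 3 (int addition that switches to double on overflow), and the names AdditionTrace and add are hypothetical stand-ins for the real AdditionNode.

```java
public final class AdditionTrace {
    /** Adds two boxed numbers, staying in int when both operands are ints and the sum fits. */
    public static Object add(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            int l = (Integer) left;
            int r = (Integer) right;
            try {
                return Math.addExact(l, r);
            } catch (ArithmeticException e) {
                // int overflow: fall through to the double addition below
            }
        }
        return ((Number) left).doubleValue() + ((Number) right).doubleValue();
    }

    public static void main(String[] args) {
        Object step1 = add(1, 2);       // Integer 3
        Object step2 = add(step1, 3.0); // Double 6.0
        Object step3 = add(step2, 4);   // Double 10.0
        System.out.println(step3);      // prints 10.0
    }
}
```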
GraalVM’s polyglot API
One of the reasons that Truffle was created in the first place is to make GraalVM the best possible multi-language virtual machine environment. The vision for GraalVM is to allow programmers to freely mix code between many languages in the same program, taking the maxim of “use the best tool for the job” to the extreme. The way all of these different languages can communicate with each other is GraalVM’s polyglot API.
For example, the Graal team maintains a JavaScript implementation (it used to ship bundled with GraalVM, but since version 22 it’s a separate library that you have to depend on in your build.gradle or pom.xml file), and we can write a simple unit test executing a JavaScript program straight from Java:
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class PolyglotTest {
    @Test
    public void runs_JavaScript_code_correctly() {
        Context context = Context.create();
        Value result = context.eval("js",
                "function sub13(x) { return x - 13; } sub13(25)");
        assertEquals(12, result.asInt());
    }
}
Context is the entrypoint to the polyglot API, and we can use it to evaluate programs with different registered languages (what GraalVM calls “guest languages”).
Value is a general class that wraps the result of executing a guest-language program. It can be as simple as a single integer, or as complex as a function that you can invoke from Java, or any other JVM-compatible language like Kotlin, Scala or Groovy (what GraalVM often refers to as the “host language”).
For more information, check out the GraalVM polyglot documentation.
The TruffleLanguage class
We can register EasyScript as an implemented language, similarly to the above JavaScript implementation, by writing a class that extends the abstract TruffleLanguage class.
We need to override the parse(ParsingRequest) method. The ParsingRequest argument carries the source code of the program we’re called with, and the method must return the CallTarget that represents the execution entrypoint of our language.
As a last step, we need to annotate our language class with the @TruffleLanguage.Registration annotation, providing it the unique identifier and the human-readable name of our language. The identifier is what will be passed as the first argument to Context.eval().
Here’s how this class looks for EasyScript:
import com.oracle.truffle.api.CallTarget;
import com.oracle.truffle.api.TruffleLanguage;
import com.oracle.truffle.api.TruffleLanguage.Env;
import com.oracle.truffle.api.TruffleLanguage.ParsingRequest;

@TruffleLanguage.Registration(id = "ezs", name = "EasyScript")
public final class EasyScriptTruffleLanguage extends TruffleLanguage<Void> {
    @Override
    protected CallTarget parse(ParsingRequest request) throws Exception {
        EasyScriptNode exprNode = EasyScriptTruffleParser.parse(request.getSource().getReader());
        var rootNode = new EasyScriptRootNode(exprNode);
        return rootNode.getCallTarget();
    }

    @Override
    protected Void createContext(Env env) {
        return null;
    }
}
(Don’t worry about the Void usage here: every TruffleLanguage is parametrized with a Context class, but we don’t need one yet, so we’re just using Void as a placeholder. We’ll write a custom class for the Context in later parts of the series.)
With this in place, we can evaluate EasyScript code the same way we did JavaScript earlier:
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class PolyglotTest {
    @Test
    public void runs_EasyScript_code() {
        Context context = Context.create();
        Value result = context.eval("ezs",
                "10 + 24 + 56.0");
        assertEquals(90.0, result.asDouble(), 0.0);
    }
}
TruffleLanguage in RootNode
The EasyScriptTruffleLanguage class also solves a small mystery that you might have noticed in the previous parts of the series, concerning our RootNode class. As a reminder, it looks like this:
import com.oracle.truffle.api.frame.VirtualFrame;
import com.oracle.truffle.api.nodes.RootNode;

public final class EasyScriptRootNode extends RootNode {
    @SuppressWarnings("FieldMayBeFinal")
    @Child
    private EasyScriptNode exprNode;

    public EasyScriptRootNode(EasyScriptNode exprNode) {
        super(null);

        this.exprNode = exprNode;
    }

    @Override
    public Object execute(VirtualFrame frame) {
        return this.exprNode.executeGeneric(frame);
    }
}
That first argument in the super() call that we pass as null is of type TruffleLanguage, which means we can modify EasyScriptRootNode to take an EasyScriptTruffleLanguage in its constructor, and pass that in the super() call.
Then, in the parse(ParsingRequest) method in EasyScriptTruffleLanguage, we can pass this to the EasyScriptRootNode instance we use for the CallTarget we eventually return from that method.
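The shape of that change can be sketched as follows. To keep the example compilable on its own, it uses simplified stand-in classes (StubLanguage, StubRootNode) rather than the real Truffle TruffleLanguage and RootNode types, and the getLanguage() accessor on the stand-in is purely an assumption for the demo; the real code is in the GitHub repository for the series.

```java
public final class LanguageWiringSketch {
    /** Stand-in for Truffle's TruffleLanguage (NOT the real class). */
    public abstract static class StubLanguage {}

    /** Stand-in for Truffle's RootNode (NOT the real class): it only remembers its language. */
    public abstract static class StubRootNode {
        private final StubLanguage language;

        protected StubRootNode(StubLanguage language) {
            this.language = language;
        }

        // hypothetical accessor, present only so the wiring can be observed
        public final StubLanguage getLanguage() {
            return this.language;
        }
    }

    /** Plays the role of EasyScriptRootNode: takes the language instead of passing null. */
    public static final class RootNodeSketch extends StubRootNode {
        public RootNodeSketch(LanguageSketch language) {
            super(language); // previously: super(null)
        }
    }

    /** Plays the role of EasyScriptTruffleLanguage. */
    public static final class LanguageSketch extends StubLanguage {
        public StubRootNode parse(String source) {
            // the language passes itself to the root node it creates;
            // actual parsing of `source` is omitted in this sketch
            return new RootNodeSketch(this);
        }
    }
}
```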
Summary
In this part of the series, we’ve made EasyScript a fully-fledged language, with a parser, and a first-class citizen of the GraalVM polyglot ecosystem.
As always, all code from the article is available on GitHub.
In the next part of the series, we will finally start making EasyScript look more like a real programming language – we will add support for (global) variables.