Turbocall: the Just-in-time compiler for Deno FFI

Divy Srivastava
(March 25, 2024)
Abstract

In this post, we will explore a lesser-known optimization in Deno that makes FFI fast.

1 Introduction

V8 Isolates are little sandboxes that run JS. JavaScript runtimes give you the ability to call native functions by reaching out of this sandbox. These native functions are often referred to as "bindings".

Making these bindings fast is one of the most important optimizations in a JavaScript runtime. Over the years, V8 has made significant improvements in this area to make bindings faster for embedders.

Let’s look at an example of a V8 C++ binding:

void Add(const FunctionCallbackInfo<Value>& args) {
  Isolate* isolate = args.GetIsolate();
  // Check the number of arguments passed.
  if (args.Length() < 2) {
    isolate->ThrowException(Exception::TypeError(
        String::NewFromUtf8(isolate, "Wrong number of arguments", NewStringType::kNormal).ToLocalChecked()));
    return;
  }
  // Check the argument types.
  if (!args[0]->IsNumber() || !args[1]->IsNumber()) {
    isolate->ThrowException(Exception::TypeError(
        String::NewFromUtf8(isolate, "Wrong arguments", NewStringType::kNormal).ToLocalChecked()));
    return;
  }
  // Both arguments are known to be numbers here, so read their values directly.
  double value = args[0].As<Number>()->Value() + args[1].As<Number>()->Value();
  // Create a new Number value and set it as the return value.
  Local<Number> num = Number::New(isolate, value);
  args.GetReturnValue().Set(num);
}
Listing 1: example V8 C++ binding

This does a lot of work for a simple addition: it checks the number of arguments, type checks them, converts them, and sets the return value. Moreover, V8 has to jump through (quite literally) a lot of hoops to make this work. It sets up guards and jumps out of the optimized JIT code to the runtime.

What if there was a way to call bindings without moving out of the optimized JIT code and without all the type checks?

2 V8 Fast API Calls

V8 Fast calls are a relatively new optimization in V8.

V8 can call our native binding directly from the optimized JIT code if we provide it with the necessary type information. The necessary type checks happen in the compiler itself, including the fallback to the slow path.

int FastAdd(int a, int b);

// Extracts type information from the function signature
v8::CFunction fast_add = MakeV8CFunction(FastAdd);
Listing 2: example V8 Fast API call

This results in massive speedups for repetitive native calls from optimized JavaScript. The calls are inlined and theoretically as fast as calling a native function.
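To make the fast path and its fallback concrete, here is a minimal TypeScript model of the idea. It is only a sketch: `makeFastCallable`, `matches` and `FastType` are hypothetical names invented for illustration, not V8 APIs, and in V8 the check is emitted by the optimizing compiler rather than run as a JavaScript wrapper.

```typescript
// Hypothetical model of fast/slow dispatch. None of these names are V8 APIs.
type FastType = "i32" | "f64";

// True when `value` can be passed as `type` without any conversion.
function matches(type: FastType, value: unknown): boolean {
  if (typeof value !== "number") return false;
  return type === "f64" || (value | 0) === value;
}

// Wraps a fast function (assumes well-typed arguments) and a slow function
// (validates and converts) behind a single callable.
function makeFastCallable(
  signature: FastType[],
  fast: (...args: number[]) => number,
  slow: (...args: unknown[]) => number,
): (...args: unknown[]) => number {
  return (...args: unknown[]): number => {
    const ok = args.length === signature.length &&
      signature.every((type, i) => matches(type, args[i]));
    // Take the fast path only when every argument already has the declared type.
    return ok ? fast(...(args as number[])) : slow(...args);
  };
}

let slowCalls = 0;
const modelAdd = makeFastCallable(
  ["i32", "i32"],
  (a, b) => (a + b) | 0,
  (...args) => {
    slowCalls++;
    return Number(args[0]) + Number(args[1]);
  },
);

const fastResult = modelAdd(1, 2); // both arguments are int32: fast path
const slowResult = modelAdd("1", 2); // a string argument: falls back to the slow path
```

The point of the model is the shape of the contract: the fast function never validates anything, so the declared signature has to be trustworthy and the slow path has to remain available for every call that does not match it.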

Apart from native runtime bindings, one of the most common places where this optimization is used is in FFI (Foreign Function Interface) calls.

3 Enter Deno FFI

const { symbols } = Deno.dlopen("libc.so.6", {
  open: {
    parameters: ["buffer", "i32"],
    result: "i32",
  },
});
Listing 3: example Deno FFI

`Deno.dlopen` is the API to open a dynamic library. Notice anything familiar? We are defining the number of arguments, their types and the return type.

We could use this information to generate an optimized native binding and give it to V8!
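As a rough illustration of how much the symbol definition already says, the sketch below turns one into the C prototype it implies. `toCPrototype`, `SymbolDef` and `C_TYPES` are hypothetical helpers written for this post, not part of Deno, and the table covers only a small subset of Deno's FFI type names.

```typescript
// Subset of Deno FFI type names mapped to C types (illustrative, not exhaustive).
const C_TYPES: Record<string, string> = {
  void: "void",
  i32: "int32_t",
  u32: "uint32_t",
  f64: "double",
  pointer: "void*",
  buffer: "void*",
};

// Hypothetical shape mirroring one entry of the object passed to Deno.dlopen.
interface SymbolDef {
  parameters: string[];
  result: string;
}

// Builds the C prototype implied by a symbol definition.
function toCPrototype(name: string, def: SymbolDef): string {
  const cType = (t: string): string => {
    const c = C_TYPES[t];
    if (c === undefined) throw new Error(`unsupported type in this sketch: ${t}`);
    return c;
  };
  const params = def.parameters.map((t) => cType(t)).join(", ");
  return `${cType(def.result)} ${name}(${params})`;
}

const openProto = toCPrototype("open", {
  parameters: ["buffer", "i32"],
  result: "i32",
});
```

Everything a fast call needs to know (argument count, argument types, return type) can be read from the definition without ever looking at the native library itself.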

4 Turbocall: a JIT for JIT

Source: https://github.com/denoland/deno/tree/ae52b49dd6edcfbb88ea39c3fcf0c0cc4b59eee7/ext/ffi

Deno created a tiny assembler (in Rust ofc) to generate optimized bindings for FFI calls based on the type information.

Deno.dlopen("libtest.so", {
  func: {
    parameters: ["buffer", "i32", "i32"],
    result: "i32",
  },
});
Listing 4: example Deno.dlopen

Turbocall generates the following bindings:

.arch aarch64

ldr x0, [x1, #8] ; buffer->data
mov x1, x2 ; a
mov x2, x3 ; b

movz x8, 0 ; address of func (0 is a placeholder)
br x8 ; tailcall
Listing 5: example Turbocall assembly

This is simply ARM64 assembly for something like this in C:

int func_trampoline(void* _this, FastApiTypedArray* buffer, int a, int b) {
  return func(buffer->data, a, b);
}
Listing 6: generated function trampoline

Most notably, it generates code to properly pass JS typed arrays and arguments to the native FFI symbol.
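The argument shuffling can be sketched in a few lines of TypeScript. This is a toy that emits assembly text, not Deno's assembler (which is written in Rust and emits machine code). `emitTrampoline` and `ParamType` are hypothetical names, the `#8` data offset is copied from Listing 5, and the sketch assumes every argument fits in an AArch64 integer register.

```typescript
// Toy text emitter mirroring the shape of Listing 5. Not Deno's implementation.
type ParamType = "buffer" | "i32" | "pointer";

function emitTrampoline(params: ParamType[]): string[] {
  // x0..x7 carry integer arguments on AArch64, and the fast call uses x0 for
  // the receiver, so at most 7 parameters arrive in registers.
  if (params.length > 7) {
    throw new Error("this sketch only handles register arguments");
  }
  const lines: string[] = [];
  params.forEach((type, i) => {
    const dst = `x${i}`; // where the native function expects argument i
    const src = `x${i + 1}`; // where the fast call placed it (after the receiver)
    lines.push(
      type === "buffer"
        ? `ldr ${dst}, [${src}, #8]` // pass the typed array's data pointer
        : `mov ${dst}, ${src}`, // shift the argument down by one register
    );
  });
  lines.push("movz x8, 0"); // placeholder for the native symbol's address
  lines.push("br x8"); // tail call into the native symbol
  return lines;
}

const trampolineLines = emitTrampoline(["buffer", "i32", "i32"]);
```

Walking the parameters in ascending order is safe because each step reads a register one higher than the one it writes, so no source register is overwritten before it is read.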

I gave a talk on this topic at the DenoFest Meetup in Tokyo (https://www.youtube.com/watch?v=ssYN4rFWRIU), which goes into more detail about the implementation.

5 Benchmarks

This made FFI calls 100x faster in Deno: https://github.com/denoland/deno/pull/15125

Let’s see how this compares against other runtimes.

Figure 1: Benchmark comparing Deno, Bun and Node.js on Sqlite and DuckDB

This is running sqlite3 and duckdb benchmarks on Deno, Bun and Node.js. See the benchmark source: https://github.com/littledivy/blazing-fast-ffi-talk

6 Turbocall in action

Slide from the DenoFest talk:

Figure 2: Turbocall slide

7 Future

It will be interesting to see how Static Hermes (https://tmikov.blogspot.com/2023/09/how-to-speed-up-micro-benchmark-300x.html) compares against V8 fast calls. Both can probably generate similar code at runtime, but they are implemented very differently.

I’m also excited about `just-js/lo` (https://github.com/just-js/lo), a WIP low-level JS runtime that aims to generate V8 fast call bindings ahead of time (similar to Deno) while also allowing for a more engine-agnostic design where you could swap out V8 for other engines like Hermes or QuickJS.

That’s it! Feel free to follow me on Twitter: https://twitter.com/undefined_void

This document is available as PDF: https://divy.work/pdf/turbocall.pdf