Natively run OCaml from Rust

“How to get segfaults with two safe languages”

Introduction

This document is available as a pdf here

Conceptual difference between calling and FFI.

Here come FFIs, or how to compile pieces of code from different languages so that you don’t have to rely on forking, subprocesses, JSON/MessagePack/Cap'n Proto, etc.

One language will declare the availability of some functionality to the external world, another will declare its use of some external functionality, both will be compiled together and the resulting binary will not distinguish who’s who and just run fast. Some manual fiddling may be required to get the memory representation to match, but often it’s just a question of shifting by a few bytes one way or another.

In the present document we focus on interfacing OCaml [1], Rust [2] and C [3], but more languages support this out of the box [4–6]. More specifically, we’ll see how to make OCaml functions available to either C or Rust, in two different ways.

Disclaimer: this has only been tested on GNU/Linux, with

We hope it generalizes!

Expose the relevant OCaml functions

OCaml needs to know which functions the external world is allowed to call. A single Callback.register line does exactly this:

This tells the world that there is a function named “twice” that it can try to use, and when used it will call the OCaml function twice.

The first argument, the string, will be crucial down the road, since the calling side looks the function up by this exact string. Mismatched names lead to segfaults and there are no protections: it all lies in the programmer’s hands.
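On the calling side, the lookup by name happens at run time, and the OCaml runtime’s caml_named_value returns a null pointer when nothing was registered under that name. Dereferencing that pointer is where the segfault comes from. The Rust sketch below shows one way to turn a mismatched name into an error instead. To keep it runnable without linking the OCaml runtime, the lookup is passed in as a function pointer, and fake_lookup is a hypothetical stand-in for caml_named_value; named_value, Lookup and FAKE_TWICE are likewise our own illustrative names.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::ptr;

/// OCaml's `value` is one pointer-sized machine word, like `isize`.
type Value = isize;

/// Signature shared by the runtime's `caml_named_value` and the stand-in below.
type Lookup = unsafe extern "C" fn(*const c_char) -> *const Value;

/// Returns the closure registered under `name`, or an error instead of a
/// null pointer that would segfault once dereferenced.
fn named_value(lookup: Lookup, name: &CStr) -> Result<*const Value, String> {
    // SAFETY: `name` is a valid NUL-terminated string for the whole call.
    let closure = unsafe { lookup(name.as_ptr()) };
    if closure.is_null() {
        Err(format!("no OCaml closure registered under {:?}", name))
    } else {
        Ok(closure)
    }
}

/// Hypothetical stand-in for `caml_named_value`: only "twice" is registered.
static FAKE_TWICE: Value = 0;

unsafe extern "C" fn fake_lookup(name: *const c_char) -> *const Value {
    // SAFETY: callers pass a valid NUL-terminated string.
    let name = unsafe { CStr::from_ptr(name) };
    if name.to_bytes() == b"twice" {
        &FAKE_TWICE as *const Value
    } else {
        ptr::null()
    }
}

fn main() {
    // The second name contains a typo, as a mismatched name would.
    let names: [&[u8]; 2] = [b"twice\0", b"twise\0"];
    for raw in names {
        let name = CStr::from_bytes_with_nul(raw).expect("NUL-terminated literal");
        match named_value(fake_lookup, name) {
            Ok(_) => println!("{:?}: found", name),
            Err(e) => println!("{}", e),
        }
    }
}
```

In a real binding, the extern declaration of caml_named_value would be passed instead of fake_lookup; the check itself stays the same.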

The next step is to compile this in such a way that it can be linked into other projects. Both static and dynamic linking will be covered below.

The static way

Compiling the OCaml libraries

Using ocamlopt

Ocamlopt has a flag for static libraries, -output-obj. However, this will not export OCaml’s runtime, leading to more linking mess. There is another flag that does the job, namely -output-complete-obj (it’s not in the man page, but ocamlopt --help yields “output-complete-obj: Output an object file, including runtime, instead of an executable”).

This creates a static library that contains OCaml’s runtime.

Using dune (formerly known as “jbuilder”)

Dune is a build system designed for OCaml/Reason projects only. It focuses on providing the user with a consistent experience and takes care of most of the low-level details of OCaml compilation. All you have to do is provide a description of your project and Dune will do the rest.

Using dune can prove useful for big projects with several dependencies.

It took us a while to understand why we couldn’t get it to produce the same thing as above. The trick is that dune relies on the output name to know what to put in the file: having a correct dune file is not enough.

One needs to:

  • Target an object: that’s in the config file and in the extension.
  • Include the runtime. For dune this requires an executable target.

Here’s a possible dune file:

Two external references helped us understand that a working extension for this is .exe.o, so build it with dune build math.exe.o. Inspecting the file gives the following output:

Targeting C

Code: ocaml_c_static.tar.gz

The C wrapper

A C file that wraps the external functions into C functions is then required. It has to handle the memory layout of values, plus some other plumbing. This is where the exposed string name, “twice”, will matter, so beware!

Val_int and Int_val are provided by mlvalues.h to convert between C integers and OCaml’s value representation. We will have to reimplement them later on for Rust interfacing.
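The conversion behind these two macros is simple enough to show directly. OCaml stores an integer n in a machine word as 2n + 1: shifted left by one bit, with the lowest bit set so the garbage collector can tell it apart from a pointer. Below is a sketch of that encoding in Rust, the form it will take for the Rust section. The names val_int and int_val are our own, mirroring Val_int and Int_val, and Value is assumed to be isize because both an OCaml value and isize are pointer-sized. Note that in C, Int_val additionally narrows the result to an int.

```rust
/// An OCaml `value`: one pointer-sized machine word.
type Value = isize;

/// Mirrors `Val_int` / `Val_long`: encode n as 2n + 1.
fn val_int(n: isize) -> Value {
    (n << 1) + 1
}

/// Mirrors `Int_val` / `Long_val`: an arithmetic right shift drops the
/// tag bit and keeps the sign.
fn int_val(v: Value) -> isize {
    v >> 1
}

fn main() {
    for n in [0, 1, 21, -5] {
        let v = val_int(n);
        println!("{:>3} is stored as {:>3} and decodes back to {}", n, v, int_val(v));
    }
}
```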

This needs to be compiled to an object file too. One way to do this is to run ocamlc -c mathwrap.c, which produces mathwrap.o (this can also be taken care of by ocamlopt or by dune directly). ocamlc takes care of the caml headers, which gcc wouldn’t by default, though in the end gcc gets called.

The main.c file

The main.c file has little left to do. It initializes the OCaml runtime and must be told about the functions; they are not declared extern because they are already imported by the wrapper:

It must be compiled against both the C wrapper object and the OCaml object. It also needs to know where OCaml’s header files are, and on our machines it needs two additional libraries (this may change from version to version; I’ve seen curses used elsewhere too).

Below is a complete Makefile to summarize the required steps:

Note: we added an ocamlc version for completeness, which requires the libcurses library to be passed on to gcc.

A remark about OCaml libraries

Note that adding OCaml libraries to your source, say the Unix module, can be taken care of by dune. It will go as far as adding the symbols to the output .o file when it can (if it can’t, shared objects are required; see below). As far as we can tell, ocamlopt will not.

If one wants to stay with ocamlopt and add unix.cmxa to the ocamlopt command, then unix.a must be passed to gcc. However, some functions may then be defined twice, in both unix.a and libmath.o, hence errors will occur. The workaround is to use the ar command to add libmath.o to unix.a and avoid the repeated symbols, with something like this:

If you want to add more libraries, think about switching to dune. Otherwise things get worse: ar x <lib>.a to get its internal .o files, then ar qs <...> to archive the whole thing together. It gets messy quickly and we’re on shaky ground.

Targeting Rust

Code: ocaml_rust_static.tar.gz

The idea is fundamentally the same: make an object file and have a wrapper around it that handles the memory layout. One has to go back and forth with the OCaml headers to see how the types are encoded in C in order to make things match.

The main Rust file

We’ll give the file and then go through it:

The first half is a port of OCaml’s header files. For your own project you may either want to check the raml project that already ports a few of the structures or read the relevant OCaml header files.
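For the integer example, the port mostly amounts to Val_int and Int_val. Going further through mlvalues.h, a few more definitions are just as cheap to mirror: the unit and boolean constants, and the tag-bit tests Is_long and Is_block. The sketch below ports them; the Rust names are our own choice and are not taken from raml, and Value is again assumed to be a pointer-sized isize.

```rust
/// An OCaml `value`: one pointer-sized machine word.
type Value = isize;

/// `Val_long`: integers are stored as 2n + 1.
const fn val_long(n: isize) -> Value {
    (n << 1) + 1
}

/// `Long_val`: undo the encoding with an arithmetic shift.
const fn long_val(v: Value) -> isize {
    v >> 1
}

// In mlvalues.h, `()` and `false` are both Val_int(0), and `true` is Val_int(1).
const VAL_UNIT: Value = val_long(0);
const VAL_FALSE: Value = val_long(0);
const VAL_TRUE: Value = val_long(1);

/// `Val_bool`
fn val_bool(b: bool) -> Value {
    if b { VAL_TRUE } else { VAL_FALSE }
}

/// `Bool_val`: any non-zero integer counts as true.
fn bool_val(v: Value) -> bool {
    long_val(v) != 0
}

/// `Is_long`: the low bit is set for immediate integers...
fn is_long(v: Value) -> bool {
    (v & 1) != 0
}

/// `Is_block`: ...and clear for pointers to (word-aligned) heap blocks.
fn is_block(v: Value) -> bool {
    (v & 1) == 0
}

fn main() {
    println!("unit = {}, false = {}, true = {}", VAL_UNIT, VAL_FALSE, VAL_TRUE);
    println!("Val_bool(true) decodes to {}", bool_val(val_bool(true)));
    println!("is_long(unit) = {}, is_block(0x1000) = {}", is_long(VAL_UNIT), is_block(0x1000));
}
```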

The #[link(name = "math")] line tells rustc to link against a library named math; the linker resolves this as -lmath, which means it looks for a file named libmath.so or libmath.a in its search path.

Finally, it declares Rust functions that call their OCaml counterparts, converting arguments and results to and from OCaml’s value representation. This is where the name must match what OCaml exported, beware!
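Each such wrapper follows the same three steps: encode the argument, hand it to the registered OCaml closure, decode the result. In a real binding, the middle step is a call to the runtime’s caml_callback on the closure obtained from caml_named_value. The sketch below replaces that call with a plain Rust function, fake_ocaml_twice (a hypothetical stand-in that behaves like OCaml’s twice on encoded integers), so it runs without linking the OCaml runtime. The names twice and fake_ocaml_twice are illustrative only.

```rust
/// An OCaml `value`: one pointer-sized machine word.
type Value = isize;

fn val_int(n: isize) -> Value {
    (n << 1) + 1
}

fn int_val(v: Value) -> isize {
    v >> 1
}

// A real binding would declare and call the runtime instead, e.g.
//   extern "C" { fn caml_callback(closure: Value, arg: Value) -> Value; }
// passing `*closure`, where `closure` comes from `caml_named_value("twice")`.

/// Rust-facing wrapper: plain integers in, plain integers out.
fn twice(n: isize, ocaml_call: impl Fn(Value) -> Value) -> isize {
    int_val(ocaml_call(val_int(n)))
}

/// Stand-in for the OCaml side: it receives and returns encoded values.
fn fake_ocaml_twice(v: Value) -> Value {
    val_int(2 * int_val(v))
}

fn main() {
    println!("twice(21) = {}", twice(21, fake_ocaml_twice));
}
```

Skipping val_int or int_val on either side does not crash here; it silently returns a wrong number, which is often harder to notice than a segfault.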

The build process

With rustc directly

Building the object files works as in the C case. Because the library link name is already given in main.rs, only the -L flag (which adds a library search path) is required, not -l (which names a specific library).

For some reason we can’t link against the .o file with Rust (we’d love to understand more about this). An .a archive that contains only one element (libmath.o) makes the following work:

You can put that in a Makefile and have either dune or ocamlopt build the initial object.

Note: if you get a segfault, either you’re calling a function that does not exist (remember the syntactic matching!), or you have linking problems (are you linking against a .a and not a .o?), or you’re doomed.

With Cargo

This is the canonical way of building stuff in Rust. Instead of a Makefile, a build.rs handles the building process. The Cargo.toml file has nothing special so we’ll just give the build file:

It looks a lot like the C Makefile: it calls dune and copies the library. The extra ar pass is required here too; we still don’t know why. The last line tells rustc to look for static objects in the build directory.
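A build script along those lines could look like the following sketch. It is not necessarily the build.rs shipped in ocaml_rust_static.tar.gz: the dune project directory (ocaml/), the location of the built object (_build/default/math.exe.o) and the helper names dune_build, ar_archive, link_directives and run are assumptions made for the example. The commands are built by separate functions so they can be inspected without running anything, and the early return when OUT_DIR is missing (Cargo always sets it for build scripts) lets the file run harmlessly outside Cargo.

```rust
use std::env;
use std::fs;
use std::path::{Path, PathBuf};
use std::process::Command;

/// `dune build math.exe.o`, run inside the (assumed) OCaml project directory.
fn dune_build(ocaml_dir: &Path) -> Command {
    let mut cmd = Command::new("dune");
    cmd.args(["build", "math.exe.o"]).current_dir(ocaml_dir);
    cmd
}

/// Wraps the single object into a one-element archive, libmath.a.
fn ar_archive(out_dir: &Path) -> Command {
    let mut cmd = Command::new("ar");
    cmd.arg("rcs")
        .arg(out_dir.join("libmath.a"))
        .arg(out_dir.join("libmath.o"));
    cmd
}

/// Tells rustc where to find libmath.a; `#[link(name = "math")]` in main.rs names it.
fn link_directives(out_dir: &Path) -> Vec<String> {
    vec![format!("cargo:rustc-link-search=native={}", out_dir.display())]
}

/// Runs a command and aborts the build if it fails.
fn run(mut cmd: Command) {
    let status = cmd.status().expect("failed to spawn command");
    assert!(status.success(), "command failed: {:?}", cmd);
}

fn main() {
    // Cargo always sets OUT_DIR for build scripts; without it, do nothing.
    let Some(out_dir) = env::var_os("OUT_DIR").map(PathBuf::from) else {
        eprintln!("OUT_DIR not set: not running under Cargo");
        return;
    };
    let ocaml_dir = Path::new("ocaml");

    run(dune_build(ocaml_dir));
    fs::copy(
        ocaml_dir.join("_build/default/math.exe.o"),
        out_dir.join("libmath.o"),
    )
    .expect("copying math.exe.o");
    run(ar_archive(&out_dir));

    for line in link_directives(&out_dir) {
        println!("{}", line);
    }
}
```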

Phew. That may not look like much, but there are many hidden traps waiting all over: static vs shared library, including the OCaml runtime, handling OCaml’s “Value” type, gluing all this together…

The dynamic way

Let’s now assume that the project uses some OCaml code that relies on shared libraries. We’ll use cairo (through its OCaml bindings) as an example, since that’s how we ended up learning about this.

Compiling the OCaml libraries

Let’s start with the simplest example provided in the cairo repository and modify it minimally so that it receives a string and writes to that string:

Figure: output of draw.ml, a set of cairo instructions that draws a square.

Upon adding cairo2 to the dune file, and assuming OCaml’s cairo2 is installed (opam install cairo2), dune should figure out how to build it.

However, most systems don’t have the static library for cairo, and trying to build it from source leads down a rabbit hole of building more and more required static libraries. Therefore, trying to build a complete obj leads to a linking error along the lines of “ld can’t find -lcairo”.

Ideally, what one wants is a binary that has the OCaml part as a static library and relies dynamically on the shared cairo library.

This doesn’t “Just Work™”: we haven’t been able to do that (dune can’t build a static object without -lcairo and otherwise makes everything dynamic), so we moved everything to shared libraries. Changing the “modes” field in dune to shared_object and running dune build draw.so does that.

Sanity check:

Cool, it’s indeed a shared library.

Targeting C

Code: ocaml_c_shared.tar.gz

The wrapper and main files are modified accordingly; see the attached archive for the implementation details. Upon trying to run the program you’ll be faced with the following message:

Looking at the binary’s dynamically linked libraries shows the missing one:

OK, but we the programmers know where it is. A few directions:

  • Moving the .so to the relevant /usr/local/lib folder, but that should probably wait for a make install step.
  • Running the program with LD_LIBRARY_PATH=. ./a.out to tell it to look in the current folder.
  • Compiling it with -Wl,-rpath,'$ORIGIN' so that the binary, whenever it is run, looks in its own folder for relevant shared libraries ($ORIGIN means wherever the binary currently is; don’t forget to double the $ sign if it’s in a Makefile!).

We opted for the last solution in the code, but here’s what happens with the second one:

Yay!

That was not so bad, was it?

Targeting Rust

Code: ocaml_rust_shared.tar.gz

With rustc

Changes similar to those made to main.c are needed in main.rs (details in the source archive), and then the previous dune -> cp libraries -> Rust sequence should build the binary.

As expected, ./main fails with a shared library loading error, but as above, setting LD_LIBRARY_PATH to the shared library’s path should solve that:

Success.

With cargo

The build.rs file is pretty straightforward once the previous case is solved:

And indeed if we try:

For the time being we don’t know how to tell rustc/cargo to change the rpath of the binary, so we don’t know how to move it around together with its friend the shared library; we have to rely on LD_LIBRARY_PATH.
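A possible route, untested in this project: Cargo build scripts can forward arbitrary linker arguments through the cargo:rustc-link-arg instruction, so the same -Wl,-rpath,$ORIGIN used for the C binary could be passed this way. The sketch below only builds and prints the directives. The library name draw (i.e. draw.so copied as libdraw.so), the fallback directory target/ocaml and the function name shared_link_directives are assumptions. No $$ escaping is needed, because Cargo hands the argument to the linker without going through make or a shell.

```rust
use std::path::Path;

/// Build-script directives for linking against a shared libdraw.so that sits in
/// `lib_dir` at build time, and for finding it next to the binary at run time.
fn shared_link_directives(lib_dir: &Path) -> Vec<String> {
    vec![
        // Where the linker looks for libdraw.so while building.
        format!("cargo:rustc-link-search=native={}", lib_dir.display()),
        // -ldraw; redundant but harmless if main.rs already has #[link(name = "draw")].
        "cargo:rustc-link-lib=dylib=draw".to_string(),
        // Embed an rpath: at run time, also search the directory containing the binary.
        "cargo:rustc-link-arg=-Wl,-rpath,$ORIGIN".to_string(),
    ]
}

fn main() {
    // In a real build.rs this would be the directory the .so was copied into,
    // such as OUT_DIR; "target/ocaml" is only a hypothetical fallback.
    let lib_dir = std::env::var("OUT_DIR").unwrap_or_else(|_| "target/ocaml".to_string());
    for line in shared_link_directives(Path::new(&lib_dir)) {
        println!("{}", line);
    }
}
```

Since $ORIGIN is the directory of the final binary (for example target/debug), libdraw.so would still have to be copied next to the binary, both during development and when shipping the two together.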

Conclusion

We designed a silly benchmark around a simple arithmetical operation: computing the next value of a Linear Congruential Generator (the function rdm), many times over.

We ran this a bunch of times to make the test non-trivial, and then used the benchmark crate in Rust to compute the time per operation.
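To give an idea of the kind of computation involved, here is a minimal Rust sketch of an LCG step and a loop over it. The names rdm and looped_rdm follow the benchmark table, but the constants (1664525 and 1013904223 modulo 2^32, a commonly quoted 32-bit choice) and the seed are our assumptions, not necessarily the values used for the measurements.

```rust
/// One step of a linear congruential generator: x' = (a * x + c) mod 2^32.
/// The constants are an assumption, not necessarily those of the benchmark.
fn rdm(x: u32) -> u32 {
    const A: u32 = 1_664_525;
    const C: u32 = 1_013_904_223;
    // Wrapping u32 arithmetic is exactly arithmetic modulo 2^32.
    x.wrapping_mul(A).wrapping_add(C)
}

/// Applies `rdm` `n` times, starting from `seed`.
fn looped_rdm(seed: u32, n: usize) -> u32 {
    (0..n).fold(seed, |x, _| rdm(x))
}

fn main() {
    let seed = 42;
    println!("after 1_000_000 steps: {}", looped_rdm(seed, 1_000_000));
}
```

With this shape, calling an OCaml rdm from inside a Rust loop presumably crosses the FFI boundary once per step, whereas calling an OCaml looped_rdm crosses it once per loop.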

OCaml:

Rust:

Results of the benchmark:

| Call type                             | Time (ns) |
|---------------------------------------|-----------|
| Rust::looped_rdm                      | 7,365     |
| OCaml::looped_rdm                     | 9,791     |
| Rust::ocaml::looped_rdm               | 10,025    |
| Rust::looped_rdm (calling ocaml::rdm) | 29,593    |

Remarks:

To whoever made it this far:

I hope this may help some other lonely OCaml developer […]

So do we. Maybe reach out to us?

Edits


[1] Leroy, X., Doligez, D., Frisch, A., Garrigue, J., Rémy, D. and Vouillon, J. 2013. The OCaml system release 4.01: Documentation and user’s manual. (2013).

[2] Matsakis, N.D. and Klock II, F.S. 2014. The Rust language. ACM SIGAda Ada Letters (2014), 103–104.

[3] Ritchie, D.M., Kernighan, B.W. and Lesk, M.E. 1988. The C programming language. Prentice Hall, Englewood Cliffs.

[4] Jones, S.P. 2003. Haskell 98 language and libraries: The revised report. Cambridge University Press.

[5] Chakravarty, M.M. 2003. The Haskell foreign function interface 1.0: An addendum to the Haskell 98 report.

[6] Bielman, J. and Oliveira, L. 2010. CFFI: The Common Foreign Function Interface. CL package version 0.10.6, (2010).