Best Practices for Writing Rust Bindings for Existing C++ Libraries
Introduction
This document offers guidance on how to add Rust support to existing C++ libraries, including core foundational libraries.
For an introduction, see Rust Bindings for C++ Libraries.
Code Organization
For technical reasons, it is generally necessary for the C++ library and its Rust bindings to be the same Bazel target. It is not possible to define the Rust bindings for a target as a completely separate and independent target. The automatically generated bindings, and their configuration, must live on the C++ target itself.
The reasons are fairly technical, and you can stop reading here if you are OK with this.
Technical Justification
Crubit generates bindings using Bazel aspects: given an arbitrary C++ Bazel target, Crubit generates, in an aspect, the Rust library which wraps it. To users, it appears as if the Bazel target were both a C++ and a Rust library.
This is necessary for the same reason that it is necessary for protocol buffers. And, just as with protocol buffers, this means that there is no rust_library target whose behavior we could customize using Bazel attributes.
Specifically, we cannot use a regular Bazel rule for bindings generation because the rule cannot generate bindings for transitive dependencies: if A depends on B, then bindings(A) depends on bindings(B), so that bindings(A) can wrap functions in A that return types from B, and so on. (See FAQ: Why can't we use separate rules?)
Because bindings are generated in an aspect, and not a rule, there are only two places to configure the bindings of a target A:
- In the source code of the target receiving Rust support, using configuration pragmas or attributes. (This is similar to protocol buffers.)
- In the BUILD file, on the target receiving Rust support, via aspect_hints. Aspect hints are a storage location for configuration data, readable by the aspect, placed directly on the target that the aspect runs on.
Generally speaking, it's better to modify the source code than to configure externally via aspect hints. However, some source code annotations are nonstandard and can have performance implications (see b/321933939). In addition, the build system cannot read source code, so any configuration that requires customizing the build graph must go in aspect hints.
For these reasons, currently most publicly available methods of customizing bindings occur in aspect hints.
In any case, all configuration and support for Rust is applied directly to the target.
Example
To enable Crubit on a C++ target, one actually modifies the target itself, adding aspect_hints = ["//features:supported"]. This must be an aspect hint, not a source code annotation, for the reasons above:
- It makes the build faster and more resilient: when Crubit is disabled on a target, Bazel needs to know so it can completely avoid running Crubit on it.
- There is no stable, reliable, and style-approved header-wide pragma we can use for enabling or disabling Crubit, but aspect_hints does work.
FAQ: Why can't we use separate rules?
A library A, and its bindings bindings(A), must be linked together in the build graph: if B uses a type from A, then bindings(B) uses a type from bindings(A).
Crucially, this also goes in reverse: if a Rust library C uses a type from bindings(A), then reverse_bindings(C) uses a type from A. This forms a natural dependency cycle: the build graph must understand both the link from A to bindings(A), and the link from bindings(A) to A.
Crubit resolves this by making A and bindings(A) the same target in the build graph: bindings for a target are obtained by reading an aspect on the target.
It is not possible to make A one build target and bindings(A) a separate build target, call it X:
- We cannot literally configure on A that its bindings are in a different target X, because this produces a real dependency cycle, as mentioned above: if bindings(A)=X, then reverse_bindings(X)=A.
- We cannot avoid the cycle by creating the dependency "lazily" or "dynamically", based on, e.g., a naming scheme during Bazel analysis. Bazel dependencies cannot be discovered dynamically; once Bazel reaches this point of evaluation, dependencies need to be fully resolved: labels in deps are no longer strings at this stage, they are edges in a dependency graph. That graph must not have cycles.
- In some limited cases, we can hardcode the relationship within Crubit: Crubit is actually two aspects, each of which handles a single direction of interop. So Crubit can hardcode inside of itself that bindings(A)=X, and, in the other half, that reverse_bindings(X)=A. This requires that Crubit itself depends on A and X. Therefore, to avoid another dependency cycle, neither A nor X can depend on or use Crubit in their transitive dependencies. This is not feasible except in very isolated cases. Currently, we only do this for the Rust and C++ standard libraries.
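The cycle argument can be made concrete with a toy model. The sketch below is a plain depth-first cycle check over a hypothetical in-memory graph (BuildGraph, add_dep, and has_cycle are illustrative names, not Bazel or Crubit APIs). Declaring on A that its bindings live in X adds an edge A → X, and reverse_bindings(X)=A adds X → A, which is exactly a cycle.

```rust
use std::collections::{HashMap, HashSet};

/// Toy build graph: each target maps to the targets it depends on.
/// This only models the argument; it is not how Bazel stores its graph.
#[derive(Default)]
pub struct BuildGraph {
    deps: HashMap<String, Vec<String>>,
}

impl BuildGraph {
    /// Records that `from` depends on `to`.
    pub fn add_dep(&mut self, from: &str, to: &str) {
        self.deps
            .entry(from.to_string())
            .or_default()
            .push(to.to_string());
    }

    /// Returns true if some target transitively depends on itself.
    pub fn has_cycle(&self) -> bool {
        let mut done = HashSet::new();
        let mut on_path = HashSet::new();
        self.deps
            .keys()
            .any(|target| self.visit(target, &mut done, &mut on_path))
    }

    /// Depth-first search; `on_path` holds the targets on the current path.
    fn visit<'a>(
        &'a self,
        target: &'a str,
        done: &mut HashSet<&'a str>,
        on_path: &mut HashSet<&'a str>,
    ) -> bool {
        if on_path.contains(target) {
            return true; // Back edge: `target` (transitively) depends on itself.
        }
        if done.contains(target) {
            return false; // Already fully explored without finding a cycle.
        }
        on_path.insert(target);
        if let Some(children) = self.deps.get(target) {
            for child in children {
                if self.visit(child, done, on_path) {
                    return true;
                }
            }
        }
        on_path.remove(target);
        done.insert(target);
        false
    }
}
```

With only A → B and X → Y the graph is acyclic; adding both A → X and X → A makes has_cycle() return true, which Bazel would reject.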
For comparison, PyCLIF, a similar technology, avoids this problem because it only supports "one-directional" interop, so it doesn't need to avoid dependency cycles. Crubit is bidirectional, and this comes with some technical restrictions.
FAQ: Why are there extra dependencies in deps(target)?
Because the Rust bindings are created using an aspect on the C++ target, everything that the Rust bindings need to depend on will appear in a Bazel query / depserver query for deps(target).
For example, if you wanted to add an extra source file to the Rust bindings, you might specify it in aspect_hints. This file will show up in deps(target).
These Rust-only deps are not used at all in pure-C++ builds (the Bazel actions registered by them won't be executed), but they will show up in the dependency graph anyway, due to how Bazel query and depserver track dependencies.
NOTE: In particular, if your project has tests that count/limit the transitive dependencies of a C++ binary, they will overcount the dependencies, and the overcounting will get worse as Rust support is rolled out through the C++ build graph.
Wrapping and type bridging vs direct use of types
Crubit automatically generates layout-compatible Rust equivalents of C++ types. When the C++ type is Rust-movable, the Crubit-generated Rust type is also Rust-movable, and it can be used by value, by pointer, in struct fields, arrays, and any other compound data type. A C++ pointer const T* can become a Rust *const T, a C++ T field can become a Rust T field, and so on, with few restrictions.
For example, the following C++ type:
struct Vec2d {
float x;
float y;
};
Becomes (roughly) the following Rust type:
#[repr(C)]
struct Vec2d {
    pub x: f32,
    pub y: f32,
}
These have an identical layout, and so a C++ pointer or field containing a C++
Vec2d is exactly equivalent to a Rust pointer or field containing a Rust
Vec2d.
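One way to see what "identical layout" means in practice is to check it. The sketch below uses a hand-written copy of the struct above (the real generated code carries additional attributes and impls) and compile-time assertions that mirror a C++ static_assert on sizeof and alignof, assuming C++ float is 32 bits. offset_of_y and length_squared are hypothetical helpers; the latter reads the struct through a raw pointer, as Rust code would when handed a const Vec2d* from C++.

```rust
/// Hand-written stand-in for the Crubit-generated `Vec2d` shown above
/// (assumption: the real generated code has extra attributes and impls).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec2d {
    pub x: f32,
    pub y: f32,
}

// Compile-time layout checks mirroring a C++
// `static_assert(sizeof(Vec2d) == 8 && alignof(Vec2d) == 4)`,
// which holds wherever C++ `float` is 32 bits.
const _: () = assert!(std::mem::size_of::<Vec2d>() == 8);
const _: () = assert!(std::mem::align_of::<Vec2d>() == 4);

/// Byte offset of `y` within `Vec2d` (avoids `offset_of!` so it builds on
/// older toolchains).
pub fn offset_of_y() -> usize {
    let v = Vec2d { x: 0.0, y: 0.0 };
    let base = &v as *const Vec2d as usize;
    let field = &v.y as *const f32 as usize;
    field - base
}

/// Reads a `Vec2d` through a raw pointer, as Rust code would when handed a
/// `const Vec2d*` by C++.
///
/// # Safety
/// `p` must be non-null, properly aligned, and point to an initialized `Vec2d`.
pub unsafe fn length_squared(p: *const Vec2d) -> f32 {
    let v = unsafe { &*p };
    v.x * v.x + v.y * v.y
}
```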
(See Types for more information about layout-compatibility.)
Because of this, it is often not required to manually write any new types. The bindings generated by Crubit will produce a working type automatically.
When to wrap a type
There are still a handful of reasons to manually write "wrapper" types which encapsulate or replace the original C++ type (or its Crubit-generated Rust type).
- If the type is not naturally Rust-movable, but it's important for the Rust type to be Rust-movable. It may be possible to make changes to the C++ code to make the type Rust-movable using some of the strategies described in the cookbook. This allows the greatest flexibility, as the type becomes usable in almost every context. But if that is not possible, writing a new "wrapper" type can keep Rust programmers productive.
- Some Rust types have very special semantics, which are impossible to implement in the bindings for a C++ type. For example, Rust has special support for Result and Option in error handling via the ? operator, which cannot yet be implemented by Status or std::optional using stable Rust features. These privileged Rust types can serve as wrapper types in place of the equivalent C++ types.
In these cases, we may bridge to a wrapper type as a workaround, while we hopefully fix the underlying issues that mean we cannot directly use the underlying type. This offers us a subset of the API we want, and allows continued progress.
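As an illustration of bridging to Result, the sketch below converts a status value into a Result at the API boundary, so that Rust callers can use ?. Status, StatusError, into_result, open, and open_both are all hypothetical stand-ins, not Crubit's real bindings for any C++ status type; the sketch only shows the by-value conversion pattern.

```rust
/// Hypothetical stand-in for the Crubit-generated binding of a C++ status
/// type: an error code (0 means OK) plus a message. Not Crubit's real API.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    code: i32,
    message: String,
}

impl Status {
    pub fn ok_status() -> Self {
        Status { code: 0, message: String::new() }
    }
    pub fn error(code: i32, message: &str) -> Self {
        Status { code, message: message.to_string() }
    }
    pub fn is_ok(&self) -> bool {
        self.code == 0
    }
}

/// The error type Rust callers see after bridging.
#[derive(Debug, Clone, PartialEq)]
pub struct StatusError {
    pub code: i32,
    pub message: String,
}

/// By-value bridge at the API boundary: an OK status becomes `Ok(())`,
/// anything else becomes `Err`.
pub fn into_result(status: Status) -> Result<(), StatusError> {
    if status.is_ok() {
        Ok(())
    } else {
        Err(StatusError { code: status.code, message: status.message })
    }
}

/// Models a C++ function that returns a status (hypothetical).
fn open(name: &str) -> Status {
    if name.is_empty() {
        Status::error(3, "empty name")
    } else {
        Status::ok_status()
    }
}

/// A Rust caller can now propagate errors with `?`.
pub fn open_both(a: &str, b: &str) -> Result<(), StatusError> {
    into_result(open(a))?;
    into_result(open(b))?;
    Ok(())
}
```

The conversion works cleanly here only because the status is returned by value; the next section explains why the same trick does not extend to references or fields.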
Why not to wrap a type
Wrapper types work best when passed by value: if you return a T in C++, the corresponding Rust function can automatically convert it to and return a WrappedT.
However, no conversion is possible for references or fields, which really are the original type, with its size, alignment, and address in memory. Making this work transparently requires an ever-expanding network of wrapper types, one for every compound data type that might contain T:
- T must become WrappedT.
- const T&, if it is supported at all, must become something like TRef<'a>, or a dynamically sized &TView.
- std::vector<T>, if it is supported at all, must become something like TVector.
- struct MyStruct {T x;} must become a wrapped WrappedMyStruct.
- ...
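The sketch below shows how this network starts to grow. All names (T, WrappedT, TRef, MyStruct, WrappedMyStruct) are hypothetical and follow the list above. The by-value conversion is a one-line From impl, but a borrowed T has no WrappedT in memory to point at, so it needs its own view type, and every struct containing a T needs its own wrapper too.

```rust
/// Stand-in for the Crubit-generated Rust type for a C++ type `T`
/// (hypothetical, layout-compatible with C++).
#[repr(C)]
pub struct T {
    pub raw: i64,
}

/// Hand-written wrapper with its own representation (hypothetical).
pub struct WrappedT {
    pub value: i128,
}

/// By value: a function returning `T` can convert at the boundary.
impl From<T> for WrappedT {
    fn from(t: T) -> Self {
        WrappedT { value: i128::from(t.raw) }
    }
}

/// By reference: a `const T&` points at memory laid out as `T`, so there is
/// no `WrappedT` in memory to borrow. A separate view type is needed.
pub struct TRef<'a> {
    inner: &'a T,
}

impl<'a> TRef<'a> {
    pub fn new(inner: &'a T) -> Self {
        TRef { inner }
    }
    pub fn value(&self) -> i128 {
        i128::from(self.inner.raw)
    }
}

/// Fields: every struct that contains a `T` needs its own wrapper as well.
#[repr(C)]
pub struct MyStruct {
    pub x: T,
}

pub struct WrappedMyStruct {
    pub x: WrappedT,
}

impl From<MyStruct> for WrappedMyStruct {
    fn from(s: MyStruct) -> Self {
        WrappedMyStruct { x: WrappedT::from(s.x) }
    }
}
```

Because WrappedT and T differ in size here (16 vs. 8 bytes), a &T can never be reinterpreted as a &WrappedT; each new container of T repeats this work.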
The problems introduced by wrapper types can easily outweigh the benefits that they bring. Crubit aims to reduce their necessity to zero over time.
Bad reasons to wrap a type
In most other circumstances where one might want to reach for wrapper types, alternatives exist:
- If we want to use a wrapper type to give the type a nicer Rust API, then, as an alternative, one can customize the Rust API of the wrapped type using an aspect hint. You can add new methods and trait implementations alongside the generated type, without altering any C++ code.
- If we want to use a wrapper type to change the type invariants (to make them stricter or looser), this is fine, as long as it doesn't replace the not-as-nice type. For example, if a C++ API returns std::string (bytes, "probably" UTF-8), the Rust equivalent should not return a Rust String (Unicode, definitely UTF-8). Changing type invariants in place causes some APIs to become impossible to call, and causes the Rust and C++ ecosystems to diverge and become incompatible. The bindings should be high fidelity. Wrapper types of this form should be optional, and available equally to both C++ and Rust to avoid fragmenting the ecosystem.
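A minimal sketch of the "stricter, but optional" principle for the std::string case, assuming a hypothetical CppString stand-in (not Crubit's real string binding): the byte-level accessor stays available so every value C++ can produce remains usable, and the UTF-8 view is an extra that can fail rather than a replacement.

```rust
use std::str::Utf8Error;

/// Hypothetical stand-in for a Rust binding of C++ `std::string`; not
/// Crubit's real type. It keeps C++'s invariant: any bytes are allowed.
pub struct CppString {
    bytes: Vec<u8>,
}

impl CppString {
    pub fn from_bytes(bytes: &[u8]) -> Self {
        CppString { bytes: bytes.to_vec() }
    }

    /// Always available: the bytes exactly as C++ sees them, so every value
    /// that C++ can produce remains usable from Rust.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }

    /// Optional, stricter view: succeeds only for valid UTF-8.
    /// It supplements `as_bytes`; it does not replace it.
    pub fn to_str(&self) -> Result<&str, Utf8Error> {
        std::str::from_utf8(&self.bytes)
    }
}
```

A Latin-1 encoded "café" (bytes 63 61 66 E9) fails to_str() but is still fully reachable through as_bytes(), which is what a String-returning binding would have lost.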
Fidelity
Anything possible in C++ should be possible in Rust. See
The Rust API for a given C++ API should not try to make the interface "better" at more than a superficial level, because it can compromise the ability of other teams to write new Rust code, or port existing C++ code to Rust.
Good changes:
- Changing method names, especially to names that Rust callers might expect. For example, changing Status::ok() (C++) to Status::is_ok() (Rust): Rust callers expect many of these boolean functions to be prefixed with is_.
- Adding new APIs that Rust users expect. For example, trait implementations that allow the type to better interoperate with the Rust ecosystem, or functions which accept a Path or &str in addition to a raw C++ string_view.
- Reifying C++ comments around lifetime or safety as actual lifetime annotations or unsafe declarations.
If the Rust type is outright unnatural to use, people won't use it, and it's worse for the ecosystem to have two APIs than one API.
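The three kinds of good changes can be sketched together. Everything below is hypothetical (a minimal Status stand-in and made-up functions open_raw, open, and first_token, with the C++ string_view modeled as a byte slice); it shows a rename with unchanged semantics, an added Display impl, a &str convenience overload that keeps the byte-based binding, and a lifetime comment turned into a checked lifetime. A Path overload would follow the same pattern.

```rust
use std::fmt;

/// Hypothetical stand-in for the binding of a C++ status type.
pub struct Status {
    code: i32,
}

impl Status {
    pub fn new(code: i32) -> Self {
        Status { code }
    }

    /// C++ `ok()`, renamed: Rust callers expect boolean queries to start
    /// with `is_`. The semantics are unchanged.
    pub fn is_ok(&self) -> bool {
        self.code == 0
    }
}

/// Added trait implementation so the type works with `format!` and friends.
impl fmt::Display for Status {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.is_ok() {
            write!(f, "OK")
        } else {
            write!(f, "error code {}", self.code)
        }
    }
}

/// Models the binding of a C++ function taking a `string_view`: any bytes
/// are accepted, exactly as in C++ (hypothetical).
pub fn open_raw(path: &[u8]) -> Status {
    if path.is_empty() {
        Status::new(3)
    } else {
        Status::new(0)
    }
}

/// Added convenience overload for `&str`; the byte-based binding stays.
pub fn open(path: &str) -> Status {
    open_raw(path.as_bytes())
}

/// A C++ comment such as "the result points into `buf`" reified as a
/// lifetime: the borrow checker now enforces it.
pub fn first_token<'a>(buf: &'a [u8]) -> &'a [u8] {
    let end = buf.iter().position(|&b| b == b' ').unwrap_or(buf.len());
    &buf[..end]
}
```

None of these additions removes or restricts anything C++ callers could already do, which is what keeps them on the "good" side of the line.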
Bad changes:
- Removing deprecated APIs which still have C++ callers.
- Placing new requirements on Rust callers that were not placed on C++ callers, such as requiring UTF-8 when C++ does not.