Crubit: C++/Rust Bidirectional Interop Tool


NOTE: Crubit currently expects deep integration with the build system, and is difficult to deploy to environments dissimilar to Google's monorepo. External contributions are accepted, but may in some cases be difficult to integrate for tooling reasons. See CONTRIBUTING. Both of these limitations are being worked on; see https://github.com/google/crubit/blob/main/docs/overview/status.md#usage-outside-of-google

Crubit is a bidirectional bindings generator for C++ and Rust, with the goal of integrating the C++ and Rust ecosystems.

Status

See the status page (/overview/status) for an overview of the currently supported features.

Example

C++

Consider the following C++ function:

bool IsGreater(int lhs, int rhs);

This function, if present in a header file which is processed by Crubit, becomes callable from Rust as if it were defined as:

pub fn IsGreater(lhs: ffi::c_int, rhs: ffi::c_int) -> bool {...}
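Here is a minimal sketch of Rust code that uses this binding. It assumes the binding is an ordinary safe function, as the signature above suggests. The body of `IsGreater` below is a hypothetical stand-in so the sketch runs without a C++ toolchain, and `larger_of` is an illustrative helper, not part of Crubit.

```rust
use std::ffi::c_int; // the same type as ::core::ffi::c_int

/// Hypothetical stand-in for the generated binding of
/// `bool IsGreater(int lhs, int rhs);`. In a real build Crubit generates this
/// function and it calls into C++; this body only lets the sketch run alone.
#[allow(non_snake_case)]
pub fn IsGreater(lhs: c_int, rhs: c_int) -> bool {
    lhs > rhs
}

/// Illustrative caller: uses the C++ predicate like any safe Rust function.
pub fn larger_of(a: c_int, b: c_int) -> c_int {
    if IsGreater(a, b) {
        a
    } else {
        b
    }
}
```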

Note: There are some temporary restrictions on the API shape. For example, functions that accept a type like std::map can't be called from Rust directly via Crubit. These restrictions will be relaxed over time.

Rust

Consider the following Rust function:

pub fn is_greater(lhs: i32, rhs: i32) -> bool { ... }

This function becomes callable from C++ as if it were defined as:

bool is_greater(int32_t lhs, int32_t rhs);

Note: There are some temporary restrictions on the API shape. For example, functions that accept two mutable references can't be called from C++ directly via Crubit. These restrictions will be relaxed over time.
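The following sketch illustrates the restriction in the note above. `swap_values` has the two-mutable-reference shape that is currently rejected. One possible reshaping groups both values in a single struct, so the function takes only one mutable reference. This is our assumption rather than a documented Crubit recipe; check /rust/functions for the actual rules. The names `swap_values`, `Values`, and `swap_fields` are hypothetical.

```rust
use std::mem;

/// Hypothetical function with two mutable references: per the note above,
/// this shape can't currently be called from C++ directly via Crubit.
pub fn swap_values(a: &mut i32, b: &mut i32) {
    mem::swap(a, b);
}

/// Hypothetical container grouping both values into one struct.
pub struct Values {
    pub a: i32,
    pub b: i32,
}

/// Reshaped API (an assumption, not a documented recipe): a single `&mut`.
pub fn swap_fields(v: &mut Values) {
    // Borrowing two distinct fields mutably at once is allowed in Rust.
    mem::swap(&mut v.a, &mut v.b);
}
```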

Getting Started

We have detailed walkthroughs on how to use C++ from Rust, or Rust from C++, using Crubit, as well as copy-pastable example code. The example code also includes snapshots of what the generated bindings look like.

Building Crubit

Cargo

cc_bindings_from_rs

You can build cc_bindings_from_rs, which allows Rust code to be called from C++, using cargo build --bin cc_bindings_from_rs.

rs_bindings_from_cc

Prerequisites:

  • Requires LLVM and Clang libraries to be built and installed.
    • They must be built with support for compression (zlib), which is the default build config.
  • Requires Abseil libraries to be built and installed.
  • Requires zlib (e.g. libz.so) to be available in the system include and lib paths.
  • An up-to-date stable Rust toolchain.

Linux-specific setup:

# Choice of compiler is optional.
export CC=/path/to/clang
export CXX=/path/to/clang++

# We must use `lld` linker via clang. It must be in the PATH.
export PATH="$PATH:/dir/containing/lld"
export RUSTFLAGS="$RUSTFLAGS -Clinker=/path/to/clang"
export RUSTFLAGS="$RUSTFLAGS -Clink-arg=-fuse-ld=lld"

# If you want to use a sysroot.
# SYSROOT_FLAG=--sysroot=$SYSROOT
# export CXXFLAGS="$CXXFLAGS $SYSROOT_FLAG"
# export RUSTFLAGS="$RUSTFLAGS -Clink-arg=$SYSROOT_FLAG"

macOS-specific setup:

export CC=clang
export CXX=clang++
export RUSTFLAGS="$RUSTFLAGS -Clinker=clang"
export RUSTFLAGS="$RUSTFLAGS -Clink-arg=-fuse-ld=lld"

# Point to the Xcode sysroot.
export CXXFLAGS="$CXXFLAGS -isysroot $(xcrun --show-sdk-path)"
export RUSTFLAGS="$RUSTFLAGS -Clink-arg=-isysroot -Clink-arg=$(xcrun --show-sdk-path)"

Windows-specific setup:

  • Windows is currently unsupported, and the APIs generated by Crubit may not compile and will change over time.
  • All commands must be run from a development shell where the MSVC environment variables are set up.

# We use the clang compiler (clang-cl); MSVC may work too but is unsupported.
export CC=clang-cl
export CXX=clang-cl
# We must use lld to link, which is spelt lld-link. So user-specified linker
# flags must be in MSVC format.
export RUSTFLAGS="$RUSTFLAGS -Clinker=/path/to/lld-link"

# LLVM was built with Zlib support. Point Crubit to the same library.
export CXXFLAGS="$CXXFLAGS /I/path/to/zlib"
export RUSTFLAGS="$RUSTFLAGS -Clink-arg=/LIBPATH:/path/to/zlib"

# Avoid deprecation warnings.
export CXXFLAGS="$CXXFLAGS /D_CRT_SECURE_NO_DEPRECATE"

# If LLVM (-DCMAKE_MSVC_RUNTIME_LIBRARY) and Abseil (-DABSL_MSVC_STATIC_RUNTIME)
# are built against static CRT, then Rust needs to match, or vice-versa.
# export RUSTFLAGS="$RUSTFLAGS -Ctarget-feature=+crt-static"

Run the build step via cargo:

# Paths for Crubit's cargo to use.
## This path contains clang/ and llvm/ dirs with their respective headers.
export CLANG_INCLUDE_PATH=/path/to/llvm/and/clang/headers
## This path contains libLLVM*.a and libclang*.a.
export CLANG_LIB_STATIC_PATH=/path/to/llvm/and/clang/libs
## This path contains absl/ dir with all the includes.
export ABSL_INCLUDE_PATH=/path/to/absl/include/dir
## This path contains libabsl_*
export ABSL_LIB_STATIC_PATH=/path/to/absl/libs

cargo build --bin rs_bindings_from_cc

Bazel

apt install clang lld bazel
git clone git@github.com:google/crubit.git
cd crubit
bazel build --linkopt=-fuse-ld=/usr/bin/ld.lld //rs_bindings_from_cc:rs_bindings_from_cc_impl

Using a prebuilt LLVM tree

git clone https://github.com/llvm/llvm-project
cd llvm-project
CC=clang CXX=clang++ cmake -S llvm -B build -DLLVM_ENABLE_PROJECTS='clang' -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install
cmake --build build -j
# wait...
cmake --install build
cd ../crubit
LLVM_INSTALL_PATH=../llvm-project/install bazel build //rs_bindings_from_cc:rs_bindings_from_cc_impl

Are we Crubit Yet?

NOTE: The bug links below, of the form b/123456, are for Google-internal tracking purposes.

What follows is an overview of the major features Crubit does and does not support. The list is necessarily incomplete, because there exist more features and types than could be feasibly listed in anything readable, but it should give a rough idea.

This page should evolve over time:

  • If the status of a given feature is not listed, and not clear based on what is here, we should add it.
  • Some features may not have bug IDs attached. If a feature is actively requested, it should be listed with a given bug that updates will be posted to.
  • This page may fall out of date, since the set of features supported by Crubit is documented in many places. Sorry! Please update it if you notice any problems.

Types

See /types for more details about types in general, including explanations of what it means for a type to be ABI-compatible versus layout-compatible.

Unless otherwise specified, the types below are supported and ABI-compatible (see /types/primitive, /types/pointer):

  • integer types (except 128-bit integers)
  • floating point types
  • user-defined types
    • These are either layout-compatible (usually) or ABI-compatible (rarely – if all member types are supported, and it's nonempty, and it uses no obscure attributes)
  • function pointers, where the parameters and return type are in this list and are ABI-compatible
  • std::string_view / absl::string_view
  • Bridged: std::string
  • Bridged: &str
  • Bridged: Rust tuples (e.g. (i32, i64))
  • Bridged: std::optional<T>
  • Bridged: (allowlisted) protocol buffers
  • Bridged: absl::Status
  • raw pointers to any ABI-compatible or layout-compatible item in this list

We have experimental unreleased support for the following types:

  • (2025H2) b/362475441: references and pointers to MaybeUninit<T>, which are treated as T.

We have planned support for the following types:

  • (2025H2) b/271016831: layout-compatible *const [T], *mut [T]
  • (2025H2) bridged Option<T>
  • (2025) b/356638830: layout-compatible std::vector
  • (2025) b/369994952: layout-compatible std::unique_ptr

The following types are not yet supported, among many others:

  • b/254507801: Rust !
  • b/260128806: Arrays (std::array<T, N>, [T; N])
  • b/254094650: i128 and u128
  • Rust String
  • Result<T, E>
  • b/254099023: () as anything but a return type.
  • b/213960614: std::byte

C++

For C++ libraries, used from Rust, we have support for the following language features, used in public interfaces:

  • rust-movable structs. (Either trivially copyable, or [[clang::trivial_abi]])
  • rust-movable unions.
  • enums
  • type aliases
  • non-overloaded functions (which are not member functions)
    • inline or non-inline
    • extern "C" or non-extern "C"

We have experimental unreleased support for the following language features:

  • forward declarations
  • non-trivial types
  • b/356224404: non-overloaded member functions, (overloaded) constructors and assignment operators
  • templated types, bridged to a non-generic concrete type.
    • e.g. vector<int> becomes struct __crubit_mangled_vector_i, not struct vector<T>(...)
    • specialization
  • operator overloading
  • nullability annotations
  • lifetime annotations, mapped unsafely to references
  • Some object-orientation:
    • types with non-virtual base classes
    • upcasting
    • downcasting
    • inherited methods

The following features are not supported yet, among many others:

  • b/213280424: overloading
  • b/313733992: Object-Oriented Programming more generally
    • e.g., cannot derive from a C++ class and override its virtual methods
  • safe support for references
  • template-generic bridging, so that a C++ template becomes a Rust generic
  • non-type using aliases
    • using enum
    • using namespace
  • constants
  • macros

Rust

For Rust libraries, used from C++, we have support for the following language features, used in public interfaces:

  • structs
  • repr(C) unions
  • opaque representations of other user-defined types
    • enums
    • non-repr(C) unions
  • aliases (via use, type)
  • functions and methods
  • references
  • specific known traits with equivalents in C++:
    • Clone
    • Default
    • Drop
    • From
  • simple const constants
  • Defining a C++ enum from Rust

We have experimental unreleased support for the following language features:

  • non-opaque enums
  • non-opaque non-repr(C) unions

The following features are not supported yet, among others:

  • traits and trait methods in general
  • defining C++ abstractions from Rust
    • inheriting from a C++ class
    • defining a C++ base class
  • statics and more complex const constants
  • macros

Usage outside of Google

Crubit was initially written to take advantage of the superpowers that come with a centrally controlled monorepo using a Bazel build system. However, this presents a high barrier to entry: in order to use Crubit, you must satisfy all of the preconditions.

In 2026, we are reshaping Crubit into the kind of tool OSS users expect: an IDL-based FFI tool with Cargo integration, with options for a better experience in codebases with strong control over the build environment. (For calling Rust from C++, though, we might stop short of an IDL and instead rely on compiler-synced binary releases, since there is only one Rust compiler.)

In particular, this involves decomposing Crubit into a collection of parts that can be used on their own, without needing to consume the whole:

  • Reusable libraries that implement C++ functionality (e.g., forward declarations, nontrivial object semantics.)
  • An IDL-based core, with optional compiler integration at the front-end.
  • Support for building with Cargo, stable named versions of Clang or Rust, etc.

Decoupling from the toolchain

By using an IDL as input, instead of a C++ compiler frontend, Crubit can be made compatible with arbitrary C++ compilers: a human can write the IDL in a way that is compatible with the compiler in question, even if Crubit does not integrate with that compiler yet.

For the Rust compiler, however, there is only one. The main toolchain integration hazard is that the compiler and its arguments must be exactly matched with the version and arguments used to compile the Rust crate itself. This can be resolved by using rmeta files as inputs, instead of source code.

TODO:

  • rs_bindings_from_idl and idl_from_cc exist, and Crubit can be used with IDL inputs
  • cc_bindings_from_rs can accept rmeta inputs

Crate Ecosystem

TODO:

  • Crubit accepts pull requests and regularly reviews GitHub issues and PRs.
  • A C++ stdlib crate exists in crates.io
  • The Crubit ctor crate is either replaced with pin-init, the equivalent standard library module, or else has a crate in crates.io with documentation and an explanation of why to use it vs pin-init.
  • For all other support libraries: they exist in crates.io and are documented.

Build System

We currently only support Bazel.

TODO:

  • cc_bindings_from_rs builds using Cargo
  • rs_bindings_from_cc builds using Cargo
  • idl_bindings_from_cc, rs_bindings_from_idl build using Cargo
  • Crubit is usable as a Bazel dependency
  • Crubit builds against public Rust and Clang releases
  • Crubit binary releases
  • (not planned) Buck2
  • (not planned) CMake

Types

Overview

/overview/status#types outlines the current and future status of Crubit's type support.

In brief, Crubit supports:

  • Primitive types (/types/primitive), such as float or i32.
  • Pointer types (/types/pointer), such as float* or *const i32, including function pointers.
  • User-defined types, with some language-specific rules and restrictions. (See /cpp and /rust).

ABI-Compatibility

Certain references to C++ or Rust types will not receive Crubit bindings. Some types may only be usable in certain locations due to current Crubit limitations, inherent properties of the type, or both. Supported types fall into one of three categories ranging from "most widely supported" to "most restricted":

  • ABI-compatible: these types have a C-ABI-equivalent representation which can be used anywhere a value of this type is expected from both C++ and Rust.
  • Layout-compatible: these types have equivalent in-memory representations in C++ and Rust but cannot be represented using standard C ABI. These types will only be usable as by-value function arguments if they are C++-movable. For example, Box<i32> is not C++-movable because it has no nullptr / moved-from representation.
  • Bridged: these types may have different in-memory representations in C++ and Rust, and so can only be passed by-value between the two languages. Examples include Rust tuples, which are bridged by-value into C++ std::tuple.

Level of support | Example | Pass by-reference | Pass by-value | Return by-value | Fields | In function pointer types
ABI-compatible | i32 | Y | Y | Y | Y | Y
Layout-compatible C++ type | absl::string_view | Y | if Rust-movable [1] | if Rust-movable [2] | Y | N
Layout-compatible Rust type | UserDefinedStruct | Y | if C++-movable [3] | Y | Y | N
Bridged | (i32, i32) | N | Y | Y | N | N

[1] See /cpp/classes_and_structs#trivially_relocatable
[2] See /cpp/classes_and_structs#trivially_relocatable
[3] See /rust/movable_types
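For a Rust library exposed to C++, the Bridged row can be read as follows. In this sketch, which uses hypothetical function names, tuples are returned and accepted by value, which the table marks as supported. The by-reference variant appears only in a comment, because bridged types cannot be passed by reference.

```rust
/// Returns a bridged type (a Rust tuple) by value: "Return by-value: Y".
/// Panics if `b == 0`, or on overflow (`i32::MIN / -1`), like any Rust division.
pub fn div_rem(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}

/// Accepts a bridged type by value: "Pass by-value: Y".
pub fn sum_tuple(pair: (i32, i32)) -> i32 {
    pair.0 + pair.1
}

// A by-reference variant such as
//     pub fn sum_tuple_ref(pair: &(i32, i32)) -> i32
// falls under "Pass by-reference: N" for bridged types, so it would not be
// expected to receive C++ bindings.
```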

NOTE: All primitive and pointer types are ABI-compatible. However, due to b/369895805, all non-bridged user-defined types are only layout-compatible.

In the following examples, foo receives bindings, but bad_foo will not receive bindings, because while the types it uses in its function signature are supported by Crubit, they are not supported in this particular context.

C++

// Status stands for a bridged type, such as absl::Status.
void foo(int32_t);
void foo(void (*)(int32_t));
void foo(Status);
struct LayoutCompatibleType {
    UnsupportedType field;
    // or [[no_unique_address]] int field; or...
};

// bad_foo cannot receive bindings, because the function pointer type
// does not work with non-ABI-compatible types
void bad_foo(void (*)(LayoutCompatibleType));
// bad_foo cannot receive bindings, because bridged types cannot be passed
// by reference
void bad_foo(const Status&);

Rust

// Each foo and bad_foo below is a separate example: unlike C++, Rust has no
// overloading, so in real code these would live in different modules.
pub fn foo(_: i32) {}
pub fn foo(_: fn(i32)) {}
pub fn foo(_: Status) {}

pub struct LayoutCompatibleType {
    field: UnsupportedType,
}

// bad_foo cannot receive bindings, because the function pointer type
// does not work with non-ABI-compatible types
pub fn bad_foo(_: fn(LayoutCompatibleType)) {}

// bad_foo cannot receive bindings, because bridged types cannot be passed
// by reference
pub fn bad_foo(_: &Status) {}

Bidirectionality

Usually, the mapping of types between languages is bidirectional. For example, a C++ function which returns an int32_t will become a Rust function returning an i32, and vice versa. In some sense, an i32 is an int32_t.

However, in other cases, the mapping is not reversible. C++ and Rust have types or aliases that the other language does not. For example, isize becomes intptr_t, but intptr_t is (on some platforms) the same type as int64_t, and so intptr_t becomes i64.

Primitive types

Crubit maps primitive types [1] to the direct equivalent in the other language. For example, C++ int32_t is Rust i32, C++ int is Rust ffi::c_int, C++ double is Rust f64, and so on.

Exceptions:

  • C++: There is no mapping for the currently-unsupported types nullptr_t, char8_t, wchar_t, and (u)int128_t.
  • Rust: There is no mapping for the currently-unsupported char and str types, and the never (!) type, except as a return type.

For more information, see Unsupported types.

Bidirectional type mapping

The following map is bidirectional. If you call a C++ interface from Rust using Crubit, then int32_t in C++ becomes i32 in Rust. Vice versa, if you call a Rust interface from C++ using Crubit, i32 in Rust becomes int32_t in C++.

C++ | Rust
void | () as a return type, ::core::ffi::c_void otherwise
int8_t | i8
int16_t | i16
int32_t | i32
int64_t | i64
intptr_t | isize
uint8_t | u8
uint16_t | u16
uint32_t | u32
uint64_t | u64
uintptr_t | usize
bool | bool
double | f64
float | f32
char | ::core::ffi::c_char [2]
signed char | ::core::ffi::c_schar
unsigned char | ::core::ffi::c_uchar
short | ::core::ffi::c_short
unsigned short | ::core::ffi::c_ushort
int | ::core::ffi::c_int
unsigned int | ::core::ffi::c_uint
long | ::core::ffi::c_long
unsigned long | ::core::ffi::c_ulong
long long | ::core::ffi::c_longlong
unsigned long long | ::core::ffi::c_ulonglong
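Because C++ int maps to ::core::ffi::c_int rather than to i32, portable Rust callers can convert explicitly instead of assuming that the two are the same type. The sketch below assumes a hypothetical C++ function `int Square(int x);`, whose binding is replaced by a Rust stand-in. `square_i64` is an illustrative caller that also avoids signed overflow, which is undefined behavior in C++.

```rust
use std::convert::TryFrom;
use std::ffi::c_int; // the same type as ::core::ffi::c_int

/// Hypothetical stand-in for the binding of `int Square(int x);`.
/// C++ `int` maps to `c_int`, so the signature uses `c_int`, not `i32`.
#[allow(non_snake_case)]
pub fn Square(x: c_int) -> c_int {
    x * x
}

/// Illustrative caller: converts explicitly instead of assuming `c_int == i32`,
/// and refuses inputs whose square would overflow a C++ `int` (UB in C++).
pub fn square_i64(x: i64) -> Option<i64> {
    let narrowed = c_int::try_from(x).ok()?;
    narrowed.checked_mul(narrowed)?;
    Some(i64::from(Square(narrowed)))
}
```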

One-way type mapping

The types below are mapped in only one direction: they do not round-trip back to the original type. For example, size_t maps to usize, but usize maps back to uintptr_t.

C++ to Rust

The following C++ types become the following Rust types, but not vice versa:

C++ | Rust
ptrdiff_t | isize
size_t | usize
char16_t | u16
char32_t | u32 [3]

One-way mapping of Rust to C++ types

The following Rust types become the following C++ types, but not vice versa:

Rust | C++
! (return type) | void

Unsupported types

Bindings for the following types are not supported at this point:

C++

  • nullptr_t and char8_t have not yet been implemented.
  • b/283268558: wchar_t is currently unsupported, for portability reasons.
  • b/254094650: int128_t is currently unsupported, because it does not yet have a decided ABI.

Rust

  • char is currently unsupported, pending design review.
  • b/262580415: str has not yet been implemented
  • b/254507801: ! has not yet been implemented except for return types.

[1] Rust calls these types primitive types, while C++ calls them fundamental types. Since the Rust terminology is probably well understood by everybody, we use it here.

[2] Note that Rust c_char and C++ char have different signedness in Google, or any other codebase with widespread use of unsigned char in x86.

[3] Unlike Rust char, char16_t and char32_t may contain invalid Unicode characters.

TODO(jeanpierreda): document this in more detail.

Pointer types

C++ defines two categories of pointer types, while Rust adds a third. They are:

  • Pointers to some (non-function) object, without lifetime information. C++ calls these object pointers, while Rust calls them raw pointers.
  • Function pointers (C++, Rust).
  • Finally, Rust references: non-aliasing pointers with lifetime information.

With the exception of Rust references, which are only permitted in limited circumstances, pointer types are fully supported as long as the type they point to is supported. For example, const int32_t* maps bidirectionally to *const i32, and void (*)(int32_t) maps bidirectionally to Option<extern "C" fn(i32)>.

Object pointers

An "object pointer" is the C++ terminology for any pointer that is not a function pointer. Rust would call these "raw pointers". These are mapped to each other bidirectionally:

C++ | Rust
const T* | *const T
T* | *mut T
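Here is a minimal sketch of this mapping, assuming a hypothetical C++ function `void CopyInt(const int32_t* src, int32_t* dst);`. The Rust stand-in below has the raw-pointer signature that the table implies. We assume the binding is `unsafe`, as in the `void(void*)` example under Function pointers.

```rust
/// Hypothetical stand-in for the binding of
/// `void CopyInt(const int32_t* src, int32_t* dst);`:
/// `const int32_t*` becomes `*const i32`, and `int32_t*` becomes `*mut i32`.
///
/// # Safety
/// `src` and `dst` must be non-null, aligned, and point to live `i32` values.
#[allow(non_snake_case)]
pub unsafe fn CopyInt(src: *const i32, dst: *mut i32) {
    // SAFETY: upheld by the caller, per the contract above.
    unsafe { *dst = *src };
}
```

Rust references coerce to raw pointers at the call site, so a caller can write `unsafe { CopyInt(&src, &mut dst) }`.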

C++ pointers with lifetime

C++ allows attaching lifetime annotations to arbitrary types, including pointers. There are two competing annotations for this, neither of which are supported in Rust bindings yet:

Function pointers

C++ function pointers map to Rust extern "C" fn(...) -> ... function pointers, and vice versa:

C++ | Rust
void(&)(int32_t) | extern "C" fn(i32)
void(*)(int32_t) | Option<extern "C" fn(i32)>
std::type_identity_t<void(int32_t)> | Not supported [1]

If the corresponding C++ function definition would be unsafe in Rust (per the rules for C++ function declarations), then so is the function pointer – for example, a C++ reference to void(void*) becomes a Rust unsafe extern "C" fn(_: *mut c_void).

Not all function pointers receive bindings. If the function cannot be called directly, due to a known or potential ABI mismatch between Rust and C++, then the function pointer receives no bindings. In particular, function pointers cannot take layout-compatible types by value. You can work around this by taking or returning such problematic types by pointer instead of by value.
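The sketch below shows both points from this section in plain Rust. First, a nullable C++ `void(*)(int32_t)` is modeled as `Option<extern "C" fn(i32)>`. Second, a callback takes a layout-compatible type by pointer as a workaround. `record`, `invoke_if_set`, `Pair`, and `sum_pair` are hypothetical names, and `Pair` only stands in for a layout-compatible type.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Remembers the last value seen by the callback, so its effect is observable.
static LAST_SEEN: AtomicI32 = AtomicI32::new(0);

/// A Rust function with the C ABI, matching C++ `void(int32_t)`.
extern "C" fn record(value: i32) {
    LAST_SEEN.store(value, Ordering::SeqCst);
}

/// C++ `void(*)(int32_t)` may be null, so it is modeled as
/// `Option<extern "C" fn(i32)>`, with `None` playing the role of `nullptr`.
/// Returns whether the callback was invoked.
pub fn invoke_if_set(callback: Option<extern "C" fn(i32)>, value: i32) -> bool {
    match callback {
        Some(f) => {
            f(value);
            true
        }
        None => false,
    }
}

/// Stand-in for a layout-compatible type. A function pointer taking it by
/// value would get no bindings, so this callback takes it by pointer instead.
#[repr(C)]
pub struct Pair {
    pub a: i64,
    pub b: i64,
}

/// `unsafe` because it dereferences a raw pointer.
///
/// # Safety
/// `p` must point to a live, aligned `Pair`.
unsafe extern "C" fn sum_pair(p: *const Pair) -> i64 {
    // SAFETY: upheld by the caller, per the contract above.
    let p = unsafe { &*p };
    p.a + p.b
}
```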

Lifetime

All function pointers are 'static.

There is no way to specify the lifetime of a function pointer in Rust, nor in C++: both assume a 'static lifetime. In scenarios where the lifetime may be shorter than 'static (e.g., JIT compilation, or dynamic loading and unloading of shared libraries at runtime), the developer is responsible for managing the lifetime of the function pointer.

[1] C++ has plain function types: the type pointed to by function pointers. There is no Rust equivalent. However, since C++ functions implicitly coerce to function pointers, this only comes up in template classes like std::function or absl::AnyInvocable. Or, in this case, type_identity_t.

Rust references

Rust references, unlike C++ references, cannot mutably alias. This introduces a new form of Undefined Behavior (UB) that many C++ programmers may not be accustomed to. For now, C++ pointers and references do not map to Rust references. Instead, they map to Rust raw pointers. Vice versa, Rust references are an unsupported type which do not map to any C++ type at all.

The one exception to this rule is function parameters. In some limited circumstances, Rust functions may accept references, and the corresponding C++ interface will accept C++ references. This is documented in /rust/functions.
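The following sketch shows where the aliasing obligation lands. It assumes a hypothetical C++ function `void Increment(int32_t& x);`, whose binding would take a raw `*mut i32`. The stand-in body converts that pointer to `&mut i32`, and this conversion is where the caller must guarantee that nothing else aliases the value.

```rust
/// Hypothetical stand-in for the binding of `void Increment(int32_t& x);`.
/// The C++ reference arrives as a raw `*mut i32`, not as `&mut i32`.
///
/// # Safety
/// `x` must be non-null and aligned, and no other reference to `*x` may be
/// live during the call. C++ allows two `int32_t&` to name the same object,
/// but two live `&mut i32` to one value would be undefined behavior in Rust.
#[allow(non_snake_case)]
pub unsafe fn Increment(x: *mut i32) {
    // SAFETY: upheld by the caller, per the contract above. This conversion
    // is the point where Rust's no-aliasing rule starts to apply.
    let r: &mut i32 = unsafe { &mut *x };
    *r += 1;
}
```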

absl::Status in Rust

NOTE: The APIs here have planned future backwards-incompatible changes, and you may see LSCs as we migrate to the end state API.

In Google C++, the standard types for communicating an error are absl::Status and absl::StatusOr<T>. These have support in Rust when they are directly passed by value, or returned by value, and are mapped to a Rust Result. For example:

absl::Status Foo();

This becomes:

pub fn Foo() -> Result<(), StatusError> {...}

(Specifically, it will return Status, which is an alias for Result<(), StatusError>.)
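Because the binding returns a Result, C++ errors can be propagated with `?`. The sketch below is self-contained. `StatusError` is a hypothetical stand-in, since the real type is provided by Crubit and may look quite different. `Validate` is a hypothetical C++ function. Only the shape of the `Status` alias is taken from the text above.

```rust
/// Hypothetical stand-in for Crubit's `StatusError`; the real type is
/// provided by Crubit and may look quite different.
#[derive(Debug, Clone, PartialEq)]
pub struct StatusError {
    pub message: String,
}

/// The alias described above: an OK status is `Ok(())`.
pub type Status = Result<(), StatusError>;

/// Stand-in for the binding of `absl::Status Foo();`, which succeeds here.
#[allow(non_snake_case)]
pub fn Foo() -> Status {
    Ok(())
}

/// Stand-in for the binding of a hypothetical `absl::Status Validate(int x);`
/// that fails for negative inputs.
#[allow(non_snake_case)]
pub fn Validate(x: i32) -> Status {
    if x < 0 {
        Err(StatusError {
            message: format!("negative input: {}", x),
        })
    } else {
        Ok(())
    }
}

/// Illustrative caller: errors from either C++ call propagate via `?`.
pub fn setup(x: i32) -> Status {
    Foo()?;
    Validate(x)?;
    Ok(())
}
```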

Calling C++ APIs using Status

C++ functions returning Status/StatusOr can be defined as normal:

cs/file:examples/types/absl_status/cpp_api.h content:ReturnsStatus

...and will return a Result:

cs/file:examples/types/absl_status/user_of_cpp_api.rs content:ReturnsStatus

Calling Rust APIs using Status

Unlike when calling C++ APIs, currently you cannot directly call a Rust API returning a Status or StatusOr. Instead, it must use a workaround type, StatusWrapper. This is tracked by b/441266536.

cs/file:examples/types/absl_status/rust_api.rs

The StatusWrapper type automatically becomes an absl::Status in C++:

cs/file:examples/types/absl_status/user_of_rust_api.cc content:rust_api::ReturnsStatus

Future Evolution

We expect to stop using Result, and instead use the plain actual bindings for absl::Status itself, using the Try trait to enable conversion into Result and error handling via ?.

This would allow Status to be used not only as function parameter and return values, but also in struct fields, arrays, or behind pointers and references.

However, this is blocked on stabilization of the Try trait.

C++/Rust Protobuf interop

WARNING: This page documents functionality that is currently internal to the Google monorepo.

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. Once you define how you want your data to be structured, you can generate source code in a variety of languages to manipulate and serialize/deserialize your structured data. Protobuf messages are among the most common types at Google, appearing in the vast majority of APIs.

The usual way to pass data from one language to another using Protobufs is to serialize a message in one language and deserialize it in the other. This serialization/deserialization has costs that make this approach unsuitable for hot code paths.

To avoid those costs, we've intentionally designed C++ and Rust Protobuf message types to have identical layouts. We avoid the need for serialization/deserialization and instead we directly use the same message object from both languages. Crubit automatically generates the zero-cost glue code for us. For example, take this piece of a C++ header:

MyProto Foo(); 

This becomes available to Rust as:

pub fn Foo() -> MyProto {...}

(Specifically, Crubit will detect that this is a Protobuf message, and it will convert from the C++ message type to the Rust message type.)

Calling Rust APIs using Protobuf message types

Rust | C++
Message | Message
MessageView | const Message*
MessageMut | Message*

Protocol buffers are supported by value, and through the View and Mut types, which are mapped to C++ pointers.

See cc_bindings_from_rs/test/bridging/protobuf/rust_lib.rs for an example definition, and cc_bindings_from_rs/test/bridging/protobuf/user_of_rust_lib.cc for how to call it from C++.

Calling C++ APIs using Protobuf message types

Calling C++ APIs which use protobuf is slightly more difficult.

First of all, add your proto_library target to the [allowlist](http://). See b/414381884 for more context & information on when this allowlist will be removed.

Passing by value

C++ | Rust
Message | Message

When a C++ proto message is passed or returned by value, it is mapped directly to the Rust message type, as you would expect.

C++:

cs/file:google_internal/protobuf/by_value.h content:foo::Message

Rust:

cs/file:google_internal/protobuf/by_value_test.rs content:by_value\:\:|\bmsg\b

Passing by reference

C++ | Rust
const Message*, const Message& | *const Incomplete<symbol!("Message"), ...>
Message*, Message& | *mut Incomplete<symbol!("Message"), ...>

When a C++ proto is passed by pointer or by reference, the Rust type is a pointer to a forward declaration of the C++ protocol buffer type.

In particular, C++ APIs are not exposed using the View or Mut types.

These are pointers because C++ APIs do not annotate ownership, lifetime, or aliasing properties, and so these cannot be mapped to the distinct owned, View, or Mut types of the Rust protobuf API. And these are forward-declared because the C++ types do not have direct Rust bindings: the generated .proto.h file does not get piped through Crubit.

  • To convert a Rust Proto to a C++ const Proto*: use my_proto.as_view().cpp_cast()

  • To convert a Rust Proto to a C++ Proto*: use my_proto.as_mut().cpp_cast()

  • To convert a C++ (const) Proto* to a Rust View/Mut: use unsafe {my_ptr.unsafe_cpp_cast()}.

See support/forward_declare.rs for the definition of Incomplete, CppCast, and UnsafeCppCast.

For copy-pastable example code, see the examples in google_internal/protobuf/

Type visibility

In Crubit's :wrapper mode, pub(crate) types can be generated, which are restricted to a specific library. This is generally a temporary state of affairs. It lets a specific library use types whose bindings are flawed or need more work, without exposing those types everywhere.

Visibility errors

_visibility_error

If the generated bindings for a type are pub(crate), then bindings will not be generated when the type is used outside of that library. For example, consider the following library, which uses :wrapper mode:

struct WEIRD_EXPERIMENTAL_ATTRIBUTE SomeType {};
void Foo(SomeType);

If SomeType is pub(crate) because of its use of WEIRD_EXPERIMENTAL_ATTRIBUTE, then functions, class members, constants, etc. which use that type will only receive bindings in the same crate, and those bindings will themselves be pub(crate):

pub(crate) struct SomeType { ... }
pub(crate) fn Foo(...: SomeType) { ... }

If a different library uses the type and defines a similar function Bar, then Bar will not receive bindings at all, because the bindings for SomeType are only visible in the library where SomeType was defined.

void Bar(SomeType);  // won't receive bindings: it's in another library

This can dramatically reduce the set of bindings which are generated, and it is for this reason that these pub(crate) type bindings are only used sparingly, typically for early release of features that cannot yet be globally supported. You should not rely on the pub(crate) status of a type!

Fix

To work around this, you can wrap or hide the type as it is used in the public API. For example, if you needed to accept a pointer to X, but X is pub(crate), you can accept a void* instead.

Rust bindings for C++ libraries

When a C++ library enables Crubit, that library can be used directly from Rust. This page documents roughly what that entails, and additional subpages (available in the left-hand navigation) document specific aspects of the generated bindings.

Tip: The code examples below are pulled straight from examples/cpp/function/. The other examples in examples/cpp/ are also useful. If you prefer just copy-pasting something, start there.

How to use Crubit

Crubit allows you to call some C++ interfaces from Rust. It supports functions, rust-movable classes and structs, and enums. Crubit does not support advanced features like templates or virtual inheritance.

The rest of this document goes over how to create a C++ library that can be called from Rust, and how to actually call it from Rust. The quick summary is:

  1. A cc_library gets (nonempty) Rust bindings if it specifies aspect_hints = ["//features:supported"].

  2. Any Rust build target can depend on the bindings for a cc_library, by specifying cc_deps=["//path/to:target"].

  3. The bindings can be previewed using the following command:

    $ bazel build --config=crubit-genfiles //path/to:target
    

Write a cc_library target

The first part of creating a library that can be used by Crubit is to write a cc_library target. For example:

cs/file:examples/cpp/function/example.h

If you write a BUILD target as normal, it will not actually get Crubit bindings, but we'll start from there:

cs/file:examples/cpp/function/BUILD symbol:example_lib_broken

Look at the generated bindings

Bindings can be generated for any C++ target, anywhere in the build graph. (Crubit is an aspect1 on all C++ targets.) However, that is not to say that the generated bindings will be useful: by default, Crubit doesn't generate any bindings. Try it!

To examine the generated bindings for the C++ target, you can run the following command:

$ bazel build --config=crubit-genfiles //examples/cpp/function:example_lib_broken

This is the best way to preview the generated bindings for a given C++ target right now. You might end up using this a lot, so keep it in your shell history.

If you run the above command, you should see some output like the following:

Aspect //rs_bindings_from_cc/bazel_support:rust_bindings_from_cc_aspect.bzl%rust_bindings_from_cc_aspect of //examples/cpp/function:example_lib_broken up-to-date:
  bazel-bin/examples/cpp/function/example_lib_broken_rust_api_impl.cc
  bazel-bin/examples/cpp/function/example_lib_broken_rust_api.rs
  bazel-bin/examples/cpp/function/example_lib_broken_namespaces.json

These files are the generated bindings which are used under the hood when depending on a C++ target from Rust. They consist of:

  1. The supporting C++ code to glue Rust and C++ together. (The .cc file.)
  2. The public Rust interface. (The .rs file.)
  3. Supporting information that is used by bindings that depend on these bindings. (The .json file.)

You don't need to check them in, as they are regenerated automatically whenever you build a Rust build target which depends on C++.

The .rs file is the interesting one for end users. For a library like :example_lib_broken, which does not enable Crubit, the .rs file will be essentially empty, only consisting of comments describing the bindings it did not generate:

// Generated from: examples/cpp/function/example.h;l=11
// Error while generating bindings for item 'crubit_add_two_integers':
// Can't generate bindings for crubit_add_two_integers, because of missing required features (<internal link>):
// //examples/cpp/function:example_lib_broken needs [//features:supported] for crubit_add_two_integers (return type)
// //examples/cpp/function:example_lib_broken needs [//features:supported] for crubit_add_two_integers (the type of x (parameter #0))
// //examples/cpp/function:example_lib_broken needs [//features:supported] for crubit_add_two_integers (the type of y (parameter #1))
// //examples/cpp/function:example_lib_broken needs [//features:supported] for crubit_add_two_integers (extern \"C\" function)

This error is saying something important. It was trying to generate bindings for the function crubit_add_two_integers, but it couldn't, because four different things about the function require the supported feature to be enabled on the target. The parameter and return types require supported, as does the function itself in the abstract.

supported indicates that a library target supports Rust callers via Crubit, using the stable features. Other functions and classes might require experimental, for experimental features of Crubit. This would be the case, for example, if we had defined an operator+. For more on this, see .

Enable Crubit on a target

To enable Crubit on a C++ target, you must pass an argument via aspect_hints. Specifically, as the generated comments indicate, the target must enable the supported feature:

cs/file:examples/cpp/function/BUILD symbol:\bexample_lib\b

This tells Crubit that it can generate bindings for this target, for any part of the library that uses features from supported. Now, if we look at a preview of the automatically generated bindings:

$ bazel build --config=crubit-genfiles //examples/cpp/function:example_lib

We can see the fully-fledged bindings for the library:

cs/file:examples/cpp/function/example_generated.rs

Use a C++ library from Rust

To depend on a C++ library from Rust, add it to cc_deps:

cs/file:examples/cpp/function/BUILD symbol:main

At that point, the bindings are directly usable from Rust. The interface is identical to the .rs file previewed earlier, but can be used directly:

cs/file:examples/cpp/function/main.rs

Common Errors

Unsupported features

Some features are either unsupported, or else only supported with experimental feature flags (). In order to get bindings for a C++ interface, that interface must only use the subset of features currently supported.

For a particularly notable example, a class that Crubit can bind cannot have a std::string field: std::string has move semantics that Crubit does not yet support, and so, in turn, does any class that contains a std::string by value.

The way to work around this kind of problem, in all cases, is to wrap or hide the problematic interface behind an interface Crubit can handle:

  • Move nontrivial types behind a unique_ptr<T>. A std::string field is not Rust-movable, but a unique_ptr<std::string> field is (with libc++'s unstable ABI; see the cookbook).
  • Hide unsupported types, in general, behind a wrapper. For example, a std::vector<T> is not supported, but a struct which wraps a unique_ptr<std::vector<int32_t>> is.
  • Wrap unsupported functions behind wrappers. For example, methods are not yet supported, but top-level functions are, and can invoke methods.
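A minimal sketch of the last two workarounds. IntVector, IntVectorPush, IntVectorSize, and IntVectorGet are hypothetical names. The sketch assumes a toolchain where unique_ptr is itself Rust-movable (libc++ with its unstable ABI), and whether Crubit binds the reference parameters exactly as written is also an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical wrapper: std::vector<int32_t> itself gets no bindings, but a
// struct that owns it through a unique_ptr can be Rust-movable, because only
// the pointer (not the vector) is stored inside the struct.
struct IntVector {
  std::unique_ptr<std::vector<int32_t>> items =
      std::make_unique<std::vector<int32_t>>();
};

// Top-level functions that forward to the vector's methods, so callers that
// can only reach free functions still get the functionality.
inline void IntVectorPush(IntVector& v, int32_t value) {
  v.items->push_back(value);
}

inline std::size_t IntVectorSize(const IntVector& v) {
  return v.items->size();
}

// Bounds-checked access: throws std::out_of_range for a bad index.
inline int32_t IntVectorGet(const IntVector& v, std::size_t index) {
  return v.items->at(index);
}
```

The wrapper keeps the C++ API surface small and explicit, at the cost of a heap allocation per IntVector.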
[1]

Crubit is an aspect: an automatically generated entity that exists on every build target. It is disabled by default, so that Rust callers don't accidentally impose on C++ libraries that weren't expecting them.

Aspects allow Crubit to fully understand the dependency graph: the
bindings for X are in the Crubit aspect of X. This allows Crubit to
generate bindings which themselves rely on bindings: if a function
in target `A` returns a struct from target `B`, we know that the
bindings for `A` will depend on the bindings for `B`. Because Crubit
is an aspect, it already knows the name of the bindings for `B`:
it's simply the Crubit aspect on `B`!

Without aspects, or something like aspects, you would need to write
down, for every library, the location of its Rust bindings. There is
no need for that kind of boilerplate when aspects are involved, and
that is why most things shaped like Crubit use aspects. For example,
protocol buffers use aspects for their generated implementations in
multiple languages. (They *also* use named rules, but the rules
simply re-export the aspect, and the underlying aspect is what is
used within the rule for referring to transitive dependencies.)
Thanks to aspects, the `proto_library` doesn't need to re-specify
"ah, and the Go proto is named `'x'`".

Be not afraid! Aspects are what make transitive dependencies work
seamlessly, without boilerplate. So when you see aspect this, or
aspect that, remember: this is a Good Thing.

C++ Bindings Cookbook

This document presents a collection of techniques for creating Rust bindings for C++ libraries.

These techniques are often workarounds for gaps in what Crubit can do. Expect the recommended practices to evolve over time, as Crubit's capabilities expand!

BEST PRACTICE: The tips below describe deviations from typical C++ style. (If typical C++ style worked, you wouldn't need a cookbook.) When you deviate from typical C++ style, document why, and try to keep changes limited in scope, close to the interop boundary.

If possible, solve the same problem while staying within more typical C++ style. For example: you may be able to add ABSL_ATTRIBUTE_TRIVIAL_ABI to a type you control, instead of boxing the type in a pointer.

Making types Rust-movable

As described in /cpp/classes_and_structs#rust_movable, types cannot be passed by value in Rust unless they are Rust-movable.

A type can fail to be Rust-movable for a couple of easily fixable reasons, each described in a subsection below:

  • The type defines a destructor, a copy/move constructor, or a copy/move assignment operator. If it is still Rust-movable in principle, and these functions do not care about the address of the object in memory, then the type can be annotated with ABSL_ATTRIBUTE_TRIVIAL_ABI.
  • The type has a field which is not Rust-movable. In that case, the field can be boxed in a pointer.

Other reasons a type can be non-Rust-movable, such as virtual methods or non-Rust-movable base classes, have no easy fix. For those, the only option is the hard one: restructuring the code more radically to avoid those patterns.

ABSL_ATTRIBUTE_TRIVIAL_ABI

One of the ways a type can become non-Rust-movable is if it defines a copy or move constructor, a copy or move assignment operator, or a destructor. In that case, Clang assumes that it cannot be trivially relocated, unless it is annotated with ABSL_ATTRIBUTE_TRIVIAL_ABI.

Before:

struct LogWhenDestroyed {
  ~LogWhenDestroyed() {
    std::cerr << "I was destroyed!\n";
  }
};

After:

struct ABSL_ATTRIBUTE_TRIVIAL_ABI LogWhenDestroyed {
  ~LogWhenDestroyed() {
    std::cerr << "I was destroyed!\n";
  }
};
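A minimal sketch of what the annotation does and does not change. EXAMPLE_TRIVIAL_ABI is a hypothetical stand-in so that the code compiles without Abseil; it assumes that on Clang ABSL_ATTRIBUTE_TRIVIAL_ABI amounts to Clang's trivial_abi attribute, and that elsewhere it expands to nothing. The attribute affects how the object is passed between functions, not how often its destructor runs:

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-in for ABSL_ATTRIBUTE_TRIVIAL_ABI (assumption: on Clang
// the Abseil macro amounts to this attribute; elsewhere it is empty).
#if defined(__clang__)
#define EXAMPLE_TRIVIAL_ABI [[clang::trivial_abi]]
#else
#define EXAMPLE_TRIVIAL_ABI
#endif

int destroyed_count = 0;

struct EXAMPLE_TRIVIAL_ABI LogWhenDestroyed {
  int id = 0;
  ~LogWhenDestroyed() {
    ++destroyed_count;
    std::cerr << "I was destroyed!\n";
  }
};

// With trivial_abi, Clang may pass this argument in registers and destroy it
// in the callee. Either way, each object is still destroyed exactly once.
int TakeByValue(LogWhenDestroyed x) { return x.id; }
```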

WARNING: Only use ABSL_ATTRIBUTE_TRIVIAL_ABI if changing the location of an object in memory is safe. In particular, if the object is self-referential, using ABSL_ATTRIBUTE_TRIVIAL_ABI will result in Undefined Behavior (UB).

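class SelfReferential {
 public:
  SelfReferential() = default;
  // Copies "fix up" x_ptr so that it points into the new object.
  SelfReferential(const SelfReferential& other) : x(other.x), x_ptr(&x) {}
  SelfReferential& operator=(const SelfReferential& other) {
    x = other.x;  // x_ptr keeps pointing at this->x
    return *this;
  }
 private:
  int x = 0;
  int* x_ptr = &x;  // always points at this object's own x
};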

Types like this, if Rust-moved, will contain invalid pointers. Carefully review any code adding ABSL_ATTRIBUTE_TRIVIAL_ABI.
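To make the failure concrete, here is a minimal sketch that simulates a Rust move with std::memcpy. PointsIntoItself and BitwiseRelocate are hypothetical. The sketch uses a trivially copyable stand-in rather than SelfReferential itself, because the C++ standard only defines memcpy-based copying for trivially copyable types:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical trivially copyable type whose pointer is meant to refer to its
// own `x`, like SelfReferential, but without a fix-up copy constructor.
struct PointsIntoItself {
  int x;
  int* x_ptr;
};

// Simulates a "Rust move": copy the bytes to a new location and run no code.
void BitwiseRelocate(PointsIntoItself* dst, const PointsIntoItself* src) {
  std::memcpy(dst, src, sizeof(PointsIntoItself));
}
```

After relocation, the copy's x_ptr still holds the old object's address, so it no longer points at its own x.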

Boxing in a pointer

One of the ways a type can become non-Rust-movable is if it has a field, where the type of that field is not Rust-movable. There is no way to override this: there is nothing a type can do to make itself Rust-movable if one subobject is not.

For example, consider a field std::string name. std::string defines a custom destructor and custom copy/move constructors and assignment operators, in order to correctly manage the string's owned heap memory. Because of this, it is also not Rust-movable. And, at the time of writing, no STL implementation's std::string can use ABSL_ATTRIBUTE_TRIVIAL_ABI. In the case of libstdc++, for example, std::string contains a self-referential pointer: when the string is small enough, the data() pointer refers to the inside of the string. Rust-moving it would leave the pointer referring back to the old object, which would cause undefined behavior.

If a struct or class contains a std::string as a subobject by value, or any other non-Rust-movable object, then that struct or class is itself also not Rust-movable. (If you somehow were able to Rust-move the parent object, this would also Rust-move the string, causing the very same issues.)

What you can do instead is change the type of the field so that it holds the non-Rust-movable type by pointer rather than by value.

BEST PRACTICE: Except where necessary for better Rust interop, this is not good C++ style. When you use this trick, document why, and try to limit it to types close to the interop boundary. If possible, instead of boxing T, make T itself Rust-movable. (This is not easy for standard library types, but if the type is under your control, it may be as easy as adding ABSL_ATTRIBUTE_TRIVIAL_ABI.)

unique_ptr

NOTE: The following is non-portable, and only works in libc++ with the unstable ABI. If you aren't sure whether you are using the unstable ABI, you probably are not, but you might want to check with your local toolchain maintainer.

If you tightly control your dependencies, you might be using libc++'s unstable ABI. The unstable ABI, among other things, makes unique_ptr<T> Rust-movable. In fact, it is Rust-movable even if T itself is not.

This means that if a particular field is making its parent type non-Rust-movable, one fix is to wrap it in a unique_ptr:

Before:

struct Person {
  std::string name;
  int age;
};

After:

struct Person {
  // Boxed to make Person Rust-movable: <internal link>/cpp/cookbook#boxing
  std::unique_ptr<std::string> name;
  int age;
};
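A minimal sketch of why boxing helps: relocating the Person moves only the pointer, while the std::string object stays at the same heap address. The values are illustrative, and this C++ code cannot itself check Rust-movability, which for unique_ptr depends on the unstable-ABI assumption described above:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Person {
  // Boxed: only the pointer is stored inline, so moving a Person never
  // moves the std::string object itself.
  std::unique_ptr<std::string> name;
  int age;
};
```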

Raw pointers

BEST PRACTICE: This should only be used in codebases that do not use a Rust-movable unique_ptr or unique_ptr equivalent. Consider wrapping this in an ABSL_ATTRIBUTE_TRIVIAL_ABI type which resembles unique_ptr, instead.

When not using libc++'s unstable ABI, the most straightforward way to make a field Rust-movable is to instead use a raw pointer, and delete it in the destructor (as if it were held by a unique_ptr).

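Before:

struct Person {
  std::string name;
  int age;
};

After:

struct ABSL_ATTRIBUTE_TRIVIAL_ABI Person {
  // Owned, boxed to make Person Rust-movable: <internal link>/cpp/cookbook#boxing
  std::string* name;
  int age;

  // Moving transfers ownership. Declaring a move constructor also deletes the
  // implicit copy operations, which would otherwise delete `name` twice.
  Person(Person&& other) : name(other.name), age(other.age) {
    other.name = nullptr;
  }

  ~Person() {
    delete name;
  }
};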

(Note the use of ABSL_ATTRIBUTE_TRIVIAL_ABI: because we added a destructor, we also need to add ABSL_ATTRIBUTE_TRIVIAL_ABI to indicate that the destructor does not care about the address of Person.)

Renaming functions for Rust

Overloaded functions cannot be called from Rust (yet: b/213280424). To make them available anyway, you can define new non-overloaded functions with different names:

Before:

void Foo(int x);
void Foo(float x);

After:

void Foo(int x);
void Foo(float x);

// For Rust callers: <internal link>/cpp/cookbook#renaming
inline void FooInt(int x) { return Foo(x); }
// For Rust callers: <internal link>/cpp/cookbook#renaming
inline void FooFloat(float x) { return Foo(x); }

Working around blocking bugs in Crubit

Crubit is still in development, and has bugs which can completely block your work if Crubit is in the critical path. These can take the form of parsing errors or crashes when Crubit runs, or of generated bindings which do not compile.

The following workarounds can help get you moving again:

Disable Crubit on a declaration

If a declaration causes hard failures within Crubit, that declaration alone can be disabled using the CRUBIT_DO_NOT_BIND attribute macro, defined in support/annotations.h. This must be paired with an additional entry in rs_bindings_from_cc/bazel_support/generate_bindings.bzl, recording the name of the item.

To mail the CL performing this change, use _manage: add AUTO_MANAGE=testing:TGP to the CL description.

NOTE: By disabling Crubit on this declaration, items which depend on it may also, in turn, not receive bindings. For example, if it declares a type, then functions which accept or return that type will also not receive bindings.

Disable Crubit on a header

If an entire header is giving problems (e.g. is unparseable), then it can be removed from consideration by Crubit. Once disabled, Crubit will avoid reading the header directly, although it is still included via #include preprocessor directives.

Add the target name and header name to public_headers_to_remove in rs_bindings_from_cc/bazel_support/rust_bindings_from_cc_aspect.bzl. See the example in rs_bindings_from_cc/test/disable/disable_header/.

To mail the CL performing this change, use _manage: add AUTO_MANAGE=testing:TGP to the CL description.

NOTE: By disabling Crubit on this header, items which are defined in that header will not receive bindings. For example, this means that functions which use types defined in that header will also not get bindings, even if the function was defined in a header that was not disabled.

When possible, it's preferable to use a smaller fix. For example, if the same header is owned by two targets, it's preferable to move the header into a third target, depended on by both. That way, functions which use types defined in that header will still get bindings, in both targets.

Best Practices Writing Rust Bindings for Existing C++ Libraries

Introduction

This document offers guidance on how to add Rust support to existing C++ libraries, including core foundational libraries.

For an introduction, see Rust Bindings for C++ Libraries.

Code Organization

For technical reasons, it is generally necessary for the C++ library and its Rust bindings to be the same Bazel target. It is not possible to define the Rust bindings for a target as a completely separate and independent target. The automatically generated bindings, and their configuration, must be on and in the C++ target itself.

The reasons why are fairly technical, and you can stop reading here if you're OK with this.

Technical Justification

Crubit generates bindings using Bazel aspects: given an arbitrary C++ Bazel target, Crubit generates, in an aspect, the Rust library which wraps it. To users it appears as if the Bazel target was both a C++ and a Rust library.

This is necessary for the same reason that it's necessary for protocol buffers. And, just like protocol buffers, this means that we don't have a rust_library target where we could customize its behavior using Bazel attributes.

Specifically, we cannot use a regular Bazel rule for bindings generation because the rule cannot generate bindings for transitive dependencies: if A depends on B, then bindings(A) depends on bindings(B), so that bindings(A) can wrap functions in A that return types from B, and so on. (See FAQ: Why can't we use separate rules?)

Because bindings are generated in an aspect, and not a rule, there are only two places to configure the bindings of a target A:

  • In the source code of the target receiving Rust support, using configuration pragmas or attributes. (This is similar to protocol buffers.)
  • In the BUILD file, on the target receiving Rust support, via aspect_hints. Aspect hints are a storage location for configuration data, readable by the aspect, placed directly on the target that the aspect runs on.

Generally speaking, it's better to modify the source code than to configure externally via aspect hints. However, some source code annotations are nonstandard and can have performance implications (see b/321933939). In addition, source code is not readable from the build system itself, so any configuration that requires customizing the build graph must go in aspect hints.

For these reasons, currently most publicly available methods of customizing bindings occur in aspect hints.

In any case, any configuration or support for Rust is done directly to the target.

Example

To enable Crubit on a C++ target, one actually modifies the target itself, adding aspect_hints = ["//features:supported"]. This must be an aspect hint, not a source code annotation, for all of the above reasons:

  1. It makes the build faster and more resilient: when Crubit is disabled on a target, Bazel needs to know so it can completely avoid running Crubit on it.
  2. There is no stable, reliable, and style-approved header-wide pragma we can use for enabling/disabling Crubit, but aspect_hints does work.

FAQ: Why can't we use separate rules?

A library A, and its bindings bindings(A), must be linked together in the build graph: if B uses a type from A, then bindings(B) uses a type from bindings(A).

Crucially, this also goes in reverse: if a Rust library C uses a type from bindings(A), then reverse_bindings(C) uses a type from A. This forms a natural dependency cycle: the build graph must understand both the link from A to bindings(A), and the link from bindings(A) to A.

Crubit resolves this by making A and bindings(A) the same target in the build graph: bindings for a target are obtained by reading an aspect on the target.

It is not possible to make A one build target, and bindings(A) a separate build target, call it X:

  1. We cannot literally configure on A that its bindings are in a different target X, because this ends up producing a real dependency cycle, as mentioned above: if bindings(A) = X, then reverse_bindings(X) = A.
  2. We cannot avoid the cycle by creating the dependency "lazily", or "dynamically" based on e.g. a naming scheme during Bazel analysis. Bazel dependencies cannot be discovered dynamically; once Bazel reaches this point of evaluation, dependencies need to be fully resolved: labels in deps are no longer strings in this stage, they are edges in a dependency graph. That graph must not have cycles.
  3. In some limited cases, we can hardcode the relationship within Crubit: Crubit is actually two aspects, each of which handles a single direction of interop. So Crubit can hardcode inside of itself that bindings(A) = X, and in the other half, that reverse_bindings(X) = A. This requires that Crubit itself depends on A and X. Therefore, to avoid another dependency cycle, neither A nor X can depend on or use Crubit in their transitive dependencies. This is not feasible except in very isolated cases. Currently, we only do this for the Rust and C++ standard libraries.

To compare with another similar technology, PyCLIF avoids this problem because it only supports "one-directional" interop, and so it doesn't need to avoid dependency cycles. Crubit is bidirectional, and this comes with some technical restrictions.

FAQ: Why are there extra dependencies in deps(target)?

Because the Rust bindings are created using an aspect on the C++ target, everything that the Rust bindings need to depend on will appear in a Bazel query / depserver query for deps(target).

For example, if you wanted to add an extra source file to the Rust bindings, you might specify it in aspect_hints. That file will show up in deps(target).

These Rust-only deps are not used at all in pure-C++ builds (the Bazel actions registered by them won't be executed), but they will show up in the dependency graph anyway, due to how Bazel query and depserver track dependencies.

NOTE: In particular, if your project has tests that count/limit the transitive dependencies of a C++ binary, they will overcount the dependencies, and the overcounting will get worse as Rust support is rolled out through the C++ build graph.

Wrapping and type bridging vs direct use of types

Crubit automatically generates layout-compatible Rust equivalents of C++ types. When the C++ type is Rust-movable, so is the Crubit-generated Rust type, and it can be used by value, by pointer, in struct fields, arrays, and any other compound data type. A C++ pointer const T* can become a Rust *const T, a C++ T field can become a Rust T field, and so on, with few restrictions.

For example, the following C++ type:

struct Vec2d {
    float x;
    float y;
};

Becomes (roughly) the following Rust type:

#[repr(C)]
struct Vec2d {
    pub x: f32,
    pub y: f32,
}

These have an identical layout, and so a C++ pointer or field containing a C++ Vec2d is exactly equivalent to a Rust pointer or field containing a Rust Vec2d.
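The layout facts that a #[repr(C)] mirror relies on can be checked on the C++ side. A minimal sketch, assuming a platform where float is a 4-byte type needing no padding between consecutive fields (true on mainstream ABIs, though not promised by the C++ standard):

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct Vec2d {
  float x;
  float y;
};

// A #[repr(C)] Rust struct with fields x: f32, y: f32 lays out its fields in
// declaration order using C rules, so it matches exactly when these hold.
static_assert(sizeof(float) == 4, "float has the size of an f32");
static_assert(std::is_standard_layout<Vec2d>::value, "offsetof is well-defined");
static_assert(std::is_trivially_copyable<Vec2d>::value, "copyable bytewise");
static_assert(offsetof(Vec2d, x) == 0, "x comes first");
static_assert(offsetof(Vec2d, y) == sizeof(float), "y directly follows x");
static_assert(sizeof(Vec2d) == 2 * sizeof(float), "no trailing padding");
static_assert(alignof(Vec2d) == alignof(float), "aligned like a float");
```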

(See Types for more information about layout-compatibility.)

Because of this, it is often not required to manually write any new types. The bindings generated by Crubit will produce a working type automatically.

When to wrap a type

There are, still, a handful of reasons to manually write "wrapper" types which encapsulate or replace the original C++ type (or its Crubit-generated Rust type).

  • If the type is not naturally Rust-movable, but it's important for the Rust type to be Rust-movable. It may be possible to make changes to the C++ code to make the type Rust-movable using some of the strategies described in the cookbook. This allows the greatest flexibility, as the type becomes usable in almost every context. But if that is not possible, writing a new "wrapper" type can keep Rust programmers productive.
  • Some Rust types have very special semantics, which are impossible to implement in the bindings for a C++ type. For example, Rust has special support for Result and Option in error handling via the ? operator, which cannot yet be implemented by Status or std::optional using stable Rust features. These privileged Rust types can be used instead of the equivalent C++ types, as a wrapper type.

In these cases, we may bridge to a wrapper type as a workaround, while we hopefully fix the underlying issues that mean we cannot directly use the underlying type. This offers us a subset of the API we want, and allows continued progress.

Why not to wrap a type

Wrapper types work best when passed by value: if you return a T in C++, the corresponding Rust function can automatically convert it to and return a WrappedT.

However, no conversion is possible for references or fields, which really are the original type, with its size, alignment, and address in memory. Making this work transparently requires an ever-expanding network of wrapper types, one for every compound data type that might contain T:

  • T must become WrappedT
  • const T&, if it is supported at all, must become something like TRef<'a>, or a dynamically sized &TView.
  • std::vector<T>, if it is supported at all, must become something like TVector.
  • struct MyStruct {T x;} must become a wrapped WrappedMyStruct.
  • ...

The problems introduced by wrapper types can easily outweigh the benefits that they bring. Crubit aims to reduce their necessity to zero over time.

Bad reasons to wrap a type

In most other circumstances where one might want to reach for wrapper types, alternatives exist:

  • If we want to use a wrapper type in order to give the type a nicer Rust API, then, as an alternative, one can customize the Rust API of the wrapped type using an aspect hint. You can define new methods and trait implementations to the side, without altering any C++ code.

  • If we want to use a wrapper type in order to change the type invariants – to make them stricter or looser – this is fine, as long as it doesn't replace the not-as-nice type. For example, if a C++ API returns std::string (bytes, "probably" UTF-8), the Rust equivalent should not return a Rust String (Unicode, definitely UTF-8). Changing type invariants in-place causes some APIs to become impossible to call, and causes the Rust and C++ ecosystems to diverge and become incompatible. The bindings should be high fidelity. Wrapper types of this form should be optional, and available equally to both C++ and Rust to avoid fragmenting the ecosystem.

Fidelity

Anything possible in C++ should be possible in Rust. See .

The Rust API for a given C++ API should not try to make the interface "better" at more than a superficial level, because it can compromise the ability of other teams to write new Rust code, or port existing C++ code to Rust.

Good changes:

  • Changing method names, especially to names that Rust callers might expect. For example, changing Status::ok() (C++) to Status::is_ok() (Rust) – Rust callers expect many of these boolean functions to be prefixed with is_.
  • Adding new APIs that Rust users expect. For example, trait implementations that allow the type to better interoperate with the Rust ecosystem, or functions which accept a Path or &str in addition to a raw C++ string_view.
  • Reifying C++ comments around lifetime or safety as actual lifetime annotations or unsafe declarations.

If the Rust type is outright unnatural to use, people won't use it, and it's worse for the ecosystem to have two APIs than one API.

Bad changes:

  • Removing deprecated APIs which still have C++ callers.
  • Placing new requirements on Rust callers that were not placed on C++ callers, such as requiring UTF-8 when C++ does not.

Customizing bindings using annotations

The Rust bindings for a C++ declaration can be customized using an attribute macro from <crubit/support/annotations.h>.

For instance:

  • A function can be marked unsafe in Rust, even if Crubit would otherwise assume it was safe, using CRUBIT_UNSAFE.
  • Missing bindings for an item can be treated as an error, instead of ignored, using CRUBIT_MUST_BIND.
  • An item can be given a different name in Rust using CRUBIT_RUST_NAME("rust_name_here").

More information:

  • Dependency: //support:annotations
  • Include: #include <crubit/support/annotations.h>
  • Full API documentation: support/annotations.h

Example

Given the following C++ header:

cs/file:examples/cpp/unsafe_attributes/example.h symbol:SafeSignatureButAnnotatedUnsafe

Crubit will generate the following bindings:

cs/file:examples/cpp/unsafe_attributes/example_generated.rs symbol:SafeSignatureButAnnotatedUnsafe

Rust bindings for C++ functions

Rust code can call (non-member) functions defined in C++, provided that the parameter and return types are supported by Crubit:

  • If a parameter or return type is a primitive type, then the bindings for the function use the corresponding Rust type.
  • Similarly, if a parameter or return type is a pointer type, then the bindings for the function use the corresponding Rust pointer type.
  • If the type is a user-defined type, such as a class type or enum, then the bindings for the function use the bindings for that type.

Additionally, Rust code can call member functions defined in C++ if the parameter and return types are supported by Crubit (see above). Currently, member functions are translated as non-method associated functions.

Examples

Functions

Given the following C++ header:

cs/file:examples/cpp/function/example.h function:add_two_integers

Crubit will generate the following bindings, with a safe public function that calls into the corresponding FFI glue:

cs/file:examples/cpp/function/example_generated.rs function:add_two_integers

Methods

Given the following C++ header:

cs/file:examples/cpp/method/example.h class:Bar

Crubit will generate the following bindings:

cs/file:examples/cpp/method/example_generated.rs class:Bar
cs/file:examples/cpp/method/example_generated.rs snippet:0,6 "impl Bar"

unsafe functions

Which C++ functions are marked unsafe in Rust?

By default, the Rust binding to a C++ function is marked as safe or unsafe based on the types of its parameters. If a C++ function accepts only simple types like integers, the resulting Rust binding will be marked as safe. Functions which accept a raw pointer are automatically marked as unsafe.

This behavior can be overridden using the CRUBIT_UNSAFE, CRUBIT_UNSAFE_MARK_SAFE and CRUBIT_OVERRIDE_UNSAFE(is_unsafe) macros.

For example, given the following C++ header:

cs/file:examples/cpp/unsafe_attributes/example.h content:^([^/#\n])[^\n]*

Crubit will generate the following bindings:

cs/file:examples/cpp/unsafe_attributes/example_generated.rs content:^([^/\n])([^!\n]|$)[^\n]*

Correct usage of unsafe

Functions marked unsafe cannot be called outside of an unsafe block. In order to avoid undefined behavior when using unsafe, callers must:

  • Ensure that the pointer being passed to C++ is a valid C++ pointer. In particular, it must not be dangling (e.g. NonNull::dangling()).

  • Ensure that the safety conditions documented in C++ are upheld. For example, if the C++ function accepts a reference or non-null pointer, then do not pass in 0 as *const _.

Soundness

Note that many "safe" C++ functions may still trigger undefined behavior if used incorrectly. Regardless of whether a C++ function is marked as unsafe, calls into C++ will only be memory-safe if the caller verifies that all function preconditions are met.
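For instance, a function like the hypothetical DivideInts below takes only integers, so under the default rule described earlier its binding would be marked safe. Yet calling it with b == 0, or with a == INT_MIN and b == -1, is undefined behavior in C++. DivideIntsIsDefined is likewise a hypothetical helper, shown as one way to make such a precondition checkable:

```cpp
#include <cassert>
#include <climits>

// Precondition (documented, not enforced): b != 0, and not
// (a == INT_MIN && b == -1). Violating it is undefined behavior, even though
// the signature contains only plain integers.
inline int DivideInts(int a, int b) { return a / b; }

// Lets callers in either language check the precondition before calling.
inline bool DivideIntsIsDefined(int a, int b) {
  return b != 0 && !(a == INT_MIN && b == -1);
}
```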

Function Attributes

Function attributes are not currently supported. Functions marked [[noreturn]], [[nodiscard]], etc. do not have bindings.

Rust bindings for C++ classes and structs

A C++ class or struct is mapped to a Rust struct with the same fields. If any subobject of the class cannot be represented in Rust, the class itself will still have bindings, but the relevant subobject will be private.

To have bindings, the class must be "Rust-movable". For example, any trivial or "POD" class is Rust-movable.

Example

Given the following C++ header:

cs/file:examples/cpp/trivial_struct/example.h class:Position

Crubit will generate a struct with the same layout:

cs/file:examples/cpp/trivial_struct/example_generated.rs class:Position

For an example of a Rust-movable class with a destructor, see examples/cpp/trivial_abi_struct/.

Fields

The fields on the Rust struct type are the corresponding Rust types:

  • If the C++ field has primitive type, then the Rust field uses the corresponding Rust type.
  • Similarly, if the C++ field has pointer type, then the Rust field has the corresponding Rust pointer type.
  • If the C++ field has a user-defined type, such as a class type or enum, then the Rust field uses the bindings for that type.

Unsupported fields

Subobjects that do not receive bindings are made private, and replaced with an opaque blob of [MaybeUninit<u8>; N], as well as a comment in the generated source code explaining why the subobject could not receive bindings. For example, since inheritance is not supported, the space of the object occupied by a base class will instead be this opaque blob of bytes.

Specifically, the following subobjects are hidden and replaced with opaque blobs:

  • Base class subobjects
  • Non-public fields (private or protected fields)
  • Fields that have nontrivial destructors
  • Fields whose type does not have bindings
  • Fields that have any unrecognized attribute, including no_unique_address
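As an illustration, here is a hypothetical struct (Base, Mixed, and their members are made up for this sketch) annotated with how the rules above would treat each subobject. The comments are predictions from those rules, not actual Crubit output:

```cpp
#include <cassert>
#include <type_traits>

struct Base {
  int base_value = 0;
};

struct Mixed : Base {  // Base subobject: replaced by an opaque blob
  int visible = 1;     // public field of primitive type: a normal Rust field

  int SecretPlusOne() const { return secret_ + 1; }

 private:
  int secret_ = 41;    // private field: replaced by an opaque blob
};

// No user-provided special members, so Mixed can be relocated bytewise.
static_assert(std::is_trivially_copyable<Mixed>::value, "Rust-movable");
```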

A Rust struct with opaque blobs is ABI-incompatible with the C++ struct or class that it corresponds to. As a consequence, if the struct is used for FFI outside of Crubit, it should not be passed by value. Within Crubit, it can't be passed by value in function pointers, but can otherwise be used as normal.

Rust-movable classes

For a type to be passed or returned by value in Rust, it must be "Rust-movable": the class must be able to be "teleported" in memory during its lifetime, as if by using memcpy and then discarding the old location without running any destruction logic. This means that it can be represented in Rust as an ordinary value, used through ordinary pointers and references, without Pin.

For example, a string_view is Rust-movable. In fact, every trivially copyable type is Rust-movable.

However, unlike Rust, many types in C++ are not Rust-movable. For example, a std::string might be implemented using the "short string optimization", in a fashion similar to this:

#include <cstddef>
#include <cstring>

class String {
    union {
        std::size_t length;
        char inline_data[sizeof(std::size_t)];
    };
    char* data; // either points to `inline_data`, or the heap.
  public:
    std::size_t size() const {
        if (data == inline_data) {
            // Short string: stored inline and NUL-terminated.
            return std::strlen(data);
        } else {
            return length;
        }
    }
    // ...
};

This class is self-referential: the data pointer may point to inline_data, which is inside the object itself. If we bitwise copy the object to a new location, as in a "Rust move" or as with memcpy, then the data pointer will remain bitwise identical, and point into the old object. It becomes a dangling pointer!

C++ allows self-referential types: fields can, and often do, point at other fields of the same object. This works because copying and moving are overloadable: the copy/move constructors and assignment operators can be written to "fix up" the data pointer when copying or moving the string, so that it points into the new object instead of dangling.

Rust does not do this. In Rust, assignment is always a "trivial relocation": it runs no code when copying or moving an object, and copies the bytes as they are. This would break the String type defined above, or any other self-referential type.
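To see concretely why a bitwise copy breaks self-reference, here is a toy Rust type (hypothetical, unrelated to Crubit) that stores a raw pointer to its own buffer, like the data pointer in the String sketch above. It derives Copy, so `let b = a;` performs a plain bitwise copy (the same byte-level operation as a Rust move) while keeping `a` alive for comparison. No pointer is ever dereferenced; the sketch only compares addresses.

```rust
// Hypothetical toy type: `data` is meant to point at this object's own
// `inline_data`, mirroring the self-referential C++ `String` sketch.
#[derive(Clone, Copy)]
pub struct SelfRef {
    inline_data: [u8; 8],
    data: *const u8,
}

impl SelfRef {
    pub fn new() -> Self {
        SelfRef { inline_data: [0; 8], data: std::ptr::null() }
    }

    /// Point `data` at this object's own buffer, as a C++ constructor would.
    pub fn init(&mut self) {
        self.data = self.inline_data.as_ptr();
    }

    /// Does `data` still point into this object?
    pub fn points_into_self(&self) -> bool {
        self.data == self.inline_data.as_ptr()
    }

    /// Does `data` point into `other` instead?
    pub fn points_into(&self, other: &SelfRef) -> bool {
        self.data == other.inline_data.as_ptr()
    }
}
```

After `let mut a = SelfRef::new(); a.init(); let b = a;`, `a.points_into_self()` is true, but `b.points_into_self()` is false and `b.points_into(&a)` is true: the copied pointer still refers to the old object, and nothing ran to fix it up.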

Unfortunately, any class with a user-defined copy/move operation or destructor might be self-referential, and so by default they are not Rust-movable. If a class has a user-defined destructor or copy/move constructor/assignment operator, and "should be" Rust-movable, it must explicitly declare that it is safe to perform a Rust move, using the attribute ABSL_ATTRIBUTE_TRIVIAL_ABI. This attribute allows a class to be trivially relocated, even though it defines an operation that would ordinarily disable trivial relocation.

For example, in the unstable libc++ ABI we use within Google, a unique_ptr<T> is Rust-movable, because it applies ABSL_ATTRIBUTE_TRIVIAL_ABI. This is safe to do, for unique_ptr, because its exact location in memory does not matter, and paired move/destroy operations can be replaced with Rust move operations.

Requirements

The exact requirements for a class to be Rust-movable are subject to change, because they are still being defined within Clang and within the C++ standard. But at the least:

  • Any trivially copyable type is also Rust-movable.
  • Any class or struct type with only Rust-movable fields and base classes is Rust-movable, unless:
    • it is not ABSL_ATTRIBUTE_TRIVIAL_ABI and defines a copy/move constructor, copy/move assignment operator, or destructor, or,
    • it is otherwise nontrivial, e.g., from defining a virtual member function.

Some examples of Rust-movable types:

  • any primitive type (integers, character types, floats, etc.)
  • raw pointers
  • string_view
  • struct tm, or any other type in the C standard library
  • unique_ptr, in the Clang unstable ABI.
  • absl::Status

Some examples of types that are not Rust-movable:

  • (For now) std::string, std::vector, and other nontrivial standard library types.
  • (For now) absl::flat_hash_map, absl::AnyInvocable, and other nontrivial types used throughout the C++ ecosystem, even outside the standard library.
  • absl::Mutex, absl::Notification, and other non-movable types.

Attributes

Crubit does not support most attributes on structs and their fields. If a struct is marked using any attribute other than alignment or ABSL_ATTRIBUTE_TRIVIAL_ABI, it will not receive bindings. If a field is marked using any other attribute, it will be replaced with a private opaque blob.

Rust bindings for C++ enums

A C++ enum is mapped to a Rust struct with a similar API to a Rust enum.

  • The enumerated constants are present as associated constants: MyEnum::kFoo in C++ is MyEnum::kFoo in Rust.
  • The enum can be converted to and from its underlying type using From and Into. For example, static_cast<int32_t>(x) is i32::from(x) in Rust, and vice versa static_cast<MyEnum>(x) is MyEnum::from(x).

However, a C++ enum is not a Rust enum. Some features of Rust enums are not supported:

  • C++ enums must be converted using From and Into, not as.
  • C++ enums do not have exhaustive pattern matching.
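The resulting API behaves roughly like the following hand-written Rust. This is a sketch of the API shape only, assuming a C++ `enum Color : int32_t { kRed, kBlue, kGreen };`. Crubit's real generated code may differ in derives, attributes, and other details.

```rust
// Hypothetical hand-written equivalent of the bindings for a C++ enum.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Color(i32);

#[allow(non_upper_case_globals)]
impl Color {
    pub const kRed: Color = Color(0);
    pub const kBlue: Color = Color(1);
    pub const kGreen: Color = Color(2);
}

impl From<i32> for Color {
    // Any i32 is accepted, just like static_cast<Color>(value) in C++.
    fn from(value: i32) -> Color {
        Color(value)
    }
}

impl From<Color> for i32 {
    fn from(value: Color) -> i32 {
        value.0
    }
}

pub fn describe(c: Color) -> &'static str {
    // Matching on the associated constants works because Color derives
    // PartialEq and Eq, but the wildcard arm is mandatory.
    match c {
        Color::kRed => "red",
        Color::kBlue => "blue",
        Color::kGreen => "green",
        _ => "unknown",
    }
}
```

Because Color is an ordinary struct, `Color::from(42)` is a valid value that `describe` reports as "unknown". A Rust enum could not hold 42 at all.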

Example

Given the following C++ header:

cs/file:examples/cpp/enum/example.h class:Color

Crubit will generate the following bindings:

cs/file:examples/cpp/enum/example_generated.rs class:Color

Why isn't it an enum?

A C++ enum cannot be translated directly to a Rust enum, because C++ enums are "representationally non-exhaustive": a C++ enum can have any value supported by the underlying type, even one not listed in the enumerators. For example, in the enum above, static_cast<Color>(42) is a valid instance of Color, even though none of kRed, kBlue, or kGreen has that value.

Rust enums, in contrast, are representationally exhaustive. An enum declares a closed set of valid discriminants, and it is undefined behavior to attempt to create an enum with a value outside of that set, whether it's via transmute, a raw pointer cast, or Crubit. The behavior is undefined the moment the invalid value is created, even if it is never used.

Since a value like static_cast<Color>(42) is not in the list of enumerators, a Rust enum cannot be used to represent an arbitrary C++ enum. Instead, the Rust bindings are a struct. This struct is given the most natural and enum-like API possible, though there are still gaps. (Casts using as, for example, will not work with a C++ enum.)

What about #[non_exhaustive]?

The #[non_exhaustive] attribute on an enum communicates to external crates that more variants may be added in the future, and so a match requires a wildcard branch. Within the defining crate, non_exhaustive has no effect. It remains undefined behavior to transmute from integers not declared by the enum.

C++ bindings for Rust libraries

Rust libraries can be used directly from C++. This page documents roughly what that entails, and additional subpages (available in the left-hand navigation) document specific aspects of the generated bindings.

Tip: The code examples below are pulled straight from examples/rust/function/. The other examples in examples/rust/ are also useful. If you prefer just copy-pasting something, start there.

How to use Crubit

Crubit allows you to call some Rust interfaces from C++. It supports functions (including methods), structs, and even enums as "opaque" objects. Crubit does not support advanced features like generics or dynamic dispatch with dyn.

The rest of this document goes over how to create a Rust library that can be called from C++, and how to actually use it from C++. The quick summary is:

  • All rust_library targets can receive C++ bindings.

  • To use the bindings for a target //path/to:example_crate, you must create a C++ rule exporting the bindings, using cc_bindings_from_rust(name="any_name_here", crate=":example_crate").

  • The header name is the Rust target's label with a .h appended: to include the header for the Rust library //path/to:example_crate, you use #include "path/to/example_crate.h".

  • The namespace name is the Rust target name, e.g. example_crate. To change the namespace, use cc_bindings_from_rust_library_config, described below.

  • To see the generated C++ API, right-click the "path/to/example_crate.h" include in Cider, and select "Go to Definition".

    NOTE: In some cases the generated file in Cider may be out of date. If it isn't refreshing, you can manually inspect the bindings using the workaround command in b/391395849.

Write a rust_library target

The first part of creating a library that can be used by Crubit is to write a rust_library target. For example:

cs/file:examples/rust/function/example.rs content:^[^/].*

In the BUILD file, in addition to defining the rust_library, you should also define the cc_bindings_from_rust target to make it easier to use from C++:

cs/file:examples/rust/function/BUILD symbol:example_crate|example_crate_cc_api

Example: If your Rust library is named //path/to:example_crate, then the C++ header file is "path/to/example_crate.h", and the C++ namespace is example_crate by default.

Use a Rust library from C++

C++ build rules do not have a rust_deps parameter, so to depend on the C++ bindings for a target, they must depend on the cc_bindings_from_rust rule.

For example:

cs/file:examples/rust/function/BUILD symbol:main
cs/file:examples/rust/function/main.cc content:^[^/\n].*

NOTE: Other than for declaring the dependency, all other information about the generated bindings comes from the actual rust_library rule. For example, the #include for the above is #include "examples/rust/function/example_crate.h", not example_crate_cc_api.h.

(Optional) Customize the generated C++ API

Give it a better namespace

The crate name might make a poor namespace. In addition, multiple C++ headers and build targets typically share the same namespace. To customize the namespace name, use cc_bindings_from_rust_library_config:

cs/file:examples/rust/library_config/BUILD symbol:custom_namespace|example_crate

Now, instead of the crate name, the generated bindings will use the namespace name you provided:

cs/file:examples/rust/library_config/main.cc content:^[^/\n].*

Look at the generated bindings

There are two ways to look at the generated header file:

  • Click through the #include in Cider. Given the following C++ code:

    #include "path/to/example_crate.h"
    

    If you right-click the file path, and select "Go to Definition", you will be taken to a file starting with // Automatically @generated C++ bindings.

  • Run bazel build //path/to:example_crate --config=crubit-genfiles, and open bazel-bin/path/to/example_crate.h in your text editor of choice.

Common Errors

Unsupported features

Some features are either unsupported, or else only supported with experimental feature flags. In order to get bindings for a Rust interface, that interface must only use the subset of features currently supported.

For a particularly notable example, references are only supported as function parameters, and only in a subset of cases that we can prove do not add aliasing UB to C++ callers.

The way to work around this kind of problem, in all cases, is to wrap or hide the problematic interface behind an interface Crubit can handle:

  • Use raw pointers instead of references, if this use of references falls into a case Crubit does not support.
  • Hide unsupported types behind a wrapper type. For example, a Vec<T> is not supported by Crubit, but pub struct MyStruct(Vec<i32>); is.
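A sketch of the wrapper-type workaround follows. IntList and its methods are hypothetical names. The private `Vec<i32>` field would become an opaque blob on the C++ side, while the methods only use primitives and a single reference each, within the rules in References. Returning a fallback value instead of `Option<i32>` is a conservative choice, assuming Option may not be among the supported types. Whether Crubit accepts each exact signature should be confirmed in the generated header.

```rust
// Hypothetical wrapper: Vec<i32> has no C++ bindings, so it stays in a
// private field. Deriving Clone and Default keeps the wrapper movable in
// C++ (see Movable Types).
#[derive(Clone, Default)]
pub struct IntList(Vec<i32>);

impl IntList {
    pub fn new() -> IntList {
        IntList(Vec::new())
    }

    // `&mut self` is the only reference parameter, as the rules require.
    pub fn push(&mut self, value: i32) {
        self.0.push(value);
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    /// Returns the element at `index`, or `fallback` if it is out of range,
    /// so no Option (and no panic) crosses the language boundary.
    pub fn get_or(&self, index: usize, fallback: i32) -> i32 {
        self.0.get(index).copied().unwrap_or(fallback)
    }
}
```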

C++ bindings for Rust functions

C++ code can call functions defined in Rust, provided that the parameter and return types are supported by Crubit:

  • If a parameter or return type is a fundamental type, then the bindings for the function use the corresponding C++ type.
  • Similarly, if a parameter or return type is a pointer type, then the bindings for the function use the corresponding C++ pointer type.
  • If the type is a user-defined type, such as a struct or enum, then the bindings for the function use the bindings for that type.

As a special case, functions also support reference parameters to supported types, with some restrictions to ensure safety. See References.

Example

Given the following Rust crate:

cs/file:examples/rust/function/example.rs function:add_two_integers

Crubit will generate the following function declaration, which calls into accompanying glue code:

cs/file:examples/rust/function/example_generated.h function:add_two_integers

unsafe functions

C++ does not have an unsafe marker at this time. In the future, Crubit may introduce a way to mark unsafe functions to help increase the reliability of C++ callers.

References

In general, Rust references are not exposed to C++. However, some Rust functions which accept reference parameters do get mapped to C++ functions accepting C++ references:

  • All references must have an unbound parameter lifetime – not 'static, for example.
  • Only the parameter itself can be a reference type. References to references, vectors of references, etc. are still unsupported.
  • If there is a mut reference parameter, it is the only reference parameter.

This set of rules is intended to describe a safe subset of Rust functions, which do not introduce substantial aliasing risk to a mixed C++/Rust codebase.

For example, the following Rust functions will receive C++ bindings, and can be called from C++:

pub struct S;
impl S {
    pub fn foo(&self) {}
}
pub fn bar(_: &mut i32) {}
pub fn baz(_: &i32, _: &i32) {}

However, none of the below will receive bindings:

pub fn foo(_: &'static i32) {}  // 'static lifetime is bound
pub fn bar(_: &&i32) {}  // Reference in non-parameter type
pub fn baz(_: &mut i32, _: &i32) {}  // More than one reference, one of which is mut
pub struct S<'a>(&'a i32);
impl<'a> S<'a> {
    pub fn qux(_: &'a i32) {}  // 'a is not a lifetime parameter of `qux`
}

Returned references are still not supported, and references which are bound to some lifetime (e.g. 'static) are also still not supported.

If you need a signature that these rules exclude, such as a mut reference alongside other references, raw pointers (*const T, *mut T) can be used instead. However, all of the usual unsafe caveats apply.
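For example, a function taking `&mut i32` and `&i32` would not receive bindings under the rules above, but a raw-pointer version can. `add_into` is a hypothetical function, not taken from the Crubit examples. Its `# Safety` section spells out the contract that C++ callers must now uphold by hand, because no aliasing check happens at the boundary.

```rust
/// Adds `*src` to `*dst` (wrapping on overflow).
///
/// Hypothetical raw-pointer replacement for
/// `fn add_into(dst: &mut i32, src: &i32)`, which has no bindings.
///
/// # Safety
///
/// `dst` and `src` must be non-null, aligned, and point to initialized
/// `i32`s, and `dst` must be writable. They may alias: `src` is read
/// before `dst` is written.
pub unsafe fn add_into(dst: *mut i32, src: *const i32) {
    // SAFETY: the caller upholds the contract documented above.
    unsafe {
        let value = *src; // read first, so `dst == src` is fine
        *dst = (*dst).wrapping_add(value);
    }
}
```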

C++ bindings for Rust structs

A Rust struct is mapped to a C++ class/struct with the same fields. If any field cannot be represented in C++, the struct itself will still have bindings, but the relevant field will be private.

To receive C++ bindings, the struct must be movable in C++. See Movable Types.

Example

Given the following Rust module:

cs/file:examples/rust/struct/example.rs class:Struct

Crubit will generate the following bindings:

cs/file:examples/rust/struct/example_generated.h class:CRUBIT_INTERNAL_RUST_TYPE|Struct

Fields

The fields on the C++ class are the corresponding Rust types:

  • If the Rust field has primitive type, then the C++ field uses the corresponding C++ type.
  • Similarly, if the Rust field has pointer type, then the C++ field has the corresponding C++ pointer type.
  • If the field has a user-defined type, such as a struct or enum, then the C++ field uses the bindings for that type.

Unsupported fields

Fields that do not receive bindings are made private, and replaced with an opaque blob of maybe-uninitialized bytes, as well as a comment in the generated source code explaining why the field could not receive bindings. For example, since String is not supported, the space of the object occupied by a String field will instead be this opaque blob of bytes:

// Rust: `my_field` is some unsupported type, such as `String`
pub struct MyStruct {
    pub my_field: String,
}

// C++: `my_field` becomes `private`, and its type is replaced by bytes.
struct MyStruct {
  private:
    unsigned char my_field[24];
};

Specifically, the following subobjects are hidden and replaced with opaque blobs:

  • Non-public fields (private or pub(...) fields).
  • Fields that implement Drop.
  • Fields whose type does not have bindings.
  • Fields that have an unrecognized or unsupported attribute.

C++ movable

To receive C++ bindings, the struct must be movable in C++. See Movable Types.

C++ bindings for Rust enums

A Rust enum is mapped to an opaque C++ type. C++ code cannot create a specific variant, but can call functions accepting or returning an enum.

To receive C++ bindings, the enum must be movable in C++. See Movable Types.

Example

Given the following Rust crate:

cs/file:examples/rust/enum/example.rs class:Color

Crubit will generate the following bindings:

cs/file:examples/rust/enum/example_generated.h class:CRUBIT_INTERNAL_RUST_TYPE|Color

Why isn't it a C++ enum?

A repr(i32) or fieldless repr(C) enum is very similar to a C++ enum. However, Rust enums are exhaustive: any value not explicitly listed in the enum declaration does not exist, and it is undefined behavior to attempt to create one.

C++ enums, in contrast, are "non-exhaustive": a C++ enum can have any value supported by the underlying type, even one not listed in the enumerators. For example, if the above example were a C++ enum, static_cast<Color>(42) would be a valid instance of Color, even though none of Red, Blue, or Green has that value.

In order to prevent invalid Rust values from being produced by C++, a C++ enum cannot be used to represent a Rust enum. Instead, the C++ bindings are a struct, even for fieldless enums.

C++ movable

To receive C++ bindings, the enum must be movable in C++. See Movable Types.

Generating C++ enums from Rust enums

By default, a Rust enum is mapped to an opaque C++ type (see C++ bindings for Rust enums). However, Crubit can try to map Rust enums to C++ enums if requested using the #[cpp_enum] attribute. C++ code can use such enums like any other C++ enum.

But #[cpp_enum] cannot be used with exhaustive Rust enums. It may only be used on non-exhaustive enums, such as those created with #[open_enum] from the open_enum crate. Therefore, to generate C++ enum bindings, you must annotate your Rust enum with #[cpp_enum], #[repr(...)] (where ... is an integer type like i32), and #[open_enum].

C++ enums are non-exhaustive by default, meaning they can hold values other than the explicitly named enumerators. #[open_enum] generates a Rust enum that is similarly non-exhaustive. Additionally, C++ allows multiple enumerators to have the same value, which can be enabled in Rust by using #[open_enum(allow_alias)].

Example

Given the following Rust crate that uses #[cpp_enum] and #[open_enum(allow_alias)]:

cs/file:examples/rust/cpp_enum/example.rs class:Color

Crubit will generate the following bindings:

cs/file:examples/rust/cpp_enum/example_generated.h class:CRUBIT_INTERNAL_RUST_TYPE|Color

C++ bindings for Rust type aliases

A Rust type alias, such as pub type X = ...;, is mapped to the equivalent C++ type alias, such as using X = ...;.

Limitations:

  • The type must be a supported type.
  • The alias must not be generic: aliases with generic parameters, such as pub type X<T> = ..., are not supported.

Example

Given the following Rust crate:

cs/file:examples/rust/type_alias/example.rs content:\bpub\ type\b

Crubit will generate the following bindings:

cs/file:examples/rust/type_alias/example_generated.h content:\busing\b

C++ bindings for Rust use declarations

Crubit supports use declarations for functions and types, mapping them to equivalent using declarations in C++.

Limitations:

  • The use declaration must refer to a function or type.
    • If it refers to a function, it must not rename the function.
  • The use declaration must import exactly one entity per name. For example, pub use m::x; is supported if x refers to a function, or to a type, but not if it refers to both a function and a type.
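Here is a sketch of which use declarations fit these rules. The modules `shapes` and `ambiguous` and everything in them are hypothetical. The last `pub use` is legal Rust, because a braced struct lives in the type namespace and a function in the value namespace. However, a single name then refers to two entities, which the rule above excludes.

```rust
pub mod shapes {
    pub struct Circle {
        pub radius: f64,
    }

    pub fn area(c: &Circle) -> f64 {
        std::f64::consts::PI * c.radius * c.radius
    }
}

// Supported: each name refers to exactly one entity, and nothing is renamed.
pub use shapes::area; // a function
pub use shapes::Circle; // a type

// Not supported: renames a function.
// pub use shapes::area as circle_area;

pub mod ambiguous {
    pub struct Widget {
        pub id: u32,
    }

    // Legal Rust: the function lives in the value namespace and the braced
    // struct in the type namespace, so the two `Widget`s do not collide.
    #[allow(non_snake_case)]
    pub fn Widget(id: u32) -> Widget {
        Widget { id }
    }
}

// Not supported: this one `use` imports both the struct and the function.
pub use ambiguous::Widget;
```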

Example

Given the following Rust crate:

cs/file:examples/rust/use_declaration/example.rs content:\bpub\ use\b

Crubit will generate the following bindings:

cs/file:examples/rust/use_declaration/example_generated.h content:\busing\b

Movable types

Crubit requires types to be "movable" to be passed by value: if a Rust type does not logically support a C++ move operation, then it can receive bindings, but it cannot be passed by value.

A Rust type can be made movable in C++ in one of three ways:

  1. Copyable: the Rust type implements Clone.
  2. Trivially move-constructible and destructible: the Rust type does not have a destructor. (It does not implement Drop, and nor do any of its fields.)
  3. Non-trivially move-constructible: the Rust type has a destructor, but implements Default.

The easiest way to ensure your type is useful to end users, even if it is changed in the future, is to implement Clone and Default. This makes the type default-constructible and copyable[1], as well as efficiently movable.
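The three routes can be illustrated with a few hypothetical Rust types. The comments mark which route, if any, each one takes under the rules above. Deriving is just one way to obtain Clone and Default; hand-written impls satisfy the same rules.

```rust
// (1) and (2): implements Clone (here also Copy), and has no destructor.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// (3): has a destructor (its Vec field implements Drop), but implements Default.
#[derive(Default)]
pub struct Buffer {
    pub bytes: Vec<u8>,
}

// Recommended: Clone + Default. Copyable and default-constructible, and it
// stays movable even though String has a destructor.
#[derive(Clone, Default, Debug, PartialEq)]
pub struct Named {
    pub name: String,
}

// None of the three: owns a Vec, and has neither Clone nor Default. Per the
// text above, it can still receive bindings but cannot be passed by value.
pub struct Handle {
    pub bytes: Vec<u8>,
}
```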

Copyable

If the Rust type implements Clone, then the C++ type will be copyable:

  • Copy construction has the same behavior as Clone::clone.
  • Copy assignment has the same behavior as Clone::clone_from.

Because the type is copyable, it is also movable, at worst by a copy operation.

Trivially move-constructible and destructible

If no logic occurs during destruction, because the type doesn't implement Drop, and none of its fields do, then the C++ type will be trivially-movable and trivially-destructible:

  • Move construction and assignment copy the bytes of the object, with the same behavior as a Rust move operation.

NOTE: All Copy types are guaranteed to be trivially move-constructible and destructible.

If the Rust type is Copy, then the moved-from object is guaranteed to hold its old value, and be valid for all operations.

Otherwise, the object is only valid for assignment and destruction, and the behavior of performing any other operation is undefined.

Non-trivially move-constructible

If the Rust type is not trivially movable and destructible, but implements Default, then the resulting C++ type will be (non-trivially) move constructible:

  • Move construction has the same behavior as std::mem::take(): it copies the bytes to the new object, as if by a Rust move, and replaces the moved-from object with Default::default().
  • Move assignment copies the bytes to the new object, as if by a Rust move, and replaces the moved-from object with an unspecified but valid object.
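In Rust terms, the move constructor described above behaves like the following hypothetical helper. It is written for a type `Message` that owns a String (so it has a destructor) and implements Default. It models the semantics only and is not the generated glue code.

```rust
use std::mem;

#[derive(Default, Debug, PartialEq)]
pub struct Message {
    pub text: String,
}

/// Rust model of the C++ move constructor: the value moves to the new
/// object, and the source is reset to Default::default().
pub fn move_construct(source: &mut Message) -> Message {
    mem::take(source)
}
```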

Why is this required?

In general, Crubit needs to be able to move objects as part of the implementation of pass-by-value, even in C++17, due to platform ABI restrictions. Even without this requirement, types are not very useful in C++ if they are not movable.

Unlike Rust, C++ has no "destructive move". There is no way to change an object's location in memory, only to create a new object with the same value, and leave behind something in the old (still valid) object. Sometimes, what's left behind is an identical copy of the object state: this is a copy operation, implemented by the C++ copy constructor or copy assignment operator. But sometimes, copying is expensive, and instead what we might leave behind is some kind of junk value. It still must be a valid object (at least so that its destructor and assignment operator can be invoked), but it might represent some invalid or moved-from state.

For example, to "move" a unique_ptr (the C++ equivalent of Box) from one variable to another, you copy the bytes, and then replace the old location with a special null value representing an unoccupied / moved-from unique_ptr. This is why unique_ptr must be nullable in the C++ type system: otherwise, it could not be moved!
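The same non-destructive move can be modeled in Rust by treating `Option<Box<T>>` as a nullable `unique_ptr<T>`: moving out leaves `None` behind, playing the role of the null, moved-from unique_ptr. The helper name is hypothetical, and this is an analogy rather than how Crubit represents unique_ptr.

```rust
/// Hypothetical model of a C++ unique_ptr move: copies the pointer out and
/// writes `None` (the "null" moved-from state) into the source.
pub fn cxx_style_move<T>(source: &mut Option<Box<T>>) -> Option<Box<T>> {
    source.take()
}
```

Rust guarantees that `Option<Box<T>>` is the same size as `Box<T>`, with `None` represented as a null pointer, so the model matches unique_ptr's layout as well as its behavior.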

1

The combination of default-constructible and copyable is so important for making types useful in C++ that it even has a name: "semiregular".

High-level design of C++/Rust interop

This document describes the high-level design choices of Crubit, a C++/Rust Bidirectional Interop Tool.


C++/Rust interop goal

The primary goal of Crubit is to enable Rust to be used side-by-side with C++ in large existing codebases.

In the short term we would like to focus on codebases that roughly follow the Google C++ style guide to improve the interop fidelity. Other, more diverse codebases are possible prospective users in the long term, and their needs will be addressed by customization and extension points.

C++/Rust interop requirements

In support of the interop goal, we identify the following requirements:

  1. Enable using existing C++ libraries from Rust with high fidelity
    • High fidelity means that interop will make C++ APIs available in Rust, even when those API projections would not be idiomatic, ergonomic, or safe in Rust, to facilitate a cheap, small-step, incremental migration workflow. Based on the experience of other cross-language interoperability systems and language migrations (for example, Objective-C/Swift, Java/Kotlin, JavaScript/TypeScript), we believe that working in a mixed C++/Rust codebase would be significantly harder if some C++ APIs were not available in Rust.
    • Interop will bridge C++ constructs to Rust constructs only when the semantics match closely. Bridging large semantic gaps creates a risk of making C++ APIs unusable in Rust, as well as a risk of creating performance problems. For example, interop will not bridge destructive Rust moves and non-destructive C++ moves; instead it will make C++ move constructors and move assignment operators available to use in Rust code. As another example, interop will not bridge C++ templates and Rust generics by default.
    • Interop should be performant, as close to having no runtime cost as possible. The performance costs of the interop should be documented, and where possible, intuitive to the user.
    • Interop should be ergonomic and safe, as long as ergonomic and safety accommodations do not hurt performance or fidelity. Where a tradeoff is possible, the interop will choose performance and fidelity over ergonomics; the user will be allowed to override this choice.
    • Enable owners of the C++ API to control their Rust API projection, for example, with attributes in C++ headers and by extending generated bindings with a manually implemented overlay. Such an overlay will wrap or extend generated bindings to improve ergonomics and safety.
  2. Enable using Rust libraries from C++
    • However, using C++ libraries from Rust has a higher priority than using Rust libraries from C++.
  3. Impose little to no barrier to entry
    • Ideally, no boilerplate code needs to be written in order to start using a C++ library from Rust. Adding some extra information can make the generated bindings more ergonomic to use.
    • The amount of duplicated API information is minimized.
    • Future evolution of C++ APIs should be minimally hindered by the presence of Rust users.

Proposal and high-level design

We propose to develop our own C++/Rust interop tooling, because no existing tool satisfies all of our requirements. Modifying an existing tool to fulfill these requirements would take more effort than building a new tool from scratch, or might require forking its codebase, given that some existing tools have goals that conflict with ours.

See the "alternatives considered" section for a discussion of existing tools.

Source of information about C++ API

Interop tooling will read C++ headers, as they contain the information needed to generate Rust API projections and the necessary glue code. Interop tooling that is used during builds will not read C++ source files, to maintain the principle that C++ API information is only located in headers, and that a C++ library can't break the build of its dependencies by changing source files.

Some interop-adjacent tools (e.g., large-scale refactoring tools that seed the initial set of lifetime annotations) will also read C++ sources. These tools will not be used during builds.

Pros

  • Minimal barrier to entry: minimal amount of manual work is required to start using a C++ library from Rust.
    • Encourages leaf projects to start incrementally adopting Rust in new code, or incrementally rewriting C++ targets in Rust.
  • C++ API information is located only in headers, regardless of the language that the API consumer is written in (C++ or Rust).
  • Interop tooling that generates Rust API projections from a C++ header can get exactly the same information that the C++ compiler has when processing a translation unit that uses one of the APIs declared within that header.
    • Interop tooling can generate the most performant calls to C++ APIs, without C++-side thunks that translate the C++ ABI into a C ABI.
    • Interop tooling can autodetect implementation details that are critical for interop but are not a part of the API surface (for example, the size and alignment of C++ classes that have private data members).
    • In alternative solutions, users need to repeat these implementation details in sidecar files. Interop can verify that the specified information is correct through static assertions in generated C++ code, but the overall user experience is inferior.

Cons

  • Having to read C++ headers makes interop tooling more complex.
  • The Rust projection of the C++ API is only visible in machine-generated files.
    • These are not trivially accessible.
    • There is a limit on how readable these files can be made.
    • We can mitigate these issues by building tooling that shows the Rust view of a C++ header (for example in Code Search, or in editors as an alternative go-to-definition target).

Customizability

Interop tooling will be sufficiently customizable to accommodate the unique needs of different C++ libraries in the codebase. Interop should be customizable enough to accommodate existing codebases. C++ API owners can:

  • Guide how interop tooling generates Rust API projections from C++ headers. For example, headers can provide:
    • Custom Rust names for C++ function overloads (instead of applying the general interop strategy for function overloads),
    • Custom Rust names for overloaded C++ operators,
    • Custom Rust lifetimes for pointers and references mentioned in the C++ API,
    • Nullability information for pointers in the C++ API,
    • Assertions (verified at compile time) and promises (not verified by tooling) that certain C++ types are Rust-movable.
  • Provide custom logic to bridge types, for example, mapping C++ absl::StatusOr to Rust Result.
  • Provide API overlays that improve the automatically generated Rust API.
    • For example, the overlays could inject additional methods into automatically generated Rust types or hide some of the generated methods.

More intrusive customization techniques will be useful for template and macro-heavy libraries where the baseline import rules just won't work. We believe customizability will be an essential enabler for providing high-fidelity interop.

Source of additional information that customizes C++ API projection into Rust

Where C++ headers don't already provide all information necessary for interop tooling to generate a Rust API projection, we will add such information to C++ headers whenever possible. If it is not desirable to edit a certain C++ header, extra information can be stored in a sidecar file.

Examples of additional information that interop tooling will need:

  • Nullability annotations. C++ APIs often expose pointers that are documented or assumed by convention to be never null, but can't be refactored to references due to language limitations (for example, std::vector<MyProtobuf *>). If C++ headers don't provide nullability information for pointers in a machine-readable form, interop tooling has to conservatively mark all C++ pointers as nullable in the Rust API projection. The Rust compiler will then force users to write unnecessary (and untestable) null checks.
  • Lifetimes of references and pointers in C++ headers are not described in a machine-readable way (and sometimes are not even documented in prose). Lifetime information is essential to generate safe and idiomatic Rust APIs from C++ headers.
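The cost of missing nullability information can be sketched on the Rust side. The sketch models a possibly-null pointer as `Option<&T>` and a known non-null one as `&T`. This is an analogy chosen for illustration: `MyProtobuf` and both functions are hypothetical, and Crubit's actual projection of C++ pointers may differ.

```rust
// Hypothetical stand-in for a C++ message type.
pub struct MyProtobuf {
    pub name: String,
}

// Nullability unknown: the caller is forced to handle `None`, even if the
// C++ API never actually passes null.
pub fn name_len_nullable(msg: Option<&MyProtobuf>) -> usize {
    match msg {
        Some(m) => m.name.len(),
        None => 0, // unnecessary and untestable if null never occurs
    }
}

// Header annotated as non-null: no check is needed.
pub fn name_len_nonnull(msg: &MyProtobuf) -> usize {
    msg.name.len()
}
```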

Additional information is stored in C++ headers

Pros

  • Additional information needed for C++/Rust interop will be expressed as annotations on existing syntactic elements in C++.
    • The annotations are located in the most logical place.
    • The annotations are more likely to be noticed and updated by C++ API owners.
    • API owners retain full control over how the API looks in Rust.
  • C++ users may find lifetime and nullability annotations useful. For example, information about lifetimes is highly important to C++ and Rust users alike.
  • C++ API definitions are only written once, minimizing duplication and maintenance burden.

Cons

  • Annotations that benefit Rust users can bother C++ API owners who don't care about Rust. Especially at the beginning of integrating Rust into an existing codebase, C++ API owners can push back on adding annotations.
    • To encourage adoption of annotations, we can develop tooling for C++ that uses lifetime and nullability annotations to find bugs in C++ code.
    • The pushback is likely to be short-term: if Rust takes off in a C++ codebase, C++ library owners in that codebase will need to care about Rust users and how their API looks in Rust.
  • There may be headers that we cannot (or would not want to) change, for example, headers in third-party code, headers that are open-sourced, or headers whose first-party owners are not cooperating.

Additional information is stored in sidecar files

Additional information needed for C++/Rust interop can be stored in sidecar files, similar to Swift APINotes, CLIF, etc. If sidecar files get sufficiently broad adoption (for example, if annotating third-party code turns out to be important enough that optimizing C++/Rust interop ergonomics there would be worth it), it would make sense to write sidecar files in a Rust-like language, as that provides the most natural way to define Rust APIs.

Pros

  • Sidecar files enable broader adoption of annotations by providing additional interop information without modifying C++ headers. They allow us to annotate headers in third-party code, headers that can't adopt annotations for technical reasons, or headers whose first-party owners are not cooperating.

Cons

  • As in the “Use Rust code to customize API projection into Rust” alternative, part of the C++ API information is duplicated, which is a burden for the C++ API owners.
  • The projection of C++ APIs to Rust is defined in a new language.
    • C++ API owners and Rust users will have to learn this language.
    • If we expect wide adoption of sidecar files, we will need to create tooling to parse and edit this language, and to run large-scale changes (LSCs) against it.
  • Annotations in sidecar files are more prone to become out of sync with the C++ code. When making changes to C++ code, engineers are less likely to notice and update the annotations in sidecar files.
    • Presubmits can catch some cases of desynchronization between C++ headers and sidecar files. However, presubmit errors that remind engineers to edit more files create an inferior user experience.
  • Sidecar files create extra friction to modify the code. Where previously one had to edit only a C++ header and a C++ source file, now one also likely needs to update a sidecar file.
    • When engineers realize that they need to update a sidecar file, opening another file and finding the right place to update creates extra friction to modify code.
    • Once engineers understand the extra maintenance burden associated with sidecar files that tend to go out of sync with headers, they will be less likely to adopt annotations in the first place.

Glue code generation

C++/Rust interop tooling will generate executable glue code and type definitions in Rust and in C++ (not merely extern "C" function declarations) in order to achieve the following goals:

  • Enable instantiating C++ templates from Rust, and monomorphizing Rust generics from C++. Enable Rust types to participate in C++ inheritance hierarchies.
    • For example, imagine Rust code using an object of type std::vector<MyProtobuf>, while C++ code in the same program never instantiates this type. The Bazel rust_library target that mentions this type must therefore be responsible for instantiating this template and linking the resulting executable code into the final program. We propose that this instantiation happens in an automatically generated "glue" C++ translation unit that is a part of that rust_library.
  • Enable automatically wrapping C++ code to be more ergonomic in Rust. For example:
    • extern "C" functions in Rust are necessarily unsafe (it is a language rule). We would like the vast majority of C++ API projections into Rust to be safe. In the current Rust language, we can achieve that only by wrapping the unsafe extern "C" function in a safe function marked with #[inline(always)].
    • C++ API owners can provide rules for automatic type bridging, for example, mapping C++ absl::StatusOr to Rust Result. This conversion necessitates generation of a Rust wrapper function around a C++ entry point that takes advantage of such type bridging.
  • Provide stable locations (C++ modules, Rust crates) that "own" the types from the language point of view.
    • For example, when we project a C++ type into Rust, its Rust definition must be located in a Rust crate. Furthermore, all Rust users of this type must observe it as being defined in the same crate for all of them to consider it the same type: in Rust, types defined in different crates are unrelated types, even if their definitions are identical.
    • When we project a Rust type into C++, we could repeat its C++ definition any number of times (for example, in every C++ user of a Rust type). This is technically fine because C++ allows the same type to be defined multiple times within a program, as long as all definitions are identical (the one-definition rule). Nevertheless, such duplication is error-prone.
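To make the wrapping and type-bridging bullets concrete, here is a minimal Rust sketch of such glue. It assumes a hypothetical C++ function returning absl::StatusOr<int> that is exposed through a C-ABI entry point returning a plain status-code/value pair. The names raw_checked_div, RawStatusOrInt, Status, and checked_div are made up for illustration, and the stand-in entry point is defined in Rust only so that the sketch compiles without any C++; code actually generated by Crubit may look quite different.

```rust
use std::os::raw::c_int;

/// Hypothetical C-ABI shape of an `absl::StatusOr<int>` result:
/// a status code (0 meaning OK, as in `absl::StatusCode::kOk`) plus the value.
#[repr(C)]
pub struct RawStatusOrInt {
    pub status_code: c_int,
    pub value: c_int,
}

/// Stand-in for an imported C++ entry point. A real binding would be declared
/// in an `extern "C"` block and implemented in C++; it is defined in Rust here
/// so the sketch is self-contained. Like a foreign import, it is `unsafe` to call.
unsafe extern "C" fn raw_checked_div(lhs: c_int, rhs: c_int) -> RawStatusOrInt {
    match lhs.checked_div(rhs) {
        Some(value) => RawStatusOrInt { status_code: 0, value },
        // 3 is `absl::StatusCode::kInvalidArgument`.
        None => RawStatusOrInt { status_code: 3, value: 0 },
    }
}

/// Hypothetical Rust-side error type for the bridged `absl::Status`.
#[derive(Debug, PartialEq)]
pub struct Status {
    pub code: c_int,
}

/// Generated safe wrapper (sketch): hides the `unsafe` call and bridges the
/// status-or result to `Result`.
#[inline(always)]
pub fn checked_div(lhs: c_int, rhs: c_int) -> Result<c_int, Status> {
    // SAFETY: this stand-in callee has no preconditions on its arguments.
    let raw = unsafe { raw_checked_div(lhs, rhs) };
    if raw.status_code == 0 {
        Ok(raw.value)
    } else {
        Err(Status { code: raw.status_code })
    }
}
```

Rust callers only see the safe checked_div (for example, checked_div(7, 2) yields Ok(3) and checked_div(1, 0) yields an Err carrying code 3). The unsafe call and the status check live in the wrapper, which #[inline(always)] is meant to keep free of call overhead of its own.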

Glue code is generated as C++ and Rust source code

Interop tooling will generate glue code as C++ and Rust source files, which are then compiled with an unmodified compiler for that language. The alternative is to generate LLVM IR or object files with machine code directly from interop tooling.

Pros

  • It is easy to inject customizations provided by API owners into generated source code.
    • The customizations will be written in the target language, making it (hopefully) intuitive to write them.
  • Generated source code can be easily inspected by compiler engineers while debugging interop problems and compiler bugs.
  • Generated source code can be inspected and understood by interop users, who are not compiler experts.
    • LLVM IR wouldn't be meaningful to them.
  • Generated source code is processed by the regular toolchain like any other code in the project.
    • It automatically benefits from all performance optimizations and sanitizers that are newly implemented in Clang and Rust compilers.
  • We avoid adding a new tool that generates unique LLVM IR patterns.
    • We avoid making the job of the C++ toolchain maintainers harder.

Cons

  • Interop tooling will be limited to generating LLVM IR and machine code that Clang and Rust compilers can generate.

Glue code and API projections will assume implementation details of the target execution environment

To provide the most ergonomic and performant interop, C++/Rust interop tooling will allow the target codebase to opt into assuming various implementation details of the target execution environment. For example:

  • When calling C++ from Rust, interop tooling can either wrap C++ functions in thunks with a C calling convention, or call C++ entry points directly. Thunks cause code bloat and can collectively add up to become a performance problem, so it is desirable to call C++ entry points from Rust directly. Interop tooling can do that only if it may assume a specific target platform and C++ ABI.

Implementation details of the target execution environment that are considered stable enough will be reflected in API projections, for example:

  • The C++ standard does not specify the sizes of integer types (short, int, long, etc.). To map them to Rust, interop tooling will need to assume the sizes they have on the platforms that the codebase targets in practice. The alternative would be to create target-agnostic integer types (for example, Int in Swift is a strong typedef for Int32 on 32-bit targets and for Int64 on 64-bit targets), but this makes it harder to provide idiomatic, transparent, high-performance interop.
  • The C++ standard does not specify whether standard library types like std::vector are in any sense Rust-movable; it is an implementation detail. Universal interop tooling would have to conservatively assume non-Rust-movable types. Interop tooling specific to certain environments can rely on libc++ providing a Rust-movable std::vector and project it into Rust in a much more ergonomic way.
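As a sketch of how generated Rust bindings could pin down the integer-size assumption, suppose the projection maps C++ short and int to i16 and i32 (the alias names CxxShort and CxxInt are hypothetical). The compile-time assertions make the build fail on a target where the assumption does not hold, instead of silently producing bindings with the wrong widths.

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_short};

// Projection choice assumed by this sketch: C++ `short` -> i16, C++ `int` -> i32.
// C++ `long` is left out on purpose: it is 8 bytes on LP64 targets such as
// x86-64 Linux, but 4 bytes on 64-bit Windows.
pub type CxxShort = i16;
pub type CxxInt = i32;

// Evaluated at compile time: on a target that violates the assumption,
// compilation fails here.
const _: () = assert!(size_of::<c_short>() == size_of::<CxxShort>());
const _: () = assert!(size_of::<c_int>() == size_of::<CxxInt>());
```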

Pros

  • Interop tooling will generate the most performant code sequences to call foreign language functions.
    • If interop tooling generates portable code, it would have some overhead. The overhead can be eliminated by C++ and Rust optimizers at least in some cases, but at the cost of increased build times. For example, eliminating thunks would require turning on LTO, which is not fast, and usually only used for release builds. It is much preferable to not generate thunks in the first place, if the target platform does not need them.
  • Ergonomics of API projections will be improved.
    • For example, whether a C++ type is Rust-movable or not is an implementation detail in C++, transparent to C++ users of that type, but it makes a huge ergonomic difference in the Rust API projection.

Cons

  • C++ code will have additional evolution constraints.
    • For example, changing a type from Rust-movable to non-Rust-movable is a non-API-breaking change for C++ users, but it would break Rust users.
  • It would be more difficult to switch internal environments to a different C++ standard library.
  • Code that is deployed in environments that have incompatible implementation details won't be able to use this C++/Rust interop system.
    • Alternatively, these executables would have to bring a suitable execution environment with them (e.g., a copy of libc++).

Interop tooling should be maintainable and evolvable for a long time

We should design and implement C++/Rust interop tooling in such a way that we can maintain and evolve it for more than a decade. If Rust becomes tightly integrated into an existing C++ project, specific requirements for interop and API projection rules will keep changing. The more Rust adoption grows, the more library- and team-specific interop customizations we will have to support, and the more sense it will make for the performance team to tweak generated code to implement sweeping optimizations. These kinds of changes should be readily possible, and they should not create conflicts of interest between different users of the interop tooling.

Interop tooling should facilitate C++ to Rust migration

C++/Rust interop tooling should try to create a favorable environment for migrating C++ code to Rust. Specifically, projections of C++ APIs into Rust should be implementable in Rust. This way, a C++ library can be converted from C++ into Rust transparently for its users, as its public API won't change.

Alternatives Considered: Design decisions

Repeat C++ API completely in a separate IDL

Instead of reading C++ headers in the interop tooling, we would require the user to repeat the C++ API in some other form, for example, in a Rust-based IDL like in the cxx crate, or in sidecar files in a completely new format.

Pros

  • Interop tooling can be simpler if it does not have to read C++ headers. But even under this alternative approach, tooling might want to read C++ headers, nullifying this advantage. For example, tooling might want to automatically generate an initial Rust snippet or to suggest in presubmits to adjust the Rust code that mirrors a C++ API when that C++ API changes.
  • The most natural way to define Rust APIs is by using Rust code or Rust-like syntax in sidecar files.
  • Available Rust APIs are defined in easily accessible checked-in files.
  • API definitions written by a human might have higher quality, on average.

Cons

  • A big part of the C++ API needs to be duplicated to reliably match the Rust code with the C++ declarations. The initial code can be generated by tooling, but it has to be kept in sync. This is a burden for the C++ API owners, potentially a bigger one than allowing annotations in C++ headers.
    • There is a risk that C++ API owners might refuse to own IDL files.
  • The need to create a sidecar file creates a barrier to start using C++ libraries from Rust.
    • While the duplication overhead is justifiable for widely-used libraries, it is relatively high for libraries with few users and binaries, making it less likely that leaf teams will start adopting Rust.
  • When the C++ API is changed, the Rust definitions become out-of-sync with it. Tooling needs to detect this, and the Rust definitions need to be changed (either manually or tool-assisted).
  • There is no effective way to verify Rust binding code at the presubmit time of a C++ library other than building downstream projects.
  • Mapping Rust API definitions to the original C++ API definitions is more complicated and error-prone. For example, how would we target a specific overload of a function or constructor?
  • There is a risk that individual teams will build team-specific tooling that generates IDL files from C++ headers or generates both IDL files and C++ headers from a single source. These solutions are unlikely to scale to existing large codebases and will likely only work for that specific team.

Use Rust code to customize API projection into Rust

An alternative to storing additional information in C++ headers is to put it into Rust code. For example, the cxx crate requires users to re-state the C++ API in Rust syntax, adding information about lifetimes and nullability. The pros and cons of this choice are the same as when defining a special IDL that repeats the C++ API completely (see above).

Generate glue code in binary formats

Instead of generating glue code as textual sources, interop tooling could use Clang and LLVM APIs to emit object files with C++ glue code and use Rust compiler APIs to generate rmeta and rlib files with Rust glue code.

Pros

  • More flexibility in the code that can be generated. Controlling LLVM IR generation allows interop tooling to generate code that an unmodified compiler can't generate from textual source code. For example, the Rust language does not have any constructs that map to linkonce_odr functions in LLVM IR; if the interop tooling embedded the Rust compiler as a library and had more control over how it generates the IR, we could make that happen.

Cons

  • Injecting customizations provided by API owners is harder.
  • LLVM, Clang, and Rust compiler APIs are not stable. The format of Rust metadata files is not stable either. The larger the API subset we consume from Clang and Rust, the more difficult it becomes to maintain the tooling.
  • To generate object files, the interop tooling has to ensure that its Clang/LLVM version and configuration are identical to those of the Clang compiler used to build other C++ code.
    • We can solve this problem, but it makes the system more fragile, compared to using existing C++ and Rust compilers to compile generated sources.
  • From time to time LLVM introduces bugs that cause miscompilations. If interop tooling embeds LLVM, we would be adding another tool that toolchain engineers will need to look into when debugging a miscompilation. We would be making the job of C++ toolchain maintainers harder.

Alternatives Considered: Existing tools

bindgen

bindgen automatically generates Rust bindings from C and C++ headers, which it consumes using libclang. The generated bindings are pure Rust code that interfaces with C and C++ using Rust’s built-in FFI for C (#[repr(C)] to indicate that a struct should use C memory layout and extern "C" to indicate that a function should use a C calling convention). C++ functions are handled by generating a Rust extern "C" function that has the same ABI as the C++ function and attaching a link_name attribute with the mangled name.

See here for an in-depth description of the use of bindgen in Stylo, a Rust component in Firefox.

Pros

  • The oldest and the most mature of the existing C++ interop tools (developed since Feb 2012).

Cons

  • Deficiencies in safety and ergonomics, for example:
    • References are imported as pointers. No lifetimes, no null-safety.
    • Constructors and destructors are not called automatically.
    • Overloads are distinguished by a numbered suffix in Rust. These numbers clutter the source code and are hard to remember, as they have no meaning. Adding overloads can change the numbering and hence break Rust callers.
  • It is impossible to use C++ inline functions and templates from Rust because of bindgen’s architecture (see footnote 1). The architecture is unlikely to change, and therefore, this is a dealbreaker.

Evaluation

bindgen could be used in a project that has very limited C++ interop needs. However, creating safe and ergonomic wrappers for the generated bindings would require additional effort. Our vision and goals for C++ interop are very different from what bindgen provides.

cbindgen

cbindgen automatically generates C or C++ headers for Rust libraries which expose a public C API.

Pros

Cons

  • Shallow understanding of Rust's modules and types.

    • cbindgen's docs point out that "A major limitation of cbindgen is that it does not understand Rust's module system or namespacing. This means that if cbindgen sees that it needs the definition for MyType and there exists two things in your project with the type name MyType, it won't know what to do. Currently, cbindgen's behaviour is unspecified if this happens."
    • This limitation seems to be mostly caused by building cbindgen on top of the syn crate. syn is able to parse Rust source code into an AST, but there is no facility at the syn level for type deduction or module traversal. Building such functionality would require replicating parts of the rustc compiler in cbindgen, or alternatively rewriting cbindgen on top of the rustc_driver crate.
  • Supports only extern "C" functions.

    • Supporting Rust functions that use the default calling convention would require generating not only C/C++ headers, but also generating Rust source with extern "C" thunks that trampoline into the original function (requiring that cbindgen starts generating Rust sources).
  • Supports only #[repr(C)] structs.

    • Default memory layout of Rust structs is unspecified and therefore cannot be determined by code examination at the syn level.
    • Even if the memory layout could be determined, the layout can change in a future compiler version, or change depending on compilation command line flags. To prevent using stale layout information, the auto-generated FFI code should therefore include compile-time assertions that the layout didn't change since FFI generation time. The assertions should be present both in the generated C/C++ headers and on the Rust side (requiring that cbindgen start generating Rust sources). The assertions would effectively verify that the FFI generation is driven by the build system (i.e. by Bazel, or Cargo, or GN/ninja, rather than manually) and that the integration between the FFI tools and the build system doesn't have any bugs (e.g. that it faithfully replicates all relevant compilation flags).
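To make the point about extern "C" thunks concrete, a thunk that trampolines into a Rust function with the default calling convention could look roughly like this, using an is_greater function like the one at the start of this document. The name __thunk_is_greater is hypothetical, and this is a sketch of the general technique, not what cbindgen or Crubit actually emits.

```rust
/// Original Rust function, using the default Rust calling convention,
/// whose ABI is unspecified and therefore not callable from C/C++.
pub fn is_greater(lhs: i32, rhs: i32) -> bool {
    lhs > rhs
}

/// Hypothetical generated thunk: same parameters, but with the C calling
/// convention, so C/C++ code can call it. It simply forwards to the original.
/// A real generator would also give it a stable, unmangled symbol name and
/// declare it in the generated C++ header, e.g.
/// `bool is_greater(int32_t lhs, int32_t rhs);`.
pub extern "C" fn __thunk_is_greater(lhs: i32, rhs: i32) -> bool {
    is_greater(lhs, rhs)
}
```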

Evaluation

cbindgen could be used in a project that can create a narrow extern "C" / #[repr(C)] API and that is ready to manage the risk of incorrect name/module resolution. Wrapping additional Rust APIs would require extra effort.

Take-aways for Crubit design

Notes and observations about cbindgen can guide some design aspects of Crubit's cc_bindings_from_rs tool (which, like cbindgen, generates C++ bindings for Rust crates). Using internal compiler knowledge (e.g. memory layout of structs, name and type resolution) requires that cc_bindings_from_rs depend on rustc_driver and other internal crates of rustc. The API of these crates is unstable, which might increase the risk and maintenance cost of Crubit. Nevertheless, our experience with maintaining tools based on (also unstable) Clang APIs suggests that this extra risk and cost is likely going to be acceptable.

Build determinism requires that the Rust compiler produces the same output for the same set of inputs (the same compiler version, the same command-line flags, the same sources, etc.). This means that (despite conservative reservations about layout determinism) it should be okay to assume that cc_bindings_from_rs and rustc invocations will observe the same memory layout of structs, but this requires that cc_bindings_from_rs is built against exactly the same version of rustc_driver libraries as rustc. (This should also be reinforced by compile-time assertions in the generated FFI layer.)
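On the Rust side, such compile-time assertions might look like the following sketch. It assumes the generator recorded a size of 8 bytes and an alignment of 4 bytes for a hypothetical struct Pair with the default Rust layout (values typical for targets where u32 is 4-byte aligned). The generated C++ header would carry matching checks such as static_assert(sizeof(Pair) == 8) and static_assert(alignof(Pair) == 4).

```rust
use std::mem::{align_of, size_of};

/// A struct with Rust's default layout, which the language leaves unspecified.
pub struct Pair {
    pub small: u8,
    pub big: u32,
}

// Hypothetical generated checks: 8 and 4 stand for the values the generator
// observed when it queried rustc. If a different compiler version or different
// flags change the layout, the build fails instead of the C++ side silently
// using stale layout information.
const _: () = assert!(size_of::<Pair>() == 8);
const _: () = assert!(align_of::<Pair>() == 4);
```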

cxx

cxx generates Rust bindings for C++ APIs and vice versa from an interface definition language (IDL) included inline in Rust source code. cxx generates Rust and C++ source code from IDL definitions. To check that the IDL definitions match the actual C++ API, cxx inserts static assertions (see footnote 2) into the generated C++ code; it does not, however, read the C++ headers itself. cxx contains built-in bindings for various Rust and C++ standard library types that are not customizable.

As far as we understand, cxx has the following design constraints and goals:

  • Ship a stable product for its intended audience.
    • As a consequence, improvements such as integrating move semantics are not going to be accepted soon. We understand that cxx is not a vehicle for experimentation. cxx maintainers would prefer us to first show that our ideas work in a fork of cxx or in a different system, such as autocxx, and that our improvements pull their weight given the added complexity.
  • Remain simple and transparent. There is a limit on the amount of complexity that will be tolerated.
    • There is a chance that improvements such as modeling C++ move semantics or various attempts at eliminating thunks will never be accepted in upstream cxx.
  • Non-goal: Automatically provide high fidelity interop.
    • cxx is designed for the use case of an executable where C++ and Rust parts communicate through a narrow interface.
  • Non-goal: Automatically provide the most performant interop in as many cases as possible. For example:
    • cxx does not attempt to eliminate C++-side thunks. Instead, using LTO is recommended.
    • cxx considers it acceptable to allocate all objects of "opaque" types on the heap. Users who find these heap allocations unacceptable for performance reasons are expected to implement a different C++ entry point that does not hit this limitation and bind it to Rust instead of the original C++ API. Heap allocation is acceptable for many C++ classes in most environments, but the exceptions are important enough for us that this is a major restriction.

Pros

  • Mature and ergonomic enough today for mixing C++ and Rust in existing codebases with limited C++ interop needs.
  • We avoid being on a tech island.

Cons

  • cxx’s stability goal makes it hard to experiment with how the Rust API looks.
  • Our goals are unlikely to align well with the goals of the intended user audience of cxx. We would be pulling cxx in directions that make it a worse product for its current users.
  • Almost no customizability. Users who are not satisfied with what cxx does are expected to wrap the target C++ API in a different C++ API that is more friendly to cxx.
  • cxx tries to be compatible with most standard C++ implementations found in the real world, so it cannot take advantage of unique guarantees provided by the target execution environment.

Evaluation

cxx could be used in projects with limited C++/Rust interop requirements. However, we would not be able to implement many interop features that we consider essential (for example, move semantics, templates).

autocxx

autocxx automatically generates Rust bindings from C++ headers. As the name implies, it automatically generates IDL definitions for cxx, which then produces the actual bindings. In addition, autocxx generates its own Rust and C++ code to extend the Rust API beyond what cxx itself would provide, for example to support passing POD types by value. autocxx consumes C++ headers indirectly by first running bindgen on them and then parsing the Rust code output by bindgen.

autocxx’s design goals are similar to our own in this document.

We did a case study on using an existing project's C++ API from Rust using autocxx.

Pros

  • Low barrier to entry: Bindings are generated from C++ headers, no need to write duplicate API definitions.
  • Ergonomic mappings for many C++ constructs.
  • Open to contributions that change the generated Rust APIs or make architectural changes.

Cons

  • Relatively new and immature.
  • Cannot (yet) consume complex headers without errors. We’ve managed to import some actual Spanner headers, but there are still enough outstanding issues that we can’t yet do anything useful with Spanner.
  • Architecture can make modifications difficult. autocxx is built on top of two other tools, bindgen and cxx, and the interfaces between these components can make it harder to make a modification than it would be in a monolithic tool. Specifically:
    • autocxx uses bindgen to generate a description of the C++ API that it can parse easily (as opposed to trying to parse C++ headers either directly or using Clang APIs). Since bindgen was not intended for this purpose, its output lacks some information that autocxx needs, so autocxx has forked bindgen to adapt it to its needs. The forked version emits additional information about the C++ API in the form of attributes attached to various API elements.
    • bindgen in turn is built on the libclang API, which doesn’t surface all of the functionality available through Clang’s C++ API. Adding features to libclang requires additional effort and has a six-month lead time to appear in a stable release (and thus become eligible for use from bindgen).
    • When errors occur, it can be hard to figure out which of the components is responsible.
    • Adding features can require touching multiple components, which requires commits to multiple repositories.

Evaluation

We initially intended to use autocxx to prototype various interop ideas and potentially as a basis for a field trial. We still believe this would be feasible, but after trying to modify autocxx and its bindgen fork during an internal C++/Rust interop study, we feel that autocxx’s complex architecture is enough of an impediment that we could achieve our goals with less total effort by creating an interop tool from scratch that consists of a single codebase and uses the Clang C++ API to directly interface with Clang.

1

Doing so would require either generating C++ source code or interfacing deeply enough with Clang to generate object code for inline functions and template instantiation.

2

And tricks such as suitable type conversions that force the C++ compiler to perform appropriate checks at compile time.

Lifetime Annotations for C++

Summary: We propose a scheme for annotating lifetimes for references and pointers in C++.

Note: This is a living document that is intended to always reflect the most current semantics and syntax of the lifetime annotations.

Introduction

This document proposes an attribute-based annotation scheme for C++ that describes object lifetime contracts. Lifetime annotations serve the following goals:

  • They allow relatively cheap, scalable, local static analysis to find many common cases of heap-use-after-free and stack-use-after-return bugs.
  • They allow other static analysis algorithms to be less conservative in their modeling of the C++ object graph and potential mutations done to it.
  • They serve as documentation of an API’s lifetime contract, which is often not described in the prose documentation of the API.
  • They enable better C++/Rust and C++/Swift interoperability.

The annotation scheme is inspired by Rust lifetimes, but it is adapted to C++ so that it can be incrementally rolled out to existing C++ codebases. Furthermore, the annotations can be automatically added to an existing codebase by a tool that infers the annotations based on the current behavior of each function’s implementation.

While the annotation scheme can express a large subset of Rust’s lifetime semantics, we have omitted some constructs that we do not expect to be necessary for our purposes. For example, lifetime bounds (e.g. 'a: 'b or T: 'a) may be needed rarely enough that we can do without them, and higher-ranked trait bounds (e.g. where for<'a> F: Fn(&'a i32)) are supported only for function types, which is where they are usually needed.

We are aware of two existing schemes for annotating lifetimes and flagging lifetime violations in C++; we describe them in the sections “Alternative considered: [[clang::lifetimebound]]” and “Alternative considered: P1179 / -Wdangling-gsl” below. Both of these schemes have limitations that make them unsuitable for our purposes. We plan to enable our lifetime analysis to understand the existing annotations by translating them into our annotation syntax internally (where possible).

Proposal

Examples

To give a feel for how the annotations work in practice, we will first show some examples.

Here is a simple example:

const std::string& [[lifetime(a)]] smaller(
    const std::string& [[lifetime(a)]] s1,
    const std::string& [[lifetime(a)]] s2) {
  if (s1 < s2) {
    return s1;
  } else {
    return s2;
  }
}

The annotation states that both s1 and s2 may be referred to by the return value of the function. This implies that the lifetime of the return value is the shorter of the lifetimes of s1 and s2. In Rust, this example would be expressed as follows:

pub fn smaller<'a>(s1: &'a String, s2: &'a String) -> &'a String {
    if s1 < s2 { s1 } else { s2 }
}

Note how the syntax is broadly similar. The main difference is that, unlike in Rust, our proposal does not require lifetimes to be declared.
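To illustrate what “the shorter of the lifetimes” means in practice, the following sketch restates smaller in Rust and calls it with one long-lived and one short-lived argument; smaller_len is a hypothetical helper. The borrow checker accepts the code because the result is used only while both arguments are alive.

```rust
/// Restated from the Rust version of `smaller`: the result may borrow from either argument.
pub fn smaller<'a>(s1: &'a String, s2: &'a String) -> &'a String {
    if s1 < s2 { s1 } else { s2 }
}

/// Hypothetical helper showing how the shorter lifetime constrains the result.
pub fn smaller_len(long_lived: &String) -> usize {
    let short_lived = String::from("bbb");
    // `result` may point at `short_lived`, so it is usable only inside this scope.
    let result = smaller(long_lived, &short_lived);
    // Returning `result` itself would be rejected by the borrow checker,
    // because `short_lived` is dropped when the function returns.
    result.len()
}
```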

A lifetime annotation placed after a member function refers to the lifetime of the object the member function is called on:

struct string {
  // The returned pointer should not outlive *this.
  const char *[[lifetime(a)]] data() const [[lifetime(a)]];
};
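For comparison, a rough Rust counterpart ties the returned slice to &self in the same way that the trailing annotation ties the returned pointer to *this. MyString is a hypothetical stand-in type, and the explicit 'a is spelled out only for illustration; Rust's elision rules would infer the same lifetime.

```rust
/// Hypothetical stand-in for the C++ `string` above.
pub struct MyString {
    buf: String,
}

impl MyString {
    pub fn new(s: &str) -> Self {
        MyString { buf: s.to_owned() }
    }

    /// The returned slice borrows from `self`, so it cannot outlive the object,
    /// like `data() const [[lifetime(a)]]` in the C++ example.
    pub fn data<'a>(&'a self) -> &'a str {
        &self.buf
    }
}
```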

Similar to Rust, [[lifetime(static)]] is used to denote a static lifetime. A common pattern is for a class to have a static function returning a reference to some default value:

class Options final {
 public:
  // ...
  static const Options &[[lifetime(static)]] DefaultOptions();
  // ...
};
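A Rust analog of this pattern returns a reference to a static item, which is what gives the reference its 'static lifetime. The fields of Options below are hypothetical.

```rust
/// Hypothetical options type; the fields exist only for illustration.
pub struct Options {
    pub verbose: bool,
    pub max_retries: u32,
}

/// Lives for the whole program, so references to it are `'static`.
static DEFAULT_OPTIONS: Options = Options { verbose: false, max_retries: 3 };

impl Options {
    /// Analog of `static const Options &[[lifetime(static)]] DefaultOptions();`.
    pub fn default_options() -> &'static Options {
        &DEFAULT_OPTIONS
    }
}
```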

The attribute can be applied to references that appear inside a more complex type expression. For example:

const std::vector<const A *[[lifetime(static)]]> &[[lifetime(static)]]
get_static_as();

This expresses that both the reference to the vector and the pointers to the A objects contained in it have static lifetimes.

This roughly corresponds to the following in Rust (with the difference that, unlike C++ pointers, Rust references cannot be null):

fn get_static_as() -> &'static CxxVector<&'static A>;
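Since CxxVector comes from the cxx crate, the following self-contained sketch uses Vec instead and initializes the vector lazily with std::sync::OnceLock (available since Rust 1.70). As in the C++ declaration, both the returned reference and the element references are 'static; the struct A and its id field are hypothetical.

```rust
use std::sync::OnceLock;

/// Hypothetical element type.
pub struct A {
    pub id: u32,
}

static FIRST: A = A { id: 1 };
static SECOND: A = A { id: 2 };

/// Analog of `get_static_as()`, with `Vec` in place of `CxxVector`:
/// the vector is created once and then lives for the rest of the program.
pub fn get_static_as() -> &'static Vec<&'static A> {
    static ALL: OnceLock<Vec<&'static A>> = OnceLock::new();
    ALL.get_or_init(|| vec![&FIRST, &SECOND])
}
```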

Lifetimes

Lifetimes are associated with certain types that we call reference-like types. A reference-like type is one of the following:

  • A pointer (except pointers to functions and pointers to members)
  • A reference (except references to functions)
  • A user-defined type that has been annotated as having lifetime parameters. (We will explain user-defined reference-like types in detail in a later section.)

The reason that pointers to functions and references to functions do not have lifetimes is to be consistent with Rust, where fn types do not have lifetimes either. In C++, the function that a pointer or reference refers to almost always exists for the duration of the program execution. There are some exceptions, such as functions created by a JIT compiler or functions in plugins loaded and unloaded at runtime. Such functions may be destroyed before the program exits, but we consider them to be unusual enough that we don't support annotating their lifetimes.

Pointers to members don't have lifetimes because they aren't pointers in the narrower sense. A pointer to member doesn't refer to a specific object in memory; rather, it can be used to refer to a specific member of any object of a given type. In implementation terms, a pointer to data member is typically an offset within the object rather than an address.
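The small example below illustrates why a pointer to member has no object to attach a lifetime to. The same pointer px selects the x member of two unrelated objects. The Point type and the read_member() helper are purely illustrative.

```cpp
// A pointer to data member names a member, not an object: the same
// pointer can be applied to any object of the class.
struct Point {
  int x;
  int y;
};

int read_member(const Point &p, int Point::*member) {
  return p.*member;  // selects `member` within the object `p`
}
```

For example, with int Point::*px = &Point::x, read_member(a, px) and read_member(b, px) read a.x and b.x respectively.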

Lifetimes are annotated using the new attribute lifetime[1]. The attribute takes one or several lifetime names as arguments. Appendix A contains a formal description of the attribute syntax.

For brevity, lifetimes may be implicitly inferred in some situations; this is referred to as lifetime elision, and we describe the specific rules for this later.

There are two lifetime names with special meaning:

  • static: A lifetime that lasts for the duration of the program.
  • unsafe: A lifetime that cannot otherwise be represented correctly using lifetime annotations. We will discuss the semantics of an unsafe lifetime in more detail below.

In addition, there are two types of lifetimes that cannot be named in a lifetime attribute but that are implicitly associated with reference-like types in certain situations:

  • Local lifetime: The lifetime of a pointer to a variable with automatic storage duration.
  • Unknown lifetime: A lifetime that has not been annotated and cannot be implicitly inferred.
    The concept of unknown lifetimes is important because it allows us to migrate a codebase to lifetime annotations incrementally. Tools that verify lifetime correctness should assume that operations involving unknown lifetimes are lifetime-correct; this avoids generating large numbers of nuisance errors for code that has not been annotated yet. Note that this makes unknown lifetimes fundamentally different from unsafe lifetimes.

We call static, unsafe, local, and unknown lifetimes constant lifetimes. We call all other lifetimes variable lifetimes; this reflects the fact that they may be substituted by other lifetimes.

The lifetime attribute can be applied to reference-like types in function signatures, variable declarations (including member variable declarations), alias declarations, and to user-defined reference-like types when referring to static members of such types. The sections below give details on how the attribute can be applied to these constructs and what the semantics are in each case.

Note that, unlike in Rust, lifetimes are not part of the type. For the purposes of C++ semantics (e.g. function overloading), two types that differ only in their lifetime annotations are considered the same type. This is by design: We don’t want to change the semantics of existing code by adding lifetimes, and this is one of the reasons we have chosen to use C++ attributes; the C++ standard allows compilers to ignore attributes they don’t know, which implies that they have no effect on the C++ semantics.

Lifetime-correctness

The implementation of a function must be lifetime-correct. This section explains what that means.

Most expressions propagate lifetimes in ways that are straightforward. We will therefore explain lifetime-correctness rules only for those cases that are non-trivial.

Dereferencing a pointer or accessing the value referred to by a reference is lifetime-correct in exactly the following cases:

  • If its lifetime is static or a variable lifetime
  • If its lifetime is local and the access happens during the lifetime of the corresponding local variable.

Dereferencing a pointer with unknown lifetime or accessing the value referred to by a reference with unknown lifetime is not lifetime-correct, but tools should not emit lifetime verification errors in these cases.

operator new returns a pointer with unsafe lifetime. operator delete takes a pointer parameter that has unsafe lifetime.

Initializing or assigning an object of reference-like type with another object is always lifetime-correct if the lifetimes of the two objects are the same.

In addition, there are a number of cases where it is permissible to initialize or assign an object of reference-like type with another object that has different lifetimes. We call such an operation a lifetime conversion.

To define lifetime correctness of conversions, we first need to define what it means for one lifetime to outlive another:

  • Any lifetime outlives itself.
  • The static lifetime outlives any variable or local lifetime.
  • Any variable lifetime outlives any local lifetime.
  • A local lifetime local1 outlives another local lifetime local2 if the object associated with local1 outlives the object associated with local2 according to C++’s lifetime rules.
  • The unsafe lifetime does not outlive any lifetime except itself, and no other lifetime outlives the unsafe lifetime.
  • The unknown lifetime does not outlive any lifetime except itself, and no other lifetime outlives the unknown lifetime. However, tools should not emit lifetime verification errors for lifetime conversions involving unknown lifetimes.

Note that no variable lifetime a outlives any other variable lifetime b; our annotation scheme does not permit specifying lifetime bounds between lifetimes in the way that Rust does.
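The outlives relation is small enough to write down directly. The following is a toy model, not part of any tool. Lifetimes are plain values, and, as an illustrative assumption, each local lifetime is identified by its position in the destruction order of local objects, so a local destroyed later outlives one destroyed earlier. Unknown lifetimes outlive only themselves, as the rules require; a real tool would additionally suppress diagnostics for them.

```cpp
#include <string>

// The kinds of lifetime described above.
enum class Kind { Static, Unsafe, Local, Unknown, Variable };

struct Lifetime {
  Kind kind;
  std::string name;      // for Variable: the lifetime name, e.g. "a"
  int destroyed_at = 0;  // for Local (assumption): larger = destroyed later
};

bool same(const Lifetime &l, const Lifetime &r) {
  if (l.kind != r.kind) return false;
  switch (l.kind) {
    case Kind::Variable:
      return l.name == r.name;
    case Kind::Local:
      return l.destroyed_at == r.destroyed_at;
    default:
      return true;  // static, unsafe and unknown are single lifetimes
  }
}

// Returns true if `from` outlives `to` according to the rules above.
bool outlives(const Lifetime &from, const Lifetime &to) {
  if (same(from, to)) return true;  // any lifetime outlives itself
  // Unsafe and unknown lifetimes are related only to themselves.
  if (from.kind == Kind::Unsafe || to.kind == Kind::Unsafe ||
      from.kind == Kind::Unknown || to.kind == Kind::Unknown)
    return false;
  // Here `to` is a variable or local lifetime (or static, handled by same()).
  if (from.kind == Kind::Static) return true;
  if (to.kind == Kind::Static) return false;  // nothing else outlives static
  // Distinct variable lifetimes are unrelated; variables outlive locals.
  if (from.kind == Kind::Variable) return to.kind == Kind::Local;
  // `from` is local: it outlives only locals that are destroyed earlier.
  return to.kind == Kind::Local && from.destroyed_at > to.destroyed_at;
}
```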

Here are the rules for the correctness of lifetime conversions:

  • Lifetime-converting a pointer to non-const of type T_from *[[lifetime(l_from)]] to type T_to *[[lifetime(l_to)]] is lifetime-correct if and only if
    • l_from outlives l_to, and
    • any lifetimes in T_from and T_to are identical.
  • Lifetime-converting a pointer to const of type const T_from *[[lifetime(l_from)]] to type const T_to *[[lifetime(l_to)]] is lifetime-correct if and only if
    • l_from outlives l_to, and
    • converting T_from to T_to is lifetime-correct.
  • The rules for converting references are analogous to those for converting pointers.
  • An object of a class T with lifetime parameters may not be converted to an object of the same class T but with different lifetime parameters; see also the sections on variance and special member functions.
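The pointer rules can be turned into a short recursive check. The sketch below is an illustrative model, not a real tool API. A pointer type is a list of levels, outermost first. Each level records its lifetime name and whether its pointee is const-qualified, and the outlives relation is passed in as a predicate. The covariant case is taken to be the one where the pointee is const-qualified (a pointer to const), consistent with the variance rules later in this document.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One pointer level: its annotated lifetime and whether the type it
// points to is const-qualified. Level 0 is the outermost pointer.
struct PtrLevel {
  std::string lifetime;
  bool pointee_const;
};

using Outlives = std::function<bool(const std::string &, const std::string &)>;

// Is converting pointer type `from` to pointer type `to` (same shape,
// possibly different lifetimes) lifetime-correct?
bool convertible(const std::vector<PtrLevel> &from,
                 const std::vector<PtrLevel> &to, const Outlives &outlives,
                 std::size_t level = 0) {
  if (from.size() != to.size()) return false;  // different type shapes
  if (level == from.size()) return true;       // reached the base type
  if (from[level].pointee_const != to[level].pointee_const) return false;
  if (!outlives(from[level].lifetime, to[level].lifetime)) return false;
  if (from[level].pointee_const) {
    // Pointer to const: the pointee cannot be overwritten through this
    // pointer, so its own lifetimes may also be converted (covariance).
    return convertible(from, to, outlives, level + 1);
  }
  // Pointer to non-const: all lifetimes inside the pointee must match
  // exactly (invariance).
  for (std::size_t i = level + 1; i < from.size(); ++i) {
    if (from[i].lifetime != to[i].lifetime ||
        from[i].pointee_const != to[i].pointee_const)
      return false;
  }
  return true;
}
```

For instance, with an outlives predicate in which static outlives every variable lifetime, int *$static *$b does not convert to int *$a *$b, but int *$static const *$b does convert to int *$a const *$b.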

We will describe the lifetime-correctness rules for certain other constructs in the specific sections that deal with those constructs below.

lifetime_cast

To permit building safe abstractions on top of APIs that use unsafe lifetimes, we provide a way to cast unsafe lifetimes to safe lifetimes and vice versa using a function template called lifetime_cast[2]. A lifetime_cast is similar to C++ cast operations such as const_cast and static_cast but may only be used to change lifetimes.

Obviously, code that uses lifetime_cast must guarantee that the operation is actually lifetime-correct, i.e. that there is no risk of a use-after-free. Like unsafe code in Rust, uses of lifetime_cast should therefore be carefully reviewed and constrained to small parts of the codebase.

lifetime_cast is a function template defined suitably such that the call lifetime_cast<T>(e) evaluates to e and does not perform any copy or move operations. Tools will assume that the lifetimes of the result are those specified in the template argument for T. Apart from lifetime attributes, T must be the same as decltype(e).

A typical use case for lifetime_cast would be building a container such as std::vector on top of raw memory allocation primitives such as operator new. For example, one of the constructors for a vector might look like this:


template <class T>
vector<T>::vector(size_t size) [[lifetime(a)]]
    : size_(size),
      capacity_(size),
      data_(lifetime_cast<T *[[lifetime(a)]]>(new T[size])) {}
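To make the description more concrete, here is one possible definition of lifetime_cast, together with a small owning container that uses it in the way described later for std::vector::at(). This is a sketch under stated assumptions, not a reference implementation. The LIFETIME macro is a hypothetical no-op stand-in for the proposed attribute, MiniVector is illustrative, and the static_assert is looser than "T must be the same as decltype(e)" because it ignores references and top-level cv-qualifiers.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical no-op stand-in for the proposed [[lifetime(...)]] attribute.
#define LIFETIME(l)

// Returns its argument by reference, so no copy or move is performed.
// Lifetime attributes are not part of the C++ type, so the compiler can only
// check that T matches the argument's type; tools read the lifetimes in T.
template <typename T, typename U>
constexpr U &&lifetime_cast(U &&e) noexcept {
  static_assert(
      std::is_same<std::remove_cv_t<std::remove_reference_t<T>>,
                   std::remove_cv_t<std::remove_reference_t<U>>>::value,
      "lifetime_cast may only change lifetimes");
  return static_cast<U &&>(e);
}

// Illustrative owning container: the owning pointer has unsafe lifetime,
// and at() converts it to the lifetime of the container object.
template <class T>
class MiniVector {
 public:
  explicit MiniVector(std::size_t size) LIFETIME(a)
      : size_(size), data_(new T[size]()) {}  // operator new: unsafe lifetime
  ~MiniVector() { delete[] data_; }
  MiniVector(const MiniVector &) = delete;
  MiniVector &operator=(const MiniVector &) = delete;

  // No bounds check in this sketch.
  T &LIFETIME(a) at(std::size_t i) LIFETIME(a) {
    // Reviewed cast: the elements live exactly as long as *this.
    T *LIFETIME(a) elements = lifetime_cast<T *LIFETIME(a)>(data_);
    return elements[i];
  }
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
  T *LIFETIME(unsafe) data_;
};
```

As with std::forward, the reference returned for a temporary argument is only valid until the end of the enclosing full-expression, which is sufficient for initializing a member or variable from the result.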

Concise syntax using macros

Even with lifetime elision, there is a potential concern that the annotations will introduce excessive clutter. A lifetime in Rust typically requires only two characters, e.g. 'a. In contrast, the attribute proposed above, [[lifetime(a)]], requires at least 15 characters, or more if the attribute is scoped inside a namespace.

To reduce verbosity, we suggest providing a macro with a short name that expands to the actual lifetime attribute. The single-character macro name “$” is not widely used in most codebases[3]; a codebase maintainer would want to consider carefully what to use it for, but we think lifetimes could be a worthwhile use. In addition to a general $(lifetime) macro, we could also define lifetime macros $a through $z to allow an even more concise annotation. As an example, this is what the smaller() example from the beginning would look like with this concise syntax:

const std::string &$a smaller(
    const std::string &$a s1,
    const std::string &$a s2);

For a more extensive example, see appendix B, which shows what std::string_view would look like with these annotations.

Every codebase can of course define its own macro shortcuts that work within the context of the codebase. A more traditional and still concise macro name would be LT, with additional macros LT_A through LT_Z for concise single-letter lifetimes.

For brevity, in the examples that follow, we will use the $ convention.
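A codebase might define such shortcuts roughly as follows. This is a hedged sketch. It uses the LT spelling because $ in identifiers is a compiler extension rather than standard C++, and the HYPOTHETICAL_LIFETIME_ATTRS switch is an invented name that turns on the real attribute only where a compiler or tool understands it. The body of smaller() assumes it returns the lexicographically smaller argument.

```cpp
#include <string>

// Expand to the proposed attribute only when a compiler or tool supports
// it; otherwise expand to nothing so the code still compiles today.
#ifdef HYPOTHETICAL_LIFETIME_ATTRS
#define LT(l) [[lifetime(l)]]
#else
#define LT(l)
#endif
#define LT_A LT(a)
#define LT_B LT(b)  // ... and so on through LT_Z

// Assumption: smaller() returns the lexicographically smaller argument.
const std::string &LT_A smaller(const std::string &LT_A s1,
                                const std::string &LT_A s2) {
  return s2 < s1 ? s2 : s1;
}
```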

Pointers and References

As already noted, pointers and references can be annotated with a lifetime, which specifies the lifetime of the object the pointer or reference refers to (the pointee). The lifetime of the pointee must outlive the lifetime of the pointer or reference itself.

For example, let’s look at the example of a double pointer int * $a * $b. The annotation $b on the outer pointer specifies the lifetime of the inner pointer of type int *; the annotation $a on the inner pointer specifies the lifetime of the int. When these lifetime variables are substituted with constant lifetimes, the lifetime substituted for $a must outlive the lifetime substituted for $b. This ensures that the int lives for at least as long as the int * pointer that refers to it.

Functions

Lifetime attributes may be placed in the parameter types and return type of a function or function type. In addition, for non-static member functions, a lifetime attribute may be placed after the function declaration to describe the lifetime of the object the member function is called on, i.e. the lifetime of the implicit this parameter.

If a translation unit contains multiple declarations of the same function (including its definition), the lifetime attributes in all declarations must be the same.

As in Rust, a function is considered to be parameterized by the lifetimes that appear in its signature. To express this, a lifetime_param attribute containing the variable lifetime parameters may be placed in front of the function definition, like this:

[[lifetime_param(a)]]
int *$a ReturnPtr(int *$a p) {
  return p;
}

However, for brevity, this lifetime_param attribute may and should be left out in most cases. The exception to this is when the signature of the function contains a function type that itself contains lifetimes; in this case, a lifetime_param attribute must be added to disambiguate whether the lifetime should be considered a parameter of the function type or the function. For example:

// Lifetime $a is a parameter of the function type int*(int*).
void AddCallback(std::function<int *$a(int *$a) [[lifetime_param(a)]]> f);

// Lifetime $a is a parameter of the function AddCallback().
[[lifetime_param(a)]]
void AddCallback(std::function<int *$a(int *$a)> f, int *$a p);

Lifetime parameters on function types are analogous to higher-ranked trait bounds in Rust; unlike Rust, however, we only allow this concept in the context of function types, which is where it is typically required.

TODO: Show an example where we're passing a pointer to a local variable into the callback and discuss how this is allowed in the HRTB case but not the other case.

Lifetime-converting a function pointer f_from to a function pointer f_to of the same type but with different lifetimes is lifetime-correct if f_from has either the same lifetimes as f_to or lifetimes that are more permissive. This means that we must be able to substitute the lifetime parameters of f_from with lifetime parameters of f_to such that:

  • Every parameter of f_to is lifetime-convertible to the corresponding parameter of f_from. (Note the direction of the conversion, which is reversed from what one might initially expect. The idea is that f_from needs to be able to stand in for f_to, so we need to be able to convert the parameters of f_to to the parameters of f_from.[4])

  • The return type of f_from is lifetime-convertible to the return type of f_to.

Similarly, a virtual member function Derived::f that overrides a base class function Base::f must have either the same lifetimes or lifetimes that are more permissive. This means that we must be able to substitute the lifetime parameters of Derived::f with lifetime parameters of Base::f such that:

  • Every parameter of Base::f is lifetime-convertible to the corresponding parameter of Derived::f.
  • The return type of Derived::f is lifetime-convertible to the return type of Base::f.

A function call is lifetime-correct if the lifetime parameters of the callee can be substituted by lifetimes from the caller in such a way that converting all arguments to the respective parameter lifetimes is lifetime-correct. If no such substitution can be found, the function call is not lifetime-correct.

Here is an example that illustrates how this works:

void copy_ptr(int *$x from, int *$x *$y to) {
  *to = from;
}

int *$a return_ptr(int *$a p) {
  int* copy;
  copy_ptr(p, &copy);
  return copy;
}

First of all, the copy pointer is inferred to have lifetime $a because it is used in the return statement. Let’s use the name $local1 for the lifetime of the copy variable itself.

Now let’s look at the call to copy_ptr. If we make the substitutions $x = $a and $y = $local1, we see that the lifetimes of the arguments are identical to those of the parameters, so it is trivially correct to lifetime-convert them.

Assume now that return_ptr had been declared with different lifetimes for its parameter and return type:

int *$a return_ptr(int *$b p) {
  int* copy;
  copy_ptr(p, &copy);  // Error, not lifetime-correct.
  return copy;
}

Again, the copy pointer has lifetime $a. If we choose the substitution $x = $a, we can lifetime-convert the second argument but not the first argument (we need an int *$a but we have an int *$b). If we choose $x = $b, we can lifetime-convert the first argument but not the second argument (we need an int *$b * but we have an int *$a *).

Because there is no substitution we can make for $x that allows a lifetime-correct conversion of the arguments of copy_ptr to the respective parameter lifetimes, the call is not lifetime-correct.
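The search described above can be sketched as a brute-force check. This toy model is an illustration only, not a real tool. Lifetimes are plain strings, every pointee is non-const (so inner lifetimes must match exactly), a name starting with "local" denotes a local lifetime, and the candidate lifetimes available in the caller are listed by hand. It reproduces both outcomes discussed above.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// A pointer type as a list of lifetime names, outermost first.
using PtrType = std::vector<std::string>;
using Substitution = std::map<std::string, std::string>;

bool is_local(const std::string &l) { return l.rfind("local", 0) == 0; }

// Simplified outlives: identity, static outlives everything here, and any
// non-local lifetime outlives any local lifetime.
bool outlives(const std::string &from, const std::string &to) {
  return from == to || from == "static" || (!is_local(from) && is_local(to));
}

// Pointer to non-const: the outermost lifetime must outlive the target's,
// and all inner lifetimes must be identical.
bool convertible(const PtrType &from, const PtrType &to) {
  if (from.size() != to.size() || from.empty()) return false;
  if (!outlives(from[0], to[0])) return false;
  for (std::size_t i = 1; i < from.size(); ++i)
    if (from[i] != to[i]) return false;
  return true;
}

PtrType substitute(const PtrType &param, const Substitution &s) {
  PtrType result;
  for (const auto &l : param) {
    auto it = s.find(l);
    result.push_back(it == s.end() ? l : it->second);
  }
  return result;
}

// Tries every assignment of candidate lifetimes to vars[i..].
bool search(const std::vector<std::string> &vars, std::size_t i,
            const std::vector<PtrType> &params,
            const std::vector<PtrType> &args,
            const std::vector<std::string> &candidates, Substitution &s) {
  if (i == vars.size()) {
    for (std::size_t k = 0; k < args.size(); ++k)
      if (!convertible(args[k], substitute(params[k], s))) return false;
    return true;
  }
  for (const auto &c : candidates) {
    s[vars[i]] = c;
    if (search(vars, i + 1, params, args, candidates, s)) return true;
  }
  return false;
}

// Returns a substitution under which every argument converts to its
// parameter type, or nullopt if the call is not lifetime-correct.
std::optional<Substitution> find_substitution(
    const std::vector<std::string> &vars, const std::vector<PtrType> &params,
    const std::vector<PtrType> &args,
    const std::vector<std::string> &candidates) {
  Substitution s;
  if (search(vars, 0, params, args, candidates, s)) return s;
  return std::nullopt;
}
```

For copy_ptr, the parameters are {"x"} and {"y", "x"}. In the first version of return_ptr the arguments are {"a"} and {"local1", "a"}, and the search finds $x = $a, $y = $local1. With the argument {"b"} in place of {"a"}, no substitution exists.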

Lifetime elision

As in Rust, to avoid unnecessary annotation clutter, we allow lifetime annotations to be elided (omitted) from a function signature when they conform to certain regular patterns. Lifetime elision is merely a shorthand for these regular lifetime patterns. Elided lifetimes are treated exactly as if they had been spelled out explicitly; in particular, they are subject to lifetime verification, so they are just as safe as explicitly annotated lifetimes.

We adopt the same lifetime elision rules as Rust. We will expand on the rationale for this below, but first let us present the rules.

We call lifetimes on parameters input lifetimes and lifetimes on return values output lifetimes. There are three rules:

  1. Each input lifetime that is elided (i.e. not stated explicitly) becomes a distinct lifetime.
  2. If there is exactly one input lifetime (whether stated explicitly or elided), that lifetime is assigned to all elided output lifetimes.
  3. If there are multiple input lifetimes but one of them applies to the implicit this parameter, that lifetime is assigned to all elided output lifetimes.
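The three rules can be applied mechanically. The small model below is a sketch, not a tool API. A signature is reduced to its input and output lifetime names, an empty string marks an elided lifetime, and the fresh names (elided0, elided1, ...) are invented for illustration. The text does not spell out what happens when an elided output lifetime cannot be resolved; as an assumption, the model reports that case as needing explicit annotation, which is what Rust does. It does not model function types nested in the signature.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// An empty string marks an elided lifetime.
struct Signature {
  std::optional<std::string> this_lifetime;   // set for non-static members
  std::vector<std::string> input_lifetimes;   // lifetimes in parameters
  std::vector<std::string> output_lifetimes;  // lifetimes in the return type
};

// Returns the output lifetimes after elision, or nullopt if an elided output
// lifetime cannot be resolved (assumed to require explicit annotation).
std::optional<std::vector<std::string>> elide(Signature sig) {
  int counter = 0;
  auto fresh = [&counter] { return "elided" + std::to_string(counter++); };

  // Rule 1: every elided input lifetime becomes a distinct lifetime.
  if (sig.this_lifetime && sig.this_lifetime->empty())
    sig.this_lifetime = fresh();
  for (std::string &l : sig.input_lifetimes)
    if (l.empty()) l = fresh();

  std::set<std::string> inputs(sig.input_lifetimes.begin(),
                               sig.input_lifetimes.end());
  if (sig.this_lifetime) inputs.insert(*sig.this_lifetime);

  std::optional<std::string> source;
  if (inputs.size() == 1) {
    source = *inputs.begin();  // Rule 2: exactly one input lifetime
  } else if (sig.this_lifetime) {
    source = *sig.this_lifetime;  // Rule 3: use the lifetime of `this`
  }

  for (std::string &l : sig.output_lifetimes) {
    if (!l.empty()) continue;          // explicitly annotated
    if (!source) return std::nullopt;  // needs an explicit annotation
    l = *source;
  }
  return sig.output_lifetimes;
}
```

Under this model, int *ReturnPtr(int *p) gets the same lifetime on its return type as on p, a two-parameter smaller() needs explicit annotations, and A& operator=(const A&) gets the lifetime of this on its return type.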

If a function signature contains a function type (in a parameter or the return value), lifetime elision is performed separately for any lifetimes that occur in this function type, independent of the lifetimes in the surrounding function signature. Any elided lifetimes within the function type become lifetime parameters of the function type. See also the discussion of lifetime parameters on function types in this section.

Lifetime elision rules have two requirements:

  1. They need to be easy for a programmer to remember and apply.
  2. They should be applicable to as many functions as possible, i.e. they should maximize the percentage of functions whose lifetime semantics correspond to the elided lifetimes. Put differently, they should minimize the percentage of functions which need explicit, non-elided lifetimes.

There is some alignment between these requirements, but some tension too. Working out what the best set of rules is likely requires quite a bit of testing. Instead of doing this, we have for the time being adopted the same set of rules that Rust uses, which presumably have a lot of collective experience embedded in them. The underlying assumption is that Rust and C++ functions do similar things with lifetimes in their interfaces; this assumption seems passable, though surely not perfect. An added benefit of using the Rust rules is that programmers using both languages don't need to keep two sets of rules in their head.

Once we have static analysis tooling that can run on real-world codebases, we may do some tweaking of the lifetime elision rules, but there would need to be clear benefits to justify giving up commonality with Rust.

Introducing lifetimes to a codebase will have to happen incrementally, and this requires some additional considerations. During the transition, there will be some files that have not yet been annotated, and we may indeed decide to exclude some parts of the code base from annotation permanently. Lifetime elision should not be applied to files that have not been annotated or verified for lifetime correctness; instead, the lifetimes should be assumed to be unknown, as described above.

We propose using a pragma or suitable comment string to mark source files where lifetime elision is allowed, e.g.:

#pragma clang lifetime_elision

Static member variables and non-member variables

Static member variable declarations and non-member variable declarations need not contain lifetime attributes but may do so for clarity.

In general, it may not even be possible to annotate a local variable correctly with the current lifetime annotation syntax. This happens when a local variable may refer to objects of different, unrelated lifetimes. Such a situation is entirely permissible; lifetime inference and verification tools need to deal with this by using a richer internal representation for the lifetimes of local variables.

If a variable has static storage duration, all lifetimes in its type are implicitly assumed to be static. Any manual annotations that are present may only specify the lifetimes static or unsafe.

Taking the address of a static member variable or non-member variable yields a pointer with a lifetime that depends on the variable’s storage duration. If the variable has static storage duration, the pointer has static lifetime. If the variable has automatic storage duration, the pointer has a local lifetime.

Classes and non-static member variables

A class may be annotated with one or several lifetime parameters by placing the new attribute lifetime_param in the class declaration, and a class annotated in this way is considered to be a reference-like type. All declarations of a class must be annotated with the same lifetime parameters. (See appendix A for a formal description of the attribute syntax.)

When lifetime parameters are substituted with constant lifetime arguments, all of these lifetime arguments must outlive the lifetime of the object they are applied to. This is analogous to the corresponding rule for pointers and references.

Lifetime parameters are necessary when an object of the class contains references to data that has a different lifetime than the object itself; the standard C++ types std::string_view and std::span are examples of this.

The lifetime parameters may be used in the declarations of non-static member functions and non-static member variables of the class.

As an example, here is how parts of std::string_view might be annotated[5]:

class [[lifetime_param(a)]] string_view {
public:
  string_view(const char *$a data, size_t len)
      : ptr_(data), len_(len) {}

  const char *$a data() const { return ptr_; }

  string_view $a substr(size_t pos, size_t count) const;

private:
  const char *$a ptr_;
  size_t len_;
};
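The annotation on substr() matters in practice. The example below uses the real std::string_view to show that a substring stays valid after the view it was taken from has gone out of scope. This works because the substring refers to the underlying characters (lifetime $a), not to the view object. The helper without_first_char() is illustrative.

```cpp
#include <string>
#include <string_view>

// substr() carries the view's lifetime parameter $a, not the lifetime of
// the view object itself: the result refers into the original buffer.
// Assumes `s` is non-empty (substr(1) throws std::out_of_range otherwise).
std::string_view without_first_char(const std::string &s) {
  std::string_view whole = s;  // a local view object
  return whole.substr(1);      // fine: refers to s's characters, not `whole`
}
```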

All reference-like types in the declaration of a non-static member variable must be annotated with the lifetimes static, unsafe, or one of the lifetime parameters of the class.

If a class contains owning pointers to manually allocated memory, these pointers will typically be annotated with an unsafe lifetime. Collection types such as std::vector are examples of this. Member functions that provide access to the owned memory will typically perform a lifetime_cast to the lifetime of the owning object. For example, std::vector::at() has the lifetime signature T& $a std::vector<T>::at(size_type) $a.

A class is not required to use any of its lifetime parameters; it may declare lifetime parameters solely for the purpose of associating a lifetime with objects of the class.

Derived classes

Derived classes inherit the lifetime parameters of their base classes. It is not permissible to add lifetime parameters to a derived class; in other words, all lifetime parameters need to be declared on the base class. If a derived class has multiple base classes, only one of these base classes may declare lifetime parameters.

TODO: Having a derived class “silently” inherit the lifetime parameters of its base classes isn’t great because it doesn't make the lifetime parameters of the derived class visible at the place where it is defined. We should instead consider requiring the lifetime parameters to be re-declared.

The motivation for this rule is to cover the case where a call to a virtual member function in the base class may access member variables of reference-like type in a derived class. A similar situation exists when casting a pointer from the base class to the derived class. In both cases, we want all lifetimes that are relevant to the derived class to be known on the base class.

Special member functions

Special member functions can be annotated with lifetimes just like other member functions, but they deserve special attention because they can be implicitly declared and because they are central to the semantics of C++ value types.

The default constructor and destructor are trivial from a lifetime perspective, as their only reference-like parameter is the implicit this parameter, so we will not discuss them further.

The lifetimes in the copy and move operations for a type A without lifetime parameters are as follows (using $s and $o as mnemonics for “self” and “other”):

A(const A& $o) $s;
A(A&& $o) $s;
A& $s operator=(const A& $o) $s;
A& $s operator=(A&& $o) $s;

Conveniently, these are the lifetimes that are implied by lifetime elision, so they would be omitted in practice.

The implication of these lifetimes is that it is possible to move or assign an object of type A to another object with a different lifetime.

The situation is slightly more complicated for a type with lifetime parameters. As an example, consider the following class:

struct [[lifetime_param(p)]] B {
  int* $p p;
};

(The special member functions are implicitly defaulted.)

The lifetimes of the special member functions on B are as follows:

B(const B $p & $o) $s;
B(B $p && $o) $s;
B& $s operator=(const B $p & $o) $s;
B& $s operator=(B $p && $o) $s;

Note that while the lifetimes of the “self” and “other” objects themselves are different, their lifetime parameters are the same. This implies that the copy and move operations cannot extend the lifetime of B::p.

The lifetimes above are not the same as those implied by lifetime elision. Classes with lifetime parameters that use the defaulted copy and move operations need to add explicitly defaulted definitions for these operations.
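For the class B above, the explicitly defaulted operations might be written as follows. The LIFETIME and LIFETIME_PARAM macros are hypothetical no-op stand-ins for the proposed attributes, so the sketch compiles today; with real attribute support they would expand to [[lifetime(...)]] and [[lifetime_param(...)]]. The constructor from int * is added only so that objects can be created in the example.

```cpp
// Hypothetical no-op stand-ins for the proposed attributes.
#define LIFETIME(l)
#define LIFETIME_PARAM(l)

struct LIFETIME_PARAM(p) B {
  int *LIFETIME(p) p;

  explicit B(int *LIFETIME(p) ptr) : p(ptr) {}

  // Explicitly defaulted so the lifetimes can be spelled out: "self" ($s)
  // and "other" ($o) may differ, but both share the parameter $p.
  B(const B LIFETIME(p) &LIFETIME(o) other) LIFETIME(s) = default;
  B(B LIFETIME(p) &&LIFETIME(o) other) LIFETIME(s) = default;
  B &LIFETIME(s) operator=(const B LIFETIME(p) &LIFETIME(o) other)
      LIFETIME(s) = default;
  B &LIFETIME(s) operator=(B LIFETIME(p) &&LIFETIME(o) other)
      LIFETIME(s) = default;
};
```

The defaulted operations copy p unchanged, which matches the shared $p in the signatures: a copy never points at anything that lives longer or shorter than what the original pointed at.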

Alias declarations

Alias declarations can declare lifetime parameters in a similar way to classes. These lifetime parameters can then be used on the right-hand side of the alias declaration. In addition, any alias declaration, whether it has lifetime parameters or not, can use the lifetimes static and unsafe on its right-hand side.

If an alias declaration is contained inside a class, its right-hand side may not use any lifetime parameters of that class. This is because, in general, an instance of the alias type has no connection to an instance of the class.

Here is an example of an alias declaration with lifetime parameters, again using std::string_view:

class [[lifetime_param(a)]] string_view {
public:
  // ...
  using const_iterator [[lifetime_param(i)]] = const char *$i;
  const_iterator $a begin() const;
  const_iterator $a end() const;
  // ...
};

Note that the lifetime_param attribute comes after the type alias name, whereas in a class declaration it comes before the class name. This may seem inconsistent, but the placement is dictated by the C++ grammar.

So far, we have pretended that string_view is a class, but it is in fact itself an alias declaration for basic_string_view<char>, and this alias declaration therefore has a lifetime parameter:

template <class T> class [[lifetime_param(a)]] basic_string_view {
  // ...
};

using string_view [[lifetime_param(a)]] = basic_string_view<char> $a;

The interpretation of this is that string_view is a type with a lifetime parameter a, and that this lifetime parameter should be forwarded to the lifetime parameter of basic_string_view<char>.

Templates

A function template or class template may be annotated with lifetime attributes and, in the case of class templates, a lifetime_param attribute, just like a non-template function or class.

Explicit template instantiations may not contain lifetime attributes.

Lifetime-correctness of a template may, in general, depend on the template arguments. A template is lifetime-correct if there exists at least one set of template arguments that does not result in substitution failure, for which no specialization exists, and for which the specialized template is lifetime-correct.

In general, therefore, lifetimes can only be inferred and verified on a template instantiation. This implies that inference and verification may need to be done multiple times if the same template instantiation is used in multiple translation units. This is slightly unfortunate, but there does not seem to be a good way around it, and it mirrors the fact that such a template instantiation is also compiled multiple times.

To the extent that it is possible to infer and verify lifetimes on the template itself, independent of the template arguments, tooling should do this. In other words, a lifetime-correctness error should be flagged if there is no set of template arguments for which the specialized template is lifetime-correct. Lifetimes should be inferred if they are correct for any template arguments for which no specialization exists and which do not result in a substitution failure.

Partial template specializations should be treated the same way as primary template definitions: Tooling should infer and verify lifetimes on the partial specialization to the extent that this can be done independent of the template arguments.

Full template specializations should be treated the same way as non-template functions and classes: Lifetimes should be inferred and verified on the full template specialization.

When analyzing code that uses a template for which partial or full specializations exist, tooling must of course make sure to refer to the correct specialization.

Function templates

A function template’s type arguments as well as other dependent types may, in general, be reference-like types. Therefore, when a function template instantiation is used (either by calling it or by taking its address), tooling should do the following:

  • Verify the lifetime-correctness of the function template instantiation.
  • Infer lifetimes for all reference-like types in the signature of the function template instantiation, except for reference-like types that occur in the function template itself and are already annotated with lifetimes there.

The lifetimes inferred for the specialized function template should be used when inferring and verifying lifetimes of functions that use the specialized function template.

Class templates

The type arguments to a class template may be reference-like types. A class template that is specialized with reference-like types in this way is itself considered to be a reference-like type. The specialized class template has a lifetime parameter for each reference-like type that occurs in the template arguments; these lifetime parameters are in addition to any lifetime parameters that are annotated on the class template itself using the lifetime_param attribute. The lifetime parameters associated with a template argument are implicitly propagated to all uses of that argument in the class template.

TODO: Add a discussion of template template arguments

Lifetimes are assigned to a specialized class template’s lifetime arguments as for any other reference-like type, i.e. depending on the context in which the specialized class template is used they may be explicitly annotated, implied by lifetime elision, or inferred. However, there is a syntactic difference: When lifetimes are explicitly annotated, they are placed in the template arguments instead of after the type, as they would be for other lifetime parameters. For example, here is a function that takes a vector of pointers and returns an element of the vector:

int* $a get_ith(const std::vector<int* $a>& $b v, size_t i) {
  return v[i];
}
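Because the two lifetimes are independent, the pointer returned by get_ith() may outlive the vector it was read from, as in the sketch below. The LIFETIME macro is a hypothetical no-op stand-in for the proposed attribute, and first_of_temporary_vector() is an illustrative helper.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical no-op stand-in for the proposed [[lifetime(...)]] attribute.
#define LIFETIME(l)

// The result carries $a (the pointees), not $b (the vector itself).
int *LIFETIME(a) get_ith(const std::vector<int *LIFETIME(a)> &LIFETIME(b) v,
                         std::size_t i) {
  return v[i];
}

int *LIFETIME(a) first_of_temporary_vector(int *LIFETIME(a) p) {
  // The temporary vector is destroyed at the end of the full-expression,
  // but the returned pointer does not depend on it.
  return get_ith(std::vector<int *LIFETIME(a)>{p}, 0);
}
```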

TODO: Discuss dependent types.

A member function of a specialized class template need not be lifetime-correct for all possible assignments of the lifetime parameters associated with the template arguments.

Instead, we only require that every ODR-use of a member function of a specialized class template is lifetime-correct for the lifetimes assigned to the lifetime parameters for that particular use.

A (slightly contrived) example will help to illustrate why these rules are written the way they are.

template <class From, class To>
struct Convert {
  To convert(From from) { return from; }
};

void constify(int* [[lifetime(a)]] p,
              const int *[[lifetime(a)]] *[[lifetime(b)]] pp) {
  Convert<int*, const int*> c;
  *pp = c.convert(p);
}

The specialized class template Convert<int*, const int*> has two lifetime parameters: one lifetime parameter (which we will call x) for the int* template argument, and one lifetime parameter (which we will call y) for the const int* template argument.

The Convert::convert() member function is not lifetime-correct if we consider x and y to be arbitrary variable lifetimes, as it is not lifetime-correct to lifetime-convert an int * with lifetime x to a const int * with lifetime y.

However, for the use of Convert in the declaration of c, we infer that both x and y should be substituted by the lifetime a. Convert::convert() is lifetime-correct when x and y are substituted in this way.

TODO: Do we need to make this distinction between lifetime parameters and the lifetimes they are substituted with, or can we make the substitution directly?

Variance

As in Rust, we need to establish some variance rules for type and lifetime parameters, but the specific rules differ slightly from Rust.

  • Const references and pointers const T & and const T * are covariant with respect to T.
  • Non-const references and pointers T & and T * are invariant with respect to T.
  • Class templates are invariant with respect to their type parameters (including lifetimes contained in them).
  • All lifetime-parameterized types (classes and alias declarations) are invariant with respect to their lifetime parameters.

The last two rules differ from Rust, which infers the variance of type and lifetime parameters on user-defined types. Unlike Rust generics, C++ class templates are invariant with respect to their type parameters[6], and we want to be consistent with this.

Regarding lifetime parameters on types, we restrict ourselves to invariance for simplicity. Rust infers the variance of lifetime parameters from the way they are used in the definition of the type, but in C++, this is impossible to do, at least on a single-translation-unit basis, as a lifetime-parameterized class may only be forward-declared in the current translation unit. For simplicity, and consistency with template parameters, we have therefore decided that lifetime parameters will always be invariant, as we expect this to be sufficient in practice. If this turns out to be too limiting, we may need to provide a way of annotating the variance of lifetime parameters.

Alternative considered: [[clang::lifetimebound]]

Clang already provides a limited ability to annotate lifetimes with the [[clang::lifetimebound]] attribute7. Quoting from the documentation:

The lifetimebound attribute on a function parameter or implicit object parameter indicates that objects that are referred to by that parameter may also be referred to by the return value of the annotated function (or, for a parameter of a constructor, by the value of the constructed object).

If the lifetime annotation is applied to aggregates (arrays and simple structs), those aggregates are considered to refer to any pointers or references transitively contained within them.

Here, again, is the smaller() example, but annotated with [[clang::lifetimebound]]:

const std::string& smaller(
    const std::string& s1 [[clang::lifetimebound]],
    const std::string& s2 [[clang::lifetimebound]]);

The attribute may also be applied to a member function to indicate that the lifetime of the return value corresponds to the lifetime of the object. Here is an example from the [[clang::lifetimebound]] documentation:

struct string {
  // The returned pointer should not outlive ``*this``.
  const char *data() const [[clang::lifetimebound]];
};

This is an example of the very common case where a member function returns a pointer or reference to part of the object, or to another object owned by it.

The [[clang::lifetimebound]] attribute provides a way to express lifetimes in many common scenarios, but it does have its limitations:

  • There is no way to differentiate between different lifetimes.

  • There is no way to annotate a static lifetime.

  • The attribute attaches to function parameters and always implicitly refers to the outermost reference-like type8; it is not possible to attach it to part of a type (e.g. to the T * in a const std::vector<T *> &).

  • The single lifetime is implicitly applied to the outermost reference-like type in the function’s return type (or the value of the constructed object, in the case of a constructor). Again, it is not possible to associate the lifetime with inner reference types in the return value (e.g. the T * in const std::vector<T *> &).

  • The lifetime of a constructor parameter can be associated with the lifetime of the object being constructed, i.e. with the lifetime of the this pointer, but this isn’t possible in other member functions. In other words, a member function cannot associate the lifetime of a parameter with the lifetime of the object the member function is called on.

  • There is no way to add a lifetime parameter to a struct.

Alternative considered: P1179 / -Wdangling-gsl

The WG21 proposal P1179 describes a static analysis that aims to prevent many common types of use-after-free. It uses an attribute-based annotation scheme to describe the lifetime contracts of functions and to annotate user-defined types containing indirections.

Preliminary implementations of this scheme exist in MSVC and a fork of Clang. In addition, Clang trunk implements statement-local warnings inspired by the scheme, which are enabled by the on-by-default flag -Wdangling-gsl.

The scheme has both advantages and disadvantages compared to the scheme proposed here:

  • Advantages
    • Can express independent pre- and postconditions for lifetimes, e.g. to annotate std::swap(ptr1, ptr2), where the lifetimes of the pointers after the call are swapped compared to before the call.
    • Can diagnose some cases of iterator invalidation.
  • Disadvantages
    • User-defined types can only be annotated as having one of a class of fairly specific lifetime semantics (“SharedOwner”, “Owner”, “Pointer”); arbitrary annotation of classes with lifetime parameters is not possible.
    • Cannot refer to lifetimes of pointers in template arguments, e.g. no way to express int *$a return_first(const vector<int *$a> &$b v);
    • Annotations can be verbose and syntactically removed from the objects they refer to.

We believe the limitations of this scheme will restrict its usefulness in the use cases we are interested in. A more in-depth comparison of P1179 with our proposed scheme can be found here.

Appendix A: Lifetime attribute specification

This appendix describes where lifetime attributes may appear and what arguments they can take.

Temporary syntax

We are currently still experimenting with the exact syntax and semantics for the lifetime annotations. While we are doing so, we will use the general-purpose annotate and annotate_type attributes as stand-ins for the new attributes proposed below.

Attribute definitions

We introduce two new attributes, lifetime and lifetime_param. In practice, these would be scoped to a namespace (probably clang), but for ease of exposition, we assume they are in the global namespace.

Attribute lifetime_param

This attribute may be applied to the following:

  • A class definition (more formally, it may appear in the attribute-specifier-seq of a class-head)
  • An alias-declaration (specifically, the attribute-specifier-seq following the identifier)

The attribute takes one or more arguments. Each of these arguments must be an identifier9; each argument defines a lifetime parameter for the corresponding class or alias declaration.

If the class definition or alias declaration is nested within a class that itself has a lifetime_param attribute, none of the lifetime parameter names of the outer class may be used as lifetime parameter names on the nested class definition or alias declaration.

Attribute lifetime

This attribute may be applied to the following:

  • A type
  • A pointer operator
  • A non-static member function declaration

The attribute takes one or more arguments, each of which must be an identifier or the keyword static. We call these identifiers lifetime names.

In addition, the following constraints apply:

  • When the lifetime attribute is applied to a type, the type must be a class type or alias declaration whose definition contains a lifetime_param attribute.

    The lifetime attribute must have the same number of arguments as the lifetime_param attribute on the corresponding class or alias declaration. (These arguments define lifetime parameters for the object instance.)

  • When the lifetime attribute is applied to a pointer operator, it must take exactly one argument. (This defines a lifetime for the object referenced by the pointer operator.)

  • When the lifetime attribute is applied to a non-static member function declaration, it must take exactly one argument. (This defines a lifetime for the implicit object parameter.)

  • Every lifetime name that appears in a function’s return value must either be static or also appear in

    • the function’s parameter list, or
    • the lifetime attribute for the implicit object parameter (in the case of a non-static member function), or
    • the lifetime_param attribute of the class (in the case of a non-static member function).
  • For every constructor of a class that has a lifetime_param attribute, every lifetime name that appears in the lifetime_param attribute must appear in the constructor’s parameter list.

  • Every lifetime name that appears in a non-static member variable declaration must either be static or one of the lifetime parameters declared in a lifetime_param attribute on the class containing the member variable declaration.

  • Every lifetime name that appears in the defining-type-id of an alias declaration must either be static or one of the lifetime parameters declared in a lifetime_param attribute on the alias declaration. Note that if the alias declaration is nested within a class that also has lifetime parameters, those lifetime parameters may not appear in the defining-type-id of the alias declaration.
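
To make the return-value rule concrete, here is a minimal sketch of how a tool might check it. FunctionLifetimes and ReturnLifetimesAreBound are hypothetical names, and the sketch assumes the lifetime names in each part of a signature have already been collected into sets; it does not parse C++ or the attributes themselves.

```cpp
#include <cassert>
#include <set>
#include <string>

using LifetimeName = std::string;
using LifetimeNames = std::set<LifetimeName>;

// Hypothetical summary of one function declaration: the lifetime names that
// appear in each part of its signature.
struct FunctionLifetimes {
  LifetimeNames return_value;
  LifetimeNames parameters;
  LifetimeNames implicit_object;  // empty unless a non-static member function
  LifetimeNames class_params;     // lifetime_param names of the enclosing class
};

// Returns true if every lifetime name in the return value is `static` or also
// appears in the parameter list, in the lifetime attribute of the implicit
// object parameter, or in the lifetime_param attribute of the class.
bool ReturnLifetimesAreBound(const FunctionLifetimes& f) {
  for (const LifetimeName& name : f.return_value) {
    if (name == "static") continue;
    bool bound = f.parameters.count(name) > 0 ||
                 f.implicit_object.count(name) > 0 ||
                 f.class_params.count(name) > 0;
    if (!bound) return false;  // the caller has no way to supply this lifetime
  }
  return true;
}
```

For example, the substr() declaration in Appendix B passes this check because its return lifetime s is a lifetime parameter of string_view, while a function returning a lifetime that appears nowhere else in its signature fails it.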

Appendix B: std::string_view with lifetime annotations

To illustrate how lifetime annotations work on a larger code sample, here is an annotated version of interesting parts of std::string_view. To keep the code clear, we have omitted basic_string_view and simply stamped out string_view for the template arguments used in its definition.

// Lifetime "s" is mnemonic for "lifetime parameter of string_view"
class LIFETIME_PARAM(s) string_view {
public:
  using const_pointer LIFETIME_PARAM(iter_lifetime) = const char *$(iter_lifetime);
  using const_reference LIFETIME_PARAM(iter_lifetime) = const char &$(iter_lifetime);
  using const_iterator LIFETIME_PARAM(iter_lifetime) = const char *$(iter_lifetime);
  using iterator LIFETIME_PARAM(iter_lifetime) = const_iterator $(iter_lifetime);
  using const_reverse_iterator LIFETIME_PARAM(iter_lifetime) =
      std::reverse_iterator<const_iterator $(iter_lifetime)>;
  using reverse_iterator LIFETIME_PARAM(iter_lifetime) =
      const_reverse_iterator $(iter_lifetime);

  using size_type = size_t;

  static constexpr size_type npos = static_cast<size_type>(-1);

  constexpr string_view() noexcept;
  constexpr string_view(const string_view $s & other) noexcept = default;
  constexpr string_view(const char* $s data, size_type len);

  constexpr const_iterator $s begin() const noexcept;
  constexpr const_iterator $s end() const noexcept;
  constexpr const_reverse_iterator $s rbegin() const noexcept;
  constexpr const_reverse_iterator $s rend() const noexcept;

  constexpr const_reference $s front() const;
  constexpr const_reference $s back() const;

  constexpr const_pointer $s data() const noexcept;

  constexpr const_reference $s operator[](size_type i) const;
  constexpr const_reference $s at(size_type i) const;

  // The annotation cannot express that the lifetime parameters of `this` and
  // `other` are swapped after the call, so we have to be overly restrictive and
  // require `this` and `other` to have the same lifetime parameter.
  constexpr void swap(string_view $s & other) noexcept;

  // Output buffer may have a different lifetime than this string view's data.
  size_type copy(char* buf, size_type n, size_type pos = 0) const;

  // Returned substring has the same lifetime parameter as this `string_view`.
  constexpr string_view $s substr(size_type pos = 0, size_type n = npos) const;

  // `string_view` to compare against does not need to share the same lifetime.
  constexpr int compare(string_view x) const noexcept;

private:
  const char* $s ptr_;
  size_type length_;
};

Notes

1

The attribute will be scoped to some suitable namespace, but for ease of exposition we assume here that it is placed in the global namespace.

2

lifetime_cast will be placed in a suitable namespace, but for ease of exposition, we assume here that it is in the global namespace.

3

“$” is not part of the standard set of characters allowed in C++ identifiers (including macro names), but the C++ standard permits implementations to allow additional implementation-defined characters, and GCC, Clang, and MSVC allow $ as an implementation-defined character.

4

More formally, this is because function types are contravariant in their parameter types.

5

For simplicity, we are showing std::string_view as if it was a non-template type.

6

Unless converting constructors and conversion operators are used to simulate variance.

7

This attribute is inspired by the C++ Standards Committee paper P0936R0.

8

Quoting Richard Smith: “The Clang attribute behaves as if each type has exactly one associated lifetime, and the attribute says in which cases the outermost lifetime of a parameter matches the outermost lifetime of the return value.”

9

Note that this automatically disallows the special lifetime name static, which is allowed in lifetime attributes. We impose no other constraints on identifiers, but codebases that want to use the lifetime annotations for C++ / Rust interop may want to enforce a rule that prohibits invalid Rust identifiers (e.g. Rust keywords) in the lifetime_param and lifetime attributes.

Static Analysis for C++ Lifetimes

Summary: We describe a static analysis that infers lifetimes in C++ function signatures.

NOTE: This document describes the approach we are currently pursuing but it is a) incomplete, and b) out of date. It has become clear that we are still making changes to the static analysis frequently enough that it does not seem worth updating a document in parallel with those changes. Once the static analysis appears reasonably stable, we plan to update this document to describe it.

Introduction

Lifetime analysis has two goals:

  • Infer lifetime annotations to put in C++ function signatures, using the attributes described in this doc.
  • Verify lifetime-correctness of function bodies.

To infer and verify lifetimes, we perform a pointer analysis1. For each pointer or other reference-like type, a pointer analysis determines a points-to set consisting of the storage locations it may point to.

There are different approaches to pointer analysis that can be classified according to various properties. The pointer analysis we perform here has the following properties:

  • Intraprocedural, context-insensitive. We analyze each function individually and do not take into account how it is called from different callsites.
  • Array-insensitive. We treat all elements in an array containing a reference-like type as having the same lifetime.
  • Field-insensitive. We treat member variables of reference-like type as having the same lifetime as the object they are contained in (unless they carry a lifetime annotation).
  • Flow-sensitive. When analyzing a function, we take statement ordering and control flow into account. We believe flow sensitivity is important to avoid inferring overly restrictive lifetimes and emitting false positive errors.

The pointer analysis we perform is relatively coarse-grained in that we do not distinguish between different storage locations with the same lifetime; equivalently, we can say that we identify a storage location merely by its lifetime.

A points-to set is therefore just a set of lifetimes; a reference-like object is also simply identified by its lifetime. The state that is tracked during the analysis is therefore just a mapping from a lifetime (identifying the reference-like object) to a set of lifetimes (identifying the storage locations it may point to).

This coarse-grained approach simplifies the analysis and is sufficient for our purposes because we are only attempting to infer and verify statements about lifetimes.

Analysis of a translation unit

We analyze all functions in a translation unit for which we have a definition.

We attempt to analyze all of these functions in topological order so that callees are analyzed before callers. Where recursion makes this impossible, we analyze the functions that take part in the recursive cycle in arbitrary order. We accept that this may make it impossible to infer lifetimes for functions in a recursive cycle.
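
This ordering can be obtained with a post-order depth-first traversal of the call graph. The sketch below works under stated assumptions: the CallGraph representation and the AnalysisOrder function are hypothetical, functions without a definition in the translation unit (those that are not keys of the graph) are left out of the order, and members of a recursive cycle come out in whatever order the traversal happens to produce.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: maps each defined function to the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

namespace {
// Post-order DFS: a function is appended to `order` only after all of its
// callees have been appended. A callee that is still on the DFS stack
// (`visiting`) closes a recursive cycle; that edge is ignored, so the members
// of a cycle are ordered arbitrarily.
void Visit(const CallGraph& graph, const std::string& fn,
           std::set<std::string>& visited, std::set<std::string>& visiting,
           std::vector<std::string>& order) {
  auto it = graph.find(fn);
  if (it == graph.end() || visited.count(fn) || visiting.count(fn)) return;
  visiting.insert(fn);
  for (const std::string& callee : it->second) {
    Visit(graph, callee, visited, visiting, order);
  }
  visiting.erase(fn);
  visited.insert(fn);
  order.push_back(fn);
}
}  // namespace

// Returns an analysis order in which callees precede their callers wherever
// the call graph is acyclic.
std::vector<std::string> AnalysisOrder(const CallGraph& graph) {
  std::set<std::string> visited;
  std::set<std::string> visiting;
  std::vector<std::string> order;
  for (const auto& entry : graph) {
    Visit(graph, entry.first, visited, visiting, order);
  }
  return order;
}
```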

Analysis of a function

As explained in the introduction, we identify an object (often called a storage location in pointer analysis) merely by its lifetime.

A points-to set is therefore simply a set of lifetimes. It represents the set of objects that a reference-like type or glvalue can be referencing at some point of execution of the program. We will sometimes refer to the objects in a points-to set as pointees.

We associate each local variable in the function with a different local lifetime. This serves two purposes: a) It reflects the fact that local variables do, in general, have different lifetimes, and this is important for lifetime verification. b) It allows us to associate a different points-to set with different local variables of reference-like type, and this is required to make the analysis precise enough.

We perform a data-flow analysis using the Clang dataflow framework (documentation) to propagate points-to sets through the function. After the analysis is complete, we produce lifetime annotations from the points-to sets; if these lifetime annotations are different from existing annotations (ignoring pure renamings), we output the new annotations as suggested edits.

The data-flow analysis tracks the following state:

  • For each reference-like object (identified by its lifetime), the points-to set of that reference-like object
  • For each expression of reference-like type, the points-to set of the expression
  • For each glvalue expression, the points-to set representing the glvalue’s referent
  • If the function’s return type is a reference or pointer type, a points-to set for the return value

The join operation on points-to sets means taking the union of the two sets.
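
A minimal sketch of the per-object part of this state and its join, assuming lifetimes are represented as strings (a hypothetical choice made for readability). The per-expression and return-value points-to sets listed above would be tracked alongside it in the same way.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// A lifetime identifies a storage location; here it is just a name such as
// "'1" or "p1".
using Lifetime = std::string;
using PointsToSet = std::set<Lifetime>;

// For each reference-like object (identified by its lifetime), the set of
// objects it may point to.
using PointsToMap = std::map<Lifetime, PointsToSet>;

// Join at a control-flow merge: an object may point to anything it may point
// to along either incoming path, so we take the per-object union.
PointsToMap Join(const PointsToMap& a, const PointsToMap& b) {
  PointsToMap result = a;
  for (const auto& [object, pointees] : b) {
    result[object].insert(pointees.begin(), pointees.end());
  }
  return result;
}
```

For instance, joining the two branches of the if statement in the target() example below yields pp: { p1, p2 }.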

The initial state for the data flow analysis is produced as follows:

  • Associate each parameter of reference-like type with a points-to set containing a new unique regular lifetime representing the pointee.
  • If the pointee is itself of reference-like type, recursively associate that pointee with a points-to set containing a new regular lifetime, and so on.
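
The initial-state rule above can be sketched as follows. Param and InitialState are hypothetical; a real implementation would walk Clang types rather than a pointer depth, and this sketch simply uses each parameter's name as its local lifetime. Applied to the parameters of foo() in the first example below, it numbers the pointee lifetimes '1, '2 and '3 in the same way as the comments there.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Lifetime = std::string;
using PointsToSet = std::set<Lifetime>;
using PointsToMap = std::map<Lifetime, PointsToSet>;

// Hypothetical parameter description: `name` doubles as the parameter's local
// lifetime, and `pointer_depth` is the number of pointer levels
// (1 for `int *`, 2 for `int **`, 0 for a non-reference-like type).
struct Param {
  std::string name;
  int pointer_depth;
};

// Gives each reference-like parameter a fresh regular lifetime for its
// pointee and, while that pointee is itself a pointer, descends and gives it
// a fresh lifetime for its own pointee, and so on.
PointsToMap InitialState(const std::vector<Param>& params) {
  PointsToMap state;
  int next_lifetime = 1;
  for (const Param& param : params) {
    Lifetime object = param.name;
    for (int level = 0; level < param.pointer_depth; ++level) {
      Lifetime pointee = "'" + std::to_string(next_lifetime++);
      state[object] = {pointee};
      object = pointee;  // descend into the pointee
    }
  }
  return state;
}
```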

During the analysis, we propagate points-to sets through expressions and update the points-to sets of reference-like objects.

After the analysis is complete, we obtain lifetime annotations by examining the points-to sets of all parameters of reference-like type and the return value (if applicable), descending into pointees that are themselves of reference-like type.

For every points-to set, we look at the set of lifetimes of its pointees. If there are multiple lifetimes, they are all replaced by a single lifetime. This lifetime then becomes the lifetime of the corresponding reference or pointer type in the signature.
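
One way to implement this replacement is a union-find structure over lifetimes: every points-to set with more than one pointee merges the lifetimes of those pointees, and the representative of each group becomes the lifetime written in the signature. LifetimeUnifier is a hypothetical name, and union-find is a choice made for this sketch, not necessarily what the analysis itself uses.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using Lifetime = std::string;
using PointsToSet = std::set<Lifetime>;

// Minimal union-find over lifetimes. A lifetime with no entry in `parent_`
// is its own representative.
class LifetimeUnifier {
 public:
  Lifetime Find(const Lifetime& l) {
    auto it = parent_.find(l);
    if (it == parent_.end() || it->second == l) return l;
    Lifetime root = Find(it->second);
    parent_[l] = root;  // path compression
    return root;
  }

  // Merges the lifetimes of all pointees in one points-to set.
  void Unify(const PointsToSet& pointees) {
    if (pointees.empty()) return;
    Lifetime root = Find(*pointees.begin());
    for (const Lifetime& l : pointees) {
      Lifetime other = Find(l);
      if (other != root) parent_[other] = root;
    }
  }

 private:
  std::map<Lifetime, Lifetime> parent_;
};
```

In the first example below, to_pointee ends up pointing to both '1 and '3, so those two lifetimes would be merged, which corresponds to a signature along the lines of void foo(int* $a from, int* $a * $b to).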

Here are some examples:

void foo(int* from, int** to) {
  // from_pointee (int): '1
  // to_pointee (int *): '2
  // to_pointee_pointee (int): '3
  // from: { from_pointee }
  // to: { to_pointee }
  // to_pointee: { to_pointee_pointee }

  *to = from;
  // to_pointee: { from_pointee, to_pointee_pointee }
}

TODO: Explain. Also talk about why, after the assignment *to = from, we keep to_pointee_pointee in the points-to set and how we can, in some cases, eliminate it. (Distinguish between scalar and aggregate pointees -- the latter are arrays, for example. We can only delete existing pointees if *to has a single pointee and it's scalar.)

int* target(int* p1, int* p2) {
  // p1_pointee (int): '1
  // p1: { p1_pointee }
  // p2_pointee (int): '2
  // p2: { p2_pointee }
  int** pp;
  if (some_condition()) {
    pp = &p1;  // pp: { p1 }
  } else {
    pp = &p2;  // pp: { p2 }
  }
  // pp: { p1, p2 }
  int local = 42;
  *pp = &local;  // glvalue on left side is { p1, p2 }, so:
                 // p1: { p1_pointee, local }
                 // p2: { p2_pointee, local }
  return p1; // rval: { p1_pointee, local }
}

TODO: Explain. Also mention how this is an example where we have two pointees on the left hand side, so we can't eliminate existing pointees from p1 and p2.
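
Both examples update points-to sets through an indirection. A minimal sketch of that step follows, assuming lifetimes are represented as strings; AssignThroughPointer is a hypothetical name. Like the examples, it performs a weak update that keeps existing pointees; the strong update discussed in the TODOs (dropping old pointees when there is a single scalar target) is not modeled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using Lifetime = std::string;
using PointsToSet = std::set<Lifetime>;
using PointsToMap = std::map<Lifetime, PointsToSet>;

// Models `*lhs = rhs;` for pointers. `lhs_pointees` is the points-to set of
// the glvalue `*lhs` (the objects that may be written) and `rhs_pointees` is
// the points-to set of the stored value. Every object that may be written
// gains the new pointees and keeps its old ones (a weak update).
void AssignThroughPointer(PointsToMap& state, const PointsToSet& lhs_pointees,
                          const PointsToSet& rhs_pointees) {
  for (const Lifetime& target : lhs_pointees) {
    state[target].insert(rhs_pointees.begin(), rhs_pointees.end());
  }
}
```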

Function calls

Here is how we handle function calls:

  1. Create a mapping from callee lifetimes to points-to sets. For each variable lifetime that occurs in the callee's parameter list, find the union of all points-to sets in those argument positions to yield a mapping from lifetimes to points-to sets.
  2. Propagate points-to sets to output parameters. For each lifetime 'l in an invariant argument position, replace the argument's existing points-to set with the points-to set established for 'l in Step 1.
  3. Determine points-to set for the return value. If the return value is of reference-like type with lifetime 'l, find the points-to set established for 'l in Step 1; this becomes the points-to set for the call expression's value.

If the 'static lifetime occurs in output parameters (i.e. in invariant position) or in the return value, the callee may be returning references to pointees that do not occur as inputs to the callee. Therefore, when we encounter the 'static lifetime in these positions, we create new pointees for the corresponding outputs.
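
The three steps can be sketched as below, using the copy_ptr() call in the example that follows as a test case. ArgPosition, OutputPosition, CallSite and ApplyCall are hypothetical names; the sketch assumes the caller has already matched each argument position against the callee's annotated signature, and it omits the creation of new pointees for 'static.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

using Lifetime = std::string;
using PointsToSet = std::set<Lifetime>;
using PointsToMap = std::map<Lifetime, PointsToSet>;

struct ArgPosition {
  Lifetime callee_lifetime;  // e.g. 'x in `int *'x from`
  PointsToSet points_to;     // caller-side points-to set in that position
};
struct OutputPosition {
  Lifetime caller_object;    // object the callee may overwrite
  Lifetime callee_lifetime;  // lifetime of that invariant position
};
struct CallSite {
  std::vector<ArgPosition> args;
  std::vector<OutputPosition> outputs;
  std::optional<Lifetime> return_lifetime;  // set if a pointer is returned
};

// Applies steps 1-3 and returns the points-to set of the call expression
// (empty when the callee returns no reference-like value).
PointsToSet ApplyCall(PointsToMap& state, const CallSite& call) {
  // Step 1: union the points-to sets of all positions sharing a lifetime.
  std::map<Lifetime, PointsToSet> by_lifetime;
  for (const ArgPosition& arg : call.args) {
    by_lifetime[arg.callee_lifetime].insert(arg.points_to.begin(),
                                            arg.points_to.end());
  }
  // Step 2: replace the points-to sets of objects in invariant positions.
  for (const OutputPosition& out : call.outputs) {
    state[out.caller_object] = by_lifetime[out.callee_lifetime];
  }
  // Step 3: the call expression points to whatever its lifetime maps to.
  if (call.return_lifetime) return by_lifetime[*call.return_lifetime];
  return {};
}
```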

Here is an example of how this works:

void copy_ptr(int *'x from, int *'x *'y to) {
  *to = from;
}

int * get_lesser_of(int * arg1, int * arg2) {
  // arg1_pointee (int): '1
  // arg2_pointee (int): '2
  // arg1: { arg1_pointee }
  // arg2: { arg2_pointee }
  int* result = arg2;
    // result: { arg2_pointee }
  if (*arg1 < *arg2) {
    copy_ptr(arg1, &result);
      // &result: { result }
      // 'x pointees: { arg1_pointee, arg2_pointee }
      // result: { arg1_pointee, arg2_pointee }
  }
  return result;
}

TODO: Continue exposition

Virtual member functions

Inferring lifetimes for virtual member functions is complicated by two factors:

  • The lifetimes of the base class member function are constrained by the lifetimes of all of its overrides.
  • The definitions of the overrides and the base class function (if it is not pure virtual) are typically contained in different translation units, and we plan to analyze each translation unit individually.

For more details, see this section in the lifetime annotation specification.

We will describe an approach that can infer and update lifetimes for virtual member functions progressively, as each translation unit is processed.

If a translation unit contains definitions for multiple overrides, or if it contains the definition of the base class function and at least one override, we analyze these definitions in topological order from base class to more derived class.

If the definitions are contained in different translation units, we effectively process them in the same order because we analyze dependencies of a library before analyzing the library itself, and libraries containing derived classes generally depend on the library containing the base class.

TODO: The description above implicitly assumes we're talking about the initial change where we add lifetimes across the codebase. Discuss also how this applies when people are editing code.

When we encounter the definition of a virtual member function (whether it is the base class implementation or an override), we first perform lifetime inference on its implementation, as for any other function, and update the declaration of the member function in its containing class.

If the function is an override, call it Derived::f, we then update the lifetimes of every base class function Base::f that it overrides. (There may be several if there is a chain of overrides.) We do so as follows:

  • If the declaration of Base::f does not yet contain any lifetime annotations, annotate it with the lifetimes of Derived::f. Because we process base class functions before derived class functions, this case can only occur if Base::f is pure virtual.
  • If the existing lifetimes of Base::f are more permissive than the lifetimes inferred for Derived::f, perform lifetime substitutions on the lifetimes of Base::f until they are at most as permissive as those of Derived::f.
  • If the existing lifetimes of Base::f are at most as permissive as the lifetimes inferred for Derived::f, do nothing.

TODO: Can we ever get caught in a situation where neither the second nor the third point above applies? I think we'll always be able to restrict the lifetimes of Base::f until they're compatible with Derived::f, but this needs a formal argument.

TODO: Discuss how the lifetime changes affect callers – may need to process them again.

TODO: Show an example

Templates

Templates pose a specific challenge to lifetime analysis:

  • Reference-like types may occur in the template itself as well as in template arguments and dependent types.
  • For reference-like types that occur in the template, we wish to infer and check lifetimes on the template itself to the greatest extent possible. This reflects the fact that, even though C++ templates are not really generics, they are often used as if they were. However, the semantics of C++ templates pose two difficulties here:
    • Templates may be specialized, and we must be careful not to apply the lifetimes inferred on the primary template to the specialization.
    • The inferred lifetimes and the lifetime correctness of a template may, in general, depend on the template arguments, even if the template arguments and dependent types do not contain any reference-like types. We show an example of this below.

The lifetime annotation specification defines what the semantics of lifetimes on templates should be but does not say how they should be implemented. That is the purpose of this section.

Example scenarios

Before we discuss generally how we will analyze templates, let us look at some scenarios that may occur.

As an example of why we want to be able to analyze templates themselves, let’s take a look at part of a simplified implementation of std::vector:

template <class T>
class vector {
public:
  vector(const vector& other);

  T* $a begin() $a { return data_; }
  T* $a end() $a { return data_ + size_; }

private:
  T* data_;
  size_t size_;
};

We should be able to infer the lifetimes of begin() and end() from the template itself. These member functions operate only on pointers to T, and the lifetime behavior of a pointer to T is independent of the type T itself2.

On the other hand, we cannot infer the lifetimes of the copy constructor. It calls the copy constructor of T, and as explained in the lifetime annotation specification, copy and move operations can have two different lifetime signatures.

Here is another example of how lifetimes can depend on a template argument:

template <int i>
int* return_ith(int* i0, int* i1) {
  if (i == 0) {
    return i0;
  } else {
    return i1;
  }
}

This example is contrived, but it is certainly not implausible that a trait argument could affect behavior in a similar way.

While these examples do show the limitations of lifetime analysis on templates, we likely won’t need to do anything subtle to detect them within the analysis. In the case of the copy constructor of vector, we will notice when calling the copy constructor of T that we’re doing member lookup on a dependent type and that we can’t continue the analysis. In the case of return_ith(), we will be able to analyze the function, but we will conclude that the lifetimes of all pointers involved are the same. This is more restrictive than the result we would obtain if we analyzed a template instantiation, but this limitation may be acceptable.

General approach

The constraints described above imply that lifetime analysis of templates needs to proceed in two phases:

  • Analysis of the template itself. We first attempt to infer lifetimes on the template itself, as well as any partial or full specializations, to the extent that the lifetimes do not depend on template arguments. If the inferred lifetimes are different from the function’s current (possibly elided) lifetimes, we generate a corresponding annotation. If we cannot infer lifetimes for the function, we annotate all lifetimes on the function as unsafe. This is required to distinguish this case from the situation where we were able to infer lifetimes and those lifetimes are elided.

    TODO: Is there any alternative to marking the lifetimes unsafe? This isn't what we usually use unsafe lifetimes for, but I also don't really want to invent yet another syntax.

    Performing lifetime analysis on the template itself, rather than only on instantiations, serves two purposes: a) It documents the lifetimes in the code, and b) it saves us from having to analyze every instantiation in cases where the lifetimes don’t depend on template arguments.

  • Analysis of template instantiations. In the following situations, we infer lifetimes on a function template instantiation or member function of a class template instantiation that is called in the translation unit we are analyzing:

    • If the template itself contains reference-like types but does not provide lifetimes for these.
    • If the template arguments contain reference-like types.

    We use the inferred lifetimes when performing lifetime analysis on the callers of these functions, but we obviously cannot produce annotations for these inferred lifetimes.

As discussed in the lifetime annotation specification, any lifetimes in a template argument should be propagated to all uses of the argument. Clang does not provide a built-in mechanism for this, so this needs to be done in the lifetime analysis code.

Verifying lifetime correctness

TODO

Generating error messages

If we detect that there is a lifetime error – either because a function is returning a reference to a local or because there is a lifetime error inside the function – we want to produce an easily comprehensible error message that explains the error.

TODO: Explain how

Alternative considered

We previously considered an alternative approach that built a set of constraints between lifetimes involved in the function. Unfortunately, this approach produced wrong results on some fairly simple examples involving variable overwrites. A coworker identified a way to extend the approach in a way that overcame many of these limitations, but this extension introduced additional complexity. In the end, we decided that the approach based on points-to-sets was the simpler alternative.

Differences from Rust

The exclusivity rule

The borrow checker in Rust, in addition to checking lifetimes, also enforces the exclusivity rule: at any given time the program may have either one mutable reference or any number of immutable references to the same storage location.

The exclusivity rule protects against certain kinds of memory safety errors. For example, if it were applied to C++, it would catch the use-after-free here:

int test() {
  std::vector<int> xs;
  xs.push_back(10);
  const int &x0 = xs[0]; // `x0` borrows `xs` here.
  xs.push_back(20); // exclusivity error: `xs` is mutably borrowed here,
                    // overlapping with the `x0` borrow.
  return x0; // `xs` is borrowed by `x0` at least until here
             // because `x0` is used here.
}

Most C++ iterator invalidation bugs could be prevented by enforcing exclusivity: while there are outstanding iterators that borrow the container, the container can't be mutated.

From our experience porting woff2 from C++ to Rust, adjusting existing code to follow the exclusivity rule is one of the most difficult steps in porting. Therefore, it makes sense to separate rolling out lifetime checking from exclusivity checking. Lifetime checks without exclusivity checks don't guarantee memory safety, but they catch memory safety issues on their own, and should not require many adjustments to C++ code.

Exclusivity checking could be rolled out in an optional second step. This would not only provide additional memory safety to the C++ code but would facilitate a manual or automatic conversion of C++ code to Rust.

Spatial memory safety

Lifetime verification does not establish spatial memory safety, that is, it does not prove that all accesses are in bounds. Rust collections perform these checks at runtime.

Notes

1

The following documents provide more background material: doc 1, doc 2.

2

Note, however, that Clang is currently very conservative in assigning types to type-dependent expressions.

Struct Layout

C++ (in the Itanium ABI) extends the C layout rules, and so repr(C) isn't enough. This page documents the tweaks to Rust structs to give them the same layout as C++ structs.

In particular:

  • C++ classes and Rust structs must have the same alignment, so that references can be exchanged without violating the alignment rules. This is usually ensured by the regular #[repr(C)] layout algorithm, but sometimes the interop tool needs to generate explicit #[repr(align(n))] annotations.
  • C++ classes and Rust structs must have the same size, so that arrays of objects can be exchanged.
  • Public subobjects must have the same offsets in C++ and Rust versions of the structs.
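These three requirements can be written down as compile-time assertions. The sketch below assumes a hypothetical C++ struct `struct S { int16_t a; int32_t b; };`, whose Itanium layout has size 8, alignment 4, and `b` at offset 4; it needs Rust 1.77 or later for offset_of!. It only illustrates the kind of check involved and is not taken from Crubit's generated output.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Rust counterpart of the hypothetical C++ `struct S { int16_t a; int32_t b; };`.
#[repr(C)]
pub struct S {
    pub a: i16,
    pub b: i32,
}

// Each assertion fails the build if the Rust layout drifts from the C++ one.
const _: () = assert!(size_of::<S>() == 8); // same size: arrays can be exchanged
const _: () = assert!(align_of::<S>() == 4); // same alignment: references can be exchanged
const _: () = assert!(offset_of!(S, a) == 0); // same offsets for public fields
const _: () = assert!(offset_of!(S, b) == 4);
```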

Non-field data

Rust bindings introduce a __non_field_data: [MaybeUninit<u8>; N] field to cover data within the object that is not part of individual fields. This includes:

  • Base classes.
  • VTable pointers.
  • Empty struct padding.

Empty Structs

One notable special case of this is empty struct padding. In C++, an empty struct or class (e.g. struct Empty{};) has size 1, while the equivalent Rust struct has size 0. To make the layouts match, bindings for empty structs always ensure that the struct has a size of at least 1, via __non_field_data.

(In C++, different array elements are guaranteed to have different addresses, and also, arrays are guaranteed to be contiguous. Therefore, no object in C++ can have size 0. Rust, like C++, has only contiguous arrays, but unlike C++ Rust does not guarantee that distinct elements have distinct addresses.)
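A small sketch of the difference, using a hypothetical NaiveEmpty (a plain Rust empty struct) and an Empty that mimics the binding approach described above by carrying one byte of __non_field_data:

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// A plain Rust empty struct: a zero-sized type.
pub struct NaiveEmpty;

/// Binding-style version of C++ `struct Empty {};`: one byte of non-field
/// data gives it the size C++ expects.
#[repr(C)]
pub struct Empty {
    __non_field_data: [MaybeUninit<u8>; 1],
}

/// True if two adjacent elements of a `NaiveEmpty` array share an address,
/// which C++ never allows for distinct complete objects.
pub fn zst_elements_share_address() -> bool {
    let arr = [NaiveEmpty, NaiveEmpty];
    ptr::eq(&arr[0], &arr[1])
}
```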

Potentially-overlapping objects

In C++, in some circumstances, the requirement that objects do not overlap is relaxed: base classes and [[no_unique_address]] member variables can have subsequent objects live inside of their tail padding. The most famous instance of this is the empty base class optimization (EBCO): a base class with no data members is permitted to take up zero space inside of derived classes.

NOTE: This has other, non-layout consequences for Rust: for example, it is not safe to obtain two &mut references to overlapping objects, unless they are of size 0. (To prevent this, classes that might be base classes are always !Unpin.)

This is impossible to represent in a C-like struct. (Indeed, it was impossible to represent even in a C++-like struct before the introduction of [[no_unique_address]].) Therefore, in Rust, we don't even try: potentially-overlapping subobjects are replaced in the Rust layout by a [MaybeUninit<u8>; N] field, where N is large enough to ensure that the next subobject starts at the correct offset. The struct's alignment is still made to match the C++ alignment, but via #[repr(align(n))] on the struct instead of by aligning the field.

Example

For example, consider these two C++ classes:

#include <cstdint>

// This is a class, instead of a struct, to ensure that it is not POD for the
// purpose of layout. (The Itanium ABI disables the overlapping subobject
// optimization for POD types.)
class A {
  int16_t x_;
  int8_t y_;
};

struct B final : A {
  int8_t z;
};

In memory, this may be laid out like so:

| x_ | x_ | y_ | z |
 <------------> <->
  A subobject  | B
<------------------>
  sizeof(A)
  (also sizeof(B))

The correct representation for B, in Rust, is something like this:

use std::mem::MaybeUninit;

#[repr(C)]
#[repr(align(2))] // match the alignment of A's int16_t field.
struct B {
  // We don't use a field of type `A`, because it would have a size of 4,
  // and Rust wouldn't permit `z` to live inside of it.
  // Nor do we align the array, for the same reason: correct alignment must be
  // achieved via the repr(align(2)) at the top.
  __non_field_data: [MaybeUninit<u8>; 3],
  pub z: i8,
}
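The layout in the diagram can be checked mechanically. This is a compilable copy of the struct above (same fields and attributes) followed by assertions on its size, alignment, and the offset of z; it assumes Rust 1.77 or later for offset_of!.

```rust
use std::mem::{align_of, offset_of, size_of, MaybeUninit};

#[repr(C)]
#[repr(align(2))] // match the alignment of A's int16_t field.
pub struct B {
    // Covers the A base-class subobject (bytes 0..3), but not its tail padding.
    __non_field_data: [MaybeUninit<u8>; 3],
    pub z: i8,
}

// `z` lives at offset 3, inside what is A's tail padding in C++.
const _: () = assert!(offset_of!(B, z) == 3);
// As in C++, sizeof(B) == sizeof(A) == 4 and alignof(B) == 2.
const _: () = assert!(size_of::<B>() == 4);
const _: () = assert!(align_of::<B>() == 2);
```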

Thunks for class template member functions

Problem definition

Given the C++ header below...

#pragma clang lifetime_elision

template <typename T>
class MyTemplate {
 public:
  MyTemplate(T value) : value_(value) {}
  const T& GetValue() const;
 private:
  T value_;
};

using MyIntTemplate = MyTemplate<int>;

... Crubit will generate Rust bindings that can call into the MyTemplate<int>::GetValue() member function. To support such calls, Crubit has to generate a C++ thunk (to instantiate the class template and to provide a symbol for a C-ABI-compatible function that Rust can call into):

extern "C"  // <- C ABI
int const& __rust_thunk___ZNK10MyTemplateIiE8GetValueEv(
    const class MyTemplate<int>* __this) {
  return __this->GetValue();
}

There are other (non-template-related) scenarios that require generating thunks (e.g. inline functions, or functions that use a custom calling convention), but templates bring one extra requirement: a class template can be defined in one header (say my_template.h) and used in multiple other headers (e.g. library_foo/template_user1.h and library_bar/template_user2.h). Because of this, the same thunk might need to be present in multiple generated ..._rs_api_impl.cc files (e.g. in library_foo_rs_api_impl.cc and library_bar_rs_api_impl.cc). This may lead to duplicate symbol errors from the linker:

ld: error: duplicate symbol: __rust_thunk___ZNK10MyTemplateIiE8GetValueEv

Implemented solution: Encoding target name in the thunk name

One solution is to give each of the generated thunks a unique, target/library-specific name, e.g.: __rust_thunk___ZNK10MyTemplateIiE8GetValueEv__library_foo (note the library_foo suffix).

Pros:

  • Minimal extra code complexity (e.g. no need for template-specific code in the thunk-related code in src_code_gen.rs).
  • Obviously correct behavior-wise (e.g. since it is just like other thunks which we assume are implemented correctly).

Cons:

  • Performance guarantees are unclear. Binary size depends on link time optimization (LTO) recognizing that all the thunks are identical and deduplicating them.

    • This seems to work in practice (at least for production binaries).
    • Future work: add tests, and consider asking LLVM to provide LTO guarantees.
  • Requires escaping Bazel target names into valid C identifiers. See ConvertToCcIdentifier(const BazelLabel&) in bazel_types.cc.
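To illustrate what such escaping has to achieve (a valid C identifier, and no collisions between distinct labels), here is a hypothetical escaping scheme. It is not Crubit's ConvertToCcIdentifier, whose exact output may differ (for example, the library_foo suffix shown above keeps its underscore, while this sketch escapes it). The helper names escape_label and target_specific_thunk_name are invented for this example.

```rust
/// Hypothetical escaping of a Bazel label into a C identifier fragment.
/// ASCII letters and digits are kept; every other byte (including `_`)
/// becomes `_` followed by two lowercase hex digits. Because `_` only ever
/// appears as an escape marker, distinct labels never produce the same output.
pub fn escape_label(label: &str) -> String {
    let mut out = String::with_capacity(label.len() * 3);
    for byte in label.bytes() {
        if byte.is_ascii_alphanumeric() {
            out.push(byte as char);
        } else {
            out.push_str(&format!("_{:02x}", byte));
        }
    }
    out
}

/// Appends the escaped target label to a thunk name, making the thunk
/// symbol unique per library.
pub fn target_specific_thunk_name(thunk: &str, label: &str) -> String {
    format!("{}__{}", thunk, escape_label(label))
}
```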

Alternative solutions

Function template

An alternative solution would be to use a function template that we immediately explicitly instantiate. These still generate the code we need, but their duplicated symbol definitions (across multiple binding crates) won't cause an ODR violation. It is expected that a single function template is instantiated multiple times in multiple translation units, therefore the linker silently merges these equivalent definitions.

Example:

// Thunk is expressed as a function template:
template <typename = void>
__attribute__((__always_inline__)) int const&
__rust_thunk___ZNK10MyTemplateIiE8GetValueEv(
    const class MyTemplate<int>* __this) {
  return __this->GetValue();
}

// Explicit instantiation of the function template:
// (to generate a symbol that `..._rs_api.rs` can call into)
template int const& __rust_thunk___ZNK10MyTemplateIiE8GetValueEv(
    const class MyTemplate<int>* __this);

Pros:

  • Naturally deduplicated (just depending on what C++ already does for function templates).

Cons:

  • Assumes a particular ABI: a function template specialization uses the calling convention prescribed by the platform C++ ABI. We know that the Itanium ABI maps C++ signatures to the C ABI and therefore will be compatible with the calling convention expected by the generated ..._rs_api.rs. Further research is needed to investigate the guarantees offered by other platforms (e.g., the MSVC ABI).
  • Requires extra complexity to calculate the mangled name of the function template specialization.
    • Crubit doesn’t have a clang::FunctionDecl corresponding to the function-template-based thunk, and therefore Crubit can’t use clang::MangleContext::mangleName to calculate the linkable/mangled name of the thunk.
    • Reimplementing clang::MangleContext::mangleName in Crubit seems fragile. One risk is bugs in Crubit's code that would make it behave differently from Clang (e.g. code review of the initial prototype identified that mangling compression was missing). Another risk is having to implement not just ItaniumMangleContext, but also MicrosoftMangleContext.
    • One idea to avoid reimplementing mangling is to explicitly specify the name for the function template instantiation using __asm__("abc") (sadly, this doesn't seem to work; it may be a Clang bug).

An abandoned prototype of this approach can be found in a (Google-internal) cl/450495903.

Explicit linkonce_odr attribute

Example:

extern "C"
int const& __rust_thunk___ZNK10MyTemplateIiE8GetValueEv(
    const class MyTemplate<int>* __this)
    __attribute__((linkonce_odr))  // <- THIS IS THE PROPOSAL
{
  return __this->GetValue();
}

Pros:

  • All the "pros" of the "Encoding target name in the thunk name" approach (simplicity + correctness of behavior)
  • All the "pros" of the "Function template" approach (deduplication)

Cons:

  • Requires changing Clang to support the new attribute (e.g. requires convincing the Clang community that this is a language extension that is worth supporting). TODO(b/234889162): Send out a short RFC to gauge interest?

Rejected solutions

  • selectany doesn't work with functions, only with global data. Furthermore, we need something that maps to linkonce_odr, and selectany maps only to linkonce.

  • __attribute__((weak)) has the disadvantage that a weak definition can be overridden by a strong one. This rule makes weak definitions non-inlineable except in full-program LTO. C++ function templates instead follow the One Definition Rule (ODR), which says that all definitions must be equivalent, and this makes them inlineable.

Unpin for C++ Types

SUMMARY: A C++ type is Unpin if it is Rust-movable (e.g., a trivial type, or a nontrivial type which is [[clang::trivial_abi]]). Any such type can be used by value or by plain reference/pointer in interop; all non-Unpin types must instead be used behind pinned pointers and references.

A C++ type T is Unpin if it is known to be a Rust-movable type (move+destroy is logically equivalent to memcpy+release).

Unpin C++ types can be used like any other normal Rust type: they are always safe to access by reference or by value. Non-Unpin types, in contrast, can only be accessed behind pins such as Pin<&mut T> or Pin<Box<T>>, because it may not be safe to mutate them directly (for example, with std::mem::swap, which moves them). These types are never used directly by value in Rust, because value-like assignment has incorrect semantics: it fails to run C++ special members for non-Rust-movable types.
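A minimal sketch of the two access patterns, using a hypothetical stand-in type NonMovable (made !Unpin with PhantomPinned) in place of a non-Rust-movable C++ class. This shows the general Pin idiom and is not the exact shape of Crubit's generated bindings.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical stand-in for a non-Rust-movable C++ class.
/// `PhantomPinned` makes the type `!Unpin`.
pub struct NonMovable {
    count: i32,
    _pin: PhantomPinned,
}

impl NonMovable {
    /// Constructs the object directly in pinned heap storage.
    pub fn new_pinned() -> Pin<Box<NonMovable>> {
        Box::pin(NonMovable { count: 0, _pin: PhantomPinned })
    }

    /// Mutation goes through `Pin<&mut Self>`; safe code never obtains a
    /// plain `&mut Self`, so it cannot move the object (e.g. via `mem::swap`).
    pub fn increment(self: Pin<&mut Self>) {
        // SAFETY: the field is updated in place; `*self` is never moved.
        let this = unsafe { self.get_unchecked_mut() };
        this.count += 1;
    }

    pub fn count(&self) -> i32 {
        self.count
    }
}

/// An `Unpin` type, by contrast, can be pinned and unpinned in safe code.
/// (`Pin::new` would not compile for `NonMovable`.)
pub fn bump_unpin(x: &mut i32) {
    let pinned: Pin<&mut i32> = Pin::new(x);
    *pinned.get_mut() += 1;
}
```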

Note that not every object with an Unpin type is actually safe to hold in a mutable reference. Objects with live aliases still must not be used with &mut, and "potentially overlapping objects" can produce unexpected behavior in Rust. (See Reference Safety.)

Rust-movable types

In C++, moving a value between locations in memory involves executing code to either initialize (move-construct) or overwrite (move-assign) the new location. The old location still exists, but is in a moved-from state, and must still be destroyed to release resources.

(For example, std::string x = std::move(y); will run the move constructor, so that x contains the same value that y used to have before the move. The variable y will still be a valid string, but might be empty, or might contain some garbage value. The destructors for both x and y will run when they go out of scope.)

Rust does not have move constructors or move assignment. In fact, there is no way to customize what happens during moving or assignment: in Rust, moving or swapping an object means changing its location in memory, as if by memcpy without running the destructor logic in the old location. Another way of looking at it is that it's as if an object moved around in memory over time: it is constructed in one place, and then further operations and eventual destruction might happen in other places. This is a Rust move.
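The following sketch makes the Rust move model observable with a hypothetical Tracked type that counts destructor calls: the object is moved through a variable, a function, and a Vec, yet its destructor runs exactly once, at its final location.

```rust
use std::cell::Cell;

/// Counts how many times its destructor runs.
pub struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn pass_through(t: Tracked<'_>) -> Tracked<'_> {
    t // Moved out again; no destructor runs here.
}

/// Moves one object through several locations and returns how many times
/// its destructor ran in total.
pub fn drops_after_moves() -> u32 {
    let drops = Cell::new(0);
    {
        let a = Tracked { drops: &drops };
        let b = a; // Rust move: `a` is now unusable, and no destructor runs for it.
        let c = pass_through(b); // Moved into and out of a function.
        let v = vec![c]; // Moved into heap storage owned by the Vec.
        assert_eq!(drops.get(), 0); // Nothing has been destroyed yet.
        drop(v); // The single object is destroyed exactly once, here.
    }
    drops.get()
}
```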

Despite C++ moves using explicit construction and destruction calls, many C++ types could also have used the Rust movement model. We call such types Rust-movable types.

For example, a C++ std::unique_ptr, implemented in the obvious way, is Rust-movable: its actual location in memory does not matter. In contrast, a self-referential type is not Rust-movable, because to move it, you must also update the pointer it has to itself. This is done inside the move constructor in C++, but cannot be done in the Rust model, where the move operation is not customizable.
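For illustration, here is a hypothetical self-referential Rust type. A bitwise move would copy self_ptr unchanged and leave it pointing at the old location, so the type opts out of Unpin and is only ever handled behind a pin, the same treatment non-Rust-movable C++ types receive.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr;

/// A self-referential type: `self_ptr` points at this object's own `value`.
pub struct SelfRef {
    value: i32,
    self_ptr: *const i32,
    _pin: PhantomPinned,
}

impl SelfRef {
    pub fn new(value: i32) -> Pin<Box<SelfRef>> {
        let mut boxed = Box::pin(SelfRef {
            value,
            self_ptr: ptr::null(),
            _pin: PhantomPinned,
        });
        // SAFETY: we only write a field of the pinned object; it is never moved.
        let this = unsafe { boxed.as_mut().get_unchecked_mut() };
        this.self_ptr = ptr::addr_of!(this.value);
        boxed
    }

    /// True while `self_ptr` still points at this object's `value`.
    pub fn points_to_itself(self: Pin<&Self>) -> bool {
        ptr::eq(self.self_ptr, &self.value)
    }

    pub fn read_via_self_ptr(self: Pin<&Self>) -> i32 {
        // SAFETY: the object is pinned, so `self_ptr` still points at `value`.
        unsafe { *self.self_ptr }
    }
}
```

Moving the Pin<Box<SelfRef>> moves only the box pointer; the pinned object itself stays where it was constructed.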

Which types are Rust-movable?

For the purpose of Rust/C++ interop, we define a type to be Rust-movable if, and only if, it is "trivial for calls" in Clang. That is, either:

  1. It is actually trivial, or
  2. It uses [[clang::trivial_abi]] to make itself trivial for calls

This definition is conservative: some types that could be considered Rust-movable are not trivial for calls. (For example, std::unique_ptr uses [[clang::trivial_abi]] only in the unstable libc++ ABI; the stable libc++ ABI predates this attribute, and adding it now is ABI-breaking.)

This definition is, however, sound: all types which are trivial for calls are Rust-movable, because a type which is trivial for calls is Rust-moved when passed by value as a function argument.

Expanding Rust-movability

C++26 introduces a concept called "trivial relocation" and "trivially relocatable types". These are types which have an alternate relocation operation that does not throw exceptions or run the move constructor or destructor. Ideally, a type would be Rust-movable if and only if it is trivially relocatable, replaceable, and trivial relocation is tantamount to a memcpy. (For example, perhaps T is Rust-movable if and only if any union containing T is trivially relocatable.)

TODO: This is a work in progress.

Reference Safety

Not every object with an Unpin type can actually safely be pointed to by a Rust reference.

Conventional aliasing

If a C++ reference is mutably aliased, it is unsafe to pass it to Rust as a Rust reference. Never create a Rust &mut reference that aliases another live reference: the behavior of doing so is undefined.

For example:

pub fn foo(_: &mut i32, _: &mut i32) {}

In C++, calling foo(x, x) is Undefined Behavior: it creates two aliasing &mut references to x.
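Safe Rust cannot express the aliasing call at all. The sketch below gives foo a hypothetical body (swapping the two values) and shows the standard way to obtain two &mut references into one slice: split_at_mut, which returns provably disjoint halves.

```rust
/// Hypothetical version of `foo` with a body: swaps the two values.
pub fn foo(a: &mut i32, b: &mut i32) {
    std::mem::swap(a, b);
}

/// Swaps the first two elements of `xs` (which must have at least two).
/// `foo(&mut xs[0], &mut xs[0])` is rejected by the borrow checker
/// (error E0499), and so is `foo(&mut xs[0], &mut xs[1])`, because it cannot
/// prove the indices differ. `split_at_mut` yields two disjoint `&mut` halves.
pub fn swap_first_two(xs: &mut [i32]) {
    let (left, right) = xs.split_at_mut(1);
    foo(&mut left[0], &mut right[0]);
}
```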

Tail padding

In C++, tail padding is not part of the object, and the space in the tail padding can be taken up by other unrelated objects. Avoid creating a Rust reference to a base class, or to a [[no_unique_address]] field, as these are "potentially overlapping". This can cause surprising behavior, or unintended aliasing and undefined behavior.

Consider the following struct:

struct A {};
struct B {
    [[no_unique_address]] A field_1_;
    char field_2_;

    A& field_1() { return field_1_; }
    char& field_2() { return field_2_; }
};

Here, while sizeof(A) is 1, A has no data, only tail padding: a C++ assignment to field_1_ will not write anything, so C++ can store an unrelated object inside that tail padding. [[no_unique_address]] marks the tail padding as available for reuse. field_2_ may actually be stored inside the tail padding of field_1_, and sizeof(B) may also be 1.

(Base classes also allow their tail padding to be reused, and the same example works with struct B : A.)

// On Itanium-ABI platforms (e.g. Clang on Linux), both of these hold:
static_assert(sizeof(A) == sizeof(B));
static_assert(offsetof(B, field_1_) == offsetof(B, field_2_));

Rust does not work this way. In Rust, tail padding is part of the object. Rust references refer to the full span of the pointed-to object, including that tail padding. And so a Rust reference to field_1_ would encompass field_2_ by accident.

This means that the following code has undefined behavior via conventional aliasing, despite looking fairly innocent:

B b = ...;
// Rust: pub fn foo(_: &mut A, _: &mut u8)
foo(b.field_1(), b.field_2()); // C++

And the following Rust code would perform unintended mutations to field_2_:

let mut b1: B = ...;
let mut b2: B = ...;
// This actually swaps field_2_ too!
std::mem::swap(b1.field_1(), b2.field_1());
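The Rust side of this mismatch can be demonstrated directly. In this sketch, Inner has the same data as the C++ class A from the Struct Layout section (three bytes of fields plus one byte of tail padding); the names Inner, Outer, and swap_inners are hypothetical. Rust places the following field after the tail padding, and std::mem::swap exchanges all size_of::<Inner>() bytes, padding included. That is sound in pure Rust only because nothing else can live in that padding.

```rust
use std::mem::{offset_of, size_of};

/// Same data as the C++ class `A` from the Struct Layout section:
/// 3 bytes of fields followed by 1 byte of tail padding.
#[repr(C)]
pub struct Inner {
    pub x: i16,
    pub y: i8,
}

/// Rust never places `z` inside `inner`'s tail padding.
#[repr(C)]
pub struct Outer {
    pub inner: Inner,
    pub z: i8,
}

/// Swaps only the `inner` fields; `z` is unaffected because it does not
/// overlap the bytes covered by `&mut Inner`.
pub fn swap_inners(a: &mut Outer, b: &mut Outer) {
    std::mem::swap(&mut a.inner, &mut b.inner);
}
```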

C++20

In C++17 and earlier, there was only one way to create a potentially-overlapping object: inheritance ("EBO"). Making inheritable types non-Unpin could have removed or mitigated the risk of overlapping objects in C++17 and below.

However, as of C++20, objects of any type can overlap others through tail padding: C++20 introduced [[no_unique_address]], which makes tail padding available for reuse for any type. Since [[no_unique_address]] may be used fairly extensively in library code (it has no negative effects in C++), we can't assume that it is absent.

In modern C++, final types are not much safer than other types. One must be careful when creating Rust references, to ensure that those Rust references do not contain data in their tail padding, or otherwise alias, and there is no way to guarantee this at the type level.