Macros, Safety, and SOA
About a year ago I wrote soa-rs
, a structure-of-arrays (SOA) crate for Rust that makes extensive use of procedural macros and unsafe
. In this post, I want to reflect on my experience with those tools. Much that I have to say mirrors Chad Austin's recent Unsafe Rust Is Harder Than C, which perfectly articulates the unsafe
experience:
The result of my pain is a safe, efficient API.
Background
SOA is a way of organizing linear data for efficient access on modern computers. Consider this type:
struct Foo(u8, u64);
let foos = vec![Foo(0, 1), Foo(2, 3)];
Because of memory alignment requirements, each Foo
takes 128 bits of space, leaving 56 bits to waste. Further, suppose you iterate over many elements but only access the u8
field. Since RAM serves data by the cache line, 15 of every 16 bytes transferred is thrown out, forcing the CPU to wait for memory.
SOA addresses this by storing each struct field as a separate array. That way, elements are tightly packed. No space is wasted to padding, and during iteration, every cache line byte goes to use. Here's what it looks like using soa-rs
:
use soa_rs::{Soa, Soars, soa};
#[derive(Soars)]
struct Foo {
bar: u8,
baz: u64,
}
fn main() {
let soa = soa![
Foo { bar: 1, baz: 2 },
Foo { bar: 3, baz: 4 },
];
for mut foo in &mut soa {
*foo.bar *= 2;
}
assert_eq!(foo.bar(), [2, 6]);
}
Design
Several SOA crates have existed since before soa-rs
. These crates generate a unique container type for each SOA struct. Generated code is tedious to develop and maintain, and each derive
produces duplicate code, slowing down compilation. Instead, soa-rs
generates only essential, low-level routines, moving most of the implementation to a generic container type, Soa
.
The most popular SOA crate creates a Vec
for each field, multiplying the overhead of allocation and capacity tracking by the number of fields. Soa
minimizes overhead by using one allocation for the collection. soa-rs
spends most of its unsafe
in generated code that manages pointers into this allocation for each field.
Comparison to Zig
Zig is the poster child of this data structure. Here's the example from above in Zig:
const std = @import("std");
const assert = std.debug.assert;
const Foo = struct {
bar: u8,
baz: u32,
};
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const alloc = gpa.allocator();
var soa = std.MultiArrayList(Foo){};
defer soa.deinit(alloc);
try soa.append(alloc, .{ .bar = 1, .baz = 2 });
try soa.append(alloc, .{ .bar = 3, .baz = 4 });
for (soa.items(.bar)) |*bar| {
bar.* *= 2;
}
assert(std.mem.eql(u8, soa.items(.bar), &[_]u8{ 2, 6 }));
}
Unlike Rust, Zig doesn't need a macro to do this, instead using its compile-time reflection system. MultiArrayList
works with all structs, not just specially marked ones, so you can even use it with types from other libraries. Not only that, MultiArrayList
also supports enums, storing small enum variants in separate arrays from large ones. It does all this in just 481 lines, compared to 3498 in soa-rs
. It's superior by just about any standard of comparison. I have at times looked jealously at the ergonomics of raw pointers by default and the insane leverage of comptime. Hopefully one day we get reflection.
Macros
Procedural macros are tedious to write. Even with the aid of rust-analyzer
and cargo-expand
, it's difficult to diagnose generated code that fails to compile. Crucially, while the compiler will report a missing comma or lifetime violation, it won't point out the offending code. We love Rust for its error messages being helpful, friendly, and specific. In contrast, macro errors are vague and frustrating, forcing you to wade through the generated code for syntactic and semantic mistakes.
Worse still is code that compiles but runs incorrectly. Ordinarily you might reach for a debugger to step through the code, but macro code offers no such affordance. Unless you can step through assembly and somehow deduce the macro code it comes from, println!
is your only option.
Complex procedural macro authorship would benefit massively from better compiler diagnostics and macro source maps. A line of generated code represents easily an order of magnitude greater maintenance burden than as much regular code, but tooling could go a long way to bringing them in line.
Safety
Before writing soa-rs
, I believed that unsafe
is difficult largely for the same reasons as C code. malloc
, free
, manual lifetime management, and so on. That is false. Unsafe Rust is much harder, peppered as it is with myriad requirements and pitfalls. These requirements are scattered between std
documentation, the Rustonomicon, and tribal knowledge.
In a matter of hours from publishing soa-rs
, Steffahn appeared in my GitHub issues with two soundness bugs, each requiring significant API redesign. (I've since learned this is a shared experience for unsafe authors.) Even with Miri checking over my work and combing through the code for any issues I knew to look for, I still had major blind spots that an experienced programmer could easily unearth. The biggest problem with unsafe is the unknown unknowns — the fearsome gremlin of undefined behavior dwells in a hundred dark corners you mightn't know to check.
In a lesser language I'm always a little on edge, never quite assured that I've covered all my tracks. In Rust, the type system provides the structure I need to code with confidence. Unsafe Rust is quite the opposite experience. You have to be on your guard at every step. In exchange, you get to share a zero-cost interface that cannot be misused. It's difficult, but no other language rewards your efforts quite the same.
Beware swap
The original SOA design looked like this:
pub struct Slice<T: Soars> {
len: usize,
raw: T::Raw,
}
pub struct Soa<T: Soars> {
cap: usize,
slice: Slice<T>,
}
impl<T> Deref for Soa<T: Soars> {
type Target = Slice<T>;
fn deref(&self) -> &Self::Target {
&self.slice
}
}
In this setup, most of the implementation work goes into Slice
. Soa
adds allocating methods like push
and pop
, but also gets all the methods from Slice
for free via Deref
and DerefMut
implementations. This is the same way that Vec
adopts slice methods. However, this runs into an issue:
let a = soa![Tuple(0)];
let b = soa![];
std::mem::swap(a.as_mut_slice(), b.as_mut_slice());
a.push(Tuple(0)); // segfault!
By exposing a reference to the struct field, it's now possible to exchange the contents of two containers without updating the capacity to match. In this example, a
thinks it has capacity to push an element, but contains a slice with no allocated capacity, causing the segfault. Vec
doesn't have this issue because it dereferences to the unsized type [T]
, which can't be passed to mem::swap
. We can make Slice
unsized like so:
pub struct Soa<T: Soars> {
cap: usize,
len: usize,
slice: Slice<T, ()>,
}
pub struct Slice<T: Soars, D: ?Sized = [()]> {
raw: T::Raw,
dst: D,
}
The length field has moved to Soa
, while Slice
sports a new generic, D
. When D
uses the default generic parameter, [()]
, the struct becomes unsized. That way, when we hand users an &mut Slice<T>
, they won't be able to mem::swap
it. It also makes the struct dynamically sized, meaning that references to it have the same metadata as [()]
itself. That allows us to store the slice length as part of the reference metadata. This isn't really necessary, but it's kind of neat that it works.
Since dynamically sized types can't be stored in a sized type, we can also set D
to ()
when storing it in the Soa
. This is also handy for SliceRef
, an owned type that acts like &Slice
.
pub struct SliceRef<'a, T: 'a + Soars> {
len: usize,
slice: Slice<T, ()>,
marker: PhantomData<&'a T>,
}
SliceRef
gives us the semantics of &Slice
in a type with owned fields by storing a phantom reference, which tells the type system that we're borrowing data without our actually doing so. Whereas &Slice
is necessary for implementing Deref
, SliceRef
is necessary to support slicing (as in &foo[1..5]
). Usually you can't return a reference unless it's tied to something that outlives the function, which for Soa
means referencing a struct field. Native slices are special; you can create one with slice::from_raw_parts
and immediately return it from a function. Whereas &Slice
points to slice information that has to live somewhere, &[T]
is itself the slice information. SliceRef
works similarly.
Beware inner mutability
Suppose you have this struct:
struct Struct(u8, u8);
Since SOA stores the fields separately, a reference to a slice element looks like this:
struct StructRef<'a>(&'a u8, &'a u8);
If Struct
implements a trait, we'd like for StructRef
to also implement the trait. To solve this issue I concocted with_ref
, which takes a function and calls it with a Struct
reference. The macro implements this by bitwise copying each field from StructRef
to recreate the Struct
on the stack, then calling the provided function with a reference to that Struct
.
pub trait WithRef {
type T;
fn with_ref<F, R>(&self, f: F) -> R
where
F: FnOnce(&Self::T) -> R;
}
impl WithRef for StructRef {
type T = Struct;
fn with_ref<F, R>(&self, f: F) -> R
where
F: FnOnce(&Self::T) -> R
{
use std::ptr::{read, from_ref};
let t = Struct(
unsafe { read(from_ref(self.0)) },
unsafe { read(from_ref(self.1)) },
);
f(&t);
}
}
This way, you could make a SOA slice hashable using T
's implementation of Hash
, for example.
impl<T: Soars + Hash> Hash for Slice<T> {
fn hash<H: Hasher>(&self, state: &mut H) {
self.len().hash(state);
for el in self.iter() {
el.with_ref(|el| el.hash(state))
}
}
}
This all seemed fine to me, and Miri had no complaints either. Since WithRef
only calls f
with an non-mut
reference to the stack, you couldn't mutate anything, so there's no worry you'll affect the Soa
contents. Right?
Right?
Well, I forgot to consider inner mutability. Here's how to expose the issue:
#[derive(Soars)]
struct Foo(RefCell<String>);
let soa = soa![Foo(RefCell::new(String::default()))];
// Takes ownership of the Foo that with_ref created on the stack,
// but soa also still owns that memory.
let r = soa.idx(0).with_ref(|x| mem::take(x.0.borrow_mut()));
// Double free when r and soa are dropped.
Even if it's normally safe to think of a shared reference as an immutable reference, they aren't always the same. This is the kind of thing that makes unsafe so hard to get right. You have to consider how your code interacts with all of Rust's myriad constructs, and it's easy to overlook some particular case where an errant bit of safe code comes along to ruin your day.
Sadly, this kills the ability to implement a bunch of traits for SOA slices without some additional effort by the user, since we can't leverage the existing implementations on T
for equality, ordering, equality, debug printing, or cloning. To help the situation, you can use #[soa_derive(Clone, Debug, ...)]
to add derive implementations for SOA types, but it's not quite the same. If your implementations are more complex than simple derive
s, you'll have you maintain a copy of each for Soars::Ref
and Soars::RefMut
. I regret having to sacrifice API design to satisfy something of a corner case usage.
Paper cuts
Const is limited
Having finished analogs for Vec
, &[T]
, and &mut [T]
, I really wanted to round out the set with [T; N]
and make compile-time SOA arrays that don't require allocation. This was hard enough that I gave up a few times, but being preternaturally stubborn, I eventually got this working:
#[derive(Soars)]
#[soa_array]
struct Foo(u64, u8);
const FOOS: FooArray<2> = FooArray::from_array([
Foo(1, 2),
Foo(3, 4),
]);
assert_eq!(FOOS.0, [1, 3]);
assert_eq!(FOOS.1, [2, 4]);
However niche the use case and however baroque the code to make this work, it's neat that this kind of thing is even possible. For the adventurous, this is roughly what the code looks like:
const fn from_array<T, const N: usize>(array: [T; N]) -> Self {
// We're taking ownership of the array contents,
// but Rust can't tell because it's pointer stuff.
let array = ManuallyDrop::new(array);
// Get a reference to the array.
let array = unsafe {
&*ptr::from_ref(&array).cast::<[T; N]>()
};
Self {
foo: {
let mut field = [const { MaybeUninit::uninit() }; N];
let mut i = 0;
while i < N {
let src = ptr::from_ref(&array[i].bar);
let read = unsafe { src.read() };
field[i] = MaybeUninit::new(read);
i += 1;
}
unsafe { transmute_copy(&field) }
},
// snip
}
}
Being able to call trait methods in const
contexts would be really useful. I can understand why this is a difficult problem, but it does hamper ergonomics considerably. For example, you can't use indexing or for
loops because they rely on the Index
and Iterator
traits to work. I had to use some particularly weird casts because Deref
isn't available.
Unstable
Occasionally, there are features and optimizations available to std
implementers that aren't for library authors. For example, I would like to implement try_fold
for my iterator type, but from what I can tell, this simply isn't possible. One of the function generics is R: Try<Output = B>
, and since Try
is unstable, you can't name the type when you go to write the function. It frustrating sometimes to see that a type is right there and in use but out of reach. All the NonNull
stabilizations in 1.80 were exciting to see, having missed them during development. I delight in finding refactor opportunities when new functions are stabilized. Here's hoping for [MaybeUninit<T>; N]::transpose
to replace transmute_copy
in my from_array
function.
Index trait
It's a bit unfortunate that certain traits like Index
return &Self::Output
instead of just Self::Output
. I can't implement this type for Slice
because I need to return SliceRef
instead of a reference. I assume this constraint is because we didn't have GATs when these traits were being developed. Today we could write
trait Index {
type Output<'a>: ?Sized where Self: 'a;
// instead of
type Output: ?Sized;
}
impl<T> Index for Vec<T> {
type Output<'a> = &'a [T] where Self: 'a;
// instead of
type Output = [T];
}
Because of this, users have to write soa.idx(i)
instead of soa[i]
. Not the end of the world, but more flexibility would be nice in a few places.
Special cases
Tuple structs, unit structs, and named field structs each have slightly different syntax that require special handling in macro code. For example, here's how you create a struct definition.
match kind {
FieldKind::Named => quote! {
#ident { #(#field: #ty),* }
},
FieldKind::Unnamed => quote! {
#ident ( #(#ty),* );
},
FieldKind::None => quote! {
#ident;
}
}
This is inconvenient given how many places these differences need to be accounted for. However, tuple structs have this nicety:
struct Tuple(u8, u8);
let tuple = Tuple { 0: 42, 1: 7 };
You can treat tuple structs like named field structs for the purpose of initialization, so you don't need to handle them differently. I wish this feature were available in more areas.
Crate setup
Generated code has no knowledge of its context. It doesn't know what crate it's in, what symbols are imported, or whether you use std as cursed;
. Therefore, you typically refer to symbols by their absolute path, such as ::std::vec::Vec
. There is just one wrinkle: you cannot refer to the current crate this way. That is, you can't say ::my_crate::symbol
from within my_crate
. In macros by example you can use $crate
, in procedural macros you cannot. Because of that, you are unable to invoke your macro in my_crate
if the macro itself uses ::my_crate::symbol
. That means you need a separate crate just for tests, on top of the ones for your macro and your library. This isn't the biggest deal, but it's one of those annoying administrative things that's foisted upon anyone learning proc macros for the first time.
Tricks
Testing compile failures
Sometimes you want to test that a piece of code doesn't compile. I do this to ensure that certain types aren't Send
or Sync
, for example. The only way to do that is by writing a doctest with the compile_fail
annotation:
/// ```compile_fail
/// let number: u32 = "string";
/// ```
mod compile_fail_tests {}
This tests that compilation failed, but not why it failed. I've made the mistake of writing one of these to test one thing, but compilation actually failed because I forgot a semicolon. To make sure you're testing the right thing, compile the failing code and make note of the error code:
error[E0308]: mismatched types
Then, modify the doctest to specify the error code you expect.
/// ```compile_fail,E0308
Testing readme code
If you have a readme with Rust code that you want to ensure compiles, you can do this:
#[doc = include_str!("../README.md")]
mod readme_tests {}
This is equivalent to including the readme as a ///
documentation comment, so Rust will include it with the rest of your doctests.
Conclusion
However much I have to critique, unsafe Rust is uniquely rewarding. Nowhere else can I wrap a bundle of hairy, low-level, error-prone code behind a zero-cost interface that's impossible to misuse. Navigating the difficulties of unsafe
makes me appreciate the safe subset of Rust all the more. Safe Rust is more enjoyable and productive for me than any other language, and I hope there's a future where I can say the same for its unsafe counterpart. Happy programming 🦀