https://github.com/reinerp/varlen-rs
Ergonomic variable-length types in Rust
https://github.com/reinerp/varlen-rs
cache-efficiency layout memory optimization rust
Last synced: 4 months ago
JSON representation
Ergonomic variable-length types in Rust
- Host: GitHub
- URL: https://github.com/reinerp/varlen-rs
- Owner: reinerp
- License: bsd-3-clause
- Created: 2022-01-12T19:44:55.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-03-31T05:15:00.000Z (almost 4 years ago)
- Last Synced: 2025-09-27T22:53:14.134Z (5 months ago)
- Topics: cache-efficiency, layout, memory, optimization, rust
- Language: Rust
- Homepage:
- Size: 203 KB
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# `varlen`
Ergonomic variable-length types.
1. [Summary](#summary)
1. [Motivation](#motivation)
1. [Examples](#examples)
1. [Overview of types](#overview-of-types)
1. [Use of Pin](#use-of-pin)
1. [Feature flags](#feature-flags)
# Summary
`varlen` defines foundational types and traits for working with variable-length types in Rust.
The main example of variable-length type is a struct that stores a dynamically-sized array
directly in its storage, without requiring a pointer to a separate memory allocation. `varlen`
helps you define such types, and lets you build arbitrary concatenations and structs of them.
Additionally, it provides equivalents of the standard library's `Box` and `Vec` types
that are adapted to work well with variable-length types.
If you want to reduce the number of pointer indirections in your types by storing
variable-sized arrays directly in your objects, then `varlen` is the library for you.
# Motivation
Traditionally when we use variable-sized data such as strings or arrays, we use a pointer
to a separately allocated object. For example, the following object ...
```rust
type Person = (/* age */ usize, /* name */ Box, /* email */ Box);
let person: Person =
(16, Box::from("Harry Potter"), Box::from("harry.potter@example.com"));
```
... is represented in memory like this, with three separately allocated objects:
```svgbob
"Person"
+------------+--------------------+-------------------+--------------------+-------------------+
| "16 (age)" | "0x1234 (str ptr)" | "12 (str length)" | "0x5678 (str ptr)" | "24 (str length)" |
+------------+--------------------+-------------------+--------------------+-------------------+
| |
| |
.-----------' .-------------'
| |
v "str payload" v "str payload"
+------------------+ +------------------------------+
| "'Harry Potter'" | | "'harry.potter@example.com'" |
+------------------+ +------------------------------+
```
Sometimes we can reduce the number of object allocations by bringing the variable-length
storage directly into the parent object, perhaps with a memory layout like this:
```svgbob
+-----+
| ptr |
+-----+
|
|
| "Person"
| +-------------+-------------------+------------------+-------------------+------------------------------+
'->| "16 (age)" | "12 (str length)" | "'Harry Potter'" | "24 (str length)" | "'harry.potter@example.com'" |
+-------------+-------------------+------------------+-------------------+------------------------------+
```
This layout reduced the number of object allocations from 3 to 2, potentially improving
memory allocator performance, and potentially also improving [CPU cache locality](https://en.wikipedia.org/wiki/Locality_of_reference).
It also reduced the number of pointers from 3 to 2, saving memory.
The main disadvantage of this layout is that size and layout of the `Person` object is not
known at compile time; it is only known at runtime, when the lengths of the strings are known.
Working with such layouts in plain Rust is cumbersome, and also requires unsafe code to do
the necessary pointer arithmetic.
`varlen` lets you easily define and use such types, without you having to write any
unsafe code. The following code will create an object with the memory layout from above:
```rust
use varlen::prelude::*;
type Person = Tup3* age */ FixedLen, /* name */ Str, /* email */ Str>;
let person: VBox = VBox::new(tup3::Init(
FixedLen(16),
Str::copy("Harry Potter"),
Str::copy("harry.potter@example.com"),
));
```
# Examples
```rust
use varlen::prelude::*;
// Define a variable-length tuple:
type MyTuple = Tup3, Str, Array>;
let my_tuple: VBox = VBox::new(tup3::Init(
FixedLen(16), Str::copy("hello"), Array::copy(&[1u16, 2])));
// Put multiple objects in a sequence, with tightly packed memory layout:
let sequence: Seq = seq![my_tuple.vcopy(), my_tuple.vcopy()];
// Or arena-allocate them, if the "bumpalo" crate feature is enabled:
let arena = bumpalo::Bump::new();
let arena_tuple: Owned = Owned::new_in(my_tuple.vcopy(), &arena);
// Define a newtype wrapper for the tuple:
define_varlen_newtype! {
#[repr(transparent)]
pub struct MyStruct(MyTuple);
with init: struct MyStructInit<_>(_);
with inner_ref: fn inner(&self) -> &_;
with inner_mut: fn inner_mut(self: _) -> _;
}
let my_struct: VBox = VBox::new(MyStructInit(my_tuple));
// Define a variable-length struct via a procedural macro, if the "macro"
// crate feature is enabled.
#[define_varlen]
struct MyMacroStruct {
age: usize,
#[varlen]
name: Str,
#[varlen]
email: Str,
#[varlen]
child: MyStruct,
}
let s: VBox = VBox::new(
my_macro_struct::Init{
age: 16,
name: Str::copy("Harry Potter"),
email: Str::copy("harry.potter@example.com"),
child: my_struct,
}
);
// #[define_varlen] also let you directly specify array lengths:
#[define_varlen]
struct MultipleArrays {
#[controls_layout]
len: usize,
#[varlen_array]
array1: [u16; *len],
#[varlen_array]
array2: [u8; *len],
#[varlen_array]
half_array: [u16; (*len) / 2],
}
let base_array = vec![0u16, 64000, 13, 105];
let a: VBox = VBox::new(multiple_arrays::Init{
len: base_array.len(),
array1: FillSequentially(|i| base_array[i]),
array2: FillSequentially(|i| base_array[base_array.len() - 1 - i] as u8),
half_array: FillSequentially(|i| base_array[i * 2]),
});
```
# Overview of types
`varlen` provides variable-length versions of various standard-library types and traits.
This table gives the correspondence:
| Name | Fixed-length type `T` | Variable-length type `T` | Notes |
|--------------------------|-----------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------|
| Immutable reference | `&T` | `&T` | |
| Mutable reference | `&mut T` | [`Pin<&mut T>`](std::pin::Pin) | `Pin<>` required for safety, see below |
| Owning, non-allocated | `T` | [`Owned<'storage, T>`](https://docs.rs/varlen/latest/varlen/owned/struct.Owned.html) | `Owned` is still a pointer to `T`'s payload |
| Owning, allocated | [`Box`] | [`VBox`](https://docs.rs/varlen/latest/varlen/vbox/struct.VBox.html) | |
| Sequence | [`Vec`] | [`Seq`](https://docs.rs/varlen/latest/varlen/seq/struct.Seq.html) | `Seq` has tightly-packed _variable-size_ elements. Random access is somewhat restricted |
| String | [`String`] | `Str` | String payload immediately follows the size, no pointer following |
| Array (fixed-size elems) | [`Box<[u16]>`] | [`Array`](https://docs.rs/varlen/latest/varlen/array/struct.Array.html) | Array payload immediately follows the size, no pointer following |
| Tuple | `(T, U)` | `Tup2` | Field `U` might not be at a statically known offset from start of object |
| Clone | `Clone::clone()` | `VClone::vclone()` | |
| Copy | | `VCopy::vcopy()` | |
# Use of `Pin`
Mutable references to variable-length types use [`Pin<&mut T>`](std::pin::Pin) rather than
`&mut T`. By doing so, we prevent patterns such as calling [`std::mem::swap`] on
variable-length types. Such patterns would be a safety hazard, because the part of the type
that the Rust compiler knows about when calling [`std::mem::swap`] is just the
"fixed-size head" of the type. However, almost all variable-length types additionally have a
"variable-sized tail" that the Rust compiler doesn't know about. Swapping the head but not
the tail could violate a type's invariants, potentially breaking safety.
If you never write `unsafe` code, you don't need to worry about this issue. The only practical
consequence is that mutable access to a variable-length type is always mediated through
[`Pin<&mut T>`](std::pin::Pin) rather than `&mut T`, and you will have to work with the slightly
more cumbersome `Pin` APIs.
On the other hand, if you write `unsafe` code, you may have to be aware of the following
invariant. If `T` is a variable-length type, we require that any reference `&T` points to
a "valid" `T`, which we consider to be one which has a fixed-length head
(of size `std::mem::size_of::()`) followed by a variable-length tail, and the head and
tail are "consistent" with each other. Here, "consistent" means that they were produced by
a call to one of the type's [`Initializer`] instances. In `unsafe` code, where you might
have access to a `&mut T` (without a `Pin`), you must avoid code patterns which modify the
head without also correspondingly modifying the tail.
# Feature flags
This crate has no *required* dependencies. The following feature flags exist, which can turn
on some dependencies.
* `bumpalo`. Enables support for allocating an [`Owned`](https://docs.rs/varlen/latest/varlen/owned/struct.Owned.html) in an `bumpalo::Bump` arena. Adds a dependency on `bumpalo`.
* `macro`. Enables procedural macro support, for defining variable-length structs using `#[define_varlen]`. Adds a dependency on `varlen_macro`, `syn` and `quote`.
* `doc`. Enables pretty SVG diagrams in documentation. Adds a lot of dependencies.