https://github.com/savannstm/marshal-rs
Blazingly fast Ruby-lang's Marshal implementation in Rust.
https://github.com/savannstm/marshal-rs
deserialize marshal ruby serialize
Last synced: 12 months ago
JSON representation
Blazingly fast Ruby-lang's Marshal implementation in Rust.
- Host: GitHub
- URL: https://github.com/savannstm/marshal-rs
- Owner: savannstm
- License: wtfpl
- Created: 2024-08-19T16:20:36.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2025-07-11T09:01:39.000Z (12 months ago)
- Last Synced: 2025-07-11T10:52:52.243Z (12 months ago)
- Topics: deserialize, marshal, ruby, serialize
- Language: Rust
- Homepage: https://crates.io/crates/marshal-rs
- Size: 146 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
# marshal-rs
**`marshal-rs` is a complete Rust implementation of Ruby-lang's `Marshal`.**
It is capable of :fire: **_BLAZINGLY FAST_** loading data from dumped Ruby Marshal files, as well as :fire: **_BLAZINGLY FAST_** dumping it back to Marshal format.
## Installation
`cargo add marshal-rs`
## Overview
This crate has two main structs, `Loader` and `Dumper`, along with helper functions that use them internally. There's three load functions: `load()`, `load_utf8()`, `load_binary()`, and a single dump function: `dump()`.
`load()` takes a `&[u8]`, consisting of Marshal data bytes (that can be read using `std::fs::read()`) as its only argument, and outputs `Value`.
`dump()`, in turn, takes `marshal_rs::Value` as its only argument and serializes it back to `Vec` Marshal byte stream. It does not preserve strings' initial encoding, writing all strings as UTF-8 encoded.
By default, in `load()` function, Ruby strings, that include encoding instance variable, are serialized to JSON strings, and those which don't, serialized to byte arrays.
`load_utf8()` function tries to convert arrays without instance variable to string, and produces string if array is valid UTF-8, and object otherwise.
`load_binary()` function converts all strings to objects.
This behavior also can be controlled in `Loader` by calling `set_string_mode()`.
You can manage the prefix of instance variables using `instance_var_prefix` argument in `load()` and `dump()`, or by using `set_instance_var_prefix()` function in `Loader` or `Dumper`. Passed string replaces "@" instance variables' prefixes.
To avoid loss of precision, floats are stored as strings.
If Marshal file contains any extra float mantissa bits, `marshal-rs` discards them. They aren't written by latest 4.8 version of Marshal, but it still preserves them, if encounters any. `marshal-rs` does not.
The reason this crate wraps around `serde_json::Value`, is because it needs to cleanly track unique object instances and object metadata.
The table shows, how `marshal-rs` serializes Ruby types to Value:
| Ruby object | Serialized to Value |
| ------------------------------------------ | ----------------------------------------- |
| `nil` | `null` |
| `true`, `false` | `true`, `false` |
| `1337` (Integer) | `1337` |
| `36893488147419103232` (Big Integer) | `"36893488147419103232"` |
| `13.37` (Float) | `"13.37"` |
| `"ligma"` (String, with instance variable) | `"ligma"` |
| `:ligma` (Symbol) | `"ligma"` |
| `/lgma/i` (Regex) | `"/lgma/i"` |
| `[ ... ]` (Array) | `[ ... ]` |
| `Hash`, `Struct` | `IndexMap` |
| `Object.new` | `IndexMap` |
| `Class`, `Module` | `null` (Doesn't dump any data to Marshal) |
Value can be stringified and written to JSON using `serde_json::to_string` function. That will wrap each value in an object, that holds its metadata as object keys. For example, `null` will become
```json
{
"__id": number,
"__class": "",
"__type": 0,
"__value": null,
"__extensions": [],
"__flags": 0
}
```
object.
Possible `__type` values are defined in `src/types.rs`:
```rust
pub enum ValueType {
#[default]
Null = 0,
Bool(bool) = 1,
Integer(i32) = 2,
Float(String) = 3,
Bigint(String) = 4,
String(String) = 5,
Bytes(Vec) = 6,
Symbol(String) = 7,
Regexp(String) = 8,
Array(Vec) = 9,
Object(ObjectMap) = 10,
Class = 11,
Module = 12,
HashMap(HashMap) = 13,
Struct(HashMap) = 14,
}
```
Possible `__flags` values are defined in `src/types.rs`:
```rust
struct ValueFlags: u8 {
const None = 0;
const OldModule = 1;
const UserClass = 2;
const Data = 4;
const UserDefined = 8;
const UserMarshal = 16;
}
```
Keep in mind `__flags` also could be a combination of some flags.
### Unsafe code
In this crate, unsafe code provides the ability to replicate Marshal's behavior. It shouldn't ever cause problems.
## Test coverage
Currently, tests feature dumping/loading the following values: nil, bool, positive/negative fixnum, positive/negative bignum, float (including inf, nan and negative), utf-8/non-utf-8 strings, object links, array, hashes, structs, objects (including extended with modules, with custom marshal\_ methods, with custom \_load/\_dump methods), regexps, built-in class subclasses.
Also tests include loading/dumping RPG Maker game's files and battle-testing them.
If something is missing in the tests, open an issue or submit a pull request.
## Example
```rust
use std::fs::read;
use marshal_rs::{load, dump, Value};
// Note: Value supports indexing by `&str` if it's an object,
// and by `&Value` if it's a hashmap, but to use that you need
// to import `Get` trait.
// use marshal_rs::Get;
fn main() {
// Read marshal data from file
// let marshal_data: Vec = read("./Map001.rvdata2").unwrap();
// For this example, we'll just take pre-defined marshal data
let marshal_data = [0x04, 0x08, 0x30];
// Serializing to json
// `load()` takes a `&[u8]` as argument, so `Vec` must be borrowed
let serialized_to_json: Value = load(&marshal_data, None).unwrap();
// Here you may stringify Value using `serde_json::to_string()`, and
// `std::fs::write()` it to file
// Serializing back to marshal
// `dump()` requires owned Value as argument
let serialized_to_marshal: Vec = dump(serialized_to_json, None);
// Here you may `std::fs::write()` serialized Marshal data to file
}
```
## MSRV
Minimum supported Rust version is 1.63.0.
## References
- [marshal.c](https://github.com/ruby/ruby/blob/master/marshal.c) (Had to compile Ruby manually, and add debug prints to marshal.c, to even figure out what's going on)
- [TypeScript implementation of Marshal](https://github.com/hyrious/marshal) (This project inspired me to start working on this)
- [Official documentation for Marshal format](https://docs.ruby-lang.org/en/master/marshal_rdoc.html) (Mostly useless)
## License
Project is licensed under WTFPL.