Tiny inference engine for tree ensemble models in Rust
https://github.com/lucidfrontier45/silva

ensemble-model lightgbm machine-learning rust xgboost


[![crates.io](https://img.shields.io/crates/v/silva)](https://crates.io/crates/silva)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Repository](https://img.shields.io/badge/github-lucidfrontier45/silva-blue)](https://github.com/lucidfrontier45/silva)

Silva is a tiny inference engine for tree ensemble models (a.k.a. forest models) in Rust.

## Why Silva?

Silva makes it easier for Rust programs to use pre-trained XGBoost and LightGBM models by providing a lightweight inference engine that avoids runtime dependencies on external machine learning libraries. Key benefits include:

- **Pure Rust**: Entirely written in Rust for high performance, memory safety, and zero-cost abstractions
- **Simple Codebase**: Minimal, clean implementation that's easy to understand, integrate, and maintain
- **No External Dependencies**: Parses and runs models from XGBoost and LightGBM without requiring those libraries to be installed or linked

# Supported Formats

## Silva Format
- Native format using efficient serde serialization
- Most compact and fastest to load

## XGBoost
- **Booster Types**: `gbtree` only (gblinear and dart are not supported)
- **Supported Objectives**:
  - `reg:squarederror` (regression)
  - `binary:logistic` (binary classification)
  - `multi:softmax` (multiclass classification)
  - `multi:softprob` (multiclass classification)
- **Note**: Unsupported booster types/objectives will return descriptive errors

## LightGBM
- All regression and classification models
- Text format only (binary format not supported)
- Tree structure only (no linear models)
- Note: LightGBM incorporates all bias into leaf values (no separate base_score)

# Use this library

```sh
cargo add silva
```

# Data Structures

## MultiOutputForest
A container for multi-output models (e.g., multi-class classification). Holds a vector of `Forest` instances, one per output class. Returns a vector of predictions, one per output.

## Forest
Single-output tree ensemble containing:
- `base_value`: Bias/baseline score added to all predictions
- `trees`: Vector of decision trees

Prediction formula: `base_value + Σ tree_predictions`

## Tree
Individual decision tree represented as:
- `node_map`: Hash map of node ID → `TreeNode`
- `root`: Root node ID

Traverses tree from root to leaf based on feature comparisons.

## TreeNode
Single node with:
- `split_index`: Feature index for splitting
- `split_condition`: Threshold value (NotNan)
- `left/right`: Child node IDs (None for leaves)
- `value`: Leaf value (NotNan)

Leaves have no children; internal nodes contain split logic.

# Silva Format Example

```json
{
  "forests": [
    {
      "base_value": 0.5,
      "trees": [
        {
          "nm": {
            "0": {"id": 0, "si": 0, "sc": 2.5, "l": 1, "r": 2, "v": 0.0},
            "1": {"id": 1, "si": 1, "sc": 1.5, "l": null, "r": null, "v": 3.0},
            "2": {"id": 2, "si": 1, "sc": 3.5, "l": null, "r": null, "v": 5.0}
          },
          "root": 0
        },
        {
          "nm": {
            "0": {"id": 0, "si": 0, "sc": 5.0, "l": 1, "r": 2, "v": 0.0},
            "1": {"id": 1, "si": 1, "sc": 2.0, "l": null, "r": null, "v": 10.0},
            "2": {"id": 2, "si": 1, "sc": 3.0, "l": null, "r": null, "v": 20.0}
          },
          "root": 0
        }
      ]
    }
  ]
}
```

## Field Notation

| Abbreviation | Full Name | Description |
| ------------ | --------------- | ----------------------------------------------- |
| `nm` | node_map | Hash map mapping node ID to TreeNode |
| `si` | split_index | Feature index used for splitting at this node |
| `sc` | split_condition | Threshold value for the split comparison |
| `l` | left | ID of left child node (null for leaves) |
| `r` | right | ID of right child node (null for leaves) |
| `v` | value | Leaf prediction value (only used in leaf nodes) |

## Structure Hierarchy

```
MultiOutputForest
└── forests: Forest[]
    ├── base_value: f64 (baseline score)
    └── trees: Tree[]
        ├── nm: {node_id: TreeNode}
        │   ├── id: node ID
        │   ├── si: feature index to split on
        │   ├── sc: split threshold
        │   ├── l: left child ID (or null)
        │   ├── r: right child ID (or null)
        │   └── v: leaf value
        └── root: ID of the root node
```

**Prediction Flow**: Start at root → compare feature[si] with sc → follow l or r → repeat until leaf → sum all tree values → add base_value
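
The prediction flow above can be sketched in plain Rust without Silva itself. The block below is only an illustration: the `Node`, `Tree`, and `Forest` structs are hypothetical stand-ins that borrow the README's field names, `stump` and `example_forest` are made-up helpers that rebuild the two trees from the JSON example, and the rule "go left when the feature is strictly less than the threshold" is an assumption (the XGBoost convention), not something this README states about Silva.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a tree node; field names follow the README.
struct Node {
    split_index: usize,
    split_condition: f64,
    left: Option<usize>,
    right: Option<usize>,
    value: f64,
}

/// Hypothetical stand-in for a tree: node ID -> node, plus the root ID.
struct Tree {
    node_map: HashMap<usize, Node>,
    root: usize,
}

/// Hypothetical stand-in for a single-output forest.
struct Forest {
    base_value: f64,
    trees: Vec<Tree>,
}

impl Tree {
    /// Walk from the root until a node without two children is reached.
    fn predict(&self, x: &[f64]) -> f64 {
        let mut node = &self.node_map[&self.root];
        loop {
            match (node.left, node.right) {
                (Some(l), Some(r)) => {
                    // ASSUMPTION: strictly-less-than goes left.
                    let go_left = x[node.split_index] < node.split_condition;
                    node = &self.node_map[&if go_left { l } else { r }];
                }
                // Leaf: return its stored value.
                _ => return node.value,
            }
        }
    }
}

impl Forest {
    /// base_value + sum of the per-tree leaf values.
    fn predict(&self, x: &[f64]) -> f64 {
        self.base_value + self.trees.iter().map(|t| t.predict(x)).sum::<f64>()
    }
}

/// Build a depth-1 tree that splits on feature 0 (as in the JSON example).
fn stump(threshold: f64, left_value: f64, right_value: f64) -> Tree {
    let leaf = |v: f64| Node { split_index: 0, split_condition: 0.0, left: None, right: None, value: v };
    let mut node_map = HashMap::new();
    node_map.insert(0, Node { split_index: 0, split_condition: threshold, left: Some(1), right: Some(2), value: 0.0 });
    node_map.insert(1, leaf(left_value));
    node_map.insert(2, leaf(right_value));
    Tree { node_map, root: 0 }
}

/// The forest from the JSON example: base 0.5, thresholds 2.5 and 5.0.
fn example_forest() -> Forest {
    Forest { base_value: 0.5, trees: vec![stump(2.5, 3.0, 5.0), stump(5.0, 10.0, 20.0)] }
}
```

Under the stated assumption, `example_forest().predict(&[1.5, 2.3, 0.8])` takes the left branch in both trees and evaluates to `0.5 + 3.0 + 10.0 = 13.5`.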

# Usage Examples

## Basic Prediction

The predict methods work with feature vectors (`&[f64]`) and return prediction values.

### Single Tree Prediction
```rust
use silva::Tree;

let tree = Tree::new(node_map, root_id);
let prediction = tree.predict(&[1.5, 2.3, 0.8]); // returns NotNan<f64>
```

### Forest (Single Output)
```rust
use silva::Forest;

let forest = Forest::new(base_value, trees);
let prediction = forest.predict(&[1.5, 2.3, 0.8]); // returns NotNan<f64>
```

### Multi-Output Forest
```rust
use silva::MultiOutputForest;

let model = MultiOutputForest::new(forests);
let predictions = model.predict(&[1.5, 2.3, 0.8]); // returns Vec<NotNan<f64>>
```

## Complete Workflow Example

```rust
use silva::MultiOutputForest;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model from file
    let model = MultiOutputForest::from_file("model.json")?;

    // Prepare feature data
    let features = vec![vec![1.5, 2.3, 0.8], vec![0.5, 1.2, 3.4]];

    // Make predictions
    for x in &features {
        let prediction = model.predict(x);
        println!("Predictions: {:?}", prediction);
    }

    Ok(())
}
```

## Understanding Predictions

The `predict` methods return **raw values** that may require post-processing depending on the model type and objective:

### Classification Models

For binary classification using `binary:logistic`, apply sigmoid to the raw prediction:
```rust
let raw = forest.predict(&features);
let probability = 1.0 / (1.0 + (-raw).exp()); // sigmoid
```
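
As a minimal sketch, the sigmoid step can be wrapped in a helper that also produces a class label. It works on plain `f64` rather than on Silva's return type. The names `sigmoid` and `classify` and the idea of a caller-chosen threshold are illustrative assumptions, not part of Silva's API.

```rust
/// Logistic function mapping a raw margin to a probability in [0, 1].
/// Branching on the sign keeps the argument passed to `exp` non-positive.
fn sigmoid(raw: f64) -> f64 {
    if raw >= 0.0 {
        1.0 / (1.0 + (-raw).exp())
    } else {
        let e = raw.exp();
        e / (1.0 + e)
    }
}

/// Hypothetical helper: predict the positive class when the
/// probability reaches `threshold` (0.5 is a common default).
fn classify(raw: f64, threshold: f64) -> bool {
    sigmoid(raw) >= threshold
}
```

If the raw prediction is an `ordered_float::NotNan<f64>`, it would first need converting to `f64`, for example with `into_inner()`.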

For multiclass classification using `multi:softmax` or `multi:softprob`, apply softmax to the predictions:
```rust
let raw_values = model.predict(&features);
let exp_values: Vec<f64> = raw_values.iter().map(|&v| v.exp()).collect();
let sum: f64 = exp_values.iter().sum();
let probabilities: Vec<f64> = exp_values.iter().map(|&v| v / sum).collect();
```
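
For large raw scores, `exp` can overflow to infinity and the division then yields NaN. A common workaround, sketched here on plain `f64` slices, is to subtract the maximum score before exponentiating, which leaves the result mathematically unchanged. The helper names `softmax` and `argmax` are illustrative assumptions and not part of Silva's API.

```rust
/// Softmax with the maximum subtracted first, so every exponent is <= 0.
/// Returns an empty vector for empty input.
fn softmax(raw: &[f64]) -> Vec<f64> {
    let max = raw.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = raw.iter().map(|&v| (v - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Index of the largest value (the predicted class), or `None` if empty.
/// On exact ties, the last of the tied indices is returned.
fn argmax(values: &[f64]) -> Option<usize> {
    values
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}
```

Calling `argmax(&softmax(&scores))` then gives a class index. Because softmax preserves ordering, `argmax(&scores)` gives the same index when the probabilities themselves are not needed.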

### Regression Models

For Poisson regression objectives, apply exponential to the raw prediction:
```rust
let raw = forest.predict(&features);
let count_prediction = raw.exp();
```

For standard regression (e.g., `reg:squarederror`), the raw value can be used directly.

### LightGBM Note

LightGBM models incorporate bias into leaf values, so no separate `base_score` adjustment is needed for predictions.

For more examples, see `examples/prediction.rs`.