https://github.com/dirvine/saorsa-robotics

Scaffold for training Hugging Face SO-101 robotic arms using Vision-Language-Action policies without demonstrations
https://github.com/dirvine/saorsa-robotics
Last synced: 8 months ago
JSON representation
Scaffold for training Hugging Face SO-101 robotic arms using Vision-Language-Action policies without demonstrations
Host: GitHub
URL: https://github.com/dirvine/saorsa-robotics
Owner: dirvine
Created: 2025-08-10T12:53:56.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-09-02T13:38:14.000Z (10 months ago)
Last Synced: 2025-09-02T14:21:56.084Z (10 months ago)
Language: Rust
Homepage:
Size: 788 KB
Stars: 0
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md
Awesome Lists containing this project

README

          # 🤖 Saorsa Robotics

> **Production-ready Rust framework for autonomous robotic control with local AI models**

[![Rust](https://img.shields.io/badge/rust-1.75%2B-orange.svg)](https://www.rust-lang.org)

[![Safety](https://img.shields.io/badge/safety-critical-red.svg)](./crates/safety-guard)

[![Tests](https://img.shields.io/badge/tests-100%25-brightgreen.svg)](./crates)

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)

Saorsa Robotics provides a comprehensive, safety-first framework for robotic control systems with Vision-Language-Action (VLA) models running entirely on local hardware. Built in Rust for memory safety, performance, and reliability.

## 🌟 Key Features

### 🧠 Local AI Models

- **MolmoAct Integration**: Action Reasoning Model with 3D spatial understanding and Chain-of-Thought planning

- **Candle ML Framework**: Lightweight Rust-native inference without Python dependencies

- **OpenVLA Support**: Compatible with cloud and local VLA models

- **On-Device Learning**: Continual improvement through OFT adapters and intervention learning

### 🛡️ Safety-Critical Design

- **Formal Constraint System**: Expression-based DSL for defining safety boundaries

- **Real-Time Monitoring**: Watchdog systems with automatic intervention

- **Zero Panic Guarantee**: No `unwrap()`, `expect()`, or `panic!()` in production code

- **Comprehensive Testing**: 100% test coverage on safety-critical paths

### 🎯 Advanced Capabilities

- **Multi-Modal Control**: Voice commands, vision processing, and haptic feedback

- **Stereo Vision**: Depth perception with dual camera calibration

- **CAN Bus Integration**: Direct hardware control for motors and actuators  

- **Intent Parsing**: Natural language to robot action conversion

### ⚡ Performance

- **Zero-Copy Data Paths**: Efficient memory usage

- **Async/Await**: Non-blocking I/O throughout

- **Action Chunking**: Smooth control despite network latency

- **Real-Time Capable**: Deterministic timing for critical operations

## 🚀 Quick Start

### Prerequisites

```bash

# Install Rust

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone repository

git clone https://github.com/dirvine/saorsa-robotics

cd saorsa-robotics

```

### Build and Test

```bash

# Build all crates

cargo build --release

# Run all tests

cargo test --all

# Run with safety checks

cargo run --bin sr-cli -- --safety-enabled

```

### Run Examples

```bash

# VLA Policy Demo

cargo run --example vla_policy_demo

# Wake Word Detection

cargo run --example wake_word_demo

# Safety Constraints Demo

cargo run --bin safety-demo

```

## 📦 Architecture

```

saorsa-robotics/

├── apps/                       # Application binaries

│   ├── sr-cli/                # Main CLI interface

│   ├── brain-daemon/          # Central coordination daemon

│   ├── safety-demo/           # Safety system demonstration

│   └── kyutai-stt-app/       # Speech-to-text application

├── crates/                    # Core library crates

│   ├── vla-policy/           # Vision-Language-Action models

│   ├── safety-guard/         # Safety constraint engine

│   ├── voice-local/          # On-device voice processing

│   ├── vision-stereo/        # Stereo vision and depth

│   ├── intent-parser/        # NLU and command parsing

│   ├── can-transport/        # CAN bus communication

│   ├── device-registry/      # Hardware device management

│   └── continual-learning/   # Online learning framework

├── examples/                  # Example applications

├── configs/                   # Device and system configs

└── docs/                     # Technical documentation

```

## 🔧 Core Components

### VLA Policy System (`vla-policy`)

Implements multiple Vision-Language-Action models for robot control:

```rust

use vla_policy::{create_policy, PolicyConfig, Observation};

// Create MolmoAct policy with 3D reasoning

let config = PolicyConfig {

    model_type: "molmoact".to_string(),

    model_path: "models/molmoact-7b".to_string(),

    // ... configuration

};

let policy = create_policy(config)?;

let action = policy.predict(&observation).await?;

```

**Features:**

- MolmoAct with Chain-of-Thought reasoning

- Waypoint generation for complex tasks

- Skills framework (Pick, Place, Reach)

- Mock policy for testing

### Safety Guard (`safety-guard`)

Expression-based constraint system ensuring safe operation:

```rust

use safety_guard::{SafetyGuard, Constraint};

let mut guard = SafetyGuard::new();

// Define workspace boundaries

guard.add_constraint(Constraint::expression(

    "workspace_x",

    "x >= -0.5 && x <= 0.5"

)?);

// Check if action is safe

if guard.check_action(&action)? {

    robot.execute(action)?;

}

```

**Features:**

- Mathematical expression constraints

- Real-time evaluation with evalexpr

- Watchdog monitoring

- Automatic intervention on violations

### Voice Control (`voice-local`)

On-device speech recognition and wake word detection:

```rust

use voice_local::{KyutaiProvider, WakeWordDetector};

let provider = KyutaiProvider::new(config)?;

let detector = WakeWordDetector::new("hey robot")?;

// Process audio stream

if detector.detect(&audio_frame)? {

    let command = provider.transcribe(&audio_buffer)?;

    execute_command(command)?;

}

```

**Features:**

- Kyutai/Mimi model integration

- Real-time transcription

- Wake word detection

- Plugin architecture for custom models

### Stereo Vision (`vision-stereo`)

Depth perception and 3D scene understanding:

```rust

use vision_stereo::{StereoCamera, DepthEstimator};

let camera = StereoCamera::new(config)?;

camera.calibrate()?;

let (left, right) = camera.capture()?;

let depth_map = DepthEstimator::compute(&left, &right)?;

let tags = detect_april_tags(&left)?;

```

**Features:**

- Dual camera calibration

- Real-time depth estimation

- AprilTag detection

- Point cloud generation

### CAN Transport (`can-transport`)

Hardware control via CAN bus:

```rust

use can_transport::{SlcanTransport, Message};

let transport = SlcanTransport::new("/dev/ttyUSB0")?;

// Send motor command

let msg = Message::new(0x123, &[0x01, 0x02, 0x03])?;

transport.send(&msg)?;

```

**Features:**

- SLCAN protocol support

- ODrive motor control

- T-Motor actuator support

- Mock transport for testing

## 🧪 Testing

All crates maintain 100% test coverage on critical paths:

```bash

# Run all tests

cargo test --all

# Run with coverage

cargo tarpaulin --out Html

# Run safety-critical tests

cargo test -p safety-guard

# Run benchmarks

cargo bench

```

Current test status:

- ✅ `safety-guard`: 13/13 passing

- ✅ `vla-policy`: 21/21 passing

- ✅ `voice-local`: All doctests passing

- ✅ `intent-parser`: 1/1 passing

- ✅ Zero compilation warnings

## 🔐 Safety & Security

### Production Standards

- **No Panics**: Zero `unwrap()`, `expect()`, or `panic!()` in production

- **Error Handling**: All errors properly propagated with `Result`

- **Memory Safety**: Guaranteed by Rust's ownership system

- **Concurrency Safety**: Safe parallelism with Send/Sync traits

### Safety Features

- Formal constraint verification

- Watchdog timers on all operations

- Automatic failsafe modes

- Comprehensive audit logging

## 📚 Documentation

- [Architecture Overview](./docs/README.md)

- [VLA Policy Design](./docs/SPEC.md)

- [Safety System](./crates/safety-guard/README.md)

- [CAN Protocol](./docs/CAN.md)

- [Vision System](./docs/VISION.md)

- [Voice Control](./docs/VOICE.md)

- [Research Notes](./docs/RESEARCH.md)

## 🗺️ Roadmap

### Near Term

- [ ] ONNX runtime integration for broader model support

- [ ] ROS2 bridge for ecosystem compatibility

- [ ] Web dashboard for monitoring and control

- [ ] Simulation environment with Bevy

### Long Term

- [ ] Distributed multi-robot coordination

- [ ] Federated learning across robot fleets

- [ ] Custom silicon accelerator support

- [ ] Formal verification of safety properties

## 🤝 Contributing

We welcome contributions! Please see [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.

Key areas for contribution:

- Additional VLA model implementations

- Hardware device drivers

- Safety constraint patterns

- Documentation and examples

## 📄 License

MIT License - see [LICENSE](./LICENSE) for details.

## 🙏 Acknowledgments

- Built with [Candle](https://github.com/huggingface/candle) for ML inference

- Inspired by [LeRobot](https://github.com/huggingface/lerobot) for robot learning

- Safety patterns from aerospace and automotive industries

## 📬 Contact

- GitHub: [@dirvine](https://github.com/dirvine)

- Project: [Saorsa Labs](https://saorsa.org)

---

*For the original Python implementation for SO-101 arms, see [archive/python-so101](./archive/python-so101/README.md)*
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/dirvine/saorsa-robotics

Awesome Lists containing this project

README