https://github.com/orcastor/orcas
[OrcaS] Open Ready-to-use Content Addressable Storage - for popular OS & cheap and low power devices.
- Host: GitHub
- URL: https://github.com/orcastor/orcas
- Owner: orcastor
- License: mit
- Created: 2019-03-28T10:31:22.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2026-01-13T11:00:08.000Z (5 months ago)
- Last Synced: 2026-01-13T14:06:32.474Z (5 months ago)
- Topics: ceph, content-addressable-storage, embedded, fastdfs, hacktoberfest, minio, nas, object-file-system, object-storage, object-store, osd, oss, s3, storage, tikv
- Language: Go
- Homepage: https://orcastor.github.io/doc/
- Size: 25.3 MB
- Stars: 27
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# OrcaS: Open Ready-to-Use Content Addressable Storage
- [English](README.md) | [中文](README.zh.md)
## What is OrcaS?
**OrcaS** (Open Ready-to-Use Content Addressable Storage) is a **lightweight, high-performance object storage system** built with **Content Addressable Storage (CAS)** at its core. It provides enterprise-grade features like instant deduplication, multi-versioning, zero-knowledge encryption, and smart compression - all in a single binary that's ready to deploy.
### Why OrcaS?
- **Open**: Open source (MIT license), transparent, community-driven development
- **Ready-to-Use**: Content Addressable Storage ensures data integrity and automatic deduplication, production-ready out of the box
- **Content Addressable Storage**: Data is stored by content hash, enabling automatic deduplication and integrity verification
- **Instant Upload (Deduplication)**: Upload files in seconds, not minutes - identical files are detected instantly without uploading
- **Zero-Knowledge Encryption**: Your data, your keys - end-to-end encryption with industry-standard algorithms
- **Production Ready**: S3-compatible API, VFS mount support, and comprehensive documentation
- **High Performance**: Optimized for both small and large files with intelligent packaging and chunking
## Key Features
### Instant Upload (Object-level Deduplication)
**What it does**: Upload identical files instantly without transferring data.
**How it works**:
- Calculates multiple checksums (XXH3, SHA-256) for each file
- Before uploading, checks if identical content already exists
- If found, creates a reference to existing data instead of uploading
- **Result**: Upload time drops from minutes to milliseconds for duplicate files
**Use cases**:
- Backup systems (same files across multiple backups)
- Version control systems (similar files across versions)
- Multi-user environments (shared files)
- CDN edge storage (cached content)
**Benefits**:
- **99%+ faster** uploads for duplicate files
- **Massive storage savings** - store 1 copy, reference it N times
- **Bandwidth savings** - no redundant data transfer
- **Automatic integrity verification** - content hash ensures data correctness
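The check-before-upload flow described above can be sketched as follows. This is a minimal in-memory illustration, not OrcaS's actual API: the `Store` type and its `Exists` and `Upload` methods are hypothetical, and only SHA-256 is used because XXH3 is not part of Go's standard library.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a hypothetical in-memory stand-in for an object store.
type Store struct {
	blobs map[string][]byte // content hash -> data, stored once
	refs  map[string]string // object name -> content hash
}

func NewStore() *Store {
	return &Store{blobs: map[string][]byte{}, refs: map[string]string{}}
}

// Checksum returns the hex SHA-256 of data; a client would compute this locally.
func Checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Exists reports whether content with this checksum is already stored.
func (s *Store) Exists(sum string) bool {
	_, ok := s.blobs[sum]
	return ok
}

// Upload stores data under name. It returns true when the content was
// already present, so only a reference was created and no data was copied.
func (s *Store) Upload(name string, data []byte) (instant bool) {
	sum := Checksum(data)
	if s.Exists(sum) {
		s.refs[name] = sum
		return true
	}
	s.blobs[sum] = append([]byte(nil), data...)
	s.refs[name] = sum
	return false
}

func main() {
	s := NewStore()
	payload := []byte("same bytes in two backups")
	fmt.Println(s.Upload("backup-1/report.pdf", payload)) // first copy: data is stored
	fmt.Println(s.Upload("backup-2/report.pdf", payload)) // duplicate: reference only
	fmt.Println(len(s.blobs), len(s.refs))                // blobs stored, names referencing them
}
```

In a networked setup the client would send only the checksum first and transfer the body when the server reports a miss; the sketch collapses both steps into one call for brevity.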

### Small Object Packaging
**What it does**: Efficiently stores many small files together.
**How it works**:
- Groups small files (< 64KB) into packages
- Reduces metadata overhead and I/O operations
- Maintains individual file access while optimizing storage
**Benefits**:
- **10x+ performance improvement** for small file operations
- **Reduced storage costs** - less metadata overhead
- **Faster operations** - batch metadata writes
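One way to picture packaging is a single buffer plus an offset index, so many small files share one write while each stays individually readable. The sketch below assumes that layout; the `Package` type, its index format, and its method names are hypothetical and are not taken from the OrcaS source. Only the 64KB threshold comes from the description above.

```go
package main

import (
	"bytes"
	"fmt"
)

// smallLimit mirrors the 64KB threshold mentioned above.
const smallLimit = 64 * 1024

// entry records where one file lives inside the package.
type entry struct {
	offset, size int
}

// Package is a hypothetical container: one contiguous buffer plus an index.
type Package struct {
	buf   bytes.Buffer
	index map[string]entry
}

func NewPackage() *Package { return &Package{index: map[string]entry{}} }

// Add appends a small file to the package and reports whether it was accepted.
// Files at or above the threshold are rejected and would be stored on their own.
func (p *Package) Add(name string, data []byte) bool {
	if len(data) >= smallLimit {
		return false
	}
	p.index[name] = entry{offset: p.buf.Len(), size: len(data)}
	p.buf.Write(data)
	return true
}

// Read returns one file's bytes without unpacking the others.
func (p *Package) Read(name string) ([]byte, bool) {
	e, ok := p.index[name]
	if !ok {
		return nil, false
	}
	return p.buf.Bytes()[e.offset : e.offset+e.size], true
}

func main() {
	p := NewPackage()
	p.Add("a.txt", []byte("alpha"))
	p.Add("b.txt", []byte("bravo!"))
	data, _ := p.Read("b.txt")
	fmt.Printf("%s (package holds %d bytes)\n", data, p.buf.Len())
}
```

The saving comes from writing one package and one batch of index entries instead of one file and one metadata record per object.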
### Large Object Chunking
**What it does**: Splits large files into manageable chunks.
**How it works**:
- Automatically chunks files larger than a configured threshold (default 10MB)
- Each chunk stored independently with its own checksum
- Enables parallel upload/download and efficient updates
**Benefits**:
- **Parallel processing** - upload/download chunks concurrently
- **Resumable transfers** - retry failed chunks independently
- **Efficient updates** - only modified chunks need re-upload
- **Better resource utilization** - process large files efficiently
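The sketch below illustrates why per-chunk checksums make updates cheap. It assumes fixed-size chunks, which the description above does not confirm, and the `Chunk`, `Split`, and `Changed` names are hypothetical. A tiny chunk size stands in for the 10MB default so the example stays readable.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Chunk is one independently stored piece of a large object.
type Chunk struct {
	Index int
	Sum   string // hex SHA-256 of Data
	Data  []byte
}

// Split cuts data into chunks of at most size bytes, each with its own checksum.
// size must be greater than zero.
func Split(data []byte, size int) []Chunk {
	var chunks []Chunk
	for i := 0; i*size < len(data); i++ {
		end := (i + 1) * size
		if end > len(data) {
			end = len(data)
		}
		part := data[i*size : end]
		sum := sha256.Sum256(part)
		chunks = append(chunks, Chunk{Index: i, Sum: hex.EncodeToString(sum[:]), Data: part})
	}
	return chunks
}

// Changed returns the indexes of chunks in next whose checksum differs from
// prev, i.e. the only chunks that would need re-uploading after an edit.
func Changed(prev, next []Chunk) []int {
	var out []int
	for i, c := range next {
		if i >= len(prev) || prev[i].Sum != c.Sum {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	const chunkSize = 4 // demo value only
	v1 := Split([]byte("aaaabbbbccccdd"), chunkSize)
	v2 := Split([]byte("aaaaBBBBccccdd"), chunkSize)
	fmt.Println(len(v1), Changed(v1, v2))
}
```

Because every chunk carries its own checksum, a failed or interrupted transfer can also be resumed by re-sending only the chunks whose checksums are missing on the other side.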
### Object Multi-versioning
**What it does**: Automatically maintains file version history.
**How it works**:
- Each file modification creates a new version
- Old versions preserved automatically
- Configurable retention policies
- Space-efficient through content deduplication
**Benefits**:
- **Point-in-time recovery** - restore any previous version
- **Data protection** - accidental deletions are recoverable
- **Audit trail** - track all changes over time
- **Space efficient** - unchanged data shared across versions
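Versioning and deduplication fit together naturally: a version can be just another reference to a content hash. The following sketch models that idea in memory. The `VersionedStore` type and its methods are hypothetical, and retention policies are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// VersionedStore is a hypothetical model: blobs are keyed by content hash and
// each object name maps to an ordered list of hashes, oldest first.
type VersionedStore struct {
	blobs    map[string][]byte
	versions map[string][]string
}

func NewVersionedStore() *VersionedStore {
	return &VersionedStore{blobs: map[string][]byte{}, versions: map[string][]string{}}
}

// Put records a new version of name and returns its 1-based version number.
// Content that is already stored is referenced rather than copied.
func (s *VersionedStore) Put(name string, data []byte) int {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := s.blobs[key]; !ok {
		s.blobs[key] = append([]byte(nil), data...)
	}
	s.versions[name] = append(s.versions[name], key)
	return len(s.versions[name])
}

// Get returns the content of the given 1-based version of name.
func (s *VersionedStore) Get(name string, version int) ([]byte, bool) {
	v := s.versions[name]
	if version < 1 || version > len(v) {
		return nil, false
	}
	return s.blobs[v[version-1]], true
}

func main() {
	s := NewVersionedStore()
	s.Put("notes.txt", []byte("draft"))
	s.Put("notes.txt", []byte("final"))
	s.Put("notes.txt", []byte("draft")) // reverting reuses the first blob
	old, _ := s.Get("notes.txt", 2)
	fmt.Printf("v2=%s versions=%d blobs=%d\n", old, len(s.versions["notes.txt"]), len(s.blobs))
}
```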
### Zero-Knowledge Encryption
**What it does**: End-to-end encryption where only you hold the keys.
**How it works**:
- AES-256 encryption (industry standard)
- Encryption keys never leave your control
- Optional per-bucket encryption keys
- Transparent encryption/decryption
**Benefits**:
- **Maximum security** - even storage admins can't read your data
- **Compliance ready** - meets strict security requirements
- **Data privacy** - your data, your control
- **International standards** - AES-256 encryption
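To make "keys never leave your control" concrete, here is a minimal client-side encryption sketch using Go's standard library. The description above names AES-256 but not a cipher mode, so the use of GCM here is an assumption, as are the `Encrypt` and `Decrypt` helper names; this is not OrcaS's own encryption code.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM cipher; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt seals plaintext on the client. The output is nonce || ciphertext,
// which is all the storage side ever sees.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt reverses Encrypt and fails if the key is wrong or the data was altered.
func Decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed data too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32) // generated and kept by the client
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := Encrypt(key, []byte("private notes"))
	if err != nil {
		panic(err)
	}
	plain, err := Decrypt(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d sealed bytes -> %q\n", len(sealed), plain)
}
```

An authenticated mode such as GCM also detects tampering, since decryption fails when the ciphertext has been modified.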
### Smart Compression
**What it does**: Automatically compresses data to save space.
**How it works**:
- Configurable compression algorithms (zstd, gzip, etc.)
- Compression applied before encryption
- Automatic detection of already-compressed data
- Per-bucket compression settings
**Benefits**:
- **Storage savings** - typically 30-70% reduction
- **Bandwidth savings** - less data to transfer
- **Smart defaults** - works out of the box
- **Configurable** - adjust per your needs
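How OrcaS detects already-compressed data is not specified above. One simple heuristic, shown here purely as an assumption, is to compress and keep the result only when it is actually smaller. The sketch uses gzip from Go's standard library (zstd would need a third-party package), and the `MaybeCompress` and `Restore` names are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/rand"
	"fmt"
	"io"
)

// MaybeCompress gzips data and keeps the result only when it is smaller.
// The returned flag must be stored alongside the data for later reads.
func MaybeCompress(data []byte) (out []byte, compressed bool, err error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err = zw.Write(data); err != nil {
		return nil, false, err
	}
	if err = zw.Close(); err != nil {
		return nil, false, err
	}
	if buf.Len() >= len(data) {
		return data, false, nil // incompressible: store as-is
	}
	return buf.Bytes(), true, nil
}

// Restore undoes MaybeCompress given the flag it returned.
func Restore(stored []byte, compressed bool) ([]byte, error) {
	if !compressed {
		return stored, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(stored))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	text := bytes.Repeat([]byte("orcas "), 1000) // highly repetitive
	noise := make([]byte, 6000)                  // random bytes behave like already-compressed data
	if _, err := rand.Read(noise); err != nil {
		panic(err)
	}
	for _, in := range [][]byte{text, noise} {
		stored, compressed, err := MaybeCompress(in)
		if err != nil {
			panic(err)
		}
		back, err := Restore(stored, compressed)
		if err != nil {
			panic(err)
		}
		fmt.Printf("in=%d stored=%d compressed=%v roundtrip=%v\n",
			len(in), len(stored), compressed, bytes.Equal(back, in))
	}
}
```

The ordering noted above matters: well-encrypted output looks random and no longer compresses, so compression has to run before encryption.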
## Architecture & Design
### Content Addressable Storage (CAS) Core
OrcaS is built on **Content Addressable Storage** principles, where data is stored and retrieved by its content hash rather than location.

**Key Benefits of CAS**:
1. **Automatic Deduplication**: Identical content stored once, referenced many times
2. **Integrity Verification**: Content hash ensures data hasn't been corrupted
3. **Efficient Versioning**: New versions only store changed content
4. **Simplified Backup**: Same content = same hash = no re-upload needed
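The integrity property follows directly from the addressing scheme: since the address is the hash of the content, any read can be re-checked against it. The sketch below shows this with a hypothetical in-memory `CAS` type and SHA-256 as the address function. OrcaS's real on-disk layout and hash choice may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

var (
	ErrNotFound = errors.New("no block at this address")
	ErrCorrupt  = errors.New("block does not match its address")
)

// CAS is a hypothetical in-memory content-addressable store.
type CAS struct{ blocks map[string][]byte }

func NewCAS() *CAS { return &CAS{blocks: map[string][]byte{}} }

// address derives the storage key from the content itself.
func address(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Put stores data under its hash; identical content maps to the same address.
func (c *CAS) Put(data []byte) string {
	addr := address(data)
	if _, ok := c.blocks[addr]; !ok {
		c.blocks[addr] = append([]byte(nil), data...)
	}
	return addr
}

// Get returns the block at addr after re-hashing it to verify integrity.
func (c *CAS) Get(addr string) ([]byte, error) {
	data, ok := c.blocks[addr]
	if !ok {
		return nil, ErrNotFound
	}
	if address(data) != addr {
		return nil, ErrCorrupt
	}
	return data, nil
}

func main() {
	c := NewCAS()
	addr := c.Put([]byte("hello"))
	_, err := c.Get(addr)
	fmt.Println("clean read error:", err)

	c.blocks[addr][0] ^= 0xff // simulate on-disk corruption
	_, err = c.Get(addr)
	fmt.Println("corrupted read error:", err)
}
```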
### System Architecture

### Instant Upload Flow

### Data Storage Structure
```
Storage Layout:
├── Metadata (SQLite)
│   ├── Objects (files, directories)
│   ├── DataInfo (content metadata)
│   ├── Versions (version history)
│   └── References (deduplication)
│
└── Data Blocks (File System)
    └── /
        └── /
            └── /
                └── _
```
## Performance Highlights
- **Instant Upload**: 99%+ faster for duplicate files (milliseconds vs minutes)
- **Small Files**: 10x+ performance improvement with packaging
- **Large Files**: Parallel chunk processing for optimal throughput
- **Storage Efficiency**: 30-70% space savings with compression + deduplication
- **Concurrent Operations**: Optimized for high concurrency
**Performance Test Reports**:
- [S3 API Performance Test Report](s3/docs/PERFORMANCE_TEST_REPORT.md)
- [VFS Performance Optimization Report](vfs/PERFORMANCE_OPTIMIZATION_FINAL.md)
## Path Management
OrcaS supports flexible path management, allowing you to use different storage paths within the same process. This is useful for multi-tenant scenarios or when managing multiple storage locations.
### Creating Handlers with Paths
#### LocalHandler
`NewLocalHandler` requires both `basePath` and `dataPath` parameters:
```go
import (
	"github.com/orcastor/orcas/core"
)

// Create a handler with custom paths:
//   basePath: path for the main database and bucket databases
//   dataPath: path for data file storage
handler := core.NewLocalHandler("/custom/base/path", "/custom/data/path")
defer handler.Close()
```
#### NoAuthHandler
`NewNoAuthHandler` requires only the `dataPath` parameter. `basePath` is automatically set to an empty string (no main database):
```go
// Create NoAuthHandler (bypasses authentication)
handler := core.NewNoAuthHandler("/custom/data/path")
defer handler.Close()
// Only dataPath is needed, basePath is always empty for NoAuth mode
```
### Creating Admins with Paths
#### LocalAdmin
`NewLocalAdmin` requires both `basePath` and `dataPath` parameters:
```go
// Create admin with custom paths
admin := core.NewLocalAdmin("/custom/base/path", "/custom/data/path")
// basePath: path for main database and bucket databases
// dataPath: path for data file storage
```
#### NoAuthAdmin
`NewNoAuthAdmin` requires only the `dataPath` parameter. `basePath` is automatically set to an empty string (no main database):
```go
// Create NoAuthAdmin (bypasses authentication and permission checks)
admin := core.NewNoAuthAdmin("/custom/data/path")
// Only dataPath is needed, basePath is always empty for NoAuth mode
```
### Path Usage Examples
```go
// The three examples are alternatives; distinct variable names let them
// share one scope without redeclaration errors.

// Example 1: current directory for both paths
localHandler := core.NewLocalHandler(".", ".")
localAdmin := core.NewLocalAdmin(".", ".")

// Example 2: separate paths for base and data
splitHandler := core.NewLocalHandler("/var/orcas/base", "/var/orcas/data")
splitAdmin := core.NewLocalAdmin("/var/orcas/base", "/var/orcas/data")

// Example 3: NoAuth mode (no main database, only a data path)
noAuthHandler := core.NewNoAuthHandler("/var/orcas/data")
noAuthAdmin := core.NewNoAuthAdmin("/var/orcas/data")
```
### Benefits
- **Multi-tenant Support**: Different contexts can use different storage paths
- **Flexible Configuration**: Specify paths directly when creating handlers/admins
- **NoAuth Mode**: Simplified path management for NoAuth handlers/admins (only dataPath needed)
- **Process Isolation**: Multiple storage locations in the same process
## Documentation
- [Full Documentation](https://orcastor.github.io/doc/)
- [VFS Mount Guide](vfs/MOUNT_GUIDE.md) - Complete guide for VFS filesystem mounting
- [S3 API Documentation](s3/README.md)
- [No Main Database Mode Guide](docs/NO_BASE_DB_GUIDE.md) - Run without main database (no user management)
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see [LICENSE](LICENSE) file for details.
## Why Star This Project?
- **Production Ready**: Battle-tested, actively maintained
- **High Performance**: Optimized for real-world workloads
- **Security First**: Zero-knowledge encryption built-in
- **Storage Efficient**: Automatic deduplication saves space and costs
- **Easy to Use**: S3-compatible API, VFS mount, comprehensive docs
- **Innovative**: Content Addressable Storage with instant deduplication
- **Actively Developed**: Regular updates and improvements
- **Open Source**: MIT licensed, community-driven
**Star us if you find this project useful!**