# tsink

<div align="center">

<p align="right">
  <img src="https://raw.githubusercontent.com/h2337/tsink/refs/heads/master/logo.svg" width="250" height="250">
</p>

**A high-performance embedded time-series database for Rust**

</div>

## Overview

tsink is a lightweight, high-performance time-series database engine written in Rust. It provides efficient storage and retrieval of time-series data with automatic compression, time-based partitioning, and thread-safe operations.

### Key Features

- **🚀 High Performance**: Gorilla compression achieves ~1.37 bytes per data point
- **🔒 Thread-Safe**: Lock-free reads and concurrent writes with configurable worker pools
- **💾 Flexible Storage**: Choose between in-memory and persistent disk storage
- **📊 Time Partitioning**: Automatic data organization by configurable time ranges
- **🏷️ Label Support**: Multi-dimensional metrics with key-value labels
- **📝 WAL Support**: Write-ahead logging for durability and crash recovery
- **🗑️ Auto-Retention**: Configurable automatic data expiration
- **🐳 Container-Aware**: cgroup support for optimal resource usage in containers
- **⚡ Zero-Copy Reads**: Memory-mapped files for efficient disk operations

## Installation

Add tsink to your `Cargo.toml`:

```toml
[dependencies]
tsink = "0.2.1"
```

## Quick Start

### Basic Usage

```rust
use tsink::{DataPoint, Row, StorageBuilder, Storage, TimestampPrecision};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create in-memory storage that interprets timestamps as seconds
    let storage = StorageBuilder::new()
        .with_timestamp_precision(TimestampPrecision::Seconds)
        .build()?;

    // Insert data points
    let rows = vec![
        Row::new("cpu_usage", DataPoint::new(1600000000, 45.5)),
        Row::new("cpu_usage", DataPoint::new(1600000060, 47.2)),
        Row::new("cpu_usage", DataPoint::new(1600000120, 46.8)),
    ];
    storage.insert_rows(&rows)?;

    // Note: Using timestamp 0 will automatically use the current timestamp
    // let row = Row::new("cpu_usage", DataPoint::new(0, 50.0));  // timestamp = current time

    // Query data points
    let points = storage.select("cpu_usage", &[], 1600000000, 1600000121)?;
    for point in points {
        println!("Timestamp: {}, Value: {}", point.timestamp, point.value);
    }

    storage.close()?;
    Ok(())
}
```

### Persistent Storage

```rust
use tsink::{StorageBuilder, Storage};
use std::time::Duration;

let storage = StorageBuilder::new()
    .with_data_path("./tsink-data")              // Enable disk persistence
    .with_partition_duration(Duration::from_secs(3600))  // 1-hour partitions
    .with_retention(Duration::from_secs(7 * 24 * 3600))  // 7-day retention
    .with_wal_buffer_size(8192)                  // 8KB WAL buffer
    .build()?;
```

### Multi-Dimensional Metrics with Labels

```rust
use tsink::{DataPoint, Label, Row};

// Create metrics with labels for detailed categorization
let rows = vec![
    Row::with_labels(
        "http_requests",
        vec![
            Label::new("method", "GET"),
            Label::new("status", "200"),
            Label::new("endpoint", "/api/users"),
        ],
        DataPoint::new(1600000000, 150.0),
    ),
    Row::with_labels(
        "http_requests",
        vec![
            Label::new("method", "POST"),
            Label::new("status", "201"),
            Label::new("endpoint", "/api/users"),
        ],
        DataPoint::new(1600000000, 25.0),
    ),
];

storage.insert_rows(&rows)?;

// Query a specific label combination (the full label set of the GET series)
let points = storage.select(
    "http_requests",
    &[
        Label::new("method", "GET"),
        Label::new("status", "200"),
        Label::new("endpoint", "/api/users"),
    ],
    1600000000,
    1600000100,
)?;
```

## Architecture

tsink uses a linear-order partition model that divides time-series data into time-bounded chunks:

```
┌─────────────────────────────────────────┐
│             tsink Storage               │
├─────────────────────────────────────────┤
│                                         │
│  ┌───────────────┐  Active Partition    │
│  │ Memory Part.  │◄─ (Writable)         │
│  └───────────────┘                      │
│                                         │
│  ┌───────────────┐  Buffer Partition    │
│  │ Memory Part.  │◄─ (Out-of-order)     │
│  └───────────────┘                      │
│                                         │
│  ┌───────────────┐                      │
│  │ Disk Part. 1  │◄─ Read-only          │
│  └───────────────┘   (Memory-mapped)    │
│                                         │
│  ┌───────────────┐                      │
│  │ Disk Part. 2  │◄─ Read-only          │
│  └───────────────┘                      │
│         ...                             │
└─────────────────────────────────────────┘
```

### Partition Lifecycle

1. **Active Partition**: Accepts new writes and is kept in memory
2. **Buffer Partition**: Handles out-of-order writes within a recent time window
3. **Flushing**: When the active partition is full, it is flushed to disk
4. **Disk Partitions**: Read-only and memory-mapped for efficient queries
5. **Expiration**: Old partitions are automatically removed once they exceed the retention period

### Benefits

- **Fast Queries**: Partitions outside the queried time range are skipped
- **Efficient Memory**: Only recent data stays in RAM
- **Low Write Amplification**: Sequential writes, no compaction needed
- **SSD-Friendly**: Minimal random I/O patterns

## Configuration

### StorageBuilder Options

| Option | Description | Default |
|--------|-------------|---------|
| `with_data_path` | Directory for persistent storage | None (in-memory) |
| `with_retention` | How long to keep data | 14 days |
| `with_timestamp_precision` | Timestamp precision (ns/μs/ms/s) | Nanoseconds |
| `with_max_writers` | Maximum concurrent write workers | CPU count |
| `with_write_timeout` | Timeout for write operations | 30 seconds |
| `with_partition_duration` | Time range per partition | 1 hour |
| `with_wal_enabled` | Enable write-ahead logging | true |
| `with_wal_buffer_size` | WAL buffer size in bytes | 4096 |

### Example Configuration

```rust
use std::time::Duration;
use tsink::{StorageBuilder, TimestampPrecision};

let storage = StorageBuilder::new()
    .with_data_path("/var/lib/tsink")
    .with_retention(Duration::from_secs(30 * 24 * 3600))  // 30 days
    .with_timestamp_precision(TimestampPrecision::Milliseconds)
    .with_max_writers(16)
    .with_write_timeout(Duration::from_secs(60))
    .with_partition_duration(Duration::from_secs(6 * 3600))  // 6 hours
    .with_wal_buffer_size(16384)  // 16KB
    .build()?;
```

## Compression

tsink uses the Gorilla compression algorithm, which is specifically designed for time-series data:

- **Delta-of-delta encoding** for timestamps
- **XOR compression** for floating-point values
- Typical compressed size: **~1.37 bytes per data point**

This means a data point that would normally take 16 bytes (an 8-byte timestamp plus an 8-byte value) is compressed to less than 2 bytes on average, a reduction of roughly 11.7×.

## Performance

Benchmarks on AMD Ryzen 9 7940HS (single core):

| Operation | Throughput | Latency |
|-----------|------------|---------|
| Insert single point | 10M ops/sec | ~100ns |
| Batch insert (1,000 points) | 15M points/sec | ~67μs/batch |
| Select 1K points | 4.5M queries/sec | ~220ns |
| Select 1M points | 3.4M queries/sec | ~290ns |

Run benchmarks yourself:

```bash
cargo bench
```

## Module Overview

### Core Modules

| Module | Description |
|--------|-------------|
| `storage` | Main storage engine with builder pattern configuration |
| `partition` | Time-based data partitioning (memory and disk implementations) |
| `encoding` | Gorilla compression for efficient time-series storage |
| `wal` | Write-ahead logging for durability and crash recovery |
| `label` | Multi-dimensional metric labeling and marshaling |

### Infrastructure Modules

| Module | Description |
|--------|-------------|
| `cgroup` | Container-aware CPU and memory limit detection |
| `mmap` | Platform-optimized memory-mapped file operations |
| `concurrency` | Worker pools, semaphores, and rate limiters |
| `bstream` | Bit-level streaming for compression algorithms |
| `list` | Thread-safe partition list management |

### Utility Modules

| Module | Description |
|--------|-------------|
| `error` | Comprehensive error types with context |

## Advanced Usage

### Concurrent Operations

tsink is designed for high-concurrency scenarios:

```rust
use std::sync::Arc;
use std::thread;

use tsink::{DataPoint, Row, Storage, StorageBuilder};

let storage = Arc::new(StorageBuilder::new().build()?);

// Spawn multiple writer threads that share one storage handle
let mut handles = vec![];
for worker_id in 0..10 {
    let storage = storage.clone();
    let handle = thread::spawn(move || {
        for i in 0..1000 {
            let row = Row::new(
                "concurrent_metric",
                DataPoint::new(1600000000 + i, i as f64),
            );
            storage.insert_rows(&[row]).unwrap();
        }
    });
    handles.push(handle);
}

// Wait for all threads
for handle in handles {
    handle.join().unwrap();
}
```

### Out-of-Order Insertion

tsink handles out-of-order data points automatically:

```rust
// Insert data points in random order
let rows = vec![
    Row::new("metric", DataPoint::new(1600000500, 5.0)),
    Row::new("metric", DataPoint::new(1600000100, 1.0)),  // Earlier timestamp
    Row::new("metric", DataPoint::new(1600000300, 3.0)),
    Row::new("metric", DataPoint::new(1600000200, 2.0)),  // Out of order
];

storage.insert_rows(&rows)?;

// Query returns points in correct chronological order
let points = storage.select("metric", &[], 1600000000, 1600001000)?;
assert!(points.windows(2).all(|w| w[0].timestamp <= w[1].timestamp));
```

### Container Deployment

tsink automatically detects container resource limits:

```rust
// tsink reads cgroup limits automatically
let storage = StorageBuilder::new()
    .with_max_writers(0)  // 0 = auto-detect from cgroup
    .build()?;

// In a container with 2 CPU limit, this will use 2 workers
// even if the host has 16 CPUs
```

### WAL Recovery

After a crash, tsink automatically recovers data from the WAL:

```rust
// First run - data is written to WAL
let storage = StorageBuilder::new()
    .with_data_path("/data/tsink")
    .build()?;
storage.insert_rows(&rows)?;
// Crash happens here...

// Next run - data is recovered from WAL automatically
let storage = StorageBuilder::new()
    .with_data_path("/data/tsink")  // Same path
    .build()?;  // Recovery happens here

// Previously inserted data is available
let points = storage.select("metric", &[], 0, i64::MAX)?;
```

## Examples

Run the comprehensive example showcasing all features:

```bash
cargo run --example comprehensive
```

Other examples:
- `basic_usage` - Simple insert and query operations
- `persistent_storage` - Disk-based storage with WAL
- `production_example` - Production-ready configuration

## Testing

Run the test suite:

```bash
# Run all tests
cargo test

# Show test output (disable stdout capture)
cargo test -- --nocapture

# Run a specific test module
cargo test storage::tests
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### Development Setup

```bash
# Clone the repository
git clone https://github.com/h2337/tsink.git
cd tsink

# Run tests
cargo test

# Run benchmarks
cargo bench

# Check formatting
cargo fmt -- --check

# Run clippy
cargo clippy -- -D warnings
```

## License

- MIT License