{"id":15031446,"url":"https://github.com/rana/svb","last_synced_at":"2026-01-02T03:21:04.679Z","repository":{"id":186264665,"uuid":"674776059","full_name":"rana/svb","owner":"rana","description":"Stream variable byte compression in Rust.","archived":false,"fork":false,"pushed_at":"2024-10-31T19:29:22.000Z","size":38,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-19T22:51:02.724Z","etag":null,"topics":["compression","compression-algorithm","compression-library","integer-compression","rust","rust-lang","rust-library","rustlang","x64"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rana.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-04T18:43:43.000Z","updated_at":"2024-10-31T19:29:26.000Z","dependencies_parsed_at":"2025-01-19T22:48:19.272Z","dependency_job_id":"be1d7dc8-d2bf-4971-9f21-6e1d3cfc381f","html_url":"https://github.com/rana/svb","commit_stats":null,"previous_names":["rana/svb"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rana%2Fsvb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rana%2Fsvb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rana%2Fsvb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rana%2Fsvb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rana","downlo
ad_url":"https://codeload.github.com/rana/svb/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243351839,"owners_count":20276908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","compression-algorithm","compression-library","integer-compression","rust","rust-lang","rust-library","rustlang","x64"],"created_at":"2024-09-24T20:15:41.890Z","updated_at":"2026-01-02T03:21:04.639Z","avatar_url":"https://github.com/rana.png","language":"Rust","readme":"# SVB - Stream Variable Byte Compression\n\nA high-performance integer compression library implemented in Rust that uses SIMD instructions to compress 32-bit unsigned integers into variable-length byte sequences. This implementation follows the Stream VByte algorithm described in [\"Stream VByte: Faster Byte-Oriented Integer Compression\"](https://arxiv.org/abs/1709.08990).\n\n## Key Features\n\n- **Variable-length Compression**: Efficiently compresses 32-bit integers into 1-4 bytes based on value magnitude\n- **SIMD Optimization**: Uses x86_64 SIMD instructions for parallel processing of integer blocks\n- **Dual Implementation**: Provides both scalar and SIMD variants for maximum compatibility\n- **Zero-Copy Design**: Employs unsafe Rust for direct memory manipulation without unnecessary copying\n- **Memory-efficient**: Uses compact control headers (2 bits per integer) to record each integer's encoded length\n\n## Technical Implementation\n\n### Compression Format\n\nThe compressed data format consists of three sections:\n1. Total Integer Count (usize bytes)\n2. 
Control Headers (compressed size indicators)\n3. Compressed Data (variable-length encoded integers)\n\n### Control Headers\n\nEach control header uses 2 bits to indicate compression level:\n- `00` (0): 1-byte compression\n- `01` (1): 2-byte compression\n- `10` (2): 3-byte compression\n- `11` (3): 4-byte compression (uncompressed)\n\nHeaders are packed four per byte, with bits ordered right-to-left within each byte.\n\n### Performance Optimizations\n\n1. **SIMD Processing**\n   - Processes 8 integers simultaneously using 128-bit SIMD registers\n   - Uses specialized x86_64 instructions for parallel comparisons and bit manipulation\n   - Includes lookup tables for rapid compression length calculation\n\n2. **Memory Management**\n   - Direct memory manipulation using unsafe Rust for zero-copy operations\n   - Efficient slice manipulation without unnecessary allocations\n   - Careful pointer arithmetic for optimal performance\n\n3. **Error Handling**\n   - Comprehensive validation of input data\n   - Robust error handling using the `anyhow` crate\n   - Proper bounds checking during compression/decompression\n\n## Implementation Details\n\n### Core Components\n\n1. **Scalar Implementation (`scl.rs`)**\n   - Traditional single-integer processing\n   - Fallback implementation for non-SIMD platforms\n   - Clear, maintainable code for reference\n\n2. **SIMD Implementation (`smd.rs`)**\n   - Leverages x86_64 SIMD instructions\n   - Processes multiple integers in parallel\n   - Uses lookup tables for optimization\n\n3. **Common Utilities (`lib.rs`)**\n   - Shared constants and utilities\n   - Header calculation functions\n   - Type definitions and common traits\n\n### Testing\n\n- Comprehensive unit tests for both implementations\n- Property-based testing with random input data\n- Edge case validation\n- Performance benchmarking comparisons\n\n## Technical Achievements\n\n1. 
**Memory Efficiency**\n   - Optimal compression ratios for different integer ranges\n   - Minimal memory overhead for control structures\n   - Efficient handling of large datasets\n\n2. **Performance**\n   - SIMD parallelization for up to 8x throughput\n   - Minimal branching in critical paths\n   - Efficient bit manipulation techniques\n\n3. **Code Quality**\n   - Type-safe Rust implementation\n   - Clear separation of concerns\n   - Well-documented interfaces\n   - Comprehensive error handling\n\n## Usage\n\n```rust\nuse svb::{smd, scl};\n\n// SIMD-accelerated compression\nlet compressed = smd::enc(\u0026integers)?;\n\n// SIMD-accelerated decompression\nlet decompressed = smd::dec(\u0026compressed)?;\n\n// Scalar fallback compression\nlet compressed = scl::enc(\u0026integers)?;\n\n// Scalar fallback decompression\nlet decompressed = scl::dec(\u0026compressed)?;\n```\n\n## Skills Demonstrated\n\n- Advanced Rust programming\n- SIMD optimization\n- Low-level memory management\n- Algorithm implementation\n- Performance optimization\n- Systems programming\n- Technical documentation\n- Test-driven development\n\n## References\n\n- [Stream VByte: Faster Byte-Oriented Integer Compression](https://arxiv.org/abs/1709.08990)\n- [Original C Implementation](https://github.com/lemire/streamvbyte)\n\n\n## Byte Layout\n\nBytes are organized as `total integer count`, followed by `control headers`, followed by the `compressed data`.\n\n| Total Integer Count | Control Headers | Compressed Data |\n|---------------------|-----------------|-----------------|\n| `usize bytes`       | `bytes`         | `bytes`         |\n\n\u003e Byte layout for svb compression.\n\n## Control header\n\n`Two bits` indicate how much compression occurs in a 4-byte integer. 
\n\nThe two bits are called a control header.\n\n| Compression Size      | 1 byte | 2 bytes | 3 bytes | 4 bytes |\n|-----------------------|--------|---------|---------|---------|\n| Bit value             | `00`   | `01`    | `10`    | `11`    |\n| Integer value of bits | 0      | 1       | 2       | 3       |\n\n\u003e Compression size represented as two bits. \n\n\n\nA header byte holds four control headers.\n\nWithin the header byte, bit values are indexed from right-to-left.\n\n| Header Byte Index  | 3    | 2    | 1    | 0    |\n|--------------------|------|------|------|------|\n| Example bit values | `00` | `00` | `11` | `01` |\n\n\u003e A header byte containing four header values. The right-most two bits indicate compression size for the first integer.\n\n## Development notes\n\nLemire blog: [Stream VByte: breaking new speed records for integer compression](https://lemire.me/blog/2017/09/27/stream-vbyte-breaking-new-speed-records-for-integer-compression/)\n\narXiv article: [Stream VByte: Faster Byte-Oriented Integer Compression](https://arxiv.org/abs/1709.08990)\n\nLemire C code: [streamvbyte](https://github.com/lemire/streamvbyte)\n* Good overview of format in README.\n\nPierce Rust code: [stream-vbyte-rust](https://bitbucket.org/marshallpierce/stream-vbyte-rust/src/master/)\n\n\n## File Tree\n.\n├── Cargo.lock\n├── Cargo.toml\n├── LICENSE\n├── README.md\n└── svb\n    ├── Cargo.toml\n    └── src\n        ├── lib.rs\n        ├── scl.rs\n        └── smd.rs","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frana%2Fsvb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frana%2Fsvb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frana%2Fsvb/lists"}