{"id":30180853,"url":"https://github.com/infinilabs/zipora","last_synced_at":"2026-04-05T06:01:34.372Z","repository":{"id":307966485,"uuid":"1031242049","full_name":"infinilabs/zipora","owner":"infinilabs","description":"Zipora – High-performance Rust compression with In-place compressed-access (no full decompression).","archived":false,"fork":false,"pushed_at":"2026-04-03T05:24:15.000Z","size":4226,"stargazers_count":13,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-03T05:38:46.740Z","etag":null,"topics":["algorithms","compression","data-structures","memory-safety","performance","rust","simd","succinct-data-structures","zero-copy","zipora"],"latest_commit_sha":null,"homepage":"https://docs.rs/zipora","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/infinilabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-03T10:25:22.000Z","updated_at":"2026-04-03T05:24:19.000Z","dependencies_parsed_at":"2025-12-17T14:12:49.952Z","dependency_job_id":null,"html_url":"https://github.com/infinilabs/zipora","commit_stats":null,"previous_names":["infinilabs/zipora"],"tags_count":20,"template":false,"template_full_name":null,"purl":"pkg:github/infinilabs/zipora","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinilabs%2Fzipora","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinilabs%2Fzipora/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinilabs%2Fzipora/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinilabs%2Fzipora/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/infinilabs","download_url":"https://codeload.github.com/infinilabs/zipora/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinilabs%2Fzipora/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31426193,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T02:22:46.605Z","status":"ssl_error","status_checked_at":"2026-04-05T02:22:33.263Z","response_time":75,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","compression","data-structures","memory-safety","performance","rust","simd","succinct-data-structures","zero-copy","zipora"],"created_at":"2025-08-12T08:05:46.305Z","updated_at":"2026-04-05T06:01:34.345Z","avatar_url":"https://github.com/infinilabs.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Zipora\n\n[中文](README_cn.md)\n\n[![Build Status](https://github.com/infinilabs/zipora/workflows/CI/badge.svg)](https://github.com/infinilabs/zipora/actions)\n[![License](https://img.shields.io/badge/license-BDL--1.0-blue.svg)](LICENSE)\n[![Rust 
Version](https://img.shields.io/badge/rust-1.88+-orange.svg)](https://www.rust-lang.org)\n\nHigh-performance Rust data structures and compression algorithms with memory safety guarantees.\n\n## Key Features\n\n- **High Performance**: Zero-copy operations, SIMD optimizations (AVX2, AVX-512), cache-friendly layouts\n- **Memory Safety**: 99.8% unsafe block documentation coverage, all production unsafe blocks annotated with `// SAFETY:` comments\n- **Secure Memory Management**: Production-ready memory pools with thread safety and RAII\n- **Blob Storage**: 8 specialized stores with trie-based indexing and compression\n- **Succinct Data Structures**: 12 rank/select variants\n- **Specialized Containers**: 13+ containers (VecTrbSet/Map, MinimalSso, SortedUintVec, LruMap, etc.)\n- **Hash Maps**: Golden ratio optimized, string-optimized, cache-optimized implementations\n- **Advanced Tries**: Double-Array (DoubleArrayTrie, XOR transitions), LOUDS, Critical-Bit (BMI2), Patricia tries with rank/select, NestTrieDawg\n- **Compression**: PA-Zip, Huffman O0/O1/O2, FSE, rANS, ZSTD integration\n- **C FFI Support**: Complete C API (`--features ffi`)\n\n## Quick Start\n\n```toml\n[dependencies]\nzipora = \"3.1.1\"\n\n# With C FFI bindings\nzipora = { version = \"3.1.1\", features = [\"ffi\"] }\n\n# AVX-512 (nightly only)\nzipora = { version = \"3.1.1\", features = [\"avx512\"] }\n```\n\n### Basic Usage\n\n```rust\nuse zipora::*;\n\n// High-performance vector\nlet mut vec = FastVec::new();\nvec.push(42).unwrap();\n\n// Zero-copy strings with SIMD hashing\nlet s = FastStr::from_string(\"hello world\");\nprintln!(\"Hash: {:x}\", s.hash_fast());\n\n// Intelligent rank/select with automatic optimization\nlet mut bv = BitVector::new();\nfor i in 0..1000 { bv.push(i % 7 == 0).unwrap(); }\nlet adaptive_rs = AdaptiveRankSelect::new(bv).unwrap();\nlet rank = adaptive_rs.rank1(500);\n\n// Unified Trie - Strategy-based configuration\nuse zipora::fsa::{ZiporaTrie, ZiporaTrieConfig, Trie};\n\nlet 
mut trie = ZiporaTrie::new();\ntrie.insert(b\"hello\").unwrap();\nassert!(trie.contains(b\"hello\"));\n\n// Unified Hash Map - Strategy-based configuration\nuse zipora::hash_map::{ZiporaHashMap, ZiporaHashMapConfig};\n\nlet mut map = ZiporaHashMap::new();\nmap.insert(\"key\", \"value\").unwrap();\n\n// Blob storage with compression\nlet config = ZipOffsetBlobStoreConfig::performance_optimized();\nlet mut builder = ZipOffsetBlobStoreBuilder::with_config(config).unwrap();\nbuilder.add_record(b\"Compressed data\").unwrap();\nlet store = builder.finish().unwrap();\n\n// Entropy coding\nlet encoder = HuffmanEncoder::new(b\"sample data\").unwrap();\nlet compressed = encoder.encode(b\"sample data\").unwrap();\n\n// String utilities\nuse zipora::string::{join_str, hex_encode, hex_decode, words, decimal_strcmp};\nlet joined = join_str(\", \", \u0026[\"hello\", \"world\"]);\nassert_eq!(joined, \"hello, world\");\n```\n\n## Documentation\n\n### Core Components\n- **[Containers](docs/CONTAINERS.md)** - Specialized containers (FastVec, ValVec32, IntVec, LruMap, etc.)\n- **[Hash Maps](docs/HASH_MAPS.md)** - ZiporaHashMap, GoldHashMap with strategy-based configuration\n- **[Blob Storage](docs/BLOB_STORAGE.md)** - 8 blob store variants with trie indexing and compression\n- **[Memory Management](docs/MEMORY_MANAGEMENT.md)** - SecureMemoryPool, MmapVec, five-level pools\n\n### Algorithms \u0026 Processing\n- **[Algorithms](docs/ALGORITHMS.md)** - Radix sort, suffix arrays, set operations, cache-oblivious algorithms, SIMD popcount\n- **[Compression](docs/COMPRESSION.md)** - PA-Zip, Huffman, FSE, rANS, real-time compression\n- **[String Processing](docs/STRING_PROCESSING.md)** - SIMD string operations, pattern matching\n\n### System Architecture\n- **[Concurrency](docs/CONCURRENCY.md)** - Pipeline processing, work-stealing, parallel trie building\n- **[Error Handling](docs/ERROR_HANDLING.md)** - Error classification, automatic recovery strategies\n- 
**[Configuration](docs/CONFIGURATION.md)** - Rich configuration APIs, presets, validation\n- **[SIMD Framework](docs/SIMD.md)** - 6-tier SIMD with AVX2/BMI2/POPCNT support\n\n### Integration\n- **[I/O \u0026 Serialization](docs/IO_SERIALIZATION.md)** - Stream processing, endian handling, varint encoding\n- **[C FFI](docs/FFI.md)** - C API for interoperability\n\n### Reference\n- **[Porting Status](docs/PORTING_STATUS.md)** - Feature implementation status\n\n## Features\n\n| Feature | Default | Description |\n|---------|---------|-------------|\n| `simd` | Yes | SIMD optimizations (AVX2, SSE4.2) |\n| `mmap` | Yes | Memory-mapped file support |\n| `zstd` | Yes | ZSTD compression |\n| `serde` | Yes | Serialization support (serde, serde_json, bincode) |\n| `lz4` | Yes | LZ4 compression |\n| `async` | Yes | Async runtime (tokio) for concurrency, pipeline, real-time compression |\n| `ffi` | No | C FFI bindings |\n| `avx512` | No | AVX-512 (nightly only) |\n| `nightly` | No | Nightly-only optimizations |\n\n## Build \u0026 Test\n\n```bash\n# Build (default features)\ncargo build --release\n\n# Build with all features including FFI\ncargo build --release --all-features\n\n# Test\ncargo test --lib\n\n# Sanity check (all feature combinations, debug + release)\nmake sanity\n\n# Benchmark (release only)\ncargo bench\n\n# Lint\ncargo clippy --all-targets --all-features -- -D warnings\n```\n\n## Verified Performance\n\n\u003e **Test Machine**: AMD EPYC 7B13 (Zen 3), 64 vCPUs, 117 GB RAM, AVX2/BMI2/POPCNT, rustc 1.91.1, Linux 6.17.\n\u003e Results vary across hardware — Intel may differ on BMI2 (native vs microcode), ARM lacks x86 SIMD paths.\n\u003e Run `cargo bench` to reproduce on your own hardware.\n\n### Trie / Term Dictionary (DoubleArrayTrie)\n\n| Operation (5000 terms) | Time | Per-op |\n|------------------------|------|--------|\n| Lookup hit | 103 µs | 20.6 ns/lookup |\n| Lookup miss | 19 µs | 3.8 ns/lookup |\n| Prefix search (5 queries) | 14 µs | 2.8 µs/query |\n| 
Insert (incremental) | 967 µs | 193 ns/term |\n\nXOR transitions, terminal bit in NInfo, unsafe `get_unchecked` — 3 ops, 1 branch per transition.\nSupports arbitrary binary keys including `\\x00` bytes.\n\n### BitVector (scatter + popcount)\n\n| Operation (1M bits) | Zipora | Scalar Vec\\\u003cu64\\\u003e | Ratio |\n|---------------------|--------|-------------------|-------|\n| Scatter + popcount (20×5K docs) | **1.08 ms** | 1.35 ms | **0.80x (faster)** |\n| Allocation (`with_size(1M, false)`) | **155 µs** | 247 µs | **0.63x (faster)** |\n| Popcount only (50% density) | 9.25 µs | 9.26 µs | Tied |\n\n`alloc_zeroed` (calloc), zero-copy `from_blocks`, SIMD `popcount_slice` (AVX-512 / POPCNT / AVX2 / NEON).\n\n### popcount_slice (SIMD population count)\n\n| Slice size | Time | Rate |\n|------------|-----------|------|\n| 16 words (128B) | 4.4 ns | 3.7 Gwords/s |\n| 781 words (6KB, engine union buffer) | 150 ns | 5.2 Gwords/s |\n| 10K words (80KB) | 1.9 µs | 5.4 Gwords/s |\n\nMulti-tier dispatch: AVX-512 VPOPCNTDQ → hardware POPCNT → AVX2 vpshufb → NEON → scalar.\nUsed internally by `BitVector::count_ones()` and available as `zipora::algorithms::popcount_slice`.\n\n### Succinct Data Structures\n\n| Operation | Zipora | Baseline | Speedup |\n|-----------|--------|----------|---------|\n| Rank1 query (100K bits) | 192 ns | — | ~5.2 Gops/s |\n| Select1 query (100K bits) | 5.4 ms / 100K queries | — | ~18.5 Mops/s |\n| Bulk rank (SIMD, 50K) | 8.4 µs | 84.1 µs (individual) | **10x** |\n| Bulk bitwise ops (SIMD, 50K) | 3.1 µs | 128.4 µs (individual) | **41x** |\n| Range set (SIMD, 50K) | 3.2 µs | 17.9 µs (individual) | **5.6x** |\n\n### Containers vs std\n\n| Operation | Zipora | std | Ratio |\n|-----------|--------|-----|-------|\n| ValVec32 push (100K) | 119 µs | 120 µs | 1.0x |\n| ValVec32 random access (100K) | 706 ns | 729 ns | **0.97x** |\n| ValVec32 iteration (10K) | 778 ns | 783 ns | 1.0x |\n| ValVec32 bulk extend (100K) | 21.8 µs | 28.7 µs | **0.76x** |\n| 
SmallMap insert+lookup (8 keys) | 444 ns | 805 ns (HashMap) | **1.8x** |\n| SmallMap lookup-intensive | 36.9 µs | 141.7 µs (HashMap) | **3.8x** |\n| CircularQueue push+pop (100K) | 326 µs | 381 µs (VecDeque) | **0.86x** |\n| FixedStr16Vec push (100K) | 755 µs | 5,906 µs (Vec\\\u003cString\\\u003e) | **7.8x** |\n| SortableStrVec sort (5K) | 390 µs | 448 µs (Vec\\\u003cString\\\u003e) | **1.15x** |\n\n### Entropy Coding (65KB input)\n\n| Algorithm | Entropy 0.5 | Entropy 2.0 | Entropy 6.0 |\n|-----------|-------------|-------------|-------------|\n| Huffman O0 | 1,124 µs | 1,235 µs | 1,720 µs |\n| Huffman O1 (x1 stream) | 188 µs | 173 µs | 188 µs |\n| rANS64 | 405 µs | 351 µs | 426 µs |\n\n### Cache (LRU vs HashMap)\n\n| Operation | LruMap | HashMap | Note |\n|-----------|--------|---------|------|\n| Hot get (cap=64, 10K ops) | 5.7 µs | 152 µs | **26x** faster (hot-set fits in cache) |\n| Hot get (cap=1024, 10K ops) | 94.6 µs | 152 µs | **1.6x** faster |\n| Insert (cap=64, 10K ops) | 1,897 µs | 1,177 µs | 0.62x (eviction overhead) |\n\n## Dependencies\n\nMinimal dependency footprint by design:\n- **Core**: `bytemuck`, `thiserror`, `log`, `ahash`, `rayon`, `libc`, `once_cell`, `raw-cpuid`\n- **Default**: `memmap2` (mmap), `zstd`, `lz4_flex`, `serde`/`serde_json`/`bincode`, `tokio` (async)\n- **Optional**: `cbindgen` (ffi)\n- **Removed**: `crossbeam-utils`, `parking_lot`, `uuid`, `num_cpus`, `async-trait`, `futures` (all replaced with std or eliminated)\n\n## Building a Search Engine with Zipora\n\nZipora provides the core building blocks for high-performance search engines: succinct posting lists, compressed document storage, trie-based term dictionaries, SIMD-accelerated query processing, and multi-threaded indexing pipelines.\n\n### Architecture Overview\n\n```\n Documents                    Query\n     |                          |\n     v                          v\n [Tokenizer]              [Query Parser]\n     |                          |\n     v                 
         v\n [Term Dictionary]  ---\u003e  [Term Lookup]        ZiporaTrie / DoubleArrayTrie\n     |                          |\n     v                          v\n [Inverted Index]  ---\u003e  [Posting Lists]       UintVecMin0 / SortedUintVec / BitVector\n     |                          |\n     v                          v\n [Document Store]  ---\u003e  [Doc Retrieval]       DictZipBlobStore / MixedLenBlobStore\n     |                          |\n     v                          v\n [Compression]            [Ranking]            HuffmanEncoder / Rans64Encoder\n```\n\n### 1. Term Dictionary (Trie-based)\n\nUse `DoubleArrayTrie` (double-array trie with XOR transitions) for maximum performance — 8 bytes per state with O(1) transitions per byte. Supports arbitrary binary keys including `\\x00` bytes. For large vocabularies, it's 3-5x more memory-efficient than `HashMap\u003cString, u32\u003e` while providing faster lookups.\n\n```rust\nuse zipora::DoubleArrayTrie;\n\n// Build term dictionary during indexing\nlet mut dict = DoubleArrayTrie::new();\n\nfor term in terms.iter() {\n    dict.insert(term.as_bytes()).unwrap();\n}\n\n// Query-time lookup: O(|key|) with O(1) per-byte transitions\nassert!(dict.contains(b\"search\"));\n\n// For key-value storage (term → term_id)\n// DoubleArrayTrieMap\u003cV\u003e requires V: MapValue (configurable sentinel for zero-cost Option\u003cV\u003e elimination)\n// Built-in impls: i32 (MIN), u32 (MAX), i64 (MIN), u64 (MAX), usize (MAX)\nuse zipora::DoubleArrayTrieMap;\nlet mut term_ids: DoubleArrayTrieMap\u003cu32\u003e = DoubleArrayTrieMap::new();\nfor (term_id, term) in terms.iter().enumerate() {\n    term_ids.insert(term.as_bytes(), term_id as u32).unwrap();\n}\nlet id = term_ids.get(b\"search\");\n```\n\n`DoubleArrayTrieMap\u003cV\u003e` uses the `MapValue` trait with a compile-time sentinel constant instead of `Option\u003cV\u003e`, halving the values array memory footprint for primitive types (e.g., 4 bytes vs 8 bytes per slot for 
`i32`). The sentinel is monomorphized to a single `cmp` instruction — zero runtime cost.\n\nFor alternative trie strategies (LOUDS, Patricia, CritBit), use `ZiporaTrie` with explicit config. For compressed term storage with prefix sharing, use `NestLoudsTrieBlobStore`.\n\n### 2. Inverted Index (Posting Lists)\n\nChoose the right container based on posting list characteristics:\n\n```rust\nuse zipora::containers::{UintVecMin0, ZipIntVec};\nuse zipora::blob_store::SortedUintVec;\nuse zipora::BitVector;\n\n// Option A: UintVecMin0 — variable-width packed integers (2-58 bits per value)\n// Best for: medium-length posting lists with bounded doc IDs\nlet mut postings = UintVecMin0::new();\nfor doc_id in matching_docs {\n    postings.push(doc_id);\n}\n// Access: postings.get(i) — O(1), cache-friendly sequential layout\n\n// Option B: SortedUintVec — delta + block compression for sorted doc IDs\n// Best for: long posting lists (60-80% space reduction vs raw u32)\n\n// Option C: BitVector + RankSelect — bitmap representation\n// Best for: high-frequency terms (\u003e10% of docs), boolean queries\nlet mut bitmap = BitVector::new();\nfor i in 0..num_docs {\n    bitmap.push(doc_ids.contains(\u0026i)).unwrap();\n}\n```\n\n### 3. 
Boolean Query Processing (Set Operations)\n\nSIMD-accelerated set operations on posting lists — **up to 41x faster** than element-by-element processing for bitwise operations.\n\n```rust\nuse zipora::algorithms::set_ops::{\n    multiset_intersection,   // AND queries\n    multiset_union,          // OR queries\n    multiset_difference,     // NOT queries\n    multiset_fast_intersection, // adaptive: picks best algo by size ratio\n};\n\n// AND query: \"rust\" AND \"search\"\nlet result = multiset_intersection(\u0026postings_rust, \u0026postings_search);\n\n// For skewed sizes (one term rare, one common), use adaptive intersection\n// Automatically picks linear merge vs binary search based on |A|/|B| ratio\nlet result = multiset_fast_intersection(\u0026rare_term, \u0026common_term);\n\n// Bulk bitwise on rank/select bitvectors (41x faster with SIMD)\nuse zipora::AdaptiveRankSelect;\nlet rs = AdaptiveRankSelect::new(bitmap).unwrap();\nlet rank = rs.rank1(doc_id);   // count docs before this ID — O(1)\nlet pos = rs.select1(rank);    // find N-th matching doc — O(log n)\n```\n\n### 4. Document Storage (Compressed Blob Stores)\n\nStore and retrieve documents with dictionary compression (PA-Zip):\n\n```rust\nuse zipora::DictZipBlobStore;\nuse zipora::blob_store::{MixedLenBlobStore, PlainBlobStore, BlobStore};\n\n// DictZipBlobStore: best compression for similar documents (web pages, logs)\n// Learns a shared dictionary from training data, then compresses each record\nlet store = DictZipBlobStore::builder()\n    .build_from_records(\u0026documents)\n    .unwrap();\n\n// Retrieve: zero-copy access via mmap\nlet doc = store.get(doc_id).unwrap();\n\n// MixedLenBlobStore: optimal for mixed fixed/variable-length records\n// Automatically selects storage strategy based on record size distribution\n\n// PlainBlobStore: uncompressed, fastest retrieval for hot data\n```\n\n### 5. 
Entropy Coding (Posting List Compression)\n\nCompress posting list deltas with Huffman or rANS:\n\n```rust\nuse zipora::HuffmanEncoder;\nuse zipora::Rans64Encoder;\n\n// Huffman O0: simple, fast encoding (~1.1 ms per 65KB)\nlet encoder = HuffmanEncoder::new(\u0026training_data).unwrap();\nlet compressed = encoder.encode(\u0026delta_encoded_postings).unwrap();\n\n// Huffman O1: context-aware, better compression for structured data\n// Particularly effective for posting list deltas with skewed distributions\n\n// rANS: highest compression ratio, slightly slower\nlet rans = Rans64Encoder::new(\u0026training_data).unwrap();\nlet compressed = rans.encode(\u0026data).unwrap();\n```\n\n### 6. Multi-threaded Indexing\n\nParallelize index building with rayon and zipora's pipeline processing:\n\n```rust\nuse rayon::prelude::*;\nuse zipora::algorithms::MultiWayMerge;\n\n// Parallel document processing: each thread builds a segment\nlet segments: Vec\u003c_\u003e = document_batches\n    .par_iter()\n    .map(|batch| {\n        let mut segment_index = SegmentIndex::new();\n        for doc in batch {\n            let terms = tokenize(doc);\n            for term in terms {\n                segment_index.add(term, doc.id);\n            }\n        }\n        segment_index\n    })\n    .collect();\n\n// Merge segments using k-way merge (loser tree)\nuse zipora::EnhancedLoserTree;\n// EnhancedLoserTree provides O(log k) per element for k-way merge\n// Ideal for merging sorted posting lists from parallel index segments\n```\n\nFor async pipeline processing (requires `async` feature):\n\n```rust\nuse zipora::Pipeline;\n// Pipeline stages: parse → tokenize → index → compress → flush\n// Each stage runs concurrently with work-stealing load balancing\n```\n\n### 7. 
Memory-Mapped Index Files\n\nServe large indices directly from disk without loading into RAM:\n\n```rust\nuse zipora::memory::MmapVec;\n\n// Memory-map an index file — OS manages paging\nlet index: MmapVec\u003cu32\u003e = MmapVec::open(\"postings.idx\").unwrap();\n\n// Random access is backed by the page cache\nlet doc_id = index[position];\n\n// For blob stores, use mmap-backed storage\n// DictZipBlobStore and NestLoudsTrieBlobStore support mmap natively\n```\n\n### 8. Query Result Caching\n\nLRU cache for frequently accessed posting lists — **26x faster** hot-set retrieval vs HashMap:\n\n```rust\nuse zipora::containers::specialized::LruMap;\n\n// Cache hot posting lists\nlet mut cache: LruMap\u003cString, Vec\u003cu32\u003e\u003e = LruMap::new(1024);\n\nfn get_postings(term: \u0026str, cache: \u0026mut LruMap\u003cString, Vec\u003cu32\u003e\u003e) -\u003e Vec\u003cu32\u003e {\n    if let Some(cached) = cache.get(term) {\n        return cached.clone(); // 26x faster than HashMap for hot keys\n    }\n    let postings = load_from_disk(term);\n    cache.insert(term.to_string(), postings.clone());\n    postings\n}\n```\n\n### 9. 
String Processing for Tokenization\n\n```rust\nuse zipora::SortableStrVec;\nuse zipora::string::{decimal_strcmp, words};\n\n// Arena-based string storage: 7.8x faster than Vec\u003cString\u003e for push (100K strings)\nlet mut terms = SortableStrVec::new();\nfor token in document.split_whitespace() {\n    terms.push(token);\n}\nterms.sort(); // In-place sort, 1.15x faster than Vec\u003cString\u003e::sort\n\n// For small lookup tables (field names, stop words), SmallMap is 3.8x faster\nuse zipora::SmallMap;\nlet mut stop_words = SmallMap::new();\nstop_words.insert(\"the\", true);\nstop_words.insert(\"and\", true);\n```\n\n### Component Selection Guide\n\n| Search Engine Component | Zipora Type | When to Use |\n|------------------------|-------------|-------------|\n| Term dictionary | `DoubleArrayTrie` | Default choice, 8 bytes/state, XOR transitions |\n| Term dictionary (alternatives) | `ZiporaTrie` | LOUDS/Patricia/CritBit via config |\n| Short posting lists | `UintVecMin0` | Variable-width, \u003c1M doc IDs |\n| Long posting lists | `SortedUintVec` | Delta-compressed sorted IDs |\n| Boolean posting lists | `BitVector` + `AdaptiveRankSelect` | High-frequency terms, bitwise ops |\n| AND/OR/NOT queries | `set_ops::multiset_*` | Sorted posting list intersection |\n| Bulk bitwise queries | SIMD rank/select | 10-41x faster than scalar |\n| Document storage | `DictZipBlobStore` | Best compression for similar docs |\n| Document storage (fast) | `PlainBlobStore` | Uncompressed, fastest retrieval |\n| Posting compression | `HuffmanEncoder` | Fast encode/decode |\n| Posting compression | `Rans64Encoder` | Best compression ratio |\n| Query cache | `LruMap` | 26x faster hot-set access |\n| Small lookups | `SmallMap` | 3.8x faster for ≤8 keys |\n| String storage | `SortableStrVec` / `FixedStr16Vec` | Arena-based, 7.8x vs Vec\\\u003cString\\\u003e |\n| Index files | `MmapVec` | Disk-backed, OS-managed paging |\n| Segment merge | `MultiWayMerge` / `EnhancedLoserTree` | K-way 
merge of sorted lists |\n| Parallel indexing | `rayon` + `Pipeline` | Multi-threaded segment building |\n\n## License\n\nBusiness Source License 1.0 - See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinilabs%2Fzipora","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfinilabs%2Fzipora","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinilabs%2Fzipora/lists"}