{"id":50128699,"url":"https://github.com/CrystallineCore/Biscuit","last_synced_at":"2026-06-26T13:01:13.655Z","repository":{"id":362341816,"uuid":"1110746265","full_name":"CrystallineCore/Biscuit","owner":"CrystallineCore","description":"Biscuit – a high-performance, bitmap-based deterministic index for PostgreSQL ","archived":false,"fork":false,"pushed_at":"2026-06-26T10:58:51.000Z","size":9807,"stargazers_count":293,"open_issues_count":0,"forks_count":4,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-06-26T12:19:10.393Z","etag":null,"topics":["indexing","open-source","postgresql","postgresql-extension","wildcard"],"latest_commit_sha":null,"homepage":"https://pgxn.org/dist/biscuit/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CrystallineCore.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["crystallinecore"]}},"created_at":"2025-12-05T16:52:12.000Z","updated_at":"2026-06-26T10:52:00.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/CrystallineCore/Biscuit","commit_stats":null,"previous_names":["crystallinecore/biscuit"],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/CrystallineCore/Biscuit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrystallineCore%2FBiscuit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrystallineCore%2FBiscuit/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrystallineCore%2FBiscuit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrystallineCore%2FBiscuit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CrystallineCore","download_url":"https://codeload.github.com/CrystallineCore/Biscuit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrystallineCore%2FBiscuit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34817641,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-26T02:00:06.560Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["indexing","open-source","postgresql","postgresql-extension","wildcard"],"created_at":"2026-05-23T21:00:29.530Z","updated_at":"2026-06-26T13:01:13.649Z","avatar_url":"https://github.com/CrystallineCore.png","language":"C","funding_links":["https://github.com/sponsors/crystallinecore"],"categories":["C"],"sub_categories":[],"readme":"# Biscuit - High-Performance Pattern Matching Index for PostgreSQL\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![PostgreSQL: 16+](https://img.shields.io/badge/PostgreSQL-16%2B-blue.svg)](https://www.postgresql.org/)\n[![Read the 
Docs](https://img.shields.io/badge/Read%20the%20Docs-8CA1AF?logo=readthedocs\u0026logoColor=fff)](https://biscuit.readthedocs.io/)\n\n\n**Biscuit** is a specialized PostgreSQL index access method (IAM) designed for blazing-fast pattern matching on `LIKE` and `ILIKE` queries, with native support for multi-column searches. It eliminates the recheck overhead of trigram indexes while delivering significant performance improvements on wildcard-heavy queries. It stands for _**B**itmap **I**ndexed **S**earching with **C**omprehensive **U**nion and **I**ntersection **T**echniques_.\n\n---\n## Stability Notice\n\nThis extension is currently under active development and has not yet received the level of testing and operational experience expected of production-ready software.\n\nUsers are encouraged to evaluate the extension thoroughly in development and staging environments before considering deployment in production systems. In particular, testing should include representative datasets, workloads, upgrade procedures, backup and recovery workflows, and performance validation.\n\nAlthough the extension is intended to operate safely and reliably, defects or unexpected behavior may still be present. As with any new database component, appropriate backups and validation procedures should be maintained before use.\n\nAt this stage, the extension is best suited for evaluation, experimentation, and non-critical workloads. 
Production deployment should be undertaken only after careful testing and assessment of its suitability for the intended environment.\n\n---\n\n## What's new in Version 2.4.0?\n\n### New Features\n\n* **Expression index support:** Biscuit now correctly evaluates arbitrary index key expressions during index builds, enabling indexes such as:\n  ```sql\n  CREATE INDEX idx ON table USING biscuit (lower(column_1), (column_2::text));\n  ```\n\n\n* **Multi-version build support (PG 16, 17, 18, 19beta1):** Biscuit can now be compiled and installed against PostgreSQL 16 and 17, in addition to the already supported PG 18 and PG 19 Beta. All version-specific API differences are handled at compile time via `#if PG_VERSION_NUM` guards.\n\n### Bug Fixes\n\n* **Multi-column parallel scan returned duplicate rows:** In the multi-column fallback scan path, every Gather participant was calling `biscuit_collect_sorted_tids_single()` unconditionally, causing each worker to return the full TID set and the Gather node to assemble N× the expected rows. The call site now mirrors the single-column path by resolving the shared-memory parallel scan descriptor and dispatching through `biscuit_collect_sorted_tids_parallel()`, so each participant claims a disjoint slice of the pre-partitioned TID array.\n\n* **`biscuit_operators` view no longer breaks when additional operator classes are added:** The view previously filtered on a hardcoded `opfname = 'biscuit_text_ops'`. It now joins through `pg_am` and filters on `am.amname = 'biscuit'`, staying correct without edits if new opclasses or opfamilies are later added. The view also surfaces the opfamily name per row.\n\n### Internal Changes\n\n* **Parallel scan callbacks are conditionally compiled for PG 18+:** `amcanparallel`, `amestimateparallelscan`, `aminitparallelscan`, and `amparallelrescan` are only registered when `PG_VERSION_NUM \u003e= 180000`. 
On PG 16 and 17 the parallel fields are set to `false` / `NULL`.\n\n* **Cross-version compatibility macros added to `biscuit_common.h`:**\n  * `BISCUIT_PARALLEL_AM_OFFSET(ps)` abstracts the rename of `ps_offset` → `ps_offset_am` in PG 18.\n  * `BISCUIT_COUNT_INDEX_SEARCH(scan)` abstracts the index search counter, which moved from `xs_numIndexSearches` (PG 17) to `scan-\u003einstrument-\u003ensearches` (PG 18+) and did not exist in PG 16.\n  * `biscuit_estimateparallelscan` is declared with the correct signature for each major version (`void` on PG 16, `int nkeys, int norderbys` on PG 17, `Relation indexRelation, int nworkers, int nchunks` on PG 18+).\n\n* **Version string updated to `2.4.0 - Donut`.**\n\n### Notes\n\n* **CHAR(n) / `bpchar` native operator class is not yet available.** PostgreSQL defines LIKE/ILIKE operators only over `(text, text)`, so a dedicated `biscuit_bpchar_ops` operating directly on padded `bpchar` values would require new C-level operator implementations. As a supported workaround, CHAR(n) columns can be indexed today via an expression index on the text cast:\n  ```sql\n  CREATE INDEX idx ON table USING biscuit ((char_col::text));\n  ```\n  This is documented in `biscuit.sql` and reflected in the updated `biscuit_operators` view comment.\n\n---\n\n##  **Installation**\n\n### **Requirements**\n- Build tools: `gcc`, `make`, `pg_config`\n- Recommended: CRoaring library for enhanced performance\n\n### **From Source**\n\n```bash\n# Clone repository\ngit clone https://github.com/Crystallinecore/biscuit.git\ncd biscuit\n\n# Build and install\nmake\nsudo make install\n\n# Enable in PostgreSQL\npsql -d your_database -c \"CREATE EXTENSION biscuit;\"\n```\n\n### **From PGXN**\n\n```bash\npgxn install biscuit\npsql -d your_database -c \"CREATE EXTENSION biscuit;\"\n```\n\n---\n\n##  **Quick Start**\n\n### **Basic Usage**\n\n```sql\n-- Create a Biscuit index\nCREATE INDEX idx_users_name ON users USING biscuit(name);\n\n-- Query with wildcard 
patterns\nSELECT * FROM users WHERE name LIKE '%john%';\nSELECT * FROM users WHERE name NOT LIKE 'a%b%c';\nSELECT COUNT(*) FROM users WHERE name LIKE '%test%';\n```\n\n### **Multi-Column Indexes**\n\n```sql\n-- Create multi-column index\nCREATE INDEX idx_products_search \nON products USING biscuit(name, description, category);\n\n-- Multi-column query (optimized automatically)\nSELECT * FROM products \nWHERE name LIKE '%widget%' \n  AND description LIKE '%blue%'\n  AND category LIKE 'electronics%'\nLIMIT 10;\n```\n\n---\n\n##  **How It Works**\n\n### **Core Concept: Bitmap Position Indices**\n\nBiscuit builds the following bitmaps for every string:\n\n#### **1. Positive Indices (Forward)**\nTracks which records have character `c` at position `p`:\n\n```\nString: \"Hello\"\nBitmaps:\n  H@0 → {record_ids...}\n  e@1 → {record_ids...}\n  l@2 → {record_ids...}\n  l@3 → {record_ids...}\n  o@4 → {record_ids...}\n```\n\n#### **2. Negative Indices (Backward)**\nTracks which records have character `c` at position `-p` from the end:\n\n```\nString: \"Hello\"\nBitmaps:\n  o@-1 → {record_ids...}  (last char)\n  l@-2 → {record_ids...}  (second to last)\n  l@-3 → {record_ids...}\n  e@-4 → {record_ids...}\n  H@-5 → {record_ids...}\n```\n\n#### **3. Positive Indices (Case-insensitive)**\nTracks which records have character `c` at position `p`, with characters folded to lowercase as in the example (`H` is stored as `h`):\n\n```\nString: \"Hello\"\nBitmaps:\n  h@0 → {record_ids...}\n  e@1 → {record_ids...}\n  l@2 → {record_ids...}\n  l@3 → {record_ids...}\n  o@4 → {record_ids...}\n```\n\n#### **4. Negative Indices (Case-insensitive)**\nTracks which records have character `c` at position `-p` from the end, with the same lowercase folding:\n\n```\nString: \"Hello\"\nBitmaps:\n  o@-1 → {record_ids...}  (last char)\n  l@-2 → {record_ids...}  (second to last)\n  l@-3 → {record_ids...}\n  e@-4 → {record_ids...}\n  h@-5 → {record_ids...}\n```\n\n#### **5. 
Length Bitmaps**\nTwo types for fast length filtering:\n- **Exact length**: `length[5]` → all 5-character strings\n- **Minimum length**: `length_ge[3]` → all strings ≥ 3 characters\n\n---\n\n### **Pattern Matching Algorithm**\n\n#### **Example: `LIKE 'abc%def'`**\n\n**Step 1: Parse pattern into parts**\n```\nParts: [\"abc\", \"def\"]\nStarts with %: NO\nEnds with %: NO\n```\n\n**Step 2: Match first part as prefix**\n```sql\n-- \"abc\" must start at position 0\nCandidates = pos[a@0] ∩ pos[b@1] ∩ pos[c@2]\n```\n\n**Step 3: Match last part at end (negative indexing)**\n```sql\n-- \"def\" must end at string end\nCandidates = Candidates ∩ neg[f@-1] ∩ neg[e@-2] ∩ neg[d@-3]\n```\n\n**Step 4: Apply length constraint**\n```sql\n-- String must be at least 6 chars (abc + def)\nCandidates = Candidates ∩ length_ge[6]\n```\n\n**Result: Exact matches, zero false positives**\n\n---\n\n### **Why It's Fast**\n\n#### **1. Pure Bitmap Operations**\n```c\n// Traditional approach (pg_trgm)\nfor each trigram in pattern:\n    candidates = scan_trigram_index(trigram)\n    for each candidate:\n        if !heap_fetch_and_recheck(candidate):  // SLOW: Random I/O\n            remove candidate\n\n// Biscuit approach\nfor each character at position:\n    candidates \u0026= bitmap[char][pos]  // FAST: In-memory AND\n// No recheck needed!\n```\n\n#### **2. Roaring Bitmaps**\nCompressed bitmap representation:\n- Sparse data: array of integers\n- Dense data: bitset\n- Automatic conversion for optimal memory\n\n#### **3. Negative Indexing Optimization**\n```sql\n-- Pattern: '%xyz'\n-- Traditional: Scan all strings, check suffix\n-- Biscuit: Direct lookup in neg[z@-1] ∩ neg[y@-2] ∩ neg[x@-3]\n```\n\n---\n\n##  **12 Performance Optimizations**\n\n### **1. Skip Wildcard Intersections**\n```c\n// Pattern: \"a_c\" (underscore = any char)\n// OLD: Intersect all 256 chars at position 1\n// NEW: Skip position 1 entirely, only check a@0 and c@2\n```\n\n### **2. 
Early Termination on Empty**\n```c\nresult = bitmap[a][0];\nresult \u0026= bitmap[b][1];\nif (result.empty()) return empty;  // Don't process remaining chars\n```\n\n### **3. Avoid Redundant Bitmap Copies**\n```c\n// OLD: Copy bitmap for every operation\n// NEW: Operate in-place, copy only when branching\n```\n\n### **4. Optimized Single-Part Patterns**\nFast paths for common cases:\n- **Exact**: `'abc'` → Check position 0-2 and length = 3\n- **Prefix**: `'abc%'` → Check position 0-2 and length ≥ 3\n- **Suffix**: `'%xyz'` → Check negative positions -3 to -1 and length ≥ 3\n- **Substring**: `'%abc%'` → Check all positions, OR results\n\n### **5. Skip Unnecessary Length Operations**\n```c\n// Pure wildcard patterns\nif (pattern == \"%%%___%%\")  // 3 underscores\n    return length_ge[3];     // No character checks needed!\n```\n\n### **6. TID Sorting for Sequential Heap Access**\n```c\n// Sort TIDs by (block_number, offset) before returning\n// Converts random I/O into sequential I/O\n// Uses radix sort for \u003e5000 TIDs, quicksort for smaller sets\n```\n\n### **7. Batch TID Insertion**\n```c\n// For bitmap scans, insert TIDs in chunks\nfor (i = 0; i \u003c num_results; i += 10000) {\n    tbm_add_tuples(tbm, \u0026tids[i], batch_size, false);\n}\n```\n\n### **8. Direct Roaring Iteration**\n```c\n// OLD: Convert bitmap to array, then iterate\n// NEW: Direct iterator, no intermediate allocation\nroaring_uint32_iterator_t *iter = roaring_create_iterator(bitmap);\nwhile (iter-\u003ehas_value) {\n    process(iter-\u003ecurrent_value);\n    roaring_advance_uint32_iterator(iter);\n}\n```\n\n\n### **9. Batch Cleanup on Threshold**\n```c\n// After 1000 deletes, clean tombstones from all bitmaps\nif (tombstone_count \u003e= 1000) {\n    for each bitmap:\n        bitmap \u0026= ~tombstones;  // Batch operation\n    tombstones.clear();\n}\n```\n\n### **10. Aggregate Query Detection**\n```c\n// COUNT(*), EXISTS, etc. 
don't need sorted TIDs\nif (!scan-\u003exs_want_itup) {\n    skip_sorting = true;  // Save sorting time\n}\n```\n\n### **11. LIMIT-Aware TID Collection**\n```c\n// If LIMIT 10 in query, don't collect more than needed\nif (limit_hint \u003e 0 \u0026\u0026 collected \u003e= limit_hint)\n    break;  // Early termination\n```\n\n### **12. Multi-Column Query Optimization**\n\n#### **Predicate Reordering**\nAnalyzes each column's pattern and executes in order of selectivity:\n\n```sql\n-- Query:\nWHERE name LIKE '%common%'           -- Low selectivity\n  AND sku LIKE 'PROD-2024-%'         -- High selectivity (prefix)\n  AND description LIKE '%rare_word%' -- Medium selectivity\n\n-- Execution order (Biscuit automatically reorders):\n1. sku LIKE 'PROD-2024-%'         (PREFIX, priority=20, selectivity=0.02)\n2. description LIKE '%rare_word%' (SUBSTRING, priority=35, selectivity=0.15)\n3. name LIKE '%common%'           (SUBSTRING, priority=55, selectivity=0.60)\n```\n\n**Selectivity scoring formula:**\n```\nscore = 1.0 / (concrete_chars + 1)\n      - (underscore_count × 0.05)\n      + (partition_count × 0.15)\n      - (anchor_strength / 200)\n```\n\n**Priority tiers:**\n1. **0-10**: Exact matches, many underscores\n2. **10-20**: Non-% patterns with underscores\n3. **20-30**: Strong anchored patterns (prefix/suffix)\n4. **30-40**: Weak anchored patterns\n5. **40-50**: Multi-partition patterns\n6. 
**50-60**: Substring patterns (lowest priority)\n\n---\n\n##  **Benchmarking**\n\n### **Setup Test Data**\n\n```sql\n-- Create 1M row test table\nCREATE TABLE benchmark (\n    id SERIAL PRIMARY KEY,\n    name TEXT,\n    description TEXT,\n    category TEXT,\n    score FLOAT\n);\n\nINSERT INTO benchmark (name, description, category, score)\nSELECT \n    'Name_' || md5(random()::text),\n    'Description_' || md5(random()::text),\n    'Category_' || (random() * 100)::int,\n    random() * 1000\nFROM generate_series(1, 1000000);\n\n-- Create indexes\nCREATE INDEX idx_trgm ON benchmark \n    USING gin(name gin_trgm_ops, description gin_trgm_ops);\n\nCREATE INDEX idx_biscuit ON benchmark \n    USING biscuit(name, description, category);\n\nANALYZE benchmark;\n```\n\n### **Run Benchmarks**\n\n```sql\n-- Single column, simple pattern\nEXPLAIN ANALYZE\nSELECT * FROM benchmark WHERE name LIKE '%abc%' LIMIT 100;\n\n-- Multi-column, complex pattern\nEXPLAIN ANALYZE\nSELECT * FROM benchmark \nWHERE name LIKE '%a%b' \n  AND description LIKE '%bc%cd%'\nORDER BY score DESC \nLIMIT 10;\n\n-- Aggregate query (COUNT)\nEXPLAIN ANALYZE\nSELECT COUNT(*) FROM benchmark \nWHERE name LIKE 'a%l%' \n  AND category LIKE 'f%d';\n\n-- Complex multi-part pattern\nEXPLAIN ANALYZE\nSELECT * FROM benchmark \nWHERE description LIKE 'u%dc%x'\nLIMIT 50;\n```\n\n### **View Index Statistics**\n\n```sql\n-- Show internal statistics\nSELECT biscuit_index_stats('idx_biscuit'::regclass);\n```\n\n**Output:**\n```\n----------------------------------------------------\n Biscuit Index Statistics (FULLY OPTIMIZED)        +\n ==========================================        +\n Index: idx_biscuit                                +\n Active records: 1000002                           +\n Total slots: 1000002                              +\n Free slots: 0                                     +\n Tombstones: 0                                     +\n Max length: 44                                    +\n 
------------------------                          +\n CRUD Statistics:                                  +\n   Inserts: 0                                      +\n   Updates: 0                                      +\n   Deletes: 0                                      +\n ------------------------                          +\n Active Optimizations:                             +\n   ✓ 1. Skip wildcard intersections                +\n   ✓ 2. Early termination on empty                 +\n   ✓ 3. Avoid redundant copies                     +\n   ✓ 4. Optimized single-part patterns             +\n   ✓ 5. Skip unnecessary length ops                +\n   ✓ 6. TID sorting for sequential I/O             +\n   ✓ 7. Batch TID insertion                        +\n   ✓ 8. Direct bitmap iteration                    +\n   ✓ 9. Parallel bitmap scan support               +\n   ✓ 10. Batch cleanup on threshold                +\n   ✓ 11. Skip sorting for bitmap scans (aggregates)+\n   ✓ 12. LIMIT-aware TID collection                +\n```\n\n\n---\n\n##  **Use Cases**\n\n### **1. Full-Text Search Applications**\n```sql\n-- E-commerce product search\nCREATE INDEX idx_products ON products \n    USING biscuit(name, brand, description);\n\nSELECT * FROM products \nWHERE name LIKE '%laptop%' \n  AND brand LIKE 'ABC%'\n  AND description LIKE '%gaming%'\nORDER BY price DESC \nLIMIT 20;\n```\n\n### **2. Log Analysis**\n```sql\n-- Search error logs\nCREATE INDEX idx_logs ON logs \n    USING biscuit(message, source, level);\n\nSELECT * FROM logs \nWHERE message LIKE '%ERROR%connection%timeout%'\n  AND source LIKE 'api.%'\n  AND timestamp \u003e NOW() - INTERVAL '1 hour'\nLIMIT 100;\n```\n\n### **3. Customer Support / CRM**\n```sql\n-- Search tickets by multiple fields\nCREATE INDEX idx_tickets ON tickets \n    USING biscuit(subject, description, customer_name);\n\nSELECT * FROM tickets \nWHERE subject LIKE '%refund%'\n  AND customer_name LIKE 'John%'\n  AND status = 'open';\n```\n\n### **4. 
Code Search / Documentation**\n```sql\n-- Search code repositories\nCREATE INDEX idx_files ON code_files \n    USING biscuit(filename, content, author);\n\nSELECT * FROM code_files \nWHERE filename LIKE '%.py'\n  AND content LIKE '%def%parse%json%'\n  AND author LIKE 'team-%';\n```\n\n### **5. Analytics with Aggregates**\n```sql\n-- Fast COUNT queries (no sorting overhead)\nCREATE INDEX idx_events ON events \n    USING biscuit(event_type, user_agent, referrer);\n\nSELECT COUNT(*) FROM events \nWHERE event_type LIKE 'click%'\n  AND user_agent LIKE '%Mobile%'\n  AND referrer LIKE '%google%';\n```\n\n---\n\n##  **Configuration**\n\n### **Build Options**\n\nEnable CRoaring for better performance.\n\n\n### **Index Options**\n\nCurrently, Biscuit doesn't expose tunable options. All optimizations are automatic.\n\n---\n\n##  **Limitations and Trade-offs**\n\n### **What Biscuit Does NOT Support**\n\n1. **Regular expressions** - Only `LIKE` / `ILIKE` patterns with `%` and `_`\n2. **Locale-specific collations** - String comparisons are byte-based\n3. **Ordered index scans** - `amcanorder = false`, so the index cannot return rows in sorted order directly (but see below)\n\n### **ORDER BY + LIMIT Behavior**\n\nBiscuit doesn't support ordered index scans (`amcanorder = false`), but:\n\n**PostgreSQL's planner handles this efficiently:**\n```sql\n-- my_table and col are placeholder names\nSELECT * FROM my_table WHERE col LIKE '%pattern%' ORDER BY score LIMIT 10;\n```\n\n**Execution plan:**\n```\nLimit\n  -\u003e Sort (cheap, small result set)\n    -\u003e Biscuit Index Scan (fast filtering)\n```\n\n**Why this works:**\n- Biscuit filters candidates extremely fast\n- The result set is typically small after filtering\n- Sorting 100-1000 rows in memory is negligible (\u003c1ms)\n- **Net result**: Still much faster than pg_trgm with recheck overhead in many cases\n\n### **Memory Usage**\n\nBiscuit stores bitmaps in memory:\n- Use `REINDEX` to rebuild if the index grows too large\n\n### **Write Performance**\n**Note:** Biscuit is not optimized for workloads with heavy write operations. 
In such cases, we suggest using pg_trgm or B-tree based on your requirements.\n- **INSERT**: Must update multiple bitmaps\n- **UPDATE**: Two operations (remove old, insert new)\n- **DELETE**: Marks as tombstone, batch cleanup at threshold\n\n---\n\n##  **Comparison with pg_trgm**\n\n| Feature                  | Biscuit                     | pg_trgm (GIN)        |\n|--------------------------|------------------------------|----------------------|\n| **Wildcard patterns**    | ✔ Native              | ✔ Approximate        |\n| **Recheck overhead**     | ✔ None (deterministic)       | ✗ Required    |\n| **Regex support**        | ✗ No                         | ✔ Yes                |\n| **Similarity search**    | ✗ No                         | ✔ Yes                |\n| **ILIKE support**        | ✔ Full       | ✔ Native             |\n\n\n**When to use Biscuit:**\n- Wildcard-heavy `LIKE` / `ILIKE` queries (`%`, `_`)\n-  Multi-column pattern matching\n-  Need exact results (no false positives)\n-  `COUNT(*)` / aggregate queries\n-  High query volume, can afford memory\n\n**When to use pg_trgm:**\n- Fuzzy/similarity search (`word \u003c-\u003e pattern`)\n- Regular expressions\n- Memory-constrained environments\n- Write-heavy workloads\n\n---\n\n## **Development**\n\n### **Build from Source**\n\n```bash\ngit clone https://github.com/Crystallinecore/biscuit.git\ncd biscuit\n\n# Development build with debug symbols\nmake clean\nCFLAGS=\"-g -O0 -DDEBUG\" make\n\n# Run tests\nmake installcheck\n\n# Install\nsudo make install\n```\n\n### **Testing**\n\n```bash\n# Unit tests\nmake installcheck\n\n# Manual testing\npsql -d testdb\n\nCREATE EXTENSION biscuit;\n\n-- Create test table\nCREATE TABLE test (id SERIAL, name TEXT);\nINSERT INTO test (name) VALUES ('hello'), ('world'), ('test');\n\n-- Create index\nCREATE INDEX idx_test ON test USING biscuit(name);\n\n-- Test queries\nEXPLAIN ANALYZE SELECT * FROM test WHERE name LIKE '%ell%';\n```\n\n### **Debugging**\n\nEnable PostgreSQL 
debug logging:\n\n```sql\nSET client_min_messages = DEBUG1;\nSET log_min_messages = DEBUG1;\n\n-- Now run queries to see Biscuit's internal logs\nSELECT * FROM test WHERE name LIKE '%pattern%';\n```\n\n---\n\n##  **Contributing**\n\nContributions are welcome! Please:\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing`)\n3. Make your changes with tests\n4. Submit a pull request\n\n### **Areas for Contribution**\n\n- [ ] Implement `amcanorder` for native sorted scans\n- [ ] Add statistics collection for better cost estimation\n- [ ] Support for more data types \n- [ ] Parallel index build\n- [ ] Index compression options\n\n---\n\n##  **License**\n\nMIT License - See LICENSE file for details.\n\n---\n\n## **Author**\n\nSivaprasad Murali\n- Email: sivaprasad.off@gmail.com\n- GitHub: [@Crystallinecore](https://github.com/Crystallinecore)\n\n---\n\n\n## **Acknowledgments**\n\n* The PostgreSQL community for the extensible index access method (AM) framework\n* **B-tree** and **pg_trgm** indexes that shaped the design space for pattern matching in PostgreSQL\n* The **CRoaring** library for efficient compressed bitmap operations\n\n---\n\n## **Support**\n\n- **Issues**: [GitHub Issues](https://github.com/Crystallinecore/biscuit/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/Crystallinecore/biscuit/discussions)\n- **Documentation**: [ReadTheDocs Page](https://biscuit.readthedocs.io/) \n---\n\n**Happy pattern matching! Grab a biscuit 🍪 when others feel half-baked!**\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCrystallineCore%2FBiscuit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCrystallineCore%2FBiscuit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCrystallineCore%2FBiscuit/lists"}