{"id":30772173,"url":"https://github.com/query-farm/bitfilters","last_synced_at":"2025-09-05T00:52:39.076Z","repository":{"id":308818981,"uuid":"1028625852","full_name":"Query-farm/bitfilters","owner":"Query-farm","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-08T03:46:34.000Z","size":69,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-08T05:26:54.113Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Query-farm.png","metadata":{"files":{"readme":"docs/README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-29T20:12:51.000Z","updated_at":"2025-08-08T03:46:37.000Z","dependencies_parsed_at":"2025-08-08T05:39:22.289Z","dependency_job_id":null,"html_url":"https://github.com/Query-farm/bitfilters","commit_stats":null,"previous_names":["query-farm/bitfilters"],"tags_count":null,"template":false,"template_full_name":"duckdb/extension-template","purl":"pkg:github/Query-farm/bitfilters","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fbitfilters","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fbitfilters/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fbitfilters/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fbitfilters/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Query-farm"
,"download_url":"https://codeload.github.com/Query-farm/bitfilters/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fbitfilters/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273695251,"owners_count":25151484,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-05T00:52:37.016Z","updated_at":"2025-09-05T00:52:39.051Z","avatar_url":"https://github.com/Query-farm.png","language":"C++","readme":"# `bitfilters` Extension for DuckDB Developed by [Query.Farm](https://query.farm)\n\nA high-performance [DuckDB](https://duckdb.org) extension providing probabilistic data structures for fast set membership testing and approximate duplicate detection. 
This extension implements state-of-the-art filter algorithms including [Quotient filters](https://en.wikipedia.org/wiki/Quotient_filter), [XOR filters](https://arxiv.org/abs/1912.08258), [Binary Fuse filters](https://arxiv.org/abs/2201.01174), and soon [Bloom filters](https://en.wikipedia.org/wiki/Bloom_filter).\n\n`bitfilters` provides space-efficient [probabilistic data structures](https://en.wikipedia.org/wiki/Probabilistic_data_structure) that can answer \"Is element X in set S?\" with:\n\n- **No false negatives**: If the filter says an element is not present, it's definitely not there\n\n- **Possible false positives**: If the filter says an element is present, it might be there (with configurable probability)\n\nYou may find it useful to use this module in combination with the [`hashfuncs` extension](https://query.farm/duckdb_extension_hashfuncs.html) to produce the hash outputs that these filters require.  Most filters require a `UBIGINT` or unsigned 64-bit integer value as their input type.  You can use the functions provided by `hashfuncs` or the DuckDB `hash()` function to produce those values from most DuckDB data types.\n\n## Installation\n\n**`bitfilters` is a [DuckDB Community Extension](https://github.com/duckdb/community-extensions).**\n\nYou can install and use this extension with the following SQL commands:\n\n```sql\nINSTALL bitfilters FROM community;\nLOAD bitfilters;\n```\n\nFor more information about DuckDB extensions, see the [official documentation](https://duckdb.org/docs/extensions/overview).\n\n## What are bitfilters?\n\nBitfilters are [probabilistic data structures](https://en.wikipedia.org/wiki/Probabilistic_data_structure) that provide fast, memory-efficient approximate set membership testing. 
They are designed to answer the question \"Is element X in set S?\" with:\n\n- **Space efficiency**: Use significantly less memory than storing the actual set\n- **Speed**: Extremely fast lookups (typically [O(1)](https://en.wikipedia.org/wiki/Big_O_notation) or O(k) where k is small)\n- **No false negatives**: If a filter says \"NO\", the element is definitely not in the set\n- **Possible false positives**: If a filter says \"YES\", the element might be in the set\n\n### Common Use Cases\n\n- **Pre-filtering expensive operations**: Avoid costly disk I/O or network calls for non-existent data\n- **Duplicate detection**: Quickly identify potential duplicates in large datasets\n- **Cache optimization**: Determine if data might be in cache before expensive lookups\n- **Data skipping**: Skip irrelevant data partitions in analytical queries\n- **Set operations**: Approximate set intersections and unions on massive datasets\n- **Database join optimization**: Pre-filter join candidates to reduce computation\n\n### Performance Benefits\n\n```sql\n-- Without bitfilters: Expensive operation on every row\nSELECT expensive_function(data)\nFROM large_table\nWHERE complex_condition(data);\n\n-- With bitfilters: Pre-filter to reduce expensive operations by 90%+\nSELECT expensive_function(lt.data)\nFROM large_table lt\nJOIN precomputed_filters pf ON lt.partition = pf.partition\nWHERE filter_contains(pf.filter, lt.key)  -- Fast filter check\n  AND complex_condition(lt.data);         -- Expensive check only when needed\n```\n\n## Available Filters\n\n### 1. Quotient Filters\n\nSpace-efficient filters that support deletion and resizing operations. 
Learn more about [Quotient Filters](https://en.wikipedia.org/wiki/Quotient_filter).\n\n#### Functions:\n\n- `quotient_filter(q, r, hash_value)` - Create filter from hash values\n- `quotient_filter_contains(filter, hash_value)` - Test membership\n\n#### Characteristics:\n\n- **Use case**: Dynamic datasets, applications requiring deletion\n- **Pros**: Supports deletion, better cache locality, resizable\n- **Cons**: More complex implementation, slightly slower than Bloom filters\n- **Memory**: Similar to Bloom filters but with better cache performance\n\n```sql\n-- Create a quotient filter with q=16 and r=4\nCREATE TABLE quotient_filters AS (\n    SELECT id % 2 AS remainder,\n           quotient_filter(16, 4, hash(id)) AS filter\n    FROM series_data\n    GROUP BY id % 2\n);\n\n-- Test membership\nSELECT quotient_filter_contains(filter, hash(12345)) AS might_exist\nFROM quotient_filters WHERE remainder = 1;\n```\n\n### 2. XOR Filters\n\nModern high-performance filters with better space efficiency than Bloom filters. 
Read the [XOR Filters paper](https://arxiv.org/abs/1912.08258) for technical details.\n\n#### Functions:\n\n- `xor16_filter(hash_value)` - Create 16-bit XOR filter from hash values\n- `xor8_filter(hash_value)` - Create 8-bit XOR filter from hash values\n- `xor16_filter_contains(filter, hash_value)` - Test membership in 16-bit filter\n- `xor8_filter_contains(filter, hash_value)` - Test membership in 8-bit filter\n\n#### Characteristics:\n\n- **Use case**: Read-heavy workloads, static datasets\n- **Pros**: Better space efficiency than Bloom filters (~20% less memory), faster queries\n- **Cons**: Static size, more complex construction, no incremental updates\n- **Memory**: ~1.23 × the fingerprint width per element (~9.84 bits for the 8-bit filter, ~19.7 bits for the 16-bit filter)\n\n```sql\n-- Create XOR filters (both 8-bit and 16-bit versions)\nCREATE TABLE xor_filters AS (\n    SELECT id % 2 AS remainder,\n           xor16_filter(hash(id)) AS xor16_filter,\n           xor8_filter(hash(id)) AS xor8_filter\n    FROM series_data\n    GROUP BY id % 2\n);\n\n-- Test membership\nSELECT\n    xor16_filter_contains(xor16_filter, hash(12345)) AS in_xor16,\n    xor8_filter_contains(xor8_filter, hash(12345)) AS in_xor8\nFROM xor_filters WHERE remainder = 1;\n```\n\n### 3. Binary Fuse Filters\n\nLatest generation filters with optimal space usage and excellent performance. 
See the [Binary Fuse Filters paper](https://arxiv.org/abs/2201.01174) for implementation details.\n\n#### Functions:\n\n- `binary_fuse16_filter(hash_value)` - Create 16-bit Binary Fuse filter from hash values\n- `binary_fuse8_filter(hash_value)` - Create 8-bit Binary Fuse filter from hash values\n- `binary_fuse16_filter_contains(filter, hash_value)` - Test membership in 16-bit filter\n- `binary_fuse8_filter_contains(filter, hash_value)` - Test membership in 8-bit filter\n\n#### Characteristics:\n\n- **Use case**: Applications requiring minimal memory footprint\n- **Pros**: Best space efficiency, fast construction and queries\n- **Cons**: Static size, newer algorithm with less production experience\n- **Memory**: Significantly more space-efficient than other filters\n\n```sql\n-- Create Binary Fuse filters (both 8-bit and 16-bit versions)\nCREATE TABLE binary_fuse_filters AS (\n    SELECT id % 2 AS remainder,\n           binary_fuse16_filter(hash(id)) AS binary_fuse16_filter,\n           binary_fuse8_filter(hash(id)) AS binary_fuse8_filter\n    FROM series_data\n    GROUP BY id % 2\n);\n\n-- Test membership\nSELECT\n    binary_fuse16_filter_contains(binary_fuse16_filter, hash(12345)) AS in_fuse16,\n    binary_fuse8_filter_contains(binary_fuse8_filter, hash(12345)) AS in_fuse8\nFROM binary_fuse_filters WHERE remainder = 1;\n```\n\n## Usage Examples\n\n### Basic Filter Operations\n\n```sql\n-- Create test data\nCREATE TABLE series_data AS (\n    SELECT * AS id FROM generate_series(1, 100000) AS id\n);\n\n-- Create quotient filters with q=16 and r=4\nCREATE TABLE quotient_filters AS (\n    SELECT id % 2 AS remainder,\n           quotient_filter(16, 4, hash(id)) AS filter\n    FROM series_data\n    GROUP BY id % 2\n);\n\n-- Test membership (should find all matching elements)\nSELECT remainder,\n       COUNT(CASE WHEN quotient_filter_contains(filter, hash(id)) THEN 1 ELSE NULL END) AS matches\nFROM series_data, quotient_filters\nWHERE series_data.id % 2 = 
quotient_filters.remainder\nGROUP BY remainder;\n┌───────────┬─────────┐\n│ remainder │ matches │\n│   int64   │  int64  │\n├───────────┼─────────┤\n│         0 │   50000 │\n│         1 │   50000 │\n└───────────┴─────────┘\n\n-- Check false positives (elements not in the filter that test positive)\nSELECT remainder,\n       COUNT(CASE WHEN quotient_filter_contains(filter, hash(id)) THEN 1 ELSE NULL END) AS false_positives\nFROM series_data, quotient_filters\nWHERE series_data.id % 2 != quotient_filters.remainder\nGROUP BY remainder;\n┌───────────┬─────────────────┐\n│ remainder │ false_positives │\n│   int64   │      int64      │\n├───────────┼─────────────────┤\n│         0 │            2264 │\n│         1 │            2273 │\n└───────────┴─────────────────┘\n```\n\n### XOR Filter Examples\n\n```sql\n-- Create XOR filters (both 8-bit and 16-bit)\nCREATE TABLE xor_filters AS (\n    SELECT id % 2 AS remainder,\n           xor16_filter(hash(id)) AS xor16_filter,\n           xor8_filter(hash(id)) AS xor8_filter\n    FROM series_data\n    GROUP BY id % 2\n);\n\n-- Verify all elements are found (no false negatives)\nSELECT remainder,\n       COUNT(CASE WHEN xor16_filter_contains(xor16_filter, hash(id)) THEN 1 ELSE NULL END) AS xor16_matches,\n       COUNT(CASE WHEN xor8_filter_contains(xor8_filter, hash(id)) THEN 1 ELSE NULL END) AS xor8_matches\nFROM series_data, xor_filters\nWHERE series_data.id % 2 = xor_filters.remainder\nGROUP BY remainder;\n┌───────────┬───────────────┬──────────────┐\n│ remainder │ xor16_matches │ xor8_matches │\n│   int64   │     int64     │    int64     │\n├───────────┼───────────────┼──────────────┤\n│         0 │         50000 │        50000 │\n│         1 │         50000 │        50000 │\n└───────────┴───────────────┴──────────────┘\n\n-- Compare filter performance\nSELECT\n    'XOR16' AS filter_type,\n    octet_length(xor16_filter) AS size_bytes\nFROM xor_filters WHERE remainder = 0\nUNION ALL\nSELECT\n    'XOR8' AS filter_type,\n    
octet_length(xor8_filter) AS size_bytes\nFROM xor_filters WHERE remainder = 0;\n┌─────────────┬────────────┐\n│ filter_type │ size_bytes │\n│   varchar   │   int64    │\n├─────────────┼────────────┤\n│ XOR16       │     123076 │\n│ XOR8        │      61546 │\n└─────────────┴────────────┘\n```\n\n### Binary Fuse Filter Examples\n\n```sql\n-- Create Binary Fuse filters\nCREATE TABLE binary_fuse_filters AS (\n    SELECT id % 2 AS remainder,\n           binary_fuse16_filter(hash(id)) AS binary_fuse16_filter,\n           binary_fuse8_filter(hash(id)) AS binary_fuse8_filter\n    FROM series_data\n    GROUP BY id % 2\n);\n\n-- Test all elements are found\nSELECT remainder,\n       COUNT(CASE WHEN binary_fuse16_filter_contains(binary_fuse16_filter, hash(id)) THEN 1 ELSE NULL END) AS fuse16_matches,\n       COUNT(CASE WHEN binary_fuse8_filter_contains(binary_fuse8_filter, hash(id)) THEN 1 ELSE NULL END) AS fuse8_matches\nFROM series_data, binary_fuse_filters\nWHERE series_data.id % 2 = binary_fuse_filters.remainder\nGROUP BY remainder;\n┌───────────┬────────────────┬───────────────┐\n│ remainder │ fuse16_matches │ fuse8_matches │\n│   int64   │     int64      │     int64     │\n├───────────┼────────────────┼───────────────┤\n│         0 │          50000 │         50000 │\n│         1 │          50000 │         50000 │\n└───────────┴────────────────┴───────────────┘\n\n-- Check false positive rates\nSELECT remainder,\n       COUNT(CASE WHEN binary_fuse16_filter_contains(binary_fuse16_filter, hash(id)) THEN 1 ELSE NULL END) AS fuse16_false_positives,\n       COUNT(CASE WHEN binary_fuse8_filter_contains(binary_fuse8_filter, hash(id)) THEN 1 ELSE NULL END) AS fuse8_false_positives\nFROM series_data, binary_fuse_filters\nWHERE series_data.id % 2 != binary_fuse_filters.remainder\nGROUP BY remainder;\n┌───────────┬────────────────────────┬───────────────────────┐\n│ remainder │ fuse16_false_positives │ fuse8_false_positives │\n│   int64   │         int64          │         int64   
      │\n├───────────┼────────────────────────┼───────────────────────┤\n│         0 │                      1 │                   171 │\n│         1 │                      1 │                   199 │\n└───────────┴────────────────────────┴───────────────────────┘\n```\n\n### Practical Use Case: User Activity Tracking\n\n```sql\n-- Track active users by day using quotient filters\nCREATE TABLE user_activity AS (\n    SELECT\n        (CURRENT_DATE - (random() * 30)::INTEGER) AS activity_date,\n        (random() * 1000000)::INTEGER AS user_id\n    FROM generate_series(1, 5000000)\n);\n\n-- Create daily quotient filters for active users\nCREATE TABLE daily_user_filters AS (\n    SELECT\n        activity_date,\n        quotient_filter(16, 4, hash(user_id)) AS user_filter,\n        COUNT(DISTINCT user_id) AS actual_unique_users\n    FROM user_activity\n    GROUP BY activity_date\n);\n\n-- Check if specific users were active on specific dates\nSELECT\n    activity_date,\n    quotient_filter_contains(user_filter, hash(12345)) AS user_12345_active,\n    quotient_filter_contains(user_filter, hash(67890)) AS user_67890_active,\n    actual_unique_users\nFROM daily_user_filters\nORDER BY activity_date;\n\n-- Find days when specific users might have been active\nWITH target_users AS (\n    SELECT unnest([12345, 67890, 11111, 99999]) AS user_id\n)\nSELECT\n    tu.user_id,\n    duf.activity_date,\n    quotient_filter_contains(duf.user_filter, hash(tu.user_id)) AS possibly_active,\n    EXISTS(\n        SELECT 1 FROM user_activity ua\n        WHERE ua.user_id = tu.user_id\n        AND ua.activity_date = duf.activity_date\n    ) AS actually_active\nFROM target_users tu\nCROSS JOIN daily_user_filters duf\nWHERE quotient_filter_contains(duf.user_filter, hash(tu.user_id))\nORDER BY tu.user_id, duf.activity_date;\n```\n\n### Filter Comparison Example\n\n```sql\n-- Compare all filter types side by side\nWITH sample_data AS (\n    SELECT hash(id) AS hash_value\n    FROM generate_series(1, 
1000000) AS id\n),\nall_filters AS (\n    SELECT\n        quotient_filter(20, 4, hash_value) AS qf,\n        xor16_filter(hash_value) AS xor16,\n        xor8_filter(hash_value) AS xor8,\n        binary_fuse16_filter(hash_value) AS bf16,\n        binary_fuse8_filter(hash_value) AS bf8\n    FROM sample_data\n)\nSELECT\n    'Quotient Filter' AS filter_type,\n    octet_length(qf) AS size_bytes,\n    octet_length(qf) / 1000000.0 AS bytes_per_element\nFROM all_filters\nUNION ALL\nSELECT\n    'XOR16 Filter',\n    octet_length(xor16),\n    octet_length(xor16) / 1000000.0\nFROM all_filters\nUNION ALL\nSELECT\n    'XOR8 Filter',\n    octet_length(xor8),\n    octet_length(xor8) / 1000000.0\nFROM all_filters\nUNION ALL\nSELECT\n    'Binary Fuse16 Filter',\n    octet_length(bf16),\n    octet_length(bf16) / 1000000.0\nFROM all_filters\nUNION ALL\nSELECT\n    'Binary Fuse8 Filter',\n    octet_length(bf8),\n    octet_length(bf8) / 1000000.0\nFROM all_filters;\n┌──────────────────────┬────────────┬───────────────────┐\n│     filter_type      │ size_bytes │ bytes_per_element │\n│       varchar        │   int64    │      double       │\n├──────────────────────┼────────────┼───────────────────┤\n│ Quotient Filter      │     917544 │          0.917544 │\n│ XOR16 Filter         │    2460076 │          2.460076 │\n│ XOR8 Filter          │    1230046 │          1.230046 │\n│ Binary Fuse16 Filter │    2261024 │          2.261024 │\n│ Binary Fuse8 Filter  │    1130524 │          1.130524 │\n└──────────────────────┴────────────┴───────────────────┘\n```\n\n\n## Best Practices\n\n### ✅ Do's\n\n- Use a hash function to create consistent hash values for filter operations\n- Store filters in materialized views for reuse across queries\n- Combine filters with exact checks for final results\n- Monitor actual false positive rates in production\n- Choose appropriate bit-width (8 vs 16) based on memory/accuracy tradeoffs\n\n### ❌ Don'ts\n\n- Don't rely on filters for exact results without 
confirmation\n- Don't mix hash values from different sources in the same filter\n- Don't rebuild filters frequently for dynamic data\n- Don't use filters for very small datasets (overhead not worth it)\n- Don't forget that quotient filter parameters (q, r) affect capacity and accuracy\n\n```sql\n-- Good: Use filter to pre-screen, then exact check\nSELECT expensive_operation(data)\nFROM large_table lt\nJOIN precomputed_filter pf ON lt.partition = pf.partition\nWHERE quotient_filter_contains(pf.filter, hash(lt.key))  -- Fast pre-filter\n  AND exact_expensive_condition(lt.data);                -- Exact check\n\n-- Bad: Using filter as final arbiter\nSELECT data\nFROM large_table lt\nJOIN precomputed_filter pf ON lt.partition = pf.partition\nWHERE quotient_filter_contains(pf.filter, hash(lt.key)); -- May have false positives!\n```\n\n## API Reference\n\n### Quotient Filter Functions\n\n#### `quotient_filter(q, r, hash_value)`\nCreates a quotient filter with 2^q slots and r remainder bits.\n\n##### Parameters:\n\n- `q` (INTEGER): Log2 of the number of slots (capacity = 2^q)\n- `r` (INTEGER): Number of remainder bits (affects accuracy)\n- `hash_value` (UBIGINT): Hash values to insert into the filter\n\n##### Returns:\n\nBLOB containing the serialized quotient filter\n\n##### Example\n\n```sql\n-- Create filter with 2^16 = 65536 slots and 4 remainder bits\nSELECT quotient_filter(16, 4, hash(user_id)) FROM users;\n```\n---\n\n#### `quotient_filter_contains(filter, hash_value)`\nTests if a quotient filter may contain a hash value.\n\n##### Parameters:\n- `filter` (BLOB): Serialized quotient filter\n- `hash_value` (UBIGINT): Hash value to test\n\n##### Returns:\n\n`BOOLEAN`\n\n- `true`: Hash value might be in the set (possible false positive)\n- `false`: Hash value is definitely not in the set (no false negatives)\n\n### XOR Filter Functions\n\n#### `xor16_filter(hash_value)` / `xor8_filter(hash_value)`\n\nCreates XOR filters with 16-bit or 8-bit fingerprints.\n\n##### 
Parameters:\n\n- `hash_value` (`UBIGINT`): Hash values to insert into the filter\n\n##### Returns:\n\n`BLOB` containing the serialized XOR filter\n\n##### Example:\n\n```sql\nSELECT xor16_filter(hash(product_id)) FROM products;\nSELECT xor8_filter(hash(product_id)) FROM products;  -- Smaller but higher FPR\n```\n\n#### `xor16_filter_contains(filter, hash_value)` / `xor8_filter_contains(filter, hash_value)`\n\nTests if an XOR filter may contain a hash value.\n\n##### Parameters:\n\n- `filter` (`BLOB`): Serialized XOR filter\n- `hash_value` (`UBIGINT`): Hash value to test\n\n##### Returns:\n\n`BOOLEAN` (same semantics as quotient filters)\n\n### Binary Fuse Filter Functions\n\n#### `binary_fuse16_filter(hash_value)` / `binary_fuse8_filter(hash_value)`\n\nCreates Binary Fuse filters with 16-bit or 8-bit fingerprints.\n\n##### Parameters:\n\n- `hash_value` (`UBIGINT`): Hash values to insert into the filter\n\n##### Returns:\n\n`BLOB` containing the serialized Binary Fuse filter\n\n##### Example:\n\n```sql\nSELECT binary_fuse16_filter(hash(transaction_id)) FROM transactions;\nSELECT binary_fuse8_filter(hash(transaction_id)) FROM transactions;\n```\n\n#### `binary_fuse16_filter_contains(filter, hash_value)` / `binary_fuse8_filter_contains(filter, hash_value)`\n\nTests if a Binary Fuse filter may contain a hash value.\n\n##### Parameters:\n\n- `filter` (`BLOB`): Serialized Binary Fuse filter\n- `hash_value` (`UBIGINT`): Hash value to test\n\n##### Returns:\n\n`BOOLEAN` (same semantics as other filters)\n\n### Filter Characteristics Summary\n\n| Filter Type | Create Function | Contains Function | Bits per Element | False Positive Rate | Notes |\n|-------------|----------------|-------------------|------------------|-------------------|-------|\n| Quotient | `quotient_filter(q,r,hash)` | `quotient_filter_contains(filter,hash)` | Variable | Depends on q,r | Supports deletion |\n| XOR16 | `xor16_filter(hash)` | `xor16_filter_contains(filter,hash)` | ~19.7 bits | ~0.0015% | High 
performance |\n| XOR8 | `xor8_filter(hash)` | `xor8_filter_contains(filter,hash)` | ~9.8 bits | ~0.39% | Smaller size |\n| Binary Fuse16 | `binary_fuse16_filter(hash)` | `binary_fuse16_filter_contains(filter,hash)` | ~18.1 bits | ~0.0015% | Most compact 16-bit option |\n| Binary Fuse8 | `binary_fuse8_filter(hash)` | `binary_fuse8_filter_contains(filter,hash)` | ~9.0 bits | ~0.39% | Smallest size |\n\n## Integration with Other Extensions\n\n### Using with `hashfuncs` Extension\n\nThe [`hashfuncs` extension](https://query.farm/duckdb_extension_hashfuncs.html) provides additional hash functions that can improve filter performance and distribution.\n\n```sql\n-- Load both extensions\nLOAD hashfuncs;\nLOAD bitfilters;\n\n-- Use specialized hash functions for better distribution\nSELECT quotient_filter(16, 4, xxh64(complex_key || salt))\nFROM my_table;\n\n```\n\n### Hash Function Recommendations\n\nFor optimal performance, consider the hash functions provided by the [`hashfuncs` extension](https://query.farm/duckdb_extension_hashfuncs.html), such as the `xxh64` function used in the example above.\n\n## Limitations\n\n1. **Hash-based Input**: All filters require hash values as input\n2. **Static Size**: XOR and Binary Fuse filters cannot be resized after creation\n3. **No Direct Deletion**: Only quotient filters support element removal\n4. **False Positives**: All filters may return false positives but never false negatives\n5. **Type Consistency**: Hash values must be consistent between creation and testing\n6. 
**Memory Usage**: Larger filters provide better accuracy but use more memory\n\n## Performance Characteristics\n\nFor detailed performance analysis, see the respective research papers: [Quotient Filters](https://dl.acm.org/doi/10.1145/2213977.2214006), [XOR Filters](https://arxiv.org/abs/1912.08258), and [Binary Fuse Filters](https://arxiv.org/abs/2201.01174).\n\n| Operation | Quotient Filter | XOR Filter | Binary Fuse Filter |\n|-----------|----------------|------------|-------------------|\n| **Construction** | [O(n)](https://en.wikipedia.org/wiki/Big_O_notation) | O(n) | O(n) |\n| **Query** | [O(1)](https://en.wikipedia.org/wiki/Big_O_notation) average | O(1) | O(1) |\n| **Space** | Variable | 8-bit: ~9.84 bits/element, 16-bit: ~19.7 bits/element | 8-bit: ~9.0 bits/element, 16-bit: ~18.1 bits/element |\n| **False Positive Rate** | Configurable | 8-bit: ~0.39%, 16-bit: ~0.0015% | 8-bit: ~0.39%, 16-bit: ~0.0015% |\n| **Supports Deletion** | ✅ | ❌ | ❌ |\n| **Resizable** | ✅ | ❌ | ❌ |\n\n## Contributing\n\nThis extension is developed and maintained by **[Query.Farm](https://query.farm)**.\n\nFor bug reports, feature requests, or contributions:\n- Visit our [GitHub repository](https://github.com/Query-farm/bitfilters)\n- Submit issues with reproducible examples\n- Follow our coding standards for pull requests\n- Check the [DuckDB extension development guide](https://duckdb.org/docs/extensions/overview) for technical details\n\n## License\n\n[MIT Licensed](https://github.com/Query-farm/bitfilters/blob/main/LICENSE)\n\n## Related Resources\n\n- **Academic Papers:**\n  - [Quotient Filters](https://dl.acm.org/doi/10.1145/2213977.2214006) - Original quotient filter paper\n  - [XOR Filters](https://arxiv.org/abs/1912.08258) - XOR filter research\n  - [Binary Fuse Filters](https://arxiv.org/abs/2201.01174) - Latest filter technology\n\n\n- **Compatible Extensions:**\n  - [`hashfuncs`](https://query.farm/duckdb_extension_hashfuncs.html) - Additional hash functions\n\n---\n\n**[Query.Farm](https://query.farm)** - Advanced data processing solutions for modern analytics 
workloads.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquery-farm%2Fbitfilters","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquery-farm%2Fbitfilters","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquery-farm%2Fbitfilters/lists"}