{"id":30772186,"url":"https://github.com/query-farm/hashfuncs","last_synced_at":"2025-09-05T00:52:48.228Z","repository":{"id":308468894,"uuid":"1032747540","full_name":"Query-farm/hashfuncs","owner":"Query-farm","description":"A DuckDB extension that supplies hash functions","archived":false,"fork":false,"pushed_at":"2025-08-06T04:10:26.000Z","size":17,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-06T05:29:17.899Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Query-farm.png","metadata":{"files":{"readme":"docs/README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-05T19:05:07.000Z","updated_at":"2025-08-06T04:10:29.000Z","dependencies_parsed_at":"2025-08-06T05:29:20.683Z","dependency_job_id":"c27983e7-207a-4b6d-8c1c-0c88b242f39c","html_url":"https://github.com/Query-farm/hashfuncs","commit_stats":null,"previous_names":["query-farm/hashfuncs"],"tags_count":null,"template":false,"template_full_name":"duckdb/extension-template","purl":"pkg:github/Query-farm/hashfuncs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fhashfuncs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fhashfuncs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fhashfuncs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fhashfuncs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Query-farm","download_url":"https://codeload.github.com/Query-farm/hashfuncs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fhashfuncs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273695250,"owners_count":25151484,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-05T00:52:44.691Z","updated_at":"2025-09-05T00:52:48.219Z","avatar_url":"https://github.com/Query-farm.png","language":"C++","readme":"# Hashfuncs (hash functions) Extension for DuckDB\n\nThis `hashfuncs` extension adds functions for computing hash values from data.\n\n## Installation\n\n**`hashfuncs` is a [DuckDB Community Extension](https://github.com/duckdb/community-extensions).**\n\nYou can now use this by using this SQL:\n\n```sql\ninstall hashfuncs from community;\nload hashfuncs;\n```\n\n## What is hashing?\n\nHashing is a process that transforms input data of any size into a fixed-size output, which is typically a hash value or hash code. Hash functions are designed to be fast, deterministic (same input always produces the same output), and distribute hash values uniformly across the output space.\n\nNon cryptographic hash functions are commonly used for:\n\n- **Database indexing**: Creating efficient lookup structures\n\n- **Partitioning data**: Distributing data across multiple nodes or buckets\n\n- **Caching**: Creating cache keys\n\n- **Bloom filters**: Probabilistic data structures for membership testing\n\n## Available Hash Functions\n\nThis extension provides multiple high-performance non-cryptographic hash algorithms, each optimized for different use cases:\n\n### xxHash Family\n\n**xxHash** is an extremely fast non-cryptographic hash algorithm, reaching speeds close to RAM limits.\n\n#### `xxh32(data [, seed])`\n- **Returns**: `UINTEGER` (32-bit unsigned integer)\n- **Seed type**: `UINTEGER` (optional)\n- **Description**: 32-bit version of xxHash, good for hash tables and checksums\n\n```sql\nSELECT xxh32('hello world');\n┌──────────────────────┐\n│ xxh32('hello world') │\n│        uint32        │\n├──────────────────────┤\n│      3468387874      │\n│    (3.47 billion)    │\n└──────────────────────┘\n\nSELECT xxh32('hello world', 42);\n┌──────────────────────────┐\n│ xxh32('hello world', 42) │\n│          uint32          │\n├──────────────────────────┤\n│        4225033588        │\n│      (4.23 billion)      │\n└──────────────────────────┘\n```\n\n#### `xxh64(data [, seed])`\n- **Returns**: `UBIGINT` (64-bit unsigned integer)\n- **Seed type**: `UBIGINT` (optional)\n- **Description**: 64-bit version of xxHash, recommended for most applications\n\n```sql\nSELECT xxh64('hello world');\n┌──────────────────────┐\n│ xxh64('hello world') │\n│        uint64        │\n├──────────────────────┤\n│ 5020219685658847592  │\n│  (5.02 quintillion)  │\n└──────────────────────┘\n\nSELECT xxh64('hello world', 12345);\n┌─────────────────────────────┐\n│ xxh64('hello world', 12345) │\n│           uint64            │\n├─────────────────────────────┤\n│    15771590491225725957     │\n└─────────────────────────────┘\n```\n\n#### `xxh3_64(data [, seed])`\n- **Returns**: `UBIGINT` (64-bit unsigned integer)\n- **Seed type**: `UBIGINT` (optional)\n- **Description**: Latest xxHash algorithm, optimized for modern CPUs\n\n```sql\nSELECT xxh3_64('hello world');\n┌────────────────────────┐\n│ xxh3_64('hello world') │\n│         uint64         │\n├────────────────────────┤\n│  15296390279056496779  │\n└────────────────────────┘\n\nSELECT xxh3_64('hello world', 999);\n┌─────────────────────────────┐\n│ xxh3_64('hello world', 999) │\n│           uint64            │\n├─────────────────────────────┤\n│     3002856137354040482     │\n│     (3.00 quintillion)      │\n└─────────────────────────────┘\n```\n\n#### `xxh3_128(data [, seed])`\n- **Returns**: `UHUGEINT` (128-bit unsigned integer)\n- **Seed type**: `UBIGINT` (optional)\n- **Description**: 128-bit version providing larger hash space, reducing collision probability\n\n```sql\nSELECT xxh3_128('hello world');\n┌─────────────────────────────────────────┐\n│         xxh3_128('hello world')         │\n│                 uint128                 │\n├─────────────────────────────────────────┤\n│ 225447084758876380551077147957698971904 │\n└─────────────────────────────────────────┘\n\nSELECT xxh3_128('hello world', 777);\n┌─────────────────────────────────────────┐\n│      xxh3_128('hello world', 777)       │\n│                 uint128                 │\n├─────────────────────────────────────────┤\n│ 283192007746380917896797379546610829141 │\n└─────────────────────────────────────────┘\n```\n\n### RapidHash Family\n\n**RapidHash** is designed for exceptional speed while maintaining good hash quality.\n\n#### `rapidhash(data [, seed])`\n- **Returns**: `UBIGINT` (64-bit unsigned integer)\n- **Seed type**: `UBIGINT` (optional)\n- **Description**: Main RapidHash algorithm, optimized for speed and quality\n\n```sql\nSELECT rapidhash('hello world');\n┌──────────────────────────┐\n│ rapidhash('hello world') │\n│          uint64          │\n├──────────────────────────┤\n│   3397907815814400320    │\n│    (3.40 quintillion)    │\n└──────────────────────────┘\n\nSELECT rapidhash('hello world', 2023);\n┌────────────────────────────────┐\n│ rapidhash('hello world', 2023) │\n│             uint64             │\n├────────────────────────────────┤\n│      11789095433300219990      │\n└────────────────────────────────┘\n```\n\n### MurmurHash3 Family\n\n**MurmurHash3** is a well-established non-cryptographic hash function known for good distribution and performance.\n\n#### `murmurhash3_32(data [, seed])`\n- **Returns**: `UINTEGER` (32-bit unsigned integer)\n- **Seed type**: `UINTEGER` (optional)\n- **Description**: 32-bit MurmurHash3, widely used and tested\n\n```sql\nSELECT murmurhash3_32('hello world');\n┌───────────────────────────────┐\n│ murmurhash3_32('hello world') │\n│            uint32             │\n├───────────────────────────────┤\n│          1586663183           │\n│        (1.59 billion)         │\n└───────────────────────────────┘\n\nSELECT murmurhash3_32('hello world', 123);\n┌────────────────────────────────────┐\n│ murmurhash3_32('hello world', 123) │\n│               uint32               │\n├────────────────────────────────────┤\n│             679062093              │\n│          (679.06 million)          │\n└────────────────────────────────────┘\n```\n\n#### `murmurhash3_128(data [, seed])`\n- **Returns**: `UHUGEINT` (128-bit unsigned integer)\n- **Seed type**: `UINTEGER` (optional)\n- **Description**: 128-bit MurmurHash3 for x86 platforms\n\n```sql\nSELECT murmurhash3_128('hello world');\n┌─────────────────────────────────────────┐\n│     murmurhash3_128('hello world')      │\n│                 uint128                 │\n├─────────────────────────────────────────┤\n│ 206095855024402301784664199839047883400 │\n└─────────────────────────────────────────┘\n```\n\n#### `murmurhash3_x64_128(data [, seed])`\n- **Returns**: `UHUGEINT` (128-bit unsigned integer)\n- **Seed type**: `UINTEGER` (optional)\n- **Description**: 128-bit MurmurHash3 optimized for x64 platforms\n\n```sql\nSELECT murmurhash3_x64_128('hello world');\n┌─────────────────────────────────────────┐\n│   murmurhash3_x64_128('hello world')    │\n│                 uint128                 │\n├─────────────────────────────────────────┤\n│ 228083453807047072434243676435732455694 │\n└─────────────────────────────────────────┘\n```\n\n## Supported Data Types\n\nAll hash functions support the following DuckDB data types:\n\n- **String types**: `VARCHAR`, `BLOB`\n- **Integer types**: `TINYINT`, `SMALLINT`, `INTEGER`, `BIGINT`, `HUGEINT`\n- **Unsigned integer types**: `UTINYINT`, `USMALLINT`, `UINTEGER`, `UBIGINT`, `UHUGEINT`\n- **Floating point types**: `FLOAT`, `DOUBLE`\n- **Date/time types**: `DATE`, `TIME`\n\n## Performance Characteristics\n\n| Algorithm | Speed | Quality | Output Size | Best Use Case |\n|-----------|-------|---------|-------------|---------------|\n| `xxh32` | Very Fast | Good | 32-bit | Legacy systems, hash tables |\n| `xxh64` | Very Fast | Very Good | 64-bit | General purpose hashing |\n| `xxh3_64` | Fastest | Excellent | 64-bit | Modern applications |\n| `xxh3_128` | Fast | Excellent | 128-bit | When collision resistance is critical |\n| `rapidhash` | Extremely Fast | Good | 64-bit | High-throughput applications |\n| `rapidhash_micro` | Extremely Fast | Good | 64-bit | Small data, high frequency |\n| `rapidhash_nano` | Fastest | Fair | 64-bit | Tiny data, maximum speed |\n| `murmurhash3_32` | Fast | Very Good | 32-bit | Distributed systems, Bloom filters |\n| `murmurhash3_128` | Fast | Very Good | 128-bit | UUID generation, partitioning |\n| `murmurhash3_x64_128` | Fast | Very Good | 128-bit | 64-bit optimized partitioning |\n\n## Usage Examples\n\n### Basic Hashing\n\n```sql\n-- Hash various data types\nSELECT xxh64(42);                    -- Integer\nSELECT xxh64('Hello, World!');       -- String\nSELECT xxh64('2023-12-01'::DATE);    -- Date\nSELECT xxh64(3.14159::FLOAT);        -- Float\n```\n\n### Data Partitioning\n\n```sql\n-- Distribute data across 10 partitions\nSELECT\n    customer_id,\n    xxh64(customer_id) % 10 as partition_id\nFROM customers;\n```\n\n### Creating Consistent Hash Keys\n\n```sql\n-- Create cache keys from multiple columns\nSELECT\n    user_id,\n    product_id,\n    xxh64(CONCAT(user_id, ':', product_id)) as cache_key\nFROM user_purchases;\n```\n\n### Data Integrity Verification\n\n```sql\n-- Create checksums for data verification\nSELECT\n    file_name,\n    file_content,\n    xxh3_128(file_content) as checksum\nFROM file_storage;\n```\n\n### Using Seeds for Different Hash Spaces\n\n```sql\n-- Create different hash values for the same data\nSELECT\n    data,\n    xxh64(data, 0) as hash_space_0,\n    xxh64(data, 1) as hash_space_1,\n    xxh64(data, 2) as hash_space_2\nFROM my_table;\n```\n\n### Bloom Filter Implementation\n\n```sql\n-- Generate multiple hash values for Bloom filter\nWITH bloom_hashes AS (\n    SELECT\n        item,\n        murmurhash3_32(item, 0) % 1000000 as hash1,\n        murmurhash3_32(item, 1) % 1000000 as hash2,\n        murmurhash3_32(item, 2) % 1000000 as hash3\n    FROM items\n)\nSELECT * FROM bloom_hashes;\n```\n\n### Load Balancing\n\n```sql\n-- Distribute requests across servers\nSELECT\n    request_id,\n    xxh3_64(request_id) % 5 as server_id\nFROM incoming_requests;\n```\n\n## Algorithm Selection Guide\n\n**For general-purpose hashing**: Use `xxh3_64` - it provides the best balance of speed and quality for modern applications.\n\n**For maximum speed**: Use `rapidhash` or `rapidhash_nano` when you need the absolute fastest hashing.\n\n**For legacy compatibility**: Use `murmurhash3_32` if you need compatibility with existing systems using MurmurHash.\n\n**For high collision resistance**: Use `xxh3_128` or `murmurhash3_x64_128` when you need larger hash spaces.\n\n**For small data**: Use `rapidhash_micro` or `rapidhash_nano` for very small inputs.\n\n## Performance Tips\n\n1. **Choose appropriate output size**: Use 32-bit hashes only when memory is constrained; 64-bit hashes provide better collision resistance.\n\n2. **Use seeds wisely**: Seeds allow you to create independent hash functions from the same algorithm.\n\n3. **Consider your data distribution**: Some algorithms perform better with certain types of input data.\n\n4. **Benchmark for your use case**: Performance can vary based on your specific data patterns and hardware.\n\n## Limitations\n\n- These are **non-cryptographic** hash functions - do not use them for security-sensitive applications like password hashing or digital signatures\n- Hash collisions are possible (but rare with good algorithms and appropriate output sizes)\n- Performance characteristics may vary based on input data patterns and hardware architecture\n\n## License\n\nMIT Licensed\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquery-farm%2Fhashfuncs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquery-farm%2Fhashfuncs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquery-farm%2Fhashfuncs/lists"}