{"id":30687607,"url":"https://github.com/query-farm/marisa","last_synced_at":"2025-09-02T00:04:32.141Z","repository":{"id":308653614,"uuid":"1033443470","full_name":"Query-farm/marisa","owner":"Query-farm","description":"The Marisa extension by Query.Farm integrates the fast, space-efficient MARISA trie into DuckDB, enabling high-performance string lookups, prefix searches, and autocomplete functionality.","archived":false,"fork":false,"pushed_at":"2025-08-07T04:36:42.000Z","size":16,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-07T06:09:11.657Z","etag":null,"topics":["common-prefix-search","duckdb","duckdb-extension","marisa","marisa-trie","predictive-search","sql","trie","tries"],"latest_commit_sha":null,"homepage":"https://query.farm/duckdb_extension_marisa.html","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Query-farm.png","metadata":{"files":{"readme":"docs/README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-06T20:29:11.000Z","updated_at":"2025-08-07T04:36:44.000Z","dependencies_parsed_at":"2025-08-07T06:09:13.368Z","dependency_job_id":"6bb643ab-811a-4bcb-8160-149adadefe97","html_url":"https://github.com/Query-farm/marisa","commit_stats":null,"previous_names":["query-farm/marisa"],"tags_count":null,"template":false,"template_full_name":"duckdb/extension-template","purl":"pkg:github/Query-farm/marisa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fmarisa","tags_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/Query-farm%2Fmarisa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fmarisa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fmarisa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Query-farm","download_url":"https://codeload.github.com/Query-farm/marisa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fmarisa/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273208777,"owners_count":25064204,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["common-prefix-search","duckdb","duckdb-extension","marisa","marisa-trie","predictive-search","sql","trie","tries"],"created_at":"2025-09-02T00:03:29.088Z","updated_at":"2025-09-02T00:04:32.106Z","avatar_url":"https://github.com/Query-farm.png","language":"C++","readme":"# DuckDB Marisa Extension by [Query.Farm](https://query.farm)\n\nThe **Marisa** extension, developed by **[Query.Farm](https://query.farm)**, adds [MARISA](https://github.com/s-yata/marisa-trie) (Matching Algorithm with Recursively Implemented StorAge) trie functionality for DuckDB. 
[MARISA](https://github.com/s-yata/marisa-trie) is a static and space-efficient trie data structure that enables fast string lookups, prefix searches, and predictive text operations.\n\n## Use Cases\n\nMARISA tries are particularly useful for:\n- **Autocomplete/Type-ahead functionality**: Use `marisa_predictive()` to find all completions for a partial string\n- **Spell checking**: Use `marisa_lookup()` to verify if words exist in a dictionary\n- **URL routing**: Efficiently match URL patterns and extract parameters\n- **IP address prefix matching**: Network routing and firewall rules\n- **String deduplication**: Compact storage of large sets of strings with common prefixes\n\n## Installation\n\n**`marisa` is a [DuckDB Community Extension](https://github.com/duckdb/community-extensions).**\n\nInstall and load it with the following SQL:\n\n```sql\ninstall marisa from community;\nload marisa;\n```\n\nThe `marisa` extension provides several functions for working with MARISA tries:\n\n### Creating a Trie\nUse the `marisa_trie()` aggregate function to create a trie from string data:\n```sql\nCREATE TABLE employees(name TEXT);\nINSERT INTO employees VALUES('Alice'), ('Bob'), ('Charlie'), ('David'), ('Eve'), ('Frank'), ('Mallory'), ('Megan'), ('Oscar'), ('Melissa');\n\n-- Create a trie from the employee names\nCREATE TABLE employees_trie AS SELECT marisa_trie(name) AS trie FROM employees;\n\nSELECT trie, octet_length(trie) FROM employees_trie;\n┌─────────────────────────────────────────────────────────────────┬────────────────────┐\n│                              trie                               │ octet_length(trie) │\n│                              blob                               │       int64        │\n├─────────────────────────────────────────────────────────────────┼────────────────────┤\n│ We love Marisa.\\x00\\x08\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xFD\\x1B`\\…  │        4160        
│\n└─────────────────────────────────────────────────────────────────┴────────────────────┘\n```\n\n### Lookup Function\nCheck if a string exists in the trie using `marisa_lookup()`:\n```sql\n-- Check if 'Alice' exists in the trie (returns true)\nSELECT marisa_lookup(trie, 'Alice') FROM employees_trie;\n┌──────────────────────────────┐\n│ marisa_lookup(trie, 'Alice') │\n│           boolean            │\n├──────────────────────────────┤\n│ true                         │\n└──────────────────────────────┘\n\n-- Check if 'Unknown' exists in the trie (returns false)\nSELECT marisa_lookup(trie, 'Unknown') FROM employees_trie;\n┌────────────────────────────────┐\n│ marisa_lookup(trie, 'Unknown') │\n│            boolean             │\n├────────────────────────────────┤\n│ false                          │\n└────────────────────────────────┘\n```\n\n### Common Prefix Search\nFind all strings in the trie that are prefixes of a given string using `marisa_common_prefix()`:\n```sql\nCREATE TABLE countries(name TEXT);\nINSERT INTO countries VALUES ('U'), ('US'), ('USA');\nCREATE TABLE countries_trie AS SELECT marisa_trie(name) AS trie FROM countries;\n\n-- Find all prefixes of 'USA' (returns ['U', 'US', 'USA'])\nSELECT marisa_common_prefix(trie, 'USA', 10) FROM countries_trie;\n┌───────────────────────────────────────┐\n│ marisa_common_prefix(trie, 'USA', 10) │\n│               varchar[]               │\n├───────────────────────────────────────┤\n│ [U, US, USA]                          │\n└───────────────────────────────────────┘\n```\n\n### Predictive Search\nFind all strings in the trie that start with a given prefix using `marisa_predictive()`:\n```sql\n-- Find all names starting with 'Me' (returns ['Megan', 'Melissa'])\nSELECT marisa_predictive(trie, 'Me', 10) FROM employees_trie;\n┌───────────────────────────────────┐\n│ marisa_predictive(trie, 'Me', 10) │\n│             varchar[]             │\n├───────────────────────────────────┤\n│ [Megan, Melissa]                  
│\n└───────────────────────────────────┘\n```\n\n## Function Reference\n\n### `marisa_trie(column)`\n**Type:** Aggregate Function\n**Description:** Creates a MARISA trie from string values in a column.\n**Parameters:**\n- `column` (VARCHAR): Column containing strings to build the trie from\n\n**Example:**\n```sql\nSELECT marisa_trie(name) FROM employees;\n```\n\n### `marisa_lookup(trie, search_string)`\n**Type:** Scalar Function\n**Description:** Checks if a string exists in the trie.\n**Parameters:**\n- `trie` (BLOB): The trie created by `marisa_trie()`\n- `search_string` (VARCHAR): String to search for\n\n**Returns:** BOOLEAN (true if found, false otherwise)\n\n**Example:**\n```sql\nSELECT marisa_lookup(trie, 'Alice') FROM employees_trie;\n```\n\n### `marisa_common_prefix(trie, search_string, max_results)`\n**Type:** Scalar Function\n**Description:** Finds all strings in the trie that are prefixes of the search string.\n**Parameters:**\n- `trie` (BLOB): The trie created by `marisa_trie()`\n- `search_string` (VARCHAR): String to find prefixes for\n- `max_results` (INTEGER): Maximum number of results to return\n\n**Returns:** LIST(VARCHAR) - List of prefix matches\n\n**Example:**\n```sql\nSELECT marisa_common_prefix(trie, 'USA', 10) FROM countries_trie;\n-- Returns: ['U', 'US', 'USA']\n```\n\n### `marisa_predictive(trie, prefix, max_results)`\n**Type:** Scalar Function\n**Description:** Finds all strings in the trie that start with the given prefix.\n**Parameters:**\n- `trie` (BLOB): The trie created by `marisa_trie()`\n- `prefix` (VARCHAR): Prefix to search for\n- `max_results` (INTEGER): Maximum number of results to return\n\n**Returns:** LIST(VARCHAR) - List of strings starting with the prefix\n\n**Example:**\n```sql\nSELECT marisa_predictive(trie, 'Me', 10) FROM employees_trie;\n-- Returns: ['Megan', 
'Melissa']\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquery-farm%2Fmarisa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquery-farm%2Fmarisa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquery-farm%2Fmarisa/lists"}