{"id":21994513,"url":"https://github.com/codelibs/elasticsearch-minhash","last_synced_at":"2025-08-27T22:20:59.564Z","repository":{"id":20994154,"uuid":"24283714","full_name":"codelibs/elasticsearch-minhash","owner":"codelibs","description":"Elasticsearch plugin for b-bit minhash algorism","archived":false,"fork":false,"pushed_at":"2024-06-17T12:26:14.000Z","size":266,"stargazers_count":63,"open_issues_count":13,"forks_count":14,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-07-13T09:14:48.666Z","etag":null,"topics":["elasticsearch","elasticsearch-plugin","minhash"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codelibs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-09-21T06:11:02.000Z","updated_at":"2025-04-29T08:35:46.000Z","dependencies_parsed_at":"2023-12-16T05:32:35.409Z","dependency_job_id":"c8583765-45bd-4eb5-a5c5-9fecb4057540","html_url":"https://github.com/codelibs/elasticsearch-minhash","commit_stats":null,"previous_names":[],"tags_count":89,"template":false,"template_full_name":null,"purl":"pkg:github/codelibs/elasticsearch-minhash","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Felasticsearch-minhash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Felasticsearch-minhash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Felasticsearch-minhash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/codelibs%2Felasticsearch-minhash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codelibs","download_url":"https://codeload.github.com/codelibs/elasticsearch-minhash/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Felasticsearch-minhash/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272388424,"owners_count":24926059,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-27T02:00:09.397Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elasticsearch","elasticsearch-plugin","minhash"],"created_at":"2024-11-29T21:09:32.713Z","updated_at":"2025-08-27T22:20:59.522Z","avatar_url":"https://github.com/codelibs.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"Elasticsearch MinHash Plugin\n[![Java CI with Maven](https://github.com/codelibs/elasticsearch-minhash/actions/workflows/maven.yml/badge.svg)](https://github.com/codelibs/elasticsearch-minhash/actions/workflows/maven.yml)\n=======================\n\n## Overview\n\nMinHash Plugin provides b-bit MinHash algorithm for Elasticsearch.\nUsing a field type and a token filter provided by this plugin, you can add a minhash value to your document.\n\n## Version\n\n[Versions in Maven 
Repository](https://repo1.maven.org/maven2/org/codelibs/elasticsearch-minhash/)\n\n### Issues/Questions\n\nPlease file an [issue](https://github.com/codelibs/elasticsearch-minhash/issues \"issue\").\n\n## Installation\n\n    $ $ES_HOME/bin/elasticsearch-plugin install org.codelibs:elasticsearch-minhash:7.14.0\n\n## Getting Started\n\n### Add MinHash Analyzer\n\nFirst, add a minhash analyzer when creating your index:\n\n    $ curl -XPUT 'localhost:9200/my_index' -H 'Content-Type: application/json' -d '{\n      \"index\":{\n        \"analysis\":{\n          \"analyzer\":{\n            \"minhash_analyzer\":{\n              \"type\":\"custom\",\n              \"tokenizer\":\"standard\",\n              \"filter\":[\"minhash\"]\n            }\n          }\n        }\n      }\n    }'\n\nYou are free to change the tokenizer/char\\_filter/filter settings, but the minhash filter must be the last filter in the chain.\n\n### Add MinHash field\n\nPut a minhash field into an index mapping:\n\n    $ curl -XPUT \"localhost:9200/my_index/_mapping\" -H 'Content-Type: application/json' -d '{\n      \"properties\":{\n        \"message\":{\n          \"type\":\"text\",\n          \"copy_to\":\"minhash_value\"\n        },\n        \"minhash_value\":{\n          \"type\":\"minhash\",\n          \"store\":true,\n          \"minhash_analyzer\":\"minhash_analyzer\"\n        }\n      }\n    }'\n\nThe minhash field type is a binary type.\nThe above example calculates a minhash value of the message field and stores it in the minhash\\_value field.\n\n## Get MinHash Value\n\nAdd the following document:\n\n    $ curl -XPUT \"localhost:9200/my_index/_doc/1\" -H 'Content-Type: application/json' -d '{\n      \"message\":\"Fess is Java based full text search server provided as OSS product.\"\n    }'\n\nThe minhash value is calculated automatically when the document is added.\nYou can check it as follows:\n\n    $ curl -XGET \"localhost:9200/my_index/_doc/1?pretty\u0026stored_fields=minhash_value,_source\"\n\nThe response is:\n\n    {\n      \"_index\" : \"my_index\",\n      \"_type\" : \"_doc\",\n 
     \"_id\" : \"1\",\n      \"_version\" : 1,\n      \"found\" : true,\n      \"_source\":{\n          \"message\":\"Fess is Java based full text search server provided as OSS product.\"\n        },\n      \"fields\" : {\n        \"minhash_value\" : [ \"KV5rsUfZpcZdVojpG8mHLA==\" ]\n      }\n    }\n\n## References\n\n### Change the number of bits and hashes\n\nTo change the number of bits and hashes, set them in the token filter settings:\n\n    $ curl -XPUT 'localhost:9200/my_index' -H 'Content-Type: application/json' -d '{\n      \"index\":{\n        \"analysis\":{\n          \"analyzer\":{\n            \"minhash_analyzer\":{\n              \"type\":\"custom\",\n              \"tokenizer\":\"standard\",\n              \"filter\":[\"my_minhash\"]\n            }\n          },\n          \"filter\":{\n            \"my_minhash\":{\n              \"type\":\"minhash\",\n              \"seed\":100,\n              \"bit\":2,\n              \"size\":32\n            }\n          }\n        }\n      }\n    }'\n\nThe above sets the number of bits to 2, the number of hashes to 32, and the hash seed to 100.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodelibs%2Felasticsearch-minhash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodelibs%2Felasticsearch-minhash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodelibs%2Felasticsearch-minhash/lists"}