{"id":13577411,"url":"https://github.com/raphaelsty/cherche","last_synced_at":"2025-10-11T21:49:00.561Z","repository":{"id":37831121,"uuid":"434994455","full_name":"raphaelsty/cherche","owner":"raphaelsty","description":"Neural Search","archived":false,"fork":false,"pushed_at":"2024-06-01T17:05:12.000Z","size":43591,"stargazers_count":332,"open_issues_count":4,"forks_count":14,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-09-05T15:03:33.050Z","etag":null,"topics":["bm25","flashtext","information-retrieval","machine-learning","natural-language-processing","neural-networks","neural-search","nlp","question-answering","reader","retrieval","search","searching","semantic-search","vector-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raphaelsty.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.bib","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-04T19:47:06.000Z","updated_at":"2025-08-22T11:35:17.000Z","dependencies_parsed_at":"2024-01-14T04:45:23.426Z","dependency_job_id":"b79772b4-4949-4ac4-b244-bc3a72ea14ed","html_url":"https://github.com/raphaelsty/cherche","commit_stats":{"total_commits":161,"total_committers":4,"mean_commits":40.25,"dds":"0.11801242236024845","last_synced_commit":"980667fb0d53a45fbbf41b296c7da8e9da41f86e"},"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"purl":"pkg:github/raphaelsty/cherche","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphaelsty%2Fcherche","tags_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/raphaelsty%2Fcherche/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphaelsty%2Fcherche/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphaelsty%2Fcherche/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raphaelsty","download_url":"https://codeload.github.com/raphaelsty/cherche/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphaelsty%2Fcherche/sbom","scorecard":{"id":761931,"data":{"date":"2025-08-11","repo":{"name":"github.com/raphaelsty/cherche","commit":"b640571a33b774a5157a07046e0aecb313960f14"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.5,"checks":[{"name":"Code-Review","score":0,"reason":"Found 2/21 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score 
normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT 
License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 14 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-22T23:42:27.242Z","repository_id":37831121,"created_at":"2025-08-22T23:42:27.242Z","updated_at":"2025-08-22T23:42:27.242Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279008823,"owners_count":26084518,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bm25","flashtext","information-retrieval","machine-learning","natural-language-processing","neural-networks","neural-search","nlp","question-answering","reader","retrieval","search","searching","semantic-search","vector-search"],"created_at":"2024-08-01T15:01:21.227Z","updated_at":"2025-10-11T21:49:00.510Z","avatar_url":"https://github.com/raphaelsty.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n  \u003ch1\u003eCherche\u003c/h1\u003e\n  \u003cp\u003eNeural search\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg width=300 src=\"docs/img/logo.png\"/\u003e\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003c!-- Documentation --\u003e\n  \u003ca href=\"https://raphaelsty.github.io/cherche/\"\u003e\u003cimg 
src=\"https://img.shields.io/website?label=docs\u0026style=flat-square\u0026url=https%3A%2F%2Fraphaelsty.github.io/cherche/%2F\" alt=\"documentation\"\u003e\u003c/a\u003e\n  \u003c!-- Demo --\u003e\n  \u003ca href=\"https://raphaelsty.github.io/knowledge/?query=cherche%20neural%20search\"\u003e\u003cimg src=\"https://img.shields.io/badge/demo-running-blueviolet?style=flat-square\" alt=\"Demo\"\u003e\u003c/a\u003e\n  \u003c!-- License --\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square\" alt=\"license\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\nCherche lets you build neural search pipelines that combine retrievers with pre-trained language models used as retrievers and rankers. The primary advantage of Cherche lies in its capacity to construct end-to-end pipelines. Additionally, Cherche is well-suited for offline semantic search due to its compatibility with batch computation.\n\nHere are some of the features Cherche offers:\n\n[Live demo of an NLP search engine powered by Cherche](https://raphaelsty.github.io/knowledge/?query=cherche%20neural%20search)\n\n![Alt text](docs/img/explain.png)\n\n## Installation 🤖\n\nTo install Cherche for use with a simple retriever on CPU, such as TfIdf, Flash, Lunr, or Fuzz, use the following command:\n\n```sh\npip install cherche\n```\n\nTo install Cherche for use with any semantic retriever or ranker on CPU, use the following command:\n\n```sh\npip install \"cherche[cpu]\"\n```\n\nFinally, if you plan to use any semantic retriever or ranker on GPU, use the following command:\n\n```sh\npip install \"cherche[gpu]\"\n```\n\nBy following these installation instructions, you will be able to use Cherche with the appropriate requirements for your needs.\n\n### Documentation\n\nDocumentation is available [here](https://raphaelsty.github.io/cherche/). 
It provides details\nabout retrievers, rankers, pipelines and examples.\n\n## QuickStart 📑\n\n### Documents\n\nCherche finds the right documents within a list of objects. Here is an example of a corpus.\n\n```python\nfrom cherche import data\n\ndocuments = data.load_towns()\n\ndocuments[:3]\n[{'id': 0,\n  'title': 'Paris',\n  'url': 'https://en.wikipedia.org/wiki/Paris',\n  'article': 'Paris is the capital and most populous city of France.'},\n {'id': 1,\n  'title': 'Paris',\n  'url': 'https://en.wikipedia.org/wiki/Paris',\n  'article': \"Since the 17th century, Paris has been one of Europe's major centres of science, and arts.\"},\n {'id': 2,\n  'title': 'Paris',\n  'url': 'https://en.wikipedia.org/wiki/Paris',\n  'article': 'The City of Paris is the centre and seat of government of the region and province of Île-de-France.'\n  }]\n```\n\n### Retriever ranker\n\nHere is an example of a neural search pipeline composed of a BM25 retriever that quickly retrieves documents, followed by a ranking model. The ranking model sorts the documents produced by the retriever based on the semantic similarity between the query and the documents. 
We can call the pipeline using a list of queries and get relevant documents for each query.\n\n```python\nfrom cherche import data, retrieve, rank\nfrom sentence_transformers import SentenceTransformer\nfrom lenlp import sparse\n\n# List of dicts\ndocuments = data.load_towns()\n\n# Retrieve on fields title and article\nretriever = retrieve.BM25(\n  key=\"id\", \n  on=[\"title\", \"article\"], \n  documents=documents, \n  k=30\n)\n\n# Rank on fields title and article\nranker = rank.Encoder(\n    key = \"id\",\n    on = [\"title\", \"article\"],\n    encoder = SentenceTransformer(\"sentence-transformers/all-mpnet-base-v2\").encode,\n    k = 3,\n)\n\n# Pipeline creation\nsearch = retriever + ranker\n\nsearch.add(documents=documents)\n\n# Search documents for 3 queries.\nsearch([\"Bordeaux\", \"Paris\", \"Toulouse\"])\n[[{'id': 57, 'similarity': 0.69513524},\n  {'id': 63, 'similarity': 0.6214994},\n  {'id': 65, 'similarity': 0.61809087}],\n [{'id': 16, 'similarity': 0.59158516},\n  {'id': 0, 'similarity': 0.58217555},\n  {'id': 1, 'similarity': 0.57944715}],\n [{'id': 26, 'similarity': 0.6925601},\n  {'id': 37, 'similarity': 0.63977146},\n  {'id': 28, 'similarity': 0.62772334}]]\n```\n\nWe can map the index to the documents to access their contents using pipelines:\n\n```python\nsearch += documents\nsearch([\"Bordeaux\", \"Paris\", \"Toulouse\"])\n[[{'id': 57,\n   'title': 'Bordeaux',\n   'url': 'https://en.wikipedia.org/wiki/Bordeaux',\n   'similarity': 0.69513524},\n  {'id': 63,\n   'title': 'Bordeaux',\n   'similarity': 0.6214994},\n  {'id': 65,\n   'title': 'Bordeaux',\n   'url': 'https://en.wikipedia.org/wiki/Bordeaux',\n   'similarity': 0.61809087}],\n [{'id': 16,\n   'title': 'Paris',\n   'url': 'https://en.wikipedia.org/wiki/Paris',\n   'article': 'Paris received 12.',\n   'similarity': 0.59158516},\n  {'id': 0,\n   'title': 'Paris',\n   'url': 'https://en.wikipedia.org/wiki/Paris',\n   'similarity': 0.58217555},\n  {'id': 1,\n   'title': 'Paris',\n   'url': 
'https://en.wikipedia.org/wiki/Paris',\n   'similarity': 0.57944715}],\n [{'id': 26,\n   'title': 'Toulouse',\n   'url': 'https://en.wikipedia.org/wiki/Toulouse',\n   'similarity': 0.6925601},\n  {'id': 37,\n   'title': 'Toulouse',\n   'url': 'https://en.wikipedia.org/wiki/Toulouse',\n   'similarity': 0.63977146},\n  {'id': 28,\n   'title': 'Toulouse',\n   'url': 'https://en.wikipedia.org/wiki/Toulouse',\n   'similarity': 0.62772334}]]\n```\n\n## Retrieve\n\nCherche provides [retrievers](https://raphaelsty.github.io/cherche/retrieve/retrieve/) that filter input documents based on a query.\n\n- retrieve.TfIdf\n- retrieve.BM25\n- retrieve.Lunr\n- retrieve.Flash\n- retrieve.Encoder\n- retrieve.DPR\n- retrieve.Fuzz\n- retrieve.Embedding\n\n## Rank\n\nCherche provides [rankers](https://raphaelsty.github.io/cherche/rank/rank/) that re-rank and filter the documents returned by retrievers.\n\nCherche rankers are compatible with [SentenceTransformers](https://www.sbert.net/docs/pretrained_models.html) models, which are available on the [Hugging Face hub](https://huggingface.co/models?pipeline_tag=zero-shot-classification\u0026sort=downloads).\n\n- rank.Encoder\n- rank.DPR\n- rank.CrossEncoder\n- rank.Embedding\n\n## Question answering\n\nCherche provides modules dedicated to question answering. These modules are compatible with Hugging Face's pre-trained models and fully integrated into neural search pipelines.\n\n## Contributors 🤝\nCherche was created by and for Renault and is now available to all.\nWe welcome all contributions.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"docs/img/renault.jpg\"/\u003e\u003c/p\u003e\n\n## Acknowledgements 👏\n\nThe Lunr retriever is a wrapper around [Lunr.py](https://github.com/yeraydiazdiaz/lunr.py). The Flash retriever is a wrapper around [FlashText](https://github.com/vi3k6i5/flashtext). 
The DPR, Encoder and CrossEncoder rankers are wrappers dedicated to the use of the pre-trained models of [SentenceTransformers](https://www.sbert.net/docs/pretrained_models.html) in a neural search pipeline.\n\n## Citations\n\nIf you use Cherche to produce results for your scientific publication, please cite our SIGIR paper:\n\n```bibtex\n@inproceedings{Sourty2022sigir,\n    author = {Raphael Sourty and Jose G. Moreno and Lynda Tamine and Francois-Paul Servant},\n    title = {CHERCHE: A new tool to rapidly implement pipelines in information retrieval},\n    booktitle = {Proceedings of SIGIR 2022},\n    year = {2022}\n}\n```\n\n## Dev Team 💾\n\nThe Cherche dev team is made up of [Raphaël Sourty](https://github.com/raphaelsty), [François-Paul Servant](https://github.com/fpservant), [Nicolas Bizzozzero](https://github.com/NicolasBizzozzero), and [Jose G Moreno](https://scholar.google.com/citations?user=4BZFUw8AAAAJ\u0026hl=fr). 🥳\n","funding_links":[],"categories":["Python","nlp"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraphaelsty%2Fcherche","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraphaelsty%2Fcherche","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraphaelsty%2Fcherche/lists"}