{"id":18730045,"url":"https://github.com/primeqa/docuverse","last_synced_at":"2025-04-12T17:08:14.324Z","repository":{"id":227364814,"uuid":"771205431","full_name":"primeqa/docuverse","owner":"primeqa","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-04T18:44:12.000Z","size":4291,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-12T17:08:14.064Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/primeqa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-12T22:01:11.000Z","updated_at":"2025-04-04T04:56:01.000Z","dependencies_parsed_at":"2024-03-12T23:45:21.523Z","dependency_job_id":"3edc673f-7547-4d2d-8e39-9ba2160ed7fe","html_url":"https://github.com/primeqa/docuverse","commit_stats":null,"previous_names":["primeqa/docuverse"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primeqa%2Fdocuverse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primeqa%2Fdocuverse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primeqa%2Fdocuverse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primeqa%2Fdocuverse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/primeqa","download_url":"https://codeload.github.com/primeqa/docuverse/tar.gz/refs/
heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248602315,"owners_count":21131616,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T14:34:02.165Z","updated_at":"2025-04-12T17:08:14.319Z","avatar_url":"https://github.com/primeqa.png","language":"Jupyter Notebook","readme":"\u003c!---\nCopyright 2022 IBM Corp.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n--\u003e\n\n\u003ch3 align=\"center\"\u003e\n    \u003cimg width=\"350\" alt=\"primeqa\" src=\"docs/_static/img/PrimeQA.png\"\u003e\n    \u003cp\u003eRepository for (almost) *all* your document search needs.\u003c/p\u003e\n    \u003cp\u003ePart of the Prime Repository for State-of-the-Art Multilingual QuestionAnswering Research and Development.\u003c/p\u003e\n\u003c/h3\u003e\n\n[//]: # (![Build Status]\u0026#40;https://github.com/primeqa/primeqa/actions/workflows/primeqa-ci.yml/badge.svg\u0026#41;)\n\n[//]: # 
([![LICENSE|Apache2.0]\u0026#40;https://img.shields.io/github/license/saltstack/salt?color=blue\u0026#41;]\u0026#40;https://www.apache.org/licenses/LICENSE-2.0.txt\u0026#41;)\n\n[//]: # ([![sphinx-doc-build]\u0026#40;https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml/badge.svg\u0026#41;]\u0026#40;https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml\u0026#41;   )\n\nDocUVerse is a public open-source repository that enables researchers and developers to quickly\nexperiment with various search engines (such as ElasticSearch, ChromaDB, Milvus, PrimeQA, FAISS)\nboth in direct search and reranking scenarios. By using DocUVerse, a researcher\ncan replicate the experiments outlined in a paper published at the latest NLP \nconference while also enjoying the capability to download pre-trained models \n(from an online repository) and run them on their own custom data. DocUVerse is built \non top of the [Transformers](https://github.com/huggingface/transformers), PrimeQA, and Elasticsearch toolkits and uses [datasets](https://huggingface.co/datasets/viewer/) and \n[models](https://huggingface.co/PrimeQA) that are directly \ndownloadable.\n\n## Design\n\nThe following code snippet shows how to ingest a new corpus (create an index for a specific engine), \nread the query file, run the search, score the results, and print the scores:\n```python\nfrom docuverse import SearchEngine\nengine = SearchEngine(config_or_path=\"data/clapnq_small/milvus-test.yaml\")\n\n# Read the ClapNQ dataset\ndata = engine.read_data() # or engine.read_data(engine.config.input_passages)\n# Ingest the data\nengine.ingest(data)\n\n# Read the queries\nqueries = engine.read_questions() # or engine.read_questions(engine.config.input_queries)\n# Run the retrieval\nresults = engine.search(queries)\n# Evaluate the results\nscores = engine.compute_score(queries, results)\n\n# Print the evaluation results in a human-readable 
format.\nprint(f\"Results:\\n{scores}\")\n```\n\n## ✔️ Getting Started\n\n### Installation\n[Installation doc](https://primeqa.github.io/primeqa/installation.html)       \n\n```shell\n# cd to project root\n\n# If you want to run on GPU make sure to install torch appropriately\n\n# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired\n# Example installation commands:\n\n# Minimal install (non-editable)\npip install .\n\n# Full install (editable)\npip install -e .[all]\n\n# Install milvus and/or elastic dependencies, and the pyizumo library (if you have access to it)\npip install -r requirements-milvus.txt\npip install -r requirements-elastic.txt\npip install -r requirements_extra.txt\n```\n\nPlease note that dependencies (specified in [setup.py](./setup.py)) are pinned to provide a stable experience.\nWhen installing from source, these can be modified; however, this is not officially supported.\n\n## 🔭 Learn more (not yet working)\n\n| Section                                                                                     | Description                                                |\n|---------------------------------------------------------------------------------------------|------------------------------------------------------------|\n| 📒 [Documentation](https://primeqa.github.io/primeqa)                                       | Start API documentation and tutorials                      |\n| 📓 [Tutorials: Jupyter Notebooks](https://github.com/primeqa/docuverse/tree/main/notebooks) | Notebooks to get started on QA tasks                       |\n| 🤗 [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing)    | Upload and share your fine-tuned models with the community |\n| ✅ [Pull Request](https://primeqa.github.io/docuverse/pull_request_template.html)            | PrimeQA Pull Request                                       |\n| 📄 [Generate 
Documentation](https://primeqa.github.io/primeqa/README.html)                  | How Documentation works                                    |        \n\n## ❤️ DocUVerse collaborators include: Sara Rosenthal, Parul Awasthy, Scott McCarley, Jatin Ganhotra, and Radu Florian.       \n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprimeqa%2Fdocuverse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprimeqa%2Fdocuverse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprimeqa%2Fdocuverse/lists"}