https://github.com/primeqa/primeqa
The prime repository for state-of-the-art Multilingual Question Answering research and development.
ai bert dpr ibm ibm-research-ai language-model machine-learning natural-language-processing neural-information-retrieval neural-search nlp python pytorch question-answering semantic-search squad transfer-learning
- Host: GitHub
- URL: https://github.com/primeqa/primeqa
- Owner: primeqa
- License: apache-2.0
- Created: 2022-06-15T02:15:41.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-08T21:04:52.000Z (14 days ago)
- Last Synced: 2025-01-09T12:14:26.533Z (13 days ago)
- Topics: ai, bert, dpr, ibm, ibm-research-ai, language-model, machine-learning, natural-language-processing, neural-information-retrieval, neural-search, nlp, python, pytorch, question-answering, semantic-search, squad, transfer-learning
- Language: Python
- Homepage: https://primeqa.github.io/primeqa
- Size: 51 MB
- Stars: 730
- Watchers: 28
- Forks: 57
- Open Issues: 24
Metadata Files:
- Readme: README.md
- License: LICENSE
README
The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.
![Build Status](https://github.com/primeqa/primeqa/actions/workflows/primeqa-ci.yml/badge.svg)
[![LICENSE|Apache2.0](https://img.shields.io/github/license/saltstack/salt?color=blue)](https://www.apache.org/licenses/LICENSE-2.0.txt)
[![sphinx-doc-build](https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml/badge.svg)](https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml)

PrimeQA is a public open source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). With PrimeQA, a researcher can replicate the experiments outlined in a paper published at a recent NLP conference, download pre-trained models from an online repository, and run them on custom data. PrimeQA is built on top of the [Transformers](https://github.com/huggingface/transformers) toolkit and uses [datasets](https://huggingface.co/datasets/viewer/) and [models](https://huggingface.co/PrimeQA) that are directly downloadable.

The models within PrimeQA support end-to-end question answering. PrimeQA answers questions via:
- [Information Retrieval](https://github.com/primeqa/primeqa/tree/main/primeqa/ir): Retrieving documents and passages using both traditional (e.g. BM25) and neural (e.g. ColBERT) models
- [Multilingual Machine Reading Comprehension](https://huggingface.co/ibm/tydiqa-primary-task-xlm-roberta-large): Extracting and/or generating answers given the source document or passage.
- [Multilingual Question Generation](https://huggingface.co/PrimeQA/mt5-base-tydi-question-generator): Generating questions for effective domain adaptation over [tables](https://huggingface.co/PrimeQA/t5-base-table-question-generator) and [multilingual text](https://huggingface.co/PrimeQA/mt5-base-tydi-question-generator).
- [Retrieval Augmented Generation](https://github.com/primeqa/primeqa/blob/main/notebooks/retriever-reader-pipelines/prompt_reader_with_GPT.ipynb): Generating answers using the GPT-3/ChatGPT pretrained models, conditioned on retrieved passages.

Some examples of supported models (applicable to benchmark datasets) are:
- [Traditional IR with BM25](https://github.com/primeqa/primeqa/tree/main/primeqa/ir/) (Pyserini)
- [Neural IR with ColBERT, DPR](https://github.com/primeqa/primeqa/tree/main/primeqa/ir) (collaboration with [Stanford NLP](https://nlp.stanford.edu/) IR led by [Chris Potts](https://web.stanford.edu/~cgpotts/) & [Matei Zaharia](https://cs.stanford.edu/~matei/)). Replicates the experiments that [Dr. Decr](https://huggingface.co/ibm/DrDecr_XOR-TyDi_whitebox) (Li et al., 2022) performed to reach the top of the XOR-TyDi leaderboard.
- [Machine Reading Comprehension with XLM-R](https://github.com/primeqa/primeqa/tree/main/primeqa/mrc): replicates the experiments that reach the top of the TyDi leaderboard, with performance similar to that of the IBM GAAMA system. Coming soon: code to replicate GAAMA's performance on Natural Questions.

## Top of the Leaderboard
PrimeQA is at the top of several leaderboards: XOR-TyDi, TyDiQA-main, OTT-QA and HybridQA.
### [XOR-TyDi](https://nlp.cs.washington.edu/xorqa/)
### [TyDiQA-main](https://ai.google.com/research/tydiqa)
### [OTT-QA](https://codalab.lisn.upsaclay.fr/competitions/7967)
### [HybridQA](https://codalab.lisn.upsaclay.fr/competitions/7979)
## Getting Started
### Installation
[Installation doc](https://primeqa.github.io/primeqa/installation.html)

```shell
# cd to project root

# If you want to run on GPU make sure to install torch appropriately
# E.g. for torch 1.11 + CUDA 11.3:
pip install 'torch~=1.11.0' --extra-index-url https://download.pytorch.org/whl/cu113

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# GPU support
pip install .[gpu]

# Full install (editable)
pip install -e .[all]
```

Please note that dependencies (specified in [setup.py](./setup.py)) are pinned to provide a stable experience.
When installing from source these can be modified; however, this is not officially supported.

**Note:** In many environments, conda-forge based faiss libraries perform substantially better than the default ones installed with pip. To install faiss libraries from conda-forge, use the following steps:
- Create and activate a conda environment
- Install the faiss libraries with the command `conda install -c conda-forge faiss=1.7.0 faiss-gpu=1.7.0`
- In `setup.py`, remove the faiss-related lines:
```commandline
"faiss-cpu~=1.7.2": ["install", "gpu"],
"faiss-gpu~=1.7.2": ["gpu"],
```
- Continue with the `pip install` commands as described above.
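After installing, it can help to confirm which of the packages mentioned above are actually importable and whether a GPU is visible. The following is a minimal sketch, not part of PrimeQA: the helper names `probe_install` and `gpu_summary` are hypothetical, and the import name `primeqa` is assumed from the repository layout.

```python
import importlib
import importlib.util


def probe_install(packages=("torch", "faiss", "primeqa")):
    """Return {package: version string, or None if it cannot be imported}."""
    report = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(name)
        except Exception:  # a broken install can fail at import time
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


def gpu_summary():
    """Report GPU visibility for torch and faiss when they are installed."""
    summary = {}
    try:
        import torch
        summary["torch_cuda_available"] = torch.cuda.is_available()
    except (ImportError, OSError):
        summary["torch_cuda_available"] = None
    try:
        import faiss
        # get_num_gpus may be missing from some CPU-only builds
        get_num_gpus = getattr(faiss, "get_num_gpus", None)
        summary["faiss_gpus"] = get_num_gpus() if get_num_gpus else 0
    except (ImportError, OSError):
        summary["faiss_gpus"] = None
    return summary


if __name__ == "__main__":
    for package, version in probe_install().items():
        print(f"{package}: {version or 'not installed'}")
    print(gpu_summary())
```

A value of `None` means the package was not found or failed to import, which usually points back to the install step for that package.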
### Java requirements
Java 11 is required for BM25 retrieval. Install Java as follows:

```shell
conda install -c conda-forge openjdk=11
```
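To check which Java version is on the `PATH` before running BM25 retrieval, a small script along these lines may be useful. It is an illustrative sketch rather than PrimeQA code: the function names are hypothetical, and the parsing assumes the usual `java -version` banner format.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (e.g. "1.8.0_292")
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Return the major version of the `java` on PATH, or None if absent."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its banner to stderr, not stdout
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("No usable `java` found on PATH")
    else:
        print(f"Java major version: {major}")
```

The reported major version can then be compared against the Java 11 requirement stated above.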
## :speech_balloon: Blog Posts
Members of the open source community have written several blog posts on how they use PrimeQA. Read some of them:
1. [PrimeQA and GPT 3](https://www.marktechpost.com/2023/03/03/with-just-20-lines-of-python-code-you-can-do-retrieval-augmented-gpt-based-qa-using-this-open-source-repository-called-primeqa/)
2. [Enterprise search with PrimeQA](https://heidloff.net/article/introduction-neural-information-retrieval/)
3. [A search engine for Trivia geeks](https://www.deleeuw.me.uk/posts/Using-PrimeQA-For-NLP-Question-Answering/)

## Unit Tests
[Testing doc](https://primeqa.github.io/primeqa/testing.html)

To run the unit tests you first need to [install PrimeQA](#Installation).
Make sure to install with the `[tests]` or `[all]` extras from pip.

From there you can run the tests via pytest, for example:
```shell
pytest --cov PrimeQA --cov-config .coveragerc tests/
```

For more information, see:
- Our [tox.ini](./tox.ini)
- The [pytest](https://docs.pytest.org) and [tox](https://tox.wiki/en/latest/) documentation

## Learn more
| Section | Description |
|-|-|
| [Documentation](https://primeqa.github.io/primeqa) | Full API documentation and tutorials |
| [Quick tour: Entry Points for PrimeQA](https://github.com/primeqa/primeqa/tree/main/primeqa) | Different entry points for PrimeQA: Information Retrieval, Reading Comprehension, TableQA and Question Generation |
| [Tutorials: Jupyter Notebooks](https://github.com/primeqa/primeqa/tree/main/notebooks) | Notebooks to get started on QA tasks |
| [GPT-3/ChatGPT Reader Notebooks](https://github.com/primeqa/primeqa/tree/main/notebooks/mrc/LLM_reader_predict_mode.ipynb) | Notebooks to get started with the GPT-3/ChatGPT reader components |
| [Examples: Applying PrimeQA on various QA tasks](https://github.com/primeqa/primeqa/tree/main/examples) | Example scripts for fine-tuning PrimeQA models on a range of QA tasks |
| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
| [Pull Request](https://primeqa.github.io/primeqa/pull_request_template.html) | PrimeQA Pull Request |
| [Generate Documentation](https://primeqa.github.io/primeqa/README.html) | How Documentation works |
| [Orchestrator Service REST Microservice](https://primeqa.github.io/primeqa/orchestrator.html) | Proof-of-concept code for PrimeQA Orchestrator microservice |
| [Tooling UI](https://primeqa.github.io/primeqa/tooling_ui.html) | Demo UI |

## PrimeQA collaborators include
- Stanford NLP
- University of Illinois
- University of Stuttgart
- University of Notre Dame
- Ohio State University
- Carnegie Mellon University
- University of Massachusetts
- IBM Research