https://github.com/primeqa/primeqa

The prime repository for state-of-the-art Multilingual Question Answering research and development.

Host: GitHub
URL: https://github.com/primeqa/primeqa
Owner: primeqa
License: apache-2.0
Created: 2022-06-15T02:15:41.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2025-01-08T21:04:52.000Z (5 months ago)
Last Synced: 2025-05-01T00:42:14.101Z (about 2 months ago)
Topics: ai, bert, dpr, ibm, ibm-research-ai, language-model, machine-learning, natural-language-processing, neural-information-retrieval, neural-search, nlp, python, pytorch, question-answering, semantic-search, squad, transfer-learning
Language: Python
Homepage: https://primeqa.github.io/primeqa
Size: 51 MB
Stars: 732
Watchers: 28
Forks: 57
Open Issues: 24
Metadata Files:
- Readme: README.md
- License: LICENSE

The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.


![Build Status](https://github.com/primeqa/primeqa/actions/workflows/primeqa-ci.yml/badge.svg)

[![LICENSE|Apache2.0](https://img.shields.io/github/license/saltstack/salt?color=blue)](https://www.apache.org/licenses/LICENSE-2.0.txt)

[![sphinx-doc-build](https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml/badge.svg)](https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml)   

PrimeQA is a public, open-source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). With PrimeQA, a researcher can replicate the experiments from papers published at recent NLP conferences, download pre-trained models from an online repository, and run them on custom data. PrimeQA is built on top of the [Transformers](https://github.com/huggingface/transformers) toolkit and uses [datasets](https://huggingface.co/datasets/viewer/) and [models](https://huggingface.co/PrimeQA) that are directly downloadable.

The models in PrimeQA support end-to-end question answering. PrimeQA answers questions via:

- [Information Retrieval](https://github.com/primeqa/primeqa/tree/main/primeqa/ir): Retrieving documents and passages using both traditional (e.g. BM25) and neural (e.g. ColBERT) models

- [Multilingual Machine Reading Comprehension](https://huggingface.co/ibm/tydiqa-primary-task-xlm-roberta-large): Extracting and/or generating answers given the source document or passage.

- [Multilingual Question Generation](https://huggingface.co/PrimeQA/mt5-base-tydi-question-generator): Generating questions for effective domain adaptation over [tables](https://huggingface.co/PrimeQA/t5-base-table-question-generator) and [multilingual text](https://huggingface.co/PrimeQA/mt5-base-tydi-question-generator).

- [Retrieval Augmented Generation](https://github.com/primeqa/primeqa/blob/main/notebooks/retriever-reader-pipelines/prompt_reader_with_GPT.ipynb): Generating answers using the GPT-3/ChatGPT pretrained models, conditioned on retrieved passages.
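
To make the retrieve-then-read flow above concrete, here is a minimal, self-contained sketch in plain Python. It does not use the PrimeQA API. `retrieve` is a hypothetical word-overlap ranker standing in for a real retriever such as BM25 or ColBERT, and `build_prompt` is a hypothetical helper showing one way retrieved passages could condition a generative reader. The prompt wording and the toy passages are assumptions made for illustration.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (tok.strip(".,?!") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def retrieve(question: str, passages: List[str], top_k: int = 2) -> List[str]:
    """Hypothetical retriever: rank passages by word overlap with the question."""
    q_terms = set(tokenize(question))
    scored = [(len(q_terms & set(tokenize(p))), i) for i, p in enumerate(passages)]
    # Highest overlap first; ties keep the original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [passages[i] for _, i in scored[:top_k]]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Hypothetical prompt format that conditions a generator on the passages."""
    numbered = "\n".join(f"[{n}] {c}" for n, c in enumerate(contexts, start=1))
    return (
        "Answer the question using the passages.\n"
        f"{numbered}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Toy corpus (illustrative only).
passages = [
    "BM25 is a traditional ranking function based on term statistics.",
    "ColBERT is a neural retrieval model with late interaction.",
    "Java 11 is required for BM25 retrieval.",
]
question = "What kind of model is ColBERT?"
contexts = retrieve(question, passages, top_k=1)
prompt = build_prompt(question, contexts)
print(prompt)
```

In a real pipeline the prompt would be sent to a generative model. The linked notebook shows how PrimeQA itself wires a retriever to a GPT-based reader.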

Examples of supported models (applicable to benchmark datasets):

- [Traditional IR with BM25](https://github.com/primeqa/primeqa/tree/main/primeqa/ir/) using Pyserini

- [Neural IR with ColBERT, DPR](https://github.com/primeqa/primeqa/tree/main/primeqa/ir) (collaboration with [Stanford NLP](https://nlp.stanford.edu/) IR led by [Chris Potts](https://web.stanford.edu/~cgpotts/) & [Matei Zaharia](https://cs.stanford.edu/~matei/)): replicating the experiments that [Dr. Decr](https://huggingface.co/ibm/DrDecr_XOR-TyDi_whitebox) (Li et al., 2022) performed to reach the top of the XOR-TyDi leaderboard.

- [Machine Reading Comprehension with XLM-R](https://github.com/primeqa/primeqa/tree/main/primeqa/mrc): replicating the experiments that reached the top of the TyDi leaderboard, with performance similar to the IBM GAAMA system. Coming soon: code to replicate GAAMA's performance on Natural Questions.
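
For readers new to traditional IR, the following is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the Pyserini or PrimeQA implementation. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed here for simplicity).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (illustrative only), pre-tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "question answering over passages".split(),
]
query = "question answering".split()
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

Neural retrievers such as ColBERT and DPR replace this term-matching score with similarity between learned embeddings, which is why they can match passages that share no words with the question.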

## 🏅 Top of the Leaderboard

PrimeQA is at the top of several leaderboards: XOR-TyDi, TyDiQA-main, OTT-QA and HybridQA.

### [XOR-TyDi](https://nlp.cs.washington.edu/xorqa/)



### [TyDiQA-main](https://ai.google.com/research/tydiqa)



### [OTT-QA](https://codalab.lisn.upsaclay.fr/competitions/7967)



### [HybridQA](https://codalab.lisn.upsaclay.fr/competitions/7979)



## ✔️ Getting Started

### Installation

[Installation doc](https://primeqa.github.io/primeqa/installation.html)       

```shell
# cd to project root

# If you want to run on GPU make sure to install torch appropriately
# E.g. for torch 1.11 + CUDA 11.3:
pip install 'torch~=1.11.0' --extra-index-url https://download.pytorch.org/whl/cu113

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Extras are quoted so that shells such as zsh do not expand the brackets
# Example installation commands:

# Minimal install (non-editable)
pip install .

# GPU support
pip install '.[gpu]'

# Full install (editable)
pip install -e '.[all]'
```

Please note that dependencies (specified in [setup.py](./setup.py)) are pinned to provide a stable experience.

When installing from source, these can be modified, but this is not officially supported.

**Note:** In many environments, the conda-forge faiss libraries perform substantially better than the default ones installed with pip. To install the faiss libraries from conda-forge, use the following steps:

- Create and activate a conda environment

- Install the faiss libraries with the following command:

```conda install -c conda-forge faiss=1.7.0 faiss-gpu=1.7.0```

- In `setup.py`, remove the faiss-related lines:

```commandline
"faiss-cpu~=1.7.2": ["install", "gpu"],
"faiss-gpu~=1.7.2": ["gpu"],
```

- Continue with the `pip install` commands as described above.

### Java requirements

Java 11 is required for BM25 retrieval. Install Java as follows:

```shell
conda install -c conda-forge openjdk=11
```
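
To confirm which Java version is active before running BM25 retrieval, a small check like the one below may help. This is a sketch and not part of PrimeQA: the helper names `parse_java_major` and `installed_java_major` are hypothetical, and the parsing assumes the usual `java -version` banner format (for example `openjdk version "11.0.19"`).

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 and earlier report themselves as "1.8.0_xxx".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` found on PATH, or None if absent."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    print("Java major version:", installed_java_major())
```

If the reported major version is not 11, activating the conda environment where `openjdk=11` was installed should put the right `java` on the path.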

## 💬 Blog Posts

Members of the open source community have written several blog posts about how they use PrimeQA. Here are some of them:

1. [PrimeQA and GPT 3](https://www.marktechpost.com/2023/03/03/with-just-20-lines-of-python-code-you-can-do-retrieval-augmented-gpt-based-qa-using-this-open-source-repository-called-primeqa/)

2. [Enterprise search with PrimeQA](https://heidloff.net/article/introduction-neural-information-retrieval/)

3. [A search engine for Trivia geeks](https://www.deleeuw.me.uk/posts/Using-PrimeQA-For-NLP-Question-Answering/)

## 🧪 Unit Tests

[Testing doc](https://primeqa.github.io/primeqa/testing.html)       

To run the unit tests you first need to [install PrimeQA](#Installation).

Make sure to install with the `[tests]` or `[all]` extras from pip.

From there you can run the tests via pytest, for example:

```shell
# --cov takes the package name, which is the lowercase `primeqa` directory
pytest --cov primeqa --cov-config .coveragerc tests/
```

For more information, see:

- Our [tox.ini](./tox.ini)

- The [pytest](https://docs.pytest.org) and [tox](https://tox.wiki/en/latest/) documentation    

## 🔭 Learn more

| Section | Description |
|-|-|
| 📒 [Documentation](https://primeqa.github.io/primeqa) | Full API documentation and tutorials |
| 🏁 [Quick tour: Entry Points for PrimeQA](https://github.com/primeqa/primeqa/tree/main/primeqa) | Different entry points for PrimeQA: Information Retrieval, Reading Comprehension, TableQA and Question Generation |
| 📓 [Tutorials: Jupyter Notebooks](https://github.com/primeqa/primeqa/tree/main/notebooks) | Notebooks to get started on QA tasks |
| 📓 [GPT-3/ChatGPT Reader Notebooks](https://github.com/primeqa/primeqa/tree/main/notebooks/mrc/LLM_reader_predict_mode.ipynb) | Notebooks to get started with the GPT-3/ChatGPT reader components |
| 💻 [Examples: Applying PrimeQA on various QA tasks](https://github.com/primeqa/primeqa/tree/main/examples) | Example scripts for fine-tuning PrimeQA models on a range of QA tasks |
| 🤗 [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
| ✅ [Pull Request](https://primeqa.github.io/primeqa/pull_request_template.html) | PrimeQA Pull Request |
| 📄 [Generate Documentation](https://primeqa.github.io/primeqa/README.html) | How Documentation works |
| 🛠 [Orchestrator Service REST Microservice](https://primeqa.github.io/primeqa/orchestrator.html) | Proof-of-concept code for PrimeQA Orchestrator microservice |
| 📖 [Tooling UI](https://primeqa.github.io/primeqa/tooling_ui.html) | Demo UI |

## ❤️ PrimeQA collaborators include       

| | |
|:-------------------------:|:-------------------------:|
| Stanford NLP | University of Illinois |
| University of Stuttgart | University of Notre Dame |
| Ohio State University | Carnegie Mellon University |
| University of Massachusetts | IBM Research |
