Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/primeqa/primeqa

The prime repository for state-of-the-art Multilingual Question Answering research and development.
https://github.com/primeqa/primeqa

ai bert dpr ibm ibm-research-ai language-model machine-learning natural-language-processing neural-information-retrieval neural-search nlp python pytorch question-answering semantic-search squad transfer-learning

Last synced: 6 days ago
JSON representation

The prime repository for state-of-the-art Multilingual Question Answering research and development.

Awesome Lists containing this project

README

        


primeqa

The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.

![Build Status](https://github.com/primeqa/primeqa/actions/workflows/primeqa-ci.yml/badge.svg)
[![LICENSE|Apache2.0](https://img.shields.io/github/license/saltstack/salt?color=blue)](https://www.apache.org/licenses/LICENSE-2.0.txt)
[![sphinx-doc-build](https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml/badge.svg)](https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml)

PrimeQA is a public open source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). By using PrimeQA, a researcher can replicate the experiments outlined in a paper published in the latest NLP conference while also enjoying the capability to download pre-trained models (from an online repository) and run them on their own custom data. PrimeQA is built on top of the [Transformers](https://github.com/huggingface/transformers) toolkit and uses [datasets](https://huggingface.co/datasets/viewer/) and [models](https://huggingface.co/PrimeQA) that are directly downloadable.

The models within PrimeQA supports End-to-end Question Answering. PrimeQA answers questions via
- [Information Retrieval](https://github.com/primeqa/primeqa/tree/main/primeqa/ir): Retrieving documents and passages using both traditional (e.g. BM25) and neural (e.g. ColBERT) models
- [Multilingual Machine Reading Comprehension](https://huggingface.co/ibm/tydiqa-primary-task-xlm-roberta-large): Extract and/ or generate answers given the source document or passage.
- [Multilingual Question Generation](https://huggingface.co/PrimeQA/mt5-base-tydi-question-generator): Supports generation of questions for effective domain adaptation over [tables](https://huggingface.co/PrimeQA/t5-base-table-question-generator) and [multilingual text](https://huggingface.co/PrimeQA/mt5-base-tydi-question-generator).
- [Retrieval Augmented Generation](https://github.com/primeqa/primeqa/blob/main/notebooks/retriever-reader-pipelines/prompt_reader_with_GPT.ipynb): Generate answers using the GPT-3/ChatGPT pretrained models, conditioned on retrieved passages.

Some examples of models (applicable on benchmark datasets) supported are :
- [Traditional IR with BM25](https://github.com/primeqa/primeqa/tree/main/primeqa/ir/) Pyserini
- [Neural IR with ColBERT, DPR](https://github.com/primeqa/primeqa/tree/main/primeqa/ir) (collaboration with [Stanford NLP](https://nlp.stanford.edu/) IR led by [Chris Potts](https://web.stanford.edu/~cgpotts/) & [Matei Zaharia](https://cs.stanford.edu/~matei/)).
Replicating the experiments that [Dr. Decr](https://huggingface.co/ibm/DrDecr_XOR-TyDi_whitebox) (Li et. al, 2022) performed to reach the top of the XOR TyDI leaderboard.
- [Machine Reading Comprehension with XLM-R](https://github.com/primeqa/primeqa/tree/main/primeqa/mrc): to replicate the experiments to get to the top of the TyDI leaderboard similar to the performance of the IBM GAAMA system. Coming soon: code to replicate GAAMA's performance on Natural Questions.

## ๐Ÿ… Top of the Leaderboard

PrimeQA is at the top of several leaderboards: XOR-TyDi, TyDiQA-main, OTT-QA and HybridQA.

### [XOR-TyDi](https://nlp.cs.washington.edu/xorqa/)

### [TyDiQA-main](https://ai.google.com/research/tydiqa)

### [OTT-QA](https://codalab.lisn.upsaclay.fr/competitions/7967)

### [HybridQA](https://codalab.lisn.upsaclay.fr/competitions/7979)

## โœ”๏ธ Getting Started

### Installation
[Installation doc](https://primeqa.github.io/primeqa/installation.html)

```shell
# cd to project root

# If you want to run on GPU make sure to install torch appropriately

# E.g. for torch 1.11 + CUDA 11.3:
pip install 'torch~=1.11.0' --extra-index-url https://download.pytorch.org/whl/cu113

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# GPU support
pip install .[gpu]

# Full install (editable)
pip install -e .[all]
```

Please note that dependencies (specified in [setup.py](./setup.py)) are pinned to provide a stable experience.
When installing from source these can be modified, however this is not officially supported.

**Note:** in many environments, conda-forge based faiss libraries perform substantially better than the default ones installed with pip. To install faiss libraries from conda-forge, use the following steps:

- Create and activate a conda environment
- Install faiss libraries, using a command

```conda install -c conda-forge faiss=1.7.0 faiss-gpu=1.7.0```

- In `setup.py`, remove the faiss-related lines:

```commandline
"faiss-cpu~=1.7.2": ["install", "gpu"],
"faiss-gpu~=1.7.2": ["gpu"],
```

- Continue with the `pip install` commands as desctibed above.

### JAVA requirements
Java 11 is required for BM25 retrieval. Install java as follows:

```shell
conda install -c conda-forge openjdk=11
```
## :speech_balloon: Blog Posts
There're several blog posts by members of the open source community on how they've been using PrimeQA for their needs. Read some of them:
1. [PrimeQA and GPT 3](https://www.marktechpost.com/2023/03/03/with-just-20-lines-of-python-code-you-can-do-retrieval-augmented-gpt-based-qa-using-this-open-source-repository-called-primeqa/)
2. [Enterprise search with PrimeQA](https://heidloff.net/article/introduction-neural-information-retrieval/)
3. [A search engine for Trivia geeks](https://www.deleeuw.me.uk/posts/Using-PrimeQA-For-NLP-Question-Answering/)

## ๐Ÿงช Unit Tests
[Testing doc](https://primeqa.github.io/primeqa/testing.html)

To run the unit tests you first need to [install PrimeQA](#Installation).
Make sure to install with the `[tests]` or `[all]` extras from pip.

From there you can run the tests via pytest, for example:
```shell
pytest --cov PrimeQA --cov-config .coveragerc tests/
```

For more information, see:
- Our [tox.ini](./tox.ini)
- The [pytest](https://docs.pytest.org) and [tox](https://tox.wiki/en/latest/) documentation

## ๐Ÿ”ญ Learn more

| Section | Description |
|-|-|
| ๐Ÿ“’ [Documentation](https://primeqa.github.io/primeqa) | Full API documentation and tutorials |
| ๐Ÿ [Quick tour: Entry Points for PrimeQA](https://github.com/primeqa/primeqa/tree/main/primeqa) | Different entry points for PrimeQA: Information Retrieval, Reading Comprehension, TableQA and Question Generation |
| ๐Ÿ““ [Tutorials: Jupyter Notebooks](https://github.com/primeqa/primeqa/tree/main/notebooks) | Notebooks to get started on QA tasks |
| ๐Ÿ““ [GPT-3/ChatGPT Reader Notebooks](https://github.com/primeqa/primeqa/tree/main/notebooks/mrc/LLM_reader_predict_mode.ipynb) | Notebooks to get started with the GPT-3/ChatGPT reader components|
| ๐Ÿ’ป [Examples: Applying PrimeQA on various QA tasks](https://github.com/primeqa/primeqa/tree/main/examples) | Example scripts for fine-tuning PrimeQA models on a range of QA tasks |
| ๐Ÿค— [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
| โœ… [Pull Request](https://primeqa.github.io/primeqa/pull_request_template.html) | PrimeQA Pull Request |
| ๐Ÿ“„ [Generate Documentation](https://primeqa.github.io/primeqa/README.html) | How Documentation works |
| ๐Ÿ›  [Orchestrator Service REST Microservice](https://primeqa.github.io/primeqa/orchestrator.html) | Proof-of-concept code for PrimeQA Orchestrator microservice |
| ๐Ÿ“– [Tooling UI](https://primeqa.github.io/primeqa/tooling_ui.html) | Demo UI |

## โค๏ธ PrimeQA collaborators include

| | | | |
|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|
|stanford| Stanford NLP |i| University of Illinois |
|stuttgart| University of Stuttgart | notredame| University of Notre Dame |
|ohio| Ohio State University |carnegie| Carnegie Mellon University |
|massachusetts| University of Massachusetts |ibm| IBM Research |
| | | | |










primeqa