https://github.com/stanfordnlp/colbert-qa
Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21)
- Host: GitHub
- URL: https://github.com/stanfordnlp/colbert-qa
- Owner: stanfordnlp
- License: mit
- Created: 2021-05-09T23:35:00.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-08-02T21:53:54.000Z (almost 4 years ago)
- Last Synced: 2025-02-20T08:14:57.283Z (4 months ago)
- Homepage:
- Size: 10.7 KB
- Stars: 41
- Watchers: 14
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
## ColBERT-QA: Relevance-guided Supervision for OpenQA (TACL'21)
### ColBERT-QA is a state-of-the-art system for answering open questions over a large corpus of text like Wikipedia.
ColBERT-QA extends the scalable ColBERT retriever for open-domain QA. It introduces Relevance-Guided Supervision (RGS), an iterative weak-supervision strategy that efficiently samples accurate positives and challenging negatives for training.
### Implementation
The system implementation lives as part of the parent [ColBERT](https://github.com/stanford-futuredata/ColBERT) repository. This repository will contain instructions focused on ColBERT-QA. For the general-purpose instructions, refer to the README of the parent repository.
After cloning, make sure you obtain the code for the submodule too:
```
git submodule update --init --recursive
```

We use [Anserini](https://github.com/castorini/anserini) for the BM25 implementation, which is used to initiate the first round of RGS's weak supervision. The second and third rounds rely on ColBERT-QA itself. The default corpus is Karpukhin et al.'s [21M-passage Wikipedia 2018 dump](https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz).

In addition to training, indexing, and retrieval, the following scripts in the parent repository are useful for the RGS process:
- utility/supervision/triples.py
- utility/evaluation/annotate_EM.py
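
To make one round of RGS more concrete, the sketch below shows one possible reading of the idea: passages retrieved for a question are weakly labeled by whether they contain a gold answer string, highly ranked matches become positives, and retrieved non-matches become negatives. This is an illustration only, not the code in `utility/supervision/triples.py` or `utility/evaluation/annotate_EM.py`. The function names (`normalize`, `contains_answer`, `build_triples`), the cutoffs `pos_depth` and `neg_depth`, and the normalization rules are all assumptions made for this example.

```python
import random
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def contains_answer(passage, answers):
    """Weak relevance label: True if any normalized answer occurs as a
    whole-word span in the normalized passage (an assumed heuristic)."""
    padded = f" {normalize(passage)} "
    return any(f" {normalize(a)} " in padded for a in answers if normalize(a))


def build_triples(question, ranked_passages, answers,
                  pos_depth=3, neg_depth=100, seed=0):
    """Build (question, positive, negative) triples from one ranked list.

    ranked_passages: passage texts, best-ranked first, as returned by the
    current retriever (BM25 in the first round, a trained model afterwards).
    pos_depth / neg_depth: hypothetical cutoffs chosen for illustration.
    """
    rng = random.Random(seed)
    window = ranked_passages[:neg_depth]
    labels = [contains_answer(p, answers) for p in window]

    # Positives: answer-bearing passages near the top of the ranking.
    positives = [p for p, hit in zip(window[:pos_depth], labels) if hit]
    # Negatives: retrieved passages that do not contain any answer string.
    negatives = [p for p, hit in zip(window, labels) if not hit]

    if not positives or not negatives:
        return []
    return [(question, pos, rng.choice(negatives)) for pos in positives]
```

Under these assumptions, each round would retrieve with the current model, call something like `build_triples` per question, train on the resulting triples, and then use the newly trained retriever as the source of rankings for the next round.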