https://github.com/stanfordnlp/colbert-qa

Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21)

Host: GitHub
URL: https://github.com/stanfordnlp/colbert-qa
Owner: stanfordnlp
License: mit
Created: 2021-05-09T23:35:00.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2021-08-02T21:53:54.000Z (almost 4 years ago)
Last Synced: 2025-02-20T08:14:57.283Z (4 months ago)
Homepage:
Size: 10.7 KB
Stars: 41
Watchers: 14
Forks: 5
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

## ColBERT-QA: Relevance-guided Supervision for OpenQA (TACL'21)

### ColBERT-QA is a state-of-the-art system for answering open questions over a large corpus of text like Wikipedia.

ColBERT-QA extends the scalable ColBERT retriever for open-domain QA. It introduces Relevance-Guided Supervision (RGS), an iterative weak-supervision strategy that efficiently samples accurate positives and challenging negatives for training.

### Implementation

The system implementation lives as part of the parent [ColBERT](https://github.com/stanford-futuredata/ColBERT) repository. This repository will contain instructions focused on ColBERT-QA. For the general-purpose instructions, refer to the README of the parent repository.

After cloning, make sure you obtain the code for the submodule too:

```
git submodule update --init --recursive
```

We use [Anserini](https://github.com/castorini/anserini) for the BM25 implementation, which bootstraps the first round of RGS's weak supervision. The second and third rounds rely on ColBERT-QA itself. The default corpus is Karpukhin et al.'s [21M-passage Wikipedia 2018 dump](https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz). In addition to training, indexing, and retrieval, the following scripts in the parent repository are useful for the RGS process:

- utility/supervision/triples.py
- utility/evaluation/annotate_EM.py
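
To make the weak-supervision idea concrete, here is a minimal sketch of how one RGS round might turn a ranked retrieval list into training triples. It is an illustration under stated assumptions, not the code in `utility/supervision/triples.py`: the function names (`contains_answer`, `build_triples`), the depth parameters, and the substring-based answer match are all hypothetical simplifications. Positives are taken from highly ranked passages that contain a gold answer, and negatives from ranked passages that do not.

```python
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a simplified stand-in for answer normalization)."""
    return " ".join(text.lower().split())


def contains_answer(passage: str, answers: List[str]) -> bool:
    """Weak relevance label: True if any gold answer string occurs in the passage."""
    passage_norm = normalize(passage)
    return any(normalize(answer) in passage_norm for answer in answers)


def build_triples(
    question: str,
    answers: List[str],
    ranked_passages: List[str],
    positive_depth: int = 3,   # hypothetical cutoff: only trust matches near the top
    negative_depth: int = 10,  # hypothetical cutoff: draw negatives from a deeper pool
    max_negatives: int = 2,    # hypothetical cap on negatives per question
) -> List[Tuple[str, str, str]]:
    """Return (question, positive, negative) triples from one ranked list."""
    positives = [
        p for p in ranked_passages[:positive_depth] if contains_answer(p, answers)
    ]
    negatives = [
        p for p in ranked_passages[:negative_depth] if not contains_answer(p, answers)
    ][:max_negatives]
    return [(question, pos, neg) for pos in positives for neg in negatives]


# Toy ranked list, as a retriever (BM25 in round one, ColBERT-QA later) might return it.
question = "What is the capital of France?"
ranked = [
    "Paris is the capital of France.",
    "Lyon is a large city in France.",
    "The capital of France is Paris, on the Seine.",
    "Berlin is the capital of Germany.",
]
triples = build_triples(question, ["Paris"], ranked)
print(len(triples))  # 4: two positives paired with two negatives
```

In this sketch, the retriever trained on one round's triples would produce the ranked lists for the next round, which is how later rounds can surface harder negatives than BM25 alone.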
