# extremely-efficient-query-encoder

Efficient query encoding for dense retrieval.

https://github.com/amzn/extremely-efficient-query-encoder

extremely-efficient-online-query-encoding-for-dense-retrieval is an extension of the popular
[Tevatron package](https://github.com/texttron/tevatron/)
([commit](https://github.com/texttron/tevatron/commit/b8f33900895930f9886012580e85464a5c1f7e9a)),
adding the ability to pair a small query encoder with a large passage encoder, achieving very low
online query encoding time with only a minor impact on retrieval quality.
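
The idea is a dual-encoder retriever whose two sides are deliberately asymmetric. The sketch below
illustrates it, assuming CLS-token pooling and dot-product scoring as in Tevatron-style retrievers;
`path/to/small-query-encoder` is a placeholder for the distilled query encoder produced by the steps
below, not a released checkpoint.

```python
# Minimal sketch of the asymmetric dual-encoder idea: a small query encoder paired
# with a large passage encoder, both mapping text into the same embedding space.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Luyu/co-condenser-marco")
passage_encoder = AutoModel.from_pretrained("Luyu/co-condenser-marco")     # large, run offline over the corpus
query_encoder = AutoModel.from_pretrained("path/to/small-query-encoder")   # small, run online per query (placeholder)

def encode(encoder, texts):
    # CLS-token pooling, as commonly used in Tevatron-style dense retrievers.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    return hidden[:, 0]                      # [batch_size, hidden_dim]

q_emb = encode(query_encoder, ["what is dense retrieval"])
p_emb = encode(passage_encoder, ["Dense retrieval represents queries and passages as vectors ..."])
scores = q_emb @ p_emb.T                     # dot-product relevance scores
```

Because passage embeddings are computed offline, only the query side sits on the online latency path,
which is why shrinking the query encoder reduces encoding time.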

## Instructions

1. Create (teacher) embeddings for all queries in the train set using:
```bash
python -m tevatron.driver.encode --output_dir /tmp/encode --model_name_or_path Luyu/co-condenser-marco --fp16 \
--per_device_eval_batch_size 128 \
--encode_in_path ../resources/pretrain_data/train_queries_tokens.jsonl \
--encoded_save_path ../resources/pretrain_data/train_queries.pt
```
2. Run pretraining using `python -m run_pretraining.pretrain` (a sketch of the distillation idea follows this list)
3. Run training using `marco_train_pretrained_model.sh`
4. Evaluate using `full_eval.sh`
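
As a rough illustration of what step 2 does, the following sketch trains a small student query encoder
to reproduce the teacher query embeddings created in step 1. The student checkpoint
(`prajjwal1/bert-tiny`), the linear projection to the teacher's dimension, and the plain MSE objective
are assumptions made for the example; the actual recipe lives in `run_pretraining.pretrain`, and the
on-disk format of `train_queries.pt` is not shown here.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

TEACHER_DIM = 768  # hidden size of Luyu/co-condenser-marco, the teacher from step 1

# Assumed student: any small transformer encoder; bert-tiny is only an example.
tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")
student = AutoModel.from_pretrained("prajjwal1/bert-tiny")
project = nn.Linear(student.config.hidden_size, TEACHER_DIM)  # map student dim up to the teacher dim
optimizer = torch.optim.AdamW(
    list(student.parameters()) + list(project.parameters()), lr=1e-4
)

def pretrain_step(query_texts, teacher_embeddings):
    """One distillation step: push student query embeddings toward the teacher's.

    teacher_embeddings: [batch, TEACHER_DIM] tensor taken from train_queries.pt
    (the exact on-disk format is an assumption here).
    """
    batch = tokenizer(query_texts, padding=True, truncation=True, return_tensors="pt")
    student_cls = student(**batch).last_hidden_state[:, 0]   # CLS pooling
    student_emb = project(student_cls)
    loss = nn.functional.mse_loss(student_emb, teacher_embeddings)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After pretraining, the small encoder stands in for the large query encoder at inference time, which is
where the encoding-time savings come from.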

## Citations

```bibtex
@inproceedings{cohen2024indi,
  title={Extremely efficient online query encoding for dense retrieval},
  author={Cohen, Nachshon and Fairstein, Yaron and Kushilevitz, Guy},
  booktitle={Findings of the Association for Computational Linguistics: NAACL 2024},
  year={2024}
}
```