https://github.com/amzn/extremely-efficient-query-encoder
efficient query encoding for dense retrieval
- Host: GitHub
- URL: https://github.com/amzn/extremely-efficient-query-encoder
- Owner: amzn
- License: apache-2.0
- Created: 2024-03-18T11:21:08.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-05T23:43:00.000Z (10 months ago)
- Last Synced: 2025-04-30T14:26:21.579Z (about 1 month ago)
- Topics: dense-retrieval, information-retrieval, query-encoding
- Language: Python
- Homepage: https://www.amazon.science/publications/extremely-efficient-online-query-encoding-for-dense-retrieval
- Size: 68.4 KB
- Stars: 11
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
extremely-efficient-online-query-encoding-for-dense-retrieval is an extension of the popular
[Tevatron package](https://github.com/texttron/tevatron/)
([commit](https://github.com/texttron/tevatron/commit/b8f33900895930f9886012580e85464a5c1f7e9a)).
It adds the ability to pair a small query encoder with a large passage encoder, giving very low query encoding time
with only a minor impact on quality.

## Instructions
1. Create (teacher) embeddings for all queries in the training set:
```bash
python -m tevatron.driver.encode --output_dir /tmp/encode --model_name_or_path Luyu/co-condenser-marco --fp16 \
--per_device_eval_batch_size 128 \
--encode_in_path ../resources/pretrain_data/train_queries_tokens.jsonl \
--encoded_save_path ../resources/pretrain_data/train_queries.pt
```
2. Run pretraining using `python -m run_pretraining.pretrain`
3. Run training using `marco_train_pretrained_model.sh`
4. Evaluate using `full_eval.sh`

## Citations
```
@inproceedings{cohen2024indi,
  title={Extremely efficient online query encoding for dense retrieval},
  author={Cohen, Nachshon and Fairstein, Yaron and Kushilevitz, Guy},
  booktitle={Findings of the Association for Computational Linguistics: NAACL 2024},
  year={2024}
}
```
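
## Illustrative sketch of steps 1 and 2

Steps 1 and 2 of the instructions first store teacher embeddings for the training queries and then pretrain the small query encoder. The sketch below is one plausible reading of that idea, not the repository's implementation: it assumes the small encoder is trained to reproduce the stored teacher embeddings under a mean-squared-error loss, and it replaces both encoders with toy linear maps over hand-made feature vectors. Every name in it (`linear_encode`, `distill`, `rank_passages`) is hypothetical, and the loss actually used by `run_pretraining.pretrain` may differ.

```python
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_encode(weights, features):
    """Toy encoder: one linear layer, one weight row per output dimension."""
    return [dot(row, features) for row in weights]


def mse(pred, target):
    """Mean squared error between two embeddings."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distill(features, teacher_embs, dim_out, epochs=200, lr=0.1, seed=0):
    """Fit a toy student so its outputs match precomputed teacher embeddings.

    Assumption: plain per-example gradient descent on the MSE loss.
    """
    rng = random.Random(seed)
    dim_in = len(features[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)]
               for _ in range(dim_out)]
    for _ in range(epochs):
        for x, target in zip(features, teacher_embs):
            pred = linear_encode(weights, x)
            for i in range(dim_out):
                # Gradient of the MSE with respect to output dimension i.
                grad_scale = 2.0 * (pred[i] - target[i]) / dim_out
                for j in range(dim_in):
                    weights[i][j] -= lr * grad_scale * x[j]
    return weights


def rank_passages(query_emb, passage_embs):
    """Return passage indices sorted by descending dot-product score."""
    return sorted(range(len(passage_embs)),
                  key=lambda i: dot(query_emb, passage_embs[i]),
                  reverse=True)


if __name__ == "__main__":
    # Toy query features and a toy linear stand-in for the teacher encoder.
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    teacher_map = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
    teacher_embs = [linear_encode(teacher_map, x) for x in features]

    student = distill(features, teacher_embs, dim_out=2)

    # Passage embeddings stay fixed; only the query side is replaced.
    passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    student_query = linear_encode(student, features[0])
    print("loss:", mse(student_query, teacher_embs[0]))
    print("teacher ranking:", rank_passages(teacher_embs[0], passages))
    print("student ranking:", rank_passages(student_query, passages))
```

The passage embeddings are never touched here. If the student lands close enough to the teacher in embedding space, an index built with the large passage encoder can be queried with the small encoder, which is the asymmetry the description at the top of this README refers to.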