{"id":21441738,"url":"https://github.com/dmis-lab/tour","last_synced_at":"2025-07-14T17:32:01.531Z","repository":{"id":97388753,"uuid":"580311930","full_name":"dmis-lab/TouR","owner":"dmis-lab","description":"Findings of ACL'2023: Optimizing Test-Time Query Representations for Dense Retrieval","archived":false,"fork":false,"pushed_at":"2023-10-24T07:52:25.000Z","size":82,"stargazers_count":28,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-05-14T00:23:20.716Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://https://arxiv.org/abs/2205.12680","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dmis-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-12-20T08:50:05.000Z","updated_at":"2024-04-14T07:13:50.000Z","dependencies_parsed_at":"2023-07-05T01:19:07.202Z","dependency_job_id":"965c2314-2492-421a-a042-17d83873cc4e","html_url":"https://github.com/dmis-lab/TouR","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmis-lab%2FTouR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmis-lab%2FTouR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmis-lab%2FTouR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmis-lab%2FTouR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dmis-lab","download_url":"https://codeload.github.com/dmis-lab/TouR/tar.gz/refs/heads/main","hos
t":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225990493,"owners_count":17556152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-23T01:41:22.252Z","updated_at":"2024-11-23T01:41:22.802Z","avatar_url":"https://github.com/dmis-lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TouR: Optimizing Test-Time Query Representations for Dense Retrieval\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg alt=\"TouR\" src=\"images/tour_overview.png\" width=\"350px\"\u003e\n\u003c/div\u003e\n\n*TouR* optimizes instance-level query representations guided by cross-encoders at test time for dense retrieval.\nPlease see our Findings of ACL paper [\nOptimizing Test-Time Query Representations for Dense Retrieval (Sung et al., 2023)](https://arxiv.org/abs/2205.12680) for more details.\n\n## *TouR* for Phrase Retrieval\n\n### Installation\n\nTo use TouR for phrase retrieval, we suggest first installing [DensePhrases v1.1.0](https://github.com/princeton-nlp/DensePhrases/tree/v1.1.0), which is a state-of-the-art phrase retrieval model.\n\n```\ngit clone -b v1.1.0 https://github.com/princeton-nlp/DensePhrases.git\ncd DensePhrases\n\n# Install from environment.yml (python \u003e= 3.8, transformers==4.13.0)\nconda env create --file environment.yml -n TouR\nconda activate TouR\n\n# Install DensePhrases\npython setup.py develop\n\n# Running config.sh will set the following three environment variables:\n# DATA_DIR: for datasets (including 'kilt', 'open-qa', 'single-qa', 'truecase', 'wikidump')\n# 
SAVE_DIR: for pre-trained models or index; new models and index will also be saved here\n# CACHE_DIR: for cache files from Huggingface Transformers\nsource config.sh\n```\n\nAfter installing DensePhrases, you will need to download the [resources](https://github.com/princeton-nlp/DensePhrases/tree/v1.1.0#resources) such as datasets, phrase indexes, and pre-trained models.\n\n\n### Running *TouR*\n\nTo run TouR for open-domain question answering, you need to execute the following script.\nThe example script demonstrates applying TouR\u003csub\u003ehard\u003c/sub\u003e to the NQ testset. \nOnce executed, the prediction file will be generated in $OUTPUT_DIR.\n\n```\nTEST_PATH='/path/to/densephrases-data/open-qa/nq-open/nq_test_preprocessed.json'\nLOAD_DIR=princeton-nlp/densephrases-multi-query-multi\nPSEUDO_LABELER_DIR='/path/to/phrase_reranker_multi' # see the model list below\nOUTPUT_DIR='/path/to/output'\nPSEUDO_LABELER_TYPE='hard' # or 'soft'\n\nCUDA_VISIBLE_DEVICES=0 python -u run_tour_densephrases.py \\\n\t--run_mode test_query_vec \\\n\t--cache_dir ${CACHE_DIR} \\\n\t--test_path ${TEST_PATH} \\\n\t--per_device_train_batch_size 1 \\\n\t--warmup_steps 0 \\\n\t--dump_dir ${SAVE_DIR}/densephrases-multi_wiki-20181220/dump/ \\\n\t--index_name start/1048576_flat_OPQ96 \\\n\t--load_dir ${LOAD_DIR} \\\n\t--output_dir ${OUTPUT_DIR} \\\n\t--pseudo_labeler_name_or_path ${PSEUDO_LABELER_DIR} \\\n\t--pseudo_labeler_type ${PSEUDO_LABELER_TYPE} \\\n\t--pseudo_labeler_p 0.5 \\\n\t--pseudo_labeler_temp 0.5 \\\n\t--learning_rate 1.2 \\\n\t--num_train_epochs 3 \\\n\t--top_k 10 \\\n\t--rerank_lambda 0.1 \\\n\t--cuda \\\n\t--top1_earlystop \\\n\t--truecase\n```\n\n#### Model list\n\nWe have uploaded our phrase re-rankers on the Huggingface hub.\nThe phrase re-rankers are used as pseudo labelers for *TouR*.\n\n- [phrase-reranker-multi](https://huggingface.co/dmis-lab/phrase-reranker-multi): Phrase re-ranker trained on multiple datasets (NQ|TriviaQA|SQuAD|WQ|TREC)\n- 
[phrase-reranker-multi-wq](https://huggingface.co/dmis-lab/phrase-reranker-multi-wq): phrase-reranker-multi with further fine-tuning on the WQ dataset\n- [phrase-reranker-multi-trec](https://huggingface.co/dmis-lab/phrase-reranker-multi-trec): phrase-reranker-multi with further fine-tuning on the TREC dataset\n- [phrase-reranker-nq](https://huggingface.co/dmis-lab/phrase-reranker-nq): Phrase re-ranker trained on the NQ dataset only\n\nTo train phrase re-rankers, please refer to the 'cross_encoder' folder.\n\n## Citations\n```bibtex\n@inproceedings{sung2023optimizing,\n   title={Optimizing Test-Time Query Representations for Dense Retrieval},\n   author={Sung, Mujeen and Park, Jungsoo and Kang, Jaewoo and Chen, Danqi and Lee, Jinhyuk},\n   booktitle={Findings of the Association for Computational Linguistics (ACL)},\n   year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmis-lab%2Ftour","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmis-lab%2Ftour","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmis-lab%2Ftour/lists"}