https://github.com/SeongKu-Kang/ToTER_WWW24
Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy (WWW'24)
- Host: GitHub
- URL: https://github.com/SeongKu-Kang/ToTER_WWW24
- Owner: SeongKu-Kang
- License: gpl-3.0
- Created: 2024-02-20T10:03:36.000Z
- Default Branch: main
- Last Pushed: 2024-02-21T06:04:07.000Z
- Last Synced: 2024-06-04T14:17:52.779Z
- Language: Jupyter Notebook
- Homepage:
- Size: 27.3 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-taxonomy - https://github.com/SeongKu-Kang/ToTER_WWW24
README
## Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy
This repository provides the source code for "Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy", accepted as a research paper at TheWebConf (WWW 2024).
### 1. Overview
We introduce ToTER, a new plug-and-play framework that improves retrieval based on pre-trained language models (PLMs) by using a corpus topical taxonomy.
#### (Training phase) Taxonomy-guided topic class relevance learning
The taxonomy reveals the latent structure of the whole corpus.
To exploit it for retrieval, we first connect the corpus-level knowledge to individual documents.
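
As a rough illustration of linking topic classes to individual documents without labels, the sketch below scores one document against every topic class independently. It is not the classifier from the paper: the function names, the toy vectors, the `temperature` value, and the use of cosine similarity with a per-class sigmoid are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_relevance(doc_vec, topic_vecs, temperature=0.1):
    """Score a document against each topic class independently.

    A sigmoid per class (rather than a softmax over classes) lets several
    topics be relevant to the same document, which is what makes the
    setting multi-label. No document-topic labels are used.
    """
    scores = {}
    for topic, topic_vec in topic_vecs.items():
        similarity = cosine(doc_vec, topic_vec)
        scores[topic] = 1.0 / (1.0 + math.exp(-similarity / temperature))
    return scores


# Hypothetical toy embeddings; real ones would come from a PLM encoder.
topic_vecs = {
    "machine learning": [1.0, 0.0, 0.0],
    "databases": [0.0, 1.0, 0.0],
    "chemistry": [0.0, 0.0, 1.0],
}
doc_vec = [0.9, 0.1, 0.0]

scores = topic_relevance(doc_vec, topic_vecs)
for topic, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{topic}: {score:.3f}")
```

The output of such a step is a per-document vector of topic class relevance scores, which later stages can compare against the same kind of vector computed for a query.
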
We formulate this step as an unsupervised multi-label classification, assessing the relevance of each document to each topic class without document-topic labels.
#### (Inference phase) Topical taxonomy-enhanced retrieval
ToTER consists of three strategies to complement the existing retrieve-then-rerank pipeline: (1) search space adjustment, (2) class relevance matching, and (3) query enrichment by core phrases.
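
The three strategies can be pictured with the toy sketch below. It is a simplified stand-in, not the repository's implementation: the function names, the dot-product matching, `keep_ratio`, `weight`, and all data values are hypothetical choices for illustration.

```python
def class_match(query_classes, doc_classes):
    """Dot product of two topic-relevance dictionaries."""
    return sum(w * doc_classes.get(t, 0.0) for t, w in query_classes.items())


def adjust_search_space(query_classes, all_doc_classes, keep_ratio=0.5):
    """(1) Keep only the documents that are topically closest to the query."""
    ranked = sorted(
        all_doc_classes,
        key=lambda d: class_match(query_classes, all_doc_classes[d]),
        reverse=True,
    )
    return ranked[: max(1, int(len(ranked) * keep_ratio))]


def rank_with_class_matching(candidates, retriever_scores, query_classes,
                             all_doc_classes, weight=0.5):
    """(2) Blend the retriever score with topic class relevance matching."""
    def combined(d):
        return retriever_scores[d] + weight * class_match(
            query_classes, all_doc_classes[d])
    return sorted(candidates, key=combined, reverse=True)


def enrich_query(query, query_classes, core_phrases, top_k=1):
    """(3) Append core phrases of the query's most relevant topic classes."""
    top = sorted(query_classes, key=query_classes.get, reverse=True)[:top_k]
    extra = [p for t in top for p in core_phrases.get(t, [])]
    return " ".join([query] + extra)


# Hypothetical toy data.
query_classes = {"ml": 0.9, "db": 0.1}
all_doc_classes = {
    "d1": {"ml": 0.8, "db": 0.1},
    "d2": {"ml": 0.1, "db": 0.9},
    "d3": {"ml": 0.7, "db": 0.2},
    "d4": {"ml": 0.0, "db": 0.3},
}
retriever_scores = {"d1": 0.60, "d2": 0.90, "d3": 0.62, "d4": 0.10}
core_phrases = {"ml": ["representation learning", "neural network"],
                "db": ["query optimization"]}

kept = adjust_search_space(query_classes, all_doc_classes)
ranked = rank_with_class_matching(kept, retriever_scores, query_classes,
                                  all_doc_classes)
enriched = enrich_query("neural ranking", query_classes, core_phrases)
print(kept, ranked, enriched)
```

In this toy setup the enriched query would then be passed to a reranker over the `ranked` candidates; that step is omitted because it depends on the reranking model in use.
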
The strategies are designed to focus progressively on finer-grained ranking.
### 2. How to use
Please refer to the `Guide to using ToTER.ipynb` notebook.
### 3. Resources
- Because of their large size, the necessary files (e.g., PLM embeddings, trained classifier) are provided through a separate file-sharing service: https://drive.google.com/file/d/1BmUmlAV4i4-lwwQdBkuDsCZS8-OS-y8M/view?usp=sharing