Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ucasir/NPRF
NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
information-retrieval neural-network pseudo-relevance-feedback
JSON representation
- Host: GitHub
- URL: https://github.com/ucasir/NPRF
- Owner: ucasir
- License: apache-2.0
- Created: 2018-08-16T06:49:45.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2018-11-26T10:44:13.000Z (almost 6 years ago)
- Last Synced: 2024-07-31T23:45:10.191Z (3 months ago)
- Topics: information-retrieval, neural-network, pseudo-relevance-feedback
- Language: Python
- Size: 2.15 MB
- Stars: 33
- Watchers: 4
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-few-shot-meta-learning - code - official (TF+Keras)
README
# NPRF
NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval [[pdf]](http://aclweb.org/anthology/D18-1478)
If you use the code, please cite the following paper:
```
@inproceedings{li2018nprf,
title={NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval},
author={Li, Canjia and Sun, Yingfei and He, Ben and Wang, Le and Hui, Kai and Yates, Andrew and Sun, Le and Xu, Jungang},
booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
year={2018}
}
```

## Requirements
* TensorFlow
* Keras
* gensim
* numpy

## Getting started
### Training data preparation
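The preparation step described in this section boils down to computing term-by-term similarities between a query and a document under a pre-trained embedding. A minimal, hypothetical sketch (toy vectors standing in for word2vec; not the repo's preprocess/prepare_d2d.py):

```python
import numpy as np

def similarity_matrix(query_terms, doc_terms, embeddings):
    """Cosine-similarity matrix between query terms (rows) and document
    terms (columns), given a dict mapping term -> embedding vector."""
    def unit(v):
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v)
    Q = np.stack([unit(embeddings[t]) for t in query_terms])
    D = np.stack([unit(embeddings[t]) for t in doc_terms])
    # Rows are already unit vectors, so a dot product is cosine similarity.
    return Q @ D.T  # shape: (len(query_terms), len(doc_terms))
```

With real data, `embeddings` would come from a loaded word2vec model (e.g. gensim's `KeyedVectors`), and one such matrix is produced per query-document pair.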
To capture the top-k terms from the top-n documents, one needs to extract term document frequencies from the index. Afterwards, generate the similarity matrix between the query and each document using a pre-trained word embedding (e.g. word2vec). Related functions can be found in preprocess/prepare_d2d.py.

### Training metadata preparation
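For orientation, the standard TREC run format that the result handling below deals with is `qid Q0 docno rank score tag`. A generic, hedged reader/writer sketch (not the repo's actual **Result** class):

```python
def parse_trec_results(lines):
    """Parse standard TREC result lines: qid Q0 docno rank score run_tag."""
    runs = {}
    for line in lines:
        qid, _, docno, rank, score, _tag = line.split()
        runs.setdefault(qid, []).append((docno, int(rank), float(score)))
    return runs

def format_trec_result(qid, docno, rank, score, tag="nprf"):
    """Render one ranked document as a TREC result line."""
    return f"{qid} Q0 {docno} {rank} {score:.6f} {tag}"
```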
We introduce two classes for ease of training. The class **Relevance** incorporates the relevance information from the baseline run and the qrels file. The class **Result** simplifies read/write operations on standard TREC result files. Other information, such as query idf, is dumped as a pickle file.

### Model training
Configure the MODEL_config.py file, then run
```
python MODEL.py --fold fold_number temp_file_path
```
You need to run 5-fold cross-validation, which can be done automatically by running the *runfold.sh* script. The temp file is a temporary file used to write the validation-set results in TREC format. A sample training log for the first fold of the TREC 1-3 dataset is provided for reference; see [sample_log](https://github.com/ucasir/NPRF/blob/master/sample_log).
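*runfold.sh* itself is not reproduced here; its effect is presumably a loop over the five folds, which can be sketched (hypothetically, with the script name and temp path as placeholders) as:

```python
def fold_commands(script="MODEL.py", temp_path="temp_res", n_folds=5):
    """Build one `python MODEL.py --fold <k> <temp_file_path>` command
    per fold, mirroring what a 5-fold runner script would invoke."""
    return [["python", script, "--fold", str(k), temp_path]
            for k in range(1, n_folds + 1)]
```

Each command list could then be handed to `subprocess.run`.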
### Evaluation
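The merge step described below can also be done without shell tools; a small sketch mirroring `cat *res >> merge_file`:

```python
from pathlib import Path

def merge_fold_results(result_dir, merged_name="merge_file"):
    """Concatenate all per-fold TREC result files (assumed to end in 'res')
    into one file, equivalent to `cat *res >> merge_file`."""
    merged = Path(result_dir) / merged_name
    with merged.open("w") as out:
        for part in sorted(Path(result_dir).glob("*res")):
            out.write(part.read_text())
    return merged
```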
After training, the evaluation results for each fold are kept in the result path specified in the MODEL_config.py file. One can simply run `cat *res >> merge_file` to merge the results from all folds, then run the [trec_eval](https://trec.nist.gov/trec_eval/) script to evaluate the model.

## Reference
Some snippets of the code follow the implementations of [K-NRM](https://github.com/AdeDZY/K-NRM) and [MatchZoo](https://github.com/faneshion/MatchZoo).