https://github.com/louislefevre/information-retrieval-models
Ranks passages against queries using various models and techniques.
- Host: GitHub
- URL: https://github.com/louislefevre/information-retrieval-models
- Owner: louislefevre
- Created: 2021-03-12T03:15:23.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-09-20T00:54:58.000Z (about 4 years ago)
- Last Synced: 2025-02-05T18:27:37.645Z (8 months ago)
- Topics: bm25, dirichlet-smoothing, information-retrieval, laplace-smoothing, lidstone-smoothing, query-likelihood, tfidf, vectorspace
- Language: Python
- Homepage:
- Size: 162 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Information Retrieval Models
Ranks passages against queries using various models and techniques.

## Models
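For readers unfamiliar with the first two models listed below, the following is a minimal sketch of BM25 and of TF-IDF cosine similarity, which is one common way of implementing a vector space model. It assumes queries and passages are already tokenised into lists of strings. The function names and the parameter values `k1=1.2` and `b=0.75` are illustrative assumptions, not the repository's own code.

```python
import math
from collections import Counter


def bm25_score(query, passage, doc_freq, num_docs, avg_len, k1=1.2, b=0.75):
    """Score a tokenised passage against a tokenised query with BM25.

    doc_freq maps each term to the number of passages that contain it.
    k1=1.2 and b=0.75 are common defaults, assumed here.
    """
    tf = Counter(passage)
    # Length normalisation: passages longer than average are penalised.
    norm = k1 * (1 - b + b * len(passage) / avg_len)
    score = 0.0
    for term in set(query):  # each distinct query term is counted once
        n, f = doc_freq.get(term, 0), tf[term]
        if n == 0 or f == 0:
            continue
        # Robertson-Sparck Jones IDF; the +1 keeps it positive for common terms.
        idf = math.log((num_docs - n + 0.5) / (n + 0.5) + 1)
        score += idf * f * (k1 + 1) / (f + norm)
    return score


def tfidf_vector(tokens, doc_freq, num_docs):
    """Sparse TF-IDF vector (term -> weight) using raw counts and log(N / df)."""
    tf = Counter(tokens)
    return {t: c * math.log(num_docs / doc_freq[t])
            for t, c in tf.items() if doc_freq.get(t, 0) > 0}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

For a toy collection, `doc_freq` can be built with `Counter(t for p in passages for t in set(p))`, and `avg_len` is the mean passage length in tokens.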
- BM25 - Probabilistic retrieval model for estimating the relevance of a passage.
- VectorSpace - Algebraic model for representing passages as vectors.
- QueryLikelihood - Language model that ranks passages by the likelihood of the query being generated by each passage's language model.

## How to Run
The program can be initialised by running *start.py*, which takes the dataset path and the model name as positional arguments, followed by an optional `-s` flag that names the smoothing technique.

### Parameters
#### Dataset
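A minimal sketch of loading a file in the format described in this section, using Python's standard `csv` module. It assumes there is no header row and that the four columns appear in the order qid, pid, query, passage; `read_dataset` is a hypothetical helper and not necessarily how *start.py* parses the file.

```python
import csv


def read_dataset(path):
    """Read a tab-separated file of (qid, pid, query, passage) rows.

    Assumes no header row and exactly four columns per row, in that order.
    Returns a dict mapping qid -> (query text, list of (pid, passage) pairs).
    """
    queries = {}
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quote characters inside passages as literal text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if not row:  # skip blank lines
                continue
            if len(row) != 4:
                raise ValueError(
                    f"line {line_no}: expected 4 tab-separated columns, got {len(row)}")
            qid, pid, query, passage = row
            # Group candidate passages under their query ID.
            entry = queries.setdefault(qid, (query, []))
            entry[1].append((pid, passage))
    return queries
```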
- The dataset parameter is required and is the path of the dataset to be parsed. It is the first positional argument.
- Expects a TSV file in which each row contains the columns qid, pid, query and passage, where qid is the query ID, pid is the ID of the retrieved passage, query is the query text, and passage is the passage text.
- Each column must be tab separated.

#### Model
- The model parameter is required and is the name of the model used to rank passages. It is the second positional argument.
- Expects one of `bm25` for the BM25 model, `vs` for the Vector Space model, or `lm` for the Query Likelihood model.
- Any other value is deemed invalid and raises an exception.

#### Smoothing
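Query likelihood ranks a passage by the probability that the passage's language model generates the query. The sketch below shows how the three smoothing techniques accepted by this parameter change that probability for a unigram model. `query_log_likelihood`, its argument names, and the defaults `epsilon=0.1` and `mu=2000` are assumptions for illustration, not values taken from the repository.

```python
import math
from collections import Counter


def query_log_likelihood(query, passage, vocab_size, smoothing,
                         collection_tf=None, collection_len=None,
                         epsilon=0.1, mu=2000.0):
    """Log P(query | passage) under a smoothed unigram language model.

    vocab_size is the number of distinct terms in the collection.
    collection_tf / collection_len are needed only for Dirichlet smoothing.
    epsilon and mu are illustrative defaults (assumptions).
    """
    if smoothing not in ("laplace", "lidstone", "dirichlet"):
        raise ValueError(f"unknown smoothing technique: {smoothing!r}")
    if smoothing == "dirichlet" and (collection_tf is None or not collection_len):
        raise ValueError("Dirichlet smoothing needs collection statistics")

    tf = Counter(passage)
    d_len = len(passage)
    log_p = 0.0
    for term in query:
        f = tf[term]
        if smoothing == "laplace":
            # Add one to every term count.
            p = (f + 1) / (d_len + vocab_size)
        elif smoothing == "lidstone":
            # Add a fractional count epsilon to every term count.
            p = (f + epsilon) / (d_len + epsilon * vocab_size)
        else:
            p_c = collection_tf.get(term, 0) / collection_len
            if p_c == 0:
                # Term absent from the whole collection: it would give every
                # passage probability zero, so skipping it keeps the ranking.
                continue
            # Interpolate with the collection model, weighted by mu.
            p = (f + mu * p_c) / (d_len + mu)
        log_p += math.log(p)
    return log_p
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together for long queries.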
- The `-s` parameter is required only when using the Query Likelihood model, and gives the name of the smoothing technique to apply.
- Expects one of `laplace` for Laplace smoothing, `lidstone` for Lidstone smoothing, or `dirichlet` for Dirichlet smoothing.
- This parameter can only be used when the Query Likelihood model (`lm`) is selected as the model parameter; an exception is raised if it is supplied with any other model.

### Examples
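The parameter rules above can be sketched with Python's `argparse` module. This is an illustration under assumptions, not the repository's code: the positional names `dataset` and `model` are hypothetical, and argparse reports invalid input by printing a usage message and raising `SystemExit`, which may differ from the exception *start.py* raises.

```python
import argparse

MODELS = ("bm25", "vs", "lm")
SMOOTHING = ("laplace", "lidstone", "dirichlet")


def parse_args(argv=None):
    """Parse and validate command-line arguments (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="start.py", description="Rank passages against queries.")
    parser.add_argument("dataset", help="path to the TSV dataset")
    # choices= rejects any model name other than bm25, vs or lm.
    parser.add_argument("model", choices=MODELS, help="ranking model")
    parser.add_argument("-s", dest="smoothing", choices=SMOOTHING,
                        help="smoothing technique (query likelihood only)")
    args = parser.parse_args(argv)
    # Cross-argument rules that choices= alone cannot express.
    if args.model == "lm" and args.smoothing is None:
        parser.error("-s is required when the model is 'lm'")
    if args.model != "lm" and args.smoothing is not None:
        parser.error("-s can only be used with the 'lm' model")
    return args
```

Each example command below would pass this validation, while `start.py dataset.tsv vs -s laplace` would be rejected.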
- `start.py dataset.tsv bm25`
- `start.py dataset.tsv vs`
- `start.py dataset.tsv lm -s laplace`

## Dependencies
- [numpy](https://pypi.org/project/numpy/)
- [matplotlib](https://pypi.org/project/matplotlib/)
- [nltk](https://pypi.org/project/nltk/)
- [num2words](https://pypi.org/project/num2words/)
- [tabulate](https://pypi.org/project/tabulate/)
- [punkt (nltk module)](http://www.nltk.org/api/nltk.tokenize.html?highlight=punkt)
- [stopwords (nltk module)](https://www.nltk.org/api/nltk.corpus.html)
*NLTK modules are downloaded automatically at runtime*