Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mpkato/pyNTCIREVAL
Python version of NTCIREVAL http://research.nii.ac.jp/ntcir/tools/ntcireval-en.html
- Host: GitHub
- URL: https://github.com/mpkato/pyNTCIREVAL
- Owner: mpkato
- License: mit
- Created: 2016-06-10T23:13:56.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-09-11T12:22:27.000Z (about 1 year ago)
- Last Synced: 2024-04-28T13:41:37.996Z (7 months ago)
- Language: Python
- Homepage:
- Size: 44.9 KB
- Stars: 22
- Watchers: 2
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# pyNTCIREVAL
[![CircleCI](https://circleci.com/gh/mpkato/pyNTCIREVAL.svg?style=svg)](https://circleci.com/gh/mpkato/pyNTCIREVAL)
## Introduction
pyNTCIREVAL is a Python version of NTCIREVAL (http://research.nii.ac.jp/ntcir/tools/ntcireval-en.html), which was developed by Dr. Tetsuya Sakai (http://www.f.waseda.jp/tetsuya/sakai.html).
Only part of NTCIREVAL's functionality has been implemented in the current version of pyNTCIREVAL: retrieval effectiveness metrics for ranked retrieval (e.g., DCG and ERR).
As shown below, pyNTCIREVAL can also be used directly from Python code.

For Japanese users, there is a very nice textbook written in Japanese that discusses various evaluation metrics and how to use NTCIREVAL: see http://www.f.waseda.jp/tetsuya/book.html.

## Evaluation Metrics
These evaluation metrics are available in the current version:
- Hit@k: 1 if top k contains a relevant doc, and 0 otherwise.
- P@k (precision at k): number of relevant docs in top k divided by k.
- AP (Average Precision) [6, 7].
- ERR (Expected Reciprocal Rank), nERR@k [2, 8] (see the sketch after this list).
- RBP (Rank-biased Precision) [4].
- nDCG (original nDCG) [3].
- MSnDCG (Microsoft version of nDCG) [1].
- Q-measure [8].
- RR (Reciprocal Rank).
- O-measure [5].
- P-measure and P-plus [5].
- NCU (Normalised Cumulative Utility) [7].
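
The examples below show the library's class-based interface for computing metrics. For intuition about the metrics listed above, here is a small plain-Python sketch of ERR following its standard definition [2]; this is only an illustration, and the function below is hypothetical rather than part of pyNTCIREVAL:

```python
def expected_reciprocal_rank(rel_levels, max_level):
    """Illustrative ERR for a ranked list of relevance levels (see [2]).

    A user scanning the ranking stops at rank r with probability
    (2**g_r - 1) / 2**max_level; ERR is the expected reciprocal of
    the stopping rank.
    """
    err = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for rank, g in enumerate(rel_levels, start=1):
        p_stop = (2 ** g - 1) / 2 ** max_level
        err += p_reach * p_stop / rank
        p_reach *= 1 - p_stop
    return err

# relevance levels of a 5-document ranking, graded on a 0-2 scale
print(expected_reciprocal_rank([2, 0, 1, 0, 1], max_level=2))  # ~0.7802
```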

## Installation

```bash
pip install pyNTCIREVAL
```

## Examples
### P@k
```python
from pyNTCIREVAL import Labeler
from pyNTCIREVAL.metrics import Precision

# dict of { document ID: relevance level }
qrels = {0: 1, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0, 6: 0, 7: 1, 8: 0, 9: 0}
ranked_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # a list of document IDs

# labeling: [doc_id] -> [(doc_id, rel_level)]
labeler = Labeler(qrels)
labeled_ranked_list = labeler.label(ranked_list)
assert labeled_ranked_list == [
(0, 1), (1, 0), (2, 0), (3, 0), (4, 1),
(5, 0), (6, 0), (7, 1), (8, 0), (9, 0)
]

# let's compute Precision@5
metric = Precision(cutoff=5)
result = metric.compute(labeled_ranked_list)
assert result == 0.4
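
# Sanity check (illustration added here; not in the original README):
# P@5 is the number of relevant documents in the top 5 divided by 5.
manual_p5 = sum(1 for _, rel in labeled_ranked_list[:5] if rel > 0) / 5
assert manual_p5 == result == 0.4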
```

### nDCG@k (Microsoft version)
Many evaluation metric classes need `xrelnum` and `grades` as input for initialization.
`xrelnum` is a list containing the number of documents at each relevance level (starting from level 0),
while `grades` is a list containing the grade for each relevance level except level 0.

For example, suppose there are three relevance levels (irrelevant, partially relevant, and highly relevant),
and a document collection includes 5 irrelevant, 3 partially relevant, and 2 highly relevant documents for a certain topic.
In this case, `xrelnum = [5, 3, 2]`.
If we want to assign grades of 0, 1, and 2 to the three levels, then `grades = [1, 2]` (level 0 is excluded).
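
As a quick plain-Python illustration (not part of pyNTCIREVAL's API), `xrelnum` for that hypothetical topic can be derived by counting documents per relevance level; the `qrels_example` dict below is made up to match the numbers above:

```python
from collections import Counter

# Hypothetical qrels: 5 irrelevant (level 0), 3 partially relevant (level 1),
# and 2 highly relevant (level 2) documents for one topic.
qrels_example = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2}

level_counts = Counter(qrels_example.values())
xrelnum = [level_counts.get(level, 0) for level in range(3)]
assert xrelnum == [5, 3, 2]

grades = [1, 2]  # grades for levels 1 and 2 (level 0 is excluded)
```

In the library example below, the same per-level counts are obtained with `Labeler.compute_per_level_doc_num`.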
```python
from pyNTCIREVAL import Labeler
from pyNTCIREVAL.metrics import MSnDCG

# dict of { document ID: relevance level }
qrels = {0: 2, 1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 2, 8: 0, 9: 0}
grades = [1, 2] # a grade for relevance levels 1 and 2 (Note that level 0 is excluded)
ranked_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # a list of document IDs

# labeling: [doc_id] -> [(doc_id, rel_level)]
labeler = Labeler(qrels)
labeled_ranked_list = labeler.label(ranked_list)
assert labeled_ranked_list == [
(0, 2), (1, 0), (2, 1), (3, 0), (4, 1),
(5, 0), (6, 0), (7, 2), (8, 0), (9, 0)
]

# compute the number of documents for each relevance level
rel_level_num = 3
xrelnum = labeler.compute_per_level_doc_num(rel_level_num)
assert xrelnum == [6, 2, 2]

# Let's compute nDCG@5
metric = MSnDCG(xrelnum, grades, cutoff=5)
result = metric.compute(labeled_ranked_list)
assert result == 0.6885695823073614
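
# Sanity check (illustration added here; not in the original README):
# the Microsoft version of nDCG@5 uses the grade of the document at rank r
# as its gain, discounts it by 1/log2(r + 1), and normalises by the ideal DCG@5.
from math import log2

gains = [grades[rel - 1] if rel > 0 else 0
         for _, rel in labeled_ranked_list[:5]]        # [2, 0, 1, 0, 1]
ideal_gains = sorted(
    (g for level, g in enumerate(grades, start=1) for _ in range(xrelnum[level])),
    reverse=True)[:5]                                   # [2, 2, 1, 1]
dcg = sum(g / log2(r + 1) for r, g in enumerate(gains, start=1))
idcg = sum(g / log2(r + 1) for r, g in enumerate(ideal_gains, start=1))
assert abs(dcg / idcg - result) < 1e-6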
```

## References
[1] Burges, C. et al.: Learning to Rank Using Gradient Descent, ICML 2005.

[2] Chapelle, O. et al.: Expected Reciprocal Rank for Graded Relevance, CIKM 2009.

[3] Järvelin, K. and Kekäläinen, J.: Cumulated Gain-based Evaluation of IR Techniques, ACM TOIS 20(4), 2002.

[4] Moffat, A. and Zobel, J.: Rank-biased Precision for Measurement of Retrieval Effectiveness, ACM TOIS 27(1), 2008.

[5] Sakai, T.: On the Properties of Evaluation Metrics for Finding One Highly Relevant Document, IPSJ TOD, Vol. 48, No. SIG9 (TOD35), 2007.

[6] Sakai, T.: Alternatives to Bpref, SIGIR 2007.

[7] Sakai, T. and Robertson, S.: Modelling A User Population for Designing Information Retrieval Metrics, EVIA 2008.

[8] Sakai, T. and Song, R.: Evaluating Diversified Search Results Using Per-intent Graded Relevance, SIGIR 2011.