Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora
https://github.com/facebookresearch/hypernymysuite
- Host: GitHub
- URL: https://github.com/facebookresearch/hypernymysuite
- Owner: facebookresearch
- License: other
- Archived: true
- Created: 2018-05-09T16:58:27.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2021-08-31T14:40:40.000Z (about 3 years ago)
- Last Synced: 2024-03-04T16:46:32.459Z (8 months ago)
- Language: Python
- Size: 4.76 MB
- Stars: 152
- Watchers: 62
- Forks: 23
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-taxonomy - https://github.com/facebookresearch/hypernymysuite
README
# Hypernymy Suite
HypernymySuite is a tool for evaluating hypernymy detection models. Its
primary focus is reproducing the results of the following paper.

> Stephen Roller, Douwe Kiela, and Maximilian Nickel. 2018. Hearst Patterns
> Revisited: Automatic Hypernym Detection from Large Text Corpora. ACL.
> ([arXiv](https://arxiv.org/abs/1806.03191))

We hope that open-sourcing our evaluation will help facilitate future research.
## Example
You can produce results in JSON format by calling `main.py`:
python main.py cnt --dset hearst_counts.txt.gz
These results can be made human readable by piping them into `compile_table.py`:
python main.py cnt --dset hearst_counts.txt.gz | python compile_table.py
To generate the full table from the paper, you may simply use `generate_table.sh`:
bash generate_table.sh results.json
Please note that, due to licensing concerns, we were not able to release our
train/validation/test folds from the paper, so results may differ slightly from
those reported.

## Requirements
The module was developed with Python 3 in mind and is not tested with Python 2.
It may nonetheless work with Python 2, but compatibility is not guaranteed.

The suite requires several packages you probably already have installed:
`numpy`, `scipy`, `pandas`, `scikit-learn` and `nltk`. These can be installed
using pip:

pip install -r requirements.txt
If you've never used `nltk` before, you'll need to download the WordNet data:
python -c "import nltk; nltk.download('wordnet')"
On OS X, you may need to install `coreutils` and `gnu-sed` for the script
`download_data.sh` to run correctly. These can be installed using brew:

brew install coreutils gnu-sed
After installation, you will either need to modify `download_data.sh` to run `gsort` and `gsed` instead of `sort` and `sed`, or alternatively add the "gnubin" directory to your PATH in your bashrc:
PATH="/usr/local/opt/coreutils/libexec/gnubin:$PATH"
For more information, see `brew info coreutils` or `brew info gnu-sed`.
## Evaluating your own model
You can evaluate your own model in two separate ways. The simplest is to
create a copy of `example.tsv` and fill in your model's predictions in the `sim`
column. You must include a prediction for every pair, but you may set the `is_oov`
column to `1` for pairs your model cannot score, so that they are handled
correctly in the evaluation.

You may then evaluate the model:
python main.py precomputed --dset example.tsv
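For illustration, here is a minimal sketch of how you might fill in the `sim`
column programmatically. The scoring function `my_score` and the output filename
`my_predictions.tsv` are hypothetical placeholders, and the sketch assumes
`example.tsv` has a header row whose first two columns hold the word pair; check
the actual file layout before relying on this.

```python
import pandas as pd

def my_score(hypo, hyper):
    """Hypothetical scorer: return your model's hypernymy score for (hypo, hyper)."""
    return 0.0  # replace with a real prediction

# Copy example.tsv and overwrite the `sim` column with model predictions.
df = pd.read_csv("example.tsv", sep="\t")
word_cols = list(df.columns[:2])  # assumption: the first two columns hold the word pair
df["sim"] = [my_score(w1, w2) for w1, w2 in zip(df[word_cols[0]], df[word_cols[1]])]
df["is_oov"] = 0  # set to 1 for pairs your model cannot score
df.to_csv("my_predictions.tsv", sep="\t", index=False)
```

The resulting file would then be passed to the same command shown above, e.g.
`python main.py precomputed --dset my_predictions.tsv`.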
You can also implement any model by extending the `base.HypernymySuiteModel` class
and filling in your own implementation of `predict` or `predict_many`, as in the
sketch below.
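As a rough illustration, a custom model might look like the following. This is a
minimal sketch only: the class name and scoring logic are hypothetical, and the
exact constructor arguments and method signatures of `base.HypernymySuiteModel`
may differ, so consult `base.py` before copying it.

```python
from base import HypernymySuiteModel


class CharOverlapModel(HypernymySuiteModel):
    """Hypothetical toy model scoring pairs by character overlap."""

    def predict(self, hypo, hyper):
        # Return a single hypernymy score for the candidate pair (hypo, hyper).
        # A real model would consult embeddings, pattern counts, etc.
        overlap = len(set(hypo) & set(hyper))
        return overlap / (len(hypo) + len(hyper) + 1e-9)
```

The evaluation code would then call `predict` (or a vectorized `predict_many`)
on each candidate pair it scores.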
## References

If you find this code useful for your research, please cite the following paper:
@inproceedings{roller2018hearst,
title = {Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora},
author = {Roller, Stephen and Kiela, Douwe and Nickel, Maximilian},
year = {2018},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics},
location = {Melbourne, Australia},
publisher = {Association for Computational Linguistics}
}

## License
This code is licensed under [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
The data contained in `hearst_counts.txt` was extracted from a combination of
[Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Database_download) and Gigaword.
Please see the publication for details.