Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/generall/entitycategoryprediction
Model for predicting categories of entities by its mentions
- Host: GitHub
- URL: https://github.com/generall/entitycategoryprediction
- Owner: generall
- Created: 2019-04-11T20:53:23.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-06-23T17:01:05.000Z (over 3 years ago)
- Last Synced: 2023-03-27T08:23:38.133Z (over 1 year ago)
- Topics: allennlp, classification, mentions, nlp
- Language: Jupyter Notebook
- Homepage: https://mention.vasnetsov.com/#/
- Size: 1.94 MB
- Stars: 26
- Watchers: 4
- Forks: 3
- Open Issues: 0
- Metadata Files:
  - Readme: readme.md
README
# Category prediction model
This repo contains an AllenNLP model for predicting the categories of named entities from their mentions.
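To illustrate the shape of the task (and only that: this is a toy token-overlap baseline, not the actual AllenNLP model, and the example data is made up), each entity comes with several mention contexts and is assigned a category:

```python
from collections import Counter

# Toy stand-in for the task: each training example is a set of mention
# contexts for one entity, labelled with that entity's category.
train = [
    (["he scored a goal", "transferred to the club"], "athlete"),
    (["she recorded an album", "went on tour"], "musician"),
]

def predict(contexts):
    """Predict a category by naive token overlap with training examples."""
    scores = Counter()
    tokens = {t for c in contexts for t in c.split()}
    for train_contexts, category in train:
        train_tokens = {t for c in train_contexts for t in c.split()}
        scores[category] += len(tokens & train_tokens)
    return scores.most_common(1)[0][0]

print(predict(["signed for the club", "scored twice"]))  # -> athlete
```

The real model replaces the token-overlap score with a learned encoder over the mention contexts, but the input/output contract is the same.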
# Data
## Fake data
You can generate fake data using this [notebook](notebooks/gen_face_data.ipynb).
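The notebook is not reproduced here, but generating fake data of this kind might look roughly like the following sketch. The TSV layout, the category names, and the sentence templates are assumptions for illustration, not the notebook's actual output format:

```python
import csv
import io
import random

# Hypothetical templated contexts per category.
CATEGORIES = {
    "athlete": ["{} scored in the final.", "{} joined the team in 2010."],
    "writer": ["{} published a new novel.", "{} won a literary prize."],
}

def generate_fake_rows(n, seed=0):
    """Yield (mention, category, sentence) rows with templated contexts."""
    rng = random.Random(seed)
    for i in range(n):
        category, templates = rng.choice(sorted(CATEGORIES.items()))
        mention = f"Person_{i}"
        yield mention, category, rng.choice(templates).format(mention)

# Write the rows as tab-separated values, e.g. for a fake_data_train.tsv.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t")
writer.writerows(generate_fake_rows(5))
print(buf.getvalue())
```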
## Real data (Work in progress)
A filtered version of the [OneShotWikilinks](https://www.kaggle.com/generall/oneshotwikilinks) dataset with manually selected categories.
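The preparation steps below walk a Wikipedia category graph to map each article to a small set of manually selected person categories. As a rough sketch of the idea (the parent-link dict and category names here are invented; the repo's actual `category_graph.pkl` format may differ):

```python
import pickle

# Hypothetical parent links: child category -> parent categories.
category_graph = {
    "French_novelists": ["Novelists", "French_writers"],
    "Novelists": ["Writers"],
    "French_writers": ["Writers"],
}
TARGET_CATEGORIES = {"Writers", "Athletes"}  # manually selected roots

def resolve(category):
    """Follow parent links upward and collect the target categories reached."""
    seen, stack, found = set(), [category], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in TARGET_CATEGORIES:
            found.add(node)
        stack.extend(category_graph.get(node, []))
    return found

print(resolve("French_novelists"))  # -> {'Writers'}

# The pipeline pickles the graph between notebooks (category_graph.pkl):
blob = pickle.dumps(category_graph)
assert pickle.loads(blob) == category_graph
```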
### Data preparation steps
* Create the category graph: [build_category_graph.ipynb](./notebooks/build_category_graph.ipynb)
* Produces: `category_graph.pkl`
* Obtain the list of Person articles from the DBpedia ontology: [obtain_people_articles.ipynb](/notebooks/obtain_people_articles.ipynb):
* Requires: `dbpedia_2016-10.owl`
* Produces: `people_categories.json`
* Build the mapping from articles to people categories: [generate_full_people_categories.ipynb](./notebooks/generate_full_people_categories.ipynb). Requires:
* `people_categories.json`
* `category_graph.pkl`
* `projects/categories_prediction/manual_categories.gsheet`
* Filter mentions for people [filter_mentions.ipynb](./notebooks/filter_mentions.ipynb).
* Requires: `people_all_categories.json`
* Produces: `people_mentions.tsv`

Prepare the split training data with:
```bash
split -n l/10 --verbose ../data/fake_data_train.tsv ../data/fake_data_train.tsv_
```

# Install
```bash
pip install -r requirements.txt
```

# Run
## Train
```bash
rm -rf ./data/vocabulary ; allennlp make-vocab -s ./data/ allen_conf_vocab.json --include-package category_prediction
allennlp train -f -s data/stats allen_conf.json --include-package category_prediction
```

To train on a GPU, override the trainer's device:

```bash
allennlp train -f -s data/stats allen_conf.json --include-package category_prediction -o '{"trainer": {"cuda_device": 0}}'
```

### Continue training with different params
```bash
rm -rf data/stats2/ # Clear new serialization dir
allennlp fine-tune -s data/stats2/ -c allen_conf.json -m ./data/stats/model.tar.gz --include-package category_prediction -o '{"trainer": {"cuda_device": 0}, "iterator": {"base_iterator": {"batch_size": 64}}}'
```

## Validate
```bash
allennlp evaluate ./data/stats/model.tar.gz ./data/fake_data_test.tsv --include-package category_prediction
```

## Server
### Debug
```bash
MODEL=./data/trained_models/6th_augmented/model.tar.gz python run_server.py
```

### Prod
```bash
gunicorn -c gunicorn_config.py wsgi:application
```

### Docker
Build
```bash
cd docker
docker build --tag mention .
```

Run, passing the pyenv environment into the container:
```bash
docker run --restart unless-stopped -v $HOME:$HOME -p 8000:8000 \
-v $HOME/.pyenv:/root/.pyenv \
-e ENV_PATH=$HOME/virtualenv/path \
-e APP_PATH=$HOME/project/root/path mention
```

# GCE related notes
Fix constant 100% GPU utilization by enabling persistence mode:
```bash
sudo nvidia-smi -pm 1
```