{"id":17153416,"url":"https://github.com/generall/entitycategoryprediction","last_synced_at":"2025-04-13T12:44:15.414Z","repository":{"id":49205704,"uuid":"180876250","full_name":"generall/EntityCategoryPrediction","owner":"generall","description":"Model for predicting categories of entities by its mentions","archived":false,"fork":false,"pushed_at":"2021-06-23T17:01:05.000Z","size":2030,"stargazers_count":29,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-06T13:47:38.808Z","etag":null,"topics":["allennlp","classification","mentions","nlp"],"latest_commit_sha":null,"homepage":"https://mention.vasnetsov.com/#/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/generall.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-11T20:53:23.000Z","updated_at":"2024-01-05T11:58:27.000Z","dependencies_parsed_at":"2022-09-14T18:50:31.755Z","dependency_job_id":null,"html_url":"https://github.com/generall/EntityCategoryPrediction","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/generall%2FEntityCategoryPrediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/generall%2FEntityCategoryPrediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/generall%2FEntityCategoryPrediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/generall%2FEntityCategoryPrediction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/generall","download_url":"https://codeload.github.com/generall/EntityCategoryPrediction/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248717240,"owners_count":21150387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["allennlp","classification","mentions","nlp"],"created_at":"2024-10-14T21:46:10.062Z","updated_at":"2025-04-13T12:44:15.391Z","avatar_url":"https://github.com/generall.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Category prediction model\n\nThis repo contains an AllenNLP model that predicts the categories of named entities from their mentions.\n\n# Data\n\n## Fake data\n\nYou can generate some fake data using this [Notebook](notebooks/gen_face_data.ipynb).\n\n\n## Real data (Work in progress)\n\nFiltered [OneShotWikilinks](https://www.kaggle.com/generall/oneshotwikilinks) dataset with manually selected categories.\n\n### Data preparation steps\n\n\n* Create the category graph [build_category_graph.ipynb](./notebooks/build_category_graph.ipynb)\n    * Produces: `category_graph.pkl`\n* Obtain the list of Person articles from the ontology [obtain_people_articles.ipynb](/notebooks/obtain_people_articles.ipynb):\n    * Requires: `dbpedia_2016-10.owl`\n    * Produces: `people_categories.json`\n* Build the mapping from articles to people categories [generate_full_people_categories.ipynb](./notebooks/generate_full_people_categories.ipynb). 
Requires\n    * `people_categories.json`\n    * `category_graph.pkl`\n    * `projects/categories_prediction/manual_categories.gsheet`\n* Filter mentions for people [filter_mentions.ipynb](./notebooks/filter_mentions.ipynb).\n    * Requires: `people_all_categories.json`\n    * Produces: `people_mentions.tsv`\n\n\nSplit the data into chunks with:\n\n```bash\nsplit -n l/10 --verbose ../data/fake_data_train.tsv ../data/fake_data_train.tsv_\n```\n\n# Install\n\n```bash\npip install -r requirements.txt\n```\n\n# Run\n\n\n## Train\n\n```bash\n\nrm -rf ./data/vocabulary ; allennlp make-vocab -s ./data/ allen_conf_vocab.json --include-package category_prediction\n\nallennlp train -f -s data/stats allen_conf.json --include-package category_prediction\n```\n\n```bash\nallennlp train -f -s data/stats allen_conf.json --include-package category_prediction -o '{\"trainer\": {\"cuda_device\": 0}}'\n```\n\n### Continue training with different params\n\n```bash\nrm -rf data/stats2/  # Clear new serialization dir\nallennlp fine-tune -s data/stats2/ -c allen_conf.json -m ./data/stats/model.tar.gz --include-package category_prediction -o '{\"trainer\": {\"cuda_device\": 0}, \"iterator\": {\"base_iterator\": {\"batch_size\": 64}}}'\n```\n\n## Validate\n\n```bash\nallennlp evaluate ./data/stats/model.tar.gz ./data/fake_data_test.tsv --include-package category_prediction\n```\n\n## Server\n\n### Debug\n\n```bash\nMODEL=./data/trained_models/6th_augmented/model.tar.gz python run_server.py\n```\n\n### Prod\n\n```bash\ngunicorn -c gunicorn_config.py wsgi:application\n```\n\n### Docker\n\n\nBuild\n```bash\ncd docker\ndocker build --tag mention .\n```\n\nRun, passing pyenv into the container\n\n```bash\ndocker run --rm --restart unless-stopped -v $HOME:$HOME -p 8000:8000 \\\n        -v $HOME/.pyenv:/root/.pyenv \\\n        -e ENV_PATH=$HOME/virtualenv/path \\\n        -e APP_PATH=$HOME/project/root/path mention\n```\n\n# GCE related notes\n\n\nFix 100% GPU utilization\n```bash\nsudo nvidia-smi 
-pm 1\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenerall%2Fentitycategoryprediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgenerall%2Fentitycategoryprediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenerall%2Fentitycategoryprediction/lists"}