{"id":19137442,"url":"https://github.com/shangjingbo1226/autoner","last_synced_at":"2026-01-27T11:02:30.566Z","repository":{"id":54317408,"uuid":"139912414","full_name":"shangjingbo1226/AutoNER","owner":"shangjingbo1226","description":"Learning Named Entity Tagger from Domain-Specific Dictionary","archived":false,"fork":false,"pushed_at":"2019-10-05T08:29:47.000Z","size":3721,"stargazers_count":482,"open_issues_count":11,"forks_count":91,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-02-22T18:29:15.067Z","etag":null,"topics":["data-driven","dictionary","distant-supervision","domain-specific","named-entity-recognition","ner","sequence-labeling"],"latest_commit_sha":null,"homepage":"https://shangjingbo1226.github.io/AutoNER/","language":"ChucK","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shangjingbo1226.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-06T00:03:11.000Z","updated_at":"2025-02-09T15:38:42.000Z","dependencies_parsed_at":"2022-08-13T11:50:45.806Z","dependency_job_id":null,"html_url":"https://github.com/shangjingbo1226/AutoNER","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/shangjingbo1226/AutoNER","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shangjingbo1226%2FAutoNER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shangjingbo1226%2FAutoNER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shangjingbo1226%2FAutoNER/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shangjingbo1226%2F
AutoNER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shangjingbo1226","download_url":"https://codeload.github.com/shangjingbo1226/AutoNER/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shangjingbo1226%2FAutoNER/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28812367,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T07:41:26.337Z","status":"ssl_error","status_checked_at":"2026-01-27T07:41:08.776Z","response_time":168,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-driven","dictionary","distant-supervision","domain-specific","named-entity-recognition","ner","sequence-labeling"],"created_at":"2024-11-09T06:38:26.749Z","updated_at":"2026-01-27T11:02:30.543Z","avatar_url":"https://github.com/shangjingbo1226.png","language":"ChucK","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AutoNER\n\n**Check Our New NER Toolkit🚀🚀🚀**\n- **Inference**:\n  - **[LightNER](https://github.com/LiyuanLucasLiu/LightNER)**: inference w. models pre-trained / trained w. *any* of the following tools, *efficiently*. \n- **Training**:\n  - **[LD-Net](https://github.com/LiyuanLucasLiu/LD-Net)**: train NER models w. 
efficient contextualized representations.\n  - **[VanillaNER](https://github.com/LiyuanLucasLiu/Vanilla_NER)**: train vanilla NER models w. pre-trained embedding.\n- **Distant Training**:\n  - **[AutoNER](https://shangjingbo1226.github.io/AutoNER/)**: train NER models w.o. line-by-line annotations and get competitive performance.\n\n--------------------------------\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Documentation Status](https://readthedocs.org/projects/autoner/badge/?version=latest)](http://autoner.readthedocs.io/en/latest/?badge=latest)\n\nWith **no line-by-line annotations**, AutoNER trains named entity taggers through distant supervision.\n\nDetails about AutoNER can be found in our paper: [https://arxiv.org/abs/1809.03599](https://arxiv.org/abs/1809.03599)\n\n- [Model notes](#model-notes)\n- [Benchmarks](#benchmarks)\n- [Training](#training)\n\t- [Required Inputs](#required-inputs)\n\t- [Dependencies](#dependencies)\n\t- [Command](#command)\n- [Citation](#citation)\n\n## Model Notes\n\n![AutoNER-Framework](docs/AutoNER-Framework.png)\n\n## Benchmarks\n\n| Method | Precision | Recall | F1 |\n| ------------- |-------------| -----| -----|\n| Supervised Benchmark | 88.84 | 85.16 | **86.96** |\n| Dictionary Match | 93.93 | 58.35 | 71.98 |\n| Fuzzy-LSTM-CRF | 88.27 | 76.75 | 82.11 |\n| AutoNER | 88.96 | 81.00 | **84.80** |\n\n## Training\n\n### Required Inputs\n\n- **Tokenized Raw Texts**\n  - Example: ```data/BC5CDR/raw_text.txt```\n    - One token per line.\n    - An empty line marks the end of a sentence.\n- **Two Dictionaries**\n  - **Core Dictionary w/ Type Info**\n    - Example: ```data/BC5CDR/dict_core.txt```\n      - Two columns (i.e., Type, Tokenized Surface) per line.\n      - Tab-separated.\n    - How to obtain?\n      - From domain-specific dictionaries.\n  - **Full Dictionary w/o Type Info**\n    - Example: ```data/BC5CDR/dict_full.txt```\n      - One tokenized high-quality 
phrase per line.\n    - How to obtain? \n      - From domain-specific dictionaries.\n      - By applying a high-quality phrase mining tool to a domain-specific corpus.\n        - [AutoPhrase](https://github.com/shangjingbo1226/AutoPhrase) \n- **Pre-trained word embeddings**\n  - Train your own or download from the web.\n  - The example run uses ```embedding/bio_embedding.txt```, which can be downloaded from [our group's server](http://dmserv4.cs.illinois.edu/bio_embedding.txt). For example, ```curl http://dmserv4.cs.illinois.edu/bio_embedding.txt -o embedding/bio_embedding.txt```. Since encoding the embeddings consumes a lot of memory, we also provide the pre-encoded file via ```autoner_train.sh```.\n- **[Optional]** Development \u0026 Test Sets.\n  - Example: ```data/BC5CDR/truth_dev.ck``` and ```data/BC5CDR/truth_test.ck```\n    - Three columns (i.e., token, ```Tie or Break``` label, entity type).\n    - ```I``` is ```Break```.\n    - ```O``` is ```Tie```.\n    - Two special tokens, ```\u003cs\u003e``` and ```\u003ceof\u003e```, mark the start and end of a sentence.\n\n### Dependencies\n\nThis project requires ```python\u003e=3.6```. Its dependencies are listed below:\n```\nnumpy==1.13.1\ntqdm\ntorch-scope\u003e=0.5.0\npytorch==0.4.1\n```\n\n### Command\n\nTo train an AutoNER model, please run\n```\n./autoner_train.sh\n```\n\nTo apply the trained AutoNER model, please run\n```\n./autoner_test.sh\n```\n\nYou can specify the parameters in these bash scripts; the variable names are self-explanatory.\n\n\n## Citation\n\nPlease cite the following two papers if you use our tool. Thanks!\n\n- Jingbo Shang*, Liyuan Liu*, Xiaotao Gu, Xiang Ren, Teng Ren and Jiawei Han, \"**[Learning Named Entity Tagger using Domain-Specific Dictionary](https://arxiv.org/abs/1809.03599)**\", in Proc. of 2018 Conf. on Empirical Methods in Natural Language Processing (EMNLP'18), Brussels, Belgium, Oct. 2018. 
(* Equal Contribution)\n- Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei Han, \"**[Automated Phrase Mining from Massive Text Corpora](https://arxiv.org/abs/1702.04457)**\", accepted by IEEE Transactions on Knowledge and Data Engineering, Feb. 2018.\n\n```\n@inproceedings{shang2018learning,\n  title = {Learning Named Entity Tagger using Domain-Specific Dictionary}, \n  author = {Shang, Jingbo and Liu, Liyuan and Ren, Xiang and Gu, Xiaotao and Ren, Teng and Han, Jiawei}, \n  booktitle = {EMNLP}, \n  year = 2018, \n}\n\n@article{shang2018automated,\n  title = {Automated phrase mining from massive text corpora},\n  author = {Shang, Jingbo and Liu, Jialu and Jiang, Meng and Ren, Xiang and Voss, Clare R and Han, Jiawei},\n  journal = {IEEE Transactions on Knowledge and Data Engineering},\n  year = {2018},\n  publisher = {IEEE}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshangjingbo1226%2Fautoner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshangjingbo1226%2Fautoner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshangjingbo1226%2Fautoner/lists"}