{"id":16105595,"url":"https://github.com/charlesyuan02/named_entity_recognition","last_synced_at":"2025-09-01T12:09:48.124Z","repository":{"id":124966221,"uuid":"432359154","full_name":"CharlesYuan02/named_entity_recognition","owner":"CharlesYuan02","description":"Utilizing Spacy and Tensorflow to train custom Named Entity Recognizers.","archived":false,"fork":false,"pushed_at":"2021-12-11T21:11:21.000Z","size":20902,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-06T03:12:56.342Z","etag":null,"topics":["conll-2003","named-entity-recognition","ner","nlp","spacy","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CharlesYuan02.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-27T03:33:15.000Z","updated_at":"2022-11-02T19:56:27.000Z","dependencies_parsed_at":"2023-08-09T18:00:17.082Z","dependency_job_id":null,"html_url":"https://github.com/CharlesYuan02/named_entity_recognition","commit_stats":null,"previous_names":["charlesyuan02/named_entity_recognition"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CharlesYuan02/named_entity_recognition","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CharlesYuan02%2Fnamed_entity_recognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CharlesYuan02%2Fnamed_entity_recognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/CharlesYuan02%2Fnamed_entity_recognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CharlesYuan02%2Fnamed_entity_recognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CharlesYuan02","download_url":"https://codeload.github.com/CharlesYuan02/named_entity_recognition/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CharlesYuan02%2Fnamed_entity_recognition/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273122125,"owners_count":25049539,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conll-2003","named-entity-recognition","ner","nlp","spacy","transformer"],"created_at":"2024-10-09T19:10:08.934Z","updated_at":"2025-09-01T12:09:48.088Z","avatar_url":"https://github.com/CharlesYuan02.png","language":"Python","readme":"# Named Entity Recognizer\n#### (named_entity_recognition.py)\nEarlier today, I was browsing through the list of Thesis Projects from past fourth-year EngScis, and one topic in particular stood out to me. Perhaps it was because I didn't know anything about it, but the words \"Named Entity Recognition\" remained stuck in my head for the rest of the day. As such, I decided to search up what it was and how to do it. 
Turns out, it's not actually too difficult. So I used Spacy to train a custom Named Entity Recognizer to classify words from the light novel series Eighty Six.\n\n#### Update 1 (spacy_conll2003.py):\nI wondered what it would take to train an NER model from scratch, so I did just that using the CoNLL-2003 benchmark dataset. Unlike the Eighty Six model, which was fine-tuned from a pretrained one, this model has no pretrained starting point.\n\n#### Update 2 (transformer_conll2003.py):\nOf course, I couldn't conclude this project without testing out my own deep learning model. So I trained a transformer to compare with Spacy's model, and it actually did perform better! It detected all the entities in the text and assigned each one the correct label.\n\n## Prerequisites\n```\ndatasets==1.16.1\npython==3.8.0\nspacy==3.2.0\n```\n## Dataset\n\nThe dataset I used was \u003ca href=\"https://huggingface.co/datasets/conll2003\"\u003eCoNLL-2003\u003c/a\u003e, a named entity recognition dataset released as part of the CoNLL-2003 shared task: language-independent named entity recognition. 
The data was taken from the Reuters Corpus, which consists of Reuters news stories between August 1996 and August 1997.\n\n## Results, with Pretrained Model and Custom Examples\n\n### Before Training\n\u003cimg src=\"https://github.com/Chubbyman2/named_entity_recognition/blob/main/results/86_untrained_result.PNG\"\u003e\n\n### After Training\n\u003cimg src=\"https://github.com/Chubbyman2/named_entity_recognition/blob/main/results/86_trained_result.PNG\"\u003e\n\n## Results, Training From Scratch with CoNLL-2003 Dataset\n\n#### Precision: 46.2% \n#### Recall: 51.7% \n#### F1 Score: 48.8% \n\n### Before Training\n\u003cimg src=\"https://github.com/Chubbyman2/named_entity_recognition/blob/main/results/spacy_untrained_result.PNG\"\u003e\n\n### After Training\n\u003cimg src=\"https://github.com/Chubbyman2/named_entity_recognition/blob/main/results/spacy_trained_result.PNG\"\u003e\n\n## Results, Transformer Model with CoNLL-2003 Dataset\n\n#### Precision: 69.02% \n#### Recall: 65.58%\n#### F1 Score: 67.26%\n\n\u003cimg src=\"https://github.com/Chubbyman2/named_entity_recognition/blob/main/results/transformer_trained_result.PNG\"\u003e\n\n\u003cimg src=\"https://github.com/Chubbyman2/named_entity_recognition/blob/main/results/ground_truth.PNG\"\u003e\n\n## Notes\n* There is a weird bug with Spacy, where you can train, save, and load your model just fine.\n* But then when you try to load your saved model without training first, it doesn't work.\n* I'm not too sure why...\n\n## Acknowledgements\n* Varun Singh, for writing this \u003ca href=\"https://keras.io/examples/nlp/ner_transformers/\"\u003eTransformer tutorial\u003c/a\u003e\n* Shrivarsheni, for writing this \u003ca href=\"https://www.machinelearningplus.com/nlp/training-custom-ner-model-in-spacy/\"\u003earticle\u003c/a\u003e on the basics of training Spacy NER models\n* Asato Asato, for writing the amazing series that is Eighty 
Six\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcharlesyuan02%2Fnamed_entity_recognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcharlesyuan02%2Fnamed_entity_recognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcharlesyuan02%2Fnamed_entity_recognition/lists"}