{"id":13535175,"url":"https://github.com/kyzhouhzau/BERT-NER","last_synced_at":"2025-04-02T00:32:46.905Z","repository":{"id":33293754,"uuid":"156503444","full_name":"kyzhouhzau/BERT-NER","owner":"kyzhouhzau","description":"Use Google's BERT for named entity recognition （CoNLL-2003 as the dataset）.","archived":false,"fork":false,"pushed_at":"2022-05-19T05:06:32.000Z","size":2291,"stargazers_count":1256,"open_issues_count":78,"forks_count":332,"subscribers_count":36,"default_branch":"master","last_synced_at":"2025-03-25T16:19:00.172Z","etag":null,"topics":["bert","conll-2003","google-bert","ner","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kyzhouhzau.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-11-07T06:55:44.000Z","updated_at":"2025-03-21T02:01:32.000Z","dependencies_parsed_at":"2022-09-02T09:22:58.629Z","dependency_job_id":null,"html_url":"https://github.com/kyzhouhzau/BERT-NER","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyzhouhzau%2FBERT-NER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyzhouhzau%2FBERT-NER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyzhouhzau%2FBERT-NER/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyzhouhzau%2FBERT-NER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kyzhouhzau","download_url":"https://codeload.github.com/kyzhouhzau/BERT-NER/tar.gz/refs/heads/m
aster","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246735354,"owners_count":20825221,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","conll-2003","google-bert","ner","tensorflow"],"created_at":"2024-08-01T08:00:50.775Z","updated_at":"2025-04-02T00:32:45.666Z","avatar_url":"https://github.com/kyzhouhzau.png","language":"Python","funding_links":[],"categories":["BERT  NER  task:","Tasks","Python"],"sub_categories":["Named-Entity Recognition (NER)"],"readme":"## For better performance, you can try NLPGNN, see [NLPGNN](https://github.com/kyzhouhzau/NLPGNN) for more details.\n\n# BERT-NER Version 2\n\n\nUse Google's BERT for named entity recognition （CoNLL-2003 as the dataset）. \n\nThe original version （see old_version for more detail） contains some hard-coded values and lacks explanatory comments, which makes it hard to understand. 
So this updated version adds some new ideas and tricks （in data preprocessing and layer design） that can help you quickly implement the fine-tuning model (you only need to modify crf_layer or softmax_layer).\n\n### Folder Description:\n```\nBERT-NER\n|____ bert                          # clone from [here](https://github.com/google-research/bert)\n|____ cased_L-12_H-768_A-12\t    # download from [here](https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip)\n|____ data\t\t            # training data\n|____ middle_data\t            # intermediate data (label id map)\n|____ output\t\t\t    # output (final model, prediction results)\n|____ BERT_NER.py\t\t    # main code\n|____ conlleval.pl\t\t    # evaluation script\n|____ run_ner.sh    \t\t    # runs the model and evaluates the results\n\n```\n\n\n### Usage:\n```\nbash run_ner.sh\n```\n\n### What's in run_ner.sh:\n```\npython BERT_NER.py\\\n    --task_name=\"NER\"  \\\n    --do_lower_case=False \\\n    --crf=False \\\n    --do_train=True   \\\n    --do_eval=True   \\\n    --do_predict=True \\\n    --data_dir=data   \\\n    --vocab_file=cased_L-12_H-768_A-12/vocab.txt  \\\n    --bert_config_file=cased_L-12_H-768_A-12/bert_config.json \\\n    --init_checkpoint=cased_L-12_H-768_A-12/bert_model.ckpt   \\\n    --max_seq_length=128   \\\n    --train_batch_size=32   \\\n    --learning_rate=2e-5   \\\n    --num_train_epochs=3.0   \\\n    --output_dir=./output/result_dir\n\nperl conlleval.pl -d '\\t' \u003c ./output/result_dir/label_test.txt\n```\n\n**Note:** the cased model is recommended, according to [this](https://arxiv.org/abs/1810.04805) paper. 
The CoNLL-2003 dataset and Perl evaluation script come from [here](https://www.clips.uantwerpen.be/conll2003/ner/)\n\n\n### Results (on the test set):\n#### Parameter settings:\n* do_lower_case=False \n* num_train_epochs=4.0\n* crf=False\n  \n```\naccuracy:  98.15%; precision:  90.61%; recall:  88.85%; FB1:  89.72\n              LOC: precision:  91.93%; recall:  91.79%; FB1:  91.86  1387\n             MISC: precision:  83.83%; recall:  78.43%; FB1:  81.04  668\n              ORG: precision:  87.83%; recall:  85.18%; FB1:  86.48  1191\n              PER: precision:  95.19%; recall:  94.83%; FB1:  95.01  1311\n```\n### Result description:\nHere I just use the default parameters, but as Google's paper says, a 0.2% error is reasonable (reported 92.4%).\nMaybe some tricks need to be added to the above model. \n\n\n\n### References:\n\n[1] https://arxiv.org/abs/1810.04805\n\n[2] https://github.com/google-research/bert\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyzhouhzau%2FBERT-NER","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkyzhouhzau%2FBERT-NER","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyzhouhzau%2FBERT-NER/lists"}