{"id":18246545,"url":"https://github.com/fastnlp/tener","last_synced_at":"2025-04-07T16:18:54.378Z","repository":{"id":109151356,"uuid":"227044384","full_name":"fastnlp/TENER","owner":"fastnlp","description":"Codes for \"TENER: Adapting Transformer Encoder for Named Entity Recognition\"","archived":false,"fork":false,"pushed_at":"2020-07-06T13:31:23.000Z","size":31,"stargazers_count":374,"open_issues_count":13,"forks_count":55,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-31T14:13:03.652Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fastnlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-12-10T06:23:50.000Z","updated_at":"2025-03-12T20:18:12.000Z","dependencies_parsed_at":"2023-06-15T05:30:32.620Z","dependency_job_id":null,"html_url":"https://github.com/fastnlp/TENER","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastnlp%2FTENER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastnlp%2FTENER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastnlp%2FTENER/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastnlp%2FTENER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fastnlp","download_url":"https://codeload.github.com/fastnlp/TENER/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":2
47685628,"owners_count":20979085,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T09:26:35.553Z","updated_at":"2025-04-07T16:18:54.358Z","avatar_url":"https://github.com/fastnlp.png","language":"Python","readme":"## TENER: Adapting Transformer Encoder for Named Entity Recognition\n\n\nThis is the code for the paper [TENER](https://arxiv.org/abs/1911.04474). \n\nTENER (Transformer Encoder for Named Entity Recognition) is a Transformer-based model\n for the NER task. Comparing it with the naive Transformer, we \n found that relative position embedding is quite important for NER. Experiments\n on English and Chinese NER datasets demonstrate its effectiveness.\n\n#### Requirements\nThis project requires the natural language processing Python package \n[fastNLP](https://github.com/fastnlp/fastNLP). You can install it with\nthe following command:\n\n```bash\npip install fastNLP\n```\n\n#### Run the code\n\n(1) Prepare the English dataset.\n\n##### Conll2003\n\nYour file should look like the following (the first token in a line\n is the word, the last token is the NER tag):\n\n```\nLONDON NNP B-NP B-LOC\n1996-08-30 CD I-NP O\n\nWest NNP B-NP B-MISC\nIndian NNP I-NP I-MISC\nall-rounder NN I-NP O\nPhil NNP I-NP B-PER\n\n```\n\n##### OntoNotes\n\nWe suggest using the following code to prepare your data: \n[OntoNotes-5.0-NER](https://github.com/yhcc/OntoNotes-5.0-NER). \nAlternatively, you can prepare the data in the Conll2003 style and then replace \nOntoNotesNERPipe with Conll2003NERPipe in the code.\n\nFor English datasets, we use the GloVe 100d pretrained embedding. 
FastNLP will\n download it automatically.\n \nYou can run the code with the following command (make sure you have changed the \ndata path):\n\n```\npython train_tener_en.py --dataset conll2003\n```\nor \n```\npython train_tener_en.py --dataset en-ontonotes\n```\n\nAlthough we tried hard to make sure you can reproduce our results, \nthe results may still disappoint you. This usually happens because \nthe best dev performance does not correlate well with the test performance.\nRunning the code several times should help. \n\nTo run the ELMo version (FastNLP will download the ELMo weights automatically; you just need\nto change the data path in train_elmo_en.py):\n\n```\npython train_elmo_en.py --dataset en-ontonotes\n```\n\n   \n   \n##### MSRA, OntoNotes4.0, Weibo, Resume\nYour data should have only two columns: the first is the character and\n the second is the tag, as in the following:\n```\n口 O\n腔 O\n溃 O\n疡 O\n加 O\n上 O\n```\n\nFor the Chinese datasets, you can download the pretrained unigram and \nbigram embeddings from [Baidu Cloud](https://pan.baidu.com/s/1pLO6T9D#list/path=%2Fsharelink808087924-1080546002081577%2FNeuralSegmentation\u0026parentPath=%2Fsharelink808087924-1080546002081577).\n Download 'gigaword_chn.all.a2b.uni.iter50.vec' and 'gigaword_chn.all.a2b.bi.iter50.vec',\n then replace the embedding path in train_tener_cn.py.\n \nYou can run the code with the following command:\n\n```\npython train_tener_cn.py --dataset ontonotes\n```\n\n\n\n\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffastnlp%2Ftener","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffastnlp%2Ftener","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffastnlp%2Ftener/lists"}