https://github.com/taishan1994/awesome-chinese-ner
Chinese named entity recognition (NER): recent papers on Chinese NER, related tools and datasets, plus Chinese pretrained models, word vectors, and NER surveys.
- Host: GitHub
- URL: https://github.com/taishan1994/awesome-chinese-ner
- Owner: taishan1994
- Created: 2022-03-19T09:14:16.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-03T08:56:57.000Z (7 months ago)
- Last Synced: 2024-04-11T17:27:56.818Z (7 months ago)
- Topics: named-entity-recognition, ner
- Size: 192 KB
- Stars: 465
- Watchers: 3
- Forks: 40
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-chinese-ner (Other Lists / PowerShell Lists)
README
# awesome-chinese-ner
Chinese named entity recognition
#### Information extraction with large language models
- Survey of LLM-based information extraction
Large Language Models for Generative Information Extraction: A Survey
https://arxiv.org/abs/2312.17617
https://github.com/quqxui/Awesome-LLM4IE-Papers
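
Generative IE with LLMs, the topic of the survey above, returns entity strings rather than character offsets, so the output usually has to be mapped back onto the input text. The sketch below assumes, hypothetically, that the model was prompted to answer with a JSON list of `{"text", "type"}` objects; the output format and the name `ground_entities` are illustrative and are not taken from the survey.

```python
import json
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def ground_entities(text: str, llm_output: str) -> List[Span]:
    """Map entity strings generated by an LLM back to character offsets in `text`.

    Assumed (hypothetical) output format: [{"text": "北京大学", "type": "ORG"}, ...].
    Mentions that do not occur verbatim in `text` (hallucinated or paraphrased)
    are dropped.
    """
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []  # unparseable generation: report no entities
    if not isinstance(items, list):
        return []
    spans = set()
    for item in items:
        if not isinstance(item, dict):
            continue
        mention, etype = item.get("text"), item.get("type")
        if not isinstance(mention, str) or not mention or not isinstance(etype, str):
            continue
        # Every non-overlapping occurrence gets the label; a real system may need
        # extra information (e.g. generated offsets) to pick the intended one.
        for m in re.finditer(re.escape(mention), text):
            spans.add((m.start(), m.end(), etype))
    return sorted(spans)


text = "张三在北京大学读书。"
output = '[{"text": "张三", "type": "PER"}, {"text": "北京大学", "type": "ORG"}, {"text": "上海", "type": "LOC"}]'
print(ground_entities(text, output))  # [(0, 2, 'PER'), (3, 7, 'ORG')]
```

Dropping mentions that are not found verbatim is a deliberately strict choice; fuzzy matching would recover paraphrased mentions at the cost of more false positives.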
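#### Further reading
- Survey of Chinese pretrained models
https://www.jsjkx.com/CN/10.11896/jsjkx.211200018
- Download links for Chinese pretrained models
https://github.com/lonePatient/awesome-pretrained-chinese-nlp-models
- Download links for Chinese word vectors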
https://github.com/Embedding/Chinese-Word-Vectors
- How to tune BiLSTM-CRF hyperparameters
https://arxiv.org/pdf/1707.06799.pdf
- Information extraction (entities, relations, events) with ChatGPT
Zero-Shot Information Extraction via Chatting with ChatGPT
Demo: http://124.221.16.143:5000/
https://arxiv.org/pdf/2302.10205.pdf
https://github.com/cocacola-lab/ChatIE
- GPT for Information Extraction
https://github.com/cocacola-lab/GPT4IE
- Evaluation-of-ChatGPT-on-Information-Extraction
https://github.com/RidongHan/Evaluation-of-ChatGPT-on-Information-Extraction
- Listed here under further reading:
Unified Text Structuralization with Instruction-tuned Language Models
2023
https://arxiv.org/pdf/2303.14956v2.pdf
- GPT-NER: Named Entity Recognition via Large Language Models
2023
https://arxiv.org/pdf/2304.10428v1.pdf
https://github.com/ShuheWang1998/GPT-NER
- EasyInstruct: An Easy-to-use Framework to Instruct Large Language Models
https://github.com/zjunlp/EasyInstruct
- CODEIE: Large Code Generation Models are Better Few-Shot Information Extractors
Performs entity and relation extraction in the form of code
2023
https://arxiv.org/pdf/2305.05711v1.pdf
https://github.com/dasepli/CodeIE
- PromptNER : Prompting For Named Entity Recognition
2023
https://arxiv.org/pdf/2305.15444v2.pdf
#### Named entity recognition surveys (in Chinese)
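- Survey of recent advances in deep-learning-based Chinese named entity recognition (基于深度学习的中文命名实体识别最新研究进展综述)
2022, Journal of Chinese Information Processing (中文信息学报)
http://61.175.198.136:8083/rwt/125/http/GEZC6MJZFZZUPLSSGM3B/Qikan/Article/Detail?id=7107633068
- Survey of named entity recognition methods (命名实体识别方法研究综述)
2022, Journal of Frontiers of Computer Science and Technology (计算机科学与探索)
http://fcst.ceaj.org/CN/10.3778/j.issn.1673-9418.2112109
- Survey of Chinese named entity recognition (中文命名实体识别综述)
2021, Journal of Frontiers of Computer Science and Technology (计算机科学与探索)
http://fcst.ceaj.org/CN/abstract/abstract2902.shtml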
- Chinese named entity recognition: The state of the art
Neurocomputing 2022
[link](https://reader.elsevier.com/reader/sd/pii/S0925231221016581?token=592CD98CF076A91AFE5EDB2396D806784B30D3217FD7B61FE2FE9CB905451ABB5B28C0285AAFA973010ACE14AD387A5C&originRegion=us-east-1&originCreation=20221119143715)
# Models
- Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training
COLING 2024
https://arxiv.org/pdf/2404.05560
- Unified Lattice Graph Fusion for Chinese Named Entity Recognition
2024
https://arxiv.org/pdf/2312.16917.pdf
- MRC-based Nested Medical NER with Co-prediction and Adaptive Pre-training
2024, medical NER
https://arxiv.org/pdf/2403.15800.pdf
- CHisIEC: An Information Extraction Corpus for Ancient Chinese History
2024, classical Chinese NER
https://arxiv.org/pdf/2403.15088.pdf
https://github.com/tangxuemei1995/CHisIEC
- Attack Named Entity Recognition by Entity Boundary Interference
2023
https://arxiv.org/pdf/2305.05253v1.pdf
- Token Relation Aware Chinese Named Entity Recognition
ACM Transactions on Asian and Low-Resource Language Information Processing 2023
https://dl.acm.org/doi/10.1145/3531534
- WYWEB: A NLP Evaluation Benchmark For Classical Chinese
ACL 2023
https://arxiv.org/pdf/2305.14150
https://github.com/baudzhou/WYWEB
- PUnifiedNER: a Prompting-based Unified NER System for Diverse Datasets
AAAI 2023
https://arxiv.org/pdf/2211.14838.pdf
https://github.com/GeorgeLuImmortal/PUnifiedNER
- END-TO-END ENTITY DETECTION WITH PROPOSER AND REGRESSOR
Borrows ideas from object detection
2022
https://arxiv.org/pdf/2210.10260v2.pdf
https://github.com/Rosenberg37/EntityDetection
- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition
Multilingual NER
2022
https://arxiv.org/pdf/2203.00545.pdf
https://github.com/Alibaba-NLP/KB-NER
- PCBERT: Parent and Child BERT for Chinese Few-shot NER
COLING 2022
https://aclanthology.org/2022.coling-1.192.pdf
- GNN-SL: Sequence Labeling Based on Nearest Examples via GNN
2022
https://arxiv.org/pdf/2212.02017.pdf
https://github.com/ShuheWang1998/GNN-SL
- EiCi: A New Method of Dynamic Embedding Incorporating Contextual Information in Chinese NER
Similar in spirit to [AMBERT](https://arxiv.org/pdf/2008.11869.pdf)
2022
https://openreview.net/pdf?id=0TKg4UlnEEQ
- Deep Span Representations for Named Entity Recognition
2022
https://arxiv.org/pdf/2210.04182v1.pdf
- Mulco: Recognizing Chinese Nested Named Entities Through Multiple Scopes
2022
https://arxiv.org/pdf/2211.10854.pdf
- Unsupervised Boundary-Aware Language Model Pretraining for Chinese Sequence Labeling
EMNLP 2022
https://arxiv.org/pdf/2210.15231.pdf
http://github.com/modelscope/adaseq/examples/babert
- Domain-Specific NER via Retrieving Correlated Samples
COLING 2022
https://arxiv.org/pdf/2208.12995.pdf
- Two Languages Are Better than One: Bilingual Enhancement for Chinese Named Entity Recognition
COLING 2022
https://aclanthology.org/2022.coling-1.176.pdf
- A hybrid Transformer approach for Chinese NER with features augmentation
Expert Syst. Appl 2022
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4087645
- Adaptive Threshold Selective Self-Attention for Chinese NER
COLING 2022
https://aclanthology.org/2022.coling-1.157.pdf
- Improving Chinese Named Entity Recognition by Search Engine Augmentation
2022
https://arxiv.org/pdf/2210.12662.pdf
- Robust Self-Augmentation for Named Entity Recognition with Meta Reweighting
NAACL 2022
https://arxiv.org/pdf/2204.11406.pdf
https://github.com/LindgeW/MetaAug4NER
- Boundary Smoothing for Named Entity Recognition
ACL 2022
https://arxiv.org/pdf/2204.12031v1.pdf
https://github.com/syuoni/eznlp
- NFLAT: Non-Flat-Lattice Transformer for Chinese Named Entity Recognition
2022
https://arxiv.org/pdf/2205.05832.pdf
- Unified Structure Generation for Universal Information Extraction
(Unifies entity recognition, relation extraction, event extraction, and sentiment analysis); Baidu UIE
ACL 2022
https://arxiv.org/pdf/2203.12277.pdf
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie
https://github.com/universal-ie/UIE
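
Boundary Smoothing (listed above) regularizes span-based NER by moving part of each gold span's probability onto nearby spans. The sketch below is a simplified reading of that idea, not the authors' code (theirs is in the linked `eznlp` repository): the gold span keeps `1 - eps`, and the valid spans at each boundary Manhattan distance d = 1..D share `eps / D` equally. The function name and the handling of spans that fall outside the sentence are assumptions.

```python
import math
from typing import Dict, Tuple

Span = Tuple[int, int]  # (start, end) word indices, both inclusive


def boundary_smoothing(gold: Span, n: int, eps: float = 0.1, D: int = 1) -> Dict[Span, float]:
    """Soft targets for the gold entity's type over candidate spans of an n-word sentence.

    Simplified sketch: the gold span keeps 1 - eps; for each Manhattan distance
    d = 1..D between boundaries, the valid spans exactly d away share eps / D equally.
    Assumption: if no valid span exists at some distance, that share goes back to
    the gold span. Spans not in the result keep probability 0 for this type.
    """
    if D < 1:
        return {gold: 1.0}  # no smoothing
    s, e = gold
    target = {gold: 1.0 - eps}
    for d in range(1, D + 1):
        ring = []
        for ds in range(-d, d + 1):
            rest = d - abs(ds)
            for de in {rest, -rest}:  # a set, so de == 0 is visited once
                i, j = s + ds, e + de
                if 0 <= i <= j < n:  # keep well-formed spans inside the sentence
                    ring.append((i, j))
        if ring:
            for span in ring:
                target[span] = eps / D / len(ring)
        else:
            target[gold] += eps / D
    return target


targets = boundary_smoothing((3, 5), n=10, eps=0.2, D=1)
print(targets)
print(math.isclose(sum(targets.values()), 1.0))  # True
```

During training, soft targets like these would replace the one-hot target in the span classifier's cross-entropy loss.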
Like UIE, the following paper is a universal model, but it covers English only, with no experiments on Chinese data:
- DEEPSTRUCT: Pretraining of Language Models for Structure Prediction
2022
https://arxiv.org/pdf/2205.10475v1.pdf
https://github.com/cgraywang/deepstruct
- Parallel Instance Query Network for Named Entity Recognition
2022
https://arxiv.org/pdf/2203.10545v1.pdf
- Delving Deep into Regularity: A Simple but Effective Method for Chinese Named Entity Recognition
NAACL 2022
https://arxiv.org/pdf/2204.05544.pdf
- TURNER: The Uncertainty-based Retrieval Framework for Chinese NER
2022
https://arxiv.org/pdf/2202.09022
- NN-NER: Named Entity Recognition with Nearest Neighbor Search
2022
https://arxiv.org/pdf/2203.17103
https://github.com/ShannonAI/KNN-NER
- Unified Named Entity Recognition as Word-Word Relation Classification
AAAI 2022
https://arxiv.org/abs/2112.10070
https://github.com/ljynlp/W2NER.git
- MarkBERT: Marking Word Boundaries Improves Chinese BERT
2022
https://arxiv.org/pdf/2203.06378
- MFE-NER: Multi-feature Fusion Embedding for Chinese Named Entity Recognition
2021
https://arxiv.org/pdf/2109.07877
- AdaK-NER: An Adaptive Top-K Approach for Named Entity Recognition with Incomplete Annotations
2021
https://arxiv.org/pdf/2109.05233
- ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information
ACL 2021
https://arxiv.org/pdf/2106.16038
https://github.com/ShannonAI/ChineseBert
- Enhanced Language Representation with Label Knowledge for Span Extraction
EMNLP 2021
https://aclanthology.org/2021.emnlp-main.379.pdf
https://github.com/Akeepers/LEAR
- Lex-BERT: Enhancing BERT based NER with lexicons
ICLR 2021
https://arxiv.org/pdf/2101.00396v1.pdf
- Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter
ACL 2021
https://arxiv.org/pdf/2105.07148.pdf
https://github.com/liuwei1206/LEBERT
- MECT: Multi-Metadata Embedding based Cross-Transformer for Chinese Named Entity Recognition
ACL 2021
https://arxiv.org/pdf/2107.05418v1.pdf
https://github.com/CoderMusou/MECT4CNER
- Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
ACL 2021
https://arxiv.org/pdf/2105.06804v2.pdf
https://github.com/tricktreat/locate-and-label
- Dynamic Modeling Cross- and Self-Lattice Attention Network for Chinese NER
AAAI 2021
https://ojs.aaai.org/index.php/AAAI/article/view/17706/17513
https://github.com/zs50910/DCSAN-for-Chinese-NER
- Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information
EMNLP 2020
https://arxiv.org/pdf/2010.15466
https://github.com/cuhksz-nlp/AESINER
- ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations
ACL 2020
https://arxiv.org/pdf/1911.00720v1.pdf
https://github.com/sinovation/ZEN
- A Unified MRC Framework for Named Entity Recognition
ACL 2020
https://arxiv.org/pdf/1910.11476v6.pdf
https://github.com/ShannonAI/mrc-for-flat-nested-ner
- Simplify the Usage of Lexicon in Chinese NER
ACL 2020
https://arxiv.org/pdf/1908.05969.pdf
https://github.com/v-mipeng/LexiconAugmentedNER
- A Boundary Regression Model for Nested Named Entity Recognition
2020
https://arxiv.org/pdf/2011.14330v3.pdf
https://github.com/yuelfei/BR
- Dice Loss for Data-imbalanced NLP Tasks
ACL 2020
https://arxiv.org/pdf/1911.02855v3.pdf
https://github.com/ShannonAI/dice_loss_for_NLP
- Porous Lattice Transformer Encoder for Chinese NER
COLING 2020
https://aclanthology.org/2020.coling-main.340.pdf
- FLAT: Chinese NER Using Flat-Lattice Transformer
ACL 2020
https://arxiv.org/pdf/2004.11795v2.pdf
https://github.com/LeeSureman/Flat-Lattice-Transformer
- FGN: Fusion Glyph Network for Chinese Named Entity Recognition
2020
https://arxiv.org/pdf/2001.05272v6.pdf
https://github.com/AidenHuen/FGN-NER
- SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER
2020
https://arxiv.org/pdf/2007.08416v1.pdf
https://github.com/zerohd4869/SLK-NER
- Entity Enhanced BERT Pre-training for Chinese NER
EMNLP 2020
https://aclanthology.org/2020.emnlp-main.518.pdf
https://github.com/jiachenwestlake/Entity_BERT
- Named Entity Recognition for Social Media Texts with Semantic Augmentation
EMNLP 2020
https://arxiv.org/pdf/2010.15458v1.pdf
https://github.com/cuhksz-nlp/SANER
- CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese
2020
https://arxiv.org/pdf/2001.04351v4.pdf
https://github.com/CLUEbenchmark/CLUENER2020
- ERNIE: Enhanced Representation through Knowledge Integration
2019
https://arxiv.org/pdf/1904.09223v1.pdf
https://github.com/PaddlePaddle/ERNIE
- TENER: Adapting Transformer Encoder for Named Entity Recognition
2019
https://arxiv.org/pdf/1911.04474v3.pdf
https://github.com/fastnlp/TENER
- Chinese NER Using Lattice LSTM
ACL 2018
https://arxiv.org/pdf/1805.02023v4.pdf
https://github.com/jiesutd/LatticeLSTM
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
2019
https://arxiv.org/pdf/1907.12412v2.pdf
https://github.com/PaddlePaddle/ERNIE
- Glyce: Glyph-vectors for Chinese Character Representations
NeurIPS 2019
https://arxiv.org/pdf/1901.10125v5.pdf
https://github.com/ShannonAI/glyce
- CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition
NAACL 2019
https://arxiv.org/pdf/1904.02141v3.pdf
https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NER
- Neural Chinese Named Entity Recognition via CNN-LSTM-CRF and Joint Training with Word Segmentation
2019
https://arxiv.org/pdf/1905.01964v1.pdf
https://github.com/rxy007/cnn-lstm-crf
- Chinese Named Entity Recognition Augmented with Lexicon Memory
2019
https://arxiv.org/pdf/1912.08282v2.pdf
https://github.com/dugu9sword/LEMON
- Exploiting Multiple Embeddings for Chinese Named Entity Recognition
2019
https://arxiv.org/pdf/1908.10657v1.pdf
https://github.com/WHUIR/ME-CNER
- Dependency-Guided LSTM-CRF for Named Entity Recognition
IJCNLP 2019
https://arxiv.org/pdf/1909.10148v1.pdf
https://github.com/allanj/ner_with_dependency
- CNN-Based Chinese NER with Lexicon Rethinking
IJCAI 2019
https://www.ijcai.org/proceedings/2019/0692.pdf
- Leverage Lexical Knowledge for Chinese Named Entity Recognition via Collaborative Graph Network
IJCNLP 2019
https://aclanthology.org/D19-1396.pdf
https://github.com/DianboWork/Graph4CNER
- Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
COLING 2018
https://aclanthology.org/C18-1183.pdf
https://github.com/rainarch/DSNER
- Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism
EMNLP 2018
https://aclanthology.org/D18-1017.pdf
https://github.com/CPF-NLPR/AT4ChineseNER
# Non-Chinese models
Papers with no experiments on Chinese data whose ideas are still worth borrowing:
- DiffusionNER: Boundary Diffusion for Named Entity Recognition
2023
https://arxiv.org/pdf/2305.13298v1.pdf
https://github.com/tricktreat/DiffusionNER
- Learning In-context Learning for Named Entity Recognition
ACL 2023
https://arxiv.org/pdf/2305.11038v1.pdf
https://github.com/chen700564/metaner-icl
- UniEX: An Effective and Efficient Framework for Unified Information Extraction via a Span-extractive Perspective
2023
https://arxiv.org/pdf/2305.10306v1.pdf
- Easy-to-Hard Learning for Information Extraction
2023
https://arxiv.org/pdf/2305.09193v1.pdf
https://github.com/DAMO-NLP-SG/IE-E2H
- UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction
2023
https://openreview.net/pdf?id=cRQwl-59CU8
https://github.com/yhcc/utcie
- Deep Span Representations for Named Entity Recognition
Same authors as Boundary Smoothing for Named Entity Recognition
ACL 2023
https://github.com/syuoni/eznlp
https://arxiv.org/pdf/2210.04182v2.pdf
- NER-to-MRC: Named-Entity Recognition Completely Solving as Machine Reading Comprehension
2023
https://arxiv.org/pdf/2305.03970v1.pdf
- RexUIE: A Recursive Method with Explicit Schema Instructor for Universal Information Extraction
Universal information extraction; compared against USM
2023
https://arxiv.org/pdf/2304.14770.pdf
- InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction
(Another universal information extraction paper; compared against Baidu UIE and USM)
2023
https://arxiv.org/pdf/2304.08085v1.pdf
https://github.com/BeyonderXX/InstructUIE
- Universal Information Extraction as Unified Semantic Matching
Universal information extraction covering entities, relations, and events (no experiments on Chinese data); abbreviated USM
AAAI 2023
https://arxiv.org/pdf/2301.03282.pdf
- MULTI-TASK TRANSFORMER WITH RELATION-ATTENTION AND TYPE-ATTENTION FOR NAMED ENTITY RECOGNITION
2023
https://arxiv.org/pdf/2303.10870v1.pdf
- DEEPSTRUCT: Pretraining of Language Models for Structure Prediction
Universal information extraction
ACL 2022
https://arxiv.org/pdf/2205.10475v2.pdf
https://github.com/cgraywang/deepstruct
- TOE: A Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags
Improves on the W2NER model
IEEE TASLP (Transactions on Audio, Speech, and Language Processing)
https://arxiv.org/pdf/2211.00684.pdf
https://github.com/solkx/TOE
- OPTIMIZING BI-ENCODER FOR NAMED ENTITY RECOGNITION VIA CONTRASTIVE LEARNING
ICLR 2023
https://arxiv.org/pdf/2208.14565v2.pdf
https://github.com/microsoft/binder
- One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER
2023
https://arxiv.org/pdf/2301.10410v2.pdf
https://github.com/zjunlp/DeepKE/tree/main/example/ner/cross
- QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition
2022
https://arxiv.org/pdf/2203.01543.pdf
- A Unified Generative Framework for Various NER Subtasks
(Uses the BART generative model for named entity recognition)
ACL-IJCNLP 2021
https://arxiv.org/pdf/2106.01223.pdf
https://github.com/yhcc/BARTNER
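
TOE (listed above) builds on W2NER (listed under Models), which casts NER as classifying relations between word pairs on a 2D grid: Next-Neighboring-Word (NNW) links consecutive words of an entity, and Tail-Head-Word (THW-*) links an entity's tail word back to its head word together with the entity type. The sketch below only illustrates this label scheme as we understand it; in W2NER the grid labels are predicted by a neural model, and the helpers `encode_grid` and `decode_grid` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Entity = Tuple[Tuple[int, ...], str]  # (sorted word indices, entity type)
NNW = "NNW"  # next-neighboring-word relation


def encode_grid(n: int, entities: List[Entity]) -> List[List[Optional[str]]]:
    """Build an n x n word-pair label grid in the spirit of W2NER.

    Upper triangle (i < j): NNW means word j directly follows word i inside an entity.
    Lower triangle / diagonal (tail >= head): "THW-<type>" links tail back to head.
    """
    grid: List[List[Optional[str]]] = [[None] * n for _ in range(n)]
    for indices, etype in entities:
        for a, b in zip(indices, indices[1:]):
            grid[a][b] = NNW
        grid[indices[-1]][indices[0]] = f"THW-{etype}"
    return grid


def decode_grid(grid: List[List[Optional[str]]]) -> Set[Entity]:
    """Recover entities by following NNW edges from each head to a THW-linked tail."""
    n = len(grid)
    entities: Set[Entity] = set()
    for tail in range(n):
        for head in range(tail + 1):
            label = grid[tail][head]
            if label is None or not label.startswith("THW-"):
                continue
            etype = label[len("THW-"):]
            stack = [(head,)]
            while stack:  # depth-first search over NNW paths from head to tail
                path = stack.pop()
                last = path[-1]
                if last == tail:
                    entities.add((path, etype))
                    continue
                for nxt in range(last + 1, tail + 1):
                    if grid[last][nxt] == NNW:
                        stack.append(path + (nxt,))
    return entities


# "aching in legs and shoulders": two discontinuous entities sharing "aching in"
ents = [((0, 1, 2), "Symptom"), ((0, 1, 4), "Symptom")]
grid = encode_grid(5, ents)
print(sorted(decode_grid(grid)))  # [((0, 1, 2), 'Symptom'), ((0, 1, 4), 'Symptom')]
```

Because entities are paths rather than contiguous spans, one grid can hold flat, nested, and discontinuous entities; on noisy predicted grids, this naive path search can return spurious entities.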
(The following four papers are prompt-based named entity recognition methods)
- Template-Based Named Entity Recognition Using BART
https://arxiv.org/abs/2106.01760
https://github.com/Nealcly/templateNER
- Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER
https://arxiv.org/abs/2110.08454
https://github.com/INK-USC/fewNER
- LightNER: A Lightweight Generative Framework with Prompt-guided Attention for Low-resource NER
https://arxiv.org/abs/2109.00720
https://github.com/zjunlp/DeepKE/blob/main/example/ner/few-shot/README_CN.md
- Template-free Prompt Tuning for Few-shot NER
https://arxiv.org/abs/2109.13532
https://github.com/rtmaww/EntLM/
# Datasets
- [MSRA](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/MSRA)
- [Weibo](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/weibo)
- [resume](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/ResumeNER)
- OntoNotes 4
- OntoNotes 5
- [Dataset provided by a company (BosonNLP) covering person names, place names, organization names, and proper nouns](https://bosonnlp.com/dev/resource)
- [People's Daily Online (人民网), 2004](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/people_daily)
- [Entity-annotated data for film/TV, music, and books](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/video_music_book_datasets)
- [Chinese medical text named entity recognition, CCKS 2020](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/2020_ccks_ner)
- [Yidu Cloud (医渡云) entity recognition dataset](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/yidu-s4k)
- [CLUENER2020](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/cluener_public)
- [Chinese datasets for various tasks](https://github.com/liucongg/NLPDataSet)
- [Medical datasets](http://172.16.1.113:9005/docs)
- [Summary of 30+ NER datasets](https://zhuanlan.zhihu.com/p/603850842)
- [Summary of Chinese NER datasets](https://www.zhihu.com/question/264243637/answer/2936822902)
# Pretrained language models
- [ChineseBert](https://aclanthology.org/2021.acl-long.161/) ACL 2021
- [MacBert](https://arxiv.org/pdf/2004.13922.pdf) 2020
- [SpanBert](https://arxiv.org/pdf/1907.10529.pdf)
- [XLNet](https://arxiv.org/pdf/1906.08237.pdf)
- [Roberta](https://arxiv.org/pdf/1907.11692.pdf)
- [Bert](https://arxiv.org/pdf/1810.04805.pdf)
- [StructBert](https://arxiv.org/abs/1908.04577)
- [WoBert](https://github.com/ZhuiyiTechnology/WoBERT)
- [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB)
- [Ernie1.0](https://arxiv.org/pdf/1904.09223)
- [Ernie2.0](https://arxiv.org/abs/1907.12412)
- [Ernie3.0](https://arxiv.org/abs/2107.02137)
- [ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding](https://arxiv.org/pdf/2010.12148.pdf)
- [NeZha](https://arxiv.org/abs/1909.00204)
- [MengZi](https://arxiv.org/pdf/2110.06696.pdf)
- [ZEN](https://arxiv.org/pdf/1911.00720.pdf)
- [ALBERT](https://arxiv.org/pdf/1909.11942.pdf)
- [roformer](https://arxiv.org/abs/2104.09864)
- [roformer-v2](https://github.com/ZhuiyiTechnology/roformer-v2)
- [Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words](https://arxiv.org/pdf/2202.12142)
- [PERT: Pre-Training BERT with Permuted Language Model](https://arxiv.org/abs/2203.06906)
- [RoChBert: Towards Robust BERT Fine-tuning for Chinese](https://arxiv.org/pdf/2210.15944.pdf) EMNLP 2022
- [MarkBERT: Marking Word Boundaries Improves Chinese BERT](https://arxiv.org/pdf/2203.06378.pdf) 2022
- [MVP-BERT: REDESIGNING VOCABULARIES FOR CHINESE BERT AND MULTI-VOCAB PRETRAINING](https://arxiv.org/pdf/2011.08539.pdf) 2022
- [LERT: A Linguistically-motivated Pre-trained Language Model](https://arxiv.org/pdf/2211.05344v1.pdf) 2022
- [AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization](https://arxiv.org/pdf/2008.11869.pdf) 2022
- [BURT: BERT-inspired Universal Representation from Learning Meaningful Segment](https://arxiv.org/pdf/2012.14320.pdf) 2021
- [Towards Efficient NLP: A Standard Evaluation and A Strong Baseline](https://aclanthology.org/2022.naacl-main.240.pdf)
- [Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence](https://arxiv.org/pdf/2209.02970.pdf)
- [AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models](https://github.com/modelscope/AdaSeq) multiple methods
- [TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning](https://arxiv.org/pdf/2111.04198.pdf) NAACL 2022
- [Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models](https://arxiv.org/ftp/arxiv/papers/2303/2303.10893.pdf) 2023
- [MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model](https://arxiv.org/pdf/2304.00717v1.pdf) 2023
- [sikuGPT](https://github.com/SIKU-BERT/sikuGPT) classical Chinese model, 2023
- [UniIE](https://github.com/AAIG-NLP/UniIE) universal information extraction
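
Several models above change what gets masked during pretraining; ERNIE 1.0, for example, masks whole phrases and entities instead of single characters. A minimal sketch of that grouping idea, assuming the sentence has already been split into words (or phrase/entity units) by some external segmenter; `whole_word_mask` is a hypothetical helper, it omits BERT's 80/10/10 replacement rule, and it is not any listed model's actual masking code.

```python
import random
from typing import List, Optional, Tuple


def whole_word_mask(words: List[str], mask_ratio: float = 0.15,
                    mask_token: str = "[MASK]", seed: Optional[int] = None
                    ) -> Tuple[List[str], List[Optional[str]]]:
    """Mask all characters of selected words together (character-level output).

    Returns (tokens, labels): labels hold the original character at masked
    positions and None elsewhere, as an MLM target would.
    """
    rng = random.Random(seed)
    chars, owner = [], []  # owner[k] = index of the word that character k belongs to
    for wid, word in enumerate(words):
        for ch in word:
            chars.append(ch)
            owner.append(wid)
    budget = max(1, round(len(chars) * mask_ratio))  # target number of masked characters
    order = list(range(len(words)))
    rng.shuffle(order)
    chosen, covered = set(), 0
    for wid in order:  # pick whole words until the character budget is reached
        if covered >= budget:
            break
        chosen.add(wid)
        covered += len(words[wid])
    tokens = [mask_token if owner[k] in chosen else c for k, c in enumerate(chars)]
    labels = [c if owner[k] in chosen else None for k, c in enumerate(chars)]
    return tokens, labels


words = ["我们", "在", "北京大学", "学习", "自然语言处理"]
tokens, labels = whole_word_mask(words, mask_ratio=0.3, seed=0)
print(tokens)
```

Masking at the unit level means the model cannot recover a masked character from the other characters of the same word, which is the motivation usually given for this family of strategies.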
# NER tools
- [Stanza](https://github.com/stanfordnlp/stanza)
- [LAC](https://github.com/baidu/lac)
- [Ltp](https://github.com/HIT-SCIR/ltp) Harbin Institute of Technology
- [Hanlp](https://github.com/hankcs/HanLP)
- [foolnltk](https://github.com/rockyzhengwu/FoolNLTK)
- [NLTK](https://github.com/nltk/nltk)
- BosonNLP
- [FudanNlp](https://github.com/FudanNLP/fnlp) Fudan University
- [Jionlp](https://github.com/dongrixinyu/JioNLP)
- [HarvestText](https://github.com/blmoistawinde/HarvestText)
- [fastHan](https://github.com/fastnlp/fastHan)
- [EasyNLP](https://github.com/alibaba/EasyNLP) Alibaba
- [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP) Baidu
- [AliceMind](https://github.com/alibaba/AliceMind) Alibaba
- [spacy](https://github.com/explosion/spaCy)
- [DeepKE](https://github.com/zjunlp/DeepKE)
- [CoreNLP](https://github.com/stanfordnlp/CoreNLP) Java/Python
- [OpenNLP](https://github.com/apache/opennlp) Java
- [NLPIR](https://github.com/NLPIR-team/NLPIR/)
- [trankit](https://github.com/nlp-uoregon/trankit) multilingual
- [HugIE](https://github.com/wjn1996/HugNLP/blob/main/documents/information_extraction/HugIE.md) universal information extraction
- [EasyInstruct](https://github.com/zjunlp/EasyInstruct)
# Competitions
- CCKS 2017 evaluation data for Chinese electronic medical records.
Task 1: https://biendata.com/competition/CCKS2017_1/
Task 2: https://biendata.com/competition/CCKS2017_2/
- CCKS 2018 named entity recognition task in the music domain.
Task: https://biendata.com/competition/CCKS2018_2/
- (CoNLL 2002) Annotated Corpus for Named Entity Recognition
Link: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus
- NLPCC 2018 spoken language understanding evaluation for task-oriented dialogue systems.
Link: http://tcci.ccf.org.cn/conference/2018/taskdata.php
- Privacy information recognition in unstructured business text
Link: https://www.datafountain.cn/competitions/472/datasets
- Product title entity recognition
Link: https://www.heywhale.com/home/competition/620b34ed28270b0017b823ad/content/3
- CCKS 2021 Chinese address element parsing
Link: https://tianchi.aliyun.com/competition/entrance/531900/introduction
- CAIL 2022 information extraction track
Link: http://cail.cipsc.org.cn/task6.html?raceID=6&cail_tag=2022
- [2019 Internet finance new entity discovery](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2019%E4%BA%92%E8%81%94%E7%BD%91%E9%87%91%E8%9E%8D%E6%96%B0%E5%AE%9E%E4%BD%93%E5%8F%91%E7%8E%B0.md)
- [CHIP 2020: traditional Chinese medicine package insert entity recognition challenge](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2020%E4%B8%AD%E8%8D%AF%E8%AF%B4%E6%98%8E%E4%B9%A6%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB%E6%8C%91%E6%88%98.md)
- [CHIP 2020: Chinese medical text named entity recognition](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2020%E4%B8%AD%E6%96%87%E5%8C%BB%E5%AD%A6%E6%96%87%E6%9C%AC%E5%91%BD%E5%90%8D%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB.md)
- [CCKS 2020: named entity recognition for test and evaluation](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2020CCKS%E9%9D%A2%E5%90%91%E8%AF%95%E9%AA%8C%E9%89%B4%E5%AE%9A%E7%9A%84%E5%91%BD%E5%90%8D%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB.md)
- [CCKS 2020: medical entity and event extraction from Chinese electronic medical records, subtask 1: medical named entity recognition](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2020CCKS%E9%9D%A2%E5%90%91%E4%B8%AD%E6%96%87%E7%94%B5%E5%AD%90%E7%97%85%E5%8E%86%E7%9A%84%E5%8C%BB%E7%96%97%E5%AE%9E%E4%BD%93%E5%8F%8A%E4%BA%8B%E4%BB%B6%E6%8A%BD%E5%8F%96-%E5%AD%90%E4%BB%BB%E5%8A%A1%E4%B8%80%EF%BC%9A%E5%8C%BB%E7%96%97%E5%90%8D%E9%97%A8%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB.md)
- [LAIC 2022: criminal fact entity recognition](http://data.court.gov.cn/pages/laic.html)
- [SemEval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (MultiCoNER 2)](https://arxiv.org/pdf/2305.06586v1.pdf)
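- [New-type power system AI application competition, Task 2: multi-pattern information extraction for a power-production knowledge graph](https://aistudio.baidu.com/aistudio/competition/detail/425/0/task-definition)
- [CCKS 2022 universal information extraction](https://aistudio.baidu.com/aistudio/competition/detail/161/0/introduction)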
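
Chinese NER datasets are commonly distributed with one character per line and BIO or BMES/BIOES tags, and NER benchmarks are commonly scored with micro-averaged, exact-match entity-level precision, recall, and F1. Individual datasets and competitions may use their own tag schemes and scoring rules, so treat the following as a sketch to adapt, not an official scorer; `tags_to_spans` and `entity_prf` are hypothetical helper names.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def tags_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO or BMES/BIOES tags (e.g. "B-PER", "M-ORG", "S-LOC", "O") into spans.

    Lenient: an I-/M-/E- tag that does not continue an open entity of the same
    type starts a new entity instead of being discarded.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        continues = prefix in ("I", "M", "E") and start is not None and label == etype
        if not continues and start is not None:
            spans.add((start, i, etype))  # close the open entity before position i
            start, etype = None, None
        if prefix in ("B", "S", "I", "M", "E") and start is None:
            start, etype = i, label  # begin a new entity
        if prefix in ("S", "E"):
            spans.add((start, i + 1, etype))  # S and E end the entity at this character
            start, etype = None, None
    return spans


def entity_prf(gold_tags: List[List[str]], pred_tags: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall, F1 (exact span and type match)."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold_tags, pred_tags):
        g, p = tags_to_spans(g_tags), tags_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-PER", "E-PER", "O", "S-LOC"]]
pred = [["B-PER", "E-PER", "O", "O"]]
print(entity_prf(gold, pred))  # (1.0, 0.5, 0.6666666666666666)
```

Scoring on decoded spans rather than per-character tags means a half-correct entity earns no credit, which is the usual convention for exact-match NER evaluation.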