https://github.com/taishan1994/awesome-chinese-ner

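Chinese named entity recognition. Includes the latest Chinese NER papers, Chinese entity recognition tools, and datasets, as well as Chinese pre-trained models, word vectors, and NER surveys.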

# awesome-chinese-ner
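Chinese named entity recognition
#### Information extraction with large language models
- Survey of information extraction with large language models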

Large Language Models for Generative Information Extraction: A Survey

https://arxiv.org/abs/2312.17617

https://github.com/quqxui/Awesome-LLM4IE-Papers

#### Further reading
- Survey of Chinese pre-trained models

https://www.jsjkx.com/CN/10.11896/jsjkx.211200018
- Download links for Chinese pre-trained models

https://github.com/lonePatient/awesome-pretrained-chinese-nlp-models
- Download links for Chinese word vectors

https://github.com/Embedding/Chinese-Word-Vectors
- How should BiLSTM-CRF hyperparameters be tuned?

https://arxiv.org/pdf/1707.06799.pdf
- Information extraction (entities, relations, events) with ChatGPT

Zero-Shot Information Extraction via Chatting with ChatGPT

Demo: http://124.221.16.143:5000/

https://arxiv.org/pdf/2302.10205.pdf

https://github.com/cocacola-lab/ChatIE

- GPT for Information Extraction

https://github.com/cocacola-lab/GPT4IE

- Evaluation-of-ChatGPT-on-Information-Extraction

https://github.com/RidongHan/Evaluation-of-ChatGPT-on-Information-Extraction

- This paper is also filed under further reading:

Unified Text Structuralization with Instruction-tuned Language Models

2023

https://arxiv.org/pdf/2303.14956v2.pdf

- GPT-NER: Named Entity Recognition via Large Language Models

2023

https://arxiv.org/pdf/2304.10428v1.pdf

https://github.com/ShuheWang1998/GPT-NER

- EasyInstruct: An Easy-to-use Framework to Instruct Large Language Models

https://github.com/zjunlp/EasyInstruct
- CODEIE: Large Code Generation Models are Better Few-Shot Information Extractors

Extracts entities and relations by framing the task as code generation

2023

https://arxiv.org/pdf/2305.05711v1.pdf

https://github.com/dasepli/CodeIE

- PromptNER: Prompting For Named Entity Recognition

2023

https://arxiv.org/pdf/2305.15444v2.pdf

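#### Surveys of named entity recognition (Chinese)
- A survey of recent research progress in deep-learning-based Chinese named entity recognition (original title: 基于深度学习的中文命名实体识别最新研究进展综述)

Journal of Chinese Information Processing, 2022

http://61.175.198.136:8083/rwt/125/http/GEZC6MJZFZZUPLSSGM3B/Qikan/Article/Detail?id=7107633068

- A survey of named entity recognition methods (original title: 命名实体识别方法研究综述)

Journal of Frontiers of Computer Science and Technology, 2022

http://fcst.ceaj.org/CN/10.3778/j.issn.1673-9418.2112109

- A survey of Chinese named entity recognition (original title: 中文命名实体识别综述)

Journal of Frontiers of Computer Science and Technology, 2021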

http://fcst.ceaj.org/CN/abstract/abstract2902.shtml

- Chinese named entity recognition: The state of the art

Neurocomputing 2022

[link](https://reader.elsevier.com/reader/sd/pii/S0925231221016581?token=592CD98CF076A91AFE5EDB2396D806784B30D3217FD7B61FE2FE9CB905451ABB5B28C0285AAFA973010ACE14AD387A5C&originRegion=us-east-1&originCreation=20221119143715)

# Models
- Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training

COLING 2024

https://arxiv.org/pdf/2404.05560

- Unified Lattice Graph Fusion for Chinese Named Entity Recognition

2024

https://arxiv.org/pdf/2312.16917.pdf

- MRC-based Nested Medical NER with Co-prediction and Adaptive Pre-training

2024, medical entity recognition

https://arxiv.org/pdf/2403.15800.pdf

- CHisIEC: An Information Extraction Corpus for Ancient Chinese History

2024, classical Chinese entity recognition

https://arxiv.org/pdf/2403.15088.pdf

https://github.com/tangxuemei1995/CHisIEC

- Attack Named Entity Recognition by Entity Boundary Interference

2023

https://arxiv.org/pdf/2305.05253v1.pdf

- Token Relation Aware Chinese Named Entity Recognition

ACM Transactions on Asian and Low-Resource Language Information Processing 2023

https://dl.acm.org/doi/10.1145/3531534

- WYWEB: A NLP Evaluation Benchmark For Classical Chinese

ACL 2023

https://arxiv.org/pdf/2305.14150

https://github.com/baudzhou/WYWEB

- PUnifiedNER: a Prompting-based Unified NER System for Diverse Datasets

AAAI 2023

https://arxiv.org/pdf/2211.14838.pdf

https://github.com/GeorgeLuImmortal/PUnifiedNER

- End-to-End Entity Detection with Proposer and Regressor

Borrows ideas from object detection

2022

https://arxiv.org/pdf/2210.10260v2.pdf

https://github.com/Rosenberg37/EntityDetection

- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition

Multilingual named entity recognition

2022

https://arxiv.org/pdf/2203.00545.pdf

https://github.com/Alibaba-NLP/KB-NER

- PCBERT: Parent and Child BERT for Chinese Few-shot NER

COLING 2022

https://aclanthology.org/2022.coling-1.192.pdf
- GNN-SL: Sequence Labeling Based on Nearest Examples via GNN

2022

https://arxiv.org/pdf/2212.02017.pdf

https://github.com/ShuheWang1998/GNN-SL
- EiCi: A New Method of Dynamic Embedding Incorporating Contextual Information in Chinese NER

The idea seems close to that of AMBERT: [AMBERT](https://arxiv.org/pdf/2008.11869.pdf)

2022

https://openreview.net/pdf?id=0TKg4UlnEEQ
- Deep Span Representations for Named Entity Recognition

2022

https://arxiv.org/pdf/2210.04182v1.pdf
- Mulco: Recognizing Chinese Nested Named Entities Through Multiple Scopes

2022

https://arxiv.org/pdf/2211.10854.pdf
- Unsupervised Boundary-Aware Language Model Pretraining for Chinese Sequence Labeling

EMNLP 2022

https://arxiv.org/pdf/2210.15231.pdf

http://github.com/modelscope/adaseq/examples/babert
- Domain-Specific NER via Retrieving Correlated Samples

COLING 2022

https://arxiv.org/pdf/2208.12995.pdf
- Two Languages Are Better than One: Bilingual Enhancement for Chinese Named Entity Recognition

COLING 2022

https://aclanthology.org/2022.coling-1.176.pdf
- A hybrid Transformer approach for Chinese NER with features augmentation

Expert Syst. Appl 2022

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4087645
- Adaptive Threshold Selective Self-Attention for Chinese NER

COLING 2022

https://aclanthology.org/2022.coling-1.157.pdf
- Improving Chinese Named Entity Recognition by Search Engine Augmentation

2022

https://arxiv.org/pdf/2210.12662.pdf

- Robust Self-Augmentation for Named Entity Recognition with Meta Reweighting

NAACL 2022

https://arxiv.org/pdf/2204.11406.pdf

https://github.com/LindgeW/MetaAug4NER

- Boundary Smoothing for Named Entity Recognition

ACL 2022

https://arxiv.org/pdf/2204.12031v1.pdf

https://github.com/syuoni/eznlp

- NFLAT: Non-Flat-Lattice Transformer for Chinese Named Entity Recognition

2022

https://arxiv.org/pdf/2205.05832.pdf

- Unified Structure Generation for Universal Information Extraction

(Unifies entity recognition, relation extraction, event extraction, and sentiment analysis); Baidu UIE

ACL 2022

https://arxiv.org/pdf/2203.12277.pdf

https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie

https://github.com/universal-ie/UIE

The following paper is also a universal approach, but it is English-only and reports no experiments on Chinese data:
- DEEPSTRUCT: Pretraining of Language Models for Structure Prediction

2022

https://arxiv.org/pdf/2205.10475v1.pdf

https://github.com/cgraywang/deepstruct

- Parallel Instance Query Network for Named Entity Recognition

2022

https://arxiv.org/pdf/2203.10545v1.pdf

- Delving Deep into Regularity: A Simple but Effective Method for Chinese Named Entity Recognition

NAACL 2022

https://arxiv.org/pdf/2204.05544.pdf

- TURNER: The Uncertainty-based Retrieval Framework for Chinese NER

2022

https://arxiv.org/pdf/2202.09022

- kNN-NER: Named Entity Recognition with Nearest Neighbor Search

2022

https://arxiv.org/pdf/2203.17103

https://github.com/ShannonAI/KNN-NER

- Unified Named Entity Recognition as Word-Word Relation Classification

AAAI 2022

https://arxiv.org/abs/2112.10070

https://github.com/ljynlp/W2NER.git

- MarkBERT: Marking Word Boundaries Improves Chinese BERT

2022

https://arxiv.org/pdf/2203.06378

- MFE-NER: Multi-feature Fusion Embedding for Chinese Named Entity Recognition

2021

https://arxiv.org/pdf/2109.07877

- AdaK-NER: An Adaptive Top-K Approach for Named Entity Recognition with Incomplete Annotations

2021

https://arxiv.org/pdf/2109.05233

- ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

ACL 2021

https://arxiv.org/pdf/2106.16038

https://github.com/ShannonAI/ChineseBert

- Enhanced Language Representation with Label Knowledge for Span Extraction

EMNLP 2021

https://aclanthology.org/2021.emnlp-main.379.pdf

https://github.com/Akeepers/LEAR

- Lex-BERT: Enhancing BERT based NER with lexicons

ICLR 2021

https://arxiv.org/pdf/2101.00396v1.pdf

- Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter

ACL 2021

https://arxiv.org/pdf/2105.07148.pdf

https://github.com/liuwei1206/LEBERT

- MECT: Multi-Metadata Embedding based Cross-Transformer for Chinese Named Entity Recognition

ACL 2021

https://arxiv.org/pdf/2107.05418v1.pdf

https://github.com/CoderMusou/MECT4CNER

- Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition

ACL 2021

https://arxiv.org/pdf/2105.06804v2.pdf

https://github.com/tricktreat/locate-and-label

- Dynamic Modeling Cross- and Self-Lattice Attention Network for Chinese NER

AAAI 2021

https://ojs.aaai.org/index.php/AAAI/article/view/17706/17513

https://github.com/zs50910/DCSAN-for-Chinese-NER

- Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information

EMNLP 2020

https://arxiv.org/pdf/2010.15466

https://github.com/cuhksz-nlp/AESINER

- ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations

ACL 2020

https://arxiv.org/pdf/1911.00720v1.pdf

https://github.com/sinovation/ZEN

- A Unified MRC Framework for Named Entity Recognition

ACL 2020

https://arxiv.org/pdf/1910.11476v6.pdf

https://github.com/ShannonAI/mrc-for-flat-nested-ner

- Simplify the Usage of Lexicon in Chinese NER

ACL 2020

https://arxiv.org/pdf/1908.05969.pdf

https://github.com/v-mipeng/LexiconAugmentedNER

- A Boundary Regression Model for Nested Named Entity Recognition

2020

https://arxiv.org/pdf/2011.14330v3.pdf

https://github.com/yuelfei/BR

- Dice Loss for Data-imbalanced NLP Tasks

ACL 2020

https://arxiv.org/pdf/1911.02855v3.pdf

https://github.com/ShannonAI/dice_loss_for_NLP

- Porous Lattice Transformer Encoder for Chinese NER

COLING 2020

https://aclanthology.org/2020.coling-main.340.pdf

- FLAT: Chinese NER Using Flat-Lattice Transformer

ACL 2020

https://arxiv.org/pdf/2004.11795v2.pdf

https://github.com/LeeSureman/Flat-Lattice-Transformer

- FGN: Fusion Glyph Network for Chinese Named Entity Recognition

2020

https://arxiv.org/pdf/2001.05272v6.pdf

https://github.com/AidenHuen/FGN-NER

- SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER

2020

https://arxiv.org/pdf/2007.08416v1.pdf

https://github.com/zerohd4869/SLK-NER

- Entity Enhanced BERT Pre-training for Chinese NER

EMNLP 2020

https://aclanthology.org/2020.emnlp-main.518.pdf

https://github.com/jiachenwestlake/Entity_BERT

- Named Entity Recognition for Social Media Texts with Semantic Augmentation

EMNLP 2020

https://arxiv.org/pdf/2010.15458v1.pdf

https://github.com/cuhksz-nlp/SANER

- CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese

2020

https://arxiv.org/pdf/2001.04351v4.pdf

https://github.com/CLUEbenchmark/CLUENER2020

- ERNIE: Enhanced Representation through Knowledge Integration

2019

https://arxiv.org/pdf/1904.09223v1.pdf

https://github.com/PaddlePaddle/ERNIE

- TENER: Adapting Transformer Encoder for Named Entity Recognition

2019

https://arxiv.org/pdf/1911.04474v3.pdf

https://github.com/fastnlp/TENER

- Chinese NER Using Lattice LSTM

ACL 2018

https://arxiv.org/pdf/1805.02023v4.pdf

https://github.com/jiesutd/LatticeLSTM

- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding

2019

https://arxiv.org/pdf/1907.12412v2.pdf

https://github.com/PaddlePaddle/ERNIE

- Glyce: Glyph-vectors for Chinese Character Representations

NeurIPS 2019

https://arxiv.org/pdf/1901.10125v5.pdf

https://github.com/ShannonAI/glyce

- CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition

NAACL 2019

https://arxiv.org/pdf/1904.02141v3.pdf

https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NER

- Neural Chinese Named Entity Recognition via CNN-LSTM-CRF and Joint Training with Word Segmentation

2019

https://arxiv.org/pdf/1905.01964v1.pdf

https://github.com/rxy007/cnn-lstm-crf

- Chinese Named Entity Recognition Augmented with Lexicon Memory

2019

https://arxiv.org/pdf/1912.08282v2.pdf

https://github.com/dugu9sword/LEMON

- Exploiting Multiple Embeddings for Chinese Named Entity Recognition

2019

https://arxiv.org/pdf/1908.10657v1.pdf

https://github.com/WHUIR/ME-CNER

- Dependency-Guided LSTM-CRF for Named Entity Recognition

EMNLP-IJCNLP 2019

https://arxiv.org/pdf/1909.10148v1.pdf

https://github.com/allanj/ner_with_dependency

- CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition

NAACL-HLT 2019

https://aclanthology.org/N19-1342/

- CNN-Based Chinese NER with Lexicon Rethinking

IJCAI 2019

https://www.ijcai.org/proceedings/2019/0692.pdf

- Leverage Lexical Knowledge for Chinese Named Entity Recognition via Collaborative Graph Network

EMNLP-IJCNLP 2019

https://aclanthology.org/D19-1396.pdf

https://github.com/DianboWork/Graph4CNER

- Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning

COLING 2018

https://aclanthology.org/C18-1183.pdf

https://github.com/rainarch/DSNER

- Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism

EMNLP 2018

https://aclanthology.org/D18-1017.pdf

https://github.com/CPF-NLPR/AT4ChineseNER

# Non-Chinese models

These papers report no experiments on Chinese, but their ideas are worth borrowing:

- DiffusionNER: Boundary Diffusion for Named Entity Recognition

2023

https://arxiv.org/pdf/2305.13298v1.pdf

https://github.com/tricktreat/DiffusionNER

- Learning In-context Learning for Named Entity Recognition

ACL 2023

https://arxiv.org/pdf/2305.11038v1.pdf

https://github.com/chen700564/metaner-icl

- UniEX: An Effective and Efficient Framework for Unified Information Extraction via a Span-extractive Perspective

2023

https://arxiv.org/pdf/2305.10306v1.pdf

- Easy-to-Hard Learning for Information Extraction

2023

https://arxiv.org/pdf/2305.09193v1.pdf

https://github.com/DAMO-NLP-SG/IE-E2H

- UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction

2023

https://openreview.net/pdf?id=cRQwl-59CU8

https://github.com/yhcc/utcie

- Deep Span Representations for Named Entity Recognition

Boundary Smoothing for Named Entity Recognition (same authors)

ACL 2023

https://github.com/syuoni/eznlp

https://arxiv.org/pdf/2210.04182v2.pdf

- NER-to-MRC: Named-Entity Recognition Completely Solving as Machine Reading Comprehension

2023

https://arxiv.org/pdf/2305.03970v1.pdf

- RexUIE: A Recursive Method with Explicit Schema Instructor for Universal Information Extraction

Universal information extraction; compared against USM

2023

https://arxiv.org/pdf/2304.14770.pdf

- InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction

(Another universal information extraction paper; compared against Baidu UIE and USM)

2023

https://arxiv.org/pdf/2304.08085v1.pdf

https://github.com/BeyonderXX/InstructUIE

- Universal Information Extraction as Unified Semantic Matching

Universal information extraction covering entities, relations, and events (no experiments on Chinese data); abbreviated USM

AAAI 2023

https://arxiv.org/pdf/2301.03282.pdf

- Multi-task Transformer with Relation-attention and Type-attention for Named Entity Recognition

2023

https://arxiv.org/pdf/2303.10870v1.pdf

- DEEPSTRUCT: Pretraining of Language Models for Structure Prediction

Universal information extraction

ACL 2022

https://arxiv.org/pdf/2205.10475v2.pdf

https://github.com/cgraywang/deepstruct

- TOE: A Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags

Improves on the W2NER model

IEEE TASLP (Transactions on Audio, Speech and Language Processing)

https://arxiv.org/pdf/2211.00684.pdf

https://github.com/solkx/TOE

- Optimizing Bi-Encoder for Named Entity Recognition via Contrastive Learning

ICLR 2023

https://arxiv.org/pdf/2208.14565v2.pdf

https://github.com/microsoft/binder

- One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER

2023

https://arxiv.org/pdf/2301.10410v2.pdf

https://github.com/zjunlp/DeepKE/tree/main/example/ner/cross

- QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition

2022

https://arxiv.org/pdf/2203.01543.pdf

- A Unified Generative Framework for Various NER Subtasks

(Uses the BART generative model for named entity recognition)

ACL-IJCNLP 2021

https://arxiv.org/pdf/2106.01223.pdf

https://github.com/yhcc/BARTNER

(The following four papers cover prompt-based named entity recognition)

- Template-Based Named Entity Recognition Using BART

https://arxiv.org/abs/2106.01760

https://github.com/Nealcly/templateNER

- Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER

https://arxiv.org/abs/2110.08454

https://github.com/INK-USC/fewNER

- LightNER: A Lightweight Generative Framework with Prompt-guided Attention for Low-resource NER

https://arxiv.org/abs/2109.00720

https://github.com/zjunlp/DeepKE/blob/main/example/ner/few-shot/README_CN.md

- Template-free Prompt Tuning for Few-shot NER

https://arxiv.org/abs/2109.13532

https://github.com/rtmaww/EntLM/

# Datasets

- [MSRA](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/MSRA)
- [Weibo](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/weibo)
- [resume](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/ResumeNER)
- OntoNotes 4
- OntoNotes 5
- [A dataset provided by a company, covering person names, place names, organization names, and proper nouns.](https://bosonnlp.com/dev/resource)
- [People's Daily Online (2004)](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/people_daily)
- [Entity-annotated data for film/TV, music, and books](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/video_music_book_datasets)
- [Chinese medical text named entity recognition, CCKS 2020](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/2020_ccks_ner)
- [Yidu Cloud entity recognition dataset](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/yidu-s4k)
- [CLUENER2020](https://github.com/GuocaiL/nlp_corpus/tree/main/open_ner_data/cluener_public)
- [Collection of Chinese datasets for various tasks](https://github.com/liucongg/NLPDataSet)
- [Medical datasets](http://172.16.1.113:9005/docs)
- [Roundup of 30+ NER datasets](https://zhuanlan.zhihu.com/p/603850842)
- [Roundup of Chinese entity recognition datasets](https://www.zhihu.com/question/264243637/answer/2936822902)
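
Chinese NER corpora are often distributed as character-level sequences with BIO-style tags. Whether a given dataset above uses that scheme is an assumption here and should be checked against its own files. Under that assumption, a minimal sketch of turning tags into entity spans could look like this (the function name `bio_to_spans` and the sample sentence are illustrative and do not come from any listed dataset):

```python
from typing import List, Tuple

Span = Tuple[str, str, int, int]  # (text, label, start, end), end exclusive


def bio_to_spans(chars: List[str], tags: List[str]) -> List[Span]:
    """Collect entity spans from character-level BIO tags.

    Assumes tags look like "B-PER", "I-PER", or "O". An "I-" tag that does
    not continue an open entity of the same label is ignored.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel flushes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append(("".join(chars[start:i]), label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


if __name__ == "__main__":
    sentence = list("张三在北京")
    labels = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
    print(bio_to_spans(sentence, labels))
```

Datasets that use other schemes (for example BMES or BIOES) would need the prefix checks adapted accordingly.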

# Pre-trained language models

- [ChineseBert](https://aclanthology.org/2021.acl-long.161/) ACL 2021
- [MacBert](https://arxiv.org/pdf/2004.13922.pdf) 2020
- [SpanBert](https://arxiv.org/pdf/1907.10529.pdf)
- [XLNet](https://arxiv.org/pdf/1906.08237.pdf)
- [Roberta](https://arxiv.org/pdf/1907.11692.pdf)
- [Bert](https://arxiv.org/pdf/1810.04805.pdf)
- [StructBert](https://arxiv.org/abs/1908.04577)
- [WoBert](https://github.com/ZhuiyiTechnology/WoBERT)
- [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB)
- [Ernie1.0](https://arxiv.org/pdf/1904.09223)
- [Ernie2.0](https://arxiv.org/abs/1907.12412)
- [Ernie3.0](https://arxiv.org/abs/2107.02137)
- [ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding](https://arxiv.org/pdf/2010.12148.pdf)
- [NeZha](https://arxiv.org/abs/1909.00204)
- [MengZi](https://arxiv.org/pdf/2110.06696.pdf)
- [ZEN](https://arxiv.org/pdf/1911.00720.pdf)
- [ALBERT](https://arxiv.org/pdf/1909.11942.pdf)
- [roformer](https://arxiv.org/abs/2104.09864)
- [roformer-v2](https://github.com/ZhuiyiTechnology/roformer-v2)
- [Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words](https://arxiv.org/pdf/2202.12142)
- [PERT: Pre-Training BERT with Permuted Language Model](https://arxiv.org/abs/2203.06906)
- [RoChBert: Towards Robust BERT Fine-tuning for Chinese](https://arxiv.org/pdf/2210.15944.pdf) EMNLP 2022
- [MarkBERT: Marking Word Boundaries Improves Chinese BERT](https://arxiv.org/pdf/2203.06378.pdf) 2022
- [MVP-BERT: REDESIGNING VOCABULARIES FOR CHINESE BERT AND MULTI-VOCAB PRETRAINING](https://arxiv.org/pdf/2011.08539.pdf) 2022
- [LERT: A Linguistically-motivated Pre-trained Language Model](https://arxiv.org/pdf/2211.05344v1.pdf) 2022
- [AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization](https://arxiv.org/pdf/2008.11869.pdf) 2022
- [BURT: BERT-inspired Universal Representation from Learning Meaningful Segment](https://arxiv.org/pdf/2012.14320.pdf) 2021
- [Towards Efficient NLP: A Standard Evaluation and A Strong Baseline](https://aclanthology.org/2022.naacl-main.240.pdf)
- [Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence](https://arxiv.org/pdf/2209.02970.pdf)
- [AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models](https://github.com/modelscope/AdaSeq) multiple methods
- [TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning](https://arxiv.org/pdf/2111.04198.pdf) NAACL 2022
- [Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models](https://arxiv.org/ftp/arxiv/papers/2303/2303.10893.pdf) 2023
- [MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model](https://arxiv.org/pdf/2304.00717v1.pdf) 2023
- [sikuGPT](https://github.com/SIKU-BERT/sikuGPT) classical Chinese model, 2023
- [UniIE](https://github.com/AAIG-NLP/UniIE) universal information extraction

# NER tools

- [Stanza](https://github.com/stanfordnlp/stanza)
- [LAC](https://github.com/baidu/lac)
- [Ltp](https://github.com/HIT-SCIR/ltp) Harbin Institute of Technology
- [Hanlp](https://github.com/hankcs/HanLP)
- [foolnltk](https://github.com/rockyzhengwu/FoolNLTK)
- [NLTK](https://github.com/nltk/nltk)
- BosonNLP
- [FudanNlp](https://github.com/FudanNLP/fnlp) Fudan University
- [Jionlp](https://github.com/dongrixinyu/JioNLP)
- [HarvestText](https://github.com/blmoistawinde/HarvestText)
- [fastHan](https://github.com/fastnlp/fastHan)
- [EasyNLP](https://github.com/alibaba/EasyNLP) Alibaba
- [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP) Baidu
- [AliceMind](https://github.com/alibaba/AliceMind) Alibaba
- [spacy](https://github.com/explosion/spaCy)
- [DeepKE](https://github.com/zjunlp/DeepKE)
- [coreNlp](https://github.com/stanfordnlp/CoreNLP) Java/Python
- [opennlp](https://github.com/apache/opennlp) Java
- [NLPIR](https://github.com/NLPIR-team/NLPIR/)
- [trankit](https://github.com/nlp-uoregon/trankit) multilingual
- [HugIE](https://github.com/wjn1996/HugNLP/blob/main/documents/information_extraction/HugIE.md) universal information extraction
- [EasyInstruct](https://github.com/zjunlp/EasyInstruct)

# Competitions

- Data released for the CCKS 2017 evaluation on Chinese electronic medical records.

Evaluation task 1: https://biendata.com/competition/CCKS2017_1/

Evaluation task 2: https://biendata.com/competition/CCKS2017_2/

- Entity recognition task in the music domain, released for CCKS 2018.

Evaluation task: https://biendata.com/competition/CCKS2018_2/

- (CoNLL 2002) Annotated Corpus for Named Entity Recognition.

Link: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus

- Spoken language understanding evaluation for task-oriented dialogue systems, released for NLPCC 2018.

Link: http://tcci.ccf.org.cn/conference/2018/taskdata.php

- Identifying private information in unstructured commercial text

Link: https://www.datafountain.cn/competitions/472/datasets
- Product title recognition

Link: https://www.heywhale.com/home/competition/620b34ed28270b0017b823ad/content/3
- CCKS 2021 Chinese NLP address element parsing

Link: https://tianchi.aliyun.com/competition/entrance/531900/introduction
- CAIL 2022 information extraction track

Link: http://cail.cipsc.org.cn/task6.html?raceID=6&cail_tag=2022
- [2019 new entity discovery in internet finance](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2019%E4%BA%92%E8%81%94%E7%BD%91%E9%87%91%E8%9E%8D%E6%96%B0%E5%AE%9E%E4%BD%93%E5%8F%91%E7%8E%B0.md)

- [CHIP 2020: entity recognition challenge on traditional Chinese medicine package inserts](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2020%E4%B8%AD%E8%8D%AF%E8%AF%B4%E6%98%8E%E4%B9%A6%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB%E6%8C%91%E6%88%98.md)

- [CHIP 2020: Chinese medical text named entity recognition](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2020%E4%B8%AD%E6%96%87%E5%8C%BB%E5%AD%A6%E6%96%87%E6%9C%AC%E5%91%BD%E5%90%8D%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB.md)

- [CCKS 2020: named entity recognition for test and evaluation](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2020CCKS%E9%9D%A2%E5%90%91%E8%AF%95%E9%AA%8C%E9%89%B4%E5%AE%9A%E7%9A%84%E5%91%BD%E5%90%8D%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB.md)

- [CCKS 2020: medical entity and event extraction from Chinese electronic medical records, subtask 1: medical named entity recognition](https://github.com/TingFree/NLPer-Arsenal/blob/master/%E5%BE%80%E6%9C%9F%E7%AB%9E%E8%B5%9B/%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB/2020CCKS%E9%9D%A2%E5%90%91%E4%B8%AD%E6%96%87%E7%94%B5%E5%AD%90%E7%97%85%E5%8E%86%E7%9A%84%E5%8C%BB%E7%96%97%E5%AE%9E%E4%BD%93%E5%8F%8A%E4%BA%8B%E4%BB%B6%E6%8A%BD%E5%8F%96-%E5%AD%90%E4%BB%BB%E5%8A%A1%E4%B8%80%EF%BC%9A%E5%8C%BB%E7%96%97%E5%90%8D%E9%97%A8%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB.md)
- [LAIC 2022: entity recognition in criminal facts](http://data.court.gov.cn/pages/laic.html)
- [SemEval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (MultiCoNER 2)](https://arxiv.org/pdf/2305.06586v1.pdf)
- [New Power System AI Application Competition, task 2: multi-pattern information extraction for a power production knowledge graph](https://aistudio.baidu.com/aistudio/competition/detail/425/0/task-definition)
- [CCKS 2022 universal information extraction](https://aistudio.baidu.com/aistudio/competition/detail/161/0/introduction)