{"id":13594633,"url":"https://github.com/iflytek/HFL-Anthology","last_synced_at":"2025-04-09T07:33:27.426Z","repository":{"id":37415913,"uuid":"245039617","full_name":"iflytek/HFL-Anthology","owner":"iflytek","description":"Collections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL)","archived":false,"fork":false,"pushed_at":"2023-03-09T06:19:57.000Z","size":634,"stargazers_count":354,"open_issues_count":1,"forks_count":40,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-08-02T16:53:32.945Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Markdown","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-sa-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iflytek.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-03-05T01:16:34.000Z","updated_at":"2024-07-09T02:53:43.000Z","dependencies_parsed_at":"2024-01-16T22:18:33.696Z","dependency_job_id":"5e1e3acb-436d-4254-b607-a47d9fa5fa4f","html_url":"https://github.com/iflytek/HFL-Anthology","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iflytek%2FHFL-Anthology","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iflytek%2FHFL-Anthology/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iflytek%2FHFL-Anthology/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iflytek%2FHFL-Anthology/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iflytek","download_url":"https://codeload.githu
b.com/iflytek/HFL-Anthology/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223375397,"owners_count":17135364,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:36.765Z","updated_at":"2024-11-06T16:31:40.667Z","avatar_url":"https://github.com/iflytek.png","language":"Markdown","funding_links":[],"categories":["Markdown"],"sub_categories":[],"readme":"[**中文**](https://github.com/ymcui/HFL-Anthology/blob/master/README_ZH.md) | [**English**](https://github.com/ymcui/HFL-Anthology)\n\n\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"./banner.png\" width=\"500\"/\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\n\nCollections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL).\n\n\u003c!-- TOC --\u003e\n\n- [Pre-trained Language Model](#Pre-trained-Language-Model)\n- [Dataset](#Dataset)\n- [Toolkit](#Toolkit)\n- [Demo](#System-Demonstration)\n- [Evaluation Campaign](#Evaluation-Campaign)\n- [Paper](#Paper)\n\n\u003c!-- /TOC --\u003e\n\n## Pre-trained Language Model\n\n| Name | Description |\n| :------ | :------ |\n| [VLE](https://github.com/iflytek/VLE) | Multimodal Vision-Language Encoder |\n| [MiniRBT](https://github.com/iflytek/MiniRBT) | Chinese MiniRBT models (a series of small pre-trained models) |\n| [LERT](https://github.com/ymcui/LERT) | Chinese LERT models (small-level, base-level, large-level) |\n| [PERT](https://github.com/ymcui/PERT) | Chinese and English PERT models (base-level, large-level) |\n| 
[Chinese-MobileBERT](https://github.com/ymcui/Chinese-MobileBERT) | Chinese MobileBERT (base-level, large-level) *(archival purpose only)* |\n| [CINO](https://github.com/ymcui/Chinese-Minority-PLM) | Pre-trained Language Models for Chinese Minority Languages |\n| [MacBERT](https://github.com/ymcui/MacBERT) | Chinese pre-trained MacBERT models (MacBERT-base, MacBERT-large) |\n| [CharBERT](https://github.com/wtma/CharBERT) | English pre-trained CharBERT models |\n| [Chinese-ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | Chinese pre-trained ELECTRA models (ELECTRA-base, ELECTRA-small) with code support for six tasks: CMRC 2018, DRCD, XNLI, ChnSentiCorp, LCQMC, BQCorpus |\n| [Chinese-XLNet](https://github.com/ymcui/Chinese-XLNet) | Chinese pre-trained XLNet models: XLNet-mid, XLNet-base |\n| [Chinese-BERT-wwm](https://github.com/ymcui/Chinese-BERT-wwm) | Chinese BERT with Whole Word Masking (wwm), including BERT-wwm, BERT-wwm-ext, RoBERTa-wwm-ext, RoBERTa-wwm-ext-large, RBT3, RBTL3 |\n\n\n## Dataset\n\n| Name | Type | Paper |\n| :------ | :------: | :------: |\n| [CCTC](https://github.com/destwang/CTCResources) | Text Correction | [Wang et al., 2022](https://aclanthology.org/2022.coling-1.294.pdf) |\n| [CTC 2021](https://github.com/destwang/CTC2021) | Text Correction | [Wang et al., 2022](https://arxiv.org/pdf/2208.05681.pdf) |\n| [ExpMRC](https://github.com/ymcui/expmrc) | Reading Comprehension | [Cui et al., 2021](https://arxiv.org/abs/2105.04126) |\n| [AdvRACE](https://github.com/NoviScl/AdvRACE) | Reading Comprehension | [Si et al., 2020](https://arxiv.org/abs/2004.14004) |\n| [CMRC 2019](https://github.com/ymcui/cmrc2019) | Reading Comprehension | [Cui et al., 2020](https://arxiv.org/abs/2004.03116) |\n| [CJRC](https://github.com/china-ai-law-challenge/CAIL2019/tree/master/阅读理解) | Reading Comprehension | [Duan et al., 2019](https://arxiv.org/abs/1912.09156) |\n| [CMRC 2018](https://github.com/ymcui/cmrc2018) | Reading Comprehension | [Cui et al., 
2019](https://www.aclweb.org/anthology/D19-1600/) |\n| [CMRC 2017](https://github.com/ymcui/cmrc2017) | Reading Comprehension | [Cui et al., 2018](https://www.aclweb.org/anthology/L18-1431/) |\n| [PD\u0026CFT](https://github.com/ymcui/Chinese-Cloze-RC) | Reading Comprehension | [Cui et al., 2016](https://www.aclweb.org/anthology/C16-1167/) |\n\n\n## Toolkit\n\n| Name | Description | Paper |\n| :------ | :------- | :------: |\n| [TextPruner](https://github.com/airaria/TextPruner) | Model Pruning for NLP | [Yang et al., 2022](https://aclanthology.org/2022.acl-demo.4) |\n| [TextBrewer](https://github.com/airaria/TextBrewer) | Knowledge Distillation for NLP | [Yang et al., 2020](https://aclanthology.org/2020.acl-demos.2/) |\n\n\n## System Demonstration\n\n| Name | Description | Paper |\n| :------ | :------- | :------: |\n| IFlyEA | A Chinese Essay Assessment System with Automated Rating, Review Generation, and Recommendation | [Gong et al., 2021](https://aclanthology.org/2021.acl-demo.29/) |\n| [iFLYChecker](http://check.hfl-rc.com) | A Chinese Grammar Checking System | - |\n| [IFlyLegal](https://github.com/china-ai-law-challenge/CAIL2019/tree/master/阅读理解#法小飞) | A Chinese Legal System for Consultation \u0026 Law Searching | [Wang et al., 2019](https://www.aclweb.org/anthology/D19-3017/) |\n\n\n## Evaluation Campaign\n\n| Name | Description | Live Leaderboard |\n| :------ | :------: | :------: |\n| [CMRC 2022](http://cmrc2022.hfl-rc.com) | Explainable Reading Comprehension | ✅ |\n| [CTC 2021](https://github.com/destwang/CTC2021) | Chinese Text Correction | ✅ |\n| [CAIL 2020](http://cail.cipsc.org.cn) | Judiciary Reading Comprehension | ❌ |\n| [CMRC 2019](http://hfl-rc.github.io/cmrc2019/) | Sentence Cloze Reading Comprehension | ✅ |\n| [CAIL 2019](http://cail.cipsc.org.cn) | Judiciary Reading Comprehension | ❌ |\n| [CMRC 2018](http://hfl-rc.github.io/cmrc2018/) | Span-Extraction Reading Comprehension | ✅ |\n| [CMRC 2017](http://hfl-rc.github.io/cmrc2017/) | Cloze-style 
Reading Comprehension | ❌ |\n\n## Paper\n\n| Year | Paper | Author List | Published in | Note |\n| :-----: | :------ | :------- | :------: | :------: |\n| 2022 | [Visualizing Attention Zones in Machine Reading Comprehension Models](https://star-protocols.cell.com/protocols/1736) | Yiming Cui, Wei-Nan Zhang, Ting Liu | STAR Protocols | [GitHub](https://github.com/ymcui/mrc-model-analysis) |\n| 2022 | [Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models](https://www.cell.com/iscience/fulltext/S2589-0042%2822%2900446-1) | Yiming Cui, Wei-Nan Zhang, Wanxiang Che, Ting Liu, Zhigang Chen, Shijin Wang | iScience | [GitHub](https://github.com/ymcui/mrc-model-analysis) |\n| 2021 | [ExpMRC: Explainability Evaluation for Machine Reading Comprehension](https://www.cell.com/heliyon/fulltext/S2405-8440%2822%2900578-3) | Yiming Cui, Ting Liu, Wanxiang Che, Zhigang Chen, Shijin Wang | Heliyon | [GitHub](https://github.com/ymcui/expmrc) |\n| 2022 | [Teaching Machines to Read, Answer and Explain](https://ieeexplore.ieee.org/document/9729502) | Yiming Cui, Ting Liu, Wanxiang Che, Zhigang Chen, Shijin Wang | IEEE/ACM TASLP |  |\n| 2022 | [PERT: Pre-training BERT with Permuted Language Model](https://arxiv.org/abs/2203.06906) | Yiming Cui, Ziqing Yang, Ting Liu | | [GitHub](https://github.com/ymcui/PERT) |\n| 2022 | [A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation](https://dl.acm.org/doi/10.1145/3522763) | Wei-Nan Zhang, Yiming Cui, Kaiyan Zhang, Yifa Wang, Qingfu Zhu, Lingzhi Li, Ting Liu | ACM TOIS |  |\n| 2022 | [Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training](https://arxiv.org/abs/2202.13654) | Ziqing Yang, Yiming Cui, Zhigang Chen, Shijin Wang | | |\n| 2022 | [CINO: A Chinese Minority Pre-trained Language Model](https://arxiv.org/abs/2202.13558) | Ziqing Yang, Zihang Xu, Yiming Cui, Baoxin Wang, Min Lin, Dayong Wu, Zhigang Chen | 
| [GitHub](https://github.com/ymcui/Chinese-Minority-PLM) |\n| 2022 | [HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity](https://aclanthology.org/2022.semeval-1.157/) | Zihang Xu, Ziqing Yang, Yiming Cui, Zhigang Chen | SemEval 2022 | [GitHub](https://github.com/GeekDream-x/SemEval2022-Task8-TonyX) |\n| 2022 | [HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection](https://aclanthology.org/2022.semeval-1.28/) | Zheng Chu, Ziqing Yang, Yiming Cui, Zhigang Chen, Ming Liu | SemEval 2022 | |\n| 2022 | [TextPruner: A Model Pruning Toolkit for Pre-trained Language Models](https://aclanthology.org/2022.acl-demo.4) | Ziqing Yang, Yiming Cui, Zhigang Chen | ACL 2022 Demo | [GitHub](https://github.com/airaria/TextPruner) |\n| 2022 | [Interactive Gated Decoder for Machine Reading Comprehension](https://dl.acm.org/doi/10.1145/3501399) | Yiming Cui, Wanxiang Che, Ziqing Yang, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu | ACM TALLIP |  |\n| 2021 | [IFlyEA: A Chinese Essay Assessment System with Automated Rating, Review Generation, and Recommendation](https://aclanthology.org/2021.acl-demo.29/) | Jiefu Gong, Xiao Hu, Wei Song, Ruiji Fu, Zhichao Sheng, Bo Zhu, Shijin Wang, Ting Liu | ACL 2021 Demo |  |\n| 2021 | [Dynamic Connected Networks for Chinese Spelling Check](https://aclanthology.org/2021.findings-acl.216/) | Baoxin Wang, Wanxiang Che, Dayong Wu, Shijin Wang, Guoping Hu, Ting Liu | Findings of ACL 2021 | |\n| 2021 | [Various Legal Factors Extraction Based on Machine Reading Comprehension](https://link.springer.com/chapter/10.1007/978-3-030-88189-4_2) | Beichen Wang, Ziyue Wang, Baoxin Wang, Dayong Wu, Zhigang Chen, Shijin Wang, Guoping Hu | CCIR 2021 | |\n| 2021 | Improving Automated Chinese Essay Scoring Methods Using Deep Linguistic Analysis (利用深层语言分析改进中文作文自动评分方法) | Si Wei, Jiefu Gong, Wei Song, Ziyao Song, Shijin Wang | Journal of Chinese Information Processing | |\n| 2021 | [Bilingual Alignment Pre-training for Zero-shot Cross-lingual Transfer](https://aclanthology.org/2021.mrqa-1.10/) | Ziqing Yang, Wentao Ma, Yiming Cui, Jiani 
Ye, Wanxiang Che, Shijin Wang | MRQA 2021 |  |\n| 2021 | [Adversarial Training for Machine Reading Comprehension with Virtual Embeddings](https://aclanthology.org/2021.starsem-1.30/) | Ziqing Yang, Yiming Cui, Chenglei Si, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu | *SEM 2021 |  |\n| 2021 | [Pre-Training with Whole Word Masking for Chinese BERT](http://ieeexplore.ieee.org/document/9599397) | Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang | IEEE/ACM TASLP | [GitHub1](https://github.com/ymcui/Chinese-BERT-wwm/), [GitHub2](https://github.com/ymcui/Chinese-PreTrained-XLNet) |\n| 2021 | [Benchmarking Robustness of Machine Reading Comprehension Models](https://arxiv.org/abs/2004.14004) | Chenglei Si, Ziqing Yang, Yiming Cui, Wentao Ma, Ting Liu, Shijin Wang | Findings of ACL 2021 | [GitHub](https://github.com/NoviScl/AdvRACE) |\n| 2020 | [A Sentence Cloze Dataset for Chinese Machine Reading Comprehension](https://arxiv.org/abs/2004.03116) | Yiming Cui, Ting Liu, Ziqing Yang, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang, Guoping Hu | COLING 2020 | [GitHub](http://hfl-rc.github.io/cmrc2019/) |\n| 2020 | [CharBERT: Character-aware Pre-trained Language Model](https://arxiv.org/abs/2011.01513) | Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang, Guoping Hu | COLING 2020 | [GitHub](https://github.com/wtma/CharBERT) |\n| 2020 | [Revisiting Pre-Trained Models for Chinese Natural Language Processing](https://arxiv.org/abs/2004.13922) | Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu | Findings of EMNLP 2020 | [GitHub](https://github.com/ymcui/MacBERT) |\n| 2020 | [Is Graph Structure Necessary for Multi-hop Question Answering?](https://arxiv.org/abs/2004.03096) | Nan Shao, Yiming Cui, Ting Liu, Shijin Wang, Guoping Hu | EMNLP 2020 | - |\n| 2020 | [TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing](https://www.aclweb.org/anthology/2020.acl-demos.2/) | Ziqing Yang, Yiming Cui, Zhipeng Chen, 
Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu | ACL 2020 Demo | [GitHub](https://github.com/airaria/TextBrewer) |\n| 2020 | [Conversational Word Embedding for Retrieval-based Dialog System](https://www.aclweb.org/anthology/2020.acl-main.127/) | Wentao Ma, Yiming Cui, Ting Liu, Dong Wang, Shijin Wang, Guoping Hu | ACL 2020 | [GitHub](https://github.com/wtma/PR-Embedding) |\n| 2020 | [Discriminative Sentence Modeling for Story Ending Prediction](https://aaai.org/ojs/index.php/AAAI/article/view/6260) | Yiming Cui, Wanxiang Che, Wei-Nan Zhang, Ting Liu, Shijin Wang, Guoping Hu | AAAI 2020 | - |\n| 2019 | [Cross-Lingual Machine Reading Comprehension](https://www.aclweb.org/anthology/D19-1169/) | Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu | EMNLP 2019 | [GitHub](https://github.com/ymcui/Cross-Lingual-MRC) |\n| 2019 | [A Span-Extraction Dataset for Chinese Machine Reading Comprehension](https://www.aclweb.org/anthology/D19-1600/) | Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu | EMNLP 2019 | [GitHub](https://github.com/ymcui/cmrc2018) |\n| 2019 | [IFlyLegal: A Chinese Legal System for Consultation, Law Searching, and Document Analysis](https://www.aclweb.org/anthology/D19-3017/) | Ziyue Wang, Baoxin Wang, Xingyi Duan, Dayong Wu, Shijin Wang, Guoping Hu, Ting Liu | EMNLP 2019 Demo | - |\n| 2019 | [TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots](https://www.aclweb.org/anthology/K19-1069/) | Wentao Ma, Yiming Cui, Nan Shao, Su He, Wei-Nan Zhang, Ting Liu, Shijin Wang, Guoping Hu | CoNLL 2019 | [GitHub](https://github.com/wtma/TripleNet) |\n| 2019 | [Improving Machine Reading Comprehension via Adversarial Training](https://arxiv.org/abs/1911.03614) | Ziqing Yang, Yiming Cui, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu | - | - |\n| 2019 | [Contextual Recurrent Units for Cloze-style Reading Comprehension](https://arxiv.org/abs/1911.05960) | Yiming 
Cui, Wei-Nan Zhang, Wanxiang Che, Ting Liu, Zhipeng Chen, Shijin Wang, Guoping Hu | - | - |\n| 2019 | [CJRC: A Reliable Human-Annotated Benchmark DataSet for Chinese Judicial Reading Comprehension](https://arxiv.org/abs/1912.09156) | Xingyi Duan, Baoxin Wang, Ziyue Wang, Wentao Ma, Yiming Cui, Dayong Wu, Shijin Wang, Ting Liu, Tianxiang Huo, Zhen Hu, Heng Wang, Zhiyuan Liu | CCL 2019 | [GitHub](https://github.com/china-ai-law-challenge/CAIL2019) |\n| 2019 | [Convolutional Spatial Attention Model for Reading Comprehension with Multiple-Choice Questions](https://arxiv.org/abs/1811.08610) | Zhipeng Chen, Yiming Cui, Wentao Ma, Shijin Wang, Guoping Hu | AAAI 2019 | - |\n| 2018 | [Disconnected Recurrent Neural Networks for Text Categorization](https://www.aclweb.org/anthology/P18-1215/) | Baoxin Wang | ACL 2018 | - |\n| 2018 | [HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension](https://arxiv.org/abs/1803.05655) | Zhipeng Chen, Yiming Cui*, Wentao Ma, Shijin Wang, Ting Liu, Guoping Hu | - | - |\n| 2018 | [Dataset for the First Evaluation on Chinese Machine Reading Comprehension](https://www.aclweb.org/anthology/L18-1431/) | Yiming Cui, Ting Liu, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu | LREC 2018 | [GitHub](https://github.com/ymcui/cmrc2017) |\n| 2018 | [Chinese Grammatical Error Diagnosis using Statistical and Prior Knowledge driven Features with Probabilistic Ensemble Enhancement](https://www.aclweb.org/anthology/W18-3707/) | Ruiji Fu, Zhengqi Pei, Jiefu Gong, Wei Song, Dechuan Teng, Wanxiang Che, Shijin Wang, Guoping Hu, Ting Liu | NLP-TEA@ACL 2018 | - |\n| 2017 | [Identifying Beautiful Sentences for Automated Essay Scoring (面向作文自动评分的优美句识别)](http://jcip.cipsc.org.cn/CN/article/downloadArticleFile.do?attachType=PDF\u0026id=2586) | Ruiji Fu, Dong Wang, Shijin Wang, Guoping Hu, Ting Liu | Journal of Chinese Information Processing | - |\n| 2017 | [Attention-over-Attention Neural Networks for Reading Comprehension](https://www.aclweb.org/anthology/P17-1055/) | Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, Guoping Hu | ACL 2017 | - 
|\n| 2017 | [Generating and Exploiting Large-scale Pseudo Training Data for Zero Pronoun Resolution](https://www.aclweb.org/anthology/P17-1010/) | Ting Liu, Yiming Cui, Qingyu Yin, Wei-Nan Zhang, Shijin Wang, Guoping Hu | ACL 2017 | - |\n| 2016 | [Consensus Attention-based Neural Networks for Chinese Reading Comprehension](https://www.aclweb.org/anthology/C16-1167/) | Yiming Cui, Ting Liu, Zhipeng Chen, Shijin Wang, Guoping Hu | COLING 2016 | [GitHub](https://github.com/ymcui/Chinese-Cloze-RC) |\n| 2016 | [LSTM Neural Reordering Feature for Statistical Machine Translation](https://www.aclweb.org/anthology/N16-1112/) | Yiming Cui, Shijin Wang, Jianfeng Li | NAACL 2016 | - |\n\n## Follow Us\nFollow our official WeChat account to stay up to date with our latest technologies!\n\n![HFL WeChat QR code](./hfl_qrcode.jpg)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiflytek%2FHFL-Anthology","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiflytek%2FHFL-Anthology","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiflytek%2FHFL-Anthology/lists"}