https://github.com/hscspring/all4nlp
All For NLP, especially Chinese.
- Host: GitHub
- URL: https://github.com/hscspring/all4nlp
- Owner: hscspring
- License: mit
- Created: 2017-11-02T08:03:05.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2021-10-22T19:04:17.000Z (about 3 years ago)
- Last Synced: 2024-12-28T08:41:14.290Z (13 days ago)
- Topics: ai, deeplearning, machinelearning, nlp
- Language: Jupyter Notebook
- Homepage:
- Size: 63.5 MB
- Stars: 172
- Watchers: 6
- Forks: 33
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# All4NLP
## Leading Track
- [卖萌屋学术站 (Xixiaoyao academic site)](https://arxiv.xixiaoyao.cn/)
- [The NLP Index](https://index.quantumstat.com/)
- [Nature Language Tech | Synced](https://syncedreview.com/category/technology/n-nature-language-tech/)
- [Natural Language Processing | Papers With Code](https://paperswithcode.com/area/natural-language-processing/)
- [Synced (机器之心) SOTA](https://www.jiqizhixin.com/sota/tech-fields/8a0ace81-a1e8-44ec-962f-dbe351a26e37)
- [Chinese GLUE (CLUE)](https://www.cluebenchmarks.com/index.html)
- [GLUE Benchmark](https://gluebenchmark.com/tasks)
- [Google Research](https://github.com/google-research/)
- [Facebook Research](https://github.com/facebookresearch)
- [microsoft/unilm: UniLM - Unified Language Model Pre-training / Pre-training for NLP and Beyond](https://github.com/microsoft/unilm)
- [Tencent Technology Engineering (腾讯技术工程) | Synced (机器之心)](https://www.jiqizhixin.com/columns/TEG)
- [Meituan Technical Team (美团技术团队)](https://tech.meituan.com/)
- [pytorch](https://github.com/pytorch)
- [tensorflow](https://github.com/tensorflow)
## Research
- [NLP Progress](https://github.com/sebastianruder/NLP-progress)
## Framework & Toolkit
- [facebookresearch/pytext: A natural language modeling framework based on PyTorch](https://github.com/facebookresearch/pytext)
    - Deep learning NLP with PyTorch
    - Text classifiers, sequence taggers, joint intent-slot models, and contextual intent-slot models
    - C++ server example
- [zalandoresearch/flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)](https://github.com/zalandoresearch/flair)
    - NER, POS, sense disambiguation and classification
    - on top of PyTorch
- [pytorch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.](https://github.com/pytorch/fairseq)
    - Seq2Seq modeling
    - on top of PyTorch
- [BrikerMan/Kashgari: Kashgari is a Production-ready NLP Transfer learning framework for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.](https://github.com/BrikerMan/Kashgari)
    - Text labeling, classification, pre-trained embeddings
    - on top of TensorFlow
- [asyml/texar: Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow](https://github.com/asyml/texar)
    - NLP toolkit
    - on top of TensorFlow
- [stanfordnlp/stanza: Official Stanford NLP Python Library for Many Human Languages](https://github.com/stanfordnlp/stanfordnlp)
    - on top of PyTorch
    - fast, suited to production systems
- [nltk/nltk: NLTK Source](https://github.com/nltk/nltk)
    - education and research tool
    - learning and exploring NLP concepts
- [sloria/TextBlob: Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.](https://github.com/sloria/textblob)
    - on top of NLTK
    - fast prototyping
    - for applications that don't require high performance
- [spaCy · Industrial-strength Natural Language Processing in Python](https://spacy.io/)
    - fast
    - streamlined
    - production-ready
- [chartbeat-labs/textacy: NLP, before and after spaCy](https://github.com/chartbeat-labs/textacy)
### Tokenizer
- [OpenNMT/Tokenizer: Fast and customizable text tokenization library with BPE and SentencePiece support](https://github.com/OpenNMT/Tokenizer)
- [google/sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation.](https://github.com/google/sentencepiece)
- [huggingface/tokenizers: 💥Fast State-of-the-Art Tokenizers optimized for Research and Production](https://github.com/huggingface/tokenizers)
### Seq2Seq
- [OpenNMT - Open-Source Neural Machine Translation](https://opennmt.net/)
- [google-research/text-to-text-transfer-transformer: Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"](https://github.com/google-research/text-to-text-transfer-transformer)
- [tensorflow/tensor2tensor: Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.](https://github.com/tensorflow/tensor2tensor/)
### Text First
- [pytorch/text: Data loaders and abstractions for text and NLP](https://github.com/pytorch/text)
- [tensorflow/text: Making text a first-class citizen in TensorFlow.](https://github.com/tensorflow/text)
- [textpipe/textpipe: Textpipe: clean and extract metadata from text](https://github.com/textpipe/textpipe)
## Task & Model
### Language Model
- **`2020 Chinese-Bert`** [CLUEbenchmark/CLUEPretrainedModels](https://github.com/CLUEbenchmark/CLUEPretrainedModels)
- **`2019 GPT2+Chinese`** [Morizeyao/GPT2-Chinese: Chinese version of GPT2 training code, using BERT tokenizer.](https://github.com/Morizeyao/GPT2-Chinese)
- **`2019 Bert-wwm`** [ymcui/Chinese-BERT-wwm: Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)](https://github.com/ymcui/Chinese-BERT-wwm)
- **`2019 Toolkit`** [huggingface/pytorch-transformers: 👾 A library of state-of-the-art pretrained models for Natural Language Processing (NLP)](https://github.com/huggingface/pytorch-transformers)
- **`2019 MASK`** [google-research/bert: TensorFlow code and pre-trained models for BERT](https://github.com/google-research/bert)
- **`2019 Permutation`** [zihangdai/xlnet: XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://github.com/zihangdai/xlnet)
- **`2019 MultiTask`** [PaddlePaddle/ERNIE: An Implementation of ERNIE For Language Understanding](https://github.com/PaddlePaddle/ERNIE)
- **`2019 Attention`** [kimiyoung/transformer-xl](https://github.com/kimiyoung/transformer-xl)
- **`2019 LM`** [openai/gpt-2: Code for the paper "Language Models are Unsupervised Multitask Learners"](https://github.com/openai/gpt-2)
- **`2018 TwoLMs`** [ELMo: Deep contextualized word representations](https://allennlp.org/elmo)
- **`2018 Co-occurrence`** [stanfordnlp/GloVe: GloVe model for distributed word representation](https://github.com/stanfordnlp/GloVe)
- **`2019`** [facebookresearch/fastText: Library for fast text representation and classification.](https://github.com/facebookresearch/fastText/)
- **`2019 Word2vec`** [Embedding/Chinese-Word-Vectors: 100+ Chinese Word Vectors 上百种预训练中文词向量](https://github.com/Embedding/Chinese-Word-Vectors)
- **`2018 Word2vec`** [Chinese-Word-Vectors](https://github.com/Embedding/Chinese-Word-Vectors)
- **`2018 LSTM`** [Recurrent Neural Networks | TensorFlow](https://www.tensorflow.org/tutorials/sequences/recurrent#language_modeling)
- **`2013`** [Google Code Archive - Long-term storage for Google Code Project Hosting.](https://code.google.com/archive/p/word2vec/)
### Text Generation
- **`2020 Toolkit`** [RUCAIBox/TextBox: TextBox is an open-source library for building text generation system.](https://github.com/RUCAIBox/TextBox)
- **`2020 Awesome`** [tokenmill/awesome-nlg: A curated list of resources dedicated to Natural Language Generation (NLG)](https://github.com/tokenmill/awesome-nlg)
- **`2018 Benchmark`** [geek-ai/Texygen: A text generation benchmarking platform](https://github.com/geek-ai/Texygen)
- **`2018 RNN`** [docs/text_generation.ipynb at master · tensorflow/docs](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/sequences/text_generation.ipynb)
- **`2019 Toolkit on top of TF`** [asyml/texar: Toolkit for Text Generation and Beyond](https://github.com/asyml/texar)
### Classification
- **`Collection`** [brightmart/text_classification: all kinds of text classification models and more with deep learning](https://github.com/brightmart/text_classification)
### NLU & IE
- **`2019 Framework`** [RasaHQ/rasa_nlu: 💬 Open source library for natural language understanding and machine learning-based dialogue management. - All things around intent classification, entity extraction and action predictions - DIY NLP and chatbot framework.](https://github.com/RasaHQ/rasa_nlu)
- **`2018 Chi`** [crownpku/Rasa_NLU_Chi: Turn Chinese natural language into structured data (Chinese natural language understanding)](https://github.com/crownpku/Rasa_NLU_Chi)
- **`2019 Toolkit`** [snipsco/snips-nlu: Snips Python library to extract meaning from text](https://github.com/snipsco/snips-nlu)
### QA
- **`2020 Toolkit`** [RUCAIBox/CRSLab: CRSLab is an open-source toolkit for building Conversational Recommender System (CRS).](https://github.com/RUCAIBox/CRSLab)
- **`2018`** [5hirish/adam_qas: ADAM - A Question Answering System. Inspired from IBM Watson](https://github.com/5hirish/adam_qas)
### Similarity
- **`2019 Sentence`** [UKPLab/sentence-transformers: Sentence Embeddings with BERT & XLNet](https://github.com/UKPLab/sentence-transformers)
- **`2019 Sentence`** [hanxiao/bert-as-service: Mapping a variable-length sentence to a fixed-length vector using BERT model](https://github.com/hanxiao/bert-as-service)
- **`2018 Sentence`** [explosion/sense2vec: 🦆 Use NLP to go beyond vanilla word2vec](https://github.com/explosion/sense2vec)
- **`2019 Sentence`** [gensim: models.doc2vec – Doc2vec paragraph embeddings](https://radimrehurek.com/gensim/models/doc2vec.html)
- **`2014 Sentence`** [klb3713/sentence2vec: Tools for mapping a sentence with arbitrary length to vector space](https://github.com/klb3713/sentence2vec)
- **`2019 Doc+Sentence+Word`** [gensim: Topic modelling for humans](https://radimrehurek.com/gensim/)
- **`2019 MinHash`** [ekzhu/datasketch: MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++](https://github.com/ekzhu/datasketch)
- **`2019 LevenshteinDistance`** [ztane/python-Levenshtein: The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity](https://github.com/ztane/python-Levenshtein)
- **`2018 Graph`** [caesar0301/graphsim: Graph similarity algorithms based on NetworkX.](https://github.com/caesar0301/graphsim)
### Pinyin
- **`2019 Pinyin`** [mozillazg/python-pinyin: Convert Chinese characters to pinyin (pypinyin)](https://github.com/mozillazg/python-pinyin)
### Visualization
- **`2020 Explain`** [jalammar/ecco: Visualize and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2).](https://github.com/jalammar/ecco)
- **`2019 Word`** [JasonKessler/scattertext: Beautiful visualizations of how language differs among document types.](https://github.com/JasonKessler/scattertext)
- **`2019 Bert GPT`** [jessevig/bertviz: Tool for visualizing attention in the Transformer model (BERT and OpenAI GPT-2)](https://github.com/jessevig/bertviz)
- **`2019 MLC`** [marcotcr/lime: Lime: Explaining the predictions of any machine learning classifier](https://github.com/marcotcr/lime)
- **`2019 Graph Visualization Framework`** [antvis/G6: ♾ A Graph Visualization Framework in JavaScript](https://github.com/antvis/g6)
- **`2017 Neo4j D3`** [eisman/neo4jd3: Neo4j graph visualization using D3.js](https://github.com/eisman/neo4jd3)
- **`2019 Neo4j browser`** [neo4j-contrib/neovis.js: Neo4j + vis.js = neovis.js. Graph visualizations in the browser with data from Neo4j.](https://github.com/neo4j-contrib/neovis.js)
- **`2019 Neo4j 3D`** [jexp/neo4j-3d-force-graph: Experiments with Neo4j & 3d-force-graph https://github.com/vasturiano/3d-force-graph](https://github.com/jexp/neo4j-3d-force-graph)
- **`2019 Interactive Graphviz`** [magjac/graphviz-visual-editor: A web application for interactive visual editing of Graphviz graphs described in the DOT language.](https://github.com/magjac/graphviz-visual-editor)
- **`2019 graphviz Python`** [mapio/GraphvizAnim: A tool to create animated graph visualizations, based on graphviz.](https://github.com/mapio/GraphvizAnim)
### Readability
- **`2019 Kinds of indexes`** [shivam5992/textstat: python package to calculate readability statistics of a text object - paragraphs, sentences, articles.](https://github.com/shivam5992/textstat)
- **`2019 in spaCy`** [mholtzscher/spacy_readability: spaCy pipeline component for adding text readability meta data to Doc objects.](https://github.com/mholtzscher/spacy_readability)
### Translation
- **`2019 XLM`** [facebookresearch/XLM: PyTorch original implementation of Cross-lingual Language Model Pretraining.](https://github.com/facebookresearch/XLM)
- **`2018 Microsoft Based on Phrase`** [Microsoft/NPMT: Towards Neural Phrase-based Machine Translation](https://github.com/Microsoft/NPMT)
- **`2019 Google Based on Seq2Seq and Attention`** [tensorflow/nmt: TensorFlow Neural Machine Translation Tutorial](https://github.com/tensorflow/nmt)
- **`2019 Google Based on Pure Attention`** [models/official/transformer at master · tensorflow/models](https://github.com/tensorflow/models/tree/master/official/transformer)
- **`2019 Facebook Based on CNN`** [pytorch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.](https://github.com/pytorch/fairseq)
- **`2019 Facebook Based on Unsupervised`** [facebookresearch/UnsupervisedMT: Phrase-Based & Neural Unsupervised Machine Translation](https://github.com/facebookresearch/UnsupervisedMT)
- **`2019 DeepL Based on CNN (Not Open Source)`** [DeepL Translator](https://www.deepl.com/translator) DeepL's CNN-based translation tool
- **`2019 OpenNMT`** [OpenNMT/OpenNMT: Open Source Neural Machine Translation](https://github.com/OpenNMT/OpenNMT)
### Style Transfer
- **`2020 Awesome`** [fuzhenxin/Style-Transfer-in-Text: Paper List for Style Transfer in Text](https://github.com/fuzhenxin/Style-Transfer-in-Text)
## Tricks
- **`2020`** [17 methods you need to master to make PyTorch training faster](https://mp.weixin.qq.com/s/hvlr098BxWOf6C8zO0UvYg)
- **`2017`** [Deep Learning for NLP Best Practices](https://ruder.io/deep-learning-nlp-best-practices/index.html)
## Dataset
- [datasets](https://github.com/huggingface/datasets/tree/master/docs)
- [Chinese task benchmark evaluation (CLUE)](https://github.com/CLUEbenchmark/CLUE)
- [Chinese pre-training corpus](https://github.com/CLUEbenchmark/CLUECorpus2020)
- [cluebenchmarks.com/dataSet_search.html](https://www.cluebenchmarks.com/index.html)
- [Offline Baidu Baike download (2012 illustrated edition)](https://www.pdawiki.com/forum/forum.php?mod=viewthread&tid=9599&highlight=百度百科)
- [Baidu Baike 2012 illustrated edition](https://pan.baidu.com/s/1epjGg)
- [The most complete database of classical Chinese poetry](https://github.com/chinese-poetry/chinese-poetry)
- [Kinds of Resources](https://github.com/fighting41love/funNLP)
- [Chinese diachronic corpus](https://github.com/liuhuanyong/ChineseDiachronicCorpus)
- [Chinese natural language processing datasets](https://github.com/InsaneLife/ChineseNLPCorpus)
## Learn Here
- [google/trax: Trax — your path to advanced deep learning](https://github.com/google/trax)
- [tensorflow/models](https://github.com/tensorflow/models)
- [Transformers](https://github.com/huggingface/transformers)
- [OpenNMT/OpenNMT-py](https://github.com/OpenNMT/OpenNMT-py)
- [OpenNMT/OpenNMT-tf](https://github.com/OpenNMT/OpenNMT-tf)
- [microsoft/nlp-recipes: Natural Language Processing Best Practices & Examples](https://github.com/microsoft/nlp-recipes)
## Experts
- [Michael Collins](http://www.cs.columbia.edu/~mcollins/), [Michael Collins - Google Scholar Citations](https://scholar.google.com/citations?user=DxoenfgAAAAJ&hl=en) ☆
- [Terry Koo](http://people.csail.mit.edu/maestro/), [Terry Koo - Google Scholar Citations](https://scholar.google.com/citations?user=cSTLkv8AAAAJ&hl=en)
- [Percy Liang](https://cs.stanford.edu/~pliang/), [Percy Liang - Google Scholar Citations](https://scholar.google.com/citations?user=pouyVyUAAAAJ&hl=en)
- [Luke Zettlemoyer | Computer Science & Engineering](https://www.cs.washington.edu/people/faculty/lsz), [Luke Zettlemoyer - Google Scholar Citations](https://scholar.google.com/citations?user=UjpbO6IAAAAJ&hl=en)
- [Jason Eisner - Home Page (JHU)](https://www.cs.jhu.edu/~jason/), [Jason Eisner - Google Scholar Citations](https://scholar.google.com/citations?user=tjb2UccAAAAJ&hl=en) ☆
- [Noah Smith](https://homes.cs.washington.edu/~nasmith/), [Noah A. Smith - Google Scholar Citations](https://scholar.google.com/citations?user=TjdFs3EAAAAJ&hl=en), [Noah A. Smith - Google Scholar Citations (sorted by date)](https://scholar.google.com/citations?hl=en&user=TjdFs3EAAAAJ&view_op=list_works&sortby=pubdate)
- [David Yarowsky](http://www.cs.jhu.edu/~yarowsky/), [David Yarowsky - Google Scholar Citations](https://scholar.google.com/citations?user=gaO-vS4AAAAJ&hl=en)
- [Dan Jurafsky - Home Page](https://web.stanford.edu/~jurafsky/), [Dan Jurafsky - Google Scholar Citations](https://scholar.google.com/citations?user=uZg9l58AAAAJ&hl=en) ☆
- [Christopher Manning, Stanford NLP](https://nlp.stanford.edu/manning/), [Christopher D Manning - Google Scholar Citations](https://scholar.google.com/citations?user=1zmDOdwAAAAJ&hl=en) ☆
- [Richard Socher - Home Page](https://www.socher.org/), [Richard Socher - Google Scholar Citations](https://scholar.google.com/citations?user=FaOcyfMAAAAJ&hl=en) ☆
- [Dan Klein's Home Page](https://people.eecs.berkeley.edu/~klein/), [The Berkeley NLP Group](http://nlp.cs.berkeley.edu/publications.shtml) ☆
- [Dan Roth - Main Page](http://l2r.cs.uiuc.edu/), [Dan Roth - Google Scholar Citations](https://scholar.google.com/citations?user=E-bpPWgAAAAJ&hl=en) ☆
- [ChengXiang Zhai - Home Page](http://czhai.cs.illinois.edu/), [ChengXiang Zhai - Google Scholar Citations](https://scholar.google.com/citations?user=YU-baPIAAAAJ&hl=en)
- [Eugene Charniak's Home Page](http://cs.brown.edu/people/echarnia/), [Eugene Charniak - Google Scholar Citations](https://scholar.google.com/citations?user=_XHudx4AAAAJ&hl=en)
- [Joakim Nivre's Home Page](https://cl.lingfil.uu.se/~nivre/), [Joakim Nivre - Google Scholar Citations](https://scholar.google.co.uk/citations?user=lLBHtFUAAAAJ&hl=en) ☆
- [Philipp Koehn](http://www.cs.jhu.edu/~phi/), [Philipp Koehn - Google Scholar Citations](https://scholar.google.com/citations?user=OsIZgIYAAAAJ&hl=en)
- [James H. Martin](http://www.cs.colorado.edu/~martin/), [James H. Martin - Google Scholar Citations](https://scholar.google.com/citations?user=ZVxO6IIAAAAJ&hl=en)
- [Julia Hirschberg](http://www.cs.columbia.edu/~julia/), [Julia Hirschberg - Google Scholar Citations](https://scholar.google.com/citations?user=Qrd7FCoAAAAJ&hl=en)
- [Fernando Pereira – Google AI](https://ai.google/research/people/author1092), [Fernando Pereira - Google Scholar Citations](https://scholar.google.com/citations?user=qWDmIgIAAAAJ&hl=en) ☆
- [Ryan McDonald](https://ryanmcd.github.io/), [Ryan McDonald - Google Scholar Citations](https://scholar.google.com/citations?user=i05BhUgAAAAJ&hl=en)
- [Slav Petrov - Слав Петров](http://www.petrovi.de/), [Slav Petrov - Google Scholar Citations](https://scholar.google.com/citations?user=ipb9-GEAAAAJ&hl=en) ☆
- [Kenneth Church HomePage](http://www.cs.jhu.edu/~kchurch/), [Kenneth Ward Church - Google Scholar Citations](https://scholar.google.com/citations?user=E6aqGvYAAAAJ&hl=en)