https://github.com/eagle705/awesome-nlp-note

# awesome-nlp-note

[![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

A curated list of resources dedicated to Natural Language Processing (papers, blogs, notes, etc.).
*Note*: Some materials, such as Python tips, are not directly related to NLP.

#### README Reference:
- https://github.com/keon/awesome-nlp
- https://github.com/papower1/Awesome-Korean-NLP-Papers
- https://github.com/Kyubyong/nlp_tasks

## Contents
* [Blogs & YouTube](#blogs--youtube)
* [GitHub](#github)
* [Research Summaries and Trends](#research-summaries-and-trends)
* [Environment](#environment)
* [NLP in Korean](#nlp-in-korean)
* [Datasets](#datasets)
* [Tutorials](#tutorials)
* [Videos and Online Courses](#videos-and-online-courses)
* [Libraries](#libraries)
* [Annotation Tools](#annotation-tools)

## Blogs & YouTube
- [Pingpong BERT](https://blog.pingpong.us/dialog-bert-pretrain/?fbclid=IwAR3UQ1VBnkf8Fcpqa5kzGknrs2PySQJCB97v0UOHUvOk6CiYCfASo4Tosr8)
- [Pingpong word spacing](https://blog.pingpong.us/spacing/?fbclid=IwAR21WUYVaJZ8HijcyGx6tfIiw_oTrfhrUXUYBGypNCG0qtNhlQmJQByJzZI)
- [dsindex's blog](https://dsindex.github.io/)
- [Kangwon University's NLP course in Korean](http://cs.kangwon.ac.kr/~leeck/NLP/)
- [Python keyword arguments *](https://sjquant.tistory.com/31?fbclid=IwAR3rCNBXC5-wNCZbwW0XV9AWuAKjFgLFlmPdd73f9hpOqNtysx60VzqIL54)
- [Deep learning glossary](http://www.wildml.com/deep-learning-glossary/)
- [How to write for arXiv](https://brunch.co.kr/@gimmesilver/34)
- [Kyubyong Park's Deep Learning Career FAQ](https://github.com/Kyubyong/dl_career_faq?fbclid=IwAR3VDb4oHVr82WGw9H4WN9k34BJ906u8Oah3LGUawu24L0pBMEbeKAO301o)
- [Algorithm YouTube Channel](https://www.youtube.com/user/damazzang/videos)
- [Structuring your first NLP project](https://tykimos.github.io/warehouse/2019-7-4-ISS_2nd_Deep_Learning_Conference_All_Together_aisolab_file.pdf)
- [Pypapago development notes](https://beomi.github.io/2019/07/08/Papago-API-with-Python/)
- [Copying an Anaconda environment](https://hiseon.me/python/anaconda-env-export/)
- [Things a startup developer always does when logging into a Linux server](https://www.mimul.com/blog/linux-server-operations/?fbclid=IwAR0VJR1YvkLy_xXTw9ltaEVsOvakysUBC4WRceVheKW5T2q-BH2jb8GqtFA)

## GitHub
- [Chatbot convai2 (with retrieval via elastic)](https://github.com/atselousov/transformer_chatbot)
- [DL dev to production](https://github.com/alirezadir/Production-Level-Deep-Learning)
- [NL to SQL by BERT](https://github.com/guotong1988/NL2SQL-BERT)
- [Jejueo translation and speech synthesis (Kyubyong Park)](https://github.com/kakaobrain/jejueo)
- [beam search + nlp_made_easy (Kyubyong Park)](https://github.com/Kyubyong/nlp_made_easy/blob/master/Beam%20Decoding.ipynb)
- [pypapago nmt lib](https://github.com/Beomi/pypapago)
- [makcedward/nlpaug (NLP & signal augmentation)](https://github.com/makcedward/nlpaug)
- [lovit's Fast Campus course, Machine Learning for NLP (GitHub)](https://github.com/lovit/fastcampus_textml_blogs)
- [Korean document -> sentence splitter (important)](https://github.com/likejazz/korean-sentence-splitter)
- [Chatspace, a word spacing model built by Pingpong](https://github.com/pingpong-ai/chatspace?fbclid=IwAR3LQCIBnRNyMMUfh3SzrYc_DMIRnSzQCVtjpSzQaXk-prpzlDTsRVqndb4)
- [Chatbot with Crawler](https://github.com/gusdnd852/Chatbot)
- [NLP RedditSota](https://github.com/RedditSota/state-of-the-art-result-for-machine-learning-problems#nlp)
- [Yandex course](https://github.com/yandexdataschool/nlp_course)
- [Hangul jamo decomposition toolkit](https://github.com/bluedisk/hangul-toolkit)
- [Python open-source chatbot RasaHQ](https://github.com/RasaHQ/rasa)
- [Customized KoNLPy](https://github.com/lovit/customized_konlpy)
- [Yongrae's PyTorch Transformer](https://github.com/dreamgonfly/Transformer-pytorch)
- [Korean NER Dataset Github](https://github.com/machinereading/KoreanNERCorpus)
- [Youngsook Song's Korean Chitchat Dataset with Sentiment](https://github.com/songys/Chatbot_data)
- [Chatbot API open source example](https://github.com/alfredfrancis/ai-chatbot-framework)
- [Awesome Python](https://github.com/JoMingyu/--Awesome-Python--)
- [Yunjey's PyTorch Tutorial](https://github.com/yunjey/pytorch-tutorial)
- [Developer technical interview notes](https://github.com/JaeYeopHan/Interview_Question_for_Beginner)
- [NER_TensorFlow_2017_HCLT](https://github.com/JudeLee19/korean_ner_tagging_challenge)
- [Gichang Lee's GitHub blog source](https://github.com/ratsgo/ratsgo.github.io)
- [Source of the GitHub blog I currently use](https://github.com/isme2n/isme2n.github.io)
- [PyTorch Wrapper, pytorch-lightning](https://github.com/williamFalcon/pytorch-lightning)
- [Pycon 2019 Tutorial GluonNLP tutorial](https://github.com/seujung/gluonnlp_tutorial?fbclid=IwAR1dVxeXYp06Zr4h4OFjL38W6enZ4SjJd27n7MSkmt4v9wKOtj9Sol5B3Es)
- [matplotlib + Hangul](https://financedata.github.io/posts/matplotlib-hangul-for-ubuntu-linux.html?fbclid=IwAR0WNVxF5cMRLUhdug10fWGdZzwZ1YES88xD4UPW4pOFSvQgovu_xf5Kb4c)
- [API-based Chatbot example](https://github.com/gusdnd852/Chatbot)
- [NLP tutorial by lyeoni](https://github.com/lyeoni/nlp-tutorial)
- [tmux settings](https://github.com/gpakosz/.tmux)
- [CRF!!! harvardnlp/pytorch-struct](https://github.com/harvardnlp/pytorch-struct)
- [RL Chatbot1](https://github.com/pochih/RL-Chatbot)
- [RL Chatbot2](https://github.com/maxbren/GO-Bot-DRL)
- [Evaluation Sentence Embedding (SentEval)](https://github.com/facebookresearch/SentEval)
- [python-mecab-ko](https://github.com/jonghwanhyeon/python-mecab-ko)
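
Several entries above deal with Hangul at the jamo (letter) level. As a rough illustration of what jamo decomposition means, here is a minimal sketch that uses only the Unicode layout of precomposed Hangul syllables. It is not the API of hangul-toolkit or any other library listed here; the function name `decompose_syllable` is hypothetical.

```python
# Sketch: split one precomposed Hangul syllable into its jamo using
# Unicode arithmetic. Names here are illustrative, not from any listed library.

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"   # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals, "" = none

HANGUL_BASE = 0xAC00  # code point of the first syllable, '가'
HANGUL_LAST = 0xD7A3  # code point of the last syllable, '힣'


def decompose_syllable(ch):
    """Return (initial, vowel, final) for a single Hangul syllable.

    The final is an empty string when the syllable has no final consonant.
    Raises ValueError for anything that is not a precomposed syllable.
    """
    code = ord(ch)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    offset = code - HANGUL_BASE
    initial = offset // (21 * 28)        # 588 syllables per initial consonant
    vowel = (offset % (21 * 28)) // 28   # 28 syllables per vowel
    final = offset % 28
    return CHOSEONG[initial], JUNGSEONG[vowel], JONGSEONG[final]


if __name__ == "__main__":
    for syllable in "한글":
        print(syllable, decompose_syllable(syllable))
```

For real text, the toolkits linked above are the better choice; this sketch rejects non-Hangul characters and standalone jamo rather than handling them.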

## Research Summaries and Trends

* [NLP-Overview](https://nlpoverview.com/) is an up-to-date overview of deep learning techniques applied to NLP, including theory, implementations, applications, and state-of-the-art results. This is a great Deep NLP Introduction for researchers.
* [NLP-Progress](https://nlpprogress.com/) tracks the progress in Natural Language Processing, including the datasets and the current state-of-the-art for the most common NLP tasks.
* [NLP's ImageNet moment has arrived](https://thegradient.pub/nlp-imagenet/)
* [ACL 2018 Highlights: Understanding Representation and Evaluation in More Challenging Settings](http://ruder.io/acl-2018-highlights/)
* [Four deep learning trends from ACL 2017. Part One: Linguistic Structure and Word Embeddings](https://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-1.html)
* [Four deep learning trends from ACL 2017. Part Two: Interpretability and Attention](https://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-2.html)
* [Highlights of EMNLP 2017: Exciting Datasets, Return of the Clusters, and More!](http://blog.aylien.com/highlights-emnlp-2017-exciting-datasets-return-clusters/)
* [Deep Learning for Natural Language Processing (NLP): Advancements & Trends](https://tryolabs.com/blog/2017/12/12/deep-learning-for-nlp-advancements-and-trends-in-2017/?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=The%20Wild%20Week%20in%20AI)
* [Survey of the State of the Art in Natural Language Generation](https://arxiv.org/abs/1703.09902)

## Environment
- [Docker](http://moducon.kr/2018/wp-content/uploads/sites/2/2018/12/leesangsoo_slide.pdf)

## NLP in Korean

[Back to Top](#contents)

### Libraries

- [KoNLPy](http://konlpy.org) - Python package for Korean natural language processing.
- [Mecab (Korean)](https://eunjeon.blogspot.com/) - C++ library for Korean NLP
- [KoalaNLP](https://koalanlp.github.io/koalanlp/) - Scala library for Korean Natural Language Processing.

### Datasets
- [Korean WordNet](http://wordnet.kaist.ac.kr/)
- [KAIST Corpus](http://semanticweb.kaist.ac.kr/home/index.php/KAIST_Corpus) - A corpus from the Korea Advanced Institute of Science and Technology in Korean.
- [Naver Sentiment Movie Corpus in Korean](https://github.com/e9t/nsmc/)
- [Chosun Ilbo archive](http://srchdb1.chosun.com/pdf/i_archive/) - dataset in Korean from one of the major newspapers in South Korea, the Chosun Ilbo.
- [NER dataset from the Korea Maritime and Ocean University NLP Lab](https://github.com/kmounlp/NER)
- [PAWS and PAWS-X: Two New Datasets to Improve Natural Language Understanding Models (Paraphrase Adversaries from Word Scrambling)](https://ai.googleblog.com/2019/10/releasing-paws-and-paws-x-two-new.html?m=1)
- [conversational-AI-datasets (English dialogue datasets)](https://github.com/PolyAI-LDN/conversational-datasets)

## Tutorials
[Back to Top](#contents)

### Videos and Online Courses
[Back to Top](#contents)

* [Intro to Artificial Intelligence](https://www.udacity.com/course/intro-to-artificial-intelligence--cs271) - Udacity course which touches upon NLP as well
* [Deep Natural Language Processing](https://github.com/oxford-cs-deepnlp-2017/lectures) - Lectures series from Oxford
* [Deep Learning for Natural Language Processing (cs224-n)](https://web.stanford.edu/class/cs224n/) - Richard Socher and Christopher Manning's Stanford Course
* [Neural Networks for NLP](http://phontron.com/class/nn4nlp2017/) - Course from Carnegie Mellon's Language Technologies Institute
* [Deep NLP Course](https://github.com/yandexdataschool/nlp_course) by Yandex Data School, covering important ideas from text embedding to machine translation including sequence modeling, language models and so on.

## Libraries

[Back to Top](#contents)

* **Python** - Python NLP Libraries | [Back to Top](#contents)

- [TextBlob](http://textblob.readthedocs.org/) - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of [Natural Language Toolkit (NLTK)](https://www.nltk.org/) and [Pattern](https://github.com/clips/pattern), and plays nicely with both :+1:
- [spaCy](https://github.com/explosion/spaCy) - Industrial strength NLP with Python and Cython :+1:
- [textacy](https://github.com/chartbeat-labs/textacy) - Higher level NLP built on spaCy
- [gensim](https://radimrehurek.com/gensim/index.html) - Python library to conduct unsupervised semantic modelling from plain text :+1:
- [scattertext](https://github.com/JasonKessler/scattertext) - Python library to produce d3 visualizations of how language differs between corpora
- [GluonNLP](https://github.com/dmlc/gluon-nlp) - A deep learning toolkit for NLP, built on MXNet/Gluon, for research prototyping and industrial deployment of state-of-the-art models on a wide range of NLP tasks.
- [AllenNLP](https://github.com/allenai/allennlp) - An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
- [PyTorch-NLP](https://github.com/PetrochukM/PyTorch-NLP) - NLP research toolkit designed to support rapid prototyping with better data loaders, word vector loaders, neural network layer representations, common NLP metrics such as BLEU
- [Rosetta](https://github.com/columbia-applied-data-science/rosetta) - Text processing tools and wrappers (e.g. Vowpal Wabbit)
- [PyNLPl](https://github.com/proycon/pynlpl) - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for [FoLiA](https://proycon.github.io/folia/), but also ARPA language models, Moses phrasetables, GIZA++ alignments.
- [jPTDP](https://github.com/datquocnguyen/jPTDP) - A toolkit for joint part-of-speech (POS) tagging and dependency parsing. jPTDP provides pre-trained models for 40+ languages.
- [BigARTM](https://github.com/bigartm/bigartm) - a fast library for topic modelling
- [Snips NLU](https://github.com/snipsco/snips-nlu) - A production ready library for intent parsing
- [Chazutsu](https://github.com/chakki-works/chazutsu) - A library for downloading and parsing standard NLP research datasets
- [Word Forms](https://github.com/gutfeeling/word_forms) - Word forms can accurately generate all possible forms of an English word
- [Multilingual Latent Dirichlet Allocation (LDA)](https://github.com/ArtificiAI/Multilingual-Latent-Dirichlet-Allocation-LDA) - A multilingual and extensible document clustering pipeline
- [NLP Architect](https://github.com/NervanaSystems/nlp-architect) - A library for exploring the state-of-the-art deep learning topologies and techniques for NLP and NLU
- [Flair](https://github.com/zalandoresearch/flair) - A very simple framework for state-of-the-art multilingual NLP built on PyTorch. Includes BERT, ELMo and Flair embeddings.
- [Kashgari](https://github.com/BrikerMan/Kashgari) - Simple, Keras-powered multilingual NLP framework, allows you to build your models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS) and text classification tasks. Includes BERT and word2vec embedding.

### Annotation Tools
- [Label Studio](https://github.com/heartexlabs/label-studio?fbclid=IwAR30j2OmVMcB-TenAczkNwwUsObi8JAOpTNxGFzrmMrJ2pd4-gg_S0D3S78) is an open-source, configurable data annotation tool. Its purpose is to enable you to label different types of data using the most convenient interface with a standardized output format.
- [brat](https://brat.nlplab.org/) - brat rapid annotation tool is an online environment for collaborative text annotation
- [LIDA: Lightweight Interactive Dialogue Annotator (in EMNLP 2019)](https://github.com/Wluper/lida) - LIDA is an open source dialogue annotation system which supports the full pipeline of dialogue annotation from dialogue / turn segmentation from raw text
- [GATE](https://gate.ac.uk/overview.html) - General Architecture for Text Engineering is 15+ years old, free and open source
- [Anafora](https://github.com/weitechen/anafora) is a free, open-source, web-based raw text annotation tool
- [doccano](https://github.com/chakki-works/doccano) - doccano is free, open-source, and provides annotation features for text classification, sequence labeling and sequence to sequence
- [tagtog](https://www.tagtog.net/), costs $
- [prodigy](https://prodi.gy/) is an annotation tool powered by active learning, costs $
- [LightTag](https://lighttag.io) - Hosted and managed text annotation tool for teams, costs $
- [rstWeb](https://corpling.uis.georgetown.edu/rstweb/info/) - open source local or online tool for discourse tree annotations
- [GitDox](https://corpling.uis.georgetown.edu/gitdox/) - open source server annotation tool with GitHub version control and validation for XML data and collaborative spreadsheet grids