https://github.com/eagle705/awesome-nlp-note

A curated list of resources dedicated to NLP (paper, blogs, note and etc)
Host: GitHub
URL: https://github.com/eagle705/awesome-nlp-note
Owner: eagle705
License: cc0-1.0
Created: 2019-08-09T08:34:48.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-11-30T14:27:50.000Z (about 6 years ago)
Last Synced: 2024-11-19T14:02:08.656Z (about 1 year ago)
Homepage:
Size: 19.5 KB
Stars: 13
Watchers: 3
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: contributing.md
- License: LICENSE
Awesome Lists containing this project

ultimate-awesome - awesome-nlp-note - A curated list of resources dedicated to NLP (paper, blogs, note and etc). (Programming Language Lists / Python Lists)

# awesome-nlp-note

[![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

A curated list of resources dedicated to Natural Language Processing (papers, blogs, notes, and more).

*Note*: Some materials, such as Python tips, are not directly related to NLP.

#### README References

- https://github.com/keon/awesome-nlp

- https://github.com/papower1/Awesome-Korean-NLP-Papers

- https://github.com/Kyubyong/nlp_tasks

## Contents

* [Blogs & YouTube](#blogs--youtube)

* [GitHub](#github)

* [Research Summaries and Trends](#research-summaries-and-trends)

* [Environment](#environment)

* [NLP in Korean](#nlp-in-korean)

  * [Datasets](#datasets)

* [Tutorials](#tutorials)

  * [Videos and Courses](#videos-and-online-courses)

* [Libraries](#libraries)

* [Annotation Tools](#annotation-tools)

## Blogs & YouTube

- [Pingpong: BERT](https://blog.pingpong.us/dialog-bert-pretrain/?fbclid=IwAR3UQ1VBnkf8Fcpqa5kzGknrs2PySQJCB97v0UOHUvOk6CiYCfASo4Tosr8)

- [Pingpong: word spacing](https://blog.pingpong.us/spacing/?fbclid=IwAR21WUYVaJZ8HijcyGx6tfIiw_oTrfhrUXUYBGypNCG0qtNhlQmJQByJzZI)

- [dsindex's blog](https://dsindex.github.io/)

- [Kangwon University's NLP course in Korean](http://cs.kangwon.ac.kr/~leeck/NLP/)

- [Python keyword arguments (*)](https://sjquant.tistory.com/31?fbclid=IwAR3rCNBXC5-wNCZbwW0XV9AWuAKjFgLFlmPdd73f9hpOqNtysx60VzqIL54)

- [Deep learning glossary](http://www.wildml.com/deep-learning-glossary/)

- [arXiv writing guide](https://brunch.co.kr/@gimmesilver/34)

- [Kyubyong Park's Deep Learning Career FAQ](https://github.com/Kyubyong/dl_career_faq?fbclid=IwAR3VDb4oHVr82WGw9H4WN9k34BJ906u8Oah3LGUawu24L0pBMEbeKAO301o)

- [Algorithm YouTube Channel](https://www.youtube.com/user/damazzang/videos)

- [Structuring your first NLP project](https://tykimos.github.io/warehouse/2019-7-4-ISS_2nd_Deep_Learning_Conference_All_Together_aisolab_file.pdf)

- [Pypapago development notes](https://beomi.github.io/2019/07/08/Papago-API-with-Python/)

- [Copying an Anaconda environment](https://hiseon.me/python/anaconda-env-export/)

- [Things a startup developer always does after logging in to a Linux server](https://www.mimul.com/blog/linux-server-operations/?fbclid=IwAR0VJR1YvkLy_xXTw9ltaEVsOvakysUBC4WRceVheKW5T2q-BH2jb8GqtFA)

## GitHub

- [Chatbot convai2 (with retrieval via elastic)](https://github.com/atselousov/transformer_chatbot)

- [DL dev to production](https://github.com/alirezadir/Production-Level-Deep-Learning)

- [NL to SQL by BERT](https://github.com/guotong1988/NL2SQL-BERT)

- [Jejueo (Jeju language) translation and speech synthesis (Kyubyong Park)](https://github.com/kakaobrain/jejueo)

- [Beam search + nlp_made_easy (Kyubyong Park)](https://github.com/Kyubyong/nlp_made_easy/blob/master/Beam%20Decoding.ipynb)

- [pypapago nmt lib](https://github.com/Beomi/pypapago)

- [makcedward/nlpaug (NLP & signal augmentation)](https://github.com/makcedward/nlpaug)

- [lovit's Fast Campus course repo, Machine Learning for NLP](https://github.com/lovit/fastcampus_textml_blogs)

- [Korean document -> sentence splitter (important)](https://github.com/likejazz/korean-sentence-splitter)

- [Chatspace, a word-spacing model built by Pingpong](https://github.com/pingpong-ai/chatspace?fbclid=IwAR3LQCIBnRNyMMUfh3SzrYc_DMIRnSzQCVtjpSzQaXk-prpzlDTsRVqndb4)

- [Chatbot with Crawler](https://github.com/gusdnd852/Chatbot)

- [NLP RedditSota](https://github.com/RedditSota/state-of-the-art-result-for-machine-learning-problems#nlp)

- [Yandex course](https://github.com/yandexdataschool/nlp_course)

- [Hangul jamo decomposition toolkit](https://github.com/bluedisk/hangul-toolkit)

- [RasaHQ, an open-source Python chatbot](https://github.com/RasaHQ/rasa)

- [Customized KoNLPy](https://github.com/lovit/customized_konlpy)

- [Yongrae's PyTorch Transformer](https://github.com/dreamgonfly/Transformer-pytorch)

- [Korean NER Dataset Github](https://github.com/machinereading/KoreanNERCorpus)

- [Youngsook Song's Korean Chitchat Dataset with Sentiment](https://github.com/songys/Chatbot_data)

- [Chatbot API open source example](https://github.com/alfredfrancis/ai-chatbot-framework)

- [Awesome Python](https://github.com/JoMingyu/--Awesome-Python--)

- [Yunjey's PyTorch Tutorial](https://github.com/yunjey/pytorch-tutorial)

- [Developer technical interview notes](https://github.com/JaeYeopHan/Interview_Question_for_Beginner)

- [NER_TensorFlow_2017_HCLT](https://github.com/JudeLee19/korean_ner_tagging_challenge)

- [Gichang Lee's GitHub blog source](https://github.com/ratsgo/ratsgo.github.io)

- [Source of the GitHub blog I currently use](https://github.com/isme2n/isme2n.github.io)

- [PyTorch Wrapper, pytorch-lightning](https://github.com/williamFalcon/pytorch-lightning)

- [PyCon 2019 GluonNLP tutorial](https://github.com/seujung/gluonnlp_tutorial?fbclid=IwAR1dVxeXYp06Zr4h4OFjL38W6enZ4SjJd27n7MSkmt4v9wKOtj9Sol5B3Es)

- [matplotlib + Hangul](https://financedata.github.io/posts/matplotlib-hangul-for-ubuntu-linux.html?fbclid=IwAR0WNVxF5cMRLUhdug10fWGdZzwZ1YES88xD4UPW4pOFSvQgovu_xf5Kb4c)

- [API-based Chatbot example](https://github.com/gusdnd852/Chatbot)

- [NLP tutorial by lyeoni](https://github.com/lyeoni/nlp-tutorial)

- [tmux settings](https://github.com/gpakosz/.tmux)

- [CRF!!! harvardnlp/pytorch-struct](https://github.com/harvardnlp/pytorch-struct)

- [RL Chatbot1](https://github.com/pochih/RL-Chatbot)

- [RL Chatbot2](https://github.com/maxbren/GO-Bot-DRL)

- [Evaluating sentence embeddings (SentEval)](https://github.com/facebookresearch/SentEval)

- [python-mecab-ko](https://github.com/jonghwanhyeon/python-mecab-ko)

## Research Summaries and Trends

* [NLP-Overview](https://nlpoverview.com/) is an up-to-date overview of deep learning techniques applied to NLP, including theory, implementations, applications, and state-of-the-art results. This is a great Deep NLP Introduction for researchers.

* [NLP-Progress](https://nlpprogress.com/) tracks the progress in Natural Language Processing, including the datasets and the current state-of-the-art for the most common NLP tasks

* [NLP's ImageNet moment has arrived](https://thegradient.pub/nlp-imagenet/)

* [ACL 2018 Highlights: Understanding Representation and Evaluation in More Challenging Settings](http://ruder.io/acl-2018-highlights/)

* [Four deep learning trends from ACL 2017. Part One: Linguistic Structure and Word Embeddings](https://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-1.html)

* [Four deep learning trends from ACL 2017. Part Two: Interpretability and Attention](https://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-2.html)

* [Highlights of EMNLP 2017: Exciting Datasets, Return of the Clusters, and More!](http://blog.aylien.com/highlights-emnlp-2017-exciting-datasets-return-clusters/)

* [Deep Learning for Natural Language Processing (NLP): Advancements & Trends](https://tryolabs.com/blog/2017/12/12/deep-learning-for-nlp-advancements-and-trends-in-2017/?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=The%20Wild%20Week%20in%20AI)

* [Survey of the State of the Art in Natural Language Generation](https://arxiv.org/abs/1703.09902)

## Environment

- [Docker](http://moducon.kr/2018/wp-content/uploads/sites/2/2018/12/leesangsoo_slide.pdf)

## NLP in Korean

[Back to Top](#contents)

### Libraries

- [KoNLPy](http://konlpy.org) - Python package for Korean natural language processing.

- [Mecab (Korean)](https://eunjeon.blogspot.com/) - C++ library for Korean NLP

- [KoalaNLP](https://koalanlp.github.io/koalanlp/) - Scala library for Korean Natural Language Processing.

### Datasets

- [Korean WordNet](http://wordnet.kaist.ac.kr/)

- [KAIST Corpus](http://semanticweb.kaist.ac.kr/home/index.php/KAIST_Corpus) - A corpus from the Korea Advanced Institute of Science and Technology in Korean.

- [Naver Sentiment Movie Corpus in Korean](https://github.com/e9t/nsmc/)

- [Chosun Ilbo archive](http://srchdb1.chosun.com/pdf/i_archive/) - dataset in Korean from one of the major newspapers in South Korea, the Chosun Ilbo.

- [NER dataset from the Korea Maritime and Ocean University NLP Lab](https://github.com/kmounlp/NER)

- [PAWS and PAWS-X: Two New Datasets to Improve Natural Language Understanding Models (Paraphrase Adversaries from Word Scrambling)](https://ai.googleblog.com/2019/10/releasing-paws-and-paws-x-two-new.html?m=1)

- [Conversational AI datasets (English dialogue datasets)](https://github.com/PolyAI-LDN/conversational-datasets)

## Tutorials

[Back to Top](#contents)

### Videos and Online Courses

[Back to Top](#contents)

* [Intro to Artificial Intelligence](https://www.udacity.com/course/intro-to-artificial-intelligence--cs271) - Udacity course which touches upon NLP as well

* [Deep Natural Language Processing](https://github.com/oxford-cs-deepnlp-2017/lectures) - Lecture series from Oxford

* [Deep Learning for Natural Language Processing (CS224n)](https://web.stanford.edu/class/cs224n/) - Richard Socher and Christopher Manning's Stanford course

* [Neural Networks for NLP](http://phontron.com/class/nn4nlp2017/) - Course from the Carnegie Mellon Language Technologies Institute

* [Deep NLP Course](https://github.com/yandexdataschool/nlp_course) by Yandex Data School, covering important ideas from text embedding to machine translation including sequence modeling, language models and so on.

## Libraries

[Back to Top](#contents)

*  **Python** - Python NLP Libraries | [Back to Top](#contents)

  - [TextBlob](http://textblob.readthedocs.org/) - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of [Natural Language Toolkit (NLTK)](https://www.nltk.org/) and [Pattern](https://github.com/clips/pattern), and plays nicely with both :+1:

  - [spaCy](https://github.com/explosion/spaCy) - Industrial strength NLP with Python and Cython :+1:

    - [textacy](https://github.com/chartbeat-labs/textacy) - Higher level NLP built on spaCy

  - [gensim](https://radimrehurek.com/gensim/index.html) - Python library to conduct unsupervised semantic modelling from plain text :+1:

  - [scattertext](https://github.com/JasonKessler/scattertext) - Python library to produce d3 visualizations of how language differs between corpora

  - [GluonNLP](https://github.com/dmlc/gluon-nlp) - A deep learning toolkit for NLP, built on MXNet/Gluon, for research prototyping and industrial deployment of state-of-the-art models on a wide range of NLP tasks.

  - [AllenNLP](https://github.com/allenai/allennlp) - An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.

  - [PyTorch-NLP](https://github.com/PetrochukM/PyTorch-NLP) - NLP research toolkit designed to support rapid prototyping with better data loaders, word vector loaders, neural network layer representations, common NLP metrics such as BLEU

  - [Rosetta](https://github.com/columbia-applied-data-science/rosetta) - Text processing tools and wrappers (e.g. Vowpal Wabbit)

  - [PyNLPl](https://github.com/proycon/pynlpl) - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for [FoLiA](https://proycon.github.io/folia/), but also ARPA language models, Moses phrasetables, GIZA++ alignments.

  - [jPTDP](https://github.com/datquocnguyen/jPTDP) - A toolkit for joint part-of-speech (POS) tagging and dependency parsing. jPTDP provides pre-trained models for 40+ languages.

  - [BigARTM](https://github.com/bigartm/bigartm) - a fast library for topic modelling

  - [Snips NLU](https://github.com/snipsco/snips-nlu) - A production ready library for intent parsing

  - [Chazutsu](https://github.com/chakki-works/chazutsu) - A library for downloading and parsing standard NLP research datasets

  - [Word Forms](https://github.com/gutfeeling/word_forms) - Word forms can accurately generate all possible forms of an English word

  - [Multilingual Latent Dirichlet Allocation (LDA)](https://github.com/ArtificiAI/Multilingual-Latent-Dirichlet-Allocation-LDA) - A multilingual and extensible document clustering pipeline

  - [NLP Architect](https://github.com/NervanaSystems/nlp-architect) - A library for exploring the state-of-the-art deep learning topologies and techniques for NLP and NLU

  - [Flair](https://github.com/zalandoresearch/flair) - A very simple framework for state-of-the-art multilingual NLP built on PyTorch. Includes BERT, ELMo and Flair embeddings.

  - [Kashgari](https://github.com/BrikerMan/Kashgari) - Simple, Keras-powered multilingual NLP framework, allows you to build your models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS) and text classification tasks. Includes BERT and word2vec embedding.

## Annotation Tools

- [Label Studio](https://github.com/heartexlabs/label-studio?fbclid=IwAR30j2OmVMcB-TenAczkNwwUsObi8JAOpTNxGFzrmMrJ2pd4-gg_S0D3S78) is an open-source, configurable data annotation tool. Its purpose is to enable you to label different types of data using the most convenient interface with a standardized output format.

- [brat](https://brat.nlplab.org/) - brat rapid annotation tool is an online environment for collaborative text annotation

- [LIDA: Lightweight Interactive Dialogue Annotator (in EMNLP 2019)](https://github.com/Wluper/lida) - LIDA is an open source dialogue annotation system which supports the full pipeline of dialogue annotation from dialogue / turn segmentation from raw text

- [GATE](https://gate.ac.uk/overview.html) - General Architecture for Text Engineering is 15+ years old, free and open source

- [Anafora](https://github.com/weitechen/anafora) is a free and open-source, web-based raw text annotation tool

- [doccano](https://github.com/chakki-works/doccano) - doccano is free, open-source, and provides annotation features for text classification, sequence labeling and sequence to sequence

- [tagtog](https://www.tagtog.net/), costs $

- [prodigy](https://prodi.gy/) is an annotation tool powered by active learning, costs $

- [LightTag](https://lighttag.io) - Hosted and managed text annotation tool for teams, costs $

- [rstWeb](https://corpling.uis.georgetown.edu/rstweb/info/) - open source local or online tool for discourse tree annotations

- [GitDox](https://corpling.uis.georgetown.edu/gitdox/) - open source server annotation tool with GitHub version control and validation for XML data and collaborative spreadsheet grids