Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kyubyong/nlp_made_easy
Explains nlp building blocks in a simple manner.
- Host: GitHub
- URL: https://github.com/kyubyong/nlp_made_easy
- Owner: Kyubyong
- Created: 2019-01-18T00:42:53.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2019-09-23T01:27:39.000Z (over 5 years ago)
- Last Synced: 2024-12-14T02:41:48.793Z (about 2 months ago)
- Topics: beam-search, bpe, nlp, seq2seq
- Language: Jupyter Notebook
- Homepage:
- Size: 278 KB
- Stars: 251
- Watchers: 14
- Forks: 36
- Open Issues: 0
Metadata Files:
- Readme: README.md
# NLP Made Easy
Simple code notes for explaining NLP building blocks
* [Subword Segmentation Techniques](Subword%20Segmentation%20Techniques.ipynb)
* Let's compare various tokenizers, e.g., NLTK, BPE, SentencePiece, and the BERT tokenizer.
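As a sketch of one of these techniques (not the notebook's own code), here is the core of BPE training in plain Python: repeatedly count adjacent symbol pairs across a toy corpus and merge the most frequent pair. The corpus words and merge count are illustrative.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation.
    (Simplified: real BPE implementations match whole symbols, not substrings.)"""
    merged, joined = " ".join(pair), "".join(pair)
    return {word.replace(merged, joined): freq for word, freq in vocab.items()}

# Toy corpus: words split into characters, with frequencies.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
for _ in range(3):
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
# After a few merges, frequent fragments like "est" become single symbols.
```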
* [Beam Decoding](Beam%20Decoding.ipynb)
* Beam decoding is essential for seq2seq tasks, but it's notoriously tricky to implement. Here's a relatively easy implementation that batchifies candidates.
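To fix the idea, here is a minimal framework-free beam search sketch (an illustration of the algorithm, not the notebook's batched implementation). The `step_logprobs` callback and the toy probability table are assumptions for the example.

```python
import math

def beam_search(step_logprobs, bos, eos, beam_size=3, max_len=10):
    """Generic beam search. `step_logprobs(prefix)` returns a dict
    mapping each next token to its log-probability given the prefix."""
    beams = [([bos], 0.0)]          # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_logprobs(tuple(seq)).items():
                candidates.append((seq + [tok], score + lp))
        # Keep the top-k candidates; hypotheses ending in EOS are set aside.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)          # include anything cut off at max_len
    return max(finished, key=lambda c: c[1])

# Toy model: a fixed table of next-token distributions.
table = {
    ("<s>",): {"a": math.log(0.6), "b": math.log(0.4)},
    ("<s>", "a"): {"</s>": math.log(0.3), "b": math.log(0.7)},
    ("<s>", "b"): {"</s>": math.log(0.9), "a": math.log(0.1)},
    ("<s>", "a", "b"): {"</s>": math.log(1.0)},
    ("<s>", "b", "a"): {"</s>": math.log(1.0)},
}
best_seq, best_score = beam_search(lambda p: table[p], "<s>", "</s>")
```

The batched variant the notebook describes flattens all `beam_size` candidate prefixes into one batch dimension so a single forward pass scores them together; the loop above does the same work one prefix at a time.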
* [How to get the last hidden vector of rnns properly](How%20to%20get%20the%20last%20hidden%20vector%20of%20rnns%20properly.ipynb)
* We'll see how to get the last hidden states of RNNs in TensorFlow and PyTorch.
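The pitfall the notebook addresses: with padded batches, the last time step of the output tensor is padding for shorter sequences, so you must index each sequence at its true length. A minimal sketch with plain lists (in PyTorch/TensorFlow this is a `gather` along the time axis, or `pack_padded_sequence` in PyTorch):

```python
def last_hidden(outputs, lengths):
    """Pick each sequence's true last hidden vector from a padded batch.
    `outputs` is [batch][time][hidden]; positions >= length are padding."""
    return [seq[n - 1] for seq, n in zip(outputs, lengths)]

# Batch of 2 sequences (max_len=3, hidden=2); the second has length 2,
# so its final time step is zero padding and must not be used.
outputs = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8], [0.9, 1.0], [0.0, 0.0]],
]
lengths = [3, 2]
finals = last_hidden(outputs, lengths)
```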
* [TensorFlow seq2seq template based on the g2p task](Tensorflow%20seq2seq%20template%20based%20on%20g2p.ipynb)
* We'll write a simple template for seq2seq using TensorFlow. For demonstration, we tackle the g2p task: converting graphemes (spelling) to phonemes (pronunciation). It's a good task for this purpose, as it's simple enough to get up and running quickly.
* [PyTorch seq2seq template based on the g2p task](PyTorch%20seq2seq%20template%20based%20on%20the%20g2p%20task.ipynb)
* We'll write a simple template for seq2seq using PyTorch. For demonstration, we tackle the g2p task: converting graphemes (spelling) to phonemes (pronunciation). It's a good task for this purpose, as it's simple enough to get up and running quickly.
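To see why g2p is a natural seq2seq demo, here are two CMUdict-style grapheme/phoneme pairs: input and output lengths differ and there is no one-to-one character alignment, which is exactly what an encoder-decoder handles.

```python
# Grapheme-to-phoneme examples in CMUdict style (spelling -> ARPAbet).
g2p_data = [
    ("CAT", ["K", "AE1", "T"]),
    ("PHONE", ["F", "OW1", "N"]),
]
# A seq2seq model reads the character sequence and emits the phoneme
# sequence; note the lengths differ ("PHONE": 5 chars -> 3 phonemes).
src = list(g2p_data[1][0])
tgt = g2p_data[1][1]
```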
* Attention mechanism (work in progress)
* [POS-tagging with BERT Fine-tuning](Pos-tagging%20with%20Bert%20Fine-tuning.ipynb)
* BERT is known to be good at sequence tagging tasks like Named Entity Recognition. Let's see if that holds for POS-tagging.
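One detail any BERT tagging setup must handle: the WordPiece tokenizer splits words into sub-tokens, but POS labels are per word. A common convention (sketched below with a hypothetical tokenizer, not the notebook's code) is to label only each word's first sub-token and mask the rest with -100, the ignore index of PyTorch's cross-entropy loss.

```python
def align_labels(words, labels, tokenize):
    """Align per-word tags to sub-tokens: tag the first piece of each
    word and mask continuation pieces with -100 (ignored by the loss)."""
    tokens, aligned = [], []
    for word, label in zip(words, labels):
        pieces = tokenize(word)
        tokens.extend(pieces)
        aligned.extend([label] + [-100] * (len(pieces) - 1))
    return tokens, aligned

# Toy WordPiece-style tokenizer (hypothetical splits, for illustration).
vocab_splits = {"unhappiness": ["un", "##happi", "##ness"]}
tok = lambda w: vocab_splits.get(w, [w])
tokens, aligned = align_labels(["he", "felt", "unhappiness"],
                               ["PRON", "VERB", "NOUN"], tok)
```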
* [Dropout in a minute](Dropout%20in%20a%20minute.ipynb)
* Dropout is arguably the most popular regularization technique in deep learning. Let's check again how it works.
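The mechanics fit in a few lines. This sketch shows the standard "inverted" formulation (which is what modern frameworks implement): drop units with probability p at training time and scale survivors by 1/(1-p), so the expected activation is unchanged and test time is a no-op.

```python
import random

def dropout(x, p, train=True, rng=random):
    """Inverted dropout: zero each unit with probability p during training
    and scale the survivors by 1/(1-p); identity at test time."""
    if not train or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]

rng = random.Random(0)
x = [1.0] * 10
y = dropout(x, p=0.5, rng=rng)
# Each surviving unit is scaled to 2.0; roughly half are zeroed.
```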
* Ngram LM vs. rnnlm(WIP)
* [Data Augmentation for Quora Question Pairs](Data%20Augmentation%20for%20Quora%20Question%20Pairs.ipynb)
* Let's see if it's effective to augment training data for the Quora Question Pairs task.
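As an illustration of label-preserving augmentation on this dataset (a commonly used trick; the notebook's actual method may differ), duplicate pairs can be expanded by symmetry and transitivity: questions connected through chains of duplicates form a cluster whose members are pairwise duplicates.

```python
from itertools import combinations

def augment_duplicates(pairs):
    """Expand duplicate (q1, q2, 1) pairs via transitivity: group
    questions into duplicate clusters, then emit all pairs per cluster."""
    parent = {}
    def find(q):                        # tiny union-find with compression
        parent.setdefault(q, q)
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q
    for q1, q2, dup in pairs:
        if dup:
            parent[find(q1)] = find(q2)
    clusters = {}
    for q in parent:
        clusters.setdefault(find(q), set()).add(q)
    out = set()
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            out.add((a, b, 1))
    return out

pairs = [("how to learn nlp?", "best way to learn nlp?", 1),
         ("best way to learn nlp?", "nlp study tips?", 1)]
augmented = augment_duplicates(pairs)   # 2 input pairs -> 3 duplicate pairs
```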