https://github.com/kyubyong/nlp_made_easy

Explains nlp building blocks in a simple manner.

beam-search bpe nlp seq2seq


Host: GitHub
URL: https://github.com/kyubyong/nlp_made_easy
Owner: Kyubyong
Created: 2019-01-18T00:42:53.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-09-23T01:27:39.000Z (almost 6 years ago)
Last Synced: 2025-05-07T21:37:07.408Z (2 months ago)
Topics: beam-search, bpe, nlp, seq2seq
Language: Jupyter Notebook
Homepage:
Size: 278 KB
Stars: 251
Watchers: 13
Forks: 36
Open Issues: 0
Metadata Files:
- Readme: README.md


# NLP Made Easy

Simple code notes for explaining NLP building blocks

* [Subword Segmentation Techniques](Subword%20Segmentation%20Techniques.ipynb)
  * Let's compare various tokenizers, namely NLTK, BPE, SentencePiece, and the BERT tokenizer.
* [Beam Decoding](Beam%20Decoding.ipynb)
  * Beam decoding is essential for seq2seq tasks, but it's notoriously complicated to implement. Here's a relatively easy version that batches the candidates.
* [How to get the last hidden vector of RNNs properly](How%20to%20get%20the%20last%20hidden%20vector%20of%20rnns%20properly.ipynb)
  * We'll see how to get the last hidden states of RNNs in TensorFlow and PyTorch.
* [TensorFlow seq2seq template based on the g2p task](Tensorflow%20seq2seq%20template%20based%20on%20g2p.ipynb)
  * We'll write a simple seq2seq template using TensorFlow. For demonstration, we tackle the g2p task, which converts graphemes (spelling) to phonemes (pronunciation). It suits this purpose well because it's simple enough to get up and running quickly.
* [PyTorch seq2seq template based on the g2p task](PyTorch%20seq2seq%20template%20based%20on%20the%20g2p%20task.ipynb)
  * We'll write a simple seq2seq template using PyTorch. For demonstration, we tackle the g2p task, which converts graphemes (spelling) to phonemes (pronunciation). It suits this purpose well because it's simple enough to get up and running quickly.
* Attention mechanism (work in progress)
* [POS-tagging with BERT Fine-tuning](Pos-tagging%20with%20Bert%20Fine-tuning.ipynb)
  * BERT is known to be good at sequence tagging tasks like Named Entity Recognition. Let's see if that holds for POS-tagging.
* [Dropout in a minute](Dropout%20in%20a%20minute.ipynb)
  * Dropout is arguably the most popular regularization technique in deep learning. Let's review how it works.
* N-gram LM vs. RNNLM (WIP)
* [Data Augmentation for Quora Question Pairs](Data%20Augmentation%20for%20Quora%20Question%20Pairs.ipynb)
  * Let's see if augmenting the training data helps on the Quora Question Pairs task.
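
To give a feel for what beam decoding does, here is a minimal sketch in plain Python. It is not the code from the notebook: it expands one hypothesis at a time instead of batching candidates, and `beam_search`, `toy_model`, and the probability table are hypothetical names and numbers made up for illustration. The toy table is built so that the greedy first choice (`a`) leads to a worse full sequence than the runner-up (`b`).

```python
import math


def beam_search(step_log_probs, beam_width, bos, eos, max_len):
    """Return the best (sequence, log-prob) found with a fixed-width beam.

    step_log_probs is an assumed callback: it takes a tuple of tokens and
    returns a dict mapping each possible next token to its log-probability.
    """
    beams = [((bos,), 0.0)]  # (prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for prefix, score in beams:
            for token, logp in step_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the `beam_width` best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len still compete
    return max(finished, key=lambda c: c[1])


def toy_model(prefix):
    """Hypothetical next-token distribution, used only for this demo."""
    table = {
        ("<s>",): {"a": 0.6, "b": 0.4},
        ("<s>", "a"): {"x": 0.55, "y": 0.45},
        ("<s>", "b"): {"z": 0.9, "x": 0.1},
    }
    # Any prefix not in the table ends the sequence with probability 1.
    probs = table.get(prefix, {"</s>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


greedy_seq, greedy_score = beam_search(toy_model, 1, "<s>", "</s>", 5)
beam_seq, beam_score = beam_search(toy_model, 2, "<s>", "</s>", 5)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

With a beam width of 1 the search is greedy and commits to `a`, ending at probability 0.33. With a width of 2 it keeps `b` alive and finds `b z` at 0.36. Real decoders usually add length normalization when comparing hypotheses of different lengths, which this sketch leaves out.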

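As a quick companion to the dropout note, here is a minimal sketch of inverted dropout on a plain Python list. It is an illustration under stated assumptions, not the notebook's code: the function name `dropout` and its `rng` parameter are hypothetical, and real frameworks apply the same idea to tensors.

```python
import random


def dropout(values, p, training=True, rng=random):
    """Inverted dropout over a list of floats (illustrative sketch).

    p is the probability of zeroing each element. Survivors are scaled by
    1 / (1 - p), so the expected value of each element stays the same.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # At inference time dropout is the identity function.
        return list(values)
    keep = 1.0 - p
    # rng.random() is uniform on [0, 1), so an element survives with
    # probability 1 - p.
    return [v / keep if rng.random() >= p else 0.0 for v in values]


activations = [1.0] * 8
print(dropout(activations, p=0.5, rng=random.Random(0)))
print(dropout(activations, p=0.5, training=False))
```

Scaling during training, rather than at inference, is what lets the layer become a no-op when `training` is false.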