{"id":19398833,"url":"https://github.com/kyubyong/nlp_made_easy","last_synced_at":"2025-07-21T10:02:51.677Z","repository":{"id":85026743,"uuid":"166318444","full_name":"Kyubyong/nlp_made_easy","owner":"Kyubyong","description":"Explains nlp building blocks in a simple manner.","archived":false,"fork":false,"pushed_at":"2019-09-23T01:27:39.000Z","size":285,"stargazers_count":251,"open_issues_count":0,"forks_count":36,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-06-22T06:35:55.317Z","etag":null,"topics":["beam-search","bpe","nlp","seq2seq"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Kyubyong.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-18T00:42:53.000Z","updated_at":"2025-03-14T10:53:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"fb9ef817-88a9-4bdb-ad03-0ee23f6e4ae1","html_url":"https://github.com/Kyubyong/nlp_made_easy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Kyubyong/nlp_made_easy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kyubyong%2Fnlp_made_easy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kyubyong%2Fnlp_made_easy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kyubyong%2Fnlp_made_easy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kyubyong%2Fnlp_made_easy/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/Kyubyong","download_url":"https://codeload.github.com/Kyubyong/nlp_made_easy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kyubyong%2Fnlp_made_easy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266278169,"owners_count":23904038,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beam-search","bpe","nlp","seq2seq"],"created_at":"2024-11-10T11:07:24.344Z","updated_at":"2025-07-21T10:02:51.655Z","avatar_url":"https://github.com/Kyubyong.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NLP Made Easy\n\nSimple code notes for explaining NLP building blocks\n\n* [Subword Segmentation Techniques](Subword%20Segmentation%20Techniques.ipynb)\n  * Let's compare various tokenizers, namely NLTK, BPE, SentencePiece, and the BERT tokenizer.\n* [Beam Decoding](Beam%20Decoding.ipynb)\n  * Beam decoding is essential for seq2seq tasks, but it's notoriously complicated to implement. Here's a relatively easy implementation that batchifies candidates.\n* [How to get the last hidden vector of rnns properly](How%20to%20get%20the%20last%20hidden%20vector%20of%20rnns%20properly.ipynb)\n  * We'll see how to get the last hidden states of RNNs in TensorFlow and PyTorch.\n* [Tensorflow seq2seq template based on the g2p task](Tensorflow%20seq2seq%20template%20based%20on%20g2p.ipynb)\n  * We'll write a simple template for seq2seq using TensorFlow. For demonstration, we tackle the g2p task. 
G2p is the task of converting graphemes (spelling) to phonemes (pronunciation). It's a good fit for this purpose because it's simple enough to get up and running quickly.\n* [PyTorch seq2seq template based on the g2p task](PyTorch%20seq2seq%20template%20based%20on%20the%20g2p%20task.ipynb)\n  * We'll write a simple template for seq2seq using PyTorch. For demonstration, we tackle the g2p task. G2p is the task of converting graphemes (spelling) to phonemes (pronunciation). It's a good fit for this purpose because it's simple enough to get up and running quickly.\n* Attention mechanism (work in progress)\n* [POS-tagging with BERT Fine-tuning](Pos-tagging%20with%20Bert%20Fine-tuning.ipynb)\n  * BERT is known to be good at sequence tagging tasks like Named Entity Recognition. Let's see if that holds for POS-tagging.\n* [Dropout in a minute](Dropout%20in%20a%20minute.ipynb)\n  * Dropout is arguably the most popular regularization technique in deep learning. Let's review how it works.\n* N-gram LM vs. RNNLM (WIP)\n* [Data Augmentation for Quora Question Pairs](Data%20Augmentation%20for%20Quora%20Question%20Pairs.ipynb)\n  * Let's see whether augmenting the training data is effective for the Quora Question Pairs task.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyubyong%2Fnlp_made_easy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkyubyong%2Fnlp_made_easy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyubyong%2Fnlp_made_easy/lists"}