{"id":18551339,"url":"https://github.com/morvanzhou/nlp-tutorials","last_synced_at":"2025-05-16T10:08:29.484Z","repository":{"id":37665036,"uuid":"159797814","full_name":"MorvanZhou/NLP-Tutorials","owner":"MorvanZhou","description":"Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com","archived":false,"fork":false,"pushed_at":"2023-05-22T23:31:37.000Z","size":909,"stargazers_count":938,"open_issues_count":7,"forks_count":316,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-04-09T05:03:08.068Z","etag":null,"topics":["attention","bert","elmo","gpt","nlp","seq2seq","transformer","tutorial","w2v"],"latest_commit_sha":null,"homepage":"https://mofanpy.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MorvanZhou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-30T09:11:39.000Z","updated_at":"2025-04-08T07:13:55.000Z","dependencies_parsed_at":"2024-11-13T22:03:11.156Z","dependency_job_id":"b29a07fa-94eb-4092-a754-2b882872bd4f","html_url":"https://github.com/MorvanZhou/NLP-Tutorials","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MorvanZhou%2FNLP-Tutorials","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MorvanZhou%2FNLP-Tutorials/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MorvanZhou%2FNLP-Tutorials/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/MorvanZhou%2FNLP-Tutorials/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MorvanZhou","download_url":"https://codeload.github.com/MorvanZhou/NLP-Tutorials/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254509477,"owners_count":22082892,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","bert","elmo","gpt","nlp","seq2seq","transformer","tutorial","w2v"],"created_at":"2024-11-06T21:08:42.898Z","updated_at":"2025-05-16T10:08:24.474Z","avatar_url":"https://github.com/MorvanZhou.png","language":"Python","readme":"# Natural Language Processing Tutorial\n\nThe tutorial in Chinese can be found at [mofanpy.com](https://mofanpy.com/tutorials/machine-learning/nlp/).\n\nThis repo includes many simple implementations of models in Natural Language Processing (NLP).\n\nAll code implementations in this tutorial are organized as follows:\n\n1. Search Engine\n  - [TF-IDF numpy / TF-IDF sklearn](#TF-IDF)\n2. Understand Word (W2V)\n  - [Continuous Bag of Words (CBOW)](#Word2Vec)\n  - [Skip-Gram](#Word2Vec)\n3. Understand Sentence (Seq2Seq)\n  - [seq2seq](#Seq2Seq)\n  - [CNN language model](#CNNLanguageModel)\n4. All about Attention\n  - [seq2seq with attention](#Seq2SeqAttention)\n  - [Transformer](#Transformer)\n5. 
Pretrained Models\n  - [ELMo](#ELMO)\n  - [GPT](#GPT)\n  - [BERT](#BERT)\n\nThanks to [@W1Fl](https://github.com/W1Fl) for contributing simplified Keras code in [simple_realize](simple_realize),\nand to [@ruifanxu](https://github.com/ruifan831) for a [PyTorch version of this tutorial](/pytorch).\n\n## Installation\n\n```shell script\n$ git clone https://github.com/MorvanZhou/NLP-Tutorials\n$ cd NLP-Tutorials/\n$ sudo pip3 install -r requirements.txt\n```\n\n\n## TF-IDF\n\nTF-IDF numpy [code](tf_idf.py)\n\nTF-IDF sklearn (shorter) [code](tf_idf_sklearn.py)\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/tfidf_matrix.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/tfidf_matrix.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n\n## Word2Vec\n[Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf)\n\nSkip-Gram [code](skip-gram.py)\n\nCBOW [code](CBOW.py)\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/cbow_illustration.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/cbow_illustration.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/skip_gram_illustration.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/skip_gram_illustration.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/cbow_code_result.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/cbow_code_result.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n\n## Seq2Seq\n[Sequence to Sequence Learning with Neural Networks](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf)\n\nSeq2Seq 
[code](seq2seq.py)\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/seq2seq_illustration.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/seq2seq_illustration.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n## CNNLanguageModel\n[Convolutional Neural Networks for Sentence Classification](https://arxiv.org/pdf/1408.5882.pdf)\n\nCNN language model [code](cnn-lm.py)\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/cnn-ml_sentence_embedding.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/cnn-ml_sentence_embedding.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n\n## Seq2SeqAttention\n[Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/pdf/1508.04025.pdf)\n\nSeq2Seq Attention [code](seq2seq_attention.py)\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/luong_attention.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/luong_attention.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/seq2seq_attention_res.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/seq2seq_attention_res.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n\n\n## Transformer\n[Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf)\n\nTransformer [code](transformer.py)\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/transformer_encoder_decoder.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/transformer_encoder_decoder.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/transformer0_decoder_encoder_attention.png\" 
style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/transformer0_decoder_encoder_attention.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/transformer0_encoder_decoder_attention_line.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/transformer0_encoder_decoder_attention_line.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n\n## ELMO\n[Deep contextualized word representations](https://arxiv.org/pdf/1802.05365.pdf)\n\nELMO [code](ELMo.py)\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/elmo_training.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/elmo_training.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/elmo_word_emb.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/elmo_word_emb.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n\n## GPT\n[Improving Language Understanding by Generative Pre-Training](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf)\n\nGPT [code](GPT.py)\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/gpt_structure.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/gpt_structure.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/gpt7_self_attention_line.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/gpt7_self_attention_line.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n\n## BERT\n[BERT: Pre-training of Deep Bidirectional Transformers for Language 
Understanding](https://arxiv.org/pdf/1810.04805.pdf)\n\nBERT [code](BERT.py)\n\nMy new attempt: [BERT with window mask](BERT_window_mask.py)\n\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/bert_gpt_comparison.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/bert_gpt_comparison.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\u003ca target=\"_blank\" href=\"https://mofanpy.com/static/results/nlp/bert_self_mask4_self_attention_line.png\" style=\"text-align: center\"\u003e\n\u003cimg src=\"https://mofanpy.com/static/results/nlp/bert_self_mask4_self_attention_line.png\" height=\"250px\" alt=\"image\"\u003e\n\u003c/a\u003e\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmorvanzhou%2Fnlp-tutorials","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmorvanzhou%2Fnlp-tutorials","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmorvanzhou%2Fnlp-tutorials/lists"}