
https://github.com/DanAnastasyev/DeepNLP-Course


# Deep NLP Course at ABBYY

Deep learning for NLP crash course at ABBYY.

Suggested textbook: [Neural Network Methods in Natural Language Processing by Yoav Goldberg](https://www.amazon.com/Language-Processing-Synthesis-Lectures-Technologies/dp/1627052984)

*I'm gradually updating and translating the notebooks right now. Stay in touch.*

## Materials
### Week 1: *Introduction*
Sentiment analysis on the IMDB movie review dataset: a short overview of classical machine learning for NLP + an indecently brief intro to Keras.

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12nrEX3JXTxsHWC-HpuwkTWyJybjmkZu-#forceEdit=true&offline=true&sandboxMode=true)

Updated English version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1eW-mN3gEdLluYe1W1unQ-7-71by8eauz#scrollTo=OlqOAQmQGXOL&forceEdit=true&offline=true&sandboxMode=true)

### Week 2: *Word Embeddings: Part 1*
Meet word embeddings: an unsupervised method that captures some fun relationships between words.
Phrase similarity with a word embedding model + word-based machine translation without parallel data (with MUSE word embeddings).
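
As a rough illustration of phrase similarity with word embeddings, one common baseline is to average the word vectors of each phrase and compare the averages by cosine similarity. The sketch below assumes that baseline; the 3-dimensional vectors and the names `EMBEDDINGS`, `phrase_vector` and `phrase_similarity` are made up for the example and are not taken from the notebooks.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings (word2vec, MUSE, ...) have hundreds of dimensions.
EMBEDDINGS = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "movie": [0.1, 0.9, 0.2],
    "film":  [0.2, 0.8, 0.3],
    "cold":  [0.0, 0.1, 0.9],
    "soup":  [0.1, 0.2, 0.8],
}


def phrase_vector(phrase, embeddings):
    """Average the vectors of the known words in a phrase."""
    vectors = [embeddings[w] for w in phrase.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError(f"no known words in phrase: {phrase!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def phrase_similarity(a, b, embeddings=EMBEDDINGS):
    """Cosine similarity of the averaged word vectors of two phrases."""
    return cosine(phrase_vector(a, embeddings), phrase_vector(b, embeddings))
```

With these toy vectors, `phrase_similarity("good movie", "great film")` comes out much higher than `phrase_similarity("good movie", "cold soup")`, which is the behaviour one hopes for from real embeddings as well.
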

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1o65wrq6RYgWyyMvNP8r9ZknXBniDoXrn#forceEdit=true&offline=true&sandboxMode=true)

Updated English version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ey9NARKvk-c4vfQGdvOkPjp5wNGmxd5o#forceEdit=true&offline=true&sandboxMode=true)

### Week 3: *Word Embeddings: Part 2*
Introduction to PyTorch. Implementation of a toy linear regression in pure NumPy and in PyTorch. Implementations of CBoW, skip-gram, negative sampling and structured Word2vec models.
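
To make the skip-gram and negative sampling terms concrete, here is a minimal sketch of how training examples might be generated, in plain Python rather than PyTorch. The function names are hypothetical, and the negatives are drawn uniformly to keep the example short; word2vec itself samples from a smoothed unigram distribution.

```python
import random


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for each token and its neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def sample_negatives(vocab, positive, k, rng):
    """Draw k words other than the true context word, uniformly at random.

    Assumption for this sketch: uniform sampling. The original word2vec
    uses unigram frequencies raised to the power 0.75.
    """
    candidates = [w for w in vocab if w != positive]
    return [rng.choice(candidates) for _ in range(k)]


tokens = ["the", "cat", "sat", "on", "the", "mat"]
rng = random.Random(0)
for center, context in skipgram_pairs(tokens, window=1)[:3]:
    negatives = sample_negatives(sorted(set(tokens)), context, k=2, rng=rng)
    print(center, context, negatives)
```

Skip-gram predicts each context word from the center word; CBoW reverses the direction and predicts the center word from the combined context.
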

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1YruNhE5aEJfLpaCZSKGIaZ1hOQR5qoIG#forceEdit=true&offline=true&sandboxMode=true)

Updated English version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Cgdg_jUIbhMBZiL3DtUQMN6xqFUYjKlg#forceEdit=true&offline=true&sandboxMode=true)

### Week 4: *Convolutional Neural Networks*
Introduction to convolutional networks. Relations between convolutions and n-grams. A simple surname detector built on character-level convolutions + fun visualizations.
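
One way to see the relation between convolutions and n-grams: a 1D filter of width n looks at exactly one character n-gram per position, and max-over-time pooling keeps the strongest match. The sketch below assumes one-hot character vectors and a hand-written filter that fires on the bigram "ov"; all names are hypothetical and the example is not taken from the notebook.

```python
def char_ngrams(word, n):
    """All contiguous character n-grams of a word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def one_hot(ch, alphabet):
    """One-hot vector for a character over a fixed alphabet."""
    return [1.0 if ch == a else 0.0 for a in alphabet]


def conv1d_max(word, kernel, char_vectors):
    """Slide a width-n filter over the word and max-pool the responses.

    `kernel` is a list of n weight vectors, one per window position.
    The response on a window is the sum of dot products between each
    character vector and the weight vector at the same position.
    Assumes len(word) >= len(kernel).
    """
    n = len(kernel)
    responses = []
    for gram in char_ngrams(word, n):
        score = sum(
            w * x
            for ch, weights in zip(gram, kernel)
            for w, x in zip(weights, char_vectors[ch])
        )
        responses.append(score)
    return max(responses), responses


alphabet = sorted(set("ivanovsmith"))
vectors = {ch: one_hot(ch, alphabet) for ch in alphabet}
ov_filter = [vectors["o"], vectors["v"]]  # fires on the bigram "ov"

print(conv1d_max("ivanov", ov_filter, vectors))
print(conv1d_max("smith", ov_filter, vectors))
```

A trained network learns such filters from data instead of having them written by hand, and uses dense character embeddings rather than one-hot vectors.
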

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Vo_yuiA7xLjavUA_7ayLeosGJyMsyDAt#forceEdit=true&offline=true&sandboxMode=true)

Updated English version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1iwhfaHp_L2loxjvbqW9DhO9BaWlIWzpB#forceEdit=true&offline=true&sandboxMode=true)

### Week 5: *RNNs: Part 1*
RNNs for text classification. Simple RNN implementation + memorization test. Surname detector in a multilingual setup: character-level LSTM classifier.

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-FoMnf7s-BYNM7jT9UF3u9m63h7dSq3_#forceEdit=true&offline=true&sandboxMode=true)

Updated English version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WA9YA30m7xFYfLyptuW7lHROvVGbZWZo#forceEdit=true&offline=true&sandboxMode=true)

### Week 6: *RNNs: Part 2*
RNNs for sequence labelling. Part-of-speech tagger implementations based on word embeddings and character-level word embeddings.

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1A7dbNANHg8srCemnwFI8WB1wLhvmuJp0#forceEdit=true&offline=true&sandboxMode=true)

### Week 7: *Language Models: Part 1*
Character-level language model for generating Russian troll tweets: a fixed-window model via convolutions and an RNN model.
Simple conditional language model: surname generation given the source language.
Plus the Toxic Comment Classification Challenge, to apply your skills to a real-world problem.

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1W5uaNpKFoaq1gV9N9FpIAEDyrsGGRBBi#forceEdit=true&offline=true&sandboxMode=true)

### Week 8: *Language Models: Part 2*
Word-level language model for poetry generation. Toy examples of transfer learning and multi-task learning applied to language models.

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1lUlBsdvAYJc5rLHwkOICyFhvns5Ssp1X#forceEdit=true&offline=true&sandboxMode=true)

### Week 9: *Seq2seq*
Seq2seq for machine translation and image captioning. Byte-pair encoding, beam search and other useful stuff for machine translation.
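
For readers new to byte-pair encoding, the core of the merge-learning loop fits in a few lines: count adjacent symbol pairs across the vocabulary, merge the most frequent pair into one symbol, and repeat. This is a minimal sketch under simplifying assumptions (whitespace-free words, a `</w>` end-of-word marker, first-seen tie-breaking); the function names are hypothetical and it is not the implementation used in the notebook.

```python
from collections import Counter


def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} dict."""
    # Start from single characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)  # ties: first pair seen wins
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


merges, vocab = learn_bpe({"abab": 3, "ab": 2}, num_merges=2)
print(merges)
print(vocab)
```

At translation time the learned merges are applied in order to split unseen words into known subword units, which keeps the vocabulary small while avoiding out-of-vocabulary tokens.
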

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jSYWuEGwik2lnnvGSU_PyXTFtbRKSyz_#forceEdit=true&offline=true&sandboxMode=true)

### Week 10: *Seq2seq with Attention*
Seq2seq with attention for machine translation and image captioning.
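
The idea behind attention can be sketched without any framework: score each encoder state against the current decoder state, turn the scores into weights with a softmax, and take the weighted sum as the context vector. The example below assumes plain dot-product scoring, which is only one of several options (the notebooks may well use a different scoring function), and the names `softmax` and `attend` are chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Dot-product attention.

    Returns the attention weights over the encoder states and the
    context vector (the weighted sum of the states).
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoder outputs
weights, context = attend([2.0, 0.0], states)
print(weights)
print(context)
```

In a seq2seq decoder the context vector is recomputed at every output step, so each generated word can focus on a different part of the source sentence or image.
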

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1xZed_YAQf20fYacr9anE7T4EsdC_R0Oy#forceEdit=true&offline=true&sandboxMode=true)

### Week 11: *Transformers & Text Summarization*
Implementation of the Transformer model for text summarization. Discussion of Pointer-Generator Networks for text summarization.

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wy5BDHZVEm-vSeH8U4Xh0Sm3bArwVWGU#forceEdit=true&offline=true&sandboxMode=true)

### Week 12: *Dialogue Systems: Part 1*
Goal-oriented dialogue systems. Implementation of a multi-task model: intent classifier and token tagger for the dialogue manager.

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1lNhbHboRYVb-caV7Ktj9cWJnHW7DPD9-#forceEdit=true&offline=true&sandboxMode=true)

### Week 13: *Dialogue Systems: Part 2*
General conversation dialogue systems and DSSMs. Implementation of a question answering model on the SQuAD dataset and a chit-chat model on the OpenSubtitles dataset.

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/19kQoxDWhv9VOfXxZCHcB39qIM30gziCR#forceEdit=true&offline=true&sandboxMode=true)

### Week 14: *Pretrained Models*
Pretrained models for various tasks: Universal Sentence Encoder for sentence similarity, ELMo for sequence tagging (with a bit of CRF), BERT for SWAG (reasoning about possible continuations).

Russian version: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WMspBJe-m0mJHb7SbDTj64W-5-3AxW7v#forceEdit=true&offline=true&sandboxMode=true)

### Final Presentation
[NLP Summary](https://drive.google.com/open?id=16GV-jSGtMAQPJgO_B6q1gLYXL10vM8Ev) - a summary of cool stuff that did and didn't appear in the course.