# mipt-nlp2022
NLP course, MIPT

### Course instructors
Anton Emelianov ([email protected], @king_menin), Albina Akhmetgareeva ([email protected])

Videos [here](https://drive.google.com/drive/folders/1CDQsHx53en5punmtB5I94A4NI2pJY9Wm?usp=sharing)
Exam questions [here](https://docs.google.com/document/d/1NdD20EMSAOFTXSD6uEC55k0k6_MOofJHf0xdQ7-2DKs/edit?usp=sharing)
## Mark
```math
\text{final\_mark} = \frac{\sum_{i=1}^{3} \text{max\_score}(HW_i)}{\text{count}(HW)} \cdot 10
```
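For concreteness, a minimal Python sketch of this computation, reading `max_score(HW_i)` as the earned fraction of homework *i*'s maximum (the normalization and the example scores are assumptions):

```python
# Hypothetical normalized scores for HW1..HW3 (assumption: each score is
# the earned fraction of that homework's maximum, in [0, 1]).
hw_scores = [0.9, 0.7, 1.0]

final_mark = sum(hw_scores) / len(hw_scores) * 10
print(round(final_mark, 2))  # 8.67 on a 10-point scale
```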
## Lecture schedule
#### Week 1
* Lecture: [Intro to NLP](lectures/L1.Intro2NLP.pdf)
* Practical: [Text preprocessing](seminars/sem1/sem1_basic_text_processing.ipynb), [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/king-menin/mipt-nlp2022/blob/master/seminars/sem1/sem1_basic_text_processing.ipynb)
* [Video](https://drive.google.com/file/d/1hx3EHLtIsOspEjMDZz9I0ENH3P9DIsrf/view?usp=sharing)

#### Week 2
* Lecture: [Word embeddings](lectures/L2.WordEmbeddings.pdf)
Distributional semantics. Count-based (pre-neural) methods. Word2Vec: learn vectors. GloVe: count, then learn. N-grams (collocations). RusVectores. t-SNE.
* Practical: word2vec, fasttext
* [HW1](HWs/hw1.ipynb)
* Video: [lecture](https://drive.google.com/file/d/1LQEVudRMccfiIj5igVeNg2qqg9EwRJ7b/view?usp=sharing), [seminar](https://drive.google.com/file/d/1yoXQXRmEvhBUl0iPJgVXFYv9KwyLAzUA/view?usp=sharing)
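As a companion to the word2vec/fastText practical, a minimal gensim sketch; the toy corpus and hyperparameters are illustrative, not taken from the seminar:

```python
# Train a tiny skip-gram word2vec model with gensim (pip install gensim).
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,  # embedding dimensionality
    window=2,        # context window size
    min_count=1,     # keep every token in this toy corpus
    sg=1,            # 1 = skip-gram, 0 = CBOW
)

print(model.wv["cat"].shape)         # (50,)
print(model.wv.most_similar("cat"))  # nearest neighbours in vector space
```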
#### Week 3
* Lecture: [RNN + CNN, Text classification](lectures/L3.TextClassification_BasicNNs_at_NLP.pdf)
Neural language models: recurrent models, convolutional models. Text classification (architectures).
* Practical: [Classification with LSTM, CNN](seminars/sem3/sem3_classification.ipynb), [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/king-menin/mipt-nlp2022/blob/master/seminars/sem3/sem3_classification.ipynb)
* [Video](https://drive.google.com/file/d/1uqV_uhPUjhqh5v8zVFBPsem4W2oDErGn/view?usp=sharing)
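In the spirit of this week's seminar, a minimal PyTorch LSTM text classifier; the vocabulary size, dimensions, and two-class setup are illustrative:

```python
# A minimal LSTM text classifier: embed token ids, run an LSTM,
# classify from the final hidden state.
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, emb_dim=100, hidden_dim=128, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, n_classes)

    def forward(self, token_ids):             # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)  # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.lstm(embedded)     # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])               # logits: (batch, n_classes)

model = LSTMClassifier()
batch = torch.randint(1, 10_000, (4, 20))     # 4 fake sentences of 20 token ids
print(model(batch).shape)                     # torch.Size([4, 2])
```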
#### Week 4
* Lecture: [Language modelling and NER](lectures/L4.LMs_Intro_and_NER.pdf)
Task description, methods (Markov models, RNNs), evaluation (perplexity), sequence labelling (NER, POS tagging, chunking, etc.). N-gram language models, HMM, MEMM, CRF.
* Practical: [NER](seminars/sem4/sem4_ner.ipynb), [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/king-menin/mipt-nlp2022/blob/master/seminars/sem4/sem4_ner.ipynb)
* [Video](https://drive.google.com/file/d/1ECVWmy7zMs9QPX-nnZ7SASvopBlFAIM6/view?usp=sharing)
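A tiny worked example of perplexity, the evaluation metric named above, using a toy unigram model with made-up probabilities:

```python
# Perplexity of a toy unigram language model on a test sequence:
# PPL = exp(-(1/N) * sum_i log p(w_i)).
import math

probs = {"the": 0.2, "cat": 0.05, "sat": 0.01}  # made-up unigram probabilities
test_tokens = ["the", "cat", "sat"]

log_prob = sum(math.log(probs[w]) for w in test_tokens)
perplexity = math.exp(-log_prob / len(test_tokens))
print(perplexity)  # ≈ 21.5; lower perplexity means a better model
```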
#### Week 5
* Lecture: [Machine translation, Seq2seq, Attention, Transformers](lectures/L5.MTAttentionTransformers.pptx.pdf)
Basics: encoder-decoder framework, inference (e.g., beam search), evaluation (BLEU).
Attention: general, score functions, models. Bahdanau and Luong models.
Transformer: self-attention, masked self-attention, multi-head attention.
* [HW2](HWs/hw2.ipynb), https://www.kaggle.com/c/mipt-nlp-hw2-2022
* [Video](https://drive.google.com/file/d/1P0UQX50ZacNnRAhotgjGhZt6L6D3L4Zo/view?usp=sharing)
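A short plain-PyTorch sketch of the scaled dot-product attention at the heart of the Transformer material above (shapes are illustrative):

```python
# Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V,
# with an optional mask for masked self-attention.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # (batch, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # block future positions
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

q = k = v = torch.randn(2, 5, 64)  # self-attention: queries = keys = values
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                   # torch.Size([2, 5, 64])
```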
#### Week 6
* Lecture: [Transfer learning in NLP](lectures/L6.TransferLearning.pdf)
BERTology (BERT, the GPT family, T5, etc.), subword segmentation (BPE), evaluation of large LMs.
* Practical: [transformers models for classification task](seminars/sem6/TransferLearningSeminar.ipynb), [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/king-menin/mipt-nlp2022/blob/master/seminars/sem6/TransferLearningSeminar.ipynb)
* [Video](https://drive.google.com/file/d/15YWJGC-8FzGBtfO4SkWOvPG5BEgkuZRO/view?usp=sharing)
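As a quick illustration of the transfer-learning workflow, loading a pretrained BERT with the Hugging Face transformers library; the checkpoint name and label count are arbitrary choices for this sketch, not the seminar's:

```python
# Load a pretrained encoder plus a fresh classification head
# (pip install transformers torch).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer("Transfer learning is effective.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # (1, 2); the head is untrained here
print(logits.softmax(dim=-1))         # fine-tune on labelled data before use
```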
#### Week 7
* Lecture & Practical: How to train big models? [Distributed training](lectures/L7.DistributedTraining.pdf)
Training Multi-Billion Parameter Language Models. Model Parallelism. Data Parallelism.
* Practical: [DDP example](seminars/sem7)
* [Video](https://drive.google.com/file/d/1dFy_EI6OcaqL7VydX7PborZt8-58I2aw/view?usp=sharing)
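A minimal PyTorch DistributedDataParallel (DDP) sketch of the data-parallel idea; the model, data, and launch command are illustrative, see the seminar code for the real example:

```python
# One process per GPU, launched e.g. with:
#   torchrun --nproc_per_node=2 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")        # torchrun sets the rank/world-size env vars
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 1).cuda(rank)
    model = DDP(model, device_ids=[rank])  # gradients are all-reduced across ranks
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(32, 10).cuda(rank)     # each rank sees its own data shard
    loss = model(x).pow(2).mean()
    loss.backward()                        # sync happens here
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```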
#### Week 8
* Lecture: [Question answering](lectures/L8.QuestionAnswering.pdf)
SQuAD-style datasets (one-hop and multi-hop), architectures, retrieval and search, chat-bots.
* Practical: [seminar QA](seminars/sem8/qa.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/king-menin/mipt-nlp2022/blob/master/seminars/sem8/qa.ipynb), [seminar chat-bots](seminars/sem8/chatbots.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/king-menin/mipt-nlp2022/blob/master/seminars/sem8/chatbots.ipynb)
* [Video](https://drive.google.com/file/d/1t4LtMNbbckA0AH5YXTbrSepflt0VVyPy/view?usp=sharing)
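For a taste of extractive QA, a Hugging Face pipeline sketch with a SQuAD-fine-tuned checkpoint; the model choice and the question/context are illustrative:

```python
# Extract an answer span from a context passage (pip install transformers torch).
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="Where is MIPT located?",
    context="The Moscow Institute of Physics and Technology (MIPT) "
            "is located in Dolgoprudny.",
)
print(result["answer"], result["score"])  # span copied from the context
```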
#### Week 9
* Lecture: [Summarization, simplification, paraphrasing](lectures/L9.Summarization.pdf)
* Practical: summarization seminar
* [HW3](HWs/hw3.ipynb), https://www.kaggle.com/c/mipt-nlp-hw3-2022
* [Video](https://drive.google.com/file/d/1zDcyheTST_l7hSi8-xzOVsjAtQ-upfxe/view?usp=sharing)
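A minimal abstractive-summarization sketch using the same pipeline API (the checkpoint and input text are illustrative):

```python
# Generate a short summary with a pretrained seq2seq model.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
text = (
    "Summarization produces a short version of a document that preserves its "
    "key information. Extractive methods copy salient sentences, while "
    "abstractive methods generate new text with a seq2seq model."
)
print(summarizer(text, max_length=30, min_length=10)[0]["summary_text"])
```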
#### Week 10
* Lecture: [Multimodal NLP](lectures/L10.MultimodalNLP.pdf)
* [Video](https://drive.google.com/file/d/1Sl4WZv-L-3zq6cYpe62llnX0TALxxLc_/view?usp=sharing)

## Recommended Resources
### In English
* [ruder.io](https://ruder.io/)
* [Jurafsky & Martin](https://web.stanford.edu/~jurafsky/slp3/)
* [Laura Kallmeyer's course on machine learning for NLP](https://user.phil.hhu.de/~kallmeyer/MachineLearning/index.html)
* [Nils Reimers' course on deep learning for NLP](https://github.com/UKPLab/deeplearning4nlp-tutorial)
* [Oxford course on deep learning for NLP](https://github.com/oxford-cs-deepnlp-2017/lectures)
* [Stanford course on deep learning for NLP (CS224n)](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1214/)
* [Reinforcement Learning for NLP](https://github.com/jiyfeng/rl4nlp)

### In Russian (and mostly about Russian)
* [NLP course from the Yandex School of Data Analysis](https://github.com/yandexdataschool/nlp_course)
* [Russian National Corpus (НКРЯ)](http://ruscorpora.ru)
* [OpenCorpora](http://opencorpora.org)
* [Distributional semantic models for Russian (RusVectores)](http://rusvectores.org/ru/)
* [Morphology (Yandex MyStem)](https://tech.yandex.ru/mystem/)
* [Syntax (Habr article)](https://habrahabr.ru/post/317564/)
* [Tomita parser](https://tech.yandex.ru/tomita/)
* [mathlingvo](http://mathlingvo.ru)
* [nlpub](https://nlpub.ru)
* [Text Visualisation browser](http://textvis.lnu.se)

## Literature
* Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999.
* Jurafsky, Daniel, and James H. Martin. Speech and Language Processing. Prentice Hall, 2000.
* Cohen, Shay. Bayesian Analysis in Natural Language Processing. Synthesis Lectures on Human Language Technologies 9(2), 2016.
* Goldberg, Yoav. Neural Network Methods for Natural Language Processing. Synthesis Lectures on Human Language Technologies 10(1), 2017.