Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/simran-khanuja/awesome-multilingual-nlp
A repository to maintain work being done in the field of multilingual NLP, to encourage research beyond English :)
List: awesome-multilingual-nlp
- Host: GitHub
- URL: https://github.com/simran-khanuja/awesome-multilingual-nlp
- Owner: simran-khanuja
- Created: 2022-12-26T19:37:38.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-30T15:27:34.000Z (over 1 year ago)
- Last Synced: 2024-05-21T09:12:19.530Z (7 months ago)
- Homepage:
- Size: 42 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-multilingual-nlp - A repository to maintain work being done in the field of multilingual NLP, to encourage research beyond English :). (Other Lists / Monkey C Lists)
README
# Resources for Multilingual NLP
A repository to maintain work being done in the field of multilingual NLP, to encourage NLP research beyond English :)
## Table of Contents
* [Survey Papers](#survey-papers)
* [Modeling](#modeling)
  * [Models](#models)
  * [\[Methods\] General](#methods-general)
  * [\[Methods\] Task-Specific](#methods-task-specific)
  * [Fine-tuning](#fine-tuning)
* [Evaluation](#evaluation)
* [Datasets](#datasets)
* [Analysis](#analysis)
* [Position Papers](#position-papers)
* [Blog Posts](#blog-posts)
* [Workshops](#workshops)
* [Tutorials](#tutorials)
* [Courses](#courses)
* [\[Bonus\] Non-NLP References](#bonus-non-nlp-references)
# Research Papers
## Survey Papers
[A Survey of Multilingual Models for Automatic Speech Recognition](https://aclanthology.org/2022.lrec-1.542/), *Yadav et al.*, LREC 2022
[A Primer on Pretrained Multilingual Language Models](https://arxiv.org/abs/2107.00676), *Doddapaneni et al.*, arXiv:2107.00676, Jul 2021
## Modeling
### Models
[Mu2SLAM: Multitask, Multilingual Speech and Language Models](https://arxiv.org/abs/2212.09553), *Cheng et al.*, arXiv:2212.09553, Dec 2022
[BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/pdf/2211.05100.pdf), *BigScience Workshop*, arXiv:2211.05100, Nov 2022
[PaLI: A Jointly-Scaled Multilingual Language-Image Model](https://arxiv.org/abs/2209.06794), *Chen et al.*, arXiv:2209.06794, Sep 2022
[Few-shot Learning with Multilingual Generative Language Models](https://arxiv.org/pdf/2112.10668.pdf), *Lin et al.*, EMNLP 2022
[MAESTRO: Matched Speech Text Representations through Modality Matching](https://arxiv.org/abs/2204.03409), *Chen et al.*, Interspeech 2022
[mSLAM: Massively multilingual joint pre-training for speech and text](https://arxiv.org/abs/2202.01374), *Bapna et al.*, arXiv:2202.01374, Feb 2022
[Unsupervised Cross-lingual Representation Learning for Speech Recognition](https://www.isca-speech.org/archive/pdfs/interspeech_2021/conneau21_interspeech.pdf), *Conneau et al.*, Interspeech 2021
[Rethinking Embedding Coupling in Pre-trained Language Models](https://openreview.net/forum?id=xpFFI_NtgpW), *Chung et al.*, ICLR 2021
[Larger-Scale Transformers for Multilingual Masked Language Modeling](https://aclanthology.org/2021.repl4nlp-1.4.pdf), *Goyal et al.*, RepL4NLP-2021
[MuRIL: Multilingual Representations for Indian Languages](https://arxiv.org/pdf/2103.10730.pdf), *Khanuja et al.*, arXiv:2103.10730, Mar 2021
[Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116), *Conneau et al.*, ACL 2020
### [Methods] General
[Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages](https://aclanthology.org/2022.acl-long.18/), *Patil et al.*, ACL 2022
[When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Model](https://arxiv.org/pdf/2010.12858.pdf), *Muller et al.*, NAACL-HLT 2021
[On the Cross-lingual Transferability of Monolingual Representations](https://arxiv.org/abs/1910.11856), *Artetxe et al.*, ACL 2020
### [Methods] Task-Specific
[Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification](https://aclanthology.org/2022.acl-long.6/), *Zaharia et al.*, ACL 2022
[An Unsupervised Multiple-Task and Multiple-Teacher Model for Cross-lingual Named Entity Recognition](https://aclanthology.org/2022.acl-long.14/), *Li et al.*, ACL 2022
[Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation](https://aclanthology.org/2022.acl-long.12/), *Chen et al.*, ACL 2022
[Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization](https://aclanthology.org/2022.acl-long.42/), *Jia et al.*, ACL 2022
[Multilingual Knowledge Graph Completion with Self-Supervised Adaptive Graph Alignment](https://aclanthology.org/2022.acl-long.36/), *Huang et al.*, ACL 2022
### Fine-tuning
[On Efficiently Acquiring Annotations for Multilingual Models](https://aclanthology.org/2022.acl-short.9/), *Moniz et al.*, ACL 2022
[Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages](https://arxiv.org/pdf/2109.10534.pdf), *Dhamecha et al.*, EMNLP 2021
[From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers](https://aclanthology.org/2020.emnlp-main.363/), *Lauscher et al.*, EMNLP 2020
[Choosing Transfer Languages for Cross-Lingual Learning](https://aclanthology.org/P19-1301/), *Lin et al.*, ACL 2019
## Evaluation
[Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models](https://aclanthology.org/2022.acl-long.374/), *Ahuja et al.*, ACL 2022
[XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization](https://arxiv.org/abs/2003.11080), *Hu et al.*, ICML 2020
## Datasets
[Few-shot Learning with Multilingual Generative Language Models](https://arxiv.org/pdf/2112.10668.pdf), *Lin et al.*, EMNLP 2022
[Language Models are Multilingual Chain-of-thought Reasoners](https://arxiv.org/pdf/2210.03057.pdf), *Shi et al.*, arXiv:2210.03057, Oct 2022
[Visually Grounded Reasoning across Languages and Cultures](https://aclanthology.org/2021.emnlp-main.818/), *Liu et al.*, EMNLP 2021
[On the Cross-lingual Transferability of Monolingual Representations](https://arxiv.org/abs/1910.11856), *Artetxe et al.*, ACL 2020
## Analysis
[Does Corpus Quality Really Matter for Low-Resource Languages?](https://arxiv.org/pdf/2203.08111.pdf), *Artetxe et al.*, EMNLP 2022
[When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer](https://aclanthology.org/2022.naacl-main.264/), *Deshpande et al.*, NAACL 2022
[Computational Historical Linguistics and Language Diversity in South Asia](https://aclanthology.org/2022.acl-long.99/), *Arora et al.*, ACL 2022
[Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets](https://aclanthology.org/2022.tacl-1.4.pdf), *Kreutzer et al.*, TACL 2022
[Cultural and Geographical Influences on Image Translatability of Words across Languages](https://aclanthology.org/2021.naacl-main.19/), *Khani et al.*, NAACL 2021
[Language Models are Few-shot Multilingual Learners](https://aclanthology.org/2021.mrl-1.1.pdf), *Winata et al.*, MRL 2021
[Identifying Elements Essential for BERT’s Multilinguality](https://aclanthology.org/2020.emnlp-main.358/), *Dufter et al.*, EMNLP 2020
## Position Papers
[Challenges and Strategies in Cross-Cultural NLP](https://aclanthology.org/2022.acl-long.482/), *Hershcovich et al.*, ACL 2022
[Toward More Meaningful Resources for Lower-resourced Languages](https://aclanthology.org/2022.findings-acl.44.pdf), *Lignos et al.*, ACL Findings 2022
[The State and Fate of Linguistic Diversity and Inclusion in the NLP World](https://aclanthology.org/2020.acl-main.560/), *Joshi et al.*, ACL 2020
## Blog Posts
[The State of Multilingual AI](https://ruder.io/state-of-multilingual-ai/), *Sebastian Ruder*, Nov 2022
[Multi-domain Multilingual Question Answering](https://ruder.io/multi-qa-tutorial/), *Sebastian Ruder*, Dec 2021
[Why You Should Do NLP Beyond English](https://ruder.io/nlp-beyond-english/), *Sebastian Ruder*, Aug 2020
## Workshops
## Tutorials
## Courses
[Multilingual NLP](http://phontron.com/class/multiling2022/), CMU CS 11-737, Spring 2022
## [Bonus] Non-NLP References
[The digital language divide](https://labs.theguardian.com/digital-language-divide/), *Holly Young*
[The Multilingual Mind: A Modular Processing Perspective](https://www.cambridge.org/mq/titles/multilingual-mind-modular-processing-perspective), *Michael Sharwood Smith, John Truscott*