# Resources for Multilingual NLP
A repository tracking work in the field of multilingual NLP, to encourage NLP research beyond English :)

## Table of Contents

* [Survey Papers](#survey-papers)
* [Modeling](#modeling)
  * [Models](#models)
  * [\[Methods\] General](#methods-general)
  * [\[Methods\] Task-Specific](#methods-task-specific)
  * [Fine-tuning](#fine-tuning)
* [Evaluation](#evaluation)
* [Datasets](#datasets)
* [Analysis](#analysis)
* [Position Papers](#position-papers)
* [Blog Posts](#blog-posts)
* [Workshops](#workshops)
* [Tutorials](#tutorials)
* [Courses](#courses)
* [\[Bonus\] Non-NLP References](#bonus-non-nlp-references)

# Research Papers

## Survey Papers
[A Survey of Multilingual Models for Automatic Speech Recognition](https://aclanthology.org/2022.lrec-1.542/), *Yadav et al.*, LREC 2022

[A Primer on Pretrained Multilingual Language Models](https://arxiv.org/abs/2107.00676), *Doddapaneni et al.*, arXiv:2107.00676, Jul 2021

## Modeling

### Models
[Mu2SLAM: Multitask, Multilingual Speech and Language Models](https://arxiv.org/abs/2212.09553), *Cheng et al.*, arXiv:2212.09553, Dec 2022

[BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/pdf/2211.05100.pdf), *BigScience Workshop*, arXiv:2211.05100, Nov 2022

[PaLI: A Jointly-Scaled Multilingual Language-Image Model](https://arxiv.org/abs/2209.06794), *Chen et al.*, arXiv:2209.06794, Sep 2022

[Few-shot Learning with Multilingual Generative Language Models](https://arxiv.org/pdf/2112.10668.pdf), *Lin et al.*, EMNLP 2022

[MAESTRO: Matched Speech Text Representations through Modality Matching](https://arxiv.org/abs/2204.03409), *Chen et al.*, Interspeech 2022

[mSLAM: Massively multilingual joint pre-training for speech and text](https://arxiv.org/abs/2202.01374), *Bapna et al.*, arXiv:2202.01374, Feb 2022

[Unsupervised Cross-lingual Representation Learning for Speech Recognition](https://www.isca-speech.org/archive/pdfs/interspeech_2021/conneau21_interspeech.pdf), *Conneau et al.*, Interspeech 2021

[Rethinking Embedding Coupling in Pre-trained Language Models](https://openreview.net/forum?id=xpFFI_NtgpW), *Chung et al.*, ICLR 2021

[Larger-Scale Transformers for Multilingual Masked Language Modeling](https://aclanthology.org/2021.repl4nlp-1.4.pdf), *Goyal et al.*, RepL4NLP-2021

[MuRIL: Multilingual Representations for Indian Languages](https://arxiv.org/pdf/2103.10730.pdf), *Khanuja et al.*, arXiv:2103.10730, Mar 2021

[Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116), *Conneau et al.*, ACL 2020

### [Methods] General
[Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages](https://aclanthology.org/2022.acl-long.18/), *Patil et al.*, ACL 2022

[When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models](https://arxiv.org/pdf/2010.12858.pdf), *Muller et al.*, NAACL-HLT 2021

[On the Cross-lingual Transferability of Monolingual Representations](https://arxiv.org/abs/1910.11856), *Artetxe et al.*, ACL 2020

### [Methods] Task-Specific
[Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification](https://aclanthology.org/2022.acl-long.6/), *Zaharia et al.*, ACL 2022

[An Unsupervised Multiple-Task and Multiple-Teacher Model for Cross-lingual Named Entity Recognition](https://aclanthology.org/2022.acl-long.14/), *Li et al.*, ACL 2022

[Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation](https://aclanthology.org/2022.acl-long.12/), *Chen et al.*, ACL 2022

[Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization](https://aclanthology.org/2022.acl-long.42/), *Jia et al.*, ACL 2022

[Multilingual Knowledge Graph Completion with Self-Supervised Adaptive Graph Alignment](https://aclanthology.org/2022.acl-long.36/), *Huang et al.*, ACL 2022

### Fine-tuning
[On Efficiently Acquiring Annotations for Multilingual Models](https://aclanthology.org/2022.acl-short.9/), *Moniz et al.*, ACL 2022

[Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages](https://arxiv.org/pdf/2109.10534.pdf), *Dhamecha et al.*, EMNLP 2021

[From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers](https://aclanthology.org/2020.emnlp-main.363/), *Lauscher et al.*, EMNLP 2020

[Choosing Transfer Languages for Cross-Lingual Learning](https://aclanthology.org/P19-1301/), *Lin et al.*, ACL 2019

## Evaluation
[Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models](https://aclanthology.org/2022.acl-long.374/), *Ahuja et al.*, ACL 2022

[XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization](https://arxiv.org/abs/2003.11080), *Hu et al.*, ICML 2020

## Datasets
[Few-shot Learning with Multilingual Generative Language Models](https://arxiv.org/pdf/2112.10668.pdf), *Lin et al.*, EMNLP 2022

[Language Models are Multilingual Chain-of-thought Reasoners](https://arxiv.org/pdf/2210.03057.pdf), *Shi et al.*, arXiv:2210.03057, Oct 2022

[Visually Grounded Reasoning across Languages and Cultures](https://aclanthology.org/2021.emnlp-main.818/), *Liu et al.*, EMNLP 2021

[On the Cross-lingual Transferability of Monolingual Representations](https://arxiv.org/abs/1910.11856), *Artetxe et al.*, ACL 2020

## Analysis
[Does Corpus Quality Really Matter for Low-Resource Languages?](https://arxiv.org/pdf/2203.08111.pdf), *Artetxe et al.*, EMNLP 2022

[When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer](https://aclanthology.org/2022.naacl-main.264/), *Deshpande et al.*, NAACL 2022

[Computational Historical Linguistics and Language Diversity in South Asia](https://aclanthology.org/2022.acl-long.99/), *Arora et al.*, ACL 2022

[Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets](https://aclanthology.org/2022.tacl-1.4.pdf), *Kreutzer et al.*, TACL 2022

[Cultural and Geographical Influences on Image Translatability of Words across Languages](https://aclanthology.org/2021.naacl-main.19/), *Khani et al.*, NAACL 2021

[Language Models are Few-shot Multilingual Learners](https://aclanthology.org/2021.mrl-1.1.pdf), *Winata et al.*, MRL 2021

[Identifying Elements Essential for BERT’s Multilinguality](https://aclanthology.org/2020.emnlp-main.358/), *Dufter et al.*, EMNLP 2020

## Position Papers
[Challenges and Strategies in Cross-Cultural NLP](https://aclanthology.org/2022.acl-long.482/), *Hershcovich et al.*, ACL 2022

[Toward More Meaningful Resources for Lower-resourced Languages](https://aclanthology.org/2022.findings-acl.44.pdf), *Lignos et al.*, ACL Findings 2022

[The State and Fate of Linguistic Diversity and Inclusion in the NLP World](https://aclanthology.org/2020.acl-main.560/), *Joshi et al.*, ACL 2020

## Blog Posts
[The State of Multilingual AI](https://ruder.io/state-of-multilingual-ai/), *Sebastian Ruder*, Nov 2022

[Multi-domain Multilingual Question Answering](https://ruder.io/multi-qa-tutorial/), *Sebastian Ruder*, Dec 2021

[Why You Should Do NLP Beyond English](https://ruder.io/nlp-beyond-english/), *Sebastian Ruder*, Aug 2020

## Workshops

## Tutorials

## Courses
[Multilingual NLP](http://phontron.com/class/multiling2022/), CMU CS 11-737, Spring 2022

## [Bonus] Non-NLP References
[The digital language divide](https://labs.theguardian.com/digital-language-divide/), *Holly Young*

[The Multilingual Mind: A Modular Processing Perspective](https://www.cambridge.org/mq/titles/multilingual-mind-modular-processing-perspective), *Michael Sharwood Smith, John Truscott*