# Resources for Multilingual NLP
A repository tracking work in the field of multilingual NLP, to encourage NLP research beyond English :)

## Table of Contents

* [Survey Papers](#survey-papers)
* [Modeling](#modeling)
  * [Models](#models)
  * [\[Methods\] General](#methods-general)
  * [\[Methods\] Task-Specific](#methods-task-specific)
  * [Fine-tuning](#fine-tuning)
* [Evaluation](#evaluation)
* [Datasets](#datasets)
* [Analysis](#analysis)
* [Position Papers](#position-papers)
* [Blog Posts](#blog-posts)
* [Workshops](#workshops)
* [Tutorials](#tutorials)
* [Courses](#courses)
* [\[Bonus\] Non-NLP References](#bonus-non-nlp-references)

# Research Papers

## Survey Papers
[A Survey of Multilingual Models for Automatic Speech Recognition](https://aclanthology.org/2022.lrec-1.542/), *Yadav et al.*, LREC 2022

[A Primer on Pretrained Multilingual Language Models](https://arxiv.org/abs/2107.00676), *Doddapaneni et al.*, arXiv:2107.00676, Jul 2021

## Modeling

### Models
[Mu2SLAM: Multitask, Multilingual Speech and Language Models](https://arxiv.org/abs/2212.09553), *Cheng et al.*, arXiv:2212.09553, Dec 2022

[BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/pdf/2211.05100.pdf), *BigScience Workshop*, arXiv:2211.05100, Nov 2022

[PaLI: A Jointly-Scaled Multilingual Language-Image Model](https://arxiv.org/abs/2209.06794), *Chen et al.*, arXiv:2209.06794, Sep 2022

[Few-shot Learning with Multilingual Generative Language Models](https://arxiv.org/pdf/2112.10668.pdf), *Lin et al.*, EMNLP 2022

[MAESTRO: Matched Speech Text Representations through Modality Matching](https://arxiv.org/abs/2204.03409), *Chen et al.*, Interspeech 2022

[mSLAM: Massively multilingual joint pre-training for speech and text](https://arxiv.org/abs/2202.01374), *Bapna et al.*, arXiv:2202.01374, Feb 2022

[Unsupervised Cross-lingual Representation Learning for Speech Recognition](https://www.isca-speech.org/archive/pdfs/interspeech_2021/conneau21_interspeech.pdf), *Conneau et al.*, Interspeech 2021

[Rethinking Embedding Coupling in Pre-trained Language Models](https://openreview.net/forum?id=xpFFI_NtgpW), *Chung et al.*, ICLR 2021

[Larger-Scale Transformers for Multilingual Masked Language Modeling](https://aclanthology.org/2021.repl4nlp-1.4.pdf), *Goyal et al.*, RepL4NLP-2021

[MuRIL: Multilingual Representations for Indian Languages](https://arxiv.org/pdf/2103.10730.pdf), *Khanuja et al.*, arXiv:2103.10730, Mar 2021

[Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116), *Conneau et al.*, ACL 2020

### [Methods] General
[Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages](https://aclanthology.org/2022.acl-long.18/), *Patil et al.*, ACL 2022

[When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models](https://arxiv.org/pdf/2010.12858.pdf), *Muller et al.*, NAACL-HLT 2021

[On the Cross-lingual Transferability of Monolingual Representations](https://arxiv.org/abs/1910.11856), *Artetxe et al.*, ACL 2020

### [Methods] Task-Specific
[Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification](https://aclanthology.org/2022.acl-long.6/), *Zaharia et al.*, ACL 2022

[An Unsupervised Multiple-Task and Multiple-Teacher Model for Cross-lingual Named Entity Recognition](https://aclanthology.org/2022.acl-long.14/), *Li et al.*, ACL 2022

[Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation](https://aclanthology.org/2022.acl-long.12/), *Chen et al.*, ACL 2022

[Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization](https://aclanthology.org/2022.acl-long.42/), *Jia et al.*, ACL 2022

[Multilingual Knowledge Graph Completion with Self-Supervised Adaptive Graph Alignment](https://aclanthology.org/2022.acl-long.36/), *Huang et al.*, ACL 2022

### Fine-tuning
[On Efficiently Acquiring Annotations for Multilingual Models](https://aclanthology.org/2022.acl-short.9/), *Moniz et al.*, ACL 2022

[Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages](https://arxiv.org/pdf/2109.10534.pdf), *Dhamecha et al.*, EMNLP 2021

[From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers](https://aclanthology.org/2020.emnlp-main.363/), *Lauscher et al.*, EMNLP 2020

[Choosing Transfer Languages for Cross-Lingual Learning](https://aclanthology.org/P19-1301/), *Lin et al.*, ACL 2019

## Evaluation
[Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models](https://aclanthology.org/2022.acl-long.374/), *Ahuja et al.*, ACL 2022

[XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization](https://arxiv.org/abs/2003.11080), *Hu et al.*, ICML 2020

## Datasets
[Few-shot Learning with Multilingual Generative Language Models](https://arxiv.org/pdf/2112.10668.pdf), *Lin et al.*, EMNLP 2022

[Language Models are Multilingual Chain-of-thought Reasoners](https://arxiv.org/pdf/2210.03057.pdf), *Shi et al.*, arXiv:2210.03057, Oct 2022

[Visually Grounded Reasoning across Languages and Cultures](https://aclanthology.org/2021.emnlp-main.818/), *Liu et al.*, EMNLP 2021

[On the Cross-lingual Transferability of Monolingual Representations](https://arxiv.org/abs/1910.11856), *Artetxe et al.*, ACL 2020

## Analysis
[Does Corpus Quality Really Matter for Low-Resource Languages?](https://arxiv.org/pdf/2203.08111.pdf), *Artetxe et al.*, EMNLP 2022

[When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer](https://aclanthology.org/2022.naacl-main.264/), *Deshpande et al.*, NAACL 2022

[Computational Historical Linguistics and Language Diversity in South Asia](https://aclanthology.org/2022.acl-long.99/), *Arora et al.*, ACL 2022

[Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets](https://aclanthology.org/2022.tacl-1.4.pdf), *Kreutzer et al.*, TACL 2022

[Cultural and Geographical Influences on Image Translatability of Words across Languages](https://aclanthology.org/2021.naacl-main.19/), *Khani et al.*, NAACL 2021

[Language Models are Few-shot Multilingual Learners](https://aclanthology.org/2021.mrl-1.1.pdf), *Winata et al.*, MRL 2021

[Identifying Elements Essential for BERT’s Multilinguality](https://aclanthology.org/2020.emnlp-main.358/), *Dufter et al.*, EMNLP 2020

## Position Papers
[Challenges and Strategies in Cross-Cultural NLP](https://aclanthology.org/2022.acl-long.482/), *Hershcovich et al.*, ACL 2022

[Toward More Meaningful Resources for Lower-resourced Languages](https://aclanthology.org/2022.findings-acl.44.pdf), *Lignos et al.*, ACL Findings 2022

[The State and Fate of Linguistic Diversity and Inclusion in the NLP World](https://aclanthology.org/2020.acl-main.560/), *Joshi et al.*, ACL 2020

## Blog Posts
[The State of Multilingual AI](https://ruder.io/state-of-multilingual-ai/), *Sebastian Ruder*, Nov 2022

[Multi-domain Multilingual Question Answering](https://ruder.io/multi-qa-tutorial/), *Sebastian Ruder*, Dec 2021

[Why You Should Do NLP Beyond English](https://ruder.io/nlp-beyond-english/), *Sebastian Ruder*, Aug 2020

## Workshops

## Tutorials

## Courses
[Multilingual NLP](http://phontron.com/class/multiling2022/), CMU CS 11-737, Spring 2022

## [Bonus] Non-NLP References
[The digital language divide](https://labs.theguardian.com/digital-language-divide/), *Holly Young*

[The Multilingual Mind: A Modular Processing Perspective](https://www.cambridge.org/mq/titles/multilingual-mind-modular-processing-perspective), *Michael Sharwood Smith, John Truscott*