{"id":18927558,"url":"https://github.com/gentaiscool/code-switching-papers","last_synced_at":"2026-01-27T19:27:17.446Z","repository":{"id":47546939,"uuid":"136591338","full_name":"gentaiscool/code-switching-papers","owner":"gentaiscool","description":"A curated list of research papers and resources on code-switching","archived":false,"fork":false,"pushed_at":"2024-12-18T00:50:52.000Z","size":182,"stargazers_count":315,"open_issues_count":0,"forks_count":39,"subscribers_count":24,"default_branch":"main","last_synced_at":"2025-06-01T11:05:34.517Z","etag":null,"topics":["bilingual","code-mixed","code-mixing","code-switch","code-switching","language","nlp","papers","research","speech"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gentaiscool.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-08T08:38:26.000Z","updated_at":"2025-05-28T18:53:40.000Z","dependencies_parsed_at":"2024-11-29T14:30:47.509Z","dependency_job_id":"1adb21d6-9d4d-464c-9c3a-1f457196f1a8","html_url":"https://github.com/gentaiscool/code-switching-papers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gentaiscool/code-switching-papers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gentaiscool%2Fcode-switching-papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gentaiscool%2Fcode-switching-papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/gentaiscool%2Fcode-switching-papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gentaiscool%2Fcode-switching-papers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gentaiscool","download_url":"https://codeload.github.com/gentaiscool/code-switching-papers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gentaiscool%2Fcode-switching-papers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28819296,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T18:44:20.126Z","status":"ssl_error","status_checked_at":"2026-01-27T18:44:09.161Z","response_time":168,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bilingual","code-mixed","code-mixing","code-switch","code-switching","language","nlp","papers","research","speech"],"created_at":"2024-11-08T11:19:33.891Z","updated_at":"2026-01-27T19:27:17.438Z","avatar_url":"https://github.com/gentaiscool.png","language":null,"readme":"# Code-switching Research Resources\nThis is the list of tutorials, workshops, papers, and resources on computational linguistic approaches to code-switching research. \nThe list will be updated over time. You are welcome to send a pull request to update the list and become one of the contributors! 
\n\n📌 I plan to collect theses and books on code-switching and list them here. If you have one, don't hesitate to contact me or create a pull request! \n\n## Table of Contents\n\n- [🚀 Highlights](#-highlights)\n- [🏫 Workshops](#-workshops)\n- [📑 Research Papers](#-research-papers)\n  - [Survey Paper](#survey-paper)\n  - [Large Language Models](#large-language-models)\n  - [Language Identification and POS Tagging](#language-identification-and-pos-tagging)\n  - [Corpus](#corpus)\n  - [Language Modeling and Speech Recognition](#language-modeling-and-speech-recognition)\n  - [Discourse](#discourse)\n  - [Generation](#generation)\n  - [Speech Synthesis](#speech-synthesis)\n  - [Metric](#metric)\n  - [Representation Learning](#representation-learning)\n  - [Machine Translation](#machine-translation)\n  - [Speech Translation](#speech-translation)\n  - [Natural Language Understanding](#natural-language-understanding)\n  - [Named Entity Recognition](#named-entity-recognition)\n  - [Linguistics](#linguistics)\n  - [Affective Computing](#affective-computing)\n  - [Dialog and Conversational System](#dialog-and-conversational-system)\n  - [Syntax](#syntax)\n  - [Adversarial Attack](#adversarial-attack)\n  - [Social Linguistics](#social-linguistics)\n  - [Benchmark](#benchmark)\n  - [Social Media](#social-media)\n  - [Text Normalization](#text-normalization)\n  - [Toolkit](#toolkit)\n- [Books](#books)\n- [Theses](#theses)\n\n## 🚀 Highlights\n- We will be organizing the code-switching workshop at NAACL 2025! We will soon update the website! 
\u003ca href=\"https://code-switching.github.io/\"\u003e[Website]\u003c/a\u003e\n- If you are new to code-switching or looking for a new research direction, we have written a comprehensive survey paper: \u003cb\u003eThe Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges\u003c/b\u003e \u003ca href=\"https://arxiv.org/pdf/2212.09660.pdf\"\u003e[Paper]\u003c/a\u003e. Feel free to read it and let us know if you have any suggestions! Thanks to Alham Fikri Aji, Zheng-Xin Yong, and Thamar Solorio for making this possible 😊\n- We organized the code-switching workshop at EMNLP 2023! \u003ca href=\"https://code-switching.github.io/2023\"\u003e[Website]\u003c/a\u003e\n- Marina Zhukova, Sudipta Kar, and I organized a birds-of-a-feather session at EMNLP 2022 in Abu Dhabi. Around 30 people joined (in person and online). Thanks for coming!\n- 📔 Microsoft Research (Monojit Choudhury, Kalika Bali, Anirudh Srinivasan, and Sandipan Dandapat) gave a comprehensive tutorial on code-mixing at EMNLP 2019; see this \u003ca href=\"https://genius1237.github.io/emnlp19_tut/\"\u003elink\u003c/a\u003e.\n\n## 🏫 Workshops\nThis is the list of workshops in the code-switching series:\n- First Workshop on Computational Approaches to Code-switching, EMNLP 2014 \u003ca href=\"http://emnlp2014.org/workshops/CodeSwitch/call.html\"\u003e[Website]\u003c/a\u003e\n- Second Workshop on Computational Approaches to Code-switching, EMNLP 2016\n- Third Workshop on Computational Approaches to Linguistic Code-switching, ACL 2018 \u003ca href=\"https://code-switching.github.io/2018/\"\u003e[Website]\u003c/a\u003e\n- Fourth Workshop on Computational Approaches to Linguistic Code-switching, LREC 2020 \u003ca href=\"https://code-switching.github.io/2020/\"\u003e[Website]\u003c/a\u003e\n- First Workshop on Speech Technologies for Code-switching in Multilingual Communities, Interspeech 2020 \u003ca 
href=\"https://www.microsoft.com/en-us/research/event/workshop-on-speech-technologies-for-code-switching-2020/\"\u003e[Website]\u003c/a\u003e\n- Fifth Workshop on Computational Approaches to Linguistic Code-switching, NAACL 2021 \u003ca href=\"https://code-switching.github.io/2021\"\u003e[Website]\u003c/a\u003e\n- Sixth Workshop on Computational Approaches to Linguistic Code-switching, EMNLP 2023 \u003ca href=\"https://code-switching.github.io/2023\"\u003e[Website]\u003c/a\u003e\n- Seventh Workshop on Computational Approaches to Linguistic Code-switching, NAACL 2025 \u003ca href=\"https://code-switching.github.io/2025\"\u003e[Website (will open soon)]\u003c/a\u003e\n\n## 📑 Research Papers\n\n### Survey Paper\n- \u003cb\u003eWinata, et al. (2023)\u003c/b\u003e \u003ci\u003eThe Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges\u003c/i\u003e. ACL Findings \u003ca href=\"https://arxiv.org/pdf/2212.09660.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eDoğruöz, et al. (2021)\u003c/b\u003e \u003ci\u003eA Survey of Code-switching: Linguistic and Social Perspectives for Language Technologies\u003c/i\u003e. ACL \u003ca href=\"https://aclanthology.org/2021.acl-long.131.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eJose, et al. (2020)\u003c/b\u003e \u003ci\u003eA Survey of Current Datasets for Code-Switching Research\u003c/i\u003e. International Conference on Advanced Computing and Communication Systems (ICACCS) \u003ca href=\"https://ieeexplore.ieee.org/document/9074205\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eSitaram, et al. (2019)\u003c/b\u003e \u003ci\u003eA Survey of Code-switched Speech and Language Processing\u003c/i\u003e. Arxiv \u003ca href=\"https://arxiv.org/pdf/1904.00784.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Large Language Models\n- \u003cb\u003eIgor Sterner and Simone Teufel (2025)\u003c/b\u003e \u003ci\u003eMinimal Pair-Based Evaluation of Code-Switching\u003c/i\u003e. 
ACL \u003ca href=\"https://aclanthology.org/2025.acl-long.910.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/igorsterner/acs\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2024)\u003c/b\u003e \u003ci\u003eMINERS: Multilingual Language Models as Semantic Retrievers\u003c/i\u003e. EMNLP Findings \u003ca href=\"https://arxiv.org/pdf/2406.07424\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/gentaiscool/miners\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eYoo, et al. (2024)\u003c/b\u003e \u003ci\u003eCode-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding\u003c/i\u003e. Arxiv \u003ca href=\"https://arxiv.org/pdf/2406.15481\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eLeon, et al. (2024)\u003c/b\u003e \u003ci\u003eCode-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text\u003c/i\u003e. LREC \u003ca href=\"https://aclanthology.org/2024.lrec-main.307.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/francesita/code-mixed-probes\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eHuzaifah, et al. (2024)\u003c/b\u003e \u003ci\u003eEvaluating Code-Switching Translation with Large Language Models\u003c/i\u003e. LREC-COLING \u003ca href=\"https://aclanthology.org/2024.lrec-main.565.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eYong, et al. (2023)\u003c/b\u003e \u003ci\u003ePrompting Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages\u003c/i\u003e. CALCS, EMNLP \u003ca href=\"https://aclanthology.org/2023.calcs-1.5.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Language Identification and POS Tagging\n- \u003cb\u003eIgor Sterner (2024)\u003c/b\u003e \u003ci\u003eMultilingual Identification of English Code-Switching\u003c/i\u003e. 
VarDial, NAACL \u003ca href=\"https://aclanthology.org/2024.vardial-1.14.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/igorsterner/AnE\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eBurchell, et al. (2024)\u003c/b\u003e \u003ci\u003eCode-Switched Language Identification is Harder Than You Think\u003c/i\u003e. EACL \u003ca href=\"https://aclanthology.org/2024.eacl-long.38.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eIgor Sterner and Simone Teufel (2023)\u003c/b\u003e \u003ci\u003eTongueSwitcher: Fine-Grained Identification of German-English Code-Switching\u003c/i\u003e. CALCS, EMNLP \u003ca href=\"https://aclanthology.org/2023.calcs-1.1.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/igorsterner/TongueSwitcher\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eOstapenko, et al. (2022)\u003c/b\u003e \u003ci\u003eSpeaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching\u003c/i\u003e. ACL \u003ca href=\"https://aclanthology.org/2022.acl-long.267.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eNguyen, et al. (2021)\u003c/b\u003e \u003ci\u003eAutomatic Language Identification in Code-Switched Hindi-English Social Media Text\u003c/i\u003e. Journal of Open Humanities Data \u003ca href=\"https://openhumanitiesdata.metajnl.com/article/10.5334/johd.44/\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eTarunesh, et al. (2021)\u003c/b\u003e \u003ci\u003eFrom Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text\u003c/i\u003e. ACL \u003ca href=\"https://aclanthology.org/2021.acl-long.245.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eGustavo Aguilar and Thamar Solorio. (2020)\u003c/b\u003e \u003ci\u003eFrom English to Code-Switching: Transfer Learning with Strong Morphological Clues\u003c/i\u003e. 
ACL \u003ca href=\"https://arxiv.org/pdf/1909.05158.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/gaguilar/cs_elmo\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eMager, et al. (2019)\u003c/b\u003e \u003ci\u003eSubword-Level Language Identification for Intra-Word Code-Switching\u003c/i\u003e. NAACL \u003ca href=\"https://arxiv.org/abs/1904.01989\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eZhang, et al. (2018)\u003c/b\u003e \u003ci\u003eA Fast, Compact, Accurate Model for Language Identification of Codemixed Text\u003c/i\u003e. EMNLP \u003ca href=\"https://aclanthology.org/D18-1030.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eKelsey Ball and Dan Garrette. (2018)\u003c/b\u003e \u003ci\u003ePart-of-Speech Tagging for Code-Switched, Transliterated Texts without Explicit Language Identification\u003c/i\u003e. EMNLP \u003ca href=\"http://aclweb.org/anthology/D18-1347\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eZeynep Yirmibesoglu and Gulsen Eryigit. (2018)\u003c/b\u003e \u003ci\u003eDetecting Code-Switching between Turkish-English Language Pair\u003c/i\u003e. Workshop W-NUT, EMNLP \u003ca href=\"http://www.aclweb.org/anthology/W18-6115\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eMave, et al. (2018)\u003c/b\u003e \u003ci\u003eLanguage Identification and Analysis of Code-Switched Social Media Text\u003c/i\u003e. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL \u003ca href=\"http://www.aclweb.org/anthology/W18-3206\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eVictor Soto and Julia Hirschberg. (2018)\u003c/b\u003e \u003ci\u003eJoint Part-of-Speech and Language ID Tagging for Code-Switched Data\u003c/i\u003e. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL \u003ca href=\"http://aclweb.org/anthology/W18-3201\"\u003e[Paper]\u003c/a\u003e \n- \u003cb\u003eBullock, et al. (2018)\u003c/b\u003e \u003ci\u003ePredicting the presence of a Matrix Language in code-switching\u003c/i\u003e. 
3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL \u003ca href=\"http://www.aclweb.org/anthology/W18-3208\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eSoto, et al. (2018)\u003c/b\u003e \u003ci\u003eThe Role of Cognate Words, POS Tags, and Entrainment in Code-Switching\u003c/i\u003e. Interspeech \u003ca href=\"http://www.cs.columbia.edu/speech/PaperFiles/2018/soto_is18.pdf\"\u003e[Paper]\u003c/a\u003e \n- \u003cb\u003eBarman, et al. (2016)\u003c/b\u003e \u003ci\u003ePart-of-speech Tagging of Code-mixed Social Media Content: Pipeline, Stacking and Joint Modelling\u003c/i\u003e. 2nd Workshop on Computational Approaches to Code-Switching, EMNLP \u003ca href=\"https://aclweb.org/anthology/W16-5804\"\u003e[Paper]\u003c/a\u003e \n- \u003cb\u003eVyas, et al. (2014)\u003c/b\u003e \u003ci\u003ePOS Tagging of English-Hindi Code-Mixed Social Media Content\u003c/i\u003e. EMNLP \u003ca href=\"https://www.aclweb.org/anthology/D14-1105.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eHeba Elfardy and Mona Diab. (2012)\u003c/b\u003e \u003ci\u003eToken Level Identification of Linguistic Code Switching\u003c/i\u003e. COLING \u003ca href=\"https://www.aclweb.org/anthology/C12-2029.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eThamar Solorio and Yang Liu. (2008)\u003c/b\u003e \u003ci\u003eLearning to Predict Code-Switching Points\u003c/i\u003e. EMNLP \u003ca href=\"http://www.aclweb.org/anthology/D08-1102\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eDau-Cheng Lyu and Ren-Yuan Lyu. (2008)\u003c/b\u003e \u003ci\u003eLanguage Identification on Code-Switching Utterances Using Multiple Cues\u003c/i\u003e. Interspeech \u003ca href=\"https://pdfs.semanticscholar.org/67b5/b05a9669202fe63cf5165a5b2286ddd1b6f2.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Corpus\n- \u003cb\u003eWinata, et al. (2026)\u003c/b\u003e \u003ci\u003eCan Large Language Models Understand, Reason About, and Generate Code-Switched Text?\u003c/i\u003e. 
Arxiv \u003ca href=\"https://arxiv.org/pdf/2601.07153\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/gentaiscool/codemixqa\"\u003e[Code]\u003c/a\u003e \u003ca href=\"https://huggingface.co/datasets/gentaiscool/codemixqa\"\u003e[Dataset]\u003c/a\u003e\n- \u003cb\u003eKuwanto, et al. (2024)\u003c/b\u003e \u003ci\u003eLinguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models\u003c/i\u003e. Arxiv \u003ca href=\"\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/gkuwanto/ezswitch\"\u003e[Code]\u003c/a\u003e \u003ca href=\"https://huggingface.co/datasets/garrykuwanto/cspref\"\u003e[Dataset]\u003c/a\u003e\n- \u003cb\u003eRuochen Zhang and Carsten Eickhoff (2024)\u003c/b\u003e \u003ci\u003eCroCosum: A Benchmark Dataset for Cross-Lingual Code-switched Summarization\u003c/i\u003e. LREC \u003ca href=\"https://aclanthology.org/2024.lrec-main.367.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/RosenZhang/CroCoSum\"\u003e[Dataset]\u003c/a\u003e\n- \u003cb\u003eWhitehouse, et al. (2022)\u003c/b\u003e \u003ci\u003eEntityCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching\u003c/i\u003e. EMNLP \u003ca href=\"https://arxiv.org/pdf/2210.12540.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/huawei-noah/noah-research/tree/master/NLP\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eLovenia, et al. (2022)\u003c/b\u003e \u003ci\u003eASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation\u003c/i\u003e. LREC \u003ca href=\"https://arxiv.org/pdf/2112.06223.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://huggingface.co/datasets/CAiRE/ASCEND\"\u003e[Dataset]\u003c/a\u003e\n- \u003cb\u003eNguyen, et al. (2020)\u003c/b\u003e \u003ci\u003eCanVEC-the Canberra Vietnamese-English Code-switching Natural Speech Corpus\u003c/i\u003e. 
LREC \u003ca href=\"https://aclanthology.org/2020.lrec-1.507.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eUmapathy, et al. (2020)\u003c/b\u003e \u003ci\u003eInvestigating Modelling Techniques for Natural Language Inference on Code-Switched Dialogues in Bollywood Movies\u003c/i\u003e. First Workshop on Speech Technologies for Code-switching in Multilingual Communities, Interspeech 2020 \u003ca href=\"https://aka.ms/CodeMixedNLI\"\u003e[Dataset]\u003c/a\u003e\n- \u003cb\u003eXiang, et al. (2020)\u003c/b\u003e \u003ci\u003eSina Mandarin Alphabetical Words: A Web-driven Code-mixing Lexical Resource\u003c/i\u003e. AACL-IJCNLP \u003ca href=\"\"\u003e[TBC]\u003c/a\u003e\n- \u003cb\u003eChakravarthi, et al. (2020)\u003c/b\u003e \u003ci\u003eCorpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text\u003c/i\u003e. Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), LREC \u003ca href=\"https://arxiv.org/pdf/2006.00206.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eKhanuja, et al. (2020)\u003c/b\u003e \u003ci\u003eA New Dataset for Natural Language Inference from Code-mixed Conversations\u003c/i\u003e. 4th Workshop of Computational Approaches to Linguistic Code-switching, LREC \u003ca href=\"https://arxiv.org/abs/2004.05051\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eBarik, et al. (2019)\u003c/b\u003e \u003ci\u003eNormalization of Indonesian-English Code-Mixed Twitter Data\u003c/i\u003e. W-NUT, EMNLP \u003ca href=\"https://www.aclweb.org/anthology/D19-5554.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/seelenbrecher/code-mixed-normalization\"\u003e[Dataset]\u003c/a\u003e\n- \u003cb\u003eSingh, et al. (2018)\u003c/b\u003e \u003ci\u003eA Twitter Corpus for Hindi-English Code Mixed POS Tagging\u003c/i\u003e. 
Sixth International Workshop on Natural Language Processing for Social Media, ACL \u003ca href=\"http://aclweb.org/anthology/W18-3503\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eLi, et al. (2012)\u003c/b\u003e \u003ci\u003eA Mandarin-English Code-Switching Corpus\u003c/i\u003e. LREC \u003ca href=\"http://www.lrec-conf.org/proceedings/lrec2012/pdf/964_Paper.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eLyu, et al. (2010)\u003c/b\u003e \u003ci\u003eSEAME: A Mandarin-English Code-Switching Speech Corpus in South-East Asia\u003c/i\u003e. Interspeech \u003ca href=\"https://pdfs.semanticscholar.org/de83/7c40f54125ce9c612c143ebc6c9ca5e84b13.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eLyu, et al. (2010)\u003c/b\u003e \u003ci\u003eAn Analysis of a Mandarin-English Code-switching Speech Corpus: SEAME\u003c/i\u003e. Age \u003ca href=\"https://www.researchgate.net/profile/Tien_Ping_Tan/publication/266890986_An_Analysis_of_a_Mandarin-English_Code-switching_Speech_Corpus_SEAME/links/54cb12f80cf2517b7560ffbb.pdf\"\u003e[Paper]\u003c/a\u003e\n\n\n### Language Modeling and Speech Recognition\n- \u003cb\u003eYu, et al. (2023)\u003c/b\u003e \u003ci\u003eCode-Switching Text Generation and Injection in Mandarin-English ASR\u003c/i\u003e. ICASSP \u003ca href=\"https://arxiv.org/pdf/2303.10949\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eTolúlopé, et al. (2023)\u003c/b\u003e \u003ci\u003eMultilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching\u003c/i\u003e. Sixth Workshop on Computational Approaches to Linguistic Code-Switching, EMNLP \u003ca href=\"https://openreview.net/forum?id=mtrmzEoSRk\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eKumar, et al. (2020)\u003c/b\u003e \u003ci\u003eMachine Learning based Language Modelling of Code Switched Data\u003c/i\u003e. 
International Conference on Electronics and Sustainable Communication Systems (ICESC) \u003ca href=\"https://ieeexplore.ieee.org/abstract/document/9155695\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eMadhumani, et al. (2020)\u003c/b\u003e \u003ci\u003eLearning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition\u003c/i\u003e. Arxiv \u003ca href=\"https://arxiv.org/pdf/2006.05257.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eShah, et al. (2020)\u003c/b\u003e \u003ci\u003eLearning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition\u003c/i\u003e. Arxiv \u003ca href=\"https://arxiv.org/pdf/2006.00782.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2020)\u003c/b\u003e \u003ci\u003eMeta-Transfer Learning for Code-Switched Speech Recognition\u003c/i\u003e. ACL \u003ca href=\"https://arxiv.org/pdf/2004.14228.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/audioku/meta-transfer-learning\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eChandu, et al. (2020)\u003c/b\u003e \u003ci\u003eStyle Variation as a Vantage Point for Code-Switching\u003c/i\u003e. Arxiv \u003ca href=\"https://arxiv.org/pdf/2005.00458.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eGanji Sreeram and Rohit Sinha (2020)\u003c/b\u003e \u003ci\u003eExploration of End-to-End Framework for Code-Switching Speech Recognition Task: Challenges and Enhancements\u003c/i\u003e. IEEE Access \u003ca href=\"https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9058687\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2019)\u003c/b\u003e \u003ci\u003eCode-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences\u003c/i\u003e. 
CoNLL \u003ca href=\"https://www.aclweb.org/anthology/K19-1026.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eHila Gonen and Yoav Goldberg (2019)\u003c/b\u003e \u003ci\u003eLanguage Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training\u003c/i\u003e. EMNLP \u003ca href=\"https://www.aclweb.org/anthology/D19-1427.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eLee, et al. (2019)\u003c/b\u003e \u003ci\u003eLinguistically Motivated Parallel Data Augmentation for Code-switch Language Modeling\u003c/i\u003e. Interspeech \u003ca href=\"https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1382.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eVictor Soto and Julia Hirschberg (2019)\u003c/b\u003e \u003ci\u003eImproving Code-Switched Language Modeling Performance Using Cognate Features\u003c/i\u003e. Interspeech \u003ca href=\"https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2681.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eChang, et al. (2019)\u003c/b\u003e \u003ci\u003eCode-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation\u003c/i\u003e. Interspeech \u003ca href=\"https://www.isca-speech.org/archive/Interspeech_2019/pdfs/3214.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eZeng, et al. (2019)\u003c/b\u003e \u003ci\u003eOn the End-to-End Solution to Mandarin-English Code-switching Speech Recognition\u003c/i\u003e. Interspeech \u003ca href=\"https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1429.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eTaneja, et al. (2019)\u003c/b\u003e \u003ci\u003eExploiting Monolingual Speech Corpora for Code-mixed Speech Recognition\u003c/i\u003e. Interspeech \u003ca href=\"https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1959.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eShan, et al. 
(2019)\u003c/b\u003e \u003ci\u003eInvestigating End-to-end Speech Recognition for Mandarin-English Code-switching\u003c/i\u003e. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) \u003ca href=\"http://lxie.nwpu-aslp.org/papers/2019ICASSP-ChanghaoShan-CS.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eGrandee Lee and Haizhou Li. (2019)\u003c/b\u003e \u003ci\u003eWord and Class Common Space Embedding for Code-switch Language Modelling\u003c/i\u003e. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) \u003ca href=\"https://www.researchgate.net/profile/Grandee_Lee/publication/331122308_Word_and_Class_Common_Space_Embedding_for_Code-switch_Language_Modelling/links/5c66b31a45851582c3eadf09/Word-and-Class-Common-Space-Embedding-for-Code-switch-Language-Modelling.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eHamed, et al. (2019)\u003c/b\u003e \u003ci\u003eCode-Switching Language Modeling with Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English\u003c/i\u003e. International Conference on Speech and Computer \u003ca href=\"https://link.springer.com/chapter/10.1007/978-3-030-26061-3_17\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2018)\u003c/b\u003e \u003ci\u003eLearn to Code-Switch: Data Augmentation using Copy Mechanism on Language Modeling\u003c/i\u003e. Arxiv \u003ca href=\"https://arxiv.org/pdf/1810.10254.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2018)\u003c/b\u003e \u003ci\u003eTowards End-to-end Automatic Code-Switching Speech Recognition\u003c/i\u003e. Arxiv \u003ca href=\"https://arxiv.org/pdf/1810.12620.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eNakayama, et al. (2018)\u003c/b\u003e \u003ci\u003eSpeech Chain for Semi-Supervised Learning of Japanese-English Code-Switching ASR and TTS\u003c/i\u003e. 
IEEE Spoken Language Technology Workshop (SLT) \u003ca href=\"https://ieeexplore.ieee.org/iel7/8632666/8639030/08639674.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eJesse Emond, Bhuwana Ramabhadran, Brian Roark, Pedro Moreno, and Min Ma. (2018)\u003c/b\u003e \u003ci\u003eTransliteration Based Approaches to Improve Code-Switched Speech Recognition Performance\u003c/i\u003e. IEEE Spoken Language Technology Workshop (SLT) \u003ca href=\"https://ieeexplore.ieee.org/iel7/8632666/8639030/08639699.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eGanji Sreeram and Rohit Sinha. (2018)\u003c/b\u003e \u003ci\u003eExploiting Parts-of-Speech for Improved Textual Modeling of Code-Switching Data\u003c/i\u003e. 2018 Twenty Fourth National Conference on Communications (NCC) \u003ca href=\"https://ieeexplore.ieee.org/abstract/document/8600097\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eGarg, et al. (2018)\u003c/b\u003e \u003ci\u003eCode-switched Language Models Using Dual RNNs and Same-Source Pretraining\u003c/i\u003e. EMNLP \u003ca href=\"http://aclweb.org/anthology/D18-1346\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eEwald van der Westhuizen and Thomas R. Niesler. (2018)\u003c/b\u003e \u003ci\u003eSynthesised bigrams using word embeddings for code-switched ASR of four South African language pairs\u003c/i\u003e. Computer Speech and Language \u003ca href=\"https://www.sciencedirect.com/science/article/pii/S0885230818301815\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eBiswas, et al. (2018)\u003c/b\u003e \u003ci\u003eMultilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech\u003c/i\u003e. 
Interspeech \u003ca href=\"https://www.researchgate.net/profile/Emre_Yilmaz33/publication/325571050_Multilingual_Neural_Network_Acoustic_Modelling_for_ASR_of_Under-Resourced_English-isiZulu_Code-Switched_Speech/links/5b2cdac40f7e9b0df5baf271/Multilingual-Neural-Network-Acoustic-Modelling-for-ASR-of-Under-Resourced-English-isiZulu-Code-Switched-Speech.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2018)\u003c/b\u003e \u003ci\u003eCode-Switching Language Modeling using Syntax-Aware Multi-Task Learning\u003c/i\u003e. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL \u003ca href=\"http://aclweb.org/anthology/W18-3207\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/gentaiscool/multi-task-cs-lm\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eChandu, et al. (2018)\u003c/b\u003e \u003ci\u003eLanguage Informed Modeling of Code-Switched Text\u003c/i\u003e. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL \u003ca href=\"http://www.aclweb.org/anthology/W18-3211\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003ePratapa, et al. (2018)\u003c/b\u003e \u003ci\u003eLanguage Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data\u003c/i\u003e. ACL \u003ca href=\"https://www.microsoft.com/en-us/research/uploads/prod/2018/05/language_modeling_cm.pdf\"\u003e[Paper]\u003c/a\u003e \n- \u003cb\u003eSivasankaran, et al. (2018)\u003c/b\u003e \u003ci\u003ePhone Merging For Code-Switched Speech Recognition\u003c/i\u003e. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL \u003ca href=\"http://aclweb.org/anthology/W18-3202\"\u003e[Paper]\u003c/a\u003e \n- \u003cb\u003eGarg, et al. (2018)\u003c/b\u003e \u003ci\u003eDual Language Models for Code Switched Speech Recognition\u003c/i\u003e. Interspeech \u003ca href=\"https://arxiv.org/abs/1711.01048\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eBaheti, et al. 
(2017)\u003c/b\u003e \u003ci\u003eCurriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks\u003c/i\u003e. ICON \u003ca href=\"https://www.microsoft.com/en-us/research/uploads/prod/2018/04/icon-2017-curriculum.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eAdel, et al. (2015)\u003c/b\u003e \u003ci\u003eSyntactic and Semantic Features For Code-Switching Factored Language Models\u003c/i\u003e. IEEE Transactions on Audio, Speech, and Language Processing \u003ca href=\"https://arxiv.org/pdf/1710.01809.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eYing Li and Pascale Fung. (2014)\u003c/b\u003e \u003ci\u003eCode switch language modeling with Functional Head Constraint\u003c/i\u003e. ICASSP \u003ca href=\"https://www.semanticscholar.org/paper/Code-switch-language-modeling-with-Functional-Head-Li-Fung/46996cb0e1b6ff7c4bf88b6b200327a1a19cd946\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eYing Li and Pascale Fung. (2014)\u003c/b\u003e \u003ci\u003eLanguage Modeling with Functional Head Constraint for Code Switching Speech Recognition\u003c/i\u003e. EMNLP \u003ca href=\"http://www.aclweb.org/anthology/D14-1098\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eAdel, et al. (2013)\u003c/b\u003e \u003ci\u003eCombination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling\u003c/i\u003e. ACL \u003ca href=\"http://www.aclweb.org/anthology/P13-2037\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eAdel, et al. (2013)\u003c/b\u003e \u003ci\u003eRecurrent neural network language modeling for code switching conversational speech\u003c/i\u003e. ICASSP \u003ca href=\"https://ieeexplore.ieee.org/document/6639306/\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eVu, et al. (2012)\u003c/b\u003e \u003ci\u003eA First Speech Recognition System for Mandarin-English Code-Switch Conversational Speech\u003c/i\u003e. 
ICASSP \u003ca href=\"https://www.csl.uni-bremen.de/cms/images/documents/publications/ICASSP2012-Vu_CodeSwitch.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eYing Li and Pascale Fung. (2012)\u003c/b\u003e \u003ci\u003eCode-switch Language Model with Inversion Constraints for Mixed Language Speech Recognition\u003c/i\u003e. COLING \u003ca href=\"http://www.aclweb.org/anthology/C12-1102\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eLi, et al. (2011)\u003c/b\u003e \u003ci\u003eAsymmetric acoustic modeling of mixed language speech\u003c/i\u003e. ICASSP \u003ca href=\"https://pdfs.semanticscholar.org/1b57/5dbb14901b0cfa668f21a3b188beee4c9582.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Discourse\n- \u003cb\u003eSravani, et al. (2021)\u003c/b\u003e \u003ci\u003ePolitical Discourse Analysis: A Case Study of Code Mixing and Code Switching in Political Speeches\u003c/i\u003e. Proceedings of the 5th Workshop on Computational Approaches to Code Switching (CALCS), NAACL \u003ca href=\"https://www.aclweb.org/anthology/2021.calcs-1.1.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Generation\n- \u003cb\u003eGupta, et al. (2020)\u003c/b\u003e \u003ci\u003eA Semi-supervised Approach to Generate the Code-Mixed Text using Pre-trained Encoder and Transfer Learning\u003c/i\u003e. Findings of EMNLP \u003ca href=\"https://www.aclweb.org/anthology/2020.findings-emnlp.206.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eBryan Gregorius and Takeshi Okadome (2022)\u003c/b\u003e \u003ci\u003eGenerating Code-Switched Text from Monolingual Text with Dependency Tree\u003c/i\u003e. 
The 20th Annual Workshop of the Australasian Language Technology Association \u003ca href=\"https://aclanthology.org/2022.alta-1.12/\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/Selubi/CSify\"\u003e[Code]\u003c/a\u003e\n\n### Speech Synthesis\n- \u003cb\u003eSai Krishna Rallabandi and Alan W Black (2019)\u003c/b\u003e \u003ci\u003eVariational Attention using Articulatory Priors for generating Code Mixed Speech using Monolingual Corpora\u003c/i\u003e. Interspeech \u003ca href=\"https://pdfs.semanticscholar.org/5e74/c4c5688a24248a9bd04aa0474c28bc267ba5.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eSai Krishna Rallabandi and Alan W Black (2017)\u003c/b\u003e \u003ci\u003eOn Building Mixed Lingual Speech Synthesis Systems\u003c/i\u003e. Interspeech \u003ca href=\"https://pdfs.semanticscholar.org/02a2/0ed2182475b40a4e7744aa6555607adffa62.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eChandu, et al. (2017)\u003c/b\u003e \u003ci\u003eSpeech Synthesis for Mixed-Language Navigation Instructions\u003c/i\u003e. Interspeech \u003ca href=\"https://pdfs.semanticscholar.org/99f0/7e194197a55fd017657d4cd1a8d9c349de05.pdf?_ga=2.136822064.183444372.1582035562-2106241630.1557729576\"\u003e[Paper]\u003c/a\u003e\n\n### Metric\n- \u003cb\u003eGuzman, et al. (2017)\u003c/b\u003e \u003ci\u003eMetrics for modeling code-switching across corpora\u003c/i\u003e. Interspeech \u003ca href=\"https://pdfs.semanticscholar.org/25a5/cf5c7dc2269cf67d98b2fb46317a4d16b581.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Representation Learning\n- \u003cb\u003eAdilazuarda, et al. (2022)\u003c/b\u003e \u003ci\u003eIndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages\u003c/i\u003e. 
Proceedings of the First Workshop on Scaling Up Multilingual Evaluation, AACL \u003ca href=\"https://aclanthology.org/2022.sumeval-1.5.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/faridlazuarda/indorobusta\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003ePrasad, et al. (2021)\u003c/b\u003e \u003ci\u003eThe Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding\u003c/i\u003e. Proceedings of the 1st Workshop on Multilingual Representation Learning, EMNLP \u003ca href=\"https://aclanthology.org/2021.mrl-1.16.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2021)\u003c/b\u003e \u003ci\u003eAre Multilingual Models Effective in Code-Switching?\u003c/i\u003e. Proceedings of the 5th Workshop on Computational Approaches to Code Switching (CALCS), NAACL \u003ca href=\"https://www.aclweb.org/anthology/2021.calcs-1.20.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eRizal, et al. (2020)\u003c/b\u003e \u003ci\u003eEvaluating Word Embeddings for Indonesian–English Code-Mixed Text Based on Synthetic Data\u003c/i\u003e. Proceedings of the 4th Workshop on Computational Approaches to Code Switching (CALCS), LREC \u003ca href=\"https://www.aclweb.org/anthology/2020.calcs-1.4/\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2019)\u003c/b\u003e \u003ci\u003eHierarchical Meta-Embeddings for Code-Switching Named Entity Recognition\u003c/i\u003e. EMNLP \u003ca href=\"https://arxiv.org/abs/1909.08504\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/gentaiscool/meta-emb\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003ePratapa, et al. (2018)\u003c/b\u003e \u003ci\u003eWord Embeddings for Code-Mixed Language Processing\u003c/i\u003e. EMNLP \u003ca href=\"http://www.aclweb.org/anthology/D18-1344\"\u003e[Paper]\u003c/a\u003e\n\n### Machine Translation\n- \u003cb\u003ePengpun, et al. 
(2024)\u003c/b\u003e \u003ci\u003eOn Creating an English-Thai Code-switched Machine Translation in Medical Domain\u003c/i\u003e. EMNLP \u003ca href=\"https://arxiv.org/abs/2410.16221\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eGaser, et al. (2023)\u003c/b\u003e \u003ci\u003eExploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text\u003c/i\u003e. EACL \u003ca href=\"https://aclanthology.org/2023.eacl-main.256.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eKuwanto, et al. (2021)\u003c/b\u003e \u003ci\u003eLow-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages\u003c/i\u003e. arXiv \u003ca href=\"https://arxiv.org/pdf/2103.13272\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eVivek Srivastava and Mayank Singh (2020)\u003c/b\u003e \u003ci\u003ePHINC: A Parallel Hinglish Social Media Code-Mixed Corpus for Machine Translation\u003c/i\u003e. W-NUT, EMNLP \u003ca href=\"http://noisy-text.github.io/2020/pdf/2020.d200-1.7.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://zenodo.org/record/3605597#.X5rwWXgzZQI\"\u003e[Dataset]\u003c/a\u003e\n- \u003cb\u003eThoudam Doren Singh and Thamar Solorio (2017)\u003c/b\u003e \u003ci\u003eTowards Translating Mixed-Code Comments from Social Media\u003c/i\u003e. CICLing \u003ca href=\"https://link.springer.com/chapter/10.1007/978-3-319-77116-8_34\"\u003e[Paper]\u003c/a\u003e\n\n### Speech Translation\n- \u003cb\u003eAlastruey, et al. (2023)\u003c/b\u003e \u003ci\u003eTowards Real-World Streaming Speech Translation for Code-Switched Speech\u003c/i\u003e. CALCS, EMNLP \u003ca href=\"https://aclanthology.org/2023.calcs-1.2.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Natural Language Understanding\n- \u003cb\u003eKrishnan, et al. (2021)\u003c/b\u003e \u003ci\u003eMultilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling\u003c/i\u003e. 
MRL, EMNLP \u003ca href=\"https://aclanthology.org/2021.mrl-1.18.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Named Entity Recognition\n- \u003cb\u003ePriyadharshini, et al. (2020)\u003c/b\u003e \u003ci\u003eNamed Entity Recognition for Code-Mixed Indian Corpus using Meta Embedding\u003c/i\u003e. 6th International Conference on Advanced Computing and Communication Systems (ICACCS) \u003ca href=\"https://ieeexplore.ieee.org/abstract/document/9074379\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2019)\u003c/b\u003e \u003ci\u003eLearning Multilingual Meta-Embeddings for Code-Switching Named Entity Recognition\u003c/i\u003e. RepL4NLP, ACL \u003ca href=\"https://www.aclweb.org/anthology/W19-4320\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/gentaiscool/meta-emb\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eAguilar, et al. (2018)\u003c/b\u003e \u003ci\u003eNamed Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task\u003c/i\u003e. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL \u003ca href=\"http://www.aclweb.org/anthology/W18-3219\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWang, et al. (2018)\u003c/b\u003e \u003ci\u003eCode-Switched Named Entity Recognition with Embedding Attention\u003c/i\u003e. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL \u003ca href=\"http://www.aclweb.org/anthology/W18-3221\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWinata, et al. (2018)\u003c/b\u003e \u003ci\u003eBilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition\u003c/i\u003e. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL \u003ca href=\"http://aclweb.org/anthology/W18-3214\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eAguilar, et al. (2017)\u003c/b\u003e \u003ci\u003eA Multi-task Approach for Named Entity Recognition in Social Media Data\u003c/i\u003e. 
3rd Workshop on Noisy User-generated Text, EMNLP \u003ca href=\"http://www.aclweb.org/anthology/W17-4419\"\u003e[Paper]\u003c/a\u003e\n\n### Linguistics\n- \u003cb\u003eLi Nguyen (2018)\u003c/b\u003e \u003ci\u003eBorrowing or Code-switching? Traces of community norms in Vietnamese-English speech.\u003c/i\u003e Australian Journal of Linguistics 38.4 (2018): 443-466. \u003ca href=\"https://www.tandfonline.com/doi/abs/10.1080/07268602.2018.1510727\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eFairchild, Sarah, and Janet G. Van Hell. (2017)\u003c/b\u003e \u003ci\u003eDeterminer-noun code-switching in Spanish heritage speakers.\u003c/i\u003e Bilingualism: Language and Cognition 20.1 (2017): 150-161. \u003ca href=\"https://www.researchgate.net/profile/Janet_Van_Hell/publication/282895015_Determiner-noun_code-switching_in_Spanish_heritage_speakers/links/5891e94ba6fdcc1b41469634/Determiner-noun-code-switching-in-Spanish-heritage-speakers.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eBhatt, Rakesh M., and Agnes Bolonyai. (2011)\u003c/b\u003e \u003ci\u003eCode-switching and the optimal grammar of bilingual language use.\u003c/i\u003e Bilingualism: Language and Cognition 14.4 (2011): 522-546. \u003ca href=\"https://s3.amazonaws.com/academia.edu.documents/38571919/BLC-Bhatt-Bolonyai.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A\u0026Expires=1540209279\u0026Signature=srQ%2B9cKb1LdK4qgUtuGJ1zG3Wa4%3D\u0026response-content-disposition=inline%3B%20filename%3DCode-switching_and_the_optimal_grammar_o.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eLipski (2005)\u003c/b\u003e \u003ci\u003eCode-switching or Borrowing? No sé so no puedo decir, you know.\u003c/i\u003e Second Workshop on Spanish Sociolinguistics \u003ca href=\"http://commonweb.unifr.ch/artsdean/pub/gestens/f/as/files/4740/21370_065330.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eRoberto R. 
Heredia and Jeanette Altarriba (2001)\u003c/b\u003e \u003ci\u003eBilingual Language Mixing: Why Do Bilinguals Code-Switch?\u003c/i\u003e Current Directions in Psychological Science, SAGE Publications \u003ca href=\"https://journals.sagepub.com/doi/pdf/10.1111/1467-8721.00140\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eBelazi, et al. (1994)\u003c/b\u003e \u003ci\u003eCode switching and X-bar theory: The functional head constraint\u003c/i\u003e. Linguistic Inquiry 25(2) \u003ca href=\"https://www.jstor.org/stable/4178859\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eShana Poplack (1980)\u003c/b\u003e \u003ci\u003eSometimes I’ll start a sentence in Spanish y termino en español: Toward a typology of code-switching\u003c/i\u003e. Linguistics 18(7-8) \u003ca href=\"https://yorkspace.library.yorku.ca/xmlui/bitstream/handle/10315/2506/CRLC00161.pdf?sequence=1\u0026isAllowed=y\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003ePfaff, Carol W. (1979)\u003c/b\u003e \u003ci\u003eConstraints on language mixing: intrasentential code-switching and borrowing in Spanish/English.\u003c/i\u003e Language: 291-318. \u003ca href=\"https://www.jstor.org/stable/pdf/412586.pdf?casa_token=_ghSnFiA7q4AAAAA:TR2oFkeipuhqYca38iK-55yaQ2vMiJG47mdMkDHw3QMdOq1TN935OkaeI5i06KgXnGg8tPjQwXnOLlA8sMdL2VC6kGSrsE2bX2tONqcwhWI2aWcs8kBg\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eShana Poplack (1978)\u003c/b\u003e \u003ci\u003eSyntactic structure and social function of code-switching\u003c/i\u003e. Vol. 2. Centro de Estudios Puertorriqueños, City University of New York \u003ca href=\"https://www.researchgate.net/publication/317039669_Syntactic_structure_and_social_function_of_code-switching\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eGumperz, J. J., \u0026 Hernandez, E. (1969)\u003c/b\u003e \u003ci\u003eCognitive aspects of bilingual communication\u003c/i\u003e. 
Institute of International Studies, University of California \u003ca href=\"https://files.eric.ed.gov/fulltext/ED138103.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Affective Computing\n- \u003cb\u003eChakravarthi, et al. (2021)\u003c/b\u003e \u003ci\u003eDravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text\u003c/i\u003e. arXiv \u003ca href=\"https://arxiv.org/pdf/2106.09460.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/bharathichezhiyan/DravidianCodeMix-Dataset\"\u003e[Code and Dataset]\u003c/a\u003e\n- \u003cb\u003eSiddharth Yadav (2020)\u003c/b\u003e \u003ci\u003eUnsupervised Sentiment Analysis for Code-mixed Data\u003c/i\u003e. arXiv \u003ca href=\"https://arxiv.org/pdf/2001.11384.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/sedflix/unsacmt\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eWang, et al. (2017)\u003c/b\u003e \u003ci\u003eEmotion Analysis in Code-Switching Text With Joint Factor Graph Model\u003c/i\u003e. IEEE/ACM Transactions on Audio, Speech, and Language Processing \u003ca href=\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=7776833\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWang, et al. (2016)\u003c/b\u003e \u003ci\u003eA Bilingual Attention Network for Code-switched Emotion Prediction\u003c/i\u003e. COLING \u003ca href=\"https://www.aclweb.org/anthology/C16-1153.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eSophia Lee and Zhongqing Wang (2015)\u003c/b\u003e \u003ci\u003eEmotion in Code-switching Texts: Corpus Construction and Analysis\u003c/i\u003e. Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing \u003ca href=\"https://www.aclweb.org/anthology/W15-3116.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eWang, et al. (2015)\u003c/b\u003e \u003ci\u003eEmotion Detection in Code-switching Texts via Bilingual and Sentimental Information\u003c/i\u003e. 
ACL \u003ca href=\"https://www.aclweb.org/anthology/P15-2125.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Dialog and Conversational System\n- \u003cb\u003eGupta, et al. (2018)\u003c/b\u003e \u003ci\u003eUncovering Code-Mixed Challenges: A Framework for Linguistically Driven Question Generation and Neural based Question Answering\u003c/i\u003e. CoNLL \u003ca href=\"http://www.aclweb.org/anthology/K18-1012\"\u003e[Paper]\u003c/a\u003e\n\n### Syntax\n- \u003cb\u003eIgor Sterner and Simone Teufel (2025)\u003c/b\u003e \u003ci\u003eCode-Switching and Syntax: A Large-Scale Experiment\u003c/i\u003e. Findings of ACL \u003ca href=\"https://aclanthology.org/2025.findings-acl.600.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/igorsterner/csntax-gnn\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eKodali, et al. (2022)\u003c/b\u003e \u003ci\u003eSyMCoM - Syntactic Measure of Code Mixing A Study Of English-Hindi Code-Mixing\u003c/i\u003e. Findings of ACL \u003ca href=\"https://aclanthology.org/2022.findings-acl.40.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eÖzlem Çetinoğlu and Çağrı Çöltekin (2019)\u003c/b\u003e \u003ci\u003eChallenges of Annotating a Code-Switching Treebank\u003c/i\u003e. SyntaxFest \u003ca href=\"https://syntaxfest.github.io/syntaxfest19/proceedings/papers/paper_83.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Adversarial Attack\n- \u003cb\u003eSamson Tan and Shafiq Joty (2021)\u003c/b\u003e \u003ci\u003eCode-Mixing on Sesame Street: Dawn of the Adversarial Polyglots\u003c/i\u003e. 
NAACL \u003ca href=\"https://arxiv.org/pdf/2103.09593.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Social Linguistics\n- \u003cb\u003eEl Bolock, et al. (2020)\u003c/b\u003e \u003ci\u003eWho, When and Why: The 3 Ws of Code-Switching\u003c/i\u003e. International Conference on Practical Applications of Agents and Multi-Agent Systems \u003ca href=\"https://www.researchgate.net/profile/Alia_El_Bolock/publication/342705747_Who_When_and_Why_The_3_Ws_of_Code-Switching/links/5f0a659892851c52d62cfd13/Who-When-and-Why-The-3-Ws-of-Code-Switching.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eYoder, et al. (2017)\u003c/b\u003e \u003ci\u003eCode-Switching as a Social Act: The Case of Arabic Wikipedia Talk Pages\u003c/i\u003e. Proceedings of the Second Workshop on Natural Language Processing and Computational Social Science, ACL \u003ca href=\"https://www.aclweb.org/anthology/W17-2911\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eAgarwal, et al. (2017)\u003c/b\u003e \u003ci\u003eI may talk in English but gaali toh Hindi mein hi denge: A study of English-Hindi code-switching and swearing pattern on social networks\u003c/i\u003e. International Conference on Communication Systems and Networks (COMSNETS) \u003ca href=\"https://ieeexplore.ieee.org/abstract/document/7945452\"\u003e[Paper]\u003c/a\u003e\n\n### Benchmark\n- \u003cb\u003eKhanuja, et al. (2020)\u003c/b\u003e \u003ci\u003eGLUECoS: An Evaluation Benchmark for Code-Switched NLP\u003c/i\u003e. ACL \u003ca href=\"https://arxiv.org/pdf/2004.12376.pdf\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eAguilar, et al. (2020)\u003c/b\u003e \u003ci\u003eLinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation\u003c/i\u003e. LREC \u003ca href=\"https://www.aclweb.org/anthology/2020.lrec-1.223.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Social Media\n- \u003cb\u003eBali, et al. (2014)\u003c/b\u003e \u003ci\u003e“I am borrowing ya mixing ?” An Analysis of English-Hindi Code Mixing in Facebook\u003c/i\u003e. 
Proceedings of The First Workshop on Computational Approaches to Code Switching \u003ca href=\"https://www.aclweb.org/anthology/W14-3914.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Text Normalization\n- \u003cb\u003eDwija Parikh and Thamar Solorio (2021)\u003c/b\u003e \u003ci\u003eNormalization and Back-Transliteration for Code-Switched Data\u003c/i\u003e. Proceedings of the 5th Workshop on Computational Approaches to Code Switching (CALCS), NAACL \u003ca href=\"https://www.aclweb.org/anthology/2021.calcs-1.15.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Toolkit\n\n#### Sentence Segmentation\n- \u003cb\u003eFrohmann, et al. (2024)\u003c/b\u003e \u003ci\u003eSegment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation\u003c/i\u003e. EMNLP \u003ca href=\"https://aclanthology.org/2024.emnlp-main.665.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/segment-any-text/wtpsplit\"\u003e[Code]\u003c/a\u003e\n\n#### Synthetic Data Generation Toolkit\n- \u003cb\u003eJayanthi, et al. (2021)\u003c/b\u003e \u003ci\u003eCodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing\u003c/i\u003e. Proceedings of the 5th Workshop on Computational Approaches to Code Switching (CALCS), NAACL \u003ca href=\"https://www.aclweb.org/anthology/2021.calcs-1.14.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/murali1996/CodemixedNLP\"\u003e[Code]\u003c/a\u003e\n- \u003cb\u003eRizvi, et al. (2021)\u003c/b\u003e \u003ci\u003eGCM: A Toolkit for Generating Synthetic Code-mixed Text\u003c/i\u003e. EACL (System Demonstrations) \u003ca href=\"https://www.aclweb.org/anthology/2021.eacl-demos.24.pdf\"\u003e[Paper]\u003c/a\u003e \u003ca href=\"https://github.com/microsoft/CodeMixed-Text-Generator\"\u003e[Code]\u003c/a\u003e\n\n#### Annotation Toolkit\n- \u003cb\u003eShah, et al. (2019)\u003c/b\u003e \u003ci\u003eCoSSAT: Code-Switched Speech Annotation Tool\u003c/i\u003e. 
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP \u003ca href=\"https://www.aclweb.org/anthology/D19-5907.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Summarization\n- \u003cb\u003eMehnaz, et al. (2021)\u003c/b\u003e \u003ci\u003eGupShup: Summarizing Open-Domain Code-Switched Conversations\u003c/i\u003e. EMNLP \u003ca href=\"https://aclanthology.org/2021.emnlp-main.499.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Question Answering\n- \u003cb\u003eGupta, et al. (2020)\u003c/b\u003e \u003ci\u003eA Unified Framework for Multilingual and Code-Mixed Visual Question Answering\u003c/i\u003e. AACL-IJCNLP \u003ca href=\"\"\u003e[TBA]\u003c/a\u003e\n\n### Dialog and Conversational System\n- \u003cb\u003eBawa, et al. (2020)\u003c/b\u003e \u003ci\u003eDo Multilingual Users Prefer Chat-bots that Code-mix? Let's Nudge and Find Out!\u003c/i\u003e. Proceedings of the ACM on Human-Computer Interaction \u003ca href=\"https://dl.acm.org/doi/pdf/10.1145/3392846\"\u003e[Paper]\u003c/a\u003e\n- \u003cb\u003eBanerjee, et al. (2018)\u003c/b\u003e \u003ci\u003eA Dataset for Building Code-Mixed Goal Oriented Conversation Systems\u003c/i\u003e. COLING \u003ca href=\"https://arxiv.org/pdf/1806.05997.pdf\"\u003e[Paper]\u003c/a\u003e\n\n### Position Paper\n- \u003cb\u003eNguyen, et al. (2022)\u003c/b\u003e \u003ci\u003eBuilding Educational Technologies for Code-Switching: Current Practices, Difficulties and Future Directions\u003c/i\u003e. Languages \u003ca href=\"https://www.mdpi.com/2226-471X/7/3/220/pdf?version=1660898944\"\u003e[Paper]\u003c/a\u003e\n\n## Books\n- \u003cb\u003eTorres Cacoullos and Travis (2018)\u003c/b\u003e \u003ci\u003eBilingualism in the Community\u003c/i\u003e. Cambridge University Press\n\n## Theses\n- \u003cb\u003eGenta Indra Winata (2021)\u003c/b\u003e \u003ci\u003eMultilingual Transfer Learning for Code-Switched Language and Speech Neural Modeling\u003c/i\u003e. 
\u003ca href=\"https://arxiv.org/pdf/2104.06268.pdf\"\u003e[Thesis]\u003c/a\u003e\n- \u003cb\u003eGustavo Aguilar (2020)\u003c/b\u003e \u003ci\u003eNeural Sequence Labeling on Social Media Text\u003c/i\u003e. \u003ca href=\"https://uh-ir.tdl.org/bitstream/handle/10657/7726/AGUILAR-DISSERTATION-2020.pdf?sequence=1\u0026isAllowed=y\"\u003e[Thesis]\u003c/a\u003e\n- \u003cb\u003eVictor Soto Martinez (2020)\u003c/b\u003e \u003ci\u003eIdentifying and Modeling Code-Switched Language\u003c/i\u003e. \u003ca href=\"http://www.cs.columbia.edu/speech/ThesisFiles/victor_soto.pdf\"\u003e[Thesis]\u003c/a\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgentaiscool%2Fcode-switching-papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgentaiscool%2Fcode-switching-papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgentaiscool%2Fcode-switching-papers/lists"}