{"id":21177028,"url":"https://github.com/bat67/deep-marl-papers","last_synced_at":"2026-02-23T09:32:09.110Z","repository":{"id":152503243,"uuid":"287159067","full_name":"bat67/deep-MARL-papers","owner":"bat67","description":"[WIP✏] Paper list of deep multi-agent reinforcement learning (deep MARL)","archived":false,"fork":false,"pushed_at":"2020-08-16T14:47:49.000Z","size":332,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-14T18:37:25.649Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bat67.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-13T02:12:16.000Z","updated_at":"2023-11-25T10:29:56.000Z","dependencies_parsed_at":"2023-04-25T16:46:56.146Z","dependency_job_id":null,"html_url":"https://github.com/bat67/deep-MARL-papers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bat67/deep-MARL-papers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bat67%2Fdeep-MARL-papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bat67%2Fdeep-MARL-papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bat67%2Fdeep-MARL-papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bat67%2Fdeep-MARL-papers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bat67","download_url
":"https://codeload.github.com/bat67/deep-MARL-papers/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bat67%2Fdeep-MARL-papers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271752125,"owners_count":24814750,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-20T17:14:20.600Z","updated_at":"2025-10-31T00:51:35.543Z","avatar_url":"https://github.com/bat67.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deep Multi-Agent Reinforcement Learning (MARL) Papers\n\n![overview](overview.png)\n\n\n## Centralized Training, Decentralized Execution (CTDE)\n\nValue-Decomposition Networks For Cooperative Multi-Agent Learning\nhttps://arxiv.org/abs/1706.05296\n\nQMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning\nhttps://arxiv.org/abs/1803.11485\n\nQTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning\nhttps://arxiv.org/abs/1905.05408\n\nMulti-Agent Actor-Critic for Mixed Cooperative-Competitive Environments\nhttps://papers.nips.cc/paper/7217-multi-agent-actor-critic-for-mixed-cooperative-competitive-environments\n\nCounterfactual Multi-Agent Policy 
Gradients\nhttps://arxiv.org/abs/1705.08926\n\nActor-Attention-Critic for Multi-Agent Reinforcement Learning\nhttps://arxiv.org/abs/1810.02912\n\nQPLEX: Duplex Dueling Multi-Agent Q-Learning\nhttps://arxiv.org/abs/2008.01062\n\nDOP: Off-Policy Multi-Agent Decomposed Policy Gradients\nhttps://arxiv.org/abs/2007.12322v1\n\n\n\n## Coordination\n\nHysteretic Q-learning : an algorithm for Decentralized Reinforcement Learning in Cooperative Multi-Agent Teams\nhttps://ieeexplore.ieee.org/document/4399095\n\nDeep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability\nhttps://arxiv.org/abs/1703.06182\n\nLenient Learning in Independent-Learner Stochastic Cooperative Games\nhttps://jmlr.org/papers/v17/15-417.html\n\nLenient Multi-Agent Deep Reinforcement Learning\nhttps://arxiv.org/abs/1707.04402\n\nExplicitly Coordinated Policy Iteration\nhttps://www.ijcai.org/Proceedings/2019/51\n\n\n\n## Learning to Communicate\n\nLearning to Communicate with Deep Multi-Agent Reinforcement Learning\nhttps://arxiv.org/abs/1605.06676\n\nLearning Multiagent Communication with Backpropagation\nhttps://arxiv.org/abs/1605.07736\n\nMultiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games\nhttps://arxiv.org/abs/1703.10069\n\nACCNet: Actor-Coordinator-Critic Net for \"Learning-to-Communicate\" with Deep Multi-agent Reinforcement Learning\nhttps://arxiv.org/abs/1706.03235\n\nLearning Attentional Communication for Multi-Agent Cooperation\nhttps://arxiv.org/abs/1805.07733\n\nLearning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks\nhttps://arxiv.org/abs/1812.09755\n\nTarMAC: Targeted Multi-Agent Communication\nhttps://arxiv.org/abs/1810.11187\n\nLearning Nearly Decomposable Value Functions Via Communication Minimization\nhttps://arxiv.org/abs/1910.05366\n\nLearning to Schedule Communication in Multi-agent Reinforcement Learning\nhttps://arxiv.org/abs/1902.01554\n\nSocial 
Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning\nhttps://arxiv.org/abs/1810.08647\n\nInfoBot: Transfer and Exploration via the Information Bottleneck\nhttps://arxiv.org/abs/1901.10902\n\n\n\n## Neural Network Design\n\nDeep reinforcement learning with relational inductive biases\nhttps://openreview.net/forum?id=HkxaFoC9KQ\n\nAction Semantics Network: Considering the Effects of Actions in Multiagent Systems\nhttps://arxiv.org/abs/1907.11461\n\nMulti-Agent Game Abstraction via Graph Attention Neural Network\nhttps://arxiv.org/abs/1911.10715\n\nFrom Few to More: Large-scale Dynamic Multiagent Curriculum Learning\nhttps://arxiv.org/abs/1909.02790\n\n\n\n## Opponent Exploration\n\nOpponent Modeling in Deep Reinforcement Learning\nhttps://arxiv.org/abs/1609.05559\n\nEfficiently detecting switches against non-stationary opponents\nhttp://ifaamas.org/Proceedings/aamas2017/pdfs/p920.pdf\n\nAn exploration strategy for non-stationary opponents\nhttps://link.springer.com/article/10.1007/s10458-016-9347-3?shared-article-renderer\n\nA Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents\nhttps://papers.nips.cc/paper/7374-a-deep-bayesian-policy-reuse-approach-against-non-stationary-agents\n\nAutonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems\nhttps://arxiv.org/abs/1709.08071\n\nA Deep Policy Inference Q-Network for Multi-Agent Systems\nhttps://arxiv.org/abs/1712.07893\n\nLearning with Opponent-Learning Awareness\nhttps://arxiv.org/abs/1709.04326\n\nBayes-ToMoP: A Fast Detection and Best Response Algorithm Towards Sophisticated Opponents\nhttp://www.ifaamas.org/Proceedings/aamas2019/pdfs/p2282.pdf\n\nTowards Efficient Detection and Optimal Response against Sophisticated Opponents\nhttps://arxiv.org/abs/1809.04240\n\n\n\n## Multi-Agent Exploration\n\nInfluence-Based Multi-Agent Exploration\nhttps://arxiv.org/abs/1910.05512\n\nCoordinated Exploration in Concurrent Reinforcement 
Learning\nhttps://arxiv.org/abs/1802.01282\n\nScalable Coordinated Exploration in Concurrent Reinforcement Learning\nhttps://arxiv.org/abs/1805.08948\n\nLearning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems\nhttp://proceedings.mlr.press/v80/bargiacchi18a.html\n\nCoordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning\nhttps://arxiv.org/abs/1905.12127\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbat67%2Fdeep-marl-papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbat67%2Fdeep-marl-papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbat67%2Fdeep-marl-papers/lists"}