Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-multi-modal-reinforcement-learning
A curated list of Multi-Modal Reinforcement Learning resources (continually updated)
https://github.com/opendilab/awesome-multi-modal-reinforcement-learning
Last synced: 4 days ago
A JSON representation of this list is available through the ecosyste.ms API.
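A minimal sketch of fetching that JSON representation with Python's `requests` library; the endpoint path and the field names printed below are assumptions about the ecosyste.ms API layout, not taken from this page:

```python
import requests

# Hypothetical endpoint for this list's JSON representation on the
# ecosyste.ms "Awesome" service; the exact path is an assumption.
URL = (
    "https://awesome.ecosyste.ms/api/v1/lists/"
    "opendilab%2Fawesome-multi-modal-reinforcement-learning"
)

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()

# Field names depend on the actual API schema; these are illustrative guesses.
for key in ("name", "description", "url", "last_synced_at"):
    print(f"{key}: {data.get(key)}")
```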
Papers
NeurIPS 2022
ICLR 2023
- VIMA-Bench, [VIMA-Data](https://huggingface.co/datasets/VIMA/VIMA-Data)
- PaLI: A Jointly-Scaled Multilingual Language-Image Model
- VIMA: General Robot Manipulation with Multimodal Prompts
- MuJoCo
- Mind’s Eye: Grounded Language Model Reasoning through Simulation
ICLR 2021
NeurIPS 2023
- HumanML3D, [KIT Motion Database](https://motion-database.humanoids.kit.edu/)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation
- Frequency-Enhanced Data Augmentation for Vision-and-Language Navigation
- Matterport3D
- MotionGPT: Human Motion as a Foreign Language
- Large Language Models are Visual Reasoning Coordinators
- Language Is Not All You Need: Aligning Perception with Language Models
- IQ50
ICML 2022
- VirtualHome
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
- Reinforcement Learning with Action-Free Pre-Training from Videos
- History Compression via Language Models in Reinforcement Learning
- Minigrid
NeurIPS 2021
- Pretraining Representations for Data-Efficient Reinforcement Learning
- SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
- Room-to-Room, [Room-Across-Room](https://github.com/google-research-datasets/RxR)
ICLR 2022
ICLR 2019
NeurIPS 2018
ICML 2019
ICML 2017
CVPR 2022
- End-to-end Generative Pretraining for Multimodal Video Captioning
- HowTo100M
- Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
- ADE20K
- Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
- Masked Visual Pre-training for Motor Control
- Isaac Gym
CoRL 2022
Other
- Learning Generalizable Robotic Reward Functions from “In-The-Wild” Human Videos
- Offline Reinforcement Learning from Images with Latent Space Models
- Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?
- DeepMind Control, [D4RL](https://github.com/Farama-Foundation/D4RL), [Sawyer Door Open](https://github.com/suraj-nair-1/metaworld), [Robel D’Claw Screw](https://github.com/google-research/robel)
- Language Conditioned Imitation Learning over Unstructured Data
ArXiv
- On Time-Indexing as Inductive Bias in Deep RL for Sequential Manipulation Tasks
- Parameterized Decision-making with Multi-modal Perception for Autonomous Driving
- Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API
- Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
- Nonprehensile Planar Manipulation through Reinforcement Learning with Multimodal Categorical Exploration
- Do as I can, not as I get: Topology-aware multi-hop reasoning on multi-modal knowledge graphs
- Multimodal Reinforcement Learning for Robots Collaborating with Humans
- See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction
- Open-vocabulary Queryable Scene Representations for Real World Planning
- SayCan
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
- M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation
- End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning
- SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
ICLR 2024
- DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization
- Revisiting Data Augmentation in Deep Reinforcement Learning
- Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
- Entity-Centric Reinforcement Learning for Object Manipulation from Pixels
- IsaacGym
CVPR 2024
- Vision-and-Language Navigation via Causal Learning
- R2R, [RxR](https://github.com/google-research-datasets/RxR), [RxR-English](https://github.com/google-research-datasets/RxR), SOON
- DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning
ICML 2024
- Atari
- Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
- RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
- Reward Shaping for Reinforcement Learning with An Assistant Reward Agent
- MuJoCo
- FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
- Rich-Observation Reinforcement Learning with Continuous Latent Dynamics
- LLM-Empowered State Representation for Reinforcement Learning
- Code as Reward: Empowering Reinforcement Learning with VLMs
- MiniGrid
Keywords

| Keyword | Count |
| --- | --- |
| deep-learning | 2 |
| mujoco | 2 |
| reinforcement-learning | 2 |
| motion-generation | 1 |
| pytorch-implementation | 1 |
| computer-vision | 1 |
| graph | 1 |
| multi-agent | 1 |
| simulator | 1 |
| unity | 1 |
| gridworld-environment | 1 |
| gym | 1 |
| artificial-intelligence | 1 |
| machine-learning | 1 |
| neural-networks | 1 |
| physics-simulation | 1 |
| 3d-reconstruction | 1 |
| semantic-scene-understanding | 1 |
| physics | 1 |
| robotics | 1 |