# awesome-audio-visual-question-answering
A curated list of resources on audio-visual question answering and related areas. :-)
https://github.com/swarupbehera/awesome-audio-visual-question-answering
## Dataset

## Papers
### 2024
- Boosting Audio Visual Question Answering via Key Semantic-Aware Cues - Guangyao Li et al. (Code available)
- Learning Trimodal Relation for AVQA with Missing Modality - Hong Joo Lee et al. (Code available)
- Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time - Mohamed Elhoseiny et al. (Code not available)
- SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering - Zhe Yang et al. (Code not available)
- Towards Multilingual Audio-Visual Question Answering - Rajesh Sharma et al. (Code available)
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs - Lidong Bing et al. (Code available)
- CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models - Guangzhi Sun et al. (Code not available)
- CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering - Jianqin Yin et al. (Code not available)
- Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering - Pinghui Wang et al. (Code available)
- AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue - Jing Bi et al. (Code not available)
- Answering Diverse Questions via Text Attached with Key Audio-Visual Clues - Xin Liu, Zitong Yu, Qilang Ye (Code available)
- CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios - Xiaochun Cao, Rui Shao, Zitong Yu, Philip Torr, Xinyu Xie, Qilang Ye (Code available)
- Model Composition for Multimodal Large Language Models - Yang Liu, Maosong Sun, Ming Yan, Peng Li, Fei Huang, Ji Zhang, Zheng Fang, Chi Chen, Ziyue Wang, Yiyang Du, Fuwen Luo (Code available)
- M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation - Yanfeng Wang, Hongcheng Liu, Yu Wang, Pingjie Wang (Code not available)
- CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion - Mohit Bansal, Jaehong Yoon, Shoubin Yu (Code available)
- Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model - Qiang Zhu, Jing Huang, Ming Kong, Tian Liang, Luyuan Chen (Code not available)
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension - Zhou Zhao, Yichong Leng, Jin Xu, Jingren Zhou, Qian Yang, Xiaohuan Zhou, Chang Zhou, Yunfei Chu, Ziyue Jiang, Wenrui Liu, YuanJun Lv (Code available)

### 2023
- AQUALLM: Audio Question Answering Data Generation Using Large Language Models - Swarup Ranjan Behera, Praveen Kumar Pokala, Krishna Mohan Injeti, Jaya Sai Kiran Patibandla, Balakrishna Reddy Pailla (Code not available)
- OneLLM: One Framework to Align All Modalities with Language - Dahua Lin, Xiangyu Yue, Peng Gao, Jiaqi Wang, Jiaming Han, Kaipeng Zhang, Yu Qiao, Kaixiong Gong, Yiyuan Zhang (Code available)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks - Zhou Zhao, Li Tang, Jieming Zhu, Yan Xia, Haoyi Duan, Mingze Zhou (Code available)
- CAD -- Contextual Multi-modal Alignment for Dynamic AVQA - Armin Mustafa et al. (Code not available)
- Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks - Unknown Authors (Code not available)
- VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset - Xinxin Zhu et al. (Code available)
- ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst - Longteng Guo et al. (Code available)
- Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamic Audio-Visual Scenarios - Jianqin Yin et al. (Code available)
- ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities - Peng Wang (Code available)
- Progressive Spatio-temporal Perception for Audio-Visual Question Answering - Guangyao Li et al. (Code available)
- Multi-Scale Attention for Audio Question Answering - Guangyao Li et al. (Code available)
- VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset - Longteng Guo et al. (Code available)

### 2022
- Learning to Answer Questions in Dynamic Audio-Visual Scenarios - Guangyao Li et al., **CVPR**. (Code available)
- Vision Transformers are Parameter-Efficient Audio-Visual Learners - Mohit Bansal et al. (Code available)
- Learning in Audio-visual Context: A Review, Analysis, and New Perspective - Xuelong Li et al. (Code not available)
- PACS: A Dataset for Physical Audiovisual CommonSense Reasoning - Samuel Yu et al. (Code not available)

### 2021
- Pano-AVQA: Grounded Audio-Visual Question Answering on 360 Degree Videos - Heeseung Yun et al. (Code not available)

### 2020
- Hierarchical Conditional Relation Networks for Video Question Answering - Thao Minh Le (Code available)