# awesome-audio-visual-question-answering
A curated list of resources on audio-visual question answering and related areas. :-)
https://github.com/swarupbehera/awesome-audio-visual-question-answering
## Dataset

## Papers
### 2024
- Boosting Audio Visual Question Answering via Key Semantic-Aware Cues - Guangyao Li et al. (Code available)
- Learning Trimodal Relation for AVQA with Missing Modality - Hong Joo Lee et al. (Code available)
- Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time - Mohamed Elhoseiny et al. (Code not available)
- SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering - Zhe Yang et al. (Code not available)
- Towards Multilingual Audio-Visual Question Answering - Rajesh Sharma et al. (Code available)
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs - Lidong Bing et al. (Code available)
- CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models - Guangzhi Sun et al. (Code not available)
- CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering - Jianqin Yin et al. (Code not available)
- Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering - Pinghui Wang et al. (Code available)
- AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue - Jing Bi et al. (Code not available)
- Answering Diverse Questions via Text Attached with Key Audio-Visual Clues - Xin Liu, Zitong Yu, Qilang Ye (Code available)
- CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios - Xiaochun Cao, Rui Shao, Zitong Yu, Philip Torr, Xinyu Xie, Qilang Ye (Code available)
- Model Composition for Multimodal Large Language Models - Yang Liu, Maosong Sun, Ming Yan, Peng Li, Fei Huang, Ji Zhang, Zheng Fang, Chi Chen, Ziyue Wang, Yiyang Du, Fuwen Luo (Code available)
- M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation - Yanfeng Wang, Hongcheng Liu, Yu Wang, Pingjie Wang (Code not available)
- CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion - Mohit Bansal, Jaehong Yoon, Shoubin Yu (Code available)
- Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model - Qiang Zhu, Jing Huang, Ming Kong, Tian Liang, Luyuan Chen (Code not available)
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension - Zhou Zhao, Yichong Leng, Jin Xu, Jingren Zhou, Qian Yang, Xiaohuan Zhou, Chang Zhou, Yunfei Chu, Ziyue Jiang, Wenrui Liu, YuanJun Lv (Code available)

### 2023
- AQUALLM: Audio Question Answering Data Generation Using Large Language Models - Swarup Ranjan Behera, Praveen Kumar Pokala, Krishna Mohan Injeti, Jaya Sai Kiran Patibandla, Balakrishna Reddy Pailla (Code not available)
- OneLLM: One Framework to Align All Modalities with Language - Dahua Lin, Xiangyu Yue, Peng Gao, Jiaqi Wang, Jiaming Han, Kaipeng Zhang, Yu Qiao, Kaixiong Gong, Yiyuan Zhang (Code available)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks - Zhou Zhao, Li Tang, Jieming Zhu, Yan Xia, Haoyi Duan, Mingze Zhou (Code available)
- CAD -- Contextual Multi-modal Alignment for Dynamic AVQA - Armin Mustafa et al. (Code not available)
- Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks - Unknown Authors (Code not available)
- VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset - Xinxin Zhu et al. (Code available)
- ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst - Longteng Guo et al. (Code available)
- Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamic Audio-Visual Scenarios - Jianqin Yin et al. (Code available)
- ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities - Peng Wang (Code available)
- Progressive Spatio-temporal Perception for Audio-Visual Question Answering - Guangyao Li et al. (Code available)
- Multi-Scale Attention for Audio Question Answering - Guangyao Li et al. (Code available)
- VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset - Longteng Guo et al. (Code available)

### 2022
- Learning to Answer Questions in Dynamic Audio-Visual Scenarios - Guangyao Li et al., **CVPR**. (Code available)
- Vision Transformers are Parameter-Efficient Audio-Visual Learners - Mohit Bansal et al. (Code available)
- Learning in Audio-visual Context: A Review, Analysis, and New Perspective - Xuelong Li et al. (Code not available)
- PACS: A Dataset for Physical Audiovisual CommonSense Reasoning - Samuel Yu et al. (Code not available)

### 2021
- Pano-AVQA: Grounded Audio-Visual Question Answering on 360 Degree Videos - Heeseung Yun et al. (Code not available)

### 2020
- Hierarchical Conditional Relation Networks for Video Question Answering - Thao Minh Le (Code available)