# awesome-sound_event_detection
Reading list for research topics in Sound AI.
https://github.com/soham97/awesome-sound_event_detection
## Areas

### Network Architecture
- Effective Perturbation based Semi-Supervised Learning Method for Sound Event Detection
- Weakly-supervised audio event detection using event-specific Gaussian filters and fully convolutional networks
- Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data
- Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
- Orthogonality-Regularized Masked NMF for Learning on Weakly Labeled Audio Data
- Attention-based Atrous Convolutional Neural Networks: Visualisation and Understanding Perspectives of Acoustic Scenes
- Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization
- DD-CNN: Depthwise Disout Convolutional Neural Network for Low-complexity Acoustic Scene Classification
- Weakly-Supervised Sound Event Detection with Self-Attention
- Improving Deep Learning Sound Events Classifiers using Gram Matrix Feature-wise Correlations
- An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
- AST: Audio Spectrogram Transformer
- Event Specific Attention for Polyphonic Sound Event Detection
- Sound Event Detection with Adaptive Frequency Selection
- HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
- MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
- Efficient Training of Audio Transformers with Patchout
- BEATs: Audio Pre-Training with Acoustic Tokenizers
- Sound event detection and time–frequency segmentation from weakly labelled data

### Representation Learning
- Contrastive Predictive Coding of Audio with an Adversary
- Towards Learning a Universal Non-Semantic Representation of Speech
- ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
- FRILL: A Non-Semantic Speech Embedding for Mobile Devices
- HEAR 2021: Holistic Evaluation of Audio Representations
- Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
- Towards Learning Universal Audio Representations
- SSAST: Self-Supervised Audio Spectrogram Transformer

### Learning formulation
- Weakly supervised scalable audio content analysis
- An approach for self-training audio event detectors using web data
- A joint detection-classification model for audio tagging of weakly labelled data
- Connectionist Temporal Localization for Sound Event Detection with Sequential Labeling
- A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition
- Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) For Sound Event Detection
- Duration robust weakly supervised sound event detection
- SeCoST: Sequential Co-Supervision for Large Scale Weakly Labeled Audio Event Detection
- Guided Learning for Weakly-Labeled Semi-Supervised Sound Event Detection
- Unsupervised Contrastive Learning of Sound Event Representations
- Sound Event Detection Based on Curriculum Learning Considering Learning Difficulty of Events
- Comparison of Deep Co-Training and Mean-Teacher Approaches for Semi-Supervised Audio Tagging
- Enhancing Audio Augmentation Methods with Consistency Learning
- Audio Event Detection using Weakly Labeled Data
- Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection
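Several entries above rely on semi-supervised consistency training (e.g. the deep co-training vs. mean-teacher comparison). A rough sketch of the mean-teacher mechanism, with toy one-parameter weights standing in for real model parameters; function names here are illustrative, not taken from any listed paper:

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.999):
    """Mean-teacher update: the teacher's weights track an
    exponential moving average of the student's weights."""
    return alpha * teacher_w + (1 - alpha) * student_w

def consistency_loss(student_pred, teacher_pred):
    """MSE between student and teacher predictions on unlabeled
    (or differently augmented) clips; added to the supervised loss."""
    return float(np.mean((student_pred - teacher_pred) ** 2))

# toy example: the teacher drifts slowly toward the student
teacher = ema_update(np.array([1.0]), np.array([0.0]), alpha=0.9)
print(teacher)  # [0.9]
```

In the papers above, the consistency term lets weakly labeled and unlabeled clips contribute to training even when no strong (frame-level) annotation exists.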

### Pooling functions
- Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks
- Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection
- Weakly labelled audioset tagging with attention neural networks
- A Global-Local Attention Framework for Weakly Labelled Audio Tagging
- A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling
- Adaptive Pooling Operators for Weakly Labeled Sound Event Detection
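The papers in this section compare multiple-instance-learning pooling functions that aggregate frame-level event probabilities into one clip-level prediction for weak-label training. A minimal NumPy sketch of three common choices (function names are mine, not from any single paper):

```python
import numpy as np

def max_pool(p):
    # p: (frames, classes) frame-level probabilities in [0, 1];
    # clip-level score is the single strongest frame per class
    return p.max(axis=0)

def avg_pool(p):
    # every frame contributes equally, which dilutes short events
    return p.mean(axis=0)

def linear_softmax_pool(p):
    # weights each frame by its own probability, so confident frames
    # dominate without the sparse gradients of hard max pooling
    return (p * p).sum(axis=0) / (p.sum(axis=0) + 1e-8)

probs = np.array([[0.9, 0.1],
                  [0.2, 0.1],
                  [0.1, 0.1]])
print(max_pool(probs))             # [0.9 0.1]
print(linear_softmax_pool(probs))  # between the mean and the max
```

The comparison papers above study exactly this trade-off: max pooling localizes well but underestimates event duration, while mean pooling does the opposite.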

### Data Augmentation

### Multi-Task Learning
- A Joint Separation-Classification Model for Sound Event Detection of Weakly Labelled Data
- Label-efficient audio classification through multitask learning and self-supervision
- A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling
- Identifying Actions for Sound Event Classification
- Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework

### Few-Shot
- Few-Shot Audio Classification with Attentional Graph Neural Networks
- Continual Learning of New Sound Classes Using Generative Replay
- Few-Shot Sound Event Detection
- Few-Shot Continual Learning for Audio Classification
- Unsupervised and Semi-Supervised Few-Shot Acoustic Event Classification
- Who Calls the Shots? Rethinking Few-Shot Learning for Audio
- A Mutual Learning Framework For Few-Shot Sound Event Detection
- Active Few-Shot Learning for Sound Event Detection
- Adapting Language-Audio Models as Few-Shot Audio Learners
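Many of the few-shot approaches above build on metric learning with class prototypes. A hedged sketch of prototypical-network inference, where the toy 2-d vectors stand in for embeddings that a real system would obtain from a learned audio encoder:

```python
import numpy as np

def prototypical_predict(support, support_labels, query):
    """Classify a query embedding by its nearest class prototype,
    where each prototype is the mean of that class's few support
    (labeled example) embeddings."""
    labels = np.array(support_labels)
    classes = sorted(set(support_labels))
    protos = np.stack([support[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(protos - query, axis=1)
    return classes[int(np.argmin(dists))]

# two support examples per class (a 2-way, 2-shot toy episode)
support = np.array([[0.0, 0.0], [0.2, 0.0],    # "dog bark"
                    [1.0, 1.0], [1.2, 1.0]])   # "siren"
labels = ["dog bark", "dog bark", "siren", "siren"]
print(prototypical_predict(support, labels, np.array([0.1, 0.1])))
```

No gradient steps are needed at inference time, which is what makes this family attractive for detecting novel sound classes from a handful of examples.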

### Zero-Shot
- AudioCLIP: Extending CLIP to Image, Text and Audio
- CLAP 👏: Learning Audio Concepts From Natural Language Supervision
- Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
- Listen, Think, and Understand
- Pengi 🐧: An Audio Language Model for Audio Tasks
- Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
- ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
- Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
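The zero-shot line of work above (AudioCLIP, CLAP, and successors) classifies audio by comparing an audio embedding against text embeddings of candidate label prompts in a shared space. A toy sketch of that inference step, with hand-made 4-d vectors standing in for real encoder outputs:

```python
import numpy as np

def zero_shot_classify(audio_emb, text_embs, labels):
    """Pick the label whose prompt embedding (e.g. for a prompt like
    'this is a sound of <label>') has the highest cosine similarity
    with the audio embedding."""
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = t @ a  # cosine similarity per candidate label
    return labels[int(np.argmax(scores))], scores

# stand-in embeddings; a real system would use trained audio/text encoders
audio = np.array([1.0, 0.0, 0.2, 0.0])
texts = np.array([[0.9, 0.1, 0.1, 0.0],    # prompt for "dog bark"
                  [0.0, 1.0, 0.0, 0.0]])   # prompt for "siren"
label, scores = zero_shot_classify(audio, texts, ["dog bark", "siren"])
print(label)  # prints: dog bark
```

Because the label set enters only through text prompts, new classes can be added at inference time without retraining, which is the defining property of this section.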

### Knowledge Transfer
- Transfer learning of weakly labelled audio
- Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes
- PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
- Do sound event representations generalize to other audio tasks? A case study in audio transfer learning

### Polyphonic SED
- Polyphonic Sound Event Detection with Weak Labeling
- Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy
- Evaluation of Post-Processing Algorithms for Polyphonic Sound Event Detection
- Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection
- Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

### Loss function

### Audio and Visual
- A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging
- Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data
- Labelling unlabelled videos from scratch with multi-modal self-supervision
- Audio-Visual Event Recognition Through the Lens of Adversary
- Learning Audio-Video Modalities from Image Captions
- UAVM: Towards Unifying Audio and Visual Models
- Contrastive Audio-Visual Masked Autoencoder

### Audio Captioning
- Automated audio captioning with recurrent neural networks
- Audio caption: Listen and tell
- AudioCaps: Generating captions for audios in the wild
- Audio Captioning Based on Combined Audio and Semantic Embeddings
- Clotho: An Audio Captioning Dataset
- A Transformer-based Audio Captioning Model with Keyword Estimation
- Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events
- Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags
- Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
- Sound Event Detection Guided by Semantic Contexts of Scenes
- Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning

### Audio Retrieval

### Audio Generation
- Acoustic Scene Generation with Conditional Samplernn
- Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning
- Diffsound: Discrete Diffusion Model for Text-to-sound Generation
- AudioGen: Textually Guided Audio Generation
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
- AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
- AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
- Diverse and Vivid Sound Generation from Text Descriptions
- Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
- Simple and Controllable Music Generation
- Audiobox: Unified Audio Generation with Natural Language Prompts
- Masked Audio Generation using a Single Non-Autoregressive Transformer
- Taming Visually Guided Sound Generation

### Others
- Audio event and scene recognition: A unified approach using strongly and weakly labeled data
- Sound Event Detection Using Point-Labeled Data
- An Investigation of the Effectiveness of Phase for Audio Classification
- DCASE17 Task 4
- US8K
- FSD50K
- AudioSet
- DiCOVA
- DCASE21 Task 5
- DCASE18 Task 1
- VGG-Sound
- AudioCaps
- LAION 630k
- UCF101
- YFCC100M
- Other audio-based datasets to consider
- Computational Analysis of Sound Scenes and Events
- K. Drossos: audio-captioning-papers

### Missing or noisy audio

## Survey papers