https://github.com/dmitryryumin/icassp-2023-24-papers
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
asr denoising domain-adaptation face-recognition generative-models icassp icassp2023 icassp2024 image-generation keyword-spotting language-modeling multimodal-learning music-generation self-supervised-learning semantic-segmentation signal-processing signal-restoration speech-recognition spoken-language-understanding vad
- Host: GitHub
- URL: https://github.com/dmitryryumin/icassp-2023-24-papers
- Owner: DmitryRyumin
- License: mit
- Created: 2023-08-01T09:17:13.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-30T01:47:26.000Z (8 months ago)
- Last Synced: 2024-10-30T03:56:24.418Z (8 months ago)
- Topics: asr, denoising, domain-adaptation, face-recognition, generative-models, icassp, icassp2023, icassp2024, image-generation, keyword-spotting, language-modeling, multimodal-learning, music-generation, self-supervised-learning, semantic-segmentation, signal-processing, signal-restoration, speech-recognition, spoken-language-understanding, vad
- Language: Python
- Homepage:
- Size: 8.8 MB
- Stars: 388
- Watchers: 29
- Forks: 17
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
---
ICASSP 2024 Papers: A complete collection of influential and exciting research papers from the [*ICASSP 2024*](https://2024.ieeeicassp.org/) conference. Explore the latest advancements in acoustics, speech and signal processing. Code included. :star: the repository to support the advancement of audio and signal processing!
---
> [!TIP]
> [*Online version of the ICASSP 2024 Conference Technical Program*](https://2024.ieeeicassp.org/program-schedule/), which lists all accepted full papers along with their presentation mode and time.

---
Other collections of the best AI conferences
> [!IMPORTANT]
> The conference table will be kept up to date.
The collections span 2023 and 2024; not every conference has a collection for both years.

| Area | Conferences |
|---|---|
| Computer Vision (CV) | CVPR, ICCV, ECCV, WACV, FG |
| Speech/Signal Processing (SP/SigProc) | ICASSP, INTERSPEECH, ISMIR |
| Natural Language Processing (NLP) | EMNLP |
| Machine Learning (ML) | AAAI, ICLR, ICML, NeurIPS |
---
## Contributors
> [!NOTE]
> Contributions to improve the completeness of this list are greatly appreciated. If you come across any overlooked papers, please **feel free to [*create pull requests*](https://github.com/DmitryRyumin/ICASSP-2023-24-Papers/pulls), [*open issues*](https://github.com/DmitryRyumin/ICASSP-2023-24-Papers/issues) or contact me via [*email*](mailto:[email protected])**. Your participation is crucial to making this repository even better.

---
## Papers
Main sections:

- Audio-Visual Speech Processing
- Vision and Language
- Acoustic Signal Processing
- Deep Learning Techniques
- Speech Enhancement and Separation - Diffusion and other Probabilistic Models
- ASPS Lecture
- Distributed and Federated Learning
- Transfer Learning
- Voice Conversion
- Graph Neural Networks
- Language Resources, Metrics and Systems
- Watermarking and Data Hiding
- Signal and Information Processing over Graphs
- Integrated Sensing and Communications
- Audio Events Detection and Classification; Music Information Retrieval
- Language Understanding and Computational Semantics - NLP Tasks
- Physiological and Wearable Signal Processing
- Speech Enhancement; Music Information Retrieval
- Multimodal Medical Image Fusion and Analysis
- Sparse/Low-Dimensional Signal Processing
- Robust and Sustainable Machine Learning
- Machine Learning for Image and Video Processing
- Deep Learning Generalization
- Distributed Processing and Federated Learning
- Biological Image Analysis
- Learning from Multimodal Data
- Biometrics
- Detection and Classification
- Multimedia Coding
- Anonymisation, Data Privacy and Hiding
- Quality Assessment and Anomaly Detection
- Signal Filtering, Reconstruction, Restoration and Enhancement
- Speech Emotion Recognition and Analysis
- Deep Generative Models
- Context and LLM Speech Recognition
- Music Information Retrieval
- Multimodal Processing: Vision + Language
- Environmental Sound Synthesis and Generation
- Biomedical and Biological Image Processing
- DoA Estimation
- Tracking
- Machine Learning for Communications
- Image and Video Processing for Watermarking and Security
- Self-Supervised Learning for Speech Processing
- Deep Learning for Image and Video Processing
- Image, Video, and 3D Content Generation
- Classification of Acoustic Scenes and Events
- Reinforcement Learning
- Subspace and Manifold Learning
- Active Noise Control and Echo Cancellation; Source Separation
- Machine Learning, Detection and Classification
- Machine Learning for Audio, Speech and Music Processing
- Multimedia Generation and Synthesis
- Medical Image Detection and Segmentation
- Multimedia Forensics and Cybersecurity
- Estimation Theory and Methods
- Emerging Methods for Biomedical Image and Signal Processing
- Text to Speech Generation
- Audio Classification, Detection and Localization
- Self-Supervised and Semi-Supervised Learning
- Multichannel/Multimodal Speech Recognition
- Speaker Verification
- Speaker Diarization
- Adversarial Machine Learning
- Machine Learning Methods for Language
- SPED: Signal Processing Education
- Multimedia Quality of Experience
- Domain-Enriched Learning for Medical Image Processing
- Speech Enhancement and Separation
- Image Denoising
- ASPS Poster
- ASR - New Algorithms and Approaches
- Data Mining and Big Data
- Language Understanding and Computational Semantics - Machine Learning
- Explainable and Interpretable Machine Learning
- Neuroimaging and Brain/Human-Computer Interfaces
- Localization, DOA Estimation, Spatial Audio Recording and Reproduction
- Perception and Processing for Autonomous Systems and Applications
- Computational Imaging
- Audio and Speech Quality and Intelligibility Measures; Music Analysis
- Medical Image Formation, Reconstruction and Restoration
- Audio and Speech Source Separation
- Text-based Customization for Speech-to-Text
- Deep Learning Models
- Next-Gen Communication Systems
- Image Restoration
- Robustness and Trustworthy Machine Learning
- Signal Processing over Networks
- 3D Understanding
- Compressed Sensing and Machine Learning for Multi-Sensor Systems
- LIMMITS: Multi-Speaker, Multi-Lingual Indic TTS with Voice Cloning
- Natural Language Processing for Speech-to-Text
- Resource Constrained Acoustic and Language Modeling
- Dereverberation and RIR Estimation; Speech Enhancement and Restoration
- Image/Video Super-Resolution
- Matrix Factorization and Source Separation
- Beamforming for Audio and Speech; Music Signal Analysis, Processing and Synthesis
- Summarization, Retrieval and Language Learning
- Sequential Learning and Sequential Decision Methods
- MIMO and Massive MIMO Communication Systems
- Multimodal Emotion/Sentiment Analysis
- Human Understanding
- Image and Video Synthesis
- MIMO and High-Frequency Communications
- Image and Video Super-Resolution
- Spatial Audio Recording and Reproduction
- Audio Signal Restoration and Speech Enhancement
- Discourse and Dialog
- Bayesian Signal Processing
- Pattern Recognition and Classification
- Key Word Spotting
- Speech Analysis - Pitch, Spectrum and Voice Disorders
- Grand Challenge on Hyperspectral Skin Vision
- Robust Speech Recognition and Adaptation
- Speech Analysis and Language Disorder Analysis
- Aspects in Image/Video Processing and Analysis
- DoA Estimation and Source Localization
- Multimodal Processing of Language
- Source separation; Music analysis
- Machine Learning for Time Series Analysis
- Multimedia Search and Retrieval
- Anomaly Detection; Sound Event Detection and Localization
- Acoustic Array and Signal Processing
- Music Signal Analysis and Processing
- Language Understanding and Computational Semantics - Language Models
- Deep Learning Theory
- Anti-Spoofing
- Pose, Gesture, and Action in Multimedia
- Sampling Theory, Compressed and Non-Uniform Sampling
- MIMO and Massive MIMO Systems
- Multimodal and Emerging Medical Signal Analysis
- The RF Signal Separation Challenge
- Signal Processing for Communications
- Audio and Speech Modeling, Coding and Transmission; Spatial Audio Recording and Reproduction
- Voice Conversion: Singing, Accent and Emotion
- Other Machine Learning Applications
- Speaker Recognition and Anonymization
- Feature Extraction Selection and Learning
- Music Information Retrieval; Quality and Intelligibility Measures
- Learning Theory and Performance Bound
- Human-Centric Multimedia
- Multilingual Speech Recognition and Identification
- Image Recognition and Detection
- Signal Processing over Graphs and Networks
- End-to-End Modeling for Automatic Speech Recognition
- Segmentation, Tagging, and Parsing of Language
- Detection
- Audio-Language Processing and Audio Captioning
- Action Recognition
- Image, Video and Other Applications
- Multimodal Information Based Speech Processing (MISP)
- Next-Gen Communications and PHY Security

Main sections that will soon be added:

- Network and System Security
- Target Source Extraction; Active Noise Control, Echo Reduction and Feedback Reduction
- Machine Translation for Spoken and Written Language
- Sound Events Detection, Description and Generation
- Applied Cryptography
- Machine/Deep Learning Methodologies for Multimedia
- Speech Separation and Extraction
- Signal Processing and Machine Learning for Communications
- Audio Coding
- Active Noise Control and Echo Cancellation
- Bayesian Machine Learning
- Advancing the Frontiers of Deep Learning for Low-Dose 3D Cone-Beam CT Reconstruction
- Bioacoustics and Medical Acoustics; Audio Security
- Acoustic Modeling for Automatic Speech Recognition
- Multimodal Processing of Speech
- IFS General
- 3D Image and Video Processing and Analysis
- Deep Learning Training Methods
- Key Word Spotting and Acoustic Event Detection
- Coding, Information Theory, and Applications of Signal Processing for Communications
- Speech Analysis
- Music Separation; Audio for Multimedia and Audio Processing Systems
- Machine Learning for Communications and Wireless Networks
- Image and Video Coding/Compression
- Bioinformatics and Biomedical Signal Processing
- Audio-Visual Speech/Intent Recognition
- Multimodal Clustering, Segmentation, and Summarization
- Learning Theory and Methods
- SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids
- Radar Signal Processing
- Biological and Medical Signal and Image Processing
- Anti-Spoofing and Speaker Embedding
- Speech Enhancement; Dereverberation and RIR Estimation
- Segmentation
- 3D Generation
- Multimedia Forensics
- Speech Signal Improvement Challenge
- Audio Deep Packet Loss Concealment Grand Challenge
- Signal Processing Theory and Methods Journal Papers
- Multi-Sensor and Multichannel Signal Processing
- Array Processing and Beamforming
- Sound Event Classification and Generation; Active Noise Control, Echo Reduction and Feedback Reduction
- Deep Learning Fairness and Privacy
- Sparsity and Low-Rank Models
- Optimization Methods for Signal Processing
- Multimodal Processing
- Show and Tell Demos

Special Session sections that will soon be added:

- Model based Machine Learning for Wireless Communications and Sensing
- Exploiting Diversities in Advanced Array Systems: New Applications and Trends
- Generative Semantic Communication: How Generative Models Enhance Semantic Communications
- Quantum Machine Learning Algorithms and Applications on NISQ Devices
- Robust Reconstruction Methods in Computational Imaging
- Graphical Inference and Modeling in Dynamical Systems
- Advancements in Integrated Sensing and Communication for Next-Generation Wireless Networks
- Signal and Graph Processing for Autonomous Agents
- Next-Generation Wi-Fi Sensing
- Signal Processing Theory for Covert Communication and Cybersecurity
- In-Context Learning Methods for Speech and Spoken Language Processing
- Topological Signal Processing over Higher-Order Networks
- Deepfakes and AI-Generated Content (AIGC) Detection and Forensics: Recent Advances
- Recent Advances in AI-Powered Visual Computing and Multimodal Signal Processing for Metaverse Era
- Algorithm-Hardware Co-Design of Neuromorphic Solutions for Signal Processing Applications
- Automotive Radar Signal Processing for Autonomous Driving
- Learning with Incomplete Medical Data
- Signal Processing and Machine Learning for Collective Intelligence
- Variational Inference and Approximate Bayesian Techniques
- Efficient Modeling of Long Sequences with Applications to Speech and Audio
- Decentralized Learning with Resource-Constrained Communication
- Localization and Sensing based on Signals from Terrestrial and Non-Terrestrial Networks
- Signal Processing and Machine Learning for Understanding Brain Dynamics

---