https://github.com/sony/creativeai




# I. Deep Generative Modeling




CTM




[arXiv]
[demo]

Unified framework enabling diverse samplers and state-of-the-art 1-step generation
(ICLR2024)





SAN




[arXiv]
[code]
[demo]

Enhancing GANs with metrizable discriminators
(ICLR2024)



Applications:

[Vocoder]




MPGD




[arXiv]
[demo]

Fast, efficient, training-free, and controllable diffusion-based generation method
(ICLR2024)





HQ-VAE




[OpenReview]
[arXiv]

Generalizing hierarchical VQ-VAEs with a Bayesian framework
(TMLR2024)





FP-Diffusion




[PMLR]
[code]

Improving density estimation of diffusion models
(ICML2023)





GibbsDDRM




[PMLR]
[code]

Achieving blind inversion using DDPM
(ICML2023 Oral)



Applications:

[DeReverb]
[SpeechEnhance]




Consistency-type Models




[arXiv]

Theoretically unified framework for "consistency" in diffusion models
(ICML2023 SPIGM workshop)





SQ-VAE




[PMLR]
[arXiv]
[code]

Improving codebook utilization and training stability
(ICML2022)




AR-ELBO




[Elsevier]
[arXiv]

Mitigating oversmoothness in VAEs
(Neurocomputing2022)





# II. Multimodal NLP




DiffuCOMET




[ACL]
[arXiv]
[code]

DiffuCOMET: Contextual Commonsense Knowledge Diffusion
(ACL2024)





CyCLIPs/CyCLAPs




[ACL]
[arXiv]

On the Language Encoder of Contrastive Cross-modal Models
(ACL2024 Findings)





DIIR




[ACL]
[arXiv]
[code]

Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning
(ACL2024 Findings)





CPD Challenge 2023




[CPD Challenge 2023]

Commonsense Persona-grounded Dialogue Challenge




PeaCoK




[ACL]
[arXiv]
[code]

PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
(ACL2023, Outstanding Paper Award)




ComFact




[EMNLP]
[arXiv]
[code]

ComFact: A Benchmark for Linking Contextual Commonsense Knowledge
(EMNLP2022 Findings)





# III. Music & Cinematic Technologies




Mixing Graph Estimation




[arXiv]
[code]
[demo]

Searching For Music Mixing Graphs: A Pruning Approach
(DAFx24)





Guitar Amp. Modeling




[arXiv]

Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data
(DAFx24)





Text-to-Music Editing




[arXiv]
[code]
[demo]

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
(IJCAI2024)





STARSS23




[arXiv]
[Dataset]

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
(NeurIPS2023)





BigVSAN Vocoder




[arXiv]
[code]
[demo]

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
(ICASSP2024)





Instrument-Agnostic Transcription




[arXiv]

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription
(ICASSP2024)





Vocal Restoration




[arXiv]

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance
(ICASSP2024)





Zero-/Few-shot SELD




[arXiv]

Zero- and Few-shot Sound Event Localization and Detection
(ICASSP2024)





CLIPSep




[OpenReview]
[arXiv]
[code]
[demo]

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
(ICLR2023)




hFT-Transformer




[arXiv]
[code]

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer
(ISMIR2023)




Audio Restoration: ViT-AE




[IEEE]
[arXiv]
[demo]

Extending Audio Masked Autoencoders Toward Audio Restoration
(WASPAA2023)




Diffiner




[ISCA]
[arXiv]
[code]

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
(INTERSPEECH2023)




Automatic Music Tagging




[arXiv]

An Attention-based Approach To Hierarchical Multi-label Music Instrument Classification
(ICASSP2023)




Vocal Dereverberation




[arXiv]
[demo]

Unsupervised Vocal Dereverberation with Diffusion-based Generative Models
(ICASSP2023)




Mixing Style Transfer




[arXiv]
[code]
[demo]

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
(ICASSP2023)




Music Transcription




[arXiv]
[code]
[demo]

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
(ICASSP2023)




Singing Voice Vocoder




[arXiv]
[demo]

Hierarchical Diffusion Models for Singing Voice Neural Vocoder
(ICASSP2023)




Distortion Effect Removal




[poster]
[arXiv]
[demo]

Distortion Audio Effects: Learning How to Recover the Clean Signal
(ISMIR2022)




Automatic Music Mixing




[poster]
[arXiv]
[code]
[demo]

Automatic Music Mixing with Deep Learning and Out-of-Domain Data
(ISMIR2022)




Sound Separation




[IEEE]

Music Source Separation with Deep Equilibrium Models
(ICASSP2022)




Automatic DJ Transition




[arXiv]
[code]
[demo]

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks
(ICASSP2022)




Sound Event Localization and Detection




[IEEE]
[arXiv]

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
(ICASSP2022)




Singing Voice Conversion




[arXiv]
[demo]

Robust One-Shot Singing Voice Conversion




Sound Separation




[video]
[site]

Glenn Gould and Kanji Ishimaru 2021: A collaboration with AI Sound Separation after 60 years




MDX21




[site]
[frontiers]

Music Demixing Challenge 2021




Sound Demixing Challenge 2023




[site]
[paper (music)]
[paper (cinematic)]

Sound Demixing Challenge 2023




DCASE Challenge




[DCASE Challenge2023]

Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes




### Contact

Yuki Mitsufuji ([email protected])