
https://github.com/sony/creativeai




          


# Deep Generative Modeling




Jump Your Steps




[arXiv]

A general method to find an optimal sampling schedule for inference in discrete diffusion


ICLR25



HERO-DM




[arXiv]
[demo]

A method that efficiently leverages online human feedback to fine-tune Stable Diffusion for a wide range of tasks


ICLR25



WPSE




[arXiv]

An enhanced multimodal representation using weighted point clouds and its theoretical benefits


ICLR25



PaGoDA




[arXiv]

A 64x64 pre-trained diffusion model is all you need for 1-step high-resolution SOTA generation


NeurIPS24



CTM




[arXiv]
[demo]

A unified framework that enables diverse samplers and state-of-the-art 1-step generation


ICLR24

Applications:

[SoundGen]




SAN




[arXiv]
[code]
[demo]

Enhancing GAN with metrizable discriminators


ICLR24

Applications:

[Vocoder]




MPGD




[arXiv]
[demo]

A fast, efficient, training-free, and controllable diffusion-based generation method


ICLR24



HQ-VAE




[OpenReview]
[arXiv]

Generalizing hierarchical VQ-VAEs with a Bayesian framework


TMLR



FP-Diffusion




[PMLR]
[code]

Improving density estimation of diffusion models


ICML23



GibbsDDRM




[PMLR]
[code]

Achieving blind inversion using DDPM


ICML23

Applications:

[DeReverb]
[SpeechEnhance]




Consistency-type Models




[arXiv]

A theoretically unified framework for "consistency" in diffusion models


ICML23 SPIGM Workshop



SQ-VAE




[PMLR]
[arXiv]
[code]

Improving codebook utilization and training stability


ICML22



AR-ELBO




[Elsevier]
[arXiv]

Mitigating oversmoothness in VAEs


Neurocomputing





# Multimodal NLP




VinaBench




[CVPR]
[arXiv]
[data]

VinaBench: Benchmark for Faithful and Consistent Visual Narratives


CVPR25



DiffuCOMET




[ACL]
[arXiv]
[code]

DiffuCOMET: Contextual Commonsense Knowledge Diffusion


ACL24



CyCLIPs/CyCLAPs




[ACL]
[arXiv]

On the Language Encoder of Contrastive Cross-modal Models


ACL24



DIIR




[ACL]
[arXiv]
[code]

Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning


ACL24



PeaCoK




[ACL]
[arXiv]
[code]

PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
(Outstanding Paper Award)


ACL23



ComFact




[EMNLP]
[arXiv]
[code]

ComFact: A Benchmark for Linking Contextual Commonsense Knowledge


EMNLP22 Findings




# Music Technologies




Variable Bitrate RVQ




[arXiv]

VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression


ICASSP25



Instr. Timbre Transfer




[arXiv]
[code]
[demo]

Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer


ICASSP25



Mixing Graph Estimation




[arXiv]
[code]
[demo]

Searching For Music Mixing Graphs: A Pruning Approach


DAFx24



Guitar Amp. Modeling




[arXiv]

Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data


DAFx24



Text-to-Music Editing




[arXiv]
[code]
[demo]

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models


IJCAI24



Instr.-Agnostic Trans.




[IEEE]
[arXiv]

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription


ICASSP24



Vocal Restoration




[IEEE]
[arXiv]
[demo]

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance


ICASSP24



hFT-Transformer




[arXiv]
[code]

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer


ISMIR23



Automatic Music Tagging




[arXiv]

An Attention-based Approach To Hierarchical Multi-label Music Instrument Classification


ICASSP23



Vocal Dereverberation




[arXiv]
[demo]

Unsupervised Vocal Dereverberation with Diffusion-based Generative Models


ICASSP23



Mixing Style Transfer




[arXiv]
[code]
[demo]

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects


ICASSP23



Music Transcription




[arXiv]
[code]
[demo]

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability


ICASSP23



Singing Voice Vocoder




[arXiv]
[demo]

Hierarchical Diffusion Models for Singing Voice Neural Vocoder


ICASSP23



Distortion Effect Removal




[poster]
[arXiv]
[demo]

Distortion Audio Effects: Learning How to Recover the Clean Signal


ISMIR22



Automatic Music Mixing




[poster]
[arXiv]
[code]
[demo]

Automatic Music Mixing with Deep Learning and Out-of-Domain Data


ISMIR22



Sound Separation




[IEEE]

Music Source Separation with Deep Equilibrium Models


ICASSP22



Automatic DJ Transition




[arXiv]
[code]
[demo]

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks


ICASSP22



Singing Voice Conversion




[arXiv]
[demo]

Robust One-Shot Singing Voice Conversion




Sound Separation




[video]
[site]

Glenn Gould and Kanji Ishimaru 2021: A collaboration with AI Sound Separation after 60 years






# Cinematic Technologies




MMAudio




[arXiv]
[code]
[demo]

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis


CVPR25



MMDisCo




[OpenReview]
[arXiv]
[code]

MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation


ICLR25



SoundCTM




[OpenReview]
[arXiv]
[code]
[demo]

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation


ICLR25



Mining Your Own Secrets




[OpenReview]
[arXiv]

Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models


ICLR25



GenWarp




[arXiv]
[demo]

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping


NeurIPS24



SpecMaskGIT




[arXiv]
[demo]

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond


ISMIR24



Acoustic Inv. Rendering




[CVF]
[arXiv]
[dataset]
[code]
[demo]

Hearing Anything Anywhere


CVPR24



BigVSAN Vocoder




[arXiv]
[code]
[demo]

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network


ICASSP24



Zero-/Few-shot SELD




[IEEE]
[arXiv]

Zero- and Few-shot Sound Event Localization and Detection


ICASSP24



STARSS23




[arXiv]
[dataset]

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events


NeurIPS23



Audio Restoration: ViT-AE




[IEEE]
[arXiv]
[demo]

Extending Audio Masked Autoencoders Toward Audio Restoration


WASPAA23



Diffiner




[ISCA]
[arXiv]
[code]

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement


INTERSPEECH23



CLIPSep




[OpenReview]
[arXiv]
[code]
[demo]

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos


ICLR23



Sound Event Localization and Detection




[IEEE]
[arXiv]

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training


ICASSP22




# Hosted Challenges




SVG Challenge 2024




[SVG Challenge 2024]

Sounding Video Generation Challenge 2024




DCASE Challenge Task 3




[DCASE Challenge 2023]

Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes




CPD Challenge 2023




[CPD Challenge 2023]

Commonsense Persona-grounded Dialogue Challenge




SDX Challenge 2023




[site]
[paper (music)]
[paper (cinematic)]

Sound Demixing Challenge 2023




MDX Challenge 2021




[site]
[frontiers]

Music Demixing Challenge 2021





### Contact

Yuki Mitsufuji (yuhki.mitsufuji@sony.com)