https://github.com/sony/creativeai

### README
- **Jump Your Steps** [arXiv]: A general method for finding an optimal sampling schedule for inference in discrete diffusion models. (ICLR25)
- **HERO-DM** [arXiv] [demo]: A method that efficiently leverages online human feedback to fine-tune Stable Diffusion for a wide range of tasks. (ICLR25)
- **WPSE** [arXiv]: An enhanced multimodal representation using weighted point clouds, with an analysis of its theoretical benefits. (ICLR25)
- **PaGoDA** [arXiv]: A 64x64 pre-trained diffusion model is all you need for 1-step high-resolution SOTA generation. (NeurIPS24)
- **CTM** [arXiv] [demo]: A unified framework that enables diverse samplers and state-of-the-art 1-step generation; see the sampling sketch below. (ICLR24) Applications: [SoundGen]
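As I understand CTM, the trained model g(x, t, s) jumps from any noise level t directly to any earlier level s along the probability-flow trajectory, so a single network supports both 1-step generation (s = 0) and multistep stochastic sampling. A minimal sketch, assuming an EDM-style sigma schedule and a trained g (both hypothetical here); the renoising rule is a simplified reading of CTM's gamma-sampling:

```python
import torch

def ctm_sample(g, x_T, sigmas, gamma=0.0):
    """Sketch of CTM-style sampling. g(x, t, s) jumps from noise level t
    to level s along the trajectory (trained network, assumed). sigmas is
    a decreasing schedule ending at 0.0; gamma=0 is fully deterministic,
    gamma>0 re-injects noise between jumps."""
    x = x_T
    for t, s in zip(sigmas[:-1], sigmas[1:]):
        s_hat = (1.0 - gamma**2) ** 0.5 * s          # jump slightly below level s
        x = g(x, t, s_hat)
        if gamma > 0.0 and s > 0.0:
            x = x + gamma * s * torch.randn_like(x)  # renoise back up to level s
    return x

# One-step generation is the special case: ctm_sample(g, x_T, [sigma_max, 0.0]).
```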
- **SAN** [arXiv] [code] [demo]: Enhancing GANs with metrizable discriminators; see the discriminator-head sketch below. (ICLR24) Applications: [Vocoder]
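My reading of the SAN idea, in brief: split the discriminator's last linear layer so that the standard GAN loss trains the features through a detached direction, while the unit-normalized direction itself is trained with a Wasserstein-style objective on detached features, which is what makes the discriminator "metrizable". A hedged sketch; the class and loss names are mine, not the repo's:

```python
import torch
import torch.nn.functional as F

class SANHead(torch.nn.Module):
    """Last layer of a SAN-style discriminator (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(dim))

    def forward(self, h):                     # h: (B, dim) features
        omega = F.normalize(self.w, dim=0)    # unit-norm direction
        out_fun = h @ omega.detach()          # gradients reach features only
        out_dir = h.detach() @ omega          # gradients reach direction only
        return out_fun, out_dir

def san_disc_loss(head, h_real, h_fake):
    fun_r, dir_r = head(h_real)
    fun_f, dir_f = head(h_fake)
    loss_fun = F.relu(1 - fun_r).mean() + F.relu(1 + fun_f).mean()  # hinge GAN loss
    loss_dir = dir_f.mean() - dir_r.mean()    # Wasserstein-style direction loss
    return loss_fun + loss_dir
```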
- **MPGD** [arXiv] [demo]: A fast, efficient, training-free, and controllable diffusion-based generation method; see the guidance sketch below. (ICLR24)
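The training-free trick, as I read it: instead of back-propagating a guidance loss through the diffusion network (as earlier guided methods do), steer the clean estimate x0 directly at each step and then resume ordinary sampling. A minimal DDIM-flavored sketch; all names are placeholders, and the paper's manifold-preserving projection (e.g., via an autoencoder) is omitted:

```python
import torch

@torch.no_grad()
def mpgd_step(eps_model, x_t, t, a_t, a_prev, loss_fn, lr=0.1):
    """One guided DDIM step (sketch). a_t / a_prev are alpha-bar values
    (scalar tensors, assumed); loss_fn scores the clean estimate."""
    eps = eps_model(x_t, t)
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # predicted clean sample
    with torch.enable_grad():                          # guidance gradient w.r.t. x0,
        x0g = x0.detach().requires_grad_(True)         # NOT through eps_model
        g = torch.autograd.grad(loss_fn(x0g), x0g)[0]
    x0 = x0 - lr * g                                   # steer the clean estimate
    return a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # deterministic DDIM update
```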
- **GibbsDDRM** [PMLR] [code]: Achieving blind inversion using DDPMs; see the alternating-sampling sketch below. (ICML23) Applications: [DeReverb] [SpeechEnhance]
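"Blind" here means the degradation operator (e.g., a reverberation filter) is unknown. My loose sketch of the alternating scheme: one posterior diffusion step on the data given the current operator estimate, then a Langevin-style refresh of the operator parameters given the data. Both `ddrm_step` and `log_p_phi` are hypothetical stand-ins:

```python
import torch

def gibbs_ddrm(y, ddrm_step, log_p_phi, x_T, phi0, T, lr=1e-3):
    """y: observation, x: data estimate, phi: unknown operator parameters.
    ddrm_step(x, t, y, phi) -> one posterior denoising step (placeholder);
    log_p_phi(y, x, phi) -> log-likelihood of the operator fit (placeholder)."""
    x, phi = x_T, phi0.clone().requires_grad_(True)
    for t in reversed(range(T)):
        x = ddrm_step(x, t, y, phi.detach())       # sample x given phi
        g, = torch.autograd.grad(log_p_phi(y, x.detach(), phi), phi)
        with torch.no_grad():                      # Langevin-style update of phi given x
            phi += lr * g + (2 * lr) ** 0.5 * torch.randn_like(phi)
    return x, phi
```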
- **Consistency-type Models** [arXiv]: A theoretically unified framework for "consistency" in diffusion models. (ICML23 SPIGM Workshop)
- **VinaBench** [CVPR] [arXiv] [data]: VinaBench: Benchmark for Faithful and Consistent Visual Narratives. (CVPR25)
- **DIIR** [ACL] [arXiv] [code]: Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning. (ACL24)
- **PeaCoK** [ACL] [arXiv] [code]: PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives. (ACL23, Outstanding Paper Award)
- **ComFact** [EMNLP] [arXiv] [code]: ComFact: A Benchmark for Linking Contextual Commonsense Knowledge. (EMNLP22 Findings)
- **Variable Bitrate RVQ** [arXiv]: VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression; see the quantizer sketch below. (ICASSP25)
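For context, plain residual VQ quantizes a vector in stages, each stage coding the previous stage's residual; as I understand VRVQ, it makes the number of stages per frame variable via a learned importance map (not shown). A minimal sketch of the fixed-rate core:

```python
import torch

def rvq_encode(x, codebooks):
    """x: (N, D) vectors; codebooks: list of (K, D) tensors, one per stage.
    Returns per-stage code indices and the cumulative reconstruction."""
    residual, codes = x, []
    x_hat = torch.zeros_like(x)
    for cb in codebooks:
        idx = torch.cdist(residual, cb).argmin(dim=1)  # nearest code per vector
        q = cb[idx]
        codes.append(idx)
        x_hat = x_hat + q          # stages sum to the reconstruction
        residual = residual - q    # next stage codes what is still missing
    return codes, x_hat
```

Dropping later stages degrades quality gracefully, which is what lets a variable-bitrate scheme spend fewer stages on easy frames.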
- **Instr. Timbre Transfer** [arXiv] [code] [demo]: Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer. (ICASSP25)
- **Mixing Graph Estimation** [arXiv] [code] [demo]: Searching For Music Mixing Graphs: A Pruning Approach. (DAFx24)
- **Guitar Amp. Modeling** [arXiv]: Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data. (DAFx24)
- **Text-to-Music Editing** [arXiv] [code] [demo]: MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models. (IJCAI24)
- **Instr.-Agnostic Trans.** [IEEE] [arXiv]: Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription. (ICASSP24)
- **Vocal Restoration** [IEEE] [arXiv] [demo]: VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance; see the posterior-sampling sketch below. (ICASSP24)
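Diffusion posterior sampling, roughly: run an ordinary reverse diffusion step, then nudge the sample with the gradient of a data-consistency term ||y - A(x0_hat)|| computed through the denoiser; VRDMG combines several such guidance signals. A single-guidance sketch with placeholder names:

```python
import torch

def dps_step(eps_model, ddpm_update, x_t, t, a_t, y, A, zeta=1.0):
    """One guided reverse step (sketch). A is the degradation operator,
    y the degraded observation, a_t the alpha-bar value at step t."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()     # predicted clean signal
    err = torch.linalg.vector_norm(y - A(x0))            # data-consistency residual
    grad = torch.autograd.grad(err, x_t)[0]              # gradient through the model
    x_prev = ddpm_update(x_t.detach(), eps.detach(), t)  # ordinary reverse step (placeholder)
    return x_prev - zeta * grad                          # guidance nudge
```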
- **hFT-Transformer** [arXiv] [code]: Automatic Piano Transcription with Hierarchical Frequency-Time Transformer; see the attention-layout sketch below. (ISMIR23)
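The "hierarchical frequency-time" layout, as I picture it: attend across frequency bins within each frame, then across time frames within each bin (the paper adds a second hierarchical stage not shown here). A toy block with assumed tensor shapes:

```python
import torch

class FreqTimeBlock(torch.nn.Module):
    """Attention over the frequency axis, then the time axis (sketch)."""
    def __init__(self, d, heads=4):
        super().__init__()
        self.freq = torch.nn.MultiheadAttention(d, heads, batch_first=True)
        self.time = torch.nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):                                 # x: (B, T, F, D)
        B, T, Fr, D = x.shape
        xf = x.reshape(B * T, Fr, D)                      # frames as batch
        xf = xf + self.freq(xf, xf, xf, need_weights=False)[0]
        xt = xf.reshape(B, T, Fr, D).transpose(1, 2).reshape(B * Fr, T, D)
        xt = xt + self.time(xt, xt, xt, need_weights=False)[0]
        return xt.reshape(B, Fr, T, D).transpose(1, 2)    # back to (B, T, F, D)
```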
- **Automatic Music Tagging** [arXiv]: An Attention-based Approach To Hierarchical Multi-label Music Instrument Classification. (ICASSP23)
- **Vocal Dereverberation** [arXiv] [demo]: Unsupervised Vocal Dereverberation with Diffusion-based Generative Models. (ICASSP23)
- **Mixing Style Transfer** [arXiv] [code] [demo]: Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects. (ICASSP23)
- **Music Transcription** [arXiv] [code] [demo]: DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability. (ICASSP23)
- **Singing Voice Vocoder** [arXiv] [demo]: Hierarchical Diffusion Models for Singing Voice Neural Vocoder. (ICASSP23)
- **Distortion Effect Removal** [poster] [arXiv] [demo]: Distortion Audio Effects: Learning How to Recover the Clean Signal. (ISMIR22)
- **Automatic Music Mixing** [poster] [arXiv] [code] [demo]: Automatic Music Mixing with Deep Learning and Out-of-Domain Data. (ISMIR22)
- **Automatic DJ Transition** [arXiv] [code] [demo]: Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks. (ICASSP22)
- **Sound Separation** [video] [site]: Glenn Gould and Kanji Ishimaru 2021: A collaboration with AI Sound Separation after 60 years.
- **MMAudio** [arXiv] [code] [demo]: MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis. (CVPR25)
- **MMDisCo** [OpenReview] [arXiv] [code]: MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation. (ICLR25)
- **SoundCTM** [OpenReview] [arXiv] [code] [demo]: SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation. (ICLR25)
- **Mining Your Own Secrets** [OpenReview] [arXiv]: Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models. (ICLR25)
- **GenWarp** [arXiv] [demo]: GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping. (NeurIPS24)
- **SpecMaskGIT** [arXiv] [demo]: SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond; see the decoding sketch below. (ISMIR24)
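MaskGIT-style decoding, which SpecMaskGIT applies to spectrogram tokens: start from an all-masked token grid and, over a handful of passes, keep the most confident predictions while re-masking the rest on a shrinking schedule. A toy sketch; `logits_fn`, the flat token layout, and the cosine schedule are assumptions:

```python
import math
import torch

@torch.no_grad()
def maskgit_decode(logits_fn, seq_len, mask_id, steps=8):
    """logits_fn: (1, L) int tokens -> (1, L, vocab) logits (placeholder).
    mask_id is a dedicated mask token id (assumed)."""
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for i in range(steps):
        probs = logits_fn(tokens).softmax(-1)
        conf, pred = probs.max(-1)                     # (1, L) confidences
        keep = tokens != mask_id
        conf = conf.masked_fill(keep, float("inf"))    # never re-mask kept tokens
        tokens = torch.where(keep, tokens, pred)       # commit predictions
        n_mask = int(math.cos(math.pi / 2 * (i + 1) / steps) * seq_len)
        if n_mask > 0:                                 # re-mask least confident
            low = conf[0].topk(n_mask, largest=False).indices
            tokens[0, low] = mask_id
    return tokens
```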
- **BigVSAN Vocoder** [arXiv] [code] [demo]: BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network. (ICASSP24)
- **Zero-/Few-shot SELD** [IEEE] [arXiv]: Zero- and Few-shot Sound Event Localization and Detection. (ICASSP24)
- **STARSS23** [arXiv] [dataset]: STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events. (NeurIPS23)
- **Audio Restoration: ViT-AE** [IEEE] [arXiv] [demo]: Extending Audio Masked Autoencoders Toward Audio Restoration. (WASPAA23)
- **Diffiner** [ISCA] [arXiv] [code]: Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement. (INTERSPEECH23)
- **CLIPSep** [OpenReview] [arXiv] [code] [demo]: CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos; see the query-conditioning sketch below. (ICLR23)
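The query trick, as I understand it: a CLIP embedding of the desired source conditions a mask-predicting network, and because CLIP aligns images and text, the model can train on noisy unlabeled videos (querying with frame embeddings) yet be queried with free-form text at test time. A structural sketch; every module name here is a placeholder:

```python
import torch

class TextQueriedSeparator(torch.nn.Module):
    """Mask-based separation conditioned on a CLIP query (sketch)."""
    def __init__(self, unet, clip_encoder, d_clip=512, d_cond=256):
        super().__init__()
        self.unet = unet          # placeholder: (spec, cond) -> mask logits
        self.clip = clip_encoder  # placeholder: frozen CLIP text/image tower
        self.proj = torch.nn.Linear(d_clip, d_cond)

    def forward(self, mixture_spec, query):
        cond = self.proj(self.clip(query))               # source query embedding
        mask = torch.sigmoid(self.unet(mixture_spec, cond))
        return mask * mixture_spec                       # estimated source spectrogram
```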
- **Sound Event Localization and Detection** [IEEE] [arXiv]: Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training; see the decoding sketch below. (ICASSP22)
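ACCDOA-format outputs pack detection and localization into one target: per class (and per track, in Multi-ACCDOA), the network regresses a Cartesian vector whose norm is the event activity and whose direction is the direction of arrival. Decoding is then just a norm threshold; a sketch with an assumed threshold value:

```python
import torch

def decode_accdoa(vec, threshold=0.5):
    """vec: (tracks, classes, 3) activity-coupled Cartesian DOA vectors."""
    norm = vec.norm(dim=-1)                          # activity score per track/class
    active = norm > threshold                        # detected events
    doa = vec / norm.clamp_min(1e-8).unsqueeze(-1)   # unit direction of arrival
    return active, doa
```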
- **DCASE Challenge Task 3** [DCASE Challenge2023]: Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes.
### Contact
Yuki Mitsufuji (yuhki.mitsufuji@sony.com)







