SoundMorpher: Perceptually-Uniform Sound Morphing with Diffusion Model

Model comparison with MorphFader

This page compares SoundMorpher with MorphFader [1]. The audio samples are sourced from this link.

MorphFader and SoundMorpher take fundamentally different approaches to sound morphing. MorphFader morphs between text prompts, so users specify the desired transformation with textual descriptions. In contrast, SoundMorpher is an open-world sound morphing method that morphs directly between two given audio recordings, enabling a broader range of creative possibilities with real-world audio inputs.

Note:

In SoundMorpher, each sample morphs from a source audio (α = 0.0) to a target audio (α = 1.0). The audio samples are generated by morphing with a constant SPDP difference of Δp = 0.25.

In MorphFader, each sample below morphs from a source prompt to a target prompt. The source prompt is on the left, and the target prompt is on the right. The audio samples are generated by morphing between the source and target prompts in steps of Δα = 0.25.
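As a minimal sketch of what a step size of 0.25 implies, the snippet below lists the evenly spaced values between 0.0 and 1.0. The helper name `morph_steps` is hypothetical and belongs to neither system. For MorphFader the values are the morph factors α. For SoundMorpher, reading "constant SPDP difference" as evenly spaced perceptual targets p is an assumption made here for illustration.

```python
from fractions import Fraction


def morph_steps(delta: str = "0.25") -> list[float]:
    """Return evenly spaced values from 0.0 to 1.0 inclusive.

    `delta` is passed as a string so the step is represented exactly
    and no floating-point error accumulates across steps.
    """
    step = Fraction(delta)
    # The step must be positive and must divide the unit interval evenly.
    if step <= 0 or (1 / step).denominator != 1:
        raise ValueError("step must be positive and divide 1 evenly")
    count = int(1 / step)
    return [float(i * step) for i in range(count + 1)]


# Hypothetical usage: a step of 0.25 gives five points, endpoints included.
print(morph_steps("0.25"))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With a step of 0.25, each morph therefore consists of five samples: the two endpoints and three intermediate points.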

Example 1

SoundMorpher

α=0.0

α=1.0

MorphFader

Source Prompt: A clarinet playing

Target Prompt: A trumpet playing

α=0.0

α=1.0

Example 2

SoundMorpher

α=0.0

α=1.0

MorphFader

Source Prompt: A gong ringing

Target Prompt: A dog howling

α=0.0

α=1.0

Example 3

SoundMorpher

α=0.0

α=1.0

MorphFader

Source Prompt: A croaking frog

Target Prompt: A man speaking

α=0.0

α=1.0

Reference

[1] Kamath, Purnima, Chitralekha Gupta, and Suranga Nanayakkara. "MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models." arXiv preprint arXiv:2408.07260 (2024).