AI, Technology

NVIDIA AI Introduces Audio-SDS: A Unified Diffusion-Based Framework for Prompt-Guided Audio Synthesis and Source Separation without Specialized Datasets

By capernaum
Last updated: 2025-05-12 08:09

Audio diffusion models have achieved high-quality speech, music, and Foley sound synthesis, yet they predominantly excel at sample generation rather than parameter optimization. Tasks like physically informed impact sound generation or prompt-driven source separation require models that can adjust explicit, interpretable parameters under structural constraints. Score Distillation Sampling (SDS)—which has powered text-to-3D and image editing by backpropagating through pretrained diffusion priors—has not yet been applied to audio. Adapting SDS to audio diffusion allows optimizing parametric audio representations without assembling large task-specific datasets, bridging modern generative models with parameterized synthesis workflows.
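
To make the SDS adaptation concrete, the core update can be sketched in a few lines of PyTorch. This is a minimal sketch, not the paper's implementation: the renderer `render` and the pretrained text-conditioned denoiser `eps_model` are hypothetical stand-ins, and the timestep range and weighting are common SDS choices rather than Audio-SDS's exact settings.

```python
import torch

def sds_step(render, eps_model, prompt, params, opt, alphas_cumprod):
    """One SDS update on the explicit synthesizer parameters `params`."""
    x = render(*params)                         # waveform from explicit params
    t = torch.randint(20, 980, (1,))            # random diffusion timestep
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(x)
    x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * noise
    with torch.no_grad():                       # no gradient through the prior
        eps_hat = eps_model(x_t, t, prompt)
    grad = (1 - a_bar) * (eps_hat - noise)      # SDS gradient w.r.t. x
    loss = (grad.detach() * x).sum()            # surrogate: d(loss)/dx == grad
    opt.zero_grad()
    loss.backward()                             # chain rule into the params
    opt.step()
```

Because the denoiser is queried under `no_grad`, gradients flow only through the renderer, which is what lets SDS tune interpretable parameters rather than raw samples.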

Classic audio techniques—such as frequency modulation (FM) synthesis, which uses operator-modulated oscillators to craft rich timbres, and physically grounded impact-sound simulators—provide compact, interpretable parameter spaces. Similarly, source separation has evolved from matrix factorization to neural and text-guided methods for isolating components like vocals or instruments. By integrating SDS updates with pretrained audio diffusion models, one can leverage learned generative priors to guide the optimization of FM parameters, impact-sound simulators, or separation masks directly from high-level prompts, uniting signal-processing interpretability with the flexibility of modern diffusion-based generation. 
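
For illustration, a toy differentiable two-operator FM synthesizer might look like the following; the parameter names (`carrier_hz`, `ratio`, `index`) and the single modulator-carrier pair are assumptions for this sketch, not the paper's synthesizer.

```python
import torch

def fm_render(carrier_hz, ratio, index, seconds=1.0, sr=44100):
    """Carrier sinusoid phase-modulated by one operator."""
    t = torch.arange(int(seconds * sr)) / sr
    mod = torch.sin(2 * torch.pi * carrier_hz * ratio * t)          # modulator
    return torch.sin(2 * torch.pi * carrier_hz * t + index * mod)   # carrier

# Plain tensors with requires_grad=True, so an SDS update like the one
# sketched above can tune them directly from a text prompt.
params = [torch.tensor(v, requires_grad=True) for v in (220.0, 2.0, 3.0)]
```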

Researchers from NVIDIA and MIT introduce Audio-SDS, an extension of SDS for text-conditioned audio diffusion models. Audio-SDS leverages a single pretrained model to perform various audio tasks without requiring specialized datasets. Distilling generative priors into parametric audio representations facilitates tasks like impact sound simulation, FM synthesis parameter calibration, and source separation. The framework combines data-driven priors with explicit parameter control, producing perceptually convincing results. Key improvements include a stable decoder-based SDS, multistep denoising, and a multiscale spectrogram approach for better high-frequency detail and realism. 
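
The multiscale spectrogram idea can be sketched as a log-magnitude STFT feature computed at several resolutions; the specific FFT sizes below are illustrative, not the paper's settings.

```python
import torch

def multiscale_spec(x, ffts=(512, 1024, 2048)):
    """Concatenated log-magnitude spectrograms at multiple STFT resolutions."""
    feats = []
    for n_fft in ffts:
        s = torch.stft(x, n_fft, hop_length=n_fft // 4,
                       window=torch.hann_window(n_fft), return_complex=True)
        feats.append(torch.log1p(s.abs()).flatten())
    return torch.cat(feats)
```

Matching features across several window sizes penalizes errors at both coarse and fine time-frequency scales, which is what recovers high-frequency detail that a single-resolution loss tends to smear.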

The study details how SDS is adapted to audio diffusion models. Following DreamFusion, stereo audio is produced through a differentiable rendering function, and the update bypasses encoder gradients, operating on the decoded audio instead. Three modifications strengthen the method: avoiding encoder instability, emphasizing spectrogram features to capture high-frequency detail, and using multi-step denoising for greater stability. Applications include FM synthesizers, impact-sound synthesis, and source separation, showing how SDS adapts to different audio domains without retraining while keeping synthesized audio aligned with text prompts at high fidelity.
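
As a rough sketch of the separation setup, each source could be parameterized as a softmax mask over the mixture spectrogram, with every masked reconstruction optimized by the SDS loss for its own prompt. The prompt names and mask parameterization here are assumptions; the paper's exact formulation may differ.

```python
import torch

mix = torch.randn(2, 44100)                 # stand-in stereo mixture, 1 s
spec = torch.stft(mix, 1024, 256, window=torch.hann_window(1024),
                  return_complex=True)
prompts = ["vocals", "drums"]               # illustrative prompts
logits = torch.zeros(len(prompts), *spec.shape, requires_grad=True)

masks = torch.softmax(logits, dim=0)        # per-source masks that sum to 1
source_specs = [m * spec for m in masks]    # each receives its prompt's SDS loss
```

Constraining the masks to sum to one enforces that the separated sources jointly reconstruct the mixture, so the diffusion prior only has to decide how to apportion energy between prompts.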

The performance of the Audio-SDS framework is demonstrated across three tasks: FM synthesis, impact synthesis, and source separation. The experiments are designed to test the framework’s effectiveness using both subjective (listening tests) and objective metrics such as the CLAP score, distance to ground truth, and Signal-to-Distortion Ratio (SDR). Pretrained models, such as the Stable Audio Open checkpoint, are used for these tasks. The results show significant audio synthesis and separation improvements, with clear alignment to text prompts. 
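
Of these metrics, SDR has a compact closed form; a basic version (ignoring the scale-invariant and permutation-aware variants used in some benchmarks) is:

```python
import torch

def sdr(ref, est, eps=1e-8):
    """Signal-to-Distortion Ratio in dB between reference and estimate."""
    num = (ref ** 2).sum()
    den = ((ref - est) ** 2).sum() + eps
    return 10 * torch.log10(num / den + eps)
```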

In conclusion, the study introduces Audio-SDS, a method that extends SDS to text-conditioned audio diffusion models. Using a single pretrained model, Audio-SDS enables a variety of tasks, such as simulating physically informed impact sounds, adjusting FM synthesis parameters, and performing source separation based on prompts. The approach unifies data-driven priors with user-defined representations, eliminating the need for large, domain-specific datasets. While there are challenges in model coverage, latent encoding artifacts, and optimization sensitivity, Audio-SDS demonstrates the potential of distillation-based methods for multimodal research, particularly in audio-related tasks. 


Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.

