
Fourier Neural Operators Just Got a Turbo Boost: Researchers from UC Riverside Introduce TurboFNO, a Fully Fused FFT-GEMM-iFFT Kernel Achieving Up to 150% Speedup over PyTorch

capernaum
Last updated: 2025-04-20 22:16

Fourier Neural Operators (FNOs) are powerful tools for learning the solution operators of partial differential equations, but existing implementations lack architecture-aware optimizations. Their Fourier layer executes FFT, frequency filtering, GEMM, zero padding, and inverse FFT (iFFT) as separate stages, which leads to multiple kernel launches and excessive global memory traffic. The FFT -> GEMM -> iFFT computational pattern has received little attention with respect to GPU kernel fusion and memory layout optimization. Scientific codes such as Quantum ESPRESSO, Octopus, and CP2K make separate calls to FFT and BLAS routines, an approach with three limitations: only part of the frequency spectrum is used, which requires additional memory copy operations; cuFFT lacks native frequency filtering; and excessive memory transactions occur between processing stages.
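To see why separate stages inflate global memory traffic, the toy accounting below counts the bytes each stage of a 1D Fourier layer reads and writes. It is a rough sketch under stated assumptions, not a measurement of PyTorch: every unfused stage is assumed to round-trip its tensors through global memory, a fused kernel is assumed to touch only the layer input and output, activations are float32 with complex64 spectra, weights and caches are ignored, and `fno_layer_traffic` and the tensor sizes are hypothetical.

```python
# Toy accounting (an assumption, not a measurement): every separately launched
# stage reads its input from global memory and writes its output back, while a
# fully fused kernel reads the layer input once and writes the layer output once.
# Weight reads, caching, and kernel-launch latency are ignored.

FLOAT32_BYTES = 4
COMPLEX64_BYTES = 8


def fno_layer_traffic(batch, channels, n, modes):
    """Return (unfused_bytes, fused_bytes) for one hypothetical 1D Fourier layer.

    Activations: real float32, shape (batch, channels, n); spectra: complex64;
    `modes` low-frequency bins are retained; input and output channels are equal.
    """
    rows = batch * channels
    x = rows * n * FLOAT32_BYTES                       # real input / output
    spectrum = rows * (n // 2 + 1) * COMPLEX64_BYTES   # full rfft half-spectrum
    kept = rows * modes * COMPLEX64_BYTES              # truncated spectrum

    stages = [
        (x, spectrum),     # forward FFT
        (kept, kept),      # truncation copy
        (kept, kept),      # CGEMM over channels (weight reads ignored)
        (kept, spectrum),  # zero padding
        (spectrum, x),     # inverse FFT
    ]
    unfused = sum(read + write for read, write in stages)
    fused = x + x  # read input once, write output once
    return unfused, fused


unfused, fused = fno_layer_traffic(batch=32, channels=64, n=1024, modes=64)
print(f"unfused: {unfused / 2**20:.1f} MiB, fused: {fused / 2**20:.1f} MiB, "
      f"ratio: {unfused / fused:.2f}x")
```

With these made-up sizes the unfused pipeline moves roughly 2.8 times as many bytes as the fused one; real ratios depend on the mode count, data layout, and how a framework implements each stage.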

An FNO layer is a pipeline: a forward FFT on the input feature maps, spectral filtering, and an inverse FFT that reconstructs the output. The pipeline requires frequency-domain truncation and zero padding, which frameworks such as PyTorch currently execute as separate memory-copy kernels, because leading FFT libraries, including cuFFT and VkFFT, do not natively support trimming their inputs or outputs. In addition, a conventional 2D FFT applies both of its 1D FFT stages along the spatial dimensions, whereas FNO applies its spectral weights across the channel dimension. This suggests decoupling the two stages: keep the first 1D FFT along the spatial axes and reinterpret the second stage along the hidden dimension.
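For reference, the unfused 1D pipeline described above fits in a few lines of NumPy. This is a minimal sketch, not TurboFNO's code: it assumes real inputs of shape (batch, channels, n), one complex weight matrix per retained mode, and the `rfft`/`irfft` pair from `numpy.fft`; the function name is hypothetical. Each commented step corresponds to one of the separate stages listed in the text.

```python
import numpy as np


def fno_spectral_layer_1d(x, weights, modes):
    """Unfused 1D Fourier layer: FFT, truncate, CGEMM, zero-pad, iFFT.

    x:       real array, shape (batch, c_in, n)
    weights: complex array, shape (c_in, c_out, modes)
    modes:   number of low-frequency bins kept (at most n // 2 + 1)
    """
    batch, _, n = x.shape
    c_out = weights.shape[1]
    # 1. Forward FFT along the spatial axis (real input -> n // 2 + 1 bins).
    x_ft = np.fft.rfft(x, axis=-1)
    # 2. Frequency truncation: keep only the lowest `modes` bins.
    x_low = x_ft[:, :, :modes]
    # 3. Complex GEMM per retained mode, mixing the channel (hidden) dimension.
    y_low = np.einsum("bim,iom->bom", x_low, weights)
    # 4. Zero padding back to the full half-spectrum.
    y_ft = np.zeros((batch, c_out, n // 2 + 1), dtype=np.complex128)
    y_ft[:, :, :modes] = y_low
    # 5. Inverse FFT back to the spatial domain.
    return np.fft.irfft(y_ft, n=n, axis=-1)
```

With `modes = n // 2 + 1` and identity weights for every mode, the layer returns its input unchanged, which makes a quick sanity check.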

Researchers from the University of California, Riverside, have proposed TurboFNO, the first fully fused FFT-GEMM-iFFT GPU kernel with built-in FFT optimizations. The approach starts from FFT and GEMM kernels written from scratch that match or exceed the performance of the closed-source, state-of-the-art cuFFT and cuBLAS libraries. The authors then introduce an FFT variant that fuses efficiently with GEMM: a single thread block iterates over the hidden dimension, mirroring the k-loop of a GEMM. Finally, they design two shared memory swizzling patterns that achieve 100% memory bank utilization when the FFT output is forwarded to GEMM, and that let the iFFT read GEMM results directly from shared memory.
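The FFT-GEMM fusion idea can be mimicked on the CPU. Instead of transforming every channel first and then running the GEMM, the loop below walks over tiles of the hidden (input-channel) dimension, transforms only that tile, and immediately accumulates its contribution, as a GEMM k-loop would. This is an illustrative sketch, not TurboFNO's kernel: the "FFT with built-in truncation" is stood in for by a direct DFT that computes only the retained bins, `k_tile` plays the role of a thread block's k-step, and the names `truncated_dft` and `fused_fft_gemm` are hypothetical.

```python
import numpy as np


def truncated_dft(x, modes):
    """First `modes` bins of np.fft.rfft(x, axis=-1), computed directly.

    Stand-in for an FFT with built-in output truncation: a DFT restricted to
    the retained frequencies (O(n * modes) work per row). A GPU kernel would
    use a pruned FFT instead.
    """
    n = x.shape[-1]
    k = np.arange(modes)[:, None]
    t = np.arange(n)[None, :]
    basis = np.exp(-2j * np.pi * k * t / n)  # (modes, n)
    return x @ basis.T                        # (..., modes)


def fused_fft_gemm(x, weights, modes, k_tile=4):
    """Spectral channel mixing with the transform folded into the GEMM k-loop.

    x:       real (batch, c_in, n)
    weights: complex (c_in, c_out, modes)
    """
    batch, c_in, _ = x.shape
    c_out = weights.shape[1]
    acc = np.zeros((batch, c_out, modes), dtype=np.complex128)  # GEMM accumulator
    for k0 in range(0, c_in, k_tile):  # k-loop over the hidden dimension
        k1 = min(k0 + k_tile, c_in)
        # Transform only this channel tile. In a fused GPU kernel the tile would
        # stay in shared memory instead of being written to global memory.
        x_tile = truncated_dft(x[:, k0:k1, :], modes)  # (batch, tile, modes)
        acc += np.einsum("bim,iom->bom", x_tile, weights[k0:k1])
    return acc
```

The result matches the unfused path (`np.fft.rfft`, slice, `einsum`) up to floating-point error; the point of the structure is that the full spectrum of all channels never has to exist at once.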

TurboFNO combines optimized FFT and CGEMM (complex GEMM) kernels to enable effective fusion and built-in FFT optimizations. Its fusion strategy proceeds in three levels: FFT-GEMM fusion, GEMM-iFFT fusion, and full FFT-GEMM-iFFT fusion. Each level requires aligning the FFT workflow with GEMM, resolving data layout mismatches, and eliminating shared memory bank conflicts. Key techniques include changing the FFT output layout to match GEMM's input format, applying thread swizzling for conflict-free shared memory access, and running the inverse FFT as an epilogue of the CGEMM, which avoids writing intermediate results to global memory and improves memory locality.
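Running the iFFT as a CGEMM epilogue also removes the explicit zero-padding step: since the padded bins are zero, the inverse transform only has to visit the retained modes. The NumPy sketch below computes the channel-mixing GEMM and then an inverse real DFT over just those modes. It assumes `modes <= n // 2` (so the Nyquist bin is always among the padded bins), uses a direct inverse DFT in place of a real iFFT, and the function name is hypothetical.

```python
import numpy as np


def cgemm_with_irfft_epilogue(x_low, weights, n):
    """Channel-mixing CGEMM followed by an inverse real DFT "epilogue".

    x_low:   complex (batch, c_in, modes), the retained spectrum
    weights: complex (c_in, c_out, modes)
    n:       output length; assumes modes <= n // 2, so the Nyquist bin is
             always among the implicitly zero-padded bins
    """
    modes = x_low.shape[-1]
    assert modes <= n // 2, "sketch assumes the Nyquist bin is padded"
    # CGEMM result for the retained modes; in a fused kernel this tile would
    # still be in shared memory when the epilogue runs.
    y_low = np.einsum("bim,iom->bom", x_low, weights)
    # Epilogue: inverse DFT over the nonzero bins only, so the zero-padded
    # spectrum of length n // 2 + 1 is never materialized.
    k = np.arange(modes)[:, None]
    t = np.arange(n)[None, :]
    basis = np.exp(2j * np.pi * k * t / n)  # (modes, n)
    # Bins 1..modes-1 stand for themselves and their conjugate partners n - k,
    # hence the factor 2; the DC bin has no partner and counts once (real part).
    hermitian_scale = np.full(modes, 2.0)
    hermitian_scale[0] = 1.0
    return np.real((y_low * hermitian_scale) @ basis) / n
```

Its output agrees, up to floating-point error, with `np.fft.irfft` applied to an explicitly zero-padded spectrum.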

TurboFNO delivers consistent speedups in both 1D and 2D FNO evaluations. In 1D FNO tests, the optimized FFT-CGEMM-iFFT workflow achieves up to 100% speedup over PyTorch, with a 50% improvement on average. These gains come from FFT pruning, which reduces computation by 25%-67.5%. The fully fused FFT-CGEMM-iFFT kernel delivers up to 150% speedup over PyTorch and an additional 10%-20% improvement over partial fusion strategies. In 2D FNO, the optimized workflow likewise outperforms PyTorch, with average speedups above 50% and maximum improvements of 100%. The 2D fully fused kernel achieves 50%-105% speedup over PyTorch without performance degradation, despite the extra overhead of aligning the FFT workload layout with the CGEMM dataflow.
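The speedup figures are easier to compare as runtime ratios. Assuming "N% speedup" follows the common convention t_baseline / t_new = 1 + N/100 (the summary does not spell this out), the small helpers below convert the quoted figures; the helper names are hypothetical.

```python
def speedup_factor(speedup_percent):
    """Convert an 'N% speedup' into a throughput ratio t_baseline / t_new.

    Assumes the convention speedup_percent = (t_baseline / t_new - 1) * 100.
    """
    return 1.0 + speedup_percent / 100.0


def runtime_fraction(speedup_percent):
    """Fraction of the baseline (PyTorch) runtime that remains."""
    return 1.0 / speedup_factor(speedup_percent)


for p in (50, 100, 105, 150):
    print(f"{p:>3}% speedup -> {speedup_factor(p):.2f}x faster, "
          f"{runtime_fraction(p):.0%} of the baseline runtime")
```

Under that reading, the 150% peak means the fully fused kernel finishes in 40% of PyTorch's runtime, and a 100% speedup halves it.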

In the paper, the researchers introduce TurboFNO, the first fully fused GPU kernel that integrates FFT, CGEMM, and iFFT to accelerate Fourier Neural Operators. They developed a series of architecture-aware optimizations to overcome the inefficiencies of conventional FNO implementations, such as excessive kernel launches and global memory traffic. These include a custom FFT kernel with built-in frequency filtering and zero padding, a GEMM-compatible FFT variant that mimics the GEMM k-loop, and shared memory swizzling strategies that raise bank utilization from 25% to 100%. TurboFNO achieves up to 150% speedup and an average performance gain of 67% across all tested configurations.
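The jump from 25% to 100% bank utilization can be illustrated with a small bank-conflict model. The sketch assumes 32 shared-memory banks, each 4 bytes wide, as on recent NVIDIA GPUs, and defines utilization as the number of distinct words a warp requests divided by 32 times the number of serialized wavefronts; under this metric, 25% corresponds to a 4-way conflict. A typical conflict source is a warp reading one column of a row-major tile. The XOR swizzle shown is a common textbook pattern, not the specific layouts designed for TurboFNO, and all names are hypothetical.

```python
NUM_BANKS = 32  # assumption: 32 shared-memory banks, each 4 bytes wide
TILE = 32       # a 32 x 32 tile of 4-byte elements (e.g. float32)


def bank_utilization(word_addresses):
    """Fraction of shared-memory bank bandwidth used by one warp-wide access.

    word_addresses: the 4-byte word index touched by each thread of the warp.
    Distinct words that fall in the same bank are serialized into separate
    wavefronts; threads reading the same word are served by one broadcast.
    """
    words_per_bank = {}
    for addr in set(word_addresses):
        words_per_bank.setdefault(addr % NUM_BANKS, set()).add(addr)
    wavefronts = max(len(words) for words in words_per_bank.values())
    distinct_words = sum(len(words) for words in words_per_bank.values())
    return distinct_words / (wavefronts * NUM_BANKS)


def addr_row_major(row, col):
    """Plain row-major placement of element (row, col)."""
    return row * TILE + col


def addr_xor_swizzled(row, col):
    """XOR swizzle: for TILE == 32 this permutes the columns within each row,
    so every row and every column lands in 32 different banks."""
    return row * TILE + (col ^ row)


col = 5  # the warp reads one column: thread t reads element (t, col)
plain = [addr_row_major(t, col) for t in range(32)]
swizzled = [addr_xor_swizzled(t, col) for t in range(32)]
strided = [4 * t for t in range(32)]  # stride of 4 words: only 8 banks are hit

print(bank_utilization(plain))     # 0.03125 (32-way conflict)
print(bank_utilization(swizzled))  # 1.0 (conflict-free)
print(bank_utilization(strided))   # 0.25 (4-way conflict)
```

The stride-4 case is only a way to produce a 25% figure under this model; it is not the access pattern analyzed in the paper. The swizzle costs no extra shared memory, unlike the alternative of padding each row by one word.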



The post Fourier Neural Operators Just Got a Turbo Boost: Researchers from UC Riverside Introduce TurboFNO, a Fully Fused FFT-GEMM-iFFT Kernel Achieving Up to 150% Speedup over PyTorch appeared first on MarkTechPost.
