Thursday, 15 May 2025
Data Science

ByteDance VAPO: The AI upgrade you’ll hear about soon

capernaum
Last updated: 2025-04-11 15:41

ByteDance Seed researchers have introduced Value-based Augmented Proximal Policy Optimization (VAPO), a reinforcement learning training framework designed to sharpen large language models’ reasoning on complex, lengthy tasks. It sets a new state-of-the-art result on the AIME24 benchmark.

Training LLMs for intricate reasoning with value-based reinforcement learning has previously faced significant hurdles. Earlier methods struggled with three problems: bias in the value model, difficulty adapting to response sequences of widely varying lengths, and sparse reward signals, especially in verifier-based tasks that provide only binary feedback.
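To make the sparse-reward problem concrete, here is a minimal sketch of a verifier-style reward. The exact-match rule, the function names `verify` and `token_rewards`, and the choice to place the reward on the final token are all assumptions made for illustration; none of them come from the VAPO work itself.

```python
def verify(predicted_answer: str, reference_answer: str) -> float:
    """Hypothetical verifier: binary reward, 1.0 for an exact match, else 0.0."""
    return 1.0 if predicted_answer.strip() == reference_answer.strip() else 0.0


def token_rewards(num_tokens: int, final_reward: float) -> list[float]:
    """Lay a sequence-level reward out per token: zero everywhere except the last token."""
    if num_tokens <= 0:
        return []
    return [0.0] * (num_tokens - 1) + [final_reward]


# A five-token response whose final answer matches the reference.
rewards = token_rewards(5, verify(" 204", "204"))
```

Every token before the last receives a reward of zero, so the learner gets no direct signal about which intermediate reasoning steps helped. The longer the response, the further that single bit of feedback has to be propagated.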

VAPO addresses these challenges through three core innovations: a detailed value-based training framework, a Length-adaptive Generalized Advantage Estimation (GAE) mechanism adjusting parameters based on response length, and the systematic integration of techniques from prior research.
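The idea behind a length-adaptive GAE can be sketched as follows. This is an illustration under stated assumptions, not ByteDance's implementation: the schedule `lam = 1 - 1 / (alpha * length)`, the default `alpha=0.05`, the default `gamma=1.0`, and both function names are assumptions chosen for this example. Only the GAE recursion itself is the standard one.

```python
def length_adaptive_lambda(length: int, alpha: float = 0.05) -> float:
    """Assumed schedule: lambda approaches 1 as the response gets longer."""
    if length <= 0:
        raise ValueError("length must be positive")
    return max(0.0, 1.0 - 1.0 / (alpha * length))


def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Standard GAE over one response.

    rewards[t] and values[t] are per-token; the value after the
    final token is taken to be 0 (end of episode).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD error: delta_t = r_t + gamma * V_{t+1} - V_t
        delta = rewards[t] + gamma * next_value - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Binary reward on the last token, with made-up value estimates.
rewards = [0.0, 0.0, 1.0]
values = [0.2, 0.5, 0.8]
lam = length_adaptive_lambda(len(rewards))  # short response, so lambda is clamped to 0.0
adv_short = gae_advantages(rewards, values, lam=lam)
adv_full = gae_advantages(rewards, values, lam=1.0)
```

Under this assumed schedule, a long response gets a lambda close to 1, so the final reward reaches early tokens with little bias from the value model. A short response gets a smaller lambda and leans more on the value estimates, which lowers variance.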

These components reinforce one another. Starting from the Qwen2.5-32B model without specific SFT data, VAPO raised the AIME24 score from 5 (the vanilla PPO baseline) to 60, surpassing previous state-of-the-art methods by about 10 points.

VAPO builds upon the Proximal Policy Optimization (PPO) algorithm but incorporates key modifications to enhance mathematical reasoning. Training analysis revealed VAPO exhibits smoother training curves compared to the value-free DAPO method, indicating more stable optimization.

VAPO also demonstrated better length scaling for improved generalization, faster score growth attributable to the granular signals from its value model, and lower entropy in later training stages. While reduced entropy can potentially limit exploration, the method effectively balances this, improving reproducibility and stability with minimal performance impact.

Image: ByteDance Seed

On the AIME24 benchmark, DeepSeek R1 using GRPO achieved 47 points, and DAPO reached 50 points. VAPO, using the Qwen2.5-32B model, matched DAPO’s performance with only 60% of the update steps and set a new state-of-the-art score of 60.4 within 5,000 steps. In contrast, vanilla PPO scored just 5 points because its value model learning collapsed.


Ablation studies confirmed the effectiveness of seven distinct modifications within VAPO. Value pretraining prevents model collapse; decoupled GAE enables full optimization of long responses; adaptive GAE balances short and long response optimization; clip-higher encourages thorough exploration; token-level loss increases the weighting of long responses; a positive-example LM loss added 6 points; and group sampling contributed 5 points to the final score.
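Two of these modifications, clip-higher and token-level loss, are easy to illustrate. The sketch below is a simplified reading rather than the authors' code: the clipping bounds `eps_low=0.2` and `eps_high=0.28`, the function names, and the plain-list batch layout are assumptions made for this example.

```python
import math


def clipped_token_objective(logp_new, logp_old, advantage,
                            eps_low=0.2, eps_high=0.28):
    """PPO clipped surrogate for one token, with an asymmetric clip range.

    A larger upper bound (eps_high > eps_low) lets the probability of
    positively rewarded tokens grow further in a single update.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def token_level_loss(batch):
    """Average over every token in the batch, so long responses weigh more."""
    total = sum(sum(response) for response in batch)
    count = sum(len(response) for response in batch)
    return -total / count


def sample_level_loss(batch):
    """Average within each response first, so every response weighs the same."""
    per_sample = [sum(response) / len(response) for response in batch]
    return -sum(per_sample) / len(per_sample)


# Per-token objective values for a 1-token and a 3-token response (made-up numbers).
batch = [[1.0], [0.0, 0.0, 0.0]]
loss_token = token_level_loss(batch)    # -(1.0) / 4 tokens
loss_sample = sample_level_loss(batch)  # -(1.0 + 0.0) / 2 responses
```

In the sample-level version the three-token response counts as much as the one-token response. In the token-level version it contributes three of the four terms, which is the sense in which long responses receive more weight.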

The researchers highlight that VAPO, trained on the Qwen2.5-32B model, shows that a value-based approach can decisively outperform value-free methods such as GRPO and DAPO. The result establishes a new performance level for complex reasoning tasks and addresses fundamental challenges in training value models for long chain-of-thought scenarios.
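For contrast, value-free methods in the GRPO family estimate advantages without a value model by comparing each sampled response against the others drawn for the same prompt. The following is a minimal sketch of that group-relative normalization; the function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each response's reward against its sampling group.

    Every token in a response shares the resulting scalar, so no
    per-token value estimate is involved.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four responses to one prompt, scored by a binary verifier.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# If every response gets the same reward, the group carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Because the same number is applied to every token of a response, this estimate is coarser than the per-token signal a trained value model can provide, which is the trade-off the article describes.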

