ReTool: A Tool-Augmented Reinforcement Learning Framework for Optimizing LLM Reasoning with Computational Tools

capernaum
Last updated: 2025-04-21 08:34

Reinforcement learning (RL) is a powerful technique for enhancing the reasoning capabilities of LLMs, enabling them to develop and refine long Chain-of-Thought (CoT) reasoning. Models like OpenAI o1 and DeepSeek-R1 perform strongly on text-based reasoning tasks, but they struggle with tasks that require precise numerical calculation or symbolic manipulation, such as geometric reasoning, complex computation, or equation solving. Recent research has explored prompting and supervised fine-tuning to equip LLMs with tool-use capabilities, but these methods are constrained by their reliance on imitating curated data distributions. This often results in poor generalization beyond seen patterns and an inability to determine when and how to invoke external tools.

Recent advancements in LLMs show progress toward human-like metacognition through CoT prompting. Research has shifted from train-time scaling to test-time scaling, which allocates additional compute during inference to generate intermediate reasoning steps. Techniques like stepwise preference optimization, Monte Carlo Tree Search, and RL have improved multi-step mathematical reasoning, as evidenced by models like OpenAI o1 and DeepSeek-R1. Beyond CoT, Program-of-Thought reasoning integrates external computational tools such as Python interpreters to simplify complex reasoning steps. Tool-integrated reasoning, in turn, was initially introduced to help LLMs solve computationally intensive problems through programming strategies.

Researchers from ByteDance Seed have proposed ReTool, an RL framework powered by a code interpreter (CI) and designed for math problem-solving tasks. It enhances long-form reasoning with tool-integrated learning through two key features. First, it dynamically interleaves real-time code execution with natural language reasoning. Second, it implements an automated RL technique that allows policy rollouts with multi-turn real-time code execution, teaching the model when and how to invoke tools based on outcome feedback. Training begins with synthetic cold-start data generation, which produces code-augmented long-form reasoning traces for fine-tuning base models.
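
The interleaving of reasoning and code execution described above can be sketched as a simple rollout loop. This is a minimal illustration, not ReTool's implementation: the `<code>` and `<interpreter>` tags, the `toy_policy` stand-in for the LLM, the `Answer:` format, and the +1/-1 outcome reward are all assumptions made for the example.

```python
import contextlib
import io
import re

# Hypothetical delimiters; the article does not say how code and results are marked.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_code(source: str) -> str:
    """Run a snippet and return its stdout, or the error text on failure.

    Plain exec() is NOT a sandbox; a real system would isolate execution.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def rollout(policy, prompt: str, max_turns: int = 4) -> str:
    """Alternate between text generation and code execution."""
    trace = prompt
    for _ in range(max_turns):
        segment = policy(trace)  # the model continues the trace
        trace += segment
        match = CODE_RE.search(segment)
        if match is None:  # no tool call, so the model has finished
            break
        result = run_code(match.group(1))
        # Feed the interpreter output back so the next turn can use it.
        trace += f"\n<interpreter>{result}</interpreter>\n"
    return trace


def outcome_reward(trace: str, reference: str) -> float:
    """Score only the final answer (illustrative +1 / -1 scheme)."""
    answer = re.search(r"Answer:\s*(\S+)", trace)
    return 1.0 if answer and answer.group(1) == reference else -1.0


def toy_policy(trace: str) -> str:
    """Stand-in for an LLM: request a computation first, then answer."""
    found = re.search(r"<interpreter>(.*?)</interpreter>", trace, re.DOTALL)
    if found is None:
        return "I will compute 2**10 with code.\n<code>print(2**10)</code>"
    return f"Answer: {found.group(1)}"


trace = rollout(toy_policy, "What is 2**10?\n")
print(trace)
print("reward:", outcome_reward(trace, "1024"))
```

Because the reward depends only on the final answer, nothing in the loop tells the policy when to call the interpreter; under this setup, that decision is what RL would have to learn from outcomes.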

ReTool consists of two primary stages: cold-start supervised fine-tuning, followed by RL with interleaved code execution rollouts. The data pipeline begins by collecting high-quality mathematical reasoning data from diverse sources, including open-source datasets like OpenThoughts. A dual-verification approach that combines human expert curation with DeepSeek-R1 evaluation filters out invalid data. From this foundation, code-integrated reasoning data is constructed automatically. Training uses the VeRL framework with PPO as the RL method. The maximum sequence length is set to 16384 tokens, with a mini-batch size of 512 and a KL coefficient of 0.0, and Qwen2.5-32B-Instruct serves as the main backbone.
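
The reported training settings and the dual-verification filter can be written down as a small sketch. The values come from the paragraph above; the field names are illustrative and are not VeRL configuration keys, and the rule that a sample must pass both checks is an assumption about how the two verifications are combined. The checker functions are hypothetical stand-ins for human review and DeepSeek-R1 evaluation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class RLConfig:
    """Settings reported in the article; field names are illustrative only."""
    backbone: str = "Qwen2.5-32B-Instruct"
    algorithm: str = "PPO"
    max_sequence_length: int = 16384  # tokens
    mini_batch_size: int = 512
    kl_coefficient: float = 0.0  # 0.0 means no KL penalty term


def dual_verify(
    samples: Iterable[dict],
    expert_ok: Callable[[dict], bool],
    model_ok: Callable[[dict], bool],
) -> List[dict]:
    """Keep samples accepted by both checks (assumed combination rule)."""
    return [s for s in samples if expert_ok(s) and model_ok(s)]


# Toy data and toy checkers, purely for demonstration.
samples = [
    {"question": "1 + 1", "answer": "2"},
    {"question": "2 + 2", "answer": ""},   # missing answer
    {"question": "", "answer": "7"},       # missing question
]
kept = dual_verify(
    samples,
    expert_ok=lambda s: bool(s["question"]),
    model_ok=lambda s: bool(s["answer"]),
)
print(RLConfig())
print(kept)
```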

ReTool lets the LLM use the code interpreter flexibly during the RL stage, leading to substantial performance improvements. ReTool (Qwen2.5-32B-Instruct) achieves accuracies of 67.0% on AIME2024 and 49.3% on AIME2025 with only 400 training steps. This outperforms the text-based RL baseline (Qwen2.5-32B-Instruct), which attains 40.0% and 36.7% on the respective benchmarks despite using over 1000 training steps. Moreover, on AIME2024, ReTool (Qwen2.5-32B-Instruct) surpasses the competitive baseline s1-32B by 10.3 percentage points. Similarly, on AIME2025, it achieves an 11.4-point gain over OpenAI’s o1-preview. When combined with a more advanced backbone, ReTool (DeepSeek-R1-Distill-Qwen-32B) improves further, scoring 72.5% on AIME2024 and 54.3% on AIME2025.
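
The comparisons above reduce to simple differences between accuracy scores. The short script below recomputes them from the figures quoted in this article, assuming the reported gaps over s1-32B and o1-preview are absolute percentage-point differences; the baseline scores it prints are implied by that assumption rather than quoted from the paper.

```python
def point_gain(score: float, baseline: float) -> float:
    """Absolute difference in accuracy, in percentage points."""
    return round(score - baseline, 1)


# Accuracies (%) quoted in the article.
retool = {"AIME2024": 67.0, "AIME2025": 49.3}
text_rl = {"AIME2024": 40.0, "AIME2025": 36.7}
retool_distill = {"AIME2024": 72.5, "AIME2025": 54.3}

for bench in retool:
    print(bench, "gain over text-based RL:", point_gain(retool[bench], text_rl[bench]))
    print(bench, "gain from stronger backbone:", point_gain(retool_distill[bench], retool[bench]))

# Baseline scores implied by the reported gaps (assumption: gaps are in points).
implied_s1_32b = point_gain(retool["AIME2024"], 10.3)
implied_o1_preview = point_gain(retool["AIME2025"], 11.4)
print("implied s1-32B on AIME2024:", implied_s1_32b)
print("implied o1-preview on AIME2025:", implied_o1_preview)

# ReTool used 400 steps; the baseline used "over 1000", so this is an upper bound.
step_ratio = 400 / 1000
print("training steps relative to baseline: at most", step_ratio)
```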

In conclusion, the researchers introduced ReTool, a novel RL framework that enables LLMs to improve their own mathematical reasoning through effective use of a code interpreter. Experiments on AIME2024 and AIME2025 show that ReTool achieves higher accuracy than conventional text-based RL approaches and converges in significantly fewer training steps. Through careful data curation and a specialized tool-using pipeline, ReTool enables models to develop complex computational intervention strategies, paving the way for more efficient and powerful tool-augmented reasoning in LLMs. The results suggest that tool-integrated RL is a promising direction for advancing mathematical reasoning in LLMs on tasks that require precise computation and symbolic manipulation.

The post ReTool: A Tool-Augmented Reinforcement Learning Framework for Optimizing LLM Reasoning with Computational Tools appeared first on MarkTechPost.