
OpenAI’s o3 claimed 25%, independent test says “try 10”

capernaum
Last updated: 2025-04-21 10:14

OpenAI’s o3 AI model scored lower on the FrontierMath benchmark than the company initially implied, according to independent tests by Epoch AI, the research institute behind FrontierMath. When OpenAI unveiled o3 in December 2024, it claimed the model could answer 25% of FrontierMath questions, significantly outperforming other models.

Epoch AI’s tests found that o3 scored around 10% on FrontierMath. The discrepancy may be due to differences in testing setups or the version of o3 used. OpenAI’s chief research officer, Mark Chen, had stated that o3 achieved over 25% in “aggressive test-time compute settings.” Epoch noted that OpenAI’s published benchmark results showed a lower-bound score that matches the 10% score Epoch observed.

The public o3 model is “tuned for chat/product use” and has smaller compute tiers than the version tested by OpenAI in December, according to the ARC Prize Foundation, which tested a pre-release version of o3. OpenAI’s Wenda Zhou explained that the production o3 model is “more optimized for real-world use cases” and speed, which may result in benchmark disparities.

Image: Epoch AI

OpenAI’s o3-mini-high and o4-mini models outperform o3 on FrontierMath. The company plans to release a more powerful o3 variant, o3-pro, in the coming weeks. This incident highlights the need for caution when interpreting AI benchmarks, particularly when they are used to promote commercial products.

The AI industry has seen several benchmarking controversies recently. In January 2025, Epoch was criticized for not disclosing funding from OpenAI until after the company announced o3. xAI was accused of publishing misleading benchmark charts for its Grok 3 model, and Meta admitted to touting benchmark scores for a different version of a model than the one available to developers.

