
OpenAI Introduces the Evals API: Streamlined Model Evaluation for Developers

By capernaum · Last updated: 2025-04-09 08:36

In a significant move to empower developers and teams working with large language models (LLMs), OpenAI has introduced the Evals API, a new toolset that brings programmatic evaluation capabilities to the forefront. While evaluations were previously accessible via the OpenAI dashboard, the new API allows developers to define tests, automate evaluation runs, and iterate on prompts directly from their workflows.

Why the Evals API Matters

Evaluating LLM performance has often been a manual, time-consuming process, especially for teams scaling applications across diverse domains. With the Evals API, OpenAI provides a systematic approach to:

  • Assess model performance on custom test cases
  • Measure improvements across prompt iterations
  • Automate quality assurance in development pipelines

Now, every developer can treat evaluation as a first-class citizen in the development cycle—similar to how unit tests are treated in traditional software engineering.
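
For intuition, here is a minimal, hypothetical sketch of that idea: an evaluation wired into a test suite the same way a unit test would be. The run_eval helper, the eval name, and the threshold are illustrative stand-ins, not part of the Evals API.

def run_eval(eval_name: str) -> dict:
    # Hypothetical stand-in for your evaluation harness; in practice this
    # would trigger an eval run and return an aggregate report.
    return {"accuracy": 0.91}

def test_summarization_prompt_quality():
    # Runs under pytest like any other unit test.
    report = run_eval("summarization_quality")
    # Fail the build if quality regresses below the agreed threshold.
    assert report["accuracy"] >= 0.85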

Core Features of the Evals API

  1. Custom Eval Definitions: Developers can write their own evaluation logic by extending base classes.
  2. Test Data Integration: Seamlessly integrate evaluation datasets to test specific scenarios.
  3. Parameter Configuration: Configure model, temperature, max tokens, and other generation parameters.
  4. Automated Runs: Trigger evaluations via code, and retrieve results programmatically.

The Evals API supports a YAML-based configuration structure, allowing for both flexibility and reusability.
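
The configuration itself is not shown here, so the sketch below is only an assumption of what such a file might contain, generated from Python with PyYAML; the file name and keys are illustrative, not a documented schema.

import yaml  # requires the PyYAML package

# Illustrative only: these keys are assumptions, not a documented schema.
eval_config = {
    "eval_name": "my_eval",
    "completion_fn": "gpt-4",
    "generation": {"temperature": 0.0, "max_tokens": 256},
    "data": {"path": "eval_examples.jsonl"},
}

with open("eval_config.yaml", "w") as f:
    yaml.safe_dump(eval_config, f, sort_keys=False)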

Getting Started with the Evals API

To use the Evals API, you first install the OpenAI Python package:

pip install openai

Then you can run an evaluation using a built-in eval, such as factuality_qna:

oai evals registry:evaluation:factuality_qna \
  --completion_fns gpt-4 \
  --record_path eval_results.jsonl

Or define a custom eval in Python:

import openai.evals

class MyRegressionEval(openai.evals.Eval):
    def run(self):
        # Iterate over the eval's dataset: generate a completion for each
        # input, score it against the ideal answer, and yield a result record.
        for example in self.get_examples():
            result = self.completion_fn(example['input'])
            score = self.compute_score(result, example['ideal'])
            yield self.make_result(result=result, score=score)

This example shows how you can define custom evaluation logic, in this case measuring regression accuracy.
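
The compute_score helper used above is not defined in the snippet. A minimal sketch, assuming a simple exact-match metric (an illustrative choice, not part of the example), would be a method added to MyRegressionEval:

def compute_score(self, result: str, ideal: str) -> float:
    # Illustrative exact-match scoring; real evals often use fuzzy matching,
    # model grading, or a task-specific metric instead.
    return 1.0 if result.strip() == ideal.strip() else 0.0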

Use Case: Regression Evaluation

OpenAI’s cookbook example walks through building a regression evaluator using the API. Here’s a simplified version:

import openai.evals
from sklearn.metrics import mean_squared_error

class RegressionEval(openai.evals.Eval):
    def run(self):
        predictions, labels = [], []
        for example in self.get_examples():
            # Ask the model for a numeric prediction and parse it as a float.
            response = self.completion_fn(example['input'])
            predictions.append(float(response.strip()))
            labels.append(example['ideal'])
        # Mean squared error: lower is better, so negate it for a "higher is better" score.
        mse = mean_squared_error(labels, predictions)
        yield self.make_result(result={"mse": mse}, score=-mse)

This allows developers to benchmark numerical predictions from models and track changes over time.
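
For reference, the examples such an eval iterates over would pair an input prompt with a numeric ideal value. The file name, fields, and prompts below are illustrative assumptions about the dataset format, not a specification:

import json

# Illustrative dataset: each line pairs a prompt with the numeric answer
# the model is expected to return.
examples = [
    {"input": "Estimate 17 * 23. Answer with a number only.", "ideal": 391.0},
    {"input": "Convert 100 degrees Fahrenheit to Celsius. Answer with a number only.", "ideal": 37.78},
]

with open("regression_examples.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")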

Seamless Workflow Integration

Whether you’re building a chatbot, summarization engine, or classification system, evaluations can now be triggered as part of your CI/CD pipeline. This ensures that every prompt or model update maintains or improves performance before going live.

openai.evals.run(
  eval_name="my_eval",
  completion_fn="gpt-4",
  eval_config={"path": "eval_config.yaml"}
)
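
A typical follow-up in CI is to gate the release on the returned results. The sketch below assumes the run produces a report with an aggregate score; the report shape and the threshold are assumptions, not a documented return value.

import sys

THRESHOLD = 0.85  # assumed quality bar agreed by the team

def gate(report: dict) -> None:
    # Fail the CI job if the aggregate score falls below the threshold.
    score = report.get("score", 0.0)
    if score < THRESHOLD:
        print(f"Eval score {score:.3f} is below {THRESHOLD}; failing the build.")
        sys.exit(1)
    print(f"Eval score {score:.3f} meets the threshold {THRESHOLD}.")

if __name__ == "__main__":
    gate({"score": 0.91})  # replace with the report from the eval run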

Conclusion

The launch of the Evals API marks a shift toward robust, automated evaluation standards in LLM development. By offering the ability to configure, run, and analyze evaluations programmatically, OpenAI is enabling teams to build with confidence and continuously improve the quality of their AI applications.

To explore further, check out the official OpenAI Evals documentation and the cookbook examples.

The post OpenAI Introduces the Evals API: Streamlined Model Evaluation for Developers appeared first on MarkTechPost.
