Apparently, LLMs are really bad at playing chess

capernaum
Last updated: 2024-11-18 10:36

Contents
  • Testing LLMs against chess engines
  • GPT-3.5-turbo-instruct was the unexpected winner

  • Not all LLMs are equal: GPT-3.5-turbo-instruct stands out as the most capable chess-playing model tested.
  • Fine-tuning is crucial: Instruction tuning and targeted dataset exposure dramatically enhance performance in specific domains.
  • Chess as a benchmark: The experiment highlights chess as a valuable benchmark for evaluating LLM capabilities and refining AI systems.

Can AI language models play chess? That question sparked a recent investigation into how well large language models (LLMs) handle chess tasks, revealing unexpected insights about their strengths, weaknesses, and training methodologies.

While some models floundered against even the simplest chess engines, others, notably OpenAI’s GPT-3.5-turbo-instruct, showed surprising potential, with intriguing implications for AI development.

Testing LLMs against chess engines

Researchers tested various LLMs by asking them to play chess as grandmasters, providing game states in algebraic notation. Initial excitement centered on whether LLMs, trained on vast text corpora, could leverage embedded chess knowledge to predict moves effectively.
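
To make the setup concrete, here is a minimal sketch of how a game state in algebraic notation might be turned into a text prompt for a model to continue. The function name `build_prompt` and the header lines are assumptions made for illustration; the article only says that models were asked to play as grandmasters and were given the game in algebraic notation.

```python
def build_prompt(san_moves):
    """Format a list of SAN moves as PGN-style movetext for an LLM to continue.

    The header lines are hypothetical, not the researchers' actual prompt.
    """
    header = (
        '[White "Grandmaster A"]\n'
        '[Black "Grandmaster B"]\n'
        "\n"
    )
    parts = []
    for i, move in enumerate(san_moves):
        if i % 2 == 0:
            # White's move opens a new numbered turn: "1.", "2.", ...
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    # If White is to move next, end the prompt with the upcoming move number.
    if len(san_moves) % 2 == 0:
        parts.append(f"{len(san_moves) // 2 + 1}.")
    return header + " ".join(parts)


prompt = build_prompt(["e4", "e5", "Nf3"])
print(prompt)
```

The model's completion would then be read as its next move. Any real harness would also need to check that the returned move is legal, which this sketch leaves out.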

However, results showed that not all LLMs are created equal.

The study began with smaller models like llama-3.2-3b, which has 3 billion parameters. After 50 games against Stockfish’s lowest difficulty setting, the model lost every match, failing to protect its pieces or maintain a favorable board position.
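
A 50-game match like this can be summarized with a simple tally. The sketch below is an assumption about how such a harness could look, not the study's code: `play_game` is a hypothetical callable that plays one game (for example, model versus engine) and reports the result from the model's point of view.

```python
from collections import Counter


def run_match(play_game, n_games=50):
    """Play n_games and tally outcomes from the LLM's point of view.

    play_game is a hypothetical callable returning "win", "draw", or "loss".
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    tally = Counter(play_game() for _ in range(n_games))
    # Standard chess scoring: 1 point per win, half a point per draw.
    points = tally["win"] + 0.5 * tally["draw"]
    return {
        "wins": tally["win"],
        "draws": tally["draw"],
        "losses": tally["loss"],
        "score": points / n_games,
    }


def always_loses():
    # Stand-in for a model that loses every game, as llama-3.2-3b did.
    return "loss"


summary = run_match(always_loses, n_games=50)
print(summary)  # {'wins': 0, 'draws': 0, 'losses': 50, 'score': 0.0}
```

Reporting a score rate rather than raw wins makes results comparable across models even if the number of games differs.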

Testing escalated to larger models, such as llama-3.1-70b and its instruction-tuned variant, but they also struggled, showing only slight improvements. Other models, including Qwen-2.5-72b and command-r-v01, continued the trend, revealing a general inability to grasp even basic chess strategies.

[Image: Smaller LLMs, like llama-3.2-3b, struggled with basic chess strategies, losing consistently to even beginner-level engines]

GPT-3.5-turbo-instruct was the unexpected winner

The turning point came with GPT-3.5-turbo-instruct, which excelled against Stockfish even when the engine’s difficulty level was increased. Unlike chat-oriented counterparts such as gpt-3.5-turbo and gpt-4o, the instruct-tuned model consistently produced winning moves.

Why do some models excel while others fail?

Key findings from the research offered valuable insights:

  • Instruction tuning matters: Models like GPT-3.5-turbo-instruct benefited from human feedback fine-tuning, which improved their ability to process structured tasks like chess.
  • Dataset exposure: There’s speculation that instruct models may have been exposed to a richer dataset of chess games, granting them superior strategic reasoning.
  • Tokenization challenges: Small nuances, like incorrect spaces in prompts, disrupted performance, highlighting the sensitivity of LLMs to input formatting.
  • Competing data influences: Training LLMs on diverse datasets may dilute their ability to excel at specialized tasks, such as chess, unless counterbalanced with targeted fine-tuning.
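
The tokenization point can be illustrated with two prompts that differ only by a trailing space. Many GPT-style tokenizers attach the leading space to the following token (for example " Nf3"), so a prompt that already ends in a space may push the model toward a less familiar token sequence. Which variant a particular model handles better is an assumption that has to be checked empirically; the helper names below are hypothetical.

```python
def strip_trailing_space(prompt):
    """Remove trailing whitespace so the model can emit the next token,
    leading space included, on its own."""
    return prompt.rstrip()


def differs_only_in_trailing_space(a, b):
    """True when two prompts are identical except for trailing whitespace."""
    return a != b and a.rstrip() == b.rstrip()


with_space = "1. e4 e5 2. "
without_space = "1. e4 e5 2."

print(differs_only_in_trailing_space(with_space, without_space))  # True
print(strip_trailing_space(with_space) == without_space)          # True
```

Normalizing prompts this way before sending them keeps an accidental space from becoming a hidden variable when comparing models.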

As AI continues to advance, these lessons will inform strategies for improving model performance across disciplines. Whether the task is chess, natural language understanding, or something equally intricate, understanding how to train and tune AI is essential for unlocking its full potential.


Featured image credit: Piotr Makowski/Unsplash
