AI, Data Science

AI masters language but flunks LEGO 101

capernaum
Last updated: 2025-03-27 12:27

Contents
  • Why test AI with LEGOs?
  • The surprising scorecard: AI vs humans
  • Can AI even show us the next step?

We constantly hear about the incredible feats of AI models like GPT-4o and Gemini: writing code, crafting poetry, acing exams. You might think these powerful Multimodal Large Language Models (MLLMs), which process both text and images, are well on their way to mastering everything. But what happens when you ask them to do something seemingly simple, like follow LEGO instructions?

According to a new study from researchers at the Shanghai AI Laboratory and Tongji University, the answer is: they largely fail. These AI wizards, it turns out, are surprisingly clumsy at understanding and reasoning about objects in space across multiple steps, a skill crucial for interacting with the real world.

Why test AI with LEGOs?

The researchers designed a clever benchmark called LEGO-Puzzles precisely because building with LEGO mirrors how humans develop “spatial intelligence.” Following those little diagrams requires understanding 3D shapes, how they fit together, their orientation, and the correct sequence of actions. If an AI can’t handle that, how can we expect it to guide a robot arm assembling a product or steer a self-driving car through a complex construction zone?

The LEGO-Puzzles benchmark isn’t child’s play. It includes over 1,100 visual questions spanning 11 different tasks. These range from basic checks (“Is this piece taller than that one?”, “Are these two blocks touching?”) to complex sequences (“Put these assembly steps in the right order,” “Which image shows the wrong step?”).

The surprising scorecard: AI vs humans

So, how did today’s top AI models fare on these LEGO challenges? The results were striking, and frankly, a bit embarrassing for the AI.

  • Massive gap: Even the best models, such as OpenAI’s GPT-4o and Google’s Gemini-2.0-Flash, answered only about 50-58% of the questions correctly.
  • Human triumph: Human participants, in contrast, breezed through the puzzles with over 90% accuracy.
  • Open-source struggles: Many open-source MLLMs performed only slightly better than random guessing. Some completely failed specific tasks, like ordering assembly steps, sometimes just outputting the same wrong letter for almost every question.
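To make the comparison with random guessing concrete, here is a minimal Python sketch of how multiple-choice answers might be scored. It assumes each question has lettered options with a single correct letter; the function names, the four-option chance level, and the toy answer lists are illustrative assumptions, not the study's evaluation code or data.

```python
from collections import Counter, defaultdict


def accuracy(preds, answers):
    """Fraction of questions where the predicted letter matches the answer key."""
    if len(preds) != len(answers):
        raise ValueError("preds and answers must have the same length")
    if not answers:
        return 0.0
    return sum(p == a for p, a in zip(preds, answers)) / len(answers)


def chance_baseline(num_options):
    """Expected accuracy of uniform random guessing over num_options choices (assumed)."""
    return 1.0 / num_options


def dominant_answer_share(preds):
    """Share of predictions taken by the single most common letter.

    Values near 1.0 flag a model that gives the same letter for almost every question.
    """
    if not preds:
        return 0.0
    _, count = Counter(preds).most_common(1)[0]
    return count / len(preds)


def per_task_accuracy(records):
    """Map each task name to its accuracy, given (task, pred, answer) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for task, pred, answer in records:
        totals[task] += 1
        hits[task] += int(pred == answer)
    return {task: hits[task] / totals[task] for task in totals}


if __name__ == "__main__":
    # Hypothetical toy answer key, not taken from LEGO-Puzzles.
    answers = ["A", "C", "B", "D", "A", "B", "C", "D"]
    stuck = ["A"] * len(answers)  # a model that always answers "A"
    print(accuracy(stuck, answers))      # 0.25
    print(chance_baseline(4))            # 0.25
    print(dominant_answer_share(stuck))  # 1.0
```

A model stuck on one letter scores exactly as often as that letter happens to be the correct key, which is why its accuracy lands near chance; the `dominant_answer_share` check is one simple way to flag that degenerate behavior separately from the accuracy number.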

The AI particularly struggled with tasks involving:

  • Height perception: Often confusing a 2D image projection with 3D reality (think optical illusions).
  • Rotation: Understanding how objects look after being turned.
  • Multi-step reasoning: The more steps involved in a sequence, the worse the AI performed, highlighting a failure to track changes over time.


Can AI even show us the next step?

Perhaps even more telling was the image generation test. Researchers asked MLLMs to generate an image showing the result of a specific LEGO assembly step.

The outcome? A near-total failure. Most models either ignored the instructions, simply copied the input image, or generated something completely unrelated. Only Gemini-2.0-Flash and GPT-4o showed a “limited ability”: Gemini was better at accurately editing the existing image, while GPT-4o seemed to regenerate the scene conceptually, often losing visual consistency. The open-source models were hopelessly lost.

This research exposes a critical weakness in current AI development. While models excel at pattern matching in language and static images, they lack a robust grasp of multi-step spatial reasoning, the dynamic understanding of how things work in physical space and time.

The study found that even prompting techniques like “Chain-of-Thought” (asking the AI to “think step-by-step”), which often help with text problems, provided minimal benefit on these spatial tasks and sometimes even hurt performance, especially on the more complex ones.

It seems that truly understanding our 3D world and how actions unfold within it requires more than just processing massive amounts of text and images. MLLMs need better ways to represent space, track changes sequentially, and perhaps develop a form of “visual memory.”


Featured image credit: Kerem Gülen/Imagen 3
