Artificial intelligence has long struggled with a fundamental problem: how can an AI explore its environment intelligently without explicit instructions? Traditional reinforcement learning (RL) agents rely on trial and error, often wasting vast amounts of time interacting randomly with their surroundings. While AI models can be trained to solve specific tasks efficiently, getting them to explore new environments meaningfully, without predefined goals, has been a major challenge.
A recent study by Cansu Sancaktar, Christian Gumbsch, Andrii Zadaianchuk, Pavel Kolev, and Georg Martius from the University of Tübingen, the Max Planck Institute for Intelligent Systems, TU Dresden, and the University of Amsterdam introduces a promising solution: SENSEI (SEmaNtically Sensible ExploratIon).
Unlike previous methods that treat exploration as a brute-force problem, SENSEI takes a different approach—one that mimics how humans, particularly children, explore the world. Instead of just trying new things randomly, humans seek out meaningful interactions—opening drawers instead of just banging on desks, pushing buttons instead of flailing their arms. SENSEI brings this human-like curiosity to artificial agents by using foundation models like Vision Language Models (VLMs) to guide exploration with semantic understanding.
The problem with AI exploration
For AI agents to learn new tasks, they must first explore their environment. Traditional exploration methods rely on intrinsic motivation: the agent receives an internal reward for actions that generate novelty or maximize information gain. However, this approach often leads to low-level, unstructured behaviors, such as a robot moving randomly or repeatedly touching objects without recognizing their relevance.
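As a concrete illustration (a generic example, not SENSEI's method), a classic count-based novelty bonus rewards the agent simply for reaching rarely visited states; `state_key` here is assumed to be some hashable discretization of the observation:

```python
from collections import defaultdict
import math

class CountBasedNovelty:
    """Generic count-based intrinsic reward, shown for contrast with SENSEI:
    rarely visited states earn a larger exploration bonus."""

    def __init__(self):
        self.visit_counts = defaultdict(int)

    def bonus(self, state_key) -> float:
        self.visit_counts[state_key] += 1
        # Bonus decays as 1/sqrt(N(s)): novelty fades with familiarity.
        return 1.0 / math.sqrt(self.visit_counts[state_key])
```

Notice that this bonus is purely statistical: a drawer handle and a blank patch of wall are equally "novel" the first time they are seen.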
Imagine a robot in a room full of objects:
- A standard RL agent might try every action randomly—hitting the desk, spinning in circles, or grabbing the air—without prioritizing useful interactions.
- A human-like learner, in contrast, would naturally focus on objects like drawers and buttons, recognizing them as sources of meaningful interactions.
This is where SENSEI steps in.
How SENSEI teaches AI to explore like a human
SENSEI introduces a new type of intrinsic motivation—one based on semantic understanding. Instead of exploring blindly, AI is guided by what a foundation model (a large-scale AI trained on vast amounts of data) deems “interesting.”
The process works in three main steps:
1. Teaching AI what’s “interesting”
Before the agent starts exploring, SENSEI uses a Vision Language Model (VLM) like GPT-4V to evaluate images of the environment. The VLM is asked questions like:
“Which of these two images is more interesting?”
From these comparisons, SENSEI distills a semantic reward function, teaching the AI what types of interactions matter.
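One standard way to distill such pairwise judgments into a reward model is a Bradley-Terry preference loss, sketched below. The function and tensor names are illustrative, and the paper's exact training objective may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def preference_loss(reward_model: nn.Module,
                    img_a: torch.Tensor,
                    img_b: torch.Tensor,
                    vlm_prefers_a: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style preference loss: push the scalar score of the
    image the VLM judged more interesting above the other's.
    `vlm_prefers_a` is 1.0 where image A was preferred, else 0.0."""
    r_a = reward_model(img_a).squeeze(-1)  # (batch,) scalar scores
    r_b = reward_model(img_b).squeeze(-1)
    # sigmoid(r_a - r_b) models P(A preferred); train with cross-entropy.
    return F.binary_cross_entropy_with_logits(r_a - r_b, vlm_prefers_a)
```

Once trained, the reward model scores any image on its own, so the expensive VLM only has to be queried while building the preference dataset.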
2. Learning a world model
Once the AI understands what is considered “interesting,” it builds an internal world model—a predictive system that helps it anticipate how the environment will respond to its actions.
- Instead of needing to query the foundation model constantly, the AI learns to predict interestingness by itself.
- This reduces reliance on external models and allows for faster, self-guided exploration.
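A minimal sketch of such a predictor, assuming a Dreamer-style latent state; the class name, latent size, and MLP shape are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class InterestingnessHead(nn.Module):
    """Illustrative predictor head that maps the world model's latent state
    to the distilled semantic reward, removing the need to query the VLM
    during exploration."""

    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ELU(),
            nn.Linear(128, 1),
        )

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        # Scalar "interestingness" prediction per latent state.
        return self.net(latent)
```

Training such a head amounts to regressing its output toward the distilled reward labels, for example with a mean-squared-error loss.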
3. Exploring smarter, not harder
With this understanding, the AI is now guided by two competing motivations:
- Find interesting things (maximize the semantic reward).
- Push the boundaries of what it knows (increase uncertainty by exploring new areas).
The result? AI agents unlock behaviors that are both novel and meaningful—just like human curiosity-driven exploration.
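As a rough sketch of how the two signals might be blended: the uncertainty bonus below uses ensemble disagreement, a common proxy for epistemic uncertainty (as in Plan2Explore-style exploration), and `alpha` is an illustrative weight rather than a value from the paper:

```python
import torch

def exploration_reward(semantic_reward: torch.Tensor,
                       ensemble_predictions: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Blend the two drives into one intrinsic reward.

    semantic_reward:      (batch,) predicted interestingness scores.
    ensemble_predictions: (ensemble, batch, dim) next-state predictions
                          from an ensemble of world-model heads; their
                          disagreement proxies epistemic uncertainty.
    """
    uncertainty = ensemble_predictions.var(dim=0).mean(dim=-1)  # (batch,)
    return semantic_reward + alpha * uncertainty
```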
What SENSEI can do: AI that unlocks real-world interactions
The researchers tested SENSEI in two different environments:
1. Video game simulations (MiniHack)
- In a game where an AI had to find a key to open a locked door, SENSEI prioritized interactions with the key and door—just like a human would.
- Traditional AI exploration methods often got stuck doing random movements without understanding the significance of objects in the scene.
- SENSEI solved the game’s puzzles faster and with fewer wasted actions than other AI methods.
2. Robotic simulations (Robodesk)
- In a robot arm environment, SENSEI focused on manipulating objects like drawers and buttons, learning meaningful tasks naturally.
- Competing AI systems either flailed randomly or got stuck repeating actions without real purpose.
In both cases, SENSEI didn’t just cover more ground—it focused on interactions that mattered, leading to richer and more efficient learning.
Why this matters: The future of AI exploration
SENSEI’s ability to prioritize meaningful interactions could revolutionize robotics, allowing robots to learn useful behaviors on their own without explicit programming. Imagine:
- A home assistant that figures out how to use new appliances without step-by-step instructions.
- Industrial robots that adapt to new tasks in factories without human intervention.
By focusing on semantically relevant exploration, AI can reduce wasted computation, leading to faster and more energy-efficient learning.
One of the biggest challenges in AI is creating systems that learn flexibly like humans. SENSEI represents a step toward AI agents that can explore new environments intelligently—without relying on handcrafted training data or predefined objectives.
Limitations
While SENSEI is a major leap forward, it still has some limitations:
- It relies on high-quality visual input. If the agent’s camera view is blocked or distorted, the VLM-derived reward signal degrades along with it.
- Its reward signal is distilled from images alone. Future versions could incorporate sound, text, and other sensory inputs for richer exploration.
- It assumes general human-like curiosity is always beneficial. In specialized applications, what a general-purpose VLM finds “interesting” may not match what is actually useful for the task.