The process of building ontologies (the structured, interconnected maps of knowledge that power everything from search engines to AI reasoning) is notoriously complex. It requires a blend of domain expertise, logical rigor, and an almost philosophical understanding of how concepts relate. For years, ontology engineers have wrestled with the challenge of turning abstract knowledge into structured data. Now, Large Language Models (LLMs) are stepping into the ring, with proponents claiming they can do much of the heavy lifting.
A team of researchers, including Anna Sofia Lippolis, Mohammad Javad Saeedizade, and Eva Blomqvist, has been testing that claim. Their latest study evaluates whether AI models—specifically OpenAI’s o1-preview, GPT-4, and Meta’s LLaMA 3.1—can generate usable ontologies from natural language descriptions. The results? A mix of promise, pitfalls, and philosophical questions about the role of AI in knowledge representation.
The AI-powered ontology engineer
Traditionally, ontology creation has relied on methodologies like Methontology and NeOn, which guide engineers through an intricate process of defining concepts, relationships, and constraints. But even for seasoned experts, this process is time-consuming and prone to error. The researchers proposed a different approach: let LLMs generate the foundation and let human experts refine the results.
Their study introduced two prompting techniques, Memoryless CQbyCQ and Ontogenia, designed to help AI generate ontologies step by step. Both methods fed the model structured prompts built from competency questions (queries an ontology should be able to answer) and user stories (real-world scenarios the ontology should support).
Rather than forcing AI to process entire ontologies at once—a task that often leads to confused, bloated outputs—these approaches broke the process down into modular steps, guiding LLMs through logical constructions one piece at a time.
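To make that concrete, here is a minimal Python sketch of how a Memoryless-CQbyCQ-style loop might look. The function names, prompt wording, and the ask_llm callable are illustrative assumptions rather than the authors' actual implementation; the point is simply that each step sees only the latest draft plus a single competency question, never the whole conversation history.

```python
# A minimal sketch, assuming hypothetical names: ask_llm stands in for any
# chat-completion call (OpenAI, LLaMA, etc.), and the prompt wording is
# invented for illustration, not taken from the paper.

def draft_ontology(user_story: str, competency_questions: list[str], ask_llm) -> str:
    """Grow an ontology draft one competency question at a time."""
    ontology = ""  # Turtle draft accumulated across steps
    for cq in competency_questions:
        # "Memoryless": the model never sees earlier prompts or replies,
        # only the current draft and the one question to support next.
        prompt = (
            f"User story:\n{user_story}\n\n"
            f"Current ontology (Turtle):\n{ontology}\n\n"
            f"Extend the ontology so it can answer: {cq}\n"
            "Return the full updated ontology in Turtle."
        )
        ontology = ask_llm(prompt)
    return ontology
```

Passing ask_llm in as a parameter keeps the sketch independent of any particular model API; because nothing but the draft carries over, the context stays small even as the ontology grows.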
How well did AI perform?
The researchers tested their methods on a benchmark dataset, comparing AI-generated ontologies against those created by novice human engineers. The standout performer? OpenAI’s o1-preview, using the Ontogenia prompting method. It produced ontologies that were not just usable but, in many cases, better than those crafted by human beginners.
However, the study also highlighted critical limitations. AI-generated ontologies tended to include redundant or inconsistent elements, such as defining both employedSince and employmentStartDate for the same concept. Worse, models frequently struggled with the finer points of logic, generating overlapping domains and incorrect inverse relationships. Meta’s LLaMA model, in particular, performed poorly, producing tangled hierarchies and structural flaws that made its ontologies difficult to use.
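For illustration only (this is a toy heuristic, not the study's evaluation tooling), a few lines of Python with rdflib can surface that kind of duplication by grouping datatype properties that share the same domain and range:

```python
# Flag properties with identical (domain, range) signatures as possible
# duplicates for human review. The tiny ontology below reproduces the
# employedSince / employmentStartDate overlap described in the study.
from collections import defaultdict
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL

TTL = """
@prefix :     <http://example.org/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

:employedSince       a owl:DatatypeProperty ;
    rdfs:domain :Person ; rdfs:range xsd:date .
:employmentStartDate a owl:DatatypeProperty ;
    rdfs:domain :Person ; rdfs:range xsd:date .
"""

g = Graph().parse(data=TTL, format="turtle")

by_signature = defaultdict(list)
for prop in set(g.subjects(RDF.type, OWL.DatatypeProperty)):
    signature = (g.value(prop, RDFS.domain), g.value(prop, RDFS.range))
    by_signature[signature].append(prop)

for (domain, rng), props in by_signature.items():
    if len(props) > 1:
        names = ", ".join(p.split("/")[-1] for p in props)
        print(f"Possible redundancy on {domain} -> {rng}: {names}")
```

Running this prints the employedSince/employmentStartDate pair as a merge candidate; deciding whether the overlap is genuine still takes a human.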
One of the biggest takeaways? Context matters. When LLMs were forced to work with too much information at once, their performance suffered. Trimming down their input—hence the “Memoryless” strategy—helped reduce irrelevant outputs and improved coherence.
So, should we let AI take over ontology engineering? Not quite. While LLMs can speed up the drafting process, human intervention remains essential. AI is great at producing structured knowledge quickly, but its outputs still need logical refinement and semantic verification—tasks that require human oversight.
The researchers suggest that AI’s true role in ontology engineering is that of a co-pilot rather than a replacement. Instead of building knowledge graphs from scratch, LLMs can assist by generating structured drafts, which human experts can refine. Future improvements might focus on integrating ontology validation mechanisms directly into AI workflows, reducing the need for manual corrections.
In other words, AI can help map the territory, but humans still need to verify the landmarks.
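As a rough sketch of that validation-in-the-loop idea, the snippet below re-prompts a model with validator feedback before a human ever sees the draft. Here validate only checks that the output parses as Turtle, and ask_llm again stands in for any chat-completion call; a production pipeline would swap in reasoner-based consistency checks (for example, HermiT or Pellet).

```python
from rdflib import Graph

def validate(ontology_ttl: str) -> list[str]:
    """Minimal check: does the draft parse as Turtle at all?
    A real pipeline would add reasoner-based consistency checks."""
    try:
        Graph().parse(data=ontology_ttl, format="turtle")
        return []
    except Exception as exc:
        return [f"Syntax error: {exc}"]

def repair_loop(draft: str, ask_llm, max_rounds: int = 3) -> str:
    """Feed validator findings back to the model until checks pass
    or the round budget runs out."""
    for _ in range(max_rounds):
        problems = validate(draft)
        if not problems:
            break
        draft = ask_llm(
            "Fix these issues and return the corrected Turtle:\n"
            + "\n".join(problems)
            + "\n\nOntology:\n" + draft
        )
    return draft
```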
The study also raises a deeper question about AI’s ability to understand knowledge. Can a model truly “grasp” ontological relationships, or is it merely playing an advanced game of statistical pattern-matching? As AI continues to evolve, the line between human logic and machine-generated reasoning is blurring—but for now, human engineers still have the final say.
LLMs can generate surprisingly high-quality ontology drafts, but their outputs remain inconsistent and require human refinement. If the goal is efficiency, AI-assisted ontology engineering is already proving useful. If the goal is perfection, there’s still a long way to go.
Featured image credit: Kerem Gülen/Imagen 3