Research conducted by the BBC has found that four major artificial intelligence (AI) chatbots (OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, and Perplexity AI) are inaccurately summarizing news stories. In the study, each chatbot was asked to summarize 100 news stories sourced from the BBC website.
The BBC reported that the answers produced by the AI chatbots contained “significant inaccuracies” and distortions. Deborah Turness, CEO of BBC News and Current Affairs, wrote in a blog post that while AI offers “endless opportunities,” developers are “playing with fire,” and warned that AI-distorted headlines could cause real-world harm.
In the study, journalists with expertise in the subjects of the respective articles rated the chatbots’ responses and found that 51% of them had significant issues. Among the AI-generated answers that referenced BBC content, 19% contained factual errors, including incorrect statements, numbers, and dates. Additionally, 13% of quotes attributed to BBC articles were either altered or misrepresented.
Specific inaccuracies identified in the study included Gemini stating that the UK’s National Health Service (NHS) does not recommend vaping as an aid to quitting smoking, when in fact it does. ChatGPT and Copilot inaccurately claimed that Rishi Sunak and Nicola Sturgeon were still in office, even though both had stepped down. Perplexity misquoted BBC News, suggesting Iran had acted with “restraint” in response to Israel’s actions.
The study highlighted that Microsoft’s Copilot and Google’s Gemini exhibited more significant issues than OpenAI’s ChatGPT and Perplexity AI. To run the tests, the BBC temporarily lifted the restrictions that normally block these AI systems from accessing its content during the testing phase in December 2024.
BBC’s Programme Director for Generative AI, Pete Archer, emphasized that publishers should control how their content is used and that AI companies need to disclose how their assistants process news, including error rates. OpenAI countered that it collaborates with partners to improve the accuracy of in-line citations and to respect publisher preferences.
Following the study, Turness urged tech companies to address the identified issues, much as Apple did in response to earlier BBC complaints about its AI-powered news summaries. She called for a collaborative effort among the tech industry, news organizations, and the government to remedy inaccuracies that can erode public trust in information.
The study further noted Perplexity AI’s tendency to alter statements from its sources and revealed that Copilot relied on outdated articles for its news summaries. More broadly, the BBC aims to open a wider conversation about the regulatory environment for AI to ensure accurate news dissemination.
In response to the findings, Turness posed a critical question: how can AI technologies be designed to foster accuracy in news consumption? She warned that such distortion, akin to disinformation, threatens public trust in all information media.
Featured image credit: Kerem Gülen/Ideogram