A new research paper from OpenAI asks why large language models like GPT-5 and chatbots like ChatGPT still hallucinate, and whether anything can be done to reduce those hallucinations.
In a blog post summarizing the paper, OpenAI defines hallucinations as “plausible but false statements generated by language models,” and it acknowledges that despite improvements, hallucinations “remain a fundamental challenge for all large language models” — one that will never be completely eliminated.
To illustrate the point, researchers say that when they asked “a widely used chatbot” about the title of Adam Tauman Kalai’s Ph.D. dissertation, they got three different answers, all of them wrong. (Kalai is one of the paper’s authors.) They then asked about his birthday and received three different dates. Once again, all of them were wrong.
How can a chatbot be so wrong — and sound so confident in its wrongness? The researchers suggest that hallucinations arise, in part, because of a pretraining process that focuses on getting models to correctly predict the next word, without true or false labels attached to the training statements: “The model sees only positive examples of fluent language and must approximate the overall distribution.”
“Spelling and parentheses follow consistent patterns, so errors there disappear with scale,” they write. “But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations.”
The paper’s proposed solution, however, focuses less on the initial pretraining process and more on how large language models are evaluated. It argues that the current evaluation models don’t cause hallucinations themselves, but they “set the wrong incentives.”
The researchers compare these evaluations to the kind of multiple-choice tests where random guessing makes sense, because “you might get lucky and be right,” while leaving the answer blank “guarantees a zero.”
“In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say ‘I don’t know,’” they say.
The proposed solution, then, is similar to tests (like the SAT) that include “negative [scoring] for wrong answers or partial credit for leaving questions blank to discourage blind guessing.” Similarly, OpenAI says model evaluations need to “penalize confident errors more than you penalize uncertainty, and give partial credit for appropriate expressions of uncertainty.”
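The incentive argument can be made concrete with a little arithmetic. The sketch below is only an illustration, not the paper's scoring scheme: the function names, the one-point penalty, and the probabilities are all assumed values chosen for the example. It compares the expected score of guessing against abstaining, first under accuracy-only grading and then when wrong answers cost points.

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering when the guess is right with probability p_correct.

    A correct answer earns 1 point; a wrong answer costs wrong_penalty points.
    (Hypothetical scoring rule, assumed for illustration.)
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def best_action(p_correct: float, wrong_penalty: float = 0.0,
                abstain_credit: float = 0.0) -> str:
    """Return 'guess' if guessing has a strictly higher expected score, else 'abstain'."""
    guess = expected_score(p_correct, wrong_penalty)
    return "guess" if guess > abstain_credit else "abstain"


if __name__ == "__main__":
    for p in (0.1, 0.25, 0.5, 0.9):
        accuracy_only = best_action(p)                 # wrong answers cost nothing
        penalized = best_action(p, wrong_penalty=1.0)  # wrong answers cost one point
        print(f"p={p:.2f}  accuracy-only: {accuracy_only:8s}  penalized: {penalized}")
```

Under accuracy-only grading, guessing beats abstaining for any nonzero chance of being right, which is the incentive the researchers describe. With the assumed one-point penalty, guessing only pays off when the model is more likely right than wrong.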
And the researchers argue that it’s not enough to introduce “a few new uncertainty-aware tests on the side.” Instead, “the widely used, accuracy-based evals need to be updated so that their scoring discourages guessing.”
“If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess,” the researchers say.