A new report reveals that Grok, the free-to-use AI chatbot integrated into Elon Musk’s X, showed “significant flaws and limitations” when verifying information about the 12-day conflict between Israel and Iran (13-24 June), which now seems to have subsided.
Researchers at the Atlantic Council’s Digital Forensic Research Lab (DFRLab) analysed 130,000 posts published by the chatbot on X in relation to the 12-day conflict, and found they provided inaccurate and inconsistent information.
They estimate that around a third of those posts responded to requests to verify misinformation circulating about the conflict, including unverified social media claims and footage purportedly emerging from the exchange of fire.
“Grok demonstrated that it struggles with verifying already-confirmed facts, analysing fake visuals and avoiding unsubstantiated claims,” the report says.
“The study emphasises the crucial importance of AI chatbots providing accurate information to ensure they are responsible intermediaries of information.”
While Grok is not intended as a fact-checking tool, X users are increasingly turning to it to verify information circulating on the platform, including to understand crisis events.
X has no third-party fact-checking programme, relying instead on Community Notes, a feature that lets users add context to posts they believe are inaccurate.
Misinformation surged on the platform after Israel first struck Iran on 13 June, triggering an intense exchange of fire.
Grok fails to distinguish authentic footage from fake
DFRLab researchers identified two AI-generated videos that Grok falsely labelled as “real footage” emerging from the conflict.
The first of these videos shows what appears to be destruction at Tel Aviv’s Ben Gurion Airport after an Iranian strike, but is clearly AI-generated. Asked whether it was real, Grok oscillated between conflicting responses within minutes.
It first claimed that the fabricated video “likely shows real damage at Tel Aviv’s Ben Gurion Airport from a Houthi missile strike on May 4, 2025,” but later said the video “likely shows Mehrabad International Airport in Tehran, Iran, damaged during Israeli airstrikes on June 13, 2025.”
Euroverify, Euronews’ fact-checking unit, identified three further viral AI-generated videos which Grok falsely said were authentic when asked by X users. The chatbot linked them to an attack on Iran’s Arak nuclear plant and strikes on Israel’s port of Haifa and the Weizmann Institute in Rehovot.
Euroverify has previously detected several out-of-context videos circulating on social platforms being misleadingly linked to the Israel-Iran conflict.
Grok seems to have contributed to this phenomenon. The chatbot described a viral video as showing Israelis fleeing the conflict at the Taba border crossing with Egypt, when it in fact shows festival-goers in France.
It also alleged that a video of an explosion in Malaysia showed an “Iranian missile hitting Tel Aviv” on 19 June.
Chatbots amplifying falsehoods
The findings of the report come after the 12-day conflict triggered an avalanche of false claims and speculation online.
One claim, that China sent military cargo planes to Iran’s aid, was widely boosted by the AI chatbots Grok and Perplexity, the latter built by a three-year-old AI startup that has drawn widespread controversy for allegedly using media companies’ content without their consent.
NewsGuard, a disinformation watchdog, said both chatbots had contributed to the spread of the claim.
The misinformation stemmed from misinterpreted data on the flight-tracking site Flightradar24, which was picked up by some media outlets and then amplified by the AI chatbots.
Experts at DFRLab point out that chatbots rely heavily on media outlets to verify information, but often cannot keep up with the pace of fast-moving news during global crises.
They also warn against the distorting impact these chatbots can have as users become increasingly reliant on them to inform themselves.
“As these advanced language models become an intermediary through which wars and conflicts are interpreted, their responses, biases, and limitations can influence the public narrative.”