Do AI Engines Agree? The Cross-Engine Divergence Report (60 Answers, 2026)
How much do AI engines agree on which tool to recommend?
Cross-engine agreement is the rate at which ChatGPT, Perplexity, and Google AI Overviews independently name the same top tool for one buying-intent query, and in our data it happened only 36 percent of the time.
The headline number is 36 percent, and it is lower than the vendor pitch implies. We submitted 20 go-to-market buying-intent queries to three live AI answer engines and logged the tool each engine named first. On the 14 queries that ask for a category winner or an intent answer (the head-to-head queries are scored separately), full three-engine agreement on the top tool happened on 5 of 14. Two engines agreed and one dissented on 6. And on 3 queries, all three engines named a different tool. A blended visibility score collapses those three states into one comfortable figure. Measured directly, the disagreement is the finding.
Source: Lucreya original measurement, data.json crossEngineAnalysis. 14 category and intent queries. Snapshot 2026-06-07. Verified 2026-06-07.
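The three percentages used throughout this report follow directly from the counts above. A minimal sketch of the arithmetic, assuming only the 5 / 6 / 3 split of the 14 scored queries stated in the text (the variable names are illustrative and are not fields from data.json):

```python
# Counts as reported in the text: 14 scored queries split into three states.
counts = {"consensus": 5, "two_of_three": 6, "full_divergence": 3}
total = sum(counts.values())

# Share of scored queries in each state, rounded to whole percent.
shares = {state: round(100 * n / total) for state, n in counts.items()}

for state, pct in shares.items():
    print(f"{state}: {counts[state]} of {total} ({pct} percent)")
```

Rounded to whole numbers, the shares are 36, 43, and 21 percent, which is where the headline figures come from.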
The cross-engine agreement table: where the engines split
The Engine-Consensus flag is a per-query label that records whether all three engines agreed on the top tool, two of three agreed, or all three diverged, and the table below applies it to representative queries from each state.
This is the extractable centerpiece. Each row is one buying-intent query, the named top tool from each engine, and the resulting flag. The consensus winners (Apollo.io, Clay, Surfer SEO) are exactly as logged in the dataset, and so are the three full-divergence queries, all of which land in the GEO and AI-visibility category.
| Buying-intent query | ChatGPT | Perplexity | Google AIO | Flag |
|---|---|---|---|---|
| Best sales prospecting tool 2026 | Apollo.io | Apollo.io | Apollo.io | Consensus |
| Best lead enrichment tool for agencies | Clay | Clay | Clay | Consensus |
| Best SEO content optimization tool 2026 | Surfer SEO | Surfer SEO | Surfer SEO | Consensus |
| Best AI copywriting tool 2026 | Jasper | Jasper | Copy.ai | 2-of-3 |
| Best cold email software 2026 | Instantly.ai | Salesforge | Instantly.ai | 2-of-3 |
| Best AI SDR tool 2026 | 11x.ai | AiSDR | 11x.ai | 2-of-3 |
| Best GEO tool to track AI search visibility | Profound | Semrush AI Visibility Toolkit | Goodie AI | Full diverge |
| How to track brand mentions in ChatGPT | Profound / AthenaHQ | Otterly / Semrush | Keyword.com | Full diverge |
| Best AI search optimization platform 2026 | Search Party / Goodie / Profound | Surfer / Clearscope / Rankability | Surfer / Clearscope / Rankability | Full diverge |
Top tool as named by each engine. The last three rows are the three full-divergence queries (data.json IDs S2, S4, S7), all in the GEO and AI-visibility category. Source: Lucreya June 2026 study. Snapshot 2026-06-07. Verified 2026-06-07.
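The flag logic in the table can be sketched in a few lines. This is a hedged illustration, not the study's scoring code: the function name `consensus_flag` is hypothetical, and it assumes each engine's answer has already been reduced to the single tool it named first. Cells that list several tools would need that reduction before they could be compared this way.

```python
from collections import Counter

def consensus_flag(chatgpt: str, perplexity: str, google_aio: str) -> str:
    """Label one query by how many engines named the same top tool."""
    # Light normalization so stray whitespace or casing does not split a match.
    picks = [name.strip().lower() for name in (chatgpt, perplexity, google_aio)]
    # Size of the largest group of engines that named the same tool: 3, 2, or 1.
    largest_group = Counter(picks).most_common(1)[0][1]
    return {3: "Consensus", 2: "2-of-3", 1: "Full diverge"}[largest_group]

# Three single-tool rows copied from the table above.
rows = {
    "Best sales prospecting tool 2026": ("Apollo.io", "Apollo.io", "Apollo.io"),
    "Best cold email software 2026": ("Instantly.ai", "Salesforge", "Instantly.ai"),
    "Best GEO tool to track AI search visibility": (
        "Profound", "Semrush AI Visibility Toolkit", "Goodie AI",
    ),
}
flags = {query: consensus_flag(*picks) for query, picks in rows.items()}
for query, flag in flags.items():
    print(f"{flag:12} {query}")
```

Comparing exact names keeps the rule inspectable: a query is only Consensus when all three strings match, so any fuzzier matching would have to be a visible, deliberate choice.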
What do the three full-divergence queries look like up close?
A full-divergence query is one where all three engines name a different top tool, and in our run every such query asked how to measure or track AI search visibility.
On "best GEO tool to track AI search visibility," the three engines gave three different answers. ChatGPT led with Profound. Perplexity led with the Semrush AI Visibility Toolkit (and ZipTie). Google AI Overviews led with Goodie AI. One question, three tools, zero overlap on the top pick. The second full-divergence query, "how to track brand mentions in ChatGPT," split the same way: ChatGPT pointed to Profound and AthenaHQ, Perplexity to Otterly and Semrush, and Google AI Overviews to Keyword.com. The third, "best AI search optimization platform 2026," saw ChatGPT name a different slate (Search Party, Goodie AI, Profound) while Perplexity and Google AI Overviews both leaned on the Surfer / Clearscope / Rankability cluster.
The detail worth sitting with is that all three full-divergence queries are in the same category, GEO and AI-visibility. The mature go-to-market categories had a settled answer. The category that exists to measure AI visibility is itself the one the AI engines cannot agree on. If you sell or buy in that space, the strategic read is direct: the consensus has not formed, so visibility there is still up for grabs in a way it is not for prospecting or enrichment.
Did the citation sources behind these answers also diverge?
Citation-source comparison is the analysis of which URLs each engine cites to justify its answer, and in this run it was capturable at full fidelity for only one of the three engines.
Here is the honest limit, stated plainly. This report compares the tool each engine recommends across all three engines, which is a clean, like-for-like measurement. It does not claim a like-for-like cross-engine citation-URL comparison, because the engines do not expose citations equally. Perplexity publishes a native numbered source list, so we logged 162 of its citations in full. ChatGPT renders citations as in-product chips that are not enumerated the same way, and Google AI Overviews collapses its citation list and intermixes it with organic results. So the citation autopsy in our study rests on Perplexity's high-fidelity capture, and the cross-engine divergence figure rests on the tool-recommendation lists from all three. We do not blur those two things together.
What we can say from our own Perplexity-fidelity data is that the citation surface is overwhelmingly third-party: roughly four in five of the 162 logged Perplexity citations pointed to independent roundups, review blogs, comparison aggregators, and forums rather than the recommended vendor's own site. Reddit alone appeared in the citation set of 15 of 20 answers (75 percent; verified 2026-06-07). So even within one engine, the sources that decide answers are not the vendor's own pages. For the per-engine mechanics behind that, see our breakdown of how to get cited in ChatGPT.
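For readers who want to run the same kind of check on their own citation logs, here is one way a third-party share could be computed. It is a sketch under stated assumptions: the helper `third_party_share` is hypothetical, a citation counts as vendor-owned only when its host is the vendor's domain or a subdomain of it, and the URLs are placeholders rather than rows from the dataset.

```python
from urllib.parse import urlparse

def third_party_share(citation_urls, vendor_domains):
    """Fraction of cited URLs whose host is not one of the vendor's own domains."""
    if not citation_urls:
        raise ValueError("need at least one citation URL")
    vendor = {d.lower() for d in vendor_domains}

    def host(url):
        h = urlparse(url).netloc.lower()
        return h[4:] if h.startswith("www.") else h

    def is_vendor(h):
        # Exact domain match or any subdomain of a vendor domain.
        return any(h == d or h.endswith("." + d) for d in vendor)

    hosts = [host(u) for u in citation_urls]
    return sum(1 for h in hosts if not is_vendor(h)) / len(hosts)

# Placeholder citations for one answer (illustrative only, not study data).
example_citations = [
    "https://www.reddit.com/r/example/comments/abc",
    "https://reviews.example.org/best-tools",
    "https://compare.example.net/roundup",
    "https://blog.example.org/top-picks",
    "https://www.vendor.example/pricing",
]
share = third_party_share(example_citations, ["vendor.example"])
print(f"third-party share: {share:.0%}")
```

Real logs need more care than this, for example deciding whether a vendor's profile on a review site counts as third-party, so treat the function as a starting point rather than the study's classification rule.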
Why does cross-engine divergence break the single blended visibility score?
A single blended visibility score is one averaged number meant to summarize a brand's presence across engines, and divergence breaks it because averaging three disagreeing engines reports a winner that none of them named alone.
The math is the problem. When a vendor sells you one "visibility score," that number has to compress three engines into one. On the 21 percent of our queries where the engines fully diverged, there was no single winner to compress: ChatGPT, Perplexity, and Google AI Overviews each named a different tool. A blended score papers over that by picking, implicitly, which engine to trust, and then hiding the choice inside a formula you cannot inspect. The honest alternative is to report the disagreement itself. That is what the CONSENSUS Protocol formalizes with the Answer-Engine Consensus Index (AECI) and a four-state Engine-Consensus flag: Consensus, single-engine dissent, Absent, or Due-diligence, per query, per engine, with a date attached.
The revenue consequence is concrete. If your tracking dashboard tells you that you are "62 percent visible" while two of three engines never name you, you will under-invest in exactly the engines where you are absent. A team that knows it is a single-engine dissent on Perplexity and absent on Google AI Overviews can act on it. A team holding one blended percentage cannot. And the answer surface is almost always present: Google AI Overviews triggered on 19 of 20 queries (95 percent; verified 2026-06-07) in our run, so the AI answer is a pipeline input before the buyer ever reaches your site. For the broader landscape of tools feeding these answers, our colleagues at Nesyona's AI SEO tools index track the category in depth.
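A toy calculation makes the masking effect visible. Everything in this sketch is hypothetical: the brand, its per-engine presence, and especially the weights, which stand in for whatever formula a blended score hides. It shows one way a dashboard could report "62 percent visible" for a brand that two of three engines never name.

```python
# Per-engine presence for one hypothetical brand: 1 if the engine named it, else 0.
presence = {"ChatGPT": 1, "Perplexity": 0, "Google AI Overviews": 0}

# Hypothetical weights: any blended score has to pick some weighting like this.
weights = {"ChatGPT": 0.62, "Perplexity": 0.19, "Google AI Overviews": 0.19}

# The single number a blended dashboard would show.
blended = sum(weights[engine] * presence[engine] for engine in presence)

# The same brand under equal weighting, and the per-engine view that is actionable.
equal_blend = sum(presence.values()) / len(presence)
absent_on = [engine for engine, named in presence.items() if not named]

print(f"blended score: {blended:.0%}")
print(f"equal-weight score: {equal_blend:.0%}")
print(f"absent on: {', '.join(absent_on)}")
```

The same presence data yields 62 percent or 33 percent depending on weights the buyer never sees, while the per-engine list states directly where the work is.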
- Queries: 20 GTM buying-intent prompts (marketing, SEO/GEO, sales verticals), all published by ID.
- Engines: ChatGPT (GPT-5.x, web search on), Perplexity (default web), Google AI Overviews (default Google Search).
- Answers: 60 AI answers captured (20 queries x 3 engines). 162 Perplexity citations logged.
- Agreement metric: Cross-engine agreement on the named top tool, scored on the 14 category and intent queries (head-to-head queries excluded as use-case splits).
- Citation fidelity: Full only for Perplexity (native numbered source list). ChatGPT chips and AIO collapsed citations are not comparable, so cross-engine source overlap is not claimed.
- Snapshot: 2026-06-07. AI answers are volatile; re-run the protocol to reproduce. Dataset DOI: 10.5281/zenodo.20632768 (Zenodo, CC BY 4.0).
Where does your category sit on this map?
The free AI Visibility Audit runs the first step of the CONSENSUS Protocol for you: it returns your Engine-Consensus flag across ChatGPT, Perplexity, and Google AI Overviews, with a snapshot date, in minutes. No blended black-box number. Just whether the engines agree on you, and where they do not.
What should a revenue team do about engine divergence?
The response to divergence is to measure each engine separately and prioritize the engine where you are absent, rather than chasing one averaged score.
Stop optimizing for the average and start optimizing for the gap. Divergence means the leverage is uneven across engines, so the move is to find the engine where you are absent and earn the third-party sources that engine cites. Because roughly four in five citations went to third-party pages, the work is mostly off your own domain: review roundups, comparison pages, and the forum threads (Reddit above all) that the engines lean on. A brand that only optimizes its own site is working the 20 percent of the citation surface that was already most likely to point to it. For the placement side of that work, we productize the monitoring-and-earned-placement loop through our GEO monitoring and placement retainer, and the full measurement standard behind it is the CONSENSUS Protocol.
If you are starting from zero, the sequence is: define GEO and why it matters in our GEO definition, see the full landscape in the State of AI Search 2026, then adopt the measurement discipline in the CONSENSUS Protocol. The divergence in this report is the evidence; those guides are the response.
Frequently asked questions
Do AI engines agree on which tool to recommend?
How much do ChatGPT, Perplexity, and Google AI Overviews disagree?
Which AI engine is the contrarian one?
Why does a single blended AI visibility score hide engine disagreement?
Were the citation sources behind these answers also compared across engines?
Bottom line
AI engines disagree far more than the single-blended-score vendors admit, and our own data shows it. Across 20 buying-intent queries on three engines, full three-engine agreement on the recommended tool happened on only 36 percent of the 14 category and intent queries (5 of 14), two-of-three agreement on 43 percent (6 of 14), and full divergence on 21 percent (3 of 14), with all three full-divergence queries in the GEO and AI-visibility category. The mature categories had a consensus winner (Apollo.io, Clay, Surfer SEO); the category that measures AI visibility did not. We measured agreement on the recommended tool across all three engines and captured citation sources at full fidelity only for Perplexity, and we keep those two facts separate. The external 11 percent cross-engine domain-overlap figure is not ours; our directly measured equivalent is the 36 percent agreement, 21 percent divergence reading. Run the measurement on your own category with our free AI Visibility Audit, see the method in the CONSENSUS Protocol, or read the full evidence in the Who AI Recommends: GTM 2026 study.
- Lucreya original measurement. Who AI Recommends: GTM Tool and Source Citations Across ChatGPT, Perplexity, and Google AI Overviews (2026). 20 queries, 3 engines, 60 answers, 162 Perplexity citations. Snapshot date 2026-06-07. lucreya.com/research/who-ai-recommends-gtm-2026/. Dataset DOI: 10.5281/zenodo.20632768 (Zenodo, CC BY 4.0). Verified 2026-06-07.
- Lucreya. The CONSENSUS Protocol: How to Measure AI Visibility Honestly (AECI Method, 2026). lucreya.com/articles/the-consensus-protocol. The measurement standard defining the AECI and the four-state Engine-Consensus flag used in this report.
- External cross-engine citation-overlap analysis. The ~11 percent figure for cited domains appearing across two or more AI engines is an external industry finding; confirm against the originating source before reuse. It is not a Lucreya measurement. Lucreya's own directly measured equivalent (36 percent full three-engine agreement, 21 percent full divergence across 14 queries) is reported from the study above. Verified 2026-06-07.
- Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., Deshpande, A. GEO: Generative Engine Optimization. 2023. arxiv.org/abs/2311.09735. The Princeton GEO framework on raising generative-engine visibility through on-page tactics.
- Google. Generative AI in Search: Let Google do the searching for you. blog.google/products/search/generative-ai-search/. Reference for Google AI Overviews behavior and trigger volatility.
- Perplexity AI. perplexity.ai. The one engine in the study exposing a native numbered citation list, the basis for the source-fidelity caveat.
- Creative Commons. CC BY 4.0 License. creativecommons.org/licenses/by/4.0/. License for the Lucreya measurement dataset.