Report · GEO · Updated June 2026 · 12 min read · By Vincent Wesley Couey

Last reviewed: June 10, 2026 · Next review due: September 2026 · Snapshot data: June 7, 2026

Do AI Engines Agree? The Cross-Engine Divergence Report (60 Answers, 2026)

Q: Do AI engines agree on which tool to recommend?

Mostly no. In Lucreya's June 2026 study of 20 GTM buying-intent queries across ChatGPT, Perplexity, and Google AI Overviews, the three engines named the same top tool on only 5 of 14 category and intent queries (36 percent). They agreed two-of-three on 6 queries (43 percent) and named three completely different top tools on 3 queries (21 percent). All three full-divergence queries were in the GEO and AI-visibility category. Cross-engine agreement on the single recommended tool is the exception, not the rule.

Q: How much do ChatGPT, Perplexity, and Google AI Overviews disagree?

On Lucreya's 14 category and intent queries, the three engines fully agreed on the top recommended tool 36 percent of the time and fully diverged (three different top tools) 21 percent of the time, leaving 43 percent in two-of-three partial agreement. For example, on "best GEO tool to track AI search visibility," ChatGPT named Profound, Perplexity named the Semrush AI Visibility Toolkit, and Google AI Overviews named Goodie AI: three different answers to one question.

Q: Which AI engine is the contrarian one?

In Lucreya's run, Perplexity was the most frequent dissenter, naming a different top tool from both ChatGPT and Google AI Overviews on at least four queries (a Veo vs HeyGen video query, a Salesforge vs Instantly cold-email query, an AiSDR vs 11x SDR query, and the GEO-tracking query). This is a snapshot reading from one run, not a permanent property of the engine.

Q: Why does a single blended AI visibility score hide engine disagreement?

A single blended score averages disagreeing engines into one number, so it reports a single winner even when the engines named three different tools. In Lucreya's data, a blended score would have implied consensus on the 21 percent of queries where the engines fully diverged. The honest alternative, defined in the CONSENSUS Protocol, is to report the per-engine Answer-Engine Consensus Index (AECI) and a four-state Engine-Consensus flag instead of one averaged figure.

How much do AI engines agree on which tool to recommend?

Cross-engine agreement is the rate at which ChatGPT, Perplexity, and Google AI Overviews independently name the same top tool for one buying-intent query, and in our data it happened only 36 percent of the time.

The headline number is 36 percent, and it is lower than the vendor pitch implies. We submitted 20 go-to-market buying-intent queries to three live AI answer engines and logged the tool each engine named first. On the 14 queries that ask for a category winner or an intent answer (the six head-to-head queries are excluded from this metric), all three engines agreed on the top tool on 5. Two engines agreed and one dissented on 6. On the remaining 3, all three engines named a different tool. A blended visibility score collapses those three states into one comfortable figure. Measured directly, the disagreement is the finding.
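The three percentages are simple shares of the 14 scored queries. A short Python check of that arithmetic, using the study's counts; the function name is illustrative. Rounding each share independently does not always sum to 100, though here it does (36 + 43 + 21).

```python
# Flag counts reported in this study for the 14 category and intent queries.
flag_counts = {"Consensus": 5, "2-of-3": 6, "Full diverge": 3}

def agreement_shares(counts: dict[str, int]) -> dict[str, int]:
    """Each flag's share of the scored queries, rounded to a whole percent."""
    total = sum(counts.values())  # 14 scored queries
    return {flag: round(100 * n / total) for flag, n in counts.items()}

shares = agreement_shares(flag_counts)
print(shares)  # {'Consensus': 36, '2-of-3': 43, 'Full diverge': 21}
```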

36%: full three-engine agreement on the top tool (5 of 14 category and intent queries)
43%: two-of-three partial agreement (6 of 14 queries)
21%: full divergence, three different top tools (3 of 14, all in GEO)

Source: Lucreya original measurement, data.json crossEngineAnalysis. 14 category and intent queries. Snapshot 2026-06-07. Verified 2026-06-07.

Q: Why 14 queries and not all 20?

A: Six of the 20 queries are explicit head-to-head comparisons ("Jasper vs Copy.ai", "Clay vs Apollo"), where the "winner" is a use-case split rather than a single named tool, so they cannot be scored as agree-or-disagree on one pick. The 14 category and intent queries ask for a single best tool, which is what the agreement rate measures. All 20 queries and 60 answers are in the published dataset.
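A minimal sketch of that exclusion step, assuming each query record carries a kind label. The `Query` type, the IDs, and the label values are hypothetical and are not the published dataset's schema; the query texts are taken from this report.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    qid: str    # hypothetical ID scheme
    text: str
    kind: str   # assumed labels: "category", "intent", or "head-to-head"

def scorable(queries: list[Query]) -> list[Query]:
    """Keep only queries that ask for a single best tool."""
    return [q for q in queries if q.kind != "head-to-head"]

sample = [
    Query("Q1", "Best sales prospecting tool 2026", "category"),
    Query("Q2", "How to track brand mentions in ChatGPT", "intent"),
    Query("Q3", "Jasper vs Copy.ai", "head-to-head"),
    Query("Q4", "Clay vs Apollo", "head-to-head"),
]
print([q.qid for q in scorable(sample)])  # ['Q1', 'Q2']
```

An explicit label is used rather than detecting "vs" in the text, since a phrasing heuristic could misclassify queries.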

The cross-engine agreement table: where the engines split

The Engine-Consensus flag is a per-query label that records whether all three engines agreed on the top tool, two of three agreed, or all three diverged, and the table below applies it to representative queries from each state.

This table is the centerpiece of the report. Each row is one buying-intent query, the top tool each engine named, and the resulting flag. The consensus winners (Apollo.io, Clay, Surfer SEO) appear exactly as logged in the dataset, as do the three full-divergence queries, all of which fall in the GEO and AI-visibility category.

Buying-intent query	ChatGPT	Perplexity	Google AIO	Flag
Best sales prospecting tool 2026	Apollo.io	Apollo.io	Apollo.io	Consensus
Best lead enrichment tool for agencies	Clay	Clay	Clay	Consensus
Best SEO content optimization tool 2026	Surfer SEO	Surfer SEO	Surfer SEO	Consensus
Best AI copywriting tool 2026	Jasper	Jasper	Copy.ai	2-of-3
Best cold email software 2026	Instantly.ai	Salesforge	Instantly.ai	2-of-3
Best AI SDR tool 2026	11x.ai	AiSDR	11x.ai	2-of-3
Best GEO tool to track AI search visibility	Profound	Semrush AI Visibility Toolkit	Goodie AI	Full diverge
How to track brand mentions in ChatGPT	Profound / AthenaHQ	Otterly / Semrush	Keyword.com	Full diverge
Best AI search optimization platform 2026	Search Party / Goodie / Profound	Surfer / Clearscope / Rankability	Surfer / Clearscope / Rankability	Full diverge

Top tool as named by each engine. The last three rows are the three full-divergence queries (data.json IDs S2, S4, S7), all in the GEO and AI-visibility category. Source: Lucreya June 2026 study. Snapshot 2026-06-07. Verified 2026-06-07.
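The Engine-Consensus flag in the table reduces to counting distinct top picks per row. A minimal Python sketch, assuming each engine contributes exactly one first-named tool and that names only need trimming and case-folding to compare; rows with a multi-tool slate (for example "Surfer / Clearscope / Rankability") would need a rule for choosing the first-named tool, which this sketch does not model. `consensus_flag` is an illustrative name.

```python
def _norm(name: str) -> str:
    """Assumed normalization: trim whitespace and ignore case."""
    return name.strip().casefold()

def consensus_flag(chatgpt: str, perplexity: str, google_aio: str) -> str:
    """Label one query by how many distinct top tools the three engines named."""
    distinct = len({_norm(chatgpt), _norm(perplexity), _norm(google_aio)})
    if distinct == 1:
        return "Consensus"
    if distinct == 2:
        return "2-of-3"
    return "Full diverge"

rows = {
    "Best sales prospecting tool 2026": ("Apollo.io", "Apollo.io", "Apollo.io"),
    "Best cold email software 2026": ("Instantly.ai", "Salesforge", "Instantly.ai"),
    "Best GEO tool to track AI search visibility": (
        "Profound", "Semrush AI Visibility Toolkit", "Goodie AI"),
}
for query, picks in rows.items():
    # Prints Consensus, 2-of-3, and Full diverge for these three rows.
    print(f"{query}: {consensus_flag(*picks)}")
```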

The pattern in one sentence: the engines converge where the category has matured (Apollo for prospecting, Clay for enrichment, Surfer SEO for content optimization, all named top by two or three engines) and they diverge completely in the GEO and AI-visibility category, where no consensus winner has formed yet. That is not noise. It is a map of which categories are settled and which are still winnable. Lucreya figures: data.json crossEngineAnalysis + headlineFindings.geoIsUnsettled, snapshot 2026-06-07.

What do the three full-divergence queries look like up close?

A full-divergence query is one where all three engines name a different top tool, and in our run every such query concerned tracking or optimizing for AI search visibility.

On "best GEO tool to track AI search visibility," the three engines gave three different answers. ChatGPT led with Profound. Perplexity led with the Semrush AI Visibility Toolkit (and ZipTie). Google AI Overviews led with Goodie AI. One question, three tools, zero overlap on the top pick. The second full-divergence query, "how to track brand mentions in ChatGPT," split the same way: ChatGPT pointed to Profound and AthenaHQ, Perplexity to Otterly and Semrush, and Google AI Overviews to Keyword.com. The third, "best AI search optimization platform 2026," saw ChatGPT name a different slate (Search Party, Goodie AI, Profound) while Perplexity and Google AI Overviews both leaned on the Surfer / Clearscope / Rankability cluster.

The detail worth sitting with is that all three full-divergence queries are in the same category, GEO and AI-visibility. The mature go-to-market categories had a settled answer. The category that exists to measure AI visibility is itself the one the AI engines cannot agree on. If you sell or buy in that space, the strategic read is direct: the consensus has not formed, so visibility there is still up for grabs in a way it is not for prospecting or enrichment.

Q: Which engine was the contrarian?

A: Perplexity dissented most often in this run, naming a different top tool from both ChatGPT and Google AI Overviews on at least four queries: a Veo 3.1 vs HeyGen video query, the Salesforge vs Instantly cold-email query, the AiSDR vs 11x SDR query, and the GEO-tracking query. That is a snapshot reading from one dated run, not a fixed property of Perplexity. Re-running the protocol on a later date could change it.
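A sketch of how a dissenter tally could be computed from per-engine top picks, using rows from the table above in ChatGPT, Perplexity, Google AIO column order. It applies a stricter definition than the answer above: dissent is only attributed on two-of-three queries, so a full-divergence query such as the GEO-tracking one is credited to no engine. `dissenter` and `ENGINES` are illustrative names.

```python
from collections import Counter
from typing import Optional

ENGINES = ("ChatGPT", "Perplexity", "Google AIO")

def dissenter(picks: tuple[str, str, str]) -> Optional[str]:
    """Return the engine whose pick differs when exactly two engines agree."""
    counts = Counter(picks)
    if len(counts) != 2:
        return None  # full consensus or full divergence: no single dissenter
    odd_tool = min(counts, key=counts.get)  # the tool named by only one engine
    return ENGINES[picks.index(odd_tool)]

rows = [
    ("Apollo.io", "Apollo.io", "Apollo.io"),
    ("Jasper", "Jasper", "Copy.ai"),
    ("Instantly.ai", "Salesforge", "Instantly.ai"),
    ("11x.ai", "AiSDR", "11x.ai"),
    ("Profound", "Semrush AI Visibility Toolkit", "Goodie AI"),
]
tally = Counter(d for d in map(dissenter, rows) if d is not None)
print(tally.most_common())  # [('Perplexity', 2), ('Google AIO', 1)]
```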

Did the citation sources behind these answers also diverge?

Citation-source comparison is the analysis of which URLs each engine cites to justify its answer, and in this run it was capturable at full fidelity for only one of the three engines.

Here is the honest limit, stated plainly. This report compares the tool each engine recommends across all three engines, which is a clean, like-for-like measurement. It does not claim a like-for-like cross-engine citation-URL comparison, because the engines do not expose citations equally. Perplexity publishes a native numbered source list, so we logged 162 of its citations in full. ChatGPT renders citations as in-product chips that are not enumerated the same way, and Google AI Overviews collapses its citation list and intermixes it with organic results. So the citation autopsy in our study rests on Perplexity's high-fidelity capture, and the cross-engine divergence figure rests on the tool-recommendation lists from all three. We do not blur those two things together.

On the external 11 percent figure: a separate, widely cited industry analysis has reported that only about 11 percent of cited domains appear across two or more AI engines, implying roughly nine in ten cited sources are engine-specific. That figure is external and is not ours. We could not reproduce a cross-engine citation-overlap number at fidelity in this run, for the reason above, so we neither claim it nor lean on it. Our own, directly measured equivalent is the tool-agreement reading: 36 percent full three-engine agreement and 21 percent full divergence across 14 queries. If you cite the 11 percent figure, attribute it to its originating source, not to Lucreya. See citations.

What we can say from our own Perplexity-fidelity data is that the citation surface is overwhelmingly third-party: roughly four in five of the 162 logged Perplexity citations pointed to independent roundups, review blogs, comparison aggregators, and forums rather than the recommended vendor's own site. Reddit alone appeared in the citation set of 15 of 20 answers (75 percent, verified 2026-06-07). So even within one engine, the sources that decide answers are not the vendor's own pages. For the per-engine mechanics behind that, see our breakdown of how to get cited in ChatGPT.
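Both readings in this paragraph, the third-party share and the number of answers citing Reddit, reduce to domain matching over logged URLs. A minimal sketch, assuming a hand-maintained set of vendor root domains; the URLs below are made up for illustration and are not from the dataset, and the helper names are hypothetical.

```python
from urllib.parse import urlparse

def on_domain(url: str, root: str) -> bool:
    """True if the URL's host is `root` or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == root or host.endswith("." + root)

def third_party_share(urls: list[str], vendor_roots: set[str]) -> float:
    """Fraction of cited URLs that are not on any recommended vendor's domain."""
    if not urls:
        return 0.0
    third = [u for u in urls if not any(on_domain(u, r) for r in vendor_roots)]
    return len(third) / len(urls)

def answers_citing(answers: dict[str, list[str]], root: str) -> int:
    """Number of answers whose citation set includes at least one URL on `root`."""
    return sum(1 for urls in answers.values() if any(on_domain(u, root) for u in urls))

answers = {
    "A1": ["https://www.reddit.com/r/sales/comments/example",
           "https://apollo.io/pricing",
           "https://reviews.example.com/best-prospecting-tools"],
    "A2": ["https://roundup.example.org/enrichment",
           "https://www.clay.com/"],
}
all_urls = [u for urls in answers.values() for u in urls]
print(third_party_share(all_urls, {"apollo.io", "clay.com"}))  # 0.6
print(answers_citing(answers, "reddit.com"))                   # 1
```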

Why does cross-engine divergence break the single blended visibility score?

A single blended visibility score is one averaged number meant to summarize a brand's presence across engines, and divergence breaks it because averaging three disagreeing engines reports a winner that none of them named alone.

The math is the problem. When a vendor sells you one "visibility score," that number has to compress three engines into one. On the 21 percent of our queries where the engines fully diverged, there was no single winner to compress: ChatGPT, Perplexity, and Google AI Overviews each named a different tool. A blended score papers over that by picking, implicitly, which engine to trust, and then hiding the choice inside a formula you cannot inspect. The honest alternative is to report the disagreement itself. That is what the CONSENSUS Protocol formalizes with the Answer-Engine Consensus Index (AECI) and a four-state Engine-Consensus flag: Consensus, single-engine dissent, Absent, or Due-diligence, per query, per engine, with a date attached.
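To make the math concrete, here is a hypothetical brand's per-engine mention rates compressed with an unweighted mean. The unweighted mean is an assumption, since vendors do not publish their blending formulas, and it is not the AECI, which is defined in the CONSENSUS Protocol. The numbers are illustrative, not study data.

```python
# Hypothetical share of tracked queries on which each engine names the brand.
mention_rate = {"ChatGPT": 0.90, "Perplexity": 0.95, "Google AIO": 0.0}

def blended_score(rates: dict[str, float]) -> float:
    """Unweighted mean across engines: one number, disagreement hidden."""
    return sum(rates.values()) / len(rates)

def absent_engines(rates: dict[str, float], floor: float = 0.0) -> list[str]:
    """Engines at or below `floor`, which the blended number does not surface."""
    return [engine for engine, rate in rates.items() if rate <= floor]

print(f"blended: {blended_score(mention_rate):.0%}")  # blended: 62%
print("absent on:", absent_engines(mention_rate))     # absent on: ['Google AIO']
```

The blended figure looks healthy while one engine never names the brand at all; only the per-engine view shows where to act.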

The revenue consequence is concrete. If your tracking dashboard tells you that you are "62 percent visible" while two of three engines never name you, you will under-invest in exactly the engines where you are absent. A team that knows it is a single-engine dissent on Perplexity and absent on Google AI Overviews can act on it. A team holding one blended percentage cannot. And the answer surface is almost always present: Google AI Overviews triggered on 19 of 20 queries (95 percent, verified 2026-06-07) in our run, so the AI answer is a pipeline input before the buyer ever reaches your site. For the broader landscape of tools feeding these answers, our colleagues at Nesyona's AI SEO tools index track the category in depth.

How this was measured

Queries: 20 GTM buying-intent prompts (marketing, SEO/GEO, sales verticals), all published by ID.
Engines: ChatGPT (GPT-5.x, web search on), Perplexity (default web), Google AI Overviews (default Google Search).
Answers: 60 AI answers captured (20 queries x 3 engines). 162 Perplexity citations logged.
Agreement metric: Cross-engine agreement on the named top tool, scored on the 14 category and intent queries (head-to-head queries excluded as use-case splits).
Citation fidelity: Full only for Perplexity (native numbered source list). ChatGPT chips and AIO collapsed citations are not comparable, so cross-engine source overlap is not claimed.
Snapshot: 2026-06-07. AI answers are volatile; re-run the protocol to reproduce. License CC BY 4.0. Dataset DOI: 10.5281/zenodo.20632768 (Zenodo, CC BY 4.0).
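One way to keep a run reproducible is to store each captured answer as a dated record and confirm that every query has an answer from every engine before scoring. A sketch under assumed field names (`query_id`, `engine`, `snapshot`, `top_tool`, `citations`) and hypothetical query IDs; the published dataset's actual schema may differ.

```python
from dataclasses import dataclass
from datetime import date
from itertools import product
from typing import Optional

ENGINES = ("ChatGPT", "Perplexity", "Google AIO")

@dataclass(frozen=True)
class Answer:
    """One captured AI answer (hypothetical schema)."""
    query_id: str
    engine: str
    snapshot: date
    top_tool: Optional[str]          # None if the engine named no tool
    citations: tuple[str, ...] = ()  # logged source URLs, if exposed

def missing_cells(answers: list[Answer], query_ids: list[str]) -> list[tuple[str, str]]:
    """Sorted (query_id, engine) pairs that have no captured answer."""
    captured = {(a.query_id, a.engine) for a in answers}
    return sorted(set(product(query_ids, ENGINES)) - captured)

def snapshot_dates(answers: list[Answer]) -> set[date]:
    """All snapshot dates present; a clean run should have exactly one."""
    return {a.snapshot for a in answers}

# Usage with hypothetical IDs: a full 20 x 3 grid, then one answer dropped.
query_ids = [f"Q{i:02d}" for i in range(1, 21)]
snap = date(2026, 6, 7)
answers = [Answer(q, e, snap, None) for q, e in product(query_ids, ENGINES)]
print(len(answers))                            # 60
print(missing_cells(answers[:-1], query_ids))  # [('Q20', 'Google AIO')]
print(snapshot_dates(answers) == {snap})       # True
```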

Where does your category sit on this map?

The free AI Visibility Audit runs the first step of the CONSENSUS Protocol for you: it returns your Engine-Consensus flag across ChatGPT, Perplexity, and Google AI Overviews, with a snapshot date, in minutes. No blended black-box number. Just whether the engines agree on you, and where they do not.



What should a revenue team do about engine divergence?

The response to divergence is to measure each engine separately and prioritize the engine where you are absent, rather than chasing one averaged score.

Stop optimizing for the average and start optimizing for the gap. Divergence means the leverage is uneven across engines, so the move is to find the engine where you are absent and earn the third-party sources that engine cites. Because roughly four in five citations went to third-party pages, the work is mostly off your own domain: review roundups, comparison pages, and the forum threads (Reddit above all) that the engines lean on. A brand that only optimizes its own site is working the 20 percent of the citation surface that was already most likely to point to it. For the placement side of that work, we productize the monitoring-and-earned-placement loop through our GEO monitoring and placement retainer, and the full measurement standard behind it is the CONSENSUS Protocol.
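Finding "the engine where you are absent" can be as simple as counting, per engine, the tracked queries on which it never names your brand, then working the largest gap first. A sketch with hypothetical query IDs and results; the ranking rule (most absences first) is an assumption, not a Lucreya specification.

```python
ENGINES = ("ChatGPT", "Perplexity", "Google AIO")

# Hypothetical: for each tracked query, the engines that named the brand at all.
named_by = {
    "q1": {"ChatGPT", "Perplexity", "Google AIO"},
    "q2": {"ChatGPT"},
    "q3": {"ChatGPT", "Perplexity"},
    "q4": set(),
}

def absence_counts(named_by: dict[str, set[str]]) -> dict[str, int]:
    """Number of tracked queries on which each engine never names the brand."""
    return {e: sum(e not in engines for engines in named_by.values()) for e in ENGINES}

def priority_order(named_by: dict[str, set[str]]) -> list[str]:
    """Engines sorted from most to fewest absences (the biggest gap first)."""
    counts = absence_counts(named_by)
    return sorted(ENGINES, key=lambda e: counts[e], reverse=True)

print(absence_counts(named_by))  # {'ChatGPT': 1, 'Perplexity': 2, 'Google AIO': 3}
print(priority_order(named_by))  # ['Google AIO', 'Perplexity', 'ChatGPT']
```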

Q: Does divergence mean AI search is too unstable to optimize for?

A: No. The settled categories (prospecting, enrichment, content optimization) each had a consensus winner, so the answers there were stable. Divergence marks the unsettled categories, like GEO itself, as still open. Instability is a reason to date every measurement and re-run on a cadence, not a reason to skip the channel where the buying decision now starts.

If you are starting from zero, the sequence is: define GEO and why it matters in our GEO definition, see the full landscape in the State of AI Search 2026, then adopt the measurement discipline in the CONSENSUS Protocol. The divergence in this report is the evidence; those guides are the response.

Frequently asked questions

Do AI engines agree on which tool to recommend?

Mostly no. In our June 2026 study of 20 GTM buying-intent queries across ChatGPT, Perplexity, and Google AI Overviews, the three engines named the same top tool on only 5 of 14 category and intent queries (36 percent). They agreed two-of-three on 6 (43 percent) and named three completely different top tools on 3 (21 percent), all in the GEO and AI-visibility category. Cross-engine agreement on the single recommended tool is the exception.

How much do ChatGPT, Perplexity, and Google AI Overviews disagree?

On our 14 category and intent queries, the three engines fully agreed on the top tool 36 percent of the time and fully diverged (three different tools) 21 percent of the time, with 43 percent in two-of-three partial agreement. For example, on "best GEO tool to track AI search visibility," ChatGPT named Profound, Perplexity named the Semrush AI Visibility Toolkit, and Google AI Overviews named Goodie AI: three answers to one question.

Which AI engine is the contrarian one?

In our run, Perplexity was the most frequent dissenter, naming a different top tool from both ChatGPT and Google AI Overviews on at least four queries (a Veo vs HeyGen video query, a Salesforge vs Instantly cold-email query, an AiSDR vs 11x SDR query, and the GEO-tracking query). This is a snapshot from one dated run, not a permanent property of the engine.

Why does a single blended AI visibility score hide engine disagreement?

A blended score averages disagreeing engines into one number, so it reports a single winner even when the engines named three different tools. In our data, a blended score would have implied consensus on the 21 percent of queries where the engines fully diverged. The honest alternative, defined in the CONSENSUS Protocol, is to report the per-engine Answer-Engine Consensus Index (AECI) and a four-state Engine-Consensus flag instead of one averaged figure.

Were the citation sources behind these answers also compared across engines?

Only partially, and we disclose the limit. This report measures tool-recommendation agreement across all three engines. The citation-URL autopsy was captured at full fidelity only for Perplexity, which exposes a native numbered source list. ChatGPT renders citations as in-product chips and Google AI Overviews collapses and intermixes its citations with organic results, so a like-for-like cross-engine citation-overlap count was not possible in this run. We anchor on our own tool-agreement figures and treat any cross-engine domain-overlap percentage as external.

Bottom line

AI engines disagree far more than the single-blended-score vendors admit, and our own data shows it. Across 20 buying-intent queries on three engines, full three-engine agreement on the recommended tool happened on only 36 percent of category and intent queries, two-of-three on 43 percent, and full divergence on 21 percent, with all three full-divergence queries in the GEO and AI-visibility category. The mature categories had a consensus winner (Apollo, Clay, Surfer SEO); the category that measures AI visibility did not. We measured agreement on the recommended tool across all three engines and captured citation sources at fidelity only for Perplexity, and we keep those two facts separate. The external 11 percent cross-engine domain-overlap figure is not ours; our directly measured equivalent is the 36 percent agreement, 21 percent divergence reading. Run the measurement on your own category with our free AI Visibility Audit, see the method in the CONSENSUS Protocol, or read the full evidence in the Who AI Recommends: GTM 2026 study.

Lucreya original measurement. Who AI Recommends: GTM Tool and Source Citations Across ChatGPT, Perplexity, and Google AI Overviews (2026). 20 queries, 3 engines, 60 answers, 162 Perplexity citations. Snapshot date 2026-06-07. lucreya.com/research/who-ai-recommends-gtm-2026/. CC BY 4.0. Dataset DOI: 10.5281/zenodo.20632768 (Zenodo, CC BY 4.0). Verified 2026-06-07.
Lucreya. The CONSENSUS Protocol: How to Measure AI Visibility Honestly (AECI Method, 2026). lucreya.com/articles/the-consensus-protocol. The measurement standard defining the AECI and the four-state Engine-Consensus flag used in this report.
External cross-engine citation-overlap analysis. The ~11 percent figure for cited domains appearing across two or more AI engines is an external industry finding; confirm against the originating source before reuse. It is not a Lucreya measurement. Lucreya's own directly measured equivalent (36 percent full three-engine agreement, 21 percent full divergence across 14 queries) is reported from the study above. Verified 2026-06-07.
Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., Deshpande, A. GEO: Generative Engine Optimization. 2023. arxiv.org/abs/2311.09735. The Princeton GEO framework on raising generative-engine visibility through on-page tactics.
Google. Generative AI in Search: Let Google do the searching for you. blog.google/products/search/generative-ai-search/. Reference for Google AI Overviews behavior and trigger volatility.
Perplexity AI. perplexity.ai. The one engine in the study exposing a native numbered citation list, the basis for the source-fidelity caveat.
Creative Commons. CC BY 4.0 License. creativecommons.org/licenses/by/4.0/. License for the Lucreya measurement dataset.