Lucreya Research · The AI Recommendation Audit 2026

Original research · 2026-06-19

5 AI engines, one question, zero agreement on the best B2B tool

We asked the AI assistants your buyers actually use which software they recommend, across 16 go-to-market categories. Here is what they say, who is invisible, and why the answer depends on which AI you ask.

0% of 16 categories where all 5 engines named the same #1 tool

5 AI engines · 16 B2B categories · 48 buying queries · 716 recommendations · 242 tools named

Leaderboard

The AI-Visibility leaderboard

Every tool's position-weighted recommendation frequency across all engines and categories, normalized 0 to 100. A #1 pick counts more than a #5; breadth across categories is rewarded.

Rank	Tool	Score	Mentions	#1 picks	Categories
1	HubSpot	100	37	18	7
2	Ahrefs	44.4	14	9	2
3	Mailchimp	34.4	15	6	4
4	Calendly	34.1	9	8	1
5	Zendesk	32.7	10	7	2
6	Unbounce	32.1	9	7	1
7	Jasper	30	9	7	2
8	SEMrush	26.9	13	2	2
9	Asana	25.4	7	6	1
10	Salesforce	25	12	3	4
11	Hootsuite	24.7	8	5	1
12	Apollo.io	23.4	9	4	2
13	ActiveCampaign	22.9	11	3	2
14	Amplitude	22.7	7	5	1
15	SurveyMonkey	22	6	5	1
16	Klaviyo	21.5	11	2	3
17	Buffer	21.4	8	3	1
18	Outreach	19.4	8	3	2
19	ChatGPT	18.7	8	4	4
20	Dialogflow	18	5	4	1
21	Freshdesk	17	7	2	1
22	Google Analytics	16.9	8	3	2
23	Typeform	15	8	1	1
24	Marketo	13.8	6	2	2
25	Mixpanel	13	7	0	1

Top 25 of 242 measured tools. Linked tools have a full AI-Visibility report card. Complete ranking in the open dataset.
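
The scoring idea described above (position-weighted frequency, a breadth reward, normalization to 0 to 100) can be sketched in a few lines. This is a minimal illustration, not the audit's published formula: the `1 / position` weight, the `breadth_bonus` multiplier, the function name `visibility_scores`, and the sample records are all assumptions made for the example, so it will not reproduce the scores in the table.

```python
from collections import defaultdict


def visibility_scores(recommendations, breadth_bonus=0.1):
    """Score tools from (tool, category, position) records.

    Assumed weighting (hypothetical, not the audit's actual formula):
      - a recommendation at position p contributes 1 / p
      - the raw total is multiplied by 1 + breadth_bonus * (categories - 1)
      - scores are scaled so the top tool gets 100
    """
    weighted = defaultdict(float)
    categories = defaultdict(set)
    for tool, category, position in recommendations:
        weighted[tool] += 1.0 / position
        categories[tool].add(category)

    raw = {
        tool: total * (1 + breadth_bonus * (len(categories[tool]) - 1))
        for tool, total in weighted.items()
    }
    if not raw:
        return {}
    top = max(raw.values())
    return {tool: round(100 * value / top, 1) for tool, value in raw.items()}


# Hypothetical records, invented for illustration only.
sample = [
    ("ToolA", "CRM", 1),
    ("ToolA", "Email marketing", 2),
    ("ToolB", "CRM", 2),
    ("ToolC", "CRM", 5),
]
scores = visibility_scores(sample)
print(scores)
```

Under these assumed weights, a tool named first in one category and second in another outranks a tool named second once, which is the behavior the leaderboard description implies: higher positions and wider category coverage both lift the score.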

The centerpiece

Do different AI engines agree on who to recommend?

No, never unanimously. Below is each engine's top pick in every category. The five engines did not all name the same single best tool in any category: they disagreed in 16 of 16, so unanimous agreement is 0%. Which AI a buyer happens to ask changes which tool they're told to buy.

Legend: web-grounded engine (reads the live web) · model-memory engine (recalls training data) · ≠ engines disagree on #1

Category	Llama 3.3 via Groq (memory)	Cohere Command-A (memory)	Gemini 2.5 Flash-Lite (grounded)	Perplexity (grounded)	ChatGPT (grounded)	Agree?
AI copywriting	WordLift	Jasper	Jasper	Jasper	Jasper	≠
CRM	HubSpot	HubSpot	Salesforce	HubSpot	HubSpot	≠
Email marketing	Mailchimp	Mailchimp	Klaviyo	HubSpot	ActiveCampaign	≠
SEO tools	Ahrefs	Ahrefs	SEMrush	SEMrush	Ahrefs	≠
Sales engagement	HubSpot	Outreach	Amplemarket	Amplemarket	Outreach	≠
Product analytics	Google Analytics	Amplitude	Amplitude	Amplitude	Amplitude	≠
Project management	Asana	Trello	Asana	Asana	ClickUp	≠
Customer support	Freshdesk	Zendesk	Zendesk	Zendesk	Zendesk	≠
Social media management	Hootsuite	Hootsuite	Buffer	Buffer	Buffer	≠
Landing page builders	Unbounce	Unbounce	Landingi	Unbounce	Webflow	≠
Lead generation	HubSpot	HubSpot	SyncGTM	Apollo.io	Apollo.io	≠
Marketing automation	Marketo	HubSpot	ActiveCampaign	ActiveCampaign	HubSpot	≠
AI chatbots	Dialogflow	Dialogflow	ChatGPT	ChatGPT	ChatGPT	≠
Scheduling	Calendly	Calendly	Celoxis	Calendly	Calendly	≠
Survey / forms	SurveyMonkey	Typeform	SurveyMonkey	SurveyNinja	SurveyMonkey	≠
GEO / AI visibility	Ahrefs	Ahrefs	Ahrefs	Siftly	Profound	≠
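
The agreement figure can be checked directly from the table above. The sketch below copies each engine's #1 pick per category and counts two things: categories where all five engines agree, and categories where at least three of five name the same tool. The three-of-five threshold and the names `TOP_PICKS` and `agreement_summary` are assumptions of this example, not part of the audit's method.

```python
from collections import Counter

# Each engine's #1 pick per category, copied from the table above.
# Column order: Llama 3.3, Cohere Command-A, Gemini 2.5 Flash-Lite,
# Perplexity, ChatGPT.
TOP_PICKS = {
    "AI copywriting": ["WordLift", "Jasper", "Jasper", "Jasper", "Jasper"],
    "CRM": ["HubSpot", "HubSpot", "Salesforce", "HubSpot", "HubSpot"],
    "Email marketing": ["Mailchimp", "Mailchimp", "Klaviyo", "HubSpot", "ActiveCampaign"],
    "SEO tools": ["Ahrefs", "Ahrefs", "SEMrush", "SEMrush", "Ahrefs"],
    "Sales engagement": ["HubSpot", "Outreach", "Amplemarket", "Amplemarket", "Outreach"],
    "Product analytics": ["Google Analytics", "Amplitude", "Amplitude", "Amplitude", "Amplitude"],
    "Project management": ["Asana", "Trello", "Asana", "Asana", "ClickUp"],
    "Customer support": ["Freshdesk", "Zendesk", "Zendesk", "Zendesk", "Zendesk"],
    "Social media management": ["Hootsuite", "Hootsuite", "Buffer", "Buffer", "Buffer"],
    "Landing page builders": ["Unbounce", "Unbounce", "Landingi", "Unbounce", "Webflow"],
    "Lead generation": ["HubSpot", "HubSpot", "SyncGTM", "Apollo.io", "Apollo.io"],
    "Marketing automation": ["Marketo", "HubSpot", "ActiveCampaign", "ActiveCampaign", "HubSpot"],
    "AI chatbots": ["Dialogflow", "Dialogflow", "ChatGPT", "ChatGPT", "ChatGPT"],
    "Scheduling": ["Calendly", "Calendly", "Celoxis", "Calendly", "Calendly"],
    "Survey / forms": ["SurveyMonkey", "Typeform", "SurveyMonkey", "SurveyNinja", "SurveyMonkey"],
    "GEO / AI visibility": ["Ahrefs", "Ahrefs", "Ahrefs", "Siftly", "Profound"],
}


def agreement_summary(top_picks, majority=3):
    """Return (unanimous categories, {category: majority winner})."""
    unanimous = []
    majority_winners = {}
    for category, picks in top_picks.items():
        tool, votes = Counter(picks).most_common(1)[0]
        if votes == len(picks):
            unanimous.append(category)
        if votes >= majority:
            majority_winners[category] = tool
    return unanimous, majority_winners


unanimous, majority_winners = agreement_summary(TOP_PICKS)
print(f"Unanimous: {len(unanimous)} of {len(TOP_PICKS)}")                   # Unanimous: 0 of 16
print(f"Majority (3+ of 5): {len(majority_winners)} of {len(TOP_PICKS)}")   # Majority (3+ of 5): 12 of 16
```

Read this way, the headline is about unanimity: no category has a unanimous winner, yet in 12 of 16 categories at least three engines land on the same tool. The four categories with no majority pick are email marketing, sales engagement, lead generation, and marketing automation.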

Grounded engines and memory engines recommend different tools

The split above is not random. It tracks how each engine knows what it knows.

Model-memory engines

Llama 3.3 via Groq and Cohere Command-A answer from training data. They reliably name the established, widely-written-about brands (the HubSpots and Mailchimps) and tend to miss anything newer than their training cut-off.

Web-grounded engines

Gemini 2.5 Flash-Lite, Perplexity, and ChatGPT read live search results, so they surface the current and trending winners (newer category leaders in email, sales, and AI-visibility tooling) that the memory engines never mention.

AI assistants increasingly answer "what's the best tool for X" before a buyer ever scans a page of links, which makes being named by AI a surface distinct from your Google ranking. The engines we queried are public and reproducible (Llama 3.3 via Groq, Cohere Command-A, Gemini 2.5 Flash-Lite, Perplexity, ChatGPT), each asked the identical question set, with the full raw output in the open dataset below. This audit builds on our earlier AI citation autopsy and the 30-tool GTM index.

Source bias

Where AI gets its recommendations

When the grounded engines backed their answers with live web search, these are the domains they cited most. This is the map of where to be cited if you want AI to recommend you.

emailvendorselection.com

By category

What AI recommends in each category

The consensus short list per category, blended across all 5 engines.

AI chatbots

Dialogflow, ChatGPT, Microsoft Bot Framework, ManyChat

AI copywriting

Jasper, Copy.ai, WordLift, Writesonic

CRM

HubSpot, Salesforce, Zoho, Pipedrive

Customer support

Zendesk, Freshdesk, Intercom, Salesforce Service Cloud

Email marketing

Mailchimp, Klaviyo, ActiveCampaign, Constant Contact

GEO / AI visibility

Ahrefs, SEMrush, Profound, Siftly

Landing page builders

Unbounce, Instapage, Leadpages, Webflow

Lead generation

HubSpot, Apollo.io, LinkedIn Sales Navigator, ZoomInfo

Marketing automation

HubSpot, ActiveCampaign, Marketo, Mailchimp

Product analytics

Amplitude, Mixpanel, Google Analytics, Tableau

Project management

Asana, ClickUp, Trello, Jira

Sales engagement

Outreach, HubSpot, Apollo.io, Amplemarket

Scheduling

Calendly, Doodle, Acuity Scheduling, Setmore

SEO tools

Ahrefs, SEMrush, Surfer SEO, Google Search Console

Social media management

Hootsuite, Buffer, Sprout Social, Hootsuite Insights

Survey / forms

SurveyMonkey, Typeform, Google Forms, Qualtrics

Is your tool in the short list?

Run the AI-Visibility check to see where your product lands when buyers ask AI for a recommendation.


FAQ

Frequently asked questions

Is AI visibility the same as SEO?

No. AI visibility is whether AI assistants name your tool when a buyer asks for a recommendation; SEO is your rank in the blue links. The two overlap only partly, so AI visibility is a separate, winnable race.

How is the AI-Visibility Score calculated?

It is a position-weighted count of how often a tool is recommended across the query set (a #1 recommendation counts more than a #5), normalized 0 to 100 and rewarded for breadth across categories. The full method and open data are on this page.

Does this rank product quality?

No. It measures what AI engines OUTPUT when asked, not which tool is objectively best. AI outputs reflect what their training data or the live web says, and can be wrong or biased. Treat it as a visibility snapshot, not an endorsement.

Why do the engines disagree so much?

Model-memory engines recall the established names from training data, while web-grounded engines read current pages and surface newer or trending tools. Same question, different information source, different answer.

How often is it updated?

It is a dated snapshot. AI answers shift, so the protocol is built to be re-run, and each refresh produces a delta showing how the recommendations moved.

Method & honesty floor

How this was measured

SOURCED: every recommendation is a recorded output of a live AI engine. Engines audited: Llama 3.3 70B via Groq [model-memory, 48 queries]; Cohere Command-A [model-memory, 48 queries]; Gemini 2.5 Flash-Lite [web-grounded, 16 queries]; Perplexity [web-grounded, 16 queries]; ChatGPT (web, GPT-5) [web-grounded, 16 queries]. Across 16 B2B/GTM software categories, the two model-memory engines answered all three phrasings per category; the three web-grounded engines (Gemini via API, ChatGPT and Perplexity via browser) answered one query per category, capturing live web sources where available.

DERIVED: the AI-Visibility Score (position-weighted recommendation frequency, normalized 0-100), per-category leaderboards, cross-engine agreement, and source-citation tallies.

HONESTY FLOOR: this measures what AI engines OUTPUT, not product quality or our opinion. Search-grounded engines reflect the live web; model-memory engines reflect training data and can name tools that do not exist or omit real leaders. Because the grounded engines are sampled less densely than the memory engines, the blended leaderboard leans slightly toward model-memory; the per-engine comparison table separates them. Outputs vary by phrasing, date, and engine; this is a dated snapshot, not a ranking endorsement. Browser passes (ChatGPT, Perplexity) are layered in as documented.
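
The sampling imbalance mentioned in the method note can be made concrete with the per-engine query counts it lists. The sketch below assumes every recorded response carries equal weight in the blend; the note does not say whether responses are reweighted by engine, so treat the resulting share as an upper-level illustration rather than the audit's exact blend. The names `QUERIES_PER_ENGINE` and `responses_by_kind` are hypothetical.

```python
# Query counts per engine, as listed in the method note.
QUERIES_PER_ENGINE = {
    "Llama 3.3 70B via Groq": ("model-memory", 48),
    "Cohere Command-A": ("model-memory", 48),
    "Gemini 2.5 Flash-Lite": ("web-grounded", 16),
    "Perplexity": ("web-grounded", 16),
    "ChatGPT": ("web-grounded", 16),
}


def responses_by_kind(queries_per_engine):
    """Sum engine responses per engine type (model-memory vs web-grounded)."""
    totals = {}
    for kind, count in queries_per_engine.values():
        totals[kind] = totals.get(kind, 0) + count
    return totals


totals = responses_by_kind(QUERIES_PER_ENGINE)
all_responses = sum(totals.values())
# Assumption: each response counts equally in the blended leaderboard.
memory_share = totals["model-memory"] / all_responses

print(totals)                  # {'model-memory': 96, 'web-grounded': 48}
print(all_responses)           # 144
print(f"{memory_share:.1%}")   # 66.7%
```

The 48 distinct queries (16 categories times 3 phrasings) therefore produce 144 engine responses, two thirds of which come from the two model-memory engines.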

Engines: 5 · Categories: 16 · Queries: 48 · Recommendations: 716 · License: CC-BY 4.0 · Snapshot: 2026-06-19 · DOI: 10.5281/zenodo.20767878

Open data: ai-recommendation-audit-2026.json, free to reuse with attribution to Lucreya. This measures what AI engines OUTPUT, not product quality; a dated snapshot, re-run to reproduce.

Cite this dataset

Couey, V. W. (2026). The AI Recommendation Audit 2026: Which B2B/GTM Software AI Assistants Recommend [Data set]. Lucreya. https://doi.org/10.5281/zenodo.20767878

Zenodo (DOI 10.5281/zenodo.20767878) · Hugging Face · Kaggle · Raw JSON