Lucreya Research › The AI Recommendation Audit 2026
Original research · 2026-06-19

5 AI engines, one question, zero agreement on the best B2B tool

We asked the AI assistants your buyers actually use which software they recommend, across 16 go-to-market categories. Here is what they say, who is invisible, and why the answer depends on which AI you ask.

0% of 16 categories where all 5 engines named the same #1 tool
5 AI engines · 16 B2B categories · 48 buying queries · 716 recommendations · 242 tools named
Leaderboard

The AI-Visibility leaderboard

Every tool's position-weighted recommendation frequency across all engines and categories, normalized 0 to 100. A #1 pick counts more than a #5; breadth across categories is rewarded.

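Rank | Tool | AI-Visibility Score | Mentions | #1 picks | Categories
1 | HubSpot | 100 | 37 | 18 | 7
2 | Ahrefs | 44.4 | 14 | 9 | 2
3 | Mailchimp | 34.4 | 15 | 6 | 4
4 | Calendly | 34.1 | 9 | 8 | 1
5 | Zendesk | 32.7 | 10 | 7 | 2
6 | Unbounce | 32.1 | 9 | 7 | 1
7 | Jasper | 30 | 9 | 7 | 2
8 | SEMrush | 26.9 | 13 | 2 | 2
9 | Asana | 25.4 | 7 | 6 | 1
10 | Salesforce | 25 | 12 | 3 | 4
11 | Hootsuite | 24.7 | 8 | 5 | 1
12 | Apollo.io | 23.4 | 9 | 4 | 2
13 | ActiveCampaign | 22.9 | 11 | 3 | 2
14 | Amplitude | 22.7 | 7 | 5 | 1
15 | SurveyMonkey | 22 | 6 | 5 | 1
16 | Klaviyo | 21.5 | 11 | 2 | 3
17 | Buffer | 21.4 | 8 | 3 | 1
18 | Outreach | 19.4 | 8 | 3 | 2
19 | ChatGPT | 18.7 | 8 | 4 | 4
20 | Dialogflow | 18 | 5 | 4 | 1
21 | Freshdesk | 17 | 7 | 2 | 1
22 | Google Analytics | 16.9 | 8 | 3 | 2
23 | Typeform | 15 | 8 | 1 | 1
24 | Marketo | 13.8 | 6 | 2 | 2
25 | Mixpanel | 13 | 7 | 0 | 1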

Top 25 of 242 measured tools. Linked tools have a full AI-Visibility report card. Complete ranking in the open dataset.
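The score is described above only in words (position-weighted, breadth-rewarding, normalized 0 to 100); the exact weights are not published. A minimal sketch of one way such a score could be computed follows, assuming hypothetical linear position weights (5 for a #1 down to 1 for a #5) and a hypothetical 10% bonus per additional category. Neither value comes from Lucreya's method, and `visibility_scores` is an illustrative name.

```python
from collections import defaultdict

# HYPOTHETICAL weights: the audit does not publish its exact formula.
POSITION_WEIGHT = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}  # a #1 pick counts more than a #5
BREADTH_BONUS = 0.10  # hypothetical: +10% for each category beyond the first


def visibility_scores(recs):
    """Position-weighted, breadth-adjusted scores, normalized so the top tool is 100.

    recs: iterable of (tool, category, position) tuples, position 1..5.
    """
    raw = defaultdict(float)
    categories = defaultdict(set)
    for tool, category, position in recs:
        raw[tool] += POSITION_WEIGHT.get(position, 0)  # positions past #5 add nothing
        categories[tool].add(category)

    adjusted = {
        tool: weight * (1 + BREADTH_BONUS * (len(categories[tool]) - 1))
        for tool, weight in raw.items()
    }
    top = max(adjusted.values(), default=0.0)
    if top == 0:
        return {tool: 0.0 for tool in adjusted}
    return {tool: round(100 * score / top, 1) for tool, score in adjusted.items()}


# Made-up records for illustration only (not from the dataset).
example = [
    ("ToolA", "CRM", 1),
    ("ToolA", "Email marketing", 2),
    ("ToolB", "CRM", 1),
    ("ToolB", "CRM", 2),
]
print(visibility_scores(example))  # {'ToolA': 100.0, 'ToolB': 90.9}
```

Both made-up tools collect the same raw weight (9); ToolA ends up on top only because it appears in two categories, which is the kind of breadth reward the leaderboard describes.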

The centerpiece

Do different AI engines agree on who to recommend?

Mostly, no. Below is each engine's top pick in every category. In none of the 16 categories did all five engines name the same single best tool, although in five of them four engines agreed and one dissented. Which AI a buyer happens to ask changes which tool they're told to buy.

Legend: memory = model-memory engine (recalls training data); grounded = web-grounded engine (reads the live web); Agree? = whether all five engines named the same #1.

Category | Llama 3.3 via Groq (memory) | Cohere Command-A (memory) | Gemini 2.5 Flash-Lite (grounded) | Perplexity (grounded) | ChatGPT (grounded) | Agree?
AI copywriting | WordLift | Jasper | Jasper | Jasper | Jasper | No
CRM | HubSpot | HubSpot | Salesforce | HubSpot | HubSpot | No
Email marketing | Mailchimp | Mailchimp | Klaviyo | HubSpot | ActiveCampaign | No
SEO tools | Ahrefs | Ahrefs | SEMrush | SEMrush | Ahrefs | No
Sales engagement | HubSpot | Outreach | Amplemarket | Amplemarket | Outreach | No
Product analytics | Google Analytics | Amplitude | Amplitude | Amplitude | Amplitude | No
Project management | Asana | Trello | Asana | Asana | ClickUp | No
Customer support | Freshdesk | Zendesk | Zendesk | Zendesk | Zendesk | No
Social media management | Hootsuite | Hootsuite | Buffer | Buffer | Buffer | No
Landing page builders | Unbounce | Unbounce | Landingi | Unbounce | Webflow | No
Lead generation | HubSpot | HubSpot | SyncGTM | Apollo.io | Apollo.io | No
Marketing automation | Marketo | HubSpot | ActiveCampaign | ActiveCampaign | HubSpot | No
AI chatbots | Dialogflow | Dialogflow | ChatGPT | ChatGPT | ChatGPT | No
Scheduling | Calendly | Calendly | Celoxis | Calendly | Calendly | No
Survey / forms | SurveyMonkey | Typeform | SurveyMonkey | SurveyNinja | SurveyMonkey | No
GEO / AI visibility | Ahrefs | Ahrefs | Ahrefs | Siftly | Profound | No
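The agreement column can be checked directly from the per-engine #1 picks. This short sketch copies the table into a dict (engine order as in the table header) and counts, per category, how many engines named the most common pick. The helper name `largest_bloc` is illustrative, not part of the published dataset.

```python
from collections import Counter

# Per-category #1 picks from the table, in column order:
# Llama 3.3 via Groq, Cohere Command-A, Gemini 2.5 Flash-Lite, Perplexity, ChatGPT
TOP_PICKS = {
    "AI copywriting": ["WordLift", "Jasper", "Jasper", "Jasper", "Jasper"],
    "CRM": ["HubSpot", "HubSpot", "Salesforce", "HubSpot", "HubSpot"],
    "Email marketing": ["Mailchimp", "Mailchimp", "Klaviyo", "HubSpot", "ActiveCampaign"],
    "SEO tools": ["Ahrefs", "Ahrefs", "SEMrush", "SEMrush", "Ahrefs"],
    "Sales engagement": ["HubSpot", "Outreach", "Amplemarket", "Amplemarket", "Outreach"],
    "Product analytics": ["Google Analytics", "Amplitude", "Amplitude", "Amplitude", "Amplitude"],
    "Project management": ["Asana", "Trello", "Asana", "Asana", "ClickUp"],
    "Customer support": ["Freshdesk", "Zendesk", "Zendesk", "Zendesk", "Zendesk"],
    "Social media management": ["Hootsuite", "Hootsuite", "Buffer", "Buffer", "Buffer"],
    "Landing page builders": ["Unbounce", "Unbounce", "Landingi", "Unbounce", "Webflow"],
    "Lead generation": ["HubSpot", "HubSpot", "SyncGTM", "Apollo.io", "Apollo.io"],
    "Marketing automation": ["Marketo", "HubSpot", "ActiveCampaign", "ActiveCampaign", "HubSpot"],
    "AI chatbots": ["Dialogflow", "Dialogflow", "ChatGPT", "ChatGPT", "ChatGPT"],
    "Scheduling": ["Calendly", "Calendly", "Celoxis", "Calendly", "Calendly"],
    "Survey / forms": ["SurveyMonkey", "Typeform", "SurveyMonkey", "SurveyNinja", "SurveyMonkey"],
    "GEO / AI visibility": ["Ahrefs", "Ahrefs", "Ahrefs", "Siftly", "Profound"],
}


def largest_bloc(picks):
    """Number of engines that named the single most common #1 pick."""
    return Counter(picks).most_common(1)[0][1]


unanimous = [cat for cat, picks in TOP_PICKS.items() if largest_bloc(picks) == len(picks)]
four_of_five = [cat for cat, picks in TOP_PICKS.items() if largest_bloc(picks) == 4]

print(f"Unanimous #1: {len(unanimous)} of {len(TOP_PICKS)} categories")
# Unanimous #1: 0 of 16 categories
print("Four of five agree:", ", ".join(four_of_five))
# Four of five agree: AI copywriting, CRM, Product analytics, Customer support, Scheduling
```

On these picks no category is unanimous, while five categories are one engine away from it.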
Why they diverge

Grounded engines and memory engines recommend different tools

The split above is not random. It tracks how each engine knows what it knows.

Model-memory engines

Llama 3.3 via Groq and Cohere Command-A answer from training data. They reliably name the established, widely written-about brands (the HubSpots and Mailchimps) and tend to miss anything newer than their training cut-off.

Web-grounded engines

Gemini 2.5 Flash-Lite, Perplexity, and ChatGPT read live search results, so they surface current and trending winners (newer category leaders in email, sales, and AI-visibility tooling) that the memory engines never mention.

AI assistants increasingly answer "what's the best tool for X" before a buyer ever scans a page of links, which makes being named by AI a visibility surface distinct from your Google ranking. The engines we queried are all publicly available (Llama 3.3 via Groq, Cohere Command-A, Gemini 2.5 Flash-Lite, Perplexity, ChatGPT), and each received the same category questions: the memory engines answered all three phrasings per category, the grounded engines one. The full raw output is in the open dataset below. This audit builds on our earlier AI citation autopsy and the 30-tool GTM index.
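One rough way to check the memory-versus-grounded split against the engine-comparison table is to ask, per category, whether the two memory engines and the three grounded engines share any #1 pick at all. This sketch assumes the table's column order (first two picks from memory engines, last three from grounded engines); `disjoint_categories` is an illustrative helper, and it looks only at #1 picks, not the full ranked lists.

```python
# Per-category #1 picks from the engine-comparison table, in column order:
# Llama 3.3 via Groq, Cohere Command-A | Gemini 2.5 Flash-Lite, Perplexity, ChatGPT
TOP_PICKS = {
    "AI copywriting": ["WordLift", "Jasper", "Jasper", "Jasper", "Jasper"],
    "CRM": ["HubSpot", "HubSpot", "Salesforce", "HubSpot", "HubSpot"],
    "Email marketing": ["Mailchimp", "Mailchimp", "Klaviyo", "HubSpot", "ActiveCampaign"],
    "SEO tools": ["Ahrefs", "Ahrefs", "SEMrush", "SEMrush", "Ahrefs"],
    "Sales engagement": ["HubSpot", "Outreach", "Amplemarket", "Amplemarket", "Outreach"],
    "Product analytics": ["Google Analytics", "Amplitude", "Amplitude", "Amplitude", "Amplitude"],
    "Project management": ["Asana", "Trello", "Asana", "Asana", "ClickUp"],
    "Customer support": ["Freshdesk", "Zendesk", "Zendesk", "Zendesk", "Zendesk"],
    "Social media management": ["Hootsuite", "Hootsuite", "Buffer", "Buffer", "Buffer"],
    "Landing page builders": ["Unbounce", "Unbounce", "Landingi", "Unbounce", "Webflow"],
    "Lead generation": ["HubSpot", "HubSpot", "SyncGTM", "Apollo.io", "Apollo.io"],
    "Marketing automation": ["Marketo", "HubSpot", "ActiveCampaign", "ActiveCampaign", "HubSpot"],
    "AI chatbots": ["Dialogflow", "Dialogflow", "ChatGPT", "ChatGPT", "ChatGPT"],
    "Scheduling": ["Calendly", "Calendly", "Celoxis", "Calendly", "Calendly"],
    "Survey / forms": ["SurveyMonkey", "Typeform", "SurveyMonkey", "SurveyNinja", "SurveyMonkey"],
    "GEO / AI visibility": ["Ahrefs", "Ahrefs", "Ahrefs", "Siftly", "Profound"],
}

MEMORY_COLUMNS = slice(0, 2)    # Llama 3.3 via Groq, Cohere Command-A
GROUNDED_COLUMNS = slice(2, 5)  # Gemini 2.5 Flash-Lite, Perplexity, ChatGPT


def disjoint_categories(table):
    """Categories where no memory-engine #1 pick matches any grounded-engine #1 pick."""
    return [
        category
        for category, picks in table.items()
        if not set(picks[MEMORY_COLUMNS]) & set(picks[GROUNDED_COLUMNS])
    ]


print(disjoint_categories(TOP_PICKS))
# ['Email marketing', 'Social media management', 'Lead generation', 'AI chatbots']
```

Because it uses only the #1 picks, this is a coarse check; the full ranked lists in the open dataset would be needed to test the split more thoroughly.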

Source bias

Where AI gets its recommendations

When the grounded engines backed their answers with live web search, these are the domains they cited most. This is the map of where to be cited if you want AI to recommend you.
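A cited-domain map like this is a source-citation tally. A minimal sketch of that kind of count, assuming the grounded answers' citations are available as a plain list of URLs, groups them by host with the standard library's `urllib.parse.urlparse` and `collections.Counter`. The URLs are made-up placeholders, not the audit's citations, and folding `www.` into the bare domain is an assumption.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_tally(cited_urls):
    """Count cited URLs per host, folding a leading 'www.' into the bare domain."""
    hosts = Counter()
    for url in cited_urls:
        host = urlparse(url).hostname  # None when the string has no network location
        if not host:
            continue
        if host.startswith("www."):
            host = host[len("www."):]
        hosts[host] += 1
    return hosts


# Placeholder citations for illustration only.
citations = [
    "https://www.reviews.example/best-crm",
    "https://reviews.example/email-tools",
    "https://blog.vendor.example/seo-guide",
    "not a url",
]
print(domain_tally(citations).most_common())
# [('reviews.example', 2), ('blog.vendor.example', 1)]
```

Subdomains other than `www.` are kept separate here; merging them to a registrable domain would need a public-suffix list, which this sketch does not attempt.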

By category

What AI recommends in each category

The consensus short list per category, blended across all 5 engines.

AI chatbots
Dialogflow, ChatGPT, Microsoft Bot Framework, ManyChat
AI copywriting
Jasper, Copy.ai, WordLift, Writesonic
CRM
HubSpot, Salesforce, Zoho, Pipedrive
Customer support
Zendesk, Freshdesk, Intercom, Salesforce Service Cloud
Email marketing
Mailchimp, Klaviyo, ActiveCampaign, Constant Contact
GEO / AI visibility
Ahrefs, SEMrush, Profound, Siftly
Landing page builders
Unbounce, Instapage, Leadpages, Webflow
Lead generation
HubSpot, Apollo.io, LinkedIn Sales Navigator, ZoomInfo
Marketing automation
HubSpot, ActiveCampaign, Marketo, Mailchimp
Product analytics
Amplitude, Mixpanel, Google Analytics, Tableau
Project management
Asana, ClickUp, Trello, Jira
Sales engagement
Outreach, HubSpot, Apollo.io, Amplemarket
Scheduling
Calendly, Doodle, Acuity Scheduling, Setmore
SEO tools
Ahrefs, SEMrush, Surfer SEO, Google Search Console
Social media management
Hootsuite, Buffer, Sprout Social, Hootsuite Insights
Survey / forms
SurveyMonkey, Typeform, Google Forms, Qualtrics

Is your tool in the short list?

Run the AI-Visibility check to see where your product lands when buyers ask AI for a recommendation.

FAQ

Frequently asked questions

Is AI visibility the same as SEO?
No. AI visibility is whether AI assistants name your tool when a buyer asks for a recommendation; SEO is your rank in the blue links. The two overlap only partly, so AI visibility is a separate, winnable race.
How is the AI-Visibility Score calculated?
It is a position-weighted count of how often a tool is recommended across the query set (a #1 recommendation counts more than a #5), normalized 0 to 100, with tools recommended across more categories scoring higher. The full method and open data are on this page.
Does this rank product quality?
No. It measures what AI engines OUTPUT when asked, not which tool is objectively best. AI outputs reflect what their training data or the live web says, and can be wrong or biased. Treat it as a visibility snapshot, not an endorsement.
Why do the engines disagree so much?
Model-memory engines recall the established names from training data, while web-grounded engines read current pages and surface newer or trending tools. Same question, different information source, different answer.
How often is it updated?
It is a dated snapshot. AI answers shift, so the protocol is built to be re-run, and each refresh produces a delta showing how the recommendations moved.
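A refresh delta can be as simple as diffing two {tool: score} snapshots. This sketch assumes each snapshot is a dict of AI-Visibility Scores and treats a tool missing from a snapshot as scoring 0 there. `score_delta` is a hypothetical helper, not part of the published protocol, and the tool names and scores are invented to show the output shape.

```python
def score_delta(old, new):
    """Return (tool, old_score, new_score, change) rows, biggest movers first.

    A tool absent from a snapshot is treated as scoring 0 there (an assumption).
    """
    rows = []
    for tool in set(old) | set(new):
        before = old.get(tool, 0.0)
        after = new.get(tool, 0.0)
        rows.append((tool, before, after, round(after - before, 1)))
    # Sort by absolute change, then by name so ties are deterministic.
    rows.sort(key=lambda row: (-abs(row[3]), row[0]))
    return rows


# Invented snapshots for illustration only.
previous_run = {"ToolA": 100.0, "ToolB": 44.4}
next_run = {"ToolA": 100.0, "ToolB": 40.1, "ToolC": 12.0}

for tool, before, after, change in score_delta(previous_run, next_run):
    print(f"{tool:6} {before:6.1f} -> {after:6.1f} ({change:+.1f})")
```

Sorting by absolute change puts new entrants and big drops at the top, which is usually what a reader of a refresh wants to see first.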
Method & honesty floor

How this was measured

SOURCED: every recommendation is a recorded output of a live AI engine. Engines audited: Llama 3.3 70B via Groq (model memory, 48 queries); Cohere Command-A (model memory, 48 queries); Gemini 2.5 Flash-Lite (web-grounded, 16 queries); Perplexity (web-grounded via web search, 16 queries); ChatGPT (GPT-5, web-grounded, 16 queries). Across 16 B2B/GTM software categories, the two model-memory engines answered all three phrasings per category; the three web-grounded engines (Gemini via API, ChatGPT and Perplexity via browser) answered one query per category, capturing live web sources where available.

DERIVED: the AI-Visibility Score (position-weighted recommendation frequency, normalized 0 to 100), per-category leaderboards, cross-engine agreement, and source-citation tallies.

HONESTY FLOOR: this measures what AI engines OUTPUT, not product quality or our opinion. Search-grounded engines reflect the live web; model-memory engines reflect training data and can name tools that do not exist or omit real leaders. Because the grounded engines are sampled less densely than the memory engines (16 queries each versus 48), the blended leaderboard leans slightly toward model-memory; the per-engine comparison above separates them. Outputs vary by phrasing, date, and engine; this is a dated snapshot, not a ranking endorsement. Browser passes (ChatGPT, Perplexity) are layered in as documented.
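The sampling skew can be quantified from the per-engine query counts in the method note: 48 queries each for the two memory engines and 16 each for the three grounded engines. This sketch computes each engine type's share of all engine-query answers and, as a hypothetical correction that the audit does not claim to apply, a per-answer weight that would give every engine equal total influence. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# (engine type, number of queries answered), from the method note.
ENGINES = {
    "Llama 3.3 70B via Groq": ("memory", 48),
    "Cohere Command-A": ("memory", 48),
    "Gemini 2.5 Flash-Lite": ("grounded", 16),
    "Perplexity": ("grounded", 16),
    "ChatGPT": ("grounded", 16),
}


def answer_share(engines):
    """Fraction of all engine-query answers contributed by each engine type."""
    totals = {}
    for kind, queries in engines.values():
        totals[kind] = totals.get(kind, 0) + queries
    grand_total = sum(totals.values())
    return {kind: Fraction(n, grand_total) for kind, n in totals.items()}


def equal_engine_weights(engines):
    """HYPOTHETICAL reweighting: weight each answer by 1/queries so every engine sums to 1."""
    return {name: Fraction(1, queries) for name, (_, queries) in engines.items()}


print(answer_share(ENGINES))
# {'memory': Fraction(2, 3), 'grounded': Fraction(1, 3)}
```

By raw answer count, the two memory engines supply two-thirds of the 144 answers; under equal per-engine weighting their share would fall to two-fifths, matching their two seats out of five engines.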

Engines: 5 · Categories: 16 · Queries: 48 · Recommendations: 716 · License: CC BY 4.0 · Snapshot: 2026-06-19 · DOI: 10.5281/zenodo.20767878

Open data: ai-recommendation-audit-2026.json, free to reuse with attribution to Lucreya. It records what AI engines OUTPUT, not product quality, and is a dated snapshot; re-run the protocol to reproduce it.

Cite this dataset

Couey, V. W. (2026). The AI Recommendation Audit 2026: Which B2B/GTM Software AI Assistants Recommend [Data set]. Lucreya. https://doi.org/10.5281/zenodo.20767878
