Testing methodology

How we test AI GTM tools

Every Lucreya verdict follows the same protocol: one shared brief per vertical, a seven-dimension rubric, and the failure modes published next to the wins. This page documents it in full, so readers can replicate any test and journalists can cite the method with confidence.

The review shape

Three passes. Nothing publishes that skips one.

Every roundup and comparison goes through three sequential passes. Each exposes failure modes the others miss, so we do not publish until all three are complete.

PASS 01

Hands-on the real job

Each tool runs the actual revenue-team task it claims to do, not a 10-minute trial. Marketing tools produce a full content brief; SEO/GEO tools audit a real target page and track a real brand; sales tools enrich a real list and build a real sequence.

PASS 02

Same brief, side by side

For comparisons we use identical inputs, identical starting state, and the same model versions where configurable, with each tool in its own session. We capture the real outputs, elapsed time, approval steps, and any decisions the tool makes silently. The result is a real diff that survives scrutiny, never a "best of" pulled from marketing pages.
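A side-by-side is only fair if the inputs really were identical. One way to enforce that before diffing outputs is to fingerprint each session's brief and starting state and refuse to compare sessions whose fingerprints or pinned model versions disagree. This is a minimal sketch under stated assumptions: the `Session` record shape and the `comparison_problems` helper are hypothetical, not Lucreya's actual tooling.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    """One tool's run in a side-by-side comparison (hypothetical record shape)."""
    tool: str
    brief: str  # full text of the shared brief
    starting_state: dict = field(default_factory=dict)  # settings at session start
    model_version: Optional[str] = None  # None when the tool exposes no model choice


def input_fingerprint(session: Session) -> str:
    """SHA-256 over the brief and starting state; identical inputs give identical digests."""
    payload = json.dumps(
        {"brief": session.brief, "state": session.starting_state},
        sort_keys=True,  # key order must not change the digest
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def comparison_problems(a: Session, b: Session) -> List[str]:
    """Reasons two sessions are not a fair side-by-side; an empty list means fair."""
    problems = []
    if a.tool == b.tool:
        problems.append("both sessions ran the same tool")
    if input_fingerprint(a) != input_fingerprint(b):
        problems.append("brief or starting state differs")
    # Model versions are only comparable when both tools let us pin one.
    if a.model_version and b.model_version and a.model_version != b.model_version:
        problems.append("model versions differ")
    return problems
```

Two sessions built from the same brief and settings return an empty list; change one word of the brief or pin a different model and the comparison is flagged instead of published.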

PASS 03

Where it loses, first

Every verdict names what the tool is bad at and who should not buy it. We write the "skip if" before the "buy if", because the negative space is what makes the recommendation trustworthy. A review without weaknesses is content marketing.

Scoring rubric

Seven dimensions, scored out of 10

Weights vary slightly by vertical (sales tools weight data accuracy higher; GEO tools weight measurement honesty higher), but the dimensions are stable across the network.

Dimension | What we measure | Default weight
Output quality | Quality of what the tool actually produces on the shared brief: copy, audits, enriched records, sequences. | 25%
Workflow fit | How smoothly it slots into a real GTM stack: CRM, CMS, ad platforms, the editor people already use. | 15%
Real total cost | List price vs true cost: seat math, credit overages, tier-flips, and the minimums the sales team pushes. | 15%
Speed and reliability | Latency, uptime, rate and credit limits, deliverability controls, and error handling under real load. | 15%
Data and accuracy | For data tools: match rates and field accuracy against a hand-verified sample. For content: factual and brand-voice fidelity. | 10%
Time-to-value | Time from signup to first useful output. Onboarding, templates, and default quality. | 10%
Renewal case | The honest reason to still be paying in month 12, not just the reason to start a trial. | 10%
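One plausible way to roll the seven 0-10 scores into a single verdict score is a weighted average using the default weights in the rubric. This is a minimal sketch, not Lucreya's published formula: the dimension keys, the division by total weight, and the sales-vertical override value are all assumptions made for illustration.

```python
from typing import Dict, Mapping

# Default weights from the rubric, as integer percentages (they sum to 100).
DEFAULT_WEIGHTS: Dict[str, int] = {
    "output_quality": 25,
    "workflow_fit": 15,
    "real_total_cost": 15,
    "speed_and_reliability": 15,
    "data_and_accuracy": 10,
    "time_to_value": 10,
    "renewal_case": 10,
}


def composite_score(scores: Mapping[str, float],
                    weights: Mapping[str, int] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-dimension scores (each 0-10), on the same 0-10 scale.

    Dividing by the total weight (an assumption) lets a vertical raise one
    weight without hand-rebalancing the others.
    """
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} score {value} is outside 0-10")
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


# Hypothetical sales-vertical override: data accuracy weighted higher.
SALES_WEIGHTS = {**DEFAULT_WEIGHTS, "data_and_accuracy": 20}
```

For example, a tool scoring 10 on output quality and 8 on every other dimension lands at 8.5 under the default weights. Keeping weights as integer percentages avoids floating-point drift in the sum.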
Per-vertical protocols

One shared brief per vertical

Each vertical has a fixed brief every tool runs, so scores compare like for like. Real outputs are kept on file.

AI marketing & content

The same content brief

Each tool produces the identical package from one source brief: a long-form post, five ad variants, and three social posts. We grade output quality, brand-voice control, editing time saved, and how the tool behaves when credits run low.

AI SEO & GEO

One target page, one tracked brand

Each tool audits the same target page and, for GEO tools, tracks the same brand across AI answers. We verify every visibility claim against the live engines (ChatGPT, Perplexity, Google AI) and are explicit about what a tool can and cannot actually measure today.
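Checking visibility claims against the live engines reduces to a per-engine tally: of the mentions a tool reports, how many did our own check reproduce? The sketch below assumes a hypothetical data shape (a claim keyed by engine name and prompt, valued by whether the brand appeared) and hypothetical function names; it illustrates the bookkeeping, not Lucreya's actual scripts.

```python
from collections import defaultdict
from typing import Dict, Tuple

# (engine, prompt) -> whether the brand appeared in the AI answer (hypothetical shape)
Claim = Tuple[str, str]


def confirmation_rates(tool_claims: Dict[Claim, bool],
                       observed: Dict[Claim, bool]) -> Dict[str, float]:
    """Per engine, the share of the tool's claims that our live check reproduced.

    Claims we did not re-check are left out rather than assumed true.
    """
    checked: Dict[str, int] = defaultdict(int)
    confirmed: Dict[str, int] = defaultdict(int)
    for (engine, prompt), claimed in tool_claims.items():
        if (engine, prompt) not in observed:
            continue  # never count an unverified claim as confirmed
        checked[engine] += 1
        if observed[(engine, prompt)] == claimed:
            confirmed[engine] += 1
    return {engine: confirmed[engine] / checked[engine] for engine in checked}
```

Reporting the rate per engine, rather than one blended number, keeps a tool that is accurate for one engine and unreliable for another from looking uniformly trustworthy.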

AI sales & outbound

One ICP, one list

Data and enrichment tools run the same list and ICP; we measure match rates and field accuracy against a hand-verified sample. Sequencing and SDR tools build the same campaign; we measure setup time, deliverability controls, and how honest the "autonomy" really is.
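Match rate and field accuracy answer different questions, so it helps to keep them separate: match rate asks how many input rows came back with any record, and field accuracy asks how many returned values agree with the hand-verified sample. This is a minimal sketch; the record shape, the trim-and-case-fold normalization, and the choice to exclude unmatched rows from the accuracy figure are assumptions, not Lucreya's stated rules.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def _normalize(value: Optional[str]) -> str:
    """Trim and case-fold so 'VP Sales' and 'vp sales ' count as the same value."""
    return (value or "").strip().casefold()


def match_rate(enriched: Dict[str, Optional[Record]]) -> float:
    """Share of input rows for which the tool returned a record at all."""
    if not enriched:
        raise ValueError("empty list")
    matched = sum(1 for record in enriched.values() if record)
    return matched / len(enriched)


def field_accuracy(enriched: Dict[str, Optional[Record]],
                   verified: Dict[str, Record],
                   fields: List[str]) -> Optional[float]:
    """Share of checked fields that equal the hand-verified value.

    Only rows that are in the verified sample AND matched by the tool are
    checked; unmatched rows already count against match_rate. Returns None
    when nothing could be checked.
    """
    checked = correct = 0
    for row_id, truth in verified.items():
        record = enriched.get(row_id)
        if not record:
            continue
        for name in fields:
            if name not in truth:
                continue  # no verified value for this field
            checked += 1
            if _normalize(record.get(name)) == _normalize(truth[name]):
                correct += 1
    return correct / checked if checked else None
```

Keeping the two numbers apart stops a tool from looking accurate simply by declining to match hard rows, and stops a high match rate from hiding wrong field values.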

Testing environment

Disclosed, so anyone can replicate it

Primary workstation: Windows 10 Pro, AMD Ryzen + Radeon RX 6600
Secondary platforms: WSL2 (Ubuntu); macOS where vendor-specific
Browser baseline: Chrome (current stable), default settings
Network: Residential broadband, no VPN, US East
Account state: Paid plans bought with our own funds where required
Pricing source: Verified on the live vendor page at the time of publishing
Conflict of interest

Every commercial relationship, disclosed

See a problem? If you find a factual error, an outdated price, or a methodology inconsistency, email [email protected] with the article URL and the issue. We fix factual errors within 48 hours and add a correction note at the top of the affected article.
Update cadence

AI GTM tooling moves weekly. We re-test on three triggers.

Every article shows a "Last updated" date, and the full revision history is preserved in the site's git repository.

Author

Who runs these tests

Reviews are written by Vincent Couey, founder and lead reviewer. His evidence-first background in computational toxicology and physics informs the testing standards above. Read more about Lucreya and the team.
