The State of AI Answers, July 2026
What 721 archived AI answers actually said.
A snapshot across ChatGPT, Perplexity, and Google AI Overviews: 66 buyer prompts over 32 category baskets, measured 2026-07-02 to 2026-07-04. Every number below keeps its denominator, and the caveats section is part of the findings, not fine print.
| Archived runs | Engines | Distinct prompts | Category baskets |
|---|---|---|---|
| 721 | 3 | 66 | 32 |
How often do the brands in an AI answer change between runs?
Most of the time. On prompts with six or more archived runs, the named-brand set churned in 37 of 57 (64.9%). Only 3 of 57 held fully stable.
For every prompt with six or more archived runs, we score run-to-run agreement of the full named-brand set per engine and grade the prompt by its worst engine. The result is the headline of this edition: on 37 of 57 prompts (64.9%), identical prompts returned a materially different brand roster more than half the time. Another 17 (29.8%) were mixed, and the set repeated perfectly on only 3 (5.3%). The remaining 9 of the archive's 66 prompts had fewer than six runs and are excluded here rather than graded on thin data.
The practical reading: a single screenshot of an AI answer is an anecdote, whichever way it flatters. One brand can sit stably in an answer while the rest of the roster churns around it, which is why presence counts and set agreement are separate measurements on the methods page. This instability is also why nothing here reports a per-query rank.
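A minimal sketch of this kind of grading, in Python. The exact metric is defined on the methods page and is not reproduced here, so several details are assumptions for illustration only: Jaccard similarity as the agreement measure, a hypothetical `threshold` of 0.8 for what counts as a "materially different" roster, counting the six-run minimum across all engines, and every function name. Only the six-run minimum, the worst-engine rule, and the three grades come from the text.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two brand sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement_rate(runs, threshold=0.8):
    """Share of run pairs whose brand sets are at least `threshold` similar."""
    pairs = list(combinations(runs, 2))
    agreeing = sum(1 for a, b in pairs if jaccard(a, b) >= threshold)
    return agreeing / len(pairs)


def grade_prompt(runs_by_engine, min_runs=6, threshold=0.8):
    """Grade one prompt by its worst engine; None means too few runs to grade."""
    if sum(len(runs) for runs in runs_by_engine.values()) < min_runs:
        return None
    # An engine needs at least two runs before any pair can be compared.
    gradable = [runs for runs in runs_by_engine.values() if len(runs) >= 2]
    if not gradable:
        return None
    if all(len({frozenset(s) for s in runs}) == 1 for runs in gradable):
        return "stable"  # every engine repeated its brand set perfectly
    worst = min(agreement_rate(runs, threshold) for runs in gradable)
    return "churned" if worst < 0.5 else "mixed"


# Hypothetical data: one engine is perfectly stable, the other never repeats.
example = {
    "engine_a": [{"X", "Y"}, {"X", "Y"}, {"X", "Y"}],
    "engine_b": [{"X"}, {"Y"}, {"Z"}],
}
print(grade_prompt(example))  # churned
```

Grading by the worst engine means a prompt that is rock-solid on one engine still counts as churned if another engine reshuffles the roster, which matches the caution against reading any single answer as representative.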
Whose pages do AI answers actually cite?
Third parties. In our 226-run category basket, 93.6% of 1,494 cited-domain instances carried no tracked brand; 4.8% were a named brand's own site.
In the AI-visibility-tools basket (226 runs, our own category), we classified all 1,494 cited-domain instances against the tracked brand list. 1,399 of them (93.6%) point at pages carrying no tracked brand at all: comparison posts, forums, videos, other people's editorial. Domains owned by any tracked brand account for the remaining 95 (6.4%), and only 71 (4.8%) are a brand's own domain cited in an answer that names that brand.
Seen from the brand side, self-citation is real but minor: of 221 brand-mention instances in the basket, 32.1% had the brand's own domain in the same answer's citations (the same 71 instances as above, measured against a different base). The rest of the time, the engine names a brand because a page someone else wrote says so. That is the case for treating third-party placement, rather than on-site tweaks alone, as the lever that moves AI answers.
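The three-way split and the two different bases can be sketched as follows. This is an illustration under stated assumptions, not the archive's classifier: the record layout, the `brand_domains` map, the bucket names, and the sample brands (`BrandA`, `BrandB`) are all hypothetical. The percentages at the end are recomputed from the counts quoted in the text.

```python
from collections import Counter


def classify_citations(answers, brand_domains):
    """Bucket every cited-domain instance by its relation to the tracked brands."""
    # Hypothetical ownership map, inverted to domain -> brand.
    owner = {domain: brand for brand, domains in brand_domains.items() for domain in domains}
    counts = Counter()
    for answer in answers:
        named = set(answer["brands"])
        for domain in answer["cited_domains"]:
            brand = owner.get(domain)
            if brand is None:
                counts["third_party"] += 1  # no tracked brand owns this domain
            elif brand in named:
                counts["self_cited"] += 1  # brand's own domain, brand named in the answer
            else:
                counts["brand_owned_not_named"] += 1
    return counts


def pct(part, whole):
    """Percentage rounded to one decimal, always computed from an explicit denominator."""
    return round(100 * part / whole, 1)


brand_domains = {"BrandA": {"branda.example"}, "BrandB": {"brandb.example"}}
answers = [
    {"brands": ["BrandA"], "cited_domains": ["branda.example", "reddit.com", "youtube.com"]},
    {"brands": ["BrandA"], "cited_domains": ["brandb.example", "reddit.com"]},
]
counts = classify_citations(answers, brand_domains)
print(dict(counts))

# The same 71 self-cited instances against the two bases quoted in the text.
print(pct(71, 1494))  # 4.8  (share of all cited-domain instances)
print(pct(71, 221))   # 32.1 (share of brand-mention instances)
```

The numerator is identical in both lines; only the denominator changes, which is why the two figures differ by a factor of almost seven without contradicting each other.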
How much of AI answers' grounding is YouTube and Reddit?
84.8% of prompts cited youtube.com (56 of 66); 80.3% cited reddit.com. They are the two most-cited domains overall.
Across all 66 prompts, youtube.com appeared in the cited domains of 56 (84.8%) and reddit.com in 53 (80.3%). By run volume they lead every vendor site and every publisher: youtube.com was cited in 319 of 721 runs and reddit.com in 260, against 111 for the next domain. The median prompt drew on 28 distinct cited domains (min 5, max 66), pooled across every run of that prompt. The grounding pool per question is wide, and video plus forum content sits in most of those pools.
| # | Domain | Runs citing it (of 721) |
|---|---|---|
| 1 | youtube.com | 319 |
| 2 | reddit.com | 260 |
| 3 | wavecnct.com | 111 |
| 4 | getmobly.com | 96 |
| 5 | hubspot.com | 91 |
| 6 | zuddl.com | 59 |
| 7 | blinq.me | 58 |
| 8 | linkedin.com | 56 |
| 9 | medium.com | 53 |
| 10 | semrush.com | 52 |
| 11 | popl.co | 48 |
| 12 | orbstack.dev | 47 |
| 13 | lensmor.com | 47 |
| 14 | eventleaf.com | 44 |
| 15 | apple.com | 43 |
The rest of the list is what you would hope: the actual vendors in the measured categories, plus the horizontal utility players. UGC platforms at the top, brand sites underneath. A domain counts once per run here even if an answer cites it repeatedly.
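The two counting rules used in this section (a domain counts once per run, and a prompt's pool is the union of domains across its runs) can be sketched like this. The record layout and field names are assumptions, and the sample runs are made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import median


def runs_citing(runs):
    """Number of runs that cite each domain; repeats inside one run count once."""
    counts = Counter()
    for run in runs:
        counts.update(set(run["cited_domains"]))  # set() drops within-run repeats
    return counts


def distinct_domains_per_prompt(runs):
    """Size of each prompt's pooled domain set, unioned across all of its runs."""
    pools = defaultdict(set)
    for run in runs:
        pools[run["prompt_id"]].update(run["cited_domains"])
    return {prompt: len(domains) for prompt, domains in pools.items()}


# Hypothetical runs: the first one cites youtube.com twice.
runs = [
    {"prompt_id": "p1", "cited_domains": ["youtube.com", "youtube.com", "reddit.com"]},
    {"prompt_id": "p1", "cited_domains": ["youtube.com", "vendor.example"]},
    {"prompt_id": "p2", "cited_domains": ["reddit.com"]},
]
by_run = runs_citing(runs)
pools = distinct_domains_per_prompt(runs)
print(by_run["youtube.com"])   # 2, not 3
print(pools)                   # {'p1': 3, 'p2': 1}
print(median(pools.values()))  # 2.0
```

Because the per-prompt pool is a union, it can only grow as more runs are added, so pool sizes are comparable only between prompts with similar run counts.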
Do ChatGPT, Perplexity, and Google AI Overviews agree?
No. In one basket the leading brand's mention share ran from 18.6% on ChatGPT to 56.7% on Perplexity, a 38.0-point spread on the same prompts.
Measurably, no. In the event-lead-capture basket, iCapture's mention share ran from 18.6% on ChatGPT (11 of 59 runs) to 56.7% on Perplexity (34 of 60), with Google AI Overviews in between. Same prompts, same days, different market map. The container-runtimes basket shows the opposite regime: Docker Desktop holds 93.3% or more on every engine. Engine skew is a property of the category, not a constant of the tools.
| Basket / leader | Google AI Overviews | ChatGPT | Perplexity |
|---|---|---|---|
| Event lead capture / iCapture | 48.0% (59/123) | 18.6% (11/59) | 56.7% (34/60) |
| Container runtimes / Docker Desktop | 93.3% (42/45) | 93.3% (14/15) | 100.0% (15/15) |
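As a worked check of the event-lead-capture row, the per-engine shares and the spread can be recomputed from the counts in the table. The variable and function names are illustrative; the counts are the ones published above.

```python
def share(mentions, runs):
    """Mention share in percent, kept alongside its denominator by the caller."""
    return 100 * mentions / runs


# (runs naming iCapture, total runs) per engine, from the table above.
engine_counts = {
    "Google AI Overviews": (59, 123),
    "ChatGPT": (11, 59),
    "Perplexity": (34, 60),
}

shares = {engine: share(m, r) for engine, (m, r) in engine_counts.items()}
spread = max(shares.values()) - min(shares.values())

for engine, (m, r) in engine_counts.items():
    print(f"{engine}: {shares[engine]:.1f}% ({m}/{r})")
print(f"spread: {spread:.1f} points")  # spread: 38.0 points
```

The spread is taken on the unrounded shares and rounded once at the end, which is why it comes out at 38.0 rather than the 38.1 you would get by subtracting the two rounded percentages.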
One engine-specific artifact worth knowing: Google AI Overviews returned an empty answer on 30 of 388 runs (7.7%), usually by not triggering an Overview at all. ChatGPT and Perplexity returned 0 empties across 333 combined runs. If your visibility tracking treats a non-triggered Overview as absence, it is measuring two different things with one number.
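One way to keep those two things apart is to report the trigger rate separately from presence. The sketch below is an assumption about how a tracker might do it, not a description of the archive: it marks a non-triggered answer with `brands=None`, and the ten sample runs are hypothetical.

```python
def presence_rates(runs, brand):
    """Separate 'did the engine answer at all' from 'was the brand in the answer'."""
    total = len(runs)
    triggered = [run for run in runs if run["brands"] is not None]
    hits = sum(1 for run in triggered if brand in run["brands"])
    return {
        "trigger_rate": len(triggered) / total,
        "share_of_all_runs": hits / total,
        # Undefined when nothing triggered, so report None rather than 0.
        "share_of_triggered_runs": hits / len(triggered) if triggered else None,
    }


# Hypothetical runs: 2 empty answers, 4 naming BrandA, 4 naming only BrandB.
runs = (
    [{"brands": None}] * 2
    + [{"brands": {"BrandA", "BrandB"}}] * 4
    + [{"brands": {"BrandB"}}] * 4
)
rates = presence_rates(runs, "BrandA")
print(rates)
# {'trigger_rate': 0.8, 'share_of_all_runs': 0.4, 'share_of_triggered_runs': 0.5}
```

The gap between 0.4 and 0.5 here comes entirely from the empty answers, so a tracker that publishes only one of the two numbers should say which denominator it used.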
Are AI answer categories winner-take-all?
It depends on the category. 22 of 28 small verification baskets had one brand at 80%+ mention share; in our own category the leader held 30.5%.
Both regimes exist, cleanly separated. In 22 of 28 small verification baskets (mostly 5 runs each, run as gate checks), a single brand appeared in 80% or more of runs: the answer is already decided, and it is usually the incumbent, not the newest entrant. Meanwhile the large baskets are contested. In our own category the leader, Profound, appears in 69 of 226 runs (30.5%), with Otterly at 26.1%. The AEO-agency basket is even flatter: the leading agency shows in 12 of 45 runs (26.7%). Nobody selling AI visibility, ourselves included, owned their own answer on this snapshot. Our own share is under a standing public test: the archive tracks whether our own playbook moves it, and the results go on the evidence log, nulls included.
| Basket | Runs | Leader | Runner-up |
|---|---|---|---|
| Event lead capture (tracked property unnamed; evidence log #007) | 242 | iCapture 43.0% (104/242) * | Cvent 43.0% (104/242) |
| Container runtimes (orbstack.dev basket) | 75 | Docker Desktop 94.7% (71/75) | OrbStack 82.7% (62/75) |
| AI visibility tools (our own category) | 226 | Profound 30.5% (69/226) | Otterly 26.1% (59/226) |
| AEO agencies | 45 | Discovered Labs 26.7% (12/45) | Profound 22.2% (10/45) |
* Dead tie: both brands appear in 104 of 242 runs. "Leader" there is an arbitrary label and the gap is zero.
The 28 verification baskets sit in categories currently in our outreach pipeline, so they publish in aggregate only: we never name companies we might be talking to (the data-handling rules are on the methods page). One large basket's tracked property is unnamed for a different reason: it has outside stakeholders, and that redaction is itself a logged entry (#007) on the evidence log. The container-runtimes basket shows concentration is not only a small-sample artifact: across 75 runs, Docker Desktop appears in 94.7% of answers, with OrbStack at 82.7%. In a fragmented category, placement work can still change the answer. In a decided one, the honest first deliverable is the concentration reading itself, because it tells you which fight you are in before you fund the wrong one.
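A concentration reading of this kind reduces to a few lines. The sketch below is illustrative: the function name, the output keys, and the "decided" / "contested" labels are assumptions layered on the text's wording, while the 80% cut-off and the counts are taken from the figures above.

```python
def concentration_reading(mention_runs, total_runs, decided_at=0.80):
    """Summarize a basket: who leads, by how much, and which regime it is in."""
    ranked = sorted(mention_runs.items(), key=lambda item: item[1], reverse=True)
    top = ranked[0][1]
    leaders = [brand for brand, n in ranked if n == top]  # more than one means a tie
    leader_share = top / total_runs
    return {
        "leaders": leaders,
        "leader_share_pct": round(100 * leader_share, 1),
        "tie": len(leaders) > 1,
        "regime": "decided" if leader_share >= decided_at else "contested",
    }


# Counts from the table above.
containers = concentration_reading({"Docker Desktop": 71, "OrbStack": 62}, 75)
events = concentration_reading({"iCapture": 104, "Cvent": 104}, 242)
ai_tools = concentration_reading({"Profound": 69, "Otterly": 59}, 226)

for name, reading in [("containers", containers), ("events", events), ("ai_tools", ai_tools)]:
    print(name, reading)
```

Returning a list of leaders rather than a single name keeps a dead tie visible in the output instead of hiding it behind an arbitrary "leader" label.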
What can a two-day snapshot not tell you?
Trends. All 721 runs sit inside a two-day window, so this edition makes no time-axis claims. The archive re-measures weekly, so later editions will be able to.
This section is part of the findings. What this edition cannot support, by construction:
- No trends. Every run landed between 2026-07-02 and 2026-07-04. Nothing here says answers are getting more or less stable, or that any brand is rising. The archive re-measures weekly, so the time axis accrues on its own; later editions inherit it.
- Uneven engine mix. Google AI Overviews contributes 388 of 721 runs (53.8%), so any figure not broken out per engine leans toward Google's behavior.
- Small baskets stay small. The 28 verification baskets are mostly 5 runs each. They are labeled as such and never mixed into the large-basket figures.
- The self-listing classifier is deliberately conservative. It matches brands to domains by name, not by verified ownership, so brands whose domain does not carry their name can be undercounted as self-listed. Note the direction: correcting this would raise the self-citation share, not the third-party one.
- Per-prompt domain counts grow with runs. The distinct-domains-per-prompt figure unions across all runs of a prompt, so heavily-run prompts show wider pools.
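To make the uneven-engine-mix caveat concrete, the sketch below compares a pooled share with an equal-weight average of the per-engine shares, using the iCapture counts from the event-lead-capture table. The equal-weight figure is an illustration computed here from those published counts; it is not a number the study itself reports, and the variable names are hypothetical.

```python
from statistics import mean

# (runs naming iCapture, total runs) per engine, from the engine comparison table.
engine_counts = {
    "Google AI Overviews": (59, 123),
    "ChatGPT": (11, 59),
    "Perplexity": (34, 60),
}

total_mentions = sum(m for m, _ in engine_counts.values())
total_runs = sum(r for _, r in engine_counts.values())

# Pooled share: every run counts equally, so the engine with most runs dominates.
pooled = 100 * total_mentions / total_runs

# Equal-weight share: every engine counts equally, regardless of run volume.
equal_weight = mean(100 * m / r for m, r in engine_counts.values())

google_weight = 100 * engine_counts["Google AI Overviews"][1] / total_runs

print(f"pooled: {pooled:.1f}% ({total_mentions}/{total_runs})")  # pooled: 43.0% (104/242)
print(f"equal-weight: {equal_weight:.1f}%")                      # equal-weight: 41.1%
print(f"Google's weight in the pooled figure: {google_weight:.1f}%")
```

In this basket Google AI Overviews supplies about half of the runs, so the pooled figure sits closer to Google's 48.0% than an engine-neutral average would.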
How were these numbers made?
From the append-only archive: 721 runs, 3 engines, 66 prompts. Every figure keeps its denominator, and per-query rank is never computed or sold.
Everything above is computed from our append-only archive: 721 runs, 66 distinct buyer prompts, three engines, archived with timestamps from run #1 and never rewritten. The methods page fully discloses what we query and how baskets, variance bands, and the brand-set metric work. It also states what we refuse to measure: per-query rank does not appear in this study and never will, because it does not survive repetition. Prospect and client readings are excluded from publication by rule.
If you want this instrument pointed at your own category, GapCheck runs the same basket methodology on the prompts your buyers ask, with the same variance bands on every number.