  • Anyone else find that AI visibility tracking tools just give you a different number every week with no actual signal?

    Posted by soman_yadav on May 6, 2026 at 6:31 am

    Been testing tools for tracking how often my client’s brand gets cited by ChatGPT/Perplexity. Tried 3 of the popular ones and the numbers kept jumping around. One week we’d be at 60% mention rate, next week 35%, then back up to 50%. We hadn’t changed anything.
    At first I thought the tools were buggy. Then I ran the same prompt manually 10 times in a row in ChatGPT.
    Got a different answer almost every time. Different brands appearing in different orders. Sometimes my client wasn’t mentioned at all, sometimes they were the top recommendation. Same prompt, same model, same day.
    So the issue isn’t the tools. It’s that LLMs are non-deterministic and most tools are just running the prompt once and reporting that as data. Which is basically a coin flip.
    I did the math out of curiosity. If your “40%” mention rate came from 4 mentions in 10 runs, the 95% confidence interval on that is something like 12% to 74%. So saying you’re at 40% is meaningless without telling people your sample size.
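    For anyone who wants to sanity-check that interval, here’s a rough Python sketch (assuming scipy is available; this computes the Clopper-Pearson exact binomial interval, and any other method gives a similarly wide range at n=10):

    ```python
    # Exact (Clopper-Pearson) 95% confidence interval for an observed mention rate.
    # The interval bounds come from the beta distribution.
    from scipy.stats import beta

    def mention_rate_ci(mentions, runs, alpha=0.05):
        lower = beta.ppf(alpha / 2, mentions, runs - mentions + 1) if mentions > 0 else 0.0
        upper = beta.ppf(1 - alpha / 2, mentions + 1, runs - mentions) if mentions < runs else 1.0
        return lower, upper

    lower, upper = mention_rate_ci(4, 10)
    print(f"4/10 mentions -> point estimate 40%, 95% CI {lower:.0%} to {upper:.0%}")
    # 4/10 mentions -> point estimate 40%, 95% CI 12% to 74%
    ```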
    Most tools don’t show sample size or confidence intervals because running each prompt 10+ times costs them 10x more in API fees. Economics push them to single-run snapshots.
    Question for the sub: anyone found a tool that actually does this properly? Or is everyone just using the noisy numbers and pretending they’re real? Because right now I’m telling clients I can’t actually measure their AI visibility reliably and it’s a hard sell.
    Also open to manual workflows if anyone has one that doesn’t take 4 hours per audit.

  • Shadowdancerdone

    Guest
    May 6, 2026 at 6:50 am

    that’s just how LLMs are. mix in the temperature settings on the backend, plus the model fetching different search results each time, and you get a different answer

    you’re absolutely right about the 10x fee bit. don’t think there’s a viable alternative as of today

  • Responsible-Tax-4938

    Guest
    May 6, 2026 at 7:01 am

    I too have been struggling to get reliable data for AI visibility, and nothing seems close as of now.
    Your analysis makes sense here: since the same brand is not cited every time, it is difficult to track, and the API fees make repeated runs expensive.
    I would love to know how this is going to pan out in the near future.

  • PDFBearSupport

    Guest
    May 6, 2026 at 7:02 am

    Because they don’t work.

  • Ranketta

    Guest
    May 6, 2026 at 7:09 am

    Hey, Matt from Ranketta here.
    Not sure which tools you tried, but it sounds like they were subpar.
    – Since you mention API costs, it suggests you used a tool that tracks prompts via API (well, running a prompt-tracking simulation would be a better term, shoutout to u/pearlswine).

    This does NOT give you an idea of what users see in the UI. Measuring asset visibility this way is simply wrong. The environment behaves wildly differently when you ask through the UI vs. the API (the API has no personalization or localization, the UI can route to multiple models, and so on).

    – Serious tools use methods suited to measuring asset visibility within a probabilistic system (prompting the UI and not the API, running the prompt many times across a volume of accounts behind residential proxies, and so on).

    Happy to go into greater detail.

  • indianrodeo

    Guest
    May 6, 2026 at 7:12 am

    the scummiest software category to exist

  • stovetopmuse

    Guest
    May 6, 2026 at 9:38 am

    You’re not crazy; I saw the same thing when rerunning identical prompts across different sessions. The variance gets even worse on smaller brands because one extra mention swings the percentage massively. Feels like most “AI visibility” dashboards are treating probabilistic outputs like rank tracking, which is kind of a broken model from the start.

  • Jammurger

    Guest
    May 6, 2026 at 10:32 am

    The math you laid out is right and nobody in the vendor space wants to say it out loud because it undermines the product category they’re selling.

    The non-determinism problem is structural, not a tooling problem waiting to be solved. Even if a tool ran every prompt 20 times and averaged the results, the confidence intervals would still be too wide to make meaningful week-over-week comparisons. You’d need hundreds of runs per prompt per week to get stable data, and the economics of that are completely unrealistic at current API pricing.
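    To put a number on “hundreds of runs”: here’s a rough sketch using the standard two-proportion sample-size formula (normal approximation, 95% confidence, 80% power; those thresholds are my assumptions, and an exact power calculation will differ a bit):

    ```python
    # Rough runs-per-week needed to detect a week-over-week change in mention rate,
    # using the standard two-proportion sample-size formula.
    from math import ceil

    def runs_needed(p1, p2, z_alpha=1.96, z_beta=0.84):
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

    print(runs_needed(0.40, 0.50))  # ~385 runs to reliably spot a 10-point move
    print(runs_needed(0.40, 0.45))  # ~1529 runs for a 5-point move
    ```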

    The honest answer for clients right now is what you’re already doing — telling them you can’t reliably measure ChatGPT/Perplexity citation rates. That’s not a gap in your service, it’s an accurate description of the current state of the space. Any tool claiming otherwise is either misrepresenting their methodology or their sample sizes.

    The one slice of “AI visibility” that is actually measurable is Google AI Overviews. Same query, consistent results, trackable over time because it runs against a live index rather than a probabilistic model. That’s where I’d focus client reporting right now. Semust tracks this per keyword daily (which queries trigger an AI Overview, which domains get cited), and the data is stable enough to show real trends. Not the same as ChatGPT visibility, but at least it’s a number that means something.

    For the LLM side the closest thing to a useful manual workflow is picking a fixed set of 10-15 high-intent prompts, running each one 5 times per month, and tracking directional trends rather than point estimates. Acknowledge the noise explicitly in the report. It’s less impressive than a dashboard but it’s more honest.
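    If it helps, here’s a minimal sketch of that workflow as a script. It assumes the OpenAI Python SDK and a made-up brand and prompt list (and per the UI-vs-API point above, treat API answers as a proxy, not ground truth):

    ```python
    # Monthly mention-rate sampler: fixed prompt set, N runs each,
    # reporting raw counts so the noise stays visible.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    BRAND = "AcmeCRM"  # hypothetical client brand
    PROMPTS = [        # fixed set of high-intent prompts
        "What is the best CRM for small agencies?",
        "Recommend a CRM with good email integration.",
    ]
    RUNS_PER_PROMPT = 5

    for prompt in PROMPTS:
        mentions = 0
        for _ in range(RUNS_PER_PROMPT):
            resp = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            if BRAND.lower() in resp.choices[0].message.content.lower():
                mentions += 1
        # Report counts, not just percentages: 3/5 and 60% read very differently.
        print(f"{prompt!r}: {mentions}/{RUNS_PER_PROMPT} runs mentioned {BRAND}")
    ```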

