Page Grounding Probe [Free AI SEO Tool] by DEJAN SEO
How Google’s Grounding Pipeline Works
DEJAN reverse-engineered Google’s Gemini grounding pipeline by examining the raw groundingSupports and groundingChunks from the API. The pipeline operates in this sequence:
- User enters a prompt.
- Query fanout: A model decomposes the prompt into single-intent sub-queries (fanout queries).
- Retrieval: For each fanout query, Google’s search index returns ranked results, narrowed to ~5–20 sources per query.
- Extractive summarization (snippet construction): For each selected result, the system builds a grounding snippet. Page content is chunked into sentences, each scored against the query, and the highest-scoring chunks are assembled into the snippet — joined by ellipses where non-contiguous.
- Grounding context assembly: All snippets across all sources are supplied to the model as context alongside the user prompt, media, and personalization signals.
- Synthesis & attribution: The model generates its answer, and each claim is attributed back to specific source sentences.
Key insight: Because snippets are query-dependent, the same page yields different extractions for different fanout queries.
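The attribution step in the sequence above can be inspected from a response's grounding metadata. The following is a minimal sketch, assuming the metadata has already been parsed into a Python dict. The field names (groundingChunks, groundingSupports, segment, groundingChunkIndices, confidenceScores) follow the Gemini grounding metadata as publicly documented and should be checked against the current API reference. The sample values and the helper name claims_by_source are hypothetical.

```python
from collections import defaultdict

# Hand-written sample shaped like grounding metadata.
# All values are illustrative, not real API output.
sample_metadata = {
    "groundingChunks": [
        {"web": {"uri": "https://example.com/a", "title": "Example A"}},
        {"web": {"uri": "https://example.com/b", "title": "Example B"}},
    ],
    "groundingSupports": [
        {
            "segment": {"text": "Query fanout splits a prompt into sub-queries."},
            "groundingChunkIndices": [0],
            "confidenceScores": [0.91],
        },
        {
            "segment": {"text": "Snippets are built from top-scoring sentences."},
            "groundingChunkIndices": [0, 1],
            "confidenceScores": [0.62, 0.35],
        },
    ],
}


def claims_by_source(metadata):
    """Group generated claims (segments) under the source URI that supports them."""
    chunks = metadata.get("groundingChunks", [])
    grouped = defaultdict(list)
    for support in metadata.get("groundingSupports", []):
        text = support.get("segment", {}).get("text", "")
        indices = support.get("groundingChunkIndices", [])
        scores = support.get("confidenceScores", [])
        for position, chunk_index in enumerate(indices):
            uri = chunks[chunk_index]["web"]["uri"]
            # Scores may be missing or shorter than the index list.
            score = scores[position] if position < len(scores) else None
            grouped[uri].append((text, score))
    return dict(grouped)


if __name__ == "__main__":
    for uri, claims in claims_by_source(sample_metadata).items():
        print(uri)
        for text, score in claims:
            print(f"  {score}: {text}")
```

Grouping by source URI makes it easy to see which pages contributed to which claims, and with what per-chunk score.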
The Extraction Method: Extractive Summarization
Google uses extractive (not abstractive) summarization for grounding. This means it pulls exact sentences from your page — it does not rewrite or paraphrase your content for the grounding context.
Observed Extraction Characteristics
- Query-focused selection: Sentences semantically close to the query are strongly preferred. Unrelated sections on the same page are skipped entirely.
- Heavy positional/lead bias: Opening paragraphs are extracted almost wholesale, regardless of content.
- Structural noise ingestion: Table-of-contents entries, section headers, link artifacts, and ¶ markers are treated as sentences and scored alongside prose.
- Sentence-level granularity: The extraction unit is individual sentences, not passages or paragraphs.
- Confidence scores: Per-chunk scores range from 0.1 to 1.0, representing grounding-source-to-generative-chunk relevance.
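The query-focused, sentence-level behavior described above can be illustrated with a toy extractor. This is a sketch under stated assumptions, not Google's method: it scores sentences by simple token overlap with the query (a stand-in for whatever semantic scoring is actually used), ignores the positional bias noted above, and uses hypothetical names (build_snippet, top_k) throughout.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(sentence, query):
    """Fraction of query tokens found in the sentence (assumed stand-in metric)."""
    query_tokens = tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokens(sentence)) / len(query_tokens)


def build_snippet(page_text, query, top_k=2):
    """Pick the top_k scoring sentences and join them in page order.

    Non-adjacent sentences are joined with an ellipsis, adjacent ones with a space.
    """
    sentences = split_sentences(page_text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i], query),
        reverse=True,
    )
    chosen = sorted(i for i in ranked[:top_k] if score(sentences[i], query) > 0)
    parts = []
    for position, index in enumerate(chosen):
        if position > 0:
            adjacent = index == chosen[position - 1] + 1
            parts.append(" " if adjacent else " ... ")
        parts.append(sentences[index])
    return "".join(parts)


page = (
    "Espresso is brewed under pressure. Our cafe opened in 2010. "
    "Cold brew steeps for twelve hours. Espresso pressure is usually nine bars."
)

if __name__ == "__main__":
    print(build_snippet(page, "espresso pressure"))
    print(build_snippet(page, "cold brew hours"))
```

Running the same page against two different queries returns two different snippets, which is the query-dependence the pipeline description points to.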
DEJAN successfully fine-tuned mic…
Source: https://dejan.ai/blog/sro-grounding-snippets/
Bot/CloudFlare Notes
Check your robots.txt:
User-agent: DataForSeoBot
Allow: /
User Agent String: Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot)
The bot obeys robots.txt rules and crawl-delay directives.
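One way to confirm that a robots.txt permits the bot is to parse it with Python's standard urllib.robotparser. This is a minimal sketch: the helper name bot_allowed and the example.com URL are hypothetical, and the check only covers robots.txt rules, not any CloudFlare or firewall blocking. The product token DataForSeoBot is passed rather than the full user agent string, since the parser matches on the token.

```python
from urllib.robotparser import RobotFileParser


def bot_allowed(robots_txt, url, agent="DataForSeoBot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


allowing = "User-agent: DataForSeoBot\nAllow: /\n"
blocking = "User-agent: DataForSeoBot\nDisallow: /\n"

if __name__ == "__main__":
    print(bot_allowed(allowing, "https://example.com/page"))
    print(bot_allowed(blocking, "https://example.com/page"))
```

To test a live site, paste the contents of its robots.txt into the first argument, or use the parser's set_url and read methods to fetch it.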