
  • Page Grounding Probe [Free AI SEO Tool] by DEJAN SEO

    Posted by WebLinkr on March 11, 2026 at 3:03 pm

    How Google’s Grounding Pipeline Works

    DEJAN reverse-engineered Google’s Gemini grounding pipeline by examining raw groundingSupports and groundingChunks from the API. The pipeline operates in this sequence:

    1. User enters a prompt.
    2. Query fanout: A model decomposes the prompt into single-intent sub-queries (fanout queries).
    3. Retrieval: For each fanout query, Google’s search index returns ranked results, narrowed to ~5–20 sources per query.
    4. Extractive summarization (snippet construction): For each selected result, the system builds a grounding snippet. Page content is chunked into sentences, each scored against the query, and the highest-scoring chunks are assembled into the snippet — joined by ellipses where non-contiguous.
    5. Grounding context assembly: All snippets across all sources are supplied to the model as context alongside the user prompt, media, and personalization signals.
    6. Synthesis & attribution: The model generates its answer, and each claim is attributed back to specific source sentences.

    Key insight: Because snippets are query-dependent, the same page yields different extractions for different fanout queries.
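A rough way to picture step 4 and the key insight above is the toy sketch below. It is not DEJAN's code or Google's implementation: the naive sentence splitter, the term-overlap `score` function (a stand-in for whatever relevance model is really used), the `top_k` cutoff, and the sample page text are all assumptions made up for illustration.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(sentence, query):
    # Fraction of query terms found in the sentence.
    # Hypothetical stand-in for a semantic relevance score.
    q = tokens(query)
    return len(q & tokens(sentence)) / len(q) if q else 0.0

def build_snippet(page_text, query, top_k=2):
    sentences = split_sentences(page_text)
    # Rank sentence indices by score, highest first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i], query),
                    reverse=True)
    # Keep the best top_k that match at all, restored to page order.
    keep = sorted(i for i in ranked[:top_k] if score(sentences[i], query) > 0)
    parts = []
    for pos, i in enumerate(keep):
        if pos > 0:
            # Adjacent sentences get a space, non-contiguous ones an ellipsis.
            parts.append(" " if i == keep[pos - 1] + 1 else " ... ")
        parts.append(sentences[i])
    return "".join(parts)

# Made-up page content, for illustration only.
PAGE = (
    "Espresso is brewed by forcing hot water through finely ground coffee. "
    "A typical shot takes about 25 seconds to pull. "
    "Cold brew steeps coarse grounds in cold water for many hours. "
    "Store coffee beans in an airtight container away from light."
)

if __name__ == "__main__":
    for fanout_query in ("espresso coffee beans", "cold brew hours"):
        print(fanout_query, "->", build_snippet(PAGE, fanout_query))
```

Running it with the two fanout queries pulls different sentences from the same page, and the first query stitches two non-adjacent sentences together with an ellipsis, which is the behavior the post describes.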

    The Extraction Method: Extractive Summarization

    Google uses extractive (not abstractive) summarization for grounding. This means it pulls exact sentences from your page — it does not rewrite or paraphrase your content for the grounding context.

    Observed Extraction Characteristics

    • Query-focused selection: Sentences semantically close to the query are strongly preferred. Unrelated sections on the same page are skipped entirely.
    • Heavy positional/lead bias: Opening paragraphs are extracted almost wholesale, regardless of content.
    • Structural noise ingestion: Table-of-contents entries, section headers, link artifacts, and  markers are treated as sentences and scored alongside prose.
    • Sentence-level granularity: The extraction unit is individual sentences, not passages or paragraphs.
    • Confidence scores: Per-chunk scores range from 0.1 to 1.0, representing grounding-source-to-generative-chunk relevance.
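If you want to look at the attribution data yourself, a minimal sketch along these lines can pair each generated claim with its sources and per-chunk scores. The groundingSupports and groundingChunks names come from the post; the nested keys (`segment`, `groundingChunkIndices`, `confidenceScores`, `web.uri`) reflect my understanding of the Gemini API response shape and should be checked against the current docs. The `SAMPLE` payload and the `attributions` helper are invented for illustration.

```python
# Invented sample payload; real responses will differ.
SAMPLE = {
    "groundingChunks": [
        {"web": {"uri": "https://example.com/a", "title": "Example A"}},
        {"web": {"uri": "https://example.com/b", "title": "Example B"}},
    ],
    "groundingSupports": [
        {
            "segment": {"text": "Claim one."},
            "groundingChunkIndices": [0],
            "confidenceScores": [0.92],
        },
        {
            "segment": {"text": "Claim two."},
            "groundingChunkIndices": [0, 1],
            "confidenceScores": [0.41, 0.77],
        },
    ],
}

def attributions(metadata, min_score=0.0):
    """Return (claim_text, source_uri, score) rows at or above min_score."""
    chunks = metadata.get("groundingChunks", [])
    rows = []
    for support in metadata.get("groundingSupports", []):
        text = support.get("segment", {}).get("text", "")
        # Assumed layout: indices and scores are parallel lists.
        pairs = zip(support.get("groundingChunkIndices", []),
                    support.get("confidenceScores", []))
        for idx, conf in pairs:
            if conf >= min_score and 0 <= idx < len(chunks):
                uri = chunks[idx].get("web", {}).get("uri", "")
                rows.append((text, uri, conf))
    return rows

if __name__ == "__main__":
    for claim, uri, conf in attributions(SAMPLE, min_score=0.5):
        print(f"{conf:.2f}  {uri}  <- {claim}")
```

Filtering on `min_score` is just one way to surface which of your pages the model leaned on most for a given claim.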

    DEJAN successfully fine-tuned mic

    Source: https://dejan.ai/blog/sro-grounding-snippets/

    Bot/Cloudflare Notes

    Check your robots.txt:

    User-agent: DataForSeoBot
    Allow: /

    User Agent String: Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot)

    The bot obeys robots.txt rules and crawl-delay directives.
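To sanity-check the rules above without waiting for a crawl, something like this sketch with Python's standard `urllib.robotparser` should work. The `bot_allowed` helper and the example.com URL are placeholders of mine. This only tests your robots.txt rules; it says nothing about whether a firewall or CDN rule blocks the bot.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: DataForSeoBot
Allow: /
"""

def bot_allowed(robots_txt, user_agent, url):
    """Check whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the bot token ("DataForSeoBot"), not the full User Agent String:
    # the parser matches on the product name before any "/".
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # Placeholder URL; swap in a page from your own site.
    print(bot_allowed(ROBOTS_TXT, "DataForSeoBot", "https://example.com/some-page"))
```

To test your live file instead of a pasted string, `RobotFileParser` also has `set_url()` and `read()`.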

  • PrimaryPositionSEO

    Guest
    March 11, 2026 at 3:03 pm

    Oh, this is cool

  • yekedero

    Guest
    March 11, 2026 at 3:26 pm

    Query fan out, ah yes… I will drink to that.

    Life’s too short to be sitting around miserable.

  • Lucifer_x7

    Guest
    March 11, 2026 at 3:59 pm

    I thought there was no tool promotion in this sub? Free or paid.
