URLs & Sitemaps

Add website content as knowledge sources.

Crawl your website pages to train your chatbot on existing content.

Adding URL Sources

Single URL

  1. Go to Sources > Add Source
  2. Select URL
  3. Enter the full URL (e.g., https://example.com/help)
  4. Click Add

Grounded will crawl the page and extract its content.
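As a quick pre-check before clicking Add, the sketch below tests whether a string is a full URL. It assumes "full URL" means one with an http or https scheme and a host name. The `is_full_url` helper is hypothetical, uses only Python's standard `urllib.parse`, and is not part of Grounded, whose own validation may differ.

```python
from urllib.parse import urlparse


def is_full_url(url: str) -> bool:
    """Return True if url has an http(s) scheme and a host, like https://example.com/help.

    Assumption: "full URL" means scheme + host; Grounded's own check may differ.
    """
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_full_url("https://example.com/help"))  # True
print(is_full_url("example.com/help"))          # False: without a scheme, no host is parsed
```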

Sitemap

To add multiple pages at once:

  1. Enter your sitemap URL (e.g., https://example.com/sitemap.xml)
  2. Grounded will discover and crawl the pages listed in the sitemap (up to 500)
  3. Each page becomes a separate source

Many websites have a sitemap at /sitemap.xml, and a site's robots.txt file may point to it with a Sitemap: line. Check your site or ask your developer.
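To preview which pages a sitemap would turn into separate sources, you can list its <loc> entries. This is a minimal sketch using Python's standard `xml.etree.ElementTree` and the sitemaps.org namespace. The sample XML is made up. A file whose root is <sitemapindex> (a list of other sitemaps) returns no pages here, and this article doesn't say whether Grounded follows sitemap index files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> value of every <url> entry in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


# Made-up sample; in practice, download https://example.com/sitemap.xml first.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/help</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

urls = sitemap_urls(SAMPLE)
print(len(urls), urls)  # 2 ['https://example.com/help', 'https://example.com/pricing']
```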

Crawling Behavior

What Gets Crawled

  • Main content of the page
  • Text in headings, paragraphs, and lists
  • Table content
  • Alt text from images

What's Excluded

  • Navigation menus and footers
  • Advertisements
  • Cookie consent banners
  • Scripts and styling
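The include and exclude rules above can be approximated with a small HTML text extractor. This is an illustrative sketch using Python's standard `html.parser`, not Grounded's actual crawler. It keeps text from headings, paragraphs, list items, and table cells, plus image alt text, and drops anything inside <nav>, <footer>, <script>, and <style>. Ads and cookie banners have no standard tag, so the sketch does not try to detect them. It also assumes content tags are explicitly closed.

```python
from html.parser import HTMLParser

# Illustrative approximation of the rules above, not Grounded's real extractor.
SKIP_TAGS = {"nav", "footer", "script", "style"}          # excluded regions
KEEP_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6",          # headings
             "p", "li", "td", "th"}                        # paragraphs, lists, tables


class TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.keep_depth = 0  # > 0 while inside a content element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in KEEP_TAGS:
            self.keep_depth += 1
        elif tag == "img" and self.skip_depth == 0:
            alt = dict(attrs).get("alt")
            if alt:  # keep image alt text outside excluded regions
                self.parts.append(alt)

    def handle_endtag(self, tag):
        # Assumes explicit closing tags; omitted </p> or </li> is not handled.
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in KEEP_TAGS and self.keep_depth:
            self.keep_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0 and self.keep_depth > 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = (
    '<nav><ul><li>Home</li></ul></nav>'
    '<h1>Help</h1><p>Reset your password.</p>'
    '<img src="settings.png" alt="Settings screen">'
    '<table><tr><td>Plan</td></tr></table>'
    '<script>trackVisit();</script>'
    '<footer><p>Copyright</p></footer>'
)
print(extract_text(page).splitlines())
# ['Help', 'Reset your password.', 'Settings screen', 'Plan']
```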

Depth and Limits

  • Single URL: Only that page
  • Sitemap: All pages listed (up to 500 per sitemap)
  • Links on pages are not automatically followed
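Because links aren't followed and each sitemap is capped at 500 pages, a site with more pages (or with no sitemap at all) may need several smaller sitemaps, each added as its own source. The sketch below writes sitemaps.org-format documents from a list of URLs with Python's standard `xml.etree.ElementTree`. It assumes that each sitemap you add counts toward the 500-page limit separately, as the "per sitemap" wording suggests, and that you host each generated file at a public URL yourself. The `build_sitemaps` name is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 500  # per-sitemap limit stated in this article


def build_sitemaps(urls: list[str], limit: int = MAX_URLS_PER_SITEMAP) -> list[str]:
    """Split urls into sitemap XML documents holding at most `limit` <url> entries each."""
    documents = []
    for start in range(0, len(urls), limit):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + limit]:
            # Each <url> needs a <loc>; ElementTree escapes characters such as "&".
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        documents.append(ET.tostring(urlset, encoding="unicode"))
    return documents


pages = [f"https://example.com/docs/page-{i}" for i in range(1, 1201)]
sitemaps = build_sitemaps(pages)
print([doc.count("<url>") for doc in sitemaps])  # [500, 500, 200]
```

Save each document as its own file (for example sitemap-1.xml, sitemap-2.xml) and add each file's URL as a separate source.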

Content Requirements

For best results, your pages should:

  • Be publicly accessible (no login required)
  • Return an HTTP 200 (OK) status code
  • Contain text content (not just images/video)
  • Load without JavaScript (server-rendered)

Pages that require JavaScript to render their content may not be fully crawled. If your site is a single-page application (SPA), contact support.
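A rough self-check for the requirements above is to fetch a page's raw HTML the way a crawler that doesn't run JavaScript would, then look at the status code and how much text it contains. This is a hypothetical helper built on Python's standard library, not a Grounded tool. The 200-character threshold, the `page-check/0.1` user agent, and the UTF-8 decoding are arbitrary assumptions.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser

HIDDEN_TAGS = {"script", "style", "noscript"}


class _VisibleText(HTMLParser):
    """Collects text outside script/style/noscript (noscript is usually an "enable JS" notice)."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden:
            self.hidden -= 1

    def handle_data(self, data):
        if not self.hidden:
            self.chunks.append(data)


def check_page(status: int, html: str, min_chars: int = 200) -> list[str]:
    """Return likely crawl problems (empty list means the page looks OK).

    min_chars is a hypothetical threshold, not a documented Grounded limit.
    """
    problems = []
    if status in (401, 403):
        problems.append(f"HTTP {status}: page likely requires login or blocks access")
    elif status != 200:
        problems.append(f"HTTP {status}: expected 200")
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    if len(text) < min_chars:
        problems.append(f"only {len(text)} characters of text without JavaScript")
    return problems


def fetch(url: str) -> tuple[int, str]:
    """Download raw HTML without running JavaScript (assumes UTF-8); redirects are followed."""
    request = urllib.request.Request(url, headers={"User-Agent": "page-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""


spa_shell = '<div id="root"></div><noscript>Enable JavaScript.</noscript><script src="/app.js"></script>'
print(check_page(200, spa_shell))  # ['only 0 characters of text without JavaScript']
```

To check a live page, call `check_page(*fetch("https://example.com/help"))`. An empty list suggests the page meets the requirements above, though it can't prove Grounded will extract it the same way.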

Refreshing Content

When your website content changes:

  1. Find the source in your Sources list
  2. Click the Refresh icon
  3. Wait for processing to complete

The chatbot will use updated content immediately after processing.
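Refreshing is done source by source, so it can help to know which pages changed. If your sitemap includes <lastmod> dates, the sketch below lists the URLs modified after a given moment. It is a hypothetical helper using Python's standard library. It assumes your site keeps <lastmod> accurate (many don't), treats date-only values as UTC, and includes URLs with a missing or unparseable date because a change can't be ruled out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_since(xml_text: str, since: datetime) -> list[str]:
    """Return sitemap URLs whose <lastmod> is later than `since` (timezone-aware)."""
    changed = []
    for entry in ET.fromstring(xml_text).findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=NS).strip()
        try:
            # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
            modified = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        except ValueError:
            changed.append(loc)  # missing or unparseable date: refresh to be safe
            continue
        if modified.tzinfo is None:
            modified = modified.replace(tzinfo=timezone.utc)  # assume UTC for date-only values
        if modified > since:
            changed.append(loc)
    return changed


SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><loc>https://example.com/a</loc><lastmod>2024-05-01</lastmod></url>'
    '<url><loc>https://example.com/b</loc><lastmod>2024-01-15T08:00:00Z</lastmod></url>'
    '<url><loc>https://example.com/c</loc></url>'
    '</urlset>'
)
print(changed_since(SAMPLE, datetime(2024, 3, 1, tzinfo=timezone.utc)))
# ['https://example.com/a', 'https://example.com/c']
```

Each listed URL still has to be refreshed with the Refresh icon as described above.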

Troubleshooting

Page Not Crawling

Possible causes:

  • Page requires authentication
  • Page blocked by robots.txt
  • Server returning errors
  • JavaScript-only content

Solutions:

  • Make the page publicly accessible
  • Check that your robots.txt allows our crawler
  • Verify the page loads in a private (incognito) browser window, where you are not logged in
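You can test a robots.txt file locally with Python's standard `urllib.robotparser`. This article doesn't name Grounded's crawler, so `ExampleBot` below is a hypothetical placeholder; ask support for the real user-agent string. Note that this parser does plain prefix matching, so rules that use * or $ wildcards may be interpreted differently than by other crawlers.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS = """User-agent: *
Disallow: /private/
"""

# "ExampleBot" is a hypothetical placeholder, not Grounded's real crawler name.
print(crawler_allowed(ROBOTS, "ExampleBot", "https://example.com/help"))       # True
print(crawler_allowed(ROBOTS, "ExampleBot", "https://example.com/private/x"))  # False
```

For a live site, `RobotFileParser` can also load the file directly: call `set_url("https://example.com/robots.txt")`, then `read()`, then `can_fetch(...)`.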

Content Missing

Possible causes:

  • Content loaded via JavaScript
  • Content in iframes
  • Very short page content

Solutions:

  • Ensure content is server-rendered
  • Add the iframe source URL separately
  • Combine short pages into comprehensive ones
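To find iframe source URLs that could be added as separate sources, you can scan a page's HTML. This is a minimal sketch using Python's standard `html.parser` and `urljoin`. The sample page is made up, iframes inserted by JavaScript won't appear in the raw HTML, and each URL found still has to meet the content requirements above (public, server-rendered) to be crawled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collects absolute iframe src URLs from an HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as /embed/faq against the page URL.
                self.sources.append(urljoin(self.base_url, src))


def iframe_sources(html: str, base_url: str) -> list[str]:
    finder = IframeFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.sources


html = ('<p>FAQ</p><iframe src="/embed/faq"></iframe>'
        '<iframe src="https://docs.example.org/guide"></iframe>')
print(iframe_sources(html, "https://example.com/help"))
# ['https://example.com/embed/faq', 'https://docs.example.org/guide']
```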

Wrong Content Extracted

If navigation or footer content is included:

  • This occasionally happens with unusual page structures
  • The impact on chat quality is usually minimal
  • Contact support for persistent issues