URLs & Sitemaps
Add website content as knowledge sources.
Crawl your website pages to train your chatbot on existing content.
Adding URL Sources
Single URL
- Go to Sources > Add Source
- Select URL
- Enter the full URL (e.g., https://example.com/help)
- Click Add
Grounded will crawl the page and extract its content.
Sitemap
To add multiple pages at once:
- Enter your sitemap URL (e.g., https://example.com/sitemap.xml)
- Grounded will discover and crawl all pages in the sitemap
- Each page becomes a separate source
Most websites have a sitemap at /sitemap.xml. Check your site or ask your developer.
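To preview which pages a sitemap would add before submitting it, you can list its entries yourself. The following is a minimal sketch, assuming a standard sitemaps.org urlset file; the function name sitemap_urls and the sample XML are illustrative and are not part of Grounded.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org files.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the page URLs listed as <url><loc> entries in a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(f"{NS}url/{NS}loc")
        if loc.text and loc.text.strip()
    ]


# Illustrative sitemap content; in practice, paste or download your own.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/help</loc></url>
  <url><loc>https://example.com/help/billing</loc></url>
</urlset>"""

urls = sitemap_urls(SAMPLE)
print(len(urls), "pages listed")
for url in urls:
    print(url)
```

Each URL printed here corresponds to one page that would become a separate source.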
Crawling Behavior
What Gets Crawled
- Main content of the page
- Text in headings, paragraphs, and lists
- Table content
- Alt text from images
What's Excluded
- Navigation menus and footers
- Advertisements
- Cookie consent banners
- Scripts and styling
Depth and Limits
- Single URL: Only that page
- Sitemap: All pages listed (up to 500 per sitemap)
- Links on pages are not automatically followed
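If your site lists more than 500 pages, one possible approach is to split the list into several smaller sitemaps and add each one separately. The sketch below shows only the batching step. It assumes the limit applies per sitemap as stated above; how Grounded handles a sitemap that exceeds the limit is not documented here, and the name split_for_limit is illustrative.

```python
MAX_PAGES_PER_SITEMAP = 500  # limit stated in this section


def split_for_limit(urls, limit=MAX_PAGES_PER_SITEMAP):
    """Split a list of page URLs into batches of at most `limit` entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


# Illustrative site with 1,200 pages.
pages = [f"https://example.com/docs/page-{n}" for n in range(1, 1201)]
batches = split_for_limit(pages)
print([len(batch) for batch in batches])  # [500, 500, 200]
```

Each batch could then be written to its own sitemap file and added as a separate sitemap source.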
Content Requirements
For best results, your pages should:
- Be publicly accessible (no login required)
- Return HTTP 200 status
- Contain text content (not just images/video)
- Load without JavaScript (server-rendered)
Pages that require JavaScript to render content may not be fully crawled. Contact support if you have a single-page application (SPA).
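A rough way to pre-check a page against these requirements is to look at its raw HTML, without running any JavaScript, and see how much text is there. The sketch below is an approximation and not Grounded's actual extraction logic; the check_page helper and the 200-character threshold are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def check_page(status, html, min_chars=200):
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    if status != 200:
        problems.append(f"HTTP status is {status}, expected 200")
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    if len(text) < min_chars:  # min_chars is an arbitrary illustrative cutoff
        problems.append(f"only {len(text)} characters of text without JavaScript")
    return problems


# A typical single-page application shell: no text until scripts run.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(check_page(200, spa_shell))
```

Pass in the status code and response body you get from fetching the page with any HTTP client. A page that reports almost no text here is likely one that renders its content with JavaScript.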
Refreshing Content
When your website content changes:
- Find the source in your Sources list
- Click the Refresh icon
- Wait for processing to complete
The chatbot uses the updated content as soon as processing completes.
Troubleshooting
Page Not Crawling
Possible causes:
- Page requires authentication
- Page blocked by robots.txt
- Server returning errors
- JavaScript-only content
Solutions:
- Make the page publicly accessible
- Check that your robots.txt allows our crawler
- Verify that the page loads in an incognito (private) browser window
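The robots.txt check can be done locally with Python's standard urllib.robotparser module. This is a minimal sketch: the user-agent string "GroundedBot" is a placeholder assumption, since the crawler's real user-agent is not stated here, and the robots.txt content is an example.

```python
from urllib.robotparser import RobotFileParser

# Placeholder: replace with the crawler's actual user-agent (ask support).
CRAWLER_USER_AGENT = "GroundedBot"


def is_allowed(robots_txt, page_url, user_agent=CRAWLER_USER_AGENT):
    """Return True if the given robots.txt rules permit fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Example rules; paste the contents of your own /robots.txt instead.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(ROBOTS, "https://example.com/help"))
print(is_allowed(ROBOTS, "https://example.com/private/notes"))
```

If the function returns False for a page you want crawled, adjust the matching Disallow rule or add an Allow rule for the crawler.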
Content Missing
Possible causes:
- Content loaded via JavaScript
- Content in iframes
- Very short page content
Solutions:
- Ensure content is server-rendered
- Add the iframe source URL separately
- Combine short pages into comprehensive ones
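To find the iframe source URLs that would need to be added separately, you can scan the page's HTML for iframe tags. The following sketch uses only the Python standard library; the iframe_sources helper and the sample markup are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collect absolute URLs from the src attribute of <iframe> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page's own URL.
                self.sources.append(urljoin(self.base_url, src))


def iframe_sources(html, base_url):
    """Return the URLs embedded via iframes in the given HTML."""
    finder = IframeFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.sources


# Illustrative page markup.
page_html = (
    "<p>Intro</p>"
    '<iframe src="/embed/faq"></iframe>'
    '<iframe src="https://widgets.example.net/pricing"></iframe>'
)
for url in iframe_sources(page_html, "https://example.com/help"):
    print(url)
```

Each URL printed can then be added as its own URL source, provided it meets the content requirements above.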
Wrong Content Extracted
If navigation or footer content is included:
- This occasionally happens with unusual page structures
- The impact on chat quality is usually minimal
- Contact support for persistent issues