AI Sitemap Content Extractor
Pricing
from $4.00 / 1,000 processed pages
Transform website sitemaps into clean, AI-ready content with Markdown, semantic chunks, and optional AI summaries.
Enter the website's main URL (e.g., https://example.com) or a direct sitemap URL (e.g., https://example.com/sitemap.xml). The Actor will automatically find and parse the sitemap.
[ { "url": "https://example.com" } ]
Maximum number of pages to fetch and process. Set to 0 for unlimited (not recommended for large sites).
Maximum URL path depth to process. Pages deeper than this will be skipped. Set to 0 for no limit.
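As a rough illustration of the depth limit, the sketch below counts non-empty path segments in a URL and treats 0 as "no limit". How the Actor actually measures depth is not stated here, so the segment-counting rule and the function names `path_depth` and `within_depth` are assumptions made for this example.

```python
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count non-empty path segments (assumed definition of depth)."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def within_depth(url: str, max_depth: int) -> bool:
    """Return True if the URL passes the limit; 0 means no limit."""
    return max_depth == 0 or path_depth(url) <= max_depth


if __name__ == "__main__":
    for candidate in ("https://example.com/", "https://example.com/docs/api/v1"):
        print(candidate, path_depth(candidate), within_depth(candidate, 2))
```

Under this assumed definition, `https://example.com/docs/api/v1` has depth 3 and would be skipped with a limit of 2.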
Number of pages to fetch in parallel. Higher values are faster but use more memory. Recommended: 10-50.
Additional URL patterns to exclude (one per line, supports regex). Built-in exclusions: login, privacy, terms, admin, feeds, media files.
[]
If set, only URLs matching these patterns will be processed (one per line, supports regex). Leave empty to process all non-excluded URLs.
[]
Minimum quality score (0-100) for a page to be included. Pages below this threshold will be skipped. Set to 0 to include all pages.
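The exclude and include pattern rules can be sketched as a small filter. This is a minimal illustration, not the Actor's code: it assumes exclusions are checked first and that patterns are matched anywhere in the URL with `re.search`. The function name `should_process` is hypothetical, and the built-in exclusions are not modelled.

```python
import re
from typing import Sequence


def should_process(
    url: str,
    exclude_patterns: Sequence[str],
    include_patterns: Sequence[str],
) -> bool:
    """Apply exclude patterns first, then include patterns if any are set."""
    # Assumption: an excluded URL is dropped even if it also matches an include.
    if any(re.search(pattern, url) for pattern in exclude_patterns):
        return False
    # An empty include list means "process every non-excluded URL".
    if include_patterns:
        return any(re.search(pattern, url) for pattern in include_patterns)
    return True


if __name__ == "__main__":
    urls = [
        "https://example.com/blog/first-post",
        "https://example.com/blog/draft/wip",
        "https://example.com/about",
    ]
    kept = [u for u in urls if should_process(u, [r"/blog/draft"], [r"/blog/"])]
    print(kept)
```

With these example patterns, only the first URL is kept: the draft is excluded and the about page does not match the include pattern.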
Target number of tokens per chunk for LLM-ready content splitting. Set to 0 to disable chunking.
Number of overlapping tokens between consecutive chunks for context continuity.
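To show how chunk size and overlap interact, here is a simplified sliding-window sketch. It is not the Actor's semantic chunker: it assumes the text has already been split into a list of tokens and cuts fixed-size windows, each starting `chunk_size - overlap` tokens after the previous one. The function name `chunk_tokens` is hypothetical.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence, chunk_size: int, overlap: int) -> List[list]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        # Mirrors "set to 0 to disable chunking": return one unsplit chunk.
        return [list(tokens)]
    if overlap < 0 or overlap >= chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once the current window reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    words = "one two three four five six seven eight nine ten".split()
    for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
        print(chunk)
```

A larger overlap repeats more context between neighbouring chunks at the cost of producing more chunks for the same page.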
Generate a 2-4 sentence summary for each page using Groq AI.
Classify each page as blog_post, documentation, landing_page, etc. using Groq AI.