News & Article Extractor
Pricing
Pay per event
Auto-discover news/blog articles and extract clean text plus Markdown for LLM/RAG corpora. Uses RSS, sitemaps, and Readability; outputs metadata, counts, and token estimates.
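The counts and token estimates mentioned above can be pictured with a short sketch. The listing does not say how the extractor computes them, so the function name `text_stats`, the output keys, and the "about four characters per token" rule of thumb are all assumptions for illustration; real tokenizers will give different numbers.

```python
def text_stats(text: str) -> dict:
    """Return rough size statistics for an extracted article body.

    Assumption: tokens are estimated as ceil(characters / 4), a common
    rule of thumb for English text, not the extractor's documented method.
    """
    chars = len(text)
    words = len(text.split())  # whitespace-delimited words
    return {
        "char_count": chars,
        "word_count": words,
        "estimated_tokens": (chars + 3) // 4,  # integer ceiling of chars / 4
    }


stats = text_stats("Central bank holds rates steady for a third month.")
print(stats["word_count"], stats["estimated_tokens"])
```

An estimate like this is mainly useful for budgeting chunk sizes in a RAG pipeline before running a real tokenizer.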
Enter the website URLs to extract articles from (e.g. https://bbc.com, https://techcrunch.com). Each URL will be scanned for RSS feeds or sitemaps to discover articles.
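As a rough picture of feed-based discovery, the sketch below pulls article links out of an RSS 2.0 document using only the Python standard library. It is not the extractor's actual code: the function name `discover_articles`, the returned keys, and the sample feed are hypothetical, and a real run would first have to locate and download the feed or sitemap for each site.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real feed would be fetched over HTTP.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/articles/first-post</link>
      <pubDate>Mon, 06 Jan 2025 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/articles/second-post</link>
      <pubDate>Tue, 07 Jan 2025 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def discover_articles(rss_xml: str, max_articles: int) -> list[dict]:
    """Collect up to max_articles entries (url, title, published) from RSS XML."""
    root = ET.fromstring(rss_xml)
    articles = []
    for item in root.iter("item"):
        if len(articles) >= max_articles:
            break  # respect the per-site article limit
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # skip items that carry no URL
        articles.append({
            "url": link,
            "title": (item.findtext("title") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return articles


for article in discover_articles(SAMPLE_RSS, max_articles=10):
    print(article["url"])
```

The `max_articles` argument plays the role of a per-website article limit: discovery stops as soon as the cap is reached.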
Maximum number of articles to extract per website. Keep it low (10-20) for quick tests; set it higher for full crawls.
Enable to fetch each article and extract its full body text using @mozilla/readability. Disable to return only metadata (title, date, author) from the RSS feed or sitemap, which is much faster and cheaper.
Include image URLs found in the article content.
Only extract articles published on or after this date (ISO format: YYYY-MM-DD). Leave empty for no date filter.
Only extract articles published on or before this date (ISO format: YYYY-MM-DD). Leave empty for no date filter.
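The two date filters together describe an inclusive range. A minimal sketch of that check follows, assuming publication dates have already been normalized to YYYY-MM-DD; the function name `in_date_range` and its parameter names are hypothetical and do not reflect the extractor's real input field names.

```python
from datetime import date
from typing import Optional


def in_date_range(published: str,
                  date_from: Optional[str] = None,
                  date_to: Optional[str] = None) -> bool:
    """Return True if published (YYYY-MM-DD) lies within the inclusive range.

    An empty or missing bound means no filter on that side.
    """
    day = date.fromisoformat(published)
    if date_from and day < date.fromisoformat(date_from):
        return False  # published before the start date
    if date_to and day > date.fromisoformat(date_to):
        return False  # published after the end date
    return True


print(in_date_range("2025-01-15", date_from="2025-01-01", date_to="2025-01-31"))
print(in_date_range("2025-02-01", date_to="2025-01-31"))
```

Comparing parsed `date` objects rather than raw strings makes malformed input fail loudly with a `ValueError` instead of silently passing the filter.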
Timeout for each HTTP request in seconds. Increase for slow sites.