Website Content Crawler
Pricing
from $0.01 / 1,000 results
Crawl websites for SEO audits. Extracts HTML, title, meta tags, headings, links, and text content from each page. Also provides automatic sitemap detection and parsing, metadata extraction (title, description, OG tags), heading structure (H1, H2, H3), internal and external link analysis, image extraction with alt text, and word counts.
Developer

John Rippy
"SEO Audit & LLM-Ready Content Extraction" by John Rippy | johnrippy.link
🏆 2025 Zapier Automation Hero of the Year for Project Phoenix: a 95-step AI sales pipeline cutting development time by 50%.
Fast, reliable website crawler built for SEO audits and AI/LLM content analysis. Auto-discovers sitemaps, extracts metadata, headings, and LLM-ready markdown content from every page. A powerful Firecrawl alternative.
Features
- LLM-Ready Markdown Extraction - Firecrawl-like functionality using Mozilla Readability + Turndown
- Automatic sitemap detection (sitemap.xml, sitemap_index.xml, wp-sitemap.xml)
- Full HTML extraction per page
- Extracts metadata (title, description, OG tags, canonical, robots)
- Heading structure (H1, H2, H3)
- Internal and external link analysis
- Image extraction with alt text
- Word count and load time metrics
- Fingerprint injection for bot evasion
- Configurable crawl depth and page limits
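The sitemap detection feature above can be pictured roughly as follows. This is a minimal sketch, not the actor's actual code: the helper names `candidate_sitemap_urls` and `parse_sitemap` are hypothetical, and it assumes the three file names are probed at the site root. It only builds the URLs and parses XML that has already been downloaded.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Sitemap file names listed in the Features section.
SITEMAP_NAMES = ["sitemap.xml", "sitemap_index.xml", "wp-sitemap.xml"]


def candidate_sitemap_urls(start_url: str) -> list[str]:
    """Build the sitemap URLs to probe (assumption: they live at the site root)."""
    return [urljoin(start_url, "/" + name) for name in SITEMAP_NAMES]


def _local(tag: str) -> str:
    """Drop the XML namespace: '{http://...}urlset' -> 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ('index' or 'urlset', list of <loc> values) for one sitemap document."""
    root = ET.fromstring(xml_text)
    kind = "index" if _local(root.tag) == "sitemapindex" else "urlset"
    locs = []
    for entry in root:          # <sitemap> or <url> elements
        for field in entry:     # <loc>, <lastmod>, ...
            if _local(field.tag) == "loc" and field.text:
                locs.append(field.text.strip())
    return kind, locs
```

When `parse_sitemap` reports `"index"`, the returned locations are themselves sitemaps and would need to be fetched and parsed in turn.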
Input
{
  "startUrls": [{ "url": "https://example.com" }],
  "maxCrawlPages": 25,
  "maxCrawlDepth": 2,
  "crawlSitemap": true,
  "demoMode": false
}
| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | array | required | URLs to start crawling from |
| maxCrawlPages | number | 25 | Maximum pages to crawl |
| maxCrawlDepth | number | 2 | Maximum link depth to follow |
| crawlSitemap | boolean | true | Auto-discover and parse sitemaps |
| demoMode | boolean | true | Return mock data (for testing) |
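As a rough illustration of the input fields above, the sketch below builds an input object from a list of URLs and fills in the documented defaults. The helper name `build_input` is hypothetical, and the validation rules are assumptions, not behavior confirmed by the actor.

```python
import json

# Defaults copied from the input table above.
DEFAULTS = {
    "maxCrawlPages": 25,
    "maxCrawlDepth": 2,
    "crawlSitemap": True,
    "demoMode": True,
}


def build_input(urls: list[str], **overrides) -> dict:
    """Build an input object; startUrls is the only required field."""
    if not urls:
        raise ValueError("startUrls is required: pass at least one URL")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    return {"startUrls": [{"url": u} for u in urls], **DEFAULTS, **overrides}


if __name__ == "__main__":
    # demoMode defaults to true in the table, so a real crawl sets it to False.
    print(json.dumps(build_input(["https://example.com"], demoMode=False), indent=2))
```

Going by the table, leaving `demoMode` at its default returns mock data, so it is worth setting it explicitly for production runs.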
Output
Each page returns:
{
  "url": "https://example.com/page",
  "title": "Page Title",
  "html": "<html>...</html>",
  "text": "Page text content...",
  "markdown": "# Page Title\n\nClean markdown content ready for LLMs...",
  "statusCode": 200,
  "loadTimeMs": 1234,
  "metadata": {
    "description": "Meta description",
    "keywords": "seo, crawler",
    "ogTitle": "Open Graph Title",
    "ogDescription": "OG description",
    "ogImage": "https://example.com/og.jpg",
    "canonical": "https://example.com/page",
    "robots": "index, follow"
  },
  "headings": {
    "h1": ["Main Heading"],
    "h2": ["Subheading 1", "Subheading 2"],
    "h3": []
  },
  "links": {
    "internal": ["https://example.com/other"],
    "external": ["https://external.com"]
  },
  "images": [{ "src": "https://example.com/image.jpg", "alt": "Alt text" }],
  "wordCount": 1500,
  "markdownWordCount": 1200,
  "crawledAt": "2025-01-09T12:00:00.000Z"
}
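To show how a record shaped like the one above might feed an SEO audit, here is a minimal sketch. The function name `audit_page` and the specific checks (exactly one H1, alt text on every image, canonical matching the page URL) are illustrative assumptions, not rules applied by the actor itself.

```python
def audit_page(page: dict) -> list[str]:
    """Return a list of human-readable SEO issues for one crawled page record."""
    issues = []
    status = page.get("statusCode", 0)
    if status >= 400:
        issues.append(f"HTTP status {status}")
    if not page.get("title"):
        issues.append("missing title")

    meta = page.get("metadata", {})
    if not meta.get("description"):
        issues.append("missing meta description")
    canonical = meta.get("canonical")
    if canonical and canonical != page.get("url"):
        issues.append(f"canonical points elsewhere: {canonical}")

    # Assumed rule of thumb: a page should have exactly one H1.
    h1_count = len(page.get("headings", {}).get("h1", []))
    if h1_count != 1:
        issues.append(f"expected 1 H1, found {h1_count}")

    missing_alt = [img["src"] for img in page.get("images", []) if not img.get("alt")]
    if missing_alt:
        issues.append(f"{len(missing_alt)} image(s) without alt text")
    return issues
```

Running this over every item in a dataset and keeping only pages with a non-empty result gives a simple issue list to triage.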
Use Cases
- SEO Audits - Technical SEO analysis and site health checks
- LLM Content Preparation - Extract clean markdown for AI training data
- Content Analysis - Analyze page content structure and quality
- Site Structure Mapping - Map internal linking and site architecture
- Broken Link Detection - Find 404s and dead links
- Meta Tag Analysis - Audit title tags, descriptions, and OG tags
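The broken link detection use case can be sketched by joining each page's `links.internal` list against the `statusCode` of the linked page. This sketch assumes that pages which failed to load still appear in the results with their status code, which the documentation does not confirm; the function name `broken_link_report` is hypothetical.

```python
from collections import defaultdict


def broken_link_report(pages: list[dict]) -> dict[str, list[str]]:
    """Map each broken internal URL to the pages that link to it."""
    # Status code of every crawled URL (assumes failed pages are included).
    status_by_url = {p["url"]: p.get("statusCode") for p in pages}
    report = defaultdict(list)
    for page in pages:
        for target in page.get("links", {}).get("internal", []):
            code = status_by_url.get(target)
            # Targets that were never crawled have no status and are skipped.
            if code is not None and code >= 400:
                report[target].append(page["url"])
    return dict(report)
```

Links to pages beyond `maxCrawlPages` or `maxCrawlDepth` have no recorded status, so this report only covers the part of the site that was actually crawled.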
Pricing
This actor uses pay-per-event pricing:
| Event | Title | Description | Price |
|---|---|---|---|
| page_crawled | Page Crawled | Per page crawled with full HTML, text, and markdown extraction | $0.005 |
| sitemap_discovered | Sitemap Discovered | Per sitemap discovered and parsed for URL extraction | $0.01 |
Example costs:
- Crawl 25 pages (no sitemap): 25 × $0.005 = $0.125
- Crawl 100 pages + sitemap: (100 × $0.005) + $0.01 = $0.51
- Crawl 500 pages + sitemap: (500 × $0.005) + $0.01 = $2.51
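The arithmetic in these examples can be reproduced with a small estimator. This is a sketch using the two event prices from the pricing table; the function name `estimate_cost` is hypothetical, and `Decimal` is used only to avoid floating-point rounding in currency math.

```python
from decimal import Decimal

# Event prices from the pricing table above (USD).
PRICE_PER_PAGE = Decimal("0.005")
PRICE_PER_SITEMAP = Decimal("0.01")


def estimate_cost(pages: int, sitemaps: int = 0) -> Decimal:
    """Estimate run cost: one page_crawled event per page, one sitemap_discovered per sitemap."""
    return pages * PRICE_PER_PAGE + sitemaps * PRICE_PER_SITEMAP


if __name__ == "__main__":
    for pages, sitemaps in [(25, 0), (100, 1), (500, 1)]:
        print(f"{pages} pages, {sitemaps} sitemap(s): ${estimate_cost(pages, sitemaps)}")
```

Since the sitemap event is priced per sitemap discovered and parsed, a site with a sitemap index and several child sitemaps may trigger more than one such event; pass a higher `sitemaps` count to model that case.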
Author
Built by John Rippy | johnrippy.link
Keywords
website crawler, seo audit, sitemap crawler, markdown extraction, firecrawl alternative, llm content, web scraper, content extraction, technical seo