Sitemap Content Crawler
Pay per usage
consummate_mandala/sitemap-content-crawler
Rating: 0.0 (0)
Developer: Donny Nguyen
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 2 days ago
Categories: AI
tri_angle/sitemap-change-orchestrator
Monitor website sitemaps for new, updated, or removed URLs. Integration with the Website Content Crawler (WCC) allows feeding only relevant URLs. This ensures your web crawls are efficient, targeted, and resource-optimized, keeping your datasets fresh for any application.
Tri⟁angle
39
tomas.gabik/updated-content-checker
Monitors sitemaps for new/updated content. Returns only URLs modified since a specified date for efficient incremental scraping.
Tomáš Gabík
3
apify/website-content-crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
Apify
102K
4.5
(167)
powerful_bachelor/website-metadata-extractor
🔍 Website Metadata Extractor 🌐 Extract essential website data: meta tags, robots.txt, and sitemap.xml in one scan. 📊 Analyze SEO elements, crawler directives, and site structure. ✅ Perfect for SEO audits, 🔎 competitor research, and 🚀 understanding how search engines view your website.
Powerful Bachelor
7
filip_cicvarek/cnn-article-scraper
Extract CNN articles by category or search query with date filtering. Scrape news from politics, business, world, tech, sports, and more. Get structured data: title, author, publication date, full content. Perfect for media monitoring, research, and content analysis.
Filip Cicvárek
20
5.0
(3)
alizarin_refrigerator-owner/website-crawler
Crawl websites for SEO audits. Extracts HTML, title, meta tags, headings, links, and text content from pages. Features: automatic sitemap detection and parsing; metadata extraction (title, description, OG tags); heading structure (H1, H2, H3); internal and external link analysis; image extraction with alt text; word count.
John Rippy
29
nocodeventure/seo-data-extractor
Extract comprehensive SEO metadata, headings, links, images, Open Graph tags, Twitter Cards, and technical data from websites. Perfect for SEO audits, competitor analysis, and content optimization. Runs on Apify platform with structured JSON output.
No-Code Venture
16
salesblaster-ai/website-content-crawler
Extract contact information and turn any website into clean, structured content ready for LLMs (e.g. AI lead magnets, RAG pipelines, and outbound personalization). Most web scrapers dump raw HTML or unstructured text. This crawler is purpose-built for LLMs and optimized for lead generation.
SalesBlaster AI
tropical_quince/website-content-crawler-rag
eunit/sqoosh-image-compressor
Optimize images for SEO with the Squoosh Actor on Apify. Batch compress, resize, and convert images to WebP, AVIF, and MozJPEG to boost site speed and Core Web Vitals. Automate high-performance image optimization for web scraping and developer workflows with ease.
Emmanuel Uchenna
5
Sitemap URL (sitemapUrl, optional): URL of the sitemap.xml file.
Max Pages (maxPages): Maximum number of pages to crawl from the sitemap.
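
The two input fields can be illustrated with a minimal Python sketch. It shows how an input object using the sitemapUrl and maxPages names might be assembled, and how a page limit could be applied to the URLs listed in a sitemap. Only the two field names come from this page; the helper names (build_input, select_urls), the sample sitemap, and the truncation behavior are assumptions for illustration and do not describe how the Actor itself is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset>, <url>, and <loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def build_input(sitemap_url, max_pages):
    """Assemble an input object with the two documented field names.

    Hypothetical helper: the Actor's own validation rules are not known.
    """
    return {"sitemapUrl": sitemap_url, "maxPages": max_pages}


def select_urls(sitemap_xml, max_pages):
    """Return at most max_pages <loc> values, in document order.

    Assumed behavior: a plain cutoff after the first max_pages entries.
    """
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    return urls[:max_pages]


# Made-up sitemap used only to exercise the helpers above.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/blog</loc></url>
</urlset>
"""

if __name__ == "__main__":
    run_input = build_input("https://example.com/sitemap.xml", 2)
    print(run_input)
    print(select_urls(SAMPLE_SITEMAP, run_input["maxPages"]))
```

An input object of this shape is what would be passed to the Actor when starting a run, for example through the Apify Console input form or an API client.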