Deep Website Crawler (DEPRECATED)


DEPRECATED — use santamaria-automations/website-content-crawler instead. Same crawl behavior, richer output (clean AI/RAG-ready Markdown vs plain text).

Developer: Ale

Pricing: Pay per event

Last modified: 7 days ago
Deep Website Crawler

Crawl any website to a configurable depth and extract the title and visible text of each page reached (text is truncated at 10,000 characters). Give it a list of start URLs; it follows internal links level by level and returns one record per page. No API keys or login required.

How It Works

For each start URL you provide, the crawler:

1. Fetches the start page
2. Extracts all internal links from that page
3. Follows those links to the next depth level
4. Repeats until the configured depth or page limit is reached
5. Returns one record per crawled page with its title, text content, and crawl depth

Challenge pages (bot-protection walls) are skipped automatically so the run keeps going. Pages that return errors are logged and skipped.
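The crawl loop above is essentially a breadth-first search. The sketch below shows one way it could work under stated assumptions: pages come from a hypothetical in-memory `SITE` dict instead of HTTP requests, "internal" means same host, and the three limits are applied as the Input Parameters table describes them. It is an illustration, not the Actor's actual source code.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "website" (URL -> page data) standing in for HTTP
# responses, so the sketch runs offline.
SITE = {
    "https://acme-corp.com": {"status": 200, "title": "Home", "links": [
        "https://acme-corp.com/about", "https://acme-corp.com/blog",
        "https://other.example/x"]},
    "https://acme-corp.com/about": {"status": 200, "title": "About",
                                    "links": ["https://acme-corp.com/team"]},
    "https://acme-corp.com/blog": {"status": 503, "title": "", "links": []},
    "https://acme-corp.com/team": {"status": 200, "title": "Team", "links": []},
}


def crawl(start_urls, site, max_depth=2, max_pages_per_crawl=100,
          max_pages_per_domain=50):
    """Breadth-first crawl returning one record per successfully fetched page."""
    records = []
    seen = set()
    per_domain = {}
    # Each queue entry carries (url, start_url that led here, depth).
    queue = deque((url, url, 0) for url in start_urls)
    while queue and len(records) < max_pages_per_crawl:
        url, start_url, depth = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        domain = urlparse(url).netloc
        if per_domain.get(domain, 0) >= max_pages_per_domain:
            continue
        page = site.get(url)
        if page is None or page["status"] != 200 or page.get("challenge"):
            # Error and challenge pages are logged and skipped; the run continues.
            print(f"skipping {url}")
            continue
        per_domain[domain] = per_domain.get(domain, 0) + 1
        # Only links on the same host as the current page are followed.
        internal = [link for link in page["links"]
                    if urlparse(link).netloc == domain]
        records.append({"url": url, "title": page["title"], "depth": depth,
                        "start_url": start_url, "links_found": len(internal)})
        if depth < max_depth:
            for link in internal:
                if link not in seen:
                    queue.append((link, start_url, depth + 1))
    return records


for rec in crawl(["https://acme-corp.com"], SITE):
    print(rec["depth"], rec["url"])
```

With the default `max_depth=2`, the home page, `/about`, and `/team` are returned, the 503 page is skipped, and the external link is never followed.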

Use with AI Agents (MCP)

Connect this Actor to any MCP-compatible AI client, such as Claude Desktop, Claude.ai, Cursor, VS Code, LangChain, LlamaIndex, or custom agents.

Apify MCP server URL:

https://mcp.apify.com?tools=santamaria-automations/deep-website-crawler

Example prompt once connected:

"Use deep-website-crawler to crawl https://example.com to depth 2 and return all page titles and text as a table."

Clients that support dynamic tool discovery (Claude.ai, VS Code) will receive the full input schema automatically via add-actor.

Input Example

{
  "startUrls": [
    "https://acme-corp.com",
    "https://www.another-company.de/blog"
  ],
  "maxDepth": 2,
  "maxPagesPerCrawl": 100,
  "maxPagesPerDomain": 50
}

Both bare domains (acme-corp.com) and full URLs (https://acme-corp.com/about) are accepted.

Output Example

[
  {
    "url": "https://acme-corp.com",
    "title": "Acme Corp - Industrial Solutions",
    "text": "Acme Corp is a global leader in industrial solutions. Since 1950 we have...",
    "depth": 0,
    "start_url": "https://acme-corp.com",
    "links_found": 14,
    "status_code": 200,
    "content_length": 3842,
    "scraped_at": "2026-04-29T10:00:00Z"
  },
  {
    "url": "https://acme-corp.com/about",
    "title": "About Us - Acme Corp",
    "text": "Founded in 1950, Acme Corp has grown from a small family workshop into...",
    "depth": 1,
    "start_url": "https://acme-corp.com",
    "links_found": 8,
    "status_code": 200,
    "content_length": 2190,
    "scraped_at": "2026-04-29T10:00:01Z"
  }
]

Pricing

You pay a start fee per run plus a fee per page crawled, and are only charged for pages you actually receive.

Event	Price	Description
Actor start	$0.25	Covers container startup
Page result	$0.0005	Per page crawled and returned

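Example costs for a single Actor start ($0.25 + pages × $0.0005). Note that maxPagesPerCrawl is capped at 500, so the 1,000- and 10,000-page totals would require several runs, each adding its own $0.25 start fee: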

Pages crawled	Cost
0 pages	$0.25
100 pages	$0.30
1,000 pages	$0.75
10,000 pages	$5.25

No monthly fees. No minimum spend.
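The example costs follow from one start fee plus the per-page rate. A minimal sketch of that arithmetic, using `Decimal` to avoid floating-point rounding; `estimate_cost` is a hypothetical helper, and its `runs` argument is a hypothetical addition for jobs split across several runs that each pay the start fee:

```python
from decimal import Decimal

# Prices from the pricing table, as exact decimals.
ACTOR_START = Decimal("0.25")
PAGE_RESULT = Decimal("0.0005")


def estimate_cost(pages: int, runs: int = 1) -> Decimal:
    """Start fee per run plus a per-page fee for every page returned."""
    if pages < 0 or runs < 1:
        raise ValueError("pages must be >= 0 and runs >= 1")
    return ACTOR_START * runs + PAGE_RESULT * pages


for pages in (0, 100, 1_000, 10_000):
    print(f"{pages:>6} pages: ${estimate_cost(pages):.2f}")
```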

Input Parameters

Parameter	Type	Default	Description
`startUrls`	string[]	required	URLs to start crawling from
`maxDepth`	integer	2	Number of link levels to follow from each start URL (0–5)
`maxPagesPerCrawl`	integer	100	Max total pages across all start URLs (1–500)
`maxPagesPerDomain`	integer	50	Max pages per unique domain (1–250)
`proxyConfiguration`	object	Apify proxy	Proxy settings
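Before starting a run, you may want to check input locally against the documented ranges. This sketch builds a run-input dict using the table's defaults and ranges. The helper names are hypothetical, `proxyConfiguration` is left out, and prefixing bare domains with `https://` is an assumption about how the Actor treats them, not confirmed behavior:

```python
from urllib.parse import urlparse

# (min, max, default) per parameter, taken from the Input Parameters table.
LIMITS = {
    "maxDepth": (0, 5, 2),
    "maxPagesPerCrawl": (1, 500, 100),
    "maxPagesPerDomain": (1, 250, 50),
}


def normalize_url(url: str) -> str:
    """Assumption: a bare domain such as acme-corp.com means https://acme-corp.com."""
    url = url.strip()
    if not urlparse(url).scheme:
        url = "https://" + url
    return url


def build_input(start_urls, **overrides):
    """Return a run-input dict, raising ValueError on missing or out-of-range values."""
    if not start_urls:
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(LIMITS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = {"startUrls": [normalize_url(u) for u in start_urls]}
    for name, (low, high, default) in LIMITS.items():
        value = overrides.get(name, default)
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        run_input[name] = value
    return run_input


print(build_input(["acme-corp.com", "https://www.another-company.de/blog"]))
```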

Output Fields

Field	Type	Description
`url`	string	Canonical URL of the crawled page
`title`	string	HTML title tag content
`text`	string	Visible plain text (truncated at 10,000 characters)
`depth`	integer	Crawl depth (0 = start URL, 1 = one link away, etc.)
`start_url`	string	The start URL that initiated this crawl path
`links_found`	integer	Internal links discovered on this page
`status_code`	integer	HTTP status code
`content_length`	integer	Characters in extracted text (before truncation)
`scraped_at`	string	ISO 8601 UTC timestamp
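Because each page is its own record, output is straightforward to post-process. A sketch, assuming records shaped like the Output Example (only the fields used are included): it groups page URLs by `start_url` and flags pages whose `content_length` exceeds 10,000, meaning their `text` was truncated. `summarize` is a hypothetical helper:

```python
from collections import defaultdict

TEXT_LIMIT = 10_000  # `text` is truncated at 10,000 characters per the field table


def summarize(records):
    """Group page URLs by start URL and list pages whose text was truncated."""
    by_start = defaultdict(list)
    truncated = []
    for rec in records:
        # Defensively keep only successful (2xx) responses.
        if not 200 <= rec["status_code"] < 300:
            continue
        by_start[rec["start_url"]].append(rec["url"])
        # content_length is measured before truncation, so a value above the
        # limit means `text` holds only the first 10,000 characters.
        if rec["content_length"] > TEXT_LIMIT:
            truncated.append(rec["url"])
    return dict(by_start), truncated


# Two records trimmed down from the Output Example.
SAMPLE = [
    {"url": "https://acme-corp.com", "start_url": "https://acme-corp.com",
     "status_code": 200, "content_length": 3842},
    {"url": "https://acme-corp.com/about", "start_url": "https://acme-corp.com",
     "status_code": 200, "content_length": 2190},
]

groups, truncated = summarize(SAMPLE)
print(groups, truncated)
```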

Tips

Depth 2 covers most websites: homepage → section pages → detail pages is typically enough for site audits and content extraction.
Use maxPagesPerCrawl for budget control: set it below the theoretical maximum to cap spend on large sites.
Depth 0 fetches only the start page: useful when you have a precise list of URLs and need content extraction without following links.
One record per page: each unique URL gets its own row, making it easy to filter, sort, or feed into downstream processing.
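To turn a spending limit into a maxPagesPerCrawl value, invert the pricing formula. This sketch assumes a single run (one $0.25 start fee) at $0.0005 per page; `max_pages_for_budget` is hypothetical and caps the result at the documented maximum of 500:

```python
from decimal import Decimal

ACTOR_START = Decimal("0.25")
PAGE_RESULT = Decimal("0.0005")
MAX_PAGES_PER_CRAWL = 500  # upper bound from the Input Parameters table


def max_pages_for_budget(budget_usd: str) -> int:
    """Largest maxPagesPerCrawl whose worst-case single-run cost fits the budget."""
    budget = Decimal(budget_usd)
    # Whole pages affordable after the start fee (int() truncates toward zero).
    pages = int((budget - ACTOR_START) / PAGE_RESULT)
    if pages < 1:
        raise ValueError("budget does not cover the start fee plus one page")
    return min(pages, MAX_PAGES_PER_CRAWL)


print(max_pages_for_budget("0.30"))
```

Passing the budget as a string keeps the value exact in `Decimal`.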

Free Email Domain Scraper — extract email addresses from any domain
Website Contact Extractor — extract full contact records (email + phone + social + address)
SEO Metadata Extractor — extract meta title, description, canonical, and OG tags

Issues & Feature Requests

If something is not working or you're missing a feature, please open an issue and we'll look into it.

Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Apify

131K

4.6

(205)

Intent Signal Tracker — Jobs, Tech & Funding

ryanclinton/intent-signal-tracker

Track buying signals across job postings, tech stack changes, funding, and content updates. Composite intent score per company. $0.05/company — replaces Clay's $495/mo Web Intent.

Ryan Clinton

1.0

(1)

Google Lens Scraper — Reverse Image Search

scrape.badger/google-lens-scraper

Reverse image search at scale via Google Lens: visual matches, exact matches, and shoppable product matches with price chips. Feed any public image URL, get structured JSON — title, source, source_favicon, thumbnail, price tag, in-stock flag, plus related search chips.

Scrape Badger

Website Tech Stack Detector — 100+ Technologies

ryanclinton/website-tech-stack-detector

Identify the technologies, frameworks, and services running on any website. Website Tech Stack Detector crawls one or more URLs, inspects HTTP headers, HTML meta tags, script sources, and body content, then matches them against a fingerprint database of 106 web technologies across 17 categories.

Ryan Clinton

All-in-One Facebook Scraper

get-leads/all-in-one-facebook-scraper

Facebook scraper — 12 modes: pages, posts, events, groups, search, reviews, comments, marketplace, reels & ads. HTTP-only, 256MB, fast. Premium residential proxy (~95% success rate). Up to 50% cheaper than alternatives. MCP-ready for AI agents.

Japi Cricket

Google Maps Scraper

get-leads/google-maps-scraper---best-value-for-money

Extract business leads from Google Maps with verified emails, phones & social profiles. 50+ niches (HVAC, dentists, lawyers, real estate & more). DNS/SMTP email verification. $0.003/lead — up to 10x cheaper than alternatives. HubSpot/Salesforce export. MCP-ready for AI agents.

Japi Cricket

NPI Registry Scraper | 7M+ US Healthcare Providers (CMS)

haketa/nppes-npi-registry-scraper

NPPES NPI Registry scraper & API: search 8M+ US healthcare providers and export NPI, name, taxonomy and specialty, practice and mailing address, phone, credentials and enumeration date. Healthcare provider data, verification and lead generation — fast, no login.

Haketa

npm Dependency Tree & License Scraper

taroyamada/open-source-license-dependency-audit

Scrape npm to map transitive dependency trees up to three levels deep. Extract exact license types, deprecation warnings, and active maintainer counts.

naoki anzai

Chinese Brand Monitor — Weibo+RedNote+Bilibili+Douban+Xueqiu

zhorex/chinese-brand-monitor

Track brand mentions across Weibo, Xiaohongshu (RedNote), Bilibili, Douban and Xueqiu in one normalized API call. Sentiment-tagged, cross-platform deduplicated. $0.045 per mention, pay-as-you-go. Synthesio/Brandwatch alternative for brand monitoring agencies, DTC China teams, and hedge funds.

Sami

5.0

(1)

Lead Enrichment Pipeline — 5-47x Cheaper Than Clay

ryanclinton/lead-enrichment-pipeline

All-in-one lead enrichment: email discovery, phone finding, verification, company research, and lead scoring in one run. CSV or JSON in, scored leads out. $0.12/lead — 5-47x cheaper than Clay.

Ryan Clinton

1.0

(1)

Watch Arbitrage Tracker — Rolex/Patek/AP × 13 Marketplaces

kazkn/watch-arbitrage-mcp

Cross-platform Patek/Rolex/AP arbitrage. Tracks 13 marketplaces: Chrono24, WatchBox, Bob's, Watchfinder, European Watch, Watches of Switzerland, Watch Club, Spliedt, A Collected Man, Analog:Shift, Bachmann & Scher, Yahoo Japan + Hodinkee. Telegram alerts on cross-country spreads. Pay-Per-Event.