Pricing

$0.05 / actor start

Web Content Crawler — Generic Site Text Extractor

Generic web content crawler. Extract text content from any URL. Lightweight alternative for quick page scraping and data collection for AI training and research.

Developer

Valdeir Lima

Last modified

a month ago

Website Content Crawler & Text Extractor

Crawl any website and extract clean, structured content — page title, meta description, main text, headings and links — at scale. A fast, no-nonsense web scraper built on Cheerio (no browser overhead) for SEO research, competitor analysis, content monitoring, and feeding AI / RAG / LLM pipelines.

What this website scraper does

Give it one or more URLs and it returns clean data for every page:

Title and meta description (great for SEO audits)
First H1 heading
Full visible text (scripts, styles, nav and footer stripped out)
All outbound links on the page
Optional same-domain crawling to follow links automatically

No login, no API key. Optional Apify Proxy for sites that block datacenter IPs.
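
The fields listed above can be approximated with a short sketch. This is not the actor's own code (which is built on Cheerio); it is a rough Python stand-in using only the standard library, and the names `PageExtractor`, `extract` and `SKIPPED` are hypothetical. It assumes that "visible text" means everything outside `script`, `style`, `nav` and `footer` elements, and that links are collected from the whole page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: text inside these elements is not counted as visible text.
SKIPPED = {"script", "style", "nav", "footer"}


class PageExtractor(HTMLParser):
    """Collects title, meta description, first H1, visible text and links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.description = ""
        self.h1 = ""
        self.links = []
        self.text_parts = []
        self._skip_depth = 0
        self._in_title = False
        self._in_h1 = False
        self._h1_done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "h1" and not self._h1_done:
            self._in_h1 = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._h1_done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data
            return
        if self._skip_depth:
            return
        if self._in_h1:
            self.h1 += data
        if data.strip():
            self.text_parts.append(data.strip())


def extract(html, url, max_text_length=20000):
    parser = PageExtractor(url)
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)[:max_text_length]
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.description,
        "h1": parser.h1.strip(),
        "text": text,
        "links": parser.links,
    }


SAMPLE_HTML = """<html><head><title>Pricing - Example</title>
<meta name="description" content="Simple, transparent pricing.">
<style>body{}</style></head>
<body><nav><a href="/">Home</a></nav>
<h1>Plans for every team</h1><p>Pick a plan.</p>
<a href="/signup">Sign up</a><script>var x=1;</script>
<footer>Copyright</footer></body></html>"""

page = extract(SAMPLE_HTML, "https://example.com/pricing")
```

The sketch only sees markup that is present in the HTML it is given, which mirrors the server-rendered limitation described in the FAQ.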

Use cases

Competitor research — scrape a competitor's site copy, landing pages and blog
SEO audits — pull titles, meta descriptions and headings across many URLs
AI / RAG ingestion — turn websites into clean text for embeddings and LLMs
Content monitoring — detect copy or pricing changes over time
Lead research — extract company descriptions and contact pages at scale
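
For the content monitoring use case, one possible approach is to compare the results of two runs. The sketch below assumes each record carries the `url` and `text` fields shown in the Output section; the helper names `fingerprint` and `changed_urls` are hypothetical and not part of the actor.

```python
import hashlib


def fingerprint(record):
    """Hash the page text after collapsing runs of whitespace."""
    normalised = " ".join(record["text"].split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_urls(previous, current):
    """Return URLs present in both runs whose text fingerprint differs."""
    old = {r["url"]: fingerprint(r) for r in previous}
    return sorted(
        r["url"]
        for r in current
        if r["url"] in old and old[r["url"]] != fingerprint(r)
    )


last_week = [
    {"url": "https://example.com/pricing", "text": "Pro plan $10 per month"},
    {"url": "https://example.com/about", "text": "We build tools."},
]
today = [
    {"url": "https://example.com/pricing", "text": "Pro plan $12 per month"},
    {"url": "https://example.com/about", "text": "We  build tools."},
]
changes = changed_urls(last_week, today)
```

Because page text is truncated to `maxTextLength` characters, a change that falls beyond that limit would not be noticed by a comparison like this.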

Input

Field	Type	Default	Description
`startUrls`	array	—	List of URLs to crawl
`maxPages`	integer	`10`	Maximum pages to crawl in total
`followLinks`	boolean	`false`	Follow same-domain links up to `maxPages`
`maxTextLength`	integer	`20000`	Truncate page text to this many characters
`useProxy`	boolean	`false`	Route through Apify Proxy to avoid IP blocks
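
To make the defaults in the table concrete, here is a minimal sketch of how a partial input could be resolved into a full configuration. The function `resolve_input` is hypothetical; treating `startUrls` as required and non-empty, and requiring the integer fields to be at least 1, are assumptions rather than documented behaviour.

```python
# Defaults copied from the input table; startUrls has no default.
DEFAULTS = {
    "maxPages": 10,
    "followLinks": False,
    "maxTextLength": 20000,
    "useProxy": False,
}


def resolve_input(raw):
    """Merge user input over the defaults and check basic types."""
    start_urls = raw.get("startUrls")
    if not isinstance(start_urls, list) or not start_urls:
        # Assumption: an empty or missing list is rejected.
        raise ValueError("startUrls must be a non-empty list")
    resolved = {**DEFAULTS, **raw}
    for key in ("maxPages", "maxTextLength"):
        value = resolved[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer")
    for key in ("followLinks", "useProxy"):
        if not isinstance(resolved[key], bool):
            raise ValueError(f"{key} must be a boolean")
    return resolved


config = resolve_input({"startUrls": ["https://example.com"], "maxPages": 25})
```

Fields left out of the input fall back to the table's defaults, so `config` here keeps `followLinks` and `useProxy` switched off.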

Example input

{
  "startUrls": ["https://example.com"],
  "maxPages": 25,
  "followLinks": true
}

Output

{
  "url": "https://example.com/pricing",
  "title": "Pricing — Example",
  "description": "Simple, transparent pricing.",
  "h1": "Plans for every team",
  "textLength": 4213,
  "text": "Plans for every team ...",
  "linkCount": 38,
  "links": ["https://example.com/signup"],
  "crawledAt": "2026-06-05T10:00:00.000Z"
}
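
Records shaped like the one above lend themselves to simple SEO checks. The following sketch flags pages where `title`, `description` or `h1` is empty or absent; the `audit` helper is hypothetical, and it assumes the actor leaves those fields empty or null when the page has no such element.

```python
def audit(records):
    """Map each URL to the list of SEO fields that are missing or blank."""
    issues = {}
    for record in records:
        missing = [
            field
            for field in ("title", "description", "h1")
            if not (record.get(field) or "").strip()
        ]
        if missing:
            issues[record["url"]] = missing
    return issues


records = [
    {
        "url": "https://example.com/pricing",
        "title": "Pricing",
        "description": "Simple, transparent pricing.",
        "h1": "Plans for every team",
    },
    {
        "url": "https://example.com/blog",
        "title": "Blog",
        "description": "",
        "h1": None,
    },
]
report = audit(records)
```

Pages with all three fields filled in are left out of the report, so an empty result means nothing was flagged.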

FAQ

Do I need an API key? No. It works out of the box.

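Can it crawl JavaScript-heavy sites? It extracts server-rendered HTML only, because it parses pages with Cheerio and does not run a browser. For content that only appears after heavy client-side rendering, point startUrls at the site's underlying API or at pre-rendered URLs instead. Enabling useProxy changes the IP address requests come from; it does not render JavaScript.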

Will it get blocked? Most sites are fine. For strict sites, enable useProxy to route through Apify Proxy.

How do I crawl a whole site? Set followLinks: true and raise maxPages.
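
How `followLinks` and `maxPages` interact can be illustrated with a breadth-first sketch. This is an illustration under assumptions, not the actor's implementation: `crawl` and the `fetch_links` callback are hypothetical names, "same-domain" is taken to mean an exact host match, and URL fragments are ignored when de-duplicating.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_urls, fetch_links, max_pages=10, follow_links=False):
    """Visit pages breadth-first and return the URLs in visiting order.

    fetch_links(url) is a caller-supplied function that returns the hrefs
    found on that page; it stands in for fetching and parsing.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        if not follow_links:
            continue
        host = urlparse(url).netloc
        for href in fetch_links(url):
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A tiny in-memory "site" used in place of real HTTP requests.
SITE = {
    "https://example.com/": ["/pricing", "/blog", "https://other.com/x", "/pricing#plans"],
    "https://example.com/pricing": ["/", "/signup"],
    "https://example.com/blog": ["/blog/post-1"],
}

pages = crawl(
    ["https://example.com/"],
    lambda url: SITE.get(url, []),
    max_pages=3,
    follow_links=True,
)
```

With `follow_links` left off, only the start URLs are visited; raising `max_pages` lets the queue drain further into the site.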

No-BS Content Crawler 🖕

successful_nonagon/no-bs-content-crawler

Fast web crawler that extracts clean text from websites. Returns readable content, headings, and links. Perfect for content aggregation, SEO research, and data collection.

hafsah nuzhat

5.0

Website Content Crawler Fast

timelody/website-content-crawler-fast

Scraping data from every single web page.

timelody

5.0

Website Content Crawler

rupom888/website-content-crawler

Syed Rupom

Website Content Crawler — Extract Full Site Content

oneary/website-content-crawler

🌐 Full website crawler that extracts structured content (text, headings, metadata, links, images) from any domain. Free platform compute pricing.

Luan M.

Web Text Extractor

rl1987/web-text-extractor

R.L.

Website Content Crawler

bhansalisoft/website-content-crawler

Website Content Crawler: scrape any website's content, including the meta title, meta description and site logo

bhansalisoft

Pro Web Content Crawler (With Images)

assertive_analogy/pro-web-content-crawler

Pro Web Content Crawler is a powerful tool that digs deep into web content and images. It handles complex sites, dynamic pages, and hidden content, making it perfect for extracting both data and images. Customizable and API-ready for your unique data needs.

Gideon Nesh

260

5.0

Generic Html Scraper

daddyapi/generic-html-scraper

A lightweight, robust, and simple actor to fetch the raw HTML content of any URL

DaddyAPI

Enhanced Deep Content Crawler

assertive_analogy/advanced-crawler

A fast, Python-powered web crawler with smart content extraction, JS support, metadata capture, and duplicate detection. Ideal for SEO, content migration, and e-commerce scraping. Reliable, scalable, and easy to customize.

Gideon Nesh

1.0

Fast URL Content Crawler

6sigmag/fast-url-content-crawler

A high-performance web scraper that rapidly extracts and analyzes content from multiple URLs simultaneously. Perfect for competitive research, content aggregation, and website structure analysis.