universal-website-content-scraper
Pricing
from $2.00 / 1,000 results
Powerful universal website scraper that extracts structured page titles, meta descriptions, H1–H3 headings, and clean main content. Smart content detection removes navigation and noise. Optional depth-controlled internal crawling. Ideal for SEO audits, AI preprocessing, research, and data pipelines.
Rating
5.0 (1)
Developer
Techionik
Actor stats
- Bookmarked: 1
- Total users: 3
- Monthly active users: 2
- Last modified: 9 days ago
UNIVERSAL WEBSITE CONTENT SCRAPER
A general-purpose website content scraper built using Crawlee (CheerioCrawler).
This Actor extracts clean, structured, human-readable content from standard HTML websites without requiring custom selectors.
It is designed to be simple, reliable, and easy to integrate with automation workflows.
PURPOSE
Extract structured content from general websites in a consistent and reusable format.
DATA EXTRACTED
- Page title
- Meta description
- Headings (H1 to H3)
- Main text content
- Page URL
HOW IT WORKS
- Starts from one or more provided URLs
- Automatically detects the main content area
- Removes navigation, footers, popups, and cookie banners
- Extracts readable text using a smart fallback strategy
- Optionally follows internal links with depth control
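The detect-then-fall-back idea in the steps above can be sketched in a few lines. This is only an illustration in standard-library Python, not the Actor's code (which is built on Crawlee and Cheerio). The tag lists and the names ContentSketch and extract_main_text are assumptions made for the example, and popups and cookie banners, which are usually identified by class or id, are left out.

```python
from html.parser import HTMLParser

# Tags treated as noise in this sketch (an assumption, not the Actor's real list).
NOISE_TAGS = {"nav", "footer", "aside", "script", "style", "noscript"}
# Containers tried first as the "main content area" (also an assumption).
MAIN_TAGS = {"main", "article"}


class ContentSketch(HTMLParser):
    """Collects text outside noise tags, preferring text inside <main>/<article>."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0   # > 0 while inside a noise element
        self.main_depth = 0    # > 0 while inside a main container
        self.main_chunks = []  # text found inside a main container
        self.all_chunks = []   # all non-noise text, used as the fallback

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag in MAIN_TAGS:
            self.main_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1
        elif tag in MAIN_TAGS and self.main_depth > 0:
            self.main_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self.noise_depth:
            return
        self.all_chunks.append(text)
        if self.main_depth:
            self.main_chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return main-container text, or all non-noise text if no container exists."""
    parser = ContentSketch()
    parser.feed(html)
    parser.close()
    return " ".join(parser.main_chunks or parser.all_chunks)


page = "<body><nav>Home | About</nav><main><h1>Title</h1><p>Body text.</p></main><footer>Legal</footer></body>"
print(extract_main_text(page))  # Title Body text.
```

When a page has no main or article element, the function falls back to every piece of text that is not inside a noise tag, which is one simple way to read "fallback strategy".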
INPUT OPTIONS
Start URLs
- One or more URLs to begin scraping from
Crawl Links
- Enable or disable link crawling
Max Enqueue Depth
- Controls how deep link crawling goes
Same Domain Only
- Restricts crawling to the starting domain
Max Requests per Crawl
- Limits the number of pages processed per run
All inputs are configurable from the Apify Console.
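To see how these options interact, the following sketch plans a crawl over a tiny in-memory link graph instead of the real web. The key names (startUrls, crawlLinks, maxEnqueueDepth, sameDomainOnly, maxRequestsPerCrawl) are hypothetical, since the listing only gives the display labels; check the Actor's Input tab for the actual schema. The sketch also assumes that start URLs sit at depth 0 and that "same domain" means an identical hostname.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical key names: the listing only gives the display labels.
run_input = {
    "startUrls": ["https://example.com/"],
    "crawlLinks": True,
    "maxEnqueueDepth": 1,
    "sameDomainOnly": True,
    "maxRequestsPerCrawl": 10,
}

# Stand-in for the web: page URL -> links found on that page.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://other.org/x"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": [],
    "https://other.org/x": [],
}


def plan_crawl(cfg, links):
    """Return the URLs a breadth-first crawl would visit under the given limits."""
    allowed_hosts = {urlparse(u).netloc for u in cfg["startUrls"]}
    queue = deque((u, 0) for u in cfg["startUrls"])
    seen = set(cfg["startUrls"])
    visited = []
    while queue and len(visited) < cfg["maxRequestsPerCrawl"]:
        url, depth = queue.popleft()
        visited.append(url)
        # Stop enqueueing when crawling is off or the depth limit is reached.
        if not cfg["crawlLinks"] or depth >= cfg["maxEnqueueDepth"]:
            continue
        for link in links.get(url, []):
            if link in seen:
                continue
            if cfg["sameDomainOnly"] and urlparse(link).netloc not in allowed_hosts:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


print(plan_crawl(run_input, LINKS))
# ['https://example.com/', 'https://example.com/a']
```

Under these assumptions, raising maxEnqueueDepth to 2 adds https://example.com/b, and switching sameDomainOnly off lets https://other.org/x into the queue.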
OUTPUT
Each scraped page produces one dataset item containing:
- pageTitle
- metaDescription
- headings
- mainText
- pageUrl
An overview table is included for quick browsing of page titles and URLs.
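As a rough illustration of working with these items downstream, the sketch below types one dataset item and runs a small SEO-style check. The five field names come from the list above; the field types (in particular headings as a flat list of strings), the sample items, and the helper names missing_meta and overview_rows are assumptions made for the example.

```python
from typing import List, Tuple, TypedDict


class PageItem(TypedDict):
    pageTitle: str
    metaDescription: str
    headings: List[str]  # assumed shape; the real output may group by level
    mainText: str
    pageUrl: str


# Made-up sample items for illustration only, not real Actor output.
items: List[PageItem] = [
    {
        "pageTitle": "Home",
        "metaDescription": "Welcome to the example site.",
        "headings": ["Home", "Latest posts"],
        "mainText": "Welcome text.",
        "pageUrl": "https://example.com/",
    },
    {
        "pageTitle": "About",
        "metaDescription": "",
        "headings": ["About us"],
        "mainText": "Who we are.",
        "pageUrl": "https://example.com/about",
    },
]


def missing_meta(pages: List[PageItem]) -> List[str]:
    """URLs of pages whose meta description is empty or whitespace."""
    return [p["pageUrl"] for p in pages if not p["metaDescription"].strip()]


def overview_rows(pages: List[PageItem]) -> List[Tuple[str, str]]:
    """(title, URL) pairs, mirroring the idea of the overview table."""
    return [(p["pageTitle"], p["pageUrl"]) for p in pages]


print(missing_meta(items))   # ['https://example.com/about']
print(overview_rows(items))
```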
TYPICAL USE CASES
- Website content extraction
- SEO and content audits
- Research and data collection
- AI and search preprocessing
- Website archiving
TECHNOLOGY STACK
- Apify SDK
- Crawlee (CheerioCrawler)
- Cheerio
- Mozilla Readability
NOTES
- Best suited for static and semi-static websites
- Not intended for heavily JavaScript-rendered applications: CheerioCrawler parses the HTML returned by the server and does not execute client-side JavaScript
STATUS
- Simple
- Clean
- Production-ready