universal-website-content-scraper

Pricing

from $2.00 / 1,000 results

Powerful universal website scraper that extracts structured page titles, meta descriptions, H1–H3 headings, and clean main content. Smart content detection removes navigation and noise. Optional depth-controlled internal crawling. Ideal for SEO audits, AI preprocessing, research, and data pipelines.

Rating

5.0 (1 rating)

Developer

Techionik

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 3
  • Monthly active users: 2
  • Last modified: 9 days ago

UNIVERSAL WEBSITE CONTENT SCRAPER

A general-purpose website content scraper built using Crawlee (CheerioCrawler).

This Actor extracts clean, structured, human-readable content from standard HTML websites without requiring custom selectors.

It is designed to be simple, reliable, and easy to integrate with automation workflows.


PURPOSE

Extract structured content from general websites in a consistent and reusable format.


DATA EXTRACTED

  1. Page title
  2. Meta description
  3. Headings (H1 to H3)
  4. Main text content
  5. Page URL

HOW IT WORKS

  1. Starts from one or more provided URLs
  2. Automatically detects the main content area
  3. Removes navigation, footers, popups, and cookie banners
  4. Extracts readable text using a smart fallback strategy
  5. Optionally follows internal links with depth control
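
The extraction steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the Actor's implementation (which, per the technology stack, builds on Cheerio and Mozilla Readability). The NOISE_TAGS set, the PageExtractor class, the extract_page function, and the shape of each headings entry are assumptions made for this example. Popups and cookie banners usually need class- or id-based rules, which are left out here.

```python
from html.parser import HTMLParser

# Elements treated as noise in this sketch (an assumption, not the Actor's rules).
NOISE_TAGS = {"nav", "footer", "header", "aside", "script", "style", "noscript"}
HEADING_TAGS = {"h1", "h2", "h3"}
# Block-level elements after which a space is inserted to keep words apart.
BLOCK_TAGS = {"p", "div", "li", "section", "article"}


class PageExtractor(HTMLParser):
    """Collects title, meta description, H1-H3 headings, and body text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.meta_description = ""
        self.headings = []
        self._text_parts = []
        self._noise_depth = 0   # > 0 while inside any noise element
        self._current = None    # "title" or a heading tag being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1
        elif tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "description":
                self.meta_description = (attr_map.get("content") or "").strip()
        elif self._noise_depth == 0 and (tag == "title" or tag in HEADING_TAGS):
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            if self._noise_depth > 0:
                self._noise_depth -= 1
            return
        if self._noise_depth > 0:
            return
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.headings.append({"level": tag, "text": text})
            self._current = None
            self._text_parts.append(" ")
        elif tag in BLOCK_TAGS:
            self._text_parts.append(" ")

    def handle_data(self, data):
        if self._noise_depth > 0:
            return  # skip text inside navigation, footers, scripts, etc.
        if self._current is not None:
            self._buffer.append(data)
        else:
            self._text_parts.append(data)

    def main_text(self):
        # Collapse all runs of whitespace into single spaces.
        return " ".join("".join(self._text_parts).split())


def extract_page(html, url):
    """Return one item with the five fields listed under DATA EXTRACTED."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "pageTitle": parser.title,
        "metaDescription": parser.meta_description,
        "headings": parser.headings,
        "mainText": parser.main_text(),
        "pageUrl": url,
    }
```

Calling extract_page(html, url) on a fetched page returns a single dictionary. A real extractor would also score candidate content containers and fall back to the full body when none is found, which this sketch does not attempt.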

INPUT OPTIONS

Start URLs

  • One or more URLs to begin scraping from

Crawl Links

  • Enable or disable link crawling

Max Enqueue Depth

  • Controls how deep link crawling goes

Same Domain Only

  • Restricts crawling to the starting domain

Max Requests per Crawl

  • Limits the number of pages processed per run

All inputs are configurable from the Apify Console.
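
To show how these options interact, here is a minimal sketch of a depth-limited, breadth-first crawl plan over an in-memory link map. It is an illustration under stated assumptions, not the Actor's code: plan_crawl, links_by_page, and the parameter names are hypothetical stand-ins for the inputs listed above, and "same domain" is simplified to an exact hostname match.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def plan_crawl(start_urls, links_by_page, crawl_links=True, max_depth=1,
               same_domain_only=True, max_requests=100):
    """Return the URLs that would be processed, in visiting order.

    links_by_page maps a page URL to the href values found on that page
    (a hypothetical stand-in for real fetching and parsing).
    """
    allowed_hosts = {urlparse(url).hostname for url in start_urls}
    queue = deque((url, 0) for url in start_urls)
    seen = set(start_urls)
    visited = []

    while queue and len(visited) < max_requests:
        url, depth = queue.popleft()
        visited.append(url)

        # Stop expanding when crawling is off or the depth limit is reached.
        if not crawl_links or depth >= max_depth:
            continue

        for href in links_by_page.get(url, []):
            # Resolve relative links and drop the #fragment part.
            target = urldefrag(urljoin(url, href)).url
            parsed = urlparse(target)
            if parsed.scheme not in ("http", "https"):
                continue  # skip mailto:, tel:, javascript:, ...
            if same_domain_only and parsed.hostname not in allowed_hosts:
                continue
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))

    return visited
```

In this sketch the start URLs sit at depth 0, so a maximum depth of 1 processes the start pages plus the pages they link to directly. The request limit caps the total number of pages regardless of depth.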


OUTPUT

Each scraped page produces one dataset item containing:

  • pageTitle
  • metaDescription
  • headings
  • mainText
  • pageUrl

An overview table is included for quick browsing of page titles and URLs.
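
As a rough illustration of consuming this output, the sketch below checks that each item carries the five documented fields and derives title/URL rows similar to the overview table. The overview_rows helper, the sample values, and the shape of the headings entries are assumptions made for this example. Only the five field names come from the list above.

```python
# Field names as documented in the OUTPUT section.
REQUIRED_FIELDS = ("pageTitle", "metaDescription", "headings", "mainText", "pageUrl")


def overview_rows(items):
    """Return (pageTitle, pageUrl) pairs, one per dataset item."""
    rows = []
    for index, item in enumerate(items):
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            raise ValueError(f"item {index} is missing fields: {missing}")
        rows.append((item["pageTitle"], item["pageUrl"]))
    return rows


# Made-up sample item for illustration only; real values depend on the page.
sample_items = [
    {
        "pageTitle": "Example Domain",
        "metaDescription": "",
        "headings": [{"level": "h1", "text": "Example Domain"}],
        "mainText": "This domain is for use in illustrative examples.",
        "pageUrl": "https://example.com/",
    }
]

for title, url in overview_rows(sample_items):
    print(f"{title}\t{url}")
```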


TYPICAL USE CASES

  1. Website content extraction
  2. SEO and content audits
  3. Research and data collection
  4. AI and search preprocessing
  5. Website archiving

TECHNOLOGY STACK

  • Apify SDK
  • Crawlee (CheerioCrawler)
  • Cheerio
  • Mozilla Readability

NOTES

  • Best suited for static and semi-static websites
  • Not intended for heavily JavaScript-rendered applications: CheerioCrawler parses the HTML returned by the server and does not execute client-side JavaScript

STATUS

  • Simple
  • Clean
  • Production-ready