Pricing

$0.20 / 1,000 URL results

Website Content Extractor Scraper

Extract title, body text, links, emails, images, metadata, raw HTML, or clean markdown from one or more URLs using Scrappa's Web Scraper API.
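
As a rough budgeting aid, the listed price of $0.20 per 1,000 URL results can be turned into a per-run estimate. The sketch below assumes the price is prorated linearly per result and ignores any separate platform charges; the listing does not say whether failed URLs are billed, so treat the figure as an upper-level estimate only. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# USD per 1,000 URL results, taken from the listed pricing.
PRICE_PER_1000_RESULTS = Decimal("0.20")


def estimate_cost(url_results: int) -> Decimal:
    """Estimate the result cost in USD, assuming linear proration."""
    if url_results < 0:
        raise ValueError("url_results must be non-negative")
    cost = PRICE_PER_1000_RESULTS * url_results / 1000
    return cost.quantize(Decimal("0.0001"))


if __name__ == "__main__":
    # 25 blocks of 1,000 results at $0.20 each.
    print(estimate_cost(25_000))
```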

Rating

0.0

(0)

Developer

Scrappa

Last modified

2 months ago

What You Get

URL to JSON extraction with title, meta description, keywords, favicon, social links, body text, detected languages, links, emails, phone numbers, and images
Optional raw HTML with include_html=true
HTML to Markdown output with response_type=markdown
One Apify dataset item per processed URL
Per-URL success/error metadata so a failed target URL does not hide successful URLs in the same batch
site_status_code kept separate from actor/API failures, so target-site 404/500 responses can be handled downstream
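
Because site_status_code is reported separately from actor/API failures, downstream code can tell "the target site answered with an error" apart from "the extraction itself failed". A minimal sketch of that split follows. It assumes that error rows carry "success": false and that a target-site 404/500 appears as an integer site_status_code; the helper name classify_item and the three labels are hypothetical.

```python
def classify_item(item: dict) -> str:
    """Label one dataset item as 'ok', 'site_error', or 'actor_error'."""
    status = item.get("site_status_code")
    # Check the target-site status first, so the label does not depend on
    # whether a 4xx/5xx row is also flagged as unsuccessful (not documented).
    if isinstance(status, int) and status >= 400:
        return "site_error"
    if not item.get("success", False):
        # Assumed: validation or infrastructure failure for this URL.
        return "actor_error"
    return "ok"


if __name__ == "__main__":
    print(classify_item({"success": True, "site_status_code": 200}))
    print(classify_item({"success": True, "site_status_code": 404}))
    print(classify_item({"success": False}))
```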

Input

Batch input is preferred because it keeps Apify run overhead low:

{
  "urls": [
    "https://example.com",
    "https://www.iana.org/domains/reserved"
  ],
  "include_html": false,
  "response_type": "json"
}

Backward-compatible single URL input is also supported:

{
  "url": "https://example.com",
  "response_type": "markdown"
}
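
To keep the number of runs low, a long URL list can be deduplicated and grouped into a few batch inputs of the shape shown above. This is only a sketch: the helper build_batch_inputs and its batch_size default are hypothetical, and the listing does not state a maximum number of URLs per run.

```python
ALLOWED_RESPONSE_TYPES = {"json", "markdown"}  # the two modes shown in this listing


def build_batch_inputs(urls, batch_size=100, response_type="json", include_html=False):
    """Group URLs into actor inputs using the documented batch shape."""
    if response_type not in ALLOWED_RESPONSE_TYPES:
        raise ValueError(f"unsupported response_type: {response_type!r}")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Drop blanks and duplicates while preserving the original order.
    seen = set()
    unique = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            unique.append(url)

    return [
        {
            "urls": unique[start:start + batch_size],
            "include_html": include_html,
            "response_type": response_type,
        }
        for start in range(0, len(unique), batch_size)
    ]


if __name__ == "__main__":
    for payload in build_batch_inputs(
        ["https://example.com", "https://www.iana.org/domains/reserved"]
    ):
        print(payload)
```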

Output

JSON mode writes one dataset item per URL:

{
  "success": true,
  "input_url": "https://example.com",
  "request_url": "https://example.com",
  "response_type": "json",
  "include_html": false,
  "site_status_code": 200,
  "url": "https://example.com",
  "final_url": "https://example.com",
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "body_text": "Example Domain This domain is for use in illustrative examples in documents.",
  "links_count": 1,
  "emails_count": 0,
  "phone_numbers_count": 0,
  "images_count": 0,
  "languages_detected": ["en"],
  "data": {
    "title": "Example Domain",
    "links": ["https://www.iana.org/domains/example"]
  }
}
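
For reporting or CSV export, a JSON-mode item like the one above can be reduced to a flat row. The sketch below relies only on field names visible in the sample item and tolerates missing fields; summarize_item is a hypothetical helper, not part of the actor.

```python
def summarize_item(item: dict) -> dict:
    """Flatten one JSON-mode dataset item into a simple row."""
    data = item.get("data") or {}
    links = data.get("links") or []
    return {
        "url": item.get("final_url") or item.get("input_url"),
        "title": item.get("title"),
        "site_status_code": item.get("site_status_code"),
        "languages": ", ".join(item.get("languages_detected") or []),
        # Fall back to counting nested links if links_count is absent.
        "links_count": item.get("links_count", len(links)),
        "first_link": links[0] if links else None,
    }


if __name__ == "__main__":
    sample = {
        "success": True,
        "input_url": "https://example.com",
        "final_url": "https://example.com",
        "site_status_code": 200,
        "title": "Example Domain",
        "links_count": 1,
        "languages_detected": ["en"],
        "data": {"links": ["https://www.iana.org/domains/example"]},
    }
    print(summarize_item(sample))
```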

Markdown mode normalizes Scrappa's plain-text response into a dataset item:

{
  "success": true,
  "input_url": "https://example.com",
  "request_url": "https://example.com",
  "response_type": "markdown",
  "include_html": false,
  "url": "https://example.com",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "markdown_length": 89
}
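
A common next step for Markdown-mode items is writing each page to its own .md file. The following is a minimal sketch using only the fields shown above (success, input_url, url, markdown). The file-naming scheme and the helper names markdown_filename and save_markdown are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def markdown_filename(url: str) -> str:
    """Derive a filesystem-safe .md filename from a URL's host and path."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return f"{slug or 'page'}.md"


def save_markdown(item: dict, out_dir: Path) -> Optional[Path]:
    """Write the item's markdown to out_dir; skip failed or empty items."""
    text = item.get("markdown")
    if not item.get("success") or not text:
        return None
    out_dir.mkdir(parents=True, exist_ok=True)
    source_url = item.get("input_url") or item.get("url") or ""
    path = out_dir / markdown_filename(source_url)
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    item = {
        "success": True,
        "input_url": "https://example.com",
        "markdown": "# Example Domain\n",
    }
    print(save_markdown(item, Path("markdown_out")))
```

Two URLs that differ only in query string map to the same filename in this scheme, so a real pipeline may want to add a hash of the full URL.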

If one URL fails validation or Scrappa returns an infrastructure error, the actor writes an error row for that URL and continues with the rest of the batch.
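
Since failures are written as rows rather than aborting the run, a consumer can separate successful and failed items and collect the failed URLs for a follow-up batch. The exact shape of an error row is not shown in this listing; the sketch assumes it has "success": false and still includes input_url. split_results is a hypothetical helper.

```python
def split_results(items):
    """Return (ok_items, failed_items, retry_urls) for one batch of dataset items."""
    ok_items, failed_items = [], []
    for item in items:
        # Assumed: error rows are marked with a falsy "success" field.
        (ok_items if item.get("success") else failed_items).append(item)
    # Assumed: error rows keep the original input_url.
    retry_urls = [row["input_url"] for row in failed_items if row.get("input_url")]
    return ok_items, failed_items, retry_urls


if __name__ == "__main__":
    batch = [
        {"success": True, "input_url": "https://example.com"},
        {"success": False, "input_url": "https://www.iana.org/domains/reserved"},
    ]
    ok_items, failed_items, retry_urls = split_results(batch)
    print(len(ok_items), len(failed_items), retry_urls)
```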

High-Volume Use

For high-volume website content extraction or direct Web Scraper API access, call Scrappa directly at https://scrappa.co/api/web-scraper. This Apify actor is optimized for marketplace workflows and batched URL extraction, not for running a separate Apify job per page.
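
This listing gives the endpoint URL but not its request format. The sketch below therefore only builds a request URL and does not send it. It assumes, without confirmation from the source, that the endpoint accepts query parameters with the same names as the actor inputs (url, response_type, include_html). Authentication is omitted because it is not described here; consult Scrappa's own API documentation before relying on any of this.

```python
from urllib.parse import urlencode

# Endpoint as given in the text above.
SCRAPPA_ENDPOINT = "https://scrappa.co/api/web-scraper"


def build_request_url(target_url: str, response_type: str = "json",
                      include_html: bool = False) -> str:
    """Build a direct-call URL. Parameter names are ASSUMED, not documented here."""
    params = {
        "url": target_url,
        "response_type": response_type,
        "include_html": "true" if include_html else "false",
    }
    return f"{SCRAPPA_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_request_url("https://example.com", response_type="markdown"))
```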

Web Page to Markdown Extractor

fetch_cat/web-page-to-markdown-extractor

Convert public URLs into clean Markdown, text, metadata, links, images, and optional HTML for AI and automation workflows.

Hanna Nosova

Web Text Extractor

rl1987/web-text-extractor

R.L.

Article Extractor — Clean Web Content to Markdown/Text

omao/article-extractor

Extract the main article from any web page into clean Markdown or text, with title, author, date and description. Strips nav, ads and boilerplate. Fast, no setup.

Marouane Oulabass

Website Content Crawler — Text, Markdown & HTML for AI/LLM

hichemdev/website-content-crawler

Crawl any website and extract clean text, Markdown, and HTML from every page — ready for LLM, RAG, and AI ingestion.

Hichem Ben Moussa

HTML to Markdown Converter API — URL or Raw HTML to GFM

eliai/html-to-markdown

Convert HTML to Markdown via API. Input: a URL or raw HTML string. Output: clean GitHub-flavored Markdown with headings, links, lists, tables, and code blocks preserved. Sync run-and-return. Cheap pay-per-result: $0.02 per page converted.

Anthony Snider

Website to Markdown Crawler for LLM & RAG

logiover/website-text-markdown-crawler

Crawl any website to clean Markdown and plain text for LLM training and RAG. HTML to Markdown, no API or login. Export website text to CSV or JSON.

Logiover

RAG Web Extractor — Clean Markdown, HTML & Chunks

junipr/rag-web-extractor

Extract clean website content for RAG and AI search. Crawl pages, remove boilerplate, preserve structure, and export markdown, HTML, text, JSON, and chunks.

junipr

AI-Ready Webpage Extractor

s3nafps/ai-ready-webpage-extractor

Convert public web pages into clean Markdown, text, links, tables, images, metadata, and JSON-LD for AI workflows.

mohamed senator

Web Article Extractor — Clean Reader Mode Text & Metadata

maged120/reader-mode

Extract clean, readable article content from any web page. Strips ads, navigation, and clutter — returns title, author, full body text, and publish date in structured JSON.

Maged

Website Content Extractor for RAG: Markdown, HTML, Text

nezha/website-content-crawler

Turn docs sites, help centers, blogs, and websites into clean markdown, text, or HTML for RAG, AI knowledge bases, and internal search. Crawl from start URLs or sitemaps and keep the crawl in scope.