Pricing

from $2.80 / 1,000 page reads

Web Page to Markdown & Text - URL Reader for LLMs & RAG

Read any web page as clean text + Markdown for LLMs and automations. Strips ads, nav, and scripts; returns the main content, metadata (title, author, date, word count), and an optional AI TL;DR + key points. The web-reading primitive for AI agents, RAG pipelines, and no-code flows.

Developer: AIDevs

Last modified: 3 days ago

AI Web Page Reader

Convert any URL into clean, LLM-ready text + Markdown in one call — the web-reading primitive for AI agents, RAG pipelines, and no-code automations.

Give it a page URL and it strips ads, navigation, and scripts, isolates the main content, and returns clean text, Markdown, and page metadata — plus an optional AI summary. It's the fast single-page alternative to a full-site crawler.

Why AI Web Page Reader

AI agents and automations constantly need to "read this page" and get text an LLM can actually use. Doing that well means removing boilerplate (menus, cookie banners, footers) and converting messy HTML into clean Markdown. This Actor does exactly that, predictably, in a single call.

One call, one record — no crawling, no configuration.
LLM-ready — clean text and Markdown, with metadata (title, author, date, word count).
Cheap at high volume: priced from $2.80 per 1,000 page reads, designed for machine-driven, repeat usage.

When to use it

RAG ingestion of a specific article, doc page, or knowledge-base entry.
Research / chat agents that fetch a URL and need its readable content.
No-code flows (Make, Zapier, n8n) that pass a URL and store clean content.
Quick reader-mode view and AI summary of any article.

When NOT to use it

Crawling a whole site (many pages) — use a deep crawler; this reads one URL.
Heavily client-rendered apps that need full JS execution and interaction.
Login-gated pages — it fetches as an anonymous visitor.

Built for

AI engineers, RAG/LLM developers, automation builders, and anyone who wants a reliable "URL → clean text" tool.

How it works

Fetch the page at `url`, sending a browser-like User-Agent header.
Extract metadata — title, description, author/byline, site name, published time, language, and OG image.
Clean — remove scripts, styles, nav, header, footer, ads, cookie/newsletter/share widgets.
Isolate main content — prefer <article>/<main>/content containers; otherwise pick the densest text block.
Convert to Markdown (headings, lists, links, bold/italic, blockquotes, images) and derive well-spaced plain text.
(Optional) Summarize with your OpenAI key.
Output one record; usage is billed per event.
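The "Clean" and "Isolate main content" steps can be approximated with Python's standard-library `html.parser`. This is a minimal sketch under stated assumptions, not the Actor's actual code: the tag lists, the class name `MainContentParser`, and the function `extract_main_text` are hypothetical, and "densest text block" is simplified to "the `<div>`/`<section>` with the most direct text characters".

```python
from html.parser import HTMLParser

# Assumed tag lists for illustration; not the Actor's published rules.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}
PREFERRED_TAGS = ("article", "main")
BLOCK_TAGS = {"div", "section"}


class MainContentParser(HTMLParser):
    """Collects visible text while skipping boilerplate containers."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate tag
        self.open_preferred = {tag: 0 for tag in PREFERRED_TAGS}
        self.preferred_text = {tag: [] for tag in PREFERRED_TAGS}
        self.blocks: list[list[str]] = []  # direct text of each div/section
        self.block_stack: list[int] = []   # indexes of currently open blocks

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.open_preferred:
            self.open_preferred[tag] += 1
        elif tag in BLOCK_TAGS:
            self.blocks.append([])
            self.block_stack.append(len(self.blocks) - 1)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.open_preferred and self.open_preferred[tag]:
            self.open_preferred[tag] -= 1
        elif tag in BLOCK_TAGS and self.block_stack:
            self.block_stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text or self.skip_depth:
            return
        for tag, depth in self.open_preferred.items():
            if depth:
                self.preferred_text[tag].append(text)
        if self.block_stack:
            # Credit text only to the innermost open block.
            self.blocks[self.block_stack[-1]].append(text)


def extract_main_text(html: str) -> str:
    parser = MainContentParser()
    parser.feed(html)
    parser.close()
    # 1) Prefer semantic containers such as <article> and <main>.
    for tag in PREFERRED_TAGS:
        if parser.preferred_text[tag]:
            return " ".join(parser.preferred_text[tag])
    # 2) Otherwise fall back to the block holding the most text characters.
    if parser.blocks:
        densest = max(parser.blocks, key=lambda pieces: sum(map(len, pieces)))
        return " ".join(densest)
    return ""
```

Counting characters per block is a deliberately crude density measure; real readability algorithms also weigh link density and class names, and the Markdown conversion step is not shown here.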

How to call it

From the Console

Paste a URL into Page URL, optionally enable Generate AI summary with your OpenAI key, click Start, and read the Output tab.

From the API

POST https://api.apify.com/v2/acts/entranced_gelato~ai-web-page-reader/runs?token=<APIFY_TOKEN>
{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "includeMarkdown": true,
  "summarize": false
}

Also callable over MCP as an agent tool.
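The `/runs` endpoint above starts a run and returns immediately, so the caller still has to wait for the run and then read its dataset. Below is a standard-library sketch of that flow, assuming the Apify API v2 endpoints `GET /v2/actor-runs/{runId}` and `GET /v2/datasets/{datasetId}/items`; it sends the token as an `Authorization: Bearer` header, which the Apify API accepts as an alternative to the `?token=` query parameter shown above. The function names `fetch_json` and `read_page` are hypothetical, and `fetch` is injectable so the flow can be tested offline.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Any, Callable

API = "https://api.apify.com/v2"
ACTOR = "entranced_gelato~ai-web-page-reader"
# Run statuses after which the run will not change any more.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def fetch_json(url: str, token: str, payload: dict | None = None) -> Any:
    """GET (or POST when a payload is given) a JSON endpoint of the Apify API."""
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="GET" if body is None else "POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def read_page(run_input: dict, token: str, poll_seconds: float = 2.0,
              fetch: Callable[..., Any] = fetch_json) -> dict:
    """Start a run via /runs, poll until it finishes, and return its record."""
    run = fetch(f"{API}/acts/{ACTOR}/runs", token, run_input)["data"]
    while run["status"] not in TERMINAL_STATUSES:
        time.sleep(poll_seconds)
        run = fetch(f"{API}/actor-runs/{run['id']}", token)["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"Run {run['id']} ended with status {run['status']}")
    items = fetch(f"{API}/datasets/{run['defaultDatasetId']}/items", token)
    if not items:
        raise RuntimeError("Run succeeded but wrote no dataset record")
    return items[0]  # one record per run
```

Usage: `read_page({"url": "https://en.wikipedia.org/wiki/Web_scraping", "includeMarkdown": True}, token="<APIFY_TOKEN>")`. For short single-page reads, the synchronous `run-sync-get-dataset-items` endpoint used in the curl example avoids the polling loop entirely.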

Input reference

Field	Type	Required	Default	Description
`url`	string	Yes	—	The public web page to read.
`includeMarkdown`	boolean	No	`true`	Also return a clean Markdown version.
`summarize`	boolean	No	`false`	Generate an AI TL;DR + key points (needs `openaiApiKey`).
`openaiApiKey`	string (secret)	No	—	Your OpenAI key; used only for the summary.
`model`	string	No	`gpt-4o-mini`	OpenAI model for the summary.
`maxChars`	integer	No	`0`	Cap returned text/markdown length (`0` = no limit).
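To catch input mistakes before paying for a run, a caller might mirror the table above in a small client-side helper. This is a hypothetical sketch, not part of the Actor: the `ReaderInput` class and its checks are assumptions derived only from the table (for example, that `summarize` needs `openaiApiKey`). Attribute names stay camelCase on purpose so they map one-to-one onto the JSON input keys.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class ReaderInput:
    """Hypothetical client-side mirror of the input reference table."""
    url: str
    includeMarkdown: bool = True
    summarize: bool = False
    openaiApiKey: str | None = None
    model: str = "gpt-4o-mini"
    maxChars: int = 0  # 0 = no limit

    def validate(self) -> None:
        if not self.url.startswith(("http://", "https://")):
            raise ValueError("url must be an absolute http(s) URL")
        if self.summarize and not self.openaiApiKey:
            raise ValueError("summarize=True requires openaiApiKey")
        if self.maxChars < 0:
            raise ValueError("maxChars must be >= 0 (0 means no limit)")

    def to_run_input(self) -> dict:
        self.validate()
        # Omit openaiApiKey entirely when it is unset instead of sending null.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

The resulting dict can be passed as `run_input` to the Python client example or as the JSON body of the API call.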

Output reference

One dataset record per run:

Field	Description
`url`	The page URL that was read.
`title`	Page/article title.
`byline`	Author, if detected.
`siteName`	Publisher/site name (OG).
`publishedTime`	Published date, if available.
`lang`	Page language.
`description`	Meta description.
`image`	OG image URL.
`wordCount`	Word count of the extracted text.
`content`	Clean plain text.
`markdown`	LLM-ready Markdown.
`summary`	AI TL;DR (only when summarization is enabled).
`keyPoints`	Array of key points (only when summarization is enabled).
`fetchedAt`	ISO timestamp of the run.
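Several of these fields are optional (`byline`, `publishedTime`, `summary`, `keyPoints`), and `markdown` is absent when `includeMarkdown` is false. A minimal sketch for turning one record into text plus metadata for a vector store, assuming records shaped like the table above; the function name `record_to_chunk_source` and the chosen metadata fields are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Fields from the output table that are useful as RAG metadata (assumed selection).
METADATA_FIELDS = ("url", "title", "byline", "siteName", "publishedTime",
                   "lang", "wordCount", "fetchedAt")


def record_to_chunk_source(item: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (text, metadata) for one dataset record.

    Prefers `markdown`; falls back to `content` when Markdown was disabled
    or came back empty. Missing or null fields are left out of the metadata.
    """
    text = item.get("markdown") or item.get("content") or ""
    metadata = {k: item[k] for k in METADATA_FIELDS if item.get(k) is not None}
    if item.get("summary"):
        metadata["summary"] = item["summary"]
        metadata["keyPoints"] = list(item.get("keyPoints") or [])
    return text, metadata
```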

Pricing

Pay per event — you only pay for what you run:

Page read — charged once per successful run (one page).
AI summary — a small premium that applies only when you enable summarization. You supply your own OpenAI key, so the model's cost is billed by OpenAI separately and is never added to the Actor price.

Apify platform/compute usage is included in the per-event price. See the Pricing tab for current rates.
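At the listed starting rate, $2.80 per 1,000 page reads works out to $0.0028 per successful run, so 50,000 reads cost about $140. A small estimator, assuming that rate (check the Pricing tab for the rate that actually applies to you); the function name is hypothetical, and it deliberately leaves out the optional AI-summary premium, whose amount is not stated here, as well as OpenAI's separate charges.

```python
from decimal import ROUND_HALF_UP, Decimal

# Lowest listed rate ("from $2.80 / 1,000 page reads"); your actual rate may differ.
LISTED_PRICE_PER_1000 = Decimal("2.80")


def estimate_page_read_cost(page_reads: int,
                            price_per_1000: Decimal = LISTED_PRICE_PER_1000) -> Decimal:
    """Estimated USD cost of `page_reads` successful runs (one page each).

    Excludes the AI-summary premium and any OpenAI charges.
    """
    if page_reads < 0:
        raise ValueError("page_reads must be non-negative")
    cost = price_per_1000 * page_reads / 1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

`Decimal` is used instead of `float` so that currency amounts round predictably.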

Integrations

LangChain / LlamaIndex — load content/markdown into vector stores and RAG chains.
Make / Zapier / n8n — URL in, clean content out.
MCP — expose as a tool for autonomous agents.

🔌 Integrations & code examples

Call it from the API

curl "https://api.apify.com/v2/acts/entranced_gelato~ai-web-page-reader/run-sync-get-dataset-items?token=<APIFY_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://en.wikipedia.org/wiki/Web_scraping", "includeMarkdown": true }'

Python (Apify client)

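from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")
# Start the Actor and wait for the run to finish.
run = client.actor("entranced_gelato/ai-web-page-reader").call(
    run_input={"url": "https://example.com/article", "includeMarkdown": True}
)
# Each run writes one record to its default dataset.
item = next(client.dataset(run["defaultDatasetId"]).iterate_items(), None)
if item is None:
    raise RuntimeError("The run finished without a dataset record")
print(item["title"], "->", item["wordCount"], "words")
print((item.get("markdown") or "")[:500])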

LangChain (load one page into a RAG chain)

from langchain_community.utilities import ApifyWrapper
from langchain_core.documents import Document

# ApifyWrapper reads your token from the APIFY_API_TOKEN environment variable.
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="entranced_gelato/ai-web-page-reader",
    run_input={"url": "https://example.com/article"},
    # Map each record to a Document; fall back to plain text if Markdown is absent.
    dataset_mapping_function=lambda i: Document(
        page_content=i.get("markdown") or i.get("content") or "",
        metadata={"source": i["url"], "title": i.get("title")},
    ),
)
docs = loader.load()

MCP — add it to Claude, Cursor, or any agent

{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/actors-mcp-server", "--actors", "entranced_gelato/ai-web-page-reader"],
      "env": { "APIFY_TOKEN": "<APIFY_TOKEN>" }
    }
  }
}

Also works with LlamaIndex, Make, Zapier, and n8n — pass a URL, get clean content back into any workflow.

Example output

{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "title": "Web scraping",
  "byline": null,
  "siteName": "Wikipedia",
  "wordCount": 3412,
  "content": "Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites...",
  "markdown": "# Web scraping\n\nWeb scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites...",
  "fetchedAt": "2026-07-02T07:20:00.000Z"
}

FAQ

Will it run JavaScript-heavy pages? No. It fetches the server-rendered HTML without executing JavaScript, so pages that render entirely client-side may return limited content.
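One way to handle this in a pipeline is to flag records whose extracted text is suspiciously short and route them to a JavaScript-capable scraper. A minimal sketch: the 50-word threshold and the function name `looks_client_rendered` are hypothetical and should be tuned to your sources, since a short page can also simply be short.

```python
from __future__ import annotations

from typing import Any

# Hypothetical threshold; tune it for the sites you read.
MIN_WORDS = 50


def looks_client_rendered(item: dict[str, Any], min_words: int = MIN_WORDS) -> bool:
    """Heuristic: a very low word count can indicate an empty app shell."""
    word_count = item.get("wordCount")
    if word_count is None:
        # Fall back to counting words in the plain-text field.
        word_count = len((item.get("content") or "").split())
    return word_count < min_words
```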

Markdown or plain text? Both — markdown for rich formatting, content for plain text. Disable Markdown with includeMarkdown: false.

How is it different from a content crawler? It reads exactly one URL, fast and cheap — ideal as an agent/automation primitive rather than a bulk crawl.

Limitations

Single page per run (no crawling).
No JS execution / interaction.
Public pages only (no auth).

Use it in n8n, Make & Zapier (no code)

This Actor is a drop-in web-reading step for no-code workflows. In n8n, add the Apify node (or an HTTP Request node) and call entranced_gelato/ai-web-page-reader with {"url": "https://example.com"}. The run returns `markdown`, `content` (plain text), and metadata fields you can map into any downstream node.

n8n workflow 1 - URL to Slack summary. A Webhook trigger receives a URL, the Apify node runs this Actor with summarize: true and your openaiApiKey, and a Slack node posts the TL;DR and key points to your channel. Great for team link-sharing bots.
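The message-building step between the Apify node and the Slack node could look like the following. This is a sketch written in Python for consistency with the other examples (n8n's Code node also offers a Python mode, or the same logic ports easily to JavaScript); the function name `format_slack_digest` is hypothetical, and it assumes a record produced with summarization enabled, using Slack's `*bold*` markup for the title.

```python
from typing import Any


def format_slack_digest(item: dict[str, Any]) -> str:
    """Build a Slack message from one record produced with summarize: true."""
    title = item.get("title") or item["url"]
    lines = [f"*{title}*", item["url"]]
    if item.get("summary"):
        lines.append(f"TL;DR: {item['summary']}")
    for point in item.get("keyPoints") or []:
        lines.append(f"• {point}")
    return "\n".join(lines)
```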

n8n workflow 2 - RAG ingestion pipeline. Schedule trigger reads URLs from a Google Sheet, the Apify node converts each page to Markdown, a Code node chunks the markdown field, and a Pinecone or Qdrant node upserts embeddings. Your vector DB stays fresh without writing a scraper.
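The chunking step in this pipeline might split the `markdown` field on headings and then pack paragraphs up to a size limit. A minimal sketch, not a prescribed method: the function name `chunk_markdown` and the 2,000-character default are assumptions, and it ignores fenced code blocks, so a `# comment` line inside a code fence would be treated as a heading.

```python
import re

# ATX headings (# through ######) at the start of a line.
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def chunk_markdown(markdown: str, max_chars: int = 2000) -> list[str]:
    """Split on headings, then pack paragraphs into chunks of <= max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Cut before every heading so each section starts with its heading.
    starts = [m.start() for m in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    sections = [markdown[a:b] for a, b in zip(starts, starts[1:] + [len(markdown)])]

    chunks: list[str] = []
    for section in sections:
        current = ""
        for para in (p.strip() for p in section.split("\n\n")):
            if not para:
                continue
            while len(para) > max_chars:  # hard-split oversize paragraphs
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(para[:max_chars])
                para = para[max_chars:]
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = para
        if current:
            chunks.append(current)
    return chunks
```

Keeping each heading with its section gives the embedding model some local context; adding overlap between chunks is a common refinement not shown here.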

n8n workflow 3 - Competitor blog monitor. RSS trigger fires on new competitor posts, this Actor extracts the clean article text, an LLM node classifies relevance and drafts a summary, and Gmail sends you a digest.

No Apify node? POST your input JSON straight to https://api.apify.com/v2/acts/entranced_gelato~ai-web-page-reader/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN from any HTTP node. The same pattern works in Make, Zapier, LangChain, and as an MCP tool for AI agents.

Related Actors

Website to Markdown

cool_ya/website-to-markdown

Convert any web page into clean, LLM-ready Markdown. Strips nav, ads and boilerplate and returns the main article text plus title, description and word count. Perfect for RAG and AI pipelines.

Y A

Web to Markdown — AI-Ready Text from Any URL

wsgcjj/web-to-markdown

Convert any web page URL to clean Markdown format. Perfect for LLM training data, RAG pipelines, and AI content processing. Extracts main content, strips ads/nav/footers.

陈俊杰

Website to Markdown Crawler - Full-Site Text for LLMs & RAG

entranced_gelato/website-to-markdown-crawler

Crawl any website from a start URL and get every page as clean text + Markdown for LLMs, RAG, and AI agents. Follows internal links with depth and page limits, strips nav and ads, and returns one structured record per page. A fast, no-config site-to-Markdown crawler.

AIDevs

Article Extractor — Clean Web Content to Markdown/Text

omao/article-extractor

Extract the main article from any web page into clean Markdown or text, with title, author, date and description. Strips nav, ads and boilerplate. Fast, no setup.

Marouane Oulabass

AI Web-to-Markdown Extract API — URL to Clean JSON for LLMs

olican/ai-web-to-markdown-extract

Scrapes any webpage, automatically cleans HTML clutter (nav, footers, scripts, ads, cookie consent banners), and transforms the main content into clean, structured Markdown for LLMs and RAG.

Sergio Calvo

LLM-Ready Web Extractor

phantom_horse/my-actor-1

Turn any web page into clean, LLM-ready Markdown. Strips scripts, nav, and page chrome, then converts the main content to tidy Markdown with title, meta description, and token counts. Perfect for AI prompts and RAG ingestion pipelines.

NATNAEL FIKRE

LLM-Ready Web Extractor: URL to Clean Markdown & JSON

f0rty7even/llm-web-extractor

Turn any web page or site into clean, LLM-ready Markdown and structured JSON for RAG, agents, and fine-tuning. Strips nav/ads/boilerplate; returns main content + metadata.

Michael Yousrie

Webpage to Markdown

technicaldost/webpage-to-clean-markdown

Convert any web page into clean, LLM-ready Markdown. Strips ads, nav and boilerplate, keeping headings, links, tables and code. Perfect for RAG pipelines and AI agents.