Web Page to Markdown & Text - URL Reader for LLMs & RAG
Pricing
from $20.00 / 1,000 page reads
Read any web page as clean text + Markdown for LLMs and automations. Strips ads, nav, and scripts; returns the main content, metadata (title, author, date, word count), and an optional AI TL;DR + key points. The web-reading primitive for AI agents, RAG pipelines, and no-code flows.
Developer
AIDevs
Maintained by Community
AI Web Page Reader
Convert any URL into clean, LLM-ready text + Markdown in one call — the web-reading primitive for AI agents, RAG pipelines, and no-code automations.
Give it a page URL and it strips ads, navigation, and scripts, isolates the main content, and returns clean text, Markdown, and page metadata — plus an optional AI summary. It's the fast single-page alternative to a full-site crawler.
Why AI Web Page Reader
AI agents and automations constantly need to "read this page" and get text an LLM can actually use. Doing that well means removing boilerplate (menus, cookie banners, footers) and converting messy HTML into clean Markdown. This Actor does exactly that, predictably, in a single call.
- One call, one record — no crawling, no configuration.
- LLM-ready — clean text and Markdown, with metadata (title, author, date, word count).
- Cheap, high-volume: priced per read (from $20.00 / 1,000 page reads) and designed for machine-driven, repeat usage.
When to use it
- RAG ingestion of a specific article, doc page, or knowledge-base entry.
- Research / chat agents that fetch a URL and need its readable content.
- No-code flows (Make, Zapier, n8n) that pass a URL and store clean content.
- Quick reader-mode view plus summary of any article.
When NOT to use it
- Crawling a whole site (many pages) — use a deep crawler; this reads one URL.
- Heavily client-rendered apps that need full JS execution and interaction.
- Login-gated pages — it fetches as an anonymous visitor.
Built for
AI engineers, RAG/LLM developers, automation builders, and anyone who wants a reliable "URL → clean text" tool.
How it works
- Fetch the page at url with a real browser-like user agent.
- Extract metadata: title, description, author/byline, site name, published time, language, and OG image.
- Clean: remove scripts, styles, nav, header, footer, ads, and cookie/newsletter/share widgets.
- Isolate main content: prefer <article>/<main>/content containers; otherwise pick the densest text block.
- Convert to Markdown (headings, lists, links, bold/italic, blockquotes, images) and derive well-spaced plain text.
- (Optional) Summarize with your OpenAI key.
- Output one record; usage is billed per event.
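The Clean and Isolate steps above can be sketched with Python's standard library. This is a minimal illustration, not the Actor's actual implementation: the class and function names are hypothetical, the tag list covers only the tags named above, and the fallback simply keeps all remaining text instead of applying a text-density heuristic.

```python
from html.parser import HTMLParser

# Tags treated as boilerplate in this sketch (subset of the "Clean" step).
SKIP_TAGS = {"script", "style", "nav", "header", "footer"}
# Containers preferred as main content (the "Isolate" step).
MAIN_TAGS = {"article", "main"}


class MainTextExtractor(HTMLParser):
    """Collects visible text, tracking what sits inside <article>/<main>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a boilerplate tag
        self.main_depth = 0   # > 0 while inside <article> or <main>
        self.all_text = []    # every non-boilerplate text node
        self.main_text = []   # text nodes found inside a main container

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in MAIN_TAGS:
            self.main_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in MAIN_TAGS and self.main_depth:
            self.main_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        self.all_text.append(text)
        if self.main_depth:
            self.main_text.append(text)


def extract_main_text(html: str) -> str:
    """Return text from <article>/<main> if present, else all cleaned text."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.main_text or parser.all_text)
```

Calling `extract_main_text(page_html)` on a fetched page returns newline-separated text with navigation, footers, and scripts dropped. A production extractor would also need to handle ads and widget containers identified by class or id, which this sketch ignores.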
How to call it
From the Console
Paste a URL into Page URL, optionally enable Generate AI summary with your OpenAI key, click Start, and read the Output tab.
From the API
POST https://api.apify.com/v2/acts/entranced_gelato~ai-web-page-reader/runs?token=<APIFY_TOKEN>

{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "includeMarkdown": true,
  "summarize": false
}
Also callable over MCP as an agent tool.
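For callers that prefer not to install a client library, the same POST request can be assembled with Python's standard library. The sketch below only builds the request object; the helper name `build_run_request` is hypothetical, and the endpoint and JSON fields are copied from the example above.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "entranced_gelato~ai-web-page-reader"


def build_run_request(token: str, page_url: str,
                      include_markdown: bool = True,
                      summarize: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a run."""
    query = urllib.parse.urlencode({"token": token})
    endpoint = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    body = json.dumps({
        "url": page_url,
        "includeMarkdown": include_markdown,
        "summarize": summarize,
    }).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen()` sends the request. Keeping construction separate from sending makes the payload easy to inspect or log before any billed run starts.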
Input reference
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | (none) | The public web page to read. |
| includeMarkdown | boolean | No | true | Also return a clean Markdown version. |
| summarize | boolean | No | false | Generate an AI TL;DR + key points (needs openaiApiKey). |
| openaiApiKey | string (secret) | No | (none) | Your OpenAI key; used only for the summary. |
| model | string | No | gpt-4o-mini | OpenAI model for the summary. |
| maxChars | integer | No | 0 | Cap returned text/markdown length (0 = no limit). |
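
The input fields above interact in one way worth checking before a run: summarize needs openaiApiKey. The helper below is a hypothetical client-side sketch that builds the input object with the documented defaults. The URL scheme check and the negative maxChars check are assumptions of this example; the Actor's own validation is not documented here.

```python
from typing import Any, Dict, Optional


def build_run_input(url: str,
                    include_markdown: bool = True,
                    summarize: bool = False,
                    openai_api_key: Optional[str] = None,
                    model: str = "gpt-4o-mini",
                    max_chars: int = 0) -> Dict[str, Any]:
    """Assemble the Actor input, failing early on inconsistent options."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an http(s) URL")      # assumed check
    if summarize and not openai_api_key:
        raise ValueError("summarize=True requires openai_api_key")
    if max_chars < 0:
        raise ValueError("max_chars must be >= 0")          # assumed check

    run_input: Dict[str, Any] = {
        "url": url,
        "includeMarkdown": include_markdown,
        "summarize": summarize,
        "maxChars": max_chars,  # 0 means no limit
    }
    if summarize:
        # The key and model only matter when a summary is requested.
        run_input["openaiApiKey"] = openai_api_key
        run_input["model"] = model
    return run_input
```

The returned dictionary can be passed as `run_input` to the Apify client or serialized as the JSON body of an API call.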
Output reference
One dataset record per run:
| Field | Description |
|---|---|
| url | The page URL that was read. |
| title | Page/article title. |
| byline | Author, if detected. |
| siteName | Publisher/site name (OG). |
| publishedTime | Published date, if available. |
| lang | Page language. |
| description | Meta description. |
| image | OG image URL. |
| wordCount | Word count of the extracted text. |
| content | Clean plain text. |
| markdown | LLM-ready Markdown. |
| summary | AI TL;DR (only when summarization is enabled). |
| keyPoints | Array of key points (only when summarization is enabled). |
| fetchedAt | ISO timestamp of the run. |
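
Because several output fields are optional (byline, summary, keyPoints, and markdown when includeMarkdown is false), downstream code benefits from a typed wrapper that tolerates missing keys. The dataclass below is a hypothetical sketch covering a subset of the fields; the name `PageRecord` and its methods are not part of the Actor.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PageRecord:
    url: str
    title: Optional[str]
    word_count: int
    content: str
    markdown: Optional[str]
    summary: Optional[str]
    key_points: List[str]

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "PageRecord":
        """Build a record from one dataset item, defaulting absent fields."""
        return cls(
            url=item["url"],
            title=item.get("title"),
            word_count=int(item.get("wordCount") or 0),
            content=item.get("content") or "",
            markdown=item.get("markdown"),
            summary=item.get("summary"),
            key_points=list(item.get("keyPoints") or []),
        )

    def best_text(self) -> str:
        """Prefer Markdown for LLM input; fall back to plain text."""
        return self.markdown or self.content
```

`PageRecord.from_item(item).best_text()` gives a single string suitable for embedding or prompting, whichever output formats were enabled for the run.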
Pricing
Pay per event — you only pay for what you run:
- Page read — charged once per successful run (one page).
- AI summary — a small premium that applies only when you enable summarization. You supply your own OpenAI key, so the model's cost is billed by OpenAI separately and is never added to the Actor price.
Apify platform/compute usage is included in the per-event price. See the Pricing tab for current rates.
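As a rough budgeting aid, the page-read charge scales linearly with the number of successful runs. The sketch below uses the listed starting rate of $20.00 per 1,000 page reads as its default. That default is an assumption, so check the Pricing tab for the current rate. The AI summary premium is not quantified on this page and is left out, as are the separate OpenAI charges.

```python
def estimate_read_cost(page_reads: int, usd_per_1000_reads: float = 20.00) -> float:
    """Estimate the page-read charge in USD, excluding any summary premium."""
    if page_reads < 0:
        raise ValueError("page_reads must be >= 0")
    return round(page_reads * usd_per_1000_reads / 1000, 2)


# 250 successful reads at the listed starting rate
print(estimate_read_cost(250))  # 5.0
```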
Integrations
- LangChain / LlamaIndex: load content/markdown into vector stores and RAG chains.
- Make / Zapier / n8n: URL in, clean content out.
- MCP: expose as a tool for autonomous agents.
🔌 Integrations & code examples
Call it from the API
curl "https://api.apify.com/v2/acts/entranced_gelato~ai-web-page-reader/run-sync-get-dataset-items?token=<APIFY_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://en.wikipedia.org/wiki/Web_scraping", "includeMarkdown": true }'
Python (Apify client)
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")

# Start a run and wait for it to finish.
run = client.actor("entranced_gelato/ai-web-page-reader").call(
    run_input={"url": "https://example.com/article", "includeMarkdown": True}
)

# Each run writes one record to its default dataset.
item = next(client.dataset(run["defaultDatasetId"]).iterate_items())
print(item["title"], "->", item["wordCount"], "words")
print(item["markdown"][:500])
LangChain (load one page into a RAG chain)
from langchain_community.utilities import ApifyWrapper
from langchain_core.documents import Document

# ApifyWrapper reads the token from the APIFY_API_TOKEN environment variable.
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="entranced_gelato/ai-web-page-reader",
    run_input={"url": "https://example.com/article"},
    # Map each dataset item to a Document, preferring Markdown over plain text.
    dataset_mapping_function=lambda i: Document(
        page_content=i["markdown"] or i["content"] or "",
        metadata={"source": i["url"], "title": i.get("title")},
    ),
)
docs = loader.load()
MCP — add it to Claude, Cursor, or any agent
{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/actors-mcp-server", "--actors", "entranced_gelato/ai-web-page-reader"],
      "env": { "APIFY_TOKEN": "<APIFY_TOKEN>" }
    }
  }
}
Also works with LlamaIndex, Make, Zapier, and n8n — pass a URL, get clean content back into any workflow.
Example output
{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "title": "Web scraping",
  "byline": null,
  "siteName": "Wikipedia",
  "wordCount": 3412,
  "content": "Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites...",
  "markdown": "# Web scraping\n\nWeb scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites...",
  "fetchedAt": "2026-07-02T07:20:00.000Z"
}
FAQ
Will it run JavaScript-heavy pages? No. It fetches server-rendered HTML and does not execute JavaScript, so pages that render entirely client-side may return limited content.
Markdown or plain text? Both — markdown for rich formatting, content for plain text. Disable Markdown with includeMarkdown: false.
How is it different from a content crawler? It reads exactly one URL, fast and cheap — ideal as an agent/automation primitive rather than a bulk crawl.
Limitations
- Single page per run (no crawling).
- No JS execution / interaction.
- Public pages only (no auth).
See also
- AI Document Reader - PDF, DOCX, or file URL to clean text + Markdown.
- AI Competitive Brief Generator - any company URL to a competitive, SEO, or sales brief.