Pricing

Pay per usage

Notion Page & Database Scraper

Scrape public Notion pages and databases. Extract text, headings, lists, code blocks, tables, images, and linked sub-pages. Convert output to Markdown, HTML, or structured JSON. Ideal for knowledge base backup, content migration, and RAG pipelines.

Rating

0.0

(0)

Developer

Ricardo Akiyoshi

Actor stats

Bookmarked

151

Total users

Monthly active users

79 days

Issues response

2 months ago

Last modified

What it does

This actor visits public Notion pages (both notion.so and notion.site domains), parses the rendered HTML, and extracts all content blocks including:

Text paragraphs with formatting (bold, italic, code, links)
Headings (H1, H2, H3)
Bullet lists and numbered lists
To-do / checkbox lists
Code blocks with language detection
Tables and databases (inline and full-page)
Images, embeds, and callouts
Toggle blocks (collapsible content)
Linked sub-pages (automatically followed when enabled)
Page metadata (title, icon, cover, last edited)

Use cases

Knowledge base backup — Export your team's Notion workspace to Markdown files for version control or offline access
Content migration — Move Notion content to another CMS, wiki, or documentation platform
RAG / LLM pipelines — Feed Notion content into vector databases (Pinecone, Weaviate, Chroma) for retrieval-augmented generation
Documentation archival — Create snapshots of public documentation pages for compliance or reference
Competitive intelligence — Monitor public Notion pages from competitors, startups, or industry resources
SEO analysis — Extract and analyze content structure from Notion-hosted websites
Data extraction — Pull structured data from Notion databases (tables, boards, galleries)

Input configuration

Parameter	Type	Default	Description
`pageUrls`	array	(required)	List of public Notion page URLs to scrape
`includeSubpages`	boolean	`true`	Follow links to child pages within the same workspace
`maxPages`	integer	`100`	Maximum number of pages to scrape (including sub-pages)
`outputFormat`	string	`"markdown"`	Output format: `markdown`, `html`, or `json`
`proxyConfiguration`	object	—	Apify proxy settings for avoiding rate limits
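
The parameters in the table above can be assembled and sanity-checked before a run is started. The following is a minimal sketch, not the actor's own validation: the helper name `build_run_input` and its checks are hypothetical, and only the key names, defaults, and allowed format values come from the table.

```python
from typing import Any, Optional

# Allowed values for outputFormat, as listed in the parameter table.
ALLOWED_FORMATS = {"markdown", "html", "json"}


def build_run_input(
    page_urls: list[str],
    include_subpages: bool = True,
    max_pages: int = 100,
    output_format: str = "markdown",
    proxy_configuration: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an input object using the documented keys and defaults."""
    if not page_urls:
        raise ValueError("pageUrls is required and must contain at least one URL")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if max_pages < 1:
        raise ValueError("maxPages must be at least 1")

    run_input: dict[str, Any] = {
        "pageUrls": list(page_urls),
        "includeSubpages": include_subpages,
        "maxPages": max_pages,
        "outputFormat": output_format,
    }
    # proxyConfiguration has no default in the table, so it is only sent when provided.
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


if __name__ == "__main__":
    print(build_run_input(["https://www.notion.so/My-Page-abc123"], max_pages=10))
```

Checking the input locally surfaces a mistyped format or an empty URL list before any pages are charged.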

Output format

Each scraped page produces a dataset item with:

{
  "pageTitle": "My Notion Page",
  "url": "https://www.notion.so/My-Page-abc123",
  "content": "# My Notion Page\n\nThis is the page content in markdown...",
  "contentHtml": "<h1>My Notion Page</h1><p>This is the page content...</p>",
  "blocks": [
    { "type": "heading_1", "text": "My Notion Page" },
    { "type": "paragraph", "text": "This is the page content..." }
  ],
  "blockCount": 42,
  "subPages": [
    { "title": "Child Page", "url": "https://www.notion.so/Child-Page-def456" }
  ],
  "subPageCount": 3,
  "images": ["https://..."],
  "lastEdited": "2026-01-15T10:30:00.000Z",
  "icon": "📄",
  "cover": "https://...",
  "depth": 0,
  "scrapedAt": "2026-03-02T12:00:00.000Z"
}

The `content` field contains the output in your chosen format (Markdown by default). The `blocks` array is always included for structured access, regardless of output format.
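
As an illustration of working with one dataset item, the sketch below condenses an item into a short summary. It assumes the field names shown in the sample above (`pageTitle`, `blocks`, `subPages`, `depth`); the function `summarize_item` is hypothetical and the sample data is a trimmed copy of the example item, not real output.

```python
from collections import Counter
from typing import Any


def summarize_item(item: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: reduce one scraped-page item to a compact summary."""
    blocks = item.get("blocks", [])
    return {
        "title": item.get("pageTitle", "Untitled"),
        "depth": item.get("depth", 0),
        # Count how many blocks of each type the page contains.
        "block_types": dict(Counter(b.get("type", "unknown") for b in blocks)),
        # Collect child page URLs, skipping entries without a URL.
        "sub_page_urls": [p["url"] for p in item.get("subPages", []) if "url" in p],
    }


# Trimmed copy of the sample item from the section above.
sample = {
    "pageTitle": "My Notion Page",
    "blocks": [
        {"type": "heading_1", "text": "My Notion Page"},
        {"type": "paragraph", "text": "This is the page content..."},
    ],
    "subPages": [
        {"title": "Child Page", "url": "https://www.notion.so/Child-Page-def456"}
    ],
    "depth": 0,
}

if __name__ == "__main__":
    print(summarize_item(sample))
```

Because `blocks` is present for every output format, a summary like this works the same way whether `content` holds Markdown, HTML, or JSON.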

Example usage

Scrape a single page as Markdown

{
  "pageUrls": ["https://www.notion.so/My-Public-Page-abc123def456"],
  "outputFormat": "markdown"
}

Scrape an entire workspace (up to 500 pages)

{
  "pageUrls": ["https://www.notion.so/Workspace-Root-abc123def456"],
  "includeSubpages": true,
  "maxPages": 500,
  "outputFormat": "json"
}

Scrape without following sub-pages

{
  "pageUrls": [
    "https://www.notion.so/Page-One-abc123",
    "https://www.notion.so/Page-Two-def456"
  ],
  "includeSubpages": false,
  "outputFormat": "html"
}

Supported Notion URL formats

https://www.notion.so/Page-Title-{id}
https://www.notion.so/{workspace}/Page-Title-{id}
https://{workspace}.notion.site/Page-Title-{id}
https://notion.so/{id}
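
A list of candidate URLs can be pre-filtered against the formats above before it is passed to the actor. This is a rough sketch only: `is_supported_notion_url` is a hypothetical helper that checks the scheme, the host, and that a path is present. It does not validate the page ID, and the actor's own URL handling may be stricter or more lenient.

```python
from urllib.parse import urlparse

# Hosts taken from the supported formats above; any {workspace}.notion.site subdomain also counts.
NOTION_SO_HOSTS = {"notion.so", "www.notion.so"}
NOTION_SITE_SUFFIX = ".notion.site"


def is_supported_notion_url(url: str) -> bool:
    """Hypothetical pre-filter: True if the URL looks like one of the listed formats."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Assumption: only https is accepted, since every listed format uses it.
    if parsed.scheme != "https":
        return False
    on_notion_so = host in NOTION_SO_HOSTS
    on_notion_site = host.endswith(NOTION_SITE_SUFFIX) and host != NOTION_SITE_SUFFIX
    # Every listed format has a page title or ID after the host.
    has_path = parsed.path.strip("/") != ""
    return (on_notion_so or on_notion_site) and has_path


if __name__ == "__main__":
    for candidate in (
        "https://www.notion.so/acme/My-Page-abc123",
        "https://acme.notion.site/My-Page-abc123",
        "https://example.com/My-Page-abc123",
    ):
        print(candidate, is_supported_notion_url(candidate))
```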

Tips for best results

Pages must be public — This actor cannot access private Notion pages. Make sure "Share to web" is enabled in Notion.
Use proxies for large scrapes — If scraping 100+ pages, enable Apify proxy to avoid rate limiting.
Start with a small maxPages — Test with 5-10 pages first to verify the output matches your needs.
Markdown for RAG — If building an LLM/RAG pipeline, Markdown output gives the cleanest text with preserved structure.
JSON for databases — When scraping Notion databases (tables), JSON output preserves column types and cell values.
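
For the RAG tip above, one common preprocessing step is to split the Markdown in `content` into heading-delimited chunks before embedding. The sketch below is one possible approach rather than part of the actor: `split_markdown_by_heading` is a hypothetical helper, and it assumes H1 to H3 headings are written with leading `#` characters.

```python
import re

# Matches Markdown H1-H3 headings such as "## Setup".
HEADING_RE = re.compile(r"^(#{1,3})\s+(.*)$")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example easy to embed


def split_markdown_by_heading(markdown: str) -> list[dict[str, str]]:
    """Hypothetical helper: split Markdown into {"heading", "text"} chunks."""
    chunks: list[dict[str, str]] = []
    heading, lines = "", []
    in_code = False

    def flush() -> None:
        text = "\n".join(lines).strip()
        # Skip an empty preamble that has neither a heading nor any text.
        if heading or text:
            chunks.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_code = not in_code  # lines starting with "#" inside code are not headings
        match = None if in_code else HEADING_RE.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    doc = "# My Notion Page\n\nIntro text.\n\n## Setup\n\nStep one.\n"
    for chunk in split_markdown_by_heading(doc):
        print(chunk)
```

Keeping the heading alongside each chunk gives the retriever some context about where the text came from on the page.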

Pricing

This actor uses pay-per-event pricing. You are charged $0.004 per page scraped, which works out to $4 per 1,000 pages. Sub-pages count as individual pages.
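
The per-page rate makes it simple to estimate the cost of a run in advance. The sketch below uses only the $0.004 rate stated above; `estimate_cost_usd` is a hypothetical helper, and the estimate ignores any other platform or proxy charges that might apply.

```python
from decimal import Decimal

# Per-page rate stated in the Pricing section.
PRICE_PER_PAGE_USD = Decimal("0.004")


def estimate_cost_usd(root_pages: int, sub_pages: int = 0) -> Decimal:
    """Hypothetical helper: estimate the charge for a run, counting sub-pages as pages."""
    if root_pages < 0 or sub_pages < 0:
        raise ValueError("page counts must not be negative")
    return PRICE_PER_PAGE_USD * (root_pages + sub_pages)


if __name__ == "__main__":
    print(estimate_cost_usd(1000))    # 4.000
    print(estimate_cost_usd(1, 499))  # 2.000
```

Since `maxPages` caps the total number of pages, including sub-pages, `estimate_cost_usd(max_pages)` gives an upper bound on the per-page charges for a single run.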

Limitations

Only works with public Notion pages (shared to web)
Notion's dynamic rendering may delay content loading — the actor handles this with retry logic
Very large databases (10,000+ rows) may require multiple runs with pagination
Embedded content from third-party services (e.g., Google Docs, Figma) is captured as links, not content

Support

If you encounter issues or have feature requests, please open an issue on the actor's page or contact the developer.

Integration — Python

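from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Input keys match the "Input configuration" table above.
run = client.actor("sovereigntaylor/notion-scraper").call(run_input={
    "pageUrls": ["https://www.notion.so/My-Public-Page-abc123def456"],
    "includeSubpages": True,
    "maxPages": 50,
    "outputFormat": "markdown",
})

# Each dataset item is one scraped page; print its title and URL.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item.get('pageTitle', 'N/A')}: {item.get('url', 'N/A')}")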

Integration — JavaScript

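import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Input keys match the "Input configuration" table above.
const run = await client.actor('sovereigntaylor/notion-scraper').call({
    pageUrls: ['https://www.notion.so/My-Public-Page-abc123def456'],
    includeSubpages: true,
    maxPages: 50,
    outputFormat: 'markdown',
});

// Each dataset item is one scraped page; print its title and URL.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => console.log(`${item.pageTitle ?? 'N/A'}: ${item.url ?? 'N/A'}`));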
