Notion Page & Database Scraper
Scrape public Notion pages and databases. Extract text, headings, lists, code blocks, tables, images, and linked sub-pages. Convert output to Markdown, HTML, or structured JSON. Ideal for knowledge base backup, content migration, and RAG pipelines.
Developer: Ricardo Akiyoshi
Scrape any public Notion page or database and extract structured content. Outputs clean Markdown, HTML, or structured JSON — perfect for knowledge base backup, content migration, RAG/LLM pipelines, and documentation archival.
What it does
This actor visits public Notion pages (both notion.so and notion.site domains), parses the rendered HTML, and extracts all content blocks including:
- Text paragraphs with formatting (bold, italic, code, links)
- Headings (H1, H2, H3)
- Bullet lists and numbered lists
- To-do / checkbox lists
- Code blocks with language detection
- Tables and databases (inline and full-page)
- Images, embeds, and callouts
- Toggle blocks (collapsible content)
- Linked sub-pages (automatically followed when enabled)
- Page metadata (title, icon, cover, last edited)
Use cases
- Knowledge base backup — Export your team's Notion workspace to Markdown files for version control or offline access
- Content migration — Move Notion content to another CMS, wiki, or documentation platform
- RAG / LLM pipelines — Feed Notion content into vector databases (Pinecone, Weaviate, Chroma) for retrieval-augmented generation
- Documentation archival — Create snapshots of public documentation pages for compliance or reference
- Competitive intelligence — Monitor public Notion pages from competitors, startups, or industry resources
- SEO analysis — Extract and analyze content structure from Notion-hosted websites
- Data extraction — Pull structured data from Notion databases (tables, boards, galleries)
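For the RAG use case, some chunking step usually sits between scraping and embedding. A minimal sketch (not part of the actor; the 800-character limit is arbitrary and no particular vector database is assumed) that splits a scraped Markdown `content` field on paragraph boundaries:

```python
def chunk_markdown(content: str, max_chars: int = 800):
    """Split scraped Markdown into paragraph-aligned chunks for embedding."""
    chunks, current = [], ""
    for para in content.split("\n\n"):
        # Start a new chunk when adding this paragraph would overflow.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```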
Input configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| `pageUrls` | array | (required) | List of public Notion page URLs to scrape |
| `includeSubpages` | boolean | `true` | Follow links to child pages within the same workspace |
| `maxPages` | integer | `100` | Maximum number of pages to scrape (including sub-pages) |
| `outputFormat` | string | `"markdown"` | Output format: `markdown`, `html`, or `json` |
| `proxyConfiguration` | object | — | Apify proxy settings for avoiding rate limits |
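Putting the parameters together, a complete input might look like the sketch below. The `proxyConfiguration` shape follows Apify's usual `useApifyProxy` convention, which is assumed here rather than taken from this actor's schema:

```json
{
  "pageUrls": ["https://www.notion.so/My-Public-Page-abc123def456"],
  "includeSubpages": true,
  "maxPages": 100,
  "outputFormat": "markdown",
  "proxyConfiguration": { "useApifyProxy": true }
}
```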
Output format
Each scraped page produces a dataset item with:
```json
{
  "pageTitle": "My Notion Page",
  "url": "https://www.notion.so/My-Page-abc123",
  "content": "# My Notion Page\n\nThis is the page content in markdown...",
  "contentHtml": "<h1>My Notion Page</h1><p>This is the page content...</p>",
  "blocks": [
    { "type": "heading_1", "text": "My Notion Page" },
    { "type": "paragraph", "text": "This is the page content..." }
  ],
  "blockCount": 42,
  "subPages": [
    { "title": "Child Page", "url": "https://www.notion.so/Child-Page-def456" }
  ],
  "subPageCount": 3,
  "images": ["https://..."],
  "lastEdited": "2026-01-15T10:30:00.000Z",
  "icon": "📄",
  "cover": "https://...",
  "depth": 0,
  "scrapedAt": "2026-03-02T12:00:00.000Z"
}
```
The content field contains the output in your chosen format (Markdown by default). The blocks array is always included for structured access regardless of output format.
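Because the `blocks` array is always present, you can rebuild output in any shape you need. A minimal sketch (not part of the actor) that converts a `blocks` array back to Markdown; only `heading_1` and `paragraph` appear in the sample output above, so the other type names here (`heading_2`, `heading_3`, `bulleted_list_item`) are assumptions:

```python
def blocks_to_markdown(blocks):
    """Rebuild a Markdown string from a dataset item's `blocks` array."""
    lines = []
    for block in blocks:
        kind, text = block.get("type"), block.get("text", "")
        if kind == "heading_1":
            lines.append(f"# {text}")
        elif kind == "heading_2":  # assumed type name
            lines.append(f"## {text}")
        elif kind == "heading_3":  # assumed type name
            lines.append(f"### {text}")
        elif kind == "bulleted_list_item":  # assumed type name
            lines.append(f"- {text}")
        else:
            # paragraphs and any unrecognized block types pass through as text
            lines.append(text)
    return "\n\n".join(lines)
```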
Example usage
Scrape a single page as Markdown
```json
{
  "pageUrls": ["https://www.notion.so/My-Public-Page-abc123def456"],
  "outputFormat": "markdown"
}
```
Scrape an entire workspace (up to 500 pages)
```json
{
  "pageUrls": ["https://www.notion.so/Workspace-Root-abc123def456"],
  "includeSubpages": true,
  "maxPages": 500,
  "outputFormat": "json"
}
```
Scrape without following sub-pages
```json
{
  "pageUrls": [
    "https://www.notion.so/Page-One-abc123",
    "https://www.notion.so/Page-Two-def456"
  ],
  "includeSubpages": false,
  "outputFormat": "html"
}
```
Supported Notion URL formats
- `https://www.notion.so/Page-Title-{id}`
- `https://www.notion.so/{workspace}/Page-Title-{id}`
- `https://{workspace}.notion.site/Page-Title-{id}`
- `https://notion.so/{id}`
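In all four shapes the page ID is the trailing 32-character hex string. A hypothetical helper (not part of the actor) for pulling it out of a URL before passing it around:

```python
import re

# Notion page IDs are 32 lowercase hex characters at the end of the path.
PAGE_ID_RE = re.compile(r"([0-9a-f]{32})$")


def extract_page_id(url: str):
    """Return the 32-char page ID from a supported Notion URL, or None."""
    path = url.split("?")[0].rstrip("/")  # drop any query string / trailing slash
    match = PAGE_ID_RE.search(path)
    return match.group(1) if match else None
```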
Tips for best results
- Pages must be public — This actor cannot access private Notion pages. Make sure "Share to web" is enabled in Notion.
- Use proxies for large scrapes — If scraping 100+ pages, enable Apify proxy to avoid rate limiting.
- Start with a small `maxPages` — Test with 5-10 pages first to verify the output matches your needs.
- Markdown for RAG — If building an LLM/RAG pipeline, Markdown output gives the cleanest text with preserved structure.
- JSON for databases — When scraping Notion databases (tables), JSON output preserves column types and cell values.
Pricing
This actor uses pay-per-event pricing. You are charged $0.004 per page scraped (approximately $4 per 1,000 pages). Sub-pages count as individual pages.
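Since sub-pages are billed as individual pages, budgeting is simple multiplication. A throwaway helper, with the rate hard-coded from the figure above:

```python
PRICE_PER_PAGE = 0.004  # USD per page scraped, per the pricing above


def estimate_cost(page_count: int) -> float:
    """Estimated run cost in USD; include sub-pages in page_count."""
    return round(page_count * PRICE_PER_PAGE, 3)
```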
Limitations
- Only works with public Notion pages (shared to web)
- Notion's dynamic rendering may delay content loading — the actor handles this with retry logic
- Very large databases (10,000+ rows) may require multiple runs with pagination
- Embedded content from third-party services (e.g., Google Docs, Figma) is captured as links, not content
Support
If you encounter issues or have feature requests, please open an issue on the actor's page or contact the developer.
Integration — Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Run the actor with an input matching the schema documented above.
run = client.actor("sovereigntaylor/notion-scraper").call(run_input={
    "pageUrls": ["https://www.notion.so/My-Public-Page-abc123def456"],
    "outputFormat": "markdown",
})

# Iterate over the scraped pages in the default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("pageTitle", "N/A"))
```
Integration — JavaScript
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Run the actor with an input matching the schema documented above.
const run = await client.actor('sovereigntaylor/notion-scraper').call({
    pageUrls: ['https://www.notion.so/My-Public-Page-abc123def456'],
    outputFormat: 'markdown',
});

// List the scraped pages from the default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => console.log(item.pageTitle ?? 'N/A'));
```