# Webpage to Markdown (`epicscrapers/webpage-to-markdown`) Actor

Get the main content of any page as Markdown. Great for LLMs and AI agent workflows.

- **URL**: https://apify.com/epicscrapers/webpage-to-markdown.md
- **Developed by:** [Epic Scrapers](https://apify.com/epicscrapers) (community)
- **Categories:** AI, Agents, Integrations
- **Stats:** 2 total users, 2 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $2.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
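
To make the pay-per-event model concrete, here is a minimal sketch that estimates a lower bound on cost from the listed starting price of $2.00 per 1,000 results. The function name is hypothetical, and treating each dataset item as one billable result is an assumption; the Actor's pricing page remains authoritative.

```python
from decimal import Decimal

# Listed starting price: $2.00 per 1,000 results (the actual price may be higher).
PRICE_PER_1000_RESULTS = Decimal("2.00")


def estimate_min_cost_usd(result_count: int) -> Decimal:
    """Return a lower-bound cost estimate in USD for `result_count` results.

    Assumption: every result is billed as exactly one event.
    """
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS * result_count / 1000
    return cost.quantize(Decimal("0.0001"))


print(estimate_min_cost_usd(250))  # 0.5000
```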

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Webpage to Markdown

Extract clean **Markdown content** from any webpage. This **Apify Actor** converts HTML pages into well-formatted Markdown — perfect for feeding to LLMs, creating knowledge bases, or archiving web content in a readable format.

---

### What can Webpage to Markdown do?

-  **Extract main content only** — Removes navigation, ads, and clutter using intelligent content detection
-  **Clean Markdown output** — Properly formatted headings, lists, links, and code blocks
-  **Works on any website** — No site-specific configuration needed
-  **Fast & lightweight** — Single-page scraping optimized for quick results
-  **API & scheduling ready** — Run manually, via API, or on a schedule

---

### What data can you extract from webpages?

| Field | Type | Description |
|-------|------|-------------|
| `url` | String | The source URL of the webpage |
| `content` | String | The extracted Markdown content from the page |

---

### How to scrape webpages with Webpage to Markdown

1. **Open the Actor** → Go to [Webpage to Markdown](https://apify.com/epicscrapers/webpage-to-markdown) on Apify
2. **Paste a URL** → Enter any webpage URL in the **URL** input field
3. **Click Run** → The Actor will fetch and parse the page
4. **Download results** → Get your Markdown as JSON, or export to CSV/Excel

#### Example Input

```json
{
  "url": "https://docs.apify.com/academy/scraping-basics-javascript"
}
````

***

### Input

The Actor accepts a simple JSON input:

| Field | Required | Type | Description |
|-------|----------|------|-------------|
| `url` | ✅ | String | The full URL of the webpage to convert (e.g., `https://example.com/article`) |

#### Input Schema

You can also view the input schema in the **Input** tab of the Apify Console.
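
Because `url` is the only input field and must be a full URL, it can help to validate it on the client side before starting a run. The sketch below is one possible check using the Python standard library; `build_actor_input` is a hypothetical helper, not part of the Actor or the Apify client.

```python
from urllib.parse import urlparse


def build_actor_input(url: str) -> dict:
    """Validate `url` and wrap it in the Actor's input object.

    Raises ValueError unless the value is an absolute http(s) URL.
    """
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected a full http(s) URL, got: {url!r}")
    return {"url": cleaned}


print(build_actor_input("https://example.com/article"))
```

A scheme-less value such as `example.com/article` is rejected here, which avoids paying for a run that is likely to fail.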

***

### Output

#### Example JSON Output

```json
{
  "url": "https://docs.apify.com/academy/scraping-basics-javascript",
  "content": "## Scraping basics in JavaScript\n\nThis lesson covers the fundamentals...\n\n### What you'll learn\n\n- How to send HTTP requests\n- How to parse HTML with Cheerio\n- How to extract data from pages\n"
}
```
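
As an illustration of working with this output shape, the following sketch loads one dataset item into a small data class and lists the headings found in the Markdown `content`. The class and function names are hypothetical, and the heading detection is a simple regular expression that would also match `#` lines inside fenced code blocks.

```python
import re
from dataclasses import dataclass


@dataclass
class PageResult:
    url: str      # source URL of the webpage
    content: str  # extracted Markdown

    def headings(self) -> list[str]:
        """Return the text of ATX headings ("# ...", "## ...") in document order."""
        pattern = r"^#{1,6}[ \t]+(.+?)[ \t]*$"
        return re.findall(pattern, self.content, flags=re.MULTILINE)


def parse_item(item: dict) -> PageResult:
    """Build a PageResult from one dataset item; raises KeyError if a field is missing."""
    return PageResult(url=item["url"], content=item["content"])


item = {
    "url": "https://docs.apify.com/academy/scraping-basics-javascript",
    "content": "## Scraping basics in JavaScript\n\nIntro text\n\n### What you'll learn\n\n- A list item\n",
}
print(parse_item(item).headings())
```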

#### Export Formats

Results are stored in **Apify Datasets** and can be exported as:

- **JSON** — Default format with full content
- **CSV** — For spreadsheet applications (truncated long fields)
- **Excel** — For Microsoft Excel users
- **HTML** — For viewing in browsers
- **XML** — For XML-based workflows
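
The export formats above correspond to the `format` query parameter of the Apify API's dataset items endpoint. The sketch below only builds the download URL; the helper name is hypothetical, and it assumes the token is supplied separately (for example in an `Authorization: Bearer` header) rather than embedded in the URL.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"
# Format identifiers as used by the dataset items endpoint ("xlsx" is the Excel export).
EXPORT_FORMATS = {"json", "csv", "xlsx", "html", "xml"}


def dataset_export_url(dataset_id: str, fmt: str = "json") -> str:
    """Return the URL for downloading a dataset's items in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; expected one of {sorted(EXPORT_FORMATS)}")
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{urlencode({'format': fmt})}"


print(dataset_export_url("<DATASET_ID>", "csv"))
```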

***

### Is it legal to scrape webpages?

This Actor **only scrapes publicly available data** — content that any visitor can see without logging in. It does NOT extract:

- ❌ Content behind paywalls or authentication
- ❌ Copyrighted material beyond fair use

⚖️ **Important:** Always respect website Terms of Service and `robots.txt` files. This tool is designed for legitimate use cases like:

- Creating LLM training datasets from your own content
- Archiving public articles for research
- Building knowledge bases from documentation
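
If you want to check `robots.txt` yourself before submitting a URL, Python's standard library includes a parser. The sketch below evaluates a URL against robots.txt text you have already downloaded; the `is_allowed` helper and the sample rules are illustrative only and say nothing about how the Actor itself handles robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check `url` against already-downloaded robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for demonstration.
rules = """\
User-agent: *
Disallow: /private/
"""
print(is_allowed(rules, "https://example.com/articles/post"))   # True
print(is_allowed(rules, "https://example.com/private/report"))  # False
```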

***

### Why use Webpage to Markdown instead of copy-paste?

| Manual Copy-Paste | Webpage to Markdown |
|-------------------|---------------------|
| Includes ads, navigation, sidebars | ✅ Extracts only main article content |
| Messy formatting | ✅ Clean, structured Markdown |
| No metadata | ✅ Includes source URL |
| Manual work | ✅ Automated & scalable |
| Can't schedule | ✅ Schedule runs via API or Apify platform |

***

### FAQ

#### What websites does this work on?

This Actor works on any publicly accessible webpage. It's designed for article pages, blog posts, documentation, and content pages. Results may vary on heavily JavaScript-rendered sites or pages with unusual HTML structures.

#### Can I scrape multiple pages at once?

This Actor is designed for **single-page extraction**. For batch processing, use Apify's API or SDK to queue multiple runs, or check out the [Website Content Crawler](https://apify.com/apify/website-content-crawler) for full-site scraping.
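
One way to batch-process with the Python client is to start one run per URL and collect the dataset items. The sketch below runs the URLs sequentially and uses only the `actor(...).call(...)` and `dataset(...).iterate_items()` calls shown in the API section; `convert_pages` is a hypothetical helper, and `client` is assumed to be an `ApifyClient` instance.

```python
ACTOR_ID = "epicscrapers/webpage-to-markdown"


def convert_pages(client, urls):
    """Run the Actor once per URL and return all resulting dataset items.

    `client` is expected to behave like `apify_client.ApifyClient`.
    URLs whose run returns nothing are skipped.
    """
    results = []
    for url in urls:
        run = client.actor(ACTOR_ID).call(run_input={"url": url})
        if not run:
            continue  # the run did not produce a run object
        dataset = client.dataset(run["defaultDatasetId"])
        results.extend(dataset.iterate_items())
    return results
```

Usage would look like `convert_pages(ApifyClient("<YOUR_API_TOKEN>"), ["https://example.com/a", "https://example.com/b"])`. Each URL triggers its own run, so each one is billed separately.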

#### Does this work with paywalled content?

No — this Actor only extracts content visible without authentication. It respects robots.txt and is not designed for bypassing paywalls.

#### Can I integrate this with my workflow?

Yes! Use Apify's **API**, **webhooks**, or integrations with **Zapier**, **Make**, **n8n**, and more to connect results to your apps.
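
For webhook-based workflows, a receiver typically needs the run's dataset ID to fetch the Markdown. The sketch below assumes Apify's default webhook payload, in which `eventType` names the event and `resource` holds the run object; if you customize the payload template, adjust the field names. `dataset_id_from_webhook` is a hypothetical helper.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: bytes) -> Optional[str]:
    """Return the default dataset ID for a succeeded run, or None for other events.

    Assumes the default payload shape: {"eventType": ..., "resource": {...run...}}.
    """
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


sample = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "<DATASET_ID>"},
}).encode("utf-8")
print(dataset_id_from_webhook(sample))
```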

***

### Support

Need help or have a feature request?

- 🐛 **Issues:** Report bugs in the Apify Console Issues tab
- 💡 **Feature requests:** Contact via Apify or Discord
- 📧 **Custom solutions:** Open to custom Actor development

***

### Apify Platform Features

This Actor benefits from the full Apify platform:

- 📅 **Scheduling** — Run daily, hourly, or custom schedules
- 🔌 **API access** — Trigger via REST API
- 🔗 **Integrations** — Connect to Zapier, Make, n8n, Google Sheets
- 🔄 **Proxy rotation** — Automatic proxy handling (if needed)
- ☁️ **Cloud storage** — Managed datasets with export options
- 🔔 **Monitoring & alerts** — Get notified on errors or completion

***

Built with ❤️ for the Apify community.

# Actor input Schema

## `url` (type: `string`):

URL of the webpage

## Actor input object example

```json
{
  "url": "https://docs.apify.com/academy/scraping-basics-javascript"
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://docs.apify.com/academy/scraping-basics-javascript"
};

// Run the Actor and wait for it to finish
const run = await client.actor("epicscrapers/webpage-to-markdown").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "url": "https://docs.apify.com/academy/scraping-basics-javascript" }

# Run the Actor and wait for it to finish
run = client.actor("epicscrapers/webpage-to-markdown").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://docs.apify.com/academy/scraping-basics-javascript"
}' |
apify call epicscrapers/webpage-to-markdown --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=epicscrapers/webpage-to-markdown",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Webpage to Markdown",
        "description": "Get the main content of any page as Markdown. Great for LLMs and AI agent workflows.",
        "version": "0.0",
        "x-build-id": "WB4vif9SBBEYRFu5y"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/epicscrapers~webpage-to-markdown/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-epicscrapers-webpage-to-markdown",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/epicscrapers~webpage-to-markdown/runs": {
            "post": {
                "operationId": "runs-sync-epicscrapers-webpage-to-markdown",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/epicscrapers~webpage-to-markdown/run-sync": {
            "post": {
                "operationId": "run-sync-epicscrapers-webpage-to-markdown",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "url"
                ],
                "properties": {
                    "url": {
                        "title": "URL",
                        "type": "string",
                        "description": "URL of the webpage"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
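
For projects that do not use a client library, the `run-sync-get-dataset-items` path in the specification above can be called directly. The sketch below builds the request with the Python standard library and passes the token as the `token` query parameter, as the specification describes; `build_sync_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/epicscrapers~webpage-to-markdown/run-sync-get-dataset-items"


def build_sync_request(page_url: str, token: str) -> Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urlencode({"token": token})
    body = json.dumps({"url": page_url}).encode("utf-8")
    return Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request("https://example.com/article", "<YOUR_API_TOKEN>")
print(request.get_method(), request.full_url.split("?")[0])
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON response body. A synchronous call blocks until the run finishes, so set a timeout that suits your pages.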
