# URL to markdown (`apify/url-to-markdown`) Actor

An Apify Actor that takes a URL as input and returns the content of the page in Markdown format.

- **URL**: https://apify.com/apify/url-to-markdown.md
- **Developed by:** [Apify](https://apify.com/apify) (Apify)
- **Categories:** Developer tools, AI, Automation
- **Stats:** 4 total users, 0 monthly users, 83.3% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $1.50 / 1,000 converted pages

This Actor is paid per event and usage. You are charged both a fixed price for specific events and the cost of Apify platform usage.
Since this Actor supports Apify Store discounts, the price decreases as your subscription plan tier increases.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## URL to Markdown 
Extract content from any URL and convert it into clean Markdown ready for large language models (LLMs). This is ideal for retrieval-augmented generation (RAG) pipelines, AI training data, and knowledge-base ingestion.

### How to use URL to Markdown Converter
This Actor has one required input: the URL you want to convert.

Additionally, you can select the **Scraping mode**:
- The Raw HTTP mode (default) is the fastest and cheapest, but can't handle JavaScript.
- The Browser mode is more powerful and can handle JavaScript-heavy websites.

The Actor returns output with the following data:
- URL
- Markdown of the page
- Basic metadata

This Actor doesn't support pagination or crawling to discover new URLs. If you are looking to convert a whole website to Markdown, use the [Website Content Crawler](https://apify.com/apify/website-content-crawler) instead.

#### Input example
```json
{
   "url": "https://apify.com",
   "scrapingTool": "raw-http"
}
````

#### Output example

```json
[{
  "crawl": {
    "httpStatusCode": 200,
    "httpStatusMessage": "OK",
    "loadedAt": "2026-06-11T09:00:12.010Z",
    "uniqueKey": "I0mexdHttr",
    "requestStatus": "handled"
  },
  "metadata": {
    "title": "Apify: Full-stack web scraping and data extraction platform",
    "description": "Cloud platform for web scraping, browser automation, AI agents, and data for AI. Use 38,000+ ready-made tools, code templates, or order a custom solution.",
    "languageCode": "en",
    "url": "https://apify.com",
    "redirectedUrl": "https://apify.com/"
  },
  "query": "https://apify.com",
  "markdown": "Apify: Full-stack web scraping and data extraction platform\n\n"
}]
```
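
As a minimal Python sketch, the fields of one dataset item shaped like the output example above could be read as follows. The `summarize_item` helper is hypothetical, and the sample item is trimmed from the example; real items may carry additional fields.

```python
from typing import Any


def summarize_item(item: dict[str, Any]) -> dict[str, Any]:
    """Pick commonly used fields out of one dataset item (hypothetical helper).

    Uses .get() throughout so a missing field yields None or "" instead of raising.
    """
    crawl = item.get("crawl", {})
    metadata = item.get("metadata", {})
    return {
        "url": metadata.get("url"),
        "title": metadata.get("title"),
        "status": crawl.get("httpStatusCode"),
        "markdown": item.get("markdown", ""),
    }


# Sample item trimmed from the output example above.
sample_item = {
    "crawl": {"httpStatusCode": 200, "requestStatus": "handled"},
    "metadata": {
        "title": "Apify: Full-stack web scraping and data extraction platform",
        "url": "https://apify.com",
    },
    "query": "https://apify.com",
    "markdown": "Apify: Full-stack web scraping and data extraction platform\n\n",
}

summary = summarize_item(sample_item)
print(summary["status"], summary["url"])
```

Checking `status` before using `markdown` is one way to skip pages that did not load successfully.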

### How much does URL to Markdown cost?

The price per page depends on your Apify plan and the selected mode. The table below shows the prices for 1,000 URLs:

| Apify plan | Raw HTTP mode | Browser mode |
|------------|---------------|--------------|
| Free       | $3.00         | $6.00        |
| Starter    | $2.00         | $5.00        |
| Scale      | $1.70         | $4.00        |
| Business   | $1.50         | $3.00        |
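
The arithmetic behind the table can be sketched in Python as below. The `estimate_event_cost` function is hypothetical, and mapping the "Browser mode" column to the `browser-playwright` input value is an assumption based on the two values the input schema allows. The estimate covers only the per-event price; Apify platform usage is charged on top, as noted in the Pricing section.

```python
# Prices in USD per 1,000 URLs, copied from the table above.
# Keys for the modes are the `scrapingTool` input values (assumed mapping).
PRICE_PER_1000 = {
    "Free": {"raw-http": 3.00, "browser-playwright": 6.00},
    "Starter": {"raw-http": 2.00, "browser-playwright": 5.00},
    "Scale": {"raw-http": 1.70, "browser-playwright": 4.00},
    "Business": {"raw-http": 1.50, "browser-playwright": 3.00},
}


def estimate_event_cost(plan: str, mode: str, url_count: int) -> float:
    """Estimate the per-event charge in USD; platform usage is not included."""
    if url_count < 0:
        raise ValueError("url_count must be non-negative")
    return round(PRICE_PER_1000[plan][mode] * url_count / 1000, 2)


# 2,500 URLs on the Starter plan in Raw HTTP mode.
print(estimate_event_cost("Starter", "raw-http", 2500))
```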

### What are the use cases for URL to Markdown?

- **Get clean training data for LLMs:** get clean, structured Markdown ready for model fine-tuning
- **Enhance your LLM:** provide your [LLM with custom knowledge](https://blog.apify.com/custom-gpts-knowledge/) to make it more accurate
- **Implement retrieval-augmented generation (RAG):** supply page content as Markdown to your RAG pipeline

### Integrate URL to Markdown with your AI ecosystem

Use [Apify platform integrations](https://docs.apify.com/integrations) to connect URL to Markdown with third-party tools.

Watch the [video tutorial on integrations](https://www.youtube.com/watch?v=bNACk1_S_6w).

Top integrations to look at are:

- [LangChain](https://github.com/hwchase17/langchain): the most popular framework for developing applications powered by language models
- [Pinecone](https://apify.com/apify/pinecone-integration): a vector database to store the crawled data for semantic search
- [OpenRouter](https://apify.com/apify/openrouter): gives you access to multiple AI models through a unified OpenAI-compatible interface

### FAQ

#### Why convert URLs to markdown?

Markdown is well suited for feeding large language models (LLMs). It is lighter than HTML but still preserves text structure such as headings.

Using Markdown instead of HTML can help you lower your AI token costs.
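
To illustrate the size difference, here is a small Python sketch comparing the same content as HTML and as Markdown. Both snippets are hand-written samples, not Actor output, and character count is only a rough proxy: actual token counts depend on the model's tokenizer.

```python
# Hand-written sample: the same content expressed as HTML and as Markdown.
html = (
    '<div class="content"><h1 class="title">Pricing</h1>'
    '<p class="lead">Plans start at <strong>$1.50</strong> per 1,000 pages.</p></div>'
)
markdown = "# Pricing\n\nPlans start at **$1.50** per 1,000 pages."

# Fraction of characters saved by dropping the HTML markup.
saving = 1 - len(markdown) / len(html)
print(f"HTML: {len(html)} chars, Markdown: {len(markdown)} chars, saved: {saving:.0%}")
```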

#### Can I use URL to Markdown with the Apify API?

The Apify API gives you programmatic access to the Apify platform. The API is organized around RESTful HTTP endpoints that enable you to manage, schedule, and run Apify Actors. The API also lets you access any datasets, monitor Actor performance, fetch results, create and update versions, and more.

To access the API using Node.js, use the `apify-client` npm package. To access the API using Python, use the `apify-client` PyPI package. Check out the [Apify API reference](https://docs.apify.com/api/v2) docs for all the details.
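
If you prefer calling the REST API directly, a request could be built with the Python standard library as sketched below. The endpoint path and the `token` query parameter come from the OpenAPI specification in this document; the `build_run_request` helper is hypothetical, and the sketch only constructs the request without sending it.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ENDPOINT = "/acts/apify~url-to-markdown/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that runs the Actor synchronously."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ENDPOINT}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("<YOUR_API_TOKEN>", {"url": "https://apify.com"})
print(request.get_method(), request.full_url.split("?")[0])

# To actually run the Actor (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.loads(response.read())
```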

#### Can I use URL to Markdown through an MCP Server?

With the Apify API, you can use almost any Actor in conjunction with an MCP server. You can connect to the MCP server using clients like Claude Desktop and LibreChat, or even build your own. Read all about how you can [set up Apify Actors with MCP](https://blog.apify.com/how-to-use-mcp/).

#### Is scraping legal?

Web scraping is generally legal if you scrape publicly available non-personal data. What you do with the data is another question. Documentation, help articles, or blogs are typically protected by copyright, so you can't republish the content without the owner's permission.

Learn more about the legality of web scraping in [this blog post](https://blog.apify.com/is-web-scraping-legal/). If you're not sure, please seek professional legal advice.

# Actor input Schema

## `url` (type: `string`):

Enter the URL of a specific web page to extract its content in Markdown format.

## `scrapingTool` (type: `string`):

Select the scraping mode for extracting the target web pages.

The Raw HTTP mode (default) is the fastest, but can't handle JavaScript.

The Browser mode is more powerful and can handle JavaScript-heavy websites, but costs more.
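
One possible way to combine the two modes is to try Raw HTTP first and fall back to Browser mode only when needed. The Python sketch below assumes that an empty `markdown` field signals a page that needs JavaScript, which is a heuristic rather than documented behavior. `convert_with_fallback` and the injected `run_actor` callable are hypothetical; in practice `run_actor` would wrap a real client call, and each attempt may incur its own charge.

```python
from typing import Callable

# A callable that takes an Actor input and returns the run's dataset items.
RunActor = Callable[[dict], list[dict]]


def convert_with_fallback(url: str, run_actor: RunActor) -> dict:
    """Try Raw HTTP first; retry in Browser mode if no Markdown came back."""
    for mode in ("raw-http", "browser-playwright"):
        items = run_actor({"url": url, "scrapingTool": mode})
        if items and items[0].get("markdown", "").strip():
            return {"mode": mode, "markdown": items[0]["markdown"]}
    return {"mode": None, "markdown": ""}


# Stand-in runner for demonstration: pretends the page needs a browser.
def fake_run_actor(actor_input: dict) -> list[dict]:
    if actor_input["scrapingTool"] == "raw-http":
        return [{"markdown": ""}]
    return [{"markdown": "# Rendered page\n"}]


result = convert_with_fallback("https://example.com", fake_run_actor)
print(result["mode"])
```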

## `proxyConfiguration` (type: `object`):

Apify Proxy configuration used for scraping the target web pages.

## `removeElementsCssSelector` (type: `string`):

A CSS selector matching HTML elements that will be removed from the DOM, before converting it to text, Markdown, or saving as HTML. This is useful to skip irrelevant page content. The value must be a valid CSS selector as accepted by the `document.querySelectorAll()` function.

By default, the Actor removes common navigation elements, headers, footers, modals, scripts, and inline images. You can disable the removal by setting this value to some non-existent CSS selector like `dummy_keep_everything`.
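
A short Python sketch of customizing this value follows. It assumes that a custom selector replaces the default entirely, so the default is repeated when extending it. The `build_input` helper and the `.cookie-banner` selector are hypothetical.

```python
# Default value copied from the input schema in this document.
DEFAULT_SELECTOR = (
    "nav, footer, script, style, noscript, svg, img[src^='data:'],\n"
    '[role="alert"],\n'
    '[role="banner"],\n'
    '[role="dialog"],\n'
    '[role="alertdialog"],\n'
    '[role="region"][aria-label*="skip" i],\n'
    '[aria-modal="true"]'
)


def build_input(url: str, extra_selectors: tuple[str, ...] = (), keep_everything: bool = False) -> dict:
    """Return an Actor input with a customized removeElementsCssSelector."""
    if keep_everything:
        # A selector that matches nothing, so no elements are removed.
        selector = "dummy_keep_everything"
    else:
        # Append extra selectors to the default comma-separated selector list.
        selector = ", ".join((DEFAULT_SELECTOR, *extra_selectors))
    return {"url": url, "removeElementsCssSelector": selector}


custom = build_input("https://apify.com", extra_selectors=(".cookie-banner",))
print(custom["removeElementsCssSelector"])
```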

## `desiredConcurrency` (type: `integer`):

The desired number of web browsers running in parallel. The system automatically scales the number based on the CPU and memory usage. If the initial value is `0`, the Actor picks the number automatically based on the available memory.

## `debugMode` (type: `boolean`):

If enabled, the Actor will store debugging information into the resulting dataset under the `debug` field.

## Actor input object example

```json
{
  "url": "https://apify.com",
  "scrapingTool": "raw-http",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "removeElementsCssSelector": "nav, footer, script, style, noscript, svg, img[src^='data:'],\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]",
  "desiredConcurrency": 1,
  "debugMode": false
}
```
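
Before starting a run, an input object like the one above could be checked locally. The Python sketch below mirrors the constraints in this document's OpenAPI specification (required `url`, the `scrapingTool` enum, `desiredConcurrency` between 0 and 50). The `validate_input` helper is hypothetical, and the http(s) scheme check is an added assumption, since the schema only requires a string.

```python
from urllib.parse import urlparse

ALLOWED_SCRAPING_TOOLS = ("raw-http", "browser-playwright")


def validate_input(actor_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    url = actor_input.get("url")
    # Assumption: only http(s) URLs make sense, although the schema just says "string".
    if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
        problems.append("url must be an http(s) URL string")

    tool = actor_input.get("scrapingTool", "raw-http")
    if tool not in ALLOWED_SCRAPING_TOOLS:
        problems.append(f"scrapingTool must be one of {list(ALLOWED_SCRAPING_TOOLS)}")

    concurrency = actor_input.get("desiredConcurrency", 1)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(concurrency, bool) or not isinstance(concurrency, int) or not 0 <= concurrency <= 50:
        problems.append("desiredConcurrency must be an integer from 0 to 50")

    if not isinstance(actor_input.get("debugMode", False), bool):
        problems.append("debugMode must be a boolean")

    return problems


print(validate_input({"url": "https://apify.com", "scrapingTool": "raw-http"}))
```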

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://apify.com",
    "proxyConfiguration": {
        "useApifyProxy": true
    },
    "removeElementsCssSelector": `nav, footer, script, style, noscript, svg, img[src^='data:'],
[role="alert"],
[role="banner"],
[role="dialog"],
[role="alertdialog"],
[role="region"][aria-label*="skip" i],
[aria-modal="true"]`
};

// Run the Actor and wait for it to finish
const run = await client.actor("apify/url-to-markdown").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "url": "https://apify.com",
    "proxyConfiguration": { "useApifyProxy": True },
    # Double quotes need no escaping inside a triple-quoted string
    "removeElementsCssSelector": """nav, footer, script, style, noscript, svg, img[src^='data:'],
[role="alert"],
[role="banner"],
[role="dialog"],
[role="alertdialog"],
[role="region"][aria-label*="skip" i],
[aria-modal="true"]""",
}

# Run the Actor and wait for it to finish
run = client.actor("apify/url-to-markdown").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://apify.com",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "removeElementsCssSelector": "nav, footer, script, style, noscript, svg, img[src^='\''data:'\''],\\n[role=\\"alert\\"],\\n[role=\\"banner\\"],\\n[role=\\"dialog\\"],\\n[role=\\"alertdialog\\"],\\n[role=\\"region\\"][aria-label*=\\"skip\\" i],\\n[aria-modal=\\"true\\"]"
}' |
apify call apify/url-to-markdown --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=apify/url-to-markdown",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "URL to markdown",
        "description": "An Apify Actor that takes a URL as input and returns the content of the page in Markdown format.",
        "version": "0.1",
        "x-build-id": "uMtWqotzYeoYIgtgm"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/apify~url-to-markdown/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-apify-url-to-markdown",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/apify~url-to-markdown/runs": {
            "post": {
                "operationId": "runs-sync-apify-url-to-markdown",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/apify~url-to-markdown/run-sync": {
            "post": {
                "operationId": "run-sync-apify-url-to-markdown",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "url"
                ],
                "properties": {
                    "url": {
                        "title": "URL",
                        "type": "string",
                        "description": "Enter the URL of a specific web page to extract its content in Markdown format."
                    },
                    "scrapingTool": {
                        "title": "Scraping mode",
                        "enum": [
                            "raw-http",
                            "browser-playwright"
                        ],
                        "type": "string",
                        "description": "Select the scraping mode for extracting the target web pages.\n\nThe Raw HTTP mode (default) is the fastest, but can't handle JavaScript.\n\nThe Browser mode is more powerful, can handle JavaScript heavy websites, but costs more.",
                        "default": "raw-http"
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify Proxy configuration used for scraping the target web pages.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "removeElementsCssSelector": {
                        "title": "Remove HTML elements (CSS selector)",
                        "type": "string",
                        "description": "A CSS selector matching HTML elements that will be removed from the DOM, before converting it to text, Markdown, or saving as HTML. This is useful to skip irrelevant page content. The value must be a valid CSS selector as accepted by the `document.querySelectorAll()` function. \n\nBy default, the Actor removes common navigation elements, headers, footers, modals, scripts, and inline image. You can disable the removal by setting this value to some non-existent CSS selector like `dummy_keep_everything`.",
                        "default": "nav, footer, script, style, noscript, svg, img[src^='data:'],\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]"
                    },
                    "desiredConcurrency": {
                        "title": "Desired browsing concurrency",
                        "minimum": 0,
                        "maximum": 50,
                        "type": "integer",
                        "description": "The desired number of web browsers running in parallel. The system automatically scales the number based on the CPU and memory usage. If the initial value is `0`, the Actor picks the number automatically based on the available memory.",
                        "default": 1
                    },
                    "debugMode": {
                        "title": "Enable debug mode",
                        "type": "boolean",
                        "description": "If enabled, the Actor will store debugging information into the resulting dataset under the `debug` field.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
