# Medium Article Scraper (`junipr/medium-scraper`) Actor

Scrape Medium articles by tag, author, or publication. Extracts article titles, content, authors, publication dates, tags, and metadata via Medium's RSS feeds. Supports multiple content formats (text, HTML, markdown), date filtering, and batch processing across multiple sources.

- **URL**: https://apify.com/junipr/medium-scraper.md
- **Developed by:** [junipr](https://apify.com/junipr) (community)
- **Categories:** Developer tools, News
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$3.90 / 1,000 articles scraped

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Medium Scraper

Extract articles from Medium by tag, author, publication, or search query. Get full content, claps, read time, author data, and engagement metrics. Output as **text, HTML, or markdown** — perfect for content research and AI training data.

### What Can You Extract?

| Field | Description |
|-------|-------------|
| `title` / `subtitle` | Article title and subtitle |
| `content` | Full article text, HTML, or markdown |
| `author` | Name, username, bio, follower count |
| `publication` | Publication name and URL |
| `claps` | Number of claps (engagement metric) |
| `responses` | Number of responses/comments |
| `readTime` | Estimated reading time (minutes) |
| `tags` | Topic tags |
| `isMemberOnly` | Whether article is behind paywall |
| `publishedDate` | Publication date (ISO 8601) |
| `featuredImage` | Cover image URL |
| `wordCount` | Approximate word count |
| `language` | Detected language |

### How to Use

**Scrape by tag (zero-config):**
```json
{}
````

Extracts 50 articles tagged "web-scraping" by default.

**Scrape specific articles:**

```json
{
  "articleUrls": [
    "https://medium.com/@user/article-title-abc123",
    "https://betterprogramming.pub/some-article-def456"
  ]
}
```

**Scrape by author:**

```json
{
  "authorUrls": ["https://medium.com/@towardsdatascience"],
  "maxArticlesPerSource": 100,
  "contentFormat": "markdown"
}
```

**Scrape by publication with filters:**

```json
{
  "publicationUrls": ["https://betterprogramming.pub"],
  "minClaps": 500,
  "dateFrom": "2025-01-01",
  "memberOnly": "free"
}
```

### Input Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `articleUrls` | array | `[]` | Direct Medium article URLs |
| `tags` | array | `["web-scraping"]` | Medium topic tags |
| `authorUrls` | array | `[]` | Author profile URLs |
| `publicationUrls` | array | `[]` | Publication homepage URLs |
| `searchQueries` | array | `[]` | Search queries |
| `maxArticlesPerSource` | integer | `50` | Max articles per tag/author/publication |
| `includeContent` | boolean | `true` | Extract full article content |
| `contentFormat` | string | `"text"` | `text`, `html`, or `markdown` |
| `dateFrom` | string | — | Filter: only articles after this date |
| `dateTo` | string | — | Filter: only articles before this date |
| `memberOnly` | string | `"all"` | `all`, `free`, or `member_only` |
| `minClaps` | integer | `0` | Minimum clap count filter |
| `sortBy` | string | `"relevance"` | `relevance`, `latest`, or `most_clapped` |
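
The enum values in this table, and the numeric bounds given in the input schema later in this document (`maxArticlesPerSource` between 1 and 1000, `minClaps` at least 0), can be checked locally before starting a paid run. The following is a minimal sketch of such a check; `build_input` is a hypothetical helper written for illustration and is not part of the Actor or the Apify client.

```python
# Hypothetical helper: validates a few documented input constraints locally.
CONTENT_FORMATS = {"text", "html", "markdown"}
MEMBER_ONLY = {"all", "free", "member_only"}
SORT_BY = {"relevance", "latest", "most_clapped"}


def build_input(**options):
    """Return an input dict after checking the documented enums and bounds."""
    checks = {
        "contentFormat": CONTENT_FORMATS,
        "memberOnly": MEMBER_ONLY,
        "sortBy": SORT_BY,
    }
    for key, allowed in checks.items():
        if key in options and options[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")

    max_articles = options.get("maxArticlesPerSource")
    if max_articles is not None and not 1 <= max_articles <= 1000:
        raise ValueError("maxArticlesPerSource must be between 1 and 1000")

    if options.get("minClaps", 0) < 0:
        raise ValueError("minClaps must be 0 or greater")

    return dict(options)
```

For example, `build_input(tags=["python"], contentFormat="markdown", maxArticlesPerSource=100)` returns a dict that can be passed as the run input, while `build_input(contentFormat="pdf")` raises `ValueError`.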

### Output Example

```json
{
  "url": "https://medium.com/@user/the-complete-guide-abc123",
  "articleId": "abc123",
  "title": "The Complete Guide to Web Scraping in 2026",
  "subtitle": "Everything you need to know",
  "author": {
    "name": "John Smith",
    "username": "johnsmith",
    "url": "https://medium.com/@johnsmith"
  },
  "content": "Full article text content...",
  "readTime": 12,
  "claps": 4500,
  "responses": 23,
  "tags": ["web-scraping", "python", "automation"],
  "isMemberOnly": false,
  "publishedDate": "2026-02-15T10:30:00.000Z",
  "wordCount": 3200,
  "featuredImage": "https://miro.medium.com/...",
  "scrapedAt": "2026-03-11T12:00:00.000Z"
}
```
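
Records shaped like the example above contain a nested `author` object and a `tags` list, which do not map directly onto spreadsheet columns. The sketch below shows one possible way to flatten them into CSV text using only the standard library. The function name `to_csv`, the `author_name` column, and the choice of columns are assumptions made for illustration.

```python
import csv
import io


def to_csv(items, columns=("url", "title", "author_name", "claps", "readTime", "tags")):
    """Flatten article records (shaped like the example above) into CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in the chosen columns.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        # Lift the nested author name into a flat column; tolerate a missing author.
        row["author_name"] = (item.get("author") or {}).get("name", "")
        # Join the tag list into a single cell.
        row["tags"] = ";".join(item.get("tags") or [])
        writer.writerow(row)
    return buffer.getvalue()
```

Fields missing from a record are written as empty cells, so partial records do not break the export.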

### Edge Cases

- **Member-only articles:** Metadata (title, claps, tags) extracted; content availability depends on server-side rendering
- **Custom domain publications:** Detected as Medium via meta tags, extracted normally
- **Rate limiting (429):** Automatic retry with 2s → 5s → 10s backoff, proxy rotation
- **Deleted articles:** `error: "ARTICLE_NOT_FOUND"` returned, no PPE charge
- **Non-English articles:** Content extracted with Unicode support, language detected
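
The rate-limiting item above describes a fixed retry schedule of 2, 5, and 10 seconds. The following sketch illustrates that schedule in isolation. It is not the Actor's actual code: `RateLimited` and `fetch_with_backoff` are hypothetical names, and proxy rotation is left out.

```python
import time

BACKOFF_SECONDS = (2, 5, 10)  # schedule quoted in the list above


class RateLimited(Exception):
    """Hypothetical signal that a request was answered with HTTP 429."""


def fetch_with_backoff(fetch, sleep=time.sleep):
    """Call fetch(); on RateLimited, wait 2s, 5s, then 10s before retrying."""
    for delay in BACKOFF_SECONDS:
        try:
            return fetch()
        except RateLimited:
            sleep(delay)
    # Final attempt after the last wait; a RateLimited here propagates to the caller.
    return fetch()
```

The `sleep` parameter is injectable so the schedule can be exercised in tests without actually waiting.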

### Pricing

**$3.90 per 1,000 articles scraped** (pay per event, PPE: you are charged only for articles successfully extracted)

Pricing includes all platform compute costs — no hidden fees.

| Run Type | Articles | Cost |
|----------|----------|------|
| Tag research | 50 | $0.20 |
| Author archive | 200 | $0.78 |
| Publication scrape | 1,000 | $3.90 |
| AI training dataset | 50,000 | $195.00 |
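
The figures in the table follow from the per-1,000 price. As a rough sketch, the estimate can be reproduced as shown below. Rounding to whole cents with half-up rounding is an assumption chosen to match the $0.20 shown for 50 articles, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_1000 = Decimal("3.90")  # USD, from the pricing line above


def estimate_cost(articles):
    """Return the estimated USD cost for a number of successfully scraped articles."""
    cost = PRICE_PER_1000 * Decimal(articles) / Decimal(1000)
    # Assumption: round to whole cents, half-up, as the table appears to do.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

`Decimal` is used instead of `float` so that values such as 0.195 round predictably.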

### FAQ

#### Can I scrape member-only (paywalled) articles?

Metadata (title, author, claps, tags) is extractable for all articles. Full content extraction for member-only articles depends on what Medium renders without authentication.

#### What content formats are supported?

Plain text (cleaned, readable), HTML (raw article HTML), and markdown (converted from HTML structure).

#### Can I filter articles by date or popularity?

Yes: `dateFrom`/`dateTo` for date range, `minClaps` for engagement threshold.

#### Does it work with custom domain publications?

Yes. Publications on custom domains (e.g., betterprogramming.pub) are detected as Medium via meta tags and extracted normally.

#### Can I use this for AI training data?

Yes. Text or markdown output is directly usable for fine-tuning and RAG pipelines. Respect author copyright: use the data for analysis only, not for republication.

***

Legal notice: Medium's ToS prohibits automated access, but article content is publicly available and indexed by search engines. Use responsibly. Do not circumvent authentication or republish scraped content.

Related Actors by junipr: [Spotify Playlist Scraper](https://apify.com/junipr/spotify-playlist) | [AI Content Detector](https://apify.com/junipr/ai-content-detector)

# Actor input Schema

## `articleUrls` (type: `array`):

Direct Medium article URLs to scrape.

## `tags` (type: `array`):

Medium tags to scrape articles from (e.g. 'web-scraping', 'python', 'startup').

## `authorUrls` (type: `array`):

Medium author profile URLs (e.g. https://medium.com/@username).

## `publicationUrls` (type: `array`):

Medium publication URLs (e.g. https://betterprogramming.pub).

## `searchQueries` (type: `array`):

Search Medium for articles matching these queries.

## `maxArticlesPerSource` (type: `integer`):

Maximum articles to extract per tag, author, or publication. Min: 1, Max: 1000.

## `includeContent` (type: `boolean`):

Extract full article content.

## `contentFormat` (type: `string`):

Content output format: text, html, or markdown.

## `dateFrom` (type: `string`):

Only articles published after this date (ISO 8601, e.g. 2024-01-01).

## `dateTo` (type: `string`):

Only articles published before this date (ISO 8601).

## `memberOnly` (type: `string`):

Filter by member-only status: all, free, or member\_only.

## `minClaps` (type: `integer`):

Only return articles with at least this many claps.

## `sortBy` (type: `string`):

Sort order for tag/search results.

## `proxyConfiguration` (type: `object`):

Proxy settings for requests.

## Actor input object example

```json
{
  "articleUrls": [],
  "tags": [
    "web-scraping"
  ],
  "authorUrls": [],
  "publicationUrls": [],
  "searchQueries": [],
  "maxArticlesPerSource": 1,
  "includeContent": false,
  "contentFormat": "text",
  "memberOnly": "all",
  "minClaps": 0,
  "sortBy": "relevance",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `results` (type: `string`):

Article title, subtitle, full content (text/HTML/markdown), author profile, claps, responses, read time, tags, publication, member-only status, and publication date.

# API

You can run this Actor programmatically using the Apify API. Below are code examples for JavaScript, Python, and the CLI, as well as the MCP server setup and the OpenAPI specification.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("junipr/medium-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("junipr/medium-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call junipr/medium-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=junipr/medium-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Medium Article Scraper",
        "description": "Scrape Medium articles by tag, author, or publication. Extracts article titles, content, authors, publication dates, tags, and metadata via Medium's RSS feeds. Supports multiple content formats (text, HTML, markdown), date filtering, and batch processing across multiple sources.",
        "version": "1.0",
        "x-build-id": "DZHEzt7jhTDNW7DGk"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/junipr~medium-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-junipr-medium-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/junipr~medium-scraper/runs": {
            "post": {
                "operationId": "runs-sync-junipr-medium-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/junipr~medium-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-junipr-medium-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "articleUrls": {
                        "title": "Article URLs",
                        "type": "array",
                        "description": "Direct Medium article URLs to scrape.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "tags": {
                        "title": "Tags",
                        "type": "array",
                        "description": "Medium tags to scrape articles from (e.g. 'web-scraping', 'python', 'startup').",
                        "default": [
                            "web-scraping"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "authorUrls": {
                        "title": "Author URLs",
                        "type": "array",
                        "description": "Medium author profile URLs (e.g. https://medium.com/@username).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "publicationUrls": {
                        "title": "Publication URLs",
                        "type": "array",
                        "description": "Medium publication URLs (e.g. https://betterprogramming.pub).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchQueries": {
                        "title": "Search Queries",
                        "type": "array",
                        "description": "Search Medium for articles matching these queries.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxArticlesPerSource": {
                        "title": "Max Articles Per Source",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum articles to extract per tag, author, or publication. Min: 1, Max: 1000.",
                        "default": 1
                    },
                    "includeContent": {
                        "title": "Include Content",
                        "type": "boolean",
                        "description": "Extract full article content.",
                        "default": false
                    },
                    "contentFormat": {
                        "title": "Content Format",
                        "enum": [
                            "text",
                            "html",
                            "markdown"
                        ],
                        "type": "string",
                        "description": "Content output format: text, html, or markdown.",
                        "default": "text"
                    },
                    "dateFrom": {
                        "title": "Date From",
                        "type": "string",
                        "description": "Only articles published after this date (ISO 8601, e.g. 2024-01-01)."
                    },
                    "dateTo": {
                        "title": "Date To",
                        "type": "string",
                        "description": "Only articles published before this date (ISO 8601)."
                    },
                    "memberOnly": {
                        "title": "Member Only Filter",
                        "enum": [
                            "all",
                            "free",
                            "member_only"
                        ],
                        "type": "string",
                        "description": "Filter by member-only status: all, free, or member_only.",
                        "default": "all"
                    },
                    "minClaps": {
                        "title": "Minimum Claps",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only return articles with at least this many claps.",
                        "default": 0
                    },
                    "sortBy": {
                        "title": "Sort By",
                        "enum": [
                            "relevance",
                            "latest",
                            "most_clapped"
                        ],
                        "type": "string",
                        "description": "Sort order for tag/search results.",
                        "default": "relevance"
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Proxy settings for requests.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
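
For projects that cannot use a client library, the `run-sync-get-dataset-items` path in the specification above can be called directly. The sketch below builds that POST request with the Python standard library. It assumes the response body is JSON; the 300-second timeout and the function names `build_request` and `run_and_fetch` are illustrative choices, not values taken from the specification.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.apify.com/v2"
PATH = "/acts/junipr~medium-scraper/run-sync-get-dataset-items"


def build_request(token, run_input):
    """Build (but do not send) the POST request described by the spec above."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{BASE_URL}{PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token, run_input, timeout=300):
    """Send the request and decode the response body, assumed to be JSON."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the URL and payload easy to inspect before any billable run is started.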
