# Medium Articles Scraper (`datacach/medium-scraper`) Actor

Extract Medium articles by keyword at scale — no account needed. Returns title, author, publication, clap count, reading time, tags, and direct URLs as structured JSON.

- **URL**: https://apify.com/datacach/medium-scraper.md
- **Developed by:** [DataCach](https://apify.com/datacach) (community)
- **Categories:** News, Social media, Other
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.99 / 1,000 articles

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
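For budgeting, a rough estimate is the article count multiplied by the per-article price. The sketch below assumes that each article written to the dataset is billed at the listed base price of $0.99 per 1,000 articles. It ignores Store discounts and any other events, and `estimate_cost_usd` is a hypothetical helper, so check the Actor's pricing page for the rates that actually apply to your plan.

```python
# Back-of-the-envelope cost estimate at the listed base price.
# Assumption: one billable event per article, at $0.99 per 1,000 articles;
# Store discounts and any other events are not modeled. Hypothetical helper.
from decimal import Decimal, ROUND_HALF_UP

BASE_PRICE_PER_1000 = Decimal("0.99")  # "from $0.99 / 1,000 articles"


def estimate_cost_usd(article_count: int, price_per_1000: Decimal = BASE_PRICE_PER_1000) -> Decimal:
    """Return the estimated charge in USD, rounded to the cent."""
    if article_count < 0:
        raise ValueError("article_count must be non-negative")
    raw = price_per_1000 * article_count / 1000
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 3 keywords x 100 results each is at most 300 articles.
    print(estimate_cost_usd(300))  # prints 0.30
```

Using `Decimal` rather than `float` keeps the per-thousand arithmetic exact before rounding to cents.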

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [full-text Markdown](https://docs.apify.com/llms-full.txt).


# README

## Medium Article Scraper

**Extract articles, authors, and publication data from [Medium](https://medium.com) by keyword — no account required.** Search Medium at scale and collect rich metadata for every result: title, URL, author, publication, clap count, reading time, tags, and more. Try it directly on the [Apify platform](https://apify.com/actors) without writing a single line of code.

### What does Medium Article Scraper do?

The Medium Article Scraper lets you **search Medium for articles by keyword** and collect structured data for each result. You provide one or more search terms and a result limit; the scraper handles pagination automatically and delivers each article as a clean JSON record ready to export or integrate.

It retrieves data that is **publicly visible on Medium's search page** — no login, cookies, or API key needed.

### Why use Medium Article Scraper?

Medium does not offer a public search API, making it hard to extract content data at scale. This scraper solves that. Common use cases:

- **Content research** — Find the top-performing articles on any topic to inform your editorial strategy.
- **Competitive analysis** — Monitor what your competitors or industry leaders are publishing.
- **SEO & keyword research** — Discover trending titles, subtitles, and tags for any search term.
- **Lead generation** — Identify authors and publications active in your niche.
- **Data journalism** — Analyse publishing trends, clap counts, and reading times across topics.
- **Training datasets** — Collect article metadata for NLP or recommendation models.

Running on the Apify platform gives you scheduling, monitoring, cloud storage, and integrations with 1,500+ tools (Zapier, Make, Google Sheets, etc.) out of the box.

### How to use Medium Article Scraper

1. **Open the Actor** on [Apify Console](https://console.apify.com/actors) and click **Try for free**.
2. In the **Input** tab, enter your search keywords (e.g. `python`, `machine learning`).
3. Set **Max Results per Keyword** — how many articles you want per keyword.
4. Click **Start** and wait a few seconds.
5. Go to the **Output** tab to preview results, or click **Export** to download as JSON, CSV, or Excel.

You can also run the scraper via the [Apify API](https://docs.apify.com/api/v2) or schedule it to run automatically using a cron expression.

### Input

Configure the scraper in the **Input** tab or via JSON:

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `search_keywords` | Array of strings | Keywords to search for on Medium. Each keyword is scraped independently. | `["python", "docker"]` |
| `max_results` | Integer | Maximum number of articles to collect **per keyword**. Total output ≤ keywords × max_results. | `100` |
| `start_page` | Integer | Page number to start scraping from for each keyword. Useful to skip already-collected results or resume from a specific offset. | `1` |

**Example input:**

```json
{
  "search_keywords": ["machine learning", "kubernetes", "python"],
  "max_results": 50,
  "start_page": 3
}
````

This would return up to 150 articles (50 per keyword), starting from page 3 of each keyword's search results.
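Before starting a run, you can check the input against the constraints published in the input schema (required `search_keywords`, `max_results` between 1 and 10,000, `start_page` of at least 1) and compute the worst-case result count. `build_input` and `max_output_items` are hypothetical helpers, and dropping blank keywords is a choice made here, not documented Actor behavior.

```python
# Client-side input check mirroring the published input schema:
# search_keywords is required, max_results is 1..10000 (default 100),
# start_page is >= 1 (default 1). Both helpers are hypothetical.
from typing import Any, Dict, List


def build_input(search_keywords: List[str], max_results: int = 100, start_page: int = 1) -> Dict[str, Any]:
    """Return a validated input object for datacach/medium-scraper."""
    # Assumption: blank keywords are useless, so they are dropped here.
    keywords = [k.strip() for k in search_keywords if k.strip()]
    if not keywords:
        raise ValueError("search_keywords must contain at least one non-empty keyword")
    if not 1 <= max_results <= 10_000:
        raise ValueError("max_results must be between 1 and 10000")
    if start_page < 1:
        raise ValueError("start_page must be >= 1")
    return {"search_keywords": keywords, "max_results": max_results, "start_page": start_page}


def max_output_items(run_input: Dict[str, Any]) -> int:
    """Upper bound on dataset size: keywords x max_results (fewer if Medium has fewer hits)."""
    return len(run_input["search_keywords"]) * run_input["max_results"]


if __name__ == "__main__":
    run_input = build_input(["machine learning", "kubernetes", "python"], max_results=50, start_page=3)
    print(max_output_items(run_input))  # prints 150
```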

### Output

Each article is saved as a JSON object in the Apify dataset. You can download the dataset in **JSON, CSV, Excel, or XML** from the Output tab or via the API.
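To pull results straight into another tool, the dataset can also be downloaded by URL. This sketch assumes the Apify API v2 endpoint `GET /v2/datasets/{datasetId}/items` and its `format` query parameter. `dataset_export_url` is a hypothetical helper and `abc123` is a placeholder dataset ID. Passing the token in the query string matches the OpenAPI spec in the API section, but the token may then end up in logs, so treat such URLs as secrets.

```python
# Build a download URL for a finished run's dataset in a given export format.
# Assumption: Apify API v2 GET /v2/datasets/{datasetId}/items?format=...
# dataset_export_url is a hypothetical helper.
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
FORMATS = {"json", "csv", "xlsx", "xml"}  # JSON, CSV, Excel, XML as listed above


def dataset_export_url(dataset_id: str, fmt: str = "csv", token: Optional[str] = None) -> str:
    """Return the items URL for dataset_id in the requested format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token  # only needed for datasets that are not public
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


if __name__ == "__main__":
    print(dataset_export_url("abc123", "csv"))
```

Use `run["defaultDatasetId"]` from the Python example as the dataset ID.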

**Example output record:**

```json
{
  "title": "How a Few Great Python Libraries Helped Me Build Faster and Think Bigger",
  "mediumUrl": "https://blog.stackademic.com/how-a-few-great-python-libraries-helped-me-build-faster-and-think-bigger-3e65d5d95a23",
  "visibility": "LOCKED",
  "isPublished": true,
  "isLocked": true,
  "clapCount": 34,
  "readingTime": 5.47,
  "firstPublishedAt": 1778302377518,
  "extendedPreviewContent": {
    "subtitle": "Automation Changed Everything"
  },
  "creator": {
    "name": "Muhummad Zaki",
    "username": "Muhummadzaki"
  },
  "collection": {
    "name": "Stackademic",
    "domain": "blog.stackademic.com",
    "subscriberCount": 83405
  },
  "tags": [
    { "displayTitle": "Python" },
    { "displayTitle": "Data Science" }
  ]
}
```

### Data Fields

| Field | Type | Description |
|-------|------|-------------|
| `title` | String | Article title |
| `mediumUrl` | String | Direct URL to the article |
| `uniqueSlug` | String | Unique URL-friendly identifier |
| `visibility` | String | `LOCKED` (members only) or `PUBLIC` |
| `isPublished` | Boolean | Whether the post is live |
| `isLocked` | Boolean | Whether the post requires a Medium membership |
| `isSeries` | Boolean | Whether the post belongs to a series |
| `clapCount` | Number | Total claps received |
| `readingTime` | Number | Estimated reading time in minutes |
| `firstPublishedAt` | Number | Unix timestamp (ms) of first publication |
| `latestPublishedAt` | Number | Unix timestamp (ms) of last update |
| `pinnedAt` | Number | Unix timestamp (ms) if pinned, else `0` |
| `creator` | Object | Author details: `name`, `username`, `imageId` |
| `collection` | Object | Publication details: `name`, `domain`, `description`, `subscriberCount` |
| `tags` | Array | List of tags with display titles |
| `extendedPreviewContent` | Object | Article subtitle and preview flag |
| `previewImage` | Object | Preview image metadata |
| `postResponses` | Object | Number of responses/comments |
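Nested fields such as `creator`, `collection`, and `tags`, plus the millisecond timestamps, are awkward in a spreadsheet. Here is a minimal sketch of flattening one record, using the field names from the table above. `flatten_article` is hypothetical, and treating a missing object or a `0` timestamp as empty is an assumption.

```python
# Flatten one Medium Article Scraper record into a flat row (e.g. for CSV).
# Assumptions: a missing or null object field yields None, and a 0 timestamp
# (as in `pinnedAt`) means "not set". flatten_article is a hypothetical helper.
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def ms_to_iso(ms: Optional[int]) -> Optional[str]:
    """Convert a Unix timestamp in milliseconds to an ISO 8601 UTC string."""
    if not ms:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


def flatten_article(item: Dict[str, Any]) -> Dict[str, Any]:
    creator = item.get("creator") or {}
    collection = item.get("collection") or {}
    preview = item.get("extendedPreviewContent") or {}
    tags = item.get("tags") or []
    return {
        "title": item.get("title"),
        "url": item.get("mediumUrl"),
        "author": creator.get("name"),
        "username": creator.get("username"),
        "publication": collection.get("name"),
        "subtitle": preview.get("subtitle"),
        "claps": item.get("clapCount", 0),
        "reading_minutes": round(item.get("readingTime") or 0, 1),
        "published_at": ms_to_iso(item.get("firstPublishedAt")),
        "tags": ", ".join(t["displayTitle"] for t in tags if t.get("displayTitle")),
        "locked": bool(item.get("isLocked")),
    }
```

Applied to the example output record, this yields `"tags": "Python, Data Science"` and a `published_at` in UTC; rows like these can be written with `csv.DictWriter`.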

### Tips and Advanced Options

**Large keyword lists**
For broad research campaigns, prefer fewer keywords with a higher `max_results` value over many keywords with a low limit — you'll get more consistent, deeper coverage per topic.

**Scheduling recurring runs**
Use Apify's built-in scheduler to run the scraper daily or weekly and monitor trending articles in your niche over time.
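When a schedule repeatedly runs the same keywords, the useful signal is usually what changed between runs. This sketch takes the items of two runs (for example, fetched with `client.dataset(run["defaultDatasetId"]).iterate_items()` as in the Python example) and returns the articles not seen before, most-clapped first. Using `mediumUrl` as the identity key is an assumption (`uniqueSlug` would work equally well), and `new_articles` is a hypothetical helper.

```python
# Keep only articles from the current run that did not appear in the previous run.
# Assumption: `mediumUrl` identifies an article. new_articles is hypothetical.
from typing import Any, Dict, Iterable, List


def new_articles(previous: Iterable[Dict[str, Any]], current: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    seen = {item["mediumUrl"] for item in previous if item.get("mediumUrl")}
    fresh = []
    for item in current:
        url = item.get("mediumUrl")
        if url and url not in seen:
            seen.add(url)  # also drops duplicates within the current run
            fresh.append(item)
    # Most-clapped first, so trending posts surface at the top.
    return sorted(fresh, key=lambda i: i.get("clapCount", 0), reverse=True)
```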

**Exporting to Google Sheets / Zapier / Make**
Connect the dataset directly from the Apify Console **Integrations** tab — no code required.

**Crash recovery**
If a run is interrupted, restarting it automatically resumes from where it left off rather than starting over, saving both time and compute costs.

### FAQ, Disclaimers, and Support

**Is scraping Medium legal?**
This scraper only accesses data that is **publicly visible** on Medium's search results — the same data any user sees without logging in. It does not bypass paywalls, access member-only content, or collect personal data beyond what Medium displays publicly. Always comply with [Medium's Terms of Service](https://policy.medium.com/medium-terms-of-service-9db0094a1e0f) and applicable data protection laws in your jurisdiction.

**Why are some articles marked `isLocked: true`?**
Medium marks paywalled articles (those that require a membership to read in full) with `isLocked: true` and `visibility: "LOCKED"`. The scraper collects their metadata but does not access the full article body.

**The scraper returned fewer results than expected.**
Medium's search index does not always return the maximum number of results for every query. Niche keywords may have fewer matching articles in total. Try broader keywords or verify the result count by searching manually on Medium.

**Can I scrape a specific publication or author?**
Not directly — the current version searches by keyword only. Reach out if you need filtering by publication, author, or date range.
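Until such filters exist, one workaround is to search by a broad keyword and filter the results on your side. In this sketch, `filter_articles` is a hypothetical helper (not an Actor input option) that matches on `collection.domain`, `creator.username`, `firstPublishedAt`, and `isLocked`. It can only narrow down what the keyword search already returned.

```python
# Client-side post-filtering of keyword results by publication, author, date, or lock status.
# filter_articles is a hypothetical helper, not an option of the Actor.
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional


def filter_articles(
    items: Iterable[Dict[str, Any]],
    publication_domain: Optional[str] = None,
    author_username: Optional[str] = None,
    published_after: Optional[datetime] = None,  # pass a timezone-aware datetime
    include_locked: bool = True,
) -> List[Dict[str, Any]]:
    # firstPublishedAt is in milliseconds, so convert the cutoff the same way.
    after_ms = int(published_after.timestamp() * 1000) if published_after else None
    result = []
    for item in items:
        domain = (item.get("collection") or {}).get("domain")
        if publication_domain and domain != publication_domain:
            continue
        username = (item.get("creator") or {}).get("username") or ""
        if author_username and username.lower() != author_username.lower():
            continue
        if after_ms is not None and (item.get("firstPublishedAt") or 0) < after_ms:
            continue
        if not include_locked and item.get("isLocked"):
            continue
        result.append(item)
    return result
```

For example, `filter_articles(items, publication_domain="blog.stackademic.com", published_after=datetime(2025, 1, 1, tzinfo=timezone.utc))` keeps Stackademic posts first published in 2025 or later.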

**Found a bug or need a custom feature?**
Open an issue in the [Issues tab](https://console.apify.com/actors) on Apify, or reach out for a custom scraping solution.

# Actor input Schema

## `search_keywords` (type: `array`):

One or more keywords to search for on Medium. Each keyword is scraped independently up to the Max Results limit.

## `max_results` (type: `integer`):

Maximum number of articles to collect per keyword. Total output will be at most (keywords × max\_results) items.

## `start_page` (type: `integer`):

Page number to start scraping from for each keyword. Use this to resume from a specific offset or skip already-collected results. Defaults to 1 (the beginning).

## Actor input object example

```json
{
  "search_keywords": [
    "python",
    "docker"
  ],
  "max_results": 100,
  "start_page": 1
}
```

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "search_keywords": [
        "python",
        "docker"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("datacach/medium-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "search_keywords": [
        "python",
        "docker",
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("datacach/medium-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "search_keywords": [
    "python",
    "docker"
  ]
}' |
apify call datacach/medium-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=datacach/medium-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Medium Articles Scraper",
        "description": "Extract Medium articles by keyword at scale — no account needed. Returns title, author, publication, clap count, reading time, tags, and direct URLs as structured JSON.",
        "version": "0.0",
        "x-build-id": "RZw720tDq1Y7hqe0Y"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/datacach~medium-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-datacach-medium-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/datacach~medium-scraper/runs": {
            "post": {
                "operationId": "runs-sync-datacach-medium-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/datacach~medium-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-datacach-medium-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "search_keywords"
                ],
                "properties": {
                    "search_keywords": {
                        "title": "Search Keywords",
                        "type": "array",
                        "description": "One or more keywords to search for on Medium. Each keyword is scraped independently up to the Max Results limit.",
                        "default": [
                            "python",
                            "docker"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "max_results": {
                        "title": "Max Results per Keyword",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum number of articles to collect per keyword. Total output will be at most (keywords × max_results) items.",
                        "default": 100
                    },
                    "start_page": {
                        "title": "Start Page",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Page number to start scraping from for each keyword. Use this to resume from a specific offset or skip already-collected results. Defaults to 1 (the beginning).",
                        "default": 1
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
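For projects without an Apify client, the `run-sync-get-dataset-items` path in the spec above can be called directly. This is a standard-library sketch: `build_run_request` and `run_and_get_items` are hypothetical helper names, and the token is sent as the `token` query parameter that the spec declares. The synchronous endpoints suit short runs. For large keyword lists, starting a run via the `/runs` path and fetching the dataset afterwards is likely more robust.

```python
# POST the Actor input to run-sync-get-dataset-items and decode the returned items.
# Paths and the `token` query parameter come from the OpenAPI spec; helper names are hypothetical.
import json
import urllib.request
from typing import Any, Dict, List
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "datacach~medium-scraper"  # "username~actor-name" form used in API paths


def build_run_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{urlencode({'token': token})}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token: str, run_input: Dict[str, Any], timeout_s: float = 300) -> List[Dict[str, Any]]:
    """Start the Actor, block until it finishes, and return the dataset items."""
    with urllib.request.urlopen(build_run_request(token, run_input), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call would look like `items = run_and_get_items("<YOUR_API_TOKEN>", {"search_keywords": ["python"], "max_results": 20})`.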
