Medium Article Scraper
Pricing: from $30.00 / 1,000 results
Extract complete data from Medium articles including title, author, content, subtitle, publish date, reading time, and response count. Reliable scraping with automatic retries and proxy rotation. Export as JSON or CSV.
Developer: Sunday Victor
A powerful Apify Actor that scrapes article content, metadata, and engagement metrics from Medium articles using Playwright and BeautifulSoup.
Features
- Fast and reliable scraping using Playwright
- Extracts comprehensive article data including title, author, content, and metadata
- Automatic retry mechanism for failed requests
- Exports data in both JSON and CSV formats
- Built-in error handling and validation
- Uses residential proxies to avoid blocking
Extracted Data
The scraper extracts the following fields from each Medium article:
| Field | Type | Description |
|---|---|---|
| title | String | The main headline of the article |
| subtitle | String | The article's subtitle or description |
| url | String | The full URL of the article |
| author | String | The author's name |
| date | String | Publication date of the article |
| read_time | String | Estimated reading time (e.g., "5 min read") |
| response_count | Integer | Number of responses/comments |
| content | String | Full text content of the article |
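Because `read_time` is display text such as "5 min read" rather than a number, it usually needs converting before you aggregate reading times. A minimal Python sketch, assuming every value follows the "<N> min read" pattern shown in the table; `parse_read_time` and `total_reading_minutes` are hypothetical helpers, not part of the Actor:

```python
import re
from typing import Optional

# Assumption: read_time values look like "<N> min read", e.g. "5 min read".
_READ_TIME_PATTERN = re.compile(r"(\d+)\s*min read")


def parse_read_time(read_time: Optional[str]) -> Optional[int]:
    """Convert a read_time string such as "5 min read" to whole minutes.

    Returns None when the value is missing or does not match the assumed format.
    """
    if not read_time:
        return None
    match = _READ_TIME_PATTERN.search(read_time)
    return int(match.group(1)) if match else None


def total_reading_minutes(items: list[dict]) -> int:
    """Sum the reading time of every record whose read_time could be parsed."""
    minutes = (parse_read_time(item.get("read_time")) for item in items)
    return sum(m for m in minutes if m is not None)
```

Returning `None` instead of raising keeps one malformed record from breaking a whole batch; unparsed records are simply skipped by `total_reading_minutes`.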
Input Configuration
Input Schema
```json
{
  "StartUrl": [
    "https://medium.com/@author/article-title-123",
    "https://towardsdatascience.com/another-article-456"
  ]
}
```
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| StartUrl | Array | Yes | - | List of Medium article URLs to scrape |
Example Input
```json
{
  "StartUrl": [
    "https://medium.com/@barackobama/here-are-my-favorite-books-movies-and-music-of-2025-7139a0bdaf5b",
    "https://towardsdatascience.com/machine-learning-explained-2024"
  ]
}
```
Output
JSON Output
The Actor stores results in the run's default Apify dataset, which you can access by:
- Opening it in the Apify Console
- Fetching it through the Apify API
- Downloading it as JSON, CSV, Excel, or other formats
Example output:
```json
[
  {
    "title": "Here Are My Favorite Books, Movies, and Music of 2025",
    "subtitle": "An annual tradition of sharing recommendations",
    "url": "https://medium.com/@barackobama/here-are-my-favorite-books-movies-and-music-of-2025-7139a0bdaf5b",
    "author": "Barack Obama",
    "date": "Dec 19, 2024",
    "read_time": "5 min read",
    "response_count": 142,
    "content": "Every year, I share a list of my favorite books..."
  }
]
```
CSV Export
The Actor also creates a CSV file (articles.csv) stored in the key-value store, which includes all scraped fields in a spreadsheet-friendly format.
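CSV stores every value as text, so numeric fields need converting after you download articles.csv. A minimal sketch using Python's standard `csv` module, assuming the file has a header row with the field names from the Extracted Data table; `load_articles_csv` is a hypothetical helper:

```python
import csv
import io


def load_articles_csv(csv_text: str) -> list[dict]:
    """Parse articles.csv content into dicts, restoring response_count as an int."""
    rows = []
    # DictReader handles quoted fields, including commas and newlines inside content.
    for row in csv.DictReader(io.StringIO(csv_text)):
        count = row.get("response_count")
        # Empty or missing counts become None instead of raising ValueError.
        row["response_count"] = int(count) if count not in (None, "") else None
        rows.append(row)
    return rows
```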
Usage
Running on Apify Platform
- Go to the Actor's page on Apify
- Click "Try for Free" or "Start"
- Enter your Medium article URLs in the input
- Click "Start" to run the Actor
- Download results in your preferred format (JSON, CSV, Excel, etc.)
Using Apify API
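```javascript
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const input = {
    StartUrl: ['https://medium.com/@author/article-title'],
};

// CommonJS has no top-level await, so wrap the calls in an async function.
(async () => {
    // Start the Actor and wait for the run to finish
    const run = await client.actor('YOUR_ACTOR_ID').call(input);
    // Read the scraped articles from the run's default dataset
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
})();
```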
Using Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {"StartUrl": ["https://medium.com/@author/article-title"]}

# Start the Actor and wait for the run to finish
run = client.actor("YOUR_ACTOR_ID").call(run_input=run_input)

# Iterate over the scraped articles in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
Technical Details
Technologies Used
- Crawlee: Modern web scraping and browser automation library
- Playwright: Headless browser automation
- BeautifulSoup: HTML parsing and data extraction
- Apify SDK: Cloud scraping platform integration
Proxy Configuration
The Actor uses residential proxies by default to avoid IP blocking and ensure reliable scraping. This helps maintain high success rates when accessing Medium articles.
Performance
- Concurrency: Handles multiple articles simultaneously
- Retries: Automatically retries failed requests up to 3 times
- Timeout Handling: Gracefully handles slow-loading pages
- Average Runtime: ~3-5 seconds per article (depending on article length and network conditions)
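The retry behavior listed above (one initial attempt plus up to three retries) can be illustrated with a generic sketch. This is not the Actor's own code, which relies on Crawlee for retries; `fetch_with_retries`, the exponential backoff delays, and treating every exception as retryable are assumptions for illustration:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fetch(); on failure, retry up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):  # 1 initial attempt + max_retries retries
        try:
            return fetch()
        except Exception:  # assumption: every error is worth retrying
            if attempt == max_retries:
                raise  # out of retries: propagate the last error
            time.sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ...
    raise RuntimeError("max_retries must be non-negative")  # only reached if max_retries < 0
```

Backing off between attempts gives a slow page or a rate-limited proxy time to recover instead of failing three times in quick succession.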
Error Handling
The Actor includes comprehensive error handling:
- URL Validation: Ensures all input URLs are valid HTTP/HTTPS links
- Timeout Management: Handles slow-loading pages gracefully
- Extraction Failures: Records partial data even if some fields fail to extract
- Failed Requests: Logs errors and continues with remaining URLs
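You can run a similar URL check on your side before starting a run, so malformed entries never reach the Actor. A minimal sketch using Python's `urllib.parse`; `validate_start_urls` is a hypothetical helper that checks only for an HTTP/HTTPS scheme and a host, not whether the URL is a direct article page:

```python
from urllib.parse import urlparse


def validate_start_urls(urls: list[str]) -> tuple[list[str], list[str]]:
    """Split urls into (valid, invalid), keeping only absolute http(s) URLs with a host."""
    valid: list[str] = []
    invalid: list[str] = []
    for url in urls:
        parsed = urlparse(url.strip())
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(url.strip())
        else:
            invalid.append(url)
    return valid, invalid
```

The valid list can then be passed as the input, e.g. `{"StartUrl": valid}`.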
Failed Request Output
If an article fails to scrape, the output includes:
```json
{
  "url": "https://medium.com/@author/failed-article",
  "error": "Timeout 30000ms exceeded",
  "status": "failed"
}
```
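Since failed URLs are recorded alongside successful articles, it helps to separate them before analysis and to collect them for a re-run. A minimal sketch, assuming failure records carry `"status": "failed"` as in the example above and successful records do not; `split_results` and `build_retry_input` are hypothetical helpers:

```python
def split_results(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate dataset items into (articles, failures) using the status field."""
    articles = [item for item in items if item.get("status") != "failed"]
    failures = [item for item in items if item.get("status") == "failed"]
    return articles, failures


def build_retry_input(failures: list[dict]) -> dict:
    """Build an Actor input that re-scrapes only the URLs that failed."""
    return {"StartUrl": [item["url"] for item in failures]}
```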
Limitations
- Medium Paywall: Cannot extract the full content of member-only (paywalled) articles, since the input accepts no Medium account credentials
- Dynamic Content: Some interactive elements or embedded content may not be fully captured
- Rate Limiting: Medium may rate-limit requests; the Actor uses proxies to mitigate this
- URL Format: Only supports direct Medium article URLs (not Medium publication homepages or author profiles)
Use Cases
- Content Analysis: Analyze trends, topics, and writing styles across multiple articles
- Research: Collect articles for academic or market research
- Content Curation: Build curated lists of articles on specific topics
- SEO Analysis: Study successful Medium articles for content strategy
- Archiving: Create backups of important articles
- Data Science: Build datasets for NLP, sentiment analysis, or ML projects
Cost Estimation
The Actor's cost depends on:
- Number of articles scraped
- Article complexity and load time
- Compute unit pricing on Apify
Average compute cost: ~$0.10-$0.15 per 100 articles
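At the listed price of $30.00 per 1,000 results, the per-result charge is $0.03, so 100 articles come to $3.00 in result charges. A small sketch of that arithmetic, assuming one result per scraped article; `estimate_result_charge` is a hypothetical helper, and it does not model the compute-unit costs discussed above:

```python
from decimal import Decimal

# Listed store price; assumption: one result is charged per scraped article.
PRICE_PER_1000_RESULTS = Decimal("30.00")


def estimate_result_charge(
    num_results: int, price_per_1000: Decimal = PRICE_PER_1000_RESULTS
) -> Decimal:
    """Estimate the per-result charge in USD, rounded to cents."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    return (price_per_1000 * num_results / 1000).quantize(Decimal("0.01"))
```

`Decimal` avoids the binary rounding artifacts that `float` can introduce in money calculations.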
Changelog
Version 1.0 (Current)
- Initial release
- Basic article scraping functionality
- Support for title, author, content, and metadata extraction
- CSV and JSON export
- Residential proxy support
Support
For issues, feature requests, or questions:
- Open an issue on GitHub
- Contact via Apify Console
- Email: sunvictor567@gmail.com
License
This Actor is provided as-is for use on the Apify platform.
Disclaimer
This Actor is intended for legitimate use cases such as research, analysis, and personal archiving. Users are responsible for complying with Medium's Terms of Service and applicable laws. Always respect robots.txt and rate limits. Do not use this tool for unauthorized scraping or commercial purposes that violate Medium's policies.
Made by sunvic using Crawlee and Apify