Medium Article Scraper

Pricing

from $30.00 / 1,000 results

Extract complete data from Medium articles including title, author, content, subtitle, publish date, reading time, and response count. Reliable scraping with automatic retries and proxy rotation. Export as JSON or CSV.

Developer

Sunday Victor

Maintained by Community

A powerful Apify Actor that scrapes article content, metadata, and engagement metrics from Medium articles using Playwright and BeautifulSoup.

Features

  • Fast and reliable scraping using Playwright with residential proxies
  • Extracts comprehensive article data including title, author, content, and metadata
  • Automatic retry mechanism for failed requests
  • Exports data in both JSON and CSV formats
  • Built-in error handling and validation
  • Uses residential proxies to avoid blocking

Extracted Data

The scraper extracts the following fields from each Medium article:

Field           Type     Description
title           String   The main headline of the article
subtitle        String   The article's subtitle or description
url             String   The full URL of the article
author          String   The author's name
date            String   Publication date of the article
read_time       String   Estimated reading time (e.g., "5 min read")
response_count  Integer  Number of responses/comments
content         String   Full text content of the article
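If you consume the dataset from Python, a typed record makes missing or mistyped fields visible early. The sketch below mirrors the table above; the class name Article, the helpers from_item and read_minutes, and the choice to turn missing fields into empty values are hypothetical conveniences, not part of the Actor.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Article:
    """One dataset item; field names follow the Extracted Data table."""

    title: str
    subtitle: str
    url: str
    author: str
    date: str
    read_time: str       # e.g. "5 min read"
    response_count: int  # number of responses/comments
    content: str

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> Article:
        # Assumption: a missing or null field becomes "" (or 0 for response_count)
        values: dict[str, Any] = {}
        for f in fields(cls):
            raw = item.get(f.name)
            if f.name == "response_count":
                values[f.name] = int(raw or 0)
            else:
                values[f.name] = "" if raw is None else str(raw)
        return cls(**values)

    def read_minutes(self) -> int | None:
        # Take the leading number from strings like "5 min read"
        head = self.read_time.split(" ", 1)[0]
        return int(head) if head.isdigit() else None
```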

Input Configuration

Input Schema

{
  "StartUrl": [
    "https://medium.com/@author/article-title-123",
    "https://towardsdatascience.com/another-article-456"
  ]
}

Input Parameters

Parameter  Type   Required  Default  Description
StartUrl   Array  Yes       -        List of Medium article URLs to scrape

Example Input

{
  "StartUrl": [
    "https://medium.com/@barackobama/here-are-my-favorite-books-movies-and-music-of-2025-7139a0bdaf5b",
    "https://towardsdatascience.com/machine-learning-explained-2024"
  ]
}
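When the URL list comes from a spreadsheet or another tool, it can help to clean it before starting a run. This is a minimal sketch, assuming you build the input in Python: the helper name build_input is hypothetical, and it applies the same HTTP/HTTPS rule described under Error Handling, plus de-duplication that the Actor itself is not documented to perform.

```python
from __future__ import annotations

from urllib.parse import urlparse


def build_input(urls: list[str]) -> dict[str, list[str]]:
    """Build the Actor input, keeping only well-formed, unique HTTP(S) URLs."""
    cleaned: list[str] = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Keep only http/https links that have a host
        if parsed.scheme in ("http", "https") and parsed.netloc:
            if url not in cleaned:  # drop duplicates, preserve order
                cleaned.append(url)
    if not cleaned:
        raise ValueError("StartUrl needs at least one valid http(s) URL")
    return {"StartUrl": cleaned}
```

The returned dictionary can be passed directly as run_input in the Python client example further down.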

Output

JSON Output

The Actor stores results in the run's default Apify dataset, which you can:

  • View in the Apify Console
  • Fetch through the Apify API
  • Download as JSON, CSV, Excel, or other formats

Example output:

[
  {
    "title": "Here Are My Favorite Books, Movies, and Music of 2025",
    "subtitle": "An annual tradition of sharing recommendations",
    "url": "https://medium.com/@barackobama/here-are-my-favorite-books-movies-and-music-of-2025-7139a0bdaf5b",
    "author": "Barack Obama",
    "date": "Dec 19, 2024",
    "read_time": "5 min read",
    "response_count": 142,
    "content": "Every year, I share a list of my favorite books..."
  }
]

CSV Export

The Actor also saves a CSV file, articles.csv, to the run's key-value store. The file contains all scraped fields in a spreadsheet-friendly format.
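CSV stores every value as text, so numeric fields need converting back after you download articles.csv. A minimal sketch, assuming the CSV column names match the dataset field names and the file is UTF-8; the helper name parse_articles_csv is hypothetical.

```python
import csv
import io


def parse_articles_csv(text: str) -> list:
    """Parse articles.csv content into dicts with response_count as an int.

    Assumption: columns are named after the dataset fields (title, url, ...).
    An empty or non-numeric response_count becomes None.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        count = (row.get("response_count") or "").strip()
        row["response_count"] = int(count) if count.isdigit() else None
        rows.append(row)
    return rows
```

For a downloaded file, read it with open("articles.csv", newline="", encoding="utf-8") and pass the contents to the function.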

Usage

Running on Apify Platform

  1. Go to the Actor's page on Apify
  2. Click "Try for Free" or "Start"
  3. Enter your Medium article URLs in the input
  4. Click "Start" to run the Actor
  5. Download results in your preferred format (JSON, CSV, Excel, etc.)

Using Apify API

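const { ApifyClient } = require('apify-client');

// Authenticate with your Apify API token
const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

const input = {
    "StartUrl": [
        "https://medium.com/@author/article-title"
    ]
};

// CommonJS has no top-level await, so wrap the calls in an async function
(async () => {
    // Start the Actor and wait for the run to finish
    const run = await client.actor("YOUR_ACTOR_ID").call(input);
    // Read the scraped articles from the run's default dataset
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
})();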

Using Python

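from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient('YOUR_APIFY_TOKEN')

run_input = {
    "StartUrl": [
        "https://medium.com/@author/article-title"
    ]
}

# Start the Actor and wait for the run to finish
run = client.actor("YOUR_ACTOR_ID").call(run_input=run_input)

# Iterate over the scraped articles in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)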

Technical Details

Technologies Used

  • Crawlee: Modern web scraping and browser automation library
  • Playwright: Headless browser automation
  • BeautifulSoup: HTML parsing and data extraction
  • Apify SDK: Cloud scraping platform integration
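In this stack, Playwright renders the page and BeautifulSoup parses the resulting HTML. The sketch below illustrates only the parsing step, and it swaps in Python's built-in html.parser for BeautifulSoup so it runs without extra packages. Treating the first h1 as the title and p elements as the body is an assumption about Medium's markup, not the Actor's actual selectors; ArticleTextParser and extract are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collect the first <h1> as the title and all <p> text as the body.

    Assumed markup, for illustration only; not the Actor's real selectors.
    """

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: list[str] = []
        self._in_h1 = False
        self._in_p = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.title:
            self._in_h1, self._buf = True, []
        elif tag == "p":
            self._in_p, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self.title = "".join(self._buf).strip()
            self._in_h1 = False
        elif tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested inline tags (<em>, <strong>) is kept as well
        if self._in_h1 or self._in_p:
            self._buf.append(data)


def extract(html: str) -> dict[str, str]:
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": "\n\n".join(parser.paragraphs)}
```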

Proxy Configuration

The Actor routes requests through residential proxies by default to reduce the risk of IP blocking and keep success rates high when accessing Medium articles.
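The Actor sets up its proxies itself, and its input has no proxy settings. For reference only: Apify Proxy is used as an ordinary HTTP proxy whose username carries the settings, and residential traffic uses the RESIDENTIAL group on proxy.apify.com port 8000 with your proxy password from the Apify Console. The helper below is a hypothetical sketch that builds such a URL, with an optional country code.

```python
from __future__ import annotations

from urllib.parse import quote


def residential_proxy_url(password: str, country: str | None = None) -> str:
    """Build an Apify Proxy URL for the RESIDENTIAL group (illustrative sketch)."""
    username = "groups-RESIDENTIAL"
    if country:
        # Optional exit-country selection, e.g. "US"
        username += f",country-{country.upper()}"
    # Percent-encode the password so characters like "@" do not break the URL
    return f"http://{username}:{quote(password, safe='')}@proxy.apify.com:8000"
```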

Performance

  • Concurrency: Handles multiple articles simultaneously
  • Retries: Automatically retries failed requests up to 3 times
  • Timeout Handling: Gracefully handles slow-loading pages
  • Average Runtime: ~3-5 seconds per article (depending on article length and network conditions)
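Crawlee handles concurrency and retries inside the real Actor. To show the idea behind "handles multiple articles simultaneously" and "retries up to 3 times", here is a minimal asyncio sketch: a semaphore bounds how many pages are in flight, each URL gets one attempt plus up to three retries, and a URL that never succeeds yields a record in the failed-request format documented in this README. The function scrape_all, its fetch callback, and the concurrency default of 5 are hypothetical.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def scrape_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 5,
    max_retries: int = 3,
) -> list[dict]:
    """Run fetch(url) for every URL with bounded concurrency and retries."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(url: str) -> dict:
        async with semaphore:  # at most max_concurrency fetches at once
            last_error: Exception | None = None
            # One initial attempt plus up to max_retries retries
            for _ in range(max_retries + 1):
                try:
                    return await fetch(url)
                except Exception as exc:  # a real crawler would narrow this
                    last_error = exc
            # Give up and record the failure instead of stopping the run
            return {"url": url, "error": str(last_error), "status": "failed"}

    # gather keeps results in the same order as the input URLs
    return list(await asyncio.gather(*(run_one(u) for u in urls)))
```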

Error Handling

The Actor includes comprehensive error handling:

  • URL Validation: Ensures all input URLs are valid HTTP/HTTPS links
  • Timeout Management: Handles slow-loading pages gracefully
  • Extraction Failures: Records partial data even if some fields fail to extract
  • Failed Requests: Logs errors and continues with remaining URLs

Failed Request Output

If an article fails to scrape, the output includes:

{
  "url": "https://medium.com/@author/failed-article",
  "error": "Timeout 30000ms exceeded",
  "status": "failed"
}
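Because failed URLs end up in the same dataset as successful articles, downstream code typically separates them first. A minimal sketch, assuming successful items do not carry "status": "failed"; the helper split_results is hypothetical.

```python
def split_results(items):
    """Separate successful article items from failure records."""
    ok, failed = [], []
    for item in items:
        # Failure records are identified by status == "failed"
        (failed if item.get("status") == "failed" else ok).append(item)
    return ok, failed
```

To retry only the failures, you could start a new run with {"StartUrl": [f["url"] for f in failed]}.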

Limitations

  • Medium Paywall: Cannot extract the full content of member-only (paywalled) articles, because the Actor does not log in as a Medium member
  • Dynamic Content: Some interactive elements or embedded content may not be fully captured
  • Rate Limiting: Medium may implement rate limiting; the Actor uses proxies to mitigate this
  • URL Format: Only supports direct Medium article URLs (not Medium publication homepages or author profiles)

Use Cases

  1. Content Analysis: Analyze trends, topics, and writing styles across multiple articles
  2. Research: Collect articles for academic or market research
  3. Content Curation: Build curated lists of articles on specific topics
  4. SEO Analysis: Study successful Medium articles for content strategy
  5. Archiving: Create backups of important articles
  6. Data Science: Build datasets for NLP, sentiment analysis, or ML projects

Cost Estimation

The Actor's cost depends on:

  • Number of articles scraped
  • Article complexity and load time
  • Compute unit pricing on Apify

Average cost: ~$0.10-$0.15 per 100 articles
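This listing also shows a price of from $30.00 / 1,000 results. Treating that starting rate as a flat per-result charge (an assumption, since "from" suggests the actual rate can differ), the arithmetic below gives $3.00 for 100 articles. That is far above the compute-based average quoted above, so check the Actor's pricing details to see which figure applies to your runs. The helper name estimate_result_cost is hypothetical.

```python
def estimate_result_cost(num_articles: int, price_per_1000: float = 30.00) -> float:
    """Estimated per-result charge in USD, rounded to cents.

    Assumes a flat rate of price_per_1000 per 1,000 results.
    """
    return round(num_articles * price_per_1000 / 1000, 2)
```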

Changelog

Version 1.0 (Current)

  • Initial release
  • Basic article scraping functionality
  • Support for title, author, content, and metadata extraction
  • CSV and JSON export
  • Residential proxy support

Support

For issues, feature requests, or questions:

License

This Actor is provided as-is for use on the Apify platform.

Disclaimer

This Actor is intended for legitimate use cases such as research, analysis, and personal archiving. Users are responsible for complying with Medium's Terms of Service and applicable laws. Always respect robots.txt and rate limits. Do not use this tool for unauthorized scraping or commercial purposes that violate Medium's policies.
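One way to respect robots.txt is to filter your StartUrl list before a run. A minimal sketch using Python's standard urllib.robotparser, assuming you have already downloaded the site's robots.txt text yourself; the helper allowed_urls and the default "*" user agent are illustrative choices.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt rules allow for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as a list of lines
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]
```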


Made by sunvic using Crawlee and Apify