Medium Scraper

Pricing: Pay per usage

Developer: Ricardo Akiyoshi (Maintained by Community)

Medium Article Scraper

Scrape Medium articles by search term, tag, or author profile. This actor extracts comprehensive article data including title, subtitle, full text content, author information, publication details, claps, responses, reading time, tags, and member-only status.

Built with Crawlee and CheerioCrawler for fast, low-resource HTML scraping. Uses 4 independent extraction strategies (JSON-LD, Apollo state, DOM parsing, and meta tags) to maximize data reliability across Medium's dynamic layouts.

Features

  • Search by keyword — Find articles matching any search term
  • Filter by tag — Browse Medium tags like javascript, data-science, or startup
  • Scrape author profiles — Extract all articles from a specific Medium author
  • Full article text — Extracts the complete article body (or excerpt for member-only content)
  • Member-only detection — Identifies paywalled articles automatically
  • 4 extraction strategies — JSON-LD, Apollo/inline scripts, DOM, and meta tags
  • Deduplication — Automatically removes duplicate articles by URL and title
  • Proxy support — Rotate IPs to avoid rate limiting on large scrapes
  • Pay-per-event pricing — Only pay for articles successfully scraped ($0.003/article)
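The URL-and-title deduplication listed above can be sketched roughly as follows (a minimal illustration with hypothetical field access; the actor's actual logic may differ):

```python
def dedupe_articles(articles):
    """Drop articles whose normalized URL or title was already seen.

    URLs are normalized by stripping query strings, trailing slashes,
    and case; titles by trimming whitespace and lowercasing.
    """
    seen_urls, seen_titles, unique = set(), set(), []
    for article in articles:
        url = article.get("articleUrl", "").split("?")[0].rstrip("/").lower()
        title = article.get("title", "").strip().lower()
        if (url and url in seen_urls) or (title and title in seen_titles):
            continue  # duplicate by URL or title
        if url:
            seen_urls.add(url)
        if title:
            seen_titles.add(title)
        unique.append(article)
    return unique
```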

Use Cases

Content Research & Ideation

Find trending articles in your niche to discover what topics resonate with readers. Analyze titles, subtitles, and tags to inform your content strategy and identify gaps in existing coverage.

Trend Analysis & Market Intelligence

Track how specific topics evolve over time on Medium. Monitor clap counts and response rates to gauge audience engagement. Identify emerging trends before they go mainstream.

Competitive Analysis

Study what competitors and industry leaders are publishing. Analyze their posting frequency, popular topics, engagement metrics, and which publications they write for. Benchmark your content performance.

Academic & Journalistic Research

Collect articles on a specific subject for literature reviews, background research, or sourcing. Extract full text and metadata for structured analysis.

SEO & Content Marketing

Discover high-performing content formats and topics. Analyze which tags drive the most engagement. Find potential collaboration opportunities with popular Medium authors.

Data Science & NLP Training

Build datasets of categorized, tagged articles for natural language processing, sentiment analysis, topic modeling, or text classification projects.

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| searchTerm | string | Yes | – | Keyword or phrase to search for on Medium |
| tag | string | No | – | Medium tag to filter by (e.g., `programming`, `data-science`) |
| author | string | No | – | Author username without `@` (e.g., `elonmusk`) |
| maxResults | integer | No | 50 | Maximum articles to scrape (1–5,000) |
| sortBy | enum | No | relevance | Sort order: `relevance`, `recent`, or `popular` |
| proxyConfiguration | object | No | Apify Proxy | Proxy settings for IP rotation |

Note: At least one of searchTerm, tag, or author must be provided. You can combine all three for more targeted results.
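Combining all three sources, a run input might look like this (the values are hypothetical, for illustration only):

```python
# Hypothetical combined input: keyword search, narrowed by tag and author.
run_input = {
    "searchTerm": "prompt engineering",    # keyword search
    "tag": "artificial-intelligence",      # restrict results to a tag
    "author": "janedev",                   # username without the @
    "maxResults": 25,
    "sortBy": "recent",
}
```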

Output Schema

Each scraped article produces a JSON object with these fields:

```json
{
  "title": "How I Built a Million-Dollar SaaS in 12 Months",
  "subtitle": "A step-by-step breakdown of the strategy that worked",
  "author": "Jane Developer",
  "authorUrl": "https://medium.com/@janedev",
  "publication": "Better Programming",
  "publishDate": "2024-03-15T00:00:00.000Z",
  "readingTime": "8 min read",
  "claps": 4200,
  "responses": 87,
  "content": "Full article text extracted from the page...",
  "tags": ["startup", "saas", "entrepreneurship"],
  "imageUrl": "https://miro.medium.com/v2/resize:fit:1200/image.jpeg",
  "articleUrl": "https://medium.com/@janedev/how-i-built-a-million-dollar-saas-abc123def456",
  "memberOnly": false,
  "extractionMethods": ["json-ld", "dom", "meta-tags"],
  "scrapedAt": "2024-03-20T14:30:00.000Z"
}
```

Field Reference

| Field | Type | Description |
|---|---|---|
| title | string | Article headline |
| subtitle | string | Article subtitle or description |
| author | string | Author display name |
| authorUrl | string | Link to the author's Medium profile |
| publication | string | Publication name (e.g., "Towards Data Science") |
| publishDate | string | ISO 8601 publication date |
| readingTime | string | Estimated reading time (e.g., "5 min read") |
| claps | integer | Number of claps (likes) |
| responses | integer | Number of comments/responses |
| content | string | Full article text (may be truncated for member-only articles) |
| tags | array | List of topic tags |
| imageUrl | string | Featured/header image URL |
| articleUrl | string | Canonical article URL |
| memberOnly | boolean | Whether the article is behind Medium's paywall |
| extractionMethods | array | Which extraction strategies produced data |
| scrapedAt | string | ISO 8601 timestamp of when the article was scraped |

Code Examples

Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
    "searchTerm": "machine learning",
    "tag": "artificial-intelligence",
    "maxResults": 100,
    "sortBy": "popular",
    "proxyConfiguration": {"useApifyProxy": True},
}

run = client.actor("sovereigntaylor/medium-scraper").call(run_input=run_input)
print(f"Scraping complete. Dataset ID: {run['defaultDatasetId']}")

# Iterate over results
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} — {item['claps']} claps — {item['readingTime']}")
```

Python — Export to Pandas DataFrame

```python
from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
    "searchTerm": "startup advice",
    "maxResults": 200,
    "sortBy": "popular",
}

run = client.actor("sovereigntaylor/medium-scraper").call(run_input=run_input)
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
df = pd.DataFrame(items)

# Analyze engagement
print(f"Total articles: {len(df)}")
print(f"Average claps: {df['claps'].mean():.0f}")
print(f"Top tags: {df['tags'].explode().value_counts().head(10)}")

# Export
df.to_csv("medium_articles.csv", index=False)
```

JavaScript (Node.js)

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const input = {
    searchTerm: 'web development',
    tag: 'javascript',
    maxResults: 50,
    sortBy: 'recent',
    proxyConfiguration: { useApifyProxy: true },
};

const run = await client.actor('sovereigntaylor/medium-scraper').call(input);
console.log(`Dataset ID: ${run.defaultDatasetId}`);

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`${item.title} by ${item.author} (${item.claps} claps)`);
}
```

JavaScript — Filter Member-Only Articles

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('sovereigntaylor/medium-scraper').call({
    searchTerm: 'product management',
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
const freeArticles = items.filter(item => !item.memberOnly);
const paidArticles = items.filter(item => item.memberOnly);

console.log(`Free articles: ${freeArticles.length}`);
console.log(`Member-only articles: ${paidArticles.length}`);

// Top free articles by claps
freeArticles
    .sort((a, b) => b.claps - a.claps)
    .slice(0, 10)
    .forEach((a, i) => console.log(`${i + 1}. ${a.title} (${a.claps} claps)`));
```

Tips & Best Practices

  1. Use proxies for large scrapes. Medium may rate-limit or block datacenter IPs. Enable Apify Proxy with residential IPs for scrapes over 100 articles.

  2. Combine sources for broader coverage. Use searchTerm + tag + author together to cast a wider net and get more diverse results.

  3. Start small. Test with maxResults: 10 first to verify the output format meets your needs before running large scrapes.

  4. Member-only content. The scraper can detect member-only articles but may only extract partial content (subtitle/excerpt) for paywalled articles.

  5. Rate limiting. The scraper automatically throttles requests to avoid being blocked. Increasing maxResults will increase run time proportionally.

  6. Tag formatting. Tags should be lowercase with hyphens (e.g., data-science, not Data Science). The scraper normalizes tags automatically.
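The tag formatting described in tip 6 can be approximated like this (an illustrative sketch; the actor's internal normalization may differ):

```python
import re

def normalize_tag(tag):
    """Lowercase, trim, and hyphenate whitespace: 'Data Science' -> 'data-science'."""
    return re.sub(r"\s+", "-", tag.strip().lower())
```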

FAQ

Q: Can this scraper extract full text from member-only articles? A: The scraper extracts whatever content is available in the HTML. For member-only articles, this is typically the subtitle and first few paragraphs. Full text requires a Medium membership session, which this scraper does not support.

Q: How many articles can I scrape per run? A: Up to 5,000 articles per run. For larger datasets, run the actor multiple times with different search terms or time ranges.

Q: Why are some fields empty? A: Medium's page structure varies across different article templates, publications, and A/B tests. The scraper uses 4 extraction strategies to maximize coverage, but some fields may not be available on every article.

Q: How much does it cost? A: The actor uses pay-per-event pricing at $0.003 per article scraped. A 100-article scrape costs $0.30. You also pay standard Apify platform compute costs (typically $0.01–0.05 per run depending on duration).

Q: Can I scrape articles from custom domain publications? A: Yes, if the publication uses a custom domain (e.g., blog.company.com) but is hosted on Medium, the scraper can extract articles from those pages as well when they appear in search results.

Q: How often can I run this scraper? A: You can schedule it to run as often as needed. For monitoring use cases, daily or weekly runs are common. Use Apify's scheduling feature to automate recurring scrapes.

Q: Does this work with Medium's API? A: No. Medium's official API is very limited and does not support search. This scraper works by parsing the public HTML pages, which provides much richer data.

Pricing

This actor uses Pay Per Event pricing.

| Event | Price |
|---|---|
| Article scraped | $0.003 |

You only pay for articles that are successfully extracted and saved to your dataset. Failed or skipped articles are not charged.
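A quick way to estimate the event-pricing cost of a run at $0.003 per article (platform compute costs excluded):

```python
PRICE_PER_ARTICLE = 0.003  # USD per successfully scraped article

def estimate_cost(articles_scraped):
    """Estimated pay-per-event cost in USD, rounded to cents."""
    return round(articles_scraped * PRICE_PER_ARTICLE, 2)
```

For example, `estimate_cost(100)` gives the $0.30 figure quoted in the FAQ.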

Integration — Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("sovereigntaylor/medium-scraper").call(run_input={
    "searchTerm": "medium",
    "maxResults": 50
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item.get('title', item.get('name', 'N/A'))}")
```

Integration — JavaScript

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('sovereigntaylor/medium-scraper').call({
    searchTerm: 'medium',
    maxResults: 50
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => console.log(item.title || item.name || 'N/A'));
```

Changelog

v1.0 (2026-03-02)

  • Initial release
  • Search, tag, and author scraping
  • 4 extraction strategies (JSON-LD, Apollo, DOM, meta tags)
  • Member-only detection
  • Deduplication by URL and title
  • Pay-per-event billing

License

MIT License. Built by Sovereign AI.