Medium Scraper
Pricing: Pay per usage
Developer: Ricardo Akiyoshi
Medium Article Scraper
Scrape Medium articles by search term, tag, or author profile. This actor extracts comprehensive article data including title, subtitle, full text content, author information, publication details, claps, responses, reading time, tags, and member-only status.
Built with Crawlee and CheerioCrawler for fast, low-resource HTML scraping. Uses 4 independent extraction strategies (JSON-LD, Apollo state, DOM parsing, and meta tags) to maximize data reliability across Medium's dynamic layouts.
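The fallback idea behind the multiple extraction strategies can be sketched in Python. This is a toy illustration only — the actor itself is built on Crawlee and CheerioCrawler in JavaScript, and `from_json_ld`, `from_meta`, and the sample `HTML` string are hypothetical, not the actor's code. The point is the merge: each strategy contributes whatever fields it can find, and later strategies override earlier ones.

```python
import json
import re

# Minimal stand-in for a scraped Medium page (hypothetical markup).
HTML = """
<html><head>
<script type="application/ld+json">{"@type": "Article", "headline": "Sample Post", "datePublished": "2024-03-15"}</script>
<meta property="og:title" content="Sample Post">
<meta property="og:description" content="A short subtitle">
</head></html>
"""

def from_json_ld(html):
    """Strategy 1: pull structured data from the JSON-LD <script> block."""
    m = re.search(r'<script type="application/ld\+json">(.*?)</script>', html, re.S)
    if not m:
        return {}
    data = json.loads(m.group(1))
    return {"title": data.get("headline"), "publishDate": data.get("datePublished")}

def from_meta(html):
    """Strategy 2: fall back to Open Graph <meta> tags."""
    out = {}
    for prop, key in [("og:title", "title"), ("og:description", "subtitle")]:
        m = re.search(rf'<meta property="{prop}" content="([^"]*)"', html)
        if m:
            out[key] = m.group(1)
    return out

# Run all strategies; later (higher-priority) strategies override earlier ones,
# and empty values never clobber data already found.
article = {}
for strategy in (from_meta, from_json_ld):
    article.update({k: v for k, v in strategy(HTML).items() if v})

print(article)
```

Merging per-field rather than per-strategy is what lets one strategy fill gaps another leaves, which is why the output's `extractionMethods` field can list several entries for a single article.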
Features
- Search by keyword — Find articles matching any search term
- Filter by tag — Browse Medium tags like `javascript`, `data-science`, or `startup`
- Scrape author profiles — Extract all articles from a specific Medium author
- Full article text — Extracts the complete article body (or excerpt for member-only content)
- Member-only detection — Identifies paywalled articles automatically
- 4 extraction strategies — JSON-LD, Apollo/inline scripts, DOM, and meta tags
- Deduplication — Automatically removes duplicate articles by URL and title
- Proxy support — Rotate IPs to avoid rate limiting on large scrapes
- Pay-per-event pricing — Only pay for articles successfully scraped ($0.003/article)
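The deduplication step above can be sketched as follows. This is an illustrative guess at the approach, not the actor's actual code: `dedupe` is a hypothetical helper, and the field names (`articleUrl`, `title`) mirror the output schema documented below.

```python
def dedupe(articles):
    """Keep the first occurrence of each URL and each title (case-insensitive)."""
    seen_urls, seen_titles, unique = set(), set(), []
    for a in articles:
        url = (a.get("articleUrl") or "").rstrip("/").lower()
        title = (a.get("title") or "").strip().lower()
        if url in seen_urls or title in seen_titles:
            continue  # duplicate by URL or by title — skip it
        seen_urls.add(url)
        seen_titles.add(title)
        unique.append(a)
    return unique

sample = [
    {"articleUrl": "https://medium.com/@a/post-1", "title": "Post One"},
    {"articleUrl": "https://medium.com/@a/post-1/", "title": "Post One"},   # same URL, trailing slash
    {"articleUrl": "https://medium.com/@b/post-2", "title": "post one"},    # same title, different case
    {"articleUrl": "https://medium.com/@c/post-3", "title": "Post Three"},
]
print(len(dedupe(sample)))  # 2 unique articles survive
```

Checking both URL and title catches the common case where Medium serves the same article under slightly different URLs (trailing slashes, tracking parameters).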
Use Cases
Content Research & Ideation
Find trending articles in your niche to discover what topics resonate with readers. Analyze titles, subtitles, and tags to inform your content strategy and identify gaps in existing coverage.
Trend Analysis & Market Intelligence
Track how specific topics evolve over time on Medium. Monitor clap counts and response rates to gauge audience engagement. Identify emerging trends before they go mainstream.
Competitive Analysis
Study what competitors and industry leaders are publishing. Analyze their posting frequency, popular topics, engagement metrics, and which publications they write for. Benchmark your content performance.
Academic & Journalistic Research
Collect articles on a specific subject for literature reviews, background research, or sourcing. Extract full text and metadata for structured analysis.
SEO & Content Marketing
Discover high-performing content formats and topics. Analyze which tags drive the most engagement. Find potential collaboration opportunities with popular Medium authors.
Data Science & NLP Training
Build datasets of categorized, tagged articles for natural language processing, sentiment analysis, topic modeling, or text classification projects.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| searchTerm | string | Yes | — | Keyword or phrase to search for on Medium |
| tag | string | No | — | Medium tag to filter by (e.g., `programming`, `data-science`) |
| author | string | No | — | Author username without `@` (e.g., `elonmusk`) |
| maxResults | integer | No | 50 | Maximum articles to scrape (1–5,000) |
| sortBy | enum | No | relevance | Sort order: `relevance`, `recent`, or `popular` |
| proxyConfiguration | object | No | Apify Proxy | Proxy settings for IP rotation |

Note: At least one of `searchTerm`, `tag`, or `author` must be provided. You can combine all three for more targeted results.
Output Schema
Each scraped article produces a JSON object with these fields:
{"title": "How I Built a Million-Dollar SaaS in 12 Months","subtitle": "A step-by-step breakdown of the strategy that worked","author": "Jane Developer","authorUrl": "https://medium.com/@janedev","publication": "Better Programming","publishDate": "2024-03-15T00:00:00.000Z","readingTime": "8 min read","claps": 4200,"responses": 87,"content": "Full article text extracted from the page...","tags": ["startup", "saas", "entrepreneurship"],"imageUrl": "https://miro.medium.com/v2/resize:fit:1200/image.jpeg","articleUrl": "https://medium.com/@janedev/how-i-built-a-million-dollar-saas-abc123def456","memberOnly": false,"extractionMethods": ["json-ld", "dom", "meta-tags"],"scrapedAt": "2024-03-20T14:30:00.000Z"}
Field Reference
| Field | Type | Description |
|---|---|---|
| title | string | Article headline |
| subtitle | string | Article subtitle or description |
| author | string | Author display name |
| authorUrl | string | Link to author's Medium profile |
| publication | string | Publication name (e.g., "Towards Data Science") |
| publishDate | string | ISO 8601 publication date |
| readingTime | string | Estimated reading time (e.g., "5 min read") |
| claps | integer | Number of claps (likes) |
| responses | integer | Number of comments/responses |
| content | string | Full article text (may be truncated for member-only articles) |
| tags | array | List of topic tags |
| imageUrl | string | Featured/header image URL |
| articleUrl | string | Canonical article URL |
| memberOnly | boolean | Whether the article is behind Medium's paywall |
| extractionMethods | array | Which extraction strategies produced data |
| scrapedAt | string | ISO 8601 timestamp of when the article was scraped |
Code Examples
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
    "searchTerm": "machine learning",
    "tag": "artificial-intelligence",
    "maxResults": 100,
    "sortBy": "popular",
    "proxyConfiguration": {"useApifyProxy": True},
}

run = client.actor("sovereigntaylor/medium-scraper").call(run_input=run_input)
print(f"Scraping complete. Dataset ID: {run['defaultDatasetId']}")

# Iterate over results
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} — {item['claps']} claps — {item['readingTime']}")
```
Python — Export to Pandas DataFrame
```python
from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
    "searchTerm": "startup advice",
    "maxResults": 200,
    "sortBy": "popular",
}

run = client.actor("sovereigntaylor/medium-scraper").call(run_input=run_input)
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
df = pd.DataFrame(items)

# Analyze engagement
print(f"Total articles: {len(df)}")
print(f"Average claps: {df['claps'].mean():.0f}")
print(f"Top tags: {df['tags'].explode().value_counts().head(10)}")

# Export
df.to_csv("medium_articles.csv", index=False)
```
JavaScript (Node.js)
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const input = {
    searchTerm: 'web development',
    tag: 'javascript',
    maxResults: 50,
    sortBy: 'recent',
    proxyConfiguration: { useApifyProxy: true },
};

const run = await client.actor('sovereigntaylor/medium-scraper').call(input);
console.log(`Dataset ID: ${run.defaultDatasetId}`);

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`${item.title} by ${item.author} (${item.claps} claps)`);
}
```
JavaScript — Filter Member-Only Articles
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('sovereigntaylor/medium-scraper').call({
    searchTerm: 'product management',
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
const freeArticles = items.filter(item => !item.memberOnly);
const paidArticles = items.filter(item => item.memberOnly);

console.log(`Free articles: ${freeArticles.length}`);
console.log(`Member-only articles: ${paidArticles.length}`);

// Top free articles by claps
freeArticles
    .sort((a, b) => b.claps - a.claps)
    .slice(0, 10)
    .forEach((a, i) => console.log(`${i + 1}. ${a.title} (${a.claps} claps)`));
```
Tips & Best Practices
- Use proxies for large scrapes. Medium may rate-limit or block datacenter IPs. Enable Apify Proxy with residential IPs for scrapes over 100 articles.
- Combine sources for broader coverage. Use `searchTerm` + `tag` + `author` together to cast a wider net and get more diverse results.
- Start small. Test with `maxResults: 10` first to verify the output format meets your needs before running large scrapes.
- Member-only content. The scraper can detect member-only articles but may only extract partial content (subtitle/excerpt) for paywalled articles.
- Rate limiting. The scraper automatically throttles requests to avoid being blocked. Increasing `maxResults` will increase run time proportionally.
- Tag formatting. Tags should be lowercase with hyphens (e.g., `data-science`, not `Data Science`). The scraper normalizes tags automatically.
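Tag normalization of the kind described in the last tip can be sketched in a few lines. This is an illustrative guess, not the actor's actual implementation; `normalize_tag` is a hypothetical helper.

```python
import re

def normalize_tag(tag):
    """Lowercase, trim, and hyphenate a tag, e.g. 'Data Science' -> 'data-science'."""
    tag = tag.strip().lower()
    tag = re.sub(r"[^a-z0-9]+", "-", tag)  # collapse spaces/punctuation into hyphens
    return tag.strip("-")

print(normalize_tag("Data Science"))   # data-science
print(normalize_tag("  JavaScript "))  # javascript
```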
FAQ
Q: Can this scraper extract full text from member-only articles?
A: The scraper extracts whatever content is available in the HTML. For member-only articles, this is typically the subtitle and first few paragraphs. Full text requires a Medium membership session, which this scraper does not support.

Q: How many articles can I scrape per run?
A: Up to 5,000 articles per run. For larger datasets, run the actor multiple times with different search terms or time ranges.

Q: Why are some fields empty?
A: Medium's page structure varies across different article templates, publications, and A/B tests. The scraper uses 4 extraction strategies to maximize coverage, but some fields may not be available on every article.

Q: How much does it cost?
A: The actor uses pay-per-event pricing at $0.003 per article scraped. A 100-article scrape costs $0.30. You also pay standard Apify platform compute costs (typically $0.01–0.05 per run depending on duration).
Q: Can I scrape articles from custom domain publications?
A: Yes, if the publication uses a custom domain (e.g., blog.company.com) but is hosted on Medium, the scraper can extract articles from those pages as well when they appear in search results.
Q: How often can I run this scraper?
A: You can schedule it to run as often as needed. For monitoring use cases, daily or weekly runs are common. Use Apify's scheduling feature to automate recurring scrapes.

Q: Does this work with Medium's API?
A: No. Medium's official API is very limited and does not support search. This scraper works by parsing the public HTML pages, which provides much richer data.
Pricing
This actor uses Pay Per Event pricing.
| Event | Price |
|---|---|
| Article scraped | $0.003 |
You only pay for articles that are successfully extracted and saved to your dataset. Failed or skipped articles are not charged.
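The per-run event cost is simple arithmetic. A quick sketch (the `estimate_cost` helper is illustrative, and it covers only the per-article event fee, not platform compute):

```python
PRICE_PER_ARTICLE = 0.003  # pay-per-event price from the table above

def estimate_cost(articles):
    """Event cost for a run that successfully scrapes `articles` articles."""
    return articles * PRICE_PER_ARTICLE

for n in (10, 100, 1000, 5000):
    print(f"{n:>5} articles ≈ ${estimate_cost(n):.2f}")
```

For example, a maximum-size run of 5,000 articles costs $15.00 in event fees, plus the usual compute charge.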
Integration — Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("sovereigntaylor/medium-scraper").call(
    run_input={"searchTerm": "medium", "maxResults": 50}
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item.get('title', item.get('name', 'N/A'))}")
```
Integration — JavaScript
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('sovereigntaylor/medium-scraper').call({
    searchTerm: 'medium',
    maxResults: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => console.log(item.title || item.name || 'N/A'));
```
Related Actors
- Hacker News Scraper — Scrape HN stories, scores, and comments
- Reddit Scraper — Extract Reddit posts and comments from any subreddit
- Google Search Scraper — Scrape Google search results for any query
- Website to Markdown — Convert any webpage to clean Markdown for AI/RAG
Changelog
v1.0 (2026-03-02)
- Initial release
- Search, tag, and author scraping
- 4 extraction strategies (JSON-LD, Apollo, DOM, meta tags)
- Member-only detection
- Deduplication by URL and title
- Pay-per-event billing
License
MIT License. Built by Sovereign AI.