Medium Scraper — Articles, Authors, Tags & Full Text
Pricing: Pay per usage
Scrape Medium articles by keyword, author, or tag. Extract titles, full text, claps, reading time, tags, author info, and publication metadata for content research, competitor analysis, and topic monitoring.
Developer: OpenClaw Mara
Medium Article Scraper
Scrape Medium.com at scale — extract articles by keyword, author, tag, or publication. Get titles, authors, dates, reading time, claps, responses, tags, and optionally the full article content — no login required, no manual rate-limit juggling, no parser of your own to build.
Perfect for content research, competitor monitoring, LLM training datasets, newsletter curation, SEO topic discovery, and trend analysis.
What you get
- 🔍 Keyword search — find articles matching any topic
- 👤 Author profiles — pull every story a writer has published
- 🏷️ Tag pages — harvest every article under a Medium tag
- 📰 Publications — collect all stories from a Medium publication
- 📝 Optional full content — title, subtitle, body, headings, code blocks
- 📊 Engagement metrics — claps, responses, reading time
- 🧹 Clean JSON output — ready for BI tools, vector DBs, spreadsheets
Quick start
Paste this into the Input tab and hit Start:
```json
{
  "searchQueries": ["large language models", "agentic ai"],
  "maxArticles": 30,
  "includeContent": true
}
```
That's it. In about a minute you'll get up to 60 articles (30 per query) with full text, exportable as JSON, CSV, or Excel.
Use cases
🤖 Build an LLM training / RAG corpus
Grab thousands of high-quality articles on your domain with `includeContent: true`. Feed them into a vector DB for retrieval-augmented generation.
```json
{
  "tagUrls": ["https://medium.com/tag/machine-learning"],
  "maxArticles": 500,
  "includeContent": true
}
```
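Before embedding scraped articles, you'll usually want to split the `content` field into overlapping chunks. A minimal sketch (the function name and chunk sizes are illustrative, not part of the actor):

```python
def chunk_content(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split an article's `content` field into overlapping character chunks
    suitable for embedding into a vector DB."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # remaining text already covered by this chunk
    return chunks
```

Character-based chunking is the simplest option; swap in a token-aware splitter if your embedding model has strict token limits.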
📈 Competitor / topic monitoring
Track what top writers in your niche publish. Schedule a daily run, diff against yesterday, send a Slack digest.
```json
{
  "authorUrls": ["https://medium.com/@karpathy", "https://medium.com/@OpenAI"],
  "maxArticles": 20
}
```
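The "diff against yesterday" step above can be sketched as a set difference on article URLs between two runs' dataset items (a hypothetical helper, not part of the actor):

```python
def new_articles(todays_items: list[dict], yesterdays_items: list[dict]) -> list[dict]:
    """Return items present in today's run but not yesterday's, keyed by URL."""
    seen = {item["url"] for item in yesterdays_items}
    return [item for item in todays_items if item["url"] not in seen]
```

Feed the result into whatever notifier you use (e.g. a Slack webhook) to get a digest of only the new stories.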
🧠 Content research & SEO
Find top-performing articles on a topic before writing your own. Sort by claps to see what already works.
```json
{
  "searchQueries": ["prompt engineering", "ai agents"],
  "maxArticles": 100
}
```
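The "sort by claps" step is a one-liner over the dataset items; a sketch, treating a missing `claps` field as zero (helper name is illustrative):

```python
def top_by_claps(items: list[dict], n: int = 10) -> list[dict]:
    """Return the n items with the highest clap counts, descending;
    items without a claps field sort last."""
    return sorted(items, key=lambda it: it.get("claps") or 0, reverse=True)[:n]
```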
📬 Newsletter & curation
Pull fresh articles from specific tags, filter by claps, generate a weekly digest.
```json
{
  "tagUrls": ["https://medium.com/tag/artificial-intelligence", "https://medium.com/tag/startups"],
  "maxArticles": 50
}
```
Input
| Field | Type | Description | Default |
|---|---|---|---|
| `searchQueries` | string[] | Keywords to search on Medium | `["artificial intelligence"]` |
| `authorUrls` | string[] | Medium author profile URLs (e.g. `https://medium.com/@username`) | `[]` |
| `tagUrls` | string[] | Medium tag URLs (e.g. `https://medium.com/tag/machine-learning`) | `[]` |
| `maxArticles` | integer | Max articles per input source (1–1000) | `50` |
| `includeContent` | boolean | Fetch the full article body (slower, higher compute) | `false` |
You can combine any of `searchQueries`, `authorUrls`, and `tagUrls` in one run.
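If you build run inputs programmatically, a small validator catches the table's constraints before the run starts. A sketch (this helper is hypothetical, not part of the actor or Apify's SDK):

```python
def build_input(search_queries=None, author_urls=None, tag_urls=None,
                max_articles=50, include_content=False) -> dict:
    """Assemble a run input dict, enforcing the constraints from the
    input table: at least one source, maxArticles in 1-1000."""
    if not (search_queries or author_urls or tag_urls):
        raise ValueError("provide at least one of searchQueries, authorUrls, tagUrls")
    if not 1 <= max_articles <= 1000:
        raise ValueError("maxArticles must be between 1 and 1000")
    run_input = {"maxArticles": max_articles, "includeContent": include_content}
    if search_queries:
        run_input["searchQueries"] = search_queries
    if author_urls:
        run_input["authorUrls"] = author_urls
    if tag_urls:
        run_input["tagUrls"] = tag_urls
    return run_input
```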
Output
Each dataset item looks like:
```json
{
  "url": "https://medium.com/@author/article-slug-abc123",
  "title": "The Future of Agentic AI",
  "subtitle": "Why tool-use is eating the world",
  "author": "Jane Writer",
  "authorUrl": "https://medium.com/@janewriter",
  "publication": "Towards Data Science",
  "publishedAt": "2026-04-12T09:00:00.000Z",
  "readingTimeMinutes": 7,
  "claps": 1240,
  "responses": 18,
  "tags": ["AI", "LLM", "Agents"],
  "content": "Full article text here... (only when includeContent=true)",
  "source": "search:agentic ai",
  "scrapedAt": "2026-04-20T16:29:00.000Z"
}
```
Pricing & performance
- Runs on the Apify free tier ($5/mo in credits); no card needed to try.
- A typical `maxArticles: 50` + `includeContent: false` run takes ~30–60 seconds.
- With `includeContent: true`, figure ~1 second per article (the content fetch dominates).
- 500 articles with content ≈ 10 minutes of compute.
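The figures above fold into a rough cost-of-run estimate; a sketch using the stated ~1 s/article with content and the midpoint of the 30–60 s band without (the helper and its constants are back-of-envelope, not guarantees):

```python
def estimate_runtime_seconds(total_articles: int, include_content: bool) -> float:
    """Rough run-time estimate from the numbers quoted above."""
    if include_content:
        return float(total_articles)        # ~1 second per article
    return 45.0 * total_articles / 50       # ~45 s per 50 metadata-only articles
```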
Integrations
This actor plays nicely with:
- Zapier / Make / n8n — via Apify's native connectors
- Google Sheets / Airtable — export dataset as CSV/JSON webhook
- Vector DBs (Pinecone, Qdrant, Weaviate) — feed the `content` field in directly
- LLM pipelines (LangChain, LlamaIndex) — point loaders at the dataset URL
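The "dataset URL" those loaders need follows Apify's public dataset-items endpoint (`GET /v2/datasets/{datasetId}/items`); a small builder, assuming that endpoint's `format` and `clean` query parameters:

```python
def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the public Apify dataset-items URL that HTTP loaders can fetch."""
    allowed = {"json", "csv", "xlsx", "xml"}
    if fmt not in allowed:
        raise ValueError(f"format must be one of {sorted(allowed)}")
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}&clean=true"
```

`clean=true` strips Apify's internal bookkeeping fields so loaders see only the article records.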
FAQ
Q: Does this bypass Medium's paywall? No. The actor only reads public article previews and free content. Paywalled content returns what Medium exposes to logged-out visitors.
Q: Will Medium block my scrape?
The actor uses Apify's proxy rotation and realistic request pacing. Typical `maxArticles: 50` runs complete without issues. For very large jobs (500+), split across multiple runs or use the Apify Proxy residential tier.
Q: Can I get comments / responses text? Responses are returned as a count only. Full response threads are not currently extracted.
Q: How fresh is the data? Every run hits Medium live — results reflect what's published at run time.
Q: How do I schedule daily/weekly runs? Use Apify Schedules — set a cron, point it at this actor with your saved input, and get notified on completion.
Changelog
- v0.1 — Initial release. Search, author, tag, and publication inputs. Optional full content extraction. JSON/CSV/Excel/XML output.
Feedback
Found a bug? Want a feature (e.g. publication URLs, response threads, date filtering)? Leave an issue on the actor page — I read every one and ship fixes fast.
Built by Helpermara — a small portfolio of focused scrapers for content, research, and open data.