Medium Scraper — Articles, Authors, Tags & Full Text

Pricing

Pay per usage

Scrape Medium articles by keyword, author, or tag. Extract titles, full text, claps, reading time, tags, author info, and publication metadata for content research, competitor analysis, and topic monitoring.

Developer

OpenClaw Mara

Maintained by Community

Medium Article Scraper

Scrape Medium.com at scale: extract articles by keyword, author, tag, or publication. Get titles, authors, dates, reading time, claps, responses, tags, and optionally the full article content, without logging in, handling rate limits by hand, or building your own parser.

Perfect for content research, competitor monitoring, LLM training datasets, newsletter curation, SEO topic discovery, and trend analysis.

What you get

  • 🔍 Keyword search — find articles matching any topic
  • 👤 Author profiles — pull every story a writer has published
  • 🏷️ Tag pages — harvest every article under a Medium tag
  • 📰 Publications — collect all stories from a Medium publication
  • 📝 Optional full content — title, subtitle, body, headings, code blocks
  • 📊 Engagement metrics — claps, responses, reading time
  • 🧹 Clean JSON output — ready for BI tools, vector DBs, spreadsheets

Quick start

Paste this into the Input tab and hit Start:

{
  "searchQueries": ["large language models", "agentic ai"],
  "maxArticles": 30,
  "includeContent": true
}

That's it. In a minute or two you'll get up to 60 articles (30 per search query) with full text, exportable as JSON, CSV, or Excel.
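The "up to 60" figure follows from maxArticles applying per input source. A minimal sketch of that arithmetic, assuming every listed query, author URL, and tag URL counts as one source; the helper name `max_expected_items` is hypothetical and not part of the actor:

```python
from typing import Any, Dict

# Source fields named in the Input table; each entry counts as one input source.
SOURCE_FIELDS = ("searchQueries", "authorUrls", "tagUrls")


def max_expected_items(run_input: Dict[str, Any]) -> int:
    """Upper bound on dataset items: number of sources x maxArticles.

    Assumption: fields missing from run_input contribute no sources, and
    maxArticles falls back to the documented default of 50.
    """
    sources = sum(len(run_input.get(field, [])) for field in SOURCE_FIELDS)
    return sources * run_input.get("maxArticles", 50)


quick_start = {
    "searchQueries": ["large language models", "agentic ai"],
    "maxArticles": 30,
    "includeContent": True,
}
print(max_expected_items(quick_start))  # 2 sources x 30 = 60
```

This is only an upper bound: a query with fewer than 30 matching articles will return fewer items.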

Use cases

🤖 Build an LLM training / RAG corpus

Grab thousands of high-quality articles on your domain with includeContent: true. Feed them into a vector DB for retrieval-augmented generation.

{
  "tagUrls": ["https://medium.com/tag/machine-learning"],
  "maxArticles": 500,
  "includeContent": true
}
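Before the content field goes into a vector DB it usually needs to be split into chunks. The sketch below shows one simple way to do that with fixed-size character windows; the function names, the chunk size, and the overlap are illustrative assumptions, and the embedding and upload steps are left out because they depend on the database you use.

```python
from typing import Any, Dict, List


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into windows of max_chars that overlap by `overlap` characters."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    if not text:
        return []
    step = max_chars - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + max_chars] for i in range(0, last_start, step)]


def article_chunks(item: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one dataset item into chunk records with a stable id and metadata."""
    chunks = chunk_text(item.get("content") or "")
    return [
        {
            "id": f"{item['url']}#chunk-{n}",
            "text": chunk,
            "metadata": {"title": item.get("title"), "tags": item.get("tags", [])},
        }
        for n, chunk in enumerate(chunks)
    ]
```

Items scraped without includeContent have no content field and produce no chunks.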

📈 Competitor / topic monitoring

Track what top writers in your niche publish. Schedule a daily run, diff against yesterday, send a Slack digest.

{
  "authorUrls": [
    "https://medium.com/@karpathy",
    "https://medium.com/@OpenAI"
  ],
  "maxArticles": 20
}
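The "diff against yesterday" step can be sketched as a comparison on the url field. This is a minimal example that assumes you have both days' dataset items loaded as lists of dicts; the helper names are hypothetical, and actually posting the text to Slack is left out.

```python
from typing import Any, Dict, List


def new_articles(today: List[Dict[str, Any]], yesterday: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return items from today's run whose url was not in yesterday's run."""
    seen = {item["url"] for item in yesterday}
    return [item for item in today if item["url"] not in seen]


def digest_text(items: List[Dict[str, Any]]) -> str:
    """Format new articles as a plain-text list suitable for a chat message."""
    if not items:
        return "No new articles today."
    return "\n".join(f"- {item['title']} by {item['author']}: {item['url']}" for item in items)
```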

🧠 Content research & SEO

Find top-performing articles on a topic before writing your own. Sort by claps to see what already works.

{
  "searchQueries": ["prompt engineering", "ai agents"],
  "maxArticles": 100
}
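Sorting by claps happens on your side, after the run. A minimal sketch, assuming the dataset items are loaded as a list of dicts with the claps field shown in the Output section; `top_by_claps` is a hypothetical helper:

```python
from typing import Any, Dict, List


def top_by_claps(items: List[Dict[str, Any]], limit: int = 10) -> List[Dict[str, Any]]:
    """Return the `limit` most-clapped items; a missing or null count sorts as 0."""
    ranked = sorted(items, key=lambda item: item.get("claps") or 0, reverse=True)
    return ranked[:limit]
```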

📬 Newsletter & curation

Pull fresh articles from specific tags, filter by claps, generate a weekly digest.

{
  "tagUrls": [
    "https://medium.com/tag/artificial-intelligence",
    "https://medium.com/tag/startups"
  ],
  "maxArticles": 50
}
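The filtering step for a weekly digest might look like the sketch below. It assumes publishedAt is an ISO 8601 UTC timestamp as in the Output example; the 100-clap threshold, the 7-day window, and the function names are illustrative choices, not actor settings.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps like 2026-04-12T09:00:00.000Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def weekly_digest(items: List[Dict[str, Any]], now: datetime,
                  min_claps: int = 100, days: int = 7) -> str:
    """List recent, well-received articles, most-clapped first.

    `now` must be timezone-aware so it can be compared with publishedAt.
    """
    cutoff = now - timedelta(days=days)
    fresh = [
        item for item in items
        if (item.get("claps") or 0) >= min_claps
        and parse_timestamp(item["publishedAt"]) >= cutoff
    ]
    fresh.sort(key=lambda item: item["claps"], reverse=True)
    return "\n".join(
        f"{rank}. {item['title']} ({item['claps']} claps) {item['url']}"
        for rank, item in enumerate(fresh, start=1)
    )
```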

Input

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| searchQueries | string[] | Keywords to search on Medium | ["artificial intelligence"] |
| authorUrls | string[] | Medium author profile URLs (e.g. https://medium.com/@username) | [] |
| tagUrls | string[] | Medium tag URLs (e.g. https://medium.com/tag/machine-learning) | [] |
| maxArticles | integer | Max articles per input source (1–1000) | 50 |
| includeContent | boolean | Fetch the full article body (slower, higher compute) | false |

You can combine any of searchQueries, authorUrls, and tagUrls in one run.
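If you build inputs programmatically, a local check against the table above can catch mistakes before a run starts. This is a sketch based only on the types and range listed here; the actor's own validation may differ, and `validate_input` is a hypothetical helper.

```python
from typing import Any, Dict, List

SOURCE_FIELDS = ("searchQueries", "authorUrls", "tagUrls")


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    for field in SOURCE_FIELDS:
        value = run_input.get(field, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{field} must be a list of strings")
    max_articles = run_input.get("maxArticles", 50)
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(max_articles, bool) or not isinstance(max_articles, int)
            or not 1 <= max_articles <= 1000):
        errors.append("maxArticles must be an integer from 1 to 1000")
    if not isinstance(run_input.get("includeContent", False), bool):
        errors.append("includeContent must be a boolean")
    return errors
```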

Output

Each dataset item looks like:

{
  "url": "https://medium.com/@author/article-slug-abc123",
  "title": "The Future of Agentic AI",
  "subtitle": "Why tool-use is eating the world",
  "author": "Jane Writer",
  "authorUrl": "https://medium.com/@janewriter",
  "publication": "Towards Data Science",
  "publishedAt": "2026-04-12T09:00:00.000Z",
  "readingTimeMinutes": 7,
  "claps": 1240,
  "responses": 18,
  "tags": ["AI", "LLM", "Agents"],
  "content": "Full article text here... (only when includeContent=true)",
  "source": "search:agentic ai",
  "scrapedAt": "2026-04-20T16:29:00.000Z"
}
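For downstream code it can help to load each item into a typed record. The dataclass below is a sketch built from the example item above; the class and its snake_case attribute names are assumptions, and it treats content as optional because that field only appears when includeContent is true.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class Article:
    url: str
    title: str
    author: str
    published_at: datetime
    reading_time_minutes: int
    claps: int
    responses: int
    tags: List[str]
    source: str
    content: Optional[str] = None  # absent unless includeContent was true

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "Article":
        """Build an Article from one dataset item (a plain dict)."""
        return cls(
            url=item["url"],
            title=item["title"],
            author=item["author"],
            # Replace the trailing Z so older Python versions can parse it.
            published_at=datetime.fromisoformat(item["publishedAt"].replace("Z", "+00:00")),
            reading_time_minutes=item["readingTimeMinutes"],
            claps=item["claps"],
            responses=item["responses"],
            tags=list(item.get("tags", [])),
            source=item["source"],
            content=item.get("content"),
        )
```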

Pricing & performance

  • Runs on Apify free tier ($5/mo credits) — no card needed to try.
  • A typical maxArticles: 50 + includeContent: false run takes ~30–60 seconds.
  • With includeContent: true, figure ~1 second per article (content fetch dominates).
  • 500 articles with content ≈ 10 minutes of compute.
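The per-article figure above gives a quick way to size a run with content enabled. This sketch models only the content fetch at roughly 1 second per article; the gap between its result for 500 articles (about 8 minutes) and the ≈ 10 minutes quoted above is presumably startup and listing overhead, which is an assumption and is not modeled here.

```python
def estimate_content_minutes(articles: int, seconds_per_article: float = 1.0) -> float:
    """Rough content-fetch time in minutes; excludes startup and listing overhead."""
    return articles * seconds_per_article / 60


print(round(estimate_content_minutes(500), 1))  # 500 s of fetching, in minutes
```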

Integrations

This actor plays nicely with:

  • Zapier / Make / n8n — via Apify's native connectors
  • Google Sheets / Airtable — export the dataset as CSV or JSON, or push it via webhook
  • Vector DBs (Pinecone, Qdrant, Weaviate) — feed content field directly
  • LLM pipelines (LangChain, LlamaIndex) — point loaders at the dataset URL
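As an illustration of the spreadsheet path, the sketch below builds a dataset export URL and flattens items into CSV locally. The URL follows the Apify API v2 dataset items endpoint as I understand it; the dataset ID is a placeholder you would take from your own run, private datasets may also need an API token, and the column list and helper names are assumptions.

```python
import csv
import io
from typing import Any, Dict, List
from urllib.parse import urlencode

# Columns kept for spreadsheets; content is dropped because it is long.
COLUMNS = ["url", "title", "author", "publication", "publishedAt",
           "readingTimeMinutes", "claps", "responses", "tags"]


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the export URL for a dataset (dataset_id comes from your run)."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"


def items_to_csv(items: List[Dict[str, Any]]) -> str:
    """Flatten dataset items into CSV text, joining tags with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore", restval="")
    writer.writeheader()
    for item in items:
        row = dict(item)
        row["tags"] = "; ".join(item.get("tags", []))
        writer.writerow(row)
    return buffer.getvalue()
```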

FAQ

Q: Does this bypass Medium's paywall? No. The actor only reads public article previews and free content. For paywalled articles, it returns only what Medium exposes to logged-out visitors.

Q: Will Medium block my scrape? The actor uses Apify's proxy rotation and realistic request pacing. Typical maxArticles: 50 runs complete without issues. For very large jobs (500+ articles), split the work across multiple runs or use the Apify Proxy residential tier.

Q: Can I get comments / responses text? Responses are returned as a count only. Full response threads are not currently extracted.

Q: How fresh is the data? Every run hits Medium live — results reflect what's published at run time.

Q: How do I schedule daily/weekly runs? Use Apify Schedules — set a cron, point it at this actor with your saved input, and get notified on completion.
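The advice above about splitting very large jobs can be sketched as a helper that breaks one input into several smaller run inputs. The 500-article budget per run mirrors the threshold mentioned in the FAQ; the function name and the batching strategy are assumptions, not something the actor provides.

```python
from typing import Any, Dict, List

SOURCE_FIELDS = ("searchQueries", "authorUrls", "tagUrls")


def split_run_input(run_input: Dict[str, Any], budget: int = 500) -> List[Dict[str, Any]]:
    """Split sources across runs so each run requests at most ~budget articles."""
    per_source = run_input.get("maxArticles", 50)
    sources_per_run = max(1, budget // per_source)
    pairs = [(field, value) for field in SOURCE_FIELDS
             for value in run_input.get(field, [])]
    shared = {k: v for k, v in run_input.items() if k not in SOURCE_FIELDS}
    batches = []
    for start in range(0, len(pairs), sources_per_run):
        # Set every source field explicitly so a documented default
        # (such as the default searchQueries value) is not applied by accident.
        batch: Dict[str, Any] = {**shared, **{field: [] for field in SOURCE_FIELDS}}
        for field, value in pairs[start:start + sources_per_run]:
            batch[field].append(value)
        batches.append(batch)
    return batches
```

Each returned dict can be saved as a separate input and attached to its own schedule.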

Changelog

  • v0.1 — Initial release. Search, author, tag, and publication inputs. Optional full content extraction. JSON/CSV/Excel/XML output.

Feedback

Found a bug? Want a feature (e.g. publication URLs, response threads, date filtering)? Leave an issue on the actor page — I read every one and ship fixes fast.


Built by Helpermara — a small portfolio of focused scrapers for content, research, and open data.