Dev.to Articles Scraper

Pricing

from $3.50 / 1,000 results

Scrape developer articles from Dev.to by tag or author: title, description, tags, reactions, comments and reading time. Schedule it to track trending tech content and topics.

Developer

Logiover

Maintained by Community

✍️ Dev.to Articles Scraper — Developer Blog Posts by Tag or Author to JSON/CSV

Bulk-scrape developer articles from Dev.to via the official public API — by tag, by author username, or the latest feed across all tags — fully paginated through Forem's public JSON. Title, description, URL, tags, author display name + username, reactions count, comments count, reading time, cover image and publish timestamp. No login, no Forem API key, no proxy. Export to JSON, CSV, Excel or XML.

Built for technical content marketers running newsletters, dev-relations teams monitoring tool adoption, content aggregators, technical journalists, devtool product teams tracking ecosystem conversation, and ML engineers building tech-writing corpora.

🟢 No Dev.to account. No API key. No proxy. Pure public REST.


🚀 Why this scraper

Dev.to (Forem) is one of the largest dedicated technical-writing platforms — thousands of new posts every week from indie developers, OSS maintainers, DevRel teams at major companies, ML researchers, frontend craftspeople, system designers, and self-taught builders sharing what they learn. The signal is dense and well-tagged: every post carries up to four tags (#javascript, #ai, #webdev, #python, #react, #rust, #devops, #career, #beginners, #tutorial...), reading time, engagement counts and a clear author identity.

Pulling Dev.to at scale yourself runs into:

  • Knowing the Forem /api/articles query parameter conventions (tag, username, top, state, page, per_page)
  • Threading full pagination across hundreds of pages
  • Distinguishing "tag feed" from "username feed" from "latest feed"
  • Flattening Forem's nested response into flat rows
  • Handling occasional 429s with backoff
  • Persisting output in a format your warehouse, BI tool or content pipeline can use

This Actor handles all of that. Set a tag, a username (or both, or neither), set the cap, hit run — get back a flat, structured, paginated dataset of every matching article with all the metadata you need.


✨ Key features

| Feature | What it gives you |
| --- | --- |
| 🔌 Official Dev.to / Forem API | Stable, well-documented, fully paginated; no fragile HTML parsing |
| 🏷️ Tag-based filtering | Pull articles for any topic: javascript, ai, webdev, python, react, rust, devops, career, beginners, tutorial, etc. |
| 👤 Author-based filtering | Collect every article from a specific Dev.to writer by username |
| ♾️ Full pagination | Walks the entire feed for your filter, not just page 1 |
| 📊 Rich metadata per article | 13 fields: ID, title, description, URL, tags, author display name + username, reactions, comments, reading time, cover image, publish date, scrape timestamp |
| 📈 Engagement metrics | reactionsCount and commentsCount make it easy to rank by popularity |
| ⏱️ Reading-time included | Estimated reading time in minutes, useful for content curation and email digests |
| 🎯 Adjustable run size | maxArticles=0 pulls everything; set any cap for faster runs |
| 🧱 Flat, export-ready schema | No nested JSON; rows drop straight into a spreadsheet or warehouse |
| 📦 All export formats | JSON, CSV, Excel, HTML, XML, JSONL via the Apify Dataset |
| 🔓 No auth, no proxy | Pure public-API access: no Dev.to account, no API key, no residential proxy |
| 🧰 Built-in Overview view | Pre-configured Apify Dataset view with the most useful columns visible by default |

🎯 Built for these use cases

1. Tech content discovery & curation

What's being written about AI agents this week? About Rust on the backend? About Astro vs Next.js? Schedule the Actor with a tag filter for a continuously-fresh stream of new developer articles in your niche — feed your newsletter, your podcast research, your weekly internal dev digest.

2. Author network analysis

Pass a username — get every article that author has ever published, with tags and engagement metrics. Build maps of who writes about what, who's gaining traction, who's a credible voice in a given subdomain. Useful for DevRel outreach, podcast guest sourcing, technical-recruiting research and content partnerships.

3. Tag trend tracking

Pull weekly snapshots of the top tags in your space. Plot article counts per tag over time. See which technologies are heating up (rust vs go, nextjs vs remix, ollama vs vllm) months before consensus catches up.
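
As a rough illustration of the "article counts per tag over time" idea, the sketch below buckets exported rows by ISO week and tag. It assumes rows shaped like this Actor's output (a `publishedAt` ISO 8601 string and a `tags` list); the function name `weekly_tag_counts` is hypothetical and not part of the Actor.

```python
from collections import Counter
from datetime import datetime


def weekly_tag_counts(rows):
    """Count articles per (ISO week, tag) from flat dataset rows."""
    counts = Counter()
    for row in rows:
        # publishedAt ends in "Z"; swap it for an explicit offset so that
        # datetime.fromisoformat also accepts it on Python versions before 3.11.
        published = datetime.fromisoformat(row["publishedAt"].replace("Z", "+00:00"))
        iso = published.isocalendar()  # (ISO year, ISO week, ISO weekday)
        week = f"{iso[0]}-W{iso[1]:02d}"
        for tag in row.get("tags", []):
            counts[(week, tag)] += 1
    return counts
```

Plotting `counts` for two competing tags (say `rust` and `go`) over successive weekly runs gives the trend line described above.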

4. Newsletter & content aggregator pipelines

Daily run, top-reactions filter (sort/cap downstream), email-ready output. Power a "best of Dev.to" weekly newsletter, a Slack-channel digest, an internal "what should we read?" feed, or an SEO content-syndication pipeline.

5. Devtool marketing & DevRel monitoring

You sell a devtool (CI service, framework, observability tool, hosting platform). Monitor mentions of your product, your competitor and your category in Dev.to articles. Catch reviews early, partner with active writers, surface integration tutorials your community is asking for.

6. Technical writing benchmarking

Studying a topic for an upcoming book / course / docs site? Pull every Dev.to article on that tag. See what's been covered, what hasn't, what got engagement and what didn't — a research shortcut.

7. LLM / NLP training data

Dev.to articles are well-tagged, clearly attributed and topically diverse — great structure for fine-tuning a developer-assistant model on contemporary tech writing.

8. Competitive analysis & influencer tracking

Track every Dev.to article published by competitors and influential authors in your space. Engagement deltas, posting cadence, topic mix — all directly observable.


📥 Inputs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| tag | string | No | Tag filter, e.g. javascript, ai, webdev, python, react, rust, devops, career. Leave empty for the latest feed across all tags. |
| username | string | No | Filter to a specific author's articles by Dev.to username (e.g. ben, ali, florincornea). Combine with tag or use on its own. |
| maxArticles | integer | No | Hard cap on rows. 0 = pull every available article for the filter. |

Example inputs

Latest articles across all tags:

{
  "tag": "",
  "username": "",
  "maxArticles": 200
}

Every AI article on Dev.to:

{
  "tag": "ai",
  "maxArticles": 0
}

Every article by a specific author:

{
  "username": "ben",
  "maxArticles": 0
}

A specific author's posts on a specific tag:

{
  "tag": "javascript",
  "username": "florincornea",
  "maxArticles": 50
}
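
To make the relationship between these inputs and the API request concrete, here is a minimal sketch of how an input object could be turned into a request URL. The Actor's actual source is not shown on this page, so `build_url` and its handling of empty strings (dropping them from the query) are assumptions for illustration only.

```python
from urllib.parse import urlencode

API_URL = "https://dev.to/api/articles"


def build_url(actor_input: dict, page: int = 1, per_page: int = 100) -> str:
    """Build an /api/articles URL from an input object (hypothetical helper)."""
    params = {}
    tag = (actor_input.get("tag") or "").strip()
    username = (actor_input.get("username") or "").strip()
    # Assumption: empty filters are omitted, which yields the latest feed.
    if tag:
        params["tag"] = tag
    if username:
        params["username"] = username
    params["per_page"] = per_page
    params["page"] = page
    return f"{API_URL}?{urlencode(params)}"
```

For example, `build_url({"tag": "ai", "maxArticles": 0})` produces `https://dev.to/api/articles?tag=ai&per_page=100&page=1`. Note that `maxArticles` never reaches the API; it only limits how many rows are kept.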

📤 Output

One Apify dataset row per article. Sample:

{
  "id": 1234567,
  "title": "10 JavaScript Tricks You Should Know in 2026",
  "description": "A practical roundup of modern JS techniques you can apply today.",
  "url": "https://dev.to/janedev/10-javascript-tricks-you-should-know-2026",
  "author": "Jane Developer",
  "authorUsername": "janedev",
  "tags": ["javascript", "webdev", "beginners", "tutorial"],
  "commentsCount": 24,
  "reactionsCount": 312,
  "readingTimeMinutes": 6,
  "coverImage": "https://res.cloudinary.com/practicaldev/.../cover.png",
  "publishedAt": "2026-05-10T09:00:00Z",
  "scrapedAt": "2026-05-16T10:00:00.000Z"
}

Full field reference

| Field | Type | Meaning |
| --- | --- | --- |
| id | number | Dev.to numeric article ID |
| title | string | Article title |
| description | string | Short description / summary |
| url | string | Canonical URL of the article on dev.to |
| author | string | Display name of the author |
| authorUsername | string | Dev.to username (use this for follow-up username queries) |
| tags | array | Tags attached to the article (up to 4 on Dev.to) |
| commentsCount | number | Number of comments on the article |
| reactionsCount | number | Total reactions (likes, unicorns, bookmarks combined) |
| readingTimeMinutes | number | Estimated reading time in minutes |
| coverImage | string | URL of the article's cover image |
| publishedAt | string | ISO 8601 publication timestamp |
| scrapedAt | string | ISO 8601 timestamp of the scrape |
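
The mapping from a raw Forem article object to this flat schema could look roughly like the sketch below. It is not the Actor's code: `flatten_article` is a hypothetical name, and the choice of `public_reactions_count` as the source for `reactionsCount` is an assumption. The other source keys (`user`, `tag_list`, `comments_count`, `reading_time_minutes`, `cover_image`, `published_at`) follow the Forem article listing response as commonly documented.

```python
from datetime import datetime, timezone
from typing import Optional


def flatten_article(raw: dict, scraped_at: Optional[datetime] = None) -> dict:
    """Turn one nested Forem article object into a flat 13-field row."""
    user = raw.get("user") or {}
    tags = raw.get("tag_list") or []
    if isinstance(tags, str):
        # Some Forem responses give tags as a comma-separated string.
        tags = [t.strip() for t in tags.split(",") if t.strip()]
    when = scraped_at or datetime.now(timezone.utc)
    return {
        "id": raw.get("id"),
        "title": raw.get("title"),
        "description": raw.get("description"),
        "url": raw.get("url"),
        "author": user.get("name"),
        "authorUsername": user.get("username"),
        "tags": list(tags),
        "commentsCount": raw.get("comments_count", 0),
        # Assumption: which reaction counter the Actor reads is not documented.
        "reactionsCount": raw.get("public_reactions_count", 0),
        "readingTimeMinutes": raw.get("reading_time_minutes"),
        "coverImage": raw.get("cover_image"),
        # Forem already returns ISO 8601 here, so it is passed through as-is.
        "publishedAt": raw.get("published_at"),
        "scrapedAt": when.astimezone(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z"),
    }
```

Keeping every value a scalar or a simple list is what lets the rows export cleanly to CSV and Excel.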

⚙️ How it works

  1. Parses input — tag, username, max cap.
  2. Calls https://dev.to/api/articles with the right query: ?tag=<tag>&username=<username>&per_page=100&page=1.
  3. Paginates through page=1..N until the cap is hit or the API returns an empty page.
  4. Backs off on HTTP 429 / 5xx with exponential retry.
  5. Flattens the nested Forem response: maps user.name → author, user.username → authorUsername and tag_list → tags, and normalizes timestamps to ISO 8601.
  6. Streams each article as one flat row directly into the Apify Dataset.
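
The pagination and retry steps above can be sketched as a small generator. This is an illustrative outline under stated assumptions, not the Actor's implementation: `paginate` and its parameters are hypothetical, and `fetch_page` stands in for whatever HTTP call returns a `(status_code, articles)` pair for a given page number.

```python
import time
from typing import Callable, Iterator, List, Tuple


def paginate(
    fetch_page: Callable[[int], Tuple[int, List[dict]]],
    max_articles: int = 0,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield articles page by page until the cap is hit or a page is empty."""
    page = 1
    yielded = 0
    while True:
        for attempt in range(max_retries + 1):
            status, articles = fetch_page(page)
            if status == 429 or status >= 500:
                if attempt == max_retries:
                    raise RuntimeError(f"page {page} still failing with HTTP {status}")
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
                continue
            break
        if status != 200:
            raise RuntimeError(f"unexpected HTTP {status} on page {page}")
        if not articles:
            return  # empty page: the feed is exhausted
        for article in articles:
            yield article
            yielded += 1
            # max_articles == 0 means "no cap", matching the input semantics.
            if max_articles and yielded >= max_articles:
                return
        page += 1
```

Passing the fetcher and `sleep` in as arguments keeps the loop testable without network access; a real `fetch_page` would wrap a GET request to the articles endpoint.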

The Actor uses ONLY the official, publicly-documented Dev.to / Forem v1 API (dev.to/api/articles). No HTML scraping, no headless browser, no proxy, no auth.


⚡ Performance

| Workload | Approx. time | API calls |
| --- | --- | --- |
| Latest 100 articles, no filter | ~3 seconds | 1 |
| 500 articles for a tag | ~10 seconds | 5 |
| 2,000 articles for a tag | ~40 seconds | 20 |
| All articles for an author (200 typical) | ~5 seconds | 2 |
| Full tag backfill (10,000+ articles) | ~5 minutes | ~100 |

The Forem API returns up to 100 articles per page. The Actor stays comfortably within published rate limits via built-in pacing.
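
The API-call column follows directly from the 100-per-page limit: the number of requests is the article count divided by 100, rounded up. A small sketch of that arithmetic (the helper name `pages_needed` is hypothetical):

```python
import math


def pages_needed(n_articles: int, per_page: int = 100) -> int:
    """Number of page requests needed to collect n_articles rows."""
    if n_articles <= 0:
        return 0
    return math.ceil(n_articles / per_page)
```

For instance, `pages_needed(500)` is 5 and `pages_needed(2000)` is 20, matching the table. An uncapped run that walks until an empty page would typically make one extra request to discover the end of the feed.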


💰 Cost model

Pay-Per-Result. You only pay for article rows actually saved. Pages that return zero matches are not billed.

Typical costs (rough order):

  • Daily newsletter ingestion (~100 latest) → tiny
  • Weekly tag-feed sweep (~500 per tag) → small
  • Author network mapping (1,000 authors × top 10 each) → moderate
  • Full historical tag backfill (10,000+) → moderate but bounded
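
To put rough numbers on these tiers, the sketch below multiplies row counts by the listed starting price of $3.50 per 1,000 results. The helper `estimate_cost_usd` is hypothetical, and because the price is quoted as "from", the actual rate on a given plan may differ.

```python
def estimate_cost_usd(rows: int, usd_per_1000: float = 3.50) -> float:
    """Estimate pay-per-result cost for a number of saved rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    # Assumption: billing is linear in saved rows at the listed starting rate.
    return round(rows * usd_per_1000 / 1000, 2)
```

Under that assumption, about 100 rows costs $0.35, a 500-row tag sweep $1.75, and a 10,000-row backfill (or 1,000 authors × 10 articles each) $35.00.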

🔄 Schedule for continuous monitoring

Common patterns:

  • Hourly for new-article alerts in fast-moving tags (ai, llm)
  • Daily at 7:00 UTC for newsletter and Slack-channel digests
  • Weekly for "what's trending in
  • Monthly for ecosystem health dashboards and content audits

Push new rows into Slack, Discord, Notion, Airtable, Sheets, your CRM, Postgres, BigQuery, your newsletter sender or any HTTP endpoint via Apify Webhooks.


🛠️ FAQ

Do I need a Dev.to API key or login? No. The Actor uses Dev.to's fully public v1 API, which doesn't require authentication for read-only article queries.

Is it legal to scrape Dev.to? The Actor reads publicly available article metadata via Dev.to's official public API — an API intended for programmatic access. You are responsible for complying with Dev.to's terms of service and for how you use the data (especially attribution if you republish).

How many articles can I get per run? As many as the API serves for your filter. Set maxArticles=0 to pull everything for your tag/username; set a number for a faster, capped run.

Can I filter by tag AND author at the same time? Yes — supply both tag and username and the Actor returns articles that match both.

Does it return full article body / Markdown / HTML? No. This Actor returns article metadata only (title, description, URL, tags, engagement and so on). The Dev.to API exposes the full body on the per-article endpoint; request a companion Actor build if you need it.

Does it include engagement metrics? Yes. Every record carries reactionsCount and commentsCount, plus readingTimeMinutes as a length signal. Rank, filter or threshold by them downstream.

Are private / unpublished articles included? No. The public API only exposes published articles. Drafts, scheduled posts and private content are not accessible.

Can I sort by reactions or comments? The Forem /api/articles endpoint returns articles in publish-date order by default. Sort by reactionsCount or commentsCount downstream in your spreadsheet, SQL or pandas.
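
A downstream sort can be as short as the following sketch, which assumes the dataset has been loaded as a list of row dictionaries (for example from the JSON export). The helper name `top_by` is hypothetical.

```python
def top_by(rows, key="reactionsCount", n=10):
    """Return the n rows with the highest value for the given numeric field."""
    # Missing or null counts are treated as 0 so they sort last.
    return sorted(rows, key=lambda r: r.get(key) or 0, reverse=True)[:n]
```

Calling `top_by(rows, "commentsCount", 5)` gives the five most-discussed articles in the same way.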

Is the data fresh? Yes — the API serves data in near real-time. New articles typically appear within a minute or two of publishing.

How is this different from RSS-feed-based aggregators? Dev.to's RSS is shallow (latest N items only) and lacks engagement counts. This Actor uses the structured API for full pagination and rich metadata.

Can I use this for LLM training? Yes. Dev.to article metadata (and the body via the per-article endpoint) is well-tagged, clearly attributed and topically diverse, which makes it a common pick for tech-writing training sets. Respect attribution and the original license.

What output formats are supported? JSON, CSV, Excel, HTML, XML, JSONL via the Apify Dataset, plus REST API and webhooks for live integrations.


Adjacent data sources in the social/dev/content suite:

| Scraper | Purpose |
| --- | --- |
| devto-articles-scraper | You are here. Dev.to articles by tag/author/feed via the public API. |
| hacker-news-search-scraper | HN stories/comments/Show HN/Ask HN/front page by keyword. |
| hacker-news-who-is-hiring-scraper | Monthly HN "Who is hiring?" thread parsed by company/role/stack. |
| reddit-subreddit-scraper | Posts from any subreddit by sort and time window. |
| reddit-historical-archive-scraper | Years of subreddit history at scale. |
| stack-exchange-questions-scraper | Q&A across 170+ Stack Exchange sites by tag/site/sort. |
| github-repository-scraper | Public GitHub repo metadata by search query. |
| product-hunt-daily-launches-scraper | Today's Product Hunt launches with votes and makers. |
| linkedin-top-content-scraper | Top-performing LinkedIn posts by keyword/author. |
| linkedin-ad-library-scraper | LinkedIn Ad Library: competitor ad creative & spend signals. |
| letterboxd-film-review-scraper | Film reviews from Letterboxd for culture/sentiment work. |
| instagram-media-downloader | Reels/Posts/Stories HD download URLs in bulk. |

🔑 Keyword cloud

Core: devto scraper, dev.to scraper, devto api scraper, dev.to articles scraper, devto blog scraper, devto tag scraper, devto author scraper, devto json export, devto csv export, forem scraper, forem api scraper, tech blog scraper, developer articles scraper, developer blog dataset.

Niche: devto javascript scraper, devto python scraper, devto ai scraper, devto webdev scraper, devto rust scraper, devto react scraper, devto devops scraper, devto career scraper, devto beginners scraper, devto tutorial scraper, devto reactions scraper, devto comments count scraper, devto reading time scraper, devto cover image scraper.

Use case: tech content discovery, developer content aggregator, tech newsletter automation, author network analysis, tag trend tracking, devrel monitoring, devtool marketing intelligence, technical writing benchmarking, content audit dataset, content curation pipeline, competitive content tracking, influencer monitoring, podcast guest research, technical recruiting research, llm training data for developer writing, nlp corpus building, sentiment analysis on tech content.

Audience: technical content marketers, devrel teams, newsletter writers, dev tool product managers, founders of dev-tool startups, technical journalists, content aggregator owners, ml/llm engineers, ai researchers, technical recruiters, dev community managers, growth marketers targeting developers, podcast hosts in tech, developer educators, technical writers and authors.


📝 Changelog

  • 2026-06-04: Verified live and refreshed the build (reliability/maintenance pass).
  • 2026-06-01: Maintenance and reliability pass. Pulled the latest source and rebuilt the Actor on the current base image; build verified.
  • 2026-05-25: Maintenance and reliability pass. Pulled the latest source and rebuilt the Actor on the current base image; build verified.
  • 2026-05-20: Maintenance pass. Reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.

Last reviewed: 2026-06-01.