Pricing

from $2.50 / 1,000 questions

Stack Overflow & Stack Exchange Scraper

[💰 $2.5 / 1K] Extract questions from Stack Overflow and the 170+ site Stack Exchange network. Search by keyword or tag, sort by votes/activity, or pull specific questions by URL. Optionally collect answers and comments as linked rows.

Developer

SolidCode

Last modified

22 days ago

Why This Scraper?

170+ Stack Exchange communities from one field — Stack Overflow, Server Fault, Super User, Ask Ubuntu, Mathematics, Cross Validated, Unix & Linux, Data Science, and 20+ more curated sites, all selectable without touching a URL.
Full question + answer + comment bodies in Markdown — not just titles and metadata. Pull the actual content behind every thread, ready for content analysis and LLM pipelines.
Answers with vote score, accepted-answer flag, and author reputation — every answer row carries score, isAccepted, and the answerer's authorReputation, so you can rank canonical solutions instantly.
Comments on both questions and answers — each comment row is tagged postType (question or answer), carries the postId of the exact question or answer it hangs off, and is linked back to its parent question by ID. Answer comments arrive when you enable both answers and comments.
Linked three-record output — question, answer, and comment rows share questionId so you can reassemble whole threads or load each record type into its own table.
Tag AND-filtering — pass python + pandas and get only questions carrying every tag, filtered on Stack Exchange's side so you never pay for off-target rows.
6 sort modes — Recent activity, Newest, Most votes, Hot, Top this week, and Top this month, plus a date range for precise "new since yesterday" windows.
Search a keyword or fetch exact questions by URL/ID — run a full-text search across titles and bodies, or paste specific question links like stackoverflow.com/questions/11227809/... to pull those threads directly.

Use Cases

Developer Tooling & IDE Plugins

Feed an in-editor "top accepted answers" panel for a language or framework tag
Surface the highest-voted solution for an error message inside a support bot
Keep a curated snippet library fresh from canonical Q&A threads

LLM & AI Training Data

Build instruction-tuning datasets of real questions paired with accepted answers
Extract Markdown code blocks and explanations for code-model pretraining
Assemble evaluation sets of high-score answers with their vote signals

Technical Research & Trend Analysis

Track which frameworks and libraries are gaining question volume over a date range
Analyze answer quality by score distribution across a tag
Compare activity between Stack Overflow and niche communities like Data Science or DevOps

Community & Reputation Monitoring

Watch a tag for newly asked, still-unanswered questions to jump on
Track top contributors by author reputation across a community
Alert on trending "Hot" threads in your product's ecosystem

Content & Documentation

Mine frequently asked questions to prioritize docs and knowledge-base articles
Pull real user phrasing for FAQ and help-center content
Source vetted code examples with attribution back to the original thread

Getting Started

Simple Keyword Search

One topic, using the defaults (sorted by recent activity, up to 100 questions):

{
    "site": "stackoverflow",
    "searchQuery": "pandas groupby performance"
}

Tag Filter + Sort + Date Range

The highest-voted questions tagged both kubernetes and networking, asked in 2024:

{
    "site": "stackoverflow",
    "tags": ["kubernetes", "networking"],
    "sort": "votes",
    "fromDate": "2024-01-01",
    "toDate": "2024-12-31",
    "maxResults": 200
}

Specific Questions with Answers + Comments

Pull two exact threads, one by URL and one by numeric ID, with question bodies, up to five answers each, and comments:

{
    "questionUrlsOrIds": [
        "https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster",
        "231767"
    ],
    "includeQuestionBody": true,
    "includeAnswers": true,
    "maxAnswersPerQuestion": 5,
    "includeComments": true
}

Browse a Whole Community

The most recently active questions on Unix & Linux, no keyword needed:

{
    "site": "unix",
    "sort": "activity",
    "maxResults": 500
}

Input Reference

What to Scrape

Parameter	Type	Default	Description
`site`	string	`"stackoverflow"`	Which Stack Exchange community to pull from — Stack Overflow, Server Fault, Super User, Ask Ubuntu, Mathematics, Data Science, Cross Validated, and more.
`searchQuery`	string	`""`	Full-text search across question titles and bodies (e.g. `"kubernetes ingress timeout"`). Leave blank to browse by tag and sort order instead.
`tags`	array	`[]`	Only include questions carrying ALL of these tags (e.g. `python`, `pandas`). Use exact tag names as they appear on the site. Leave empty to include every tag.
`questionUrlsOrIds`	array	`[]`	Fetch specific questions directly by URL or numeric ID. When set, the keyword/tag/sort finders are ignored for those questions.
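
Because `questionUrlsOrIds` accepts either form, it can help to normalize your own list to numeric IDs so you can match it against `questionId` in the output. The helper below is a minimal sketch: `question_id` is a hypothetical name, and the URL pattern is an assumption based on the example links in this README, not a description of how the actor parses its input.

```python
import re

# Assumed link shape: .../questions/<id>/<slug> (or the short /q/<id> form).
_QUESTION_URL = re.compile(r"/(?:questions|q)/([0-9]+)")


def question_id(url_or_id: str) -> int:
    """Return the numeric question ID from a question URL or a bare ID string."""
    value = url_or_id.strip()
    if re.fullmatch(r"[0-9]+", value):
        return int(value)
    match = _QUESTION_URL.search(value)
    if match is None:
        raise ValueError(f"no question ID found in {value!r}")
    return int(match.group(1))
```

Applied to the two entries from the Getting Started example, this yields `11227809` and `231767`.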

Filters

Parameter	Type	Default	Description
`sort`	string	`"activity"`	Order questions are collected in: Recent activity, Newest, Most votes, Hot, Top this week, or Top this month. Ignored when fetching specific question URLs/IDs.
`fromDate`	string	`""`	Only include questions created on or after this date (`YYYY-MM-DD`). Perfect for scheduled "new since yesterday" runs.
`toDate`	string	`""`	Only include questions created on or before this date (`YYYY-MM-DD`).

Limits & Content

Parameter	Type	Default	Description
`maxResults`	integer	`100`	Maximum number of questions to collect. Set to `0` for as many as the site returns. The full last page is kept even if it slightly overshoots. Ignored when fetching specific question URLs/IDs.
`includeQuestionBody`	boolean	`false`	Include each question's full body text (Markdown), not just its title.
`includeAnswers`	boolean	`false`	Also collect each question's answers — with body, score, accepted flag, and author — as separate linked rows.
`maxAnswersPerQuestion`	integer	`0`	Cap how many answers to collect per question when answers are enabled. `0` = all.
`includeComments`	boolean	`false`	Also collect comments as separate linked rows. Question comments are always included; answer comments are included when both answers and comments are enabled.

Output

Every row carries a recordType field — question, answer, or comment — and shares a questionId so you can rejoin whole threads or load each type into its own table.

Question (`recordType: "question"`)

{
    "recordType": "question",
    "questionId": 11227809,
    "title": "Why is processing a sorted array faster than processing an unsorted array?",
    "link": "https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array",
    "site": "stackoverflow",
    "tags": ["java", "c++", "performance", "cpu-architecture", "branch-prediction"],
    "author": "GManNickG",
    "authorId": 87234,
    "authorReputation": 511234,
    "score": 27543,
    "viewCount": 1850342,
    "answerCount": 25,
    "commentCount": 12,
    "isAnswered": true,
    "hasAcceptedAnswer": true,
    "acceptedAnswerId": 11227902,
    "body": "Here is a piece of C++ code that shows some very peculiar behavior...",
    "createdAt": "2012-06-27T13:51:36+00:00",
    "lastActivityAt": "2024-05-10T09:12:04+00:00",
    "scrapedAt": "2026-07-02T10:15:00+00:00"
}

Field	Type	Description
`recordType`	string	Always `"question"`
`questionId`	integer	Stack Exchange question ID — the link key for answers and comments
`title`	string	Question title
`link`	string	Canonical question URL
`site`	string	Source community (e.g. `stackoverflow`)
`tags`	array	Tags on the question
`author`	string	Asker display name
`authorId`	integer	Asker account ID (`null` for deleted accounts)
`authorReputation`	integer	Asker reputation
`score`	integer	Net votes
`viewCount`	integer	Total views
`answerCount`	integer	Number of answers
`commentCount`	integer	Number of comments on the question
`isAnswered`	boolean	Whether the question is marked answered
`hasAcceptedAnswer`	boolean	Whether an accepted answer exists
`acceptedAnswerId`	integer	ID of the accepted answer, if any
`body`	string	Question body in Markdown — only when `includeQuestionBody` is on
`createdAt`	string	Creation timestamp (ISO 8601)
`lastActivityAt`	string	Last-activity timestamp (ISO 8601)
`scrapedAt`	string	Collection timestamp (ISO 8601)

Answer (`recordType: "answer"`)

Emitted only when includeAnswers is on.

{
    "recordType": "answer",
    "answerId": 11227902,
    "questionId": 11227809,
    "site": "stackoverflow",
    "body": "**Branch prediction.**\n\nWith a sorted array, the condition is predictable...",
    "score": 36012,
    "isAccepted": true,
    "author": "Mysticial",
    "authorId": 922184,
    "authorReputation": 481203,
    "commentCount": 8,
    "createdAt": "2012-06-27T13:56:42+00:00",
    "lastActivityAt": "2023-08-14T18:33:20+00:00"
}

Field	Type	Description
`recordType`	string	Always `"answer"`
`answerId`	integer	Answer ID
`questionId`	integer	Parent question ID (link key)
`site`	string	Source community
`body`	string	Answer body in Markdown
`score`	integer	Net votes
`isAccepted`	boolean	Whether this is the accepted answer
`author`	string	Answerer display name
`authorId`	integer	Answerer account ID (`null` for deleted accounts)
`authorReputation`	integer	Answerer reputation
`commentCount`	integer	Number of comments on this answer
`createdAt`	string	Creation timestamp (ISO 8601)
`lastActivityAt`	string	Last-activity timestamp (ISO 8601)
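
For question/answer datasets, the `isAccepted` flag and the shared `questionId` are enough to pair each question with its accepted answer. The following is a rough sketch over rows already loaded into memory as dictionaries; `accepted_pairs` is a hypothetical helper name, and it assumes only the fields documented above.

```python
def accepted_pairs(rows):
    """Yield (question_row, accepted_answer_row) pairs from a mixed list of rows."""
    # Index accepted answers by the question they belong to.
    accepted = {
        row["questionId"]: row
        for row in rows
        if row["recordType"] == "answer" and row["isAccepted"]
    }
    for row in rows:
        if row["recordType"] == "question" and row["questionId"] in accepted:
            yield row, accepted[row["questionId"]]
```

Questions whose accepted answer was not collected (for example, because answers were disabled or capped) are simply skipped.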

Comment (`recordType: "comment"`)

Emitted only when includeComments is on.

{
    "recordType": "comment",
    "commentId": 14738201,
    "postId": 11227902,
    "postType": "answer",
    "questionId": 11227809,
    "site": "stackoverflow",
    "body": "This is the clearest explanation of branch prediction I have ever read.",
    "score": 214,
    "author": "user1234",
    "authorId": 445566,
    "createdAt": "2012-06-28T08:04:11+00:00"
}

Field	Type	Description
`recordType`	string	Always `"comment"`
`commentId`	integer	Comment ID
`postId`	integer	ID of the question or answer the comment belongs to
`postType`	string	`"question"` or `"answer"` — what the comment is attached to
`questionId`	integer	Parent question ID (link key)
`site`	string	Source community
`body`	string	Comment body in Markdown
`score`	integer	Net votes
`author`	string	Commenter display name
`authorId`	integer	Commenter account ID (`null` for deleted accounts)
`createdAt`	string	Creation timestamp (ISO 8601)
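
The link keys described in this section can be used to rebuild whole threads. The sketch below groups a mixed list of rows by `questionId` and files each comment under the `postId` it hangs off; `build_threads` and the shape of the returned dictionary are illustrative choices, not part of the actor.

```python
from collections import defaultdict


def build_threads(rows):
    """Group question, answer, and comment rows into {questionId: thread}."""
    threads = defaultdict(
        lambda: {"question": None, "answers": [], "comments": defaultdict(list)}
    )
    for row in rows:
        thread = threads[row["questionId"]]
        kind = row["recordType"]
        if kind == "question":
            thread["question"] = row
        elif kind == "answer":
            thread["answers"].append(row)
        elif kind == "comment":
            # postId is the ID of the question or answer the comment is attached to.
            thread["comments"][row["postId"]].append(row)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {
        qid: {**thread, "comments": dict(thread["comments"])}
        for qid, thread in threads.items()
    }
```

Rows can arrive in any order, since each one is routed only by its own `questionId`, `recordType`, and `postId`.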

Tips for Best Results

Mine canonical answers with votes sort + tag AND-filtering. Combine two or three specific tags with sort: "votes" to surface the definitive, highest-scored solutions for a topic — ideal for training data and snippet libraries.
Set a date range for trend windows. Pair fromDate and toDate to isolate a quarter or a release window and measure how question volume for a framework shifts over time.
Reach beyond Stack Overflow with site. The same run works on Server Fault, Super User, Data Science, Unix & Linux, and the rest of the 170+ site network; switch site to pull domain-specific Q&A that Stack Overflow doesn't cover.
Cap answers on popular threads. Canonical questions can carry 30–50+ answers. Set maxAnswersPerQuestion to keep only the top few and control run size and cost. Note that answerCount always reports the question's true total on the site, independent of how many answer rows you actually collect.
Turn on bodies only when you need content. Leave includeQuestionBody, includeAnswers, and includeComments off for lightweight metadata runs; enable them when you need the actual Markdown text.
Use Newest for scheduled incremental runs. sort: "creation" with a rolling fromDate reliably catches only questions added since your last run.
Fetch exact threads by URL for deep dives. Paste question links into questionUrlsOrIds to pull specific high-value threads with all their answers and comments in one shot.
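
As an illustration of the scheduled incremental pattern, the input for each run can be generated with a rolling `fromDate`. This is a minimal sketch: `incremental_input` is a hypothetical helper, and it uses only the input fields documented in the Input Reference.

```python
from datetime import date, timedelta


def incremental_input(site, tags, since=None):
    """Build an actor input that collects questions created on or after `since`."""
    if since is None:
        # Default window: everything created since yesterday.
        since = date.today() - timedelta(days=1)
    return {
        "site": site,
        "tags": tags,
        "sort": "creation",             # the "Newest" sort mode
        "fromDate": since.isoformat(),  # YYYY-MM-DD, inclusive
        "maxResults": 0,                # 0 = as many as the site returns
    }
```

Since `fromDate` is inclusive and has day granularity, consecutive runs can overlap by up to a day; deduplicating on `questionId` downstream keeps the combined dataset clean.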

Pricing

From $2.50 per 1,000 questions at the Gold tier ($3.00 with no subscription discount), priced to undercut the market rate for Stack Exchange extraction. Answers and comments, when you enable them, are billed separately at much lower rates. You pay only for the results you collect.

This actor uses a per-result model split by record type. Prices below are per 1,000 rows of that type; Bronze, Silver, and Gold subscribers pay progressively less.

Record type	No discount	Bronze	Silver	Gold
Question	$3.00	$2.80	$2.65	$2.50
Answer	$0.60	$0.56	$0.53	$0.50
Comment	$0.24	$0.22	$0.21	$0.20

Plus a small fixed $0.005 per-run start fee.

Because answers and comments are far cheaper than questions, your real total depends on the mix you collect. Example totals at the Gold tier, each including the $0.005 start fee:

What you collect	Rows	Cost at Gold
100 questions only	100 questions	$0.255
100 questions + ~3 answers each	100 questions + 300 answers	$0.405
100 questions + 300 answers + 500 comments	900 rows	$0.505
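
The same arithmetic can be scripted to budget a run in advance. The sketch below copies the per-1,000 prices and the start fee from the tables in this section; `estimate_cost` and the tier keys are hypothetical names, and platform fees are not included.

```python
from decimal import Decimal

# USD per 1,000 rows, by subscription tier and record type.
PRICES_PER_1K = {
    "none":   {"question": Decimal("3.00"), "answer": Decimal("0.60"), "comment": Decimal("0.24")},
    "bronze": {"question": Decimal("2.80"), "answer": Decimal("0.56"), "comment": Decimal("0.22")},
    "silver": {"question": Decimal("2.65"), "answer": Decimal("0.53"), "comment": Decimal("0.21")},
    "gold":   {"question": Decimal("2.50"), "answer": Decimal("0.50"), "comment": Decimal("0.20")},
}
START_FEE = Decimal("0.005")  # fixed fee charged once per run


def estimate_cost(questions, answers=0, comments=0, tier="gold"):
    """Estimate the cost of one run in USD for the given row counts."""
    prices = PRICES_PER_1K[tier]
    per_row_total = (
        questions * prices["question"]
        + answers * prices["answer"]
        + comments * prices["comment"]
    ) / 1000
    return per_row_total + START_FEE
```

Calling `estimate_cost(100)`, `estimate_cost(100, 300)`, and `estimate_cost(100, 300, 500)` reproduces the three Gold-tier example totals. `Decimal` is used instead of floats so that fractions of a cent are not distorted by rounding.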

No compute or time-based charges — you pay only for the results you collect, plus the small fixed per-run start fee. Answers and comments are billed only when you turn them on. Platform fees (storage, data transfer) depend on your Apify plan.

Integrations

Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:

Zapier / Make / n8n — Workflow automation
Google Sheets — Direct spreadsheet export
Slack / Email — Notifications on new results
Webhooks — Trigger custom APIs on run completion
Apify API — Full programmatic access
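
For programmatic access, one possible approach is the official `apify-client` Python package. The sketch below assumes that package's `ApifyClient.actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods; `run_and_collect` is a hypothetical helper, and the actor ID and token shown in the usage comment are placeholders you would replace with your own.

```python
from typing import Any, Dict, List


def run_and_collect(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start one actor run, wait for it to finish, and return its dataset rows."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    # Each run writes its results to a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage, assuming `pip install apify-client`:
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   rows = run_and_collect(client, "<ACTOR_ID>", {"site": "unix", "sort": "activity", "maxResults": 500})
```

The returned list mixes question, answer, and comment rows, distinguished by `recordType`.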

Legal & Ethical Use

This actor is designed for legitimate research, developer tooling, dataset building, and market intelligence. Users are responsible for complying with applicable laws and Stack Exchange's terms of service, including content-attribution and licensing requirements for any questions, answers, and comments collected. Do not use extracted data for spam, harassment, or any illegal purpose.
