Stack Exchange Q&A Scraper
Pricing
from $3.00 / 1,000 results
Scrape questions, answers, and site listings from Stack Overflow and 170+ Stack Exchange communities via the official Stack Exchange API v2.3. No login, no cookies, no proxy needed.
Developer: Crawler Bros (maintained by Community)
Scrape questions, answers, and site listings from Stack Overflow and 170+ Stack Exchange communities (Server Fault, Super User, Math, Cross Validated, Ask Ubuntu, Code Review, Software Engineering, AI, Data Science, Security, DBA, GIS, and more) via the official Stack Exchange API v2.3. No login, no cookies, no proxy required.
What this actor does
- Four fetch modes — search questions by keyword, browse by tag, fetch answers for specific question IDs, or list all Stack Exchange network sites
- Returns full question metadata — title, body excerpt, tags, score, view count, answer count, accepted answer ID, author, and direct URLs
- Answers mode — fetch all answers for a list of question IDs with body, score, author, and accepted flag
- Sites mode — list all 170+ Stack Exchange network sites with name, audience, type, icon, and launch date
- Honors the quota_remaining and backoff fields that the Stack Exchange API returns with each response, automatically
- Optional API key for 10,000 requests/day (vs. 300/day anonymous)
- No proxy, no cookies — Stack Exchange is publicly accessible
Output
Questions (mode=searchQuestions or getQuestionsByTag)
| Field | Type | Description |
|---|---|---|
| `questionId` | integer | Stack Exchange question ID |
| `title` | string | Question title |
| `body` | string | Plain-text excerpt of the question body (up to 500 chars) |
| `tags` | array | Question tags (e.g. `["python", "asyncio"]`) |
| `score` | integer | Up-votes minus down-votes |
| `viewCount` | integer | Total page views |
| `answerCount` | integer | Number of posted answers |
| `acceptedAnswerId` | integer | ID of the accepted answer (if any) |
| `isAnswered` | boolean | Whether the question has an accepted answer |
| `author` | string | Question author's display name |
| `authorReputation` | integer | Author's Stack Exchange reputation |
| `createdAt` | string | ISO-8601 creation timestamp (UTC) |
| `lastActivityAt` | string | ISO-8601 last activity timestamp (UTC) |
| `questionUrl` | string | Direct URL to the question |
| `site` | string | Stack Exchange site slug (e.g. `stackoverflow`) |
| `scrapedAt` | string | ISO-8601 scrape timestamp (UTC) |
Answers (mode=getAnswers)
| Field | Type | Description |
|---|---|---|
| `answerId` | integer | Stack Exchange answer ID |
| `questionId` | integer | Parent question ID |
| `body` | string | Plain-text excerpt of the answer body (up to 500 chars) |
| `score` | integer | Answer score |
| `isAccepted` | boolean | Whether this answer was accepted |
| `author` | string | Answer author's display name |
| `authorReputation` | integer | Author's reputation |
| `createdAt` | string | ISO-8601 creation timestamp (UTC) |
| `scrapedAt` | string | ISO-8601 scrape timestamp (UTC) |
Sites (mode=listSites)
| Field | Type | Description |
|---|---|---|
| `siteUrl` | string | Base URL of the community (e.g. https://stackoverflow.com) |
| `siteName` | string | Community name (e.g. Stack Overflow) |
| `audience` | string | Target audience description |
| `siteType` | string | Site type (`main_site`, `meta_site`, etc.) |
| `iconUrl` | string | Community icon URL |
| `launchDate` | string | ISO-8601 launch date (UTC) |
| `scrapedAt` | string | ISO-8601 scrape timestamp (UTC) |
Empty fields are always omitted — no nulls in output.
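The "no nulls" rule can be pictured with a small filter like the one below. This is a minimal sketch, not the actor's actual code: the helper name `drop_empty` and the exact definition of "empty" (None, empty string, empty list) are assumptions made for illustration. Note that `0` and `False` are kept, since a score of zero or `isAccepted: false` is real data.

```python
from typing import Any


def drop_empty(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` without None, "" or [] values.

    Hypothetical helper: what the actor treats as "empty" is an assumption.
    Falsy-but-meaningful values such as 0 and False are preserved.
    """
    return {
        key: value
        for key, value in record.items()
        if value is not None and value != "" and value != []
    }


if __name__ == "__main__":
    raw = {"questionId": 1, "acceptedAnswerId": None, "score": 0, "tags": []}
    print(drop_empty(raw))
```

For consumers, the practical consequence is that optional fields should be read with `record.get("acceptedAnswerId")` rather than by direct indexing.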
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | `searchQuestions` | `searchQuestions` / `getQuestionsByTag` / `getAnswers` / `listSites` |
| `query` | string | – | Search keyword (required for `mode=searchQuestions`) |
| `tags` | array | `[]` | Tags to filter (required for `mode=getQuestionsByTag`) |
| `questionIds` | array | `[]` | Question IDs to fetch answers for (required for `mode=getAnswers`) |
| `site` | string | `stackoverflow` | Stack Exchange site slug |
| `sortBy` | string | `votes` | `votes` / `activity` / `creation` / `relevance` |
| `apiKey` | string | – | Optional API key (10,000 requests/day vs 300/day anonymous) |
| `maxItems` | integer | `25` | Maximum records to return (1–200) |
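The defaults and per-mode requirements in the table above can be checked before starting a run. The sketch below is one way to do that on the caller's side; `normalize_input` is a hypothetical helper, and raising `ValueError` on bad input is an assumption (the actor itself may clamp or report errors differently).

```python
VALID_MODES = {"searchQuestions", "getQuestionsByTag", "getAnswers", "listSites"}

# Field that must be non-empty for each mode (listSites needs none).
REQUIRED_BY_MODE = {
    "searchQuestions": "query",
    "getQuestionsByTag": "tags",
    "getAnswers": "questionIds",
}

DEFAULTS = {
    "mode": "searchQuestions",
    "tags": [],
    "questionIds": [],
    "site": "stackoverflow",
    "sortBy": "votes",
    "maxItems": 25,
}


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and validate the documented constraints."""
    cfg = {**DEFAULTS, **raw}
    if cfg["mode"] not in VALID_MODES:
        raise ValueError(f"unknown mode: {cfg['mode']!r}")
    required = REQUIRED_BY_MODE.get(cfg["mode"])
    if required and not cfg.get(required):
        raise ValueError(f"{required!r} is required for mode={cfg['mode']}")
    if not 1 <= cfg["maxItems"] <= 200:
        raise ValueError("maxItems must be between 1 and 200")
    return cfg
```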
Example inputs
Search for Python questions on Stack Overflow
{"mode": "searchQuestions","query": "python","site": "stackoverflow","sortBy": "votes","maxItems": 25}
Browse JavaScript questions by tag
{"mode": "getQuestionsByTag","tags": ["javascript"],"site": "stackoverflow","sortBy": "votes","maxItems": 50}
Fetch answers for specific questions
{"mode": "getAnswers","questionIds": ["11227809", "231767"],"site": "stackoverflow","maxItems": 50}
List all Stack Exchange network sites
{"mode": "listSites","maxItems": 200}
Use cases
- Developer relations — monitor your library/SDK tag for new unanswered questions
- Technical content marketing — find gaps in documentation by analyzing high-view low-score questions
- Q&A datasets for ML/RAG — export curated answers for fine-tuning or retrieval-augmented generation
- Recruiting — identify domain experts by reputation and tag activity
- Community analytics — analyze trends in what developers struggle with
- Competitive intelligence — track questions about competitor products
- Documentation prioritization — high-view questions reveal documentation gaps
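As an illustration of the documentation-gap use cases, the sketch below filters exported question records for high-view, low-score items. It relies only on the `viewCount`, `score`, and `title` output fields; the function name, the thresholds, and the sample records are made up for the example.

```python
def documentation_gaps(questions, min_views=10_000, max_score=2):
    """Return questions with many views but a low score, most-viewed first.

    Thresholds are arbitrary illustrative defaults, not recommendations.
    Missing fields are treated as 0 because the actor omits empty fields.
    """
    hits = [
        q for q in questions
        if q.get("viewCount", 0) >= min_views and q.get("score", 0) <= max_score
    ]
    return sorted(hits, key=lambda q: q.get("viewCount", 0), reverse=True)


if __name__ == "__main__":
    # Invented sample records shaped like the Questions output.
    sample = [
        {"title": "A", "viewCount": 50_000, "score": 1},
        {"title": "B", "viewCount": 80_000, "score": 120},
        {"title": "C", "viewCount": 12_000, "score": 0},
    ]
    for q in documentation_gaps(sample):
        print(q["title"], q["viewCount"], q["score"])
```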
FAQ
Does it require a login or cookies?
No. The Stack Exchange API is fully public and returns real data without authentication.
Is a proxy needed?
No. Stack Exchange accepts requests from any IP address.
What is the API quota?
300 requests/day without an API key; 10,000/day with a free key. Register at https://stackapps.com/apps/oauth/register and pass it via apiKey.
Which sites are supported?
All 170+ Stack Exchange communities. Pass any valid site slug to the site field (e.g. stackoverflow, serverfault, math, cooking, worldbuilding). Use mode=listSites to enumerate all available sites.
Why are some fields missing?
Fields are omitted when the API returns no data for them (e.g. acceptedAnswerId only appears when a question has an accepted answer, authorReputation only appears when the author profile is available).
What does body contain?
The body field is a plain-text excerpt (up to 500 characters) extracted from the question or answer's HTML body. HTML tags are stripped, common entities are decoded, and the text is trimmed.
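The three steps described here (strip tags, decode entities, trim to 500 characters) could look roughly like the following. This is a sketch under assumptions, not the actor's implementation: the function name is hypothetical, the regex-based tag stripping is a simplification, and `html.unescape` decodes all named entities rather than only "common" ones.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def plain_text_excerpt(body_html: str, limit: int = 500) -> str:
    """Turn an HTML body into a plain-text excerpt of at most `limit` chars."""
    # Strip tags first so that escaped markup such as &lt;b&gt; survives as text.
    # Replacing each tag with a space may split words joined by inline tags.
    text = _TAG_RE.sub(" ", body_html)
    text = html.unescape(text)
    # Collapse whitespace runs left behind by removed tags and line breaks.
    text = _WS_RE.sub(" ", text).strip()
    return text[:limit].rstrip()
```

Because the excerpt is cut at a fixed length, it can end mid-word; fetch the question via `questionUrl` when the full body is needed.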
How does sortBy=relevance work?
relevance is valid only for mode=searchQuestions, where it ranks results by how closely they match the search query. In every other mode the actor uses votes instead.
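The fallback rule fits in a few lines. The sketch below mirrors the behavior described in this answer; `effective_sort` is a hypothetical name, not part of the actor's interface.

```python
def effective_sort(mode: str, sort_by: str) -> str:
    """Return the sort order that applies for the given mode.

    "relevance" only makes sense for keyword search; elsewhere fall back
    to "votes", as described in the FAQ answer.
    """
    if sort_by == "relevance" and mode != "searchQuestions":
        return "votes"
    return sort_by
```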
What happens if I exceed the quota?
The actor watches the quota_remaining and backoff fields that the API includes in each response and pauses for the requested number of seconds when asked. Once the daily quota is exhausted, the run returns fewer results than requested. Pass an apiKey to raise the cap to 10,000 requests/day.
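For readers calling the API directly: the Stack Exchange API reports `quota_remaining` and, when throttling, `backoff` (in seconds) in the JSON wrapper of each response. The decision logic can be sketched as below. The function name and the three-way return value are illustrative assumptions; how the actor actually structures this is not documented here.

```python
def next_action(wrapper: dict) -> tuple[str, int]:
    """Decide what to do before the next request.

    `wrapper` is the decoded JSON response object. Returns one of
    ("stop", 0), ("sleep", seconds) or ("continue", 0).
    """
    # A missing quota_remaining is treated as "quota still available".
    if wrapper.get("quota_remaining", 1) <= 0:
        return ("stop", 0)
    backoff = int(wrapper.get("backoff", 0))
    if backoff > 0:
        return ("sleep", backoff)
    return ("continue", 0)


if __name__ == "__main__":
    print(next_action({"items": [], "quota_remaining": 120, "backoff": 10}))
```

A caller would `time.sleep(seconds)` on `"sleep"` and end pagination on `"stop"`, keeping whatever items were already collected.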
Can I scrape multiple tags at once?
Yes — pass multiple values in the tags array. Stack Exchange treats them as an AND filter (questions must have all specified tags).
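In the Stack Exchange API, multiple tags are passed as one semicolon-delimited `tagged` parameter. The sketch below shows how a `tags` array might be turned into query parameters for the `/2.3/questions` endpoint; the helper name is hypothetical, and whether the actor uses exactly this endpoint and parameter set is an assumption.

```python
def tag_query_params(tags, site="stackoverflow", sort="votes", page_size=25):
    """Build query parameters for a tag-filtered question listing.

    Tags are joined with ";" so the API requires all of them (AND filter).
    """
    cleaned = [t.strip() for t in tags if t and t.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    return {
        "tagged": ";".join(cleaned),
        "site": site,
        "sort": sort,
        "order": "desc",
        "pagesize": page_size,
    }


if __name__ == "__main__":
    print(tag_query_params(["python", "asyncio"]))
```

Since every added tag narrows the result set, a run with several tags can return far fewer than `maxItems` records.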
Is the data real-time?
Yes. Stack Exchange's API surfaces new questions within seconds of posting.