Stack Exchange Scraper
Pricing
from $1.00 / 1,000 results
Scrape questions, answers, users, and tags from Stack Overflow and 170+ Stack Exchange communities. HTTP-only via the public Stack Exchange API. No login, no proxy.
Developer
Crawler Bros
Scrape questions, answers, users, and tags from Stack Overflow and 170+ other Stack Exchange communities (Server Fault, Super User, Math, Cross Validated, Ask Ubuntu, Code Review, Software Engineering, AI, Data Science, Security, DBA, GIS, and more). HTTP-only via the official Stack Exchange API v2.3. No login, no cookies, no proxy.
What this actor does
- Five fetch modes: top questions, full-text search, by tag, by user, unanswered
- Returns full HTML + Markdown bodies, scores, view counts, answer counts, tags, owner profile, accepted answer ID, and direct URLs
- Optionally fetches all answers per question (sorted by votes)
- Filters by score, answer count, "answered only", date range, and tag intersection
- Honors Stack Exchange's quota and backoff fields; passes your free API key for 10k/day quota (vs. 300/day anonymous)
- Sites covered: 170+ Stack Exchange communities (programming, math, science, language, hobbies, professional, life)
Output per question
- questionId, title, link, score, viewCount, answerCount
- body (HTML) + bodyMarkdown (cleaned Markdown): when includeBody=true
- tags[], isAnswered, acceptedAnswerId
- owner: {userId, displayName, reputation, profileUrl, profileImage}
- createdAt, lastActivityAt, lastEditAt
- answers[]: when includeAnswers=true, each with answerId, body, bodyMarkdown, score, isAccepted, owner, createdAt
- recordType: "question", site, scrapedAt
Mode user emits user records: userId, displayName, profileUrl, profileImage, reputation, goldBadges, silverBadges, bronzeBadges, location, websiteUrl, aboutMe (Markdown), creationDate, lastAccessDate, recordType: "user".
Empty fields are omitted (no nulls).
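Because empty fields are omitted rather than set to null, downstream code should not index optional keys directly. A minimal sketch of tolerant access is shown below; `summarize_question` and its output keys are hypothetical names chosen for illustration, and the sketch assumes only that `questionId` is always present.

```python
from typing import Any, Dict


def summarize_question(record: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a question record, tolerating fields the actor omitted."""
    owner = record.get("owner", {})  # owner may be missing entirely
    return {
        "id": record["questionId"],  # assumed to always be present
        "title": record.get("title", ""),
        "accepted": record.get("acceptedAnswerId") is not None,
        "owner_name": owner.get("displayName", "(unknown)"),
        # answers[] only exists when includeAnswers=true
        "answer_count_fetched": len(record.get("answers", [])),
    }


# A sparse record with no owner, no accepted answer, and no answers[].
sparse = {"questionId": 1, "title": "Example", "recordType": "question"}
print(summarize_question(sparse))
```

Using `.get()` with a default keeps the same code path working whether or not `includeBody` and `includeAnswers` were enabled for the run.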
Input
| Field | Type | Default | Description |
|---|---|---|---|
| site | string | stackoverflow | Which Stack Exchange site (90+ enum) |
| mode | string | topQuestions | topQuestions / search / tag / user / unanswered |
| searchQuery | string | – | Required for mode=search |
| tagAnyOf | array | [] | Tags to filter (required for mode=tag) |
| userIds | array | [] | User IDs (required for mode=user) |
| sortBy | string | votes | activity / votes / creation / hot / week / month |
| dateRangeFrom | string | – | ISO date; drop questions older than this |
| dateRangeTo | string | – | ISO date; drop questions newer than this |
| minScore | int | – | Drop questions below this score |
| minAnswers | int | – | Drop questions with fewer answers than this |
| isAnsweredOnly | bool | false | Only emit questions with an accepted answer |
| includeAnswers | bool | false | Also fetch answers per question (~1 extra call each) |
| includeBody | bool | true | Include full HTML + Markdown body |
| maxItems | int | 50 | Hard cap on emitted records (1–5000) |
| apiKey | string | – | Optional free API key (10k/day vs 300/day anonymous) |
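The mode-specific requirements in the table can be checked before a run is started. The following is a minimal sketch of such a pre-flight check, not the actor's own validation; `validate_input` is a hypothetical helper, and the rules it encodes are only those stated in the table above.

```python
from typing import Any, Dict, List

VALID_MODES = {"topQuestions", "search", "tag", "user", "unanswered"}


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    mode = run_input.get("mode", "topQuestions")  # table default
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    if mode == "search" and not run_input.get("searchQuery"):
        problems.append("searchQuery is required for mode=search")
    if mode == "tag" and not run_input.get("tagAnyOf"):
        problems.append("tagAnyOf is required for mode=tag")
    if mode == "user" and not run_input.get("userIds"):
        problems.append("userIds is required for mode=user")
    max_items = run_input.get("maxItems", 50)  # table default
    if not 1 <= max_items <= 5000:
        problems.append("maxItems must be between 1 and 5000")
    return problems


print(validate_input({"site": "stackoverflow", "mode": "search"}))
```

Returning a list of messages instead of raising on the first problem makes it easy to report every issue with an input at once.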
Example: top Python questions on Stack Overflow
{"site": "stackoverflow","mode": "topQuestions","tagAnyOf": ["python"],"sortBy": "votes","maxItems": 50}
Example: search for "async/await"
{"site": "stackoverflow","mode": "search","searchQuery": "async await","tagAnyOf": ["python"],"minScore": 10,"includeAnswers": true,"maxItems": 25}
Example: unanswered questions for DevRel monitoring
{"site": "stackoverflow","mode": "unanswered","tagAnyOf": ["langchain"],"maxItems": 100}
Example: a user's profile
{"site": "stackoverflow","mode": "user","userIds": ["9285"],"maxItems": 1}
Example: latest hot questions on Server Fault
{"site": "serverfault","mode": "topQuestions","sortBy": "hot","dateRangeFrom": "2025-01-01","maxItems": 100}
Use cases
- Developer relations: find unanswered questions about your library/tool/SDK to engage with the community
- Technical content marketing: gap analysis on what topics get asked but aren't well-covered yet
- Recruiting: find domain experts by tag using the owner reputation on question and answer records, then fetch their full profiles with mode=user
- Q&A datasets for ML/RAG: bulk-export curated answers for fine-tuning or retrieval indexes
- Community management: monitor your tag for new questions
- Programming research: analyze patterns in what developers struggle with
- Competitive intelligence: track questions about competitor products
- Documentation prioritization: high-view low-score questions reveal docs gaps
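The documentation-prioritization idea (many views, low score) can be sketched as a filter over exported records. `docs_gap_candidates` is a hypothetical helper, and the thresholds are illustrative assumptions rather than recommended values.

```python
from typing import Any, Dict, Iterable, List


def docs_gap_candidates(
    records: Iterable[Dict[str, Any]],
    min_views: int = 10_000,  # illustrative threshold, tune per tag
    max_score: int = 5,       # illustrative threshold, tune per tag
) -> List[Dict[str, Any]]:
    """Return question records with many views but a low score, most-viewed first."""
    hits = [
        r for r in records
        if r.get("recordType") == "question"
        and r.get("viewCount", 0) >= min_views
        and r.get("score", 0) <= max_score
    ]
    return sorted(hits, key=lambda r: r.get("viewCount", 0), reverse=True)


sample = [
    {"recordType": "question", "questionId": 1, "viewCount": 50_000, "score": 2},
    {"recordType": "question", "questionId": 2, "viewCount": 90_000, "score": 400},
    {"recordType": "question", "questionId": 3, "viewCount": 120_000, "score": -1},
]
for q in docs_gap_candidates(sample):
    print(q["questionId"], q["viewCount"], q["score"])
```

Reasonable thresholds depend heavily on the tag: a niche library may never reach 10,000 views on any question, so the cutoffs are best set relative to the tag's own distribution.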
FAQ
Does it require a login or cookies? No. The Stack Exchange API is fully public.
Is a proxy needed? No. Stack Exchange accepts requests from any IP.
What is the API quota? 300 requests/day without an API key; 10,000/day with a free key. Register a key at https://stackapps.com/apps/oauth/register and pass it via apiKey.
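As a rough worked example of the quota arithmetic, the sketch below estimates how many API requests a run might consume. It assumes listing requests return up to 100 questions per page (the Stack Exchange API's maximum page size; whether the actor uses it is an assumption) and takes the "~1 extra call each" figure for includeAnswers from the input table. `estimate_requests` and `needs_api_key` are hypothetical helper names, and filtering or retries could push the real count higher.

```python
import math

ANON_DAILY_QUOTA = 300      # requests/day without a key (from the FAQ)
KEYED_DAILY_QUOTA = 10_000  # requests/day with a free key (from the FAQ)
ASSUMED_PAGE_SIZE = 100     # assumption: one listing call returns up to 100 items


def estimate_requests(max_items: int, include_answers: bool = False) -> int:
    """Estimate API calls: listing pages plus one call per question for answers."""
    list_calls = math.ceil(max_items / ASSUMED_PAGE_SIZE)
    answer_calls = max_items if include_answers else 0
    return list_calls + answer_calls


def needs_api_key(max_items: int, include_answers: bool = False) -> bool:
    """True if the estimate exceeds the anonymous daily quota."""
    return estimate_requests(max_items, include_answers) > ANON_DAILY_QUOTA


print(estimate_requests(500, include_answers=True))  # 5 listing + 500 answer calls = 505
print(needs_api_key(500, include_answers=True))      # True
```

Under these assumptions, includeAnswers dominates the cost: 500 questions with answers already exceeds the anonymous quota, while the same run without answers needs only a handful of requests.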
Which sites are supported? 170+ Stack Exchange communities. The site enum lists the most popular 90; you can also pass any other valid Stack Exchange site key (e.g. cooking, worldbuilding, quant).
Why are some fields empty? The actor omits empty fields rather than emit nulls. For example, aboutMe is only present if a user filled it in.
Can I get the answer text? Yes — set includeAnswers: true. Each question record then includes an answers[] array with body, score, owner, and accepted flag for every answer.
What's the difference between score and viewCount? score is up-votes minus down-votes; viewCount is total page views. High-view low-score questions often signal missing canonical answers.
How do tagAnyOf filters interact across modes? In mode=tag they're required and ANDed by the API. In mode=topQuestions and mode=search they're an optional filter. In mode=user they're ignored (user mode returns the user, not their questions).
Why HTML + Markdown bodies? HTML is what Stack Exchange returns natively (preserves code blocks, tables, links). Markdown is a cleaned version optimized for LLM ingestion / display in dashboards.
How fresh is the data? Real-time. Stack Exchange's API surfaces new questions within seconds of posting.
Can I scrape comments? Comments are not part of v1. Use mode=topQuestions + includeAnswers=true to get the full Q&A thread; comment support may be added in a future version.
What happens if I exceed the quota? The actor watches the quota_remaining and backoff fields in each API response and pauses for as long as the API asks. If the daily quota is exhausted, you'll get an empty result and a status message; pass an apiKey to raise the cap to 10k/day.
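For readers calling the Stack Exchange API directly, the same throttling behavior can be sketched in a few lines. The API's JSON response wrapper carries `quota_remaining` and, when it wants the client to slow down, a `backoff` value in seconds. The helper name `seconds_to_wait` is hypothetical, and this is an illustration of the idea rather than the actor's actual code.

```python
from typing import Any, Dict, Optional


def seconds_to_wait(wrapper: Dict[str, Any]) -> Optional[float]:
    """Return the pause before the next request, or None if the quota is spent."""
    # A missing quota_remaining is treated as "still have quota" (assumption).
    if wrapper.get("quota_remaining", 1) <= 0:
        return None  # daily quota exhausted; stop instead of retrying
    # backoff is only present when the API asks the client to slow down.
    return float(wrapper.get("backoff", 0))


response = {"items": [], "has_more": True, "quota_remaining": 120, "backoff": 10}
delay = seconds_to_wait(response)
if delay is None:
    print("Quota exhausted; stop and retry tomorrow or add an API key.")
else:
    print(f"Sleep {delay} seconds before the next request.")
```

A caller would typically `time.sleep(delay)` before the next request and end the run cleanly when `None` is returned.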