Stack Overflow Scraper: Questions, Answers, Users & Tags
Pricing
$1.00 / 1,000 result items
Scrape any Stack Exchange site (stackoverflow, superuser, askubuntu, math.stackexchange and 170+ more) via the official Stack Exchange API. Questions, answers with full body, user profiles with reputation and badges, top tags, search. No auth, no proxies, no cookies. Pay only per result item.
Rating: 0.0 (0)
Developer: Perconey
Maintained by Community
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago
What does Stack Overflow Scraper do?
Stack Overflow Scraper pulls structured data from any Stack Exchange site through the official Stack Exchange API v2.3. Get top questions, full search results, every answer to a single question, user profiles with reputation and badges, and trending tags. The actor calls the documented public API directly, so no browser, no proxies, no cookies, no anti-bot fight, ever. One actor covers 170+ Stack Exchange sites under a single site parameter: stackoverflow, superuser, askubuntu, math.stackexchange, codereview.stackexchange, gaming.stackexchange, parenting.stackexchange, and many more.
Try it instantly: pick getQuestions, leave site as stackoverflow, click Start. You get the top 100 highest-voted Stack Overflow questions of all time with full metadata, in under 10 seconds, for about $0.10.
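
For orientation, here is a minimal sketch of the kind of request a top-questions run corresponds to on the public API. The endpoint and the site, sort, order, pagesize and key parameters are part of Stack Exchange API v2.3; the helper name `build_questions_url` is hypothetical, and how the actor builds its requests internally is an assumption.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.3"


def build_questions_url(site="stackoverflow", sort="votes", order="desc",
                        pagesize=100, key=None):
    """Build a /questions URL for one Stack Exchange site (hypothetical helper)."""
    params = {"site": site, "sort": sort, "order": order, "pagesize": pagesize}
    if key:
        # Optional Stack Apps key; raises the daily request quota.
        params["key"] = key
    return f"{API_ROOT}/questions?{urlencode(params)}"


print(build_questions_url())
print(build_questions_url(site="superuser", sort="activity"))
```

Switching communities only changes the site value; the path and the rest of the query string stay the same.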
Why use Stack Overflow Scraper?
- Recruiters and sourcers: Stack Overflow Talent shut down in 2024. Roll your own developer pipeline. Use searchUsers and getUserAnswers to find high-reputation engineers in a specific tag, then export contact details via their linked profiles.
- DevRel and product teams: Monitor questions tagged with your product (e.g. tensorflow, langchain, kubernetes) using getQuestionsByTag. Set up an Apify schedule to alert you on new high-vote questions about your SDK.
- Content marketers: Use searchQuestions with sort=hot or sort=week to find trending questions worth writing about. Use getTopTags to discover where developer attention is shifting.
- Q&A dataset builders: With getQuestionDetail + includeAnswers: true + includeBody: true, you get clean markdown Q&A pairs suited to fine-tuning or RAG. Far cheaper than processing Stack Overflow's own data dump (license tightened in 2024).
- Competitive intelligence: How active is the community around a competitor's stack? Run getQuestionsByTag for their products and yours over the same date window.
How to use Stack Overflow Scraper
- Open the Input tab.
- Pick an action from the dropdown. getQuestions is the simplest starting point.
- Set site (default stackoverflow). To scrape a different Stack Exchange site, type its short name without .com (e.g. superuser, askubuntu).
- For search/profile/tag/detail actions, fill queries (one entry per line). For top-questions and top-tags actions, leave queries empty.
- Set maxItems to cap the run. Default 100.
- (Optional) Paste a free Stack Apps API key to lift the quota from 300 to 10,000 requests per day. Register at https://stackapps.com/apps/oauth/register.
- Click Start. Results stream to the dataset and you can preview them on the Output tab.
Input
| Field | Required | Description |
|---|---|---|
| action | yes | Which API call to make. See the dropdown for the eight options. |
| site | yes | Stack Exchange site short name. Default stackoverflow. |
| queries | sometimes | Required for search/detail/profile actions. Free text for searchQuestions/searchUsers; a tag for getQuestionsByTag; an id or full URL for getQuestionDetail, getUserProfile, getUserAnswers. |
| maxItems | no | Max items per query. Default 100. |
| sort | no | API sort key (e.g. votes, activity, hot, week, reputation). |
| order | no | desc (default) or asc. |
| tagged | no | For getQuestions / searchQuestions: limit to questions with this tag. |
| since / until | no | ISO date filters for question-listing actions. |
| includeBody | no | If true, also fetch full question/answer body as markdown. |
| includeAnswers | no | For getQuestionDetail: also fetch all answers. Default true. |
| apiKey | no | Stack Apps app key to lift the daily quota. |
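
If you start runs programmatically, the input is a JSON object with the fields above. The sketch below assembles and sanity-checks one before you send it. The field and action names come from this page; the helper `build_run_input`, the idea that queries is passed as a list of strings, and the validation rules are assumptions, not the actor's published schema.

```python
# Actions that need at least one entry in "queries", per the table above.
ACTIONS_NEEDING_QUERIES = {
    "searchQuestions", "searchUsers", "getQuestionsByTag",
    "getQuestionDetail", "getUserProfile", "getUserAnswers",
}
# Actions that run with "queries" left empty.
ACTIONS_WITHOUT_QUERIES = {"getQuestions", "getTopTags"}


def build_run_input(action, site="stackoverflow", queries=None,
                    max_items=100, **options):
    """Return an input dict for one run (hypothetical helper)."""
    if action not in ACTIONS_NEEDING_QUERIES | ACTIONS_WITHOUT_QUERIES:
        raise ValueError(f"unknown action: {action}")
    queries = list(queries or [])
    if action in ACTIONS_NEEDING_QUERIES and not queries:
        raise ValueError(f"{action} needs at least one query")
    run_input = {"action": action, "site": site, "maxItems": max_items}
    if queries:
        run_input["queries"] = queries
    # Optional fields such as sort, order, tagged, includeBody, apiKey.
    run_input.update(options)
    return run_input


print(build_run_input("getQuestionsByTag", queries=["langchain"], sort="votes"))
```

Catching a missing queries list locally avoids paying for a run that can only return an error item.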
Output
Every dataset item carries a _type field (question, answer, user, tag, or error) plus _action and _site for filtering when one run mixes actions. Field names match the Stack Exchange API types, with Unix timestamps converted to ISO 8601.
{
  "_type": "question",
  "_action": "getQuestions",
  "_site": "stackoverflow",
  "question_id": 11227809,
  "title": "Why is processing a sorted array faster than processing an unsorted array?",
  "link": "https://stackoverflow.com/questions/11227809/...",
  "tags": ["java", "c++", "performance", "cpu-architecture", "branch-prediction"],
  "score": 27000,
  "view_count": 1900000,
  "answer_count": 27,
  "comment_count": 4,
  "is_answered": true,
  "accepted_answer_id": 11227902,
  "creation_date": "2012-06-27T13:51:36.000Z",
  "owner": {
    "user_id": 1539405,
    "display_name": "GManNickG",
    "reputation": 510000,
    "link": "https://stackoverflow.com/users/1539405/gmannickg"
  }
}
You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab or the Apify API.
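
The raw Stack Exchange API reports dates as Unix epoch seconds, while dataset items carry ISO 8601 strings. A minimal sketch of that conversion, assuming UTC and millisecond precision as in the sample item above; the function name `unix_to_iso` is hypothetical and the actor's own implementation may differ.

```python
from datetime import datetime, timezone


def unix_to_iso(epoch_seconds):
    """Convert Unix epoch seconds to an ISO 8601 UTC string ending in 'Z'."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # isoformat() yields '+00:00' for UTC; swap it for the 'Z' suffix.
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(unix_to_iso(1340805096))  # 2012-06-27T13:51:36.000Z
```

This is useful when you join actor output with data pulled straight from the API, where creation_date is still an integer.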
Data fields
| Type | Key fields |
|---|---|
| question | question_id, title, link, tags, score, view_count, answer_count, is_answered, accepted_answer_id, creation_date, owner, body (optional) |
| answer | answer_id, question_id, is_accepted, score, link, creation_date, owner, body (optional) |
| user | user_id, display_name, reputation, link, location, website_url, about_me, badge_counts, question_count, answer_count |
| tag | name, count (total questions tagged), is_required, has_synonyms |
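
Because answers carry the question_id of their parent, a mixed dataset can be regrouped into Q&A pairs after download. The sketch below shows one way to do it using only the _type, question_id, is_accepted and score fields listed above; the function name and the accepted-first ordering are illustrative choices, not something the actor does for you.

```python
from collections import defaultdict


def pair_questions_with_answers(items):
    """Group answer items under their question, accepted answer first."""
    questions = {}
    answers = defaultdict(list)
    for item in items:
        kind = item.get("_type")
        if kind == "question":
            questions[item["question_id"]] = item
        elif kind == "answer":
            answers[item["question_id"]].append(item)
        # user, tag and error items are ignored here

    pairs = []
    for question_id, question in questions.items():
        ranked = sorted(
            answers.get(question_id, []),
            key=lambda a: (a.get("is_accepted", False), a.get("score", 0)),
            reverse=True,
        )
        pairs.append({"question": question, "answers": ranked})
    return pairs


sample = [
    {"_type": "question", "question_id": 1, "title": "Example question"},
    {"_type": "answer", "question_id": 1, "answer_id": 10, "score": 5, "is_accepted": False},
    {"_type": "answer", "question_id": 1, "answer_id": 11, "score": 2, "is_accepted": True},
]
for pair in pair_questions_with_answers(sample):
    print(pair["question"]["title"], [a["answer_id"] for a in pair["answers"]])
```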
Pricing
Pay-per-result: $0.001 per item. One question = one event. One answer = one event. One user profile = one event. No flat monthly fee, no rental, and no per-minute charge from the actor itself; the only run-time cost is Apify's standard platform compute (~$0.0002 per typical run at 512 MB).
Cost examples:
- Top 100 questions on a tag: $0.10
- 500-user shortlist for a recruiting campaign (searchUsers + getUserAnswers, ~6 answers per user): $3.50
- 1,000 Q&A pairs for a fine-tuning dataset (getQuestionDetail with includeAnswers, ~5 answers per question): $6.00
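
The examples above follow from counting every returned item at $1.00 per 1,000. A small sketch of the arithmetic, for budgeting your own runs; the function name is hypothetical, the per-parent child counts are the rough averages used above, and platform compute is left out.

```python
ITEMS_PER_DOLLAR = 1000  # $1.00 per 1,000 result items


def estimate_cost_usd(primary_items, children_per_item=0):
    """Estimate the result-item charge in USD for one run.

    primary_items: questions or users returned.
    children_per_item: average answers fetched per question or user.
    """
    total_items = primary_items * (1 + children_per_item)
    return round(total_items / ITEMS_PER_DOLLAR, 2)


print(estimate_cost_usd(100))       # top 100 questions on a tag -> 0.1
print(estimate_cost_usd(500, 6))    # 500 users, ~6 answers each -> 3.5
print(estimate_cost_usd(1000, 5))   # 1,000 questions, ~5 answers each -> 6.0
```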
Tips
- Get a Stack Apps key if you plan to run more than ~30 small runs per day from the same IP. It lifts the API quota from 300 to 10,000 requests/day. Free to register, no review process.
- Use includeBody: false unless you actually need the markdown body. The response is ~10x smaller and faster.
- getQuestionDetail is the cheapest way to get Q&A pairs: one API call returns the question, and a second returns all answers, paged. With maxItems: 50 you cap at 50 answers per question.
- Cross-site research: schedule the same input with different site values to compare communities (e.g. the react tag on stackoverflow vs. softwareengineering.stackexchange).
- Date windows: combine tagged: tensorflow with since: 2025-01-01 to see what users have asked since the start of 2025.
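
On the API side, date windows are expressed with the fromdate and todate parameters, both in Unix epoch seconds. The sketch below shows how ISO dates like the one in the last tip could be translated. It assumes dates are interpreted as midnight UTC; the helper names are hypothetical, and whether the actor maps since/until exactly this way is an assumption.

```python
from datetime import datetime, timezone


def iso_date_to_epoch(date_str):
    """Convert 'YYYY-MM-DD' to Unix epoch seconds at midnight UTC."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def date_window_params(since=None, until=None):
    """Translate since/until into the API's fromdate/todate parameters."""
    params = {}
    if since:
        params["fromdate"] = iso_date_to_epoch(since)
    if until:
        params["todate"] = iso_date_to_epoch(until)
    return params


print(date_window_params(since="2025-01-01"))  # {'fromdate': 1735689600}
```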
FAQ, disclaimers, support
Is this legal? The actor calls Stack Exchange's official public API (api.stackexchange.com) with documented endpoints. Public read access is explicitly permitted by Stack Exchange's API Terms of Service. We send a clear User-Agent string identifying the actor. Stack Exchange content is licensed CC BY-SA - attribution required when republishing.
Why the 300-request anonymous limit? That's Stack Exchange's policy, not Apify's. Register a free app at https://stackapps.com/apps/oauth/register for 10,000/day per IP.
Will I get rate-limited? The actor reads the API's backoff hint and sleeps automatically. We also retry on 429/502/503/504 with exponential backoff. Quota and backoff are logged for transparency.
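
As a rough illustration of that policy, the sketch below computes how long a client might wait before its next request. The backoff field is part of the Stack Exchange API response wrapper and the status codes are the ones listed above; the function name, the 1-second base and the 60-second cap are assumptions, not the actor's actual settings.

```python
RETRYABLE_STATUS = {429, 502, 503, 504}


def wait_seconds(status, attempt, api_backoff=0, base=1.0, cap=60.0):
    """Seconds to sleep before the next request.

    status: HTTP status of the last response.
    attempt: zero-based retry counter for the current request.
    api_backoff: value of the wrapper's 'backoff' field, if present.
    """
    delay = float(api_backoff or 0)
    if status in RETRYABLE_STATUS:
        # Exponential backoff: base, 2*base, 4*base, ... up to the cap.
        delay = max(delay, min(cap, base * 2 ** attempt))
    return delay


print(wait_seconds(200, 0, api_backoff=10))  # honour the API hint
print(wait_seconds(503, 3))                  # fourth try after a 503
print(wait_seconds(429, 10))                 # capped
```

Taking the larger of the two values means a transient 5xx never shortens a wait the API explicitly asked for.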
Does it cover Reddit / Quora / GitHub Discussions? No. This actor is only for Stack Exchange's 170+ sites. Each platform deserves its own actor for clean data shapes.
Bug or feature request? Open an Issue on the actor's Issues tab. I usually respond within a day.
Need a custom scraper for another Q&A platform? Bluesky? Substack? Mastodon? See my other actors at https://apify.com/perconey, or open an Issue.