Stack Exchange Scraper

Pricing

from $1.00 / 1,000 results

Scrape questions, answers, users, and tags from Stack Overflow and 170+ Stack Exchange communities. HTTP-only via the public Stack Exchange API. No login, no proxy.

Rating

5.0 (20)

Developer

Crawler Bros (Maintained by Community)

Actor stats

20 bookmarked, 2 total users, 1 monthly active user; last modified 3 days ago

Scrape questions, answers, users, and tags from Stack Overflow and 170+ Stack Exchange communities (Server Fault, Super User, Math, Cross Validated, Ask Ubuntu, Code Review, Software Engineering, AI, Data Science, Security, DBA, GIS, and more). HTTP-only via the official Stack Exchange API v2.3. No login, no cookies, no proxy.

What this actor does

  • Five fetch modes: top questions, full-text search, by tag, by user, unanswered
  • Returns full HTML + Markdown bodies, scores, view counts, answer counts, tags, owner profile, accepted answer ID, and direct URLs
  • Optionally fetches all answers per question (sorted by votes)
  • Filters by score, answer count, "answered only", date range, and tag intersection
  • Honors the quota_remaining and backoff fields that the Stack Exchange API returns with each response; accepts your free API key to raise the daily quota from 300 requests (anonymous) to 10,000
  • Sites covered: 170+ Stack Exchange communities (programming, math, science, language, hobbies, professional, life)

Output per question

  • questionId, title, link, score, viewCount, answerCount
  • body (HTML) + bodyMarkdown (cleaned Markdown) — when includeBody=true
  • tags[], isAnswered, acceptedAnswerId
  • owner{userId, displayName, reputation, profileUrl, profileImage}
  • createdAt, lastActivityAt, lastEditAt
  • answers[] — when includeAnswers=true, each with answerId, body, bodyMarkdown, score, isAccepted, owner, createdAt
  • recordType: "question", site, scrapedAt

In mode=user the actor emits user records instead: userId, displayName, profileUrl, profileImage, reputation, goldBadges, silverBadges, bronzeBadges, location, websiteUrl, aboutMe (Markdown), creationDate, lastAccessDate, recordType: "user".

Empty fields are omitted (no nulls).
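Because empty fields are dropped rather than set to null, code that consumes the dataset should not index keys directly. A minimal sketch of a defensive reader, assuming records shaped like the field list above (the helper name summarize_question and the sample record are illustrative, not part of the actor):

```python
from typing import Any


def summarize_question(record: dict[str, Any]) -> str:
    """Build a one-line summary while tolerating omitted fields."""
    owner = record.get("owner", {})  # the whole owner object may be missing
    name = owner.get("displayName", "unknown")
    accepted = "accepted" if "acceptedAnswerId" in record else "no accepted answer"
    tags = ", ".join(record.get("tags", [])) or "none"
    score = record.get("score", 0)
    title = record.get("title", "(untitled)")
    return f"[{score}] {title} by {name} ({accepted}; tags: {tags})"


# Illustrative record: owner and acceptedAnswerId are absent, not null.
sample = {
    "recordType": "question",
    "questionId": 1,
    "title": "How do I reverse a list?",
    "score": 12,
    "tags": ["python", "list"],
}
print(summarize_question(sample))
```

Using .get() with a default keeps the reader working whether or not a given question has an owner profile or an accepted answer.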

Input

| Field | Type | Default | Description |
|---|---|---|---|
| site | string | stackoverflow | Which Stack Exchange site (90+ enum) |
| mode | string | topQuestions | topQuestions / search / tag / user / unanswered |
| searchQuery | string | | Required for mode=search |
| tagAnyOf | array | [] | Tags to filter (required for mode=tag) |
| userIds | array | [] | User IDs (required for mode=user) |
| sortBy | string | votes | activity / votes / creation / hot / week / month |
| dateRangeFrom | string | | ISO date; drop questions older than this |
| dateRangeTo | string | | ISO date; drop questions newer than this |
| minScore | int | | Drop questions below this score |
| minAnswers | int | | Drop questions with fewer answers than this |
| isAnsweredOnly | bool | false | Only emit questions with an accepted answer |
| includeAnswers | bool | false | Also fetch answers per question (~1 extra call each) |
| includeBody | bool | true | Include full HTML + Markdown body |
| maxItems | int | 50 | Hard cap on emitted records (1–5000) |
| apiKey | string | | Optional free API key (10k/day vs 300/day anonymous) |
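Several fields are only required in certain modes, which is easy to get wrong when building inputs programmatically. The following is a rough pre-flight check based only on the table above; validate_input is a hypothetical helper, and the actor may apply its own, different validation:

```python
VALID_MODES = {"topQuestions", "search", "tag", "user", "unanswered"}
VALID_SORTS = {"activity", "votes", "creation", "hot", "week", "month"}

# Field that must be non-empty for a given mode, per the input table.
REQUIRED_BY_MODE = {"search": "searchQuery", "tag": "tagAnyOf", "user": "userIds"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks consistent."""
    problems = []
    mode = run_input.get("mode", "topQuestions")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    required = REQUIRED_BY_MODE.get(mode)
    if required and not run_input.get(required):
        problems.append(f"mode={mode} requires {required}")
    sort_by = run_input.get("sortBy", "votes")
    if sort_by not in VALID_SORTS:
        problems.append(f"unknown sortBy: {sort_by!r}")
    max_items = run_input.get("maxItems", 50)
    if not 1 <= max_items <= 5000:
        problems.append("maxItems must be between 1 and 5000")
    return problems


print(validate_input({"mode": "search", "maxItems": 25}))
```

Returning a list of problems instead of raising on the first one lets a caller report everything that needs fixing in a single pass.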

Example: top Python questions on Stack Overflow

{
  "site": "stackoverflow",
  "mode": "topQuestions",
  "tagAnyOf": ["python"],
  "sortBy": "votes",
  "maxItems": 50
}

Example: search for "async/await"

{
  "site": "stackoverflow",
  "mode": "search",
  "searchQuery": "async await",
  "tagAnyOf": ["python"],
  "minScore": 10,
  "includeAnswers": true,
  "maxItems": 25
}

Example: unanswered questions for DevRel monitoring

{
  "site": "stackoverflow",
  "mode": "unanswered",
  "tagAnyOf": ["langchain"],
  "maxItems": 100
}

Example: a user's profile

{
  "site": "stackoverflow",
  "mode": "user",
  "userIds": ["9285"],
  "maxItems": 1
}

Example: latest hot questions on Server Fault

{
  "site": "serverfault",
  "mode": "topQuestions",
  "sortBy": "hot",
  "dateRangeFrom": "2025-01-01",
  "maxItems": 100
}
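Any of these inputs can also be submitted from code. The sketch below assumes the apify-client Python package and an APIFY_TOKEN environment variable. ACTOR_ID is a placeholder, since this page does not state the actor's ID, so copy the real one from the actor's page. The client calls follow the package's documented usage as far as I know and should be checked against the version you have installed.

```python
import os

# Placeholder: replace with the actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"

RUN_INPUT = {
    "site": "stackoverflow",
    "mode": "unanswered",
    "tagAnyOf": ["langchain"],
    "maxItems": 100,
}


def run_actor(token: str, actor_id: str = ACTOR_ID, run_input: dict = RUN_INPUT) -> list:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so this module loads even if apify-client is not installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)  # blocks until done
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:  # skip quietly when no token is configured
        for item in run_actor(token):
            print(item.get("title"), item.get("link"))
```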

Use cases

  • Developer relations — find unanswered questions about your library/tool/SDK to engage with the community
  • Technical content marketing — gap analysis on what topics get asked but aren't well-covered yet
  • Recruiting — find domain experts by tag + reputation (mode=user)
  • Q&A datasets for ML/RAG — bulk-export curated answers for fine-tuning or retrieval indexes
  • Community management — monitor your tag for new questions
  • Programming research — analyze patterns in what developers struggle with
  • Competitive intelligence — track questions about competitor products
  • Documentation prioritization — high-view low-score questions reveal docs gaps

FAQ

Does it require a login or cookies? No. The Stack Exchange API is fully public.

Is a proxy needed? No. Stack Exchange accepts requests from any IP.

What is the API quota? 300 requests/day without an API key; 10,000/day with a free key. Register a key at https://stackapps.com/apps/oauth/register and pass it via apiKey.
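To judge whether a run fits the anonymous quota, a back-of-the-envelope estimate helps. The sketch below assumes the actor pages through questions 100 at a time (the Stack Exchange API's maximum page size) and spends one extra call per question when includeAnswers is on, as the input table suggests. The actor's real request pattern may differ, and client-side filters such as minScore can require extra pages.

```python
import math

ANONYMOUS_QUOTA = 300   # requests/day without a key
KEYED_QUOTA = 10_000    # requests/day with a free key


def estimate_api_calls(max_items: int, include_answers: bool = False,
                       page_size: int = 100) -> int:
    """Rough lower bound on API calls for one run (page_size is an assumption)."""
    listing_calls = math.ceil(max_items / page_size)
    answer_calls = max_items if include_answers else 0
    return listing_calls + answer_calls


for items in (50, 250, 500):
    calls = estimate_api_calls(items, include_answers=True)
    verdict = "fits" if calls <= ANONYMOUS_QUOTA else "needs an API key"
    print(f"maxItems={items}: ~{calls} calls, {verdict}")
```

Under these assumptions, 250 questions with answers costs about 253 calls and stays under the anonymous limit, while 500 questions with answers costs about 505 and does not.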

Which sites are supported? 170+ Stack Exchange communities. The site enum lists the most popular 90; you can also pass any other valid Stack Exchange site key (e.g. cooking, worldbuilding, quant).

Why are some fields empty? The actor omits empty fields rather than emit nulls. For example, aboutMe is only present if a user filled it in.

Can I get the answer text? Yes — set includeAnswers: true. Each question record then includes an answers[] array with body, score, owner, and accepted flag for every answer.
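For Q&A datasets it is often enough to keep one answer per question. A small sketch that prefers the accepted answer and falls back to the highest-scored one, assuming the answers[] fields listed in the output section (best_answer is a hypothetical helper name):

```python
from typing import Optional


def best_answer(question: dict) -> Optional[dict]:
    """Return the accepted answer if present, else the top-scored one, else None."""
    answers = question.get("answers", [])  # absent unless includeAnswers=true
    if not answers:
        return None
    for answer in answers:
        if answer.get("isAccepted"):
            return answer
    return max(answers, key=lambda a: a.get("score", 0))


# Illustrative data: the accepted answer wins even with a lower score.
question = {
    "questionId": 1,
    "answers": [
        {"answerId": 10, "score": 40},
        {"answerId": 11, "score": 3, "isAccepted": True},
    ],
}
print(best_answer(question)["answerId"])
```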

What's the difference between score and viewCount? score is up-votes minus down-votes; viewCount is total page views. High-view low-score questions often signal missing canonical answers.
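The high-view, low-score signal can be turned into a simple filter over exported records. The thresholds below are arbitrary illustrations, not values recommended by the actor, and docs_gap_candidates is a hypothetical helper:

```python
def docs_gap_candidates(records: list, min_views: int = 1000, max_score: int = 5) -> list:
    """Questions many people read but few up-voted, most-viewed first."""
    hits = [
        r for r in records
        if r.get("viewCount", 0) >= min_views and r.get("score", 0) <= max_score
    ]
    return sorted(hits, key=lambda r: r.get("viewCount", 0), reverse=True)


# Illustrative records only.
records = [
    {"questionId": 1, "viewCount": 50_000, "score": 2},
    {"questionId": 2, "viewCount": 80_000, "score": 900},
    {"questionId": 3, "viewCount": 120_000, "score": 4},
    {"questionId": 4, "viewCount": 300, "score": 0},
]
print([r["questionId"] for r in docs_gap_candidates(records)])
```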

How do tagAnyOf filters interact across modes? In mode=tag they're required and ANDed by the API. In mode=topQuestions and mode=search they're an optional filter. In mode=user they're ignored (user mode returns the user, not their questions).
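The Stack Exchange API takes tags as a semicolon-delimited tagged query parameter. A minimal sketch of the rules above, assuming the actor forwards tagAnyOf that way. The function tag_params is hypothetical, the actor's internals may differ, and mode=unanswered is treated as optional here only because the example input uses tags with it.

```python
def tag_params(mode: str, tags: list) -> dict:
    """Map a mode and tag list to a `tagged` query parameter, if any."""
    if mode == "user":
        return {}  # tags are ignored: user mode returns users, not questions
    if mode == "tag" and not tags:
        raise ValueError("mode=tag requires at least one tag")
    if not tags:
        return {}  # optional filter left unset
    return {"tagged": ";".join(tags)}


print(tag_params("tag", ["python", "pandas"]))
print(tag_params("user", ["python"]))
```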

Why HTML + Markdown bodies? HTML is what Stack Exchange returns natively (preserves code blocks, tables, links). Markdown is a cleaned version optimized for LLM ingestion / display in dashboards.

How fresh is the data? Real-time. Stack Exchange's API surfaces new questions within seconds of posting.

Can I scrape comments? Not in v1 of this actor. Use mode=topQuestions + includeAnswers=true to get each question together with all of its answers; comment support may be added in a future version.

What happens if I exceed the quota? The actor watches the quota_remaining and backoff fields in each API response and pauses for as long as the API asks. If the daily quota is exhausted, you'll get an empty result and a status message; pass an apiKey to raise the cap to 10,000 requests/day.
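The Stack Exchange API reports quota_remaining and, when it wants clients to slow down, backoff (in seconds) inside the JSON wrapper of every response. The following sketch shows how a client might honor both. It is not the actor's actual code, and respect_limits and QuotaExhausted are hypothetical names.

```python
import time


class QuotaExhausted(Exception):
    """Raised when no further requests are allowed today."""


def respect_limits(wrapper: dict, sleep=time.sleep) -> int:
    """Inspect a response wrapper; sleep if asked to, raise if the quota is spent.

    Returns the number of seconds slept. The items in `wrapper` remain valid
    even when this raises; the exception only means "stop sending requests".
    """
    if wrapper.get("quota_remaining", 1) <= 0:
        raise QuotaExhausted("daily quota used up; retry later or supply an API key")
    delay = int(wrapper.get("backoff", 0))
    if delay > 0:
        sleep(delay)  # the API asks clients to wait this long before the next call
    return delay


# Illustrative wrapper; a recording stub replaces time.sleep so nothing blocks.
slept = []
respect_limits({"items": [], "quota_remaining": 120, "backoff": 10}, sleep=slept.append)
print(slept)
```

Passing the sleep function as a parameter keeps the helper easy to test without real delays.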