Stack Exchange Q&A Scraper

Pricing: from $3.00 / 1,000 results

Developer: Crawler Bros (maintained by Community)

Scrape questions, answers, and site listings from Stack Overflow and 170+ Stack Exchange communities (Server Fault, Super User, Math, Cross Validated, Ask Ubuntu, Code Review, Software Engineering, AI, Data Science, Security, DBA, GIS, and more) via the official Stack Exchange API v2.3. No login, no cookies, no proxy required.

What this actor does

  • Four fetch modes — search questions by keyword, browse by tag, fetch answers for specific question IDs, or list all Stack Exchange network sites
  • Returns full question metadata — title, body excerpt, tags, score, view count, answer count, accepted answer ID, author, and direct URLs
  • Answers mode — fetch all answers for a list of question IDs with body, score, author, and accepted flag
  • Sites mode — list all 170+ Stack Exchange network sites with name, audience, type, icon, and launch date
  • Honors the quota_remaining and backoff fields that the Stack Exchange API returns in each response, pausing automatically when asked
  • Optional API key for 10,000 requests/day (vs. 300/day anonymous)
  • No proxy, no cookies — Stack Exchange is publicly accessible

Output

Questions (mode=searchQuestions or getQuestionsByTag)

| Field | Type | Description |
| --- | --- | --- |
| questionId | integer | Stack Exchange question ID |
| title | string | Question title |
| body | string | Plain-text excerpt of the question body (up to 500 chars) |
| tags | array | Question tags (e.g. ["python", "asyncio"]) |
| score | integer | Up-votes minus down-votes |
| viewCount | integer | Total page views |
| answerCount | integer | Number of posted answers |
| acceptedAnswerId | integer | ID of the accepted answer (if any) |
| isAnswered | boolean | Whether the question has an accepted answer |
| author | string | Question author's display name |
| authorReputation | integer | Author's Stack Exchange reputation |
| createdAt | string | ISO-8601 creation timestamp (UTC) |
| lastActivityAt | string | ISO-8601 last activity timestamp (UTC) |
| questionUrl | string | Direct URL to the question |
| site | string | Stack Exchange site slug (e.g. stackoverflow) |
| scrapedAt | string | ISO-8601 scrape timestamp (UTC) |

Answers (mode=getAnswers)

| Field | Type | Description |
| --- | --- | --- |
| answerId | integer | Stack Exchange answer ID |
| questionId | integer | Parent question ID |
| body | string | Plain-text excerpt of the answer body (up to 500 chars) |
| score | integer | Answer score |
| isAccepted | boolean | Whether this answer was accepted |
| author | string | Answer author's display name |
| authorReputation | integer | Author's reputation |
| createdAt | string | ISO-8601 creation timestamp (UTC) |
| scrapedAt | string | ISO-8601 scrape timestamp (UTC) |

Sites (mode=listSites)

| Field | Type | Description |
| --- | --- | --- |
| siteUrl | string | Base URL of the community (e.g. https://stackoverflow.com) |
| siteName | string | Community name (e.g. Stack Overflow) |
| audience | string | Target audience description |
| siteType | string | Site type (main_site, meta_site, etc.) |
| iconUrl | string | Community icon URL |
| launchDate | string | ISO-8601 launch date (UTC) |
| scrapedAt | string | ISO-8601 scrape timestamp (UTC) |

Empty fields are always omitted — no nulls in output.
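The mapping from a raw API item to the output shape above can be sketched as follows. This is a minimal illustration, not the actor's actual code: the helper names (`iso_utc`, `drop_empty`, `to_question_record`) are hypothetical, only a subset of fields is shown, and treating `None`, empty strings, and empty lists as "empty" (while keeping `0` and `False`) is an assumption. The snake_case keys on the input side are the ones the Stack Exchange API uses for question objects.

```python
from datetime import datetime, timezone


def iso_utc(epoch_seconds):
    """Convert a Unix timestamp (as returned by the API) to ISO-8601 in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def drop_empty(record):
    """Drop keys whose value is None, '' or []; 0 and False are kept (assumption)."""
    return {k: v for k, v in record.items() if v not in (None, "", [])}


def to_question_record(item, site):
    """Map a raw Stack Exchange question object to the documented output fields."""
    owner = item.get("owner") or {}
    created = item.get("creation_date")
    record = {
        "questionId": item.get("question_id"),
        "title": item.get("title"),
        "tags": item.get("tags"),
        "score": item.get("score"),
        "answerCount": item.get("answer_count"),
        "acceptedAnswerId": item.get("accepted_answer_id"),
        "author": owner.get("display_name"),
        "authorReputation": owner.get("reputation"),
        "createdAt": iso_utc(created) if created is not None else None,
        "questionUrl": item.get("link"),
        "site": site,
    }
    return drop_empty(record)
```

Because absent keys are possible, downstream code is safer reading optional fields with `record.get("acceptedAnswerId")` than with `record["acceptedAnswerId"]`.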

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | searchQuestions | searchQuestions / getQuestionsByTag / getAnswers / listSites |
| query | string | (none) | Search keyword (required for mode=searchQuestions) |
| tags | array | [] | Tags to filter (required for mode=getQuestionsByTag) |
| questionIds | array | [] | Question IDs to fetch answers for (required for mode=getAnswers) |
| site | string | stackoverflow | Stack Exchange site slug |
| sortBy | string | votes | votes / activity / creation / relevance |
| apiKey | string | (none) | Optional API key (10k/day vs 300/day anonymous) |
| maxItems | integer | 25 | Maximum records to return (1–200) |
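To check an input object before starting a run, the rules in this table can be expressed as a small validator. This is a sketch under assumptions: `normalize_input` is a hypothetical helper, rejecting an out-of-range maxItems (rather than clamping it) is a guess at sensible behavior, and the fallback from relevance to votes follows the FAQ entry on sortBy.

```python
MODES = {"searchQuestions", "getQuestionsByTag", "getAnswers", "listSites"}
SORTS = {"votes", "activity", "creation", "relevance"}
# Field that must be non-empty for each mode (listSites needs none).
REQUIRED = {
    "searchQuestions": "query",
    "getQuestionsByTag": "tags",
    "getAnswers": "questionIds",
}
DEFAULTS = {
    "mode": "searchQuestions",
    "tags": [],
    "questionIds": [],
    "site": "stackoverflow",
    "sortBy": "votes",
    "maxItems": 25,
}


def normalize_input(raw):
    """Apply the documented defaults and validate the result."""
    cfg = {**DEFAULTS, **raw}
    if cfg["mode"] not in MODES:
        raise ValueError(f"unknown mode: {cfg['mode']!r}")
    if cfg["sortBy"] not in SORTS:
        raise ValueError(f"unknown sortBy: {cfg['sortBy']!r}")
    needed = REQUIRED.get(cfg["mode"])
    if needed and not cfg.get(needed):
        raise ValueError(f"{needed} is required for mode={cfg['mode']}")
    if not 1 <= cfg["maxItems"] <= 200:
        raise ValueError("maxItems must be between 1 and 200")
    # Per the FAQ, relevance only applies to keyword search; votes is used otherwise.
    if cfg["sortBy"] == "relevance" and cfg["mode"] != "searchQuestions":
        cfg["sortBy"] = "votes"
    return cfg
```

For example, `normalize_input({"query": "python"})` fills in site, sortBy, and maxItems from the defaults column.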

Example inputs

Search for Python questions on Stack Overflow

{
  "mode": "searchQuestions",
  "query": "python",
  "site": "stackoverflow",
  "sortBy": "votes",
  "maxItems": 25
}

Browse JavaScript questions by tag

{
  "mode": "getQuestionsByTag",
  "tags": ["javascript"],
  "site": "stackoverflow",
  "sortBy": "votes",
  "maxItems": 50
}

Fetch answers for specific questions

{
  "mode": "getAnswers",
  "questionIds": ["11227809", "231767"],
  "site": "stackoverflow",
  "maxItems": 50
}

List all Stack Exchange network sites

{
  "mode": "listSites",
  "maxItems": 200
}

Use cases

  • Developer relations — monitor your library/SDK tag for new unanswered questions
  • Technical content marketing — find gaps in documentation by analyzing high-view low-score questions
  • Q&A datasets for ML/RAG — export curated answers for fine-tuning or retrieval-augmented generation
  • Recruiting — identify domain experts by reputation and tag activity
  • Community analytics — analyze trends in what developers struggle with
  • Competitive intelligence — track questions about competitor products
  • Documentation prioritization — high-view questions reveal documentation gaps
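Two of the use cases above (spotting unanswered questions and finding high-view, low-score questions) reduce to simple filters over the exported question records. The sketch below assumes records shaped like the Questions output table; the function names and the thresholds (`min_views`, `max_score`) are illustrative, not recommendations.

```python
def unanswered(records):
    """Questions with no posted answers; a missing answerCount is treated as 0."""
    return [r for r in records if r.get("answerCount", 0) == 0]


def documentation_gaps(records, min_views=10_000, max_score=5):
    """High-traffic questions with a low score, most viewed first."""
    hits = [
        r for r in records
        if r.get("viewCount", 0) >= min_views and r.get("score", 0) <= max_score
    ]
    return sorted(hits, key=lambda r: r.get("viewCount", 0), reverse=True)
```

Reading fields with `.get(...)` and a default matters here because the actor omits empty fields instead of emitting nulls.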

FAQ

Does it require a login or cookies? No. The Stack Exchange API is fully public and returns real data without authentication.

Is a proxy needed? No. Stack Exchange accepts requests from any IP address.

What is the API quota? 300 requests/day without an API key; 10,000/day with a free key. Register an app at https://stackapps.com/apps/oauth/register to obtain a key, then pass it via apiKey.

Which sites are supported? All 170+ Stack Exchange communities. Pass any valid site slug to the site field (e.g. stackoverflow, serverfault, math, cooking, worldbuilding). Use mode=listSites to enumerate all available sites.

Why are some fields missing? Fields are omitted when the API returns no data for them (e.g. acceptedAnswerId only appears when a question has an accepted answer, authorReputation only appears when the author profile is available).

What does body contain? The body field is a plain-text excerpt (up to 500 characters) extracted from the question or answer's HTML body. HTML tags are stripped, common entities are decoded, and the text is trimmed.
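The excerpt steps described above (strip tags, decode entities, trim) can be approximated with the standard library. This is a rough sketch, not the actor's implementation: `excerpt` is a hypothetical name, the regex-based tag stripping is a simplification that a real HTML parser would handle more robustly, and collapsing runs of whitespace is an added assumption.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def excerpt(body_html, limit=500):
    """Return a plain-text excerpt of at most `limit` characters."""
    # Strip tags first so that escaped text such as &lt;b&gt; is not mistaken for markup.
    text = TAG_RE.sub(" ", body_html)
    text = html.unescape(text)                 # decode &amp;, &lt;, &#39;, ...
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace left by removed tags
    return text[:limit].rstrip()
```

For instance, `excerpt("<p>Use <code>a &lt; b</code> &amp; go</p>")` yields `Use a < b & go`.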

How does sortBy=relevance work? relevance is only valid for mode=searchQuestions — it ranks results by how closely they match the search query. For other modes, votes is used.

What happens if I exceed the quota? The actor watches the quota_remaining and backoff fields in each API response and pauses for the requested number of seconds when a backoff is present. Once the daily quota is exhausted, the run returns fewer results than requested; pass an apiKey to raise the cap to 10,000/day.
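In Stack Exchange API v2.3, `quota_remaining`, `backoff`, `has_more`, and `items` are fields of the JSON wrapper returned with every response. A paging loop that respects them might look like the sketch below. It assumes a caller-supplied `fetch_page(page)` callable (hypothetical) that returns the parsed wrapper as a dict; how the actor itself structures this loop is not documented here.

```python
import time


def pause_seconds(wrapper):
    """Seconds to wait before the next request, or None if the quota is spent."""
    if wrapper.get("quota_remaining", 1) <= 0:
        return None
    return float(wrapper.get("backoff", 0))


def fetch_all(fetch_page, max_items, sleep=time.sleep):
    """Collect up to max_items items, honoring backoff and stopping on quota exhaustion."""
    items, page = [], 1
    while len(items) < max_items:
        wrapper = fetch_page(page)
        items.extend(wrapper.get("items", []))
        delay = pause_seconds(wrapper)
        if delay is None or not wrapper.get("has_more"):
            break  # quota exhausted or no further pages: return what we have
        if delay:
            sleep(delay)
        page += 1
    return items[:max_items]
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.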

Can I scrape multiple tags at once? Yes — pass multiple values in the tags array. Stack Exchange treats them as an AND filter (questions must have all specified tags).
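At the API level, the AND behavior comes from the `tagged` parameter of the `/questions` route, which takes a semicolon-delimited tag list. The sketch below builds such a request URL; `questions_by_tags_url` is a hypothetical helper, and whether the actor calls this exact route with these parameters is an assumption.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.3"


def questions_by_tags_url(tags, site="stackoverflow", sort="votes", pagesize=25, key=None):
    """Build a /questions URL that matches questions carrying ALL of the given tags."""
    params = {
        "tagged": ";".join(tags),  # semicolon-delimited list means AND
        "site": site,
        "sort": sort,
        "order": "desc",
        "pagesize": pagesize,
    }
    if key:
        params["key"] = key  # optional API key for the higher daily quota
    return f"{API_ROOT}/questions?{urlencode(params)}"
```

Note that `urlencode` percent-encodes the semicolon, so `["python", "asyncio"]` appears in the URL as `tagged=python%3Basyncio`.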

Is the data real-time? Yes. Stack Exchange's API surfaces new questions within seconds of posting.