Stack Exchange Questions Scraper

Search and scrape questions across Stack Overflow and every Stack Exchange site by tag, search query, or user. Returns title, body, tags, score, view count, answer count, accepted answer ID, asker, and timestamps. Built on the Stack Exchange v2.3 API.

Developer: DevilScrapes (maintained by Community)

🎯 What this scrapes

Stack Exchange's API (api.stackexchange.com/2.3) covers every site in the network — Stack Overflow, Server Fault, Super User, Cross Validated, plus 170+ topic sites. This Actor wraps the questions endpoint, paginates safely (respecting the backoff field), and writes one row per question with body, tags, and key metadata.
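The pagination described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: `iter_questions` and `fetch_page` are hypothetical names, and the only things taken from the Stack Exchange API are the `items`, `has_more`, and `backoff` fields of its response wrapper.

```python
import time
from typing import Callable, Iterator


def iter_questions(
    fetch_page: Callable[[int], dict],
    max_results: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield question items page by page until max_results or the last page.

    fetch_page(page) is a hypothetical callable that returns one decoded
    API response wrapper for the given 1-based page number.
    """
    page, yielded = 1, 0
    while yielded < max_results:
        payload = fetch_page(page)
        for item in payload.get("items", []):
            if yielded >= max_results:
                return
            yield item
            yielded += 1
        if not payload.get("has_more"):
            return
        # When present, "backoff" is the number of seconds the API asks
        # the client to wait before calling the same method again.
        backoff = payload.get("backoff")
        if backoff:
            sleep(backoff)
        page += 1
```

Passing the sleep function in as a parameter keeps the loop easy to test without real waiting.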

🔥 What we handle for you

  • 🛡️ Browser fingerprint rotation: curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
  • 🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per page, Retry-After honoured.
  • 🧱 Rate-limit-aware pacing — when the target pushes back, we slow down instead of getting banned.
  • 🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
  • 💰 Pay-Per-Event pricing — you only pay for results that hit your dataset. No data, no charge.
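The retry policy in the list above (408 / 429 / 5xx, up to 5 attempts, Retry-After honoured) could look something like the sketch below. The function names, the 1-second base delay, and the 60-second cap are assumptions made for illustration; the Actor's real values are not documented here.

```python
from typing import Optional

MAX_ATTEMPTS = 5  # from the list above: up to 5 attempts per page


def is_retryable(status: int) -> bool:
    """True for the status codes the list above names: 408, 429, and 5xx."""
    return status in (408, 429) or 500 <= status <= 599


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,   # assumed base delay in seconds
    cap: float = 60.0,   # assumed upper bound in seconds
) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    A Retry-After header given in whole seconds wins over the computed delay.
    The HTTP-date form of Retry-After is not handled in this sketch.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    return min(cap, base * 2 ** (attempt - 1))
```

With these assumed defaults, the waits between attempts grow as 1, 2, 4, 8 seconds unless the server supplies its own Retry-After value.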

💡 Use cases

  • Support-knowledge mining: pull the 100 highest-voted questions for your product's tag.
  • Trend monitoring: run daily diffs on the hot sort to spot emerging issues.
  • Help-center seed: feed question bodies into a RAG store for an internal documentation bot.
  • Recruiter outreach: extract askers from the react tag, then score them by reputation (looked up via the API's users endpoint).

⚙️ How to use it

  1. Click Try for free at the top of the page.
  2. Fill in the input form — most fields have sensible defaults.
  3. Click Start. Output streams into the run's dataset.
  4. Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
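For step 4, fetching via the API usually means calling Apify's dataset items endpoint. The helper below is a hypothetical sketch that only builds the URL; it assumes you already know the run's dataset ID, and that a private dataset additionally needs your Apify API token.

```python
from typing import Optional
from urllib.parse import urlencode


def dataset_items_url(dataset_id: str, fmt: str = "json", limit: Optional[int] = None) -> str:
    """Build the URL for downloading a dataset's items in the given format."""
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

For example, `dataset_items_url("abc123", "csv", 10)` points at the first 10 rows of a dataset with the placeholder ID `abc123`, exported as CSV.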

📥 Input

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| site | string | no | "stackoverflow" | Site slug, e.g. stackoverflow, superuser, serverfault, askubuntu, <c |
| mode | string | no | "tagged" | How to find questions. |
| tags | array | no | ["python"] | Tags to filter by. Multiple tags = OR by default; use one to keep it tight. |
| searchQuery | string | no | none | Free-text search query. |
| userId | integer | no | none | Numeric Stack Exchange user id. |
| sortBy | string | no | "activity" | Stack Exchange API sort param. |
| maxResults | integer | no | 30 | Max items across pages. API caps page size at 100. |
| includeBody | boolean | no | true | Request filter=withbody to include the full question body. Slightly bigger payload. |
| apiKey | string | no | none | Get one at stackapps.com. Lifts the daily quota from 300 to 10 000 requests. |
| proxyConfiguration | object | no | {"useApifyProxy": false} | Optional. The API is friendly to direct clients. |
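To show how these fields might translate into a Stack Exchange API call, here is a rough sketch. It is an illustration under assumptions, not the Actor's implementation: `build_request` is a hypothetical helper, and the mode values "search" and "user" are guesses (only "tagged" appears in this page). The endpoints and query parameters themselves (`/questions`, `/search/advanced`, `/users/{id}/questions`, `site`, `sort`, `order`, `pagesize`, `tagged`, `q`, `filter`, `key`) are part of the v2.3 API.

```python
from typing import Tuple

API_ROOT = "https://api.stackexchange.com/2.3"


def build_request(actor_input: dict) -> Tuple[str, dict]:
    """Map Actor input to an (endpoint URL, query params) pair."""
    params = {
        "site": actor_input.get("site", "stackoverflow"),
        "sort": actor_input.get("sortBy", "activity"),
        "order": "desc",
        # The API caps page size at 100.
        "pagesize": min(actor_input.get("maxResults", 30), 100),
    }
    if actor_input.get("includeBody", True):
        params["filter"] = "withbody"
    if actor_input.get("apiKey"):
        params["key"] = actor_input["apiKey"]

    mode = actor_input.get("mode", "tagged")
    if mode == "tagged":
        # The API expects a semicolon-delimited tag list.
        params["tagged"] = ";".join(actor_input.get("tags", ["python"]))
        return f"{API_ROOT}/questions", params
    if mode == "search":  # assumed mode name
        params["q"] = actor_input["searchQuery"]
        return f"{API_ROOT}/search/advanced", params
    if mode == "user":  # assumed mode name
        return f"{API_ROOT}/users/{actor_input['userId']}/questions", params
    raise ValueError(f"unsupported mode: {mode!r}")
```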

Example input

{
    "site": "stackoverflow",
    "mode": "tagged",
    "tags": [
        "python"
    ],
    "sortBy": "votes",
    "maxResults": 3,
    "includeBody": false,
    "proxyConfiguration": {
        "useApifyProxy": false
    }
}

📤 Output

Every row is one dataset item.

| Field | Type | Notes |
| --- | --- | --- |
| question_id | integer | Stack Exchange question id. |
| site | string | Site slug the question came from. |
| title | string | Question title. |
| body_html | string or null | Question body in HTML (when includeBody=true). |
| tags | array | Tags applied to the question. |
| score | integer | Net score (upvotes minus downvotes). |
| view_count | integer | Question views. |
| answer_count | integer | Number of answers. |
| is_answered | boolean | Has an accepted answer or any positive-score answer. |
| accepted_answer_id | integer or null | Accepted answer id, when present. |
| link | string | Canonical question URL. |
| owner_user_id | integer or null | Asker user id. |
| owner_display_name | string or null | Asker display name. |
| creation_date | integer | Unix timestamp of creation. |
| last_activity_date | integer | Unix timestamp of last activity. |
| posted_at | string | ISO-8601 UTC derived from creation_date. |
| scraped_at | string | When this row was recorded. |
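As a rough sketch of how a raw API question object could be mapped to the row described above, including the posted_at derivation and the nullable fields: `to_row` and `to_iso_utc` are hypothetical helpers, and the exact ISO-8601 spelling the Actor emits (for example `+00:00` versus `Z`) is an assumption.

```python
from datetime import datetime, timezone


def to_iso_utc(unix_seconds: int) -> str:
    """Convert a Unix timestamp to an ISO-8601 string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()


def to_row(item: dict, site: str) -> dict:
    """Flatten one API question object into a dataset row."""
    owner = item.get("owner") or {}  # may lack user fields for deleted users
    return {
        "question_id": item["question_id"],
        "site": site,
        "title": item["title"],
        "body_html": item.get("body"),  # only present with filter=withbody
        "tags": item.get("tags", []),
        "score": item["score"],
        "view_count": item["view_count"],
        "answer_count": item["answer_count"],
        "is_answered": item["is_answered"],
        "accepted_answer_id": item.get("accepted_answer_id"),
        "link": item["link"],
        "owner_user_id": owner.get("user_id"),
        "owner_display_name": owner.get("display_name"),
        "creation_date": item["creation_date"],
        "last_activity_date": item["last_activity_date"],
        "posted_at": to_iso_utc(item["creation_date"]),
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }
```

Using `dict.get` for the optional fields is what turns a missing owner, body, or accepted answer into null in the exported row.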

Example output

{
    "question_id": 1234567,
    "site": "stackoverflow",
    "title": "How do I close a connection cleanly in asyncio?",
    "tags": [
        "python",
        "asyncio"
    ],
    "score": 142,
    "answer_count": 3,
    "link": "https://stackoverflow.com/questions/1234567/..."
}

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
| --- | --- | --- |
| actor-start | $0.005 | One-off warm-up charge per run |
| result | $0.0015 | Per dataset item |

Example: 1 000 results cost 1 000 × $0.0015 = $1.50, plus the $0.005 start charge, so about $1.51 for the run. No subscription, no minimum, no card to start. Apify gives every new account $5 of free credit.
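The arithmetic can be written as a small estimator. The rates come from the pricing table above; the function name is hypothetical, and actual billing is whatever Apify records for the run.

```python
ACTOR_START_USD = 0.005   # one-off charge per run
RESULT_USD = 0.0015       # charge per dataset item


def estimate_run_cost(results: int) -> float:
    """Estimated USD cost of one run that writes `results` dataset items."""
    return round(ACTOR_START_USD + results * RESULT_USD, 4)
```

At these rates a 1 000-result run comes to $1.505, and a run that writes no items costs only the $0.005 start charge.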

🚧 Limitations

Comments, voting graph, and revisions are not in scope. Search ranking is the API's, which differs from the website's UI search.

❓ FAQ

Why is the API quota so low?

Stack Exchange caps anonymous usage at 300 requests/day. Sign up for a key and you get 10 000.

Can I get answers too?

Out of scope here — see the sibling Actor stackexchange-answers-scraper (planned).

What about voting / posting?

We do not write to Stack Exchange. Read-only API access only.

Why are some user fields null?

Some questions are asked by deleted users — the API surfaces null.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.