Stack Exchange Questions Scraper
Pricing
Pay per event
Search and scrape questions across Stack Overflow and every Stack Exchange site — by tag, search query, or user. Returns title, body, tags, score, view count, answer count, accepted answer, asker, and timestamps. Built on the Stack Exchange v2.3 API.
Developer: DevilScrapes (maintained by Community)
🎯 What this scrapes
Stack Exchange's API (api.stackexchange.com/2.3) covers every site in the network — Stack Overflow, Server Fault, Super User, Cross Validated, plus 170+ topic sites. This Actor wraps the questions endpoint, paginates safely (respecting the backoff field), and writes one row per question with body, tags, and key metadata.
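The pagination described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: `iter_questions` and `fetch_page` are hypothetical names, and `fetch_page` is assumed to GET one page from the questions endpoint and return the decoded JSON wrapper, whose `items`, `has_more`, and `backoff` fields come from the Stack Exchange API.

```python
import time
from typing import Callable, Dict, Iterator


def iter_questions(
    fetch_page: Callable[[int], Dict],
    max_results: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield up to max_results questions, honouring the API's backoff field."""
    page = 1
    yielded = 0
    while yielded < max_results:
        data = fetch_page(page)  # hypothetical callable: page number -> JSON wrapper
        items = data.get("items", [])
        if not items:
            break  # nothing returned, stop rather than loop forever
        for item in items:
            if yielded >= max_results:
                return
            yield item
            yielded += 1
        if not data.get("has_more"):
            break
        # `backoff` is the number of seconds the API asks us to wait
        # before calling the same method again.
        if data.get("backoff"):
            sleep(data["backoff"])
        page += 1
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting.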
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy (when enabled in `proxyConfiguration`): fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: you pay per result that lands in your dataset, plus a small start fee per run.
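The retry behaviour listed above might look something like this sketch. It assumes a hypothetical `send` callable that performs one request and returns `(status, headers, body)`. The base delay and cap are illustrative values, and jitter is left out to keep the example short. Only the seconds form of `Retry-After` is handled here.

```python
import time

MAX_ATTEMPTS = 5  # "up to 5 attempts per page"


def is_retryable(status):
    """408, 429 and any 5xx status are worth retrying."""
    return status in (408, 429) or 500 <= status <= 599


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait after a failed attempt (1-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # server hint wins
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    return min(cap, base * 2 ** (attempt - 1))  # 1s, 2s, 4s, 8s, ...


def fetch_with_retries(send, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, headers, body = send()
        if not is_retryable(status):
            return status, body
        if attempt < MAX_ATTEMPTS:
            sleep(retry_delay(attempt, headers.get("Retry-After")))
    return status, body  # last response after exhausting all attempts
```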
💡 Use cases
- Support-knowledge mining: pull the top 100 voted Q&A for your product's tag.
- Trend monitoring: daily diffs on the `hot` sort to see emergent issues.
- Help-center seed: feed Q&A bodies into a RAG store for an internal documentation bot.
- Recruiter outreach: extract askers from a `react` tag, score by reputation (via user endpoint).
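For the trend-monitoring case, a daily diff can be as small as comparing question ids between two exported datasets. The helper below is a hypothetical sketch that assumes each row carries the `question_id` field from the output schema.

```python
def new_question_ids(yesterday_rows, today_rows):
    """Return ids present in today's run but not in yesterday's, in today's order."""
    seen = {row["question_id"] for row in yesterday_rows}
    return [row["question_id"] for row in today_rows if row["question_id"] not in seen]
```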
⚙️ How to use it
- Click Try for free at the top of the page.
- Fill in the input form — most fields have sensible defaults.
- Click Start. Output streams into the run's dataset.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
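For the API route in the last step, a dataset's items are served by the Apify API under `/v2/datasets/{datasetId}/items`. The helper below only builds the URL. `dataset_items_url` is a hypothetical name, and the dataset id and token in the usage line are placeholders.

```python
from urllib.parse import urlencode


def dataset_items_url(dataset_id, fmt="json", token=None):
    """Build the Apify API URL that returns a dataset's items in the given format."""
    query = {"format": fmt}  # e.g. "json", "csv" or "xlsx"
    if token:
        query["token"] = token  # needed for private datasets
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(query)}"


# Placeholder id; use the default dataset id of your own run.
print(dataset_items_url("YOUR_DATASET_ID", fmt="csv"))
```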
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `site` | string | no | `"stackoverflow"` | Site slug, e.g. stackoverflow, superuser, serverfault, askubuntu, <c |
| `mode` | string | no | `"tagged"` | How to find questions: by tag, search query, or user. |
| `tags` | array | no | `["python"]` | Tags to filter by. Multiple tags = OR by default; use one to keep it tight. |
| `searchQuery` | string | no | none | Free-text search query. |
| `userId` | integer | no | none | Numeric Stack Exchange user id. |
| `sortBy` | string | no | `"activity"` | Stack Exchange API sort param. |
| `maxResults` | integer | no | `30` | Maximum number of items across all pages. The API caps page size at 100. |
| `includeBody` | boolean | no | `true` | Requests `filter=withbody` to include the full question body. Slightly bigger payload. |
| `apiKey` | string | no | none | Get one at stackapps.com. Lifts the daily quota from 300 to 10 000 requests. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | Optional. The API is friendly to direct clients. |
Example input
{"site": "stackoverflow","mode": "tagged","tags": ["python"],"sortBy": "votes","maxResults": 3,"includeBody": false,"proxyConfiguration": {"useApifyProxy": false}}
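As a rough illustration of how an input like the one above could translate into query parameters for the questions endpoint, consider the sketch below. The mapping is an assumption, not the Actor's confirmed behaviour. `build_params` is a hypothetical helper, while `site`, `sort`, `order`, `page`, `pagesize`, `tagged`, `filter`, and `key` are parameters of the Stack Exchange API. How multiple tags combine is not modelled here.

```python
from urllib.parse import urlencode

API_URL = "https://api.stackexchange.com/2.3/questions"


def build_params(actor_input, page=1):
    """Map Actor input fields to Stack Exchange query parameters (assumed mapping)."""
    max_results = int(actor_input.get("maxResults", 30))
    params = {
        "site": actor_input.get("site", "stackoverflow"),
        "sort": actor_input.get("sortBy", "activity"),
        "order": "desc",
        "page": page,
        "pagesize": min(100, max_results),  # the API caps page size at 100
    }
    tags = actor_input.get("tags") or []
    if actor_input.get("mode", "tagged") == "tagged" and tags:
        params["tagged"] = ";".join(tags)  # the API takes a semicolon-delimited list
    if actor_input.get("includeBody", True):
        params["filter"] = "withbody"
    if actor_input.get("apiKey"):
        params["key"] = actor_input["apiKey"]
    return params


example = {"site": "stackoverflow", "mode": "tagged", "tags": ["python"],
           "sortBy": "votes", "maxResults": 3, "includeBody": False}
print(f"{API_URL}?{urlencode(build_params(example))}")
```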
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| `question_id` | integer | Stack Exchange question id. |
| `site` | string | Site slug the question came from. |
| `title` | string | Question title. |
| `body_html` | string or null | Question body in HTML (when `includeBody=true`). |
| `tags` | array | Tags applied to the question. |
| `score` | integer | Net score (upvotes minus downvotes). |
| `view_count` | integer | Question views. |
| `answer_count` | integer | Number of answers. |
| `is_answered` | boolean | Has an accepted answer or any positive-score answer. |
| `accepted_answer_id` | integer or null | Accepted answer id, when present. |
| `link` | string | Canonical question URL. |
| `owner_user_id` | integer or null | Asker user id. |
| `owner_display_name` | string or null | Asker display name. |
| `creation_date` | integer | Unix timestamp of creation. |
| `last_activity_date` | integer | Unix timestamp of last activity. |
| `posted_at` | string | ISO-8601 UTC derived from `creation_date`. |
| `scraped_at` | string | When this row was recorded. |
Example output
{"question_id": 1234567,"site": "stackoverflow","title": "How do I close a connection cleanly in asyncio?","tags": ["python","asyncio"],"score": 142,"answer_count": 3,"link": "https://stackoverflow.com/questions/1234567/..."}
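The `posted_at` field is described as an ISO-8601 UTC string derived from `creation_date`. A conversion along these lines would produce one, though the Actor's exact formatting (for example `Z` versus `+00:00`) is not documented here, so treat the output shape as an assumption. `to_iso_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_iso_utc(unix_seconds):
    """Convert a Unix timestamp in seconds to an ISO-8601 string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()


print(to_iso_utc(0))  # 1970-01-01T00:00:00+00:00
```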
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.0015 | Per dataset item |
Example: a single run returning 1 000 results costs 1 000 × $0.0015 + $0.005 for the start event = $1.505, or about $1.50. No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
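To estimate a bill before running, the two event prices from the table can be combined as below. `run_cost` is a hypothetical helper. `Decimal` is used so the small per-event amounts add up without floating-point rounding.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")  # charged once per run
PER_RESULT = Decimal("0.0015")  # charged per dataset item


def run_cost(results, runs=1):
    """Estimated USD cost for `results` dataset items spread over `runs` runs."""
    return ACTOR_START * runs + PER_RESULT * results


print(run_cost(1000))  # 1.5050
```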
🚧 Limitations
Comments, voting graph, and revisions are not in scope. Search ranking is the API's, which differs from the website's UI search.
❓ FAQ
Why is the API quota so low?
Stack Exchange caps anonymous usage at 300 requests/day. Sign up for a key and you get 10 000.
Can I get answers too?
Out of scope here — see the sibling Actor stackexchange-answers-scraper (planned).
What about voting / posting?
We do not write to Stack Exchange. Read-only API access only.
Why are some user fields null?
Some questions are asked by deleted users — the API surfaces null.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.