Stack Exchange Questions Scraper

Pricing: from $0.02 / 1,000 saved questions

Collect public Stack Overflow and Stack Exchange questions by site, tag, keyword, date, score, and answers for SEO, DevRel, product, and support research.


Developer: Hanna Nosova

Maintained by Community


Stack Exchange and Stack Overflow Questions Scraper

Collect public Stack Overflow and Stack Exchange questions by site, tag, keyword, date range, and score.

Use this actor to turn public developer Q&A activity into a clean dataset for SEO, product research, support insights, competitive monitoring, and developer-relations planning.


What does Stack Exchange and Stack Overflow Questions Scraper do?

Stack Exchange Questions Scraper extracts public question metadata from Stack Overflow and other Stack Exchange communities.

It returns structured rows with question title, URL, tags, author, score, answer count, view count, answer status, dates, and license information.

You can search one site or several sites in the same run.

You can filter by tags, free-text query, date range, and minimum score.

You can also enable answer summaries when you need more context around which questions received useful responses.


Who is it for?

SEO and content teams

Find real questions people ask about programming languages, frameworks, libraries, tools, and errors.

Developer-relations teams

Track community pain points around your SDK, API, product, integration, or competitor.

Product managers

Discover recurring feature requests, confusion, migration problems, and adoption blockers.

Support teams

Monitor public troubleshooting topics and prepare proactive documentation.

Market researchers

Compare tag activity across technologies, products, ecosystems, and communities.


Why use this actor?

  • ✅ Search public Stack Overflow and Stack Exchange questions without building your own collector
  • ✅ Export clean JSON, CSV, Excel, HTML, XML, or RSS from Apify datasets
  • ✅ Run scheduled monitoring jobs for tags and keywords
  • ✅ Combine multiple Stack Exchange sites in one run
  • ✅ Use date and score filters to focus on useful questions
  • ✅ Integrate results into spreadsheets, BI tools, notebooks, or automations

What data can you extract?

| Field | Description |
| --- | --- |
| site | Stack Exchange site name used for the question |
| questionId | Stack Exchange question identifier |
| title | Question title |
| url | Public question URL |
| tags | Question tags |
| ownerDisplayName | Display name of the question owner when available |
| ownerUrl | Public profile URL when available |
| score | Question score |
| answerCount | Number of answers |
| viewCount | Number of views |
| isAnswered | Whether the question is considered answered |
| acceptedAnswerId | Accepted answer ID when present |
| creationDate | Question creation date |
| lastActivityDate | Last activity date |
| lastEditDate | Last edit date when present |
| contentLicense | Content license shown by Stack Exchange |
| answers | Optional answer summaries when enabled |
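
If you process rows in Python, a typed description of the fields above can help catch schema drift early. The sketch below is an assumption, not part of the actor: the value types are inferred from the example output in this README, and `QuestionRow`, `REQUIRED_FIELDS`, and `missing_fields` are hypothetical names.

```python
from typing import List, Optional, TypedDict


class QuestionRow(TypedDict, total=False):
    """Hypothetical row type; field names follow the table above."""
    site: str
    questionId: int
    title: str
    url: str
    tags: List[str]
    ownerDisplayName: Optional[str]
    ownerUrl: Optional[str]
    score: int
    answerCount: int
    viewCount: int
    isAnswered: bool
    acceptedAnswerId: Optional[int]
    creationDate: str
    lastActivityDate: str
    lastEditDate: Optional[str]
    contentLicense: str
    answers: list  # only present when answer summaries are enabled


# Assumption: these four fields are treated as mandatory by this sketch only.
REQUIRED_FIELDS = ("site", "questionId", "title", "url")


def missing_fields(row: dict) -> List[str]:
    """Return the assumed-required field names that are absent from a row."""
    return [name for name in REQUIRED_FIELDS if name not in row]
```

Calling `missing_fields(row)` on each dataset item before loading it into a spreadsheet or warehouse gives a quick signal when a row is incomplete.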

How much does it cost to scrape Stack Exchange questions?

Pricing is pay-per-event.

Each run incurs a $0.005 run-start charge plus a per-question charge for every saved question row.

| Tier | Price per saved question |
| --- | --- |
| FREE | $0.000034022 |
| BRONZE | $0.000029585 |
| SILVER | $0.000023076 |
| GOLD | $0.000017751 |
| PLATINUM | $0.000011834 |
| DIAMOND | $0.00001 |
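
To budget a run, you can combine the run-start charge with the per-question price for your tier. The following is a rough sketch that assumes these two events are the only charges; `estimate_run_cost` is a hypothetical helper, and the prices are copied from the tier list above.

```python
RUN_START_USD = 0.005

# Per-question prices copied from the tier list above.
PRICE_PER_QUESTION_USD = {
    "FREE": 0.000034022,
    "BRONZE": 0.000029585,
    "SILVER": 0.000023076,
    "GOLD": 0.000017751,
    "PLATINUM": 0.000011834,
    "DIAMOND": 0.00001,
}


def estimate_run_cost(saved_questions: int, tier: str = "FREE") -> float:
    """Estimate the charge in USD for one run that saves `saved_questions` rows."""
    if saved_questions < 0:
        raise ValueError("saved_questions must not be negative")
    if tier not in PRICE_PER_QUESTION_USD:
        raise ValueError(f"unknown tier: {tier!r}")
    return RUN_START_USD + saved_questions * PRICE_PER_QUESTION_USD[tier]
```

For the default 20-question test run on the FREE tier this works out to roughly $0.0057, almost all of which is the run-start charge.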

The default prefilled run saves only 20 questions so you can test the actor cheaply.

For large monitoring jobs, set maxItems to the volume you need and export the dataset after the run finishes.


How to scrape Stack Exchange questions

  1. Open the actor on Apify.
  2. Enter one or more Stack Exchange site names, for example stackoverflow or serverfault.
  3. Add one or more tags, such as python, react, or docker.
  4. Optionally add a keyword query like timeout error.
  5. Choose a sort order.
  6. Set maxItems.
  7. Run the actor.
  8. Download your dataset from the Storage tab.

Input options

sites

A list of Stack Exchange site names.

Examples:

  • stackoverflow
  • serverfault
  • superuser
  • askubuntu
  • datascience
  • math

tags

A list of tags to filter questions.

Examples:

  • python
  • javascript
  • reactjs
  • docker
  • kubernetes

query

Optional keyword search text.

Use this for phrases, errors, product names, or broad topics.

sort

Supported values:

  • activity
  • votes
  • creation
  • hot
  • week
  • month
  • relevance

fromDate and toDate

Optional date filters in ISO format, such as 2026-01-01.

minScore

Minimum score for saved questions.

Use 0 to exclude questions with a negative score.

includeAnswers

Set to true to add answer summaries to each question row.
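
If you build inputs programmatically, it can help to validate them before starting a run. The sketch below is an assumption about sensible client-side checks, not the actor's own validation: `build_input` is a hypothetical helper, the default sort of activity is taken from the example input in this README, and requiring at least one site is a guess.

```python
from datetime import date
from typing import Any, Dict, List, Optional

# Sort values listed in the input options above.
SUPPORTED_SORTS = {"activity", "votes", "creation", "hot", "week", "month", "relevance"}


def build_input(
    sites: List[str],
    tags: Optional[List[str]] = None,
    query: Optional[str] = None,
    sort: str = "activity",
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
    min_score: Optional[int] = None,
    include_answers: bool = False,
    max_items: int = 20,
) -> Dict[str, Any]:
    """Assemble an input dict, raising ValueError on obviously bad values."""
    if not sites:
        raise ValueError("at least one site name is required (assumption)")
    if sort not in SUPPORTED_SORTS:
        raise ValueError(f"unsupported sort: {sort!r}")
    # date.fromisoformat raises ValueError for strings that are not valid ISO dates.
    start = date.fromisoformat(from_date) if from_date else None
    end = date.fromisoformat(to_date) if to_date else None
    if start and end and start > end:
        raise ValueError("fromDate must not be later than toDate")

    run_input: Dict[str, Any] = {
        "sites": list(sites),
        "sort": sort,
        "maxItems": max_items,
        "includeAnswers": include_answers,
    }
    # Only include optional filters that were actually provided.
    if tags:
        run_input["tags"] = list(tags)
    if query:
        run_input["query"] = query
    if from_date:
        run_input["fromDate"] = from_date
    if to_date:
        run_input["toDate"] = to_date
    if min_score is not None:
        run_input["minScore"] = min_score
    return run_input
```

The returned dict can be passed as the run input in the API examples later in this README.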


Example input

{
    "sites": ["stackoverflow"],
    "tags": ["python"],
    "query": "pandas dataframe",
    "sort": "activity",
    "maxItems": 20,
    "includeAnswers": false,
    "minScore": 0
}

Example output

{
    "site": "stackoverflow",
    "questionId": 123456,
    "title": "How do I filter rows in a pandas DataFrame?",
    "url": "https://stackoverflow.com/questions/123456/example",
    "tags": ["python", "pandas", "dataframe"],
    "ownerDisplayName": "developer123",
    "ownerUrl": "https://stackoverflow.com/users/123/developer123",
    "score": 42,
    "answerCount": 3,
    "viewCount": 10000,
    "isAnswered": true,
    "acceptedAnswerId": 123457,
    "creationDate": "2026-01-01T12:00:00.000Z",
    "lastActivityDate": "2026-01-03T09:30:00.000Z",
    "lastEditDate": "2026-01-02T08:15:00.000Z",
    "contentLicense": "CC BY-SA 4.0"
}
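
The date fields in the example output are ISO 8601 strings with a trailing Z. A minimal sketch for turning them into timezone-aware datetimes follows; it assumes every row uses the same format as the example, and `parse_timestamp` and `question_age_days` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as '2026-01-01T12:00:00.000Z' into an aware datetime."""
    if not value:
        return None
    # Older Python versions do not accept a bare 'Z', so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def question_age_days(row: dict, now: Optional[datetime] = None) -> Optional[float]:
    """Return the age of a question in days, or None if creationDate is missing."""
    created = parse_timestamp(row.get("creationDate"))
    if created is None:
        return None
    now = now or datetime.now(timezone.utc)
    return (now - created).total_seconds() / 86400
```

Parsed dates make it straightforward to sort rows by recency or to bucket them by week in a notebook.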

Tips for better results

  • Use exact product or library tags when they exist.
  • Use votes to find high-signal evergreen questions.
  • Use creation to monitor new questions.
  • Use activity for recently updated discussions.
  • Use minScore to reduce noisy rows.
  • Keep first test runs small, then scale up after reviewing output.
  • Use multiple sites when researching topics that span admin, devops, and programming communities.

Common workflows

Content gap research

Run tags like python, pandas, or reactjs and sort by votes.

Use top questions to plan tutorials, FAQs, and comparison pages.

Support trend monitoring

Schedule a daily run for your product tag or company keyword.

Send new rows to Slack, Airtable, Google Sheets, or a help-center backlog.
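
For a scheduled monitor, you usually want to forward only questions you have not seen in an earlier run. One possible approach, sketched below, keys each row on the pair of site and questionId, because Stack Exchange question IDs are unique per site rather than across the network. `new_rows` is a hypothetical helper, and persisting the `seen` set between runs is left to you.

```python
from typing import Iterable, List, Set, Tuple

RowKey = Tuple[str, int]


def new_rows(rows: Iterable[dict], seen: Set[RowKey]) -> List[dict]:
    """Return rows whose (site, questionId) key is not in `seen`, and record them."""
    fresh = []
    for row in rows:
        key = (row.get("site"), row.get("questionId"))
        if key in seen:
            continue
        seen.add(key)
        fresh.append(row)
    return fresh
```

Only the rows returned by `new_rows` then need to be sent on to your alerting or backlog tool.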

Competitive research

Search competitor product names and integration tags.

Group questions by score, views, answer count, and recency.

DevRel topic discovery

Monitor language and framework tags to find what developers are struggling with this week.
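
The workflows above mostly come down to counting and ranking rows. As an illustration only, the sketch below counts tag frequency and lists unanswered questions by view count; `top_tags` and `unanswered_by_views` are hypothetical helpers that rely on the tags, isAnswered, and viewCount fields described earlier.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_tags(rows: Iterable[dict], limit: int = 10) -> List[Tuple[str, int]]:
    """Count how often each tag appears across rows and return the most common."""
    counts = Counter(tag for row in rows for tag in (row.get("tags") or []))
    return counts.most_common(limit)


def unanswered_by_views(rows: Iterable[dict]) -> List[dict]:
    """Return rows not marked as answered, most-viewed first."""
    open_rows = [row for row in rows if not row.get("isAnswered")]
    return sorted(open_rows, key=lambda row: row.get("viewCount") or 0, reverse=True)
```

High-view unanswered questions are often good candidates for new documentation or tutorials, while tag counts show which neighbouring topics appear alongside your product tag.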


Integrations

You can connect the dataset to:

  • Google Sheets for editorial planning
  • Airtable for research queues
  • Notion for content briefs
  • Slack for alerting
  • BigQuery or Snowflake for trend analysis
  • Zapier or Make for no-code automations
  • Webhooks for custom pipelines

Use as a public Stack Exchange API workflow

Use this actor as a repeatable workflow over public Stack Exchange and Stack Overflow data for tag monitoring, keyword research, content-gap analysis, DevRel planning, and support trend tracking. It relies on publicly available Stack Exchange data, respects platform limits, and does not access private account data.


API usage

Node.js

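import { ApifyClient } from 'apify-client';

// Authenticate with the API token stored in the APIFY_TOKEN environment variable.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('fetch_cat/stack-exchange-questions-scraper').call({
    sites: ['stackoverflow'],
    tags: ['python'],
    query: 'pandas dataframe',
    maxItems: 20,
});

// Read the saved question rows from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);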

Python

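import os

from apify_client import ApifyClient

# Authenticate with the API token stored in the APIFY_TOKEN environment variable.
client = ApifyClient(os.environ['APIFY_TOKEN'])

# Start the actor and wait for the run to finish.
run = client.actor('fetch_cat/stack-exchange-questions-scraper').call(run_input={
    'sites': ['stackoverflow'],
    'tags': ['python'],
    'query': 'pandas dataframe',
    'maxItems': 25,
})

# Read the saved question rows from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)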

cURL

curl -X POST "https://api.apify.com/v2/acts/fetch_cat~stack-exchange-questions-scraper/runs?token=$APIFY_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"sites":["stackoverflow"],"tags":["python"],"maxItems":20}'

MCP integration

Use Apify MCP to run this actor from AI tools.

MCP URL:

https://mcp.apify.com/?tools=fetch_cat/stack-exchange-questions-scraper

Claude Code setup:

claude mcp add apify-stack-exchange-questions "https://mcp.apify.com/?tools=fetch_cat/stack-exchange-questions-scraper"

Claude Desktop JSON config example:

{
    "mcpServers": {
        "apify-stack-exchange-questions": {
            "url": "https://mcp.apify.com/?tools=fetch_cat/stack-exchange-questions-scraper"
        }
    }
}

After connecting, ask your AI assistant to run fetch_cat/stack-exchange-questions-scraper with the sites, tags, and filters you need.

Example prompts showing MCP usage:

  • "Use the fetch_cat/stack-exchange-questions-scraper MCP tool to find recent Stack Overflow questions about Python pandas performance and summarize common pain points."
  • "Use the Stack Exchange Questions Scraper MCP tool to monitor Docker questions with score above 2 and list topics our docs should cover."
  • "Run fetch_cat/stack-exchange-questions-scraper via MCP to compare activity for Kubernetes and Docker tags over the last month."

Data freshness

The actor collects currently available public question metadata at run time.

Use scheduled runs for ongoing monitoring.

Use date filters to build repeatable daily, weekly, or monthly snapshots.
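
For repeatable snapshots, you can compute the date window from the day the schedule fires. The sketch below assumes that fromDate and toDate are both inclusive, which this README does not state, so check a small run before relying on it. `snapshot_window` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Dict


def snapshot_window(run_day: date, days: int = 1) -> Dict[str, str]:
    """Return fromDate/toDate covering the `days` full days before `run_day`."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = run_day - timedelta(days=days)
    end = run_day - timedelta(days=1)  # assumption: toDate is inclusive
    return {"fromDate": start.isoformat(), "toDate": end.isoformat()}
```

A weekly schedule could merge `snapshot_window(date.today(), days=7)` into the rest of the input, so that each run covers the seven days before it without overlapping the previous run.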


Limits and behavior

Stack Exchange may return fewer questions than requested if the filters are narrow.

Some owner fields may be missing for deleted or anonymous accounts.

Some questions do not have an accepted answer.

Answer summaries require extra requests and may make runs take longer.
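
Because some fields can be missing, downstream code is easier to write if every row is given explicit defaults first. This is a minimal sketch under the assumption that missing fields are either absent or empty in the dataset; `with_defaults` and the derived `hasAcceptedAnswer` key are hypothetical and not produced by the actor.

```python
from typing import Any, Dict


def with_defaults(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a row with explicit values for fields that may be absent."""
    cleaned = dict(row)
    # Owner details may be missing for deleted or anonymous accounts.
    cleaned["ownerDisplayName"] = row.get("ownerDisplayName") or None
    cleaned["ownerUrl"] = row.get("ownerUrl") or None
    # Not every question has an accepted answer.
    cleaned["acceptedAnswerId"] = row.get("acceptedAnswerId")
    cleaned["hasAcceptedAnswer"] = row.get("acceptedAnswerId") is not None
    # Answer summaries only exist when includeAnswers was enabled.
    cleaned["answers"] = row.get("answers") or []
    return cleaned
```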


Legality and responsible use

This actor extracts publicly available question metadata.

You are responsible for using the data according to applicable laws, platform terms, and your own compliance requirements.

Respect attribution and license requirements for Stack Exchange content.

Do not use the output for spam, harassment, or abusive automation.


Troubleshooting

Why did I get fewer rows than maxItems?

Your filters may be too narrow, the selected site may have fewer matching questions, or the date range may be restrictive.

Try broadening tags, removing query, lowering minScore, or increasing the date range.

Why are owner fields empty?

Some public question records do not include full owner details, especially when an account was removed or anonymized.

Which site names should I use?

Use Stack Exchange API site names, such as stackoverflow, serverfault, superuser, askubuntu, math, or datascience.


FAQ

Can I scrape multiple sites at once?

Yes. Add several site names to sites; the actor will collect matching questions across them until maxItems is reached.

Can I scrape Stack Overflow questions by tag and keyword together?

Yes. Use query with tags to focus on a specific topic inside a Stack Overflow or Stack Exchange tag.

Can I include answers?

Yes. Enable includeAnswers to include answer summaries on each question row.

Can I run this actor on a schedule?

Yes. Use Apify schedules to run the same input daily, weekly, or monthly.

Can I export to CSV or Excel?

Yes. Apify datasets can be exported as JSON, CSV, Excel, XML, HTML, and RSS.
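
If you prefer to download an export from a script, the Apify API serves dataset items in a chosen format through a format query parameter. The sketch below only builds the URL; `dataset_export_url` is a hypothetical helper, and depending on your dataset's access settings you may also need to supply your API token with the request.

```python
from urllib.parse import urlencode

# Format identifiers for the export types listed above; Excel is requested as xlsx.
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml", "html", "rss"}


def dataset_export_url(dataset_id: str, fmt: str = "csv") -> str:
    """Build the dataset items URL for a given export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```

The dataset ID is the `defaultDatasetId` value returned by a run, as used in the API examples above.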


Changelog

0.1

Initial version with multi-site question collection, tag filters, keyword search, date filters, minimum score filtering, and optional answer summaries.