Pricing

Pay per event

Go to Apify Store

Stack Overflow Scraper

Try for free

Search and extract Stack Overflow questions with scores, answers, tags, view counts, and author info.

Pricing

Pay per event

Rating

0.0

(0)

Developer

Stas Persiianenko

Actor stats

Bookmarked

Total users

Monthly active users

a month ago

Last modified

What does Stack Overflow Scraper do?

Stack Overflow Scraper uses the StackExchange API to search and extract questions from Stack Overflow. For each question, it returns the title, vote score, answer count, view count, tags, creation date, and author details.

Sort results by relevance, votes, creation date, or recent activity. Filter by tags to narrow results to specific technologies.

Who is Stack Overflow Scraper for?

💻 Software developers researching solutions to programming problems
🤖 AI/ML engineers curating Q&A datasets for RAG pipelines or LLM training
📝 Technical writers identifying common developer pain points for documentation
📊 Developer advocates tracking community questions about their frameworks
🏢 Engineering managers analyzing technology trends and developer challenges
🎓 Educators building curated collections of programming exercises and explanations
📈 Market researchers studying technology adoption through developer questions

Why scrape Stack Overflow?

Stack Overflow has 23+ million questions covering every programming topic. Use cases include:

🔍 Developer research — find the most upvoted solutions for any programming problem
📊 Content analysis — study popular questions, trending topics, and technology adoption
📝 Documentation gaps — identify frequently asked questions to improve your docs
🤖 Training data — build datasets of programming Q&A for LLM fine-tuning, RAG pipelines, or AI coding assistants
🏁 Competitive analysis — track questions about your framework or library
👥 Hiring insights — analyze what technologies developers struggle with most

Data extraction fields

Default format

Field	Type	Description
`questionId`	number	Stack Overflow question ID
`title`	string	Question title
`score`	number	Net vote score (upvotes - downvotes)
`answerCount`	number	Number of answers
`viewCount`	number	Total view count
`isAnswered`	boolean	Whether the question has an upvoted answer
`hasAcceptedAnswer`	boolean	Whether the author accepted an answer
`tags`	string[]	Associated technology tags
`creationDate`	string	When the question was posted
`lastActivityDate`	string	Last edit or answer activity
`url`	string	Direct link to the question
`authorName`	string	Question author's display name
`authorReputation`	number	Author's reputation score
`authorUrl`	string	Author's profile URL
`scrapedAt`	string	ISO timestamp of extraction

LLM fine-tuning format (`jsonl-finetune`)

Field	Type	Description
`instruction`	string	The question title — used as the prompt/instruction
`context`	string	Tags, question score, and source URL as context
`response`	string	Plain-text answer (accepted or top-voted), code blocks preserved
`metadata.questionId`	number	Stack Overflow question ID
`metadata.url`	string	Source URL
`metadata.tags`	string[]	Technology tags
`metadata.questionScore`	number	Question vote score
`metadata.answerScore`	number	Answer vote score
`metadata.hasAcceptedAnswer`	boolean	Whether this is the accepted answer
`metadata.scrapedAt`	string	ISO timestamp of extraction

How much does it cost to scrape Stack Overflow?

Stack Overflow Scraper uses pay-per-event pricing:

Event	Price
Run started	$0.001
Question extracted	$0.001 per question

Example costs:

20 top React questions: ~$0.021
100 Python questions: ~$0.101
300 questions across 3 topics: ~$0.301

Platform costs are minimal. The StackExchange API is free (300 requests/day without API key).

How to scrape Stack Overflow questions

Go to Stack Overflow Scraper on Apify Store.
Enter one or more search keywords in the searchQueries field (e.g., react hooks, python asyncio).
Optionally filter by tags (e.g., javascript;react) and choose a sort order.
Set the maximum number of results per keyword.
Click Start and wait for results.
Download data as JSON, CSV, or Excel.

Input parameters

Parameter	Type	Description	Default
`searchQueries`	string[]	Keywords to search on Stack Overflow	Required
`tagged`	string	Filter by tags (semicolon-separated, e.g. `javascript;react`)	—
`sortBy`	string	Sort: `relevance`, `votes`, `creation`, `activity`	`relevance`
`maxResults`	integer	Maximum questions per keyword (1-300)	`50`
`minScore`	integer	Minimum question vote score threshold	`0` (all)
`outputFormat`	string	`default` (question metadata) or `jsonl-finetune` (Q&A pairs)	`default`

Input example — default format

{
  "searchQueries": ["react hooks", "python asyncio"],
  "sortBy": "votes",
  "maxResults": 20
}

Input example — LLM fine-tuning

{
  "searchQueries": ["python decorators", "javascript promises"],
  "sortBy": "votes",
  "maxResults": 100,
  "minScore": 50,
  "outputFormat": "jsonl-finetune"
}

Output example

Default format

Each question is returned as a JSON object:

{
  "questionId": 53219858,
  "title": "How to fix missing dependency warning when using useEffect React Hook",
  "score": 890,
  "answerCount": 26,
  "viewCount": 1252100,
  "isAnswered": true,
  "hasAcceptedAnswer": true,
  "tags": ["reactjs", "react-hooks", "eslint"],
  "creationDate": "2018-11-09T08:45:12.000Z",
  "lastActivityDate": "2026-01-15T12:30:00.000Z",
  "url": "https://stackoverflow.com/questions/53219858",
  "authorName": "Andru",
  "authorReputation": 5234,
  "authorUrl": "https://stackoverflow.com/users/123456/andru",
  "scrapedAt": "2026-03-03T05:02:00.000Z"
}

LLM fine-tuning format (`jsonl-finetune`)

Each item is an instruction/response pair ready for supervised fine-tuning (SFT):

{
  "instruction": "if/else in a list comprehension",
  "context": "This is a Stack Overflow question. Tags: python, list, if-statement, list-comprehension. Question score: 1748. Source: https://stackoverflow.com/questions/4260280",
  "response": "You can totally do that. It's just an ordering issue:\n\n```\n[f(x) if x is not None else '' for x in xs]\n```\n\nIn general:\n\n```\n[f(x) if condition else g(x) for x in sequence]\n```",
  "metadata": {
    "questionId": 4260280,
    "url": "https://stackoverflow.com/questions/4260280",
    "tags": ["python", "list", "if-statement", "list-comprehension"],
    "questionScore": 1748,
    "answerScore": 2923,
    "hasAcceptedAnswer": true,
    "scrapedAt": "2026-04-05T10:07:59.074Z"
  }
}

The instruction field is the question title. The response is the accepted answer (or highest-voted if no accepted answer), converted to plain text with code blocks preserved using Markdown fences. The context field provides grounding metadata. This format is compatible with Hugging Face datasets, OpenAI fine-tuning JSONL, and frameworks like Axolotl and LLaMA-Factory.

Using Stack Overflow data for LLM fine-tuning

Stack Overflow is one of the highest-quality public sources of programming Q&A available. With the jsonl-finetune output format, each result is an instruction/response pair you can feed directly into supervised fine-tuning (SFT) pipelines.

Why Stack Overflow for LLM training?

🏆 Peer-reviewed quality — answers are voted on by millions of developers. High-score answers are vetted as correct.
💻 Code-rich — answers include real working code examples, not just prose explanations.
📚 Breadth — 23+ million questions covering every programming topic from beginner to expert.
🔓 Freely licensed — content is CC BY-SA 4.0. You may use it for training with attribution.

Recommended workflow

Set outputFormat to jsonl-finetune
Set sortBy to votes for highest-quality answers
Set minScore to at least 10 (or 50+ for premium training data) to filter out low-quality questions
Search targeted topics relevant to your model's domain
Download the dataset as JSON or JSONL
Load with Hugging Face datasets:

from datasets import load_dataset

ds = load_dataset("json", data_files="stackoverflow_finetune.json")
# Fields: instruction, context, response, metadata

Format compatibility

The jsonl-finetune output is compatible with:

OpenAI fine-tuning API — map instruction → user, response → assistant
Axolotl — use alpaca or completion dataset format
LLaMA-Factory — alpaca-style with instruction, input (context), output (response)
Hugging Face TRL — SFTTrainer with instruction and response columns

Quality filtering

Use minScore to control training data quality:

`minScore`	Use case
`0`	Maximum volume, mixed quality
`10`	Balanced quality/volume — good for general coding models
`50`	High-quality curated data — better signal-to-noise
`100`	Premium data for domain-specific fine-tuning

Tips and best practices

🏆 Sort by votes — use votes sorting to find the most authoritative answers.
🏷️ Tag filtering — use tagged to narrow to specific technologies (e.g., python;pandas).
👀 View count — high view counts indicate common problems many developers face.
🔄 API quota — the free tier allows 300 API requests/day. Each page of results = 1 request.
📊 Max 300 results — the API limits unauthenticated search to ~300 results per query.
⭐ Score interpretation — scores above 100 indicate widely-appreciated questions; above 500 is exceptional.
🤖 Fine-tuning tip — combine sortBy: "votes" with minScore: 50 and outputFormat: "jsonl-finetune" for the highest-quality LLM training pairs.
🔍 Topic breadth — run multiple keyword searches and merge datasets to cover a programming domain comprehensively.

Integrations

Connect Stack Overflow Scraper to apps:

📊 Google Sheets — export Q&A data for analysis
🔔 Slack — notifications for new popular questions in your tech stack
⚡ Zapier / Make — automate workflows with developer Q&A data
🔗 Webhook — send results to your own API

How to use Stack Overflow Scraper with the API

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/stackoverflow-scraper').call({
    searchQueries: ['python machine learning'],
    sortBy: 'votes',
    maxResults: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(q => {
    console.log(`[${q.score}] ${q.title} (${q.viewCount} views)`);
});

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("automation-lab/stackoverflow-scraper").call(run_input={
    "searchQueries": ["python machine learning"],
    "sortBy": "votes",
    "maxResults": 50,
})

for q in client.dataset(run["defaultDatasetId"]).iterate_items():
    answered = "✓" if q["isAnswered"] else " "
    print(f"{answered} score={q['score']:4d} views={q['viewCount']:7d} {q['title'][:60]}")

cURL

curl "https://api.apify.com/v2/acts/automation-lab~stackoverflow-scraper/runs" \
  -X POST \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"searchQueries": ["react hooks"], "sortBy": "votes", "maxResults": 20}'

Use with AI agents via MCP

Stack Overflow Scraper is available as a tool for AI assistants via the Model Context Protocol (MCP).

Setup for Claude Code

$claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/stackoverflow-scraper"

Setup for Claude Desktop, Cursor, or VS Code

{
    "mcpServers": {
        "apify": {
            "url": "https://mcp.apify.com?tools=automation-lab/stackoverflow-scraper"
        }
    }
}

Example prompts

"Find top Stack Overflow questions about Python async"
"Get the most voted questions about React hooks"
"Search Stack Overflow for questions tagged rust about memory safety"

Learn more in the Apify MCP documentation.

Is it legal to scrape Stack Overflow?

This scraper uses the official StackExchange API, not web scraping. The StackExchange API is publicly available and designed for programmatic access. All data returned is publicly visible on Stack Overflow.

Stack Overflow content is licensed under CC BY-SA 4.0, which allows sharing and adaptation with proper attribution. If you republish the data, you must provide attribution to the original authors and Stack Overflow.

FAQ

Q: Does it return the answer text? A: Yes — set outputFormat to jsonl-finetune and the actor fetches the accepted (or top-voted) answer for each question, converting it to plain text with code blocks preserved. The default format returns question metadata only.

Q: Is an API key required? A: No. The StackExchange API works without authentication (300 requests/day limit).

Q: Can I search other StackExchange sites? A: This scraper is configured for Stack Overflow specifically.

Q: How current is the data? A: Data is real-time from the StackExchange API.

Q: I'm getting fewer results than maxResults — why? A: The StackExchange API may return fewer results if your search query is too specific or if the daily API quota (300 requests) has been reached. Try broader keywords or wait for the quota to reset.

Q: Results seem outdated or missing recent questions — what can I do? A: Sort by activity instead of votes to surface recently active questions. The default relevance sorting may favor older, highly-voted questions.

GitHub Scraper — scrape GitHub repositories, profiles, and stars
GitHub Trending Scraper — track trending repositories on GitHub
Hacker News Scraper — extract posts and comments from Hacker News
Dev.to Scraper — scrape articles and profiles from Dev.to

Stack Overflow Scraper

pear_fight/stackoverflow-scraper

Scrape questions, answers, tags from Stack Overflow

Harald

Stack Overflow

jupri/stackexchange

cat

Stack Overflow Scraper

cloud9_ai/stackoverflow-scraper

Scrape Stack Overflow questions, answers, and tags via Stack Exchange API. Search by keyword or tag, get accepted answers, vote counts, and view statistics.

cloud9

Stack Overflow Scraper API - Search Questions, Answers & Trends

fresh_cliff/stackoverflow-api-scraper

Extract Stack Overflow questions, answers, tags, votes, users, and comments via the Stack Exchange API. Fast JSON export, pagination, filters, date ranges, and keyword search. Ideal for analytics, AI training, and monitoring trends in developer Q&A data.

Brennan Crawford

Stack Overflow Q&A Scraper

sheshinmcfly/stackoverflow-scraper

Extract questions and answers from Stack Overflow via the official Stack Exchange API. Filter by tags, keywords, or top voted. Returns question body, accepted answer, top answers, vote counts, and tags. Perfect for AI training data, RAG pipelines, and knowledge bases.

Sheshinmcfly

Stack Overflow Search Scraper

jeremy_frost/stack-overflow-search-scraper

Specify a 🔍search expression for retrieving Stack Overflow search results.

Jeremy Frost

Find Repeated Developer Pain Points on Stack Overflow

happyfhantum/stack-overflow-opportunity-finder

Turn Stack Overflow questions into product and automation opportunity signals.

Kelsey Todd

Stack Exchange Scraper - Questions, Answers, Tags

wetyr_corporation/stackexchange-scraper

Bulk extract questions and answers from Stack Overflow and any Stack Exchange site. Filter by tag, score, sort. Built for AI/LLM training, developer RAG, and technical research.

WETYR CORPORATION

Stack Overflow Questions Pain Point Scraper

happyfhantum/stackoverflow-pain-points

Find repeated developer problems in Stack Overflow questions and turn them into demand signals.

Kelsey Todd

Stackoverflow Email Scraper - Advanced, Fast & Cheapest

contacts-api/stackoverflow-email-scraper-fast-advanced-and-cheapest

💻 Stack Overflow Email Scraper helps you extract developer and company emails from Stack Overflow profiles 🔍 Ideal for tech hiring, outreach, and research 📧

Lead Heaven

Stack Overflow Scraper

What does Stack Overflow Scraper do?

Who is Stack Overflow Scraper for?

Why scrape Stack Overflow?

Data extraction fields

Default format

LLM fine-tuning format (jsonl-finetune)

How much does it cost to scrape Stack Overflow?

How to scrape Stack Overflow questions

Input parameters

Input example — default format

Input example — LLM fine-tuning

Output example

Default format

LLM fine-tuning format (jsonl-finetune)

Using Stack Overflow data for LLM fine-tuning

Why Stack Overflow for LLM training?

Recommended workflow

Format compatibility

Quality filtering

Tips and best practices

Integrations

How to use Stack Overflow Scraper with the API

Node.js

Python

cURL

Use with AI agents via MCP

Setup for Claude Code

Setup for Claude Desktop, Cursor, or VS Code

Example prompts

Is it legal to scrape Stack Overflow?

FAQ

Related scrapers

You might also like

Stack Overflow Scraper

Stack Overflow

Stack Overflow Scraper

Stack Overflow Scraper API - Search Questions, Answers & Trends

Stack Overflow Q&A Scraper

Stack Overflow Search Scraper

Find Repeated Developer Pain Points on Stack Overflow

Stack Exchange Scraper - Questions, Answers, Tags

Stack Overflow Questions Pain Point Scraper

Stackoverflow Email Scraper - Advanced, Fast & Cheapest

LLM fine-tuning format (`jsonl-finetune`)

LLM fine-tuning format (`jsonl-finetune`)