Discourse Forum Scraper
Pricing
Pay per event
Extract topics, posts, and discussions from any public Discourse forum. Supports latest topics, category filtering, and keyword search. No login required.
Developer
Stas Persiianenko
Extract topics, posts, and community discussions from any public Discourse forum — including community.openai.com, discuss.huggingface.co, meta.discourse.org, and 5,000+ other Discourse communities worldwide. No API key required.
🔍 What does it do?
Discourse Forum Scraper connects to Discourse's public JSON API (no authentication required for public forums) and extracts:
- Latest topics — trending and most recent discussions paginated across the entire forum
- Category topics — all topics within a specific category or sub-category
- Search results — topics matching a keyword or phrase
For each topic, you get structured metadata: title, URL, author, view count, like count, post count, category, tags, excerpt, creation date, and more. Optionally enable Include Post Content to fetch the full post body for each topic (including HTML content, poster info, and reaction counts).
The actor uses only Discourse's public JSON API endpoints (/latest.json, /c/{category}.json, /search.json, /t/{id}.json) — no browser, no proxies, minimal cost per result.
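As a rough illustration of how the three scrape modes could map onto those endpoints, here is a minimal sketch. The function name `build_endpoint` and the `page` and `q` query parameters are assumptions made for this example; the actor's real request logic is not shown in this README.

```python
from urllib.parse import urlencode


def build_endpoint(forum_url, mode, category_slug=None, query=None, page=0):
    """Return the JSON endpoint URL for one request in the given scrape mode.

    Hypothetical helper: the endpoint paths come from the README, while the
    `page` and `q` query parameters are assumptions.
    """
    base = forum_url.rstrip("/")
    if mode == "latestTopics":
        path, params = "/latest.json", {"page": page}
    elif mode == "categoryTopics":
        if not category_slug:
            raise ValueError("categoryTopics mode needs a category slug")
        path, params = f"/c/{category_slug.strip('/')}.json", {"page": page}
    elif mode == "searchTopics":
        if not query:
            raise ValueError("searchTopics mode needs a search query")
        path, params = "/search.json", {"q": query}
    else:
        raise ValueError(f"unknown scrape mode: {mode}")
    return f"{base}{path}?{urlencode(params)}"
```

For example, `build_endpoint("https://meta.discourse.org/", "latestTopics")` yields `https://meta.discourse.org/latest.json?page=0`.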
👥 Who is it for?
AI/ML researchers who want to monitor community.openai.com or discuss.huggingface.co for emerging topics, model comparisons, and developer pain points — without manually browsing the forum every day.
Product managers tracking competitor mentions, feature requests, and user feedback across developer communities. Discourse forums are where power users discuss product issues openly.
Content marketers and SEO teams identifying high-traffic discussion topics in their niche, repurposing forum Q&As into blog posts, or monitoring what questions get thousands of views.
Data scientists and NLP researchers building training datasets from community discussions, sentiment analysis corpora, or topic clustering studies on domain-specific text.
Community managers bulk-exporting discussions for archiving, migration, compliance audits, or import into internal knowledge bases.
Competitive intelligence analysts monitoring developer forums for product feedback, bug reports, and integration questions targeting specific categories.
💡 Why use it?
Discourse has over 5,000 active public forums across tech, gaming, academia, open-source, and more — but there's no centralized API that lets you query across all of them. This actor gives you a uniform interface to any Discourse forum, producing clean structured JSON you can pipe directly into spreadsheets, databases, or downstream APIs.
Compared to manually browsing Discourse or writing one-off scripts:
- ✅ Works on any Discourse forum — just change the URL
- ✅ Handles pagination automatically — get 5,000+ topics with one run
- ✅ Clean JSON output compatible with Google Sheets, Airtable, BigQuery, and more
- ✅ Optional post content extraction for full discussion text
- ✅ Category name resolution: a readable `categoryName` field, not just an ID
- ✅ No API keys, no rate limit headaches, no authentication setup
📊 What data does it extract?
| Field | Description | Example |
|---|---|---|
| `topicId` | Unique topic ID | 137192 |
| `title` | Topic title | "Best practices for GPT-4 system prompts" |
| `slug` | URL-friendly slug | "best-practices-for-gpt-4-system-prompts" |
| `url` | Direct link to the topic | "https://community.openai.com/t/..." |
| `categoryId` | Category numeric ID | 7 |
| `categoryName` | Resolved category name | "API" |
| `postsCount` | Number of replies | 24 |
| `replyCount` | Direct reply count | 18 |
| `views` | Total view count | 12503 |
| `likeCount` | Total likes received | 89 |
| `createdAt` | ISO timestamp created | "2023-04-03T10:23:49.213Z" |
| `lastPostedAt` | ISO timestamp last reply | "2024-01-15T08:44:11.000Z" |
| `tags` | List of topic tags | ["gpt-4", "system-prompt"] |
| `excerpt` | Short text excerpt | "Has anyone found a good pattern for..." |
| `authorUsername` | Original poster username | "johndev42" |
| `pinned` | Is topic pinned | false |
| `closed` | Is topic closed | false |
| `posts` | Post array (optional) | See below |
When includePostContent: true, each topic includes a posts array:
| Field | Description |
|---|---|
| `postId` | Unique post ID |
| `postNumber` | Position in thread |
| `username` | Poster's username |
| `displayName` | Poster's display name |
| `createdAt` | Post creation timestamp |
| `content` | Full HTML post content |
| `likeCount` | Number of likes on this post |
| `replyCount` | Direct replies to this post |
| `reads` | How many users read this post |
| `isAcceptedAnswer` | Is this the accepted answer? |
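A short sketch of how the `posts` array might be consumed once a run has finished: pick out the accepted answer, or rank posts by likes. The helper names are hypothetical; only the field names come from the tables above.

```python
def accepted_answer(topic):
    """Return the first post flagged isAcceptedAnswer, or None if there is none."""
    for post in topic.get("posts") or []:
        if post.get("isAcceptedAnswer"):
            return post
    return None


def top_liked_posts(topic, n=3):
    """Return up to n posts from the topic, most-liked first."""
    posts = topic.get("posts") or []
    return sorted(posts, key=lambda p: p.get("likeCount", 0), reverse=True)[:n]
```

Both helpers tolerate topics scraped without `includePostContent`, where `posts` is absent.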
💰 How much does it cost to scrape Discourse forum topics?
This actor uses pay-per-event (PPE) pricing — you pay only for the data you extract, not for idle time.
| Plan | Price per topic |
|---|---|
| FREE | $0.00115 |
| BRONZE | $0.001 |
| SILVER | $0.00078 |
| GOLD | $0.0006 |
| PLATINUM | $0.0004 |
| DIAMOND | $0.00028 |
Plus a flat $0.005 start fee per run.
Cost examples:
- 100 topics: ~$0.12 (FREE) / $0.105 (BRONZE)
- 500 topics: ~$0.58 (FREE) / $0.505 (BRONZE)
- 2,000 topics: ~$2.31 (FREE) / $2.005 (BRONZE)
- 5,000 topics with posts: ~$5.76 (FREE) / $5.005 (BRONZE)
Topics-only runs are cheap. Enabling includePostContent adds one extra API request per topic, so runs take longer, but the PPE charge per result stays the same.
Free plan estimate: Apify Free plan ($0) includes $5/month in usage, which covers roughly 4,300 topics per month at the FREE-tier rate, before start fees.
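The pricing above reduces to one formula: the start fee plus the per-topic price times the number of topics. The sketch below encodes the tier table from this section; the function name `estimate_cost` is made up for illustration, and `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

# Per-topic prices and start fee copied from the pricing table above.
PRICE_PER_TOPIC = {
    "FREE": Decimal("0.00115"),
    "BRONZE": Decimal("0.001"),
    "SILVER": Decimal("0.00078"),
    "GOLD": Decimal("0.0006"),
    "PLATINUM": Decimal("0.0004"),
    "DIAMOND": Decimal("0.00028"),
}
START_FEE = Decimal("0.005")


def estimate_cost(topics, tier="FREE"):
    """Estimated cost in USD of one run that extracts `topics` topics."""
    if topics < 0:
        raise ValueError("topics must be non-negative")
    return START_FEE + PRICE_PER_TOPIC[tier] * topics


print(estimate_cost(500, "BRONZE"))  # 0.505
```

Because the start fee is charged per run, one large run costs slightly less than many small runs that extract the same total number of topics.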
🚀 How to use it
Step 1: Choose your target forum
Find the base URL of any public Discourse forum. Examples:
- https://community.openai.com (OpenAI Developer Community)
- https://discuss.huggingface.co (HuggingFace forums)
- https://meta.discourse.org (Discourse Meta)
- https://forum.cursor.sh (Cursor AI community)
- https://community.cloudflare.com (Cloudflare developers)
Step 2: Select a scrape mode
- Latest Topics — get the freshest discussions sorted by most recently active
- Category Topics — get topics from a specific category (requires the category slug, found in the URL)
- Search Topics — search for topics matching a keyword
Step 3: Set limits
Start small with maxTopics: 20 to preview the data. Scale up to 5,000 for bulk exports.
Step 4: Enable post content (optional)
Set includePostContent: true and maxPostsPerTopic: 10 to fetch the actual post bodies. Best for NLP datasets, content archiving, and answer extraction.
Step 5: Run and export
Click Start and wait for results. Export to JSON, CSV, or Excel. Connect to Google Sheets or Airtable directly from the Apify platform.
⚙️ Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `forumUrl` | string | https://community.openai.com | Base URL of the Discourse forum |
| `scrapeMode` | enum | latestTopics | What to scrape: latestTopics, categoryTopics, searchTopics |
| `categorySlug` | string | (none) | Category slug for categoryTopics mode (e.g., api) |
| `searchQuery` | string | (none) | Search keywords for searchTopics mode |
| `maxTopics` | integer | 50 | Maximum number of topics to extract |
| `includePostContent` | boolean | false | Fetch full post content for each topic |
| `maxPostsPerTopic` | integer | 10 | Max posts per topic (when includePostContent is true) |
| `maxRequestRetries` | integer | 3 | Retry attempts for failed HTTP requests |
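When starting runs from code, it can help to assemble and sanity-check the input locally first. The sketch below mirrors the parameter table and its defaults; `build_input` is a hypothetical helper, and the checks reflect only what the table implies (a slug for category mode, a query for search mode), not the actor's actual validation.

```python
VALID_MODES = {"latestTopics", "categoryTopics", "searchTopics"}


def build_input(forum_url="https://community.openai.com",
                scrape_mode="latestTopics",
                category_slug=None,
                search_query=None,
                max_topics=50,
                include_post_content=False,
                max_posts_per_topic=10,
                max_request_retries=3):
    """Build an actor input dict using the defaults from the table above."""
    if scrape_mode not in VALID_MODES:
        raise ValueError(f"scrapeMode must be one of {sorted(VALID_MODES)}")
    if scrape_mode == "categoryTopics" and not category_slug:
        raise ValueError("categoryTopics mode requires categorySlug")
    if scrape_mode == "searchTopics" and not search_query:
        raise ValueError("searchTopics mode requires searchQuery")
    if max_topics < 1:
        raise ValueError("maxTopics must be at least 1")

    run_input = {
        "forumUrl": forum_url.rstrip("/"),
        "scrapeMode": scrape_mode,
        "maxTopics": max_topics,
        "includePostContent": include_post_content,
        "maxPostsPerTopic": max_posts_per_topic,
        "maxRequestRetries": max_request_retries,
    }
    # Only send the mode-specific fields when they are set.
    if category_slug:
        run_input["categorySlug"] = category_slug
    if search_query:
        run_input["searchQuery"] = search_query
    return run_input
```

The returned dict can be passed as `run_input` to the Apify client call shown in the API usage section.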
📤 Output example

    {
      "topicId": 137192,
      "title": "Cant find and use gpt-4 model (I have the gpt-4 Invitation)",
      "slug": "cant-find-and-use-gpt-4-model-i-have-the-gpt-4-invitation",
      "url": "https://community.openai.com/t/cant-find-and-use-gpt-4-model/137192",
      "categoryId": 7,
      "categoryName": "API",
      "postsCount": 15,
      "replyCount": 12,
      "views": 8432,
      "likeCount": 23,
      "createdAt": "2023-04-03T10:23:49.213Z",
      "lastPostedAt": "2023-05-12T14:31:00.000Z",
      "tags": ["gpt-4", "access"],
      "excerpt": "Hi everyone, I recently received an invitation for GPT-4 access...",
      "authorUsername": "johndev42",
      "pinned": false,
      "closed": false
    }

💡 Tips and tricks
Finding the category slug: Navigate to a category page and check the URL. For https://community.openai.com/c/api/7, the slug is api. For subcategories like /c/api/plugins/23, use api/plugins.
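The slug rule described above (drop the leading `/c/` and the trailing numeric ID) can be sketched as a small helper. The function name is hypothetical, and the sketch assumes category URLs always follow the `/c/<slug...>/<id>` pattern from the examples.

```python
from urllib.parse import urlparse


def category_slug_from_url(url):
    """Extract the categorySlug value from a Discourse category URL or path."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if not parts or parts[0] != "c":
        raise ValueError(f"not a category URL: {url}")
    parts = parts[1:]
    # Drop the trailing numeric category ID if present.
    if parts and parts[-1].isdigit():
        parts = parts[:-1]
    if not parts:
        raise ValueError(f"no category slug found in: {url}")
    return "/".join(parts)


print(category_slug_from_url("https://community.openai.com/c/api/7"))  # api
print(category_slug_from_url("/c/api/plugins/23"))                     # api/plugins
```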
Filtering noisy topics: Use searchTopics mode with specific technical keywords to get only relevant discussions. For example, "function calling" returns topics specifically about that feature.
Scraping multiple forums: Run the actor multiple times with different forumUrl values. Use the Apify API to start runs programmatically and merge results.
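Once each run's dataset has been downloaded, merging can be a plain dictionary pass. In this sketch the added `forum` key and the helper name are assumptions for illustration; topic IDs are only unique within a forum, so the forum URL is part of the deduplication key.

```python
def merge_forum_results(results_by_forum):
    """Merge per-forum item lists into one list, tagging each item with its forum.

    `results_by_forum` maps a forum URL to the list of items from that run.
    """
    merged, seen = [], set()
    for forum_url, items in results_by_forum.items():
        for item in items:
            key = (forum_url, item["topicId"])
            if key in seen:
                continue  # same topic returned twice for the same forum
            seen.add(key)
            merged.append({**item, "forum": forum_url})
    return merged
```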
Pagination behavior: The actor automatically paginates. Set maxTopics: 5000 to get the maximum — Discourse's /latest.json returns ~30 topics per page, so 5,000 topics requires ~167 API calls.
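The page arithmetic and the general shape of a pagination loop can be sketched as follows. This is not the actor's source: `fetch_page` is a hypothetical callable that returns the list of topics for a given page number, and the default of 30 topics per page is taken from the approximate figure above.

```python
import math


def pages_needed(max_topics, per_page=30):
    """Number of page requests needed to collect max_topics topics."""
    return math.ceil(max_topics / per_page)


def collect_topics(fetch_page, max_topics):
    """Call fetch_page(0), fetch_page(1), ... until enough topics are collected.

    Stops early when a page comes back empty, meaning the forum has run out
    of topics.
    """
    topics, page = [], 0
    while len(topics) < max_topics:
        batch = fetch_page(page)
        if not batch:
            break
        topics.extend(batch)
        page += 1
    return topics[:max_topics]


print(pages_needed(5000))  # 167
```

Injecting `fetch_page` keeps the loop testable without network access; in a real script it would wrap an HTTP request to the forum.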
Rate limiting: If you see repeated retries or timeouts, lower maxTopics per run to reduce load, or raise maxRequestRetries so transient failures get more attempts. Most Discourse forums have no strict rate limits for anonymous JSON requests.
searchTopics mode — limited fields: Discourse's /search.json endpoint returns slim topic objects that do not include views, authorUsername, or excerpt. These fields will be empty ("" or 0) for search results. This is a Discourse API limitation, not a scraper bug. If you need these fields, use latestTopics or categoryTopics mode and filter by keyword on your end, or enable includePostContent: true (which fetches full topic data per result).
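Filtering by keyword on your end, as suggested above, can be as simple as a case-insensitive substring match over `title` and `excerpt`. The helper name is hypothetical, and this is plain substring matching rather than anything like Discourse's search ranking.

```python
def filter_by_keyword(items, keyword):
    """Keep topics whose title or excerpt contains the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [
        item for item in items
        if needle in (item.get("title") or "").casefold()
        or needle in (item.get("excerpt") or "").casefold()
    ]
```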
Private forums: This actor only works with public Discourse forums. Forums requiring login to browse will return empty results or redirect to a login page.
🔗 Integrations
Workflow: Daily topic monitoring via Google Sheets
Schedule this actor daily, point output to a Google Sheets integration (Apify has a built-in Sheets connector), and track emerging topics over time. Set scrapeMode: latestTopics and maxTopics: 50 for a lightweight daily snapshot.
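To track emerging topics between two daily snapshots, one option is to compare topic IDs. The sketch assumes each snapshot is a list of output items as described in the data table; the helper name is hypothetical.

```python
def new_topics(previous, current):
    """Return topics present in the current snapshot but not in the previous one."""
    seen = {item["topicId"] for item in previous}
    return [item for item in current if item["topicId"] not in seen]
```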
Workflow: Competitive intelligence pipeline
Run searchTopics mode against competitor brand names across multiple forums. Feed output into an LLM summarizer to extract sentiment and pain points. Export a weekly digest.
Workflow: NLP training dataset creation
Enable includePostContent: true and maxPostsPerTopic: 20 for a specific forum category. Output gives you structured conversation threads with HTML content, which you can clean for fine-tuning or RAG ingestion.
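Cleaning the HTML `content` field can be done with the standard library alone. The sketch below is one possible approach, not part of the actor: it strips tags, decodes entities, and collapses whitespace. The record layout returned by `thread_to_records` is an assumption for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip tags and collapse whitespace in an HTML post body."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps words in adjacent blocks apart; the final
    # split/join collapses any doubled whitespace.
    return " ".join(" ".join(parser.chunks).split())


def thread_to_records(topic):
    """Flatten one topic into plain-text records, one per post."""
    return [
        {
            "topicId": topic["topicId"],
            "postNumber": post["postNumber"],
            "username": post["username"],
            "text": html_to_text(post["content"]),
        }
        for post in topic.get("posts") or []
    ]
```

For real fine-tuning data you may prefer a dedicated HTML library that preserves code blocks and quotes; this version deliberately discards all structure.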
Workflow: Community migration
Export all topics and posts from an old Discourse installation before migration. The structured JSON output makes it easy to transform and re-import into a new forum or knowledge base platform.
Workflow: SEO content research
Run searchTopics mode on niche Discourse forums with target keywords. Topics with high views and likeCount but few answers are prime candidates for content marketing — write the definitive answer and link back.
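The "high views, few answers" filter could look like the sketch below. The thresholds and the helper name are arbitrary illustrations. Note that this only works on items that carry a real `views` value, which the tips section says search results may lack unless `includePostContent` is enabled.

```python
def find_content_gaps(items, min_views=1000, max_replies=3):
    """Return high-traffic topics with few replies, most-viewed first."""
    hits = [
        topic for topic in items
        if topic.get("views", 0) >= min_views
        and topic.get("replyCount", 0) <= max_replies
    ]
    return sorted(
        hits,
        key=lambda t: (t.get("views", 0), t.get("likeCount", 0)),
        reverse=True,
    )
```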
🔌 API usage
Node.js (Apify client)

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

    // Start the actor and wait for the run to finish.
    const run = await client.actor('automation-lab/discourse-scraper').call({
        forumUrl: 'https://community.openai.com',
        scrapeMode: 'searchTopics',
        searchQuery: 'function calling',
        maxTopics: 100,
    });

    // Read the scraped topics from the run's default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(`Scraped ${items.length} topics`);

Python

    from apify_client import ApifyClient

    client = ApifyClient(token='YOUR_APIFY_TOKEN')

    # Start the actor and wait for the run to finish.
    run = client.actor('automation-lab/discourse-scraper').call(run_input={
        'forumUrl': 'https://discuss.huggingface.co',
        'scrapeMode': 'categoryTopics',
        'categorySlug': 'research',
        'maxTopics': 200,
        'includePostContent': True,
        'maxPostsPerTopic': 5,
    })

    # Read the scraped topics from the run's default dataset.
    items = list(client.dataset(run['defaultDatasetId']).iterate_items())
    print(f'Scraped {len(items)} topics with full post content')

cURL

    curl -X POST "https://api.apify.com/v2/acts/automation-lab~discourse-scraper/runs?token=YOUR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "forumUrl": "https://community.openai.com",
        "scrapeMode": "latestTopics",
        "maxTopics": 50
      }'

🤖 MCP (AI assistant integration)
Use this scraper directly inside Claude Code, Claude Desktop, Cursor, or any MCP-compatible AI assistant.
Claude Code (terminal)

    claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/discourse-scraper"

Claude Desktop / Cursor / VS Code
Add to your MCP config file:

    {
      "mcpServers": {
        "apify": {
          "type": "http",
          "url": "https://mcp.apify.com?tools=automation-lab/discourse-scraper",
          "headers": {
            "Authorization": "Bearer YOUR_APIFY_TOKEN"
          }
        }
      }
    }

Example AI prompts
Once connected, try these prompts in your AI assistant:
- "Scrape the latest 50 topics from community.openai.com and show me the ones with over 1,000 views."
- "Search discuss.huggingface.co for topics about 'fine-tuning Llama' and give me a summary of the top discussions."
- "Get the most recent topics from the 'api' category on community.openai.com and find any that mention authentication errors."
- "Export all topics from a Discourse forum to a CSV for analysis."
⚖️ Legality — Is it Legal to Scrape Discourse Forums?
This actor only accesses public Discourse forums — the same content visible to any anonymous visitor. It uses Discourse's documented public JSON API, which is explicitly designed for programmatic access. No login, no authentication bypass, and no private data is accessed.
The data extracted is publicly visible on the web. Always review the specific forum's Terms of Service before scraping at large scale. For commercial data re-use or republication, consult the forum's terms.
This actor does not scrape private messages, user email addresses, IP addresses, or any non-public data.
❓ FAQ
Q: Which Discourse forums does this work with?
A: Any public Discourse forum. The software powers thousands of communities including Rust, Ruby, Docker, Wikipedia, and major AI/ML communities. If you can browse topics without logging in, this actor can scrape it.
Q: Does it require an API key?
A: No. Public Discourse forums expose a JSON API to anonymous visitors. Just provide the forum URL and run.
Q: Why is categoryName showing a number instead of a name?
A: This can happen if the forum restricts the /categories.json endpoint. The actor falls back to the category ID when the name can't be resolved. Try with a different forum or check if categories require login to view.
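The fallback described in this answer can be pictured with the sketch below. It assumes `/categories.json` returns a payload shaped like `{"category_list": {"categories": [{"id": ..., "name": ...}]}}`; that shape and both helper names are assumptions for illustration, not taken from the actor's source.

```python
def build_category_index(categories_payload):
    """Map category ID to name from a /categories.json payload (assumed shape)."""
    category_list = (categories_payload or {}).get("category_list") or {}
    categories = category_list.get("categories") or []
    return {c["id"]: c["name"] for c in categories if "id" in c and "name" in c}


def resolve_category_name(category_id, index):
    """Return the readable name, falling back to the ID as a string."""
    return index.get(category_id, str(category_id))
```

With an empty index, which is what you would get if the categories endpoint is restricted, `resolve_category_name(7, {})` returns `"7"`, matching the numeric `categoryName` described above.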
Q: Can I scrape a forum that requires login to view topics?
A: No. This actor doesn't support authentication. It only works with publicly accessible Discourse forums.
Q: The run finished but I got fewer topics than expected. Why?
A: The forum may have fewer public topics than your maxTopics limit, or the category you specified has fewer topics. For searchTopics mode, the search results are naturally limited by relevance — try a broader search query.
Q: Can I get topics from subcategories?
A: Yes. For nested categories, use the full path in categorySlug. For example, if the URL is /c/parent/child/12, use parent/child as the slug.
🔗 Related scrapers
Explore other automation-lab scrapers for AI/developer community data:
- AI Jobs Scraper — AI/ML job listings from aijobs.net
- AIcrowd Scraper — AI competition listings from AIcrowd
- DevPost Scraper — hackathon and project listings
- AI Tools Directory Scraper — AI tool listings