Discourse Forum Scraper

Pricing

Pay per event

Extract topics, posts, and discussions from any public Discourse forum. Supports latest topics, category filtering, and keyword search. No login required.

Developer

Stas Persiianenko

Maintained by Community

4 days ago

Last modified

Extract topics, posts, and community discussions from any public Discourse forum — including community.openai.com, discuss.huggingface.co, meta.discourse.org, and 5,000+ other Discourse communities worldwide. No API key required.

🔍 What does it do?

Discourse Forum Scraper connects to Discourse's public JSON API (no authentication required for public forums) and extracts:

  • Latest topics — trending and most recent discussions paginated across the entire forum
  • Category topics — all topics within a specific category or sub-category
  • Search results — topics matching a keyword or phrase

For each topic, you get structured metadata: title, URL, author, view count, like count, post count, category, tags, excerpt, creation date, and more. Optionally enable Include Post Content to fetch the full post body for each topic (including HTML content, poster info, and reaction counts).

The actor uses only Discourse's public JSON API endpoints (/latest.json, /c/{category}.json, /search.json, /t/{id}.json) — no browser, no proxies, minimal cost per result.
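
As a rough sketch of how the three scrape modes could map onto those endpoints, the helper below builds the request URL for one page of results. The function name `build_endpoint` and the `page` and `q` query parameters are assumptions made for illustration; the actor's internal implementation is not shown in this document.

```python
from urllib.parse import urlencode


def build_endpoint(forum_url, scrape_mode, category_slug="", search_query="", page=0):
    """Return the JSON endpoint URL for one page of results (illustrative only)."""
    base = forum_url.rstrip("/")
    if scrape_mode == "latestTopics":
        path, params = "/latest.json", {"page": page}
    elif scrape_mode == "categoryTopics":
        if not category_slug:
            raise ValueError("categoryTopics mode needs a category slug")
        path, params = f"/c/{category_slug}.json", {"page": page}
    elif scrape_mode == "searchTopics":
        if not search_query:
            raise ValueError("searchTopics mode needs a search query")
        # Assumption: the search endpoint takes the keywords in a `q` parameter.
        path, params = "/search.json", {"q": search_query, "page": page}
    else:
        raise ValueError(f"unknown scrape mode: {scrape_mode}")
    return f"{base}{path}?{urlencode(params)}"


print(build_endpoint("https://community.openai.com", "latestTopics"))
```

Keeping the URL construction in one place makes it easy to point the same logic at a different forum by changing only the base URL.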

👥 Who is it for?

AI/ML researchers who want to monitor community.openai.com or discuss.huggingface.co for emerging topics, model comparisons, and developer pain points — without manually browsing the forum every day.

Product managers tracking competitor mentions, feature requests, and user feedback across developer communities. Discourse forums are where power users discuss product issues openly.

Content marketers and SEO teams identifying high-traffic discussion topics in their niche, repurposing forum Q&As into blog posts, or monitoring what questions get thousands of views.

Data scientists and NLP researchers building training datasets from community discussions, sentiment analysis corpora, or topic clustering studies on domain-specific text.

Community managers bulk-exporting discussions for archiving, migration, compliance audits, or import into internal knowledge bases.

Competitive intelligence analysts monitoring developer forums for product feedback, bug reports, and integration questions targeting specific categories.

💡 Why use it?

Discourse has over 5,000 active public forums across tech, gaming, academia, open-source, and more — but there's no centralized API that lets you query across all of them. This actor gives you a uniform interface to any Discourse forum, producing clean structured JSON you can pipe directly into spreadsheets, databases, or downstream APIs.

Compared to manually browsing Discourse or writing one-off scripts:

  • ✅ Works on any Discourse forum — just change the URL
  • ✅ Handles pagination automatically — get 5,000+ topics with one run
  • ✅ Clean JSON output compatible with Google Sheets, Airtable, BigQuery, and more
  • ✅ Optional post content extraction for full discussion text
  • ✅ Category name resolution — readable categoryName field, not just an ID
  • ✅ No API keys, no rate limit headaches, no authentication setup

📊 What data does it extract?

| Field | Description | Example |
| --- | --- | --- |
| topicId | Unique topic ID | 137192 |
| title | Topic title | "Best practices for GPT-4 system prompts" |
| slug | URL-friendly slug | "best-practices-for-gpt-4-system-prompts" |
| url | Direct link to the topic | "https://community.openai.com/t/..." |
| categoryId | Numeric category ID | 7 |
| categoryName | Resolved category name | "API" |
| postsCount | Total number of posts in the topic | 24 |
| replyCount | Direct reply count | 18 |
| views | Total view count | 12503 |
| likeCount | Total likes received | 89 |
| createdAt | ISO timestamp of topic creation | "2023-04-03T10:23:49.213Z" |
| lastPostedAt | ISO timestamp of the last reply | "2024-01-15T08:44:11.000Z" |
| tags | List of topic tags | ["gpt-4", "system-prompt"] |
| excerpt | Short text excerpt | "Has anyone found a good pattern for..." |
| authorUsername | Original poster's username | "johndev42" |
| pinned | Whether the topic is pinned | false |
| closed | Whether the topic is closed | false |
| posts | Post array (optional) | See below |
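
The mapping from a raw Discourse topic object to the output fields listed above might look roughly like the sketch below. The snake_case input keys (`posts_count`, `like_count`, and so on) are assumed to follow Discourse's usual naming, and `normalize_topic` is a hypothetical helper, not the actor's actual code. The author, excerpt, and posts fields are left out for brevity.

```python
def normalize_topic(raw, forum_url, category_names):
    """Map one raw topic object to the camelCase output fields (sketch)."""
    topic_id = raw["id"]
    slug = raw.get("slug", "")
    category_id = raw.get("category_id")
    return {
        "topicId": topic_id,
        "title": raw.get("title", ""),
        "slug": slug,
        "url": f"{forum_url.rstrip('/')}/t/{slug}/{topic_id}",
        "categoryId": category_id,
        # Fall back to the numeric ID when the name cannot be resolved.
        "categoryName": category_names.get(category_id, str(category_id)),
        "postsCount": raw.get("posts_count", 0),
        "replyCount": raw.get("reply_count", 0),
        "views": raw.get("views", 0),
        "likeCount": raw.get("like_count", 0),
        "createdAt": raw.get("created_at", ""),
        "lastPostedAt": raw.get("last_posted_at", ""),
        "tags": raw.get("tags", []),
        "pinned": bool(raw.get("pinned", False)),
        "closed": bool(raw.get("closed", False)),
    }


raw_topic = {"id": 137192, "title": "Example", "slug": "example", "category_id": 7}
print(normalize_topic(raw_topic, "https://community.openai.com", {7: "API"})["url"])
```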

When includePostContent: true, each topic includes a posts array:

| Field | Description |
| --- | --- |
| postId | Unique post ID |
| postNumber | Position in thread |
| username | Poster's username |
| displayName | Poster's display name |
| createdAt | Post creation timestamp |
| content | Full HTML post content |
| likeCount | Number of likes on this post |
| replyCount | Direct replies to this post |
| reads | How many users read this post |
| isAcceptedAnswer | Whether this post is the accepted answer |

💰 How much does it cost to scrape Discourse forum topics?

This actor uses pay-per-event (PPE) pricing — you pay only for the data you extract, not for idle time.

| Plan | Price per topic |
| --- | --- |
| FREE | $0.00115 |
| BRONZE | $0.001 |
| SILVER | $0.00078 |
| GOLD | $0.0006 |
| PLATINUM | $0.0004 |
| DIAMOND | $0.00028 |

Plus a flat $0.005 start fee per run.

Cost examples:

  • 100 topics: ~$0.12 (FREE) / $0.105 (BRONZE)
  • 500 topics: ~$0.58 (FREE) / $0.505 (BRONZE)
  • 2,000 topics: ~$2.31 (FREE) / $2.005 (BRONZE)
  • 5,000 topics with posts: ~$5.76 (FREE) / $5.005 (BRONZE)

Topics-only runs are cheap. Enabling includePostContent adds one extra API request per topic, so runs take longer, but the PPE charge per result stays the same.

Free plan estimate: the Apify Free plan ($0) includes $5/month in usage, which covers roughly 4,300 topics per month at FREE-tier rates, slightly fewer once per-run start fees are counted.
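
The pricing above reduces to a flat start fee plus a per-topic charge, so a run's cost can be estimated before launching it. The following is a minimal sketch using the rates from the table. `estimate_cost` is a hypothetical helper, and `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

# Per-topic prices and start fee as listed in the pricing table above.
PRICE_PER_TOPIC = {
    "FREE": Decimal("0.00115"),
    "BRONZE": Decimal("0.001"),
    "SILVER": Decimal("0.00078"),
    "GOLD": Decimal("0.0006"),
    "PLATINUM": Decimal("0.0004"),
    "DIAMOND": Decimal("0.00028"),
}
START_FEE = Decimal("0.005")


def estimate_cost(topics, plan="FREE"):
    """Estimated charge in USD for a single run that returns `topics` results."""
    return START_FEE + PRICE_PER_TOPIC[plan] * topics


print(estimate_cost(2000, "BRONZE"))  # 2.005
```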

🚀 How to use it

Step 1: Choose your target forum

Find the base URL of any public Discourse forum. Examples:

  • https://community.openai.com — OpenAI Developer Community
  • https://discuss.huggingface.co — HuggingFace forums
  • https://meta.discourse.org — Discourse Meta
  • https://forum.cursor.sh — Cursor AI community
  • https://community.cloudflare.com — Cloudflare developers

Step 2: Select a scrape mode

  • Latest Topics — get the freshest discussions sorted by most recently active
  • Category Topics — get topics from a specific category (requires the category slug, found in the URL)
  • Search Topics — search for topics matching a keyword

Step 3: Set limits

Start small with maxTopics: 20 to preview the data. Scale up to 5,000 for bulk exports.

Step 4: Enable post content (optional)

Set includePostContent: true and maxPostsPerTopic: 10 to fetch the actual post bodies. Best for NLP datasets, content archiving, and answer extraction.

Step 5: Run and export

Click Start and wait for results. Export to JSON, CSV, or Excel. Connect to Google Sheets or Airtable directly from the Apify platform.

⚙️ Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| forumUrl | string | https://community.openai.com | Base URL of the Discourse forum |
| scrapeMode | enum | latestTopics | What to scrape: latestTopics, categoryTopics, searchTopics |
| categorySlug | string | | Category slug for categoryTopics mode (e.g., api) |
| searchQuery | string | | Search keywords for searchTopics mode |
| maxTopics | integer | 50 | Maximum number of topics to extract |
| includePostContent | boolean | false | Fetch full post content for each topic |
| maxPostsPerTopic | integer | 10 | Max posts per topic (when includePostContent is true) |
| maxRequestRetries | integer | 3 | Retry attempts for failed HTTP requests |
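
When starting runs from code, it can help to assemble and sanity-check the input before sending it. The sketch below uses the parameter names and defaults from the table. The validation rules (a slug is required for categoryTopics, a query for searchTopics) are assumptions about sensible client-side checks, and `build_input` is a hypothetical helper.

```python
VALID_MODES = {"latestTopics", "categoryTopics", "searchTopics"}


def build_input(forum_url, scrape_mode="latestTopics", category_slug=None,
                search_query=None, max_topics=50, include_post_content=False,
                max_posts_per_topic=10, max_request_retries=3):
    """Build an actor input dict, rejecting obviously inconsistent settings."""
    if scrape_mode not in VALID_MODES:
        raise ValueError(f"scrapeMode must be one of {sorted(VALID_MODES)}")
    if scrape_mode == "categoryTopics" and not category_slug:
        raise ValueError("categoryTopics mode requires categorySlug")
    if scrape_mode == "searchTopics" and not search_query:
        raise ValueError("searchTopics mode requires searchQuery")
    if max_topics < 1:
        raise ValueError("maxTopics must be at least 1")

    run_input = {
        "forumUrl": forum_url.rstrip("/"),
        "scrapeMode": scrape_mode,
        "maxTopics": max_topics,
        "includePostContent": include_post_content,
        "maxPostsPerTopic": max_posts_per_topic,
        "maxRequestRetries": max_request_retries,
    }
    # Only include the mode-specific fields when they are set.
    if category_slug:
        run_input["categorySlug"] = category_slug
    if search_query:
        run_input["searchQuery"] = search_query
    return run_input


print(build_input("https://meta.discourse.org", max_topics=20))
```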

📤 Output example

{
    "topicId": 137192,
    "title": "Cant find and use gpt-4 model (I have the gpt-4 Invitation)",
    "slug": "cant-find-and-use-gpt-4-model-i-have-the-gpt-4-invitation",
    "url": "https://community.openai.com/t/cant-find-and-use-gpt-4-model/137192",
    "categoryId": 7,
    "categoryName": "API",
    "postsCount": 15,
    "replyCount": 12,
    "views": 8432,
    "likeCount": 23,
    "createdAt": "2023-04-03T10:23:49.213Z",
    "lastPostedAt": "2023-05-12T14:31:00.000Z",
    "tags": ["gpt-4", "access"],
    "excerpt": "Hi everyone, I recently received an invitation for GPT-4 access...",
    "authorUsername": "johndev42",
    "pinned": false,
    "closed": false
}

💡 Tips and tricks

Finding the category slug: Navigate to a category page and check the URL. For https://community.openai.com/c/api/7, the slug is api. For subcategories like /c/api/plugins/23, use api/plugins.
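
The slug rule described above (drop the leading /c/ and the trailing numeric ID) can be sketched as a small helper. `category_slug_from_url` is a hypothetical name, and the code assumes category URLs follow the /c/{slug}/{id} pattern shown in the examples.

```python
from urllib.parse import urlparse


def category_slug_from_url(category_url):
    """Extract the categorySlug value from a Discourse category URL or path."""
    parts = [p for p in urlparse(category_url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "c":
        raise ValueError(f"not a category URL: {category_url}")
    slug_parts = parts[1:]
    # Drop the trailing numeric category ID, if present.
    if len(slug_parts) > 1 and slug_parts[-1].isdigit():
        slug_parts = slug_parts[:-1]
    return "/".join(slug_parts)


print(category_slug_from_url("https://community.openai.com/c/api/7"))  # api
print(category_slug_from_url("/c/api/plugins/23"))                     # api/plugins
```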

Filtering noisy topics: Use searchTopics mode with specific technical keywords to get only relevant discussions. For example, "function calling" returns topics specifically about that feature.

Scraping multiple forums: Run the actor multiple times with different forumUrl values. Use the Apify API to start runs programmatically and merge results.

Pagination behavior: The actor automatically paginates. Set maxTopics: 5000 to get the maximum — Discourse's /latest.json returns ~30 topics per page, so 5,000 topics requires ~167 API calls.
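
The request count follows from dividing the topic limit by the page size and rounding up. The sketch below shows that arithmetic together with a generic pagination loop. `fetch_page` is a hypothetical callable standing in for whatever performs the HTTP request, and the page size of 30 is the approximate figure quoted above.

```python
import math

TOPICS_PER_PAGE = 30  # approximate page size quoted above


def pages_needed(max_topics, per_page=TOPICS_PER_PAGE):
    """Number of listing requests needed to reach max_topics."""
    return math.ceil(max_topics / per_page)


def collect_topics(fetch_page, max_topics):
    """Pull pages until max_topics is reached or a page comes back empty."""
    topics, page = [], 0
    while len(topics) < max_topics:
        batch = fetch_page(page)
        if not batch:
            break  # the forum has no more topics to return
        topics.extend(batch)
        page += 1
    return topics[:max_topics]


print(pages_needed(5000))  # 167
```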

Rate limiting: If you see retried requests or timeouts, lower maxTopics per run to reduce load on the forum, or raise maxRequestRetries so transient failures are retried more times. Rate limits for anonymous JSON requests are configured by each forum's administrators, so tolerance varies from forum to forum.

searchTopics mode — limited fields: Discourse's /search.json endpoint returns slim topic objects that do not include views, authorUsername, or excerpt. These fields will be empty ("" or 0) for search results. This is a Discourse API limitation, not a scraper bug. If you need these fields, use latestTopics or categoryTopics mode and filter by keyword on your end, or enable includePostContent: true (which fetches full topic data per result).
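
Filtering by keyword on your end, as suggested above, can be as simple as the following sketch. It assumes the items come from a latestTopics or categoryTopics run and carry the title, excerpt, and tags fields from the output schema. `filter_by_keyword` is a hypothetical helper.

```python
def filter_by_keyword(items, keyword):
    """Keep topics whose title, excerpt, or tags contain the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [
        item for item in items
        if needle in (item.get("title") or "").casefold()
        or needle in (item.get("excerpt") or "").casefold()
        or any(needle in tag.casefold() for tag in item.get("tags") or [])
    ]


topics = [
    {"title": "Function calling returns null", "excerpt": "", "tags": ["api"]},
    {"title": "Billing question", "excerpt": "Invoice is missing", "tags": []},
]
print(len(filter_by_keyword(topics, "function calling")))  # 1
```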

Private forums: This actor only works with public Discourse forums. Forums requiring login to browse will return empty results or redirect to a login page.

🔗 Integrations

Workflow: Daily topic monitoring via Google Sheets

Schedule this actor daily, point output to a Google Sheets integration (Apify has a built-in Sheets connector), and track emerging topics over time. Set scrapeMode: latestTopics and maxTopics: 50 for a lightweight daily snapshot.
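
To spot emerging topics between two daily snapshots, one option is to compare topic IDs. This is a minimal sketch that assumes both snapshots are lists of output items from the same forum. `new_topics` is a hypothetical helper.

```python
def new_topics(previous, current):
    """Return topics present in the current snapshot but not in the previous one."""
    seen = {item["topicId"] for item in previous}
    return [item for item in current if item["topicId"] not in seen]


yesterday = [{"topicId": 1, "title": "Old thread"}]
today = [{"topicId": 1, "title": "Old thread"}, {"topicId": 2, "title": "New thread"}]
print([t["title"] for t in new_topics(yesterday, today)])  # ['New thread']
```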

Workflow: Competitive intelligence pipeline

Run searchTopics mode against competitor brand names across multiple forums. Feed output into an LLM summarizer to extract sentiment and pain points. Export a weekly digest.

Workflow: NLP training dataset creation

Enable includePostContent: true and maxPostsPerTopic: 20 for a specific forum category. Output gives you structured conversation threads with HTML content, which you can clean for fine-tuning or RAG ingestion.
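
One possible cleaning step for the HTML content is sketched below using only the Python standard library. It strips tags and collapses whitespace. It is deliberately simple (text from adjacent tags is joined with a space, and code blocks and quotes are not treated specially), and both helper names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip tags from a post body and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so adjacent paragraphs do not run together.
    return " ".join(" ".join(parser.chunks).split())


def thread_to_text(topic):
    """Render a topic with a posts array as 'username: text' lines."""
    lines = [topic.get("title", "")]
    for post in topic.get("posts", []):
        lines.append(f"{post.get('username', '')}: {html_to_text(post.get('content', ''))}")
    return "\n".join(lines)


print(html_to_text("<p>Hello <b>world</b></p>"))  # Hello world
```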

Workflow: Community migration

Export all topics and posts from an old Discourse installation before migration. The structured JSON output makes it easy to transform and re-import into a new forum or knowledge base platform.

Workflow: SEO content research

Run searchTopics mode on niche Discourse forums with target keywords, with includePostContent: true so that views is populated (plain search results omit it). Topics with high views and likeCount but few replies are prime candidates for content marketing: write the definitive answer and link back.
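
Picking out those candidates from a dataset could look like the sketch below. The thresholds (`min_views`, `max_replies`) are arbitrary assumptions to tune per forum, and `content_gap_candidates` is a hypothetical helper operating on the output fields views, likeCount, and replyCount.

```python
def content_gap_candidates(items, min_views=1000, max_replies=3):
    """Topics with many views but few replies, most viewed first."""
    candidates = [
        item for item in items
        if item.get("views", 0) >= min_views
        and item.get("replyCount", 0) <= max_replies
    ]
    return sorted(
        candidates,
        key=lambda item: (item.get("views", 0), item.get("likeCount", 0)),
        reverse=True,
    )


topics = [
    {"title": "A", "views": 5000, "likeCount": 10, "replyCount": 1},
    {"title": "B", "views": 9000, "likeCount": 2, "replyCount": 40},
    {"title": "C", "views": 1200, "likeCount": 0, "replyCount": 0},
]
print([t["title"] for t in content_gap_candidates(topics)])  # ['A', 'C']
```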

🔌 API usage

Node.js (Apify client)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/discourse-scraper').call({
    forumUrl: 'https://community.openai.com',
    scrapeMode: 'searchTopics',
    searchQuery: 'function calling',
    maxTopics: 100,
});

// Read the scraped topics from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} topics`);

Python

from apify_client import ApifyClient

client = ApifyClient(token='YOUR_APIFY_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/discourse-scraper').call(run_input={
    'forumUrl': 'https://discuss.huggingface.co',
    'scrapeMode': 'categoryTopics',
    'categorySlug': 'research',
    'maxTopics': 200,
    'includePostContent': True,
    'maxPostsPerTopic': 5,
})

# Read the scraped topics from the run's default dataset.
items = list(client.dataset(run['defaultDatasetId']).iterate_items())
print(f'Scraped {len(items)} topics with full post content')

cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~discourse-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "forumUrl": "https://community.openai.com",
    "scrapeMode": "latestTopics",
    "maxTopics": 50
  }'

🤖 MCP (AI assistant integration)

Use this scraper directly inside Claude Code, Claude Desktop, Cursor, or any MCP-compatible AI assistant.

Claude Code (terminal)

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/discourse-scraper"

Claude Desktop / Cursor / VS Code

Add to your MCP config file:

{
    "mcpServers": {
        "apify": {
            "type": "http",
            "url": "https://mcp.apify.com?tools=automation-lab/discourse-scraper",
            "headers": {
                "Authorization": "Bearer YOUR_APIFY_TOKEN"
            }
        }
    }
}

Example AI prompts

Once connected, try these prompts in your AI assistant:

  • "Scrape the latest 50 topics from community.openai.com and show me the ones with over 1,000 views."
  • "Search discuss.huggingface.co for topics about 'fine-tuning Llama' and give me a summary of the top discussions."
  • "Get the most recent topics from the 'api' category on community.openai.com and find any that mention authentication errors."
  • "Export all topics from a Discourse forum to a CSV for analysis."

This actor only accesses public Discourse forums — the same content visible to any anonymous visitor. It uses Discourse's documented public JSON API, which is explicitly designed for programmatic access. No login, no authentication bypass, and no private data is accessed.

The data extracted is publicly visible on the web. Always review the specific forum's Terms of Service before scraping at large scale. For commercial data re-use or republication, consult the forum's terms.

This actor does not scrape private messages, user email addresses, IP addresses, or any non-public data.

❓ FAQ

Q: Which Discourse forums does this work with? A: Any public Discourse forum — the software powers thousands of communities including Rust, Ruby, Docker, Wikipedia, and major AI/ML communities. If you can browse topics without logging in, this actor can scrape it.

Q: Does it require an API key? A: No. Public Discourse forums expose a JSON API to anonymous visitors. Just provide the forum URL and run.

Q: Why is categoryName showing a number instead of a name? A: This can happen if the forum restricts the /categories.json endpoint. The actor falls back to the category ID when the name can't be resolved. Try with a different forum or check if categories require login to view.

Q: Can I scrape a forum that requires login to view topics? A: No — this actor doesn't support authentication. It only works with publicly accessible Discourse forums.

Q: The run finished but I got fewer topics than expected. Why? A: The forum may have fewer public topics than your maxTopics limit, or the category you specified has fewer topics. For searchTopics mode, the search results are naturally limited by relevance — try a broader search query.

Q: Can I get topics from subcategories? A: Yes. For nested categories, use the full path in categorySlug. For example, if the URL is /c/parent/child/12, use parent/child as the slug.
