Pricing

Pay per event

Discourse Forum Scraper

Extract topics, posts, and discussions from any public Discourse forum. Supports latest topics, category filtering, and keyword search. No login required.

Developer

Stas Persiianenko

Last modified: 17 days ago

🔍 What does it do?

Discourse Forum Scraper connects to Discourse's public JSON API (no authentication required for public forums) and extracts:

Latest topics: the most recently active discussions, paginated across the entire forum
Category topics: all topics within a specific category or sub-category
Search results: topics matching a keyword or phrase

For each topic, you get structured metadata: title, URL, author, view count, like count, post count, category, tags, excerpt, creation date, and more. Optionally, set includePostContent to true to also fetch each topic's posts (HTML content, poster info, and like counts).

The actor uses only Discourse's public JSON API endpoints (/latest.json, /c/{category}.json, /search.json, /t/{id}.json) — no browser, no proxies, minimal cost per result.

👥 Who is it for?

AI/ML researchers who want to monitor community.openai.com or discuss.huggingface.co for emerging topics, model comparisons, and developer pain points — without manually browsing the forum every day.

Product managers tracking competitor mentions, feature requests, and user feedback across developer communities. Discourse forums are where power users discuss product issues openly.

Content marketers and SEO teams identifying high-traffic discussion topics in their niche, repurposing forum Q&As into blog posts, or monitoring what questions get thousands of views.

Data scientists and NLP researchers building training datasets from community discussions, sentiment analysis corpora, or topic clustering studies on domain-specific text.

Community managers bulk-exporting discussions for archiving, migration, compliance audits, or import into internal knowledge bases.

Competitive intelligence analysts monitoring developer forums for product feedback, bug reports, and integration questions targeting specific categories.

💡 Why use it?

Discourse has over 5,000 active public forums across tech, gaming, academia, open-source, and more — but there's no centralized API that lets you query across all of them. This actor gives you a uniform interface to any Discourse forum, producing clean structured JSON you can pipe directly into spreadsheets, databases, or downstream APIs.

Compared to manually browsing Discourse or writing one-off scripts:

✅ Works on any Discourse forum — just change the URL
✅ Handles pagination automatically, up to 5,000 topics in a single run
✅ Clean JSON output compatible with Google Sheets, Airtable, BigQuery, and more
✅ Optional post content extraction for full discussion text
✅ Category name resolution — readable categoryName field, not just an ID
✅ No API keys or authentication setup, plus automatic retries (maxRequestRetries) for failed requests

📊 What data does it extract?

Field	Description	Example
`topicId`	Unique topic ID	`137192`
`title`	Topic title	`"Best practices for GPT-4 system prompts"`
`slug`	URL-friendly slug	`"best-practices-for-gpt-4-system-prompts"`
`url`	Direct link to the topic	`"https://community.openai.com/t/..."`
`categoryId`	Category numeric ID	`7`
`categoryName`	Resolved category name	`"API"`
`postsCount`	Total posts in the topic, including the opening post	`24`
`replyCount`	Direct reply count	`18`
`views`	Total view count	`12503`
`likeCount`	Total likes received	`89`
`createdAt`	Topic creation time (ISO 8601)	`"2023-04-03T10:23:49.213Z"`
`lastPostedAt`	Time of the most recent post (ISO 8601)	`"2024-01-15T08:44:11.000Z"`
`tags`	List of topic tags	`["gpt-4", "system-prompt"]`
`excerpt`	Short text excerpt	`"Has anyone found a good pattern for..."`
`authorUsername`	Original poster username	`"johndev42"`
`pinned`	Whether the topic is pinned	`false`
`closed`	Whether the topic is closed	`false`
`posts`	Post array (optional)	See below
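
The following minimal sketch shows how one page of Discourse's /latest.json could be flattened into the fields above. It is not the actor's source code. The raw key names (`users`, `topic_list.topics`, `posts_count`, `posters`, and so on) are assumptions based on the JSON that public Discourse forums typically return. The helpers `topics_from_latest` and `normalize_topic` are hypothetical.

```python
from typing import Any


def normalize_topic(
    raw: dict[str, Any],
    forum_url: str,
    users_by_id: dict[int, str],
    categories_by_id: dict[int, str],
) -> dict[str, Any]:
    """Map one raw Discourse topic-list entry to the output field names (hypothetical helper)."""
    posters = raw.get("posters") or []
    # Prefer the poster flagged "Original Poster"; otherwise assume the first poster is the author.
    op = next(
        (p for p in posters if "Original Poster" in (p.get("description") or "")),
        posters[0] if posters else {},
    )
    # Some Discourse versions return tags as objects ({"name": ...}) instead of strings.
    tags = [t["name"] if isinstance(t, dict) else t for t in raw.get("tags") or []]
    category_id = raw.get("category_id")
    slug = raw.get("slug", "")
    return {
        "topicId": raw["id"],
        "title": raw.get("title", ""),
        "slug": slug,
        "url": f"{forum_url.rstrip('/')}/t/{slug}/{raw['id']}",
        "categoryId": category_id,
        # Fall back to the numeric ID when the category name cannot be resolved.
        "categoryName": categories_by_id.get(category_id, str(category_id)),
        "postsCount": raw.get("posts_count", 0),
        "replyCount": raw.get("reply_count", 0),
        "views": raw.get("views", 0),
        "likeCount": raw.get("like_count", 0),
        "createdAt": raw.get("created_at"),
        "lastPostedAt": raw.get("last_posted_at"),
        "tags": tags,
        "excerpt": raw.get("excerpt", ""),
        "authorUsername": users_by_id.get(op.get("user_id"), ""),
        "pinned": raw.get("pinned", False),
        "closed": raw.get("closed", False),
    }


def topics_from_latest(
    page: dict[str, Any], forum_url: str, categories_by_id: dict[int, str]
) -> list[dict[str, Any]]:
    """Flatten one /latest.json page; usernames come from the page's top-level "users" list."""
    users_by_id = {u["id"]: u["username"] for u in page.get("users", [])}
    topics = page.get("topic_list", {}).get("topics", [])
    return [normalize_topic(t, forum_url, users_by_id, categories_by_id) for t in topics]
```

The author lookup goes through the page-level `users` list because topic-list entries carry only user IDs.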

When includePostContent is true, each topic also includes a posts array with these fields:

Field	Description
`postId`	Unique post ID
`postNumber`	Position in thread
`username`	Poster's username
`displayName`	Poster's display name
`createdAt`	Post creation timestamp
`content`	Full HTML post content
`likeCount`	Number of likes on this post
`replyCount`	Direct replies to this post
`reads`	How many users read this post
`isAcceptedAnswer`	Is this the accepted answer?

💰 How much does it cost to scrape Discourse forum topics?

This actor uses pay-per-event (PPE) pricing — you pay only for the data you extract, not for idle time.

Plan	Price per topic
FREE	$0.00115
BRONZE	$0.001
SILVER	$0.00078
GOLD	$0.0006
PLATINUM	$0.0004
DIAMOND	$0.00028

Plus a flat $0.005 start fee per run.

Cost examples:

100 topics: ~$0.12 (FREE) / $0.105 (BRONZE)
500 topics: ~$0.58 (FREE) / $0.505 (BRONZE)
2,000 topics: ~$2.31 (FREE) / $2.005 (BRONZE)
5,000 topics with posts: ~$5.76 (FREE) / $5.005 (BRONZE)
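
The examples above follow one formula: start fee + topics × per-topic price. The sketch below reproduces that arithmetic. It assumes the prices in the table are the only charges for a run, so treat it as an estimate rather than Apify's billing logic. `Decimal` keeps cent-level results free of floating-point rounding, and the function names are hypothetical.

```python
from decimal import Decimal

# Per-topic prices (USD) copied from the pricing table above.
PRICE_PER_TOPIC = {
    "FREE": Decimal("0.00115"),
    "BRONZE": Decimal("0.001"),
    "SILVER": Decimal("0.00078"),
    "GOLD": Decimal("0.0006"),
    "PLATINUM": Decimal("0.0004"),
    "DIAMOND": Decimal("0.00028"),
}
START_FEE = Decimal("0.005")  # flat fee charged once per run


def estimate_run_cost(topics: int, tier: str = "FREE") -> Decimal:
    """Start fee plus topics x per-topic price; post content does not change the per-topic charge."""
    if topics < 0:
        raise ValueError("topics must be non-negative")
    return START_FEE + topics * PRICE_PER_TOPIC[tier]


def topics_for_budget(budget: Decimal, tier: str = "FREE", runs: int = 1) -> int:
    """Number of topics a budget covers after paying one start fee per run."""
    remaining = budget - runs * START_FEE
    if remaining <= 0:
        return 0
    return int(remaining // PRICE_PER_TOPIC[tier])
```

For example, `estimate_run_cost(100, "FREE")` comes to $0.12. `topics_for_budget(Decimal("5"), "FREE")` gives 4,343 topics for a single run.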

Topics-only runs are cheap. Enabling includePostContent adds one extra API request per topic (to fetch its posts), but the PPE charge per result stays the same.

Free plan estimate: the Apify Free plan ($0) includes $5/month in usage. At FREE-tier rates ($0.00115 per topic), that covers roughly 4,300 topics per month, slightly fewer after the $0.005 start fee for each run.

🚀 How to use it

Step 1: Choose your target forum

Find the base URL of any public Discourse forum. Examples:

https://community.openai.com — OpenAI Developer Community
https://discuss.huggingface.co — HuggingFace forums
https://meta.discourse.org — Discourse Meta
https://forum.cursor.sh — Cursor AI community
https://community.cloudflare.com — Cloudflare developers

Step 2: Select a scrape mode

Latest Topics — get the freshest discussions sorted by most recently active
Category Topics — get topics from a specific category (requires the category slug, found in the URL)
Search Topics — search for topics matching a keyword

Step 3: Set limits

Start small with maxTopics: 20 to preview the data. Scale up to 5,000 for bulk exports.

Step 4: Enable post content (optional)

Set includePostContent: true and maxPostsPerTopic: 10 to fetch the actual post bodies. Best for NLP datasets, content archiving, and answer extraction.

Step 5: Run and export

Click Start and wait for results. Export to JSON, CSV, or Excel. Connect to Google Sheets or Airtable directly from the Apify platform.

⚙️ Input parameters

Parameter	Type	Default	Description
`forumUrl`	string	`https://community.openai.com`	Base URL of the Discourse forum
`scrapeMode`	enum	`latestTopics`	What to scrape: `latestTopics`, `categoryTopics`, `searchTopics`
`categorySlug`	string	—	Category slug for `categoryTopics` mode (e.g., `api`)
`searchQuery`	string	—	Search keywords for `searchTopics` mode
`maxTopics`	integer	`50`	Maximum number of topics to extract
`includePostContent`	boolean	`false`	Fetch full post content for each topic
`maxPostsPerTopic`	integer	`10`	Max posts per topic (when `includePostContent` is true)
`maxRequestRetries`	integer	`3`	Retry attempts for failed HTTP requests
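
Some input mistakes are easy to catch before starting a run, such as a missing categorySlug in categoryTopics mode. The following sketch merges your overrides onto the defaults from the table above and checks the mode-specific fields. `build_run_input` is a hypothetical helper. The checks are inferred from the parameter descriptions, and the actor's own validation may differ.

```python
from typing import Any

# Defaults copied from the input parameter table above.
DEFAULTS: dict[str, Any] = {
    "forumUrl": "https://community.openai.com",
    "scrapeMode": "latestTopics",
    "maxTopics": 50,
    "includePostContent": False,
    "maxPostsPerTopic": 10,
    "maxRequestRetries": 3,
}
OPTIONAL_KEYS = {"categorySlug", "searchQuery"}
VALID_MODES = {"latestTopics", "categoryTopics", "searchTopics"}


def build_run_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides onto the documented defaults and check mode-specific fields (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    mode = run_input["scrapeMode"]
    if mode not in VALID_MODES:
        raise ValueError(f"scrapeMode must be one of {sorted(VALID_MODES)}")
    if mode == "categoryTopics" and not run_input.get("categorySlug"):
        raise ValueError("categoryTopics mode needs categorySlug")
    if mode == "searchTopics" and not run_input.get("searchQuery"):
        raise ValueError("searchTopics mode needs searchQuery")
    if run_input["maxTopics"] < 1:
        raise ValueError("maxTopics must be a positive integer")
    return run_input
```

You can pass the returned dict as `run_input` to `client.actor('automation-lab/discourse-scraper').call(...)`, as in the Python example under API usage.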

📤 Output example

{
  "topicId": 137192,
  "title": "Cant find and use gpt-4 model (I have the gpt-4 Invitation)",
  "slug": "cant-find-and-use-gpt-4-model-i-have-the-gpt-4-invitation",
  "url": "https://community.openai.com/t/cant-find-and-use-gpt-4-model/137192",
  "categoryId": 7,
  "categoryName": "API",
  "postsCount": 15,
  "replyCount": 12,
  "views": 8432,
  "likeCount": 23,
  "createdAt": "2023-04-03T10:23:49.213Z",
  "lastPostedAt": "2023-05-12T14:31:00.000Z",
  "tags": ["gpt-4", "access"],
  "excerpt": "Hi everyone, I recently received an invitation for GPT-4 access...",
  "authorUsername": "johndev42",
  "pinned": false,
  "closed": false
}

💡 Tips and tricks

Finding the category slug: Navigate to a category page and check the URL. For https://community.openai.com/c/api/7, the slug is api. For subcategories like /c/api/plugins/23, use api/plugins.
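
The slug rule above fits in a small helper: drop the /c/ prefix and the trailing numeric category ID. This sketch assumes category URLs follow the /c/slug[/child-slug]/id shape shown in the examples. URLs with extra path segments after the ID are not handled, and the helper name is hypothetical.

```python
from urllib.parse import urlparse


def category_slug_from_url(url: str) -> str:
    """Turn a Discourse category URL into a categorySlug value (hypothetical helper)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if not parts or parts[0] != "c":
        raise ValueError(f"not a /c/ category URL: {url}")
    segments = parts[1:]
    # Drop the trailing numeric category ID, e.g. /c/api/7 -> ["api"].
    if segments and segments[-1].isdigit():
        segments = segments[:-1]
    if not segments:
        raise ValueError(f"no category slug in URL: {url}")
    return "/".join(segments)
```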

Filtering noisy topics: Use searchTopics mode with specific technical keywords to get only relevant discussions. For example, "function calling" returns topics specifically about that feature.

Scraping multiple forums: Run the actor multiple times with different forumUrl values. Use the Apify API to start runs programmatically and merge results.
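
Here is one way to script the multi-forum approach. It uses the same `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` calls as the Python example under API usage. `client` is meant to be an `ApifyClient`. It is typed loosely so the sketch needs only the standard library. `scrape_forums` and `merge_forum_results` are hypothetical names. Runs happen one after another because `call()` waits for each run to finish.

```python
from typing import Any, Iterable

ACTOR_ID = "automation-lab/discourse-scraper"


def scrape_forums(
    client: Any, forum_urls: Iterable[str], base_input: dict[str, Any]
) -> dict[str, list[dict[str, Any]]]:
    """Run the actor once per forum and collect each run's dataset items."""
    results: dict[str, list[dict[str, Any]]] = {}
    for url in forum_urls:
        run = client.actor(ACTOR_ID).call(run_input={**base_input, "forumUrl": url})
        if not run:  # treat a missing run object as "no results"
            results[url] = []
            continue
        results[url] = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    return results


def merge_forum_results(results: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Flatten per-forum results, tag each row with forumUrl, and drop repeated topicIds per forum."""
    merged: list[dict[str, Any]] = []
    seen: set[tuple[str, Any]] = set()
    for forum_url, items in results.items():
        for item in items:
            key = (forum_url, item.get("topicId"))  # topic IDs are only unique within one forum
            if key in seen:
                continue
            seen.add(key)
            merged.append({**item, "forumUrl": forum_url})
    return merged
```

A typical call is `merge_forum_results(scrape_forums(ApifyClient(token='YOUR_APIFY_TOKEN'), urls, {"scrapeMode": "latestTopics", "maxTopics": 50}))`. The deduplication key pairs the forum URL with topicId because two forums can reuse the same numeric ID.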

Pagination behavior: The actor automatically paginates. Set maxTopics: 5000 to get the maximum — Discourse's /latest.json returns ~30 topics per page, so 5,000 topics requires ~167 API calls.
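
The request arithmetic can be written out directly. This sketch uses the ~30-topics-per-page figure above and the one-extra-request-per-topic cost of includePostContent described in the pricing section. It ignores retries, and forums may use a different page size. The function name is hypothetical.

```python
import math


def estimate_requests(max_topics: int, include_post_content: bool = False, per_page: int = 30) -> int:
    """Rough JSON request count for a latestTopics run, ignoring retries."""
    if max_topics <= 0:
        return 0
    list_requests = math.ceil(max_topics / per_page)  # one /latest.json page per ~30 topics
    topic_requests = max_topics if include_post_content else 0  # one /t/{id}.json per topic
    return list_requests + topic_requests
```

With these assumptions, 5,000 topics need 167 list requests. Enabling post content raises that to 5,167.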

Rate limiting: If you see repeated retries or timeouts, split the job into smaller runs by lowering maxTopics, or raise maxRequestRetries so transient failures get more attempts. Many Discourse forums tolerate moderate volumes of anonymous JSON requests, but each forum sets its own limits.

searchTopics mode — limited fields: Discourse's /search.json endpoint returns slim topic objects that do not include views, authorUsername, or excerpt. These fields will be empty ("" or 0) for search results. This is a Discourse API limitation, not a scraper bug. If you need these fields, use latestTopics or categoryTopics mode and filter by keyword on your end, or enable includePostContent: true (which fetches full topic data per result).
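
Filtering on your end, as suggested above, can be as simple as a case-insensitive substring match over title, excerpt, and tags. This sketch assumes dataset items shaped like the output example. It does plain substring matching, not Discourse's search ranking, and the helper name is hypothetical.

```python
from typing import Any, Iterable


def filter_by_keyword(topics: Iterable[dict[str, Any]], keyword: str) -> list[dict[str, Any]]:
    """Keep topics whose title, excerpt, or tags contain the keyword (case-insensitive substring match)."""
    needle = keyword.casefold()
    matches = []
    for topic in topics:
        text = " ".join(
            [topic.get("title") or "", topic.get("excerpt") or "", *(topic.get("tags") or [])]
        )
        if needle in text.casefold():
            matches.append(topic)
    return matches
```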

Private forums: This actor only works with public Discourse forums. Forums requiring login to browse will return empty results or redirect to a login page.

🔗 Integrations

Workflow: Daily topic monitoring via Google Sheets

Schedule this actor daily, point output to a Google Sheets integration (Apify has a built-in Sheets connector), and track emerging topics over time. Set scrapeMode: latestTopics and maxTopics: 50 for a lightweight daily snapshot.
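
To track emerging topics, compare each daily snapshot with the previous one. This sketch assumes you keep the previous run's dataset items, for example as exported JSON. The helper names are hypothetical. With maxTopics: 50, "new" means new to the 50 most recently active topics, not necessarily newly created.

```python
from typing import Any, Iterable


def new_topics(
    previous: Iterable[dict[str, Any]], current: Iterable[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Topics in the current snapshot whose topicId did not appear in the previous snapshot."""
    seen = {t["topicId"] for t in previous}
    return [t for t in current if t["topicId"] not in seen]


def view_growth(
    previous: Iterable[dict[str, Any]], current: Iterable[dict[str, Any]], top_n: int = 10
) -> list[tuple[Any, int]]:
    """(topicId, views gained) for topics present in both snapshots, largest gain first."""
    before = {t["topicId"]: t.get("views", 0) for t in previous}
    gains = [
        (t["topicId"], t.get("views", 0) - before[t["topicId"]])
        for t in current
        if t["topicId"] in before
    ]
    gains.sort(key=lambda pair: pair[1], reverse=True)
    return gains[:top_n]
```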

Workflow: Competitive intelligence pipeline

Run searchTopics mode against competitor brand names across multiple forums. Feed output into an LLM summarizer to extract sentiment and pain points. Export a weekly digest.

Workflow: NLP training dataset creation

Enable includePostContent: true and maxPostsPerTopic: 20 for a specific forum category. Output gives you structured conversation threads with HTML content, which you can clean for fine-tuning or RAG ingestion.
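
The content field of each post holds HTML, so some cleanup is usually needed first. This minimal standard-library sketch flattens that HTML into plain text. It drops all markup, including quote and code formatting, which may or may not suit your dataset. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional

# Tags that get a line break so paragraphs and list items don't run together.
BLOCK_TAGS = {"p", "br", "div", "li", "ul", "ol", "pre", "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and turns block-level tags into newlines."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag: str) -> None:
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip markup from a post's HTML content and drop blank lines."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```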

Workflow: Community migration

Export all topics and posts from an old Discourse installation before migration. The structured JSON output makes it easy to transform and re-import into a new forum or knowledge base platform.

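Workflow: SEO content research

Run searchTopics mode on niche Discourse forums with target keywords, with includePostContent: true so that views and authors are filled in (search results alone leave them empty). Topics with high views and likeCount but few replies are prime candidates for content marketing: write the definitive answer and link back.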
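
One way to shortlist such topics is to filter the dataset by views, likes, and replies. In this sketch the thresholds are illustrative assumptions, not recommendations. replyCount stands in for "answers", and the helper name is hypothetical. For searchTopics results, views are only populated when includePostContent is enabled.

```python
from typing import Any, Iterable


def content_gaps(
    topics: Iterable[dict[str, Any]],
    min_views: int = 1000,
    min_likes: int = 10,
    max_replies: int = 2,
) -> list[dict[str, Any]]:
    """Topics with lots of attention but few replies, highest view count first."""
    picks = [
        t
        for t in topics
        if t.get("views", 0) >= min_views
        and t.get("likeCount", 0) >= min_likes
        and t.get("replyCount", 0) <= max_replies
    ]
    return sorted(picks, key=lambda t: t.get("views", 0), reverse=True)
```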

🔌 API usage

Node.js (Apify client)

// Run as an ES module (for example a .mjs file) so that top-level await works.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/discourse-scraper').call({
  forumUrl: 'https://community.openai.com',
  scrapeMode: 'searchTopics',
  searchQuery: 'function calling',
  maxTopics: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} topics`);

Python

from apify_client import ApifyClient

client = ApifyClient(token='YOUR_APIFY_TOKEN')

run = client.actor('automation-lab/discourse-scraper').call(run_input={
    'forumUrl': 'https://discuss.huggingface.co',
    'scrapeMode': 'categoryTopics',
    'categorySlug': 'research',
    'maxTopics': 200,
    'includePostContent': True,
    'maxPostsPerTopic': 5,
})

items = list(client.dataset(run['defaultDatasetId']).iterate_items())
print(f'Scraped {len(items)} topics with full post content')

cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~discourse-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "forumUrl": "https://community.openai.com",
    "scrapeMode": "latestTopics",
    "maxTopics": 50
  }'

🤖 MCP (AI assistant integration)

Use this scraper directly inside Claude Code, Claude Desktop, Cursor, or any MCP-compatible AI assistant.

Claude Code (terminal)

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/discourse-scraper"

Claude Desktop / Cursor / VS Code

Add to your MCP config file:

{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/discourse-scraper",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Example AI prompts

Once connected, try these prompts in your AI assistant:

"Scrape the latest 50 topics from community.openai.com and show me the ones with over 1,000 views."
"Search discuss.huggingface.co for topics about 'fine-tuning Llama' and give me a summary of the top discussions."
"Get the most recent topics from the 'api' category on community.openai.com and find any that mention authentication errors."
"Export all topics from a Discourse forum to a CSV for analysis."

⚖️ Legality — Is it Legal to Scrape Discourse Forums?

This actor only accesses public Discourse forums, the same content visible to any anonymous visitor. It uses Discourse's documented public JSON API, which is explicitly designed for programmatic access. It does not log in, bypass authentication, or access private data.

The data extracted is publicly visible on the web. Always review the specific forum's Terms of Service before scraping at large scale. For commercial data re-use or republication, consult the forum's terms.

This actor does not scrape private messages, user email addresses, IP addresses, or any non-public data.

❓ FAQ

Q: Which Discourse forums does this work with? A: Any public Discourse forum. The software powers thousands of communities, including Rust, Ruby, Docker, and major AI/ML communities. If you can browse topics without logging in, this actor can scrape it.

Q: Does it require an API key? A: No. Public Discourse forums expose a JSON API to anonymous visitors. Just provide the forum URL and run.

Q: Why is categoryName showing a number instead of a name? A: This can happen if the forum restricts the /categories.json endpoint. The actor falls back to the category ID when the name can't be resolved. Try with a different forum or check if categories require login to view.

Q: Can I scrape a forum that requires login to view topics? A: No — this actor doesn't support authentication. It only works with publicly accessible Discourse forums.

Q: The run finished but I got fewer topics than expected. Why? A: The forum may have fewer public topics than your maxTopics limit, or the category you specified has fewer topics. For searchTopics mode, the search results are naturally limited by relevance — try a broader search query.

Q: Can I get topics from subcategories? A: Yes. For nested categories, use the full path in categorySlug. For example, if the URL is /c/parent/child/12, use parent/child as the slug.

🔗 Related scrapers

Explore other automation-lab scrapers for AI/developer community data:

AI Jobs Scraper — AI/ML job listings from aijobs.net
AIcrowd Scraper — AI competition listings from AIcrowd
DevPost Scraper — hackathon and project listings
AI Tools Directory Scraper — AI tool listings

You might also like

Discourse Forum Monitor — Mentions & Feedback

bikram07/discourse-forum-monitor

Monitor any Discourse-powered forum for new topics, feature requests, and brand mentions — by latest or keyword search, across one or many forums at once. Keyless official Discourse JSON. Zero-config: latest topics from meta.discourse.org.

Bikram

Discourse Community Scraper

crawlerbros/discourse-community-scraper

Scrape any public Discourse forum with latest topics, trending discussions, category browsing, tag filtering, full-text search, user profiles, and complete post threads. Works with meta.discourse.org, community forums, and any self-hosted Discourse.

Crawler Bros

Discourse Scraper: Topics, Posts, Users & Search

perconey/discourse-scraper

Scrape any Discourse forum via the public REST API. Latest / top topics, category topics, full topic + posts, user profiles + activity, full-text search. No browser, no proxies, no auth. Pay only per result item.

Perconey

Discourse Community Scraper

rl1987/discourse-community-scraper

Generalised scraper for any Discourse-based community forum (topics, posts, categories, search) via Discourse's JSON API.

R.L.

Discourse.org Forum Scraper

enezli/discourse-forum-scraper

Search any public Discourse forum and get clean, de-duplicated topic JSON: title, author, replies, views and canonical topic URLs. One click, no required fields, no LLM.

Turgay NANTA

Discourse Forum Topics Scraper

parseforge/discourse-forum-topics-scraper

Gather social activity from Discourse Forum Topics with profile name, follower count, posts, replies and timestamps. Loved by community managers, brand watchers and trend researchers. Run on demand or on a recurring schedule and feed every row into your favourite analytics or workflow stack.

ParseForge

Forums Search Forum Posts Scraper

data_direct/forums-posts-scraper

Search forum posts across the web by keyword with time and country filters

Data Direct

Discourse Meta Latest Scraper

benthepythondev/discourse-meta-latest-scraper

Collect Discourse Meta records and export id, title, description, created at, views, slug as structured JSON, CSV or Excel data.

Ben

SlickDeals Forum Threads Scraper (Original Merchant Links)

scralab/slickdeals-forum-threads-scraper

Scrape deals from any Slickdeals forum thread list page. Simply provide a forum URL (Hot Deals, filtered views, category pages, or any custom forum query) and get structured deal data with direct links to the original merchant.

scralab

RedFlagDeals Forum Threads Scraper (Original Merchant Links)

scralab/redflagdeals-forum-threads-scraper

Scrape deals from any RedFlagDeals (Canada) forum thread list page. Provide any forum URL (e.g., Hot Deals, filtered views, category pages) and get structured deal data with direct merchant links.

scralab
