Pricing

$1.00 / 1,000 result items

Discourse Scraper: Topics, Posts, Users & Search

Scrape any Discourse forum via the public REST API. Latest / top topics, category topics, full topic + posts, user profiles + activity, full-text search. No browser, no proxies, no auth. Pay only per result item.

Developer

Perconey

Last modified: 2 months ago

What does Discourse Scraper do?

Discourse Scraper pulls structured data from any Discourse forum via the official public REST API: topics with view and like counts, full post threads, user profiles with trust levels and badges, category trees, and full-text search results. The actor calls the documented public JSON endpoints directly: no browser, no proxies, no cookies, no auth. One actor works with every Discourse-powered community: HuggingFace, Django, Python.org, Unity, KiCad, Ruby on Rails, Brave, meta.discourse.org, and hundreds more.

Try it instantly: pick getLatest, leave instance as https://discuss.huggingface.co, click Start. You get the 30 newest HuggingFace forum topics in under 3 seconds for $0.03.

Why use Discourse Scraper?

DevRel teams: Monitor mentions of your project across the major open-source forums. Schedule daily searchPosts runs across Django, Python, HuggingFace, Unity in parallel.
Community managers: Track engagement on your own Discourse forum. getLatest + getCategoryTopics give you post counts, view counts, and like counts for every recent thread.
Customer-support archaeology: When a bug report references "the forum thread from last month", run getTopicDetail with the topic id to get the full conversation in JSON.
Recruiters: getUserProfile returns trust level, badge count, and post count, which are quick signals of technical depth in a community.
OSS maintainers: Pull getCategoryTopics for "help" categories on multiple Discourse instances to see what users struggle with this week.

How to use Discourse Scraper

Open the Input tab.
Pick an action from the dropdown. getLatest is the simplest starting point.
Set instance (default https://discuss.huggingface.co). To scrape a different Discourse forum, paste its URL.
For category, topic, user, and search actions, fill in queries (see the query format table below).
Tune maxItems (default 30).
Click Start.

Query format by action

Action	Query format
getLatest	leave empty
getTop	leave empty (use topPeriod field if needed)
getCategories	leave empty
getCategoryTopics	category slug (e.g. `beginners`) or `slug/id` (e.g. `beginners/5`)
getTopicDetail	numeric topic id (e.g. `175977`)
getUserProfile	username (e.g. `julien-c`)
getUserActivity	username
searchPosts	free-text search query
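To make the query formats concrete, here is a minimal Python sketch of how each action and query could map onto Discourse's public JSON routes. The route paths are standard Discourse endpoints as we understand them, not taken from this actor's source; `endpoint_for` and `category_ids_from` are hypothetical helper names, and the `monthly` default for the top period is an assumption. A bare category slug needs its numeric id, which the sketch reads from a /categories.json payload.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def category_ids_from(categories_payload: dict) -> Dict[str, int]:
    """Map slug -> id from a /categories.json response (top-level categories only)."""
    cats = categories_payload["category_list"]["categories"]
    return {c["slug"]: c["id"] for c in cats}


def endpoint_for(instance: str, action: str, query: str = "",
                 top_period: str = "monthly",  # assumed default, not documented
                 category_ids: Optional[Dict[str, int]] = None) -> str:
    """Hypothetical mapping from an action + query to a Discourse JSON URL."""
    base = instance.rstrip("/")
    if action == "getLatest":
        return f"{base}/latest.json"
    if action == "getTop":
        return f"{base}/top.json?" + urlencode({"period": top_period})
    if action == "getCategories":
        return f"{base}/categories.json"
    if action == "getCategoryTopics":
        # Accept "slug" or "slug/id"; a bare slug is resolved via category_ids
        # (raises KeyError if the slug is unknown).
        slug, _, cat_id = query.partition("/")
        if not cat_id:
            cat_id = str((category_ids or {})[slug])
        return f"{base}/c/{quote(slug)}/{cat_id}.json"
    if action == "getTopicDetail":
        return f"{base}/t/{int(query)}.json"
    if action == "getUserProfile":
        return f"{base}/u/{quote(query)}.json"
    if action == "getUserActivity":
        return f"{base}/user_actions.json?" + urlencode({"username": query})
    if action == "searchPosts":
        return f"{base}/search.json?" + urlencode({"q": query})
    raise ValueError(f"unknown action: {action}")
```

Keeping URL construction in one pure function makes it easy to test without any network access.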

Input

Field	Required	Description
`action`	yes	Which API call to make. Eight options.
`instance`	yes	Discourse forum URL. Default https://discuss.huggingface.co.
`queries`	sometimes	Required for category / topic / user / search actions.
`maxItems`	no	Max items per query. Default 30.
`topPeriod`	no	getTop only. all / yearly / quarterly / monthly / weekly / daily.
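A sketch of how the input rules in the table could be checked before a run, assuming an input shaped as a plain dict with the field names above. `normalize_input` is a hypothetical helper, not the actor's code; the defaults it fills (the HuggingFace instance and maxItems 30) come from the table, and it leaves topPeriod unset when absent because no default is documented.

```python
from __future__ import annotations

from typing import Any

ACTIONS = {"getLatest", "getTop", "getCategories", "getCategoryTopics",
           "getTopicDetail", "getUserProfile", "getUserActivity", "searchPosts"}
# Actions whose query format is not "leave empty" in the query format table.
NEEDS_QUERIES = ACTIONS - {"getLatest", "getTop", "getCategories"}
TOP_PERIODS = {"all", "yearly", "quarterly", "monthly", "weekly", "daily"}
DEFAULT_INSTANCE = "https://discuss.huggingface.co"


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate a run input and fill documented defaults (hypothetical helper)."""
    action = raw.get("action")
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {sorted(ACTIONS)}")
    # Drop blank entries and surrounding whitespace from queries.
    queries = [q.strip() for q in (raw.get("queries") or []) if q.strip()]
    if action in NEEDS_QUERIES and not queries:
        raise ValueError(f"{action} needs at least one entry in queries")
    top_period = raw.get("topPeriod")
    if top_period is not None and top_period not in TOP_PERIODS:
        raise ValueError(f"topPeriod must be one of {sorted(TOP_PERIODS)}")
    max_items = int(raw.get("maxItems", 30))
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    return {
        "action": action,
        "instance": (raw.get("instance") or DEFAULT_INSTANCE).rstrip("/"),
        "queries": queries,
        "maxItems": max_items,
        "topPeriod": top_period,
    }
```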

Output

Every item carries _type (topic / post / category / user / user_action / search_result / error) plus _action and _instance.

```json
{
    "_type": "topic",
    "_action": "getLatest",
    "_instance": "https://discuss.huggingface.co",
    "id": 175977,
    "title": "Practical match for 128Gb Strix Halo with 2x3090s? (inference for coding)",
    "slug": "practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding",
    "category_id": 5,
    "posts_count": 2,
    "views": 41,
    "like_count": 0,
    "created_at": "2026-05-14T10:08:00Z",
    "bumped_at": "2026-05-14T10:12:00Z",
    "tags": [],
    "url": "https://discuss.huggingface.co/t/practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding/175977"
}
```
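A sketch of how a raw Discourse topic list could be flattened into items like the one above. It assumes the /latest.json response nests topics under topic_list.topics, as Discourse's public API does, and that topic URLs follow the /t/{slug}/{id} pattern visible in the url field. `topic_items` is a hypothetical name, not the actor's implementation.

```python
from typing import Any, Dict, List

# Keys copied through for the `topic` type, per the Data fields table.
TOPIC_KEYS = ("id", "title", "slug", "category_id", "posts_count", "views",
              "like_count", "created_at", "bumped_at", "last_posted_at",
              "tags", "archetype", "closed", "archived", "pinned")


def topic_items(payload: Dict[str, Any], action: str, instance: str,
                max_items: int = 30) -> List[Dict[str, Any]]:
    """Turn a /latest.json-style payload into flat output items (sketch)."""
    base = instance.rstrip("/")
    topics = payload.get("topic_list", {}).get("topics", [])[:max_items]
    items = []
    for t in topics:
        item = {"_type": "topic", "_action": action, "_instance": base}
        # Missing keys become None rather than raising.
        item.update({k: t.get(k) for k in TOPIC_KEYS})
        item["tags"] = t.get("tags") or []  # absent when tagging is disabled
        item["url"] = f"{base}/t/{t['slug']}/{t['id']}"
        items.append(item)
    return items
```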

You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab.

Data fields

Type	Key fields
`topic`	id, title, slug, category_id, posts_count, views, like_count, created_at, bumped_at, last_posted_at, tags, archetype, closed, archived, pinned, url
`post`	id, topic_id, post_number, username, user_trust_level, cooked (HTML), raw (markdown), reply_count, like_count, accepted_answer, created_at, url
`category`	id, name, slug, description, topic_count, post_count, color, parent_category_id, url
`user`	id, username, name, title, trust_level, post_count, topic_count, badge_count, likes_given, likes_received, created_at, last_seen_at
`user_action`	action_type, action_code, created_at, excerpt, topic_id, topic_title, post_number, category_id, url
`search_result`	id, topic_id, post_number, title, blurb, username, like_count, url
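The search_result type joins each matching post with its topic's title. A minimal sketch of that flattening, assuming /search.json returns parallel `posts` and `topics` arrays (as Discourse's search endpoint does) and falling back to a /t/{topic_id}/{post_number} URL when the topic is not included. `search_results` is a hypothetical helper.

```python
from typing import Any, Dict, List


def search_results(payload: Dict[str, Any], instance: str) -> List[Dict[str, Any]]:
    """Flatten a /search.json payload into search_result items (sketch)."""
    base = instance.rstrip("/")
    # Search returns matching posts plus the topics they belong to.
    topics = {t["id"]: t for t in payload.get("topics", [])}
    out = []
    for p in payload.get("posts", []):
        topic = topics.get(p["topic_id"], {})
        slug = topic.get("slug")
        # Assumed fallback: Discourse also resolves /t/{topic_id}/{post_number}.
        path = f"{slug}/{p['topic_id']}" if slug else str(p["topic_id"])
        out.append({
            "_type": "search_result",
            "_action": "searchPosts",
            "_instance": base,
            "id": p["id"],
            "topic_id": p["topic_id"],
            "post_number": p.get("post_number"),
            "title": topic.get("title"),
            "blurb": p.get("blurb"),
            "username": p.get("username"),
            "like_count": p.get("like_count"),
            "url": f"{base}/t/{path}/{p.get('post_number', 1)}",
        })
    return out
```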

Pricing

Pay-per-result: $0.001 per item. No flat monthly fee.

Cost examples:

30 newest HuggingFace topics, run daily: $0.03 per run
1,000 topics from the HF "beginners" category: $1.00
Every post in a 200-post thread (maxItems set to 200): $0.20
50 moderator profiles from one forum: $0.05
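Because billing is linear in the number of result items, the examples above are simple multiplication. A small budgeting helper in Python, using Decimal to avoid floating-point rounding; rounding to cents is a display choice in this sketch, not necessarily how the platform bills.

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.001")  # $1.00 per 1,000 result items


def estimated_cost(items: int) -> Decimal:
    """Estimated charge in USD for a number of result items, rounded to cents."""
    return (PRICE_PER_ITEM * items).quantize(Decimal("0.01"))
```

For example, a daily getLatest run of 30 items over a 30-day month comes to 900 items, or $0.90.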

Tips

Discourse forums run different versions. Most endpoints we wrap have been stable since 2018, but tagging is optional per install, so v0.1 omits tag actions: they return 404 on some forums.
Category slugs auto-resolve. Pass just `beginners` and the actor looks up the numeric id from /categories.json before fetching. If you already know the id, pass `beginners/5`.
Discourse returns topic detail in chunks of 20 posts. Past the first chunk, the actor fetches additional batches via /t/{id}/posts.json?post_ids[]=... until maxItems is reached.
Search is full-text across both posts and topics. The actor flattens the results into a single search_result type that carries topic_id, so you can fetch the full thread separately with getTopicDetail.
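The post batching described in these tips can be sketched as follows. This assumes the first topic response exposes post_stream.posts (the loaded chunk) and post_stream.stream (every post id in the topic), as Discourse's topic JSON does. The helper name and the batch size of 20 for follow-up requests are assumptions for illustration.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode


def followup_post_urls(instance: str, topic_id: int, topic_payload: Dict[str, Any],
                       max_items: int, batch_size: int = 20) -> List[str]:
    """URLs for posts not included in the first topic chunk (hypothetical helper)."""
    stream = topic_payload["post_stream"]["stream"]  # every post id, in order
    loaded = {p["id"] for p in topic_payload["post_stream"]["posts"]}
    budget = max(0, max_items - len(loaded))  # how many more posts we may fetch
    remaining = [pid for pid in stream if pid not in loaded][:budget]
    base = instance.rstrip("/")
    urls = []
    for i in range(0, len(remaining), batch_size):
        batch = remaining[i:i + batch_size]
        # Repeated post_ids[] parameters, one per requested post id.
        query = urlencode([("post_ids[]", pid) for pid in batch])
        urls.append(f"{base}/t/{topic_id}/posts.json?{query}")
    return urls
```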

FAQ, disclaimers, support

Is this legal? The actor calls each Discourse forum's official public REST API with documented endpoints. Public read access is the design intent of the open-source Discourse software (GPL-licensed). We identify with a clear User-Agent and honor 429 / Retry-After.
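An anonymous, clearly identified request to a public Discourse endpoint takes only the standard library. In this sketch the User-Agent string is a hypothetical placeholder (use your own project name and contact URL), and `build_request` / `fetch_json` are illustrative names, not the actor's code.

```python
import json
import urllib.request

# Hypothetical User-Agent; replace with your own project name and contact URL.
USER_AGENT = "discourse-scraper-sketch/0.1 (+https://example.com/contact)"


def build_request(url: str) -> urllib.request.Request:
    """Anonymous GET request that identifies the client clearly."""
    return urllib.request.Request(url, headers={
        "User-Agent": USER_AGENT,
        "Accept": "application/json",
    })


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a public Discourse JSON endpoint without cookies or auth."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.load(resp)
```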

Does it work with private forums? No. We only hit anonymous read endpoints. Forums that require login to view content are out of scope.

Will I get rate-limited? Discourse applies per-IP rate limits that are generous for read traffic, and the actor retries with exponential backoff on HTTP 429. For very heavy scraping, consider adding an API key (Discourse's Api-Key and Api-Username request headers) in your own fork.
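The retry behavior described above can be sketched like this: retry only on 429, prefer the server's Retry-After value when it is a number of seconds, and otherwise back off exponentially (1 s, 2 s, 4 s, and so on, capped at 60 s). The base delay, cap, attempt count, and the injectable `opener` parameter are assumptions for illustration, not the actor's actual settings.

```python
import json
import time
import urllib.error
import urllib.request


def backoff_delay(attempt: int, retry_after=None, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; this sketch falls back
    return min(cap, base * 2 ** attempt)


def get_json_with_retry(url, max_attempts: int = 5, opener=urllib.request.urlopen):
    """GET a JSON endpoint, retrying only on HTTP 429 with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            with opener(url, timeout=30) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            # Give up on non-429 errors or once attempts are exhausted.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```

Passing a `urllib.request.Request` as `url` lets you keep a custom User-Agent on every retry.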

Why are tags missing? Tagging is optional and not enabled on every Discourse instance. The actor returns topic.tags when present but has no dedicated getTags action because the tags endpoint 404s too often.

Bug or feature request? Open an Issue on the actor's Issues tab. I usually respond within a day.

Need a scraper for Hacker News, Stack Overflow, dev.to, arxiv, Lemmy, Mastodon, PeerTube? See my other actors at https://apify.com/perconey.

Discourse Forum Scraper

automation-lab/discourse-scraper

Extract topics, posts, and discussions from any public Discourse forum. Supports latest topics, category filtering, and keyword search. No login required.

Stas Persiianenko

Discourse Community Scraper

rl1987/discourse-community-scraper

Generalised scraper for any Discourse-based community forum (topics, posts, categories, search) via Discourse's JSON API.

R.L.

Discourse Community Scraper

crawlerbros/discourse-community-scraper

Scrape any public Discourse forum with latest topics, trending discussions, category browsing, tag filtering, full-text search, user profiles, and complete post threads. Works with meta.discourse.org, community forums, and any self-hosted Discourse.

Crawler Bros

Discourse Forum Monitor — Mentions & Feedback

bikram07/discourse-forum-monitor

Monitor any Discourse-powered forum for new topics, feature requests, and brand mentions — by latest or keyword search, across one or many forums at once. Keyless official Discourse JSON. Zero-config: latest topics from meta.discourse.org.

Bikram

Discourse.org Forum Scraper

enezli/discourse-forum-scraper

Search any public Discourse forum and get clean, de-duplicated topic JSON: title, author, replies, views and canonical topic URLs. One click, no required fields, no LLM.

Turgay NANTA

Discourse Forum Topics Scraper

parseforge/discourse-forum-topics-scraper

Gather social activity from Discourse Forum Topics with profile name, follower count, posts, replies and timestamps. Loved by community managers, brand watchers and trend researchers. Run on demand or on a recurring schedule and feed every row into your favourite analytics or workflow stack.

ParseForge

Dev.to Scraper: Articles, Comments, Users & Tags

perconey/devto-scraper

Scrape dev.to (Forem) via the official public REST API. Articles by tag/user/latest/top, comments, user profiles, tags, podcasts, videos, listings. No browser, no proxies, no auth. Pay only per result item.

Perconey

Hacker News Scraper: Stories, Comments, Users & Search

perconey/hackernews-scraper

Scrape Hacker News via the official Firebase API + Algolia search. Top/new/best/ask/show/jobs stories, full comment trees, user profiles with karma, free-text search. No browser, no proxies, no auth. Pay only per result item.

Perconey

arXiv Scraper: Papers, Authors, Categories & Search

perconey/arxiv-scraper

Scrape arxiv.org via the official Atom API. Full-text search, by author / title / category, paper detail by id, latest in any category. Returns title, abstract, authors, DOI, PDF link. No auth, no proxies. Pay only per result item.

Perconey