Discourse Scraper: Topics, Posts, Users & Search
Pricing
$1.00 / 1,000 result items
Scrape any Discourse forum via the public REST API. Latest / top topics, category topics, full topic + posts, user profiles + activity, full-text search. No browser, no proxies, no auth. Pay only per result item.
Developer: Perconey (maintained by Community)
What does Discourse Scraper do?
Discourse Scraper pulls structured data from any Discourse forum via the official public REST API: topics with view and like counts, full post threads, user profiles with trust levels and badges, category trees, and full-text search results. The actor calls the documented public JSON endpoints directly: no browser, no proxies, no cookies, no auth. One actor works with every Discourse-powered community: HuggingFace, Django, Python.org, Unity, KiCad, Ruby on Rails, Brave, meta.discourse.org, and hundreds more.
Try it instantly: pick getLatest, leave instance as https://discuss.huggingface.co, click Start. You get the 30 newest HuggingFace forum topics in under 3 seconds for $0.03.
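To make "calls the documented public JSON endpoints directly" concrete, here is a minimal Python sketch of what a getLatest call boils down to. It assumes the standard Discourse /latest.json response shape (topics under topic_list.topics) and keeps only a subset of the output fields. The User-Agent string and both function names are hypothetical; this is not the actor's source code.

```python
import json
import urllib.request

# Hypothetical User-Agent; the actor's real string is not documented here.
USER_AGENT = "discourse-scraper-sketch/0.1"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Anonymous GET of a public Discourse JSON endpoint (no cookies, no auth)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT, "Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def latest_topics_to_items(payload: dict, instance: str, max_items: int = 30) -> list:
    """Flatten a /latest.json payload into topic items (a subset of the actor's fields)."""
    base = instance.rstrip("/")
    topics = payload.get("topic_list", {}).get("topics", [])
    items = []
    for topic in topics[:max_items]:
        items.append(
            {
                "_type": "topic",
                "_action": "getLatest",
                "_instance": base,
                "id": topic["id"],
                "title": topic.get("title"),
                "slug": topic.get("slug"),
                "posts_count": topic.get("posts_count", 0),
                "views": topic.get("views", 0),
                "like_count": topic.get("like_count", 0),
                # Public topic URLs follow /t/{slug}/{id}, as in the sample output.
                "url": f"{base}/t/{topic.get('slug')}/{topic['id']}",
            }
        )
    return items
```

Usage: latest_topics_to_items(fetch_json("https://discuss.huggingface.co/latest.json"), "https://discuss.huggingface.co") performs a single anonymous GET. The parser is kept separate from the network call so it can be tested against a saved payload.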
Why use Discourse Scraper?
- DevRel teams: Monitor mentions of your project across the major open-source forums. Schedule daily searchPosts runs across Django, Python, HuggingFace, and Unity in parallel.
- Community managers: Track engagement on your own Discourse forum. getLatest and getCategoryTopics give you post counts, view counts, and like counts for every recent thread.
- Customer-support archaeology: When a bug report references "the forum thread from last month", run getTopicDetail with the topic id to get the full conversation tree in JSON.
- Recruiters: getUserProfile returns trust level, badge count, and post count, which are quick signals of technical depth in a community.
- OSS maintainers: Run getCategoryTopics on the "help" categories of several Discourse instances to see what users are struggling with this week.
How to use Discourse Scraper
- Open the Input tab.
- Pick an action from the dropdown. getLatest is the simplest starting point.
- Set instance (default https://discuss.huggingface.co). To scrape a different Discourse forum, paste its URL.
- For category / topic / user / search actions, fill in queries.
- Tune maxItems (default 30).
- Click Start.
Query format by action
| Action | Query format |
|---|---|
| getLatest | leave empty |
| getTop | leave empty (use topPeriod field if needed) |
| getCategories | leave empty |
| getCategoryTopics | category slug (e.g. beginners) or slug/id (e.g. beginners/5) |
| getTopicDetail | numeric topic id (e.g. 175977) |
| getUserProfile | username (e.g. julien-c) |
| getUserActivity | username |
| searchPosts | free-text search query |
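Each action corresponds to one public Discourse JSON endpoint. The Python sketch below shows a plausible mapping. Only /categories.json and /t/{id}/posts.json are named in this README; the other paths (/latest.json, /top.json, /c/{slug}/{id}.json, /t/{id}.json, /u/{username}.json, /user_actions.json, /search.json) are standard Discourse routes, but treating them as what the actor calls internally is an assumption. endpoint_for is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

TOP_PERIODS = {"all", "yearly", "quarterly", "monthly", "weekly", "daily"}


def endpoint_for(action: str, instance: str, query: str = "", top_period: str = "") -> str:
    """Return the public JSON URL a hypothetical wrapper would GET for one query."""
    base = instance.rstrip("/")
    if action == "getLatest":
        return f"{base}/latest.json"
    if action == "getTop":
        if not top_period:
            return f"{base}/top.json"
        if top_period not in TOP_PERIODS:
            raise ValueError(f"unsupported topPeriod: {top_period!r}")
        return f"{base}/top.json?{urlencode({'period': top_period})}"
    if action == "getCategories":
        return f"{base}/categories.json"
    if action == "getCategoryTopics":
        # Accepts "slug/id"; a bare slug needs its id looked up first.
        slug, _, category_id = query.strip("/").partition("/")
        if not category_id.isdigit():
            raise ValueError("expected slug/id, e.g. beginners/5")
        return f"{base}/c/{quote(slug)}/{category_id}.json"
    if action == "getTopicDetail":
        if not query.isdigit():
            raise ValueError("topic id must be numeric, e.g. 175977")
        return f"{base}/t/{query}.json"
    if action == "getUserProfile":
        return f"{base}/u/{quote(query)}.json"
    if action == "getUserActivity":
        return f"{base}/user_actions.json?{urlencode({'username': query, 'offset': 0})}"
    if action == "searchPosts":
        return f"{base}/search.json?{urlencode({'q': query})}"
    raise ValueError(f"unknown action: {action!r}")
```

The sketch rejects a bare category slug rather than guessing its id. The Tips section describes how the actor resolves that case via /categories.json.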
Input
| Field | Required | Description |
|---|---|---|
| action | yes | Which API call to make. Eight options. |
| instance | yes | Discourse forum URL. Default https://discuss.huggingface.co. |
| queries | sometimes | Required for category / topic / user / search actions. |
| maxItems | no | Max items per query. Default 30. |
| topPeriod | no | getTop only. all / yearly / quarterly / monthly / weekly / daily. |
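The rules in this table can be checked client-side before starting a run. This is a hypothetical Python sketch, not the actor's actual input schema. It assumes queries is a list of strings, applies the documented defaults (instance and maxItems = 30), and keeps topPeriod only for getTop.

```python
DEFAULT_INSTANCE = "https://discuss.huggingface.co"
NO_QUERY_ACTIONS = {"getLatest", "getTop", "getCategories"}
QUERY_ACTIONS = {
    "getCategoryTopics", "getTopicDetail", "getUserProfile", "getUserActivity", "searchPosts",
}
TOP_PERIODS = {"all", "yearly", "quarterly", "monthly", "weekly", "daily"}


def validate_input(run_input: dict) -> dict:
    """Check a run input against the documented rules and fill in defaults."""
    action = run_input.get("action")
    if action not in NO_QUERY_ACTIONS | QUERY_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    instance = run_input.get("instance") or DEFAULT_INSTANCE
    if not instance.startswith(("https://", "http://")):
        raise ValueError("instance must be a full forum URL")

    queries = list(run_input.get("queries") or [])
    if action in QUERY_ACTIONS and not queries:
        raise ValueError(f"{action} needs at least one query")

    max_items = run_input.get("maxItems", 30)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    cleaned = {
        "action": action,
        "instance": instance.rstrip("/"),
        "queries": queries,
        "maxItems": max_items,
    }
    top_period = run_input.get("topPeriod")
    if action == "getTop" and top_period:
        if top_period not in TOP_PERIODS:
            raise ValueError(f"unsupported topPeriod: {top_period!r}")
        cleaned["topPeriod"] = top_period
    return cleaned
```

validate_input({"action": "getLatest"}) yields the same defaults as the "Try it instantly" run. An action such as searchPosts with no queries is rejected before any request is made.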
Output
Every item carries _type (topic / post / category / user / user_action / search_result / error) plus _action and _instance.
{"_type": "topic","_action": "getLatest","_instance": "https://discuss.huggingface.co","id": 175977,"title": "Practical match for 128Gb Strix Halo with 2x3090s? (inference for coding)","slug": "practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding","category_id": 5,"posts_count": 2,"views": 41,"like_count": 0,"created_at": "2026-05-14T10:08:00Z","bumped_at": "2026-05-14T10:12:00Z","tags": [],"url": "https://discuss.huggingface.co/t/practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding/175977"}
You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab.
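After you download the JSON export, the _type field is enough to separate topics, posts, users, and error items. The Python sketch below assumes the export is a single JSON array of item objects; the helper names and any file name you pass in are hypothetical.

```python
import json
from collections import defaultdict


def load_dataset(path: str) -> list:
    """Read a JSON export, assumed to be one array of item objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_by_type(items: list) -> dict:
    """Group items by their _type (topic, post, category, user, ..., error)."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("_type", "unknown")].append(item)
    return dict(groups)


def most_viewed_topics(items: list, n: int = 5) -> list:
    """Return the n topic items with the highest view count."""
    topics = [item for item in items if item.get("_type") == "topic"]
    return sorted(topics, key=lambda t: t.get("views", 0), reverse=True)[:n]
```

Checking split_by_type(items).get("error", []) first is a cheap way to see whether any query failed before analysing the rest.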
Data fields
| Type | Key fields |
|---|---|
| topic | id, title, slug, category_id, posts_count, views, like_count, created_at, bumped_at, last_posted_at, tags, archetype, closed, archived, pinned, url |
| post | id, topic_id, post_number, username, user_trust_level, cooked (HTML), raw (markdown), reply_count, like_count, accepted_answer, created_at, url |
| category | id, name, slug, description, topic_count, post_count, color, parent_category_id, url |
| user | id, username, name, title, trust_level, post_count, topic_count, badge_count, likes_given, likes_received, created_at, last_seen_at |
| user_action | action_type, action_code, created_at, excerpt, topic_id, topic_title, post_number, category_id, url |
| search_result | id, topic_id, post_number, title, blurb, username, like_count, url |
Pricing
Pay-per-result: $0.001 per item. No flat monthly fee.
Cost examples:
- Daily 30 newest HuggingFace topics: $0.03
- 1,000 topics from the HF "beginners" category: $1.00
- A 200-post thread with full posts: $0.20
- 50 user profiles (e.g. a forum's moderators): $0.05
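The cost examples are plain multiplication at $0.001 per result item. Here is a small Python sketch that uses Decimal to avoid float rounding. It covers only the per-result price stated above; the function names are illustrative.

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.001")  # $1.00 per 1,000 result items


def run_cost(result_items: int) -> Decimal:
    """Cost in USD of one run that returns result_items items."""
    if result_items < 0:
        raise ValueError("item count cannot be negative")
    return PRICE_PER_ITEM * result_items


def schedule_cost(items_per_run: int, runs: int) -> Decimal:
    """Cost of a recurring schedule, e.g. a daily run over a 30-day month."""
    return run_cost(items_per_run) * runs
```

For example, the daily 30-topic run costs $0.03 per run, so schedule_cost(30, 30) comes to $0.90 for a 30-day month.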
Tips
- Discourse forums run different versions. Most endpoints we wrap have been stable since 2018, but tag plugins are optional, so we omit tag actions in v0.1: they return 404 on some installs.
- Category slug auto-resolves. Pass just beginners and the actor looks up the numeric id from /categories.json before fetching. You can also pass beginners/5 if you already know it.
- Topic detail returns posts in chunks of 20. Past that, the actor fetches additional batches via /t/{id}/posts.json?post_ids[]=... until maxItems is reached.
- Search is full-text. It searches both posts and topics; the actor flattens results into a single search_result type with a topic_id so you can fetch the full thread separately.
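The category and topic-detail tips can be sketched in Python as follows. This code assumes /categories.json lists categories under category_list.categories with slug and id fields, and that the topic's full list of post ids (post_stream.stream in the /t/{id}.json response) is already known. resolve_category and post_batch_urls are hypothetical names, not the actor's code.

```python
from urllib.parse import urlencode


def resolve_category(query: str, categories_payload: dict) -> tuple:
    """Turn 'beginners/5' or 'beginners' into (slug, id).

    A bare slug is looked up in a /categories.json payload; this sketch only
    scans the categories present in that payload.
    """
    slug, _, category_id = query.strip("/").partition("/")
    if category_id:
        return slug, int(category_id)
    for category in categories_payload.get("category_list", {}).get("categories", []):
        if category.get("slug") == slug:
            return slug, category["id"]
    raise LookupError(f"no category with slug {slug!r}")


def post_batch_urls(instance: str, topic_id: int, stream: list, loaded_ids: set,
                    max_items: int, batch_size: int = 20) -> list:
    """Build follow-up /t/{id}/posts.json URLs for posts not yet loaded, up to max_items."""
    base = instance.rstrip("/")
    wanted = [pid for pid in stream[:max_items] if pid not in loaded_ids]
    urls = []
    for start in range(0, len(wanted), batch_size):
        chunk = wanted[start:start + batch_size]
        # Repeat post_ids[] once per id; urlencode percent-encodes the brackets.
        query = urlencode([("post_ids[]", pid) for pid in chunk])
        urls.append(f"{base}/t/{topic_id}/posts.json?{query}")
    return urls
```

For a 45-post thread whose first 20 posts came back with the topic, post_batch_urls produces two follow-up requests: one for posts 21 to 40, then one for the remaining 5.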
FAQ, disclaimers, support
Is this legal? The actor calls each Discourse forum's official public REST API with documented endpoints. Public read access is the design intent of the open-source Discourse software (GPL-licensed). We identify with a clear User-Agent and honor 429 / Retry-After.
Does it work with private forums? No. We only hit anonymous read endpoints. Forums that require login to view content are out of scope.
Will I get rate-limited? Discourse has generous per-IP rate limits for read traffic, and the actor retries with exponential backoff on 429. For very heavy scraping, consider supplying an API key via the Api-Key and Api-Username request headers in your own fork.
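Here is a minimal Python sketch of retrying on 429 with exponential backoff while honoring Retry-After, as described in the answer above. The request callable and its (status, body, retry_after) return shape are hypothetical. The 1 s base and 60 s cap are illustrative values, not the actor's settings. Only the numeric-seconds form of Retry-After is parsed here.

```python
import time
from typing import Callable, Optional, Tuple


def backoff_delay(attempt: int, retry_after: Optional[str], base: float = 1.0,
                  cap: float = 60.0) -> float:
    """Seconds to wait after a 429 on attempt `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        # The server said how long to wait: honor it as-is.
        return float(retry_after.strip())
    # Otherwise back off exponentially: base, 2*base, 4*base, ... up to cap.
    return min(cap, base * 2 ** attempt)


def with_retries(request: Callable[[], Tuple[int, dict, Optional[str]]],
                 max_attempts: int = 5,
                 sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call request() until it returns something other than 429."""
    for attempt in range(max_attempts):
        status, body, retry_after = request()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:  # no point sleeping after the last try
            sleep(backoff_delay(attempt, retry_after))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Injecting sleep keeps the retry logic testable without real waiting. A production version might also add random jitter so parallel runs do not retry in lockstep.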
Why are tags missing? The tags plugin is optional and not enabled on every Discourse instance. The actor returns topic.tags when present but doesn't have a dedicated getTags action because the endpoint 404s too often.
Bug or feature request? Open an Issue on the actor's Issues tab. I usually respond within a day.
Need a scraper for Hacker News, Stack Overflow, dev.to, arXiv, Lemmy, Mastodon, or PeerTube? See my other actors at https://apify.com/perconey.