Discourse Scraper: Topics, Posts, Users & Search

Pricing

$1.00 / 1,000 result items

Scrape any Discourse forum via the public REST API. Latest / top topics, category topics, full topic + posts, user profiles + activity, full-text search. No browser, no proxies, no auth. Pay only per result item.

Developer

Perconey

Maintained by Community

What does Discourse Scraper do?

Discourse Scraper pulls structured data from any Discourse forum through the official public REST API: topics with view and like counts, full post threads, user profiles with trust levels and badges, category trees, and full-text search results. The actor calls the documented public JSON endpoints directly, with no browser, proxies, cookies, or authentication. One actor works with every Discourse-powered community: HuggingFace, Django, Python.org, Unity, KiCad, Ruby on Rails, Brave, meta.discourse.org, and hundreds more.

Try it instantly: pick getLatest, leave instance as https://discuss.huggingface.co, click Start. You get the 30 newest HuggingFace forum topics in under 3 seconds for $0.03.

Why use Discourse Scraper?

  • DevRel teams: Monitor mentions of your project across the major open-source forums. Schedule daily searchPosts runs across Django, Python, HuggingFace, and Unity in parallel.
  • Community managers: Track engagement on your own Discourse forum. getLatest and getCategoryTopics give you topic counts, view counts, and like counts for every recent thread.
  • Customer-support archaeology: When a bug report references "the forum thread from last month", run getTopicDetail with the topic id to get the full conversation tree in JSON.
  • Recruiters: getUserProfile returns trust level, badge count, and post count, which are quick signals of technical depth in a community.
  • OSS maintainers: Pull getCategoryTopics for "help" categories on multiple Discourse instances to see what users struggle with this week.

How to use Discourse Scraper

  1. Open the Input tab.
  2. Pick an action from the dropdown. getLatest is the simplest starting point.
  3. Set instance (default https://discuss.huggingface.co). To scrape a different Discourse forum, paste its URL.
  4. For category / topic / user / search actions, fill in queries.
  5. Tune maxItems (default 30).
  6. Click Start.

Query format by action

| Action | Query format |
| --- | --- |
| getLatest | leave empty |
| getTop | leave empty (use topPeriod field if needed) |
| getCategories | leave empty |
| getCategoryTopics | category slug (e.g. beginners) or slug/id (e.g. beginners/5) |
| getTopicDetail | numeric topic id (e.g. 175977) |
| getUserProfile | username (e.g. julien-c) |
| getUserActivity | username |
| searchPosts | free-text search query |

Input

| Field | Required | Description |
| --- | --- | --- |
| action | yes | Which API call to make. Eight options. |
| instance | yes | Discourse forum URL. Default https://discuss.huggingface.co. |
| queries | sometimes | Required for category / topic / user / search actions. |
| maxItems | no | Max items per query. Default 30. |
| topPeriod | no | getTop only. all / yearly / quarterly / monthly / weekly / daily. |
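
To make the rules in the two tables above concrete, here is a minimal sketch of a helper that assembles an input object and rejects combinations the tables rule out. The field and action names come from the tables; the helper itself (build_input), its validation rules, and the assumption that queries is a list of strings are illustrative and not part of the actor.

```python
from typing import Dict, Iterable, Optional

# Action names and field names are taken from the tables above.
# Everything else (this helper, its checks, `queries` being a list of
# strings) is an assumption made for illustration.
NO_QUERY_ACTIONS = {"getLatest", "getTop", "getCategories"}
QUERY_ACTIONS = {
    "getCategoryTopics",
    "getTopicDetail",
    "getUserProfile",
    "getUserActivity",
    "searchPosts",
}
TOP_PERIODS = {"all", "yearly", "quarterly", "monthly", "weekly", "daily"}


def build_input(
    action: str,
    instance: str = "https://discuss.huggingface.co",
    queries: Optional[Iterable[str]] = None,
    max_items: int = 30,
    top_period: Optional[str] = None,
) -> Dict[str, object]:
    """Return an input dict, raising ValueError on inconsistent fields."""
    if action not in NO_QUERY_ACTIONS | QUERY_ACTIONS:
        raise ValueError(f"unknown action: {action}")

    query_list = [str(q) for q in (queries or [])]
    if action in QUERY_ACTIONS and not query_list:
        raise ValueError(f"{action} needs at least one query")
    if action == "getTopicDetail" and not all(q.isdigit() for q in query_list):
        raise ValueError("getTopicDetail expects numeric topic ids")

    if top_period is not None:
        if action != "getTop":
            raise ValueError("topPeriod only applies to getTop")
        if top_period not in TOP_PERIODS:
            raise ValueError(f"unknown topPeriod: {top_period}")

    run_input: Dict[str, object] = {
        "action": action,
        "instance": instance.rstrip("/"),
        "maxItems": max_items,
    }
    if query_list:
        run_input["queries"] = query_list
    if top_period is not None:
        run_input["topPeriod"] = top_period
    return run_input
```

For example, build_input("getTopicDetail", queries=["175977"]) yields an object you could paste into the Input tab's JSON view, assuming the actor accepts queries as a list.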

Output

Every item carries _type (topic / post / category / user / user_action / search_result / error) plus _action and _instance.

{
  "_type": "topic",
  "_action": "getLatest",
  "_instance": "https://discuss.huggingface.co",
  "id": 175977,
  "title": "Practical match for 128Gb Strix Halo with 2x3090s? (inference for coding)",
  "slug": "practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding",
  "category_id": 5,
  "posts_count": 2,
  "views": 41,
  "like_count": 0,
  "created_at": "2026-05-14T10:08:00Z",
  "bumped_at": "2026-05-14T10:12:00Z",
  "tags": [],
  "url": "https://discuss.huggingface.co/t/practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding/175977"
}

You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab.
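
Because every item carries a _type field, a downloaded JSON export can be split into per-type lists in a few lines. The sketch below assumes the export is a single JSON array of items; the function names and the file path you pass in are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def load_export(path: str) -> List[dict]:
    """Read a JSON export, assumed to be one array of result items."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def group_by_type(items: List[dict]) -> Dict[str, List[dict]]:
    """Bucket items by their `_type` field (topic, post, user, ...)."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        # Items without `_type` are not expected; keep them visible anyway.
        groups[item.get("_type", "unknown")].append(item)
    return dict(groups)
```

A typical use would be groups = group_by_type(load_export("dataset.json")), then inspecting groups.get("error", []) before processing the rest.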

Data fields

| Type | Key fields |
| --- | --- |
| topic | id, title, slug, category_id, posts_count, views, like_count, created_at, bumped_at, last_posted_at, tags, archetype, closed, archived, pinned, url |
| post | id, topic_id, post_number, username, user_trust_level, cooked (HTML), raw (markdown), reply_count, like_count, accepted_answer, created_at, url |
| category | id, name, slug, description, topic_count, post_count, color, parent_category_id, url |
| user | id, username, name, title, trust_level, post_count, topic_count, badge_count, likes_given, likes_received, created_at, last_seen_at |
| user_action | action_type, action_code, created_at, excerpt, topic_id, topic_title, post_number, category_id, url |
| search_result | id, topic_id, post_number, title, blurb, username, like_count, url |
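
As one illustration of what the topic fields allow, the sketch below ranks topics by likes per view. The metric and the function names are assumptions chosen for the example, not something the actor computes; only the field names (_type, views, like_count) come from the table above.

```python
from typing import Dict, List


def like_rate(topic: Dict) -> float:
    """Likes per view; 0.0 when a topic has no recorded views."""
    views = topic.get("views") or 0
    likes = topic.get("like_count") or 0
    return likes / views if views > 0 else 0.0


def rank_topics(items: List[Dict], limit: int = 10) -> List[Dict]:
    """Return up to `limit` topic items, highest like rate first."""
    topics = [item for item in items if item.get("_type") == "topic"]
    return sorted(topics, key=like_rate, reverse=True)[:limit]
```

The sample topic shown in the Output section (41 views, 0 likes) would score 0.0 under this metric.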

Pricing

Pay-per-result: $0.001 per item. No flat monthly fee.

Cost examples:

  • Daily 30 newest HuggingFace topics: $0.03
  • 1,000 topics from the HF "beginners" category: $1.00
  • A 200-post thread with full posts: $0.20
  • 50 user profiles (for example, a forum's moderators): $0.05
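
The examples above all follow from items × $0.001. A small sketch of that arithmetic, using Decimal to avoid floating-point rounding; the function name is hypothetical.

```python
from decimal import Decimal

# Price stated in this section: $0.001 per result item.
PRICE_PER_ITEM = Decimal("0.001")


def estimate_cost(item_count: int) -> Decimal:
    """Estimated run cost in USD for a given number of result items."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    return PRICE_PER_ITEM * item_count
```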

Tips

  • Discourse forums run different versions. Most endpoints we wrap have been stable since 2018, but tag plugins are optional, so v0.1 omits tag actions because they return 404 on some installs.
  • Category slug auto-resolves. Pass just beginners and the actor looks up the numeric id from /categories.json before fetching. You can also pass beginners/5 if you already know it.
  • Topic detail arrives in chunks of 20 posts. The first 20 posts come back with the topic itself; past that, the actor fetches additional batches via /t/{id}/posts.json?post_ids[]=... until maxItems is reached.
  • Search is full-text. It searches both posts and topics; the actor flattens results into a single search_result type with a topic_id so you can fetch the full thread separately.
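
The category and topic-detail tips can be sketched as two small helpers: one splits a beginners or beginners/5 query into slug and optional id, the other builds batched /t/{id}/posts.json URLs for post ids that are still missing. Both function names are hypothetical, and the actor's actual implementation may differ; the caller is assumed to have already trimmed the id list to maxItems.

```python
from typing import Iterator, List, Optional, Tuple
from urllib.parse import urlencode


def parse_category_query(query: str) -> Tuple[str, Optional[int]]:
    """Split 'beginners' or 'beginners/5' into (slug, id or None)."""
    slug, _, raw_id = query.strip("/").partition("/")
    return slug, int(raw_id) if raw_id else None


def post_batch_urls(
    instance: str,
    topic_id: int,
    missing_post_ids: List[int],
    batch_size: int = 20,
) -> Iterator[str]:
    """Yield one posts.json URL per batch of post ids.

    `missing_post_ids` holds the ids not yet fetched. urlencode
    percent-encodes the brackets in `post_ids[]`, which is equivalent
    on the wire.
    """
    base = instance.rstrip("/")
    for start in range(0, len(missing_post_ids), batch_size):
        batch = missing_post_ids[start:start + batch_size]
        query = urlencode([("post_ids[]", post_id) for post_id in batch])
        yield f"{base}/t/{topic_id}/posts.json?{query}"
```

When the id is None, the slug still has to be resolved against /categories.json, as the tip describes; that lookup is left out here.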

FAQ, disclaimers, support

Is this legal? The actor calls each Discourse forum's official public REST API with documented endpoints. Public read access is the design intent of the open-source Discourse software (GPL-licensed). We identify with a clear User-Agent and honor 429 / Retry-After.

Does it work with private forums? No. We only hit anonymous read endpoints. Forums that require login to view content are out of scope.

Will I get rate-limited? Discourse has generous per-IP rate limits for read traffic, and the actor retries with exponential backoff on 429. For very heavy scraping, consider supplying an API key via request headers in your own fork.
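
For readers writing their own client, the retry behavior described above might look roughly like the following. This is a sketch using only the Python standard library, not the actor's code: the function names, the User-Agent string, and the base and cap values are all assumptions.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A numeric Retry-After header wins; otherwise (including the
    HTTP-date form, which this sketch does not parse) fall back to
    capped exponential backoff.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass
    return min(cap, base * (2 ** attempt))


def fetch_json(url: str, max_attempts: int = 5,
               user_agent: str = "example-discourse-client/0.1"):
    """GET a JSON endpoint, retrying only on HTTP 429."""
    for attempt in range(max_attempts):
        request = urllib.request.Request(
            url, headers={"User-Agent": user_agent, "Accept": "application/json"}
        )
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise RuntimeError("max_attempts must be at least 1")
```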

Why are tags missing? The tags plugin is optional and not enabled on every Discourse instance. The actor returns topic.tags when present but doesn't have a dedicated getTags action because the endpoint 404s too often.

Bug or feature request? Open an Issue on the actor's Issues tab. I usually respond within a day.

Need a scraper for Hacker News, Stack Overflow, dev.to, arXiv, Lemmy, Mastodon, or PeerTube? See my other actors at https://apify.com/perconey.