# Discourse Scraper: Topics, Posts, Users & Search (`perconey/discourse-scraper`) Actor

Scrape any Discourse forum via the public REST API. Latest / top topics, category topics, full topic + posts, user profiles + activity, full-text search. No browser, no proxies, no auth. Pay only per result item.

- **URL**: https://apify.com/perconey/discourse-scraper.md
- **Developed by:** [Perconey](https://apify.com/perconey) (community)
- **Categories:** Developer tools, Social media, News
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

$1.00 / 1,000 result items

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [full-text Markdown](https://docs.apify.com/llms-full.txt).


# README

### What does Discourse Scraper do?

**Discourse Scraper** pulls structured data from any [Discourse](https://www.discourse.org/) forum via the **official public REST API**: topics with view counts and like counts, full post threads, user profiles with trust levels and badges, category trees, and full-text search. The actor calls the documented public JSON endpoints directly: no browser, no proxies, no cookies, no auth. One actor works with **every public Discourse-powered community**: HuggingFace, Django, Python.org, Unity, KiCad, Ruby on Rails, Brave, meta.discourse.org, and hundreds more.

Try it instantly: pick **getLatest**, leave instance as `https://discuss.huggingface.co`, click Start. You get the 30 newest HuggingFace forum topics in under 3 seconds for $0.03.

### Why use Discourse Scraper?

- **DevRel teams**: Monitor mentions of your project across the major open-source forums. Schedule daily `searchPosts` runs across the Django, Python, HuggingFace, and Unity forums in parallel.
- **Community managers**: Track engagement on your own Discourse forum. `getLatest` + `getCategoryTopics` give you topic counts, view counts, like counts for every recent thread.
- **Customer-support archaeology**: When a bug report references "the forum thread from last month", pull `getTopicDetail` with the topic id and you get the full conversation tree in JSON.
- **Recruiters**: `getUserProfile` returns trust level, badge count, and post count: quick signals of technical depth in a community.
- **OSS maintainers**: Pull `getCategoryTopics` for "help" categories on multiple Discourse instances to see what users struggle with this week.

### How to use Discourse Scraper

1. Open the **Input** tab.
2. Pick an **action** from the dropdown. `getLatest` is the simplest starting point.
3. Set **instance** (default `https://discuss.huggingface.co`). To scrape a different Discourse forum, paste its URL.
4. For category / topic / user / search actions, fill in **queries** (formats in the table below).
5. Tune **maxItems** (default 30).
6. Click **Start**.

#### Query format by action

Action | Query format
--- | ---
getLatest | leave empty
getTop | leave empty (use topPeriod field if needed)
getCategories | leave empty
getCategoryTopics | category slug (e.g. `beginners`) or `slug/id` (e.g. `beginners/5`)
getTopicDetail | numeric topic id (e.g. `175977`)
getUserProfile | username (e.g. `julien-c`)
getUserActivity | username
searchPosts | free-text search query
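The query formats above are simple enough to check locally before you spend a run on a typo. A minimal Python sketch, assuming only the formats listed in the table; `parse_category_query` and `check_query` are hypothetical helpers, not part of the Actor or of `apify-client`, and the Actor's own validation (not published here) may be more lenient.

```python
import re
from typing import Optional, Tuple

# Actions that, per the table, take no queries at all.
NO_QUERY_ACTIONS = {"getLatest", "getTop", "getCategories"}
# Actions whose query is any non-empty string (a username or free text).
TEXT_QUERY_ACTIONS = {"getUserProfile", "getUserActivity", "searchPosts"}


def parse_category_query(query: str) -> Tuple[str, Optional[int]]:
    """Split 'beginners' into ('beginners', None) and 'beginners/5' into ('beginners', 5)."""
    slug, sep, tail = query.strip().partition("/")
    if not slug:
        raise ValueError(f"empty category slug in {query!r}")
    if not sep:
        return slug, None
    if not re.fullmatch(r"[0-9]+", tail):
        raise ValueError(f"expected 'slug/id' with a numeric id, got {query!r}")
    return slug, int(tail)


def check_query(action: str, query: str) -> None:
    """Raise ValueError if `query` does not fit the format the table lists for `action`."""
    if action in NO_QUERY_ACTIONS:
        raise ValueError(f"{action} takes no queries; leave `queries` empty")
    if action == "getCategoryTopics":
        parse_category_query(query)
    elif action == "getTopicDetail":
        if not re.fullmatch(r"[0-9]+", query.strip()):
            raise ValueError(f"getTopicDetail needs a numeric topic id, got {query!r}")
    elif action in TEXT_QUERY_ACTIONS:
        if not query.strip():
            raise ValueError(f"{action} needs a non-empty query")
    else:
        raise ValueError(f"unknown action {action!r}")


print(parse_category_query("beginners/5"))  # ('beginners', 5)
check_query("getTopicDetail", "175977")     # passes silently
```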

### Input

Field | Required | Description
--- | --- | ---
`action` | yes | Which API call to make. Eight options.
`instance` | yes | Discourse forum URL. Default https://discuss.huggingface.co.
`queries` | sometimes | Required for category / topic / user / search actions.
`maxItems` | no | Max items per query (0 to 10,000). Default 30.
`topPeriod` | no | getTop only. all / yearly / quarterly / monthly / weekly / daily.
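The fields above, together with the bounds in the input schema further down (`maxItems` from 0 to 10,000, the `action` and `topPeriod` enums), can be enforced when assembling input in code. A sketch with a hypothetical `build_input` helper; rejecting `topPeriod` for actions other than `getTop` is a conservative choice of this sketch, not documented Actor behavior. The resulting dict can be passed as `run_input` to `client.actor("perconey/discourse-scraper").call(...)`, as in the Python example below.

```python
from typing import Any, Dict, List, Optional

# Enums copied from the input schema in this README.
ACTIONS = {
    "getLatest", "getTop", "getCategories", "getCategoryTopics",
    "getTopicDetail", "getUserProfile", "getUserActivity", "searchPosts",
}
TOP_PERIODS = {"all", "yearly", "quarterly", "monthly", "weekly", "daily"}
# Every action except the three feed actions needs at least one query.
QUERY_ACTIONS = ACTIONS - {"getLatest", "getTop", "getCategories"}


def build_input(
    action: str,
    instance: str = "https://discuss.huggingface.co",
    queries: Optional[List[str]] = None,
    max_items: int = 30,
    top_period: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an Actor input object, rejecting combinations the Input table rules out."""
    queries = list(queries or [])
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    if not instance.startswith(("http://", "https://")):
        raise ValueError("instance must be a full URL, e.g. https://discuss.python.org")
    if action in QUERY_ACTIONS and not queries:
        raise ValueError(f"{action} requires at least one entry in queries")
    if not 0 <= max_items <= 10000:
        raise ValueError("maxItems must be between 0 and 10000")
    run_input: Dict[str, Any] = {
        "action": action,
        "instance": instance.rstrip("/"),  # drop a trailing slash from the base URL
        "queries": queries,
        "maxItems": max_items,
    }
    if top_period is not None:
        if action != "getTop":
            raise ValueError("topPeriod only applies to getTop")
        if top_period not in TOP_PERIODS:
            raise ValueError(f"topPeriod must be one of {sorted(TOP_PERIODS)}")
        run_input["topPeriod"] = top_period
    return run_input
```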

### Output

Every item carries `_type` (`topic` / `post` / `category` / `user` / `user_action` / `search_result` / `error`) plus `_action` and `_instance`.

```json
{
    "_type": "topic",
    "_action": "getLatest",
    "_instance": "https://discuss.huggingface.co",
    "id": 175977,
    "title": "Practical match for 128Gb Strix Halo with 2x3090s? (inference for coding)",
    "slug": "practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding",
    "category_id": 5,
    "posts_count": 2,
    "views": 41,
    "like_count": 0,
    "created_at": "2026-05-14T10:08:00Z",
    "bumped_at": "2026-05-14T10:12:00Z",
    "tags": [],
    "url": "https://discuss.huggingface.co/t/practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding/175977"
}
````

You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab.

### Data fields

Type | Key fields
--- | ---
`topic` | id, title, slug, category_id, posts_count, views, like_count, created_at, bumped_at, last_posted_at, tags, archetype, closed, archived, pinned, url
`post` | id, topic_id, post_number, username, user_trust_level, cooked (HTML), raw (markdown), reply_count, like_count, accepted_answer, created_at, url
`category` | id, name, slug, description, topic_count, post_count, color, parent_category_id, url
`user` | id, username, name, title, trust_level, post_count, topic_count, badge_count, likes_given, likes_received, created_at, last_seen_at
`user_action` | action_type, action_code, created_at, excerpt, topic_id, topic_title, post_number, category_id, url
`search_result` | id, topic_id, post_number, title, blurb, username, like_count, url
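Because every item carries `_type`, a mixed dataset splits cleanly in post-processing, and `search_result` items can be turned into `getTopicDetail` queries for a follow-up run. A sketch with hypothetical helper names; the items would come from `client.dataset(run["defaultDatasetId"]).iterate_items()` as in the Python example below. Fields of `error` items are not documented here, so the sketch only relies on `_type` and `topic_id`.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_type(items: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket dataset items by their `_type` field (topic, post, error, ...)."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for item in items:
        groups[item.get("_type", "unknown")].append(item)
    return dict(groups)


def topic_ids_from_search(items: Iterable[Dict[str, Any]]) -> List[str]:
    """Unique topic ids from `search_result` items, in first-seen order.

    Returned as strings because the Actor's `queries` field is an array of strings.
    """
    seen: Dict[str, None] = {}  # dict keys double as an insertion-ordered set
    for item in items:
        if item.get("_type") == "search_result" and item.get("topic_id") is not None:
            seen.setdefault(str(item["topic_id"]), None)
    return list(seen)


# Follow-up run input (same shape as the input example in this README):
# {"action": "getTopicDetail", "instance": ..., "queries": topic_ids_from_search(items)}
```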

### Pricing

**Pay-per-result: $0.001 per item.** No flat monthly fee.

Cost examples:

- Daily 30 newest HuggingFace topics: **$0.03**
- 1,000 topics from the HF "beginners" category: **$1.00**
- A 200-post thread with full posts: **$0.20**
- 50 user profiles across moderators of a forum: **$0.05**
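Since billing is linear in the number of result items, the examples above are a single multiplication. A sketch using `Decimal` to avoid float rounding. Assumptions: every dataset item is billed at the flat $1.00 / 1,000 rate (whether `error` items are charged is not stated here), `maxItems` is a per-query cap as the Input table says, and actions without queries behave like a single query.

```python
from decimal import Decimal

# Flat rate from the Pricing section: $1.00 per 1,000 result items.
PRICE_PER_ITEM_USD = Decimal("1.00") / 1000


def estimate_cost_usd(items: int) -> Decimal:
    """Cost of `items` billed result items."""
    if items < 0:
        raise ValueError("items must be non-negative")
    return PRICE_PER_ITEM_USD * items


def max_cost_usd(max_items: int, n_queries: int) -> Decimal:
    """Upper bound for one run, treating maxItems as a per-query cap.

    Assumption: getLatest / getTop / getCategories (no queries) count as one query.
    """
    return estimate_cost_usd(max_items * max(1, n_queries))


print(estimate_cost_usd(200))  # a 200-post thread
print(max_cost_usd(30, 0))     # getLatest with the default maxItems
```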

### Tips

- **Discourse forums run different versions.** Most endpoints we wrap have been stable since 2018, but the tags plugin is optional, so v0.1 omits tag actions: their endpoints return 404 on some installs.
- **Category slug auto-resolves.** Pass just `beginners` and the actor looks up the numeric id from `/categories.json` before fetching. You can also pass `beginners/5` if you already know it.
- **Topic detail returns chunks of 20 posts.** Past that, the actor fetches additional batches via `/t/{id}/posts.json?post_ids[]=...` until maxItems is reached.
- **Search is full-text.** It searches both posts and topics; the actor flattens results into a single `search_result` type with a `topic_id` so you can fetch the full thread separately.
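The batching in the third tip can be sketched as URL construction. Assumption: the ordered list of every post id comes from the `post_stream.stream` array in `/t/{id}.json`, which is how Discourse exposes a topic's post order; the Actor's actual implementation is not published. The function is hypothetical and only builds URLs; it does not fetch them.

```python
from typing import List, Sequence
from urllib.parse import urlencode


def extra_post_batch_urls(
    instance: str,
    topic_id: int,
    stream: Sequence[int],
    already_loaded: int = 20,
    max_items: int = 30,
    batch_size: int = 20,
) -> List[str]:
    """Build /t/{id}/posts.json URLs for the post ids beyond the first chunk.

    `stream` is the ordered list of every post id in the topic; the first
    `already_loaded` ids are assumed to have arrived with /t/{id}.json.
    The total is capped at `max_items`, mirroring the Actor's maxItems.
    """
    remaining = list(stream[already_loaded:max_items])
    base = f"{instance.rstrip('/')}/t/{topic_id}/posts.json"
    urls: List[str] = []
    for start in range(0, len(remaining), batch_size):
        batch = remaining[start:start + batch_size]
        # One repeated post_ids[] pair per id; urlencode writes the brackets as %5B%5D.
        query = urlencode([("post_ids[]", post_id) for post_id in batch])
        urls.append(f"{base}?{query}")
    return urls
```

With `max_items=50` and a 60-post topic, this yields two requests: posts 21 to 40 and posts 41 to 50.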

### FAQ, disclaimers, support

**Is this legal?** The actor calls each Discourse forum's official public REST API with documented endpoints. Public read access is the design intent of the open-source Discourse software (GPL-licensed). We send a descriptive User-Agent and honor 429 responses and the Retry-After header.

**Does it work with private forums?** No. We only hit anonymous read endpoints. Forums that require login to view content are out of scope.

**Will I get rate-limited?** Discourse has generous per-IP rate limits for read traffic, and the actor retries with exponential backoff on 429. For very heavy scraping, consider supplying an API key (Discourse's `Api-Key` and `Api-Username` request headers) in your own fork.
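If you fork the actor or call a Discourse API directly, the retry policy described above looks roughly like this. A sketch, not the actor's code: it honors a `Retry-After` header in either form HTTP allows (delta-seconds or an HTTP-date) and otherwise falls back to capped exponential backoff with full jitter. `retry_delay_seconds` is a hypothetical name, and the `base` and `cap` values are illustrative assumptions.

```python
import random
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: bool = True,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after an HTTP 429."""
    if retry_after:
        value = retry_after.strip()
        if re.fullmatch(r"[0-9]+", value):
            # Delta-seconds form, e.g. "Retry-After: 30".
            return float(value)
        try:
            # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
    # No usable header: exponential backoff, base * 2**attempt, capped.
    delay = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, delay] so parallel clients do not retry in lockstep.
    return random.uniform(0, delay) if jitter else delay
```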

**Why are tags missing?** The tags plugin is optional and not enabled on every Discourse instance. The actor returns topic.tags when present but doesn't have a dedicated getTags action because the endpoint 404s too often.

**Bug or feature request?** Open an Issue on the actor's Issues tab. I usually respond within a day.

**Need a scraper for Hacker News, Stack Overflow, dev.to, arXiv, Lemmy, Mastodon, PeerTube?** See my other actors at https://apify.com/perconey.

# Actor input Schema

## `action` (type: `string`):

Pick the action. getLatest / getTop / getCategories work without queries. getCategoryTopics needs a category slug. getTopicDetail needs a numeric topic id. getUserProfile / getUserActivity need a username. searchPosts needs a query.

## `instance` (type: `string`):

Base URL of the Discourse forum. Examples: https://discuss.huggingface.co, https://forum.djangoproject.com, https://discuss.python.org, https://meta.discourse.org.

## `queries` (type: `array`):

One entry per query. getCategoryTopics: category slug (e.g. 'beginners') or 'slug/id' if known. getTopicDetail: numeric topic id (e.g. 175977). getUserProfile / getUserActivity: username (e.g. julien-c). searchPosts: free-text.

## `maxItems` (type: `integer`):

Stop after this many items. Latest / top / category feeds paginate 30 per page. Topic detail returns posts in chunks of 20.

## `topPeriod` (type: `string`):

Discourse 'top' period selector. If unset, the forum's configured default applies (usually weekly or monthly).

## Actor input object example

```json
{
  "action": "getLatest",
  "instance": "https://discuss.huggingface.co",
  "queries": [],
  "maxItems": 30
}
```

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "action": "getLatest",
    "instance": "https://discuss.huggingface.co",
    "queries": [],
    "maxItems": 30
};

// Run the Actor and wait for it to finish
const run = await client.actor("perconey/discourse-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "action": "getLatest",
    "instance": "https://discuss.huggingface.co",
    "queries": [],
    "maxItems": 30,
}

# Run the Actor and wait for it to finish
run = client.actor("perconey/discourse-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "action": "getLatest",
  "instance": "https://discuss.huggingface.co",
  "queries": [],
  "maxItems": 30
}' |
apify call perconey/discourse-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=perconey/discourse-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Discourse Scraper: Topics, Posts, Users & Search",
        "description": "Scrape any Discourse forum via the public REST API. Latest / top topics, category topics, full topic + posts, user profiles + activity, full-text search. No browser, no proxies, no auth. Pay only per result item.",
        "version": "0.1",
        "x-build-id": "I0BlXuAc0mm4qqj1s"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/perconey~discourse-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-perconey-discourse-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/perconey~discourse-scraper/runs": {
            "post": {
                "operationId": "runs-sync-perconey-discourse-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/perconey~discourse-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-perconey-discourse-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "action"
                ],
                "properties": {
                    "action": {
                        "title": "What do you want to scrape?",
                        "enum": [
                            "getLatest",
                            "getTop",
                            "getCategories",
                            "getCategoryTopics",
                            "getTopicDetail",
                            "getUserProfile",
                            "getUserActivity",
                            "searchPosts"
                        ],
                        "type": "string",
                        "description": "Pick the action. getLatest / getTop / getCategories work without queries. getCategoryTopics needs a category slug. getTopicDetail needs a numeric topic id. getUserProfile / getUserActivity need a username. searchPosts needs a query.",
                        "default": "getLatest"
                    },
                    "instance": {
                        "title": "Discourse instance URL",
                        "type": "string",
                        "description": "Base URL of the Discourse forum. Examples: https://discuss.huggingface.co, https://forum.djangoproject.com, https://discuss.python.org, https://meta.discourse.org.",
                        "default": "https://discuss.huggingface.co"
                    },
                    "queries": {
                        "title": "Queries",
                        "type": "array",
                        "description": "One entry per query. getCategoryTopics: category slug (e.g. 'beginners') or 'slug/id' if known. getTopicDetail: numeric topic id (e.g. 175977). getUserProfile / getUserActivity: username (e.g. julien-c). searchPosts: free-text.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Max items per query",
                        "minimum": 0,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Stop after this many items. Latest / top / category feeds paginate 30 per page. Topic detail returns posts in chunks of 20.",
                        "default": 30
                    },
                    "topPeriod": {
                        "title": "Period for getTop",
                        "enum": [
                            "all",
                            "yearly",
                            "quarterly",
                            "monthly",
                            "weekly",
                            "daily"
                        ],
                        "type": "string",
                        "description": "Discourse 'top' period selector. Default is the forum's configured default (usually weekly or monthly)."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
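For stacks without an Apify client, the `run-sync-get-dataset-items` path from the specification above can be called with only the Python standard library. A minimal sketch following the spec (token as the `token` query parameter, the input object as the JSON POST body); `build_run_sync_request` and `run_and_fetch` are hypothetical names, and the timeout is an arbitrary client-side choice.

```python
import json
import urllib.request
from typing import Any, Dict, List
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "perconey~discourse-scraper"  # "/" in the Actor name becomes "~" in API paths


def build_run_sync_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """POST request that runs the Actor and returns its dataset items in the response."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{urlencode({'token': token})}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token: str, run_input: Dict[str, Any], timeout: float = 360) -> List[Dict[str, Any]]:
    """Send the request and decode the JSON array of dataset items."""
    req = build_run_sync_request(token, run_input)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage (makes a real, billed run):
# items = run_and_fetch("<YOUR_API_TOKEN>", {"action": "getLatest",
#     "instance": "https://discuss.huggingface.co", "queries": [], "maxItems": 30})
```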
