Matrix Message Scraper
Pricing
Pay per event
Scrape public rooms and messages from any Matrix homeserver (matrix.org, Element, or self-hosted). Discover public rooms by keyword or scrape message history from specific rooms using a Matrix access token.
Developer: BowTiedRaccoon (maintained by Community)
Scrape public rooms and message history from any Matrix homeserver — matrix.org, Element, or self-hosted. Returns room metadata and message events in clean JSON.
Matrix is a federated open protocol with 80 million accounts across 100,000+ homeservers. It is the protocol behind Element, the chat platform used by Mozilla, KDE, the French government, and a long list of open-source projects. Most public rooms are unencrypted and fully readable, and this scraper collects that data.
Matrix Message Scraper Features
- Discovers public rooms across any Matrix homeserver without an account — name, topic, member count, join rules, aliases, and avatar URL
- Filters by keyword — narrow public room results to a specific topic before collecting
- Scrapes message history from one or more rooms using your Matrix access token — works on any public or joined room
- Extracts full event metadata — sender ID, display name, timestamp, message type, reply threading, edits, and media URLs
- Handles cursor pagination — fetches complete message history across thousands of events without manual cursor management
- Federation-aware — query a different homeserver's public room list by pointing the homeserver URL at any Matrix server
- No proxies required. Matrix REST API is accessible from standard IPs.
- Two distinct modes: room discovery (no auth) and message scraping (access token required)
What Can You Do With Matrix Data?
- Open-source community researchers — map the Matrix ecosystem, identify active projects, track migration from Slack/Discord
- Academic linguists — large-scale multilingual conversation corpora without Twitter's data access restrictions
- Compliance teams — archive message history for organizations that adopted Element as their primary chat platform
- Privacy advocates — analyze the federated network structure and federation patterns across homeservers
- Content moderation tooling — build training datasets from public room conversations
- Developer tools — monitor Matrix rooms for mentions, keywords, or bot triggers
How It Works
- Pick a mode. Discover Public Rooms lists rooms on the homeserver by keyword or returns all rooms. Scrape Room Messages fetches message history for specific room IDs.
- Configure the homeserver. Defaults to matrix.org. Point it at any Matrix homeserver URL for federated queries.
- Provide a token for messages. The Discover mode needs no credentials. For messages, get your access token from Element: Settings → Help & About → Access Token.
- Run. The scraper handles cursor pagination — the Matrix API returns results in batches with next-page tokens, and this actor follows them until it hits your item limit.
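The pagination in the last step can be sketched as follows. This is a minimal illustration, not the actor's actual code. It assumes the standard Matrix Client-Server endpoint `GET /_matrix/client/v3/rooms/{roomId}/messages`, whose responses carry events in `chunk` and a next-page token in `end`. The names `messages_url`, `paginate`, and `fetch_page` are hypothetical; `fetch_page` stands in for whatever HTTP call you make with an `Authorization: Bearer <token>` header.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import quote, urlencode


def messages_url(homeserver: str, room_id: str, direction: str = "b",
                 limit: int = 100, from_token: Optional[str] = None) -> str:
    """Build the /messages URL for one page (hypothetical helper)."""
    query = {"dir": direction, "limit": limit}
    if from_token:
        query["from"] = from_token
    # Room IDs contain '!' and ':' and must be percent-encoded in the path.
    room = quote(room_id, safe="")
    base = homeserver.rstrip("/")
    return f"{base}/_matrix/client/v3/rooms/{room}/messages?{urlencode(query)}"


def paginate(fetch_page: Callable[[Optional[str]], dict],
             max_items: int) -> Iterator[dict]:
    """Follow next-page tokens until max_items events have been yielded."""
    token: Optional[str] = None
    yielded = 0
    while yielded < max_items:
        page = fetch_page(token)
        chunk = page.get("chunk", [])
        if not chunk:
            return  # conservative stop: an empty page ends the walk
        for event in chunk:
            if yielded >= max_items:
                return
            yield event
            yielded += 1
        token = page.get("end")
        if not token:
            return  # the server omits "end" when no further events exist
```

Passing the fetcher in as a function keeps the loop independent of any HTTP library, so the same loop can be exercised against canned pages before pointing it at a real homeserver.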
Matrix Message Scraper Input
Discover Public Rooms:

    {
        "action": "Discover Public Rooms",
        "homeserver": "https://matrix.org",
        "searchTerm": "linux",
        "maxItems": 100
    }

Scrape Room Messages:

    {
        "action": "Scrape Room Messages",
        "homeserver": "https://matrix.org",
        "accessToken": "syt_...",
        "roomIds": ["!abcdef:matrix.org", "!ghijkl:kde.org"],
        "maxMessagesPerRoom": 500,
        "maxItems": 1000,
        "messageDirection": "b"
    }

| Field | Type | Default | Description |
|---|---|---|---|
| action | string | Discover Public Rooms | What to scrape. Options: Discover Public Rooms, Scrape Room Messages |
| homeserver | string | https://matrix.org | Matrix homeserver base URL |
| accessToken | string | — | Matrix access token. Required for Scrape Room Messages. |
| roomIds | array | — | List of Matrix room IDs (e.g. !abcdef:matrix.org). Used with Scrape Room Messages. |
| searchTerm | string | — | Keyword filter for Discover Public Rooms. Leave blank to return all rooms. |
| maxItems | integer | 10 | Maximum number of records to return across all rooms |
| maxMessagesPerRoom | integer | 100 | Maximum messages per room. Used with Scrape Room Messages. |
| messageDirection | string | b | Pagination direction, as in the Matrix API's dir parameter. b = backward (newest first, paging toward older events), f = forward (oldest first, paging toward newer events) |
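
The field rules in the table can be checked before starting a run. The sketch below is a hypothetical helper (`build_input` is not part of the actor) that applies the documented defaults and rejects a message-scraping input with no access token. Treating `roomIds` as mandatory in that mode is an assumption; the table only says the field is used with Scrape Room Messages.

```python
from typing import Any, Dict, List, Optional

ACTIONS = ("Discover Public Rooms", "Scrape Room Messages")


def build_input(action: str = "Discover Public Rooms",
                homeserver: str = "https://matrix.org",
                access_token: Optional[str] = None,
                room_ids: Optional[List[str]] = None,
                search_term: Optional[str] = None,
                max_items: int = 10,
                max_messages_per_room: int = 100,
                message_direction: str = "b") -> Dict[str, Any]:
    """Assemble an input object using the defaults from the table above."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}")
    if message_direction not in ("b", "f"):
        raise ValueError("messageDirection must be 'b' or 'f'")

    payload: Dict[str, Any] = {
        "action": action,
        "homeserver": homeserver,
        "maxItems": max_items,
    }
    if action == "Scrape Room Messages":
        if not access_token:
            raise ValueError("accessToken is required for Scrape Room Messages")
        if not room_ids:
            # Assumption: at least one room ID is needed in this mode.
            raise ValueError("roomIds must list at least one room")
        payload.update({
            "accessToken": access_token,
            "roomIds": list(room_ids),
            "maxMessagesPerRoom": max_messages_per_room,
            "messageDirection": message_direction,
        })
    elif search_term:
        payload["searchTerm"] = search_term
    return payload
```

Calling `build_input(search_term="linux", max_items=100)` reproduces the first input example in this section.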
Matrix Message Scraper Output
Mode 1: Discover Public Rooms

    {
        "record_type": "room",
        "room_id": "!L58ME6ufiP49v97UIOBIpvWKEgj4912JmECPuDzlvCI",
        "room_name": "Matrix HQ",
        "room_topic": "The Official Matrix HQ — chat about Matrix here! | https://matrix.org",
        "room_canonical_alias": "#matrix:matrix.org",
        "room_member_count": 5451,
        "room_is_encrypted": false,
        "room_join_rule": "public",
        "room_world_readable": true,
        "room_guest_can_join": true,
        "room_avatar_url": "mxc://matrix.org/DRevoaEiuzbkOznknySKuMmE",
        "room_type": null,
        "homeserver": "https://matrix.org"
    }

| Field | Type | Description |
|---|---|---|
| record_type | string | room for this mode |
| room_id | string | Unique Matrix room ID (e.g. !abcdef:matrix.org) |
| room_name | string | Display name of the room |
| room_topic | string | Room topic or description |
| room_canonical_alias | string | Canonical room alias (e.g. #matrix:matrix.org) |
| room_member_count | integer | Number of joined members |
| room_is_encrypted | boolean | Whether end-to-end encryption is enabled |
| room_join_rule | string | Join rule: public, invite, knock, or restricted |
| room_world_readable | boolean | Whether history is visible without joining |
| room_guest_can_join | boolean | Whether guests can join |
| room_avatar_url | string | mxc:// URL for the room avatar |
| room_type | string | Room type (m.space for Matrix Spaces, null for regular rooms) |
| homeserver | string | Homeserver URL used for the query |
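
Room records from this mode can feed the `roomIds` input of the message mode. The sketch below shows one possible selection rule: keep unencrypted rooms with a public join rule, largest first. The function name `pick_scrape_targets` and the `min_members` threshold are hypothetical; only the record fields come from the table above.

```python
from typing import Any, Dict, Iterable, List


def pick_scrape_targets(rooms: Iterable[Dict[str, Any]],
                        min_members: int = 0) -> List[str]:
    """Return room IDs worth scraping, sorted by member count (descending)."""
    candidates = [
        room for room in rooms
        if room.get("record_type") == "room"
        and not room.get("room_is_encrypted")
        and room.get("room_join_rule") == "public"
        and (room.get("room_member_count") or 0) >= min_members
    ]
    candidates.sort(key=lambda r: r.get("room_member_count") or 0, reverse=True)
    return [room["room_id"] for room in candidates]
```

Filtering on `room_is_encrypted` up front avoids spending message-mode records on rooms whose payloads cannot be read.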
Mode 2: Scrape Room Messages

    {
        "record_type": "message",
        "event_id": "$example_event_id:matrix.org",
        "event_type": "m.room.message",
        "event_sender": "@alice:matrix.org",
        "event_sender_display_name": "Alice",
        "event_origin_server_ts": 1715000000000,
        "event_content_msgtype": "m.text",
        "event_content_body": "Hey everyone, anyone know the status of the new spec?",
        "event_content_formatted_body": null,
        "event_content_url": null,
        "event_reply_to": null,
        "event_thread_id": null,
        "event_edits": null,
        "event_room_id": "!L58ME6ufiP49v97UIOBIpvWKEgj4912JmECPuDzlvCI"
    }

| Field | Type | Description |
|---|---|---|
| record_type | string | message for this mode |
| event_id | string | Unique Matrix event ID (e.g. $eventid:matrix.org) |
| event_type | string | Event type (only m.room.message events are saved) |
| event_sender | string | Matrix user ID of the sender (e.g. @user:matrix.org) |
| event_sender_display_name | string | Display name at time of event (may be null) |
| event_origin_server_ts | integer | Event timestamp in milliseconds since Unix epoch |
| event_content_msgtype | string | Message subtype: m.text, m.image, m.file, m.video, m.audio, m.notice |
| event_content_body | string | Plain-text body or filename for media messages |
| event_content_formatted_body | string | HTML-formatted body (when present) |
| event_content_url | string | mxc:// media URL for image/file/video/audio messages |
| event_reply_to | string | Event ID being replied to (m.in_reply_to), if any |
| event_thread_id | string | Thread root event ID (m.thread relation), if any |
| event_edits | string | Event ID being edited by this event (m.replace relation), if any |
| event_room_id | string | Room ID the event belongs to |
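
Two fields usually need conversion before analysis: `event_origin_server_ts` is in milliseconds, and `event_content_url` is an `mxc://` URI of the form `mxc://<server-name>/<media-id>` rather than a fetchable web address. The helpers below are an illustrative sketch with hypothetical names (`event_time`, `parse_mxc`). Turning the parsed pair into a download URL depends on the media endpoints your homeserver exposes and is not shown.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


def event_time(record: Dict[str, Any]) -> datetime:
    """Convert the millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(record["event_origin_server_ts"] / 1000,
                                  tz=timezone.utc)


def parse_mxc(uri: Optional[str]) -> Optional[Tuple[str, str]]:
    """Split an mxc:// URI into (server_name, media_id); None if not valid."""
    if not uri:
        return None
    parsed = urlparse(uri)
    media_id = parsed.path.lstrip("/")
    if parsed.scheme != "mxc" or not parsed.netloc or not media_id:
        return None
    return parsed.netloc, media_id
```

For the sample record above, `event_time` gives 2024-05-06 12:53:20 UTC, and `parse_mxc` returns `None` because `event_content_url` is null for text messages.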
🔍 FAQ
How do I scrape Matrix rooms?
Matrix Message Scraper connects to the Matrix Client-Server API at your chosen homeserver. For public room discovery, no credentials are needed — just set action to Discover Public Rooms and run. For message history, get your access token from Element (Settings → Help & About → Access Token) and supply a list of room IDs.
How much does Matrix Message Scraper cost to run?
Matrix Message Scraper charges $0.10 per run start plus $0.001 per record. Scraping 1,000 messages from a set of rooms costs roughly $1.10. Public room discovery is inexpensive: 100 rooms is about $0.20 total.
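The arithmetic behind those figures is a flat fee plus a per-record fee. A small sketch, using the two prices quoted in the answer above and a hypothetical function name `estimate_cost`:

```python
from decimal import Decimal

RUN_START_FEE = Decimal("0.10")    # charged once per run
PER_RECORD_FEE = Decimal("0.001")  # charged for each record returned


def estimate_cost(records: int) -> Decimal:
    """Estimated cost in USD for a single run returning `records` items."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return RUN_START_FEE + PER_RECORD_FEE * records
```

`Decimal` is used instead of floats so that sub-cent amounts add up exactly. Note that the run-start fee applies per run, so splitting one job into several runs raises the total.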
What data can I get from matrix.org without logging in?
The public room directory. Matrix Message Scraper returns room ID, name, topic, member count, canonical alias, join rule, and avatar URL for all publicly listed rooms, with no access token needed. Message history requires authentication.
Can Matrix Message Scraper scrape rooms on homeservers other than matrix.org?
Yes. Set homeserver to any Matrix server URL (e.g. https://matrix.mozilla.org, https://matrix.kde.org, or a self-hosted instance). Public room discovery and authenticated message scraping work on any standard Matrix homeserver.
Does it scrape encrypted rooms?
No. Encrypted room messages are stored as opaque ciphertext, and without the private keys they're unreadable. Matrix Message Scraper skips encrypted event payloads. Most public rooms don't use encryption.
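In the raw Matrix event stream, encrypted messages arrive with the event type `m.room.encrypted` instead of `m.room.message`, which is what makes them easy to skip. The sketch below illustrates that filter on raw API events (keyed by `type`), not on this actor's flattened output; `readable_messages` is a hypothetical name and this is not necessarily how the actor implements the skip.

```python
from typing import Any, Dict, Iterable, List, Tuple


def readable_messages(events: Iterable[Dict[str, Any]]
                      ) -> Tuple[List[Dict[str, Any]], int]:
    """Keep plaintext message events and count the encrypted ones skipped."""
    kept: List[Dict[str, Any]] = []
    skipped_encrypted = 0
    for event in events:
        event_type = event.get("type")
        if event_type == "m.room.encrypted":
            skipped_encrypted += 1  # ciphertext only, unreadable without keys
        elif event_type == "m.room.message":
            kept.append(event)
        # Other event types (membership, state changes) are ignored here.
    return kept, skipped_encrypted
```

A high skipped count relative to kept messages is a quick signal that a room has encryption enabled and is a poor scraping target.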
Need More Features?
Need custom filters, incremental scraping across runs, or support for a different homeserver configuration? File a request or get in touch.
Why Use Matrix Message Scraper?
- First Matrix scraper on Apify — zero competing actors means no stale or abandoned alternatives
- No proxies, no browser overhead — pure REST API access, which keeps costs low and runs fast
- Federation-aware — one actor queries any homeserver, not just matrix.org