Twitch Chat Scraper — VOD Chat Archive
Pricing
Pay per event
Export and download the full timestamped chat replay from any public Twitch VOD as a one-row-per-message dataset (user color, badges, emote IDs, message offsets) in JSON or CSV. A Twitch VOD chat downloader that needs no login, published after earlier Apify Twitch Actors were deprecated.
Developer
DevilScrapes
Maintained by Community
🎯 What this scrapes
Twitch retains a complete timestamped chat replay for every VOD as long as the VOD itself exists. This Actor walks the same VideoCommentsByOffsetOrCursor GraphQL endpoint that Twitch's own VOD player uses, paginates by content offset (the only mode that avoids Twitch's integrity-check challenge), and emits one clean row per chat message — ready for analytics, moderation-classifier training, or post-broadcast review.
The Twitch chat archive endpoint is not part of the public Helix API; walking it reliably requires matching browser-level request fingerprints and absorbing residential-proxy rotation when Twitch rate-limits a single IP past roughly 10k messages. We handle that layer for you.
🔥 Features
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: you only pay for results that hit your dataset. No data, no charge.
- 🔄 Channel back-catalog mode: point at a channel login and pull the N most-recent archive VODs in one run.
- 🎯 Offset filtering: skip the first N seconds of a VOD to focus on a specific segment.
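The retry behaviour in the feature list can be pictured with a small sketch. It is not the Actor's actual code: the function names, the 1-second base delay, the 60-second cap, and the jitter range are all assumptions chosen for illustration. Only the retryable statuses, the 5-attempt limit, and the `Retry-After` handling come from the list above.

```python
import random
from typing import Optional

MAX_ATTEMPTS = 5  # "up to 5 attempts per page" (from the feature list)


def is_retryable(status: int) -> bool:
    """True for the statuses the feature list names: 408, 429, and any 5xx."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    `base` and `cap` are illustrative values, not documented Actor settings.
    """
    if retry_after is not None:
        try:
            # Honour a numeric Retry-After header, clamped to [0, cap].
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    ceiling = min(cap, base * 2 ** (attempt - 1))  # 1, 2, 4, 8, ... seconds
    return random.uniform(0.5 * ceiling, ceiling)  # jitter spreads out retries
```

A caller would loop up to `MAX_ATTEMPTS` times, sleep for `backoff_delay(attempt, response_headers.get("Retry-After"))` whenever `is_retryable(status)` is true, and give up otherwise.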
💡 Use cases
- Community sentiment analysis: bulk-export chat from your last 20 streams and run NLP for hype-moment detection.
- Moderation-classifier training: gather positive / negative chat samples at scale for an in-house spam or toxicity model. The `message_fragments` field preserves emote IDs so your model can learn emote semantics.
- Esports analytics: quantify hype peaks against game events by joining message density (`message_offset_seconds`) to the VOD timeline.
- Post-broadcast review: streamers and mods download the Twitch chat archive for an after-action review, username search, or clipping chat context.
- Academic research: public-record dataset of streamer-viewer conversation for media-studies and parasocial-dynamics research.
- Mod-tool development: feed historical chat into StreamElements / Streamlabs / Nightbot rule-testers without waiting for a live stream.
- chatdownloader alternative: drop-in managed replacement for local CLI tools; runs in Apify's cloud, no local setup, no proxy management.
⚙️ How to use it
1. Click Try for free at the top of the page.
2. Fill in the input form. Most fields have sensible defaults.
3. Click Start. Output streams into the run's dataset.
4. Export from Storage → Dataset as JSON, CSV, or Excel, or fetch it via the Apify API.
To download the chat for a specific VOD, paste the numeric ID (e.g. `2773625679`) or the full URL (`https://www.twitch.tv/videos/2773625679`) into the `vodIds` field. For a full channel back-catalog, enter the channel login in `channelLogin` and set `maxRecentVods`.
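Because the VOD field accepts both bare IDs and full URLs, it can help to normalise your own lists before submitting them. The helper below is a hypothetical client-side sketch, not part of the Actor; the function name and the accepted URL pattern are assumptions.

```python
import re

# Matches https://www.twitch.tv/videos/<digits>, with optional query or fragment.
_VOD_URL = re.compile(r"^https?://(?:www\.)?twitch\.tv/videos/([0-9]+)(?:[/?#].*)?$")


def normalize_vod_id(value: str) -> str:
    """Return the numeric VOD ID for either a bare ID or a full VOD URL."""
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value):
        return value
    match = _VOD_URL.match(value)
    if match:
        return match.group(1)
    raise ValueError(f"not a Twitch VOD ID or URL: {value!r}")


vod_ids = [normalize_vod_id(v) for v in [
    "2773625679",
    "https://www.twitch.tv/videos/2773625679",
]]
```

Both entries resolve to the same ID, so deduplicating the result (for example with `sorted(set(vod_ids))`) avoids paying for the same VOD twice.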
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `vodIds` | array | no | `[]` | List of Twitch VOD IDs or full VOD URLs (e.g. `https://www.twitch.tv/videos/2773625679`). Either this or `channelLogin` must be provided. |
| `channelLogin` | string | no | (none) | Twitch channel login (the URL slug, e.g. `shroud`). Used only when `vodIds` is empty. The Actor fetches the channel's most-recent archive VODs. |
| `maxRecentVods` | integer | no | `5` | When channel mode is used, how many most-recent archive VODs to pull (1–50). |
| `maxMessagesPerVod` | integer | no | `5000` | Stop walking chat once this many messages have been emitted for a single VOD (1–200 000). A 6-hour stream typically has 50k–300k messages. |
| `startOffsetSeconds` | integer | no | `0` | Skip chat messages whose offset within the VOD is less than this value. Use `0` to start from the beginning. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"]}` | Twitch rate-limits a single IP aggressively past ~10k chat messages. Residential proxy is strongly recommended for long VODs. |
Example input
```json
{
  "vodIds": ["2773625679"],
  "maxMessagesPerVod": 100,
  "startOffsetSeconds": 0,
  "proxyConfiguration": {"useApifyProxy": false}
}
```
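If you build inputs programmatically, a pre-flight check against the documented ranges can catch mistakes before a run starts. This is a hypothetical client-side sketch: `validate_input` is not an Actor function, and the Actor's own validation may differ. The field names, defaults, and ranges are taken from the input table.

```python
def validate_input(raw: dict) -> dict:
    """Apply the documented defaults and range checks to an input dict."""
    vod_ids = list(raw.get("vodIds") or [])
    channel = (raw.get("channelLogin") or "").strip()
    if not vod_ids and not channel:
        raise ValueError("provide vodIds or channelLogin")

    max_vods = int(raw.get("maxRecentVods", 5))
    if not 1 <= max_vods <= 50:
        raise ValueError("maxRecentVods must be between 1 and 50")

    max_messages = int(raw.get("maxMessagesPerVod", 5000))
    if not 1 <= max_messages <= 200_000:
        raise ValueError("maxMessagesPerVod must be between 1 and 200000")

    start = int(raw.get("startOffsetSeconds", 0))
    if start < 0:
        raise ValueError("startOffsetSeconds must be 0 or greater")

    return {
        "vodIds": vod_ids,
        "channelLogin": channel or None,
        "maxRecentVods": max_vods,
        "maxMessagesPerVod": max_messages,
        "startOffsetSeconds": start,
        # Documented default: residential Apify Proxy.
        "proxyConfiguration": raw.get("proxyConfiguration") or {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
```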
📤 Output
Every row is one dataset item — one chat message from the VOD replay.
| Field | Type | Notes |
|---|---|---|
| `vod_id` | string | Twitch VOD ID (numeric string). |
| `vod_title` | string or null | VOD title. Populated when channel mode is used or when a metadata pre-fetch resolved it. |
| `channel_login` | string or null | Channel login (URL slug) for the VOD. Populated in channel mode. |
| `message_id` | string | Unique chat message UUID. |
| `message_offset_seconds` | integer | Position within the VOD when this message was posted, in seconds. |
| `posted_at` | string | Wall-clock UTC timestamp the message was posted (ISO 8601 with milliseconds, verbatim from Twitch). |
| `commenter_id` | string or null | Twitch user ID of the commenter. Null for deleted users. |
| `commenter_login` | string or null | Commenter login (URL slug). |
| `commenter_display_name` | string or null | Commenter display name. |
| `message_text` | string | Concatenated plain-text body of the message (emote shortcodes preserved as their literal text). |
| `message_fragments` | array | Structured fragments: list of objects with a `type` (`"text"` or `"emote"`), the fragment `text`, and an `emote_id` on emote fragments. |
| `user_color` | string or null | User's chat color (hex, e.g. `#DAA520`). Null when not set. |
| `badges` | array | List of `{set_id, version}` objects. Empty list when the user has none. |
| `is_subscriber` | boolean | Convenience flag: `true` when a `subscriber` badge is present in `badges`. |
| `scraped_at` | string | When this row was emitted (ISO 8601 UTC). |
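The convenience fields can be re-derived from the structured ones, which is useful when checking a downloaded dataset. The sketch below assumes that `message_text` is the plain concatenation of each fragment's `text` and that the subscriber badge has `set_id` equal to `"subscriber"`; both are readings of the table above, not confirmed implementation details, and the helper names are hypothetical.

```python
def plain_text(fragments: list) -> str:
    """Join fragment texts in order (assumed to match message_text)."""
    return "".join(fragment.get("text", "") for fragment in fragments)


def emote_ids(fragments: list) -> list:
    """Collect emote IDs from fragments whose type is "emote"."""
    return [f["emote_id"] for f in fragments
            if f.get("type") == "emote" and f.get("emote_id")]


def has_subscriber_badge(badges: list) -> bool:
    """Mirror of is_subscriber, assuming the badge set_id is "subscriber"."""
    return any(badge.get("set_id") == "subscriber" for badge in badges)
```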
Example output
```json
{
  "vod_id": "2773625679",
  "vod_title": "never played forza but i definitely have a drivers license so it should be easy",
  "channel_login": "shroud",
  "message_id": "1292e052-0561-4db5-86c7-adfc4556d628",
  "message_offset_seconds": 12,
  "posted_at": "2026-05-16T18:42:35.297Z",
  "commenter_id": "142680597",
  "commenter_login": "tabrexs",
  "commenter_display_name": "tabrexs",
  "message_text": "PewPewPew",
  "message_fragments": [
    {
      "type": "emote",
      "text": "PewPewPew",
      "emote_id": "emotesv2_587405136a8147148c77df74baaa1bf4"
    }
  ],
  "user_color": "#DAA520",
  "badges": [],
  "is_subscriber": false,
  "scraped_at": "2026-05-16T19:00:00Z"
}
```
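Rows in this shape lend themselves to the message-density analysis mentioned under use cases. The following is a minimal sketch that buckets `message_offset_seconds` into one-minute bins; the function names and the one-minute bucket size are illustrative choices, not part of the Actor.

```python
from collections import Counter


def messages_per_minute(rows: list) -> dict:
    """Count messages per VOD minute (minute 0 covers offsets 0-59 seconds)."""
    counts = Counter(row["message_offset_seconds"] // 60 for row in rows)
    return dict(sorted(counts.items()))


def busiest_minutes(rows: list, top: int = 3) -> list:
    """Return up to `top` (minute, message_count) pairs, busiest first."""
    counts = Counter(row["message_offset_seconds"] // 60 for row in rows)
    return counts.most_common(top)
```

Joining the busiest minutes back to the VOD timeline (minute × 60 seconds) gives candidate timestamps for hype moments.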
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| `actor-start` | $0.05 | One-off warm-up charge per run |
| `result-row` | $0.001 | Per chat message emitted to the dataset |
Example: one run that emits 1 000 rows costs $0.05 + 1 000 × $0.001 = $1.05. No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
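The pricing arithmetic can be scripted for budgeting. This sketch uses only the two event prices from the table above; the function name is hypothetical, and actual billing is determined by Apify, not by this calculation.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.05")   # charged once per run
RESULT_ROW_USD = Decimal("0.001")   # charged per chat message emitted


def estimate_cost(rows: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `rows` emitted messages across `runs` runs."""
    if rows < 0 or runs < 0:
        raise ValueError("rows and runs must be non-negative")
    return ACTOR_START_USD * runs + RESULT_ROW_USD * rows


print(estimate_cost(1000))   # the example from the pricing section
print(estimate_cost(5000))   # one VOD at the default maxMessagesPerVod
```

`Decimal` is used instead of `float` so that fractions of a cent add up exactly.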
🚧 Limitations
- VOD chat only, not live chat. Live chat is a separate IRC-over-WSS protocol; this Actor reads the VOD replay endpoint only. Once a broadcast ends and Twitch processes the VOD, the replay becomes available.
- No moderator-action log. The public chat-replay endpoint does not expose bans, timeouts, or deleted-message metadata. Deleted messages may appear as `<message deleted>` or may not appear at all, depending on when they were removed.
- VOD expiry. Twitch deletes past broadcasts after a retention window that depends on the account type: 7 days for most accounts, 14 days for Affiliates, and 60 days for Partners and Turbo or Prime subscribers, per Twitch's help documentation. Expired VODs return zero messages.
- Subscriber-only chat. If a VOD has subscriber-only chat enabled, anonymous queries return nothing. We surface the cause in the run status message.
- Rate-limits on long VODs. Twitch returns roughly 50–60 messages per page and rate-limits a single IP aggressively past ~10k messages. Residential proxy is strongly recommended for VODs with more than 10k messages. On the FREE Apify plan only the BUYPROXIES94952 datacenter group is provisioned by default; upgrade to residential for large VODs.
- GraphQL hash rotation. The persisted-query hash this Actor uses is a public Twitch web-player constant. If Twitch rotates it on a schema change, we ship a same-day patch. Subscribe to the Twitch Developers Discord `#announcements` channel for early warning.
❓ FAQ
Does this scrape live chat?
No — VOD chat replays only. Live chat is a different IRC-over-WSS protocol. Once the broadcast ends and Twitch processes the VOD, its chat replay becomes accessible via this Actor. For live-stream monitoring, see the Kick chat actor in our fleet.
Is this a chatdownloader alternative?
Yes — it covers the same use case (download Twitch VOD chat to CSV / JSON) but runs in Apify's managed cloud. No local environment to set up, no proxy configuration, no babysitting long runs. The output schema is richer: structured emote fragments, subscriber badge flag, and ISO-8601 timestamps baked in.
Why are some VODs returning zero messages?
Most common causes: (a) the VOD has subscriber-only chat enabled, so anonymous queries get nothing; (b) the VOD has expired (Twitch keeps past broadcasts for 7 to 60 days, depending on the account type); (c) the channel disabled chat replay. We surface the cause in the run status message.
How do I download Twitch chat to CSV?
Run the Actor with your VOD IDs, then open Storage → Dataset in Apify Console and click Export → CSV. You can also download via the Apify API if you need to automate the export step.
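For automated exports, the Apify API serves dataset items at `/v2/datasets/{datasetId}/items` with a `format` query parameter. The helper below only builds that URL; the function name is hypothetical, the dataset ID shown is a placeholder, and authentication for non-public datasets is left out of this sketch.

```python
from urllib.parse import quote, urlencode

APIFY_API_BASE = "https://api.apify.com/v2"
SUPPORTED_FORMATS = {"json", "csv", "xlsx"}  # the formats this page mentions


def dataset_items_url(dataset_id: str, fmt: str = "csv") -> str:
    """Build the dataset-items download URL for the given export format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"{APIFY_API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{query}"


# "abc123" is a placeholder; use the dataset ID shown on your run's Storage tab.
print(dataset_items_url("abc123"))
```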
Why does a long VOD take so long?
Twitch returns about 50–60 messages per page and backs off on rate-limit signals. We walk one page at a time with backoff, which is the correct approach — hammering the endpoint gets you banned faster. For 100k+ message VODs, expect 10–20 minutes; residential proxy significantly reduces interruptions.
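The page-size figure makes the run time easier to reason about. As a rough sketch using only the 50–60 messages-per-page estimate quoted above (the function name is illustrative):

```python
import math


def page_range(total_messages: int, per_page_low: int = 50,
               per_page_high: int = 60) -> tuple:
    """Return (fewest, most) sequential page requests needed for a VOD."""
    fewest = math.ceil(total_messages / per_page_high)
    most = math.ceil(total_messages / per_page_low)
    return fewest, most


print(page_range(100_000))  # (1667, 2000)
```

A 100k-message VOD therefore needs somewhere between about 1 700 and 2 000 sequential requests, so even sub-second pages add up to many minutes before any backoff is counted.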
What about emote images?
We return the Twitch emote_id in each emote fragment. Construct the CDN URL yourself: https://static-cdn.jtvnw.net/emoticons/v2/<emote_id>/default/dark/3.0.
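A small helper can turn the emote fragments of a row into image URLs using the CDN pattern above. The function names are hypothetical; the URL template is copied from this answer and the fragment keys follow the example output.

```python
EMOTE_CDN_TEMPLATE = "https://static-cdn.jtvnw.net/emoticons/v2/{emote_id}/default/dark/3.0"


def emote_url(emote_id: str) -> str:
    """Fill the CDN template from the FAQ with one emote ID."""
    return EMOTE_CDN_TEMPLATE.format(emote_id=emote_id)


def emote_urls(row: dict) -> dict:
    """Map each emote's literal text to its image URL for one dataset row."""
    return {
        fragment["text"]: emote_url(fragment["emote_id"])
        for fragment in row.get("message_fragments", [])
        if fragment.get("type") == "emote" and fragment.get("emote_id")
    }
```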
Is this a twitch chat dataset for NLP / toxicity training?
Exactly the use case we designed for. Export the full `message_text` and `message_fragments` fields, filter by `is_subscriber` if you need a specific audience slice, and join on `message_offset_seconds` to correlate with game events. We also publish an anonymised corpus on HuggingFace Datasets if you want a ready-made starting point.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.