Twitch VOD Chat Archive
Pricing: Pay per event
Export the full timestamped chat replay of any public Twitch VOD as a one-row-per-message dataset, including user color, badges, emote IDs, and message offsets. No login required. The go-to Twitch chat scraper now that earlier Apify Twitch actors have been deprecated.
Developer: DevilScrapes
Maintained by Community
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 2 days ago
🎯 What this scrapes
Twitch retains a complete timestamped chat replay for every VOD as long as the VOD itself exists. This Actor calls the same VideoCommentsByOffsetOrCursor GraphQL operation that Twitch's own VOD player uses, paginates by content offset (the only mode that avoids Twitch's integrity-check challenge), and emits one clean row per chat message, ready for analytics, moderation-classifier training, or post-broadcast review.
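Offset-based pagination can be pictured as "ask for chat at or after second N, then ask again from the newest second you received". The sketch below illustrates that idea and is not the Actor's source. Assumptions, all loudly hypothetical: the request follows the common persisted-query body shape, the variable names `videoID` / `contentOffsetSeconds` and the node fields `id` / `contentOffsetSeconds` are assumed rather than confirmed by this README, `fetch_page` stands in for the real HTTP call, and `PERSISTED_QUERY_HASH` is a placeholder, not the real constant.

```python
from typing import Callable, Iterator

# Hypothetical placeholder: the real persisted-query hash is the public
# Twitch web-player constant mentioned under Limitations; not reproduced here.
PERSISTED_QUERY_HASH = "<sha256-hash-from-twitch-web-player>"


def build_page_request(vod_id: str, offset_seconds: int) -> dict:
    """Build one persisted-query GraphQL body asking for chat at a content offset."""
    return {
        "operationName": "VideoCommentsByOffsetOrCursor",
        # Assumed variable names; only the offset mode is used, never a cursor.
        "variables": {"videoID": vod_id, "contentOffsetSeconds": offset_seconds},
        "extensions": {
            "persistedQuery": {"version": 1, "sha256Hash": PERSISTED_QUERY_HASH}
        },
    }


def walk_offsets(
    vod_id: str,
    fetch_page: Callable[[dict], list],
    start_offset: int = 0,
    max_messages: int = 5000,
) -> Iterator[dict]:
    """Yield unique chat nodes by re-querying at ever-later content offsets.

    fetch_page is a hypothetical callable that sends the request body and
    returns a list of nodes, each carrying "id" and "contentOffsetSeconds".
    """
    seen = set()
    offset = start_offset
    emitted = 0
    while emitted < max_messages:
        nodes = fetch_page(build_page_request(vod_id, offset))
        if not nodes:
            break  # nothing at or after this offset: the replay is exhausted
        for node in nodes:
            if node["id"] in seen:
                continue  # overlap with the previous page
            seen.add(node["id"])
            yield node
            emitted += 1
            if emitted >= max_messages:
                return
        last = max(n["contentOffsetSeconds"] for n in nodes)
        # Resume at the newest offset on this page (overlap is deduplicated by
        # ID), but always advance at least one second so the loop cannot stall.
        offset = max(last, offset + 1)
```

The design trades completeness in one rare case for guaranteed termination: if more messages share a single second than fit on one page, the ones that do not fit can be skipped, because the next query starts at the same second and then moves on.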
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes, so the target sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy: a fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on 408 / 429 / 5xx: up to 5 attempts per page, with `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, and JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: apart from a small per-run start fee, you pay only for result rows that land in your dataset.
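The retry policy listed above (exponential backoff on 408 / 429 / 5xx, up to 5 attempts per page, `Retry-After` honoured) can be sketched as follows. This is an illustration under stated assumptions, not the Actor's code: `send` is a hypothetical callable returning `(status, headers, body)`, the 1-second base and 30-second cap are assumed values, headers are assumed to use the canonical `Retry-After` casing, jitter is left out to keep the behaviour deterministic, and only the delta-seconds form of `Retry-After` is parsed.

```python
import time
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 5  # matches "up to 5 attempts per page"


def is_retryable(status: int) -> bool:
    """408 (timeout), 429 (rate limited) and any 5xx are worth another try."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # the server's hint wins
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    return min(cap, base * 2 ** (attempt - 1))  # 1, 2, 4, 8, ... capped


def send_with_retries(
    send: Callable[[int], Tuple[int, dict, bytes]],
    max_attempts: int = MAX_ATTEMPTS,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send(attempt) until it returns a non-retryable status.

    send is hypothetical; it receives the attempt number so a caller could
    open a fresh proxy session for each retry.
    """
    status = 0
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send(attempt)
        if not is_retryable(status):
            return status, body
        if attempt < max_attempts:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")
```

Injecting `sleep` keeps the policy testable without real waits; in production the default `time.sleep` applies.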
💡 Use cases
- Community sentiment analysis — bulk-export chat from your last 20 streams and run NLP for hype-moment detection.
- Moderation-classifier training — gather positive / negative chat samples at scale for an in-house spam or toxicity model.
- Esports analytics — quantify hype peaks against game events by joining message density to the VOD timeline.
- Post-broadcast review — streamers and mods download chat for an after-action review, search for usernames, or extract clips with chat context.
- Academic research — public-record dataset of streamer-viewer conversation for media-studies research.
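For the esports-analytics case, "message density" can be as simple as counting rows per fixed window of `message_offset_seconds`. The sketch below works on exported dataset rows; the 60-second bucket is an arbitrary assumption, empty windows are omitted rather than reported as zero, and aligning the peaks with game events is left to you.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def message_density(rows: Iterable[dict], bucket_seconds: int = 60) -> Dict[int, int]:
    """Count messages per time bucket, keyed by the bucket's start offset in seconds."""
    counts = Counter(r["message_offset_seconds"] // bucket_seconds for r in rows)
    return {b * bucket_seconds: counts[b] for b in sorted(counts)}


def top_peaks(density: Dict[int, int], n: int = 3) -> List[Tuple[int, int]]:
    """Busiest buckets first; ties are broken by the earlier offset."""
    return sorted(density.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Each peak's key is a VOD offset in seconds, so it maps straight back to a timestamp in the VOD player.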
⚙️ How to use it
- Click Try for free at the top of the page.
- Fill in the input form — most fields have sensible defaults.
- Click Start. Output streams into the run's dataset.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
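Fetching results "via the API" usually means Apify's v2 REST endpoint for dataset items. A minimal standard-library sketch, assuming you already have the run's dataset ID (the `defaultDatasetId` of the run) and an API token; `dataset_items_request` and `fetch_all_items` are hypothetical helper names, and the page size of 1000 is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def dataset_items_request(dataset_id: str, token: str, fmt: str = "json",
                          offset: int = 0, limit: int = 1000) -> urllib.request.Request:
    """Build a GET request for one page of dataset items."""
    query = urllib.parse.urlencode(
        {"format": fmt, "clean": "true", "offset": offset, "limit": limit}
    )
    url = f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id)}/items?{query}"
    # Token sent as a bearer header rather than in the query string.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_all_items(dataset_id: str, token: str, page_size: int = 1000) -> list:
    """Page through a dataset as JSON until a short page signals the end."""
    items: list = []
    offset = 0
    while True:
        req = dataset_items_request(dataset_id, token, offset=offset, limit=page_size)
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        items.extend(page)
        if len(page) < page_size:
            return items
        offset += page_size
```

Switching `fmt` to `csv` or `xlsx` gives the same exports as the Console, but then the response is no longer JSON and `fetch_all_items` would need to write bytes instead of parsing.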
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| vodIds | array | no | [] | List of Twitch VOD IDs or full VOD URLs (e.g. https://www.twitch.tv/videos/2773625679). Either this or … |
| channelLogin | string | no | (none) | Twitch channel login (the URL slug, e.g. shroud). Used only when vodIds is empty. The Actor … |
| maxRecentVods | integer | no | 5 | In channel mode, how many of the most recent ARCHIVE VODs to pull (1–50). |
| maxMessagesPerVod | integer | no | 5000 | Stop walking chat once this many messages have been emitted for a single VOD (1–200000). A 6-hour stream typically has 5… |
| startOffsetSeconds | integer | no | 0 | Skip chat messages whose offset within the VOD is less than this value. Use 0 to start from the beginning. |
| proxyConfiguration | object | no | {"useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"]} | Twitch rate-limits a single IP aggressively past ~10k chat messages. A residential proxy is strongly recommended for long VODs. |
Example input
{"vodIds": ["2773625679"],"maxMessagesPerVod": 100,"startOffsetSeconds": 0,"proxyConfiguration": {"useApifyProxy": false}}
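Before a run, it can help to normalise and range-check an input like the example above. The sketch below mirrors the documented defaults and ranges (maxRecentVods 1–50, maxMessagesPerVod 1–200000, VOD IDs as numbers or full URLs). It is a client-side illustration, not the Actor's own validator; `validate_input` and `normalize_vod_id` are hypothetical names, and requiring at least one of vodIds / channelLogin is an assumption, since that table cell is truncated.

```python
import re

VOD_URL_RE = re.compile(r"twitch\.tv/videos/([0-9]+)")


def normalize_vod_id(value: str) -> str:
    """Accept a bare numeric VOD ID or a full https://www.twitch.tv/videos/<id> URL."""
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value):
        return value
    match = VOD_URL_RE.search(value)
    if match:
        return match.group(1)
    raise ValueError(f"not a Twitch VOD ID or URL: {value!r}")


def _bounded_int(raw: dict, key: str, default: int, lo: int, hi: int) -> int:
    value = int(raw.get(key, default))
    if not lo <= value <= hi:
        raise ValueError(f"{key} must be between {lo} and {hi}, got {value}")
    return value


def validate_input(raw: dict) -> dict:
    vod_ids = [normalize_vod_id(v) for v in raw.get("vodIds", [])]
    channel = (raw.get("channelLogin") or "").strip()
    if not vod_ids and not channel:
        # Assumption: at least one of the two modes must be configured.
        raise ValueError("set vodIds, or channelLogin for channel mode")
    start = int(raw.get("startOffsetSeconds", 0))
    if start < 0:
        raise ValueError("startOffsetSeconds must be >= 0")
    return {
        "vodIds": vod_ids,
        "channelLogin": channel or None,
        "maxRecentVods": _bounded_int(raw, "maxRecentVods", 5, 1, 50),
        "maxMessagesPerVod": _bounded_int(raw, "maxMessagesPerVod", 5000, 1, 200_000),
        "startOffsetSeconds": start,
    }
```

Running the example input through `validate_input` yields the numeric ID `2773625679`, a 100-message cap, and the default `maxRecentVods` of 5.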
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| vod_id | string | Twitch VOD ID (numeric string). |
| vod_title | string or null | VOD title. Populated in channel mode or when a metadata pre-fetch resolved it. |
| channel_login | string or null | Channel login (URL slug) for the VOD. Populated in channel mode. |
| message_id | string | Unique chat message UUID. |
| message_offset_seconds | integer | Position within the VOD, in seconds, at which the message was posted. |
| posted_at | string | Wall-clock UTC timestamp of the message (ISO 8601 with milliseconds, verbatim from Twitch). |
| commenter_id | string or null | Twitch user ID of the commenter. Null for deleted users. |
| commenter_login | string or null | Commenter login (URL slug). |
| commenter_display_name | string or null | Commenter display name. |
| message_text | string | Concatenated plain-text body of the message (emote shortcodes preserved as their literal text). |
| message_fragments | array | Structured fragments: list of {type: 'text' or 'emote', …} (see the example output). |
| user_color | string or null | User's chat color as hex (e.g. '#DAA520'). Null when not set. |
| badges | array | List of {set_id, version} objects. Empty list when the user has none. |
| is_subscriber | boolean | Convenience flag: true when a 'subscriber' badge is present. |
| scraped_at | string | When this row was emitted (ISO 8601 UTC). |
Example output
{"vod_id": "2773625679","vod_title": "never played forza but i definitely have a drivers license so it should be easy","channel_login": "shroud","message_id": "1292e052-0561-4db5-86c7-adfc4556d628","message_offset_seconds": 12,"posted_at": "2026-05-16T18:42:35.297Z","commenter_id": "142680597","commenter_login": "tabrexs","commenter_display_name": "tabrexs","message_text": "PewPewPew","message_fragments": [{"type": "emote","text": "PewPewPew","emote_id": "emotesv2_587405136a8147148c77df74baaa1bf4"}],"user_color": "#DAA520","badges": [],"is_subscriber": false,"scraped_at": "2026-05-16T19:00:00Z"}
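The two derived fields in the row above can be recomputed from the raw ones, which is handy when validating an export. A small sketch under assumptions: it treats `message_text` as the fragment texts joined with no separator (so it assumes text fragments carry their own spaces), `is_subscriber` as "a badge with set_id 'subscriber' exists", and `scraped_at` as second-precision UTC. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Mapping


def message_text_from_fragments(fragments: Iterable[Mapping]) -> str:
    """Join fragment texts in order; emote fragments contribute their literal shortcode."""
    return "".join(f.get("text", "") for f in fragments)


def has_subscriber_badge(badges: Iterable[Mapping]) -> bool:
    """Mirror of the is_subscriber convenience flag."""
    return any(b.get("set_id") == "subscriber" for b in badges)


def utc_now_iso() -> str:
    """Second-precision ISO 8601 UTC timestamp in the same shape as scraped_at."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Applied to the example row, the single emote fragment yields a `message_text` of `PewPewPew`, and the empty `badges` list gives `is_subscriber: false`.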
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| actor-start | $0.05 | One-off warm-up charge per run |
| result-row | $0.001 | Charged for each chat-message row written to the dataset |
Example: 1,000 results in a single run cost about $1.05 ($0.05 start fee + 1,000 × $0.001). No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
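The pay-per-event arithmetic is just the start fee per run plus the per-row fee. A tiny sketch using the two rates from the pricing table; it covers only those two events, and `estimate_cost` is a hypothetical helper. `Decimal` avoids floating-point rounding on sub-cent rates.

```python
from decimal import Decimal

# Rates copied from the pricing table.
ACTOR_START_USD = Decimal("0.05")   # charged once per run
RESULT_ROW_USD = Decimal("0.001")   # charged per chat-message row


def estimate_cost(rows: int, runs: int = 1) -> Decimal:
    """Pay-per-event estimate: start fee per run plus a fee per dataset row."""
    return runs * ACTOR_START_USD + rows * RESULT_ROW_USD
```

For example, pulling 5 VODs at the default cap of 5,000 messages each in one run is 25,000 rows, or about $25.05.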
🚧 Limitations
Twitch's public VOD chat replay endpoint is the only data source — no OAuth, no moderator-action log, no live chat, no DMs. On the FREE Apify plan only the BUYPROXIES94952 datacenter group is provisioned (5 IPs); residential proxy gives much better tolerance for long VODs and is recommended on paid plans. The persisted-query hash this Actor uses is a public Twitch web-player constant — if Twitch rotates it on a schema change, we ship a same-day patch.
❓ FAQ
Does this scrape live chat?
No — VOD chat replays only. Live chat is a different IRC-over-WSS protocol. Once the broadcast ends and the VOD is processed, its chat replay becomes accessible via this Actor.
Why are some VODs returning zero messages?
Most common causes: (a) the VOD has subscriber-only chat enabled, so anonymous queries get nothing; (b) the VOD has expired (Twitch deletes past broadcasts after a retention window, typically 7 days for most accounts and up to 60 days for Partners and Turbo or Prime users; highlights and uploads persist until deleted); (c) the channel disabled chat replay. We surface the cause in the run status message.
Why does a long VOD take so long?
Twitch returns about 50–60 messages per page and rate-limits a single IP aggressively past ~10k messages. The Actor keeps one request in flight at a time and backs off on 429. For long VODs, keep the default residential proxy.
What about emote images?
We return the Twitch emote_id in each emote fragment. You construct the CDN URL yourself: https://static-cdn.jtvnw.net/emoticons/v2/<emote_id>/default/dark/3.0.
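Building those CDN URLs from a row's `message_fragments` takes a few lines. The sketch below fills the URL pattern given above; exposing the format, theme, and scale segments as parameters (with `light`/`dark` and `1.0`/`2.0`/`3.0` as allowed values) reflects my understanding of Twitch's emote URL template and should be treated as an assumption. `emote_url` and `emote_urls` are hypothetical helpers.

```python
from typing import Iterable, List, Mapping

EMOTE_URL_TEMPLATE = "https://static-cdn.jtvnw.net/emoticons/v2/{emote_id}/{fmt}/{theme}/{scale}"


def emote_url(emote_id: str, fmt: str = "default", theme: str = "dark", scale: str = "3.0") -> str:
    """Fill the CDN URL pattern; the defaults reproduce the URL shown above."""
    if theme not in ("light", "dark"):
        raise ValueError(f"unsupported theme: {theme!r}")
    if scale not in ("1.0", "2.0", "3.0"):
        raise ValueError(f"unsupported scale: {scale!r}")
    return EMOTE_URL_TEMPLATE.format(emote_id=emote_id, fmt=fmt, theme=theme, scale=scale)


def emote_urls(fragments: Iterable[Mapping], **kwargs) -> List[str]:
    """One URL per emote fragment, in message order; text fragments are skipped."""
    return [
        emote_url(f["emote_id"], **kwargs)
        for f in fragments
        if f.get("type") == "emote" and f.get("emote_id")
    ]
```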
Are moderator actions (bans, timeouts) included?
No — the public chat-replay endpoint does not expose moderator action logs. Deleted messages may appear as <message deleted> or not at all, depending on when they were removed.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.