Rumble Category Scraper
Pricing
Pay per usage
Walks /category/{slug}/videos pages and extracts unique channel URLs until the target volume is reached, pagination returns a 404, or a safety cap on pages is hit.
Developer: Fiodar Tarasenka (maintained by Community)
Last modified: 11 days ago
infla/rumble-category-scraper
Walks https://rumble.com/category/{slug}/videos paginated listing
pages and extracts unique channel URLs. Powers Infla's
rumble_category job type (PLAN-009).
Input
| Field | Type | Default | Description |
|---|---|---|---|
| category | string (required) | (none) | Rumble category slug (crypto, news, finance, …). |
| mode | enum (channels \| videos) | channels | Output shape. See Videos mode (PLAN-011) below. |
| targetUniqueCreators | integer | 100 | Stop after this many unique channels. In videos mode this caps the number of distinct creators emitted. |
| maxVideosPerCreator | integer | 5 | Videos mode only: cap on videos emitted per creator. Range 1-50. Ignored in channels mode. |
| maxPages | integer | 50 | Safety cap on pagination depth. |
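A minimal TypeScript sketch of how the table's defaults and the documented 1-50 range might be applied to raw input. The field names come from the table; `normaliseInput`, `clampInt`, and the choice to clamp out-of-range values rather than reject them are assumptions, not the actor's confirmed behaviour.

```typescript
// Hypothetical input normalisation: field names mirror the Input table,
// but clamping and the error messages are illustrative assumptions.
type Mode = "channels" | "videos";

interface RawInput {
  category?: string;
  mode?: string;
  targetUniqueCreators?: number;
  maxVideosPerCreator?: number;
  maxPages?: number;
}

interface ActorInput {
  category: string;
  mode: Mode;
  targetUniqueCreators: number;
  maxVideosPerCreator: number;
  maxPages: number;
}

// Truncate to an integer and clamp into [min, max]; use the default when unset.
function clampInt(value: number | undefined, fallback: number, min: number, max: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

function normaliseInput(raw: RawInput): ActorInput {
  const category = (raw.category ?? "").trim();
  if (category === "") throw new Error("category is required");

  const modeRaw = raw.mode ?? "channels";
  if (modeRaw !== "channels" && modeRaw !== "videos") {
    throw new Error(`unknown mode: ${modeRaw}`);
  }
  const mode: Mode = modeRaw === "videos" ? "videos" : "channels";

  return {
    category,
    mode,
    targetUniqueCreators: clampInt(raw.targetUniqueCreators, 100, 1, Number.MAX_SAFE_INTEGER),
    // Documented range is 1-50; clamping (rather than rejecting) is assumed.
    maxVideosPerCreator: clampInt(raw.maxVideosPerCreator, 5, 1, 50),
    maxPages: clampInt(raw.maxPages, 50, 1, Number.MAX_SAFE_INTEGER),
  };
}
```

In practice the Apify input schema can enforce types and ranges before the code runs; a sketch like this is only a second line of defence for local runs.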
Output
Channels mode (default)
One dataset item per unique channel:
    {
      "mode": "channels",
      "channelHandle": "russellbrand",
      "channelURL": "https://rumble.com/c/RussellBrand",
      "page": 1,
      "position": 3
    }
channelHandle is lowercased so callers can dedupe across runs
without re-normalising. page + position preserve discovery order.
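The dedupe described above can be sketched as follows, assuming the handle is simply the channel name from the link path, lowercased (consistent with /c/RussellBrand becoming russellbrand in the example). `channelPath`, `ChannelCollector`, and the rule that the same name under /c/, /user/, and /channel/ counts as one creator are hypothetical.

```typescript
interface ChannelItem {
  mode: "channels";
  channelHandle: string;
  channelURL: string;
  page: number;
  position: number;
}

// Reduce an href (relative or absolute) to "/<prefix>/<name>", or null if it
// isn't a channel link. Query strings and fragments are dropped.
function channelPath(href: string): string | null {
  const path = href.replace(/^https?:\/\/[^/]+/i, "").split(/[?#]/)[0];
  const match = /^\/(c|user|channel)\/([^/]+)/.exec(path);
  return match ? `/${match[1]}/${match[2]}` : null;
}

class ChannelCollector {
  private readonly seen = new Set<string>();
  readonly items: ChannelItem[] = [];

  // Records the channel if its lowercased handle is new; returns true when
  // a new unique channel was added.
  add(href: string, page: number, position: number): boolean {
    const path = channelPath(href);
    if (path === null) return false;
    const handle = path.split("/")[2].toLowerCase();
    if (this.seen.has(handle)) return false;
    this.seen.add(handle);
    this.items.push({
      mode: "channels",
      channelHandle: handle,
      channelURL: `https://rumble.com${path}`, // original casing preserved
      page,
      position,
    });
    return true;
  }

  get size(): number {
    return this.seen.size;
  }
}
```

Keeping the original casing in channelURL while lowercasing only channelHandle matches the example item: the URL stays clickable as discovered, and the handle is the dedupe key.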
Videos mode (PLAN-011)
When mode is "videos", the actor emits one dataset item per video
card instead of one per channel, with the parent channel handle attached
so the Go side can group by handle downstream. This is how Infla's
PLAN-011 door-knock and content-only outreach pipelines get pre-fetched
video context without an extra per-creator round-trip to Rumble.
Use channels mode when you only need a list of creators to enrich
later via a separate per-channel scrape. Use videos mode when you
want the listing-page video data inline: it saves one round-trip per
creator, and it is the mode Infla's rumble_category jobs use by default
after PLAN-011 (the actor's own default remains channels).
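Grouping by handle on the consumer side is a single pass. The real consumer is Go; this TypeScript sketch only illustrates the idea, and `groupByHandle` plus the trimmed-down `VideoItem` type are hypothetical. Sorting by page, then position, restores discovery order before grouping.

```typescript
// Trimmed-down item type: only the fields the grouping needs.
interface VideoItem {
  mode: "videos";
  channelHandle: string;
  videoExternalID: string;
  page: number;
  position: number;
}

function groupByHandle(items: VideoItem[]): Map<string, VideoItem[]> {
  // Sort by discovery order first so each group's videos (and the Map's
  // insertion order of handles) follow the listing order.
  const sorted = [...items].sort((a, b) => a.page - b.page || a.position - b.position);
  const groups = new Map<string, VideoItem[]>();
  for (const item of sorted) {
    const bucket = groups.get(item.channelHandle);
    if (bucket) bucket.push(item);
    else groups.set(item.channelHandle, [item]);
  }
  return groups;
}
```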
Per-item shape in videos mode:
    {
      "mode": "videos",
      "channelHandle": "philgodlewski",
      "channelURL": "https://rumble.com/c/philgodlewski",
      "videoURL": "https://rumble.com/v123-some-slug.html",
      "videoExternalID": "v123",
      "title": "Title from the listing card",
      "durationSeconds": 1234,
      "publishedAt": "2026-05-30T00:00:00Z",
      "viewCount": 1234,
      "thumbnailURL": "https://.../thumbnail.jpg",
      "isShort": false,
      "isLive": false,
      "page": 1,
      "position": 7
    }
Field notes:
- videoExternalID is the Rumble permalink slug (e.g. v123 from /v123-some-slug.html). Stable across the site and usable as a primary key on the Go side.
- durationSeconds parses both mm:ss and hh:mm:ss card text. 0 when the card doesn't expose a duration (e.g. live streams).
- viewCount handles 1.2K / 3M / 4.5B abbreviations. 0 on parse failure.
- publishedAt converts relative dates ("2 days ago", "3 hours ago", "1 month ago") to ISO 8601 UTC at midnight. null when the card has no date or it's in an unrecognised shape; listing cards don't carry sub-day precision, so midnight is the honest anchor.
- isShort / isLive are best-effort badge detection. Default false when the badge isn't present.
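A sketch of parsers matching these field notes. The function names, the exact regular expressions, the `now` parameter, and the extra minute/week/year units in `parsePublishedAt` are assumptions for illustration; only the documented inputs and fallbacks (0 for unparsed durations and view counts, null for unrecognised dates, midnight UTC anchoring) come from the notes above.

```typescript
// Hypothetical card-text parsers; fallbacks follow the field notes.

// "/v123-some-slug.html" (relative or absolute) -> "v123"; null if no match.
function parseExternalID(videoURL: string): string | null {
  const m = /\/(v[0-9a-z]+)(?:-[^/]*)?\.html/i.exec(videoURL);
  return m ? m[1] : null;
}

// "20:34" -> 1234, "1:02:03" -> 3723; 0 when there is no duration text.
function parseDuration(text: string): number {
  const m = /^(?:(\d+):)?(\d{1,2}):(\d{2})$/.exec(text.trim());
  if (!m) return 0;
  const [, hours, minutes, seconds] = m;
  return (hours ? Number(hours) * 3600 : 0) + Number(minutes) * 60 + Number(seconds);
}

// "1.2K" -> 1200, "3M" -> 3000000, "1,234" -> 1234; 0 on parse failure.
function parseViewCount(text: string): number {
  const m = /^([\d.,]+)\s*([KMB])?\b/.exec(text.trim());
  if (!m) return 0;
  const base = Number(m[1].replace(/,/g, ""));
  if (!Number.isFinite(base)) return 0;
  const multipliers: Record<string, number> = { K: 1e3, M: 1e6, B: 1e9 };
  return Math.round(base * (m[2] ? multipliers[m[2]] : 1));
}

// "2 days ago" -> ISO 8601 UTC midnight of that day; null if unrecognised.
// Units beyond hours/days/months are an assumption.
function parsePublishedAt(text: string, now: Date = new Date()): string | null {
  const m = /^(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago$/i.exec(text.trim());
  if (!m) return null;
  const n = Number(m[1]);
  const d = new Date(now.getTime());
  switch (m[2].toLowerCase()) {
    case "minute": d.setUTCMinutes(d.getUTCMinutes() - n); break;
    case "hour": d.setUTCHours(d.getUTCHours() - n); break;
    case "day": d.setUTCDate(d.getUTCDate() - n); break;
    case "week": d.setUTCDate(d.getUTCDate() - 7 * n); break;
    case "month": d.setUTCMonth(d.getUTCMonth() - n); break;
    case "year": d.setUTCFullYear(d.getUTCFullYear() - n); break;
  }
  d.setUTCHours(0, 0, 0, 0); // cards carry no sub-day precision
  return d.toISOString().replace(".000Z", "Z");
}
```

Passing `now` explicitly keeps the date conversion deterministic in tests. Note that `setUTCMonth` rolls over on short months (one month before 31 March lands in early March), which is tolerable for an anchor that is day-granular at best.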
Stop semantics in videos mode: the crawl halts when the number of
distinct creators reaches targetUniqueCreators (matching the
channels-mode "unique creators reached" condition), when a pagination
request returns a 4xx, or when maxPages is reached. The per-creator cap
is enforced by maxVideosPerCreator but does NOT contribute to the stop
condition.
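The interaction between the two limits can be sketched as a small tracker. It assumes (the source does not say) that once the creator target is hit, new creators are dropped while videos from already-counted creators may still be emitted until the crawl halts. `VideoModeTracker` is a hypothetical name.

```typescript
class VideoModeTracker {
  private readonly perCreator = new Map<string, number>();
  private readonly targetUniqueCreators: number;
  private readonly maxVideosPerCreator: number;

  constructor(targetUniqueCreators: number, maxVideosPerCreator: number) {
    this.targetUniqueCreators = targetUniqueCreators;
    this.maxVideosPerCreator = maxVideosPerCreator;
  }

  // Returns true if a video card from `handle` should be emitted.
  shouldEmit(handle: string): boolean {
    const count = this.perCreator.get(handle);
    if (count === undefined) {
      // New creator: only admitted while the distinct-creator target is open.
      if (this.targetReached()) return false;
      this.perCreator.set(handle, 1);
      return true;
    }
    // Known creator: the per-creator cap filters output but never stops the crawl.
    if (count >= this.maxVideosPerCreator) return false;
    this.perCreator.set(handle, count + 1);
    return true;
  }

  // A crawl loop would check this after each page to decide whether to halt.
  targetReached(): boolean {
    return this.perCreator.size >= this.targetUniqueCreators;
  }
}
```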
Termination
The actor stops on the first of:
- targetUniqueCreators unique handles collected
- Next page returns a non-2xx response (pagination exhausted)
- maxPages reached
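The three stop conditions can be sketched as a sequential loop, in line with maxConcurrency: 1. `fetchPage` stands in for the real Playwright page visit and is synchronous here only to keep the sketch short; `crawlCategory`, `PageResult`, and the handle extraction are hypothetical, not the actor's PlaywrightCrawler code.

```typescript
interface PageResult {
  status: number;         // HTTP status of the listing page
  channelHrefs: string[]; // channel links found on the page, in DOM order
}

type StopReason = "target" | "pagination-exhausted" | "max-pages";

function crawlCategory(
  fetchPage: (page: number) => PageResult,
  targetUniqueCreators: number,
  maxPages: number,
): { handles: string[]; reason: StopReason } {
  const seen = new Set<string>();
  const handles: string[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const result = fetchPage(page);
    // Any non-2xx response is treated as "no more pages".
    if (result.status < 200 || result.status >= 300) {
      return { handles, reason: "pagination-exhausted" };
    }
    for (const href of result.channelHrefs) {
      const m = /^\/(?:c|user|channel)\/([^/?#]+)/.exec(href);
      if (!m) continue;
      const handle = m[1].toLowerCase();
      if (seen.has(handle)) continue;
      seen.add(handle);
      handles.push(handle);
      if (handles.length >= targetUniqueCreators) return { handles, reason: "target" };
    }
  }
  return { handles, reason: "max-pages" };
}
```

Whichever condition fires first wins, so a small category can end on pagination long before maxPages, while a large one usually ends on the creator target.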
Local development
    cd apify-actors/rumble-category-scraper
    npm install
    # npm install runs `playwright install chromium` via postinstall;
    # if the postinstall step is skipped (e.g. CI), run it manually:
    # npx playwright install chromium
    mkdir -p .actor
    echo '{"category":"crypto","targetUniqueCreators":20,"maxPages":3}' > .actor/INPUT.json
    apify run --input-file=.actor/INPUT.json --purge
apify run --purge (Apify CLI v0.x+) wipes the local key-value
store and dataset between runs so each invocation starts clean.
Cloudflare on Rumble category pages
Rumble protects category listing pages with Cloudflare's JS challenge — every raw HTTP fetch is met with HTTP 403 from CF's edge. The actor uses PlaywrightCrawler with a real headless Chromium so the challenge resolves automatically. CheerioCrawler (raw HTTP) was tried first and confirmed not viable for this target.
Local runs without an Apify token use direct connections — your
laptop's IP solves the challenge once and Chromium reuses the
clearance cookie across pages. If your IP has been flagged by CF
for any reason, you'll see persistent 403s; either rotate IPs or
test on the cloud (apify push + run from the console).
On Apify cloud, the actor requests a proxy via
Actor.createProxyConfiguration({ groups: ['RESIDENTIAL'] }).
RESIDENTIAL requires a paid Apify plan; on the free tier the call
falls back to a datacenter proxy, which is sufficient for most CF
challenges on listing pages.
Deploy
Requires the Apify CLI (npm i -g apify-cli)
and an Apify account with apify login completed.
    cd apify-actors/rumble-category-scraper
    apify push
apify push reads .actor/actor.json, builds the image
remotely, and publishes a new actor version under your account.
The actor will appear in the Apify Console with the technical
name from actor.json (rumble-category-scraper). The full
ID is <your-username>/rumble-category-scraper.
After deploy:
- Open the actor in the Apify Console and copy its ID (username/rumble-category-scraper).
- Set APIFY_ACTOR_RUMBLE_CATEGORY=<your-username>/rumble-category-scraper in Infla's .env.production (and reload the app container).
- Infla's discovery worker reads cfg.ApifyActorRumbleCategory when dispatching a rumble_category job.
Design notes
- PlaywrightCrawler over CheerioCrawler. Rumble's category pages are gated by Cloudflare's JS challenge. A real Chromium resolves the challenge transparently; raw HTTP (CheerioCrawler) is met with 403 on every request. Cost: a Playwright run is ~3-5x the compute units of a Cheerio run, but at one actor run per discovery job this is still <$0.50 of operator-visible cost.
- maxConcurrency: 1. Sequential pagination keeps the per-page log line meaningful and avoids saturating Rumble's anti-bot. The expected target is 100-300 creators, which finishes in well under a minute even sequentially.
- No retries on 404. Rumble's pagination doesn't expose a total-pages count; the canonical "you've reached the end" signal is a 404 on the next page request. The actor treats failed requests as the stop signal and exits cleanly.
- Conservative selectors. Match a[href^="/c/"], a[href^="/user/"], and a[href^="/channel/"]. These three prefixes cover every Rumble channel-URL scheme observed during PLAN-009 research. New schemes (if Rumble adds any) would need a code change here; the conservative approach beats a regex that might catch outbound advertising links.
- useFingerprints: true. Apify's browser pool rotates user-agent and canvas/font fingerprints across sessions, which, combined with the Playwright JS-challenge solver, is the canonical CF bypass recipe documented by Apify itself.
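The conservative matching rule can be expressed as one attribute selector per prefix plus an equivalent plain href check. `CHANNEL_PREFIXES`, `CHANNEL_SELECTOR`, and `isChannelHref` are hypothetical names; the prefix list is the one from the note above.

```typescript
const CHANNEL_PREFIXES = ["/c/", "/user/", "/channel/"] as const;

// One attribute selector per prefix, joined into a single CSS selector list
// (usable with e.g. Playwright's page.$$eval).
const CHANNEL_SELECTOR = CHANNEL_PREFIXES.map((p) => `a[href^="${p}"]`).join(", ");

// Mirrors the selector: only relative hrefs with a known prefix and a
// non-empty name match, so absolute outbound links (ads) are rejected.
function isChannelHref(href: string): boolean {
  return CHANNEL_PREFIXES.some((p) => href.startsWith(p) && href.length > p.length);
}
```

Anchoring on the leading slash is what keeps outbound ad links out: an absolute URL never starts with /c/, even if its path contains one.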
Schema version
This actor's output JSON is consumed by
internal/services/discovery/rumble_category.go (PLAN-009 C10).
Any output-field rename requires a coordinated change there.
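Because a rename has to be coordinated with rumble_category.go, a cheap key-presence check on emitted items can catch drift early. This is a hypothetical TypeScript sketch: the field lists are copied from the example items in this README, and treating every key as required (publishedAt included, even though its value may be null) is an assumption.

```typescript
// Keys the Go consumer is assumed to expect, per mode.
const REQUIRED_FIELDS: Record<"channels" | "videos", string[]> = {
  channels: ["channelHandle", "channelURL", "page", "position"],
  videos: [
    "channelHandle", "channelURL", "videoURL", "videoExternalID", "title",
    "durationSeconds", "publishedAt", "viewCount", "thumbnailURL",
    "isShort", "isLive", "page", "position",
  ],
};

// Returns the names of missing keys; an empty list means every expected key
// is present (values are not type-checked).
function missingFields(item: Record<string, unknown>): string[] {
  const mode = item["mode"];
  const fields =
    mode === "channels" ? REQUIRED_FIELDS.channels
    : mode === "videos" ? REQUIRED_FIELDS.videos
    : null;
  if (fields === null) return ["mode"];
  return fields.filter((f) => !(f in item));
}
```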
