Rumble Category Scraper

Walks /category/{slug}/videos pages and extracts unique channel URLs until target volume is hit, the pagination 404s, or a safety cap on pages is reached.

Developer

Fiodar Tarasenka

Maintained by Community

infla/rumble-category-scraper

Walks https://rumble.com/category/{slug}/videos paginated listing pages and extracts unique channel URLs. Powers Infla's rumble_category job type (PLAN-009).

Input

| Field | Type | Default | Description |
|---|---|---|---|
| category | string (required) | | Rumble category slug (crypto, news, finance, …). |
| mode | enum (channels or videos) | channels | Output shape. See Videos mode (PLAN-011) below. |
| targetUniqueCreators | integer | 100 | Stop after this many unique channels. In videos mode this caps the number of distinct creators emitted. |
| maxVideosPerCreator | integer | 5 | Videos-mode only: cap on videos emitted per creator. Range 1-50. Ignored in channels mode. |
| maxPages | integer | 50 | Safety cap on pagination depth. |
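
The defaults in the table can be pictured as a small normalisation step. This is a minimal sketch, not the actor's actual code: `ActorInput` and `normalizeInput` are hypothetical names, and clamping out-of-range values (rather than rejecting them) is an assumption.

```typescript
// Hypothetical types and helper; names are illustrative, not taken from the actor source.
type Mode = "channels" | "videos";

interface ActorInput {
  category: string;
  mode: Mode;
  targetUniqueCreators: number;
  maxVideosPerCreator: number;
  maxPages: number;
}

// Apply the documented defaults to a partial input object.
function normalizeInput(raw: Partial<ActorInput>): ActorInput {
  const category = (raw.category ?? "").trim();
  if (category === "") {
    throw new Error("category is required");
  }
  const mode: Mode = raw.mode === "videos" ? "videos" : "channels";
  // Assumption: out-of-range values are clamped into 1-50 rather than rejected.
  const perCreator = Math.min(50, Math.max(1, Math.floor(raw.maxVideosPerCreator ?? 5)));
  return {
    category,
    mode,
    targetUniqueCreators: Math.max(1, Math.floor(raw.targetUniqueCreators ?? 100)),
    maxVideosPerCreator: perCreator,
    maxPages: Math.max(1, Math.floor(raw.maxPages ?? 50)),
  };
}
```

With this sketch, `normalizeInput({ category: "crypto" })` yields channels mode with the three documented numeric defaults.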

Output

Channels mode (default)

One dataset item per unique channel:

{
  "mode": "channels",
  "channelHandle": "russellbrand",
  "channelURL": "https://rumble.com/c/RussellBrand",
  "page": 1,
  "position": 3
}

channelHandle is lowercased so callers can dedupe across runs without re-normalising. page + position preserve discovery order.
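
Dedupe on the lowercased handle can be sketched as follows. `ChannelCollector` is a hypothetical helper written for illustration; the actor's real bookkeeping may differ, and keeping the first sighting (rather than the last) is an assumption.

```typescript
// Hypothetical shape mirroring the channels-mode item documented above.
interface ChannelItem {
  mode: "channels";
  channelHandle: string;
  channelURL: string;
  page: number;
  position: number;
}

// Keeps the first sighting of each channel, keyed by lowercased handle.
class ChannelCollector {
  private readonly seen = new Map<string, ChannelItem>();

  // Returns true when the handle was new and an item was recorded.
  add(handle: string, channelURL: string, page: number, position: number): boolean {
    const key = handle.toLowerCase();
    if (this.seen.has(key)) return false;
    this.seen.set(key, { mode: "channels", channelHandle: key, channelURL, page, position });
    return true;
  }

  // Map iteration follows insertion order, so this is discovery order.
  items(): ChannelItem[] {
    return Array.from(this.seen.values());
  }
}
```

Adding "RussellBrand" and then "russellbrand" records a single item, with the page and position of the first sighting.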

Videos mode (PLAN-011)

When mode: "videos", the actor emits one dataset item per video card instead of per channel, with the parent channel handle attached so the Go side can group by handle downstream. This is how Infla's PLAN-011 door-knock + content-only outreach pipelines get pre-fetched video context without an extra per-creator round-trip to Rumble.

Use channels mode when you only need a list of creators to enrich later via a separate per-channel scrape. Use videos mode when you want the listing-page video data inline; it saves one round-trip per creator and is the default for Infla's rumble_category jobs after PLAN-011.

Per-item shape in videos mode:

{
  "mode": "videos",
  "channelHandle": "philgodlewski",
  "channelURL": "https://rumble.com/c/philgodlewski",
  "videoURL": "https://rumble.com/v123-some-slug.html",
  "videoExternalID": "v123",
  "title": "Title from the listing card",
  "durationSeconds": 1234,
  "publishedAt": "2026-05-30T00:00:00Z",
  "viewCount": 1234,
  "thumbnailURL": "https://.../thumbnail.jpg",
  "isShort": false,
  "isLive": false,
  "page": 1,
  "position": 7
}
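
For consumers that want types, the two item shapes can be mirrored as a discriminated union on `mode`. The sketch below is illustrative only: the type names and `groupVideosByHandle` are hypothetical, and the real consumer is Go code, so this shows the grouping idea in TypeScript and is not the actual downstream implementation.

```typescript
// Hypothetical TypeScript mirror of the two documented item shapes.
interface ChannelItem {
  mode: "channels";
  channelHandle: string;
  channelURL: string;
  page: number;
  position: number;
}

interface VideoItem {
  mode: "videos";
  channelHandle: string;
  channelURL: string;
  videoURL: string;
  videoExternalID: string;
  title: string;
  durationSeconds: number;
  publishedAt: string | null; // null when the card has no parseable date
  viewCount: number;
  thumbnailURL: string;
  isShort: boolean;
  isLive: boolean;
  page: number;
  position: number;
}

type DatasetItem = ChannelItem | VideoItem;

// Group video items by their (already lowercased) channel handle.
function groupVideosByHandle(items: DatasetItem[]): Map<string, VideoItem[]> {
  const groups = new Map<string, VideoItem[]>();
  for (const item of items) {
    if (item.mode !== "videos") continue; // the check narrows item to VideoItem
    const bucket = groups.get(item.channelHandle) ?? [];
    bucket.push(item);
    groups.set(item.channelHandle, bucket);
  }
  return groups;
}
```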

Field notes:

  • videoExternalID is the ID prefix of the Rumble permalink slug (e.g. v123 from /v123-some-slug.html). Stable across the site and usable as a primary key on the Go side.
  • durationSeconds parses both mm:ss and hh:mm:ss card text. 0 when the card doesn't expose a duration (e.g. live streams).
  • viewCount handles 1.2K / 3M / 4.5B abbreviations. 0 on parse failure.
  • publishedAt converts relative dates ("2 days ago", "3 hours ago", "1 month ago") to ISO 8601 UTC at midnight; listing cards don't carry sub-day precision, so midnight is the honest anchor. null when the card has no date or the date is in an unrecognised shape.
  • isShort / isLive are best-effort badge detection. Default false when the badge isn't present.
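
The parsing rules in the field notes can be sketched as four small pure functions. These are hypothetical helpers written from the notes above, not the actor's source; the function names, the exact regular expressions, and the set of recognised time units are assumptions.

```typescript
// Hypothetical helpers; the actor's real parsers may differ in name and edge cases.

// "20:34" -> 1234, "1:02:03" -> 3723, anything else -> 0.
function parseDurationSeconds(text: string): number {
  const parts = text.trim().split(":");
  if (parts.length < 2 || parts.length > 3) return 0;
  let total = 0;
  for (const part of parts) {
    if (!/^\d+$/.test(part)) return 0;
    total = total * 60 + Number(part);
  }
  return total;
}

// "1.2K" -> 1200, "3M" -> 3000000, "1,234" -> 1234, unparseable -> 0.
function parseViewCount(text: string): number {
  const match = /^(\d[\d,]*(?:\.\d+)?)\s*([KMB])?\b/i.exec(text.trim());
  if (match === null) return 0;
  const base = Number((match[1] ?? "").replace(/,/g, ""));
  const scale: Record<string, number> = { K: 1e3, M: 1e6, B: 1e9 };
  const factor = match[2] ? (scale[match[2].toUpperCase()] ?? 1) : 1;
  return Number.isFinite(base) ? Math.round(base * factor) : 0;
}

// "2 days ago" -> ISO 8601 at 00:00:00Z of that day; unrecognised text -> null.
// `now` is passed in so the result is deterministic.
function parsePublishedAt(text: string, now: Date): string | null {
  const match = /^(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago$/i.exec(text.trim());
  if (match === null) return null;
  const n = Number(match[1]);
  const unit = (match[2] ?? "").toLowerCase();
  const msPer: Record<string, number> = { minute: 60000, hour: 3600000, day: 86400000, week: 604800000 };
  const d = new Date(now.getTime());
  if (unit in msPer) {
    d.setTime(d.getTime() - n * (msPer[unit] ?? 0));
  } else if (unit === "month") {
    d.setUTCMonth(d.getUTCMonth() - n); // calendar months; day-of-month overflow is not corrected
  } else {
    d.setUTCFullYear(d.getUTCFullYear() - n);
  }
  const midnight = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  return midnight.toISOString().replace(".000Z", "Z");
}

// "/v123-some-slug.html" -> "v123"; null when the URL doesn't look like a video permalink.
function extractVideoExternalID(url: string): string | null {
  const match = /\/(v[0-9a-z]+)-[^\/]*\.html(?:[?#].*)?$/i.exec(url);
  return match ? (match[1] ?? null) : null;
}
```

Passing the reference time explicitly keeps `parsePublishedAt` testable; a caller would typically supply `new Date()` at scrape time.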

Stop semantics in videos mode: the crawl halts when the number of distinct creators reaches targetUniqueCreators (matching the channels-mode "unique creators reached" condition), when the next page returns a non-2xx response, or when maxPages is reached. The per-creator cap is enforced by maxVideosPerCreator but does NOT contribute to the stop condition.
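
The split between the per-creator cap and the stop condition can be sketched like this. `VideoGate` is a hypothetical class, not the actor's code; in particular, refusing cards from new creators once the target is reached is an assumption drawn from the targetUniqueCreators description in the input table.

```typescript
// Hypothetical gate deciding which video cards are emitted in videos mode.
class VideoGate {
  private readonly perCreator = new Map<string, number>();
  private readonly targetUniqueCreators: number;
  private readonly maxVideosPerCreator: number;

  constructor(targetUniqueCreators: number, maxVideosPerCreator: number) {
    this.targetUniqueCreators = targetUniqueCreators;
    this.maxVideosPerCreator = maxVideosPerCreator;
  }

  // Returns true if a video card from this creator should be emitted.
  accept(handle: string): boolean {
    const key = handle.toLowerCase();
    const count = this.perCreator.get(key);
    if (count === undefined) {
      // Assumption: no new creators are admitted once the target is reached.
      if (this.perCreator.size >= this.targetUniqueCreators) return false;
      this.perCreator.set(key, 1);
      return true;
    }
    // The per-creator cap drops extra cards but never stops the crawl.
    if (count >= this.maxVideosPerCreator) return false;
    this.perCreator.set(key, count + 1);
    return true;
  }

  // Only the distinct-creator count feeds the stop condition.
  targetReached(): boolean {
    return this.perCreator.size >= this.targetUniqueCreators;
  }
}
```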

Termination

The actor stops on the first of:

  • targetUniqueCreators unique handles collected
  • Next page returns a non-2xx response (pagination exhausted)
  • maxPages reached
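
The three stop conditions can be sketched as a single loop. This is a simplified, synchronous illustration under stated assumptions: `walkCategory`, `PageResult`, and the `StopReason` labels are hypothetical, and the page fetch is injected as a plain function, so no real browser or network call is involved.

```typescript
// Hypothetical names; the real actor drives a browser crawler instead of this loop.
type StopReason = "target_reached" | "pagination_exhausted" | "max_pages";

interface PageResult {
  status: number;    // HTTP status of the listing page
  handles: string[]; // channel handles found on the page, in card order
}

type FetchPage = (page: number) => PageResult;

function walkCategory(
  fetchPage: FetchPage,
  targetUniqueCreators: number,
  maxPages: number,
): { handles: string[]; reason: StopReason } {
  const seen = new Set<string>();
  for (let page = 1; page <= maxPages; page++) {
    const res = fetchPage(page);
    // A non-2xx response is treated as "no more pages", not as an error.
    if (res.status < 200 || res.status >= 300) {
      return { handles: Array.from(seen), reason: "pagination_exhausted" };
    }
    for (const handle of res.handles) {
      seen.add(handle.toLowerCase());
      if (seen.size >= targetUniqueCreators) {
        return { handles: Array.from(seen), reason: "target_reached" };
      }
    }
  }
  return { handles: Array.from(seen), reason: "max_pages" };
}
```

Whichever condition fires first determines the reported reason, which mirrors the "first of" wording above.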

Local development

cd apify-actors/rumble-category-scraper
npm install
# npm install runs `playwright install chromium` via postinstall;
# if the postinstall step is skipped (e.g. CI), run it manually:
# npx playwright install chromium
mkdir -p .actor
echo '{"category":"crypto","targetUniqueCreators":20,"maxPages":3}' > .actor/INPUT.json
apify run --input-file=.actor/INPUT.json --purge

apify run --purge (Apify CLI v0.x+) clears the local default dataset, key-value store, and request queue before the run starts, so each invocation starts clean.

Cloudflare on Rumble category pages

Rumble protects category listing pages with Cloudflare's JS challenge — every raw HTTP fetch is met with HTTP 403 from CF's edge. The actor uses PlaywrightCrawler with a real headless Chromium so the challenge resolves automatically. CheerioCrawler (raw HTTP) was tried first and confirmed not viable for this target.

Local runs without an Apify token use direct connections: Chromium solves the challenge once from your machine's IP and reuses the clearance cookie across pages. If CF has flagged your IP for any reason, you'll see persistent 403s; either rotate IPs or test on the cloud (apify push, then run from the console).

On Apify cloud, Actor.createProxyConfiguration({groups:['RESIDENTIAL']}) is requested. RESIDENTIAL requires a paid Apify plan; on free tier the call falls back to datacenter proxy, which is sufficient for most CF challenges on listing pages.

Deploy

Requires the Apify CLI (npm i -g apify-cli) and an Apify account with apify login completed.

cd apify-actors/rumble-category-scraper
apify push

apify push reads .actor/actor.json, builds the image remotely, and publishes a new actor version under your account. The actor will appear in the Apify Console with the technical name from actor.json (rumble-category-scraper). The full ID is <your-username>/rumble-category-scraper.

After deploy:

  1. Open the actor in the Apify Console and copy its ID (username/rumble-category-scraper).
  2. Set APIFY_ACTOR_RUMBLE_CATEGORY=<your-username>/rumble-category-scraper in Infla's .env.production (and reload the app container).
  3. Infla's discovery worker reads cfg.ApifyActorRumbleCategory when dispatching a rumble_category job.

Design notes

  • PlaywrightCrawler over CheerioCrawler. Rumble's category pages are gated by Cloudflare's JS challenge. A real Chromium resolves the challenge transparently; raw HTTP (CheerioCrawler) is met with 403 on every request. Cost: a Playwright run is ~3-5x the compute units of a Cheerio run, but at one actor run per discovery job this is still <$0.50 of operator-visible cost.
  • maxConcurrency: 1. Sequential pagination keeps the per-page log line meaningful and avoids tripping Rumble's anti-bot protection. The expected target is 100-300 creators, which finishes in well under a minute even sequentially.
  • No retries on 404. Rumble's pagination doesn't expose a total-pages count; the canonical "you've reached the end" signal is a 404 on the next page request. The actor treats failed requests as the stop signal and exits cleanly.
  • Conservative selectors. Match a[href^="/c/"], a[href^="/user/"], and a[href^="/channel/"]. These three prefixes cover every Rumble channel-URL scheme observed during PLAN-009 research. New schemes (if Rumble adds any) would need a code change here; the conservative approach beats a regex that might catch outbound advertising links.
  • useFingerprints: true. Apify's browser pool rotates user-agent + canvas/font fingerprints across sessions, which combined with the Playwright JS-challenge solver is the canonical CF bypass recipe documented by Apify itself.
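
The conservative-selectors note can be illustrated with a prefix check on the raw href. This is a sketch under assumptions: `channelFromHref` is a hypothetical helper, it works on the href attribute value as written in the page (so absolute outbound links never match), and rebuilding the channel URL against https://rumble.com is assumed from the example items above.

```typescript
// The three path prefixes named in the design note.
const CHANNEL_PREFIXES = ["/c/", "/user/", "/channel/"];

// Hypothetical helper: returns the lowercased handle and channel URL,
// or null when the href is not a channel link.
function channelFromHref(href: string): { handle: string; url: string } | null {
  for (const prefix of CHANNEL_PREFIXES) {
    if (!href.startsWith(prefix)) continue;
    // Keep only the first path segment after the prefix.
    const segment = href.slice(prefix.length).split(/[\/?#]/, 1)[0] ?? "";
    if (segment === "") return null;
    return {
      handle: segment.toLowerCase(),
      url: `https://rumble.com${prefix}${segment}`,
    };
  }
  return null;
}
```

A prefix match on relative hrefs rejects an outbound link such as https://ads.example.com/c/x, which a looser pattern searching for "/c/" anywhere in the URL would accept.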

Schema version

This actor's output JSON is consumed by internal/services/discovery/rumble_category.go (PLAN-009 C10). Any output-field rename requires a coordinated change there.