Pricing

from $3.99 / 1,000 results

Reddit Subreddit Members Scraper

Rating

0.0 (0 reviews)

Developer

ScraperX

Actor stats

Last modified

9 days ago

Reddit Subreddit Members Scraper — Async Apify Actor with Smart Proxy Fallback

This actor collects Reddit user handles from subreddits, keywords, or direct user inputs, then optionally enriches each profile with public metadata. It is fully asynchronous (aiohttp) and pushes each result to the Apify dataset as soon as it is found. It also uses a multi-step proxy strategy: it starts with direct requests, falls back to the Apify datacenter proxy if blocked, then escalates to the residential proxy and stays there. This README is long-form (≈1500 words) and SEO-optimized for queries like “Reddit scraper”, “Reddit members extractor”, “Apify actor Reddit”, “Reddit subreddit members”, “Reddit user scraper”, “Reddit proxy scraping”, and “async Python Reddit scraper”.

Why This Actor

Multi-input flexibility: Accepts subreddit URLs (https://www.reddit.com/r/python), r/<sub>, user URLs (https://www.reddit.com/user/spez), u/<user>, or free-text keywords to search Reddit posts.
Post + comment coverage: Gathers authors from both posts and comments with pagination, respecting maxPosts, maxComments, and sort_order (new, hot, top, rising).
Optional enrichment: When fetchDetails=true, calls /user/{username}/about.json to add karma, account creation time, gold status, and avatar.
Proxy escalation: Direct → Apify datacenter → Apify residential (sticky after switch). Logs every transition; adds extra retries on residential.
Live data safety: Pushes each found user immediately to the Apify dataset so partial results are preserved on crash or block.
Async and rate-limited: Uses aiohttp with semaphores and jittered sleeps to reduce throttling; rotates User-Agent per request.
Function-based code: No classes, which keeps the code easy to maintain and extend.
Production-ready defaults: Conservative timeouts, retry logic, and structured dataset view.

What It Scrapes

Usernames from subreddit posts and comments.
Profile URL for each discovered user.
Optional user details (when enabled): total karma, post karma, comment karma, account creation time (UTC), gold flag, avatar URL.
Source coverage:
- Subreddit posts (sorted by sort_order)
- Subreddit comments
- Reddit search results for provided keywords (post authors)
- Direct user inputs (usernames/URLs)

Proxy and Anti-Blocking Strategy

Start direct (no proxy): fastest and cheapest.
If blocked (403/429/503/timeout) → switch to the Apify datacenter proxy.
If still blocked → switch to the Apify residential proxy, stay on residential for the rest of the run, and allow 3 extra retries there.
Logged events: every escalation is written to the Apify log with its reason and target URL.
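The escalation rules above can be modeled as a small one-way state machine. The sketch below is illustrative and is not the actor's source. The names `ProxyState`, `escalate`, and `is_blocked` are assumptions. So are the base retry count of 2 and the use of `None` to stand for a timeout.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Escalation order from the README; "residential" is terminal (sticky).
TIERS = ["direct", "datacenter", "residential"]

# Status codes treated as "blocked"; a timeout is represented as None.
BLOCK_STATUSES = {403, 429, 503}

# Hypothetical retry budget: the base value is an assumption,
# the +3 on residential follows the README.
BASE_RETRIES = 2
RESIDENTIAL_EXTRA_RETRIES = 3


@dataclass(frozen=True)
class ProxyState:
    tier: str = "direct"

    def max_retries(self) -> int:
        extra = RESIDENTIAL_EXTRA_RETRIES if self.tier == "residential" else 0
        return BASE_RETRIES + extra


def is_blocked(status: Optional[int]) -> bool:
    """True for 403/429/503, or for a timeout (None)."""
    return status is None or status in BLOCK_STATUSES


def escalate(
    state: ProxyState,
    status: Optional[int],
    url: str,
    log: Callable[[str], None] = print,
) -> ProxyState:
    """Move up one tier on a block signal; never move back down."""
    if not is_blocked(status) or state.tier == TIERS[-1]:
        return state
    next_tier = TIERS[TIERS.index(state.tier) + 1]
    reason = "timeout" if status is None else f"HTTP {status}"
    log(f"Proxy escalation: {state.tier} -> {next_tier} ({reason}) on {url}")
    return ProxyState(tier=next_tier)


# Usage: a 429 on a direct request moves the run to the datacenter proxy.
state = escalate(ProxyState(), 429, "https://www.reddit.com/r/python/new.json")
```

Because `escalate` never returns a lower tier, the "sticky residential" behavior comes from the transition order itself and needs no separate flag.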

Input Parameters (actor.json)

targets (array, required): Mixed inputs. Supports:
- Subreddit URLs (https://www.reddit.com/r/python)
- r/<subreddit>
- User URLs (https://www.reddit.com/user/spez) or u/<user>
- Free-text keywords (treated as Reddit search queries)
subreddits (array, optional): Plain subreddit names if you prefer not to use targets.
sort_order (string, enum): new | hot | top | rising (defaults to new).
maxPosts (integer): 1–1000, default 100. Posts per subreddit or per keyword search.
maxComments (integer): 0–1000, default 100. Comments per subreddit.
fetchDetails (boolean): If true, enrich each user via /user/{username}/about.json.
maxConcurrentUsers (integer): 1–20, default 3. Concurrency for user detail fetches.
requestDelay (integer): 0–10 seconds, default 1. Delay added between detail calls, plus a random jitter of 0–0.5 s.
proxyConfiguration (object): Apify proxy settings, edited with the proxy editor in the UI. Default useApifyProxy=false (direct). If Reddit blocks requests, the actor automatically escalates to the datacenter proxy, then to the residential proxy.
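This sketch shows how the mixed `targets` values (each entry's `url` string) might be sorted into subreddits, users, and keyword searches. `classify_target` is a hypothetical helper. The README does not say how non-Reddit URLs are handled, so the "unsupported" label for them is an assumption.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Short forms like "r/python" or "/u/spez" (trailing slash optional).
SHORT_FORM = re.compile(r"/?(r|u)/([A-Za-z0-9_-]+)/?")


def classify_target(raw: str) -> Tuple[str, str]:
    """Return (kind, value) where kind is subreddit, user, keyword, or unsupported."""
    text = raw.strip()
    if text.startswith(("http://", "https://")):
        # Full URL: look at the first two path segments, e.g. /r/python or /user/spez.
        parts = [p for p in urlparse(text).path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "r":
            return "subreddit", parts[1]
        if len(parts) >= 2 and parts[0] in ("user", "u"):
            return "user", parts[1]
        return "unsupported", text  # assumption: not documented in the README
    match = SHORT_FORM.fullmatch(text)
    if match:
        kind = "subreddit" if match.group(1) == "r" else "user"
        return kind, match.group(2)
    # Anything else is treated as a free-text Reddit search query.
    return "keyword", text


# Usage with the same targets as the example input that follows.
targets = [{"url": "https://www.reddit.com/r/python"}, {"url": "r/webscraping"},
           {"url": "asyncio"}, {"url": "u/spez"}]
classified = [classify_target(t["url"]) for t in targets]
```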

Example Input (balanced)

{
  "targets": [
    { "url": "https://www.reddit.com/r/python" },
    { "url": "r/webscraping" },
    { "url": "asyncio" },
    { "url": "u/spez" }
  ],
  "sort_order": "new",
  "maxPosts": 50,
  "maxComments": 50,
  "fetchDetails": true,
  "maxConcurrentUsers": 3,
  "requestDelay": 1,
  "proxyConfiguration": { "useApifyProxy": false }
}
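This sketch applies the documented defaults and ranges to an input like the one above. `normalize_input` is hypothetical. Three choices in it are assumptions: clamping out-of-range values instead of rejecting them, defaulting `fetchDetails` to false, and accepting `subreddits` in place of `targets`. On Apify, the input schema normally validates ranges before the actor starts anyway.

```python
from typing import Any, Dict

# (minimum, maximum, default) taken from the Input Parameters list.
INT_FIELDS = {
    "maxPosts": (1, 1000, 100),
    "maxComments": (0, 1000, 100),
    "maxConcurrentUsers": (1, 20, 3),
    "requestDelay": (0, 10, 1),
}
SORT_ORDERS = ("new", "hot", "top", "rising")


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply documented defaults and clamp integers into their ranges."""
    cfg = dict(raw)
    if not cfg.get("targets") and not cfg.get("subreddits"):
        raise ValueError("Provide at least one entry in targets or subreddits.")
    for name, (low, high, default) in INT_FIELDS.items():
        cfg[name] = max(low, min(high, int(cfg.get(name, default))))
    cfg.setdefault("sort_order", "new")
    if cfg["sort_order"] not in SORT_ORDERS:
        raise ValueError(f"sort_order must be one of {SORT_ORDERS}")
    cfg.setdefault("fetchDetails", False)  # assumption: default not documented
    cfg.setdefault("proxyConfiguration", {"useApifyProxy": False})
    return cfg
```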

Output Schema (dataset)

Each pushed item (basic):

{
  "username": "example_user",
  "userId": "t2_1700000000000",
  "profileUrl": "https://reddit.com/user/example_user"
}

When fetchDetails=true, additional fields appear:

{
  "username": "example_user",
  "userId": "t2_1700000000000",
  "profileUrl": "https://reddit.com/user/example_user",
  "totalKarma": 1234,
  "postKarma": 900,
  "commentKarma": 334,
  "createdUTC": 1600000000,
  "isGold": false,
  "iconImg": "https://styles.redditmedia.com/..."
}
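This sketch turns a `/user/{username}/about.json` response into the output fields above. The Reddit field names used here (`total_karma`, `link_karma`, `comment_karma`, `created_utc`, `is_gold`, `icon_img`) are assumptions based on the public about.json payload. `detail_fields` and `build_item` are hypothetical helpers. `userId` is passed in because the README describes it as a generated placeholder.

```python
import html
from typing import Any, Dict, Optional


def detail_fields(about: Dict[str, Any]) -> Dict[str, Any]:
    """Map an about.json payload ({"kind": "t2", "data": {...}}) to output fields."""
    data = about.get("data", {})
    created = data.get("created_utc")
    icon = data.get("icon_img")
    return {
        "totalKarma": data.get("total_karma"),
        "postKarma": data.get("link_karma"),
        "commentKarma": data.get("comment_karma"),
        # Reddit reports created_utc as a float; the output uses whole seconds.
        "createdUTC": int(created) if created is not None else None,
        "isGold": bool(data.get("is_gold", False)),
        # Reddit's JSON HTML-escapes URLs ("&amp;") unless raw_json=1 is requested.
        "iconImg": html.unescape(icon) if icon else None,
    }


def build_item(username: str, user_id: str, about: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Basic item; detail fields are merged only when an about.json payload is given."""
    item = {
        "username": username,
        "userId": user_id,
        "profileUrl": f"https://reddit.com/user/{username}",
    }
    if about is not None:
        item.update(detail_fields(about))
    return item
```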

How It Works (Architecture)

Async requests: aiohttp with rotating User-Agent and per-request proxy resolution.
Proxy state machine: Direct → Datacenter → Residential; sticky on residential with extra retries.
Post and comment pagination: Uses after cursors up to maxPosts / maxComments.
Keyword search: Queries /search.json with type=link, collects authors from posts.
Live persistence: Each user is pushed with Actor.push_data(...) as soon as it is found; no batching is needed.
Concurrency control: Semaphore on user-detail fetches to respect maxConcurrentUsers.
Jittered delays: Small random sleep after a few requests and between user detail calls to reduce 429s.
Resilience: Escalates proxies on 403/429/503/timeouts, logs failures, continues collecting partial data.
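Here the `after`-cursor pagination described above is sketched synchronously with an injected page fetcher, which keeps the logic testable without network access. `iter_listing` and `fetch_page` are hypothetical. A real fetcher would presumably be an aiohttp call to a listing URL of the form shown in the comment.

```python
from typing import Any, Callable, Dict, Iterator, Optional

PAGE_LIMIT = 100  # Reddit listings return at most 100 children per page

# fetch_page(after, limit) returns one parsed listing, e.g. the JSON from
# https://www.reddit.com/r/{sub}/{sort}.json?limit={limit}&after={after}
FetchPage = Callable[[Optional[str], int], Dict[str, Any]]


def iter_listing(fetch_page: FetchPage, max_items: int) -> Iterator[Dict[str, Any]]:
    """Yield up to max_items child objects, following the `after` cursor."""
    after: Optional[str] = None
    seen = 0
    while seen < max_items:
        page = fetch_page(after, min(PAGE_LIMIT, max_items - seen))
        data = page.get("data", {})
        children = data.get("children", [])
        for child in children[: max_items - seen]:
            yield child["data"]
            seen += 1
        after = data.get("after")
        if not children or after is None:
            break  # empty page or no further cursor: the listing is exhausted
```

Usage: `authors = [post["author"] for post in iter_listing(fetch_page, 50)]` collects up to 50 post authors across as many pages as needed.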

Anti-Blocking Tips

Keep fetchDetails=false if you only need usernames; this cuts the number of requests and the chance of being blocked.
Lower maxPosts/maxComments for aggressive subreddits.
Increase requestDelay to 2–3s when fetching details at scale.
Start with direct requests and let the actor escalate as needed. If you expect heavy blocking, set useApifyProxy=true to start on the datacenter proxy.
Prefer fewer concurrent detail requests (maxConcurrentUsers 2–4) if seeing rate limits.
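This sketch shows the semaphore-plus-jitter pattern for detail fetches, using the `maxConcurrentUsers` and `requestDelay` semantics from the input list. `fetch_details` and the injected `fetch_about` coroutine are hypothetical. Two design choices are assumptions: holding the semaphore slot during the sleep, and recording a failed fetch as `None`.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Dict, List, Optional

FetchAbout = Callable[[str], Awaitable[Optional[Dict[str, Any]]]]


async def fetch_details(
    usernames: List[str],
    fetch_about: FetchAbout,
    max_concurrent_users: int = 3,
    request_delay: float = 1.0,
) -> Dict[str, Optional[Dict[str, Any]]]:
    """Fetch details with bounded concurrency and a jittered pause per call."""
    semaphore = asyncio.Semaphore(max_concurrent_users)

    async def fetch_one(name: str):
        async with semaphore:
            try:
                result = await fetch_about(name)
            except Exception as exc:  # log and continue with partial data
                print(f"Detail fetch failed for {name}: {exc!r}")
                result = None
            # requestDelay plus 0-0.5 s of jitter, taken while holding the slot.
            await asyncio.sleep(request_delay + random.uniform(0, 0.5))
            return name, result

    pairs = await asyncio.gather(*(fetch_one(n) for n in usernames))
    return dict(pairs)
```

The sleep happens inside the `async with` block, so at most `max_concurrent_users` detail requests start per delay window. If the sleep came after the slot was released, a new request could start immediately.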

Setup and Local Run

Requirements: Python 3.10+, Apify CLI, Docker (for full build).

cd Reddit-Subreddit-Members-Scraper
pip install -r requirements.txt
apify run

Supply input via storage/key_value_stores/default/INPUT.json for local runs, or via the input form in the Apify Console. Results appear in the default dataset.
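For local runs, the Apify CLI reads input from `storage/key_value_stores/default/INPUT.json` inside the project directory. Below is a small hypothetical helper, `write_local_input`, that writes that file. It is not part of the actor.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_local_input(project_dir: str, run_input: Dict[str, Any]) -> Path:
    """Write run_input where `apify run` looks for local input."""
    path = Path(project_dir) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)  # create the storage tree if missing
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path


# Usage (from the actor's project directory), then run `apify run`:
# write_local_input(".", {"targets": [{"url": "r/python"}], "maxPosts": 20})
```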

Deploy on Apify

Push: apify push
Run in the Apify Console with your input.
Monitor logs for proxy transitions and counts.
Download results from the Dataset tab (JSON/CSV/Excel).

Field-by-Field Input Guide

targets: Best entry point; mix subreddits, users, and keywords. Use the request-list editor in the Apify UI.
subreddits: Convenience fallback; plain names.
sort_order: Choose new for freshness, hot for trending, top for high-signal authors, rising for early discoveries.
maxPosts / maxComments: Balance coverage vs. speed; Reddit returns at most 100 items per page, so higher values mean more paginated requests.
fetchDetails: Enable only when you need karma/created/avatar; otherwise stay off for speed.
maxConcurrentUsers: Tune to reduce 429s; 3–5 is usually safe.
requestDelay: Increase if blocked while enriching.
proxyConfiguration: Leave off by default. The actor will escalate automatically when blocked.
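Each listing page returns at most 100 items, so you can roughly budget requests before a run. `estimate_requests` is a hypothetical helper, and its counts rest on assumptions:

- one request per 100 posts;
- one request per 100 comments per subreddit;
- one request per 100 search results per keyword;
- one about.json call per user when `fetchDetails` is on.

Retries and proxy escalations are ignored.

```python
import math


def estimate_requests(
    n_subreddits: int,
    n_keywords: int,
    max_posts: int,
    max_comments: int,
    n_users: int = 0,
    fetch_details: bool = False,
    page_size: int = 100,
) -> int:
    """Rough lower bound on HTTP requests; ignores retries and proxy escalations."""
    post_pages = math.ceil(max_posts / page_size)
    comment_pages = math.ceil(max_comments / page_size)
    # Assumption: comments come from one paginated listing per subreddit.
    listing_requests = n_subreddits * (post_pages + comment_pages) + n_keywords * post_pages
    detail_requests = n_users if fetch_details else 0
    return listing_requests + detail_requests
```

For example, take 2 subreddits and 1 keyword with maxPosts=50, maxComments=50, and 150 enriched users. That comes to 2 × (1 + 1) + 1 + 150 = 155 requests, so detail calls dominate.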

Data Quality Notes

Deleted or suspended authors are skipped ([deleted]).
Some profiles may block or return minimal data; enrichment may be partial.
userId is a generated placeholder (Reddit API hides the true ID through these endpoints).
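The README does not say how repeated authors are handled. This sketch skips `[deleted]` and missing authors and yields each username once per run. It assumes usernames are unique case-insensitively. `new_authors` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Optional, Set

SKIPPED_AUTHORS = {"[deleted]"}


def new_authors(authors: Iterable[Optional[str]], seen: Set[str]) -> Iterator[str]:
    """Yield each usable author once; `seen` persists across subreddits and keywords."""
    for author in authors:
        if not author or author in SKIPPED_AUTHORS:
            continue  # missing or deleted author: nothing to push
        key = author.lower()  # assumption: usernames are unique case-insensitively
        if key in seen:
            continue
        seen.add(key)
        yield author
```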

Performance Tips

Use smaller maxPosts and maxComments across more subreddits to diversify results.
For large runs with details, consider running during off-peak hours and raising requestDelay to 2–3s.
Keywords can be noisy; combine with subreddits for higher relevance.

Error Handling and Logging

HTTP 403/429/503/Timeout → proxy escalation with log entry.
Other HTTP → warning + limited retries.
Exceptions → logged with stack trace; run continues where possible.
Completion → summary log entry; check the dataset for results.

Security and Compliance

Scrapes only public Reddit endpoints.
Respect Reddit’s terms and local laws; do not spam users with scraped data.
Proxies are used solely for block avoidance; residential escalation is logged.

Extending the Actor

Add score or upvote thresholds: filter fetched posts on their score field before collecting authors.
Add top time windows: append t=day/week/month/year/all on top sorting.
Add subreddit filters: pre-validate allowed subs list.
Add CSV export: post-process dataset with Apify transformations or client-side script.
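This sketch covers two of the extensions above: building a listing URL with an optional `t` window for `top` sorting, and applying a score threshold. The sketch assumes no server-side score parameter, so the filter runs client-side on each post's `score` field. `listing_url` and `filter_by_score` are hypothetical helpers.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode

LISTING_SORTS = {"new", "hot", "top", "rising"}
TIME_WINDOWS = {"day", "week", "month", "year", "all"}


def listing_url(
    subreddit: str,
    sort: str = "new",
    limit: int = 100,
    after: Optional[str] = None,
    t: Optional[str] = None,
) -> str:
    """Build a subreddit listing URL; `t` is only accepted together with sort=top."""
    if sort not in LISTING_SORTS:
        raise ValueError(f"unsupported sort: {sort}")
    params: Dict[str, Any] = {"limit": limit}
    if after:
        params["after"] = after
    if t is not None:
        if sort != "top" or t not in TIME_WINDOWS:
            raise ValueError("t must be one of day/week/month/year/all with sort=top")
        params["t"] = t
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"


def filter_by_score(posts: List[Dict[str, Any]], min_score: int) -> List[Dict[str, Any]]:
    """Client-side threshold on each post's `score` field."""
    return [p for p in posts if p.get("score", 0) >= min_score]
```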

Troubleshooting FAQ

Q: I get empty results.
A: Check inputs; ensure subreddits exist. Lower maxPosts/maxComments, set sort_order=new, and allow proxy escalation.

Q: Blocks persist even on residential.
A: Increase requestDelay, reduce maxConcurrentUsers, and lower volume. Consider splitting runs by subreddit batches.

Q: Enrichment is slow.
A: Disable fetchDetails or reduce maxConcurrentUsers. Details require per-user calls.

Q: Dataset fields missing.
A: Fields only appear when fetchDetails=true. Basic mode pushes username, userId, profileUrl only.

SEO Snapshot (keywords covered)

Reddit scraper, Reddit members scraper, Reddit user extractor, Reddit subreddit members, Reddit profile scraper, Apify Reddit scraper, Python aiohttp Reddit scraper, Reddit proxy scraping, Reddit residential proxy, Reddit datacenter proxy, async Reddit scraper, Reddit dataset export.

Quick Start (TL;DR)

Provide targets with subreddits/users/keywords.
Set maxPosts, maxComments, sort_order.
Decide on fetchDetails.
Leave proxy off; actor will escalate if blocked.
Run and grab results from the dataset.

Changelog

v0.1: Initial public actor with async fetch, post/comment collection, keyword search, optional user details, and direct→datacenter→residential proxy fallback.

Built to stay resilient, transparent, and efficient for Reddit member discovery on Apify. Run it, watch the logs for proxy events, and export the dataset when done.

Reddit Subreddit Members Scraper

scrapeengine/reddit-subreddit-members-scraper

ScrapeEngine

Reddit Subreddit Members Scraper

scrapebase/reddit-subreddit-members-scraper

ScrapeBase

Reddit Subreddit Members Scraper

scrapeflow/reddit-subreddit-members-scraper

ScrapeFlow

Reddit Subreddit Members Scraper

scraply/reddit-subreddit-members-scraper

Scraply

Reddit Subreddit Members Scraper

scrapemesh/reddit-subreddit-members-scraper

ScrapeMesh

Reddit Subreddit Members Scraper

scrapepilotapi/reddit-subreddit-members-scraper

ScrapePilot

Reddit Subreddit Members Scraper

scrapium/reddit-subreddit-members-scraper

Scrapium

Reddit Subreddit Members Scraper

api-empire/reddit-subreddit-members-scraper

Scrape and export subreddit members into your CRM or automation pipeline. Ideal for community targeting, outreach, and social listening workflows driven by live Reddit engagement.

API Empire

Reddit Subreddit Tracker

yawning_pit/reddit-subreddit-tracker

pit 2017

Reddit Subreddit Members Scraper

louisdeconinck/reddit-subreddit-users

Scrape all members of a subreddit. Find the most active and influential users within Reddit communities. Perfect for market research, community analysis, and finding key players in your target niche.