Reddit Scraper

Scrape Reddit posts, comments, communities, and profiles via URLs or keyword searches. Features proxy rotation, custom field names, flexible filtering, and automatic retries. Perfect for monitoring, research, and data collection.

Pricing: $29.00/month + usage
Rating: 5.0 (1)
Developer: scraping automation (Maintained by Community)

Actor stats: 0 bookmarked · 3 total users · 1 monthly active user · last modified 5 days ago

Reddit Scraper

Reddit Scraper is a comprehensive Apify Actor that collects posts, comments, communities, user profiles, and leaderboards from Reddit. Each entity saved in the dataset uses differentiated output fields (entityType, headline, mediaBundle, communityTag, subscriberTotal, karmaPost, etc.) to facilitate integration and avoid conflicts with other data sources.

Main Features

🎯 Complete Coverage

  • Posts: Title, content, media, votes, comments, complete metadata
  • Comments: Complete tree structure, votes, depth, replies
  • Communities/Subreddits: Metadata, members, descriptions, icons
  • User Profiles: Karma, post/comment history, metadata
  • Leaderboards: Rankings of popular subreddits by category

🔧 Flexibility and Control

  • URL-based Scraping: Supports all Reddit formats (posts, users, communities, leaderboards, searches, multireddits)
  • Keyword-based Scraping: Automatic search with configurable scope (Posts or Communities & users)
  • Advanced Sorting: 5 options (relevance, hot, top, new, comments)
  • Temporal Filters: By hour, day, week, month, year
  • Granular Limits: 6 independent cap types (total, post, comment, community, profile, leaderboard)
  • Score Filters: Automatically exclude posts/comments with low scores
  • NSFW Filters: Option to exclude NSFW content
  • Absolute Date Filters: Filter by precise date range (dateFrom, dateTo)
  • Multireddits: Support for URLs combining multiple subreddits (e.g., /r/pics+funny)
  • Automatic Pagination: Collect more than 100 items by automatically paginating
  • Deduplication: Avoid duplicates within the same run
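
The date filters and deduplication described above can be sketched conceptually as a single filtering pass. This is an illustrative sketch, not the actor's internal code; it assumes items shaped like the actor's output (with `createdAt` and `entityId` fields):

```javascript
// Conceptual sketch (not the actor's internal code): applying absolute date
// filters (dateFrom/dateTo) and entityId-based deduplication to a list of items.
function filterItems(items, { dateFrom, dateTo, dedupe = false } = {}) {
  const seen = new Set();
  return items.filter((item) => {
    const created = new Date(item.createdAt).getTime();
    if (dateFrom && created < new Date(dateFrom).getTime()) return false;
    if (dateTo && created > new Date(dateTo).getTime()) return false;
    if (dedupe) {
      if (seen.has(item.entityId)) return false; // duplicate within this run
      seen.add(item.entityId);
    }
    return true;
  });
}
```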

🛡️ Robustness and Reliability

  • Automatic Retry: Intelligent handling of 403/429 errors with proxy rotation
  • Flexible Proxy Configuration: Apify Proxy (residential/datacenter) or custom proxies
  • Automatic Fallback: Default URL if no input is provided (ideal for automated tests)
  • Debug Mode: Detailed logging for quick diagnostics
  • Configurable Concurrency: Adjust the number of parallel requests
  • Performance Metrics: Detailed statistics at end of run (items/sec, duration, applied filters, duplicates)
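
The retry behaviour described above can be sketched as follows. This is a conceptual illustration of the 403/429 handling, not the actor's internal code; `fetchFn` is a placeholder for the actual HTTP call, and the attempt index could be used to rotate proxies between retries:

```javascript
// Conceptual sketch of retry-on-block: on a 403/429 response, wait
// (scrollWaitSeconds in the actor's input) and try again, up to a retry cap.
async function fetchWithRetry(fetchFn, { retries = 3, waitMs = 30000 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await fetchFn(attempt); // attempt index could select a new proxy
    if (res.status !== 403 && res.status !== 429) return res;
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
  throw new Error('Exhausted retries on 403/429');
}
```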

🎨 Customization

  • Differentiated Output Fields: Unique names to facilitate integration
  • Extend Result Function: Custom enrichment of each item
  • Output Format: JSON, CSV, XML, HTML, Excel via Apify interface
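
The Extend Result Function is supplied as a stringified JavaScript function. A minimal illustrative example follows; per the Technical Notes below, it receives `{ data, page }` with `page` set to null. The added fields (`pipelineTag`, `headlineLength`) are invented for this sketch:

```javascript
// Illustrative extendResultFunction (passed to the actor as a string).
// It receives { data, page }; page is null because the actor uses Reddit's
// JSON API. The enrichment fields here are examples, not actor output fields.
const extendResult = async ({ data, page }) => ({
  ...data,
  pipelineTag: 'my-pipeline',
  headlineLength: typeof data.headline === 'string' ? data.headline.length : null,
});
```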

Main Input Parameters

| Field | Type | Default | Description |
|---|---|---|---|
| startLinks | array | [] | Reddit URLs to crawl (posts, communities, users, leaderboards, searches). If empty and no searchQueries, automatically uses /r/popular/ as a fallback. |
| searchQueries | array&lt;string&gt; | [] | Keywords to run a Reddit search. |
| searchScope | enum | posts | posts or communities, to target the search tab. |
| sortOrder | enum | relevance | relevance, hot, top, new, or comments (5 available options). |
| timeWindow | enum | all | all, hour, day, week, month, or year (for posts). |
| totalItemCap | integer | 100 | Global limit on items in the dataset. |
| postCap | integer | 50 | Maximum posts per subreddit/feed/user. |
| commentCap | integer | 25 | Maximum comments per post. |
| communityCap | integer | 25 | Maximum communities from leaderboards/searches. |
| profileCap | integer | 25 | Maximum user profiles from searches. |
| leaderboardCap | integer | 25 | Number of entries from /subreddits/leaderboard. |
| scrollWaitSeconds | integer | 30 | Wait delay between retries on 403/429 errors. |
| maxConcurrency | integer | 10 | Maximum number of parallel HTTP requests. |
| useApifyProxy | boolean | true | Enable Apify Proxy (recommended to avoid 403 errors). |
| proxyConfiguration | object | {} | Detailed proxy configuration (Apify or custom). |
| extendResultFunction | string | - | JavaScript function to enrich each item. |
| debugLog | boolean | false | Enable detailed logging for diagnostics. |
| minScore | integer | null | Minimum score for posts and comments; lower-scoring items are excluded. |
| includeNSFW | boolean | true | When false, excludes NSFW posts and communities from results. |
| logMetrics | boolean | true | Displays performance statistics at end of run (items/sec, duration, errors, filters). |
| enablePagination | boolean | false | Enables automatic pagination to collect more than 100 items per listing. |
| dateFrom | string | null | Start date for filtering items (ISO 8601, e.g. 2024-01-01T00:00:00Z). |
| dateTo | string | null | End date for filtering items (ISO 8601, e.g. 2024-12-31T23:59:59Z). |
| enableDeduplication | boolean | false | Enables deduplication (by entityId) within the same run. |

Input Example

{
  "startLinks": [
    { "url": "https://www.reddit.com/r/worldnews/" },
    { "url": "https://www.reddit.com/r/learnprogramming/comments/lp1hi4/is_webscraping_a_good_skill_to_learn_as_a_beginner/" },
    { "url": "https://www.reddit.com/subreddits/leaderboard/" },
    { "url": "https://www.reddit.com/r/pics+funny/" }
  ],
  "searchQueries": ["parrots"],
  "searchScope": "communities",
  "sortOrder": "new",
  "timeWindow": "all",
  "totalItemCap": 20,
  "postCap": 10,
  "commentCap": 5,
  "communityCap": 15,
  "leaderboardCap": 25,
  "maxConcurrency": 10,
  "scrollWaitSeconds": 30,
  "useApifyProxy": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "debugLog": false,
  "minScore": 10,
  "includeNSFW": false,
  "logMetrics": true,
  "enablePagination": true,
  "dateFrom": "2024-01-01T00:00:00Z",
  "dateTo": "2024-12-31T23:59:59Z",
  "enableDeduplication": true
}

Use Cases

  • Brand Monitoring: Track discussions about your product or service
  • Trend Research: Identify popular topics by community
  • Sentiment Analysis: Collect comments for NLP analysis
  • Community Discovery: Explore leaderboards by category
  • Competitive Intelligence: Monitor competitor mentions
  • Academic Research: Collect data for social studies
  • Content Curation: Find relevant content by keywords

Example Post Item

{
  "entityType": "post",
  "entityId": "t3_144w7sn",
  "redditId": "144w7sn",
  "permalink": "https://www.reddit.com/r/HonkaiStarRail/comments/144w7sn/my_luckiest_10x_pull_yet/",
  "headline": "My Luckiest 10x Pull Yet",
  "textBody": "URL: https://i.redd.it/yod3okjkgx4b1.jpg",
  "mediaBundle": {
    "primaryUrl": "https://i.redd.it/yod3okjkgx4b1.jpg",
    "thumbnailUrl": "https://b.thumbs.redditmedia.com/lm9KxS4laQWgx4uOoioM3N7-tBK3GLPrxb9da2hGtjs.jpg",
    "isVideo": false
  },
  "authorHandle": "YourKingLives",
  "communityTag": "r/HonkaiStarRail",
  "voteScore": 1,
  "commentTotal": 0,
  "createdAt": "2023-06-09T05:23:15.000Z",
  "collectedAt": "2025-11-20T10:00:00.000Z"
}

Quick Start

  1. Open the actor in the Apify console
  2. Configure input parameters (or use default values)
  3. Click Start and wait for the run to complete
  4. Download results from the Dataset tab (JSON, CSV, XML, HTML, Excel)

Note: If you don't provide startLinks or searchQueries, the actor automatically uses /r/popular/ as a starting point, ensuring a valid run even for automated tests.

Key Advantages

Differentiated Output Fields

Data is structured with unique field names (entityType, headline, mediaBundle, communityTag, subscriberTotal, karmaPost, etc.) to facilitate integration and avoid conflicts with other data sources.
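
Because every item carries `entityType`, a mixed dataset can be split into posts, comments, communities, and profiles in a single pass. A minimal sketch of such post-processing on downloaded results:

```javascript
// Sketch: grouping a mixed dataset by entityType, which the differentiated
// output fields make straightforward. Not part of the actor itself.
function groupByEntityType(items) {
  const groups = {};
  for (const item of items) {
    (groups[item.entityType] ??= []).push(item);
  }
  return groups;
}
```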

Automatic Robustness

  • Automatic retry with proxy rotation on 403/429 errors
  • Intelligent rate limit handling
  • Automatic fallback if no input is provided
  • Debug mode for quick diagnostics

Advanced Configuration

  • Granular control with 6 independent cap types
  • Adjustable concurrency and delays
  • Complete support for Reddit leaderboards
  • 5 sorting options (including "comments")
  • Automatic pagination to collect large volumes
  • Absolute date filters for precise historical analysis
  • Automatic deduplication to avoid duplicates
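
The automatic pagination listed above follows the cursor pattern of Reddit's public JSON listings: each page returns an "after" cursor that feeds the next request until the cap is reached or the cursor runs out. A conceptual sketch, where `fetchPage` stands in for the actual HTTP call:

```javascript
// Conceptual sketch of cursor-based listing pagination (the mechanism
// enablePagination automates). fetchPage(after) is a placeholder that must
// return { items, nextAfter }.
async function collectPaginated(fetchPage, maxItems) {
  const collected = [];
  let after = null;
  while (collected.length < maxItems) {
    const { items, nextAfter } = await fetchPage(after);
    collected.push(...items.slice(0, maxItems - collected.length));
    if (!nextAfter || items.length === 0) break; // no further pages
    after = nextAfter;
  }
  return collected;
}
```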

Technical Notes

  • extendResultFunction receives { data, page }; page is null because we use Reddit's JSON API.
  • Always respect Reddit's usage rules and avoid unreasonable volumes.
  • Using Apify Proxy (residential recommended) is strongly advised to avoid 403 blocks.

Important: This Actor scrapes publicly available data from Reddit. By using this Actor, you acknowledge and agree to the following:

  1. Reddit Terms of Service: You are responsible for complying with Reddit's Terms of Service and User Agreement. Reddit's ToS can be found at https://www.reddit.com/help/useragreement.

  2. Rate Limiting: This Actor includes automatic retry logic and proxy rotation to handle rate limits. However, you must use reasonable request rates and avoid excessive scraping that could impact Reddit's servers.

  3. Data Usage: The scraped data is for your personal or business use only. You must respect copyright, privacy rights, and any applicable data protection laws (such as GDPR, CCPA) when using the collected data.

  4. No Warranty: This Actor is provided "as is" without any warranties. The developers are not responsible for any consequences arising from the use of this Actor, including but not limited to account bans, legal issues, or data inaccuracies.

  5. User Responsibility: You are solely responsible for ensuring that your use of this Actor complies with all applicable laws and regulations in your jurisdiction. This includes respecting intellectual property rights, privacy laws, and terms of service of third-party platforms.

  6. Prohibited Uses: Do not use this Actor to:

    • Scrape private or restricted content
    • Violate Reddit's API usage policies
    • Collect personal information without consent
    • Engage in any illegal activities

Recommendation: For production use, consider using Reddit's official API when possible, as it provides a more reliable and compliant way to access Reddit data.