Reddit Data Extractor

Comprehensive Reddit data extraction tool that scrapes posts with comments and user profiles. Features advanced search, date filtering, NSFW control, and pagination. Perfect for sentiment analysis, market research, brand monitoring, and academic studies. Uses the official Reddit JSON API.

A comprehensive Reddit scraper that extracts posts, comments, communities, and users through Reddit's public JSON API. Perfect for data analysis, research, and monitoring Reddit content.

Features

  • 📝 Post Scraping - Extract posts from subreddits, user profiles, or specific posts
  • 💬 Comment Scraping - Collect comments from posts with configurable depth limits
  • 🔍 Advanced Search - Search for posts, communities, users, and comments
  • 🎯 Flexible Filtering - Filter by date, NSFW content, and custom time ranges
  • 📊 Pagination Control - Specify start/end pages for precise data collection
  • ⚡ High Performance - Concurrent scraping with proxy support
  • 🎨 Rich Data Output - Structured JSON with all relevant metadata
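
If you prefer to run the actor from code rather than the Apify Console, the official apify-client Python package works. A minimal sketch; the actor ID shown is an assumption, so copy the real one from the actor's page before running:

from apify_client import ApifyClient

# NOTE: the actor ID below is an assumption -- copy the real one from the
# actor's page on the Apify Store before running.
ACTOR_ID = "techtechnicworld/reddit-data-extractor"

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start the actor and wait for the run to finish
run = client.actor(ACTOR_ID).call(run_input={
    "startUrls": [{"url": "https://www.reddit.com/r/technology/"}],
    "maxPostCount": 10,
    "maxCommentsPerPost": 5,
})

# Stream the scraped items out of the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["dataType"], item.get("title", item.get("body", ""))[:80])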

Input Parameters

Basic Configuration

Parameter | Type | Default | Description
startUrls | Array | [{ "url": "https://www.reddit.com/r/GrowthHacking/" }] | List of Reddit URLs to scrape (subreddits, posts, or user profiles)
maxPostCount | Integer | 4 | Maximum number of posts to scrape (0-10000)
maxCommentsPerPost | Integer | 2 | Maximum number of comments to scrape per post (0-1000; 0 = no comments)
skipComments | Boolean | false | If true, skip scraping comments entirely

Pagination

Parameter | Type | Default | Description
startPage | Integer | 1 | Page number to start scraping from
endPage | Integer | null | Page number to stop at (leave empty for unlimited)
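
As an illustration of how the two fields combine, here is an input sketch (written as a Python run_input dict; pass it to the actor as in the quickstart above) that collects only the second and third listing pages:

# Input sketch: scrape only pages 2-3 of the subreddit listing.
# maxPostCount still caps the total number of posts collected.
run_input = {
    "startUrls": [{"url": "https://www.reddit.com/r/technology/"}],
    "startPage": 2,
    "endPage": 3,
    "maxPostCount": 100,
}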

Search & Filtering

Parameter | Type | Default | Description
searchQuery | String | "" | Search term to find posts, communities, or users
searchPosts | Boolean | false | Search for posts matching the query
searchCommunities | Boolean | false | Search for communities (subreddits) matching the query
searchComments | Boolean | false | Search for comments matching the query
sort | String | "new" | Sort order: hot, new, top, rising, relevance, best, comments
time | String | "all" | Time filter: hour, day, week, month, year, all
maxPostAgeDays | Integer | null | Only scrape posts from the last N days
includeNSFW | Boolean | false | Include NSFW (Not Safe For Work) posts
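
The search flags compose with the filters. For example, a query restricted to fresh posts (field names and defaults as in the table above; ignoreStartUrls is documented under Advanced Options below, and the query term is a placeholder):

# Input sketch: newest posts from the last 3 days mentioning "openai".
# includeNSFW is left at its default (false), so NSFW posts are excluded.
run_input = {
    "ignoreStartUrls": True,   # search only, no startUrls needed
    "searchQuery": "openai",   # placeholder query term
    "searchPosts": True,
    "sort": "new",
    "maxPostAgeDays": 3,
}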

Advanced Options

Parameter | Type | Default | Description
ignoreStartUrls | Boolean | false | If true, startUrls will be ignored (useful when only using search)
maxConcurrency | Integer | 10 | Maximum concurrent requests
maxRequestRetries | Integer | 3 | Maximum number of retries for failed requests
scrollTimeout | Integer | 400 | Timeout for scrolling, in milliseconds
debugMode | Boolean | false | Enable detailed logging for debugging
proxy | Object | { "useApifyProxy": true } | Proxy configuration for the scraper
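
For large runs it can help to trade speed for reliability. A sketch; note that apifyProxyGroups is standard Apify proxy input, but whether this actor forwards it depends on its implementation:

# Input sketch: a slower but more resilient configuration for big scrapes
run_input = {
    "startUrls": [{"url": "https://www.reddit.com/r/datascience/"}],
    "maxPostCount": 1000,
    "maxConcurrency": 5,       # fewer parallel requests -> gentler on Reddit
    "maxRequestRetries": 5,    # retry transient failures a few more times
    "proxy": {
        "useApifyProxy": True,
        # Standard Apify proxy option; support depends on the actor
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}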

Output Format

The scraper outputs structured JSON data with three types of items:

Post Data

{
  "dataType": "post",
  "id": "t3_abc123",
  "parsedId": "abc123",
  "url": "https://www.reddit.com/r/...",
  "username": "reddit_user",
  "userId": "t2_xyz789",
  "title": "Post Title",
  "communityName": "r/subreddit",
  "parsedCommunityName": "subreddit",
  "body": "Post content...",
  "html": "<div>Post HTML...</div>",
  "link": "https://external-link.com",
  "numberOfComments": 42,
  "flair": "Discussion",
  "upVotes": 1234,
  "upVoteRatio": 0.95,
  "isVideo": false,
  "isAd": false,
  "over18": false,
  "thumbnailUrl": "https://...",
  "imageUrls": ["https://..."],
  "createdAt": "2025-01-15T10:30:00.000Z",
  "scrapedAt": "2025-01-15T12:00:00.000Z"
}

Comment Data

{
  "dataType": "comment",
  "id": "t1_def456",
  "parsedId": "def456",
  "url": "https://www.reddit.com/r/.../comments/...",
  "postId": "t3_abc123",
  "parentId": "t3_abc123",
  "username": "commenter",
  "userId": "t2_uvw321",
  "category": "subreddit",
  "communityName": "r/subreddit",
  "body": "Comment text...",
  "html": "<div>Comment HTML...</div>",
  "createdAt": "2025-01-15T11:00:00.000Z",
  "scrapedAt": "2025-01-15T12:00:00.000Z",
  "upVotes": 56,
  "numberOfreplies": 3
}

Community Data

{
  "dataType": "community",
  "id": "t5_ghi789",
  "parsedId": "ghi789",
  "communityName": "r/subreddit",
  "parsedCommunityName": "subreddit",
  "title": "Subreddit Title",
  "url": "https://www.reddit.com/r/subreddit/",
  "subscribers": 150000,
  "description": "Subreddit description...",
  "createdAt": "2020-01-01T00:00:00.000Z",
  "scrapedAt": "2025-01-15T12:00:00.000Z",
  "over18": false,
  "iconUrl": "https://...",
  "bannerUrl": "https://...",
  "activeUsers": 500
}
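
All three item types arrive mixed in one dataset, so downstream code typically dispatches on the dataType field. A small Python sketch using only the fields documented above:

from collections import defaultdict

def split_by_type(items):
    """Group mixed dataset items into post/comment/community buckets."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item["dataType"]].append(item)
    return buckets

def average_upvote_ratio(posts):
    """Mean upVoteRatio across scraped posts (0.0 if there are none)."""
    return sum(p["upVoteRatio"] for p in posts) / len(posts) if posts else 0.0

# Usage with the client from the quickstart:
#   items = client.dataset(run["defaultDatasetId"]).iterate_items()
#   buckets = split_by_type(items)
#   print(len(buckets["post"]), "posts,", len(buckets["comment"]), "comments")
#   print("avg upvote ratio:", average_upvote_ratio(buckets["post"]))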

Usage Examples

Example 1: Scrape Recent Posts from a Subreddit

{
  "startUrls": [
    { "url": "https://www.reddit.com/r/technology/" }
  ],
  "maxPostCount": 50,
  "maxCommentsPerPost": 10,
  "sort": "new",
  "maxPostAgeDays": 7
}

Example 2: Search for Posts About a Topic

{
  "searchQuery": "artificial intelligence",
  "searchPosts": true,
  "ignoreStartUrls": true,
  "maxPostCount": 100,
  "sort": "top",
  "time": "week"
}

Example 3: Scrape a User's Posts

{
  "startUrls": [
    { "url": "https://www.reddit.com/user/username/" }
  ],
  "maxPostCount": 25,
  "skipComments": true,
  "sort": "new"
}

Example 4: Deep Dive into a Specific Post

{
  "startUrls": [
    { "url": "https://www.reddit.com/r/AskReddit/comments/abc123/" }
  ],
  "maxPostCount": 1,
  "maxCommentsPerPost": 500
}
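
Runs like this return comments as a flat list, but each comment's parentId links it into a thread: in the Comment Data example above, a top-level comment's parentId is the post's id (t3_ prefix), while a reply's is its parent comment's id (t1_). A sketch for rebuilding the tree:

def build_children(comments):
    """Index comments by parentId so each node's replies are one lookup away."""
    children = {}
    for c in comments:
        children.setdefault(c["parentId"], []).append(c)
    return children

def print_thread(children, parent_id, depth=0):
    """Recursively print the comment tree rooted at parent_id."""
    for c in children.get(parent_id, []):
        print("  " * depth + f'u/{c["username"]}: {c["body"][:60]}')
        print_thread(children, c["id"], depth + 1)

# Usage: top-level comments hang off the post id, e.g.
#   print_thread(build_children(comments), "t3_abc123")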

Example 5: Search for Communities

{
  "searchQuery": "machine learning",
  "searchCommunities": true,
  "ignoreStartUrls": true,
  "maxPostCount": 20
}

Tips & Best Practices

  1. Rate Limiting: Use proxies (enabled by default) to avoid rate limiting when scraping large amounts of data
  2. Pagination: Use startPage and endPage to scrape specific sections of subreddits
  3. Date Filtering: Combine maxPostAgeDays with sort: "new" for recent content
  4. Comment Depth: Set maxCommentsPerPost: 0 if you only need post data without comments
  5. Debug Mode: Enable debugMode: true to troubleshoot issues and see detailed logs
  6. Search Efficiency: Use ignoreStartUrls: true when you only want search results
  7. NSFW Content: Set includeNSFW: true only if your use case requires it
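
Several of these tips combine naturally. For example, a lightweight daily monitoring input, using only fields from the parameter tables (the query term is a placeholder):

# Input sketch: daily brand monitoring -- fresh mentions, posts only
run_input = {
    "ignoreStartUrls": True,        # tip 6: search results only
    "searchQuery": "your-brand",    # placeholder -- substitute your term
    "searchPosts": True,
    "sort": "new",                  # tip 3: newest first...
    "maxPostAgeDays": 1,            # ...restricted to the last day
    "maxCommentsPerPost": 0,        # tip 4: skip comments entirely
}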

Limitations

  • Maximum 10,000 posts per run
  • Maximum 1,000 comments per post
  • Stickied posts are automatically skipped
  • Deleted and removed comments are filtered out
  • Reddit's JSON API has inherent rate limits

Error Handling

The scraper includes robust error handling:

  • Automatic retries for failed requests (configurable)
  • Graceful handling of deleted content
  • Validation of input parameters
  • Detailed error logging in debug mode

Performance

  • Concurrency: Adjust maxConcurrency based on your needs (default: 10)
  • Proxy Support: Built-in Apify proxy support for high-volume scraping
  • Memory Efficient: Streams data to output as it's scraped

Privacy & Ethics

This scraper accesses only publicly available data through Reddit's JSON API. Please:

  • Respect Reddit's Terms of Service
  • Don't overwhelm Reddit's servers with excessive requests
  • Use the data responsibly and ethically
  • Consider user privacy when handling scraped data

Support

For issues, questions, or feature requests, please refer to the actor's support channels on the Apify platform.