Reddit Comment Scraper

Pricing

$19.99/month + usage

Rating: 0.0 (0)

Developer: ScrapeMesh (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

A powerful Apify Actor for scraping comments from Reddit posts. Extract comments, replies, author information, upvotes, and more from any Reddit post URL.

Why Choose Us?

  • Comprehensive Data Extraction: Get all comment details including author, content, upvotes, permalinks, and nested replies
  • Smart Proxy Management: Automatic fallback from direct connection to datacenter to residential proxies
  • Bulk Processing: Process multiple Reddit post URLs simultaneously
  • Flexible Configuration: Customize sort order, comment limits, and reply limits
  • Reliable & Fast: Built with async/await for optimal performance

Key Features

  • ✅ Extract comments from Reddit posts
  • ✅ Support for nested replies (configurable limit per comment)
  • ✅ Multiple sort orders (hot, new, top, controversial, old)
  • ✅ Automatic proxy fallback (no proxy → datacenter → residential)
  • ✅ Bulk URL processing
  • ✅ Detailed logging and progress tracking
  • ✅ Structured JSON output
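
The proxy fallback order listed above (no proxy → datacenter → residential) can be illustrated with a minimal sketch. This is not the actor's actual code: the `fetch` callable, the `Blocked` exception, and the tier names are hypothetical stand-ins, and the retry behaviour on the residential tier is left out.

```python
from typing import Callable, Optional, Sequence, Tuple


class Blocked(Exception):
    """Hypothetical error raised by `fetch` when Reddit refuses a request."""


# Order taken from the feature list: no proxy, then datacenter, then residential.
PROXY_TIERS: Sequence[Optional[str]] = (None, "datacenter", "residential")


def fetch_with_fallback(
    url: str,
    fetch: Callable[[str, Optional[str]], str],
    tiers: Sequence[Optional[str]] = PROXY_TIERS,
) -> Tuple[str, Optional[str]]:
    """Try each proxy tier in order and return (body, tier_that_worked)."""
    last_error: Optional[Exception] = None
    for tier in tiers:
        try:
            return fetch(url, tier), tier
        except Blocked as exc:
            # Remember the failure and move on to the next, costlier tier.
            last_error = exc
    raise RuntimeError(f"all proxy tiers failed for {url}") from last_error
```

Passing the fetch function in as an argument keeps the sketch independent of any particular HTTP client or proxy provider.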

Input

JSON Example

{
  "startUrls": [
    {
      "url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"
    }
  ],
  "maxComments": 50,
  "sortOrder": "hot",
  "replyLimit": 1,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Input Fields

  • startUrls (required): Array of Reddit post URLs to scrape
  • maxComments (optional, default: 50): Maximum number of comments to fetch per URL (1-10000)
  • sortOrder (optional, default: "hot"): How to sort comments - "hot", "new", "top", "controversial", or "old"
  • replyLimit (optional, default: 1): Maximum number of replies to extract per comment (1-100)
  • proxyConfiguration (optional): Proxy settings. By default, no proxy is used. If Reddit blocks requests, the actor automatically falls back to datacenter then residential proxies.
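
If you build the input programmatically, the documented ranges and defaults can be checked before starting a run. The helper below is a sketch, not part of the actor: `build_run_input` is a hypothetical name, and rejecting an empty URL list is an assumption (the page only says startUrls is required).

```python
from typing import Any, Dict, Iterable

# Sort orders listed under Input Fields.
SORT_ORDERS = {"hot", "new", "top", "controversial", "old"}


def build_run_input(
    urls: Iterable[str],
    max_comments: int = 50,
    sort_order: str = "hot",
    reply_limit: int = 1,
    use_apify_proxy: bool = False,
) -> Dict[str, Any]:
    """Return an input dict shaped like the JSON example, or raise ValueError."""
    start_urls = [{"url": u} for u in urls]
    if not start_urls:
        # Assumption: an empty list is treated the same as a missing field.
        raise ValueError("startUrls is required and must not be empty")
    if not 1 <= max_comments <= 10000:
        raise ValueError("maxComments must be between 1 and 10000")
    if sort_order not in SORT_ORDERS:
        raise ValueError(f"sortOrder must be one of {sorted(SORT_ORDERS)}")
    if not 1 <= reply_limit <= 100:
        raise ValueError("replyLimit must be between 1 and 100")
    return {
        "startUrls": start_urls,
        "maxComments": max_comments,
        "sortOrder": sort_order,
        "replyLimit": reply_limit,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
```

Calling `build_run_input` with only the example post URL yields the same structure as the JSON example, since every other argument falls back to its documented default.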

Output

The actor writes its results to two places:

  1. Dataset: Individual comment records with URL reference
  2. Key-Value Store: Grouped comments by URL (matching the original output.json format)

Output Format

Each comment includes:

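  • url: The Reddit post URL (a field on each dataset record; in the key-value store output it is the key that groups the comments)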
  • comment_id: Unique comment identifier
  • post_id: Post identifier
  • author: Comment author username
  • permalink: Direct link to the comment
  • upvotes: Number of upvotes
  • content_type: Type of content (usually "text")
  • parent_id: Parent comment ID (if it's a reply)
  • author_avatar: Author avatar URL (if available)
  • userUrl: Link to user's Reddit profile
  • contentText: The comment text content
  • created_time: Timestamp (if available)
  • replies: Array of nested replies (if any)

Example Output

{
  "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/": [
    {
      "comment_id": "lhk1f7n",
      "post_id": "t3_1epeshq",
      "author": "AutoModerator",
      "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
      "upvotes": 1,
      "content_type": "text",
      "parent_id": "t3_1epeshq",
      "author_avatar": "",
      "userUrl": "https://www.reddit.com/user/AutoModerator/",
      "contentText": "Comment text here...",
      "created_time": "",
      "replies": []
    }
  ]
}
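
For analysis it is often easier to work with one flat record per comment than with the grouped structure above. The following sketch walks the grouped output and yields flat records that carry the post URL. It assumes each entry in replies has the same shape as a top-level comment, which the page does not state explicitly; the function names are illustrative.

```python
from typing import Any, Dict, Iterator, List

Comment = Dict[str, Any]


def iter_comments(comments: List[Comment], url: str) -> Iterator[Comment]:
    """Yield each comment and, depth-first, each of its nested replies."""
    for comment in comments:
        # Copy every field except the nested list, then attach the post URL.
        record = {k: v for k, v in comment.items() if k != "replies"}
        record["url"] = url
        yield record
        # Assumption: replies share the top-level comment shape.
        yield from iter_comments(comment.get("replies") or [], url)


def flatten_grouped(grouped: Dict[str, List[Comment]]) -> List[Comment]:
    """Turn {post_url: [comments...]} into a flat list of records."""
    return [
        record
        for url, comments in grouped.items()
        for record in iter_comments(comments, url)
    ]
```

Each flat record keeps its parent_id, so the reply tree can still be reconstructed after flattening.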

🚀 How to Use the Actor (via Apify Console)

  1. Log in at https://console.apify.com and go to Actors
  2. Find reddit-comment-scraper and click it
  3. Configure inputs:
    • Add Reddit post URLs in the startUrls field
    • Set maxComments (default: 50)
    • Choose sortOrder (default: "hot")
    • Set replyLimit (default: 1)
    • Configure proxy settings if needed
  4. Click Run to start the actor
  5. Monitor logs in real time
  6. Access results in the OUTPUT tab
  7. Export results to JSON or CSV

Best Use Cases

  • Market Research: Analyze user opinions and discussions on specific topics
  • Sentiment Analysis: Collect comments for sentiment analysis
  • Content Aggregation: Gather comments for content creation or research
  • Community Monitoring: Track discussions in specific subreddits
  • Data Analysis: Extract structured data for statistical analysis

Frequently Asked Questions

Q: Can I scrape comments from private subreddits?
A: No, this actor only scrapes publicly available content from Reddit.

Q: What happens if Reddit blocks my requests?
A: The actor automatically falls back through its proxy options: it first tries a direct connection, then a datacenter proxy, then a residential proxy with retries.

Q: How many comments can I scrape per URL?
A: You can set maxComments from 1 to 10,000 per URL.

Q: Does the actor respect Reddit's rate limits?
A: Yes, the actor adds delays between requests so that it does not overload Reddit's API.

Q: Can I scrape nested replies?
A: Yes, use the replyLimit parameter to control how many nested replies to extract per comment.

Support and Feedback

For issues, questions, or feedback, please contact support or create an issue in the actor repository.

Cautions

  • Data is collected only from publicly available sources
  • No data is taken from private accounts or password-protected content
  • The end user is responsible for ensuring legal compliance (spam laws, privacy, data protection, etc.)
  • Please respect Reddit's Terms of Service and rate limits