Reddit Thread Details Scraper

Pricing

from $1.20 / 1,000 results


Reddit Thread Details Scraper automates extraction of comprehensive thread metadata including post content, engagement metrics, author information, and moderation data. Efficiently collect detailed Reddit data for social listening, market research, sentiment analysis, and community insights.


Rating

0.0 (0 reviews)

Developer

ecomscrape (Maintained by Community)

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 4 days ago


Contact

If you encounter any issues or would like to get in touch, please feel free to contact us through the following link: My profile

Reddit Thread Details Scraper: Extract Complete Reddit Post Data for Social Intelligence

Introduction

Reddit stands as one of the world's largest social news aggregation and discussion platforms, hosting thousands of active communities (subreddits) covering virtually every topic imaginable. With hundreds of millions of monthly active users, Reddit serves as a critical source of authentic user opinions, trend discussions, technical support, product reviews, and community-driven content across countless niches.

For businesses conducting social listening, market researchers tracking sentiment, community managers monitoring brand mentions, or data analysts studying online behavior, accessing detailed Reddit thread data is essential. However, manually collecting comprehensive post information including engagement metrics, timestamps, author details, and moderation metadata is impractical at scale.

The Reddit Thread Details Scraper automates extraction of complete thread-level data, enabling systematic analysis of discussions, sentiment tracking, trend identification, and competitive intelligence gathering from specific Reddit posts across any subreddit.

Scraper Overview

The Reddit Thread Details Scraper is a specialized extraction tool designed to collect comprehensive metadata from individual Reddit threads. It captures over 100 data fields including post content, engagement metrics, author information, flair data, moderation status, awards, and community-specific attributes.

Key advantages include built-in error handling for failed URLs, ability to process multiple threads simultaneously, and extraction of both visible and backend metadata unavailable through Reddit's interface. The tool is valuable for social media analysts, market researchers, brand monitoring teams, academic researchers, and community managers needing detailed Reddit data.

The scraper extracts rich metadata including voting patterns, comment counts, crosspost information, content moderation status, author credentials, and temporal data for trend analysis. It maintains data accuracy while implementing ethical scraping practices.

Input Configuration

Example URL 1: https://www.reddit.com/r/Python/comments/1s9jc2z/reaching_100_type_coverage_by_deleting/

Example URL 2: https://www.reddit.com/r/Python/comments/1s9jc2z/comment/odp0m2x

Example URL 3: https://www.reddit.com/r/Python/comments/1sa1qcz/thursday_daily_thread_python_careers_courses_and/

Input Format

The scraper accepts a simple JSON configuration focused on extracting data from specific Reddit thread URLs.

{
  "urls": [
    "https://www.reddit.com/r/Python/comments/1sa1qcz/thursday_daily_thread_python_careers_courses_and/"
  ],
  "ignore_url_failures": true
}

The urls parameter: Add URLs of specific Reddit threads you want to scrape. You can paste URLs one by one, or use bulk edit to add a prepared list. URLs should be complete thread permalinks, not subreddit pages or user profiles.

The ignore_url_failures parameter: If true, the scraper continues running even if some URLs fail to load. This is essential when processing large lists, where some threads may have been deleted or become inaccessible.
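Before batch processing, it can help to sanity-check that every entry really is a thread permalink rather than a subreddit page or user profile. A minimal sketch in Python; the regex is our assumption about Reddit's current permalink shape, not part of the scraper:

```python
import re

# Assumed shape of a Reddit thread permalink: /r/<subreddit>/comments/<id>/...
THREAD_URL = re.compile(
    r"^https?://(www\.)?reddit\.com/r/[A-Za-z0-9_]+/comments/[a-z0-9]+(/|$)"
)

def is_thread_permalink(url: str) -> bool:
    """Return True if the URL looks like a complete thread permalink."""
    return bool(THREAD_URL.match(url))
```

Filtering a prepared list through a check like this avoids wasting runs on URLs the scraper cannot process.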

Output Format

[
  {
    "approved_at_utc": null,
    "subreddit": "Python",
    "selftext": "On the Pyrefly team, we've always believed that type coverage is one of the most important indicators of code quality. Over the past year, we've worked closely with teams across large Python codebases at Meta - improving performance, tightening soundness, and making type checking a seamless part of everyday development.\n\nBut one question kept coming up: What would it take to reach 100% type coverage?\n\nToday, we're excited to share a breakthrough ;-)\n\nLink to full blog: https://pyrefly.org/blog/100-percent-type-coverage/",
    "user_reports": [],
    "saved": false,
    "mod_reason_title": null,
    "gilded": 0,
    "clicked": false,
    "title": "Reaching 100% Type Coverage by Deleting Unannotated Code",
    "link_flair_richtext": [
      {
        "e": "text",
        "t": "Discussion"
      }
    ],
    "subreddit_name_prefixed": "r/Python",
    "hidden": false,
    "pwls": 6,
    "link_flair_css_class": "discussion",
    "downs": 0,
    "thumbnail_height": null,
    "top_awarded_type": null,
    "hide_score": false,
    "name": "t3_1s9jc2z",
    "quarantine": false,
    "link_flair_text_color": "light",
    "upvote_ratio": 0.92,
    "author_flair_background_color": null,
    "subreddit_type": "public",
    "ups": 150,
    "total_awards_received": 0,
    "media_embed": {},
    "thumbnail_width": null,
    "author_flair_template_id": null,
    "is_original_content": false,
    "author_fullname": "t2_13i16q",
    "secure_media": null,
    "is_reddit_media_domain": false,
    "is_meta": false,
    "category": null,
    "secure_media_embed": {},
    "link_flair_text": "Discussion",
    "can_mod_post": false,
    "score": 150,
    "approved_by": null,
    "is_created_from_ads_ui": false,
    "author_premium": false,
    "thumbnail": "self",
    "edited": false,
    "author_flair_css_class": null,
    "author_flair_richtext": [],
    "gildings": {},
    "content_categories": null,
    "is_self": true,
    "mod_note": null,
    "created": 1775047051.0,
    "link_flair_type": "richtext",
    "wls": 6,
    "removed_by_category": null,
    "banned_by": null,
    "author_flair_type": "text",
    "domain": "self.Python",
    "allow_live_comments": false,
    "selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>On the Pyrefly team, we&#39;ve always believed that type coverage is one of the most important indicators of code quality. Over the past year, we&#39;ve worked closely with teams across large Python codebases at Meta - improving performance, tightening soundness, and making type checking a seamless part of everyday development.</p>\n\n<p>But one question kept coming up: What would it take to reach 100% type coverage?</p>\n\n<p>Today, we&#39;re excited to share a breakthrough ;-)</p>\n\n<p>Link to full blog: <a href=\"https://pyrefly.org/blog/100-percent-type-coverage/\">https://pyrefly.org/blog/100-percent-type-coverage/</a></p>\n</div><!-- SC_ON -->",
    "likes": null,
    "suggested_sort": null,
    "banned_at_utc": null,
    "view_count": null,
    "archived": false,
    "no_follow": false,
    "is_crosspostable": false,
    "pinned": false,
    "over_18": false,
    "all_awardings": [],
    "awarders": [],
    "media_only": false,
    "link_flair_template_id": "0df42996-1c5e-11ea-b1a0-0e44e1c5b731",
    "can_gild": false,
    "spoiler": false,
    "locked": false,
    "author_flair_text": null,
    "treatment_tags": [],
    "visited": false,
    "removed_by": null,
    "num_reports": null,
    "distinguished": null,
    "subreddit_id": "t5_2qh0y",
    "author_is_blocked": false,
    "mod_reason_by": null,
    "removal_reason": null,
    "link_flair_background_color": "#f50057",
    "id": "1s9jc2z",
    "is_robot_indexable": true,
    "num_duplicates": 0,
    "report_reasons": null,
    "author": "BeamMeUpBiscotti",
    "discussion_type": null,
    "num_comments": 23,
    "send_replies": true,
    "media": null,
    "contest_mode": false,
    "author_patreon_flair": false,
    "author_flair_text_color": null,
    "permalink": "/r/Python/comments/1s9jc2z/reaching_100_type_coverage_by_deleting/",
    "stickied": false,
    "url": "https://www.reddit.com/r/Python/comments/1s9jc2z/reaching_100_type_coverage_by_deleting/",
    "subreddit_subscribers": 1465596,
    "created_utc": 1775047051.0,
    "num_crossposts": 0,
    "mod_reports": [],
    "is_video": false
  }
]

The scraper returns comprehensive thread data with 100+ fields organized by category:

Core Post Information:

  • ID: Unique Reddit post identifier. Essential for database management, duplicate detection, and tracking posts across datasets.
  • Title: Thread headline. Primary content for keyword analysis, trend detection, and topic classification.
  • Selftext: Post body content. Core data for sentiment analysis, content analysis, and understanding discussion context.
  • Selftext HTML: HTML-formatted post content. Preserves formatting, links, and structure for rich text analysis.
  • URL: Thread permalink. Reference link for verification and accessing current state.
  • Permalink: Relative URL path. Alternative reference format for Reddit-specific tools.
  • Domain: Source domain for link posts. Identifies external content sources and link-sharing patterns.
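Because id uniquely identifies a post, it is the natural key for the duplicate detection mentioned above. A small illustrative helper (not part of the scraper itself) for deduplicating records collected from overlapping URL lists:

```python
def dedupe_by_id(records: list[dict]) -> list[dict]:
    """Drop duplicate posts, keeping the first record seen for each Reddit id."""
    seen: set[str] = set()
    unique = []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique
```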

Engagement Metrics:

  • Score: Net upvotes (ups minus downs). Primary engagement indicator for post popularity.
  • Upvote Ratio: Percentage of upvotes. Reveals controversy level and community reception.
  • Ups: Total upvotes received. Positive engagement metric.
  • Downs: Total downvotes received. Negative engagement indicator.
  • Num Comments: Total comment count. Measures discussion depth and engagement.
  • Gilded: Number of Reddit Gold awards. Premium engagement indicator.
  • Total Awards Received: Count of all award types. Shows exceptional content quality or community appreciation.
  • All Awardings: Detailed award breakdown. Specific award types and quantities.
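Note that Reddit's public data no longer exposes raw downvote counts (the sample output shows downs as 0), but approximate vote totals can be reconstructed from score and upvote_ratio. A sketch, assuming score = ups − downs and upvote_ratio = ups / (ups + downs); Reddit fuzzes both values, so treat the results as estimates only:

```python
def estimate_votes(score: int, upvote_ratio: float) -> tuple[int, int]:
    """Estimate (ups, downs) from score and upvote_ratio.

    Assumes score = ups - downs and upvote_ratio = ups / (ups + downs).
    Both inputs are fuzzed by Reddit, so results are approximations.
    """
    if upvote_ratio == 0.5:
        raise ValueError("ups and downs cancel; total votes unrecoverable")
    ups = round(score * upvote_ratio / (2 * upvote_ratio - 1))
    return ups, ups - score
```

For the sample thread (score 150, ratio 0.92) this yields roughly 164 upvotes and 14 downvotes.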

Author Information:

  • Author: Username of post creator. Identifies content creators and influential users.
  • Author Fullname: Full Reddit user identifier. Unique author tracking across posts.
  • Author Flair Text: User flair in subreddit. Shows expertise, status, or community role.
  • Author Premium: Reddit Premium subscription status. Indicates engaged platform users.
  • Author Patreon Flair: Patreon supporter status. Identifies financially engaged users.

Subreddit Context:

  • Subreddit: Community name. Essential for community-specific analysis.
  • Subreddit ID: Unique subreddit identifier. Database linking and organization.
  • Subreddit Name Prefixed: Full subreddit name with r/ prefix. Display format.
  • Subreddit Type: Community type (public, restricted, private). Access and participation constraints.
  • Subreddit Subscribers: Community size. Context for engagement metrics.

Post Classification:

  • Link Flair Text: Category label assigned to post. Topical classification within subreddit.
  • Link Flair CSS Class: Styling class for flair. Visual categorization.
  • Category: Content category. High-level classification.
  • Is Self: Boolean indicating text vs. link post. Content type identification.
  • Is Video: Video content indicator. Media type classification.
  • Over 18: NSFW status. Content filtering flag.

Temporal Data:

  • Created UTC: Post creation timestamp. Essential for time-series analysis, trend tracking, and temporal patterns.
  • Edited: Edit timestamp if modified. Content update tracking.
  • Approved At UTC: Moderator approval time. Moderation timeline data.
  • Banned At UTC: Removal timestamp. Content moderation tracking.
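The created_utc value is a Unix epoch timestamp in seconds, as shown in the sample output. A small helper for converting it to a timezone-aware UTC datetime for time-series work (the helper name is ours, for illustration):

```python
from datetime import datetime, timezone

def post_datetime(record: dict) -> datetime:
    """Convert a record's created_utc epoch seconds to an aware UTC datetime."""
    return datetime.fromtimestamp(record["created_utc"], tz=timezone.utc)
```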

Moderation & Status:

  • Locked: Comments disabled status. Indicates moderation intervention.
  • Archived: Thread archived (no new interactions). Shows content lifecycle stage.
  • Removed By: Moderator who removed post. Moderation accountability.
  • Removal Reason: Explanation for removal. Policy violation insights.
  • Distinguished: Moderator/admin designation. Official communication indicator.
  • Stickied: Pinned to subreddit top. Important community announcements.
  • Spoiler: Spoiler tag status. Content warning flag.

Engagement Features:

  • Contest Mode: Contest mode enabled status. Special voting configuration.
  • Allow Live Comments: Live discussion enabled. Real-time engagement feature.
  • Suggested Sort: Recommended comment sorting. Discussion organization preference.
  • Hide Score: Score visibility setting. Engagement display configuration.

Crossposting & Sharing:

  • Num Crossposts: Times shared to other subreddits. Viral spread indicator.
  • Num Duplicates: Duplicate post count. Repost detection.
  • Is Crosspostable: Sharing permission status. Content distribution control.

Technical Metadata:

  • Is Robot Indexable: Search engine indexing permission. SEO and discoverability.
  • No Follow: Link following restriction. SEO implications.
  • Quarantine: Quarantine status. Content restriction indicator.
  • Treatment Tags: A/B testing or feature flags. Platform experiment data.

Each field enables specific analysis types including sentiment analysis, trend detection, influencer identification, community health monitoring, content moderation research, and engagement optimization.

Usage Guide

Setting Up Thread Extraction

Step 1: Collect Target Thread URLs

Identify specific Reddit threads for analysis:

  • Brand mention threads for reputation monitoring
  • Product discussion threads for market research
  • Technical support threads for customer insights
  • Trending discussions for competitive intelligence
  • Community announcements for sentiment tracking

Copy complete thread URLs from your browser.

Step 2: Build URL List

Create your URL collection:

  • Add URLs individually for focused analysis
  • Use bulk edit for large-scale extraction
  • Organize by subreddit or topic for structured research
  • Include historical threads for temporal analysis

Step 3: Configure Error Handling

Enable ignore_url_failures so that deleted or removed threads don't stop the extraction. This is critical when processing historical URLs, where some posts may no longer exist.
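When preparing large batches, the input JSON can be assembled programmatically. A minimal sketch, with error tolerance on by default (the helper name is ours, not part of the scraper):

```python
import json

def build_input(urls: list[str], ignore_failures: bool = True) -> str:
    """Serialize the scraper's input JSON, tolerating dead URLs by default."""
    return json.dumps(
        {"urls": urls, "ignore_url_failures": ignore_failures},
        indent=2,
    )
```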

Best Practices

URL Collection:

  • Verify URLs are complete thread permalinks
  • Test sample URLs before batch processing
  • Track URL sources for organized analysis
  • Handle deleted threads gracefully

Data Applications:

Social Listening:

  • Monitor brand mentions across subreddits
  • Track sentiment in product discussions
  • Identify emerging issues or complaints
  • Measure community response to announcements

Market Research:

  • Analyze product feedback and reviews
  • Identify user pain points and feature requests
  • Track competitor mentions and comparisons
  • Understand customer language and terminology

Community Analysis:

  • Study engagement patterns by flair or author
  • Track moderation activity and policies
  • Identify influential community members
  • Analyze content lifecycle and archival patterns

Trend Detection:

  • Monitor award patterns for exceptional content
  • Track crossposting for viral spread
  • Analyze temporal engagement patterns
  • Identify emerging topics through upvote velocity
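Upvote velocity from the last bullet can be approximated by dividing score by the hours elapsed since created_utc. A sketch of one such metric; the definition is our illustration, not something the scraper computes:

```python
def upvote_velocity(score: int, created_utc: float, now_utc: float) -> float:
    """Upvotes per hour since posting; a rough trending signal."""
    hours = max((now_utc - created_utc) / 3600.0, 1e-9)  # avoid division by zero
    return score / hours
```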

Common Use Cases

Brand Monitoring: Extract threads mentioning your brand or products, track sentiment through engagement metrics, identify customer support issues, and monitor competitive discussions.

Influencer Identification: Analyze author engagement patterns, identify high-karma contributors, track award recipients, and locate community experts by flair.

Content Strategy: Study successful post characteristics, analyze optimal posting times through created_utc, understand flair usage patterns, and identify engaging content formats.
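Posting-time analysis via created_utc can be as simple as bucketing posts by UTC hour of day. An illustrative sketch; the scraper returns raw timestamps only, so this aggregation is your own post-processing:

```python
from collections import Counter
from datetime import datetime, timezone

def posts_by_hour(records: list[dict]) -> Counter:
    """Count posts per UTC hour-of-day from created_utc timestamps."""
    return Counter(
        datetime.fromtimestamp(r["created_utc"], tz=timezone.utc).hour
        for r in records
    )
```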

Moderation Research: Track removal patterns, analyze moderation response times, study policy enforcement, and understand community guidelines impact.

Benefits and Applications

Primary Applications:

Social Listening & Brand Monitoring: Track brand mentions, product discussions, and customer sentiment across relevant subreddits with comprehensive engagement and sentiment indicators.

Market Research: Gather authentic user feedback, identify product improvement opportunities, understand competitive positioning, and discover unmet market needs through genuine community discussions.

Community Management: Monitor community health, track engagement trends, identify influential members, understand moderation needs, and measure content performance.

Academic Research: Study online communities, analyze discussion patterns, research content moderation, investigate information spread, and examine social media dynamics with rich metadata.

Competitive Intelligence: Monitor competitor mentions, track product comparisons, identify market gaps, and understand customer preferences through authentic discussions.

Advantages:

  • Access to 100+ metadata fields unavailable in Reddit interface
  • Comprehensive engagement metrics for quantitative analysis
  • Moderation data revealing community management practices
  • Temporal data enabling trend and pattern analysis
  • Author information for influencer identification
  • Award data indicating exceptional content quality

Data integrates with sentiment analysis tools, social media monitoring platforms, research databases, and business intelligence systems for immediate activation in marketing, product development, and strategic planning.

Conclusion

The Reddit Thread Details Scraper transforms manual Reddit research into efficient automated data collection. By extracting comprehensive thread metadata from the world's largest discussion platform, it enables data-driven insights for social listening, market research, and community analysis.

Whether monitoring brand sentiment, conducting market research, studying online communities, or tracking trends, this scraper provides detailed extraction capabilities to accelerate your social intelligence gathering.

Ready to unlock Reddit insights? Start extracting comprehensive thread data today and transform your social media intelligence capabilities.

Your feedback

We are always working to improve Actors' performance. So, if you have any technical feedback about Reddit Thread Details Scraper or simply found a bug, please create an issue on the Actor's Issues tab in Apify Console.