Reddit Posts Scraper

Pricing

$19.99/month + usage

🔎 Reddit Posts Scraper pulls posts & comments from subreddits or users—titles, bodies, upvotes, score, flair, author, timestamps, links & media. 📊 Great for research, social listening, SEO, sentiment & trend analysis. ⚙️ Filters & keywords. 💾 Export CSV/JSON. 🚀


Rating: 0.0 (0)

Developer: Scraply (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 22 days ago

Reddit Posts Scraper

Reddit Posts Scraper is a fast, reliable Reddit scraper that collects public posts (and optional comments) from subreddits, full Reddit URLs, or search keywords. It solves the pain of manual copying and API limits by returning clean, structured JSON ready for analysis. Built for marketers, developers, data analysts, and researchers, this subreddit scraper helps you scrape Reddit posts at scale for trend tracking, SEO research, social listening, and NLP pipelines. With parallel processing, smart retries, and proxy fallback, it enables high-volume Reddit thread scraping with production reliability. 🚀

What data / output can you get?

Below are the exact fields this Reddit web scraper pushes to the Apify dataset (one row per post):

| Field | Description | Example value |
| --- | --- | --- |
| subreddit | Community name the post belongs to | "technology" |
| title | Post title text | "Open-source LLM hits new benchmark" |
| author | Reddit username of the poster | "u_datawizard" |
| score | Post score/upvotes | 842 |
| num_comments | Number of comments on the post | 126 |
| created_utc | Unix timestamp (UTC) when the post was created | 1703123456 |
| permalink | Full permalink to the Reddit thread | "https://www.reddit.com/r/technology/comments/abc123/example_post/" |
| body | Selftext/body content for text posts | "Here’s a quick summary of the paper..." |
| thumbnail_url | Thumbnail image URL (if any) | "https://preview.redd.it/..." |
| image_url | Main media URL (if provided) | "https://i.redd.it/xyz.png" |
| comments | Array of nested comments (author, body, score, created_utc, replies) | [ { "author": "u_commenter", ... } ] |
| post_id | Unique Reddit post ID | "abc123" |
| success | Whether this post was processed successfully | true |
| error_message | Error details if processing failed | null |

Notes:

  • Nested comments include replies with the same structure (author, body, score, created_utc, replies).
  • You can export results as JSON, CSV, or Excel from the Apify dataset UI or via the API.
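To illustrate working with the nested `comments` structure described above, here is a minimal Python sketch that flattens a comment tree into flat rows for CSV or dataframe use. The sample data is hypothetical, shaped like the documented fields:

```python
def flatten_comments(comments, depth=0):
    """Recursively flatten a nested Reddit comment tree into flat rows."""
    rows = []
    for c in comments:
        rows.append({
            "author": c.get("author"),
            "body": c.get("body"),
            "score": c.get("score"),
            "created_utc": c.get("created_utc"),
            "depth": depth,  # 0 = top-level comment, 1 = first-level reply, ...
        })
        rows.extend(flatten_comments(c.get("replies", []), depth + 1))
    return rows

# Hypothetical item["comments"] shaped like the documented output
item_comments = [
    {"author": "u_commenter", "body": "Top-level", "score": 23,
     "created_utc": 1703123499,
     "replies": [{"author": "u_replier", "body": "Nested reply", "score": 7,
                  "created_utc": 1703123600, "replies": []}]},
]

rows = flatten_comments(item_comments)  # two rows: the comment and its reply
```

Because every reply carries the same fields as its parent, one recursive pass is enough to tabulate an arbitrarily deep thread.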

Key features

  • ⚡ Bulk-scale extraction: parallel comment fetching and batched processing to scrape subreddit posts efficiently across multiple sources.
  • 🧩 Flexible targeting: scrape Reddit posts by subreddit names, full Reddit URLs, or keywords in a single run; works as a Reddit thread scraper or Reddit post extractor.
  • 🔄 Sort & time filtering: choose hot, new, top, or rising, with a time range for top/rising; ideal for trend snapshots.
  • 🛡️ Resilient proxy fallback: automatic escalation from no proxy → datacenter → residential when blocked, plus smart retries on 403/429/5xx and timeouts.
  • 💬 Optional comments: configure how many comments to fetch per post, or set zero to skip; covers both Reddit posts and Reddit comments scraper use cases.
  • 💾 Structured outputs: export-ready JSON for dashboards, or export Reddit posts to CSV; consistent fields for easy joins and analytics.
  • 🧑‍💻 Developer-friendly: built in Python with the Apify SDK; trigger via API and integrate with Make, Zapier, or n8n.
  • 🏗️ Production-ready reliability: request pacing, parallelism controls, and detailed logging for stable large-scale runs.
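The fallback-and-retry behavior described above could be sketched roughly as follows. This is an illustration of the pattern only, not the actor's actual internals; `fetch` is a hypothetical callback that returns a status code and body:

```python
import time

PROXY_TIERS = [None, "datacenter", "residential"]  # escalation order when blocked
RETRYABLE = {403, 429, 500, 502, 503, 504}

def fetch_with_fallback(fetch, url, retries_per_tier=3):
    """Try each proxy tier in order, retrying with exponential backoff on
    blockable statuses; raise once every tier is exhausted."""
    for tier in PROXY_TIERS:
        for attempt in range(retries_per_tier):
            status, body = fetch(url, proxy=tier)
            if status == 200:
                return body
            if status not in RETRYABLE:
                raise RuntimeError(f"Unrecoverable status {status} for {url}")
            time.sleep(2 ** attempt)  # back off before the next attempt
    raise RuntimeError(f"All proxy tiers exhausted for {url}")
```

With a `fetch` that rejects direct requests but accepts datacenter proxies, the function returns after escalating exactly one tier.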

How to use Reddit Posts Scraper - step by step

  1. Create or log in to your Apify account.
  2. Open the Reddit Posts Scraper in Apify Console.
  3. Add your sources in “Reddit URLs / Subreddits / Keywords”:
    • Enter subreddit names (e.g., “news” or “r/technology”), full Reddit URLs, or search keywords (e.g., “artificial intelligence”). One per line.
  4. Set optional controls:
    • Sort order (hot, new, top, rising) and time filter (hour, day, week, month, year, all) for top/rising.
    • Limits for maximum posts and maximum comments per post.
    • Proxy configuration (recommended for larger volumes).
  5. Start the run and monitor logs as posts are collected and comments are processed in parallel.
  6. Download results:
    • Go to the Dataset in the Output tab to preview results and export to JSON, CSV, or Excel.
  7. Pro Tip: Trigger runs via the Apify API and pipe dataset URLs to your data stack or automation (n8n, Make, Zapier) for scheduled Reddit scraping without API credentials.
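As a sketch of the API-triggered workflow from step 7, using only the Python standard library and Apify's documented run-sync endpoint (the token and actor ID are placeholders you copy from Apify Console):

```python
import json
import urllib.request

APIFY_TOKEN = "<YOUR_APIFY_TOKEN>"  # placeholder
ACTOR_ID = "<ACTOR_ID>"             # placeholder, e.g. username~actor-name from Console

run_input = {
    "startUrls": ["r/technology", "artificial intelligence"],
    "maxPosts": 50,
    "maxComments": 0,  # skip comments for a faster run
    "sortOrder": "top",
    "timeFilter": "week",
    "proxyConfiguration": {"useApifyProxy": False},
}

# Apify's run-sync endpoint starts the actor and returns dataset items when done.
url = (f"https://api.apify.com/v2/acts/{ACTOR_ID}"
       f"/run-sync-get-dataset-items?token={APIFY_TOKEN}")

def run_and_fetch_items():
    req = urllib.request.Request(url, data=json.dumps(run_input).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# items = run_and_fetch_items()  # uncomment once the placeholders are filled in
```

The same call can be made from n8n, Make, or Zapier via their HTTP modules, so scheduled runs need no extra code.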

Use cases

| Use case | Description |
| --- | --- |
| Market & trend research | Track trending topics by keyword or subreddit to quantify engagement and sentiment over time. |
| Content & SEO research | Discover high-performing topics and questions to inform content calendars and SERP targeting. |
| Brand & competitor monitoring | Monitor mentions across relevant communities and compare share of voice across subreddits. |
| NLP / ML datasets | Collect titles, bodies, and structured comment trees for training or evaluation datasets. |
| Academic & journalism research | Compile public quotes and discussions from Reddit threads for analysis and reporting. |
| Data pipelines & automation | Schedule a Reddit scraping script via API, then export Reddit posts to CSV for ETL or BI dashboards. |

Why choose Reddit Posts Scraper?

This Reddit scraping tool combines precision, automation, and reliability for large-scale, repeatable data collection.

  • ✅ Accurate, structured fields ready for analysis and modeling
  • 🌍 Keyword and subreddit targeting for broad or niche coverage
  • ⚙️ Scales from small tests to bulk runs with parallel processing
  • 🧑‍💻 API- and Python-friendly for developer workflows
  • 🛡️ Safer than brittle extensions—handles blocks with proxy fallback and retries
  • 💰 Cost-effective automation via Apify infrastructure and dataset exports
  • 🔌 Integrations-ready (n8n, Make, Zapier) for end-to-end pipelines

In short, a production-ready Reddit web scraper versus unstable alternatives—built for consistent data extraction at scale.

Is it legal to scrape Reddit data?

Yes, when done responsibly. This actor targets publicly available Reddit content and does not access private subreddits or authenticated data.

Guidelines for compliant use:

  • Scrape only public data and respect Reddit’s platform policies.
  • Do not misuse personal information found in public posts or comments.
  • Observe applicable data protection laws (e.g., GDPR, CCPA) in your jurisdiction.
  • Use proxy and rate controls to minimize load and reduce the likelihood of blocks.
  • Consult your legal team for edge cases or regulated workflows.

Input parameters & output format

Example input (JSON)

{
  "startUrls": [
    "https://www.reddit.com/r/news/",
    "news",
    "artificial intelligence"
  ],
  "maxPosts": 50,
  "maxComments": 100,
  "sortOrder": "top",
  "timeFilter": "week",
  "proxyConfiguration": { "useApifyProxy": false }
}

Input fields

  • startUrls (array of strings, required)
    • Description: One per line — mix full Reddit URLs, subreddit names (e.g., news or r/news), or search keywords.
    • Default: None
  • maxPosts (integer)
    • Description: Max posts to scrape per subreddit or keyword (1–1000).
    • Default: 50
  • maxComments (integer)
    • Description: Max comments to fetch per post (0–1000). Set 0 to skip comments.
    • Default: 100
  • sortOrder (string; one of: hot, new, top, rising)
    • Description: How posts are ordered.
    • Default: top
  • timeFilter (string; one of: hour, day, week, month, year, all)
    • Description: Only applies when sortOrder is top or rising.
    • Default: week
  • proxyConfiguration (object)
    • Description: Choose proxies. If blocked, the actor falls back: no proxy → datacenter → residential.
    • Default: { "useApifyProxy": false }

Note:

  • The actor can also accept “startUrls” as a newline-separated string or as an array with URL objects via API; however, the Console form uses a string list.
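For example, the URL-object form mentioned above might look like this (illustrative; the exact accepted shapes are defined by the actor's input schema):

```json
{ "startUrls": [{ "url": "https://www.reddit.com/r/news/" }] }
```

The newline-separated variant would instead pass all sources in a single string, one source per line.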

Example output item (JSON)

{
  "post_id": "abc123",
  "title": "Example post title",
  "author": "u_example",
  "created_utc": 1703123456,
  "num_comments": 42,
  "score": 156,
  "permalink": "https://www.reddit.com/r/news/comments/abc123/example_post/",
  "image_url": "https://i.redd.it/xyz.png",
  "thumbnail_url": "https://preview.redd.it/xyz-thumb.jpg",
  "body": "Post content...",
  "comments": [
    {
      "author": "u_commenter1",
      "body": "Top-level comment",
      "score": 23,
      "created_utc": 1703123499,
      "replies": [
        {
          "author": "u_replier",
          "body": "Nested reply",
          "score": 7,
          "created_utc": 1703123600,
          "replies": []
        }
      ]
    }
  ],
  "subreddit": "news",
  "success": true,
  "error_message": null
}

Fields like image_url, thumbnail_url, and body may be empty when not present on the original post.

FAQ

Do I need a Reddit account or login to use this?

No. The actor collects publicly available Reddit data without requiring login or cookies. It fetches structured JSON from Reddit endpoints and processes it automatically.

Can it scrape comments as well as posts?

Yes. Set “Maximum comments per post” to a value greater than 0 to fetch comment threads; set it to 0 to skip comments for faster runs.

How many posts can I scrape per source?

You can set “Maximum posts per source” up to 1000. The total output depends on how many sources (subreddits, URLs, or keywords) you provide.

Does it work with proxies and handle blocks?

Yes. It automatically falls back through no proxy → datacenter → residential if Reddit blocks requests, and retries on 403/429, 5xx, timeouts, and connection/SSL issues.

Can I export results to CSV?

Yes. After the run, open the dataset in Apify and export to JSON, CSV, or Excel. You can also access results programmatically via the Apify API.
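For programmatic CSV export, the Apify dataset items endpoint accepts a format parameter; a minimal sketch (the dataset ID is a placeholder from your run):

```python
import urllib.request

DATASET_ID = "<DATASET_ID>"  # placeholder: copy from the run's Output tab

# The dataset items endpoint supports format=json, csv, or xlsx, among others.
csv_url = f"https://api.apify.com/v2/datasets/{DATASET_ID}/items?format=csv"

def download_csv(path="reddit_posts.csv"):
    with urllib.request.urlopen(csv_url) as resp, open(path, "wb") as f:
        f.write(resp.read())

# download_csv()  # uncomment once DATASET_ID is filled in
```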

Is this a Python Reddit scraper I can integrate with my pipeline?

Yes. The actor is implemented in Python and is API-accessible, making it easy to integrate into ETL, analytics, or automation workflows (e.g., n8n, Make, Zapier).

Does it support sorting and time filters like “top this week”?

Yes. Choose hot, new, top, or rising. For top and rising, you can apply a time filter (hour, day, week, month, year, all).

What happens if some posts fail to process?

Each post reports a success flag and error_message if processing fails. The actor saves successful items as they’re scraped so you can still export partial results.
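Since every item carries `success` and `error_message`, splitting partial results is straightforward; for example (the sample items are hypothetical):

```python
# Hypothetical dataset items shaped like the documented output
items = [
    {"post_id": "abc123", "success": True, "error_message": None},
    {"post_id": "def456", "success": False, "error_message": "Timed out"},
]

ok = [i for i in items if i["success"]]
failed = [(i["post_id"], i["error_message"]) for i in items if not i["success"]]
# ok keeps the successfully scraped posts; failed pairs post IDs with their errors
```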

Final thoughts

Reddit Posts Scraper is built to scrape Reddit posts (and optional comments) from subreddits, URLs, or keywords with structured, export-ready output. With sort/time controls, scalable limits, and resilient proxy fallback, it’s ideal for marketers, researchers, analysts, and developers. Trigger it via API to power a Reddit API scraping workflow, connect to automation tools, or export Reddit posts to CSV for downstream analytics. Start extracting smarter Reddit insights—at scale and with confidence.