Reddit Posts Scraper

Pricing

$19.99/month + usage

🧰 Reddit Posts Scraper pulls structured Reddit data—titles, bodies, media, flair, author, subreddit, upvotes, comments, awards, dates & links—from subreddits, users, and searches. ⚙️ Exports JSON/CSV. 🚀 Ideal for market research, trend analysis, sentiment & content curation.

Rating: 0.0 (0)

Developer: ScrapeEngine (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

Reddit Posts Scraper

Reddit Posts Scraper is a production-ready Apify actor that extracts public Reddit posts—and optionally comments—from subreddits, full Reddit URLs, or keyword searches, delivering clean, structured JSON you can export to CSV. It solves the pain of manual collection and unreliable tools by providing a reliable Reddit post scraper with parallel comment fetching, automatic proxy fallback, and robust retry logic. Built for marketers, developers, data analysts, and researchers, this subreddit scraper scales from quick one-offs to automated monitoring and bulk runs, enabling large-scale trend tracking, sentiment analysis, and content curation.

What data / output can you get?

Below are the exact fields this Reddit data scraper saves to the dataset for each post:

| Data field | Description | Example value |
| --- | --- | --- |
| subreddit | Community name | "news" |
| title | Post title | "Example post title" |
| author | Reddit username of the poster | "username123" |
| score | Upvotes / score | 156 |
| num_comments | Number of comments | 42 |
| created_utc | Unix timestamp (UTC) | 1703123456 |
| permalink | Permalink to the Reddit thread | "https://www.reddit.com/r/news/comments/abc123/example_post/" |
| body | Selftext/body of the post | "Post content..." |
| thumbnail_url | Thumbnail image URL | "https://preview.redd.it/..." |
| image_url | Main image or media URL (if any) | "https://i.redd.it/..." |
| comments | Array of comments with nested replies | [{"author":"userA","body":"Nice post!","score":12,"created_utc":1703123457,"replies":[]}] |
| post_id | Reddit post ID | "abc123" |
| success | Whether the post was processed successfully | true |
| error_message | Error message if processing failed | null |

Notes:

  • comments contains nested replies with the same structure (author, body, score, created_utc, replies).
  • Some fields (e.g., image_url, thumbnail_url, body) may be empty if not present in the original post.

You can export results as JSON or CSV from the Apify dataset or fetch them programmatically via the Apify API—perfect when you need to export Reddit posts to CSV for analysis or dashboards.
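
As a sketch of that CSV step, the snippet below flattens one dataset item (using the field names from the table above) into a spreadsheet-friendly row; the `post_to_row` and `posts_to_csv` helpers are illustrative, not part of the actor, and nested comments are reduced to a simple count because CSV cells hold scalars.

```python
import csv
import io

def post_to_row(item):
    # Pick out the scalar fields documented in the table above.
    # Nested comments become a count, since CSV cells cannot hold arrays.
    return {
        "subreddit": item.get("subreddit"),
        "title": item.get("title"),
        "author": item.get("author"),
        "score": item.get("score"),
        "num_comments": item.get("num_comments"),
        "created_utc": item.get("created_utc"),
        "permalink": item.get("permalink"),
        "top_level_comments": len(item.get("comments") or []),
    }

def posts_to_csv(items):
    # Serialize a list of dataset items into a CSV string.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(post_to_row({}).keys()))
    writer.writeheader()
    for item in items:
        writer.writerow(post_to_row(item))
    return buf.getvalue()
```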

Key features

  • 🚀 Parallel comment processing
    Efficiently handle larger batches with parallel fetching of comments for each post—ideal for a bulk Reddit thread scraper.

  • 🧩 Flexible source targeting
    Feed subreddits (e.g., news or r/technology), full Reddit URLs, or keyword queries in a single list to scrape Reddit posts at scale.

  • 🔀 Sorting + time filtering
    Choose sortOrder (hot, new, top, rising) and apply a timeFilter (hour, day, week, month, year, all) when applicable for focused results.

  • 🛡️ Automatic proxy fallback
    Built-in fallback from direct connection to datacenter to residential proxies on blocks (403/429) to maximize reliability for continuous runs.

  • 🔁 Robust retry logic
    Retries handle blocks, timeouts, and upstream/proxy errors with backoff strategies to keep your Reddit scraping tool resilient.

  • 📤 Structured, integration-ready data
    Clean JSON output with post metadata and nested comments that you can export to CSV or consume via API for downstream workflows.

  • 🧑‍💻 Developer friendly
    Call runs and fetch datasets via the Apify API from Python or Node.js—great for building an automated Reddit API scraper pipeline.

  • 💾 Live dataset streaming
    Results are pushed to the dataset as they’re processed, so you keep partial data even if long runs stop early.
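
The retry and proxy-fallback behavior described above can be sketched as follows. This is a simplified illustration, not the actor's actual code: `fetch` is any callable you supply, the tier names mirror the direct → datacenter → residential chain, and the backoff values are placeholders.

```python
import time

class BlockedError(Exception):
    """Raised by a fetch function on HTTP 403/429-style blocks."""

def fetch_with_fallback(fetch, url, tiers=("direct", "datacenter", "residential"),
                        retries=3, backoff=1.0):
    # Retry each proxy tier with exponential backoff; on repeated blocks,
    # escalate to the next tier, as the feature list above describes.
    last_error = None
    for tier in tiers:
        delay = backoff
        for _attempt in range(retries):
            try:
                return fetch(url, proxy_tier=tier)
            except BlockedError as err:
                last_error = err
                time.sleep(delay)
                delay *= 2  # exponential backoff within a tier
    raise last_error
```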

How to use Reddit Posts Scraper - step by step

  1. 🔑 Sign in at console.apify.com with your Apify account.
  2. 🔍 Find “Reddit Posts Scraper” in the Apify Store and open the actor.
  3. 🧾 Add your input in the “Reddit URLs / Subreddits / Keywords” field (one per line). You can mix full Reddit URLs, subreddit names (e.g., news or r/news), and search keywords.
  4. ⚙️ Configure settings as needed:
    • sortOrder (hot, new, top, rising)
    • timeFilter (hour, day, week, month, year, all) — only applies to top or rising
    • maxPosts and maxComments limits
    • proxyConfiguration for reliable, large-scale runs
  5. ▶️ Start the run and watch logs as it collects posts and processes comments in parallel.
  6. 💾 Open the Output (Dataset) tab to preview results and export them as JSON or CSV, or pull them via the Apify API.

Pro tip: Use the Apify API from your Python Reddit scraper or Node.js scripts to schedule runs and build a Reddit post monitoring workflow that keeps your datasets fresh over time.
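
A minimal Python sketch of that workflow, using the apify-client package: the actor ID below is a placeholder (copy the real one from the actor's page in the Apify Console), and `build_run_input` simply assembles the input fields documented on this page.

```python
def build_run_input(sources, sort_order="top", time_filter="week",
                    max_posts=50, max_comments=100):
    # Assemble the actor's input object (field names match this README).
    return {
        "startUrls": list(sources),
        "sortOrder": sort_order,
        "timeFilter": time_filter,
        "maxPosts": max_posts,
        "maxComments": max_comments,
    }

def fetch_reddit_posts(api_token, sources,
                       actor_id="<username>~reddit-posts-scraper"):
    # Start a run and return its dataset items.
    # Requires `pip install apify-client`; actor_id is a placeholder.
    from apify_client import ApifyClient

    client = ApifyClient(api_token)
    run = client.actor(actor_id).call(run_input=build_run_input(sources))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Schedule a script like this (cron, CI, or Apify Schedules) to keep your datasets fresh.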

Use cases

| Use case | Description |
| --- | --- |
| Market & trend research | Track trending posts by keyword or subreddit to quantify interest and surface emerging topics. |
| Content & SEO planning | Identify high-engagement threads to inspire content calendars and optimize topical coverage. |
| Brand monitoring | Monitor mentions and sentiment across communities to detect issues and opportunities early. |
| Academic & social research | Collect public discourse (titles, bodies, comments) for reproducible studies and NLP projects. |
| Data pipeline / API workflows | Automate runs via Apify API and export Reddit data to CSV/JSON for BI dashboards or databases. |
| Competitive intelligence | Analyze discussions around competitors’ products or niches to inform strategy and positioning. |

Why choose Reddit Posts Scraper?

This Reddit web scraper is engineered for reliability and scale with structured outputs, automatic proxy fallback, and parallel processing.

  • ✅ Accurate structured extraction: Captures subreddit, title, author, score, comment counts, timestamps, permalinks, media, and nested comments.
  • ⚡ Scales to bulk: Handles multiple sources with parallel comment fetching—true automated subreddit scraping.
  • 🧑‍💻 Developer access: Control runs and fetch results over the Apify API from Python or Node.js for CI/CD and data pipelines.
  • 🛡️ Reliable by design: Retries and proxy fallback mitigate 403/429 and upstream errors for consistent results.
  • 🔒 No login, no browser: Works with public Reddit endpoints for streamlined, safe operations.
  • 🔗 Integration-ready: Export JSON/CSV to feed analytics stacks, warehouses, or custom apps.

Unlike browser extensions or ad‑hoc scripts, this production-ready Reddit posts extractor delivers consistent, scalable outcomes with observability and structured exports.

Is it legal to scrape Reddit data?

Yes, when used responsibly. This actor targets publicly available Reddit content and does not require login or access to private subreddits.

Guidelines for compliant use:

  • Scrape only public data and respect Reddit’s terms.
  • Avoid collecting or misusing personal data from users.
  • Follow applicable data protection laws (e.g., GDPR, CCPA).
  • Do not attempt to access private or authenticated content.
  • Consult your legal team for edge cases or regulated use.

Input parameters & output format

Example JSON input

{
  "startUrls": [
    "https://www.reddit.com/r/news/",
    "news",
    "artificial intelligence"
  ],
  "sortOrder": "top",
  "timeFilter": "week",
  "maxPosts": 50,
  "maxComments": 100,
  "proxyConfiguration": { "useApifyProxy": false }
}

Input parameter reference

  • startUrls (array, required)
    Description: Enter one item per line. Mix full Reddit URLs (e.g., https://www.reddit.com/r/news/), subreddit names (e.g., news or r/news), or search keywords (e.g., artificial intelligence). Duplicate subreddits are merged.
    Default: none

  • maxPosts (integer)
    Description: Max number of posts to scrape per subreddit or keyword (1–1000). If you have 3 sources and set 50, you can get up to 150 posts total.
    Default: 50

  • maxComments (integer)
    Description: Max comments to fetch for each post (0–1000). Set to 0 to skip comments and only get post metadata (faster).
    Default: 100

  • sortOrder (string)
    Description: How Reddit should sort the posts. Hot = trending now, New = latest first, Top = most upvoted, Rising = gaining traction.
    Default: "top"

  • timeFilter (string)
    Description: Time range for results. Only applies when sortOrder is Top or Rising (ignored for Hot and New).
    Default: "week"

  • proxyConfiguration (object)
    Description: Choose which proxies to use. If Reddit blocks a request, the actor automatically falls back: no proxy → datacenter → residential. Recommended for large runs or when you hit blocks.
    Default: { "useApifyProxy": false }
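
Since startUrls mixes three kinds of entries, here is an illustrative classifier showing how such a line might be interpreted. This is a plausible sketch only; the actor's internal parsing is not documented and may differ.

```python
def classify_source(entry):
    # Guess how a startUrls line would be read: full URL, subreddit, or keyword.
    # Illustrative only -- the actor's actual parsing may differ.
    entry = entry.strip()
    if entry.startswith("http://") or entry.startswith("https://"):
        return "url"
    if entry.startswith("r/"):
        return "subreddit"
    # A single bare token reads as a subreddit name; anything with
    # whitespace reads as a search query.
    if " " not in entry:
        return "subreddit"
    return "keyword"
```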

Example JSON output item (one post)

{
  "subreddit": "news",
  "title": "Example post title",
  "author": "username",
  "score": 156,
  "num_comments": 42,
  "created_utc": 1703123456,
  "permalink": "https://www.reddit.com/r/news/comments/abc123/example_post/",
  "body": "Post content...",
  "thumbnail_url": "https://preview.redd.it/...",
  "image_url": "https://i.redd.it/...",
  "comments": [
    {
      "author": "commenter1",
      "body": "Comment text...",
      "score": 23,
      "created_utc": 1703123457,
      "replies": []
    }
  ],
  "post_id": "abc123",
  "success": true,
  "error_message": null
}

Notes:

  • comments includes nested replies with the same structure (author, body, score, created_utc, replies).
  • Some fields (e.g., image_url, thumbnail_url, body) may be empty if not present in the original post.

FAQ

Does the scraper include comments?

Yes. Set maxComments > 0 to fetch comment threads. If you only need post metadata, set maxComments to 0 for faster runs.

Can I target subreddits, URLs, and keywords together?

Yes. Add them all to startUrls (one per line). The actor accepts full Reddit URLs, subreddit names, and search keywords in one list.

How many posts can I scrape per source?

Up to maxPosts per source, between 1 and 1000. For example, with 3 sources and maxPosts set to 50, you can collect up to 150 posts in total.

Do I need login or API keys?

No. The actor works with publicly accessible Reddit endpoints and does not require login or API keys.

What happens if Reddit blocks requests?

The actor attempts automatic proxy fallback (direct → datacenter → residential) and retries on blocks and transient errors to improve success rates.

Can I export results to CSV?

Yes. Open the dataset in the Output tab and export as JSON or CSV, or fetch data programmatically via the Apify API.

Can developers use this from Python or Node.js?

Yes. Trigger runs and download datasets via the Apify API from Python or Node.js to build automated pipelines or integrate with your systems.

Does keyword search work across Reddit?

Yes. If a startUrls entry looks like a keyword query, the actor uses Reddit’s search endpoint to collect matching posts.
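
The actor's internal endpoints are not documented, but Reddit does expose public JSON listings such as search.json; a keyword query of the kind described above could be built like this (purely illustrative, with the same sort and time parameters this actor accepts).

```python
from urllib.parse import urlencode

def reddit_search_url(query, sort="top", time_filter="week", limit=50):
    # Build a public Reddit search URL of the kind such a scraper could use.
    # Illustrative only -- not necessarily what the actor calls internally.
    params = urlencode({"q": query, "sort": sort, "t": time_filter,
                        "limit": limit})
    return f"https://www.reddit.com/search.json?{params}"
```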

Final thoughts

Reddit Posts Scraper is built to extract structured Reddit posts (and comments) from subreddits, URLs, and keyword searches—fast, reliable, and export‑ready. With parallel comment fetching, automatic proxy fallback, and clean JSON/CSV output, it’s ideal for marketers, developers, data analysts, and researchers. Use the Apify API to integrate with Python or Node.js and automate your subreddit monitoring pipeline. Start extracting smarter Reddit insights at scale.