Reddit Data Scraper – Scrape Posts, Comments, Upvotes & More

Extract Reddit posts, comments, upvotes, and subreddit data with this powerful Reddit scraper. Ideal for data analysis, lead generation, trend research, and AI datasets. Scrape Reddit data at scale without API limits and export results in JSON, CSV, or Excel format.


Reddit Post Scraper

What is Reddit Post Scraper?

Reddit Post Scraper is a powerful Reddit data extraction tool built on Apify that allows you to scrape posts and subreddit listings at scale using Reddit’s public JSON endpoints (no browser required). It is designed for marketers, researchers, developers, and businesses who want to automate trend analysis, content research, lead generation, and AI dataset creation — without relying on Reddit’s official API.

Why Use This Reddit Scraper?

Use this scraper to:

  • Extract trending posts from any subreddit
  • Analyze discussions, engagement, and content performance
  • Track upvotes, comments, and popularity over time
  • Build datasets for AI, sentiment analysis, and research
  • Automate Reddit data collection workflows

Features

  • Scrape subreddit listings (hot, new, top, rising, etc.) from subreddit URLs or names.
  • Scrape single posts directly by URL.
  • Extract rich post-level data:
    • Post title and body (self text + HTML)
    • Author and subreddit information
    • Score, upvote ratio, number of comments
    • Permalink, link URL, flair, thumbnail, domain
    • Creation time and scrape time
  • Uses Reddit’s JSON API (.json endpoints) — no headless browser needed.
  • Structured output for analytics and automation.

How to Use Reddit Post Scraper on Apify

Using the Actor

To use this actor on Apify, follow these steps:

  1. Go to the Reddit Post Scraper on the Apify platform.

  2. Input Configuration:

    • Provide one or more Reddit URLs (subreddit listings or individual posts), or a subreddit name plus sort mode.
    • Configure how many posts to fetch per URL and your proxy settings for reliability.

Input Configuration

The actor supports multiple input styles. A typical configuration looks like:

{
  "startUrls": [
    { "url": "https://www.reddit.com/r/marketing/top" },
    { "url": "https://www.reddit.com/r/startups/new" }
  ],
  "subreddit": "python",
  "sort": "hot",
  "maxPostsPerUrl": 25,
  "language": "en",
  "proxyCountry": "AUTO_SELECT_PROXY_COUNTRY",
  "proxyUrl": null
}

Common fields:

  • productUrls (optional): Array of Reddit URL strings (subreddit listings or individual posts).
  • startUrls (optional): Array of { "url": "..." } objects used as starting points (subreddits or posts).
  • url (optional): Single Reddit URL (legacy single-URL input).
  • subreddit (optional): Subreddit name only, e.g. "python", "AskReddit".
  • sort (optional): Sort order for subreddit listings (e.g. hot, new, top, rising, controversial).
  • maxPostsPerUrl (optional): Maximum number of posts to fetch per listing URL (typically 1–100, default ~25).
  • language (optional): Language/locale hint for requests (default: en).
  • proxyCountry (optional): Apify proxy country, e.g. AUTO_SELECT_PROXY_COUNTRY, US, GB, DE, FR, JP, CA, IT.
  • proxyUrl (optional): Custom proxy URL (e.g. Webshare). When set, overrides Apify Proxy.
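
The alternative input styles above (startUrls, productUrls, url, subreddit + sort) could be merged into a single URL list along these lines. This is a hypothetical sketch, not the actor's actual code; the field names match the input schema above, but the merge order and the `collect_start_urls` helper are assumptions.

```python
# Hypothetical sketch: collapse the documented input styles into one URL list.
# Field names match the input schema; merge order is an assumption.

def collect_start_urls(actor_input: dict) -> list[str]:
    """Merge the alternative input styles into one de-duplicated URL list."""
    urls: list[str] = []
    # startUrls: array of {"url": "..."} objects
    for entry in actor_input.get("startUrls") or []:
        if entry.get("url"):
            urls.append(entry["url"])
    # productUrls: plain array of URL strings
    urls.extend(actor_input.get("productUrls") or [])
    # url: legacy single-URL input
    if actor_input.get("url"):
        urls.append(actor_input["url"])
    # subreddit + sort: build a listing URL from the subreddit name
    if actor_input.get("subreddit"):
        sort = actor_input.get("sort", "hot")
        urls.append(f"https://www.reddit.com/r/{actor_input['subreddit']}/{sort}")
    # de-duplicate while preserving order
    return list(dict.fromkeys(urls))

example_input = {
    "startUrls": [{"url": "https://www.reddit.com/r/marketing/top"}],
    "subreddit": "python",
    "sort": "hot",
}
print(collect_start_urls(example_input))
# ['https://www.reddit.com/r/marketing/top', 'https://www.reddit.com/r/python/hot']
```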
  3. Run the Actor:

    • Click Start to begin scraping.
    • The actor fetches .json from each Reddit URL and normalizes post data.
  4. Access Your Results:

    • View results in the Dataset tab.
    • Export data in JSON, CSV, or Excel.
    • Access via the Apify API for programmatic workflows.
  5. Schedule Regular Runs (Optional):

    • Schedule the actor to run periodically to track trends and subreddit activity over time.

Output

Each Reddit post becomes one item in the dataset. According to the dataset schema, each item typically includes:

  • url: Full URL to the Reddit post.
  • permalink: Reddit permalink path.
  • source_url: The URL that was scraped (listing or post URL).
  • id: Reddit post ID.
  • title: Post title.
  • author: Post author username.
  • subreddit: Subreddit name.
  • subreddit_name_prefixed: Subreddit with prefix, e.g. r/python.
  • score: Net score (upvotes - downvotes).
  • upvote_ratio: Ratio of upvotes (0–1).
  • num_comments: Number of comments.
  • selftext: Post body (self text).
  • selftext_html: Post body in HTML (if available).
  • link_url: URL of linked content (for link posts).
  • is_self: true if text (self) post.
  • over_18: true if marked NSFW.
  • link_flair_text: Post flair text (if present).
  • thumbnail: Thumbnail image URL (if available).
  • domain: Domain of the linked content (for link posts).
  • created_utc: ISO timestamp of when the post was created (UTC).
  • scraped_at: Timestamp of when the post was scraped.

Example item (simplified):

{
  "url": "https://www.reddit.com/r/marketing/comments/xxxxxx/example_post/",
  "permalink": "/r/marketing/comments/xxxxxx/example_post/",
  "source_url": "https://www.reddit.com/r/marketing/top",
  "id": "xxxxxx",
  "title": "Example Reddit post title",
  "author": "example_user",
  "subreddit": "marketing",
  "subreddit_name_prefixed": "r/marketing",
  "score": 512,
  "upvote_ratio": 0.96,
  "num_comments": 74,
  "selftext": "Post body text...",
  "selftext_html": "<p>Post body text...</p>",
  "link_url": null,
  "is_self": true,
  "over_18": false,
  "link_flair_text": "Discussion",
  "thumbnail": "https://b.thumbs.redditmedia.com/...",
  "domain": "self.marketing",
  "created_utc": "2025-01-01T12:00:00Z",
  "scraped_at": "2025-01-01T12:05:00Z"
}

➡️ Output is clean, structured, and ready for analysis, trend tracking, or automation.

How the Scraper Works

The Reddit Post Scraper uses Reddit’s public JSON endpoints (no browser or official API) to fetch and normalize post data:

  1. URL normalization: For each input URL or subreddit, the actor builds the corresponding .json endpoint.
  2. HTTP requests: It sends HTTP requests with a descriptive User-Agent and optional proxy configuration.
  3. Data extraction: It parses the JSON response, extracts relevant post fields, and converts timestamps to ISO strings.
  4. Dataset writing: Each post is saved as a structured item in the default Apify dataset.
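
Steps 1 and 3 can be sketched as follows. This is an illustrative, simplified version under stated assumptions: the .json endpoint is built by appending ".json" to the normalized URL, and the field names mirror the output schema above; the helper functions `to_json_endpoint` and `normalize_post` are not the actor's actual internals.

```python
# Illustrative sketch of URL normalization (step 1) and post
# normalization with ISO timestamps (step 3). Not the actor's real code.
from datetime import datetime, timezone

def to_json_endpoint(url: str) -> str:
    """Step 1: normalize a listing/post URL to its .json endpoint."""
    return url.rstrip("/") + ".json"

def normalize_post(raw: dict, source_url: str) -> dict:
    """Step 3: extract a subset of the documented fields from one raw post."""
    created = datetime.fromtimestamp(raw["created_utc"], tz=timezone.utc)
    return {
        "id": raw.get("id"),
        "title": raw.get("title"),
        "author": raw.get("author"),
        "subreddit": raw.get("subreddit"),
        "score": raw.get("score"),
        "num_comments": raw.get("num_comments"),
        "permalink": raw.get("permalink"),
        "source_url": source_url,
        # epoch seconds -> ISO 8601 UTC string, as in the output schema
        "created_utc": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

print(to_json_endpoint("https://www.reddit.com/r/python/hot"))
# https://www.reddit.com/r/python/hot.json
```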

Anti-blocking & Reliability

To keep scraping stable on Reddit:

  • Uses a descriptive User-Agent (Sovanza Reddit Post Scraper/1.0).
  • Adds a small delay between requests to reduce rate limiting.
  • Supports Apify Proxy and custom proxyUrl so you can use residential or region-specific IPs.
  • Retries failed requests where appropriate to handle transient issues.
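
A retry-with-delay helper in the spirit of the notes above might look like this. It is a hedged sketch, not the actor's actual retry logic; the `fetch_with_retries` name and the linear backoff are assumptions. The fetch callable is injected so the pattern works with any HTTP client.

```python
# Illustrative retry helper: call fetch(url) up to `retries` times,
# sleeping between attempts. The actor's real retry logic may differ.
import time

HEADERS = {"User-Agent": "Sovanza Reddit Post Scraper/1.0"}

def fetch_with_retries(fetch, url, retries=3, delay=1.0):
    """Retry transient failures (e.g. 429/503) with a simple linear backoff."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as err:  # transient network / rate-limit errors
            last_error = err
            time.sleep(delay * (attempt + 1))  # linear backoff between tries
    raise last_error
```

In practice you would pass something like `lambda u: requests.get(u, headers=HEADERS, timeout=30)` as the `fetch` argument.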

Performance Optimization

  • Processes multiple listing URLs in a single run.
  • Uses lightweight HTTP+JSON (no headless browser), which is faster and cheaper.
  • Lets you control maxPostsPerUrl to tune between speed and depth.

Why Choose This Actor?

  • Scalable Reddit data extraction from subreddits and posts.
  • Extracts rich post data and engagement metrics.
  • No official Reddit API required.
  • Automation-ready via Apify API, scheduling, and webhooks.
  • Produces clean, structured datasets suitable for analytics and AI.

FAQ

How does Reddit Post Scraper work?

It appends .json to Reddit URLs (subreddits or posts), fetches the public Reddit JSON API, normalizes post data, and saves it to an Apify dataset.

Can I scrape multiple subreddits at once?

Yes. You can provide multiple subreddit listing URLs or use multiple startUrls and/or productUrls in a single run.

Does it require Reddit API credentials?

No. It works with publicly available Reddit JSON endpoints and does not use Reddit’s official OAuth API.

Can I extract comments and replies?

This actor focuses on post-level data (title, body, score, metadata). If you need full comment trees, pair it with a dedicated comments scraper or extend this one.

Is the data accurate?

Yes. Data is fetched in real time from Reddit’s public JSON responses.

Can I automate scraping?

Yes. You can use Apify scheduling, webhooks, and the API to run it regularly and integrate it into pipelines.

What formats are supported?

JSON, CSV, Excel via Apify dataset export, plus API output for programmatic access.
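
As a hedged sketch of working with an export programmatically: a JSON dataset export can be converted to CSV with the standard library alone. The column selection here is illustrative, and the inline items stand in for a real download.

```python
# Illustrative: turn a (simulated) JSON dataset export into CSV rows
# using only the standard library. Column choice is an assumption.
import csv
import io
import json

export = json.loads("""[
  {"id": "a1", "title": "First post", "score": 10, "num_comments": 3},
  {"id": "b2", "title": "Second post", "score": 42, "num_comments": 7}
]""")

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "title", "score", "num_comments"])
writer.writeheader()
writer.writerows(export)
print(buf.getvalue())
```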

Is it suitable for AI and sentiment analysis?

Yes. The structured text fields (title, selftext, etc.) are ideal for NLP, topic modeling, and sentiment analysis workflows.

Is it legal to scrape Reddit?

Scraping publicly available data is generally allowed, but you should comply with Reddit’s terms of service and all applicable laws.

Actor permissions

This Actor is designed to work with limited permissions. It only reads input and writes to its default dataset; it does not access other user data or require full account access.

To set limited permissions in Apify Console:

  1. Open your Actor on the Apify platform.
  2. Go to the Source tab (or Settings).
  3. Click Review permissions (or open Settings → Permissions).
  4. Select Limited permissions and save.

Using limited permissions builds user trust and can improve your Actor's quality score in the Store.

Anti-blocking Notes

  • Reddit requires a descriptive User-Agent; the actor sends Sovanza Reddit Post Scraper/1.0.
  • A short delay between requests helps reduce the risk of rate limiting.
  • When running on Apify, enable a suitable proxy configuration (often residential) to reduce 429/403 errors, as Reddit may block datacenter IPs.

Limitations

  • Some subreddits or posts may be restricted, removed, or rate-limited.
  • Reddit’s JSON structure can change, which may require actor updates.
  • Large-scale scraping may require appropriate Apify plan limits and careful proxy usage.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Get Started

Start extracting Reddit posts and build powerful datasets for research, marketing, and automation today. 🚀