Reddit Data Scraper – Scrape Posts, Comments, Upvotes & More
Extract Reddit posts, comments, upvotes, and subreddit data with this powerful Reddit scraper. Ideal for data analysis, lead generation, trend research, and AI datasets. Scrape Reddit data at scale without API limits and export results in JSON, CSV, or Excel format.

Pricing

$5.00/month + usage

Rating

0.0

(0)

Developer

Sovanza

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 2

Monthly active users: 1

Last modified: a day ago


Reddit Post Scraper – Extract Posts, Comments, Upvotes & Data

Extract Reddit posts, scores (upvote / net score), comment counts, and subreddit metadata with this scraper—ideal for trend analysis, content research, and NLP-ready datasets. Export in JSON, CSV, or Excel from your Apify dataset.

Scope: This actor is post-level only. It does not download comment text, nested replies, or awards. You get num_comments and rich post fields (title, selftext, link, score, etc.). For full comment trees you would need a separate flow or actor extension.

What is Reddit Post Scraper?

Reddit Post Scraper is a Reddit data extraction tool built on Apify that reads Reddit’s public JSON (append .json to any listing or post URL). It requires no browser automation and no official Reddit API key. It is designed for:

  • Marketers
  • Researchers
  • Data analysts
  • Content creators
  • AI developers

It turns subreddit listings and post URLs into a structured dataset for trends, engagement signals, and text analysis.
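As a sketch of the public-JSON approach described above (the actor's exact URL construction may differ), a subreddit listing endpoint can be built like this:

```python
def listing_json_url(subreddit: str, sort: str = "hot", limit: int = 25) -> str:
    """Build the public JSON endpoint for a subreddit listing.

    Reddit serves any listing page as JSON when .json is appended to the
    path; `limit` controls how many posts come back (Reddit caps it at 100).
    """
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={limit}"
```

Calling `listing_json_url("python", "top")` yields `https://www.reddit.com/r/python/top.json?limit=25`, which any HTTP client can fetch directly.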

Why This Reddit Scraper is Powerful

  • Fast and lightweight: HTTP + JSON only (no Playwright).
  • Flexible input: Subreddit listing URLs, direct post URLs, productUrls (string list, same pattern as Amazon/eBay actors), startUrls, legacy url, or subreddit + sort.
  • Engagement fields: score, upvote_ratio, num_comments, plus title and body text for NLP.
  • Multi-URL runs: Scrape several listings or posts in one run.
  • Proxy-aware: Optional Apify residential proxy or custom proxyUrl—Reddit often blocks datacenter IPs.
  • Automation: Apify API, schedules, webhooks; export CSV/Excel or integrate with Sheets and BI tools.

➡️ Built for insight and datasets at the post level, not full comment-thread dumps.

What Data Does Reddit Post Scraper Extract?

Each dataset item is one post (see .actor/dataset_schema.json).

Post data

  • Title (title)
  • Body (selftext, optional selftext_html)
  • Post URL (url) and permalink
  • Subreddit (subreddit, subreddit_name_prefixed)
  • Post id
  • source_url (the listing or post URL used for the request)

Engagement metrics

  • score — net score (upvotes minus downvotes, as returned by Reddit)
  • upvote_ratio — upvote ratio when present
  • num_comments — comment count (not the comments themselves)

Author and link data

  • author — post author username when present
  • link_url — outbound link for link posts
  • thumbnail — thumbnail URL when available
  • domain — link domain

Other metadata

  • is_self — text post vs link post
  • over_18 — NSFW flag
  • link_flair_text — flair when present
  • created_utc — post creation time (ISO, UTC)
  • scraped_at — scrape timestamp

Not included: comment bodies, nested replies, awards, or “search query” scraping across all of Reddit (use specific subreddit/listing URLs or post URLs).

➡️ All fields are structured and exportable as JSON, CSV, or Excel.

Advanced Features

  • Subreddit listings: Use URLs like https://www.reddit.com/r/python/hot or .../top, or pass subreddit + sort (hot, new, top, rising, controversial).
  • Direct posts: Open a single post URL; the actor normalizes it to the .json endpoint.
  • Volume control: maxPostsPerUrl — up to 100 posts per listing request (Reddit/API limit per call).
  • Locales: language sets Accept-Language on requests.
  • Bulk runs: Multiple URLs in productUrls or startUrls in one actor run.
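The normalization and volume control above can be illustrated with a small helper; this is a sketch of the idea, not the actor's exact code. It appends .json when missing and clamps the per-request limit to Reddit's allowed 1–100 range:

```python
def normalize_to_json(url: str, max_posts: int = 25) -> str:
    # Strip a trailing slash and append .json if the URL is not already
    # a JSON endpoint, then clamp the limit to Reddit's 1-100 per-call range.
    base = url.rstrip("/")
    if not base.endswith(".json"):
        base += ".json"
    limit = max(1, min(max_posts, 100))
    sep = "&" if "?" in base else "?"
    return f"{base}{sep}limit={limit}"
```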

How to Use Reddit Post Scraper on Apify

  1. Open Reddit Post Scraper on Apify.
  2. Under Run settings, enable Proxy where possible (this actor uses residential proxy when Apify Proxy is available).
  3. Configure Input (at least one URL source or subreddit).
  4. Click Start.
  5. Open Dataset → export JSON, CSV, or Excel, or use the API.
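Step 5's API route can be sketched against Apify's generic "run actor" HTTP endpoint. The actor ID below is a guess based on the developer name, so verify the real ID in the Apify Console before running:

```python
import json
import os
import urllib.request

# Hypothetical actor ID ("<username>~<actor-name>") -- verify in the Console.
ACTOR_ID = "sovanza~reddit-post-scraper"

run_input = {
    "productUrls": ["https://www.reddit.com/r/python/top"],
    "maxPostsPerUrl": 50,
    "language": "en",
    "proxyCountry": "US",
}

def run_url(actor_id: str, token: str) -> str:
    # Apify's "run actor" endpoint; the run input goes in the POST body as JSON.
    return f"https://api.apify.com/v2/acts/{actor_id}/runs?token={token}"

if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN", "")
    if token:
        req = urllib.request.Request(
            run_url(ACTOR_ID, token),
            data=json.dumps(run_input).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            print(resp.status)  # 201 indicates the run was created
```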

Local run

cd reddit-post-scraper
pip install -r requirements.txt
# Optional: INPUT.json
python main.py

Input configuration

Recommended on Apify: enable Apify Proxy (residential). Optionally set proxyUrl to your own HTTP proxy.

Example — multiple subreddit listings (use paths that include sort when you want something other than default hot, e.g. top):

{
  "productUrls": [
    "https://www.reddit.com/r/marketing/hot",
    "https://www.reddit.com/r/startups/top"
  ],
  "maxPostsPerUrl": 100,
  "language": "en",
  "proxyCountry": "US",
  "proxyUrl": ""
}

Alternate — startUrls (strings or { "url" } objects):

{
  "startUrls": [
    { "url": "https://www.reddit.com/r/marketing/hot" },
    "https://www.reddit.com/r/startups/new"
  ],
  "maxPostsPerUrl": 50,
  "language": "en",
  "proxyCountry": "AUTO_SELECT_PROXY_COUNTRY"
}

Using subreddit name + sort (no full URL required):

{
  "subreddit": "python",
  "sort": "top",
  "maxPostsPerUrl": 25,
  "language": "en",
  "proxyCountry": "AUTO_SELECT_PROXY_COUNTRY"
}

| Field | Type | Description |
| --- | --- | --- |
| productUrls | array | Reddit URLs (string list). Subreddit listing or post URLs (same naming as Amazon/eBay scrapers in this repo). |
| startUrls | array | Alternate: strings or { "url" } request-list style. |
| url | string | Legacy single URL. |
| subreddit | string | Subreddit name without r/ (used if no URLs given). |
| sort | string | hot, new, top, rising, controversial; used with subreddit only. |
| maxPostsPerUrl | integer | Max posts per URL (1–100, default 25). |
| language | string | Request locale hint (en, de, …). |
| proxyCountry | string | Apify proxy country when using platform proxy (US, GB, … or AUTO_SELECT_PROXY_COUNTRY). |
| proxyUrl | string | Optional custom proxy URL; overrides Apify proxy when set. |

Note: There is no scrapeComments or maxPosts field in this actor — use maxPostsPerUrl. Reddit does not support arbitrary “search all Reddit” through this input; point at specific subreddit feeds or post URLs.

Output

| Field | Description |
| --- | --- |
| url | Full post URL |
| permalink | Permalink path / URL |
| source_url | Input listing or post URL used |
| id | Reddit post id |
| title | Post title |
| author | Author username |
| subreddit | Subreddit name |
| subreddit_name_prefixed | e.g. r/python |
| score | Net score |
| upvote_ratio | Upvote ratio (0–1) |
| num_comments | Number of comments |
| selftext | Text body (truncated at 50k chars in code) |
| selftext_html | HTML body when present |
| link_url | Linked URL for link posts |
| is_self | Text post flag |
| over_18 | NSFW flag |
| link_flair_text | Flair |
| thumbnail | Thumbnail URL |
| domain | Link domain |
| created_utc | Created time (ISO) |
| scraped_at | Scrape time (ISO) |

Example item (illustrative):

{
  "url": "https://www.reddit.com/r/marketing/comments/abc123/example/",
  "permalink": "https://www.reddit.com/r/marketing/comments/abc123/example/",
  "source_url": "https://www.reddit.com/r/marketing/hot",
  "id": "abc123",
  "title": "Example Reddit post",
  "author": "example_user",
  "subreddit": "marketing",
  "subreddit_name_prefixed": "r/marketing",
  "score": 512,
  "upvote_ratio": 0.96,
  "num_comments": 74,
  "selftext": "Post body…",
  "link_url": null,
  "is_self": true,
  "over_18": false,
  "created_utc": "2026-03-15T12:00:00+00:00",
  "scraped_at": "2026-03-30T12:05:00+00:00"
}

How the scraper works

  1. Collects URLs from productUrls, startUrls, url, or subreddit + sort.
  2. Builds Reddit .json URLs with a limit capped by maxPostsPerUrl (max 100).
  3. Requests JSON with a descriptive User-Agent (CloudBots Reddit Post Scraper/1.0 (Apify; +https://apify.com)), optional proxy, and ~2s delay between URLs.
  4. Parses listing responses into post objects, or a single post from a post/comments JSON payload.
  5. Pushes one dataset item per post.
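Step 4 can be illustrated using the well-known shape of Reddit listing JSON, `{"data": {"children": [{"data": {...post...}}, ...]}}`. The field selection here is a simplified subset of what the actor stores, for illustration only:

```python
def parse_listing(payload: dict) -> list[dict]:
    # A Reddit listing is {"data": {"children": [{"data": {...post...}}, ...]}}.
    # Each child's "data" dict holds the post fields.
    posts = []
    for child in payload.get("data", {}).get("children", []):
        d = child.get("data", {})
        posts.append({
            "id": d.get("id"),
            "title": d.get("title"),
            "score": d.get("score"),
            "num_comments": d.get("num_comments"),
            "subreddit": d.get("subreddit"),
        })
    return posts
```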

Anti-blocking and reliability

  • Use residential proxy on Apify (actor attempts Apify Proxy with RESIDENTIAL group when credentials exist).
  • proxyUrl for your own provider if needed.
  • Reddit may return 403/429 without proxy; results may be empty—check logs and proxy settings.

Integrations and API

  • Apify API — runs and datasets.
  • Python / Node.js — Apify clients.
  • Zapier / Make / Google Sheets — via Apify connectors or HTTP.

FAQ

How can this scraper help find trending topics?
Use top, hot, or rising URLs or sort values on niche subreddits, then rank by score and num_comments in your own analysis.
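For example, once the dataset is exported, ranking by a blended engagement signal is a few lines of Python (the equal weighting of score and comment count here is arbitrary; tune it to your use case):

```python
def top_posts(posts: list[dict], n: int = 5) -> list[dict]:
    # Rank by net score plus comment count, highest first.
    return sorted(
        posts,
        key=lambda p: p.get("score", 0) + p.get("num_comments", 0),
        reverse=True,
    )[:n]
```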

Can I use it for content ideation?
Yes—high score / num_comments posts plus titles and selftext show what resonates in a community.

Does it extract full comment discussions?
No. Only num_comments at the post level. Comment bodies and nested replies are not included. Extend the actor or use another tool if you need threads.

Sentiment and AI datasets?
title and selftext are suitable for NLP; add comment scraping separately if you need conversational sentiment.

Monitor a subreddit over time?
Schedule runs with the same input and compare exports.

Data freshness?
Data reflects Reddit’s JSON at request time; content changes—refresh on a schedule for monitoring.

Integrations?
Export or API into Sheets, warehouses, or automation tools.

How much data?
Up to 100 posts per URL per request, times the number of URLs in the run. Plan limits apply on Apify.

Is scraping Reddit legal?
Use only in line with Reddit’s terms and applicable law; respect privacy and rate limits.

SEO keywords (high-intent)

reddit scraper, reddit post scraper, reddit comment scraper, scrape reddit data, subreddit scraper, reddit scraping api, reddit trend analysis, reddit sentiment analysis, reddit data extraction, reddit automation tool

Actor permissions

This actor reads its input and writes to its own dataset only. In the Apify Console you can grant it limited permissions for Store trust (Settings → Permissions).

Limitations

  • No comment bodies or nested replies in the current implementation.
  • No global Reddit “search query” input—use specific subreddit/listing or post URLs.
  • Some subs or posts may be private, removed, or geo-blocked; JSON may be empty.
  • selftext / selftext_html truncated at 50,000 characters in code.
  • Reddit JSON shape can change; the actor may need updates.

License

MIT.

Get started

Configure URLs or subreddit + sort, enable proxy on Apify, run the actor, and export your post dataset.