Pricing

from $1.50 / 1,000 results

Reddit Historical Archive Scraper - Old Posts by Date

Pushshift alternative to scrape old Reddit posts and comments without an API key. Full-text comment search, user history, export to CSV/JSON.

Developer

Logiover

Last modified: 13 hours ago

Reddit Historical Archive Scraper — Old Posts & Comments by Date (No API)

Scrape years of old Reddit posts and comments by date, including content that Reddit's own search and listings can no longer reach. This Reddit historical scraper queries the Arctic-Shift archive (a maintained Pushshift successor, indexed into 2026) with PullPush.io as a fallback, so you can pull deep history, run full-text searches over comment bodies, reconstruct entire threads and dump complete user histories. Point it at subreddits, post IDs, usernames, search terms or raw Reddit URLs and get clean, flat rows. It is fast and runs without a browser, Reddit login, API key, client secret or proxy.

🏆 Why this Reddit historical scraper?

Posts + comments in flat rows · thousands of records per run · full-text comment search Reddit itself can't do · date-bounded windows going back 10+ years · export to JSON / CSV / Excel. The unofficial Reddit / Pushshift API alternative for archival, research, AI training corpora and brand monitoring.

✨ What this Actor does / Key features

🗓️ Deep historical reach — the archive backends index Reddit content from its early years to the present, so you can pull posts and comments many years old, well beyond Reddit's own search depth and listing limits.
🔎 Full-text comment search: search inside archived comment bodies across the whole archive, beyond what Reddit's native search reaches. Find archived comments mentioning a brand, term or phrase across Reddit history.
🧵 Full thread reconstruction — fetch a single post by ID together with all of its archived comments as a flat list with a reconstructed depth field.
👤 Complete user histories — pull a username's submitted posts and/or comments across their entire archived Reddit history.
📅 Precise date + score filters: narrow with afterDate / beforeDate (ISO YYYY-MM-DD or a full ISO timestamp) and a minScore threshold for time-bounded or high-engagement slices.
🔀 Flexible targeting — mix and match subreddits, post IDs, usernames, post search queries, comment search queries and raw Reddit URLs (auto-detected) in a single run.
🧱 Flat, tidy rows — every post or comment is one row with a type field (post or comment), ready for spreadsheets, warehouses and LLM pipelines.
🛡️ No-block backends — Arctic-Shift (primary) and PullPush (fallback) work over a direct connection with polite rate limiting, retries and exponential backoff. No login, OAuth, API key, client secret or proxy required (a proxy toggle exists for extra robustness).

🚀 Quick start (3 steps)

1. Configure: fill in any combination of Subreddits, Post IDs, Usernames, Post Search Queries, Comment Search Queries or raw Reddit URLs. Optionally add a date window, a minimum score, a sort and item caps.
2. Run: click Start. The Actor queries the archive, paginates each target and saves each post or comment as one flat row.
3. Get your data: open the Output tab and export to JSON, CSV, Excel, HTML, XML or JSONL, or pull it via the Apify API and webhooks.

📥 Input

Fill in at least one target (subreddit, post ID, username, search query or URL). Everything else is optional.

Example — a subreddit's top posts in a date window

{
  "subreddits": ["wallstreetbets"],
  "afterDate": "2021-01-01",
  "beforeDate": "2021-02-28",
  "sort": "top",
  "minScore": 100,
  "maxItems": 1000
}

Example — every comment mentioning a brand (full-text search)

{
  "commentSearchQueries": ["Notion"],
  "afterDate": "2024-01-01",
  "sort": "new",
  "maxItems": 2000
}

Example — a user's full post + comment history

{
  "usernames": ["spez"],
  "userContent": "overview",
  "maxItemsPerTarget": 500
}

Example — one post with its entire archived comment tree

{
  "postIds": ["1abc234"],
  "maxItems": 0
}

Field	Type	Description
`subreddits`	array	Subreddit names to scrape historically (no `/r/` prefix). Each runs as a listing scrape.
`postIds`	array	Reddit post IDs (the alphanumeric part in `/comments/XXXX/`). Each post is fetched with all of its archived comments.
`usernames`	array	Reddit usernames (no `/u/` prefix). Returns historical posts and/or comments per `userContent`.
`searchQueries`	array	Full-text search across archived posts (title + selftext), combinable with dates, subreddit and score filters.
`commentSearchQueries`	array	Full-text search across archived comment bodies — a unique feature Reddit's own search cannot do.
`startUrls`	array	Any Reddit URL — subreddit, post, user or search. Type is auto-detected; mix URL types freely.
`sort`	string	`new` (newest archived first) or `top` (highest score first). `hot`, `rising`, `controversial`, `best` are Reddit-algorithm-specific and fall back to `new`.
`afterDate` / `beforeDate`	string	ISO date bounds (`YYYY-MM-DD` or full ISO timestamp) for time-window queries.
`minScore`	integer	Only return posts/comments at or above this score (upvotes − downvotes).
`userContent`	string	For usernames: `overview` (posts + comments), `submitted` (posts only) or `comments` (comments only).
`maxItems`	integer	Global hard cap across all targets. `0` = unlimited.
`maxItemsPerTarget`	integer	Cap per subreddit / post / user / search so one big target doesn't eat the budget.
`requestDelayMs`	integer	Milliseconds between requests to stay under archive rate limits (default `1200`).
`maxRetries`	integer	Retry attempts with exponential backoff on 429 / 5xx / timeout (default `5`).
`proxyConfiguration`	object	Optional. Backends are no-block on direct connection; route through Apify Proxy for extra robustness.
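
The requestDelayMs and maxRetries rows above describe retries with exponential backoff on 429 / 5xx responses. The Actor's exact schedule is not documented here, so the following is only a sketch of what such a policy commonly looks like. The doubling factor, the reuse of the 1200 ms delay as the base, the 60-second cap and the function names are all assumptions made for illustration.

```python
import random


def should_retry(status_code):
    """Retry on rate limiting (429) and server errors (5xx) only."""
    return status_code == 429 or 500 <= status_code < 600


def backoff_delays(max_retries=5, base_delay_ms=1200, cap_ms=60_000,
                   jitter=0.0, rng=random.random):
    """Return the wait in milliseconds before each retry attempt.

    Assumed policy: base * 2**attempt, capped at cap_ms, plus optional
    proportional jitter. This is not the Actor's confirmed behaviour.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap_ms, base_delay_ms * 2 ** attempt)
        delay += delay * jitter * rng()  # jitter=0.0 keeps the schedule deterministic
        delays.append(int(delay))
    return delays


print(backoff_delays())
```

With the defaults, the waits double from 1.2 seconds up to about 19 seconds, which gives a rough feel for how long a target can stall before a request is abandoned.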

Tip: to search comment bodies (impossible on Reddit itself), use commentSearchQueries. To pull a deep subreddit window, list the subreddit and set afterDate + beforeDate. To rebuild a full thread, pass its postIds — the comment tree comes back flat with a reconstructed depth.
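
If you generate inputs programmatically, a small client-side check can catch mistakes before a run starts. The sketch below uses the field names from the table above; the helper name build_run_input and its checks are hypothetical and are not part of the Actor. It only accepts plain YYYY-MM-DD dates, although the Actor also accepts full ISO timestamps.

```python
from datetime import date

TARGET_FIELDS = ("subreddits", "postIds", "usernames",
                 "searchQueries", "commentSearchQueries", "startUrls")
ARCHIVE_SORTS = {"new", "top"}


def build_run_input(sort="new", after_date=None, before_date=None, **fields):
    """Assemble an input dict and validate it before starting a run."""
    targets = [k for k in TARGET_FIELDS if fields.get(k)]
    if not targets:
        raise ValueError("provide at least one target field")
    run_input = dict(fields)
    # Per the table, Reddit-specific sorts (hot, rising, ...) fall back to "new".
    run_input["sort"] = sort if sort in ARCHIVE_SORTS else "new"
    for key, value in (("afterDate", after_date), ("beforeDate", before_date)):
        if value is not None:
            date.fromisoformat(value)  # raises ValueError on a malformed date
            run_input[key] = value
    # YYYY-MM-DD strings sort chronologically, so a string comparison is enough.
    if after_date and before_date and after_date > before_date:
        raise ValueError("afterDate must not be later than beforeDate")
    return run_input


example = build_run_input(
    sort="top",
    after_date="2021-01-01",
    before_date="2021-02-28",
    subreddits=["wallstreetbets"],
    minScore=100,
    maxItems=1000,
)
print(example)
```

The example call reproduces the first input scenario shown earlier in this section.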

📤 Output

Posts and comments come back as flat dataset rows, each with a type field (post or comment). Here is a trimmed sample — a post followed by a comment:

{
  "type": "post",
  "id": "l6omga",
  "fullname": "t3_l6omga",
  "subreddit": "wallstreetbets",
  "subredditNamePrefixed": "r/wallstreetbets",
  "author": "DeepFuckingValue",
  "title": "GME YOLO update — Jan 28 2021",
  "selftext": "…",
  "url": "https://www.reddit.com/r/wallstreetbets/comments/l6omga/…",
  "permalink": "/r/wallstreetbets/comments/l6omga/…",
  "domain": "self.wallstreetbets",
  "isSelf": true,
  "over18": false,
  "score": 234512,
  "upvoteRatio": 0.94,
  "numComments": 18422,
  "flairText": "YOLO",
  "createdUtc": "2021-01-28T14:03:11.000Z",
  "retrievedUtc": "2026-07-06T12:00:00.000Z"
}

{
  "type": "comment",
  "id": "gkz1abc",
  "fullname": "t1_gkz1abc",
  "parentId": "t3_l6omga",
  "linkId": "t3_l6omga",
  "subreddit": "wallstreetbets",
  "author": "diamond_hands_42",
  "body": "To the moon 🚀🚀",
  "score": 8123,
  "depth": 0,
  "controversial": 0,
  "gilded": 2,
  "permalink": "/r/wallstreetbets/comments/l6omga/_/gkz1abc/",
  "createdUtc": "2021-01-28T14:11:52.000Z",
  "edited": false,
  "distinguished": null
}
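
Because comments arrive as flat rows, you may want to nest them again on your side. The sketch below rebuilds a reply tree from the parentId and fullname fields shown in the samples above. It assumes replies carry their parent comment's fullname (t1_ prefix) in parentId, as top-level comments carry the post's (t3_ prefix). The function name build_thread is hypothetical, and this is not how the Actor itself computes depth.

```python
from collections import defaultdict


def build_thread(rows):
    """Nest flat comment rows under their parents via parentId -> fullname."""
    children = defaultdict(list)
    post = None
    for row in rows:
        if row["type"] == "post":
            post = row
        else:
            children[row["parentId"]].append(row)
    if post is None:
        raise ValueError("rows must contain the post itself")

    def attach(parent_fullname, depth):
        # Oldest replies first; ISO timestamps sort correctly as strings.
        ordered = sorted(children.get(parent_fullname, []),
                         key=lambda r: r["createdUtc"])
        return [
            {"id": c["id"], "depth": depth, "body": c["body"],
             "replies": attach(c["fullname"], depth + 1)}
            for c in ordered
        ]

    return {"post": post, "comments": attach(post["fullname"], 0)}


sample_rows = [
    {"type": "post", "id": "p1", "fullname": "t3_p1", "title": "Example"},
    {"type": "comment", "id": "c1", "fullname": "t1_c1", "parentId": "t3_p1",
     "body": "top-level", "createdUtc": "2021-01-28T14:11:52.000Z"},
    {"type": "comment", "id": "c2", "fullname": "t1_c2", "parentId": "t1_c1",
     "body": "reply", "createdUtc": "2021-01-28T14:15:00.000Z"},
]
thread = build_thread(sample_rows)
print(thread["comments"][0]["replies"][0]["body"])
```

Comments whose parent is missing from the rows are left out of the tree in this sketch, so compare the nested count against the flat count if completeness matters.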

💡 Use cases

Historical research & archival — collect a subreddit's posts and comments going back years for longitudinal study of a community.
Academic & journalism work — pull date-bounded windows of Reddit discussion around an event, topic or brand.
AI / NLP training corpora — build domain-specific datasets from years of niche-subreddit text and comment threads.
Brand & reputation monitoring — full-text search every comment ever made mentioning your brand or product — something Reddit's own search cannot do.
Account & thread analysis — pull a user's entire post and comment history, or fetch a single post with its complete archived comment tree.
Sentiment & trend backfills — reconstruct how discussion of a topic evolved across a precise date range.
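
For the trend backfill case, one simple starting point is to bucket exported rows by month. The sketch below relies only on the createdUtc, title, selftext and body fields from the output samples; the function name monthly_counts and the plain substring match are illustrative assumptions, not features of the Actor.

```python
from collections import Counter


def monthly_counts(rows, term=None):
    """Count rows per YYYY-MM, optionally keeping only rows that mention term."""
    counts = Counter()
    for row in rows:
        text = " ".join(str(row.get(k) or "") for k in ("title", "selftext", "body"))
        if term and term.lower() not in text.lower():
            continue
        counts[row["createdUtc"][:7]] += 1  # "2021-01-28T..." -> "2021-01"
    return dict(sorted(counts.items()))


demo_rows = [
    {"type": "post", "title": "GME update", "createdUtc": "2021-01-28T14:03:11.000Z"},
    {"type": "comment", "body": "gme to the moon", "createdUtc": "2021-01-28T14:11:52.000Z"},
    {"type": "comment", "body": "unrelated", "createdUtc": "2021-02-02T09:00:00.000Z"},
]
print(monthly_counts(demo_rows, term="GME"))
```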

👥 Who uses it

Academic researchers & social scientists · data journalists · AI / NLP & ML teams building corpora · brand, PR & reputation analysts · quantitative & market researchers · community managers & moderators · OSINT and trend analysts.

💰 Pricing

This Actor runs on a simple pay-per-result model — you pay for the posts and comments you extract, with no separate Apify platform fees to calculate. Try it on the free tier first, then scale up. See the Pricing tab on this page for the current rate.

❓ Frequently Asked Questions

Is it legal to scrape Reddit history? The Actor collects publicly available archived Reddit content. You are responsible for using the data in compliance with Reddit's terms, the archive providers' terms and applicable laws such as GDPR.

Does Reddit (or Pushshift) have a public API for old data? Reddit's own API and search only reach shallow, recent content, and Pushshift is no longer openly available. This Actor instead queries the Arctic-Shift archive (a maintained Pushshift successor indexed into 2026) with PullPush.io as a fallback — no API key needed.

Is this a Pushshift alternative? Yes. It covers the same deep Reddit history Pushshift used to serve, via Arctic-Shift (primary) and PullPush (fallback), with no API key.

Do I need an API key, a login or a proxy? No. The scraper uses public archive APIs over a direct connection — no Reddit account, OAuth, API key, client secret or proxy required. A proxy toggle is available for extra robustness but is off by default.

Can I scrape Reddit without an API key or login? Yes. It reads public archive APIs directly, so no Reddit account, OAuth, API key, client secret or proxy is required to pull old posts, comments or full user history.

How far back can I scrape / how much data can I get? The archive backends index Reddit content from its early years to the present, so you can pull posts and comments many years old — thousands of records per run, well beyond Reddit's own search depth. Use maxItems and maxItemsPerTarget to control volume and cost.

Can I search inside comment bodies? Yes. Use commentSearchQueries for full-text search across archived comment bodies, optionally bounded by afterDate / beforeDate, to find archived comments mentioning a term across Reddit history.

How do I export old Reddit posts and comments to CSV or JSON?

Run the scraper, then download the dataset as CSV, JSON, Excel, HTML, XML or JSONL straight from the Apify Console, or pull it through the API. Every post and comment is a flat row, so it imports cleanly into spreadsheets and databases.
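
If you would rather build the CSV yourself from downloaded JSON rows, the standard library is enough. Posts and comments carry different fields, so this sketch takes the union of keys as columns and leaves missing cells empty. The function name rows_to_csv is hypothetical, and the built-in export may order columns differently.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise mixed post/comment rows to CSV text with a shared header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv([
    {"type": "post", "id": "a", "title": "T"},
    {"type": "comment", "id": "b", "body": "hi"},
])
print(csv_text)
```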

How do I scrape a subreddit's posts from a specific year?

List the subreddit and set afterDate and beforeDate in YYYY-MM-DD format to pull a bounded historical window, far deeper than Reddit's own listing limits allow.
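
For a busy subreddit, a full year in one run can be large, so it may help to split the year into monthly windows and cap each one. This sketch generates the afterDate / beforeDate pairs. Whether beforeDate is inclusive is not stated on this page, so treating the last day of the month as the upper bound is an assumption, and month_windows is a hypothetical helper.

```python
from datetime import date, timedelta


def month_windows(year):
    """Yield (afterDate, beforeDate) strings, one pair per calendar month."""
    for month in range(1, 13):
        first = date(year, month, 1)
        next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        last = next_first - timedelta(days=1)
        yield first.isoformat(), last.isoformat()


inputs = [
    {"subreddits": ["wallstreetbets"], "afterDate": after, "beforeDate": before,
     "maxItemsPerTarget": 500}
    for after, before in month_windows(2021)
]
print(inputs[0])
```

Each dict can then be used as the input for a separate run.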

How do I download a Reddit user's full post and comment history?

Add the username to usernames and set userContent to overview to return both their submitted posts and comments across their entire archived Reddit history.

How do I find every Reddit comment mentioning a brand?

Use commentSearchQueries to run a full-text search across archived comment bodies — something Reddit's own search cannot do — and pull every comment mentioning your brand or product.

Building a wider social dataset? Pair this with the rest of the logiover social-media suite:

Platform	Actor
🔴 Reddit	Reddit Search Scraper · Reddit Subreddit Scraper
🐦 X / Twitter	X Tweet Scraper · Twitter/X Media Downloader
💼 LinkedIn	LinkedIn Profile Scraper · LinkedIn Company Scraper
▶️ YouTube	YouTube Channel Scraper · YouTube Comments Scraper
🎵 TikTok	TikTok Hashtag Video Scraper · TikTok Brand Mention Monitor
🟠 Hacker News	Hacker News Search Scraper
📸 Instagram	Instagram Media Downloader
🧵 Threads	Threads Scraper
📌 Pinterest	Pinterest Scraper
✉️ Substack	Substack Newsletter Scraper

👉 Browse all logiover scrapers on Apify Store — 180+ actors across real estate, jobs, crypto, social media & B2B data.

⏰ Scheduling & integration

Schedule this Actor on Apify to keep a fresh archive of a subreddit, topic or brand mention. Export results to JSON, CSV or Excel, sync to Google Sheets, or push to your database, BI tools and webhooks through the Apify API. Connect it to Make, n8n or Zapier to build automated monitoring and research pipelines.
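
As a rough illustration of pulling results through the Apify API, the sketch below prepares a POST request for the synchronous run endpoint, which starts a run and returns its dataset items in the response. The Actor ID placeholder and the token are hypothetical values you must replace with those shown in your Apify Console. The request is only built here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Build (but do not send) a run-sync-get-dataset-items request."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    actor_id="<username>~<actor-name>",  # placeholder, not the real Actor ID
    token="<YOUR_APIFY_TOKEN>",          # placeholder
    run_input={"commentSearchQueries": ["Notion"], "afterDate": "2024-01-01",
               "sort": "new", "maxItems": 2000},
)
print(request.get_method(), request.full_url)
```

To execute it, pass the request to urllib.request.urlopen and decode the JSON body. Synchronous runs are time-limited, so long jobs are usually better started asynchronously or on a schedule.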

⭐ Support & feedback

Found a bug or need an extra field? Open an issue on the Issues tab — response is usually fast. If this Actor saves you time, a ★★★★★ review on the Store page genuinely helps and is hugely appreciated. 🙏

⚖️ Legal

This Actor extracts only publicly available archived data and is intended for legitimate research, analytics and monitoring use. You are responsible for complying with Reddit's terms of service, the archive providers' terms, GDPR and any applicable local laws.

📝 Changelog

2026-07-06

✨ README overhaul: richer post + comment output samples, ready-to-run example scenarios, cross-promo links, and clearer quick-start.

2026-07-01

Maintenance pass: re-verified end-to-end on live data and confirmed successful runs within the 5-minute quality window on the default input.
Sharpened Store metadata (SEO title & description) and expanded the FAQ with high-intent, long-tail questions for easier discovery in Google and Apify Store search.
Added ready-to-run example tasks that cover common real-world use cases.

2026-06-15

Reliability pass: re-verified end-to-end on live data with real-world inputs. Routine maintenance build.

2026-06-07

Docs: added coverage for using the scraper as a Pushshift alternative, exporting old Reddit posts and comments to CSV/JSON, and scraping Reddit data without an API key or login.

2026-06-05

SEO and documentation refresh; metadata corrected to describe the Arctic-Shift primary backend with PullPush fallback (not PullPush alone).
Verified live and rebuilt.
