Reddit Archive Scraper
Pricing
Pay per usage
Extract years of historical Reddit posts and comments from the PullPush archive. Reddit's API caps subreddit listings at ~1000 posts; this Actor pulls months or years of history from many subreddits, filtered by date range and keyword. Suited to historical backfill, research, and AI datasets.
Developer
ben
Maintained by Community
Reddit Archive Scraper — Historical Posts & Comments (Years of Data)
Pull MONTHS or YEARS of historical Reddit posts and comments from one or many subreddits — by date range and keyword.
This Actor uses the PullPush archive (the public Pushshift successor) to reach data that Reddit's own API simply won't return.
Why this exists
Reddit's official API hard-caps any subreddit listing at ~1000 posts — for an active subreddit that's only a few weeks of history. There is no way around that cap with the official API, in any tool.
This Actor solves that: it reads from the historical archive, so you can backfill a full year (or several) across multiple subreddits in one job.
Need live, up-to-the-minute posts and full threaded comment trees instead? Use the companion Reddit Scraper (official API) for fresh data, and this Archive Scraper for deep history. They pair well: archive for backfill, live scraper for ongoing updates.
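One way to combine the two sources is to merge their rows on a shared key. The sketch below is illustrative only: it assumes the live scraper's rows also carry `type` and `id` fields like the archive rows do, which this page does not confirm, and `merge_rows` is a hypothetical helper name.

```python
def merge_rows(archive_rows, live_rows):
    """Merge archive and live rows, keeping one row per (type, id).

    Assumption: both datasets expose "type" and "id" on every row.
    When the same item appears in both, the live copy is kept because
    it is likely to hold the more recent score and comment count.
    """
    merged = {}
    for row in archive_rows:
        merged[(row.get("type"), row["id"])] = row
    for row in live_rows:
        merged[(row.get("type"), row["id"])] = row  # live copy overwrites
    return list(merged.values())


# Illustrative rows only.
archive = [
    {"type": "post", "id": "1d8bw4c", "score": 14},
    {"type": "post", "id": "aaa111", "score": 3},
]
live = [
    {"type": "post", "id": "1d8bw4c", "score": 20},
    {"type": "post", "id": "bbb222", "score": 1},
]

combined = merge_rows(archive, live)
print(len(combined))
```

Keying on the pair rather than on `id` alone avoids a clash if a post and a comment ever share an identifier.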
What you get
Posts: title, selftext (body), author, subreddit, score, upvote_ratio, num_comments, created date (epoch + ISO), permalink, url, domain, flair, is_self/is_video/over_18/locked/stickied/spoiler, awards.
Comments (optional): body, author, subreddit, score, parent_id, link_id, post_id, created date, permalink, is_submitter.
Each row has a type field (post or comment) so you can split them easily.
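As a minimal sketch of that split, assuming the dataset has already been loaded as a list of dicts (the rows here are illustrative and carry only a few of the fields listed above):

```python
from collections import defaultdict


def split_by_type(rows):
    """Group dataset rows by their "type" field ("post" or "comment")."""
    groups = defaultdict(list)
    for row in rows:
        # Rows without a type are kept under "unknown" rather than dropped.
        groups[row.get("type", "unknown")].append(row)
    return dict(groups)


# Illustrative rows only; real rows carry many more fields.
rows = [
    {"type": "post", "id": "1d8bw4c", "title": "Best clone of Cool Water?"},
    {"type": "comment", "id": "c1", "body": "Example reply"},
    {"type": "comment", "id": "c2", "body": "Another reply"},
]

groups = split_by_type(rows)
posts = groups.get("post", [])
comments = groups.get("comment", [])
print(len(posts), len(comments))
```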
Input
| Field | Type | Description |
|---|---|---|
| subreddits | array | Subreddits to archive (without r/) |
| searchQuery | string | Optional keyword filter (or search all of Reddit) |
| afterDate | string | Earliest date YYYY-MM-DD (lower bound) |
| beforeDate | string | Latest date YYYY-MM-DD (start point) |
| maxPosts | integer | Max posts across all subreddits |
| includeComments | boolean | Also fetch archived comments per post |
| maxCommentsPerPost | integer | Cap comments per post |
Example: one year of a subreddit
{"subreddits": ["FragranceClones"], "afterDate": "2024-01-01", "beforeDate": "2025-01-01", "maxPosts": 10000, "includeComments": false}
Example: keyword across all of Reddit, posts + comments
{"searchQuery": "dupe", "afterDate": "2024-06-01", "maxPosts": 1000, "includeComments": true, "maxCommentsPerPost": 50}
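Inputs like the two above can be assembled and sanity-checked in code before starting a run. The following is a sketch, not part of the Actor: `build_run_input` is a hypothetical helper, and its checks (at least one of `subreddits` or `searchQuery`, and `afterDate` earlier than `beforeDate`) are assumptions about what a sensible input looks like, not documented validation rules.

```python
from datetime import date


def _parse_day(value, field):
    """Return a date for a strict YYYY-MM-DD string, else raise ValueError."""
    parsed = date.fromisoformat(value)  # ValueError if not a real date
    if parsed.isoformat() != value:
        raise ValueError(f"{field} must be YYYY-MM-DD, got {value!r}")
    return parsed


def build_run_input(subreddits=None, search_query=None, after_date=None,
                    before_date=None, max_posts=1000,
                    include_comments=False, max_comments_per_post=None):
    """Assemble an input dict using the field names from the Input table."""
    if not subreddits and not search_query:
        # Assumption of this sketch, not a documented rule of the Actor.
        raise ValueError("give at least one subreddit or a searchQuery")
    after = _parse_day(after_date, "afterDate") if after_date else None
    before = _parse_day(before_date, "beforeDate") if before_date else None
    if after and before and after >= before:
        raise ValueError("afterDate must be earlier than beforeDate")

    run_input = {"maxPosts": max_posts, "includeComments": include_comments}
    if subreddits:
        # The Input table asks for names without the "r/" prefix.
        run_input["subreddits"] = [
            s[2:] if s.startswith("r/") else s for s in subreddits
        ]
    if search_query:
        run_input["searchQuery"] = search_query
    if after_date:
        run_input["afterDate"] = after_date
    if before_date:
        run_input["beforeDate"] = before_date
    if include_comments and max_comments_per_post is not None:
        run_input["maxCommentsPerPost"] = max_comments_per_post
    return run_input


one_year = build_run_input(
    subreddits=["r/FragranceClones"],
    after_date="2024-01-01",
    before_date="2025-01-01",
    max_posts=10000,
)
print(one_year)
```

The resulting dict can be passed as the run input through whichever client or console you use to start the Actor.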
Sample output (post)
{"type": "post", "id": "1d8bw4c", "title": "Best clone of Cool Water?", "selftext": "Looking for an affordable alternative...", "author": "someuser", "subreddit": "fragranceclones", "score": 14, "num_comments": 8, "created_iso": "2024-06-02T10:14:00+00:00", "permalink": "https://www.reddit.com/r/fragranceclones/comments/1d8bw4c/..."}
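Because `created_iso` is an ISO 8601 timestamp, rows in this shape can be bucketed by month with the standard library alone. The sketch below assumes every post row has `type` and `created_iso` as in the sample; `posts_per_month` is a hypothetical helper and the extra rows are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def posts_per_month(rows):
    """Count post rows per calendar month, keyed as "YYYY-MM"."""
    counts = Counter()
    for row in rows:
        if row.get("type") != "post":
            continue  # skip comment rows
        created = datetime.fromisoformat(row["created_iso"])
        counts[created.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


# First row mirrors the sample output; the others are illustrative.
rows = [
    {"type": "post", "id": "1d8bw4c", "created_iso": "2024-06-02T10:14:00+00:00"},
    {"type": "post", "id": "example2", "created_iso": "2024-06-20T08:00:00+00:00"},
    {"type": "post", "id": "example3", "created_iso": "2024-07-01T12:30:00+00:00"},
    {"type": "comment", "id": "example4", "created_iso": "2024-07-01T13:00:00+00:00"},
]

monthly = posts_per_month(rows)
print(monthly)  # {'2024-06': 2, '2024-07': 1}
```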
Use cases
- Historical backfill — seed a database with years of a subreddit's content
- Research & sentiment datasets — analyse trends over long time spans
- AI / RAG training data — large historical corpora by topic
- Brand / product monitoring — see what was said about a topic over time
Cost tips
- Pay-per-result: you're charged per post/comment returned.
- Comments are the bulk of the count: keep `includeComments` off if you only need posts, or cap `maxCommentsPerPost`.
- Use `afterDate`/`beforeDate` to scope exactly the window you need.
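To see why comments dominate, it helps to work out the largest number of results a run could return. This is a rough sketch that assumes each post and each comment counts as one billable result and that `maxPosts` and `maxCommentsPerPost` act as hard caps; `max_billable_results` is a hypothetical helper, and real runs usually return fewer rows than this ceiling.

```python
def max_billable_results(max_posts, include_comments=False,
                         max_comments_per_post=0):
    """Upper bound on rows returned: every post plus its capped comments."""
    if not include_comments:
        return max_posts
    return max_posts * (1 + max_comments_per_post)


# Posts only, as in the one-year subreddit example.
posts_only = max_billable_results(10000)
# Posts plus up to 50 comments each, as in the keyword example.
with_comments = max_billable_results(1000, include_comments=True,
                                     max_comments_per_post=50)
print(posts_only)     # 10000
print(with_comments)  # 51000
```

Multiplying the ceiling by the per-result price shown on the Actor's pricing page gives a worst-case cost for the run.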
Notes & legal
- Data comes from the public PullPush archive; coverage and freshness depend on that service. For the most recent posts, pair with the live Reddit Scraper.
- Use data only for lawful purposes and in line with Reddit's and PullPush's terms.
Related actors
More scrapers from the same author:
- Reddit Scraper — live posts, comments & AI-ready markdown
- OpenAlex Scraper — academic papers & citations
- PubMed Scraper — biomedical literature & citations
- arXiv Scraper — 2M+ scientific papers, abstracts & PDFs