Pricing

from $3.00 / 1,000 results

🗄️ Reddit Archive Scraper - Years of Posts & Comments

Extract years of historical Reddit posts and comments from the PullPush and Arctic Shift archives. Reddit's API caps subreddit listings at ~1000 posts; this Actor pulls months or years of history from many subreddits, filtered by date range and keyword. Suited to historical backfill, research and AI datasets.

Developer

Ben

Last modified: 12 days ago

🗄️ Reddit Archive Scraper — Years of Historical Posts & Comments

Pull months or years of historical Reddit posts and comments from one or many subreddits, filtered by date range and keyword. Reddit's official API hard-caps any listing at ~1000 posts — only a few weeks for an active subreddit — so this Actor reads from the public PullPush and Arctic Shift archives to reach the deep history no live API can return. It automatically fails over when one archive is unavailable. Export to JSON/CSV/Excel, run on a schedule, call via API, or connect to Make, Zapier or n8n.

📚 What is the Reddit Archive Scraper?

It turns any subreddit or keyword into a structured historical dataset. Give it a list of subreddits (and/or a keyword), an optional afterDate/beforeDate window and a maxPosts cap, and it pages backward through the archive from the newest matching post toward the oldest, returning every match up to the cap and, optionally, each post's archived comments. Each row carries a type field (post or comment) so you can split or join them. Ideal for backfilling a database, building sentiment and AI/RAG datasets, or studying how a topic was discussed over years.

What data does it extract?

Posts (type: "post"):

id, title, selftext (body), author, subreddit
score, upvote_ratio, num_comments
created_utc (epoch) and created_iso (ISO timestamp)
permalink, url, domain, link_flair_text
Flags: is_self, is_video, over_18, locked, stickied, spoiler
Award count: total_awards_received

Comments (type: "comment", optional):

id, post_id, parent_id, link_id, body, author, subreddit
score, created_utc, created_iso, permalink
is_submitter, total_awards_received

⬇️ Input

Give it at least one subreddit or a keyword, then scope the window and size:

Field	Description
`subreddits`	One or more subreddits (without r/). Leave empty to search all of Reddit by keyword
`searchQuery`	Optional keyword — combine with subreddits, or use alone across all of Reddit
`afterDate`	Earliest date to include, `YYYY-MM-DD` (lower bound; paging stops here)
`beforeDate`	Latest date to include, `YYYY-MM-DD` (upper bound; paging starts here and moves backward)
`maxPosts`	Max posts across all subreddits (1–500000)
`includeComments`	Also fetch archived comments per post (raises result count and cost)
`maxCommentsPerPost`	Cap comments fetched per post (only when comments are on)

Example input

{
  "subreddits": ["FragranceClones"],
  "searchQuery": "dupe",
  "afterDate": "2024-01-01",
  "beforeDate": "2025-01-01",
  "maxPosts": 5000,
  "includeComments": true,
  "maxCommentsPerPost": 50
}
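
Before starting a paid run, it can help to sanity-check the input locally. The sketch below is not part of the Actor; `validate_input` is a hypothetical helper that only encodes the rules stated in the input table (at least one subreddit or keyword, `YYYY-MM-DD` dates, `maxPosts` between 1 and 500000). The Actor's own validation may differ.

```python
from datetime import date

MAX_POSTS_LIMIT = 500_000  # upper bound taken from the input table


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems found in run_input (empty list = looks valid)."""
    problems = []

    subreddits = run_input.get("subreddits") or []
    query = run_input.get("searchQuery") or ""
    if not subreddits and not query.strip():
        problems.append("provide at least one subreddit or a searchQuery")
    if any(name.lower().startswith("r/") for name in subreddits):
        problems.append("subreddit names should not include the r/ prefix")

    # Parse whichever date bounds are present.
    parsed = {}
    for key in ("afterDate", "beforeDate"):
        value = run_input.get(key)
        if value is None:
            continue
        try:
            parsed[key] = date.fromisoformat(value)
        except (TypeError, ValueError):
            problems.append(f"{key} must be YYYY-MM-DD, got {value!r}")
    if len(parsed) == 2 and parsed["afterDate"] >= parsed["beforeDate"]:
        problems.append("afterDate must be earlier than beforeDate")

    max_posts = run_input.get("maxPosts")
    if max_posts is not None and not 1 <= max_posts <= MAX_POSTS_LIMIT:
        problems.append(f"maxPosts must be between 1 and {MAX_POSTS_LIMIT}")

    return problems


example_input = {
    "subreddits": ["FragranceClones"],
    "searchQuery": "dupe",
    "afterDate": "2024-01-01",
    "beforeDate": "2025-01-01",
    "maxPosts": 5000,
    "includeComments": True,
    "maxCommentsPerPost": 50,
}
print(validate_input(example_input))  # an empty list means no problems were found
```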

⬆️ Output

Every post and comment is one clean row (view as a table, or export JSON / CSV / Excel):

{
  "type": "post",
  "id": "1d8bw4c",
  "title": "Best clone of Cool Water?",
  "selftext": "Looking for an affordable alternative...",
  "author": "someuser",
  "subreddit": "FragranceClones",
  "score": 14,
  "upvote_ratio": 0.93,
  "num_comments": 8,
  "created_utc": 1717322040,
  "created_iso": "2024-06-02T09:54:00+00:00",
  "permalink": "https://www.reddit.com/r/FragranceClones/comments/1d8bw4c/...",
  "url": "https://www.reddit.com/r/FragranceClones/comments/1d8bw4c/...",
  "domain": "self.FragranceClones",
  "link_flair_text": "Discussion",
  "is_self": true,
  "over_18": false,
  "total_awards_received": 0
}
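
Both timestamp fields describe the same instant, so one can be derived from the other. A minimal sketch using only the Python standard library, assuming `created_utc` is epoch seconds in UTC and `created_iso` uses the `+00:00` offset form shown above; the helper names are illustrative, not part of the Actor:

```python
from datetime import datetime, timezone


def epoch_to_iso(created_utc: int) -> str:
    """Convert epoch seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).isoformat()


def iso_to_epoch(created_iso: str) -> int:
    """Convert an ISO 8601 string with a UTC offset back to epoch seconds."""
    return int(datetime.fromisoformat(created_iso).timestamp())


print(epoch_to_iso(1717200000))                   # 2024-06-01T00:00:00+00:00
print(iso_to_epoch("2024-06-01T00:00:00+00:00"))  # 1717200000
```

This is handy for checking rows after export or for turning a date window into epoch bounds when filtering a downloaded dataset.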

💡 Use cases

🗃️ Historical backfill: seed a database with years of a subreddit's posts and comments in a single run.
🤖 AI / RAG training data: build large, topic-specific historical corpora for fine-tuning or retrieval.
📈 Research & sentiment datasets: analyse how opinions and trends shifted across long time spans.
🔎 Brand & product monitoring: see everything said about a brand, product or keyword over the years.

💰 Cost tips

Pricing is pay-per-result, so you're charged per post or comment returned. Comments are usually the bulk of the count — leave includeComments off if you only need posts, or set maxCommentsPerPost to cap them. Use afterDate/beforeDate to scope exactly the window you need and avoid pulling more than you'll use.
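
To see how much comments can dominate the bill, here is a rough worst-case estimate. It assumes a flat $3.00 per 1,000 results (the listing says "from $3.00", so your actual rate may differ) and that every post hits its comment cap, which real threads rarely do. The function names are illustrative.

```python
def worst_case_results(max_posts: int, include_comments: bool,
                       max_comments_per_post: int = 0) -> int:
    """Upper bound on rows returned: every post plus a full comment cap per post."""
    comments = max_posts * max_comments_per_post if include_comments else 0
    return max_posts + comments


def estimate_cost_usd(results: int, price_per_1000: float = 3.00) -> float:
    """Cost at an assumed flat pay-per-result rate."""
    return results / 1000 * price_per_1000


posts_only = worst_case_results(5000, include_comments=False)
with_comments = worst_case_results(5000, include_comments=True, max_comments_per_post=50)

print(posts_only, estimate_cost_usd(posts_only))        # 5000 15.0
print(with_comments, estimate_cost_usd(with_comments))  # 255000 765.0
```

With the example input, the ceiling rises from 5,000 rows to 255,000 rows once comments are on, which is why capping `maxCommentsPerPost` matters.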

❓ FAQ

How do I scrape historical Reddit data? Enter one or more subreddits (and/or a searchQuery), optionally set a date window, then Run. It pages backward through the archive and returns structured posts (and comments if enabled).
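
For readers curious what "paging backward" means in practice, the following is a conceptual sketch only, not the Actor's actual code. It assumes a hypothetical `fetch_page(before_ts, limit)` callable that returns posts created strictly before `before_ts`, newest first; the cursor moves to the oldest timestamp seen on each page until the lower bound or the post cap is reached.

```python
from typing import Callable, Iterator

# Hypothetical fetcher signature: (before_ts, limit) -> posts, newest first.
FetchPage = Callable[[int, int], list[dict]]


def page_backward(fetch_page: FetchPage, before_ts: int, after_ts: int,
                  max_posts: int, page_size: int = 100) -> Iterator[dict]:
    """Yield posts from before_ts back to after_ts, newest first, up to max_posts."""
    yielded = 0
    cursor = before_ts
    while yielded < max_posts:
        page = fetch_page(cursor, min(page_size, max_posts - yielded))
        if not page:
            return  # archive exhausted
        for post in page:
            if post["created_utc"] < after_ts:
                return  # passed the lower bound of the window
            yield post
            yielded += 1
            if yielded >= max_posts:
                return
        # Continue from the oldest post on this page. A strict "before" cursor
        # can skip posts sharing that exact second; a real client would handle ties.
        cursor = page[-1]["created_utc"]
```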

Why not just use Reddit's official API? Reddit's API hard-caps any listing at ~1000 posts — a few weeks for a busy subreddit. The archive lets you reach months or years of history that the live API simply won't return.

Where does the data come from? The public PullPush and Arctic Shift archives. The Actor automatically selects a healthy backend; coverage and freshness depend on those third-party services.
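
The Actor's backend selection logic is not published, but the general failover pattern can be sketched as follows. Everything here is an assumption for illustration: the backend names are labels only and each fetcher is a plain callable you would supply.

```python
from typing import Callable, Sequence

# (label, fetcher) pairs; each fetcher takes a query dict and returns rows.
Backend = tuple[str, Callable[[dict], list[dict]]]


def fetch_with_failover(backends: Sequence[Backend], query: dict) -> tuple[str, list[dict]]:
    """Try each backend in order and return (label, rows) from the first that works."""
    errors = []
    for name, fetch in backends:
        try:
            return name, fetch(query)
        except Exception as exc:  # broad on purpose: any failure moves to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all archive backends failed: " + "; ".join(errors))
```

Returning the label alongside the rows makes it easy to record which archive served a given request, which matters because coverage and freshness differ between services.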

Can I search all of Reddit by keyword? Yes — leave subreddits empty and set searchQuery to search across all of Reddit, or combine both to filter a keyword within specific subreddits.

Can I get comments too? Yes — turn on includeComments. Each post's archived comments are returned as separate rows with type: "comment"; use maxCommentsPerPost to cap them.

How do I separate posts from comments? Every row has a type field set to post or comment, plus post_id/link_id on comments to join them back to their thread.
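
A minimal sketch of that split-and-join on an exported dataset, assuming each comment's `post_id` holds the same value as its parent post's `id` (check your own export, since `link_id` may carry a prefixed form). `group_threads` is an illustrative helper, not part of the Actor.

```python
from collections import defaultdict


def group_threads(rows: list[dict]) -> list[dict]:
    """Attach comment rows to their post rows using the type and post_id fields."""
    posts = {}
    comments_by_post = defaultdict(list)
    for row in rows:
        if row["type"] == "post":
            posts[row["id"]] = row
        elif row["type"] == "comment":
            comments_by_post[row["post_id"]].append(row)
    # Posts without archived comments get an empty list.
    return [
        {"post": post, "comments": comments_by_post.get(post_id, [])}
        for post_id, post in posts.items()
    ]


rows = [
    {"type": "post", "id": "p1", "title": "First"},
    {"type": "comment", "id": "c1", "post_id": "p1", "body": "Reply A"},
    {"type": "post", "id": "p2", "title": "Second"},
    {"type": "comment", "id": "c2", "post_id": "p1", "body": "Reply B"},
]
threads = group_threads(rows)
print([(t["post"]["id"], len(t["comments"])) for t in threads])  # [('p1', 2), ('p2', 0)]
```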

Do I need an API key or login? No. It reads the public archive — no Reddit account, app credentials or API key required.

How far back does it go? As far as the archive holds for that subreddit or keyword; use afterDate to set the earliest date you want.

Can I run it on a schedule or via API? Yes — schedule recurring runs in Apify, call it via the API/SDK, or connect it to Make, Zapier or n8n.
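
As a rough illustration of the API route, the sketch below builds (but does not send) a request to the Apify API's run-Actor endpoint using only the standard library. The Actor ID and token are placeholders you must replace; this page does not state the Actor's ID, so `username~actor-name` is purely hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that starts an Actor run with run_input as its input."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    actor_id="username~actor-name",  # placeholder, not the real Actor ID
    token="YOUR_APIFY_TOKEN",        # placeholder
    run_input={"subreddits": ["FragranceClones"], "maxPosts": 100},
)
print(request.get_method(), request.full_url)
```

Sending it with `urllib.request.urlopen(request)` starts the run; the results are then read from the run's dataset once it finishes. The official Apify client libraries wrap these steps and are usually the more convenient choice.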

Is scraping Reddit data legal? It reads publicly archived data. Use it for lawful purposes and follow Reddit's and each archive provider's terms.

🔗 You might also like

Reddit Scraper — live posts, comments & AI-ready markdown
Hacker News Intelligence — HN stories & comment threads
OpenAlex Scraper — academic papers & citations
arXiv Scraper — scientific papers, abstracts & PDFs

Keywords: Reddit scraper, Reddit archive, historical Reddit data, PullPush, Arctic Shift, Pushshift alternative, Reddit posts scraper, Reddit comments scraper, subreddit scraper, Reddit API alternative, sentiment dataset, RAG training data, social media research, Reddit data export, keyword search Reddit.
