Pricing

Pay per usage

Reddit Thread Scraper

Extract posts and comments from any public subreddit using Reddit's public RSS feeds and the PullPush archive. Filter by sort order (hot, new, top, rising) and time range, and optionally include the top comments for each post. Well suited to AI training data, sentiment analysis, and market research.

Developer

Sheshinmcfly

Last modified: a day ago

What data does it extract?

Posts

Field	Description	Example
`type`	Record type	`"post"`
`id`	Reddit post ID	`"1sa4rlx"`
`subreddit`	Subreddit name	`"MachineLearning"`
`title`	Post title	`"New paper on LLM reasoning"`
`author`	Username	`"researcher123"`
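`score`	Upvotes minus downvotes (`top` mode only; `0` in `hot`/`new`/`rising`)	`1420`
`upvoteRatio`	Fraction of votes that are upvotes (`top` mode only)	`0.97`
`numComments`	Total comment count (`top` mode only; `0` in `hot`/`new`/`rising`)	`83`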
`selftext`	Post body text	`"We propose a new..."`
`url`	Link URL	`"https://arxiv.org/..."`
`permalink`	Reddit post URL	`"https://reddit.com/r/..."`
`flair`	Post flair label (top mode)	`"Research"`
`createdAt`	Post creation/update time	`"2026-04-21T10:00:00Z"`
`extractedAt`	Extraction timestamp	`"2026-04-21T12:00:00Z"`

Comments (top mode only)

Field	Description	Example
`type`	Record type	`"comment"`
`id`	Comment ID	`"abc123"`
`postId`	Parent post ID	`"1sa4rlx"`
`subreddit`	Subreddit name	`"MachineLearning"`
`author`	Username	`"user456"`
`body`	Comment text	`"Great work, but..."`
`score`	Upvotes minus downvotes	`342`
`depth`	Nesting level (0 = top-level)	`0`
`permalink`	Direct link to comment	`"https://reddit.com/..."`
`createdAt`	Comment creation time	`"2026-04-21T10:05:00Z"`
`extractedAt`	Extraction timestamp	`"2026-04-21T12:00:00Z"`

Use cases

AI training data: Clean text from expert communities for LLM fine-tuning
Sentiment analysis: Monitor brand mentions and user opinions in real time
Market research: Track trends and discussions in niche communities
Competitive intelligence: See what problems users are discussing
RAG pipelines: Feed domain-specific knowledge into retrieval systems
Content research: Find all-time top-performing posts for content strategy

How to use

1. Open the actor and configure:
   - Subreddits: list subreddit names (e.g. MachineLearning, investing, Python)
   - Sort: top (default) returns the best posts with real score, upvote ratio, and comment counts. hot, new, and rising return real-time posts via RSS, but without engagement metrics (score and comment counts are 0, a Reddit limitation for unauthenticated access)
   - Max posts: cap per subreddit (up to 25 in the RSS modes, up to 500 in top mode)
   - Include comments: fetch top comments per post (top mode only)
2. Click Start.
3. Download the results as JSON, CSV, or Excel.

Agent-ready via x402: AI agents can run this actor directly with USDC on Base — no Apify account needed. See x402 protocol docs.

Input parameters

Parameter	Type	Default	Description
`subreddits`	string[]	`["MachineLearning"]`	Subreddit names to scrape (without the `r/` prefix)
`sort`	string	`"top"`	`top` = archive by score (real metrics); `hot`/`new`/`rising` = real-time RSS (score & comment counts are 0)
`timeFilter`	string	`"week"`	Time range for `top` sort: `hour`, `day`, `week`, `month`, `year`, `all`
`maxPostsPerSubreddit`	integer	`25`	Max posts to extract per subreddit
`includeComments`	boolean	`true`	Also extract top comments (only in `top` mode)
`maxCommentsPerPost`	integer	`10`	Max top-level comments per post
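
The parameters above can be assembled into an input object before a run. The following is a minimal sketch, not the actor's own code: `build_run_input` is a hypothetical helper, and clamping out-of-range values (rather than rejecting them) is an assumption about how a caller might want to behave. The parameter names, defaults, and per-mode caps (25 for the RSS modes, 500 for `top`) are taken from this page.

```python
from typing import Any

RSS_SORTS = {"hot", "new", "rising"}
TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}
# Per-mode post caps as documented on this page.
MAX_POSTS = {"rss": 25, "top": 500}


def build_run_input(
    subreddits: list[str],
    sort: str = "top",
    time_filter: str = "week",
    max_posts: int = 25,
    include_comments: bool = True,
    max_comments: int = 10,
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict with the documented keys."""
    if sort != "top" and sort not in RSS_SORTS:
        raise ValueError(f"unknown sort: {sort!r}")
    if time_filter not in TIME_FILTERS:
        raise ValueError(f"unknown timeFilter: {time_filter!r}")

    # Subreddit names are given without the "r/" prefix.
    names = [s.strip().removeprefix("r/") for s in subreddits if s.strip()]
    if not names:
        raise ValueError("at least one subreddit is required")

    cap = MAX_POSTS["top"] if sort == "top" else MAX_POSTS["rss"]
    return {
        "subreddits": names,
        "sort": sort,
        "timeFilter": time_filter,
        # Assumption: clamp to the documented cap instead of failing.
        "maxPostsPerSubreddit": max(1, min(max_posts, cap)),
        # Comments are only available in top mode.
        "includeComments": include_comments and sort == "top",
        "maxCommentsPerPost": max_comments,
    }


run_input = build_run_input(["r/Python", "investing"], sort="hot", max_posts=100)
```

Here `run_input["maxPostsPerSubreddit"]` is clamped to 25 and `includeComments` is switched off, because `hot` is an RSS mode.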

Example output (JSON)

[
  {
    "type": "post",
    "id": "1sa4rlx",
    "subreddit": "MachineLearning",
    "title": "[D] New method achieves SOTA on reasoning benchmarks",
    "author": "ml_researcher",
    "score": 1420,
    "upvoteRatio": 0.97,
    "numComments": 83,
    "selftext": "We introduce a novel approach...",
    "url": "https://arxiv.org/abs/2504.12345",
    "permalink": "https://www.reddit.com/r/MachineLearning/comments/1sa4rlx/",
    "flair": "Research",
    "createdAt": "2026-04-21T10:00:00.000Z",
    "extractedAt": "2026-04-21T12:00:00.000Z"
  },
  {
    "type": "comment",
    "id": "kxyz789",
    "postId": "1sa4rlx",
    "subreddit": "MachineLearning",
    "author": "deep_learner",
    "body": "Impressive results. Did you test on out-of-distribution benchmarks?",
    "score": 342,
    "depth": 0,
    "permalink": "https://www.reddit.com/r/MachineLearning/comments/1sa4rlx/comment/kxyz789/",
    "createdAt": "2026-04-21T10:05:00.000Z",
    "extractedAt": "2026-04-21T12:00:00.000Z"
  }
]
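
Because posts and comments arrive in one flat list, a consumer usually has to re-attach comments to their posts. The sketch below shows one way to do that using the `type`, `id`, and `postId` fields from the example output. `group_by_post` is a hypothetical helper, and sorting comments by `score` is an illustrative choice rather than something the actor guarantees.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def group_by_post(records: list[Record]) -> list[Record]:
    """Attach each comment record to its parent post via postId."""
    posts: dict[str, Record] = {}
    comments: dict[str, list[Record]] = defaultdict(list)

    for record in records:
        if record.get("type") == "post":
            posts[record["id"]] = {**record, "comments": []}
        elif record.get("type") == "comment":
            comments[record["postId"]].append(record)

    for post_id, items in comments.items():
        # Comments whose parent post is not in the dataset are skipped.
        if post_id in posts:
            posts[post_id]["comments"] = sorted(
                items, key=lambda c: c.get("score", 0), reverse=True
            )
    return list(posts.values())


# Trimmed copies of the two records from the example output.
records = [
    {"type": "post", "id": "1sa4rlx", "subreddit": "MachineLearning", "score": 1420},
    {"type": "comment", "id": "kxyz789", "postId": "1sa4rlx", "score": 342, "depth": 0},
]

threads = group_by_post(records)
```

Each element of `threads` is a post dict with an added `comments` list, which is a convenient shape for building training samples or RAG documents.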

Performance

Mode	Data source	Freshness	Comments	Max posts/sub
`hot` / `new` / `rising`	Reddit RSS Atom	Real-time	No	25
`top`	PullPush archive	Historical (all-time best)	Yes	500

Note: The top mode uses the PullPush public archive, which indexes Reddit's historical content. Posts returned are ranked by all-time score — ideal for AI training and research. The hot/new/rising modes fetch live data from Reddit's public RSS feeds.

Pricing

This actor charges $0.002 USD per item extracted (posts and comments each count as one item). For example, extracting 100 posts with 10 comments each yields 100 + 1,000 = 1,100 items, which costs 1,100 × $0.002 = $2.20 USD.
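
The arithmetic above can be wrapped in a small estimator for budgeting a run. This is a sketch under the stated rate of $0.002 per item; `estimate_cost` is a hypothetical helper, and it gives an upper bound, since a post may have fewer comments than the requested maximum.

```python
from decimal import Decimal

# Per-item price stated on this page (posts and comments each count as one item).
PRICE_PER_ITEM = Decimal("0.002")


def estimate_cost(posts: int, comments_per_post: int = 0) -> Decimal:
    """Upper-bound cost in USD for a run, assuming every post yields
    the full number of requested comments."""
    if posts < 0 or comments_per_post < 0:
        raise ValueError("counts must be non-negative")
    items = posts * (1 + comments_per_post)
    return items * PRICE_PER_ITEM


print(estimate_cost(100, 10))  # 1,100 items at $0.002 each
```

`Decimal` is used instead of `float` so that the result is exact to the cent.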

Related actors

Stack Overflow Scraper: questions, answers, and tags from Stack Overflow.
ArXiv Paper Scraper: research papers with citations and TLDR summaries.
Trustpilot Reviews Scraper: business reviews and ratings.
MercadoLibre Scraper: product listings and prices from MercadoLibre.

FAQ

Do I need a Reddit account or API key?
No. This actor uses Reddit's public RSS feeds and the PullPush public archive, so no login, API key, or credentials are required.

What's the difference between hot/new and top?
hot, new, and rising fetch live posts from Reddit's RSS feeds (data from today, up to 25 posts per subreddit). top fetches all-time highest-scored posts from the PullPush historical archive (with comments, up to 500 posts per subreddit).

Why doesn't top return posts from this week?
top mode uses the PullPush archive, which indexes historical Reddit content. For recent posts, use hot or new.

Can I scrape private or NSFW subreddits?
No. This actor only accesses publicly available posts visible to anonymous visitors. Private, quarantined, and restricted subreddits are not accessible.

What export formats are available?
JSON, CSV, and Excel. Download them from the run's dataset or pull them via the Apify API.

Keywords

reddit scraper, subreddit posts extractor, reddit comments scraper, reddit data for AI, reddit sentiment analysis, reddit thread extractor, social media scraper, reddit RSS scraper, NLP training data, reddit market research

Legal Disclaimer

This actor extracts publicly available data only from Reddit and the PullPush public archive, in compliance with Chilean Law 19.628 on the Protection of Private Life (Ley 19.628 sobre Protección de la Vida Privada).

What this actor does NOT collect:

Private messages or non-public posts
Email addresses or personal contact information
Data from private or restricted subreddits
Any data not freely visible to anonymous visitors

What this actor collects:

Post titles, body text, and metadata (public content)
Publicly visible usernames and comment text
Engagement metrics (score, upvotes, comment counts)

All data is publicly accessible without authentication. Users are solely responsible for ensuring their use of this data complies with applicable laws and Reddit's terms of service.

Changelog

v1.1 (2026-07-01) — Switched data sources: RSS Atom feeds for real-time hot/new/rising posts; PullPush archive for top mode with full comment support. No authentication, no proxies required — higher reliability at lower cost.
v1.0 — Initial release using Reddit's official JSON API.
