Reddit Profile Crawler

Pricing: $2.50 / 1,000 results

Developed by Crawler Bros

Maintained by Community
Scrape Reddit profiles with posts and profile information.

Rating: 5.0 (3)

Last modified: 11 hours ago

Reddit Profile Scraper

An Apify Actor for scraping posts and information from Reddit user profiles using browser automation with Playwright.

Features

  • 👤 Scrape multiple user profiles in a single run
  • 📊 Extract user information (karma, account age, etc.)
  • 📝 Scrape user's posts and comments
  • 🔄 Support for different profile sections (overview, submitted, comments, gilded)
  • 📈 Multiple sorting options (hot, new, top, controversial)
  • 🔄 Automatic pagination support
  • 📦 No authentication required for public profiles
  • 💾 Data saved in structured JSON format
  • 🌐 Browser automation bypasses API restrictions

Input Parameters

The actor accepts the following input parameters:

| Parameter | Type    | Required | Default     | Description                                                  |
|-----------|---------|----------|-------------|--------------------------------------------------------------|
| usernames | array   | Yes      | ["spez"]    | List of Reddit usernames to scrape (without the 'u/' prefix) |
| maxPosts  | integer | No       | 100         | Maximum number of posts to scrape from each user (1-1000)    |
| section   | string  | No       | "submitted" | Profile section: overview, submitted, comments, gilded       |
| sort      | string  | No       | "new"       | How to sort posts: hot, new, top, controversial              |

Example Input

{
  "usernames": ["spez", "example_user"],
  "maxPosts": 50,
  "section": "submitted",
  "sort": "top"
}
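Before submitting a run, the input above can be checked client-side against the parameter table. The sketch below is a hypothetical helper (not part of the actor) that applies the documented defaults and ranges:

```python
# Hypothetical validator mirroring the parameter table above; the actor
# performs its own validation, so this is only a client-side sketch.
ALLOWED_SECTIONS = {"overview", "submitted", "comments", "gilded"}
ALLOWED_SORTS = {"hot", "new", "top", "controversial"}

def validate_input(raw: dict) -> dict:
    """Apply the documented defaults and reject out-of-range values."""
    usernames = raw.get("usernames", ["spez"])
    if not usernames or not all(isinstance(u, str) for u in usernames):
        raise ValueError("usernames must be a non-empty list of strings")
    # Strip an accidental 'u/' prefix, per the input description.
    usernames = [u[2:] if u.startswith("u/") else u for u in usernames]

    max_posts = int(raw.get("maxPosts", 100))
    if not 1 <= max_posts <= 1000:
        raise ValueError("maxPosts must be between 1 and 1000")

    section = raw.get("section", "submitted")
    if section not in ALLOWED_SECTIONS:
        raise ValueError(f"section must be one of {sorted(ALLOWED_SECTIONS)}")

    sort = raw.get("sort", "new")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")

    return {"usernames": usernames, "maxPosts": max_posts,
            "section": section, "sort": sort}
```

For example, `validate_input({"usernames": ["u/spez"], "maxPosts": 50})` normalizes the username to `"spez"` and fills in the default section and sort.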

Output Fields

The actor extracts two types of data:

User Profile Information

  • username - Reddit username
  • post_karma - Total post/link karma
  • comment_karma - Total comment karma
  • total_karma - Combined karma score
  • account_created - ISO 8601 formatted account creation date
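Old Reddit displays karma with thousands separators, and hidden karma may be blank. A sketch (not the actor's exact parsing code) of how these fields can be derived:

```python
def parse_karma(text: str) -> int:
    """Convert a displayed karma string such as '1,047,690' to an int.

    Hidden or missing karma may render as '' or '--'; treat those as 0,
    matching the 'hidden karma scores may show as 0' limitation below.
    """
    cleaned = text.replace(",", "").strip()
    try:
        return int(cleaned)
    except ValueError:
        return 0

def total_karma(post_karma: int, comment_karma: int) -> int:
    # total_karma is simply the sum of the two components.
    return post_karma + comment_karma
```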

User Posts

Post Information

  • post_id - Unique post ID
  • post_name - Full post name in Reddit format (e.g., "t3_abc123")
  • title - Post title
  • author - Username (same as scraped user)
  • subreddit - Subreddit where posted (without r/ prefix)
  • subreddit_prefixed - Subreddit with r/ prefix

Engagement Metrics

  • score - Post score/karma (upvotes minus downvotes)
  • num_comments - Number of comments on the post
  • url - URL of the linked content (external URL or permalink)
  • permalink - Direct link to the Reddit post

Metadata

  • domain - Domain of the linked content
  • is_self_post - Boolean indicating if it's a text post
  • link_flair - Post flair/tag text

Timestamps

  • created_utc - Unix timestamp when posted
  • created_at - ISO 8601 formatted datetime
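Both timestamp fields describe the same moment; `created_at` can be derived from `created_utc`. A sketch of the conversion (not necessarily the actor's exact code):

```python
from datetime import datetime, timezone

def created_fields(created_utc: int) -> dict:
    """Derive both documented timestamp fields from one Unix epoch value."""
    dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return {
        "created_utc": created_utc,
        "created_at": dt.isoformat(),  # ISO 8601, e.g. '2024-10-14T13:30:45+00:00'
    }
```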

Flags

  • is_stickied - Boolean indicating if post is stickied
  • is_nsfw - Boolean indicating if post is NSFW

Example Output

User Info:

{
  "username": "spez",
  "post_karma": 155234,
  "comment_karma": 892456,
  "total_karma": 1047690,
  "account_created": "2005-06-06T04:00:00+00:00"
}

User Post:

{
  "post_id": "abc123",
  "post_name": "t3_abc123",
  "title": "Announcing new Reddit features",
  "author": "spez",
  "subreddit": "announcements",
  "subreddit_prefixed": "r/announcements",
  "score": 15234,
  "num_comments": 1250,
  "url": "https://old.reddit.com/r/announcements/comments/abc123/...",
  "permalink": "https://old.reddit.com/r/announcements/comments/abc123/...",
  "domain": "self.announcements",
  "is_self_post": true,
  "link_flair": "Admin Post",
  "created_utc": 1728912645,
  "created_at": "2024-10-14T13:30:45",
  "is_stickied": true,
  "is_nsfw": false
}
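Once the dataset is downloaded, items in this shape are easy to post-process. A hypothetical helper that aggregates engagement per subreddit from a list of post items:

```python
from collections import defaultdict

def summarize_posts(posts: list) -> dict:
    """Aggregate post count, score, and comments per subreddit.

    `posts` is a list of dataset items in the output shape shown above;
    this helper is an illustration, not part of the actor.
    """
    summary = defaultdict(lambda: {"posts": 0, "score": 0, "comments": 0})
    for post in posts:
        s = summary[post["subreddit"]]
        s["posts"] += 1
        s["score"] += post.get("score", 0)
        s["comments"] += post.get("num_comments", 0)
    return dict(summary)
```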

Usage

Local Development

  1. Install dependencies:

    pip install -r requirements.txt
    playwright install chromium
  2. Set up input in storage/key_value_stores/default/INPUT.json:

    {
      "usernames": ["spez"],
      "maxPosts": 50,
      "section": "submitted",
      "sort": "top"
    }
  3. Run the actor:

    python -m src
  4. Check results in storage/datasets/default/

On Apify Platform

  1. Push to Apify:

    • Login to Apify CLI: apify login
    • Initialize: apify init (if not already done)
    • Push to Apify: apify push
  2. Or manually upload:

    • Create a new actor on Apify platform
    • Upload all files including Dockerfile, requirements.txt, and .actor/ directory
  3. Configure and run:

    • Set input parameters in the Apify console
    • Enter Reddit usernames
    • Select section and sort options
    • Click "Start" to run the actor
    • Download results from the dataset tab

Profile Sections

Overview

Shows a mix of posts and comments from the user's profile. Note: the scraper extracts only posts from this section; comments are skipped. Use section="submitted" for posts only, or see the Comments section below if you need comments.

Submitted

Shows only posts (links and self posts) submitted by the user. This is the recommended section for extracting posts.

Comments

Shows only comments made by the user. Note: Currently, the scraper is optimized for posts. For comments, use the dedicated Reddit Comment Scraper actor.

Gilded

Shows posts and comments that received Reddit awards/gilding. Only posts will be extracted from this section.
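Each section maps to a path on the user's old.reddit.com profile, with the sort passed as a query parameter. A sketch of the URL construction, based on old.reddit.com URL conventions rather than the actor's source:

```python
def profile_url(username: str, section: str = "submitted", sort: str = "new") -> str:
    """Build the old.reddit.com profile URL for a section and sort.

    The overview section is the bare profile page; other sections
    (submitted, comments, gilded) are path segments.
    """
    base = f"https://old.reddit.com/user/{username}"
    if section != "overview":
        base += f"/{section}"
    return f"{base}/?sort={sort}"
```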

Technical Details

Browser Automation

  • Uses Playwright with Chromium browser
  • Scrapes old.reddit.com/user/{username} for better compatibility
  • Implements anti-detection measures:
    • Custom User-Agent headers
    • Disabled automation flags
    • Browser fingerprint masking
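The measures above correspond to Playwright launch arguments and context options. The exact values the actor uses are not published here, so the following is a plausible configuration sketch, not its actual settings:

```python
# Plausible Playwright settings for the measures listed above;
# treat these values as examples, not the actor's actual configuration.
LAUNCH_ARGS = [
    "--disable-blink-features=AutomationControlled",  # hide the automation flag
    "--no-sandbox",
]

CONTEXT_OPTIONS = {
    # Custom User-Agent so pages do not see the default headless UA string.
    "user_agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0.0.0 Safari/537.36"),
    "viewport": {"width": 1280, "height": 800},
}

# These would be passed to Playwright roughly like:
#   browser = await playwright.chromium.launch(headless=True, args=LAUNCH_ARGS)
#   context = await browser.new_context(**CONTEXT_OPTIONS)
```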

Features

  • Automatic pagination: Clicks "next" button to load more posts
  • Smart extraction: Handles both posts and comments
  • User info extraction: Parses karma and account details
  • Error handling: Gracefully handles deleted accounts and private profiles
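Pagination on old.reddit.com works by following the link inside the page's "next" button. The actor uses Playwright and BeautifulSoup for this; the sketch below shows the same idea with only the standard library, assuming old reddit's `<span class="next-button">` markup:

```python
from html.parser import HTMLParser
from typing import Optional

class NextLinkFinder(HTMLParser):
    """Find the href inside old reddit's <span class="next-button"> element."""

    def __init__(self):
        super().__init__()
        self.in_next = False
        self.next_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "next-button":
            self.in_next = True
        elif tag == "a" and self.in_next and self.next_url is None:
            self.next_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_next = False

def find_next_page(html: str) -> Optional[str]:
    """Return the next-page URL, or None on the last page."""
    parser = NextLinkFinder()
    parser.feed(html)
    return parser.next_url
```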

Performance

  • Headless browser mode for efficiency
  • Optimized page load strategy (domcontentloaded)
  • Configurable wait times and timeouts
  • Sequential processing with delays between users

Limitations

  • Only works with public user profiles
  • Cannot scrape private/suspended accounts
  • Browser automation is slower than direct API calls but more reliable
  • Maximum 1000 posts per user (configurable)
  • Hidden karma scores may show as 0

Dependencies

  • apify>=2.1.0 - Apify SDK for Python
  • playwright~=1.40.0 - Browser automation framework
  • beautifulsoup4~=4.12.0 - HTML parsing library

Troubleshooting

Timeout Issues

If you encounter timeout errors:

  • Check if the username is correct (without 'u/' prefix)
  • Verify the user profile is public
  • Increase timeout values if needed

No Posts Found

If no posts are returned:

  • User may have no posts in the selected section
  • User account might be suspended or deleted
  • Try different sections (overview, submitted, comments)

Private/Suspended Accounts

  • Private profiles cannot be scraped
  • Suspended accounts show no content
  • This is normal Reddit behavior

Use Cases

  • User Activity Analysis: Track user posting patterns and engagement
  • Content Research: Study what type of content users post
  • Karma Analysis: Analyze karma distribution across subreddits
  • Profile Monitoring: Monitor specific user accounts
  • Data Collection: Build datasets for research or analysis

License

This actor is provided as-is for scraping public Reddit data in accordance with Reddit's terms of service.

Notes

  • This scraper uses browser automation to access Reddit's public web interface
  • Always respect Reddit's robots.txt and terms of service
  • Use responsibly and avoid overwhelming Reddit's servers
  • Consider implementing additional rate limiting for large-scale scraping
  • The actor works best with the Apify platform's infrastructure
  • Users with extensive post history may take longer to scrape
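For the rate-limiting note above, one simple approach is to space requests a fixed minimum interval apart. A sketch, with `polite_get` and its 2-second default as illustrative choices rather than the actor's actual implementation:

```python
import time

def polite_get(fetch, urls, min_interval=2.0):
    """Call fetch(url) for each URL, sleeping so that successive
    requests start at least min_interval seconds apart.

    A minimal rate-limiting sketch; tune min_interval for
    large-scale runs.
    """
    last = 0.0
    results = []
    for url in urls:
        wait = min_interval - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)
        last = time.monotonic()
        results.append(fetch(url))
    return results
```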