Facebook Groups Scraper

Pricing

$19.99/month + usage

Scrape Facebook group data efficiently with this Facebook Groups Scraper. Extract posts, member insights, comments, reactions, and timestamps with ease. Ideal for community analysis, lead generation, and trend discovery. Fast, reliable, and scalable.

Developer

ScrapeMesh

Maintained by Community

An Apify actor that scrapes Facebook group posts. It automatically discovers the doc_id, node_id, and end_cursor values it needs, paginates through the group feed, offers filtering options, and manages proxies for you.

Key Features

  • Automatic Discovery: Extracts node_id, doc_id, and end_cursor automatically from the group page's HTML/JS
  • Smart Pagination: Paginates through the group feed and streams results to the Apify dataset in real time
  • Residential Proxy: Always uses a residential proxy, with automatic retries until the requested number of results has been collected
  • Multiple Sorting Options: Supports CHRONOLOGICAL, RECENT_ACTIVITY, TOP_POSTS, and CHRONOLOGICAL_LISTINGS
  • Advanced Filtering: Filters posts by keyword, year, and date range
  • Anti-blocking: Randomized delays between requests to reduce the chance of detection
  • Rich Data Extraction: Extracts posts with reactions, comments, attachments, and metadata

Input Parameters

Required

  • startUrls (array, required): Facebook group URLs to scrape
    • Only public Facebook groups can be scraped
    • Example: ["https://www.facebook.com/groups/example-group"]

Optional

  • resultsLimit (integer, default: 20): Maximum number of posts to scrape across all URLs

    • Minimum: 1
    • If not set, the actor will scrape as many results as possible
  • viewOption (string, default: "CHRONOLOGICAL"): Sorting order for posts

    • Options:
      • CHRONOLOGICAL: Posts in chronological order
      • RECENT_ACTIVITY: Posts sorted by recent activity
      • TOP_POSTS: Top posts by engagement
      • CHRONOLOGICAL_LISTINGS: Chronological listings (for buy-and-sell groups)
    • Note: The results limit applies to new posts only
  • searchGroupKeyword (string): Search posts by letter or keyword

    • Important: Without logging in, search results are VERY limited
    • Searching by full words will return nothing in most cases
    • Recommendation: Use one or two letter searches for better results
    • Example: "a" or "ab"
  • searchGroupYear (string): Filter posts by specific year

    • Requires searchGroupKeyword to be set
    • Example: "2024"
  • onlyPostsNewerThan (string): Filter posts newer than a specific date/time

    • Supports multiple formats:
      • Absolute date: "2024-01-15" or "2024-01-15T10:30:00"
      • Relative time: "1 days", "2 months", "3 years", "1 hour", "2 minutes"
      • ISO format: "2024-01-15T10:30:00Z"
    • Examples:
      • "2024-01-15" - Posts from January 15, 2024 onwards
      • "2 months" - Posts from the last 2 months
      • "1 days" - Posts from the last 24 hours
  • fallbackDocId (string, optional): GraphQL doc_id to use when automatic discovery fails (e.g. after a Facebook frontend update). If the actor reports "Missing doc_id", set a known working doc_id here. Leave empty for automatic discovery.

  • proxyConfiguration (object): Proxy settings

    • The actor always uses a residential proxy
    • It automatically retries until the requested amount of data has been collected
    • Default: {"useApifyProxy": false}
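
The constraints listed above can be checked before a run is started. The following is a minimal sketch, not the actor's own validation: the function name `validate_input` and the URL check are assumptions made for illustration, while the parameter names and rules come from the list above.

```python
# Allowed values for viewOption, as listed in the parameter description.
ALLOWED_VIEW_OPTIONS = {
    "CHRONOLOGICAL",
    "RECENT_ACTIVITY",
    "TOP_POSTS",
    "CHRONOLOGICAL_LISTINGS",
}


def validate_input(run_input):
    """Return a list of problems found in an input object (empty if none).

    Hypothetical helper: it only mirrors the documented rules and is not
    part of the actor.
    """
    problems = []

    start_urls = run_input.get("startUrls")
    if not isinstance(start_urls, list) or not start_urls:
        problems.append("startUrls must be a non-empty array")
    else:
        for url in start_urls:
            # Assumption: group URLs contain "facebook.com/groups/".
            if not isinstance(url, str) or "facebook.com/groups/" not in url:
                problems.append(f"not a Facebook group URL: {url!r}")

    limit = run_input.get("resultsLimit")
    if limit is not None and (type(limit) is not int or limit < 1):
        problems.append("resultsLimit must be an integer >= 1")

    view = run_input.get("viewOption", "CHRONOLOGICAL")
    if view not in ALLOWED_VIEW_OPTIONS:
        problems.append(f"unknown viewOption: {view!r}")

    # searchGroupYear is documented as requiring searchGroupKeyword.
    if run_input.get("searchGroupYear") and not run_input.get("searchGroupKeyword"):
        problems.append("searchGroupYear requires searchGroupKeyword")

    return problems
```

An empty list means the input satisfies the documented rules. It does not guarantee that the group is public or that the run will return data.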

Output Structure

Each post in the dataset contains the following fields:

Basic Information

  • facebookUrl: URL of the Facebook group
  • url: Direct URL to the post
  • time: ISO 8601 timestamp of when the post was created (UTC, .000Z format)
  • inputUrl: Original input URL used for scraping

Post Identifiers

  • id: Unique post identifier
  • legacyId: Legacy post ID (post_id)
  • feedbackId: Feedback identifier for the post

User Information

  • user: Object containing:
    • id: User ID
    • name: User's display name

Content

  • text: Post text content
  • attachments: Array of attachments including:
    • Photos with thumbnail, image URI, dimensions
    • Media sets (albums) with mediaset_token
    • OCR text from images
    • Owner information

Engagement Metrics

  • likesCount: Total number of likes
  • sharesCount: Total number of shares
  • commentsCount: Total number of comments
  • topReactionsCount: Total count of top reactions
  • reactionLikeCount: Number of like reactions
  • reactionLoveCount: Number of love reactions

Comments

  • topComments: Array of top 2 comments, each containing:
    • commentUrl: Direct URL to the comment
    • id: Comment identifier
    • feedbackId: Comment feedback ID
    • date: Comment timestamp
    • text: Comment text
    • profileUrl: Commenter's profile URL
    • profilePicture: Commenter's profile picture URL
    • profileId: Commenter's user ID
    • profileName: Commenter's display name
    • likesCount: Number of likes on the comment
    • threadingDepth: Comment threading depth

Group Information

  • facebookId: Facebook group ID
  • groupTitle: Title of the Facebook group
  • pageAdLibrary: Object containing:
    • is_business_page_active: Boolean flag
    • id: Page/group identifier

How to Use

  1. Open the Actor: In Apify Console, navigate to the Facebook Groups Scraper actor

  2. Configure Input:

    • Add one or more Facebook group URLs in startUrls
    • (Optional) Set resultsLimit to control how many posts to scrape
    • (Optional) Choose viewOption for sorting preference
    • (Optional) Set searchGroupKeyword and searchGroupYear for filtered searches
    • (Optional) Set onlyPostsNewerThan to filter by date
  3. Run the Actor: Click "Start" and monitor the logs for progress

  4. View Results:

    • Results are streamed to the dataset in real-time
    • View in the OUTPUT tab
    • Export as JSON, CSV, or other formats as needed
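
The same workflow can presumably be driven from code instead of the Console. The sketch below assumes the `apify-client` Python package and its `ApifyClient` class. The actor ID is a placeholder because this page does not state it, and `run_and_collect` is a hypothetical helper name.

```python
def run_and_collect(client, actor_id, run_input):
    """Start the actor, wait for the run to finish, and return all dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    # call() starts the run and blocks until it finishes.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return a result")
    # Each run writes its results to a default dataset.
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Usage with the real client (requires `pip install apify-client` and a token):
#
#   import os
#   from apify_client import ApifyClient
#
#   client = ApifyClient(os.environ["APIFY_TOKEN"])
#   posts = run_and_collect(
#       client,
#       "<username>/<actor-name>",  # placeholder: copy the ID from the actor page
#       {"startUrls": ["https://www.facebook.com/groups/example-group"],
#        "resultsLimit": 20},
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without network access.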

Example Input

{
  "startUrls": [
    "https://www.facebook.com/groups/germtheory.vs.terraintheory"
  ],
  "resultsLimit": 100,
  "viewOption": "RECENT_ACTIVITY",
  "searchGroupKeyword": "a",
  "searchGroupYear": "2024",
  "onlyPostsNewerThan": "2 months"
}

Important Notes

  • Public Groups Only: Only public Facebook groups can be scraped without authentication
  • Search Limitations: Without logging in, search functionality is very limited. Use one- or two-letter search terms for best results
  • Proxy Policy: The actor always uses a residential proxy and automatically retries until the requested number of results has been collected
  • Real-time Streaming: Results are saved to the dataset as they are extracted, not at the end
  • Date Filtering: Date filters are applied client-side after extraction
  • Rate Limiting: The actor includes anti-blocking delays between requests
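
Because date filters are applied after extraction, the same kind of filter can be reproduced on an exported dataset. This is a rough sketch of how an onlyPostsNewerThan value might be interpreted, not the actor's actual parser: the function names are hypothetical, months and years are approximated as 30 and 365 days, and timestamps without a time zone are assumed to be UTC.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: a month is 30 days and a year is 365 days.
_UNIT_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}
_RELATIVE = re.compile(r"^(\d+)\s+(minute|hour|day|month|year)s?$")


def parse_timestamp(text):
    """Parse an absolute or ISO date string into a timezone-aware datetime."""
    parsed = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC
    return parsed


def parse_cutoff(value, now):
    """Turn a value such as '2 months' or '2024-01-15' into a cutoff datetime."""
    match = _RELATIVE.match(value.strip().lower())
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(seconds=amount * _UNIT_SECONDS[unit])
    return parse_timestamp(value)


def newer_than(posts, value, now=None):
    """Keep posts whose `time` field is at or after the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = parse_cutoff(value, now)
    return [post for post in posts if parse_timestamp(post["time"]) >= cutoff]
```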

Support

For custom solutions, feature requests, or technical support, please contact: dev.scraperengine@gmail.com

Technical Details

  • Automatic Discovery: The actor automatically extracts required GraphQL parameters (doc_id, node_id, end_cursor) from Facebook's HTML and JavaScript
  • Pagination: Uses cursor-based pagination to navigate through group feeds
  • Error Handling: Includes retry logic and fallback mechanisms for robust scraping
  • Data Normalization: All extracted data is normalized to a consistent structure matching the output schema
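
The cursor-based pagination, retry logic, and randomized delays described above can be illustrated with a generic loop. This is a sketch under stated assumptions, not the actor's implementation: `fetch_page` is a hypothetical callable that takes a cursor (None for the first page) and returns a pair of items and the next cursor, and the retry count and delay range are arbitrary.

```python
import random
import time


def paginate(fetch_page, limit, max_retries=3, delay_range=(1.0, 3.0),
             sleep=time.sleep):
    """Yield up to `limit` items by following cursors returned by fetch_page.

    fetch_page(cursor) must return (items, next_cursor); next_cursor is None
    on the last page. Connection errors are retried up to max_retries times.
    """
    cursor = None
    yielded = 0
    while yielded < limit:
        for attempt in range(max_retries + 1):
            try:
                items, next_cursor = fetch_page(cursor)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise
                # Wait a random interval before retrying the same cursor.
                sleep(random.uniform(*delay_range))
        for item in items:
            yield item
            yielded += 1
            if yielded >= limit:
                return
        if next_cursor is None:
            return  # no more pages
        cursor = next_cursor
        # Randomized pause between page requests.
        sleep(random.uniform(*delay_range))
```

Writing it as a generator means each item can be handed to the caller as soon as its page arrives, which matches the streaming behaviour described earlier.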