Facebook Groups Scraper

Pricing

$19.99/month + usage

Developer

ScrapeEngine

Maintained by Community

An Apify actor that scrapes posts from public Facebook groups. It discovers the required doc_id, node_id, and end_cursor values automatically, paginates through the group feed, filters posts by keyword, year, or date, and routes requests through residential proxies.

Key Features

  • Automatic Discovery: Extracts node_id, doc_id, and end_cursor automatically from the group's HTML/JS
  • Smart Pagination: Paginates through the group feed and streams results to the Apify dataset in real time
  • Residential Proxy: Always uses a residential proxy and retries automatically until the requested number of results is collected
  • Multiple Sorting Options: Supports CHRONOLOGICAL, RECENT_ACTIVITY, TOP_POSTS, and CHRONOLOGICAL_LISTINGS
  • Advanced Filtering: Filters posts by keyword, year, and date range
  • Anti-blocking: Randomized delays between requests to avoid detection
  • Rich Data Extraction: Extracts posts with reactions, comments, attachments, and metadata

Input Parameters

Required

  • startUrls (array, required): Facebook group URLs to scrape
    • Only public Facebook groups can be scraped
    • Example: ["https://www.facebook.com/groups/example-group"]

Optional

  • resultsLimit (integer, default: 20): Maximum number of posts to scrape across all URLs

    • Minimum: 1
    • If not set, the actor will scrape as many results as possible
  • viewOption (string, default: "CHRONOLOGICAL"): Sorting order for posts

    • Options:
      • CHRONOLOGICAL: Posts in chronological order
      • RECENT_ACTIVITY: Posts sorted by recent activity
      • TOP_POSTS: Top posts by engagement
      • CHRONOLOGICAL_LISTINGS: Chronological listings (for Buy and Sell groups)
    • Note: The results limit applies to new posts only
  • searchGroupKeyword (string): Search posts by letter or keyword

    • Important: Without logging in, search results are VERY limited
    • Searching by full words will return nothing in most cases
    • Recommendation: Use one or two letter searches for better results
    • Example: "a" or "ab"
  • searchGroupYear (string): Filter posts by specific year

    • Requires searchGroupKeyword to be set
    • Example: "2024"
  • onlyPostsNewerThan (string): Filter posts newer than a specific date/time

    • Supports multiple formats:
      • Absolute date: "2024-01-15" or "2024-01-15T10:30:00"
      • Relative time: "1 days", "2 months", "3 years", "1 hour", "2 minutes"
      • ISO format: "2024-01-15T10:30:00Z"
    • Examples:
      • "2024-01-15" - Posts from January 15, 2024 onwards
      • "2 months" - Posts from the last 2 months
      • "1 days" - Posts from the last 24 hours
  • fallbackDocId (string, optional): GraphQL doc_id to use when automatic discovery fails (e.g. after a Facebook frontend update). If the actor reports "Missing doc_id", set a known working doc_id here. Leave empty for automatic discovery.

  • proxyConfiguration (object): Proxy settings

    • The actor always uses a residential proxy
    • Automatically retries until the requested number of results is collected
    • Default: {"useApifyProxy": false}
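
The accepted onlyPostsNewerThan formats listed above can be illustrated with a small parser. This is only a sketch of how such values might be interpreted, not the actor's actual implementation: the function name `parse_newer_than`, the 30-day month and 365-day year approximations, and the choice to treat dates without an offset as UTC are all assumptions.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Approximate unit lengths in seconds; month and year lengths are assumptions.
_UNIT_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}
_RELATIVE = re.compile(r"^\s*(\d+)\s+(minute|hour|day|month|year)s?\s*$", re.IGNORECASE)


def parse_newer_than(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn an onlyPostsNewerThan-style string into a UTC cutoff datetime."""
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE.match(value)
    if match:
        # Relative form such as "2 months" or "1 hour": subtract from "now".
        amount, unit = int(match.group(1)), match.group(2).lower()
        return now - timedelta(seconds=amount * _UNIT_SECONDS[unit])
    # Absolute forms: "2024-01-15", "2024-01-15T10:30:00", "2024-01-15T10:30:00Z".
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    return parsed.astimezone(timezone.utc)
```

Passing `now` explicitly keeps relative values reproducible, which is convenient when checking a cutoff before starting a run.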

Output Structure

Each post in the dataset contains the following fields:

Basic Information

  • facebookUrl: URL of the Facebook group
  • url: Direct URL to the post
  • time: ISO 8601 timestamp of when the post was created (UTC, .000Z format)
  • inputUrl: Original input URL used for scraping

Post Identifiers

  • id: Unique post identifier
  • legacyId: Legacy post ID (post_id)
  • feedbackId: Feedback identifier for the post

User Information

  • user: Object containing:
    • id: User ID
    • name: User's display name

Content

  • text: Post text content
  • attachments: Array of attachments including:
    • Photos with thumbnail, image URI, dimensions
    • Media sets (albums) with mediaset_token
    • OCR text from images
    • Owner information

Engagement Metrics

  • likesCount: Total number of likes
  • sharesCount: Total number of shares
  • commentsCount: Total number of comments
  • topReactionsCount: Total count of top reactions
  • reactionLikeCount: Number of like reactions
  • reactionLoveCount: Number of love reactions

Comments

  • topComments: Array of the top two comments, each containing:
    • commentUrl: Direct URL to the comment
    • id: Comment identifier
    • feedbackId: Comment feedback ID
    • date: Comment timestamp
    • text: Comment text
    • profileUrl: Commenter's profile URL
    • profilePicture: Commenter's profile picture URL
    • profileId: Commenter's user ID
    • profileName: Commenter's display name
    • likesCount: Number of likes on the comment
    • threadingDepth: Comment threading depth

Group Information

  • facebookId: Facebook group ID
  • groupTitle: Title of the Facebook group
  • pageAdLibrary: Object containing:
    • is_business_page_active: Boolean flag
    • id: Page/group identifier
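
To show how the fields above fit together, here is a minimal sketch that flattens one dataset item into a single row of scalar columns, for example before writing a spreadsheet. The helper name `flatten_post`, the chosen column names, and every value in `sample` are hypothetical; only the input field names come from the output structure described above.

```python
from typing import Any, Dict


def flatten_post(item: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one dataset item to a flat row of scalar columns."""
    user = item.get("user") or {}
    comments = item.get("topComments") or []
    return {
        "url": item.get("url"),
        "time": item.get("time"),
        "author_id": user.get("id"),
        "author_name": user.get("name"),
        "text": item.get("text") or "",
        "likes": item.get("likesCount") or 0,
        "shares": item.get("sharesCount") or 0,
        "comments": item.get("commentsCount") or 0,
        "attachment_count": len(item.get("attachments") or []),
        "top_comment_authors": "; ".join(c.get("profileName", "") for c in comments),
    }


# Hypothetical item: all values are made up for illustration.
sample = {
    "url": "https://www.facebook.com/groups/example-group/posts/1",
    "time": "2024-01-15T10:30:00.000Z",
    "user": {"id": "100", "name": "Example User"},
    "text": "Hello group",
    "likesCount": 12,
    "sharesCount": 3,
    "commentsCount": 2,
    "attachments": [],
    "topComments": [{"profileName": "Commenter A"}, {"profileName": "Commenter B"}],
}
row = flatten_post(sample)
```

Missing or null fields fall back to empty values, so posts without attachments or comments still produce a complete row.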

How to Use

  1. Open the Actor: In Apify Console, navigate to the Facebook Groups Scraper actor

  2. Configure Input:

    • Add one or more Facebook group URLs in startUrls
    • (Optional) Set resultsLimit to control how many posts to scrape
    • (Optional) Choose viewOption for sorting preference
    • (Optional) Set searchGroupKeyword and searchGroupYear for filtered searches
    • (Optional) Set onlyPostsNewerThan to filter by date
  3. Run the Actor: Click "Start" and monitor the logs for progress

  4. View Results:

    • Results are streamed to the dataset in real-time
    • View in the OUTPUT tab
    • Export as JSON, CSV, or other formats as needed

Example Input

{
  "startUrls": [
    "https://www.facebook.com/groups/germtheory.vs.terraintheory"
  ],
  "resultsLimit": 100,
  "viewOption": "RECENT_ACTIVITY",
  "searchGroupKeyword": "a",
  "searchGroupYear": "2024",
  "onlyPostsNewerThan": "2 months"
}
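
Before starting a run, an input like the one above can be checked locally against the constraints documented under Input Parameters. The sketch below is an assumption about how such a pre-flight check could look; `validate_input` is a hypothetical helper, and the actor's own validation may accept or reject inputs differently.

```python
from typing import Any, Dict, List

VIEW_OPTIONS = {"CHRONOLOGICAL", "RECENT_ACTIVITY", "TOP_POSTS", "CHRONOLOGICAL_LISTINGS"}


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the input looks consistent."""
    problems = []
    urls = run_input.get("startUrls")
    if not isinstance(urls, list) or not urls:
        problems.append("startUrls must be a non-empty array")
    limit = run_input.get("resultsLimit")
    if limit is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
            problems.append("resultsLimit must be an integer >= 1")
    if run_input.get("viewOption", "CHRONOLOGICAL") not in VIEW_OPTIONS:
        problems.append("viewOption must be one of: " + ", ".join(sorted(VIEW_OPTIONS)))
    if run_input.get("searchGroupYear") and not run_input.get("searchGroupKeyword"):
        problems.append("searchGroupYear requires searchGroupKeyword")
    return problems


example_input = {
    "startUrls": ["https://www.facebook.com/groups/germtheory.vs.terraintheory"],
    "resultsLimit": 100,
    "viewOption": "RECENT_ACTIVITY",
    "searchGroupKeyword": "a",
    "searchGroupYear": "2024",
    "onlyPostsNewerThan": "2 months",
}
problems = validate_input(example_input)
```

A validated dictionary of this shape is what would be submitted as the run input, whether through Apify Console or an API client.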

Important Notes

  • Public Groups Only: Only public Facebook groups can be scraped without authentication
  • Search Limitations: Without logging in, search functionality is very limited. Use single or double letters for best results
  • Proxy Policy: The actor always uses a residential proxy and retries automatically until the requested number of results is collected
  • Real-time Streaming: Results are saved to the dataset as they are extracted, not at the end
  • Date Filtering: Date filters are applied client-side after extraction
  • Rate Limiting: The actor includes anti-blocking delays between requests
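
Because date filters are applied after extraction, the same kind of filtering can be repeated on exported results. The following is a rough sketch under the assumption that each item's time field uses the documented UTC ".000Z" format; `parse_post_time` and `newer_than` are hypothetical helpers and the sample items are made up.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def parse_post_time(value: str) -> datetime:
    """Parse a timestamp such as "2024-01-15T10:30:00.000Z" into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def newer_than(items: List[Dict[str, Any]], cutoff: datetime) -> List[Dict[str, Any]]:
    """Keep only the items created at or after the cutoff."""
    return [item for item in items if parse_post_time(item["time"]) >= cutoff]


# Made-up items that carry only the fields this sketch needs.
posts = [
    {"id": "a", "time": "2023-12-31T23:59:59.000Z"},
    {"id": "b", "time": "2024-01-15T00:00:00.000Z"},
    {"id": "c", "time": "2024-02-01T08:00:00.000Z"},
]
recent = newer_than(posts, datetime(2024, 1, 15, tzinfo=timezone.utc))
```

Since filtering happens after extraction, posts older than the cutoff are still fetched before being discarded, so a narrow date filter does not by itself reduce the number of requests.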

Support

For custom solutions, feature requests, or technical support, please contact: dev.scraperengine@gmail.com

Technical Details

  • Automatic Discovery: The actor automatically extracts required GraphQL parameters (doc_id, node_id, end_cursor) from Facebook's HTML and JavaScript
  • Pagination: Uses cursor-based pagination to navigate through group feeds
  • Error Handling: Includes retry logic and fallback mechanisms for robust scraping
  • Data Normalization: All extracted data is normalized to a consistent structure matching the output schema
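
The combination of cursor-based pagination, retries, and randomized delays described above can be sketched generically. This is not the actor's code and makes no real Facebook requests: `fetch_page` is a hypothetical callable that takes the previous end_cursor (or None for the first page) and returns the posts plus the next cursor, and the retry count and delay range are illustrative defaults.

```python
import random
import time
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[Dict], Optional[str]]  # (posts, next end_cursor or None)


def paginate(
    fetch_page: Callable[[Optional[str]], Page],
    limit: int,
    max_retries: int = 3,
    delay_range: Tuple[float, float] = (1.0, 3.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield up to `limit` posts, following cursors and pausing between requests."""
    cursor: Optional[str] = None
    yielded = 0
    while yielded < limit:
        for attempt in range(1, max_retries + 1):
            try:
                posts, cursor = fetch_page(cursor)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the last allowed attempt
                sleep(random.uniform(*delay_range))  # back off before retrying
        for post in posts:
            yield post  # stream each post as soon as it is available
            yielded += 1
            if yielded >= limit:
                return
        if cursor is None:
            return  # no further pages
        sleep(random.uniform(*delay_range))  # randomized delay between pages
```

Yielding posts one at a time mirrors the real-time streaming behaviour, and injecting `sleep` makes the loop easy to exercise without actually waiting.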