Facebook Groups Scraper
Pricing
$19.99/month + usage
Extract data from public Facebook groups, including posts, comments, reactions, and member insights. This Apify scraper helps you track discussions, analyze engagement, monitor trends, and gather data for research, marketing, and community intelligence.
Rating: 0.0 (0)
Developer: ScrapeEngine
Actor stats
- Bookmarked: 0
- Total users: 4
- Monthly active users: 1
- Last modified: 3 hours ago
A powerful Apify actor to scrape Facebook group posts with automatic discovery of doc_id, node_id, and end_cursor, intelligent pagination, advanced filtering options, and smart proxy management.
Key Features
- Automatic Discovery: Extracts node_id, doc_id, and end_cursor automatically from group HTML/JS
- Smart Pagination: Paginates through group feed and streams results to Apify dataset in real-time
- Residential Proxy: Always uses residential proxy with automatic retries until data demand is fulfilled
- Multiple Sorting Options: Support for CHRONOLOGICAL, RECENT_ACTIVITY, TOP_POSTS, and CHRONOLOGICAL_LISTINGS
- Advanced Filtering: Filter posts by keywords, year, and date range
- Anti-blocking: Randomized delays between requests to avoid detection
- Rich Data Extraction: Extracts posts with reactions, comments, attachments, and metadata
Input Parameters
Required
- startUrls (array, required): Facebook group URLs to scrape
  - Only public Facebook groups can be scraped
  - Example: ["https://www.facebook.com/groups/example-group"]
Optional
- resultsLimit (integer, default: 20): Maximum number of posts to scrape across all URLs
  - Minimum: 1
  - If not set, the actor will scrape as many results as possible
- viewOption (string, default: "CHRONOLOGICAL"): Sorting order for posts
  - Options:
    - CHRONOLOGICAL: Posts in chronological order
    - RECENT_ACTIVITY: Posts sorted by recent activity
    - TOP_POSTS: Top posts by engagement
    - CHRONOLOGICAL_LISTINGS: Chronological listings (for BuySell groups)
  - Note: The results limit applies to new posts only
- searchGroupKeyword (string): Search posts by letter or keyword
  - Important: Without logging in, search results are VERY limited
  - Searching by full words will return nothing in most cases
  - Recommendation: Use one- or two-letter searches for better results
  - Example: "a" or "ab"
- searchGroupYear (string): Filter posts by a specific year
  - Requires searchGroupKeyword to be set
  - Example: "2024"
- onlyPostsNewerThan (string): Filter posts newer than a specific date/time
  - Supports multiple formats:
    - Absolute date: "2024-01-15" or "2024-01-15T10:30:00"
    - Relative time: "1 days", "2 months", "3 years", "1 hour", "2 minutes"
    - ISO format: "2024-01-15T10:30:00Z"
  - Examples:
    - "2024-01-15": Posts from January 15, 2024 onwards
    - "2 months": Posts from the last 2 months
    - "1 days": Posts from the last 24 hours
- fallbackDocId (string, optional): GraphQL doc_id to use when automatic discovery fails (e.g. after a Facebook frontend update). If the actor reports "Missing doc_id", set a known working doc_id here. Leave empty for automatic discovery.
- proxyConfiguration (object): Proxy settings
  - Actor always uses residential proxy
  - Automatically retries until the requested amount of data has been collected
  - Default: {"useApifyProxy": false}
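The formats listed for onlyPostsNewerThan can be turned into a single cutoff timestamp. The following is a minimal sketch of one way to do that, not the actor's own parser: the function name parse_cutoff is hypothetical, months and years are approximated as 30 and 365 days, and values without a timezone are assumed to be UTC.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: months and years are approximated as 30 and 365 days.
_UNIT_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}
_RELATIVE = re.compile(r"^\s*(\d+)\s+(minute|hour|day|month|year)s?\s*$", re.IGNORECASE)


def parse_cutoff(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn an onlyPostsNewerThan-style string into a UTC cutoff datetime."""
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE.match(value)
    if match:
        # Relative forms such as "1 days" or "2 months" count back from now.
        amount, unit = int(match.group(1)), match.group(2).lower()
        return now - timedelta(seconds=amount * _UNIT_SECONDS[unit])
    # Absolute forms: "2024-01-15", "2024-01-15T10:30:00", "2024-01-15T10:30:00Z".
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return parsed.astimezone(timezone.utc)
```

A post would then pass the filter when its time field, parsed the same way, is later than or equal to the cutoff.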
Output Structure
Each post in the dataset contains the following fields:
Basic Information
- facebookUrl: URL of the Facebook group
- url: Direct URL to the post
- time: ISO 8601 timestamp when the post was created (UTC, .000Z format)
- inputUrl: Original input URL used for scraping
Post Identifiers
- id: Unique post identifier
- legacyId: Legacy post ID (post_id)
- feedbackId: Feedback identifier for the post
User Information
- user: Object containing:
  - id: User ID
  - name: User's display name
Content
- text: Post text content
- attachments: Array of attachments including:
  - Photos with thumbnail, image URI, dimensions
  - Media sets (albums) with mediaset_token
  - OCR text from images
  - Owner information
Engagement Metrics
- likesCount: Total number of likes
- sharesCount: Total number of shares
- commentsCount: Total number of comments
- topReactionsCount: Total count of top reactions
- reactionLikeCount: Number of like reactions
- reactionLoveCount: Number of love reactions
Comments
- topComments: Array of top 2 comments, each containing:
  - commentUrl: Direct URL to the comment
  - id: Comment identifier
  - feedbackId: Comment feedback ID
  - date: Comment timestamp
  - text: Comment text
  - profileUrl: Commenter's profile URL
  - profilePicture: Commenter's profile picture URL
  - profileId: Commenter's user ID
  - profileName: Commenter's display name
  - likesCount: Number of likes on the comment
  - threadingDepth: Comment threading depth
Group Information
- facebookId: Facebook group ID
- groupTitle: Title of the Facebook group
- pageAdLibrary: Object containing:
  - is_business_page_active: Boolean flag
  - id: Page/group identifier
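To show how a record with the fields above might be consumed, here is a small sketch that flattens one dataset item into a summary row. The helper summarize_post, the combined "engagement" figure (likes plus shares plus comments), and the sample record are all invented for illustration; real items may omit fields, which is why every lookup falls back to a default.

```python
from typing import Any, Dict, List


def summarize_post(item: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one dataset item to a flat summary row."""
    user = item.get("user") or {}
    comments: List[Dict[str, Any]] = item.get("topComments") or []
    # Assumption: a simple engagement figure is likes + shares + comments.
    engagement = sum(
        int(item.get(key) or 0)
        for key in ("likesCount", "sharesCount", "commentsCount")
    )
    return {
        "url": item.get("url"),
        "time": item.get("time"),
        "author": user.get("name"),
        "group": item.get("groupTitle"),
        "text_preview": (item.get("text") or "")[:80],
        "engagement": engagement,
        "top_commenters": [c.get("profileName") for c in comments],
    }


# Hypothetical record, invented for illustration only.
sample = {
    "url": "https://www.facebook.com/groups/example-group/posts/1",
    "time": "2024-01-15T10:30:00.000Z",
    "user": {"id": "100", "name": "Example User"},
    "groupTitle": "Example Group",
    "text": "Hello group",
    "likesCount": 3,
    "sharesCount": 1,
    "commentsCount": 2,
    "topComments": [{"profileName": "Commenter A", "text": "Hi"}],
}
summary = summarize_post(sample)
print(summary["engagement"], summary["top_commenters"])
```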
How to Use
- Open the Actor: In Apify Console, navigate to the Facebook Groups Scraper actor
- Configure Input:
  - Add one or more Facebook group URLs in startUrls
  - (Optional) Set resultsLimit to control how many posts to scrape
  - (Optional) Choose viewOption for sorting preference
  - (Optional) Set searchGroupKeyword and searchGroupYear for filtered searches
  - (Optional) Set onlyPostsNewerThan to filter by date
- Run the Actor: Click "Start" and monitor the logs for progress
- View Results:
  - Results are streamed to the dataset in real-time
  - View in the OUTPUT tab
  - Export as JSON, CSV, or other formats as needed
Example Input
{
  "startUrls": ["https://www.facebook.com/groups/germtheory.vs.terraintheory"],
  "resultsLimit": 100,
  "viewOption": "RECENT_ACTIVITY",
  "searchGroupKeyword": "a",
  "searchGroupYear": "2024",
  "onlyPostsNewerThan": "2 months"
}
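The same input can also be assembled and submitted from code. The sketch below builds the input object, checks the constraints stated under Input Parameters, and starts a run with the apify-client Python package. The helper names build_run_input and run_actor are hypothetical, and this page does not give the actor's ID, so it is passed in as a parameter rather than guessed.

```python
from typing import Any, Dict, List, Optional

VIEW_OPTIONS = {"CHRONOLOGICAL", "RECENT_ACTIVITY", "TOP_POSTS", "CHRONOLOGICAL_LISTINGS"}


def build_run_input(
    start_urls: List[str],
    results_limit: int = 20,
    view_option: str = "CHRONOLOGICAL",
    keyword: Optional[str] = None,
    year: Optional[str] = None,
    newer_than: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an input object and check the documented constraints."""
    if not start_urls:
        raise ValueError("startUrls must contain at least one group URL")
    if results_limit < 1:
        raise ValueError("resultsLimit must be at least 1")
    if view_option not in VIEW_OPTIONS:
        raise ValueError(f"unknown viewOption: {view_option}")
    if year and not keyword:
        raise ValueError("searchGroupYear requires searchGroupKeyword")
    run_input: Dict[str, Any] = {
        "startUrls": start_urls,
        "resultsLimit": results_limit,
        "viewOption": view_option,
    }
    # Optional filters are only included when a value was supplied.
    if keyword:
        run_input["searchGroupKeyword"] = keyword
    if year:
        run_input["searchGroupYear"] = year
    if newer_than:
        run_input["onlyPostsNewerThan"] = newer_than
    return run_input


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run and collect its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily; only needed for real runs

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Validating before the run starts surfaces mistakes such as a searchGroupYear without a searchGroupKeyword locally, instead of after a paid run has begun.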
Important Notes
- Public Groups Only: Only public Facebook groups can be scraped without authentication
- Search Limitations: Without logging in, search functionality is very limited. Use single or double letters for best results
- Proxy Policy: The actor always uses residential proxy and retries automatically until the requested number of results has been collected
- Real-time Streaming: Results are saved to the dataset as they are extracted, not at the end
- Date Filtering: Date filters are applied client-side after extraction
- Rate Limiting: The actor includes anti-blocking delays between requests
Support
For custom solutions, feature requests, or technical support, please contact: dev.scraperengine@gmail.com
Technical Details
- Automatic Discovery: The actor automatically extracts required GraphQL parameters (doc_id, node_id, end_cursor) from Facebook's HTML and JavaScript
- Pagination: Uses cursor-based pagination to navigate through group feeds
- Error Handling: Includes retry logic and fallback mechanisms for robust scraping
- Data Normalization: All extracted data is normalized to a consistent structure matching the output schema