Facebook Groups Scraper
Pricing
$19.99/month + usage
Developer
ScrapeEngine
A powerful Apify actor to scrape Facebook group posts with automatic discovery of doc_id, node_id, and end_cursor, intelligent pagination, advanced filtering options, and smart proxy management.
Key Features
- Automatic Discovery: Extracts node_id, doc_id, and end_cursor automatically from group HTML/JS
- Smart Pagination: Paginates through the group feed and streams results to the Apify dataset in real time
- Residential Proxy: Always uses a residential proxy and retries automatically until the requested amount of data has been collected
- Multiple Sorting Options: Supports CHRONOLOGICAL, RECENT_ACTIVITY, TOP_POSTS, and CHRONOLOGICAL_LISTINGS
- Advanced Filtering: Filter posts by keyword, year, and date range
- Anti-blocking: Randomized delays between requests to reduce the chance of detection
- Rich Data Extraction: Extracts posts with reactions, comments, attachments, and metadata
Input Parameters
Required
- startUrls (array, required): Facebook group URLs to scrape
  - Only public Facebook groups can be scraped
  - Example: ["https://www.facebook.com/groups/example-group"]
Optional
- resultsLimit (integer, default: 20): Maximum number of posts to scrape across all URLs
  - Minimum: 1
  - If not set, the actor will scrape as many results as possible
- viewOption (string, default: "CHRONOLOGICAL"): Sorting order for posts
  - Options:
    - CHRONOLOGICAL: Posts in chronological order
    - RECENT_ACTIVITY: Posts sorted by recent activity
    - TOP_POSTS: Top posts by engagement
    - CHRONOLOGICAL_LISTINGS: Chronological listings (for buy and sell groups)
  - Note: The results limit applies to new posts only
- searchGroupKeyword (string): Search posts by letter or keyword
  - Important: Without logging in, search results are VERY limited
  - Searching by full words will return nothing in most cases
  - Recommendation: Use one- or two-letter searches for better results
  - Example: "a" or "ab"
- searchGroupYear (string): Filter posts by a specific year
  - Requires searchGroupKeyword to be set
  - Example: "2024"
- onlyPostsNewerThan (string): Filter posts newer than a specific date/time
  - Supports multiple formats:
    - Absolute date: "2024-01-15" or "2024-01-15T10:30:00"
    - Relative time: "1 days", "2 months", "3 years", "1 hour", "2 minutes"
    - ISO format: "2024-01-15T10:30:00Z"
  - Examples:
    - "2024-01-15": Posts from January 15, 2024 onwards
    - "2 months": Posts from the last 2 months
    - "1 days": Posts from the last 24 hours
- fallbackDocId (string, optional): GraphQL doc_id to use when automatic discovery fails (e.g. after a Facebook frontend update). If the actor reports "Missing doc_id", set a known working doc_id here. Leave empty for automatic discovery.
- proxyConfiguration (object): Proxy settings
  - Actor always uses residential proxy
  - Automatically retries until the requested amount of data has been collected
  - Default: {"useApifyProxy": false}
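The formats accepted by onlyPostsNewerThan can be illustrated with a small parser. This is a minimal sketch, not the actor's own code: the function name parse_newer_than is hypothetical, months and years are approximated as 30 and 365 days, and values without a timezone are assumed to be UTC.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Approximate unit lengths in seconds. Treating a month as 30 days and a year
# as 365 days is an assumption; the actor's exact arithmetic is not documented.
_UNIT_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}

# Matches relative values such as "1 days", "2 months", or "1 hour".
_RELATIVE = re.compile(r"^\s*(\d+)\s*(minute|hour|day|month|year)s?\s*$", re.IGNORECASE)


def parse_newer_than(value: str, now: Optional[datetime] = None) -> datetime:
    """Return the UTC cutoff implied by an onlyPostsNewerThan-style string."""
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE.match(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2).lower()
        return now - timedelta(seconds=amount * _UNIT_SECONDS[unit])
    # Absolute or ISO form. A trailing "Z" is rewritten because older Python
    # versions do not accept it in datetime.fromisoformat.
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return parsed.astimezone(timezone.utc)
```

Passing a fixed now makes the relative forms deterministic, which is convenient when checking a value before starting a run.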
Output Structure
Each post in the dataset contains the following fields:
Basic Information
- facebookUrl: URL of the Facebook group
- url: Direct URL to the post
- time: ISO 8601 timestamp when the post was created (UTC, .000Z format)
- inputUrl: Original input URL used for scraping
Post Identifiers
- id: Unique post identifier
- legacyId: Legacy post ID (post_id)
- feedbackId: Feedback identifier for the post
User Information
- user: Object containing:
  - id: User ID
  - name: User's display name
Content
- text: Post text content
- attachments: Array of attachments including:
  - Photos with thumbnail, image URI, dimensions
  - Media sets (albums) with mediaset_token
  - OCR text from images
  - Owner information
Engagement Metrics
- likesCount: Total number of likes
- sharesCount: Total number of shares
- commentsCount: Total number of comments
- topReactionsCount: Total count of top reactions
- reactionLikeCount: Number of like reactions
- reactionLoveCount: Number of love reactions
Comments
- topComments: Array of the top 2 comments, each containing:
  - commentUrl: Direct URL to the comment
  - id: Comment identifier
  - feedbackId: Comment feedback ID
  - date: Comment timestamp
  - text: Comment text
  - profileUrl: Commenter's profile URL
  - profilePicture: Commenter's profile picture URL
  - profileId: Commenter's user ID
  - profileName: Commenter's display name
  - likesCount: Number of likes on the comment
  - threadingDepth: Comment threading depth
Group Information
- facebookId: Facebook group ID
- groupTitle: Title of the Facebook group
- pageAdLibrary: Object containing:
  - is_business_page_active: Boolean flag
  - id: Page/group identifier
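To show how the fields listed above fit together, the following sketch flattens one dataset item into a compact row. The helper name summarize_post and the sample item are made up for illustration; only the field names come from the output structure, and missing fields are assumed to be possible.

```python
from typing import Any, Dict, List


def summarize_post(post: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one dataset item to a flat row; missing fields fall back to defaults."""
    user = post.get("user") or {}
    comments: List[Dict[str, Any]] = post.get("topComments") or []
    return {
        "url": post.get("url"),
        "time": post.get("time"),
        "author": user.get("name"),
        "text": (post.get("text") or "")[:80],  # truncate long posts for display
        "likes": post.get("likesCount", 0),
        "shares": post.get("sharesCount", 0),
        "comments": post.get("commentsCount", 0),
        "top_commenters": [c.get("profileName") for c in comments],
    }


# Made-up sample item shaped like the field list above (not real scraped data).
sample = {
    "url": "https://www.facebook.com/groups/example-group/posts/123/",
    "time": "2024-01-15T10:30:00.000Z",
    "user": {"id": "100", "name": "Example User"},
    "text": "Hello group",
    "likesCount": 5,
    "sharesCount": 1,
    "commentsCount": 2,
    "topComments": [{"profileName": "Commenter A", "text": "Nice"}],
}
row = summarize_post(sample)
```

A flat row like this is easier to load into a spreadsheet or data frame than the nested item.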
How to Use
- Open the Actor: In Apify Console, navigate to the Facebook Groups Scraper actor
- Configure Input:
  - Add one or more Facebook group URLs in startUrls
  - (Optional) Set resultsLimit to control how many posts to scrape
  - (Optional) Choose viewOption for sorting preference
  - (Optional) Set searchGroupKeyword and searchGroupYear for filtered searches
  - (Optional) Set onlyPostsNewerThan to filter by date
- Run the Actor: Click "Start" and monitor the logs for progress
- View Results:
  - Results are streamed to the dataset in real time
  - View in the OUTPUT tab
  - Export as JSON, CSV, or other formats as needed
Example Input
{
  "startUrls": ["https://www.facebook.com/groups/germtheory.vs.terraintheory"],
  "resultsLimit": 100,
  "viewOption": "RECENT_ACTIVITY",
  "searchGroupKeyword": "a",
  "searchGroupYear": "2024",
  "onlyPostsNewerThan": "2 months"
}
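An input like the one above can also be assembled and checked in code before a run is started. The sketch below is one possible approach under stated assumptions: build_run_input and run_actor are hypothetical helper names, the validation rules simply mirror the parameter descriptions, and the actor ID must be taken from the actor's page in Apify Console. The run_actor function relies on the apify-client package (ApifyClient, actor(...).call, dataset(...).iterate_items).

```python
from typing import Any, Dict, List, Optional

VIEW_OPTIONS = {"CHRONOLOGICAL", "RECENT_ACTIVITY", "TOP_POSTS", "CHRONOLOGICAL_LISTINGS"}


def build_run_input(
    start_urls: List[str],
    results_limit: int = 20,
    view_option: str = "CHRONOLOGICAL",
    keyword: Optional[str] = None,
    year: Optional[str] = None,
    newer_than: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an input object, rejecting combinations the parameter list rules out."""
    if not start_urls:
        raise ValueError("startUrls must contain at least one group URL")
    if results_limit < 1:
        raise ValueError("resultsLimit must be at least 1")
    if view_option not in VIEW_OPTIONS:
        raise ValueError(f"viewOption must be one of {sorted(VIEW_OPTIONS)}")
    if year and not keyword:
        raise ValueError("searchGroupYear requires searchGroupKeyword to be set")
    run_input: Dict[str, Any] = {
        "startUrls": list(start_urls),
        "resultsLimit": results_limit,
        "viewOption": view_option,
    }
    if keyword:
        run_input["searchGroupKeyword"] = keyword
    if year:
        run_input["searchGroupYear"] = year
    if newer_than:
        run_input["onlyPostsNewerThan"] = newer_than
    return run_input


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start the actor and collect its dataset items (needs `pip install apify-client`)."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Validating locally catches mistakes such as a year filter without a keyword before any paid usage is incurred.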
Important Notes
- Public Groups Only: Only public Facebook groups can be scraped without authentication
- Search Limitations: Without logging in, search functionality is very limited. Use single or double letters for best results
- Proxy Policy: The actor always uses a residential proxy and retries automatically until the requested amount of data has been collected
- Real-time Streaming: Results are saved to the dataset as they are extracted, not at the end
- Date Filtering: Date filters are applied client-side after extraction
- Rate Limiting: The actor includes anti-blocking delays between requests
Support
For custom solutions, feature requests, or technical support, please contact: dev.scraperengine@gmail.com
Technical Details
- Automatic Discovery: The actor automatically extracts required GraphQL parameters (doc_id, node_id, end_cursor) from Facebook's HTML and JavaScript
- Pagination: Uses cursor-based pagination to navigate through group feeds
- Error Handling: Includes retry logic and fallback mechanisms for robust scraping
- Data Normalization: All extracted data is normalized to a consistent structure matching the output schema
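The combination of cursor-based pagination, retries, and randomized delays described above can be sketched generically. This is not the actor's implementation: paginate and fetch_page are hypothetical names, fetch_page stands in for whatever performs the GraphQL request, and the retry count and delay range are arbitrary illustrative values.

```python
import random
import time
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# One page of results: (posts, next_cursor). next_cursor is None on the last page.
Page = Tuple[List[Dict], Optional[str]]


def paginate(
    fetch_page: Callable[[Optional[str]], Page],
    limit: int,
    max_retries: int = 3,
    delay_range: Tuple[float, float] = (1.0, 3.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield up to `limit` posts, following cursors and retrying failed fetches."""
    cursor: Optional[str] = None
    yielded = 0
    while yielded < limit:
        for attempt in range(1, max_retries + 1):
            try:
                posts, next_cursor = fetch_page(cursor)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the final attempt
                sleep(random.uniform(*delay_range))
        for post in posts:
            yield post  # stream each post as soon as it is available
            yielded += 1
            if yielded >= limit:
                return
        if next_cursor is None:
            return  # no more pages
        cursor = next_cursor
        sleep(random.uniform(*delay_range))  # randomized pause between requests
```

Because the function is a generator, callers can store each post as it arrives instead of waiting for the whole run, which matches the streaming behavior described in this document.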