Facebook Group Post Scraper
Pricing
$24.99/month + usage
Scrape Facebook group posts easily with Facebook Group Post Scraper! Extract post text, author info, reactions, comments, and timestamps from any group. Perfect for data collection, market research, community analysis, and engagement tracking across Facebook groups.
Developer: Scraper Engine
A powerful Apify Actor designed to scrape posts from Facebook groups. Extract comprehensive post data including author information, engagement metrics (reactions, shares, comments), post content, and top comments with full author details. Perfect for social media monitoring, community engagement analysis, content research, and sentiment analysis at scale.
⚠️ IMPORTANT: RESIDENTIAL PROXY REQUIRED
Facebook BLOCKS Apify's datacenter IPs! You MUST enable residential proxy in your input configuration, or this actor will return 0 posts.
Required configuration:
{"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
Without residential proxy, Facebook returns a minimal "blocked" page and the actor cannot extract any data.
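As a minimal sketch, the required setting can be merged into any input object before a run. The helper name `with_residential_proxy` is hypothetical; only the `proxyConfiguration` keys come from the configuration shown above.

```python
import copy


def with_residential_proxy(run_input: dict) -> dict:
    """Return a copy of run_input with the residential proxy group enabled.

    Hypothetical helper: it overwrites any existing proxyConfiguration
    with the required configuration from this README.
    """
    patched = copy.deepcopy(run_input)  # leave the caller's dict untouched
    patched["proxyConfiguration"] = {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    }
    return patched
```

Passing an input that has `"useApifyProxy": false` returns a new dictionary with the residential group set, while the original input is left unchanged.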
Why Choose Us?
- 🔍 Comprehensive Data Extraction: Extract detailed post information including text, author details, engagement metrics, and top comments
- ⚡ High Performance: Asynchronous processing with concurrent requests for fast data collection
- 🛡️ Smart Proxy Fallback: Automatic fallback system (Direct → Datacenter → Residential) with 3 retries to maximize success rate
- 💾 Live Saving: Results saved immediately as they're scraped - no data loss if actor stops
- 📊 Flexible Sorting: Sort posts by popularity (TOP_POSTS), newest first (RECENT_POSTS), or oldest first (CHRONOLOGICAL)
- 🔄 Automatic Pagination: Handles pagination automatically to collect large datasets
- 💬 Comment Extraction: Extract top comments with full author profiles and engagement metrics
- 📝 Detailed Logging: Real-time logs with clear proxy events to track progress
- 🎯 Reliable: Built-in retry logic (3 attempts) and error handling for robust operation
Key Features
- Bulk Group Scraping: Process multiple Facebook group URLs in a single run
- Flexible Sorting Options:
  - TOP_POSTS - Most popular posts (reactions + comments)
  - RECENT_POSTS - Newest posts first
  - CHRONOLOGICAL - Oldest posts first
- Comment Control: Configure the maximum number of comments per post (1-100, or 0 for all)
- Intelligent Proxy Fallback:
  - Starts with no proxy (direct connection) - fastest
  - Automatically falls back to datacenter proxy if blocked
  - Falls back to residential proxy if datacenter is blocked
  - Retries 3 times with residential proxy before giving up
  - Once residential proxy is used, it persists for all remaining requests
- Real-time Data Saving: Posts are saved to dataset as soon as they're extracted
- Comprehensive Engagement Data: Extracts reaction counts, share counts, comment counts
- Top Comments with Details: Extracts top comments including:
  - Comment text and creation time
  - Author information (name, ID, profile picture, verification status)
  - Comment engagement (reactions, replies)
  - Direct comment URLs
- Configurable Limits: Set maximum number of posts to collect (1-10,000)
- Pagination Support: Automatically handles Facebook's pagination to fetch all available posts
Input
The actor accepts the following input parameters:
Input Schema
{"startUrls": [{"url": "https://www.facebook.com/groups/germtheory.vs.terraintheory"}],"sortOrder": "TOP_POSTS","maxComments": 10,"maxItems": 500,"proxyConfiguration": {"useApifyProxy": false}}
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | array | ✅ Yes | - | List of Facebook group URLs. Each URL should be an object with url property (e.g., {"url": "https://www.facebook.com/groups/groupname"}). |
| sortOrder | string | ❌ No | "TOP_POSTS" | How to sort posts. Options: "TOP_POSTS" (most popular), "RECENT_POSTS" (newest first), "CHRONOLOGICAL" (oldest first). |
| maxComments | integer | ❌ No | 10 | Maximum number of top comments to extract per post. Set to 0 to extract all available comments. Range: 0-100. |
| maxItems | integer | ❌ No | 500 | Maximum number of posts to scrape from each group. Range: 1-10,000. |
| proxyConfiguration | object | ❌ No | {"useApifyProxy": false} | Proxy configuration. By default, no proxy is used. If Facebook blocks requests, the actor automatically falls back to datacenter proxy, then residential proxy. |
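The defaults and ranges in the table can be checked before starting a run. The following is only a sketch of such a check, not the actor's own validation code; `normalize_input` and `ALLOWED_SORT_ORDERS` are hypothetical names.

```python
ALLOWED_SORT_ORDERS = {"TOP_POSTS", "RECENT_POSTS", "CHRONOLOGICAL"}


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and reject out-of-range values."""
    start_urls = raw.get("startUrls") or []
    if not start_urls or not all(
        isinstance(item, dict) and item.get("url") for item in start_urls
    ):
        raise ValueError("startUrls must be a non-empty list of {'url': ...} objects")

    sort_order = raw.get("sortOrder", "TOP_POSTS")
    if sort_order not in ALLOWED_SORT_ORDERS:
        raise ValueError(f"sortOrder must be one of {sorted(ALLOWED_SORT_ORDERS)}")

    max_comments = raw.get("maxComments", 10)
    if not 0 <= max_comments <= 100:  # 0 means "all available comments"
        raise ValueError("maxComments must be in the range 0-100")

    max_items = raw.get("maxItems", 500)
    if not 1 <= max_items <= 10_000:
        raise ValueError("maxItems must be in the range 1-10,000")

    return {
        "startUrls": start_urls,
        "sortOrder": sort_order,
        "maxComments": max_comments,
        "maxItems": max_items,
        "proxyConfiguration": raw.get("proxyConfiguration", {"useApifyProxy": False}),
    }
```

Calling it with only `startUrls` returns the full input with every default from the table filled in.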
Input Examples
Example 1: Single Group with Default Settings
{"startUrls": [{"url": "https://www.facebook.com/groups/germtheory.vs.terraintheory"}]}
Uses default settings: TOP_POSTS sorting, 10 comments per post, 500 posts max, no proxy (with automatic fallback).
Example 2: Multiple Groups with Custom Settings
{"startUrls": [{"url": "https://www.facebook.com/groups/group1"},{"url": "https://www.facebook.com/groups/group2"}],"sortOrder": "RECENT_POSTS","maxComments": 20,"maxItems": 1000}
Example 3: All Comments, Chronological Order
{"startUrls": [{"url": "https://www.facebook.com/groups/mygroup"}],"sortOrder": "CHRONOLOGICAL","maxComments": 0,"maxItems": 200}
Extracts all comments (0 = unlimited) and sorts posts chronologically.
Example 4: With Proxy Enabled (Recommended for Large Scrapes)
{"startUrls": [{"url": "https://www.facebook.com/groups/mygroup"}],"maxItems": 5000,"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
Uses residential proxy from the start for better reliability on large scrapes.
Output
The actor saves results to the Apify dataset. Each post entry contains comprehensive metadata:
Output Schema
{"createdAt": 1761419106,"url": "https://www.facebook.com/groups/germtheory.vs.terraintheory/permalink/25132651666385171/","user": {"id": "pfbid0Ypf1LaaX4i3fZVCN4zpxvkqJBn8Sdco4pqE42cdaH974RpncHEKhcAsxu1Ar6igPl","name": "Anna Kazazis","url": "https://www.facebook.com/anna.kazazis"},"text": "Are dried goji berries considered sweet or subacid?","attachments": [],"reactionCount": 3,"shareCount": 0,"commentCount": 8,"topComments": [{"text": "I believe dried fruit is its own category...","createdAt": 1761463794,"author": {"name": "Gosia Adur","id": "pfbid0xBBvGGZDdjVCwCXJjiKQGFPUJ3nXxqFELp82wtGBVb4adizWYh52Tgsfjb7kPnnRl","gender": "FEMALE","url": null,"profilePicture": "https://scontent.fdac142-1.fna.fbcdn.net/...","shortName": "Gosia","isVerified": false},"reactionCount": 2,"commentCount": 1,"url": "https://www.facebook.com/groups/germtheory.vs.terraintheory/permalink/25132651666385171/?comment_id=25143801295270208"}]}
Output Fields
| Field | Type | Description |
|---|---|---|
| createdAt | integer | Unix timestamp when the post was created |
| url | string | Direct link to the post on Facebook |
| user | object | Post author information |
| user.id | string | Author's Facebook user ID |
| user.name | string | Author's display name |
| user.url | string | Author's profile URL (may be empty) |
| text | string | Post text content |
| attachments | array | Post attachments (currently always an empty array; may be populated in a future version) |
| reactionCount | integer | Total number of reactions (likes, etc.) on the post |
| shareCount | integer | Number of times the post was shared |
| commentCount | integer | Total number of comments on the post |
| topComments | array | Array of top comments (limited by maxComments setting) |
| topComments[].text | string | Comment text content |
| topComments[].createdAt | integer | Unix timestamp when comment was created |
| topComments[].author | object | Comment author information |
| topComments[].author.name | string | Comment author's display name |
| topComments[].author.id | string | Comment author's Facebook user ID |
| topComments[].author.gender | string | Author's gender (MALE/FEMALE/empty) |
| topComments[].author.url | string | Author's profile URL |
| topComments[].author.profilePicture | string | URL to author's profile picture |
| topComments[].author.shortName | string | Author's short/first name |
| topComments[].author.isVerified | boolean | Whether author has verified account |
| topComments[].reactionCount | integer | Number of reactions on the comment |
| topComments[].commentCount | integer | Number of replies to the comment |
| topComments[].url | string | Direct link to the comment |
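As an illustration of working with these fields, the sketch below condenses one dataset record into a short summary. The function name `summarize_post` and the "engagement" total (reactions + shares + comments) are assumptions made for this example; the field names are the ones listed in the table.

```python
from datetime import datetime, timezone


def summarize_post(post: dict) -> dict:
    """Condense one dataset record into a small summary dictionary."""
    # createdAt is a Unix timestamp in seconds
    created = datetime.fromtimestamp(post["createdAt"], tz=timezone.utc)
    return {
        "url": post["url"],
        "author": post["user"]["name"],
        "created": created,
        # Assumed definition of "engagement" for this example only
        "engagement": post["reactionCount"] + post["shareCount"] + post["commentCount"],
        "commenters": [c["author"]["name"] for c in post.get("topComments", [])],
    }
```

For the sample record in the Output Schema, this yields an engagement total of 11 and a single commenter, "Gosia Adur".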
🚀 How to Use the Actor (via Apify Console)
- Log in to your Apify account at https://console.apify.com
- Navigate to Actors and find facebook-group-post-scraper
- Configure Input:
  - Add Facebook group URLs in startUrls (click "Add item" for each URL)
  - Set sortOrder (TOP_POSTS, RECENT_POSTS, or CHRONOLOGICAL)
  - Set maxComments to control how many comments to extract per post (0 = all)
  - Set maxItems to control how many posts to scrape per group
  - (Optional) Configure proxy settings if needed
- Run the Actor by clicking the "Start" button
- Monitor Progress:
  - Watch real-time logs to see scraping progress
  - See proxy fallback events if blocking occurs
  - Check the dataset to see posts being saved live
  - View engagement statistics in the summary
- Access Results:
  - Go to the Dataset tab after the run completes
  - View results in table format with all fields
  - Expand user and topComments objects to see detailed information
  - Click post URLs to view posts directly on Facebook
- Export Results:
  - Click "Export" to download data as JSON, CSV, Excel, or other formats
  - Use the Apify API to programmatically access results
  - Integrate with webhooks or other services
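For programmatic access, a run can also be started from Python. The sketch below assumes a client object shaped like `ApifyClient` from the `apify-client` package, whose `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods it relies on. The function name `run_and_fetch` and the actor ID passed to it are placeholders, not values confirmed by this README.

```python
def run_and_fetch(client, actor_id: str, run_input: dict) -> list:
    """Start an actor run, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The run did not return any run information")
    # Every run writes its results to a default dataset
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

With `apify-client` installed, the client would be created as `ApifyClient("<YOUR_API_TOKEN>")`, and `actor_id` would be the actor's full ID as shown on its Apify Store page.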
Best Use Cases
- 📊 Social Media Monitoring: Track posts and engagement in specific Facebook groups
- 💬 Community Analysis: Analyze discussion patterns and popular topics in groups
- 📈 Engagement Research: Study which types of posts get the most reactions and comments
- 🔍 Content Discovery: Find trending posts and discussions in niche communities
- 📝 Sentiment Analysis: Collect post data for sentiment analysis and opinion mining
- 🎯 Influencer Research: Identify active members and their engagement patterns
- 📱 Brand Monitoring: Monitor mentions and discussions about your brand in groups
- 📚 Academic Research: Collect data for social media research and studies
- 💼 Market Research: Understand community interests and trending topics
- 🔄 Competitive Intelligence: Track competitor mentions and discussions
Frequently Asked Questions
How does proxy fallback work?
The actor uses intelligent proxy fallback to maximize success rate:
- Default: Starts with no proxy (direct connection) for fastest performance
- If blocked: Automatically falls back to datacenter proxy
- If still blocked: Falls back to residential proxy with 3 retries
- Persistence: Once residential proxy is activated, it's used for all remaining requests
- Clear logging: All proxy events are logged so you can track what's happening
Example Flow:
- Request 1: Direct connection → Blocked (403)
- Request 2: Datacenter proxy → Blocked (403)
- Request 3-5: Residential proxy → Success (200)
- All subsequent requests: Continue with residential proxy
Recommendation: For large scrapes (1000+ posts), enable residential proxy from the start for better reliability.
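The fallback flow described above can be modelled as a small state machine. This is a sketch under stated assumptions, not the actor's actual code: `ProxyFallback`, `PROXY_TIERS`, and the `send(url, tier)` callable are hypothetical, a 403 status is treated as "blocked", and the sketch keeps whichever tier it last reached (the README only states persistence for the residential tier).

```python
PROXY_TIERS = ["direct", "datacenter", "residential"]
RESIDENTIAL_RETRIES = 3  # retries on the last tier, as described above


class ProxyFallback:
    """Escalate Direct -> Datacenter -> Residential and remember the tier."""

    def __init__(self) -> None:
        self.tier_index = 0  # start with a direct connection

    @property
    def tier(self) -> str:
        return PROXY_TIERS[self.tier_index]

    def fetch(self, url: str, send) -> int:
        """Call send(url, tier) until it is not blocked; return the status."""
        while True:
            attempts = RESIDENTIAL_RETRIES if self.tier == "residential" else 1
            for _ in range(attempts):
                status = send(url, self.tier)
                if status != 403:
                    return status
            if self.tier == "residential":
                return status  # all residential retries were blocked: give up
            self.tier_index += 1  # escalate; later requests keep this tier
```

Because `tier_index` is stored on the object, a second call to `fetch` after an escalation goes straight to the residential tier, which mirrors the persistence rule in the list above.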
Can I scrape posts from multiple groups?
Yes! Simply add multiple URLs to the startUrls array:
{"startUrls": [{"url": "https://www.facebook.com/groups/group1"},{"url": "https://www.facebook.com/groups/group2"},{"url": "https://www.facebook.com/groups/group3"}]}
The actor will process each group sequentially and save all posts to the same dataset.
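When the group list comes from a file or spreadsheet, the `startUrls` array can be generated rather than typed by hand. A minimal sketch follows; the helper name `build_start_urls` and the de-duplication step are assumptions of this example.

```python
def build_start_urls(groups: list) -> dict:
    """Turn group slugs or full URLs into the startUrls input structure."""
    urls = []
    for group in groups:
        group = group.strip()
        if not group:
            continue  # skip blank entries
        url = group if group.startswith("http") else f"https://www.facebook.com/groups/{group}"
        if url not in urls:  # keep the first occurrence, preserve order
            urls.append(url)
    return {"startUrls": [{"url": url} for url in urls]}
```

The returned dictionary can be merged with other parameters such as `sortOrder` or `maxItems` before starting the run.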
What's the maximum number of posts I can scrape?
You can scrape up to 10,000 posts per group per run. However:
- Some groups may not have 10,000 posts
- Facebook's pagination may limit available posts
- Consider scraping in batches for very large datasets
- Use maxItems to control the limit per group
How long does it take to scrape posts?
Speed depends on:
- Number of groups
- Maximum posts per group
- Number of comments per post (maxComments setting)
- Facebook's response time
- Whether proxies are used (proxy = slower but more reliable)
Typical speeds (with default settings, no proxy):
- 100 posts: 2-3 minutes
- 500 posts: 8-12 minutes
- 1000 posts: 15-20 minutes
Note: With proxy fallback, add 20-30% more time. Extracting all comments (maxComments: 0) takes longer.
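To make the proxy overhead concrete, the arithmetic can be applied to the typical ranges above. This sketch reads "20-30% more time" as 20% on the lower bound and 30% on the upper bound, which is an interpretation rather than a measured result; the names are hypothetical.

```python
# Typical run times in minutes (low, high), copied from the list above
TYPICAL_MINUTES = {100: (2, 3), 500: (8, 12), 1000: (15, 20)}


def with_proxy_overhead(posts: int) -> tuple:
    """Estimate the (low, high) run time in minutes with proxy fallback."""
    low, high = TYPICAL_MINUTES[posts]
    return round(low * 1.2, 1), round(high * 1.3, 1)
```

Under that reading, 500 posts would take roughly 9.6 to 15.6 minutes, and 1000 posts roughly 18 to 26 minutes.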
What happens if Facebook blocks my requests?
The actor has built-in protection:
- Automatic Fallback: If direct connection is blocked, automatically tries datacenter proxy
- Residential Fallback: If datacenter is blocked, falls back to residential proxy
- Retry Logic: Each request is retried up to 3 times before giving up
- Clear Logging: All blocking events are logged with status codes
- Graceful Handling: If a request fails after all retries, the actor continues with the next post
Tip: If requests are blocked repeatedly, enable residential proxy from the start. A proxy does not unlock private or closed groups, which this version cannot access.
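The retry behaviour can be sketched with `asyncio`. This is an illustrative pattern for "up to 3 attempts with exponential backoff", not the actor's source code; `fetch_with_retry`, the `fetch` callable, and the base delay are assumptions.

```python
import asyncio


async def fetch_with_retry(fetch, attempts: int = 3, base_delay: float = 1.0):
    """Await fetch() up to `attempts` times, doubling the wait after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return await fetch()
        except Exception as exc:  # a real scraper would catch narrower errors
            last_error = exc
            if attempt < attempts - 1:
                # Waits base_delay, then 2 * base_delay, and so on
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise last_error
```

A caller that wants the "continue with the next post" behaviour would wrap this call in its own `try`/`except`, log the failure, and move on.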
Are the results saved in real-time?
Yes! Posts are saved to the dataset immediately as they're extracted. This means:
- You can access partial results even if the actor stops mid-run
- Progress is not lost if the actor encounters errors
- You can monitor results while scraping is in progress
- Each post is saved as soon as it's processed
What's the difference between sort orders?
- TOP_POSTS: Shows posts with highest engagement (reactions + comments). Best for finding viral content.
- RECENT_POSTS: Shows newest posts first. Best for real-time monitoring.
- CHRONOLOGICAL: Shows oldest posts first. Best for historical analysis.
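The same three orderings can be approximated locally on an exported dataset. Facebook's own ranking for TOP_POSTS is not documented here, so the key used below (reactions + comments) is only the rough definition given above; `sort_posts` is a hypothetical helper.

```python
def sort_posts(posts: list, sort_order: str) -> list:
    """Return a new list of posts ordered like the named sort order."""
    if sort_order == "TOP_POSTS":
        # Highest engagement first
        return sorted(posts, key=lambda p: p["reactionCount"] + p["commentCount"], reverse=True)
    if sort_order == "RECENT_POSTS":
        return sorted(posts, key=lambda p: p["createdAt"], reverse=True)  # newest first
    if sort_order == "CHRONOLOGICAL":
        return sorted(posts, key=lambda p: p["createdAt"])  # oldest first
    raise ValueError(f"Unknown sort order: {sort_order}")
```

This is useful when one run's output needs to be viewed in a different order without scraping the group again.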
How are comments selected?
Facebook provides "interesting top-level comments" which are typically:
- Comments with high engagement (reactions, replies)
- Comments from verified or popular accounts
- Comments that Facebook's algorithm deems relevant
The actor extracts these top comments up to your maxComments limit. Set maxComments: 0 to get all available top comments.
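The meaning of the limit, including the special value 0, can be expressed in a few lines. This is a sketch of the documented semantics only; `apply_comment_limit` is a hypothetical name.

```python
def apply_comment_limit(comments: list, max_comments: int) -> list:
    """Keep at most max_comments comments; 0 keeps every available comment."""
    if max_comments < 0:
        raise ValueError("max_comments must not be negative")
    if max_comments == 0:
        return list(comments)  # 0 means "no limit"
    return comments[:max_comments]
```

Note that a plain slice with 0 would return an empty list, which is why the 0 case is handled separately.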
Can I scrape private groups?
This actor can only scrape publicly accessible content:
- Public groups: ✅ Full access
- Closed groups (even if you're a member): ❌ Not accessible in this version (would require authentication cookies, which are not currently supported)
- Private groups: ❌ Not accessible
Note: The actor currently doesn't support authentication cookies. For private/closed groups, you would need to add cookie authentication (not included in this version).
What data is extracted from comments?
For each top comment, the actor extracts:
- Comment text and creation timestamp
- Author profile information (name, ID, profile picture, verification status)
- Comment engagement (reaction count, reply count)
- Direct comment URL
This allows you to analyze who's commenting, when, and how engaged the community is.
How do I handle very large groups?
For groups with thousands of posts:
- Use maxItems: Set a reasonable limit (e.g., 1000-5000)
- Use pagination: Run multiple times with different date ranges (if Facebook supports it)
- Enable proxy: Use residential proxy for better reliability
- Monitor logs: Watch for rate limiting or blocking
- Export incrementally: Check dataset periodically to see progress
Can I filter posts by date or other criteria?
The actor extracts all posts as they appear in Facebook's feed. For filtering:
- Use sortOrder: "CHRONOLOGICAL" to get oldest posts first
- Use sortOrder: "RECENT_POSTS" to get newest posts first
- Filter results after scraping using the createdAt timestamp field
- Use Apify's dataset API to query results programmatically
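Post-scrape date filtering needs only the `createdAt` field. The sketch below assumes the posts have already been loaded as a list of dictionaries; `filter_by_date` is a hypothetical helper, and the range is half-open (start included, end excluded).

```python
from datetime import datetime, timezone


def filter_by_date(posts: list, start: datetime, end: datetime) -> list:
    """Keep posts whose createdAt falls in the range [start, end)."""
    start_ts, end_ts = start.timestamp(), end.timestamp()
    return [post for post in posts if start_ts <= post["createdAt"] < end_ts]


# Example: keep only posts created in October 2025 (UTC)
october_start = datetime(2025, 10, 1, tzinfo=timezone.utc)
october_end = datetime(2025, 11, 1, tzinfo=timezone.utc)
```

Using timezone-aware `datetime` values avoids the local-time ambiguity that naive datetimes would introduce when converted to timestamps.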
Support and Feedback
- Issues: Report bugs or issues through the Apify platform Issues tab
- Feature Requests: Suggest new features via Apify support or Issues tab
- Documentation: Check the Apify documentation for more details
- Community: Join the Apify Discord for discussions and tips
Cautions
⚠️ Important Legal and Ethical Considerations:
- Public Data Only: This actor collects data only from publicly available posts in Facebook groups
- No Private Content: Private groups, password-protected content, and non-public posts are not accessible
- Compliance: Ensure your use case complies with:
- Facebook's Terms of Service
- Data protection regulations (GDPR, CCPA, etc.)
- Local laws regarding web scraping and data collection
- Privacy and spam laws
- Rate Limiting: The actor includes delays and retry logic to avoid overwhelming Facebook's servers
- Respect Privacy: Be mindful of extracting personal information from posts and comments
- Responsible Use: Use this actor responsibly and ethically. Do not:
- Scrape personal information without consent
- Send unsolicited messages based on scraped data
- Violate Facebook's Terms of Service
- Use scraped data for spam, harassment, or malicious purposes
- Share or sell scraped data inappropriately
- Republish content without permission
The end user is responsible for ensuring legal compliance with all applicable laws and regulations.
Version: 0.1
Last Updated: November 2025
Actor Specification: 1
Maintained by: Your Team
Technical Details
- Runtime: Python 3.11+
- Async Framework: aiohttp for concurrent requests
- Apify SDK: Latest version (>=2.0.0)
- Retry Logic: 3 attempts per request with exponential backoff
- Proxy Support: Automatic fallback (Direct → Datacenter → Residential)
- Data Format: JSON dataset with structured fields
- Live Saving: Results pushed to dataset immediately as posts are extracted
- Pagination: Automatic handling of Facebook's cursor-based pagination
- Comment Extraction: Intelligent parsing of Facebook's GraphQL responses
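Cursor-based pagination, as listed above, generally follows the loop sketched here. The page shape (`posts`, `next_cursor`) and the `fetch_page` callable are hypothetical stand-ins, since the actual Facebook response format is not described in this README.

```python
def collect_posts(fetch_page, max_items: int) -> list:
    """Follow cursors until max_items posts are collected or pages run out.

    fetch_page(cursor) is assumed to return a dict such as
    {"posts": [...], "next_cursor": "..."}; next_cursor is None or absent
    on the last page.
    """
    posts = []
    cursor = None  # None requests the first page
    while len(posts) < max_items:
        page = fetch_page(cursor)
        posts.extend(page["posts"])
        cursor = page.get("next_cursor")
        if not cursor or not page["posts"]:
            break  # no further pages available
    return posts[:max_items]  # trim any overshoot from the last page
```

The final slice enforces the `maxItems` limit even when the last page returns more posts than are still needed.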
Changelog
Version 0.1 (November 2025)
- ✅ Initial release
- ✅ Facebook group post scraping with pagination
- ✅ Intelligent proxy fallback system (Direct → Datacenter → Residential)
- ✅ Multiple sort order options (TOP_POSTS, RECENT_POSTS, CHRONOLOGICAL)
- ✅ Top comment extraction with full author details
- ✅ Engagement metrics (reactions, shares, comments)
- ✅ Live data saving to dataset
- ✅ Comprehensive logging with proxy event tracking
- ✅ Multi-group support
- ✅ Configurable comment limits
- ✅ Real-time progress tracking