Facebook Search Scraper [UPDATED]
Pricing
$15.00/month + usage
Extract and analyze public Facebook posts with detailed engagement metrics, author insights, and rich media data. Perfect for market research, brand monitoring, and competitive analysis. Get structured JSON output with post content, reactions, comments, shares, and hashtags—all in one place
Rating
1.0
(1)
Developer
Muhamed Didovic
Actor stats
- Bookmarked: 1
- Total users: 33
- Monthly active users: 3
- Issues response: 1.5 days
- Last modified: 5 days ago
Facebook Posts Search Scraper
Unlock the Full Power of Facebook Search Data - This actor is now primarily focused on discovering Facebook pages through Google and extracting structured Facebook page data in a competitor-aligned format. It is useful for location-based business discovery, page research, monitoring, and dataset building.
"From Google discovery to structured Facebook page output, the goal is reliable page-level intelligence with minimal manual cleanup."
What You'll Get
With the Facebook Posts Search Scraper, you'll gain access to rich, structured Facebook data discovered through Google, including:
Comprehensive Page Data
- Direct Facebook Page URLs: Canonical and legacy Facebook page links
- Page Identity: Facebook IDs, page names, and normalized page URLs
- Business Details: Categories, addresses, phone numbers, websites, services, and business status
- Media & Profile Data: Profile image, cover image, and related page metadata
Valuable Engagement Metrics
- Follower and Like Counts: Useful for quick size and popularity checks
- Ratings Metadata: Rating text, counts, and overall score when Facebook exposes them
- Ad Library References: Page ad library identifiers when available
Actionable Business Intelligence
- Location-Aware Discovery: Search terms can be combined with places such as Sarajevo
- Competitive Benchmarking: Output is shaped toward competitor-style page datasets
- Availability Detection: Legacy pages that are no longer accessible can still be surfaced as structured error rows
Technical Advantages
- Structured JSON Output: Easy-to-parse page-level dataset rows
- CSV Export: Useful for quick spreadsheet review
- Google-Only Discovery for Cheerio: Keeps the default actor focused and simpler
This data helps researchers, operators, and analysts build business-page datasets, compare discovery quality against competitors, and monitor what Facebook exposes for public business pages.
Overview
The Facebook Posts Search Scraper currently has two flows in the repository. The active default flow is the Cheerio actor in src/main.ts, which discovers Facebook pages through Google and extracts page data. The older Puppeteer flow in src/main-puppeteer.ts still exists for browser-driven Facebook post-search experiments, but it is no longer the main path for the default actor behavior.
What does Facebook Posts Search Scraper do?
The Facebook Posts Search Scraper currently enables you to:
Comprehensive Data Collection
- Page Data
  - Discover Facebook pages from Google result pages
  - Extract structured page-level fields such as IDs, names, categories, addresses, follower counts, links, and media references
  - Normalize different Facebook URL shapes into stable output rows
  - Capture legacy unavailable pages as structured error rows instead of silently losing them
- Legacy and Mixed URL Handling
  - Process canonical page URLs
  - Process legacy /pages/... URLs
  - Process legacy /people/... URLs
  - Handle some photos_stream style pages when Google surfaces them
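The URL shapes listed above could be told apart with a small classifier. This is a minimal sketch, not the actor's actual code: the `FacebookUrlKind` labels, the `classifyFacebookUrl` helper, and the path patterns are all assumptions inferred from the shapes named in this README.

```typescript
// Hypothetical labels for the URL shapes named in this README.
type FacebookUrlKind =
  | "canonical"
  | "legacy_pages"
  | "legacy_people"
  | "photos_stream"
  | "unknown";

// Classify a Facebook URL by its path. The patterns are assumptions,
// not the rules the actor itself applies.
function classifyFacebookUrl(rawUrl: string): FacebookUrlKind {
  // Accept facebook.com with at most one subdomain label (www, m, ...).
  const match = /^https?:\/\/(?:[\w-]+\.)?facebook\.com(\/[^?#]*)?/i.exec(rawUrl.trim());
  if (!match) return "unknown";

  // Drop trailing slashes so "/slug/" and "/slug" are treated the same.
  const path = (match[1] ?? "/").replace(/\/+$/, "");

  if (/\/photos_stream$/i.test(path)) return "photos_stream";
  if (/^\/pages\/[^/]+\/\d+$/i.test(path)) return "legacy_pages";
  if (/^\/people\/[^/]+\/\d+$/i.test(path)) return "legacy_people";
  if (/^\/[^/]+$/.test(path)) return "canonical";
  return "unknown";
}

const exampleKinds = [
  "https://www.facebook.com/thebarsarajevo",
  "https://www.facebook.com/pages/Pirates-Bar-Old-Town-Sarajevo/163665216985663",
].map(classifyFacebookUrl);
// exampleKinds is ["canonical", "legacy_pages"]
```

A classifier like this would let each shape be routed to a different fetch strategy while still producing one row format.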
Advanced Scraping Capabilities
- Google Pagination Handling: Opens additional Google pages based on maxItems
- Location-Aware Search: Combines search terms with optional locations
- Fallback Request Strategy: Uses mixed transport for canonical and legacy Facebook URLs
- Structured Retry Behavior: Retries or falls back when Facebook responses are incomplete
- Monitoring Mode: Keeps support for incremental collection workflows
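To illustrate how maxItems could drive pagination depth, here is a rough sketch. It assumes about 10 results per Google page, which is an assumption and not a documented value of this actor; the helper names `googlePagesNeeded` and `googleStartOffsets` are hypothetical.

```typescript
// Assumption: roughly 10 organic results per Google page.
const ASSUMED_RESULTS_PER_PAGE = 10;

// Number of Google result pages needed to cover maxItems candidates.
function googlePagesNeeded(
  maxItems: number,
  resultsPerPage: number = ASSUMED_RESULTS_PER_PAGE,
): number {
  if (!Number.isFinite(maxItems) || maxItems <= 0) return 0;
  return Math.ceil(maxItems / resultsPerPage);
}

// Zero-based result offsets, one per page, usable as Google's `start` parameter.
function googleStartOffsets(
  maxItems: number,
  resultsPerPage: number = ASSUMED_RESULTS_PER_PAGE,
): number[] {
  const pages = googlePagesNeeded(maxItems, resultsPerPage);
  return Array.from({ length: pages }, (_, pageIndex) => pageIndex * resultsPerPage);
}

const offsetsForSixty = googleStartOffsets(60);
// offsetsForSixty is [0, 10, 20, 30, 40, 50]
```

Not every Google result is a usable Facebook page, so a real run would likely need more pages than this lower bound suggests.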
Flexible Scraping Options
- Keyword Discovery: Search by plain terms such as Pub
  - Example: searchTerms: ["Pub"]
- Location Discovery: Combine each term with locations
  - Example: locations: ["Sarajevo"]
- Google-Driven Page Discovery: Use Google SERPs to find Facebook business pages
- Page Output Collection: Export discovered page rows to JSON and CSV
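As a sketch of how terms and locations might be combined into Google queries, the helper below pairs every term with every location. The `site:facebook.com` prefix and both function names are assumptions made for illustration; the actor's real query format is not documented here.

```typescript
// Hypothetical helper: pair every search term with every location.
// The `site:facebook.com` scope is an assumption, not the actor's confirmed query.
function buildDiscoveryQueries(searchTerms: string[], locations: string[] = []): string[] {
  const terms = searchTerms.map((term) => term.trim()).filter((term) => term.length > 0);
  const places = locations.map((place) => place.trim()).filter((place) => place.length > 0);

  const queries: string[] = [];
  for (const term of terms) {
    if (places.length === 0) {
      // No locations given: search by the term alone.
      queries.push(`site:facebook.com ${term}`);
      continue;
    }
    for (const place of places) {
      queries.push(`site:facebook.com ${term} ${place}`);
    }
  }
  return queries;
}

// Build a Google search URL for one query.
function googleSearchUrl(query: string): string {
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}

const pubQueries = buildDiscoveryQueries(["Pub"], ["Sarajevo"]);
// pubQueries is ["site:facebook.com Pub Sarajevo"]
```

With two terms and three locations this yields six queries, so the number of Google requests grows with the product of both lists.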
This tool is ideal for:
- Local business dataset building
- Competitive intelligence and market research
- Location-aware page discovery
- Comparing output parity with competitor actors
- Monitoring newly discovered or changed public pages
Features
- Comprehensive Data Extraction: Detailed Facebook page information and structured output rows
- Multiple Scraping Modes:
  - Cheerio Default Mode: Google-only Facebook page discovery via src/main.ts
  - Puppeteer Legacy Mode: Browser-based Facebook post search via src/main-puppeteer.ts
- Flexible Input: Supports multiple input formats:
  - Plain search terms
  - Optional locations
  - Legacy Puppeteer startUrls for browser flow experiments
- Automatic Google Pagination: Expands SERPs based on maxItems
- Efficient Processing: Concurrent crawling with configurable concurrency in code
- Reliable Performance: Built-in retry and fallback behavior, plus hardcoded proxy routing in the Cheerio flow
- Structured Data Export: Download discovered page data in JSON or CSV format for analysis
How to Use
Scraping Facebook Page Data
To run the current default actor:
- Set Up: Ensure you have an Apify account and access to the Apify platform.
- Configure Input: Provide plain searchTerms and optional locations.
- Adjust Settings: Configure maxItems, startDate, monitoringMode, minDelay, and maxDelay.
- Run the Scraper: Execute the scraper on the Apify platform.
- Data Collection: The scraper will query Google, discover Facebook pages, fetch Facebook page HTML, and export structured page rows.
Input Configuration
Here's an example of how to set up the input for the current Cheerio actor:
{
  "searchTerms": ["Pub"],
  "locations": ["Sarajevo"],
  "monitoringMode": true,
  "startDate": "2025-07-01",
  "maxItems": 60,
  "minDelay": 5,
  "maxDelay": 10
}
Input Fields Explanation
- searchTerms: Array of plain Google discovery terms such as Pub.
- locations: Optional array of locations combined with each search term, for example Sarajevo.
- monitoringMode: When enabled, only processes newly seen items compared to prior runs where applicable.
- startDate: Optional lower date boundary used by the monitoring-related flow.
- maxItems: Maximum number of dataset rows to push. It also scales how many Google result pages are opened.
- minDelay: Minimum randomized delay between requests, in seconds.
- maxDelay: Maximum randomized delay between requests, in seconds.
The Cheerio flow no longer expects:
- cookies
- proxy settings in actor input
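The input fields can be described as a TypeScript type. The sketch below takes the field names from the example input; the validation rules, the YYYY-MM-DD date check, and the helper names `validateInput` and `randomDelayMs` are assumptions, not the actor's actual checks.

```typescript
// Field names follow the example input; optionality is an assumption.
interface ScraperInput {
  searchTerms: string[];
  locations?: string[];
  monitoringMode?: boolean;
  startDate?: string; // for example "2025-07-01"
  maxItems?: number;
  minDelay?: number; // seconds
  maxDelay?: number; // seconds
}

// Return a list of human-readable problems; an empty list means the input looks usable.
function validateInput(input: ScraperInput): string[] {
  const problems: string[] = [];
  if (input.searchTerms.length === 0) problems.push("searchTerms must contain at least one term");
  if (input.maxItems !== undefined && input.maxItems <= 0) problems.push("maxItems must be positive");

  const lowerDelay = input.minDelay ?? 0;
  const upperDelay = input.maxDelay ?? lowerDelay;
  if (lowerDelay < 0) problems.push("minDelay must not be negative");
  if (upperDelay < lowerDelay) problems.push("maxDelay must be >= minDelay");

  if (input.startDate !== undefined && !/^\d{4}-\d{2}-\d{2}$/.test(input.startDate)) {
    problems.push("startDate must look like YYYY-MM-DD");
  }
  return problems;
}

// Pick a delay in milliseconds between minDelay and maxDelay (both in seconds).
// The random source is injectable so the function can be tested deterministically.
function randomDelayMs(
  minDelay: number,
  maxDelay: number,
  random: () => number = Math.random,
): number {
  return Math.round((minDelay + random() * (maxDelay - minDelay)) * 1000);
}

const exampleInput: ScraperInput = {
  searchTerms: ["Pub"],
  locations: ["Sarajevo"],
  monitoringMode: true,
  startDate: "2025-07-01",
  maxItems: 60,
  minDelay: 5,
  maxDelay: 10,
};
const exampleProblems = validateInput(exampleInput); // empty array for this input
```

With minDelay 5 and maxDelay 10, each pause would fall somewhere between 5000 and 10000 milliseconds.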
Monitoring Mode
When monitoringMode is enabled, the scraper focuses on incremental collection behavior where applicable. This is useful for:
- Tracking newly discovered or newly processed items
- Building a historical archive over repeated runs
- Monitoring the same search space without excessive duplication
How Monitoring Mode Works
- The scraper keeps state for repeated runs
- On subsequent runs with monitoringMode: true, previously seen data can be filtered or processed differently
- Newly discovered items are prioritized
- The stored state is updated after the run
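The steps above can be sketched as a filter over a set of previously seen page URLs. This is only an illustration under stated assumptions: keying state by pageUrl, the `filterNewRows` helper, and the in-memory Set are hypothetical, and how the actor actually persists state is not described here.

```typescript
// Hypothetical state: the set of page URLs seen in earlier runs.
function filterNewRows<T extends { pageUrl: string }>(
  rows: T[],
  seenUrls: Set<string>,
  monitoringMode: boolean,
): { fresh: T[]; updatedSeen: Set<string> } {
  // Copy the set so the caller's state is not mutated.
  const updatedSeen = new Set(seenUrls);
  const fresh: T[] = [];

  for (const row of rows) {
    const alreadySeen = updatedSeen.has(row.pageUrl);
    updatedSeen.add(row.pageUrl);
    // In monitoring mode, skip rows seen in earlier runs or earlier in this run.
    if (monitoringMode && alreadySeen) continue;
    fresh.push(row);
  }
  return { fresh, updatedSeen };
}

// First run: nothing seen yet, so both rows are new.
const firstRun = filterNewRows(
  [{ pageUrl: "https://www.facebook.com/a" }, { pageUrl: "https://www.facebook.com/b" }],
  new Set<string>(),
  true,
);

// Second run: "b" was already seen, so only "c" comes through.
const secondRun = filterNewRows(
  [{ pageUrl: "https://www.facebook.com/b" }, { pageUrl: "https://www.facebook.com/c" }],
  firstRun.updatedSeen,
  true,
);
// secondRun.fresh contains only the row for https://www.facebook.com/c
```

The updated set would then be saved after the run so the next run starts from it.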
Output Structure
The current default actor outputs Facebook page rows, not just Facebook post rows. Below is a current sample row from data.json.
Sample JSON Output
{
  "facebookUrl": "https://www.facebook.com/thebarsarajevo",
  "pageUrl": "https://www.facebook.com/thebarsarajevo",
  "pageName": "thebarsarajevo",
  "pageId": "100054347029663",
  "facebookId": "100054347029663",
  "title": "The Bar Sarajevo | Sarajevo",
  "categories": ["Page", "The Bar"],
  "info": ["The Bar Sarajevo, Sarajevo. 671 likes", "473 were here. The Bar"],
  "likes": 671,
  "intro": null,
  "phone": null,
  "email": null,
  "messenger": null,
  "priceRange": null,
  "address": "43.85568192,18.41980219",
  "websites": ["https://maps.google.com/maps?q=43.85568192,18.41980219&hl=en"],
  "website": null,
  "services": null,
  "followers": 671,
  "followings": null,
  "profilePictureUrl": "https://scontent-man2-1.xx.fbcdn.net/...",
  "coverPhotoUrl": null,
  "profilePhoto": null,
  "alternativeSocialMedia": null,
  "ratingOverall": null,
  "ratingCount": null,
  "category": "The Bar",
  "addressUrl": "https://maps.google.com/maps?q=43.85568192,18.41980219&hl=en",
  "ratings": null,
  "rating": null,
  "business_hours": "Open now",
  "business_price": null,
  "business_services": null,
  "creation_date": null,
  "ad_status": null,
  "pageAdLibrary": {
    "id": "215880995429821",
    "owner_business": null,
    "pamv_comms_data": null
  },
  "instagram": null
}
Output Fields Explanation
Page Identity
- facebookUrl (String): The Facebook URL stored for the page in the output row.
- pageUrl (String): The normalized page URL used as the page-level reference.
- pageName (String): The extracted Facebook page slug or page name token.
- pageId (String): The page identifier extracted from Facebook data.
- facebookId (String): The main Facebook ID exposed for the page. This often matches pageId.
- title (String): The page title as shown in the output, often including city or branding context.
Classification and Summary
- categories (Array): The list of categories Facebook exposed for the page.
- info (Array): Short free-text summary lines extracted from the page.
- category (String or null): A single primary category when the extractor can determine one.
- intro (String or null): The page introduction or short description if available.
Contact and Business Details
- phone (String or null): Phone number shown on the page.
- email (String or null): Public contact email shown on the page.
- messenger (String or null): Messenger contact link or related messenger metadata when available.
- priceRange (String or null): Facebook price-range value when extracted in the page summary.
- address (String or null): Public page address. This may be a textual address or coordinates, depending on what Facebook exposes.
- websites (Array): List of public website links associated with the page.
- website (String or null): A primary website selected from the available links.
- services (String or null): Service summary extracted from the page, such as Dine in.
Audience and Popularity
- likes (Number or null): Public like count.
- followers (Number or null): Public follower count.
- followings (Number or null): Public following count if exposed by Facebook.
- ratings (String or null): Human-readable rating summary text, for example recommendation text.
- rating (String or null): Rating label text if available.
- ratingOverall (Number or null): Numeric overall rating when Facebook provides one.
- ratingCount (Number or null): Number of reviews or ratings when available.
Media and Visual Fields
- profilePictureUrl (String or null): Direct image URL for the page profile picture.
- coverPhotoUrl (String or null): Direct image URL for the page cover photo.
- profilePhoto (String or null): Facebook photo page URL for the profile image when available.
- alternativeSocialMedia (String or null): Alternative social profile reference if extracted separately.
- instagram (String or null): Instagram reference when stored separately from website or websites.
Business Status Fields
- addressUrl (String or null): Map or location URL associated with the address.
- business_hours (String or null): Human-readable business-hours status such as Open now.
- business_price (String or null): Human-readable business price label.
- business_services (String or null): Human-readable business services label.
- creation_date (String or null): Page creation date if Facebook exposes it.
- ad_status (String or null): Ad-library-related page status text.
Ad Library Object
- pageAdLibrary (Object or null): Nested object with ad library metadata for the page.
  - id (String or null): The page ad library identifier.
  - owner_business (String, Object, or null): Business owner metadata if Facebook exposes it.
  - pamv_comms_data (Object or null): Additional communication or ad-library metadata when available.
Null Values
If a field is null, it usually means one of these:
- Facebook did not expose that field publicly
- the page layout did not contain the field in the parsed HTML
- the field exists for some pages but not for this specific page
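Because many fields can be null and address may hold either text or coordinates, consumers of the dataset may want small null-safe helpers. The sketch below uses field names from the sample row; the `PageRowSubset` type and both helper functions are hypothetical and are not part of the actor.

```typescript
// A subset of the output row, using field names from the sample row.
interface PageRowSubset {
  address: string | null;
  followers: number | null;
  likes: number | null;
}

// Parse "lat,lng" style addresses; return null for textual or missing addresses.
function parseCoordinateAddress(address: string | null): { lat: number; lng: number } | null {
  if (address === null) return null;
  const match = /^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$/.exec(address);
  if (!match) return null;

  const lat = Number(match[1]);
  const lng = Number(match[2]);
  // Reject pairs that cannot be geographic coordinates.
  if (Math.abs(lat) > 90 || Math.abs(lng) > 180) return null;
  return { lat, lng };
}

// Prefer followers, fall back to likes, and stay null when neither is exposed.
function audienceSize(row: PageRowSubset): number | null {
  return row.followers ?? row.likes;
}

const sampleSubset: PageRowSubset = {
  address: "43.85568192,18.41980219",
  followers: 671,
  likes: 671,
};
const sampleCoordinates = parseCoordinateAddress(sampleSubset.address);
// sampleCoordinates is { lat: 43.85568192, lng: 18.41980219 }
```

Keeping null distinct from 0 matters here: a page with zero followers is different from a page whose follower count Facebook did not expose.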
Error Rows
Some rows are not normal page rows. For unavailable legacy pages, the actor can emit structured error rows such as:
{
  "url": "https://www.facebook.com/pages/Pirates-Bar-Old-Town-Sarajevo/163665216985663",
  "error": "not_available",
  "errorDescription": "This content isn't available because the owner only shared it with a small group of people or changed who can see it, or it's been deleted."
}
- url (String): The Facebook page URL that failed to resolve to a normal page row.
- error (String): Machine-friendly error type.
- errorDescription (String): Human-readable explanation of what Facebook returned.
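Since a dataset can mix page rows and error rows, downstream code may need to separate them. The following is a minimal sketch that assumes error rows are the only rows carrying an error field. The `PageRow` type lists only a few fields for brevity, and the helper names are hypothetical.

```typescript
// Error row shape as documented above.
interface ErrorRow {
  url: string;
  error: string;
  errorDescription: string;
}

// Abbreviated page row; the real rows carry many more fields.
interface PageRow {
  pageUrl: string;
  pageName: string;
  followers: number | null;
}

type DatasetRow = PageRow | ErrorRow;

// Type guard: a row is treated as an error row when it has a string `error` field.
function isErrorRow(row: DatasetRow): row is ErrorRow {
  return "error" in row && typeof row.error === "string";
}

// Split a mixed dataset into normal page rows and error rows.
function splitRows(rows: DatasetRow[]): { pages: PageRow[]; errors: ErrorRow[] } {
  const pages: PageRow[] = [];
  const errors: ErrorRow[] = [];
  for (const row of rows) {
    if (isErrorRow(row)) errors.push(row);
    else pages.push(row);
  }
  return { pages, errors };
}

const mixedRows: DatasetRow[] = [
  { pageUrl: "https://www.facebook.com/thebarsarajevo", pageName: "thebarsarajevo", followers: 671 },
  {
    url: "https://www.facebook.com/pages/Pirates-Bar-Old-Town-Sarajevo/163665216985663",
    error: "not_available",
    errorDescription: "This content isn't available.",
  },
];
const splitResult = splitRows(mixedRows);
// splitResult.pages has 1 row and splitResult.errors has 1 row
```

Filtering this way keeps unavailable pages out of analysis while preserving them for later rechecks.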
Explore More Scrapers
If you found this Apify Facebook Posts Search Scraper useful, be sure to check out our other powerful scrapers and actors at memo23's Apify profile. We offer a wide range of tools to enhance your web scraping and automation needs across various platforms and use cases.
Support
- For issues or feature requests, please use the Issues section of this actor.
- If you need customization or have questions, feel free to contact the author:
- Author's website: https://muhamed-didovic.github.io/
- Email: muhamed.didovic@gmail.com
Additional Services
- Request customization or whole dataset: muhamed.didovic@gmail.com
- If you need anything else scraped, or this actor customized, email: muhamed.didovic@gmail.com
- For API services of this scraper (no Apify fee, just usage fee for the API), contact: muhamed.didovic@gmail.com