Agency Directory Scraper
Pricing
Pay per usage
Developer
ryan clinton
Agency Directory Scraper & Lead Finder
Agency directory scraper that pulls marketing, design, and technology agencies from three complementary sources — Google Maps, SuperbCompanies.com, and TheManifest.com — into one deduplicated dataset. Built for sales teams, recruiters, and researchers who need structured agency data without hours of manual browsing.
The actor runs Google Maps searches via a sub-actor call, then crawls SuperbCompanies and TheManifest using a lightweight CheerioCrawler (no browser required). Every record is deduplicated by domain across all three sources before it reaches your dataset. Output includes agency name, website, domain, phone, address, services, location, team size, minimum project size, review count, and star rating. Pricing is $0.05 per agency found — you only pay for unique, deduplicated results.
What data can you extract from agency directories?
| Data Point | Source | Example |
|---|---|---|
| 📛 Agency name | All three sources | Apex Digital Strategies |
| 🌐 Website URL | All three sources | https://apexdigitalstrategies.com |
| 🔗 Domain | Extracted from website | apexdigitalstrategies.com |
| 📞 Phone number | Google Maps | +1 (212) 555-0142 |
| 📍 Address | Google Maps | 350 5th Ave, New York, NY 10118 |
| 🏷️ Services | All three sources | ["SEO", "PPC", "Content Marketing"] |
| 🗺️ Location | All three sources | New York, NY |
| 👥 Employee count | SuperbCompanies, TheManifest | 10–49 |
| 💰 Min project size | SuperbCompanies, TheManifest | $5,000+ |
| ⭐ Star rating | Google Maps, SuperbCompanies, TheManifest | 4.8 |
| 💬 Review count | Google Maps, SuperbCompanies, TheManifest | 94 |
| 📂 Source | All records | google-maps |
| 🔎 Source profile URL | All records | https://superbcompanies.com/organizations/apex-digital |
| 🕐 Scraped timestamp | All records | 2026-03-22T10:14:33.000Z |
Why use Agency Lead Finder?
Manually browsing Google Maps, SuperbCompanies, and TheManifest for agency leads is a multi-hour slog. There is no export button, no bulk download, and no shared API across these sources. Copy-pasting agency profiles one by one is error-prone and eats the better part of a working day to collect 200 records — time you could spend in actual conversations with prospects.
This actor automates the entire agency lead finding process — querying all three sources simultaneously and merging results into one clean, deduplicated list. A run pulling 50 agencies from each source completes in under 20 minutes for under $8.
- Scheduling — run daily, weekly, or monthly to keep your agency database current as new firms register
- API access — trigger runs from Python, JavaScript, or any HTTP client to integrate with your CRM pipeline
- Proxy rotation — Apify proxy support for SuperbCompanies and TheManifest crawling at scale
- Monitoring — get Slack or email alerts when runs fail or produce fewer results than expected
- Integrations — connect to Zapier, Make, Google Sheets, HubSpot, or webhooks to push results directly into your workflow
Features
- Three-source coverage: Google Maps (via sub-actor), SuperbCompanies.com, and TheManifest.com in one run, with independent per-source caps up to 500 agencies each
- Domain-based deduplication across all sources: a shared `seenDomains` Set is initialised with Google Maps results before the CheerioCrawler starts, so no agency domain is output twice regardless of which source found it first
- Google Maps sub-actor integration: calls `ryanclinton/google-maps-email-extractor` with a constructed query (e.g. `"marketing agency New York"`) and maps phone, address, rating, review count, and Google Maps URL to the unified record schema
- CheerioCrawler for directory crawling: no Playwright browser required; SuperbCompanies and TheManifest are crawled with lightweight HTTP + Cheerio parsing at up to 5 concurrent requests with session pooling and cookie persistence
- Sitemap-driven discovery: both SuperbCompanies and TheManifest are seeded from their XML sitemaps (`/sitemap.xml`), extracting all `/organizations/` and `/companies/` URLs without needing to paginate listing pages
- SuperbCompanies sitemap index handling: fetches the sitemap index, extracts child sitemap URLs (e.g. `/sitemap-organizations-1.xml`), then enqueues each child for profile URL extraction
- TheManifest XML and HTML sitemap fallback: parses `<url><loc>` elements first; if none match company paths, falls back to scanning `<a href>` anchor links for `/companies/` and `/directory/` patterns
- schema.org structured data extraction: SuperbCompanies profiles are parsed for `itemprop="address"`, `itemprop="addressLocality"`, `itemprop="addressCountry"`, and `itemprop="ratingValue"` before falling back to CSS class selectors
- Service tag extraction: collects up to 10 deduplicated service and specialty tags per profile from `[class*="service"]`, `[class*="expertise"]`, `[class*="tag"]`, and `[class*="skill"]` selectors, filtered to strings between 2–60 characters
- Junk link filtering for website detection: skips `linkedin`, `facebook`, `twitter`, `instagram`, `clutch`, `google`, `yelp`, `sortlist`, and `superbcompanies` when looking for an agency's own website in profile HTML
- Normalised website URLs: raw href values are cleaned into canonical absolute URLs using the WHATWG URL API; trailing slashes stripped; relative and fragment-only values discarded
- Structured numeric parsing: review counts like "1,234 reviews" and ratings like "4.8/5 stars" are parsed with dedicated `parseReviewCount` and `parseRating` functions that handle comma formatting and various suffix patterns
- Per-source result cap: `maxAgenciesPerSource` (default 50, max 500) is enforced independently per source; the crawler checks both the shared `seenDomains` Set and the per-source counter before registering each record
- Spending limit enforcement: PPE charges halt when your configured budget ceiling is reached; a summary record marks whether the limit was hit
- Graceful partial results: crawl errors do not discard collected records; all agencies gathered before the error are pushed to the dataset
- Run summary record: a final `type: "summary"` record is always appended with source breakdown counts, total agencies, and a `spendingLimitReached` flag
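The URL normalisation and junk-link filtering described above can be approximated in a few lines. This is a minimal Python sketch, not the actor's TypeScript code: the function names `normalise_website` and `extract_domain`, the substring match against the junk list, and the `www.` stripping are all assumptions made for illustration. A true registrable domain would need a public-suffix list.

```python
from urllib.parse import urlsplit

# Junk hosts named in the feature list; substring matching is an assumption.
JUNK_HOSTS = ("linkedin", "facebook", "twitter", "instagram", "clutch",
              "google", "yelp", "sortlist", "superbcompanies")

def normalise_website(href):
    """Return a canonical absolute URL, or None for relative, fragment-only, or junk links."""
    if not href:
        return None
    parts = urlsplit(href.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None  # relative paths and fragment-only values are discarded
    host = parts.hostname.lower()
    if any(junk in host for junk in JUNK_HOSTS):
        return None  # social profiles and directory links are not the agency's own site
    # Query strings and fragments are dropped here (an assumption); trailing slashes are stripped.
    return f"{parts.scheme}://{host}{parts.path.rstrip('/')}"

def extract_domain(url):
    """Approximate the domain by stripping a leading 'www.' (hypothetical simplification)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host
```

Matching junk names as substrings is deliberately blunt: it would also reject a legitimate agency whose hostname happens to contain "google", so a stricter host comparison may be preferable in practice.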
Use cases for agency lead generation
Sales prospecting for SaaS and technology vendors
Technology vendors targeting digital marketing agencies, from SEO software to white-label ad platforms, need current, segmented agency lists to fuel outbound. Manually building a list of 200 web design agencies in the United States could take two days of browsing across multiple directories. With this actor, a sales team pulls that list in minutes, then feeds the domain field into Website Contact Scraper to add direct email addresses before importing the records into their CRM.
Marketing agency market mapping and competitive research
Strategy consultants and M&A researchers use agency directories to map the competitive landscape: who operates in a given city, what services they offer, how large they are, and how they are reviewed. Running this actor for "SEO agency" in "London" and then "web design agency" in "London" produces a structured market map with ratings and team sizes that would take weeks to assemble manually.
Recruiting and talent sourcing
Recruiters placing senior marketing hires often want to identify mid-size agencies (10–49 employees) in specific locations as target employers. The employeeCount and location fields make it straightforward to filter the output to exactly that segment, then enrich with decision-maker contacts using Waterfall Contact Enrichment.
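As a rough illustration of that filter, the sketch below selects mid-size agencies in one city from a list of exported records. The helper name `mid_size_in` is hypothetical, and the match on `location` is a simple case-insensitive substring test, which is an assumption about how the field is formatted.

```python
def mid_size_in(records, city, size="10–49"):
    """Keep agency records whose employeeCount equals `size` and whose location mentions `city`."""
    return [
        r for r in records
        if r.get("type") != "summary"            # skip the run summary record
        and r.get("employeeCount") == size        # range string as shown in the output example
        and city.lower() in (r.get("location") or "").lower()
    ]
```

Google Maps records carry a null `employeeCount`, so this filter only ever returns records from the directory sources.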
Vendor evaluation and agency procurement
Procurement teams comparing agencies before a pitch process use directory listings to generate a long-list quickly. The rating, reviewCount, and minProjectSize fields provide first-pass scoring criteria without requiring individual website visits. Export to Google Sheets and share with stakeholders for collaborative shortlisting.
White-label agency partnership development
Larger agencies looking for white-label partners in specialist disciplines — video production, accessibility auditing, PR, translation — can filter results by service category and location to identify candidates, then visit sourceUrl profile links to assess social proof before outreach.
Data enrichment for existing CRM records
If your CRM already has agency company names but is missing website, location, or service data, the scraped dataset serves as a reference lookup to fill gaps. Combined with Website Contact Scraper, the pipeline adds email addresses from each agency's own website on top of the directory data.
How to find agency leads
- Enter your agency type and location: type a keyword like `"marketing agency"`, `"SEO agency"`, or `"web design agency"` in the Agency type field, and a city or country like `"New York"` or `"United Kingdom"` in the Location field. This drives the Google Maps search query.
- Choose your sources: the default is Google Maps and SuperbCompanies. Add TheManifest for broader coverage. Each source is capped independently, so two sources at 50 each gives up to 100 unique agencies.
- Run the actor: click Start and wait. A run pulling 50 agencies from each of two sources typically completes in 10–15 minutes.
- Download results: open the Dataset tab and export as JSON, CSV, or Excel. Filter by `source`, `location`, or `rating` in the dataset UI before exporting.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `sources` | array | Yes | `["google-maps", "superbcompanies"]` | Which sources to use. Valid values: `google-maps`, `superbcompanies`, `themanifest`. |
| `services` | string | No | `"marketing agency"` | Agency type keyword. Used as part of the Google Maps search query, e.g. `"SEO agency"`, `"web design agency"`. |
| `location` | string | No | `"New York"` | City, country, or region for Google Maps searches. Leave blank for global directory results from SuperbCompanies and TheManifest. |
| `maxAgenciesPerSource` | integer | No | `50` | Maximum agencies to collect per enabled source. Range: 1–500. With two sources enabled, output can reach up to 2× this value. |
| `proxyConfiguration` | object | No | `{"useApifyProxy": true}` | Proxy settings for SuperbCompanies and TheManifest crawling. Standard Apify proxy is usually sufficient; SuperbCompanies does not use Cloudflare, though TheManifest may be Cloudflare-protected at times. |
Input examples
Most common: Google Maps + SuperbCompanies, marketing agencies in New York:
{"sources": ["google-maps", "superbcompanies"],"services": "marketing agency","location": "New York","maxAgenciesPerSource": 50,"proxyConfiguration": {"useApifyProxy": true}}
All three sources, SEO agencies in London, larger batch:
{"sources": ["google-maps", "superbcompanies", "themanifest"],"services": "SEO agency","location": "London","maxAgenciesPerSource": 100,"proxyConfiguration": {"useApifyProxy": true}}
Quick test: directory sources only, small cap:
{"sources": ["superbcompanies"],"services": "web design agency","location": "","maxAgenciesPerSource": 10,"proxyConfiguration": {"useApifyProxy": true}}
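Before starting a run from your own tooling, it can help to check an input object against the constraints in the parameter table. The following is a minimal sketch under that assumption; `validate_input` is a hypothetical helper, not part of the actor or the Apify client, and the error messages are illustrative.

```python
VALID_SOURCES = {"google-maps", "superbcompanies", "themanifest"}

def validate_input(run_input):
    """Return a list of problems found in an actor input dict (empty list means it looks valid)."""
    errors = []
    sources = run_input.get("sources") or []
    if not sources:
        errors.append("sources is required and must not be empty")
    unknown = [s for s in sources if s not in VALID_SOURCES]
    if unknown:
        errors.append(f"unknown sources: {unknown}")
    cap = run_input.get("maxAgenciesPerSource", 50)  # documented default is 50
    if not isinstance(cap, int) or isinstance(cap, bool) or not 1 <= cap <= 500:
        errors.append("maxAgenciesPerSource must be an integer between 1 and 500")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.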
Input tips
- Start with the defaults — Google Maps + SuperbCompanies with 50 agencies each covers the most common use case and gives fast feedback before scaling up.
- Location drives Google Maps quality: the `location` field is concatenated with `services` to form the Google Maps query (e.g. `"SEO agency London"`). A precise city name produces the most relevant local results. Leave it blank if you want global directory results from SuperbCompanies or TheManifest.
- Use TheManifest cautiously: TheManifest may be Cloudflare-protected at times. If it returns zero results, the run continues cleanly with the other sources and a warning is logged. Google Maps and SuperbCompanies results are never affected.
- Set a spending limit for large batches — at 3 sources × 500 agencies = up to 1,500 records, the maximum cost is $75. Set a spending limit in the run settings to cap spend automatically.
- Run separate inputs for different service types — if you need both SEO agencies and web design agencies, run them as two separate inputs. Each run maintains its own deduplication state.
Output example
{"agencyName": "Apex Digital Strategies","website": "https://apexdigitalstrategies.com","domain": "apexdigitalstrategies.com","phone": "+1 (212) 555-0142","address": "350 5th Ave, New York, NY 10118","services": ["SEO", "PPC", "Content Marketing", "Email Marketing"],"location": "350 5th Ave, New York, NY 10118","employeeCount": null,"minProjectSize": null,"reviewCount": 94,"rating": 4.8,"source": "google-maps","sourceUrl": "https://www.google.com/maps/place/Apex+Digital+Strategies","scrapedAt": "2026-03-22T10:14:33.121Z"}
SuperbCompanies records include team size and minimum project size where available:
{"agencyName": "Meridian Growth Partners","website": "https://meridiangrowthpartners.com","domain": "meridiangrowthpartners.com","phone": null,"address": null,"services": ["SEO", "PPC", "Social Media", "Branding", "Web Design"],"location": "Austin, TX","employeeCount": "10–49","minProjectSize": "$5,000+","reviewCount": 31,"rating": 4.9,"source": "superbcompanies","sourceUrl": "https://superbcompanies.com/organizations/meridian-growth-partners","scrapedAt": "2026-03-22T10:19:07.883Z"}
The final record in every dataset is a run summary:
{"type": "summary","totalAgencies": 143,"sourcesScraped": ["google-maps", "superbcompanies", "themanifest"],"sourceBreakdown": {"google-maps": 50,"superbcompanies": 50,"themanifest": 43},"service": "marketing agency","location": "New York","maxAgenciesPerSource": 50,"spendingLimitReached": false,"scrapedAt": "2026-03-22T10:31:07.448Z"}
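Because the summary shares the dataset with the agency records, downstream code usually needs to separate the two. The sketch below shows one way to do that and to sanity-check the summary; both helper names are hypothetical, and the assumption that `sourceBreakdown` sums to `totalAgencies` is taken from the example above rather than from a documented guarantee.

```python
def split_summary(items):
    """Separate agency records from the run summary record (None if no summary is present)."""
    agencies = [i for i in items if i.get("type") != "summary"]
    summary = next((i for i in items if i.get("type") == "summary"), None)
    return agencies, summary

def breakdown_matches(summary):
    """Check that the per-source counts add up to the reported total."""
    return sum(summary["sourceBreakdown"].values()) == summary["totalAgencies"]
```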
Output fields
| Field | Type | Description |
|---|---|---|
| `agencyName` | string | Agency display name as returned by the source |
| `website` | string or null | Normalised absolute URL of the agency's own website |
| `domain` | string or null | Registrable domain extracted from website (e.g. `acmecorp.com`), used for cross-source deduplication |
| `phone` | string or null | Phone number as returned by Google Maps; null for directory sources |
| `address` | string or null | Full street address as returned by Google Maps; null for directory sources |
| `services` | string[] | Service and specialty tags extracted from the source, up to 10 per record |
| `location` | string or null | City and/or country as shown on the source; for Google Maps records this may be the full address |
| `employeeCount` | string or null | Team size range, e.g. `10–49`, `50–249`; available from SuperbCompanies and TheManifest |
| `minProjectSize` | string or null | Minimum project budget, e.g. `$5,000+`; available from SuperbCompanies and TheManifest |
| `reviewCount` | number or null | Total number of client reviews on the source listing |
| `rating` | number or null | Average star rating parsed as a float, e.g. `4.8` |
| `source` | string | Which source provided this record: `google-maps`, `superbcompanies`, or `themanifest` |
| `sourceUrl` | string | Direct URL to the agency's profile page or Google Maps listing |
| `scrapedAt` | string | ISO 8601 timestamp of when the record was extracted |
How much does it cost to find agency leads?
Agency Lead Finder uses pay-per-event pricing — you pay $0.05 per agency extracted and deduplicated. Platform compute costs are included. You are never charged for duplicates removed during deduplication or for failed page loads.
| Scenario | Agencies | Cost per agency | Total cost |
|---|---|---|---|
| Quick test (1 source, 10 agencies) | 10 | $0.05 | $0.50 |
| Small batch (2 sources, 25 each) | ~50 | $0.05 | ~$2.50 |
| Standard run (2 sources, 50 each) | ~100 | $0.05 | ~$5.00 |
| Large run (3 sources, 100 each) | ~300 | $0.05 | ~$15.00 |
| Maximum batch (3 sources, 500 each) | ~1,500 | $0.05 | ~$75.00 |
You can set a maximum spending limit per run in the Apify console to control costs. The actor stops pushing records when your budget is reached and always outputs a summary record indicating whether the limit was hit.
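The arithmetic behind the table is simple enough to script when planning a run. This is a small sketch assuming the $0.05 per-agency price stated above; the function names are hypothetical, and prices are handled in whole cents to avoid floating-point rounding.

```python
PRICE_CENTS = 5  # $0.05 per unique agency, as stated in the pricing section

def max_cost_usd(num_sources, max_per_source):
    """Upper bound on run cost if every source fills its cap with unique agencies."""
    return num_sources * max_per_source * PRICE_CENTS / 100

def agencies_within_budget(budget_usd):
    """How many agency records a given spending limit can pay for."""
    return round(budget_usd * 100) // PRICE_CENTS
```

For example, three sources at the 500 cap give `max_cost_usd(3, 500)`, which matches the $75 ceiling in the table, while a $3 test limit covers 60 records.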
Compare this to B2B data platforms like Apollo or ZoomInfo at $49–$199/month for general contact data. Agency Lead Finder is purpose-built for agency prospecting, and most users building or refreshing an agency list spend $3–$15 per run with no subscription commitment.
Agency lead generation using the API
Python
    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")

    run = client.actor("ryanclinton/agency-directory-scraper").call(run_input={
        "sources": ["google-maps", "superbcompanies"],
        "services": "marketing agency",
        "location": "New York",
        "maxAgenciesPerSource": 50,
        "proxyConfiguration": {"useApifyProxy": True},
    })

    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        # The final record is the run summary, not an agency
        if item.get("type") == "summary":
            print(f"Total agencies: {item['totalAgencies']}")
            continue
        print(f"{item['agencyName']} | {item['domain']} | {item.get('rating')} stars | {item.get('location')}")
JavaScript
    import { ApifyClient } from "apify-client";

    const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

    const run = await client.actor("ryanclinton/agency-directory-scraper").call({
        sources: ["google-maps", "superbcompanies"],
        services: "marketing agency",
        location: "New York",
        maxAgenciesPerSource: 50,
        proxyConfiguration: { useApifyProxy: true },
    });

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    for (const item of items) {
        // Skip the run summary record
        if (item.type === "summary") continue;
        console.log(`${item.agencyName} | ${item.domain} | ${item.services.join(", ")} | ${item.source}`);
    }
cURL
    # Start the actor run
    curl -X POST "https://api.apify.com/v2/acts/ryanclinton~agency-directory-scraper/runs?token=YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"sources": ["google-maps", "superbcompanies"],"services": "marketing agency","location": "New York","maxAgenciesPerSource": 50,"proxyConfiguration": {"useApifyProxy": true}}'

    # Fetch results once the run completes (replace DATASET_ID from the run response)
    curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
How Agency Lead Finder works
Phase 1 — Google Maps sub-actor call
The actor constructs a Google Maps search query by concatenating the services and location inputs (e.g. "marketing agency New York"). It then calls the ryanclinton/google-maps-email-extractor sub-actor with this query and the maxAgenciesPerSource limit. After the sub-actor run completes, the actor reads its dataset using Actor.apifyClient.dataset(run.defaultDatasetId).listItems() with a 1,000-item ceiling. Each Google Maps item is mapped to the unified AgencyRecord schema: title → agencyName, website → normalised URL, phone, address, categoryName → first services entry, totalScore → rating, reviewsCount → reviewCount. All discovered domains are added to a shared seenDomains Set before the crawler starts.
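The query construction and field mapping in this phase can be sketched as follows. This is an illustrative Python version, not the actor's TypeScript: the function names are hypothetical, the handling of a blank `location` is an assumption, and `domain`, `sourceUrl`, and `scrapedAt` are left out because the source does not name the sub-actor fields they come from.

```python
def build_query(services, location):
    """Join the services keyword and location into one Google Maps query string."""
    return " ".join(p.strip() for p in (services, location) if p and p.strip())

def map_maps_item(item):
    """Map one sub-actor item to the unified record shape, using the field names listed above."""
    category = item.get("categoryName")
    return {
        "agencyName": item.get("title"),
        "website": item.get("website"),           # the actor normalises this URL; omitted here
        "phone": item.get("phone"),
        "address": item.get("address"),
        "services": [category] if category else [],
        "employeeCount": None,                     # Google Maps carries no team size data
        "minProjectSize": None,                    # nor budget data
        "reviewCount": item.get("reviewsCount"),
        "rating": item.get("totalScore"),
        "source": "google-maps",
    }
```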
Phase 2 — CheerioCrawler for SuperbCompanies and TheManifest
Both directory sources are crawled with Crawlee's CheerioCrawler — a lightweight HTTP + Cheerio parser that requires no browser. The crawler runs at a concurrency of 5 with session pooling and cookie persistence. Both sources are seeded from their XML sitemaps: SuperbCompanies uses a sitemap index at /sitemap.xml that references child sitemaps (e.g. /sitemap-organizations-1.xml); TheManifest uses a single sitemap with /companies/ and /directory/ URL patterns, with an HTML anchor fallback if the XML sitemap yields no matches. The sharedState module — a plain TypeScript object imported directly by route handlers — carries the seenDomains Set (pre-populated with Google Maps domains), per-source counters, and the collected results array across all route invocations.
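To make the sitemap seeding concrete, here is a minimal Python sketch of pulling `<loc>` values out of a sitemap or sitemap index and keeping only profile URLs. It uses the standard sitemaps.org namespace and the path patterns named above; the function names are hypothetical, and the HTML anchor fallback for TheManifest is not shown.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_locs(xml_text):
    """Return every <loc> value from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iterfind(".//sm:loc", NS) if el.text]

def profile_urls(xml_text, patterns=("/organizations/", "/companies/")):
    """Keep only URLs whose path contains one of the profile patterns."""
    return [u for u in extract_locs(xml_text) if any(p in u for p in patterns)]
```

Running `extract_locs` on a sitemap index yields the child sitemap URLs, and running `profile_urls` on each child yields the profile pages to enqueue.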
Phase 3 — Profile extraction and normalisation
Each route handler calls parseSuperbCompaniesProfile or parseTheManifestProfile from extractors.ts, which are pure functions that take a Cheerio $ object and return a structured partial record. Agency names are read from the first <h1>. Websites are found by scanning <a href^="http"> links while filtering a junk-domain list that includes linkedin, facebook, twitter, instagram, clutch, google, yelp, sortlist, and the source directory itself. SuperbCompanies profiles also check for a "Visit Website" link text before falling back to the junk-filter scan. Location is read from itemprop="addressLocality" / itemprop="addressCountry" structured data before falling back to class-name selectors. Service tags are collected from [class*="service"], [class*="expertise"], [class*="tag"], and [class*="skill"] elements, deduped, and capped at 10. Ratings and review counts pass through parseRating and parseReviewCount which handle formats including 4.8, 4.8/5, 4.8 stars, 1,234 reviews, and 45.
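The numeric parsing step is easy to picture with a small example. The Python functions below are stand-ins for the actor's `parseRating` and `parseReviewCount`, written only from the formats listed above; the regular expressions and the 0 to 5 range check are assumptions, not the actor's actual implementation.

```python
import re

def parse_rating(text):
    """Extract the first decimal number from strings like '4.8', '4.8/5', or '4.8 stars'."""
    if text is None:
        return None
    match = re.search(r"\d+(?:\.\d+)?", str(text))
    if not match:
        return None
    value = float(match.group())
    return value if 0 <= value <= 5 else None  # range check is an assumption

def parse_review_count(text):
    """Extract an integer from strings like '1,234 reviews' or '45'."""
    if text is None:
        return None
    match = re.search(r"\d[\d,]*", str(text))
    return int(match.group().replace(",", "")) if match else None
```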
Phase 4 — Deduplication, PPE charging, and output
After all sources complete, the actor iterates over the allResults array and pushes each record to the Apify dataset individually. In pay-per-event (PPE) mode, Actor.charge({ eventName: 'agency-found', count: 1 }) fires after each push; if eventChargeLimitReached is true, the loop exits, so no further records are pushed and no further charges are made. A final summary record with type: "summary" is always appended, carrying totalAgencies, sourcesScraped, sourceBreakdown, spendingLimitReached, and the input parameters used.
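The push-then-charge loop can be modelled without the Apify SDK. In the sketch below, `push` and `charge` are hypothetical callables standing in for the dataset push and for Actor.charge; `charge` is assumed to return True once the charge limit has been reached. It illustrates the control flow only.

```python
def push_with_budget(records, push, charge):
    """Push records one by one, charging after each push and stopping at the limit.

    Returns (number of records pushed, whether the limit was reached).
    """
    pushed = 0
    limit_reached = False
    for record in records:
        push(record)
        pushed += 1
        if charge():  # True means the spending limit has been hit
            limit_reached = True
            break
    return pushed, limit_reached
```

Because the charge fires after the push, the record that exhausts the budget is still delivered; the returned flag is what a summary record would report as `spendingLimitReached`.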
Tips for best results
- Match your keyword to how agencies describe themselves. Use `"marketing agency"` for the broadest results, or be specific with `"SEO agency"`, `"web design agency"`, or `"digital advertising agency"`. Vague or misspelled keywords reduce Google Maps result quality.
- Pair a precise city with Google Maps. Google Maps produces the most relevant results when `location` is a specific city like `"Chicago"` or `"Toronto"` rather than a broad region. For country-level coverage, omit `location` and use SuperbCompanies or TheManifest as your primary source.
- Include SuperbCompanies as a minimum. SuperbCompanies exposes structured data (schema.org markup) that produces the most consistent `employeeCount`, `minProjectSize`, and `rating` fields. It is the most reliable supplementary directory source.
- Treat TheManifest as a bonus source. TheManifest is a Clutch sister site with overlapping listings. Enable it for maximum coverage, but expect occasional zero-result runs if the site is Cloudflare-protected on that day. The run still succeeds with the other sources.
- Use the `domain` field for downstream enrichment. Every record with a non-null `domain` can be fed directly into Website Contact Scraper to extract emails and phone numbers from agency websites, or into Email Pattern Finder to detect the email naming convention before crafting personalised outreach.
- Schedule weekly runs for a living agency database. New agencies register on Google Maps and these directories regularly. A weekly scheduled run with downstream deduplication by `domain` keeps your prospecting list current without manual effort.
- Set a spending limit on first-time runs. When testing a new keyword or location, set a $3–$5 spending limit in the run settings. The actor stops cleanly at your budget and outputs whatever it collected, so you can assess data quality before committing to a full run.
- Run separate inputs for separate service categories. Each run maintains its own deduplication state. If you need both SEO agencies and content marketing agencies, run them as separate inputs rather than combining keywords, which can dilute Google Maps result relevance.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Website Contact Scraper | Feed the domain output into Website Contact Scraper to add email addresses and phone numbers to each agency record for outreach |
| Email Pattern Finder | Run Email Pattern Finder on each domain to detect the naming convention (e.g. firstname@domain.com) before personalising outreach at scale |
| Waterfall Contact Enrichment | Enrich each agency domain through a 10-step contact enrichment cascade to surface decision-maker names, titles, and emails |
| Bulk Email Verifier | Verify email addresses found for agencies before adding them to outreach sequences to protect sender reputation |
| B2B Lead Qualifier | Score the scraped agency list on 30+ signals to prioritise outreach to the highest-fit prospects first |
| HubSpot Lead Pusher | Push the completed agency dataset directly into HubSpot as company records with associated contact data |
| Website Tech Stack Detector | Detect which marketing tools each agency runs — useful for targeting agencies that use a specific platform your product integrates with |
Limitations
- Google Maps results are location-dependent. Google Maps search quality varies significantly by location. Dense markets like New York or London return highly relevant results; smaller cities may return fewer agencies or adjacent business types. Supplement with SuperbCompanies for location-agnostic coverage.
- TheManifest may be Cloudflare-protected. TheManifest occasionally blocks automated access. When this happens, the source returns zero results and a warning is logged. The run completes normally using the other sources. This is a known limitation and is noted in the actor logs.
- Phone and address are Google Maps only. SuperbCompanies and TheManifest profile pages do not expose phone numbers or street addresses in a consistent, parseable form. The `phone` and `address` fields are null for all `superbcompanies` and `themanifest` records.
- Employee count and min project size are directory sources only. Google Maps does not carry team size or budget data. The `employeeCount` and `minProjectSize` fields are null for all `google-maps` records.
- Service tags reflect what the directory displays. Service categories on SuperbCompanies and TheManifest are set by the agency during registration and may be broad, inconsistent, or absent. Google Maps returns the business category name as a single-element services array.
- Deduplication is domain-based within a single run. Two agencies at different domains that are the same company will both appear. Merging datasets across multiple runs will introduce duplicates; filter by `domain` in your downstream tooling.
- Hard cap of 500 agencies per source per run. SuperbCompanies and TheManifest are accessed via sitemap order, which does not sort by rating or review count. The highest-reviewed agencies are not guaranteed to appear first from directory sources.
- No individual profile deep-crawl for Google Maps. Phone and address come from the Google Maps sub-actor output. The sub-actor does not visit each agency's website — for email addresses, combine with Website Contact Scraper.
- HTML changes on SuperbCompanies or TheManifest can reduce field coverage. Selectors use broad CSS class-name substring matching to tolerate minor changes, but a full redesign may require selector updates. Open an issue in the Issues tab if fields start returning null unexpectedly.
Integrations
- Zapier — trigger a Zap when a run completes to route high-rated agencies directly into a CRM deal stage or sales sequence
- Make — build a scenario that pulls agency results after each run and cross-references them against existing CRM contacts before creating new records
- Google Sheets — append scraped agency rows to a shared spreadsheet for team review and manual qualification before outreach
- Apify API — trigger runs programmatically from your internal tooling and retrieve results in JSON or CSV for downstream processing
- Webhooks — post the completed dataset URL to a Slack channel or internal endpoint the moment a run finishes
- LangChain / LlamaIndex — load agency records into a vector store to power an AI assistant that answers questions about the agency landscape in a given market
Troubleshooting
- Zero results from Google Maps: check that your `services` keyword and `location` form a valid Google Maps search. The query is constructed as `"{services} {location}"`. Very niche keywords or misspellings can produce no results from the sub-actor. Try `"marketing agency"` plus a major city as a smoke test.
- Zero results from TheManifest: TheManifest may be Cloudflare-protected on the day of your run. This is expected behaviour. The run continues and uses Google Maps and SuperbCompanies results. Check the run log for the warning message `"TheManifest returned 0 results"` to confirm this is the cause.
- Most fields are null for directory records: fields like `phone`, `address`, `employeeCount`, and `minProjectSize` are source-dependent. `phone` and `address` are only populated for Google Maps records. `employeeCount` and `minProjectSize` are only available from SuperbCompanies and TheManifest when the agency has filled in their profile. Null values for these fields are expected and normal.
- Fewer agencies than `maxAgenciesPerSource`: for a given keyword and location, Google Maps may return fewer results than your cap. SuperbCompanies and TheManifest sitemap coverage varies by niche, and some service categories have fewer than 50 listed agencies. The actor returns all available records and stops without error.
- Duplicate agencies in merged datasets: deduplication operates within a single run by domain. If you merge datasets from multiple runs, duplicates will appear. Filter by `domain` in your downstream tooling to deduplicate across runs.
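Cross-run deduplication by `domain` can be done in a few lines once the datasets are loaded. This is a minimal sketch under two assumptions of its own: a later run's record replaces an earlier one for the same domain, and records with a null `domain` are kept rather than dropped. The helper name `merge_runs` is hypothetical.

```python
def merge_runs(*datasets):
    """Merge item lists from several runs, keeping one record per domain."""
    by_domain = {}
    without_domain = []
    for items in datasets:
        for item in items:
            if item.get("type") == "summary":
                continue  # summaries describe a single run and are not merged
            domain = item.get("domain")
            if domain is None:
                without_domain.append(item)
            else:
                by_domain[domain] = item  # later runs overwrite earlier ones
    return list(by_domain.values()) + without_domain
```

Pass the datasets oldest first so that the freshest ratings and review counts win.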
Responsible use
- This actor accesses only publicly available agency listing data from directories whose core business model is built on public discovery of agency firms.
- Respect the terms of service of each directory. Do not use this actor to systematically republish directory content or create a competing agency database.
- When using scraped agency data for outreach, comply with CAN-SPAM, GDPR, and all other applicable data protection regulations in your jurisdiction.
- Do not use extracted data for spam, harassment, or any unsolicited commercial contact that violates applicable law.
- For guidance on web scraping legality, see Apify's guide.
FAQ
How many agency leads can I find in one run?
Up to 500 agencies per source across up to three sources, giving a maximum of approximately 1,500 deduplicated agency records per run. In practice, most runs targeting a specific keyword and location return 50–200 records because not every source has 500 listings for every niche.
Which sources does Agency Lead Finder use?
The actor uses three sources: Google Maps (via the ryanclinton/google-maps-email-extractor sub-actor), SuperbCompanies.com (scraped via sitemap), and TheManifest.com (scraped via sitemap). You can enable any combination by setting the sources input parameter. The default is Google Maps and SuperbCompanies.
How is Agency Lead Finder different from scraping Clutch or DesignRush?
This actor targets Google Maps, SuperbCompanies, and TheManifest, not Clutch or DesignRush. Google Maps provides phone numbers and street addresses that Clutch does not. SuperbCompanies has 8,000+ open agency profiles accessible without aggressive bot protection. All three sources are combined and deduplicated in a single run, so you get broader coverage without building three separate scrapers.
Does agency lead finding work without a proxy?
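Google Maps results come from the sub-actor, which handles its own proxy use. For SuperbCompanies and TheManifest, standard Apify proxy (datacenter) is usually sufficient. SuperbCompanies does not use Cloudflare; TheManifest may be Cloudflare-protected at times, in which case it returns zero results and the run continues with the other sources. The default proxyConfiguration is already correct. You do not need residential proxies for this actor.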
What agency type keywords work best?
Common keywords include "marketing agency", "SEO agency", "web design agency", "digital advertising agency", "branding agency", "social media agency", and "content marketing agency". The keyword drives the Google Maps query. Be as specific as your targeting requires — "B2B SaaS marketing agency" will return a narrower but more relevant set than "marketing agency".
How long does a typical agency lead finding run take?
A standard run with two sources at 50 agencies each takes 10–20 minutes. Google Maps results arrive after the sub-actor call completes (typically 3–8 minutes depending on the result count); the CheerioCrawler then processes SuperbCompanies or TheManifest profile pages concurrently. Runs at 500 agencies per source may take 30–60 minutes.
How accurate is the extracted agency data?
Agency names, websites, and locations are reliably extracted from all three sources. Google Maps records include phone and address when the business has a verified Maps listing. employeeCount and minProjectSize depend on whether the agency completed their SuperbCompanies or TheManifest profile — these fields are null when not provided. Star ratings and review counts are extracted where present.
Can I filter agency leads by location?
Yes. Enter a city, state, country, or region in the location field. This is concatenated with your services keyword to form the Google Maps query (e.g. "marketing agency Chicago"). SuperbCompanies and TheManifest are crawled globally via sitemap and do not apply location filtering server-side — filter the output location field after the run for directory results.
Is it legal to scrape agency directories for lead generation?
These directories publish agency information publicly as their core business model; the data is intentionally visible to anyone. Scraping publicly available business information for prospecting is generally lawful in most jurisdictions. Review each site's terms of service before large-scale use. For a detailed analysis of web scraping legality, see Apify's guide.
Can I use the agency leads with other Apify actors to get contact emails?
Yes. Feed the domain field from this actor into Website Contact Scraper to extract emails and phone numbers from agency websites, or into Waterfall Contact Enrichment for a broader multi-step enrichment pipeline. The domain field is structured specifically to serve as input for these downstream actors.
Can I schedule this actor to run periodically?
Yes. Apify's scheduler supports cron-based scheduling — daily, weekly, or monthly. Each run produces a fresh dataset. Use the Apify API or a Make/Zapier integration to merge new results into your CRM while deduplicating by domain across runs.
What happens if SuperbCompanies or TheManifest changes its HTML structure?
Selectors use broad CSS class-name substring matching (e.g. [class*="service"], [class*="expertise"]) to tolerate minor HTML changes. A full site redesign may break extraction for that source, causing fields to return null. If a directory source starts returning blank records unexpectedly, open an issue in the Issues tab with your run ID so the selectors can be updated.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.