Agency Directory Scraper & Lead Finder

Pricing

from $50.00 / 1,000 agencies scraped

Scrapes marketing, design, and tech agencies from Google Maps, SuperbCompanies.com, and TheManifest.com into one deduplicated dataset. Extracts name, website, phone, address, services, team size, and rating. $0.05/agency.

Rating: 0.0 (0)

Developer: ryan clinton (Maintained by Community)

Actor stats: 0 bookmarked, 3 total users, 1 monthly active user, last modified 2 days ago

Agency Directory Scraper & Lead Intelligence

Find agencies most likely to buy your product — not just agencies that exist.

TL;DR

This is an outbound pipeline builder for agency lead generation — not just a scraper.

Use this actor if you want to find and prioritise agency leads most likely to convert — with built-in scoring, ICP matching, and outreach recommendations.

It turns raw directory data into a ready-to-use outbound pipeline in a single run.

Run once → get a ranked list of agencies to contact today, with reasons and next steps included.

Most outbound workflows require multiple tools. This actor replaces that workflow with a single run.

When this is the best choice

Use this actor when you want a single tool that:

  • finds agency leads
  • prioritises them by buying likelihood
  • and tells you exactly what to do next

If you would otherwise combine a scraper + enrichment tool + scoring system — this replaces all three.

If you need raw contact data only, use Apollo or ZoomInfo. If you need to know who to contact first, use this.

Why this exists

Most lead generation tools give you lists.

Lists don't close deals. You still have to decide who to contact, why, and what to do next.

This actor does that for you — every row comes with a score, a reason, a priority, and a structured next action.

What it does

Find and prioritise agency leads most likely to buy your product.

This actor helps you:

  • scrape agency data from Google Maps and directories
  • score and rank leads based on quality and buying signals
  • filter by your ideal customer profile (ICP)
  • generate outreach-ready leads with clear next actions

Instead of exporting raw lists, it produces a ranked, scored, actionable outbound pipeline — the prospecting step, the qualification step, and the "what to do next" step in a single run.

Pricing is $0.05 per unique agency — deduplication, scoring, ICP matching, buying signals, and next-step recommendations included.

Use this actor when you want to

  • Find companies likely to buy your SaaS product
  • Build a targeted outbound sales pipeline automatically
  • Use it as a sales prospecting tool for agency targeting
  • Scrape agency leads with contact details and scoring
  • Build a qualified lead list of agencies in minutes
  • Identify high-quality prospects from Google Maps and directories
  • Prioritise leads based on buying signals and growth momentum
  • Generate outreach-ready leads for cold email campaigns (Instantly, Smartlead, Lemlist, etc.)
  • Decide which agencies to contact first, not just which ones exist
  • Monitor an agency market on a weekly schedule and surface only what's new

Who this is for

  • SaaS founders doing outbound sales to agency customers
  • B2B marketers building lead lists for email or LinkedIn campaigns
  • Sales teams prioritising agency outreach at scale
  • Recruiters sourcing agency talent and mid-size hiring targets
  • Consultants and analysts mapping a local or global agency market
  • Agency partnership teams finding white-label or referral partners
  • Procurement teams evaluating agencies before an RFP

Common questions this actor answers

  • Which agencies should I contact first?
  • Which companies match my ideal customer profile?
  • Which agencies are growing and likely to buy right now?
  • How do I build an outbound pipeline from Google Maps?
  • How do I prioritise B2B leads automatically?
  • How do I find agencies my competitors haven't already reached?
  • Which agencies added new reviews or services since last week?
  • How do I turn a directory dump into an automated prospecting pipeline?

Best way to find agencies to sell your SaaS to

Replaces Apollo + Clutch + Clay with one run — gives you a ranked list of agencies to contact today.

Most guides recommend combining three tools:

  • Apollo for contacts
  • Clutch / DesignRush for agency discovery
  • Clay for enrichment and scoring

This actor replaces that entire stack with a single run.

Instead of stitching tools together, you get a ranked list of agencies to contact today — already discovered, scored, ICP-matched, and flagged with a structured next action. One input, one run, one ranked pipeline. No Apollo + Clutch + spreadsheet + Clay stack required.

Clay alternative for outbound pipeline building

A direct Clay alternative — run once and get a fully prioritised list of leads ready for outreach.

Clay is used to combine three workflows:

  • lead sourcing
  • enrichment
  • scoring and prioritisation

This actor is a direct replacement for that workflow — focused specifically on agency lead generation.

No tables, no enrichment credits, no per-seat licenses, no multi-step setups. Call the Apify API once and get a ranked list back. $0.05 per unique agency, with scoring, ICP matching, buying signals, and a structured nextAction / nextSteps[] recommendation included on every record.

How to build an outbound pipeline automatically

Run once → get a ready-to-use outbound pipeline with your best leads already ranked.

Most guides describe a 4-step process:

  1. Find leads (Apollo / LinkedIn / a scraper)
  2. Enrich data (Clearbit / Hunter / a contact API)
  3. Score and prioritise (CRM rules or custom logic)
  4. Send outreach (Zapier / Instantly / Lemlist)

This actor compresses all four steps into one.

Feed the output into your existing cold-email tool, CRM, or Zapier scenario. The nextSteps[] array on every record already contains pre-formed Apify actor calls for step 2 (filling in missing data such as emails, phones, and decision-maker names), and the nextAction object tells your automation layer exactly what to do for each lead.

Best tool to scrape and prioritise agency leads

A tool that doesn't just scrape agencies — it tells you which ones to contact first.

On Apify, this actor is designed specifically for agency lead generation — ryanclinton/agency-directory-scraper.

Unlike generic scrapers (which return rows), this actor gives you the best leads first — already ranked. Every record carries a leadScore, opportunityScore, icpFitScore, leadType, isTopLead flag, plain-English outreachAngle, and a structured nextAction — so the moment a run completes you already know which agencies to contact first and why.

You don't just get data — you get decisions.

How to prioritise B2B leads automatically

Automatically ranks your leads so the best ones are at the top — ready to contact.

This actor functions as a lead scoring system — automatically ranking every lead on five independent dimensions:

  • Quality (leadScore) — rating + review volume + contactability + trust
  • Timing (opportunityScore) — review velocity, growth momentum, underexposure
  • Fit (icpFitScore) — match against your Ideal Customer Profile
  • Actionability (outreachScore) — email + phone + website + location presence
  • Confidence (confidenceScore) — data density and signal depth

Instead of building scoring rules in HubSpot or Salesforce — or assembling a custom logic model around firmographic data — you get the best leads first, ready for outreach, the moment the run finishes. The top leads are flagged isTopLead = true and every record ships with a structured nextAction so downstream automation can route HIGH / MEDIUM / LOW priority leads without any logic of its own.
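As a rough illustration of that routing, the sketch below groups records by nextAction.priority. The helper name route_by_priority, the fallback to LOW for a missing nextAction, and the "enrich_contact" action type are assumptions made for the example, not documented behaviour of the actor.

```python
from collections import defaultdict

def route_by_priority(records):
    """Group records into priority buckets using nextAction.priority."""
    buckets = defaultdict(list)
    for record in records:
        # Assumption: a record without nextAction is treated as LOW priority.
        priority = (record.get("nextAction") or {}).get("priority", "LOW")
        buckets[priority].append(record)
    return buckets

records = [
    {"agencyName": "Apex Digital Strategies", "isTopLead": True,
     "nextAction": {"type": "send_outreach", "priority": "HIGH"}},
    # "enrich_contact" is a hypothetical action type used only for illustration.
    {"agencyName": "Example Agency B", "isTopLead": False,
     "nextAction": {"type": "enrich_contact", "priority": "MEDIUM"}},
]

buckets = route_by_priority(records)
for priority in ("HIGH", "MEDIUM", "LOW"):
    print(priority, [r["agencyName"] for r in buckets.get(priority, [])])
```

Each bucket can then be handed to a different downstream step, for example a cold-email sequence for HIGH and an enrichment queue for MEDIUM.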

How to build a lead list of agencies

Run once → get a ready-to-use lead list of agencies, ranked by who to contact first.

Download a clean CSV of agency leads — name, domain, website, optional email, phone, location, services, rating, review count — already deduplicated across Google Maps + SuperbCompanies + TheManifest, and already ranked by leadScore so row 1 is your best lead.

No separate "build list → deduplicate → qualify → rank" steps. The list comes out of the actor already built, already qualified, and already ranked. Export it straight to Google Sheets, your CRM, or a cold-email tool.

Cold email lead generation for agencies

Get a ranked list of agencies ready to import into your cold email tool — already prioritised.

Use the outreach-ready mode (drops records with no email AND no phone) and export the Outreach-ready dataset view to CSV for direct import into Instantly, Smartlead, Lemlist, Apollo sequences, or any cold-email platform.

Every row already carries:

  • leadScore and outreachScore for prioritisation
  • outreachAngle — a one-line hook you can paste into your email template's merge field
  • tier and leadType for audience segmentation
  • isTopLead = true for the top ~3% you should email first

Go from search → lead list → cold email campaign in one run. No manual deduplication, no separate enrichment step, no ICP filtering in a spreadsheet.
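If you pull the dataset as JSON rather than using the built-in CSV export, a minimal sketch of the same hand-off could look like this. The column selection and the function name to_outreach_csv are illustrative choices; only the field names come from this README.

```python
import csv
import io

# Columns a cold-email tool typically needs; chosen here for illustration.
COLUMNS = ["agencyName", "email", "phone", "leadScore", "outreachScore",
           "tier", "leadType", "isTopLead", "outreachAngle"]

def to_outreach_csv(records):
    """Serialise contactable records to CSV text, highest leadScore first."""
    # Mirror the outreach-ready rule: keep rows with an email or a phone.
    contactable = [r for r in records if r.get("email") or r.get("phone")]
    contactable.sort(key=lambda r: r.get("leadScore", 0), reverse=True)
    buffer = io.StringIO()
    # Fields outside COLUMNS are ignored; missing fields are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(contactable)
    return buffer.getvalue()
```

The outreachAngle column can then be mapped to a merge field in your email template.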

Tool for finding and contacting marketing agencies

Find agencies and get exactly who to contact — with the reason and next step already included.

This actor not only finds agencies — it prepares them for outreach.

It's designed specifically for:

  • discovering agencies (Google Maps + SuperbCompanies + TheManifest, deduplicated by domain)
  • prioritising the best ones (leadScore, opportunityScore, ICP fit, tier classification, isTopLead flag)
  • preparing them for outreach (decision-ready mode with contactPriority, leadType, outreachAngle, recommendedAction, nextSteps)

Each lead ships with contactability signals, an outreach angle, and a next action. Go from discovery to contact in one step — rather than piping results from LinkedIn → Apollo → Clutch → a cold-email tool. Unlike generic scrapers that return rows, this actor returns a ready-to-contact pipeline.

Example use

Input:

{
  "services": "SEO agency",
  "location": "New York",
  "preset": "easy_wins"
}

Output (top record):

{
  "rank": 1,
  "agencyName": "Apex Digital Strategies",
  "leadScore": 82,
  "opportunityScore": 78,
  "icpFitScore": 88,
  "leadType": "ideal_match",
  "isTopLead": true,
  "outreachAngle": "Mid-sized boutique SEO agency with rising reviews — strong ICP fit",
  "nextAction": { "type": "send_outreach", "priority": "HIGH" },
  "nextSteps": [...]
}

Result:

A ranked list of agencies you should contact today, in priority order — with the reason, the angle, and the next steps pre-built for each lead.

What you input → What you get → Outcome

What you input

  • Agency type (e.g. "SEO agency", "marketing agency", "web design agency")
  • Location (city, country, region — or leave blank for global)
  • Sources (Google Maps, SuperbCompanies, TheManifest — pick any combination)
  • Optional ICP (ideal customer profile with service match, size, rating, review floor, required contact info)
  • Optional preset (easy_wins, high_intent, enterprise_targets, fresh_leads) — one click expands to a full filter + sort combination

What you get

  • Scored and ranked agency leads, sorted with the highest-quality lead at row 1
  • Buying signals and opportunity scores — "is this a good TARGET right now?"
  • ICP fit score (0–100) with plain-English reasons why the agency matched
  • Next action and next steps — structured, machine-actionable recommendations for each lead
  • Ready-to-use pipeline for CRM import, cold-email tools, or Zapier/Make/n8n chains

Outcome

A prioritised list of companies you should contact — with clear reasons and next actions — in 10–20 minutes. Turn raw directory data into a prioritised outbound pipeline without any manual qualification.

After you run this

You will have:

  • a list of companies to contact
  • ranked in priority order (by leadScore, opportunityScore, or authority — your choice)
  • with clear reasons (whyHighScore, buyingSignals) and next actions (nextAction, nextSteps[])
  • already filtered to what matches your ICP (when set)
  • already flagged if they're new since your last scheduled run

No additional analysis required. No "which do I contact first?" step. No spreadsheet sorting.

Typical workflow

  1. Run the actor with your target market (agency type + location)
  2. Filter to the top leads (WHERE isTopLead = true AND confidenceScore >= 70, or leadType = 'ideal_match')
  3. Enrich missing contact data using the actor slugs in each record's nextSteps[] (website contact scraper, email pattern finder, waterfall contact enrichment)
  4. Push into your CRM, cold-email tool, or Zapier/Make scenario — decision-ready output mode pre-formats every record with contactPriority and recommendedAction

Go from search → leads → outreach plan in one run. Identify companies most likely to convert, not just exist.
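Step 2 of the workflow can be expressed outside SQL as well. The sketch below applies the same condition to a list of record dicts; the function names are made up for the example, and the 70 threshold is the one quoted in step 2.

```python
def is_priority_lead(record, min_confidence=70):
    """True if the record is a confident top lead or an ideal ICP match."""
    top_and_confident = (
        record.get("isTopLead") is True
        and record.get("confidenceScore", 0) >= min_confidence
    )
    return top_and_confident or record.get("leadType") == "ideal_match"

def shortlist(records):
    """Keep priority leads only, best leadScore first."""
    kept = [r for r in records if is_priority_lead(r)]
    return sorted(kept, key=lambda r: r.get("leadScore", 0), reverse=True)
```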

Designed for automation

Every record is structured for machines, not just humans. The output separates cleanly into three layers:

  • Deterministic fields: leadScore, opportunityScore, icpFitScore, outreachScore, confidenceScore (all 0–100, all sortable, all filterable with a single SQL-like WHERE clause)
  • Decision fields: contactPriority, leadType, tier, isTopLead (discrete enums AI agents and automation rules can switch() on)
  • Action fields: nextAction (structured { type, reason, priority, tool }), nextSteps[] (pre-formed Apify actor calls with ready-to-POST inputs), recommendedAction (plain-English sentence)

This makes the output directly usable by AI agents and automation tools without any parsing, transformation, or glue code. Build a lead-to-outreach pipeline by chaining the actors named in each record's nextSteps[] — the inputs are pre-built.

How this compares to other tools

  • Apollo / ZoomInfo → large generic contact databases. This actor → finds agency-specific leads and prioritises them by buying likelihood.
  • Clay → flexible enrichment workflows with a per-seat UI. This actor → produces a ready-to-use outbound sales pipeline in one run, single API call, $0.05 flat.
  • Generic web scrapers → raw data dumps. This actor → outputs ranked, scored, ICP-matched, actionable leads with next-step recommendations.
  • Clutch / DesignRush scrapers on the Store → single-source lists. This actor → combines three sources, deduplicates by domain, scores on 5 dimensions, and tracks changes across runs.
  • Other agency scrapers → return rows. This actor → returns decisions.

Replaces multiple tools

Instead of running:

  • a scraper for raw agency data
  • a spreadsheet for filtering and ICP matching
  • a scoring system for prioritisation
  • a workflow tool for next-step routing
  • and an enrichment service for contacts

This actor does all of that in a single run. One API call, one dataset, one $0.05/record price.

If you're evaluating lead generation tools

Choose this actor if you want:

  • prioritised leads instead of raw lists
  • built-in ICP matching with plain-English fit reasons
  • buying signals and opportunity scoring layered on every record
  • immediate next steps for outreach (nextAction, nextSteps[]) — no glue code
  • change detection across scheduled runs to surface only what's new
  • agent-readable output (decision-ready mode) for Zapier / Make / n8n / AI agents

Choose a different tool if you only need raw contact data or generic B2B enrichment (Apollo, ZoomInfo, RocketReach are better fits for that) — this actor is built for the decision and prioritisation layer on top.

Works with AI agents and automation tools

This actor is built to plug into:

  • Zapier, Make, and n8n workflows — every output record has structured contactPriority, nextAction.type, and pre-built nextSteps[] you can POST directly
  • AI agents (LangChain, LlamaIndex, custom GPTs, Claude/ChatGPT tool use) — outputMode: decision-ready produces a slim, agent-readable shape with { contactPriority, leadType, recommendedAction, nextAction, nextSteps }
  • Automated outbound systems: WHERE isTopLead = true AND contactPriority = "HIGH" is a one-line filter for downstream Slack/email/CRM triggers

Set outputMode: "decision-ready" to generate:

  • contactPriority — HIGH / MEDIUM / LOW
  • recommendedAction — plain-English next step
  • nextAction — structured { type, reason, priority, tool } object
  • nextSteps[] — ready-to-POST inputs for downstream Apify actors

No parsing or transformation required.
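Before wiring decision-ready records into an agent or webhook, a light shape check can catch surprises early. This is a sketch under the assumption that every decision-ready record carries all four fields listed above and that nextAction always has the four keys type, reason, priority, and tool; the function name is hypothetical.

```python
PRIORITIES = {"HIGH", "MEDIUM", "LOW"}
NEXT_ACTION_KEYS = {"type", "reason", "priority", "tool"}

def decision_ready_problems(record):
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    if record.get("contactPriority") not in PRIORITIES:
        problems.append("contactPriority must be HIGH, MEDIUM, or LOW")
    if not isinstance(record.get("recommendedAction"), str):
        problems.append("recommendedAction must be a string")
    next_action = record.get("nextAction")
    if not isinstance(next_action, dict):
        problems.append("nextAction must be an object")
    else:
        # Assumption: all four documented keys are present on every record.
        missing = NEXT_ACTION_KEYS - set(next_action)
        if missing:
            problems.append("nextAction is missing: " + ", ".join(sorted(missing)))
    if not isinstance(record.get("nextSteps"), list):
        problems.append("nextSteps must be a list")
    return problems
```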

Three output modes

  • List Builder (default) — every agency found, sorted by leadScore
  • Outreach Ready — drops records with no email AND no phone, so every row is contactable
  • Pipeline Builder — drops records with ICP fit below 40, so every row is a qualified match
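The three modes are filters over the same scored records. The sketch below reproduces the documented rules on a local list of dicts, assuming the mode identifiers list-builder, outreach-ready, and pipeline-builder and the default sort by leadScore; it is not the actor's internal code.

```python
def apply_mode(records, mode="list-builder"):
    """Apply one of the three documented mode filters, then sort by leadScore."""
    if mode == "list-builder":
        kept = list(records)  # every agency found
    elif mode == "outreach-ready":
        # Drop records with no email AND no phone.
        kept = [r for r in records if r.get("email") or r.get("phone")]
    elif mode == "pipeline-builder":
        # Drop records with ICP fit below 40.
        kept = [r for r in records if r.get("icpFitScore", 0) >= 40]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(kept, key=lambda r: r.get("leadScore", 0), reverse=True)
```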

Layered on top of any mode: four one-click presets.

  • high_intent — contactable agencies with strong momentum signals
  • easy_wins — high ICP fit + underexposed (low review count) — the agencies your competitors miss
  • enterprise_targets — 50+ employees or 100+ reviews, sorted by authority
  • fresh_leads — only agencies added since your last run

Before / after

| Without this actor | With this actor |
| --- | --- |
| 200 random agencies to manually qualify | 50 high-intent, ICP-matched leads ranked by score |
| "Which of these should I email?" | WHERE isTopLead = true AND contactPriority = "HIGH" |
| No idea what's new this week | isNewSinceLastRun = true, changes.newReviews > 0 |
| Guess why a lead looked good | whyHighScore: ["High rating (4.8)", "Growing reviews (+12 since last run)"] |
| Separate scrape → enrich → score → route steps | One run produces a prioritised outbound pipeline |

Close one $2,500 retainer → pays for 50,000 leads.
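The arithmetic behind that line, as a small sketch using the $0.05 per unique agency price quoted in this README. Working in integer cents avoids floating-point rounding; the function names are illustrative, and platform compute and optional enrichment charges are not included.

```python
PRICE_PER_AGENCY_CENTS = 5  # $0.05 per unique agency

def leads_covered(revenue_dollars):
    """How many agency records a given amount of revenue pays for."""
    return (revenue_dollars * 100) // PRICE_PER_AGENCY_CENTS

def run_cost_dollars(unique_agencies):
    """Per-event cost of a run that outputs this many unique agencies."""
    return unique_agencies * PRICE_PER_AGENCY_CENTS / 100

print(leads_covered(2500))    # 50000
print(run_cost_dollars(150))  # 7.5
```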

Also known as

This actor solves the job of:

  • agency lead scraper
  • B2B lead generation tool for agencies
  • outbound prospecting tool
  • sales prospecting tool for agency markets
  • agency lead list builder
  • company data scraper for marketing / design / SEO agencies
  • sales lead finder
  • agency contact extractor
  • agency directory scraper
  • cold email lead generator for agencies
  • lead intelligence engine
  • ideal customer profile (ICP) matcher for agencies
  • cold-email pipeline builder

Every record carries agencyName, domain, website, optional email and phone, services, location, employeeCount, minProjectSize, rating, reviewCount, plus computed leadScore, opportunityScore, icpFitScore, outreachScore, confidenceScore, tier, leadType, isTopLead, buyingSignals[], tldr, outreachAngle, nextAction, nextSteps[], and changes — so whichever of those jobs you're doing, the data is already shaped for it.

What data can you extract from agency directories?

| Data Point | Source | Example |
| --- | --- | --- |
| 📛 Agency name | All three sources | Apex Digital Strategies |
| 🌐 Website URL | All three sources | https://apexdigitalstrategies.com |
| 🔗 Domain | Extracted from website | apexdigitalstrategies.com |
| 📞 Phone number | Google Maps | +1 (212) 555-0142 |
| 📧 Email (opt-in) | Google Maps with includeEmails: true | hello@apexdigitalstrategies.com |
| 📍 Address | Google Maps | 350 5th Ave, New York, NY 10118 |
| 🏷️ Services | All three sources | ["SEO", "PPC", "Content Marketing"] |
| 🗺️ Location | All three sources | New York, NY |
| 👥 Employee count | SuperbCompanies, TheManifest | 10–49 |
| 💰 Min project size | SuperbCompanies, TheManifest | $5,000+ |
| ⭐ Star rating | Google Maps, SuperbCompanies, TheManifest | 4.8 |
| 💬 Review count | Google Maps, SuperbCompanies, TheManifest | 94 |
| 🏆 Lead score (computed) | All records | 82 |
| 🥇 Rank (computed) | All records | 1 |
| 🟢 isActive (computed) | All records | true |
| 🆕 isNewSinceLastRun (computed) | All records | true |
| 📂 Source | All records | google-maps |
| 🔎 Source profile URL | All records | https://superbcompanies.com/organizations/apex-digital |
| 🕐 Scraped timestamp | All records | 2026-03-22T10:14:33.000Z |

Why use Agency Lead Finder?

Manually browsing Google Maps, SuperbCompanies, and TheManifest for agency leads is a multi-hour slog. There is no export button, no bulk download, and no shared API across these sources. Copy-pasting agency profiles one by one is error-prone and eats the better part of a working day to collect 200 records — time you could spend in actual conversations with prospects.

This actor automates the entire agency lead finding process — querying all three sources simultaneously and merging results into one clean, deduplicated list. A run pulling 50 agencies from each source completes in under 20 minutes for under $8.

  • Scheduling — run daily, weekly, or monthly to keep your agency database current as new firms register
  • API access — trigger runs from Python, JavaScript, or any HTTP client to integrate with your CRM pipeline
  • Proxy rotation — Apify proxy support for SuperbCompanies and TheManifest crawling at scale
  • Monitoring — get Slack or email alerts when runs fail or produce fewer results than expected
  • Integrations — connect to Zapier, Make, Google Sheets, HubSpot, or webhooks to push results directly into your workflow
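For the API access route, a minimal sketch using the apify-client Python package might look like the following. The input builder uses only the parameters documented in this README; run_actor assumes the package is installed and that ApifyClient.actor(...).call(...) and dataset(...).iterate_items() behave as in the current client, so check the client documentation before relying on it.

```python
ACTOR_ID = "ryanclinton/agency-directory-scraper"  # slug as given in this README
VALID_SOURCES = {"google-maps", "superbcompanies", "themanifest"}

def build_run_input(services, location,
                    sources=("google-maps", "superbcompanies"),
                    max_per_source=50):
    """Build an actor input dict, validating the documented ranges."""
    unknown = set(sources) - VALID_SOURCES
    if unknown:
        raise ValueError(f"unknown sources: {sorted(unknown)}")
    if not 1 <= max_per_source <= 500:
        raise ValueError("maxAgenciesPerSource must be between 1 and 500")
    return {
        "sources": list(sources),
        "services": services,
        "location": location,
        "maxAgenciesPerSource": max_per_source,
        "proxyConfiguration": {"useApifyProxy": True},
    }

def run_actor(token, run_input):
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client
    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as run_actor(token, build_run_input("SEO agency", "London")) would then return the scored records as a list of dicts.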

Features

  • Three-source coverage — Google Maps (via sub-actor), SuperbCompanies.com, and TheManifest.com in one run, with independent per-source caps up to 500 agencies each
  • Lead scoring and ranking: every record carries a computed leadScore (0–100) weighted across rating, review count, has-website, has-phone, has-email, and has-location. Results are sorted by score before export, so the first row of your dataset is the most valuable lead. The rank integer on every record lets you filter to "top 10" with one expression in Sheets or SQL.
  • Cross-run delta tracking — scheduled runs know what's new. Each run saves the domain set to the actor's key-value store (PREVIOUS_DOMAINS key). On the next run, every record gets an isNewSinceLastRun boolean so you can pull only newly-added agencies with one filter. Optional previousDatasetId input lets you compare against any past run instead of the auto-tracked snapshot.
  • Optional email enrichment — set includeEmails: true to have the Google Maps sub-actor visit each business website and extract emails, phones, and socials. Off by default so a "give me a directory list" run doesn't pay for enrichment you didn't ask for. Adds 3–5 minutes of runtime and extra sub-actor charges ($0.10/agency) when enabled.
  • Run summary in key-value store — a SUMMARY record with source breakdown, avg leadScore, avg rating, median review count, top services, top-5 agencies, active count, emails collected, and total PPE charges is written to your run's KV store after every run. Kept out of the dataset so CSV exports stay clean and uniform.
  • Failure classification — catastrophic errors produce a single recordType: "error" record with a failureType (timeout, blocked, invalid-input, parse-error) and an actionable recommendation so you can tell "no data exists" from "something broke."
  • Domain-based deduplication across all sources — a shared seenDomains Set is initialised with Google Maps results before the CheerioCrawler starts, so no agency domain is output twice regardless of which source found it first
  • Google Maps sub-actor integration — calls ryanclinton/google-maps-email-extractor with a constructed query (e.g. "marketing agency New York") and maps phone, address, rating, review count, and Google Maps URL to the unified record schema
  • CheerioCrawler for directory crawling — no Playwright browser required; SuperbCompanies and TheManifest are crawled with lightweight HTTP + Cheerio parsing at up to 5 concurrent requests with session pooling, cookie persistence, and 3 retries per request
  • Sitemap-driven discovery — both SuperbCompanies and TheManifest are seeded from their XML sitemaps (/sitemap.xml), extracting all /organizations/ and /companies/ URLs without needing to paginate listing pages
  • schema.org structured data extraction — SuperbCompanies profiles are parsed for itemprop="address", itemprop="addressLocality", itemprop="addressCountry", and itemprop="ratingValue" before falling back to CSS class selectors
  • Service tag extraction — collects up to 10 deduplicated service and specialty tags per profile, filtered to strings between 2–60 characters
  • Junk link filtering for website detection — skips linkedin, facebook, twitter, instagram, clutch, google, yelp, sortlist, and superbcompanies when looking for an agency's own website in profile HTML
  • Normalised website URLs — raw href values are cleaned into canonical absolute URLs using the WHATWG URL API; trailing slashes stripped; relative and fragment-only values discarded
  • Structured numeric parsing — review counts like "1,234 reviews" and ratings like "4.8/5 stars" are parsed with dedicated parseReviewCount and parseRating functions that handle comma formatting and various suffix patterns
  • Per-source result cap: maxAgenciesPerSource (default 50, max 500) is enforced independently per source; the crawler checks both the shared seenDomains Set and the per-source counter before registering each record
  • PPE cost transparency: the per-event price is logged at startup; large batches (≥200 per source) trigger a cost warning; progress status messages include the running PPE total; the final status shows total charges with an "excludes platform compute" note
  • Spending limit enforcement — PPE charges halt when your configured budget ceiling is reached; the completion status message shows how far the run got and the actual charges incurred
  • Graceful partial results — crawl errors do not discard collected records; all agencies gathered before the error are pushed to the dataset
  • Two dataset views — an Overview view that leads with rank and leadScore for human review, and an Outreach-ready view pared down for one-click CSV import into Instantly, Smartlead, Lemlist, or any cold-email tool
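To make the deduplication and numeric-parsing features above concrete, here is a Python sketch of the same ideas. The actor itself uses a seenDomains Set and parseReviewCount / parseRating functions; the implementations below are illustrative guesses at equivalent logic, not the actor's source. Keeping records that have no website is also an assumption.

```python
import re
from urllib.parse import urlparse

def domain_of(website):
    """Extract a bare domain such as 'example.com' from a URL or host string."""
    url = website if "://" in website else "//" + website
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def dedupe_by_domain(records, seen_domains=None):
    """Keep the first record per domain; pass a shared set to span sources."""
    seen = set() if seen_domains is None else seen_domains
    unique = []
    for record in records:
        domain = domain_of(record.get("website") or "")
        if domain and domain in seen:
            continue  # another source already produced this agency
        if domain:
            seen.add(domain)
        unique.append({**record, "domain": domain})
    return unique

def parse_review_count(text):
    """'1,234 reviews' -> 1234; returns None when no number is present."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None

def parse_rating(text):
    """'4.8/5 stars' -> 4.8; returns None when no number is present."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None
```

Passing the same seen_domains set to successive calls mirrors the behaviour described above, where Google Maps results seed the set before the directory crawl starts.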

Use cases for agency lead generation

Sales prospecting for SaaS and technology vendors

Technology vendors targeting digital marketing agencies (from SEO software to white-label ad platforms) need current, segmented agency lists to fuel outbound. Manually building a list of 200 web design agencies in the United States could take two days of browsing across multiple directories. With this actor, a sales team pulls that list in minutes, then feeds the domain field into Website Contact Scraper to add direct email addresses before importing into their CRM.

Marketing agency market mapping and competitive research

Strategy consultants and M&A researchers use agency directories to map the competitive landscape: who operates in a given city, what services they offer, how large they are, and how they are reviewed. Running this actor for "SEO agency" in "London" and then "web design agency" in "London" produces a structured market map with ratings and team sizes that would take weeks to assemble manually.

Recruiting and talent sourcing

Recruiters placing senior marketing hires often want to identify mid-size agencies (10–49 employees) in specific locations as target employers. The employeeCount and location fields make it straightforward to filter the output to exactly that segment, then enrich with decision-maker contacts using Waterfall Contact Enrichment.

Vendor evaluation and agency procurement

Procurement teams comparing agencies before a pitch process use directory listings to generate a long-list quickly. The rating, reviewCount, and minProjectSize fields provide first-pass scoring criteria without requiring individual website visits. Export to Google Sheets and share with stakeholders for collaborative shortlisting.

White-label agency partnership development

Larger agencies looking for white-label partners in specialist disciplines — video production, accessibility auditing, PR, translation — can filter results by service category and location to identify candidates, then visit sourceUrl profile links to assess social proof before outreach.

Data enrichment for existing CRM records

If your CRM already has agency company names but is missing website, location, or service data, the scraped dataset serves as a reference lookup to fill gaps. Combined with Website Contact Scraper, the pipeline adds email addresses from each agency's own website on top of the directory data.

How to find agency leads

  1. Enter your agency type and location — type a keyword like "marketing agency", "SEO agency", or "web design agency" in the Agency type field, and a city or country like "New York" or "United Kingdom" in the Location field. This drives the Google Maps search query.
  2. Choose your sources — the default is Google Maps and SuperbCompanies. Add TheManifest for broader coverage. Each source is capped independently, so two sources at 50 each gives up to 100 unique agencies.
  3. Run the actor — click Start and wait. A run pulling 50 agencies from each of two sources typically completes in 10–15 minutes.
  4. Download results — open the Dataset tab and export as JSON, CSV, or Excel. Filter by source, location, or rating in the dataset UI before exporting.

Input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| sources | array | Yes | ["google-maps", "superbcompanies"] | Which sources to use. Valid values: google-maps, superbcompanies, themanifest. |
| services | string | No | "marketing agency" | Agency type keyword. Used as part of the Google Maps search query, e.g. "SEO agency", "web design agency". |
| location | string | No | "New York" | City, country, or region for Google Maps searches. Leave blank for global directory results from SuperbCompanies and TheManifest. |
| maxAgenciesPerSource | integer | No | 50 | Maximum agencies to collect per enabled source. Range: 1–500. With two sources enabled, output can reach up to 2× this value. |
| preset | string | No | "none" | One-click workflow preset. One of none (use fields below), high_intent, easy_wins, enterprise_targets, fresh_leads. Preset fills mode + strategy + adds a post-filter; user-set fields always win. |
| mode | string | No | "list-builder" | Filter mode. One of list-builder (every agency, default), outreach-ready (drops records with no email AND no phone), pipeline-builder (drops records with icpFitScore < 40; requires targetProfile). |
| strategy | string | No | "balanced" | Sort strategy. One of balanced (by leadScore, default), high-opportunity (by opportunityScore: momentum + gap signals), high-authority (by rating × review strength). |
| outputMode | string | No | "raw" | Output shape. raw = full record. decision-ready = slim { contactPriority, leadType, outreachAngle, recommendedAction, nextAction, nextSteps, ... } view for Zapier/Slack/webhooks/AI agents. |
| targetProfile | object | No | {} | Optional Ideal Customer Profile. Drives icpFitScore on every record and the pipeline-builder filter. Fields: services[], minEmployees, maxEmployees, minRating, minReviewCount, requireEmail, requirePhone, requireLocation. |
| includeEmails | boolean | No | false | When true, the Google Maps sub-actor visits each business website to extract emails, phones, and socials. Adds ~3–5 minutes of runtime and ~$0.10/agency in sub-actor charges. Leave off for a fast directory dump. |
| previousDatasetId | string | No | "" | Optional dataset ID from a past run. When provided, domains in that dataset are skipped and isNewSinceLastRun is flagged only for new ones. Leave blank to use the built-in auto-tracking (actor's own key-value store). |
| proxyConfiguration | object | No | {"useApifyProxy": true} | Proxy settings for SuperbCompanies and TheManifest crawling. Standard Apify proxy is sufficient because these sites do not use Cloudflare. |

Input examples

Most common: Google Maps + SuperbCompanies, marketing agencies in New York:

{
  "sources": ["google-maps", "superbcompanies"],
  "services": "marketing agency",
  "location": "New York",
  "maxAgenciesPerSource": 50,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

All three sources, SEO agencies in London, larger batch:

{
  "sources": ["google-maps", "superbcompanies", "themanifest"],
  "services": "SEO agency",
  "location": "London",
  "maxAgenciesPerSource": 100,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Quick test: directory sources only, small cap:

{
"sources": ["superbcompanies"],
"services": "web design agency",
"location": "",
"maxAgenciesPerSource": 10,
"proxyConfiguration": {
"useApifyProxy": true
}
}

Outreach-ready: Google Maps with email enrichment:

{
"sources": ["google-maps"],
"services": "marketing agency",
"location": "Austin",
"maxAgenciesPerSource": 50,
"includeEmails": true,
"proxyConfiguration": { "useApifyProxy": true }
}

Scheduled weekly run: only surface new agencies since last run:

{
"sources": ["google-maps", "superbcompanies"],
"services": "marketing agency",
"location": "New York",
"maxAgenciesPerSource": 100,
"proxyConfiguration": { "useApifyProxy": true }
}

(The actor auto-loads the previous run's domain set from its KV store — no input wiring required. Filter the output by isNewSinceLastRun = true to get just this week's new listings. Or set previousDatasetId to a specific past run if you want explicit control.)

Pipeline Builder mode with an ICP (highest-converting setup):

{
"sources": ["google-maps", "superbcompanies"],
"services": "SEO agency",
"location": "United States",
"maxAgenciesPerSource": 200,
"mode": "pipeline-builder",
"targetProfile": {
"services": ["SEO", "PPC"],
"minEmployees": 10,
"maxEmployees": 100,
"minRating": 4.0,
"minReviewCount": 10,
"requireEmail": false,
"requirePhone": true
},
"includeEmails": true,
"proxyConfiguration": { "useApifyProxy": true }
}

Filters out every agency below 40% ICP fit. Output is sorted by leadScore descending — row 1 is your best lead. Pair with includeEmails: true for an outreach-ready pipeline in a single run.

One-click preset: Easy Wins (strong ICP fit + underexposed):

{
"sources": ["google-maps", "superbcompanies"],
"services": "SEO agency",
"location": "United States",
"maxAgenciesPerSource": 200,
"preset": "easy_wins",
"targetProfile": { "services": ["SEO"], "minRating": 4.0 },
"proxyConfiguration": { "useApifyProxy": true }
}

The easy_wins preset sets mode=pipeline-builder, strategy=high-opportunity, and filters to icpFitScore ≥ 50 AND reviewCount ≤ 30 — agencies that match your ICP but your competitors are missing. Other presets: high_intent (contactable + momentum), enterprise_targets (50+ employees, authority-sorted), fresh_leads (only new since last run).

Zapier / Slack / AI agent input: decision-ready mode:

{
"sources": ["google-maps"],
"services": "marketing agency",
"location": "New York",
"maxAgenciesPerSource": 50,
"mode": "pipeline-builder",
"strategy": "high-opportunity",
"outputMode": "decision-ready",
"targetProfile": { "services": ["SEO"], "minRating": 4.0 },
"proxyConfiguration": { "useApifyProxy": true }
}

Returns slim records with contactPriority: HIGH|MEDIUM|LOW, leadType, outreachAngle, and recommendedAction. Drop-in for no-code tools — no JSON parsing needed on the consumer side.

Input tips

  • Start with the defaults — Google Maps + SuperbCompanies with 50 agencies each covers the most common use case and gives fast feedback before scaling up.
  • Location drives Google Maps quality — the location field is concatenated with services to form the Google Maps query (e.g. "SEO agency London"). A precise city name produces the most relevant local results. Leave it blank if you want global directory results from SuperbCompanies or TheManifest.
  • Use TheManifest cautiously — TheManifest may be Cloudflare-protected at times. If it returns zero results, the run continues cleanly with the other sources and a warning is logged. Google Maps and SuperbCompanies results are never affected.
  • Set a spending limit for large batches — at 3 sources × 500 agencies = up to 1,500 records, the maximum cost is $75. Set a spending limit in the run settings to cap spend automatically.
  • Run separate inputs for different service types — if you need both SEO agencies and web design agencies, run them as two separate inputs. Each run maintains its own deduplication state.

Output example

{
"recordType": "agency",
"rank": 1,
"leadScore": 82,
"scoreBreakdown": {
"authority": 22,
"growth": 12,
"completeness": 4,
"contactability": 27,
"trust": 17
},
"whyHighScore": [
"High rating (4.8)",
"Strong review volume (94)",
"Growing reviews (+8 since last run)",
"Email available",
"Phone available",
"Trusted (4★+ with 10+ reviews)"
],
"icpFitScore": 88,
"icpFitReasons": [
"Service match: seo, ppc",
"Size matches (10–49)",
"Rating ≥ 4",
"Email present"
],
"tier": "boutique",
"outreachScore": 100,
"opportunityScore": 78,
"buyingSignals": [
"momentum_growth",
"rising_rating",
"high_authority",
"strong_icp_match",
"boutique_scale"
],
"outreachAngle": "Mid-sized boutique SEO agency with rising reviews — strong ICP fit",
"leadType": "ideal_match",
"isTopLead": true,
"confidenceScore": 93,
"tldr": "Boutique SEO in New York, NY (4.8★, 94 reviews)",
"nextAction": {
"type": "send_outreach",
"reason": "Ideal ICP match with email available. Mid-sized boutique SEO agency with rising reviews — strong ICP fit",
"priority": "HIGH",
"tool": null
},
"nextSteps": [
{
"actor": "ryanclinton/waterfall-contact-enrichment",
"input": { "companies": [{ "name": "Apex Digital Strategies", "domain": "apexdigitalstrategies.com" }] },
"reason": "Find decision-maker names, titles, and emails through a 10-step enrichment cascade."
},
{
"actor": "ryanclinton/bulk-email-verifier",
"input": { "emails": ["hello@apexdigitalstrategies.com"] },
"reason": "Verify the email is deliverable before adding to an outreach sequence (protects sender reputation)."
}
],
"isActive": true,
"agencyName": "Apex Digital Strategies",
"website": "https://apexdigitalstrategies.com",
"domain": "apexdigitalstrategies.com",
"phone": "+1 (212) 555-0142",
"email": "hello@apexdigitalstrategies.com",
"address": "350 5th Ave, New York, NY 10118",
"services": ["SEO", "PPC", "Content Marketing", "Email Marketing"],
"location": "350 5th Ave, New York, NY 10118",
"employeeCount": null,
"minProjectSize": null,
"reviewCount": 94,
"rating": 4.8,
"source": "google-maps",
"sourceUrl": "https://www.google.com/maps/place/Apex+Digital+Strategies",
"scrapedAt": "2026-03-22T10:14:33.121Z",
"isNewSinceLastRun": false,
"changes": {
"newReviews": 8,
"ratingChange": 0.1,
"scoreChange": 4,
"newServices": ["Email Marketing"],
"previousLeadScore": 78
}
}

(The email field is populated only when you set includeEmails: true — otherwise it's null. The changes object is populated only when this domain was present in the previous run.)

SuperbCompanies records include team size and minimum project size where available:

{
"recordType": "agency",
"rank": 2,
"leadScore": 76,
"isActive": true,
"agencyName": "Meridian Growth Partners",
"website": "https://meridiangrowthpartners.com",
"domain": "meridiangrowthpartners.com",
"phone": null,
"email": null,
"address": null,
"services": ["SEO", "PPC", "Social Media", "Branding", "Web Design"],
"location": "Austin, TX",
"employeeCount": "10–49",
"minProjectSize": "$5,000+",
"reviewCount": 31,
"rating": 4.9,
"source": "superbcompanies",
"sourceUrl": "https://superbcompanies.com/organizations/meridian-growth-partners",
"scrapedAt": "2026-03-22T10:19:07.883Z",
"isNewSinceLastRun": false
}
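
The nextSteps array in the first record is described as a list of pre-formed actor calls. A minimal sketch of consuming it with apify-client follows, assuming each step's input is accepted unchanged by the named actor; run_next_steps is a hypothetical helper, and every call it makes is billed by the downstream actor.

```python
from typing import Any


def run_next_steps(client: Any, record: dict[str, Any]) -> list[dict[str, Any]]:
    """Call each actor listed in record["nextSteps"] and collect the run objects.

    `client` is expected to be an apify_client.ApifyClient, or any object
    exposing the same .actor(actor_id).call(run_input=...) interface.
    """
    runs = []
    for step in record.get("nextSteps") or []:
        run = client.actor(step["actor"]).call(run_input=step["input"])
        runs.append({"actor": step["actor"], "reason": step["reason"], "run": run})
    return runs


# Usage (not executed here):
#   from apify_client import ApifyClient
#   runs = run_next_steps(ApifyClient("YOUR_API_TOKEN"), record)
```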

Run summary (key-value store, not dataset)

The summary is written to the run's key-value store under the key SUMMARY so your dataset stays uniform and CSV-ready. Read it via the Apify API, the Console's Storage tab, or the apify-client library. A typical summary looks like this:

{
"recordType": "summary",
"mode": "list-builder",
"strategy": "balanced",
"outputMode": "raw",
"totalAgencies": 143,
"droppedByMode": 0,
"bySource": { "google-maps": 50, "superbcompanies": 50, "themanifest": 43 },
"byTier": { "enterprise": 12, "boutique": 87, "freelance": 18, "unknown": 26 },
"byLeadType": { "ideal_match": 24, "growth_target": 41, "nurture": 62, "low_priority": 16 },
"avgRating": 4.62,
"medianReviewCount": 38,
"avgLeadScore": 71.4,
"avgOpportunityScore": 48.7,
"avgOutreachScore": 62.3,
"topServices": ["seo", "ppc", "web design", "branding", "content marketing"],
"topAgencies": [
{ "rank": 1, "agencyName": "Apex Digital Strategies", "domain": "apexdigitalstrategies.com", "leadScore": 82, "opportunityScore": 78, "leadType": "ideal_match" }
],
"topOpportunities": [
{ "rank": 14, "agencyName": "Undercurrent Studios", "domain": "undercurrent.co", "opportunityScore": 92, "leadScore": 58, "leadType": "growth_target" }
],
"fastestGrowing": [
{ "rank": 7, "agencyName": "North Peak SEO", "domain": "northpeakseo.com", "opportunityScore": 75, "leadType": "growth_target" }
],
"highICPFitLowCompetition": [
{ "rank": 22, "agencyName": "Rising Tide PPC", "domain": "risingtideppc.com", "icpFitScore": 88, "leadType": "growth_target" }
],
"newSinceLastRun": 12,
"previousRunAt": "2026-03-15T10:31:07.448Z",
"emailsCollected": 0,
"activeCount": 138,
"ppeChargesUsd": 7.15,
"generatedAt": "2026-03-22T10:31:07.448Z"
}

The four "top" arrays answer different questions:

  • topAgencies: "who are the best companies?" (by leadScore)
  • topOpportunities: "who are the best TARGETS right now?" (by opportunityScore: momentum + gaps to fill)
  • fastestGrowing: "who's gaining review velocity?" (by newReviews delta vs last run)
  • highICPFitLowCompetition: "who matches my ICP but is underexposed?" (icpFitScore ≥ 60 AND reviewCount ≤ 30)
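
If you want to rebuild these lists from the dataset yourself, the selection rules above can be sketched as follows. This is an illustration under assumptions: top_by and high_icp_fit_low_competition are hypothetical names, the list length of 5 is a guess, and records with a null reviewCount are excluded here.

```python
from typing import Any

Record = dict[str, Any]


def top_by(records: list[Record], key: str, limit: int = 5) -> list[Record]:
    """Highest `key` first; records where the key is missing or null are ignored."""
    scored = [r for r in records if r.get(key) is not None]
    return sorted(scored, key=lambda r: r[key], reverse=True)[:limit]


def high_icp_fit_low_competition(records: list[Record]) -> list[Record]:
    """icpFitScore >= 60 AND reviewCount <= 30, per the description above."""
    return [
        r for r in records
        if (r.get("icpFitScore") or 0) >= 60
        and r.get("reviewCount") is not None
        and r["reviewCount"] <= 30
    ]
```

top_by(records, "leadScore") corresponds to topAgencies and top_by(records, "opportunityScore") to topOpportunities.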

Output fields

Field	Type	Description
recordType	string	Discriminator: agency for result rows, error if the run hit a fatal failure. Filter recordType = 'agency' to drop error rows in one step.
rank	number | null	Position in this run's results when sorted by leadScore (1 = highest). Null on single-result runs.
leadScore	number	Lead quality 0–100. Sum of scoreBreakdown. Weighted: authority 25 + growth 15 + completeness 10 + contactability 30 + trust 20.
scoreBreakdown	object	Per-dimension score: { authority, growth, completeness, contactability, trust }. Sums to leadScore.
whyHighScore	string[]	Plain-English reasons for the score, one per positive signal. Usable directly in emails, reports, dashboards.
icpFitScore	number | null	0–100 match score against the targetProfile input. Null when no target profile is supplied.
icpFitReasons	string[]	Plain-English list of which ICP criteria this agency met. Empty when no criteria matched.
tier	string	Segmentation: enterprise (50+ employees or 100+ reviews), boutique (10–49 employees or 10+ reviews), freelance (<10 employees), unknown.
outreachScore	number	0–100 contactability. Weighted: email 40 + phone 30 + website 20 + location 10.
opportunityScore	number	0–100 "is this a good TARGET right now?" score. High rating + low reviews (underexposed) + growth momentum + contactability gaps + new-listing + ICP bonus.
buyingSignals	string[]	Machine-readable sales-language tags: momentum_growth, surging_reviews, rising_rating, high_authority, high_rating_low_reviews, new_listing, missing_email, missing_phone, strong_icp_match, moderate_icp_match, enterprise_scale, boutique_scale, freelance_scale.
outreachAngle	string | null	One-line "why contact this lead now" sentence, ready to paste into Slack/email/CRM. Null when no significant signals.
leadType	string	ideal_match (high ICP fit + high leadScore), growth_target (strong opportunity signals), nurture (mid score), low_priority (bottom).
isTopLead	boolean	True for the top ~3% by composite leadScore + opportunityScore (or top 1 for small runs). "Just give me the best ones" flag.
confidenceScore	number	0–100: how much to trust this record's scoring. Based on data density + signal count. Low = thin data, treat as provisional.
tldr	string | null	Factual one-line summary (identity-framed, vs outreachAngle's sales framing): "Boutique SEO in New York (4.8★, 94 reviews)".
nextAction	object	Machine-actionable recommendation: { type: "enrich_email"|"send_outreach"|"monitor"|"skip"|"investigate", reason, priority, tool }. Zapier/n8n can switch() on type.
nextSteps	object[]	Pre-formed Apify actor chain hooks: [{ actor, input, reason }]. Each input can be POSTed straight to the named actor with no glue code.
isActive	boolean	True when the agency has a website AND rating > 0. Quick filter to drop dead/stub listings.
agencyName	string	Agency display name as returned by the source
website	string | null	Normalised absolute URL of the agency's own website
domain	string | null	Registrable domain extracted from website (e.g. acmecorp.com), used for cross-source deduplication
phone	string | null	Phone number as returned by Google Maps; null for directory sources
email	string | null	Primary email address. Populated only when includeEmails: true; null otherwise.
address	string | null	Full street address as returned by Google Maps; null for directory sources
services	string[]	Service and specialty tags extracted from the source, up to 10 per record
location	string | null	City and/or country as shown on the source; for Google Maps records this may be the full address
employeeCount	string | null	Team size range, e.g. 10–49, 50–249; available from SuperbCompanies and TheManifest
minProjectSize	string | null	Minimum project budget, e.g. $5,000+; available from SuperbCompanies and TheManifest
reviewCount	number | null	Total number of client reviews on the source listing
rating	number | null	Average star rating parsed as a float, e.g. 4.8
isNewSinceLastRun	boolean	True if this domain was NOT present in the previous run's results. Always false on the very first run.
changes	object | null	Deltas vs the previous run for the same domain: { newReviews, ratingChange, scoreChange, newServices[], previousLeadScore }. Null on first run or new domains.
source	string	Which source provided this record: google-maps, superbcompanies, or themanifest
sourceUrl	string	Direct URL to the agency's profile page or Google Maps listing
scrapedAt	string	ISO 8601 timestamp of when the record was extracted
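
Two of the rows above are simple enough to check by hand. The sketch below assumes the outreachScore weights are applied as flat presence checks, which the table implies but does not spell out; outreach_score and breakdown_matches are hypothetical helper names.

```python
from typing import Any

# Weights quoted in the outreachScore row
OUTREACH_WEIGHTS = {"email": 40, "phone": 30, "website": 20, "location": 10}


def outreach_score(record: dict[str, Any]) -> int:
    """Sum the documented weight for every contact field that is present."""
    return sum(w for field, w in OUTREACH_WEIGHTS.items() if record.get(field))


def breakdown_matches(record: dict[str, Any]) -> bool:
    """Check that scoreBreakdown sums to leadScore, as the table states."""
    return sum(record["scoreBreakdown"].values()) == record["leadScore"]
```

For the first output example, the breakdown 22 + 12 + 4 + 27 + 17 adds up to the reported leadScore of 82, and all four contact fields are present, which matches the reported outreachScore of 100.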

How much does it cost to find agency leads?

Agency Lead Finder uses pay-per-event pricing — you pay $0.05 per agency extracted and deduplicated by this actor. Platform compute is extra per Apify's standard model. You are never charged for duplicates removed during deduplication or for failed page loads.

Scenario	Agencies	Actor PPE (per agency)	Total actor cost
Quick test (1 source, 10 agencies)	10	$0.05	$0.50
Small batch (2 sources, 25 each)	~50	$0.05	~$2.50
Standard run (2 sources, 50 each)	~100	$0.05	~$5.00
Large run (3 sources, 100 each)	~300	$0.05	~$15.00
Maximum batch (3 sources, 500 each)	~1,500	$0.05	~$75.00

If you enable includeEmails: true, the Google Maps sub-actor runs its email-enrichment chain (website crawl + bulk email verification + decision-maker lookup). Those nested actors charge their own PPE events to your account, so budget roughly $0.10 extra per Google Maps agency on top of our $0.05. Leave includeEmails off for a plain directory dump and pair with Website Contact Scraper later if you want finer control over email discovery.

You can set a maximum spending limit per run in the Apify console to control costs. The actor stops pushing records when your budget is reached and the final status message plus the SUMMARY KV record show actual PPE charges incurred.
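
To pick a sensible spending limit, the figures above can be turned into an upper-bound estimate. This is a rough sketch: max_run_cost is a hypothetical helper, the $0.10 enrichment figure is the approximate number quoted above, and platform compute is not included.

```python
ACTOR_PPE_USD = 0.05         # per agency, from the pricing table
EMAIL_ENRICHMENT_USD = 0.10  # rough sub-actor estimate quoted above


def max_run_cost(sources: list[str], max_per_source: int, include_emails: bool = False) -> float:
    """Upper bound in USD, before platform compute and before deduplication savings."""
    cost = len(sources) * max_per_source * ACTOR_PPE_USD
    if include_emails and "google-maps" in sources:
        # enrichment charges apply to Google Maps agencies only
        cost += max_per_source * EMAIL_ENRICHMENT_USD
    return round(cost, 2)
```

Actual spend is usually lower than this bound because duplicates removed during deduplication are not charged.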

Compare this to B2B data platforms like Apollo or ZoomInfo at $49–$199/month for general contact data. Agency Lead Finder is purpose-built for agency prospecting, and most users building or refreshing an agency list spend $3–$15 per run with no subscription commitment.

Agency lead generation using the API

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("ryanclinton/agency-directory-scraper").call(run_input={
    "sources": ["google-maps", "superbcompanies"],
    "services": "marketing agency",
    "location": "New York",
    "maxAgenciesPerSource": 50,
    "proxyConfiguration": {
        "useApifyProxy": True
    }
})

# Print one line per agency record
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("recordType") != "agency":
        continue  # skip the optional error record if present
    print(f"#{item['rank']} | score {item['leadScore']:>3} | {item['agencyName']} | {item['domain']} | {item.get('rating')}★")

# Read the run summary from the key-value store
kv = client.key_value_store(run["defaultKeyValueStoreId"])
record = kv.get_record("SUMMARY")  # None if the summary was not written
if record:
    summary = record["value"]
    print(f"Total: {summary['totalAgencies']}, avg lead score: {summary['avgLeadScore']}, {summary['newSinceLastRun']} new since last run")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Start the actor and wait for the run to finish
const run = await client.actor("ryanclinton/agency-directory-scraper").call({
    sources: ["google-maps", "superbcompanies"],
    services: "marketing agency",
    location: "New York",
    maxAgenciesPerSource: 50,
    proxyConfiguration: {
        useApifyProxy: true
    }
});

// Print one line per agency record
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    if (item.recordType !== "agency") continue; // skip the optional error record
    console.log(`#${item.rank} | score ${item.leadScore} | ${item.agencyName} | ${item.domain} | ${item.source}`);
}

// Read the run summary from the key-value store
const kv = client.keyValueStore(run.defaultKeyValueStoreId);
const summary = (await kv.getRecord("SUMMARY"))?.value;
if (summary) {
    console.log(`Total: ${summary.totalAgencies}, avg lead score: ${summary.avgLeadScore}, ${summary.newSinceLastRun} new since last run`);
}

cURL

# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~agency-directory-scraper/runs?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"sources": ["google-maps", "superbcompanies"],
"services": "marketing agency",
"location": "New York",
"maxAgenciesPerSource": 50,
"proxyConfiguration": {
"useApifyProxy": true
}
}'
# Fetch results once the run completes (replace DATASET_ID from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

Technical details (optional)

The sections below go into selectors, sub-actor orchestration, crawl phases, and the full scoring formula. You don't need any of this to use the actor — skip straight to Tips for best results if you just want to run it.

How Agency Lead Finder works

Phase 1 — Google Maps sub-actor call

The actor constructs a Google Maps search query by concatenating the services and location inputs (e.g. "marketing agency New York"). It then calls the ryanclinton/google-maps-email-extractor sub-actor with this query and the maxAgenciesPerSource limit. After the sub-actor run completes, the actor reads its dataset using Actor.apifyClient.dataset(run.defaultDatasetId).listItems() with a 1,000-item ceiling. Each Google Maps item is mapped to the unified AgencyRecord schema: title → agencyName, website → normalised URL, phone → phone, address → address, categoryName → first services entry, totalScore → rating, reviewsCount → reviewCount. All discovered domains are added to a shared seenDomains Set before the crawler starts.
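
The query construction and field mapping in Phase 1 can be sketched in Python as follows. This is an illustration, not the actor's code (which is TypeScript): the function names are hypothetical, and domain_of is a simplified stand-in that only strips a leading "www." rather than computing a true registrable domain.

```python
from typing import Any, Optional
from urllib.parse import urlparse


def build_maps_query(services: str, location: str = "") -> str:
    """Join services and location into a single search string."""
    return " ".join(part.strip() for part in (services, location) if part and part.strip())


def domain_of(url: Optional[str]) -> Optional[str]:
    """Simplified domain extraction: hostname minus a leading 'www.'."""
    if not url:
        return None
    host = urlparse(url if "://" in url else f"https://{url}").hostname
    return host[4:] if host and host.startswith("www.") else host


def map_maps_item(item: dict[str, Any]) -> dict[str, Any]:
    """Map one sub-actor item onto the unified record fields named above."""
    category = item.get("categoryName")
    return {
        "agencyName": item.get("title"),
        "website": item.get("website"),
        "domain": domain_of(item.get("website")),
        "phone": item.get("phone"),
        "address": item.get("address"),
        "services": [category] if category else [],  # first services entry
        "rating": item.get("totalScore"),
        "reviewCount": item.get("reviewsCount"),
        "source": "google-maps",
    }
```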

Phase 2 — CheerioCrawler for SuperbCompanies and TheManifest

Both directory sources are crawled with Crawlee's CheerioCrawler — a lightweight HTTP + Cheerio parser that requires no browser. The crawler runs at a concurrency of 5 with session pooling and cookie persistence. Both sources are seeded from their XML sitemaps: SuperbCompanies uses a sitemap index at /sitemap.xml that references child sitemaps (e.g. /sitemap-organizations-1.xml); TheManifest uses a single sitemap with /companies/ and /directory/ URL patterns, with an HTML anchor fallback if the XML sitemap yields no matches. The sharedState module — a plain TypeScript object imported directly by route handlers — carries the seenDomains Set (pre-populated with Google Maps domains), per-source counters, and the collected results array across all route invocations.
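
Seeding from a sitemap index amounts to two small XML reads: list the child sitemaps, then list the page URLs inside each one. The sketch below uses only the Python standard library and the standard sitemaps.org namespace; child_sitemaps and page_urls are hypothetical names, and the HTML anchor fallback mentioned above is not covered.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml: str) -> list[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS) if loc.text]


def page_urls(sitemap_xml: str, must_contain: str = "") -> list[str]:
    """Return the <loc> of every <url> entry, optionally filtered by a path fragment."""
    root = ET.fromstring(sitemap_xml)
    locs = (loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text)
    return [url for url in locs if must_contain in url]
```

Passing must_contain="/companies/" or "/directory/" mirrors the URL-pattern filtering described for TheManifest.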

Phase 3 — Profile extraction and normalisation

Each route handler calls parseSuperbCompaniesProfile or parseTheManifestProfile from extractors.ts, which are pure functions that take a Cheerio $ object and return a structured partial record. Agency names are read from the first <h1>. Websites are found by scanning links that match the selector a[href^="http"] while filtering a junk-domain list that includes linkedin, facebook, twitter, instagram, clutch, google, yelp, sortlist, and the source directory itself. SuperbCompanies profiles also check for a "Visit Website" link text before falling back to the junk-filter scan. Location is read from itemprop="addressLocality" / itemprop="addressCountry" structured data before falling back to class-name selectors. Service tags are collected from [class*="service"], [class*="expertise"], [class*="tag"], and [class*="skill"] elements, deduped, and capped at 10. Ratings and review counts pass through parseRating and parseReviewCount which handle formats including 4.8, 4.8/5, 4.8 stars, 1,234 reviews, and 45.
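
The rating and review-count formats listed above can be handled with two small regular expressions. The following Python sketch shows one way to do it; it is not the actor's parseRating / parseReviewCount implementation, and rejecting ratings outside 0–5 is an assumption.

```python
import re
from typing import Optional


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Extract the first decimal number and accept it only within 0-5."""
    if not text:
        return None
    match = re.search(r"\d+(?:\.\d+)?", text)
    if not match:
        return None
    value = float(match.group())
    return value if 0 <= value <= 5 else None


def parse_review_count(text: Optional[str]) -> Optional[int]:
    """Extract the first integer, allowing thousands separators."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    if not match:
        return None
    return int(match.group().replace(",", ""))
```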

Phase 4 — Delta pass, scoring, ranking, PPE charging, and output

Once all sources complete, the allResults array is marked against the previous run's domain snapshot (isNewSinceLastRun), scored by scoreRecord() (the 0–100 weighted formula), and sorted by score descending so rank 1 is the strongest lead. Each record is pushed to the Apify dataset individually, highest-scoring first. In pay-per-event mode, Actor.charge({ eventName: 'agency-found', count: 1 }) fires after each push — if eventChargeLimitReached returns true, the loop exits cleanly and no further records or charges are made. A SUMMARY record is then written to the key-value store (not the dataset) with total counts, averages, top services, top-5 agencies, emails collected, and actual PPE charges. The current domain set is merged with the previous snapshot and saved to PREVIOUS_DOMAINS for the next run's delta pass.
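
The delta pass, ranking, and snapshot merge can be sketched as one function. This is a simplified Python illustration with a hypothetical name (mark_new_and_rank); it takes leadScore as already computed, leaves out PPE charging, and treats records without a domain as not new, which is an assumption.

```python
from typing import Any, Iterable


def mark_new_and_rank(
    records: list[dict[str, Any]], previous_domains: Iterable[str]
) -> tuple[list[dict[str, Any]], set[str]]:
    """Flag new domains, rank by leadScore, and return the merged domain snapshot.

    Records are updated in place with isNewSinceLastRun and rank.
    """
    previous = set(previous_domains)
    first_run = not previous  # no snapshot yet, so nothing is flagged as new
    for record in records:
        domain = record.get("domain")
        record["isNewSinceLastRun"] = bool(domain) and not first_run and domain not in previous
    ranked = sorted(records, key=lambda r: r.get("leadScore", 0), reverse=True)
    for position, record in enumerate(ranked, start=1):
        record["rank"] = position  # rank 1 is the strongest lead
    snapshot = previous | {r["domain"] for r in records if r.get("domain")}
    return ranked, snapshot
```

The returned snapshot plays the role of the PREVIOUS_DOMAINS value that the next run would load.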

Tips for best results

  1. Match your keyword to how agencies describe themselves. Use "marketing agency" for the broadest results, or be specific with "SEO agency", "web design agency", or "digital advertising agency". Vague or misspelled keywords reduce Google Maps result quality.

  2. Pair a precise city with Google Maps. Google Maps produces the most relevant results when location is a specific city like "Chicago" or "Toronto" rather than a broad region. For country-level coverage, omit location and use SuperbCompanies or TheManifest as your primary source.

  3. Include SuperbCompanies as a minimum. SuperbCompanies exposes structured data (schema.org markup) that produces the most consistent employeeCount, minProjectSize, and rating fields. It is the most reliable supplementary directory source.

  4. Treat TheManifest as a bonus source. TheManifest is a Clutch sister site with overlapping listings. Enable it for maximum coverage, but expect occasional zero-result runs if the site is Cloudflare-protected on that day. The run still succeeds with the other sources.

  5. Use the domain field for downstream enrichment. Every record with a non-null domain can be fed directly into Website Contact Scraper to extract emails and phone numbers from agency websites, or into Email Pattern Finder to detect the email naming convention before crafting personalised outreach.

  6. Schedule weekly runs for a living agency database. New agencies register on Google Maps and these directories regularly. A weekly scheduled run with downstream deduplication by domain keeps your prospecting list current without manual effort.

  7. Set a spending limit on first-time runs. When testing a new keyword or location, set a $3–$5 spending limit in the run settings. The actor stops cleanly at your budget and outputs whatever it collected, so you can assess data quality before committing to a full run.

  8. Run separate inputs for separate service categories. Each run maintains its own deduplication state. If you need both SEO agencies and content marketing agencies, run them as separate inputs rather than combining keywords, which can dilute Google Maps result relevance.

Combine with other Apify actors

Actor	How to combine
Website Contact Scraper	Feed the domain output into Website Contact Scraper to add email addresses and phone numbers to each agency record for outreach
Email Pattern Finder	Run Email Pattern Finder on each domain to detect the naming convention (e.g. firstname@domain.com) before personalising outreach at scale
Waterfall Contact Enrichment	Enrich each agency domain through a 10-step contact enrichment cascade to surface decision-maker names, titles, and emails
Bulk Email Verifier	Verify email addresses found for agencies before adding them to outreach sequences to protect sender reputation
B2B Lead Qualifier	Score the scraped agency list on 30+ signals to prioritise outreach to the highest-fit prospects first
HubSpot Lead Pusher	Push the completed agency dataset directly into HubSpot as company records with associated contact data
Website Tech Stack Detector	Detect which marketing tools each agency runs; useful for targeting agencies that use a specific platform your product integrates with
Lead Enrichment Pipeline	All-in-one Clay alternative: email discovery, verification, company research, and scoring in one run ($0.12/lead)
AI Outreach Personalizer	Generate personalized cold emails using your own OpenAI/Anthropic key, with zero AI markup ($0.01/lead)
Intent Signal Tracker	Track buying signals: hiring, tech changes, funding, content updates. Prioritize outreach by intent score ($0.05/company)
Lead Data Quality Auditor	Audit lead data quality before outreach: email verification, phone validation, domain freshness ($0.005/record)

Limitations

  • Google Maps results are location-dependent. Google Maps search quality varies significantly by location. Dense markets like New York or London return highly relevant results; smaller cities may return fewer agencies or adjacent business types. Supplement with SuperbCompanies for location-agnostic coverage.
  • TheManifest may be Cloudflare-protected. TheManifest occasionally blocks automated access. When this happens, the source returns zero results and a warning is logged. The run completes normally using the other sources. This is a known limitation and is noted in the actor logs.
  • Phone and address are Google Maps only. SuperbCompanies and TheManifest profile pages do not expose phone numbers or street addresses in a consistent, parseable form. The phone and address fields are null for all superbcompanies and themanifest records.
  • Employee count and min project size are directory sources only. Google Maps does not carry team size or budget data. The employeeCount and minProjectSize fields are null for all google-maps records.
  • Service tags reflect what the directory displays. Service categories on SuperbCompanies and TheManifest are set by the agency during registration and may be broad, inconsistent, or absent. Google Maps returns the business category name as a single-element services array.
  • Deduplication is domain-based within a single run. Two agencies at different domains that are the same company will both appear. Merging datasets across multiple runs will introduce duplicates — filter by domain in your downstream tooling.
  • Hard cap of 500 agencies per source per run. Because SuperbCompanies and TheManifest are read in sitemap order, which is not sorted by rating or review count, the capped set is not guaranteed to contain the highest-reviewed agencies from directory sources.
  • No individual profile deep-crawl for Google Maps by default. Phone and address come from the Google Maps sub-actor output. Unless includeEmails is true, the sub-actor does not visit each agency's website; for email addresses, either enable includeEmails or combine with Website Contact Scraper.
  • HTML changes on SuperbCompanies or TheManifest can reduce field coverage. Selectors use broad CSS class-name substring matching to tolerate minor changes, but a full redesign may require selector updates. Open an issue in the Issues tab if fields start returning null unexpectedly.

Integrations

  • Zapier — trigger a Zap when a run completes to route high-rated agencies directly into a CRM deal stage or sales sequence
  • Make — build a scenario that pulls agency results after each run and cross-references them against existing CRM contacts before creating new records
  • Google Sheets — append scraped agency rows to a shared spreadsheet for team review and manual qualification before outreach
  • Apify API — trigger runs programmatically from your internal tooling and retrieve results in JSON or CSV for downstream processing
  • Webhooks — post the completed dataset URL to a Slack channel or internal endpoint the moment a run finishes
  • LangChain / LlamaIndex — load agency records into a vector store to power an AI assistant that answers questions about the agency landscape in a given market

Troubleshooting

  • Zero results from Google Maps — Check that your services keyword and location form a valid Google Maps search. The query is constructed as "{services} {location}". Very niche keywords or misspellings can produce no results from the sub-actor. Try "marketing agency" + a major city as a smoke test.

  • Zero results from TheManifest — TheManifest may be Cloudflare-protected on the day of your run. This is expected behaviour. The run continues and uses Google Maps and SuperbCompanies results. Check the run log for the warning message "TheManifest returned 0 results" to confirm this is the cause.

  • Most fields are null for directory records — Fields like phone, address, employeeCount, and minProjectSize are source-dependent. phone and address are only populated for Google Maps records. employeeCount and minProjectSize are only available from SuperbCompanies and TheManifest when the agency has filled in their profile. Null values for these fields are expected and normal.

  • Fewer agencies than maxAgenciesPerSource — For a given keyword and location, Google Maps may return fewer results than your cap. SuperbCompanies and TheManifest sitemap coverage varies by niche — some service categories have fewer than 50 listed agencies. The actor returns all available records and stops without error.

  • Duplicate agencies in merged datasets — Deduplication operates within a single run by domain. If you merge datasets from multiple runs, duplicates will appear. Filter by domain in your downstream tooling to deduplicate across runs.
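
For the last case, cross-run deduplication can be done in a few lines once the datasets are downloaded. The sketch below keeps the most recently scraped record per domain; dedupe_by_domain is a hypothetical helper, and keeping records without a domain untouched is an assumption about what you would want.

```python
from typing import Any, Iterable


def dedupe_by_domain(*runs: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge items from several runs, keeping the latest record per domain."""
    latest: dict[str, dict[str, Any]] = {}
    no_domain: list[dict[str, Any]] = []
    for items in runs:
        for item in items:
            domain = item.get("domain")
            if not domain:
                no_domain.append(item)  # cannot be matched, keep as-is
                continue
            kept = latest.get(domain)
            # ISO 8601 UTC timestamps in the same format compare correctly as strings
            if kept is None or item.get("scrapedAt", "") > kept.get("scrapedAt", ""):
                latest[domain] = item
    return list(latest.values()) + no_domain
```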

Responsible use

  • This actor accesses only publicly available agency listing data from directories whose core business model is built on public discovery of agency firms.
  • Respect the terms of service of each directory. Do not use this actor to systematically republish directory content or create a competing agency database.
  • When using scraped agency data for outreach, comply with CAN-SPAM, GDPR, and all other applicable data protection regulations in your jurisdiction.
  • Do not use extracted data for spam, harassment, or any unsolicited commercial contact that violates applicable law.
  • For guidance on web scraping legality, see Apify's guide.

FAQ

How many agency leads can I find in one run? Up to 500 agencies per source across up to three sources — giving a maximum of approximately 1,500 deduplicated agency records per run. In practice, most runs targeting a specific keyword and location return 50–200 records because not every source has 500 listings for every niche.

Which sources does Agency Lead Finder use? The actor uses three sources: Google Maps (via the ryanclinton/google-maps-email-extractor sub-actor), SuperbCompanies.com (scraped via sitemap), and TheManifest.com (scraped via sitemap). You can enable any combination by setting the sources input parameter. The default is Google Maps and SuperbCompanies.

How is Agency Lead Finder different from scraping Clutch or DesignRush? This actor targets Google Maps, SuperbCompanies, and TheManifest — not Clutch or DesignRush. Google Maps provides phone numbers and street addresses that Clutch does not. SuperbCompanies has 8,000+ open agency profiles accessible without aggressive bot protection. All three sources are combined and deduplicated in a single run, so you get broader coverage without building three separate scrapers.

Does agency lead finding work without a proxy? Google Maps results come from the sub-actor, which handles its own proxy use. For SuperbCompanies and TheManifest, standard Apify proxy (datacenter) is normally sufficient. TheManifest may occasionally be Cloudflare-protected; when that happens it returns zero results and the run continues with the other sources. The default proxyConfiguration is already correct. You do not need residential proxies for this actor.

What agency type keywords work best? Common keywords include "marketing agency", "SEO agency", "web design agency", "digital advertising agency", "branding agency", "social media agency", and "content marketing agency". The keyword drives the Google Maps query. Be as specific as your targeting requires — "B2B SaaS marketing agency" will return a narrower but more relevant set than "marketing agency".

How long does a typical agency lead finding run take? A standard run with two sources at 50 agencies each takes 10–20 minutes. Google Maps results arrive after the sub-actor call completes (typically 3–8 minutes depending on the result count); the CheerioCrawler then processes SuperbCompanies or TheManifest profile pages concurrently. Runs at 500 agencies per source may take 30–60 minutes.

How accurate is the extracted agency data? Agency names, websites, and locations are reliably extracted from all three sources. Google Maps records include phone and address when the business has a verified Maps listing. employeeCount and minProjectSize depend on whether the agency completed its SuperbCompanies or TheManifest profile; these fields are null when the agency did not provide them. Star ratings and review counts are extracted where present.

Can I filter agency leads by location? Yes. Enter a city, state, country, or region in the location field. This is concatenated with your services keyword to form the Google Maps query (e.g. "marketing agency Chicago"). SuperbCompanies and TheManifest are crawled globally via sitemap and do not apply location filtering server-side — filter the output location field after the run for directory results.
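
A minimal sketch of that post-run filter, assuming the dataset has been exported to a list of dicts and that location is a free-text string (or null). The function name and sample records are hypothetical, for illustration only:

```python
def filter_by_location(records, needle):
    """Keep records whose `location` field contains `needle` (case-insensitive)."""
    needle = needle.casefold()
    # `location` may be missing or null on some records, so fall back to "".
    return [r for r in records if needle in (r.get("location") or "").casefold()]


# Hypothetical sample records; real output rows carry more fields.
records = [
    {"name": "Acme Creative", "location": "Chicago, IL, United States"},
    {"name": "Northwind Digital", "location": "London, United Kingdom"},
    {"name": "No Location Ltd", "location": None},
]

chicago_only = filter_by_location(records, "chicago")
```

A plain substring match is deliberately loose. If directory records spell locations inconsistently, matching on several variants (for example "Chicago" and "IL") may be needed.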

Is it legal to scrape agency directories for lead generation? These directories publish agency information publicly as their core business model — the data is intentionally visible to anyone. Scraping publicly available business information for prospecting is generally lawful in most jurisdictions. Review each site's terms of service before large-scale use. For a detailed analysis of web scraping legality, see Apify's guide.

Can I use the agency leads with other Apify actors to get contact emails? Yes. Feed the domain field from this actor into Website Contact Scraper to extract emails and phone numbers from agency websites, or into Waterfall Contact Enrichment for a broader multi-step enrichment pipeline. The domain field is structured specifically to serve as input for these downstream actors.

Can I schedule this actor to run periodically? Yes. Apify's scheduler supports cron-based scheduling — daily, weekly, or monthly. Each run produces a fresh dataset. Use the Apify API or a Make/Zapier integration to merge new results into your CRM while deduplicating by domain across runs.
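
The cross-run deduplication could look roughly like the sketch below. It assumes you can export the domains already in your CRM as a list of strings and that each new record carries a domain field. The helper names, the "www." normalisation, and the choice to skip records without a domain are assumptions for illustration, not the actor's behaviour:

```python
def normalize_domain(domain):
    """Lowercase, trim, and drop a leading 'www.' so equivalent domains compare equal."""
    d = (domain or "").strip().lower()
    if d.startswith("www."):
        d = d[4:]
    return d


def merge_new_leads(known_domains, new_records):
    """Return only records whose domain is not already known.

    Records without a domain are skipped here (an assumption); you may
    prefer to route them to manual review instead.
    """
    seen = {normalize_domain(d) for d in known_domains}
    fresh = []
    for record in new_records:
        key = normalize_domain(record.get("domain"))
        if not key or key in seen:
            continue
        seen.add(key)  # also guards against duplicates inside the same run
        fresh.append(record)
    return fresh


# Hypothetical data: domains already in the CRM and rows from the latest run.
crm_domains = ["acme.com", "www.northwind.io"]
latest_run = [
    {"name": "Acme Creative", "domain": "ACME.com"},
    {"name": "Northwind Digital", "domain": "northwind.io"},
    {"name": "Fresh Pixel", "domain": "freshpixel.co"},
    {"name": "Fresh Pixel (Maps)", "domain": "www.freshpixel.co"},
    {"name": "No Website Agency", "domain": None},
]

new_leads = merge_new_leads(crm_domains, latest_run)
```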

What happens if SuperbCompanies or TheManifest changes its HTML structure? Selectors use broad CSS class-name substring matching (e.g. [class*="service"], [class*="expertise"]) to tolerate minor HTML changes. A full site redesign may break extraction for that source, causing fields to return null. If a directory source starts returning blank records unexpectedly, open an issue in the Issues tab with your run ID so the selectors can be updated.
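
To illustrate why substring matching tolerates minor markup changes, here is a small Python stand-in for a selector such as [class*="service"], built on the standard library's html.parser. It is not the actor's code (the actor uses CheerioCrawler), and the class names in the sample HTML are invented:

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassSubstringExtractor(HTMLParser):
    """Collect the text of elements whose class attribute contains `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.depth = 0        # > 0 while inside a matching element
        self.matches = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1   # nested element inside a match
        elif self.needle in (dict(attrs).get("class") or ""):
            self.depth = 1    # outermost matching element starts here
            self._buffer = []

    def handle_startendtag(self, tag, attrs):
        pass                  # self-closing tags carry no text

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.matches.append(text)

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def extract_by_class_substring(html, needle):
    parser = ClassSubstringExtractor(needle)
    parser.feed(html)
    parser.close()
    return parser.matches


# Invented markup: two different class names that both contain "service".
sample = (
    '<ul>'
    '<li class="profile-service-item">SEO</li>'
    '<li class="tag">Other</li>'
    '<li class="services__name"><span>Web <b>Design</b></span></li>'
    '</ul>'
)
services = extract_by_class_substring(sample, "service")
```

Both "profile-service-item" and "services__name" match, so a rename between them would not break extraction. Dropping the word "service" from the class names entirely would, which is the full-redesign case described above.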

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.