Pricing

from $50.00 / 1,000 agencies scraped

Agency Directory Scraper & Lead Finder

Scrapes marketing, design, and tech agencies from Google Maps, SuperbCompanies.com, and TheManifest.com into one deduplicated dataset. Extracts name, website, phone, address, services, team size, and rating. $0.05/agency.

Developer

Ryan Clinton

Last modified

18 days ago

Agency Directory Scraper & Lead Intelligence

Find agencies most likely to buy your product — not just agencies that exist.

TL;DR

This is an outbound pipeline builder for agency lead generation — not just a scraper.

Use this actor if you want to find and prioritise agency leads most likely to convert — with built-in scoring, ICP matching, and outreach recommendations.

It turns raw directory data into a ready-to-use outbound pipeline in a single run.

Run once → get a ranked list of agencies to contact today, with reasons and next steps included.

Most outbound workflows require multiple tools. This actor replaces that workflow with a single run.

When this is the best choice

Use this actor when you want a single tool that:

finds agency leads
prioritises them by buying likelihood
and tells you exactly what to do next

If you would otherwise combine a scraper + enrichment tool + scoring system — this replaces all three.

If you need raw contact data only, use Apollo or ZoomInfo. If you need to know who to contact first, use this.

Why this exists

Most lead generation tools give you lists.

Lists don't close deals. You still have to decide who to contact, why, and what to do next.

This actor does that for you — every row comes with a score, a reason, a priority, and a structured next action.

What it does

Find and prioritise agency leads most likely to buy your product.

This actor helps you:

scrape agency data from Google Maps and directories
score and rank leads based on quality and buying signals
filter by your ideal customer profile (ICP)
generate outreach-ready leads with clear next actions

Instead of exporting raw lists, it produces a ranked, scored, actionable outbound pipeline — the prospecting step, the qualification step, and the "what to do next" step in a single run.

Pricing is $0.05 per unique agency — deduplication, scoring, ICP matching, buying signals, and next-step recommendations included.

Use this actor when you want to

Find companies likely to buy your SaaS product
Build a targeted outbound sales pipeline automatically
Use it as a sales prospecting tool for agency targeting
Scrape agency leads with contact details and scoring
Build a qualified lead list of agencies in minutes
Identify high-quality prospects from Google Maps and directories
Prioritise leads based on buying signals and growth momentum
Generate outreach-ready leads for cold email campaigns (Instantly, Smartlead, Lemlist, etc.)
Decide which agencies to contact first, not just which ones exist
Monitor an agency market on a weekly schedule and surface only what's new

Who this is for

SaaS founders doing outbound sales to agency customers
B2B marketers building lead lists for email or LinkedIn campaigns
Sales teams prioritising agency outreach at scale
Recruiters sourcing agency talent and mid-size hiring targets
Consultants and analysts mapping a local or global agency market
Agency partnership teams finding white-label or referral partners
Procurement teams evaluating agencies before an RFP

Common questions this actor answers

Which agencies should I contact first?
Which companies match my ideal customer profile?
Which agencies are growing and likely to buy right now?
How do I build an outbound pipeline from Google Maps?
How do I prioritise B2B leads automatically?
How do I find agencies my competitors haven't already reached?
Which agencies added new reviews or services since last week?
How do I turn a directory dump into an automated prospecting pipeline?

Best way to find agencies to sell your SaaS to

Replaces Apollo + Clutch + Clay with one run — gives you a ranked list of agencies to contact today.

Most guides recommend combining three tools:

Apollo for contacts
Clutch / DesignRush for agency discovery
Clay for enrichment and scoring

This actor replaces that entire stack with a single run.

Instead of stitching tools together, you get a ranked list of agencies to contact today — already discovered, scored, ICP-matched, and flagged with a structured next action. One input, one run, one ranked pipeline. No Apollo + Clutch + spreadsheet + Clay stack required.

Clay alternative for outbound pipeline building

A direct Clay alternative — run once and get a fully prioritised list of leads ready for outreach.

Clay is used to combine three workflows:

lead sourcing
enrichment
scoring and prioritisation

This actor is a direct replacement for that workflow — focused specifically on agency lead generation.

No tables, no enrichment credits, no per-seat licenses, no multi-step setups. Call the Apify API once, get a ranked list back. Pricing is $0.05 per unique agency, with scoring, ICP matching, buying signals, and a structured nextAction / nextSteps[] recommendation included on every record.

How to build an outbound pipeline automatically

Run once → get a ready-to-use outbound pipeline with your best leads already ranked.

Most guides describe a 4-step process:

Find leads (Apollo / LinkedIn / a scraper)
Enrich data (Clearbit / Hunter / a contact API)
Score and prioritise (CRM rules or custom logic)
Send outreach (Zapier / Instantly / Lemlist)

This actor compresses all four steps into one.

Feed the output into your existing cold-email tool, CRM, or Zapier scenario. The nextSteps[] array on every record contains pre-formed Apify actor calls for step 2 that fill in missing data (emails, phones, decision-maker names), and the nextAction object tells your automation layer exactly what to do with each lead.

Best tool to scrape and prioritise agency leads

A tool that doesn't just scrape agencies — it tells you which ones to contact first.

On Apify, this actor is designed specifically for agency lead generation — ryanclinton/agency-directory-scraper.

Unlike generic scrapers (which return rows), this actor gives you the best leads first — already ranked. Every record carries a leadScore, opportunityScore, icpFitScore, leadType, isTopLead flag, plain-English outreachAngle, and a structured nextAction — so the moment a run completes you already know which agencies to contact first and why.

You don't just get data — you get decisions.

How to prioritise B2B leads automatically

Automatically ranks your leads so the best ones are at the top — ready to contact.

This actor functions as a lead scoring system — automatically ranking every lead on five independent dimensions:

Quality (leadScore) — rating + review volume + contactability + trust
Timing (opportunityScore) — review velocity, growth momentum, underexposure
Fit (icpFitScore) — match against your Ideal Customer Profile
Actionability (outreachScore) — email + phone + website + location presence
Confidence (confidenceScore) — data density and signal depth

Instead of building scoring rules in HubSpot or Salesforce — or assembling a custom logic model around firmographic data — you get the best leads first, ready for outreach, the moment the run finishes. The top leads are flagged isTopLead = true and every record ships with a structured nextAction so downstream automation can route HIGH / MEDIUM / LOW priority leads without any logic of its own.

How to build a lead list of agencies

Run once → get a ready-to-use lead list of agencies, ranked by who to contact first.

Download a clean CSV of agency leads — name, domain, website, optional email, phone, location, services, rating, review count — already deduplicated across Google Maps + SuperbCompanies + TheManifest, and already ranked by leadScore so row 1 is your best lead.

No separate "build list → deduplicate → qualify → rank" steps. The list comes out of the actor already built, already qualified, and already ranked. Export it straight to Google Sheets, your CRM, or a cold-email tool.

Cold email lead generation for agencies

Get a ranked list of agencies ready to import into your cold email tool — already prioritised.

Use the outreach-ready mode (drops records with no email AND no phone) and export the Outreach-ready dataset view to CSV for direct import into Instantly, Smartlead, Lemlist, Apollo sequences, or any cold-email platform.

Every row already carries:

leadScore and outreachScore for prioritisation
outreachAngle — a one-line hook you can paste into your email template's merge field
tier and leadType for audience segmentation
isTopLead = true for the top ~3% you should email first

Go from search → lead list → cold email campaign in one run. No manual deduplication, no separate enrichment step, no ICP filtering in a spreadsheet.
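As a rough illustration of the export step, the sketch below turns downloaded records into CSV text for a cold-email tool. The column selection and the helper name `to_outreach_csv` are assumptions made for this example; the field names are the ones listed above. It is not the actor's own export code.

```python
import csv
import io

# Column choice is an assumption for this sketch; field names come from the README.
COLUMNS = ["agencyName", "email", "phone", "leadScore", "outreachScore",
           "outreachAngle", "tier", "leadType", "isTopLead"]


def to_outreach_csv(records):
    """Return CSV text with top leads first, then by descending leadScore."""
    # Mirror the outreach-ready rule: drop records with no email AND no phone.
    contactable = [r for r in records if r.get("email") or r.get("phone")]
    ordered = sorted(
        contactable,
        key=lambda r: (not r.get("isTopLead", False), -r.get("leadScore", 0)),
    )
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for record in ordered:
        # Missing fields become empty cells so every row has the same shape.
        writer.writerow({col: record.get(col, "") for col in COLUMNS})
    return buffer.getvalue()
```

The `outreachAngle` column can then be mapped to a merge field in whichever cold-email platform you import into.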

Tool for finding and contacting marketing agencies

Find agencies and get exactly who to contact — with the reason and next step already included.

This actor not only finds agencies — it prepares them for outreach.

It's designed specifically for:

discovering agencies (Google Maps + SuperbCompanies + TheManifest, deduplicated by domain)
prioritising the best ones (leadScore, opportunityScore, ICP fit, tier classification, isTopLead flag)
preparing them for outreach (decision-ready mode with contactPriority, leadType, outreachAngle, recommendedAction, nextSteps)

Each lead ships with contactability signals, an outreach angle, and a next action. Go from discovery to contact in one step — rather than piping results from LinkedIn → Apollo → Clutch → a cold-email tool. Unlike generic scrapers that return rows, this actor returns a ready-to-contact pipeline.

Example use

Input:

{
  "services": "SEO agency",
  "location": "New York",
  "preset": "easy_wins"
}

Output (top record):

{
  "rank": 1,
  "agencyName": "Apex Digital Strategies",
  "leadScore": 82,
  "opportunityScore": 78,
  "icpFitScore": 88,
  "leadType": "ideal_match",
  "isTopLead": true,
  "outreachAngle": "Mid-sized boutique SEO agency with rising reviews — strong ICP fit",
  "nextAction": { "type": "send_outreach", "priority": "HIGH" },
  "nextSteps": [...]
}

Result:

A ranked list of agencies you should contact today, in priority order — with the reason, the angle, and the next steps pre-built for each lead.

What you input → What you get → Outcome

What you input

Agency type (e.g. "SEO agency", "marketing agency", "web design agency")
Location (city, country, region — or leave blank for global)
Sources (Google Maps, SuperbCompanies, TheManifest — pick any combination)
Optional ICP (ideal customer profile with service match, size, rating, review floor, required contact info)
Optional preset (easy_wins, high_intent, enterprise_targets, fresh_leads) — one click expands to a full filter + sort combination

What you get

Scored and ranked agency leads, sorted with the highest-quality lead at row 1
Buying signals and opportunity scores — "is this a good TARGET right now?"
ICP fit score (0–100) with plain-English reasons why the agency matched
Next action and next steps — structured, machine-actionable recommendations for each lead
Ready-to-use pipeline for CRM import, cold-email tools, or Zapier/Make/n8n chains

Outcome

A prioritised list of companies you should contact — with clear reasons and next actions — in 10–20 minutes. Turn raw directory data into a prioritised outbound pipeline without any manual qualification.

After you run this

You will have:

a list of companies to contact
ranked in priority order (by leadScore, opportunityScore, or authority — your choice)
with clear reasons (whyHighScore, buyingSignals) and next actions (nextAction, nextSteps[])
already filtered to what matches your ICP (when set)
already flagged if they're new since your last scheduled run

No additional analysis required. No "which do I contact first?" step. No spreadsheet sorting.

Typical workflow

Run the actor with your target market (agency type + location)
Filter to the top leads (WHERE isTopLead = true AND confidenceScore >= 70, or leadType = 'ideal_match')
Enrich missing contact data using the actor slugs in each record's nextSteps[] (website contact scraper, email pattern finder, waterfall contact enrichment)
Push into your CRM, cold-email tool, or Zapier/Make scenario — decision-ready output mode pre-formats every record with contactPriority and recommendedAction

Go from search → leads → outreach plan in one run. Identify companies most likely to convert, not just exist.
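The filter in step 2 can be sketched in a few lines of Python once the dataset is downloaded as JSON. This is a minimal sketch, assuming the two conditions in that step are alternatives combined with OR; `select_top_leads` is a hypothetical helper name, not part of the actor.

```python
def select_top_leads(records, min_confidence=70):
    """Keep records matching the step-2 filter, highest leadScore first."""
    def qualifies(record):
        strong = (record.get("isTopLead") is True
                  and record.get("confidenceScore", 0) >= min_confidence)
        # Assumption: ideal_match leads qualify even without the isTopLead flag.
        return strong or record.get("leadType") == "ideal_match"

    return sorted(
        (r for r in records if qualifies(r)),
        key=lambda r: r.get("leadScore", 0),
        reverse=True,
    )
```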

Designed for automation

Every record is structured for machines, not just humans. The output separates cleanly into three layers:

Deterministic fields — leadScore, opportunityScore, icpFitScore, outreachScore, confidenceScore (all 0–100, all sortable, all filterable with a single SQL-like WHERE clause)
Decision fields — contactPriority, leadType, tier, isTopLead (discrete enums AI agents and automation rules can switch() on)
Action fields — nextAction (structured { type, reason, priority, tool }), nextSteps[] (pre-formed Apify actor calls with ready-to-POST inputs), recommendedAction (plain-English sentence)

This makes the output directly usable by AI agents and automation tools without any parsing, transformation, or glue code. Build a lead-to-outreach pipeline by chaining the actors named in each record's nextSteps[] — the inputs are pre-built.

How this compares to other tools

Apollo / ZoomInfo → large generic contact databases. This actor → finds agency-specific leads and prioritises them by buying likelihood.
Clay → flexible enrichment workflows with a per-seat UI. This actor → produces a ready-to-use outbound sales pipeline in one run, with a single API call, at a flat $0.05 per unique agency.
Generic web scrapers → raw data dumps. This actor → outputs ranked, scored, ICP-matched, actionable leads with next-step recommendations.
Clutch / DesignRush scrapers on the Store → single-source lists. This actor → combines three sources, deduplicates by domain, scores on 5 dimensions, and tracks changes across runs.
Other agency scrapers → return rows. This actor → returns decisions.

Replaces multiple tools

Instead of running:

a scraper for raw agency data
a spreadsheet for filtering and ICP matching
a scoring system for prioritisation
a workflow tool for next-step routing
and an enrichment service for contacts

This actor does all of that in a single run. One API call, one dataset, one $0.05/record price.

If you're evaluating lead generation tools

Choose this actor if you want:

prioritised leads instead of raw lists
built-in ICP matching with plain-English fit reasons
buying signals and opportunity scoring layered on every record
immediate next steps for outreach (nextAction, nextSteps[]) — no glue code
change detection across scheduled runs to surface only what's new
agent-readable output (decision-ready mode) for Zapier / Make / n8n / AI agents

Choose a different tool if you only need raw contact data or generic B2B enrichment (Apollo, ZoomInfo, RocketReach are better fits for that) — this actor is built for the decision and prioritisation layer on top.

Works with AI agents and automation tools

This actor is built to plug into:

Zapier, Make, and n8n workflows — every output record has structured contactPriority, nextAction.type, and pre-built nextSteps[] you can POST directly
AI agents (LangChain, LlamaIndex, custom GPTs, Claude/ChatGPT tool use) — outputMode: decision-ready produces a slim, agent-readable shape with { contactPriority, leadType, recommendedAction, nextAction, nextSteps }
Automated outbound systems — WHERE isTopLead = true AND contactPriority = "HIGH" is a one-line filter for downstream Slack/email/CRM triggers

Set outputMode: "decision-ready" to generate:

contactPriority — HIGH / MEDIUM / LOW
recommendedAction — plain-English next step
nextAction — structured { type, reason, priority, tool } object
nextSteps[] — ready-to-POST inputs for downstream Apify actors

No parsing or transformation required.
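To show what "no parsing" looks like downstream, here is a small routing sketch over decision-ready records. The queue names (`outreach_now`, `review_queue`, `backlog`) and the `route` function are hypothetical; only `contactPriority`, `nextAction.type`, and the `send_outreach` value come from this README.

```python
def route(record):
    """Map a decision-ready record to a (hypothetical) downstream queue name."""
    priority = record.get("contactPriority")
    # nextAction may be absent or null in a partial record, so guard the lookup.
    action_type = (record.get("nextAction") or {}).get("type")

    if priority == "HIGH" and action_type == "send_outreach":
        return "outreach_now"
    if priority in ("HIGH", "MEDIUM"):
        return "review_queue"
    return "backlog"
```

A Zapier, Make, or n8n step would do the same thing with a filter or switch node on the two fields.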

Three filter modes (set with mode)

List Builder (default) — every agency found, sorted by leadScore
Outreach Ready — drops records with no email AND no phone, so every row is contactable
Pipeline Builder — drops records with ICP fit below 40, so every row is a qualified match

Layered on top of any mode: four one-click presets.

high_intent — contactable agencies with strong momentum signals
easy_wins — high ICP fit + underexposed (low review count) — the agencies your competitors miss
enterprise_targets — 50+ employees or 100+ reviews, sorted by authority
fresh_leads — only agencies added since your last run
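The three mode filters described above are simple enough to reproduce on an exported dataset. The sketch below assumes the lowercase `mode` values `list-builder`, `outreach-ready`, and `pipeline-builder`, and treats a missing `icpFitScore` as 0; `apply_mode` is an illustrative helper, not the actor's internal code.

```python
ICP_FIT_FLOOR = 40  # pipeline-builder drops records with ICP fit below this


def apply_mode(records, mode="list-builder"):
    """Filter records the way each mode is described, sorted by leadScore."""
    if mode == "list-builder":
        kept = list(records)
    elif mode == "outreach-ready":
        # Dropped only when BOTH email and phone are missing.
        kept = [r for r in records if r.get("email") or r.get("phone")]
    elif mode == "pipeline-builder":
        kept = [r for r in records if r.get("icpFitScore", 0) >= ICP_FIT_FLOOR]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(kept, key=lambda r: r.get("leadScore", 0), reverse=True)
```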

Before / after

Without this actor	With this actor
200 random agencies to manually qualify	50 high-intent, ICP-matched leads ranked by score
"Which of these should I email?"	`WHERE isTopLead = true AND contactPriority = "HIGH"`
No idea what's new this week	`isNewSinceLastRun = true`, `changes.newReviews > 0`
Guess why a lead looked good	`whyHighScore: ["High rating (4.8)", "Growing reviews (+12 since last run)"]`
Separate scrape → enrich → score → route steps	One run produces a prioritised outbound pipeline

Close one $2,500 retainer → pays for 50,000 leads.

Also known as

This actor solves the job of:

agency lead scraper
B2B lead generation tool for agencies
outbound prospecting tool
sales prospecting tool for agency markets
agency lead list builder
company data scraper for marketing / design / SEO agencies
sales lead finder
agency contact extractor
agency directory scraper
cold email lead generator for agencies
lead intelligence engine
ideal customer profile (ICP) matcher for agencies
cold-email pipeline builder

Every record carries agencyName, domain, website, optional email and phone, services, location, employeeCount, minProjectSize, rating, reviewCount, plus computed leadScore, opportunityScore, icpFitScore, outreachScore, confidenceScore, tier, leadType, isTopLead, buyingSignals[], tldr, outreachAngle, nextAction, nextSteps[], and changes — so whichever of those jobs you're doing, the data is already shaped for it.

What data can you extract from agency directories?

Data Point	Source	Example
📛 Agency name	All three sources	Apex Digital Strategies
🌐 Website URL	All three sources	https://apexdigitalstrategies.com
🔗 Domain	Extracted from website	apexdigitalstrategies.com
📞 Phone number	Google Maps	+1 (212) 555-0142
📧 Email (opt-in)	Google Maps with `includeEmails: true`	hello@apexdigitalstrategies.com
📍 Address	Google Maps	350 5th Ave, New York, NY 10118
🏷️ Services	All three sources	["SEO", "PPC", "Content Marketing"]
🗺️ Location	All three sources	New York, NY
👥 Employee count	SuperbCompanies, TheManifest	10–49
💰 Min project size	SuperbCompanies, TheManifest	$5,000+
⭐ Star rating	Google Maps, SuperbCompanies, TheManifest	4.8
💬 Review count	Google Maps, SuperbCompanies, TheManifest	94
🏆 Lead score (computed)	All records	82
🥇 Rank (computed)	All records	1
🟢 isActive (computed)	All records	true
🆕 isNewSinceLastRun (computed)	All records	true
📂 Source	All records	google-maps
🔎 Source profile URL	All records	https://superbcompanies.com/organizations/apex-digital
🕐 Scraped timestamp	All records	2026-03-22T10:14:33.000Z

Why use Agency Lead Finder?

Manually browsing Google Maps, SuperbCompanies, and TheManifest for agency leads is a multi-hour slog. There is no export button, no bulk download, and no shared API across these sources. Copy-pasting agency profiles one by one is error-prone and eats the better part of a working day to collect 200 records — time you could spend in actual conversations with prospects.

This actor automates the entire agency lead finding process — querying all three sources simultaneously and merging results into one clean, deduplicated list. A run pulling 50 agencies from each source completes in under 20 minutes for under $8.
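The cost figures follow directly from the $0.05 per unique agency price. The sketch below works in whole cents to avoid floating-point drift. It gives an upper bound only: deduplication can reduce the number of billed agencies, and platform compute and optional email enrichment are not included. The function names are illustrative.

```python
PRICE_CENTS_PER_AGENCY = 5  # $0.05 per unique agency, as stated in this README


def max_run_cost(num_sources, max_agencies_per_source):
    """Upper bound in dollars if every source fills its cap with unique agencies."""
    return num_sources * max_agencies_per_source * PRICE_CENTS_PER_AGENCY / 100


def agencies_for_budget(budget_dollars):
    """How many unique agencies a budget covers at the per-agency price."""
    return int(budget_dollars * 100) // PRICE_CENTS_PER_AGENCY


print(max_run_cost(3, 50))        # 7.5
print(agencies_for_budget(2500))  # 50000
```

Three sources at 50 agencies each therefore cap out at $7.50, which is where the "under $8" figure comes from.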

Scheduling — run daily, weekly, or monthly to keep your agency database current as new firms register
API access — trigger runs from Python, JavaScript, or any HTTP client to integrate with your CRM pipeline
Proxy rotation — Apify proxy support for SuperbCompanies and TheManifest crawling at scale
Monitoring — get Slack or email alerts when runs fail or produce fewer results than expected
Integrations — connect to Zapier, Make, Google Sheets, HubSpot, or webhooks to push results directly into your workflow
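For API access from Python without extra dependencies, a run can be started through the Apify REST API. The sketch below only builds the requests; it assumes the asynchronous "run actor" endpoint (`/v2/acts/{actorId}/runs`) and the dataset items endpoint, and uses the tilde form of the actor slug. Because runs take several minutes, it starts the run rather than waiting on a synchronous call. Helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "ryanclinton~agency-directory-scraper"  # tilde form of the actor slug


def build_start_request(token, actor_input):
    """Build (but do not send) the POST request that starts an actor run."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{ACTOR_ID}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def dataset_items_url(dataset_id):
    """URL for downloading a finished run's results as JSON."""
    return f"{API_BASE}/datasets/{dataset_id}/items?format=json"


def start_run(token, actor_input):
    """Send the request; the response's data.defaultDatasetId locates the results."""
    with urllib.request.urlopen(build_start_request(token, actor_input)) as resp:
        return json.load(resp)["data"]
```

Call `start_run(...)` with your own API token, poll or use a webhook until the run finishes, then fetch `dataset_items_url(run["defaultDatasetId"])`.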

Features

Three-source coverage — Google Maps (via sub-actor), SuperbCompanies.com, and TheManifest.com in one run, with independent per-source caps up to 500 agencies each
Lead scoring and ranking: every record carries a computed leadScore (0–100) weighted across rating, review count, has-website, has-phone, has-email, and has-location. Results are sorted by score before export, so the first row of your dataset is the most valuable lead. The rank integer on every record lets you filter to "top 10" with one expression in Sheets or SQL.
Cross-run delta tracking — scheduled runs know what's new. Each run saves the domain set to the actor's key-value store (PREVIOUS_DOMAINS key). On the next run, every record gets an isNewSinceLastRun boolean so you can pull only newly-added agencies with one filter. Optional previousDatasetId input lets you compare against any past run instead of the auto-tracked snapshot.
Optional email enrichment: set includeEmails: true to have the Google Maps sub-actor visit each business website and extract emails, phones, and socials. Off by default so a "give me a directory list" run doesn't pay for enrichment you didn't ask for. Adds ~3–5 minutes of runtime and extra sub-actor charges (~$0.10/agency) when enabled.
Run summary in key-value store — a SUMMARY record with source breakdown, avg leadScore, avg rating, median review count, top services, top-5 agencies, active count, emails collected, and total PPE charges is written to your run's KV store after every run. Kept out of the dataset so CSV exports stay clean and uniform.
Failure classification — catastrophic errors produce a single recordType: "error" record with a failureType (timeout, blocked, invalid-input, parse-error) and an actionable recommendation so you can tell "no data exists" from "something broke."
Domain-based deduplication across all sources — a shared seenDomains Set is initialised with Google Maps results before the CheerioCrawler starts, so no agency domain is output twice regardless of which source found it first
Google Maps sub-actor integration — calls ryanclinton/google-maps-email-extractor with a constructed query (e.g. "marketing agency New York") and maps phone, address, rating, review count, and Google Maps URL to the unified record schema
CheerioCrawler for directory crawling — no Playwright browser required; SuperbCompanies and TheManifest are crawled with lightweight HTTP + Cheerio parsing at up to 5 concurrent requests with session pooling, cookie persistence, and 3 retries per request
Sitemap-driven discovery — both SuperbCompanies and TheManifest are seeded from their XML sitemaps (/sitemap.xml), extracting all /organizations/ and /companies/ URLs without needing to paginate listing pages
schema.org structured data extraction — SuperbCompanies profiles are parsed for itemprop="address", itemprop="addressLocality", itemprop="addressCountry", and itemprop="ratingValue" before falling back to CSS class selectors
Service tag extraction — collects up to 10 deduplicated service and specialty tags per profile, filtered to strings between 2–60 characters
Junk link filtering for website detection — skips linkedin, facebook, twitter, instagram, clutch, google, yelp, sortlist, and superbcompanies when looking for an agency's own website in profile HTML
Normalised website URLs — raw href values are cleaned into canonical absolute URLs using the WHATWG URL API; trailing slashes stripped; relative and fragment-only values discarded
Structured numeric parsing — review counts like "1,234 reviews" and ratings like "4.8/5 stars" are parsed with dedicated parseReviewCount and parseRating functions that handle comma formatting and various suffix patterns
Per-source result cap — maxAgenciesPerSource (default 50, max 500) is enforced independently per source; the crawler checks both the shared seenDomains Set and the per-source counter before registering each record
PPE cost transparency: the per-event price is logged at startup; large batches (≥200 per source) trigger a cost warning; progress status messages include the running PPE total; the final status shows total charges with an "excludes platform compute" note
Spending limit enforcement — PPE charges halt when your configured budget ceiling is reached; the completion status message shows how far the run got and the actual charges incurred
Graceful partial results — crawl errors do not discard collected records; all agencies gathered before the error are pushed to the dataset
Two dataset views — an Overview view that leads with rank and leadScore for human review, and an Outreach-ready view pared down for one-click CSV import into Instantly, Smartlead, Lemlist, or any cold-email tool
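The deduplication, URL normalisation, junk-link filtering, and per-source cap described in the list above can be approximated as follows. This is a sketch under stated assumptions, not the actor's source: junk hosts are matched by substring, a leading `www.` is stripped when deriving the domain, query strings are dropped, and records without a usable website are skipped for simplicity.

```python
from urllib.parse import urlsplit

JUNK_HOSTS = ("linkedin", "facebook", "twitter", "instagram", "clutch",
              "google", "yelp", "sortlist", "superbcompanies")


def normalise_website(href):
    """Return a canonical absolute URL without a trailing slash, or None."""
    parts = urlsplit((href or "").strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None  # relative or fragment-only value
    host = parts.hostname.lower()
    if any(junk in host for junk in JUNK_HOSTS):
        return None  # a social or directory link, not the agency's own site
    return f"{parts.scheme}://{host}{parts.path.rstrip('/')}"


def domain_of(url):
    """Derive the dedup key; stripping 'www.' is an assumption of this sketch."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def merge_sources(batches, max_per_source=50):
    """Merge (source, records) batches, keeping the first record per domain."""
    seen_domains = set()  # shared across all sources
    merged = []
    for source, records in batches:
        kept = 0  # per-source counter for the cap
        for record in records:
            if kept >= max_per_source:
                break
            website = normalise_website(record.get("website"))
            if website is None:
                continue
            domain = domain_of(website)
            if domain in seen_domains:
                continue
            seen_domains.add(domain)
            merged.append({**record, "website": website,
                           "domain": domain, "source": source})
            kept += 1
    return merged
```

Passing the Google Maps batch first reproduces the described behaviour where directory results never duplicate a domain already found on Maps.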

Use cases for agency lead generation

Sales prospecting for SaaS and technology vendors

Technology vendors targeting digital marketing agencies, from SEO software to white-label ad platforms, need current, segmented agency lists to fuel outbound. Manually building a list of 200 web design agencies in the United States could take two days of browsing across multiple directories. With this actor, a sales team pulls that list in minutes, then feeds the domain field into Website Contact Scraper to add direct email addresses before importing into their CRM.

Marketing agency market mapping and competitive research

Strategy consultants and M&A researchers use agency directories to map the competitive landscape: who operates in a given city, what services they offer, how large they are, and how they are reviewed. Running this actor for "SEO agency" in "London" and then "web design agency" in "London" produces a structured market map with ratings and team sizes that would take weeks to assemble manually.

Recruiting and talent sourcing

Recruiters placing senior marketing hires often want to identify mid-size agencies (10–49 employees) in specific locations as target employers. The employeeCount and location fields make it straightforward to filter the output to exactly that segment, then enrich with decision-maker contacts using Waterfall Contact Enrichment.
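Filtering to that mid-size segment might look like the sketch below. It assumes `employeeCount` arrives as a range string such as "10–49" (as in the example data) and that open-ended values such as "1,000+" may also appear; the exact formats emitted by each source are not documented here, so treat the parser as a starting point.

```python
import re


def parse_employee_range(text):
    """Return (low, high) from a range string; high is None when open-ended."""
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", text or "")]
    if not numbers:
        return None
    low = numbers[0]
    high = numbers[1] if len(numbers) > 1 else None
    return low, high


def mid_size_in(records, location, low=10, high=49):
    """Keep agencies whose whole employee range sits inside [low, high]."""
    matches = []
    for record in records:
        parsed = parse_employee_range(record.get("employeeCount"))
        if parsed is None or parsed[1] is None:
            continue  # unknown or open-ended size: cannot confirm the segment
        in_range = parsed[0] >= low and parsed[1] <= high
        in_place = location.lower() in (record.get("location") or "").lower()
        if in_range and in_place:
            matches.append(record)
    return matches
```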

Vendor evaluation and agency procurement

Procurement teams comparing agencies before a pitch process use directory listings to generate a long-list quickly. The rating, reviewCount, and minProjectSize fields provide first-pass scoring criteria without requiring individual website visits. Export to Google Sheets and share with stakeholders for collaborative shortlisting.

White-label agency partnership development

Larger agencies looking for white-label partners in specialist disciplines — video production, accessibility auditing, PR, translation — can filter results by service category and location to identify candidates, then visit sourceUrl profile links to assess social proof before outreach.

Data enrichment for existing CRM records

If your CRM already has agency company names but is missing website, location, or service data, the scraped dataset serves as a reference lookup to fill gaps. Combined with Website Contact Scraper, the pipeline adds email addresses from each agency's own website on top of the directory data.

How to find agency leads

Enter your agency type and location — type a keyword like "marketing agency", "SEO agency", or "web design agency" in the Agency type field, and a city or country like "New York" or "United Kingdom" in the Location field. This drives the Google Maps search query.
Choose your sources: the default is Google Maps and SuperbCompanies. Add TheManifest for broader coverage. Each source is capped independently, so two sources at 50 each give up to 100 unique agencies.
Run the actor — click Start and wait. A run pulling 50 agencies from each of two sources typically completes in 10–15 minutes.
Download results — open the Dataset tab and export as JSON, CSV, or Excel. Filter by source, location, or rating in the dataset UI before exporting.

Input parameters

Parameter	Type	Required	Default	Description
`sources`	array	Yes	`["google-maps", "superbcompanies"]`	Which sources to use. Valid values: `google-maps`, `superbcompanies`, `themanifest`.
`services`	string	No	`"marketing agency"`	Agency type keyword. Used as part of the Google Maps search query, e.g. `"SEO agency"`, `"web design agency"`.
`location`	string	No	`"New York"`	City, country, or region for Google Maps searches. Leave blank for global directory results from SuperbCompanies and TheManifest.
`maxAgenciesPerSource`	integer	No	`50`	Maximum agencies to collect per enabled source. Range: 1–500. With two sources enabled, output can reach up to 2× this value.
`preset`	string	No	`"none"`	One-click workflow preset. One of `none` (use fields below), `high_intent`, `easy_wins`, `enterprise_targets`, `fresh_leads`. Preset fills `mode` + `strategy` + adds a post-filter; user-set fields always win.
`mode`	string	No	`"list-builder"`	Filter mode. One of `list-builder` (every agency, default), `outreach-ready` (drops records with no email AND no phone), `pipeline-builder` (drops records with icpFitScore < 40 — requires `targetProfile`).
`strategy`	string	No	`"balanced"`	Sort strategy. One of `balanced` (by leadScore, default), `high-opportunity` (by opportunityScore — momentum + gap signals), `high-authority` (by rating × review strength).
`outputMode`	string	No	`"raw"`	Output shape. `raw` = full record. `decision-ready` = slim `{ contactPriority, leadType, outreachAngle, recommendedAction, nextAction, nextSteps, ... }` view for Zapier/Slack/webhooks/AI agents.
`targetProfile`	object	No	`{}`	Optional Ideal Customer Profile. Drives `icpFitScore` on every record and the `pipeline-builder` filter. Fields: `services[]`, `minEmployees`, `maxEmployees`, `minRating`, `minReviewCount`, `requireEmail`, `requirePhone`, `requireLocation`.
`includeEmails`	boolean	No	`false`	When true, the Google Maps sub-actor visits each business website to extract emails, phones, and socials. Adds ~3–5 minutes of runtime and ~$0.10/agency in sub-actor charges. Leave off for a fast directory dump.
`previousDatasetId`	string	No	`""`	Optional dataset ID from a past run. When provided, domains in that dataset are skipped and `isNewSinceLastRun` is flagged only for new ones. Leave blank to use the built-in auto-tracking (actor's own key-value store).
`proxyConfiguration`	object	No	`{"useApifyProxy": true}`	Proxy settings for SuperbCompanies and TheManifest crawling. Standard Apify proxy is sufficient — these sites do not use Cloudflare.
`outputProfile`	string	No	`"full"`	Per-record field filter for AgencyRecord (DecisionRecord passes through unchanged): `minimal` (decision-only), `standard` (+ opportunity/ICP/buyingSignals/agentContract), `llm` (LLM-friendly subset), `full` (every field). Combine with `outputMode: "decision-ready"` for the slim DecisionRecord shape.
`watchlistName`	string	No	—	Name this run as a separate watchlist. Cross-run state (PREVIOUS_DOMAINS + per-domain snapshots) is namespaced per-watchlist by appending `_<watchlistName>` to the KV key. Run the same actor as N independent monitors.
`webhookUrl`	string	No	—	Slack or Discord incoming webhook. Posts a rich embed on completion with top leads (rank/agency/score/nextAction/priority) + counts (total/topLeads/newSinceLastRun/withEmail/withPhone) + a link to the run.
`circuitBreakerThreshold`	integer	No	`0`	Reserved for future per-source consecutive-failure abort. Currently surfaced in SUMMARY for downstream auditing.
`includeAgentContract`	boolean	No	`true`	Add a top-level `agentContract` `{ decision, confidence, nextAction, costToAct }` to every AgencyRecord (and run-level on the SUMMARY). Maps the existing `nextAction.type` + `leadScore` + `confidenceScore` to a stable enum.
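Before triggering a run from code, it can help to check an input object against the constraints in the table above. The following pre-flight check is a sketch built only from the documented values and ranges; the actor performs its own validation, and `validate_input` is a hypothetical helper.

```python
VALID_SOURCES = {"google-maps", "superbcompanies", "themanifest"}
ENUM_FIELDS = {
    # field: (allowed values, documented default)
    "preset": ({"none", "high_intent", "easy_wins",
                "enterprise_targets", "fresh_leads"}, "none"),
    "mode": ({"list-builder", "outreach-ready", "pipeline-builder"}, "list-builder"),
    "strategy": ({"balanced", "high-opportunity", "high-authority"}, "balanced"),
    "outputMode": ({"raw", "decision-ready"}, "raw"),
    "outputProfile": ({"minimal", "standard", "llm", "full"}, "full"),
}


def validate_input(actor_input):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    sources = actor_input.get("sources")
    if not sources:
        problems.append("sources is required and must not be empty")
    else:
        unknown = sorted(set(sources) - VALID_SOURCES)
        if unknown:
            problems.append(f"unknown sources: {unknown}")

    cap = actor_input.get("maxAgenciesPerSource", 50)
    if isinstance(cap, bool) or not isinstance(cap, int) or not 1 <= cap <= 500:
        problems.append("maxAgenciesPerSource must be an integer from 1 to 500")

    for field, (allowed, default) in ENUM_FIELDS.items():
        if actor_input.get(field, default) not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")

    if actor_input.get("mode") == "pipeline-builder" and not actor_input.get("targetProfile"):
        problems.append("pipeline-builder mode requires targetProfile")
    return problems
```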

Input examples

Most common: Google Maps + SuperbCompanies, marketing agencies in New York:

{
  "sources": ["google-maps", "superbcompanies"],
  "services": "marketing agency",
  "location": "New York",
  "maxAgenciesPerSource": 50,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

All three sources, SEO agencies in London, larger batch:

{
  "sources": ["google-maps", "superbcompanies", "themanifest"],
  "services": "SEO agency",
  "location": "London",
  "maxAgenciesPerSource": 100,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Quick test: a single directory source, small cap:

{
  "sources": ["superbcompanies"],
  "services": "web design agency",
  "location": "",
  "maxAgenciesPerSource": 10,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Outreach-ready: Google Maps with email enrichment:

{
  "sources": ["google-maps"],
  "services": "marketing agency",
  "location": "Austin",
  "maxAgenciesPerSource": 50,
  "includeEmails": true,
  "proxyConfiguration": { "useApifyProxy": true }
}

Scheduled weekly run: only surface new agencies since last run:

{
  "sources": ["google-maps", "superbcompanies"],
  "services": "marketing agency",
  "location": "New York",
  "maxAgenciesPerSource": 100,
  "proxyConfiguration": { "useApifyProxy": true }
}

(The actor auto-loads the previous run's domain set from its KV store — no input wiring required. Filter the output by isNewSinceLastRun = true to get just this week's new listings. Or set previousDatasetId to a specific past run if you want explicit control.)

Pipeline Builder mode with an ICP (highest-converting setup):

{
  "sources": ["google-maps", "superbcompanies"],
  "services": "SEO agency",
  "location": "United States",
  "maxAgenciesPerSource": 200,
  "mode": "pipeline-builder",
  "targetProfile": {
    "services": ["SEO", "PPC"],
    "minEmployees": 10,
    "maxEmployees": 100,
    "minRating": 4.0,
    "minReviewCount": 10,
    "requireEmail": false,
    "requirePhone": true
  },
  "includeEmails": true,
  "proxyConfiguration": { "useApifyProxy": true }
}

Pipeline Builder mode filters out every agency with an icpFitScore below 40. Output is sorted by leadScore descending, so row 1 is your best lead. With includeEmails: true, as in this example, you get an outreach-ready pipeline in a single run.

One-click preset: Easy Wins (strong ICP fit + underexposed):

{
  "sources": ["google-maps", "superbcompanies"],
  "services": "SEO agency",
  "location": "United States",
  "maxAgenciesPerSource": 200,
  "preset": "easy_wins",
  "targetProfile": { "services": ["SEO"], "minRating": 4.0 },
  "proxyConfiguration": { "useApifyProxy": true }
}

The easy_wins preset sets mode=pipeline-builder, strategy=high-opportunity, and filters to icpFitScore ≥ 50 AND reviewCount ≤ 30 — agencies that match your ICP but your competitors are missing. Other presets: high_intent (contactable + momentum), enterprise_targets (50+ employees, authority-sorted), fresh_leads (only new since last run).
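The easy_wins thresholds quoted above can also be applied client-side to any raw dataset export. The following is a minimal sketch rather than the actor's own code: `is_easy_win` is a hypothetical helper, and excluding records that lack an icpFitScore or reviewCount is an assumption.

```python
from typing import Any

ICP_FIT_MIN = 50       # icpFitScore threshold quoted for the easy_wins preset
REVIEW_COUNT_MAX = 30  # reviewCount ceiling quoted for the easy_wins preset


def is_easy_win(record: dict[str, Any]) -> bool:
    """Return True when a record passes the documented easy_wins thresholds."""
    icp = record.get("icpFitScore")
    reviews = record.get("reviewCount")
    # Assumption: records missing either value are excluded.
    if icp is None or reviews is None:
        return False
    return icp >= ICP_FIT_MIN and reviews <= REVIEW_COUNT_MAX


records = [
    {"agencyName": "A", "icpFitScore": 88, "reviewCount": 94},   # too many reviews
    {"agencyName": "B", "icpFitScore": 72, "reviewCount": 12},   # passes both checks
    {"agencyName": "C", "icpFitScore": None, "reviewCount": 5},  # no ICP score
]
easy_wins = [r["agencyName"] for r in records if is_easy_win(r)]
print(easy_wins)
```

This is mainly useful when a run was made without the preset and you want the same cut afterwards without paying for a second run.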

Zapier / Slack / AI agent input: decision-ready mode:

{
  "sources": ["google-maps"],
  "services": "marketing agency",
  "location": "New York",
  "maxAgenciesPerSource": 50,
  "mode": "pipeline-builder",
  "strategy": "high-opportunity",
  "outputMode": "decision-ready",
  "targetProfile": { "services": ["SEO"], "minRating": 4.0 },
  "proxyConfiguration": { "useApifyProxy": true }
}

Returns slim records with contactPriority: HIGH|MEDIUM|LOW, leadType, outreachAngle, and recommendedAction. Drop-in for no-code tools — no JSON parsing needed on the consumer side.

Input tips

Start with the defaults — Google Maps + SuperbCompanies with 50 agencies each covers the most common use case and gives fast feedback before scaling up.
Location drives Google Maps quality — the location field is concatenated with services to form the Google Maps query (e.g. "SEO agency London"). A precise city name produces the most relevant local results. Leave it blank if you want global directory results from SuperbCompanies or TheManifest.
Use TheManifest cautiously — TheManifest may be Cloudflare-protected at times. If it returns zero results, the run continues cleanly with the other sources and a warning is logged. Google Maps and SuperbCompanies results are never affected.
Set a spending limit for large batches: 3 sources × 500 agencies gives up to 1,500 records, so the maximum actor charge is $75 (1,500 × $0.05), before platform compute and any includeEmails sub-actor charges. Set a spending limit in the run settings to cap spend automatically.
Run separate inputs for different service types — if you need both SEO agencies and web design agencies, run them as two separate inputs. Each run maintains its own deduplication state.

Output example

{
  "recordType": "agency",
  "rank": 1,
  "leadScore": 82,
  "scoreBreakdown": {
    "authority": 22,
    "growth": 12,
    "completeness": 4,
    "contactability": 27,
    "trust": 17
  },
  "whyHighScore": [
    "High rating (4.8)",
    "Strong review volume (94)",
    "Growing reviews (+8 since last run)",
    "Email available",
    "Phone available",
    "Trusted (4★+ with 10+ reviews)"
  ],
  "icpFitScore": 88,
  "icpFitReasons": [
    "Service match: seo, ppc",
    "Size matches (10–49)",
    "Rating ≥ 4",
    "Email present"
  ],
  "tier": "boutique",
  "outreachScore": 100,
  "opportunityScore": 78,
  "buyingSignals": [
    "momentum_growth",
    "rising_rating",
    "high_authority",
    "strong_icp_match",
    "boutique_scale"
  ],
  "outreachAngle": "Mid-sized boutique SEO agency with rising reviews — strong ICP fit",
  "leadType": "ideal_match",
  "isTopLead": true,
  "confidenceScore": 93,
  "tldr": "Boutique SEO in New York, NY (4.8★, 94 reviews)",
  "nextAction": {
    "type": "send_outreach",
    "reason": "Ideal ICP match with email available. Mid-sized boutique SEO agency with rising reviews — strong ICP fit",
    "priority": "HIGH",
    "tool": null
  },
  "nextSteps": [
    {
      "actor": "ryanclinton/waterfall-contact-enrichment",
      "input": { "companies": [{ "name": "Apex Digital Strategies", "domain": "apexdigitalstrategies.com" }] },
      "reason": "Find decision-maker names, titles, and emails through a 10-step enrichment cascade."
    },
    {
      "actor": "ryanclinton/bulk-email-verifier",
      "input": { "emails": ["hello@apexdigitalstrategies.com"] },
      "reason": "Verify the email is deliverable before adding to an outreach sequence (protects sender reputation)."
    }
  ],
  "isActive": true,
  "agencyName": "Apex Digital Strategies",
  "website": "https://apexdigitalstrategies.com",
  "domain": "apexdigitalstrategies.com",
  "phone": "+1 (212) 555-0142",
  "email": "hello@apexdigitalstrategies.com",
  "address": "350 5th Ave, New York, NY 10118",
  "services": ["SEO", "PPC", "Content Marketing", "Email Marketing"],
  "location": "350 5th Ave, New York, NY 10118",
  "employeeCount": null,
  "minProjectSize": null,
  "reviewCount": 94,
  "rating": 4.8,
  "source": "google-maps",
  "sourceUrl": "https://www.google.com/maps/place/Apex+Digital+Strategies",
  "scrapedAt": "2026-03-22T10:14:33.121Z",
  "isNewSinceLastRun": false,
  "changes": {
    "newReviews": 8,
    "ratingChange": 0.1,
    "scoreChange": 4,
    "newServices": ["Email Marketing"],
    "previousLeadScore": 78
  }
}

(The email field is populated only when you set includeEmails: true — otherwise it's null. The changes object is populated only when this domain was present in the previous run.)

SuperbCompanies records include team size and minimum project size where available:

{
  "recordType": "agency",
  "rank": 2,
  "leadScore": 76,
  "isActive": true,
  "agencyName": "Meridian Growth Partners",
  "website": "https://meridiangrowthpartners.com",
  "domain": "meridiangrowthpartners.com",
  "phone": null,
  "email": null,
  "address": null,
  "services": ["SEO", "PPC", "Social Media", "Branding", "Web Design"],
  "location": "Austin, TX",
  "employeeCount": "10–49",
  "minProjectSize": "$5,000+",
  "reviewCount": 31,
  "rating": 4.9,
  "source": "superbcompanies",
  "sourceUrl": "https://superbcompanies.com/organizations/meridian-growth-partners",
  "scrapedAt": "2026-03-22T10:19:07.883Z",
  "isNewSinceLastRun": false
}

Run summary (key-value store, not dataset)

The summary is written to the run's key-value store under the key SUMMARY so your dataset stays uniform and CSV-ready. Read it via the Apify API, the Console's Storage tab, or the apify-client library:

{
  "recordType": "summary",
  "mode": "list-builder",
  "strategy": "balanced",
  "outputMode": "raw",
  "totalAgencies": 143,
  "droppedByMode": 0,
  "bySource": { "google-maps": 50, "superbcompanies": 50, "themanifest": 43 },
  "byTier": { "enterprise": 12, "boutique": 87, "freelance": 18, "unknown": 26 },
  "byLeadType": { "ideal_match": 24, "growth_target": 41, "nurture": 62, "low_priority": 16 },
  "avgRating": 4.62,
  "medianReviewCount": 38,
  "avgLeadScore": 71.4,
  "avgOpportunityScore": 48.7,
  "avgOutreachScore": 62.3,
  "topServices": ["seo", "ppc", "web design", "branding", "content marketing"],
  "topAgencies": [
    { "rank": 1, "agencyName": "Apex Digital Strategies", "domain": "apexdigitalstrategies.com", "leadScore": 82, "opportunityScore": 78, "leadType": "ideal_match" }
  ],
  "topOpportunities": [
    { "rank": 14, "agencyName": "Undercurrent Studios", "domain": "undercurrent.co", "opportunityScore": 92, "leadScore": 58, "leadType": "growth_target" }
  ],
  "fastestGrowing": [
    { "rank": 7, "agencyName": "North Peak SEO", "domain": "northpeakseo.com", "opportunityScore": 75, "leadType": "growth_target" }
  ],
  "highICPFitLowCompetition": [
    { "rank": 22, "agencyName": "Rising Tide PPC", "domain": "risingtideppc.com", "icpFitScore": 88, "leadType": "growth_target" }
  ],
  "newSinceLastRun": 12,
  "previousRunAt": "2026-03-15T10:31:07.448Z",
  "emailsCollected": 0,
  "activeCount": 138,
  "ppeChargesUsd": 7.15,
  "generatedAt": "2026-03-22T10:31:07.448Z"
}

The four "top" arrays answer different questions:

topAgencies — "who are the best companies?" (by leadScore)
topOpportunities — "who are the best TARGETS right now?" (by opportunityScore — momentum + gaps to fill)
fastestGrowing — "who's gaining review velocity?" (by newReviews delta vs last run)
highICPFitLowCompetition — "who matches my ICP but is underexposed?" (icpFitScore ≥ 60 AND ≤ 30 reviews)
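To make the difference between these rankings concrete, here is a small sketch that re-derives three of them from per-record fields. The helper names and the limit of 5 are illustrative assumptions; the actor's actual tie-breaking and limits are not documented here.

```python
from typing import Any

Record = dict[str, Any]


def top_by(records: list[Record], key: str, limit: int = 5) -> list[Record]:
    """Return the `limit` records with the highest value for `key`."""
    scored = [r for r in records if r.get(key) is not None]
    return sorted(scored, key=lambda r: r[key], reverse=True)[:limit]


def fastest_growing(records: list[Record], limit: int = 5) -> list[Record]:
    """Rank by reviews gained since the previous run (changes.newReviews)."""
    def gained(r: Record) -> int:
        return (r.get("changes") or {}).get("newReviews") or 0

    grown = [r for r in records if gained(r) > 0]
    return sorted(grown, key=gained, reverse=True)[:limit]


records = [
    {"agencyName": "Apex", "leadScore": 82, "opportunityScore": 78, "changes": {"newReviews": 8}},
    {"agencyName": "Undercurrent", "leadScore": 58, "opportunityScore": 92, "changes": None},
    {"agencyName": "North Peak", "leadScore": 64, "opportunityScore": 75, "changes": {"newReviews": 15}},
]

by_lead = [r["agencyName"] for r in top_by(records, "leadScore")]
by_opportunity = [r["agencyName"] for r in top_by(records, "opportunityScore")]
by_growth = [r["agencyName"] for r in fastest_growing(records)]
print(by_lead, by_opportunity, by_growth)
```

The same three agencies come out in a different order under each key, which is why the summary keeps the lists separate.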

Output fields

Field	Type	Description
`recordType`	string	Discriminator: `agency` for result rows, `decision` for slim rows in `decision-ready` output mode, `error` if the run hit a fatal failure. Filter `recordType = 'agency'` to drop error rows in one step.
`rank`	number | null	Position in this run's results when sorted by `leadScore` (1 = highest). Null on single-result runs.
`leadScore`	number	Lead quality 0–100. Sum of `scoreBreakdown`. Weighted: authority 25 + growth 15 + completeness 10 + contactability 30 + trust 20.
`scoreBreakdown`	object	Per-dimension score: `{ authority, growth, completeness, contactability, trust }`. Sums to `leadScore`.
`whyHighScore`	string[]	Plain-English reasons for the score, one per positive signal. Usable directly in emails, reports, dashboards.
`icpFitScore`	number | null	0–100 match score against the `targetProfile` input. Null when no target profile is supplied.
`icpFitReasons`	string[]	Plain-English list of which ICP criteria this agency met. Empty when no criteria matched.
`tier`	string	Segmentation: `enterprise` (50+ employees or 100+ reviews), `boutique` (10–49 employees or 10+ reviews), `freelance` (<10 employees), `unknown`.
`outreachScore`	number	0–100 contactability. Weighted: email 40 + phone 30 + website 20 + location 10.
`opportunityScore`	number	0–100 "is this a good TARGET right now?" score. High rating + low reviews (underexposed) + growth momentum + contactability gaps + new-listing + ICP bonus.
`buyingSignals`	string[]	Machine-readable sales-language tags: `momentum_growth`, `surging_reviews`, `rising_rating`, `high_authority`, `high_rating_low_reviews`, `new_listing`, `missing_email`, `missing_phone`, `strong_icp_match`, `moderate_icp_match`, `enterprise_scale`, `boutique_scale`, `freelance_scale`.
`outreachAngle`	string | null	One-line "why contact this lead now" sentence, ready to paste into Slack/email/CRM. Null when no significant signals.
`leadType`	string	`ideal_match` (high ICP fit + high leadScore), `growth_target` (strong opportunity signals), `nurture` (mid score), `low_priority` (bottom).
`isTopLead`	boolean	True for the top ~3% by composite leadScore + opportunityScore (or top 1 for small runs). "Just give me the best ones" flag.
`confidenceScore`	number	0–100 — how much to trust this record's scoring. Based on data density + signal count. Low = thin data, treat as provisional.
`tldr`	string | null	Factual one-line summary (identity-framed, vs outreachAngle's sales framing): `"Boutique SEO in New York (4.8★, 94 reviews)"`.
`nextAction`	object	Machine-actionable recommendation: `{ type: "enrich_email"|"send_outreach"|"monitor"|"skip"|"investigate", reason, priority, tool }`. Zapier/n8n can `switch()` on `type`.
`nextSteps`	object[]	Pre-formed Apify actor chain hooks: `[{ actor, input, reason }]`. Each `input` can be POSTed straight to the named actor — zero glue.
`isActive`	boolean	True when the agency has a website AND rating > 0. Quick filter to drop dead/stub listings.
`agencyName`	string	Agency display name as returned by the source
`website`	string | null	Normalised absolute URL of the agency's own website
`domain`	string | null	Registrable domain extracted from `website` (e.g. `acmecorp.com`), used for cross-source deduplication
`phone`	string | null	Phone number as returned by Google Maps; null for directory sources
`email`	string | null	Primary email address. Populated only when `includeEmails: true`; null otherwise.
`address`	string | null	Full street address as returned by Google Maps; null for directory sources
`services`	string[]	Service and specialty tags extracted from the source, up to 10 per record
`location`	string | null	City and/or country as shown on the source; for Google Maps records this may be the full address
`employeeCount`	string | null	Team size range, e.g. `10–49`, `50–249`; available from SuperbCompanies and TheManifest
`minProjectSize`	string | null	Minimum project budget, e.g. `$5,000+`; available from SuperbCompanies and TheManifest
`reviewCount`	number | null	Total number of client reviews on the source listing
`rating`	number | null	Average star rating parsed as a float, e.g. `4.8`
`isNewSinceLastRun`	boolean	True if this domain was NOT present in the previous run's results. Always false on the very first run.
`changes`	object | null	Deltas vs the previous run for the same domain: `{ newReviews, ratingChange, scoreChange, newServices[], previousLeadScore }`. Null on first run or new domains.
`source`	string	Which source provided this record: `google-maps`, `superbcompanies`, or `themanifest`
`sourceUrl`	string	Direct URL to the agency's profile page or Google Maps listing
`scrapedAt`	string	ISO 8601 timestamp of when the record was extracted
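Two of the fields above have fully specified rules (outreachScore and isActive), and leadScore is documented as the sum of scoreBreakdown. The sketch below restates those rules in Python as a sanity check you can run on exported records. The function names are hypothetical, and treating empty strings as missing is an assumption.

```python
from typing import Any

# Weights quoted in the outreachScore row: email 40 + phone 30 + website 20 + location 10.
OUTREACH_WEIGHTS = {"email": 40, "phone": 30, "website": 20, "location": 10}


def outreach_score(record: dict[str, Any]) -> int:
    """Sum the weight of every contact field that is present and non-empty."""
    return sum(weight for field, weight in OUTREACH_WEIGHTS.items() if record.get(field))


def is_active(record: dict[str, Any]) -> bool:
    """True when the agency has a website and a rating above zero."""
    return bool(record.get("website")) and (record.get("rating") or 0) > 0


def breakdown_matches(record: dict[str, Any]) -> bool:
    """Check that the scoreBreakdown components add up to leadScore."""
    return sum(record["scoreBreakdown"].values()) == record["leadScore"]


maps_record = {
    "website": "https://apexdigitalstrategies.com",
    "phone": "+1 (212) 555-0142",
    "email": "hello@apexdigitalstrategies.com",
    "location": "350 5th Ave, New York, NY 10118",
    "rating": 4.8,
    "leadScore": 82,
    "scoreBreakdown": {"authority": 22, "growth": 12, "completeness": 4,
                       "contactability": 27, "trust": 17},
}
directory_record = {"website": "https://meridiangrowthpartners.com", "phone": None,
                    "email": None, "location": "Austin, TX", "rating": 4.9}

print(outreach_score(maps_record), outreach_score(directory_record))  # 100 30
print(is_active(maps_record), breakdown_matches(maps_record))         # True True
```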

How much does it cost to find agency leads?

Agency Lead Finder uses pay-per-event pricing — you pay $0.05 per agency extracted and deduplicated by this actor. Platform compute is extra per Apify's standard model. You are never charged for duplicates removed during deduplication or for failed page loads.

Scenario	Agencies	Price per agency	Total actor cost
Quick test (1 source, 10 agencies)	10	$0.05	$0.50
Small batch (2 sources, 25 each)	~50	$0.05	~$2.50
Standard run (2 sources, 50 each)	~100	$0.05	~$5.00
Large run (3 sources, 100 each)	~300	$0.05	~$15.00
Maximum batch (3 sources, 500 each)	~1,500	$0.05	~$75.00

If you enable includeEmails: true, the Google Maps sub-actor runs its email-enrichment chain (website crawl + bulk email verification + decision-maker lookup). Those nested actors charge their own PPE events to your account, so budget roughly $0.10 extra per Google Maps agency on top of this actor's $0.05. Leave includeEmails off for a plain directory dump and pair with Website Contact Scraper later if you want finer control over email discovery.
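The arithmetic behind the table and the includeEmails surcharge can be wrapped in a small estimator. This is a rough sketch under stated assumptions: `estimate_max_cost` is a hypothetical helper, the $0.10 enrichment figure is the approximate number quoted above, and platform compute is not included.

```python
ACTOR_PPE_USD = 0.05         # per agency, from the pricing table
EMAIL_ENRICHMENT_USD = 0.10  # approximate per-Google-Maps-agency sub-actor charge


def estimate_max_cost(sources: list[str], max_per_source: int,
                      include_emails: bool = False) -> float:
    """Upper-bound charge for a run, before deduplication lowers the record count."""
    cost = len(sources) * max_per_source * ACTOR_PPE_USD
    # The enrichment surcharge only applies to records from the Google Maps source.
    if include_emails and "google-maps" in sources:
        cost += max_per_source * EMAIL_ENRICHMENT_USD
    return round(cost, 2)


all_sources = ["google-maps", "superbcompanies", "themanifest"]
print(estimate_max_cost(all_sources, 500))                          # 75.0
print(estimate_max_cost(["google-maps"], 50, include_emails=True))  # 7.5
```

Because duplicates are not charged, the real bill is usually below this ceiling.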

You can set a maximum spending limit per run in the Apify console to control costs. The actor stops pushing records when your budget is reached and the final status message plus the SUMMARY KV record show actual PPE charges incurred.

Compare this to B2B data platforms like Apollo or ZoomInfo at $49–$199/month for general contact data. Agency Lead Finder is purpose-built for agency prospecting, and most users building or refreshing an agency list spend $3–$15 per run with no subscription commitment.

Agency lead generation using the API

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/agency-directory-scraper").call(run_input={
    "sources": ["google-maps", "superbcompanies"],
    "services": "marketing agency",
    "location": "New York",
    "maxAgenciesPerSource": 50,
    "proxyConfiguration": {
        "useApifyProxy": True
    }
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("recordType") != "agency":
        continue  # skip the optional error record if present
    print(f"#{item['rank']} | score {item['leadScore']:>3} | {item['agencyName']} | {item['domain']} | {item.get('rating')}★")

# Read the run summary from the key-value store
kv = client.key_value_store(run["defaultKeyValueStoreId"])
summary_record = kv.get_record("SUMMARY")  # None if the key does not exist
if summary_record:
    summary = summary_record["value"]
    print(f"Total: {summary['totalAgencies']}, avg lead score: {summary['avgLeadScore']}, {summary['newSinceLastRun']} new since last run")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/agency-directory-scraper").call({
    sources: ["google-maps", "superbcompanies"],
    services: "marketing agency",
    location: "New York",
    maxAgenciesPerSource: 50,
    proxyConfiguration: {
        useApifyProxy: true
    }
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    if (item.recordType !== "agency") continue;
    console.log(`#${item.rank} | score ${item.leadScore} | ${item.agencyName} | ${item.domain} | ${item.source}`);
}

// Read the run summary from the key-value store
const kv = client.keyValueStore(run.defaultKeyValueStoreId);
const summary = (await kv.getRecord("SUMMARY"))?.value;
if (summary) {
    console.log(`Total: ${summary.totalAgencies}, avg lead score: ${summary.avgLeadScore}, ${summary.newSinceLastRun} new since last run`);
}

cURL

# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~agency-directory-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": ["google-maps", "superbcompanies"],
    "services": "marketing agency",
    "location": "New York",
    "maxAgenciesPerSource": 50,
    "proxyConfiguration": {
      "useApifyProxy": true
    }
  }'

# Fetch results once the run completes (replace DATASET_ID from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

Technical details (optional)

The sections below go into selectors, sub-actor orchestration, crawl phases, and the full scoring formula. You don't need any of this to use the actor — skip straight to Tips for best results if you just want to run it.

How Agency Lead Finder works

input (sources[] | services | location | mode | strategy | preset | targetProfile | watchlistName)
       │
       ▼
   resolve preset → mode + strategy + post-filter
       │
       ┌────────────┬────────────────────┐
       ▼            ▼                    ▼
   Google Maps  SuperbCompanies      TheManifest
   sub-actor    CheerioCrawler       CheerioCrawler
       │            │                    │
       └────────────┴────────────────────┘
                    │
                    ▼
        normalise → unified AgencyRecord (sharedState seenDomains dedup)
                    │
                    ▼
   load previous snapshot (named KV: PREVIOUS_DOMAINS_<watchlist> + PREVIOUS_DOMAIN_SNAPSHOTS_<watchlist>)
                    │
                    ▼
   per-record scoring:
     leadScore (0–100, 5 components: authority/growth/completeness/contactability/trust)
     icpFitScore · tier · outreachScore · opportunityScore · confidenceScore
     leadType · isTopLead · buyingSignals[] · outreachAngle · tldr · nextAction · nextSteps[]
     changes (vs prior snapshot: newReviews/ratingChange/scoreChange/newServices)
                    │
                    ▼
   premium-ladder fields:
     schemaVersion · eventId · agentContract { decision, confidence, nextAction, costToAct }
                    │
                    ▼
   filter by mode (list-builder | outreach-ready | pipeline-builder)
   sort by strategy (balanced | high-opportunity | high-authority)
   rank
                    │
                    ▼
   shape: outputMode (raw | decision-ready) × outputProfile (minimal/standard/llm/full)
                    │
                    ▼
   PPE charge per record (Actor.charge eventChargeLimitReached → stop)
                    │
                    ┌────────┴────────┐
                    ▼                 ▼
                dataset            KV SUMMARY (totals + portfolio + agentContract + coverage)
                per-record         KV OUTPUT (full deterministic shape — every record)
                                   PREVIOUS_DOMAINS_<watchlist> (for next run's delta)
                    │
                    └─► optional Slack/Discord webhook (top leads + counts)

Phase 1 — Google Maps sub-actor call

The actor constructs a Google Maps search query by concatenating the services and location inputs (e.g. "marketing agency New York"). It then calls the ryanclinton/google-maps-email-extractor sub-actor with this query and the maxAgenciesPerSource limit. After the sub-actor run completes, the actor reads its dataset using Actor.apifyClient.dataset(run.defaultDatasetId).listItems() with a 1,000-item ceiling. Each Google Maps item is mapped to the unified AgencyRecord schema: title → agencyName, website → normalised URL, phone, address, categoryName → first services entry, totalScore → rating, reviewsCount → reviewCount. All discovered domains are added to a shared seenDomains Set before the crawler starts.
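The field mapping in this phase can be sketched as a pure function. This is an illustration in Python, not the actor's TypeScript source: `map_google_maps_item` and `extract_domain` are hypothetical names, the website is passed through without the normalisation step, and the domain logic is a simplification (a true registrable-domain lookup needs the Public Suffix List).

```python
from typing import Any, Optional
from urllib.parse import urlparse


def extract_domain(website: Optional[str]) -> Optional[str]:
    """Lower-cased host without a leading 'www.' (simplified domain extraction)."""
    if not website:
        return None
    url = website if "://" in website else f"https://{website}"
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host or None


def map_google_maps_item(item: dict[str, Any]) -> dict[str, Any]:
    """Apply the Phase 1 field mapping to one sub-actor dataset item."""
    category = item.get("categoryName")
    website = item.get("website") or None
    return {
        "agencyName": item.get("title"),
        "website": website,
        "domain": extract_domain(website),
        "phone": item.get("phone"),
        "address": item.get("address"),
        "services": [category] if category else [],  # category becomes the first service tag
        "rating": item.get("totalScore"),
        "reviewCount": item.get("reviewsCount"),
        "source": "google-maps",
    }


seen_domains: set[str] = set()
item = {"title": "Apex Digital Strategies",
        "website": "https://www.apexdigitalstrategies.com/contact",
        "categoryName": "Marketing agency", "totalScore": 4.8, "reviewsCount": 94}
record = map_google_maps_item(item)
if record["domain"]:
    seen_domains.add(record["domain"])  # later sources skip domains already seen
print(record["domain"], record["services"])
```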

Phase 2 — CheerioCrawler for SuperbCompanies and TheManifest

Both directory sources are crawled with Crawlee's CheerioCrawler — a lightweight HTTP + Cheerio parser that requires no browser. The crawler runs at a concurrency of 5 with session pooling and cookie persistence. Both sources are seeded from their XML sitemaps: SuperbCompanies uses a sitemap index at /sitemap.xml that references child sitemaps (e.g. /sitemap-organizations-1.xml); TheManifest uses a single sitemap with /companies/ and /directory/ URL patterns, with an HTML anchor fallback if the XML sitemap yields no matches. The sharedState module — a plain TypeScript object imported directly by route handlers — carries the seenDomains Set (pre-populated with Google Maps domains), per-source counters, and the collected results array across all route invocations.
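Sitemap seeding of this kind comes down to reading `<loc>` entries and filtering them by URL pattern. The sketch below shows the idea with Python's standard library on inline sample XML. The `example.com` URLs and the helper names are placeholders, and the actor itself does this inside Crawlee rather than with this code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> list[str]:
    """Return every <loc> URL from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS) if loc.text]


def matching_urls(urls: list[str], patterns: tuple[str, ...]) -> list[str]:
    """Keep only URLs that contain at least one of the given path patterns."""
    return [u for u in urls if any(p in u for p in patterns)]


index_xml = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-organizations-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>
"""

# Sitemap-index case: pick the child sitemaps that list organisation profiles.
child_sitemaps = matching_urls(sitemap_locs(index_xml), ("sitemap-organizations",))
print(child_sitemaps)

# Single-sitemap case: keep only profile-style paths.
flat_urls = ["https://example.com/companies/acme", "https://example.com/blog/post"]
print(matching_urls(flat_urls, ("/companies/", "/directory/")))
```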

Phase 3 — Profile extraction and normalisation

Each route handler calls parseSuperbCompaniesProfile or parseTheManifestProfile from extractors.ts, which are pure functions that take a Cheerio $ object and return a structured partial record. Agency names are read from the first <h1>. Websites are found by scanning <a href^="http"> links while filtering a junk-domain list that includes linkedin, facebook, twitter, instagram, clutch, google, yelp, sortlist, and the source directory itself. SuperbCompanies profiles also check for a "Visit Website" link text before falling back to the junk-filter scan. Location is read from itemprop="addressLocality" / itemprop="addressCountry" structured data before falling back to class-name selectors. Service tags are collected from [class*="service"], [class*="expertise"], [class*="tag"], and [class*="skill"] elements, deduped, and capped at 10. Ratings and review counts pass through parseRating and parseReviewCount which handle formats including 4.8, 4.8/5, 4.8 stars, 1,234 reviews, and 45.
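As an illustration of the rating and review-count formats listed above, here is one possible regex-based approach in Python. It is a sketch and not the contents of extractors.ts; the 0 to 5 range check in `parse_rating` is an added assumption.

```python
import re
from typing import Optional


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Extract the first number from strings such as '4.8', '4.8/5' or '4.8 stars'."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    if not match:
        return None
    value = float(match.group())
    # Assumption: ratings use a 0-5 scale; anything outside it is treated as unparseable.
    return value if 0 <= value <= 5 else None


def parse_review_count(text: Optional[str]) -> Optional[int]:
    """Extract an integer from strings such as '1,234 reviews' or '45'."""
    match = re.search(r"\d[\d,]*", text or "")
    if not match:
        return None
    return int(match.group().replace(",", ""))


for raw in ("4.8", "4.8/5", "4.8 stars"):
    print(raw, "->", parse_rating(raw))
for raw in ("1,234 reviews", "45"):
    print(raw, "->", parse_review_count(raw))
```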

Phase 4 — Delta pass, scoring, ranking, PPE charging, and output

Once all sources complete, the allResults array is marked against the previous run's domain snapshot (isNewSinceLastRun), scored by scoreRecord() (the 0–100 weighted formula), and sorted by score descending so rank 1 is the strongest lead. Each record is pushed to the Apify dataset individually, highest-scoring first. In pay-per-event mode, Actor.charge({ eventName: 'agency-found', count: 1 }) fires after each push. If the charge result reports eventChargeLimitReached as true, the loop exits cleanly and no further records are pushed or charged. A SUMMARY record is then written to the key-value store (not the dataset) with total counts, averages, top services, top-5 agencies, emails collected, and actual PPE charges. The current domain set is merged with the previous snapshot and saved to PREVIOUS_DOMAINS for the next run's delta pass.
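The control flow of this phase (delta marking, ranking, budget-limited push, snapshot merge) can be sketched without any Apify dependencies. In the sketch below, `finalise` is a hypothetical name, `max_charges` is a stand-in for the eventChargeLimitReached signal, and saving all current domains (not only pushed ones) to the snapshot is an assumption.

```python
from typing import Any

Record = dict[str, Any]


def finalise(records: list[Record], previous_domains: set[str],
             max_charges: int) -> tuple[list[Record], set[str]]:
    """Mark new domains, rank by leadScore, and stop once the charge budget is used."""
    first_run = not previous_domains
    for r in records:
        # Per the field reference, the flag is always false on the very first run.
        r["isNewSinceLastRun"] = (not first_run) and r.get("domain") not in previous_domains

    ranked = sorted(records, key=lambda r: r["leadScore"], reverse=True)
    pushed: list[Record] = []
    for rank, r in enumerate(ranked, start=1):
        if len(pushed) >= max_charges:  # stand-in for eventChargeLimitReached
            break
        pushed.append({**r, "rank": rank})

    # Merge current domains into the snapshot used by the next run's delta pass.
    next_snapshot = previous_domains | {r["domain"] for r in records if r.get("domain")}
    return pushed, next_snapshot


records = [
    {"domain": "a.com", "leadScore": 60},
    {"domain": "b.com", "leadScore": 82},
    {"domain": "c.com", "leadScore": 71},
]
pushed, snapshot = finalise(records, previous_domains={"a.com"}, max_charges=2)
print([(r["rank"], r["domain"], r["isNewSinceLastRun"]) for r in pushed])
print(sorted(snapshot))
```

Pushing in rank order means a run that hits its spending limit still returns the strongest leads it found.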

Tips for best results

Match your keyword to how agencies describe themselves. Use "marketing agency" for the broadest results, or be specific with "SEO agency", "web design agency", or "digital advertising agency". Vague or misspelled keywords reduce Google Maps result quality.
Pair a precise city with Google Maps. Google Maps produces the most relevant results when location is a specific city like "Chicago" or "Toronto" rather than a broad region. For country-level coverage, omit location and use SuperbCompanies or TheManifest as your primary source.
Include SuperbCompanies as a minimum. SuperbCompanies exposes structured data (schema.org markup) that produces the most consistent employeeCount, minProjectSize, and rating fields. It is the most reliable supplementary directory source.
Treat TheManifest as a bonus source. TheManifest is a Clutch sister site with overlapping listings. Enable it for maximum coverage, but expect occasional zero-result runs if the site is Cloudflare-protected on that day. The run still succeeds with the other sources.
Use the domain field for downstream enrichment. Every record with a non-null domain can be fed directly into Website Contact Scraper to extract emails and phone numbers from agency websites, or into Email Pattern Finder to detect the email naming convention before crafting personalised outreach.
Schedule weekly runs for a living agency database. New agencies register on Google Maps and these directories regularly. A weekly scheduled run with downstream deduplication by domain keeps your prospecting list current without manual effort.
Set a spending limit on first-time runs. When testing a new keyword or location, set a $3–$5 spending limit in the run settings. The actor stops cleanly at your budget and outputs whatever it collected, so you can assess data quality before committing to a full run.
Run separate inputs for separate service categories. Each run maintains its own deduplication state. If you need both SEO agencies and content marketing agencies, run them as separate inputs rather than combining keywords, which can dilute Google Maps result relevance.

Combine with other Apify actors

Actor	How to combine
Website Contact Scraper	Feed the `domain` output into Website Contact Scraper to add email addresses and phone numbers to each agency record for outreach
Email Pattern Finder	Run Email Pattern Finder on each `domain` to detect the naming convention (e.g. `firstname@domain.com`) before personalising outreach at scale
Waterfall Contact Enrichment	Enrich each agency domain through a 10-step contact enrichment cascade to surface decision-maker names, titles, and emails
Bulk Email Verifier	Verify email addresses found for agencies before adding them to outreach sequences to protect sender reputation
B2B Lead Qualifier	Score the scraped agency list on 30+ signals to prioritise outreach to the highest-fit prospects first
HubSpot Lead Pusher	Push the completed agency dataset directly into HubSpot as company records with associated contact data
Website Tech Stack Detector	Detect which marketing tools each agency runs — useful for targeting agencies that use a specific platform your product integrates with
Lead Enrichment Pipeline	All-in-one Clay alternative: email discovery, verification, company research, and scoring in one run ($0.12/lead)
AI Outreach Personalizer	Generate personalized cold emails using your own OpenAI/Anthropic key — zero AI markup ($0.01/lead)
Intent Signal Tracker	Track buying signals: hiring, tech changes, funding, content updates. Prioritize outreach by intent score ($0.05/company)
Lead Data Quality Auditor	Audit lead data quality before outreach — email verification, phone validation, domain freshness ($0.005/record)

Premium output fields

In addition to the AgencyRecord fields listed under Output fields, every record includes:

Field	Type	Description
`recordType`	string	Discriminator (already present): `agency`, `decision`, `error`.
`schemaVersion`	string	Output schema version (semver). Bumped on shape changes. Safe to branch on.
`eventId`	string	Idempotent canonical id `sha256(watchlistName::domain-or-sourceUrl)`. Same id across re-runs of the same agency — safe join key for downstream diffing.
`agentContract`	object	`{ decision, confidence, nextAction, costToAct }` decision surface for MCP and AI-agent consumers. Maps the existing `nextAction.type` + `leadScore` + `confidenceScore` to a stable enum (`enrich_email`, `send_outreach`, `monitor`, `skip`, `investigate`).
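The eventId formula is simple enough to reproduce when you need to pre-compute join keys. The sketch below follows the `sha256(watchlistName::domain-or-sourceUrl)` description. UTF-8 encoding, hex output, and an empty string for an unset watchlist are assumptions, so compare against a real record before relying on it.

```python
import hashlib
from typing import Optional


def event_id(watchlist_name: str, domain: Optional[str], source_url: str) -> str:
    """Hash '<watchlistName>::<domain or sourceUrl>' with SHA-256 (hex digest)."""
    key = f"{watchlist_name}::{domain or source_url}"  # falls back to sourceUrl without a domain
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


first = event_id("us-seo", "apexdigitalstrategies.com", "https://www.google.com/maps/place/Apex")
again = event_id("us-seo", "apexdigitalstrategies.com", "https://www.google.com/maps/place/Other")
other = event_id("uk-seo", "apexdigitalstrategies.com", "https://www.google.com/maps/place/Apex")

print(first == again)  # True: the domain takes precedence, so the id is stable across re-runs
print(first == other)  # False: a different watchlist produces a different id
```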

KV store mirrors

Every run writes:

SUMMARY key — totals + portfolio breakdowns + coverage block + run-level agentContract (the top lead's contract — most-actionable surface) + cost. Best for triggering downstream actors with a single read.
OUTPUT key — full deterministic per-agency output (regardless of outputMode / outputProfile) plus run-level agentContract and coverage. Use when you need every field even though the dataset is filtered.
PREVIOUS_DOMAINS[_<watchlistName>] + PREVIOUS_DOMAIN_SNAPSHOTS[_<watchlistName>] keys (default KV) — cross-run state for delta detection and per-domain change tracking. Watchlist namespacing lets the same actor run as N independent monitors. Backward compatible: no watchlistName set = same key as before.
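The watchlist namespacing rule can be expressed in one line, which is handy when reading the state keys from your own tooling. `state_key` is a hypothetical helper, and it assumes the watchlist name is already safe to use in a key-value store key.

```python
from typing import Optional


def state_key(base: str, watchlist_name: Optional[str] = None) -> str:
    """Append '_<watchlistName>' when a watchlist is set; otherwise keep the plain key."""
    return f"{base}_{watchlist_name}" if watchlist_name else base


print(state_key("PREVIOUS_DOMAINS"))                     # PREVIOUS_DOMAINS
print(state_key("PREVIOUS_DOMAINS", "us-seo"))           # PREVIOUS_DOMAINS_us-seo
print(state_key("PREVIOUS_DOMAIN_SNAPSHOTS", "us-seo"))  # PREVIOUS_DOMAIN_SNAPSHOTS_us-seo
```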

Stable enums

The actor commits to additive-only evolution of these enums (new values may be added in minor versions; existing values never removed or renamed):

recordType — agency, decision, error
agentContract.nextAction — enrich_email, send_outreach, monitor, skip, investigate
agentContract.decision — qualified-A, qualified-B, review, low-priority, reject
nextAction.type (existing) — same values as agentContract.nextAction
nextAction.priority (existing) — HIGH, MEDIUM, LOW
leadType (existing) — ideal_match, growth_target, nurture, low_priority
tier (existing) — enterprise, boutique, freelance, unknown
mode (existing) — list-builder, outreach-ready, pipeline-builder
strategy (existing) — balanced, high-opportunity, high-authority
outputMode (existing) — raw, decision-ready
preset (existing) — none, high_intent, easy_wins, enterprise_targets, fresh_leads
failureType (error records, existing) — no-data, blocked, timeout, js-required, parse-error, invalid-input

Branching on these in Dify, n8n, Make, or your own code is safe across schemaVersion minor bumps.
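Because the enums are additive-only, consumer code should handle every known value and fall through gracefully on values it has not seen. A minimal dispatch sketch in Python follows; the handler bodies and the `route` name are placeholders for your own logic.

```python
from typing import Any, Callable

Record = dict[str, Any]
Handler = Callable[[Record], str]

# One handler per documented nextAction.type value.
HANDLERS: dict[str, Handler] = {
    "send_outreach": lambda r: f"queue email to {r['agencyName']}",
    "enrich_email": lambda r: f"enrich {r['domain']}",
    "monitor": lambda r: f"watch {r['domain']}",
    "investigate": lambda r: f"review {r['agencyName']} manually",
    "skip": lambda r: "ignore",
}


def route(record: Record) -> str:
    """Dispatch on nextAction.type, tolerating enum values added in later versions."""
    action = (record.get("nextAction") or {}).get("type")
    handler = HANDLERS.get(action)
    if handler is None:
        return f"unhandled action: {action}"  # new value: log it instead of failing
    return handler(record)


lead = {"agencyName": "Apex Digital Strategies", "domain": "apexdigitalstrategies.com",
        "nextAction": {"type": "send_outreach", "priority": "HIGH"}}
print(route(lead))
print(route({"agencyName": "X", "domain": "x.com", "nextAction": {"type": "future_value"}}))
```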

Limitations

Google Maps results are location-dependent. Google Maps search quality varies significantly by location. Dense markets like New York or London return highly relevant results; smaller cities may return fewer agencies or adjacent business types. Supplement with SuperbCompanies for location-agnostic coverage.
TheManifest may be Cloudflare-protected. TheManifest occasionally blocks automated access. When this happens, the source returns zero results and a warning is logged. The run completes normally using the other sources. This is a known limitation and is noted in the actor logs.
Phone and address are Google Maps only. SuperbCompanies and TheManifest profile pages do not expose phone numbers or street addresses in a consistent, parseable form. The phone and address fields are null for all superbcompanies and themanifest records.
Employee count and min project size are directory sources only. Google Maps does not carry team size or budget data. The employeeCount and minProjectSize fields are null for all google-maps records.
Service tags reflect what the directory displays. Service categories on SuperbCompanies and TheManifest are set by the agency during registration and may be broad, inconsistent, or absent. Google Maps returns the business category name as a single-element services array.
Deduplication is domain-based within a single run. A company listed under two different domains will appear twice. Merging datasets across multiple runs will introduce duplicates, so filter by domain in your downstream tooling.
Hard cap of 500 agencies per source per run. SuperbCompanies and TheManifest are accessed via sitemap order, which does not sort by rating or review count. The highest-reviewed agencies are not guaranteed to appear first from directory sources.
No individual profile deep-crawl for Google Maps. Phone and address come from the Google Maps sub-actor output. The sub-actor does not visit each agency's website — for email addresses, combine with Website Contact Scraper.
HTML changes on SuperbCompanies or TheManifest can reduce field coverage. Selectors use broad CSS class-name substring matching to tolerate minor changes, but a full redesign may require selector updates. Open an issue in the Issues tab if fields start returning null unexpectedly.

Integrations

Zapier — trigger a Zap when a run completes to route high-rated agencies directly into a CRM deal stage or sales sequence
Make — build a scenario that pulls agency results after each run and cross-references them against existing CRM contacts before creating new records
Google Sheets — append scraped agency rows to a shared spreadsheet for team review and manual qualification before outreach
Apify API — trigger runs programmatically from your internal tooling and retrieve results in JSON or CSV for downstream processing
Webhooks — post the completed dataset URL to a Slack channel or internal endpoint the moment a run finishes
LangChain / LlamaIndex — load agency records into a vector store to power an AI assistant that answers questions about the agency landscape in a given market

Troubleshooting

Zero results from Google Maps — Check that your services keyword and location form a valid Google Maps search. The query is constructed as "{services} {location}". Very niche keywords or misspellings can produce no results from the sub-actor. Try "marketing agency" + a major city as a smoke test.
Zero results from TheManifest — TheManifest may be Cloudflare-protected on the day of your run. This is expected behaviour. The run continues and uses Google Maps and SuperbCompanies results. Check the run log for the warning message "TheManifest returned 0 results" to confirm this is the cause.
Most fields are null for directory records — Fields like phone, address, employeeCount, and minProjectSize are source-dependent. phone and address are only populated for Google Maps records. employeeCount and minProjectSize are only available from SuperbCompanies and TheManifest when the agency has filled in their profile. Null values for these fields are expected and normal.
Fewer agencies than maxAgenciesPerSource — For a given keyword and location, Google Maps may return fewer results than your cap. SuperbCompanies and TheManifest sitemap coverage varies by niche — some service categories have fewer than 50 listed agencies. The actor returns all available records and stops without error.
Duplicate agencies in merged datasets — Deduplication operates within a single run by domain. If you merge datasets from multiple runs, duplicates will appear. Filter by domain in your downstream tooling to deduplicate across runs.
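
Cross-run deduplication by domain can be sketched as follows. This is one possible approach, not the actor's own logic: the first record seen for a domain wins, records without a domain are kept as-is, and the lowercase and "www." normalisation is a defensive assumption about how domain values might vary between runs.

```python
from typing import Dict, Iterable, List


def normalise_domain(domain: str) -> str:
    """Lowercase, trim, and drop a leading 'www.' so variants compare equal."""
    d = domain.strip().lower()
    return d[4:] if d.startswith("www.") else d


def merge_runs(*runs: Iterable[dict]) -> List[dict]:
    """Merge records from several runs, keeping the first record per domain."""
    merged: Dict[str, dict] = {}
    no_domain: List[dict] = []
    for run in runs:
        for record in run:
            domain = record.get("domain")
            if not domain:
                # Nothing to key on, so keep the record rather than drop it.
                no_domain.append(record)
                continue
            merged.setdefault(normalise_domain(domain), record)
    return list(merged.values()) + no_domain


if __name__ == "__main__":
    run_a = [{"domain": "acme.com", "name": "Acme"}]
    run_b = [{"domain": "www.ACME.com", "name": "Acme Ltd"}, {"domain": "beta.io", "name": "Beta"}]
    print([r["name"] for r in merge_runs(run_a, run_b)])
```

Pass the runs oldest first if you want the original record to be retained, or newest first if fresher data should win.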

Responsible use

This actor accesses only publicly available agency listing data from directories whose core business model is built on public discovery of agency firms.
Respect the terms of service of each directory. Do not use this actor to systematically republish directory content or create a competing agency database.
When using scraped agency data for outreach, comply with CAN-SPAM, GDPR, and all other applicable data protection regulations in your jurisdiction.
Do not use extracted data for spam, harassment, or any unsolicited commercial contact that violates applicable law.
For guidance on web scraping legality, see Apify's guide.

FAQ

How many agency leads can I find in one run? Up to 500 agencies per source across up to three sources, so the ceiling is 1,500 records per run before deduplication. In practice, most runs targeting a specific keyword and location return 50–200 records because not every source has 500 listings for every niche.
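
The ceiling arithmetic can be worked through as below. This is a rough upper-bound estimate only: it assumes the listed rate of $50.00 per 1,000 agencies applies to every record returned, and the function name is illustrative. Actual runs usually return fewer records, so they cost less.

```python
from typing import Tuple

HARD_CAP_PER_SOURCE = 500      # documented per-source limit
PRICE_PER_1000_USD = 50.0      # listed price: $50.00 / 1,000 agencies


def run_ceiling(max_agencies_per_source: int, source_count: int) -> Tuple[int, float]:
    """Return (max records before deduplication, upper-bound cost in USD)."""
    per_source = min(max_agencies_per_source, HARD_CAP_PER_SOURCE)
    records = per_source * source_count
    cost = records * PRICE_PER_1000_USD / 1000
    return records, cost


if __name__ == "__main__":
    print(run_ceiling(500, 3))
    print(run_ceiling(50, 2))
```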

Which sources does Agency Lead Finder use? The actor uses three sources: Google Maps (via the ryanclinton/google-maps-email-extractor sub-actor), SuperbCompanies.com (scraped via sitemap), and TheManifest.com (scraped via sitemap). You can enable any combination by setting the sources input parameter. The default is Google Maps and SuperbCompanies.

How is Agency Lead Finder different from scraping Clutch or DesignRush? This actor targets Google Maps, SuperbCompanies, and TheManifest — not Clutch or DesignRush. Google Maps provides phone numbers and street addresses that Clutch does not. SuperbCompanies has 8,000+ open agency profiles accessible without aggressive bot protection. All three sources are combined and deduplicated in a single run, so you get broader coverage without building three separate scrapers.

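Does agency lead finding work without a proxy? Google Maps results come from the sub-actor, which handles its own proxy use. For SuperbCompanies and TheManifest, the standard Apify datacenter proxy is normally sufficient, and the default proxyConfiguration is already correct. SuperbCompanies does not use aggressive bot protection. TheManifest is occasionally Cloudflare-protected; when that happens the source returns zero results and the run continues with the other sources. You do not need residential proxies for this actor.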

What agency type keywords work best? Common keywords include "marketing agency", "SEO agency", "web design agency", "digital advertising agency", "branding agency", "social media agency", and "content marketing agency". The keyword drives the Google Maps query. Be as specific as your targeting requires — "B2B SaaS marketing agency" will return a narrower but more relevant set than "marketing agency".

How long does a typical agency lead finding run take? A standard run with two sources at 50 agencies each takes 10–20 minutes. Google Maps results arrive after the sub-actor call completes (typically 3–8 minutes depending on the result count); the CheerioCrawler then processes SuperbCompanies or TheManifest profile pages concurrently. Runs at 500 agencies per source may take 30–60 minutes.

How accurate is the extracted agency data? Agency names, websites, and locations are reliably extracted from all three sources. Google Maps records include phone and address when the business has a verified Maps listing. employeeCount and minProjectSize depend on whether the agency completed their SuperbCompanies or TheManifest profile — these fields are null when not provided. Star ratings and review counts are extracted where present.

Can I filter agency leads by location? Yes. Enter a city, state, country, or region in the location field. This is concatenated with your services keyword to form the Google Maps query (e.g. "marketing agency Chicago"). SuperbCompanies and TheManifest are crawled globally via sitemap and do not apply location filtering server-side — filter the output location field after the run for directory results.
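
Post-run location filtering for directory records might look like the sketch below. It assumes each record carries a source field with the values google-maps, superbcompanies, or themanifest (the field name is an assumption; the values appear in the Limitations section) and a free-text location field. Google Maps records are passed through untouched because the location is already part of their search query.

```python
from typing import Iterable, List


def filter_by_location(records: Iterable[dict], wanted: str) -> List[dict]:
    """Keep Google Maps records and directory records whose location matches."""
    needle = wanted.casefold()
    kept: List[dict] = []
    for record in records:
        if record.get("source") == "google-maps":
            kept.append(record)  # already location-scoped by the query
            continue
        location = (record.get("location") or "").casefold()
        if needle in location:
            kept.append(record)
    return kept


if __name__ == "__main__":
    rows = [
        {"name": "A", "source": "google-maps", "location": "Chicago, IL"},
        {"name": "B", "source": "superbcompanies", "location": "Chicago, United States"},
        {"name": "C", "source": "themanifest", "location": "Berlin, Germany"},
        {"name": "D", "source": "superbcompanies", "location": None},
    ]
    print([r["name"] for r in filter_by_location(rows, "chicago")])
```

Substring matching is deliberately loose. Tighten it if your target city name is also a common substring of other place names.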

Is it legal to scrape agency directories for lead generation? These directories publish agency information publicly as their core business model — the data is intentionally visible to anyone. Scraping publicly available business information for prospecting is generally lawful in most jurisdictions. Review each site's terms of service before large-scale use. For a detailed analysis of web scraping legality, see Apify's guide.

Can I use the agency leads with other Apify actors to get contact emails? Yes. Feed the domain field from this actor into Website Contact Scraper to extract emails and phone numbers from agency websites, or into Waterfall Contact Enrichment for a broader multi-step enrichment pipeline. The domain field is structured specifically to serve as input for these downstream actors.

Can I schedule this actor to run periodically? Yes. Apify's scheduler supports cron-based scheduling — daily, weekly, or monthly. Each run produces a fresh dataset. Use the Apify API or a Make/Zapier integration to merge new results into your CRM while deduplicating by domain across runs.

What happens if SuperbCompanies or TheManifest changes its HTML structure? Selectors use broad CSS class-name substring matching (e.g. [class*="service"], [class*="expertise"]) to tolerate minor HTML changes. A full site redesign may break extraction for that source, causing fields to return null. If a directory source starts returning blank records unexpectedly, open an issue in the Issues tab with your run ID so the selectors can be updated.

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

Go to Account Settings > Privacy
Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.

