Scout — Lead Enrichment + OSINT

Pricing

from $50.00 per 1,000 person_enriched events

Email finder + lead enrichment + OSINT from public sources. Pass any fragment (name, email, or domain) and get a verified dossier: 700+ identity sites, SMTP-validated emails, document mining, sanctions screening, domain→team discovery. $0.05 per person, $0.15 per domain. No API keys required.

Developer

Logical Vivacity

Maintained by Community

Scout - OSINT + Lead Enrichment from Public Sources

Send Scout out with any fragment - a name, an email, a domain, a handle - and it brings back a verified, audit-ready dossier: cross-platform identity, work emails, employer firmographics, document evidence, and the org's actual team. No customer API keys. No per-seat pricing. Pay only per result.

Scout is a single Apify actor that bridges the gap between an OSINT tool and a lead-enrichment platform. From a single input field it returns a structured JSON dossier: verified identity, social presence across 700+ platforms, work-email waterfall, public document evidence, sanctions screening, full company firmographics - and when you scout a domain, the org's actual people, each spawned as a related entity and enriched in the same run.

Pricing: $0.05 per person, $0.15 per domain/org, $0.02 per spawned team member. Apify's standard compute fee is charged on top. No subscriptions, no API keys, no surprises.


What's in the box

  • 🔍 Org → People discovery. Pass domain: "acme.com" and Scout walks the team / about / leadership pages, NER-extracts each person, pairs them with their title, and spawns a Person entity per teammate - then runs the full person-enrichment pipeline on each (email finder, GitHub, LinkedIn, Gravatar, …). One domain input → fully populated team graph.
  • 📧 Email finder waterfall. Given (name, domain) Scout generates plausible local-parts (first.last, flast, firstl, …), runs SMTP RCPT against each, returns the first verified hit with full provenance.
  • 📄 Document mining. Filename signals (date / kind / subject), embedded PDF/DOCX metadata (Author / Producer / Created), OCR fallback for image-only PDFs, NER-based co-occurrence with proximity scoring (closer name = stronger relation), section-aware resume parsing → structured WorkExperience and EducationEntry entries.
  • 🔗 High-signal individual systems. Booking links (Calendly, cal.com, SavvyCal, Hubspot Meetings, …). Newsletter detection (Substack, Beehiiv, Buttondown, Kit, Ghost). Cert Transparency for subdomain history.
  • 🥷 Stealth fetch. httpx-first with smart browser fallback for anti-bot hosts, playwright-stealth patches, randomised viewport, Google Referer header, and a curated list of "go straight to browser" hosts (LinkedIn, Twitter, Glassdoor, Crunchbase, …).
  • 🧹 Field normalization at the model level. Phones canonicalise to E.164, dates to ISO, URLs lose www. / fragments / trailing slashes, handles drop leading @, emails lowercase, LinkedIn URLs canonicalise to https://<host>/in/<slug>.
  • Single-entity quick mode. No more "wrap in entities: [...]" - just paste full_name or email or domain directly into the input.
  • 📜 Per-component provenance. Every component carries _added_by, _added_at, _confidence, _sources, _evidence - your compliance team can audit every datum.
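
The normalization rules in the list above can be approximated with the standard library. This is a minimal sketch, not Scout's actual code: the function names are hypothetical, and phone (E.164) and date (ISO) handling are left out because they need more than a few lines.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_handle(handle: str) -> str:
    """Drop a leading '@' and lowercase the handle."""
    return handle.strip().lstrip("@").lower()


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase the address."""
    return email.strip().lower()


def normalize_url(url: str) -> str:
    """Strip 'www.', the fragment, and any trailing slash from a URL."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    # The empty final element drops the '#fragment'.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


print(normalize_handle("@JaneDoe"))
print(normalize_email(" Jane@Acme.com "))
print(normalize_url("https://www.Acme.com/about/#team"))
```

Normalizing before comparison is what lets values from different sources be matched as the same handle, address, or URL.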

Two ways to brief Scout

1) Quick mode (typed inputs, single entity)

Just fill the fields you have. No JSON wrapping required.

{
  "full_name": "Jane Doe",
  "email": "jane@acme.com"
}

{
  "domain": "acme.com"
}

{
  "linkedin_url": "https://www.linkedin.com/in/jane-doe"
}

2) Bulk mode (entities array)

For multi-lead runs.

{
  "entities": [
    { "kind": "person", "full_name": "Jane Doe", "email": "jane@acme.com" },
    { "kind": "domain", "domain": "example.com" },
    { "kind": "organization", "name": "Acme Inc", "domain": "acme.com" }
  ]
}

Scout infers kind from whatever fields you pass - set it explicitly only when you want to override the heuristic.
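
If your leads live in a CRM export, a small helper can turn quick-mode style rows into a bulk-mode payload. The sketch below is illustrative only: build_bulk_input and crm_rows are hypothetical names, and it assumes that omitting kind in the entities array leaves the inference to Scout, as described above.

```python
import json


def build_bulk_input(leads):
    """Wrap quick-mode style dicts into a bulk-mode payload.

    Empty values are dropped. `kind` is only kept when the caller set it,
    so the actor's own inference applies otherwise.
    """
    entities = []
    for lead in leads:
        entity = {k: v for k, v in lead.items() if v not in (None, "")}
        if entity:  # skip rows with no usable field
            entities.append(entity)
    return {"entities": entities}


crm_rows = [
    {"full_name": "Jane Doe", "email": "jane@acme.com", "phone": ""},
    {"domain": "example.com"},
    {"kind": "organization", "name": "Acme Inc", "domain": "acme.com"},
    {"full_name": None},  # nothing usable, dropped
]

payload = build_bulk_input(crm_rows)
print(json.dumps(payload, indent=2))
```

The resulting JSON can be pasted into the actor input as-is.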


Input field reference

All fields are optional individually; pass at least one.

Person fields

| Field | Type | Notes |
| --- | --- | --- |
| full_name | string | Most useful single-anchor input |
| first_name / last_name | string | Use if you only have parts |
| email | string | Auto-validated (syntax, MX, SMTP, breach) |
| phone | string | Any format → E.164 |
| company_name | string | Combined with full_name enables email finder |
| title | string | Job title / role |
| location | string | City / country - for disambiguation |
| linkedin_url | string | Auto-canonicalised |
| github_username | string | Stripped of @, lowercased |
| twitter_handle | string | Stripped of @, lowercased |
| bluesky_handle / mastodon_handle | string | |
| notes | string | Free-form context attached to the entity |

Organization / domain fields

| Field | Type | Notes |
| --- | --- | --- |
| domain | string | Triggers the full org pipeline + team discovery |
| name / company_name | string | Org / company name |

Run controls

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| kind | enum | auto | person / organization / domain / email / phone / blank |
| processors | array | all | Subset of enrichment systems (advanced) |
| proxyConfiguration | proxy | Apify proxy on | Strongly recommend keeping enabled |
| headless | boolean | true | Browser headless |
| perLeadTimeoutSeconds | int | 180 | Total runtime ceiling per top-level entity |

What Scout brings back per run

Identity & verification

  • Verified FullName, Email, Phone (E.164), Location, primary social handles
  • Cross-source Confirmed components per field with which sources agreed
  • Conflict components surfacing values Scout didn't commit to
  • SelfGrade - per-field confidence + an overall identity_locked flag + rationale
  • EntityTypeClassification - developer / executive / academic / creator / marketer / unknown
  • QualityGate with input-ambiguity score and suggested extra inputs

Email

  • RFC syntax + deliverability + MX + SMTP RCPT
  • Disposable + public-mailbox detection
  • Domain-level breach exposure (HaveIBeenPwned)
  • EmailPattern detection (first.last@, flast@, …)
  • Email finder - generates name@domain candidates and SMTP-verifies them when you have a name + an org domain anchor
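
The candidate-generation step of the email finder can be sketched as follows. This is an assumption-laden illustration, not Scout's implementation: only first.last, flast, and firstl come from the description above, the remaining patterns and the ordering are guesses, and the SMTP verification step is deliberately left out.

```python
import re
import unicodedata


def _slug(part: str) -> str:
    """ASCII-fold and keep letters only, e.g. 'Zoë' -> 'zoe'."""
    folded = unicodedata.normalize("NFKD", part).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z]", "", folded.lower())


def candidate_emails(full_name: str, domain: str) -> list[str]:
    """Return plausible work addresses in a fixed, assumed priority order."""
    tokens = [t for t in (_slug(p) for p in full_name.split()) if t]
    if len(tokens) < 2:
        return []  # need at least a first and a last name
    first, last = tokens[0], tokens[-1]
    local_parts = [
        f"{first}.{last}",    # first.last
        f"{first[0]}{last}",  # flast
        f"{first}{last[0]}",  # firstl
        first,                # assumption: not listed in the source
        f"{first}{last}",     # assumption: not listed in the source
    ]
    seen, out = set(), []
    for local in local_parts:
        if local not in seen:  # de-duplicate while keeping order
            seen.add(local)
            out.append(f"{local}@{domain.lower()}")
    return out


print(candidate_emails("Jane Doe", "acme.com"))
```

In a waterfall, each candidate would then be checked in order and the first one the mail server accepts would be kept.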

Social & public presence

  • Bluesky, Mastodon, Reddit, Stack Overflow, Dev.to, Medium, Hacker News, Wikipedia/Wikidata
  • Keybase + PGP keyservers
  • Cross-platform handle map: probes ~700 sites via WhatsMyName (GitHub, GitLab, npm, PyPI, Hugging Face, Kaggle, Behance, Twitch, YouTube, Substack, Steam, ProductHunt, Strava, Letterboxd, …)
  • Per-platform false-positive filtering against the lead's known name + location
  • BookingLink - Calendly / cal.com / SavvyCal / Tidycal / Hubspot Meetings / Acuity / Bookings / Doodle / YCBM
  • Newsletter - Substack / Beehiiv / Buttondown / Kit / Ghost / Revue

Document evidence

  • Resume / CV / slide-deck discovery via dorked search
  • PDF / DOCX text extraction + emails / phones / URLs / hyperlinks
  • Embedded doc metadata: doc_title, doc_author, doc_subject, doc_producer, doc_creator, doc_created, doc_modified
  • Filename parser - extracts parsed_date, kind_hint (resume / cv / deck / report / invoice / contract / thesis / photo / …), subject_hint
  • OCR fallback - image-only PDFs retried via tesseract + pdf2image
  • Section-aware resume parser - emits structured WorkExperience (title, company, location, start/end dates, bullets) and EducationEntry (institution, degree, field, dates)
  • NER co-occurrence - extracts PERSON / ORG / GPE (location) entities from doc text with proximity scoring vs the primary person's name (closer = more related)
  • MentionedWith relation between people who appear together in the same document
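
To make "closer = more related" concrete, here is one possible way to score proximity between two names in a document. The scoring formula is not documented, so the exponential decay and the scale of 1000 characters are assumptions; they happen to map a 33-character gap to 0.97, but that should not be read as the real formula.

```python
import math
import re


def min_distance_chars(text: str, primary: str, other: str):
    """Smallest character gap between any mention of `primary` and `other`."""
    a = [m.span() for m in re.finditer(re.escape(primary), text, re.IGNORECASE)]
    b = [m.span() for m in re.finditer(re.escape(other), text, re.IGNORECASE)]
    if not a or not b:
        return None  # one of the names never appears
    best = None
    for a_start, a_end in a:
        for b_start, b_end in b:
            # Gap between the end of the earlier mention and the start of the later one.
            gap = max(b_start - a_end, a_start - b_end, 0)
            best = gap if best is None else min(best, gap)
    return best


def proximity_score(distance: int, scale: float = 1000.0) -> float:
    """Assumed exponential decay: 1.0 when adjacent, falling towards 0 when far apart."""
    return round(math.exp(-distance / scale), 2)


doc = "Jane Doe and John Smith co-authored the report. " + "x" * 500 + " Reviewed by Alice Wong."
near = min_distance_chars(doc, "Jane Doe", "John Smith")
far = min_distance_chars(doc, "Jane Doe", "Alice Wong")
print(near, proximity_score(near))
print(far, proximity_score(far))
```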

GitHub depth (developer leads)

  • Profile + organisation stats
  • Public repos, top languages, most-starred repo
  • Recent activity heatmap, timezone inference, most-active hour/day
  • Repo README mining for emails
  • Commit-author email extraction (skips users.noreply.github.com)
  • Co-author / collaborator graph
  • Package ownership across npm, PyPI, crates.io

Compliance & risk

  • OFAC SDN sanctions screen (free public source)
  • PEP / adverse-news search
  • Domain-level breach history

Company / domain side

  • WHOIS - registrar, dates, registrant org
  • DNS - A, MX, NS, TXT, SPF, DMARC
  • Hosting - ASN, ASN organisation, country
  • MX provider detection - Google Workspace / Microsoft 365 / Zoho / self-hosted
  • Tech-stack fingerprint via Wappalyzer
  • CDN detection (Cloudflare, Fastly, Akamai, Vercel, Netlify, …)
  • Cert Transparency subdomain history + cert-issuance emails (crt.sh)
  • Subdomain enumeration
  • Status pages, public API docs
  • ProductHunt history, Wayback Machine timeline
  • LinkedIn company page (auth-walled fields are skipped, not faked)
  • ATS / hiring signals - Greenhouse, Lever, Ashby, Workable
  • Crunchbase, Glassdoor, BuiltWith
  • OrgPeople - scrapes team / about / leadership pages, spawns a Person entity per teammate, enriches each
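
As an illustration of the MX provider detection listed above, the mapping below classifies a domain's MX hostnames by suffix. It is a sketch under stated assumptions: the DNS lookup itself is omitted, classify_mx is a hypothetical name, and anything unmatched is lumped into a single "other or self-hosted" bucket, which is cruder than a real classifier.

```python
# Suffix -> provider label. The suffixes are the MX hostnames these
# providers publish; the labels mirror the list above.
MX_SUFFIXES = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "mail.protection.outlook.com": "Microsoft 365",
    "zoho.com": "Zoho",
    "zoho.eu": "Zoho",
}


def classify_mx(mx_hosts):
    """Return the provider for the first MX host that matches a known suffix."""
    for host in mx_hosts:
        h = host.rstrip(".").lower()  # MX records often end with a dot
        for suffix, provider in MX_SUFFIXES.items():
            if h == suffix or h.endswith("." + suffix):
                return provider
    return "other or self-hosted" if mx_hosts else "unknown"


print(classify_mx(["aspmx.l.google.com."]))
print(classify_mx(["acme-com.mail.protection.outlook.com"]))
print(classify_mx(["mail.acme.com"]))
```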

Single-input demo flows

Just a name

{ "full_name": "Jane Doe" }

→ Scout anchors via dork search + identifier harvest → finds GitHub, LinkedIn, Twitter, personal domain → mines personal site for emails/handles → cross-corroborates → emits Confirmed on the agreed fields.

Just a domain

{ "domain": "acme.com" }

→ Full domain enrichment (WHOIS, DNS, hosting, tech stack, cert history) → scrapes /about (browser + networkidle so SPA content renders) → spawns a Person entity per teammate found on the team page → runs email finder + GitHub + LinkedIn + Gravatar on each → returns one org with N enriched related Persons.

Email + company

{ "email": "founder@acme.com", "company_name": "Acme Inc" }

→ Validates the email, mines breach data, checks SMTP → mines pattern → runs domain enrichment → infers WorksFor → completes person profile from socials.


Pricing

Pay only for results Scout actually delivers. No monthly fee, no per-seat licence, no API keys to manage.

| Event | When it fires | Price |
| --- | --- | --- |
| person_enriched | One Person / Email / Phone primary entity comes back enriched | $0.05 |
| domain_enriched | One Domain / Organization primary entity comes back enriched | $0.15 |
| team_member_spawned | A teammate found via org→people discovery and enriched as a related Person | $0.02 each |

A typical run:

  • {full_name: "Jane Doe"} → $0.05 (single person)
  • {email: "founder@acme.com"} → $0.05 (email expanded into person)
  • {domain: "acme.com"} (returns 8 teammates) → $0.15 + 8 × $0.02 = $0.31
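
The arithmetic above generalises to any mix of events. The following sketch uses the listed prices and Decimal to avoid floating-point rounding; result_fees is a hypothetical helper, and Apify's compute fee is not included.

```python
from decimal import Decimal

# Per-result prices taken from the event list above.
PRICES = {
    "person_enriched": Decimal("0.05"),
    "domain_enriched": Decimal("0.15"),
    "team_member_spawned": Decimal("0.02"),
}


def result_fees(events: dict) -> Decimal:
    """Sum per-result fees for a dict of {event_name: count}."""
    return sum((PRICES[name] * count for name, count in events.items()), Decimal("0"))


# One domain that returns 8 teammates.
print(result_fees({"domain_enriched": 1, "team_member_spawned": 8}))  # 0.31
```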

Scout does not bill when the result is too thin to be useful (SelfGrade.overall_confidence < 0.2 and identity not locked) - you only pay for runs that actually delivered something.
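
When reconciling charges against a dataset, the stated rule can be mirrored in a few lines. This is a reading of the sentence above rather than the actor's billing code; is_billable is a hypothetical name, and missing fields are assumed to count as zero confidence and not locked.

```python
def is_billable(self_grade: dict, threshold: float = 0.2) -> bool:
    """No charge only when confidence is below threshold AND identity is not locked."""
    too_thin = (
        self_grade.get("overall_confidence", 0.0) < threshold
        and not self_grade.get("identity_locked", False)
    )
    return not too_thin


print(is_billable({"overall_confidence": 0.78, "identity_locked": True}))
print(is_billable({"overall_confidence": 0.10, "identity_locked": False}))
```

Note that both conditions must hold for the charge to be skipped: a low-confidence record with a locked identity would still be billed under this reading.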

Apify charges its standard compute fee on top (typically a few cents per run depending on memory + duration).


Use cases

  • Sales / lead enrichment - turn a sparse CRM record into a usable contact + employer dossier; or expand a single domain into a team prospect list with verified emails.
  • OSINT / due diligence - investigate a counterparty, verify identity across multiple independent sources, catch sanctions / adverse-news flags.
  • Recruiting - enrich a candidate's full public footprint, structured employment history from their resume, signals across 700+ sites.
  • CRM hygiene - re-enrich existing leads, surface stale or wrong-person records, dedupe via cross-source identity verification.
  • Investigative reporting - public-source person lookup with per-field provenance.
  • Company research - go from a domain to a fully enriched org + team, including booking links and newsletter URLs for outreach.

Output shape

Each input entity becomes one dataset record:

{
  "primary": {
    "kind": "person",
    "id": "...",
    "FullName": { "value": "Jane Doe", "_added_by": "input", "_added_at": "..." },
    "Email": { "address": "jane@acme.com", "_confidence": 0.85, "_sources": ["email_finder"] },
    "GitHubProfile": { "username": "janedoe", "followers": 142, "..." : "..." },
    "LinkedInUrl": { "value": "https://www.linkedin.com/in/jane-doe" },
    "WorkExperience": [
      { "title": "Senior Software Engineer", "company": "Acme Inc", "start_date": "2022-01", "end_date": "Present", "bullets": [...] }
    ],
    "EducationEntry": [
      { "institution": "Stanford University", "degree": "MSc", "field": "Computer Science", "start_date": "2018", "end_date": "2020" }
    ],
    "BookingLink": [{ "provider": "calendly", "url": "https://calendly.com/jane-doe" }],
    "DocEntityMention": [
      { "entity_type": "person", "value": "John Smith", "proximity_score": 0.97, "min_distance_chars": 33 }
    ],
    "WorksFor": [{ "target_id": "...", "confidence": 0.85, "evidence": ["..."] }],
    "Confirmed": [{ "field": "email", "value": "jane@acme.com", "confidence": 0.95 }],
    "SelfGrade": { "overall_confidence": 0.78, "identity_locked": true, "evidence_strength": "strong" },
    "LeadScore": { "score": 72, "tier": "hot", "persona": "developer" }
    /* + 30+ more component blocks, each stamped with provenance */
  },
  "related": [
    { "kind": "organization", "id": "...", "FullName": { "value": "Acme Inc" }, "OwnsDomain": [...] },
    { "kind": "domain", "id": "...", "Domain": { "value": "acme.com" }, "WhoisData": {...}, "HomepageData": {...} }
  ],
  "_meta": { "ticks_run": 4, "ran": {...}, "skipped": {...}, "failed": {...}, "elapsed_s": 137.4 }
}

Every component carries _added_by (which system emitted it), _added_at (UTC timestamp), _confidence, _sources, and _evidence - so the output is fully auditable.
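
A compliance reviewer might use those provenance fields to flag components that need a second look. The helper below is a hypothetical sketch over a trimmed copy of the sample record; it only inspects the underscore-prefixed _confidence key and assumes components are either a dict or a list of dicts.

```python
def low_confidence_components(entity: dict, threshold: float = 0.9):
    """Return (component, confidence, sources) for components below threshold."""
    rows = []
    for name, block in entity.items():
        items = block if isinstance(block, list) else [block]
        for item in items:
            # Skip scalars and components that carry no _confidence stamp.
            if isinstance(item, dict) and "_confidence" in item:
                if item["_confidence"] < threshold:
                    rows.append((name, item["_confidence"], item.get("_sources", [])))
    return rows


primary = {
    "kind": "person",
    "FullName": {"value": "Jane Doe", "_added_by": "input"},
    "Email": {"address": "jane@acme.com", "_confidence": 0.85, "_sources": ["email_finder"]},
}
print(low_confidence_components(primary))
```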


How does Scout compare?

| | Scout | Apollo / ZoomInfo / Clearbit | Hunter.io | OSINT Industries |
| --- | --- | --- | --- | --- |
| Data source | Public web only - every value tagged with origin | Proprietary database | Email guesses + verification | Public + private mix |
| API key required | None | Yes (paid) | Yes (paid) | Yes (paid) |
| Pricing | Per-result, cents per lead | $1K–$30K+ / yr per seat | $50–$500+ / mo | Subscription |
| Identity verification | Cross-source confidence + surfaced conflicts | Opaque "verified" stamps | Email-only | Mixed |
| Domain → team people | ✅ Spawn each + enrich in same run | Yes (db) | No | No |
| Email finder + SMTP verify | ✅ Built in | Yes (db) | ✅ Yes | No |
| Resume / PDF mining | ✅ Built in (incl. OCR) | Limited | No | No |
| GitHub depth | ✅ Commits, repos, co-authors, packages | None | No | Limited |
| OFAC / sanctions | ✅ Built in | Add-on | No | Yes |
| Output is auditable | ✅ Full ledger + per-component provenance | No | No | Mixed |
| Self-hostable | ✅ Run on your own Apify account | No | No | No |

Scout isn't a 200M-row contact DB. It's a verifiable, one-shot dossier per lead, paid by the result, with provenance your compliance team can audit.


FAQ

How do I find someone's email from just their name? Pass full_name and (optionally) company_name or domain. Scout probes public sources, mines documents and READMEs, runs cross-platform handle searches, and - when a corporate domain is involved - generates plausible work-email patterns and SMTP-verifies them. Success rate is highest when you provide at least one anchor beyond name (domain, company, or any handle).

How does the org → team flow work? Pass domain: "acme.com". Scout enriches the domain (WHOIS / DNS / hosting / tech stack / cert transparency) and also fetches the team / about / leadership pages with a real browser (so React-rendered content lands in the DOM), strips <script>/<style> via selectolax, runs spaCy NER + a strict heuristic name detector, pairs each name with the title text immediately following it, and spawns a Person entity for each teammate. The scheduler then enriches every spawned person on subsequent ticks - so a single domain input returns the org + N enriched team members.
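
The name-to-title pairing step can be illustrated without any NLP dependency. The sketch below is a simplified stand-in: it replaces spaCy NER with a regex, takes the visible text as a list of lines, and relies on hypothetical TITLE_WORDS and NOT_NAMES lists (only "Front Row" and "Calls Completed" are taken from this document). Real team pages need far more robust detection.

```python
import re

# Two or three capitalised words, allowing hyphens and apostrophes.
NAME_RE = re.compile(r"^[A-Z][a-z'-]+(?: [A-Z][a-z'-]+){1,2}$")
# Hypothetical lists: marketing phrases and words that signal a job title.
NOT_NAMES = {"Front Row", "Calls Completed", "Our Team", "About Us"}
TITLE_WORDS = {"Chief", "Officer", "Head", "Director", "Manager", "Engineer", "Founder"}


def looks_like_name(line: str) -> bool:
    if line in NOT_NAMES or not NAME_RE.match(line):
        return False
    return not (set(line.split()) & TITLE_WORDS)


def pair_names_with_titles(lines):
    """Pair each detected name with the text line immediately following it."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    people, i = [], 0
    while i < len(cleaned):
        if looks_like_name(cleaned[i]):
            nxt = cleaned[i + 1] if i + 1 < len(cleaned) else None
            # The next line is the title unless it looks like another person.
            title = nxt if nxt and not looks_like_name(nxt) else None
            people.append({"full_name": cleaned[i], "title": title})
            i += 2 if title else 1
        else:
            i += 1
    return people


page_text = [
    "Our Team", "Jane Doe", "Chief Executive Officer",
    "John Smith", "Head of Sales", "Front Row", "Calls Completed",
]
print(pair_names_with_titles(page_text))
```

Each resulting dict has the same shape as a quick-mode person input, which is what allows a spawned teammate to go through the person pipeline.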

Does Scout work without a domain or email? Yes - pass just full_name. Scout anchors identity from public mentions, but quality_gate.passed will likely be false for very common names, and you'll see multiple alternative hypotheses. Cross-cultural common names are the hardest case.

Is the output legally usable for outreach? Scout surfaces public data. Storing, redistributing, or selling that data may be regulated in your jurisdiction (GDPR, CCPA, PIPEDA, state data-broker laws). See "Use responsibly" below. Scout is a research tool - what you do with the output is your responsibility.

Why don't I see LinkedIn personal-profile data even though the URL is in the output? LinkedIn aggressively gates personal profiles even with stealth + Google Referer. Scout records the URL but won't fake the content if LinkedIn returns 999. Company pages frequently come through; personal pages typically don't.

How fast is it per lead? Default perLeadTimeoutSeconds is 180s. A single person input typically finishes in 30–90s. A domain input that spawns 5–10 team members can take 90–240s as Scout enriches each person. Lower the timeout for predictable cost; raise it for rich inputs you want fully exhausted.

Does it work for non-English names? Yes. Name-alias handling covers Anglo, Slavic, Persian, and several other naming conventions (e.g. nickname-to-formal: "Liz" ↔ "Elizabeth"; cross-cultural diminutives across Slavic / Persian / Arabic given-name traditions). Romanized names work best; CJK names work but disambiguation is harder.

Can I run it on a list of 10,000 leads? Yes - pass them as entities: [...]. Apify scales the actor automatically. Budget ~1.5 minutes per person and ~3–5 minutes per domain (because of team discovery). Account for proxy + upstream rate-limits at high concurrency.
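
For planning, the per-entity budgets above translate into a rough wall-clock figure. This is back-of-the-envelope only: concurrency is a hypothetical parameter (how many entities are processed in parallel is not stated here), domain_min uses the midpoint of the 3–5 minute range, and rate-limit slowdowns are ignored.

```python
def estimate_wall_clock_minutes(persons, domains, concurrency,
                                person_min=1.5, domain_min=4.0):
    """Rough wall-clock estimate in minutes for a bulk run."""
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    total_work = persons * person_min + domains * domain_min
    return total_work / concurrency


# 10,000 person leads with an assumed 50 entities in flight at once.
print(estimate_wall_clock_minutes(10_000, 0, concurrency=50))  # 300.0
```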

Why is some data missing on my run? Several upstream sources rate-limit single IPs. Apify Proxy is strongly recommended. Failures are recorded in _meta.failed and Scout degrades gracefully - you always get a record back.

Will I be charged for runs that found nothing? No. Scout skips the per-result charge when SelfGrade.overall_confidence is below 0.2 and identity isn't locked. You'll still owe Apify's compute fee for the run, but the per-result fee that pays the actor is waived.

Can I customize which sources run? Yes - pass a processors array to enable a subset of enrichment systems. Leaving it empty (the default) runs everything.


Limitations

  • Anti-bot variability. LinkedIn personal profiles, regional LinkedIn subdomains, and aggressive anti-bot sites still return 999/403 even with stealth + Referer. Scout records what it tried and never falsifies content.
  • Identity ambiguity. A common name with no email, handle, or domain is hard to disambiguate. Scout will set quality_gate.passed=false, surface alternatives, and avoid guessing.
  • NER false positives on team pages. spaCy's small English model occasionally tags marketing-copy phrases as PERSON. The post-NER filter catches most ("Front Row", "Calls Completed"); a few may slip through.
  • SMTP RCPT is unreliable in production. Many providers accept-all or block from cloud IPs. A successful SMTP RCPT means "the server didn't reject" - not "this address truly works."
  • Public data only. No customer-supplied API keys, no paid data brokers, no auth-walled content.

Scout never aborts a run because a single source failed - you always get a result, with the gaps clearly marked.


Use responsibly

The output describes real people and organizations. Your jurisdiction may regulate how this kind of data can be stored, redistributed, or sold (GDPR, CCPA, PIPEDA, state data-broker statutes). Scout is a research tool - operating it for due diligence, sales research, recruiting, or investigative work on parties with whom you have a legitimate interest is what it's built for. Bulk dataset construction or resale without lawful basis is on you.

Scout never bypasses authentication, never solves CAPTCHAs, and never exceeds public-rate-limit guidance. If a source returns "auth required" or rate-limits the request, the corresponding field is left null and the failure is recorded.

