Scout — Lead Enrichment + OSINT
Pricing
from $50.00 / 1,000 person enrichments
Email finder + lead enrichment + OSINT from public sources. Pass any fragment (name, email, or domain) and get a verified dossier: 700+ identity sites, SMTP-validated emails, document mining, sanctions screen, domain→team discovery. $0.05 per person, $0.15 per domain. No API keys.
Developer
Logical Vivacity
Scout - OSINT + Lead Enrichment from Public Sources
Send Scout out with any fragment - a name, an email, a domain, a handle - and it brings back a verified, audit-ready dossier: cross-platform identity, work emails, employer firmographics, document evidence, and the org's actual team. No customer API keys. No per-seat pricing. Pay only per result.
Scout is a single Apify actor that spans the gap between an OSINT tool and a lead-enrichment platform. From a single field of input it returns a structured JSON dossier: verified identity, social presence across 700+ platforms, work-email waterfall, public document evidence, sanctions screening, full company firmographics - and when you scout a domain, the org's actual people, each spawned as a related entity and enriched in the same run.
Pricing: $0.05 per person, $0.15 per domain/org, $0.02 per spawned team member. Compute on top (Apify default). No subscriptions, no API keys, no surprises.
What's in the box
- 🔍 Org → People discovery. Pass `domain: "acme.com"` and Scout walks the team / about / leadership pages, NER-extracts each person, pairs them with their title, and spawns a Person entity per teammate - then runs the full person-enrichment pipeline on each (email finder, GitHub, LinkedIn, Gravatar, …). One domain input → fully populated team graph.
- 📧 Email finder waterfall. Given `(name, domain)` Scout generates plausible local-parts (`first.last`, `flast`, `firstl`, …), runs SMTP RCPT against each, returns the first verified hit with full provenance.
- 📄 Document mining. Filename signals (date / kind / subject), embedded PDF/DOCX metadata (Author / Producer / Created), OCR fallback for image-only PDFs, NER-based co-occurrence with proximity scoring (closer name = stronger relation), section-aware resume parsing → structured `WorkExperience` and `EducationEntry` entries.
- 🔗 High-signal individual systems. Booking links (Calendly, cal.com, SavvyCal, Hubspot Meetings, …). Newsletter detection (Substack, Beehiiv, Buttondown, Kit, Ghost). Cert Transparency for subdomain history.
- 🥷 Stealth fetch. httpx-first with smart browser fallback for anti-bot hosts, `playwright-stealth` patches, randomised viewport, Google Referer header, and a curated list of "go straight to browser" hosts (LinkedIn, Twitter, Glassdoor, Crunchbase, …).
- 🧹 Field normalization at the model level. Phones canonicalise to E.164, dates to ISO, URLs lose `www.` / fragments / trailing slashes, handles drop leading `@`, emails lowercase, LinkedIn URLs canonicalise to `https://<host>/in/<slug>`.
- ⚡ Single-entity quick mode. No more "wrap in `entities: [...]`" - just paste `full_name` or `email` or `domain` directly into the input.
- 📜 Per-component provenance. Every component carries `_added_by`, `_added_at`, `_confidence`, `_sources`, `_evidence` - your compliance team can audit every datum.
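The normalization rules in the list above can be approximated with the standard library. This is a minimal sketch, not Scout's actual code: the function names are hypothetical, phone (E.164) and date handling are left out because they need locale data, and keeping the host as-is in the LinkedIn case is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_handle(handle: str) -> str:
    """Drop leading @ characters and lowercase the handle."""
    return handle.strip().lstrip("@").lower()


def normalize_email(email: str) -> str:
    """Lowercase the whole address."""
    return email.strip().lower()


def normalize_url(url: str) -> str:
    """Drop a leading www., the fragment, and trailing slashes."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")
    # An empty last element discards the fragment.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def normalize_linkedin(url: str) -> Optional[str]:
    """Reduce a personal-profile URL to https://<host>/in/<slug>, or None if it is not one."""
    parts = urlsplit(url.strip())
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) < 2 or segments[0] != "in":
        return None
    # Assumption: the host is kept as given (only lowercased).
    return f"https://{parts.netloc.lower()}/in/{segments[1]}"


print(normalize_handle("@JaneDoe"))
print(normalize_url("https://www.Example.com/about/#team"))
print(normalize_linkedin("http://www.linkedin.com/in/jane-doe/?trk=public"))
```

Normalizing before comparison is what lets values from different sources be matched against each other.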
Two ways to brief Scout
1) Quick mode (typed inputs, single entity)
Just fill the fields you have. No JSON wrapping required.
{"full_name": "Jane Doe","email": "jane@acme.com"}
{"domain": "acme.com"}
{"linkedin_url": "https://www.linkedin.com/in/jane-doe"}
2) Bulk mode (entities array)
For multi-lead runs.
{"entities": [{ "kind": "person", "full_name": "Jane Doe", "email": "jane@acme.com" },{ "kind": "domain", "domain": "example.com" },{ "kind": "organization", "name": "Acme Inc", "domain": "acme.com" }]}
Scout infers kind from whatever fields you pass - set it explicitly only when you want to override the heuristic.
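For bulk runs that start from a spreadsheet export, a small helper can build the `entities` payload. The sketch below is illustrative only: `build_bulk_input` is a hypothetical name, the CSV column names are assumed to match Scout's input fields, and `kind` is deliberately left unset so Scout infers it.

```python
import csv
import io
import json


def build_bulk_input(csv_text: str) -> dict:
    """Turn CSV rows into a bulk-mode payload, dropping empty cells and blank rows."""
    entities = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entity = {
            key: value.strip()
            for key, value in row.items()
            if key and value and value.strip()
        }
        if entity:  # every entity needs at least one field
            entities.append(entity)
    return {"entities": entities}


SAMPLE = """full_name,email,domain
Jane Doe,jane@acme.com,
,,example.com
,,
"""

payload = build_bulk_input(SAMPLE)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be pasted into the actor input or passed as the run input through the Apify API.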
Input field reference
All fields are optional individually; pass at least one.
Person fields
| Field | Type | Notes |
|---|---|---|
| `full_name` | string | Most useful single-anchor input |
| `first_name` / `last_name` | string | Use if you only have parts |
| `email` | string | Auto-validated (syntax, MX, SMTP, breach) |
| `phone` | string | Any format → E.164 |
| `company_name` | string | Combined with `full_name` enables email finder |
| `title` | string | Job title / role |
| `location` | string | City / country - for disambiguation |
| `linkedin_url` | string | Auto-canonicalised |
| `github_username` | string | Stripped of @, lowercased |
| `twitter_handle` | string | Stripped of @, lowercased |
| `bluesky_handle` / `mastodon_handle` | string | |
| `notes` | string | Free-form context attached to the entity |
Organization / domain fields
| Field | Type | Notes |
|---|---|---|
| `domain` | string | Triggers the full org pipeline + team discovery |
| `name` / `company_name` | string | Org / company name |
Run controls
| Field | Type | Default | Notes |
|---|---|---|---|
| `kind` | enum | auto | person / organization / domain / email / phone / blank |
| `processors` | array | all | Subset of enrichment systems (advanced) |
| `proxyConfiguration` | proxy | Apify proxy on | Strongly recommend keeping enabled |
| `headless` | boolean | true | Browser headless |
| `perLeadTimeoutSeconds` | int | 180 | Total runtime ceiling per top-level entity |
What Scout brings back per run
Identity & verification
- Verified `FullName`, `Email`, `Phone` (E.164), `Location`, primary social handles
- Cross-source `Confirmed` components per field with which sources agreed
- `Conflict` components surfacing values Scout didn't commit to
- `SelfGrade` - per-field confidence + an overall `identity_locked` flag + rationale
- `EntityTypeClassification` - developer / executive / academic / creator / marketer / unknown
- `QualityGate` with input-ambiguity score and suggested extra inputs

- RFC syntax + deliverability + MX + SMTP RCPT
- Disposable + public-mailbox detection
- Domain-level breach exposure (HaveIBeenPwned)
- `EmailPattern` detection (`first.last@`, `flast@`, …)
- Email finder - generates `name@domain` candidates and SMTP-verifies them when you have a name + an org domain anchor
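The candidate-generation step of the email finder can be sketched as follows. Only the three patterns this README names (`first.last`, `flast`, `firstl`) are generated; the ordering, the ASCII folding, and the function names are assumptions, and SMTP verification is left out.

```python
import re
import unicodedata


def _ascii_slug(name: str) -> str:
    """Fold accents to ASCII and keep lowercase letters only."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())


def candidate_emails(full_name: str, domain: str) -> list[str]:
    """Return candidate work addresses for a name + domain pair, most specific first."""
    parts = full_name.split()
    if len(parts) < 2:
        return []  # a single token gives no first/last split
    first, last = _ascii_slug(parts[0]), _ascii_slug(parts[-1])
    if not first or not last:
        return []
    local_parts = [
        f"{first}.{last}",    # first.last
        f"{first[0]}{last}",  # flast
        f"{first}{last[0]}",  # firstl
    ]
    # dict.fromkeys removes duplicates while keeping order.
    unique = dict.fromkeys(local_parts)
    return [f"{local}@{domain.lower()}" for local in unique]


print(candidate_emails("Jane Doe", "acme.com"))
```

Each candidate would then be checked in order, stopping at the first address the mail server accepts.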
Social & public presence
- Bluesky, Mastodon, Reddit, Stack Overflow, Dev.to, Medium, Hacker News, Wikipedia/Wikidata
- Keybase + PGP keyservers
- Cross-platform handle map: probes ~700 sites via WhatsMyName (GitHub, GitLab, npm, PyPI, Hugging Face, Kaggle, Behance, Twitch, YouTube, Substack, Steam, ProductHunt, Strava, Letterboxd, …)
- Per-platform false-positive filtering against the lead's known name + location
- `BookingLink` - Calendly / cal.com / SavvyCal / Tidycal / Hubspot Meetings / Acuity / Bookings / Doodle / YCBM
- `Newsletter` - Substack / Beehiiv / Buttondown / Kit / Ghost / Revue
Document evidence
- Resume / CV / slide-deck discovery via dorked search
- PDF / DOCX text extraction + emails / phones / URLs / hyperlinks
- Embedded doc metadata: `doc_title`, `doc_author`, `doc_subject`, `doc_producer`, `doc_creator`, `doc_created`, `doc_modified`
- Filename parser - extracts `parsed_date`, `kind_hint` (resume / cv / deck / report / invoice / contract / thesis / photo / …), `subject_hint`
- OCR fallback - image-only PDFs retried via `tesseract` + `pdf2image`
- Section-aware resume parser - emits structured `WorkExperience` (title, company, location, start/end dates, bullets) and `EducationEntry` (institution, degree, field, dates)
- NER co-occurrence - extracts `PERSON` / `ORG` / `GPE` (location) entities from doc text with proximity scoring vs the primary person's name (closer = more related)
- MentionedWith relation between people who appear together in the same document
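To make "closer = more related" concrete, here is one way a proximity score could be computed. This is an illustrative sketch, not Scout's formula: plain case-insensitive string matching stands in for spaCy entity spans, and the exponential decay with a 1,000-character scale is a made-up choice.

```python
import math
import re


def min_distance_chars(text: str, primary: str, other: str):
    """Smallest character gap between any mention of `primary` and any mention of `other`."""
    primary_spans = [m.span() for m in re.finditer(re.escape(primary), text, flags=re.IGNORECASE)]
    other_spans = [m.span() for m in re.finditer(re.escape(other), text, flags=re.IGNORECASE)]
    if not primary_spans or not other_spans:
        return None  # one of the two names never appears
    best = None
    for p_start, p_end in primary_spans:
        for o_start, o_end in other_spans:
            if o_start >= p_end:
                gap = o_start - p_end
            elif p_start >= o_end:
                gap = p_start - o_end
            else:
                gap = 0  # overlapping mentions
            best = gap if best is None else min(best, gap)
    return best


def proximity_score(distance: int, scale: float = 1000.0) -> float:
    """Map a character distance to (0, 1]; the decay scale is a hypothetical parameter."""
    return round(math.exp(-distance / scale), 2)


text = "Jane Doe presented with John Smith."
gap = min_distance_chars(text, "Jane Doe", "John Smith")
print(gap, proximity_score(gap))
```

Any monotonically decreasing function of distance would serve the same purpose; the decay curve only controls how quickly distant mentions stop counting.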
GitHub depth (developer leads)
- Profile + organisation stats
- Public repos, top languages, most-starred repo
- Recent activity heatmap, timezone inference, most-active hour/day
- Repo README mining for emails
- Commit-author email extraction (skips `users.noreply.github.com`)
- Co-author / collaborator graph
- Package ownership across npm, PyPI, crates.io
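The noreply filter on commit-author emails could look like the sketch below. The function name and the deduplication behaviour are assumptions; the only detail taken from this README is that `users.noreply.github.com` addresses are skipped.

```python
NOREPLY_DOMAIN = "users.noreply.github.com"


def usable_commit_emails(author_emails):
    """Keep well-formed, non-noreply addresses, lowercased and deduplicated in order."""
    seen = set()
    kept = []
    for raw in author_emails:
        email = raw.strip().lower()
        if email.count("@") != 1:
            continue  # not a usable address
        domain = email.split("@", 1)[1]
        if domain == NOREPLY_DOMAIN:
            continue  # GitHub privacy address, not a real mailbox
        if email not in seen:
            seen.add(email)
            kept.append(email)
    return kept


emails = [
    "Jane@Acme.com",
    "12345+janedoe@users.noreply.github.com",
    "jane@acme.com",
    "not-an-email",
]
print(usable_commit_emails(emails))
```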
Compliance & risk
- OFAC SDN sanctions screen (free public source)
- PEP / adverse-news search
- Domain-level breach history
Company / domain side
- WHOIS - registrar, dates, registrant org
- DNS - A, MX, NS, TXT, SPF, DMARC
- Hosting - ASN, ASN organisation, country
- MX provider detection - Google Workspace / Microsoft 365 / Zoho / self-hosted
- Tech-stack fingerprint via Wappalyzer
- CDN detection (Cloudflare, Fastly, Akamai, Vercel, Netlify, …)
- Cert Transparency subdomain history + cert-issuance emails (crt.sh)
- Subdomain enumeration
- Status pages, public API docs
- ProductHunt history, Wayback Machine timeline
- LinkedIn company page (auth-walled fields are skipped, not faked)
- ATS / hiring signals - Greenhouse, Lever, Ashby, Workable
- Crunchbase, Glassdoor, BuiltWith
- OrgPeople - scrapes team / about / leadership pages, spawns a Person entity per teammate, enriches each
Single-input demo flows
Just a name
{ "full_name": "Jane Doe" }
→ Scout anchors via dork search + identifier harvest → finds GitHub, LinkedIn, Twitter, personal domain → mines personal site for emails/handles → cross-corroborates → emits Confirmed on the agreed fields.
Just a domain
{ "domain": "acme.com" }
→ Full domain enrichment (WHOIS, DNS, hosting, tech stack, cert history) → scrapes /about (browser + networkidle so SPA content renders) → spawns a Person entity per teammate found on the team page → runs email finder + GitHub + LinkedIn + Gravatar on each → returns one org with N enriched related Persons.
Email + company
{ "email": "founder@acme.com", "company_name": "Acme Inc" }
→ Validates the email, mines breach data, checks SMTP → mines pattern → runs domain enrichment → infers WorksFor → completes person profile from socials.
Pricing
Pay only for results Scout actually delivers. No monthly fee, no per-seat licence, no API keys to manage.
| Event | When it fires | Price |
|---|---|---|
| `person_enriched` | One Person / Email / Phone primary entity comes back enriched | $0.05 |
| `domain_enriched` | One Domain / Organization primary entity comes back enriched | $0.15 |
| `team_member_spawned` | A teammate found via org→people discovery and enriched as a related Person | $0.02 each |
A typical run:
- `{full_name: "Jane Doe"}` → $0.05 (single person)
- `{email: "founder@acme.com"}` → $0.05 (email expanded into person)
- `{domain: "acme.com"}` (returns 8 teammates) → $0.15 + 8 × $0.02 = $0.31
Scout does not bill when the result is too thin to be useful (SelfGrade.overall_confidence < 0.2 and identity not locked) - you only pay for runs that actually delivered something.
Apify charges its standard compute fee on top (typically a few cents per run depending on memory + duration).
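The per-result arithmetic above is easy to reproduce when budgeting a run. The sketch below uses the three event prices from the table and the stated no-charge rule; the function names are hypothetical, Apify compute is not included, and whether the thin-result rule also applies to spawned team members is not stated here, so it is only applied to primary entities.

```python
# Event prices in cents, taken from the pricing table.
PRICE_CENTS = {
    "person_enriched": 5,
    "domain_enriched": 15,
    "team_member_spawned": 2,
}


def is_billable(overall_confidence: float, identity_locked: bool) -> bool:
    """A result is waived only when confidence is below 0.2 AND identity is not locked."""
    return not (overall_confidence < 0.2 and not identity_locked)


def estimate_cost(persons: int = 0, domains: int = 0, team_members: int = 0) -> float:
    """Per-result fee in dollars, excluding Apify compute. Integer cents avoid float drift."""
    total_cents = (
        persons * PRICE_CENTS["person_enriched"]
        + domains * PRICE_CENTS["domain_enriched"]
        + team_members * PRICE_CENTS["team_member_spawned"]
    )
    return total_cents / 100


print(estimate_cost(domains=1, team_members=8))  # the acme.com example: 0.31
```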
Use cases
- Sales / lead enrichment - turn a sparse CRM record into a usable contact + employer dossier; or expand a single domain into a team prospect list with verified emails.
- OSINT / due diligence - investigate a counterparty, verify identity across multiple independent sources, catch sanctions / adverse-news flags.
- Recruiting - enrich a candidate's full public footprint, structured employment history from their resume, signals across 700+ sites.
- CRM hygiene - re-enrich existing leads, surface stale or wrong-person records, dedupe via cross-source identity verification.
- Investigative reporting - public-source person lookup with per-field provenance.
- Company research - go from a domain to a fully enriched org + team, including booking links and newsletter URLs for outreach.
Output shape
Each input entity becomes one dataset record:
{
  "primary": {
    "kind": "person",
    "id": "...",
    "FullName": { "value": "Jane Doe", "_added_by": "input", "_added_at": "..." },
    "Email": { "address": "jane@acme.com", "_confidence": 0.85, "_sources": ["email_finder"] },
    "GitHubProfile": { "username": "janedoe", "followers": 142, "..." : "..." },
    "LinkedInUrl": { "value": "https://www.linkedin.com/in/jane-doe" },
    "WorkExperience": [
      { "title": "Senior Software Engineer", "company": "Acme Inc", "start_date": "2022-01", "end_date": "Present", "bullets": [...] }
    ],
    "EducationEntry": [
      { "institution": "Stanford University", "degree": "MSc", "field": "Computer Science", "start_date": "2018", "end_date": "2020" }
    ],
    "BookingLink": [{ "provider": "calendly", "url": "https://calendly.com/jane-doe" }],
    "DocEntityMention": [{ "entity_type": "person", "value": "John Smith", "proximity_score": 0.97, "min_distance_chars": 33 }],
    "WorksFor": [{ "target_id": "...", "confidence": 0.85, "evidence": ["..."] }],
    "Confirmed": [{ "field": "email", "value": "jane@acme.com", "confidence": 0.95 }],
    "SelfGrade": { "overall_confidence": 0.78, "identity_locked": true, "evidence_strength": "strong" },
    "LeadScore": { "score": 72, "tier": "hot", "persona": "developer" }
    /* + 30+ more component blocks, each stamped with provenance */
  },
  "related": [
    { "kind": "organization", "id": "...", "FullName": { "value": "Acme Inc" }, "OwnsDomain": [...] },
    { "kind": "domain", "id": "...", "Domain": { "value": "acme.com" }, "WhoisData": {...}, "HomepageData": {...} }
  ],
  "_meta": { "ticks_run": 4, "ran": {...}, "skipped": {...}, "failed": {...}, "elapsed_s": 137.4 }
}
Every component carries _added_by (which system emitted it), _added_at (UTC timestamp), _confidence, _sources, and _evidence - so the output is fully auditable.
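Because provenance sits on each component, a reviewer can scan a record for weakly supported values. The following is a rough sketch under assumptions: the record is a trimmed, partly made-up sample (the `Phone` entry and its values are invented for illustration), the 0.5 threshold is arbitrary, and only the `_confidence` key is inspected.

```python
def iter_components(entity: dict):
    """Yield (name, component) for dict-valued and list-of-dict-valued fields."""
    for name, value in entity.items():
        if isinstance(value, dict):
            yield name, value
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    yield name, item


def low_confidence(record: dict, threshold: float = 0.5):
    """List (component, confidence, sources) for components below the threshold."""
    flagged = []
    for name, component in iter_components(record.get("primary", {})):
        confidence = component.get("_confidence")
        if confidence is not None and confidence < threshold:
            flagged.append((name, confidence, component.get("_sources", [])))
    return flagged


# Trimmed sample; the Phone component is hypothetical.
record = {
    "primary": {
        "kind": "person",
        "Email": {"address": "jane@acme.com", "_confidence": 0.85, "_sources": ["email_finder"]},
        "Phone": {"value": "+15550100", "_confidence": 0.3, "_sources": ["document"]},
    }
}

for name, confidence, sources in low_confidence(record):
    print(f"{name}: confidence {confidence} from {sources}")
```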
How does Scout compare?
| | Scout | Apollo / ZoomInfo / Clearbit | Hunter.io | OSINT Industries |
|---|---|---|---|---|
| Data source | Public web only - every value tagged with origin | Proprietary database | Email guesses + verification | Public + private mix |
| API key required | None | Yes (paid) | Yes (paid) | Yes (paid) |
| Pricing | Per-result, cents per lead | $1K–$30K+ / yr per seat | $50–$500+ / mo | Subscription |
| Identity verification | Cross-source confidence + surfaced conflicts | Opaque "verified" stamps | Email-only | Mixed |
| Domain → team people | ✅ Spawn each + enrich in same run | Yes (db) | No | No |
| Email finder + SMTP verify | ✅ Built in | Yes (db) | ✅ Yes | No |
| Resume / PDF mining | ✅ Built in (incl. OCR) | Limited | No | No |
| GitHub depth | ✅ Commits, repos, co-authors, packages | None | No | Limited |
| OFAC / sanctions | ✅ Built in | Add-on | No | Yes |
| Output is auditable | ✅ Full ledger + per-component provenance | No | No | Mixed |
| Self-hostable | ✅ Run on your own Apify account | No | No | No |
Scout isn't a 200M-row contact DB. It's a verifiable, one-shot dossier per lead, paid by the result, with provenance your compliance team can audit.
FAQ
How do I find someone's email from just their name?
Pass full_name and (optionally) company_name or domain. Scout probes public sources, mines documents and READMEs, runs cross-platform handle searches, and - when a corporate domain is involved - generates plausible work-email patterns and SMTP-verifies them. Success rate is highest when you provide at least one anchor beyond name (domain, company, or any handle).
How does the org → team flow work?
Pass domain: "acme.com". Scout enriches the domain (WHOIS / DNS / hosting / tech stack / cert transparency) and also fetches the team / about / leadership pages with a real browser (so React-rendered content lands in the DOM), strips <script>/<style> via selectolax, runs spaCy NER + a strict heuristic name detector, pairs each name with the title text immediately following it, and spawns a Person entity for each teammate. The scheduler then enriches every spawned person on subsequent ticks - so a single domain input returns the org + N enriched team members.
Does Scout work without a domain or email?
Yes - pass just full_name. Scout anchors identity from public mentions, but quality_gate.passed will likely be false for very common names, and you'll see multiple alternative hypotheses. Cross-cultural common names are the hardest case.
Is the output legally usable for outreach?
Scout surfaces public data. Storing, redistributing, or selling that data may be regulated in your jurisdiction (GDPR, CCPA, PIPEDA, state data-broker laws). See "Use responsibly" below. Scout is a research tool - what you do with the output is your responsibility.
Why don't I see LinkedIn personal-profile data even though the URL is in the output?
LinkedIn aggressively gates personal profiles even with stealth + Google Referer. Scout records the URL but won't fake the content if LinkedIn returns 999. Company pages frequently come through; personal pages typically don't.
How fast is it per lead?
Default perLeadTimeoutSeconds is 180s. A single person input typically finishes in 30–90s. A domain input that spawns 5–10 team members can take 90–240s as Scout enriches each person; the upper end of that range is longer than the 180s default, so raise perLeadTimeoutSeconds for domains with larger teams. Lower the timeout for predictable cost; raise it for rich inputs you want fully exhausted.
Does it work for non-English names?
Yes. Name-alias handling covers Anglo, Slavic, Persian, and several other naming conventions (e.g. nickname-to-formal: "Liz" ↔ "Elizabeth"; cross-cultural diminutives across Slavic / Persian / Arabic given-name traditions). Romanized names work best; CJK names work but disambiguation is harder.
Can I run it on a list of 10,000 leads?
Yes - pass them as entities: [...]. Apify scales the actor automatically. Budget ~1.5 minutes per person and ~3–5 minutes per domain (because of team discovery). Account for proxy + upstream rate-limits at high concurrency.
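A back-of-the-envelope estimate helps when sizing a large run. The sketch below uses the per-lead budgets quoted above (1.5 minutes per person, the midpoint of 3–5 minutes per domain); the concurrency figure is a hypothetical input, since the effective parallelism depends on your Apify plan and upstream rate limits.

```python
def estimate_wall_clock_minutes(
    persons: int,
    domains: int,
    concurrency: int,
    minutes_per_person: float = 1.5,
    minutes_per_domain: float = 4.0,
) -> float:
    """Total lead-minutes divided by the number of leads processed in parallel."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    total_lead_minutes = persons * minutes_per_person + domains * minutes_per_domain
    return total_lead_minutes / concurrency


# 10,000 person leads at an assumed concurrency of 50.
minutes = estimate_wall_clock_minutes(persons=10_000, domains=0, concurrency=50)
print(f"{minutes:.0f} minutes (~{minutes / 60:.1f} hours)")
```

At $0.05 per person, the same 10,000 leads come to at most $500 in per-result fees before compute.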
Why is some data missing on my run?
Several upstream sources rate-limit single IPs. Apify Proxy is strongly recommended. Failures are recorded in _meta.failed and Scout degrades gracefully - you always get a record back.
Will I be charged for runs that found nothing?
No. Scout skips the per-result charge when SelfGrade.overall_confidence is below 0.2 and identity isn't locked. You'll still owe Apify's compute fee for the run, but the per-result fee that pays the actor is waived.
Can I customize which sources run?
Yes - pass a processors array to enable a subset of enrichment systems. Default empty = run everything.
Limitations
- Anti-bot variability. LinkedIn personal profiles, regional LinkedIn subdomains, and aggressive anti-bot sites still return 999/403 even with stealth + Referer. Scout records what it tried and never falsifies a result.
- Identity ambiguity. A common first name with no email, handle, or domain is hard to disambiguate. Scout will set `quality_gate.passed=false`, surface alternatives, and avoid guessing.
- NER false positives on team pages. spaCy's small English model occasionally tags marketing-copy phrases as PERSON. The post-NER filter catches most ("Front Row", "Calls Completed"); a few may slip through.
- SMTP RCPT is unreliable in production. Many providers accept-all or block from cloud IPs. A successful SMTP RCPT means "the server didn't reject" - not "this address truly works."
- Public data only. No customer-supplied API keys, no paid data brokers, no auth-walled content.
Scout never raises on a single source failure - you always get a result, with the gaps clearly marked.
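The SMTP caveat above can be made concrete by comparing the reply for a candidate address with the reply for a control address that almost certainly does not exist on the same domain. This is a general technique sketched under assumptions, not a description of what Scout does internally; the names are hypothetical and the reply codes are supplied by the caller rather than fetched over the network.

```python
from enum import Enum


class Verdict(Enum):
    REJECTED = "rejected"          # server refused the candidate
    ACCEPT_ALL = "accept_all"      # server accepts anything, so acceptance proves nothing
    LIKELY_VALID = "likely_valid"  # candidate accepted, control refused
    UNKNOWN = "unknown"            # temporary failure or inconclusive replies


def classify_rcpt(candidate_code: int, control_code: int) -> Verdict:
    """Interpret RCPT TO reply codes for a candidate and a random control mailbox."""
    if 500 <= candidate_code < 600:
        # A blanket policy block on cloud IPs looks the same as a missing mailbox.
        return Verdict.REJECTED
    if 200 <= candidate_code < 300:
        if 200 <= control_code < 300:
            return Verdict.ACCEPT_ALL
        if 500 <= control_code < 600:
            return Verdict.LIKELY_VALID
    return Verdict.UNKNOWN  # 4xx replies (e.g. greylisting) fall through here


print(classify_rcpt(250, 250).value)
print(classify_rcpt(250, 550).value)
```

Treating an accept-all domain as unverified rather than verified keeps the confidence attached to an address honest.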
Use responsibly
The output describes real people and organizations. Your jurisdiction may regulate how this kind of data can be stored, redistributed, or sold (GDPR, CCPA, PIPEDA, state data-broker statutes). Scout is a research tool - operating it for due diligence, sales research, recruiting, or investigative work on parties with whom you have a legitimate interest is what it's built for. Bulk dataset construction or resale without lawful basis is on you.
Scout never bypasses authentication, never solves CAPTCHAs, and never exceeds public-rate-limit guidance. If a source returns "auth required" or rate-limits the request, the corresponding field is left null and the failure is recorded.