Deduped Job Intelligence for AI Agents

Pricing

from $3.50 / 1,000 valid jobs

Extract, normalize, and deduplicate public job postings into clean hiring-signal records with source evidence, role classification, and confidence scores.

Rating

0.0

(0)

Developer

DeepAPI

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 0
Last modified: 2 days ago

Extract, normalize, and deduplicate public job postings into clean hiring-signal records with source evidence, role classification, duplicate groups, and confidence scores.

Use this Actor when you need hiring intelligence rather than another raw job feed. Provide public careers, job-board, or direct posting URLs, and the Actor returns validated dataset rows that are ready for sales workflows, recruiting research, market analysis, and AI agents.

What this Actor does

  • Crawls user-supplied public careers, job-board, and direct job posting URLs.
  • Extracts job candidates from supported ATS/job-board pages and fallback job links.
  • Visits extracted job detail pages to enrich records with location, remote type, seniority, and evidence when available.
  • Extracts public salary or compensation ranges from structured job data and visible job detail text when available.
  • Normalizes job titles for cleaner grouping and analysis.
  • Detects common role functions, seniority signals, locations, and remote types when present.
  • Matches optional role keywords such as account executive, customer success, or sales engineer.
  • Groups duplicate postings with a stable duplicateGroupId.
  • Preserves source URLs and evidence for auditability.
  • Filters output with maxJobs and minConfidence.
  • Returns validated dataset rows suitable for automation and AI-agent workflows.
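
The title normalization and duplicate grouping steps above can be pictured with a short Python sketch. It assumes, based only on the sample output on this page, that duplicateGroupId joins a slug of the company name with a slug of the normalized title. The Actor's actual rules are not documented here, and every helper name below is hypothetical.

```python
import re
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Drop punctuation and collapse whitespace (assumed normalization)."""
    cleaned = re.sub(r"[^\w\s]", " ", title)
    return " ".join(cleaned.split())


def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def duplicate_group_id(company: str, title: str) -> str:
    # Assumed shape "<company-slug>:<title-slug>", inferred from the sample record.
    return f"{slugify(company)}:{slugify(normalize_title(title))}"


def group_postings(postings):
    """Merge postings that share a group id and collect their source URLs."""
    groups = defaultdict(list)
    for posting in postings:
        key = duplicate_group_id(posting["companyName"], posting["jobTitle"])
        groups[key].append(posting["sourceUrl"])
    return [
        {"duplicateGroupId": key, "sourceUrls": urls, "duplicateCount": len(urls)}
        for key, urls in groups.items()
    ]


postings = [
    {"companyName": "Notion", "jobTitle": "Account Executive, Commercial",
     "sourceUrl": "https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149"},
    {"companyName": "Notion", "jobTitle": "Account Executive, Commercial",
     "sourceUrl": "https://jobs.ashbyhq.com/notion/fdc2a6c0-396a-45db-b465-683bacf4201e"},
]
merged = group_postings(postings)
```

Because the key depends only on the company and the normalized title, the same posting found at two URLs collapses into one group with both URLs preserved.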

Use cases

  • B2B sales teams using hiring as an account signal.
  • Lead generation agencies enriching company lists.
  • Recruiters monitoring active roles across target companies.
  • Investors and analysts tracking hiring momentum.
  • AI agents that need normalized hiring data rather than raw HTML or duplicate job feeds.

Supported source types

Use public source URLs such as company careers pages, public job-board pages, direct public job posting URLs, and public ATS pages.

The output schema currently detects these providers when available: Greenhouse, Lever, Ashby, Workable, SmartRecruiters, Teamtailor, Breezy, and Workday.

Provider coverage depends on the public page structure available at crawl time.
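
As a rough illustration of provider detection, the sketch below maps a source URL's hostname to a provider label. This is a simplification: the Actor is described as depending on public page structure, not only on hostnames. The ashbyhq.com and lever.co hosts come from the examples on this page; the other hostnames and all labels except "ashby" are assumptions, and detect_provider is a hypothetical helper.

```python
from urllib.parse import urlparse

# Hostname suffix -> provider label. Entries marked "from this page" appear in
# the examples here; the remaining hosts and labels are assumptions.
PROVIDER_HOSTS = {
    "ashbyhq.com": "ashby",            # from this page
    "lever.co": "lever",               # host from this page, label assumed
    "greenhouse.io": "greenhouse",
    "workable.com": "workable",
    "smartrecruiters.com": "smartrecruiters",
    "teamtailor.com": "teamtailor",
    "breezy.hr": "breezy",
    "myworkdayjobs.com": "workday",
}


def detect_provider(url: str):
    """Return a provider label for a known ATS hostname, else None."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, provider in PROVIDER_HOSTS.items():
        if host == suffix or host.endswith("." + suffix):
            return provider
    return None


ashby_provider = detect_provider("https://jobs.ashbyhq.com/notion")
lever_provider = detect_provider("https://jobs.lever.co/posthog")
unknown_provider = detect_provider("https://example.com/careers")
```

A company careers page on its own domain returns None in this sketch, which is the case where fallback job-link extraction would matter.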

Input

Use public HTTP(S) sourceUrls as the primary input. Optional filters let you cap results, match role keywords, filter by locations, include duplicate source URLs, set clean-mode requirements, and set a minimum confidence threshold.

{
  "sourceUrls": [
    "https://jobs.ashbyhq.com/notion",
    "https://jobs.lever.co/posthog"
  ],
  "maxJobs": 25,
  "roleKeywords": [
    "account executive",
    "customer success",
    "sales engineer",
    "solutions engineer",
    "engineer"
  ],
  "locations": ["United States", "Remote"],
  "includeSourceDuplicates": true,
  "requireLocation": false,
  "requireSalary": false,
  "requireRoleKeywordMatch": false,
  "minConfidence": 0.65
}

Clean mode toggles can return only jobs that are ready for downstream routing:

| Toggle | Effect |
| --- | --- |
| requireLocation | Return only jobs with a detected location. |
| requireSalary | Return only jobs with a detected public salary range. |
| requireRoleKeywordMatch | Return only jobs that matched at least one configured role keyword. |
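
To make the combined effect of the clean-mode toggles, minConfidence, and maxJobs concrete, here is a minimal Python sketch of an equivalent client-side filter. The option and field names come from this page; the function names are hypothetical, and applying maxJobs after the other filters is an assumption about ordering, not documented behavior.

```python
def passes_clean_mode(job, options):
    """Check one record against minConfidence and the three clean-mode toggles."""
    if job.get("confidence", 0.0) < options.get("minConfidence", 0.0):
        return False
    if options.get("requireLocation") and not job.get("location"):
        return False
    if options.get("requireSalary") and not job.get("salary"):
        return False
    if options.get("requireRoleKeywordMatch") and not job.get("roleKeywords"):
        return False
    return True


def apply_filters(jobs, options):
    """Keep passing records, then cap the result at maxJobs (assumed order)."""
    kept = [job for job in jobs if passes_clean_mode(job, options)]
    max_jobs = options.get("maxJobs")
    return kept if max_jobs is None else kept[:max_jobs]


jobs = [
    {"jobTitle": "Account Executive, Commercial", "location": "New York, United States",
     "salary": {"currency": "USD", "min": 150000, "max": 180000, "period": "year"},
     "roleKeywords": ["account executive"], "confidence": 0.95},
    {"jobTitle": "Office Manager", "location": None, "salary": None,
     "roleKeywords": [], "confidence": 0.8},
    {"jobTitle": "Sales Engineer", "location": "Remote", "salary": None,
     "roleKeywords": ["sales engineer"], "confidence": 0.5},
]
options = {"minConfidence": 0.65, "requireLocation": True, "requireSalary": False,
           "requireRoleKeywordMatch": False, "maxJobs": 25}
routable = apply_filters(jobs, options)
```

With these illustrative records, only the first job remains: the second has no location and the third falls below the confidence threshold.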

Output

Each pushed dataset item is a validated, normalized job intelligence record.

{
  "companyName": "Notion",
  "jobTitle": "Account Executive, Commercial",
  "normalizedTitle": "Account Executive Commercial",
  "function": "sales",
  "location": "New York, United States",
  "salary": {
    "currency": "USD",
    "min": 150000,
    "max": 180000,
    "period": "year"
  },
  "sourceUrl": "https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149",
  "sourceUrls": [
    "https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149",
    "https://jobs.ashbyhq.com/notion/fdc2a6c0-396a-45db-b465-683bacf4201e"
  ],
  "sourceUrlsText": "https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149 | https://jobs.ashbyhq.com/notion/fdc2a6c0-396a-45db-b465-683bacf4201e",
  "jobBoardProvider": "ashby",
  "duplicateGroupId": "notion:account-executive-commercial",
  "duplicateCount": 2,
  "roleKeywords": ["account executive"],
  "roleKeywordsText": "account executive",
  "hiringSignal": "Hiring Account Executive, Commercial",
  "evidence": [
    {
      "type": "job_title",
      "value": "Account Executive, Commercial",
      "sourceUrl": "https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149"
    },
    {
      "type": "role_keyword",
      "value": "account executive",
      "sourceUrl": "https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149"
    },
    {
      "type": "salary",
      "value": "USD 150000-180000 per year",
      "sourceUrl": "https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149"
    }
  ],
  "evidenceSummary": "job_title: Account Executive, Commercial (https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149) | salary: USD 150000-180000 per year (https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149)",
  "salaryText": "USD 150000-180000 per year",
  "confidence": 0.95,
  "scrapedAt": "2026-06-29T15:52:57.933Z"
}

Spreadsheet preview:

| companyName | normalizedTitle | function | location | salaryText | roleKeywordsText | duplicateCount | sourceUrlsText |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Notion | Account Executive Commercial | sales | New York, United States | USD 150000-180000 per year | account executive | 2 | https://jobs.ashbyhq.com/notion/9526496b... \| https://jobs.ashbyhq.com/notion/fdc2a6c0... |

Example results

A local sample run against two public sources produced:

Source URLs: 2
Requests processed: 2
Raw jobs found: 74
Valid jobs returned: 25
Pushed jobs: 25

The sample included merged duplicate sources for equivalent postings, role keyword matches, function detection, and source-backed evidence.

Output fields for CSV buyers

The dataset keeps rich arrays and objects for agents, but also includes flat text fields for spreadsheet exports:

| Field | Meaning |
| --- | --- |
| sourceUrlsText | Pipe-separated source URLs represented by the deduplicated job. |
| roleKeywordsText | Pipe-separated matched role keywords. |
| evidenceSummary | Pipe-separated source evidence summary for review in CSV, Excel, or Sheets. |
| salaryText | Human-readable salary range when detected. |
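
For spreadsheet use, the flat text fields can be exported on their own while the nested arrays and objects are left out. The Python sketch below uses only the standard library and assumes the dataset items have already been downloaded as a list of dictionaries. The column selection mirrors the spreadsheet preview on this page, and to_csv is a hypothetical helper.

```python
import csv
import io

# Flat columns taken from the spreadsheet preview on this page.
FLAT_COLUMNS = [
    "companyName", "normalizedTitle", "function", "location",
    "salaryText", "roleKeywordsText", "duplicateCount", "sourceUrlsText",
]


def to_csv(items):
    """Write only the flat columns; nested fields such as salary are skipped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FLAT_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        writer.writerow(item)  # missing keys are written as empty cells
    return buffer.getvalue()


item = {
    "companyName": "Notion",
    "normalizedTitle": "Account Executive Commercial",
    "function": "sales",
    "location": "New York, United States",
    "salary": {"currency": "USD", "min": 150000, "max": 180000, "period": "year"},
    "salaryText": "USD 150000-180000 per year",
    "roleKeywords": ["account executive"],
    "roleKeywordsText": "account executive",
    "duplicateCount": 2,
    "sourceUrlsText": "https://jobs.ashbyhq.com/notion/9526496b-5c39-456b-a454-ebec889e7149 | https://jobs.ashbyhq.com/notion/fdc2a6c0-396a-45db-b465-683bacf4201e",
}
csv_text = to_csv([item])
```

The csv module quotes values that contain commas, so a location such as "New York, United States" stays in a single cell.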

Troubleshooting

| Symptom | Most likely reason | What to do |
| --- | --- | --- |
| No validJob rows | The public page did not expose job links/data matching your filters and minConfidence. | Lower minConfidence, remove restrictive locations, or pass the direct ATS/job-board URL. |
| Jobs are missing locations or salary | The public job detail page does not expose those fields in visible text or JSON-LD. | Keep the row if the title/source evidence is enough, or filter with locations only when location is required. |
| Fewer rows than expected | Duplicates were merged, charge limits stopped writes, or maxJobs capped output. | Check OUTPUT.summary, duplicateCount, and sourceUrlsText. |

Limitations

  • Works with public pages only.
  • Does not log in to private job boards or social networks.
  • Does not collect applicant data.
  • Does not guarantee complete coverage for every ATS or custom careers site.
  • Salary, department, seniority, location, and remote type are returned only when detected from public source content.
  • Provider support can vary when public page markup changes.

Local development

From the portfolio root:

pnpm jobs:test
pnpm jobs:typecheck
pnpm jobs:sample

The sample script uses a separate storage-live-sample local storage directory and writes inspection files to data/live-sample.

Pricing unit

Recommended pay-per-event unit:

validJob

The Actor charges only when a normalized job passes validation and is pushed to the dataset.
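
As a worked example of the charging rule, the Python sketch below estimates the cost of a run from the number of validJob events. It assumes the listed "from $3.50 / 1,000 valid jobs" rate applies as is; the actual rate may be higher depending on plan, and estimate_cost is a hypothetical helper.

```python
from decimal import Decimal

# "from" price listed on this page; treated here as the applicable rate (assumption).
PRICE_PER_1000_VALID_JOBS = Decimal("3.50")


def estimate_cost(valid_jobs_pushed: int) -> Decimal:
    """Cost in USD for the given number of validJob events, to four decimals."""
    cost = PRICE_PER_1000_VALID_JOBS * valid_jobs_pushed / 1000
    return cost.quantize(Decimal("0.0001"))


# The sample run on this page found 74 raw jobs but pushed 25,
# so only the 25 pushed jobs count toward the estimate.
sample_cost = estimate_cost(25)
```

Under that assumption the 25-job sample run would come to about $0.09, since raw jobs that are filtered out or merged as duplicates are not pushed and therefore not charged.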