# Validated Jobs Scraper: Dedup, No Ghost Jobs, Confidence Scored (`jan_hilgard/validated-jobs-scraper`) Actor

Job data with a correctness guarantee: per-field confidence, ghost-job filtering and cross-source dedup — never silently wrong, duplicated or expired. Reaches LinkedIn and ATS boards cookieless, gets through Cloudflare/DataDome, self-healing on layout shifts. Built on the data.hilgard.cz engine.

- **URL**: https://apify.com/jan_hilgard/validated-jobs-scraper.md
- **Developed by:** [Jan Hilgard](https://apify.com/jan_hilgard) (community)
- **Categories:** Jobs, Lead generation, Automation
- **Stats:** 3 total users, 2 monthly users, 0.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$2.50 / 1,000 job results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Validated Jobs Scraper

**Job data that is deduplicated, ghost-filtered and never silently wrong.**

You give it a source and a query (an ATS company, or keywords for LinkedIn / Indeed). You get back
clean job records — title, company, location, salary, employment and workplace type — and
on top of every record a correctness layer: a confidence on each field, a single
`reliable` flag, a ghost-job score, and cross-source dedup. When a field does not clear
the bar you get a `null` with a stated reason (`status: "absent"`), not a guessed value.
When the same opening sits on several boards you get it once, with the others listed under
`also_at`. That guarantee is the product; the engine underneath is only how it is kept.

Most job scrapers optimise for how many fields they return. That is the easy part. The
hard part — and the expensive one when it goes wrong — is a job that is silently
duplicated, already filled, reposted for the tenth time, or quietly mis-parsed. This one
is built around that: it scores its own confidence per field, flags ghost jobs, and
collapses duplicates across sources, so what you load is what is real.

Keywords: no silent errors, per-field confidence, validated jobs, ghost job detection,
deduplicated jobs, hiring intent, cookieless jobs scraper, linkedin jobs scraper,
indeed jobs scraper, ats greenhouse lever ashby, job posting api, labor market data,
hiring intent signals, sales prospecting jobs.

---

### What a real run looks like

One query in, one row per job out — each in the engine's snake_case schema, with the
per-field confidence and the enrichment block attached. Below is a **real record** from a
Greenhouse run (the two `description_*` fields are large and omitted here for brevity;
every enrichment field is wrapped as
`{ value, status, score, source, model }`).

**(a) A reliable record — and where the page has no salary, it says so instead of guessing.**

```json
{
  "source": "greenhouse",
  "canonical_url": "https://job-boards.greenhouse.io/gitlab/jobs/8565469002",
  "apply_url": "https://job-boards.greenhouse.io/gitlab/jobs/8565469002",
  "title": "AI Engineer",
  "company": { "name": "GitLab", "url": null },
  "location": { "raw": "Remote, US", "city": null, "region": null, "country": null, "workplace_type": "remote" },
  "employment_type": null,
  "date_posted": "2026-05-29",
  "salary": { "min": null, "max": null, "currency": null, "period": null, "source": "absent" },
  "seniority": null,
  "skills": [],
  "reliable": true,
  "overall_confidence": 0.95,
  "fields": {
    "title":           { "value": "AI Engineer",  "status": "confirmed", "score": 0.95, "source": "ats" },
    "company":         { "value": "GitLab",        "status": "confirmed", "score": 0.95, "source": "ats" },
    "location":        { "value": "Remote, US",    "status": "confirmed", "score": 0.95, "source": "ats" },
    "salary":          { "value": null,            "status": "absent",    "score": null, "source": null },
    "employment_type": { "value": null,            "status": "absent",    "score": null, "source": null }
  },
  "also_at": [],
  "enrichment": {
    "quality": {
      "ghost_job_score": { "value": 0.1,  "status": "low",    "score": 0.5,  "source": "inferred", "model": "qwen3.6-35b@1" },
      "flags":           { "value": [],   "status": "absent", "score": null, "source": "inferred", "model": null },
      "is_real":         { "value": true, "status": "low",    "score": 0.5,  "source": "inferred", "model": "qwen3.6-35b@1" }
    },
    "dedup": {
      "is_duplicate_of": { "value": null,           "status": "absent",    "score": null, "source": "inferred", "model": null },
      "sources_seen":    { "value": ["greenhouse"], "status": "confirmed", "score": 0.95, "source": "inferred", "model": null }
    },
    "normalized": {
      "role_normalized":      { "value": "AI Engineer", "status": "low",  "score": null, "source": "inferred", "model": "qwen3.6-35b@1" },
      "seniority_normalized": { "value": "Mid",         "status": "low",  "score": null, "source": "inferred", "model": "qwen3.6-35b@1" },
      "skills":               { "value": [{ "name": "Python", "type": "nice_to_have", "score": 0.9 }, { "name": "TypeScript", "type": "nice_to_have", "score": 0.9 }, { "name": "LLMs", "type": "nice_to_have", "score": 0.9 }], "status": "high", "score": 0.9, "source": "inferred", "model": "qwen3.6-35b@1" }
    },
    "hiring_intent": {
      "buying_signal_score": { "value": 0.11, "status": "low", "score": 0.45, "source": "inferred", "model": null },
      "company_signals":     { "value": { "open_roles_count": 1, "role_velocity": 0.03, "expanding_departments": [] }, "status": "low", "score": 0.45, "source": "inferred", "model": null }
    }
  },
  "company_name": "GitLab",
  "location_raw": "Remote, US",
  "ghost_job_score": 0.1,
  "success": true,
  "error": null
}
````

Note the `salary` and `employment_type`: the page did not state them, so they come back
`status: "absent"` with a `null` value — not a guessed band. The `ghost_job_score` is low
(0.1), so the record is trusted. `company_name`, `location_raw` and `ghost_job_score` at
the bottom are flat copies the actor lifts out of the nested objects for the dataset table.
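
As a minimal sketch of how a consumer might read the `fields` map, the snippet below splits a record into trusted values and absent field names. The record is a trimmed copy of (a); `split_fields` and the choice to trust `confirmed` and `high` are illustrative assumptions, not part of the actor.

```python
# Trimmed copy of record (a); only the keys this sketch reads.
record = {
    "reliable": True,
    "fields": {
        "title": {"value": "AI Engineer", "status": "confirmed", "score": 0.95, "source": "ats"},
        "company": {"value": "GitLab", "status": "confirmed", "score": 0.95, "source": "ats"},
        "location": {"value": "Remote, US", "status": "confirmed", "score": 0.95, "source": "ats"},
        "salary": {"value": None, "status": "absent", "score": None, "source": None},
        "employment_type": {"value": None, "status": "absent", "score": None, "source": None},
    },
}

# Assumption: which statuses a consumer chooses to treat as trustworthy.
TRUSTED_STATUSES = {"confirmed", "high"}


def split_fields(rec):
    """Return (trusted values by field name, names of absent fields)."""
    trusted = {}
    absent = []
    for name, wrapped in rec.get("fields", {}).items():
        if wrapped["status"] in TRUSTED_STATUSES:
            trusted[name] = wrapped["value"]
        elif wrapped["status"] == "absent":
            absent.append(name)
    return trusted, absent


trusted, absent = split_fields(record)
print(trusted)
print(absent)
```

Fields with `status: "low"` land in neither bucket here, which leaves the decision about shaky values to the caller.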

**(b) A ghost-suspect job with thin data. It does NOT guess — it flags and fails loud.**
*(Illustrative record in the same real shape — a live ghost can't be produced on demand.)*

```json
{
  "source": "linkedin",
  "canonical_url": "https://www.linkedin.com/jobs/view/...",
  "title": "Marketing Manager",
  "company": { "name": "Stealth Startup", "url": null },
  "location": { "raw": null, "city": null, "region": null, "country": null, "workplace_type": null },
  "employment_type": null,
  "salary": { "min": null, "max": null, "currency": null, "period": null, "source": "absent" },
  "reliable": false,
  "overall_confidence": 0.39,
  "fields": {
    "title":    { "value": "Marketing Manager", "status": "high",   "score": 0.9,  "source": "html" },
    "company":  { "value": "Stealth Startup",   "status": "low",    "score": 0.41, "source": "html" },
    "location": { "value": null,                "status": "absent", "score": null, "source": null },
    "salary":   { "value": null,                "status": "absent", "score": null, "source": null }
  },
  "also_at": [],
  "enrichment": {
    "quality": {
      "ghost_job_score": { "value": 0.78,                  "status": "low",  "score": 0.6, "source": "inferred", "model": "qwen3.6-35b@1" },
      "flags":           { "value": ["reposted", "vague_jd"], "status": "high", "score": 0.7, "source": "inferred", "model": "qwen3.6-35b@1" },
      "is_real":         { "value": false,                 "status": "low",  "score": 0.6, "source": "inferred", "model": "qwen3.6-35b@1" }
    }
  },
  "ghost_job_score": 0.78,
  "success": true,
  "error": null
}
```

The second record is where the value shows. A cheaper tool would have returned this as just another
clean-looking hit. Here the `ghost_job_score` is high (0.78), the flags say why
(`reposted`, `vague_jd`), `is_real` is `false`, the weak `company` field is `low`, and the
absent ones are marked absent — not filled with a guess. (Note: on LinkedIn an empty
apply link is normal, so no apply-related flag fires — the `no_real_apply` flag only fires
on ATS boards where a real apply path is expected.) Turn `drop_ghost` on and a row like
this is removed before it reaches you — and not charged.
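
If you leave `drop_ghost` off and prefer to triage on your side, the same check can be sketched in a few lines. This assumes the flat `ghost_job_score` helper and the `enrichment.quality.flags` wrapper shown in the records above; `ghost_verdict` is a hypothetical helper, and 0.7 is the default `ghost_threshold` from the input example.

```python
GHOST_THRESHOLD = 0.7  # default `ghost_threshold` from the input example


def ghost_verdict(row, threshold=GHOST_THRESHOLD):
    """Summarise the ghost signals of one dataset row (hypothetical helper)."""
    score = row.get("ghost_job_score")
    quality = row.get("enrichment", {}).get("quality", {})
    flags = quality.get("flags", {}).get("value") or []
    # A job counts as a ghost when its score is above the threshold.
    suspect = score is not None and score > threshold
    return {"suspect": suspect, "score": score, "flags": flags}


# Trimmed copies of records (a) and (b).
row_a = {
    "ghost_job_score": 0.1,
    "enrichment": {"quality": {"flags": {"value": []}}},
}
row_b = {
    "ghost_job_score": 0.78,
    "enrichment": {"quality": {"flags": {"value": ["reposted", "vague_jd"]}}},
}

for row in (row_a, row_b):
    print(ghost_verdict(row))
```

Rows without a score (for example when enrichment is off) are treated as not suspect in this sketch; a stricter consumer might route them to manual review instead.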

***

### Why this beats LinkedIn-only and AI scrapers

Adapting to a layout and scraping a lot of fields is table-stakes now — this does both.
But scraping is not the same as being right. A scraper can return *a* job and still hand
you one that is duplicated three times, already filled, reposted for months, or quietly
mis-parsed — and say nothing. That silent bad row is the one that costs you, because you
act on it. The difference here is the guarantee, not the scraping: every field carries
its own confidence, every record carries a ghost-job score, duplicates are collapsed
across sources, and a cheap enrichment runs on **every** record — not just a sampled few.

Two things others quietly skip:

- **They return rich fields but zero confidence, and their high success rate needs your
  cookies.** This runs cookieless and still attaches a confidence to every field, so you
  can tell a solid row from a shaky one without logging in.
- **Their AI enrichment is shallow and expensive because it is API-bound.** Ours runs on
  cheap local inference, so the quality / dedup / normalize / hiring-intent layers run on
  every record by default, not as a costly add-on on a handful.

***

### Why it is different

- **No silent errors.** Every field carries a confidence, the whole record carries a
  `reliable` flag. Below the bar a field is returned `null` with `status: "absent"` and
  `reliable: false`, never a confident-looking wrong value.
- **Ghost-job filtering and cross-source dedup.** Each record gets a `ghost_job_score`
  with the flags behind it (`reposted`, `evergreen`, `vague_jd`, `staffing_agency`,
  `perpetual_req`, and `no_real_apply` on ATS), so stale and fake openings are visible —
  or dropped with `drop_ghost`. The same opening seen on several sources is collapsed into
  one record, the rest listed under `also_at`. You load real, distinct openings, not
  reposts and duplicates.
- **Cookieless reach / anti-bot.** It reaches LinkedIn, Indeed and the major ATS boards
  without cookies, and gets through heavy protections — Cloudflare, DataDome and similar — that
  return a challenge page to a plain fetch. (Anti-bot is an arms race, so this is a
  capability, not a guarantee against any one named vendor.)
- **Self-healing — a mechanism that serves the correctness above, not the headline.**
  When a board changes its markup, the engine re-finds fields by meaning instead of
  silently breaking on a selector.
- **Hiring-intent signals.** Enrichment normalises each role and adds hiring-intent
  signals (`buying_signal_score`, `company_signals`), so the data is usable for
  prospecting, labor analytics and recruiting research, not just a flat list of postings.

***

### Supported sources

Live-verified sources only — this list is what is actually tested, not a wishlist:

- Greenhouse
- Ashby
- Lever
- SmartRecruiters
- RemoteOK
- LinkedIn
- Indeed

ATS sources (Greenhouse / Ashby / Lever / SmartRecruiters) take a `company` slug;
LinkedIn, Indeed and RemoteOK take `keywords` (+ `location`). A concrete `source` is
required. Indeed sits behind Cloudflare — it is reached through the same anti-bot stack,
cookieless. Indeed pay is usually an estimate, so it comes back as `salary.source:
"inferred"` (never passed off as an employer-stated figure).

***

### Input

```jsonc
{
  "source": "greenhouse",           // required: linkedin | indeed | greenhouse | lever | ashby | smartrecruiters | remoteok
  "company": "gitlab",              // ATS slug, for greenhouse/lever/ashby/smartrecruiters
  "keywords": "backend engineer",   // for LinkedIn / Indeed / RemoteOK search
  "location": "Berlin",

  "title_include": [], "title_exclude": [],
  "employment_type": [], "workplace_type": [],
  "country": [], "language": [],    // arrays, e.g. ["DE"], ["en"]
  "posted_within_days": 30,
  "drop_expired": true, "drop_ghost": false,

  "enrich": true,
  "enrich_layers": ["quality", "dedup", "normalize", "hiring_intent"],
  "dedup_across_sources": true,
  "ghost_threshold": 0.7,

  "start": 0, "limit": 25, "max_results": 100, "fetch_all": false,
  "include_description": true       // engine fetches full JD HTML/text (on by default; LinkedIn fetch is per-posting)
}
```

What an input needs depends on the source: ATS boards (Greenhouse / Lever / Ashby /
SmartRecruiters) require `company`; LinkedIn and Indeed require `keywords` (or
`location`); RemoteOK requires neither (it lists the feed). If `company` is missing for an
ATS source, the actor fails loud early instead of forwarding a request the engine would
reject.

Filters and enrichment all run on the engine; the actor just forwards them. `max_results`
caps how many jobs come back, and since you pay per returned job, it also caps spend.
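
The per-source requirements above can be mirrored in a small pre-flight check before starting a run. This is a sketch under the rules stated in this section, not the actor's own validation; `input_problems` and its messages are hypothetical.

```python
ATS_SOURCES = {"greenhouse", "lever", "ashby", "smartrecruiters"}
SEARCH_SOURCES = {"linkedin", "indeed"}
ALL_SOURCES = ATS_SOURCES | SEARCH_SOURCES | {"remoteok"}


def input_problems(actor_input):
    """Return a list of problems; an empty list means the input looks valid."""
    source = actor_input.get("source")
    if source not in ALL_SOURCES:
        return [f"unknown or missing source: {source!r}"]
    problems = []
    if source in ATS_SOURCES and not actor_input.get("company"):
        problems.append(f"{source} needs a company slug")
    has_query = actor_input.get("keywords") or actor_input.get("location")
    if source in SEARCH_SOURCES and not has_query:
        problems.append(f"{source} needs keywords or location")
    return problems


print(input_problems({"source": "greenhouse", "company": "gitlab"}))
print(input_problems({"source": "greenhouse"}))
print(input_problems({"source": "remoteok"}))
```

Checking locally keeps an obviously invalid input from starting a run at all.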

### Output

One dataset row **per job**, in the engine's snake\_case `JobPosting` schema. Every tracked
field is always present — a missing one comes back with `status: "absent"` and a `null`
value, never silently dropped. Highlights:

- **Core fields:** `title`, `company` (`{name, url}`), `location`
  (`{raw, city, region, country, workplace_type}`), `employment_type`
  (`FULL_TIME` / `PART_TIME` / `CONTRACT` / `INTERN` / `TEMP`), `date_posted`, `salary`
  (`{min, max, currency, period, source}` — `source` is `explicit` / `inferred` /
  `absent`, never guessed), `seniority`, `skills`, `canonical_url`, `apply_url`.
- **Trust layer:** `reliable` (bool), `overall_confidence` (0–1), and `fields` — a map
  where each tracked field carries `{ value, status, score, source }`, with
  `status ∈ confirmed | high | low | absent`.
- **Dedup:** top-level `also_at[]` lists the same opening on other portals (`[]` if
  unique).
- **Enrichment** (present when `enrich` is on) under `enrichment`, each field wrapped as
  `{ value, status, score, source: "inferred", model }`:
  `quality` (`ghost_job_score`, `flags`, `is_real`), `dedup`, `normalized`,
  `hiring_intent`.
- **Flat helpers the actor adds** for the dataset table: `company_name`, `location_raw`,
  and `ghost_job_score` (lifted from `enrichment.quality.ghost_job_score.value`).

`success` says the engine produced a job record; `reliable` says that record cleared the
trust threshold. They diverge when a job is extracted but the engine is not confident —
then `success` is `true`, `reliable` is `false`, and the per-field scores stay low, so a
hedge never reads as a clean hit. Each row is the engine's record passed through
verbatim — the actor never drops or rewrites a field. The full job description
(`description_html` / `description_text`) is included; `include_description` (on by
default) controls whether the engine fetches it for sources fetched per-posting (e.g.
LinkedIn). Descriptions can be large.
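
To make the `success` / `reliable` distinction concrete, here is a minimal sketch that sorts rows into three buckets. The bucket names and the `bucket` helper are assumptions for illustration; the two sample rows are trimmed from records (a) and (b).

```python
from collections import Counter


def bucket(row):
    """Classify one dataset row by its `success` and `reliable` flags."""
    if not row.get("success"):
        return "failed"    # the engine produced no job record
    if row.get("reliable"):
        return "reliable"  # cleared the trust threshold
    return "hedged"        # extracted, but the engine is not confident


rows = [
    {"title": "AI Engineer", "success": True, "reliable": True, "overall_confidence": 0.95},
    {"title": "Marketing Manager", "success": True, "reliable": False, "overall_confidence": 0.39},
]

counts = Counter(bucket(row) for row in rows)
print(dict(counts))
```

A pipeline might load the `reliable` bucket directly and send the `hedged` bucket to review rather than discarding it, since those rows are still charged.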

***

### Pricing

This actor uses **Pay Per Event**, with a single event:

- **`job-result`** — one flat fee per job returned.

You are charged once per job the engine returns — whether it comes back `reliable: true`
or, honestly, `reliable: false`, because an honest fail is still a result you can act on
(you learn the field is shaky instead of trusting a guess). Jobs the engine drops before
returning — expired, ghost (with `drop_ghost`), or duplicates collapsed across sources —
are **not** returned and **not** charged. A run that fails before returning any results
is not charged. The current price of the event is in the Apify Console pricing tab.

No flat monthly fee, no per-seat pricing. One predictable price per validated job.
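
Because each returned job is one `job-result` event, `max_results` gives an upper bound on what a run can cost. The sketch below works that out, assuming the $2.50 per 1,000 results listed in the store; the Console pricing tab remains the authoritative price, and `max_spend` is a hypothetical helper.

```python
from decimal import Decimal

# Assumption: the store-listed price; check the Console pricing tab for the current one.
PRICE_PER_1000 = Decimal("2.50")


def max_spend(max_results, price_per_1000=PRICE_PER_1000):
    """Upper bound in USD for one run: one `job-result` event per returned job."""
    return price_per_1000 * max_results / 1000


for cap in (100, 1000, 25000):
    print(cap, max_spend(cap))
```

Actual spend can be lower than the bound, since expired, ghost-dropped and deduplicated jobs are not returned and not charged.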

***

### A note on data

This actor returns **job and company data only** — title, company, location, salary,
employment terms, and signals derived from the posting. It does not collect or return
personal data of applicants or named recruiters.

***

### About

Built on the [data.hilgard.cz](https://data.hilgard.cz) engine — the same self-healing
stack that does cookieless anti-bot fetching, extraction by meaning, and independent
verification, here applied to jobs with ghost-job scoring and cross-source dedup on top.

By Jan Hilgard — founder of Hosting90 (built 2002, exited 2020), contributor to vllm-mlx.
The precision-first stance is deliberate: I would rather return an honest "not sure" than
a confident wrong row you build a pipeline on.

### Development

```bash
npm install
npm run build      # tsc → dist/
npm run start:dev  # tsx src/main.ts, reads .actor/INPUT.json
```

# Actor input Schema

## `source` (type: `string`):

Which board to pull jobs from. A concrete source is required. ATS sources (Greenhouse / Lever / Ashby / SmartRecruiters) take a company slug; LinkedIn, Indeed and RemoteOK take keywords (+ location).

## `keywords` (type: `string`):

Search query for LinkedIn / RemoteOK, e.g. 'backend engineer'.

## `location` (type: `string`):

Location filter for LinkedIn / aggregators, e.g. 'Berlin' or 'Remote'.

## `company` (type: `string`):

Company / board slug, e.g. 'gitlab'. REQUIRED when source is an ATS board (Greenhouse / Lever / Ashby / SmartRecruiters). Not used for LinkedIn / Indeed / RemoteOK — give 'keywords' there instead.

## `title_include` (type: `array`):

Keep only jobs whose title contains one of these terms (case-insensitive).

## `title_exclude` (type: `array`):

Drop jobs whose title contains one of these terms (case-insensitive).

## `employment_type` (type: `array`):

Keep only these employment types. Values match the output enum (the engine filters against the normalized employment\_type).

## `workplace_type` (type: `array`):

Keep only these workplace types.

## `country` (type: `array`):

ISO country filter, e.g. \['DE', 'US'].

## `language` (type: `array`):

ISO language filter for the job description, e.g. \['en'].

## `posted_within_days` (type: `integer`):

Keep only jobs first posted within this many days. Helps avoid stale and reposted listings.

## `drop_expired` (type: `boolean`):

Drop jobs the engine detects as expired / closed. Dropped jobs are not returned and not charged.

## `drop_ghost` (type: `boolean`):

Drop jobs whose ghost\_job\_score is above 'Ghost threshold'. When off (default), ghost-suspect jobs are still returned, just flagged — you decide. Dropped jobs are not returned and not charged.

## `enrich` (type: `boolean`):

Run the cheap local-inference enrichment on every returned record (quality / dedup / normalize / hiring-intent). On by default.

## `enrich_layers` (type: `array`):

Which enrichment layers to run when 'Enrich' is on.

## `dedup_across_sources` (type: `boolean`):

Collapse the same opening seen on multiple sources into one record, listing the others under also\_at. On by default.

## `ghost_threshold` (type: `number`):

Ghost-job score (0–1) above which a job counts as a ghost. Used for flagging, and for dropping when 'Drop ghost jobs' is on.

## `start` (type: `integer`):

Result offset to start from (for paging through a large search).

## `limit` (type: `integer`):

Results per page the engine fetches.

## `max_results` (type: `integer`):

Hard cap on total jobs returned for this run. You are charged per returned job, so this caps spend.

## `fetch_all` (type: `boolean`):

Page through every available result up to 'Max results'. When off, fetch a single page of 'Page size'.

## `include_description` (type: `boolean`):

Ask the engine to fetch the full job description (description\_html + description\_text). On by default. ATS boards (Greenhouse / Lever / Ashby / SmartRecruiters) include it inline regardless; for sources fetched per-posting (e.g. LinkedIn) this enables it, at some extra time. Descriptions can be large.

## `requestTimeoutSecs` (type: `integer`):

Total time budget for the search, across retries. Hard-capped internally — values above the cap are clamped.

## `maxRetries` (type: `integer`):

Retries if the search stream drops before returning results.

## `retryBackoffSecs` (type: `integer`):

Delay before retrying a dropped search stream.
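
One way to picture how `requestTimeoutSecs`, `maxRetries` and `retryBackoffSecs` interact is the loop below. It is a sketch of the described behaviour, not the actor's code: `call_with_retries` is hypothetical, a dropped stream is assumed to surface as `ConnectionError`, and the internal clamp on the time budget is not modelled because its value is not documented here.

```python
import time


def call_with_retries(fetch, timeout_secs=600, max_retries=3, backoff_secs=3,
                      sleep=time.sleep, clock=time.monotonic):
    """Call `fetch()` until it succeeds, retries run out, or the budget is spent."""
    deadline = clock() + timeout_secs  # one total budget shared by all attempts
    last_error = None
    for attempt in range(max_retries + 1):  # first try plus `max_retries` retries
        try:
            return fetch()
        except ConnectionError as err:  # assumption: a dropped stream raises this
            last_error = err
        # Stop if no retries remain or the backoff would overrun the budget.
        if attempt == max_retries or clock() + backoff_secs >= deadline:
            break
        sleep(backoff_secs)
    raise RuntimeError("search did not complete within the retry budget") from last_error


# Demo: a fake search that drops twice, then returns one result.
attempts = []


def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("stream dropped")
    return ["job"]


waits = []  # records backoff delays instead of actually sleeping
result = call_with_retries(flaky_fetch, sleep=waits.append)
print(result, waits)
```

Injecting `sleep` and `clock` keeps the sketch testable without real delays.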

## Actor input object example

```json
{
  "source": "greenhouse",
  "title_include": [],
  "title_exclude": [],
  "employment_type": [],
  "workplace_type": [],
  "country": [],
  "language": [],
  "posted_within_days": 30,
  "drop_expired": true,
  "drop_ghost": false,
  "enrich": true,
  "enrich_layers": [
    "quality",
    "dedup",
    "normalize",
    "hiring_intent"
  ],
  "dedup_across_sources": true,
  "ghost_threshold": 0.7,
  "start": 0,
  "limit": 25,
  "max_results": 100,
  "fetch_all": false,
  "include_description": true,
  "requestTimeoutSecs": 600,
  "maxRetries": 3,
  "retryBackoffSecs": 3
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
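const input = {
    // `source` is required; ATS sources such as Greenhouse also need a `company` slug
    "source": "greenhouse",
    "company": "gitlab",
    "title_include": [],
    "title_exclude": [],
    "employment_type": [],
    "workplace_type": [],
    "country": [],
    "language": []
};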

// Run the Actor and wait for it to finish
const run = await client.actor("jan_hilgard/validated-jobs-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
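run_input = {
    # `source` is required; ATS sources such as Greenhouse also need a `company` slug
    "source": "greenhouse",
    "company": "gitlab",
    "title_include": [],
    "title_exclude": [],
    "employment_type": [],
    "workplace_type": [],
    "country": [],
    "language": [],
}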

# Run the Actor and wait for it to finish
run = client.actor("jan_hilgard/validated-jobs-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "source": "greenhouse",
  "company": "gitlab",
  "title_include": [],
  "title_exclude": [],
  "employment_type": [],
  "workplace_type": [],
  "country": [],
  "language": []
}' |
apify call jan_hilgard/validated-jobs-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=jan_hilgard/validated-jobs-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Validated Jobs Scraper: Dedup, No Ghost Jobs, Confidence Scored",
        "description": "Job data with a correctness guarantee: per-field confidence, ghost-job filtering and cross-source dedup — never silently wrong, duplicated or expired. Reaches LinkedIn and ATS boards cookieless, gets through Cloudflare/DataDome, self-healing on layout shifts. Built on the data.hilgard.cz engine.",
        "version": "0.0",
        "x-build-id": "KqxADGulvgMLkOZE5"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/jan_hilgard~validated-jobs-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-jan_hilgard-validated-jobs-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/jan_hilgard~validated-jobs-scraper/runs": {
            "post": {
                "operationId": "runs-sync-jan_hilgard-validated-jobs-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/jan_hilgard~validated-jobs-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-jan_hilgard-validated-jobs-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "source"
                ],
                "properties": {
                    "source": {
                        "title": "Source",
                        "enum": [
                            "linkedin",
                            "indeed",
                            "greenhouse",
                            "lever",
                            "ashby",
                            "smartrecruiters",
                            "remoteok"
                        ],
                        "type": "string",
                        "description": "Which board to pull jobs from. A concrete source is required. ATS sources (Greenhouse / Lever / Ashby / SmartRecruiters) take a company slug; LinkedIn, Indeed and RemoteOK take keywords (+ location).",
                        "default": "greenhouse"
                    },
                    "keywords": {
                        "title": "Keywords",
                        "type": "string",
                        "description": "Search query for LinkedIn / RemoteOK, e.g. 'backend engineer'."
                    },
                    "location": {
                        "title": "Location",
                        "type": "string",
                        "description": "Location filter for LinkedIn / aggregators, e.g. 'Berlin' or 'Remote'."
                    },
                    "company": {
                        "title": "Company (ATS slug)",
                        "type": "string",
                        "description": "Company / board slug, e.g. 'gitlab'. REQUIRED when source is an ATS board (Greenhouse / Lever / Ashby / SmartRecruiters). Not used for LinkedIn / Indeed / RemoteOK — give 'keywords' there instead."
                    },
                    "title_include": {
                        "title": "Title must include",
                        "type": "array",
                        "description": "Keep only jobs whose title contains one of these terms (case-insensitive).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "title_exclude": {
                        "title": "Title must not include",
                        "type": "array",
                        "description": "Drop jobs whose title contains one of these terms (case-insensitive).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "employment_type": {
                        "title": "Employment type",
                        "type": "array",
                        "description": "Keep only these employment types. Values match the output enum (the engine filters against the normalized employment_type).",
                        "items": {
                            "type": "string",
                            "enum": [
                                "FULL_TIME",
                                "PART_TIME",
                                "CONTRACT",
                                "INTERN",
                                "TEMP"
                            ]
                        }
                    },
                    "workplace_type": {
                        "title": "Workplace type",
                        "type": "array",
                        "description": "Keep only these workplace types.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "onsite",
                                "hybrid",
                                "remote"
                            ]
                        }
                    },
                    "country": {
                        "title": "Country",
                        "type": "array",
                        "description": "ISO country filter, e.g. ['DE', 'US'].",
                        "items": {
                            "type": "string"
                        }
                    },
                    "language": {
                        "title": "Language",
                        "type": "array",
                        "description": "ISO language filter for the job description, e.g. ['en'].",
                        "items": {
                            "type": "string"
                        }
                    },
                    "posted_within_days": {
                        "title": "Posted within (days)",
                        "minimum": 1,
                        "maximum": 365,
                        "type": "integer",
                        "description": "Keep only jobs first posted within this many days. Helps avoid stale and reposted listings.",
                        "default": 30
                    },
                    "drop_expired": {
                        "title": "Drop expired",
                        "type": "boolean",
                        "description": "Drop jobs the engine detects as expired / closed. Dropped jobs are not returned and not charged.",
                        "default": true
                    },
                    "drop_ghost": {
                        "title": "Drop ghost jobs",
                        "type": "boolean",
                        "description": "Drop jobs whose ghost_job_score is above 'Ghost threshold'. When off (default), ghost-suspect jobs are still returned, just flagged — you decide. Dropped jobs are not returned and not charged.",
                        "default": false
                    },
                    "enrich": {
                        "title": "Enrich",
                        "type": "boolean",
                        "description": "Run the cheap local-inference enrichment on every returned record (quality / dedup / normalize / hiring-intent). On by default.",
                        "default": true
                    },
                    "enrich_layers": {
                        "title": "Enrichment layers",
                        "type": "array",
                        "description": "Which enrichment layers to run when 'Enrich' is on.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "quality",
                                "dedup",
                                "normalize",
                                "hiring_intent"
                            ]
                        },
                        "default": [
                            "quality",
                            "dedup",
                            "normalize",
                            "hiring_intent"
                        ]
                    },
                    "dedup_across_sources": {
                        "title": "Dedup across sources",
                        "type": "boolean",
                        "description": "Collapse the same opening seen on multiple sources into one record, listing the others under also_at. On by default.",
                        "default": true
                    },
                    "ghost_threshold": {
                        "title": "Ghost threshold",
                        "minimum": 0,
                        "maximum": 1,
                        "type": "number",
                        "description": "Ghost-job score (0–1) above which a job counts as a ghost. Used for flagging, and for dropping when 'Drop ghost jobs' is on.",
                        "default": 0.7
                    },
                    "start": {
                        "title": "Start offset",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Result offset to start from (for paging through a large search).",
                        "default": 0
                    },
                    "limit": {
                        "title": "Page size",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Results per page the engine fetches.",
                        "default": 25
                    },
                    "max_results": {
                        "title": "Max results",
                        "minimum": 1,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Hard cap on total jobs returned for this run. You are charged per returned job, so this caps spend.",
                        "default": 100
                    },
                    "fetch_all": {
                        "title": "Fetch all",
                        "type": "boolean",
                        "description": "Page through every available result up to 'Max results'. When off, fetch a single page of 'Page size'.",
                        "default": false
                    },
                    "include_description": {
                        "title": "Fetch full description",
                        "type": "boolean",
                        "description": "Ask the engine to fetch the full job description (description_html + description_text). On by default. ATS boards (Greenhouse / Lever / Ashby / SmartRecruiters) include it inline regardless; for sources fetched per-posting (e.g. LinkedIn) this enables it, at some extra time. Descriptions can be large.",
                        "default": true
                    },
                    "requestTimeoutSecs": {
                        "title": "Request budget (seconds)",
                        "minimum": 60,
                        "maximum": 3600,
                        "type": "integer",
                        "description": "Total time budget for the search, across retries. Hard-capped internally — values above the cap are clamped.",
                        "default": 600
                    },
                    "maxRetries": {
                        "title": "Max retries",
                        "minimum": 0,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Retries if the search stream drops before returning results.",
                        "default": 3
                    },
                    "retryBackoffSecs": {
                        "title": "Retry backoff (seconds)",
                        "minimum": 0,
                        "maximum": 120,
                        "type": "integer",
                        "description": "Delay before retrying a dropped search stream.",
                        "default": 3
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
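
The constraints in `inputSchema` above can be checked client-side before a run is started, so that an invalid input fails fast. The following is a minimal sketch and not part of the Actor: `validate_input`, `ATS_SOURCES`, `KEYWORD_SOURCES` and `NUMERIC_BOUNDS` are hypothetical names, and the rule that `company` is mandatory for ATS sources is taken from the `company` description rather than from the schema's `required` list. It assumes the values already have the right JSON types.

```python
from typing import Any, Dict, List

# Source groups as described in the "source" and "company" field descriptions.
ATS_SOURCES = {"greenhouse", "lever", "ashby", "smartrecruiters"}
KEYWORD_SOURCES = {"linkedin", "indeed", "remoteok"}

# (minimum, maximum) bounds copied from the input schema; None means no upper bound.
NUMERIC_BOUNDS = {
    "posted_within_days": (1, 365),
    "ghost_threshold": (0, 1),
    "start": (0, None),
    "limit": (1, 200),
    "max_results": (1, 5000),
    "requestTimeoutSecs": (60, 3600),
    "maxRetries": (0, 20),
    "retryBackoffSecs": (0, 120),
}


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []

    # "source" is the only field in the schema's "required" list.
    source = run_input.get("source")
    if source not in ATS_SOURCES | KEYWORD_SOURCES:
        problems.append(f"source must be one of the schema enum values, got {source!r}")
    elif source in ATS_SOURCES and not run_input.get("company"):
        # Assumption: enforced here because the description marks it REQUIRED.
        problems.append(f"company is required when source is {source!r}")

    # Range checks for every numeric field that is present in the input.
    for field, (low, high) in NUMERIC_BOUNDS.items():
        if field not in run_input:
            continue
        value = run_input[field]
        if value < low or (high is not None and value > high):
            problems.append(f"{field}={value!r} is outside the allowed range")

    return problems
```

An empty result, for example for `{"source": "greenhouse", "company": "gitlab"}`, means the input passed these checks; it does not guarantee that the Actor itself will accept it.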

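Because the `max_results` description states that you are charged per returned job, the paging fields also bound the spend of a run. The sketch below estimates that upper bound from the $2.50 / 1,000 job results price listed under Pricing. The helper names are hypothetical, and the single-page rule (`min(limit, max_results)` when `fetch_all` is off) is an interpretation of the `fetch_all` and `max_results` descriptions, not confirmed Actor behaviour.

```python
from decimal import Decimal
from typing import Any, Dict

# Price from the Pricing section: $2.50 per 1,000 job results.
PRICE_PER_1000_RESULTS_USD = Decimal("2.50")

# Defaults copied from the input schema.
DEFAULT_LIMIT = 25
DEFAULT_MAX_RESULTS = 100


def max_billable_jobs(run_input: Dict[str, Any]) -> int:
    """Upper bound on the number of jobs a run can return."""
    limit = run_input.get("limit", DEFAULT_LIMIT)
    max_results = run_input.get("max_results", DEFAULT_MAX_RESULTS)
    if run_input.get("fetch_all", False):
        # Paging continues until the hard cap is reached.
        return max_results
    # Assumption: a single page is fetched, still subject to the hard cap.
    return min(limit, max_results)


def max_cost_usd(run_input: Dict[str, Any]) -> Decimal:
    """Upper bound on the per-result charge for a run, in USD."""
    return PRICE_PER_1000_RESULTS_USD * max_billable_jobs(run_input) / 1000
```

With the defaults (`fetch_all` off, `limit` 25) this gives at most 25 billable jobs, that is 25 × $2.50 / 1,000 = $0.0625. The real charge can be lower, since jobs dropped by `drop_expired` or `drop_ghost` are described as not returned and not charged.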