# AI & PhD Researcher Dataset Filter — recruiting, GTM, research (`crystalbytes/phds`) Actor

Turn a raw JSON export of AI / PhD / researcher profiles into a precise, deduplicated, deliverable-grade shortlist in seconds. Built for recruiting teams, B2B growth/SDR teams, and research panels who need clean, targeted lists instead of raw scraping noise.

🚀 22.5k records filtered in <6s.

- **URL**: https://apify.com/crystalbytes/phds.md
- **Developed by:** [CrystalBytes](https://apify.com/crystalbytes) (community)
- **Categories:** Jobs, AI, Lead generation
- **Stats:** 2 total users, 1 monthly user, 95.7% runs succeeded
- **User rating**: 5.00 out of 5 stars

## Pricing

from $10.00 / 1,000 delivered records

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🎓 AI & PhD Researcher Dataset Filter — recruiting, GTM, research

**Turn a raw JSON export of AI / PhD / researcher profiles into a precise, deduplicated, deliverable-grade shortlist in seconds.**

You do not need extra engineering to get a useful first run. This Actor does **not** browse the web or pull live profiles. You **bring your own** JSON file (a single array of profile objects). The Actor **filters, deduplicates, and shapes** the rows you choose, then writes them to an Apify **Dataset** you can download as JSON or CSV.

---

### Who it is for

- **Hiring and talent teams** shortlisting PhD-level AI, ML, or research profiles from an existing export.
- **B2B GTM, SDR, and growth teams** who need a clean, ICP-matched list instead of a noisy raw dump.
- **Research, policy, and panel coordinators** who need specific countries, languages, or seniority without manual spreadsheet work.
- **Data and ops** teams that already have profile JSON and want repeatable, versioned “audience” runs.

---

### Get started in three steps

1. **Try the built-in sample** — Open the Actor and run. The bundled demo loads automatically so you can see how filters work; results appear on the **Dataset** tab.
2. **Your own profile file** — For production runs, whoever manages your workspace connects the JSON source your organization uses. If you need a different file than the default, ask them to point the Actor at it.
3. **Download results** — Open the run’s **Dataset** for the rows. For a step-by-step breakdown, read **`RUN_SUMMARY`** in the run’s default **Key-value store**.

> **Note:** The **Input** form is for **filters and export limits** only. **Which JSON file** a run uses is chosen outside the public form (by your workspace setup).

---

### Find the right people (practical playbooks)

Use the matching sections in the **Input** form. Leave a field empty to turn that filter off.

| I want to… | Start here in the form |
|------------|-------------------------|
| **US or UK** candidates only | **Location** — countries include (and add excludes for regions you do not want). |
| **Europe-based** PhD+ researchers | **Location** (continent or country) + **Education** — minimum level, schools, or degrees. |
| **Senior** AI / product / legal in **software** | **Career** — industry, job title, job level; optionally **Company** for size or employer name. |
| **Quality contacts** (work email, fewer bad domains) | **Contact quality** — require work email, allow / block email domains. |
| **A tight shortlist, not the whole file** | **Volume, sampling & pagination** — see [How many rows you export](#how-many-rows-you-export) below. |
| **No duplicate people** | **Deduplication** — pick a primary key (e.g. LinkedIn username) and optional backup key. |
| **Safer sharing or demos** (masked email / phone) | **Output shaping & privacy** — redact PII, trim fields, or flatten nested fields for CSV. |

Narrow with **AND** (every enabled group must match) or explore more broadly with **OR** (at least one group). **Exclusion** lists (countries you block, bad domains, title excludes) are **always** applied, even in OR mode, so you do not “leak” blocked rows by accident.
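
As a rough illustration of the "Europe-based PhD+ researchers" playbook, the input below combines Location, Education, Contact quality, and Deduplication fields from the input schema. The field names come from the schema; the specific values (`"doctorates"`, `"linkedin_username"`, `"email"`) are assumptions taken from the examples in the field descriptions, so check them against the options shown in the Console form.

```python
# Sketch of a run input for "Europe-based PhD+ researchers with a work email".
# Field names come from the Actor input schema; the values are illustrative
# assumptions, not verified enum options.
run_input = {
    "matchMode": "AND",                         # every enabled group must match
    "continentsInclude": ["europe"],            # Location
    "degreesInclude": ["doctorates"],           # Education
    "requireWorkEmail": True,                   # Contact quality
    "emailDomainsExclude": ["mailinator.com"],  # exclusion, always enforced
    "dedupeByField": "linkedin_username",       # assumed option name
    "secondaryDedupeField": "email",            # assumed option name
    "maxRecords": 500,                          # small cap while tuning filters
}

# Empty values turn a filter off, so list only the fields that are enabled.
enabled = sorted(k for k, v in run_input.items() if v not in ("", [], None))
print(enabled)
```

A dictionary like this can be passed as `run_input` to the Python client call shown in the API section.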

---

### How filters work (short version)

- Each enabled field is a **condition**. **Match mode** (AND / OR) controls how **groups** of conditions combine; values **inside** one list are OR’d (e.g. any of several countries).
- **Empty** = that filter is off.
- **Excludes** (countries, companies, keywords, etc.) are always enforced for safety.
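
The rules above can be sketched as a small predicate combinator. This is an illustrative model, not the Actor's source code: `row_passes`, `country_in`, and `min_years` are hypothetical helpers, and the real implementation may evaluate groups differently.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def country_in(values: List[str]) -> Predicate:
    """Values inside one list are OR'd: any listed country matches."""
    wanted = {v.lower() for v in values}
    return lambda row: str(row.get("country", "")).lower() in wanted


def min_years(n: int) -> Predicate:
    """Condition on the inferred_years_experience field."""
    return lambda row: int(row.get("inferred_years_experience", 0)) >= n


def row_passes(row: Row, include_groups: List[Predicate],
               exclude_rules: List[Predicate], match_mode: str = "AND") -> bool:
    # Exclusions are checked first and always win, in AND and in OR mode.
    if any(rule(row) for rule in exclude_rules):
        return False
    # No enabled include groups: every filter is off, so the row passes.
    if not include_groups:
        return True
    results = (group(row) for group in include_groups)
    return all(results) if match_mode == "AND" else any(results)


row = {"country": "Germany", "inferred_years_experience": 3}
groups = [country_in(["germany", "france"]), min_years(5)]
print(row_passes(row, groups, [], "AND"))  # country matches, experience does not
print(row_passes(row, groups, [], "OR"))   # one matching group is enough
```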

The Console form is grouped into sections: **Optional listing** (if used) → **Filter logic** → **Volume** → **Location** through **Output shaping**. Every field has examples and tips inline.

---

### How many rows you export

The Actor filters the **entire** file first, then **deduplicates** (if you set dedupe), then optionally takes a **random sample**, and **only then** applies row limits. So limits always apply to the **qualified** list.
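
The order of operations described above can be sketched as follows. This is a minimal model under stated assumptions: `export_rows` and its parameters are hypothetical names, rows without a dedupe key are simply kept, and the sampling call stands in for whatever the Actor actually uses.

```python
import random
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def export_rows(rows: List[Row], keep: Callable[[Row], bool],
                dedupe_key: Optional[str] = None, sample_size: int = 0,
                limit: Optional[int] = None, seed: int = 0) -> List[Row]:
    # 1. Filter the entire file first.
    qualified = [row for row in rows if keep(row)]
    # 2. Deduplicate, if a key was chosen.
    if dedupe_key is not None:
        seen = set()
        unique = []
        for row in qualified:
            key = row.get(dedupe_key)
            if key is not None and key in seen:
                continue
            seen.add(key)
            unique.append(row)
        qualified = unique
    # 3. Optionally take a random sample of the qualified list.
    if sample_size > 0:
        rng = random.Random(seed)
        qualified = rng.sample(qualified, min(sample_size, len(qualified)))
    # 4. Only then apply the row limit.
    return qualified if limit is None else qualified[:limit]


rows = [
    {"id": 1, "country": "germany"},
    {"id": 1, "country": "germany"},  # duplicate of the first row
    {"id": 2, "country": "germany"},
    {"id": 3, "country": "india"},
]
is_german = lambda r: r["country"] == "germany"
print(len(export_rows(rows, is_german, dedupe_key="id")))
```

Because the limit is applied last, a limit of 1 in this example returns one qualified, deduplicated row rather than the first row of the raw file.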

You can use **either** style, but **not both** (the run stops with a clear error if both are set).

#### A) “Start at row” and “Stop before row” (range)

Good when you want a **single slice** without doing math (e.g. “rows 0–999” or “100 to 1000”).

- **Start at row** — 0 = first row in the *matched* list (after filters, dedupe, and optional sample).
- **Stop before row** — **Exclusive** end: valid rows are **\[Start, Stop)**. Example: start `0`, stop `1000` = first 1000 rows. Start `100`, stop `1000` = 900 rows (indices 100 through 999).

**Rows in this export ≈ Stop − Start.** **Paid** plans support starting after row 0 (pagination). On the **free** tier, starting after the first row is not supported — use the first slice only, or upgrade for offset / pagination.

#### B) “Skip first N” and “Max records” (classic)

- **Skip first N**: offset into the qualified list. Example: with 1,000 rows per page, page 2 is skip `1000`, max `1000`.
- **Max records to output**: `0` means “up to the limit allowed by your **plan** and the monthly allowance,” not “zero rows.”

**Random sample** (optional) shuffles the qualified list *before* skip / cap — use it for A/B tests or training splits, not for stable paging unless you know what you are doing.
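
To make the two volume styles concrete, here is a hedged sketch that turns either style into a half-open `(start, stop)` window over the qualified list. `resolve_window` is a hypothetical helper; how the Actor detects a conflict and how plan caps interact with the range style are assumptions (the default `plan_cap` of 4000 is the `starter` per-run cap from the pricing table).

```python
from typing import Optional, Tuple


def resolve_window(total: int, export_from: Optional[int] = None,
                   export_to: Optional[int] = None, skip: int = 0,
                   max_records: int = 0, plan_cap: int = 4000) -> Tuple[int, int]:
    """Return (start, stop) into the qualified list; stop is exclusive."""
    using_range = export_to is not None
    using_classic = skip > 0 or max_records > 0
    if using_range and using_classic:
        raise ValueError("Use the Start/Stop range or Skip/Max records, not both")
    if using_range:
        start = export_from or 0
        if export_to < start:
            raise ValueError("Stop before row must be >= Start at row")
        stop = export_to
    else:
        start = skip
        # 0 means "as many as the plan allows", not zero rows.
        cap = min(max_records, plan_cap) if max_records > 0 else plan_cap
        stop = start + cap
    # Never read past the end of the qualified list.
    return min(start, total), min(stop, total)


start, stop = resolve_window(5000, export_from=100, export_to=1000)
print(start, stop, stop - start)  # rows exported = stop - start
```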

**Billing reminder:** the platform may charge by **delivered** rows; your plan also enforces per-run and per-month caps. See the Actor’s **Pricing** tab in Apify and **`RUN_SUMMARY` → `monetization`**.

---

### Output and transparency

- **Dataset** — one JSON object per row; download as JSON, CSV, or Excel from the run.
- **`RUN_SUMMARY`** (in the run’s default **Key-value store**) — how many records were loaded, filtered, deduplicated, sampled, skipped, and exported, plus **monetization** and timing. Use it when results look empty, too small, or when reconciling usage.

Set **Flatten nested fields** for wider CSV columns. Use **Redact PII** when you need shareable samples without full email or phone.
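
A minimal sketch of reading `RUN_SUMMARY` with the Python client follows. It assumes the run object exposes `defaultKeyValueStoreId` alongside the `defaultDatasetId` used in the API examples, and that the record value is a JSON object; only the `pipeline` and `monetization` keys are named in this README, so treat any other structure as unknown. `fetch_run_summary` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def fetch_run_summary(client: Any, run: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the RUN_SUMMARY value from the run's default key-value store."""
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("RUN_SUMMARY")
    # get_record returns None when the key does not exist.
    return record["value"] if record else None


# Usage with the official client (requires `pip install apify-client`):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   run = client.actor("crystalbytes/phds").call(run_input={})
#   summary = fetch_run_summary(client, run)
#   if summary:
#       print(summary.get("pipeline"))
#       print(summary.get("monetization"))
```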

---

### Pricing and plans (summary)

**Exact** unit prices, events, and any platform fees are on this Actor’s **Pricing** tab in the Apify Console. The table below is the **Actor-side policy** (from our tier file), so you can see run and monthly **caps**; it is not a substitute for the Console invoice.

| Tier        | Max / run | Max / month | Runs / day | Field limits / notes |
|------------|-----------|-------------|------------|-------------------------|
| `free`     | 50        | 120         | 1          | Yes (basic fields only) |
| `starter`  | 4 000     | 15 000      | no hard daily cap in Actor | — |
| `pro`      | 4 000     | 25 000      | no hard daily cap in Actor | — |
| `agency`   | 10 000    | 100 000     | no hard daily cap in Actor | — |
| `development` | (high) | (high)   | (high)     | For local / owner tests only |

- **Free** strips sensitive columns (e.g. work email, phones, some addresses) so you can evaluate fit before upgrading.
- **Paid** tiers unlock the full record, **offset pagination** (skip / start-after-first-row), and **overage** past the monthly cap where configured — see the Console for **overage** event names and prices.
- After each run, check **`RUN_SUMMARY` → `monetization`** and compare to your Apify billing view.
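
As a worked example of how the caps in the table might combine, the sketch below takes the smaller of the request, the per-run cap, and what is left of the monthly cap. This is an assumption about the policy, not the Actor's billing code: it ignores overage and the daily run limit, and `deliverable_rows` is a hypothetical helper.

```python
# Per-run and per-month caps copied from the tier table above.
TIER_CAPS = {
    "free": {"per_run": 50, "per_month": 120},
    "starter": {"per_run": 4000, "per_month": 15000},
    "pro": {"per_run": 4000, "per_month": 25000},
    "agency": {"per_run": 10000, "per_month": 100000},
}


def deliverable_rows(tier: str, requested: int, delivered_this_month: int) -> int:
    """Rows a run could deliver under the caps, ignoring overage (assumption)."""
    caps = TIER_CAPS[tier]
    remaining = max(caps["per_month"] - delivered_this_month, 0)
    return min(requested, caps["per_run"], remaining)


print(deliverable_rows("free", 200, 100))     # limited by the monthly remainder
print(deliverable_rows("starter", 10000, 0))  # limited by the per-run cap
```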

---

### Trust, data, and compliance

- You supply the JSON; this run **does not** crawl third-party sites or “discover” new profiles from the open web.
- You are responsible for **lawful** use, consent, and platform terms that apply to your source data (e.g. privacy rules, email outreach laws).
- Use **redaction** and **field allow / deny** lists for demos, contractors, or external sharing.
- **Who can see a run’s full Input** is controlled in Apify (organization permissions). Do not put passwords or private keys in task input.

For **performance** on large files, see **Options** on the run (memory, timeout). As a rough guide, a **22k-row** file was filtered in a few seconds at **2 GB** memory in development tests. Very large single files may need more memory, a longer timeout, or a split source file; ask your workspace admin if a run times out or runs out of memory.

---

### Reliability and support

- Invalid inputs (e.g. bad **regex** patterns, over-claimed **advertised** counts, or **conflicting** volume settings) fail fast with a readable error.
- **0 results after filters** — widen one group at a time, try **OR** match mode, or check **`RUN_SUMMARY` → `pipeline`** to see where the list went to zero.

**Support and feedback:** `crystalbytes@proton.me` — usually within one business day.

---

*Ready to build a clean, plan-aware shortlist from your own researcher JSON — start a run and refine filters using `RUN_SUMMARY` until the numbers match your goal.*

# Actor input Schema

## `sourceR2Bucket` (type: `string`):

Set via API or task JSON if the publisher uses R2. Hidden from the default form. Leave **empty** for Apify Storage or the bundled local demo. Example: `acme-datasets`
## `sourceR2Key` (type: `string`):

Object path when using R2. The Actor also reads `R2_KEY` or `R2_OBJECT_KEY` from the environment if this is left empty. Hidden from the default form.
## `sourceR2Endpoint` (type: `string`):

Only if the Actor owner configured a non-default S3-compatible endpoint. Hidden from the default form; usually set in environment.
## `inputJsonPath` (type: `string`):

Hidden in the default Console form. For local `apify run`, defaults to `data/demo_sample.json` in code when unset. Set manually in run input or API for custom paths relative to the project root.
## `advertisedRecordCount` (type: `integer`):

Optional public-facing count for the listing. Must not exceed the number of records in your file. When empty, the true file count is used.
## `matchMode` (type: `string`):

Global combinator for filter **groups**. AND is the default and recommended for narrowing. Switch to OR to widen results without deleting filters.
## `textMatchMode` (type: `string`):

Applies to free-text fields (name, job title, LinkedIn username, keywords, etc.).

- `contains` — substring, most forgiving (default).
- `exact` — whole-string equality after trim.
- `startsWith` / `endsWith` — prefix / suffix match.
- `regex` — JavaScript RegExp. Use `\b`, `|`, `^` etc. Example: `^(chief|head)\s`.

⚠️ `regex` is powerful — invalid patterns will fail the run with a clear error.
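
The modes listed above can be approximated in a few lines. This sketch uses Python's `re` module, whereas the Actor uses JavaScript RegExp, so some patterns may behave differently; `text_matches` is a hypothetical helper, and combining it with the `caseSensitive` option this way is an assumption.

```python
import re


def text_matches(value: str, pattern: str, mode: str = "contains",
                 case_sensitive: bool = False) -> bool:
    """Approximate the text match modes; raises re.error on an invalid regex."""
    if mode == "regex":
        flags = 0 if case_sensitive else re.IGNORECASE
        return re.search(pattern, value, flags) is not None
    if not case_sensitive:
        value, pattern = value.lower(), pattern.lower()
    if mode == "contains":
        return pattern in value
    if mode == "exact":
        # Whole-string equality after trimming surrounding whitespace.
        return value.strip() == pattern.strip()
    if mode == "startsWith":
        return value.startswith(pattern)
    if mode == "endsWith":
        return value.endswith(pattern)
    raise ValueError(f"Unknown text match mode: {mode}")


print(text_matches("Chief Counsel", "counsel"))
print(text_matches("Chief of Staff", r"^(chief|head)\s", mode="regex"))
```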
## `caseSensitive` (type: `boolean`):

When enabled, text comparisons respect upper / lower case. Default **off** — useful because most of the dataset is stored lowercase.
## `exportFromIndex` (type: `integer`):

0-based index into the **matched** list (after filters, dedupe, and optional random sample). `0` = first row. Only used when **Stop before row** is set; otherwise use *Skip first N* below.
## `exportToExclusiveIndex` (type: `integer`):

**Optional.** When set, export only half-open range **[Start at row, Stop before row)** — same as a spreadsheet row slice. Example: start `0`, stop `1000` = first 1000 rows. Leave **empty** to use *Skip first N* + *Max records* instead. Must be ≥ *Start at row*.
## `maxRecords` (type: `integer`):

Maximum rows written to the result. `0` = as many as your **current plan** allows, up to how many rows matched. **Not** used when *Stop before row* is set (use Start/Stop range or this + Skip — not both). Tip: about `500` while tuning filters.
## `skipRecords` (type: `integer`):

Pagination offset applied after filtering & dedupe. Use with *Max records* for chunked exports — e.g. page 2 of 1000 → skip `1000`, max `1000`.
## `randomSample` (type: `integer`):

If > 0, take a **random sample** of N records (Fisher–Yates shuffle) from the filtered set before pagination. Useful for A/B outreach tests and model training splits. Seeded by run date for loose reproducibility within the same day.
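
A date-seeded Fisher-Yates sample can be sketched as below. The exact seed format the Actor derives from the run date is not documented here, so the ISO date string is an assumption, and `random_sample` is a hypothetical helper.

```python
import random
from datetime import date
from typing import List, Optional, TypeVar

T = TypeVar("T")


def random_sample(rows: List[T], n: int, run_date: Optional[date] = None) -> List[T]:
    """Shuffle a copy of rows with Fisher-Yates and return the first n."""
    if n <= 0:
        return list(rows)  # sampling is off
    # Same date -> same seed -> same shuffle (loose reproducibility).
    rng = random.Random((run_date or date.today()).isoformat())
    pool = list(rows)
    # Walk from the end, swapping each item with a random earlier-or-same index.
    for i in range(len(pool) - 1, 0, -1):
        j = rng.randint(0, i)
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:n]


day = date(2024, 1, 15)
first = random_sample(list(range(100)), 10, run_date=day)
second = random_sample(list(range(100)), 10, run_date=day)
print(first == second)  # same day, same sample
```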
## `countriesInclude` (type: `array`):

Keep only rows where `country` equals one of these (case-insensitive, exact). Examples: `united states`, `united kingdom`, `germany`, `turkey`.
## `countriesExclude` (type: `array`):

Drop rows whose `country` matches any value here. Applied **always** (even in OR mode). Use to blocklist markets you can't serve. Examples: `india`, `china`.
## `regionsInclude` (type: `array`):

Keep rows where **any** value in `regions[]` contains **any** of these substrings. Good for state / province targeting: `new york`, `bavaria`, `greater london`.
## `regionsExclude` (type: `array`):

Drop rows where any `regions[]` entry contains any of these substrings. Always enforced.
## `localityContains` (type: `array`):

Substring match across `location_names[]` and `street_addresses[].locality`. Case-insensitive. Examples: `new york`, `san francisco`, `berlin`.
## `continentsInclude` (type: `array`):

Keep rows where **any** continent in `location[].continent` or `street_addresses[].continent` is in this list. Valid values: `north america`, `south america`, `europe`, `asia`, `africa`, `oceania`.
## `industriesInclude` (type: `array`):

Keep rows where `industry` contains any of these substrings. Examples: `computer software`, `legal services`, `hospital & health care`.
## `industriesExclude` (type: `array`):

Drop rows where `industry` contains any of these substrings. Always enforced. Examples: `staffing`, `gambling`.
## `jobTitleContains` (type: `array`):

Text match on `job_title` (and all `experience[].title.name`). Uses your **Text match mode** above (contains / exact / regex / …). Examples: `counsel`, `director`, `chief`.
## `jobTitleExclude` (type: `array`):

Reject rows whose `job_title` matches any entry under the active Text match mode. Always enforced. Examples: `intern`, `assistant`.
## `jobRolesInclude` (type: `array`):

Taxonomy match against `experience[].title.role` **or** `job[].title_role`. Multi-select. Useful when titles vary wildly (e.g. all sales-flavored roles).
## `jobLevelsInclude` (type: `array`):

Taxonomy match against `experience[].title.levels[]`. Pick one or more. `cxo` covers CEO / CTO / CFO etc.; `training` = interns and residents.
## `jobSubRolesContains` (type: `array`):

Substring match on `experience[].title.sub_role`. Lets you reach into specialties like `lawyer`, `doctor`, `nursing`, `data`, `graphic_design`, `product`.
## `minYearsExperience` (type: `integer`):

Require `inferred_years_experience` ≥ this value. `0` = no minimum. Great proxy for seniority when titles are messy.
## `maxYearsExperience` (type: `integer`):

Require `inferred_years_experience` ≤ this value. `0` = no maximum.
## `currentlyEmployed` (type: `boolean`):

Keep rows where `experience[]` contains an `is_primary: true` entry with no `end_date` (i.e. still in that role).
## `companyNamesInclude` (type: `array`):

Keep rows whose `experience[].company.name` contains any of these substrings. Examples: `siemens`, `google`, `mckinsey`.
## `companyNamesExclude` (type: `array`):

Drop rows where any `experience[].company.name` contains any of these substrings. Always enforced. Useful to exclude competitors or blacklisted employers.
## `currentCompanyContains` (type: `array`):

Substring match on `job_company[].name` (the current employer). Use this when you don't care about history, only where they work *right now*.
## `companyIndustriesInclude` (type: `array`):

Substring match on `experience[].company.industry`. Broader than the top-level `industry` because it covers every past employer too. Example: `electrical/electronic manufacturing`.
## `companySizeMin` (type: `string`):

Lower bound of `experience[].company.size`. A row passes if they ever worked at a company with headcount ≥ this bucket.
## `companySizeMax` (type: `string`):

Upper bound of `experience[].company.size`.
## `minCompaniesWorked` (type: `integer`):

Require `experience[].length` ≥ this value. Proxy for career breadth.
## `schoolsInclude` (type: `array`):

Keep rows who studied at any of these institutions. Substring match. Examples: `harvard`, `mit`, `hofstra`.
## `degreesInclude` (type: `array`):

Keep rows where any token in `education[].degrees[]` contains any of these strings. Common tokens: `bachelors`, `masters`, `doctorates`, `doctor of jurisprudence`, `master of business administration`.
## `majorsInclude` (type: `array`):

Substring match on `education[].majors[]`. Examples: `law`, `computer science`, `business`.
## `minEducationLevel` (type: `string`):

Require at least one education entry at this level or higher. Degree strings are matched heuristically (regex covers PhD, JD, MD, MBA, MSc, BSc, BA, LLB, associate, etc.).
## `languagesAnyOf` (type: `array`):

Keep rows who speak at least one of these languages. Examples: `english`, `german`, `arabic`.
## `languagesAllOf` (type: `array`):

Stricter: require **every** listed language to be present in `languages[]`. Example: `english` + `german` = bilingual.
## `interestsInclude` (type: `array`):

Substring match on `interests[]`. Often qualitative — think `human rights`, `science and technology`, `education`.
## `hasAnyCertification` (type: `boolean`):

Drop rows where `certifications` is null / empty. Useful for regulated industries (legal, medical, finance).
## `certificationsInclude` (type: `array`):

Substring match on `certifications[].name` (or raw string). Examples: `pmp`, `bar exam`, `cpa`.
## `genderInclude` (type: `string`):

Filter on `basic_info.gender`. Dataset currently stores `male` / `female`; rows with unspecified gender are passed through on *Any*.
## `nameContains` (type: `array`):

Text match on `full_name` using the active *Text match mode*. With `regex`, a pattern like `^a[ln]` matches names starting with *al* or *an*.
## `nameExclude` (type: `array`):

Reject rows whose `full_name` matches. Always enforced. Good for suppression lists or test / dummy rows.
## `linkedinUsernameContains` (type: `string`):

Text match on `linkedin_username` using the active *Text match mode*. Empty = off.
## `linkedinRequired` (type: `boolean`):

Drop rows without a `linkedin_username` — handy when your downstream tool keys off LinkedIn URLs.
## `minConnections` (type: `integer`):

Require `connections` ≥ this value. `0` = no minimum.
## `maxConnections` (type: `integer`):

Require `connections` ≤ this value. `0` = no maximum.
## `requireAnyEmail` (type: `boolean`):

Drop rows where both `email` and `work_email` are empty or null.
## `requireWorkEmail` (type: `boolean`):

Drop rows where `work_email` is missing **and** no `emails[]` entry has a professional-type token (`professional`, `current_professional`, `previous_professional`, `work`, or `business`). The Actor canonicalizes these to the same class before filtering.
## `requirePersonalEmail` (type: `boolean`):

Drop rows where `email` is missing **and** no `emails[]` entry has `type = personal`.
## `requirePhone` (type: `boolean`):

Drop rows where `phone_numbers` is null or empty.
## `emailDomainsInclude` (type: `array`):

Keep rows where **any** email (personal, work, or from `emails[]`) ends with one of these domains. Examples: `gmail.com`, `siemens.com`.
## `emailDomainsExclude` (type: `array`):

Drop rows where any email ends with one of these domains. Always enforced. Common blocklist: `mailinator.com`, `guerrillamail.com`, `example.com`.
## `globalKeywordsAny` (type: `array`):

Keep rows where **any** of these keywords appears in **any** of the searched fields. Uses the active *Text match mode* (supports regex). Examples: `patent`, `intellectual property`.
## `globalKeywordsExclude` (type: `array`):

Drop rows where any of these keywords appears in any searched field. Always enforced. Great for suppressing known irrelevant niches.
## `dedupeByField` (type: `string`):

Pick the best unique identifier for your audience. `linkedin_username` is usually safest; `email` is great for deliverability lists.
## `secondaryDedupeField` (type: `string`):

When the primary key is missing or empty on a row, fall back to this key so you don't lose records that lack (e.g.) a LinkedIn handle but do have an email.
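
The primary/secondary fallback can be sketched like this. It is an illustration under assumptions: keys are compared case-insensitively, the first occurrence wins, and rows with neither key are kept; the Actor may differ on all three. `dedupe` is a hypothetical helper.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def dedupe(rows: List[Row], primary: str, secondary: Optional[str] = None) -> List[Row]:
    """Keep the first row per key, falling back to the secondary field."""
    seen = set()
    unique = []
    for row in rows:
        # Use the primary key when present, otherwise the backup key.
        field = primary if row.get(primary) else secondary
        value = row.get(field) if field else None
        if not value:
            unique.append(row)  # no usable key: keep the row (assumption)
            continue
        key = (field, str(value).lower())
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


people = [
    {"linkedin_username": "ada", "email": "ada@example.org"},
    {"linkedin_username": "ADA", "email": "other@example.org"},  # same handle
    {"linkedin_username": "", "email": "bob@example.org"},       # falls back to email
    {"linkedin_username": None, "email": "bob@example.org"},     # duplicate by email
]
print(len(dedupe(people, "linkedin_username", "email")))
```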
## `fieldsToKeep` (type: `array`):

If non-empty, only these top-level keys are emitted. Nested paths are **not** supported here — use *Fields to omit* for surgical removal. Example keys: `id`, `full_name`, `email`, `linkedin_username`, `country`, `job_title`.
## `fieldsToOmit` (type: `array`):

Strip these top-level keys from every emitted record. Applied *after* `fieldsToKeep`. Example keys: `phone_numbers`, `street_addresses`.
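
The keep-then-omit order can be shown with a short sketch; `shape_record` is a hypothetical helper that works on top-level keys only, as described above.

```python
from typing import Dict, Iterable, Optional


def shape_record(record: Dict[str, object],
                 fields_to_keep: Optional[Iterable[str]] = None,
                 fields_to_omit: Optional[Iterable[str]] = None) -> Dict[str, object]:
    """Apply fieldsToKeep first, then fieldsToOmit, on top-level keys."""
    keep = list(fields_to_keep or [])
    omit = set(fields_to_omit or [])
    # An empty keep list means "keep everything".
    shaped = {k: record[k] for k in keep if k in record} if keep else dict(record)
    return {k: v for k, v in shaped.items() if k not in omit}


record = {
    "full_name": "ada lovelace",
    "email": "ada@example.org",
    "phone_numbers": ["+1 415 555 0123"],
    "country": "united kingdom",
}
print(shape_record(record, ["full_name", "email", "phone_numbers"], ["phone_numbers"]))
```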
## `redactPii` (type: `boolean`):

Masks emails as `j***@domain.com` and keeps only the last 4 digits of phone numbers. Useful when sharing samples with contractors or posting demos publicly. Does **not** affect filtering — only the output.
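
A rough sketch of this kind of masking follows. The email format matches the `j***@domain.com` pattern above; the phone format (asterisks for every digit except the last four) and the handling of malformed emails are assumptions, and both function names are hypothetical.

```python
import re


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"  # not a usable address: hide it entirely (assumption)
    return f"{local[0]}***@{domain}"


def mask_phone(phone: str) -> str:
    """Keep only the last four digits; other digits become asterisks."""
    digits = re.sub(r"\D", "", phone)
    return "*" * max(len(digits) - 4, 0) + digits[-4:]


print(mask_email("jane@domain.com"))
print(mask_phone("+1 (415) 555-0123"))
```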
## `flattenNestedFields` (type: `boolean`):

Expose the most-used nested bits at the top level: `first_name`, `last_name`, `gender`, `primary_city`, `primary_region`, `primary_continent`, `current_company_name`, `primary_email_domain`. Original nested objects remain, unless you also use *Fields to omit*.

## Actor input object example

```json
{
  "sourceR2Bucket": "",
  "sourceR2Key": "",
  "sourceR2Endpoint": "",
  "inputJsonPath": "",
  "matchMode": "AND",
  "textMatchMode": "contains",
  "caseSensitive": false,
  "exportFromIndex": 100,
  "exportToExclusiveIndex": 1000,
  "maxRecords": 5000,
  "skipRecords": 1000,
  "randomSample": 250,
  "countriesInclude": [
    "united states",
    "canada"
  ],
  "countriesExclude": [
    "india"
  ],
  "regionsInclude": [
    "new york",
    "california"
  ],
  "regionsExclude": [
    "texas"
  ],
  "localityContains": [
    "new york",
    "london"
  ],
  "continentsInclude": [
    "north america",
    "europe"
  ],
  "industriesInclude": [
    "computer software",
    "research"
  ],
  "industriesExclude": [
    "staffing and recruiting"
  ],
  "jobTitleContains": [
    "counsel",
    "chief"
  ],
  "jobTitleExclude": [
    "intern"
  ],
  "jobRolesInclude": [
    "legal",
    "engineering"
  ],
  "jobLevelsInclude": [
    "cxo",
    "vp"
  ],
  "jobSubRolesContains": [
    "lawyer",
    "data"
  ],
  "minYearsExperience": 5,
  "maxYearsExperience": 25,
  "currentlyEmployed": false,
  "companyNamesInclude": [
    "google",
    "siemens"
  ],
  "companyNamesExclude": [
    "acme"
  ],
  "currentCompanyContains": [
    "siemens"
  ],
  "companyIndustriesInclude": [
    "electrical/electronic manufacturing"
  ],
  "companySizeMin": "",
  "companySizeMax": "",
  "minCompaniesWorked": 3,
  "schoolsInclude": [
    "harvard"
  ],
  "degreesInclude": [
    "masters",
    "doctorates"
  ],
  "majorsInclude": [
    "law"
  ],
  "minEducationLevel": "any",
  "languagesAnyOf": [
    "english",
    "french"
  ],
  "languagesAllOf": [
    "english",
    "german"
  ],
  "interestsInclude": [
    "education"
  ],
  "hasAnyCertification": false,
  "certificationsInclude": [
    "pmp"
  ],
  "genderInclude": "any",
  "nameContains": [
    "smith"
  ],
  "nameExclude": [
    "test"
  ],
  "linkedinUsernameContains": "mducro",
  "linkedinRequired": false,
  "minConnections": 100,
  "maxConnections": 500,
  "requireAnyEmail": false,
  "requireWorkEmail": false,
  "requirePersonalEmail": false,
  "requirePhone": false,
  "emailDomainsInclude": [
    "gmail.com"
  ],
  "emailDomainsExclude": [
    "mailinator.com"
  ],
  "globalKeywordsAny": [
    "patent",
    "ip"
  ],
  "globalKeywordsExclude": [
    "bitcoin",
    "forex"
  ],
  "dedupeByField": "none",
  "secondaryDedupeField": "none",
  "fieldsToKeep": [
    "full_name",
    "email",
    "linkedin_username"
  ],
  "fieldsToOmit": [
    "phone_numbers"
  ],
  "redactPii": false,
  "flattenNestedFields": false
}
````

This example fills every field with a sample value to show the expected types. It is not meant to be run as-is: it sets both the Start/Stop range (`exportFromIndex`, `exportToExclusiveIndex`) and the classic pair (`skipRecords`, `maxRecords`), which the Actor rejects as conflicting volume settings.

# Actor output Schema

## `overview` (type: `string`):

Browse and export filtered rows from the default dataset.

## `signals` (type: `string`):

Years-of-experience and connection signals for quick QA.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "countriesInclude": [],
    "countriesExclude": [],
    "regionsInclude": [],
    "regionsExclude": [],
    "localityContains": [],
    "continentsInclude": [],
    "industriesInclude": [],
    "industriesExclude": [],
    "jobTitleContains": [],
    "jobTitleExclude": [],
    "jobRolesInclude": [],
    "jobLevelsInclude": [],
    "jobSubRolesContains": [],
    "companyNamesInclude": [],
    "companyNamesExclude": [],
    "currentCompanyContains": [],
    "companyIndustriesInclude": [],
    "schoolsInclude": [],
    "degreesInclude": [],
    "majorsInclude": [],
    "languagesAnyOf": [],
    "languagesAllOf": [],
    "interestsInclude": [],
    "certificationsInclude": [],
    "nameContains": [],
    "nameExclude": [],
    "emailDomainsInclude": [],
    "emailDomainsExclude": [],
    "globalKeywordsAny": [],
    "globalKeywordsExclude": [],
    "fieldsToKeep": [],
    "fieldsToOmit": []
};

// Run the Actor and wait for it to finish
const run = await client.actor("crystalbytes/phds").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "countriesInclude": [],
    "countriesExclude": [],
    "regionsInclude": [],
    "regionsExclude": [],
    "localityContains": [],
    "continentsInclude": [],
    "industriesInclude": [],
    "industriesExclude": [],
    "jobTitleContains": [],
    "jobTitleExclude": [],
    "jobRolesInclude": [],
    "jobLevelsInclude": [],
    "jobSubRolesContains": [],
    "companyNamesInclude": [],
    "companyNamesExclude": [],
    "currentCompanyContains": [],
    "companyIndustriesInclude": [],
    "schoolsInclude": [],
    "degreesInclude": [],
    "majorsInclude": [],
    "languagesAnyOf": [],
    "languagesAllOf": [],
    "interestsInclude": [],
    "certificationsInclude": [],
    "nameContains": [],
    "nameExclude": [],
    "emailDomainsInclude": [],
    "emailDomainsExclude": [],
    "globalKeywordsAny": [],
    "globalKeywordsExclude": [],
    "fieldsToKeep": [],
    "fieldsToOmit": [],
}

# Run the Actor and wait for it to finish
run = client.actor("crystalbytes/phds").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "countriesInclude": [],
  "countriesExclude": [],
  "regionsInclude": [],
  "regionsExclude": [],
  "localityContains": [],
  "continentsInclude": [],
  "industriesInclude": [],
  "industriesExclude": [],
  "jobTitleContains": [],
  "jobTitleExclude": [],
  "jobRolesInclude": [],
  "jobLevelsInclude": [],
  "jobSubRolesContains": [],
  "companyNamesInclude": [],
  "companyNamesExclude": [],
  "currentCompanyContains": [],
  "companyIndustriesInclude": [],
  "schoolsInclude": [],
  "degreesInclude": [],
  "majorsInclude": [],
  "languagesAnyOf": [],
  "languagesAllOf": [],
  "interestsInclude": [],
  "certificationsInclude": [],
  "nameContains": [],
  "nameExclude": [],
  "emailDomainsInclude": [],
  "emailDomainsExclude": [],
  "globalKeywordsAny": [],
  "globalKeywordsExclude": [],
  "fieldsToKeep": [],
  "fieldsToOmit": []
}' |
apify call crystalbytes/phds --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crystalbytes/phds",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "AI & PhD Researcher Dataset Filter — recruiting, GTM, research",
        "description": "Turn a raw JSON export of AI / PhD / researcher profiles into a precise, deduplicated, deliverable-grade shortlist in seconds. Built for recruiting teams, B2B growth/SDR teams, and research panels who need clean, targeted lists instead of raw scraping noise.\n\n🚀 22.5k records filtered in <6s.",
        "version": "0.3",
        "x-build-id": "L1F3MTXIgrv09FeL3"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crystalbytes~phds/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crystalbytes-phds",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crystalbytes~phds/runs": {
            "post": {
                "operationId": "runs-sync-crystalbytes-phds",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crystalbytes~phds/run-sync": {
            "post": {
                "operationId": "run-sync-crystalbytes-phds",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "sourceR2Bucket": {
                        "title": "🪣 Optional — cloud bucket (operator / large files)",
                        "type": "string",
                        "description": "Set via API or task JSON if the publisher uses R2. Hidden from the default form. Leave **empty** for Apify Storage or the bundled local demo. Example: `acme-datasets`",
                        "default": ""
                    },
                    "sourceR2Key": {
                        "title": "🔑 Object key in that bucket",
                        "type": "string",
                        "description": "Object path when using R2. The Actor also reads `R2_KEY` or `R2_OBJECT_KEY` from the environment if this is left empty. Hidden from the default form.",
                        "default": ""
                    },
                    "sourceR2Endpoint": {
                        "title": "🌐 Custom R2 endpoint (optional)",
                        "type": "string",
                        "description": "Only if the Actor owner configured a non-default S3-compatible endpoint. Hidden from the default form; usually set in environment.",
                        "default": ""
                    },
                    "inputJsonPath": {
                        "title": "💻 Local file path (developers & local runs only)",
                        "type": "string",
                        "description": "Hidden in the default Console form. For local `apify run`, defaults to `data/demo_sample.json` in code when unset. Set manually in run input or API for custom paths relative to the project root.",
                        "default": ""
                    },
                    "advertisedRecordCount": {
                        "title": "📊 Advertised record count (optional)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Optional public-facing count for the listing. Must not exceed the number of records in your file. When empty, the true file count is used."
                    },
                    "matchMode": {
                        "title": "🧩 Match mode between filter groups",
                        "enum": [
                            "AND",
                            "OR"
                        ],
                        "type": "string",
                        "description": "Global combinator for filter **groups**. AND is the default and recommended for narrowing. Switch to OR to widen results without deleting filters.",
                        "default": "AND"
                    },
                    "textMatchMode": {
                        "title": "🔤 Text match mode",
                        "enum": [
                            "contains",
                            "exact",
                            "startsWith",
                            "endsWith",
                            "regex"
                        ],
                        "type": "string",
                        "description": "Applies to free-text fields (name, job title, LinkedIn username, keywords, etc.).\n\n- `contains` — substring, most forgiving (default).\n- `exact` — whole-string equality after trim.\n- `startsWith` / `endsWith` — prefix / suffix match.\n- `regex` — JavaScript RegExp. Use `\\b`, `|`, `^` etc. Example: `^(chief|head)\\s`.\n\n⚠️ `regex` is powerful — invalid patterns will fail the run with a clear error.",
                        "default": "contains"
                    },
                    "caseSensitive": {
                        "title": "🔠 Case-sensitive matching",
                        "type": "boolean",
                        "description": "When enabled, text comparisons respect upper / lower case. Default **off** — useful because most of the dataset is stored lowercase.",
                        "default": false
                    },
                    "exportFromIndex": {
                        "title": "📌 Start at row (after filters & dedupe)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "0-based index into the **matched** list (after filters, dedupe, and optional random sample). `0` = first row. Only used when **Stop before row** is set; otherwise use *Skip first N* below.",
                        "default": 0
                    },
                    "exportToExclusiveIndex": {
                        "title": "🛑 Stop before row (exclusive end)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "**Optional.** When set, export only half-open range **[Start at row, Stop before row)** — same as a spreadsheet row slice. Example: start `0`, stop `1000` = first 1000 rows. Leave **empty** to use *Skip first N* + *Max records* instead. Must be ≥ *Start at row*."
                    },
                    "maxRecords": {
                        "title": "🔢 Max records to output",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum rows written to the result. `0` = as many as your **current plan** allows, up to how many rows matched. **Not** used when *Stop before row* is set (use Start/Stop range or this + Skip — not both). Tip: about `500` while tuning filters.",
                        "default": 0
                    },
                    "skipRecords": {
                        "title": "⏭️ Skip first N records (offset)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Pagination offset applied after filtering & dedupe. Use with *Max records* for chunked exports — e.g. page 2 of 1000 → skip `1000`, max `1000`.",
                        "default": 0
                    },
                    "randomSample": {
                        "title": "🎲 Random sample size (0 = disabled)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "If > 0, take a **random sample** of N records (Fisher–Yates shuffle) from the filtered set before pagination. Useful for A/B outreach tests and model training splits. Seeded by run date for loose reproducibility within the same day.",
                        "default": 0
                    },
                    "countriesInclude": {
                        "title": "🌍 Countries — include (any-of)",
                        "type": "array",
                        "description": "Keep only rows where `country` equals one of these (case-insensitive, exact). Examples: `united states`, `united kingdom`, `germany`, `turkey`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "countriesExclude": {
                        "title": "🚫 Countries — exclude",
                        "type": "array",
                        "description": "Drop rows whose `country` matches any value here. Applied **always** (even in OR mode). Use to blocklist markets you can't serve. Examples: `india`, `china`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "regionsInclude": {
                        "title": "🗺️ Regions — include (substring, any-of)",
                        "type": "array",
                        "description": "Keep rows where **any** value in `regions[]` contains **any** of these substrings. Good for state / province targeting: `new york`, `bavaria`, `greater london`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "regionsExclude": {
                        "title": "🚷 Regions — exclude",
                        "type": "array",
                        "description": "Drop rows where any `regions[]` entry contains any of these substrings. Always enforced.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "localityContains": {
                        "title": "🏙️ City / locality contains (any-of)",
                        "type": "array",
                        "description": "Substring match across `location_names[]` and `street_addresses[].locality`. Case-insensitive. Examples: `new york`, `san francisco`, `berlin`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "continentsInclude": {
                        "title": "🌐 Continents — include",
                        "type": "array",
                        "description": "Keep rows where **any** continent in `location[].continent` or `street_addresses[].continent` is in this list. Valid values: `north america`, `south america`, `europe`, `asia`, `africa`, `oceania`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "industriesInclude": {
                        "title": "🏭 Industries — include (substring, any-of)",
                        "type": "array",
                        "description": "Keep rows where `industry` contains any of these substrings. Examples: `computer software`, `legal services`, `hospital & health care`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "industriesExclude": {
                        "title": "🧱 Industries — exclude",
                        "type": "array",
                        "description": "Drop rows where `industry` contains any of these substrings. Always enforced. Examples: `staffing`, `gambling`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "jobTitleContains": {
                        "title": "📝 Current job title — include (any-of)",
                        "type": "array",
                        "description": "Text match on `job_title` (and all `experience[].title.name`). Uses your **Text match mode** above (contains / exact / regex / …). Examples: `counsel`, `director`, `chief`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "jobTitleExclude": {
                        "title": "🙅 Current job title — exclude",
                        "type": "array",
                        "description": "Reject rows whose `job_title` matches any entry under the active Text match mode. Always enforced. Examples: `intern`, `assistant`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "jobRolesInclude": {
                        "title": "🧭 Job roles (functional area)",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Taxonomy match against `experience[].title.role` **or** `job[].title_role`. Multi-select. Useful when titles vary wildly (e.g. all sales-flavored roles).",
                        "items": {
                            "type": "string",
                            "enum": [
                                "customer_service",
                                "design",
                                "education",
                                "engineering",
                                "finance",
                                "health",
                                "human_resources",
                                "legal",
                                "marketing",
                                "media",
                                "operations",
                                "public_relations",
                                "real_estate",
                                "sales"
                            ]
                        }
                    },
                    "jobLevelsInclude": {
                        "title": "🏅 Seniority levels",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Taxonomy match against `experience[].title.levels[]`. Pick one or more. `cxo` covers CEO / CTO / CFO etc.; `training` = interns and residents.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "cxo",
                                "vp",
                                "director",
                                "partner",
                                "owner",
                                "senior",
                                "manager",
                                "entry",
                                "training",
                                "unpaid"
                            ]
                        }
                    },
                    "jobSubRolesContains": {
                        "title": "🧪 Sub-roles — include (any-of)",
                        "type": "array",
                        "description": "Substring match on `experience[].title.sub_role`. Lets you reach into specialties like `lawyer`, `doctor`, `nursing`, `data`, `graphic_design`, `product`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "minYearsExperience": {
                        "title": "🧓 Min years of experience",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Require `inferred_years_experience` ≥ this value. `0` = no minimum. Great proxy for seniority when titles are messy.",
                        "default": 0
                    },
                    "maxYearsExperience": {
                        "title": "🧑 Max years of experience",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Require `inferred_years_experience` ≤ this value. `0` = no maximum.",
                        "default": 0
                    },
                    "currentlyEmployed": {
                        "title": "🟢 Currently employed (primary role open)",
                        "type": "boolean",
                        "description": "Keep rows where `experience[]` contains an `is_primary: true` entry with no `end_date` (i.e. still in that role).",
                        "default": false
                    },
                    "companyNamesInclude": {
                        "title": "🏢 Company — has worked at (any-of)",
                        "type": "array",
                        "description": "Keep rows whose `experience[].company.name` contains any of these substrings. Examples: `siemens`, `google`, `mckinsey`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "companyNamesExclude": {
                        "title": "🚯 Company — never worked at",
                        "type": "array",
                        "description": "Drop rows where any `experience[].company.name` contains any of these substrings. Always enforced. Useful to exclude competitors or blacklisted employers.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "currentCompanyContains": {
                        "title": "📌 Current employer contains",
                        "type": "array",
                        "description": "Substring match on `job_company[].name` (the current employer). Use this when you don't care about history, only where they work *right now*.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "companyIndustriesInclude": {
                        "title": "🏗️ Company industries (from experience)",
                        "type": "array",
                        "description": "Substring match on `experience[].company.industry`. Broader than the top-level `industry` because it covers every past employer too. Example: `electrical/electronic manufacturing`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "companySizeMin": {
                        "title": "👥 Min company size bucket",
                        "enum": [
                            "",
                            "1-10",
                            "11-50",
                            "51-200",
                            "201-500",
                            "501-1000",
                            "1001-5000",
                            "5001-10000",
                            "10001+"
                        ],
                        "type": "string",
                        "description": "Lower bound of `experience[].company.size`. A row passes if they ever worked at a company with headcount ≥ this bucket.",
                        "default": ""
                    },
                    "companySizeMax": {
                        "title": "👤 Max company size bucket",
                        "enum": [
                            "",
                            "1-10",
                            "11-50",
                            "51-200",
                            "201-500",
                            "501-1000",
                            "1001-5000",
                            "5001-10000",
                            "10001+"
                        ],
                        "type": "string",
                        "description": "Upper bound of `experience[].company.size`.",
                        "default": ""
                    },
                    "minCompaniesWorked": {
                        "title": "🧳 Minimum number of past employers",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Require the number of `experience[]` entries to be ≥ this value. Proxy for career breadth.",
                        "default": 0
                    },
                    "schoolsInclude": {
                        "title": "🎓 Schools — include (substring, any-of)",
                        "type": "array",
                        "description": "Keep rows for people who studied at any of these institutions. Substring match. Examples: `harvard`, `mit`, `hofstra`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "degreesInclude": {
                        "title": "🥇 Degrees — include (any-of)",
                        "type": "array",
                        "description": "Keep rows where at least one token in `education[].degrees[]` contains any of these strings. Common tokens: `bachelors`, `masters`, `doctorates`, `doctor of jurisprudence`, `master of business administration`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "majorsInclude": {
                        "title": "📚 Majors — include (any-of)",
                        "type": "array",
                        "description": "Substring match on `education[].majors[]`. Examples: `law`, `computer science`, `business`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "minEducationLevel": {
                        "title": "🏛️ Minimum education level",
                        "enum": [
                            "any",
                            "associates",
                            "bachelors",
                            "masters",
                            "doctorate"
                        ],
                        "type": "string",
                        "description": "Require at least one education entry at this level or higher. Degree strings are matched heuristically (regex covers PhD, JD, MD, MBA, MSc, BSc, BA, LLB, associate, etc.).",
                        "default": "any"
                    },
                    "languagesAnyOf": {
                        "title": "🗣️ Languages — any of",
                        "type": "array",
                        "description": "Keep rows for people who speak at least one of these languages. Examples: `english`, `german`, `arabic`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "languagesAllOf": {
                        "title": "🈯 Languages — all of",
                        "type": "array",
                        "description": "Stricter: require **every** listed language to be present in `languages[]`. Example: `english` + `german` = bilingual.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "interestsInclude": {
                        "title": "🎯 Interests contain (any-of)",
                        "type": "array",
                        "description": "Substring match on `interests[]`. Often qualitative — think `human rights`, `science and technology`, `education`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "hasAnyCertification": {
                        "title": "📜 Must have ≥ 1 certification",
                        "type": "boolean",
                        "description": "Drop rows where `certifications` is null / empty. Useful for regulated industries (legal, medical, finance).",
                        "default": false
                    },
                    "certificationsInclude": {
                        "title": "🏷️ Certifications contain (any-of)",
                        "type": "array",
                        "description": "Substring match on `certifications[].name` (or raw string). Examples: `pmp`, `bar exam`, `cpa`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "genderInclude": {
                        "title": "⚧️ Gender",
                        "enum": [
                            "any",
                            "male",
                            "female",
                            "unspecified"
                        ],
                        "type": "string",
                        "description": "Filter on `basic_info.gender`. Dataset currently stores `male` / `female`; rows with unspecified gender are passed through on *Any*.",
                        "default": "any"
                    },
                    "nameContains": {
                        "title": "👤 Full name contains (any-of)",
                        "type": "array",
                        "description": "Text match on `full_name` using the active *Text match mode*. With `regex`, a pattern like `^a[ln]` matches names starting with *al* or *an*.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "nameExclude": {
                        "title": "🙈 Full name excludes",
                        "type": "array",
                        "description": "Reject rows whose `full_name` matches. Always enforced. Good for suppression lists or test / dummy rows.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "linkedinUsernameContains": {
                        "title": "🔗 LinkedIn username contains",
                        "type": "string",
                        "description": "Text match on `linkedin_username` using the active *Text match mode*. Empty = off."
                    },
                    "linkedinRequired": {
                        "title": "🔗 Require LinkedIn username present",
                        "type": "boolean",
                        "description": "Drop rows without a `linkedin_username` — handy when your downstream tool keys off LinkedIn URLs.",
                        "default": false
                    },
                    "minConnections": {
                        "title": "📈 Minimum connections",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Require `connections` ≥ this value. `0` = no minimum.",
                        "default": 0
                    },
                    "maxConnections": {
                        "title": "📉 Maximum connections",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Require `connections` ≤ this value. `0` = no maximum.",
                        "default": 0
                    },
                    "requireAnyEmail": {
                        "title": "✉️ Require at least one email (personal OR work)",
                        "type": "boolean",
                        "description": "Drop rows where both `email` and `work_email` are empty or null.",
                        "default": false
                    },
                    "requireWorkEmail": {
                        "title": "🧑‍💼 Require work email",
                        "type": "boolean",
                        "description": "Drop rows where `work_email` is missing **and** no `emails[]` entry has a professional-type token (`professional`, `current_professional`, `previous_professional`, `work`, or `business`). The Actor canonicalizes these to the same class before filtering.",
                        "default": false
                    },
                    "requirePersonalEmail": {
                        "title": "🏠 Require personal email",
                        "type": "boolean",
                        "description": "Drop rows where `email` is missing **and** no `emails[]` entry has `type = personal`.",
                        "default": false
                    },
                    "requirePhone": {
                        "title": "📞 Require phone number",
                        "type": "boolean",
                        "description": "Drop rows where `phone_numbers` is null or empty.",
                        "default": false
                    },
                    "emailDomainsInclude": {
                        "title": "🌐 Email domains — include (any-of)",
                        "type": "array",
                        "description": "Keep rows where **any** email (personal, work, or from `emails[]`) ends with one of these domains. Examples: `gmail.com`, `siemens.com`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "emailDomainsExclude": {
                        "title": "🛑 Email domains — exclude",
                        "type": "array",
                        "description": "Drop rows where any email ends with one of these domains. Always enforced. Common blocklist: `mailinator.com`, `guerrillamail.com`, `example.com`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "globalKeywordsAny": {
                        "title": "🔎 Smart keyword search — any-of",
                        "type": "array",
                        "description": "Keep rows where **any** of these keywords appears in **any** of the searched fields. Uses the active *Text match mode* (supports regex). Examples: `patent`, `intellectual property`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "globalKeywordsExclude": {
                        "title": "🧹 Smart keyword — exclude",
                        "type": "array",
                        "description": "Drop rows where any of these keywords appears in any searched field. Always enforced. Great for suppressing known irrelevant niches.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "dedupeByField": {
                        "title": "♻️ Primary dedupe key",
                        "enum": [
                            "none",
                            "linkedin_username",
                            "email",
                            "work_email",
                            "id",
                            "full_name",
                            "phone"
                        ],
                        "type": "string",
                        "description": "Pick the best unique identifier for your audience. `linkedin_username` is usually safest; `email` is great for deliverability lists.",
                        "default": "none"
                    },
                    "secondaryDedupeField": {
                        "title": "♻️ Secondary dedupe key (fallback)",
                        "enum": [
                            "none",
                            "linkedin_username",
                            "email",
                            "work_email",
                            "id",
                            "full_name",
                            "phone"
                        ],
                        "type": "string",
                        "description": "When the primary key is missing or empty on a row, fall back to this key so you don't lose records that lack, for example, a LinkedIn handle but do have an email.",
                        "default": "none"
                    },
                    "fieldsToKeep": {
                        "title": "🧾 Fields to keep (allow-list)",
                        "type": "array",
                        "description": "If non-empty, only these top-level keys are emitted. Nested paths are **not** supported here — use *Fields to omit* for surgical removal. Example keys: `id`, `full_name`, `email`, `linkedin_username`, `country`, `job_title`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "fieldsToOmit": {
                        "title": "✂️ Fields to omit (deny-list)",
                        "type": "array",
                        "description": "Strip these top-level keys from every emitted record. Applied *after* `fieldsToKeep`. Example keys: `phone_numbers`, `street_addresses`.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "redactPii": {
                        "title": "🛡️ Redact PII (emails & phones)",
                        "type": "boolean",
                        "description": "Masks emails as `j***@domain.com` and keeps only the last 4 digits of phone numbers. Useful when sharing samples with contractors or posting demos publicly. Does **not** affect filtering — only the output.",
                        "default": false
                    },
                    "flattenNestedFields": {
                        "title": "📐 Flatten nested fields",
                        "type": "boolean",
                        "description": "Expose the most-used nested values at the top level: `first_name`, `last_name`, `gender`, `primary_city`, `primary_region`, `primary_continent`, `current_company_name`, `primary_email_domain`. The original nested objects remain unless you also remove them with *Fields to omit*.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
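
As a rough illustration of how the input properties above fit together, the following Python sketch assembles a run input that uses only property names and enum values listed in the schema. The helper name `build_run_input` and the chosen values are hypothetical, and any properties defined earlier in the schema (for example the data source) are omitted and would still need to be supplied.

```python
import json

# Allowed dedupe keys, copied from the `dedupeByField` and
# `secondaryDedupeField` enums in the schema above.
DEDUPE_KEYS = {
    "none", "linkedin_username", "email", "work_email",
    "id", "full_name", "phone",
}


def build_run_input(primary="linkedin_username", secondary="email"):
    """Hypothetical helper: return a partial run input for the Actor.

    Only properties shown in the schema above are set; the values are
    illustrative, not recommended defaults.
    """
    for key in (primary, secondary):
        if key not in DEDUPE_KEYS:
            raise ValueError(f"unsupported dedupe key: {key!r}")
    return {
        "requireWorkEmail": True,
        "emailDomainsExclude": ["mailinator.com", "guerrillamail.com", "example.com"],
        "globalKeywordsAny": ["patent", "intellectual property"],
        "dedupeByField": primary,
        "secondaryDedupeField": secondary,
        "fieldsToKeep": ["id", "full_name", "email", "linkedin_username", "country", "job_title"],
        "redactPii": True,
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent as the run input.
    print(json.dumps(build_run_input(), indent=2))
```

The resulting dictionary could then be passed as `run_input` to the Python client, for example `ApifyClient(token).actor("crystalbytes/phds").call(run_input=...)`, once the remaining input properties are filled in.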

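The output-shaping options described in the schema (`dedupeByField` with `secondaryDedupeField`, `fieldsToKeep` followed by `fieldsToOmit`, and `redactPii`) can be pictured with a small Python sketch. This is not the Actor's implementation, only one reading of the descriptions. Several details are assumptions: each dedupe key is treated as a top-level field of the same name, keys are compared case-insensitively, rows with no usable key are kept, and masked phone numbers get a `***` prefix.

```python
import re


def dedupe(rows, primary, secondary="none"):
    """Keep the first row per key; use `secondary` when `primary` is empty."""
    seen, kept = set(), []
    for row in rows:
        key = None
        for field in (primary, secondary):
            if field != "none" and row.get(field):
                # Assumption: keys are compared after trimming and lowercasing.
                key = (field, str(row[field]).strip().lower())
                break
        if key is None:
            kept.append(row)  # Assumption: rows without any key are kept.
        elif key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def project(row, keep=(), omit=()):
    """Apply the allow-list first, then the deny-list (top-level keys only)."""
    allowed = {k: v for k, v in row.items() if not keep or k in keep}
    return {k: v for k, v in allowed.items() if k not in omit}


def mask_email(address):
    """Mask an email as first character + *** + @domain."""
    local, _, domain = address.partition("@")
    return f"{local[:1]}***@{domain}"


def mask_phone(number):
    """Keep only the last 4 digits; the *** prefix is an assumed placeholder."""
    digits = re.sub(r"\D", "", number)
    return "***" + digits[-4:]


if __name__ == "__main__":
    rows = [
        {"id": "1", "linkedin_username": "jdoe", "email": "jane@domain.com"},
        {"id": "2", "linkedin_username": "jdoe", "email": "jane.doe@other.org"},
        {"id": "3", "linkedin_username": "", "email": "sam@lab.edu"},
        {"id": "4", "linkedin_username": None, "email": "SAM@lab.edu"},
    ]
    unique = dedupe(rows, "linkedin_username", "email")
    print([r["id"] for r in unique])               # ['1', '3']
    print(project(rows[0], keep=("id", "email")))  # {'id': '1', 'email': 'jane@domain.com'}
    print(mask_email("jane@domain.com"))           # j***@domain.com
    print(mask_phone("+1 (415) 555-0134"))         # ***0134
```

In the example, rows 3 and 4 have no LinkedIn handle, so they are compared on `email` instead, which is the situation the secondary key is meant to cover.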