Pricing

from $10.00 / 1,000 delivered records

AI & PhD Researcher Dataset Filter — recruiting, GTM, research

Turn a raw JSON export of AI / PhD / researcher profiles into a precise, deduplicated, deliverable-grade shortlist in seconds. Built for recruiting teams, B2B growth/SDR teams, and research panels who need clean, targeted lists instead of raw scraping noise. 🚀 22.5k records filtered in <6s.

Rating

5.0

(1)

Developer

CrystalBytes

Last modified: a month ago

You do not need extra engineering to get a useful first run. This Actor does not browse the web or pull live profiles. You bring your own JSON file (a single array of profile objects). The Actor filters, deduplicates, and shapes the rows you choose, then writes them to an Apify Dataset you can download as JSON or CSV.
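The expected input shape (one JSON array whose elements are objects) can be checked locally before a run. The following is a minimal sketch, not the Actor's own validation; the function name `load_profiles` and the sample field names (`name`, `country`) are hypothetical.

```python
import json


def load_profiles(raw_json: str) -> list:
    """Parse raw_json and confirm it is a single array of profile objects."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be an array")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"element {index} is not a JSON object")
    return data


# Hypothetical two-row export; real files will carry many more fields.
sample = '[{"name": "Ada", "country": "UK"}, {"name": "Lin", "country": "US"}]'
profiles = load_profiles(sample)
print(len(profiles))
```

A file that wraps the array in an outer object (for example `{"profiles": [...]}`) would fail this check and would need unwrapping first.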

Who it is for

Hiring and talent teams shortlisting PhD-level AI, ML, or research profiles from an existing export.
B2B GTM, SDR, and growth teams who need a clean, ICP-matched list instead of a noisy raw dump.
Research, policy, and panel coordinators who need specific countries, languages, or seniority without manual spreadsheet work.
Data and ops teams that already have profile JSON and want repeatable, versioned “audience” runs.

Get started in three steps

Try the built-in sample — Open the Actor and run. The bundled demo loads automatically so you can see how filters work; results appear on the Dataset tab.
Your own profile file — For production runs, whoever manages your workspace connects the JSON source your organization uses. If you need a different file than the default, ask them to point the Actor at it.
Download results — Open the run’s Dataset for the rows. For a step-by-step breakdown, read RUN_SUMMARY in the run’s default Key-value store.

Note: The Input form is for filters and export limits only. Which JSON file a run uses is chosen outside the public form (by your workspace setup).

Find the right people (practical playbooks)

Use the matching sections in the Input form. Leave a field empty to turn that filter off.

I want to…	Start here in the form
US or UK candidates only	Location — countries include (and add excludes for regions you do not want).
Europe-based PhD+ researchers	Location (continent or country) + Education — minimum level, schools, or degrees.
Senior AI / product / legal in software	Career — industry, job title, job level; optionally Company for size or employer name.
Quality contacts (work email, fewer bad domains)	Contact quality — require work email, allow / block email domains.
A tight shortlist, not the whole file	Volume, sampling & pagination — see How many rows you export below.
No duplicate people	Deduplication — pick a primary key (e.g. LinkedIn username) and optional backup key.
Safer sharing or demos (masked email / phone)	Output shaping & privacy — redact PII, trim fields, or flatten nested fields for CSV.

Narrow with AND (every enabled group must match) or explore more broadly with OR (at least one group). Exclusion lists (countries you block, bad domains, title excludes) are always applied, even in OR mode, so you do not “leak” blocked rows by accident.

How filters work (short version)

Each enabled field is a condition. Match mode (AND / OR) controls how groups of conditions combine; values inside one list are OR’d (e.g. any of several countries).
Empty = that filter is off.
Excludes (countries, companies, keywords, etc.) are always enforced for safety.
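The rules above can be sketched in a few lines of Python. This is an illustration of the described semantics only, under the simplifying assumption that each field is its own group; the function `matches` and the field names (`country`, `job_level`, `industry`, `email_domain`) are hypothetical and not the Actor's real input schema.

```python
def matches(profile, groups, excludes, mode="AND"):
    """Return True if profile passes the filter groups and no exclude rule.

    groups:   {field: allowed values}; an empty list means the filter is off.
    excludes: {field: blocked values}; enforced in both AND and OR mode.
    """
    # Excludes are checked first and cannot be overridden by OR mode.
    for field, blocked in excludes.items():
        if profile.get(field) in blocked:
            return False

    # Only non-empty groups count as enabled conditions.
    enabled = {f: vals for f, vals in groups.items() if vals}
    if not enabled:
        return True

    # Values inside one list are OR'd: any listed value satisfies the group.
    results = [profile.get(f) in vals for f, vals in enabled.items()]
    return all(results) if mode == "AND" else any(results)


profile = {"country": "US", "job_level": "senior", "email_domain": "example.com"}
groups = {"country": ["US", "UK"], "job_level": ["director"], "industry": []}

print(matches(profile, groups, excludes={}, mode="AND"))   # False
print(matches(profile, groups, excludes={}, mode="OR"))    # True
print(matches(profile, groups, {"email_domain": ["example.com"]}, mode="OR"))  # False
```

The third call shows why excludes are evaluated independently of the match mode: the profile satisfies the country group, yet the blocked domain still removes it.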

The Console form is grouped into sections: Optional listing (if used) → Filter logic → Volume → Location through Output shaping. Every field has examples and tips inline.

How many rows you export

The Actor filters the entire file first, then deduplicates (if you set dedupe), then optionally takes a random sample, and only then applies row limits. So limits always apply to the qualified list.
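The ordering matters because a limit applied too early would cut rows before duplicates and non-matches are removed. A minimal sketch of the order described above, assuming every row carries the dedupe key; `run_pipeline` and its parameters are hypothetical names, not the Actor's code.

```python
import random


def run_pipeline(rows, keep, dedupe_key=None, sample_size=None, limit=None, seed=0):
    """Apply filter, dedupe, sample, and limit in that order."""
    # 1. Filter the entire file.
    qualified = [row for row in rows if keep(row)]

    # 2. Deduplicate, keeping the first row seen for each key value.
    if dedupe_key is not None:
        seen, unique = set(), []
        for row in qualified:
            key = row.get(dedupe_key)
            if key in seen:
                continue
            seen.add(key)
            unique.append(row)
        qualified = unique

    # 3. Optionally take a random sample of the qualified list.
    if sample_size is not None:
        rng = random.Random(seed)
        qualified = rng.sample(qualified, min(sample_size, len(qualified)))

    # 4. Only now apply the row limit.
    return qualified if limit is None else qualified[:limit]


rows = [
    {"id": "a", "country": "US"},
    {"id": "a", "country": "US"},  # duplicate of the first row
    {"id": "b", "country": "DE"},
    {"id": "c", "country": "US"},
    {"id": "d", "country": "US"},
]
result = run_pipeline(rows, keep=lambda r: r["country"] == "US", dedupe_key="id", limit=2)
print([r["id"] for r in result])  # ['a', 'c']
```

Had the limit of 2 been applied to the raw file first, only the two copies of `a` would have survived, and deduplication would have left a single row.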

Use one style or the other, not both: the run stops with a clear error if you set fields from both.

A) “Start at row” and “Stop before row” (range)

Good when you want a single slice without doing math (e.g. “rows 0–999” or “100 to 1000”).

Start at row — 0 = first row in the matched list (after filters, dedupe, and optional sample).
Stop before row — Exclusive end: valid rows are [Start, Stop). Example: start 0, stop 1000 = first 1000 rows. Start 100, stop 1000 = 900 rows (indices 100 through 999).

Rows in this export ≈ Stop − Start (fewer if the matched list ends before Stop). Paid plans support starting after row 0 (pagination). On the free tier, starting after the first row is not supported: use the first slice only, or upgrade for offset / pagination.
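The half-open range behaves like a Python slice, which makes the row counts easy to check. A small sketch, where `take_range` is a hypothetical helper and the list of integers stands in for the matched rows:

```python
def take_range(matched, start, stop):
    """Return rows with indices in [start, stop), like a Python slice."""
    if start < 0 or stop < start:
        raise ValueError("need 0 <= start <= stop")
    return matched[start:stop]


matched = list(range(2500))  # stand-in for 2,500 matched rows

print(len(take_range(matched, 0, 1000)))     # 1000
print(len(take_range(matched, 100, 1000)))   # 900
print(len(take_range(matched, 2000, 3000)))  # 500
```

The last call returns 500 rows rather than 1,000 because the matched list ends at index 2499, which is why the row count is only approximately Stop − Start.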

B) “Skip first N” and “Max records” (classic)

Skip first N: offset into the qualified list. For example, with 1,000-row pages, page 2 is skip 1000, max 1000.
Max records to output: 0 means “up to the limit allowed by your plan and the monthly allowance,” not “zero rows.”
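The skip-and-cap style can be sketched the same way. This is an assumption-laden illustration: `take_page` is a hypothetical helper, `plan_cap` stands in for whatever per-run limit your plan enforces, and the monthly allowance is left out for brevity.

```python
def take_page(qualified, skip=0, max_records=0, plan_cap=4000):
    """Skip-and-cap paging; max_records == 0 means 'use the plan cap'."""
    cap = plan_cap if max_records == 0 else min(max_records, plan_cap)
    return qualified[skip:skip + cap]


qualified = list(range(2500))  # stand-in for 2,500 qualified rows

page_2 = take_page(qualified, skip=1000, max_records=1000)
print(page_2[0], page_2[-1], len(page_2))      # 1000 1999 1000

print(len(take_page(qualified)))               # 2500
print(len(take_page(qualified, plan_cap=50)))  # 50
```

Skip 1000 with max 1000 selects the same rows as a range with start 1000 and stop 2000, so the two styles are interchangeable for simple paging.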

Random sample (optional) shuffles the qualified list before skip / cap — use it for A/B tests or training splits, not for stable paging unless you know what you are doing.

Billing reminder: the platform may charge by delivered rows; your plan also enforces per-run and per-month caps. See the Actor’s Pricing tab in Apify and RUN_SUMMARY → monetization.

Output and transparency

Dataset — one JSON object per row; download as JSON, CSV, or Excel from the run.
RUN_SUMMARY (in the run’s default Key-value store) — how many records were loaded, filtered, deduplicated, sampled, skipped, and exported, plus monetization and timing. Use it when results look empty, too small, or when reconciling usage.

Set Flatten nested fields for wider CSV columns. Use Redact PII when you need shareable samples without full email or phone.
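To see what these two options do to a row, here is a rough sketch. The dotted column names and the masking format are assumptions made for illustration; the Actor's actual separator and redaction style may differ, and `flatten` and `mask_email` are hypothetical helpers.

```python
def flatten(record, parent="", sep="."):
    """Flatten nested dicts into one level with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def mask_email(email):
    """Keep the first character and the domain, mask the rest of the local part."""
    local, _, domain = email.partition("@")
    if not domain:
        return "***"
    return f"{local[:1]}***@{domain}"


row = {"name": "Ada", "contact": {"email": "ada@example.com"}, "job": {"title": "Researcher"}}
flat = flatten(row)
flat["contact.email"] = mask_email(flat["contact.email"])
print(flat)
# {'name': 'Ada', 'contact.email': 'a***@example.com', 'job.title': 'Researcher'}
```

Flattening gives each nested value its own CSV column, and masking keeps the domain visible so a reviewer can still judge contact quality without seeing the full address.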

Pricing and plans (summary)

Exact unit prices, events, and any platform fees are on this Actor’s Pricing tab in the Apify Console. The table below is the Actor-side policy (from our tier file), so you can see run and monthly caps; it is not a substitute for the Console invoice.

Tier	Max / run	Max / month	Runs / day	Free tier field limits
`free`	50	120	1	Yes (basic fields only)
`starter`	4 000	15 000	no hard daily cap in Actor	—
`pro`	4 000	25 000	no hard daily cap in Actor	—
`agency`	10 000	100 000	no hard daily cap in Actor	—
`development`	(high)	(high)	(high)	For local / owner tests only

Free strips sensitive columns (e.g. work email, phones, some addresses) so you can evaluate fit before upgrading.
Paid tiers unlock the full record, offset pagination (skip / start-after-first-row), and overage past the monthly cap where configured — see the Console for overage event names and prices.
After each run, check RUN_SUMMARY → monetization and compare to your Apify billing view.
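As a quick way to estimate how many rows a run can deliver, the caps from the tier table above can be combined as follows. This is a sketch under the assumption that the smaller of the two caps wins and that overage is not configured; `rows_allowed` is a hypothetical helper, and the Console remains the source of truth.

```python
# Per-run and per-month caps copied from the tier table above.
TIER_CAPS = {
    "free": {"per_run": 50, "per_month": 120},
    "starter": {"per_run": 4_000, "per_month": 15_000},
    "pro": {"per_run": 4_000, "per_month": 25_000},
    "agency": {"per_run": 10_000, "per_month": 100_000},
}


def rows_allowed(tier, requested, used_this_month):
    """Rows one run may deliver under both caps, ignoring any overage setting."""
    caps = TIER_CAPS[tier]
    remaining = max(caps["per_month"] - used_this_month, 0)
    return min(requested, caps["per_run"], remaining)


print(rows_allowed("free", 200, 0))             # 50
print(rows_allowed("free", 200, 100))           # 20
print(rows_allowed("starter", 5_000, 14_000))   # 1000
```

In the second call the monthly allowance, not the per-run cap, is the binding limit: 100 of the 120 free rows are already used, so only 20 remain.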

Trust, data, and compliance

You supply the JSON; this run does not crawl third-party sites or “discover” new profiles from the open web.
You are responsible for lawful use, consent, and platform terms that apply to your source data (e.g. privacy rules, email outreach laws).
Use redaction and field allow / deny lists for demos, contractors, or external sharing.
Who can see a run’s full Input is controlled in Apify (organization permissions). Do not put passwords or private keys in task input.

For performance and large files, see Options on the run (memory, timeout). As a rough guide, a 22k-row file was processed in a few seconds at 2 GB memory in development tests. Very large single files may need more memory, a longer timeout, or splitting the source file. Ask your workspace admin if a run times out or runs out of memory.

Reliability and support

Invalid inputs (e.g. bad regex patterns, over-claimed advertised counts, or conflicting volume settings) fail fast with a readable error.
0 results after filters — widen one group at a time, try OR match mode, or check RUN_SUMMARY → pipeline to see where the list went to zero.

Support and feedback: crystalbytes@proton.me — usually within one business day.

Ready to build a clean, plan-aware shortlist from your own researcher JSON? Start a run and refine filters using RUN_SUMMARY until the numbers match your goal.
