AI & ML Engineer Jobs Scraper — 8 Boards in One
Every strong AI/ML job source behind one endpoint: aijobs.net, LinkedIn, Hacker News Who-is-Hiring, Y Combinator, Built In, RemoteOK/Remotive/WeWorkRemotely, WTTJ and JustJoin.it. One run returns a merged, URL-deduped dataset of live ML, AI and data roles.
One call, eight sources. Fans out to 8 job-source actors tuned for AI/ML roles, merges and dedupes into a single dataset.
What machine learning jobs data does this scraper extract?
Each result is one flat JSON record per job posting:
| Field | Type | Meaning |
|---|---|---|
| source | string | Which child board the record came from, e.g. "ai_jobs_net" |
| id | string | Stable source-side identifier ("" when the source has none) |
| title | string | Job title as posted |
| company | string | Hiring company / organisation |
| location | string | Location / duty station (may include remote hints) |
| url | string | Direct link to the posting |
| postedAt | string | Posting date where the source provides it, else "" |
| deadline | string | Application deadline. None of this bundle's 8 sources provides one today, so this is always "" |
| snippet | string | Short description excerpt |
| salary | string | Salary text where the source provides it (see below), else "" |
salary is populated by ai_jobs_net, ycombinator_was, builtin, remote_boards, wttj and justjoinit. linkedin and hackernews don't expose salary on their listings, so those records always return "".
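For downstream code it can help to give the record a typed shape. The following is a minimal sketch, assuming only the ten string fields in the table above; `JobRecord` and `with_salary` are hypothetical names chosen for illustration and are not part of the Actor.

```python
from typing import Iterable, List, TypedDict


class JobRecord(TypedDict):
    """One flat record, mirroring the field table (every value is a string)."""
    source: str
    id: str
    title: str
    company: str
    location: str
    url: str
    postedAt: str
    deadline: str
    snippet: str
    salary: str


def with_salary(records: Iterable[JobRecord]) -> List[JobRecord]:
    """Keep only records whose salary text is non-empty (hypothetical helper)."""
    return [r for r in records if r.get("salary", "").strip()]


# Made-up sample rows, for illustration only.
sample: List[JobRecord] = [
    {"source": "linkedin", "id": "", "title": "ML Engineer", "company": "Acme",
     "location": "Remote", "url": "https://example.com/a", "postedAt": "",
     "deadline": "", "snippet": "", "salary": ""},
    {"source": "builtin", "id": "1", "title": "Data Scientist", "company": "Acme",
     "location": "NYC", "url": "https://example.com/b", "postedAt": "",
     "deadline": "", "snippet": "", "salary": "$140K–$200K"},
]

print([r["source"] for r in with_salary(sample)])  # ['builtin']
```

Because linkedin and hackernews records always carry an empty salary, a filter like this drops them entirely, which may or may not be what a salary report wants.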
How the bundle works
This is a bundle Actor: one endpoint that fans out to the individual job-source Actors listed below, runs them concurrently, maps every record onto one flat schema and dedupes by URL across boards. You can restrict the run to a subset with the sources input. Each child source is charged its own pay-per-event pricing on top of this bundle's — that is the cost of one-call breadth.
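The fan-out, merge and dedupe steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Actor's actual implementation: `run_bundle` and the fetcher callables are hypothetical, deduplication is assumed to be an exact match on the `url` string, and the first source to return a URL is assumed to win.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Record = Dict[str, str]
Fetcher = Callable[[], Awaitable[List[Record]]]


async def run_bundle(fetchers: Dict[str, Fetcher], concurrency: int = 6) -> List[Record]:
    """Run child fetchers concurrently, then merge and dedupe by URL."""
    gate = asyncio.Semaphore(concurrency)  # at most `concurrency` children at once

    async def run_one(name: str, fetch: Fetcher) -> List[Record]:
        async with gate:
            rows = await fetch()
        # Tag every row with the board it came from.
        return [{**row, "source": name} for row in rows]

    # gather() returns batches in the same order the fetchers were listed.
    batches = await asyncio.gather(*(run_one(n, f) for n, f in fetchers.items()))

    seen = set()
    merged: List[Record] = []
    for batch in batches:
        for row in batch:
            url = row.get("url", "")
            if url and url in seen:
                continue  # same posting already returned by an earlier board
            seen.add(url)
            merged.append(row)
    return merged


# Two stand-in boards that share one posting (made-up data).
async def board_a() -> List[Record]:
    return [{"title": "ML Engineer", "url": "https://example.com/1"}]


async def board_b() -> List[Record]:
    return [{"title": "ML Engineer", "url": "https://example.com/1"},
            {"title": "Data Scientist", "url": "https://example.com/2"}]


result = asyncio.run(run_bundle({"board_a": board_a, "board_b": board_b}))
print([(r["source"], r["url"]) for r in result])
```

In this sketch the shared posting is kept once, attributed to `board_a`, because that fetcher is listed first.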
How to scrape machine learning jobs with this Actor
- Click Try for free / Run. There is no login to the target site, no cookies and no proxies to configure.
- Adjust the input (keyword, filters, maxItems) or keep the defaults.
- Run it and export the dataset as JSON, CSV or Excel, or read it over the API.
Run it from your own code:

    from apify_client import ApifyClient

    client = ApifyClient("<YOUR_APIFY_TOKEN>")
    run = client.actor("nomad-agent/ml-ai-dev-bundle").call(run_input={"maxItems": 50})
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item["title"], "—", item["company"], item["url"])

Or a single HTTP call that runs the Actor and returns items in one response:

    curl -X POST \
      "https://api.apify.com/v2/acts/nomad-agent~ml-ai-dev-bundle/run-sync-get-dataset-items?token=<YOUR_APIFY_TOKEN>" \
      -H "Content-Type: application/json" \
      -d '{"maxItems": 50}'

Input
| Field | Type | Default | Notes |
|---|---|---|---|
| sources | array | ["linkedin", "ai_jobs_net", "hackernews", "ycombinator_was", "builtin", "remote_boards", "wttj", "justjoinit"] | Which boards to include. Leave empty to run the full default set. Each enabled source costs its own $0.05 actor-start plus per-result fees. |
| keyword | string | "" | Optional free-text filter forwarded to children that support it (others ignore it). |
| maxItemsPerSource | integer | 36 | Cap on items fetched from EACH child board before merge. |
| maxItems | integer | 288 | Hard cap on the merged, deduped output. Default is sources × maxItemsPerSource (the zero-config ceiling). Set 0 for no cap. |
| cacheTtlSeconds | integer | 1800 | How long to reuse results already fetched from a source instead of re-fetching. 0 = always fetch fresh. |
| concurrency | integer | 6 | How many child boards to run in parallel. (Advanced) |
| runTimeoutSecs | integer | 120 | How long to wait for each source before giving up on it. (Advanced) |
| apifyToken | string (secret) | "" | Leave empty: it is injected automatically on the Apify platform. Only set for local runs outside the platform. (Advanced) |
| actorOwner | string | "" | Which Apify account's child actors to call. Leave empty to use this bundle's published sources. (Advanced) |
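As an illustration of how these fields combine, here is a small sketch that builds a run input for a subset of boards. `build_run_input` is a hypothetical helper, not part of the Actor or of `apify_client`. It sets maxItems explicitly to sources × maxItemsPerSource, since the table does not say whether the Actor rescales that default when sources is narrowed.

```python
from typing import Dict, List, Optional

# Source names as listed in the input table.
DEFAULT_SOURCES = [
    "linkedin", "ai_jobs_net", "hackernews", "ycombinator_was",
    "builtin", "remote_boards", "wttj", "justjoinit",
]


def build_run_input(
    sources: Optional[List[str]] = None,
    keyword: str = "",
    max_items_per_source: int = 36,
) -> Dict[str, object]:
    """Build a run_input dict; an empty `sources` means the full default set."""
    chosen = list(sources) if sources else list(DEFAULT_SOURCES)
    unknown = sorted(set(chosen) - set(DEFAULT_SOURCES))
    if unknown:
        raise ValueError(f"Unknown sources: {unknown}")
    return {
        "sources": chosen,
        "keyword": keyword,
        "maxItemsPerSource": max_items_per_source,
        # Ceiling on merged output: one full page of results per enabled board.
        "maxItems": len(chosen) * max_items_per_source,
    }


run_input = build_run_input(["ai_jobs_net", "ycombinator_was"], keyword="llm")
print(run_input["maxItems"])  # 72
```

The resulting dict can be passed as `run_input` to the client call shown earlier. Restricting sources also avoids the start fee of each board that is left out.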
Output example

    {
      "source": "ai_jobs_net",
      "id": "200475",
      "title": "Machine Learning Engineer",
      "company": "Hugging Face",
      "location": "Remote",
      "url": "https://aijobs.net/job/machine-learning-engineer-remote-200475/",
      "postedAt": "2026-06-28",
      "deadline": "",
      "snippet": "We're hiring an ML engineer to work on...",
      "salary": ""
    }

A record from ai_jobs_net, ycombinator_was, builtin, remote_boards, wttj or justjoinit has the same shape with salary filled in when the source card includes it, e.g. "salary": "$140K–$200K".
Pricing
Pay per event: $0.05 per Actor start and $0.004 per job returned — plus each enabled child source's own pay-per-event pricing (also $0.05 per start + $0.004 per result, charged by that child actor directly).
Zero-config run estimate (defaults, all 8 sources): up to ~288 merged items, roughly $2.75 all-in ($0.05 bundle start + 8×$0.05 in child starts + up to 288 items × $0.008 combined per-result fee). Real runs usually cost less: not every board returns the full cap, and a cross-board duplicate is billed once, by whichever child returned it first, and is not billed again by the bundle.
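The estimate can be reproduced with a few lines of arithmetic. This is a sketch of the upper bound only, assuming every merged item is charged both the bundle's and one child's per-result fee; `estimate_cost` is a hypothetical helper, and actual billing is determined by the platform.

```python
def estimate_cost(
    num_sources: int = 8,
    merged_items: int = 288,
    start_fee: float = 0.05,
    per_result_fee: float = 0.004,
) -> float:
    """Upper-bound cost in USD for one bundle run."""
    # One start event for the bundle plus one per enabled child source.
    starts = start_fee * (1 + num_sources)
    # Each item is charged once by the bundle and once by a child.
    results = merged_items * per_result_fee * 2
    return round(starts + results, 2)


print(estimate_cost())                               # 2.75
print(estimate_cost(num_sources=2, merged_items=72)) # 0.73
```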
Use cases
- AI-specialist job boards
- ML-engineer alert bots
- AI-talent market research
- Recruiting pipelines for data/ML teams
FAQ
Is it legal to scrape machine learning jobs? This Actor reads only publicly available job postings — data any visitor can see without logging in. No personal data behind authentication is touched. Review the target site's terms and your local regulations for your specific use case.
Do I need an account on the target site? No. Postings are fetched from public pages/APIs — no login, cookies or session tokens.
How fresh is the data? Each run fetches live listings, except that results already fetched from a source are reused for cacheTtlSeconds (default 1800, i.e. 30 min). Set it to 0 to always fetch fresh.
How many jobs can I get? maxItems caps the merged output (default 288; set 0 for no cap), and maxItemsPerSource caps each board before the merge. Most sources paginate from newest to oldest.
Something broken or missing? Open an issue on the Actor's Issues tab — it is monitored and reliability fixes ship fast.
Integrations
Export the dataset as JSON, CSV or Excel, or read it straight from the Apify API. Works out of the box with Make, Zapier and n8n via their Apify integrations, can be called synchronously with run-sync-get-dataset-items from any backend, and is usable by AI agents through the Apify MCP server.
Related Actors
- AI Jobs Scraper (aijobs.net) — ML & Data Roles
- LinkedIn Jobs Scraper — No Login, No Cookies
- Hacker News Who Is Hiring Scraper — HN Jobs
- Y Combinator Jobs Scraper — Work at a Startup
- Built In Jobs Scraper — US Tech & Startup Jobs
- Remote Jobs Scraper — RemoteOK Remotive WWR
- Welcome to the Jungle Jobs Scraper (WTTJ)
- JustJoin.it Jobs Scraper — Polish Tech & IT Jobs