AI & ML Engineer Jobs Scraper — 8 Boards in One

Pricing

Pay per event

Every strong AI/ML job source behind one endpoint: aijobs.net, LinkedIn, Hacker News Who-is-Hiring, Y Combinator, Built In, RemoteOK/Remotive/WeWorkRemotely, WTTJ and JustJoin.it. One run returns a merged, URL-deduped dataset of live ML, AI and data roles.

Developer: Nomad.Dev (maintained by Community)


One call, eight sources. Fans out to 8 job-source actors tuned for AI/ML roles, merges and dedupes into a single dataset.

What machine learning jobs data does this scraper extract?

Each result is one flat JSON record per job posting:

| Field | Type | Meaning |
| --- | --- | --- |
| source | string | Which child board the record came from, e.g. "ai_jobs_net" |
| id | string | Stable source-side identifier ("" when the source has none) |
| title | string | Job title as posted |
| company | string | Hiring company / organisation |
| location | string | Location / duty station (may include remote hints) |
| url | string | Direct link to the posting |
| postedAt | string | Posting date where the source provides it, else "" |
| deadline | string | Application deadline. None of this bundle's 8 sources provide one today, so this is always "" |
| snippet | string | Short description excerpt |
| salary | string | Salary text where the source provides it (see below), else "" |

salary is populated by ai_jobs_net, ycombinator_was, builtin, remote_boards, wttj and justjoinit. linkedin and hackernews don't expose salary on their listings, so those records always return "".

How the bundle works

This is a bundle Actor: one endpoint that fans out to the individual job-source Actors listed below, runs them concurrently, maps every record onto one flat schema and dedupes by URL across boards. You can restrict the run to a subset with the sources input. Each child source is charged its own pay-per-event pricing on top of this bundle's — that is the cost of one-call breadth.
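The merge-and-dedupe step described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: `normalize_url` and `merge_and_dedupe` are hypothetical names, the trailing-slash normalisation is an assumption, and the sample records are made up.

```python
from typing import Iterable

# The ten flat fields from the schema table in this README.
FIELDS = ("source", "id", "title", "company", "location",
          "url", "postedAt", "deadline", "snippet", "salary")


def normalize_url(url: str) -> str:
    """Hypothetical dedup key: trim whitespace and a trailing slash (an assumption)."""
    return url.strip().rstrip("/")


def merge_and_dedupe(batches: Iterable[list[dict]]) -> list[dict]:
    """Flatten per-source batches, keeping the first record seen for each URL."""
    seen: set[str] = set()
    merged: list[dict] = []
    for batch in batches:
        for raw in batch:
            # Map onto the flat schema; absent or None fields become "".
            record = {field: str(raw.get(field, "") or "") for field in FIELDS}
            key = normalize_url(record["url"])
            if key and key in seen:
                continue  # same posting already returned by an earlier board
            if key:
                seen.add(key)
            merged.append(record)
    return merged


# Made-up sample batches: the same posting appears on two boards.
linkedin = [{"source": "linkedin", "title": "ML Engineer",
             "url": "https://example.com/jobs/1"}]
builtin = [
    {"source": "builtin", "title": "ML Engineer",
     "url": "https://example.com/jobs/1/"},
    {"source": "builtin", "title": "Data Scientist",
     "url": "https://example.com/jobs/2"},
]
jobs = merge_and_dedupe([linkedin, builtin])
print([(job["source"], job["title"]) for job in jobs])
```

In this sketch the first board to return a URL wins, and records without a URL are kept rather than dropped. Both are design assumptions, not documented behaviour.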

How to scrape machine learning jobs with this Actor

  1. Click Try for free / Run — no login to the target site, no cookies, no proxies to configure.
  2. Adjust the input (keyword, filters, maxItems) or keep the defaults.
  3. Run it and export the dataset as JSON, CSV or Excel, or read it over the API.

Run it from your own code:

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")
# Start the Actor and wait for the run to finish.
run = client.actor("nomad-agent/ml-ai-dev-bundle").call(run_input={"maxItems": 50})
# Iterate over the merged, deduped records in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], "—", item["company"], item["url"])

Or a single HTTP call that runs the Actor and returns items in one response:

curl -X POST \
"https://api.apify.com/v2/acts/nomad-agent~ml-ai-dev-bundle/run-sync-get-dataset-items?token=<YOUR_APIFY_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"maxItems": 50}'

Input

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| sources | array | ["linkedin", "ai_jobs_net", "hackernews", "ycombinator_was", "builtin", "remote_boards", "wttj", "justjoinit"] | Which boards to include. Leave empty to run the full default set. Each enabled source costs its own $0.05 actor-start plus per-result fees. |
| keyword | string | "" | Optional free-text filter forwarded to children that support it (others ignore it). |
| maxItemsPerSource | integer | 36 | Cap on items fetched from EACH child board before merge. |
| maxItems | integer | 288 | Hard cap on the merged, deduped output. Default is sources × maxItemsPerSource (the zero-config ceiling). Set 0 for no cap. |
| cacheTtlSeconds | integer | 1800 | How long to reuse results already fetched from a source instead of re-fetching. 0 = always fetch fresh. |
| concurrency | integer | 6 | How many child boards to run in parallel. (Advanced) |
| runTimeoutSecs | integer | 120 | How long to wait for each source before giving up on it. (Advanced) |
| apifyToken | string (secret) | "" | Leave empty; injected automatically on the Apify platform. Only set for local runs outside the platform. (Advanced) |
| actorOwner | string | "" | Which Apify account's child actors to call. Leave empty to use this bundle's published sources. (Advanced) |
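As an illustration of how these inputs fit together, here is a small sketch that builds a `run_input` dict for a subset of boards. `build_run_input` is a hypothetical helper, not part of the Actor or of `apify_client`. It sets `maxItems` explicitly to sources × maxItemsPerSource, mirroring the documented default ceiling.

```python
# Default source identifiers, copied from the `sources` row of the input table.
DEFAULT_SOURCES = [
    "linkedin", "ai_jobs_net", "hackernews", "ycombinator_was",
    "builtin", "remote_boards", "wttj", "justjoinit",
]


def build_run_input(sources=None, keyword="", max_items_per_source=36):
    """Hypothetical helper: assemble a run_input dict for the bundle."""
    chosen = list(sources) if sources else list(DEFAULT_SOURCES)
    unknown = sorted(set(chosen) - set(DEFAULT_SOURCES))
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")
    return {
        "sources": chosen,
        "keyword": keyword,
        "maxItemsPerSource": max_items_per_source,
        # Mirror the documented default ceiling: sources x maxItemsPerSource.
        "maxItems": len(chosen) * max_items_per_source,
    }


run_input = build_run_input(["ai_jobs_net", "wttj"], keyword="LLM",
                            max_items_per_source=20)
print(run_input)
```

The resulting dict can be passed as `run_input` to the Python client call shown earlier, or sent as the JSON body of the HTTP request.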

Output example

{
"source": "ai_jobs_net",
"id": "200475",
"title": "Machine Learning Engineer",
"company": "Hugging Face",
"location": "Remote",
"url": "https://aijobs.net/job/machine-learning-engineer-remote-200475/",
"postedAt": "2026-06-28",
"deadline": "",
"snippet": "We're hiring an ML engineer to work on...",
"salary": ""
}

A record from ai_jobs_net, ycombinator_was, builtin, remote_boards, wttj or justjoinit has the same shape with salary filled in when the source card includes it, e.g. "salary": "$140K–$200K".
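Because salary is source-dependent, it can help to check how many records per board carry one before building anything on top of that field. The sketch below is a hypothetical post-processing helper (`salary_coverage` is not part of the Actor), and the sample records are made up apart from the field names and source identifiers.

```python
from collections import Counter

# Sources this README lists as able to populate `salary`.
SALARY_SOURCES = {"ai_jobs_net", "ycombinator_was", "builtin",
                  "remote_boards", "wttj", "justjoinit"}


def salary_coverage(items):
    """Return {source: (records_with_salary, total_records)}."""
    with_salary = Counter()
    total = Counter()
    for item in items:
        total[item["source"]] += 1
        if item.get("salary", ""):
            with_salary[item["source"]] += 1
    return {source: (with_salary[source], total[source]) for source in total}


# Made-up sample records using the flat schema's field names.
items = [
    {"source": "ai_jobs_net", "title": "Machine Learning Engineer", "salary": ""},
    {"source": "ai_jobs_net", "title": "Research Scientist", "salary": "$140K–$200K"},
    {"source": "linkedin", "title": "Data Engineer", "salary": ""},
]
coverage = salary_coverage(items)
for source, (filled, count) in coverage.items():
    note = "" if source in SALARY_SOURCES else " (salary never expected)"
    print(f"{source}: {filled}/{count} with salary{note}")
```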

Pricing

Pay per event: $0.05 per Actor start and $0.004 per job returned — plus each enabled child source's own pay-per-event pricing (also $0.05 per start + $0.004 per result, charged by that child actor directly).

Zero-config run estimate (defaults, all 8 sources): up to ~288 merged items, roughly $2.75 all-in ($0.05 bundle start + 8 × $0.05 in child starts + up to 288 items × $0.008 combined per-result fee, i.e. $0.05 + $0.40 + $2.30). Real runs usually cost less: not every board returns the full cap, and a cross-board duplicate is billed once, by whichever child returned it first, and is not billed again by the bundle.
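The upper-bound arithmetic can be reproduced with a short sketch. `estimate_max_cost` is a hypothetical helper that uses only the prices quoted in this section. It assumes the worst case: every child returns its full cap and nothing is deduplicated, so each job is billed once by its child and once by the bundle.

```python
ACTOR_START_USD = 0.05  # per Actor start, for the bundle and for each child
PER_RESULT_USD = 0.004  # per job returned, for the bundle and for each child


def estimate_max_cost(num_sources=8, max_items_per_source=36):
    """Worst-case cost in USD, assuming full caps and no duplicates."""
    items = num_sources * max_items_per_source
    # One bundle start plus one start per enabled child source.
    start_fees = (1 + num_sources) * ACTOR_START_USD
    # Each item is billed by its child and again by the bundle.
    result_fees = items * 2 * PER_RESULT_USD
    return round(start_fees + result_fees, 2)


default_cost = estimate_max_cost()
two_board_cost = estimate_max_cost(num_sources=2, max_items_per_source=20)
print(f"all 8 sources, defaults: ${default_cost:.2f}")
print(f"2 sources x 20 items:    ${two_board_cost:.2f}")
```

Restricting `sources` or lowering `maxItemsPerSource` lowers this ceiling, since child start and per-result fees both scale with them.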

Use cases

  • AI-specialist job boards
  • ML-engineer alert bots
  • AI-talent market research
  • Recruiting pipelines for data/ML teams

FAQ

Is it legal to scrape machine learning jobs? This Actor reads only publicly available job postings — data any visitor can see without logging in. No personal data behind authentication is touched. Review the target site's terms and your local regulations for your specific use case.

Do I need an account on the target site? No. Postings are fetched from public pages/APIs — no login, cookies or session tokens.

How fresh is the data? Each run fetches live listings unless a source's results were already fetched within the last cacheTtlSeconds (default 1800 s = 30 min), in which case the cached results are reused. Set it to 0 to always hit the source live.

How many jobs can I get? maxItems caps the run (set 0 for no cap). Most sources paginate from newest to oldest.

Something broken or missing? Open an issue on the Actor's Issues tab — it is monitored and reliability fixes ship fast.

Integrations

Export the dataset as JSON, CSV or Excel, or read it straight from the Apify API. Works out of the box with Make, Zapier and n8n via their Apify integrations, can be called synchronously with run-sync-get-dataset-items from any backend, and is usable by AI agents through the Apify MCP server.