Scrape totaljobs.com - the UK's largest job board. Salary data, employer contact details, full job descriptions, and job-change monitoring. Incremental mode detects new and changed listings. Compact output for AI agents and MCP workflows.
Unreleased
Critical billing — caught by external review of v1.3.0 before ship
SERP-only Scrape.do path no longer double-bills. With includeDetails: false, Scrape.do pushed every row directly, but scrapedoSerpUsed = pendingDetailJobs.length > 0 evaluated to false because nothing was queued for detail enrichment, so CheerioCrawler ran the same start URLs again, producing duplicate dataset rows and charging twice. The flag now derives from "did Scrape.do successfully parse a SERP page", and the SERP-only push routes through the central pushOutputForJob helper. The regression shipped in v1.3.0, which was held before going live; production stayed on v1.2.x.
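A minimal sketch of the corrected derivation; the helper names (fetchViaScrapedo, parseSerpJobs, runCheerioFallback) are hypothetical stand-ins, not the actor's real identifiers:

```typescript
type SerpJob = { url: string; title: string };

declare function fetchViaScrapedo(url: string): Promise<string>;
declare function parseSerpJobs(html: string): SerpJob[] | null;
declare function pushOutputForJob(job: SerpJob): Promise<void>;
declare function runCheerioFallback(urls: string[]): Promise<void>;

async function crawlSerps(startUrls: string[]): Promise<void> {
  // The flag now means "Scrape.do successfully parsed a SERP page" ...
  let scrapedoSerpParsed = false;
  for (const url of startUrls) {
    const jobs = parseSerpJobs(await fetchViaScrapedo(url));
    if (jobs) {
      scrapedoSerpParsed = true; // true even when zero detail jobs are queued
      for (const job of jobs) await pushOutputForJob(job); // central push helper
    }
  }
  // ... instead of `pendingDetailJobs.length > 0`, which was false on
  // SERP-only runs and let CheerioCrawler re-crawl the same start URLs.
  if (!scrapedoSerpParsed) await runCheerioFallback(startUrls);
}
```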
The push contract is now wired into every engine path (Playwright SERP+detail, Firefox detail, browser-fetch detail, Playwright detail crawler, plus the onDetailFailed SERP-fallback push in main.ts); see the call-site sketch after this list. v1.3.0 only reached routes.ts and Scrape.do, leaving the rest with the same C2/I1/I2 bugs the helper was meant to retire:
markSeen-on-dedup-hit: the local maybePush returned true on a dedup hit, so callers unconditionally called incremental.markSeen for rows that never reached the dataset, poisoning incremental state.
detailsFetched: true for parser failures: a 200 response with no JSON-LD JobPosting now correctly emits detailsFetched: false; previously it was unconditionally true whenever the HTTP response succeeded.
SERP-only rows missing changeType / firstSeenAt / lastSeenAt: the engine paths advertised these lifecycle fields but only Scrape.do (post-v1.3.0) populated them.
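What the corrected call sites now look like, as a hedged sketch (the real pushOutputForJob signature may differ; parseDetailJsonLd matches the name used elsewhere in these notes):

```typescript
declare function parseDetailJsonLd(html: string): Record<string, unknown> | null;
declare function pushOutputForJob(
  job: { id: string },
  opts: { detail: Record<string, unknown> | null },
): Promise<{ pushed: boolean }>;

async function onDetailResponse(job: { id: string }, html: string): Promise<void> {
  const detail = parseDetailJsonLd(html);
  // detailsFetched derives from the parse result, not the HTTP status: a 200
  // with no JSON-LD JobPosting yields detailsFetched: false downstream.
  await pushOutputForJob(job, { detail });
  // No incremental.markSeen here: the helper marks seen only for rows that
  // actually reached the dataset, so dedup hits and cap rejections can no
  // longer poison incremental state. Lifecycle fields (changeType,
  // firstSeenAt, lastSeenAt) are populated inside the helper for every path.
}
```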
Diagnostics
writeFailureDiagnostics now fires on every early-exit branch. Invalid maxResults, invalid startUrl, and state-lock conflict each write a FAILURE_DIAGNOSTICS KV record so consumers can distinguish "ran clean with 0 results" from "never started". The lock-conflict exit happens before the outer try/catch wrapper, so it gets its own write rather than relying on the thrown_error fallback.
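A hedged sketch of the diagnostics write; the record shape and reason strings are assumptions, not the actor's exact schema:

```typescript
import { Actor } from 'apify';

async function writeFailureDiagnostics(reason: string, detail?: unknown): Promise<void> {
  const store = await Actor.openKeyValueStore();
  await store.setValue('FAILURE_DIAGNOSTICS', {
    reason, // e.g. 'invalid_max_results', 'invalid_start_url', 'state_lock_conflict'
    detail: detail ?? null,
    at: new Date().toISOString(),
  });
}

// Early-exit branches call it before exiting, e.g. (illustrative):
// if (!(await acquireStateLock())) {
//   await writeFailureDiagnostics('state_lock_conflict');
//   await Actor.exit();
// }
```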
State store
Incremental state is pruned on save, in two phases: an age-based prune drops entries whose lastSeenAt is older than 90 days, then a soft size budget (8 MB, leaving headroom under Apify's 9 MB per-record limit) triggers oldest-first pruning if the JSON payload still exceeds it. Without this, the v2 timestamp-per-id format would eventually breach the KV limit and start failing every save on long-lived state stores. INCREMENTAL_RETENTION_MS and INCREMENTAL_MAX_BYTES are exported so other actors can tune them.
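A minimal sketch of the two-phase prune, assuming the v2 state shape described under 1.3.0 below (the constants match the changelog; the traversal is illustrative):

```typescript
export const INCREMENTAL_RETENTION_MS = 90 * 24 * 60 * 60 * 1000; // 90 days
export const INCREMENTAL_MAX_BYTES = 8 * 1024 * 1024; // headroom under Apify's 9 MB record cap

type SeenEntry = { firstSeenAt: string; lastSeenAt: string };

function pruneState(seen: Record<string, SeenEntry>, now = Date.now()): Record<string, SeenEntry> {
  // Phase 1: age-based prune. Drop entries not seen within the retention window.
  const entries = Object.entries(seen).filter(
    ([, e]) => now - Date.parse(e.lastSeenAt) <= INCREMENTAL_RETENTION_MS,
  );
  // Phase 2: size budget. If the JSON payload still exceeds the budget,
  // drop oldest-first (by lastSeenAt) until it fits.
  entries.sort(([, a], [, b]) => Date.parse(a.lastSeenAt) - Date.parse(b.lastSeenAt));
  while (
    entries.length > 0 &&
    Buffer.byteLength(JSON.stringify(Object.fromEntries(entries))) > INCREMENTAL_MAX_BYTES
  ) {
    entries.shift(); // remove the oldest entry
  }
  return Object.fromEntries(entries);
}
```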
1.3.0 — 2026-05-04
Critical correctness — push contract
pushOutputForJob helper centralises every dataset write. It owns changeType, firstSeenAt, lastSeenAt, transform, push, dedup, and incremental.markSeen. Callers now go through one function instead of 7 ad-hoc copies, so future fixes don't have to be replicated across engine paths (exactly the failure mode that left the Scrape.do paths stale through v1.1.0 / v1.2.0). A sketch of the helper's core invariant follows below.
markSeen is now strictly post-push. Previously the routes.ts SERP-only and LABEL_DETAIL paths called incremental.markSeen BEFORE attemptPush, so a cap rejection or transient push failure left the job permanently locked out of future incremental runs. The helper now enforces the invariant "markSeen iff push succeeded".
Cap-rejected jobs are no longer marked seen. The SERP onDetailJob callback used to mark cap-overflow jobs as seen ("// cap reached — mark remaining jobs seen"), causing silent data loss across runs. That logic is removed; future runs rediscover those jobs as NEW.
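A sketch of the helper's core invariant; the signature, the dedup/cap hooks, and the changeType derivation shown here are assumptions based on the notes above, not the actor's exact code:

```typescript
import { Actor } from 'apify';

declare const incremental: {
  lookup(id: string): { firstSeenAt: string } | undefined;
  markSeen(id: string, at: string): void;
};
declare function isDuplicate(id: string): boolean;
declare function capReached(): boolean;

async function pushOutputForJob(
  id: string,
  row: Record<string, unknown>,
): Promise<{ pushed: boolean }> {
  // Rejected rows are never marked seen, so future runs can rediscover them.
  if (isDuplicate(id) || capReached()) return { pushed: false };
  const now = new Date().toISOString();
  const prior = incremental.lookup(id);
  await Actor.pushData({
    ...row,
    changeType: prior ? 'CHANGED' : 'NEW', // real logic may also compare contentHash
    firstSeenAt: prior?.firstSeenAt ?? now,
    lastSeenAt: now,
  });
  incremental.markSeen(id, now); // strictly after the dataset write succeeded
  return { pushed: true };
}
```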
State model
IncrementalState v2 KV format. Per-id timestamps {firstSeenAt, lastSeenAt} replace the bare seen-set. Legacy v1 {ids} state migrates automatically on first load by synthesising timestamps from the run-level updatedAt. OutputItem.firstSeenAt / lastSeenAt now carry real cross-run semantics.
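A sketch of the migration, assuming the shapes described above (exact field names in the real store may differ):

```typescript
type StateV1 = { ids: string[]; updatedAt?: string };
type StateV2 = { version: 2; seen: Record<string, { firstSeenAt: string; lastSeenAt: string }> };

function migrateState(raw: StateV1 | StateV2): StateV2 {
  if ('version' in raw) return raw; // already v2
  // v1 carried only a bare seen-set, so timestamps are synthesised from the
  // run-level updatedAt (falling back to "now" if even that is missing).
  const at = raw.updatedAt ?? new Date().toISOString();
  const seen: StateV2['seen'] = {};
  for (const id of raw.ids) seen[id] = { firstSeenAt: at, lastSeenAt: at };
  return { version: 2, seen };
}
```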
State lock TTL bumped 30 min → 90 min. With actor timeoutSecs: 1800 (30 min), the previous TTL was identical to the timeout, so any long-but-legal run could trigger a stale_override race mid-write.
Lock release is compare-and-delete. Previously setValue(lockKey, null) ran unconditionally, so a late-exiting run could nuke a concurrent run's freshly-acquired lock. The release now reads the current lock first and only clears it if runId matches.
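A sketch of the compare-and-delete release (the lock record shape is an assumption):

```typescript
import { Actor } from 'apify';

type LockRecord = { runId: string; acquiredAt: string };

async function releaseStateLock(lockKey: string, myRunId: string): Promise<void> {
  const store = await Actor.openKeyValueStore();
  const current = await store.getValue<LockRecord>(lockKey);
  // Only clear a lock we still own; a late-exiting run must not delete a
  // lock that a concurrent run has just acquired.
  if (current?.runId === myRunId) await store.setValue(lockKey, null);
}
```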
Scrape.do detail engine — bugs from the v1.2.0 audit pass
The "comprehensive correctness pass" of v1.1.0 missed both Scrape.do detail blocks (primary + deferred retry). All three engine-specific bugs are now fixed in those paths:
Currency: parseDetailJsonLd now receives defaultCurrencyForGeo(geo). UK listings were getting EUR salaries; they now correctly get GBP.
contentHash is positionally null-safe: it previously used .filter(Boolean).join('|'), which dropped empty strings and shifted slot positions, so a missing field caused an unrelated value to be hashed into its slot (see the sketch after this list).
detailsFetched = Boolean(detail) — was unconditionally true when Scrape.do returned 200, claiming detail success even when the parser couldn't read the JSON-LD.
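The positional-null-safe hash, as a sketch (the field list in the usage comment is illustrative):

```typescript
import { createHash } from 'node:crypto';

function contentHash(fields: Array<string | null | undefined>): string {
  // Keep every slot in place: map missing values to '' instead of filtering
  // them out, so a missing field can no longer shift later values into the
  // wrong slot and silently change what gets hashed.
  const canonical = fields.map((f) => f ?? '').join('|');
  return createHash('sha256').update(canonical).digest('hex');
}

// contentHash([title, company, salaryText, location]) now stays positionally
// stable even when salaryText is undefined.
```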
Cap math
SERP collection at main.ts:916 now compares pushed + reserved + pendingDetailJobs.length against maxResults instead of just pendingDetailJobs.length: the same I7 bug from the v1.1.0 audit, in a different code path that had been missed.
Input contract
geo runtime default: 'TOTALJOBS'. Was DEFAULTS.geo (= 'DE') inherited from a shared constants file. API/CLI callers omitting geo now match the schema's "Fixed to Totaljobs (UK)" promise.
includeDetails runtime default: true. It was false while the schema advertised true, so Console UI users got details but API/CLI callers omitting the field got SERP-only data. The Apify quality-test guard lives in prefill: false (a separate field) and is unchanged.
Drift-audit test added: it parses main.ts for runtime defaults and compares them against input_schema.json (sketched below).
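A hedged sketch of the drift-audit idea; the regexes, file paths, and schema keys are assumptions about how the real test extracts defaults:

```typescript
import { readFileSync } from 'node:fs';
import { test } from 'node:test';
import assert from 'node:assert';

test('runtime defaults match input_schema.json', () => {
  const schema = JSON.parse(readFileSync('.actor/input_schema.json', 'utf8'));
  const source = readFileSync('src/main.ts', 'utf8');

  // Assumes main.ts applies defaults like `input.geo ?? 'TOTALJOBS'`.
  const geoDefault = source.match(/geo\s*\?\?\s*'([A-Z_]+)'/)?.[1];
  assert.equal(geoDefault, schema.properties.geo.default);

  const detailsDefault = source.match(/includeDetails\s*\?\?\s*(true|false)/)?.[1];
  assert.equal(detailsDefault, String(schema.properties.includeDetails.default));
});
```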
Diagnostics
Scrape.do SERP failures now record block signals. A 403/429/transport block on the optimised SERP path used to surface as "no signals + 0 pushed" in the run summary, indistinguishable from an empty query.
Failure diagnostics on early exit / thrown errors. Validation failures (missing query, invalid geo) and unhandled exceptions now write a FAILURE_DIAGNOSTICS KV record so consumers can tell a clean run from one that never started.
1.2.0 — 2026-05-04
Performance
Inline detail fetches now run with bounded concurrency (playwrightCrawler.ts). Each SERP page used to fetch its 25 details one at a time; details are now fetched with concurrency min(maxConcurrency, 8) per SERP page (see the pool sketch below). End-to-end inline-detail throughput is up ~4-8× on default settings.
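A generic bounded worker-pool sketch of the pattern (not the actor's exact code):

```typescript
async function mapBounded<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // `limit` workers pull from a shared cursor; single-threaded JS makes the
  // `next++` read-and-increment safe without locking.
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}

// Per SERP page: await mapBounded(detailUrls, Math.min(maxConcurrency, 8), fetchDetail);
```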
Firefox detail crawler reuses browser contexts across jobs (up to 10 uses per context, then auto-recycled). It previously created and immediately closed a fresh context per job (and per retry), which OOM'd on 500+-job runs and added ~1-2s of overhead per fetch. Retries still get a fresh context to avoid keeping a flagged identity.
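A sketch of the reuse policy, assuming a Playwright Browser handle (MAX_CONTEXT_USES and the single-slot pool are illustrative):

```typescript
import type { Browser, BrowserContext } from 'playwright';

const MAX_CONTEXT_USES = 10; // recycle after 10 jobs, per the note above

let current: { ctx: BrowserContext; uses: number } | null = null;

async function acquireContext(browser: Browser, isRetry: boolean): Promise<BrowserContext> {
  // Retries always get a fresh context so a flagged identity is not reused.
  if (isRetry || !current || current.uses >= MAX_CONTEXT_USES) {
    if (current) await current.ctx.close().catch(() => {});
    current = { ctx: await browser.newContext(), uses: 0 };
  }
  current.uses += 1;
  return current.ctx;
}
```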
Scrape.do retry backoff now jittered (±20% uniform). Multiple actors retrying at the same tick produced thundering-herd spikes against the proxy; jitter spreads the second wave.
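The jitter in one line, as a sketch:

```typescript
// Uniform jitter in [0.8, 1.2] around the base backoff delay.
function jitteredDelayMs(baseMs: number): number {
  return Math.round(baseMs * (0.8 + Math.random() * 0.4));
}
```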
Schema
dataset_schema.json drift fixed: descriptionHtml, descriptionMarkdown, contentHash, changeType were missing from the "all" view's transformation fields. Display labels added for 11 lifecycle/repost/extraction fields that previously appeared with raw key names. A drift-audit test guards against future regressions.
1.1.0 — 2026-05-03
Critical fixes
Pagination cap removed: maxPages was previously clamped to 1 in the browser-fetch path, so users requesting maxPages=10 got only the first page of results. The clamp was a leftover speculative optimization; the SERP path is owned by CheerioCrawler/Playwright, and pagination now runs end-to-end. (Reported by @cleme_ntino.)
Pass-2 escalation no longer double-bills: pendingDetailJobs is now cleared after each detail phase. Previously, escalating from datacenter to residential proxy re-pushed the entire pass-1 set against the cap, billing users twice for the same listings.
Detail retry no longer false-fails: Firefox detail crawler used to retry whenever description was empty, even when the JSON-LD JobPosting block was present and complete. Retries now only fire when JSON-LD is entirely missing.
Incremental state isolation: Two runs with identical query+geo+location but different age/radius/contractType/etc. used to share state, silently suppressing fresh hits in run B. Filter dimensions are now hashed into the state-key prefix.
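A sketch of folding filter dimensions into the state key (the dimension list and key format are assumptions):

```typescript
import { createHash } from 'node:crypto';

type RunInput = {
  query: string;
  geo: string;
  location?: string;
  age?: number;
  radius?: number;
  contractType?: string;
};

function stateKeyFor(input: RunInput): string {
  const dims = [input.age, input.radius, input.contractType]
    .map((v) => String(v ?? ''))
    .join('|');
  const filterHash = createHash('sha256').update(dims).digest('hex').slice(0, 12);
  // Runs differing only in filters now get distinct state, so run B's fresh
  // hits are no longer suppressed by run A's seen-set.
  return `${input.query}:${input.geo}:${input.location ?? ''}:${filterHash}`;
}
```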
Phone & URL extraction now read post-format text (no longer broken by HTML→Markdown conversion); email extraction reads raw HTML (so mailto: anchors aren't lost). The previous code did the opposite of both.
changeType: 'NEW' now wired across all detail engines (Firefox, Playwright, browser-fetch, Scrape.do). Was missing on multiple paths, so incremental subscribers couldn't tell new from existing items.
contentHash is now null-safe: previously a missing field would throw inside the SHA-256 hashing call.
Lock-acquisition errors no longer mask the root cause: Actor.fail() was being thrown during state-lock acquisition, swallowing the underlying error message. A plain Error is now thrown instead.
State lock always released on failure: try/catch added around the main run body so a crash mid-run still releases the lock instead of holding it for the full TTL.
Important fixes
Currency mapping is now geo-aware: UK GBP, EU EUR, ZA ZAR. Salaries from JSON-LD without explicit currency previously defaulted to EUR for everything.
Scrape.do success criterion fixed: changed from html.length > 5000 to JSON-LD presence check. Long block pages used to count as success; legitimate compact templates used to count as failure.
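The content-based check, as a sketch (the regex is a plain-string approximation of detecting a schema.org JobPosting in embedded JSON-LD):

```typescript
function looksLikeDetailSuccess(html: string): boolean {
  // Old heuristic: html.length > 5000. Long block pages passed it; legitimate
  // compact templates failed it. Presence of a JobPosting JSON-LD block is a
  // content signal rather than a size signal.
  return /<script[^>]*type=["']application\/ld\+json["'][^>]*>[\s\S]*?"@type"\s*:\s*"JobPosting"/i.test(html);
}
```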
Telegram/WhatsApp message splits at semantic boundaries: notifications now split at \n\n boundaries before falling back to hard slices, preventing job entries from being chopped mid-sentence.
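A sketch of the boundary-aware splitter (limit handling is illustrative; the real code may differ):

```typescript
function splitMessage(text: string, limit: number): string[] {
  const chunks: string[] = [];
  let current = '';
  for (const block of text.split('\n\n')) {
    // Flush the current chunk if appending this block would overflow it.
    if (current && current.length + 2 + block.length > limit) {
      chunks.push(current);
      current = '';
    }
    if (block.length > limit) {
      // A single oversized block still needs hard slices, as a last resort.
      for (let i = 0; i < block.length; i += limit) chunks.push(block.slice(i, i + limit));
    } else {
      current = current ? `${current}\n\n${block}` : block;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```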
Notification dispatch gated on success: notifications were previously dispatched even when the run had failed midway.
Detail uniqueKey discriminated by pass: pass-2 detail retries used to be deduped against pass-1 entries by Crawlee's RequestQueue, making escalation a no-op. UniqueKey now includes the pass label.
startUrls hostname validation: invalid hostnames are rejected up front instead of failing mid-run with a confusing error.
onDetailJob cap math: pass-2 escalation now correctly accounts for pushed + reserved + pendingDetailJobs.length against maxResults.
stateStoreName default: was "stepstone-state" in code but "totaljobs-state" in input_schema.json. Aligned to "totaljobs-state".
Compact output
salaryMin / salaryMax added to compact field set (essential for AI-agent salary filtering).
Operational
Default memory bumped from 1024MB → 2048MB; default timeout 300s → 1800s. Browser detail paths previously OOM'd on larger runs and timed out on maxPages>5.
1.0.x — 2026-04-30
Fixed: startUrls now processes all URLs in the array. Previously the optimized SERP path only used the first URL; subsequent URLs were silently ignored. Each URL is now its own pagination universe with shared dedup + maxResults cap.
0.1.x — 2026-04-14
Added: descriptionHtml, descriptionMarkdown output fields (triple-format descriptions for RAG/LLM pipelines)
Added: contentHash output field (SHA-256 fingerprint of content-identifying fields)