CWJobs Scraper: UK Tech Jobs, Salaries & Geo
Pricing
from $1.50 / 1,000 jobs
Scrape every UK tech / IT job on cwjobs.co.uk. Extract titles, employers with logos, full JobPosting JSON-LD, parsed salary bands (min/max/period), geo-coords (lat/lng), industries, and posting dates. Auto-paginate listings or paste direct detail URLs. $1.50 per 1,000 jobs.
Developer
GetAScraper
Maintained by Community
Scrape every UK tech and IT job on cwjobs.co.uk, with full JobPosting JSON-LD, parsed salary bands, employer logos, and lat/lng geo-coords, on a schedule or on demand. This Apify Actor pulls structured recruitment data from one of the UK's largest IT job boards, built on the StepStone group stack, and delivers clean JSON rows you can drop straight into Sheets, Airtable, HubSpot, BigQuery, or a CRM.
Built for recruiters sourcing UK tech talent, B2B sales teams building hiring-intent lead lists, salary benchmarking researchers, and AI/ML teams training job-classification models on real UK labour-market data.
What does CWJobs Scraper do?
CWJobs Scraper is an HTTP-only Apify Actor that crawls the public, server-rendered HTML on cwjobs.co.uk, the UK IT recruitment board owned by the StepStone group (sister site of totaljobs.com and jobsite.co.uk). It auto-paginates listing pages, parses full detail pages, and emits one clean JSON row per job, with structured fields ready for downstream tooling.
- Two-stage crawl. Listings emit the search-result card fields and enqueue detail URLs. Detail pages pull the full `JobPosting` JSON-LD block, including description HTML, salary range, employer, location with lat/lng, and applicant location requirements.
- URL builder from keywords and locations. Pass `javascript` + `london` and the Actor builds `/jobs/javascript/in-london/` automatically. No URL crafting required.
- Structured salary extraction. The Actor parses the `baseSalary` block from JSON-LD first (handles `minValue`, `maxValue`, `unitText`), then falls back to a multi-pattern regex over the rendered salary text (`£40,000 - £65,000 per annum`, `£40k - £65k`, single amounts). Currency is always GBP.
- Five pre-filtered dataset views. Overview, B2B Leads, Salary Benchmark, Remote / Hybrid, and Newest Postings. Each view is curated for a different downstream workflow.
- Run summary in the Key-Value Store. Total jobs, unique employers, top companies by job count, top locations, salary distribution, and the exact search URLs used, all written to `run-summary`.
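The URL builder described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's source: the function name `build_search_urls`, the handling of empty lists, and the trailing slash are assumptions based only on the path patterns quoted in this README.

```python
from itertools import product

BASE_URL = "https://www.cwjobs.co.uk"


def build_search_urls(keywords, locations):
    """Return one listing URL per keyword/location pair (hypothetical helper)."""
    # Assumption: an empty list means "no filter" on that axis,
    # so a single None placeholder keeps the other axis iterating.
    keyword_axis = keywords or [None]
    location_axis = locations or [None]

    urls = []
    for keyword, location in product(keyword_axis, location_axis):
        parts = ["jobs"]
        if keyword:
            parts.append(keyword)            # /jobs/<keyword>/
        if location:
            parts.append(f"in-{location}")   # /jobs/.../in-<location>/
        urls.append(f"{BASE_URL}/" + "/".join(parts) + "/")
    return urls


if __name__ == "__main__":
    for url in build_search_urls(["javascript", "devops"], ["london"]):
        print(url)
```

With the default input (`javascript` and `devops` in `london`) this yields two listing URLs, one per keyword.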
Why use CWJobs Scraper?
CWJobs indexes over 30,000 UK IT and tech vacancies at any given time. It is the primary feed for many UK tech recruiters because of the StepStone group's strong SEO presence and direct employer postings. Pulling this data reliably is non-trivial because of the Akamai Web Application Firewall in front of the site.
This Actor handles that for you:
- Bypass Akamai WAF reliably. All requests route through Apify Residential proxies with a forced UK geolocation. Datacenter IPs get blocked at the edge. The Actor retries transient 403s with rotated sessions.
- Skip the browser overhead. Because CWJobs is server-rendered HTML with a JSON-LD block on every detail page, there is zero need for Playwright or Puppeteer. Crawls run 10x faster and 10x cheaper than equivalent browser-based scrapers.
- Drop-in for downstream tools. The Output tab exposes six ready-to-use links: full dataset (JSON/CSV/Excel), the run summary record, and four filtered dataset views.
- Built-in salary benchmarking. Structured `salary: { min, max, currency, period }` on every row makes the dataset directly usable for compensation research, market-rate dashboards, or RAG chatbots.
- Schedule or run on demand. Run once for a snapshot, schedule hourly for live job alerts, or trigger via the Apify API for webhook-driven workflows.
How to scrape CWJobs data
- Open the Input tab in Apify Console.
- Add one or more Keywords (for example `javascript`, `python-developer`, `devops`, `data-scientist`, `cyber-security`).
- Add one or more Locations (for example `london`, `manchester`, `edinburgh`, `bristol`, `birmingham`, `remote`).
- (Optional) Set Contract Type to `permanent`, `contract`, `temporary`, or `part-time`. Leave on `all` for the full set.
- (Optional) Toggle Remote only to restrict to fully remote roles. This is mutually exclusive with the `locations` field.
- (Optional) Cap the run with Max Jobs per Search. Default is 100 per search URL.
- (Optional) Toggle Include full description off to emit leaner B2B lead rows (~1 KB each) without the description HTML.
- Click Start. The Actor emits one row per job and writes a summary record.
For advanced users, paste raw CWJobs listing URLs (for example https://www.cwjobs.co.uk/jobs/javascript/in-london) or detail URLs into Start URLs to bypass the URL builder entirely.
Input
| Field | Type | Description | Default |
|---|---|---|---|
| `startUrls` | array | Optional. Paste CWJobs listing or detail URLs. If provided, takes precedence over the keyword and location filters. | [] |
| `keywords` | array | Job titles, skills, or categories. Each becomes a /jobs/<keyword>/ path. Examples: javascript, python-developer, devops, data-scientist, cyber-security, software-development. Leave empty to skip keyword filter. | ["javascript", "devops"] |
| `locations` | array | UK cities or regions. Each becomes a /jobs/in-<location>/ path. Known slugs: london, manchester, edinburgh, bristol, birmingham, glasgow, leeds, liverpool, newcastle, nottingham, sheffield, southampton, cambridge, oxford, brighton, reading, cardiff, belfast, aberdeen, remote, central-london, city-of-london. Leave empty for all UK. | ["london"] |
| `contractType` | string | Filter by employment contract type. Omit for both permanent and contract. | "all" |
| `remoteOnly` | boolean | If true, restrict to fully remote roles (jobLocationType=TELECOMMUTE). Adds the /in-remote/ path. Mutually exclusive with the locations filter. | false |
| `maxItems` | integer | Maximum job rows to emit per search URL. Direct detail URLs always emit 1 row. | 100 |
| `includeDescription` | boolean | If true (default), emit the full HTML job description (~5 to 10 KB per row). Set false for lean B2B lead rows (~1 KB each, 5x cheaper to store). | true |
| `dateWithinDays` | integer | Optional client-side filter. Only emit jobs whose datePosted is within the last N days. 0 = no filter. Useful for freshness-focused job alerts. | 0 |
| `maxConcurrency` | integer | Parallel HTTP requests for detail-page fetches. CWJobs is open with residential GB; 4 to 8 is comfortable. | 4 |
| `maxRequestRetries` | integer | Per-URL retry budget on transient errors and 5xx/403. Each retry rotates the proxy session. | 3 |
| `proxyConfiguration` | object | Apify Residential GB is recommended. Datacenter IPs are blocked by Akamai WAF. | RESIDENTIAL + GB |
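To make the table concrete, here is a hedged sketch of assembling a run input in Python before passing it to the Actor. The helper `prepare_input` is hypothetical. The `proxyConfiguration` shape follows the usual Apify proxy input keys, which is an assumption about this Actor, and the validation only mirrors the rules stated in the table.

```python
import copy

# Defaults copied from the Input table above.
DEFAULTS = {
    "startUrls": [],
    "keywords": ["javascript", "devops"],
    "locations": ["london"],
    "contractType": "all",
    "remoteOnly": False,
    "maxItems": 100,
    "includeDescription": True,
    "dateWithinDays": 0,
    "maxConcurrency": 4,
    "maxRequestRetries": 3,
    # Assumption: the standard Apify proxy input shape for "RESIDENTIAL + GB".
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
        "apifyProxyCountry": "GB",
    },
}

CONTRACT_TYPES = {"all", "permanent", "contract", "temporary", "part-time"}


def prepare_input(overrides):
    """Merge overrides onto the defaults and check the documented constraints."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")

    run_input = copy.deepcopy(DEFAULTS)
    run_input.update(overrides)

    if run_input["contractType"] not in CONTRACT_TYPES:
        raise ValueError(f"Unsupported contractType: {run_input['contractType']!r}")
    # remoteOnly and locations are documented as mutually exclusive.
    if run_input["remoteOnly"] and run_input["locations"]:
        raise ValueError("Leave locations empty when remoteOnly is true")
    return run_input


if __name__ == "__main__":
    print(prepare_input({"keywords": ["python-developer"], "maxItems": 50}))
```

Because `locations` defaults to `["london"]`, a remote-only run in this sketch has to pass `"locations": []` explicitly.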
Output example
Each dataset item represents a single job vacancy. The description field is the full HTML body from the JobPosting JSON-LD block. Set includeDescription: false to omit it.
{
  "rowType": "job",
  "listingUrl": "https://www.cwjobs.co.uk/jobs/javascript/in-london",
  "jobId": "232145678",
  "jobUrl": "https://www.cwjobs.co.uk/job/senior-javascript-developer/acme-tech-job232145678",
  "title": "Senior JavaScript Developer",
  "description": "<p>Acme Tech is hiring a Senior JavaScript Developer to join our London team...</p>",
  "datePosted": "2026-06-05",
  "validThrough": "2026-07-05",
  "employmentType": "FULL_TIME",
  "industry": "Information Technology",
  "directApply": true,
  "jobLocationType": null,
  "applicantLocationRequirements": [],
  "employer": {
    "name": "Acme Tech Ltd",
    "url": "https://www.cwjobs.co.uk/companies/acme-tech",
    "logoUrl": "https://www.cwjobs.co.uk/logos/acme-tech-200x200.png"
  },
  "location": {
    "text": "London, City of London",
    "locality": "London",
    "region": "Greater London",
    "postalCode": "EC2N 4AY",
    "country": "GB",
    "lat": 51.5155,
    "lng": -0.0922
  },
  "salary": {
    "rawText": "£40,000 - £65,000 per annum",
    "min": 40000,
    "max": 65000,
    "currency": "GBP",
    "period": "annum"
  },
  "applyType": "internal",
  "scrapedAt": "2026-06-06T12:00:00.000Z"
}
Data fields
| Field name | Format | Description |
|---|---|---|
| `rowType` | text | Always "job". Useful for mixing job rows with summary records. |
| `jobId` | text | Unique CWJobs job ID parsed from the detail URL. |
| `jobUrl` | link | Direct canonical URL of the vacancy. |
| `title` | text | Job title as posted by the employer. |
| `description` | text | Full HTML job description from the JSON-LD block. Omit by setting includeDescription: false. |
| `datePosted` | date | ISO 8601 posting date. |
| `validThrough` | date | ISO 8601 expiry date, or null if not specified. |
| `employmentType` | text | One of FULL_TIME, PART_TIME, CONTRACTOR, INTERN, TEMPORARY, or null. |
| `industry` | text | Industry classification (for example Information Technology, Financial Services). |
| `directApply` | boolean | true if the posting supports direct apply through CWJobs. |
| `jobLocationType` | text | One of TELECOMMUTE (fully remote), REMOTE (remote with constraints), or null for on-site/hybrid. |
| `applicantLocationRequirements` | array | List of eligible countries for remote roles (for example [{ "type": "Country", "name": "United Kingdom" }]). Empty for on-site. |
| `employer` | object | { name, url, logoUrl }. logoUrl may be null for employers that did not upload a logo. |
| `location` | object | Structured address with { text, locality, region, postalCode, country, lat, lng }. Lat and lng are decimal degrees from the JSON-LD geo block. |
| `salary` | object | Parsed { rawText, min, max, currency, period }. Currency is always GBP. Period is one of annum, hour, day, week, month, or null. |
| `applyType` | text | One of internal (apply through CWJobs), external (redirect to employer site), or unknown. |
| `listingUrl` | link | The search-results URL the job was discovered on. Useful for attribution. |
| `scrapedAt` | date | ISO 8601 timestamp of when the row was emitted. |
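Spreadsheet and CRM imports usually want flat columns rather than the nested `employer`, `location`, and `salary` objects. The sketch below shows one way to flatten a row after export. The helper name `flatten_row` and the dotted column naming are illustrative choices, not something the Actor provides.

```python
def flatten_row(row, parent_key="", sep="."):
    """Flatten nested objects into dotted column names (hypothetical helper).

    Lists such as applicantLocationRequirements are left untouched.
    """
    flat = {}
    for key, value in row.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_row(value, column, sep))
        else:
            flat[column] = value
    return flat


if __name__ == "__main__":
    sample = {
        "jobId": "232145678",
        "title": "Senior JavaScript Developer",
        "employer": {"name": "Acme Tech Ltd", "logoUrl": None},
        "location": {"locality": "London", "lat": 51.5155, "lng": -0.0922},
        "salary": {"min": 40000, "max": 65000, "period": "annum"},
    }
    for column, value in flatten_row(sample).items():
        print(f"{column}: {value}")
```

The resulting keys (`employer.name`, `location.lat`, `salary.min`, and so on) can be fed to `csv.DictWriter` or a dataframe constructor.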
Dataset views
The Output tab exposes five curated views on the same underlying dataset. Pick the view that matches your workflow.
| View | Use case | Key fields |
|---|---|---|
| Job Overview | Full analysis. All 16 fields per row. | title, jobId, jobUrl, datePosted, validThrough, employmentType, industry, directApply, jobLocationType, salary, employer, location, applyType, description, listingUrl, scrapedAt |
| B2B Leads | CRM import into HubSpot, Salesforce, or Airtable. | title, employer, jobUrl, location, applyType, datePosted |
| Salary Benchmark | Compensation research and market-rate dashboards. | title, employer, location, salary, employmentType, datePosted |
| Remote / Hybrid | Distributed-work job searches. Filters on jobLocationType and applicantLocationRequirements. | title, employer, location, salary, jobLocationType, applicantLocationRequirements, jobUrl, datePosted |
| Newest Postings | Fresh job alerts. Sorted by datePosted descending. | title, employer, location, salary, jobUrl, datePosted |
Switch views from the Output tab. The underlying data is the same.
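If you work from the full JSON export instead of the Output tab, the narrower views can be approximated locally. The sketch below copies the field lists from the table above. The dictionary keys and the helper names `project` and `newest_first` are illustrative and not part of the Actor.

```python
# Field lists copied from the Dataset views table; the keys are display names.
VIEW_FIELDS = {
    "B2B Leads": ["title", "employer", "jobUrl", "location", "applyType", "datePosted"],
    "Salary Benchmark": ["title", "employer", "location", "salary", "employmentType", "datePosted"],
    "Newest Postings": ["title", "employer", "location", "salary", "jobUrl", "datePosted"],
}


def project(rows, view):
    """Keep only the fields that belong to the named view."""
    fields = VIEW_FIELDS[view]
    return [{field: row.get(field) for field in fields} for row in rows]


def newest_first(rows):
    """Sort by datePosted descending; ISO 8601 dates sort correctly as strings."""
    return sorted(rows, key=lambda row: row.get("datePosted") or "", reverse=True)


if __name__ == "__main__":
    rows = [
        {"title": "DevOps Engineer", "datePosted": "2026-06-01", "jobId": "1"},
        {"title": "Data Scientist", "datePosted": "2026-06-05", "jobId": "2"},
    ]
    for row in project(newest_first(rows), "Newest Postings"):
        print(row["datePosted"], row["title"])
```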
Run summary (Key-Value Store)
Every run writes a single run-summary record to the default Key-Value Store. The summary contains aggregate stats useful for at-a-glance reports, so there is no need to download the full dataset.
{
  "totalJobs": 142,
  "uniqueEmployers": 87,
  "topCompanies": [
    { "name": "Acme Tech Ltd", "count": 4 },
    { "name": "Globex Corporation", "count": 3 }
  ],
  "topLocations": [
    { "name": "London", "count": 98 },
    { "name": "Manchester", "count": 22 }
  ],
  "salary": {
    "withMin": 89,
    "withMax": 78,
    "byPeriod": { "annum": 87, "day": 2 },
    "minAcrossAll": 22000,
    "maxAcrossAll": 140000
  },
  "remote": { "count": 18 },
  "contractTypes": { "FULL_TIME": 120, "CONTRACTOR": 18, "PART_TIME": 4 },
  "industries": { "Information Technology": 110, "Financial Services": 22 },
  "searchUrls": [
    "https://www.cwjobs.co.uk/jobs/javascript/in-london",
    "https://www.cwjobs.co.uk/jobs/devops/in-london"
  ]
}
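Similar aggregates can be recomputed from the dataset rows, for example when merging several runs. The sketch below is an approximation under stated assumptions: the function `summarize` is hypothetical, locations are counted by `location.locality`, and `minAcrossAll` / `maxAcrossAll` are taken as the smallest `min` and largest `max`. The Actor's exact definitions are not documented here.

```python
from collections import Counter


def summarize(rows, top_n=10):
    """Rebuild a subset of the run summary from dataset rows (hypothetical helper)."""
    employers, localities, periods = Counter(), Counter(), Counter()
    mins, maxes = [], []

    for row in rows:
        name = (row.get("employer") or {}).get("name")
        if name:
            employers[name] += 1
        locality = (row.get("location") or {}).get("locality")
        if locality:
            localities[locality] += 1
        salary = row.get("salary") or {}
        if salary.get("period"):
            periods[salary["period"]] += 1
        if salary.get("min") is not None:
            mins.append(salary["min"])
        if salary.get("max") is not None:
            maxes.append(salary["max"])

    def top(counter):
        return [{"name": n, "count": c} for n, c in counter.most_common(top_n)]

    return {
        "totalJobs": len(rows),
        "uniqueEmployers": len(employers),
        "topCompanies": top(employers),
        "topLocations": top(localities),
        "salary": {
            "withMin": len(mins),
            "withMax": len(maxes),
            "byPeriod": dict(periods),
            # Caveat: these mix pay periods (annum, day, ...) if the rows do.
            "minAcrossAll": min(mins, default=None),
            "maxAcrossAll": max(maxes, default=None),
        },
    }
```

Because annual and daily rates end up in the same `min`/`max` lists, filter rows by `salary.period` first if you need comparable figures.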
How much does it cost to scrape CWJobs?
$1.50 per 1,000 results. Charged per dataset row emitted, not per HTTP request.
Residential GB proxy traffic is the largest cost driver. A typical 100-row run with includeDescription: true lands in the $0.20 to $0.40 range on top of the per-result fee. Set includeDescription: false to cut storage and egress by 5x for B2B lead workflows.
The Actor runs in standby mode, so it stays warm between runs at no extra cost beyond the per-result fee.
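As a rough budgeting aid, the arithmetic above can be written out as follows. The per-result price comes from this page. Scaling the quoted $0.20 to $0.40 proxy range linearly with row count is an assumption, and both function names are hypothetical.

```python
PRICE_PER_1000_ROWS_USD = 1.50

# Proxy range quoted above for a typical 100-row run with descriptions enabled.
PROXY_USD_PER_100_ROWS = (0.20, 0.40)


def result_fee_usd(rows):
    """Per-result fee: charged per dataset row emitted."""
    return rows * PRICE_PER_1000_ROWS_USD / 1000


def estimated_run_cost_usd(rows, proxy_range=PROXY_USD_PER_100_ROWS):
    """Return a (low, high) estimate of fee plus proxy traffic.

    Assumption: proxy cost grows linearly with the number of rows.
    """
    fee = result_fee_usd(rows)
    low, high = proxy_range
    scale = rows / 100
    return fee + low * scale, fee + high * scale


if __name__ == "__main__":
    low, high = estimated_run_cost_usd(100)
    print(f"100 rows: fee ${result_fee_usd(100):.2f}, total ${low:.2f} to ${high:.2f}")
```

For a 100-row run the per-result fee is $0.15, so under these assumptions the total lands between roughly $0.35 and $0.55.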
Tips and advanced options
- Bypass the URL builder. Paste any CWJobs URL into `startUrls` to take full control. Works with listing pages (`/jobs/.../in-.../`), direct detail URLs, or a mix.
- Cap results per search. Use `maxItems` to control run cost. The `recent` view is most useful when capped to the last 50 to 200 rows.
- Lean rows for B2B leads. Set `includeDescription: false` to drop the description HTML and shrink each row from ~10 KB to ~1 KB. The `leads` dataset view still has everything you need for CRM import.
- Freshness filter. Set `dateWithinDays: 7` to only emit jobs posted in the last week. Useful for hot-lead alerts.
- Scale up safely. With Apify Residential GB, concurrency up to `8` is comfortable. Above that, expect more 403s.
- Persistent IDs. The `jobId` is stable across rescrapes of the same detail URL. Use it as a primary key in your database to dedupe across runs.
- Schedule it. Run on a cron schedule (for example every 6 hours) and pipe the dataset to BigQuery, Postgres, or a webhook for live job alert feeds.
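The freshness and persistent-ID tips combine naturally on the consumer side of a scheduled run. The following is a minimal sketch with hypothetical helper names. It treats rows without a `datePosted` as stale, which is an assumption and not documented Actor behaviour.

```python
from datetime import date, timedelta


def is_fresh(row, within_days, today):
    """Mirror of the dateWithinDays rule: 0 disables the filter."""
    if within_days <= 0:
        return True
    posted = row.get("datePosted")
    if not posted:
        return False  # assumption: undated rows count as stale
    cutoff = today - timedelta(days=within_days)
    return date.fromisoformat(posted[:10]) >= cutoff


def take_new_rows(seen_ids, rows):
    """Return rows whose jobId has not been seen, updating seen_ids in place."""
    new_rows = []
    for row in rows:
        job_id = row["jobId"]
        if job_id in seen_ids:
            continue
        seen_ids.add(job_id)
        new_rows.append(row)
    return new_rows


if __name__ == "__main__":
    seen = {"232145678"}  # e.g. loaded from your database
    batch = [
        {"jobId": "232145678", "datePosted": "2026-06-05"},
        {"jobId": "232145999", "datePosted": "2026-06-04"},
    ]
    fresh = [r for r in batch if is_fresh(r, 7, date(2026, 6, 6))]
    print(take_new_rows(seen, fresh))
```

In a real pipeline `seen_ids` would be backed by a primary-key constraint in the database, not an in-memory set.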
FAQ
How does this scraper bypass the Akamai WAF?
All requests route through Apify Residential proxies with countryCode: "GB". CWJobs sits behind Akamai (subnet 23.192.0.0/11) and blocks datacenter IP ranges outright. The Actor also retries transient 403s with rotated proxy sessions. You can bump maxRequestRetries for noisier runs.
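The retry behaviour described above can be illustrated with a small, transport-agnostic sketch. Everything here is an assumption made for illustration: the Actor's real implementation is not shown on this page, `fetch` is a caller-supplied function returning `(status, body)`, and a random session ID stands in for a rotated proxy session.

```python
import uuid


def fetch_with_rotation(url, fetch, max_request_retries=3):
    """Retry 403 and 5xx responses, using a fresh session ID on every attempt.

    fetch(url, session_id) is a hypothetical callable returning (status, body).
    """
    last_status = None
    # One initial attempt plus max_request_retries retries.
    for _ in range(max_request_retries + 1):
        session_id = uuid.uuid4().hex[:12]  # new session, so a new exit IP
        status, body = fetch(url, session_id)
        if status == 200:
            return body
        if status != 403 and status < 500:
            raise RuntimeError(f"Non-retryable status {status} for {url}")
        last_status = status
    raise RuntimeError(f"Gave up on {url} after repeated status {last_status}")


if __name__ == "__main__":
    attempts = []

    def fake_fetch(url, session_id):
        # Simulated WAF: block the first two sessions, then let one through.
        attempts.append(session_id)
        return (200, "<html>ok</html>") if len(attempts) == 3 else (403, "")

    print(fetch_with_rotation("https://www.cwjobs.co.uk/jobs/javascript/in-london", fake_fetch))
```

Injecting `fetch` keeps the sketch independent of any particular HTTP client or proxy library.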
Do I need a login or API key?
No. All CWJobs vacancy listings are public and require no authentication or cookies. The detail pages render the full JobPosting JSON-LD block for any visitor.
Why is the description sometimes empty?
A small number of postings (typically under 2 percent) are routed through third-party ATS integrations that do not expose a full description on the CWJobs page. The detail URL is still valid, and the rest of the structured fields (title, employer, location, salary) are populated normally. If you do not need the description body at all, set includeDescription: false to omit the field from every row.
Can I scrape contracts only?
Yes. Set contractType: "contract" to restrict to contract roles. The Actor appends /contract/ to the path. The employmentType field on each row will reflect the contract type as posted.
Can I scrape remote roles only?
Yes. Toggle remoteOnly: true. The Actor appends /in-remote/ to the path and the resulting rows will have jobLocationType: TELECOMMUTE (or REMOTE) set. Mutually exclusive with the locations field, so leave locations empty when you enable this.
Why is the applyUrl field missing?
CWJobs renders the apply button as an XHR-loaded placeholder rather than a direct link, so the platform itself does not expose an applyUrl in the JSON-LD block. As a workaround, use the directApply field: jobs with directApply: true go through CWJobs's own apply flow, and jobs with directApply: false typically redirect to the employer's site. Neither competitor Actor on the Apify Store exposes an applyUrl field, so this is a platform gap, not a scraper gap.
How accurate is the salary extraction?
The Actor pulls baseSalary from the JSON-LD block first (the most reliable signal). For postings without a structured baseSalary (a minority), it falls back to regex matching on the rendered salary text, supporting £X - £Y per annum, £Xk - £Yk, and single-amount formats. All values are normalized to integers in GBP. The rawText field preserves the original posted string for auditing.
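The two-step strategy can be sketched as follows. This is an illustration, not the Actor's code: the regexes, the `unitText` to period mapping, and the choice to report a single amount as `min` with `max` left null are all assumptions. The `baseSalary.value` layout follows the schema.org `MonetaryAmount` / `QuantitativeValue` convention.

```python
import re

# Matches "£40,000", "£40k", "£ 350.50"; a trailing "k" multiplies by 1,000.
AMOUNT = re.compile(r"£\s*(\d[\d,]*(?:\.\d+)?)(k\b)?", re.IGNORECASE)
PERIOD = re.compile(r"per\s+(annum|hour|day|week|month)", re.IGNORECASE)
# Assumed mapping from JSON-LD unitText to the period values used in the dataset.
UNIT_TO_PERIOD = {"YEAR": "annum", "MONTH": "month", "WEEK": "week", "DAY": "day", "HOUR": "hour"}


def _to_int(value):
    return None if value is None else int(float(value))


def parse_salary_text(raw_text):
    """Regex fallback over the rendered salary string."""
    amounts = []
    for digits, k_suffix in AMOUNT.findall(raw_text):
        value = float(digits.replace(",", ""))
        amounts.append(int(value * 1000) if k_suffix else int(value))
    period = PERIOD.search(raw_text)
    return {
        "rawText": raw_text,
        "min": min(amounts) if amounts else None,
        # Assumption: a single amount fills min only.
        "max": max(amounts) if len(amounts) > 1 else None,
        "currency": "GBP",
        "period": period.group(1).lower() if period else None,
    }


def parse_salary(job_posting, raw_text):
    """Prefer the structured baseSalary block, then fall back to the regex."""
    base = job_posting.get("baseSalary")
    value = base.get("value") if isinstance(base, dict) else None
    if isinstance(value, dict) and (
        value.get("minValue") is not None or value.get("maxValue") is not None
    ):
        unit = str(value.get("unitText") or "").upper()
        return {
            "rawText": raw_text,
            "min": _to_int(value.get("minValue")),
            "max": _to_int(value.get("maxValue")),
            "currency": "GBP",
            "period": UNIT_TO_PERIOD.get(unit),
        }
    return parse_salary_text(raw_text or "")
```

Calling `parse_salary({}, "£40k - £65k")` exercises the fallback path and yields `min` 40000 and `max` 65000 with a null period, since that string names no pay period.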
Does this work on StepStone group sister sites (totaljobs, jobsite)?
Not out of the box. The Akamai firewall and HTML structure differ across totaljobs.com and jobsite.co.uk, and unrelated boards such as reed.co.uk differ again. Each site needs its own dedicated Actor. Use the search at the top of the Apify Store to find Actors for those sites.
Disclaimers and support
This Actor is an independent web scraping tool and is not affiliated with, endorsed by, or sponsored by CWJobs, cwjobs.co.uk, the StepStone Group, or any of their subsidiaries or affiliates. All trademarks are the property of their respective owners.
The scraper accesses only the public, unauthenticated job listings of the CWJobs website, matching data the platform serves to any public user. Users are responsible for ensuring compliance with CWJobs Terms of Service, the UK Computer Misuse Act, GDPR, and any other applicable data regulations.
If you encounter issues or have custom requirements, please submit a report on the Issues tab. For custom scraping or dataset services, contact the author via their profile.