# CWJobs Scraper: UK Tech Jobs, Salaries & Geo (`getascraper/cwjobs-scraper`) Actor

Scrape every UK tech / IT job on cwjobs.co.uk. Extract titles, employers with logos, full JobPosting JSON-LD, parsed salary bands (min/max/period), geo-coords (lat/lng), industries, and posting dates. Auto-paginate listings or paste direct detail URLs. $1.50 per 1,000 jobs.

- **URL**: https://apify.com/getascraper/cwjobs-scraper.md
- **Developed by:** [GetAScraper](https://apify.com/getascraper) (community)
- **Categories:** Jobs, Lead generation, Social media
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $1.50 / 1,000 jobs

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## CWJobs Scraper: UK Tech Jobs, Salaries & Geo

**Scrape every UK tech and IT job on cwjobs.co.uk, with full JobPosting JSON-LD, parsed salary bands, employer logos, and lat/lng geo-coords, on a schedule or on demand.** This Apify Actor pulls structured recruitment data from one of the UK's largest IT job boards, built on the StepStone group stack, and delivers clean JSON rows you can drop straight into Sheets, Airtable, HubSpot, BigQuery, or a CRM.

Built for recruiters sourcing UK tech talent, B2B sales teams building hiring-intent lead lists, salary benchmarking researchers, and AI/ML teams training job-classification models on real UK labour-market data.

---

### What does CWJobs Scraper do?

CWJobs Scraper is an HTTP-only Apify Actor that crawls the public, server-rendered HTML on cwjobs.co.uk, the UK IT recruitment board owned by the StepStone group (sister site of totaljobs.com and jobsite.co.uk). It auto-paginates listing pages, parses full detail pages, and emits one clean JSON row per job, with structured fields ready for downstream tooling.

- **Two-stage crawl.** Listings emit the search-result card fields and enqueue detail URLs. Detail pages pull the full `JobPosting` JSON-LD block, including description HTML, salary range, employer, location with lat/lng, and applicant location requirements.
- **URL builder from keywords and locations.** Pass `javascript` + `london` and the Actor builds `/jobs/javascript/in-london/` automatically. No URL crafting required.
- **Structured salary extraction.** The Actor parses the `baseSalary` block from JSON-LD first (handles `minValue`, `maxValue`, `unitText`), then falls back to a multi-pattern regex over the rendered salary text (`£40,000 - £65,000 per annum`, `£40k - £65k`, single amounts). Currency is always GBP.
- **Five pre-filtered dataset views.** Overview, B2B Leads, Salary Benchmark, Remote / Hybrid, and Newest Postings. Each view is curated for a different downstream workflow.
- **Run summary in the Key-Value Store.** Total jobs, unique employers, top companies by job count, top locations, salary distribution, and the exact search URLs used, all written to `run-summary`.
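
The URL builder described in the list above can be sketched as follows. This is a minimal illustration, not the Actor's actual code: `build_search_url` and `BASE_URL` are hypothetical names, and the slug normalisation (lowercasing, spaces to hyphens) is an assumption.

```python
from typing import Optional

# Assumed base URL, taken from the example listing URLs in this README.
BASE_URL = "https://www.cwjobs.co.uk"


def build_search_url(keyword: Optional[str] = None, location: Optional[str] = None) -> str:
    """Join optional keyword and location slugs into a listing URL (hypothetical helper)."""
    parts = ["jobs"]
    if keyword:
        # Assumption: slugs are lowercase with hyphens instead of spaces.
        parts.append(keyword.strip().lower().replace(" ", "-"))
    if location:
        parts.append("in-" + location.strip().lower().replace(" ", "-"))
    return BASE_URL + "/" + "/".join(parts)


print(build_search_url("javascript", "london"))
```

Passing only a location yields a `/jobs/in-<location>` path, and passing only a keyword yields `/jobs/<keyword>`, which matches the path shapes described in the Input section.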

---

### Why use CWJobs Scraper?

CWJobs indexes over 30,000 UK IT and tech vacancies at any given time. It is the primary feed for many UK tech recruiters because of the StepStone group's strong SEO presence and direct employer postings. Pulling this data reliably is non-trivial because of the Akamai Web Application Firewall in front of the site.

This Actor handles that for you:

1. **Bypass Akamai WAF reliably.** All requests route through Apify Residential proxies with a forced UK geolocation. Datacenter IPs get blocked at the edge. The Actor retries transient 403s with rotated sessions.
2. **Skip the browser overhead.** Because CWJobs is server-rendered HTML with a JSON-LD block on every detail page, there is zero need for Playwright or Puppeteer. Crawls run 10x faster and 10x cheaper than equivalent browser-based scrapers.
3. **Drop-in for downstream tools.** The Output tab exposes seven ready-to-use links: full dataset (JSON/CSV/Excel), the run summary record, and five filtered dataset views.
4. **Built-in salary benchmarking.** Structured `salary: { min, max, currency, period }` on every row makes the dataset directly usable for compensation research, market-rate dashboards, or RAG chatbots.
5. **Schedule or run on demand.** Run once for a snapshot, schedule hourly for live job alerts, or trigger via the Apify API for webhook-driven workflows.

---

### How to scrape CWJobs data

1. Open the **Input** tab in Apify Console.
2. Add one or more **Keywords** (for example `javascript`, `python-developer`, `devops`, `data-scientist`, `cyber-security`).
3. Add one or more **Locations** (for example `london`, `manchester`, `edinburgh`, `bristol`, `birmingham`, `remote`).
4. (Optional) Set **Contract Type** to `permanent`, `contract`, `temporary`, or `part-time`. Leave on `all` for the full set.
5. (Optional) Toggle **Remote only** to restrict to fully remote roles. This is mutually exclusive with the `locations` field.
6. (Optional) Cap the run with **Max Jobs per Search**. Default is 100 per search URL.
7. (Optional) Toggle **Include full description** off to emit leaner B2B lead rows (~1 KB each) without the description HTML.
8. Click **Start**. The Actor emits one row per job and writes a summary record.

For advanced users, paste raw CWJobs listing URLs (for example `https://www.cwjobs.co.uk/jobs/javascript/in-london`) or detail URLs into **Start URLs** to bypass the URL builder entirely.

---

### Input

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| `startUrls` | array | Optional. Paste CWJobs listing or detail URLs. If provided, takes precedence over the keyword and location filters. | `[]` |
| `keywords` | array | Job titles, skills, or categories. Each becomes a `/jobs/<keyword>/` path. Examples: `javascript`, `python-developer`, `devops`, `data-scientist`, `cyber-security`, `software-development`. Leave empty to skip keyword filter. | `["javascript", "devops"]` |
| `locations` | array | UK cities or regions. Each becomes a `/jobs/in-<location>/` path. Known slugs: `london`, `manchester`, `edinburgh`, `bristol`, `birmingham`, `glasgow`, `leeds`, `liverpool`, `newcastle`, `nottingham`, `sheffield`, `southampton`, `cambridge`, `oxford`, `brighton`, `reading`, `cardiff`, `belfast`, `aberdeen`, `remote`, `central-london`, `city-of-london`. Leave empty for all UK. | `["london"]` |
| `contractType` | string | Filter by employment contract type. Omit for both permanent and contract. | `"all"` |
| `remoteOnly` | boolean | If true, restrict to fully remote roles (`jobLocationType=TELECOMMUTE`). Adds the `/in-remote/` path. Mutually exclusive with the `locations` filter. | `false` |
| `maxItems` | integer | Maximum job rows to emit per search URL. Direct detail URLs always emit 1 row. | `100` |
| `includeDescription` | boolean | If true (default), emit the full HTML job description (~5 to 10 KB per row). Set false for lean B2B lead rows (~1 KB each, 5x cheaper to store). | `true` |
| `dateWithinDays` | integer | Optional client-side filter. Only emit jobs whose `datePosted` is within the last N days. `0` = no filter. Useful for freshness-focused job alerts. | `0` |
| `maxConcurrency` | integer | Parallel HTTP requests for detail-page fetches. CWJobs is open with residential GB; 4 to 8 is comfortable. | `4` |
| `maxRequestRetries` | integer | Per-URL retry budget on transient errors and 5xx/403. Each retry rotates the proxy session. | `3` |
| `proxyConfiguration` | object | Apify Residential GB is recommended. Datacenter IPs are blocked by Akamai WAF. | `RESIDENTIAL` + `GB` |
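
Before starting a run, it can help to check an input object against the constraints in the table above. The sketch below is a hypothetical client-side helper (`validate_input` is not part of the Actor or the Apify client); the allowed `contractType` values are taken from the step-by-step guide earlier in this README.

```python
# Contract types listed in "How to scrape CWJobs data".
ALLOWED_CONTRACT_TYPES = {"all", "permanent", "contract", "temporary", "part-time"}


def validate_input(actor_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means the input looks fine."""
    problems = []
    # remoteOnly is documented as mutually exclusive with locations.
    if actor_input.get("remoteOnly") and actor_input.get("locations"):
        problems.append("remoteOnly is mutually exclusive with locations")
    contract_type = actor_input.get("contractType", "all")
    if contract_type not in ALLOWED_CONTRACT_TYPES:
        problems.append("unknown contractType: " + str(contract_type))
    # dateWithinDays uses 0 for "no filter", so negative values make no sense.
    days = actor_input.get("dateWithinDays", 0)
    if not isinstance(days, int) or days < 0:
        problems.append("dateWithinDays must be a non-negative integer")
    return problems


print(validate_input({"remoteOnly": True, "locations": ["london"]}))
```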

---

### Output example

Each dataset item represents a single job vacancy. The `description` field is the full HTML body from the `JobPosting` JSON-LD block. Set `includeDescription: false` to omit it.

```json
{
    "rowType": "job",
    "listingUrl": "https://www.cwjobs.co.uk/jobs/javascript/in-london",
    "jobId": "232145678",
    "jobUrl": "https://www.cwjobs.co.uk/job/senior-javascript-developer/acme-tech-job232145678",
    "title": "Senior JavaScript Developer",
    "description": "<p>Acme Tech is hiring a Senior JavaScript Developer to join our London team...</p>",
    "datePosted": "2026-06-05",
    "validThrough": "2026-07-05",
    "employmentType": "FULL_TIME",
    "industry": "Information Technology",
    "directApply": true,
    "jobLocationType": null,
    "applicantLocationRequirements": [],
    "employer": {
        "name": "Acme Tech Ltd",
        "url": "https://www.cwjobs.co.uk/companies/acme-tech",
        "logoUrl": "https://www.cwjobs.co.uk/logos/acme-tech-200x200.png"
    },
    "location": {
        "text": "London, City of London",
        "locality": "London",
        "region": "Greater London",
        "postalCode": "EC2N 4AY",
        "country": "GB",
        "lat": 51.5155,
        "lng": -0.0922
    },
    "salary": {
        "rawText": "£40,000 - £65,000 per annum",
        "min": 40000,
        "max": 65000,
        "currency": "GBP",
        "period": "annum"
    },
    "applyType": "internal",
    "scrapedAt": "2026-06-06T12:00:00.000Z"
}
````

***

### Data fields

| Field name | Format | Description |
| --- | --- | --- |
| `rowType` | text | Always `"job"`. Useful for mixing job rows with summary records. |
| `jobId` | text | Unique CWJobs job ID parsed from the detail URL. |
| `jobUrl` | link | Direct canonical URL of the vacancy. |
| `title` | text | Job title as posted by the employer. |
| `description` | text | Full HTML job description from the JSON-LD block. Omit by setting `includeDescription: false`. |
| `datePosted` | date | ISO 8601 posting date. |
| `validThrough` | date | ISO 8601 expiry date, or `null` if not specified. |
| `employmentType` | text | One of `FULL_TIME`, `PART_TIME`, `CONTRACTOR`, `INTERN`, `TEMPORARY`, or `null`. |
| `industry` | text | Industry classification (for example `Information Technology`, `Financial Services`). |
| `directApply` | boolean | `true` if the posting supports direct apply through CWJobs. |
| `jobLocationType` | text | One of `TELECOMMUTE` (fully remote), `REMOTE` (remote with constraints), or `null` for on-site/hybrid. |
| `applicantLocationRequirements` | array | List of eligible countries for remote roles (for example `[{ "type": "Country", "name": "United Kingdom" }]`). Empty for on-site. |
| `employer` | object | `{ name, url, logoUrl }`. `logoUrl` may be `null` for employers that did not upload a logo. |
| `location` | object | Structured address with `{ text, locality, region, postalCode, country, lat, lng }`. Lat and lng are decimal degrees from the JSON-LD `geo` block. |
| `salary` | object | Parsed `{ rawText, min, max, currency, period }`. Currency is always `GBP`. Period is one of `annum`, `hour`, `day`, `week`, `month`, or `null`. |
| `applyType` | text | One of `internal` (apply through CWJobs), `external` (redirect to employer site), or `unknown`. |
| `listingUrl` | link | The search-results URL the job was discovered on. Useful for attribution. |
| `scrapedAt` | date | ISO 8601 timestamp of when the row was emitted. |
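
Because `employer`, `location`, and `salary` are nested objects, spreadsheet or CRM imports often need a flat row. A minimal sketch of one way to flatten them, assuming the field names from the table above; `flatten_row` and the flat column names are hypothetical choices.

```python
def flatten_row(row: dict) -> dict:
    """Flatten the nested employer, location, and salary objects into top-level columns."""
    # Nested objects may be missing or null, so fall back to empty dicts.
    employer = row.get("employer") or {}
    location = row.get("location") or {}
    salary = row.get("salary") or {}
    return {
        "jobId": row.get("jobId"),
        "title": row.get("title"),
        "jobUrl": row.get("jobUrl"),
        "datePosted": row.get("datePosted"),
        "employerName": employer.get("name"),
        "locality": location.get("locality"),
        "postalCode": location.get("postalCode"),
        "lat": location.get("lat"),
        "lng": location.get("lng"),
        "salaryMin": salary.get("min"),
        "salaryMax": salary.get("max"),
        "salaryPeriod": salary.get("period"),
    }


example = {
    "jobId": "232145678",
    "title": "Senior JavaScript Developer",
    "employer": {"name": "Acme Tech Ltd"},
    "location": {"locality": "London", "lat": 51.5155, "lng": -0.0922},
    "salary": {"min": 40000, "max": 65000, "period": "annum"},
}
print(flatten_row(example))
```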

***

### Dataset views

The Output tab exposes five curated views on the same underlying dataset. Pick the view that matches your workflow.

| View | Use case | Key fields |
| --- | --- | --- |
| **Job Overview** | Full analysis. All 16 fields per row. | title, jobId, jobUrl, datePosted, validThrough, employmentType, industry, directApply, jobLocationType, salary, employer, location, applyType, description, listingUrl, scrapedAt |
| **B2B Leads** | CRM import into HubSpot, Salesforce, or Airtable. | title, employer, jobUrl, location, applyType, datePosted |
| **Salary Benchmark** | Compensation research and market-rate dashboards. | title, employer, location, salary, employmentType, datePosted |
| **Remote / Hybrid** | Distributed-work job searches. Filters on `jobLocationType` and `applicantLocationRequirements`. | title, employer, location, salary, jobLocationType, applicantLocationRequirements, jobUrl, datePosted |
| **Newest Postings** | Fresh job alerts. Sorted by `datePosted` descending. | title, employer, location, salary, jobUrl, datePosted |

Switch views from the Output tab. The underlying data is the same.
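
If you work from the full dataset rather than a view, the same projections can be reproduced locally. The sketch below covers three of the views, using the field lists from the table above; the short view keys and the `project` helper are hypothetical, not identifiers used by the Actor.

```python
# Field lists copied from the Dataset views table; the dictionary keys are illustrative.
VIEW_FIELDS = {
    "leads": ["title", "employer", "jobUrl", "location", "applyType", "datePosted"],
    "salary": ["title", "employer", "location", "salary", "employmentType", "datePosted"],
    "recent": ["title", "employer", "location", "salary", "jobUrl", "datePosted"],
}


def project(rows: list, view: str) -> list:
    """Keep only the fields of the chosen view; sort newest first for 'recent'."""
    fields = VIEW_FIELDS[view]
    projected = [{field: row.get(field) for field in fields} for row in rows]
    if view == "recent":
        # ISO 8601 dates sort correctly as plain strings.
        projected.sort(key=lambda r: r.get("datePosted") or "", reverse=True)
    return projected


rows = [
    {"title": "DevOps Engineer", "datePosted": "2026-06-01", "jobId": "1"},
    {"title": "Senior JavaScript Developer", "datePosted": "2026-06-05", "jobId": "2"},
]
print(project(rows, "recent")[0]["title"])
```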

***

### Run summary (Key-Value Store)

Every run writes a single `run-summary` record to the default Key-Value Store. The summary contains aggregate stats useful for at-a-glance reports, so there is no need to download the full dataset.

```json
{
    "totalJobs": 142,
    "uniqueEmployers": 87,
    "topCompanies": [
        { "name": "Acme Tech Ltd", "count": 4 },
        { "name": "Globex Corporation", "count": 3 }
    ],
    "topLocations": [
        { "name": "London", "count": 98 },
        { "name": "Manchester", "count": 22 }
    ],
    "salary": {
        "withMin": 89,
        "withMax": 78,
        "byPeriod": { "annum": 87, "day": 2 },
        "minAcrossAll": 22000,
        "maxAcrossAll": 140000
    },
    "remote": { "count": 18 },
    "contractTypes": { "FULL_TIME": 120, "CONTRACTOR": 18, "PART_TIME": 4 },
    "industries": { "Information Technology": 110, "Financial Services": 22 },
    "searchUrls": [
        "https://www.cwjobs.co.uk/jobs/javascript/in-london",
        "https://www.cwjobs.co.uk/jobs/devops/in-london"
    ]
}
```
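
To sanity-check a summary, or to build the same aggregates for a filtered subset of rows, comparable numbers can be derived from the dataset itself. This is a rough sketch of such an aggregation, not the Actor's own implementation; `summarize` is a hypothetical helper and covers only part of the summary shape shown above.

```python
from collections import Counter


def summarize(rows: list, top_n: int = 5) -> dict:
    """Aggregate employer, location, and salary stats from job rows."""
    employers = Counter()
    localities = Counter()
    mins, maxes = [], []
    for row in rows:
        name = (row.get("employer") or {}).get("name")
        if name:
            employers[name] += 1
        locality = (row.get("location") or {}).get("locality")
        if locality:
            localities[locality] += 1
        salary = row.get("salary") or {}
        if salary.get("min") is not None:
            mins.append(salary["min"])
        if salary.get("max") is not None:
            maxes.append(salary["max"])
    return {
        "totalJobs": len(rows),
        "uniqueEmployers": len(employers),
        "topCompanies": [{"name": n, "count": c} for n, c in employers.most_common(top_n)],
        "topLocations": [{"name": n, "count": c} for n, c in localities.most_common(top_n)],
        "salary": {
            "withMin": len(mins),
            "withMax": len(maxes),
            "minAcrossAll": min(mins) if mins else None,
            "maxAcrossAll": max(maxes) if maxes else None,
        },
    }


sample_rows = [
    {"employer": {"name": "Acme Tech Ltd"}, "location": {"locality": "London"},
     "salary": {"min": 40000, "max": 65000}},
    {"employer": {"name": "Acme Tech Ltd"}, "location": {"locality": "London"},
     "salary": {"min": 22000, "max": None}},
    {"employer": {"name": "Globex Corporation"}, "location": {"locality": "Manchester"},
     "salary": None},
]
print(summarize(sample_rows)["topCompanies"])
```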

***

### How much does it cost to scrape CWJobs?

**$1.50 per 1,000 results.** Charged per dataset row emitted, not per HTTP request.

Residential GB proxy traffic is the largest cost driver. A typical 100-row run with `includeDescription: true` lands in the $0.20 to $0.40 range on top of the per-result fee. Set `includeDescription: false` to cut storage and egress by 5x for B2B lead workflows.

The Actor runs in standby mode, so it stays warm between runs at no extra cost beyond the per-result fee.
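
The per-result fee is simple arithmetic, shown here as a small helper for budgeting. It covers only the $1.50 per 1,000 results fee stated above and excludes any proxy traffic; `result_fee_usd` is a hypothetical name.

```python
PRICE_PER_1000_RESULTS_USD = 1.50  # per-result price stated in this README


def result_fee_usd(rows: int) -> float:
    """Per-result fee for a run that emits `rows` dataset rows."""
    return rows / 1000 * PRICE_PER_1000_RESULTS_USD


for rows in (100, 1000, 10000):
    print(rows, round(result_fee_usd(rows), 2))
```

For the 142-row run in the summary example, the fee comes to about $0.21.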

***

### Tips and advanced options

- **Bypass the URL builder.** Paste any CWJobs URL into `startUrls` to take full control. Works with listing pages (`/jobs/.../in-.../`), direct detail URLs, or a mix.
- **Cap results per search.** Use `maxItems` to control run cost. The `recent` view is most useful when capped to the last 50 to 200 rows.
- **Lean rows for B2B leads.** Set `includeDescription: false` to drop the description HTML and shrink each row from ~10 KB to ~1 KB. The `leads` dataset view still has everything you need for CRM import.
- **Freshness filter.** Set `dateWithinDays: 7` to only emit jobs posted in the last week. Useful for hot-lead alerts.
- **Scale up safely.** With Apify Residential GB, concurrency up to `8` is comfortable. Above that, expect more 403s.
- **Persistent IDs.** The `jobId` is stable across rescrapes of the same detail URL. Use it as a primary key in your database to dedupe across runs.
- **Schedule it.** Run on a cron schedule (for example every 6 hours) and pipe the dataset to BigQuery, Postgres, or a webhook for live job alert feeds.
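
The persistent-ID tip above can be sketched as a small merge step between scheduled runs. This assumes rows are plain dicts with a `jobId` key, as in the output example; `merge_new_rows` and the in-memory `known` store are hypothetical stand-ins for your own database logic.

```python
def merge_new_rows(known: dict, new_rows: list) -> list:
    """Record unseen jobs in `known` (keyed by jobId) and return only the new ones."""
    fresh = []
    for row in new_rows:
        job_id = row.get("jobId")
        # Skip rows without an ID and rows already stored from an earlier run.
        if job_id is None or job_id in known:
            continue
        known[job_id] = row
        fresh.append(row)
    return fresh


known_jobs = {}
first_run = [{"jobId": "232145678", "title": "Senior JavaScript Developer"}]
second_run = [
    {"jobId": "232145678", "title": "Senior JavaScript Developer"},
    {"jobId": "232145999", "title": "DevOps Engineer"},
]
merge_new_rows(known_jobs, first_run)
print(merge_new_rows(known_jobs, second_run))
```

Only the rows returned by the second call would need to be pushed to an alert feed or inserted into a database table.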

***

### FAQ

#### How does this scraper bypass the Akamai WAF?

All requests route through Apify Residential proxies with `countryCode: "GB"`. CWJobs sits behind Akamai (subnet `23.192.0.0/11`) and blocks datacenter IP ranges outright. The Actor also retries transient 403s with rotated proxy sessions. You can bump `maxRequestRetries` for noisier runs.

#### Do I need a login or API key?

No. All CWJobs vacancy listings are public and require no authentication or cookies. The detail pages render the full `JobPosting` JSON-LD block for any visitor.
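
For readers curious how a `JobPosting` JSON-LD block is typically read from server-rendered HTML, here is a standard-library sketch. It illustrates the general technique only and makes no claim about how the Actor is implemented; `extract_job_posting` and the sample HTML are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class _JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_job_posting(html: str) -> Optional[dict]:
    """Return the first JSON-LD object whose @type is JobPosting, or None."""
    parser = _JsonLdCollector()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore malformed blocks
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "JobPosting":
                return item
    return None


sample_html = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "JobPosting", "title": "Senior JavaScript Developer"}'
    "</script></head><body></body></html>"
)
print(extract_job_posting(sample_html))
```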

#### Why is the description sometimes empty?

A small number of postings (typically under 2 percent) are routed through third-party ATS integrations that do not expose a full description on the CWJobs page. The detail URL is still valid, and the rest of the structured fields (title, employer, location, salary) are populated normally. If you do not need the description body, set `includeDescription: false` to omit the `description` field from every row.

#### Can I scrape contracts only?

Yes. Set `contractType: "contract"` to restrict to contract roles. The Actor appends `/contract/` to the path. The `employmentType` field on each row will reflect the contract type as posted.

#### Can I scrape remote roles only?

Yes. Toggle `remoteOnly: true`. The Actor appends `/in-remote/` to the path and the resulting rows will have `jobLocationType: TELECOMMUTE` (or `REMOTE`) set. Mutually exclusive with the `locations` field, so leave `locations` empty when you enable this.

#### Why is the `applyUrl` field missing?

CWJobs renders the apply button as an XHR-loaded placeholder rather than a direct link, so the platform itself does not expose an `applyUrl` in the JSON-LD block. As a workaround, use the `directApply` field: jobs with `directApply: true` go through CWJobs's own apply flow, and jobs with `directApply: false` typically redirect to the employer's site. Neither competitor Actor on the Apify Store exposes an `applyUrl` field, so this is a platform gap, not a scraper gap.

#### How accurate is the salary extraction?

The Actor pulls `baseSalary` from the JSON-LD block first (the most reliable signal). For postings without a structured `baseSalary` (a minority), it falls back to regex matching on the rendered salary text, supporting `£X - £Y per annum`, `£Xk - £Yk`, and single-amount formats. All values are normalized to integers in GBP. The `rawText` field preserves the original posted string for auditing.
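
The regex fallback can be illustrated with a simplified parser for the three formats named above. The patterns below are assumptions written for this example, not the Actor's actual expressions, and the choice to leave `max` as `None` for single amounts is likewise an assumption; `parse_salary_text` is a hypothetical name.

```python
import re

# A pound amount with optional thousands separators and an optional "k" suffix.
_AMOUNT = re.compile(r"£\s*(\d[\d,]*(?:\.\d+)?)\s*(k\b)?", re.IGNORECASE)
# A pay period such as "per annum" or "per day".
_PERIOD = re.compile(r"\bper\s+(annum|year|hour|day|week|month)\b", re.IGNORECASE)


def parse_salary_text(raw_text: str) -> dict:
    """Parse rendered salary text into the { rawText, min, max, currency, period } shape."""
    amounts = []
    for digits, k_suffix in _AMOUNT.findall(raw_text):
        value = float(digits.replace(",", ""))
        if k_suffix:
            value *= 1000  # "£40k" means 40,000
        amounts.append(int(round(value)))
    period = None
    match = _PERIOD.search(raw_text)
    if match:
        period = match.group(1).lower()
        if period == "year":
            period = "annum"  # normalise to the documented period values
    return {
        "rawText": raw_text,
        "min": amounts[0] if amounts else None,
        # Assumption: a single amount fills min only.
        "max": amounts[1] if len(amounts) > 1 else None,
        "currency": "GBP",
        "period": period,
    }


for text in ("£40,000 - £65,000 per annum", "£40k - £65k", "£500 per day"):
    print(parse_salary_text(text))
```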

#### Does this work on StepStone group sister sites (totaljobs, jobsite)?

Not out of the box. The Akamai firewall configuration and HTML structure differ across `totaljobs.com` and `jobsite.co.uk`, so each sister site needs its own dedicated Actor. The same applies to unrelated job boards such as `reed.co.uk`. Use the search at the top of the Apify Store to find Actors for those sites.

***

### Disclaimers and support

This Actor is an independent web scraping tool and is not affiliated with, endorsed by, or sponsored by CWJobs, cwjobs.co.uk, the StepStone Group, or any of their subsidiaries or affiliates. All trademarks are the property of their respective owners.

The scraper accesses only the public, unauthenticated job listings of the CWJobs website, matching data the platform serves to any public user. Users are responsible for ensuring compliance with CWJobs Terms of Service, the UK Computer Misuse Act, GDPR, and any other applicable data regulations.

If you encounter issues or have custom requirements, please submit a report on the **Issues** tab. For custom scraping or dataset services, contact the author via their profile.

# Actor input Schema

## `startUrls` (type: `array`):

Optional. Paste CWJobs listing URLs (e.g. https://www.cwjobs.co.uk/jobs/javascript/in-london) or detail URLs. If provided, takes precedence over the keyword/location filters below.

## `keywords` (type: `array`):

Job titles, skills, or categories. Each becomes a /jobs/<keyword>/ path. Examples: javascript, python-developer, devops, data-scientist, cyber-security, software-development. Leave empty to skip keyword filter.

## `locations` (type: `array`):

UK cities or regions. Each becomes a /jobs/in-<location>/ path. Known slugs: london, manchester, edinburgh, bristol, birmingham, glasgow, leeds, liverpool, newcastle, nottingham, sheffield, southampton, cambridge, oxford, brighton, reading, cardiff, belfast, aberdeen, remote, central-london, city-of-london. Leave empty for all UK.

## `contractType` (type: `string`):

Filter by employment contract type. Omit for both permanent and contract.

## `remoteOnly` (type: `boolean`):

If true, restrict to fully remote roles (jobLocationType=TELECOMMUTE). Adds the /in-remote/ path. Mutually exclusive with the 'locations' filter.

## `maxItems` (type: `integer`):

Maximum job rows to emit per search URL. Direct detail URLs always emit 1 row. Default 100.

## `includeDescription` (type: `boolean`):

If true (default), emit the full HTML job description (~5-10 KB per row). Set false for lean B2B lead rows (~1 KB each, 5x cheaper to store).

## `dateWithinDays` (type: `integer`):

Optional client-side filter. Only emit jobs whose datePosted is within the last N days. 0 = no filter. Useful for freshness-focused job alerts.
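
A client-side freshness check of this kind might look like the sketch below. It assumes `datePosted` is an ISO 8601 date string as in the output example and that rows without a date are dropped when the filter is active; `is_within_days` is a hypothetical helper, not the Actor's code.

```python
from datetime import date, timedelta
from typing import Optional


def is_within_days(date_posted: Optional[str], days: int, today: Optional[date] = None) -> bool:
    """Return True if the job was posted within the last `days` days (0 disables the filter)."""
    if days <= 0:
        return True  # 0 = no filter
    if not date_posted:
        return False  # assumption: undated rows are excluded when filtering
    today = today or date.today()
    # Keep only the YYYY-MM-DD part in case a full timestamp is supplied.
    posted = date.fromisoformat(date_posted[:10])
    return posted >= today - timedelta(days=days)


reference_day = date(2026, 6, 6)
print(is_within_days("2026-06-05", 7, today=reference_day))
print(is_within_days("2026-05-01", 7, today=reference_day))
```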

## `maxConcurrency` (type: `integer`):

Parallel HTTP requests for detail-page fetches. CWJobs is open with residential GB; 4-8 is comfortable. Default 4.

## `maxRequestRetries` (type: `integer`):

Per-URL retry budget on transient errors and 5xx/403. Each retry rotates the proxy session. Default 3.

## `proxyConfiguration` (type: `object`):

Apify Residential GB is recommended. Datacenter IPs are blocked by Akamai WAF.

## Actor input object example

```json
{
  "startUrls": [],
  "keywords": [
    "javascript",
    "devops"
  ],
  "locations": [
    "london"
  ],
  "contractType": "all",
  "remoteOnly": false,
  "maxItems": 100,
  "includeDescription": true,
  "dateWithinDays": 0,
  "maxConcurrency": 4,
  "maxRequestRetries": 3,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "GB"
  }
}
```

# Actor output Schema

## `results` (type: `string`):

One row per job. Download as JSON, CSV, or Excel from the Storage tab.

## `summary` (type: `string`):

Aggregate stats: total jobs, unique employers, salary range, top companies, top locations, search URLs used.

## `datasetOverview` (type: `string`):

Pre-filtered view with all 14 JobPosting fields plus salary and location. Best for full analysis.

## `datasetLeads` (type: `string`):

Compact view optimized for CRM import (HubSpot, Salesforce, Airtable).

## `datasetSalary` (type: `string`):

View focused on title, employer, location, and salary (min/max/period). Use for market-rate research.

## `datasetRemote` (type: `string`):

View of jobs with explicit remote / hybrid location types. Use for distributed-work job searches.

## `datasetRecent` (type: `string`):

All jobs sorted by date posted (newest first). Use for fresh job alerts.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [],
    "keywords": [
        "javascript",
        "devops"
    ],
    "locations": [
        "london"
    ],
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ],
        "apifyProxyCountry": "GB"
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("getascraper/cwjobs-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [],
    "keywords": [
        "javascript",
        "devops",
    ],
    "locations": ["london"],
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
        "apifyProxyCountry": "GB",
    },
}

# Run the Actor and wait for it to finish
run = client.actor("getascraper/cwjobs-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [],
  "keywords": [
    "javascript",
    "devops"
  ],
  "locations": [
    "london"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "GB"
  }
}' |
apify call getascraper/cwjobs-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=getascraper/cwjobs-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CWJobs Scraper: UK Tech Jobs, Salaries & Geo",
        "description": "Scrape every UK tech / IT job on cwjobs.co.uk. Extract titles, employers with logos, full JobPosting JSON-LD, parsed salary bands (min/max/period), geo-coords (lat/lng), industries, and posting dates. Auto-paginate listings or paste direct detail URLs. $1.50 per 1,000 jobs.",
        "version": "0.2",
        "x-build-id": "sk1DF8OuMBql5afPq"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/getascraper~cwjobs-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-getascraper-cwjobs-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/getascraper~cwjobs-scraper/runs": {
            "post": {
                "operationId": "runs-sync-getascraper-cwjobs-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/getascraper~cwjobs-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-getascraper-cwjobs-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrls": {
                        "title": "Start URLs (advanced)",
                        "type": "array",
                        "description": "Optional. Paste CWJobs listing URLs (e.g. https://www.cwjobs.co.uk/jobs/javascript/in-london) or detail URLs. If provided, takes precedence over the keyword/location filters below.",
                        "default": [],
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "keywords": {
                        "title": "Keywords",
                        "type": "array",
                        "description": "Job titles, skills, or categories. Each becomes a /jobs/<keyword>/ path. Examples: javascript, python-developer, devops, data-scientist, cyber-security, software-development. Leave empty to skip keyword filter.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "locations": {
                        "title": "Locations",
                        "type": "array",
                        "description": "UK cities or regions. Each becomes a /jobs/in-<location>/ path. Known slugs: london, manchester, edinburgh, bristol, birmingham, glasgow, leeds, liverpool, newcastle, nottingham, sheffield, southampton, cambridge, oxford, brighton, reading, cardiff, belfast, aberdeen, remote, central-london, city-of-london. Leave empty for all UK.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "contractType": {
                        "title": "Contract Type",
                        "enum": [
                            "all",
                            "permanent",
                            "contract",
                            "temporary",
                            "part-time"
                        ],
                        "type": "string",
                        "description": "Filter by employment contract type. Omit or set to 'all' to include every contract type.",
                        "default": "all"
                    },
                    "remoteOnly": {
                        "title": "Remote only",
                        "type": "boolean",
                        "description": "If true, restrict to fully remote roles (jobLocationType=TELECOMMUTE). Adds the /in-remote/ path. Mutually exclusive with the 'locations' filter.",
                        "default": false
                    },
                    "maxItems": {
                        "title": "Max Jobs per Search",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum job rows to emit per search URL. Direct detail URLs always emit 1 row. Default 100.",
                        "default": 100
                    },
                    "includeDescription": {
                        "title": "Include full description",
                        "type": "boolean",
                        "description": "If true (default), emit the full HTML job description (~5-10 KB per row). Set false for lean B2B lead rows (~1 KB each, 5x cheaper to store).",
                        "default": true
                    },
                    "dateWithinDays": {
                        "title": "Posted within last N days",
                        "minimum": 0,
                        "maximum": 365,
                        "type": "integer",
                        "description": "Optional client-side filter. Only emit jobs whose datePosted is within the last N days. 0 = no filter. Useful for freshness-focused job alerts.",
                        "default": 0
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrency",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Parallel HTTP requests for detail-page fetches. CWJobs is reachable through residential GB proxies; 4-8 is a comfortable range. Default 4.",
                        "default": 4
                    },
                    "maxRequestRetries": {
                        "title": "Max Request Retries",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Per-URL retry budget on transient errors and 5xx/403. Each retry rotates the proxy session. Default 3.",
                        "default": 3
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Apify Residential GB is recommended. Datacenter IPs are blocked by Akamai WAF.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ],
                            "apifyProxyCountry": "GB"
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
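
The `run-sync` path and the `inputSchema` limits in the definition above can be exercised from plain Python. The following is a minimal sketch, not an official client: `build_input` and `build_request` are hypothetical helpers, the base URL `https://api.apify.com/v2` is assumed to be the server this definition targets, and the local checks only mirror the bounds, the `contractType` enum, and the `remoteOnly`/`locations` exclusivity stated in the schema descriptions.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Apify API base URL; the path is taken from the definition above.
API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/getascraper~cwjobs-scraper/run-sync"

# Integer bounds copied from inputSchema as (minimum, maximum).
INT_BOUNDS = {
    "maxItems": (1, 10000),
    "dateWithinDays": (0, 365),
    "maxConcurrency": (1, 20),
    "maxRequestRetries": (0, 10),
}
# Allowed values copied from the contractType enum.
CONTRACT_TYPES = {"all", "permanent", "contract", "temporary", "part-time"}


def build_input(**fields):
    """Hypothetical helper: check a few fields locally and return the input dict."""
    for name, (low, high) in INT_BOUNDS.items():
        if name in fields and not low <= fields[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if fields.get("contractType", "all") not in CONTRACT_TYPES:
        allowed = ", ".join(sorted(CONTRACT_TYPES))
        raise ValueError(f"contractType must be one of: {allowed}")
    # The schema describes remoteOnly as mutually exclusive with locations.
    if fields.get("remoteOnly") and fields.get("locations"):
        raise ValueError("remoteOnly is mutually exclusive with locations")
    return dict(fields)


def build_request(token, actor_input):
    """Hypothetical helper: prepare (but do not send) the run-sync POST request."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{RUN_SYNC_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would start a billable run, so the sketch stops at building it. Validating locally first is a design choice to catch out-of-range values before any request is made; the platform's own schema validation remains authoritative.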
