# Job Hunt Automation (`scrapyspider/job-hunt-automation`) Actor

Aggregates and normalizes job listings from multiple Apify job scraper datasets. Deduplicates by URL and outputs clean, structured data ready for CRM import or further processing.

- **URL**: https://apify.com/scrapyspider/job-hunt-automation.md
- **Developed by:** [ScrapySpider](https://apify.com/scrapyspider) (community)
- **Categories:** Jobs
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you only pay for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

### Job Hunt Automation

Aggregate, normalize, and deduplicate job listings from multiple Apify job scraper dataset runs into one clean, structured dataset — ready for analysis, CRM import, or further processing.

- Aggregates job listings from multiple scraper runs into one clean dataset
- Normalizes data from different job boards (Indeed, LinkedIn, Glassdoor, etc.) into a standard format
- Deduplicates by job URL — no duplicate entries in your output
- Handles missing or partial data gracefully with safe defaults
- Clean JSON output with 11 structured fields per job listing
- Works with any Apify job scraper that outputs standard job listing fields
- Lightweight — reads from existing datasets, no web scraping performed

#### What data does it extract?

Each job listing in the output contains 11 structured fields:

**Job details:** title, company, location, remote (Yes/No), jobType, salary, postedDate

**Links:** applyUrl (direct application link), jobUrl (original listing URL)

**Metadata:** source (job board name), description (full job description text)

All results can be exported as JSON, CSV, or Excel from the Apify dataset.
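
The normalization step can be pictured as mapping each raw scraper item onto the 11 output fields, with any missing value falling back to an empty string. The sketch below is a hypothetical illustration, not the Actor's source: apart from `companyName`, which the input schema mentions, the raw field names in `FIELD_ALIASES` (such as `positionName`, `isRemote`, `url`) are assumptions about what upstream scrapers emit, and converting a boolean `remote` flag to `Yes`/`No` is inferred from the output example.

```python
from typing import Any

# The 11 output fields, in the order used by the output example.
OUTPUT_FIELDS = [
    "title", "company", "location", "remote", "jobType", "salary",
    "postedDate", "applyUrl", "jobUrl", "source", "description",
]

# Hypothetical raw-field aliases per output field; real scrapers may differ.
FIELD_ALIASES: dict[str, list[str]] = {
    "title": ["title", "positionName", "jobTitle"],
    "company": ["company", "companyName"],
    "location": ["location"],
    "remote": ["remote", "isRemote"],
    "jobType": ["jobType", "employmentType"],
    "salary": ["salary"],
    "postedDate": ["postedDate", "postedAt"],
    "applyUrl": ["applyUrl", "externalApplyLink"],
    "jobUrl": ["jobUrl", "url"],
    "source": ["source"],
    "description": ["description"],
}


def first_present(raw: dict[str, Any], keys: list[str]) -> Any:
    """Return the first value among the candidate keys that is not None or "", else None."""
    for key in keys:
        value = raw.get(key)
        if value is not None and value != "":
            return value
    return None


def normalize_job(raw: dict[str, Any]) -> dict[str, str]:
    """Map one raw scraper item onto the 11 output fields (hypothetical mapping)."""
    job: dict[str, str] = {}
    for field in OUTPUT_FIELDS:
        value = first_present(raw, FIELD_ALIASES[field])
        # A boolean remote flag becomes "Yes"/"No"; strings pass through unchanged.
        if field == "remote" and isinstance(value, bool):
            value = "Yes" if value else "No"
        # Missing fields default to empty strings.
        job[field] = "" if value is None else str(value)
    return job
```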

#### Use cases

- **Job search automation:** Aggregate results from multiple scraper runs across different job boards into a single, deduplicated master list
- **Recruitment pipeline:** Normalize job postings from Indeed, LinkedIn, Glassdoor, and other sources into one consistent format
- **Market research:** Analyze job market trends by aggregating and comparing listings across platforms and regions
- **Job board monitoring:** Combine daily scraping results and track unique new listings over time without duplicates
- **Data enrichment:** Create clean, deduplicated master datasets from raw scraper outputs for downstream processing or CRM import
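
For the job board monitoring use case, one way to keep only listings you have not seen before is to compare each run's `jobUrl` values against a stored set of earlier URLs. This is a minimal sketch: where you persist `seen_urls` between runs (a file, an Apify key-value store, a database) is up to you, and treating listings without a `jobUrl` as always new is an assumption, since the Actor does not document how it handles empty URLs.

```python
from collections.abc import Iterable


def split_new_listings(
    jobs: Iterable[dict[str, str]], seen_urls: set[str]
) -> tuple[list[dict[str, str]], set[str]]:
    """Return (listings whose jobUrl was not seen before, updated set of seen URLs).

    Uses exact string comparison on jobUrl, matching the Actor's documented
    deduplication rule. Listings with an empty jobUrl are always kept (assumption).
    """
    new_jobs: list[dict[str, str]] = []
    updated = set(seen_urls)  # copy so the caller's set is not mutated
    for job in jobs:
        url = job.get("jobUrl", "")
        if url and url in updated:
            continue  # already seen in an earlier run or earlier in this batch
        new_jobs.append(job)
        if url:
            updated.add(url)
    return new_jobs, updated
```

Returning a new set rather than mutating the input makes it easy to save the updated set only after the new listings have been processed successfully.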

#### How to use

1. Click **Try for free** on this Actor's page
2. In the **Input** tab, enter one or more dataset IDs from your Apify job scraper runs
3. Choose whether to deduplicate by URL (enabled by default — recommended)
4. Click **Start** and wait for the run to complete
5. View results in the **Output** tab
6. Download your results as JSON, CSV, or Excel
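
If you run your scrapers on a schedule, you can collect their dataset IDs programmatically instead of copying them from the run details. The sketch below assumes the official Python client (`apify-client`), where `client.actor(...).last_run(status=...)` returns a client for the Actor's most recent run with that status and `.get()` returns the run object or `None`. The function name `latest_dataset_ids` and the scraper Actor IDs in the comments are hypothetical.

```python
from typing import Any


def latest_dataset_ids(client: Any, scraper_actor_ids: list[str]) -> list[str]:
    """Collect the default dataset ID of each scraper's last successful run.

    `client` is expected to behave like apify_client.ApifyClient.
    Actors without a successful run are skipped.
    """
    dataset_ids: list[str] = []
    for actor_id in scraper_actor_ids:
        run = client.actor(actor_id).last_run(status="SUCCEEDED").get()
        if run and run.get("defaultDatasetId"):
            dataset_ids.append(run["defaultDatasetId"])
    return dataset_ids


# Example usage (requires a real token and real scraper Actor IDs):
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_API_TOKEN>")
# ids = latest_dataset_ids(client, ["<SCRAPER_ACTOR_ID_1>", "<SCRAPER_ACTOR_ID_2>"])
# client.actor("scrapyspider/job-hunt-automation").call(run_input={"datasetIds": ids})
```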

#### Input parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `datasetIds` | Array of strings | Yes | — | One or more dataset IDs from Apify job scraper runs. Find these in the run details of your scraper Actors. |
| `deduplicateByUrl` | Boolean | No | `true` | Remove duplicate job listings that share the same `jobUrl`. Recommended to keep enabled. |
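
Because `datasetIds` is the only required parameter, it can help to validate the input before starting a run. This hypothetical helper checks the two parameters against the table above; removing repeated dataset IDs is an extra choice of this sketch, not something the Actor is documented to do.

```python
from typing import Any


def build_input(dataset_ids: list[str], deduplicate_by_url: bool = True) -> dict[str, Any]:
    """Build an input object for the Actor, checked against the documented parameters.

    datasetIds: required, one or more dataset ID strings.
    deduplicateByUrl: optional boolean, defaults to True.
    """
    if not dataset_ids:
        raise ValueError("datasetIds must contain at least one dataset ID")
    if not all(isinstance(d, str) and d.strip() for d in dataset_ids):
        raise ValueError("every dataset ID must be a non-empty string")
    # Drop repeated IDs while preserving order so no dataset is read twice.
    unique_ids = list(dict.fromkeys(d.strip() for d in dataset_ids))
    return {"datasetIds": unique_ids, "deduplicateByUrl": deduplicate_by_url}
```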

#### Output example

```json
{
  "title": "Senior Software Engineer",
  "company": "Acme Corp",
  "location": "San Francisco, CA",
  "remote": "Yes",
  "jobType": "Full-time",
  "salary": "$120,000 - $150,000 a year",
  "postedDate": "2024-01-15",
  "applyUrl": "https://company.com/careers/apply/12345",
  "jobUrl": "https://www.indeed.com/viewjob?jk=abc123",
  "source": "Indeed",
  "description": "We are looking for a Senior Software Engineer to join our platform team. You will design and build scalable backend services..."
}
````
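
The Apify Console handles CSV and Excel export for you, but if you fetch items through the API you may want to write the CSV yourself. A minimal sketch using Python's standard `csv` module, assuming each item carries the 11 fields shown above; any missing field is written as an empty cell and unexpected extra keys are ignored. The function name `jobs_to_csv` is hypothetical.

```python
import csv
import io

# Column order follows the output example above.
FIELDS = [
    "title", "company", "location", "remote", "jobType", "salary",
    "postedDate", "applyUrl", "jobUrl", "source", "description",
]


def jobs_to_csv(jobs: list[dict[str, str]]) -> str:
    """Serialize normalized job listings to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        restval="",              # missing fields become empty cells
        extrasaction="ignore",   # keys outside the 11 fields are skipped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(jobs)
    return buffer.getvalue()
```

Values containing commas, such as the salary range in the example, are quoted automatically by the `csv` module.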

#### Pricing

This Actor is **free to use** — you only pay for Apify platform compute time.

Job Hunt Automation is very lightweight. It reads from existing Apify datasets and writes normalized results to the output dataset. No web scraping, browser rendering, or proxy usage is involved, so runs cost minimal platform credits.

New Apify accounts receive $5 in free credits — enough for thousands of normalization runs.

#### Technical notes

- **No web scraping:** Reads from existing Apify datasets only — no HTTP requests to external websites
- **Compatible sources:** Works with job data from Indeed, LinkedIn, Glassdoor, and other job board scrapers available on Apify
- **Missing fields:** Handles incomplete data gracefully — missing fields default to empty strings
- **Deduplication logic:** Uses exact `jobUrl` string matching to identify and remove duplicates
- **Lightweight execution:** Minimal compute and memory usage — processes thousands of listings in seconds
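
The aggregation and deduplication rules above can be sketched as follows. This is an illustration under stated assumptions, not the Actor's implementation: it keeps the first listing seen for each `jobUrl` (the Actor does not document which duplicate survives) and never drops listings whose `jobUrl` is empty. Because matching is an exact string comparison, URLs that differ only in letter case or query parameters count as different listings.

```python
from collections.abc import Iterable


def aggregate_jobs(
    datasets: Iterable[Iterable[dict[str, str]]], deduplicate_by_url: bool = True
) -> list[dict[str, str]]:
    """Concatenate listings from several datasets, optionally deduplicating by jobUrl."""
    result: list[dict[str, str]] = []
    seen: set[str] = set()
    for items in datasets:
        for job in items:
            url = job.get("jobUrl", "")
            if deduplicate_by_url and url:
                if url in seen:
                    continue  # exact-string duplicate of an earlier listing
                seen.add(url)
            result.append(job)
    return result
```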

#### Integrations

Outputs are compatible with standard Apify integrations:

- **Make (formerly Integromat):** Use the Apify module to trigger runs and route job data to any destination
- **Zapier:** Connect Actor runs to 5,000+ apps via the Apify Zapier integration
- **n8n:** Self-host and automate workflows via the Apify n8n node
- **REST API:** Run the Actor programmatically and fetch results via the [Apify API](https://docs.apify.com/api/v2)
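
If you call the REST API directly, the `run-sync-get-dataset-items` endpoint from the OpenAPI specification starts the Actor, waits for it to finish, and returns the dataset items in a single request. A minimal sketch using only Python's standard library: the specification documents the token as a `token` query parameter, while this sketch sends it in an `Authorization: Bearer` header, which the Apify API also accepts and which keeps the token out of URLs and logs. The helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "scrapyspider~job-hunt-automation"  # username~actor-name, as in the OpenAPI paths


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor synchronously and returns dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_and_fetch(token: str, run_input: dict, timeout: float = 300) -> list[dict]:
    """Send the request and decode the JSON array of job listings it returns."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For very large inputs, the asynchronous `/runs` endpoint listed in the specification may be a better fit than the synchronous one.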

#### Support

Have questions, found a bug, or need a custom integration? Reach out:

- **Email:** <ScrapySpider@protonmail.com>
- **Website:** [ScrapySpider.com](https://ScrapySpider.com)
- **Apify:** Open a support issue on this Actor's page
- **Response time:** Within 24–48 hours on weekdays

# Actor input Schema

## `datasetIds` (type: `array`):

Array of Apify dataset IDs from job scraper runs. Each dataset should contain job listing data with fields like title, companyName, location, jobUrl, etc.

## `deduplicateByUrl` (type: `boolean`):

When enabled, removes duplicate job listings that share the same jobUrl. Recommended when aggregating from multiple sources.

## Actor input object example

```json
{
  "datasetIds": [
    "RiGJfaWtJFqVH3rGi"
  ],
  "deduplicateByUrl": true
}
```

# Actor output Schema

## `jobs` (type: `string`):

Aggregated, normalized, and deduplicated job listings from all input datasets.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "datasetIds": [
        "RiGJfaWtJFqVH3rGi"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("scrapyspider/job-hunt-automation").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "datasetIds": ["RiGJfaWtJFqVH3rGi"] }

# Run the Actor and wait for it to finish
run = client.actor("scrapyspider/job-hunt-automation").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "datasetIds": [
    "RiGJfaWtJFqVH3rGi"
  ]
}' |
apify call scrapyspider/job-hunt-automation --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scrapyspider/job-hunt-automation",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Job Hunt Automation",
        "description": "Aggregates and normalizes job listings from multiple Apify job scraper datasets. Deduplicates by URL and outputs clean, structured data ready for CRM import or further processing.",
        "version": "1.0",
        "x-build-id": "DDLclNWFLPRw7i92u"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scrapyspider~job-hunt-automation/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scrapyspider-job-hunt-automation",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scrapyspider~job-hunt-automation/runs": {
            "post": {
                "operationId": "runs-sync-scrapyspider-job-hunt-automation",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scrapyspider~job-hunt-automation/run-sync": {
            "post": {
                "operationId": "run-sync-scrapyspider-job-hunt-automation",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "datasetIds"
                ],
                "properties": {
                    "datasetIds": {
                        "title": "Dataset IDs",
                        "type": "array",
                        "description": "Array of Apify dataset IDs from job scraper runs. Each dataset should contain job listing data with fields like title, companyName, location, jobUrl, etc.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "deduplicateByUrl": {
                        "title": "Deduplicate by Job URL",
                        "type": "boolean",
                        "description": "When enabled, removes duplicate job listings that share the same jobUrl. Recommended when aggregating from multiple sources.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
