# Indeed Job Enrichment Automation (`scrapyspider/indeed-job-enrichment-automation`) Actor

Scrape Indeed jobs by category and country, discover official company websites, and enrich companies with Apollo.io decision-maker data in one workflow.

- **URL**: https://apify.com/scrapyspider/indeed-job-enrichment-automation.md
- **Developed by:** [ScrapySpider](https://apify.com/scrapyspider) (community)
- **Categories:** Jobs, Lead generation, Automation
- **Stats:** 1 total users, 0 monthly users, 0.0% runs succeeded, NaN bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Indeed Job Enrichment Automation

An Apify Actor that automates job lead generation from Indeed by scraping job postings, discovering company websites via Google Search, and enriching company and decision-maker data using Apollo.io.

**Built with:** [Apify SDK](https://docs.apify.com/sdk/js/), [Crawlee](https://crawlee.dev/) (CheerioCrawler), Apollo.io API, and Google SERP proxy.

### 🚀 Features

- **Phase 1 - Job Scraping:** Scrapes jobs from Indeed using Apify's [Indeed Scraper](https://apify.com/hMvNSpz3JnHgl5jkh) for any job category
- **Phase 1.5 - Website Discovery:** Automatically searches Google to find official UK company websites for every job posting
- **Phase 2 - Data Enrichment:** Uses Apollo.io to enrich each company with:
  - Industry classification
  - Company LinkedIn profile
  - Decision Maker details (CEO/Director/Founder)
  - Verified email addresses and confidence scores
- **Structured Output:** All data is pushed to the Apify Dataset with multiple views for easy access

### 📂 Project Structure

```text
.actor/
    actor.json              # Actor configuration and metadata
    input_schema.json       # Input parameter definitions
    output_schema.json      # Output view templates
    dataset_schema.json     # Dataset field mappings and views
src/
    main.js                 # Main orchestrator (Phases 1, 1.5, 2)
    routes.js               # Phase 1: Indeed Scraper integration
    googleSearch.js         # Phase 1.5: Google Search for websites
    apollo.js               # Phase 2: Apollo.io API handler
jobs.json                   # Job title configurations by category
Dockerfile                  # Container image definition
package.json                # Dependencies and scripts
````

### ⚙️ Workflow

#### Phase 1: Job Scraping

- Reads job titles from `jobs.json` based on the selected category
- Calls the Indeed Scraper Actor for each job title
- Collects all scraped jobs with company details
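
As a rough illustration of Phase 1, the sketch below turns a category into one search per job title. It assumes `jobs.json` maps each category name to an array of job titles. The helper names (`getJobTitles`, `buildSearches`) and the `position` / `maxItems` fields are hypothetical and not taken from the Actor's source or from the Indeed Scraper's input schema.

```javascript
// Assumed shape of jobs.json: category name -> list of job titles.
const jobsConfig = {
    Admin: ['Administrator', 'Admin Assistant', 'Office Administrator', 'HR Administrator'],
    Resourcers: ['Recruiter', 'Talent Sourcer', 'Recruitment Consultant'],
};

// Return the job titles for a category, failing loudly on an unknown one.
function getJobTitles(config, category) {
    const titles = config[category];
    if (!Array.isArray(titles) || titles.length === 0) {
        throw new Error(`Unknown or empty category: ${category}`);
    }
    return titles;
}

// Build one search definition per job title.
// The field names here are illustrative placeholders only.
function buildSearches(config, category, maxItemsPerSearch = 10) {
    return getJobTitles(config, category).map((title) => ({
        position: title,
        maxItems: maxItemsPerSearch,
    }));
}
```

Each returned object would then be passed to one call of the Indeed Scraper Actor, so the total number of scraped jobs is bounded by the number of titles times `maxItemsPerSearch`.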

#### Phase 1.5: Website Discovery

- Uses Google Search (with SERP proxy) to find company websites
- Searches for "Company Name UK" and extracts the first valid result
- Filters out social media and aggregator sites (LinkedIn, Facebook, Indeed, etc.)
- Updates each job with the discovered company website
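
The filtering step above can be sketched as follows. This is a minimal illustration, not the code in `googleSearch.js`: the blocked-host list contains only the three sites named above, and the function names are hypothetical.

```javascript
// Hosts treated as social media or aggregators (illustrative subset).
const BLOCKED_HOSTS = ['linkedin.com', 'facebook.com', 'indeed.com'];

// Build the search query described above: "Company Name UK".
function buildQuery(companyName) {
    return `${companyName.trim()} UK`;
}

// True if the hostname is a blocked site or one of its subdomains.
function isBlocked(hostname) {
    const host = hostname.toLowerCase().replace(/^www\./, '');
    return BLOCKED_HOSTS.some((blocked) => host === blocked || host.endsWith(`.${blocked}`));
}

// Return the origin of the first valid, non-blocked result URL, or null.
function pickCompanyWebsite(resultUrls) {
    for (const raw of resultUrls) {
        let url;
        try {
            url = new URL(raw);
        } catch {
            continue; // Skip strings that are not valid URLs.
        }
        if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
        if (!isBlocked(url.hostname)) return url.origin;
    }
    return null;
}
```

Matching on the hostname suffix rather than on a substring of the full URL avoids rejecting a company page that merely mentions a blocked site in its path or query string.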

#### Phase 2: Data Enrichment

- Enriches companies using Apollo.io Organization API:
  - Gets industry classification
  - Gets company LinkedIn URL
  - Extracts primary domain
- Searches for decision makers (CEO, Founder, Managing Director, COO, Directors):
  - Retrieves decision maker name and title
  - Gets LinkedIn profile URL
  - Optionally extracts verified email addresses (if `extractEmails` is enabled)
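
One way to choose a single decision maker from several candidates is to rank them by title. The sketch below assumes the priority follows the order in which the titles are listed above; that ordering, the `people` record shape (`name`, `title`), and the function names are assumptions for illustration, not documented behavior of `apollo.js` or of the Apollo.io API.

```javascript
// Assumed priority order, taken from the order of titles listed above.
const TITLE_PRIORITY = ['ceo', 'founder', 'managing director', 'coo', 'director'];

// Whole-word patterns so that "Coordinator" does not match "coo".
const TITLE_PATTERNS = TITLE_PRIORITY.map((keyword) => new RegExp(`\\b${keyword}s?\\b`, 'i'));

// Lower rank means higher priority; Infinity means no decision-maker title.
function titleRank(title) {
    const index = TITLE_PATTERNS.findIndex((pattern) => pattern.test(title || ''));
    return index === -1 ? Infinity : index;
}

// Return the best-ranked person, or null if nobody holds a matching title.
function pickDecisionMaker(people) {
    let best = null;
    let bestRank = Infinity;
    for (const person of people) {
        const rank = titleRank(person.title);
        if (rank < bestRank) {
            best = person;
            bestRank = rank;
        }
    }
    return best;
}
```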

#### Final Output

- Pushes enriched data to Apify Dataset with status tracking:
  - `Enriched`: Successfully enriched with decision maker data
  - `Failed-to-Enrich`: Company found but no decision maker data available
  - `Not-Enriched`: Apollo API key not provided
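
The three status values can be read as a simple decision rule. The sketch below is one possible reading of the list above; the function name and the `decisionMaker` object shape are hypothetical.

```javascript
// Map the enrichment outcome to one of the three documented status values.
function resolveEnrichedStatus({ apolloApiKey, decisionMaker }) {
    // Without an Apollo API key, enrichment is never attempted.
    if (!apolloApiKey) return 'Not-Enriched';
    // A decision maker with a name counts as a successful enrichment.
    if (decisionMaker && decisionMaker.name) return 'Enriched';
    // Enrichment ran, but no decision maker data came back.
    return 'Failed-to-Enrich';
}
```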

### 💻 Usage

#### Running on Apify Platform

1. **Create an Actor run** via the Apify Console or API
2. **Configure input parameters:**
   - Select a job category
   - Provide your Apollo.io API key
   - Set maximum items per search
   - Enable email extraction if needed
3. **View results** in the Output tab with three available views:
   - **Enriched Jobs Overview**: Key fields with decision maker contacts
   - **Full Job Details**: Complete job descriptions and metadata
   - **All Data (JSON)**: Raw dataset export

#### Local Development

**Install dependencies:**

```bash
npm install
```

**Set environment variables:**

Create a `.env` file or set in your environment:

```env
INDEED_ACTOR_ID=hMvNSpz3JnHgl5jkh
APIFY_TOKEN=your_apify_token_here
```
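
A small helper like the one below can make a missing token fail early during local development. It is a sketch only: the function name is hypothetical, and falling back to the Indeed Scraper ID shown above when `INDEED_ACTOR_ID` is unset is an assumption rather than confirmed Actor behavior.

```javascript
// Default Indeed Scraper Actor ID, as shown in the .env example above.
const DEFAULT_INDEED_ACTOR_ID = 'hMvNSpz3JnHgl5jkh';

// Read the two variables from an environment object such as process.env.
function resolveLocalConfig(env) {
    if (!env.APIFY_TOKEN) {
        throw new Error('APIFY_TOKEN is not set');
    }
    return {
        token: env.APIFY_TOKEN,
        // Assumed fallback: use the default ID when no override is given.
        indeedActorId: env.INDEED_ACTOR_ID || DEFAULT_INDEED_ACTOR_ID,
    };
}
```

In a local script this would typically be called as `resolveLocalConfig(process.env)`.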

**Run locally:**

```bash
npm start
```

Note: Local runs use the `storage/` directory to emulate Apify storage. This data is NOT synced to Apify Console. To verify output, deploy and run on the platform.

#### Deploy to Apify

Authenticate and push to Apify platform:

```bash
apify login
apify push
```

### 🧩 Configuration

#### Input Parameters

Defined in [.actor/input\_schema.json](.actor/input_schema.json):

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `category` | string | Yes | Job category from jobs.json (Admin, Resourcers, Compliance, etc.) |
| `apolloApiKey` | string | Yes | Your Master API Key from Apollo.io (stored securely) |
| `maxItemsPerSearch` | integer | No | Maximum jobs to scrape per search term (default: 10) |
| `extractEmails` | boolean | No | Enable email extraction using Apollo credits (default: false) |
| `parseCompanyDetails` | boolean | No | Parse company details from Indeed (default: true) |
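
For callers who build the input programmatically, the table above can be expressed as a small normalization step. This is an illustrative sketch with a hypothetical function name; it only mirrors the required fields and defaults listed in the table.

```javascript
// Defaults for the optional parameters, as listed in the table above.
const INPUT_DEFAULTS = {
    maxItemsPerSearch: 10,
    extractEmails: false,
    parseCompanyDetails: true,
};

// Apply defaults and check required fields before starting a run.
function normalizeInput(input) {
    const errors = [];
    if (typeof input.category !== 'string' || input.category === '') {
        errors.push('category is required');
    }
    if (typeof input.apolloApiKey !== 'string' || input.apolloApiKey === '') {
        errors.push('apolloApiKey is required');
    }
    const merged = { ...INPUT_DEFAULTS, ...input };
    if (!Number.isInteger(merged.maxItemsPerSearch) || merged.maxItemsPerSearch < 1) {
        errors.push('maxItemsPerSearch must be a positive integer');
    }
    if (errors.length > 0) {
        throw new Error(`Invalid input: ${errors.join('; ')}`);
    }
    return merged;
}
```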

#### Job Categories

Edit `jobs.json` to customize job titles for each category:

- **Admin**: Administrator, Admin Assistant, Office Administrator, HR Administrator
- **Resourcers**: Recruiter, Talent Sourcer, Recruitment Consultant
- **Compliance**: Compliance Officer, Compliance Administrator, Compliance Coordinator
- **Data Entry**: Data Entry Clerk, Data Entry Administrator, Data Processor
- **Back Office**: Operations Assistant, Accounts Assistant, Finance Assistant, and more

[View full categories](jobs.json)

### 📊 Output

#### Dataset Schema

The Actor outputs enriched job data with three views defined in [.actor/dataset\_schema.json](.actor/dataset_schema.json):

##### Overview View

Key enrichment fields for lead generation:

- Job title, company, location, salary
- Job type, posting date, job URL
- Industry, company LinkedIn
- Decision maker name, title, LinkedIn
- Email address and confidence score
- Enrichment status, category

##### Job Details View

Complete job information:

- All job posting details
- Job description and snippets
- Company details from Indeed
- Company website from Google Search
- Search query metadata

#### Output Schema

The Actor provides multiple output templates in [.actor/output\_schema.json](.actor/output_schema.json):

- **Enriched Jobs Overview**: Filtered view with lead generation data
- **Full Job Details**: Complete job postings with descriptions
- **All Data (JSON)**: Raw dataset export
- **Run Statistics**: Actor performance metrics

### 🔑 API Keys

#### Apollo.io API Key

1. Sign up at [Apollo.io](https://apollo.io)
2. Navigate to Settings → API
3. Generate a Master API Key
4. Add to Actor input (stored securely as a secret)

**Note:** Email extraction consumes Apollo credits. Set `extractEmails: false` to save credits.

#### Apify API Token

- Required for local development
- Get from [Apify Console](https://console.apify.com/account/integrations)
- Set as `APIFY_TOKEN` environment variable

### 🎯 Use Cases

- **Lead Generation**: Find decision makers at companies hiring for specific roles
- **Sales Prospecting**: Build targeted lists with verified contact information
- **Market Research**: Analyze hiring trends by industry and location
- **Recruitment**: Identify companies actively hiring in your niche

### 📝 Notes

- Google SERP proxy is required for website discovery (included with Apify residential proxies)
- Apollo.io free tier provides limited credits - monitor usage if extracting emails
- The Indeed Scraper Actor ID can be configured via `INDEED_ACTOR_ID` environment variable
- Local storage in `storage/` directory is for testing only and not synced to Apify Console

### 🤝 Contributing

Contributions welcome! To add new job categories:

1. Edit `jobs.json` with new category and job titles
2. Update `.actor/input_schema.json` enum values
3. Test with `npm start` locally
4. Submit a pull request
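
Steps 1 and 2 are easy to get out of sync, so a quick consistency check can help before opening a pull request. The sketch below assumes `jobs.json` is an object keyed by category name and that the enum lives at `properties.category.enum` in `.actor/input_schema.json`; the function name is hypothetical.

```javascript
// Compare category names in jobs.json with the enum in the input schema.
function findCategoryMismatches(jobsConfig, inputSchema) {
    const enumValues = inputSchema.properties?.category?.enum ?? [];
    const jobCategories = Object.keys(jobsConfig);
    return {
        // Categories defined in jobs.json but not selectable in the input.
        missingFromSchema: jobCategories.filter((name) => !enumValues.includes(name)),
        // Categories selectable in the input but without any job titles.
        missingFromJobs: enumValues.filter((name) => !jobCategories.includes(name)),
    };
}
```

Both arrays being empty indicates that every category in `jobs.json` is selectable and every selectable category has job titles.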

### 📄 License

ISC

#### Apify Dataset

Contains one JSON object per job with merged data, e.g.:

```json
{
  "job_title": "Finance Officer",
  "company": "Aster Group",
  "salary": "£26,510 a year",
  "industry": "Non-profit",
  "decision_maker_name": "Bjorn",
  "email": "bjorn.howard@aster.co.uk",
  "enriched_status": "Enriched"
}
```
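
Once the dataset items are downloaded, records shaped like the one above can be summarized with a few lines of post-processing. The sketch below relies only on the `enriched_status` and `email` fields shown in the example; the function names are hypothetical.

```javascript
// Count dataset items per enrichment status.
function summarizeByStatus(items) {
    const counts = {};
    for (const item of items) {
        const status = item.enriched_status || 'Unknown';
        counts[status] = (counts[status] || 0) + 1;
    }
    return counts;
}

// Keep only enriched items that also carry an email address.
function contactableLeads(items) {
    return items.filter((item) => item.enriched_status === 'Enriched' && Boolean(item.email));
}
```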

# Actor input Schema

## `category` (type: `string`):

Job category to scrape. If you need a category other than those listed, please create an issue and the developers will consider adding it.

## `maxItemsPerSearch` (type: `integer`):

Maximum number of jobs to scrape per search term (default: 10).

## `extractEmails` (type: `boolean`):

If true, uses extra Apollo credits to find verified emails.

## `apolloApiKey` (type: `string`):

Your API Key from Apollo.io Settings.

## `parseCompanyDetails` (type: `boolean`):

If true, fetches company details (website, size, description) from Indeed. Required for enrichment.

## Actor input object example

```json
{
  "category": "Admin",
  "maxItemsPerSearch": 10,
  "extractEmails": false,
  "apolloApiKey": "<YOUR_APOLLO_API_KEY>",
  "parseCompanyDetails": true
}
```

# Actor output Schema

## `enrichedJobs` (type: `string`):

View enriched job postings with decision maker contact details

## `fullJobDetails` (type: `string`):

Complete job information including descriptions and metadata

## `allData` (type: `string`):

Raw dataset with all fields in JSON format

## `statistics` (type: `string`):

View Actor run statistics and crawler performance

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "category": "Admin",
    "maxItemsPerSearch": 10,
    "extractEmails": false,
    "apolloApiKey": "<YOUR_APOLLO_API_KEY>",
    "parseCompanyDetails": true
};

// Run the Actor and wait for it to finish
const run = await client.actor("scrapyspider/indeed-job-enrichment-automation").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "category": "Admin",
    "maxItemsPerSearch": 10,
    "extractEmails": False,
    "apolloApiKey": "<YOUR_APOLLO_API_KEY>",
    "parseCompanyDetails": True,
}

# Run the Actor and wait for it to finish
run = client.actor("scrapyspider/indeed-job-enrichment-automation").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "category": "Admin",
  "maxItemsPerSearch": 10,
  "extractEmails": false,
  "apolloApiKey": "<YOUR_APOLLO_API_KEY>",
  "parseCompanyDetails": true
}' |
apify call scrapyspider/indeed-job-enrichment-automation --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scrapyspider/indeed-job-enrichment-automation",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Indeed Job Enrichment Automation",
        "description": "Scrape Indeed jobs by category and country, discover official company websites, and enrich companies with Apollo.io decision-maker data in one workflow.",
        "version": "0.0",
        "x-build-id": "8ElagWGwaB7M0WScx"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scrapyspider~indeed-job-enrichment-automation/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scrapyspider-indeed-job-enrichment-automation",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scrapyspider~indeed-job-enrichment-automation/runs": {
            "post": {
                "operationId": "runs-sync-scrapyspider-indeed-job-enrichment-automation",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scrapyspider~indeed-job-enrichment-automation/run-sync": {
            "post": {
                "operationId": "run-sync-scrapyspider-indeed-job-enrichment-automation",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "category",
                    "apolloApiKey"
                ],
                "properties": {
                    "category": {
                        "title": "Job Category",
                        "enum": [
                            "Admin",
                            "Data Entry",
                            "Back Office",
                            "Compliance",
                            "Vetting",
                            "Onboarding",
                            "DBS",
                            "RTW",
                            "Resourcers",
                            "Talent Sourcing"
                        ],
                        "type": "string",
                        "description": "If you need any job category other than this, please create an issue. Our developers will implement it.",
                        "default": "Resourcers"
                    },
                    "maxItemsPerSearch": {
                        "title": "Max Items per Search",
                        "type": "integer",
                        "description": "Maximum number of jobs to scrape.",
                        "default": 10
                    },
                    "extractEmails": {
                        "title": "Extract Emails & Confidence",
                        "type": "boolean",
                        "description": "If true, uses extra Apollo credits to find verified emails.",
                        "default": false
                    },
                    "apolloApiKey": {
                        "title": "Apollo.io API Key",
                        "type": "string",
                        "description": "Your API Key from Apollo.io Settings."
                    },
                    "parseCompanyDetails": {
                        "title": "Parse Company Details",
                        "type": "boolean",
                        "description": "If true, fetches company details (website, size, description) from Indeed. Required for enrichment.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
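
For projects that do not use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly over HTTP. The following is a minimal sketch: the helper names are hypothetical, and it assumes a runtime with a global `fetch` (for example Node.js 18 or newer).

```javascript
const API_BASE = 'https://api.apify.com/v2';
const ACTOR_ID = 'scrapyspider~indeed-job-enrichment-automation';

// Build the endpoint URL with the token passed as a query parameter,
// as declared in the specification's "parameters" section.
function buildRunSyncUrl(token) {
    const url = new URL(`${API_BASE}/acts/${ACTOR_ID}/run-sync-get-dataset-items`);
    url.searchParams.set('token', token);
    return url.toString();
}

// POST the Actor input as JSON and return the parsed response body.
async function runActorSync(token, input) {
    const response = await fetch(buildRunSyncUrl(token), {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(input),
    });
    if (!response.ok) {
        throw new Error(`Run failed with HTTP ${response.status}`);
    }
    return response.json();
}
```

Because this endpoint waits for the run to finish, it suits short runs best; for longer runs, the `/runs` path starts the Actor and returns run information that can be polled separately.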
