# AiJobs.net Scraper (`shahidirfan/aijobs-net-scraper`) Actor

Automatically scrape AI job listings from AiJobs.net. Extract job titles, companies, locations, salaries, and full descriptions. Perfect for recruitment research, job market analysis, and career tracking with reliable data extraction.

- **URL**: https://apify.com/shahidirfan/aijobs-net-scraper.md
- **Developed by:** [Shahid Irfan](https://apify.com/shahidirfan) (community)
- **Categories:** Jobs, Automation, Lead generation
- **Stats:** 1 total user, 0 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage it consumes, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## AiJobs.net Scraper

Extract job listings from aijobs.net with flexible inputs for direct URLs, keyword discovery, and location targeting. Build high-quality datasets for hiring intelligence, compensation benchmarking, and AI talent market monitoring, with detailed job fields such as tasks, perks, skills, education, role labels, and geographic breakdowns.

### Features

- **URL-first scraping** — Start from a specific listing page or a direct job URL.
- **Keyword and location search** — Find relevant jobs with search-friendly input fields.
- **Pagination support** — Collect data across multiple listing pages with a page limit.
- **Rich job detail extraction** — Capture salary, level, tasks, perks, skills, education, roles, and region hierarchy.
- **Clean datasets** — Records exclude empty and null values for analysis-ready output.
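
The "clean datasets" behavior can be pictured with a small filter like the one below. This is only an illustrative sketch, not the Actor's actual implementation: the function name `drop_empty` and the exact definition of "empty" (None, blank strings, empty lists or dicts) are assumptions.

```python
from typing import Any


def drop_empty(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` without None, blank-string, or empty-collection values."""
    cleaned: dict[str, Any] = {}
    for key, value in record.items():
        if value is None:
            continue
        if isinstance(value, str) and not value.strip():
            continue
        if isinstance(value, (list, dict)) and len(value) == 0:
            continue
        # Falsy but meaningful values such as 0 or False are kept.
        cleaned[key] = value
    return cleaned


raw = {"title": "Data Engineer", "salary": None, "perks": [], "level": "", "identifier": "89359"}
print(drop_empty(raw))
# {'title': 'Data Engineer', 'identifier': '89359'}
```

Because empty fields are dropped rather than set to null, downstream code should read fields with `item.get("salary")` instead of `item["salary"]`.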

---

### Use Cases

#### AI Talent Market Research
Track demand for AI, ML, and data roles across countries and cities. Compare job volume and skills trends over time.

#### Hiring Pipeline Discovery
Identify companies actively hiring for specific roles and technologies. Build targeted outreach lists for recruiting and business development.

#### Compensation Benchmarking
Collect salary ranges when available to compare market rates across job families and regions.

#### Skills Trend Monitoring
Measure how often key technologies and competencies appear in active job listings.
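
As a rough sketch of skills trend monitoring, the snippet below counts how many jobs mention each skill in a list of dataset items. The helper name `skill_frequencies` is hypothetical, and lowercasing skill names before counting is a design choice made here, not something the Actor does.

```python
from collections import Counter
from typing import Iterable


def skill_frequencies(items: Iterable[dict]) -> Counter:
    """Count the number of jobs that list each skill (case-insensitive)."""
    counts: Counter = Counter()
    for item in items:
        # `skills` may be missing, because empty fields are omitted from records.
        skills = {s.strip().lower() for s in item.get("skills", []) if s.strip()}
        # Using a set counts each skill at most once per job.
        counts.update(skills)
    return counts


items = [
    {"title": "Data Engineer", "skills": ["Airflow", "Apache Spark", "DBT"]},
    {"title": "ML Engineer", "skills": ["Apache Spark", "airflow"]},
    {"title": "Analyst"},
]
for skill, job_count in skill_frequencies(items).most_common():
    print(f"{skill}: {job_count}")
```

Running the same count on datasets from different dates gives a simple view of how demand for a skill changes over time.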

#### Job Intelligence Dashboards
Feed clean job data into BI tools for recurring reporting and competitive intelligence.

---

### Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | String | No | `"https://aijobs.net/"` | Start URL for a listing page or direct job page. |
| `keyword` | String | No | `"python"` | Keyword used to discover relevant jobs. |
| `location` | String | No | `""` | Location filter for region-specific jobs. |
| `results_wanted` | Integer | No | `20` | Maximum number of jobs to save. |
| `max_pages` | Integer | No | `5` | Maximum number of listing pages to request. |
| `startUrl` | String | No | — | Alias for `url` for compatibility. |
| `proxyConfiguration` | Object | No | Apify Proxy (Residential) | Proxy configuration for reliable data collection. |
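
The sketch below shows one way to assemble an input object client-side from the defaults in the table above. The helper `build_input` is hypothetical, and the rule that `startUrl` only stands in for `url` when `url` is not given is an assumption; the Actor's own precedence between the two is not documented here.

```python
from typing import Any

# Defaults copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "url": "https://aijobs.net/",
    "keyword": "python",
    "location": "",
    "results_wanted": 20,
    "max_pages": 5,
}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Merge caller-supplied values onto the documented defaults."""
    merged = {**DEFAULTS, **overrides}
    # Assumption: the `startUrl` alias is used only when `url` is not supplied.
    if "startUrl" in overrides and "url" not in overrides:
        merged["url"] = overrides["startUrl"]
    merged.pop("startUrl", None)
    return merged


print(build_input(keyword="data engineer", location="Germany", results_wanted=30))
```

The resulting dictionary can be passed as `run_input` to the Python client call shown in the API section.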

---

### Output Data

Each item in the dataset contains:

| Field | Type | Description |
|---|---|---|
| `title` | String | Job title |
| `company` | String | Hiring company name |
| `company_profile_url` | String | Company profile URL when publicly available |
| `company_slug` | String | Company slug derived from the profile path |
| `company_id` | String | Company identifier when available in the public profile path |
| `location` | String | Primary job location |
| `locations` | Array<String> | All detected location values |
| `salary` | String | Human-readable salary badge when shown |
| `level` | String | Seniority badge |
| `employment_type` | String | Employment type |
| `employment_types` | Array<String> | All employment-type badges shown on the job |
| `posted_ago` | String | Relative posting age shown on the page |
| `skills` | Array<String> | Skills associated with the role |
| `tasks` | Array<String> | Task statements listed on the job detail page |
| `perks` | Array<String> | Perks and benefits listed on the job detail page |
| `education` | Array<String> | Education labels shown on the job detail page |
| `roles` | Array<String> | Role labels associated with the job |
| `regions` | Array<String> | Region labels associated with the job |
| `countries` | Array<String> | Country labels associated with the job |
| `states` | Array<String> | State or province labels associated with the job |
| `cities` | Array<String> | City labels associated with the job |
| `apply_url` | String | Public apply path URL shown on the job page |
| `apply_path` | String | Relative apply path shown on the site |
| `apply_id` | String | Apply-path identifier when available |
| `identifier` | String | Job identifier when provided |
| `description_text` | String | Cleaned plain-text description |
| `url` | String | Job detail URL |
| `source` | String | Data source domain |
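
For typed access in downstream code, a dataset item can be mapped onto a small record class. The `JobRecord` class below is a hypothetical sketch covering only a subset of the fields in the table; every field is optional because records omit values that are not available.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class JobRecord:
    """Hypothetical typed view over a subset of the output fields."""

    title: Optional[str] = None
    company: Optional[str] = None
    location: Optional[str] = None
    salary: Optional[str] = None
    level: Optional[str] = None
    url: Optional[str] = None
    identifier: Optional[str] = None
    skills: list[str] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "JobRecord":
        """Build a record from a raw item, ignoring keys this class does not model."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in item.items() if key in known})


record = JobRecord.from_item({"title": "Data Engineer", "company": "Devoteam", "perks": ["Hybrid work model"]})
print(record.title, record.salary, record.skills)
# Data Engineer None []
```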

---

### Usage Examples

#### Basic Run

```json
{
	"url": "https://aijobs.net/",
	"results_wanted": 20,
	"max_pages": 5
}
````

#### Keyword + Location Search

```json
{
	"keyword": "data engineer",
	"location": "Germany",
	"results_wanted": 30,
	"max_pages": 6
}
```

#### Direct Listing URL

```json
{
	"url": "https://aijobs.net/jobs/api/",
	"results_wanted": 25,
	"max_pages": 4
}
```

#### Single Job URL

```json
{
	"url": "https://aijobs.net/job/frontier-ai-research-lead-georgetown-university-main-campus-walsh-school-of-foreign-service-500-first-st-nw-7th-floor-74243/"
}
```

***

### Sample Output

```json
{
	"title": "Senior / Lead Data Engineer (24x7 Data & AI Factory)",
	"company": "Devoteam",
	"company_profile_url": "https://aijobs.net/company/devoteam-317/",
	"company_slug": "devoteam",
	"company_id": "317",
	"location": "Kraków, Poland",
	"locations": ["Kraków, Poland", "Kraków, Lesser Poland, PL", "Lesser Poland, PL", "Poland", "Europe"],
	"salary": "PLN 258K-370K (estimate)",
	"level": "Senior-level",
	"employment_type": "Full Time",
	"posted_ago": "17h ago",
	"skills": ["Airflow", "Apache Spark", "DBT", "Data Observability"],
	"tasks": ["Build AI models", "Build data pipelines", "Improve data observability"],
	"perks": ["Conference attendance", "Hybrid work model", "Private medical healthcare"],
	"education": ["Bachelor of Engineering", "Bachelor of Science"],
	"roles": ["Data Engineer", "Lead Data Engineer", "Senior Data Engineer"],
	"countries": ["Poland"],
	"states": ["Lesser Poland, PL"],
	"cities": ["Kraków, Lesser Poland, PL"],
	"apply_url": "https://aijobs.net/job/1zBHEnUHQOy3wjW/apply/",
	"apply_id": "1zBHEnUHQOy3wjW",
	"identifier": "89359",
	"description_text": "Advise junior team members; Build AI models; Build data pipelines; Design AI models; Design data pipelines; Improve data observability; Maintain AI models; Maintain data observability; Maintain data pipelines; Prepare business insights solutions; Support dataops and ai projects; Transform data; Troubleshoot AI models; Troubleshoot data observability; Troubleshoot data pipelines;",
	"url": "https://aijobs.net/job/senior-lead-data-engineer-24x7-data-ai-factory-krakow-poland-89359/",
	"source": "aijobs.net"
}
```
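
For compensation benchmarking, the human-readable `salary` string usually needs to be turned into numbers. The parser below is a sketch whose pattern is inferred solely from the sample value `"PLN 258K-370K (estimate)"`; other salary formats may appear on the site, and for those this hypothetical `parse_salary` helper simply returns `None`.

```python
import re
from typing import Optional

# Assumed format: three-letter currency code, then a "<low>K-<high>K" range.
_SALARY_RE = re.compile(
    r"^(?P<currency>[A-Z]{3})\s+(?P<low>\d+(?:\.\d+)?)K\s*-\s*(?P<high>\d+(?:\.\d+)?)K"
)


def parse_salary(text: str) -> Optional[dict]:
    """Parse a salary badge into currency and bounds, or return None if it does not match."""
    match = _SALARY_RE.match(text.strip())
    if match is None:
        return None
    return {
        "currency": match["currency"],
        "min": round(float(match["low"]) * 1000),
        "max": round(float(match["high"]) * 1000),
        "is_estimate": "(estimate)" in text,
    }


print(parse_salary("PLN 258K-370K (estimate)"))
# {'currency': 'PLN', 'min': 258000, 'max': 370000, 'is_estimate': True}
```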

***

### Tips for Best Results

#### Start with Small Runs

- Use `results_wanted: 20` for fast validation.
- Increase result volume after confirming output quality.

#### Use Strong Keywords

- Prefer specific role names like `ml engineer`, `computer vision`, or `data scientist`.
- Combine with location for more targeted datasets.

#### Set Practical Pagination Limits

- Increase `max_pages` for broader discovery.
- Keep limits reasonable to maintain fast run times.

#### Use Proxies for Stability

- Residential proxy settings are recommended for reliable multi-page runs.

***

### Integrations

- **Google Sheets** — Export job data for collaborative review.
- **Airtable** — Build searchable hiring intelligence tables.
- **Slack** — Send run notifications and alerts.
- **Make** — Automate enrichment and downstream workflows.
- **Zapier** — Trigger alerts and CRM updates.
- **Webhooks** — Push datasets to your own services.

#### Export Formats

- **JSON** — Best for APIs and engineering workflows
- **CSV** — Best for spreadsheet analysis
- **Excel** — Best for business reporting
- **XML** — Best for system integrations
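
If you already have the items in memory (for example from the Python client), a local CSV export can be sketched with the standard library. The helper `items_to_csv` is hypothetical, and joining array fields with `"; "` is an arbitrary choice made here so that each job stays on one row.

```python
import csv
import io
from typing import Iterable


def items_to_csv(items: Iterable[dict], columns: list[str]) -> str:
    """Serialize selected fields of each item to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for item in items:
        row = {}
        for column in columns:
            # Missing fields become empty cells.
            value = item.get(column, "")
            # Flatten array fields such as `skills` into a single cell.
            row[column] = "; ".join(map(str, value)) if isinstance(value, list) else value
        writer.writerow(row)
    return buffer.getvalue()


items = [{"title": "Data Engineer", "company": "Devoteam", "skills": ["Airflow", "DBT"]}]
print(items_to_csv(items, ["title", "company", "skills", "salary"]))
```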

***

### Frequently Asked Questions

#### Can I run with only a keyword?

Yes. Provide `keyword` and optionally `location`, and the Actor will discover matching listings.

#### Can I run with only a URL?

Yes. You can provide either a listing URL or a direct job URL.

#### Why are some fields missing in certain records?

Some job listings do not provide every field. The Actor keeps only available values to avoid null-heavy output.

#### How many jobs can I collect?

Raise `results_wanted` and `max_pages` to collect more jobs. The total is also limited by how many listings the site has available.
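
How the two limits interact can be sketched with a generic pagination loop. This illustrates the idea only and is not the Actor's code: `collect_jobs`, `fetch_page`, and `fake_fetch` are hypothetical names, and stopping at whichever limit is reached first is an assumption.

```python
from typing import Callable, Iterable


def collect_jobs(
    fetch_page: Callable[[int], Iterable[dict]],
    results_wanted: int = 20,
    max_pages: int = 5,
) -> list[dict]:
    """Collect jobs page by page until a limit is hit or listings run out."""
    jobs: list[dict] = []
    for page in range(1, max_pages + 1):
        page_jobs = list(fetch_page(page))
        if not page_jobs:
            break  # no more listings available
        for job in page_jobs:
            jobs.append(job)
            if len(jobs) >= results_wanted:
                return jobs
    return jobs


def fake_fetch(page: int) -> list[dict]:
    # Stand-in data source: three pages of eight jobs each, then nothing.
    return [{"identifier": f"{page}-{i}"} for i in range(8)] if page <= 3 else []


print(len(collect_jobs(fake_fetch, results_wanted=20, max_pages=5)))  # 20
print(len(collect_jobs(fake_fetch, results_wanted=20, max_pages=2)))  # 16
print(len(collect_jobs(fake_fetch, results_wanted=50, max_pages=5)))  # 24
```

The three calls show the result count being capped by `results_wanted`, by `max_pages`, and by the number of available listings, respectively.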

#### Does user input override defaults?

Yes. Runtime input values always take priority over prefill/default values.

***

### Support

For issues or feature requests, use the Apify Console issue/reporting channels for this Actor.

#### Resources

- [Apify Documentation](https://docs.apify.com/)
- [Apify API Reference](https://docs.apify.com/api/v2)
- [Scheduling Runs](https://docs.apify.com/platform/schedules)

***

### Legal Notice

This Actor is intended for legitimate data collection and market research. Users are responsible for complying with website terms and applicable laws in their jurisdiction.

# Actor input Schema

## `url` (type: `string`):

Start from a specific aijobs.net listing or job URL.

## `keyword` (type: `string`):

Keyword used to resolve matching roles, skills, and topics.

## `location` (type: `string`):

Location used to resolve matching countries, states, cities, or regions.

## `results_wanted` (type: `integer`):

Maximum number of jobs to save.

## `max_pages` (type: `integer`):

Maximum number of listing pages to request.

## `proxyConfiguration` (type: `object`):

Use Apify Proxy for reliability.

## Actor input object example

```json
{
  "url": "https://aijobs.net/",
  "keyword": "python",
  "results_wanted": 20,
  "max_pages": 5,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```
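
Before starting a run, an input object can be checked locally against the types listed above. The validator below is a minimal sketch: `validate_input` is a hypothetical helper, and the lower bound of 1 for `results_wanted` and `max_pages` is taken from the OpenAPI specification later in this document.

```python
from typing import Any


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the input looks valid."""
    errors: list[str] = []
    for name in ("url", "keyword", "location"):
        if name in run_input and not isinstance(run_input[name], str):
            errors.append(f"{name} must be a string")
    for name in ("results_wanted", "max_pages"):
        if name in run_input:
            value = run_input[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{name} must be an integer")
            elif value < 1:
                errors.append(f"{name} must be at least 1")
    if "proxyConfiguration" in run_input and not isinstance(run_input["proxyConfiguration"], dict):
        errors.append("proxyConfiguration must be an object")
    return errors


print(validate_input({"url": "https://aijobs.net/", "results_wanted": 20, "max_pages": 0}))
# ['max_pages must be at least 1']
```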

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://aijobs.net/",
    "keyword": "python",
    "location": "",
    "results_wanted": 20,
    "max_pages": 5
};

// Run the Actor and wait for it to finish
const run = await client.actor("shahidirfan/aijobs-net-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "url": "https://aijobs.net/",
    "keyword": "python",
    "location": "",
    "results_wanted": 20,
    "max_pages": 5,
}

# Run the Actor and wait for it to finish
run = client.actor("shahidirfan/aijobs-net-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://aijobs.net/",
  "keyword": "python",
  "location": "",
  "results_wanted": 20,
  "max_pages": 5
}' |
apify call shahidirfan/aijobs-net-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=shahidirfan/aijobs-net-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "AiJobs.net Scraper",
        "description": "Automatically scrape AI job listings from AiJobs.net. Extract job titles, companies, locations, salaries, and full descriptions. Perfect for recruitment research, job market analysis, and career tracking with reliable data extraction.",
        "version": "0.0",
        "x-build-id": "heNL77dafhWEvWNzH"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/shahidirfan~aijobs-net-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-shahidirfan-aijobs-net-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/shahidirfan~aijobs-net-scraper/runs": {
            "post": {
                "operationId": "runs-sync-shahidirfan-aijobs-net-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/shahidirfan~aijobs-net-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-shahidirfan-aijobs-net-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "url": {
                        "title": "URL",
                        "type": "string",
                        "description": "Start from a specific aijobs.net listing or job URL."
                    },
                    "keyword": {
                        "title": "Keyword",
                        "type": "string",
                        "description": "Keyword used to resolve matching roles, skills, and topics."
                    },
                    "location": {
                        "title": "Location",
                        "type": "string",
                        "description": "Location used to resolve matching countries, states, cities, or regions."
                    },
                    "results_wanted": {
                        "title": "Results wanted",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of jobs to save.",
                        "default": 20
                    },
                    "max_pages": {
                        "title": "Max pages",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of listing pages to request.",
                        "default": 5
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Use Apify Proxy for reliability.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
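
The `run-sync-get-dataset-items` path from the specification above can also be called without a client library. The sketch below only builds the request with Python's standard library; `build_run_sync_request` is a hypothetical helper, and the commented-out lines assume the endpoint returns the dataset items as a JSON array.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/shahidirfan~aijobs-net-scraper/run-sync-get-dataset-items"


def build_run_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that carries the Actor input as a JSON body."""
    # The spec defines `token` as a required query parameter.
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_sync_request("<YOUR_API_TOKEN>", {"keyword": "python", "results_wanted": 20})
print(request.get_method(), request.full_url.split("?")[0])

# To send it (requires a real token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

Because this endpoint waits for the run to finish, it suits short runs; for longer ones, the `/runs` path starts the run and returns immediately with run information.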
