# Remote Jobs Actor (`cancap/remote-jobs-actor`) Actor

Aggregates fresh remote job listings from RemoteOK's official public JSON API and We Work Remotely's official public RSS feed into one clean dataset: title, company, salary, location, tags, and direct apply links. No login required, no HTML scraping, fully attribution-compliant with both sources.

- **URL**: https://apify.com/cancap/remote-jobs-actor.md
- **Developed by:** [CANCAP](https://apify.com/cancap) (community)
- **Categories:** Jobs
- **Stats:** 3 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for Apify platform usage, which gets cheaper the higher your subscription plan.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Remote Jobs Aggregator (Apify Actor)

Fetches fresh remote job listings from public, scraper-friendly job board
sources and outputs them as one clean, normalized dataset.

### Sources

- **RemoteOK** — official public JSON API (`https://remoteok.com/api`)
- **We Work Remotely** — official public RSS feed (`https://weworkremotely.com/remote-jobs.rss`)

Both are terms-compliant sources that need no login and no anti-bot
bypass. Neither involves scraping rendered HTML, so this Actor should
stay stable even if either site redesigns its pages; only a change to
the API/feed format itself would require an update.

### What it does

- Pulls from both sources and merges them into one consistent schema:
  `title`, `company`, `location`, `tags`, `salaryMin`, `salaryMax`, `postedAt`,
  `applyUrl`, `description`, plus the `attribution` field described below.
- Filters by keyword and listing age (both optional, set in Input).
- Deduplicates and skips malformed records instead of crashing the run.
- Retries failed requests with exponential backoff (2s → 4s → 8s).
- If one source fails entirely, the run still completes with whatever
  other source(s) succeeded, and the error is logged to `SOURCE_ERRORS`
  in the key-value store — it won't silently return nothing or crash.
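The retry and failure-isolation behavior in the list above can be sketched as follows. This is a minimal illustration under assumptions, not the Actor's source: `withRetries`, `fetchAllSources`, and the `{ jobs, sourceErrors }` return shape are hypothetical names, and each fetcher is assumed to be an async function that resolves to an array of normalized jobs.

```javascript
// Hypothetical sketch, not the Actor's actual code: retries with exponential
// backoff plus per-source isolation, so one failing board cannot sink the run.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Wait before retry number `attempt` (0-based): 2000, 4000, 8000 ms by default.
function backoffDelay(attempt, baseMs = 2000) {
    return baseMs * 2 ** attempt;
}

// Calls `fn`; on failure waits and retries, up to `retries` extra attempts.
async function withRetries(fn, { retries = 3, baseMs = 2000 } = {}) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            return await fn();
        } catch (err) {
            lastError = err;
            if (attempt < retries) await sleep(backoffDelay(attempt, baseMs));
        }
    }
    throw lastError;
}

// Runs every fetcher independently; successes are merged, failures are
// collected per source name instead of aborting the whole run.
async function fetchAllSources(fetchers, retryOptions) {
    const names = Object.keys(fetchers);
    const results = await Promise.allSettled(
        names.map((name) => withRetries(fetchers[name], retryOptions)),
    );
    const jobs = [];
    const sourceErrors = {};
    results.forEach((result, i) => {
        if (result.status === 'fulfilled') {
            jobs.push(...result.value);
        } else {
            sourceErrors[names[i]] = String(result.reason?.message ?? result.reason);
        }
    });
    return { jobs, sourceErrors };
}
```

`Promise.allSettled` is what keeps one rejected source from rejecting the whole batch, as `Promise.all` would. In the real Actor, the collected errors would presumably be stored with the Apify SDK's `Actor.setValue('SOURCE_ERRORS', sourceErrors)`; the exact value written under that key is not documented here.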

### Required attribution (important — keep this)

Both RemoteOK and WWR's API/feed terms require crediting the source and
linking directly to the original listing (no redirects). This Actor
already does both: every record includes an `attribution` field and an
`applyUrl` pointing straight at the original listing on its source site.
**Don't strip these out** if you resell or republish this data — it's
the condition that keeps both feeds usable long-term.
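If you republish the dataset, a small projection step can enforce the rule above. The helper below is a hypothetical sketch: `toRepublishable` and `REQUIRED_FIELDS` are invented names, and it only checks that the two fields are present and non-empty, not that their values satisfy each source's terms.

```javascript
// Hypothetical helper for republishing a subset of fields: `attribution` and
// `applyUrl` are always kept, and records missing either one are rejected.
const REQUIRED_FIELDS = ['attribution', 'applyUrl'];

function toRepublishable(job, fields) {
    for (const key of REQUIRED_FIELDS) {
        if (!job[key]) {
            throw new Error(`Record is missing required field "${key}"`);
        }
    }
    const out = {};
    // Requested fields plus the required ones, without duplicates.
    for (const key of new Set([...fields, ...REQUIRED_FIELDS])) {
        if (key in job) out[key] = job[key];
    }
    return out;
}
```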

### A known limitation of the WWR adapter

WWR's RSS titles commonly follow a "Company: Job Title" format, which
the adapter splits on the first colon. If a title has no colon, `company`
is set to `"Unknown"` rather than guessed. Check a live run's output to see
how often that happens with the current feed, and let me know if it's
frequent enough to need a smarter rule.
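The split rule can be approximated like this. It is a sketch of the behavior described above, not the adapter's code; `splitWwrTitle` and `unknownCompanyRate` are hypothetical names, and keeping the whole string as the title when there is no colon is an assumption.

```javascript
// Hypothetical re-creation of the split rule described above, not the adapter's code.
// Assumption: with no colon, the whole trimmed string is kept as the title.
function splitWwrTitle(rawTitle) {
    const text = String(rawTitle ?? '').trim();
    const idx = text.indexOf(':'); // first colon only
    if (idx === -1) {
        return { company: 'Unknown', title: text };
    }
    return {
        // Assumption: an empty company part also falls back to "Unknown".
        company: text.slice(0, idx).trim() || 'Unknown',
        title: text.slice(idx + 1).trim(),
    };
}

// Share of records whose company fell back to "Unknown", e.g. over a live run's items.
function unknownCompanyRate(items) {
    if (items.length === 0) return 0;
    return items.filter((item) => item.company === 'Unknown').length / items.length;
}
```

Because only the first colon splits, `splitWwrTitle('Acme: Senior Engineer: Platform')` yields company `Acme` and title `Senior Engineer: Platform`. The rate only counts the no-colon case; a title whose prefix is not actually a company name would be split without landing in `"Unknown"`, so skimming some live output is still worthwhile.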

### Deploying to Apify

1. Install the Apify CLI: `npm install -g apify-cli`
2. From this folder, run `apify login`, then `apify push`.
   Alternatively, push this folder through the Apify Console ("Create Actor" → "Upload", or connect it via Git).
3. Set the input in the generated UI (`sources`, `keywords`, `maxItems`, `maxAgeDays`).
4. Run it. Output lands in the Actor's default Dataset — exportable as
   JSON, CSV, Excel, or XML directly from the Apify Console with no
   extra code.

### Extending to more sources

Add a new file under `src/sources/`, following the same pattern as
`remoteok.js` or `weworkremotely.js`: fetch → normalize → return an array
of objects matching the same schema. Then register it in
`SOURCE_FETCHERS` in `src/main.js` and add it to the `sources` enum in
`.actor/input_schema.json`.
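A new source module following that pattern might look roughly like this. Everything board-specific is a placeholder: `EXAMPLE_FEED_URL`, the raw field names, the `attribution` string, and the `fetchExampleBoard` name are hypothetical, and the real `SOURCE_FETCHERS` shape may differ.

```javascript
// Hypothetical adapter sketch, e.g. src/sources/exampleboard.js, following the
// fetch -> normalize -> return pattern. EXAMPLE_FEED_URL and the raw field names
// (position, company_name, url, ...) are placeholders, not a real board's API.

const EXAMPLE_FEED_URL = 'https://example.com/api/remote-jobs';

// Maps one raw record onto the shared schema; returns null for malformed records.
function normalizeExampleJob(raw) {
    if (!raw || typeof raw.position !== 'string' || typeof raw.url !== 'string') {
        return null;
    }
    const company = typeof raw.company_name === 'string' ? raw.company_name.trim() : '';
    const posted = raw.published_at ? new Date(raw.published_at) : null;
    return {
        title: raw.position.trim(),
        company: company || 'Unknown',
        location: raw.location ?? null,
        tags: Array.isArray(raw.tags) ? raw.tags : [],
        salaryMin: Number.isFinite(raw.salary_min) ? raw.salary_min : null,
        salaryMax: Number.isFinite(raw.salary_max) ? raw.salary_max : null,
        postedAt: posted && !Number.isNaN(posted.getTime()) ? posted.toISOString() : null,
        applyUrl: raw.url, // direct link to the original listing, no redirect
        description: raw.description ?? '',
        attribution: 'Example Board', // placeholder; follow the source's own terms
    };
}

// Fetches the feed and returns only well-formed records.
// `fetchImpl` defaults to the global fetch (Node 18+) and can be swapped in tests.
async function fetchExampleBoard({ fetchImpl = fetch } = {}) {
    const res = await fetchImpl(EXAMPLE_FEED_URL);
    if (!res.ok) {
        throw new Error(`Example board responded with HTTP ${res.status}`);
    }
    const body = await res.json();
    if (!Array.isArray(body)) {
        throw new Error('Unexpected response shape: expected a JSON array');
    }
    return body.map(normalizeExampleJob).filter(Boolean);
}

// Registration in src/main.js might then look like this (the existing fetcher
// names and the object shape of SOURCE_FETCHERS are assumptions):
// const SOURCE_FETCHERS = {
//     remoteok: fetchRemoteOk,
//     weworkremotely: fetchWeWorkRemotely,
//     exampleboard: fetchExampleBoard,
// };
```

Returning `null` for malformed records and filtering them out matches the skip-instead-of-crash behavior described earlier, and throwing on a non-array body lets the retry and per-source error handling take over.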

Good next candidates: Himalayas, Working Nomads (has a JSON feed) — both
public, no-login boards. Avoid sites requiring login or aggressive
anti-bot measures (LinkedIn, Indeed at scale) — those involve ToS/legal
risk that's out of scope for this Actor.

### Honest limitation

This was built and unit-tested offline against realistic mocks of each
source's documented response shape (the sandbox used to build this has
no outbound access to remoteok.com or weworkremotely.com). The
normalization logic is defensive — it validates response shape, retries
on failure, and skips/logs malformed records rather than crashing — but
you should run one live test on Apify before relying on it, in case
either site's actual current API/feed differs in some way I couldn't
verify directly.

# Actor input Schema

## `sources` (type: `array`):

Which job boards to pull from.
## `keywords` (type: `array`):

Optional. Only keep jobs whose title or tags contain at least one of these (case-insensitive). Leave empty to keep everything.
## `maxItems` (type: `integer`):

Stop after collecting this many jobs (0 = no limit).
## `maxAgeDays` (type: `integer`):

Drop listings older than this many days. 0 = no limit.
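How these three inputs might combine with deduplication is sketched below. This is an illustration under stated assumptions, not the Actor's implementation: `applyInputFilters` is a hypothetical name, duplicates are assumed to share the same `applyUrl`, jobs with a missing or unparseable `postedAt` are kept, and all filters run in one pass over the merged list. The defaults mirror the schema defaults (`[]`, `200`, `30`).

```javascript
// Hypothetical sketch of the keyword, age, dedupe, and item-cap filters.
// Assumptions: jobs are already normalized, duplicates share `applyUrl`,
// and jobs with a missing or unparseable `postedAt` pass the age filter.

const DAY_MS = 24 * 60 * 60 * 1000;

function applyInputFilters(jobs, { keywords = [], maxItems = 200, maxAgeDays = 30 } = {}, now = Date.now()) {
    // Case-insensitive needles; empty strings are ignored.
    const needles = keywords.map((k) => String(k).toLowerCase()).filter(Boolean);
    const seen = new Set();
    const kept = [];
    for (const job of jobs) {
        // Keyword filter: title or any tag contains at least one keyword.
        if (needles.length > 0) {
            const fields = [job.title ?? '', ...(job.tags ?? [])].map((s) => String(s).toLowerCase());
            if (!needles.some((n) => fields.some((f) => f.includes(n)))) continue;
        }
        // Age filter: 0 disables it; NaN ages (bad dates) are not dropped.
        if (maxAgeDays > 0 && job.postedAt) {
            const ageMs = now - Date.parse(job.postedAt);
            if (ageMs > maxAgeDays * DAY_MS) continue;
        }
        // Deduplicate on the apply URL.
        if (seen.has(job.applyUrl)) continue;
        seen.add(job.applyUrl);
        kept.push(job);
        // Item cap: 0 means no limit.
        if (maxItems > 0 && kept.length >= maxItems) break;
    }
    return kept;
}
```

For example, `applyInputFilters(jobs, { keywords: ['react'], maxAgeDays: 7, maxItems: 50 })` would keep at most 50 listings from the past week whose title or tags mention "react".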

## Actor input object example

```json
{
  "sources": [
    "remoteok",
    "weworkremotely"
  ],
  "keywords": [],
  "maxItems": 200,
  "maxAgeDays": 30
}
````

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    sources: ['remoteok', 'weworkremotely'],
    keywords: [],
    maxItems: 200,
    maxAgeDays: 30,
};

// Run the Actor and wait for it to finish
const run = await client.actor("cancap/remote-jobs-actor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```
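Since a failed source is reported in the key-value store rather than by failing the run, it can be worth checking for that record after `.call()` returns. The helper below is a hypothetical sketch that takes the `client` and `run` objects from the example above; it relies on `client.keyValueStore(id).getRecord(key)` from `apify-client`, which resolves to `undefined` when the key does not exist.

```javascript
// Hypothetical helper: reads the SOURCE_ERRORS record from the run's default
// key-value store. `client` is an ApifyClient; the stored value's shape is not
// documented, so it is returned unchanged.
async function getSourceErrors(client, run) {
    const store = client.keyValueStore(run.defaultKeyValueStoreId);
    const record = await store.getRecord('SOURCE_ERRORS');
    return record ? record.value : null; // null when no such record exists
}

// Usage, continuing the example above (ES module, top-level await):
// if (run.status !== 'SUCCEEDED') throw new Error(`Run ended with status ${run.status}`);
// const sourceErrors = await getSourceErrors(client, run);
// if (sourceErrors) console.warn('Some sources failed:', sourceErrors);
```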

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "sources": ["remoteok", "weworkremotely"],
    "keywords": [],
    "maxItems": 200,
    "maxAgeDays": 30,
}

# Run the Actor and wait for it to finish
run = client.actor("cancap/remote-jobs-actor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call cancap/remote-jobs-actor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=cancap/remote-jobs-actor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Remote Jobs Actor",
        "description": "Aggregates fresh remote job listings from RemoteOK and We Work Remotely's official public APIs into one clean dataset — title, company, salary, location, tags, and direct apply links. No login required, no HTML scraping, fully attribution-compliant with both sources.",
        "version": "0.0",
        "x-build-id": "H0NgpojcylYuXTK4B"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/cancap~remote-jobs-actor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-cancap-remote-jobs-actor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/cancap~remote-jobs-actor/runs": {
            "post": {
                "operationId": "runs-sync-cancap-remote-jobs-actor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/cancap~remote-jobs-actor/run-sync": {
            "post": {
                "operationId": "run-sync-cancap-remote-jobs-actor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "sources"
                ],
                "properties": {
                    "sources": {
                        "title": "Sources",
                        "type": "array",
                        "description": "Which job boards to pull from.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "remoteok",
                                "weworkremotely"
                            ],
                            "enumTitles": [
                                "RemoteOK",
                                "We Work Remotely"
                            ]
                        },
                        "default": [
                            "remoteok",
                            "weworkremotely"
                        ]
                    },
                    "keywords": {
                        "title": "Keyword filter",
                        "type": "array",
                        "description": "Optional. Only keep jobs whose title or tags contain at least one of these (case-insensitive). Leave empty to keep everything.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Stop after collecting this many jobs (0 = no limit).",
                        "default": 200
                    },
                    "maxAgeDays": {
                        "title": "Max listing age (days)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Drop listings older than this many days. 0 = no limit.",
                        "default": 30
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
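For stacks without an Apify client, the `run-sync-get-dataset-items` path from the specification can be called directly over HTTP. The sketch below assumes Node 18+ (global `fetch`); `buildRunSyncRequest`, `runAndGetItems`, and the `APIFY_TOKEN` variable name in the usage comment are illustrative, and the token is sent in the `token` query parameter exactly as the spec declares.

```javascript
// Minimal sketch of calling run-sync-get-dataset-items with the built-in fetch.
const API_BASE = 'https://api.apify.com/v2';
const ACTOR_PATH = '/acts/cancap~remote-jobs-actor/run-sync-get-dataset-items';

// Builds the URL and fetch options without sending anything (handy for testing).
function buildRunSyncRequest(token, input) {
    return {
        url: `${API_BASE}${ACTOR_PATH}?token=${encodeURIComponent(token)}`,
        init: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(input),
        },
    };
}

// Runs the Actor, waits for it, and resolves to the dataset items as parsed JSON.
async function runAndGetItems(token, input, fetchImpl = fetch) {
    const { url, init } = buildRunSyncRequest(token, input);
    const res = await fetchImpl(url, init);
    if (!res.ok) {
        throw new Error(`Apify API returned HTTP ${res.status}: ${await res.text()}`);
    }
    return res.json();
}

// Usage (APIFY_TOKEN is an assumed environment variable name):
// runAndGetItems(process.env.APIFY_TOKEN, { sources: ['remoteok'], maxItems: 50 })
//     .then((items) => console.log(`${items.length} jobs`));
```

The Apify API also accepts the token in an `Authorization: Bearer` header, which keeps it out of URLs and access logs. Synchronous endpoints only wait a limited time for the run to finish, so for long runs the `/runs` endpoint plus polling is the safer choice.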
