Remote Jobs Actor
Aggregates fresh remote job listings from RemoteOK's public JSON API and We Work Remotely's public RSS feed into one clean dataset: title, company, salary, location, tags, and direct apply links. No login required, no HTML scraping, and attribution-compliant with both sources.
Remote Jobs Aggregator (Apify Actor)
Fetches fresh remote job listings from public, scraper-friendly job board sources and outputs them as one clean, normalized dataset.
Sources
- RemoteOK: official public JSON API (https://remoteok.com/api)
- We Work Remotely: official public RSS feed (https://weworkremotely.com/remote-jobs.rss)
Both sources require no login and no anti-bot bypass, and both are used within their terms. Neither involves scraping rendered HTML, so this Actor should stay stable even if either site changes its page design; only a change to the API or feed format itself would require an update.
What it does
- Pulls from both sources and merges them into one consistent schema: title, company, location, tags, salaryMin, salaryMax, postedAt, applyUrl, description.
- Filters by keyword and listing age (both optional, set in Input).
- Deduplicates and skips malformed records instead of crashing the run.
- Retries failed requests with exponential backoff (2s → 4s → 8s).
- If one source fails entirely, the run still completes with whatever the other source(s) returned, and the error is logged to SOURCE_ERRORS in the key-value store. It won't silently return nothing or crash.
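The retry and failure-isolation behavior in the list above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the names `withRetry` and `runSources` are hypothetical, and it assumes "2s → 4s → 8s" means one initial attempt followed by up to three retries.

```javascript
// Hypothetical helpers; the Actor's real retry code is not shown in this README.
// Assumption: one initial attempt, then retries after 2s, 4s, and 8s.
const BACKOFF_DELAYS_MS = [2000, 4000, 8000];

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `task`, retrying after each delay in `delays` while it keeps failing.
// `sleep` is injectable so the schedule can be exercised without real waiting.
async function withRetry(task, { delays = BACKOFF_DELAYS_MS, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= delays.length; attempt += 1) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < delays.length) {
        await sleep(delays[attempt]);
      }
    }
  }
  throw lastError;
}

// Runs every fetcher independently; a source that fails after all retries
// is recorded in `errors` instead of aborting the whole run.
async function runSources(fetchers, retryOptions) {
  const names = Object.keys(fetchers);
  const settled = await Promise.allSettled(
    names.map((name) => withRetry(() => fetchers[name](), retryOptions)),
  );
  const items = [];
  const errors = {};
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      items.push(...result.value);
    } else {
      const reason = result.reason;
      errors[names[i]] = reason instanceof Error ? reason.message : String(reason);
    }
  });
  return { items, errors };
}
```

In a setup like this, the `errors` object is what would be written under the SOURCE_ERRORS key, while `items` continues on to filtering and deduplication.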
Required attribution (important — keep this)
The API/feed terms of both RemoteOK and We Work Remotely (WWR) require crediting the source and
linking directly to the original listing (no redirects). This Actor
already does both: every record includes an attribution field and an
applyUrl pointing straight at the original listing on its source site.
Don't strip these out if you resell or republish this data. Keeping them is
the condition that keeps both feeds usable long-term.
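If you post-process the dataset before republishing, a small guard like the one below can catch records that lost their attribution along the way. It is only a sketch: the function names are hypothetical, and it assumes `attribution` and `applyUrl` are plain non-empty strings on each record.

```javascript
// Hypothetical check, not part of the Actor.
// Assumption: `attribution` and `applyUrl` are non-empty strings on each record.
function keepsAttribution(record) {
  const isNonEmptyString = (value) => typeof value === 'string' && value.trim() !== '';
  return Boolean(record)
    && isNonEmptyString(record.attribution)
    && isNonEmptyString(record.applyUrl);
}

// Returns the records that are missing either field, so they can be fixed
// or dropped before the data is republished.
function findStrippedRecords(records) {
  return records.filter((record) => !keepsAttribution(record));
}
```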
A known limitation of the WWR adapter
WWR's RSS titles commonly follow a "Company: Job Title" format, which
the adapter splits on the first colon. If a title has no colon, company
is set to "Unknown" rather than guessed — check a live run's output
for how often that happens with the current feed, and let me know if
it's frequent enough to need a smarter rule.
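The split rule described above amounts to something like the following sketch. The function name `splitWwrTitle` is hypothetical, and the fallback for an empty company name (a title that starts with a colon) is an assumption not stated in this README.

```javascript
// Hypothetical sketch of the "Company: Job Title" split; not the adapter's actual source.
function splitWwrTitle(rawTitle) {
  const text = String(rawTitle ?? '').trim();
  const colonIndex = text.indexOf(':'); // first colon only
  if (colonIndex === -1) {
    // No colon: do not guess the company.
    return { company: 'Unknown', title: text };
  }
  return {
    // Assumption: an empty company part also falls back to "Unknown".
    company: text.slice(0, colonIndex).trim() || 'Unknown',
    title: text.slice(colonIndex + 1).trim(),
  };
}
```

Because only the first colon is used, a made-up title such as "Acme: Senior Engineer: Platform" yields the company "Acme" and the title "Senior Engineer: Platform".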
Deploying to Apify
- Install the Apify CLI: npm install -g apify-cli
- From this folder: apify login then apify push. Alternatively, push this folder directly via the Apify Console ("Create Actor" → "Upload" / connect via Git).
- Set Input via the generated UI (sources, keywords, maxItems, maxAgeDays).
- Run it. Output lands in the Actor's default Dataset — exportable as JSON, CSV, Excel, or XML directly from the Apify Console with no extra code.
Extending to more sources
Add a new file under src/sources/, following the same pattern as
remoteok.js or weworkremotely.js: fetch → normalize → return an array
of objects matching the same schema. Then register it in
SOURCE_FETCHERS in src/main.js and add it to the sources enum in
.actor/input_schema.json.
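As a rough illustration of the fetch → normalize → return pattern, a new adapter might look like the sketch below. Everything source-specific here is a placeholder: the feed URL, the raw field names (`url`, `salary_min`, `published_at`), the `fetchExampleJobs` name, and the use of null for missing values are all assumptions, and the shape of SOURCE_FETCHERS as a name-to-function map is a guess at how src/main.js is organized.

```javascript
// Hypothetical adapter for an imaginary JSON feed; URL and raw field names are placeholders.
const EXAMPLE_FEED_URL = 'https://example.com/jobs.json';

// Maps one raw item onto the shared schema, or returns null if it is unusable.
function normalizeExampleJob(raw) {
  if (!raw || typeof raw.title !== 'string' || typeof raw.url !== 'string') {
    return null; // malformed record: skip instead of crashing
  }
  const company = typeof raw.company === 'string' ? raw.company.trim() : '';
  return {
    title: raw.title.trim(),
    company: company || 'Unknown',
    location: raw.location ?? null,
    tags: Array.isArray(raw.tags) ? raw.tags.map(String) : [],
    salaryMin: Number.isFinite(raw.salary_min) ? raw.salary_min : null,
    salaryMax: Number.isFinite(raw.salary_max) ? raw.salary_max : null,
    postedAt: raw.published_at ?? null,
    applyUrl: raw.url, // direct link to the original listing
    description: raw.description ?? '',
    attribution: 'Example Board', // placeholder credit string
  };
}

// fetch -> normalize -> return an array. `fetchImpl` defaults to the global
// fetch available in Node 18+, and can be swapped for a mock in tests.
async function fetchExampleJobs(fetchImpl = fetch) {
  const response = await fetchImpl(EXAMPLE_FEED_URL);
  if (!response.ok) {
    throw new Error(`Example feed returned HTTP ${response.status}`);
  }
  const payload = await response.json();
  if (!Array.isArray(payload)) {
    throw new Error('Example feed: expected a JSON array');
  }
  return payload.map(normalizeExampleJob).filter((job) => job !== null);
}

// Registration in src/main.js would then look something like (assumed shape):
// const SOURCE_FETCHERS = { /* existing sources */ example: fetchExampleJobs };
```

Taking the fetch function as a parameter keeps the adapter testable offline against a mock response, which matches how the existing adapters were verified.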
Good next candidates: Himalayas, Working Nomads (has a JSON feed) — both public, no-login boards. Avoid sites requiring login or aggressive anti-bot measures (LinkedIn, Indeed at scale) — those involve ToS/legal risk that's out of scope for this Actor.
Honest limitation
This was built and unit-tested offline against realistic mocks of each source's documented response shape (the sandbox used to build this has no outbound access to remoteok.com or weworkremotely.com). The normalization logic is defensive — it validates response shape, retries on failure, and skips/logs malformed records rather than crashing — but you should run one live test on Apify before relying on it, in case either site's actual current API/feed differs in some way I couldn't verify directly.