Remote Jobs Intelligence Scraper

Pricing

from $1.80 / 1,000 job-results

Scrape public remote job listings from remote-first sources (Remotive, Remote OK, We Work Remotely) and turn them into clean, CSV-ready hiring-intelligence data - no login, cookies, or residential proxy.

Developer

Delowar Munna

Maintained by Community

Last modified: 2 days ago

Collect public remote job listings from remote-first sources — Remotive, Remote OK, and We Work Remotely — and turn them into one clean, flat, CSV-ready schema enriched with lightweight remote-work intelligence: remote scope, location/country/timezone restrictions, salary availability, detected skills, and a transparent hiring-signal score. Built for recruiters, staffing agencies, sales teams, and remote-work market researchers.

No login, no cookies, no residential proxy, no paid APIs. The actor reads each source's public API/feed over plain HTTP, so it stays fast and cost-predictable. You pay one flat event per unique job row that passes your filters.

✨ Why this scraper

  • Remote-first, not generic — only remote-job sources, normalized into a single schema with remote-specific intelligence fields.
  • Multiple sources, one schema — Remotive + Remote OK JSON APIs and We Work Remotely RSS, deduplicated across sources.
  • 31 flat fields — job identity, company, remote scope, salary, skills, posting age, and hiring signal. No nested objects; drops straight into Sheets/Excel/CRMs.
  • Pay-Per-Event — one flat job-result event per saved unique job. Duplicates and filtered rows are never charged.
  • Partial-failure safe — if one source is down, the others still return results.
  • Transparent hiring-signal score — rule-based (no AI), explained below.

🚀 Quick start — sample inputs

Example 1 — keyword search across all three sources

{
  "keywords": ["software engineer", "python"],
  "keywordMatchMode": "any",
  "sources": ["remotive", "remoteok", "weworkremotely"],
  "remoteScope": "any",
  "salaryRequired": false,
  "postedWithinDays": 30,
  "includeDescription": true,
  "includeDetectedSkills": true,
  "maxResults": 500,
  "deduplicate": true,
  "proxyConfiguration": { "useApifyProxy": true }
}

Keyword matching is word/token based: "software engineer" also matches "Backend Engineer" or "Staff Engineer". Use keywordMatchMode: "all" to require every keyword in the list to match.
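A minimal sketch of token-based matching, assuming the job text is the title, company, category, tags, and description joined together, and that a keyword counts as matched when any of its lower-cased word tokens appears in that text (this reproduces the "Backend Engineer" behaviour described above). The function names are hypothetical, not the actor's own code.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased word tokens of a piece of text."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_hits(keywords: list, job_text: str) -> list:
    """Keywords sharing at least one token with the job text (assumed rule)."""
    job_tokens = tokens(job_text)
    return [kw for kw in keywords if tokens(kw) & job_tokens]


def passes_keywords(keywords: list, job_text: str, mode: str = "any") -> bool:
    """keywordMatchMode "any": one keyword is enough; "all": every keyword must match."""
    if not keywords:  # an empty keyword list means no keyword filter
        return True
    hits = keyword_hits(keywords, job_text)
    return len(hits) == len(keywords) if mode == "all" else bool(hits)


print(passes_keywords(["software engineer", "python"], "Backend Engineer"))         # True
print(passes_keywords(["software engineer", "python"], "Backend Engineer", "all"))  # False
```

Under this assumed rule, token overlap is deliberately broad, so "all" mode or more specific keywords are the way to narrow results.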

Example 2 — worldwide remote, salary required, all three sources + direct URL

{
  "keywords": ["designer"],
  "sources": ["remotive", "remoteok", "weworkremotely"],
  "sourceUrls": ["https://remotive.com/remote-jobs/design"],
  "locationKeywords": ["Worldwide", "Europe"],
  "remoteScope": "worldwide",
  "salaryRequired": true,
  "postedWithinDays": 14,
  "maxResults": 500,
  "proxyConfiguration": { "useApifyProxy": true }
}

Leave keywords, sourceUrls, and the filters empty to simply pull the most recent jobs from the selected sources. If both sources and sourceUrls are given, the actor runs both and deduplicates across the whole run.

Attribution: Remote OK requires that data consumers credit Remote OK as the source. If you republish Remote OK rows, link back to the original source_job_url.


📦 Output

The dataset has one view: Remote jobs & intelligence — a 31-column flat table.

Output fields (31)

job_id, source, source_job_url, canonical_job_url, job_title, company_name, company_website, company_logo_url, source_category, employment_type, seniority, remote_scope, location_restriction, country_restrictions, timezone_restrictions, salary_available, salary_min, salary_max, salary_currency, salary_period, posted_at, posted_age_days, description, application_url, detected_skills, matched_keywords, hiring_signal_score, reason_tags, input_keyword, input_source_url, scraped_at.

Sample record — Remote jobs & intelligence

(Real row from a sample run; the description is truncated here for readability.)

{
"job_id": "2090910",
"source": "remotive",
"source_job_url": "https://remotive.com/remote-jobs/software-development/staff-software-engineer-product-belo-horizonte-2090910",
"canonical_job_url": "https://remotive.com/remote-jobs/software-development/staff-software-engineer-product-belo-horizonte-2090910",
"job_title": "Staff Software Engineer, Product (Belo Horizonte)",
"company_name": "LawnStarter",
"company_website": null,
"company_logo_url": "https://remotive.com/job/2090910/logo",
"source_category": "software development",
"employment_type": "full-time",
"seniority": "lead",
"remote_scope": "country_restricted",
"location_restriction": "Brazil",
"country_restrictions": "Brazil",
"timezone_restrictions": null,
"salary_available": true,
"salary_min": 80000,
"salary_max": 100000,
"salary_currency": "USD",
"salary_period": "unknown",
"posted_at": "2026-06-02T07:53:42.000Z",
"posted_age_days": 4,
"description": "This is a remote role for candidates located in Belo Horizonte, Brazil. About LawnStarter — LawnStarter is the nation's leading on-demand marketplace for lawn care and outdoor services...",
"application_url": "https://remotive.com/remote-jobs/software-development/staff-software-engineer-product-belo-horizonte-2090910",
"detected_skills": "typescript,php,laravel,react,rest,aws,ai,ux,machine learning",
"matched_keywords": "software engineer",
"hiring_signal_score": 98,
"reason_tags": "recent_posting,salary_visible,location_restriction_clear,company_present,apply_url_present,skills_detected,keyword_match",
"input_keyword": "software engineer",
"input_source_url": null,
"scraped_at": "2026-06-07T05:47:42.247Z"
}

🎯 Hiring-signal score

Transparent rule-based score (0–100) computed from extracted fields — no AI, no external enrichment.

Signal | Points
Base (any valid remote job row) | +20
Posted within the last 7 days | +15
Posted within the last 30 days (if not within 7 days) | +10
Salary visible | +15
Remote scope worldwide | +10
Remote scope clearly country/region restricted | +8
Company name present | +10
Application URL present | +10
Detected skills present | +10
Matched a keyword/category filter | +10

Score is capped at 100. Bands: high (80–100) · medium (50–79) · low (1–49) · unknown (0).

reason_tags is a comma-separated list explaining the score — e.g. recent_posting, salary_visible, worldwide_remote, location_restriction_clear, skills_detected, keyword_match, company_present, apply_url_present, stale_posting, missing_posted_date.
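The table can be recomputed from a single output row. This is a sketch, not the actor's source: the cut-offs (7 and 30 days inclusive), the absence of a tag for 8-30 day postings, the triggers for stale_posting and missing_posted_date, and using matched_keywords as the "matched a filter" signal are all assumptions. The function names are hypothetical.

```python
def hiring_signal(job: dict) -> tuple:
    """Recompute (hiring_signal_score, reason_tags) from one output row (sketch)."""
    score, tags = 20, []  # base points for any valid remote job row
    age = job.get("posted_age_days")
    if age is None:
        tags.append("missing_posted_date")
    elif age <= 7:
        score += 15
        tags.append("recent_posting")
    elif age <= 30:
        score += 10  # assumption: no reason tag for 8-30 day postings
    else:
        tags.append("stale_posting")
    if job.get("salary_available"):
        score += 15
        tags.append("salary_visible")
    scope = job.get("remote_scope")
    if scope == "worldwide":
        score += 10
        tags.append("worldwide_remote")
    elif scope in ("country_restricted", "region_restricted"):
        score += 8
        tags.append("location_restriction_clear")
    # Assumption: matched_keywords stands in for "matched a keyword/category filter".
    for field, tag in (("company_name", "company_present"),
                       ("application_url", "apply_url_present"),
                       ("detected_skills", "skills_detected"),
                       ("matched_keywords", "keyword_match")):
        if job.get(field):
            score += 10
            tags.append(tag)
    return min(score, 100), ",".join(tags)


def score_band(score: int) -> str:
    """high (80-100), medium (50-79), low (1-49), unknown (0)."""
    if score >= 80:
        return "high"
    if score >= 50:
        return "medium"
    return "low" if score >= 1 else "unknown"
```

Fed the relevant fields of the sample record (posted 4 days ago, salary visible, country_restricted, company, apply URL, skills, keyword match), this returns 98 and the same reason_tags string, which makes it a handy sanity check.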


💰 Pricing

Pay-Per-Event. One flat event per saved row (final per-event price is configured on the Apify console):

Event | Charged when
job-result | Once per unique job row that passed all filters and was successfully written to the dataset.

So your bill is simply results_saved × price_per_event. The actor honors your per-run spending cap (signalled by Apify's eventChargeLimitReached): it limits up front how many results it collects to what the cap can pay for, and stops cleanly as soon as the cap is reached during charging.
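The arithmetic is simple enough to check by hand. This sketch assumes the advertised starting price of $1.80 per 1,000 job-result events (your actual per-event price may differ) and uses Decimal to avoid float rounding; the function names are hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR

# Assumption: the starting price of $1.80 per 1,000 job-result events.
PRICE_PER_EVENT = Decimal("1.80") / 1000


def expected_bill(results_saved: int, price: Decimal = PRICE_PER_EVENT) -> Decimal:
    """bill = results_saved x price_per_event; duplicates and filtered rows cost nothing."""
    return results_saved * price


def results_affordable(cap_usd: Decimal, price: Decimal = PRICE_PER_EVENT) -> int:
    """How many job-result events a per-run spending cap can pay for (rounded down)."""
    return int((cap_usd / price).to_integral_value(rounding=ROUND_FLOOR))


print(expected_bill(136))                   # 0.2448
print(results_affordable(Decimal("0.50")))  # 277
```

At that price, the 136 saved rows of the sample run below would cost about $0.24, and a $0.50 cap would stop collection at 277 results.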

Not charged:

  • Duplicates (deduplicated by source + job_id, canonical URL, and title+company keys).
  • Rows filtered out by keyword / category / company / location / remote-scope / salary / date filters.
  • Invalid rows (missing title, company, or source, or with no job URL at all).
  • Failed or blocked requests.
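The three deduplication keys can be illustrated as follows. This is a sketch under assumptions: the normalization (lower-casing and collapsing whitespace in title/company, stripping a trailing slash from the canonical URL) and the "first row seen wins" rule are guesses, and the function names are hypothetical.

```python
def _norm(text: str) -> str:
    """Lower-case and collapse whitespace so 'Staff  Engineer' equals 'staff engineer'."""
    return " ".join(text.lower().split())


def dedup_keys(job: dict) -> list:
    """The three documented keys: source + job_id, canonical URL, title + company."""
    keys = []
    if job.get("source") and job.get("job_id"):
        keys.append(("id", job["source"], str(job["job_id"])))
    if job.get("canonical_job_url"):
        keys.append(("url", job["canonical_job_url"].rstrip("/")))
    if job.get("job_title") and job.get("company_name"):
        keys.append(("title_company", _norm(job["job_title"]), _norm(job["company_name"])))
    return keys


def deduplicate(jobs: list) -> list:
    """Keep the first row seen; a later row sharing ANY key is treated as a duplicate."""
    seen = set()
    unique = []
    for job in jobs:
        keys = dedup_keys(job)
        if any(key in seen for key in keys):
            continue  # duplicate: never pushed, never charged
        seen.update(keys)
        unique.append(job)
    return unique
```

Matching on any one key is what lets the same job posted on two different sources collapse into a single row.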

🚦 Proxy policy

Use Apify Datacenter proxy or no proxy for normal runs — both work reliably for these public APIs/feeds at this actor's conservative concurrency.

Apify Residential proxy is not supported. The actor fails at startup if proxyConfiguration.apifyProxyGroups includes RESIDENTIAL. Reason: in pay-per-event actors, residential bandwidth (billed per GB) is charged to the developer, not the run user, so a single bandwidth-heavy run could cost more than it earns in per-result events.

If you genuinely need residential routing, supply your own residential provider via the proxy editor's Custom proxy URLs field — that traffic goes through your provider, not Apify, and is unaffected:

http://user:pass@proxy.iproyal.com:12321
http://user:pass@proxy.brightdata.com:22225
http://user:pass@proxy.oxylabs.io:7777
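A startup check along these lines would enforce the policy. This is a sketch, not the actor's code: it assumes the standard Apify proxy-editor input shape (useApifyProxy, apifyProxyGroups, proxyUrls) and a hypothetical function name, and it only inspects apifyProxyGroups, so custom proxyUrls pass through untouched.

```python
from typing import Optional


def validate_proxy_configuration(cfg: Optional[dict]) -> None:
    """Reject Apify Residential; allow no proxy, Datacenter, or custom proxyUrls."""
    if not cfg:
        return  # no proxy configured: allowed
    groups = [str(g).upper() for g in (cfg.get("apifyProxyGroups") or [])]
    if "RESIDENTIAL" in groups:
        raise ValueError(
            "Apify Residential proxy is not supported. Use Datacenter, no proxy, "
            "or your own provider via Custom proxy URLs."
        )
```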

📊 Run summary

After each run, a RUN_SUMMARY entry is written to the key-value store:

{
  "inputs_total": 12,
  "sources_requested": ["remotive", "remoteok", "weworkremotely"],
  "successful_sources": ["remotive", "remoteok", "weworkremotely"],
  "failed_sources": [],
  "successful_inputs": 12,
  "failed_inputs": 0,
  "raw_results_found": 362,
  "results_saved": 136,
  "duplicates_removed": 26,
  "filtered_out": 200,
  "charged_events": 136,
  "blocked_requests": 0,
  "retry_count": 0,
  "source_counts": { "remotive": 22, "remoteok": 38, "weworkremotely": 76 },
  "runtime_seconds": 6,
  "scraped_at": "2026-06-07T05:47:42.247Z"
}

inputs_total is 12 because We Work Remotely fans out across its category RSS feeds (10 in this run), plus one Remotive and one Remote OK request. Leaving keywords (and the other filters) empty pushes filtered_out toward 0 and returns far more rows.

charged_events equals the number of successfully saved unique rows.
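In the sample summary the counts reconcile: 136 saved + 26 duplicates + 200 filtered = 362 raw results, and 22 + 38 + 76 = 136 saved rows. The sketch below cross-checks those identities on a RUN_SUMMARY record. Treating them as invariants for every run is an assumption (for example, rows dropped as invalid, or a run stopped by the spending cap, may not be reflected the same way); the function name is hypothetical.

```python
def summary_problems(summary: dict) -> list:
    """Cross-check a RUN_SUMMARY record; an empty list means the numbers add up."""
    problems = []
    accounted = (summary["results_saved"] + summary["duplicates_removed"]
                 + summary["filtered_out"])
    if accounted != summary["raw_results_found"]:
        problems.append(f"raw_results_found={summary['raw_results_found']}, "
                        f"saved+duplicates+filtered={accounted}")
    if sum(summary["source_counts"].values()) != summary["results_saved"]:
        problems.append("source_counts do not sum to results_saved")
    if summary["charged_events"] != summary["results_saved"]:
        problems.append("charged_events differs from results_saved")
    return problems
```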


⚙️ Filters

All filters apply after extraction and normalization, and before any dataset push or charge.

Filter | Effect
keywords + keywordMatchMode | Match title/company/category/tags/description. any = at least one keyword; all = every keyword.
categories | Keep only jobs in a matching source category.
companies | Keep only jobs from matching company names.
locationKeywords | Keep only jobs whose location/region text matches.
remoteScope | any / worldwide / country_restricted / region_restricted / unknown.
salaryRequired | Keep only jobs with a visible salary.
postedWithinDays | Keep only jobs posted within N days (0 disables; jobs with no posted date are dropped when N > 0).
deduplicate | Drop duplicate jobs across sources and inputs (recommended ON).

Missing values behave conservatively: when a filter is set and the relevant field is missing, the row is filtered out.
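The conservative rule can be sketched for four of the filters (keyword, category, and company filters are left out for brevity). Assumptions: the filters read the output fields remote_scope, salary_available, posted_age_days, and location_restriction; locationKeywords uses a case-insensitive substring match; the function name is hypothetical.

```python
from typing import Optional


def passes_filters(job: dict, remote_scope: str = "any", salary_required: bool = False,
                   posted_within_days: int = 0,
                   location_keywords: Optional[list] = None) -> bool:
    """Post-extraction filters: when a filter is set, a row missing its field is dropped."""
    if remote_scope != "any" and job.get("remote_scope") != remote_scope:
        return False
    if salary_required and not job.get("salary_available"):
        return False
    if posted_within_days > 0:
        age = job.get("posted_age_days")
        if age is None or age > posted_within_days:
            return False  # a missing posted date is dropped when N > 0
    if location_keywords:
        where = (job.get("location_restriction") or "").lower()
        if not any(kw.lower() in where for kw in location_keywords):
            return False  # missing location text cannot match, so the row is dropped
    return True
```

Only rows that return True here would go on to deduplication, the dataset push, and a charge.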


🚧 Limitations (V1)

  • Public sources only: Remotive public API, Remote OK public JSON feed, We Work Remotely public RSS. No login, cookies, or member-only content.
  • Salary parsing is best-effort and only set when numeric compensation is visible; "competitive salary" is not treated as available.
  • Remote fields are derived from the visible location/candidate text — they do not infer legal work eligibility beyond what's stated.
  • detected_skills is a curated keyword dictionary match (not AI).
  • No recruiter/contact extraction, email enrichment, company-website crawling, logo downloading, or AI scoring.
  • maxResults caps saved unique rows across the whole run (not per source).
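A curated keyword-dictionary match, as used for detected_skills, can be as small as the sketch below. The dictionary here is a hypothetical mini-list (the actor's curated list is not reproduced), and the whole-word lookarounds are an assumed matching rule chosen so that "react" does not fire on "reactive".

```python
import re

# Hypothetical mini-dictionary; the actor's curated list is not reproduced here.
SKILLS = ("typescript", "php", "laravel", "react", "aws", "python", "machine learning")


def detect_skills(text: str, skills: tuple = SKILLS) -> str:
    """Comma-separated dictionary terms found as whole words (no AI involved)."""
    lowered = text.lower()
    found = []
    for skill in skills:
        # Lookarounds instead of \b so terms such as "c++" or "c#" would also work.
        pattern = r"(?<![\w+#])" + re.escape(skill) + r"(?![\w+#])"
        if re.search(pattern, lowered):
            found.append(skill)
    return ",".join(found)


print(detect_skills("We use TypeScript, React and AWS; machine learning is a plus."))
```

Results come out in dictionary order rather than text order, and anything not in the dictionary is simply never detected, which is the trade-off of a non-AI approach.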

❓ FAQ

Do I need an account or API key? No. All three sources are read through their public, unauthenticated API/feeds.

Why are some fields empty? Sources expose different fields, and the actor never invents values. company_website is not published by any of the three sources, so it is always empty. company_logo_url comes from Remotive (Remote OK currently returns blank logos on its public feed; We Work Remotely RSS has none). Salary is well populated from Remotive, sparse on Remote OK, and absent from WWR RSS. country_restrictions/timezone_restrictions are derived only from visible location text, so "Worldwide"/"Anywhere" jobs correctly stay empty. input_source_url is set only when you use sourceUrls. Missing values are null / false / unknown consistently.

How is remote_scope derived? From the visible location/candidate-required-location text: Worldwide/Anywhere → worldwide; a single country → country_restricted; a multi-country region (Europe, APAC, …) → region_restricted; otherwise unknown.
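Those rules map onto a small classifier. The region and country sets below are hypothetical subsets, and treating a list of several countries as region_restricted is an assumption the FAQ does not state; the function name is also hypothetical.

```python
import re
from typing import Optional

# Hypothetical subsets; the actor's real country/region lists are not published here.
REGIONS = {"europe", "emea", "apac", "latam", "north america"}
COUNTRIES = {"brazil", "usa", "united states", "canada", "germany", "uk", "india"}


def remote_scope(location_text: Optional[str]) -> str:
    """Classify visible location text; never guesses beyond what the text states."""
    text = (location_text or "").lower()
    if re.search(r"\b(worldwide|anywhere)\b", text):
        return "worldwide"
    parts = [p.strip() for p in re.split(r"[,;/|]", text) if p.strip()]
    if any(p in REGIONS for p in parts):
        return "region_restricted"
    countries = [p for p in parts if p in COUNTRIES]
    if len(countries) == 1:
        return "country_restricted"
    if countries:
        return "region_restricted"  # assumption: several countries read as a region
    return "unknown"
```

For the sample record, location_restriction "Brazil" yields country_restricted, matching its remote_scope.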

Can I paste a source URL? Yes — put supported URLs (remotive.com / remoteok.com / weworkremotely.com) in sourceUrls. Unsupported URLs are logged as failed inputs and skipped without failing the run.
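Routing sourceUrls by host could look like this sketch. The host-to-source mapping uses the three domains named above; stripping a leading "www." and the function names are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Hosts named in the FAQ; the mapping to source names is an assumption.
SUPPORTED_HOSTS = {
    "remotive.com": "remotive",
    "remoteok.com": "remoteok",
    "weworkremotely.com": "weworkremotely",
}


def source_for_url(url: str) -> Optional[str]:
    """Return the source name for a supported URL, else None."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return SUPPORTED_HOSTS.get(host)


def split_source_urls(urls: list) -> tuple:
    """Separate supported URLs from unsupported ones without failing the run."""
    supported, failed = [], []
    for url in urls:
        source = source_for_url(url)
        if source:
            supported.append((source, url))
        else:
            failed.append(url)  # logged as a failed input; the run continues
    return supported, failed
```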

Can I export to CSV? Yes — every field is flat. Use Apify's CSV / Excel export, or call the dataset API with format=csv.
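For scripted exports, the dataset items endpoint of the Apify API accepts format=csv. The helper below only builds the URL; the dataset ID and token are placeholders you would take from your own run, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def dataset_csv_url(dataset_id: str, token: Optional[str] = None) -> str:
    """URL of the dataset items endpoint with CSV output requested."""
    params = {"format": "csv"}
    if token:
        params["token"] = token
    return f"https://api.apify.com/v2/datasets/{quote(dataset_id)}/items?{urlencode(params)}"
```

Fetching that URL with any HTTP client (for example urllib.request.urlopen) returns the flat 31-column table as CSV text.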


🛠️ Technical notes

  • Stack: Node.js 22 · Apify SDK 3 · Crawlee HttpCrawler · Cheerio (RSS/HTML parsing). No browser.
  • Sources: Remotive …/api/remote-jobs (JSON), Remote OK …/api (JSON), We Work Remotely category .rss feeds (XML).
  • Concurrency: min=1, max=5 (conservative; tune after real runs).
  • Memory: 1 GB min · 2 GB default · 4 GB max.
  • Proxy: Apify Proxy (Datacenter) by default; no-proxy and custom proxy URLs accepted; Apify Residential rejected at startup.