Remote Job Scraper avatar
Remote Job Scraper

Pricing

Pay per usage

Go to Apify Store
Remote Job Scraper

Remote Job Scraper

Developed by

Shahid Irfan

Shahid Irfan

Maintained by Community

The Remote Job Scraper is a lightweight Apify actor designed to efficiently scrape remote job postings. It's optimized for speed and simplicity. For best results and to ensure reliable data extraction without blocks, the use of residential proxies is highly recommended.

0.0 (0)

Pricing

Pay per usage

0

2

2

Last modified

2 days ago

Remote.co Jobs Scraper 🌐

A production-ready Apify actor that scrapes remote job listings from Remote.co using Crawlee's CheerioCrawler and gotScraping for HTTP-based web scraping.

Features ✨

  • Stealth Scraping: Realistic browser headers, session management, and proxy rotation to avoid detection
  • Smart Pagination: Automatically follows Remote.co's ?page=N pagination until results quota is met
  • JSON-LD Priority: Extracts structured JobPosting data when available, with robust HTML fallbacks
  • Flexible Input: Search by keyword/location or provide direct Remote.co URLs
  • Dataset Deduplication: Built-in in-memory URL deduplication to prevent duplicate results
  • Polite Crawling: Configurable random delays between requests (500-1500ms default)
  • No Browser Required: Pure HTTP scraping using Crawlee + gotScraping stack
  • Production-Ready: Designed for Apify platform with full proxy support

Input πŸ“‹

All input fields are optional. The actor intelligently builds search URLs if you don't provide direct URLs.

Search Parameters

  • keyword (string) β€” Job title or skill keywords (e.g., "Software Engineer", "Marketing Manager")
  • location (string) β€” Location filter (most Remote.co jobs are remote/global)
  • category (string) β€” Optional job category filter
  • startUrl / url / startUrls β€” Direct Remote.co search URLs (overrides keyword/location)

Crawl Configuration

  • results_wanted (integer, default: 100) β€” Maximum number of jobs to scrape
  • max_pages (integer, default: 20) β€” Safety limit on pagination depth
  • collectDetails (boolean, default: true) β€” Visit detail pages for full descriptions
  • dedupe (boolean, default: true) β€” Remove duplicate job URLs

Stealth & Performance

  • minRequestDelay (integer, default: 500) β€” Minimum delay between requests (ms)
  • maxRequestDelay (integer, default: 1500) β€” Maximum delay between requests (ms)
  • proxyConfiguration (object) β€” Residential proxies recommended for Remote.co
    {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
    }

Advanced

  • cookies (string) β€” Raw cookie header string
  • cookiesJson (string) β€” JSON-formatted cookies (array or object)

Output πŸ“¦

Each job is saved to the Apify dataset with the following schema:

{
"title": "Senior Software Engineer",
"company": "Acme Corp",
"category": "Engineering",
"location": "Remote, US National",
"date_posted": "2025-10-18T12:00:00Z",
"description_html": "<p>Full job description HTML...</p>",
"description_text": "Plain text version of description",
"url": "https://remote.co/job-details/senior-software-engineer-abc123"
}

Example Usage πŸš€

{
"keyword": "python developer",
"results_wanted": 50,
"collectDetails": true
}

Direct URL with custom delays

{
"startUrl": "https://remote.co/remote-jobs/search?search_keywords=marketing",
"results_wanted": 100,
"minRequestDelay": 1000,
"maxRequestDelay": 2000
}

Technical Stack πŸ› οΈ

  • Apify SDK (^3.4.5) β€” Actor runtime and dataset management
  • Crawlee (^3.14.1) β€” CheerioCrawler for HTTP-based scraping
  • Cheerio (^1.0.0-rc.12) β€” Fast HTML parsing
  • got-scraping (^4.0.3) β€” HTTP client with stealth features

Best Practices πŸ’‘

  1. Use Residential Proxies: Remote.co may rate-limit datacenter IPs
  2. Start Small: Test with results_wanted: 10 before scaling up
  3. Monitor Sessions: Check run logs for 403/429 errors and adjust delays
  4. Enable Deduplication: Keep dedupe: true to avoid duplicate jobs
  5. Set Realistic Delays: 500-1500ms per request is polite and effective

Stealth Features πŸ₯·

  • Realistic Chrome 122 user-agent and browser headers
  • Session pooling with automatic rotation on errors
  • Random delays between requests (configurable)
  • Referer headers for detail pages
  • Cookie support for bypassing consent banners
  • Aggressive session retirement on 403/429 responses

Notes πŸ“

  • No local dependencies neededβ€”runs directly on Apify platform
  • Selectors are tuned for Remote.co's structure as of October 2025
  • If Remote.co changes their HTML, selectors in src/main.js may need updates
  • The actor respects robots.txt and uses polite crawling delays