Himalayas Job Scraper

Meet the Himalayas Job Scraper, a lightweight actor designed to extract remote job listings from Himalayas.app quickly, reliably, and with minimal setup. To ensure uninterrupted performance and avoid IP bans, residential proxies are highly recommended.

Pricing: Pay per usage
Rating: 5.0 (1 review)
Developer: Shahid Irfan (Maintained by Community)
Actor stats: 1 bookmark · 22 total users · 9 monthly active users · last modified 3 days ago

Himalayas Job Scraper

Extract remote job listings from Himalayas into clean, analysis-ready datasets. Collect job titles, companies, locations, compensation details, work type, requirements, and application deadlines at scale. This scraper is built for recruiting teams, talent intelligence workflows, and job market research.

Features

  • Flexible search inputs — Run by keyword and location, or a single startUrl.
  • Rich job coverage — Collect both listing-level and detailed job metadata in one dataset.
  • Automatic pagination — Traverse result pages up to your configured limits.
  • Null-clean output — Empty fields are removed automatically for cleaner downstream processing.
  • Duplicate protection — Canonical URL deduplication avoids repeated job records.
  • Integration-ready data — Export clean records for dashboards, CRMs, ATS tools, and automations.
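The null-clean and deduplication behaviors described above can be sketched in a few lines of Python. This is an illustration of the idea, not the actor's actual source code:

```python
def clean_item(item: dict) -> dict:
    """Drop fields whose value is None, an empty string, or an empty list/dict."""
    return {k: v for k, v in item.items() if v not in (None, "", [], {})}

def dedupe_by_url(items: list) -> list:
    """Keep only the first record seen for each canonical job URL."""
    seen, unique = set(), []
    for item in items:
        url = item.get("url")
        if url in seen:
            continue
        seen.add(url)
        unique.append(item)
    return unique
```

The same two passes are useful when you merge datasets from several runs downstream.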

Use Cases

Talent Intelligence

Track hiring patterns across companies, markets, and job families. Build searchable datasets to identify demand trends by role, region, and employment type.

Recruitment Sourcing

Collect high-quality openings for outreach pipelines. Filter by relevant keywords and locations to prioritize matching opportunities faster.

Salary and Market Analysis

Monitor salary visibility, work arrangements, and role requirements across listings. Use historical runs to evaluate market movement over time.

Job Board Monitoring

Run scheduled collections to detect newly posted opportunities and deadlines. Power alerting workflows for teams that need timely updates.


Input Parameters

Parameter           Type     Required  Default                       Description
startUrl            String   No        "https://himalayas.app/jobs"  Start from one Himalayas jobs/search URL.
keyword             String   No        "software engineer"           Search keyword when URL inputs are not provided.
location            String   No        "United States"               Optional location filter for keyword-based runs.
collectDetails      Boolean  No        true                          Include raw_job payload in output for advanced use.
results_wanted      Integer  No        20                            Maximum number of jobs to save.
max_pages           Integer  No        5                             Maximum pages to scan per seed URL.
proxyConfiguration  Object   No        Apify Proxy preset            Proxy settings for run reliability.
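A run input can be assembled programmatically by merging overrides onto the documented defaults. The helper below is a sketch; the commented apify-client call assumes you substitute your own API token and the actor ID shown in Apify Console:

```python
# Documented defaults for this actor's input (from the parameter table above).
DEFAULTS = {
    "startUrl": "https://himalayas.app/jobs",
    "keyword": "software engineer",
    "location": "United States",
    "collectDetails": True,
    "results_wanted": 20,
    "max_pages": 5,
}

def build_run_input(**overrides) -> dict:
    """Merge user overrides onto the documented defaults."""
    return {**DEFAULTS, **overrides}

# With the apify-client package (pip install apify-client), a run might look like:
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_API_TOKEN>")
# run = client.actor("<ACTOR_ID>").call(run_input=build_run_input(keyword="python engineer"))
# for item in client.dataset(run["defaultDatasetId"]).iterate_items():
#     print(item["title"])
```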

Output Data

Each dataset item may contain the following fields (fields with no value are omitted):

Field                            Type     Description
title                            String   Job title.
excerpt                          String   Short summary/excerpt from the listing.
company                          String   Hiring company name.
company_slug                     String   Company slug from listing data.
company_logo                     String   Company logo URL.
location                         String   Consolidated location text.
applicant_location_requirements  Array    Explicit applicant location requirements.
timezone_restrictions            Array    Allowed timezone offsets for candidates.
categories                       Array    Job categories/tags from source data.
parent_categories                Array    Parent category labels from source data.
date_posted                      String   Job posting date (ISO).
apply_before                     String   Application deadline when provided.
Description_html                 String   Sanitized rich job description HTML (no script/style/href/src attributes).
Description_text                 String   Plain-text version of the description HTML.
salary                           String   Human-readable salary summary.
salary_min                       Number   Minimum salary value.
salary_max                       Number   Maximum salary value.
salary_currency                  String   Salary currency code.
job_type                         String   Primary employment type.
employment_types                 Array    Full list of employment types.
remote                           Boolean  Whether the role is marked remote.
seniority                        Array    Seniority values from source data.
raw_job                          Object   Full source job object (only when collectDetails is true).
url                              String   Canonical job URL.
source                           String   Data source label.
scraped_at                       String   ISO timestamp of extraction.
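Because empty fields are omitted, downstream filters should treat every field as optional. A defensive filter over exported items might look like this (illustrative only; field names follow the table above):

```python
def remote_jobs_with_salary(items: list, min_salary: float) -> list:
    """Select remote postings whose advertised minimum salary meets a threshold.

    Postings that omit `remote` or `salary_min` are skipped rather than
    treated as zero, since absent fields are dropped from the dataset.
    """
    return [
        item for item in items
        if item.get("remote")
        and isinstance(item.get("salary_min"), (int, float))
        and item["salary_min"] >= min_salary
    ]
```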

Usage Examples

Basic Keyword Run

Collect software engineering jobs with default detail collection:

{
  "keyword": "software engineer",
  "location": "United States",
  "results_wanted": 20
}

URL-Based Collection

Start from a specific jobs URL:

{
  "startUrl": "https://himalayas.app/jobs?q=data+engineer",
  "results_wanted": 50,
  "max_pages": 5,
  "collectDetails": true
}

Targeted Keyword Run

Use a focused keyword and location combination:

{
  "keyword": "python engineer",
  "location": "Canada",
  "results_wanted": 100,
  "max_pages": 8
}

Lightweight Fast Run

Skip extended details for faster scans:

{
  "keyword": "customer success",
  "location": "Remote",
  "collectDetails": false,
  "results_wanted": 30,
  "max_pages": 3
}

Sample Output

{
  "title": "Senior Data Engineer",
  "excerpt": "Build scalable data systems for global analytics workloads.",
  "company": "Example Labs",
  "company_slug": "example-labs",
  "company_logo": "https://cdn-images.himalayas.app/example-logo",
  "location": "United States, Canada",
  "applicant_location_requirements": ["United States", "Canada"],
  "timezone_restrictions": [-8, -7, -6, -5],
  "categories": ["Data-Engineer", "Software-Engineer"],
  "parent_categories": ["Developer"],
  "date_posted": "2026-04-01T10:22:00.000Z",
  "apply_before": "2026-05-01T10:22:00.000Z",
  "Description_html": "<p>We are looking for a <strong>senior data engineer</strong> to build scalable data systems.</p><ul><li>Python</li><li>SQL</li></ul>",
  "Description_text": "We are looking for a senior data engineer to build scalable data systems. Python SQL",
  "salary": "120000-160000 USD",
  "salary_min": 120000,
  "salary_max": 160000,
  "salary_currency": "USD",
  "job_type": "Full Time",
  "employment_types": ["Full Time"],
  "remote": true,
  "seniority": ["Senior"],
  "url": "https://himalayas.app/companies/example-labs/jobs/senior-data-engineer",
  "source": "himalayas.app",
  "scraped_at": "2026-04-08T08:35:12.220Z"
}
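Records like the sample above lend themselves to simple derived metrics. The helpers below are a sketch for post-processing exported items, assuming the ISO timestamps and salary fields shown in the output table:

```python
from datetime import datetime, timezone

def salary_midpoint(item: dict):
    """Midpoint of the advertised salary range, or None if the range is absent."""
    lo, hi = item.get("salary_min"), item.get("salary_max")
    if lo is None or hi is None:
        return None
    return (lo + hi) / 2

def days_until_deadline(item: dict, now: datetime):
    """Whole days from `now` until apply_before (ISO 8601; a 'Z' suffix is accepted)."""
    deadline = item.get("apply_before")
    if not deadline:
        return None
    parsed = datetime.fromisoformat(deadline.replace("Z", "+00:00"))
    return (parsed - now).days
```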

Tips for Best Results

Start Small, Then Scale

  • Use results_wanted around 20-50 for quick validation.
  • Increase limits only after confirming your filters produce relevant jobs.

Use Specific Search Intent

  • Prefer focused queries like "backend engineer" over broad terms like "engineer".
  • Combine keyword and location when you need tighter targeting.

Balance Depth and Speed

  • Keep collectDetails set to true when you need enriched fields.
  • Set collectDetails to false for faster monitoring runs.

Keep Data Clean

  • Canonical URL deduplication is always applied internally to avoid repeated records.
  • Run periodic exports to keep downstream datasets current.

Improve Stability on Large Runs

  • Use proxy configuration when collecting many pages.
  • Increase max_pages gradually to avoid unnecessary retries.
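The proxyConfiguration input follows Apify's standard proxy object shape. A residential-group configuration (assuming the RESIDENTIAL group is available on your plan) looks like:

```
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```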

Integrations

Connect your collected data with:

  • Google Sheets — Build collaborative tracking sheets.
  • Airtable — Create searchable recruiting databases.
  • Slack — Send job alerts to hiring channels.
  • Zapier — Trigger no-code automations.
  • Make — Orchestrate multi-step workflows.
  • Webhooks — Deliver records to your own systems.

Export Formats

  • JSON — Structured integration and application use.
  • CSV — Spreadsheet analysis and reporting.
  • Excel — Business-friendly data sharing.
  • XML — Legacy or enterprise integration flows.
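Apify produces these formats directly, but if you post-process a JSON export yourself, flattening heterogeneous records (where fields vary per job) into CSV can be sketched like this:

```python
import csv
import io

def records_to_csv(records: list) -> str:
    """Flatten job records into CSV text, using the union of all keys as columns.

    Missing fields become empty cells, since the actor omits empty values.
    """
    fieldnames = sorted({key for rec in records for key in rec})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```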

Frequently Asked Questions

Can I run with only a URL and no keyword?

Yes. You can provide startUrl without a keyword.

Can I collect jobs from multiple searches in one run?

Run the actor multiple times with different startUrl or keyword/location inputs, then combine datasets downstream.

Why are some fields missing from some jobs?

Different postings expose different metadata. The actor saves all available fields and omits empty ones.

How can I make runs faster?

Lower results_wanted and max_pages, or set collectDetails to false.

How can I avoid duplicate records?

Deduplication is automatic using canonical job URLs.

Is this suitable for scheduled monitoring?

Yes. It works well for periodic runs that track new postings and market movement.


Support

For issues or enhancement requests, use the actor support channel in the Apify Console.


This actor is intended for legitimate data collection and analysis workflows. Users are responsible for ensuring compliance with applicable laws, website terms, and internal data-governance policies.