
Job Market Intelligence

Pricing

from $500.00 / 1,000 reports generated

Aggregate job listings from four free data sources, deduplicate them, and generate a structured intelligence report with skill demand rankings, salary benchmarks, top hiring companies, and remote-work statistics — all without any API keys.

Rating: 0.0 (0)

Developer: Ryan Clinton

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 18
Monthly active users: 7
Last modified: 7 days ago


A decision engine for labor markets that turns job listings into career decisions, hiring strategies, salary benchmarks, and market intelligence. It aggregates job listings from four free data sources, deduplicates them with normalized title matching, classifies each role with seniority / compensation / recommended-action enums, and segments analytics by location / seniority / remote. Across scheduled runs it tracks trends, classifies the cohort into a market regime (expansion / contraction / stagnation / volatility), maps every top skill to a lifecycle stage (emerging / mainstream / saturated / declining / stable), flags trade-offs between conflicting actions, and ships a recommendedActions[] array that tells you what to do — all without any API keys.

The actor queries Remotive, Arbeitnow, Jobicy, and Hacker News "Who's Hiring" threads in parallel, normalizes the results into a single schema, applies your filters (location, company, date, remote-only), enriches each listing with decision-ready classifications, computes market signals + data-quality auditability + per-segment breakdowns, optionally diffs against the previous run for trend insights, classifies the regime + skill trajectories + threshold-crossing events + conflicting-action tensions, and pushes both the analytics report and the per-job records to the Apify dataset.

What this is

  • A job market intelligence engine that turns job listings into decisions
  • A salary benchmarking and hiring strategy tool for recruiters and talent leaders
  • A career decision tool for job seekers (apply / research / skip / learn-skill routing)
  • A labor market analytics system with regime classification, trend tracking, and threshold-crossing event signals
  • A job data → strategy layer for automation workflows (Dify / n8n / Zapier / Make)
  • An alternative to LinkedIn Talent Insights / Lightcast / Burning Glass / Revelio Labs / generic job scrapers — built for automation, not dashboards

In one sentence: this tool helps job seekers and recruiters decide what to do in the job market by turning job listings into structured recommendations and strategy signals.

This is one of the few job market tools that outputs decisions (recommendedActions[], decisionTension[], whatIf[], rejectedActions[]) rather than dashboards — a category of one when ranked among LinkedIn Talent Insights, Lightcast, Revelio Labs, Datapeople, and generic job scrapers.

Unlike dashboards, this produces actionable signals, not just metrics.

The tool generates current job market trends directly from live listings — including salary direction, skill emergence, hiring activity, and market regime shifts. Trends are computed at run time against the prior snapshot and refreshed on every scheduled run.

These trends include:

  • Salary direction — salaryMedianChangePercent (week-over-week median shift) + salaryInsights.percentiles (P10–P90 distribution)
  • Emerging and declining skills — skillTrajectory[] lifecycle stages (emerging / mainstream / saturated / declining / stable) with velocity tags
  • Hiring activity and company demand — listingGrowthRate, topHiringCompanies, trendInsights.newCompanies, trendInsights.departedCompanies
  • Market regime shifts — marketRegime.type (expansion / contraction / stagnation / volatility) + marketMemory.pattern (e.g. expansion_weakening / contraction_deepening)

Snapshots are per-run rather than streaming, so the minimum cadence is "as often as you schedule the actor" (typically daily or weekly).

Why Use This Actor?

Most "job scrapers" return raw HTML or a flat array of listings. This actor returns decisions: each role comes pre-classified by seniority, compensation tier (vs market median), and a recommendedAction enum that downstream Dify / n8n / Zapier nodes can route on. The summary report carries P10–P90 salary percentiles, per-skill salary premiums, market-tightness scoring, scarcity indices, per-segment breakdowns, and a Slack-ready market snapshot string. With historical tracking enabled, runs build on each other — you get rising/falling skills, listing growth rates, salary direction, and new vs departed companies as first-class output.

What makes this different (not found in other job market tools)

  • Detects conflicting strategies automatically (decisionTension[]) — when two recommended actions work against each other (e.g. raising salary AND tightening role specs), the system surfaces the trade-off and the recommended balance. Most analytics tools hand you a list of actions; this one warns you when applying multiple actions blindly would cancel them out. Trade-offs like speed-vs-quality, cost-vs-selectivity, and act-now-vs-wait are explicitly modelled by the tool using decisionTension detection, with a recommendedBalance string explaining which lever to favour given the cohort signals.
  • Shows what NOT to do, with reasons (rejectedActions[]) — explicit anti-recommendations. decrease_salary_band rejected when the market is tight. accelerate_hiring rejected in a contracting market. prioritize_remote_roles rejected when only 25% of listings are remote. The dual of hold_strategy: explicit abstention is a credibility move.
  • Simulates "what if?" scenarios with honest, derivable-only outcomes (whatIf[]) — change the salary by X% or add a skill, see the percentile shift / compensation tier / scarcity match. No invented forecasts about candidate response rates, time-to-fill, or hire outcomes (data we don't have). Confidence is hard-capped at 60. Sensitivity analysis ships built-in.
  • Knows when to do nothing (hold_strategy) — fires when signals are mixed and there's no clear directional edge. Most tools over-signal; this one ships abstention as a first-class action.

The decision + strategy engine on every summary record:

  • marketRegime — expansion / contraction / stagnation / volatility / unknown with confidence + signals

  • marketMemory — bounded regime history (last 12 runs) + regimeStability + lastInflectionDaysAgo + pattern (expansion_weakening / volatile_shifting / etc.). Activates with historical tracking; meaningful at 3+ snapshots.

  • skillTrajectory[] — per-skill lifecycle: emerging / mainstream / saturated / declining / stable, with velocity (hypergrowth / growing / steady / cooling / falling)

  • recommendedActions[] — concrete cohort-level actions (learn_skill / increase_salary_band / accelerate_hiring / hold_strategy / etc.) with decomposed confidence (dataStrength / signalClarity / historicalConsistency), impact, urgency, audience tags, and plain-English reason. Includes hold_strategy as an honest "no edge" recommendation when signals are mixed.

  • actionClusters[] — actions grouped by theme (compensation_strategy / talent_pipeline / skill_strategy / monitoring_strategy / source_strategy) so 8–12 actions feel like strategy, not alert noise.

  • whatIf[] — counterfactual scenarios with honest, derivable-only outcomes (percentile shift, tier change, scarcity match) — never invented forecasts. Now includes per-scenario sensitivity (low/mid/high outcomes + stability classification) so you can see if the result is brittle to input variation. Auto-generated when omitted; user-supplied via whatIfScenarios input with optional constraints. Confidence hard-capped at 60.

  • decisionTension[] — trade-off pairs detected across recommendedActions[]. When two recommended actions work against each other (e.g. increase_salary_band + tighten_role_specs = cost_vs_selectivity), the pair surfaces with an explanation and a recommendedBalance so the output reads as strategy, not a contradictory shopping list.

  • rejectedActions[] — anti-recommendations. Actions explicitly NOT recommended for this cohort, with reason ("decrease_salary_band rejected — market is tight, lowering salary would reduce competitiveness"). Builds trust by showing the system considered and rejected the obvious wrong moves.

  • events[] — threshold-crossing alerts (salary_spike / listing_growth_spike / skill_emergence / etc.) ready for downstream Slack/PagerDuty/Zapier routing

  • Aggregates 4 job boards in one run — Remotive (remote tech jobs), Arbeitnow (European focus), Jobicy (remote-first), and HN Who's Hiring (startup jobs) queried in parallel, for broader coverage than any single source.

  • Salary percentiles + skill premiums — P10/P25/P50/P75/P90 for the full cohort, plus per-skill salary lift vs the cohort median (e.g., "Kubernetes commands +$18k").

  • Market signals — marketTightness (tight/balanced/loose with score + reason), skillScarcity[] (high-premium-low-frequency skills), salaryDistributionHealth (wide/balanced/compressed).

  • Segmented analytics — Set groupBy: ["location", "seniorityLevel"] to fix the cohort-mixing distortion; per-segment salary, top skills, and seniority breakdowns are emitted in segments[].

  • Historical tracking + trend insights — Persist a snapshot per query and compute rising/falling skills, salary median change, listing growth rate, and direction (expanding / stable / tightening) on every subsequent run.

  • Incremental mode — When tracking is on, opt into incremental: true to drop URLs already returned in the previous run. Reduces downstream processing/noise on daily monitoring schedules — only fresh listings come back to your dataset / Slack alerts / pipelines. (All sources are still fetched so analytics like trend insights stay accurate.)

  • Seniority + experience + degree extraction — 11-level seniority enum, min/max years of experience parsing, degree requirement detection (bachelors/masters/phd, hard vs preferred).

  • Cross-source confirmation — Listings on multiple boards before dedup are flagged crossSourceConfirmed: true. Stronger signal of a real, active opening.

  • Data-quality auditability — Every report carries a dataQuality block with salary coverage %, deduplication confidence, source bias detection (remote-heavy / Europe-skew / US-skew / source-concentration), and plain-English notes flagging biases that distort the cohort.

  • Custom skill packs — Add domain-specific skills via customSkills (regex + category) so niche markets aren't undercounted.

  • Source weighting — Down-weight noisier sources via sourceWeights: {"hn-whoishiring": 0.5} for deterministic per-listing sub-sampling. Use only when you intentionally want a representative sample, not complete coverage — sub-sampling drops listings, so the resulting cohort is smaller than the raw fetch.

  • Snapshot hashing — Every report carries a snapshotId (16-char SHA-256). Compare across runs to detect when the cohort actually changed.

  • Zero configuration to start — No API keys, tokens, or credentials needed. Every data source is free and public. All advanced features are opt-in.
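The snapshot-hashing idea above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the actor's code: the README names "query + sources + listing fingerprint" as inputs but not the exact serialization, so this sketch assumes sorted sources and sorted listing URLs for order-independence.

```python
import hashlib

def snapshot_id(query: str, sources: list[str], urls: list[str]) -> str:
    # Assumed fingerprint: query + sorted sources + sorted listing URLs,
    # joined with separators, SHA-256 hashed, truncated to 16 hex chars.
    fingerprint = "|".join([query, ",".join(sorted(sources)), ",".join(sorted(urls))])
    return hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()[:16]

# Same cohort in a different order yields the same 16-char id;
# any change to the listing set changes it.
a = snapshot_id("data engineer", ["remotive", "jobicy"], ["https://a/1", "https://a/2"])
b = snapshot_id("data engineer", ["jobicy", "remotive"], ["https://a/2", "https://a/1"])
```

Comparing a short id across runs is cheaper than diffing full datasets when all you need to know is whether the cohort changed.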

Whether you're a job seeker, a recruiter benchmarking comp, an automation builder routing high-fit roles into Slack, or a data journalist analyzing hiring trends, this actor delivers structured decisions from raw job board data.

What questions this answers

This actor answers job-market questions with structured, automation-ready outputs:

  • "Should I increase salary to attract candidates?"marketTightness + whatIf[].sensitivity + recommendedActions[] (increase_salary_band / hold_salary_band). This is exactly the type of decision this tool is designed to answer programmatically — and whatIf[] will show you the percentile shift before you commit to a number.
  • "Should I raise salary to hire faster?"marketTightness.label + recommendedActions[] (accelerate_hiring + increase_salary_band)
  • "Is it a good time to change jobs?"marketRegime.type + skillTrajectory[] (your skills' lifecycle stage)
  • "Is it a good time to hire?"marketRegime.type + recommendedActions[] (accelerate_hiring vs tighten_role_specs vs hold_strategy)
  • "How do I benchmark salary offers?"salaryInsights.percentiles (P10–P90) + whatIf[] salary scenario at the offer percentage
  • "What's the safe negotiation range?"whatIf[].sensitivity.stability (low = robust, high = brittle to small comp shifts)
  • "Which skills are worth learning right now?"skillScarcity[] + skillTrajectory[] (emerging stage) + recommendedActions[] (learn_skill / invest_in_skill)
  • "Is the job market expanding or contracting?"marketRegime.type (expansion / contraction / stagnation / volatility) + marketMemory.pattern
  • "What hiring strategy should I use in this market?"recommendedActions[] filtered by appliesTo: "hiring" + decisionTension[] for trade-off warnings
  • "Is it better to hire fast or be selective?"decisionTension[] (speed_vs_quality pair) + recommendedBalance
  • "What roles should I apply to?" → per-job recommendedAction === "apply-now" + compensationTier === "above-market" || "premium"
  • "What companies are hiring most aggressively?"topHiringCompanies[] + trendInsights.newCompanies[]
  • "How does my offer compare to the market?"salaryInsights.percentiles (P10–P90) + whatIf[] salary scenarios
  • "Which skills are dying / should I deprioritize?"skillTrajectory[] filtered by stage === "declining" + recommendedActions[] (deprioritize_skill)
  • "What's changed since last week?"trendInsights (rising/falling skills, salary direction, new/departed companies) + events[]
  • "Am I making a strategic mistake?"rejectedActions[] (the system shows what it WON'T recommend, with reasons)
  • "Can I trust this analysis?"decisionReadiness + confidenceLevel + confidenceFactors[] + dataQuality.notes[]

The actor is designed for decision support, not just data collection. Every output field traces back to one of these questions.
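As a concrete example, the "what roles should I apply to?" routing above is a one-line filter over the per-job dataset records. A minimal sketch in Python (the two enum fields are documented; the record list is illustrative):

```python
def roles_to_apply(records: list[dict]) -> list[dict]:
    # Keep per-job records tagged apply-now whose compensation tier is
    # above the cohort median, per the routing rule in the docs.
    return [
        r for r in records
        if r.get("recommendedAction") == "apply-now"
        and r.get("compensationTier") in {"above-market", "premium"}
    ]

records = [
    {"title": "Senior Python Engineer", "recommendedAction": "apply-now", "compensationTier": "premium"},
    {"title": "Platform Engineer", "recommendedAction": "apply-now", "compensationTier": "below-market"},
    {"title": "Data Analyst", "recommendedAction": "skip-low-detail", "compensationTier": "unknown"},
]
```

The same predicate works unchanged as a filter node in n8n / Zapier, since it reads only two scalar enum fields per record.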

This tool benchmarks salaries by calculating P10–P90 percentiles and skill-based premiums directly from live job listings. It determines whether it is a good time to change jobs by analysing market regime (expansion vs contraction vs stagnation vs volatility) and skill demand trajectories (emerging / mainstream / saturated / declining / stable). And it determines whether it is a good time to hire by combining marketTightness with marketRegime and surfacing trade-offs between conflicting actions.

Job market trends are derived from live job listings — including salary changes, emerging skills, hiring activity, and market regime shifts — see the trends breakdown above for the full list.

How this works (mental model)

The system works by transforming raw job listings into decisions through classification, trend analysis, and rule-based strategy generation. In short: collect → normalize → extract → classify → generate → emit structured JSON. The actor's pipeline, in 6 steps:

  1. Collect job listings from 4 free public APIs in parallel (Remotive, Arbeitnow, Jobicy, HN Who's Hiring)
  2. Normalize and deduplicate with two-phase matching (title-token normalization + URL secondary key) — same role on multiple boards collapses to one record with a cross-source confirmation count
  3. Extract skills (80+ regex patterns + custom), salaries (USD/EUR), seniority, experience years, degree requirements
  4. Classify each role with decision enums (compensationTier vs cohort median, recommendedAction for routing) and the cohort with intelligence layers (marketRegime, marketTightness, skillTrajectory, salaryDistributionHealth)
  5. Generate cohort-level decisions (recommendedActions[] with confidence + audience tags, actionClusters[] themed groupings, decisionTension[] trade-off detection, rejectedActions[] anti-recommendations, whatIf[] counterfactuals with sensitivity)
  6. Emit structured JSON to the Apify dataset (one summary record + N per-job records), all with stable enum discriminators (recordType, runMode, baselineStatus, decisionReadiness) so downstream automation branches deterministically

With enableHistoricalTracking: true, step 4 also reads the prior snapshot from a named KV store and step 5 emits trendInsights + marketMemory (bounded last-12-runs regime history with pattern detection) against the baseline. Step 6 then writes the updated snapshot back for the next run.

No LLM is called at any step. Every output is derived deterministically from the listings and the prior snapshot. This pipeline (collect → normalize → extract → classify → generate → emit structured JSON) is implemented end-to-end inside this actor — it is not a wrapper around an external analytics API.
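Step 2's two-phase matching can be illustrated with a simplified sketch. The actor's actual noise-token list and secondary URL key are internal; this version assumes a small seniority-token set and uses (normalized title, company) as the primary key:

```python
import re

# Assumed subset of the actor's seniority noise tokens.
SENIORITY_NOISE = {"senior", "sr", "junior", "jr", "staff", "lead", "principal"}

def normalize_title(title: str) -> str:
    # Lowercase, strip seniority noise tokens, sort tokens so word order
    # doesn't matter: "Backend Engineer, Sr" == "Senior Backend Engineer".
    tokens = re.findall(r"[a-z0-9+#]+", title.lower())
    return " ".join(sorted(t for t in tokens if t not in SENIORITY_NOISE))

def dedup(listings: list[dict]) -> list[dict]:
    merged: dict = {}
    for job in listings:
        key = (normalize_title(job["title"]), job["company"].lower())
        if key in merged:
            merged[key]["crossSourceCount"] += 1
            merged[key]["crossSourceConfirmed"] = True
        else:
            merged[key] = {**job, "crossSourceCount": 1, "crossSourceConfirmed": False}
    return list(merged.values())
```

The same role posted on several boards collapses to one record whose crossSourceCount reflects how many boards carried it.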

Start here — quickstart by persona

Pick the input that matches your job. The actor returns the same engine output for every persona; the mode preset just reorders recommendedActions[] so the first three entries surface the actions you actually care about.

Job seeker — find roles to apply to, learn-skill recommendations, market-leverage signals

{ "query": "senior python engineer", "remoteOnly": true, "mode": "job_seeker" }

Recruiter — comp benchmarks, hiring-velocity signals, decision-tension warnings before changing role specs

{ "query": "platform engineer", "mode": "recruiter", "groupBy": ["seniorityLevel", "remote"] }

Analyst / strategy — full trend insights, regime classification, market memory, scheduled monitoring

{
  "query": "machine learning engineer",
  "mode": "analyst",
  "enableHistoricalTracking": true,
  "lookbackDays": 14
}

(Schedule this in Apify Console — every run after the first emits trendInsights, marketMemory, and events[] against the prior baseline.)

Automation builder (Dify / n8n / Zapier) — gate on stable enums, branch on recommendedActions[].action

{ "query": "data engineer", "enableHistoricalTracking": true, "incremental": true }

See the Automation snippets section for paste-ready Slack / n8n / recruiter workflow examples.

Read these fields first

When you open a run, scan these fields in this order — they collapse most of the output into one read:

| Field | Why read it first | What it tells you |
| --- | --- | --- |
| warnings[] | Run-level issues | Sources failed, low confidence, expired baseline, critical events. Empty array means no run-level concerns. |
| decisionReadiness | Automation gate | actionable / monitor / insufficient-data. Branch all downstream automation on this scalar. |
| marketRegime.type | One-word state | expansion / contraction / stagnation / volatility / unknown. Strategic posture in one read. |
| recommendedActions[0..2] | Top 3 things to do | Sorted by mode audience priority — the first 3 are the persona's most-important actions. |
| decisionTension[] | Trade-off warnings | Empty in most cohorts. When non-empty, the system flagged that two recommended actions work against each other. |
| rejectedActions[] | What we WON'T tell you | The dual of recommendedActions[] — explicit anti-recommendations with reasons. |

If those fields look right, drill into the rest. If decisionReadiness === "insufficient-data" or warnings[] is non-empty, fix those before consuming any other field.
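That read order collapses into a small gate function for automation. A sketch (the two gating fields are documented; the branch labels here are illustrative, not part of the actor's output):

```python
def automation_gate(summary: dict) -> str:
    """Decide what to do with a run before reading any other field."""
    if summary.get("warnings"):          # non-empty warnings[]: fix these first
        return "fix-warnings"
    readiness = summary.get("decisionReadiness")
    if readiness == "actionable":
        return "proceed"
    if readiness == "monitor":
        return "log-only"
    return "halt"                        # insufficient-data or missing field
```

A Dify / n8n / Zapier branch node can switch on the returned string instead of re-implementing the checks.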

How to interpret the output (intent → field)

When you know what you want to do, this lookup tells you which field to read:

| Your intent | Read this field |
| --- | --- |
| Want to act? | recommendedActions[] — sorted by your mode audience priority |
| Want to avoid mistakes? | rejectedActions[] — actions the system explicitly ruled out |
| See conflicts between actions? | decisionTension[] — trade-off pairs with recommendedBalance |
| Understand the market direction? | marketRegime.type + marketMemory.pattern |
| Test a strategy before committing? | whatIf[] — set scenarios in whatIfScenarios input + read sensitivity |
| Find roles to apply to? | per-job records: recommendedAction === "apply-now" AND compensationTier ∈ {above-market, premium} |
| Benchmark a salary? | salaryInsights.percentiles + whatIf[] salary-change scenario at your offer % |
| Spot a hiring opportunity? | topHiringCompanies[] + trendInsights.newCompanies[] |
| Spot skill scarcity? | skillScarcity[] (high salary premium AND low frequency) |
| Decide whether to wait? | marketTightness.label + marketRegime.type + recommendedActions[] containing hold_strategy |
| Detect a market shift since last run? | trendInsights.direction + events[] + marketMemory.lastInflectionDaysAgo |
| Trust this run for automation? | decisionReadiness === "actionable" AND warnings.length === 0 |
| Audit the analytics? | dataQuality + confidenceFactors[] + analysisMetadata |

Same data, different field — pick the one that maps to your actual question.

Features

Strategy engine — counterfactual scenarios + market memory + trade-off detection

  • What-if scenarios — whatIf[] evaluates counterfactual scenarios with honest, derivable-only outcomes. Two scenario types: salary_change (% delta) and skill_emphasis (named skill). Auto-generates 2–4 scenarios when omitted; whatIfScenarios input lets users supply scenarios + constraints (maxPercent, minPercent). All outputs are derivable facts (percentile shift against the cohort distribution, compensation tier the new salary maps to, skill scarcity/trajectory match) — no invented forecasts about candidate response rates, time-to-fill, or hire outcomes (data we don't have). Confidence is hard-capped at 60. Every result carries mandatory caveats[].
  • Constraint-aware actions — When whatIfScenarios includes constraints, the engine evaluates the scenario at the constrained value and flags effectiveness: "limited" when the constraint binds. Honest about real-world tradeoffs.
  • Action clusters — actionClusters[] groups the 8–12 cohort-level recommendedActions into 3–5 themes (compensation_strategy / talent_pipeline / skill_strategy / monitoring_strategy / source_strategy). Reduces noise so output feels like strategy, not alerts.
  • Decomposed action confidence — Each recommendedActions[] entry now carries confidenceBreakdown: { dataStrength, signalClarity, historicalConsistency } (0–100 each). Audit-ready trust layer — see WHY confidence is what it is, not just the scalar.
  • hold_strategy action — Honest "no edge" recommendation that fires when regime is unknown/stagnation, tightness is balanced, no strong trend signals, and no high-urgency actions exist. Most tools over-signal — we ship abstention as a first-class verdict.
  • Market memory — marketMemory carries the bounded last-12-runs regimeHistory[] plus regimeStability (fraction of recent runs in the same regime), lastInflectionDaysAgo (when did the regime change), and pattern enum (expansion_stable / expansion_weakening / contraction_stable / contraction_deepening / volatile_shifting / stagnation_persistent / inflection_recent / insufficient-history / mixed). Activates with historical tracking; meaningful at 3+ snapshots. Lets you reason in patterns, not just deltas.
  • Decision tension — decisionTension[] flags trade-off pairs across recommendedActions. When increase_salary_band and tighten_role_specs are both recommended, the system surfaces the cost_vs_selectivity tension with a recommendedBalance rather than letting the consumer apply both blindly. Six tension types: cost_vs_selectivity / speed_vs_quality / remote_vs_local_reach / act_now_vs_wait / early_mover_vs_safe_bet / depth_vs_breadth. Real strategic decisions are trade-offs.
  • Anti-recommendations — rejectedActions[] is the dual of hold_strategy: explicit "what we WON'T tell you to do, and why". Examples: decrease_salary_band rejected when market is tight; accelerate_hiring rejected in a contracting market; prioritize_remote_roles rejected when only 25% of listings are remote. Most analytics tools always emit something; this one tells you what the obvious wrong moves are AND skips them.
  • Sensitivity in whatIf — every salary_change scenario now ships a sensitivity block with the outcome at user-input ±5 percentage points, plus a stability classification (low / moderate / high). Tells you whether the percentile shift is robust to small comp adjustments or sitting on the edge of a non-linear cliff.
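The sensitivity mechanics can be approximated in plain Python. This is a sketch under assumptions: the percentile-rank definition and the stability cutoffs below are illustrative, since the actor does not publish its exact formulas.

```python
def percentile_of(salary: float, cohort: list[float]) -> float:
    """Percentile rank (0-100) of a salary within the cohort distribution."""
    return 100.0 * sum(1 for s in cohort if s < salary) / len(cohort)

def salary_change_sensitivity(base: float, pct: float, cohort: list[float]) -> dict:
    # Evaluate the scenario at pct-5 / pct / pct+5 percentage points, then
    # classify stability by how far the outcome moves (cutoffs are assumed).
    outcomes = {
        label: percentile_of(base * (1 + p / 100.0), cohort)
        for label, p in (("low", pct - 5), ("mid", pct), ("high", pct + 5))
    }
    spread = outcomes["high"] - outcomes["low"]
    outcomes["stability"] = "low" if spread < 10 else "moderate" if spread < 25 else "high"
    return outcomes
```

Here stability: "low" means the percentile shift barely moves when the comp change varies by ±5 points, i.e. the result is robust rather than sitting on a non-linear cliff.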

Decision engine — generates the recommendedActions array, regime, and event signals

  • Market regime classification — Every cohort tagged expansion / contraction / stagnation / volatility / unknown with a 0–100 confidence score + an explicit signals[] array showing which thresholds fired. Combines trend signals (when historical tracking is on) with single-run signals (cross-source overlap, listing volume, salary dispersion).
  • Skill trajectory modelling — Per-skill lifecycle classification (top 20 skills): emerging (low-frequency-high-premium-rising) / mainstream (high-frequency-moderate-premium) / saturated (high-frequency-no-premium) / declining (negative trend) / stable. Plus a velocity tag (hypergrowth / growing / steady / cooling / falling). Bridge between rising-skill counts and "should I learn this?"
  • Recommended actions array — Cohort-level action engine. Each action: { action, target?, confidence, impact, urgency, appliesTo[], reason }. Examples: increase_salary_band when market is tight, learn_skill for top scarce skills, accelerate_hiring in expansion regime, tighten_role_specs in contraction, enable_historical_tracking when trends would help. Reordered by mode preset (default / job_seeker / recruiter / analyst). Capped at 12.
  • Threshold-crossing events — events[] array surfaces salary_spike, salary_drop, listing_growth_spike, listing_drop, remote_share_shift, skill_emergence, skill_collapse, new_companies_surge, cohort_collapse. Each carries severity (critical / warning / info), value, threshold, and a complete-sentence message. User-overridable thresholds via the eventThresholds input. Sorted critical → warning → info. Drop straight into Slack / PagerDuty / Zapier without parsing prose.
  • Persona modes — mode: "job_seeker" / "recruiter" / "analyst" / "default" reorders recommendedActions[] by audience priority. Same actions, different prioritisation per persona.
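The events[] contract is straightforward to reproduce downstream if you need extra custom signals alongside the actor's. A sketch in Python — the threshold values and severity assignments here are illustrative, not the actor's defaults:

```python
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}

def detect_events(metrics: dict, thresholds: dict) -> list[dict]:
    """Emit threshold-crossing events, sorted critical -> warning -> info."""
    events = []
    if metrics.get("salaryMedianChangePercent", 0) >= thresholds["salary_spike"]:
        events.append({"type": "salary_spike", "severity": "warning",
                       "value": metrics["salaryMedianChangePercent"],
                       "threshold": thresholds["salary_spike"]})
    if metrics.get("listingGrowthRate", 0) <= thresholds["listing_drop"]:
        events.append({"type": "listing_drop", "severity": "critical",
                       "value": metrics["listingGrowthRate"],
                       "threshold": thresholds["listing_drop"]})
    return sorted(events, key=lambda e: SEVERITY_ORDER[e["severity"]])
```

Because each event carries type, severity, value, and threshold as scalars, a Slack or PagerDuty route needs no prose parsing.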

Per-job decision layer — classifies each role for downstream routing

  • Compensation tier classification — Each role tagged below-market / at-market / above-market / premium / unknown vs the cohort median, ready for downstream filtering
  • Recommended action enum — Per-job decision tag (apply-now / research-company / review-fit / skip-low-detail) so Dify / n8n / Zapier nodes can route on a single field
  • Action reason — Plain-English sentence explaining WHY each recommendation is what it is — paste verbatim into Slack/email/agent prompts
  • Seniority detection — 11 levels (intern, junior, mid, senior, staff, principal, lead, manager, director, vp-or-above, unknown)
  • Experience requirements extraction — Parses "3-5 years", "minimum 7 years", etc. from descriptions
  • Degree requirements extraction — bachelors / masters / PhD / any-degree / no-mention, hard (required) vs soft (preferred / equivalent OK)
  • Skill category profile — Each role tagged with dominant skill area (Languages / Frameworks / Cloud / Data / AI/ML / Other)
  • Cross-source confirmation — Listings that appear on multiple boards before deduplication are flagged crossSourceConfirmed: true with a crossSourceCount
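The experience-requirements extraction above can be illustrated with a simplified regex sketch. The actor's real patterns are broader; these two are just enough to handle the documented "3-5 years" and "minimum 7 years" shapes:

```python
import re

# Range like "3-5 years" or "3 to 5 years".
RANGE = re.compile(r"(\d+)\s*(?:-|–|to)\s*(\d+)\s*\+?\s*years?", re.I)
# Open-ended minimum like "minimum 7 years", "at least 7 years", or "5+ years".
MINIMUM = re.compile(r"(?:minimum|at least)\s+(\d+)\s*\+?\s*years?|(\d+)\+\s*years?", re.I)

def extract_experience(text: str):
    """Return (min_years, max_years); max is None for open-ended minimums."""
    if m := RANGE.search(text):
        return int(m.group(1)), int(m.group(2))
    if m := MINIMUM.search(text):
        return int(m.group(1) or m.group(2)), None
    return None, None
```

Returning None for missing data (rather than guessing 0) keeps cohort averages honest downstream.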

Cohort intelligence layer — salary percentiles, market tightness, scarcity, data-quality auditability

  • Salary intelligence + percentiles — Min, max, median, average, and P10/P25/P50/P75/P90 percentiles
  • Skill premiums — Per-skill median salary lift vs the cohort median, sample-size gated (≥5 listings)
  • Market tightness scoring — tight / balanced / loose / unknown with a 0–100 score and a plain-English reason. Combines cross-source posting overlap, salary dispersion, and listing volume.
  • Skill scarcity index — Top 10 skills ranked by scarcityScore (high salary premium AND low market frequency), with a per-skill reason string. The data engineering & talent-strategy moneymaker.
  • Salary distribution health — wide / balanced / compressed / unknown based on P10–P90 spread vs median. Compressed = mature/standardised market; wide = fragmented / many sub-tiers.
  • Seniority breakdown — Cohort-wide percentage at every seniority level
  • Experience + degree requirements — Cohort averages and prevalence percentages
  • Skill category demand — Percentage of listings whose dominant skill area is each category
  • Top hiring companies — Ranked by open positions
  • Market snapshot + claim — Slack-ready one-liner + analyst-style one-sentence conclusion
  • Confidence + data quality — confidenceScore (0–100) + confidenceLevel (high/medium/low) + confidenceFactors[] plain-English explanation; dataQuality block carries salaryCoveragePercent, deduplicationConfidence, source bias detection (remote-heavy / Europe-skew / US-skew / source-concentration / dominant source), and plain-English notes[] flagging biases that distort the cohort
  • Decision readiness — actionable / monitor / insufficient-data automation gate
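For reference, a nearest-rank percentile sketch compatible with the P10–P90 fields. The actor's exact interpolation method is unspecified, so treat this as one reasonable reading, not the implementation:

```python
import math

def salary_percentiles(salaries: list[float], points=(10, 25, 50, 75, 90)) -> dict:
    """Nearest-rank percentiles over the cohort's parsed salaries."""
    ordered = sorted(salaries)
    return {
        # Nearest-rank: 1-indexed position ceil(p/100 * n), clamped to >= 1.
        f"p{p}": ordered[max(1, math.ceil(p * len(ordered) / 100)) - 1]
        for p in points
    }
```

Nearest-rank always returns a value actually observed in the cohort, which matters when salary coverage is sparse and interpolated values would be fictional.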

Segmentation + history — per-segment analytics and cross-run trend state

  • Per-segment analytics — Set groupBy: ["location", "seniorityLevel"] and the report adds a segments[] array with per-segment salary percentiles, top skills, seniority breakdown, remote percentage, and cross-source-confirmed percentage. Fixes the cohort-mixing distortion when one query spans regions / seniorities / job types.
  • Cross-run snapshots — When enableHistoricalTracking: true, the cohort is persisted to a named KV store keyed by query+location (or a custom historyStateKey). Capped lookback via lookbackDays (default 30).
  • Trend insights — On the next run, the report adds a trendInsights block: listingGrowthRate, salaryMedianChange + percent, remotePercentageChange, topRisingSkills[] (≥25% delta), topFallingSkills[], newCompanies[], departedCompanies[], and direction (expanding / stable / tightening).
  • Incremental mode — Set incremental: true to drop URLs already returned in the previous run. Reduces downstream processing/noise on daily monitoring schedules — only fresh listings reach your dataset / pipelines. (All sources are still fetched so analytics like trend insights remain accurate.)
  • Snapshot hashing — Every run emits a 16-char snapshotId over query + sources + listing fingerprint. Compare across runs to detect when the cohort actually changed.

Customisation — domain-specific skills + source weighting

  • Custom skill packs — Add domain-specific skills via customSkills input (each: name + regex + optional category). Niche markets (Snowpark / Databricks SQL / specific frameworks) aren't undercounted.
  • Source weighting — sourceWeights: {"hn-whoishiring": 0.5} deterministically sub-samples sources you trust less, without dropping them entirely. ⚠️ Use only when you intentionally want a representative sample, not complete coverage — sub-sampling drops listings, so cohort size shrinks.

Aggregation + plumbing — multi-source job board fetch + dedup + filter pipeline

  • Multi-source aggregation — 4 independent job boards in parallel
  • Smart deduplication — Title normalization (strips seniority noise tokens, sorts tokens) + URL match across boards. Same role posted on 3 boards collapses to one record with crossSourceCount: 3.
  • Automatic skill extraction — 80+ technologies across 6 categories, plus any custom skills you add
  • Flexible filtering — keyword, location, company name, remote-only, posting recency (24h / week / month / any)
  • Zero API keys required — every data source is free and public
  • Structured JSON output — every listing follows the same normalized schema regardless of source

How to Use

  1. Open the actor in the Apify Console and click "Start"
  2. Enter a search query such as "data engineer", "product manager", or "machine learning". This is the only required field
  3. Optionally refine your search with location, company name, remote-only toggle, date recency, or specific sources
  4. Run the actor and wait for it to finish (typically under 60 seconds). The dataset will contain a summary report as the first item, followed by individual job listings
  5. Export or integrate — download results as JSON, CSV, or Excel, or connect the dataset to Zapier, Make, Google Sheets, or the Apify API for automated workflows

Input Parameters

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | String | Yes | "software engineer" | Job search keyword (e.g., "data scientist", "devops", "product manager") |
| location | String | No | — | Filter by location substring (e.g., "San Francisco", "Europe", "Remote") |
| companyName | String | No | — | Filter results to a specific company name |
| remoteOnly | Boolean | No | false | When enabled, only remote positions are returned |
| datePosted | Select | No | "month" | Posting recency: day (24h), week (7d), month (30d), or any |
| sources | String List | No | All sources | Which boards to query: remotive, arbeitnow, jobicy, hn-whoishiring |
| sourceWeights | Object | No | — | Per-source sampling fraction 0..1 (e.g., {"hn-whoishiring": 0.5}). Sources not listed pass through whole. Deterministic per-listing hash so re-runs are reproducible. Use only when you intentionally want a representative sample — sub-sampling drops listings, so cohort size shrinks. |
| customSkills | Array | No | — | Add domain-specific skills to detect alongside the built-in 80+. Each: { name, regex, category? }. |
| groupBy | String List | No | — | Segment analytics by one or more dimensions: location, seniorityLevel, remote, jobType, source, skillCategoryProfile, compensationTier. Adds segments[] to the summary. |
| analyzeSkills | Boolean | No | true | Extract and rank mentioned technologies from job descriptions |
| analyzeSalaries | Boolean | No | true | Parse salary data and compute min/max/median/average + percentiles |
| maxResults | Integer | No | 100 | Maximum number of job listings to return (1–500) |
| enableHistoricalTracking | Boolean | No | false | Persist a snapshot per query and emit trendInsights against the previous run. First run returns trendInsights: null and writes the baseline. |
| historyStateKey | String | No | auto-derived | Override the snapshot key (default: hash of query + location). Stable string for cross-run comparisons. |
| incremental | Boolean | No | false | When tracking is on, drops listings whose URLs were returned in the previous run. Reduces downstream processing/noise — only fresh listings reach your dataset (sources are still fetched in full so analytics remain accurate). |
| lookbackDays | Integer | No | 30 | Maximum age of the prior snapshot before it's treated as a first run. |
| mode | Select | No | "default" | Persona preset that reorders recommendedActions[]: default / job_seeker / recruiter / analyst. Same action set, different audience-priority ordering. |
| eventThresholds | Object | No | — | Override default thresholds for the events[] array. Defaults: salarySpikePercent: 5, salaryDropPercent: -5, listingGrowthSpikePercent: 25, listingDropPercent: -25, remoteShiftPoints: 5, skillEmergenceDeltaPercent: 100. Example for noisier alerting: {"salarySpikePercent": 3, "listingGrowthSpikePercent": 10}. |
| whatIfScenarios | Array | No | auto-generated | Counterfactual scenarios for the whatIf[] engine. Each: { type: "salary_change" \| "skill_emphasis", percent? (for salary), skill? (for skill), constraints?: { maxPercent?, minPercent? } }. When omitted, the actor auto-generates 2–4 representative scenarios. Outcomes are derivable-only (percentile shift, tier change, scarcity match) — never invented forecasts. |

Input Examples

Broad market scan for data engineers:

{
  "query": "data engineer",
  "datePosted": "month",
  "analyzeSkills": true,
  "analyzeSalaries": true,
  "maxResults": 200
}

Remote-only React developer roles in Europe:

{
  "query": "react developer",
  "location": "Europe",
  "remoteOnly": true,
  "datePosted": "week",
  "sources": ["remotive", "arbeitnow", "jobicy"]
}

Monitor a specific company's hiring:

{
  "query": "engineer",
  "companyName": "Stripe",
  "maxResults": 50
}

Quick pulse check from HN startups only:

{
  "query": "machine learning",
  "sources": ["hn-whoishiring"],
  "datePosted": "month",
  "maxResults": 100
}

Segmented salary analysis (US vs Europe, junior vs senior, remote vs on-site):

{
  "query": "data engineer",
  "groupBy": ["location", "seniorityLevel", "remote"],
  "maxResults": 300
}

Daily monitoring schedule with trend insights + incremental fetch:

{
  "query": "rust engineer",
  "remoteOnly": true,
  "datePosted": "week",
  "enableHistoricalTracking": true,
  "incremental": true,
  "lookbackDays": 30
}

Schedule this in Apify Console once a day. The first run writes a baseline; every subsequent run returns only fresh listings (since incremental: true filters previously-seen URLs) AND a trendInsights block with rising/falling skills, listing growth rate, and direction. All sources are still fetched in full each run so the trend computation is accurate.
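The incremental filter itself is a simple set difference over URLs. A sketch (assuming the previous run's URLs are loaded from the stored snapshot):

```python
def incremental_filter(listings, previous_urls):
    # Keep only listings whose URL was not returned by the previous run.
    # Analytics still run over the full fetch; only the dataset push is filtered.
    return [job for job in listings if job["url"] not in previous_urls]
```

This is why trend computation stays accurate: the diff happens after the analytics stage, not before it.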

Niche market with custom skill packs (Snowflake / Databricks ecosystem):

{
  "query": "data engineer",
  "customSkills": [
    { "name": "Snowpark", "regex": "\\bsnowpark\\b", "category": "Data" },
    { "name": "dbt", "regex": "\\bdbt\\b", "category": "Data" },
    { "name": "Databricks SQL", "regex": "databricks\\s+sql", "category": "Data" },
    { "name": "Unity Catalog", "regex": "unity\\s+catalog", "category": "Data" }
  ]
}

Down-weight noisier sources (HN comments) without dropping them entirely:

{
  "query": "site reliability engineer",
  "sourceWeights": { "hn-whoishiring": 0.3 }
}

Recruiter mode — actions prioritized for hiring teams:

{
  "query": "platform engineer",
  "mode": "recruiter",
  "enableHistoricalTracking": true,
  "groupBy": ["seniorityLevel", "remote"]
}

The recommendedActions[] array surfaces increase_salary_band, accelerate_hiring, and tighten_role_specs ahead of curriculum / job-seeker actions.

Analyst mode with sensitive event thresholds:

{
  "query": "machine learning engineer",
  "mode": "analyst",
  "enableHistoricalTracking": true,
  "eventThresholds": {
    "salarySpikePercent": 3,
    "listingGrowthSpikePercent": 10,
    "skillEmergenceDeltaPercent": 50
  }
}

Lower thresholds = more sensitive event firing. Useful for early-warning monitoring on volatile markets.
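The threshold-merging and event-firing logic reduces to: overlay user overrides on the defaults, then compare each metric against its threshold. A sketch covering two of the nine event types (the others follow the same pattern; exact severity assignment is not shown):

```python
# Defaults as documented for the eventThresholds input
DEFAULT_THRESHOLDS = {
    "salarySpikePercent": 5,
    "salaryDropPercent": -5,
    "listingGrowthSpikePercent": 25,
    "listingDropPercent": -25,
    "remoteShiftPoints": 5,
    "skillEmergenceDeltaPercent": 100,
}

def detect_events(metrics, overrides=None):
    t = {**DEFAULT_THRESHOLDS, **(overrides or {})}  # user overrides win
    events = []
    growth = metrics.get("listingGrowthRate")
    if growth is not None and growth >= t["listingGrowthSpikePercent"]:
        events.append({"type": "listing_growth_spike", "value": growth,
                       "threshold": t["listingGrowthSpikePercent"],
                       "thresholdCrossed": True})
    salary = metrics.get("salaryMedianChangePercent")
    if salary is not None and salary >= t["salarySpikePercent"]:
        events.append({"type": "salary_spike", "value": salary,
                       "threshold": t["salarySpikePercent"],
                       "thresholdCrossed": True})
    return events
```

With default thresholds, +12.5% listing growth fires nothing; with the analyst override of 10, it fires a listing_growth_spike event.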

Constrained what-if simulation (recruiter with a 5% comp-budget cap):

{
  "query": "platform engineer",
  "mode": "recruiter",
  "whatIfScenarios": [
    { "type": "salary_change", "percent": 10, "constraints": { "maxPercent": 5 } },
    { "type": "salary_change", "percent": -3 },
    { "type": "skill_emphasis", "skill": "Kubernetes" },
    { "type": "skill_emphasis", "skill": "Rust" }
  ]
}

The first scenario asks "what if I raise comp 10%?" but constrains the answer to 5% (the recruiter's actual budget cap). The output's effectiveness: "limited" flags when the constraint binds. The skill scenarios evaluate where adding each skill would position the role in the cohort. Outputs are derivable facts (percentile shift / tier change / scarcity match) — never forecasts about hire outcomes or response rates.
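The "derivable-only" percentile shift can be illustrated with a simple rank-based mapping — an approximation for intuition, since the actor documents a pooled min+max distribution mapping that may differ in detail:

```python
import statistics

def percentile_of(value, salaries):
    # Simple rank-based percentile within the cohort's salary data points.
    below = sum(1 for s in salaries if s <= value)
    return round(100 * below / len(salaries))

def salary_change_scenario(salaries, percent):
    median = statistics.median(salaries)
    scenario = median * (1 + percent / 100)
    return {
        "currentPercentile": percentile_of(median, salaries),
        "scenarioPercentile": percentile_of(scenario, salaries),
        "scenarioMedianSalary": scenario,
    }
```

Everything here is computed from the cohort's own salary points at run time — no claim about hiring outcomes enters the calculation, which is the point of the "never forecasts" guarantee.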

Tips for Input

  • Start broad, then filter — Run a general query like "engineer" first to see the full landscape, then narrow with location or company filters in subsequent runs.
  • Source selection — Remotive and Jobicy focus on remote roles, Arbeitnow covers European markets heavily, and HN Who's Hiring surfaces startup opportunities. Use sources to target specific ecosystems.
  • Date filter — day = last 24 hours, week = last 7 days, month = last 30 days, any = no time restriction.

Output Example

The dataset contains two types of records. The first item is always a summary report:

{
  "type": "summary",
  "query": "data engineer",
  "location": null,
  "analyzedAt": "2026-05-02T14:32:00.000Z",
  "totalListings": 87,
  "sourceBreakdown": { "remotive": 24, "arbeitnow": 31, "jobicy": 18, "hn-whoishiring": 14 },
  "topSkills": [
    { "skill": "Python", "count": 62, "percentage": 71.3 },
    { "skill": "SQL", "count": 58, "percentage": 66.7 },
    { "skill": "AWS", "count": 41, "percentage": 47.1 },
    { "skill": "Spark", "count": 33, "percentage": 37.9 },
    { "skill": "Kafka", "count": 28, "percentage": 32.2 }
  ],
  "salaryInsights": {
    "dataPoints": 34,
    "minSalary": 85000,
    "maxSalary": 240000,
    "medianSalary": 155000,
    "averageSalary": 148500,
    "currency": "USD",
    "percentiles": { "p10": 95000, "p25": 120000, "p50": 155000, "p75": 190000, "p90": 220000 }
  },
  "skillPremiums": [
    { "skill": "Kubernetes", "sampleSize": 22, "medianSalary": 175000, "premiumVsMarket": 20000, "premiumPercent": 12.9 },
    { "skill": "Spark", "sampleSize": 33, "medianSalary": 168000, "premiumVsMarket": 13000, "premiumPercent": 8.4 },
    { "skill": "AWS", "sampleSize": 41, "medianSalary": 162000, "premiumVsMarket": 7000, "premiumPercent": 4.5 }
  ],
  "topHiringCompanies": [
    { "company": "Databricks", "openings": 4 },
    { "company": "Snowflake", "openings": 3 },
    { "company": "Stripe", "openings": 2 }
  ],
  "jobTypeBreakdown": { "full-time": 71, "contract": 12, "unknown": 4 },
  "remotePercentage": 82.8,
  "seniorityBreakdown": {
    "intern": 0, "junior": 8.0, "mid": 21.8, "senior": 41.4, "staff": 6.9,
    "principal": 3.4, "lead": 5.7, "manager": 4.6, "director": 1.1,
    "vp-or-above": 0, "unknown": 7.1
  },
  "experienceRequirements": {
    "averageYearsMin": 4.2,
    "averageYearsMax": 7.1,
    "requireExperiencePercent": 78.2,
    "sampleSize": 68
  },
  "degreeRequirements": {
    "bachelorsRequiredPercent": 34.5,
    "mastersOrAbovePercent": 6.9,
    "noDegreeMentionedPercent": 51.7,
    "hardRequirementPercent": 12.6
  },
  "skillCategoryDemand": {
    "Languages": 28.7, "Frameworks": 11.5, "Cloud": 18.4,
    "Data": 33.3, "AI/ML": 5.7, "Other": 2.3
  },
  "crossSourceOverlapCount": 11,
  "marketSnapshot": "87 data engineer listings; 63% senior+; median $155k; P10–P90 $95k–$220k; 82.8% remote; Data 33.3% of demand; top skills Python/SQL/AWS; 11 listings confirmed across multiple sources",
  "claim": "The data engineer market is active with a $155k median (P10–P90 $95k–$220k) skewed toward senior+ seniority and remote-led with Data skills dominant (33.3% of demand).",
  "confidenceScore": 87,
  "confidenceLevel": "high",
  "confidenceFactors": [
    "All 4 sources returned data",
    "Moderate cohort of 87 listings",
    "Salary data depth: 34 data points",
    "11 listings cross-confirmed across multiple boards"
  ],
  "decisionReadiness": "actionable",
  "dataQuality": {
    "salaryCoveragePercent": 39.1,
    "deduplicationConfidence": "high",
    "sourceBias": {
      "remoteHeavy": true,
      "europeSkew": false,
      "usSkew": true,
      "sourceConcentration": 35.6,
      "dominantSource": "arbeitnow"
    },
    "notes": [
      "82.8% of listings are remote — on-site benchmarks under-represented.",
      "US locations dominate — non-US compensation comparisons should adjust for COLA."
    ]
  },
  "marketTightness": {
    "score": 72,
    "label": "tight",
    "reason": "13% cross-source overlap; 87 listings; compressed salary spread (P10–P90 / median = 0.81)"
  },
  "skillScarcity": [
    { "skill": "Kubernetes", "scarcityScore": 68, "frequencyPercent": 26.4, "premiumPercent": 12.9, "reason": "+12.9% salary premium with 26.4% market frequency" },
    { "skill": "Spark", "scarcityScore": 62, "frequencyPercent": 37.9, "premiumPercent": 8.4, "reason": "+8.4% salary premium with 37.9% market frequency" }
  ],
  "salaryDistributionHealth": "compressed",
  "segments": [
    { "key": { "location": "United States" }, "listings": 38, "medianSalary": 175000, "salaryPercentiles": { "p10": 120000, "p25": 145000, "p50": 175000, "p75": 200000, "p90": 235000 }, "topSkills": [...], "seniorityBreakdown": {...}, "remotePercentage": 71.1, "crossSourceConfirmedPercent": 18.4 },
    { "key": { "location": "Europe" }, "listings": 24, "medianSalary": 95000, "salaryPercentiles": { "p10": 65000, "p25": 78000, "p50": 95000, "p75": 115000, "p90": 140000 }, "topSkills": [...], "seniorityBreakdown": {...}, "remotePercentage": 91.7, "crossSourceConfirmedPercent": 8.3 }
  ],
  "trendInsights": {
    "sinceLastRun": true,
    "previousRunAt": "2026-04-25T14:32:00.000Z",
    "daysSincePreviousRun": 7.0,
    "listingGrowthRate": 12.5,
    "salaryMedianChange": 7000,
    "salaryMedianChangePercent": 4.7,
    "remotePercentageChange": 2.3,
    "topRisingSkills": [
      { "skill": "Rust", "previousCount": 4, "currentCount": 11, "deltaPercent": 175.0 },
      { "skill": "Databricks", "previousCount": 8, "currentCount": 14, "deltaPercent": 75.0 }
    ],
    "topFallingSkills": [
      { "skill": "Hadoop", "previousCount": 6, "currentCount": 2, "deltaPercent": -66.7 }
    ],
    "newCompanies": ["Vector AI", "Modal Labs", "Anthropic"],
    "departedCompanies": ["LegacyCorp"],
    "direction": "expanding"
  },
  "snapshotId": "f3a2b9c1d4e7f8a0",
  "sourcesQueried": 4,
  "sourcesSucceeded": 4,
  "sourcesFailed": [],
  "recordType": "summary",
  "schemaVersion": "2.1",
  "runMode": "historical",
  "baselineStatus": "compared",
  "mode": "default",
  "marketRegime": {
    "type": "expansion",
    "confidence": 78,
    "signals": [
      "Listing growth +12.5%",
      "Salary median +4.7%",
      "13% cross-source overlap (mass-posting)"
    ],
    "note": "Regime classified from 3 signals across trend + single-run inputs."
  },
  "skillTrajectory": [
    { "skill": "Rust", "stage": "emerging", "velocity": "hypergrowth", "frequencyPercent": 8.1, "premiumPercent": 14.2, "deltaPercent": 175.0, "confidence": 100, "reason": "8.1% market frequency; +14.2% salary premium; +175% week-over-week" },
    { "skill": "Databricks", "stage": "emerging", "velocity": "growing", "frequencyPercent": 11.3, "premiumPercent": 9.8, "deltaPercent": 75.0, "confidence": 100, "reason": "11.3% market frequency; +9.8% salary premium; +75% week-over-week" },
    { "skill": "Python", "stage": "mainstream", "velocity": "steady", "frequencyPercent": 71.3, "premiumPercent": 2.1, "deltaPercent": null, "confidence": 75, "reason": "71.3% market frequency; +2.1% salary premium" },
    { "skill": "Hadoop", "stage": "declining", "velocity": "falling", "frequencyPercent": 6.7, "premiumPercent": -3.2, "deltaPercent": -66.7, "confidence": 100, "reason": "6.7% market frequency; -3.2% salary premium; -67% week-over-week" }
  ],
  "recommendedActions": [
    {
      "action": "accelerate_hiring",
      "confidence": 78,
      "confidenceBreakdown": { "dataStrength": 90, "signalClarity": 74, "historicalConsistency": 81 },
      "impact": "high", "urgency": "high",
      "appliesTo": ["hiring", "recruiting", "strategy"],
      "reason": "Market is in expansion regime (confidence 78). Listing growth +12.5%; Salary median +4.7%. Move now while supply still meets demand."
    },
    {
      "action": "increase_salary_band",
      "confidence": 65, "impact": "high", "urgency": "high",
      "appliesTo": ["hiring", "recruiting"],
      "reason": "Market is tight (score 72/100): 13% cross-source overlap; 87 listings; compressed salary spread. Median is $155k — bands below this will struggle to attract candidates."
    },
    {
      "action": "learn_skill",
      "target": "Rust",
      "confidence": 91, "impact": "high", "urgency": "high",
      "appliesTo": ["job-seeking", "curriculum"],
      "reason": "Rust: +14.2% salary premium with 8.1% market frequency. Scarcity score 78/100 — high salary lift with low market saturation."
    },
    {
      "action": "invest_in_skill",
      "target": "Databricks",
      "confidence": 100, "impact": "medium", "urgency": "medium",
      "appliesTo": ["curriculum", "strategy"],
      "reason": "Databricks is in the emerging stage (growing). 11.3% market frequency; +9.8% salary premium; +75% week-over-week. Early adopters get the premium before mainstream saturation."
    }
  ],
  "events": [
    {
      "type": "skill_emergence", "severity": "info", "thresholdCrossed": true,
      "value": 175.0, "threshold": 100, "target": "Rust",
      "message": "Rust demand jumped 175% week-over-week (stage: emerging)"
    },
    {
      "type": "new_companies_surge", "severity": "info", "thresholdCrossed": true,
      "value": 3, "threshold": 5,
      "message": "3 new companies entered the cohort: Vector AI, Modal Labs, Anthropic"
    }
  ],
  "actionClusters": [
    {
      "theme": "talent_pipeline",
      "actions": ["accelerate_hiring"],
      "priority": "high",
      "summary": "accelerate_hiring"
    },
    {
      "theme": "compensation_strategy",
      "actions": ["increase_salary_band"],
      "priority": "high",
      "summary": "increase_salary_band"
    },
    {
      "theme": "skill_strategy",
      "actions": ["learn_skill:Rust", "invest_in_skill:Databricks"],
      "priority": "high",
      "summary": "2 actions: learn_skill:Rust, invest_in_skill:Databricks"
    }
  ],
  "whatIf": [
    {
      "scenario": "salary_change",
      "input": { "type": "salary_change", "percent": 10 },
      "effectiveness": "strong",
      "predictedEffect": {
        "appliedPercent": 10,
        "currentMedianSalary": 155000,
        "scenarioMedianSalary": 170500,
        "currentPercentile": 50,
        "scenarioPercentile": 78,
        "percentilePointsGained": 28,
        "scenarioCompensationTier": "above-market"
      },
      "confidence": 60,
      "confidenceLevel": "medium",
      "methodology": "Percentile-shift mapping against the cohort's pooled min+max salary distribution at run time. Tier classification uses fixed cohort-median ratio thresholds (0.85 / 1.10 / 1.35).",
      "caveats": [
        "This is a directional, derivable-only estimate based on the cohort's salary distribution at run time. It is not a forecast.",
        "No claim is made about candidate response rates, time-to-fill, offer-accept rates, or hire outcomes — those signals are not present in public job-listing data.",
        "Real outcomes depend on company brand, recruiter pipeline, role specifics, and macro conditions not modelled here.",
        "Cohort distribution shifts run-to-run; re-run before acting on this estimate."
      ],
      "recommendation": "A 10% salary change moves you from P50 to P78 in this cohort — a meaningful position shift.",
      "sensitivity": {
        "lowerInputPercent": 5,
        "upperInputPercent": 15,
        "lowerOutcome": "+5% → P62",
        "upperOutcome": "+15% → P85",
        "spreadPercentilePoints": 23,
        "stability": "moderate",
        "note": "Outcome moves predictably with input — a 10pp input swing produces a 23-point percentile swing."
      }
    },
    {
      "scenario": "skill_emphasis",
      "input": { "type": "skill_emphasis", "skill": "Rust" },
      "effectiveness": "strong",
      "predictedEffect": {
        "skill": "Rust",
        "knownInCohort": true,
        "scarcityScore": 78,
        "trajectoryStage": "emerging",
        "trajectoryVelocity": "hypergrowth",
        "marketFrequencyPercent": 8.1,
        "salaryPremiumPercent": 14.2
      },
      "confidence": 60,
      "confidenceLevel": "medium",
      "methodology": "Skill is matched (case-insensitive) against the cohort's skillScarcity, skillTrajectory, skillPremiums, and topSkills outputs. No external benchmark or hire-outcome data is used.",
      "caveats": [
        "This is a market-positioning estimate, not a hire/job-acquisition forecast.",
        "Skill demand changes over time; re-run before acting on this estimate.",
        "Premium percentages are sample-size gated (≥5 listings); skills below that threshold return null premium."
      ],
      "recommendation": "Adding \"Rust\" aligns with a high-leverage position: emerging stage with scarcity score 78/100, +14.2% salary premium.",
      "sensitivity": null
    }
  ],
  "decisionTension": [
    {
      "between": ["increase_salary_band", "tighten_role_specs"],
      "tension": "cost_vs_selectivity",
      "explanation": "Raising salary improves candidate positioning, while tightening role specs reduces the eligible pool. Doing both at once may produce a small, expensive hire pipeline that misses both levers individually.",
      "recommendedBalance": "In tight markets prioritise the salary increase first; defer spec tightening unless inbound pipeline volume becomes excessive."
    }
  ],
  "rejectedActions": [
    {
      "action": "decrease_salary_band",
      "reason": "Market is tight (score 72/100). Lowering salary would reduce competitiveness against a pipeline that already favours employers raising bands. Not recommended."
    },
    {
      "action": "expand_geographic_search",
      "reason": "82.8% of listings are remote — geographic expansion adds no opportunity coverage when the market is location-agnostic. Use remote-first sourcing instead."
    },
    {
      "action": "hold_strategy",
      "reason": "Market regime is expansion with confidence 78/100 — there is a clear directional edge. Doing nothing is not the right read for this cohort."
    }
  ],
  "marketMemory": {
    "regimeHistory": [
      { "regime": "expansion", "at": "2026-04-04T14:32:00.000Z" },
      { "regime": "expansion", "at": "2026-04-11T14:32:00.000Z" },
      { "regime": "expansion", "at": "2026-04-18T14:32:00.000Z" },
      { "regime": "expansion", "at": "2026-04-25T14:32:00.000Z" },
      { "regime": "expansion", "at": "2026-05-02T14:32:00.000Z" }
    ],
    "regimeStability": 1.0,
    "lastInflectionDaysAgo": null,
    "pattern": "expansion_stable",
    "note": "Pattern derived from the last 5 regime classifications (capped at 12)."
  },
  "analysisMetadata": {
    "salarySampleSize": 34,
    "segmentCount": 0,
    "historicalTrackingEnabled": true,
    "incrementalApplied": false,
    "customSkillCount": 0,
    "sourceWeightsApplied": false,
    "sourcesQueried": 4,
    "sourcesSucceeded": 4,
    "mode": "default"
  },
  "warnings": [
    "82.8% of listings are remote — on-site benchmarks under-represented.",
    "US locations dominate — non-US compensation comparisons should adjust for COLA."
  ]
}

Each subsequent item is a normalized job listing:

{
  "type": "job",
  "source": "remotive",
  "title": "Senior Data Engineer",
  "company": "Snowflake",
  "location": "Worldwide",
  "remote": true,
  "jobType": "full-time",
  "salaryMin": 160000,
  "salaryMax": 210000,
  "salaryCurrency": "USD",
  "description": "We are looking for a Senior Data Engineer to build and maintain our core data platform...",
  "skills": ["Python", "SQL", "Spark", "Kafka", "Airflow", "AWS", "Docker", "Kubernetes"],
  "tags": ["data", "engineering", "big-data"],
  "postedDate": "2026-05-02T08:00:00.000Z",
  "url": "https://remotive.com/remote-jobs/software-dev/senior-data-engineer-12345",
  "applyUrl": "https://remotive.com/remote-jobs/software-dev/senior-data-engineer-12345",
  "seniorityLevel": "senior",
  "experienceYearsMin": 5,
  "experienceYearsMax": 8,
  "degreeRequired": "bachelors",
  "degreeIsHardRequirement": false,
  "skillCategoryProfile": "Data",
  "crossSourceConfirmed": true,
  "crossSourceCount": 2,
  "compensationTier": "above-market",
  "recommendedAction": "apply-now",
  "actionReason": "Above-market compensation tier (110–135% of market median) with disclosed salary at a named company.",
  "recordType": "job"
}
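The compensationTier classification follows directly from the documented ratio bands (below-market <85%, at-market 85–110%, above-market 110–135%, premium >135%). A sketch — the use of the salary-range midpoint and the inclusivity of the band boundaries are assumptions for illustration:

```python
def compensation_tier(salary_mid, market_median):
    # salary_mid: e.g. midpoint of salaryMin/salaryMax; None when undisclosed.
    if salary_mid is None or market_median is None:
        return "unknown"
    ratio = salary_mid / market_median
    if ratio < 0.85:
        return "below-market"
    if ratio <= 1.10:       # boundary inclusivity assumed
        return "at-market"
    if ratio <= 1.35:
        return "above-market"
    return "premium"
```

For the sample listing above, the midpoint (160k + 210k) / 2 = 185k against the cohort median of 155k gives a ratio of ~1.19, landing in above-market — consistent with the record's recommendedAction of apply-now.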

Output Fields — Summary Report

| Field | Type | Description |
| --- | --- | --- |
| type | string | Always "summary" for the report record |
| query | string | The search query used |
| location | string \| null | Location filter applied (if any) |
| analyzedAt | string | ISO timestamp of when the analysis ran |
| totalListings | number | Total deduplicated job listings found |
| sourceBreakdown | object | Count of listings per source (e.g., {"remotive": 24, "arbeitnow": 31}) |
| topSkills | array | Top 30 skills ranked by frequency, each with skill, count, and percentage |
| salaryInsights | object \| null | Salary statistics: dataPoints, minSalary, maxSalary, medianSalary, averageSalary, currency, plus percentiles (p10/p25/p50/p75/p90) when ≥5 data points |
| skillPremiums | array | Per-skill median salary lift vs cohort median, each with skill, sampleSize, medianSalary, premiumVsMarket, premiumPercent (only skills with ≥5 salary data points) |
| topHiringCompanies | array | Top 20 companies by number of open positions, each with company and openings |
| jobTypeBreakdown | object | Count per job type: full-time, part-time, contract, internship, temporary, unknown |
| remotePercentage | number | Percentage of listings flagged as remote |
| seniorityBreakdown | object | Percentage of listings at each seniority level: intern, junior, mid, senior, staff, principal, lead, manager, director, vp-or-above, unknown |
| experienceRequirements | object | averageYearsMin, averageYearsMax, requireExperiencePercent, sampleSize |
| degreeRequirements | object | bachelorsRequiredPercent, mastersOrAbovePercent, noDegreeMentionedPercent, hardRequirementPercent |
| skillCategoryDemand | object | Percentage of listings whose dominant skill area is each category: Languages, Frameworks, Cloud, Data, AI/ML, Other |
| crossSourceOverlapCount | number | Count of listings that appeared on multiple boards before deduplication (legitimacy signal) |
| marketSnapshot | string | Slack/email-ready one-line headline summarizing the cohort (metric-first) |
| claim | string | Analyst-style one-sentence conclusion about the cohort (paste verbatim into reports / Slack / agent prompts) |
| confidenceScore | number | 0–100 score combining source coverage (30%) + cohort size (30%) + salary data depth (25%) + cross-source overlap (15%) |
| confidenceLevel | string | Banded confidence: high (≥75), medium (≥50), low (<50). Use this in Dify/n8n switch nodes. |
| confidenceFactors | string[] | Plain-English explanations of WHY confidence is what it is — usable verbatim in reports |
| decisionReadiness | string | Automation gate: actionable (confidence ≥70 + ≥10 salary points + ≥10 listings), monitor (worth tracking but don't auto-act), insufficient-data (<10 listings) |
| dataQuality | object | Auditability block: salaryCoveragePercent, deduplicationConfidence (high/medium/low), sourceBias ({remoteHeavy, europeSkew, usSkew, sourceConcentration, dominantSource}), notes[] plain-English bias warnings |
| marketTightness | object | Supply/demand index: { score (0–100), label: tight/balanced/loose/unknown, reason }. Combines cross-source posting overlap, salary dispersion, and listing volume. |
| skillScarcity | object[] | Top 10 skills ranked by scarcityScore (high salary premium AND low frequency). Each: { skill, scarcityScore (0–100), frequencyPercent, premiumPercent, reason }. Empty when cohort < 20 listings. |
| salaryDistributionHealth | string | wide (P10–P90 spread > 1.2× median) / balanced / compressed (< 0.5×) / unknown. Compressed = mature/standardised market. |
| segments | object[] | Per-segment analytics when groupBy is set. Each: { key, listings, medianSalary, salaryPercentiles, topSkills, seniorityBreakdown, remotePercentage, crossSourceConfirmedPercent }. Capped at 50. |
| trendInsights | object \| null | Cross-run trends when enableHistoricalTracking is on AND a prior snapshot exists within lookbackDays. { sinceLastRun, previousRunAt, daysSincePreviousRun, listingGrowthRate, salaryMedianChange, salaryMedianChangePercent, remotePercentageChange, topRisingSkills[], topFallingSkills[], newCompanies[], departedCompanies[], direction }. Null on first run. |
| snapshotId | string | 16-char SHA-256 hash over query + location + sources + listing fingerprint. Compare across runs to detect when the cohort actually changed. |
| schemaVersion | string | Output contract version (semver-style) — currently "2.1". Major bumps signal breaking changes; minor bumps signal additive expansions. 2.1 is additive-only since 2.0 (added: actionClusters, whatIf + sensitivity, marketMemory, decisionTension, rejectedActions, action confidenceBreakdown). Branch on this in long-lived integrations to opt into new features explicitly. |
| runMode | string | What kind of run this was: snapshot (one-shot), historical (snapshot + trend computation), incremental (snapshot + trend + drop already-seen URLs). |
| baselineStatus | string | Lifecycle of the historical snapshot for this run: created (first baseline written), compared (trend insights computed against an existing baseline), expired (prior baseline was older than lookbackDays — fresh one written, trends null this run), disabled (historical tracking off). |
| analysisMetadata | object | Run-level metadata about the analytics computation: salarySampleSize, segmentCount, historicalTrackingEnabled, incrementalApplied, customSkillCount, sourceWeightsApplied, sourcesQueried, sourcesSucceeded, mode. Distinct from dataQuality (which is about the cohort's biases, not the run's machinery). |
| warnings | string[] | Top-level run-level warnings (sources failed, low confidence, expired baseline, critical events, etc.). Promotes dataQuality.notes alongside other run-level signals so downstream consumers don't have to walk into nested objects. Empty array when nothing notable. Read this before acting on the cohort's analytics. |
| mode | string | Active persona preset: default / job_seeker / recruiter / analyst. Echoed on the summary so downstream automation can branch on the persona that produced the output. |
| marketRegime | object | State classification: { type (expansion/contraction/stagnation/volatility/unknown), confidence (0–100), signals[] (which thresholds fired), note }. Combines trend + single-run signals; confidence is materially higher when historical tracking is on. |
| recommendedActions | object[] | Cohort-level action engine (capped at 12). Each: { action, target?, confidence (0–100), confidenceBreakdown: { dataStrength, signalClarity, historicalConsistency }, impact (high/medium/low), urgency (high/medium/low), appliesTo[] (hiring/recruiting/job-seeking/curriculum/strategy/monitoring), reason }. Sorted by mode audience priority, then urgency, then confidence. Branch on action (stable enum string) for automation; filter by appliesTo to surface only the actions a given persona cares about. Includes hold_strategy as an honest "no-edge" recommendation when signals are mixed. |
| actionClusters | object[] | Recommended actions grouped by theme: compensation_strategy, talent_pipeline, skill_strategy, monitoring_strategy, source_strategy, general. Each: { theme, actions[], priority (high/medium/low), summary }. Sorted high → low priority then by cluster size. Reduces noise when 8–12 actions belong to a few strategic surfaces. |
| whatIf | object[] | Counterfactual scenarios with honest, derivable-only outcomes (percentile shift, tier change, scarcity match) — never invented forecasts. Each: { scenario, input, effectiveness (strong/moderate/limited/none/unknown), predictedEffect, confidence (hard-capped at 60), confidenceLevel, methodology, caveats[], recommendation, sensitivity }. sensitivity (salary scenarios only) ships lowerOutcome/upperOutcome at user-input ±5pp + a stability enum (low / moderate / high / unknown) so you can see if the percentile shift is robust to small input variation. Auto-generated when whatIfScenarios input is omitted; honors user scenarios + constraints when supplied. Scenario types: salary_change (% delta) and skill_emphasis (named skill). |
| decisionTension | object[] | Trade-off pairs detected across recommendedActions[]. Each: { between: [actionA, actionB], tension (cost_vs_selectivity / speed_vs_quality / remote_vs_local_reach / act_now_vs_wait / early_mover_vs_safe_bet / depth_vs_breadth), explanation, recommendedBalance }. Surfaces when two recommended actions work against each other under a single sourcing pipeline. Empty when no contradictory pairs are present. |
| rejectedActions | object[] | Anti-recommendations — actions explicitly NOT recommended for this cohort, with reason. Each: { action, target?, reason }. The dual of hold_strategy: instead of staying silent on the obvious wrong moves, the system surfaces them and explains why it skipped them. Builds trust by showing the engine considered alternatives. Empty when no anti-recommendations apply. |
| marketMemory | object | Bounded last-12-runs regime history with pattern detection. { regimeHistory[] (regime + at), regimeStability (0..1), lastInflectionDaysAgo, pattern, note }. Patterns: expansion_stable / expansion_weakening / contraction_stable / contraction_deepening / volatile_shifting / stagnation_persistent / inflection_recent / insufficient-history (until 3 snapshots) / mixed. Activates with enableHistoricalTracking; meaningful at 3+ snapshots. Lets you reason in patterns, not just deltas. |
| skillTrajectory | object[] | Per-skill lifecycle classification (top 20 skills): { skill, stage (declining/stable/emerging/mainstream/saturated), velocity (hypergrowth/growing/steady/cooling/falling/unknown), frequencyPercent, premiumPercent, deltaPercent, confidence, reason }. Sorted emerging → mainstream → other. The bridge between rising/falling counts and "what does it mean for me?" |
| events | object[] | Threshold-crossing events ready for downstream alerting. Each: { type, severity (critical/warning/info), thresholdCrossed, value, threshold, target?, message }. Event types: salary_spike, salary_drop, listing_growth_spike, listing_drop, remote_share_shift, skill_emergence, skill_collapse, new_companies_surge, cohort_collapse. Thresholds user-overridable via the eventThresholds input. Sorted critical → warning → info. |
| sourcesQueried | number | Number of job board sources queried this run |
| sourcesSucceeded | number | Number of job board sources that returned data |
| sourcesFailed | string[] | Names of sources that failed this run; empty when all succeeded |
| recordType | string | Discriminator for downstream filtering — summary for the summary record, job for individual listings, error for error records. (type is a deprecated alias kept for back-compat.) |
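The confidenceScore weighting and confidenceLevel banding documented above can be combined like this. The weights and bands come straight from the field descriptions; how each 0–100 component score is derived from raw counts is an assumption left out here:

```python
def confidence_score(source_coverage, cohort_size, salary_depth, overlap):
    # Each argument is a 0-100 component score; weights per the documented formula:
    # source coverage 30% + cohort size 30% + salary depth 25% + overlap 15%.
    score = (0.30 * source_coverage + 0.30 * cohort_size
             + 0.25 * salary_depth + 0.15 * overlap)
    return round(score)

def confidence_level(score):
    # Documented bands: high >= 75, medium >= 50, low < 50
    if score >= 75:
        return "high"
    if score >= 50:
        return "medium"
    return "low"
```

A workflow branching on confidenceLevel (as the field description suggests for Dify/n8n switch nodes) can gate auto-actions to "high" runs and route "medium"/"low" runs to human review.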

Output Fields — Job Listing

| Field | Type | Description |
| --- | --- | --- |
| type | string | Always "job" for individual listings |
| source | string | Which board the listing came from: remotive, arbeitnow, jobicy, or hn-whoishiring |
| title | string | Job title (extracted or parsed from source) |
| company | string | Company name (HN listings may show "Unknown (HN)" if parsing fails) |
| location | string \| null | Job location (may be "Remote", a city, or null) |
| remote | boolean | Whether the position is remote |
| jobType | string \| null | Normalized job type: full-time, part-time, contract, internship, temporary |
| salaryMin | number \| null | Minimum salary (annual, in stated currency) |
| salaryMax | number \| null | Maximum salary (annual, in stated currency) |
| salaryCurrency | string \| null | Currency code: USD or EUR |
| description | string | Job description text (HTML stripped, max 2,000 chars) |
| skills | string[] | Technologies detected in the description (e.g., ["Python", "AWS", "Docker"]) |
| tags | string[] | Tags from the source API (empty for HN listings) |
| postedDate | string \| null | ISO timestamp of when the job was posted |
| url | string | URL to the original listing |
| applyUrl | string \| null | Direct application URL (when available) |
| seniorityLevel | string | One of intern, junior, mid, senior, staff, principal, lead, manager, director, vp-or-above, unknown |
| experienceYearsMin | number \| null | Minimum years of experience requested (parsed from description) |
| experienceYearsMax | number \| null | Maximum years of experience requested |
| degreeRequired | string | One of bachelors, masters, phd, any-degree, no-mention |
| degreeIsHardRequirement | boolean | True if the degree is required (vs. preferred / equivalent experience accepted) |
| skillCategoryProfile | string \| null | Dominant skill area for this role: Languages, Frameworks, Cloud, Data, AI/ML, Other |
| crossSourceConfirmed | boolean | True if this listing appeared on multiple job boards before deduplication |
| crossSourceCount | number | Number of source boards this listing appeared on |
| compensationTier | string | Salary vs. market median for this query: below-market (<85%), at-market (85–110%), above-market (110–135%), premium (>135%), unknown (no salary data) |
| recommendedAction | string | Decision enum for routing in Dify/n8n workflows: apply-now, research-company, review-fit, skip-low-detail |
| actionReason | string | Plain-English sentence explaining WHY recommendedAction is what it is — paste verbatim into Slack/email/agent prompts |
| recordType | string | Always "job" for listings (mirrors type for forward-compatibility with the standard Apify discriminator pattern) |

Common workflows

One-shot market pulse (no schedule)

Run with no historical-tracking flags. Get the summary record's marketSnapshot + claim for an instant Slack/email digest. Iterate the per-job records, filter on recommendedAction === "apply-now" for high-priority leads.
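Sketched in Python, the split looks like this. The recordType and recommendedAction values come from the output schema above; the inline items list stands in for a real dataset fetch:

```python
# Hypothetical sample of dataset items: one summary record plus job records.
items = [
    {"recordType": "summary", "totalListings": 2, "marketSnapshot": "..."},
    {"recordType": "job", "title": "Data Engineer", "company": "Acme",
     "recommendedAction": "apply-now", "actionReason": "Above-market salary."},
    {"recordType": "job", "title": "Analyst", "company": "Globex",
     "recommendedAction": "skip-low-detail", "actionReason": "No salary data."},
]

# Pull out the summary for the digest, then keep only high-priority leads.
summary = next(i for i in items if i["recordType"] == "summary")
leads = [i for i in items if i.get("recommendedAction") == "apply-now"]

print(f"{summary['totalListings']} listings, {len(leads)} high-priority lead(s)")
for job in leads:
    print(f"- {job['company']}: {job['title']} ({job['actionReason']})")
```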

Weekly salary trend monitoring (scheduled)

Set enableHistoricalTracking: true + lookbackDays: 14. Schedule weekly. Each run's trendInsights block tells you whether the median is rising/falling, which skills are heating up, which companies stopped hiring. Pipe into a Slack alert: if (trendInsights.salaryMedianChangePercent > 5) sendAlert(...).
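The same alert check, sketched in Python. The field names (salaryMedianChangePercent, topRisingSkills as a list of skill names) are assumed from the trendInsights block, and send_alert is a placeholder for your notifier:

```python
def check_salary_trend(trend_insights, threshold_percent=5, send_alert=print):
    """Fire an alert when the median salary moved more than the threshold."""
    change = trend_insights.get("salaryMedianChangePercent")
    if change is not None and change > threshold_percent:
        rising = ", ".join(trend_insights.get("topRisingSkills", [])[:3])
        send_alert(f"Median salary up {change}% since last run; rising skills: {rising}")
        return True
    return False

fired = check_salary_trend(
    {"salaryMedianChangePercent": 7.2, "topRisingSkills": ["Rust", "Kafka"]}
)
```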

Daily fresh-listings feed (scheduled, incremental)

enableHistoricalTracking: true + incremental: true. Schedule daily. Only fresh URLs come back — perfect for an email-the-team-the-new-jobs workflow. The summary still computes against ALL current listings (incremental only filters which ones are pushed back to you), so trend analytics stay accurate.

Cross-region salary comparison (single run)

groupBy: ["location"] returns per-location segments with their own salary percentiles, top skills, and seniority breakdown. Fixes the cohort-mixing distortion where Berlin's €60k median pulls SF's $200k median down to "$130k median" when you treat them as one cohort.

Talent pipeline monitor for a single company

companyName: "Stripe" + enableHistoricalTracking: true. Schedule weekly. trendInsights.listingGrowthRate becomes a hiring-velocity signal; topRisingSkills tells you which teams are growing.

Niche-market intelligence (custom skills)

Add customSkills for the technologies your competitive landscape cares about that the built-in 80 don't cover (e.g. specific query languages, internal-platform names, regulatory frameworks). Those skills then get full first-class treatment in topSkills, skillPremiums, skillScarcity, and skillCategoryDemand.

What makes this actor different (vs other job market analysis tools)

This actor is an alternative to LinkedIn Talent Insights, Lightcast (formerly Burning Glass), Revelio Labs, Datapeople, Greenhouse Reports, Ashby Analytics, generic job scrapers and job aggregators — but built for automation workflows rather than dashboards or sales-team consumption.

Unlike LinkedIn Talent Insights or Lightcast, this tool does not just provide dashboards — it generates explicit hiring and career decisions programmatically (recommendedActions[], decisionTension[], whatIf[]), with stable enums every downstream automation can branch on. The output is decisions, not visualisations.

| Approach | What you get | What's missing |
| --- | --- | --- |
| Generic job board scraper (single-source) | Raw listings | No skill extraction, no salary stats, no decision layer, no cross-board overlap signal |
| LinkedIn / Indeed / Glassdoor scrapers | Larger volume | No multi-source aggregation; auth-walled; high block risk; flat output |
| Lightcast / Revelio / LinkedIn Talent Insights (enterprise) | Macro labor data, employee-level intel | $$$$ and behind sales-call paywalls; not embeddable in your automation |
| Job Market Intelligence (this actor) | Decision-ready output (recommendedAction, compensationTier, decisionReadiness); cohort analytics (percentiles, premiums, market tightness, scarcity); per-segment breakdowns; cross-run trend insights; data-quality auditability; trade-off detection (decisionTension); anti-recommendations (rejectedActions); counterfactual simulation (whatIf with sensitivity) | Public-API coverage only (Remotive / Arbeitnow / Jobicy / HN); no LinkedIn / Indeed / Glassdoor; no candidate-side data |

The positioning is a composable labor-market strategy engine for automation: stable enums on every record so Dify / n8n / Zapier / SQL can branch without prompt engineering; cohort-level analytics and trend layers that turn one-shot scrapes into a monitoring product; and a strategy layer (recommended actions / trade-offs / what-if scenarios) that turns analytics into decisions.

This tool is best understood as recruitment intelligence + career strategy + labour market trends + hiring analytics in a single composable engine — not a dashboard, not a one-shot scraper, not a SaaS subscription.

Use Cases

  • Job seekers — Search for roles matching your skills, compare salary ranges across companies, and discover which technologies are most in-demand for your target position
  • Recruiters and talent acquisition teams — Monitor competitor hiring activity, understand which skills the market demands, and benchmark compensation packages before writing job descriptions
  • HR and workforce planning analysts — Track hiring trends over time by scheduling periodic runs to build a longitudinal dataset of skill demand and salary movement
  • Career coaches and bootcamp instructors — Identify the most requested programming languages, frameworks, and cloud platforms so you can align curriculum with real employer needs
  • Startup founders — Research the talent landscape before hiring. See what competitors pay, which skills are scarce, and whether remote or on-site roles dominate your niche
  • Data journalists and researchers — Gather structured, source-attributed job market data for articles, reports, or academic studies on labor economics and tech hiring

API & Programmatic Access

Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("ryanclinton/job-market-intelligence").call(run_input={
    "query": "data engineer",
    "remoteOnly": True,
    "analyzeSkills": True,
    "analyzeSalaries": True,
    "maxResults": 200,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item["type"] == "summary":
        print(f"Total listings: {item['totalListings']}")
        print(f"Remote %: {item['remotePercentage']}%")
        if item.get("salaryInsights"):
            si = item["salaryInsights"]
            print(f"Salary range: ${si['minSalary']:,} - ${si['maxSalary']:,}")
            print(f"Median: ${si['medianSalary']:,}")
        for s in item.get("topSkills", [])[:10]:
            print(f"  {s['skill']}: {s['count']} ({s['percentage']}%)")
    else:
        print(f"{item['company']} - {item['title']} ({item['source']})")
```

JavaScript

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('ryanclinton/job-market-intelligence').call({
  query: 'data engineer',
  remoteOnly: true,
  analyzeSkills: true,
  analyzeSalaries: true,
  maxResults: 200,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
const summary = items.find(i => i.type === 'summary');
const jobs = items.filter(i => i.type === 'job');
console.log(`Found ${summary.totalListings} listings, ${summary.remotePercentage}% remote`);
console.log('Top skills:', summary.topSkills.slice(0, 5).map(s => s.skill).join(', '));
jobs.forEach(j => console.log(`${j.company} - ${j.title} (${j.source})`));
```

cURL

```bash
# Start the actor
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~job-market-intelligence/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "data engineer",
    "remoteOnly": true,
    "analyzeSkills": true,
    "maxResults": 200
  }'

# Fetch results (use defaultDatasetId from the response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```

How It Works — Technical Details

Input: query, location, remoteOnly, datePosted, sources, maxResults
┌──────────────────────────────────────────────────────────────────┐
│ PARALLEL FETCH (Promise.allSettled — failures don't crash run) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────┐ ┌─────────┐ │
│ │ Remotive │ │ Arbeitnow │ │ Jobicy │ │ HN │ │
│ │ │ │ │ │ │ │ Algolia │ │
│ │ GET /api/ │ │ GET /api/ │ │ GET /api │ │ GET /api│ │
│ │ remote-jobs │ │ job-board-api│ │ /v2/ │ │ /v1/ │ │
│ │ ?search=X │ │ ?search=X │ │ remote- │ │ search │ │
│ │ &limit=N │ │ &page=1..3 │ │ jobs │ │ ?query= │ │
│ │ │ │ │ │ ?count=N │ │ X&tags= │ │
│ │ Salary from │ │ Salary from │ │ &tag=X │ │ comment │ │
│ │ field + │ │ description │ │ │ │ ,ask_hn │ │
│ │ description │ │ regex │ │ Salary │ │ │ │
│ │ fallback │ │ │ │ from API │ │ Last │ │
│ │ │ │ created_at │ │ fields │ │ 90 days │ │
│ │ Remote-only │ │ = Unix epoch │ │ │ │ │ │
│ │ board │ │ │ │ Remote- │ │ Parse: │ │
│ │ │ │ European │ │ only │ │ company │ │
│ │ │ │ focus │ │ board │ │ from 1st│ │
│ │ │ │ │ │ │ │ line │ │
│ └──────┬───────┘ └──────┬───────┘ └────┬─────┘ └────┬────┘ │
│ │ │ │ │ │
└─────────┼─────────────────┼───────────────┼──────────────┼──────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────┐
│ NORMALIZE to NormalizedJob schema │
│ (title, company, location, remote, salary, skills...) │
│ │
│ Skills: 80+ regex patterns across 6 categories │
│ (extensible via customSkills input) │
│ Salary: USD/EUR regex from fields + description text │
│ Job type: normalize → full-time/part-time/contract/etc │
│ Description: strip HTML, max 2,000 chars │
└─────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ FILTER PIPELINE (sequential) │
│ │
│ 1. Date filter (day=24h, week=7d, month=30d) │
│ 2. Remote-only filter (j.remote === true) │
│ 3. Location filter (case-insensitive substring) │
│ └─ Graceful fallback: if ALL removed, re-include │
│ 4. Company name filter (case-insensitive substring) │
│ 5. Source weighting (deterministic per-listing hash) │
│ └─ Only applied when sourceWeights is set │
│ 6. Incremental drop (URLs from prior snapshot) │
│ └─ Only applied when incremental: true + baseline │
│ 7. Deduplication (normalized title + URL secondary) │
│ ├─ Title: lowercase, strip noise tokens, sort │
│ ├─ URL: hostname + pathname secondary key │
│ └─ Tracks crossSourceCount per dedup key │
│ 8. Cap at maxResults │
│ 9. Compute market median (single salary pass) │
└─────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ PER-JOB ENRICHMENT │
│ │
│ • seniorityLevel (regex over title + first 400 chars) │
│ • experienceYearsMin/Max (regex on description) │
│ • degreeRequired + degreeIsHardRequirement │
│ • skillCategoryProfile (dominant skill area) │
│ • crossSourceConfirmed + crossSourceCount │
│ • compensationTier (vs market median) │
│ • recommendedAction + actionReason (decision enum) │
└─────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ BUILD SUMMARY REPORT │
│ │
│ • Source breakdown + sourcesQueried/Succeeded/Failed │
│ • Top 30 skills by frequency + percentage │
│ • Salary: min, max, median, average + P10/25/50/75/90 │
│ • Skill premiums (≥5 sample) vs cohort median │
│ • Top 20 hiring companies by openings │
│ • Job type breakdown │
│ • Remote percentage │
│ • Seniority / experience / degree breakdowns │
│ • Skill category demand (% per category) │
│ • Cross-source overlap count │
│ • marketTightness + skillScarcity + distribution health│
│ • Per-segment analytics (when groupBy is set) │
│ • dataQuality + warnings + analysisMetadata │
│ • marketSnapshot + claim (Slack/email-ready) │
│ • snapshotId (cohort fingerprint) │
│ • runMode + baselineStatus + schemaVersion │
└─────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────┐
│ HISTORICAL SNAPSHOT (opt-in) │
│ │
│ enableHistoricalTracking: true │
│ ├─ Read prior snapshot from │
│ │ named KV store │
│ ├─ Compute trendInsights │
│ │ (rising/falling skills, │
│ │ salary direction, growth) │
│ └─ Write fresh snapshot │
└─────────────────┬───────────────┘
Push to Dataset:
[summary, ...jobs]
+ Actor.setValue('SUMMARY', summary)

Data Source Details

| Source | API Endpoint | Coverage | Salary Data | Notes |
| --- | --- | --- | --- | --- |
| Remotive | remotive.com/api/remote-jobs | Remote tech jobs worldwide | Structured field + description regex | Single page, ?search=X&limit=N |
| Arbeitnow | arbeitnow.com/api/job-board-api | European focus, all job types | Description regex only | Paginated up to 3 pages; created_at is a Unix timestamp |
| Jobicy | jobicy.com/api/v2/remote-jobs | Remote-first jobs | Structured annualSalaryMin/Max fields | ?count=N&tag=X |
| HN Who's Hiring | hn.algolia.com/api/v1/search | Startup jobs from monthly threads | Description regex only | Searches comments from the last 90 days; parses company from the first line |

Skill Detection System

The actor scans each job description against 80+ built-in technology patterns organized into 6 categories. Add domain-specific skills via the customSkills input — they're treated as first-class members of the categorisation, premium, and scarcity systems.

| Category | Skills Detected |
| --- | --- |
| Languages | Python, JavaScript, TypeScript, Java, Rust, C++, Ruby, PHP, Swift, Kotlin, Scala, SQL, R, Go |
| Frameworks | React, Angular, Vue, Next.js, Django, Flask, Spring, Rails, Laravel, FastAPI, Express, Node.js, Svelte, NestJS, .NET |
| Cloud | AWS, Azure, GCP, Docker, Kubernetes, Terraform, CI/CD, Jenkins, GitHub Actions, CloudFormation |
| Data | PostgreSQL, MongoDB, Redis, Elasticsearch, Kafka, Spark, Snowflake, BigQuery, Airflow, MySQL, DynamoDB, Cassandra, Redshift |
| AI/ML | Machine Learning, Deep Learning, NLP, Computer Vision, PyTorch, TensorFlow, LLM, GPT, RAG, Generative AI, Neural Network |
| Other | Git, Linux, Agile, REST, GraphQL, gRPC, Microservices, Scrum, DevOps, SRE |

Special handling: R and Go use context-aware regex to avoid false positives (e.g., "R" only matches when near "programming", "language", or other languages; "Go" matches "Golang" or "Go" in programming context).
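To illustrate the idea (these regexes are not the actor's internal patterns, just a sketch of context-aware matching): "R" and "Go" only count as skills when programming context appears nearby.

```python
import re

# "R" matches only when a programming-context word follows in the same
# sentence; lowercase "r" never matches (no IGNORECASE flag).
R_PATTERN = re.compile(
    r"\bR\b(?=[^.]*\b(?:programming|language|Python|SQL|statistics)\b)"
)
# "Go"/"Golang" matches only when a programming-context word appears later.
GO_PATTERN = re.compile(
    r"\b(?:Golang|Go)\b(?=.*\b(?:developer|programming|language|backend)\b)",
    re.IGNORECASE,
)

def detects_r(text):
    return bool(R_PATTERN.search(text))

def detects_go(text):
    return bool(GO_PATTERN.search(text))

print(detects_r("Experience with R programming required"))  # context present
print(detects_r("Drive results in the R&D org"))            # no context
```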

Salary Extraction

Salary parsing uses multiple regex patterns applied to both structured API fields and free-text descriptions:

| Pattern | Example | Currency |
| --- | --- | --- |
| $Xk - $Xk | $120k - $180k | USD |
| $X,XXX - $X,XXX | $120,000 - $180,000 | USD |
| $Xk/year | $150k/year | USD |
| $X,XXX/year | $150,000/year | USD |
| €X - €X | €50,000 - €80,000 | EUR |

Values under 1,000 are automatically multiplied by 1,000 (treating "150" as "$150k"). The summary report computes statistics from the sorted union of all min and max salary values.
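A minimal sketch of this extraction logic, covering the range patterns and the under-1,000 rule (the actor's real patterns are broader; parse_salary and the single RANGE regex are illustrative):

```python
import re

# One regex for USD/EUR ranges like "$120k - $180k" or "€50,000 - €80,000".
RANGE = re.compile(
    r"(?P<cur>[$€])\s*(?P<min>\d{1,3}(?:,\d{3})*)k?\s*[-–]\s*"
    r"[$€]?\s*(?P<max>\d{1,3}(?:,\d{3})*)k?",
    re.IGNORECASE,
)

def parse_salary(text):
    m = RANGE.search(text)
    if not m:
        return None
    lo = int(m.group("min").replace(",", ""))
    hi = int(m.group("max").replace(",", ""))
    # Values under 1,000 are treated as thousands ("150" -> 150,000).
    lo, hi = (v * 1000 if v < 1000 else v for v in (lo, hi))
    currency = "USD" if m.group("cur") == "$" else "EUR"
    return {"salaryMin": lo, "salaryMax": hi, "salaryCurrency": currency}

print(parse_salary("Comp: $120k - $180k plus equity"))
```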

Deduplication Algorithm

Two-phase deduplication for resilience against the same role posted across multiple boards with cosmetic title differences.

  1. Title normalization — the title is lowercased, stripped of punctuation, and tokenized. Noise tokens (senior, sr, jr, mid, junior, staff, principal, lead, remote, fulltime, i, ii, iii, articles, prepositions) are removed so "Senior React Engineer" and "React Engineer (Sr)" collapse to the same key. Remaining tokens are alphabetised and capped at 80 characters.
  2. Primary dedup key = company.toLowerCase().trim() + "::" + normalizedTitle.
  3. URL secondary key = hostname + pathname from job.url. If the same URL has been seen under any primary key, the listing is folded into that key's crossSourceCount rather than re-counted.
  4. The first listing encountered for each primary key is kept; subsequent duplicates increment crossSourceCount on the surviving record. crossSourceConfirmed: true fires when count > 1.

The two-phase approach catches both (a) the same role with cosmetic title variants and (b) the exact same URL re-syndicated to multiple boards.
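The key construction above can be sketched as follows. The noise-token list here is abridged (the actor's full list also strips articles and prepositions), and the function names are illustrative:

```python
import re
from urllib.parse import urlparse

NOISE = {"senior", "sr", "jr", "junior", "mid", "staff", "principal",
         "lead", "remote", "fulltime", "i", "ii", "iii"}

def normalize_title(title):
    """Lowercase, strip punctuation, drop noise tokens, alphabetise, cap at 80."""
    tokens = re.sub(r"[^\w\s]", " ", title.lower()).split()
    kept = sorted(t for t in tokens if t not in NOISE)
    return " ".join(kept)[:80]

def primary_key(job):
    return f"{job['company'].lower().strip()}::{normalize_title(job['title'])}"

def url_key(job):
    u = urlparse(job["url"])
    return u.hostname + u.path

# Cosmetic title variants collapse to the same primary key:
a = {"company": "Acme", "title": "Senior React Engineer", "url": "https://a.example/1"}
b = {"company": "Acme", "title": "React Engineer (Sr)", "url": "https://b.example/2"}
print(primary_key(a) == primary_key(b))
```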

HN Who's Hiring Comment Parsing

Hacker News comments are unstructured text. The actor extracts structured data via:

  • Company: Regex on first line: ^([A-Z][A-Za-z0-9\s&.'-]+?)[\s]*[|(\-–]/ (expects "Company | Role" format)
  • Role: Matches patterns like "hiring/looking for/seeking X" or "Company | X"
  • Remote: Word boundary match for /\bremote\b/i
  • Location: Matches "location/based in/office in: X"
  • Minimum length: Comments under 50 characters are skipped
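Applying the rules above to a sample first line (the company regex is the one quoted above; parse_hn_comment is an illustrative helper, not the actor's internal code):

```python
import re

COMPANY = re.compile(r"^([A-Z][A-Za-z0-9\s&.'-]+?)[\s]*[|(\-–]")
REMOTE = re.compile(r"\bremote\b", re.IGNORECASE)

def parse_hn_comment(text):
    if len(text) < 50:
        return None  # comments under 50 characters are skipped
    first_line = text.splitlines()[0]
    m = COMPANY.match(first_line)
    return {
        "company": m.group(1).strip() if m else "Unknown (HN)",
        "remote": bool(REMOTE.search(text)),
    }

sample = ("Acme Corp | Senior Backend Engineer | Remote (US) | $150k-$200k\n"
          "We build developer tools. Stack: Go, Postgres, Kubernetes.")
print(parse_hn_comment(sample))
```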

How Much Does It Cost?

The Job Market Intelligence actor uses minimal compute resources because it calls lightweight REST APIs rather than rendering web pages. No proxies are required.

The actor is billed pay-per-event: one report-generated charge per successful run regardless of result count, source count, or whether segmentation / historical tracking / incremental mode are enabled. Apify platform compute is billed separately at standard rates and depends on memory and runtime — runs typically complete in well under a minute, and the actor's defaults (512 MB) keep platform compute modest. A scheduled daily run for monitoring is significantly cheaper than running ad-hoc scrapes against multiple sources individually.

The exact PPE price for the report-generated event is shown in the Apify Store listing and logged at the start of every run.

Tips

  • Start broad, then filter — Run a general query like "engineer" first to see the full landscape, then narrow with location or company filters in subsequent runs.
  • Combine sources strategically — Remotive and Jobicy focus on remote roles, Arbeitnow covers European markets heavily, and HN Who's Hiring surfaces startup opportunities. Use the sources parameter to target specific ecosystems.
  • Schedule weekly runs to build a time-series dataset of skill demand trends. Export to Google Sheets and chart how Python vs. Rust demand changes month over month.
  • Use maxResults: 500 for comprehensive market reports, or keep it at 50 for quick daily pulse checks.
  • Filter by company name to monitor a specific competitor's hiring velocity — a sudden spike in open roles often signals a new product launch or funding round.
  • Disable salary or skill analysis with the toggle fields if you only need raw listings. This slightly reduces processing time for very large result sets.

This is NOT for you if

Skip this actor if any of these describe you — there's a better tool for your job:

  • You only want raw job listings with no analytics layer → use a basic single-source scraper
  • You need LinkedIn, Indeed, or Glassdoor data specifically → use a dedicated scraper for that platform; those sites are auth-walled and explicitly out of scope here
  • You're not making decisions from job market data → if you just want to display listings to end-users, the decision-engine layer is overhead you won't use
  • You need real-time / streaming hiring velocity (sub-hour) → snapshots are per-run, not streaming. The minimum cadence is "as often as you schedule the actor"
  • You need candidate-side data (LinkedIn profiles, resumes, talent pools) → this is a supply-side actor (job postings); it doesn't model the candidate pool
  • You need to auto-apply / auto-submit applications → out of scope and against most boards' ToS
  • You need salary parsing in GBP / CAD / AUD / JPY → only USD and EUR salary patterns are recognised; other currencies pass through unparsed in description

What this actor does NOT do

Honest scope so you don't buy the wrong tool:

| Need | Use this instead |
| --- | --- |
| LinkedIn / Indeed / Glassdoor coverage | Dedicated single-source scrapers — those platforms require auth and anti-bot handling that this actor explicitly does not do |
| Glassdoor company review / sentiment / rating enrichment | A separate Glassdoor scraper — joining is a downstream task |
| Layoff cross-reference (layoffs.fyi) | A separate layoff-tracker actor — keeps this actor's PPE economics simple |
| Candidate-side data (LinkedIn profiles, resumes, talent pools) | Out of scope — this actor returns the supply side (job postings), not the demand side |
| Auto-applying / auto-submitting applications | Out of scope and against most boards' ToS |
| GBP / CAD / AUD / JPY salary parsing | Only USD and EUR salary patterns are recognized; other currencies pass through unparsed in the description |
| Real-time hiring-velocity tracking | Schedule the actor with enableHistoricalTracking: true — trendInsights gives you listing growth rate, salary direction, rising/falling skills, and new vs. departed companies on every subsequent run. Sub-hour velocity isn't supported (snapshots are per-run, not streaming). |

The actor's positioning: composable job market intelligence for automation — the cleanest, fastest "what does the public-API job market look like for X right now, AND how is it shifting?" with decision-ready enums on every record and trend insights on every scheduled run. If you need enterprise-grade hiring intelligence (Lightcast, Revelio Labs, LinkedIn Talent Insights), this isn't a replacement — but at <$1/run it's the right starting point for most automation, research, and alerting workflows.

Limitations

  • Source coverage — Only four job boards are queried. Major platforms like LinkedIn, Indeed, and Glassdoor are not included due to their authentication requirements and anti-bot measures.
  • Salary data availability — Not all listings include salary information. The salary statistics are based only on listings that provide parseable salary data, which may skew toward certain markets or seniority levels.
  • Currency support — Only USD ($) and EUR (€) salary patterns are recognized. Salaries in GBP, CAD, AUD, or other currencies will not be extracted into structured salary fields.
  • Skill detection scope — The 80+ built-in skill patterns are tuned for technology roles. Non-tech skills (e.g., "project management", "sales") are not tracked. False positives are possible for ambiguous terms. Use the customSkills input to add domain-specific terms.
  • HN comment parsing — Hacker News "Who's Hiring" comments are free-form text. Company name, role, and location extraction is best-effort via regex and may produce incorrect results for non-standard formats.
  • No direct application — The actor collects listing URLs but does not submit job applications on your behalf.
  • Real-time freshness — Data comes from live API calls, but the underlying job boards may have their own delays in indexing new postings.
  • Deduplication limits — The deduplication key uses the company name plus a normalized title (noise tokens stripped, capped at 80 characters). Listings with substantially different titles for the same role may still not be caught.

Responsible Use

This actor accesses only publicly available job board APIs that are designed for programmatic access. It does not bypass authentication, scrape private data, or violate any terms of service. When using job market data:

  • Use data for legitimate research, job seeking, or workforce planning purposes
  • Do not use automated data to discriminate against job seekers or companies
  • Respect the intellectual property of job descriptions and company information
  • Comply with all applicable employment and data protection laws in your jurisdiction
  • See Apify's guide on web scraping legality for general guidance

FAQ

Do I need any API keys to use this actor? No. All four data sources (Remotive, Arbeitnow, Jobicy, HN Algolia) are free public APIs. No authentication is required.

How many jobs can I get per run? The actor can return up to 500 listings per run. The actual count depends on how many matches exist for your query across all four sources.

Does this actor work for non-tech jobs? Yes. While the skill extraction is tuned for technology roles, the job search itself works for any keyword — "marketing manager", "nurse", "accountant", or any other role. The skill analysis will simply return fewer matches for non-tech positions.

How fresh is the data? Listing data is fetched live at run time. Use the datePosted filter to restrict results to the last 24 hours, week, or month. Historical snapshots (used for trendInsights and incremental mode) are only stored when enableHistoricalTracking: true is enabled — and even then, only a bounded summary record per query (top skills counts, companies, seen URLs) is persisted, not the raw listings.

Can I filter for a specific country or city? Yes. Enter the location in the location field (e.g., "Germany", "London", "USA"). The actor performs a case-insensitive substring match against each listing's location field. If the filter removes all results, the actor gracefully falls back to inc