Website Job Extractor (Browser)

Extract job listings from JavaScript-rendered career pages (React, Vue, Angular) using AI + Playwright. Companion to the HTTP-only Website Job Extractor. Use it for the ~28% of company sites that need a real browser. Same output format, same quality, same LLM fallback chain.

Developer: Alessandro Santamaria (maintained by Community)

Extract job listings from JavaScript-rendered career pages (React, Vue, Angular SPAs) using AI + Playwright.

This is the browser-based companion to the Website Job Extractor (HTTP-only). Use this actor when the HTTP version flags companies with js_rendering_suspected: true.

When to use this actor

  • Career pages built with React, Vue, Angular, or other JS frameworks
  • Pages that return empty/skeleton HTML without JavaScript execution
  • Companies flagged by the HTTP actor's JS-rendering detection
  • Runs auto-chained via enablePlaywrightFallback on the HTTP actor
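
As a rough illustration of the hand-off described above, the following sketch picks the flagged companies out of the HTTP actor's results and builds an input for this actor. It assumes that each HTTP dataset item carries js_rendering_suspected alongside company_id, company_name, and website_url; the function name build_browser_input is hypothetical.

```python
from typing import Any, Dict, List


def build_browser_input(http_items: List[Dict[str, Any]], gemini_api_key: str) -> Dict[str, Any]:
    """Collect companies flagged as JS-rendered and build a browser-actor input.

    Assumption: HTTP actor items expose `js_rendering_suspected` next to the
    company fields. Adjust the field names if the real items differ.
    """
    companies = []
    seen_ids = set()
    for item in http_items:
        if item.get("js_rendering_suspected") is not True:
            continue  # the HTTP result was good enough, no browser needed
        company_id = item.get("company_id")
        if company_id in seen_ids:
            continue  # one entry per company, even if flagged several times
        seen_ids.add(company_id)
        companies.append({
            "company_id": company_id,
            "company_name": item.get("company_name"),
            "website_url": item.get("website_url"),
        })
    return {
        "companies": companies,
        "llmProvider": "gemini",
        "geminiApiKey": gemini_api_key,
    }
```

With enablePlaywrightFallback the HTTP actor does this hand-off itself, so a helper like this is only useful for manual runs.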

How it works

  1. Playwright renders the full page (waits for network idle + text content)
  2. Career page discovery from homepage navigation (same as HTTP actor)
  3. ATS detection for 19 systems (Personio, Greenhouse, Softgarden, etc.)
  4. LLM extraction using Gemini Flash / Groq / OpenRouter
  5. Validation with confidence scoring and deduplication
  6. Pagination follow-up for multi-page listings
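
Step 5 can be pictured with a minimal sketch like the one below. It is not the actor's actual code: the 0.5 threshold, the deduplication key (company, normalized title, normalized location), and the function names are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

MIN_CONFIDENCE = 0.5  # hypothetical threshold, not taken from the actor


def _normalize(text: Any) -> str:
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(str(text or "").lower().split())


def validate_and_dedupe(
    jobs: List[Dict[str, Any]], min_confidence: float = MIN_CONFIDENCE
) -> List[Dict[str, Any]]:
    """Drop untitled or low-confidence jobs, then keep the best copy of each duplicate."""
    best: Dict[Tuple[Any, str, str], Tuple[float, Dict[str, Any]]] = {}
    for job in jobs:
        confidence = float(job.get("confidence") or 0.0)
        if not _normalize(job.get("title")) or confidence < min_confidence:
            continue
        key = (job.get("company_id"), _normalize(job.get("title")), _normalize(job.get("location")))
        # Keep the highest-confidence version of each duplicate.
        if key not in best or confidence > best[key][0]:
            best[key] = (confidence, job)
    return [job for _, job in best.values()]
```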

The extraction pipeline is the same as the HTTP actor's, so the output format and quality match.

Input

The input format is the same as the HTTP actor's. Runs are typically auto-chained; a direct run takes input like this:

{
  "companies": [
    {
      "company_id": "abc-123",
      "company_name": "TechCorp AG",
      "website_url": "https://techcorp.ch"
    }
  ],
  "llmProvider": "gemini",
  "geminiApiKey": "YOUR_KEY"
}

Output

Each job is a dataset item with browser_extraction: true:

{
  "company_id": "abc-123",
  "company_name": "TechCorp AG",
  "title": "Senior Frontend Developer (m/w/d)",
  "location": "Zürich",
  "employment_type": "Vollzeit",
  "department": "Engineering",
  "application_url": "https://techcorp.ch/jobs/apply/123",
  "confidence": 0.85,
  "browser_extraction": true,
  "extracted_at": "2026-03-09T10:00:00.000Z"
}
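
For downstream processing it can help to load each item into a typed record. The sketch below mirrors the fields of the example item; which fields may be missing is an assumption (here location, employment_type, department, and application_url are treated as optional), and the names JobItem and parse_job_item are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class JobItem:
    company_id: str
    company_name: str
    title: str
    location: Optional[str]
    employment_type: Optional[str]
    department: Optional[str]
    application_url: Optional[str]
    confidence: float
    browser_extraction: bool
    extracted_at: datetime


def parse_job_item(raw: Dict[str, Any]) -> JobItem:
    """Convert one dataset item (a plain dict) into a JobItem."""
    return JobItem(
        company_id=raw["company_id"],
        company_name=raw["company_name"],
        title=raw["title"],
        location=raw.get("location"),
        employment_type=raw.get("employment_type"),
        department=raw.get("department"),
        application_url=raw.get("application_url"),
        confidence=float(raw["confidence"]),
        browser_extraction=bool(raw.get("browser_extraction", False)),
        # A trailing "Z" is rewritten to "+00:00" so older Python versions can parse it.
        extracted_at=datetime.fromisoformat(raw["extracted_at"].replace("Z", "+00:00")),
    )
```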

Memory requirements

  • Minimum: 1024 MB (Playwright + Chrome)
  • Recommended: 2048 MB for 5+ companies
  • Maximum: 4096 MB
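
The guidance above can be expressed as a small helper for choosing a run's memory setting. This is only a sketch: the function name is hypothetical, and clamping an explicit request into the 1024 to 4096 MB range is an assumption about how one might apply the limits.

```python
from typing import Optional

MIN_MEMORY_MB = 1024          # Playwright + Chrome need at least this much
RECOMMENDED_MEMORY_MB = 2048  # suggested for 5 or more companies
MAX_MEMORY_MB = 4096


def choose_memory_mb(company_count: int, requested_mb: Optional[int] = None) -> int:
    """Pick a memory setting from the company count, or clamp an explicit request."""
    if requested_mb is not None:
        return max(MIN_MEMORY_MB, min(MAX_MEMORY_MB, requested_mb))
    return RECOMMENDED_MEMORY_MB if company_count >= 5 else MIN_MEMORY_MB
```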

Pricing

Browser-based extraction costs ~2x the HTTP actor due to Chrome overhead:

Event                     | Cost
browser-company-enriched  | $0.02/company
browser-job-result        | $0.008/job
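
To budget a run, the two event prices can be combined as in the sketch below. It assumes one browser-company-enriched event per processed company and one browser-job-result event per extracted job, and it ignores any other platform charges; the function name is hypothetical.

```python
from decimal import Decimal

COST_PER_COMPANY = Decimal("0.02")  # browser-company-enriched
COST_PER_JOB = Decimal("0.008")     # browser-job-result


def estimate_cost_usd(companies: int, jobs: int) -> Decimal:
    """Estimate the event charges in USD for a run."""
    if companies < 0 or jobs < 0:
        raise ValueError("counts must be non-negative")
    return companies * COST_PER_COMPANY + jobs * COST_PER_JOB
```

For example, 10 companies yielding 40 jobs in total would come to 10 × $0.02 + 40 × $0.008 = $0.52 under these assumptions.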

Auto-chaining

The HTTP actor can automatically trigger this browser actor for JS-flagged companies:

  1. Run the HTTP actor with enablePlaywrightFallback: true
  2. Companies with js_rendering_suspected are collected
  3. A browser actor run starts automatically (fire-and-forget)
  4. The browser run ID is saved in the key-value store as BROWSER_FALLBACK_RUN_ID
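
Because the browser run is fire-and-forget, the caller has to look up its results separately. The sketch below shows one way to do that with the Python apify-client package, passing in an ApifyClient instance as client. It assumes the run ID is stored as a plain string under BROWSER_FALLBACK_RUN_ID in the HTTP run's default key-value store; the function name is hypothetical.

```python
from typing import Any, Dict, List, Optional

FALLBACK_KEY = "BROWSER_FALLBACK_RUN_ID"


def fetch_browser_jobs(client: Any, http_run_id: str) -> Optional[List[Dict[str, Any]]]:
    """Return the browser run's dataset items, or None if no fallback run was started.

    `client` is expected to be an `apify_client.ApifyClient`.
    Assumption: the key lives in the HTTP run's default key-value store.
    """
    record = client.run(http_run_id).key_value_store().get_record(FALLBACK_KEY)
    if not record or not record.get("value"):
        return None  # no companies were flagged, so nothing was chained
    browser_run = client.run(str(record["value"]))
    browser_run.wait_for_finish()  # block until the browser run ends
    return list(browser_run.dataset().iterate_items())
```

A call would look like fetch_browser_jobs(ApifyClient("YOUR_APIFY_TOKEN"), "HTTP_RUN_ID"), where both arguments are placeholders.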

LLM fallback chain

Like the HTTP actor, this actor supports automatic provider fallback. Just provide API keys for the providers you want to use:

{
  "geminiApiKey": "YOUR_GEMINI_KEY",
  "llmApiKey": "YOUR_GROQ_KEY",
  "openrouterApiKey": "YOUR_OPENROUTER_KEY"
}

The actor discovers which providers have API keys and builds a fallback chain (e.g. Gemini → Groq → OpenRouter). If one provider's quota runs out, the actor falls back to the next one in the chain.
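
The fallback behaviour can be sketched roughly as follows. This is an illustration, not the actor's implementation: the QuotaExceeded exception, the call_provider callback, and the function names are hypothetical, and the provider order simply follows the example chain above.

```python
from typing import Any, Callable, Dict, List, Tuple

# Input field per provider, in the order of the example chain.
PROVIDER_KEY_FIELDS = [
    ("gemini", "geminiApiKey"),
    ("groq", "llmApiKey"),
    ("openrouter", "openrouterApiKey"),
]


class QuotaExceeded(Exception):
    """Hypothetical error raised by a provider call when its quota is used up."""


def build_chain(actor_input: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Keep only the providers for which an API key was supplied."""
    return [(name, actor_input[field]) for name, field in PROVIDER_KEY_FIELDS if actor_input.get(field)]


def extract_with_fallback(
    prompt: str,
    chain: List[Tuple[str, str]],
    call_provider: Callable[[str, str, str], str],
) -> Tuple[str, str]:
    """Try each provider in order; move on only when a quota error occurs."""
    last_error = None
    for name, api_key in chain:
        try:
            return name, call_provider(name, api_key, prompt)
        except QuotaExceeded as err:
            last_error = err  # remember the failure and try the next provider
    raise RuntimeError("all configured LLM providers are exhausted") from last_error
```

Other errors (for example malformed responses) propagate unchanged in this sketch; a real implementation might choose to fall back on those as well.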

End-to-end pipeline

This actor is part of a 5-actor enrichment suite:

Actor                                | Purpose                         | Memory  | Link
Google Maps Scraper                  | Find companies by location      | ~80MB   | View
Website Job Extractor                | Extract jobs (HTTP)             | ~128MB  | View
Website Job Extractor (Browser)      | Extract jobs from JS pages      | ~1-4GB  | This actor
Website Contact Extractor            | Extract contacts (HTTP)         | ~256MB  | View
Website Contact Extractor (Browser)  | Extract contacts from JS pages  | ~1-4GB  | View

Limitations

  • Higher memory usage (~1GB vs ~128MB for HTTP)
  • Slower execution (page rendering + wait times)
  • Higher cost per result (~2x HTTP rates)
  • Use the HTTP actor first and fall back to the browser actor only when needed