Ambitionbox Job Scrapper

Status: Under maintenance
Pricing: from $0.80 / actor start
Developer: ai (Maintained by Community)
Rating: 0.0 (0 ratings)
Actor stats: 0 bookmarks · 2 total users · 1 monthly active user
Last modified: 2 days ago
AmbitionBox Ultra-Fast Job Scraper

Production-grade job scraper for AmbitionBox using a Cheerio-first, Playwright-fallback architecture. It extracts job listings, enriches them with job details and company data, then exports normalized, structured data to an Apify Dataset.

Architecture Overview

Core Principles

  • Nuxt SSR JSON First: Extract window.__NUXT__ from HTML using regex (NO JavaScript execution)
  • CheerioCrawler Primary: Fast, lightweight scraping for all phases
  • PlaywrightCrawler Fallback: ONLY when Cheerio fails to extract critical fields
  • Three-Phase Pipeline: Listing → Job Detail → Company Overview
  • Deterministic URL Construction: Use companyUrlName from Nuxt state as single source of truth

Data Flow

Phase 1: Listing Extraction (CheerioCrawler)
↓ Extract window.__NUXT__.data[1].jobs
↓ Parse job listings + companyUrlName
↓ Store in KeyValueStore
Phase 2: Job Detail Enrichment (CheerioCrawler)
↓ Extract description, rating, skills
↓ Resolve company URL from companyUrlName
↓ Update KeyValueStore
Phase 3: Company Overview Enrichment (CheerioCrawler)
↓ Extract size, website, industry, description
↓ STRICT employee count validation
↓ Merge job + company data
↓ Calculate confidence score
Export to Apify Dataset
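The Phase 3 merge step can be sketched as a pure function. Field names follow the output schema below; the shape of the `company` object is a hypothetical intermediate, not a documented structure:

```javascript
// Merge a Phase-1/2 job record with Phase-3 company data into one dataset item.
// Missing company fields fall back to null rather than being omitted.
function mergeJobWithCompany(job, company) {
    return {
        ...job,
        employeeCount: company?.employeeCount ?? null,
        companyWebsite: company?.website ?? null,
        industry: company?.industry ?? null,
        companyDescription: company?.description ?? null,
        headquarters: company?.headquarters ?? null,
        scrapedAt: new Date().toISOString(),
    };
}
```

Keeping the merge free of I/O makes it trivial to unit-test independently of the crawler.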

Performance Targets

  • Concurrency: 40 requests
  • Throughput: 1200 requests/minute
  • Timeouts: 20s handler, 30s navigation
  • Retries: Max 1, on [429, 500, 502, 503]
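These targets map directly onto Crawlee crawler options. A minimal sketch (the `router` request handler is assumed to be assembled from the handlers in `routes/`):

```javascript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    maxConcurrency: 40,             // concurrency target
    maxRequestsPerMinute: 1200,     // throughput target
    requestHandlerTimeoutSecs: 20,  // handler timeout
    navigationTimeoutSecs: 30,      // navigation timeout
    maxRequestRetries: 1,           // max 1 retry
    requestHandler: router,         // phase handlers from routes/ (assumed)
});
```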

Project Structure

cherro-scrapper/
├── src/
│   └── main.js              # Main orchestration
├── routes/
│   ├── listing.js           # Phase 1: Listing extraction
│   ├── jobDetail.js         # Phase 2: Job detail enrichment
│   └── company.js           # Phase 3: Company overview enrichment
├── utils/
│   ├── nuxtParser.js        # Nuxt state extraction
│   ├── validators.js        # Data validation (strict rules)
│   ├── normalizers.js       # Data normalization
│   └── confidenceScore.js   # Quality scoring
├── .actor/
│   ├── actor.json           # Apify actor configuration
│   └── input_schema.json    # Input schema
├── package.json
├── Dockerfile
├── .env.example
└── README.md

Installation

Local Development

# Clone repository
cd cherro-scrapper
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
# Edit .env with your configuration
# (Optional: Add APIFY_TOKEN for local testing)
# Run scraper
npm start

Apify Deployment

# Install Apify CLI
npm install -g apify-cli
# Login to Apify
apify login
# Push to Apify
apify push
# Run on Apify platform
# Navigate to https://console.apify.com/actors

Configuration

Input Parameters

Configure via Apify Console or INPUT.json:

{
  "startUrls": [
    "https://www.ambitionbox.com/jobs",
    "https://www.ambitionbox.com/jobs?q=software+engineer"
  ],
  "maxConcurrency": 40,
  "maxRequestsPerMinute": 1200,
  "requestHandlerTimeoutSecs": 20
}
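An actor should tolerate partial input, so each field needs a default. One way to apply the defaults above, as a hypothetical pure helper:

```javascript
// Fill in defaults for any input fields the caller omitted.
// Input spread last so explicit values always win over defaults.
function withDefaults(input) {
    return {
        startUrls: ['https://www.ambitionbox.com/jobs'],
        maxConcurrency: 40,
        maxRequestsPerMinute: 1200,
        requestHandlerTimeoutSecs: 20,
        ...(input ?? {}),
    };
}
```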

Environment Variables

See .env.example for local testing configuration.

Data Schema

Output Format

Each job record in the dataset contains:

{
  "jobId": "12345",
  "title": "Senior Software Engineer",
  "companyName": "Example Corp",
  "companyUrlName": "example-corp",
  "location": "Bangalore",
  "postedDate": "2025-12-15",
  "salary": {
    "min": 1500000,
    "max": 2500000,
    "currency": "INR"
  },
  "experience": {
    "min": 3,
    "max": 5
  },
  "description": "Job description text...",
  "skills": ["JavaScript", "React", "Node.js"],
  "companyRating": 4.2,
  "employeeCount": {
    "min": 201,
    "max": 500,
    "raw": "201-500"
  },
  "companyWebsite": "https://example.com",
  "industry": "Information Technology",
  "companyDescription": "Company description text...",
  "headquarters": "Bangalore, India",
  "confidenceScore": 87.5,
  "confidenceLevel": "GOOD",
  "scrapedAt": "2025-12-18T09:44:20.000Z",
  "sourceUrl": "https://www.ambitionbox.com/jobs"
}
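Structured fields like `experience` come from normalizing raw page text. A sketch of such a normalizer, assuming raw strings shaped like "3-5 Yrs" (the exact source format is an assumption, not documented above):

```javascript
// Parse an experience range string like "3-5 Yrs" into { min, max }.
// Returns null when no numeric range is present.
function parseExperience(raw) {
    const match = (raw ?? '').match(/(\d+)\s*-\s*(\d+)/);
    if (!match) return null;
    return { min: Number(match[1]), max: Number(match[2]) };
}
```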

Confidence Scoring

Data quality score (0-100) based on field completeness:

  • 90-100: EXCELLENT - All mandatory and most optional fields present
  • 75-89: GOOD - All mandatory fields + some enrichment
  • 60-74: FAIR - Mandatory fields present, limited enrichment
  • 40-59: POOR - Some mandatory fields missing
  • 0-39: VERY_POOR - Multiple mandatory fields missing
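The score-to-level mapping above reduces to a simple threshold function, sketched here as a hypothetical helper in the spirit of utils/confidenceScore.js:

```javascript
// Map a 0-100 completeness score to its confidence level band.
function confidenceLevel(score) {
    if (score >= 90) return 'EXCELLENT';
    if (score >= 75) return 'GOOD';
    if (score >= 60) return 'FAIR';
    if (score >= 40) return 'POOR';
    return 'VERY_POOR';
}
```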

Critical Implementation Details

Employee Count Validation

STRICT RULES (implemented in utils/validators.js):

ACCEPT:

  • Ranges: "201-500", "1-10"
  • Lakh format: "1 Lakh+", "2 Lakhs"
  • Large numbers: "10,000+", "5000"
  • K values ≥ 100: "100k", "500k"

REJECT:

  • Contains "follow": "5.6k followers"
  • K values < 100: "5.6k", "10k", "50k"

Company URL Resolution

Priority Order:

  1. companyUrlName from Nuxt state (SINGLE SOURCE OF TRUTH)
  2. Extract from job detail page anchor
  3. Construct slug from company name (LAST RESORT)

Format: https://www.ambitionbox.com/overview/{companyUrlName}-overview
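The priority order above (minus the job-detail-page anchor, which needs a live DOM) can be sketched as a hypothetical helper:

```javascript
// Build the company overview URL. Prefers companyUrlName from the Nuxt
// state; falls back to a slug derived from the company name (last resort).
function companyOverviewUrl(companyUrlName, companyName) {
    const slug = companyUrlName
        || (companyName || '')
            .toLowerCase()
            .trim()
            .replace(/[^a-z0-9]+/g, '-')  // collapse non-alphanumerics to hyphens
            .replace(/^-|-$/g, '');       // trim stray edge hyphens
    if (!slug) return null;
    return `https://www.ambitionbox.com/overview/${slug}-overview`;
}
```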

Nuxt State Extraction

Method: Regex-based extraction from HTML string

// Extract window.__NUXT__ = {...}, matching up to the closing </script>
// so nested braces inside the state object don't truncate the match
const nuxtRegex = /window\.__NUXT__\s*=\s*({[\s\S]+?})\s*;?\s*<\/script>/;
const match = html.match(nuxtRegex);
if (!match) throw new Error('window.__NUXT__ not found in HTML');
const nuxtState = JSON.parse(match[1]);
// Navigate to jobs
const jobs = nuxtState.data[1].jobs;

NO JavaScript execution - works in CheerioCrawler.
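A self-contained demo of the approach on a sample HTML string (the sample payload is illustrative; this assumes the state is JSON-serializable, as the scraper does):

```javascript
// Sample SSR page with an inlined Nuxt state (hypothetical payload).
const html = '<html><body><script>window.__NUXT__ = '
    + '{"data":[null,{"jobs":[{"jobId":"12345","title":"Senior Software Engineer"}]}]};'
    + '</script></body></html>';

// Extract the jobs array from window.__NUXT__ without executing any JS.
function extractNuxtJobs(html) {
    const match = html.match(/window\.__NUXT__\s*=\s*({[\s\S]+?})\s*;?\s*<\/script>/);
    if (!match) return null;
    const nuxtState = JSON.parse(match[1]);
    return nuxtState?.data?.[1]?.jobs ?? null;
}

const jobs = extractNuxtJobs(html);
```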

Troubleshooting

Common Issues

Issue: No jobs found in Nuxt state

Solution:

  • Check if AmbitionBox changed their Nuxt state structure
  • Verify data[1].jobs path is correct
  • Enable debug logging to inspect raw Nuxt state

Issue: Employee count always null

Solution:

  • Check if validation rules are too strict
  • Inspect raw employee count values in logs
  • Adjust selectors in routes/company.js

Issue: Low confidence scores

Solution:

  • Review field weights in utils/confidenceScore.js
  • Check if selectors are extracting data correctly
  • Verify company URLs are resolving properly

Debug Mode

Enable verbose logging:

// In src/main.js, add:
import { log, LogLevel } from 'crawlee';

log.setLevel(LogLevel.DEBUG);

Performance Optimization

For maximum throughput:

{
  "maxConcurrency": 40,
  "maxRequestsPerMinute": 1200
}

For stability (avoid rate limiting):

{
  "maxConcurrency": 20,
  "maxRequestsPerMinute": 600
}

Monitoring

Check Apify Console for:

  • Request queue size
  • Dataset item count
  • Failed requests
  • Retry histogram

Dependencies

{
  "apify": "^3.1.10",
  "crawlee": "^3.7.0",
  "cheerio": "^1.0.0-rc.12"
}

All dependencies are official, published packages.

License

ISC

Support

For issues or questions:

  1. Check Apify logs for error messages
  2. Review this README for troubleshooting steps
  3. Inspect KeyValueStore for intermediate data
  4. Enable debug logging for detailed output

Built with: Node.js 18+, Crawlee, Apify, Cheerio

Architecture: Cheerio-first, Playwright-fallback

Performance: 40 concurrent requests, 1200 req/min throughput