Deprecated

Ambitionbox Job Scraper


AmbitionBox Ultra-Fast Job Scraper

Production-grade job scraper for AmbitionBox using a Cheerio-first, Playwright-fallback architecture. Extracts job listings, enriches with job details and company data, then exports normalized, structured data to Apify Dataset.

Architecture Overview

Core Principles

Nuxt SSR JSON First: Extract window.__NUXT__ from HTML using regex (NO JavaScript execution)
CheerioCrawler Primary: Fast, lightweight scraping for all phases
PlaywrightCrawler Fallback: ONLY when Cheerio fails to extract critical fields
Three-Phase Pipeline: Listing → Job Detail → Company Overview
Deterministic URL Construction: Use companyUrlName from Nuxt state as single source of truth
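
The Cheerio-first rule can be read as a simple check on what Cheerio managed to extract. The sketch below is one possible shape for that check, not the Actor's actual code: the function names and the list of critical fields (`title`, `companyName`, `companyUrlName`) are assumptions, since the README does not enumerate them.

```javascript
// Hypothetical list of critical fields; the README does not name them.
const CRITICAL_FIELDS = ['title', 'companyName', 'companyUrlName'];

// Returns the critical fields that are missing or blank in a Cheerio-extracted record.
function missingCriticalFields(record, fields = CRITICAL_FIELDS) {
  return fields.filter((field) => {
    const value = record?.[field];
    return value === undefined || value === null || String(value).trim() === '';
  });
}

// Playwright is only worth its cost when Cheerio left a critical field empty.
function needsPlaywrightFallback(record) {
  return missingCriticalFields(record).length > 0;
}
```

A request would be re-enqueued for PlaywrightCrawler only when `needsPlaywrightFallback(record)` returns `true`.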

Data Flow

Phase 1: Listing Extraction (CheerioCrawler)
  ↓ Extract window.__NUXT__.data[1].jobs
  ↓ Parse job listings + companyUrlName
  ↓ Store in KeyValueStore
  ↓
Phase 2: Job Detail Enrichment (CheerioCrawler)
  ↓ Extract description, rating, skills
  ↓ Resolve company URL from companyUrlName
  ↓ Update KeyValueStore
  ↓
Phase 3: Company Overview Enrichment (CheerioCrawler)
  ↓ Extract size, website, industry, description
  ↓ STRICT employee count validation
  ↓ Merge job + company data
  ↓ Calculate confidence score
  ↓
Export to Apify Dataset
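
The merge step at the end of Phase 3 might look roughly like the following. This is a sketch under assumptions: the name `mergeJobAndCompany` is hypothetical, and the rule that job fields win on conflict (company fields only fill gaps) is a guess, since the README does not state a conflict policy.

```javascript
// Merge a Phase 2 job record with a Phase 3 company record.
// Assumption: existing job values are kept; company values only fill empty slots.
function mergeJobAndCompany(job, company) {
  const merged = { ...job };
  for (const [key, value] of Object.entries(company ?? {})) {
    const current = merged[key];
    const isEmpty = current === undefined || current === null || current === '';
    if (isEmpty && value !== undefined) {
      merged[key] = value;
    }
  }
  // Timestamp the merged record, matching the scrapedAt field in the output schema.
  merged.scrapedAt = new Date().toISOString();
  return merged;
}
```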

Performance Targets

Concurrency: 40 requests
Throughput: 1200 requests/minute
Timeouts: 20s handler, 30s navigation
Retries: Max 1, on [429, 500, 502, 503]
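
The retry rule above can be expressed as a small predicate. This is a minimal sketch rather than the Actor's code; `shouldRetry` and the constant names are hypothetical.

```javascript
// Status codes and retry limit taken from the targets listed above.
const RETRYABLE_STATUS_CODES = new Set([429, 500, 502, 503]);
const MAX_RETRIES = 1;

// retryCount is the number of retries already attempted for this request.
function shouldRetry(statusCode, retryCount) {
  return RETRYABLE_STATUS_CODES.has(statusCode) && retryCount < MAX_RETRIES;
}
```

As a sanity check on the targets: 1200 requests/minute is 20 requests/second, so with 40 concurrent slots each request can take up to 2 seconds on average before concurrency, rather than the rate limit, becomes the bottleneck.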

Project Structure

cherro-scrapper/
├── src/
│   └── main.js              # Main orchestration
├── routes/
│   ├── listing.js           # Phase 1: Listing extraction
│   ├── jobDetail.js         # Phase 2: Job detail enrichment
│   └── company.js           # Phase 3: Company overview enrichment
├── utils/
│   ├── nuxtParser.js        # Nuxt state extraction
│   ├── validators.js        # Data validation (strict rules)
│   ├── normalizers.js       # Data normalization
│   └── confidenceScore.js   # Quality scoring
├── .actor/
│   ├── actor.json           # Apify actor configuration
│   └── input_schema.json    # Input schema
├── package.json
├── Dockerfile
├── .env.example
└── README.md

Installation

Local Development

# After cloning the repository, enter the project directory
cd cherro-scrapper

# Install dependencies
npm install

# Copy environment template
cp .env.example .env

# Edit .env with your configuration
# (Optional: Add APIFY_TOKEN for local testing)

# Run scraper
npm start

Apify Deployment

# Install Apify CLI
npm install -g apify-cli

# Login to Apify
apify login

# Push to Apify
apify push

# Run on Apify platform
# Navigate to https://console.apify.com/actors

Configuration

Input Parameters

Configure via Apify Console or INPUT.json:

{
  "startUrls": [
    "https://www.ambitionbox.com/jobs",
    "https://www.ambitionbox.com/jobs?q=software+engineer"
  ],
  "maxConcurrency": 40,
  "maxRequestsPerMinute": 1200,
  "requestHandlerTimeoutSecs": 20
}
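
Before starting the crawl, the input could be merged with defaults and checked. The sketch below assumes the defaults mirror the example above; the function name `normalizeInput`, the default `startUrls`, and the host check are all assumptions and not taken from input_schema.json.

```javascript
// Assumed defaults, mirroring the example input.
const DEFAULT_INPUT = {
  startUrls: ['https://www.ambitionbox.com/jobs'],
  maxConcurrency: 40,
  maxRequestsPerMinute: 1200,
  requestHandlerTimeoutSecs: 20,
};

function normalizeInput(input = {}) {
  const merged = { ...DEFAULT_INPUT, ...input };

  if (!Array.isArray(merged.startUrls) || merged.startUrls.length === 0) {
    throw new Error('startUrls must be a non-empty array');
  }
  for (const url of merged.startUrls) {
    // new URL() throws on malformed URLs; the host check is an assumption.
    if (new URL(url).hostname !== 'www.ambitionbox.com') {
      throw new Error(`Unexpected host in start URL: ${url}`);
    }
  }
  for (const key of ['maxConcurrency', 'maxRequestsPerMinute', 'requestHandlerTimeoutSecs']) {
    if (!Number.isInteger(merged[key]) || merged[key] <= 0) {
      throw new Error(`${key} must be a positive integer`);
    }
  }
  return merged;
}
```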

Environment Variables

See .env.example for local testing configuration.

Data Schema

Output Format

Each job record in the dataset contains:

{
  "jobId": "12345",
  "title": "Senior Software Engineer",
  "companyName": "Example Corp",
  "companyUrlName": "example-corp",
  "location": "Bangalore",
  "postedDate": "2025-12-15",
  "salary": {
    "min": 1500000,
    "max": 2500000,
    "currency": "INR"
  },
  "experience": {
    "min": 3,
    "max": 5
  },
  "description": "Job description text...",
  "skills": ["JavaScript", "React", "Node.js"],
  "companyRating": 4.2,
  "employeeCount": {
    "min": 201,
    "max": 500,
    "raw": "201-500"
  },
  "companyWebsite": "https://example.com",
  "industry": "Information Technology",
  "companyDescription": "Company description text...",
  "headquarters": "Bangalore, India",
  "confidenceScore": 87.5,
  "confidenceLevel": "GOOD",
  "scrapedAt": "2025-12-18T09:44:20.000Z",
  "sourceUrl": "https://www.ambitionbox.com/jobs"
}

Confidence Scoring

Data quality score (0-100) based on field completeness:

90-100: EXCELLENT - All mandatory and most optional fields present
75-89: GOOD - All mandatory fields + some enrichment
60-74: FAIR - Mandatory fields present, limited enrichment
40-59: POOR - Some mandatory fields missing
0-39: VERY_POOR - Multiple mandatory fields missing
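
The bands above map a score to a level. A minimal sketch of that mapping follows; it uses the lower bound of each band so that fractional scores such as 87.5 or 89.9 still land in a band. The function name `confidenceLevel` is hypothetical, and the field weights that produce the score are not shown in this README, so they are left out.

```javascript
// Band lower bounds taken from the list above, highest first.
const CONFIDENCE_BANDS = [
  [90, 'EXCELLENT'],
  [75, 'GOOD'],
  [60, 'FAIR'],
  [40, 'POOR'],
  [0, 'VERY_POOR'],
];

function confidenceLevel(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  // The first band whose lower bound the score reaches is the matching one.
  return CONFIDENCE_BANDS.find(([min]) => score >= min)[1];
}
```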

Critical Implementation Details

Employee Count Validation

STRICT RULES (implemented in utils/validators.js):

✅ ACCEPT:

Ranges: "201-500", "1-10"
Lakh format: "1 Lakh+", "2 Lakhs"
Large numbers: "10,000+", "5000"
K values ≥ 100: "100k", "500k"

❌ REJECT:

Contains "follow": "5.6k followers"
K values < 100: "5.6k", "10k", "50k"
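
The accept and reject rules can be sketched as a single predicate. This is an illustration only, not the contents of utils/validators.js: the function name and the exact regular expressions are assumptions, and since the README gives no threshold for "large numbers", any plain integer is accepted here.

```javascript
// Returns true when a raw string looks like a genuine employee count.
function isValidEmployeeCount(raw) {
  if (typeof raw !== 'string') return false;
  const text = raw.trim();

  // Follower counts ("5.6k followers") are never employee counts.
  if (text === '' || /follow/i.test(text)) return false;

  // K values are accepted only from 100k upward.
  const kMatch = text.match(/^(\d+(?:\.\d+)?)\s*k\+?$/i);
  if (kMatch) return Number(kMatch[1]) >= 100;

  const acceptedPatterns = [
    /^\d[\d,]*\s*-\s*\d[\d,]*$/, // ranges: "201-500"
    /^\d+(?:\.\d+)?\s*lakhs?\+?$/i, // lakh format: "1 Lakh+", "2 Lakhs"
    /^\d[\d,]*\+?$/, // plain numbers: "10,000+", "5000"
  ];
  return acceptedPatterns.some((pattern) => pattern.test(text));
}
```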

Company URL Resolution

Priority Order:

companyUrlName from Nuxt state (SINGLE SOURCE OF TRUTH)
Extract from job detail page anchor
Construct slug from company name (LAST RESORT)

Format: https://www.ambitionbox.com/overview/{companyUrlName}-overview
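
The priority order can be sketched as a fallback chain. The names `resolveCompanyUrl`, `detailPageHref`, and `slugify` are hypothetical, and the slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption about what a last-resort slug might look like.

```javascript
const SITE_ORIGIN = 'https://www.ambitionbox.com';

// Last-resort slug: lowercase, non-alphanumeric runs collapsed to single hyphens.
function slugify(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

function resolveCompanyUrl({ companyUrlName, detailPageHref, companyName } = {}) {
  // 1. companyUrlName from Nuxt state
  if (companyUrlName) {
    return `${SITE_ORIGIN}/overview/${companyUrlName}-overview`;
  }
  // 2. Anchor href found on the job detail page (may be relative)
  if (detailPageHref) {
    return new URL(detailPageHref, SITE_ORIGIN).href;
  }
  // 3. Slug built from the company name
  const slug = companyName ? slugify(companyName) : '';
  return slug ? `${SITE_ORIGIN}/overview/${slug}-overview` : null;
}
```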

Nuxt State Extraction

Method: Regex-based extraction from HTML string

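// Extract window.__NUXT__ = {...} from the raw HTML string.
// Anchoring on </script> makes the lazy match capture the whole object
// instead of stopping at the first closing brace.
const nuxtRegex = /window\.__NUXT__\s*=\s*(\{.+?\})\s*;?\s*<\/script>/s;
const match = html.match(nuxtRegex);
if (!match) {
  throw new Error('window.__NUXT__ not found in HTML');
}
// JSON.parse only succeeds when the state is serialized as plain JSON.
const nuxtState = JSON.parse(match[1]);

// Navigate to jobs (empty array if the path is missing)
const jobs = nuxtState.data?.[1]?.jobs ?? [];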

NO JavaScript execution - works in CheerioCrawler.

Troubleshooting

Common Issues

Issue: No jobs found in Nuxt state

Solution:

Check if AmbitionBox changed their Nuxt state structure
Verify data[1].jobs path is correct
Enable debug logging to inspect raw Nuxt state
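
When the `data[1].jobs` path stops working, a small helper can show where the jobs array moved. This is a debugging sketch, not part of the Actor; `findJobsInNuxtState` is a hypothetical name, and it assumes the jobs still live in a `jobs` array on one of the top-level `data` entries.

```javascript
// Returns { index, jobs } for the first data entry holding a jobs array, or null.
function findJobsInNuxtState(nuxtState) {
  const entries = Array.isArray(nuxtState?.data) ? nuxtState.data : [];
  for (let index = 0; index < entries.length; index += 1) {
    const jobs = entries[index]?.jobs;
    if (Array.isArray(jobs)) {
      return { index, jobs };
    }
  }
  return null;
}
```

If the returned `index` is no longer 1, the hard-coded path needs updating; a `null` result suggests the state structure changed more substantially.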

Issue: Employee count always null

Solution:

Check if validation rules are too strict
Inspect raw employee count values in logs
Adjust selectors in routes/company.js

Issue: Low confidence scores

Solution:

Review field weights in utils/confidenceScore.js
Check if selectors are extracting data correctly
Verify company URLs are resolving properly

Debug Mode

Enable verbose logging:

// In src/main.js, before creating the crawler:
import { log, LogLevel } from 'crawlee';

// Crawlee's logger is configured globally, not through a crawler option.
log.setLevel(LogLevel.DEBUG);

Performance Optimization

Recommended Settings

For maximum throughput:

{
  "maxConcurrency": 40,
  "maxRequestsPerMinute": 1200
}

For stability (avoid rate limiting):

{
  "maxConcurrency": 20,
  "maxRequestsPerMinute": 600
}

Monitoring

Check Apify Console for:

Request queue size
Dataset item count
Failed requests
Retry histogram

Dependencies

{
  "apify": "^3.1.10",
  "crawlee": "^3.7.0",
  "cheerio": "^1.0.0-rc.12"
}

All three dependencies are official packages published on npm.

License

ISC

Support

For issues or questions:

Check Apify logs for error messages
Review this README for troubleshooting steps
Inspect KeyValueStore for intermediate data
Enable debug logging for detailed output

Built with: Node.js 18+, Crawlee, Apify, Cheerio

Architecture: Cheerio-first, Playwright-fallback

Performance: 40 concurrent requests, 1200 req/min throughput

