Scrape entry-level IT jobs in India

Scrape entry-level IT jobs in India from LinkedIn and major ATS boards (Greenhouse, Lever, Workday, SmartRecruiters, etc.). Filter by recency, location, and job type. Outputs clean JSON to dataset and Excel report to key-value store.

Pricing: from $0.0005 / actor start

Rating: 5.0 (1)

Developer: Dhanunjaya Y (Maintained by Community)

Actor stats:

  • Bookmarked: 0
  • Total users: 10
  • Monthly active users: 1
  • Last modified: 5 days ago


Entry-Level IT Jobs Scraper India - Export to Excel

What It Does

This actor scrapes entry-level IT jobs in India from LinkedIn and 12 ATS-style sources (13 boards in total):

  • LinkedIn
  • Greenhouse
  • Lever
  • SmartRecruiters
  • Workday
  • iCIMS
  • Jobvite
  • BambooHR
  • Zoho Recruit
  • Freshteam
  • Keka
  • Darwinbox
  • Recruitee

It normalizes all results into one schema, filters by posting recency, removes duplicates, and exports:

  • JSON records to Apify dataset
  • OUTPUT.xlsx to Apify key-value store
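The normalize-and-dedupe step can be sketched as follows. The field names and the dedup key are illustrative assumptions, not the actor's actual schema:

```python
from typing import Iterable

def normalize(raw: dict, source: str) -> dict:
    """Map a raw board record onto one common schema (fields assumed)."""
    return {
        "title": (raw.get("title") or "").strip(),
        "company": (raw.get("company") or "").strip(),
        "location": (raw.get("location") or "").strip(),
        "url": raw.get("url", ""),
        "source": source,
    }

def dedupe(jobs: Iterable[dict]) -> list[dict]:
    """Keep the first job seen for each (title, company, location) key."""
    seen: set[tuple[str, str, str]] = set()
    unique = []
    for job in jobs:
        key = (job["title"].lower(), job["company"].lower(), job["location"].lower())
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique
```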

Input Fields

  • keywords (string): Job search terms. Example: software engineer fresher, python developer, data analyst
  • location (string): Target location. Example: India, Bengaluru, Hyderabad
  • posted_within (enum): Time window filter. Options: 1 hour, 2 hours, 5 hours, 12 hours, today, 2 days, this week
  • sources (array): Which boards to scrape. Default includes all 13 sources.
  • job_type (enum): full-time, internship, or both
  • max_results_per_source (integer): Max jobs fetched per source
  • max_keyword_variants (integer): Number of auto-generated keyword variants tested per source (default: 20)
  • linkedin_li_at_cookie (secret string, optional): LinkedIn li_at session cookie; improves fetch reliability when LinkedIn enforces stricter rate limits
  • execution_mode (enum): ultra, fast, balanced, deep (default: fast)
    • ultra: fastest turnaround, smaller scan depth
    • fast: speed + coverage balance (recommended for frequent runs)
    • balanced: deeper than fast, slower
    • deep: maximum depth, slowest
  • run_timeout_seconds (integer): Global runtime budget before pending sources are cancelled
  • source_timeout_seconds (integer): Per-source timeout budget
  • source_concurrency / variant_concurrency / linkedin_variant_concurrency (integer): Advanced parallelism controls
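A run input combining the fields above might look like this; all values, including the source identifiers, are illustrative:

```python
# Example run input for this actor (values are illustrative assumptions).
run_input = {
    "keywords": "python developer fresher",
    "location": "Bengaluru",
    "posted_within": "today",
    "sources": ["linkedin", "greenhouse", "lever"],  # source IDs assumed
    "job_type": "full-time",
    "max_results_per_source": 50,
    "max_keyword_variants": 20,
    "execution_mode": "fast",
    "run_timeout_seconds": 300,
    "source_timeout_seconds": 60,
}
```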

Smart keyword expansion

  • The actor auto-expands user keywords for better entry-level coverage.
  • Example: QA jobs expands to variants like qa intern, manual testing intern, automation testing trainee, software tester, junior qa engineer, quality analyst, and more.
  • Prefix/suffix generation adds terms such as junior, associate, entry-level, fresher, intern, trainee, plus role endings like engineer, analyst, and tester.
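The expansion can be sketched roughly like this; the actor's real variant list is larger, but the prefixes and suffixes below come straight from the description above:

```python
# Prefixes/suffixes taken from the description above; the real generator
# produces more variants than this sketch.
PREFIXES = ["junior", "associate", "entry-level", "fresher", "intern", "trainee"]
SUFFIXES = ["engineer", "analyst", "tester"]

def expand(keyword: str, limit: int = 20) -> list[str]:
    """Generate entry-level variants of a base keyword, capped at `limit`."""
    variants = [keyword]
    variants += [f"{p} {keyword}" for p in PREFIXES]
    variants += [f"{keyword} {s}" for s in SUFFIXES]
    # Deduplicate while preserving order, then cap at the variant budget.
    seen, out = set(), []
    for v in variants:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out[:limit]
```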

Time Filter

Every job goes through a common parser that handles:

  • Relative strings: 2 hours ago, Posted 3 days ago, Today
  • ISO strings: 2024-01-15T10:30:00Z
  • Unix timestamps: 1705312200000
  • Calendar strings: Jan 15, 2024, 15/01/2024, 15 Jan 2024
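A minimal parser covering the relative, ISO, and Unix-timestamp cases might look like the sketch below; it is a best-effort illustration, not the actor's actual implementation (calendar-string formats would need an extra branch):

```python
from datetime import datetime, timedelta, timezone
import re

def parse_posted(value, now=None):
    """Best-effort parse of a 'posted' field into an aware UTC datetime."""
    now = now or datetime.now(timezone.utc)
    if isinstance(value, (int, float)):
        # Unix timestamp; values above ~1e12 are milliseconds.
        ts = value / 1000 if value > 1e12 else value
        return datetime.fromtimestamp(ts, tz=timezone.utc)
    text = str(value).strip().lower().removeprefix("posted").strip()
    if text == "today":
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    m = re.match(r"(\d+)\s+(hour|day|week)s?\s+ago", text)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        return now - {"hour": timedelta(hours=n),
                      "day": timedelta(days=n),
                      "week": timedelta(weeks=n)}[unit]
    # Fall back to ISO 8601 ("Z" suffix mapped for fromisoformat).
    return datetime.fromisoformat(str(value).strip().replace("Z", "+00:00"))
```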

When to use each option

  • 1 hour: Very fresh alerting workflow
  • 2 hours: Slightly wider near-real-time scan
  • 5 hours: Same-shift refresh
  • 12 hours: Half-day batch run
  • today: Daily run
  • 2 days: Catch-up run
  • this week: Weekly sourcing run

Speed Notes

  • Scraping now runs in parallel across sources and keyword variants.
  • Sources that do not support server-side keyword search are auto-run with a single keyword pass to avoid redundant calls.
  • In fast mode, typical runs complete within roughly 1-3 minutes depending on selected sources, keyword breadth, and target result counts.
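The parallel fan-out across sources can be sketched with a thread pool; `scrape_source` here is a hypothetical placeholder for the actor's real per-source fetchers:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_source(source: str, keywords: list[str]) -> list[dict]:
    """Hypothetical per-source fetcher; returns one stub record per keyword."""
    return [{"source": source, "keyword": kw} for kw in keywords]

def scrape_all(sources, keywords, source_concurrency=4):
    """Run all sources in parallel, bounded by source_concurrency."""
    results = []
    with ThreadPoolExecutor(max_workers=source_concurrency) as pool:
        futures = {pool.submit(scrape_source, s, keywords): s for s in sources}
        for fut in as_completed(futures):
            try:
                results.extend(fut.result())
            except Exception:
                # One failing board should not abort the whole run.
                pass
    return results
```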

Output

The Excel file includes 4 sheets:

  • All Jobs
  • By Source Board
  • By City
  • Dashboard

The All Jobs sheet has styled headers, a frozen header row, column filters, alternating row shading, and clickable apply links.
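To reproduce similar styling yourself, a minimal sketch with openpyxl (assuming that library; the actor's actual export code may differ) looks like:

```python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill

wb = Workbook()
ws = wb.active
ws.title = "All Jobs"

ws.append(["Title", "Company", "City", "Source", "Apply"])
ws.append(["QA Intern", "Acme", "Pune", "lever", "Apply"])      # sample row
ws.cell(row=2, column=5).hyperlink = "https://example.com/apply"  # clickable link

for cell in ws[1]:                       # bold white-on-blue header row
    cell.font = Font(bold=True, color="FFFFFF")
    cell.fill = PatternFill("solid", fgColor="4472C4")
ws.freeze_panes = "A2"                   # keep headers visible while scrolling
ws.auto_filter.ref = ws.dimensions       # add column filters over all data
```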

Recommended screenshot path for store listing:

  • assets/excel-output-sample.png (add your real run screenshot before publishing)

Recommended actor icon path:

  • assets/actor-icon.svg (upload this in Actor Settings > Profile > Icon)

Pricing

Recommended Apify store pricing setup:

  • Pay-per-use: $0.50 per 100 results
  • Subscription: $19/month unlimited runs
  • Free tier: 50 results for trial users

Local Setup

pip install -r requirements.txt
playwright install chromium
python -m unittest discover -s tests -v
python main.py

Apify Deploy

apify login
apify push

After deployment:

  1. Run actor on Apify.
  2. Verify OUTPUT.xlsx exists in key-value store.
  3. Verify JSON rows exist in dataset.
  4. Review logs for per-source counts and errors.

Data Seed Files

Company/portal seeds are in data/:

  • lever_companies.json
  • greenhouse_companies.json
  • jobvite_companies.json
  • bamboohr_companies.json
  • zoho_companies.json
  • freshteam_companies.json
  • keka_companies.json
  • darwinbox_companies.json
  • recruitee_companies.json
  • icims_clients.json
  • workday_portals.json

Populate these lists with your target companies for higher coverage.
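The seed schema is not documented here; assuming each file holds a JSON list of board identifiers (an assumption, check the repo's actual files), a loader might look like:

```python
import json
from pathlib import Path

def load_seeds(data_dir: str = "data") -> dict[str, list]:
    """Load every seed JSON in data/ into a {source: entries} map.

    Assumes filenames like lever_companies.json, where the prefix
    before the first underscore names the source board.
    """
    seeds = {}
    for path in Path(data_dir).glob("*.json"):
        source = path.stem.split("_")[0]   # "lever_companies" -> "lever"
        seeds[source] = json.loads(path.read_text())
    return seeds
```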