Scrape entry-level IT jobs in India
Pricing
from $0.0005 / actor start
Scrape entry-level IT jobs in India from LinkedIn and major ATS boards (Greenhouse, Lever, Workday, SmartRecruiters, etc.). Filter by recency, location, and job type. Outputs clean JSON to dataset and Excel report to key-value store.
Rating: 5.0 (1)
Developer: Dhanunjaya Y
Entry-Level IT Jobs Scraper India - Export to Excel
What It Does
This actor scrapes entry-level IT jobs in India from LinkedIn and 12 ATS-style sources:
- Greenhouse
- Lever
- SmartRecruiters
- Workday
- iCIMS
- Jobvite
- BambooHR
- Zoho Recruit
- Freshteam
- Keka
- Darwinbox
- Recruitee
It normalizes all results into one schema, filters by posting recency, removes duplicates, and exports:
- JSON records to Apify dataset
- OUTPUT.xlsx to Apify key-value store
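As a rough illustration of the duplicate-removal step, the sketch below keeps the first record seen for each normalized (title, company, location) key. The field names and the choice of key are assumptions made for this example; the actor's actual schema and dedupe rule are not documented here.

```python
from typing import Iterable


def dedupe_jobs(jobs: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each (title, company, location) key."""
    seen = set()
    unique = []
    for job in jobs:
        # Hypothetical field names; the actor's real schema may differ.
        # Lowercasing and collapsing whitespace makes "QA  Intern" == "qa intern".
        key = tuple(
            " ".join(str(job.get(field, "")).lower().split())
            for field in ("title", "company", "location")
        )
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique
```

With this key, the same posting found on two boards collapses to one row, and the copy from whichever source was processed first is the one that survives.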
Input Fields
- keywords (string): Job search terms. Example: software engineer fresher, python developer, data analyst
- location (string): Target location. Example: India, Bengaluru, Hyderabad
- posted_within (enum): Time window filter. Options: 1 hour, 2 hours, 5 hours, 12 hours, today, 2 days, this week
- sources (array): Which boards to scrape. Default includes all 13 sources.
- job_type (enum): full-time, internship, or both
- max_results_per_source (integer): Max jobs fetched per source
- max_keyword_variants (integer): Number of auto-generated keyword variants tested per source (default: 20)
- linkedin_li_at_cookie (secret string, optional): LinkedIn li_at cookie for improved LinkedIn fetch reliability on stricter rate limits
- execution_mode (enum): ultra, fast, balanced, deep (default: fast)
  - ultra: fastest turnaround, smaller scan depth
  - fast: speed + coverage balance (recommended for frequent runs)
  - balanced: deeper than fast, slower
  - deep: maximum depth, slowest
- run_timeout_seconds (integer): Global runtime budget before pending sources are cancelled
- source_timeout_seconds (integer): Per-source timeout budget
- source_concurrency / variant_concurrency / linkedin_variant_concurrency (integer): Advanced parallelism controls
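A sample input built from the fields above may make the shape clearer. The field names and enum options come from this README; the source identifiers in sources and the numeric values (other than the documented default of 20 variants) are illustrative assumptions, and validate_input is a hypothetical helper, not part of the actor.

```python
import json

run_input = {
    "keywords": "software engineer fresher",
    "location": "India",
    "posted_within": "today",
    "sources": ["linkedin", "greenhouse", "lever"],  # identifier spelling is assumed
    "job_type": "both",
    "max_results_per_source": 50,   # illustrative value
    "max_keyword_variants": 20,     # documented default
    "execution_mode": "fast",       # documented default
    "run_timeout_seconds": 300,     # illustrative value
    "source_timeout_seconds": 60,   # illustrative value
}

# Enum options as listed in the Input Fields section.
ALLOWED = {
    "posted_within": {"1 hour", "2 hours", "5 hours", "12 hours",
                      "today", "2 days", "this week"},
    "job_type": {"full-time", "internship", "both"},
    "execution_mode": {"ultra", "fast", "balanced", "deep"},
}


def validate_input(data: dict) -> list[str]:
    """Return a list of problems found in the enum fields (empty if none)."""
    errors = []
    for field, options in ALLOWED.items():
        if field in data and data[field] not in options:
            errors.append(f"{field}: {data[field]!r} is not one of {sorted(options)}")
    return errors


if __name__ == "__main__":
    print(json.dumps(run_input, indent=2))
    print(validate_input(run_input))
```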
Smart keyword expansion
- The actor auto-expands user keywords for better entry-level coverage.
- Example: QA jobs expands to variants like qa intern, manual testing intern, automation testing trainee, software tester, junior qa engineer, quality analyst, and more.
- Prefix/suffix generation adds terms such as junior, associate, entry-level, fresher, intern, trainee, plus role endings like engineer, analyst, and tester.
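The prefix/suffix part of the expansion could look roughly like the sketch below. It is not the actor's implementation: the split of terms into prefixes versus suffixes, the handling of the word "jobs", and the ordering are all assumptions, and it does not attempt the synonym-style variants (such as manual testing intern) mentioned in the example.

```python
# Assumed grouping of the terms listed above; the actor may group them differently.
PREFIXES = ["junior", "associate", "entry-level", "fresher"]
SUFFIXES = ["intern", "trainee"]
ROLE_ENDINGS = ["engineer", "analyst", "tester"]
GENERIC_WORDS = {"job", "jobs"}


def expand_keyword(keyword: str, max_variants: int = 20) -> list[str]:
    """Generate entry-level variants of one keyword, capped at max_variants."""
    # Lowercase, collapse whitespace, and drop generic words like "jobs".
    words = [w for w in keyword.lower().split() if w not in GENERIC_WORDS]
    stem = " ".join(words)
    if not stem:
        return []
    candidates = [stem]
    candidates += [f"{prefix} {stem}" for prefix in PREFIXES]
    candidates += [f"{stem} {suffix}" for suffix in SUFFIXES]
    candidates += [f"{stem} {ending}" for ending in ROLE_ENDINGS]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(candidates))[:max_variants]
```

Under these assumptions, expand_keyword("QA jobs") yields variants such as qa intern, qa trainee, and junior qa, and the max_variants cap plays the role of the max_keyword_variants input.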
Time Filter
Every job goes through a common parser that handles:
- Relative strings: "2 hours ago", "Posted 3 days ago", "Today"
- ISO strings: "2024-01-15T10:30:00Z"
- Unix timestamps: 1705312200000
- Calendar strings: "Jan 15, 2024", "15/01/2024", "15 Jan 2024"
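A minimal sketch of a parser covering the four input shapes listed above is shown here. The function name, the decision to treat naive dates as UTC, the 13-digit rule for millisecond timestamps, and the day-first reading of 15/01/2024 are assumptions for illustration; the actor's own parser may behave differently.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_RELATIVE = re.compile(r"(\d+)\s*(minute|hour|day|week)s?\s+ago", re.IGNORECASE)
_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400, "week": 604800}
# %b matches English month names under the default C locale.
_CALENDAR_FORMATS = ("%b %d, %Y", "%d/%m/%Y", "%d %b %Y")


def parse_posted_at(value, now: Optional[datetime] = None) -> Optional[datetime]:
    """Parse a posting date into an aware UTC datetime, or None if unrecognized."""
    now = now or datetime.now(timezone.utc)
    text = str(value).strip()

    # Unix timestamps: 13 or more digits are read as milliseconds, else seconds.
    if text.isdigit():
        seconds = int(text) / 1000 if len(text) >= 13 else int(text)
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    # Relative strings such as "2 hours ago" or "Posted 3 days ago".
    match = _RELATIVE.search(text)
    if match:
        amount, unit = int(match.group(1)), match.group(2).lower()
        return now - timedelta(seconds=amount * _UNIT_SECONDS[unit])
    if "today" in text.lower():
        return now.replace(hour=0, minute=0, second=0, microsecond=0)

    # ISO 8601 strings; a trailing "Z" is rewritten for older Python versions.
    try:
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    except ValueError:
        pass

    # Calendar strings, assumed to be UTC dates.
    for fmt in _CALENDAR_FORMATS:
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    return None
```

Passing now explicitly keeps relative strings reproducible in tests; returning None for unrecognized input lets the caller decide whether to keep or drop such jobs.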
When to use each option
- 1 hour: Very fresh alerting workflow
- 2 hours: Slightly wider near-real-time scan
- 5 hours: Same-shift refresh
- 12 hours: Half-day batch run
- today: Daily run
- 2 days: Catch-up run
- this week: Weekly sourcing run
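One way to picture how a posted_within option turns into a recency filter is the sketch below. The option names come from this README, but their exact semantics are assumed: here today means since midnight UTC and this week means a rolling 7 days, which may not match the actor.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fixed-length windows; "today" is handled separately below.
_WINDOWS = {
    "1 hour": timedelta(hours=1),
    "2 hours": timedelta(hours=2),
    "5 hours": timedelta(hours=5),
    "12 hours": timedelta(hours=12),
    "2 days": timedelta(days=2),
    "this week": timedelta(days=7),  # assumption: rolling 7 days
}


def cutoff_for(posted_within: str, now: Optional[datetime] = None) -> datetime:
    """Return the earliest posting time that still passes the filter."""
    now = now or datetime.now(timezone.utc)
    if posted_within == "today":
        # Assumption: "today" starts at midnight UTC.
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    return now - _WINDOWS[posted_within]


def is_recent(posted_at: datetime, posted_within: str,
              now: Optional[datetime] = None) -> bool:
    """True if posted_at falls inside the selected time window."""
    return posted_at >= cutoff_for(posted_within, now)
```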
Speed Notes
- Scraping now runs in parallel across sources and keyword variants.
- Sources that do not support server-side keyword search are auto-run with a single keyword pass to avoid redundant calls.
- In fast mode, typical runs complete within roughly 1-3 minutes depending on selected sources, keyword breadth, and target result counts.
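The parallel execution with a per-source timeout can be sketched with asyncio as follows. This is an illustration of the general pattern only: scrape_all and the fetch callback are hypothetical names, and the parameters merely mirror the source_concurrency and source_timeout_seconds inputs.

```python
import asyncio
from typing import Awaitable, Callable


async def scrape_all(
    sources: list[str],
    fetch: Callable[[str], Awaitable[list]],
    source_concurrency: int = 4,
    source_timeout_seconds: float = 60.0,
) -> dict[str, list]:
    """Run fetch(source) for every source, at most source_concurrency at a time."""
    semaphore = asyncio.Semaphore(source_concurrency)

    async def run_one(name: str) -> tuple[str, list]:
        async with semaphore:
            try:
                jobs = await asyncio.wait_for(fetch(name), timeout=source_timeout_seconds)
                return name, jobs
            except asyncio.TimeoutError:
                # A source that exceeds its budget contributes no rows.
                return name, []

    pairs = await asyncio.gather(*(run_one(name) for name in sources))
    return dict(pairs)
```

A caller would wrap this in asyncio.run(scrape_all(...)). A global budget like run_timeout_seconds could be layered on top with another asyncio.wait_for around the whole call, though that cancels everything at once rather than only the pending sources.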
Output
The Excel file includes 4 sheets:
- All Jobs
- By Source Board
- By City
- Dashboard
The All Jobs sheet has styled headers, frozen panes, filters, alternating row colors, and clickable apply links.
Recommended screenshot path for store listing: assets/excel-output-sample.png (add your real run screenshot before publishing)
Recommended actor icon path: assets/actor-icon.svg (upload this in Actor Settings > Profile > Icon)
Pricing
Recommended Apify store pricing setup:
- Pay-per-use: $0.50 per 100 results
- Subscription: $19/month unlimited runs
- Free tier: 50 results for trial users
Local Setup
- pip install -r requirements.txt
- playwright install chromium
- python -m unittest discover -s tests -v
- python main.py
Apify Deploy
- apify login
- apify push
After deployment:
- Run actor on Apify.
- Verify OUTPUT.xlsx exists in key-value store.
- Verify JSON rows exist in dataset.
- Review logs for per-source counts and errors.
Data Seed Files
Company/portal seeds are in data/:
- lever_companies.json
- greenhouse_companies.json
- jobvite_companies.json
- bamboohr_companies.json
- zoho_companies.json
- freshteam_companies.json
- keka_companies.json
- darwinbox_companies.json
- recruitee_companies.json
- icims_clients.json
- workday_portals.json
Populate these lists with your target companies for higher coverage.