Indeed Job Scraper
Pricing
Pay per usage
A simple Indeed Job Scraper for minimalist, essential data. Uses residential proxies and cookies to prevent blocks, ensuring smooth and reliable runs. Perfect for getting targeted job data without the clutter.
Indeed Jobs Scraper
A configurable actor that scrapes job listings from Indeed search results, built for efficient production runs.
Overview
This actor collects job listing metadata and, optionally, full job descriptions. It supports single or multiple search URLs, pagination, and options to control concurrency, proxy usage, and cookies.
Features
- Scrape job title, company, location, salary, post date, and description (HTML & text).
- Accepts a full search URL or builds searches from keywords and location.
- Handles pagination to collect multiple pages of results.
- Supports configuring concurrency, proxy usage, and cookies for authenticated sessions.
- Outputs results to the default dataset for further processing.
Inputs
Provide a JSON object with the following properties. Any unspecified fields use sensible defaults.
Search parameters
| Field | Type | Description |
|---|---|---|
| `searchUrl` | string | Full Indeed search URL (if present, `keyword` and `location` are ignored). |
| `startUrls` | string[] | List of search URLs to process (optional). |
| `keyword` | string | Search keywords (used when `searchUrl` is not provided). |
| `location` | string | Location filter for searches (optional). |
| `posted_date` | string | Filter by date posted, e.g., Last 24 hours, Last 7 days, Last 30 days. |
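The precedence between `searchUrl` and `keyword`/`location` can be sketched as below. This is a minimal illustration, not the actor's own code: the helper names `build_search_url` and `resolve_search_url` are hypothetical, and the `q` and `l` query parameters are inferred from the example URL in this README.

```python
from typing import Optional
from urllib.parse import urlencode

# Base path taken from the example search URL in this README.
INDEED_SEARCH_BASE = "https://www.indeed.com/jobs"


def build_search_url(keyword: str, location: Optional[str] = None) -> str:
    """Build a search URL from a keyword and an optional location (hypothetical helper)."""
    params = {"q": keyword}
    if location:
        params["l"] = location
    # urlencode encodes spaces as "+", matching the example URL format.
    return f"{INDEED_SEARCH_BASE}?{urlencode(params)}"


def resolve_search_url(run_input: dict) -> str:
    """Prefer searchUrl when present; otherwise fall back to keyword and location."""
    if run_input.get("searchUrl"):
        return run_input["searchUrl"]
    return build_search_url(run_input["keyword"], run_input.get("location"))


if __name__ == "__main__":
    print(resolve_search_url({"keyword": "software engineer", "location": "Remote"}))
```

How `startUrls` combines with `searchUrl` is not specified here, so the sketch leaves it out.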
Scraping options
| Field | Type | Description |
|---|---|---|
| `maxItems` | number | Maximum number of job items to collect. |
| `collectDetails` | boolean | If true, visits each job detail page to extract the full description. |
| `maxConcurrency` | number | Maximum parallel requests (tune to avoid rate limits). |
| `cookies` / `cookiesJson` | object or string | Cookies for authenticated sessions. |
| `proxyConfiguration` | object | Proxy settings (use residential proxies when needed). |
Example input
{
  "startUrls": ["https://www.indeed.com/jobs?q=software+engineer&l=Remote"],
  "maxItems": 200,
  "collectDetails": true,
  "maxConcurrency": 5,
  "proxyConfiguration": { "useApifyProxy": true }
}
Output
The actor writes results to the dataset. Each item includes:
- `title`: Job title
- `company`: Company name
- `location`: Job location
- `postedAt`: When the job was posted (human readable)
- `salary`: Salary information (if present)
- `description_html`: Job description in HTML
- `description_text`: Plain text job description
- `url`: Job posting URL
- `source`: Source identifier (e.g., `indeed`)
- `search_url`: Search page where the job was found
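As an illustration of downstream processing, the sketch below de-duplicates dataset items on their `url` field and keeps only those with salary information. It assumes the items have already been exported as a list of dictionaries with the field names listed above. The helper names are hypothetical, and whether duplicates actually occur (for example, when overlapping searches return the same posting) is an assumption.

```python
from typing import Iterable, Iterator, List


def dedupe_by_url(items: Iterable[dict]) -> Iterator[dict]:
    """Yield each item once, keyed on its `url` field; items without a URL pass through."""
    seen = set()
    for item in items:
        url = item.get("url")
        if url is None:
            yield item
            continue
        if url in seen:
            continue
        seen.add(url)
        yield item


def with_salary(items: Iterable[dict]) -> List[dict]:
    """Keep only items whose `salary` field is present and non-empty."""
    return [item for item in items if item.get("salary")]


if __name__ == "__main__":
    # Made-up sample records using the documented field names.
    sample = [
        {"title": "Engineer", "url": "https://example.com/a", "salary": "$100k"},
        {"title": "Engineer", "url": "https://example.com/a", "salary": "$100k"},
        {"title": "Analyst", "url": "https://example.com/b", "salary": None},
    ]
    unique = list(dedupe_by_url(sample))
    print(len(unique), len(with_salary(unique)))
```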
How to run
- Provide the input JSON (example above) via the platform's run interface or CLI.
- Start the actor. Monitor the run and dataset for collected items.
- Adjust `maxConcurrency`, `proxyConfiguration`, and cookies if you encounter rate limiting.
Best practices & troubleshooting
Avoiding blocks
- Use a proxy pool or residential proxies for large-scale runs.
- Lower concurrency and add small delays when you see request failures.
- Provide valid cookies if you want to run authenticated sessions or reduce bot checks.
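The "add small delays" advice is commonly implemented as exponential backoff with jitter. The sketch below shows that general pattern for any client-side call you control. It is not a description of the actor's internal retry logic, and `with_backoff` and its parameters are hypothetical names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying on failure with exponentially growing, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; random jitter spreads out retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.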
Common issues
- Incomplete results: increase `maxItems` or confirm your search URL parameters.
- Many HTTP errors: reduce `maxConcurrency` and/or enable proxy rotation.
- Captcha / challenges: try using cookies from a valid session and a reliable proxy provider.
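One way to apply the "Many HTTP errors" advice between runs is to derive a more conservative input from the previous one. In this sketch, `tune_input`, the `error_rate` argument, and the 20% threshold are all hypothetical. The error rate is something you would estimate yourself from a run's logs, and the proxy setting mirrors the example input above.

```python
def tune_input(run_input: dict, error_rate: float, threshold: float = 0.2) -> dict:
    """Return a copy of the input with halved concurrency when errors exceed the threshold."""
    tuned = dict(run_input)  # shallow copy so the original input is left untouched
    if error_rate <= threshold:
        return tuned
    current = run_input.get("maxConcurrency")
    if isinstance(current, int):
        # Halve concurrency but never go below one parallel request.
        tuned["maxConcurrency"] = max(1, current // 2)
    # Enable the proxy only if no proxy settings were supplied (assumed default).
    tuned.setdefault("proxyConfiguration", {"useApifyProxy": True})
    return tuned
```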
Notes
Structure your input carefully and run smaller test jobs first to validate settings. Adjust proxies and concurrency for production-scale scraping.
