Stepstone Job Scraper

This actor efficiently scrapes job listings from Stepstone.de, providing comprehensive data on available positions including titles, companies, locations, and detailed descriptions. It is designed for reliable data extraction with built-in mechanisms for handling large volumes of listings.

Description

The Stepstone Job Scraper automates the collection of job postings from Germany's leading job portal. It navigates through search results and detail pages to gather structured information, making it ideal for market research, recruitment analytics, and job market monitoring.

Features

  • Comprehensive Data Extraction: Captures key job details such as title, company, location, posting date, job type, category, and salary information.
  • Flexible URL Handling: Supports starting from specific URLs or constructing searches based on keywords and categories.
  • Advanced Filtering: Includes date-based filtering to focus on recent postings (last 24 hours, last 7 days, last 30 days, or the current month).
  • Detail Page Scraping: Optionally collects full job descriptions in both HTML and plain text formats.
  • Deduplication: Automatically removes duplicate job listings to ensure clean datasets.
  • Proxy Support: Integrates with proxy configurations for enhanced reliability and compliance.
  • Scalable Collection: Configurable limits on items and pages to manage data volume.
  • Error Handling: Includes retry mechanisms and human-like delays to mimic natural browsing behavior.

Input

The actor accepts the following input parameters to customize the scraping process: keyword, startUrl / startUrls, datePosted, maxItems, maxPages, collectDetails, dedupe, proxyConfiguration, cookies, and cookiesJson (each described in the sections below).
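
A hedged example of a complete input, using only the field names referenced in this README (exact names, types, and accepted values should be confirmed against the actor's input form in the Apify Console):

```python
# A hedged example input; field names are taken from this README, but the
# authoritative schema is the input form shown in the Apify Console.
run_input = {
    "keyword": "data scientist",      # search term
    "datePosted": "last_7d",          # value format taken from Example 1 below
    "maxItems": 100,                  # cap on collected listings
    "maxPages": 20,                   # cap on result pages visited
    "collectDetails": True,           # also scrape full job descriptions
    "dedupe": True,                   # drop duplicate listings
    "proxyConfiguration": {"useApifyProxy": True},  # assumes the standard Apify proxy input format
}
```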

Output

The actor outputs job data in JSON format, stored in the default dataset. Each item represents a single job listing and includes fields such as the title, company, location, posting date, job type, category, salary information, and, when detail scraping is enabled, the full description in HTML and plain text.
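
An illustrative item might look like the sketch below; the field names are inferred from the data points listed above and are not the actor's authoritative schema:

```python
# Illustrative dataset item; field names are inferred from the data points
# described in this README, not taken from the actor's published schema.
example_item = {
    "title": "Data Scientist (m/w/d)",
    "company": "Example GmbH",               # hypothetical values throughout
    "location": "Berlin",
    "datePosted": "2024-05-01",              # assumed ISO date format
    "jobType": "Full-time",
    "category": "IT",
    "salary": None,                          # often not published by employers
    "url": "https://www.stepstone.de/...",   # link to the listing (placeholder)
    "descriptionHtml": "<p>...</p>",         # present when collectDetails is enabled
    "descriptionText": "...",
}
```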

Usage

Basic Usage

  1. Access the Actor: Navigate to the actor's page in the Apify Console.
  2. Configure Input: Set the desired input parameters, such as keyword for search terms or startUrl for a specific page.
  3. Run the Actor: Click "Run" to start the scraping process.
  4. Monitor Progress: View logs and progress in the run details.
  5. Download Results: Once complete, export the dataset in JSON, CSV, or other formats.
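
As an alternative to the Console steps above, a run can also be started programmatically. A minimal sketch using the apify-client Python package, with a placeholder API token and actor ID (both hypothetical; replace with your own):

```python
from apify_client import ApifyClient

# Placeholder token and actor ID; replace with your own values from the Console.
client = ApifyClient("<YOUR_APIFY_TOKEN>")

run = client.actor("<ACTOR_ID>").call(run_input={
    "keyword": "data scientist",
    "datePosted": "last_7d",
    "maxItems": 50,
})

# Read the scraped listings from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("title"), "-", item.get("company"))
```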

Example Configurations

Example 1: Scrape Jobs by Keyword

  • Set keyword to "data scientist"
  • Set datePosted to "last_7d"
  • Set maxItems to 50

This will search for data scientist positions posted in the last 7 days and collect up to 50 listings.
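
Expressed as a run input (a sketch using the field names above, not the actor's authoritative schema):

```python
run_input = {
    "keyword": "data scientist",
    "datePosted": "last_7d",
    "maxItems": 50,
}
```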

Example 2: Scrape from Specific URL

This starts scraping from the IT jobs section and includes detailed descriptions.
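
A minimal input sketch for this example, assuming a hypothetical Stepstone IT-jobs URL and the startUrl and collectDetails fields described above:

```python
run_input = {
    # Hypothetical category URL; substitute any valid Stepstone listing URL.
    "startUrl": "https://www.stepstone.de/jobs/it",
    "collectDetails": True,  # include full descriptions (HTML and plain text)
}
```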

Example 3: Large-Scale Collection

  • Set startUrls to an array of category URLs
  • Set maxPages to 100
  • Configure proxyConfiguration for residential proxies

Ideal for comprehensive data collection across multiple categories.
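
A sketch of such a run, assuming hypothetical category URLs and the standard Apify proxy input format:

```python
run_input = {
    "startUrls": [
        # Hypothetical category URLs; replace with the sections you need.
        "https://www.stepstone.de/jobs/it",
        "https://www.stepstone.de/jobs/engineering",
    ],
    "maxPages": 100,
    "proxyConfiguration": {
        "useApifyProxy": True,                 # assumes the standard Apify proxy input format
        "apifyProxyGroups": ["RESIDENTIAL"],   # residential proxies for large-scale runs
    },
}
```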

Configuration

Proxy Settings

For optimal performance and to avoid IP blocking, configure the proxyConfiguration parameter:

  • Use Apify's built-in proxy groups (e.g., "residential" for better success rates).
  • Set rotation and session persistence as needed.
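
A sketch of a possible proxyConfiguration value, assuming the actor follows Apify's standard proxy input format:

```python
proxy_configuration = {
    "useApifyProxy": True,
    "apifyProxyGroups": ["RESIDENTIAL"],  # residential group for better success rates
    # "apifyProxyCountry": "DE",          # optionally pin the proxy country (standard Apify field)
}
```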

If Stepstone requires specific cookies:

  • Use cookies to pass a raw Cookie header string.
  • Alternatively, provide cookiesJson for structured cookie data.

Performance Tuning

  • Adjust maxItems and maxPages based on your data needs and rate limits.
  • Enable dedupe to maintain data quality.
  • Use datePosted filtering to focus on recent jobs and reduce processing time.

Cost & Limits

  • Proxy Usage: Utilizes Apify Proxy for reliable scraping. Residential proxies are recommended for large-scale runs.
  • Rate Limiting: Implements human-like delays to respect website policies.
  • No Hard Limits: Configurable parameters allow flexible data collection, but always adhere to Stepstone's terms of service.
  • Cost Estimation: Costs depend on proxy usage and run duration. Monitor usage in the Apify Console.

Limitations

  • Date filtering is applied post-scraping, which may affect performance for very large datasets.
  • Location-based filtering via URL parameters is not fully supported by Stepstone's current interface.
  • Requires valid, accessible Stepstone URLs for optimal results.
  • Some job details may be missing or change if Stepstone updates its page structure.

Troubleshooting

  • Low Success Rate: Check proxy configuration and try residential proxies.
  • Duplicates in Output: Ensure dedupe is enabled.
  • Missing Details: Verify collectDetails is set to true and URLs are valid.
  • Rate Limiting: Increase delays or use different proxy groups.

For further assistance, refer to the Apify documentation or contact support.