Ausbildung Jobs Scraper

Pricing

Pay per usage

The Ausbildung Jobs Scraper is a lightweight, fast actor for scraping apprenticeship and vocational training listings. Residential proxies are strongly advised for reliable data extraction. Get the apprenticeship data you need!

Rating: 0.0 (0)

Developer: Shahid Irfan (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 6 days ago

Ausbildung.de Jobs Scraper

Extract comprehensive apprenticeship and training position data from Ausbildung.de, Germany's leading platform for vocational training opportunities. This scraper efficiently collects job listings with detailed information including company details, locations, training types, and complete job descriptions.

🚀 Key Features

  • Dual Extraction Method: Prioritizes fast JSON API calls and automatically falls back to HTML parsing when needed
  • Smart Pagination: Intelligently navigates through search results to collect the exact number of listings you need
  • Rich Data Collection: Captures complete job information including descriptions, locations, federal states, and training types
  • Flexible Search Options: Filter by keyword, location, and profession
  • Structured Data Support: Leverages JSON-LD schema for accurate data extraction when available
  • Built-in Deduplication: Automatically removes duplicate job listings
  • Proxy Support: Includes proxy configuration for reliable, uninterrupted scraping
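The built-in deduplication listed above could work along the following lines. This is a minimal sketch, not the actor's actual code: it assumes the `url` field uniquely identifies a listing, and `dedupe_jobs` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, List


def dedupe_jobs(jobs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each listing, keyed by its URL.

    Assumption: `url` uniquely identifies a listing. Items without a URL
    fall back to a (title, company, location) key.
    """
    seen = set()
    unique = []
    for job in jobs:
        key = job.get("url") or (
            job.get("title"),
            job.get("company"),
            job.get("location"),
        )
        if key in seen:
            continue  # already collected, skip the duplicate
        seen.add(key)
        unique.append(job)
    return unique
```

Keeping the first occurrence preserves the order in which listings appeared in the search results.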

📋 Use Cases

  • Job Market Analysis: Gather data for analyzing apprenticeship trends across different regions and industries
  • Career Guidance: Aggregate training opportunities for students and career counselors
  • Recruitment Intelligence: Monitor competitor hiring patterns and training programs
  • Research & Analytics: Build datasets for labor market research and vocational education studies
  • Automated Job Boards: Feed fresh apprenticeship listings into your own platforms or applications

🎯 Input Configuration

Configure the scraper with these parameters to match your specific needs:

Search Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| keyword | String | Job title or search keyword (e.g., "Fachinformatiker", "Kaufmann") | - |
| location | String | City or location (e.g., "Berlin", "München") | - |
| beruf | String | Specific profession or job category | - |
| startUrl | String | Custom Ausbildung.de search URL (overrides other search parameters) | - |

Scraping Options

| Parameter | Type | Description | Default |
|---|---|---|---|
| results_wanted | Integer | Maximum number of job listings to collect | 100 |
| max_pages | Integer | Maximum number of pages to process (safety limit) | 50 |
| collectDetails | Boolean | Visit detail pages to extract full job descriptions | true |
| proxyConfiguration | Object | Proxy settings for reliable scraping | Residential proxies |

Example Input

{
  "keyword": "Fachinformatiker",
  "location": "Berlin",
  "results_wanted": 50,
  "max_pages": 10,
  "collectDetails": true
}
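To show how the documented defaults and the `startUrl` override interact, here is a minimal sketch that merges a user input with the defaults from the tables above. The actor's real input handling is not documented here, so `resolve_input` and its validation rules are assumptions for illustration only.

```python
from typing import Any, Dict

# Defaults copied from the parameter tables above.
DEFAULTS: Dict[str, Any] = {
    "results_wanted": 100,
    "max_pages": 50,
    "collectDetails": True,
}


def resolve_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user input with documented defaults and sanity-check the limits."""
    cfg = {**DEFAULTS, **raw}
    for name in ("results_wanted", "max_pages"):
        value = cfg[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    # startUrl overrides the other search parameters, per the table above.
    if cfg.get("startUrl"):
        for name in ("keyword", "location", "beruf"):
            cfg.pop(name, None)
    return cfg
```

With the example input above, the result keeps `results_wanted` at 50 and `max_pages` at 10; any omitted option falls back to its default.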

📤 Output Format

Each scraped job listing contains the following fields:

| Field | Type | Description |
|---|---|---|
| title | String | Job position title |
| company | String | Company or employer name |
| location | String | Job location (city) |
| bundesland | String | German federal state |
| beruf | String | Profession or job category |
| ausbildungsart | String | Type of training/apprenticeship |
| start_date | String | Training start date |
| date_posted | String | Date the job was posted |
| description_html | String | Full job description (HTML format) |
| description_text | String | Plain text version of job description |
| salary | String | Salary information (if available) |
| job_type | String | Employment type |
| url | String | Direct link to job posting |

Example Output

{
  "title": "Ausbildung zum Fachinformatiker für Anwendungsentwicklung (m/w/d)",
  "company": "TechCorp GmbH",
  "location": "Berlin",
  "bundesland": "Berlin",
  "beruf": "Fachinformatiker/in - Anwendungsentwicklung",
  "ausbildungsart": "Duale Ausbildung",
  "start_date": "01.08.2025",
  "date_posted": "2024-12-01",
  "description_html": "<p>Wir suchen motivierte Auszubildende...</p>",
  "description_text": "Wir suchen motivierte Auszubildende...",
  "salary": "1000-1200 EUR",
  "job_type": "Ausbildung",
  "url": "https://www.ausbildung.de/stellen/..."
}
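In the example output, `start_date` uses the German DD.MM.YYYY style while `date_posted` is ISO formatted. If downstream analysis needs one format, a small post-processing helper along these lines may help. It is a sketch that assumes only those two formats occur; `to_iso_date` is a hypothetical name and not part of the actor.

```python
from datetime import datetime
from typing import Optional


def to_iso_date(value: Optional[str]) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed.

    Assumption: dates arrive either as DD.MM.YYYY or as YYYY-MM-DD,
    the two formats that appear in the example output.
    """
    if not value:
        return None
    for fmt in ("%d.%m.%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


print(to_iso_date("01.08.2025"))  # 2025-08-01
print(to_iso_date("2024-12-01"))  # 2024-12-01
```

Returning None for unparseable values keeps a single odd listing from aborting a whole batch.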

💡 How It Works

  1. BUILD_ID Extraction: Automatically extracts the Next.js build ID from the initial page load for API access
  2. Tier 1 - Next.js Data API: Fetches data via /_next/data/[BUILD_ID]/suche.json for maximum speed and reliability
  3. Tier 2 - JSON-LD Schema: If the API fails, extracts JobPosting structured data from detail pages
  4. Tier 3 - CSS Selectors: Falls back to HTML parsing using .c-jobCard, .c-jobCard__company, .c-jobCard__location selectors
  5. Smart Pagination: Navigates results using a[rel='next'] and .c-pagination__next selectors
  6. Detail Collection: Optionally visits each job detail page to extract complete information
  7. Data Validation: Cleans, validates, and deduplicates all extracted data
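Steps 1 to 3 above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actor's implementation: it relies on Next.js embedding its page data in a `__NEXT_DATA__` script tag, the function names are hypothetical, and the query parameter names accepted by `suche.json` are not documented here, so they are left to the caller.

```python
import json
import re
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode

NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)
JSON_LD_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_build_id(html: str) -> Optional[str]:
    """Step 1: read buildId from the __NEXT_DATA__ script tag."""
    match = NEXT_DATA_RE.search(html)
    if not match:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    return data.get("buildId") if isinstance(data, dict) else None


def build_search_url(build_id: str, params: Dict[str, str]) -> str:
    """Step 2: build the /_next/data/[BUILD_ID]/suche.json URL.

    Assumption: the parameter names in `params` are the caller's
    responsibility; they are not verified against the site.
    """
    base = f"https://www.ausbildung.de/_next/data/{build_id}/suche.json"
    return f"{base}?{urlencode(params)}" if params else base


def extract_job_postings(html: str) -> List[Dict[str, Any]]:
    """Step 3: collect JSON-LD objects whose @type is JobPosting."""
    postings = []
    for raw in JSON_LD_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "JobPosting":
                postings.append(item)
    return postings
```

A caller would try `extract_build_id` first and, when it returns None or the data request fails, fall back to `extract_job_postings` on the detail pages. Regex matching is used here only to keep the sketch dependency-free; a real crawler would more likely use an HTML parser.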

🔧 Best Practices

  • Start Small: Test with results_wanted: 10 before running large-scale extractions
  • Use Proxies: Enable proxy configuration for reliable, uninterrupted scraping
  • Specific Searches: More specific keywords yield better, more relevant results
  • Monitor Limits: Set appropriate max_pages to control runtime and costs
  • Detail Mode: Disable collectDetails if you only need basic listing information

⚙️ Technical Details

  • Built with Crawlee for robust crawling and data extraction
  • Uses JSON API for efficient data extraction with HTML fallback capability
  • Implements intelligent retry logic and error handling
  • Uses residential proxies for optimal reliability
  • Processes data asynchronously for maximum performance
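The retry logic mentioned above is not described in detail, and Crawlee handles request retries internally. As a general illustration of the technique, the following sketch retries a failing call with exponential backoff and a little jitter. `with_retries` and its parameters are hypothetical and do not correspond to any option of this actor.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fetch()
        except Exception:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles per attempt (base, 2*base, 4*base, ...),
            # plus up to 10% random jitter to avoid synchronized retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

The `sleep` argument is injectable so the helper can be exercised in tests without actually waiting.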

📊 Performance

  • Speed: Processes 20-50 jobs per minute in API mode
  • Accuracy: 95%+ data completeness with detail collection enabled
  • Reliability: Built-in retry mechanisms handle temporary failures
  • Scalability: Handles runs from 10 to 10,000+ job listings
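The stated throughput gives a rough way to budget a run before starting it. The sketch below simply divides the job count by the 20-50 jobs per minute range quoted for API mode; it assumes that rate holds for the whole run, which may not be true with detail collection enabled or under rate limiting. `estimate_runtime_minutes` is a hypothetical helper.

```python
from typing import Tuple


def estimate_runtime_minutes(
    jobs: int, min_rate: float = 20.0, max_rate: float = 50.0
) -> Tuple[float, float]:
    """Return (best_case, worst_case) minutes for collecting `jobs` listings.

    Rates are in jobs per minute; the defaults are the API-mode figures
    quoted above.
    """
    if jobs < 0 or min_rate <= 0 or max_rate < min_rate:
        raise ValueError("invalid job count or rate range")
    return jobs / max_rate, jobs / min_rate


print(estimate_runtime_minutes(1000))  # (20.0, 50.0)
```

At the upper end of the stated scale, 10,000 listings would take roughly 200 to 500 minutes under these assumptions.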

🆘 Troubleshooting

No results returned: Verify your search parameters are correct and the website has matching listings

Incomplete data: Enable collectDetails to extract full job information from detail pages

Rate limiting: Enable proxy configuration, and either reduce results_wanted or add delays

Outdated selectors: The scraper automatically updates to handle website changes, but contact support if issues persist

📞 Support & Feedback

Found an issue or have a suggestion? We'd love to hear from you! Your feedback helps us improve this scraper for everyone.


Start extracting valuable apprenticeship data from Ausbildung.de today! Configure your parameters and run the scraper to build comprehensive datasets for your analysis, research, or application needs.