Bundesagentur Scraper

Under maintenance

Pricing

from $2.99 / 1,000 results

Extract job listings from Bundesagentur für Arbeit, Germany's federal employment agency (Jobbörse).

Rating

0.0 (0)

Developer

Jobs Scraper

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a day ago

Overview

This actor queries the job portal (Jobbörse) of Germany's federal employment agency, the Bundesagentur für Arbeit, and compiles officially registered job openings. It extracts listing data from Germany's most comprehensive public employment database, covering positions across all industries and qualification levels in the DACH region.

Features

  • Official German federal employment agency data
  • Berufenet occupation classification integration
  • Regional filtering by federal state (Bundesland) and city
  • Working time (Arbeitszeit) and contract type details
  • Proxy rotation with automatic fallback (residential → datacenter)
  • CAPTCHA detection and session rotation
  • Automatic retry on failures with exponential backoff
  • Deduplication of results by application URL
  • Dataset validation with auto-fix capability
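
Deduplication by application URL can be sketched as below. This is a minimal illustration, not the actor's own code: the helper names dedupeByApplyUrl and normaliseUrl are hypothetical, and the light URL normalisation (dropping the fragment and a trailing slash) is an assumption about what counts as "the same" listing.

```javascript
// Hypothetical helper: reduce a URL to a comparison key.
// Assumption: fragment and trailing slash do not distinguish listings.
function normaliseUrl(raw) {
  try {
    const url = new URL(raw);
    url.hash = '';
    return url.toString().replace(/\/$/, '');
  } catch {
    // Not a parseable URL: fall back to the trimmed raw value.
    return String(raw ?? '').trim();
  }
}

// Hypothetical helper: keep the first listing seen for each applyUrl.
function dedupeByApplyUrl(listings) {
  const seen = new Set();
  const unique = [];
  for (const listing of listings) {
    const key = normaliseUrl(listing.applyUrl);
    // Listings without a URL cannot be compared, so they are always kept.
    if (key !== '' && seen.has(key)) continue;
    if (key !== '') seen.add(key);
    unique.push(listing);
  }
  return unique;
}
```

Keeping the first occurrence preserves the portal's result ordering, which matters when results are sorted by date or relevance.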

Supported Inputs

Field                  Type     Default              Description
keyword                string   "software engineer"  Search terms for job discovery
location               string   "Berlin"             Geographic filter for results
country                string   "DE"                 Country code for proxy routing
maxItems               integer  50                   Upper limit on extracted listings
proxyEnabled           boolean  true                 Toggle proxy rotation on/off
sortBy                 string   "relevance"          Result ordering (relevance/date/salary)
jobType                string   ""                   Employment type filter
experienceLevel        string   ""                   Seniority level filter
datePosted             string   ""                   Recency filter (24h/3d/7d/14d/30d)
remoteOnly             boolean  false                Restrict to remote positions only
includeCompanyDetails  boolean  true                 Fetch extra company information
includeSalary          boolean  true                 Include compensation data
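
The following sketch shows one way the documented defaults could be merged with user input before a run. The function name normaliseInput and its validation rules are assumptions for illustration; only the field names, defaults, listed option values, and the 1000-item cap (see Limitations) come from this page.

```javascript
// Defaults as documented in the Supported Inputs table.
const DEFAULT_INPUT = {
  keyword: 'software engineer',
  location: 'Berlin',
  country: 'DE',
  maxItems: 50,
  proxyEnabled: true,
  sortBy: 'relevance',
  jobType: '',
  experienceLevel: '',
  datePosted: '',
  remoteOnly: false,
  includeCompanyDetails: true,
  includeSalary: true,
};

const SORT_OPTIONS = ['relevance', 'date', 'salary'];
const DATE_OPTIONS = ['', '24h', '3d', '7d', '14d', '30d'];
const MAX_ITEMS_CAP = 1000; // cap stated under Limitations

// Hypothetical helper: merge user input over the defaults and
// reject values outside the documented options.
function normaliseInput(userInput = {}) {
  const input = { ...DEFAULT_INPUT, ...userInput };
  if (!SORT_OPTIONS.includes(input.sortBy)) {
    throw new Error(`Unsupported sortBy: ${input.sortBy}`);
  }
  if (!DATE_OPTIONS.includes(input.datePosted)) {
    throw new Error(`Unsupported datePosted: ${input.datePosted}`);
  }
  if (!Number.isInteger(input.maxItems) || input.maxItems < 1) {
    throw new Error('maxItems must be a positive integer');
  }
  input.maxItems = Math.min(input.maxItems, MAX_ITEMS_CAP);
  return input;
}
```

For example, normaliseInput({ keyword: 'data analyst', maxItems: 5000 }) keeps the remaining defaults and clamps maxItems to 1000.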

Output Format

Each scraped listing produces a JSON object with these fields:

{
  "jobTitle": "Senior Software Engineer",
  "companyName": "Example Corp",
  "location": "Berlin",
  "salary": "$120,000 - $160,000",
  "jobType": "Full-time",
  "experienceLevel": "Senior",
  "postedDate": "2 days ago",
  "applyUrl": "https://www.arbeitsagentur.de/job/12345",
  "companyUrl": "https://www.arbeitsagentur.de/company/example",
  "description": "We are looking for a skilled engineer...",
  "requirements": ["JavaScript", "Node.js", "React"],
  "benefits": ["Health Insurance", "Remote Work"],
  "sourcePortal": "Bundesagentur für Arbeit",
  "country": "DE",
  "scrapedAt": "2025-01-15T10:30:00.000Z"
}
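
A record of this shape can be checked, and lightly repaired, along the lines below. This is only a sketch of what "validation with auto-fix" might look like: the function validateListing and its repair rules are assumptions, and the behaviour of the actor's own dataset-validator.js is not documented here.

```javascript
// Field names taken from the output object above.
const STRING_FIELDS = [
  'jobTitle', 'companyName', 'location', 'salary', 'jobType',
  'experienceLevel', 'postedDate', 'applyUrl', 'companyUrl',
  'description', 'sourcePortal', 'country', 'scrapedAt',
];
const ARRAY_FIELDS = ['requirements', 'benefits'];

// Hypothetical validator: returns a repaired copy plus a list of problems found.
function validateListing(listing) {
  const fixed = { ...listing };
  const problems = [];
  for (const field of STRING_FIELDS) {
    if (typeof fixed[field] === 'string') {
      fixed[field] = fixed[field].trim();
    } else {
      problems.push(`${field}: expected string`);
      // Assumed auto-fix: missing values become "", other types are stringified.
      fixed[field] = fixed[field] == null ? '' : String(fixed[field]);
    }
  }
  for (const field of ARRAY_FIELDS) {
    if (!Array.isArray(fixed[field])) {
      problems.push(`${field}: expected array`);
      // Assumed auto-fix: wrap a single value, default to an empty list.
      fixed[field] = fixed[field] == null ? [] : [String(fixed[field])];
    }
  }
  if (Number.isNaN(Date.parse(fixed.scrapedAt))) {
    problems.push('scrapedAt: not a parseable timestamp');
  }
  return { fixed, problems };
}
```

Returning the problems alongside the repaired record lets a caller decide whether to store the fixed version or drop the listing.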

Proxy Handling

The actor employs a multi-tier proxy strategy to maximize successful data extraction.

  1. Apify Residential Proxy (country-targeted) — First choice for Bundesagentur für Arbeit
  2. Apify Residential Proxy (any region) — Fallback if country proxy unavailable
  3. Apify Datacenter Proxy — Secondary fallback for cost efficiency
  4. Direct Connection — Last resort when all proxies fail

Proxies auto-rotate on each request. Blocked sessions are discarded and replaced automatically.
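
The tier order above can be expressed as a simple fallback loop. The sketch below is illustrative only: buildProxyTiers, withProxyFallback, and the tier labels are hypothetical names, and the caller-supplied attempt function stands in for whatever request logic the actor really uses.

```javascript
// Tier order as listed above. The `kind` labels are illustrative,
// not the actor's real configuration keys.
function buildProxyTiers(country) {
  return [
    { name: 'residential-country', kind: 'residential', country },
    { name: 'residential-any', kind: 'residential', country: null },
    { name: 'datacenter', kind: 'datacenter', country: null },
    { name: 'direct', kind: 'none', country: null },
  ];
}

// `attempt(tier)` is a caller-supplied async function that performs the
// request through the given tier and throws if the tier is unavailable or blocked.
async function withProxyFallback(tiers, attempt) {
  const errors = [];
  for (const tier of tiers) {
    try {
      const result = await attempt(tier);
      return { tier: tier.name, result };
    } catch (err) {
      errors.push(`${tier.name}: ${err.message}`);
    }
  }
  throw new Error(`All proxy tiers failed (${errors.join('; ')})`);
}
```

Collecting the per-tier error messages makes the final failure easier to diagnose than reporting only the last error.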

Retry Logic

Failed requests are retried up to 5 times with automatic session rotation.

  • Maximum 5 retries per request
  • Fresh browser session on each retry
  • Automatic proxy rotation between attempts
  • Blocked status codes (401, 403, 429) trigger session refresh
  • Request timeout of 120 seconds (configurable)
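
A minimal sketch of this retry policy, combined with the exponential backoff mentioned under Features, might look as follows. The retry count and blocked status codes come from the list above; the base delay, the delay cap, and the names requestWithRetry, backoffDelayMs, and onBlocked are assumptions.

```javascript
const BLOCKED_STATUS_CODES = new Set([401, 403, 429]);
const MAX_RETRIES = 5;

// Exponential backoff: base * 2^attempt, capped. Base and cap are assumed values.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `doRequest(attempt)` is a caller-supplied async function returning an object
// with a numeric `status`. `onBlocked` is where a session or proxy rotation
// would be triggered.
async function requestWithRetry(doRequest, { sleep = defaultSleep, onBlocked = () => {} } = {}) {
  let lastError;
  // One initial attempt plus up to MAX_RETRIES retries.
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    try {
      const response = await doRequest(attempt);
      if (!BLOCKED_STATUS_CODES.has(response.status)) return response;
      lastError = new Error(`Blocked with HTTP ${response.status}`);
      await onBlocked(response.status);
    } catch (err) {
      lastError = err;
    }
    if (attempt < MAX_RETRIES) await sleep(backoffDelayMs(attempt));
  }
  throw lastError;
}
```

Passing sleep as an option keeps the function easy to test without real waiting.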

Anti-block Handling

The actor incorporates multiple stealth techniques to minimize detection.

  • navigator.webdriver property masked
  • Human-like delays between page interactions (2–5 seconds)
  • Browser language and plugin fingerprints normalised
  • Session pool with automatic rotation on blocks
  • CAPTCHA detection with graceful retry
  • Rate limit detection (HTTP 429) with backoff
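
The randomised 2–5 second pause between page interactions could be produced by a helper like the one below. The names humanDelayMs and pause are hypothetical, and the uniform distribution is an assumption; the page does not say how the delay is sampled.

```javascript
// Pick a delay uniformly between minMs and maxMs (2-5 seconds by default).
// `random` is injectable so the function can be tested deterministically.
function humanDelayMs(random = Math.random, minMs = 2000, maxMs = 5000) {
  return Math.round(minMs + random() * (maxMs - minMs));
}

// Resolve after the given number of milliseconds.
function pause(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Inside an async function, a typical call would be await pause(humanDelayMs()) between two page interactions.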

Sample Input

{
  "keyword": "data analyst",
  "location": "Berlin",
  "maxItems": 25,
  "proxyEnabled": true,
  "sortBy": "date",
  "remoteOnly": false
}

Sample Output

{
  "jobTitle": "Data Analyst",
  "companyName": "TechCorp International",
  "location": "Berlin",
  "salary": "Competitive",
  "jobType": "Full-time",
  "experienceLevel": "Mid-level",
  "postedDate": "1 day ago",
  "applyUrl": "https://www.arbeitsagentur.de/job/example-123",
  "companyUrl": "",
  "description": "Seeking a detail-oriented data analyst to join our growing team...",
  "requirements": ["SQL", "Python", "Tableau"],
  "benefits": ["Health Insurance", "Flexible Hours"],
  "sourcePortal": "Bundesagentur für Arbeit",
  "country": "DE",
  "scrapedAt": "2025-01-15T14:22:00.000Z"
}

Usage

Local Development

# Install dependencies
npm install
# Set Apify token (required for proxy)
export APIFY_TOKEN=your_token_here
# Run the actor
npm start
# Validate scraped data
node dataset-validator.js

Apify Platform

# Login to Apify
apify login
# Push actor to platform
apify push
# Run from Apify Console or API

Deployment

  1. Ensure all dependencies are installed: npm install
  2. Authenticate with Apify: apify login
  3. Deploy the actor: apify push
  4. Configure input in the Apify Console
  5. Schedule runs or trigger via API / webhooks
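
Triggering a run via the API (step 5) can be sketched with the Apify API v2 "run actor" endpoint. The actor ID used in the usage note is a placeholder, since the real ID depends on the account the actor is pushed to, and the helper names buildRunRequest and startRun are hypothetical. The sketch assumes Node.js 18 or later for the global fetch.

```javascript
// Build the HTTP request for starting an actor run.
// `actorId` uses the "username~actor-name" form.
function buildRunRequest(actorId, token, input) {
  return {
    url: `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input), // the actor input, e.g. the Sample Input
    },
  };
}

// Start the run and return the run object from the API response.
async function startRun(actorId, token, input) {
  const { url, options } = buildRunRequest(actorId, token, input);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Apify API responded with HTTP ${response.status}`);
  }
  const payload = await response.json();
  return payload.data; // includes the run id and defaultDatasetId
}
```

For example, startRun('your-username~bundesagentur-scraper', process.env.APIFY_TOKEN, { keyword: 'data analyst', location: 'Berlin', maxItems: 25 }) would start a run with the sample input, assuming the actor was pushed under that placeholder name.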

Limitations

  • Results depend on the portal's current HTML structure; layout changes may require selector updates
  • Some job details (salary, benefits) may not be available for all listings
  • Rate limiting by the portal may reduce throughput during high-volume scrapes
  • CAPTCHA challenges may interrupt scraping on heavily protected pages
  • Bundesagentur für Arbeit may modify their anti-bot measures, requiring periodic updates
  • Maximum items per run is capped at 1000 to prevent excessive resource usage
  • Proxy costs apply when using Apify residential or datacenter proxies