Arbeitsagentur.de Scraper - German Federal Job Board

Pricing

from $8.00 / 1,000 job serp results

Extract jobs from Germany's official employment agency (Bundesagentur für Arbeit). Get job titles, companies, locations, salaries, descriptions & contact details with ML-powered captcha solving. Supports search filters, direct URLs & job status checks.

Developer

Alessandro Santamaria

Maintained by Community

Arbeitsagentur.de Job Scraper

Scraper for job listings from Arbeitsagentur.de (Bundesagentur für Arbeit), the German Federal Employment Agency's official job portal and one of the largest job boards in Germany, with over 1 million active listings.

Features

  • Pure HTTP Architecture: No browser needed -- fast API-based search + HTTP captcha flow (~80MB memory)
  • Multi-Query Support: Run multiple search keywords in a single run with automatic deduplication
  • ML-Powered Captcha Solving: Extracts contact details (email, phone, contact person) behind the captcha using a trained ONNX neural network (~95% first-attempt accuracy)
  • Rich Company Data: Company logo, website (Homepage), description, and "Alle Stellen" link via employer profile API
  • Full Job Descriptions: Complete text with structured salary, dates, and remote work flags
  • Filter Options: Search by keywords, location, federal state (Bundesland), and employment type
  • Direct URL Mode: Check status of specific job listings
  • Standardized Output: Compatible with the Santamaria ecosystem JobListing schema

How It Works

  1. Search Phase: Uses the public REST API (v4/jobs) with X-API-Key authentication to search and paginate
  2. Job Details API (optional): Fetches rich structured data from v4/jobdetails -- full description, salary range, dates, employment flags
  3. Employer Profile API: Fetches company description, Homepage link, and logo URL from ag-darstellung-service
  4. Captcha + Contact API: Requests captcha assignment, solves with ONNX model, submits solution to unlock hidden contact data (email, phone, name)

All steps use pure HTTP -- no browser or Playwright required.

Input

| Field | Type | Description | Default |
|---|---|---|---|
| searchQueries | string[] | Multiple search keywords (each runs as a separate search, deduplicated) | - |
| searchQuery | string | Single search keyword (legacy, backward compatible) | - |
| location | string | City or region (Wo) | - |
| bundesland | string | Federal state code | All states |
| employmentType | string | Type of employment | All types |
| maxResultsPerQuery | integer | Maximum results per search keyword | 100 |
| maxResults | integer | Total cap across all queries (0 = unlimited) | 0 |
| includeJobDetails | boolean | Extract contact details, full description, company data (~3-5s/job) | false |
| directUrls | array | Specific job URLs to scrape | - |
| proxyConfiguration | object | Proxy settings (datacenter works fine) | Apify proxy |
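
How the query fields and the two caps interact can be sketched in Python. This is an illustration of the semantics described in the table, not the Actor's own code: `resolve_queries`, `collect`, and the `fetch` callback are hypothetical names, and it assumes that duplicates are detected by job `id` and do not count toward the per-query limit.

```python
from typing import Callable, Iterable


def resolve_queries(actor_input: dict) -> list:
    """Merge searchQueries with the legacy searchQuery, preserving order."""
    queries = list(actor_input.get("searchQueries") or [])
    legacy = actor_input.get("searchQuery")
    if legacy and legacy not in queries:
        queries.append(legacy)
    return queries


def collect(actor_input: dict, fetch: Callable[[str], Iterable[dict]]) -> list:
    """Run every query, drop duplicate job IDs, and apply both caps."""
    per_query = actor_input.get("maxResultsPerQuery", 100)
    total_cap = actor_input.get("maxResults", 0)  # 0 means unlimited
    seen = set()
    results = []
    for query in resolve_queries(actor_input):
        taken = 0
        for job in fetch(query):  # fetch is a stand-in for the search step
            if taken >= per_query:
                break
            if total_cap and len(results) >= total_cap:
                return results
            if job["id"] in seen:
                continue  # assumption: duplicates are skipped, not counted
            seen.add(job["id"])
            results.append({**job, "search_query": query})
            taken += 1
    return results
```

Under these assumptions, a job found by two keywords appears once and keeps the `search_query` of the first keyword that found it.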

Bundesland Codes

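| Code | State |
|---|---|
| BW | Baden-Württemberg |
| BY | Bayern (Bavaria) |
| BE | Berlin |
| BB | Brandenburg |
| HB | Bremen |
| HH | Hamburg |
| HE | Hessen |
| MV | Mecklenburg-Vorpommern |
| NI | Niedersachsen |
| NW | Nordrhein-Westfalen |
| RP | Rheinland-Pfalz |
| SL | Saarland |
| SN | Sachsen |
| ST | Sachsen-Anhalt |
| SH | Schleswig-Holstein |
| TH | Thüringen |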

Employment Types

| Code | Description |
|---|---|
| VOLLZEIT | Full-time |
| TEILZEIT | Part-time |
| MINIJOB | Mini job |
| AUSBILDUNG | Apprenticeship |
| PRAKTIKUM | Internship |
| FREIBERUFLICH | Freelance |
| HEIMARBEIT | Remote work |
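
Because both filters accept only the codes listed above, it can help to check an input before starting a run. The following Python sketch shows one way to do that on the client side; `validate_filters` is a hypothetical helper and says nothing about how the Actor itself handles unknown codes.

```python
# Codes copied from the two tables above.
BUNDESLAND_CODES = {
    "BW", "BY", "BE", "BB", "HB", "HH", "HE", "MV",
    "NI", "NW", "RP", "SL", "SN", "ST", "SH", "TH",
}
EMPLOYMENT_TYPES = {
    "VOLLZEIT", "TEILZEIT", "MINIJOB", "AUSBILDUNG",
    "PRAKTIKUM", "FREIBERUFLICH", "HEIMARBEIT",
}


def validate_filters(actor_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    bundesland = actor_input.get("bundesland")
    if bundesland is not None and bundesland not in BUNDESLAND_CODES:
        errors.append(f"unknown bundesland code: {bundesland!r}")
    employment = actor_input.get("employmentType")
    if employment is not None and employment not in EMPLOYMENT_TYPES:
        errors.append(f"unknown employmentType: {employment!r}")
    return errors
```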

Output

Each job listing includes:

{
  "id": "12016-10004030577-S",
  "title": "Pflegeassistent /in",
  "company": "PerZukunft Arbeitsvermittlung GmbH&Co.KG",
  "location": "10179, Berlin",
  "country": "DE",
  "canton": null,
  "salary_min": 12.82,
  "salary_max": 12.82,
  "salary_currency": "EUR",
  "salary_period": "hourly",
  "salary_text": "12,82 EUR/Std.",
  "employment_type": "full-time",
  "workload_min": null,
  "workload_max": null,
  "remote_option": null,
  "description_snippet": "Fur [mehrere] Standorte des Grossraums Berlin...",
  "description_full": "Full job description with markdown formatting...",
  "requirements": [],
  "company_benefits": [],
  "posted_at": "2026-02-26T00:00:00.000Z",
  "modified_at": "2026-02-26T14:51:45.062Z",
  "expires_at": null,
  "source_url": "https://www.arbeitsagentur.de/jobsuche/jobdetail/12016-10004030577-S",
  "source_platform": "arbeitsagentur.de",
  "contact_salutation": "Frau",
  "contact_firstname": "Delia",
  "contact_lastname": "Schneider",
  "contact_email": "wedding.pflege@perzukunft.de",
  "contact_phone": "+49302200870",
  "apply_url": "https://www.perzukunft.de/job/pflegeassistent-in-1201610004030577",
  "apply_email": "wedding.pflege@perzukunft.de",
  "company_url": null,
  "company_website": "http://www.perzukunft.de/",
  "company_logo_url": "https://rest.arbeitsagentur.de/.../arbeitgeberlogo/QCGq...",
  "company_description": "Perzukunft - Unternehmensprofil...",
  "company_jobs_url": "https://www.arbeitsagentur.de/jobsuche/suche?angebotsart=1&arbeitgeberKundennummerHash=...",
  "search_query": "Pflege",
  "scraped_at": "2026-03-04T15:36:48.707Z"
}
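
When consuming these records, keep in mind that many fields may be null, for example the contact fields when a listing has no contact details. A small Python sketch of defensive handling follows; `contact_line` is a hypothetical helper that uses only field names from the sample above.

```python
from typing import Optional


def contact_line(job: dict) -> Optional[str]:
    """Return 'Salutation First Last <email>', or None if no email exists."""
    email = job.get("contact_email")
    if not email:
        return None
    name_parts = [
        job.get("contact_salutation"),
        job.get("contact_firstname"),
        job.get("contact_lastname"),
    ]
    # Skip any name part that is missing or null in the record.
    name = " ".join(part for part in name_parts if part)
    return f"{name} <{email}>" if name else email
```

Applied to the sample record, this would produce `Frau Delia Schneider <wedding.pflege@perzukunft.de>`.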

Field Reference

FieldSourceRequires includeJobDetails
company_websiteEmployer profile API (Homepage link)Yes
company_logo_urlConstructed from arbeitgeberKundennummerHashYes
company_descriptionEmployer profile APIYes
company_jobs_urlConstructed from employer hashYes
description_fullv4 jobdetails APIYes
modified_atv4 jobdetails API (aenderungsdatum)Yes
salary_min/max/periodv4 jobdetails APIYes
contact_*Captcha-protected bewerbung APIYes
remote_optionv4 jobdetails APIYes
search_queryInput search keyword that found this jobNo

Usage

{
  "searchQueries": ["Pflege", "Krankenschwester", "Altenpfleger"],
  "location": "Berlin",
  "maxResultsPerQuery": 50,
  "maxResults": 0,
  "includeJobDetails": false
}

Quick Search (without contact details)

{
  "searchQueries": ["Elektriker"],
  "location": "München",
  "maxResultsPerQuery": 50,
  "includeJobDetails": false
}

Full Extraction (with contact details)

{
  "searchQueries": ["Pflege"],
  "bundesland": "BY",
  "maxResultsPerQuery": 20,
  "includeJobDetails": true
}

Legacy Single Query (backward compatible)

{
  "searchQuery": "Softwareentwickler",
  "location": "Berlin",
  "maxResults": 100,
  "includeJobDetails": true
}

Via API

curl -X POST "https://api.apify.com/v2/acts/santamaria~arbeitsagentur-de-scraper/runs" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchQueries": ["Softwareentwickler", "Programmierer"],
    "location": "Berlin",
    "employmentType": "VOLLZEIT",
    "maxResultsPerQuery": 100,
    "includeJobDetails": true
  }'
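
The same request can be built in Python with the standard library only. This sketch mirrors the curl call above; `build_run_request` is a hypothetical helper name, and the network call is left commented out because it needs a real API token.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
ACTOR_RUNS_URL = "https://api.apify.com/v2/acts/santamaria~arbeitsagentur-de-scraper/runs"


def build_run_request(api_token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an Actor run."""
    return urllib.request.Request(
        ACTOR_RUNS_URL,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_API_TOKEN",
    {
        "searchQueries": ["Softwareentwickler", "Programmierer"],
        "location": "Berlin",
        "employmentType": "VOLLZEIT",
        "maxResultsPerQuery": 100,
        "includeJobDetails": True,
    },
)

# To actually start the run, send the request:
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)
```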

Performance & Cost

| Mode | Memory | Speed | CU (50 jobs) |
|---|---|---|---|
| Search only (includeJobDetails: false) | ~40 MB | ~25 jobs/sec | ~0.005 |
| With details (includeJobDetails: true) | ~120 MB | ~1 job per 3-5 sec | ~0.05 |

Tip: Use includeJobDetails: false (the default) for high-volume scraping, and enable it only when you need contact details and company data.
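
For rough planning, the figures in the table can be extrapolated to other run sizes. The Python sketch below assumes that compute units and run time scale linearly with the number of jobs, which the table does not guarantee; `estimate_run` is a hypothetical helper.

```python
def estimate_run(jobs: int, include_details: bool) -> dict:
    """Extrapolate CU and duration from the per-50-jobs figures above."""
    cu_per_50 = 0.05 if include_details else 0.005
    compute_units = jobs / 50 * cu_per_50
    if include_details:
        # ~1 job per 3-5 seconds
        seconds_min, seconds_max = jobs * 3.0, jobs * 5.0
    else:
        # ~25 jobs per second
        seconds_min = seconds_max = jobs / 25.0
    return {
        "compute_units": compute_units,
        "seconds_min": seconds_min,
        "seconds_max": seconds_max,
    }
```

Under that linear assumption, 1,000 jobs with details would take roughly 50 to 85 minutes and about 1 CU, while a search-only run of the same size would finish in well under a minute.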

Technical Details

Captcha Solver

  • Model: CNN + Bidirectional LSTM with CTC loss
  • Input: 250x50 grayscale image
  • Accuracy: ~95% first-attempt solve rate
  • Character set: 29 characters drawn from 0-9 and a-z
  • Format: ONNX for fast inference (onnxruntime-node)
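
A model trained with CTC loss emits one score vector per time step, and these vectors still have to be decoded into text. The sketch below shows standard greedy CTC decoding in plain Python. It is not the Actor's decoder: the blank index (0) and the `alphabet` argument are assumptions, and the real 29-character set is not listed here.

```python
from typing import Sequence

BLANK = 0  # assumption: class 0 is the CTC blank symbol


def ctc_greedy_decode(scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Pick the best class per step, collapse repeats, then drop blanks."""
    decoded = []
    previous = BLANK
    for step in scores:
        best = max(range(len(step)), key=step.__getitem__)
        # Emit a character only when it is not blank and differs from the
        # previous step, so "aa" collapses to "a" unless a blank separates them.
        if best != BLANK and best != previous:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)
```

Here class `i` (for `i >= 1`) maps to `alphabet[i - 1]`, so a two-letter alphabet needs three scores per step.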

Key Implementation Notes

  • Job IDs must be base64url-encoded in API URLs (raw IDs return 404)
  • Captcha flow uses plain fetch -- gotScraping mangles AAS custom headers causing 403
  • Assignment body requires formId: 'ARBEITGEBERDATEN' and formProtectionLevel: 'JB_JOBSUCHE_20'
  • Bewerbung headers: aas-info: sessionId=..., challengeId=... and aas-answer: <solution>
  • Phone numbers from API are structured objects {laendervorwahl, vorwahl, rufnummer}, not flat strings
  • Docker image: node:20-slim (Debian-based, required for onnxruntime-node glibc dependency)
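
Several of these notes translate into small helpers. The Python sketch below illustrates three of them; all function names are hypothetical. It assumes that base64 padding is kept and that the header parts are joined with ", " exactly as written above. The phone handling additionally assumes that a leading zero in the area code should be dropped once a country code is present.

```python
import base64
import re

# Values quoted from the implementation notes above.
ASSIGNMENT_BODY = {"formId": "ARBEITGEBERDATEN", "formProtectionLevel": "JB_JOBSUCHE_20"}


def encode_job_id(job_id: str) -> str:
    """base64url-encode a raw job ID for use in an API URL."""
    # Assumption: padding is kept; strip "=" if the API expects it unpadded.
    return base64.urlsafe_b64encode(job_id.encode("utf-8")).decode("ascii")


def bewerbung_headers(session_id: str, challenge_id: str, solution: str) -> dict:
    """Build the two AAS headers sent along with a captcha solution."""
    return {
        "aas-info": f"sessionId={session_id}, challengeId={challenge_id}",
        "aas-answer": solution,
    }


def _digits(value) -> str:
    """Keep only the digits of a value; None becomes an empty string."""
    return re.sub(r"\D", "", str(value or ""))


def flatten_phone(phone: dict) -> str:
    """Turn {laendervorwahl, vorwahl, rufnummer} into one flat string."""
    country = _digits(phone.get("laendervorwahl"))
    area = _digits(phone.get("vorwahl"))
    number = _digits(phone.get("rufnummer"))
    if not number:
        return ""
    if not country:
        return area + number  # national format, area code left untouched
    return f"+{country}{area.lstrip('0')}{number}"
```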

Limitations

  1. Rate Limiting: 2s delay between requests to avoid blocks
  2. Salary Data: German job listings rarely include salary information
  3. Job Expiration: API doesn't provide expiration dates
  4. Contact Availability: Not all listings have contact details behind captcha
  5. Company Website: Only available when the employer has configured a Homepage link in their profile
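
The fixed delay from the first limitation can be pictured as a simple throttle. This is a minimal Python sketch and not the Actor's implementation: the `Throttle` class is hypothetical, and its clock and sleep functions are injectable only so that the behavior can be tested without waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float = 2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self) -> float:
        """Block until the interval has elapsed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` before each HTTP request keeps requests at least `min_interval` seconds apart, and the first call never sleeps.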

Version History

  • 3.1.0 (2026-03-17):
    • Multi-query support: searchQueries array with per-query limits and deduplication
    • Added maxResultsPerQuery (default 100), maxResults 0=unlimited
    • Added search_query output field to track which query found each job
    • Backward compatible: searchQuery (singular) still works
    • Memory limit raised to 512MB
  • 3.0.0 (2026-03-04):
    • Full HTTP-only migration -- removed Playwright entirely
    • Added employer profile API for company website, description, logo
    • Added company_website, company_jobs_url, company_description, modified_at fields
    • Logo displayed as image in Apify results table
    • Memory reduced from ~530MB to ~80MB, CU from 0.148 to ~0.005
    • Docker: node:20-slim (was apify/actor-node-playwright-chrome:20)
  • 2.0.0 (2026-01-26):
    • Added Playwright browser automation for detail extraction
    • Integrated ML-powered captcha solver (ONNX)
    • Hybrid architecture: API search + browser details
  • 1.0.0 (2024-12-22): Initial implementation with v4 API

Support

For issues or feature requests: Actor Issues


Part of the Santamaria Job Scrapers Suite - Professional-grade job data for the DACH region.

Built with Apify | Arbeitsagentur.de