AutoScout24 Clean Scraper with 75 typed fields, 8 EU domains
Pricing
from $4.00 / 1,000 vehicles (standard)
75 typed fields per AutoScout24 listing across 8 EU domains (DE/FR/IT/ES/NL/AT/BE/EN). VIN, GPS, VAT, accident history, TÜV, structured equipment. Pay-per-vehicle $0.004 — failed records never charge. MCP-ready for AI agents. Idempotent UUID dedup, schema versioned.
Rating: 5.0 (1)
Developer: Tars Technology
Maintained by: Community
Actor stats: 0 bookmarked · 2 total users · 2 monthly active users · last modified 4 days ago
AutoScout24 Clean Scraper — 82 typed fields, 8 EU domains
🎁 First 10 records virtually free (~$0.0001 total). ~$0.36 for 100, ~$4 for 1,000. Pay only per delivered record above the trial.
Pull AutoScout24 used-car listings as clean structured JSON across 8 EU domains (DE/FR/IT/ES/NL/AT/BE/EN) — 82 ready-to-query fields per vehicle. VIN, GPS coordinates, VAT recoverability, accident history, TÜV inspection date, dealer hero images and opening hours — fields competing scrapers don't deliver. Cross-locale enums normalized to EN-stable values with raw locale preserved.
Built for data engineers shipping dealer pipelines. Also powers market analytics dashboards and AI agents (MCP-native).
- ✅ Production-hardened: 100+ runs, 479 tests, 5 review-loop cycles completed 2026-05-20.
- ✅ Idempotent: stable vehicleId UUID dedup survives Apify host migrations.
- ✅ Soft-block aware: Cloudflare / Captcha / WAF challenges trigger automatic session rotation.
- ✅ Cost-correct: failed records never charge; transient billing outages are auto-recovered (4-attempt exponential backoff).
Why this scraper
🎯 82 ready-to-query fields per listing. Integers are integers (mileageKm: 58500), dates are ISO 8601 (firstRegistration: "2019-06-01"), equipment is structured arrays — never a "58,500 km" string to parse.
📍 The columns B2B teams actually filter on. VIN (when published by dealer), GPS coordinates (100% populated), VAT recoverability, vehicle history (accidents, previous owners, full service history, next TÜV inspection). Standard in this actor, absent everywhere else.
🛡️ Pay-per-event billing. Idempotent + actively maintained. You only pay for successful records pushed to your dataset — not for proxy retries, blocked fetches, or partial run failures upstream of push. Note: records pushed before a mid-run crash ARE billed (standard Apify PPE semantics). Stable vehicleId UUID dedup, schema versioning via _meta.parserVersion, < 24h support response.
Why this Actor beats $4 alternatives
Most AS24 scrapers on the Store ship locale-leaked strings you have to clean up downstream. Side-by-side, what you actually get per record:
| What you get | Typical AS24 scraper | This actor |
|---|---|---|
| VIN | absent | identifier.vin: "WAUZZZF49KN012345" (when published by dealer) |
| Mileage | "58,500 km" string | 58500 integer |
| Price | "€ 119,900" string | {amount: 119900, currency: "EUR", vatDeductible: false, vatRate: null, netAmount: null} |
| Transmission | "Schaltgetriebe" (DE locale) | "Manual" + transmissionOriginal: "Schaltgetriebe" |
| GPS | absent | {latitude: 50.96258, longitude: 7.18602} |
| Equipment | CSV string "ABS,ESP,Klimaautomatik,..." | Structured arrays per category (4 categories, typed entries) |
| Vehicle history | absent | {hadAccident: false, previousOwners: 3, fullServiceHistory: true, hsnTsn: "0583/AJZ"} |
| Dealer enrichment (v1.1) | name + phone string | seller.{heroImage, whatsappNumber, homepageUrl, contactPerson:{name,position,languages[]}, openingHoursByDay:{monday:{open:"09:00",close:"18:00"}, sunday:{closed:true}, …}, googleRatings} — extracted from the same listing payload, zero extra fetches |
| Pipeline-ready | parse strings manually | direct JOIN on integers/ISO dates/enums |
| Cross-domain consistent | locale-leaked | EN enums + original preserved in *Original sister fields |
| Migration-resilient | trial window re-opens / counters reset | _record_count persisted to KVS + Event.MIGRATING handler → cap math survives Apify host migrations |
| Soft-block aware | silent partial dataset on Cloudflare challenge | SoftBlockSessionError detection (CF / Captcha / WAF sniff) → automatic session rotation |
| Numeric drift tolerance | parser crashes on AS24 schema drift | Locale-aware weight regex (DE/EN/IT/ES/NL/FR) + safe coercion across 7 engine fields (CO₂, range, consumption, power) |
| Cost-correct PPE | charged for records lost to race conditions | Abort race fix + charge retry (1s/3s/9s exp backoff) → no pushed-not-charged or charged-not-pushed records |
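The `openingHoursByDay` shape in the dealer-enrichment row can be queried directly, with no string parsing. A minimal sketch, assuming weekday keys are lowercase English names and times are zero-padded "HH:MM" strings as in that row; `is_open` is a hypothetical helper, not part of the actor:

```python
from datetime import datetime
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def is_open(opening_hours_by_day: Optional[dict], when: datetime) -> bool:
    """Return True if the dealer is open at `when` (dealer-local time assumed)."""
    if not opening_hours_by_day:
        return False  # the field is nullable; missing hours count as closed here
    day = opening_hours_by_day.get(WEEKDAYS[when.weekday()])
    if not day or day.get("closed"):
        return False
    now = when.strftime("%H:%M")
    # Zero-padded HH:MM strings order correctly under plain string comparison.
    return day["open"] <= now < day["close"]

# Shape copied from the comparison table above.
hours = {"monday": {"open": "09:00", "close": "18:00"}, "sunday": {"closed": True}}
print(is_open(hours, datetime(2026, 5, 18, 10, 0)))  # a Monday morning
```

Treating a missing weekday as closed is a design choice for this sketch; a pipeline may prefer to return `None` for "unknown".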
Try it in 60 seconds
1️⃣ Apify Console (no code)
Click Try for free above. Paste any AS24 search URL. Hit Save & Start. First records flow in within ~10 seconds.
Example URL to paste:
https://www.autoscout24.com/lst/porsche/991?sort=age&desc=1
2️⃣ AI Agent (MCP-native)
Plug directly into Claude Desktop, Cursor, or any MCP client. The agent calls the scraper with natural language — no glue code:
https://mcp.apify.com?tools=hardy_ice-owner/autoscout24-clean-scraper-py
3️⃣ REST API

    curl "https://api.apify.com/v2/acts/hardy_ice-owner~autoscout24-clean-scraper-py/run-sync-get-dataset-items?token=YOUR_TOKEN" \
      -X POST -H "Content-Type: application/json" \
      -d '{"urls": ["https://www.autoscout24.de/lst/audi/a6"], "maxRecords": 50}'

4️⃣ Python SDK

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_TOKEN")

    # Start the actor and wait for the run to finish.
    run = client.actor("hardy_ice-owner/autoscout24-clean-scraper-py").call(
        run_input={"urls": ["https://www.autoscout24.de/lst/audi/a6"], "maxRecords": 50}
    )

    # Stream the records from the run's default dataset.
    for vehicle in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(vehicle["vehicle"]["make"], vehicle["vehicle"]["model"], vehicle["price"]["amount"])

Real output sample
One record, the differentiators visible at a glance:

    {
      "id": "d6a3056e-267d-45e6-a160-4777491ddcde",
      "identifier": { "vin": "WP0ZZZ99ZKS123456", "offerReference": "P-2024-1287" },
      "vehicle": {
        "make": "Porsche",
        "model": "991",
        "modelVersion": "911 Carrera 4 GTS",
        "engine": { "powerKw": 331, "powerHp": 450, "cylinders": 6, "gears": 7 }
      },
      "price": { "amount": 119900, "currency": "EUR", "vatDeductible": false },
      "mileageKm": 58500,
      "firstRegistration": "2019-06-01",
      "nextSafetyInspection": "2026-06",
      "location": { "latitude": 50.96258, "longitude": 7.18602, "city": "Bergisch Gladbach" },
      "history": { "previousOwners": 3, "fullServiceHistory": true, "hsnTsn": "0583/AJZ" },
      "fuel": { "co2EfficiencyClass": "Green", "emissionStandardLabel": "EURO6" },
      "_meta": { "parserVersion": "ndata-v2", "fetchMethod": "http" }
    }

…plus 60+ more fields (full schema: equipment arrays per category, HD/mid images, dealer info, ratings, description HTML+text, etc.).
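Because the nested fields are already typed, flattening a record into a database row is a matter of lookups. A rough sketch, using only field names visible in the sample above; `to_row` and its column names are hypothetical and chosen for illustration:

```python
from datetime import date

def to_row(record: dict) -> dict:
    """Flatten a few nested fields into a column-friendly row."""
    vehicle = record.get("vehicle") or {}
    engine = vehicle.get("engine") or {}
    price = record.get("price") or {}
    location = record.get("location") or {}
    first_reg = record.get("firstRegistration")
    return {
        "id": record["id"],
        "vin": (record.get("identifier") or {}).get("vin"),  # only when published
        "make": vehicle.get("make"),
        "model": vehicle.get("model"),
        "power_kw": engine.get("powerKw"),
        "price_amount": price.get("amount"),
        "currency": price.get("currency"),
        "mileage_km": record.get("mileageKm"),
        # ISO 8601 date string -> datetime.date, no locale handling needed
        "first_registration": date.fromisoformat(first_reg) if first_reg else None,
        "lat": location.get("latitude"),
        "lon": location.get("longitude"),
    }

# Subset of the sample record shown above.
sample = {
    "id": "d6a3056e-267d-45e6-a160-4777491ddcde",
    "identifier": {"vin": "WP0ZZZ99ZKS123456"},
    "vehicle": {"make": "Porsche", "model": "991", "engine": {"powerKw": 331}},
    "price": {"amount": 119900, "currency": "EUR"},
    "mileageKm": 58500,
    "firstRegistration": "2019-06-01",
    "location": {"latitude": 50.96258, "longitude": 7.18602},
}
row = to_row(sample)
```

Every nested lookup falls back to `None`, since fields such as the VIN are only present when the dealer publishes them.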
Pricing — try (virtually) free, pay only above the trial
🎁 First 10 records every run cost ~$0.0001 total (1/100 cent — Apify Store min event price). No card needed for a real evaluation. Convert when ready.
$0.004 per delivered record above the trial. No subscription. No minimum. Failed records never charge.
| Run size | Trial (1-10) | Paid (after 10) | Total |
|---|---|---|---|
| 10 vehicles | 10 × $0.00001 | — | $0.0001 |
| 100 vehicles | 10 × $0.00001 | 90 × $0.004 | $0.36 |
| 1,000 vehicles | 10 × $0.00001 | 990 × $0.004 | $3.96 |
| 5,000 vehicles | 10 × $0.00001 | 4,990 × $0.004 | $19.96 |
| 50,000 vehicles | 10 × $0.00001 | 49,990 × $0.004 | $199.96 |
Volume of 50k+ records/month: custom pricing available via Apify Console messaging on the actor page.
Proxy data metered separately by Apify (free tier covers small runs; large jobs may need RESIDENTIAL proxy).
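The arithmetic behind the table can be reproduced in a few lines. A sketch under the prices quoted above (10 trial records at $0.00001, then $0.004 each); `estimate_cost` is a hypothetical helper and excludes proxy traffic, which Apify meters separately:

```python
TRIAL_RECORDS = 10        # first records of every run
TRIAL_PRICE = 0.00001     # USD per trial record
STANDARD_PRICE = 0.004    # USD per delivered record after the trial

def estimate_cost(records: int, paid_price: float = STANDARD_PRICE) -> float:
    """Estimated per-run charge in USD for `records` delivered records."""
    trial = min(records, TRIAL_RECORDS)
    paid = max(records - TRIAL_RECORDS, 0)
    return round(trial * TRIAL_PRICE + paid * paid_price, 4)

for size in (10, 100, 1_000, 5_000, 50_000):
    print(f"{size:>6} vehicles: ${estimate_cost(size):.4f}")
```

The table rounds totals to the nearest cent, so the $0.0001 trial charge disappears from every row except the first.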
Premium options (priced higher)
Two opt-in flags trigger premium per-record pricing:
| Option | Price/record | When to use |
|---|---|---|
| Default (HTTP) | $0.004 | Standard scrape — works on 99% of AS24 pages |
| forcePlaywright: true | $0.012 (3×) | Browser-rendered fallback: enable only if HTTP gets blocked at high volume |
| includeRawData: true | $0.005 (+25%) | Adds raw __NEXT_DATA__ JSON blob (5-10× output size), for debugging, custom field extraction, or downstream raw shipping |
The two flags do not add up: with both forcePlaywright and includeRawData set, the price is $0.012 per record (the Playwright rate dominates).
Premium options ALSO honor the 10-record virtual trial ($0.00001 each).
The Console UI labels both options with ⚠️ PREMIUM so you see the cost impact before running.
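The rate selection described above amounts to "highest applicable tier wins". A small sketch of that rule, with prices taken from the table; `price_per_record` is a hypothetical name and the actor's internal logic may differ:

```python
def price_per_record(force_playwright: bool = False,
                     include_raw_data: bool = False) -> float:
    """Per-record price in USD after the 10-record trial."""
    if force_playwright:
        return 0.012  # Playwright rate dominates, even with includeRawData
    if include_raw_data:
        return 0.005
    return 0.004

print(price_per_record())                        # default HTTP scrape
print(price_per_record(include_raw_data=True))   # raw payload attached
print(price_per_record(True, True))              # both flags set
```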
Standard vs Full — which one?
Both tiers ship the same 82 typed fields. Full appends the raw __NEXT_DATA__ payload under _raw.* for everything we don't normalize.
| | Standard ($0.004) | Full ($0.005) |
|---|---|---|
| 82 typed fields | ✓ all 82 | ✓ all 82 |
| Raw __NEXT_DATA__ | no | ✓ (~150 extra fields) |
| Record size | ~5 KB | ~38 KB (6.8×) |
| Bandwidth (1,000 records) | 5 MB | 38 MB |
Use Standard if you need:
- Dealer CRM ingestion (clean fields → DB columns)
- Analytics dashboards (price/km curves, regional supply)
- Cross-source dedup via UUID
- AI agent queries via MCP
- 95% of B2B pipelines
Use Full if you need:
- _raw.dpvStatistics: page-view & favorite counts (lead scoring)
- _raw.financingAndInsurance: AS24-native leasing/finance offers per record
- _raw.prices.public / _raw.prices.dealer: split B2C vs B2B price evaluation
- _raw.adTier / appliedAdTier: dealer ad spend visibility
- _raw.adTargetingString: AS24 segmentation cohort
- Custom field extraction (you need a field we don't parse)
- Forward raw payload to your data lake
- Debug parser regressions
Battle-tested — 5 hardening cycles, 479 tests, 0 regressions
Most AS24 scrapers on the Store are write-once / hope-it-still-works. This actor went through 5 successive review-loop cycles during the week of 2026-05-19/20 (loop 1 → loop 5), each one fixing a concrete production failure mode before publish. +105 new tests, +28% coverage, zero regression across the cycle. Numbers are the headline; the why-it-matters is below.
| # | Loop | What was fixed | Why it matters to your pipeline |
|---|---|---|---|
| 1 | Performance | Parallel HEAD preflight (50s → 5s), vehicle cache (−7 redundant lookups/record), regex HTML strip (10× faster than BeautifulSoup), KVS handle reuse | +18–25% throughput on large runs; cap-math fires sooner so you stop closer to maxRecords |
| 2a | Cost-correctness (PPE) | Abort race fix (no more pushed-not-charged), record_pushed lifted out of finally (no counter inflation on retry), defensive chargeLimit detection, charge retry with 1s/3s/9s exponential backoff | Recaptures ~$0.30–0.40 per 1k records previously lost to billing transients. No double-charges, no silent missed-charges. |
| 2b | Migration safety | _record_count persisted to KVS + Event.MIGRATING handler registered | Trial window + cap math now survive Apify host migrations. Pre-fix: a migration mid-run could silently re-open the $0 trial window on the new host, breaking pricing invariants. |
| 3 | Parser robustness (ndata-v2.1) | Media crash guard (None URL filter), CO₂ / electric-range / consumption coercion via _safe_int / _safe_float, evaluationCategory null union, weight regex locale-aware (DE thousands-dot, FR space separator, EN/IT/ES/NL plain int), numeric type-drift protection across 7 engine fields | Drift-tolerant: when AS24 ships a stray "45.5" where they used to ship 45, the record validates instead of crashing. Worth its weight in any long-running schedule. |
| 4 | Fetch reliability (v1.1.3) | SoftBlockSessionError subclass + Cloudflare / Captcha / WAF challenge sniff, detailUrlsOnly preflight, HEAD UA spoofed to Chrome 126, deterministic 404/410/451-only filter (no false-positive on 403 / 429 / 5xx), startswith prefix bug fix, dead-config cleanup | Soft-blocked requests rotate the session instead of polluting the dataset with empty records. Preflight only filters genuinely-gone URLs, not transient errors. |
| 5 | Sign-off | 479 tests deterministic, ruff/mypy clean, 5 cross-loop integration checks PASS | Ship signal — no flake, no skip, no # type: ignore churn |
Test count progression across the cycle: 374 → 421 → 432 → 465 → 479 + 1 xfail (intentional, schema-evolution gate). All 5 commits land on main; Apify webhook rebuilds the published actor on push.
What this means for a buyer: when AS24 changes their HTML, ships a new locale string, or returns a Cloudflare interstitial mid-run, you don't get paged at 3am — you get a clean record or a clearly-tagged retry. That's the moat.
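The charge retry from loop 2a (one initial attempt, then retries after 1s, 3s and 9s) follows a common pattern. A minimal sketch of that pattern, not the actor's actual code: `TransientBillingError` and `charge_with_retry` are hypothetical names, and the sleep function is injectable so the demo runs instantly.

```python
import time
from typing import Callable, Sequence

class TransientBillingError(Exception):
    """Hypothetical marker for a retryable billing failure."""

def charge_with_retry(charge: Callable[[], None],
                      delays: Sequence[float] = (1, 3, 9),
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `charge` once, then retry after each delay: 4 attempts in total."""
    for attempt in range(len(delays) + 1):
        try:
            charge()
            return True
        except TransientBillingError:
            if attempt == len(delays):
                return False  # retries exhausted; the caller decides what to do
            sleep(delays[attempt])
    return False

# Demo: a fake charge that fails twice, with the waits recorded instead of slept.
waits = []
attempts = []

def flaky_charge() -> None:
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientBillingError("billing API unavailable")

ok = charge_with_retry(flaky_charge, sleep=waits.append)
```

Only the marked transient error is retried; any other exception propagates immediately, so a genuine bug is not masked by the backoff.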
Use cases
Dealer inventory monitoring — $6/day
Track 50 competitor dealers hourly. ~1,500 vehicles/day. Apify Schedules + webhook → your CRM. Cheaper than 1h of analyst time.
Market analytics — $20/week
5,000 BMW 3-Series scraped weekly. Model price erosion, mileage curves, regional supply. CSV/Excel export native.
AI agents (MCP-native) — pay per query
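Connect via mcp.apify.com to Claude, Cursor, or any MCP client. The agent asks "2023 Tesla Model 3 under €40k near Berlin", receives structured JSON, and answers in natural language. ~$0.04 per 10-vehicle query at the standard $0.004 rate; records that fall inside a run's 10-record trial window bill at $0.00001 each instead.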
Operational notes
- 8 EU domains supported: .de, .com, .fr, .it, .nl, .at, .be, .es. Cross-domain output normalized via EN-stable enums (bodyColor, vehicleType, emissionStandardLabel).
- Cross-domain enums normalized: transmission, drive train, body type, upholstery, color, paint type, original market, all mapped to EN-stable values from DE/FR/IT/NL/AT/BE/ES locales. Raw locale preserved in *Original sister fields.
- AutoScout24 caps any search at ~400 results (20 pages × 20 listings). For bigger queries, split by region/year/price-range and pass multiple urls. The actor dedups across them.
- Performance: 1,000 records in ~4 min; 5k in ~25 min; 50k in ~4 h. Memory tier 4 GB default; smart-capped to prevent OOM.
- Idempotency: intra-run dedup by AS24 UUID is 100%, so no vehicle is duplicated within a single run. Cross-run overlap varies (~80–85% on sort=age / sort=price URLs over a few minutes) due to AS24 listing volatility (new entries, sold items removed, paginator non-determinism). The stable id field allows union dedup downstream across N runs.
- maxRecords overshoot bound: the actor may overshoot maxRecords by ~10% due to in-flight handlers when the stop gate fires (e.g. request maxRecords=1000, receive up to ~1100, billed accordingly). Set the Apify PPE charge limit to your hard budget cap; the actor stops cleanly when either bound is hit.
- Schema stability: _meta.parserVersion = "ndata-v2". Bumped only on breaking parser changes, so your pipelines can gate on this field.
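The cross-run union dedup and the parser-version gate mentioned in these notes can be combined downstream. A sketch, assuming each record carries `id` and `_meta.parserVersion` as in the sample output; `union_dedup` and the accepted-version set are hypothetical and should be adapted to the versions your pipeline has validated:

```python
from typing import Dict, Iterable

def union_dedup(runs: Iterable[Iterable[dict]],
                accepted_versions=frozenset({"ndata-v2"})) -> Dict[str, dict]:
    """Merge records from several runs, keyed by the stable `id` field."""
    merged: Dict[str, dict] = {}
    for items in runs:
        for record in items:
            version = (record.get("_meta") or {}).get("parserVersion")
            if version not in accepted_versions:
                # Gate: refuse records from a parser version not yet validated.
                raise ValueError(f"unexpected parserVersion: {version!r}")
            merged[record["id"]] = record  # later runs overwrite earlier ones
    return merged

meta = {"parserVersion": "ndata-v2"}
run_1 = [{"id": "a", "mileageKm": 100, "_meta": meta},
         {"id": "b", "mileageKm": 150, "_meta": meta}]
run_2 = [{"id": "b", "mileageKm": 200, "_meta": meta},
         {"id": "c", "mileageKm": 300, "_meta": meta}]
merged = union_dedup([run_1, run_2])
```

Letting the later run win keeps the freshest copy of a listing seen in more than one run.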
FAQ
Q: Is scraping AS24 legal in the EU?
A: Public listings only, no authentication bypass, no PII beyond what AS24 publishes on its public marketplace. We respect AS24 rate limits.
Q: Can I skip listing crawl and hit detail URLs directly?
A: Yes. Use detailUrlsOnly: ["https://www.autoscout24.de/offers/audi-a6-...", ...] instead of urls.
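Choosing between the two input keys can be automated. A sketch under explicit assumptions: listing URLs contain `/lst/` and detail URLs contain `/offers/` (as in the examples on this page), hosts use the `www.` prefix, and `maxRecords` is accepted alongside `detailUrlsOnly`. `build_run_input` is a hypothetical helper, and the detail URL below is a placeholder, not a real listing.

```python
from typing import List
from urllib.parse import urlparse

# The 8 supported domains, assuming the www. host form used in the examples.
ALLOWED_HOSTS = {f"www.autoscout24.{tld}"
                 for tld in ("de", "com", "fr", "it", "nl", "at", "be", "es")}

def build_run_input(urls: List[str], max_records: int) -> dict:
    """Return an input dict keyed by `urls` or `detailUrlsOnly`."""
    for url in urls:
        if urlparse(url).hostname not in ALLOWED_HOSTS:
            raise ValueError(f"unsupported host in {url}")
    detail = [u for u in urls if "/offers/" in urlparse(u).path]
    if detail and len(detail) != len(urls):
        raise ValueError("do not mix listing and detail URLs in one input")
    key = "detailUrlsOnly" if detail else "urls"
    return {key: urls, "maxRecords": max_records}

listing_input = build_run_input(["https://www.autoscout24.de/lst/audi/a6"], 50)
detail_input = build_run_input(["https://www.autoscout24.de/offers/EXAMPLE-SLUG"], 1)
```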
Q: How do I scrape beyond the 400-result AS24 cap?
A: Provide multiple urls — one per filter slice (year, region, price band). The actor dedups by vehicleId across them.
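One way to produce such slices is to append a price band to a base search URL. A sketch only: the `pricefrom` and `priceto` query parameter names are assumptions about AutoScout24 search URLs and should be checked against a URL copied from the site; `slice_by_price` is a hypothetical helper.

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def slice_by_price(base_url: str, bounds: List[int]) -> List[str]:
    """Return one URL per [low, high) price band, keeping existing parameters."""
    parts = urlparse(base_url)
    base_query = parse_qsl(parts.query)
    urls = []
    for low, high in zip(bounds, bounds[1:]):
        # Bands are made non-overlapping by ending each one at high - 1.
        query = base_query + [("pricefrom", str(low)), ("priceto", str(high - 1))]
        urls.append(urlunparse(parts._replace(query=urlencode(query))))
    return urls

slices = slice_by_price("https://www.autoscout24.de/lst/audi/a6?sort=age",
                        [0, 10000, 20000])
```

Each resulting URL goes into the `urls` input array; bands that still hit the ~400-result cap need narrower bounds.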
Q: What if AS24 blocks the scrape?
A: Set forcePlaywright: true to render via headless Chrome. It is slower (~4×) but bypasses most blocks. Automatic fallback is planned for v1.5.
Q: Can I get CSV or Excel instead of JSON?
A: Yes — at download time. Apify Console (Export button) or REST GET /datasets/{id}/items?format=csv|xlsx. The actor always writes JSON.
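The export endpoint can be addressed from a script by building the URL. A small sketch, assuming the same `https://api.apify.com/v2` base and `token` query parameter used in the REST example on this page; `export_url` is a hypothetical helper:

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"

def export_url(dataset_id: str, fmt: str = "csv", token: Optional[str] = None) -> str:
    """Build the dataset items URL for the given export format."""
    if fmt not in {"json", "csv", "xlsx"}:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"

print(export_url("YOUR_DATASET_ID", "xlsx"))
```

The returned URL can be fetched with any HTTP client; the dataset ID is the run's `defaultDatasetId`.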
Q: How fresh is the data?
A: Each record carries _meta.scrapedAt (when we fetched) and createdAt (when AS24 published the listing). Re-run on-demand or via Apify Schedules.
Q: Support response time?
A: < 24h on weekdays. Issues, feature requests, and private deals via Apify Console messaging on the actor page.
Full feature comparison
vs typical AS24 scrapers on Apify Store:
| Dimension | Typical | This actor |
|---|---|---|
| Field count | ~30 (mostly strings) | 82 typed |
| VIN | absent | present when published by dealer |
| Billing model | flat (failed runs charged) | PPE (only successful records charged) |
| Cost-correct retry | no | 4-attempt exponential backoff on billing transients |
| GPS coordinates | absent | lat + lng 100% |
| VAT info | absent | vatDeductible + vatRate |
| HD images | 120×90 thumbnails | 1280×960 + 640×480 mid |
| Vehicle history | absent | accidents, owners, service, TÜV |
| Equipment | CSV mono-block string | structured arrays |
| Dealer enrichment | name + phone | heroImage, WhatsApp, opening hours, contact person, Google ratings (zero extra fetches) |
| Schema versioning | none | _meta.parserVersion |
| MCP-ready | no | yes |
| Soft-block aware | no | CF / Captcha / WAF sniff → session rotation |
| Idempotent across Apify host migration | no | yes — Event.MIGRATING handler + persisted counter |
| Drift-tolerant types | parser crashes on schema drift | safe int/float coercion across 7 engine fields |
| Idempotency | unspecified | documented (UUID dedup) |
| Source URL | inconsistent | canonical |
| Test suite | none / undocumented | 479 tests + 5 review-loop cycles |
| Maintenance | varies | < 24h support, active releases |
Output schema
_meta.parserVersion = "ndata-v2". 82 fields in 15 logical groups (v1.1 adds 7 dealer-enrichment fields under seller). Top-level:
id · sourceUrl · sourceDomain · identifier · price · vehicle (+ nested engine) · mileageKm · firstRegistration · nextSafetyInspection · createdAt · fuel · body · history · location · seller · ratings · equipment (4 categories) · media (HD + mid) · description · status · _meta.
Agent-ready (x402, USDC on Base)
For AI agents paying via HTTP 402 micropayments (USDC, no Apify account):

    # 1. Install Apify MCP client
    npm install -g @apify/mcpc

    # 2. Connect to Apify MCP server with x402 payments enabled
    mcpc connect "mcp.apify.com?payment=x402" @apify --x402

    # 3. Call this actor; mcpc handles signing + retries automatically
    # First 10 records virtually free ($0.00001 each); above 10 = $0.004 USDC/vehicle.

Min $1 USDC prepaid balance on Base mainnet; subsequent calls draw down. Compatible with Coinbase Wallet, Privy, any Base-compatible wallet. PPE billing model required (usesStandbyMode: false, ✓ here).
See Apify x402 docs for wallet setup options.
Changelog
v1.1.3 (2026-05-20) — Fetch reliability hardening (loop 4). SoftBlockSessionError subclass + Cloudflare / Captcha / WAF challenge sniff → automatic session rotation instead of polluting the dataset with empty records. detailUrlsOnly preflight added. HEAD requests spoof Chrome 126 UA to dodge bot-detection on the preflight tier. Preflight filter tightened to 404 / 410 / 451 only — no false-positive elimination on transient 403 / 429 / 5xx. startswith prefix bug fixed; dead config keys cleaned up. +14 tests (→ 479).
v1.1.2 (2026-05-20) — Parser robustness sweep (loop 3, ndata-v2.1). Media-list crash guard (None URL filter). CO₂ / electricRange / consumption now coerced via _safe_int / _safe_float — no more parser crash on AS24 numeric drift. evaluationCategory union widened to nullable. Weight regex now locale-aware: DE thousands-dot (1.450 kg), FR space separator (1 450 kg), EN/IT/ES/NL plain int (1450 kg) — drift-tolerant across 7 engine fields. +33 tests (→ 465).
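The three separator conventions named in this entry can be covered by a single pattern. A rough sketch of the idea, not the actor's actual regex; `parse_weight_kg` is a hypothetical name, and decimal values such as "1.5 kg" are deliberately rejected rather than guessed:

```python
import re
from typing import Optional

# Either 3-digit groups separated by a dot or whitespace (DE "1.450", FR "1 450"),
# or a plain integer (EN/IT/ES/NL "1450"), followed by "kg".
# The lookbehind stops a match from starting in the middle of a number.
_WEIGHT_RE = re.compile(
    r"(?<![\d.,])(\d{1,3}(?:[.\s]\d{3})+|\d+)\s*kg", re.IGNORECASE
)

def parse_weight_kg(text: Optional[str]) -> Optional[int]:
    """Return the weight in kilograms as an int, or None if not recognized."""
    if not text:
        return None
    match = _WEIGHT_RE.search(text)
    if not match:
        return None
    return int(re.sub(r"\D", "", match.group(1)))  # drop the group separators

for raw in ("1.450 kg", "1 450 kg", "1450 kg"):
    print(raw, "->", parse_weight_kg(raw))
```

In Python string patterns `\s` also matches the no-break space often used as the French thousands separator.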
v1.1.1 (2026-05-20) — Cost-correctness + migration safety (loops 2a/2b). Abort race fix: no more pushed-not-charged records when a run is aborted mid-flight. record_pushed lifted out of finally block so retried records don't inflate the counter. Defensive chargeLimit detection. Charge retry with 1s/3s/9s exponential backoff → transient billing-API outages recover automatically. _record_count persisted to KVS + Event.MIGRATING handler registered → trial window and cap math now survive Apify host migrations. +11 tests (→ 432).
v1.1.0 (2026-05-20) — Dealer enrichment + perf wins (loop 1). 7 new nullable seller.* fields surfaced from the listing payload (zero extra fetches) — heroImage, heroImageInterior, whatsappNumber, homepageUrl, contactPerson (name/position/image/languages), googleRatings, openingHoursByDay (weekday-keyed {open, close} or {closed: true}). Closes the dealer-data gap vs $1.29/1K competitors who upsell on the same fields. Field count 75 → 82, all backward-compatible. Parallel HEAD preflight (50s → 5s), dict-wrap drop, vehicle-data cache (−7 lookups/record), regex HTML strip (10× faster than BeautifulSoup), KVS handle reuse → +18–25% throughput. Virtual free trial: first 10 records of every run billed at $0.00001 via new vehicle-trial PPE event. +47 tests (→ 421).
v1.0.5 (2026-05-19) — Filter loop gaps closed: Andorre, Partes en cuero (ES upholstery), MPV (NL bodyType) — 100% × 8 EU locales × 7 fields normalization coverage achieved.
v1.0.1 (2026-05-18) — Pre-publish hardening: abort handler stops PPE billing + crawler on user-aborted runs (was billing up to ~20 records post-abort), description HTML capped at 100 KB before parse, telemetry counter cardinality bounded at 1000 unique keys, URL host allowlist (8 AS24 domains) to block SSRF, KVS dedup persist cadence 50 → 500 (cuts O(n²) write amplification on 50k runs), apify SDK pinned to >=3.4,<3.5 for reproducible deploys.
v1.0.0 (2026-05-17) — Initial release. 75-field schema with cross-domain enum normalization (ndata-v2) — 7 fields mapped to EN-stable values, raw locale preserved in *Original sister fields. MCP + x402 support, PPE pricing, listing + detail crawling, vehicleId dedup, Playwright fallback, 8 EU domains. (Bumped to 82 fields in v1.1.0 — see above.)
Support
- Issues / feature requests / custom fields / private deals: Apify Console messaging on the actor page
- Response time: < 24h on weekdays