# AutoScout24 Clean Scraper with 75 typed fields, 8 EU domains (`hardy_ice-owner/autoscout24-clean-scraper-py`) Actor

75 typed fields per AutoScout24 listing across 8 EU domains (DE/FR/IT/ES/NL/AT/BE/EN). VIN, GPS, VAT, accident history, TÜV, structured equipment. Pay-per-vehicle $0.004 — failed records never charge. MCP-ready for AI agents. Idempotent UUID dedup, schema versioned.

- **URL**: https://apify.com/hardy\_ice-owner/autoscout24-clean-scraper-py.md
- **Developed by:** [Tars Technology](https://apify.com/hardy_ice-owner) (community)
- **Categories:** AI, MCP servers, Agents
- **Stats:** 2 total users, 2 monthly users, 100.0% runs succeeded
- **User rating**: 5.00 out of 5 stars

## Pricing

from $4.00 / 1,000 vehicles (standard)

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### AutoScout24 Clean Scraper — 82 typed fields, 8 EU domains

**🎁 First 10 records virtually free (~$0.0001 total). ~$0.36 for 100, ~$4 for 1,000. Pay only per delivered record above the trial.**

Pull AutoScout24 used-car listings as clean structured JSON across **8 EU domains** (DE/FR/IT/ES/NL/AT/BE/EN) — **82 ready-to-query fields per vehicle**. **VIN**, GPS coordinates, VAT recoverability, accident history, TÜV inspection date, dealer hero images and opening hours — fields competing scrapers don't deliver. Cross-locale enums normalized to EN-stable values with raw locale preserved.

Built for **data engineers shipping dealer pipelines**. Also powers market analytics dashboards and AI agents (MCP-native).

> ✅ **Production-hardened** — 100+ runs, **479 tests**, 5 review-loop cycles completed 2026-05-20.
> ✅ **Idempotent** — stable `vehicleId` UUID dedup survives Apify host migrations.
> ✅ **Soft-block aware** — Cloudflare / Captcha / WAF challenges trigger automatic session rotation.
> ✅ **Cost-correct** — failed records never charge; transient billing outages auto-recovered (4-attempt exponential backoff).

---

### Why this scraper

🎯 **82 ready-to-query fields per listing.** Integers are integers (`mileageKm: 58500`), dates are ISO 8601 (`firstRegistration: "2019-06-01"`), equipment is structured arrays — never a `"58,500 km"` string to parse.

📍 **The columns B2B teams actually filter on.** **VIN** (when published by dealer), GPS coordinates (100% populated), VAT recoverability, vehicle history (accidents, previous owners, full service history, next TÜV inspection). Standard in this actor, absent from typical AS24 scrapers.

🛡️ **Pay-per-event billing. Idempotent + actively maintained.** You only pay for successful records pushed to your dataset — not for proxy retries, blocked fetches, or partial run failures upstream of push. Note: records pushed before a mid-run crash ARE billed (standard Apify PPE semantics). Stable `vehicleId` UUID dedup, schema versioning via `_meta.parserVersion`, < 24h support response.

---

### Why this Actor beats $4 alternatives

Most AS24 scrapers on the Store ship locale-leaked strings you have to clean up downstream. Side-by-side, what you actually get per record:

| What you get | Typical AS24 scraper | This actor |
|---|---|---|
| **VIN** | absent | `identifier.vin: "WAUZZZF49KN012345"` (when published by dealer) |
| Mileage | `"58,500 km"` string | `58500` integer |
| Price | `"€ 119,900"` string | `{amount: 119900, currency: "EUR", vatDeductible: false, vatRate: null, netAmount: null}` |
| Transmission | `"Schaltgetriebe"` (DE locale) | `"Manual"` + `transmissionOriginal: "Schaltgetriebe"` |
| GPS | absent | `{latitude: 50.96258, longitude: 7.18602}` |
| Equipment | CSV string `"ABS,ESP,Klimaautomatik,..."` | Structured arrays per category (4 categories, typed entries) |
| Vehicle history | absent | `{hadAccident: false, previousOwners: 3, fullServiceHistory: true, hsnTsn: "0583/AJZ"}` |
| **Dealer enrichment** (v1.1) | name + phone string | `seller.{heroImage, whatsappNumber, homepageUrl, contactPerson:{name,position,languages[]}, openingHoursByDay:{monday:{open:"09:00",close:"18:00"}, sunday:{closed:true}, …}, googleRatings}` — extracted from the same listing payload, **zero extra fetches** |
| Pipeline-ready | parse strings manually | direct JOIN on integers/ISO dates/enums |
| Cross-domain consistent | locale-leaked | EN enums + original preserved in `*Original` sister fields |
| **Migration-resilient** | trial window re-opens / counters reset | `_record_count` persisted to KVS + `Event.MIGRATING` handler → cap math survives Apify host migrations |
| **Soft-block aware** | silent partial dataset on Cloudflare challenge | `SoftBlockSessionError` detection (CF / Captcha / WAF sniff) → automatic session rotation |
| **Numeric drift tolerance** | parser crashes on AS24 schema drift | Locale-aware weight regex (DE/EN/IT/ES/NL/FR) + safe coercion across 7 engine fields (CO₂, range, consumption, power) |
| **Cost-correct PPE** | charged for records lost to race conditions | Abort race fix + charge retry (1s/3s/9s exp backoff) → no pushed-not-charged or charged-not-pushed records |
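
Because `seller.openingHoursByDay` is keyed by weekday with `{open, close}` or `{closed: true}` entries, a downstream check such as "is this dealer open at a given time?" needs no string parsing. A minimal sketch, assuming lowercase English weekday keys and `HH:MM` times as in the table row above; `is_dealer_open` is a hypothetical helper, not part of the Actor:

```python
from datetime import datetime, time
from typing import Any, Optional

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")


def is_dealer_open(opening_hours: Optional[dict[str, Any]], when: datetime) -> Optional[bool]:
    """Return True/False when the hours are known, None when they are missing."""
    day = (opening_hours or {}).get(WEEKDAYS[when.weekday()])
    if not day:
        return None  # the field is nullable; no data for this weekday
    if day.get("closed"):
        return False
    if not day.get("open") or not day.get("close"):
        return None
    opens = time.fromisoformat(day["open"])
    closes = time.fromisoformat(day["close"])
    return opens <= when.time() < closes


# Illustrative data shaped like the README example, not real dealer hours.
hours = {
    "monday": {"open": "09:00", "close": "18:00"},
    "sunday": {"closed": True},
}
print(is_dealer_open(hours, datetime(2026, 5, 18, 10, 30)))  # a Monday morning
```

Returning `None` for unknown days keeps "closed" and "no data" distinguishable, which matters because all `seller.*` enrichment fields are nullable.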

---

### Try it in 60 seconds

#### 1️⃣ Apify Console (no code)

Click **Try for free** above. Paste any AS24 search URL. Hit **Save & Start**. First records flow in within ~10 seconds.

Example URL to paste:
```
https://www.autoscout24.com/lst/porsche/991?sort=age&desc=1
```

#### 2️⃣ AI Agent (MCP-native)

Plug directly into Claude Desktop, Cursor, or any MCP client. The agent calls the scraper with natural language — no glue code:

```
https://mcp.apify.com?tools=hardy_ice-owner/autoscout24-clean-scraper-py
```

#### 3️⃣ REST API

```bash
curl "https://api.apify.com/v2/acts/hardy_ice-owner~autoscout24-clean-scraper-py/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -X POST -H "Content-Type: application/json" \
  -d '{"urls": ["https://www.autoscout24.de/lst/audi/a6"], "maxRecords": 50}'
```

#### 4️⃣ Python SDK

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")
run = client.actor("hardy_ice-owner/autoscout24-clean-scraper-py").call(
    run_input={"urls": ["https://www.autoscout24.de/lst/audi/a6"], "maxRecords": 50}
)
for vehicle in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(vehicle["vehicle"]["make"], vehicle["vehicle"]["model"], vehicle["price"]["amount"])
```

***

### Real output sample

One record, the differentiators visible at a glance:

```json
{
  "id": "d6a3056e-267d-45e6-a160-4777491ddcde",
  "identifier": { "vin": "WP0ZZZ99ZKS123456", "offerReference": "P-2024-1287" },
  "vehicle": {
    "make": "Porsche", "model": "991",
    "modelVersion": "911 Carrera 4 GTS",
    "engine": { "powerKw": 331, "powerHp": 450, "cylinders": 6, "gears": 7 }
  },
  "price": { "amount": 119900, "currency": "EUR", "vatDeductible": false },
  "mileageKm": 58500,
  "firstRegistration": "2019-06-01",
  "nextSafetyInspection": "2026-06",
  "location": { "latitude": 50.96258, "longitude": 7.18602, "city": "Bergisch Gladbach" },
  "history": { "previousOwners": 3, "fullServiceHistory": true, "hsnTsn": "0583/AJZ" },
  "fuel": { "co2EfficiencyClass": "Green", "emissionStandardLabel": "EURO6" },
  "_meta": { "parserVersion": "ndata-v2", "fetchMethod": "http" }
}
```

…plus 60+ more fields (full schema: equipment arrays per category, HD/mid images, dealer info, ratings, description HTML+text, etc.).
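
When loading records like the one above into a relational table, it can help to flatten the nested JSON into dot-notation columns (the same convention the CSV/Excel export uses, e.g. `vehicle.make`). A minimal sketch, assuming plain dicts as returned by the dataset API; `flatten_record` is a hypothetical helper, and keeping lists intact is an illustrative choice:

```python
from typing import Any


def flatten_record(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dot-notation keys; lists are kept as-is."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, path))
        else:
            flat[path] = value
    return flat


# A trimmed copy of the sample record above.
sample = {
    "vehicle": {"make": "Porsche", "engine": {"powerKw": 331}},
    "price": {"amount": 119900, "currency": "EUR"},
    "mileageKm": 58500,
}
row = flatten_record(sample)
print(row["vehicle.engine.powerKw"], row["price.amount"])
```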

***

### Pricing — try (virtually) free, pay only above the trial

**🎁 First 10 records every run cost ~$0.0001 total** (1/100 cent — Apify Store min event price). No card needed for a real evaluation. Convert when ready.

**$0.004 per delivered record** above the trial. No subscription. No minimum. Failed records never charge.

| Run size | Trial (1-10) | Paid (after 10) | Total |
|---|---:|---:|---:|
| 10 vehicles | 10 × $0.00001 | — | **$0.0001** |
| 100 vehicles | 10 × $0.00001 | 90 × $0.004 | **$0.36** |
| 1,000 vehicles | 10 × $0.00001 | 990 × $0.004 | **$3.96** |
| 5,000 vehicles | 10 × $0.00001 | 4,990 × $0.004 | **$19.96** |
| 50,000 vehicles | 10 × $0.00001 | 49,990 × $0.004 | **$199.96** |

Volume 50k+/month: custom pricing via Apify Console messaging on the actor page.

Proxy data metered separately by Apify (free tier covers small runs; large jobs may need RESIDENTIAL proxy).

#### Premium options (priced higher)

Two opt-in flags trigger premium per-record pricing:

| Option | Price/record | When to use |
|---|---|---|
| **Default (HTTP)** | $0.004 | Standard scrape — works on 99% of AS24 pages |
| `forcePlaywright: true` | **$0.012** (3×) | Browser-rendered fallback — enable only if HTTP gets blocked at high volume |
| `includeRawData: true` | **$0.005** (+25%) | Adds raw `__NEXT_DATA__` JSON blob (5-10× output size) — for debugging, custom field extraction, or downstream raw shipping |

The two premiums do not add up: with `forcePlaywright + includeRawData` both enabled, each record is billed at $0.012 (Playwright cost dominates).

Premium options ALSO honor the 10-record virtual trial ($0.00001 each).

The Console UI labels both options with `⚠️ PREMIUM` so you see the cost impact before running.
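
The pricing rules above reduce to a few lines of arithmetic. A minimal sketch for budgeting a run, assuming only the rates stated in this README (10 trial records at $0.00001, then $0.004, $0.005 or $0.012 per record); `estimate_cost_usd` is a hypothetical helper, and proxy traffic, which Apify meters separately, is not included:

```python
from decimal import Decimal

TRIAL_RECORDS = 10
TRIAL_RATE = Decimal("0.00001")
STANDARD_RATE = Decimal("0.004")
RAW_DATA_RATE = Decimal("0.005")    # includeRawData: true
PLAYWRIGHT_RATE = Decimal("0.012")  # forcePlaywright: true (dominates when both are set)


def estimate_cost_usd(records: int, force_playwright: bool = False,
                      include_raw_data: bool = False) -> Decimal:
    """Estimate the pay-per-event charge for one run, excluding proxy usage."""
    if records < 0:
        raise ValueError("records must be non-negative")
    if force_playwright:
        rate = PLAYWRIGHT_RATE
    elif include_raw_data:
        rate = RAW_DATA_RATE
    else:
        rate = STANDARD_RATE
    trial = min(records, TRIAL_RECORDS)
    paid = records - trial
    return trial * TRIAL_RATE + paid * rate


for size in (10, 100, 1_000, 5_000, 50_000):
    print(size, estimate_cost_usd(size))
```

`Decimal` avoids floating-point noise at these small unit prices. The results carry the extra $0.0001 from the trial records, which the pricing table rounds away.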

***

### Standard vs Full — which one?

Both tiers ship the same 82 typed fields. **Full** appends the raw `__NEXT_DATA__` payload under `_raw.*` for everything we don't normalize.

```
                       Standard ($0.004)  Full ($0.005)
                       =================  =============
82 typed fields            ✓ all 82            ✓ all 82
+ raw __NEXT_DATA__        —                   ✓ (~150 extra fields)
Record size                ~5 KB               ~38 KB (6.8×)
Bandwidth (1000 rec)       5 MB                38 MB
```

**Use Standard if you need:**

- Dealer CRM ingestion (clean fields → DB columns)
- Analytics dashboards (price/km curves, regional supply)
- Cross-source dedup via UUID
- AI agent queries via MCP
- 95% of B2B pipelines

**Use Full if you need:**

- `_raw.dpvStatistics` — page-view & favorite counts (lead scoring)
- `_raw.financingAndInsurance` — AS24-native leasing/finance offers per record
- `_raw.prices.public` / `_raw.prices.dealer` — split B2C vs B2B price evaluation
- `_raw.adTier` / `appliedAdTier` — dealer ad spend visibility
- `_raw.adTargetingString` — AS24 segmentation cohort
- Custom field extraction (you need a field we don't parse)
- Forward raw payload to your data lake
- Debug parser regressions

***

### Battle-tested — 5 hardening cycles, 479 tests, 0 regressions

Most AS24 scrapers on the Store are write-once / hope-it-still-works. This actor went through **5 successive review-loop cycles** during the week of 2026-05-19/20 (loop 1 → loop 5), each one fixing a concrete production failure mode before publish. **+105 new tests, +28% coverage, zero regression** across the cycle. Numbers are the headline; the why-it-matters is below.

| # | Loop | What was fixed | Why it matters to your pipeline |
|---|---|---|---|
| 1 | **Performance** | Parallel HEAD preflight (50s → 5s), vehicle cache (−7 redundant lookups/record), regex HTML strip (10× faster than BeautifulSoup), KVS handle reuse | **+18–25% throughput** on large runs; cap-math fires sooner so you stop closer to `maxRecords` |
| 2a | **Cost-correctness (PPE)** | Abort race fix (no more pushed-not-charged), `record_pushed` lifted out of `finally` (no counter inflation on retry), defensive `chargeLimit` detection, charge retry with 1s/3s/9s exponential backoff | **Recaptures ~$0.30–0.40 per 1k records** previously lost to billing transients. No double-charges, no silent missed-charges. |
| 2b | **Migration safety** | `_record_count` persisted to KVS + `Event.MIGRATING` handler registered | Trial window + cap math now **survive Apify host migrations**. Pre-fix: a migration mid-run could silently re-open the $0 trial window on the new host, breaking pricing invariants. |
| 3 | **Parser robustness** (`ndata-v2.1`) | Media crash guard (None URL filter), CO₂ / electric-range / consumption coercion via `_safe_int` / `_safe_float`, `evaluationCategory` null union, **weight regex locale-aware** (DE thousands-dot, FR space separator, EN/IT/ES/NL plain int), numeric type-drift protection across 7 engine fields | **Drift-tolerant**: when AS24 ships a stray `"45.5"` where they used to ship `45`, the record validates instead of crashing. Worth its weight in any long-running schedule. |
| 4 | **Fetch reliability** (v1.1.3) | `SoftBlockSessionError` subclass + Cloudflare / Captcha / WAF challenge sniff, `detailUrlsOnly` preflight, **HEAD UA spoofed to Chrome 126**, deterministic `404/410/451`-only filter (no false-positive on `403 / 429 / 5xx`), `startswith` prefix bug fix, dead-config cleanup | Soft-blocked requests rotate the session instead of polluting the dataset with empty records. Preflight only filters genuinely-gone URLs, not transient errors. |
| 5 | **Sign-off** | 479 tests deterministic, ruff/mypy clean, 5 cross-loop integration checks PASS | Ship signal: no flake, no skip, no `# type: ignore` churn |

**Test count progression across the cycle**: 374 → 421 → 432 → 465 → **479** + 1 xfail (intentional, schema-evolution gate). All 5 commits land on `main`; Apify webhook rebuilds the published actor on push.
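
To make the loop 3 row concrete, here is what locale-aware weight parsing and safe numeric coercion can look like. This is an illustrative sketch, not the Actor's source: `parse_weight_kg` and `safe_int` are hypothetical names, the regex is an assumption covering only the three formats listed in the table (DE thousands-dot, FR space separator, plain integer), and truncating floats is an arbitrary choice:

```python
import re
from typing import Any, Optional

# Digits optionally grouped in thousands by a dot or any whitespace
# (including non-breaking spaces), followed by "kg".
_WEIGHT_RE = re.compile(r"(\d{1,3}(?:[.\s]\d{3})+|\d+)\s*kg", re.IGNORECASE)


def parse_weight_kg(text: Optional[str]) -> Optional[int]:
    """Parse '1.450 kg', '1 450 kg' or '1450 kg' into 1450; None if no match."""
    match = _WEIGHT_RE.search(text or "")
    if match is None:
        return None
    return int(re.sub(r"\D", "", match.group(1)))


def safe_int(value: Any) -> Optional[int]:
    """Coerce ints, floats and numeric strings such as '45.5' to int (truncating)."""
    if value is None or isinstance(value, bool):
        return None
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


for raw in ("1.450 kg", "1 450 kg", "1450 kg", "n/a"):
    print(repr(raw), parse_weight_kg(raw))
print(safe_int("45.5"), safe_int(45), safe_int("abc"))
```

Returning `None` instead of raising is what lets a record validate with a null field when a single value drifts.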

What this means for a buyer: when AS24 changes their HTML, ships a new locale string, or returns a Cloudflare interstitial mid-run, **you don't get paged at 3am** — you get a clean record or a clearly-tagged retry. That's the moat.
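
The charge retry from loop 2a (one initial attempt plus retries after 1s, 3s and 9s, four attempts in total) follows a common pattern that is also useful in your own pipeline code when calling flaky APIs. A minimal sketch of that pattern, not the Actor's implementation; `call_with_backoff` and the choice of retryable exception types are assumptions:

```python
import time
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    delays: Sequence[float] = (1.0, 3.0, 9.0),
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try once, retry after each delay, and let the final failure propagate."""
    for delay in delays:
        try:
            return operation()
        except retry_on:
            sleep(delay)  # wait before the next attempt
    return operation()  # last attempt: any error is raised to the caller


# Demo with a fake sleep so the example finishes instantly.
attempts = {"count": 0}


def flaky_charge() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient billing outage")
    return "charged"


waits: list = []
print(call_with_backoff(flaky_charge, sleep=waits.append), waits)
```

Passing `sleep` as a parameter keeps the function testable without real delays, and restricting `retry_on` to transient errors avoids retrying genuine bugs.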

***

### Use cases

#### Dealer inventory monitoring — $6/day

Track 50 competitor dealers hourly. ~1,500 vehicles/day. Apify Schedules + webhook → your CRM. Cheaper than 1h of analyst time.

#### Market analytics — $20/week

5,000 BMW 3-Series scraped weekly. Model price erosion, mileage curves, regional supply. CSV/Excel export native.

#### AI agents (MCP-native) — pay per query

Connect via `mcp.apify.com` to Claude, Cursor, or any MCP client. Agent asks *"2023 Tesla Model 3 under €40k near Berlin"*; gets structured JSON; answers naturally. ~$0.04 per 10-vehicle query at the standard $0.004 rate (less while the 10-record trial applies).
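
Since every record carries GPS coordinates, a typed price and an ISO date, a "near Berlin, under €40k, 2023 or newer" filter can run directly on the dataset. A minimal sketch using the haversine great-circle formula; `haversine_km` and `vehicles_near` are hypothetical helpers, and the sample record is trimmed from the output sample in this README:

```python
import math
from typing import Any, Iterable, Iterator

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def vehicles_near(records: Iterable[dict[str, Any]], lat: float, lon: float,
                  radius_km: float, max_price: int, min_year: int = 0) -> Iterator[dict[str, Any]]:
    """Yield records within radius_km of (lat, lon) that match price and year."""
    for record in records:
        loc = record.get("location") or {}
        price = (record.get("price") or {}).get("amount")
        registered = record.get("firstRegistration") or ""
        if loc.get("latitude") is None or loc.get("longitude") is None or price is None:
            continue
        if price > max_price or (min_year and int(registered[:4] or 0) < min_year):
            continue
        if haversine_km(lat, lon, loc["latitude"], loc["longitude"]) <= radius_km:
            yield record


records = [{
    "price": {"amount": 119900, "currency": "EUR"},
    "firstRegistration": "2019-06-01",
    "location": {"latitude": 50.96258, "longitude": 7.18602, "city": "Bergisch Gladbach"},
}]
berlin = (52.52, 13.405)
print(list(vehicles_near(records, *berlin, radius_km=50, max_price=40_000, min_year=2023)))
```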

***

### Operational notes

- **8 EU domains supported**: `.de`, `.com`, `.fr`, `.it`, `.nl`, `.at`, `.be`, `.es`. Cross-domain output normalized via EN-stable enums (`bodyColor`, `vehicleType`, `emissionStandardLabel`).
- **Cross-domain enums normalized**: transmission, drive train, body type, upholstery, color, paint type, original market — all mapped to EN-stable values from DE/FR/IT/NL/AT/BE/ES locales. Raw locale preserved in `*Original` sister fields.
- **AutoScout24 caps any search at ~400 results** (20 pages × 20 listings). For bigger queries, split by region/year/price-range and pass multiple `urls`. The actor dedups across them.
- **Performance**: 1,000 records in ~4 min; 5k in ~25 min; 50k in ~4 h. Memory tier 4 GB default; smart-capped to prevent OOM.
- **Idempotency**: intra-run dedup by AS24 UUID is **100%** — no duplicate vehicle within a single run. Cross-run overlap varies (~80–85% on `sort=age`/`sort=price` URLs over a few minutes) due to AS24 listing volatility (new entries, sold items removed, paginator non-determinism). Stable `id` field allows union dedup downstream across N runs.
- **`maxRecords` overshoot bound**: actor may overshoot `maxRecords` by ~10% due to in-flight handlers when the stop gate fires (e.g. request `maxRecords=1000`, receive up to ~1100 — billed accordingly). Set the Apify PPE charge limit to your hard budget cap; the actor stops cleanly when either bound is hit.
- **Schema stability**: `_meta.parserVersion = "ndata-v2"`. Bumped only on breaking parser changes — your pipelines can gate on this field.
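
The union dedup across runs mentioned under **Idempotency** can be done in a few lines once the items of each run are loaded. A minimal sketch, assuming `_meta.scrapedAt` is an ISO 8601 timestamp in a consistent format and timezone so that string comparison orders it correctly (the README does not state the exact format); `union_dedup` is a hypothetical helper:

```python
from typing import Any, Iterable


def _scraped_at(record: dict[str, Any]) -> str:
    """Timestamp used to pick the freshest copy; empty string when missing."""
    return (record.get("_meta") or {}).get("scrapedAt") or ""


def union_dedup(*runs: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge records from several runs, keeping the latest copy per stable `id`."""
    latest: dict[str, dict[str, Any]] = {}
    for run in runs:
        for record in run:
            seen = latest.get(record["id"])
            if seen is None or _scraped_at(record) >= _scraped_at(seen):
                latest[record["id"]] = record
    return list(latest.values())


# Illustrative records, not real listings.
run_1 = [
    {"id": "a", "mileageKm": 58500, "_meta": {"scrapedAt": "2026-05-20T10:00:00Z"}},
    {"id": "b", "mileageKm": 12000, "_meta": {"scrapedAt": "2026-05-20T10:00:05Z"}},
]
run_2 = [
    {"id": "a", "mileageKm": 58900, "_meta": {"scrapedAt": "2026-05-21T10:00:00Z"}},
]
merged = union_dedup(run_1, run_2)
print(len(merged), [r["mileageKm"] for r in merged])
```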

***

### FAQ

**Q: Is scraping AS24 legal in the EU?**
A: Public listings only, no authentication bypass, no PII beyond what AS24 publishes on its public marketplace. We respect AS24 rate limits.

**Q: Can I skip listing crawl and hit detail URLs directly?**
A: Yes. Use `detailUrlsOnly: ["https://www.autoscout24.de/offers/audi-a6-...", ...]` instead of `urls`.

**Q: How do I scrape beyond the 400-result AS24 cap?**
A: Provide multiple `urls` — one per filter slice (year, region, price band). The actor dedups by `vehicleId` across them.
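
One way to generate such slices is to append a price band to a base search URL. A minimal sketch; `slice_by_price` is a hypothetical helper, and the `pricefrom` / `priceto` parameter names are an assumption, so copy the actual names from a filtered search URL in your browser before relying on them:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def slice_by_price(search_url: str, bands: list[tuple[int, int]],
                   low_param: str = "pricefrom", high_param: str = "priceto") -> list[str]:
    """Return one search URL per (low, high) price band."""
    parts = urlsplit(search_url)
    # Keep the existing filters, dropping any price bounds already present.
    base_query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in (low_param, high_param)
    ]
    urls = []
    for low, high in bands:
        query = urlencode(base_query + [(low_param, str(low)), (high_param, str(high))])
        urls.append(urlunsplit(parts._replace(query=query)))
    return urls


bands = [(0, 20_000), (20_001, 40_000), (40_001, 80_000)]
for url in slice_by_price("https://www.autoscout24.de/lst/audi/a6?sort=age&desc=1", bands):
    print(url)
```

Pick bands narrow enough that each slice stays under the ~400-result cap.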

**Q: What if AS24 blocks the scrape?**
A: Set `forcePlaywright: true` to render via headless Chrome. Slower (~4×) but bypasses most blocks. Automatic fallback is planned for v1.5.

**Q: Can I get CSV or Excel instead of JSON?**
A: Yes — at download time. Apify Console (Export button) or REST `GET /datasets/{id}/items?format=csv|xlsx`. The actor always writes JSON.
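
For scripted downloads, the export URL can be built from the run's `defaultDatasetId`. A minimal sketch that only constructs the URL for the dataset items endpoint; `dataset_export_url` is a hypothetical helper, and limiting the formats to the three named above is a choice made for this example:

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
EXPORT_FORMATS = ("json", "csv", "xlsx")


def dataset_export_url(dataset_id: str, fmt: str = "csv", token: Optional[str] = None) -> str:
    """Build the dataset items URL for the requested export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {"format": fmt}
    if token:
        params["token"] = token  # alternatively send an Authorization header
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


# "abc123" is a placeholder dataset ID.
print(dataset_export_url("abc123", "csv"))
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen`, and written straight to a file.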

**Q: How fresh is the data?**
A: Each record carries `_meta.scrapedAt` (when we fetched) and `createdAt` (when AS24 published the listing). Re-run on-demand or via Apify Schedules.

**Q: Support response time?**
A: < 24h on weekdays. Issues, feature requests, and private deals via Apify Console messaging on the actor page.

***

### Full feature comparison

vs typical AS24 scrapers on Apify Store:

| Dimension | Typical | This actor |
|---|---|---|
| Field count | ~30 (mostly strings) | **82 typed** |
| **VIN** | absent | **present when published by dealer** |
| **Billing model** | flat (failed runs charged) | **PPE (only successful records charged)** |
| **Cost-correct retry** | no | **4-attempt exponential backoff on billing transients** |
| GPS coordinates | absent | **lat + lng 100%** |
| VAT info | absent | **vatDeductible + vatRate** |
| HD images | 120×90 thumbnails | **1280×960 + 640×480 mid** |
| Vehicle history | absent | **accidents, owners, service, TÜV** |
| Equipment | CSV mono-block string | **structured arrays** |
| Dealer enrichment | name + phone | **heroImage, WhatsApp, opening hours, contact person, Google ratings** (zero extra fetches) |
| Schema versioning | none | **`_meta.parserVersion`** |
| MCP-ready | no | **yes** |
| **Soft-block aware** | no | **CF / Captcha / WAF sniff → session rotation** |
| **Idempotent across Apify host migration** | no | **yes — `Event.MIGRATING` handler + persisted counter** |
| **Drift-tolerant types** | parser crashes on schema drift | **safe int/float coercion across 7 engine fields** |
| Idempotency | unspecified | **documented (UUID dedup)** |
| Source URL | inconsistent | canonical |
| Test suite | none / undocumented | **479 tests + 5 review-loop cycles** |
| Maintenance | varies | **< 24h support, active releases** |

***

### Output schema

`_meta.parserVersion = "ndata-v2"`. 82 fields in 15 logical groups (v1.1 adds 7 dealer-enrichment fields under `seller`). Top-level:
`id` · `sourceUrl` · `sourceDomain` · `identifier` · `price` · `vehicle` (+ nested `engine`) · `mileageKm` · `firstRegistration` · `nextSafetyInspection` · `createdAt` · `fuel` · `body` · `history` · `location` · `seller` · `ratings` · `equipment` (4 categories) · `media` (HD + mid) · `description` · `status` · `_meta`.
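
A pipeline can gate ingestion on `_meta.parserVersion` so that records from an unexpected schema generation are set aside instead of loaded. A minimal sketch, assuming the version string follows the `ndata-v<major>[.<minor>]` shape seen in this README (`ndata-v2`, `ndata-v2.1`); `parser_major` and `partition_by_schema` are hypothetical helpers:

```python
import re
from typing import Any, Iterable, Optional

_VERSION_RE = re.compile(r"^ndata-v(\d+)(?:\.\d+)*$")


def parser_major(record: dict[str, Any]) -> Optional[int]:
    """Extract the major schema version, or None if missing or unrecognised."""
    version = (record.get("_meta") or {}).get("parserVersion") or ""
    match = _VERSION_RE.match(version)
    return int(match.group(1)) if match else None


def partition_by_schema(records: Iterable[dict[str, Any]], supported_major: int = 2):
    """Split records into (accepted, quarantined) by major parser version."""
    accepted, quarantined = [], []
    for record in records:
        target = accepted if parser_major(record) == supported_major else quarantined
        target.append(record)
    return accepted, quarantined


# Illustrative records; "ndata-v3" is a made-up future version.
batch = [
    {"id": "a", "_meta": {"parserVersion": "ndata-v2"}},
    {"id": "b", "_meta": {"parserVersion": "ndata-v2.1"}},
    {"id": "c", "_meta": {"parserVersion": "ndata-v3"}},
]
ok, held = partition_by_schema(batch)
print([r["id"] for r in ok], [r["id"] for r in held])
```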

***

### Agent-ready (x402, USDC on Base)

For AI agents paying via HTTP 402 micropayments (USDC, no Apify account):

```bash
# 1. Install Apify MCP client
npm install -g @apify/mcpc

# 2. Connect to Apify MCP server with x402 payments enabled
mcpc connect "mcp.apify.com?payment=x402" @apify --x402

# 3. Call this actor; mcpc handles signing + retries automatically
#    First 10 records virtually free ($0.00001 each); above 10 = $0.004 USDC/vehicle.
```

Min $1 USDC prepaid balance on Base mainnet; subsequent calls draw down. Compatible with Coinbase Wallet, Privy, any Base-compatible wallet. PPE billing model required (`usesStandbyMode: false`, ✓ here).

See [Apify x402 docs](https://docs.apify.com/platform/integrations/x402) for wallet setup options.

***

### Changelog

**v1.1.3** (2026-05-20) — **Fetch reliability hardening (loop 4).** `SoftBlockSessionError` subclass + Cloudflare / Captcha / WAF challenge sniff → automatic session rotation instead of polluting the dataset with empty records. `detailUrlsOnly` preflight added. HEAD requests spoof Chrome 126 UA to dodge bot-detection on the preflight tier. Preflight filter tightened to **`404 / 410 / 451` only** — no false-positive elimination on transient `403 / 429 / 5xx`. `startswith` prefix bug fixed; dead config keys cleaned up. +14 tests (→ **479**).

**v1.1.2** (2026-05-20) — **Parser robustness sweep (loop 3, `ndata-v2.1`).** Media-list crash guard (None URL filter). CO₂ / `electricRange` / consumption now coerced via `_safe_int` / `_safe_float` — no more parser crash on AS24 numeric drift. `evaluationCategory` union widened to nullable. **Weight regex now locale-aware**: DE thousands-dot (`1.450 kg`), FR space separator (`1 450 kg`), EN/IT/ES/NL plain int (`1450 kg`) — drift-tolerant across 7 engine fields. +33 tests (→ 465).

**v1.1.1** (2026-05-20) — **Cost-correctness + migration safety (loops 2a/2b).** Abort race fix: no more pushed-not-charged records when a run is aborted mid-flight. `record_pushed` lifted out of `finally` block so retried records don't inflate the counter. Defensive `chargeLimit` detection. **Charge retry with 1s/3s/9s exponential backoff** → transient billing-API outages recover automatically. `_record_count` persisted to KVS + `Event.MIGRATING` handler registered → **trial window and cap math now survive Apify host migrations**. +11 tests (→ 432).

**v1.1.0** (2026-05-20) — **Dealer enrichment + perf wins (loop 1).** 7 new nullable `seller.*` fields surfaced from the listing payload (zero extra fetches) — `heroImage`, `heroImageInterior`, `whatsappNumber`, `homepageUrl`, `contactPerson` (name/position/image/languages), `googleRatings`, `openingHoursByDay` (weekday-keyed `{open, close}` or `{closed: true}`). Closes the dealer-data gap vs $1.29/1K competitors who upsell on the same fields. Field count **75 → 82**, all backward-compatible. **Parallel HEAD preflight** (50s → 5s), dict-wrap drop, vehicle-data cache (−7 lookups/record), regex HTML strip (10× faster than BeautifulSoup), KVS handle reuse → **+18–25% throughput**. Virtual free trial: first 10 records of every run billed at $0.00001 via new `vehicle-trial` PPE event. +47 tests (→ 421).

**v1.0.5** (2026-05-19) — Filter loop gaps closed: Andorre, *Partes en cuero* (ES upholstery), MPV (NL bodyType) — 100% × 8 EU locales × 7 fields normalization coverage achieved.

**v1.0.1** (2026-05-18) — Pre-publish hardening: abort handler stops PPE billing + crawler on user-aborted runs (was billing up to ~20 records post-abort), description HTML capped at 100 KB before parse, telemetry counter cardinality bounded at 1000 unique keys, URL host allowlist (8 AS24 domains) to block SSRF, KVS dedup persist cadence 50 → 500 (cuts O(n²) write amplification on 50k runs), apify SDK pinned to `>=3.4,<3.5` for reproducible deploys.

**v1.0.0** (2026-05-17) — Initial release. 75-field schema with cross-domain enum normalization (`ndata-v2`) — 7 fields mapped to EN-stable values, raw locale preserved in `*Original` sister fields. MCP + x402 support, PPE pricing, listing + detail crawling, vehicleId dedup, Playwright fallback, 8 EU domains. (Bumped to 82 fields in v1.1.0 — see above.)

***

### Support

- **Issues / feature requests / custom fields / private deals**: Apify Console messaging on the actor page
- **Response time**: < 24h on weekdays

# Actor input Schema

## `urls` (type: `array`):

AutoScout24 search result URLs to scrape. Use AS24 native filters (make, model, price, mileage, year, location, radius) to build URLs on autoscout24.de/.com/.fr/.it/.nl/.at/.be/.es, then paste here. AS24 caps any single search at ~400 results — split by region/year/price band and pass multiple URLs for larger queries (the actor dedups by vehicleId UUID across them). Provide AT LEAST ONE of urls or detailUrlsOnly.

## `maxRecords` (type: `integer`):

Maximum vehicles to scrape. First 10 records of every run are virtually free ($0.00001 each — $0.0001 trial total). Standard pricing ($0.004/record) applies above 10. Hard cap. Smart-capped per memory tier when includeRawData=true (1024 MB Actor: 2k cap; 4096 MB: 10k cap). Without includeRawData, full range available.

## `detailUrlsOnly` (type: `array`):

Direct AS24 vehicle detail page URLs. Bypasses listing crawl, scrapes detail pages directly. Useful for re-fetching known vehicles. Leave empty to use 'urls' (listing crawl).

## `outputFormat` (type: `string`):

Informational hint — actor always writes JSON to Apify dataset. Download in any format via Apify Console (download button) or API: GET /datasets/{id}/items?format=json|csv|xlsx. CSV/Excel flatten nested fields with dot notation (e.g. vehicle.make).

## `forcePlaywright` (type: `boolean`):

⚠️ PREMIUM OPTION — billed at $0.012/vehicle (3× standard $0.004). Renders pages via headless Chrome instead of HTTP. Enable only when AS24 blocks the default HTTP scrape (rare). Slower (~4× wall-clock), requires Actor memory ≥ 2048 MB.

## `proxyConfiguration` (type: `object`):

Proxy used to access AutoScout24. Defaults to Apify auto-tier (datacenter, included in free plan). Upgrade to RESIDENTIAL if you hit blocks at high volume.

## `includeRawData` (type: `boolean`):

⚠️ PREMIUM OPTION — billed at $0.005/vehicle (+25% vs standard $0.004). Bundles the complete raw AutoScout24 source payload (~150 fields per listing — every value AS24 ships, vs our 82 typed/normalized fields in the clean record). Output size 5-10× larger. WHO NEEDS THIS: (1) data engineering teams extracting fields beyond our 82 typed (e.g. niche dealer metadata, undocumented promo flags); (2) QA / audit teams verifying our normalization against source values; (3) pipelines shipping raw AS24 payloads downstream to internal warehouses; (4) buyers future-proofing against AS24 schema changes (our parserVersion drifts → raw still captures everything). Off by default — the clean 82-field record covers 95% of use cases. Toggle on per-run when you need source-level fidelity.

## Actor input object example

```json
{
  "urls": [
    {
      "url": "https://www.autoscout24.de/lst/audi/a6?sort=age&desc=1&atype=C"
    }
  ],
  "maxRecords": 300,
  "detailUrlsOnly": [],
  "outputFormat": "json",
  "forcePlaywright": false,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "includeRawData": false
}
```
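
Validating the input client-side catches the two most likely mistakes before a run starts: passing neither `urls` nor `detailUrlsOnly`, and passing a URL outside the 8 supported domains. A minimal sketch; `build_run_input` is a hypothetical helper, `urls` follows the `{"url": ...}` shape from the example above, and `detailUrlsOnly` uses plain strings as shown in the FAQ (both shapes are assumptions about what the Actor accepts):

```python
from typing import Any, Sequence
from urllib.parse import urlsplit

AS24_DOMAINS = tuple(
    f"autoscout24.{tld}" for tld in ("de", "com", "fr", "it", "nl", "at", "be", "es")
)


def _is_as24_url(url: str) -> bool:
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in AS24_DOMAINS)


def build_run_input(urls: Sequence[str] = (), detail_urls: Sequence[str] = (),
                    max_records: int = 300) -> dict[str, Any]:
    """Build an input object, rejecting empty input and non-AS24 hosts."""
    if not urls and not detail_urls:
        raise ValueError("provide at least one of urls or detail_urls")
    for url in (*urls, *detail_urls):
        if not _is_as24_url(url):
            raise ValueError(f"not a supported AutoScout24 URL: {url}")
    return {
        "urls": [{"url": url} for url in urls],
        "detailUrlsOnly": list(detail_urls),
        "maxRecords": max_records,
    }


run_input = build_run_input(["https://www.autoscout24.de/lst/audi/a6?sort=age&desc=1&atype=C"])
print(run_input)
```

The resulting dict can be passed as `run_input` to the Python client call shown in the API section.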

# Actor output Schema

## `vehicles` (type: `string`):

Default dataset: one record per AutoScout24 listing. See dataset\_schema.json for field-level types and the Overview table view.

## `vehiclesCsv` (type: `string`):

Same dataset exported as flattened CSV (Excel / spreadsheet friendly).

## `runLog` (type: `string`):

Stdout + stderr captured during the run (parser warnings, fetch failures, sentinel events).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        {
            "url": "https://www.autoscout24.de/lst/audi/a6?sort=age&desc=1&atype=C"
        }
    ],
    "detailUrlsOnly": [],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("hardy_ice-owner/autoscout24-clean-scraper-py").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": [{ "url": "https://www.autoscout24.de/lst/audi/a6?sort=age&desc=1&atype=C" }],
    "detailUrlsOnly": [],
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("hardy_ice-owner/autoscout24-clean-scraper-py").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    {
      "url": "https://www.autoscout24.de/lst/audi/a6?sort=age&desc=1&atype=C"
    }
  ],
  "detailUrlsOnly": [],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call hardy_ice-owner/autoscout24-clean-scraper-py --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=hardy_ice-owner/autoscout24-clean-scraper-py",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "AutoScout24 Clean Scraper with 75 typed fields, 8 EU domains",
        "description": "75 typed fields per AutoScout24 listing across 8 EU domains (DE/FR/IT/ES/NL/AT/BE/EN). VIN, GPS, VAT, accident history, TÜV, structured equipment. Pay-per-vehicle $0.004 — failed records never charge. MCP-ready for AI agents. Idempotent UUID dedup, schema versioned.",
        "version": "1.0",
        "x-build-id": "byaXPrHFJhF1ueJOs"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/hardy_ice-owner~autoscout24-clean-scraper-py/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-hardy_ice-owner-autoscout24-clean-scraper-py",
                "x-openai-isConsequential": false,
                "summary": "Executes the Actor, waits for it to finish, and returns the Actor's dataset items in the response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/hardy_ice-owner~autoscout24-clean-scraper-py/runs": {
            "post": {
                "operationId": "runs-sync-hardy_ice-owner-autoscout24-clean-scraper-py",
                "x-openai-isConsequential": false,
                "summary": "Executes the Actor and returns information about the initiated run in the response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/hardy_ice-owner~autoscout24-clean-scraper-py/run-sync": {
            "post": {
                "operationId": "run-sync-hardy_ice-owner-autoscout24-clean-scraper-py",
                "x-openai-isConsequential": false,
                "summary": "Executes the Actor, waits for it to finish, and returns the OUTPUT record from the key-value store in the response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "Search URLs (listing pages)",
                        "type": "array",
                        "description": "AutoScout24 search result URLs to scrape. Use AS24 native filters (make, model, price, mileage, year, location, radius) to build URLs on autoscout24.de/.com/.fr/.it/.nl/.at/.be/.es, then paste them here. AS24 caps any single search at ~400 results, so for larger queries split the search by region, year, or price band and pass multiple URLs (the Actor deduplicates by vehicleId UUID across them). Provide at least one of urls or detailUrlsOnly.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxRecords": {
                        "title": "Max records",
                        "minimum": 1,
                        "maximum": 50000,
                        "type": "integer",
                        "description": "Maximum number of vehicles to scrape (hard cap). The first 10 records of every run are nearly free ($0.00001 each, $0.0001 in total); standard pricing ($0.004/record) applies from the 11th record onward. When includeRawData=true, the limit is additionally capped per memory tier (1024 MB Actor: 2,000 records; 4096 MB: 10,000 records). Without includeRawData, the full range is available.",
                        "default": 300
                    },
                    "detailUrlsOnly": {
                        "title": "Detail URLs (skip listing)",
                        "type": "array",
                        "description": "Direct AS24 vehicle detail page URLs. Bypasses the listing crawl and scrapes the detail pages directly. Useful for re-fetching known vehicles. Leave empty to use 'urls' (listing crawl).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "outputFormat": {
                        "title": "Output format",
                        "enum": [
                            "json",
                            "csv",
                            "excel"
                        ],
                        "type": "string",
                        "description": "Informational hint only: the Actor always writes JSON to the Apify dataset. Download the data in any format via Apify Console (download button) or the API: GET /datasets/{id}/items?format=json|csv|xlsx. CSV and Excel exports flatten nested fields with dot notation (e.g. vehicle.make).",
                        "default": "json"
                    },
                    "forcePlaywright": {
                        "title": "Force Playwright (browser rendering) — PREMIUM 3× cost",
                        "type": "boolean",
                        "description": "⚠️ PREMIUM OPTION — billed at $0.012/vehicle (3× standard $0.004). Renders pages via headless Chrome instead of HTTP. Enable only when AS24 blocks the default HTTP scrape (rare). Slower (~4× wall-clock), requires Actor memory ≥ 2048 MB.",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Proxy used to access AutoScout24. Defaults to Apify auto-tier (datacenter, included in free plan). Upgrade to RESIDENTIAL if you hit blocks at high volume.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "includeRawData": {
                        "title": "Unlock all ~150 raw AutoScout24 source fields per vehicle — audit, custom extract, future-proof (PREMIUM +25%)",
                        "type": "boolean",
                        "description": "⚠️ PREMIUM OPTION — billed at $0.005/vehicle (+25% vs standard $0.004). Bundles the complete raw AutoScout24 source payload (~150 fields per listing — every value AS24 ships, vs our 82 typed/normalized fields in the clean record). Output size 5-10× larger. WHO NEEDS THIS: (1) data engineering teams extracting fields beyond our 82 typed (e.g. niche dealer metadata, undocumented promo flags); (2) QA / audit teams verifying our normalization against source values; (3) pipelines shipping raw AS24 payloads downstream to internal warehouses; (4) buyers future-proofing against AS24 schema changes (our parserVersion drifts → raw still captures everything). Off by default — the clean 82-field record covers 95% of use cases. Toggle on per-run when you need source-level fidelity.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
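The `run-sync-get-dataset-items` path in the specification above can also be called without the Apify client. The following is a minimal sketch using only the Python standard library. The server URL, path, `token` query parameter, and JSON request body come from the specification; the helper names `build_sync_request` and `run_and_fetch`, the 300-second timeout, and the assumption that the response body is JSON are illustrative choices.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "hardy_ice-owner~autoscout24-clean-scraper-py"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to run-sync-get-dataset-items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token: str, run_input: dict, timeout: float = 300.0):
    """Send the request and decode the response body.

    Assumption: the endpoint answers with JSON; the specification only
    documents a 200 "OK" response without a schema.
    """
    request = build_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `run_and_fetch("<YOUR_API_TOKEN>", {"urls": [{"url": "https://www.autoscout24.de/lst/audi/a6?sort=age&desc=1&atype=C"}], "maxRecords": 10})` would start a run and block until it finishes. For long scrapes, the asynchronous `/runs` path is likely a better fit, because a synchronous HTTP request can time out before the run completes.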
