Api Surface Mapper

Pricing

Pay per usage


An Apify Actor that discovers a website’s API surface by capturing browser network traffic (`fetch`/`xhr`), grouping similar requests into endpoint candidates, scoring them, and generating ready-to-run replay snippets (curl + TypeScript fetch).



Developer

Nikita Chapovskii


Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 4
  • Monthly active users: 1
  • Last modified: 8 days ago


API Surface Mapper (Crawlee + Playwright)


This is API discovery, not HTML scraping: point it at a site, optionally perform a few interactions, and get a ranked list of endpoints the UI is calling.


What it does

For each visited page, the Actor:

  1. Navigates using PlaywrightCrawler (Crawlee performs navigation automatically).
  2. Attaches a network tap before navigation to capture early fetch/xhr requests.
  3. Optionally runs a flow (steps) to trigger pagination, infinite scroll, filters, “Load more”, etc.
  4. Waits until the network becomes quiet (no new fetch/xhr for quietMs).
  5. Builds endpoint candidates:
    • normalizes URLs
    • patternizes volatile segments (IDs, tokens, etc.)
    • groups exchanges by endpoint pattern + method + kind
  6. Classifies candidates as REST / GraphQL / Other.
  7. Scores candidates and outputs the top-N with replay snippets.
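Step 5 (normalize, patternize, group) is what collapses raw URLs into "unique endpoints". Below is a minimal sketch of one way to do it. The volatile-segment heuristics (numeric IDs, UUIDs, long hex strings, long digit-bearing tokens) are assumptions, not the Actor's documented rules. So are the `Exchange` shape and the `patternizeUrl` / `groupExchanges` names. The query handling reproduces the `?page=*` form seen in the sample output.

```typescript
// Hypothetical shape of one captured exchange (only the fields grouping needs).
interface Exchange {
  url: string;
  method: string;
  kind: "REST" | "GraphQL" | "Other";
}

// Heuristics for "volatile" path segments (assumptions, not the Actor's real rules):
// numeric IDs, UUIDs, long hex strings, and long token-like strings containing a digit.
const VOLATILE_SEGMENT: RegExp[] = [
  /^\d+$/,
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i,
  /^[0-9a-f]{16,}$/i,
  /^(?=.*\d)[A-Za-z0-9_-]{24,}$/,
];

function isVolatile(segment: string): boolean {
  return VOLATILE_SEGMENT.some((re) => re.test(segment));
}

// Normalizes a URL (drops the fragment; URL parsing lowercases the host),
// replaces volatile path segments and every query value with "*",
// and sorts query keys so parameter order does not create separate patterns.
function patternizeUrl(raw: string): string {
  const u = new URL(raw);
  const path = u.pathname
    .split("/")
    .map((seg) => (seg !== "" && isVolatile(seg) ? "*" : seg))
    .join("/");
  const keys: string[] = [];
  u.searchParams.forEach((_value, key) => {
    if (!keys.includes(key)) keys.push(key);
  });
  keys.sort();
  const query = keys.length > 0 ? "?" + keys.map((k) => `${k}=*`).join("&") : "";
  return `${u.origin}${path}${query}`;
}

// Groups exchanges by endpoint pattern + method + kind.
function groupExchanges(exchanges: Exchange[]): Map<string, Exchange[]> {
  const groups = new Map<string, Exchange[]>();
  for (const ex of exchanges) {
    const key = `${ex.method.toUpperCase()} ${ex.kind} ${patternizeUrl(ex.url)}`;
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(ex);
    } else {
      groups.set(key, [ex]);
    }
  }
  return groups;
}
```

For example, `patternizeUrl("https://api.example.com/v1/users/42?b=2&a=1")` returns `"https://api.example.com/v1/users/*?a=*&b=*"`. Dropping query values entirely keeps `page=1` and `page=2` in one group, at the cost of merging requests that differ only by a meaningful value.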

Key features

  • Captures fetch and xhr requests (configurable).
  • Optional JSON response sampling (size-limited).
  • GraphQL detection from the request body (query / operationName), even if the endpoint path is not /graphql.
  • Endpoint grouping via URL patternization so you get “unique endpoints”, not a dump of raw URLs.
  • Generates replay snippets:
    • curl
    • fetch (TypeScript)
  • Optional link crawling via enqueueLinks().
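GraphQL detection from the request body can be sketched as follows. It assumes JSON request bodies, including batched arrays of operations, and treats a string `query` or `operationName` field as the GraphQL signal. The REST/Other split used here is a guess, not documented behavior: REST when the response content type mentions json, Other otherwise. `classifyExchange` is a hypothetical name.

```typescript
type Kind = "REST" | "GraphQL" | "Other";

// True if a parsed request body looks like a GraphQL operation
// (or a batched array containing one; batching support is an assumption).
function looksLikeGraphQLPayload(payload: unknown): boolean {
  if (Array.isArray(payload)) return payload.some(looksLikeGraphQLPayload);
  if (payload === null || typeof payload !== "object") return false;
  const obj = payload as Record<string, unknown>;
  return typeof obj["query"] === "string" || typeof obj["operationName"] === "string";
}

function classifyExchange(url: string, requestBody: string | null, contentType: string | null): Kind {
  // 1. Body-based detection works regardless of the endpoint path.
  if (requestBody !== null) {
    try {
      if (looksLikeGraphQLPayload(JSON.parse(requestBody))) return "GraphQL";
    } catch {
      // Body is not JSON; fall through to the URL/content-type checks.
    }
  }
  // 2. Conventional /graphql path (e.g. GET requests with the query in the URL).
  if (new URL(url).pathname.toLowerCase().endsWith("/graphql")) return "GraphQL";
  // 3. Assumed REST/Other split based on the response content type.
  if (contentType !== null && contentType.toLowerCase().includes("json")) return "REST";
  return "Other";
}
```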

Output (Dataset)

For each processed page, the Actor stores an item like:

{
  "pageUrl": "https://example.com/",
  "timestamp": "2026-01-07T15:49:37.257Z",
  "stats": { "exchangesCaptured": 12, "candidates": 7 },
  "candidates": [
    {
      "patternUrl": "https://api.example.com/v1/items?page=*",
      "method": "GET",
      "kind": "REST",
      "score": 55,
      "sample": {
        "url": "https://api.example.com/v1/items?page=1",
        "status": 200,
        "contentType": "application/json",
        "requestHeaders": { "...": "..." },
        "requestBody": null,
        "responseBody": { "...": "..." }
      },
      "generated": {
        "curl": "curl ...",
        "tsFetch": "const res = await fetch(...)"
      }
    }
  ]
}
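The `generated.curl` and `generated.tsFetch` fields are replay snippets built from the `sample`. Below is a minimal sketch of how such strings could be produced. The `Sample` shape mirrors the fields above, with the request body assumed to be a string. `toCurl` / `toTsFetch` are hypothetical names. The curl snippet uses POSIX single-quote escaping and the standard `-X`, `-H` and `--data-raw` options.

```typescript
interface Sample {
  url: string;
  method: string;
  requestHeaders: Record<string, string>;
  requestBody: string | null;
}

// POSIX shell single-quoting: close the quote, emit an escaped quote, reopen.
function shellQuote(s: string): string {
  return "'" + s.replace(/'/g, `'\\''`) + "'";
}

function toCurl(sample: Sample): string {
  const parts = ["curl", "-X", sample.method.toUpperCase(), shellQuote(sample.url)];
  for (const [name, value] of Object.entries(sample.requestHeaders)) {
    parts.push("-H", shellQuote(`${name}: ${value}`));
  }
  if (sample.requestBody !== null) parts.push("--data-raw", shellQuote(sample.requestBody));
  return parts.join(" ");
}

function toTsFetch(sample: Sample): string {
  const init = {
    method: sample.method.toUpperCase(),
    headers: sample.requestHeaders,
    ...(sample.requestBody !== null ? { body: sample.requestBody } : {}),
  };
  return `const res = await fetch(${JSON.stringify(sample.url)}, ${JSON.stringify(init, null, 2)});`;
}
```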

Input

The input is intentionally flat and simple.

startUrls (required): array of start URLs. Each entry may be either a string or an object:

  • "https://example.com"
  • { "url": "https://example.com" }
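Accepting both forms means the input has to be normalized before crawling. Here is a minimal sketch of that normalization. It assumes URLs are validated with the WHATWG `URL` parser, which throws on malformed input. `normalizeStartUrls` is a hypothetical helper name.

```typescript
type StartUrlInput = string | { url: string };

// Flattens both accepted shapes into absolute URL strings.
function normalizeStartUrls(startUrls: StartUrlInput[]): string[] {
  if (!Array.isArray(startUrls) || startUrls.length === 0) {
    throw new Error("startUrls is required and must be a non-empty array");
  }
  return startUrls.map((entry) => {
    const raw = typeof entry === "string" ? entry : entry.url;
    return new URL(raw).href; // throws on malformed URLs; adds a trailing "/" to bare origins
  });
}
```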

Crawling

  • maxRequests (default: 20): maximum number of pages to process.
  • enqueueLinks (default: false): whether to discover and enqueue links from each page.
  • strategy (default: "same-hostname"): crawling strategy for links:
    • "same-hostname" | "same-domain" | "all"
  • globs (optional): allowlist patterns for links to enqueue.
  • linkSelector (default: "a[href]"): selector used by enqueueLinks().

Capture

  • captureTypes (default: ["xhr","fetch"]): which request types to capture.
  • maxExchangesPerPage (default: 250): hard cap of captured exchanges per page.
  • includeResponseBodies (default: false): if true, attempts to parse JSON responses and store a sample.
  • maxBodyKb (default: 256): JSON body size limit (best-effort).
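With includeResponseBodies enabled, a response is sampled only if it is JSON and fits under maxBodyKb. A minimal sketch of that check follows. It assumes the body text is already available (for example from Playwright's `response.text()`), that 1 KB means 1024 bytes, and that a non-JSON content type, oversize body, or parse failure yields `null`. `sampleJsonBody` is a hypothetical name.

```typescript
// Returns the parsed JSON body, or null when the body is not JSON,
// exceeds the size limit, or fails to parse.
function sampleJsonBody(contentType: string | null, bodyText: string, maxBodyKb: number): unknown {
  if (contentType === null || !contentType.toLowerCase().includes("json")) return null;
  // Measure UTF-8 bytes rather than string length (assumption: 1 KB = 1024 bytes).
  const sizeBytes = new TextEncoder().encode(bodyText).length;
  if (sizeBytes > maxBodyKb * 1024) return null;
  try {
    return JSON.parse(bodyText);
  } catch {
    return null;
  }
}
```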

Settle / timing

  • quietMs (default: 800): quiet period (no new fetch/xhr) before we consider capture “settled”.
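  • quietTimeoutMs (default: 15000): hard timeout (ms) for settling. The quiet period only starts counting after the first fetch/xhr request has been captured, which prevents declaring the page “quiet” too early when it triggers its requests slightly later; if no request arrives at all, settling ends at this timeout.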
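The settle rule combines two timers. Below is a minimal sketch as a pure decision function plus a polling loop. It assumes a hypothetical `Tracker` that the network tap updates whenever a request with Playwright `request.resourceType()` of "fetch" or "xhr" is seen. The names `settleState` and `waitForQuiet`, and the 100 ms poll interval, are illustrative assumptions.

```typescript
type SettleState = "waiting" | "settled" | "timedOut";

// Pure decision function; all times are epoch milliseconds (e.g. Date.now()).
// lastRequestAt is null until the first fetch/xhr request has been captured.
function settleState(
  startedAt: number,
  lastRequestAt: number | null,
  now: number,
  quietMs: number,
  quietTimeoutMs: number,
): SettleState {
  // Quiet only counts once at least one request has been seen.
  if (lastRequestAt !== null && now - lastRequestAt >= quietMs) return "settled";
  // Hard cap, whether or not any request ever arrived.
  if (now - startedAt >= quietTimeoutMs) return "timedOut";
  return "waiting";
}

// Hypothetical tracker updated by the network tap, e.g. in a
// page.on("request", ...) handler for "fetch"/"xhr" resource types.
interface Tracker {
  lastRequestAt: number | null;
}

// Polls the tracker until the page settles or the hard timeout is hit.
async function waitForQuiet(
  tracker: Tracker,
  quietMs = 800,
  quietTimeoutMs = 15000,
  pollMs = 100,
): Promise<SettleState> {
  const startedAt = Date.now();
  for (;;) {
    const state = settleState(startedAt, tracker.lastRequestAt, Date.now(), quietMs, quietTimeoutMs);
    if (state !== "waiting") return state;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```

Keeping the decision pure makes the two timers easy to test without a browser; only the loop touches the clock.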

Page interaction flow

  • steps (default: []): page interaction flow (see below).
  • continueOnError (default: true): if a step fails, log a warning and continue.

Filtering (optional)

  • allowDomains: only capture requests to these domains (if set).
  • denyDomains: ignore requests to these domains.
  • denyUrlRegex: regex patterns; requests whose URL matches any of them are ignored.
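A minimal sketch of how the three filters might combine into one capture decision follows. Several points are assumptions, not documented behavior:
- A domain entry also matches its subdomains.
- Deny rules win over the allowlist.
- Patterns are compiled with `new RegExp` and no flags.

`shouldCapture` and `matchesDomain` are hypothetical names.

```typescript
interface CaptureFilter {
  allowDomains?: string[];
  denyDomains?: string[];
  denyUrlRegex?: string[];
}

// Assumption: a domain entry matches the hostname itself or any subdomain of it.
function matchesDomain(hostname: string, domain: string): boolean {
  const h = hostname.toLowerCase();
  const d = domain.toLowerCase().replace(/^\./, "");
  return h === d || h.endsWith("." + d);
}

function shouldCapture(url: string, filter: CaptureFilter): boolean {
  const { hostname } = new URL(url);
  const allow = filter.allowDomains ?? [];
  // The allowlist only applies when it is set (non-empty).
  if (allow.length > 0 && !allow.some((d) => matchesDomain(hostname, d))) return false;
  if ((filter.denyDomains ?? []).some((d) => matchesDomain(hostname, d))) return false;
  if ((filter.denyUrlRegex ?? []).some((p) => new RegExp(p).test(url))) return false;
  return true;
}
```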

Safety / privacy

  • redactHeaders: request/response headers to redact (defaults include auth/cookies).
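Redaction is a case-insensitive name match over header maps. A minimal sketch follows. This README does not list the Actor's actual default set, so `DEFAULT_REDACT` below is a hypothetical stand-in. The "[REDACTED]" placeholder and the `redactHeaders` function name are also assumptions.

```typescript
// Hypothetical default list; the Actor's real defaults are not listed here.
const DEFAULT_REDACT = ["authorization", "cookie", "set-cookie"];

// Returns a copy of the headers with sensitive values replaced.
// HTTP header names are case-insensitive, so matching is done in lowercase.
function redactHeaders(
  headers: Record<string, string>,
  redact: string[] = DEFAULT_REDACT,
): Record<string, string> {
  const deny = new Set(redact.map((h) => h.toLowerCase()));
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = deny.has(name.toLowerCase()) ? "[REDACTED]" : value;
  }
  return out;
}
```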

Flow steps (steps)

Supported step types:

  • wait
  • waitForSelector
  • click
  • type
  • scroll

Example:

{
  "steps": [
    { "type": "wait", "ms": 1200 },
    { "type": "scroll", "to": "bottom", "times": 1, "pauseMs": 700 },
    { "type": "click", "selector": "button:has-text(\"Load more\")" },
    { "type": "waitForSelector", "selector": "main", "timeoutMs": 5000 }
  ]
}
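The step list maps naturally onto a discriminated union and a sequential runner that honors continueOnError. This is a sketch under stated assumptions, not the Actor's code:
- The `type` step's field names (`selector`, `text`) are not shown in this README.
- Typing is done with `fill`.
- Scrolling "to bottom" is approximated with a large mouse-wheel delta.
- The 500 ms default pause is invented.

`PageLike` lists a subset of methods that exist on Playwright's `Page` (`waitForTimeout`, `waitForSelector`, `click`, `fill`, `mouse.wheel`), so a real page should fit it.

```typescript
type Step =
  | { type: "wait"; ms: number }
  | { type: "waitForSelector"; selector: string; timeoutMs?: number }
  | { type: "click"; selector: string }
  | { type: "type"; selector: string; text: string } // field names are assumptions
  | { type: "scroll"; to: "bottom"; times?: number; pauseMs?: number };

// Subset of Playwright's Page used by the runner.
interface PageLike {
  waitForTimeout(ms: number): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
  mouse: { wheel(deltaX: number, deltaY: number): Promise<void> };
}

async function runSteps(page: PageLike, steps: Step[], continueOnError = true): Promise<void> {
  for (const step of steps) {
    try {
      switch (step.type) {
        case "wait":
          await page.waitForTimeout(step.ms);
          break;
        case "waitForSelector":
          // Without timeoutMs, fall back to Playwright's own default timeout.
          await page.waitForSelector(step.selector, step.timeoutMs !== undefined ? { timeout: step.timeoutMs } : undefined);
          break;
        case "click":
          await page.click(step.selector);
          break;
        case "type":
          await page.fill(step.selector, step.text);
          break;
        case "scroll":
          for (let i = 0; i < (step.times ?? 1); i++) {
            await page.mouse.wheel(0, 100_000); // large delta approximates "bottom" (assumption)
            await page.waitForTimeout(step.pauseMs ?? 500); // 500 ms default is an assumption
          }
          break;
      }
    } catch (err) {
      if (!continueOnError) throw err;
      console.warn(`Step "${step.type}" failed, continuing:`, err);
    }
  }
}
```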

Example inputs

Apify website (crawl a few pages)

{
  "startUrls": [{ "url": "https://apify.com/" }],
  "enqueueLinks": true,
  "maxRequests": 12,
  "strategy": "same-hostname",
  "globs": ["https://apify.com/**"],
  "linkSelector": "a[href]",
  "includeResponseBodies": false,
  "maxExchangesPerPage": 300,
  "quietMs": 800,
  "quietTimeoutMs": 20000,
  "steps": [{ "type": "wait", "ms": 1200 }]
}

GraphQL demo (Catstronauts)

{
  "startUrls": [{ "url": "https://catstronauts.netlify.app/" }],
  "enqueueLinks": false,
  "maxRequests": 1,
  "includeResponseBodies": true,
  "maxBodyKb": 128,
  "maxExchangesPerPage": 300,
  "quietMs": 800,
  "quietTimeoutMs": 20000,
  "steps": [{ "type": "wait", "ms": 1800 }]
}

Scoring (how candidates are ranked)

Each captured exchange gets a numeric score. Exchanges are grouped into endpoint candidates, and the highest-scoring exchange becomes the representative example for that candidate.

Scoring rules (current)

  1. Noise filter: if URL looks like analytics/telemetry → score = -1000.
  2. +30 if response content-type includes json.
  3. +10 if request hints include pagination keywords:
    • cursor | offset | limit | page | perpage | nexttoken
    • (checked across URL.search and request body text)
  4. +10 if response size is known and content-length > 20k.
  5. -50 if HTTP status is >= 400.
  6. -30 if path looks like auth/session/token/csrf:
    • /auth | /session | /csrf | /token
  7. +15 if parsed JSON response looks list-like:
    • an array of objects: [{...}, {...}]
    • or an object with items: [...]
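Read literally, the rules are additive, except the noise filter, which short-circuits. The sketch below assumes that, plus the following:
- "20k" means 20,000 bytes of content-length.
- Keyword and path checks are case-insensitive substring matches.
- "Looks like analytics/telemetry" is a small hypothetical keyword list.
- A list-like array must be non-empty.

Under these assumptions, the sample candidate in the Output section (JSON response, `page` parameter, an `items` array) scores 30 + 10 + 15 = 55. `ScoredExchange` and `scoreExchange` are illustrative names.

```typescript
interface ScoredExchange {
  url: string;
  status: number;
  contentType: string | null;
  contentLength: number | null; // from the content-length header, if known
  requestBody: string | null;
  responseJson?: unknown; // only present when includeResponseBodies is enabled
}

const NOISE = /analytics|telemetry|tracking|beacon/i; // hypothetical heuristic
const PAGINATION = /cursor|offset|limit|page|perpage|nexttoken/;
const AUTH_PATH = /\/(auth|session|csrf|token)/;

const isPlainObject = (v: unknown): boolean => v !== null && typeof v === "object" && !Array.isArray(v);

// Rule 7: [{...}, {...}] or { items: [...] }.
function isListLike(json: unknown): boolean {
  if (Array.isArray(json)) return json.length > 0 && json.every(isPlainObject);
  return isPlainObject(json) && Array.isArray((json as Record<string, unknown>)["items"]);
}

function scoreExchange(ex: ScoredExchange): number {
  const u = new URL(ex.url);
  if (NOISE.test(u.hostname + u.pathname)) return -1000; // rule 1 short-circuits
  let score = 0;
  if (ex.contentType?.toLowerCase().includes("json")) score += 30; // rule 2
  const hints = (u.search + " " + (ex.requestBody ?? "")).toLowerCase();
  if (PAGINATION.test(hints)) score += 10; // rule 3
  if (ex.contentLength !== null && ex.contentLength > 20_000) score += 10; // rule 4 ("20k" assumed 20,000 bytes)
  if (ex.status >= 400) score -= 50; // rule 5
  if (AUTH_PATH.test(u.pathname.toLowerCase())) score -= 30; // rule 6
  if (ex.responseJson !== undefined && isListLike(ex.responseJson)) score += 15; // rule 7
  return score;
}
```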

Notes

  • If includeResponseBodies is disabled, the “list-like response” boost cannot apply.