# SEO & Technical Site Audit (`berkaydev/seo-audit-tool`) Actor

Combined technical SEO audit: broken links, redirect chains, on-page checks, sitemap/robots health. Client-ready HTML report + structured JSON.

- **URL**: https://apify.com/berkaydev/seo-audit-tool.md
- **Developed by:** [Berkay](https://apify.com/berkaydev) (community)
- **Categories:** SEO tools, Developer tools
- **Stats:** 3 total users, 3 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## SEO & Technical Site Audit

Crawl a website and get every broken link, redirect chain, on-page SEO problem, and sitemap issue in one run — as a clean JSON dataset plus an HTML report you can hand to a client.

Most link checkers on the Store do exactly one thing: find 404s. This Actor runs the full technical pass in a single crawl, so you're not stitching together four different tools to audit one site.

### What it checks

- **Broken links** (4xx / 5xx) — internal and external, with the page they were found on and the anchor text. It also catches broken `#fragment` anchors, which most checkers quietly skip.
- **Redirect chains** — every hop, the final URL, and the chain length, so you can flatten `301 → 302 → 200` messes that waste crawl budget and leak link equity.
- **On-page SEO** — missing or duplicate titles and meta descriptions, missing or multiple `H1`s, images without `alt` text, missing canonicals, and accidental `noindex` tags.
- **Sitemap & robots health** — whether your sitemap is reachable and valid, and which crawled pages never appear in it (orphan hints).

Every issue is graded by severity (critical / warning / info) and rolled up into a single health score.

### How it's different

I built this because the existing options are either single-purpose (just broken links) or haven't been touched in over a year. A few things I cared about:

- **One crawl, full picture**: links, redirects, on-page checks, and sitemap health in the same run instead of four separate Actors.
- **Output you can actually use**: flat, predictable JSON (one row per page, with that page's issues nested alongside its SEO fields) that drops straight into n8n, Make, a Google Sheet, or your own script. Plus an HTML report for the people who sign off on the work but don't read JSON.
- **No start fee**: most audit Actors on the Store charge a flat $0.035–$0.04 just to launch a run, before they've checked a single page. This one only bills per page actually crawled. A quick 20-page spot-check stays cheap.
- **Polite by default**: respects `robots.txt`, rate-limits itself, and sends a clear user agent. It won't hammer the site you're auditing.
- **Kept up to date**: if a check breaks or a field is missing, it gets fixed. See the roadmap below for what's next.

### How to use it

1. Hit **Try for free**.
2. Paste the URL you want to audit into **Start URLs**.
3. Optionally set **Max pages** — start small (say 50) to see the shape of the output first.
4. Run it. Issues land in the **Dataset** tab; the HTML report and summary sit in the key-value store under the **Storage** tab.

No code required, but everything is reachable through the [Apify API](https://docs.apify.com/api/v2) if you want it in a pipeline.

### Input

| Field | What it does |
|---|---|
| `startUrls` | One or more URLs to audit |
| `maxPages` | Cap on pages crawled (`0` = unlimited) |
| `maxDepth` | How many link-hops deep to go |
| `crawlSubdomains` | Include subdomains of the start URL |
| `checkExternalLinks` | Verify external links too |
| `includeOnPageSEO` | Run the on-page checks |
| `maxConcurrency` / `requestDelayMs` | Trade speed against politeness |

Example input:

```json
{
  "startUrls": [{ "url": "https://example.com" }],
  "maxPages": 200,
  "includeOnPageSEO": true,
  "checkExternalLinks": true
}
````

### Output

**Dataset — one row per crawled page**, with all SEO fields and any issues found on that page:

```json
{
  "url": "https://example.com/about",
  "finalUrl": "https://example.com/about/",
  "statusCode": 200,
  "title": "About Us — Example",
  "metaDescription": "Learn more about our team.",
  "h1": ["About Us"],
  "canonical": "https://example.com/about/",
  "indexable": true,
  "wordCount": 423,
  "imagesMissingAlt": 2,
  "redirectChain": [],
  "crawlError": null,
  "issues": [
    {
      "issue_type": "missing_alt_text",
      "severity": "warning",
      "detail": "2 image(s) missing alt attribute."
    }
  ]
}
```
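Because issues are nested inside each page row, a spreadsheet or ticketing workflow usually wants them flattened to one record per issue. The sketch below assumes only the field names shown in the sample row above (`url`, `issues`, `issue_type`, `severity`, `detail`); the helper names `flatten_issues` and `count_by_severity` are illustrative and not part of the Actor.

```python
from collections import Counter
from typing import Iterable, Iterator


def flatten_issues(pages: Iterable[dict]) -> Iterator[dict]:
    """Yield one flat record per issue, tagged with the page it was found on."""
    for page in pages:
        # A page without problems may carry an empty or missing "issues" list.
        for issue in page.get("issues") or []:
            yield {
                "url": page.get("url"),
                "issue_type": issue.get("issue_type"),
                "severity": issue.get("severity"),
                "detail": issue.get("detail"),
            }


def count_by_severity(pages: Iterable[dict]) -> Counter:
    """Count issues per severity label across all pages."""
    return Counter(row["severity"] for row in flatten_issues(pages))


# Sample rows shaped like the dataset example above.
sample_pages = [
    {
        "url": "https://example.com/about",
        "issues": [
            {
                "issue_type": "missing_alt_text",
                "severity": "warning",
                "detail": "2 image(s) missing alt attribute.",
            }
        ],
    },
    {"url": "https://example.com/", "issues": []},
]

issue_rows = list(flatten_issues(sample_pages))
severity_counts = count_by_severity(sample_pages)
```

Each element of `issue_rows` can be written straight to CSV, since every record has the same four keys.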

**Key-Value Store:**

- `SUMMARY` — site-level stats, health score (0–100), score breakdown, sitemap/robots status
- `REPORT` — full HTML audit report, open directly in the browser

Example `SUMMARY` value:

```json
{
  "pages_crawled": 142,
  "health_score": 78,
  "critical_count": 3,
  "warning_count": 19,
  "info_count": 7,
  "score_breakdown": {
    "broken_link": { "count": 3, "deduction": 8.5 },
    "missing_meta_description": { "count": 12, "deduction": 5.2 }
  }
}
```
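One way to decide what to fix first is to sort the `score_breakdown` entries by how much each one costs the health score. This is a minimal sketch that relies only on the `count` and `deduction` fields shown above; it does not assume anything about how the Actor derives `health_score` from them, and `rank_deductions` is a hypothetical helper name.

```python
def rank_deductions(summary: dict) -> list:
    """Return (issue_type, deduction, count) tuples, largest deduction first."""
    breakdown = summary.get("score_breakdown") or {}
    rows = [
        (name, float(entry.get("deduction", 0)), int(entry.get("count", 0)))
        for name, entry in breakdown.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Sample shaped like the summary example above.
sample_summary = {
    "pages_crawled": 142,
    "health_score": 78,
    "score_breakdown": {
        "missing_meta_description": {"count": 12, "deduction": 5.2},
        "broken_link": {"count": 3, "deduction": 8.5},
    },
}

ranked = rank_deductions(sample_summary)
for issue_type, deduction, count in ranked:
    print(f"{issue_type}: -{deduction} points across {count} occurrence(s)")
```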

### Pricing

Pay per page crawled — you pay for what you actually audit, and the first batch is free so you can see the output before committing. No subscription, no minimum.

### On the roadmap

Being upfront about what it doesn't do yet:

- Lighthouse / Core Web Vitals (performance + accessibility scoring)
- Scheduled re-audits with change alerts
- Word count and thin-content flags per page

If you need one of these sooner, or you hit a site that trips up the crawl, tell me — it genuinely helps me decide what to build next.

### FAQ

**Does it work on JavaScript-heavy sites?**
It reads the server-rendered HTML, which covers most sites. Fully client-rendered SPAs may expose fewer links until the rendered mode (on the roadmap) lands.

**Will it get me blocked?**
It respects `robots.txt`, rate-limits, and identifies itself. On your own site you can turn the delay down; on sites you don't control, keep it polite.

**Can I run it on a schedule?**
Yes — use Apify [Schedules](https://docs.apify.com/platform/schedules) to run it weekly and diff the results over time.

**Is running an SEO audit like this allowed?**
You're checking publicly accessible pages for technical health, which is standard SEO practice. Respect the target site's terms and `robots.txt`, which this Actor does by default.
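
For the scheduled-run question above, "diffing the results" can be as simple as comparing the set of `(url, issue_type)` pairs between two runs. The sketch below assumes both inputs are lists of dataset rows in the shape documented in the Output section; `issue_keys` and `diff_audits` are illustrative names, and matching on `(url, issue_type)` is a design choice made here, not something the Actor does for you.

```python
def issue_keys(pages: list) -> set:
    """Collect a (url, issue_type) pair for every issue in a run's dataset rows."""
    return {
        (page["url"], issue["issue_type"])
        for page in pages
        for issue in page.get("issues") or []
    }


def diff_audits(previous: list, current: list) -> dict:
    """Report issues that appeared or disappeared between two runs."""
    before, after = issue_keys(previous), issue_keys(current)
    return {"new": sorted(after - before), "fixed": sorted(before - after)}


# Two tiny hand-written runs to show the shape of the result.
previous_run = [
    {"url": "https://example.com/a", "issues": [{"issue_type": "missing_meta_description"}]},
    {"url": "https://example.com/b", "issues": [{"issue_type": "broken_link"}]},
]
current_run = [
    {"url": "https://example.com/a", "issues": []},
    {
        "url": "https://example.com/b",
        "issues": [{"issue_type": "broken_link"}, {"issue_type": "missing_alt_text"}],
    },
]

changes = diff_audits(previous_run, current_run)
```

An issue that persists across both runs (here the `broken_link` on `/b`) appears in neither list, which keeps a weekly alert focused on what changed.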

# Actor input Schema

## `startUrls` (type: `array`):

One or more URLs to audit (typically the homepage or a specific section).

## `maxPages` (type: `integer`):

Maximum pages per run. Set to 0 for unlimited (use with caution on large sites).

## `maxDepth` (type: `integer`):

How many link-hops deep to crawl from the start URLs.

## `crawlSubdomains` (type: `boolean`):

Also crawl pages on subdomains (e.g. blog.example.com when starting from example.com).

## `includeGlobs` (type: `array`):

Only crawl URLs matching these glob patterns (e.g. `https://example.com/blog/*`). Leave empty to crawl everything. Exclude patterns take priority over include patterns.

## `excludeGlobs` (type: `array`):

Skip URLs matching these glob patterns (e.g. `*/cart/*`, `*?replytocom=*`). Useful to avoid low-value pages and save on crawl cost.
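
To preview how include and exclude patterns interact before spending a run on it, the precedence rule can be approximated locally. This sketch uses Python's `fnmatch`, which is an assumption: the Actor's own glob engine may treat `*`, `?`, or `/` differently, so use it as a rough check only. `should_crawl` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def should_crawl(url: str, include_globs: list, exclude_globs: list) -> bool:
    """Approximate the documented rule: excludes win, empty includes mean 'everything'."""
    # Exclude patterns take priority over include patterns.
    if any(fnmatchcase(url, pattern) for pattern in exclude_globs):
        return False
    # An empty include list places no restriction on the crawl.
    if not include_globs:
        return True
    return any(fnmatchcase(url, pattern) for pattern in include_globs)


candidates = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/cart/checkout",
    "https://example.com/shop/item",
]
kept = [
    url
    for url in candidates
    if should_crawl(url, ["https://example.com/blog/*"], ["*/cart/*"])
]
```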

## `checkExternalLinks` (type: `boolean`):

Verify HTTP status of external links (HEAD request only — no crawl).

## `includeOnPageSEO` (type: `boolean`):

Check titles, meta descriptions, H1 tags, canonical, alt text and noindex flags.

## `maxConcurrency` (type: `integer`):

Number of parallel requests. Higher = faster but more load on the target.

## `requestDelayMs` (type: `integer`):

Milliseconds to wait between requests. Increase for polite crawling.

## `userAgent` (type: `string`):

User agent string sent with every request.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://books.toscrape.com"
    }
  ],
  "maxPages": 20,
  "maxDepth": 2,
  "crawlSubdomains": false,
  "includeGlobs": [],
  "excludeGlobs": [],
  "checkExternalLinks": true,
  "includeOnPageSEO": true,
  "maxConcurrency": 5,
  "requestDelayMs": 200,
  "userAgent": "SEOAuditBot/1.0 (+https://apify.com)"
}
```
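
When the input is generated by a script, it can help to check the numeric fields before starting a run. The sketch below mirrors the `minimum` and `maximum` values listed in the OpenAPI specification later on this page (`maxPages` ≥ 0, `maxDepth` ≥ 1, `maxConcurrency` 1 to 20, `requestDelayMs` ≥ 0). `build_input` is a hypothetical helper, and rejecting an empty URL list is a choice made here rather than a documented rule.

```python
# (minimum, maximum) per field, copied from the input schema; None means no upper bound.
LIMITS = {
    "maxPages": (0, None),
    "maxDepth": (1, None),
    "maxConcurrency": (1, 20),
    "requestDelayMs": (0, None),
}


def build_input(start_urls: list, **options) -> dict:
    """Build an Actor input dict, validating integer fields against the schema limits."""
    if not start_urls:
        raise ValueError("startUrls is required and should not be empty")
    for name, value in options.items():
        if name not in LIMITS:
            continue  # booleans, globs and userAgent are passed through unchanged
        low, high = LIMITS[name]
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if value < low or (high is not None and value > high):
            raise ValueError(f"{name}={value} is outside the allowed range")
    return {"startUrls": [{"url": url} for url in start_urls], **options}


audit_input = build_input(
    ["https://books.toscrape.com"],
    maxPages=20,
    maxDepth=2,
    checkExternalLinks=False,
)
```

Fields you leave out fall back to the schema defaults on the platform side, so the helper does not fill them in.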

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://books.toscrape.com"
        }
    ],
    "maxPages": 20,
    "maxDepth": 2,
    "checkExternalLinks": false
};

// Run the Actor and wait for it to finish
const run = await client.actor("berkaydev/seo-audit-tool").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://books.toscrape.com" }],
    "maxPages": 20,
    "maxDepth": 2,
    "checkExternalLinks": False,
}

# Run the Actor and wait for it to finish
run = client.actor("berkaydev/seo-audit-tool").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
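
The example above only reads the dataset. The `SUMMARY` and `REPORT` records described in the README live in the run's default key-value store, so a follow-up step might look like the sketch below. It assumes the Python client's `key_value_store(...).get_record(key)` call, which returns a dict with a `"value"` entry or `None` when the key is missing; `fetch_audit_artifacts` is a hypothetical helper, and the exact type of the `REPORT` value (expected to be HTML text) is not confirmed by this page.

```python
def fetch_audit_artifacts(client, store_id: str) -> tuple:
    """Return (summary, report_html) from a run's key-value store, or None for missing keys."""
    store = client.key_value_store(store_id)
    summary_record = store.get_record("SUMMARY")
    report_record = store.get_record("REPORT")
    summary = summary_record["value"] if summary_record else None
    report_html = report_record["value"] if report_record else None
    return summary, report_html


# Usage with the objects from the example above (requires a real token and a finished run):
#
#   summary, report_html = fetch_audit_artifacts(client, run["defaultKeyValueStoreId"])
#   if summary is not None:
#       print("Health score:", summary.get("health_score"))
#   if report_html is not None:
#       with open("report.html", "w", encoding="utf-8") as handle:
#           handle.write(report_html)
```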

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://books.toscrape.com"
    }
  ],
  "maxPages": 20,
  "maxDepth": 2,
  "checkExternalLinks": false
}' |
apify call berkaydev/seo-audit-tool --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=berkaydev/seo-audit-tool",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "SEO & Technical Site Audit",
        "description": "Combined technical SEO audit: broken links, redirect chains, on-page checks, sitemap/robots health. Client-ready HTML report + structured JSON.",
        "version": "0.4",
        "x-build-id": "vwM46pkhAn6Adr0L2"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/berkaydev~seo-audit-tool/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-berkaydev-seo-audit-tool",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/berkaydev~seo-audit-tool/runs": {
            "post": {
                "operationId": "runs-sync-berkaydev-seo-audit-tool",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/berkaydev~seo-audit-tool/run-sync": {
            "post": {
                "operationId": "run-sync-berkaydev-seo-audit-tool",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "One or more URLs to audit (typically the homepage or a specific section).",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxPages": {
                        "title": "Max pages to crawl",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum pages per run. Set to 0 for unlimited (use with caution on large sites).",
                        "default": 200
                    },
                    "maxDepth": {
                        "title": "Max crawl depth",
                        "minimum": 1,
                        "type": "integer",
                        "description": "How many link-hops deep to crawl from the start URLs.",
                        "default": 5
                    },
                    "crawlSubdomains": {
                        "title": "Crawl subdomains",
                        "type": "boolean",
                        "description": "Also crawl pages on subdomains (e.g. blog.example.com when starting from example.com).",
                        "default": false
                    },
                    "includeGlobs": {
                        "title": "Include URL patterns",
                        "type": "array",
                        "description": "Only crawl URLs matching these glob patterns (e.g. https://example.com/blog/*). Leave empty to crawl everything. Exclude patterns take priority over include patterns.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "excludeGlobs": {
                        "title": "Exclude URL patterns",
                        "type": "array",
                        "description": "Skip URLs matching these glob patterns (e.g. */cart/*, *?replytocom=*). Useful to avoid low-value pages and save on crawl cost.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "checkExternalLinks": {
                        "title": "Check external links",
                        "type": "boolean",
                        "description": "Verify HTTP status of external links (HEAD request only — no crawl).",
                        "default": true
                    },
                    "includeOnPageSEO": {
                        "title": "Include on-page SEO checks",
                        "type": "boolean",
                        "description": "Check titles, meta descriptions, H1 tags, canonical, alt text and noindex flags.",
                        "default": true
                    },
                    "maxConcurrency": {
                        "title": "Max concurrent requests",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Number of parallel requests. Higher = faster but more load on the target.",
                        "default": 5
                    },
                    "requestDelayMs": {
                        "title": "Delay between requests (ms)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Milliseconds to wait between requests. Increase for polite crawling.",
                        "default": 200
                    },
                    "userAgent": {
                        "title": "User agent",
                        "type": "string",
                        "description": "User agent string sent with every request.",
                        "default": "SEOAuditBot/1.0 (+https://apify.com)"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
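
If you would rather call the REST endpoint directly than install a client, the `run-sync-get-dataset-items` path from the specification above can be called with the standard library alone. This is a minimal sketch: the URL, the `token` query parameter, and the JSON body come from the specification, while the function names, the 300-second timeout, and the assumption that the response body is a JSON array of dataset rows are illustrative.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "berkaydev~seo-audit-tool"


def build_run_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare a POST request that runs the Actor and asks for its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_audit(token: str, actor_input: dict, timeout: float = 300.0):
    """Send the request and decode the JSON response (blocks until the run finishes)."""
    request = build_run_sync_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network, so it is safe to inspect locally.
request = build_run_sync_request(
    "<YOUR_API_TOKEN>",
    {"startUrls": [{"url": "https://books.toscrape.com"}], "maxPages": 20},
)
```

Because the call is synchronous, long crawls can outlast the HTTP timeout; for large sites the `/runs` endpoint plus polling is likely the safer route.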
