# SEO Fields Scraper (`trovevault/seo-fields-scraper`) Actor

Extracts website SEO metadata with titles, descriptions, canonicals, robots tags, headings, Open Graph fields, and audit issues. Export data, run via API, schedule and monitor runs, or integrate with other tools.

- **URL:** https://apify.com/trovevault/seo-fields-scraper
- **Developed by:** [Trove Vault](https://apify.com/trovevault) (community)
- **Categories:** AI, SEO tools, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% of runs succeeded, 1 bookmark
- **User rating:** No ratings yet

## Pricing

From $0.85 / 1,000 URLs

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the price decreases with higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in the key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
The word Actor is always written with a capital "A".

## How to integrate an Actor?

When integrating an Actor into a project, adapt the integration to your stack and keep it safe, well-documented, and production-ready. The recommended approaches are as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## SEO Fields Scraper

SEO Fields Scraper extracts page-level SEO metadata from public websites and turns it into an audit-ready dataset. Give it a homepage, a list of URLs, or a sitemap URL and it returns titles, meta descriptions, canonicals, robots directives, headings, Open Graph fields, Twitter Card fields, and practical issue flags.

Use it when you need a fast metadata inventory for site QA, content migrations, competitor checks, agency reporting, or scheduled monitoring. The actor is HTTP-first, so it is cheap to run and suitable for repeated audits where you want clear fields instead of screenshots or bulky crawl exports.

### Why Use This Actor

- Audit title tags and meta descriptions across important pages.
- Catch missing or weak canonicals, noindex directives, and H1 problems.
- Review Open Graph and Twitter Card metadata for social sharing previews.
- Sample pages from a site without running a heavy browser crawler.
- Export structured metadata to Apify datasets, API clients, spreadsheets, or downstream workflows.

### What It Extracts

For each processed page, the actor can return:

- `title`, `titleLength`, and `titleStatus`
- `metaDescription`, `metaDescriptionLength`, and `metaDescriptionStatus`
- `canonicalUrl` and `canonicalStatus`
- `robotsMeta`, `isNoindex`, and `isNofollow`
- `h1`, `h1Count`, and `h2Sample`
- Open Graph fields such as `openGraphTitle`, `openGraphDescription`, and `openGraphImage`
- Twitter Card fields such as `twitterTitle`, `twitterDescription`, and `twitterImage`
- `renderingUsed`, showing whether the row came from `http` or `browser`
- `seoScore`, `issues`, and `warnings`
- structured error fields when a URL cannot be fetched

The `seoScore` is a simple completeness score from 0 to 100. It is meant for triage and prioritization, not as a replacement for a full SEO strategy. Use `renderingUsed` to understand whether a row came from the fast HTTP path or from Playwright browser rendering.
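As a sketch of how `seoScore`, `isNoindex`, and `renderingUsed` can drive triage (the sample rows below are illustrative, not real Actor output):

```python
# Hedged sketch: triage dataset rows by completeness score.
# Field names follow the output schema above; the sample rows are made up.
rows = [
    {"url": "https://example.com/a", "seoScore": 92, "isNoindex": False, "renderingUsed": "http"},
    {"url": "https://example.com/b", "seoScore": 41, "isNoindex": False, "renderingUsed": "browser"},
    {"url": "https://example.com/c", "seoScore": 70, "isNoindex": True, "renderingUsed": "http"},
]

# Pages flagged noindex are always worth a look; otherwise review worst-first.
flagged = [r for r in rows if r["isNoindex"]]
worst_first = sorted((r for r in rows if not r["isNoindex"]), key=lambda r: r["seoScore"])

for r in flagged + worst_first:
    print(r["url"], r["seoScore"], "noindex" if r["isNoindex"] else "")
```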

### Use Cases

Content teams can check whether newly published pages have complete search and social metadata.

SEO consultants can produce a quick metadata export for audits, migration QA, or retainer reporting.

Growth teams can compare competitor landing pages and identify common metadata patterns.

Developers can run the actor after releases to catch missing titles, incorrect canonicals, or accidental noindex tags.

Automation teams can schedule runs and append results to an existing dataset for monitoring.

### Input Example

```json
{
  "startUrls": [
    { "url": "https://apify.com/" }
  ],
  "maxPages": 10,
  "crawlDepth": 1,
  "requestTimeoutSecs": 30,
  "renderingMode": "BROWSER_FALLBACK",
  "browserWaitSecs": 5,
  "sameDomainOnly": true,
  "includeOpenGraph": true,
  "includeTwitterCards": true,
  "includeHeadings": true
}
```

Use `crawlDepth: 0` when you already have the exact URLs you want to audit. Use `crawlDepth: 1` for a fast same-domain sample from a homepage. Use a sitemap URL when the site exposes one and you want broader coverage.
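For example, a sitemap-driven audit with broader coverage might look like this (the sitemap URL is a placeholder):

```json
{
  "startUrls": [
    { "url": "https://example.com/sitemap.xml" }
  ],
  "maxPages": 100,
  "crawlDepth": 0,
  "renderingMode": "HTTP_ONLY",
  "sameDomainOnly": true
}
```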

### Input Reference

| Field | Type | Description |
|---|---|---|
| `startUrls` | array | Website URLs or sitemap URLs to audit. |
| `maxPages` | integer | Maximum number of successful page rows to create. |
| `crawlDepth` | integer | Number of HTML link levels to follow from each start URL. |
| `requestTimeoutSecs` | integer | HTTP timeout for each page or sitemap request. |
| `renderingMode` | string | `HTTP_ONLY`, `BROWSER_FALLBACK`, or `BROWSER_ONLY`. |
| `browserWaitSecs` | integer | Extra wait time after browser page load when Playwright is used. |
| `sameDomainOnly` | boolean | Keeps discovered links on the same hostname as the start URL. |
| `includeOpenGraph` | boolean | Extracts Open Graph social preview fields. |
| `includeTwitterCards` | boolean | Extracts Twitter Card social preview fields. |
| `includeHeadings` | boolean | Extracts H1 and H2 heading signals. |
| `proxyConfiguration` | object | Optional Apify Proxy settings for blocked sites. |
| `datasetId` | string | Optional existing dataset to append results to. |
| `runId` | string | Optional upstream run ID copied into output rows. |

### Output Example

```json
{
  "url": "https://apify.com/",
  "finalUrl": "https://apify.com/",
  "statusCode": 200,
  "title": "Apify: Full-stack web scraping and data extraction platform",
  "titleLength": 61,
  "titleStatus": "ok",
  "metaDescription": "Apify is a full-stack web scraping and browser automation platform.",
  "metaDescriptionLength": 70,
  "metaDescriptionStatus": "ok",
  "canonicalUrl": "https://apify.com/",
  "canonicalStatus": "self",
  "renderingUsed": "http",
  "robotsMeta": null,
  "isNoindex": false,
  "h1": "Web scraping, automation, and AI agents",
  "h1Count": 1,
  "seoScore": 88,
  "issues": [],
  "warningCount": 2,
  "warnings": ["Missing Twitter Card image", "Missing Open Graph image"],
  "discoveredVia": "input",
  "scrapedAt": "2026-04-27T12:00:00.000Z",
  "error": false
}
```
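A quick per-row sanity check over output like the example above can be sketched as follows (field names come from the output schema; the row values and the verdict logic are illustrative):

```python
# Hedged sketch: derive a one-line QA verdict from a single output row.
# Field names follow the output example above; the values here are made up.
row = {
    "url": "https://example.com/",
    "titleStatus": "ok",
    "metaDescriptionStatus": "ok",
    "isNoindex": False,
    "h1Count": 1,
    "warnings": ["Missing Open Graph image"],
    "error": False,
}

problems = []
if row["error"]:
    problems.append("fetch failed")
if row["isNoindex"]:
    problems.append("page is noindex")
if row["h1Count"] != 1:
    problems.append(f"expected 1 H1, found {row['h1Count']}")
for field in ("titleStatus", "metaDescriptionStatus"):
    if row[field] != "ok":
        problems.append(f"{field} = {row[field]}")

verdict = "ok" if not problems else "; ".join(problems)
print(f"{row['url']}: {verdict} ({len(row['warnings'])} warning(s))")
```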

### API Usage

```bash
curl "https://api.apify.com/v2/acts/trovevault~seo-fields-scraper/runs" \
  -X POST \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": [{ "url": "https://apify.com/" }],
    "maxPages": 10,
    "crawlDepth": 1,
    "renderingMode": "BROWSER_FALLBACK",
    "sameDomainOnly": true
  }'
```

After the run finishes, download results from the default dataset URL in the run output or from the Apify Console.

### How to Use Browser Rendering

Use `HTTP_ONLY` for most websites. It is the default, fastest, and cheapest mode. It works when SEO tags are present in the raw HTML returned by the server, which is common for well-built marketing sites, content sites, and many server-rendered apps.

Use `BROWSER_FALLBACK` when you are not sure. The actor first tries HTTP, checks whether core metadata is present, and only opens Playwright when the raw HTML looks sparse. This is the best setting for mixed crawls because normal pages stay cheap while JavaScript-rendered pages get a second pass.

Use `BROWSER_ONLY` when you know the site renders metadata or headings in JavaScript. This opens a browser for every page, so it is slower and more expensive. Start with `maxPages: 1` to `5`, keep `crawlDepth: 0` for tests, and increase `browserWaitSecs` only if the site is slow to hydrate.

For protected websites such as Amazon, browser mode may still return empty or blocked data. Amazon often serves bot checks, alternate HTML, regional pages, or sparse responses to automation. Try `BROWSER_FALLBACK` or `BROWSER_ONLY` with Apify Proxy enabled and a very small page limit, but do not expect guaranteed extraction from strongly protected domains.
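A minimal test input for a strongly protected site might look like this (the `apifyProxyGroups` value assumes your plan includes residential proxies; adjust or omit it otherwise):

```json
{
  "startUrls": [
    { "url": "https://www.amazon.com/" }
  ],
  "maxPages": 1,
  "crawlDepth": 0,
  "renderingMode": "BROWSER_ONLY",
  "browserWaitSecs": 10,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```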

### Limitations

Browser rendering is available through Playwright, but it should be used deliberately because it costs more than HTTP extraction.

It does not perform keyword research, backlink analysis, Core Web Vitals testing, search ranking checks, or screenshot analysis. It focuses on metadata extraction and lightweight page-level QA.

Some websites block automated HTTP clients. If you see `BLOCKED`, retry with Apify Proxy enabled and keep `maxPages` small.

### Troubleshooting

If results are empty, check that the URL returns public HTML or XML and is not a PDF, image, or login page.

If many pages are blocked, enable Apify Proxy and reduce concurrency by using a smaller `maxPages` and `crawlDepth`.

If a canonical appears different from the final URL, inspect redirects and trailing slash behavior before treating it as an error.

If social metadata is missing, verify whether the site uses Open Graph, Twitter Card tags, or JavaScript-rendered metadata. Retry with `renderingMode: "BROWSER_FALLBACK"` before assuming the tags do not exist.

If the actor finds too many irrelevant URLs, keep `sameDomainOnly` enabled and start from a narrower section URL or sitemap.

### FAQ

#### Can it crawl a whole website?

It can crawl same-domain links up to the `maxPages` and `crawlDepth` limits. For very large websites, use a sitemap and a deliberate page cap.

#### Does it support sitemap XML?

Yes. Add a sitemap URL to `startUrls` and the actor will enqueue URLs from `<loc>` entries when they match the domain rules.

#### Does it use a browser?

Only when you ask it to. `HTTP_ONLY` never opens a browser. `BROWSER_FALLBACK` opens a browser only when the raw HTML is missing useful metadata. `BROWSER_ONLY` opens a browser for every page.

#### Can I monitor metadata changes?

Yes. Schedule the actor and compare datasets over time in your own workflow or append runs to a shared `datasetId`.
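One way to compare two scheduled runs in your own workflow can be sketched as follows (the `previous` and `current` dictionaries stand in for items fetched from two dataset exports, keyed by `url`; the values are illustrative):

```python
# Hedged sketch: diff metadata between two dataset exports, keyed by URL.
previous = {
    "https://example.com/": {"title": "Example - Home", "isNoindex": False},
    "https://example.com/pricing": {"title": "Pricing", "isNoindex": False},
}
current = {
    "https://example.com/": {"title": "Example - Home", "isNoindex": True},
    "https://example.com/pricing": {"title": "Pricing & Plans", "isNoindex": False},
}

changes = []
for url, now in current.items():
    before = previous.get(url)
    if before is None:
        changes.append((url, "new page"))
        continue
    for field in ("title", "isNoindex"):
        if before[field] != now[field]:
            changes.append((url, f"{field}: {before[field]!r} -> {now[field]!r}"))

for url, change in changes:
    print(url, change)
```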

#### What does `seoScore` mean?

It is a lightweight completeness score based on missing or weak metadata fields. Use it to prioritize QA, not as a universal ranking metric.

#### Will it respect external links?

By default, no. `sameDomainOnly` keeps the crawl focused on the start URL hostname.

#### Can it scrape blocked websites?

Sometimes. Enable Apify Proxy when a site blocks datacenter traffic, but always follow the target website's terms and applicable laws.

#### Should I use browser mode for Amazon?

You can try it, but Amazon is heavily protected and may still return sparse or blocked pages. Use Apify Proxy, `maxPages: 1`, `crawlDepth: 0`, and `BROWSER_ONLY` for a small test before running a larger job.

### Related Actors

Use this actor with product, catalog, review, or competitor intelligence actors when you need both page metadata and business data in the same workflow.

### Changelog

- `0.1` Initial release with HTTP-first metadata extraction, social fields, headings, scoring, structured errors, and dataset append support.

### Support

Open an issue from the Apify actor page or contact TroveVault with the run ID, input, and a short description of the page behavior you expected.

# Actor input schema

## `startUrls` (type: `array`):

One or more website URLs to audit for SEO metadata. The actor fetches each URL, extracts title, meta description, canonical, robots, headings, Open Graph, Twitter Card fields, and optionally follows same-domain HTML links. Accepts full public URLs such as `https://apify.com/` or sitemap URLs such as `https://apify.com/sitemap.xml`. Start with a homepage or sitemap for site audits, and use specific URLs for page QA or competitor checks.

## `maxPages` (type: `integer`):

Maximum number of pages to output across the run. The actor stops creating successful page rows after this cap, while still reporting structured errors for failed seed URLs. Use 3 to 10 for smoke tests and 25 to 200 for practical website audits. Higher values cover more pages but increase run time and request cost.

## `crawlDepth` (type: `integer`):

How many link levels to follow from each HTML start URL. `0` audits only the provided URLs, `1` also audits links found on those pages, and higher values crawl deeper within the same website when allowed. Recommended: `0` for known page lists, `1` for fast site samples, and `2` only when you need broader coverage.

## `requestTimeoutSecs` (type: `integer`):

HTTP timeout for each page or sitemap request. The actor uses this limit when fetching HTML and XML over the network. Use 20 to 30 seconds for most public websites, and increase it only for slow sites. Higher values reduce false timeout errors but make blocked or broken URLs take longer to finish.

## `renderingMode` (type: `string`):

Controls whether the actor uses only fast HTTP requests or opens a real browser for JavaScript-rendered pages. `HTTP_ONLY` is cheapest and best for most SEO audits because many sites put title, meta description, canonical, robots, Open Graph, and Twitter tags in raw HTML. `BROWSER_FALLBACK` tries HTTP first, then uses Playwright only when the raw HTML has sparse metadata; this is recommended for mixed websites. `BROWSER_ONLY` renders every page in a browser, which helps JavaScript-heavy sites but is slower and more expensive. Browser mode does not guarantee success on strongly protected websites such as Amazon; use Apify Proxy and small page limits for those tests.

## `browserWaitSecs` (type: `integer`):

Extra time to wait after a browser page reaches DOMContentLoaded when `renderingMode` uses Playwright. This gives JavaScript frameworks time to inject title, meta, canonical, social, or heading tags before extraction. Use 3 to 5 seconds for most rendered pages, 8 to 15 seconds for slow single-page apps, and keep it low for large crawls because every additional second increases runtime and cost. This field is ignored in `HTTP_ONLY` mode.

## `sameDomainOnly` (type: `boolean`):

When enabled, the actor follows only links on the same hostname as each start URL. This keeps website audits focused and prevents third-party social, payment, CDN, or documentation links from consuming the page limit. Disable it only when you deliberately want to audit linked external pages.

## `includeOpenGraph` (type: `boolean`):

When enabled, the actor extracts Open Graph fields such as `og:title`, `og:description`, `og:image`, `og:type`, and `og:url`. These fields are useful for social previews, content QA, and competitor metadata reviews. Disable this field only if you want a smaller dataset focused on core search metadata.

## `includeTwitterCards` (type: `boolean`):

When enabled, the actor extracts Twitter Card fields such as `twitter:title`, `twitter:description`, `twitter:image`, `twitter:card`, and `twitter:site`. These fields help validate social sharing previews and metadata consistency. Disable this field for simpler technical SEO exports.

## `includeHeadings` (type: `boolean`):

When enabled, the actor extracts the first H1, H1 count, and a small sample of H2 headings. Use it to catch missing or duplicated H1s and compare page titles against visible content. Disable it when you only need HTML head metadata fields.

## `proxyConfiguration` (type: `object`):

Proxy settings for website requests. The actor applies this configuration to HTML pages, sitemap files, and discovered links. Enable Apify Proxy, preferably residential groups, when the target website returns HTTP 403, 429, CAPTCHA, Cloudflare, or region-specific blocks. Leave it disabled for open websites to keep runs cheaper and simpler.

## `datasetId` (type: `string`):

ID of an existing Apify dataset to append results to, in addition to the default run dataset. Use this for SEO pipelines that combine several actor runs into one dataset. Leave blank to write only to the default run dataset.

## `runId` (type: `string`):

ID of an existing Apify actor run to associate results with. The actor copies this value into each dataset item so external workflows can connect metadata rows to an upstream audit, crawl, or scheduling job. Leave blank for standalone runs.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://apify.com/",
      "label": "Apify homepage"
    }
  ],
  "maxPages": 3,
  "crawlDepth": 1,
  "requestTimeoutSecs": 30,
  "renderingMode": "HTTP_ONLY",
  "browserWaitSecs": 5,
  "sameDomainOnly": true,
  "includeOpenGraph": true,
  "includeTwitterCards": true,
  "includeHeadings": true,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://apify.com/",
            "label": "Apify homepage"
        }
    ],
    "maxPages": 3
};

// Run the Actor and wait for it to finish
const run = await client.actor("trovevault/seo-fields-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{
            "url": "https://apify.com/",
            "label": "Apify homepage",
        }],
    "maxPages": 3,
}

# Run the Actor and wait for it to finish
run = client.actor("trovevault/seo-fields-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://apify.com/",
      "label": "Apify homepage"
    }
  ],
  "maxPages": 3
}' |
apify call trovevault/seo-fields-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=trovevault/seo-fields-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "SEO Fields Scraper",
        "description": "Extracts website SEO metadata with titles, descriptions, canonicals, robots tags, headings, Open Graph fields, and audit issues. Export data, run via API, schedule and monitor runs, or integrate with other tools.",
        "version": "0.1",
        "x-build-id": "ahnJB3gOF1m3H6AeI"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/trovevault~seo-fields-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-trovevault-seo-fields-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/trovevault~seo-fields-scraper/runs": {
            "post": {
                "operationId": "runs-sync-trovevault-seo-fields-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/trovevault~seo-fields-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-trovevault-seo-fields-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "One or more website URLs to audit for SEO metadata. The actor fetches each URL, extracts title, meta description, canonical, robots, headings, Open Graph, Twitter Card fields, and optionally follows same-domain HTML links. Accepts full public URLs such as `https://apify.com/` or sitemap URLs such as `https://apify.com/sitemap.xml`. Start with a homepage or sitemap for site audits, and use specific URLs for page QA or competitor checks.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxPages": {
                        "title": "Max Pages",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of pages to output across the run. The actor stops creating successful page rows after this cap, while still reporting structured errors for failed seed URLs. Use 3 to 10 for smoke tests and 25 to 200 for practical website audits. Higher values cover more pages but increase run time and request cost.",
                        "default": 25
                    },
                    "crawlDepth": {
                        "title": "Crawl Depth",
                        "minimum": 0,
                        "maximum": 5,
                        "type": "integer",
                        "description": "How many link levels to follow from each HTML start URL. `0` audits only the provided URLs, `1` also audits links found on those pages, and higher values crawl deeper within the same website when allowed. Recommended: `0` for known page lists, `1` for fast site samples, and `2` only when you need broader coverage.",
                        "default": 1
                    },
                    "requestTimeoutSecs": {
                        "title": "Request Timeout (seconds)",
                        "minimum": 5,
                        "maximum": 120,
                        "type": "integer",
                        "description": "HTTP timeout for each page or sitemap request. The actor uses this limit when fetching HTML and XML over the network. Use 20 to 30 seconds for most public websites, and increase it only for slow sites. Higher values reduce false timeout errors but make blocked or broken URLs take longer to finish.",
                        "default": 30
                    },
                    "renderingMode": {
                        "title": "Rendering Mode",
                        "enum": [
                            "HTTP_ONLY",
                            "BROWSER_FALLBACK",
                            "BROWSER_ONLY"
                        ],
                        "type": "string",
                        "description": "Controls whether the actor uses only fast HTTP requests or opens a real browser for JavaScript-rendered pages. `HTTP_ONLY` is cheapest and best for most SEO audits because many sites put title, meta description, canonical, robots, Open Graph, and Twitter tags in raw HTML. `BROWSER_FALLBACK` tries HTTP first, then uses Playwright only when the raw HTML has sparse metadata; this is recommended for mixed websites. `BROWSER_ONLY` renders every page in a browser, which helps JavaScript-heavy sites but is slower and more expensive. Browser mode does not guarantee success on strongly protected websites such as Amazon; use Apify Proxy and small page limits for those tests.",
                        "default": "HTTP_ONLY"
                    },
                    "browserWaitSecs": {
                        "title": "Browser Wait (seconds)",
                        "minimum": 0,
                        "maximum": 30,
                        "type": "integer",
                        "description": "Extra time to wait after a browser page reaches DOMContentLoaded when `renderingMode` uses Playwright. This gives JavaScript frameworks time to inject title, meta, canonical, social, or heading tags before extraction. Use 3 to 5 seconds for most rendered pages, 8 to 15 seconds for slow single-page apps, and keep it low for large crawls because every additional second increases runtime and cost. This field is ignored in `HTTP_ONLY` mode.",
                        "default": 5
                    },
                    "sameDomainOnly": {
                        "title": "Same Domain Only",
                        "type": "boolean",
                        "description": "When enabled, the actor follows only links on the same hostname as each start URL. This keeps website audits focused and prevents third-party social, payment, CDN, or documentation links from consuming the page limit. Disable it only when you deliberately want to audit linked external pages.",
                        "default": true
                    },
                    "includeOpenGraph": {
                        "title": "Include Open Graph",
                        "type": "boolean",
                        "description": "When enabled, the actor extracts Open Graph fields such as `og:title`, `og:description`, `og:image`, `og:type`, and `og:url`. These fields are useful for social previews, content QA, and competitor metadata reviews. Disable this field only if you want a smaller dataset focused on core search metadata.",
                        "default": true
                    },
                    "includeTwitterCards": {
                        "title": "Include Twitter Cards",
                        "type": "boolean",
                        "description": "When enabled, the actor extracts Twitter Card fields such as `twitter:title`, `twitter:description`, `twitter:image`, `twitter:card`, and `twitter:site`. These fields help validate social sharing previews and metadata consistency. Disable this field for simpler technical SEO exports.",
                        "default": true
                    },
                    "includeHeadings": {
                        "title": "Include Headings",
                        "type": "boolean",
                        "description": "When enabled, the actor extracts the first H1, H1 count, and a small sample of H2 headings. Use it to catch missing or duplicated H1s and compare page titles against visible content. Disable it when you only need HTML head metadata fields.",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Proxy settings for website requests. The actor applies this configuration to HTML pages, sitemap files, and discovered links. Enable Apify Proxy, preferably residential groups, when the target website returns HTTP 403, 429, CAPTCHA, Cloudflare, or region-specific blocks. Leave it disabled for open websites to keep runs cheaper and simpler.",
                        "default": {
                            "useApifyProxy": false
                        }
                    },
                    "datasetId": {
                        "title": "Dataset ID (optional)",
                        "type": "string",
                        "description": "ID of an existing Apify dataset to append results to, in addition to the default run dataset. Use this for SEO pipelines that combine several actor runs into one dataset. Leave blank to write only to the default run dataset."
                    },
                    "runId": {
                        "title": "Run ID (optional)",
                        "type": "string",
                        "description": "ID of an existing Apify Actor run to associate results with. The Actor copies this value into each dataset item so external workflows can connect metadata rows to an upstream audit, crawl, or scheduling job. Leave blank for standalone runs."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
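The `runsResponseSchema` above describes the envelope the Apify API returns when you start a run of this Actor. A minimal sketch of reading that envelope in Python — the payload below is illustrative sample data (placeholder IDs, not a real run), and the set of terminal statuses is an assumption based on common Apify run states:

```python
# Sketch: pulling the useful fields out of a run response whose shape
# follows the runsResponseSchema above. All IDs and values here are
# placeholders for illustration, not real run data.

sample_response = {
    "data": {
        "id": "RUN_ID_PLACEHOLDER",
        "status": "READY",
        "defaultDatasetId": "DATASET_ID_PLACEHOLDER",
        "defaultKeyValueStoreId": "KVS_ID_PLACEHOLDER",
        "usageTotalUsd": 0.00005,
        "options": {"memoryMbytes": 1024, "timeoutSecs": 300},
    }
}


def summarize_run(response: dict) -> dict:
    """Return the fields most integrations need from a run response."""
    run = response["data"]
    return {
        "run_id": run["id"],
        "status": run["status"],
        # Results land in this dataset unless a custom datasetId was passed.
        "dataset_id": run["defaultDatasetId"],
        "cost_usd": run.get("usageTotalUsd", 0.0),
        # Assumed terminal statuses; check the Apify API docs for the full list.
        "finished": run["status"] in {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"},
    }


summary = summarize_run(sample_response)
print(summary["dataset_id"], summary["status"])
```

Once `finished` is true and `status` is `SUCCEEDED`, fetch the SEO metadata rows from `dataset_id` via the dataset items endpoint or the `apify-client` library's `dataset(...).list_items()` call.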
