Indexability Audit
Pricing: $4.99/month + usage
Indexability audit tool that checks robots.txt, meta robots tags, X-Robots-Tag headers, and canonical URLs for any list of pages, so SEO teams know which ones Google can actually crawl and index.
Developer: ZeroBreak
Indexability Audit: Check Which Pages Search Engines Can Index
Indexability audit tool that scans any list of URLs and tells you which pages Google and other search engines can actually crawl and index. For each URL, it checks robots.txt rules, meta robots tags, X-Robots-Tag response headers, canonical tags, and HTTP status codes, then returns a pass or fail with the specific reason.
Run it before a site launch, after a migration, or when organic traffic drops and you need to know which pages have gone dark in search.
Use cases
- Site audits: scan hundreds of pages to confirm that product pages, blog posts, and landing pages are all indexable
- Pre-launch checks: verify pages before go-live so you don't ship with accidental noindex tags baked in
- Post-migration recovery: after a domain move or CMS switch, confirm canonical tags and robots.txt rules are set up correctly
- Traffic investigation: when rankings drop without warning, check whether key pages are still visible to search engines
- Ongoing monitoring: run on a schedule to catch noindex regressions before they affect rankings
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | | A single URL to audit. Use this, urls, or both. |
| urls | array | | A list of URLs to audit, one per line. |
| checkRobotsTxt | boolean | true | Fetch and check the robots.txt file for each domain. |
| maxUrls | integer | 100 | Maximum number of URLs to process per run. Hard cap: 1,000. |
| timeoutSecs | integer | 300 | Total actor timeout in seconds. |
| requestTimeoutSecs | integer | 30 | Per-request timeout in seconds. Increase for slow sites. |
| proxyConfiguration | object | Datacenter (Anywhere) | Proxy type and location for requests. Supports datacenter, residential, special, and custom proxies. Optional. |
Example input
```json
{
  "urls": [
    "https://apify.com",
    "https://apify.com/blog",
    "https://apify.com/about"
  ],
  "checkRobotsTxt": true,
  "maxUrls": 100,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": { "useApifyProxy": true }
}
```
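If you prefer to start runs programmatically, the example input above can be posted to Apify's run-sync-get-dataset-items API endpoint. A minimal stdlib sketch, assuming placeholder values for the actor ID and API token (substitute your own):

```python
import json

# Hypothetical placeholders -- substitute your own actor ID and API token.
ACTOR_ID = "<ACTOR_ID>"
API_TOKEN = "<YOUR_APIFY_TOKEN>"

run_input = {
    "urls": [
        "https://apify.com",
        "https://apify.com/blog",
        "https://apify.com/about",
    ],
    "checkRobotsTxt": True,
    "maxUrls": 100,
    "requestTimeoutSecs": 30,
    "proxyConfiguration": {"useApifyProxy": True},
}

# Apify's run-sync-get-dataset-items endpoint starts the run and
# returns the dataset items once the run finishes.
endpoint = (
    f"https://api.apify.com/v2/acts/{ACTOR_ID}"
    f"/run-sync-get-dataset-items?token={API_TOKEN}"
)
payload = json.dumps(run_input)
# POST `payload` to `endpoint` with Content-Type: application/json,
# e.g. via urllib.request or the apify-client package.
```

The same input works unchanged in the Apify Console editor; the API route is only needed for automation.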
What data does this actor extract?
The actor saves one result per URL in the dataset.
```json
{
  "url": "https://apify.com/blog",
  "finalUrl": "https://apify.com/blog/",
  "httpStatus": 200,
  "isIndexable": true,
  "indexabilityIssues": [],
  "metaRobotsContent": "",
  "xRobotsTag": "",
  "canonicalUrl": "https://apify.com/blog/",
  "isSelfCanonical": true,
  "robotsTxtBlocked": false,
  "redirectChain": [],
  "pageTitle": "Blog | Apify",
  "metaDescription": "News and tutorials about web scraping and automation.",
  "checkedAt": "2025-03-05T10:23:00.000Z"
}
```
| Field | Type | Description |
|---|---|---|
| url | string | Original URL from input |
| finalUrl | string | Final URL after following any redirects |
| httpStatus | integer | HTTP response status code (200, 301, 404, etc.) |
| isIndexable | boolean | Whether the page passes all indexability checks |
| indexabilityIssues | array | List of issues found. Empty if the page is indexable. |
| metaRobotsContent | string | Content of the meta robots tag, e.g. noindex, nofollow |
| xRobotsTag | string | Value of the X-Robots-Tag HTTP response header |
| canonicalUrl | string | Canonical URL from the `<link rel="canonical">` tag |
| isSelfCanonical | boolean | Whether the canonical tag points back to the same page |
| robotsTxtBlocked | boolean | Whether the URL path is blocked by robots.txt |
| redirectChain | array | List of intermediate URLs in any redirect chain |
| pageTitle | string | Content of the `<title>` tag |
| metaDescription | string | Content of the meta description tag |
| checkedAt | string | ISO 8601 timestamp of when the URL was checked |
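Once the dataset is downloaded, failing pages are easy to pull out of the results. A short sketch over sample items shaped like the fields above (the issue strings are illustrative, not the actor's exact wording):

```python
# Sample dataset items shaped like the documented output fields.
items = [
    {"url": "https://apify.com/blog", "isIndexable": True,
     "indexabilityIssues": []},
    {"url": "https://apify.com/old-page", "isIndexable": False,
     "indexabilityIssues": ["meta robots noindex"]},
]

# Collect only the failing pages, keyed by URL, with their reasons.
failures = {
    item["url"]: item["indexabilityIssues"]
    for item in items
    if not item["isIndexable"]
}

for url, issues in failures.items():
    print(f"{url}: {', '.join(issues)}")
```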
How it works
- Takes the input URL list, merges single and multi-URL inputs, and deduplicates
- Fetches each page with a standard browser user-agent, tracking redirects manually to capture the full chain
- Reads the X-Robots-Tag response header for noindex directives
- Parses the HTML to extract the meta robots tag, canonical tag, title, and meta description
- Optionally fetches robots.txt once per domain and checks whether the URL path is disallowed
- Marks a page as indexable if the HTTP status is under 400 and none of the above checks fail
- Saves the full result to the dataset
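The per-URL decision described above can be sketched with the Python standard library. This is a simplified stand-in, not the actor's implementation: the regex-based meta tag extraction replaces a real HTML parser, the canonical and redirect checks are omitted for brevity, and the issue strings are illustrative.

```python
import re
import urllib.robotparser

def indexability_issues(status, html, x_robots_tag, robots_txt, url):
    """Return the reasons a page is not indexable (empty list = indexable)."""
    issues = []
    # HTTP status must be under 400.
    if status >= 400:
        issues.append(f"HTTP {status}")
    # X-Robots-Tag response header, e.g. "noindex, nofollow".
    if "noindex" in (x_robots_tag or "").lower():
        issues.append("X-Robots-Tag noindex")
    # Meta robots tag -- a regex stand-in for a real HTML parser.
    m = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)',
        html, re.I)
    if m and "noindex" in m.group(1).lower():
        issues.append("meta robots noindex")
    # robots.txt disallow rules matching this URL's path.
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch("*", url):
        issues.append("blocked by robots.txt")
    return issues
```

A page with a meta robots noindex, for example, fails with that single reason even when its status is 200 and robots.txt allows the path.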
FAQ
What causes a page to fail the indexability check? Any of the following: HTTP 400+ status, meta robots noindex, X-Robots-Tag noindex, a canonical tag pointing to a different URL, or a robots.txt disallow rule matching the path.
Does it handle JavaScript-rendered pages? No. It fetches raw HTML. Most sites include the meta tags that matter for indexability in the initial server response, so this covers the majority of cases. If your site injects meta robots tags only via JavaScript, those will not be detected here.
How many URLs can I check per run? Default is 100. Raise it up to 1,000 with the maxUrls input. For larger audits, split the list across multiple runs.
Will target sites block my requests? This actor sends a standard Chrome user-agent. Most sites allow normal crawling. If a site blocks repeated requests, enabling a proxy in the input usually resolves it.
Can I audit URLs directly from a sitemap? Not directly. Extract the URLs from your sitemap first, then paste them into the urls input field.
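Extracting the URLs from a sitemap is a few lines of stdlib Python. A sketch using an inline sample sitemap (in practice, fetch yours with urllib.request first):

```python
import xml.etree.ElementTree as ET

# A sample sitemap; in practice, fetch yours with urllib.request.
sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://apify.com/</loc></url>
  <url><loc>https://apify.com/blog</loc></url>
</urlset>"""

# Sitemap files use the sitemaps.org namespace, so findall needs a prefix map.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap_xml)
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]
print(urls)  # paste these into the urls input field
```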
Does it detect noindex in HTTP headers as well as meta tags? Yes. It checks both the X-Robots-Tag HTTP response header and the HTML meta robots tag independently.
Integrations
Connect Indexability Audit with other apps and services using Apify integrations. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and many more. You can also use webhooks to trigger actions whenever results are available.