Noindex Directive Validator
Pricing
$2.99/month + usage
Noindex checker that scans URLs for meta robots and X-Robots-Tag headers, so SEO teams can find pages accidentally blocked from indexing before they drop out of search results.
Noindex Directive Validator: Check Any URL for Noindex Tags and Headers
The noindex directive validator checks URLs for indexing blocks that might be hiding pages from Google. Feed it a list of URLs and it reads the meta robots tag and the X-Robots-Tag HTTP header on each page, then tells you which ones have noindex set, along with the HTTP status code and the raw directive content.
It is most useful after a site migration, CMS update, or new deployment, when a stray noindex on a production page can silently drop it from search results. Catching these manually across dozens of pages is slow; this actor does it in bulk.
Use cases
- SEO audits: verify that no important pages are accidentally blocked from search indexing across an entire site section
- Post-migration checks: confirm that noindex tags used on staging have been removed before or after launch
- CMS validation: catch cases where a CMS update or plugin added noindex to pages it should not have
- Developer QA: run a quick crawlability check on a list of pages before publishing
- Ongoing monitoring: schedule regular runs to catch noindex regressions before they affect rankings
What data does this actor extract?
Each URL in the dataset includes:
```json
{
  "url": "https://apify.com/about",
  "finalUrl": "https://apify.com/about",
  "httpStatus": 200,
  "noindex": false,
  "noindexInMetaRobots": false,
  "noindexInXRobotsTag": false,
  "metaRobotsContent": "index, follow",
  "xRobotsTagContent": "",
  "pageTitle": "About Apify",
  "checkedAt": "2025-06-15T10:23:45.123456+00:00",
  "error": ""
}
```
| Field | Type | Description |
|---|---|---|
| url | string | The original URL submitted for checking |
| finalUrl | string | The URL after following any redirects |
| httpStatus | integer | HTTP status code (200, 301, 404, etc.) |
| noindex | boolean | True if noindex was found anywhere on the page |
| noindexInMetaRobots | boolean | True if noindex is in a `<meta name="robots">` or `<meta name="googlebot">` tag |
| noindexInXRobotsTag | boolean | True if noindex is in the X-Robots-Tag HTTP response header |
| metaRobotsContent | string | Full content of the meta robots tag, if present |
| xRobotsTagContent | string | Full value of the X-Robots-Tag header, if present |
| pageTitle | string | Page title from the `<title>` element |
| checkedAt | string | ISO 8601 timestamp of when the check ran |
| error | string | Error message if the request failed; empty on success |
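Once a run finishes, the dataset can be filtered on the boolean fields above to isolate problem pages. A minimal sketch (the sample records below are illustrative, not real output):

```python
# Sample dataset records in the shape documented above (values are made up).
records = [
    {"url": "https://example.com/a", "noindex": False, "httpStatus": 200},
    {"url": "https://example.com/b", "noindex": True,  "httpStatus": 200},
    {"url": "https://example.com/c", "noindex": False, "httpStatus": 404},
]

# Pages that carry a noindex directive and will drop out of the index.
blocked = [r["url"] for r in records if r["noindex"]]

# Pages that failed to load at all (4xx/5xx) and need a separate look.
errors = [r["url"] for r in records if r["httpStatus"] >= 400]

print(blocked)  # ['https://example.com/b']
print(errors)   # ['https://example.com/c']
```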
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | – | Single URL to check |
| urls | array | – | List of URLs to check, one per line |
| maxUrls | integer | 100 | Maximum number of URLs to process (up to 1,000) |
| requestTimeoutSecs | integer | 30 | Timeout per request in seconds |
| proxyConfiguration | object | Datacenter (Anywhere) | Optional. Proxy type and location to use for requests |
Example input
```json
{
  "urls": [
    "https://apify.com",
    "https://apify.com/about",
    "https://apify.com/pricing"
  ],
  "maxUrls": 100,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": { "useApifyProxy": true }
}
```
How it works
- Takes the submitted URLs, deduplicates them, and normalizes missing schemes to `https://`
- Fetches each URL using an HTTP client that follows redirects
- Reads the `X-Robots-Tag` response header and checks it for `noindex` or `none`
- Parses the HTML and looks for `<meta name="robots">` and `<meta name="googlebot">` tags with `noindex` or `none` in their content
- Pushes a result record per URL with the noindex status, raw directive values, HTTP status code, and page title
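The detection step above can be sketched as a pure function. This is an illustrative reimplementation using only the standard library, not the actor's actual code; `check_page` and `RobotsMetaParser` are names invented for this example:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if a.get("name", "").lower() in ("robots", "googlebot"):
            self.contents.append(a.get("content", ""))


def _has_noindex(directive_value: str) -> bool:
    # Directives are comma-separated tokens; "none" is shorthand for
    # "noindex, nofollow", so both tokens block indexing.
    tokens = {t.strip().lower() for t in directive_value.split(",")}
    return bool(tokens & {"noindex", "none"})


def check_page(html: str, x_robots_tag: str = "") -> dict:
    """Return the noindex flags for one page, given its HTML and header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    meta_hit = any(_has_noindex(c) for c in parser.contents)
    header_hit = _has_noindex(x_robots_tag)
    return {
        "noindex": meta_hit or header_hit,
        "noindexInMetaRobots": meta_hit,
        "noindexInXRobotsTag": header_hit,
    }


print(check_page('<meta name="robots" content="noindex, follow">'))
```

Note this sketch ignores user-agent-scoped header values such as `googlebot: noindex`, which a production checker would also need to handle.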
FAQ
Does this check the robots.txt file?
No. This actor checks page-level noindex directives only: the meta robots tag and the X-Robots-Tag header. Robots.txt controls crawling access, not indexing, and is a separate concern.
What counts as noindex?
Both noindex and none directives (as defined by Google) trigger the noindex flag. none means noindex and nofollow combined.
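In token terms, the rule above comes down to splitting the directive string on commas and checking for either token. A minimal sketch (the function name is invented for this example):

```python
def directive_blocks_indexing(value: str) -> bool:
    # Per Google's robots meta documentation, "noindex" blocks indexing
    # and "none" is shorthand for "noindex, nofollow".
    tokens = {t.strip().lower() for t in value.split(",")}
    return "noindex" in tokens or "none" in tokens


print(directive_blocks_indexing("index, follow"))  # False
print(directive_blocks_indexing("none"))           # True
```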
Does it check Google-specific noindex tags?
Yes. The actor checks both <meta name="robots"> and <meta name="googlebot"> tags.
What happens if a URL redirects?
The actor follows redirects and checks the final destination page. Both the original URL and the final URL are recorded in the output.
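The relationship between `url` and `finalUrl` can be illustrated with a toy resolver; the redirect map below stands in for real HTTP 3xx responses and the function name is invented for this example:

```python
# Hypothetical redirect map standing in for real HTTP 3xx responses.
REDIRECTS = {
    "http://apify.com": "https://apify.com",
    "https://apify.com/old-about": "https://apify.com/about",
}


def resolve_final_url(url: str, max_hops: int = 10) -> str:
    """Follow the redirect chain until a non-redirecting URL is reached."""
    while url in REDIRECTS and max_hops > 0:
        url = REDIRECTS[url]
        max_hops -= 1
    return url


print(resolve_final_url("https://apify.com/old-about"))  # https://apify.com/about
```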
How many URLs can I check per run?
Up to 1,000 URLs per run. Set the `maxUrls` input to control the limit.
Can I run this on a schedule?
Yes. Use Apify's scheduling feature to run the actor automatically at regular intervals and catch noindex regressions over time.
Integrations
Connect Noindex Directive Validator with other apps and services using Apify integrations. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and many more. You can also use webhooks to trigger actions whenever results are available.
Run the noindex checker before a site launch to confirm every page you want indexed is actually indexable.
