
Bulk SEO Data Extractor

Pricing

from $1.20 / 1,000 results


Extract the core on-page SEO signals from any public URL: title, meta tags, canonical, Open Graph/Twitter cards, JSON-LD schema, heading hierarchy, alt-text gaps, internal/external link counts, word count, and text-to-HTML ratio.


Developer

Thirdwatch


Maintained by Community


Bulk SEO Audit - Titles, Meta, Headings, Schema, Links

A bulk SEO meta tag and heading extractor: audit titles, meta descriptions, headings, canonicals, social cards, and schema across hundreds of URLs in one pass.

What you get

Paste a list of URLs and get a complete on-page SEO snapshot for each one: title, meta description, heading hierarchy, canonical, robots directive, Open Graph and Twitter social cards, JSON-LD schema markup, image alt-text gaps, internal/external link counts, and word count. Designed for SEO consultants, content auditors, and in-house SEO teams who need on-page data across hundreds of pages without firing up a desktop crawler.

Output fields

Field | Description
url | The URL you submitted.
final_url | The URL after following redirects.
status_code | HTTP status code (200, 301, 404, etc.).
title | Text of the <title> tag.
title_length | Character count of the title (Google typically truncates titles longer than about 60 characters).
meta_description | Content of the meta description tag.
meta_description_length | Character count of the meta description (Google typically truncates beyond about 160 characters).
meta_keywords | Content of the meta keywords tag (rarely used today).
canonical | The canonical URL declared by the page.
robots_meta | The meta robots directive (e.g. index,follow or noindex).
viewport | The viewport meta tag (a mobile-friendliness signal).
lang | The lang attribute on the <html> tag.
charset | Character set declared by the page.
h1_tags | List of unique H1 headings on the page.
h2_tags | List of unique H2 headings on the page.
h1_count | Total number of H1 tags (more than one is commonly flagged in SEO audits).
h2_count | Total number of H2 tags.
h3_count | Total number of H3 tags.
h4_count | Total number of H4 tags.
og | Open Graph tags (og:title, og:description, og:image, etc.) as a flat JSON object.
twitter | Twitter card tags (twitter:card, twitter:title, etc.) as a flat JSON object.
json_ld | Parsed JSON-LD schema blocks (Product, Article, FAQPage, BreadcrumbList, etc.).
images_total | Number of <img> tags on the page.
images_missing_alt | Number of images with no alt attribute (an accessibility and SEO gap).
internal_links | Number of links pointing to the same domain.
external_links | Number of links pointing to other domains.
nofollow_links | Number of links marked rel="nofollow".
word_count | Visible word count (script and style content excluded).
text_to_html_ratio | Word count divided by HTML size in bytes; low values flag thin or template-heavy pages.
response_time_ms | Time to fetch the page, in milliseconds.
content_length | Size of the HTML in bytes.
error | Error message on failure (null on success).
checked_at | ISO 8601 timestamp of when the audit ran.

Example output

{
  "url": "https://example.com",
  "final_url": "https://example.com/",
  "status_code": 200,
  "title": "Example Domain",
  "title_length": 14,
  "meta_description": "Illustrative example for documentation.",
  "meta_description_length": 39,
  "meta_keywords": "",
  "canonical": "https://example.com/",
  "robots_meta": "index,follow",
  "viewport": "width=device-width, initial-scale=1",
  "lang": "en",
  "charset": "utf-8",
  "h1_tags": ["Example Domain"],
  "h2_tags": ["More information"],
  "h1_count": 1,
  "h2_count": 1,
  "h3_count": 0,
  "h4_count": 0,
  "og": {
    "og:title": "Example Domain",
    "og:description": "Illustrative example for documentation.",
    "og:image": "https://example.com/og.png",
    "og:type": "website",
    "og:url": "https://example.com/",
    "og:site_name": "Example"
  },
  "twitter": {
    "twitter:card": "summary_large_image",
    "twitter:title": "Example Domain",
    "twitter:description": "Illustrative example for documentation.",
    "twitter:image": "https://example.com/og.png"
  },
  "json_ld": [
    {
      "@context": "https://schema.org",
      "@type": "WebSite",
      "name": "Example",
      "url": "https://example.com/"
    }
  ],
  "images_total": 4,
  "images_missing_alt": 1,
  "internal_links": 17,
  "external_links": 3,
  "nofollow_links": 2,
  "word_count": 412,
  "text_to_html_ratio": 0.0421,
  "response_time_ms": 184,
  "content_length": 9786,
  "error": null,
  "checked_at": "2026-05-04T12:00:00Z"
}

When a URL fails, the same shape is returned with error set to the failure message and the numeric fields zeroed out.
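Because failed rows keep the same shape with zeroed numbers, averaging a field such as word_count over every row would drag the result down. A minimal sketch of filtering on error first, assuming the results are already loaded as Python dicts (for example from a JSON dataset export); the records and the "timeout" message below are hand-written, not real actor output:

```python
from typing import Any


def split_results(items: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate successful audits from failed ones using the `error` field."""
    ok = [item for item in items if item.get("error") is None]
    failed = [item for item in items if item.get("error") is not None]
    return ok, failed


# Hand-written records for illustration; real rows carry every output field.
items = [
    {"url": "https://example.com", "status_code": 200, "word_count": 412, "error": None},
    {"url": "https://example.invalid", "status_code": 0, "word_count": 0, "error": "timeout"},
]
ok, failed = split_results(items)
avg_words = sum(row["word_count"] for row in ok) / len(ok)
print(len(ok), len(failed), avg_words)  # 1 1 412.0
```

Reporting the failed list separately also gives you a ready-made retry queue.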

Input parameters

Parameter | Required | Description
urls | Yes | List of page URLs to audit. Each URL produces one result.
timeoutSecs | No (default 20) | Maximum seconds to wait for each page (1–120). Pages that exceed it are reported with a timeout error.
concurrency | No (default 10) | Number of pages fetched in parallel (1–50).
userAgent | No | User-Agent header sent with each request. Override it to mimic a specific bot or browser.
proxyConfiguration | No | Optional Apify Proxy settings. Most public pages don't need a proxy.
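The parameter ranges above can be checked client-side before starting a run. A minimal sketch, assuming urls is a plain list of strings and that proxyConfiguration uses the common Apify shape {"useApifyProxy": true}; both are assumptions to confirm against the actor's input schema, and build_run_input is a hypothetical helper name:

```python
import json
from typing import Any, Optional


def build_run_input(
    urls: list[str],
    timeout_secs: int = 20,
    concurrency: int = 10,
    user_agent: Optional[str] = None,
    use_apify_proxy: bool = False,
) -> dict[str, Any]:
    """Assemble an input object, enforcing the documented ranges."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if not 1 <= timeout_secs <= 120:
        raise ValueError("timeoutSecs must be between 1 and 120")
    if not 1 <= concurrency <= 50:
        raise ValueError("concurrency must be between 1 and 50")
    run_input: dict[str, Any] = {
        "urls": urls,  # assumption: plain strings, not {"url": ...} objects
        "timeoutSecs": timeout_secs,
        "concurrency": concurrency,
    }
    if user_agent is not None:
        run_input["userAgent"] = user_agent
    if use_apify_proxy:
        # Assumption: the usual Apify proxy input shape.
        run_input["proxyConfiguration"] = {"useApifyProxy": True}
    return run_input


print(json.dumps(build_run_input(["https://example.com/", "https://example.com/about"], concurrency=5)))
# {"urls": ["https://example.com/", "https://example.com/about"], "timeoutSecs": 20, "concurrency": 5}
```

With the official Python apify-client package, a dict like this is what you would pass as run_input to client.actor(actor_id).call(); the actor ID is not given on this page.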

Use cases

  • SEO consultants running on-page audits — surface every page missing a title, meta description, or canonical, and rank H1 issues across the whole site.
  • Content auditors cleaning up legacy content — find thin pages by word count and text-to-HTML ratio, then prioritize rewrites.
  • In-house SEO teams monitoring schema coverage — verify every product page has Product schema, every article has Article schema, and every FAQ page has FAQ schema.
  • Accessibility leads finding images missing alt text — get a per-page count and prioritize the worst offenders.
  • Competitive researchers comparing social previews — pull Open Graph and Twitter cards from a competitor's blog to see how they package shares.
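Most of these use cases reduce to simple filters over the output rows. A minimal sketch of per-page issue flags plus schema-type coverage: the 60- and 160-character limits come from the field table, while the 300-word thin-content threshold and the "exactly one H1" rule are hypothetical audit conventions, not something the actor enforces:

```python
from typing import Any

TITLE_MAX = 60         # approximate truncation point from the field table
DESCRIPTION_MAX = 160  # approximate truncation point from the field table
THIN_WORDS = 300       # hypothetical thin-content threshold; tune per site


def audit_flags(row: dict[str, Any]) -> list[str]:
    """Return human-readable issues for one successful result row."""
    flags: list[str] = []
    if not row.get("title"):
        flags.append("missing title")
    elif row.get("title_length", 0) > TITLE_MAX:
        flags.append("title may be truncated")
    if not row.get("meta_description"):
        flags.append("missing meta description")
    elif row.get("meta_description_length", 0) > DESCRIPTION_MAX:
        flags.append("meta description may be truncated")
    if not row.get("canonical"):
        flags.append("missing canonical")
    h1_count = row.get("h1_count", 0)
    if h1_count != 1:
        flags.append(f"expected one H1, found {h1_count}")
    if row.get("word_count", 0) < THIN_WORDS:
        flags.append("possibly thin content")
    missing_alt = row.get("images_missing_alt", 0)
    if missing_alt:
        flags.append(f"{missing_alt} image(s) missing alt text")
    return flags


def schema_types(row: dict[str, Any]) -> set[str]:
    """Collect top-level @type values from json_ld blocks (@graph is not expanded)."""
    types: set[str] = set()
    for block in row.get("json_ld") or []:
        if not isinstance(block, dict):
            continue
        declared = block.get("@type")
        if isinstance(declared, str):
            types.add(declared)
        elif isinstance(declared, list):
            types.update(t for t in declared if isinstance(t, str))
    return types


# Fields copied from the example output above.
row = {
    "title": "Example Domain", "title_length": 14,
    "meta_description": "Illustrative example for documentation.", "meta_description_length": 39,
    "canonical": "https://example.com/", "h1_count": 1, "word_count": 412,
    "images_missing_alt": 1,
    "json_ld": [{"@context": "https://schema.org", "@type": "WebSite"}],
}
print(audit_flags(row))   # ['1 image(s) missing alt text']
print(schema_types(row))  # {'WebSite'}
```

For schema coverage, check that schema_types(row) contains "Product" on product URLs, "Article" on articles, and so on.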

Limitations

  • Captures the HTML the server returns. Pages that render their content entirely in the browser (heavy single-page apps) may show an empty title, missing headings, or zero links; for those sites, use a browser-based scraper instead.
  • Some sites rate-limit aggressive scraping. Enable the proxy option if you start getting 403 responses.
  • Word count is computed from visible text after removing script and style content. Inline SVG and unusual markup may slightly inflate or deflate the count.
  • JSON-LD blocks that are syntactically broken are skipped rather than failing the whole audit.
  • Authenticated pages are not supported; this is for public URLs only.
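To see why word counts can drift, here is a minimal sketch of the described approach (drop script and style content, split the rest on whitespace) and of text_to_html_ratio as the field table defines it (words divided by HTML bytes). It uses Python's standard html.parser and is an approximation, not the actor's own implementation; note that it counts <title> text, which the actor may handle differently:

```python
from html.parser import HTMLParser

SKIPPED_TAGS = ("script", "style")


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_count_and_ratio(html: str) -> tuple[int, float]:
    """Whitespace-split word count and words per HTML byte, rounded to 4 places."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    count = len(" ".join(parser.chunks).split())
    size = len(html.encode("utf-8"))
    ratio = round(count / size, 4) if size else 0.0
    return count, ratio


html = (
    "<html><head><title>Hi</title><style>p{color:red}</style></head>"
    "<body><p>One two three</p><script>var x = 1;</script></body></html>"
)
print(word_count_and_ratio(html)[0])  # 4 ("Hi" plus three body words)
```

Joining text nodes with spaces is a simplification: adjacent inline elements such as <b>one</b><i>two</i> count as two words here, which is exactly the kind of markup-dependent drift the limitation above describes.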

Compared to alternatives

Feature | Thirdwatch Bulk SEO Audit | Typical "SEO On-Page Analyzer" actors on Apify Store
Open Graph and Twitter cards as structured objects | Yes | Often title/description only
Parsed JSON-LD blocks | Yes (full JSON) | Often missing
Heading lists (not just counts) | Yes | Often counts only
Internal, external, and nofollow link counts | Yes | Often "total links" only
Image alt-text gap count | Yes | Sometimes
Text-to-HTML ratio (thin-content signal) | Yes | Rarely included
Concurrency control | Up to 50 in parallel | Often 5–10

FAQ

Does this respect robots.txt? This actor does not crawl: it sends one request per URL you supply. It uses a clearly identifiable User-Agent (ThirdwatchSEO/1.0) so site owners can recognize and rate-limit it if they want.
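If you want to honor robots.txt rules yourself, you can filter the URL list before submitting it. A minimal sketch using Python's standard urllib.robotparser, assuming you have already fetched the host's robots.txt as text; the default user agent is the one named in the answer above, so pass your own value if you override userAgent:

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, urls: list[str], user_agent: str = "ThirdwatchSEO/1.0") -> list[str]:
    """Keep only the URLs that robots_txt permits for user_agent.

    robots.txt applies per host, so pass URLs from a single host only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


robots_txt = "User-agent: *\nDisallow: /private/\n"
print(allowed_urls(robots_txt, ["https://example.com/", "https://example.com/private/report"]))
# ['https://example.com/']
```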

What's the rate limit? You control it via concurrency (max 50). For sites you don't own, keep it low and enable the proxy option if you hit 429 or 403.

Can I audit pages behind a login? No — this actor does not handle authentication, cookies, or session headers. It's designed for public URLs.

Why are some fields empty on JavaScript-heavy sites? Single-page apps render content in the browser after the initial HTML loads. This actor reads the server-rendered HTML only. For React/Vue/Angular sites with no server-side rendering, use a browser-based audit tool instead.

Does it parse every kind of schema markup? No, only JSON-LD: every <script type="application/ld+json"> block is parsed and returned as structured data. Microdata and RDFa are not parsed.
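The JSON-LD-only behavior, including skipping broken blocks, can be illustrated with Python's standard library. This is a sketch of the general technique, not the actor's code; like the actor, it ignores Microdata and RDFa:

```python
import json
from html.parser import HTMLParser
from typing import Any


class JsonLdParser(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks, skipping broken ones."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.blocks: list[Any] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # a syntactically broken block is dropped, not fatal


html = (
    '<script type="application/ld+json">{"@type": "WebSite", "name": "Example"}</script>'
    '<script type="application/ld+json">{"@type": "Article",</script>'
    '<script>console.log("not JSON-LD")</script>'
)
parser = JsonLdParser()
parser.feed(html)
parser.close()
print(parser.blocks)  # [{'@type': 'WebSite', 'name': 'Example'}]
```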

SEO meta tags and headings extractor for site-wide audits

Run a full on-page SEO audit across an entire site in minutes. Pair this with the Sitemap URL Extractor to pull every URL on the site, then pipe the list into this actor for a complete title/meta/heading/schema audit.
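The Sitemap URL Extractor's output format is not described here. If you only have a standard sitemap.xml, a stdlib sketch like the following turns it into a de-duplicated list you can pass as urls; it handles <urlset> files only and does not expand sitemap index files:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return <loc> values from a <urlset> sitemap, de-duplicated in document order."""
    root = ET.fromstring(xml_text)
    locs = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    return list(dict.fromkeys(locs))


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc> https://example.com/about </loc></url>"
    "<url><loc>https://example.com/</loc></url>"
    "</urlset>"
)
print(urls_from_sitemap(sitemap))  # ['https://example.com/', 'https://example.com/about']
```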

For uptime, redirect, and SSL checks across the same URL list, combine with the Bulk URL Status Checker.

Built and maintained by Thirdwatch.

Last verified: 2026-05