Pricing

from $1.20 / 1,000 results

Bulk SEO Data Extractor

Extract every on-page SEO signal from any URL: title, meta tags, canonical, OG/Twitter cards, JSON-LD schema, heading hierarchy, alt-text gaps, internal/external link counts, word count, text-to-HTML ratio.

Developer

Thirdwatch

Last modified

11 days ago

Bulk SEO Audit - Titles, Meta, Headings, Schema, Links

Thirdwatch portfolio: 5K users across 88 public Actors, 2M+ records delivered, and >99% run success. Explore all Thirdwatch Actors.

The fastest bulk SEO meta tags and headings extractor — audit titles, meta descriptions, headings, canonicals, social cards, and schema across hundreds of URLs in one pass.

What you get

Paste a list of URLs and get a complete on-page SEO snapshot for each one: title, meta description, heading hierarchy, canonical, robots directive, Open Graph and Twitter social cards, schema markup, image alt-text gaps, internal/external link counts, and word count. Designed for SEO consultants, content auditors, and in-house SEO teams who need real on-page data across hundreds of pages without firing up a desktop crawler.

Output fields

Field	Description
`url`	The URL you submitted.
`final_url`	The URL after following redirects.
`status_code`	HTTP status code (200, 301, 404, etc.).
`title`	The `<title>` tag text.
`title_length`	Character count of the title (Google typically truncates titles longer than ~60 characters).
`meta_description`	The meta description tag.
`meta_description_length`	Character count of the meta description (Google typically truncates descriptions longer than ~160 characters).
`meta_keywords`	The meta keywords tag (rarely used today).
`canonical`	The canonical URL declared by the page.
`robots_meta`	The meta robots directive (e.g. `index,follow`, `noindex`).
`viewport`	The viewport meta tag (mobile-friendliness signal).
`lang`	The `lang` attribute on the `<html>` tag.
`charset`	Character set declared by the page.
`h1_tags`	List of unique H1 headings on the page.
`h2_tags`	List of unique H2 headings on the page.
`h1_count`	Total H1 tags (more than one is an SEO smell).
`h2_count`	Total H2 tags.
`h3_count`	Total H3 tags.
`h4_count`	Total H4 tags.
`og`	Open Graph tags (`og:title`, `og:description`, `og:image`, etc.) as a flat dict.
`twitter`	Twitter card tags (`twitter:card`, `twitter:title`, etc.) as a flat dict.
`json_ld`	Parsed schema markup blocks (Product, Article, FAQ, Breadcrumb, etc.).
`images_total`	Total `<img>` tags on the page.
`images_missing_alt`	Images with no `alt` attribute (accessibility + SEO gap).
`internal_links`	Links pointing back to the same domain.
`external_links`	Links pointing to other domains.
`nofollow_links`	Links marked `rel="nofollow"`.
`word_count`	Visible word count (script and style content excluded).
`text_to_html_ratio`	Word count divided by HTML byte size — low values flag thin or template-heavy pages.
`response_time_ms`	Time to fetch the page in milliseconds.
`content_length`	HTML byte size.
`error`	Error message on failure (`null` on success).
`checked_at`	ISO timestamp when the audit ran.
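
The fields above can be turned into a simple issue list per page. The following is a minimal sketch, not part of the actor: the function name and the issue labels are hypothetical, and the 60 and 160 character limits are the approximate truncation points quoted in the table.

```python
from typing import Any

TITLE_LIMIT = 60          # approximate truncation point quoted in the table
DESCRIPTION_LIMIT = 160   # approximate truncation point quoted in the table


def flag_on_page_issues(record: dict[str, Any]) -> list[str]:
    """Return short issue labels for one result record (labels are illustrative)."""
    if record.get("error"):
        return ["fetch_failed"]

    issues: list[str] = []
    if not record.get("title"):
        issues.append("missing_title")
    elif record.get("title_length", 0) > TITLE_LIMIT:
        issues.append("long_title")

    if not record.get("meta_description"):
        issues.append("missing_meta_description")
    elif record.get("meta_description_length", 0) > DESCRIPTION_LIMIT:
        issues.append("long_meta_description")

    if not record.get("canonical"):
        issues.append("missing_canonical")

    h1_count = record.get("h1_count", 0)
    if h1_count == 0:
        issues.append("missing_h1")
    elif h1_count > 1:
        issues.append("multiple_h1")

    if record.get("images_missing_alt", 0) > 0:
        issues.append("images_missing_alt")
    if "noindex" in (record.get("robots_meta") or "").lower():
        issues.append("noindex")
    return issues
```

Running this over every record in a dataset gives a per-URL list that can be sorted by number of issues to decide which pages to fix first.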

Example output

{
  "url": "https://example.com",
  "final_url": "https://example.com/",
  "status_code": 200,
  "title": "Example Domain",
  "title_length": 14,
  "meta_description": "Illustrative example for documentation.",
  "meta_description_length": 39,
  "meta_keywords": "",
  "canonical": "https://example.com/",
  "robots_meta": "index,follow",
  "viewport": "width=device-width, initial-scale=1",
  "lang": "en",
  "charset": "utf-8",
  "h1_tags": ["Example Domain"],
  "h2_tags": ["More information"],
  "h1_count": 1,
  "h2_count": 1,
  "h3_count": 0,
  "h4_count": 0,
  "og": {
    "og:title": "Example Domain",
    "og:description": "Illustrative example for documentation.",
    "og:image": "https://example.com/og.png",
    "og:type": "website",
    "og:url": "https://example.com/",
    "og:site_name": "Example"
  },
  "twitter": {
    "twitter:card": "summary_large_image",
    "twitter:title": "Example Domain",
    "twitter:description": "Illustrative example for documentation.",
    "twitter:image": "https://example.com/og.png"
  },
  "json_ld": [
    {
      "@context": "https://schema.org",
      "@type": "WebSite",
      "name": "Example",
      "url": "https://example.com/"
    }
  ],
  "images_total": 4,
  "images_missing_alt": 1,
  "internal_links": 17,
  "external_links": 3,
  "nofollow_links": 2,
  "word_count": 412,
  "text_to_html_ratio": 0.0421,
  "response_time_ms": 184,
  "content_length": 9786,
  "error": null,
  "checked_at": "2026-05-04T12:00:00Z"
}

When a URL fails, the same shape is returned with `error` set and the numeric fields zeroed out.
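
Because failed rows carry zeroed numeric fields, it helps to separate them before computing averages or thresholds. The sketch below assumes the results have already been loaded as a list of dicts; the helper names are hypothetical, and rounding the ratio to four decimals is an assumption based on the example output.

```python
from typing import Any, Iterable


def split_results(items: Iterable[dict[str, Any]]):
    """Separate successful audits from failed ones using the `error` field."""
    succeeded, failed = [], []
    for item in items:
        (failed if item.get("error") else succeeded).append(item)
    return succeeded, failed


def recompute_ratio(item: dict[str, Any]) -> float:
    """Word count divided by HTML byte size; 0.0 when the byte size is zero."""
    size = item.get("content_length", 0)
    if not size:
        return 0.0
    return round(item.get("word_count", 0) / size, 4)
```

With the example record above, `recompute_ratio` reproduces the documented `text_to_html_ratio`: 412 words over 9,786 bytes is about 0.0421.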

Input parameters

Parameter	Required	Description
`urls`	Yes	List of page URLs to audit. Each URL produces one result.
`timeoutSecs`	No (default `20`)	Maximum seconds to wait for each page (1–120). Slow pages report a `timeout` error.
`concurrency`	No (default `10`)	How many pages to fetch in parallel (1–50).
`userAgent`	No	The User-Agent header sent with each request. Override to mimic a specific bot or browser.
`proxyConfiguration`	No	Optional Apify proxy. Most public pages don't need one.
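
As an illustration, the parameters above could be assembled and range-checked before starting a run. This is a sketch under assumptions: `build_run_input` is a hypothetical helper, `urls` is assumed to be a plain list of strings, and `proxyConfiguration` is left out because its exact shape is not documented here.

```python
from typing import Any, Optional


def build_run_input(
    urls: list[str],
    timeout_secs: int = 20,
    concurrency: int = 10,
    user_agent: Optional[str] = None,
) -> dict[str, Any]:
    """Validate values against the documented ranges and build the input dict."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("urls is required and must contain at least one URL")
    if not 1 <= timeout_secs <= 120:
        raise ValueError("timeoutSecs must be between 1 and 120")
    if not 1 <= concurrency <= 50:
        raise ValueError("concurrency must be between 1 and 50")

    run_input: dict[str, Any] = {
        "urls": cleaned,
        "timeoutSecs": timeout_secs,
        "concurrency": concurrency,
    }
    # Only send a User-Agent override when one was requested.
    if user_agent:
        run_input["userAgent"] = user_agent
    return run_input
```

Validating locally means an out-of-range `concurrency` or `timeoutSecs` is caught before a run is started.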

Use cases

SEO consultants running on-page audits — surface every page missing a title, meta description, or canonical, and rank H1 issues across the whole site.
Content auditors cleaning up legacy content — find thin pages by word count and text-to-HTML ratio, then prioritize rewrites.
In-house SEO teams monitoring schema coverage — verify every product page has Product schema, every article has Article schema, and every FAQ page has FAQ schema.
Accessibility leads finding images missing alt text — get a per-page count and prioritize the worst offenders.
Competitive researchers comparing social previews — pull Open Graph and Twitter cards from a competitor's blog to see how they package shares.
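
The schema-coverage use case can be sketched as a check over the `json_ld` field. The helper names below are hypothetical, and the handling of `@graph` wrappers and list-valued `@type` reflects common JSON-LD patterns rather than anything this actor guarantees.

```python
from typing import Any


def schema_types(record: dict[str, Any]) -> set[str]:
    """Collect every @type value found in a record's json_ld blocks."""
    found: set[str] = set()
    stack = list(record.get("json_ld") or [])
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            # A single script block may contain a top-level array.
            stack.extend(node)
            continue
        if not isinstance(node, dict):
            continue
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        # Some sites wrap several entities in an @graph array.
        graph = node.get("@graph")
        if isinstance(graph, list):
            stack.extend(graph)
    return found


def pages_missing_type(records: list[dict[str, Any]], required: str) -> list[str]:
    """Return URLs of successfully fetched pages that lack the required type."""
    return [
        r["url"]
        for r in records
        if not r.get("error") and required not in schema_types(r)
    ]
```

For example, `pages_missing_type(product_pages, "Product")` would list the product URLs that still need Product schema.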

Limitations

Captures the HTML the server returns. Pages that render their content entirely in the browser (heavy single-page apps) may show empty title, missing headings, or zero links — for those sites, use a browser-based scraper instead.
Some sites rate-limit aggressive scraping. Use the proxy option if you start hitting 403s.
Word count is computed from visible text after removing script and style content. Inline SVG and obscure HTML may slightly inflate or reduce the count.
Schema markup blocks that are syntactically broken are skipped rather than failing the whole audit.
Authenticated pages are not supported — this is for public URLs only.
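
One way to spot pages affected by the first limitation is a rough heuristic over the output fields. The sketch below is an assumption, not a feature of the actor: the function name and the "two of three signals" rule are hypothetical and should be tuned on real data.

```python
from typing import Any


def looks_client_rendered(record: dict[str, Any]) -> bool:
    """Guess whether a page returned an empty HTML shell that fills in via JavaScript."""
    # Failed fetches and non-200 responses are a different problem.
    if record.get("error") or record.get("status_code") != 200:
        return False

    no_title = not record.get("title")
    no_headings = record.get("h1_count", 0) == 0 and record.get("h2_count", 0) == 0
    no_links = (
        record.get("internal_links", 0) + record.get("external_links", 0) == 0
    )
    # Hypothetical rule: flag the page when at least two signals are empty.
    return sum([no_title, no_headings, no_links]) >= 2
```

URLs flagged this way are candidates for a browser-based scraper rather than evidence of a real on-page problem.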

Compared to alternatives

Feature	Thirdwatch Bulk SEO Audit	Typical "SEO On-Page Analyzer" actors on Apify Store
Open Graph + Twitter cards as structured dicts	Yes	Often title/description only
Parsed schema markup blocks	Yes (full JSON)	Often missing
Heading lists (not just counts)	Yes	Often only counts
Internal vs external vs nofollow link counts	Yes	Often "total links" only
Image alt-text gap count	Yes	Sometimes
Text-to-HTML ratio (thin-content signal)	Yes	Rarely included
Concurrency control	Up to 50 parallel	Often 5–10

FAQ

Does this respect robots.txt? This actor sends one request per URL and does not crawl. It uses a clearly identifiable User-Agent (ThirdwatchSEO/1.0) so site owners can recognize and rate-limit it if they want.

What's the rate limit? You control it via concurrency (max 50). For sites you don't own, keep it low and enable the proxy option if you hit 429 or 403.

Can I audit pages behind a login? No — this actor does not handle authentication, cookies, or session headers. It's designed for public URLs.

Why are some fields empty on JavaScript-heavy sites? Single-page apps render content in the browser after the initial HTML loads. This actor reads the server-rendered HTML only. For React/Vue/Angular sites with no server-side rendering, use a browser-based audit tool instead.

Does it parse every kind of schema markup? No, only JSON-LD. Every well-formed <script type="application/ld+json"> block is parsed and returned as structured data; syntactically broken blocks are skipped. Microdata and RDFa are not parsed.

SEO meta tags and headings extractor for site-wide audits

Run a full on-page SEO audit across an entire site in minutes. Pair this with the Sitemap URL Extractor to pull every URL on the site, then pipe the list into this actor for a complete title/meta/heading/schema audit.
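
Since each URL produces one billed result, it can be worth deduplicating a sitemap-derived list before submitting it. The sketch below is illustrative only: both helper names are hypothetical, and the cost figure is a lower bound based on the listed "from $1.20 / 1,000 results" price.

```python
from typing import Iterable

PRICE_PER_1000_RESULTS = 1.20  # listed starting price in USD; actual price may be higher


def unique_urls(urls: Iterable[str]) -> list[str]:
    """Drop blank entries and exact duplicates while keeping the original order."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result


def estimate_min_cost(url_count: int) -> float:
    """Lower-bound cost in USD for auditing `url_count` URLs at the listed price."""
    return round(url_count * PRICE_PER_1000_RESULTS / 1000, 2)
```

A site with 2,500 unique URLs would therefore cost at least $3.00 to audit at the listed starting price.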

For uptime, redirect, and SSL checks across the same URL list, combine with the Bulk URL Status Checker.

Built and maintained by Thirdwatch.

Last verified: 2026-05
