SEO Data Extractor
Pricing
from $2.00 / 1,000 results
Extract comprehensive SEO metadata, headings, links, images, Open Graph tags, Twitter Cards, and technical data from websites. Perfect for SEO audits, competitor analysis, and content optimization. Runs on Apify platform with structured JSON output.
Developer: No-Code Venture
A comprehensive SEO data extraction tool that runs on the Apify platform.
Features
Extract comprehensive SEO data from any webpage including:
- Meta Information: Title, description, keywords, robots directives, canonical URLs, author, and generator tags with length counts
- Heading Structure: All H1-H6 tags with text content and counts for each level
- Content Analysis: Word count, link analysis (total/internal/external), and image audit (total/without alt text)
- Open Graph Tags: Complete Open Graph metadata (title, description, image, URL, type, site name)
- Twitter Cards: Twitter Card metadata for social sharing
- Technical SEO: Status codes, response time, charset, language, viewport settings
- Structured Data: JSON-LD detection and schema type identification
- Branding Assets: Favicon, Apple touch icon, and theme color detection
- Sitemap Extraction: Optionally fetch and include all URLs from each domain's sitemap.xml
- Error Handling: Graceful handling of HTTP errors (404, 500, etc.) with proper error codes and messages
Use Cases
- SEO Monitoring: Track SEO data for your websites or competitors over time
- Content Analysis: Analyze meta tags to optimize webpage content for search engines
- SEO Audits: Collect data for comprehensive SEO audits across multiple pages
- Competitor Analysis: Track SEO data for your competitors
- Bulk Data Extraction: Process 1 to 100,000+ pages efficiently
Input Configuration
| Field | Type | Description | Default |
|---|---|---|---|
| `startUrls` | Array | List of URLs to extract SEO data from | `https://nocodeventure.com` |
| `extractSitemapUrls` | Boolean | Fetch and include sitemap data for each domain | `false` |
| `sitemapUrl` | String | Custom sitemap path (e.g., `sitemap_index.xml` or `/sitemaps/main.xml`) | `/sitemap.xml` |
| `maxRequestsPerCrawl` | Integer | Maximum pages to scrape (0 = unlimited) | 100 |
| `requestTimeout` | Integer | Request timeout in seconds (3-10) | 5 |
| `maxConcurrency` | Integer | Parallel requests (1-50) | 10 |
| `maxRequestRetries` | Integer | Max retries for failed requests (0-5) | 1 |
| `proxyConfiguration` | Object | Proxy settings for anti-blocking | Apify Proxy disabled |
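For example, a minimal run input might look like the following (values are illustrative; whether `startUrls` entries are plain strings or `{ "url": ... }` objects depends on the Actor's input schema — Apify Actors commonly use the object form):

```json
{
  "startUrls": [{ "url": "https://example.com" }],
  "extractSitemapUrls": true,
  "sitemapUrl": "/sitemap.xml",
  "maxRequestsPerCrawl": 100,
  "requestTimeout": 5,
  "maxConcurrency": 10,
  "maxRequestRetries": 1
}
```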
Output Schema
The Actor returns structured JSON data with the following fields:
| Field | Type | Description |
|---|---|---|
| `url` | String | The URL that was scraped |
| `scrapedAt` | String | ISO 8601 timestamp of when the page was scraped |
| `error` | String (optional) | Error code if scraping failed (e.g., "404", "500", "REQUEST_FAILED") |
| `errorMessage` | String (optional) | Human-readable error message |
Meta Information (meta)
| Field | Type | Description |
|---|---|---|
| `title` | String | Page title from the `<title>` tag |
| `titleLength` | Number | Character count of the title |
| `description` | String | Meta description content |
| `descriptionLength` | Number | Character count of the description |
| `keywords` | String | Meta keywords content |
| `robots` | String | Robots meta directive (e.g., "index, follow") |
| `canonical` | String | Canonical URL from meta tag |
| `author` | String | Author meta tag content |
| `generator` | String | Generator meta tag content |
Headings (headings)
| Field | Type | Description |
|---|---|---|
| `h1.text` | String | Combined text content of all H1 tags |
| `h1.count` | Number | Number of H1 tags found |
| `h2.text` | String | Combined text content of all H2 tags |
| `h2.count` | Number | Number of H2 tags found |
| `h3.text` | String | Combined text content of all H3 tags |
| `h3.count` | Number | Number of H3 tags found |
| `h4.text` | String | Combined text content of all H4 tags |
| `h4.count` | Number | Number of H4 tags found |
| `h5.text` | String | Combined text content of all H5 tags |
| `h5.count` | Number | Number of H5 tags found |
| `h6.text` | String | Combined text content of all H6 tags |
| `h6.count` | Number | Number of H6 tags found |
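The Actor itself runs on Apify (the repository referenced later is TypeScript), but the heading aggregation above can be sketched in plain Python to show the shape of the data — combined text plus a per-level count:

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect text and counts for h1-h6, mirroring the Actor's headings output."""
    def __init__(self):
        super().__init__()
        self.headings = {f"h{i}": {"text": [], "count": 0} for i in range(1, 7)}
        self._open = None  # heading tag currently being read

    def handle_starttag(self, tag, attrs):
        if tag in self.headings:
            self._open = tag
            self.headings[tag]["count"] += 1

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open and data.strip():
            self.headings[self._open]["text"].append(data.strip())

collector = HeadingCollector()
collector.feed("<h1>Home</h1><h2>About</h2><h2>Contact</h2>")
result = {tag: {"text": " ".join(v["text"]), "count": v["count"]}
          for tag, v in collector.headings.items()}
# result["h1"] → {"text": "Home", "count": 1}
# result["h2"] → {"text": "About Contact", "count": 2}
```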
Open Graph Tags (openGraph)
| Field | Type | Description |
|---|---|---|
| `title` | String | Open Graph title |
| `description` | String | Open Graph description |
| `image` | String | Open Graph image URL |
| `url` | String | Open Graph URL |
| `type` | String | Open Graph type (e.g., "website", "article") |
| `siteName` | String | Open Graph site name |
Twitter Cards (twitterCard)
| Field | Type | Description |
|---|---|---|
| `card` | String | Twitter card type (e.g., "summary", "summary_large_image") |
| `title` | String | Twitter card title |
| `description` | String | Twitter card description |
| `image` | String | Twitter card image URL |
| `site` | String | Twitter site handle |
Content Analysis (content)
| Field | Type | Description |
|---|---|---|
| `wordCount` | Number | Total word count in page body |
| `links.total` | Number | Total number of links found |
| `links.internal` | Number | Number of internal links (same domain) |
| `links.external` | Number | Number of external links (different domain) |
| `images.total` | Number | Total number of images found |
| `images.withoutAlt` | Number | Number of images missing alt text |
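One plausible definition of the internal/external split above is a hostname comparison against the scraped page's URL (relative links count as internal). A small Python sketch of that classification, not the Actor's actual implementation:

```python
from urllib.parse import urlparse

def classify_links(page_url, hrefs):
    """Split links into internal/external by hostname, matching the
    links.total / links.internal / links.external fields."""
    page_host = urlparse(page_url).hostname
    counts = {"total": 0, "internal": 0, "external": 0}
    for href in hrefs:
        counts["total"] += 1
        host = urlparse(href).hostname
        # Relative URLs have no hostname and resolve to the same domain.
        if host is None or host == page_host:
            counts["internal"] += 1
        else:
            counts["external"] += 1
    return counts

classify_links("https://example.com/page",
               ["/about", "https://example.com/x", "https://other.com/y"])
# → {"total": 3, "internal": 2, "external": 1}
```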
Technical SEO (technical)
| Field | Type | Description |
|---|---|---|
| `statusCode` | Number | HTTP response status code |
| `responseTime` | Number | Response time in milliseconds |
| `charset` | String | Character encoding (e.g., "UTF-8") |
| `language` | String | Page language from HTML lang attribute |
| `viewport` | String | Viewport meta tag content |
| `structuredData.hasStructuredData` | Boolean | Whether JSON-LD structured data was found |
| `structuredData.types` | Array | Array of structured data schema types found |
Branding Assets (branding)
| Field | Type | Description |
|---|---|---|
| `favicon` | String | Favicon URL |
| `appleTouchIcon` | String | Apple touch icon URL |
| `themeColor` | String | Theme color meta tag content |
Sitemap Data (sitemap) - Optional
Note: This field is only included when `extractSitemapUrls` is enabled. If the page scrape fails (HTTP error or request failure), the `sitemap` object will not be included in the output.
| Field | Type | Description |
|---|---|---|
| `found` | Boolean | Whether a sitemap was found and parsed |
| `sitemapUrl` | String | The sitemap URL that was fetched |
| `isKnownPath` | Boolean | Whether a known/custom sitemap path was used (see below) |
| `urlCount` | Number | Total number of URLs found in the sitemap |
| `urls` | Array | List of all URLs from the sitemap |
| `error` | String (optional) | Error message if sitemap fetch failed |
Example output with sitemap enabled:
{"url": "https://example.com","meta": { ... },"sitemap": {"found": true,"sitemapUrl": "https://example.com/sitemap.xml","isKnownPath": false,"urlCount": 156,"urls": ["https://example.com/","https://example.com/about","https://example.com/contact",...]},"scrapedAt": "2025-12-12T10:00:00.000Z"}
Sitemap caching: If you have multiple URLs from the same domain, the sitemap is only fetched once and reused for all pages from that domain.
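The per-domain caching described above can be sketched as a thin wrapper around whatever fetch function actually retrieves the sitemap (illustrative Python, with a fake fetcher standing in for the real HTTP request):

```python
from urllib.parse import urlparse

def make_sitemap_fetcher(fetch):
    """Wrap a sitemap fetch with a per-domain cache: each domain's sitemap
    is fetched once, then reused for every page from that domain."""
    cache = {}
    def get_sitemap(page_url):
        domain = urlparse(page_url).hostname
        if domain not in cache:
            cache[domain] = fetch(domain)
        return cache[domain]
    return get_sitemap

fetch_log = []
def fake_fetch(domain):
    fetch_log.append(domain)  # record how many real fetches happen
    return {"found": True, "sitemapUrl": f"https://{domain}/sitemap.xml"}

get_sitemap = make_sitemap_fetcher(fake_fetch)
get_sitemap("https://example.com/a")
get_sitemap("https://example.com/b")  # cache hit: no second fetch
# fetch_log → ["example.com"]
```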
Known Sitemap Paths
Some websites don't use the standard `/sitemap.xml` location. The Actor includes built-in support for these sites and marks them with `isKnownPath: true` in the output.
| Domain | Sitemap Location |
|---|---|
| amazon.com, www.amazon.com, aws.amazon.com | https://aws.amazon.com/ar/sitemaps/index/ |
When a known path is used, you'll see it in the logs:
```
Using known sitemap path for www.amazon.com: https://aws.amazon.com/ar/sitemaps/index/
```
To add support for more domains, edit `src/utils/sitemap-paths.ts`.
Error Output Example
When a URL returns an HTTP error (like 404), the Actor returns an error item instead of failing:
{"url": "https://example.com/broken-link","meta": {"title": "","titleLength": 0,"description": "","descriptionLength": 0,"keywords": "","robots": "","canonical": "","author": "","generator": ""},"technical": {"statusCode": 404,"responseTime": 150},"error": "404","errorMessage": "Page not found","scrapedAt": "2025-12-11T20:23:04.317Z"}
This allows you to:
- Continue processing other URLs without failing the entire run
- Identify broken links and problematic URLs in your dataset
- Filter error results using the dedicated "Errors" view
Output Views
The Actor provides multiple dataset views for different analysis needs:
- SEO Overview: Quick summary with URL, error status, title, description, canonical, robots, H1 count, and links
- Errors: Dedicated view for URLs that returned HTTP errors (404, 500, etc.) with error codes and messages
- Heading Structure: H1-H6 tags with text content and counts for each level
- Open Graph: Complete Open Graph metadata for social sharing
- Twitter Cards: Twitter Card metadata for social sharing
- Content Analysis: Word count, link breakdown (internal/external), and image audit data
- Technical SEO: HTTP status, response time, charset, language, viewport, and structured data
- Branding Assets: Favicon, Apple touch icon, and theme color information
- Sitemap Data: URLs found in each domain's sitemap (when sitemap extraction is enabled)
How to Export
- Access Results: After running, view collected data in Apify's interface
- Select Export Option: Download as CSV, JSON, Excel, or XML
- Open in Tools: Import into Excel, Google Sheets, or your analysis tool
- API Access: Use the Apify API to integrate with your workflows
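For API access, dataset items are served from Apify's v2 dataset-items endpoint; a small helper can build the export URL for your run's dataset ID (placeholder ID below — non-public datasets may additionally require your API token):

```python
def dataset_export_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the Apify v2 dataset items export URL for a given format."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}"

# Download with any HTTP client, e.g.:
# requests.get(dataset_export_url("YOUR_DATASET_ID", "csv")).text
print(dataset_export_url("abc123", "csv"))
# → https://api.apify.com/v2/datasets/abc123/items?format=csv
```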
Pricing Model
This Actor uses Pay-Per-Event (PPE) pricing with automatic charging via Apify's synthetic events:
- Actor Start: Charged automatically when the Actor starts
- Dataset Item: Charged automatically for each result pushed to the dataset
Error Handling & Billing
URLs that return HTTP errors (404, 500, etc.) are still charged because:
- The Actor had to make a request to discover the error
- Error items are returned with proper error codes and messages
- This allows you to identify broken links without failing the entire run
You can set a maximum spending limit in the Apify Console to control costs.
What's Included
- Apify SDK - Toolkit for building Actors
- Input Schema - Input validation
- Dataset - Structured data storage
- Proxy Configuration - IP rotation for anti-blocking
Limitations
⚠️ JavaScript-Heavy Sites: This tool primarily extracts data from static HTML. It may not capture content that loads dynamically via JavaScript, potentially resulting in incomplete data extraction.
FAQ
Are duplicate URLs processed multiple times?
Yes. The Actor processes every URL in your input list, including duplicates. If you submit the same URL multiple times, it will be processed and charged each time.
Tip: Remove duplicates from your input list before running to save costs:
```
https://example.com/page1  ← processed, charged
https://example.com/page1  ← processed again, charged again
https://example.com/page2  ← processed, charged
```
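One simple way to deduplicate locally before submitting your input, keeping the original order:

```python
def dedupe_urls(urls):
    """Remove duplicate URLs while preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique

dedupe_urls([
    "https://example.com/page1",
    "https://example.com/page1",
    "https://example.com/page2",
])
# → ["https://example.com/page1", "https://example.com/page2"]
```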
Am I charged for failed requests?
Yes. URLs that return HTTP errors (404, 500, etc.) or fail after retries are still charged because the Actor had to make a request to discover the error. However, you receive an error item in your dataset with the error code and message, so you know exactly what happened.
How can I control costs?
- Set a maximum spending limit in the Apify Console before running
- Use the `maxRequestsPerCrawl` input to limit the number of pages processed
- Remove duplicate URLs from your input list before running
- Set `maxRequestRetries` to 0 if you don't want failed requests to be retried
Legal Disclaimer
⚠️ Important Legal Notice
This tool is provided for educational and research purposes only. By using this SEO Data Extractor, you agree to:
- Comply with all applicable laws: You are solely responsible for ensuring your use of this tool complies with local, national, and international laws, including copyright laws, data protection regulations (such as GDPR, CCPA), and terms of service of target websites.
- Respect website terms of service: Many websites prohibit automated scraping in their terms of service. You must review and comply with each website's terms before using this tool.
- Respect robots.txt: This tool does not automatically check or respect robots.txt files. You are responsible for checking and honoring robots.txt directives.
- Rate limiting and ethical use: Use reasonable request rates and respect website operators. Excessive requests may constitute a denial-of-service attack.
- Data privacy compliance: Ensure your data collection and processing activities comply with privacy laws. Do not collect personal data without proper consent and legal basis.
- No warranties: This tool is provided "as is" without warranties of any kind. The authors are not responsible for any damages or legal consequences arising from its use.
- Use at your own risk: You assume all risks associated with using this tool. The authors disclaim all liability for any direct, indirect, incidental, or consequential damages.
Before using this tool, consult with legal counsel to ensure compliance with applicable laws and regulations.