Pricing

from $7.00 / 1,000 results

AI-Enhanced Website Metadata

Extracts complete website metadata, including SEO tags, OpenGraph data, social media links, and contact information, and performs link analysis. Features AI-powered content summarization with multilingual support and structured data extraction. Perfect for gathering deep insights from any URL.

Rating

5.0

(1)

Developer

njoylab

Last modified

2 months ago

URL Summary Scraper with AI

A powerful Apify actor that extracts essential website information with optional AI-powered summaries and key facts extraction. Supports LLM analysis in 30+ languages.

Features

Core Scraping

Comprehensive metadata extraction - SEO, OpenGraph, Twitter Card data
Social media links - Facebook, X (Twitter), LinkedIn, Instagram, YouTube, TikTok, Pinterest, Trustpilot, GitHub, Discord, Telegram, WhatsApp, Medium, Reddit, Threads, Mastodon, Twitch, Vimeo, Spotify, Snapchat
Contact information - Email, phone numbers, addresses
Link analysis - Internal/external links with domain categorization
Media assets - Favicons, logos, featured images
Structured data - JSON-LD extraction
Robots.txt compliance - Respects crawling rules (can be bypassed)
Batch processing - Process a single URL or multiple URLs in one run

AI-Powered Analysis (Optional)

Intelligent summaries - Short (50 words), Medium (150 words), Long (300 words)
Semantic keywords - AI-extracted keywords from content (works for any page type)
Multilingual support - 30+ languages, including English, Italian, Spanish, French, German, and Portuguese
Key facts extraction - Company name, industry, services, target audience, business model
Graceful degradation - Returns metadata even if AI analysis fails

Input Parameters

Parameter	Type	Required	Default	Description
`url`	array	Yes	-	Array of URLs to scrape (use a single-element array for one URL)
`language`	string	No	`en, en-US;q=0.9, en-GB;q=0.8`	Accept-Language header
`ignoreRobots`	boolean	No	`false`	Bypass robots.txt rules
`ignoreExternalLinks`	boolean	No	`false`	Skip external links extraction
`ignoreInteralLinks`	boolean	No	`false`	Skip internal links extraction
`generateSummary`	boolean	No	`false`	Enable AI-powered summaries (opt-in)
`summaryLength`	string	No	-	Summary length: `short`, `medium`, or `long`. Leave empty for all three.
`summaryLanguage`	string	No	auto-detect	Target language code (e.g., `en`, `it`, `es`)
`extractKeyFacts`	boolean	No	`false`	Extract structured business information
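
As a rough illustration of how the parameters above fit together, here is a minimal Python sketch that assembles an input object. The helper name `build_input` and its keyword arguments are hypothetical, not part of the actor. Emitting the summary options only when `generateSummary` is on is an assumption about how the options interact.

```python
ALLOWED_LENGTHS = {"short", "medium", "long"}


def build_input(urls, *, generate_summary=False, summary_length=None,
                summary_language=None, extract_key_facts=False,
                ignore_robots=False):
    """Assemble an input dict using the documented parameter names.

    Hypothetical helper: only the JSON keys come from the parameter table.
    """
    if isinstance(urls, str):
        urls = [urls]  # `url` is an array, even for a single URL
    urls = list(urls)
    if not urls:
        raise ValueError("at least one URL is required")
    if summary_length is not None and summary_length not in ALLOWED_LENGTHS:
        raise ValueError("summary_length must be short, medium, or long")

    payload = {"url": urls}
    if generate_summary:
        payload["generateSummary"] = True
        # Assumption: summary options only matter when summaries are enabled.
        if summary_length:
            payload["summaryLength"] = summary_length
        if summary_language:
            payload["summaryLanguage"] = summary_language
    if extract_key_facts:
        payload["extractKeyFacts"] = True
    if ignore_robots:
        payload["ignoreRobots"] = True
    return payload


print(build_input("https://apify.com", generate_summary=True,
                  summary_length="short"))
```

Omitted options fall back to the defaults listed in the table, so the payload stays as small as the examples in this document.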

Usage Examples

Single URL - Basic Scraping

{
  "url": ["https://apify.com"]
}

Multiple URLs - Batch Processing

{
  "url": [
    "https://example.com",
    "https://example.org",
    "https://example.net"
  ]
}

AI-Powered Analysis

{
  "url": ["https://apify.com"],
  "generateSummary": true,
  "extractKeyFacts": true
}

Multilingual Summary

{
  "url": ["https://example.it"],
  "generateSummary": true,
  "summaryLanguage": "it"
}
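
One possible way to submit inputs like these from code is the Apify API's synchronous run endpoint. The sketch below only builds the HTTP request and does not send it. The actor ID `username~actor-name` and the token value are placeholders (assumptions); copy the real actor ID from the actor's Store page.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, actor_input):
    """Build a POST request that runs an actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholders (assumptions): replace with the real actor ID and API token.
request = build_run_request(
    "username~actor-name",
    "YOUR_APIFY_TOKEN",
    {"url": ["https://apify.com"], "generateSummary": True},
)
print(request.get_method(), request.full_url)

# To actually run the actor (network call, left commented out on purpose):
# with urllib.request.urlopen(request, timeout=300) as response:
#     records = json.load(response)
```

The 300-second timeout in the commented call mirrors the timeout recommended in the Performance section.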

Output Schema

The actor returns a hierarchical JSON structure for each URL:

{
  "url": "string",
  "seo": {
    "title": "string",
    "description": "string",
    "keywords": ["string"],
    "canonical": "string",
    "robots": "string",
    "language": "string",
    "viewport": "string"
  },
  "openGraph": {
    "title": "string",
    "description": "string",
    "image": "string",
    "url": "string",
    "type": "string",
    "siteName": "string"
  },
  "twitterCard": {
    "card": "string",
    "site": "string",
    "creator": "string",
    "title": "string",
    "description": "string",
    "image": "string"
  },
  "social": {
    "facebook": "string",
    "x": "string",
    "linkedin": "string",
    "instagram": "string",
    "youtube": "string",
    "tiktok": "string",
    "pinterest": "string",
    "trustpilot": "string",
    "github": "string",
    "discord": "string",
    "telegram": "string",
    "whatsapp": "string",
    "medium": "string",
    "reddit": "string",
    "threads": "string",
    "mastodon": "string",
    "twitch": "string",
    "vimeo": "string",
    "spotify": "string",
    "snapchat": "string"
  },
  "contact": {
    "email": "string",
    "phone": "string",
    "address": "string"
  },
  "technical": {
    "statusCode": 200,
    "finalUrl": "string",
    "originalUrl": "string",
    "robotsAllowed": true,
    "loadTime": 1234,
    "isSecure": true,
    "contentType": "text/html"
  },
  "media": {
    "favicon": "string",
    "appleTouchIcon": "string",
    "featuredImage": "string",
    "logo": "string",
    "screenshots": ["string"]
  },
  "links": {
    "internal": {
      "total": 42,
      "urls": ["string"]
    },
    "external": {
      "total": 15,
      "urls": ["string"],
      "domains": ["string"]
    },
    "mailto": ["string"],
    "tel": ["string"]
  },
  "structuredData": [{}],
  "ai": {
    "summary": {
      "short": "string",
      "medium": "string",
      "long": "string",
      "contentLength": 5000,
      "truncated": false
    },
    "keywords": ["string"],
    "keyFacts": {
      "companyName": "string",
      "companyType": "B2B SaaS",
      "industry": "Technology",
      "services": ["string"],
      "targetAudience": "string",
      "headquarters": "San Francisco, USA",
      "foundedYear": 2020,
      "keyFeatures": ["string"],
      "businessModel": "Subscription"
    },
    "processingTime": 2340,
    "error": "string"
  }
}

Note: When processing multiple URLs, one record per URL will be added to the dataset.
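
Because the structure is nested and, judging by the example response in this document, fields that were not found may be omitted, reading values defensively can help. The following is a minimal sketch. `get_path` is a hypothetical helper and the sample record is abbreviated for illustration.

```python
from typing import Any


def get_path(record: dict, path: str, default: Any = None) -> Any:
    """Walk a dotted path such as "seo.title" through nested dicts.

    Returns `default` as soon as a key is missing or a non-dict is reached.
    """
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Abbreviated sample record using field names from the output schema.
record = {
    "url": "https://apify.com/",
    "seo": {"title": "Apify"},
    "links": {"internal": {"total": 127}},
}

print(get_path(record, "seo.title"))
print(get_path(record, "links.internal.total", default=0))
print(get_path(record, "ai.summary.short", default="(no summary)"))
```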

Supported Languages for AI Summaries

English, Italian, Spanish, French, German, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Polish, Turkish, Swedish, Norwegian, Danish, Finnish, Greek, Czech, Romanian, Hungarian, Thai, Vietnamese, Indonesian, Malay, Ukrainian, Bulgarian, Croatian, Slovak, Slovenian, Lithuanian, Latvian, Estonian.

Performance

Basic scraping: < 5 seconds per URL
With AI analysis: < 30 seconds per URL
Memory: Recommended 2048 MB
Timeout: Recommended 300 seconds (5 minutes)
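
To get a feel for what these figures mean for batch sizes, the sketch below does the arithmetic. It assumes URLs are processed one after another and treats the per-URL figures as upper bounds. Neither assumption is confirmed here, so read the results as rough planning numbers only.

```python
# Figures taken from the list above; sequential processing is an assumption.
BASIC_SECONDS_PER_URL = 5
AI_SECONDS_PER_URL = 30
RECOMMENDED_TIMEOUT_SECONDS = 300


def worst_case_seconds(url_count: int, with_ai: bool) -> int:
    """Upper-bound run time if every URL takes the maximum quoted time."""
    per_url = AI_SECONDS_PER_URL if with_ai else BASIC_SECONDS_PER_URL
    return url_count * per_url


def max_urls_within_timeout(with_ai: bool,
                            timeout_seconds: int = RECOMMENDED_TIMEOUT_SECONDS) -> int:
    """How many worst-case URLs fit inside the given timeout."""
    per_url = AI_SECONDS_PER_URL if with_ai else BASIC_SECONDS_PER_URL
    return timeout_seconds // per_url


print(max_urls_within_timeout(with_ai=False))  # 300 // 5 = 60
print(max_urls_within_timeout(with_ai=True))   # 300 // 30 = 10
print(worst_case_seconds(20, with_ai=True))    # 20 * 30 = 600
```

Under these assumptions, a 20-URL batch with AI analysis could exceed the recommended 300-second timeout, so either the timeout or the batch size would need adjusting.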

Error Handling

The actor implements graceful degradation:

AI failures → Returns metadata with ai.error field
Network errors → Retries with different URL variants (http/https, www/non-www)
Robots.txt blocking → Can be bypassed with ignoreRobots: true
Partial failures → When processing multiple URLs, failed URLs return error objects while successful ones return full data
Individual URL errors → Each URL is processed independently; one failure doesn't stop the batch
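
On the consuming side, the `ai.error` field makes it possible to tell apart records with a successful AI analysis from degraded ones. The sketch below is one way to group them. The status labels and helper names are hypothetical. The shape of the error objects for URLs that fail entirely is not specified here, so the sketch does not attempt to detect them.

```python
def ai_status(record: dict) -> str:
    """Classify a record by the state of its `ai` section.

    "absent": no AI section (AI options were off, or nothing was returned)
    "failed": AI section present with a non-empty `error` field
    "ok":     AI section present without an error
    """
    ai = record.get("ai")
    if not ai:
        return "absent"
    if ai.get("error"):
        return "failed"
    return "ok"


def group_by_ai_status(records: list) -> dict:
    """Group dataset records into ok / failed / absent buckets."""
    groups = {"ok": [], "failed": [], "absent": []}
    for record in records:
        groups[ai_status(record)].append(record)
    return groups


# Illustrative records only; the error text is made up for the example.
records = [
    {"url": "https://example.com", "ai": {"summary": {"short": "..."}}},
    {"url": "https://example.org", "ai": {"error": "analysis failed"}},
    {"url": "https://example.net"},
]
for status, items in group_by_ai_status(records).items():
    print(status, [item["url"] for item in items])
```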

Example Response

Here's a real example of the actor output for a single URL:

{
  "url": "https://apify.com/",
  "seo": {
    "title": "Apify: Full-stack web scraping and data extraction platform",
    "description": "Extract data from any website with Apify's scraping tools and ready-made scrapers. No coding needed.",
    "keywords": ["web scraping", "data extraction", "automation"],
    "canonical": "https://apify.com/",
    "language": "en",
    "viewport": "width=device-width, initial-scale=1"
  },
  "openGraph": {
    "title": "Apify: Full-stack web scraping and data extraction platform",
    "description": "Extract data from any website with Apify's scraping tools",
    "image": "https://apify.com/og-image.png",
    "url": "https://apify.com/",
    "type": "website",
    "siteName": "Apify"
  },
  "twitterCard": {
    "card": "summary_large_image",
    "site": "@apify",
    "title": "Apify: Web scraping platform",
    "image": "https://apify.com/twitter-card.png"
  },
  "social": {
    "x": "https://x.com/apify",
    "linkedin": "https://linkedin.com/company/apifytech",
    "youtube": "https://youtube.com/c/apify",
    "github": "https://github.com/apify",
    "discord": "https://discord.com/invite/apify",
    "medium": "https://medium.com/@apify"
  },
  "contact": {
    "email": "support@apify.com"
  },
  "technical": {
    "statusCode": 200,
    "finalUrl": "https://apify.com/",
    "originalUrl": "https://apify.com",
    "robotsAllowed": true,
    "loadTime": 1247,
    "isSecure": true,
    "contentType": "text/html; charset=utf-8"
  },
  "media": {
    "favicon": "https://apify.com/favicon.ico",
    "logo": "https://apify.com/logo.svg",
    "featuredImage": "https://apify.com/og-image.png"
  },
  "links": {
    "internal": {
      "total": 127,
      "urls": ["https://apify.com/pricing", "https://apify.com/about", "..."]
    },
    "external": {
      "total": 8,
      "urls": ["https://docs.apify.com", "..."],
      "domains": ["docs.apify.com", "blog.apify.com"]
    },
    "mailto": ["support@apify.com"],
    "tel": []
  },
  "ai": {
    "summary": {
      "short": "Apify is a web scraping and automation platform that allows users to extract data from websites without coding.",
      "medium": "Apify is a comprehensive web scraping and data extraction platform designed for both developers and non-technical users. It offers ready-made scrapers, custom scraping tools, and a cloud infrastructure to extract data from any website at scale. The platform features an extensive library of pre-built actors, proxy management, and scheduling capabilities.",
      "contentLength": 15420,
      "truncated": false
    },
    "keywords": ["web scraping", "data extraction", "automation", "B2B SaaS", "cloud platform", "API"],
    "keyFacts": {
      "companyName": "Apify",
      "companyType": "B2B SaaS",
      "industry": "Web Scraping & Data Extraction",
      "services": ["Web scraping tools", "Ready-made scrapers", "Cloud infrastructure", "Proxy services"],
      "targetAudience": "Developers, Data Scientists, Business Analysts",
      "businessModel": "Subscription",
      "keyFeatures": ["Actor marketplace", "Serverless computing", "Proxy management", "Scheduling"]
    },
    "processingTime": 3421
  }
}
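
A record like the one above is often easier to work with as a flat row, for example in a spreadsheet. The sketch below picks a few fields and writes them as CSV. The column selection and the helper names `to_row` and `to_csv` are illustrative choices, not something the actor provides.

```python
import csv
import io

COLUMNS = ["url", "title", "email", "linkedin",
           "internal_links", "external_links", "company"]


def to_row(record: dict) -> dict:
    """Flatten selected nested fields into one row; missing values become "" or 0."""
    links = record.get("links") or {}
    key_facts = (record.get("ai") or {}).get("keyFacts") or {}
    return {
        "url": record.get("url", ""),
        "title": (record.get("seo") or {}).get("title", ""),
        "email": (record.get("contact") or {}).get("email", ""),
        "linkedin": (record.get("social") or {}).get("linkedin", ""),
        "internal_links": (links.get("internal") or {}).get("total", 0),
        "external_links": (links.get("external") or {}).get("total", 0),
        "company": key_facts.get("companyName", ""),
    }


def to_csv(records: list) -> str:
    """Render records as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(to_row(record))
    return buffer.getvalue()


# Abbreviated version of the example response above.
sample = {
    "url": "https://apify.com/",
    "seo": {"title": "Apify"},
    "contact": {"email": "support@apify.com"},
    "links": {"internal": {"total": 127}, "external": {"total": 8}},
}
print(to_csv([sample]))
```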

Tips for Best Results

Batch Processing - Use arrays for multiple URLs to process them efficiently
AI Costs - Enable generateSummary only when needed to avoid AI costs
Language Detection - Leave summaryLanguage empty to auto-detect from page content
Specific Summaries - Use summaryLength to get only the length you need
Robots.txt - Respect robots.txt by default; only use ignoreRobots: true when legally permitted

Disclaimer

This actor is provided for legitimate web scraping and data extraction purposes. Users are responsible for:

Compliance with Terms of Service - Ensure you have permission to scrape target websites
Respect for robots.txt - Follow website crawling guidelines unless legally permitted to override
Rate limiting - Implement appropriate delays to avoid overloading target servers
Data privacy - Comply with GDPR, CCPA, and other data protection regulations
Intellectual property - Respect copyright and trademark rights of scraped content

The developers of this actor are not responsible for misuse or violations of applicable laws and terms of service.
