AI-Enhanced Website Metadata
Pricing
from $7.00 / 1,000 results
Extracts complete website metadata including SEO tags, OpenGraph data, social media links, contact information and performs link analysis. Features AI-powered content summarization with multilingual support and structured data extraction. Perfect for gathering deep insights from any URL.
Developer: njoylab
URL Summary Scraper with AI
A powerful Apify actor that extracts essential website information with optional AI-powered summaries and key facts extraction. Supports LLM analysis in 30+ languages.
Features
Core Scraping
- Comprehensive metadata extraction - SEO, OpenGraph, Twitter Card data
- Social media links - Facebook, X (Twitter), LinkedIn, Instagram, YouTube, TikTok, Pinterest, Trustpilot, GitHub, Discord, Telegram, WhatsApp, Medium, Reddit, Threads, Mastodon, Twitch, Vimeo, Spotify, Snapchat
- Contact information - Email, phone numbers, addresses
- Link analysis - Internal/external links with domain categorization
- Media assets - Favicons, logos, featured images
- Structured data - JSON-LD extraction
- Robots.txt compliance - Respects crawling rules (can be bypassed)
- Batch processing - Process a single URL or multiple URLs in one run
AI-Powered Analysis (Optional)
- Intelligent summaries - Short (50 words), Medium (150 words), Long (300 words)
- Semantic keywords - AI-extracted keywords from content (works for any page type)
- Multilingual support - 30+ languages including English, Italian, Spanish, French, German, Portuguese, etc.
- Key facts extraction - Company name, industry, services, target audience, business model
- Graceful degradation - Returns metadata even if AI analysis fails
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | array | Yes | - | Array of URLs to scrape (use a single-element array for one URL) |
| language | string | No | en, en-US;q=0.9, en-GB;q=0.8 | Accept-Language header |
| ignoreRobots | boolean | No | false | Bypass robots.txt rules |
| ignoreExternalLinks | boolean | No | false | Skip external links extraction |
| ignoreInteralLinks | boolean | No | false | Skip internal links extraction |
| generateSummary | boolean | No | false | Enable AI-powered summaries (opt-in) |
| summaryLength | string | No | - | Summary length: short, medium, or long. Leave empty for all three. |
| summaryLanguage | string | No | auto-detect | Target language code (e.g., en, it, es) |
| extractKeyFacts | boolean | No | false | Extract structured business information |
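The parameters above can be assembled programmatically. The following is a minimal Python sketch, not part of the actor itself: the helper name `build_input` and its keyword arguments are hypothetical, and only the JSON keys and the allowed `summaryLength` values come from the table.

```python
import json

# Allowed values for summaryLength, taken from the parameter table.
SUMMARY_LENGTHS = ("short", "medium", "long")


def build_input(urls, generate_summary=False, summary_length=None,
                summary_language=None, extract_key_facts=False):
    """Return an actor input dict, omitting options left at their defaults.

    Hypothetical helper for illustration; only the dict keys are documented.
    """
    if isinstance(urls, str):
        urls = [urls]  # the actor expects an array even for a single URL
    urls = list(urls)
    if not urls:
        raise ValueError("url must contain at least one URL")
    if summary_length is not None and summary_length not in SUMMARY_LENGTHS:
        raise ValueError(f"summaryLength must be one of {SUMMARY_LENGTHS}")

    run_input = {"url": urls}
    if generate_summary:
        run_input["generateSummary"] = True
    if summary_length is not None:
        run_input["summaryLength"] = summary_length
    if summary_language is not None:
        run_input["summaryLanguage"] = summary_language
    if extract_key_facts:
        run_input["extractKeyFacts"] = True
    return run_input


if __name__ == "__main__":
    # Print the input as JSON, ready to paste into the actor's input editor.
    print(json.dumps(build_input("https://apify.com", generate_summary=True,
                                 summary_length="short"), indent=2))
```

Omitting keys that are left at their defaults keeps the input short and lets the actor apply its own default values.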
Usage Examples
Single URL - Basic Scraping
{"url": ["https://apify.com"]}
Multiple URLs - Batch Processing
{"url": ["https://example.com","https://example.org","https://example.net"]}
AI-Powered Analysis
{"url": ["https://apify.com"],"generateSummary": true,"extractKeyFacts": true}
Multilingual Summary
{"url": ["https://example.it"],"generateSummary": true,"summaryLanguage": "it"}
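Any of the inputs above can also be submitted from code. The sketch below assumes the official `apify-client` Python package, whose `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods it relies on. The function name `run_and_collect` is hypothetical, and the actor ID and API token are placeholders that this README does not specify.

```python
def run_and_collect(client, actor_id, run_input):
    """Start the actor, wait for the run to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    dataset_id = run["defaultDatasetId"]
    # One dataset record is produced per input URL.
    return list(client.dataset(dataset_id).iterate_items())


# Real usage (requires `pip install apify-client` and an API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")   # placeholder token
#   items = run_and_collect(client, "<ACTOR_ID>",  # placeholder actor ID
#                           {"url": ["https://apify.com"], "generateSummary": True})
```

Passing the client in as an argument keeps the function easy to exercise with a stub before spending credits on a real run.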
Output Schema
The actor returns a hierarchical JSON structure for each URL:
{"url": "string","seo": {"title": "string","description": "string","keywords": ["string"],"canonical": "string","robots": "string","language": "string","viewport": "string"},"openGraph": {"title": "string","description": "string","image": "string","url": "string","type": "string","siteName": "string"},"twitterCard": {"card": "string","site": "string","creator": "string","title": "string","description": "string","image": "string"},"social": {"facebook": "string","x": "string","linkedin": "string","instagram": "string","youtube": "string","tiktok": "string","pinterest": "string","trustpilot": "string","github": "string","discord": "string","telegram": "string","whatsapp": "string","medium": "string","reddit": "string","threads": "string","mastodon": "string","twitch": "string","vimeo": "string","spotify": "string","snapchat": "string"},"contact": {"email": "string","phone": "string","address": "string"},"technical": {"statusCode": 200,"finalUrl": "string","originalUrl": "string","robotsAllowed": true,"loadTime": 1234,"isSecure": true,"contentType": "text/html"},"media": {"favicon": "string","appleTouchIcon": "string","featuredImage": "string","logo": "string","screenshots": ["string"]},"links": {"internal": {"total": 42,"urls": ["string"]},"external": {"total": 15,"urls": ["string"],"domains": ["string"]},"mailto": ["string"],"tel": ["string"]},"structuredData": [{}],"ai": {"summary": {"short": "string","medium": "string","long": "string","contentLength": 5000,"truncated": false},"keywords": ["string"],"keyFacts": {"companyName": "string","companyType": "B2B SaaS","industry": "Technology","services": ["string"],"targetAudience": "string","headquarters": "San Francisco, USA","foundedYear": 2020,"keyFeatures": ["string"],"businessModel": "Subscription"},"processingTime": 2340,"error": "string"}}
Note: When processing multiple URLs, one record per URL will be added to the dataset.
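Because most sections of a record may be absent for a given page, consumers should read records defensively. The following Python sketch shows one way to do that: the function name `digest` and the shape of its return value are hypothetical, while the keys it reads (`seo`, `openGraph`, `social`, `links`, `ai`) come from the schema above.

```python
def digest(record: dict) -> dict:
    """Condense one dataset record into a flat overview.

    Every section is treated as optional, so a record that lacks
    `ai` or `links` still yields a complete result.
    """
    seo = record.get("seo") or {}
    open_graph = record.get("openGraph") or {}
    social = record.get("social") or {}
    links = record.get("links") or {}
    ai = record.get("ai") or {}
    summary = ai.get("summary") or {}

    return {
        "url": record.get("url"),
        # Prefer the SEO title and fall back to the OpenGraph title.
        "title": seo.get("title") or open_graph.get("title"),
        # Names of the social networks for which a link was found.
        "social": sorted(name for name, link in social.items() if link),
        "internal_links": (links.get("internal") or {}).get("total", 0),
        "external_links": (links.get("external") or {}).get("total", 0),
        # Use the shortest summary that is available.
        "summary": summary.get("short") or summary.get("medium") or summary.get("long"),
        "ai_error": ai.get("error"),
    }
```

Checking `ai_error` before using `summary` distinguishes a page where AI analysis failed from one where it was never requested.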
Supported Languages for AI Summaries
English, Italian, Spanish, French, German, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Polish, Turkish, Swedish, Norwegian, Danish, Finnish, Greek, Czech, Romanian, Hungarian, Thai, Vietnamese, Indonesian, Malay, Ukrainian, Bulgarian, Croatian, Slovak, Slovenian, Lithuanian, Latvian, Estonian.
Performance
- Basic scraping: < 5 seconds per URL
- With AI analysis: < 30 seconds per URL
- Memory: Recommended 2048 MB
- Timeout: Recommended 300 seconds (5 minutes)
Error Handling
The actor implements graceful degradation:
- AI failures → Returns metadata with an `ai.error` field
- Network errors → Retries with different URL variants (http/https, www/non-www)
- Robots.txt blocking → Can be bypassed with `ignoreRobots: true`
- Partial failures → When processing multiple URLs, failed URLs return error objects while successful ones return full data
- Individual URL errors → Each URL is processed independently; one failure doesn't stop the batch
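The URL-variant retry mentioned above can be pictured with a short Python sketch. It is an illustration only: the function name `url_variants` is hypothetical, and the order in which the actor actually tries variants is not documented, so the ordering here is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variants(url: str) -> list[str]:
    """Return the original URL followed by its scheme and www/non-www variants.

    Assumed ordering: the original URL first, then the other host on the
    same scheme, then both hosts on the other scheme.
    """
    parts = urlsplit(url)
    host = parts.netloc
    bare = host[4:] if host.startswith("www.") else host
    # Original host first, then whichever of bare / www. differs from it.
    hosts = [host] + [h for h in (bare, "www." + bare) if h != host]
    # Original scheme first, then the alternative.
    schemes = [parts.scheme] + [s for s in ("https", "http") if s != parts.scheme]

    variants = []
    for scheme in schemes:
        for h in hosts:
            variants.append(
                urlunsplit((scheme, h, parts.path, parts.query, parts.fragment))
            )
    return variants
```

For an input such as `https://apify.com`, this yields four candidates covering both schemes and both host forms, with the original URL first.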
Example Response
Here's a real example of the actor output for a single URL:
{"url": "https://apify.com/","seo": {"title": "Apify: Full-stack web scraping and data extraction platform","description": "Extract data from any website with Apify's scraping tools and ready-made scrapers. No coding needed.","keywords": ["web scraping", "data extraction", "automation"],"canonical": "https://apify.com/","language": "en","viewport": "width=device-width, initial-scale=1"},"openGraph": {"title": "Apify: Full-stack web scraping and data extraction platform","description": "Extract data from any website with Apify's scraping tools","image": "https://apify.com/og-image.png","url": "https://apify.com/","type": "website","siteName": "Apify"},"twitterCard": {"card": "summary_large_image","site": "@apify","title": "Apify: Web scraping platform","image": "https://apify.com/twitter-card.png"},"social": {"x": "https://x.com/apify","linkedin": "https://linkedin.com/company/apifytech","youtube": "https://youtube.com/c/apify","github": "https://github.com/apify","discord": "https://discord.com/invite/apify","medium": "https://medium.com/@apify"},"contact": {"email": "support@apify.com"},"technical": {"statusCode": 200,"finalUrl": "https://apify.com/","originalUrl": "https://apify.com","robotsAllowed": true,"loadTime": 1247,"isSecure": true,"contentType": "text/html; charset=utf-8"},"media": {"favicon": "https://apify.com/favicon.ico","logo": "https://apify.com/logo.svg","featuredImage": "https://apify.com/og-image.png"},"links": {"internal": {"total": 127,"urls": ["https://apify.com/pricing", "https://apify.com/about", "..."]},"external": {"total": 8,"urls": ["https://docs.apify.com", "..."],"domains": ["docs.apify.com", "blog.apify.com"]},"mailto": ["support@apify.com"],"tel": []},"ai": {"summary": {"short": "Apify is a web scraping and automation platform that allows users to extract data from websites without coding.","medium": "Apify is a comprehensive web scraping and data extraction platform designed for both developers and non-technical users. It offers ready-made scrapers, custom scraping tools, and a cloud infrastructure to extract data from any website at scale. The platform features an extensive library of pre-built actors, proxy management, and scheduling capabilities.","contentLength": 15420,"truncated": false},"keywords": ["web scraping", "data extraction", "automation", "B2B SaaS", "cloud platform", "API"],"keyFacts": {"companyName": "Apify","companyType": "B2B SaaS","industry": "Web Scraping & Data Extraction","services": ["Web scraping tools", "Ready-made scrapers", "Cloud infrastructure", "Proxy services"],"targetAudience": "Developers, Data Scientists, Business Analysts","businessModel": "Subscription","keyFeatures": ["Actor marketplace", "Serverless computing", "Proxy management", "Scheduling"]},"processingTime": 3421}}
Tips for Best Results
- Batch Processing - Use arrays for multiple URLs to process them efficiently
- AI Costs - Enable `generateSummary` only when needed to avoid AI costs
- Language Detection - Leave `summaryLanguage` empty to auto-detect from page content
- Specific Summaries - Use `summaryLength` to get only the length you need
- Robots.txt - Respect `robots.txt` by default; only use `ignoreRobots: true` when legally permitted
Disclaimer
This actor is provided for legitimate web scraping and data extraction purposes. Users are responsible for:
- Compliance with Terms of Service - Ensure you have permission to scrape target websites
- Respect for robots.txt - Follow website crawling guidelines unless legally permitted to override
- Rate limiting - Implement appropriate delays to avoid overloading target servers
- Data privacy - Comply with GDPR, CCPA, and other data protection regulations
- Intellectual property - Respect copyright and trademark rights of scraped content
The developers of this actor are not responsible for misuse or violations of applicable laws and terms of service.