URL Metadata Scraper - OG, Twitter, JSON-LD
Pricing: Pay per event
Extract complete metadata from any URL: Open Graph tags, Twitter Card metadata, JSON-LD structured data, favicons, hreflang alternates, canonical URLs and HTML meta. Perfect for link previews, SEO audits, social media tools, bookmark managers and content aggregators.
Developer: Mohieldin Mohamed
Last modified: 5 days ago
URL Metadata Scraper - OG, Twitter Card, JSON-LD
Extract every piece of metadata from any URL in one API call. URL Metadata Scraper pulls Open Graph tags, Twitter Card metadata, JSON-LD structured data, favicons, hreflang alternates, canonical URLs, and basic HTML meta - everything you need for link previews, SEO audits, social media tools, content aggregators, and bookmark managers.
Pass in a list of URLs, get back a normalized JSON payload with title, description, image, site name, author, language, structured data types, and much more. No custom selectors. No broken previews.
What does URL Metadata Scraper do?
This actor fetches each URL and extracts a complete metadata profile:
- Normalized top-level fields - title, description, image, site name, type, author, language (picks the best value across OG, Twitter, and HTML meta)
- Open Graph - All `og:*` properties (title, description, image, type, site_name, locale, video, audio, article, book, profile, ...)
- Twitter Card - All `twitter:*` properties (card, title, description, image, creator, site, player, ...)
- JSON-LD structured data - Every `<script type="application/ld+json">` block parsed and returned as-is, plus a normalized list of schema types for quick filtering
- Favicons - All declared icons, including `apple-touch-icon`, `mask-icon`, and `manifest`
- hreflang alternates - Language variants of the page
- Canonical URL - The authoritative URL after redirects and `<link rel="canonical">`
- Basic meta - keywords, author, viewport, theme-color, description
- HTTP info - final URL, status code, content-type, content-language
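The extraction step can be sketched with Python's standard-library `html.parser`. This is a minimal illustration under stated assumptions, not the actor's own code (the actor runs on a Cheerio crawler). The class name `MetadataParser`, the helper `schema_types`, the set of `rel` values treated as icons, and the choice to collect `@type` values from nested JSON-LD nodes are all assumptions made for the example.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetadataParser(HTMLParser):
    """Hypothetical sketch: collects OG, Twitter Card, JSON-LD, icon, hreflang and canonical data."""

    # rel values treated as icons (assumption based on the list above)
    ICON_RELS = {"icon", "shortcut icon", "apple-touch-icon", "mask-icon", "manifest"}

    def __init__(self, base_url: str) -> None:
        super().__init__(convert_charrefs=True)
        self.base_url = base_url  # final URL, used to turn relative hrefs into absolute ones
        self.title: str | None = None
        self.open_graph: dict[str, str] = {}
        self.twitter_card: dict[str, str] = {}
        self.meta: dict[str, str] = {}
        self.json_ld: list = []
        self.favicons: list[dict] = []
        self.hreflangs: list[dict] = []
        self.canonical_url: str | None = None
        self._capture: str | None = None  # "title" or "ld+json" while inside those tags
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta":
            key = a.get("property") or a.get("name") or ""
            content = a.get("content", "")
            if key.startswith("og:"):
                self.open_graph.setdefault(key[3:], content)  # first declaration wins
            elif key.startswith("twitter:"):
                self.twitter_card.setdefault(key[8:], content)
            elif key:
                self.meta.setdefault(key.lower(), content)
        elif tag == "link":
            rel = a.get("rel", "").lower()
            href = urljoin(self.base_url, a.get("href", ""))
            if rel == "canonical":
                self.canonical_url = href
            elif rel == "alternate" and a.get("hreflang"):
                self.hreflangs.append({"hreflang": a["hreflang"], "href": href})
            elif rel in self.ICON_RELS:
                self.favicons.append({"rel": rel, "href": href, "sizes": a.get("sizes") or None})
        elif tag == "title":
            self._capture, self._buffer = "title", []
        elif tag == "script" and a.get("type", "").lower() == "application/ld+json":
            self._capture, self._buffer = "ld+json", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buffer)
        if tag == "title" and self._capture == "title":
            self.title = self.title or text.strip()  # keep the first <title> only
            self._capture = None
        elif tag == "script" and self._capture == "ld+json":
            try:
                self.json_ld.append(json.loads(text))
            except json.JSONDecodeError:
                pass  # skip a malformed block instead of failing the whole page
            self._capture = None


def schema_types(blocks: list) -> list[str]:
    """Collects unique schema.org @type values, walking @graph and nested nodes."""
    found: list[str] = []

    def visit(node) -> None:
        if isinstance(node, dict):
            raw = node.get("@type")
            for name in (raw if isinstance(raw, list) else [raw]):
                if isinstance(name, str) and name not in found:
                    found.append(name)
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(blocks)
    return found
```

Feed the page's HTML to a parser created with the final URL, call `close()`, then read `open_graph`, `twitter_card`, `json_ld`, `favicons`, `hreflangs`, and `canonical_url`. Skipping malformed JSON-LD blocks is one reasonable policy; raising or logging them would also work.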
Why use URL Metadata Scraper?
- Link preview cards - Build Discord, Slack, or iMessage-style link previews without maintaining your own parser
- Social media tools - Preview how a URL will appear when shared on Facebook, Twitter, or LinkedIn before you hit publish
- SEO audits - Bulk-check Open Graph and structured data compliance across thousands of URLs in minutes
- Content aggregators - Power "read later" apps, RSS readers, or news aggregators with rich metadata
- Bookmark managers - Show users clean titles and thumbnails when they save URLs
- Knowledge bases - Enrich internal documents, Slack links, or Notion pages with source metadata
- AI agents - Give LLM-based agents structured context about any URL they encounter
Built on Apify: scheduling, API access, proxy rotation, webhooks, and monitoring out of the box.
How to use URL Metadata Scraper
- Click Try for free and sign in to Apify
- Paste your URLs into the URLs field
- (Optional) Turn off JSON-LD or favicons to slim down the output
- Click Start - results appear in seconds
- Download as JSON, CSV, or Excel, or query via the Apify API
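To query the actor from code instead of the console, one option is the Apify API's `run-sync-get-dataset-items` endpoint, which starts a run and returns the resulting dataset items in the response. The sketch below uses only the Python standard library. `ACTOR_ID` is a hypothetical placeholder (copy the real ID from the actor's API tab), and the helper names `build_run_request` and `run_actor` are illustrative; the payload mirrors the fields documented under Input.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Hypothetical placeholder: replace with the real actor ID ("username~actor-name").
ACTOR_ID = "your-username~url-metadata-scraper"


def build_run_request(urls: list[str], token: str, actor_id: str = ACTOR_ID,
                      include_json_ld: bool = True) -> urllib.request.Request:
    """Builds (but does not send) a synchronous run request for the given URLs."""
    actor_path = urllib.parse.quote(actor_id, safe="~")
    endpoint = f"{API_BASE}/acts/{actor_path}/run-sync-get-dataset-items"
    payload = {
        "startUrls": [{"url": url} for url in urls],  # same shape as the Input example
        "includeJsonLd": include_json_ld,
        "includeFavicons": True,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(urls: list[str], token: str) -> list[dict]:
    """Sends the request and returns the dataset items (dicts shaped like the Output example)."""
    request = build_run_request(urls, token)
    with urllib.request.urlopen(request, timeout=330) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call lets you inspect or log the payload before sending it. For large batches, starting an asynchronous run and fetching the dataset afterwards avoids holding one HTTP request open for the whole run.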
Input
```json
{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://github.com" }
  ],
  "includeJsonLd": true,
  "includeFavicons": true,
  "maxRequestsPerCrawl": 100
}
```
| Field | Type | Description |
|---|---|---|
| startUrls | array | List of URLs to scrape. Each entry is `{ "url": "..." }`. Required. |
| includeJsonLd | boolean | Include JSON-LD structured data in the output. Default: true. |
| includeFavicons | boolean | Include favicons and apple-touch-icons. Default: true. |
| maxRequestsPerCrawl | integer | Safety cap on requests. Default: 100, max: 5000. |
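When URLs arrive as a CSV export, a small helper can turn them into the input object above. This is a hypothetical sketch: `build_input` is not part of the actor. It keeps only first-column cells that look like http(s) URLs, drops duplicates, and raises `maxRequestsPerCrawl` above the default of 100 when there are more URLs than that, on the assumption that each URL costs one request.

```python
from __future__ import annotations

import csv
import io

DEFAULT_MAX_REQUESTS = 100  # documented default for maxRequestsPerCrawl
MAX_REQUESTS_CAP = 5000     # documented maximum


def build_input(csv_text: str, include_json_ld: bool = True,
                include_favicons: bool = True) -> dict:
    """Hypothetical helper: turns CSV text (URLs in the first column) into the actor input."""
    urls: list[str] = []
    seen: set[str] = set()
    for row in csv.reader(io.StringIO(csv_text)):
        cell = row[0].strip() if row else ""
        # skips header rows, blank lines, and anything that is not an http(s) URL
        if cell.lower().startswith(("http://", "https://")) and cell not in seen:
            seen.add(cell)
            urls.append(cell)
    if not urls:
        raise ValueError("startUrls is required, but no http(s) URLs were found")
    if len(urls) > MAX_REQUESTS_CAP:
        raise ValueError(f"{len(urls)} URLs exceed the {MAX_REQUESTS_CAP}-request cap; split the batch")
    return {
        "startUrls": [{"url": url} for url in urls],
        "includeJsonLd": include_json_ld,
        "includeFavicons": include_favicons,
        # assumes one request per URL, so the default cap would otherwise drop URLs
        "maxRequestsPerCrawl": max(DEFAULT_MAX_REQUESTS, len(urls)),
    }
```

Serialize the result with `json.dumps` to paste it into the JSON input editor, or pass it as the request body of an API call.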
Output
```json
{
  "url": "https://apify.com/",
  "statusCode": 200,
  "title": "Apify: Full-stack web scraping and data extraction platform",
  "description": "Build, deploy, and scale web scraping and automation Actors...",
  "image": "https://apify.com/og-image.png",
  "siteName": "Apify",
  "type": "website",
  "author": null,
  "language": "en",
  "canonicalUrl": "https://apify.com/",
  "keywords": ["web scraping", "data extraction", "automation"],
  "themeColor": "#ffffff",
  "viewport": "width=device-width, initial-scale=1",
  "openGraph": {
    "title": "Apify...",
    "description": "Build, deploy, and scale...",
    "image": "https://apify.com/og-image.png",
    "type": "website",
    "site_name": "Apify",
    "locale": "en_US"
  },
  "twitterCard": {
    "card": "summary_large_image",
    "site": "@apify",
    "title": "Apify...",
    "description": "...",
    "image": "https://apify.com/og-image.png"
  },
  "favicons": [
    { "rel": "icon", "href": "https://apify.com/favicon.ico", "sizes": null, "type": "image/x-icon" },
    { "rel": "apple-touch-icon", "href": "https://apify.com/apple-touch-icon.png", "sizes": "180x180", "type": null }
  ],
  "hreflangs": [],
  "jsonLd": [ ... ],
  "structuredDataTypes": ["Organization", "WebSite"],
  "contentType": "text/html; charset=utf-8",
  "contentLanguage": "en",
  "scrapedAt": "2026-04-13T19:42:17.301Z"
}
```
You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
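Once the dataset is exported as JSON, a few lines are enough for a basic SEO audit. The sketch assumes each item has the shape of the example output above. The checks (`og:title`, `og:description`, `og:image`, at least one schema.org type, HTTP 200) and the names `audit_items` and `audit_file` are illustrative, not rules the actor enforces. If `includeJsonLd` was turned off, `structuredDataTypes` may be empty, so drop that check.

```python
from __future__ import annotations

import json

REQUIRED_OG = ("title", "description", "image")  # illustrative audit rule


def audit_items(items: list[dict]) -> list[dict]:
    """Returns one finding per URL that misses key OG tags, structured data, or HTTP 200."""
    findings = []
    for item in items:
        og = item.get("openGraph") or {}
        issues = [f"missing og:{key}" for key in REQUIRED_OG if not og.get(key)]
        if not item.get("structuredDataTypes"):
            issues.append("no JSON-LD")
        if item.get("statusCode") != 200:
            issues.append(f"HTTP {item.get('statusCode')}")
        if issues:
            findings.append({"url": item.get("url"), "issues": issues})
    return findings


def audit_file(path: str) -> list[dict]:
    """Loads a dataset exported in JSON format (a list of items) and audits it."""
    with open(path, encoding="utf-8") as handle:
        return audit_items(json.load(handle))
```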
Output fields
| Field | Type | Description |
|---|---|---|
| title | string | Best title found (OG > Twitter > `<title>`) |
| description | string | Best description found |
| image | string | Absolute URL of the primary image |
| siteName | string | Site name from OG |
| type | string | OG type (website, article, product, ...) |
| canonicalUrl | string | Canonical URL after redirect resolution |
| openGraph | object | All Open Graph properties |
| twitterCard | object | All Twitter Card properties |
| favicons | array | All favicon / apple-touch-icon links |
| hreflangs | array | Language alternates |
| jsonLd | array | Every parsed JSON-LD block |
| structuredDataTypes | array | Normalized list of schema.org types on the page |
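The `title` row describes a priority order: OG first, then Twitter, then the HTML `<title>`. That fallback can be sketched as below. Applying the same order to `description` (ending with `<meta name="description">`) and resolving a relative `image` against the page URL are assumptions; the actor's exact rules for those fields are not documented here, and `first_non_empty` and `normalize` are hypothetical names.

```python
from __future__ import annotations

from urllib.parse import urljoin


def first_non_empty(*values: str | None) -> str | None:
    """Returns the first value that is not None or whitespace-only, stripped."""
    for value in values:
        if value and value.strip():
            return value.strip()
    return None


def normalize(page_url: str, open_graph: dict, twitter_card: dict,
              html_title: str | None, meta: dict) -> dict:
    """Picks top-level fields using an OG > Twitter > HTML priority order."""
    image = first_non_empty(open_graph.get("image"), twitter_card.get("image"))
    return {
        "title": first_non_empty(open_graph.get("title"), twitter_card.get("title"), html_title),
        # assumption: description follows the same order, ending with <meta name="description">
        "description": first_non_empty(
            open_graph.get("description"), twitter_card.get("description"), meta.get("description")
        ),
        # assumption: relative image paths are resolved against the page URL
        "image": urljoin(page_url, image) if image else None,
        "siteName": first_non_empty(open_graph.get("site_name")),
        "type": first_non_empty(open_graph.get("type")),
    }
```

Treating whitespace-only values as missing means a page with an empty `og:title` still falls back to the Twitter or HTML title instead of producing a blank preview.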
How much does it cost to extract URL metadata?
The actor uses a lightweight Cheerio crawler (plain HTTP requests, no headless browser) with up to 10 concurrent requests, so it is very cheap to run. Extracting metadata from 1,000 URLs typically costs pennies, an amount that fits within the Apify free tier.
Tips and advanced options
- Batch thousands of URLs - Paste a CSV or file of URLs; the actor processes them in parallel
- Disable JSON-LD for lean output - If you only need OG tags for link previews, turn off `includeJsonLd` to shrink your dataset size by 50-80%
- Schedule recurring scans - Use Apify Schedules to detect when a page's metadata changes (e.g., for monitoring competitor landing pages)
- Integrate with your link preview service - Call the actor's API from your app every time a user pastes a URL and cache the result
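For the link preview integration and scheduled change monitoring, a small cache and a field diff cover the basics. Both are hypothetical helpers, not part of the actor: `PreviewCache` wraps any fetch function (for example, one that calls the actor's API) with an in-memory time-to-live, and `changed_fields` compares two items scraped from the same URL on different runs. The watched fields and the 24-hour default TTL are illustrative choices; a production service would more likely use a shared cache such as Redis.

```python
from __future__ import annotations

import time
from typing import Callable

WATCHED_FIELDS = ("title", "description", "image", "canonicalUrl")  # illustrative


class PreviewCache:
    """Hypothetical in-memory TTL cache in front of any metadata fetch function."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # e.g. a function that calls the actor's API
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, url: str) -> dict:
        key = url.strip()
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # fresh hit: no new scrape
        metadata = self._fetch(key)  # miss or expired entry: scrape again
        self._entries[key] = (now, metadata)
        return metadata


def changed_fields(old: dict, new: dict, fields=WATCHED_FIELDS) -> dict:
    """Maps each watched field whose value differs between two runs to (old, new)."""
    return {name: (old.get(name), new.get(name)) for name in fields if old.get(name) != new.get(name)}
```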
FAQ
Which sites are supported? Any public URL that returns HTML. The extractor is site-agnostic: it reads standard metadata tags, so it needs no per-site selectors or configuration.
Is this legal? The actor reads only public HTML and the metadata the site voluntarily declares - exactly what your browser does when it renders a page. No terms of service are bypassed.
What if the site blocks scrapers? Add an Apify proxy configuration in the run settings and the actor will route through datacenter or residential IPs.
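Can it handle JavaScript-rendered pages? Most metadata is in the static HTML, which Cheerio parses without running any JavaScript. Pages that inject OG tags with client-side JavaScript need a browser-based variant built on PlaywrightCrawler, since Cheerio does not execute scripts.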
Support
Hit a site where metadata extraction fails or returns the wrong title? Open an issue with the URL and we will investigate.