URL Metadata Scraper - OG, Twitter, JSON-LD

Pricing

Pay per event

Extract complete metadata from any URL: Open Graph tags, Twitter Card metadata, JSON-LD structured data, favicons, hreflang alternates, canonical URLs and HTML meta. Perfect for link previews, SEO audits, social media tools, bookmark managers and content aggregators.

Developer

Mohieldin Mohamed

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 0
  • Monthly active users: 0
  • Last modified: 5 days ago

URL Metadata Scraper - OG, Twitter Card, JSON-LD

Extract every piece of metadata from any URL in one API call. URL Metadata Scraper pulls Open Graph tags, Twitter Card metadata, JSON-LD structured data, favicons, hreflang alternates, canonical URLs, and basic HTML meta - everything you need for link previews, SEO audits, social media tools, content aggregators, and bookmark managers.

Pass in a list of URLs, get back a normalized JSON payload with title, description, image, site name, author, language, structured data types, and much more. No custom selectors. No broken previews.

What does URL Metadata Scraper do?

This actor fetches each URL and extracts a complete metadata profile:

  • Normalized top-level fields - title, description, image, site name, type, author, language (picks the best value across OG, Twitter, and HTML meta)
  • Open Graph - All og:* properties (title, description, image, type, site_name, locale, video, audio, article, book, profile, ...)
  • Twitter Card - All twitter:* properties (card, title, description, image, creator, site, player, ...)
  • JSON-LD structured data - Every <script type="application/ld+json"> block parsed and returned as-is, plus a normalized list of schema types for quick filtering
  • Favicons - All declared icons including apple-touch-icon, mask-icon, and manifest
  • hreflang alternates - Language variants of the page
  • Canonical URL - The authoritative URL after redirects and <link rel="canonical">
  • Basic meta - keywords, author, viewport, theme-color, description
  • HTTP info - final URL, status code, content-type, content-language
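The core of the extraction described above is reading `<meta>` tags out of static HTML. The following is a minimal sketch of that idea using only Python's standard library; it is not the actor's actual implementation (which the page says is built on Cheerio), and the class name `MetaTagParser` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Hypothetical helper: collects og:* and twitter:* meta tags plus <title>."""

    def __init__(self):
        super().__init__()
        self.open_graph = {}
        self.twitter_card = {}
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # OG tags normally use `property`; Twitter tags normally use `name`.
            key = attr.get("property") or attr.get("name") or ""
            content = attr.get("content")
            if content is None:
                return
            if key.startswith("og:"):
                # Keep the first occurrence of each property.
                self.open_graph.setdefault(key[len("og:"):], content)
            elif key.startswith("twitter:"):
                self.twitter_card.setdefault(key[len("twitter:"):], content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()


# Made-up sample page, used only to exercise the parser.
SAMPLE_HTML = """
<html><head>
<title>Example page</title>
<meta property="og:title" content="Example OG title">
<meta property="og:site_name" content="Example">
<meta name="twitter:card" content="summary_large_image">
</head><body></body></html>
"""

parser = MetaTagParser()
parser.feed(SAMPLE_HTML)
print(parser.open_graph)
print(parser.twitter_card)
print(parser.title)
```

Because this only reads the HTML as delivered, tags injected later by JavaScript would not be seen, which matches the limitation noted in the FAQ.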

Why use URL Metadata Scraper?

  • Link preview cards - Build Discord, Slack, or iMessage-style link previews without maintaining your own parser
  • Social media tools - Preview how a URL will appear when shared on Facebook, Twitter, or LinkedIn before you publish
  • SEO audits - Bulk-check Open Graph and structured data compliance across thousands of URLs in minutes
  • Content aggregators - Power "read later" apps, RSS readers, or news aggregators with rich metadata
  • Bookmark managers - Show users clean titles and thumbnails when they save URLs
  • Knowledge bases - Enrich internal documents, Slack links, or Notion pages with source metadata
  • AI agents - Give LLM-based agents structured context about any URL they encounter

Built on Apify: scheduling, API access, proxy rotation, webhooks, and monitoring out of the box.

How to use URL Metadata Scraper

  1. Click Try for free and sign in to Apify
  2. Paste your URLs into the URLs field
  3. (Optional) Turn off JSON-LD or favicons to slim down the output
  4. Click Start - results appear in seconds
  5. Download as JSON, CSV, or Excel, or query via the Apify API
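For step 5, results can also be fetched programmatically. Here is a minimal sketch that builds a download URL, assuming the Apify API v2 dataset items endpoint; the dataset ID is a placeholder, `dataset_items_url` is a hypothetical helper, and authentication (needed for private datasets) is left out.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"

# Formats named in step 5; the API may support more than these.
ALLOWED_FORMATS = {"json", "csv", "xlsx"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Hypothetical helper: build the URL for downloading a run's dataset items."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


# YOUR_DATASET_ID is a placeholder; use the dataset ID shown for your run.
print(dataset_items_url("YOUR_DATASET_ID", "csv"))
```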

Input

{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://github.com" }
  ],
  "includeJsonLd": true,
  "includeFavicons": true,
  "maxRequestsPerCrawl": 100
}

Field | Type | Description
startUrls | array | List of URLs to scrape. Each entry is { "url": "..." }. Required.
includeJsonLd | boolean | Include JSON-LD structured data in the output. Default: true.
includeFavicons | boolean | Include favicons and apple-touch-icons. Default: true.
maxRequestsPerCrawl | integer | Safety cap on requests. Default: 100, max: 5000.
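If you generate the input from code, a small builder can enforce the documented constraints before a run starts. This is a sketch only: `build_input` is a hypothetical helper, and the lower bound of 1 for maxRequestsPerCrawl is an assumption, since only the default and the maximum are documented.

```python
import json

MAX_REQUESTS_CAP = 5000  # documented maximum for maxRequestsPerCrawl


def build_input(urls, include_json_ld=True, include_favicons=True, max_requests=100):
    """Hypothetical helper: build an input payload matching the table above."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    # Assumption: at least 1 request; only the 5000 maximum is documented.
    if not 1 <= max_requests <= MAX_REQUESTS_CAP:
        raise ValueError(f"maxRequestsPerCrawl must be between 1 and {MAX_REQUESTS_CAP}")
    return {
        "startUrls": [{"url": u} for u in urls],
        "includeJsonLd": include_json_ld,
        "includeFavicons": include_favicons,
        "maxRequestsPerCrawl": max_requests,
    }


payload = build_input(["https://apify.com", "https://github.com"], include_json_ld=False)
print(json.dumps(payload, indent=2))
```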

Output

{
  "url": "https://apify.com/",
  "statusCode": 200,
  "title": "Apify: Full-stack web scraping and data extraction platform",
  "description": "Build, deploy, and scale web scraping and automation Actors...",
  "image": "https://apify.com/og-image.png",
  "siteName": "Apify",
  "type": "website",
  "author": null,
  "language": "en",
  "canonicalUrl": "https://apify.com/",
  "keywords": ["web scraping", "data extraction", "automation"],
  "themeColor": "#ffffff",
  "viewport": "width=device-width, initial-scale=1",
  "openGraph": {
    "title": "Apify...",
    "description": "Build, deploy, and scale...",
    "image": "https://apify.com/og-image.png",
    "type": "website",
    "site_name": "Apify",
    "locale": "en_US"
  },
  "twitterCard": {
    "card": "summary_large_image",
    "site": "@apify",
    "title": "Apify...",
    "description": "...",
    "image": "https://apify.com/og-image.png"
  },
  "favicons": [
    { "rel": "icon", "href": "https://apify.com/favicon.ico", "sizes": null, "type": "image/x-icon" },
    { "rel": "apple-touch-icon", "href": "https://apify.com/apple-touch-icon.png", "sizes": "180x180", "type": null }
  ],
  "hreflangs": [],
  "jsonLd": [ ... ],
  "structuredDataTypes": ["Organization", "WebSite"],
  "contentType": "text/html; charset=utf-8",
  "contentLanguage": "en",
  "scrapedAt": "2026-04-13T19:42:17.301Z"
}

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
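Once downloaded, the items are easy to post-process. As one illustration of the SEO-audit use case, here is a minimal sketch that flags missing fields per URL. The checklist in `REQUIRED_OG`, the helper `audit_item`, and the sample items are assumptions for illustration, not rules or data from the actor.

```python
# Assumed checklist of Open Graph properties to require; adjust to your needs.
REQUIRED_OG = ("title", "description", "image")


def audit_item(item):
    """Hypothetical helper: list the checks that one output item fails."""
    og = item.get("openGraph") or {}
    missing = [f"og:{name}" for name in REQUIRED_OG if not og.get(name)]
    if not item.get("canonicalUrl"):
        missing.append("canonical")
    if not item.get("structuredDataTypes"):
        missing.append("json-ld")
    return missing


# Made-up items shaped like the output above (only the fields the audit reads).
items = [
    {
        "url": "https://example.com/a",
        "openGraph": {"title": "A", "description": "d", "image": "https://example.com/a.png"},
        "canonicalUrl": "https://example.com/a",
        "structuredDataTypes": ["Article"],
    },
    {
        "url": "https://example.com/b",
        "openGraph": {"title": "B"},
        "canonicalUrl": None,
        "structuredDataTypes": [],
    },
]

report = {item["url"]: audit_item(item) for item in items}
for url, problems in report.items():
    print(url, "OK" if not problems else ", ".join(problems))
```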

Output fields

Field | Type | Description
title | string | Best title found (OG > Twitter > <title>)
description | string | Best description found
image | string | Absolute URL of the primary image
siteName | string | Site name from OG
type | string | OG type (website, article, product, ...)
canonicalUrl | string | Canonical URL after redirect resolution
openGraph | object | All Open Graph properties
twitterCard | object | All Twitter Card properties
favicons | array | All favicon / apple-touch-icon links
hreflangs | array | Language alternates
jsonLd | array | Every parsed JSON-LD block
structuredDataTypes | array | Normalized list of schema.org types on the page
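The title row documents a precedence of OG > Twitter > <title>. A minimal sketch of that fallback follows; it assumes blank or whitespace-only values count as missing, which the source does not state, and both function names are hypothetical.

```python
def first_non_empty(*candidates):
    """Return the first candidate that is a non-blank string, else None."""
    for value in candidates:
        # Assumption: whitespace-only strings are treated as missing.
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def best_title(open_graph, twitter_card, html_title):
    """Hypothetical helper applying the documented order: OG > Twitter > <title>."""
    return first_non_empty(
        open_graph.get("title"),
        twitter_card.get("title"),
        html_title,
    )


print(best_title({"title": "From OG"}, {"title": "From Twitter"}, "From HTML"))
print(best_title({"title": "  "}, {"title": "From Twitter"}, "From HTML"))
print(best_title({}, {}, None))
```

The same helper would work for description or image if they follow the same order, but the page only spells out the precedence for the title.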

How much does it cost to extract URL metadata?

The actor uses a lightweight Cheerio crawler (no headless browser) running 10 concurrent requests, which keeps compute costs low. Extracting 1,000 URLs typically costs pennies and can fit within the Apify free tier.

Tips and advanced options

  • Batch thousands of URLs - Paste a CSV or file of URLs; the actor processes them in parallel
  • Disable JSON-LD for lean output - If you only need OG tags for link previews, turn off includeJsonLd to shrink your dataset size by 50-80%
  • Schedule recurring scans - Use Apify Schedules to detect when a page's metadata changes (e.g., for monitoring competitor landing pages)
  • Integrate with your link preview service - Call the actor's API from your app every time a user pastes a URL and cache the result
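For the recurring-scan tip, detecting a change means comparing the item from the latest run with the one from the previous run. The sketch below assumes you keep the previous item yourself; `diff_metadata` is a hypothetical helper, the sample values are made up, and ignoring scrapedAt is a choice made here because that field changes on every run.

```python
# Fields that change on every run and should not count as a metadata change.
IGNORED_FIELDS = {"scrapedAt"}


def diff_metadata(previous, current):
    """Hypothetical helper: map each changed field to its (old, new) pair."""
    changes = {}
    for key in sorted((previous.keys() | current.keys()) - IGNORED_FIELDS):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Made-up items from two runs of the same URL.
previous = {
    "url": "https://example.com/",
    "title": "Old landing page title",
    "image": "https://example.com/og.png",
    "scrapedAt": "2026-04-13T19:42:17.301Z",
}
current = {
    "url": "https://example.com/",
    "title": "New landing page title",
    "image": "https://example.com/og.png",
    "scrapedAt": "2026-04-14T19:42:17.301Z",
}

changes = diff_metadata(previous, current)
print(changes)
```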

FAQ

Which sites are supported? Any public URL that returns HTML. The extractor is site-agnostic and handles millions of pages per month reliably.

Is this legal? The actor reads only public HTML and the metadata the site voluntarily declares - exactly what your browser does when it renders a page. No terms of service are bypassed.

What if the site blocks scrapers? Add an Apify proxy configuration in the run settings and the actor will route through datacenter or residential IPs.

Can it handle JavaScript-rendered pages? Most metadata is in static HTML and works with Cheerio. For pages that render OG tags via JS, use a PlaywrightCrawler variant.

Support

Hit a site where metadata extraction fails or returns the wrong title? Open an issue with the URL and we will investigate.