URL Metadata & OpenGraph Extractor

Pricing

$1.00 / 1,000 url reads

Reads a page's own public head tags (OpenGraph, Twitter card, title, description, canonical, favicon, and language) for clean link previews and RAG ingestion. Respects robots.txt by default. Billed only per URL successfully read.

Developer

Pono Data

Maintained by Community

Give it a list of page URLs and get back the metadata each page publishes for previews: OpenGraph (og:*), Twitter card (twitter:*), <title>, meta description, canonical link, declared favicon, and page language. The result is clean, flat rows, built for link previews and for feeding RAG pipelines with consistent per-link metadata.

Input

  • URLs: one per line.
  • Respect robots.txt: when on (default), the host's robots.txt is checked and disallowed URLs are skipped.
  • Max delivered URLs: cap on billed rows (0 = no cap).
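
As a rough illustration of how the robots.txt switch and the delivery cap might interact, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It is not the actor's code: the `select_urls` helper, the `USER_AGENT` string, and the parameter names are hypothetical, and the robots.txt text is passed in directly rather than fetched, so the example handles a single host only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical name; the actor's real User-Agent string is not stated in this listing.
USER_AGENT = "example-metadata-bot"


def select_urls(urls, robots_txt, respect_robots=True, max_delivered=0):
    """Split URLs for one host into (to_read, skipped) lists.

    robots_txt is the already-downloaded robots.txt body for that host.
    max_delivered == 0 means no cap, mirroring the input description.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    to_read, skipped = [], []
    for url in urls:
        if respect_robots and not parser.can_fetch(USER_AGENT, url):
            skipped.append(url)  # disallowed by robots.txt, so never read
        else:
            to_read.append(url)

    if max_delivered > 0:
        # Assumption: the cap simply truncates the list of URLs to read.
        to_read = to_read[:max_delivered]
    return to_read, skipped


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/"
    pages = [
        "https://example.com/",
        "https://example.com/private/page",
        "https://example.com/blog",
    ]
    print(select_urls(pages, robots))
```

A real run would need one robots.txt lookup per host; this sketch leaves that part out.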

Output

One row per URL: url, finalUrl, httpStatus, title, description, canonical, the og* fields, the twitter* fields, favicon, lang, plus provenance (sourceUrl, retrievedAt, confidence, dataSource).
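
For readers consuming the dataset in typed code, the row shape could be modeled roughly as below. This is a sketch under assumptions: the top-level field names come from the list above, but the individual og*/twitter* names (`ogTitle`, `ogType`, `ogImage`, `twitterCard`) and all of the value types are guesses for illustration, not a published schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class MetadataRow:
    # Fields named in the output description.
    url: str
    finalUrl: str
    httpStatus: int
    title: Optional[str] = None
    description: Optional[str] = None
    canonical: Optional[str] = None
    # Hypothetical examples of the og* and twitter* fields.
    ogTitle: Optional[str] = None
    ogType: Optional[str] = None
    ogImage: Optional[str] = None
    twitterCard: Optional[str] = None
    favicon: Optional[str] = None
    lang: Optional[str] = None
    # Provenance fields; the types here are assumptions.
    sourceUrl: Optional[str] = None
    retrievedAt: Optional[str] = None
    confidence: Optional[float] = None
    dataSource: Optional[str] = None


if __name__ == "__main__":
    row = MetadataRow(
        url="https://kubernetes.io",
        finalUrl="https://kubernetes.io",
        httpStatus=200,
        title="Kubernetes",
        sourceUrl="https://kubernetes.io",
    )
    # Tags the page did not declare stay None (null in JSON).
    print(asdict(row))
```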

How it works

Sites publish these head tags specifically so other tools can render previews. The actor fetches each page politely with a declared User-Agent, reads only the head, and copies the tags verbatim. Relative og:image, canonical, and favicon URLs are resolved to absolute URLs against the page URL; nothing else is transformed, and a tag the page does not declare is null, never invented. A URL that robots.txt disallows, or that fails to fetch, is written to the free rejected dataset and is not billed. A site owner can ask us to skip their domain at https://ponodata.com/opt-out; opted-out hosts are skipped and never charged.
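
The head-reading step described above can be approximated with Python's standard `html.parser` and `urllib.parse.urljoin`. This is an illustrative sketch, not the actor's implementation: `HeadTagParser` and `extract_head_metadata` are hypothetical names, the HTML is passed in as a string (fetching with a declared User-Agent is left out), and the output keys are the raw tag names rather than the actor's field names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Fields whose values are URLs and get resolved against the page URL.
URL_FIELDS = ("og:image", "canonical", "favicon")


class HeadTagParser(HTMLParser):
    """Collects <html lang>, <title>, <meta> and <link> values up to </head>."""

    def __init__(self):
        super().__init__()
        self.tags = {}
        self._in_title = False
        self._done = False  # becomes True once </head> is seen

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        a = dict(attrs)
        if tag == "html" and a.get("lang"):
            self.tags["lang"] = a["lang"]
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = a.get("property") or a.get("name")
            if key and a.get("content") is not None:
                # First declaration wins; the value is copied verbatim.
                self.tags.setdefault(key.lower(), a["content"])
        elif tag == "link" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            if "canonical" in rel:
                self.tags.setdefault("canonical", a["href"])
            elif "icon" in rel:
                self.tags.setdefault("favicon", a["href"])

    def handle_data(self, data):
        if self._in_title and not self._done:
            self.tags["title"] = self.tags.get("title", "") + data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "head":
            self._done = True


def extract_head_metadata(html, page_url):
    """Return declared head tags; relative URL fields are made absolute."""
    parser = HeadTagParser()
    parser.feed(html)
    tags = dict(parser.tags)
    for field in URL_FIELDS:
        if tags.get(field):
            tags[field] = urljoin(page_url, tags[field])
    return tags
```

Looking up an undeclared tag with `tags.get("og:title")` returns `None`, which matches the "null, never invented" rule; nothing is inferred from the body.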

Billing

Pay per URL successfully read. Robots-disallowed and failed URLs cost nothing.
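
To make the arithmetic concrete, here is a small Python sketch of a cost estimate at the listed rate of $1.00 per 1,000 URL reads. The `estimate_cost` helper and its parameter names are hypothetical; only the rate and the rule that skipped or failed URLs are free come from this listing.

```python
from decimal import Decimal

PRICE_PER_1000 = Decimal("1.00")  # USD per 1,000 URL reads, from the pricing line


def estimate_cost(read_ok, robots_disallowed=0, failed=0):
    """Estimate the charge in USD for one run.

    Only successfully read URLs are billed; the robots_disallowed and
    failed counts are accepted but deliberately do not affect the total.
    """
    return PRICE_PER_1000 * read_ok / 1000


if __name__ == "__main__":
    # 2,500 pages read, 300 skipped by robots.txt, 200 fetch failures.
    print(estimate_cost(2500, robots_disallowed=300, failed=200))
```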

Sample output

A real run reading each page's own public head tags (one row per URL):

| URL | title | description | OG type |
| --- | --- | --- | --- |
| https://www.cloudflare.com | Cloudflare: Build for the… | Welcome to Cloudflare - Powering … | website |
| https://stripe.com | Stripe / Financial Infras… | Stripe is a financial services pl… | website |
| https://www.python.org | Welcome to Python.org | The official home of the Python P… | website |
| https://kubernetes.io | Kubernetes | Kubernetes, also known as K8s, is… | website |

Every row carries a sourceUrl (the page read), for example https://www.cloudflare.com. Pages that return no metadata are routed to the free rejected dataset.

See also

More clean, pay-only-for-results data tools from Pono Data are listed in the full catalog: https://apify.com/thoob