Pricing

Pay per usage

Metadata Extractor

Developed by

Jan Čurn

A small, efficient actor that loads a web page, parses its HTML using the Cheerio library, and extracts metadata from the <HEAD> tag, such as the page title, description, and author.

Total users

1.3K

Runs succeeded

86%

Last modified

2 years ago

Developer tools

Open source

The actor takes a list of web page URLs as input, loads the HTML of each page, and extracts metadata from it. The results are stored as JSON in the default dataset.

For example, for https://www.apify.com, the JSON result looks as follows:

{
    "url": "https://www.apify.com/",
    "title": "Web Scraping, Data Extraction and Automation · Apify",
    "meta": {
        "X-UA-Compatible": "IE=edge,chrome=1",
        "viewport": "width=device-width,minimum-scale=1,initial-scale=1",
        "copyright": "Copyright&copy; 2019 Apify Technologies s.r.o. All rights reserved.",
        "keywords": "web scraper, web crawler, scraping, data extraction, API",
        "robots": "index,follow",
        "referrer": "origin",
        "googlebot": "index,follow",
        "description": "Apify extracts data from websites, crawls lists of URLs and automates workflows on the web. Turn any website into an API in a few minutes!",
        "twitter:card": "summary_large_image",
        "twitter:creator": "@apify",
        "fb:app_id": "1636933253245869",
        "og:url": "https://apify.com/",
        "og:type": "website",
        "og:title": "Web Scraping, Data Extraction and Automation · Apify",
        "og:description": "Apify extracts data from websites, crawls lists of URLs and automates workflows on the web. Turn any website into an API in a few minutes!",
        "og:image": "https://apify.com/img/og-image.png",
        "og:image:alt": "Apify",
        "og:image:width": "1200",
        "og:image:height": "630",
        "og:locale": "en_IE",
        "og:site_name": "Apify",
        "next-head-count": "19"
    }
}
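
The extraction step that produces a record like the one above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actor's actual code: the actor uses Cheerio, while this sketch uses Python's standard html.parser module, and the names MetaExtractor and extract_metadata are hypothetical. It also assumes that each meta tag is keyed by its name, property, or http-equiv attribute, which is consistent with the keys in the example but is not confirmed by the actor's source.

```python
import json
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and <meta> key/content pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            # Assumption: the key is taken from name, property, or http-equiv,
            # in that order of preference.
            key = (attributes.get("name")
                   or attributes.get("property")
                   or attributes.get("http-equiv"))
            content = attributes.get("content")
            if key and content is not None:
                self.meta[key] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_metadata(url, html):
    """Returns a record shaped like the example: url, title, and a meta dictionary."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "meta": parser.meta}


if __name__ == "__main__":
    # Made-up sample page, used only to show the shape of the output.
    sample = (
        "<html><head><title>Example Page</title>"
        '<meta name="description" content="A test page">'
        '<meta property="og:type" content="website">'
        "</head><body></body></html>"
    )
    print(json.dumps(extract_metadata("https://example.com/", sample), indent=4))
```

Fetching the page is left out on purpose, so the sketch can be run on any HTML string without network access.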

Meta Data Extractor

dainty_screw/metadata-extractor-reliable-web-page-metadata-extraction

Metadata Extractor is your go-to tool for extracting metadata from web pages. Using Cheerio, it parses HTML to extract titles, descriptions, authors, and more. Perfect for content managers and SEO experts.

codemaster devops

Metadata Scraper

louisdeconinck/metadata-scraper

Automatically scrape metadata such as title, description, heading and article from websites. It will crawl the start URLs and then scrape the metadata from the detail pages automatically navigating through the pagination.

Louis Deconinck

5.0

URL to Metadata

njoylab/url-summary-scraper

A powerful Apify actor that extracts essential website information, including title, description, images, and social media links. Perfect for quick data gathering and insights from any URL.

njoylab

5.0

Website Metadata Extractor (meta tags, sitemap, robots) 🔎

powerful_bachelor/website-metadata-extractor

🔍 Website Metadata Extractor 🌐 Extract essential website data: meta tags, robots.txt, and sitemap.xml in one scan. 📊 Analyze SEO elements, crawler directives, and site structure. ✅ Perfect for SEO audits, 🔎 competitor research, and 🚀 understanding how search engines view your website.

Powerful Bachelor

Get Metadata

maged120/get-metadata

The actor extracts comprehensive metadata, including image previews, titles, descriptions, author, publication time, favicon, and a lot more.

Maged

5.0

Metadata Scraper

autofacts/metadata-scraper

A powerful web scraper that extracts various types of structured metadata from web pages, including JSON-LD, Microdata, Open Graph, Twitter Cards, and more. Perfect for SEO analysis, content aggregation, and research purposes.

Autofactor

5.0

Example Sitemap Cheerio

jancurn/example-sitemap-cheerio

An example actor that first downloads a sitemap in XML format and then crawls each page from the sitemap using the fast CheerioCrawler from Apify SDK.

Jan Čurn

Sitemap URL Extractor

onescales/sitemap-url-extractor

Provide a link to a sitemap.xml (e.g. https://onescales.com/sitemap.xml) and the app will extract and list all URLs in the sitemap, as well as any additional data the sitemap contains.

One Scales

5.0

Sitemap Sniffer

vaclavrut/sitemap-sniffer

Sitemap Sniffer checks the most commonly used sitemap variants so that you can use the results for crawling. It saves you the time of checking them manually.