Metadata Extractor

jancurn/extract-metadata
A small, efficient actor that loads web pages, parses their HTML using the Cheerio library, and extracts metadata from the <head> tag, such as the page title, description, and author.
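
If you prefer to start the actor programmatically rather than from the Apify Console, a minimal sketch using the apify-client NPM package could look like the one below. The API token and URLs are placeholders, and the input object follows INPUT_SCHEMA.json shown further down.

// Hypothetical usage sketch, not part of the actor's source code.
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

(async () => {
    // Input fields correspond to INPUT_SCHEMA.json: a list of URLs plus an optional proxy config.
    const run = await client.actor('jancurn/extract-metadata').call({
        urls: ['https://www.apify.com/'],
        proxy: { useApifyProxy: true },
    });

    // The extracted metadata ends up in the run's default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.dir(items, { depth: null });
})();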

Dockerfile

FROM apify/actor-node:16
COPY package*.json ./
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --only=prod --no-optional --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version
COPY . ./

INPUT_SCHEMA.json

{
    "title": "Schema for the jancurn/metadata-extractor actor",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "urls": {
            "title": "Page URLs",
            "type": "array",
            "description": "URLs of the web pages to extract the metadata from. They must have either HTTP or HTTPS scheme.",
            "editor": "stringList",
            "prefill": ["https://www.apify.com/", "https://blog.apify.com"]
        },
        "proxy": {
            "title": "Proxy configuration",
            "type": "object",
            "description": "Select proxies to be used by your crawler.",
            "prefill": { "useApifyProxy": true },
            "editor": "proxy"
        }
    },
    "required": ["urls"]
}
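
For reference, a run input that satisfies this schema could look as follows (the URLs are the schema's prefill examples; "proxy" is optional, while "urls" is required):

{
    "urls": ["https://www.apify.com/", "https://blog.apify.com"],
    "proxy": { "useApifyProxy": true }
}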

main.js

const Apify = require('apify');

const { log } = Apify.utils;

Apify.main(async () => {
    const input = await Apify.getInput();
    const { urls = [], proxy = { useApifyProxy: false } } = input;

    // Also accept a single "url" input field, if provided.
    if (input.url) urls.push(input.url);

    const requests = [];
    for (const url of urls) {
        // new URL() throws on an invalid URL, so wrap it to report a clear error.
        try {
            new URL(url);
        } catch (err) {
            throw new Error(`All URLs must be valid URLs! Invalid value: ${url}`);
        }
        requests.push({ url });
    }

    const requestList = await Apify.openRequestList('start-urls', requests);
    const proxyConfiguration = await Apify.createProxyConfiguration({ ...proxy });

    const crawler = new Apify.CheerioCrawler({
        requestList,
        proxyConfiguration,
        maxConcurrency: 50,
        handlePageFunction: async ({ $, request }) => {
            // Collect all <meta> tags from <head>, keyed by name, property, or http-equiv.
            const meta = {};

            for (const tag of $('head meta')) {
                const name = $(tag).attr('name') || $(tag).attr('property') || $(tag).attr('http-equiv');
                const content = $(tag).attr('content');
                if (name) meta[name] = content ? content.trim() : null;
            }

            const result = {
                url: request.url,
                title: ($('head title').text() || '').trim(),
                meta,
            };

            return Apify.pushData(result);
        },
    });

    log.info('Starting the crawl...');
    await crawler.run();
    log.info('Scraping finished! Metadata for each site is available in "Results".');
});
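
Each item pushed to the default dataset by handlePageFunction has roughly the following shape (the values here are illustrative, not real output):

{
    "url": "https://www.apify.com/",
    "title": "Example page title",
    "meta": {
        "description": "Example content of <meta name=\"description\">",
        "og:title": "Example Open Graph title"
    }
}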

package.json

{
    "name": "extract-metadata",
    "version": "0.0.2",
    "description": "Metadata extractor.",
    "dependencies": {
        "apify": "^2.0.7"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Jan Curn"
}
Maintained by Community

Actor metrics
  • 35 monthly users
  • 39.0% runs succeeded
  • 0.0 days response time
  • Created in Feb 2018
  • Modified 7 months ago