Metadata Extractor

jancurn/extract-metadata

A small, efficient actor that loads a web page, parses its HTML using the Cheerio library, and extracts metadata from the <head> element, such as the page title, description, and author.

The actor takes a list of web page URLs as input, loads the HTML of each page, and extracts the metadata from it. The results are stored as JSON in the default dataset.
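
The extraction step can be illustrated with a minimal sketch. This is not the actor's source code: it uses Python's standard html.parser instead of Cheerio, and the names HeadMetadataParser and extract_metadata are hypothetical. It also assumes that each key comes from a meta tag's name, property, or http-equiv attribute, which is a guess based on the keys that appear in the actor's output.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects the <title> text and <meta> key/content pairs found in <head>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_head = False
        self.in_title = False
        self.title_parts = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "head":
            self.in_head = True
        elif tag == "title" and self.in_head:
            self.in_title = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            # Assumption: the key is taken from name, property or http-equiv.
            key = attr.get("name") or attr.get("property") or attr.get("http-equiv")
            content = attr.get("content")
            if key and content is not None:
                self.meta[key] = content

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def extract_metadata(url, html):
    """Returns a record with the page URL, its title and its <head> meta tags."""
    parser = HeadMetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": "".join(parser.title_parts).strip(),
        "meta": parser.meta,
    }
```

Calling extract_metadata("https://example.com/", html) on an already downloaded HTML string returns a dictionary with url, title, and meta keys. Fetching the page and writing the record to a dataset are left out of the sketch.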

For example, for https://www.apify.com, the JSON result looks as follows:

{
    "url": "https://www.apify.com/",
    "title": "Web Scraping, Data Extraction and Automation · Apify",
    "meta": {
        "X-UA-Compatible": "IE=edge,chrome=1",
        "viewport": "width=device-width,minimum-scale=1,initial-scale=1",
        "copyright": "Copyright&copy; 2019 Apify Technologies s.r.o. All rights reserved.",
        "keywords": "web scraper, web crawler, scraping, data extraction, API",
        "robots": "index,follow",
        "referrer": "origin",
        "googlebot": "index,follow",
        "description": "Apify extracts data from websites, crawls lists of URLs and automates workflows on the web. Turn any website into an API in a few minutes!",
        "twitter:card": "summary_large_image",
        "twitter:creator": "@apify",
        "fb:app_id": "1636933253245869",
        "og:url": "https://apify.com/",
        "og:type": "website",
        "og:title": "Web Scraping, Data Extraction and Automation · Apify",
        "og:description": "Apify extracts data from websites, crawls lists of URLs and automates workflows on the web. Turn any website into an API in a few minutes!",
        "og:image": "https://apify.com/img/og-image.png",
        "og:image:alt": "Apify",
        "og:image:width": "1200",
        "og:image:height": "630",
        "og:locale": "en_IE",
        "og:site_name": "Apify",
        "next-head-count": "19"
    }
}
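
In a result like the one above, the meta object is flat: plain names such as robots sit next to prefixed keys such as og:title and twitter:card. A consumer may want to separate them. The following is a small sketch of one way to do that; the helper name group_meta_by_prefix is hypothetical and not part of the actor.

```python
from collections import defaultdict


def group_meta_by_prefix(meta):
    """Groups keys such as "og:title" under "og"; keys without a colon go under ""."""
    groups = defaultdict(dict)
    for key, value in meta.items():
        # Split on the first colon only, so "og:image:alt" becomes ("og", "image:alt").
        prefix, separator, rest = key.partition(":")
        if separator:
            groups[prefix][rest] = value
        else:
            groups[""][key] = value
    return dict(groups)
```

For a record loaded from the dataset, group_meta_by_prefix(record["meta"]).get("og", {}) would then hold only the Open Graph fields.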
Developer
Maintained by Community
Actor metrics
  • 36 monthly users
  • 66.6% runs succeeded
  • 0.0 days response time
  • Created in Feb 2018
  • Modified 7 months ago