Metadata Extractor
jancurn/extract-metadata
A small, efficient actor that loads a web page, parses its HTML using the Cheerio library, and extracts metadata from the <head> element, such as the page title, description, and author.
The actor takes a list of web page URLs as input, loads the HTML of each page, and extracts the metadata from it. The result is stored as JSON in the default dataset.
For example, for https://www.apify.com, the JSON result looks as follows:
{
  "url": "https://www.apify.com/",
  "title": "Web Scraping, Data Extraction and Automation · Apify",
  "meta": {
    "X-UA-Compatible": "IE=edge,chrome=1",
    "viewport": "width=device-width,minimum-scale=1,initial-scale=1",
    "copyright": "Copyright© 2019 Apify Technologies s.r.o. All rights reserved.",
    "keywords": "web scraper, web crawler, scraping, data extraction, API",
    "robots": "index,follow",
    "referrer": "origin",
    "googlebot": "index,follow",
    "description": "Apify extracts data from websites, crawls lists of URLs and automates workflows on the web. Turn any website into an API in a few minutes!",
    "twitter:card": "summary_large_image",
    "twitter:creator": "@apify",
    "fb:app_id": "1636933253245869",
    "og:url": "https://apify.com/",
    "og:type": "website",
    "og:title": "Web Scraping, Data Extraction and Automation · Apify",
    "og:description": "Apify extracts data from websites, crawls lists of URLs and automates workflows on the web. Turn any website into an API in a few minutes!",
    "og:image": "https://apify.com/img/og-image.png",
    "og:image:alt": "Apify",
    "og:image:width": "1200",
    "og:image:height": "630",
    "og:locale": "en_IE",
    "og:site_name": "Apify",
    "next-head-count": "19"
  }
}
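To illustrate the kind of extraction described above, here is a minimal sketch that produces a record with the same shape (url, title, meta) from an HTML string. It is not the actor's source code: the actor uses Cheerio, whereas this sketch swaps in Python's standard html.parser module so that it runs without dependencies. The names HeadMetadataParser and extract_metadata are hypothetical, and the rule for choosing each key (the name, property, or http-equiv attribute of a meta tag, in that order) is an assumption inferred from the sample output, not confirmed behavior.

```python
import json
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects the <title> text and <meta> key/content pairs found inside <head>."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.meta = {}
        self._in_head = False
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self._in_head = True
        elif tag == "title" and self._in_head and self.title is None:
            self._in_title = True
        elif tag == "meta" and self._in_head:
            # Assumption: the key comes from name, property, or http-equiv.
            key = attrs.get("name") or attrs.get("property") or attrs.get("http-equiv")
            content = attrs.get("content")
            if key and content is not None:
                self.meta[key] = content

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False
        elif tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(url, html):
    """Returns a dict with the page URL, its title, and its <head> metadata."""
    parser = HeadMetadataParser()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title, "meta": parser.meta}


if __name__ == "__main__":
    sample = """<html><head>
    <title>Example page</title>
    <meta name="description" content="A short description.">
    <meta property="og:type" content="website">
    </head><body></body></html>"""
    print(json.dumps(extract_metadata("https://example.com/", sample), indent=2))
```

The sketch takes HTML that has already been downloaded; fetching each input URL and writing the records to a dataset are left out. Meta tags without a content attribute, such as a charset declaration, are skipped, as are meta tags that appear outside the head.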
Developer
Maintained by Community
Actor Metrics
30 monthly users
11 stars
>99% runs succeeded
22 days response time
Created in Feb 2018
Modified a year ago