Metascraper

This Actor is deprecated

This Actor is unavailable because the developer has decided to deprecate it.

kristoferlund/apify-metascraper

A simple actor that loads a web page and scrapes its metadata using the Metascraper library. Metascraper is a library for easily scraping metadata from an article on the web using Open Graph, JSON-LD, regular HTML metadata, and a series of fallbacks. https://metascraper.js.org
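
To make the "series of fallbacks" idea concrete, here is a minimal sketch of how a title could be resolved from several sources in priority order. It is not Metascraper's actual implementation: the helper names `getMetaContent`, `getTitleTag`, and `pickTitle` are hypothetical, and the regex-based parsing is a deliberate simplification that assumes double-quoted attributes with `property`/`name` appearing before `content`.

```javascript
// Simplified illustration of fallback-based metadata extraction.
// NOT Metascraper's implementation; all helper names are hypothetical.

// Return the content of the first <meta> tag whose property/name equals `key`.
// Assumes double-quoted attributes with property/name appearing before content.
function getMetaContent(html, key) {
  const escaped = key.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(
    `<meta[^>]*(?:property|name)="${escaped}"[^>]*content="([^"]*)"`,
    'i'
  );
  const match = html.match(pattern);
  return match ? match[1].trim() : null;
}

// Return the text of the <title> element, if there is one.
function getTitleTag(html) {
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return match ? match[1].trim() : null;
}

// Try each source in priority order and return the first non-empty value.
function pickTitle(html) {
  const candidates = [
    () => getMetaContent(html, 'og:title'),      // Open Graph
    () => getMetaContent(html, 'twitter:title'), // Twitter card
    () => getTitleTag(html),                     // plain HTML fallback
  ];
  for (const candidate of candidates) {
    const value = candidate();
    if (value) return value;
  }
  return null;
}
```

Calling `pickTitle(html)` on a page that has both an `og:title` tag and a `<title>` element returns the Open Graph value. The `<title>` text is used only when the higher-priority sources are missing or empty.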

Dockerfile

# This is a template for a Dockerfile used to run acts in the Actor system.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-basic:v0.21.10

# Copy just package.json and package-lock.json first, since they should be
# the only files that affect "npm install" in the next step, to speed up the build
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy source code to container
# Do this in the last step, to have a fast build if only the source code changed
COPY . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local node inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "0.21.10",
        "request-promise": "latest",
        "metascraper": "latest",
        "metascraper-author": "latest",
        "metascraper-date": "latest",
        "metascraper-description": "latest",
        "metascraper-image": "latest",
        "metascraper-video": "latest",
        "metascraper-youtube": "latest",
        "metascraper-logo": "latest",
        "metascraper-clearbit": "latest",
        "metascraper-publisher": "latest",
        "metascraper-title": "latest",
        "metascraper-url": "latest"
    },
    "scripts": {
        "start": "node main.js"
    }
}

main.js

const Apify = require('apify');
const request = require('request-promise');

// Build a Metascraper instance from the individual rule bundles.
const metascraper = require('metascraper')([
  require('metascraper-author')(),
  require('metascraper-date')(),
  require('metascraper-description')(),
  require('metascraper-image')(),
  require('metascraper-video')(),
  require('metascraper-youtube')(),
  require('metascraper-logo')(),
  require('metascraper-clearbit')(),
  require('metascraper-publisher')(),
  require('metascraper-title')(),
  require('metascraper-url')()
]);

Apify.main(async () => {
    // Get the input of the actor
    const input = await Apify.getInput();

    if (!input || !input.url) throw new Error('Invalid input, must be a JSON object with the "url" field!');

    // Download the page HTML and extract its metadata
    const html = await request(input.url);
    const metadata = await metascraper({ html, url: input.url });

    console.dir(metadata);

    // setValue() converts the value to JSON by default, so pass the object
    // itself rather than a pre-stringified string (which would be encoded twice)
    await Apify.setValue('OUTPUT', metadata);
});
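
The actor expects its input to be a JSON object with a "url" field, for example `{ "url": "https://example.com/article" }`. As a hedged sketch of how that check could be made stricter before any request is sent, the helper below also rejects malformed URLs and non-HTTP protocols. The name `validateInput` is hypothetical and not part of the actor or the Apify SDK, and the sketch assumes a Node.js version that provides the global WHATWG `URL` class.

```javascript
// Hypothetical helper (not part of the original actor): validate the actor
// input and return the normalized URL string.
function validateInput(input) {
  if (!input || typeof input.url !== 'string' || !input.url) {
    throw new Error('Invalid input, must be a JSON object with the "url" field!');
  }

  // The WHATWG URL constructor throws a TypeError on malformed URLs.
  const parsed = new URL(input.url);

  // Only http(s) pages can be downloaded and scraped.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol "${parsed.protocol}", expected http or https.`);
  }

  return parsed.href;
}
```

In main.js this could replace the inline check, with the returned string passed to both the download call and Metascraper, so a bad input fails fast with a clear message instead of surfacing later as a network error.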
Developer
Maintained by Community