Metadata Extractor

  • jancurn/extract-metadata
  • Users 945
  • Runs 599.2k
  • Created by Jan Čurn

A small, efficient Actor that loads a web page, parses its HTML using the Cheerio library, and extracts metadata from the <head> element, such as the page title, description, and author.
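For illustration only, here is a rough shell sketch of the kind of values the Actor reads from a page's <head>. It deliberately swaps Cheerio for plain sed regexes, so it is not the Actor's implementation. The functions extract_title and extract_meta and the sample HTML are hypothetical, and the regexes assume each tag sits on its own line with its attributes in the order shown.

```shell
#!/usr/bin/env bash
# Rough illustration only: the Actor uses Cheerio (a real HTML parser).
# These sed regexes are a hypothetical stand-in and break on multi-line
# tags, reordered attributes, or single-quoted values.

extract_title() {
  # Print the text between <title> and </title> from the first matching line.
  sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' | head -n 1
}

extract_meta() {
  # $1 = value of the name attribute, e.g. "description" or "author".
  # Prints the content="..." value of the first matching <meta> tag.
  sed -n "s/.*<meta name=\"$1\" content=\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Hypothetical sample <head> used only for this demonstration.
sample_head='<head>
<title>Example page</title>
<meta name="description" content="A short summary of the page">
<meta name="author" content="Example Author">
</head>'

printf '%s\n' "$sample_head" | extract_title
printf '%s\n' "$sample_head" | extract_meta description
printf '%s\n' "$sample_head" | extract_meta author
```

A real parser such as Cheerio handles the cases these regexes miss, which is the reason to use one for anything beyond a quick look.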

To run the code examples, you need an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token. For a more detailed explanation, see the section on running Actors via the API in the Apify documentation.

# Set API token
API_TOKEN="<YOUR_API_TOKEN>"

# Prepare Actor input
cat > input.json <<'EOF'
{
  "urls": [
    "https://www.apify.com/",
    "https://blog.apify.com"
  ],
  "proxy": {
    "useApifyProxy": true
  }
}
EOF

# Run the Actor
curl "https://api.apify.com/v2/acts/jancurn~extract-metadata/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
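The request above starts a run and returns immediately with the run's details, not the extracted metadata. A minimal sketch for starting the Actor and receiving its results in a single request uses Apify's run-sync-get-dataset-items endpoint. This assumes the Actor writes its output to the run's default dataset, which this page does not confirm. The helper name apify_sync_items_url is hypothetical. It also shows why the URL uses jancurn~extract-metadata: the API path takes the Actor ID with `~` in place of `/`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the URL of Apify's synchronous
# "run-sync-get-dataset-items" endpoint for a given Actor ID and token.
apify_sync_items_url() {
  local actor_id="$1" token="$2"
  local path_id
  # The API path expects "username~actor-name", so swap "/" for "~".
  path_id=$(printf '%s' "$actor_id" | tr '/' '~')
  printf 'https://api.apify.com/v2/acts/%s/run-sync-get-dataset-items?token=%s\n' \
    "$path_id" "$token"
}

# Usage: needs network access, a valid token, and the input.json created above.
# Assumption: the Actor stores its results in the run's default dataset.
if [ -n "${API_TOKEN:-}" ] && [ -f input.json ]; then
  curl -sS -X POST "$(apify_sync_items_url jancurn/extract-metadata "$API_TOKEN")" \
    -H 'Content-Type: application/json' \
    -d @input.json
fi
```

If the results turn out to be stored elsewhere (for example, in the run's key-value store), fetch them from the corresponding run storage endpoint instead.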