URL to Metadata - mail, social and more

Pricing

$2.50/month + usage

A powerful Apify actor that extracts essential website information, including title, description, images, mail, and social media links. Perfect for quick data gathering and insights from any URL.


Rating

5.0 (2)

Developer

njoylab (Maintained by Community)

Actor stats

  • Bookmarked: 7
  • Total users: 105
  • Monthly active users: 2
  • Last modified: 6 hours ago


Website URL to Metadata

Looking for AI-powered metadata extraction? Check out our pay-per-use actor with AI summary capabilities: URL Summary Scraper with AI

This project is a web scraping tool designed to extract metadata from websites. The scraper can fetch metadata such as titles, descriptions, social media links, and more from a webpage.

Features

  • Fetches metadata from a given URL.
  • Supports custom user-agent strings.
  • Respects robots.txt rules unless explicitly ignored.
  • Extracts social media links and contact information.
  • Extracts external links.
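
To give a feel for what this kind of extraction involves, here is a minimal sketch in Python using only the standard library. It is not the actor's actual code: the sample HTML, the class name, and the handful of fields handled are assumptions for illustration only.

```python
# Illustrative sketch of metadata extraction (not the actor's implementation).
from html.parser import HTMLParser

class MetadataParser(HTMLParser):
    """Collects a few common metadata fields while parsing an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # <meta name="..."> and <meta property="og:..."> both carry metadata.
            name = attrs.get("name") or attrs.get("property", "")
            if name == "description":
                self.metadata["description"] = attrs.get("content", "")
            elif name == "og:image":
                self.metadata["image"] = attrs.get("content", "")
        elif tag == "a" and "facebook.com" in attrs.get("href", ""):
            self.metadata["facebook"] = attrs["href"]

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = data.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

html = """<html><head><title>Example Domain</title>
<meta name="description" content="An example page.">
<meta property="og:image" content="https://example.com/image.png">
</head><body><a href="https://facebook.com/example">FB</a></body></html>"""

parser = MetadataParser()
parser.feed(html)
# parser.metadata now holds title, description, image, and facebook fields.
```

A production scraper would cover many more tags and edge cases, but the core idea is the same: walk the DOM and pick out well-known tags and attributes.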

Usage

  1. Prepare the input:

The input must include at least the url field:

    {
      "url": "https://example.com",
      "language": "en-US",
      "ignoreRobots": false,
      "ignoreExternalLinks": false,
      "ignoreInternalLinks": false
    }
  2. Output:

    {
      "title": "Example Domain",
      "description": "This domain is for use in illustrative examples in documents.",
      "keywords": "example, domain, illustrative, examples, documents",
      "image": "https://example.com/image.png",
      "facebook": "https://facebook.com/example",
      "x": "https://twitter.com/example",
      "linkedin": "https://linkedin.com/company/example",
      "instagram": "https://instagram.com/example",
      "youtube": "https://youtube.com/example",
      "trustpilot": "https://trustpilot.com/review/example.com",
      "canonical": "https://example.com",
      "url_fetched": "https://example.com",
      "url": "https://example.com",
      "mail": "contact@example.com",
      "robotsAllow": true,
      "linksExternal": ["https://partner-site.com/page1", "https://another-site.com/page2"],
      "linksInternal": ["https://example.com/about", "https://example.com/contacts"]
    }
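
The linksInternal / linksExternal split (and the social-profile fields) in the output above can be approximated by comparing each link's host against the page's host. The sketch below is illustrative, not the actor's implementation; the function name and the domain table are assumptions.

```python
# Illustrative link classification: internal vs. external vs. social profiles.
from urllib.parse import urlparse

SOCIAL_DOMAINS = {
    "facebook.com": "facebook",
    "twitter.com": "x",
    "linkedin.com": "linkedin",
    "instagram.com": "instagram",
}

def classify_links(page_url, hrefs):
    page_host = urlparse(page_url).netloc
    result = {"linksInternal": [], "linksExternal": []}
    for href in hrefs:
        host = urlparse(href).netloc
        social = SOCIAL_DOMAINS.get(host.removeprefix("www."))
        if social:
            result[social] = href           # known social profile
        elif host == page_host:
            result["linksInternal"].append(href)
        else:
            result["linksExternal"].append(href)
    return result

links = classify_links("https://example.com", [
    "https://example.com/about",
    "https://partner-site.com/page",
    "https://twitter.com/example",
])
# links["linksInternal"], links["linksExternal"], and links["x"] are now filled.
```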

Configuration

  • User-Agent: The scraper uses a random user-agent string for each request to mimic a real browser.
  • Language: You can specify the Accept-Language header in the input payload.
  • Robots.txt: By default, the scraper respects robots.txt rules. Set ignoreRobots to true in the input payload to bypass this.
  • External Links: By default, the scraper extracts external links. Set ignoreExternalLinks to true in the input payload to skip them.
  • Internal Links: By default, the scraper extracts internal links. Set ignoreInternalLinks to true in the input payload to skip them.
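
The default robots.txt behaviour can be sketched with Python's standard urllib.robotparser module. The actor's internals may differ; the rules below are invented for the example, and the ignore_robots variable stands in for the ignoreRobots input flag.

```python
# Illustrative robots.txt check, mirroring the default (respectful) behaviour.
from urllib.robotparser import RobotFileParser

robots_txt = """User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

ignore_robots = False  # corresponds to the ignoreRobots input flag
# Fetch only if the flag bypasses robots.txt or the rules allow the URL.
allowed = ignore_robots or rp.can_fetch("*", "https://example.com/page")
```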