URL to Metadata - mail, social and more
Pricing
$2.50/month + usage

A powerful Apify actor that extracts essential website information, including title, description, images, mail, and social media links. Perfect for quick data gathering and insights from any URL.
Rating: 5.0 (2)
Developer: njoylab (Maintained by Community)
Actor stats: 7 bookmarks · 105 total users · 2 monthly active users · last modified 7 hours ago
Website URL to Metadata
Looking for AI-powered metadata extraction? Check out our pay-per-use actor with AI summary capabilities: URL Summary Scraper with AI
This project is a web scraping tool designed to extract metadata from websites. The scraper can fetch metadata such as titles, descriptions, social media links, and more from a webpage.
Features
- Fetches metadata from a given URL.
- Supports custom user-agent strings.
- Respects `robots.txt` rules unless explicitly ignored.
- Extracts social media links and contact information.
- Extracts external links.
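The kind of extraction listed above can be sketched with Python's standard-library HTML parser. This is an illustrative stand-in, not the actor's actual implementation; the field names mirror the actor's output, and the sample HTML is made up:

```python
from html.parser import HTMLParser

SOCIAL_DOMAINS = ("facebook.com", "twitter.com", "linkedin.com", "instagram.com")

class MetadataParser(HTMLParser):
    """Collects the page title, meta description, and social links."""
    def __init__(self):
        super().__init__()
        self.meta = {"title": "", "description": "", "social": []}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.meta["description"] = attrs.get("content", "")
        elif tag == "a":
            href = attrs.get("href", "")
            if any(domain in href for domain in SOCIAL_DOMAINS):
                self.meta["social"].append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] += data

html = """<html><head><title>Example Domain</title>
<meta name="description" content="This domain is for use in illustrative examples.">
</head><body><a href="https://facebook.com/example">Facebook</a></body></html>"""

parser = MetadataParser()
parser.feed(html)
print(parser.meta["title"])   # Example Domain
print(parser.meta["social"])  # ['https://facebook.com/example']
```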
Usage
- Prepare the input. It must include at least the `url` field:

```json
{
  "url": "https://example.com",
  "language": "en-US",
  "ignoreRobots": false,
  "ignoreExternalLinks": false,
  "ignoreInternalLinks": false
}
```
- Output:

```json
{
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples in documents.",
  "keywords": "example, domain, illustrative, examples, documents",
  "image": "https://example.com/image.png",
  "facebook": "https://facebook.com/example",
  "x": "https://twitter.com/example",
  "linkedin": "https://linkedin.com/company/example",
  "instagram": "https://instagram.com/example",
  "youtube": "https://youtube.com/example",
  "trustpilot": "https://trustpilot.com/review/example.com",
  "canonical": "https://example.com",
  "url_fetched": "https://example.com",
  "url": "https://example.com",
  "mail": "contact@example.com",
  "robotsAllow": true,
  "linksExternal": ["https://example.com/external1", "https://example.com/external2"],
  "linksInternal": ["https://example.com/about", "https://example.com/contacts"]
}
```
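Each output record is plain JSON and can be consumed directly. A short sketch that pulls the social profiles and contact address out of a result record; the field names come from the output above, and `collect_socials` is an illustrative helper, not part of the actor:

```python
SOCIAL_FIELDS = ("facebook", "x", "linkedin", "instagram", "youtube", "trustpilot")

def collect_socials(record):
    """Return only the social-profile URLs present in an actor result."""
    return {key: record[key] for key in SOCIAL_FIELDS if record.get(key)}

# Trimmed-down version of the example output record above
record = {
    "title": "Example Domain",
    "mail": "contact@example.com",
    "facebook": "https://facebook.com/example",
    "x": "https://twitter.com/example",
    "robotsAllow": True,
}

socials = collect_socials(record)
print(sorted(socials))   # ['facebook', 'x']
print(record["mail"])    # contact@example.com
```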
Configuration
- User-Agent: The scraper uses a random user-agent string for each request to mimic a real browser.
- Language: You can specify the `Accept-Language` header in the input payload.
- Robots.txt: By default, the scraper respects `robots.txt` rules. Set `ignoreRobots` to `true` in the input payload to bypass this.
- External Links: By default, the scraper extracts external links. Set `ignoreExternalLinks` to `true` in the input payload to skip them.
- Internal Links: By default, the scraper extracts internal links. Set `ignoreInternalLinks` to `true` in the input payload to skip them.
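The robots.txt behaviour described above can be reproduced locally with Python's standard `urllib.robotparser`. A minimal sketch; the rules below are made up for illustration and say nothing about how the actor itself is implemented:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, parsed from a list of lines
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/"))             # True
print(rp.can_fetch("*", "https://example.com/private/page")) # False
```

Setting `ignoreRobots` to `true` would correspond to skipping this check entirely.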