
URL Summary Scraper
Pricing
$2.50/month + usage

A powerful Apify actor that extracts essential website information, including title, description, images, and social media links. Perfect for quick data gathering and insights from any URL.
5.0 (2)
Monthly users
5
Runs succeeded
>99%
Last modified
5 months ago
Website Summary Scraper
This project is a web scraping tool that extracts metadata from websites. It uses Axios for HTTP requests, Cheerio for HTML parsing, and the Apify SDK for actor management. The scraper can fetch titles, descriptions, social media links, and other metadata from a webpage.
Features
- Fetches metadata from a given URL.
- Supports custom user-agent strings.
- Respects robots.txt rules unless explicitly ignored.
- Extracts social media links and contact information.
- Extracts external links.
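The link-related features above can be illustrated with a minimal sketch. It is not the actor's actual code: the `SOCIAL_HOSTS` table, the `classifyLinks` function, and the rule that "external" means "different hostname from the page" are all assumptions made for illustration. It also takes a plain array of href strings and uses only Node's built-in `URL` class, where the real scraper would collect the hrefs with Cheerio.

```javascript
// Hypothetical mapping from output field name to the domains that count for it.
const SOCIAL_HOSTS = {
  facebook: ['facebook.com'],
  x: ['twitter.com', 'x.com'],
  linkedin: ['linkedin.com'],
  instagram: ['instagram.com'],
  youtube: ['youtube.com'],
  trustpilot: ['trustpilot.com'],
};

// True when `host` equals `domain` or is a subdomain of it.
function hostMatches(host, domain) {
  return host === domain || host.endsWith('.' + domain);
}

// Sort a page's href values into social profiles, a contact address,
// and external links (assumed here to mean "different hostname").
function classifyLinks(pageUrl, hrefs) {
  const pageHost = new URL(pageUrl).hostname;
  const result = { social: {}, mail: null, linksExternal: [] };

  for (const href of hrefs) {
    let link;
    try {
      link = new URL(href, pageUrl); // resolves relative hrefs against the page
    } catch {
      continue; // skip malformed hrefs
    }

    if (link.protocol === 'mailto:') {
      if (result.mail === null) result.mail = link.pathname; // keep the first address
      continue;
    }
    if (link.protocol !== 'http:' && link.protocol !== 'https:') continue;

    const network = Object.keys(SOCIAL_HOSTS).find((name) =>
      SOCIAL_HOSTS[name].some((domain) => hostMatches(link.hostname, domain))
    );
    if (network) {
      // Keep the first profile link found for each network.
      if (!(network in result.social)) result.social[network] = link.href;
    } else if (link.hostname !== pageHost) {
      result.linksExternal.push(link.href);
    }
  }
  return result;
}
```

Calling `classifyLinks('https://example.com/', hrefs)` returns an object whose `social`, `mail`, and `linksExternal` properties roughly correspond to the fields of the same names in the actor's output.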
Usage
- Prepare the input. It should include at least the url field.

      {
        "url": "https://example.com",
        "language": "en-US",
        "ignoreRobots": false,
        "ignoreExternalLinks": false
      }

- Output:

      {
        "title": "Example Domain",
        "description": "This domain is for use in illustrative examples in documents.",
        "keywords": "example, domain, illustrative, examples, documents",
        "image": "https://example.com/image.png",
        "facebook": "https://facebook.com/example",
        "x": "https://twitter.com/example",
        "linkedin": "https://linkedin.com/company/example",
        "instagram": "https://instagram.com/example",
        "youtube": "https://youtube.com/example",
        "trustpilot": "https://trustpilot.com/review/example.com",
        "canonical": "https://example.com",
        "url_fetched": "https://example.com",
        "url": "https://example.com",
        "mail": "contact@example.com",
        "robotsAllow": true,
        "linksExternal": ["https://example.com/external1", "https://example.com/external2"]
      }
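Since only the `url` field is required, it can help to check an input object before starting a run. The sketch below shows one way to do that. The `normalizeInput` function and the `INPUT_DEFAULTS` object are hypothetical helpers, not part of the actor. The two boolean defaults follow the Configuration section, while the `en-US` language default is an assumption.

```javascript
// Defaults for the optional fields. The boolean defaults follow the
// Configuration section; the language default is an assumption.
const INPUT_DEFAULTS = {
  language: 'en-US',
  ignoreRobots: false,
  ignoreExternalLinks: false,
};

// Return a complete input object, or throw if `url` is missing or unusable.
function normalizeInput(input) {
  if (!input || typeof input.url !== 'string' || input.url.trim() === '') {
    throw new Error('Input must include a "url" field');
  }
  const parsed = new URL(input.url.trim()); // throws a TypeError on an invalid URL
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('Only http and https URLs are supported');
  }
  // Caller-supplied fields override the defaults; the URL is stored in normalized form.
  return { ...INPUT_DEFAULTS, ...input, url: parsed.href };
}
```

Note that the built-in `URL` class normalizes `https://example.com` to `https://example.com/`, so the returned `url` may differ slightly from what was passed in.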
Configuration
- User-Agent: The scraper uses a random user-agent string for each request to mimic a real browser.
- Language: You can specify the Accept-Language header in the input payload.
- Robots.txt: By default, the scraper respects robots.txt rules. Set ignoreRobots to true in the input payload to bypass this.
- External Links: By default, the scraper extracts external links. Set ignoreExternalLinks to true in the input payload to skip this.
Pricing
Pricing model
Rental
To use this Actor, you have to pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for the Apify platform usage.
Free trial
3 days
Price
$2.50