Simple Contact Info and Social Media Scraper

Pay $3.00 for 1,000 results

pajoe/simple-contact-info-and-social-media-scraper
This Apify Actor crawls web pages with Puppeteer and extracts social media handles, email addresses, and phone numbers. It can handle dynamically rendered content and follow links across multiple pages.

Included features

  • Data Extraction: Extracts social media handles, emails, and phone numbers.
  • Dynamic Content Handling: Supports crawling through links and HTML frames.
  • Configurable: Set depth and request limits.
  • Proxy Support: Uses Apify's proxy configuration for anonymity and IP rotation.
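The extraction step listed above can be illustrated with a minimal, dependency-free sketch. The helper name `extractContacts`, the regular expressions, and the list of social networks are all assumptions made for illustration; the Actor's actual patterns are not documented here and are likely more thorough.

```javascript
// Hypothetical helper: names and patterns are illustrative, not the Actor's real code.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Loose phone pattern: optional "+", then digits mixed with spaces, dots, dashes, or parentheses.
const PHONE_RE = /\+?\d[\d\s().-]{6,}\d/g;
// Assumed set of networks; extend the alternation as needed.
const SOCIAL_RE = /https?:\/\/(?:www\.)?(?:facebook|twitter|linkedin|instagram)\.com\/[^\s"'<>]+/gi;

// Remove duplicates while keeping first-seen order.
function unique(matches) {
  return [...new Set(matches)];
}

// Scan raw page text or HTML and collect every distinct match per category.
function extractContacts(text) {
  return {
    emails: unique(text.match(EMAIL_RE) || []),
    phones: unique((text.match(PHONE_RE) || []).map((p) => p.trim())),
    socialProfiles: unique(text.match(SOCIAL_RE) || []),
  };
}

const sample =
  '<a href="mailto:hello@example.com">hello@example.com</a> ' +
  'Call +1 555-010-9999 or visit <a href="https://www.facebook.com/example">Facebook</a>';

console.log(extractContacts(sample));
```

Regex-based phone matching tends to produce false positives (dates, order numbers), so a real pipeline would probably validate candidates before storing them.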

How it works

  1. Input: Define start URLs in INPUT.json.
  2. Proxy Configuration: Set up proxies to avoid IP blocking.
  3. Crawler Setup: Use PuppeteerCrawler with custom routing.
  4. Request Handling: Customize page handling in routes.js.
  5. Execution: Start the crawler with crawler.run(startUrls).

Input Configuration

    {
      "considerChildFrames": true,
      "maxDepth": 2,
      "maxRequests": 100,
      "sameDomain": true,
      "startUrls": [
        {
          "url": "https://nonos.ph/",
          "method": "GET"
        }
      ]
    }

  • startUrls: List of URLs to start crawling from.
  • proxyConfig: Configuration for using Apify's proxy services.
  • sameDomain: Restrict crawling to the same domain.
  • maxDepth: Maximum depth of links to follow.
  • considerChildFrames: Enable crawling of HTML frames.
  • maxRequests: Total number of requests to make.
  • maxRequestsPerStartUrl: Limit requests per start URL.
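To show how sameDomain, maxDepth, and maxRequests might interact, here is a small breadth-first sketch that plans which URLs would be visited. It is not the Actor's implementation: `planCrawl` and `getLinks` are hypothetical names, the link graph is invented, and it assumes that start URLs sit at depth 0 and that "same domain" means an identical hostname. maxRequestsPerStartUrl is left out for brevity.

```javascript
// Hypothetical sketch: `planCrawl` and `getLinks` are illustrative, not part of the Actor.
function planCrawl(input, getLinks) {
  const { startUrls, maxDepth = 2, maxRequests = 100, sameDomain = true } = input;

  // Each queue entry remembers its depth and the hostname of the start URL it came from.
  const queue = startUrls.map(({ url }) => ({ url, depth: 0, origin: new URL(url).hostname }));
  const seen = new Set(queue.map((entry) => entry.url));
  const visited = [];

  while (queue.length > 0 && visited.length < maxRequests) {
    const { url, depth, origin } = queue.shift();
    visited.push(url);

    // Assumption: pages at maxDepth are visited, but their links are not followed.
    if (depth >= maxDepth) continue;

    for (const link of getLinks(url)) {
      if (seen.has(link)) continue;
      if (sameDomain && new URL(link).hostname !== origin) continue;
      seen.add(link);
      queue.push({ url: link, depth: depth + 1, origin });
    }
  }
  return visited;
}

// Invented link graph standing in for real pages.
const site = {
  'https://example.com/': ['https://example.com/about', 'https://other.example.org/'],
  'https://example.com/about': ['https://example.com/team'],
  'https://example.com/team': ['https://example.com/careers'],
};
const getLinks = (url) => site[url] || [];

const plan = planCrawl(
  { startUrls: [{ url: 'https://example.com/' }], maxDepth: 2, maxRequests: 100, sameDomain: true },
  getLinks,
);
console.log(plan);
// → [ 'https://example.com/', 'https://example.com/about', 'https://example.com/team' ]
```

In this sketch the external link is dropped by sameDomain and the careers page is never reached because it would sit at depth 3. Lowering maxRequests to 2 would stop the plan after the first two pages.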

Output Dataset

The actor stores its results in the default dataset associated with the actor run. You can download the results in formats such as JSON, HTML, CSV, XML, or Excel. Each record in the dataset includes:

  • URL: The page URL.
  • Email: Extracted email addresses.
  • Phone Number: Extracted phone numbers.
  • Social Media Profiles: Links to social media profiles (e.g., Facebook, Twitter, LinkedIn).
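Because each record describes a single page, a common follow-up is to merge records into one de-duplicated contact list per site. The sketch below assumes hypothetical field names (`url`, `emails`, `phones`, `socialProfiles`); the actual keys in the exported dataset may differ, so check a real export before relying on them.

```javascript
// Hypothetical record shape: the field names below are assumptions, not the Actor's documented schema.
const records = [
  {
    url: 'https://example.com/',
    emails: ['hello@example.com'],
    phones: ['+1 555-010-9999'],
    socialProfiles: ['https://www.facebook.com/example'],
  },
  {
    url: 'https://example.com/contact',
    emails: ['hello@example.com', 'sales@example.com'],
    phones: [],
    socialProfiles: ['https://www.linkedin.com/company/example'],
  },
];

// Combine per-page records into one set of unique values per category.
function mergeRecords(items) {
  const merged = { emails: new Set(), phones: new Set(), socialProfiles: new Set() };
  for (const item of items) {
    for (const key of Object.keys(merged)) {
      for (const value of item[key] || []) merged[key].add(value);
    }
  }
  // Convert the sets back to plain arrays for easy serialization.
  return Object.fromEntries(Object.entries(merged).map(([key, set]) => [key, [...set]]));
}

console.log(mergeRecords(records));
```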

Resources

If you're looking for examples or want to learn more visit:

Documentation reference

To learn more about Apify and Actors, take a look at the following resources:

Developer
Maintained by Community

Actor Metrics

  • 2 monthly users

  • 1 star

  • >99% runs succeeded

  • Created in Nov 2024

  • Modified 8 days ago