Contact Info Scraper

1 day trial then $20.00/month - No credit card required now

onidivo/contact-info-scraper

Extract contact information from a list of websites.

Features

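The actor extracts the following contact details from each crawled page: emails, phone numbers, and links to LinkedIn, Twitter, Instagram, Facebook, YouTube, TikTok, Pinterest, and Discord profiles.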

Input Configuration

The actor offers several input options to let you specify which pages will be crawled:

  • Start URLs - Lets you add a list of URLs of web pages where the scraper should start. You can enter multiple URLs, upload a text file with URLs, or even use a Google Sheets document.
  • Maximum link depth - Specifies how many links away from the Start URLs the actor will follow. If set to zero, the actor ignores all links and crawls only the Start URLs.
  • Stay within domain - If enabled, the actor will only follow links that are on the same domain as the referring page. For example, if the setting is enabled and the actor finds a link on http://www.example.com/some-page to http://www.another-domain.com/, it will not crawl the second page, because www.example.com is not the same as www.another-domain.com.

The actor also accepts additional input options that let you specify proxy servers, limit the number of pages, etc.
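
As a rough illustration of how the Maximum link depth and Stay within domain options interact, here is a minimal Python sketch of a link filter. It is not the actor's own code: the function name `should_follow` and its parameters are hypothetical, and it assumes that "same domain" means identical hostnames, as in the www.example.com example above.

```python
from urllib.parse import urlparse


def should_follow(referrer_url: str, link_url: str, referrer_depth: int,
                  max_depth: int, stay_within_domain: bool) -> bool:
    """Decide whether a link found on referrer_url should be crawled."""
    # A linked page sits one level deeper than the page it was found on.
    # Start URLs have depth 0, so max_depth == 0 rejects every link.
    if referrer_depth + 1 > max_depth:
        return False
    if stay_within_domain:
        # Assumption: "same domain" is an exact hostname comparison.
        referrer_host = urlparse(referrer_url).hostname
        link_host = urlparse(link_url).hostname
        if referrer_host != link_host:
            return False
    return True
```

With Stay within domain enabled, a link from http://www.example.com/some-page to http://www.another-domain.com/ is rejected, while a link to another page on www.example.com is accepted as long as the depth limit allows it.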

Results

The actor stores its results in the default dataset associated with the actor run. You can then download the results in formats such as JSON, HTML, CSV, XML, or Excel. For each page crawled, the following contact information is extracted (examples shown):

  • Emails
    noone@example.com
    no.one@example.com
    no+one@example.co.in
  • Phone numbers - These are extracted from phone links in HTML (e.g. <a href='tel://123456789'>phone</a>).
    123456789
    +123456789
    00123456789
  • Uncertain phone numbers - These are extracted from the plain text of the web page using a number of regular expressions. Note that this approach can generate false positives.
    +123.456.7890
    123456789
    123-456-789
  • LinkedIn profiles
    https://www.linkedin.com/in/mercedes-benz-group-ag
    en.linkedin.com/in/mercedes-benz-group-ag
    linkedin.com/in/mercedes-benz-group-ag
  • Twitter profiles
    https://www.twitter.com/mercedesbenz
    twitter.com/mercedesbenz
  • Instagram profiles
    https://www.instagram.com/mercedesbenz_careers
    www.instagram.com/mercedesbenz_careers/
    instagr.am/mercedesbenz_careers
  • Facebook profiles or pages
    https://www.facebook.com/mercedesbenzcareers
    facebook.com/mercedesbenzcareers
    fb.com/mercedesbenzcareers
    https://www.facebook.com/profile.php?id=99999000
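
The difference between phone numbers and uncertain phone numbers can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `TelLinkParser`, the function `extract_phones`, and the single `UNCERTAIN_PHONE` pattern are hypothetical stand-ins, and the actor's actual regular expressions are not documented here.

```python
import re
from html.parser import HTMLParser


class TelLinkParser(HTMLParser):
    """Collects phone numbers from <a href="tel:..."> links."""

    def __init__(self):
        super().__init__()
        self.phones = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("tel:"):
            # Drop the "tel:" scheme and any slashes that follow it.
            number = href[4:].lstrip("/")
            if number:
                self.phones.append(number)


# Hypothetical pattern: an optional "+", then at least seven digits that
# may be separated by single spaces, dots, or hyphens.
UNCERTAIN_PHONE = re.compile(r"\+?\d(?:[ .-]?\d){6,}")


def extract_phones(html: str, text: str):
    """Return (phones from tel links, uncertain phones from plain text)."""
    parser = TelLinkParser()
    parser.feed(html)
    return parser.phones, UNCERTAIN_PHONE.findall(text)


html = "<a href='tel://123456789'>phone</a> <a href=\"tel:+123456789\">call</a>"
text = "Call +123.456.7890 or 123-456-789. Order date: 2024-01-15."
phones, uncertain = extract_phones(html, text)
print(phones)     # numbers taken from tel links
print(uncertain)  # includes the date, a false positive
```

The date in the sample text matches the plain-text pattern, which shows why numbers found this way are reported separately as uncertain.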

Each result also includes the URL of the web page, its domain, the referring URL (if the page was linked from another page), and the depth (how many links away from the Start URLs the page was found).

For each page crawled, the resulting dataset contains a single record, which looks like this (in JSON format):

{
  "url": "https://group.mercedes-benz.com/investors/services/contact/",
  "domain": "mercedes-benz.com",
  "requestUrl": "https://group.mercedes-benz.com/investors/services/contact/",
  "depth": 2,
  "referrerUrl": null,
  "startUrl": "http://group.mercedes-benz.com/en/",
  "emails": [
    "alexander.jasperneite@mercedes-benz.com",
    "andreas.kusche@mercedes-benz.com",
    "christian.ck.keller@mercedes-benz.com",
    "dialog@mercedes-benz.com",
    "ellen_christin.haehnlein@mercedes-benz.com",
    "investorportal@computershare.de",
    "ir.mbg@mercedes-benz.com",
    "na.rothenberg@mercedes-benz.com",
    "patrick.odermatt@mercedes-benz.com"
  ],
  "phones": [
    "+4989309036376"
  ],
  "phonesUncertain": [
    "+49 89 30903",
    "0800 324 1111",
    "32 12 81 763",
    "711 17 94075"
  ],
  "linkedIns": [
    "https://www.linkedin.com/company/mercedes-benz-group-ag"
  ],
  "twitters": [
    "https://twitter.com/mercedesbenz"
  ],
  "instagrams": [
    "https://www.instagram.com/mercedesbenz_careers"
  ],
  "facebooks": [
    "https://www.facebook.com/mercedesbenzcareers"
  ],
  "youtubes": [
    "https://www.youtube.com/user/mercedesbenztv"
  ],
  "tiktoks": [
    "https://www.tiktok.com/@mercedesbenz"
  ],
  "pinterests": [],
  "discords": []
}
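
Because the dataset holds one record per page, the same contact often appears in several records. The following Python sketch shows one possible way to merge emails per domain after downloading the results as JSON. The helper `emails_by_domain` is hypothetical, and the two sample records are trimmed, illustrative data that only reuse the field names from the record above.

```python
import json
from collections import defaultdict

# Trimmed sample of a downloaded JSON export (illustrative data only).
records_json = """
[
  {
    "url": "https://group.mercedes-benz.com/investors/services/contact/",
    "domain": "mercedes-benz.com",
    "emails": ["ir.mbg@mercedes-benz.com", "dialog@mercedes-benz.com"]
  },
  {
    "url": "https://group.mercedes-benz.com/en/",
    "domain": "mercedes-benz.com",
    "emails": ["dialog@mercedes-benz.com"]
  }
]
"""


def emails_by_domain(records):
    """Group the unique emails of all page records by their domain."""
    grouped = defaultdict(set)
    for record in records:
        # A set removes duplicates found on several pages of one domain.
        grouped[record["domain"]].update(record.get("emails", []))
    return {domain: sorted(emails) for domain, emails in grouped.items()}


records = json.loads(records_json)
for domain, emails in emails_by_domain(records).items():
    print(domain, emails)
```

The same grouping approach would work for the other list fields, such as "phones" or "linkedIns".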
Developer
Maintained by Community
  • Created in Aug 2024