
Contact Details Scraper

Try for free

Pay $3.00 for 1,000 pages

vdrmota/contact-info-scraper
Free email extractor to extract and download emails, phone numbers, Facebook, Twitter, LinkedIn, and Instagram profiles from any website. Extract contact information at scale from lists of URLs and download the data as Excel, CSV, JSON, HTML, and XML.

Does not find any e-mail, even though I find it manually

Closed

beautiful_jumbo opened this issue
2 months ago

Hi,

I have a list of URLs and it doesn't find any e-mails at all.

Let's take the first example:

https://mi-tech.de

I can find an e-mail here (https://www.mi-tech.de/impressum/) and here (https://www.mi-tech.de/kontakt/)

It is in the code, even as a href link. Are my settings wrong?

Here is my input:

{
  "considerChildFrames": false,
  "maxDepth": 1000,
  "maxRequests": 9999999,
  "maxRequestsPerStartUrl": 1000,
  "sameDomain": true,
  "startUrls": [
    { "requestsFromUrl": "https://apify-uploads-prod.s3.us-east-1.amazonaws.com/J04srFO9aYIUdph82-no-email.txt" }
  ],
  "waitUntil": "domcontentloaded",
  "proxyConfig": { "useApifyProxy": true }
}

milunnn

Hi,

The https://mi-tech.de page was the only page from that domain that had been scraped at that point in time. Your config is okay; the only thing I would change is maxDepth, which seems too high for ordinary use. The recommended value for this field is below 10.
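For illustration, here is the input from above with only maxDepth lowered per that recommendation; all other fields are kept exactly as in the original input, and this is a sketch rather than canonical actor settings:

```json
{
  "considerChildFrames": false,
  "maxDepth": 5,
  "maxRequests": 9999999,
  "maxRequestsPerStartUrl": 1000,
  "sameDomain": true,
  "startUrls": [
    { "requestsFromUrl": "https://apify-uploads-prod.s3.us-east-1.amazonaws.com/J04srFO9aYIUdph82-no-email.txt" }
  ],
  "waitUntil": "domcontentloaded",
  "proxyConfig": { "useApifyProxy": true }
}
```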

If you don't know what it means: maxDepth is like a "distance" from the first page, i.e. how many link hops you can take away from the original one. If you put a high number there, you could theoretically crawl the whole website, including a lot of unimportant and rarely referenced pages.
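That "distance" idea can be sketched as a breadth-first crawl where each page's depth is its parent's depth plus one, and pages beyond maxDepth are never enqueued. This is a simplified illustration with a made-up in-memory site, not the actor's actual code:

```python
from collections import deque

def crawl(start_url, get_links, max_depth):
    """Breadth-first crawl where depth = number of link hops from start_url.

    get_links(url) -> list of URLs found on that page (supplied by the caller).
    Links are not followed from pages already at max_depth.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth)
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # do not follow links any further from here
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

# Tiny hypothetical site: index -> kontakt -> impressum
site = {
    "https://example.com/": ["https://example.com/kontakt"],
    "https://example.com/kontakt": ["https://example.com/impressum"],
    "https://example.com/impressum": [],
}
# With max_depth=1, only the start page and pages one hop away are visited.
pages = crawl("https://example.com/", lambda u: site.get(u, []), max_depth=1)
```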

I did a test run of the actor and it works as you would expect. The problem is that it concentrated on the other websites from the list first. To see the results for this page more quickly, try isolating the input to just https://mi-tech.de. You should see that it returns emails on the right pages.
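The reason results for one domain arrive slowly when many start URLs are given: the crawler pulls requests from a single shared queue, so pages from all domains interleave rather than one domain being finished before the next starts. A minimal sketch of that queue behavior (not the actor's real scheduler), with hypothetical domains:

```python
from collections import deque

def interleaved_order(start_urls, links_per_page):
    """Single shared FIFO queue: newly discovered links join the back of the
    queue, behind the other domains' start URLs, so no single domain's
    subpages are reached before every start URL has been processed."""
    queue = deque((url, 0) for url in start_urls)  # (url, depth)
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth < 1:  # follow one hop, for illustration
            for link in links_per_page.get(url, []):
                queue.append((link, depth + 1))
    return order

starts = ["https://a.example", "https://b.example"]
links = {
    "https://a.example": ["https://a.example/kontakt"],
    "https://b.example": ["https://b.example/kontakt"],
}
# a.example's subpage is crawled only after b.example's start page:
order = interleaved_order(starts, links)
```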

Also, this actor returns one row of data per webpage, not per the whole domain. Maybe it seemed like it did not find anything on the whole https://mi-tech.de website, but it only searched through the initial (index) page.
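So one domain can yield several dataset rows, one per crawled URL; hypothetically something like this (field names illustrative, not the actor's exact output schema):

```json
[
  { "url": "https://www.mi-tech.de/", "emails": [] },
  { "url": "https://www.mi-tech.de/kontakt/", "emails": ["…"] }
]
```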

beautiful_jumbo

2 months ago

Hi,

thanks a lot for your response. It is a bit confusing to me. I would expect the actor to take each domain from the list, crawl it, and if it finds an email, move on to the next domain; if it doesn't, follow the referenced pages and look there, moving on as soon as it finds an email. But it seems it's not working like that. What do you mean by "concentrating on other domains from the list"? What logic is behind this; why do other domains get higher priority? I will try reducing the maxDepth parameter and let you know.

beautiful_jumbo

2 months ago

Interesting. As you mentioned, if the domains are added manually instead of via the .txt file, it works as expected: it outputs up to the maxDepth-determined number of rows per domain, which is logical to me. Can you explain why, when using the remote .txt file, it only crawls the start domain and does not go further, despite the maxDepth setting being > 1? Thanks.

beautiful_jumbo

2 months ago

My bad, the requests are being queued and run asynchronously. Sorry!

Developer
Maintained by Apify

Actor Metrics

  • 1.3k monthly users

  • 190 stars

  • >99% runs succeeded

  • 4.1 days response time

  • Created in May 2019

  • Modified a day ago