Deep Email, Phone, & Social Media Scraper

Pricing: from $6.00 / 1,000 emails

Developed by peterasorensen

Maintained by Community

A powerful tool that extracts emails, phone numbers, and social media profiles from any website. It navigates intelligently, prioritizing pages likely to contain contact information, even deep within the site. Perfect for lead generation, market research, competitive analysis, and building contact databases.

Rating: 4.8 (7 ratings)

71

Total users: 1.2K
Monthly users: 376
Runs succeeded: >99%
Issues response: 21 hours
Last modified: 13 days ago

ERROR CheerioCrawler: Request failed and reached maximum retries.

Closed

CAP-Apify opened this issue 23 days ago

Hi, I am intermittently getting this error when crawling the same website: sometimes it returns 7 results, and other times it returns 0 along with this error.

{
  "maxDepth": 3,
  "removeDuplicates": true,
  "scrapeTypes": ["emails", "socialMedia", "phoneNumbers"],
  "websites": ["www.geneseeny.com"]
}
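The run logs in this thread show how each websites entry is rewritten before it is queued: www.geneseeny.com becomes http://geneseeny.com, and https://www.lunenburgchamber.com/ becomes https://lunenburgchamber.com. Note that an entry without a scheme is requested over plain http://. The sketch below reproduces only that observed mapping; normalizeWebsite is a hypothetical name, not the actor's actual code.

```typescript
// Hypothetical helper reproducing the input-to-URL mapping seen in the logs;
// it is not the actor's actual code.
function normalizeWebsite(raw: string): string {
  const trimmed = raw.trim();
  // Entries without a scheme (e.g. "www.geneseeny.com") were queued as http://.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const url = new URL(hasScheme ? trimmed : `http://${trimmed}`);
  // The logged URLs drop a leading "www." from the host...
  const host = url.hostname.replace(/^www\./i, "");
  const port = url.port ? `:${url.port}` : "";
  // ...and a trailing slash (including a bare "/" path).
  const path = url.pathname.replace(/\/+$/, "");
  return `${url.protocol}//${host}${port}${path}${url.search}`;
}

console.log(normalizeWebsite("www.geneseeny.com"));                 // http://geneseeny.com
console.log(normalizeWebsite("https://www.lunenburgchamber.com/")); // https://lunenburgchamber.com
```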

peterasorensen

Can you possibly share the full logs?

CAP-Apify, 22 days ago

2025-06-25T13:02:59.609Z ACTOR: Pulling Docker image of build XR94UeTuOXE3hbJG5 from registry.
2025-06-25T13:03:14.762Z ACTOR: Creating Docker container.
2025-06-25T13:03:15.579Z ACTOR: Starting Docker container.
2025-06-25T13:03:16.253Z Will run command: xvfb-run -a -s "-ac -screen 0 1920x1080x24+32 -nolisten tcp" npm start --silent
2025-06-25T13:03:18.505Z INFO System info {"apifyVersion":"3.4.2","apifyClientVersion":"2.12.5","crawleeVersion":"3.13.5","osType":"Linux","nodeVersion":"v22.16.0"}
2025-06-25T13:03:18.571Z Starting the contact information scraper...
2025-06-25T13:03:18.631Z Received input: {
2025-06-25T13:03:18.633Z input: {
2025-06-25T13:03:18.635Z maxDepth: 3,
2025-06-25T13:03:18.637Z removeDuplicates: true,
2025-06-25T13:03:18.639Z scrapeTypes: [ 'emails', 'socialMedia', 'phoneNumbers' ],
2025-06-25T13:03:18.641Z websites: [ 'www.geneseeny.com' ],
2025-06-25T13:03:18.643Z maxLinksPerPage: 200
2025-06-25T13:03:18.644Z }
2025-06-25T13:03:18.653Z }
2025-06-25T13:03:18.655Z Scraping types: emails, socialMedia, phoneNumbers
2025-06-25T13:03:18.656Z Number of websites to scrape: 1
2025-06-25T13:03:18.658Z Processing 1 URLs with Cheerio crawler in batches of 25
2025-06-25T13:03:18.660Z Processing Cheerio batch 1/1 (1 URLs)
2025-06-25T13:03:19.078Z Added 1 URLs to Cheerio crawler queue
2025-06-25T13:03:19.096Z Starting Cheerio crawler batch 1...
2025-06-25T13:03:19.492Z INFO CheerioCrawler: Starting the crawler.
2025-06-25T13:03:19.680Z Using User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:109.0) Gecko/20100101 Firefox/115.0
2025-06-25T13:03:21.227Z ERROR CheerioCrawler: Request failed and reached maximum retries. request timed out after 1.5 seconds. {"id":"s7BzgmSE8HIBpgk","url":"http://geneseeny.com","method":"GET","uniqueKey":"http://geneseeny.com"}
2025-06-25T13:03:21.230Z Request failed for http://geneseeny.com: request timed out after 1.5 seconds.
2025-06-25T13:03:21.231Z Error details: {
2025-06-25T13:03:21.233Z "name": "Error",
2025-06-25T13:03:21.245Z "statusCode": "N/A",
2025-06-25T13:03:21.247Z "stack": "Error: request timed out after 1.5 seconds.\n at Timeout._onTimeout (/home/myuser/node_modules/@apify/timeout/cjs/index.cjs:64:68)\n at listOnTimeout (node:internal/timers:588:17)\n at process.processTimers (node:internal/timers:523:7)"
2025-06-25T13:03:21.249Z }
2025-06-25T13:03:21.320Z INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
2025-06-25T13:03:21.492Z INFO CheerioCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":1,"retryHistogram":[1],"requestAvgFailedDurationMillis":1548,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":24,"requestTotalDurationMillis":1548,"requestsTotal":1,"crawlerRuntimeMillis":2479}
2025-06-25T13:03:21.495Z INFO CheerioCrawler: Error analysis: {"totalErrors":1,"uniqueErrors":1,"mostCommonErrors":["1x: request timed out after 1.5 seconds. (/home/myuser/node_modules/@apify/timeout/cjs/index.cjs:64:68)"]}
2025-06-25T13:03:21.496Z INFO CheerioCrawler: Finished! Total 1 requests: 0 succeeded, 1 failed. {"terminal":true}
2025-06-25T13:03:21.511Z Cheerio crawler batch 1 finished
2025-06-25T13:03:21.515Z INFO CheerioCrawler: The crawler has been gracefully stopped.
2025-06-25T13:03:23.512Z Memory usage: 70MB heap, 164MB RSS
2025-06-25T13:03:23.515Z All Cheerio crawler batches finished
2025-06-25T13:03:23.516Z Domains with zero results: { 'geneseeny.com': { count: 0, originalUrl: 'www.geneseeny.com' } }
2025-06-25T13:03:23.518Z Processing 1 domains with no results from cheerio crawler
2025-06-25T13:03:23.520Z Clearing global variables before Playwright...
2025-06-25T13:03:23.542Z Processing batch 1/1 (1 domains)
2025-06-25T13:03:23.760Z Starting playwright crawler for batch with 1 requests...
2025-06-25T13:03:23.914Z INFO PlaywrightCrawler: Starting the crawler.
2025-06-25T13:03:39.480Z Using User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:139.0) Gecko/20100101 Firefox/139.0
2025-06-25T13:03:41.580Z ERROR PlaywrightCrawler: Request failed and reached maximum retries. Navigation timed out after 2 seconds. {"id":"oOvKSb2RnvgOqAG","url":"http://geneseeny.com","method":"GET","uniqueKey":"http://geneseeny.com-playwright-1750856603760-0"}
2025-06-25T13:03:41.582Z Request failed for http://geneseeny.com: Navigation timed out after 2 seconds.
2025-06-25T13:03:41.584Z Error details: {
2025-06-25T13:03:41.585Z "name": "Error",
2025-06-25T13:03:41.587Z "statusCode": "N/A",
2025-06-25T13:03:41.589Z "stack": "Error: Navigation timed out after 2 seconds.\n at handleRequestTimeout (/home/myuser/node_modules/@crawlee/core/crawlers/crawler_utils.js:13:11)\n at PlaywrightCrawler._handleNavigationTimeout (/home/myuser/node_modules/@crawlee/browser/internals/browser-crawler.js:357:46)\n at PlaywrightCrawler._handleNavigation (/home/myuser/node_modules/@crawlee/browser/internals/browser-crawler.js:334:24)\n at async PlaywrightCrawler._runRequestHandler (/home/myuser/node_modules/@crawlee/browser/internals/browser-crawler.js:260:13)\n at async PlaywrightCrawler._runRequestHandler (/home/myuser/node_modules/@crawlee/playwright/internals/playwright-crawler.js:114:9)\n at async wrap (/home/myuser/node_modules/@apify/timeout/cjs/index.cjs:54:21)"
2025-06-25T13:03:41.597Z }
2025-06-25T13:03:41.881Z INFO PlaywrightCrawler: All requests from the queue have been processed, the crawler will shut down.
2025-06-25T13:03:48.075Z INFO PlaywrightCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":1,"retryHistogram":[1],"requestAvgFailedDurationMillis":17346,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":2,"requestTotalDurationMillis":17346,"requestsTotal":1,"crawlerRuntimeMillis":24512}
2025-06-25T13:03:48.077Z INFO PlaywrightCrawler: Error analysis: {"totalErrors":1,"uniqueErrors":1,"mostCommonErrors":["1x: Navigation timed out after 2 seconds. (/home/myuser/node_modules/@crawlee/core/crawlers/crawler_utils.js:13:11)"]}
2025-06-25T13:03:48.079Z INFO PlaywrightCrawler: Finished! Total 1 requests: 0 succeeded, 1 failed. {"terminal":true}
2025-06-25T13:03:48.093Z Batch 1 finished
2025-06-25T13:03:48.114Z INFO PlaywrightCrawler: The crawler has been gracefully stopped.
2025-06-25T13:03:51.094Z Memory usage: 81MB heap, 177MB RSS
2025-06-25T13:03:51.102Z All crawlers finished
2025-06-25T13:03:51.104Z Domains with zero results [0: No results, 3xx: Redirect, 4xx/5xx: Site unavailable]: [ [ 'geneseeny.com', { count: 0, originalUrl: 'www.geneseeny.com' } ] ]
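Two details in the "Final request statistics" lines of this log are easy to misread. As we understand Crawlee's statistics, retryHistogram[i] counts the requests that ended after i retries, so [1] means the single request failed on its first attempt and was never retried, even though the message says "reached maximum retries". That suggests the retry limit for both crawlers was effectively zero. Also, the Playwright pass reports requestAvgFailedDurationMillis of 17346 ms against a 2-second navigation timeout; the measured duration appears to include browser start-up, since the crawler started at 13:03:23.914Z but the request's User-Agent line was logged only at 13:03:39.480Z. The sketch below reads the histogram from the logged JSON; highestRetryCount and summarize are hypothetical helper names.

```typescript
// Only the fields used here; the logged object has more.
interface FinalRequestStats {
  requestsFinished: number;
  requestsFailed: number;
  retryHistogram: number[];
}

// retryHistogram[i] = number of requests that ended after i retries.
function highestRetryCount(histogram: number[]): number {
  for (let i = histogram.length - 1; i >= 0; i--) {
    if ((histogram[i] ?? 0) > 0) return i;
  }
  return 0;
}

function summarize(stats: FinalRequestStats): string {
  const retries = highestRetryCount(stats.retryHistogram);
  return `${stats.requestsFinished} finished, ${stats.requestsFailed} failed, ` +
    `at most ${retries} retries per request`;
}

// Copied from the PlaywrightCrawler "Final request statistics" line of the first run.
const logged =
  '{"requestsFinished":0,"requestsFailed":1,"retryHistogram":[1],' +
  '"requestAvgFailedDurationMillis":17346,"requestsTotal":1}';
const playwrightRun = JSON.parse(logged) as FinalRequestStats;

console.log(summarize(playwrightRun));
// 0 finished, 1 failed, at most 0 retries per request
```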

peterasorensen

I increased the timeout option in case that was the problem. However, in the future I would highly recommend running this with residential proxies rather than datacenter proxies, because datacenter IPs are more likely to be detected and blocked by bot protection. Residential proxies should perform much better.

Try running again with those and let me know if you still have trouble. If you do, please reopen the issue and I will take another look. Thank you.

CAP-Apify, 20 days ago

Thanks, I will try it again. FYI, it was also failing when run in the Apify test environment.

CAP-Apify, 20 days ago

Hey, here is a run from WITHIN Apify, not from my app:

2025-06-27T12:12:26.468Z ACTOR: Pulling Docker image of build urWEtnH2KVO16yKuA from registry.
2025-06-27T12:12:26.469Z ACTOR: Creating Docker container.
2025-06-27T12:12:26.531Z ACTOR: Starting Docker container.
2025-06-27T12:12:26.759Z Will run command: xvfb-run -a -s "-ac -screen 0 1920x1080x24+32 -nolisten tcp" npm start --silent
2025-06-27T12:12:29.426Z INFO System info {"apifyVersion":"3.4.2","apifyClientVersion":"2.12.5","crawleeVersion":"3.13.5","osType":"Linux","nodeVersion":"v22.16.0"}
2025-06-27T12:12:29.472Z Starting the contact information scraper...
2025-06-27T12:12:29.606Z Received input: {
2025-06-27T12:12:29.607Z input: {
2025-06-27T12:12:29.608Z maxDepth: 3,
2025-06-27T12:12:29.610Z removeDuplicates: true,
2025-06-27T12:12:29.611Z scrapeTypes: [ 'emails', 'socialMedia', 'phoneNumbers' ],
2025-06-27T12:12:29.612Z websites: [ 'https://www.lunenburgchamber.com/' ],
2025-06-27T12:12:29.613Z maxLinksPerPage: 200
2025-06-27T12:12:29.613Z }
2025-06-27T12:12:29.614Z }
2025-06-27T12:12:29.615Z Scraping types: emails, socialMedia, phoneNumbers
2025-06-27T12:12:29.616Z Number of websites to scrape: 1
2025-06-27T12:12:29.616Z Processing 1 URLs with Cheerio crawler in batches of 25
2025-06-27T12:12:29.617Z Processing Cheerio batch 1/1 (1 URLs)
2025-06-27T12:12:30.030Z Added 1 URLs to Cheerio crawler queue
2025-06-27T12:12:30.031Z Starting Cheerio crawler batch 1...
2025-06-27T12:12:30.260Z INFO CheerioCrawler: Starting the crawler.
2025-06-27T12:12:30.485Z Using User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:139.0) Gecko/20100101 Firefox/139.0
2025-06-27T12:12:30.776Z Completed navigation to: https://lunenburgchamber.com
2025-06-27T12:12:30.777Z Response status code: 200
2025-06-27T12:12:30.785Z Processing https://lunenburgchamber.com... (depth: 0)
2025-06-27T12:12:30.786Z 🔍 [DEBUG] Starting email extraction...
2025-06-27T12:12:30.788Z 🔍 [DEBUG] Starting phone number extraction...
2025-06-27T12:12:30.792Z 🔍 [DEBUG] Starting social media extraction...
2025-06-27T12:12:31.223Z INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
2025-06-27T12:12:31.690Z INFO CheerioCrawler: Final request statistics: {"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":350,"requestsFinishedPerMinute":35,"requestsFailedPerMinute":0,"requestTotalDurationMillis":350,"requestsTotal":1,"crawlerRuntimeMillis":1730}
2025-06-27T12:12:31.691Z INFO CheerioCrawler: Finished! Total 1 requests: 1 succeeded, 0 failed. {"terminal":true}
2025-06-27T12:12:31.729Z Cheerio crawler batch 1 finished
2025-06-27T12:12:31.731Z INFO CheerioCrawler: The crawler has been gracefully stopped.
2025-06-27T12:12:33.730Z Memory usage: 67MB heap, 167MB RSS
2025-06-27T12:12:33.735Z All Cheerio crawler batches finished
2025-06-27T12:12:33.736Z Domains with zero results: {
2025-06-27T12:12:33.737Z 'lunenburgchamber.com': { count: 0, originalUrl: 'https://www.lunenburgchamber.com/' }
2025-06-27T12:12:33.737Z }
2025-06-27T12:12:33.738Z Processing 1 domains with no results from cheerio crawler
2025-06-27T12:12:33.739Z Clearing global variables before Playwright...
2025-06-27T12:12:33.739Z Processing batch 1/1 (1 domains)
2025-06-27T12:12:33.944Z Starting playwright crawler for batch with 1 requests...
2025-06-27T12:12:34.047Z INFO PlaywrightCrawler: Starting the crawler.
2025-06-27T12:12:36.760Z Using User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:139.0) Gecko/20100101 Firefox/139.0
2025-06-27T12:12:39.862Z ERROR PlaywrightCrawler: Request failed and reached maximum retries. Navigation timed out after 3 seconds. {"id":"rtns6OdLkIbxtMX","url":"https://lunenburgchamber.com","method":"GET","uniqueKey":"https://lunenburgchamber.com/-playwright-1751026353943-0"}
2025-06-27T12:12:39.928Z Request failed for https://lunenburgchamber.com: Navigation timed out after 3 seconds.
2025-06-27T12:12:39.929Z Error details: {
2025-06-27T12:12:39.930Z "name": "Error",
2025-06-27T12:12:39.930Z "statusCode": "N/A",
2025-06-27T12:12:39.931Z "stack": "Error: Navigation timed out after 3 seconds.\n at handleRequestTimeout (/home/myuser/node_modules/@crawlee/core/crawlers/crawler_utils.js:13:11)\n at PlaywrightCrawler._handleNavigationTimeout (/home/myuser/node_modules/@crawlee/browser/internals/browser-crawler.js:357:46)\n at PlaywrightCrawler._handleNavigation (/home/myuser/node_modules/@crawlee/browser/internals/browser-crawler.js:334:24)\n at async PlaywrightCrawler._runRequestHandler (/home/myuser/node_modules/@crawlee/browser/internals/browser-crawler.js:260:13)\n at async PlaywrightCrawler._runRequestHandler (/home/myuser/node_modules/@crawlee/playwright/internals/playwright-crawler.js:114:9)\n at async wrap (/home/myuser/node_modules/@apify/timeout/cjs/index.cjs:54:21)"
2025-06-27T12:12:39.932Z }
2025-06-27T12:12:40.129Z INFO PlaywrightCrawler: All requests from the queue have been processed, the crawler will shut down.
2025-06-27T12:12:45.202Z INFO PlaywrightCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":1,"retryHistogram":[1],"requestAvgFailedDurationMillis":5652,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":5,"requestTotalDurationMillis":5652,"requestsTotal":1,"crawlerRuntimeMillis":11439}
2025-06-27T12:12:45.203Z INFO PlaywrightCrawler: Error analysis: {"totalErrors":1,"uniqueErrors":1,"mostCommonErrors":["1x: Navigation timed out after 3 seconds. (/home/myuser/node_modules/@crawlee/core/crawlers/crawler_utils.js:13:11)"]}
2025-06-27T12:12:45.204Z INFO PlaywrightCrawler: Finished! Total 1 requests: 0 succeeded, 1 failed. {"terminal":true}
2025-06-27T12:12:45.220Z Batch 1 finished
2025-06-27T12:12:45.226Z INFO PlaywrightCrawler: The crawler has been gracefully stopped.
2025-06-27T12:12:48.221Z Memory usage: 81MB heap, 178MB RSS
2025-06-27T12:12:48.222Z All crawlers finished
2025-06-27T12:12:48.223Z Domains with zero results [0: No results, 3xx: Redirect, 4xx/5xx: Site unavailable]: [
2025-06-27T12:12:48.224Z [
2025-06-27T12:12:48.225Z 'lunenburgchamber.com',
2025-06-27T12:12:48.226Z { count: 0, originalUrl: 'https://www.lunenburgchamber.com/' }
2025-06-27T12:12:48.226Z ]
2025-06-27T12:12:48.227Z ]
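This second log separates two stages. The Cheerio pass fetched https://lunenburgchamber.com successfully (status 200) but found no contacts, so the domain was listed under "Domains with zero results" and handed to a Playwright pass; it was that browser pass that failed with the 3-second navigation timeout. A minimal sketch of that hand-off, assuming the tally shape printed in the log; planBrowserBatches and its batch size are hypothetical (the log only shows a Cheerio batch size of 25), and example.com is a placeholder domain.

```typescript
// Tally shape as printed in the logs:
// { 'lunenburgchamber.com': { count: 0, originalUrl: 'https://www.lunenburgchamber.com/' } }
interface DomainTally {
  count: number;
  originalUrl: string;
}

// Hypothetical: collect domains whose HTTP-only (Cheerio) pass found nothing
// and split them into batches for a browser-based (Playwright) retry.
function planBrowserBatches(
  tallies: Record<string, DomainTally>,
  batchSize: number,
): string[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const pending = Object.entries(tallies)
    .filter(([, tally]) => tally.count === 0)
    .map(([domain]) => domain);
  const batches: string[][] = [];
  for (let i = 0; i < pending.length; i += batchSize) {
    batches.push(pending.slice(i, i + batchSize));
  }
  return batches;
}

const afterCheerio: Record<string, DomainTally> = {
  "lunenburgchamber.com": { count: 0, originalUrl: "https://www.lunenburgchamber.com/" },
  "example.com": { count: 4, originalUrl: "https://example.com" }, // placeholder domain with results
};
console.log(planBrowserBatches(afterCheerio, 25)); // [ [ 'lunenburgchamber.com' ] ]
```

The point of the two-stage design is cost: a plain HTTP fetch is cheap, so the browser is only launched for domains where that fetch yielded nothing.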

peterasorensen

Hi, thanks for reopening. I realized I didn't have residential proxies set up on the configuration side. I have now added defaults that use residential proxies in the USA, but if most of the websites you are scraping are in the UK or elsewhere, please select that country instead (for example, the UK). Can you try once more and let me know how it goes?

I attached an image showing how to do that, or you can pass the setting via the JSON input. I'll leave the issue open until you confirm some improvement.
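Passing the proxy setting through the JSON input might look like the sketch below. It assumes the actor exposes Apify's standard proxy object under the key proxyConfiguration; the thread does not name the field, so check the actor's input schema before relying on it. withResidentialProxy is a hypothetical helper. Apify Proxy expects two-letter ISO 3166-1 country codes, so the UK is "GB".

```typescript
// ASSUMPTION: the actor accepts Apify's standard proxy object under
// "proxyConfiguration"; the field name is not confirmed in this thread.
interface ProxyInput {
  useApifyProxy: boolean;
  apifyProxyGroups?: string[];
  apifyProxyCountry?: string; // two-letter ISO 3166-1 code
}

interface ScraperInput {
  maxDepth: number;
  removeDuplicates: boolean;
  scrapeTypes: Array<"emails" | "socialMedia" | "phoneNumbers">; // values seen in this thread
  websites: string[];
  proxyConfiguration?: ProxyInput;
}

// Hypothetical helper: returns a copy of the input routed through residential proxies.
function withResidentialProxy(input: ScraperInput, country: string): ScraperInput {
  let code = country.trim().toUpperCase();
  if (code === "UK") code = "GB"; // the ISO code for the United Kingdom is GB
  if (!/^[A-Z]{2}$/.test(code)) {
    throw new Error(`Expected a two-letter country code, got "${country}"`);
  }
  return {
    ...input,
    proxyConfiguration: {
      useApifyProxy: true,
      apifyProxyGroups: ["RESIDENTIAL"],
      apifyProxyCountry: code,
    },
  };
}

const input = withResidentialProxy(
  {
    maxDepth: 3,
    removeDuplicates: true,
    scrapeTypes: ["emails", "socialMedia", "phoneNumbers"],
    websites: ["www.geneseeny.com"],
  },
  "UK",
);
console.log(JSON.stringify(input, null, 2));
```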

peterasorensen

I also realized that I had the timeouts set far too short. I'm correcting this as soon as possible.
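The failing runs gave up after a single attempt with a per-request limit of 1.5 to 3 seconds. That fits the intermittent 7-results-or-0 behavior in the original report: a site that only sometimes responds slowly fails outright whenever its one allowed attempt is slow. The sketch below makes that arithmetic concrete. The two field names match Crawlee crawler options (navigationTimeoutSecs, maxRequestRetries), but every value here, including the load times, is hypothetical rather than the actor's real configuration.

```typescript
// Field names mirror Crawlee crawler options; values below are illustrative only.
interface TimeoutSettings {
  navigationTimeoutSecs: number; // per-attempt cap on loading the page
  maxRequestRetries: number;     // extra attempts after the first failure
}

// Upper bound on time spent waiting for navigation before a URL is marked failed
// (ignores browser start-up and any delay between retries).
function navigationBudgetSecs(s: TimeoutSettings): number {
  return (s.maxRequestRetries + 1) * s.navigationTimeoutSecs;
}

// Given the load time of each successive attempt, returns the 1-based attempt
// that succeeds, or null if every allowed attempt times out.
function firstSuccessfulAttempt(loadTimesSecs: number[], s: TimeoutSettings): number | null {
  const allowed = Math.min(loadTimesSecs.length, s.maxRequestRetries + 1);
  for (let i = 0; i < allowed; i++) {
    const t = loadTimesSecs[i];
    if (t !== undefined && t <= s.navigationTimeoutSecs) return i + 1;
  }
  return null;
}

const failingRun: TimeoutSettings = { navigationTimeoutSecs: 2, maxRequestRetries: 0 };
const relaxed: TimeoutSettings = { navigationTimeoutSecs: 30, maxRequestRetries: 2 };
const loadTimes = [4.5, 1.2, 3.0]; // hypothetical: a site that is slow only some of the time

console.log(navigationBudgetSecs(failingRun));              // 2
console.log(navigationBudgetSecs(relaxed));                 // 90
console.log(firstSuccessfulAttempt(loadTimes, failingRun)); // null
console.log(firstSuccessfulAttempt(loadTimes, relaxed));    // 1
```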

peterasorensen

Just pushed a fix. The particular website you posted, lunenburgchamber.com, doesn't appear to have any contact information on it, but at least the scraper no longer returns an error.
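The last line of each log labels zero-result domains with the legend [0: No results, 3xx: Redirect, 4xx/5xx: Site unavailable]. A site that loads but has no contact details falls under "0: No results" rather than an error class. The mapping below is our reading of that legend; zeroResultReason is a hypothetical name, and the thread does not say which output field carries the code.

```typescript
type ZeroResultReason = "No results" | "Redirect" | "Site unavailable" | "Unrecognized";

// Reading of the legend "[0: No results, 3xx: Redirect, 4xx/5xx: Site unavailable]".
function zeroResultReason(code: number): ZeroResultReason {
  if (code === 0) return "No results"; // no contacts found
  if (code >= 300 && code <= 399) return "Redirect";
  if (code >= 400 && code <= 599) return "Site unavailable";
  return "Unrecognized"; // not covered by the legend
}

for (const code of [0, 301, 404, 503]) {
  console.log(code, zeroResultReason(code));
}
```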