Email Scraper
Pricing
Pay per usage
It scrapes anything that looks like an email address on a website. It applies 9 protocols and gathers raw material that you need to clean afterwards. Think of it as a garbage collector: gather everything there is, then pick what you need. Results are exported to the key-value store as an xlsx file. Give it a go and have fun! 👍
Developer
WEU
Maintained by Community
Python Email Scraper
This actor specializes in raw scraping of email addresses from websites. It crawls a website (or a list of websites) in search of anything that looks like an email address. It implements 9 different scraping protocols to leave nothing behind. It works as a garbage collector, returning strings in the format "something@domain". It is very useful for business projects that need email lists for marketing purposes or for reaching out to a wide range of audiences.
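To illustrate what "anything that looks like an email" can mean in practice, here is a minimal sketch of one loose, regex-based pass over page text. The pattern and the `find_email_like` helper are assumptions made for illustration; the actor's 9 protocols are not documented here and may work differently.

```python
import re

# Deliberately loose pattern for strings shaped like "something@domain".
# Illustrative assumption only, not the actor's actual pattern.
EMAIL_LIKE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_email_like(text: str) -> list[str]:
    """Return unique email-looking strings, lowercased, in order of first appearance."""
    seen = set()
    results = []
    for match in EMAIL_LIKE.findall(text):
        candidate = match.lower()
        if candidate not in seen:
            seen.add(candidate)
            results.append(candidate)
    return results

sample_html = '<a href="mailto:Sales@example.com">Sales</a> or info@example.co.uk, logo@2x.png'
print(find_email_like(sample_html))
# ['sales@example.com', 'info@example.co.uk', 'logo@2x.png']
```

Note that `logo@2x.png` is picked up as well. That is the kind of raw material a garbage-collector approach produces and that you clean up afterwards.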
Included features
- **Flexible scraping parameters** that adjust to any circumstances, from shallow to deep scraping, depending on your specific interests and Apify budget.
- **Convenient export** of results to an Excel file, with no additional data formatting. Just download it and you're set!
- **Strong email detection**, ranging from a plain HTML crawl to JavaScript injection. No stone unturned.
- **Resilience to Apify server swaps**: data is kept across server changes and the run resumes from where it left off.
- **Parallel scraping** of several pages at a time, saving time and costs. The number of pages scraped in parallel is set in the input data.
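The parallel scraping feature can be pictured as a bounded pool of concurrent page fetches. The sketch below uses `asyncio.Semaphore` to cap concurrency; `scrape_page`, `scrape_all` and `max_parallel` are hypothetical names, and the page fetch is simulated rather than real.

```python
import asyncio

async def scrape_page(url: str) -> str:
    """Placeholder for a real page fetch; it only simulates latency here."""
    await asyncio.sleep(0.01)
    return f"scraped:{url}"

async def scrape_all(urls: list[str], max_parallel: int) -> list[str]:
    """Scrape every URL, never running more than max_parallel fetches at once."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await scrape_page(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded(u) for u in urls))

urls = [f"https://example.com/page{i}" for i in range(5)]
results = asyncio.run(scrape_all(urls, max_parallel=2))
print(results)
```

Raising `max_parallel` shortens the wall-clock time of the run, at the price of more pages held in memory at the same moment.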
How it works
It fishes for emails on websites, page by page, starting at the top-level pages and going down the rabbit hole as deep as you set in the parameters. Every time it finds emails, it stores them in an Excel file in the key-value store. That way, if you cancel the scrape midway or it shuts down for any reason, you won't lose your progress. There is just one output file. Easy, compact, ready to use. No additional CSV exporting or formatting. Just good old Excel, widely compatible across platforms.
An important technical note: the crawler first scrapes all pages found at depth 0, then moves on to depth 1, 2, and so on, until it reaches the maximum depth, the maximum number of emails, or the maximum number of pages per website. These multiple boundaries guarantee that your scrape won't run forever and spend all your credits. They also define the type of scrape you want to do.
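The depth-by-depth order and the three stopping boundaries described above can be sketched as a breadth-first crawl. This is a simplified model under stated assumptions: `SITE` is a hypothetical in-memory website, and `crawl` with its parameter names is illustrative, not the actor's real code.

```python
from collections import deque

# Hypothetical in-memory site: page -> (emails found on it, outgoing links)
SITE = {
    "/": (["info@example.com"], ["/about", "/contact"]),
    "/about": ([], ["/team"]),
    "/contact": (["sales@example.com"], ["/"]),
    "/team": (["ceo@example.com"], []),
}

def crawl(start, max_depth, max_pages, max_emails):
    """Breadth-first crawl: every depth-N page is scraped before any depth-N+1 page."""
    queue = deque([(start, 0)])
    visited = {start}
    emails = []
    pages_scraped = 0
    while queue:
        # Stop as soon as either the page budget or the email budget is used up
        if pages_scraped >= max_pages or len(emails) >= max_emails:
            break
        page, depth = queue.popleft()
        found, links = SITE.get(page, ([], []))
        pages_scraped += 1
        for email in found:
            if email not in emails:
                emails.append(email)
        # A real run would persist `emails` at this point so progress survives a restart
        if depth < max_depth:
            for link in links:
                if link not in visited:
                    visited.add(link)
                    queue.append((link, depth + 1))
    return emails[:max_emails], pages_scraped

print(crawl("/", max_depth=1, max_pages=10, max_emails=10))
print(crawl("/", max_depth=2, max_pages=10, max_emails=10))
```

With `max_depth=1` the crawl stops before `/team`, so only two addresses are collected; with `max_depth=2` it reaches the third.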
Getting started
Before starting the scrape, you have to fill in the necessary input data. Any missing value falls back to a default, except for the website addresses, which are explicitly required. The level of success of this scraper is directly tied to the input data you provide. If you want a shallow scrape, lower the values for depth, load and post-load timeouts, maximum pages per website and maximum emails to find. Do the opposite for deeper scraping.
The parallel scraping value is best kept as high as your Apify credit plan allows. Higher parallel activity uses credits faster but also finishes the crawl faster, which usually works in your favor. Keep in mind that more parallel tasks consume more memory, so you must set a memory allocation high enough to handle the amount of parallel scraping you are aiming for. A higher memory allocation also makes the run consume compute units faster, so handle it with care. As a rough guide, 1 GB of memory is a manageable threshold for 10 pages scraped in parallel in most runs, but the exact allocation depends strongly on your specific crawling needs.
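The fall-back-to-defaults behavior can be sketched as a simple merge of user input over a defaults table. All field names and default values below are hypothetical, chosen only to mirror the parameters mentioned above; check the actor's input form for the real ones.

```python
# Hypothetical field names and default values, for illustration only.
DEFAULTS = {
    "max_depth": 2,
    "max_pages_per_website": 50,
    "max_emails": 100,
    "load_timeout_s": 30,
    "post_load_timeout_s": 5,
    "parallel_pages": 10,
}

def resolve_input(user_input: dict) -> dict:
    """Merge user input over defaults; website addresses are the only required field."""
    websites = user_input.get("websites")
    if not websites:
        raise ValueError("'websites' is required and has no default")
    # Values explicitly set to None are treated as missing and fall back to defaults
    provided = {k: v for k, v in user_input.items() if v is not None}
    config = {**DEFAULTS, **provided}
    config["websites"] = list(websites)
    return config

# A shallow scrape: low depth and few pages, everything else left at defaults
shallow = resolve_input({
    "websites": ["https://example.com"],
    "max_depth": 0,
    "max_pages_per_website": 5,
})
print(shallow["max_depth"], shallow["max_pages_per_website"], shallow["parallel_pages"])
```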
It is also important to handle the depth input value carefully. Most websites keep their emails at depths 0 to 2. Going beyond that makes the number of pages to scrape grow exponentially. Handle with care.
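To get a feel for that growth, here is a back-of-the-envelope estimate. It assumes, purely for illustration, that every page links to the same number of new, not yet visited pages; real websites vary, and the maximum pages per website setting caps the total anyway.

```python
def pages_up_to_depth(links_per_page: int, max_depth: int) -> int:
    """Upper bound on pages visited if every page links to `links_per_page` new pages."""
    return sum(links_per_page ** depth for depth in range(max_depth + 1))

for depth in range(5):
    print(depth, pages_up_to_depth(10, depth))
# 0 1
# 1 11
# 2 111
# 3 1111
# 4 11111
```

With 10 new links per page, each extra level of depth multiplies the workload by roughly ten.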