1import { Actor } from 'apify';
2import { CheerioCrawler } from 'crawlee';
3
// Bootstrap the Apify actor and read its input, falling back to an
// empty object when the actor is run without any input (local runs).
await Actor.init();

const input = await Actor.getInput() ?? {};

// Supported input fields with their defaults.
// NOTE(review): the input previously also destructured `deduplicateEmails`,
// but the binding was never read — emails are always deduplicated by the
// `allEmails` Map below — so the dead local was removed. The input field,
// if supplied, is simply ignored as before.
const {
  urls = [],
  maxPagesPerDomain = 20,
  maxDepth = 2,
  includePhones = true,
  includeSocialLinks = true,
} = input;

// Aggregated results, shared with the crawler's request handler.
// Map/Set keys give free deduplication.
const allEmails = new Map();  // email -> { email, source, domain, fromMailto? }
const allPhones = new Set();  // raw phone strings as found on the page
const allSocials = new Map(); // "platform:lowercased-url" -> { platform, url, foundOn }
20
21
// Loose email matcher. Intentionally permissive — obvious junk (asset
// filenames, noreply addresses, …) is filtered out later by
// junkEmailPatterns/isValidEmail. Used only via String#match with the
// /g flag, which returns all matches and does not leave lastIndex state.
const emailRegex = /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/g;


// Phone matcher covering three alternates:
//   1. international: "+CC" followed by separated digit groups,
//   2. US parenthesised: "(123) 456-7890",
//   3. plain dashed/dotted: "123-456-7890".
// Matches are further sanity-checked by cleanPhone (digit count 7..15).
const phoneRegex = /\+[1-9]\d{0,2}[\s\-.]\(?\d{2,4}\)?[\s\-.]?\d{3,4}[\s\-.]?\d{3,4}|\(\d{3}\)[\s\-.]?\d{3}[\s\-.]?\d{4}|\b\d{3}[\-.]\d{3}[\-.]\d{4}\b/g;


// Per-platform profile-URL matchers, applied to the page's raw HTML so
// links in attributes and scripts are caught too. All are used with
// String#match (stateless despite /g), so sharing the literals across
// requests is safe.
const socialPatterns = {
  linkedin: /(?:https?:\/\/)?(?:www\.)?linkedin\.com\/(?:in|company)\/[a-zA-Z0-9\-_.]+/gi,
  twitter: /(?:https?:\/\/)?(?:www\.)?(?:twitter\.com|x\.com)\/[a-zA-Z0-9_]+/gi,
  facebook: /(?:https?:\/\/)?(?:www\.)?facebook\.com\/[a-zA-Z0-9.\-]+/gi,
  instagram: /(?:https?:\/\/)?(?:www\.)?instagram\.com\/[a-zA-Z0-9_.]+/gi,
  youtube: /(?:https?:\/\/)?(?:www\.)?youtube\.com\/(?:@|channel\/|c\/)[a-zA-Z0-9_\-]+/gi,
  github: /(?:https?:\/\/)?(?:www\.)?github\.com\/[a-zA-Z0-9\-]+/gi,
  tiktok: /(?:https?:\/\/)?(?:www\.)?tiktok\.com\/@[a-zA-Z0-9_.]+/gi,
};
37
38
// Patterns that mark an extracted string as NOT a usable contact email:
// automated senders, placeholder/test domains, hosting-platform noise,
// and asset filenames that merely look like addresses ("logo@2x.png").
const junkEmailPatterns = [
  /noreply@/i, /no-reply@/i, /donotreply@/i,
  /example\.com$/i, /test\.com$/i, /localhost$/i,
  /sentry\.io$/i, /wixpress\.com$/i, /wordpress\.com$/i,
  /\.png$/i, /\.jpg$/i, /\.gif$/i, /\.svg$/i,
  /@2x\./i, /@3x\./i,
];

/**
 * Decide whether an extracted email address is worth keeping.
 *
 * @param {string} email - Candidate address (callers lowercase/trim first).
 * @returns {boolean} true when the address is a plausible contact email:
 *   at most 100 characters and matching none of junkEmailPatterns.
 */
function isValidEmail(email) {
  const tooLong = email.length > 100;
  // None of the junk patterns use /g, so .test() here is stateless.
  return !tooLong && !junkEmailPatterns.some((pattern) => pattern.test(email));
}
54
/**
 * Sanity-check a phone-number candidate extracted by phoneRegex.
 *
 * @param {string} phone - Raw matched string, formatting included.
 * @returns {string|null} The trimmed original string (formatting is
 *   preserved, not normalised) when it contains 7-15 digits, else null.
 */
function cleanPhone(phone) {
  const digitCount = phone.replace(/\D/g, '').length;
  // 7..15 digits spans short national numbers up to the E.164 maximum.
  const plausible = digitCount >= 7 && digitCount <= 15;
  return plausible ? phone.trim() : null;
}
60
// Crawls every seed URL plus same-hostname links up to `maxDepth`,
// harvesting contacts into the module-level allEmails / allPhones /
// allSocials collections as a side effect of each request.
const crawler = new CheerioCrawler({
  // NOTE(review): this is a single global cap across ALL seed domains,
  // not a per-domain budget — one link-heavy site can consume another
  // site's share. With empty `urls` this evaluates to 0; confirm how
  // crawlee treats a zero limit if that case matters.
  maxRequestsPerCrawl: urls.length * maxPagesPerDomain,
  maxConcurrency: 10,
  requestHandlerTimeoutSecs: 30,

  async requestHandler({ $, request, log, enqueueLinks }) {
    // Seed requests carry { depth: 0 } and no sourceDomain; enqueued
    // requests carry both (see enqueueLinks below).
    const { depth = 0, sourceDomain } = request.userData;
    log.info(`[depth=${depth}] Scanning: ${request.url}`);

    const html = $.html();
    const text = $('body').text();
    const domain = new URL(request.url).hostname;

    // Scan both raw HTML and visible text: HTML catches addresses inside
    // attributes/scripts, text catches addresses split across markup.
    const htmlEmails = html.match(emailRegex) || [];
    const textEmails = text.match(emailRegex) || [];
    const allFound = [...new Set([...htmlEmails, ...textEmails])];

    for (const email of allFound) {
      const cleanEmail = email.toLowerCase().trim();
      // First occurrence wins: the Map key dedupes across all pages,
      // keeping the URL where the address was first seen.
      if (isValidEmail(cleanEmail) && !allEmails.has(cleanEmail)) {
        allEmails.set(cleanEmail, {
          email: cleanEmail,
          source: request.url,
          domain,
        });
      }
    }

    // mailto: anchors are the most reliable source; strip any ?subject=
    // query part and tag the record so consumers can trust it more.
    $('a[href^="mailto:"]').each((i, el) => {
      const href = $(el).attr('href');
      const email = href.replace('mailto:', '').split('?')[0].toLowerCase().trim();
      if (email && isValidEmail(email) && !allEmails.has(email)) {
        allEmails.set(email, {
          email,
          source: request.url,
          domain,
          fromMailto: true,
        });
      }
    });

    if (includePhones) {
      // tel: anchors are taken as-is (no cleanPhone digit-count check) —
      // presumably trusted because the site author marked them up.
      $('a[href^="tel:"]').each((i, el) => {
        const phone = $(el).attr('href').replace('tel:', '').trim();
        if (phone) allPhones.add(phone);
      });

      // Free-text matches go through cleanPhone to drop short/overlong
      // digit runs (dates, prices, IDs) that the regex can pick up.
      const foundPhones = text.match(phoneRegex) || [];
      for (const phone of foundPhones) {
        const cleaned = cleanPhone(phone);
        if (cleaned) allPhones.add(cleaned);
      }
    }

    if (includeSocialLinks) {
      // Key is "platform:lowercased-url" so the same profile linked with
      // different casing is stored once; the original casing is kept in
      // the stored `url`.
      for (const [platform, regex] of Object.entries(socialPatterns)) {
        const matches = html.match(regex) || [];
        for (const match of matches) {
          const key = `${platform}:${match.toLowerCase()}`;
          if (!allSocials.has(key)) {
            allSocials.set(key, {
              platform,
              url: match,
              foundOn: request.url,
            });
          }
        }
      }
    }

    // Follow internal links, propagating the ORIGINAL seed hostname so a
    // redirect mid-crawl cannot widen the scope.
    if (depth < maxDepth) {
      const targetDomain = sourceDomain || domain;
      await enqueueLinks({
        strategy: 'same-domain',
        userData: { depth: depth + 1, sourceDomain: targetDomain },
        transformRequestFunction: (req) => {
          // 'same-domain' allows subdomains; this narrows to the exact
          // hostname. Returning false drops the candidate request.
          try {
            const linkDomain = new URL(req.url).hostname;
            if (linkDomain !== targetDomain) return false;
          } catch {
            return false;
          }

          // NOTE(review): priority is recorded for contact-like pages but
          // nothing in this file consumes it — presumably intended for a
          // priority-aware queue; confirm before relying on it.
          const priorityPages = /contact|about|team|imprint|impressum|privacy|legal/i;
          if (priorityPages.test(req.url)) {
            req.userData.priority = 1;
          }
          return req;
        },
      });
    }
  },
});
162
163
// Seed the crawler with the input URLs, normalising bare domains to
// https. A real scheme check is used here: the previous
// `url.startsWith('http')` wrongly accepted bare domains that merely
// begin with "http" (e.g. "httpcompany.com") and missed uppercase
// "HTTP://..." schemes.
const requests = urls.map((url) => ({
  url: /^https?:\/\//i.test(url) ? url : `https://${url}`,
  userData: { depth: 0 },
}));

await crawler.addRequests(requests);
await crawler.run();
171
172
// Materialise the collected contacts for output.
const emailResults = [...allEmails.values()];
const phoneResults = [...allPhones].map((phone) => ({ phone }));
const socialResults = [...allSocials.values()];

if (emailResults.length > 0) {
  const scrapedAt = new Date().toISOString();
  // One dataset item per email. Actor.pushData accepts an array, so all
  // records are pushed in a single batched call instead of one awaited
  // API round-trip per record (the previous per-item loop).
  await Actor.pushData(emailResults.map((emailData) => ({
    ...emailData,
    // NOTE(review): every email record carries the FULL phone list (phones
    // are not attributed to a source page); socials are filtered to the
    // page the email was found on. Preserved as-is from the original.
    phones: includePhones ? [...allPhones] : undefined,
    socialLinks: includeSocialLinks ? socialResults.filter((s) => s.foundOn === emailData.source) : undefined,
    scrapedAt,
  })));
} else if (phoneResults.length > 0 || socialResults.length > 0) {
  // No emails, but some other contact data: emit a single summary item.
  await Actor.pushData({
    email: null,
    message: 'No emails found on the provided URLs',
    phones: includePhones ? [...allPhones] : [],
    socialLinks: includeSocialLinks ? socialResults : [],
    urlsScanned: urls.length,
    scrapedAt: new Date().toISOString(),
  });
} else {
  // Nothing at all was found; still emit one item so downstream
  // consumers can distinguish "ran and found nothing" from "never ran".
  await Actor.pushData({
    email: null,
    message: 'No contact information found on the provided URLs',
    urlsScanned: urls.length,
    scrapedAt: new Date().toISOString(),
  });
}

console.log(`\nExtraction complete!`);
console.log(`Emails found: ${emailResults.length}`);
console.log(`Phones found: ${phoneResults.length}`);
console.log(`Social links found: ${socialResults.length}`);

await Actor.exit();