# Website Email, Phone Number, Social Links & Link Scraper

Scrape emails, phone numbers, social links, internal links, external links, images, and files from websites page-wise with low-cost HTTP crawling.

Pricing: Pay per usage · Developer: Anas Nadeem · Maintained by Community

Low-cost Apify Actor to scrape websites page-wise and extract:

- emails
- phone numbers
- social links
- internal/external links
- image URLs
- file URLs (PDF, DOCX, CSV, ZIP, etc.)

It supports two output modes:

- `full` - one dataset item per crawled page with complete extracted data
- `emails_only` - one dataset item per unique email (compatible with common email-only workflows)
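For instance, a minimal input that switches to the email-only output could look like this (a sketch, assuming the usual Apify `startUrls` shape of objects with a `url` field):

```json
{
  "startUrls": [{ "url": "https://example.com/" }],
  "mode": "emails_only"
}
```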

Each run also emits:

- one `seed_summary` row per input seed URL
- one `run_summary` row with global totals

## Why this Actor

- HTTP-first crawling (`crawlerType: "http"`) for cheaper runs
- optional browser mode (`crawlerType: "browser"`) for JS-heavy pages
- shallow crawling with `maxDepth` and `maxPages` to keep spend predictable
- include/exclude URL globs for crawl control

## Input

Core fields:

- `startUrls` (required)
- `mode` (`full` | `emails_only`)
- `crawlerType` (`http` | `browser`)
- `maxPages`, `maxDepth`
- `sameDomainOnly`, `includeSubdomains`
- `includeUrlGlobs`, `excludeUrlGlobs`
- `extractEmails`, `extractPhones`, `extractSocial`, `extractImages`, `extractFiles`, `extractLinks`

See `.actor/input_schema.json` for the full schema.
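A fuller input combining the core fields might look like the following sketch (values are illustrative; defaults and exact types live in `.actor/input_schema.json`):

```json
{
  "startUrls": [{ "url": "https://example.com/" }],
  "mode": "full",
  "crawlerType": "http",
  "maxPages": 50,
  "maxDepth": 2,
  "sameDomainOnly": true,
  "includeSubdomains": false,
  "includeUrlGlobs": ["https://example.com/**"],
  "excludeUrlGlobs": ["**/blog/**"],
  "extractEmails": true,
  "extractPhones": true,
  "extractSocial": true,
  "extractImages": false,
  "extractFiles": false,
  "extractLinks": true
}
```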

## Output examples

### Full mode row

```json
{
  "recordType": "page",
  "seedUrl": "https://example.com/",
  "pageUrl": "https://example.com/contact",
  "depth": 1,
  "statusCode": 200,
  "title": "Contact",
  "emails": ["hello@example.com"],
  "phoneNumbers": ["+12025550123"],
  "socialLinks": {
    "linkedin": [],
    "x": [],
    "facebook": [],
    "instagram": [],
    "youtube": [],
    "tiktok": [],
    "threads": [],
    "telegram": [],
    "whatsapp": [],
    "discord": [],
    "pinterest": [],
    "reddit": [],
    "github": []
  },
  "internalLinks": [],
  "externalLinks": [],
  "images": [],
  "files": [],
  "counts": {
    "emails": 1,
    "phoneNumbers": 1,
    "socialLinks": 0,
    "internalLinks": 0,
    "externalLinks": 0,
    "images": 0,
    "files": 0
  }
}
```

### Emails-only row

```json
{
  "recordType": "email",
  "seedUrl": "https://example.com/",
  "url": "https://example.com/contact",
  "email": "hello@example.com",
  "depth": 1,
  "statusCode": 200
}
```
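Since every record type lands in the same dataset, a consumer typically splits the downloaded items by `recordType` first. A minimal sketch in Node.js (`groupByRecordType` is a hypothetical helper, not part of the Actor):

```javascript
// Sketch: split a downloaded dataset (mixed record types) into buckets.
// The sample rows mirror the shapes shown above.
function groupByRecordType(items) {
  const groups = { page: [], email: [], seed_summary: [], run_summary: [] };
  for (const item of items) {
    // Tolerate record types not pre-declared above.
    (groups[item.recordType] ??= []).push(item);
  }
  return groups;
}

const items = [
  { recordType: "email", email: "hello@example.com", url: "https://example.com/contact" },
  { recordType: "seed_summary", seedUrl: "https://example.com/", uniqueEmails: 1 },
  { recordType: "run_summary", seedsTotal: 1, uniqueEmails: 1 },
];

const groups = groupByRecordType(items);
console.log(groups.email.length);       // 1
console.log(groups.run_summary.length); // 1
```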
### Seed summary row

```json
{
  "recordType": "seed_summary",
  "seedUrl": "https://example.com/",
  "mode": "full",
  "crawlerType": "http",
  "pagesCrawled": 14,
  "failedRequests": 1,
  "uniqueEmails": 6,
  "statusCodeHistogram": {
    "200": 13,
    "404": 1
  },
  "totals": {
    "emails": 9,
    "phoneNumbers": 4,
    "socialLinks": 11,
    "internalLinks": 173,
    "externalLinks": 42,
    "images": 88,
    "files": 7
  }
}
```

### Run summary row

```json
{
  "recordType": "run_summary",
  "mode": "full",
  "crawlerType": "http",
  "seedsTotal": 3,
  "pagesCrawled": 39,
  "failedRequests": 2,
  "uniqueEmails": 15,
  "totals": {
    "emails": 24,
    "phoneNumbers": 11,
    "socialLinks": 29,
    "internalLinks": 513,
    "externalLinks": 126,
    "images": 244,
    "files": 19
  },
  "durationMs": 5841
}
```
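As a consumer-side sanity check, the per-seed `totals` should add up to the run-level `totals`. A minimal sketch (`sumSeedTotals` is a hypothetical helper, not part of the Actor):

```javascript
// Sketch: sum the `totals` objects of seed_summary rows so they can be
// compared against the run_summary row.
function sumSeedTotals(seedSummaries) {
  const sum = {};
  for (const row of seedSummaries) {
    for (const [key, value] of Object.entries(row.totals)) {
      sum[key] = (sum[key] ?? 0) + value;
    }
  }
  return sum;
}

const seeds = [
  { recordType: "seed_summary", totals: { emails: 9, images: 88 } },
  { recordType: "seed_summary", totals: { emails: 15, images: 156 } },
];

console.log(sumSeedTotals(seeds)); // { emails: 24, images: 244 }
```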
## Local development

```bash
npm install
npm run build
npm run dev
npm test
```

## Notes

- Invalid start URLs are skipped with a warning.
- 4xx/5xx pages may produce no data but do not crash the whole run.
- Use `sameDomainOnly: true` for cost-efficient, controlled crawls.
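The skip-invalid-URLs behavior can be illustrated with the WHATWG `URL` constructor, which throws on malformed input (an illustrative sketch, not the Actor's actual implementation):

```javascript
// Sketch: keep only start URLs that parse, warning on the rest,
// mirroring the "skipped with a warning" note above.
function filterValidStartUrls(startUrls) {
  const valid = [];
  for (const { url } of startUrls) {
    try {
      new URL(url); // throws a TypeError on malformed URLs
      valid.push(url);
    } catch {
      console.warn(`Skipping invalid start URL: ${url}`);
    }
  }
  return valid;
}

console.log(filterValidStartUrls([
  { url: "https://example.com/" },
  { url: "not a url" },
])); // keeps only "https://example.com/"
```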