Contact Phone Extractor

Pricing: from $1.00 / 1,000 results

Fast, accurate phone number extractor. It automatically crawls relevant contact and about pages to scrape valid international phone numbers while filtering out fax numbers, dates, and VAT IDs.

Rating: 5.0 (2)

Developer: CodeScraper (Maintained by Community)

Actor stats

  • Bookmarked: 2
  • Total users: 4
  • Monthly active users: 3
  • Last modified: 7 days ago

Contact Phone Extractor: High-Speed B2B Data Engine

This Apify actor extracts business and customer service phone numbers directly from websites with high accuracy and intelligent validation.

It combines contextual filtering, DOM targeting, contact page discovery, TLD-based country detection, and libphonenumber-js validation to identify, normalize, validate, and deduplicate phone numbers. Dates, zip codes, fax numbers, serial numbers, and other false positives are filtered out.


What It Does

For every website URL provided, the actor extracts:

Website Overview

  • Input URL
  • Resolved Source URL
  • Extraction Status
  • Contact Pages Crawled
  • Phone Numbers Count

Individual Phone Data

For each phone number found:

  • Phone Number
  • Location Found (header, footer, body)
  • Confidence Score
  • Context Snippet
  • Source URL
  • DOM Subsection

It Handles

  • Multiple website URLs
  • Automatic URL normalization
  • High-speed raw HTML scraping
  • Anti-blocking capabilities
  • Automatic Contact/About page discovery
  • Dynamic country code detection
  • Phone validation using libphonenumber-js
  • Cross-page deduplication
  • Context-based confidence scoring
  • Header, footer, navigation, and body extraction
  • Filtering of dates, zip codes, fax numbers, and invalid numeric patterns

How It Works

  1. Loads website URLs

  2. Fetches raw HTML using CheerioCrawler with session rotation and retry handling

  3. Scans:

    • Headers
    • Footers
    • Navigation sections
    • Body content
  4. Automatically discovers and visits:

    • Contact pages
    • About pages
    • Customer service pages
  5. Extracts phone candidates from:

    • Visible text
    • tel: links
    • Structured HTML content
  6. Detects the country from the website's TLD, falling back to defaultCountryCode for generic domains

  7. Validates numbers using libphonenumber-js

  8. Calculates confidence scores from surrounding context

  9. Removes duplicates and low-quality matches

  10. Saves structured data to the Apify Dataset
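The candidate extraction and deduplication steps above can be sketched in a few lines. The actor itself runs on CheerioCrawler and libphonenumber-js; the Python below is only an illustrative, standard-library approximation. The regular expression, the class name `CandidateCollector`, and the helper names are assumptions made for this sketch, not the actor's code, and no real phone validation is performed here.

```python
import re
from html.parser import HTMLParser

# Loose, hypothetical pattern for phone-like runs: optional "+", then digits
# separated by spaces, dots, dashes, slashes, or parentheses.
CANDIDATE_RE = re.compile(r"\+?\(?\d[\d\s().\-/]{5,}\d")


class CandidateCollector(HTMLParser):
    """Collects phone candidates from tel: links and visible text."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # list of (raw_text, origin) tuples
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().startswith("tel:"):
                self.candidates.append((href[4:], "tel-link"))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore numbers inside scripts and stylesheets
        for match in CANDIDATE_RE.finditer(data):
            self.candidates.append((match.group().strip(), "text"))


def digits_only(raw):
    """Build a comparison key: the digits, plus a leading '+' if present."""
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if raw.strip().startswith("+") else digits


def extract_candidates(html):
    """Return unique candidates in first-seen order."""
    collector = CandidateCollector()
    collector.feed(html)
    collector.close()
    seen, unique = set(), []
    for raw, origin in collector.candidates:
        key = digits_only(raw)
        if key not in seen:
            seen.add(key)
            unique.append({"raw": raw, "key": key, "origin": origin})
    return unique
```

Note that this sketch treats `+493587420425` and `03587420425` as different numbers. Merging them requires parsing against a country, which is the job libphonenumber-js does in the actor.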


Input Configuration

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| startUrls | Array | List of website URLs to scrape | [] |
| defaultCountryCode | String | Fallback ISO country code | "US" |
| searchSections | Object | Select sections to scan | {"header":true,"footer":true,"body":false} |
| deduplicationStrictness | String | Deduplication mode | "balanced" |
| minPhoneLength | Integer | Minimum digits required | 8 |
| maxPhoneLength | Integer | Maximum digits allowed | 15 |
| excludePatterns | Array | Regex patterns to exclude | ["^800","^888"] |
| includeOnlyCountryCodes | Array | Allow only specific calling codes | [] |
| confidenceThreshold | Number | Minimum confidence score | 0.5 |
| maxResultsPerUrl | Integer | Maximum phones returned per URL | 5 |
| outputFormat | String | Output format | "both" |
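To make the interaction between the filtering fields concrete, here is a minimal sketch of how such settings could be applied to a list of candidates. It is not the actor's implementation. The function name `apply_input_filters` is hypothetical, and it assumes that `excludePatterns` are matched against a digits-only string and that results are ordered by confidence before `maxResultsPerUrl` is applied; the listing does not specify either detail.

```python
import re

# Defaults copied from the input table; only the filtering fields are used.
DEFAULTS = {
    "minPhoneLength": 8,
    "maxPhoneLength": 15,
    "excludePatterns": ["^800", "^888"],
    "confidenceThreshold": 0.5,
    "maxResultsPerUrl": 5,
}


def apply_input_filters(candidates, user_input=None):
    """Filter (digits, confidence) pairs using the documented input fields."""
    cfg = {**DEFAULTS, **(user_input or {})}
    excluded = [re.compile(pattern) for pattern in cfg["excludePatterns"]]
    kept = []
    for digits, confidence in candidates:
        # Length bounds count digits only.
        if not cfg["minPhoneLength"] <= len(digits) <= cfg["maxPhoneLength"]:
            continue
        # Assumption: patterns are tested against the digits-only string.
        if any(rx.search(digits) for rx in excluded):
            continue
        if confidence < cfg["confidenceThreshold"]:
            continue
        kept.append((digits, confidence))
    # Assumption: highest confidence first, then cap the list per URL.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[: cfg["maxResultsPerUrl"]]
```

With the defaults, a toll-free number such as `8005551234` is dropped by `^800`. Passing `{"excludePatterns": []}` keeps it.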

Example Input

{
  "startUrls": ["https://oberlausitzer-alpakaland.de"],
  "defaultCountryCode": "US",
  "deduplicationStrictness": "balanced",
  "confidenceThreshold": 0.5,
  "excludePatterns": ["^800"],
  "searchSections": {
    "header": true,
    "footer": true,
    "body": false
  }
}

Example Output

{
  "originalInputUrl": "https://oberlausitzer-alpakaland.de",
  "source": "https://oberlausitzer-alpakaland.de",
  "contactPagesVisited": [
    "https://oberlausitzer-alpakaland.de/pages/kontakt",
    "https://oberlausitzer-alpakaland.de/policies/contact-information",
    "https://oberlausitzer-alpakaland.de/policies/legal-notice"
  ],
  "status": "Found",
  "phoneNumbersCount": 2,
  "phoneNumbers": [
    {
      "phoneNumber": "+49 35874 20425",
      "formattedVariations": ["035874-20425", "+49 35874 20425"],
      "source": "https://oberlausitzer-alpakaland.de/pages/kontakt",
      "location": "body",
      "subsection": "main-content",
      "confidence": 1,
      "context": "...alb von 24 Stunden., Telefon: 035874-20425 & 035874-223599, Bitte beacht..."
    },
    {
      "phoneNumber": "+49 35874 223599",
      "formattedVariations": [
        "035874-223599",
        "+49 35874 223599",
        "035874223599"
      ],
      "source": "https://oberlausitzer-alpakaland.de/pages/kontakt",
      "location": "body",
      "subsection": "main-content",
      "confidence": 1,
      "context": "...den., Telefon: 035874-20425 & 035874-223599, Bitte beachte, dass wir kein..."
    }
  ]
}

If no phone numbers are found:

{
  "originalInputUrl": "www.emiconner.com",
  "source": "http://www.emiconner.com/",
  "status": "No phone numbers found",
  "contactPagesVisited": [],
  "phoneNumbersCount": 0,
  "phoneNumbers": []
}
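For CRM imports it is often easier to work with one row per phone number than with nested items. The sketch below flattens dataset items shaped like the examples above into CSV text. The function names and the chosen column names (`inputUrl`, `foundOn`) are illustrative assumptions; only the item field names come from the example output.

```python
import csv
import io

COLUMNS = ["inputUrl", "phoneNumber", "confidence", "location", "foundOn"]


def flatten_item(item):
    """Turn one dataset item into one row per phone number."""
    return [
        {
            "inputUrl": item["originalInputUrl"],
            "phoneNumber": phone["phoneNumber"],
            "confidence": phone["confidence"],
            "location": phone["location"],
            "foundOn": phone["source"],
        }
        for phone in item.get("phoneNumbers", [])
    ]


def to_csv(items):
    """Serialize all phone rows from all items as CSV text."""
    rows = [row for item in items for row in flatten_item(item)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Items with an empty `phoneNumbers` array simply contribute no rows, so "No phone numbers found" results do not need special handling.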

Error Handling

If a website cannot be accessed or processed:

{
  "originalInputUrl": "http://www.jewelrybyARSA.com/",
  "source": "http://www.jewelrybyARSA.com/",
  "status": "Failed",
  "contactPagesVisited": [],
  "error": "Request failed completely (check proxy or rate limits)",
  "phoneNumbersCount": 0,
  "phoneNumbers": []
}
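Because every item carries a `status` field, a consumer can separate successes from failures and queue the failed URLs for another run. The sketch below assumes only the three status strings shown in the examples ("Found", "No phone numbers found", "Failed"); the function names are hypothetical.

```python
from collections import defaultdict


def partition_by_status(items):
    """Group dataset items by their status string."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("status", "Unknown")].append(item)
    return dict(groups)


def urls_to_retry(items):
    """Collect the original input URLs of items that failed outright."""
    return [
        item["originalInputUrl"]
        for item in items
        if item.get("status") == "Failed"
    ]
```

The returned list can be passed back as `startUrls` in a follow-up run, for example after changing proxy settings.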

Features

  • Accurate phone number extraction
  • High-speed HTML-based scraping
  • Built-in anti-blocking mechanisms
  • Dynamic country detection
  • Automatic contact page crawling
  • Confidence-based validation
  • Automatic deduplication
  • Targeted DOM extraction
  • Context-aware scoring
  • False-positive filtering

Use Cases

  • B2B Lead Generation
  • CRM Data Enrichment
  • Sales Prospecting
  • Business Directory Building
  • Contact Database Creation
  • Cold Outreach Campaigns
  • Business Intelligence Research

FAQs

1. Why is it so fast?

The actor downloads and processes raw HTML directly instead of launching a full browser session. Combined with CheerioCrawler, session pooling, and optimized parsing, it delivers high throughput with minimal resource usage.


2. Can it extract phone numbers hidden behind buttons or clicks?

No. The actor works with raw HTML, so it does not execute JavaScript or interact with page elements. It extracts only contact information that is present in the HTML the server returns.


3. Does it filter Fax numbers?

Yes. The contextual scoring engine heavily penalizes numbers associated with terms such as Fax, Fax:, or F: to prevent them from being returned as primary contact numbers.
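A rough sketch of this kind of context scoring follows. It is an illustration of the idea, not the actor's scoring engine: the keyword lists, the 25-character window, the base score of 0.6, and the +0.4 / -0.5 adjustments are all assumed values, and the function names are hypothetical.

```python
import re

# Hypothetical label patterns. "F:" counts as a fax label, as in the FAQ answer.
FAX_RE = re.compile(r"\b(?:(?:tele)?fax|f\s*:)", re.IGNORECASE)
VOICE_RE = re.compile(r"\b(?:tel|phone|call)", re.IGNORECASE)


def last_label(window):
    """Return 'fax', 'voice', or None for the label closest to the number."""
    fax_pos = max((m.start() for m in FAX_RE.finditer(window)), default=-1)
    voice_pos = max((m.start() for m in VOICE_RE.finditer(window)), default=-1)
    if fax_pos == -1 and voice_pos == -1:
        return None
    # On a tie (for example "Telefax") the fax reading wins.
    return "fax" if fax_pos >= voice_pos else "voice"


def score(text, number, base=0.6, window=25):
    """Adjust a base confidence using the label that precedes the number."""
    idx = text.find(number)
    if idx == -1:
        raise ValueError("number not found in text")
    before = text[max(0, idx - window):idx]
    label = last_label(before)
    if label == "fax":
        return round(max(0.0, base - 0.5), 2)  # heavy penalty
    if label == "voice":
        return round(min(1.0, base + 0.4), 2)  # boost
    return base
```

With the default `confidenceThreshold` of 0.5, a number scored this way after a fax label would fall below the cutoff and be dropped.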


4. Does it support international phone numbers?

Yes. Dynamic Country Code Detection automatically maps website TLDs (such as .de, .uk, .fr) to their respective countries and validates numbers using libphonenumber-js. A fallback country can also be configured for generic domains.
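The TLD lookup with a fallback can be pictured as follows. This is a minimal sketch under stated assumptions: the mapping table is a tiny illustrative subset, the function name `detect_country` is hypothetical, and the actor's real mapping may differ. Note that `.uk` maps to the ISO code `GB`.

```python
from urllib.parse import urlparse

# Small illustrative subset of a TLD-to-ISO-country mapping.
TLD_TO_COUNTRY = {"de": "DE", "uk": "GB", "fr": "FR", "at": "AT", "nl": "NL"}


def detect_country(url, default_country="US"):
    """Guess the ISO country from the TLD, else return the fallback."""
    if "://" not in url:
        url = "http://" + url  # let urlparse find the host of bare domains
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return TLD_TO_COUNTRY.get(tld, default_country)
```

For a generic domain such as `.com`, the function returns the fallback, which corresponds to the `defaultCountryCode` input field.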


Developer Info

Author: codescraper

Email: codescraper011@gmail.com


Tags

phone-scraper · contact-extractor · lead-generation · b2b-data · phone-number-extractor · data-enrichment · sales-intelligence · web-scraping