Japan Contact Scraper

Extract emails, Japanese phone numbers (03-, 090-, 0120- formats), and social media links from Japanese company websites. Regex patterns and blacklist filters are tuned to keep false positives low.

Pricing: Pay per event
Rating: 5.0 (1)
Developer: kyo kou (maintained by Community)
Actor stats: 1 bookmarked, 4 total users, 1 monthly active user
Last modified: 2 days ago

Email & Phone Scraper for Japanese Websites: Bulk Extraction of Japanese Company Contact Details

Still looking up contact details on Japanese company websites by hand, one site at a time?

This Actor crawls Japanese company websites and extracts email addresses, phone numbers (landline, mobile, and toll-free), social media profiles, and contact form URLs, all in a single batch run. It is built specifically for Japanese B2B lead generation and sales list building.

Quick Start

Run from the Apify Console, or from the command line with the Apify CLI:

apify call your-username/japan-contact-scraper \
  --input='{"urls": ["https://example.co.jp"], "maxPagesPerSite": 10}'

Or call the Apify API directly (the input is sent as a JSON body, so set the Content-Type header):

curl "https://api.apify.com/v2/acts/your-username~japan-contact-scraper/runs" \
  -X POST \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.co.jp"], "maxPagesPerSite": 10}'
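
The same API call can be prepared from Python. The following is a minimal sketch using only the standard library; `build_run_request` is a hypothetical helper name, and the Actor ID and token are the placeholders from the examples above. It builds the request without sending it, so nothing runs until you call `urlopen`.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an Actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),  # Actor input goes in the JSON body
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "your-username~japan-contact-scraper",  # placeholder Actor ID
    "YOUR_API_TOKEN",  # placeholder token
    {"urls": ["https://example.co.jp"], "maxPagesPerSite": 10},
)
# To actually start the run: urllib.request.urlopen(request)
```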

Features

  • Batch URL processing: scrape hundreds of Japanese websites in a single run
  • Email extraction with DNS validation: verifies that the mail domain actually exists and automatically excludes dummy addresses (example.com, test.com, etc.)
  • Japanese phone number parsing: landline (固定電話), mobile (070/080/090), IP phone (050), and toll-free / フリーダイヤル (0120/0800), using the phonenumbers-jp library
  • Social media profile detection: YouTube, Instagram, Facebook, GitHub, Reddit (LINE and X/Twitter are not currently supported)
  • Contact form URL detection (optional): finds Japanese "お問い合わせ" (contact / inquiry) pages by scoring 4 signals (URL path, link text, page title, and form element presence)
  • Per-site page limits: cap the number of pages crawled per domain to stay within budget
  • Blacklist filtering: regex and domain-based rules to reduce false positives
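
To illustrate the email feature, here is a rough sketch of regex extraction followed by a dummy-domain filter and a domain existence check. It is not the Actor's actual code: the pattern, the `DUMMY_DOMAINS` set, and the helper names are assumptions, and the existence check uses a plain address lookup where a real validator might query MX records instead.

```python
import re
import socket

# Assumed, simplified pattern; the Actor's real regex is not published here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
DUMMY_DOMAINS = {"example.com", "test.com"}  # hypothetical blacklist


def domain_resolves(domain: str) -> bool:
    """Rough existence check: does the domain resolve to any address?"""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def extract_emails(text: str, resolves=domain_resolves) -> list[str]:
    """Return unique, lowercased emails whose domain is not a dummy and resolves."""
    found = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        domain = email.split("@", 1)[1]
        if domain in DUMMY_DOMAINS or email in found:
            continue
        if resolves(domain):
            found.append(email)
    return found
```

The `resolves` parameter is injectable so the filter can be exercised offline, for example `extract_emails(html_text, resolves=lambda d: True)`.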

How It Works

This Actor uses Crawlee with the BeautifulSoup crawler (HTTP-based, no browser). It follows same-domain links up to your configured page limit and extracts contact information from each page's HTML.

What this means in practice:

  • ✅ Fast and lightweight — no headless browser overhead
  • ✅ Respects per-site page limits to avoid excessive crawling
  • ⚠️ JavaScript-rendered content (SPAs, React sites) is not visible to this crawler. If the contact info is loaded dynamically via JS, it won't be extracted.
  • ⚠️ No built-in robots.txt enforcement — please check each site's robots.txt manually if compliance is important for your use case.
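
The "follow same-domain links up to a page limit" behaviour can be pictured with a standard-library sketch. This is an illustration only, not the Crawlee implementation; `LinkCollector` and `same_domain_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_domain_links(page_url: str, html: str, limit: int) -> list[str]:
    """Return up to `limit` unique absolute links on the same host as page_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen, result = set(), []
    for href in collector.hrefs:
        if len(result) >= limit:
            break
        absolute = urljoin(page_url, href).split("#")[0]  # resolve and drop fragment
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Because only the raw HTML is parsed, links or contact details injected later by JavaScript never appear in `collector.hrefs`, which is the limitation described above.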

Input

| Field | Type | Description |
| --- | --- | --- |
| urls | Array (required) | Target website URLs |
| maxPagesPerSite | Integer | Max pages to crawl per site (default: 10) |
| enableContactForm | Boolean | Enable contact form URL detection (default: false) |

Example Input

{
  "urls": [
    "https://example.co.jp",
    "https://another-company.co.jp"
  ],
  "maxPagesPerSite": 20,
  "enableContactForm": true
}

Output

One result per URL in the dataset:

{
  "url": "https://example.co.jp",
  "emails": ["info@example.co.jp", "sales@example.co.jp"],
  "phones": [
    {
      "number": "0312345678",
      "formatted": "03-1234-5678",
      "type": "固定電話"
    },
    {
      "number": "09012345678",
      "formatted": "090-1234-5678",
      "type": "携帯"
    }
  ],
  "socials": {
    "youtube": ["https://www.youtube.com/@example"],
    "facebook": ["https://www.facebook.com/example.japan"]
  },
  "contact_url": {
    "url": "https://example.co.jp/contact/",
    "score": 95,
    "has_form": true,
    "error": null
  },
  "error": null
}

Note: The type field in phone results uses Japanese labels (固定電話, 携帯, IP電話, フリーダイヤル) as returned by the phonenumbers-jp library.
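
For CRM import or spreadsheet work, records in the shape shown above can be flattened to one CSV row per site. The sketch below assumes the field names from the example output; `records_to_csv` and the column layout are hypothetical choices, not part of the Actor.

```python
import csv
import io


def records_to_csv(records: list[dict]) -> str:
    """Flatten dataset records into CSV text, one row per site."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["url", "emails", "phones", "contact_form"],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        contact = record.get("contact_url") or {}  # absent or null when detection is off
        writer.writerow({
            "url": record["url"],
            "emails": ";".join(record.get("emails") or []),
            "phones": ";".join(p["formatted"] for p in record.get("phones") or []),
            "contact_form": contact.get("url") or "",
        })
    return buffer.getvalue()
```

Multiple emails or phone numbers for one site are joined with semicolons so each site stays on a single row.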

Supported Japanese Phone Formats

| Type | Example |
| --- | --- |
| Landline / 固定電話 | 03-1234-5678, 06-1234-5678 |
| Mobile / 携帯電話 | 090-1234-5678, 080-1234-5678, 070-1234-5678 |
| IP Phone / IP電話 | 050-1234-5678 |
| Toll-free / フリーダイヤル | 0120-123-456, 0800-123-4567 |
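
The categories in the table can be approximated by looking at the leading digits. The following is a simplified prefix-based sketch, not the logic of the phonenumbers-jp library; `classify_jp_phone` is a hypothetical name, and the length checks are assumptions based on the example formats above.

```python
import re

MOBILE_PREFIXES = ("070", "080", "090")
NON_LANDLINE_PREFIXES = MOBILE_PREFIXES + ("050",)


def classify_jp_phone(number: str):
    """Return a Japanese type label for a domestic number, or None if unrecognized."""
    digits = re.sub(r"\D", "", number)  # strip hyphens, spaces, parentheses
    if not digits.startswith("0"):
        return None
    # Toll-free is checked first so that 0800 is not mistaken for an 080 mobile number.
    if digits.startswith("0120") and len(digits) == 10:
        return "フリーダイヤル"
    if digits.startswith("0800") and len(digits) == 11:
        return "フリーダイヤル"
    if digits.startswith("050") and len(digits) == 11:
        return "IP電話"
    if digits[:3] in MOBILE_PREFIXES and len(digits) == 11:
        return "携帯"
    # Assumption: any remaining 10-digit number is treated as a landline.
    if len(digits) == 10 and digits[:3] not in NON_LANDLINE_PREFIXES:
        return "固定電話"
    return None
```

For example, `classify_jp_phone("0120-123-456")` returns the toll-free label, while a string with too few digits returns `None`.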

Contact Form Detection — How Scoring Works

When enableContactForm is enabled, each crawled page is scored across 4 dimensions:

| Signal | Example match | Points |
| --- | --- | --- |
| URL path contains toiawase / contact | /otoiawase/, /contact/ | +15 ~ +25 |
| Link text matches Japanese contact terms | 「お問い合わせ」「ご相談」 | +25 ~ +45 |
| Page title contains contact keywords | <title>お問い合わせ</title> | +30 |
| Page has a real <form> with inputs | <form> + <input> + submit button | +35 |

The highest-scoring page is returned as the contact form URL. Pages matching negative patterns (/blog, /faq, /product, etc.) receive a -15 penalty.
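
The scoring can be sketched as an additive function over the four signals. This is an illustration under stated assumptions: the source gives ranges for the URL and link-text signals, so the exact split used here (which keyword earns the low or high end of a range) is a guess, and `Page` and `score_page` are hypothetical names.

```python
from dataclasses import dataclass

NEGATIVE_PATTERNS = ("/blog", "/faq", "/product")


@dataclass
class Page:
    url_path: str   # e.g. "/contact/"
    link_text: str  # text of the link that led to this page
    title: str      # contents of <title>
    has_form: bool  # page contains a <form> with inputs and a submit button


def score_page(page: Page) -> int:
    """Sum the four signals and apply the negative-pattern penalty."""
    score = 0
    path = page.url_path.lower()
    if "toiawase" in path:
        score += 25  # assumed: upper end of the +15 ~ +25 range
    elif "contact" in path:
        score += 15  # assumed: lower end of the range
    if "お問い合わせ" in page.link_text:
        score += 45  # assumed: upper end of the +25 ~ +45 range
    elif "ご相談" in page.link_text:
        score += 25  # assumed: lower end of the range
    if "お問い合わせ" in page.title or "contact" in page.title.lower():
        score += 30
    if page.has_form:
        score += 35
    if any(pattern in path for pattern in NEGATIVE_PATTERNS):
        score -= 15
    return score
```

Picking the contact page is then a matter of taking the maximum, for example `best = max(pages, key=score_page)`.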

Use Cases

  • B2B lead generation: build targeted outreach lists of Japanese companies
  • Sales prospecting: collect contact details for cold outreach campaigns
  • Partner / supplier research: bulk-collect contact info for potential business partners
  • Market research and competitive analysis: gather structured contact data across an industry
  • CRM enrichment: import verified Japanese contact info into your CRM

Limitations

  • No JavaScript rendering — this crawler uses HTTP requests + BeautifulSoup, not a headless browser. Content rendered by JavaScript (React, Vue, Angular SPAs) will not be scraped.
  • No LINE or X (Twitter) detection — these platforms are not currently supported for social profile extraction.
  • Anti-scraping protections — sites with CAPTCHAs, Cloudflare, or aggressive rate limiting may return incomplete results.
  • No timeout configuration — crawl timeouts use Crawlee's defaults.

This tool only extracts publicly available information from website HTML. It does not bypass authentication, access restricted pages, or collect non-public data.

Users are responsible for complying with: