Japan Contact Scraper
Pricing
Pay per event
Extract emails, Japanese phone numbers (03-, 090-, 0120- formats), and social media links from Japanese company websites. Optimized regex patterns ensure high accuracy with minimal false positives.
Developer
kyo kou
Email & Phone Scraper for Japanese Websites: Bulk Contact Extraction for Japanese Companies
Still hunting for contact details on Japanese company websites by hand, one site at a time?
This Actor crawls Japanese company websites and extracts email addresses, phone numbers (landline, mobile, and toll-free), social media profiles, and contact form URLs, all in a single batch run. It is built specifically for Japanese B2B lead generation and sales list building.
Quick Start
Run in Apify Console or from the Apify CLI:
    apify call your-username/japan-contact-scraper \
      --input='{"urls": ["https://example.co.jp"], "maxPagesPerSite": 10}'
Or call the Apify API directly:
    curl "https://api.apify.com/v2/acts/your-username~japan-contact-scraper/runs" \
      -X POST \
      -H "Authorization: Bearer YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"urls": ["https://example.co.jp"], "maxPagesPerSite": 10}'
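The same API call can be prepared from Python using only the standard library. This is a minimal sketch: `build_run_request` is a hypothetical helper name, and the token and Actor ID are the placeholders used in the commands above. The function only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, actor_id: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an Actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # The body is JSON, so declare it as such.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    token="YOUR_API_TOKEN",
    actor_id="your-username~japan-contact-scraper",
    run_input={"urls": ["https://example.co.jp"], "maxPagesPerSite": 10},
)
# To actually start the run: urllib.request.urlopen(request)
```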
Features
- Batch URL processing: scrape hundreds of Japanese websites in a single run
- Email extraction with DNS validation: verifies that the mail domain actually exists and auto-excludes dummy addresses (example.com, test.com, etc.)
- Japanese phone number parsing: landline (固定電話), mobile (070/080/090), IP phone (050), and toll-free / フリーダイヤル (0120/0800), using the `phonenumbers-jp` library
- Social media profile detection: YouTube, Instagram, Facebook, GitHub, Reddit (LINE and X/Twitter are not currently supported)
- Contact form URL detection (optional): finds Japanese "お問い合わせ" (contact) pages by scoring 4 signals: URL path, link text, page title, and form element presence
- Per-site page limits: control crawl depth per domain to stay within budget
- Blacklist filtering: regex and domain-based rules to reduce false positives
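To illustrate how email extraction and blacklist filtering can fit together, here is a minimal sketch. It is not the Actor's actual code: the regex, the `DUMMY_DOMAINS` set, the `ASSET_RE` rule, and the function name are all assumptions made for illustration, and the DNS check is left out because it needs network access.

```python
import re

# Simplified email pattern (assumption; the Actor's own regex is not shown in this README).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# Hypothetical domain blacklist for placeholder addresses.
DUMMY_DOMAINS = {"example.com", "test.com"}
# Hypothetical regex blacklist: asset names such as "logo@2x.png" look like emails.
ASSET_RE = re.compile(r"\.(png|jpe?g|gif|svg|webp)$")


def extract_emails(html: str) -> list:
    """Return unique, lower-cased emails in order of first appearance."""
    found = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        domain = email.rsplit("@", 1)[1]
        if domain in DUMMY_DOMAINS or ASSET_RE.search(email):
            continue  # blacklisted
        if email not in found:
            found.append(email)
    return found


html = (
    '<a href="mailto:Info@example.co.jp">Info@example.co.jp</a> '
    "user@example.com logo@2x.png sales@example.co.jp"
)
emails = extract_emails(html)
```

A DNS step would then drop any address whose domain does not resolve.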
How It Works
This Actor uses Crawlee with the BeautifulSoup crawler (HTTP-based, no browser). It follows same-domain links up to your configured page limit and extracts contact information from each page's HTML.
What this means in practice:
- ✅ Fast and lightweight — no headless browser overhead
- ✅ Respects per-site page limits to avoid excessive crawling
- ⚠️ JavaScript-rendered content (SPAs, React sites) is not visible to this crawler. If the contact info is loaded dynamically via JS, it won't be extracted.
- ⚠️ No built-in robots.txt enforcement — please check each site's robots.txt manually if compliance is important for your use case.
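Because robots.txt is not enforced by the Actor, you may want to filter your URL list before a run. The sketch below uses Python's standard `urllib.robotparser`; `is_allowed` is a hypothetical helper name and the sample rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only.
sample_rules = """\
User-agent: *
Disallow: /admin/
"""

contact_ok = is_allowed(sample_rules, "https://example.co.jp/contact/")
admin_ok = is_allowed(sample_rules, "https://example.co.jp/admin/login")
```

For a live site, `RobotFileParser.set_url(...)` followed by `read()` fetches the file instead of parsing a string.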
Input
| Field | Type | Description |
|---|---|---|
| `urls` | Array (required) | Target website URLs |
| `maxPagesPerSite` | Integer | Max pages to crawl per site (default: 10) |
| `enableContactForm` | Boolean | Enable contact form URL detection (default: false) |
Example Input
    {
      "urls": [
        "https://example.co.jp",
        "https://another-company.co.jp"
      ],
      "maxPagesPerSite": 20,
      "enableContactForm": true
    }
Output
One result per URL in the dataset:
    {
      "url": "https://example.co.jp",
      "emails": ["info@example.co.jp", "sales@example.co.jp"],
      "phones": [
        {"number": "0312345678", "formatted": "03-1234-5678", "type": "固定電話"},
        {"number": "09012345678", "formatted": "090-1234-5678", "type": "携帯"}
      ],
      "socials": {
        "youtube": ["https://www.youtube.com/@example"],
        "facebook": ["https://www.facebook.com/example.japan"]
      },
      "contact_url": {
        "url": "https://example.co.jp/contact/",
        "score": 95,
        "has_form": true,
        "error": null
      },
      "error": null
    }
Note: The `type` field in phone results uses Japanese labels: 固定電話 (landline), 携帯 (mobile), IP電話 (IP phone), and フリーダイヤル (toll-free), as returned by the `phonenumbers-jp` library.
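If you export results to a spreadsheet or CRM, a small post-processing step can flatten each item and translate the Japanese `type` labels. This is a sketch under stated assumptions: `to_csv`, the column layout, and the English label names are hypothetical choices, and the sample item is a trimmed copy of the output example above.

```python
import csv
import io

# English glosses for the Japanese phone type labels (naming is an assumption).
TYPE_LABELS = {
    "固定電話": "landline",
    "携帯": "mobile",
    "IP電話": "ip_phone",
    "フリーダイヤル": "toll_free",
}


def to_csv(items: list) -> str:
    """Flatten dataset items into one CSV row per site."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "emails", "phones", "contact_url"])
    for item in items:
        phones = [
            f'{p["formatted"]} ({TYPE_LABELS.get(p["type"], p["type"])})'
            for p in item.get("phones", [])
        ]
        contact = item.get("contact_url") or {}  # may be absent or null
        writer.writerow([
            item["url"],
            "; ".join(item.get("emails", [])),
            "; ".join(phones),
            contact.get("url", ""),
        ])
    return buffer.getvalue()


sample_item = {
    "url": "https://example.co.jp",
    "emails": ["info@example.co.jp", "sales@example.co.jp"],
    "phones": [
        {"number": "0312345678", "formatted": "03-1234-5678", "type": "固定電話"},
        {"number": "09012345678", "formatted": "090-1234-5678", "type": "携帯"},
    ],
    "contact_url": {"url": "https://example.co.jp/contact/", "score": 95},
}
csv_text = to_csv([sample_item])
```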
Supported Japanese Phone Formats
| Type | Example |
|---|---|
| Landline / 固定電話 | 03-1234-5678, 06-1234-5678 |
| Mobile / 携帯電話 | 090-1234-5678, 080-1234-5678, 070-1234-5678 |
| IP Phone / IP電話 | 050-1234-5678 |
| Toll-free / フリーダイヤル | 0120-123-456, 0800-123-4567 |
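The formats in the table can be told apart by prefix and digit count. The sketch below shows one way to do that with plain regexes. It does not use `phonenumbers-jp` and is not the Actor's implementation; `classify_jp_phone` and its return labels are hypothetical, and international notation (+81) is not handled.

```python
import re
from typing import Optional


def classify_jp_phone(raw: str) -> Optional[str]:
    """Classify a Japanese domestic number by prefix; return None if unrecognized."""
    digits = re.sub(r"\D", "", raw)  # drop hyphens, spaces, parentheses
    # Toll-free is checked first because 0800 numbers also start with 080.
    if re.fullmatch(r"0120\d{6}|0800\d{7}", digits):
        return "toll-free"
    if re.fullmatch(r"0[789]0\d{8}", digits):
        return "mobile"
    if re.fullmatch(r"050\d{8}", digits):
        return "ip-phone"
    # Landline: 10 digits, excluding the 050/070/080/090 prefixes handled above.
    if re.fullmatch(r"0(?![5789]0)[1-9]\d{8}", digits):
        return "landline"
    return None


examples = {
    number: classify_jp_phone(number)
    for number in ["03-1234-5678", "090-1234-5678", "050-1234-5678", "0120-123-456"]
}
```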
Contact Form Detection — How Scoring Works
When enableContactForm is enabled, each crawled page is scored across 4 dimensions:
| Signal | Example match | Points |
|---|---|---|
| URL path contains toiawase / contact | /otoiawase/, /contact/ | +15 ~ +25 |
| Link text matches Japanese contact terms | 「お問い合わせ」「ご相談」 | +25 ~ +45 |
| Page title contains contact keywords | `<title>お問い合わせ</title>` | +30 |
| Page has a real `<form>` with inputs | `<form>` + `<input>` + submit button | +35 |
The highest-scoring page is returned as the contact form URL. Pages matching negative patterns (/blog, /faq, /product, etc.) receive a -15 penalty.
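The scoring described above can be sketched as a simple additive function. Treat this as an illustration only: the `Page` class, the regexes, and the function names are hypothetical, and where the table gives a range the sketch assumes the top of that range. Totals will not necessarily match the `score` field the Actor returns.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    url_path: str
    link_text: str
    title: str
    has_form: bool


# Illustrative patterns; the Actor's real keyword lists are not published here.
URL_RE = re.compile(r"toiawase|contact", re.IGNORECASE)
LINK_TEXT_RE = re.compile(r"お問い合わせ|ご相談")
TITLE_RE = re.compile(r"お問い合わせ|contact", re.IGNORECASE)
NEGATIVE_RE = re.compile(r"/(blog|faq|product)", re.IGNORECASE)


def score_page(page: Page) -> int:
    score = 0
    if URL_RE.search(page.url_path):
        score += 25  # table: +15 ~ +25 (top of range assumed)
    if LINK_TEXT_RE.search(page.link_text):
        score += 45  # table: +25 ~ +45 (top of range assumed)
    if TITLE_RE.search(page.title):
        score += 30
    if page.has_form:
        score += 35
    if NEGATIVE_RE.search(page.url_path):
        score -= 15  # penalty for blog/FAQ/product paths
    return score


def best_contact_page(pages: List[Page]) -> Optional[Page]:
    """Return the highest-scoring page, or None for an empty list."""
    return max(pages, key=score_page, default=None)


contact = Page("/contact/", "お問い合わせ", "お問い合わせ | Example", True)
blog = Page("/blog/contact-tips", "Blog", "Blog", False)
best = best_contact_page([blog, contact])
```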
Use Cases
- B2B lead generation: build targeted outreach lists of Japanese companies
- Sales prospecting: collect contact details for cold outreach campaigns
- Partner / supplier research: bulk-collect contact info for potential business partners
- Market research & competitive analysis: gather structured contact data across an industry
- CRM enrichment: import verified Japanese contact info into your CRM
Limitations
- No JavaScript rendering — this crawler uses HTTP requests + BeautifulSoup, not a headless browser. Content rendered by JavaScript (React, Vue, Angular SPAs) will not be scraped.
- No LINE or X (Twitter) detection — these platforms are not currently supported for social profile extraction.
- Anti-scraping protections — sites with CAPTCHAs, Cloudflare, or aggressive rate limiting may return incomplete results.
- No timeout configuration — crawl timeouts use Crawlee's defaults.
Legal Considerations
This tool only extracts publicly available information from website HTML. It does not bypass authentication, access restricted pages, or collect non-public data.
Users are responsible for complying with:
- Each website's terms of service and robots.txt
- Japan's Act on the Protection of Personal Information (個人情報保護法)
- Japan's Act on Prohibition of Unauthorized Computer Access (不正アクセス禁止法)
- All other applicable laws in your jurisdiction