Japan Contact Scraper
Pricing
Pay per event
Extract emails, Japanese phone numbers (03-, 090-, 0120- formats), and social media links from Japanese company websites. Regex patterns and blacklist rules are tuned to keep false positives low.
Rating
5.0
(1)
Developer
kyo kou
Maintained by Community
2 months ago
Last modified
Email & Phone Scraper for Japanese Websites: Bulk Contact Extraction for Japanese Companies
Still looking up contact details on Japanese company websites by hand, one site at a time?
This Actor crawls Japanese company websites and extracts email addresses, phone numbers (landline, mobile, and toll-free), social media profiles, and contact form URLs in a single batch run. It is built for Japanese B2B lead generation and sales list building.
Quick Start
Run it from the Apify Console, or from the command line with the Apify CLI:

    apify call your-username/japan-contact-scraper \
      --input='{"urls": ["https://example.co.jp"], "maxPagesPerSite": 10}'

Or call the Apify API directly:

    curl "https://api.apify.com/v2/acts/your-username~japan-contact-scraper/runs" \
      -X POST \
      -H "Authorization: Bearer YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"urls": ["https://example.co.jp"], "maxPagesPerSite": 10}'

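The same request can be assembled from Python using only the standard library. This is a minimal sketch, not an official client: `build_run_request` is a hypothetical helper, and the Actor ID and token are the same placeholders used above. The request is built but not sent, so it can be inspected safely.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an Actor run."""
    # The API path uses "username~actor-name", so swap the slash for a tilde.
    url = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "your-username/japan-contact-scraper",  # placeholder Actor ID
    "YOUR_API_TOKEN",  # placeholder token
    {"urls": ["https://example.co.jp"], "maxPagesPerSite": 10},
)
print(request.get_method(), request.full_url)
# To actually start the run, pass the request to urllib.request.urlopen(request).
```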
Features
- Batch URL processing: scrape hundreds of Japanese websites in a single run
- Email extraction with DNS validation: checks that the mail domain actually exists and automatically excludes dummy addresses (example.com, test.com, etc.)
- Japanese phone number parsing: landline (固定電話), mobile (070/080/090), IP phone (050), and toll-free / フリーダイヤル (0120/0800), using the `phonenumbers-jp` library
- Social media profile detection: YouTube, Instagram, Facebook, GitHub, Reddit (LINE and X/Twitter are not currently supported)
- Contact form URL detection (optional): finds Japanese "お問い合わせ" (contact/inquiry) pages by scoring 4 signals (URL path, link text, page title, and form element presence)
- Per-site page limits: cap the number of pages crawled per domain to stay within budget
- Blacklist filtering: regex and domain-based rules to reduce false positives
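To illustrate the email extraction and blacklist filtering described above, here is a rough sketch in plain Python. The regex, the `DUMMY_DOMAINS` set, and the `ASSET_SUFFIXES` tuple are assumptions made for this example and are not the Actor's actual rules. The DNS check is left out because it needs network access.

```python
import re

# Deliberately simple pattern; real-world email matching has many more edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Hypothetical blacklist rules (assumptions for this sketch).
DUMMY_DOMAINS = {"example.com", "test.com"}
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased emails in order of appearance, minus blacklisted ones."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        domain = email.rsplit("@", 1)[1]
        # Drop placeholder domains and image names such as "logo@2x.png".
        if domain in DUMMY_DOMAINS or email.endswith(ASSET_SUFFIXES):
            continue
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


html = (
    'Contact: Info@Example.co.jp, foo@example.com, '
    '<img src="logo@2x.png">, info@example.co.jp'
)
print(extract_emails(html))
```

The image-suffix rule shows why a blacklist matters: retina asset names like `logo@2x.png` look like email addresses to a naive regex.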
How It Works
This Actor uses Crawlee with the BeautifulSoup crawler (HTTP-based, no browser). It follows same-domain links up to your configured page limit and extracts contact information from each page's HTML.
What this means in practice:
- ✅ Fast and lightweight — no headless browser overhead
- ✅ Respects per-site page limits to avoid excessive crawling
- ⚠️ JavaScript-rendered content (SPAs, React sites) is not visible to this crawler. If the contact info is loaded dynamically via JS, it won't be extracted.
- ⚠️ No built-in robots.txt enforcement — please check each site's robots.txt manually if compliance is important for your use case.
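The crawl strategy described above (follow same-domain links, stop at a page limit) can be sketched with the standard library alone. This is not the Actor's Crawlee code: `crawl_site` and `LinkCollector` are hypothetical names, and the `fetch` callable is injected so the example runs against in-memory pages instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl_site(start_url: str, fetch: Callable[[str], str], max_pages: int = 10) -> list[str]:
    """Breadth-first crawl restricted to start_url's host and to max_pages pages."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    queued = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        parser = LinkCollector()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in queued:
                queued.add(link)
                queue.append(link)
    return visited


# In-memory stand-in for real HTTP responses (hypothetical pages).
PAGES = {
    "https://example.co.jp/": '<a href="/company/">Company</a> <a href="https://other.example/">Ext</a>',
    "https://example.co.jp/company/": '<a href="/contact/#form">Contact</a> <a href="/">Home</a>',
    "https://example.co.jp/contact/": "<p>info@example.co.jp</p>",
}
visited = crawl_site("https://example.co.jp/", lambda u: PAGES.get(u, ""), max_pages=2)
print(visited)
```

Because only the raw HTML is parsed, links or contact details injected by JavaScript never reach the queue, which is the limitation noted above.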
Input
| Field | Type | Description |
|---|---|---|
| `urls` | Array (required) | Target website URLs |
| `maxPagesPerSite` | Integer | Max pages to crawl per site (default: 10) |
| `enableContactForm` | Boolean | Enable contact form URL detection (default: false) |
Example Input

    {
      "urls": [
        "https://example.co.jp",
        "https://another-company.co.jp"
      ],
      "maxPagesPerSite": 20,
      "enableContactForm": true
    }

Output
One result per URL in the dataset:

    {
      "url": "https://example.co.jp",
      "emails": ["info@example.co.jp", "sales@example.co.jp"],
      "phones": [
        {"number": "0312345678", "formatted": "03-1234-5678", "type": "固定電話"},
        {"number": "09012345678", "formatted": "090-1234-5678", "type": "携帯"}
      ],
      "socials": {
        "youtube": ["https://www.youtube.com/@example"],
        "facebook": ["https://www.facebook.com/example.japan"]
      },
      "contact_url": {
        "url": "https://example.co.jp/contact/",
        "score": 95,
        "has_form": true,
        "error": null
      },
      "error": null
    }

Note: The `type` field in phone results uses Japanese labels as returned by the `phonenumbers-jp` library: 固定電話 (landline), 携帯 (mobile), IP電話 (IP phone), and フリーダイヤル (toll-free).
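For CRM imports or spreadsheets, a nested result like the one above usually needs flattening. The sketch below shows one way to do it. `flatten_result` and the column names are assumptions made for this example, and the sample item is a trimmed copy of the output shown above.

```python
import csv
import io


def flatten_result(item: dict) -> dict:
    """Turn one dataset item into a flat row suitable for CSV export."""
    contact = item.get("contact_url") or {}
    return {
        "url": item["url"],
        "emails": ";".join(item.get("emails") or []),
        "phones": ";".join(p["formatted"] for p in item.get("phones") or []),
        "contact_url": contact.get("url") or "",
    }


item = {
    "url": "https://example.co.jp",
    "emails": ["info@example.co.jp", "sales@example.co.jp"],
    "phones": [
        {"number": "0312345678", "formatted": "03-1234-5678", "type": "固定電話"},
        {"number": "09012345678", "formatted": "090-1234-5678", "type": "携帯"},
    ],
    "contact_url": {"url": "https://example.co.jp/contact/", "score": 95, "has_form": True, "error": None},
    "error": None,
}

row = flatten_result(item)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```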
Supported Japanese Phone Formats
| Type | Example |
|---|---|
| Landline / 固定電話 | 03-1234-5678, 06-1234-5678 |
| Mobile / 携帯電話 | 090-1234-5678, 080-1234-5678, 070-1234-5678 |
| IP Phone / IP電話 | 050-1234-5678 |
| Toll-free / フリーダイヤル | 0120-123-456, 0800-123-4567 |
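The prefix rules in the table can be approximated with a few lines of Python. This is a simplified sketch and does not use or reproduce the `phonenumbers-jp` library: `classify_jp_phone` is a hypothetical helper, and the landline rule (any other 10-digit number starting with 0) is an assumption that ignores special services such as 0570 numbers.

```python
import re
import unicodedata
from typing import Optional


def classify_jp_phone(raw: str) -> Optional[str]:
    """Return the Japanese type label for a phone number, or None if unrecognized."""
    # NFKC folds full-width digits (common on Japanese sites) into ASCII digits.
    digits = re.sub(r"[^0-9]", "", unicodedata.normalize("NFKC", raw))
    # Toll-free prefixes are checked first so that 0800 is not mistaken for mobile 080.
    if digits.startswith("0120") and len(digits) == 10:
        return "フリーダイヤル"
    if digits.startswith("0800") and len(digits) == 11:
        return "フリーダイヤル"
    if digits[:3] in ("070", "080", "090") and len(digits) == 11:
        return "携帯"
    if digits.startswith("050") and len(digits) == 11:
        return "IP電話"
    # Assumption: remaining 10-digit numbers with a leading 0 are landlines.
    if len(digits) == 10 and digits[0] == "0" and digits[1] != "0":
        return "固定電話"
    return None


for sample in ["03-1234-5678", "090-1234-5678", "050-1234-5678", "0120-123-456", "0800-123-4567"]:
    print(sample, classify_jp_phone(sample))
```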
Contact Form Detection — How Scoring Works
When enableContactForm is enabled, each crawled page is scored across 4 dimensions:
| Signal | Example match | Points |
|---|---|---|
| URL path contains toiawase / contact | /otoiawase/, /contact/ | +15 to +25 |
| Link text matches Japanese contact terms | 「お問い合わせ」「ご相談」 | +25 to +45 |
| Page title contains contact keywords | `<title>お問い合わせ</title>` | +30 |
| Page has a real `<form>` with inputs | `<form>` + `<input>` + submit button | +35 |
The highest-scoring page is returned as the contact form URL. Pages matching negative patterns (/blog, /faq, /product, etc.) receive a -15 penalty.
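The scoring idea can be sketched as follows. The exact keyword lists and how points are distributed within each range are not documented here, so the constants below are assumptions picked from inside the stated ranges. `PageSignals` and `score_page` are hypothetical names, and the resulting totals are illustrative and will not match the Actor's reported `score`.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    path: str        # URL path of the crawled page
    link_text: str   # text of the link that led to the page
    title: str       # contents of the <title> element
    has_form: bool   # True if the page has a <form> with inputs and a submit button


# Assumed point values, chosen within the documented ranges.
URL_POINTS = 25
LINK_TEXT_POINTS = {"お問い合わせ": 45, "ご相談": 25}
TITLE_POINTS = 30
FORM_POINTS = 35
NEGATIVE_PENALTY = 15
NEGATIVE_PATTERNS = ("/blog", "/faq", "/product")


def score_page(page: PageSignals) -> int:
    """Sum the four positive signals and subtract the penalty for negative paths."""
    path = page.path.lower()
    score = 0
    if "toiawase" in path or "contact" in path:
        score += URL_POINTS
    # Only the strongest matching link-text term counts.
    score += max((pts for term, pts in LINK_TEXT_POINTS.items() if term in page.link_text), default=0)
    if "お問い合わせ" in page.title:
        score += TITLE_POINTS
    if page.has_form:
        score += FORM_POINTS
    if any(pattern in path for pattern in NEGATIVE_PATTERNS):
        score -= NEGATIVE_PENALTY
    return score


contact = PageSignals("/contact/", "お問い合わせ", "お問い合わせ | Example", True)
blog = PageSignals("/blog/contact-tips/", "ご相談", "Blog", False)
best = max([blog, contact], key=score_page)
print(best.path, score_page(best))
```

Combining several weak signals this way lets a page with a real form outrank a blog post that merely mentions "contact" in its URL.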
Use Cases
- B2B lead generation: build targeted outreach lists of Japanese companies
- Sales prospecting: collect contact details for cold outreach campaigns
- Partner / supplier research: bulk-collect contact info for potential business partners
- Market research & competitive analysis: gather structured contact data across an industry
- CRM enrichment: import verified Japanese contact info into your CRM
Limitations
- No JavaScript rendering — this crawler uses HTTP requests + BeautifulSoup, not a headless browser. Content rendered by JavaScript (React, Vue, Angular SPAs) will not be scraped.
- No LINE or X (Twitter) detection — these platforms are not currently supported for social profile extraction.
- Anti-scraping protections — sites with CAPTCHAs, Cloudflare, or aggressive rate limiting may return incomplete results.
- No timeout configuration — crawl timeouts use Crawlee's defaults.
Legal Considerations
This tool only extracts publicly available information from website HTML. It does not bypass authentication, access restricted pages, or collect non-public data.
Users are responsible for complying with:
- Each website's terms of service and robots.txt
- Japan's Act on the Protection of Personal Information (個人情報保護法)
- Japan's Act on Prohibition of Unauthorized Computer Access (不正アクセス禁止法)
- All other applicable laws in your jurisdiction