Instagram Lead Extractor
Pricing
from $3.00 / 1,000 results
Discover Instagram profiles from usernames, hashtags, locations, search queries, datasets, or CSV — and extract emails, phones, and social handles from their bios.
Rating: 0.0 (0)
Developer: Mukesh Kumar
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 10
Monthly active users: 8
Last modified: 12 days ago
Instagram Bio & Email Extractor
An Apify actor that discovers Instagram profiles from multiple sources and extracts contact info — especially emails — from their bios.
What it does
- Discovers Instagram profiles from any combination of: direct usernames/URLs, hashtags, locations, search queries, an upstream Apify dataset, or a CSV / Google Sheets URL.
- Fetches each profile via Instagram's web_profile_info JSON endpoint (the cheapest and most stable public path).
- Extracts username, full name, bio, follower/following/post counts, external URL, business category, verification status, and is_private.
- Parses the bio (and optionally the link-in-bio page) for emails, phone numbers (E.164), and social handles for TikTok, YouTube, X, LinkedIn, WhatsApp, Telegram, Threads, Facebook, and Pinterest.
- Emits one row per profile into the default Apify dataset.
What it does NOT do
- Does not scrape private profiles' posts, stories, DMs, or follower lists.
- Does not log in by default. A session cookie is opt-in and required for auth-only modes (see below).
- Does not bypass Instagram's auth walls for protected content.
Discovery modes
| Mode | Input field | Requires session cookie | Notes |
|---|---|---|---|
| Direct usernames | usernames | No | Most reliable. Accepts @handle, username, or full profile URL. |
| Upstream Apify dataset | datasetId + datasetUsernameField | No | Chain from another actor. |
| CSV / Google Sheets | csvUrl + csvUsernameColumn | No | Public CSV URL or docs.google.com/spreadsheets/... (auto-converts to export?format=csv). |
| Search | searchQueries | No | Returns the top accounts per query; expect low volume. |
| Hashtags | hashtags | No | Discovers post authors. Cap with maxProfilesPerHashtag. |
| Locations | locations | No | Full IG location URL. Cap with maxProfilesPerLocation. |
| Followers of target | followersOf | Yes | Auth-only — IG does not serve follower lists unauthenticated. Not implemented in v1. |
| Following of target | followingOf | Yes | Same as above. Not implemented in v1. |
| Post engagers | postUrls | Mixed (likers auth-only) | Not implemented in v1. |
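The Google Sheets handling in the CSV row can be illustrated with a minimal sketch. The function name toCsvUrl is hypothetical and the actor's real conversion logic is not shown in this README; the sketch only assumes the standard docs.google.com/spreadsheets/d/<id>/export?format=csv export path and keeps an optional gid (sheet tab) when one is present.

```typescript
// Hypothetical helper: rewrites a Google Sheets link to its CSV export URL.
// Any other URL is assumed to already point at a public CSV and is returned as-is.
function toCsvUrl(csvUrl: string): string {
  const sheet = csvUrl.match(
    /^https:\/\/docs\.google\.com\/spreadsheets\/d\/([A-Za-z0-9_-]+)/,
  );
  if (!sheet) return csvUrl;

  // Preserve the tab selector if the link carries one (e.g. ".../edit#gid=42").
  const gid = csvUrl.match(/[#?&]gid=(\d+)/);
  const base = `https://docs.google.com/spreadsheets/d/${sheet[1]}/export?format=csv`;
  return gid ? `${base}&gid=${gid[1]}` : base;
}
```

The sheet still has to be publicly readable for the export URL to return data without authentication.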
Discovery modes can be combined freely — results are merged and deduplicated by username (first-seen wins for discoveredVia / sourceRef).
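To make the merge rule concrete, here is a rough sketch of username normalisation plus first-seen-wins deduplication. The Discovered type, the helper names, and the reserved-path list are assumptions for illustration, not the actor's actual code; the username pattern (letters, digits, dots, underscores, up to 30 characters) reflects Instagram's commonly documented rules.

```typescript
// Hypothetical record for a discovered profile (field names follow the output schema).
interface Discovered {
  username: string;
  discoveredVia: string;
  sourceRef: string;
}

// Assumed list of first path segments that are not profile names.
const RESERVED_PATHS = new Set(["p", "reel", "explore", "accounts"]);

// Accepts "@handle", "handle", or a full profile URL; returns a lowercase
// username, or null when the input does not look like a profile reference.
function normalizeUsername(raw: string): string | null {
  let value = raw.trim();
  const url = value.match(/^https?:\/\/(?:www\.)?instagram\.com\/([^/?#]+)/i);
  if (url) value = url[1];
  value = value.replace(/^@/, "").toLowerCase();
  if (RESERVED_PATHS.has(value)) return null;
  return /^[a-z0-9._]{1,30}$/.test(value) ? value : null;
}

// Merges results from several discovery modes. The first entry seen for a
// username keeps its discoveredVia / sourceRef; later duplicates are dropped.
function mergeDiscovered(items: Discovered[]): Discovered[] {
  const seen = new Map<string, Discovered>();
  for (const item of items) {
    const username = normalizeUsername(item.username);
    if (username === null || seen.has(username)) continue;
    seen.set(username, { ...item, username });
  }
  return Array.from(seen.values());
}
```

Because a Map preserves insertion order, the merged list keeps the order in which profiles were first discovered.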
Output
One row per profile. Every row is emitted with the full shape; missing fields are empty arrays or nulls.
    {
      "username": "natgeo",
      "fullName": "National Geographic",
      "profileUrl": "https://www.instagram.com/natgeo/",
      "biography": "Experience the world...",
      "externalUrl": "https://natgeo.com",
      "isVerified": true,
      "isPrivate": false,
      "isBusinessAccount": true,
      "businessCategory": "Media",
      "followersCount": 281000000,
      "followingCount": 132,
      "postsCount": 28500,
      "profilePicUrl": "https://...",
      "emails": ["press@natgeo.com"],
      "phones": ["+12025550100"],
      "socialHandles": {
        "tiktok": ["natgeo"], "youtube": ["natgeo"], "x": [],
        "linkedin": [], "whatsapp": [], "telegram": [],
        "threads": [], "facebook": [], "pinterest": []
      },
      "scrapedAt": "2026-05-23T12:34:56.000Z",
      "discoveredVia": "hashtag",
      "sourceRef": "veganbakeryberlin",
      "contactSource": "profile_page",
      "status": "ok"
    }
status is one of ok | private | not_found | deactivated | rate_limited | error.
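For consumers of the dataset, the row shape and status values could be typed roughly as follows. This is a sketch derived from the sample row above: which scalar fields become null when missing is an assumption, and ProfileRow and emptyRow are hypothetical names, not exports of the actor.

```typescript
type Status = "ok" | "private" | "not_found" | "deactivated" | "rate_limited" | "error";

const PLATFORMS = [
  "tiktok", "youtube", "x", "linkedin", "whatsapp",
  "telegram", "threads", "facebook", "pinterest",
] as const;
type Platform = (typeof PLATFORMS)[number];

// Assumed typing: arrays are never null, scalar fields may be null.
interface ProfileRow {
  username: string;
  fullName: string | null;
  profileUrl: string;
  biography: string | null;
  externalUrl: string | null;
  isVerified: boolean | null;
  isPrivate: boolean | null;
  isBusinessAccount: boolean | null;
  businessCategory: string | null;
  followersCount: number | null;
  followingCount: number | null;
  postsCount: number | null;
  profilePicUrl: string | null;
  emails: string[];
  phones: string[];
  socialHandles: Record<Platform, string[]>;
  scrapedAt: string;
  discoveredVia: string | null;
  sourceRef: string | null;
  contactSource: string | null;
  status: Status;
}

// Builds a full-shape row for a profile that yielded no data (e.g. not_found).
function emptyRow(username: string, status: Status): ProfileRow {
  const socialHandles = {} as Record<Platform, string[]>;
  for (const platform of PLATFORMS) socialHandles[platform] = [];
  return {
    username, fullName: null,
    profileUrl: `https://www.instagram.com/${username}/`,
    biography: null, externalUrl: null,
    isVerified: null, isPrivate: null, isBusinessAccount: null,
    businessCategory: null,
    followersCount: null, followingCount: null, postsCount: null,
    profilePicUrl: null,
    emails: [], phones: [], socialHandles,
    scrapedAt: new Date().toISOString(),
    discoveredVia: null, sourceRef: null, contactSource: null,
    status,
  };
}
```

Keeping every key present, even for failed lookups, means downstream CSV exports and spreadsheet imports always see the same columns.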
Email extraction
The actor handles common bio obfuscations before matching:
- name [at] domain [dot] com → name@domain.com
- name (at) domain (dot) com → name@domain.com
- name AT domain DOT com → name@domain.com
- name@@domain → name@domain
Matched emails are then filtered to drop:
- Image / asset extensions (.png, .jpg, .webp, …)
- Known tracking domains (sentry.io, wixpress.com, example.com, …)
- noreply@ / no-reply@ / donotreply@
- Purely numeric local-parts longer than 8 characters (tracking IDs)
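The normalisation and filtering rules above can be approximated with a short sketch. The regular expressions and helper names are illustrative assumptions rather than the actor's real patterns, and the blocklists contain only the entries named in this README (the actual lists are longer).

```typescript
// Only the entries listed in the README; the real lists are longer.
const BLOCKED_DOMAINS = ["sentry.io", "wixpress.com", "example.com"];
const ASSET_EXTENSION = /\.(png|jpg|webp)$/i;
const NOREPLY_LOCAL = /^(noreply|no-reply|donotreply)$/i;
const EMAIL = /[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+/gi;

// Rewrites common bio obfuscations into plain email syntax.
function deobfuscate(text: string): string {
  return text
    .replace(/\s*[\[(]\s*at\s*[\])]\s*/gi, "@")  // [at], (at)
    .replace(/\s*[\[(]\s*dot\s*[\])]\s*/gi, ".") // [dot], (dot)
    .replace(/\s+AT\s+/g, "@")                   // uppercase-only, to spare prose "at"
    .replace(/\s+DOT\s+/g, ".")
    .replace(/@{2,}/g, "@");                     // name@@domain
}

// Returns unique, lowercased emails that survive the filters.
function extractEmails(bio: string): string[] {
  const kept = new Set<string>();
  for (const match of deobfuscate(bio).match(EMAIL) ?? []) {
    const email = match.toLowerCase();
    const at = email.lastIndexOf("@");
    const local = email.slice(0, at);
    const domain = email.slice(at + 1);
    if (ASSET_EXTENSION.test(domain)) continue; // e.g. logo@2x.png
    if (BLOCKED_DOMAINS.some((d) => domain === d || domain.endsWith("." + d))) continue;
    if (NOREPLY_LOCAL.test(local)) continue;
    if (/^\d{9,}$/.test(local)) continue;       // numeric and longer than 8 chars
    kept.add(email);
  }
  return Array.from(kept);
}
```

Matching AT and DOT only in uppercase is a deliberate choice in this sketch: it avoids rewriting ordinary sentences such as "find me at the market".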
Phone numbers are parsed via libphonenumber-js and output in E.164.
External URL scanning
When scrapeExternalUrl: true, the actor follows the profile's external_url and scans the response body for additional contacts. This is SSRF-guarded:
- Rejects private / loopback / link-local IPs (post-DNS).
- Rejects non-http(s) schemes.
- Caps response size at 2 MB.
- 10s total timeout.
- At most 3 redirects, re-validated per hop.
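The per-hop checks listed above might look like the following sketch. It is simplified on purpose: it covers IPv4 only, assumes the caller has already resolved the hostname and passes in the resulting address, and leaves the size cap and timeout to the HTTP client. All names and constants are hypothetical; only the limits (3 redirects, 2 MB, 10 s) come from this README.

```typescript
const MAX_REDIRECTS = 3;
const MAX_BYTES = 2 * 1024 * 1024; // 2 MB response cap, enforced by the caller
const TIMEOUT_MS = 10000;          // 10 s total budget, enforced by the caller

// True for IPv4 addresses in private, loopback, or link-local ranges.
// Anything that is not a well-formed dotted quad is rejected (fail closed).
function isBlockedIPv4(ip: string): boolean {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) return true;
  const parts = ip.split(".").map(Number);
  if (parts.some((n) => n > 255)) return true;
  const [a, b] = parts;
  return (
    a === 0 ||
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 169 && b === 254) ||          // link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

function isAllowedScheme(rawUrl: string): boolean {
  try {
    const { protocol } = new URL(rawUrl);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // unparseable URLs are rejected
  }
}

// Validates one hop. hopIndex 0 is the initial request, 1..3 are redirects.
// resolvedIp must be the address the hostname resolved to for this hop.
function validateHop(rawUrl: string, resolvedIp: string, hopIndex: number): boolean {
  return hopIndex <= MAX_REDIRECTS && isAllowedScheme(rawUrl) && !isBlockedIPv4(resolvedIp);
}
```

Checking the resolved address rather than the hostname is what makes the guard "post-DNS": a public-looking domain that resolves to 127.0.0.1 or 169.254.169.254 is still refused, and repeating the check on every redirect closes the gap where a safe first URL bounces to an internal one.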
Anti-detection
- Residential proxy required for production scale. Datacenter IPs get challenged within a few requests.
- Session pool with rotation — sessions are retired after rate-limit / login-wall responses.
- Randomised delays between profiles (minDelayMs / maxDelayMs, default 1500–4500 ms).
- Login wall detection by both URL (/accounts/login/) and body markers (LoginAndSignupPage, etc.).
- Cheerio-based crawler by default: no headless browser, so runs are much cheaper.
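Two of these measures are easy to illustrate: the randomised delay and the login-wall check. The sketch below is an assumption about how they could be written, not the actor's code. The function names are hypothetical, the marker list contains only the one marker named above, and the random source is injectable so the delay can be tested deterministically.

```typescript
// Picks a whole-millisecond delay in [minDelayMs, maxDelayMs], inclusive.
function randomDelayMs(
  minDelayMs = 1500,
  maxDelayMs = 4500,
  rand: () => number = Math.random,
): number {
  return Math.floor(minDelayMs + rand() * (maxDelayMs - minDelayMs + 1));
}

// Resolves after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Only the marker named in the README; the real list has more entries.
const LOGIN_BODY_MARKERS = ["LoginAndSignupPage"];

// Flags a response as a login wall if either the final URL (after redirects)
// or the response body gives it away.
function isLoginWall(finalUrl: string, body: string): boolean {
  if (finalUrl.includes("/accounts/login/")) return true;
  return LOGIN_BODY_MARKERS.some((marker) => body.includes(marker));
}
```

A crawl loop would typically call `await sleep(randomDelayMs())` between profiles and retire the current session whenever `isLoginWall` returns true.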
Local development
    npm install

    # place a test input
    mkdir -p storage/key_value_stores/default
    echo '{"usernames":["natgeo"],"maxProfiles":1}' > storage/key_value_stores/default/INPUT.json

    # run
    npm start

    # test
    npm test
Local runs use no proxy by default and will hit IG's login wall after a few requests. That's expected: use local runs to check logic, and scale-test on the Apify platform with a residential proxy.
Legal & ethical
- This actor scrapes only public profiles. Private profiles return only basic metadata (no bio).
- Profile pictures are stored as URLs only — no binaries.
- You are responsible for compliance with GDPR, CCPA, CAN-SPAM, and local data-protection laws when using extracted contact info for outreach.
- Do not use this actor for harassment, stalking, or targeted abuse.
