Instagram Lead Extractor

Pricing

from $3.00 / 1,000 results

Discover Instagram profiles from usernames, hashtags, locations, search queries, datasets, or CSV — and extract emails, phones, and social handles from their bios.

Rating

0.0 (0)

Developer

Mukesh Kumar (maintained by Community)

Actor stats

Bookmarked: 0
Total users: 10
Monthly active users: 8
Last modified: 12 days ago

Instagram Bio & Email Extractor

An Apify actor that discovers Instagram profiles from multiple sources and extracts contact info — especially emails — from their bios.

What it does

  • Discovers Instagram profiles from any combination of: direct usernames/URLs, hashtags, locations, search queries, an upstream Apify dataset, or a CSV / Google Sheets URL.
  • Fetches each profile via Instagram's web_profile_info JSON endpoint (the cheapest and most stable public path).
  • Extracts username, full name, bio, follower/following/post counts, external URL, business category, verification status, and whether the account is private.
  • Parses the bio (and optionally the link-in-bio page) for emails, phone numbers (E.164), and social handles for TikTok, YouTube, X, LinkedIn, WhatsApp, Telegram, Threads, Facebook, and Pinterest.
  • Emits one row per profile into the default Apify dataset.

What it does NOT do

  • Does not scrape private profiles' posts, stories, DMs, or follower lists.
  • Does not log in by default. A session cookie is opt-in and required for auth-only modes (see below).
  • Does not bypass Instagram's auth walls for protected content.

Discovery modes

| Mode | Input field | Requires session cookie | Notes |
|---|---|---|---|
| Direct usernames | usernames | No | Most reliable. Accepts @handle, username, or full profile URL. |
| Upstream Apify dataset | datasetId + datasetUsernameField | No | Chain from another actor. |
| CSV / Google Sheets | csvUrl + csvUsernameColumn | No | Public CSV URL or docs.google.com/spreadsheets/... (auto-converts to export?format=csv). |
| Search | searchQueries | No | Top accounts per query. Volume small. |
| Hashtags | hashtags | No | Discovers post authors. Cap with maxProfilesPerHashtag. |
| Locations | locations | No | Full IG location URL. Cap with maxProfilesPerLocation. |
| Followers of target | followersOf | Yes | Auth-only: IG does not serve follower lists unauthenticated. Not implemented in v1. |
| Following of target | followingOf | Yes | Same as above. Not implemented in v1. |
| Post engagers | postUrls | Mixed (likers auth-only) | Not implemented in v1. |
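
For the CSV / Google Sheets mode, the conversion to export?format=csv can be pictured as a small URL rewrite. The sketch below is an assumption about how such a rewrite might look, not the actor's actual code: toCsvExportUrl is a hypothetical helper, and carrying over the gid tab parameter is a guess rather than documented behaviour.

```typescript
// Hypothetical helper (not from the actor's source): rewrite a Google Sheets
// link into its CSV export form. Any other URL is returned unchanged and is
// assumed to already point at a public CSV file.
// Published-to-web links (/spreadsheets/d/e/...) are not handled here.
function toCsvExportUrl(inputUrl: string): string {
  const sheet = inputUrl.match(
    /^https?:\/\/docs\.google\.com\/spreadsheets\/d\/([A-Za-z0-9_-]+)/
  );
  const sheetId = sheet?.[1];
  if (!sheetId) return inputUrl;

  // Assumption: keep the selected tab when the link carries "gid=<digits>".
  const gid = inputUrl.match(/[#?&]gid=(\d+)/)?.[1];
  const base = `https://docs.google.com/spreadsheets/d/${sheetId}/export?format=csv`;
  return gid ? `${base}&gid=${gid}` : base;
}
```

With this sketch, an edit link ending in /edit#gid=42 becomes an export link ending in /export?format=csv&gid=42, while a plain https://.../leads.csv URL passes through untouched.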

Discovery modes can be combined freely — results are merged and deduplicated by username (first-seen wins for discoveredVia / sourceRef).
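
The merge rule can be illustrated with a short sketch. It assumes that usernames are normalised (strip @, pull the handle out of a profile URL, lowercase) before comparison; normalizeUsername, mergeFirstSeen, and the Discovered type are hypothetical names, and the discoveredVia values in the usage note are illustrative apart from "hashtag", which appears in the sample output.

```typescript
// Hypothetical shape of a discovered candidate before the profile is fetched.
type Discovered = { username: string; discoveredVia: string; sourceRef: string };

// Accepts "@handle", "handle", or a full profile URL. Returns a lowercase
// username, or null when the input does not look like an Instagram handle
// (letters, digits, "." and "_", at most 30 characters).
function normalizeUsername(raw: string): string | null {
  let value = raw.trim();
  const fromUrl = value.match(/^https?:\/\/(?:www\.)?instagram\.com\/([^/?#]+)/i)?.[1];
  if (fromUrl) value = fromUrl;
  value = value.replace(/^@/, "").toLowerCase();
  return /^[a-z0-9._]{1,30}$/.test(value) ? value : null;
}

// First-seen wins: a later candidate with the same username is dropped, so the
// discoveredVia / sourceRef of the earliest candidate are the ones kept.
function mergeFirstSeen(candidates: Discovered[]): Discovered[] {
  const byUsername = new Map<string, Discovered>();
  for (const candidate of candidates) {
    const username = normalizeUsername(candidate.username);
    if (username === null || byUsername.has(username)) continue;
    byUsername.set(username, { ...candidate, username });
  }
  return Array.from(byUsername.values());
}
```

For example, if "@NatGeo" arrives first from a hashtag and "https://www.instagram.com/natgeo/" arrives later from the direct list, this sketch keeps a single natgeo entry with discoveredVia set to "hashtag".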

Output

One row per profile. Always emitted with the full shape — empty arrays or nulls for missing fields.

{
  "username": "natgeo",
  "fullName": "National Geographic",
  "profileUrl": "https://www.instagram.com/natgeo/",
  "biography": "Experience the world...",
  "externalUrl": "https://natgeo.com",
  "isVerified": true,
  "isPrivate": false,
  "isBusinessAccount": true,
  "businessCategory": "Media",
  "followersCount": 281000000,
  "followingCount": 132,
  "postsCount": 28500,
  "profilePicUrl": "https://...",
  "emails": ["press@natgeo.com"],
  "phones": ["+12025550100"],
  "socialHandles": {
    "tiktok": ["natgeo"], "youtube": ["natgeo"], "x": [],
    "linkedin": [], "whatsapp": [], "telegram": [],
    "threads": [], "facebook": [], "pinterest": []
  },
  "scrapedAt": "2026-05-23T12:34:56.000Z",
  "discoveredVia": "hashtag",
  "sourceRef": "veganbakeryberlin",
  "contactSource": "profile_page",
  "status": "ok"
}

status is one of ok | private | not_found | deactivated | rate_limited | error.
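
Downstream consumers may find it convenient to type the row. The sketch below mirrors the keys of the sample row and the status values listed above. Which fields are nullable is an assumption (scalars null, lists empty), and emptyProfileRow is a hypothetical builder showing what a "full shape" row for a failed lookup might look like.

```typescript
type ProfileStatus =
  | "ok" | "private" | "not_found" | "deactivated" | "rate_limited" | "error";

// Platforms that appear under socialHandles in the sample row.
const SOCIAL_PLATFORMS = [
  "tiktok", "youtube", "x", "linkedin", "whatsapp",
  "telegram", "threads", "facebook", "pinterest",
] as const;
type SocialPlatform = (typeof SOCIAL_PLATFORMS)[number];

// Nullability below is assumed, not confirmed by the README.
interface ProfileRow {
  username: string;
  fullName: string | null;
  profileUrl: string;
  biography: string | null;
  externalUrl: string | null;
  isVerified: boolean | null;
  isPrivate: boolean | null;
  isBusinessAccount: boolean | null;
  businessCategory: string | null;
  followersCount: number | null;
  followingCount: number | null;
  postsCount: number | null;
  profilePicUrl: string | null;
  emails: string[];
  phones: string[];
  socialHandles: Record<SocialPlatform, string[]>;
  scrapedAt: string;
  discoveredVia: string | null;
  sourceRef: string | null;
  contactSource: string | null;
  status: ProfileStatus;
}

// Hypothetical builder: every key is present, scalars are null, lists are empty.
function emptyProfileRow(username: string, status: ProfileStatus, scrapedAt: string): ProfileRow {
  const socialHandles = {} as Record<SocialPlatform, string[]>;
  for (const platform of SOCIAL_PLATFORMS) socialHandles[platform] = [];
  return {
    username,
    fullName: null,
    profileUrl: `https://www.instagram.com/${username}/`,
    biography: null,
    externalUrl: null,
    isVerified: null,
    isPrivate: null,
    isBusinessAccount: null,
    businessCategory: null,
    followersCount: null,
    followingCount: null,
    postsCount: null,
    profilePicUrl: null,
    emails: [],
    phones: [],
    socialHandles,
    scrapedAt,
    discoveredVia: null,
    sourceRef: null,
    contactSource: null,
    status,
  };
}
```

Because every key is always present, a consumer can filter on status and read emails or socialHandles without checking for missing properties first.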

Email extraction

The actor handles common bio obfuscations before matching:

  • name [at] domain [dot] com → name@domain.com
  • name (at) domain (dot) com → name@domain.com
  • name AT domain DOT com → name@domain.com
  • name@@domain → name@domain
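
The four rewrites above can be approximated with a handful of regular expressions. This is a minimal sketch under stated assumptions, not the actor's implementation: deobfuscateBio and findEmails are hypothetical names, the bare-word rule only fires on uppercase AT / DOT so that ordinary words like "at" stay untouched, and the email pattern is deliberately simple.

```typescript
// Hypothetical normaliser: undo common "at"/"dot" obfuscations in a bio.
function deobfuscateBio(bio: string): string {
  return bio
    .replace(/\s*[\[(]\s*at\s*[\])]\s*/gi, "@")   // "[at]" or "(at)"
    .replace(/\s*[\[(]\s*dot\s*[\])]\s*/gi, ".")  // "[dot]" or "(dot)"
    .replace(/(\S)\s+AT\s+(\S)/g, "$1@$2")        // bare uppercase " AT "
    .replace(/(\S)\s+DOT\s+(\S)/g, "$1.$2")       // bare uppercase " DOT "
    .replace(/@{2,}/g, "@");                      // "@@" collapses to "@"
}

// Simplified matcher (assumption): lowercases and de-duplicates the hits.
function findEmails(bio: string): string[] {
  const hits = deobfuscateBio(bio).match(/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi) || [];
  return Array.from(new Set(hits.map((hit) => hit.toLowerCase())));
}
```

One caveat of this sketch: an all-caps sentence such as "MEET AT NOON" is also rewritten by the bare-word rule. It produces no match here because the result has no dot-separated domain, but a production parser would likely want a stricter guard.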

Matched emails are then filtered to drop:

  • Image / asset extensions (.png, .jpg, .webp, …)
  • Known tracking domains (sentry.io, wixpress.com, example.com, …)
  • noreply@ / no-reply@ / donotreply@
  • Purely numeric local-parts longer than 8 characters (tracking IDs)
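
The drop rules above translate naturally into a predicate. The sketch below is illustrative only: isPlausibleContactEmail is a hypothetical name, the lists contain only the entries the README names (its "…" is left unfilled), and treating subdomains of a blocked domain as blocked is an assumption.

```typescript
// Only the entries named in the README; the real lists are longer.
const ASSET_EXTENSIONS = [".png", ".jpg", ".webp"];
const BLOCKED_DOMAINS = ["sentry.io", "wixpress.com", "example.com"];
const NOREPLY_LOCALS = ["noreply", "no-reply", "donotreply"];

// Returns false for matches that are unlikely to be a real contact address.
function isPlausibleContactEmail(email: string): boolean {
  const lower = email.toLowerCase();
  const at = lower.lastIndexOf("@");
  if (at <= 0) return false;
  const local = lower.slice(0, at);
  const domain = lower.slice(at + 1);

  // "logo@2x.png" style asset names that happen to look like emails.
  if (ASSET_EXTENSIONS.some((ext) => lower.endsWith(ext))) return false;
  // Assumption: subdomains of a blocked domain are blocked too.
  if (BLOCKED_DOMAINS.some((d) => domain === d || domain.endsWith("." + d))) return false;
  if (NOREPLY_LOCALS.includes(local)) return false;
  // Purely numeric local-part longer than 8 characters, i.e. 9+ digits.
  if (/^\d{9,}$/.test(local)) return false;
  return true;
}
```

Applied as a filter, `["press@natgeo.com", "logo@2x.png", "noreply@brand.com"].filter(isPlausibleContactEmail)` keeps only the first address.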

Phone numbers are parsed via libphonenumber-js and output in E.164.

External URL scanning

When scrapeExternalUrl is true, the actor follows the profile's external URL (externalUrl in the output) and scans the response body for additional contacts. The request is SSRF-guarded:

  • Rejects private / loopback / link-local IPs (post-DNS).
  • Rejects non-http(s) schemes.
  • Caps response size at 2 MB.
  • 10s total timeout.
  • At most 3 redirects, re-validated per hop.
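
The per-hop checks can be sketched as pure functions. This is a simplified illustration, not the actor's guard: it covers IPv4 only, the function names are hypothetical, and DNS resolution, the size cap, and the timeout are assumed to be handled by the caller. The one constant used comes from the list above.

```typescript
const MAX_REDIRECTS = 3; // from the list above

// True for private, loopback, and link-local IPv4 addresses.
// Anything that is not a valid dotted quad is rejected (fail closed).
function isBlockedIPv4(ip: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return true;
  const octets = m.slice(1).map(Number);
  if (octets.some((n) => n > 255)) return true;
  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;
  return (
    a === 10 ||                          // 10.0.0.0/8 (private)
    a === 127 ||                         // 127.0.0.0/8 (loopback)
    (a === 169 && b === 254) ||          // 169.254.0.0/16 (link-local)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 (private)
    (a === 192 && b === 168)             // 192.168.0.0/16 (private)
  );
}

function hasHttpScheme(url: string): boolean {
  return /^https?:\/\//i.test(url);
}

// hopIndex 0 is the original request; 1..MAX_REDIRECTS are redirects.
// resolvedIp is the address the hostname resolved to (checked post-DNS).
function isSafeHop(url: string, resolvedIp: string, hopIndex: number): boolean {
  return hopIndex <= MAX_REDIRECTS && hasHttpScheme(url) && !isBlockedIPv4(resolvedIp);
}
```

Checking the resolved address rather than the hostname is what makes the guard "post-DNS": a public-looking hostname that resolves to 169.254.169.254 is still rejected, and the same check runs again on every redirect target.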

Anti-detection

  • Residential proxy required for production scale. Datacenter IPs get challenged within a few requests.
  • Session pool with rotation — sessions are retired after rate-limit / login-wall responses.
  • Randomised delays between profiles (minDelayMs / maxDelayMs, default 1500–4500 ms).
  • Login wall detection by both URL (/accounts/login/) and body markers (LoginAndSignupPage, etc.).
  • Cheerio-based crawler by default — no headless browser, much cheaper.
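
Two of the points above, the randomised delay and the login-wall check, are easy to picture as small helpers. The following is a hedged sketch rather than the actor's code: randomDelayMs and isLoginWall are hypothetical names, the injectable rng parameter is added here only to make the function testable, and the marker list contains only the one marker the README names.

```typescript
// Picks an integer delay in [minDelayMs, maxDelayMs], inclusive.
// Defaults follow the documented 1500-4500 ms range.
function randomDelayMs(
  minDelayMs: number = 1500,
  maxDelayMs: number = 4500,
  rng: () => number = Math.random
): number {
  const lo = Math.min(minDelayMs, maxDelayMs);
  const hi = Math.max(minDelayMs, maxDelayMs);
  return Math.floor(lo + rng() * (hi - lo + 1));
}

// The README names this marker and then says "etc."; others are not listed.
const LOGIN_BODY_MARKERS = ["LoginAndSignupPage"];

// A response counts as a login wall if either the final URL or the body says so.
function isLoginWall(finalUrl: string, body: string): boolean {
  if (finalUrl.includes("/accounts/login/")) return true;
  return LOGIN_BODY_MARKERS.some((marker) => body.includes(marker));
}
```

Checking both signals matters because a login wall can arrive as a redirect to /accounts/login/ or as a 200 response whose HTML is the login page. In either case the session that received it would be a candidate for retirement.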

Local development

npm install
# place a test input
mkdir -p storage/key_value_stores/default
echo '{"usernames":["natgeo"],"maxProfiles":1}' > storage/key_value_stores/default/INPUT.json
# run
npm start
# test
npm test

Local runs use no proxy by default and will hit IG's login wall after a few requests. That's expected: local runs are for checking logic, so do scale testing on the Apify platform with a residential proxy.

  • This actor scrapes only public profiles. Private profiles return only basic metadata (no bio).
  • Profile pictures are stored as URLs only — no binaries.
  • You are responsible for compliance with GDPR, CCPA, CAN-SPAM, and local data-protection laws when using extracted contact info for outreach.
  • Do not use this actor for harassment, stalking, or targeted abuse.