Social Profiles — Bio, Followers, Posts in CSV, Bulk

Pricing

Pay per usage

Social profile data CSV/JSON — username, bio, followers, following, posts. Same schema LinkedIn/GitHub/Reddit. 49 lifetime runs · 9 users · 5 active 30d · 100% success (44/44). B2B prospecting/ABM/recruiter sourcing. dev.to/0012303 · blog.spinov.online · t.me/scraping_ai

Rating

0.0

(0)

Developer

Alex

Maintained by Community

Actor stats

0

Bookmarked

9

Total users

5

Monthly active users

2 days ago

Last modified

Social Profile Scraper — Bio + Avatar + Public Meta Across 13 Platforms

Turn YOUR list of mixed social URLs (GitHub + LinkedIn + Twitter/X + YouTube + Reddit + Medium + Dev.to + Bluesky + Mastodon + Instagram + TikTok + Threads + Facebook) into one clean JSON table — display name, bio, avatar, follower count, username — without writing 13 different scrapers or hitting 13 different rate limits.

The actor reads each URL with a single Cheerio fetch and extracts Open Graph + Twitter Card + JSON-LD + page-title meta tags into a unified profile schema. Each URL gets its own dataset record; one bad URL does not kill the batch. Auto-detects platform from hostname.
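The meta-tag merge described above can be illustrated in Python. This is only a sketch: the actor itself runs on Cheerio in JavaScript, and the class name `MetaTagParser`, the function `extract_profile_meta`, and the fallback order (Open Graph, then Twitter Card, then page title or meta description) are assumptions made for illustration. JSON-LD parsing is left out to keep the example short.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta> property/name -> content pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                # Keep the first occurrence of each key.
                self.meta.setdefault(key.lower(), attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_profile_meta(html):
    """Merge OG, Twitter Card, and title tags into one dict (assumed fallback order)."""
    parser = MetaTagParser()
    parser.feed(html)
    m = parser.meta
    return {
        "displayName": m.get("og:title") or m.get("twitter:title") or parser.title.strip() or None,
        "bio": m.get("og:description") or m.get("twitter:description") or m.get("description"),
        "avatar": m.get("og:image") or m.get("twitter:image"),
        "siteName": m.get("og:site_name"),
    }


sample_html = (
    "<html><head><title>torvalds (Linus Torvalds) · GitHub</title>"
    '<meta property="og:description" content="Linux kernel developer">'
    '<meta property="og:site_name" content="GitHub">'
    "</head></html>"
)
profile = extract_profile_meta(sample_html)
```

Because every platform is read through the same tag names, one function covers all hostnames; fields a page does not expose simply come back as None.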

Who buys this actor

  • SDR / lead-gen teams enriching a list of prospect LinkedIn URLs with their GitHub/Dev.to/Medium activity — technical buyers reveal themselves through public dev presence.
  • VC / investor-research analysts pulling founder footprint across multiple platforms into one row per person for deal-review memos.
  • Talent / recruiting vetting candidate portfolios — GitHub follower count + Dev.to bio + LinkedIn headline in one scrape, instead of 3 separate tools.
  • Influencer-marketing teams comparing the same creator's public reach on Instagram vs TikTok vs YouTube vs Bluesky before signing a deal.
  • Brand / reputation teams monitoring the social graph of their executives — early signal when an exec is active on Bluesky but quiet on Twitter.
  • CRM-enrichment pipelines filling the "social handles" section of a HubSpot / Salesforce contact at import time.

Why this over the obvious alternatives

| Concern | How this actor handles it |
| --- | --- |
| "Clearbit / Apollo / ZoomInfo already do social enrichment." | They do, at $10K+/year seat licenses, with stale data and limited to business platforms (no Bluesky, no Mastodon, no Threads). This actor runs on-demand at Apify rates, including the newer platforms. |
| "Why not 13 different platform SDKs?" | Because each has different auth, rate limits, and OAuth review. This actor fetches the public unauth page once and reads the meta tags every platform exposes. |
| "Accuracy on platforms without public APIs (LinkedIn, Instagram)?" | We do not bypass login walls or scrape authenticated pages. For LinkedIn / Instagram we read whatever Open Graph + JSON-LD the public unauth response exposes. You get display name + bio + avatar, not private fields. Full-detail LinkedIn requires Sales Navigator API. |
| "Follower counts come back as strings ('210K')?" | Yes, that's what most platforms render in their Open Graph / page text. Convert to int yourself with the K/M/B suffix parser; the raw string preserves what the platform actually shows. |
| "Deleted / private / 404 profiles: does one bad URL kill the batch?" | No. Crawlee/Cheerio handles each URL independently; failed fetches are logged and the next URL continues. |
| "Anti-bot on GitHub / LinkedIn / Instagram?" | The actor sends a rotating desktop User-Agent (3 UAs) and runs at maxConcurrency: 3. For LinkedIn / Instagram / Facebook you will often get a login-wall response instead of the real profile; that's expected, and the OG/Twitter Card meta tags returned by the login page usually still yield displayName + bio. Note: this version does NOT wire up Actor.createProxyConfiguration(), so even if you have Apify residential proxy on your account, the CheerioCrawler in this actor will not use it. For proxy-routed requests against LinkedIn, request a custom build (see Custom scraping). |

Input

{
  "profileUrls": [
    "https://github.com/torvalds",
    "https://dev.to/ben",
    "https://bsky.app/profile/jay.bsky.team",
    "https://www.linkedin.com/in/patrick-collison/",
    "https://www.youtube.com/@veritasium",
    "https://twitter.com/paulg"
  ]
}
  • profileUrls (array, required) — list of full URLs, any mix of supported platforms.

Platforms auto-detected from hostname: GitHub, Twitter/X, LinkedIn, Instagram, Facebook, YouTube, TikTok, Reddit, Medium, Dev.to, Threads, Bluesky, Mastodon (incl. Fosstodon). Unknown hostnames get platform: "unknown" and still attempt OG-meta extraction.
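Hostname-based detection of this kind can be sketched as follows. The table `PLATFORM_HOSTS`, the function name `detect_platform`, and every label except "github" and "unknown" (the two that appear on this page) are assumptions; the actor's real mapping lives in its source and may differ. The Mastodon branch mirrors the substring rule listed under Honest disclosure.

```python
from urllib.parse import urlparse

# Hypothetical hostname -> platform table; labels other than "github" are guesses.
PLATFORM_HOSTS = {
    "github.com": "github",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "linkedin.com": "linkedin",
    "instagram.com": "instagram",
    "facebook.com": "facebook",
    "youtube.com": "youtube",
    "tiktok.com": "tiktok",
    "reddit.com": "reddit",
    "medium.com": "medium",
    "dev.to": "devto",
    "threads.net": "threads",
    "bsky.app": "bluesky",
}


def detect_platform(url):
    """Return a platform label for a profile URL, or "unknown"."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    for domain, platform in PLATFORM_HOSTS.items():
        # Match the bare domain or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return platform
    # Substring rule for Mastodon, as described under Honest disclosure.
    if "mastodon" in host or "fosstodon" in host:
        return "mastodon"
    return "unknown"
```

Matching on the parsed hostname rather than the raw URL string avoids false hits when a platform name appears in a path or query parameter.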

Output schema (per URL)

The actor pushes one record per URL with these fields (real, verified against src/main.js):

{
  "url": "https://github.com/torvalds",
  "platform": "github",
  "username": "torvalds",
  "displayName": "Linus Torvalds · GitHub",
  "bio": "Linux kernel developer",
  "avatar": "https://avatars.githubusercontent.com/u/1024025",
  "followers": "210K",
  "following": "0",
  "postsCount": "35",
  "siteName": "GitHub",
  "twitterHandle": null,
  "website": null,
  "meta": {
    "ogTitle": "torvalds (Linus Torvalds) · GitHub",
    "ogDescription": "Linux kernel developer",
    "twitterTitle": "",
    "twitterDescription": "",
    "pageTitle": "torvalds (Linus Torvalds) · GitHub",
    "metaDescription": "Linus Torvalds has 35 repositories available..."
  },
  "scrapedAt": "2026-04-29T12:30:00.000Z"
}

Top-level fields, always present: url, platform, username, displayName, bio, avatar, followers, following, postsCount, siteName, scrapedAt. The website and twitterHandle fields carry a value only when one is extracted and are null otherwise, as in the example above. The nested meta object is always present with 6 fields: ogTitle, ogDescription, twitterTitle, twitterDescription, pageTitle, metaDescription (truncated to 300 chars). Missing values are null rather than dropped, which keeps the DataFrame schema stable.

followers / following / postsCount are strings — the actor extracts the human-readable count from page text or meta tags and does not parse '210K' into 210000 for you. Convert downstream with a simple suffix parser if needed.
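Since the schema is fixed, flattening records into CSV takes only the standard library. The sketch below assumes the field lists given above; the function name `records_to_csv` and the `meta.` column prefix are illustrative choices, not something the actor produces itself.

```python
import csv
import io

TOP_LEVEL = [
    "url", "platform", "username", "displayName", "bio", "avatar",
    "followers", "following", "postsCount", "siteName",
    "twitterHandle", "website", "scrapedAt",
]
META = [
    "ogTitle", "ogDescription", "twitterTitle",
    "twitterDescription", "pageTitle", "metaDescription",
]


def records_to_csv(records):
    """Flatten dataset records into CSV text with one fixed column set."""
    columns = TOP_LEVEL + [f"meta.{key}" for key in META]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    for rec in records:
        meta = rec.get("meta") or {}
        row = {key: rec.get(key) for key in TOP_LEVEL}
        row.update({f"meta.{key}": meta.get(key) for key in META})
        # Write nulls as empty cells so every row has the same width.
        writer.writerow({k: "" if v is None else v for k, v in row.items()})
    return buf.getvalue()


csv_text = records_to_csv([
    {"url": "https://github.com/torvalds", "platform": "github",
     "followers": "210K", "meta": {"ogTitle": "torvalds (Linus Torvalds) · GitHub"}},
])
```

Follower counts stay as the raw strings here; converting them to integers is a separate downstream step.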

Python copy-paste — enrich a LinkedIn export with GitHub signal

Given LinkedIn URLs from a Sales-Nav export, scrape + cross-reference with GitHub handles guessed from display names:

import re

from apify_client import ApifyClient

ACTOR_ID = "knotless_cadence/social-profile-scraper"
client = ApifyClient("<YOUR_APIFY_TOKEN>")

# One LinkedIn profile URL per line; blank lines are skipped.
with open("prospects_linkedin.txt") as f:
    linkedin_urls = [line.strip() for line in f if line.strip()]

def slugify(name: str) -> str:
    # Heuristic: drop a trailing site suffix such as " | LinkedIn", then hyphenate.
    name = re.split(r"\s[|·-]\s", name)[0]
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def parse_followers(s) -> int:
    # Turn "210K" / "1.2M" / "1,024" into an int; 0 when nothing parses.
    if not s:
        return 0
    m = re.match(r"(\d+(?:\.\d+)?)\s*([KMB]?)", s.replace(",", "").strip().upper())
    if not m:
        return 0
    scale = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}.get(m.group(2), 1)
    return round(float(m.group(1)) * scale)

# Pass 1: scrape the LinkedIn profiles.
run = client.actor(ACTOR_ID).call(run_input={"profileUrls": linkedin_urls})
li = {r["url"]: r for r in client.dataset(run["defaultDatasetId"]).iterate_items()}

# Pass 2: guess a GitHub URL from each display name and scrape the guesses.
gh_guesses = [
    f"https://github.com/{slugify(r['displayName'])}"
    for r in li.values() if r.get("displayName")
]
run2 = client.actor(ACTOR_ID).call(run_input={"profileUrls": gh_guesses})

# Keep guesses that resolved to a profile with more than 50 followers.
gh = [r for r in client.dataset(run2["defaultDatasetId"]).iterate_items()
      if r.get("displayName") and parse_followers(r.get("followers")) > 50]

print(f"Technical-buyer signal: {len(gh)} of {len(linkedin_urls)} prospects "
      f"have an active GitHub (>50 followers).")
for r in gh:
    print(f"  {r['displayName']} - {r['url']} ({r.get('followers')})")

MCP / LLM-agent use

Wrap as a single tool for agents that need "get public meta about this person":

tools = [{
    "name": "lookup_social_profile",
    "description": "Given a social profile URL on a major platform, return display name, bio, avatar, follower count.",
    "input_schema": {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    },
}]

The agent enriches names mentioned in conversation ("tell me about @paulg") into canonical profile data for its reasoning step.
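A handler behind that tool definition could look roughly like this. It is a sketch under assumptions: `make_apify_runner` and the `run_actor` callable are hypothetical helpers introduced here so the handler can be tested without network access, and the set of fields returned to the agent is an illustrative choice.

```python
ACTOR_ID = "knotless_cadence/social-profile-scraper"


def make_apify_runner(token):
    """Return a callable that runs the actor for one URL and lists its records."""
    # Imported lazily; requires `pip install apify-client`.
    from apify_client import ApifyClient

    client = ApifyClient(token)

    def run(url):
        result = client.actor(ACTOR_ID).call(run_input={"profileUrls": [url]})
        if not result:
            return []
        return list(client.dataset(result["defaultDatasetId"]).iterate_items())

    return run


def lookup_social_profile(url, run_actor):
    """Tool handler: return a compact profile dict for the agent."""
    records = run_actor(url)
    if not records:
        return {"url": url, "error": "no record returned"}
    rec = records[0]
    wanted = ("url", "platform", "displayName", "bio", "avatar", "followers")
    # Drop the nested meta object to keep the tool result small.
    return {key: rec.get(key) for key in wanted}
```

In production the handler would be called as `lookup_social_profile(url, make_apify_runner("<YOUR_APIFY_TOKEN>"))`; in tests, any function that returns a list of record dicts can stand in for the runner.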

Frequent questions

1. "Will LinkedIn / Instagram throw a login wall?" Often yes: both gate most public profile pages behind a sign-in for unauthenticated requests. The fetch returns the login-wall HTML; OG/Twitter Card meta tags are usually still present, so you may still get displayName and bio even when followers is empty. Enabling Apify residential proxy on the run does not help in this version, because the crawler is built without a proxy configuration; proxy-routed requests need a custom build (see Custom scraping).

2. "Does it follow Bluesky custom-domain handles (alice.dev)?" The actor reads whatever Open Graph the URL serves. https://bsky.app/profile/alice.dev works because bsky.app returns proper OG. For native AT-Protocol profile data (DID, full bio, exact follower count), use the dedicated Bluesky Scraper actor.

3. "How fresh is the data?" Scraped live per run, no cache. For change tracking, snapshot to your DB and diff.

4. "Mastodon has 1000+ servers. Do I need to specify the instance?" Pass the full URL with instance hostname (https://fosstodon.org/@user). The actor reads OG meta from whichever Mastodon server hosts the account — there is no global username registry to guess from.

5. "Follower counts on TikTok come back as '1.2M' strings." Yes — see the suffix parser snippet above.

6. "I need emails too." Different actor: Email Extractor Pro pulls emails from websites and chains cleanly after this one (social profile → website → email).
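For the change tracking mentioned in question 3, a diff between two stored snapshots can be as small as the sketch below. The function name `diff_snapshots` and the choice of tracked fields are assumptions; each snapshot is taken to be a list of records in the output schema shown earlier.

```python
TRACKED = ("displayName", "bio", "followers", "following", "postsCount")


def diff_snapshots(old, new):
    """Return (url, field, old_value, new_value) for every tracked change."""
    old_by_url = {rec["url"]: rec for rec in old}
    changes = []
    for rec in new:
        prev = old_by_url.get(rec["url"])
        if prev is None:
            continue  # URL first seen in this run; nothing to compare against.
        for field in TRACKED:
            if prev.get(field) != rec.get(field):
                changes.append((rec["url"], field, prev.get(field), rec.get(field)))
    return changes


yesterday = [{"url": "https://github.com/torvalds", "followers": "210K", "bio": "Linux kernel developer"}]
today = [{"url": "https://github.com/torvalds", "followers": "211K", "bio": "Linux kernel developer"}]
changes = diff_snapshots(yesterday, today)
```

Counts are compared as raw strings, so a platform switching its rendering from "210K" to "210,000" would register as a change; parse both sides to integers first if that matters.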

Lead-enrichment toolkit (companion actors)

| Step | Tool | Purpose |
| --- | --- | --- |
| 1 | Google Maps Scraper Pro | Find businesses by category + location |
| 2 | Social Profile Scraper (this) | Get social presence for each result |
| 3 | Email Extractor Pro | Pull contact email from website |
| 4 | Website Tech-Stack Detector | Qualify by tech stack (Next.js, Shopify, etc.) |
| 5 | Trustpilot Review Scraper | Bonus: review-volume signal of business activity |

All five can be chained on Apify with the input-from-dataset pattern: each step reads the previous step's dataset and uses it as its own input.
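As an illustration of the hand-off from step 2 to step 3, the sketch below collects the website field from this actor's records and shapes it into an input for the next actor. The helper name `websites_for_next_step` is made up for this example, and the `"urls"` input key is hypothetical: check the Email Extractor Pro input schema for the real field name.

```python
def websites_for_next_step(records):
    """Collect distinct non-empty `website` values, preserving first-seen order."""
    seen = set()
    websites = []
    for rec in records:
        site = (rec.get("website") or "").strip()
        if site and site not in seen:
            seen.add(site)
            websites.append(site)
    return websites


records = [
    {"url": "https://github.com/a", "website": "https://a.example"},
    {"url": "https://dev.to/b", "website": None},
    {"url": "https://twitter.com/a", "website": "https://a.example"},
]
# "urls" is a hypothetical input key for the next actor.
next_input = {"urls": websites_for_next_step(records)}
```

De-duplicating before the hand-off matters because the same person often links one site from several profiles.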

Proof of delivery

This actor has 24 lifetime production runs as of May 2026 — part of an active portfolio of 31 published actors (78 total). The portfolio's flagship Trustpilot scraper alone has 951 lifetime production runs. The author shipped a paid 3-article series in March 2026 ($150, proxy industry).

Pilot pricing locked through May 2026 — see the tier table below.

Sample request? Reply sample to spinov001@gmail.com and you'll receive 2 published case-study articles within 24 hours, no obligation. This is the fastest way to evaluate writing style + technical depth before committing to a custom build.

Custom scraping — pricing

Need a different platform, a custom field, or a stitched pipeline? One-shot pilot tiers:

  • Pilot — $97: 1 actor, basic config, 7-day support. Good for proof-of-concept on a new platform.
  • Standard — $297: custom actor + Slack/email alerts on results, 30-day support. Most clients here.
  • Premium — $797: custom actor + dashboard + 90-day support + 1 modification round.

Email spinov001@gmail.com with the source URL + the fields you need. Typical turnaround: 48 hours.


Proof of work: 31 public actors on Apify Store (78 total in portfolio). Production-tested: Trustpilot 951 runs, Reddit 82 runs, Google News 45 runs, Glassdoor 39 runs, Email Extractor 107 runs, Hacker News 27 runs, Bluesky 25 runs.

More tips: t.me/scraping_ai


Honest disclosure

  • This actor uses a single Cheerio fetch per URL with a rotating desktop User-Agent. It does not bypass login walls, solve captchas, or scrape authenticated pages.
  • Output fields reflect exactly what requestHandler in src/main.js pushes. No fabricated fields like followersRaw, verified, location, httpStatus, or per-record status flags.
  • Per-platform rate-limiting is not implemented — the crawler runs at maxConcurrency: 3 across all platforms. For high-volume single-platform runs, supply your own platform PAT or use the dedicated platform-specific actor (Bluesky / Reddit / Trustpilot all available).
  • LinkedIn / Instagram / Facebook frequently return login walls; OG meta is usually present but follower data may be empty.
  • Proxy is NOT wired into this version of the actor. CheerioCrawler is constructed without proxyConfiguration, so Apify residential proxy will not be used even if enabled at the run config level. For proxy-routed requests, commission a custom build.
  • Mastodon detection is hostname-substring only (host.includes('mastodon') OR host.includes('fosstodon')). Federated instances without mastodon or fosstodon in the hostname (e.g. mas.to, infosec.exchange, tech.lgbt, hachyderm.io) are detected as platform: 'unknown' and processed via generic OG-meta extraction only — usually still works but platform field will be 'unknown'.
  • Follower-count regex is English + Russian only (/followers|Followers|подписчик/i; the second alternative is redundant under the i flag). Labels in other languages are NOT matched: French "abonnés", Chinese "关注者", Japanese "フォロワー", and, going by the pattern as quoted, German "Follower", which lacks the trailing "s". In those cases the followers field returns null even if the count is visible on the page.
  • Not affiliated with any of the listed platforms.
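When followers comes back null because of the language limit above, the count can sometimes be recovered downstream from text the record already carries, such as meta.metaDescription or meta.ogDescription. The sketch below is a workaround under assumptions, not the actor's own logic: the keyword list, the pattern, and the name `followers_from_text` are all illustrative, and it only handles counts written before the label.

```python
import re

# Illustrative label list; extend it for the languages you need.
FOLLOWER_WORDS = r"(?:followers|abonnés|подписчик\w*|フォロワー|关注者)"
FOLLOWER_RE = re.compile(
    # A number (digits with optional separators), an optional K/M/B suffix,
    # then one of the follower labels.
    r"(\d[\d.,\u00a0 ]*\d|\d)\s*([KMB])?\s*" + FOLLOWER_WORDS,
    re.IGNORECASE,
)


def followers_from_text(text):
    """Return the raw count string found before a follower label, or None."""
    if not text:
        return None
    m = FOLLOWER_RE.search(text)
    if not m:
        return None
    return m.group(1).strip() + (m.group(2) or "").upper()
```

The function returns the count as displayed ("210K", "1 234"), matching the actor's convention of leaving numeric conversion to the caller.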