Pricing

Pay per usage

Social Profiles — Bio, Followers, Posts in CSV, Bulk

Social profile data CSV/JSON — username, bio, followers, following, posts. Same schema LinkedIn/GitHub/Reddit. 52 lifetime runs · 9 users · 5 active 30d · 100% success rate. B2B prospecting/ABM/recruiter sourcing. dev.to/0012303 · blog.spinov.online

Rating

0.0

(0)

Developer

Alex

Actor stats

Last modified

2 months ago

Who buys this actor

SDR / lead-gen teams enriching a list of prospect LinkedIn URLs with their GitHub/Dev.to/Medium activity — technical buyers reveal themselves through public dev presence.
VC / investor-research analysts pulling founder footprint across multiple platforms into one row per person for deal-review memos.
Talent / recruiting vetting candidate portfolios — GitHub follower count + Dev.to bio + LinkedIn headline in one scrape, instead of 3 separate tools.
Influencer-marketing teams comparing the same creator's public reach on Instagram vs TikTok vs YouTube vs Bluesky before signing a deal.
Brand / reputation teams monitoring the social graph of their executives — early signal when an exec is active on Bluesky but quiet on Twitter.
CRM-enrichment pipelines filling the "social handles" section of a HubSpot / Salesforce contact at import time.

Why this over the obvious alternatives

Concern	How this actor handles it
"Clearbit / Apollo / ZoomInfo already do social enrichment."	They do — at $10K+/year seat licenses, with stale data and limited to business platforms (no Bluesky, no Mastodon, no Threads). This actor runs on-demand at Apify rates, including the newer platforms.
"Why not 13 different platform SDKs?"	Because each has different auth, rate limits, and OAuth review. This actor fetches the public unauth page once and reads the meta tags every platform exposes.
"Accuracy on platforms without public APIs (LinkedIn, Instagram)?"	We do not bypass login walls or scrape authenticated pages. For LinkedIn / Instagram we read whatever Open Graph + JSON-LD the public unauth response exposes. You get display name + bio + avatar — not private fields. Full-detail LinkedIn requires Sales Navigator API.
"Follower counts come back as strings ('210K')?"	Yes — that's what most platforms render in their Open Graph / page text. Convert to int yourself with the K/M/B suffix parser; the raw string preserves what the platform actually shows.
"Deleted / private / 404 profiles — does one bad URL kill the batch?"	No. Crawlee/Cheerio handles each URL independently; failed fetches are logged and the next URL continues.
"Anti-bot on GitHub / LinkedIn / Instagram?"	The actor sends a rotating desktop User-Agent (3 UAs) and runs at `maxConcurrency: 3`. For LinkedIn / Instagram / Facebook you will often get a login-wall response instead of the real profile — that's expected and the OG/Twitter Card meta tags returned by the login page usually still yield `displayName` + `bio`. Note: this version does NOT wire up `Actor.createProxyConfiguration()` — even if you have Apify residential proxy on your account, the CheerioCrawler in this actor will not use it. For proxy-routed requests against LinkedIn, request a custom build (see Custom scraping).

Input

{
  "profileUrls": [
    "https://github.com/torvalds",
    "https://dev.to/ben",
    "https://bsky.app/profile/jay.bsky.team",
    "https://www.linkedin.com/in/patrick-collison/",
    "https://www.youtube.com/@veritasium",
    "https://twitter.com/paulg"
  ]
}

profileUrls (array, required) — list of full URLs, any mix of supported platforms.

Platforms auto-detected from hostname: GitHub, Twitter/X, LinkedIn, Instagram, Facebook, YouTube, TikTok, Reddit, Medium, Dev.to, Threads, Bluesky, Mastodon (incl. Fosstodon). Unknown hostnames get platform: "unknown" and still attempt OG-meta extraction.
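The hostname-based detection described above can be approximated as follows. This is a minimal Python sketch, not the actor's own code (which lives in src/main.js): the domain table and every platform label other than "github" and "unknown" are assumptions, and the Mastodon rule mirrors the substring check mentioned in the disclosure section.

```python
from urllib.parse import urlparse

# Assumed domain -> label table; only "github" and "unknown" appear in the README.
KNOWN_DOMAINS = {
    "github.com": "github",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "linkedin.com": "linkedin",
    "instagram.com": "instagram",
    "facebook.com": "facebook",
    "youtube.com": "youtube",
    "tiktok.com": "tiktok",
    "reddit.com": "reddit",
    "medium.com": "medium",
    "dev.to": "devto",
    "threads.net": "threads",
    "bsky.app": "bluesky",
}


def detect_platform(url: str) -> str:
    """Guess the platform from the URL's hostname; fall back to 'unknown'."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in KNOWN_DOMAINS.items():
        # Match the bare domain or any subdomain (www.linkedin.com, m.facebook.com).
        if host == domain or host.endswith("." + domain):
            return platform
    # Substring check, as described for Mastodon and Fosstodon instances.
    if "mastodon" in host or "fosstodon" in host:
        return "mastodon"
    return "unknown"


for u in ["https://github.com/torvalds",
          "https://fosstodon.org/@user",
          "https://hachyderm.io/@user"]:
    print(u, "->", detect_platform(u))
```

A URL that resolves to "unknown" is still worth submitting, since the actor attempts generic OG-meta extraction for it.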

Output schema (per URL)

The actor pushes one record per URL with these fields (real, verified against src/main.js):

{
  "url": "https://github.com/torvalds",
  "platform": "github",
  "username": "torvalds",
  "displayName": "Linus Torvalds · GitHub",
  "bio": "Linux kernel developer",
  "avatar": "https://avatars.githubusercontent.com/u/1024025",
  "followers": "210K",
  "following": "0",
  "postsCount": "35",
  "siteName": "GitHub",
  "twitterHandle": null,
  "website": null,
  "meta": {
    "ogTitle": "torvalds (Linus Torvalds) · GitHub",
    "ogDescription": "Linux kernel developer",
    "twitterTitle": "",
    "twitterDescription": "",
    "pageTitle": "torvalds (Linus Torvalds) · GitHub",
    "metaDescription": "Linus Torvalds has 35 repositories available..."
  },
  "scrapedAt": "2026-04-29T12:30:00.000Z"
}

Top-level fields: url, platform, username, displayName, bio, avatar, followers, following, postsCount, siteName, scrapedAt (always populated or null), plus website and twitterHandle (populated only when extracted, otherwise null). The nested meta object is always present with 6 fields: ogTitle, ogDescription, twitterTitle, twitterDescription, pageTitle, metaDescription (truncated to 300 chars). Missing values are null rather than dropped, to keep the DataFrame schema stable.

followers / following / postsCount are strings — the actor extracts the human-readable count from page text or meta tags and does not parse '210K' into 210000 for you. Convert downstream with a simple suffix parser if needed.
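Because every record carries the same keys, the nested meta object can be flattened into fixed CSV columns. The sketch below uses only the Python standard library; the meta_ column prefix and the helper names flatten and to_csv are illustrative choices, not part of the actor.

```python
import csv
import io

# Field names taken from the output schema above.
TOP_LEVEL = ["url", "platform", "username", "displayName", "bio", "avatar",
             "followers", "following", "postsCount", "siteName",
             "twitterHandle", "website", "scrapedAt"]
META_FIELDS = ["ogTitle", "ogDescription", "twitterTitle",
               "twitterDescription", "pageTitle", "metaDescription"]


def flatten(record: dict) -> dict:
    """Turn one actor record into a flat row; absent keys become None."""
    row = {key: record.get(key) for key in TOP_LEVEL}
    meta = record.get("meta") or {}
    for key in META_FIELDS:
        row[f"meta_{key}"] = meta.get(key)  # hypothetical column naming
    return row


def to_csv(records) -> str:
    """Serialize records to CSV text with a stable column order."""
    columns = TOP_LEVEL + [f"meta_{key}" for key in META_FIELDS]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))  # None is written as an empty cell
    return buf.getvalue()


example = {
    "url": "https://github.com/torvalds",
    "platform": "github",
    "username": "torvalds",
    "followers": "210K",
    "website": None,
    "meta": {"ogDescription": "Linux kernel developer"},
}
csv_text = to_csv([example])
print(csv_text)
```

Counts stay as strings in the CSV; convert them in a later step if numeric sorting is needed.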

Python copy-paste — enrich a LinkedIn export with GitHub signal

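Given LinkedIn URLs from a Sales Navigator export, scrape them, then cross-reference with GitHub handles guessed from display names. The guess is a heuristic: a slugified display name is not always the person's real GitHub handle and may belong to someone else, so treat hits as leads to verify.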

from apify_client import ApifyClient
import re

client = ApifyClient("<YOUR_APIFY_TOKEN>")

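# Read one URL per line, closing the file afterwards and skipping blank lines
with open("prospects_linkedin.txt") as f:
    linkedin_urls = [line.strip() for line in f if line.strip()]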

def slugify(name: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

run = client.actor("knotless_cadence/social-profile-scraper").call(run_input={
    "profileUrls": linkedin_urls,
})
li = {r["url"]: r for r in client.dataset(run["defaultDatasetId"]).iterate_items()}

gh_guesses = [
    f"https://github.com/{slugify(r['displayName'])}"
    for r in li.values() if r.get("displayName")
]

run2 = client.actor("knotless_cadence/social-profile-scraper").call(run_input={
    "profileUrls": gh_guesses,
})

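def parse_followers(s):
    """Convert a display count such as '210K' or '1.2M' to an int; 0 if unparseable."""
    if not s:
        return 0
    s = s.replace(",", "").strip().upper()
    # Require a well-formed number so float() cannot fail on input like '1.2.3'
    m = re.match(r"(\d+(?:\.\d+)?)\s*([KMB]?)", s)
    if not m:
        return 0
    scale = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}.get(m.group(2), 1)
    # round() avoids off-by-one results from float truncation
    return round(float(m.group(1)) * scale)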

gh = [r for r in client.dataset(run2["defaultDatasetId"]).iterate_items()
      if r.get("displayName") and parse_followers(r.get("followers")) > 50]

print(f"Technical-buyer signal: {len(gh)} of {len(linkedin_urls)} prospects "
      f"have an active GitHub (>50 followers).")
for r in gh:
    print(f"  {r['displayName']}  →  {r['url']}  ({r.get('followers')})")

MCP / LLM-agent use

Wrap as a single tool for agents that need "get public meta about this person":

tools = [{
    "name": "lookup_social_profile",
    "description": "Given a social profile URL on a major platform, return display name, bio, avatar, follower count.",
    "input_schema": {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    },
}]

The agent enriches names mentioned in conversation ("tell me about @paulg") into canonical profile data for its reasoning step.
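One way to back that tool definition with the actor is sketched below. The handler name lookup_social_profile matches the tool above, but the function itself, the run_actor callable, and the "found" flag are hypothetical. The Apify calls mirror the ones used in the Python example earlier in this README, and the import is deferred so the stubbed demonstration runs without apify_client installed.

```python
from typing import Callable, Iterable

# Actor ID as used in the Python example in this README.
ACTOR_ID = "knotless_cadence/social-profile-scraper"

# Fields the tool description promises to return.
TOOL_FIELDS = ("displayName", "bio", "avatar", "followers")


def lookup_social_profile(url: str,
                          run_actor: Callable[[str, dict], Iterable[dict]]) -> dict:
    """Hypothetical tool handler: run the actor for one URL and trim the record."""
    records = list(run_actor(ACTOR_ID, {"profileUrls": [url]}))
    if not records:
        # Assumption: a failed fetch may leave the dataset empty for this URL.
        return {"url": url, "found": False}
    record = records[0]
    result = {"url": url, "found": True}
    result.update({field: record.get(field) for field in TOOL_FIELDS})
    return result


def apify_run_actor(token: str) -> Callable[[str, dict], Iterable[dict]]:
    """Build a run_actor callable on top of apify_client."""
    from apify_client import ApifyClient  # deferred; requires the apify-client package

    client = ApifyClient(token)

    def run(actor_id: str, run_input: dict) -> Iterable[dict]:
        run_info = client.actor(actor_id).call(run_input=run_input)
        return client.dataset(run_info["defaultDatasetId"]).iterate_items()

    return run


# Offline demonstration with a stub standing in for a real actor run.
def fake_run_actor(actor_id, run_input):
    return [{"url": run_input["profileUrls"][0], "displayName": "Example User",
             "bio": "Example bio", "avatar": None, "followers": "1.2K"}]


result = lookup_social_profile("https://github.com/example-user", fake_run_actor)
print(result)
```

In a real agent, pass apify_run_actor("<YOUR_APIFY_TOKEN>") in place of the stub. Injecting the runner keeps the handler testable without network access.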

Frequent questions

1. "Will LinkedIn / Instagram throw a login wall?" Often yes: both gate most public profile pages behind a sign-in for unauthenticated requests. The fetch returns the login-wall HTML; OG/Twitter Card meta tags are usually still present, so you may still get displayName and bio even when followers is empty. Enabling Apify residential proxy will not raise the success rate in this version, because proxy is not wired into the crawler; for proxy-routed requests, request a custom build.

2. "Does it follow Bluesky custom-domain handles (alice.dev)?" The actor reads whatever Open Graph the URL serves. https://bsky.app/profile/alice.dev works because bsky.app returns proper OG. For native AT-Protocol profile data (DID, full bio, exact follower count), use the dedicated Bluesky Scraper actor.

3. "How fresh is the data?" Scraped live per run, no cache. For change tracking, snapshot to your DB and diff.

4. "Mastodon has 1000+ servers. Do I need to specify the instance?" Pass the full URL with instance hostname (https://fosstodon.org/@user). The actor reads OG meta from whichever Mastodon server hosts the account — there is no global username registry to guess from.

5. "Follower counts on TikTok come back as '1.2M' strings." Yes — see the suffix parser snippet above.

6. "I need emails too." Different actor: Email Extractor Pro pulls emails from websites and chains cleanly after this one (social profile → website → email).
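For the change-tracking approach in question 3 (snapshot, then diff), a minimal sketch follows. It assumes two lists of actor records from different runs, keyed by url. The tracked field list and the diff_snapshots helper are illustrative choices.

```python
# Fields worth watching between runs (an illustrative selection).
TRACKED = ("displayName", "bio", "followers", "following", "postsCount")


def diff_snapshots(old, new):
    """Return one change entry per tracked field that differs between runs."""
    old_by_url = {record["url"]: record for record in old}
    changes = []
    for record in new:
        previous = old_by_url.get(record["url"])
        if previous is None:
            continue  # first sighting of this URL, nothing to compare against
        for field in TRACKED:
            if previous.get(field) != record.get(field):
                changes.append({
                    "url": record["url"],
                    "field": field,
                    "old": previous.get(field),
                    "new": record.get(field),
                })
    return changes


yesterday = [{"url": "https://github.com/torvalds", "followers": "209K", "bio": "Linux kernel developer"}]
today = [{"url": "https://github.com/torvalds", "followers": "210K", "bio": "Linux kernel developer"}]
changes = diff_snapshots(yesterday, today)
print(changes)
```

Because counts are display strings, a change from "209K" to "210K" is detected, but movement within the same rounded value is invisible to this comparison.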

Lead-enrichment toolkit (companion actors)

Step	Tool	Purpose
1	Google Maps Scraper Pro	Find businesses by category + location
2	Social Profile Scraper (this)	Get social presence for each result
3	Email Extractor Pro	Pull contact email from website
4	Website Tech-Stack Detector	Qualify by tech stack (Next.js, Shopify, etc.)
5	Trustpilot Review Scraper	Bonus: review-volume signal of business activity

All 5 chainable in an Apify run via input-from-dataset pattern.
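As a sketch of the hand-off from step 2 to step 3, the helper below collects the website values from this actor's records so they can be passed to the next actor. The helper name is hypothetical, and the input field name the next actor expects is not documented here, so the sketch stops at producing the URL list.

```python
def websites_from_records(records):
    """Return unique, non-empty website values in first-seen order."""
    seen = set()
    urls = []
    for record in records:
        site = (record.get("website") or "").strip()
        if site and site not in seen:
            seen.add(site)
            urls.append(site)
    return urls


records = [
    {"url": "https://github.com/example-a", "website": "https://example.com"},
    {"url": "https://github.com/example-b", "website": None},
    {"url": "https://dev.to/example-a", "website": "https://example.com"},
]
next_urls = websites_from_records(records)
print(next_urls)
```

Deduplicating before the next run avoids paying for the same website twice when one person appears on several platforms.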

Proof of delivery

This actor has 24 lifetime production runs as of May 2026 — part of an active portfolio of 31 published actors (78 total). The portfolio's flagship Trustpilot scraper alone has 951 lifetime production runs. The author shipped a paid 3-article series in March 2026 ($150, proxy industry).

Pilot pricing is locked through May 2026; see the pricing tiers below.

Sample request? Email the word "sample" to spinov001@gmail.com and you will receive 2 published case-study articles within 24 hours, no obligation. This is the fastest way to evaluate writing style and technical depth before committing to a custom build.

Custom scraping — pricing

Need a different platform, a custom field, or a stitched pipeline? One-shot pilot tiers:

Pilot — $97: 1 actor, basic config, 7-day support. Good for proof-of-concept on a new platform.
Standard — $297: custom actor + Slack/email alerts on results, 30-day support. Most clients here.
Premium — $797: custom actor + dashboard + 90-day support + 1 modification round.

Email spinov001@gmail.com with the source URL + the fields you need. Typical turnaround: 48 hours.

Proof of work: 31 public actors on Apify Store (78 total in portfolio). Production-tested: Trustpilot 951 runs, Reddit 82 runs, Google News 45 runs, Glassdoor 39 runs, Email Extractor 107 runs, Hacker News 27 runs, Bluesky 25 runs.

More tips: t.me/scraping_ai

Honest disclosure

This actor uses a single Cheerio fetch per URL with a rotating desktop User-Agent. It does not bypass login walls, solve captchas, or scrape authenticated pages.
Output fields reflect exactly what requestHandler in src/main.js pushes. No fabricated fields like followersRaw, verified, location, httpStatus, or per-record status flags.
Per-platform rate-limiting is not implemented — the crawler runs at maxConcurrency: 3 across all platforms. For high-volume single-platform runs, supply your own platform PAT or use the dedicated platform-specific actor (Bluesky / Reddit / Trustpilot all available).
LinkedIn / Instagram / Facebook frequently return login walls; OG meta is usually present but follower data may be empty.
Proxy is NOT wired into this version of the actor. CheerioCrawler is constructed without proxyConfiguration, so Apify residential proxy will not be used even if enabled at the run config level. For proxy-routed requests, commission a custom build.
Mastodon detection is hostname-substring only (host.includes('mastodon') OR host.includes('fosstodon')). Federated instances without mastodon or fosstodon in the hostname (e.g. mas.to, infosec.exchange, tech.lgbt, hachyderm.io) are detected as platform: 'unknown' and processed via generic OG-meta extraction only — usually still works but platform field will be 'unknown'.
Follower-count regex is English + Russian only (/followers|Followers|подписчик/i). French "abonnés", Chinese "关注者", and Japanese "フォロワー" are not matched, so the followers field returns null even if the count is visible on the page. German "Follower" matches only when the page text contains the full plural "followers"; the pattern as quoted requires the trailing "s".
Not affiliated with any of the listed platforms.
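
The single-fetch, meta-tag approach disclosed above can be illustrated with a short Python sketch. It is not the actor's implementation (that is Cheerio-based JavaScript in src/main.js): the class and function names and the fallback order between tags are assumptions, and JSON-LD is left out for brevity.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta property=...> / <meta name=...> content and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content is not None:
                # Keep the first occurrence of each key.
                self.meta.setdefault(key.lower(), content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_profile_meta(html: str) -> dict:
    """Map Open Graph / Twitter Card / title tags to profile-style fields."""
    parser = MetaTagParser()
    parser.feed(html)
    m = parser.meta
    return {
        "displayName": m.get("og:title") or m.get("twitter:title") or parser.title.strip() or None,
        "bio": m.get("og:description") or m.get("twitter:description") or m.get("description"),
        "avatar": m.get("og:image"),
        "siteName": m.get("og:site_name"),
    }


sample = """<html><head><title>example-user (Example User)</title>
<meta property="og:title" content="example-user (Example User)">
<meta property="og:description" content="Example bio">
<meta property="og:site_name" content="ExampleSite">
</head><body></body></html>"""

profile = extract_profile_meta(sample)
print(profile)
```

A login-wall page that still carries these tags yields displayName and bio the same way, which is why those fields often survive on gated platforms while follower counts do not.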
