GitHub Profile — Repos, Stars, Activity, CSV, No Token, Bulk

Pricing

Pay per usage

21 runs. GitHub user intel in CSV/JSON — repos, stars, followers, contribs, languages, bio, email. No API token, no rate blocks. Backed by 951-run Trustpilot flagship + 31-actor portfolio. For recruiter outreach + talent mapping. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai


GitHub Profile Scraper — Users, Repos, Stars & Language Stats

Pull GitHub user profiles plus their public repositories — bio, follower counts, top repos by stars, language distribution, license info, activity timestamps — using the official GitHub REST API. No token required for public-data lookups (60 req/h unauthenticated rate-limit applies; the actor handles that automatically).

What you get per user

The actor pushes one record per username. All fields below come from GET /users/{username} and GET /users/{username}/repos — verified against src/main.js.

Profile-level fields (19, including scrapedAt metadata)

| Field | Type | Source |
| --- | --- | --- |
| username | string | user.login |
| name | string | user.name |
| bio | string | user.bio |
| company | string | user.company |
| location | string | user.location |
| email | string | user.email (when public) |
| blog | string | user.blog |
| twitterUsername | string | user.twitter_username |
| avatar | string (URL) | user.avatar_url |
| profileUrl | string (URL) | user.html_url |
| publicRepos | int | user.public_repos |
| publicGists | int | user.public_gists |
| followers | int | user.followers |
| following | int | user.following |
| createdAt | string (ISO 8601) | user.created_at |
| updatedAt | string (ISO 8601) | user.updated_at |
| hireable | bool | user.hireable |
| type | string | User or Organization |
| scrapedAt | string (ISO 8601) | run timestamp |

Repo-level fields (15 per repo, when includeRepos=true)

name, fullName, description, url, stars, forks, watchers, language, topics[], isForked, createdAt, updatedAt, pushedAt, license, openIssues

Plus aggregates on the parent profile record (computed from the EXTRACTED repos only — see Honest limitations):

  • totalStars — sum across the extracted repos (capped by maxReposPerUser, NOT total stars across all of the user's repos)
  • totalForks — sum across the extracted repos (same caveat)
  • languages — array of {language, repoCount}, sorted desc by repo count. repoCount is "how many of the extracted repos have this as their language field" — this is NOT GitHub's bytes-weighted language graph; for that you need the /repos/{owner}/{repo}/languages endpoint per-repo (custom build available).
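The languages aggregate is easy to reproduce (or sanity-check) client-side from the repos array. A minimal sketch; the field names match the actor's output, but the sample repos are made up:

```python
from collections import Counter

def aggregate_languages(repos):
    """Count how many extracted repos list each primary language.
    Mirrors the actor's repoCount aggregate (repo count, NOT bytes-weighted)."""
    counts = Counter(r["language"] for r in repos if r.get("language"))
    return [
        {"language": lang, "repoCount": n}
        for lang, n in counts.most_common()  # sorted desc by repo count
    ]

repos = [
    {"name": "linux", "language": "C"},
    {"name": "uemacs", "language": "C"},
    {"name": "scripts", "language": "Shell"},
    {"name": "notes", "language": None},  # repos without a language are skipped
]
```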

Output example

```json
{
  "username": "torvalds",
  "name": "Linus Torvalds",
  "bio": null,
  "company": "Linux Foundation",
  "location": "Portland, OR",
  "email": null,
  "blog": "",
  "twitterUsername": null,
  "avatar": "https://avatars.githubusercontent.com/u/1024025?v=4",
  "profileUrl": "https://github.com/torvalds",
  "publicRepos": 7,
  "publicGists": 0,
  "followers": 220000,
  "following": 0,
  "createdAt": "2011-09-03T15:26:22Z",
  "updatedAt": "2026-04-20T10:11:12Z",
  "hireable": null,
  "type": "User",
  "totalStars": 185000,
  "totalForks": 55000,
  "languages": [
    { "language": "C", "repoCount": 4 },
    { "language": "Shell", "repoCount": 1 }
  ],
  "repos": [
    {
      "name": "linux",
      "fullName": "torvalds/linux",
      "description": "Linux kernel source tree",
      "url": "https://github.com/torvalds/linux",
      "stars": 175000,
      "forks": 53000,
      "watchers": 175000,
      "language": "C",
      "topics": ["linux", "kernel"],
      "isForked": false,
      "createdAt": "2011-09-04T22:48:12Z",
      "updatedAt": "2026-04-29T08:00:00Z",
      "pushedAt": "2026-04-29T07:55:00Z",
      "license": "GPL-2.0",
      "openIssues": 1
    }
  ],
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
```

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| usernames | array | [] | GitHub usernames (e.g. ["torvalds", "gaearon"]) |
| includeRepos | boolean | true | If false, profile-only — skip repo + language extraction |
| maxReposPerUser | integer | 30 | Cap per user, sorted by stars desc (1–100; the GitHub API page-size ceiling is 100, the actor enforces it) |
| includeLanguageStats | boolean | true | Aggregate language distribution across the extracted repos |
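A minimal input that exercises every parameter above (values are illustrative):

```json
{
  "usernames": ["torvalds", "gaearon"],
  "includeRepos": true,
  "maxReposPerUser": 30,
  "includeLanguageStats": true
}
```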

Use cases

  • Developer recruiting — evaluate candidates by their open-source footprint, language mix, and active-repo cadence (pushedAt).
  • Competitor analysis — fingerprint the OSS strategy of a company by mapping its top contributors' top repos.
  • Community research — identify influential developers in a technology ecosystem (high stars + matching topics).
  • Portfolio benchmarking — compare repo activity, fork ratios, and license distributions across a candidate set.
  • Lead generation — surface developers with specific technology expertise (filter by languages[*].language).
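The lead-generation filter in the last bullet reduces to a one-line check over the languages aggregate. An illustrative helper, not part of the actor:

```python
def has_language(profile, language):
    """True if any of the profile's extracted repos list `language`
    as their primary language (reads the actor's `languages` aggregate)."""
    return any(e["language"] == language for e in profile.get("languages", []))

# Example: keep only C developers from a list of scraped profiles.
profiles = [{"username": "torvalds", "languages": [{"language": "C", "repoCount": 4}]}]
c_devs = [p for p in profiles if has_language(p, "C")]
```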

How it works

  1. GET /users/{username} → 18 fields from GitHub + scrapedAt (19 total).
  2. (If includeRepos) GET /users/{username}/repos?sort=stars&direction=desc&per_page=<maxReposPerUser> → 15 fields per repo. Single page only — no pagination beyond GitHub's per-page ceiling (100).
  3. Aggregate totalStars, totalForks, and languages from the extracted repo set (capped by maxReposPerUser).
  4. On HTTP 403 with X-RateLimit-Remaining: 0, the actor reads X-RateLimit-Reset, sleeps until the window resets (+5 s buffer), then retries. No manual handling required, but unauthenticated mode caps you at 60 req/h — set up an authenticated proxy actor (custom build) if you need higher throughput.

A 500 ms delay separates the user-profile fetch from the repo-list fetch, and a 1 s delay separates consecutive users.
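The wait computation in step 4 can be sketched as follows. The actor itself is Node.js, so this Python function is illustrative only; the header names, however, are GitHub's real rate-limit headers:

```python
import time

def seconds_to_wait(headers, now=None, buffer_s=5):
    """If the quota is exhausted (X-RateLimit-Remaining: 0), return how
    long to sleep: until the epoch second in X-RateLimit-Reset, plus a
    small buffer. Otherwise return 0 (no wait needed)."""
    now = time.time() if now is None else now
    if headers.get("X-RateLimit-Remaining") != "0":
        return 0  # quota left: proceed immediately
    reset_epoch = int(headers["X-RateLimit-Reset"])  # Unix seconds, per GitHub docs
    return max(0, reset_epoch - now) + buffer_s
```

Note there is no ceiling here, which is exactly the "no max-wait safeguard" limitation described below: a reset an hour away means an hour of sleep.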


Honest limitations (read before bulk runs)

  • No pagination beyond maxReposPerUser (GitHub per_page ceiling = 100). For users with hundreds of repos, only the top-N by stars are returned. If you need ALL repos, request a paginated custom build.
  • totalStars / totalForks are EXTRACTED-repo aggregates, NOT lifetime totals. A user with 200 repos and maxReposPerUser=30 will show totalStars summed over the top 30 only — not all 200.
  • languages is repoCount, NOT bytes-weighted. GitHub's UI shows percent of repo bytes per language; this actor counts how many repos list each language as their language field. Bytes-based language stats require per-repo /languages calls (custom build).
  • One user's HTTP error halts the entire batch. The for (const username of usernames) loop is wrapped in ONE outer try/catch; a 502 / 503 on user #5 of 100 stops the run. Workaround: split large batches into ≤25-username runs, or request a per-user try/catch custom build.
  • Rate-limit retry has no max-wait safeguard. If X-RateLimit-Reset shows 60 minutes ahead, the actor sleeps 60 minutes (+5 s) before retrying. For a 100-user batch hitting the unauthenticated cap, total wall-clock can stretch to 1.5–2 h. For high-volume runs, request an authenticated custom build (5000 req/h with PAT).
  • No proxy. Direct fetch from the Apify worker IP. GitHub's unauthenticated rate-limit is per-IP, so co-tenant noise on the same Apify IP can shrink your effective quota.
  • email is null for most users. GitHub only exposes email when the user has explicitly set a public email in profile settings.
  • license may be null for repos without a declared license, or repos using a license GitHub can't classify (returns spdx_id only, or null).
  • hireable is tri-state (true / false / null — GitHub-account opt-in). README example shows null (the most common state).
  • topics ordering is GitHub-internal — not alphabetical, not by relevance; just the array as returned.
  • Hardcoded UA ApifyGitHubProfileScraper/1.0. GitHub recommends a UA but doesn't inspect it for capability gating.
  • Empty usernames = [] is silently accepted — actor logs No usernames provided. and exits 0.
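The bytes-weighted alternative mentioned in the languages bullet costs one extra API request per repo, which is why the actor leaves it to a custom build. A Python sketch against the real /repos/{owner}/{repo}/languages endpoint; the fetch parameter is injectable so the aggregation math is testable offline:

```python
import json
from collections import Counter
from urllib.request import Request, urlopen

def bytes_weighted_languages(full_names, fetch=None):
    """Aggregate bytes-per-language across repos, GitHub-UI style.
    `fetch(full_name)` must return the /languages dict ({"C": 123456, ...});
    by default it hits the public endpoint (counts against the 60 req/h quota)."""
    if fetch is None:
        def fetch(full_name):
            req = Request(
                f"https://api.github.com/repos/{full_name}/languages",
                headers={"User-Agent": "languages-sketch/0.1"},  # GitHub asks for a UA
            )
            with urlopen(req) as resp:
                return json.load(resp)
    totals = Counter()
    for full_name in full_names:
        totals.update(fetch(full_name))  # one extra API request per repo
    grand_total = sum(totals.values()) or 1
    return {lang: round(100 * b / grand_total, 1) for lang, b in totals.items()}
```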

Python integration

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("knotless_cadence/github-profile-scraper").call(
    run_input={
        "usernames": ["torvalds", "gaearon", "sindresorhus"],
        "maxReposPerUser": 20,
    }
)
for p in client.dataset(run["defaultDatasetId"]).iterate_items():
    top_lang = (p.get("languages") or [{}])[0].get("language", "?")
    print(f"{p['username']:>20} {p['followers']:>7,} followers {p.get('totalStars', 0):>7,} stars top:{top_lang}")
```

| Tool | What it does |
| --- | --- |
| GitHub Trending Scraper | Popular repos by language / time window |
| GitHub Profile Scraper | This — developer profiles + repos |
| GitHub Issues Scraper | Issues and PRs for a repo |

All free to inspect on Apify Store — 31 published actors, 78 total in portfolio.


Common questions

Q: Do I need a GitHub API token? A: For public profiles, no — the actor uses the unauthenticated REST API (60 req/h). With ~30 repos per user, that supports about 30 users per hour before the rate-limit kicks in. The actor handles X-RateLimit-Reset automatically; very large batches will simply pause and resume.

Q: Does this scrape stars given, contribution graph, or sponsors? A: No — those are separate REST endpoints (and the contribution graph is HTML-only). Available as a custom build (see Custom scraping below).

Q: Why is email null even on public-figure accounts? A: GitHub only exposes email when the user has explicitly set a public email in profile settings. For most profiles email is null. Use the blog field as a fallback contact-discovery signal.

Q: Are forked repos included? A: Yes — isForked: true flags them. Filter them out client-side if you want owned-only repos.
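The client-side fork filter is a one-liner over each record's repos array (illustrative helper, not part of the actor):

```python
def owned_repos(profile):
    """Keep only repos the user owns (drop forks), preserving the
    stars-descending order the actor emits."""
    return [r for r in profile.get("repos", []) if not r.get("isForked")]
```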


Custom scraping — pilot tiers

Need authenticated batches, contribution-graph extraction, organization-wide audits, or a different schema (e.g. last-90-days commits per repo)? Three tiers:

  • Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a single OSS-strategy report on one company's contributors.
  • Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most recruiting and competitor-OSS projects fit here.
  • Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly contributor refresh, multi-org rollups, technology-trend tracking).

Email: spinov001@gmail.com — drop the username list and the schema you need; quote within 48h.

Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online


Disclaimer

Designed for recruiting research, OSS-strategy analysis, and academic use. Respect GitHub's Terms of Service, applicable data-protection law (GDPR, CCPA), and scrape publicly visible content only. Not affiliated with GitHub, Inc. or Microsoft Corporation.

Honest disclosure: 19 profile fields (18 from GitHub + scrapedAt) + 15 repo fields per record. Unauthenticated rate-limit is 60 req/h; the actor handles X-RateLimit-Reset automatically (no max-wait safeguard — long resets = long waits). totalStars/totalForks/languages aggregates are computed over the EXTRACTED repos only (capped by maxReposPerUser, max 100), NOT lifetime totals. languages.repoCount is repo count not bytes-weighted. Single API page, no pagination beyond 100 repos. One user's HTTP error halts the batch (outer try/catch). No contribution graph, no commit-activity series, no stars-given list, no sponsors data — those are different endpoints and can be built as custom additions.