GitHub Profile — Repos, Stars, Activity, CSV, No Token, Bulk
Pricing
Pay per usage
21 runs. GitHub user intel in CSV/JSON — repos, stars, followers, contribs, languages, bio, email. No API token, no rate blocks. Backed by 951-run Trustpilot flagship + 31-actor portfolio. For recruiter outreach + talent mapping. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Rating: 0.0 (0)
Developer: Alex
Actor stats: 0 bookmarked · 3 total users · 0 monthly active users
Last modified: 2 days ago
GitHub Profile Scraper — Users, Repos, Stars & Language Stats
Pull GitHub user profiles plus their public repositories — bio, follower counts, top repos by stars, language distribution, license info, activity timestamps — using the official GitHub REST API. No token required for public-data lookups (60 req/h unauthenticated rate-limit applies; the actor handles that automatically).
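For context, a single unauthenticated lookup is easy to reproduce outside the actor. A stdlib-only Python sketch (the `profile-demo` User-Agent is a placeholder for this example, not the actor's):

```python
import json
import urllib.request

API = "https://api.github.com"

def profile_url(username: str) -> str:
    """Build the unauthenticated REST endpoint for a public profile."""
    return f"{API}/users/{username}"

def fetch_profile(username: str) -> dict:
    """One profile lookup — no token; counts against the 60 req/h per-IP quota."""
    req = urllib.request.Request(
        profile_url(username),
        headers={
            "User-Agent": "profile-demo",             # GitHub asks for a UA
            "Accept": "application/vnd.github+json",  # current REST media type
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# fetch_profile("torvalds")["followers"]  # live call — uses one request
```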
What you get per user
The actor pushes one record per username. All fields below come from GET /users/{username} and GET /users/{username}/repos — verified against src/main.js.
Profile-level fields (19, including scrapedAt metadata)
| Field | Type | Source |
|---|---|---|
| `username` | string | `user.login` |
| `name` | string | `user.name` |
| `bio` | string | `user.bio` |
| `company` | string | `user.company` |
| `location` | string | `user.location` |
| `email` | string | `user.email` (when public) |
| `blog` | string | `user.blog` |
| `twitterUsername` | string | `user.twitter_username` |
| `avatar` | string (URL) | `user.avatar_url` |
| `profileUrl` | string (URL) | `user.html_url` |
| `publicRepos` | int | `user.public_repos` |
| `publicGists` | int | `user.public_gists` |
| `followers` | int | `user.followers` |
| `following` | int | `user.following` |
| `createdAt` | string (ISO 8601) | `user.created_at` |
| `updatedAt` | string (ISO 8601) | `user.updated_at` |
| `hireable` | bool | `user.hireable` |
| `type` | string | `User` or `Organization` |
| `scrapedAt` | string (ISO 8601) | run timestamp |
Repo-level fields (15 per repo, when includeRepos=true)
name, fullName, description, url, stars, forks, watchers, language, topics[], isForked, createdAt, updatedAt, pushedAt, license, openIssues
Plus aggregates on the parent profile record (computed from the EXTRACTED repos only — see Honest limitations):
- `totalStars` — sum across the extracted repos (capped by `maxReposPerUser`, NOT total stars across all of the user's repos)
- `totalForks` — sum across the extracted repos (same caveat)
- `languages` — array of `{language, repoCount}`, sorted desc by repo count. `repoCount` is "how many of the extracted repos have this as their `language` field" — this is NOT GitHub's bytes-weighted language graph; for that you need the `/repos/{owner}/{repo}/languages` endpoint per repo (custom build available).
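The aggregate computation is straightforward to reproduce. An illustrative Python sketch (the actor itself is JavaScript; field names follow the repo schema above):

```python
from collections import Counter

def aggregate(repos: list[dict]) -> dict:
    """Compute totalStars/totalForks/languages over the EXTRACTED repos only."""
    lang_counts = Counter(r["language"] for r in repos if r.get("language"))
    return {
        "totalStars": sum(r["stars"] for r in repos),
        "totalForks": sum(r["forks"] for r in repos),
        # repo-count based — NOT bytes-weighted like GitHub's language graph
        "languages": [
            {"language": lang, "repoCount": n}
            for lang, n in lang_counts.most_common()
        ],
    }

sample = [
    {"stars": 10, "forks": 2, "language": "C"},
    {"stars": 5, "forks": 1, "language": "C"},
    {"stars": 3, "forks": 0, "language": "Shell"},
]
# aggregate(sample) → totalStars 18, totalForks 3, top language C (repoCount 2)
```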
Output example
```json
{
  "username": "torvalds",
  "name": "Linus Torvalds",
  "bio": null,
  "company": "Linux Foundation",
  "location": "Portland, OR",
  "email": null,
  "blog": "",
  "twitterUsername": null,
  "avatar": "https://avatars.githubusercontent.com/u/1024025?v=4",
  "profileUrl": "https://github.com/torvalds",
  "publicRepos": 7,
  "publicGists": 0,
  "followers": 220000,
  "following": 0,
  "createdAt": "2011-09-03T15:26:22Z",
  "updatedAt": "2026-04-20T10:11:12Z",
  "hireable": null,
  "type": "User",
  "totalStars": 185000,
  "totalForks": 55000,
  "languages": [
    { "language": "C", "repoCount": 4 },
    { "language": "Shell", "repoCount": 1 }
  ],
  "repos": [
    {
      "name": "linux",
      "fullName": "torvalds/linux",
      "description": "Linux kernel source tree",
      "url": "https://github.com/torvalds/linux",
      "stars": 175000,
      "forks": 53000,
      "watchers": 175000,
      "language": "C",
      "topics": ["linux", "kernel"],
      "isForked": false,
      "createdAt": "2011-09-04T22:48:12Z",
      "updatedAt": "2026-04-29T08:00:00Z",
      "pushedAt": "2026-04-29T07:55:00Z",
      "license": "GPL-2.0",
      "openIssues": 1
    }
  ],
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
```
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| `usernames` | array | `[]` | GitHub usernames (e.g. `["torvalds", "gaearon"]`) |
| `includeRepos` | boolean | `true` | If `false`, profile-only — skip repo + language extraction |
| `maxReposPerUser` | integer | `30` | Cap per user, sorted by stars desc (1–100; 100 is the GitHub API page-size ceiling, enforced by the actor) |
| `includeLanguageStats` | boolean | `true` | Aggregate language distribution across the extracted repos |
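A typical input payload, using the field names and defaults from the table above:

```json
{
  "usernames": ["torvalds", "gaearon"],
  "includeRepos": true,
  "maxReposPerUser": 30,
  "includeLanguageStats": true
}
```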
Use cases
- Developer recruiting — evaluate candidates by their open-source footprint, language mix, and active-repo cadence (`pushedAt`).
- Competitor analysis — fingerprint the OSS strategy of a company by mapping its top contributors' top repos.
- Community research — identify influential developers in a technology ecosystem (high stars + matching `topics`).
- Portfolio benchmarking — compare repo activity, fork ratios, and license distributions across a candidate set.
- Lead generation — surface developers with specific technology expertise (filter by `languages[*].language`).
How it works
1. `GET /users/{username}` → 18 fields from GitHub + `scrapedAt` (19 total).
2. (If `includeRepos`) `GET /users/{username}/repos?sort=stars&direction=desc&per_page=<maxReposPerUser>` → 15 fields per repo. Single page only — no pagination beyond GitHub's per-page ceiling (100).
3. Aggregate `totalStars`, `totalForks`, and `languages` from the extracted repo set (capped by `maxReposPerUser`).
4. On HTTP 403 with `X-RateLimit-Remaining: 0`, the actor reads `X-RateLimit-Reset`, sleeps until the window resets (+5 s buffer), then retries. No manual handling required, but unauthenticated mode caps you at 60 req/h — set up an authenticated proxy actor (custom build) if you need higher throughput.
5. Delays: 500 ms between the user-profile fetch and the repo-list fetch; 1 s between users.
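The rate-limit step described above can be sketched as follows. This is an illustrative Python re-implementation (the actual actor is JavaScript in `src/main.js`), matching the described behaviour: reset epoch + 5 s buffer, no max-wait safeguard:

```python
def is_rate_limited(status: int, headers: dict) -> bool:
    """GitHub signals an exhausted unauthenticated quota with 403 + remaining 0."""
    return status == 403 and headers.get("X-RateLimit-Remaining") == "0"

def seconds_until_reset(headers: dict, now: float, buffer_s: int = 5) -> float:
    """How long to sleep before retrying.

    X-RateLimit-Reset is epoch seconds; sleep until that moment plus a small
    buffer. Deliberately no upper bound — mirrors the 'no max-wait' caveat.
    """
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now) + buffer_s

# A 403 with a reset timestamp 120 s ahead → sleep ~125 s, then retry.
```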
Honest limitations (read before bulk runs)
- No pagination beyond `maxReposPerUser` (GitHub `per_page` ceiling = 100). For users with hundreds of repos, only the top-N by stars are returned. If you need ALL repos, request a paginated custom build.
- `totalStars`/`totalForks` are EXTRACTED-repo aggregates, NOT lifetime totals. A user with 200 repos and `maxReposPerUser=30` will show `totalStars` summed over the top 30 only — not all 200.
- `languages` is `repoCount`, NOT bytes-weighted. GitHub's UI shows percent of repo bytes per language; this actor counts how many repos list each language as their `language` field. Bytes-based language stats require per-repo `/languages` calls (custom build).
- One user's HTTP error halts the entire batch. The `for (const username of usernames)` loop is wrapped in ONE outer `try/catch`; a 502/503 on user #5 of 100 stops the run. Workaround: split large batches into ≤25-username runs, or request a per-user try/catch custom build.
- Rate-limit retry has no max-wait safeguard. If `X-RateLimit-Reset` is 60 minutes ahead, the actor sleeps 60 minutes (+5 s) before retrying. For a 100-user batch hitting the unauthenticated cap, total wall-clock can stretch to 1.5–2 h. For high-volume runs, request an authenticated custom build (5,000 req/h with a PAT).
- No proxy. Direct fetch from the Apify worker IP. GitHub's unauthenticated rate limit is per-IP, so co-tenant noise on the same Apify IP can shrink your effective quota.
- `email` is `null` for most users. GitHub only exposes `email` when the user has explicitly set a public email in profile settings.
- `license` may be `null` for repos without a declared license, or for repos using a license GitHub can't classify (returns `spdx_id` only, or `null`).
- `hireable` is tri-state (`true`/`false`/`null` — a GitHub-account opt-in). The README example shows `null` (the most common state).
- `topics` ordering is GitHub-internal — not alphabetical, not by relevance; just the array as returned.
- Hardcoded UA `ApifyGitHubProfileScraper/1.0`. GitHub recommends a User-Agent but doesn't inspect it for capability gating.
- Empty `usernames = []` is silently accepted — the actor logs `No usernames provided.` and exits 0.
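The ≤25-username workaround from the list above is easy to automate client-side. A hypothetical helper (not part of the actor) that splits a big list into safe batch sizes:

```python
def chunk_usernames(usernames: list[str], size: int = 25) -> list[list[str]]:
    """Split a large username list into batches of at most `size`,
    so one user's HTTP error can only sink a single small run."""
    return [usernames[i:i + size] for i in range(0, len(usernames), size)]

# 60 names → three runs of 25, 25, 10; submit each as a separate actor run.
batches = chunk_usernames([f"user{i}" for i in range(60)])
```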
Python integration
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("knotless_cadence/github-profile-scraper").call(run_input={
    "usernames": ["torvalds", "gaearon", "sindresorhus"],
    "maxReposPerUser": 20,
})

for p in client.dataset(run["defaultDatasetId"]).iterate_items():
    top_lang = (p.get("languages") or [{}])[0].get("language", "?")
    print(f"{p['username']:>20} {p['followers']:>7,} followers "
          f"{p.get('totalStars', 0):>7,} stars top:{top_lang}")
```
GitHub data toolkit (related actors)
| Tool | What it does |
|---|---|
| GitHub Trending Scraper | Popular repos by language / time window |
| GitHub Profile Scraper | This — developer profiles + repos |
| GitHub Issues Scraper | Issues and PRs for a repo |
All free to inspect on Apify Store — 31 published actors, 78 total in portfolio.
Common questions
Q: Do I need a GitHub API token?
A: For public profiles, no — the actor uses the unauthenticated REST API (60 req/h). Each user costs two requests (profile + one repo page), so roughly 30 users per hour before the rate limit kicks in. The actor handles X-RateLimit-Reset automatically; very large batches will simply pause and resume.
Q: Does this scrape stars given, contribution graph, or sponsors?
A: No — those are separate REST endpoints (and the contribution graph is HTML-only). Available as a custom build (see Custom scraping below).
Q: Why is email null even on public-figure accounts?
A: GitHub only exposes email when the user has explicitly set a public email in profile settings. For most profiles email is null. Use the blog field as a fallback contact-discovery signal.
Q: Are forked repos included?
A: Yes — isForked: true flags them. Filter them out client-side if you want owned-only repos.
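The client-side filter mentioned above is one line per record. A sketch assuming the record shape from the output example:

```python
def owned_repos(record: dict) -> list[dict]:
    """Keep only repos the user owns — drop forks, per the isForked flag."""
    return [r for r in record.get("repos", []) if not r.get("isForked")]

record = {"repos": [
    {"name": "linux", "isForked": False},
    {"name": "some-fork", "isForked": True},
]}
# owned_repos(record) keeps only "linux"
```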
Custom scraping — pilot tiers
Need authenticated batches, contribution-graph extraction, organization-wide audits, or a different schema (e.g. last-90-days commits per repo)? Three tiers:
- Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a single OSS-strategy report on one company's contributors.
- Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most recruiting and competitor-OSS projects fit here.
- Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly contributor refresh, multi-org rollups, technology-trend tracking).
Email: spinov001@gmail.com — drop the username list and the schema you need; quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Disclaimer
Designed for recruiting research, OSS-strategy analysis, and academic use. Respect GitHub's Terms of Service, applicable data-protection law (GDPR, CCPA), and scrape publicly visible content only. Not affiliated with GitHub, Inc. or Microsoft Corporation.
Honest disclosure: 19 profile fields (18 from GitHub + scrapedAt) + 15 repo fields per record. Unauthenticated rate-limit is 60 req/h; the actor handles X-RateLimit-Reset automatically (no max-wait safeguard — long resets = long waits). totalStars/totalForks/languages aggregates are computed over the EXTRACTED repos only (capped by maxReposPerUser, max 100), NOT lifetime totals. languages.repoCount is repo count not bytes-weighted. Single API page, no pagination beyond 100 repos. One user's HTTP error halts the batch (outer try/catch). No contribution graph, no commit-activity series, no stars-given list, no sponsors data — those are different endpoints and can be built as custom additions.