GitHub Profile — Repos, Stars, Activity, CSV, No Token, Bulk
Pricing
Pay per usage
21 runs. GitHub user intel in CSV/JSON — repos, stars, followers, contribs, languages, bio, email. No API token, no rate blocks. Backed by 951-run Trustpilot flagship + 31-actor portfolio. For recruiter outreach + talent mapping. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Rating: 0.0 (0)
Developer: Alex
Actor stats: 0 bookmarked · 3 total users · 0 monthly active users
Last modified: 2 days ago
GitHub Profile Scraper — Users, Repos, Stars & Language Stats
Pull GitHub user profiles plus their public repositories — bio, follower counts, top repos by stars, language distribution, license info, activity timestamps — using the official GitHub REST API. No token required for public-data lookups (60 req/h unauthenticated rate-limit applies; the actor handles that automatically).
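For context, a single unauthenticated lookup is easy to reproduce outside the actor. A stdlib-only Python sketch (the `profile-demo` User-Agent is a placeholder for this example, not the actor's):

```python
import json
import urllib.request

API = "https://api.github.com"

def profile_url(username: str) -> str:
    """Build the unauthenticated REST endpoint for a public profile."""
    return f"{API}/users/{username}"

def fetch_profile(username: str) -> dict:
    """One profile lookup — no token; counts against the 60 req/h per-IP quota."""
    req = urllib.request.Request(
        profile_url(username),
        headers={
            "User-Agent": "profile-demo",             # GitHub asks for a UA
            "Accept": "application/vnd.github+json",  # current REST media type
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# fetch_profile("torvalds")["followers"]  # live call — uses one request
```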
What you get per user
The actor pushes one record per username. All fields below come from GET /users/{username} and GET /users/{username}/repos — verified against src/main.js.
Profile-level fields (19, including scrapedAt metadata)
| Field | Type | Source |
|---|---|---|
| `username` | string | `user.login` |
| `name` | string | `user.name` |
| `bio` | string | `user.bio` |
| `company` | string | `user.company` |
| `location` | string | `user.location` |
| `email` | string | `user.email` (when public) |
| `blog` | string | `user.blog` |
| `twitterUsername` | string | `user.twitter_username` |
| `avatar` | string (URL) | `user.avatar_url` |
| `profileUrl` | string (URL) | `user.html_url` |
| `publicRepos` | int | `user.public_repos` |
| `publicGists` | int | `user.public_gists` |
| `followers` | int | `user.followers` |
| `following` | int | `user.following` |
| `createdAt` | string (ISO 8601) | `user.created_at` |
| `updatedAt` | string (ISO 8601) | `user.updated_at` |
| `hireable` | bool | `user.hireable` |
| `type` | string | `User` or `Organization` |
| `scrapedAt` | string (ISO 8601) | run timestamp |
Repo-level fields (15 per repo, when includeRepos=true)
name, fullName, description, url, stars, forks, watchers, language, topics[], isForked, createdAt, updatedAt, pushedAt, license, openIssues
Plus aggregates on the parent profile record (computed from the EXTRACTED repos only — see Honest limitations):
- `totalStars` — sum across the extracted repos (capped by `maxReposPerUser`, NOT total stars across all of the user's repos)
- `totalForks` — sum across the extracted repos (same caveat)
- `languages` — array of `{language, repoCount}`, sorted desc by repo count. `repoCount` is "how many of the extracted repos have this as their `language` field" — this is NOT GitHub's bytes-weighted language graph; for that you need the `/repos/{owner}/{repo}/languages` endpoint per repo (custom build available).
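The aggregate computation is straightforward to reproduce. An illustrative Python sketch (the actor itself is JavaScript; field names follow the repo schema above):

```python
from collections import Counter

def aggregate(repos: list[dict]) -> dict:
    """Compute totalStars/totalForks/languages over the EXTRACTED repos only."""
    lang_counts = Counter(r["language"] for r in repos if r.get("language"))
    return {
        "totalStars": sum(r["stars"] for r in repos),
        "totalForks": sum(r["forks"] for r in repos),
        # repo-count based — NOT bytes-weighted like GitHub's language graph
        "languages": [
            {"language": lang, "repoCount": n}
            for lang, n in lang_counts.most_common()
        ],
    }

sample = [
    {"stars": 10, "forks": 2, "language": "C"},
    {"stars": 5, "forks": 1, "language": "C"},
    {"stars": 3, "forks": 0, "language": "Shell"},
]
# aggregate(sample) → totalStars 18, totalForks 3, top language C (repoCount 2)
```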
Output example
```json
{
  "username": "torvalds",
  "name": "Linus Torvalds",
  "bio": null,
  "company": "Linux Foundation",
  "location": "Portland, OR",
  "email": null,
  "blog": "",
  "twitterUsername": null,
  "avatar": "https://avatars.githubusercontent.com/u/1024025?v=4",
  "profileUrl": "https://github.com/torvalds",
  "publicRepos": 7,
  "publicGists": 0,
  "followers": 220000,
  "following": 0,
  "createdAt": "2011-09-03T15:26:22Z",
  "updatedAt": "2026-04-20T10:11:12Z",
  "hireable": null,
  "type": "User",
  "totalStars": 185000,
  "totalForks": 55000,
  "languages": [
    { "language": "C", "repoCount": 4 },
    { "language": "Shell", "repoCount": 1 }
  ],
  "repos": [
    {
      "name": "linux",
      "fullName": "torvalds/linux",
      "description": "Linux kernel source tree",
      "url": "https://github.com/torvalds/linux",
      "stars": 175000,
      "forks": 53000,
      "watchers": 175000,
      "language": "C",
      "topics": ["linux", "kernel"],
      "isForked": false,
      "createdAt": "2011-09-04T22:48:12Z",
      "updatedAt": "2026-04-29T08:00:00Z",
      "pushedAt": "2026-04-29T07:55:00Z",
      "license": "GPL-2.0",
      "openIssues": 1
    }
  ],
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
```
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| `usernames` | array | `[]` | GitHub usernames (e.g. `["torvalds", "gaearon"]`) |
| `includeRepos` | boolean | `true` | If `false`, profile-only — skip repo + language extraction |
| `maxReposPerUser` | integer | `30` | Cap per user, sorted by stars desc (1–100; 100 is the GitHub API page-size ceiling, enforced by the actor) |
| `includeLanguageStats` | boolean | `true` | Aggregate language distribution across the extracted repos |
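A typical input payload, using the field names and defaults from the table above:

```json
{
  "usernames": ["torvalds", "gaearon"],
  "includeRepos": true,
  "maxReposPerUser": 30,
  "includeLanguageStats": true
}
```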
Use cases
- Developer recruiting — evaluate candidates by their open-source footprint, language mix, and active-repo cadence (`pushedAt`).
- Competitor analysis — fingerprint the OSS strategy of a company by mapping its top contributors' top repos.
- Community research — identify influential developers in a technology ecosystem (high stars + matching `topics`).
- Portfolio benchmarking — compare repo activity, fork ratios, and license distributions across a candidate set.
- Lead generation — surface developers with specific technology expertise (filter by `languages[*].language`).
How it works
1. `GET /users/{username}` → 18 fields from GitHub + `scrapedAt` (19 total).
2. (If `includeRepos`) `GET /users/{username}/repos?sort=stars&direction=desc&per_page=<maxReposPerUser>` → 15 fields per repo. Single page only — no pagination beyond GitHub's per-page ceiling (100).
3. Aggregate `totalStars`, `totalForks`, and `languages` from the extracted repo set (capped by `maxReposPerUser`).
4. On HTTP 403 with `X-RateLimit-Remaining: 0`, the actor reads `X-RateLimit-Reset`, sleeps until the window resets (+5 s buffer), then retries. No manual handling required, but unauthenticated mode caps you at 60 req/h — set up an authenticated proxy actor (custom build) if you need higher throughput.
5. Delays: 500 ms between the user-profile fetch and the repo-list fetch; 1 s between users.
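The rate-limit step described above can be sketched as follows. This is an illustrative Python re-implementation (the actual actor is JavaScript in `src/main.js`), matching the described behaviour: reset epoch + 5 s buffer, no max-wait safeguard:

```python
def is_rate_limited(status: int, headers: dict) -> bool:
    """GitHub signals an exhausted unauthenticated quota with 403 + remaining 0."""
    return status == 403 and headers.get("X-RateLimit-Remaining") == "0"

def seconds_until_reset(headers: dict, now: float, buffer_s: int = 5) -> float:
    """How long to sleep before retrying.

    X-RateLimit-Reset is epoch seconds; sleep until that moment plus a small
    buffer. Deliberately no upper bound — mirrors the 'no max-wait' caveat.
    """
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now) + buffer_s

# A 403 with a reset timestamp 120 s ahead → sleep ~125 s, then retry.
```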
Honest limitations (read before bulk runs)
- No pagination beyond `maxReposPerUser` (GitHub `per_page` ceiling = 100). For users with hundreds of repos, only the top-N by stars are returned. If you need ALL repos, request a paginated custom build.
- `totalStars`/`totalForks` are EXTRACTED-repo aggregates, NOT lifetime totals. A user with 200 repos and `maxReposPerUser=30` will show `totalStars` summed over the top 30 only — not all 200.
- `languages` is `repoCount`, NOT bytes-weighted. GitHub's UI shows percent of repo bytes per language; this actor counts how many repos list each language as their `language` field. Bytes-based language stats require per-repo `/languages` calls (custom build).
- One user's HTTP error halts the entire batch. The `for (const username of usernames)` loop is wrapped in ONE outer `try/catch`; a 502/503 on user #5 of 100 stops the run. Workaround: split large batches into ≤25-username runs, or request a per-user try/catch custom build.
- Rate-limit retry has no max-wait safeguard. If `X-RateLimit-Reset` is 60 minutes ahead, the actor sleeps 60 minutes (+5 s) before retrying. For a 100-user batch hitting the unauthenticated cap, total wall-clock can stretch to 1.5–2 h. For high-volume runs, request an authenticated custom build (5,000 req/h with a PAT).
- No proxy. Direct fetch from the Apify worker IP. GitHub's unauthenticated rate limit is per-IP, so co-tenant noise on the same Apify IP can shrink your effective quota.
- `email` is `null` for most users. GitHub only exposes `email` when the user has explicitly set a public email in profile settings.
- `license` may be `null` for repos without a declared license, or for repos using a license GitHub can't classify (returns `spdx_id` only, or `null`).
- `hireable` is tri-state (`true`/`false`/`null` — a GitHub-account opt-in). The README example shows `null` (the most common state).
- `topics` ordering is GitHub-internal — not alphabetical, not by relevance; just the array as returned.
- Hardcoded UA `ApifyGitHubProfileScraper/1.0`. GitHub recommends a User-Agent but doesn't inspect it for capability gating.
- Empty `usernames = []` is silently accepted — the actor logs `No usernames provided.` and exits 0.
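The ≤25-username workaround from the list above is easy to automate client-side. A hypothetical helper (not part of the actor) that splits a big list into safe batch sizes:

```python
def chunk_usernames(usernames: list[str], size: int = 25) -> list[list[str]]:
    """Split a large username list into batches of at most `size`,
    so one user's HTTP error can only sink a single small run."""
    return [usernames[i:i + size] for i in range(0, len(usernames), size)]

# 60 names → three runs of 25, 25, 10; submit each as a separate actor run.
batches = chunk_usernames([f"user{i}" for i in range(60)])
```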
Python integration
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("knotless_cadence/github-profile-scraper").call(run_input={
    "usernames": ["torvalds", "gaearon", "sindresorhus"],
    "maxReposPerUser": 20,
})

for p in client.dataset(run["defaultDatasetId"]).iterate_items():
    top_lang = (p.get("languages") or [{}])[0].get("language", "?")
    print(f"{p['username']:>20} {p['followers']:>7,} followers "
          f"{p.get('totalStars', 0):>7,} stars top:{top_lang}")
```
GitHub data toolkit (related actors)
| Tool | What it does |
|---|---|
| GitHub Trending Scraper | Popular repos by language / time window |
| GitHub Profile Scraper | This — developer profiles + repos |
| GitHub Issues Scraper | Issues and PRs for a repo |
All free to inspect on Apify Store — 31 published actors, 78 total in portfolio.
Common questions
Q: Do I need a GitHub API token?
A: For public profiles, no — the actor uses the unauthenticated REST API (60 req/h). Each user costs two requests (profile + one repo page), so roughly 30 users per hour before the rate limit kicks in. The actor handles X-RateLimit-Reset automatically; very large batches will simply pause and resume.
Q: Does this scrape stars given, contribution graph, or sponsors?
A: No — those are separate REST endpoints (and the contribution graph is HTML-only). Available as a custom build (see Custom scraping below).
Q: Why is email null even on public-figure accounts?
A: GitHub only exposes email when the user has explicitly set a public email in profile settings. For most profiles email is null. Use the blog field as a fallback contact-discovery signal.
Q: Are forked repos included?
A: Yes — isForked: true flags them. Filter them out client-side if you want owned-only repos.
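The client-side filter mentioned above is one line per record. A sketch assuming the record shape from the output example:

```python
def owned_repos(record: dict) -> list[dict]:
    """Keep only repos the user owns — drop forks, per the isForked flag."""
    return [r for r in record.get("repos", []) if not r.get("isForked")]

record = {"repos": [
    {"name": "linux", "isForked": False},
    {"name": "some-fork", "isForked": True},
]}
# owned_repos(record) keeps only "linux"
```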
Custom scraping — pilot tiers
Need authenticated batches, contribution-graph extraction, organization-wide audits, or a different schema (e.g. last-90-days commits per repo)? Three tiers:
- Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a single OSS-strategy report on one company's contributors.
- Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most recruiting and competitor-OSS projects fit here.
- Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly contributor refresh, multi-org rollups, technology-trend tracking).
Email: spinov001@gmail.com — drop the username list and the schema you need; quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Disclaimer
Designed for recruiting research, OSS-strategy analysis, and academic use. Respect GitHub's Terms of Service, applicable data-protection law (GDPR, CCPA), and scrape publicly visible content only. Not affiliated with GitHub, Inc. or Microsoft Corporation.
Honest disclosure: 19 profile fields (18 from GitHub + scrapedAt) + 15 repo fields per record. Unauthenticated rate-limit is 60 req/h; the actor handles X-RateLimit-Reset automatically (no max-wait safeguard — long resets = long waits). totalStars/totalForks/languages aggregates are computed over the EXTRACTED repos only (capped by maxReposPerUser, max 100), NOT lifetime totals. languages.repoCount is repo count not bytes-weighted. Single API page, no pagination beyond 100 repos. One user's HTTP error halts the batch (outer try/catch). No contribution graph, no commit-activity series, no stars-given list, no sponsors data — those are different endpoints and can be built as custom additions.