GitHub Scraper: Repos, Stars & Code Data
Pricing
from $20.00 / 1,000 results
Extract repo data from GitHub: stars, forks, contributors, languages, issues & READMEs. Build developer tools, open source analytics & technology trend trackers. Pay per repo.
Developer: NexGenData (maintained by Community)
GitHub Scraper: Repos, Stars, Forks, Contributors & Topic Discovery
Pay-per-result GitHub scraper that extracts full repo metadata, stargazer counts, fork graphs, contributor lists, languages, topics, license, README content, and release cadence. Built for VC scouts, devtool marketers, OSS-funded analytics, and competitive intelligence as a no-rate-limit alternative to GitHub's REST API (5000 req/hr authenticated cap), GraphQL v4 (point quota), GitHub Archive on BigQuery (storage cost), Octoverse-style reports, and SaaS aggregators like Sourcegraph Cloud ($299+/mo) and OpenSauced.
Why GitHub Scraper Beats the GitHub REST API, GraphQL & Sourcegraph
| Feature | NexGenData GitHub Scraper | GitHub REST API | GitHub GraphQL v4 | Sourcegraph Cloud |
|---|---|---|---|---|
| Cost | $0.002 / repo, pay-per-result | Free + rate-limited | Free + node quota | $299-1000+ / month |
| Rate limit | None for end user | 5000 req/hr | 5000 node points/hr | Plan-dependent |
| Auth | Apify token | GitHub PAT (often hits cap) | GitHub PAT | Account + plan |
| Bulk export | Direct dataset export to JSON/CSV/Excel | Per-call REST | Per-query GraphQL | UI + limited API |
| Contributor enrichment | Yes | Multi-call required | Multi-query | Yes |
| Topic / trending discovery | Yes | Limited | Limited | Yes |
| Free trial | Free Apify credits on signup | Free for low volume | Free for low volume | 30-day trial |
VC scouts, devtool marketers, and OSS-funded analysts pick this actor instead of rolling their own PAT-rotation rig to dodge GitHub's 5000-req/hr limit. It is a drop-in alternative to GitHub's API for "I just need to pull 50K repos by topic + their stargazer-history slope", the kind of query you cannot do in one GitHub API call.
What You Get Per Repo
Each dataset item is a flat JSON record:
- owner, name, full_name, html_url, description
- stars, forks, watchers, open_issues, closed_issues, open_prs, closed_prs
- primary_language, languages_breakdown (% by bytes)
- topics, license_spdx, default_branch
- created_at, updated_at, pushed_at, last_release_at
- contributors_count, top_contributors: array of {login, contributions}
- releases: array of {tag_name, published_at, prerelease, download_count}
- commit_activity_52w: weekly commit counts
- star_history: sampled time series
- readme_text, has_wiki, has_pages, archived, disabled
- funding_links: Open Collective, GitHub Sponsors, Patreon, etc.
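As a rough illustration of consuming such a record, the sketch below maps a few of the listed fields onto a small dataclass. The field types, the defaults, and the sample item are assumptions made for this example; the listing above names the fields but does not specify their types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class RepoRecord:
    """Subset of the per-repo fields listed above; types are assumed."""

    full_name: str
    stars: int = 0
    forks: int = 0
    primary_language: Optional[str] = None
    topics: List[str] = field(default_factory=list)
    archived: bool = False

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "RepoRecord":
        # Use .get() so a missing optional field falls back to a default.
        return cls(
            full_name=item["full_name"],
            stars=int(item.get("stars", 0)),
            forks=int(item.get("forks", 0)),
            primary_language=item.get("primary_language"),
            topics=list(item.get("topics") or []),
            archived=bool(item.get("archived", False)),
        )


# Hypothetical sample item shaped like the field list above.
sample_item = {
    "full_name": "example-org/example-repo",
    "stars": 1234,
    "forks": 56,
    "primary_language": "Rust",
    "topics": ["llm", "cli"],
    "archived": False,
}
record = RepoRecord.from_item(sample_item)
print(record.full_name, record.stars, record.topics)
```

Typing only the fields you actually use keeps downstream code robust if other fields are absent from a given item.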
Use Cases
- VC devtool scouting: surface fast-growing OSS repos in your thesis (LLM tooling, observability, dev experience) without burning GitHub PAT quota
- Developer-tool marketers: find repos using your competitor's library to retarget
- OSS funding programs: score grant candidates on commit cadence, contributor diversity, and adoption signals
- Hiring teams: discover top contributors in a language / topic
- Investor due-diligence: verify a startup's claim of "X stars in Y months" with raw star-history data
- Competitive intel: track release cadence + open-issue backlog of a competitor's OSS
- Newsletters / Substacks: automate "top 10 LLM repos this month" content
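The due-diligence case ("X stars in Y months") could be checked along the following lines. This is a minimal sketch that assumes each star_history sample looks like {"date": "YYYY-MM-DD", "stars": int}. That shape is hypothetical, since the listing only describes star_history as a sampled time series, and the history values here are made up.

```python
from datetime import date


def stars_gained(star_history, start, end):
    """Return the change in star count between two dates.

    Assumes each sample is {"date": "YYYY-MM-DD", "stars": int};
    this shape is an assumption, not a documented format.
    """
    samples = sorted(
        (date.fromisoformat(s["date"]), s["stars"]) for s in star_history
    )

    def count_at(day):
        # Most recent sampled count at or before `day`; 0 before the first sample.
        count = 0
        for sample_day, stars in samples:
            if sample_day > day:
                break
            count = stars
        return count

    return count_at(end) - count_at(start)


# Hypothetical sampled history for one repo.
history = [
    {"date": "2024-01-01", "stars": 100},
    {"date": "2024-04-01", "stars": 900},
    {"date": "2024-07-01", "stars": 2600},
]
gained = stars_gained(history, date(2024, 1, 1), date(2024, 7, 1))
print(gained)
```

Because the series is sampled, the result is only as precise as the spacing between samples around the two dates.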
Quick Start
    from apify_client import ApifyClient

    client = ApifyClient("YOUR_APIFY_TOKEN")

    run = client.actor("nexgendata/github-scraper").call(
        run_input={
            "queries": ["topic:llm stars:>500", "language:rust stars:>1000"],
            "includeContributors": True,
            "includeReleases": True,
            "maxResults": 5000,
        }
    )

    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item["full_name"], item["stars"], item["primary_language"])
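If you would rather write the items to a file than print them, one option is to serialize selected columns with Python's standard csv module. The sketch below uses hypothetical sample items in place of a live dataset; the helper name items_to_csv is illustrative and not part of the Apify client.

```python
import csv
import io


def items_to_csv(items, columns):
    """Serialize dataset items to CSV text, keeping only `columns`."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in `columns`.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        # Missing keys and None values are written as empty cells.
        writer.writerow(item)
    return buffer.getvalue()


# Hypothetical items standing in for records from the dataset.
items = [
    {"full_name": "example-org/alpha", "stars": 1500, "primary_language": "Rust", "forks": 10},
    {"full_name": "example-org/beta", "stars": 700, "primary_language": None},
]
csv_text = items_to_csv(items, ["full_name", "stars", "primary_language"])
print(csv_text)
```

In the Quick Start loop, the iterable returned by iterate_items() could be passed in place of the sample list.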
Pricing
Pay-per-event: no PAT quota, no monthly minimum.
- Actor Start: $0.0001
- Per repo: $0.002
- Per contributor enrichment: $0.0005
A 5000-repo topic sweep with contributors costs about $10-15. The equivalent in GitHub-API terms requires hours of PAT rotation and exponential backoff.
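The arithmetic behind that estimate can be sketched from the three event prices above. The assumption that contributor enrichment is billed once per repo is ours; if it is billed per contributor instead, the enrichment term grows accordingly.

```python
# Event prices taken from the pricing list above (USD).
ACTOR_START_USD = 0.0001
PER_REPO_USD = 0.002
PER_CONTRIBUTOR_ENRICHMENT_USD = 0.0005


def estimate_cost(repos, enrichments=0, runs=1):
    """Estimate the total charge in USD for a sweep."""
    total = (
        runs * ACTOR_START_USD
        + repos * PER_REPO_USD
        + enrichments * PER_CONTRIBUTOR_ENRICHMENT_USD
    )
    return round(total, 4)


# 5000 repos, assuming one enrichment event per repo (an assumption).
cost = estimate_cost(repos=5000, enrichments=5000)
print(f"${cost:.2f}")
```

Under that assumption the sweep comes to roughly $12.50, which sits inside the quoted $10-15 range.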
Related NexGenData Actors
| Use case | Actor |
|---|---|
| Daily / weekly trending repos | GitHub Trending Scraper |
| Deep stargazer-history + analytics | GitHub Repo Stats |
| GitLab projects + MRs | GitLab Scraper |
| Docker Hub image pull counts | Docker Hub Scraper |
| npm package download stats | npm Package Stats |
| PyPI package download stats | PyPI Package Stats |
| Dev.to articles + dev audience | Dev.to Scraper |
| Developer-tools MCP server | Developer Tools MCP Server |
FAQ
Q: Why not just use the GitHub API directly?
Two reasons: (1) the 5000 req/hr cap throttles anything over ~3000 repos with contributor enrichment, and (2) star-history requires sampling the stargazer event stream, a paginated multi-call dance that's painful in any GraphQL implementation.
Q: Do you respect GitHub's TOS?
We use unauthenticated public-page extraction plus optional authenticated API calls when the user provides a PAT. All data we extract is publicly available without login.
Q: Can I scrape private repos?
No. Public data only.
Q: How fresh are stargazer counts?
Live per run. Star history is sampled at points along the repo's life.
Q: What about commit-level data?
This actor stops at repo-level metadata. For commit-diff content extraction, layer on github-repo-stats which deep-dives into individual repos.
Q: Can I filter by topic or language?
Yes. The queries input accepts any GitHub search syntax (topic:, language:, stars:, created:, pushed:, etc.).
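Query strings of this kind can be assembled programmatically. The helper below is a hypothetical convenience for this example, not part of the actor; it simply joins qualifier:value pairs with spaces, which is how GitHub search qualifiers are combined.

```python
def build_query(**qualifiers):
    """Join GitHub search qualifiers into one query string.

    Qualifiers whose value is None are skipped.
    """
    parts = [
        f"{name}:{value}"
        for name, value in qualifiers.items()
        if value is not None
    ]
    return " ".join(parts)


queries = [
    build_query(topic="llm", stars=">500"),
    build_query(language="rust", stars=">1000", pushed=">2024-01-01"),
]
print(queries)
```

The resulting list can be passed as the queries value in the run input shown in the Quick Start.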
About NexGenData
NexGenData publishes 260+ buyer-intent actors covering SEC filings, YC alumni, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, and more. All pay-per-result. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b
How NexGenData Pricing Works
Every NexGenData actor uses pay-per-event pricing: you only pay for results that actually land in your dataset. No monthly minimum, no seat fees, no surprise overage bills.
- Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
- Result / item: charged per item written to the default dataset
- No charge for retries, internal proxy rotation, or failed sub-requests; those are absorbed by the platform
Apify Platform Bonus
New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.
Integration Surface
Every actor in the NexGenData catalog can be triggered from:
- Apify console: point-and-click run
- Apify API: REST + webhooks
- Apify Python / JS SDKs: programmatic batch
- Zapier, Make.com, n8n: official integrations
- MCP: many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
- Schedules: built-in cron for daily / weekly / monthly runs
- Webhooks: POST results to any HTTPS endpoint on dataset write
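For the REST route, a run can be started with a POST to the Apify API's run-actor endpoint. The sketch below only builds the request using the standard library and does not send it. The input keys mirror the Quick Start, and the helper name build_run_request is an assumption made for this example.

```python
import json
import urllib.request

APIFY_API_BASE = "https://api.apify.com/v2"


def build_run_request(token, actor_id, run_input):
    """Build (but do not send) a POST request that starts an actor run."""
    # The Apify API addresses actors as "username~actor-name".
    api_actor_id = actor_id.replace("/", "~")
    return urllib.request.Request(
        url=f"{APIFY_API_BASE}/acts/{api_actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_APIFY_TOKEN",
    "nexgendata/github-scraper",
    {"queries": ["topic:llm stars:>500"], "maxResults": 100},
)
print(request.get_method(), request.full_url)
# To actually start the run, pass the request to urllib.request.urlopen().
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before spending credits.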
Support
NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome; high-demand features ship in the next version.
Home: thenextgennexus.com
Full catalog: apify.com/nexgendata