🐙 GitHub Scraper: Repos, Stars & Code Data

Pricing

from $20.00 / 1,000 results

Extract repo data from GitHub: stars, forks, contributors, languages, issues & READMEs. Build developer tools, open source analytics & technology trend trackers. Pay per repo.

Rating: 0.0 (0)

Developer: NexGenData

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 3 days ago

🐙 GitHub Scraper: Repos, Stars, Forks, Contributors & Topic Discovery

A pay-per-result GitHub scraper that extracts full repo metadata: stargazer counts, fork graphs, contributor lists, languages, topics, license, README content, and release cadence. It is built for VC scouts, devtool marketers, OSS-funded analysts, and competitive-intelligence teams who want an alternative, with no rate limit for the end user, to GitHub's REST API (5000 req/hr authenticated cap), GraphQL v4 (point quota), GitHub Archive on BigQuery (storage cost), Octoverse-style reports, and SaaS aggregators such as Sourcegraph Cloud ($299+/mo) and OpenSauced.

Why GitHub Scraper Beats the GitHub REST API, GraphQL & Sourcegraph

| Feature | NexGenData GitHub Scraper | GitHub REST API | GitHub GraphQL v4 | Sourcegraph Cloud |
| --- | --- | --- | --- | --- |
| Cost | $0.002 / repo, pay-per-result | Free + rate-limited | Free + node quota | $299-1000+ / month |
| Rate limit | None for end user | 5000 req/hr | 5000 node points/hr | Plan-dependent |
| Auth | Apify token | GitHub PAT (often hits cap) | GitHub PAT | Account + plan |
| Bulk export | Direct dataset → JSON/CSV/Excel | Per-call REST | Per-query GraphQL | UI + limited API |
| Contributor enrichment | Yes | Multi-call required | Multi-query | Yes |
| Topic / trending discovery | Yes | Limited | Limited | Yes |
| Free trial | Free Apify credits on signup | Free for low volume | Free for low volume | 30-day trial |

VC scouts, devtool marketers, and OSS-funded analysts pick this actor instead of building their own PAT-rotation rig to work around GitHub's 5000 req/hr limit. It is a drop-in alternative to GitHub's API for jobs like "pull 50K repos by topic plus the slope of their stargazer history", the kind of query that cannot be done in one GitHub API call.

What You Get Per Repo

Each dataset item is a flat JSON record:

  • owner, name, full_name, html_url, description
  • stars, forks, watchers, open_issues, closed_issues, open_prs, closed_prs
  • primary_language, languages_breakdown (% by bytes)
  • topics, license_spdx, default_branch
  • created_at, updated_at, pushed_at, last_release_at
  • contributors_count, top_contributors: array of {login, contributions}
  • releases: array of {tag_name, published_at, prerelease, download_count}
  • commit_activity_52w: weekly commit counts
  • star_history: sampled time series
  • readme_text, has_wiki, has_pages, archived, disabled
  • funding_links: Open Collective, GitHub Sponsors, Patreon, etc.
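
As a rough illustration of working with one of these records, the sketch below condenses an item into a few headline numbers. It relies only on the field names listed above. The `summarize_repo` helper and the sample record are hypothetical, and the exact value types returned by the actor are an assumption.

```python
from typing import Any


def summarize_repo(item: dict[str, Any]) -> dict[str, Any]:
    """Condense one dataset item into a few headline numbers."""
    contributors = item.get("top_contributors") or []
    releases = item.get("releases") or []

    # Share of the listed contributions made by the most active contributor.
    total = sum(c["contributions"] for c in contributors)
    top = max((c["contributions"] for c in contributors), default=0)
    top_share = top / total if total else 0.0

    # Keep only stable releases (prerelease is false or missing).
    stable = [r for r in releases if not r.get("prerelease")]

    return {
        "full_name": item["full_name"],
        "stars": item["stars"],
        "top_contributor_share": round(top_share, 2),
        "stable_releases": len(stable),
        "stable_downloads": sum(r.get("download_count", 0) for r in stable),
    }


# Made-up record for illustration only; not real output from the actor.
sample = {
    "full_name": "example-org/example-repo",
    "stars": 1200,
    "top_contributors": [
        {"login": "alice", "contributions": 60},
        {"login": "bob", "contributions": 40},
    ],
    "releases": [
        {"tag_name": "v1.0.0", "prerelease": False, "download_count": 500},
        {"tag_name": "v1.1.0-rc1", "prerelease": True, "download_count": 20},
    ],
}
print(summarize_repo(sample))
```

The helper tolerates missing `top_contributors` and `releases`, which may be absent when the corresponding enrichment options are switched off.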

Use Cases

  • VC devtool scouting: surface fast-growing OSS repos in your thesis (LLM tooling, observability, dev experience) without burning GitHub PAT quota
  • Developer-tool marketers: find repos using your competitor's library to retarget
  • OSS funding programs: score grant candidates on commit cadence, contributor diversity, and adoption signals
  • Hiring teams: discover top contributors in a language / topic
  • Investor due-diligence: verify a startup's claim of "X stars in Y months" with raw star-history data
  • Competitive intel: track release cadence + open-issue backlog of a competitor's OSS
  • Newsletters / Substacks: automate "top 10 LLM repos this month" content
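
The commit-cadence scoring mentioned for OSS funding programs could be sketched roughly as follows. This assumes `commit_activity_52w` is a plain list of weekly integers ordered oldest first, which the listing does not confirm. The `commit_cadence` helper and the 13-week window are illustrative choices, not part of the actor.

```python
from typing import Optional


def commit_cadence(weekly_commits: list[int]) -> dict[str, Optional[float]]:
    """Summarize weekly commit counts (assumed oldest week first)."""
    if not weekly_commits:
        return {"active_week_ratio": 0.0, "recent_vs_prior": None}

    # Fraction of weeks with at least one commit.
    active = sum(1 for n in weekly_commits if n > 0)

    # Compare the latest 13 weeks (about a quarter) with the 13 before them.
    recent = sum(weekly_commits[-13:])
    prior = sum(weekly_commits[-26:-13])

    return {
        "active_week_ratio": active / len(weekly_commits),
        # None when there is no prior activity to compare against.
        "recent_vs_prior": recent / prior if prior else None,
    }


# Synthetic series: quiet half-year, then 2 commits/week, then 4 commits/week.
weekly = [0] * 26 + [2] * 13 + [4] * 13
print(commit_cadence(weekly))
```

A ratio above 1 for `recent_vs_prior` suggests activity is picking up; what threshold counts as healthy is a judgment call for the program doing the scoring.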

Quick Start

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("nexgendata/github-scraper").call(run_input={
    "queries": ["topic:llm stars:>500", "language:rust stars:>1000"],
    "includeContributors": True,
    "includeReleases": True,
    "maxResults": 5000,
})

# Iterate over the items in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["full_name"], item["stars"], item["primary_language"])
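
If a local CSV file is more convenient than the dataset view, the items from the loop above can be flattened with the standard library. This is a minimal sketch: the `items_to_csv` helper and the column selection in `FIELDS` are hypothetical, and the field names are taken from the record description in this listing.

```python
import csv
import io
from typing import Any, Iterable

# Columns to export; adjust to the fields you need.
FIELDS = ["full_name", "stars", "forks", "primary_language", "license_spdx"]


def items_to_csv(items: Iterable[dict[str, Any]], fields: list[str] = FIELDS) -> str:
    """Render the selected fields of each item as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for item in items:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({name: item.get(name, "") for name in fields})
    return buffer.getvalue()
```

To use it with a finished run, pass `client.dataset(run["defaultDatasetId"]).iterate_items()` as `items` and write the returned string to a file.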

Pricing

Pay-per-event: no PAT quota, no monthly minimum.

  • Actor Start: $0.0001
  • Per repo: $0.002
  • Per contributor enrichment: $0.0005

A 5000-repo topic sweep with contributors costs about $10-15. Doing the same sweep directly against the GitHub API takes hours of PAT rotation and exponential backoff.
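
The arithmetic behind that estimate can be sketched from the event prices listed above. The sketch assumes the contributor-enrichment event is charged once per enriched repo, which this listing does not spell out; if it is charged per contributor, the total will be higher. The start charge may also vary with memory size. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Event prices in USD, as listed in the Pricing section.
ACTOR_START = Decimal("0.0001")
PER_REPO = Decimal("0.002")
PER_ENRICHMENT = Decimal("0.0005")


def estimate_cost(repos: int, enriched_repos: int = 0) -> Decimal:
    """Estimate the cost of one run in USD.

    Assumes one enrichment event per enriched repo.
    """
    return ACTOR_START + PER_REPO * repos + PER_ENRICHMENT * enriched_repos


print(estimate_cost(5000))        # 10.0001
print(estimate_cost(5000, 5000))  # 12.5001
```

`Decimal` is used instead of floats so that the small per-event prices add up exactly.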

| Use case | Actor |
| --- | --- |
| Daily / weekly trending repos | GitHub Trending Scraper |
| Deep stargazer-history + analytics | GitHub Repo Stats |
| GitLab projects + MRs | GitLab Scraper |
| Docker Hub image pull counts | Docker Hub Scraper |
| npm package download stats | npm Package Stats |
| PyPI package download stats | PyPI Package Stats |
| Dev.to articles + dev audience | Dev.to Scraper |
| Developer-tools MCP server | Developer Tools MCP Server |

FAQ

Q: Why not just use the GitHub API directly? Two reasons: (1) the 5000 req/hr cap throttles anything over ~3000 repos with contributor enrichment, and (2) star history requires sampling the stargazer event stream, a paginated multi-call process that is painful in any GraphQL implementation.

Q: Do you respect GitHub's TOS? We use unauthenticated public-page extraction plus optional authenticated API calls when the user provides a PAT. All data we extract is publicly available without login.

Q: Can I scrape private repos? No, public data only.

Q: How fresh are stargazer counts? Live per run. Star history is sampled at points along the repo's life.

Q: What about commit-level data? This actor stops at repo-level metadata. For commit-diff content extraction, layer on github-repo-stats which deep-dives into individual repos.

Q: Can I filter by topic or language? Yes. The queries input accepts any GitHub search syntax (topic:, language:, stars:, created:, pushed:, etc.).
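
Query strings for the queries input can be assembled by hand or with a small helper. The sketch below is one way to do it; `build_query` and its parameter names are hypothetical, and it covers only a few of the GitHub search qualifiers named in the last answer.

```python
from typing import Optional


def build_query(
    topic: Optional[str] = None,
    language: Optional[str] = None,
    min_stars: Optional[int] = None,
    pushed_after: Optional[str] = None,
) -> str:
    """Join GitHub search qualifiers into one query string."""
    parts = []
    if topic:
        parts.append(f"topic:{topic}")
    if language:
        parts.append(f"language:{language}")
    if min_stars is not None:
        parts.append(f"stars:>{min_stars}")
    if pushed_after:
        # GitHub search takes ISO 8601 dates here, e.g. 2024-01-01.
        parts.append(f"pushed:>{pushed_after}")
    return " ".join(parts)


queries = [
    build_query(topic="llm", min_stars=500),
    build_query(language="rust", min_stars=1000),
]
print(queries)  # ['topic:llm stars:>500', 'language:rust stars:>1000']
```

The resulting list matches the queries used in the Quick Start and can be passed as the `queries` value in `run_input`.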

About NexGenData

NexGenData publishes 260+ buyer-intent actors covering SEC filings, YC alumni, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, and more. All pay-per-result. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b


How NexGenData Pricing Works

Every NexGenData actor uses pay-per-event pricing: you only pay for results that actually land in your dataset. No monthly minimum, no seat fees, no surprise overage bills.

  • Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
  • Result / item: charged per item written to the default dataset
  • No charge for retries, internal proxy rotation, or failed sub-requests: those are absorbed by the platform

Apify Platform Bonus

New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.

Integration Surface

Every actor in the NexGenData catalog can be triggered from:

  • Apify console: point-and-click run
  • Apify API: REST + webhooks
  • Apify Python / JS SDKs: programmatic batch
  • Zapier, Make.com, n8n: official integrations
  • MCP: many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
  • Schedules: built-in cron for daily / weekly / monthly runs
  • Webhooks: POST results to any HTTPS endpoint on dataset write
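
For the Apify API route, a run can be started with a plain HTTP POST instead of the SDK. The sketch below only builds the request and does not send it. It assumes the actor ID from the Quick Start (with "/" written as "~" in the URL) and the input fields shown there; `build_run_request` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST that starts a run of the actor."""
    # In Apify API URLs the "/" in an actor name is written as "~".
    url = f"{API_BASE}/acts/nexgendata~github-scraper/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_APIFY_TOKEN",
    {"queries": ["topic:llm stars:>500"], "maxResults": 100},
)
print(request.get_method(), request.full_url)
# To actually start the run: urllib.request.urlopen(request)
```

Keeping the token in the Authorization header rather than in the URL keeps it out of server logs and shell history.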

Support

NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome; high-demand features ship in the next version.

Home: thenextgennexus.com
Full catalog: apify.com/nexgendata