Pricing

Pay per event

GitHub Scraper — Repos, Issues, PRs & Code

Scrape GitHub deeply — repos, issues, PRs, code search, contributors, releases, READMEs, commits, users, trending. 11 modes in one actor for AI coding agents (Claude Code, Cursor, Copilot). Optional PAT for 5K req/hr. MCP-ready, flat JSON output.

Developer

Khadin Akbar

What it does

Scrapes GitHub deeply through 11 selectable modes:

| Mode | Returns | Required input |
| --- | --- | --- |
| `repo` | Full repo metadata: 50+ fields (stars, forks, languages, topics, license, default branch, latest release) | `repo` |
| `repo-search` | Repos matching a search query, with full GitHub search qualifier syntax (`language:`, `stars:>1000`, `user:`, `topic:`) | `query` |
| `issues` | Issues with labels, assignees, milestones, optional comment thread | `repo` |
| `prs` | Pull requests with reviewers, files changed, mergeable status, optional reviews + comments | `repo` |
| `code-search` | Code search across all of GitHub: file path, repo, sha, text matches (requires GITHUB_TOKEN) | `query` |
| `contributors` | Contributors with login, contribution count, full profile (email, company, location, bio) | `repo` |
| `releases` | Releases with assets, download counts, release notes, prerelease flag | `repo` |
| `readme` | Full README: raw markdown + GFM-rendered HTML | `repo` |
| `commits` | Commit history with author, message, SHA, parents, optional file diffs + stats | `repo` |
| `user` | User or organization profile + their repos + organizations + social accounts | `user` |
| `trending` | Trending repos by language and timeframe (daily / weekly / monthly) | none |
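
The mode table above can be turned into a small pre-flight check on the caller's side. The sketch below is illustrative only: `REQUIRED_INPUT` and `validate_input` are hypothetical names, not part of the actor, and the actor's own validation may behave differently.

```python
# Required input field per mode, copied from the mode table.
# None means the mode has no required input.
REQUIRED_INPUT = {
    "repo": "repo",
    "repo-search": "query",
    "issues": "repo",
    "prs": "repo",
    "code-search": "query",
    "contributors": "repo",
    "releases": "repo",
    "readme": "repo",
    "commits": "repo",
    "user": "user",
    "trending": None,
}


def validate_input(actor_input: dict) -> list:
    """Return a list of problems; an empty list means the input looks usable."""
    mode = actor_input.get("mode")
    if mode not in REQUIRED_INPUT:
        return [f"unknown mode: {mode!r}"]
    required = REQUIRED_INPUT[mode]
    if required is not None and not actor_input.get(required):
        return [f"mode {mode!r} requires the {required!r} field"]
    return []
```

Running this before a call catches a missing `repo` or `query` locally, before a run start is billed.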

When to use it

AI coding agents that need to ground answers in real repo state — issue history, PR reviews, contributor expertise, recent commits.
OSS research — analyze hundreds of repos for tech stack, activity, bus factor, dependency drift.
DevRel sourcing — find maintainers, contributors, and active issue commenters for partnership outreach.
Recruiter pipelines — identify high-signal devs by contribution patterns and language depth.
Competitive intelligence — track competitor open-source releases, issue volume, PR velocity.

When not to use it

Private repositories or GitHub Enterprise Server (use the official gh CLI with auth instead).
Real-time GitHub events — use GitHub webhooks for that.
LinkedIn-style enrichment of GitHub users — see linkedin-profile-email-scraper for that.

Price

apify-actor-start — $0.00005 per run start.
result — $0.005 per record returned for: repo, repo-search, issues (without comments), prs (without reviews/comments), contributors, releases, readme, user, trending.
deep-result — $0.01 per record for heavier modes: code-search, commits with includeFiles=true, PRs with includeReviews=true or includeComments=true, issues with includeComments=true.

A typical agent call (50 repos in repo-search) costs about $0.25. A deep run (100 PRs with reviews + comments) costs about $1.00. The actor stops at maxResults (default 50, hard cap 1000), so even a run billed entirely at the deep-result rate costs at most about $10 (1,000 × $0.01), the x402 default prepay limit for agentic payments.
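
The arithmetic behind these figures can be sketched as a worst-case estimator. This is a rough sketch, not the actor's billing code: it assumes every returned record is billed under exactly one of the two per-record events, and the helper names `is_deep` and `estimate_max_cost` are made up for illustration.

```python
START_FEE = 0.00005      # apify-actor-start, charged once per run
RESULT_FEE = 0.005       # result, per record
DEEP_RESULT_FEE = 0.01   # deep-result, per record
DEFAULT_MAX_RESULTS = 50
HARD_CAP = 1000


def is_deep(actor_input: dict) -> bool:
    """True when the input selects one of the heavier modes listed under deep-result."""
    mode = actor_input.get("mode")
    if mode == "code-search":
        return True
    if mode == "commits":
        return bool(actor_input.get("includeFiles"))
    if mode == "prs":
        return bool(actor_input.get("includeReviews") or actor_input.get("includeComments"))
    if mode == "issues":
        return bool(actor_input.get("includeComments"))
    return False


def estimate_max_cost(actor_input: dict) -> float:
    """Upper bound in USD, assuming the run returns maxResults records."""
    max_results = min(actor_input.get("maxResults", DEFAULT_MAX_RESULTS), HARD_CAP)
    per_record = DEEP_RESULT_FEE if is_deep(actor_input) else RESULT_FEE
    return round(START_FEE + max_results * per_record, 5)
```

For example, `estimate_max_cost({"mode": "repo-search", "query": "mcp", "maxResults": 50})` gives roughly 0.25, matching the typical agent call above. A run that returns fewer records than maxResults costs less.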

Authentication

Without a token, GitHub's REST API allows 60 requests/hour. With a token: 5,000/hour. Code search requires authentication.

Set the GITHUB_TOKEN environment variable in Apify Console → Settings → Environment variables → Add → Secret:

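1. Create a fine-grained Personal Access Token at https://github.com/settings/tokens?type=beta
2. Under Repository access, choose "Public Repositories (read-only)"; that's enough for everything public. Fine-grained tokens use repository access settings rather than classic scopes such as public_repo.
3. Paste the token as GITHUB_TOKEN in the actor's environment variables. Apify masks it automatically.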

The actor never logs the token — apify/log auto-redacts.
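
For readers who want to see what the token changes at the HTTP level, here is a minimal sketch of how a generic client might build GitHub REST headers from GITHUB_TOKEN. It is not the actor's source code. The function name `github_headers` is hypothetical, and the two limits are the figures quoted above.

```python
import os

UNAUTHENTICATED_LIMIT = 60    # requests/hour without a token
AUTHENTICATED_LIMIT = 5000    # requests/hour with a token


def github_headers(env=None):
    """Return (headers, expected hourly limit) for GitHub REST API calls."""
    env = os.environ if env is None else env
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    token = env.get("GITHUB_TOKEN", "").strip()
    if token:
        # The token only ever goes into the header; it is never printed or logged.
        headers["Authorization"] = f"Bearer {token}"
        return headers, AUTHENTICATED_LIMIT
    return headers, UNAUTHENTICATED_LIMIT
```

Passing a plain dict as `env` makes the function easy to exercise without touching real environment variables.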

Example inputs

Full metadata for one repo

{
  "mode": "repo",
  "repo": "facebook/react"
}

Top 50 TypeScript MCP repos

{
  "mode": "repo-search",
  "query": "language:typescript stars:>500 mcp",
  "maxResults": 50
}

Open issues with full conversation threads

{
  "mode": "issues",
  "repo": "apify/actors-mcp-server",
  "state": "open",
  "includeComments": true,
  "maxResults": 100
}

Recently closed PRs (including merged) with reviews

{
  "mode": "prs",
  "repo": "vercel/next.js",
  "state": "closed",
  "includeReviews": true,
  "since": "2026-04-01",
  "maxResults": 200
}

Code search across GitHub

{
  "mode": "code-search",
  "query": "StreamableHTTPServerTransport language:typescript",
  "maxResults": 100
}

Trending Rust repos, daily

{
  "mode": "trending",
  "language": "rust",
  "timeframe": "daily"
}

Last 30 days of commits with file diffs

{
  "mode": "commits",
  "repo": "anthropics/claude-code",
  "since": "2026-04-28",
  "includeFiles": true,
  "maxResults": 200
}
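
Any of the inputs above can be sent to the actor over the Apify API. The sketch below builds a request for the synchronous run-sync-get-dataset-items endpoint using only the standard library. The actor ID `khadinakbar~github-deep-scraper` is an assumption inferred from the MCP URL later in this README, and `build_run_request` and `run_actor` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Assumption: actor ID in the API's "username~actor-name" form.
ACTOR_ID = "khadinakbar~github-deep-scraper"


def build_run_request(actor_input: dict, apify_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {apify_token}",
        },
        method="POST",
    )


def run_actor(actor_input: dict, apify_token: str) -> list:
    """Send the request and decode the JSON array of records."""
    request = build_run_request(actor_input, apify_token)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

A call such as `run_actor({"mode": "repo", "repo": "facebook/react"}, token)` should return a list of records. The synchronous endpoint suits short runs. For large maxResults values, starting an asynchronous run and reading its dataset afterwards is likely the safer pattern.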

Output shape

Every record has mode, type, url, and scrapedAt (ISO 8601 UTC). Mode-specific fields follow. Items are flat, nulls are explicit, dates are ISO 8601. Average item size is under 500 tokens — built to fit inside an agent's context window when sampling 3-20 results.

Sample repo record:

{
  "mode": "repo",
  "type": "repo",
  "owner": "facebook",
  "name": "react",
  "fullName": "facebook/react",
  "description": "The library for web and native user interfaces.",
  "url": "https://github.com/facebook/react",
  "homepage": "https://react.dev",
  "language": "JavaScript",
  "topics": ["react", "frontend", "javascript", "library"],
  "stars": 229000,
  "forks": 46900,
  "watchers": 6700,
  "openIssues": 980,
  "license": "MIT",
  "archived": false,
  "defaultBranch": "main",
  "createdAt": "2013-05-24T16:15:54Z",
  "updatedAt": "2026-05-28T11:20:01Z",
  "pushedAt": "2026-05-28T03:42:11Z",
  "languages": { "JavaScript": 8294122, "TypeScript": 311024, "HTML": 24189 },
  "latestRelease": { "tagName": "v18.3.1", "publishedAt": "2025-04-26T17:42:00Z" },
  "scrapedAt": "2026-05-28T18:14:32Z"
}
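
A consumer can lean on the four common fields and the ISO 8601 timestamps described above. The following is a small sketch of that check; `check_record` and `parse_iso_utc` are illustrative names, and the field list is taken from the Output shape paragraph rather than from a published schema.

```python
from datetime import datetime

COMMON_FIELDS = ("mode", "type", "url", "scrapedAt")


def parse_iso_utc(value: str) -> datetime:
    """Parse timestamps like 2026-05-28T18:14:32Z into timezone-aware datetimes."""
    # Older Python versions do not accept a trailing "Z", so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_record(record: dict) -> datetime:
    """Raise ValueError if a common field is missing; return the parsed scrapedAt."""
    missing = [field for field in COMMON_FIELDS if field not in record]
    if missing:
        raise ValueError(f"record is missing common fields: {missing}")
    return parse_iso_utc(record["scrapedAt"])


def seconds_since_push(record: dict) -> float:
    """How stale the repo was at scrape time, based on pushedAt (repo records only)."""
    delta = check_record(record) - parse_iso_utc(record["pushedAt"])
    return delta.total_seconds()
```

Applied to the sample repo record, `seconds_since_push` yields the gap between the last push and the scrape, which is a cheap activity signal for agents ranking repos.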

Use with MCP

The actor is exposed as khadinakbar--github-deep-scraper in the Apify MCP server. Reach it from any MCP client:

https://mcp.apify.com?tools=khadinakbar/github-deep-scraper

From Claude Code or Cursor, configure the Apify MCP server with your Apify token; the tool is then discoverable through standard MCP list_tools calls. For agents that budget per call, typical runs cost under $1 when maxResults is set sensibly, which stays within the x402 default prepay limit.

Reliability and rate limits

Built-in retry with exponential backoff for 5xx errors.
429 backoff respects GitHub's Retry-After header.
403 rate-limit responses wait until X-RateLimit-Reset (or fail cleanly if the wait exceeds 60 s).
Latest rate-limit state is persisted to the actor's key-value store (RATELIMIT-latest) for inspection.
ETag headers are surfaced via the KV store collection ETAG to enable downstream conditional caching.
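
The retry rules listed above can be summarised as one decision function. This is a sketch of the described behaviour under stated assumptions, not the actor's implementation: the 1-second base delay, the header dictionary with exact-case keys, and the name `retry_delay` are all illustrative.

```python
import time

MAX_RESET_WAIT_S = 60  # give up on 403 rate limits that need a longer wait


def retry_delay(status, headers, attempt, now=None, base=1.0):
    """Return seconds to wait before retrying, or None when the call should not be retried."""
    if 500 <= status < 600:
        # Exponential backoff: base, 2*base, 4*base, ...
        return base * (2 ** attempt)
    if status == 429:
        # Assumes Retry-After carries a number of seconds.
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            return float(retry_after)
        return base * (2 ** attempt)
    if status == 403 and headers.get("X-RateLimit-Remaining") == "0":
        reset = headers.get("X-RateLimit-Reset")  # Unix epoch seconds
        if reset is None:
            return None
        now = time.time() if now is None else now
        wait = max(0.0, float(reset) - now)
        return wait if wait <= MAX_RESET_WAIT_S else None
    return None
```

Keeping the decision separate from the HTTP call makes each rule easy to test in isolation, for instance by passing a fixed `now`.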

FAQ

Why one actor with 11 modes instead of 11 separate actors? Agents call tools by name. Having one tool that covers all GitHub surfaces means the agent picks correctly the first time. Eleven separate actors mean eleven tool-description shootouts and eleven chances to pick the wrong one.

Do I need a GitHub token? For most modes, no — but you'll be capped at 60 requests/hour. Set GITHUB_TOKEN to lift that to 5,000/hour. For code-search, a token is required (GitHub's API forces this).

Does it work for private repositories? No. Use the official GitHub CLI (gh) for private repos. This actor is designed for public data only.

What about GitHub GraphQL? The actor uses REST v3 for stable, paginated endpoints. A future version may switch heavy multi-field reads to GraphQL to halve API quota usage.

How fresh is the data? Real-time. Every run hits GitHub's live API.

Can I run multiple modes in one call? No — one mode per run. Chain runs from your orchestrator (Apify task, n8n, Zapier, agent loop) when you need composite data.
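
Chaining usually means turning the records of one run into the inputs of the next. The sketch below derives one issues input per repo found by a repo-search run. It assumes repo-search records carry a fullName field like the sample repo record, and `issue_inputs_from_search` is a hypothetical helper, not something the actor provides.

```python
def issue_inputs_from_search(search_records, max_results=20):
    """Build one 'issues' actor input per repo returned by a repo-search run."""
    inputs = []
    seen = set()
    for record in search_records:
        full_name = record.get("fullName")
        # Skip error records and duplicates; only "owner/name" values are usable.
        if not full_name or full_name in seen:
            continue
        seen.add(full_name)
        inputs.append(
            {
                "mode": "issues",
                "repo": full_name,
                "state": "open",
                "maxResults": max_results,
            }
        )
    return inputs
```

Each returned dict is a complete input for a separate run, so the orchestrator can fan them out in parallel or run them in sequence to stay inside a budget.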

Legal

This actor accesses GitHub's official REST API and the public github.com/trending HTML page. All endpoints used are public, and none requires authentication except code search, which GitHub serves only to authenticated requests (a token is also recommended for higher rate limits). You are responsible for complying with GitHub's Terms of Service and Acceptable Use Policy. The actor does not scrape private repositories or any data behind login. It does not bypass any technical protection measure. Personal data extracted (contributor names, emails published on profiles) must be handled in accordance with applicable data-protection law (GDPR, CCPA). Apify and the actor author make no warranty about data accuracy, completeness, or fitness for any purpose.

Changelog

2026-05-29 v0.2 — Reliability hardening. Graceful 404/422/451/409 → structured not-found/search-error records instead of failed runs. Canary check at run start. Pre-flight validation. Trending mode now flags invalid language slugs explicitly. 36/36 brutal test matrix green.
2026-05-28 v0.1 — Initial release. 11 modes. REST v3. PAT-aware rate-limit handling. Premium PPE pricing.

Related actors

Google Patents Scraper — search patents, citations & inventor portfolios across USPTO/EPO/WIPO when you need IP context alongside open-source.
Hugging Face Scraper — models, datasets & Spaces for AI research and benchmarking work that pairs with GitHub repo intel.
Y Combinator Scraper — YC company profiles, founders & jobs to enrich repo-author identification and dev-tool competitive scans.
Google SERP Scraper — Google search results for any keyword when you need to combine repo signal with web-wide ranking.
ImportYeti Scraper — US Customs import/supplier graph for industrial intelligence beyond the software layer.

Built by @khadinakbar.
