Pricing

Pay per event

GitHub Scraper — Repos, Issues, PRs & Code

Scrape GitHub deeply — repos, issues, PRs, code search, contributors, releases, READMEs, commits, users, trending. 11 modes in one actor for AI coding agents (Claude Code, Cursor, Copilot). Optional PAT for 5K req/hr. MCP-ready, flat JSON output.

Developer

Khadin Akbar

Last modified

a day ago

GitHub Deep Scraper — Repos, Issues, PRs, Code Search, Commits

The GitHub data layer for AI coding agents. One actor, 11 modes, built on GitHub's official REST API. Built for Claude Code, Cursor, GitHub Copilot, Aider, and any agent that needs rich GitHub context in a single tool call instead of stitching five narrow scrapers together.

What it does

Scrapes GitHub deeply through 11 selectable modes:

Mode	Returns	Required input
`repo`	Full repo metadata — 50+ fields (stars, forks, languages, topics, license, default branch, latest release)	`repo`
`repo-search`	Repos matching a search query — full GitHub search qualifier syntax (`language:`, `stars:>1000`, `user:`, `topic:`)	`query`
`issues`	Issues with labels, assignees, milestones, optional comment thread	`repo`
`prs`	Pull requests with reviewers, files changed, mergeable status, optional reviews + comments	`repo`
`code-search`	Code search across all of GitHub — file path, repo, sha, text matches (requires GITHUB_TOKEN)	`query`
`contributors`	Contributors with login, contribution count, full profile (email, company, location, bio)	`repo`
`releases`	Releases with assets, download counts, release notes, prerelease flag	`repo`
`readme`	Full README — raw markdown + GFM-rendered HTML	`repo`
`commits`	Commit history with author, message, SHA, parents, optional file diffs + stats	`repo`
`user`	User or organization profile + their repos + organizations + social accounts	`user`
`trending`	Trending repos by language and timeframe (daily / weekly / monthly)	none
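
The required-input column above can be checked before a run is started. The following is a minimal sketch, not part of the actor: the `REQUIRED_INPUT` mapping and the `check_input` helper are hypothetical names, and the mapping simply restates the table.

```python
# Required input field per mode, restated from the table above.
# The names REQUIRED_INPUT and check_input are illustrative, not the actor's API.
REQUIRED_INPUT = {
    "repo": "repo",
    "repo-search": "query",
    "issues": "repo",
    "prs": "repo",
    "code-search": "query",
    "contributors": "repo",
    "releases": "repo",
    "readme": "repo",
    "commits": "repo",
    "user": "user",
    "trending": None,  # no required input
}


def check_input(run_input: dict) -> list:
    """Return a list of problems; an empty list means the input looks usable."""
    mode = run_input.get("mode")
    if mode not in REQUIRED_INPUT:
        return [f"unknown mode: {mode!r}"]
    problems = []
    required = REQUIRED_INPUT[mode]
    # A missing or empty required field is reported rather than raised.
    if required is not None and not run_input.get(required):
        problems.append(f"mode {mode!r} requires {required!r}")
    return problems


print(check_input({"mode": "issues"}))
print(check_input({"mode": "trending"}))
```

Running a check like this locally is cheaper than starting a run that fails validation, since every run start is billed.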

When to use it

AI coding agents that need to ground answers in real repo state — issue history, PR reviews, contributor expertise, recent commits.
OSS research — analyze hundreds of repos for tech stack, activity, bus factor, dependency drift.
DevRel sourcing — find maintainers, contributors, and active issue commenters for partnership outreach.
Recruiter pipelines — identify high-signal devs by contribution patterns and language depth.
Competitive intelligence — track competitor open-source releases, issue volume, PR velocity.

When not to use it

Private repositories or GitHub Enterprise Server (use the official gh CLI with auth instead).
Real-time GitHub events — use GitHub webhooks for that.
LinkedIn-style enrichment of GitHub users — see linkedin-profile-email-scraper for that.

Price

apify-actor-start — $0.00005 per run start.
result — $0.005 per record returned for: repo, repo-search, issues (without comments), prs (without reviews/comments), contributors, releases, readme, user, trending.
deep-result — $0.01 per record for heavier modes: code-search, commits with includeFiles=true, PRs with includeReviews=true or includeComments=true, issues with includeComments=true.

A typical agent call (50 repos in repo-search) costs about $0.25. A deep run (100 PRs with reviews + comments) costs about $1.00. The actor stops at maxResults (default 50, hard cap 1000), so even a maximum-size deep run costs about $10 (1,000 × $0.01), the x402 default prepay limit for agentic payments.
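
The arithmetic behind these figures can be sketched as a small estimator. This is an illustration based only on the prices listed above; `is_deep` and `estimate_cost` are hypothetical helpers, and treating commits without includeFiles as a plain result is an assumption, because the price list does not name that case.

```python
from decimal import Decimal

START_FEE = Decimal("0.00005")   # apify-actor-start, charged once per run
RESULT_PRICE = Decimal("0.005")  # result
DEEP_PRICE = Decimal("0.01")     # deep-result


def is_deep(run_input: dict) -> bool:
    """Apply the deep-result rules from the price list to one run input."""
    mode = run_input["mode"]
    if mode == "code-search":
        return True
    if mode == "commits":
        # Assumption: commits without includeFiles are billed as plain results.
        return bool(run_input.get("includeFiles"))
    if mode == "prs":
        return bool(run_input.get("includeReviews") or run_input.get("includeComments"))
    if mode == "issues":
        return bool(run_input.get("includeComments"))
    return False


def estimate_cost(run_input: dict, records: int) -> Decimal:
    """Estimated charge in USD for a run that returns `records` items."""
    unit = DEEP_PRICE if is_deep(run_input) else RESULT_PRICE
    return START_FEE + unit * records


print(estimate_cost({"mode": "repo-search"}, 50))
print(estimate_cost({"mode": "prs", "includeReviews": True, "includeComments": True}, 100))
```

Decimal is used instead of float so that the tiny start fee does not get lost in rounding.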

Authentication

Without a token, GitHub's REST API allows 60 requests/hour. With a token: 5,000/hour. Code search requires authentication.

Set the GITHUB_TOKEN environment variable in Apify Console → Settings → Environment variables → Add → Secret:

Create a fine-grained Personal Access Token at https://github.com/settings/tokens?type=beta
Under Repository access, choose "Public repositories" (read-only); that's enough for everything public. (public_repo is the equivalent scope on a classic token.)
Paste the token as GITHUB_TOKEN in the actor's environment variables. Apify masks it automatically.

The actor never logs the token — apify/log auto-redacts.
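
To see what the token changes, here is a minimal sketch of an authenticated call to GitHub's REST API made directly, outside the actor. It assumes the token is available locally in a GITHUB_TOKEN environment variable; `github_headers` and `core_rate_limit` are hypothetical helper names, and this is not the actor's own request code.

```python
import json
import os
import urllib.request
from typing import Optional


def github_headers(token: Optional[str] = None) -> dict:
    """Build request headers; the Authorization header is added only with a token."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def core_rate_limit(token: Optional[str] = None) -> int:
    """Ask GitHub which hourly core REST limit applies to this caller."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit", headers=github_headers(token)
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return payload["resources"]["core"]["limit"]


# Headers only; no network call is made at import time.
headers = github_headers(os.environ.get("GITHUB_TOKEN"))
```

Calling `core_rate_limit()` without a token and again with one is a quick way to confirm the 60 versus 5,000 requests/hour difference before spending on a run.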

Example inputs

Full metadata for one repo

{
  "mode": "repo",
  "repo": "facebook/react"
}

Top 50 TypeScript MCP repos

{
  "mode": "repo-search",
  "query": "language:typescript stars:>500 mcp",
  "maxResults": 50
}

Open issues with full conversation threads

{
  "mode": "issues",
  "repo": "apify/actors-mcp-server",
  "state": "open",
  "includeComments": true,
  "maxResults": 100
}

Recently closed PRs (merged or not) with reviews

{
  "mode": "prs",
  "repo": "vercel/next.js",
  "state": "closed",
  "includeReviews": true,
  "since": "2026-04-01",
  "maxResults": 200
}

Code search across GitHub

{
  "mode": "code-search",
  "query": "StreamableHTTPServerTransport language:typescript",
  "maxResults": 100
}

Daily trending Rust repos

{
  "mode": "trending",
  "language": "rust",
  "timeframe": "daily"
}

Last 30 days of commits with file diffs

{
  "mode": "commits",
  "repo": "anthropics/claude-code",
  "since": "2026-04-28",
  "includeFiles": true,
  "maxResults": 200
}
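
Any of the inputs above can be passed to the actor from code. The sketch below uses the `apify-client` Python package and assumes that the actor ID is khadinakbar/github-deep-scraper (taken from the MCP URL on this page) and that an Apify token is available in an APIFY_TOKEN environment variable. `run_scraper` and `since_days_ago` are hypothetical helpers.

```python
import os
from datetime import date, timedelta
from typing import Optional


def since_days_ago(days: int, today: Optional[date] = None) -> str:
    """Return the ISO date `days` before today, for the `since` input."""
    today = today or date.today()
    return (today - timedelta(days=days)).isoformat()


def run_scraper(run_input: dict,
                actor_id: str = "khadinakbar/github-deep-scraper") -> list:
    """Start one run, wait for it to finish, and return its dataset items."""
    # Imported lazily so this module loads without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Same shape as the commits example above, with a computed date
# instead of a hard-coded one.
commits_input = {
    "mode": "commits",
    "repo": "anthropics/claude-code",
    "since": since_days_ago(30),
    "includeFiles": True,
    "maxResults": 200,
}
```

With the token set, `run_scraper(commits_input)` would return the run's records as a list of dicts. Each call starts a billed run, so it is left out of the snippet itself.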

Output shape

Every record has mode, type, url, and scrapedAt (ISO 8601 UTC). Mode-specific fields follow. Items are flat, nulls are explicit, dates are ISO 8601. Average item size is under 500 tokens — built to fit inside an agent's context window when sampling 3-20 results.

Sample repo record:

{
  "mode": "repo",
  "type": "repo",
  "owner": "facebook",
  "name": "react",
  "fullName": "facebook/react",
  "description": "The library for web and native user interfaces.",
  "url": "https://github.com/facebook/react",
  "homepage": "https://react.dev",
  "language": "JavaScript",
  "topics": ["react", "frontend", "javascript", "library"],
  "stars": 229000,
  "forks": 46900,
  "watchers": 6700,
  "openIssues": 980,
  "license": "MIT",
  "archived": false,
  "defaultBranch": "main",
  "createdAt": "2013-05-24T16:15:54Z",
  "updatedAt": "2026-05-28T11:20:01Z",
  "pushedAt": "2026-05-28T03:42:11Z",
  "languages": { "JavaScript": 8294122, "TypeScript": 311024, "HTML": 24189 },
  "latestRelease": { "tagName": "v18.3.1", "publishedAt": "2025-04-26T17:42:00Z" },
  "scrapedAt": "2026-05-28T18:14:32Z"
}
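
Because records are flat and dates are ISO 8601, downstream handling can stay small. The following sketch checks the four common fields and derives one value from the sample record; the helper names are hypothetical, and the record literal is a trimmed copy of the sample above.

```python
from datetime import datetime

COMMON_FIELDS = ("mode", "type", "url", "scrapedAt")


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-05-28T18:14:32Z."""
    # Older Python versions reject a trailing "Z", so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def missing_common_fields(record: dict) -> list:
    """List the common fields that are absent from a record."""
    return [name for name in COMMON_FIELDS if name not in record]


def days_since_push(record: dict) -> int:
    """Whole days between the last push and the scrape time of a repo record."""
    delta = parse_timestamp(record["scrapedAt"]) - parse_timestamp(record["pushedAt"])
    return delta.days


sample = {
    "mode": "repo",
    "type": "repo",
    "url": "https://github.com/facebook/react",
    "pushedAt": "2026-05-28T03:42:11Z",
    "scrapedAt": "2026-05-28T18:14:32Z",
}
print(missing_common_fields(sample), days_since_push(sample))
```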

Use with MCP

The actor exposes itself as apify--github-deep-scraper in the Apify MCP server. Hit it from any MCP client:

https://mcp.apify.com?tools=khadinakbar/github-deep-scraper

From Claude Code or Cursor, configure the Apify MCP server with your Apify token; the tool is then discoverable through standard MCP list_tools calls. Anthropic agents budget per call, and with maxResults set sensibly a typical agent run costs under $1, well within the x402 default prepay limit.

Reliability and rate limits

Built-in retry with exponential backoff for 5xx errors.
429 backoff respects GitHub's Retry-After header.
403 rate-limit responses wait until X-RateLimit-Reset (or fail cleanly if the wait exceeds 60 s).
Latest rate-limit state is persisted to the actor's key-value store (RATELIMIT-latest) for inspection.
ETag headers are surfaced via the KV store collection ETAG to enable downstream conditional caching.
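
The retry rules above can be pictured as a single decision function. This is a sketch of the described policy under stated assumptions, not the actor's source: the 1-second base delay, the use of X-RateLimit-Remaining to recognise a rate-limit 403, and the `retry_delay` name are all assumptions.

```python
import time
from typing import Mapping, Optional

MAX_RESET_WAIT = 60.0  # seconds, from the 403 rule above
BASE_DELAY = 1.0       # assumed starting delay for exponential backoff


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                now: Optional[float] = None) -> Optional[float]:
    """Seconds to wait before retrying, or None to give up.

    `attempt` counts from 0. Header names must use the casing shown here.
    """
    now = time.time() if now is None else now
    if status == 429:
        # Honour Retry-After when it is given as a number of seconds.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            return float(retry_after)
        return BASE_DELAY * 2 ** attempt
    if status == 403 and headers.get("X-RateLimit-Remaining") == "0":
        # X-RateLimit-Reset is a UTC epoch timestamp in seconds.
        reset = headers.get("X-RateLimit-Reset")
        if reset is None or not reset.isdigit():
            return None
        wait = max(0.0, float(reset) - now)
        return wait if wait <= MAX_RESET_WAIT else None
    if 500 <= status < 600:
        # Exponential backoff: 1 s, 2 s, 4 s, ...
        return BASE_DELAY * 2 ** attempt
    return None
```

Keeping the decision separate from the sleep makes the policy easy to test with a fixed `now` value.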

FAQ

Why one actor with 11 modes instead of 11 separate actors? Agents call tools by name. Having one tool that covers all GitHub surfaces means the agent picks correctly the first time. Eleven separate actors mean eleven tool-description shootouts and eleven chances to pick the wrong one.

Do I need a GitHub token? For most modes, no — but you'll be capped at 60 requests/hour. Set GITHUB_TOKEN to lift that to 5,000/hour. For code-search, a token is required (GitHub's API forces this).

Does it work for private repositories? No. Use the official GitHub CLI (gh) for private repos. This actor is designed for public data only.

What about GitHub GraphQL? The actor uses REST v3 for stable, paginated endpoints. A future version may switch heavy multi-field reads to GraphQL to halve API quota usage.

How fresh is the data? Real-time. Every run hits GitHub's live API.

Can I run multiple modes in one call? No — one mode per run. Chain runs from your orchestrator (Apify task, n8n, Zapier, agent loop) when you need composite data.
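
Chaining runs from an orchestrator could look like the sketch below: one repo-search run, then one issues run per repository found. `issues_for_search` is a hypothetical helper, `run_mode` stands for whatever function starts a run and returns its records, and the presence of `fullName` on repo-search records is an assumption based on the sample repo record.

```python
from typing import Callable, Dict, Iterable, List

# Any callable that takes an actor input and returns the run's records.
Runner = Callable[[dict], Iterable[dict]]


def issues_for_search(run_mode: Runner, query: str, max_repos: int = 5,
                      max_issues: int = 20) -> Dict[str, List[dict]]:
    """Run repo-search once, then one issues run per repo found."""
    repos = run_mode({"mode": "repo-search", "query": query, "maxResults": max_repos})
    result: Dict[str, List[dict]] = {}
    for repo in repos:
        full_name = repo["fullName"]  # assumed field on repo-search records
        result[full_name] = list(run_mode({
            "mode": "issues",
            "repo": full_name,
            "state": "open",
            "maxResults": max_issues,
        }))
    return result
```

Each inner call is a separate billed run, so `max_repos` bounds both the cost and the number of run starts.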

Legal

This actor accesses GitHub's official REST API and the public github.com/trending HTML page. All endpoints used are public; apart from code search, which GitHub serves only to authenticated callers, no authentication is required to access them (a token is recommended for higher rate limits). You are responsible for complying with GitHub's Terms of Service and Acceptable Use Policy. The actor does not scrape private repositories or any data behind login. It does not bypass any technical protection measure. Personal data extracted (contributor names, emails published on profiles) must be handled in accordance with applicable data-protection law (GDPR, CCPA). Apify and the actor author make no warranty about data accuracy, completeness, or fitness for any purpose.

Changelog

2026-05-29 v0.2 — Reliability hardening. Graceful 404/422/451/409 → structured not-found/search-error records instead of failed runs. Canary check at run start. Pre-flight validation. Trending mode now flags invalid language slugs explicitly. 36/36 brutal test matrix green.
2026-05-28 v0.1 — Initial release. 11 modes. REST v3. PAT-aware rate-limit handling. Premium PPE pricing.

Built by @khadinakbar — see the full portfolio for related dev-cluster actors: chatgpt-gpt-store-scraper, apify-store-scraper, broken-link-checker, website-uptime-monitor.
