GitHub Scraper API — Repositories, Users & Data Export
Pricing
from $3.90 / 1,000 repositories
Scrape GitHub repositories and user profiles via the official public API. Search by keyword, fetch repos by owner/name, or look up user profiles. No auth required; supply a free GitHub token to raise the rate limit to 5,000 req/hr. Pay per result.
Developer: Vitalii Bondarev · Maintained by Community · Last modified 2 days ago
GitHub Repository & User Scraper API
Used by developer-focused recruiters building outreach lists, B2B SaaS teams identifying active open-source contributors in their stack, and AI agents answering "who are the top contributors to this Python project?"
Scrape GitHub repositories and user profiles via the official GitHub REST API. Search repos by keyword, fetch specific repos by owner/name, or retrieve user and organization profiles. Pay per result. No proxy needed — GitHub's public API requires no authentication for read-only data.
$2.00/1K repos · $3.00/1K user/org profiles (Pay Per Event). First 10 results free. No proxy needed. Official GitHub API. 5,000 req/hr with a free token. Full GitHub search syntax supported.
Not affiliated with GitHub, Inc. or Microsoft. GitHub is a trademark of GitHub, Inc.
What You Can Scrape
Repositories
- Full name (owner/repo), description, homepage URL
- Stars, forks, watchers, open issues count
- Primary language, topic tags, license (SPDX identifier)
- Created / last-updated dates (ISO 8601)
- Archived and fork flags
Users & Organizations
- Login, display name, bio, company, location
- Follower and following counts, public repo count
- Account creation date, blog/website URL
- Twitter username, public email (when available)
Rate Limits
| Mode | Limit |
|---|---|
| No token (anonymous) | 60 API requests/hr; 10 search requests/min |
| With free GitHub token | 5,000 requests/hr; 30 search requests/min (no scopes required) |
For large runs: generate a free token at github.com/settings/tokens (no scopes needed for public data) and supply it via the githubToken input field. Apart from authenticating your requests to the GitHub API, the token stays in your Apify account's encrypted storage.
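For readers wiring this up themselves, here is a minimal standard-library sketch of what an authenticated call looks like. It is not the actor's own implementation: the helper names github_request and seconds_until_reset are hypothetical, while the Authorization: Bearer header, the /rate_limit endpoint, and the X-RateLimit-Remaining / X-RateLimit-Reset response headers are GitHub's documented REST API conventions.

```python
from __future__ import annotations

import time
import urllib.request
from typing import Mapping

API_ROOT = "https://api.github.com"


def github_request(path: str, token: str | None = None) -> urllib.request.Request:
    """Build a GET request for the GitHub REST API (hypothetical helper)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "github-scraper-example",  # GitHub rejects requests without a User-Agent
    }
    if token:
        # A token with no scopes is enough for public data and raises the limit to 5,000/hr.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(API_ROOT + path, headers=headers)


def seconds_until_reset(headers: Mapping[str, str], now: float | None = None) -> float:
    """Seconds to wait before the next call, based on GitHub's rate-limit headers (hypothetical helper)."""
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0  # budget left, no need to wait
    now = time.time() if now is None else now
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))  # epoch seconds when the window resets
    return max(0.0, reset_at - now)
```

In use, you would open github_request("/rate_limit", token) with urllib.request.urlopen and pass the response's headers to seconds_until_reset before sleeping.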
Modes
searchRepos (default)
Searches GitHub repositories by keyword using the GitHub Search API. Supports full GitHub search syntax, for example:
- "machine learning language:python stars:>500"
- "react hooks"
- "apify scraper"
Returns repos sorted by stars (default), forks, update date, or help-wanted issues.
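This mode sits on top of GitHub's GET /search/repositories endpoint. The sketch below builds such a request URL. search_repos_url is a hypothetical helper. The parameters q, sort, order, per_page (maximum 100) and page are GitHub's documented query parameters. Because the Search API stops at 1,000 results, only the first ten pages of 100 can be reached for any one query.

```python
from urllib.parse import urlencode

# GitHub's documented sort values for repository search.
SORT_OPTIONS = {"stars", "forks", "updated", "help-wanted-issues"}


def search_repos_url(query: str, sort: str = "stars", page: int = 1, per_page: int = 100) -> str:
    """Build a GitHub repository-search URL (hypothetical helper)."""
    if sort not in SORT_OPTIONS:
        raise ValueError(f"unsupported sort: {sort!r}")
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    if (page - 1) * per_page >= 1000:
        raise ValueError("GitHub search returns at most 1,000 results per query")
    params = {"q": query, "sort": sort, "order": "desc", "per_page": per_page, "page": page}
    # urlencode escapes ':' and '>' in qualifiers and turns spaces into '+'.
    return "https://api.github.com/search/repositories?" + urlencode(params)
```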
repoFullNames
Fetch one or more specific repos by exact owner/repo identifier. Returns the full repo object including topics, license, and all metrics. Useful when you have a curated list.
usernames
Look up GitHub user or organization profiles by login. Returns follower count, public repo count, bio, company, location, and account creation date.
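Both lookup modes map onto single-object REST endpoints. Each entry in repoFullNames corresponds to GET /repos/{owner}/{repo}. Each entry in usernames corresponds to GET /users/{username}, which returns organizations as well as users. The sketch below builds those URLs and does some basic input checking. The helper names and the validation pattern are assumptions; they are not taken from the actor's code.

```python
import re
from urllib.parse import quote

API_ROOT = "https://api.github.com"
# Loose "owner/repo" check (assumption): owners use letters, digits and '-';
# repository names may also contain '.' and '_'.
FULL_NAME = re.compile(r"^([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)$")


def repo_url(full_name: str) -> str:
    """GET /repos/{owner}/{repo}: one request per repository (hypothetical helper)."""
    match = FULL_NAME.match(full_name.strip())
    if not match:
        raise ValueError(f"expected 'owner/repo', got {full_name!r}")
    owner, repo = match.groups()
    return f"{API_ROOT}/repos/{owner}/{repo}"


def user_url(login: str) -> str:
    """GET /users/{username}: works for users and organizations (hypothetical helper)."""
    return f"{API_ROOT}/users/{quote(login.strip().lstrip('@'))}"
```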
Output Schema
Repo record
| Field | Type | Description |
|---|---|---|
| type | string | Always "repo" |
| full_name | string | "owner/repo" |
| owner | string | Owner login |
| name | string | Repo name |
| description | string | Repo description |
| homepage | string | Project homepage URL |
| html_url | string | GitHub page URL |
| language | string | Primary programming language |
| topics | list[str] | Topic tags |
| license | string | SPDX license ID (e.g. "MIT") |
| stars | integer | Star count |
| forks | integer | Fork count |
| watchers | integer | Watcher count |
| open_issues | integer | Open issues count |
| created_at | string | ISO 8601 creation date |
| updated_at | string | ISO 8601 last-updated date |
| is_archived | boolean | Whether repo is archived |
| is_fork | boolean | Whether repo is a fork |
| query | string | Search query that produced this result |
| scraped_at | string | ISO 8601 scrape timestamp |
| parse_confidence | float | Data quality score (1.0 = all fields present) |
| warnings | list[str] | Missing-field codes for low-confidence records |
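The sketch below shows one way a raw GitHub repository object, from GET /repos/{owner}/{repo} or from a search result, could be flattened into the repo record above. The source field names are GitHub's: stargazers_count, forks_count, open_issues_count, archived, fork, license.spdx_id and owner.login. Two parts are hypothetical, because the actor's exact scoring rule is not documented here: which fields count toward parse_confidence, and the missing_* warning codes.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Hypothetical: optional fields whose absence lowers parse_confidence in this sketch.
SCORED_FIELDS = ("description", "homepage", "language", "license")


def to_repo_record(raw: dict, query: str | None = None) -> dict:
    """Flatten a GitHub API repository object into the repo record schema."""
    license_info = raw.get("license") or {}  # null when GitHub detects no license
    record = {
        "type": "repo",
        "full_name": raw["full_name"],
        "owner": raw["owner"]["login"],
        "name": raw["name"],
        "description": raw.get("description"),
        "homepage": raw.get("homepage") or None,  # GitHub may return "" instead of null
        "html_url": raw["html_url"],
        "language": raw.get("language"),
        "topics": raw.get("topics", []),
        "license": license_info.get("spdx_id"),
        "stars": raw.get("stargazers_count", 0),
        "forks": raw.get("forks_count", 0),
        "watchers": raw.get("watchers_count", 0),
        "open_issues": raw.get("open_issues_count", 0),
        "created_at": raw.get("created_at"),
        "updated_at": raw.get("updated_at"),
        "is_archived": raw.get("archived", False),
        "is_fork": raw.get("fork", False),
        "query": query,
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }
    # Hypothetical scoring: share of scored fields that are present.
    missing = [field for field in SCORED_FIELDS if not record[field]]
    record["parse_confidence"] = round(1 - len(missing) / len(SCORED_FIELDS), 2)
    record["warnings"] = [f"missing_{field}" for field in missing]
    return record
```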
User record
| Field | Type | Description |
|---|---|---|
| type | string | "user" or "organization" |
| login | string | GitHub login |
| name | string | Display name |
| bio | string | Bio text |
| company | string | Employer (@ prefix stripped) |
| location | string | Location string |
| blog | string | Blog/homepage URL |
| email | string | Public email (often null) |
| twitter_username | string | Twitter handle |
| followers | integer | Follower count |
| following | integer | Following count |
| public_repos | integer | Public repo count |
| public_gists | integer | Public gist count |
| created_at | string | ISO 8601 account creation date |
| updated_at | string | ISO 8601 profile last-updated date |
| html_url | string | GitHub profile URL |
| query | string | Username queried |
| scraped_at | string | ISO 8601 scrape timestamp |
| parse_confidence | float | Data quality score |
| warnings | list[str] | Missing-field codes |
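The same idea applies to profiles. GET /users/{username} reports type as "User" or "Organization". The sketch below maps that response onto the user record, including the @-stripping noted for company. to_user_record is a hypothetical helper. parse_confidence and warnings are left out because their scoring rule is not documented here.

```python
from datetime import datetime, timezone


def to_user_record(raw: dict, query: str) -> dict:
    """Flatten a GitHub GET /users/{username} response into the user record schema (hypothetical helper)."""
    company = raw.get("company")
    if company:
        company = company.strip().lstrip("@")  # "@github" -> "github"
    return {
        # GitHub reports "User" or "Organization" in the type field.
        "type": "organization" if raw.get("type") == "Organization" else "user",
        "login": raw["login"],
        "name": raw.get("name"),
        "bio": raw.get("bio"),
        "company": company,
        "location": raw.get("location"),
        "blog": raw.get("blog") or None,  # may be an empty string when unset
        "email": raw.get("email"),  # only present if the user made it public
        "twitter_username": raw.get("twitter_username"),
        "followers": raw.get("followers", 0),
        "following": raw.get("following", 0),
        "public_repos": raw.get("public_repos", 0),
        "public_gists": raw.get("public_gists", 0),
        "created_at": raw.get("created_at"),
        "updated_at": raw.get("updated_at"),
        "html_url": raw.get("html_url"),
        "query": query,
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }
```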
Example Use Cases
- Lead generation: Find active developers in specific languages or topics
- Tech stack research: Discover most-starred repos per language
- Competitive intelligence: Track competitor repos (stars, forks, activity)
- Developer outreach: Build lists of contributors to specific projects
- Open source analytics: Monitor ecosystem health by topic tag
vs. Competitors
| Feature | This Actor | drobnikj/github-scraper | HTML web scrapers |
|---|---|---|---|
| Data source | Official GitHub REST API | Older GitHub API | HTML scraping |
| Proxy or anti-bot workaround needed | No | No | Sometimes |
| Rate limit handling | Yes (429 backoff, token boost) | Basic | Often breaks |
| User + org profiles | Yes (separate gh-user event) | Limited | Inconsistent |
| parse_confidence | Yes | No | No |
| Full GitHub search syntax | Yes | Partial | No |
| Cost | $2/1K repos · $3/1K profiles | ~$1-2/1K | $1-5/1K + proxy |
Use with AI agents (MCP)
An agent calls this tool to look up GitHub repos and developer profiles mid-conversation, e.g. "Find top Python ML repos with 1000+ stars", "Get the profile for torvalds", or "What repos does the Apify org maintain?"
Point your MCP client at this tool:
{"mcpServers": {"apify": {"command": "npx","args": ["mcp-remote","https://mcp.apify.com/?tools=bovi/github-scraper","--header","Authorization: Bearer <YOUR_APIFY_TOKEN>"]}}}
Minimal agent input (repo search):
{"mode": "searchRepos","searchQueries": ["apify scraper language:python stars:>100"],"maxItems": 20}
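You can also pass the same input from code instead of an agent. The sketch below uses the Apify Python client (pip install apify-client). It assumes the following:
- the actor ID bovi/github-scraper, taken from the MCP URL above;
- an APIFY_TOKEN environment variable;
- an optional GITHUB_TOKEN environment variable.

build_run_input and run_search are hypothetical helpers. githubToken is the input field named under Rate Limits.

```python
from __future__ import annotations

import os


def build_run_input(queries: list[str], max_items: int = 20, github_token: str | None = None) -> dict:
    """Assemble the searchRepos input shown above (hypothetical helper)."""
    run_input = {"mode": "searchRepos", "searchQueries": list(queries), "maxItems": max_items}
    if github_token:
        run_input["githubToken"] = github_token  # raises GitHub's limit to 5,000 req/hr
    return run_input


def run_search(queries: list[str], max_items: int = 20) -> list[dict]:
    """Start the actor, wait for it to finish, and return its dataset items (hypothetical helper)."""
    from apify_client import ApifyClient  # imported lazily so build_run_input works without it

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    # GITHUB_TOKEN is an assumed environment variable name for this sketch.
    run_input = build_run_input(queries, max_items, os.environ.get("GITHUB_TOKEN"))
    run = client.actor("bovi/github-scraper").call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a run object")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```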
Pricing
Pay-per-result (PPE):
| Event | Rate | Trigger |
|---|---|---|
| gh-item | $2.00/1K | Repository records (searchRepos or repoFullNames mode) |
| gh-user | $3.00/1K | User and organization profile records (usernames mode) |
User/org profiles are priced higher because they contain contact signals (public email, Twitter handle, company) useful for developer lead-gen. You pay only for data you receive — no charge for rate-limit retries or failed lookups.
Worked examples:
- 500 repo search results = $1.00
- 1,000 repo search results = $2.00
- 200 user/org profiles = $0.60
- 500 repos + 100 user profiles = $1.30 ($1.00 repos + $0.30 profiles)
GitHub's public API is free — compute is billed to your Apify account based on actual run time.
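Each worked example is the record count times the per-1,000 rate for its event type, summed across types. The small calculator below uses Decimal to avoid floating-point rounding on currency. Like the examples, it ignores the first 10 free results and any Apify compute charges. estimate_cost is a hypothetical helper name.

```python
from decimal import Decimal

# Per-1,000 rates from the pricing table above.
RATES_PER_1K = {"gh-item": Decimal("2.00"), "gh-user": Decimal("3.00")}


def estimate_cost(repos: int = 0, profiles: int = 0) -> Decimal:
    """Estimated PPE charge in USD for repo and profile records (hypothetical helper)."""
    total = RATES_PER_1K["gh-item"] * repos + RATES_PER_1K["gh-user"] * profiles
    return (total / 1000).quantize(Decimal("0.01"))
```

For instance, estimate_cost(500, 100) gives Decimal("1.30"), which matches the last worked example.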
FAQ
Do I need a GitHub token? A token is optional but strongly recommended for runs over ~50 items. Without one, the rate limit is 60 requests/hr; with a free token (no scopes needed) it's 5,000 requests/hr. Generate one at github.com/settings/tokens.
Can I search private repositories? No. The actor uses GitHub's public REST API — only public repos and public profile data are accessible.
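What if my search returns empty? Usually no public repository matched the query and its qualifiers; check the search syntax or loosen the qualifiers. Separately, the GitHub Search API caps results at 1,000 per query, so for common keywords narrow your search with qualifiers (language, stars, topics) to reach repos beyond that cap. Empty results are not charged.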
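When a broad query exceeds the cap, you can split it into star ranges. The sketch below does this with GitHub's documented stars:N..M and stars:>=N qualifiers, and you would run each slice as its own entry in searchQueries. star_sliced_queries is a hypothetical helper and the range edges are your choice. Nothing guarantees that any single slice stays under 1,000 results. Repos below the lowest edge are not covered.

```python
def star_sliced_queries(base: str, bounds: list[int]) -> list[str]:
    """Split one search into non-overlapping star ranges (hypothetical helper)."""
    edges = sorted(set(bounds))
    if not edges:
        return [base]
    # Closed ranges between consecutive edges: stars:100..499, stars:500..999, ...
    queries = [f"{base} stars:{lo}..{hi - 1}" for lo, hi in zip(edges, edges[1:])]
    queries.append(f"{base} stars:>={edges[-1]}")  # open-ended top slice
    return queries
```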
What output formats are available?
JSON, CSV, Excel — download from the Apify dataset, or pipe to n8n / Make / Zapier. Every record includes html_url linking directly to GitHub.
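Downloads go through Apify's dataset items endpoint, GET https://api.apify.com/v2/datasets/{datasetId}/items. Its format parameter selects JSON, CSV, or Excel (xlsx). The sketch below builds such a URL. dataset_items_url is a hypothetical helper. Private datasets also need your Apify token, for example sent as an Authorization: Bearer header.

```python
from urllib.parse import quote, urlencode

EXPORT_FORMATS = {"json", "csv", "xlsx"}


def dataset_items_url(dataset_id: str, fmt: str = "json", clean: bool = True) -> str:
    """URL that downloads every item of an Apify dataset (hypothetical helper)."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {"format": fmt}
    if clean:
        params["clean"] = "true"  # skip empty items and hidden fields
    return f"https://api.apify.com/v2/datasets/{quote(dataset_id)}/items?{urlencode(params)}"
```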
Tips
- Supply a GitHub token (githubToken input) for large runs; without it you'll hit the 60 req/hr limit quickly.
- For searchRepos, GitHub caps results at 1,000 per query. Use multiple specific queries to cover more ground.
- Repos with is_fork: true often have little original activity; filter them out if you need organic projects.
- parse_confidence < 1.0 means some optional fields were missing from the API response; the record is still valid.
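If you follow the tips and run several queries while dropping forks, the merged results usually need a cleanup pass. The sketch below works on the repo record fields documented above: type, full_name, is_fork and parse_confidence. The helper name and the optional confidence threshold are assumptions. The default threshold of 0.0 keeps every record, since lower-confidence records are still valid.

```python
from __future__ import annotations

from typing import Iterable


def organic_repos(records: Iterable[dict], min_confidence: float = 0.0) -> list[dict]:
    """Drop forks, below-threshold rows and duplicates from merged results (hypothetical helper)."""
    seen: set[str] = set()
    kept: list[dict] = []
    for rec in records:
        if rec.get("type") != "repo" or rec.get("is_fork"):
            continue  # user records and forks
        if rec.get("parse_confidence", 1.0) < min_confidence:
            continue  # caller chose a stricter quality bar
        if rec["full_name"] in seen:
            continue  # same repo matched by more than one query
        seen.add(rec["full_name"])
        kept.append(rec)
    return kept
```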
Integrations
Built for developer-focused recruiters and B2B SaaS teams identifying active contributors and repo signals in their stack — the JSON/dataset output drops into the tools you already run, no glue code:
- n8n / Make / Zapier: trigger a run or pipe every new dataset item into 500+ apps (Google Sheets, Airtable, Slack, HubSpot, your database) with no code.
- Webhooks: fire your own endpoint the moment a run finishes, to push results straight into your pipeline.
- MCP server: expose this actor as a tool to Claude, Cursor, or any MCP client so an AI agent can pull this data mid-conversation.
- API & SDKs: fetch the dataset as JSON, CSV, or Excel through the Apify REST API or the Python / JS SDKs.
See all Apify integrations.
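For the webhook route, Apify's default webhook payload includes eventType (for example ACTOR.RUN.SUCCEEDED). It also includes a resource object describing the run, which contains defaultDatasetId. The standard-library sketch below shows an endpoint that extracts that ID. The function and class names are hypothetical. A production endpoint would also need to check that requests really come from your webhook.

```python
from __future__ import annotations

import json
from http.server import BaseHTTPRequestHandler


def dataset_id_from_webhook(body: bytes) -> str | None:
    """Return the run's dataset ID for a successful run, else None (hypothetical helper)."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs
    return (payload.get("resource") or {}).get("defaultDatasetId")


class ApifyWebhookHandler(BaseHTTPRequestHandler):
    """Minimal receiver (hypothetical): reads the JSON body and acknowledges with 204."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        dataset_id = dataset_id_from_webhook(self.rfile.read(length))
        if dataset_id:
            print(f"run finished, dataset {dataset_id}")  # hand off to your pipeline here
        self.send_response(204)
        self.end_headers()
```

To serve it, run HTTPServer(("", 8000), ApifyWebhookHandler).serve_forever() from http.server, and point the webhook at that address through a publicly reachable URL.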