GitHub Scraper API — Repositories, Users & Data Export

Pricing

from $3.90 / 1,000 repositories

Scrape GitHub repositories and user profiles via the official public API. Search by keyword, fetch repos by owner/name, or look up user profiles. No auth required; supply your free GitHub token to raise rate limits to 5000 req/hr. Pay per result.

Developer: Vitalii Bondarev (maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 2 days ago

GitHub Repository & User Scraper API

Used by developer-focused recruiters building outreach lists, B2B SaaS teams identifying active open-source contributors in their stack, and AI agents answering "who are the top contributors to this Python project?"

Scrape GitHub repositories and user profiles via the official GitHub REST API. Search repos by keyword, fetch specific repos by owner/name, or retrieve user and organization profiles. Pay per result. No proxy needed — GitHub's public API requires no authentication for read-only data.

$2.00/1K repos · $3.00/1K user/org profiles (Pay Per Event). First 10 results free. No proxy needed. Official GitHub API. 5000 req/hr with free token. Full GitHub search syntax supported.

Not affiliated with GitHub, Inc. or Microsoft. GitHub is a trademark of GitHub, Inc.


What You Can Scrape

Repositories

  • Full name (owner/repo), description, homepage URL
  • Stars, forks, watchers, open issues count
  • Primary language, topic tags, license (SPDX identifier)
  • Created / last-updated dates (ISO 8601)
  • Archived and fork flags

Users & Organizations

  • Login, display name, bio, company, location
  • Follower and following counts, public repo count
  • Account creation date, blog/website URL
  • Twitter username, public email (when available)

Rate Limits

| Mode | Limit |
|---|---|
| No token (anonymous) | 60 API requests/hr; 10 search requests/min |
| With free GitHub token | 5,000 requests/hr (no scope required) |

For large runs: generate a free token at github.com/settings/tokens (no scopes needed for public data) and supply it via the githubToken input field. The token never leaves your Apify account's encrypted storage.
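
To make the limits above concrete, here is a rough sketch that estimates how many hourly rate-limit windows a run would span. It assumes one API request per record, which is a simplification (search requests are limited separately), and the function name is hypothetical rather than part of this actor.

```python
import math

# Hourly limits quoted in the Rate Limits table.
ANONYMOUS_PER_HOUR = 60
TOKEN_PER_HOUR = 5000


def hours_needed(total_requests: int, has_token: bool) -> int:
    """Return how many hourly rate-limit windows a run would span.

    Assumes one API request per record, which is a simplification.
    """
    if total_requests < 0:
        raise ValueError("total_requests must be non-negative")
    per_hour = TOKEN_PER_HOUR if has_token else ANONYMOUS_PER_HOUR
    return math.ceil(total_requests / per_hour)


print(hours_needed(300, has_token=False))  # 5
print(hours_needed(300, has_token=True))   # 1
```

Under these assumptions, 300 lookups that would spread over five anonymous windows fit in a single window once a token is supplied.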


Modes

searchRepos (default)

Searches GitHub repositories by keyword using the GitHub Search API. Supports full GitHub search syntax:

  • "machine learning language:python stars:>500"
  • "react hooks"
  • "apify scraper"

Returns repos sorted by stars (default), forks, update date, or help-wanted issues.
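
For orientation, a query like the ones above corresponds to a request against GitHub's repository search endpoint. The sketch below only builds the URL; the helper name is hypothetical, and whether the actor constructs its requests this way is an assumption.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/repositories"
# Sort keys accepted by GitHub's repository search.
ALLOWED_SORTS = {"stars", "forks", "updated", "help-wanted-issues"}


def build_search_url(query: str, sort: str = "stars", per_page: int = 100) -> str:
    """Build a repository search URL for a raw GitHub search query."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort: {sort!r}")
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    params = {"q": query, "sort": sort, "order": "desc", "per_page": per_page}
    # urlencode percent-encodes qualifiers such as ':' and '>' in the query.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(build_search_url("machine learning language:python stars:>500"))
```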

repoFullNames

Fetch one or more specific repos by exact owner/repo identifier. Returns the full repo object including topics, license, and all metrics. Useful when you have a curated list.
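
Because this mode expects exact owner/repo identifiers, it can help to validate a curated list before starting a run. A minimal sketch, assuming the identifier maps onto GitHub's `GET /repos/{owner}/{repo}` endpoint; the helper name is hypothetical:

```python
def repo_api_url(full_name: str) -> str:
    """Validate an 'owner/repo' identifier and return its REST API URL."""
    owner, sep, name = full_name.strip().partition("/")
    # Reject missing parts ("apify", "/crawlee") and extra path segments.
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/repo', got {full_name!r}")
    return f"https://api.github.com/repos/{owner}/{name}"


print(repo_api_url("apify/crawlee"))
```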

usernames

Look up GitHub user or organization profiles by login. Returns follower count, public repo count, bio, company, location, and account creation date.


Output Schema

Repo record

| Field | Type | Description |
|---|---|---|
| type | string | Always "repo" |
| full_name | string | "owner/repo" |
| owner | string | Owner login |
| name | string | Repo name |
| description | string | Repo description |
| homepage | string | Project homepage URL |
| html_url | string | GitHub page URL |
| language | string | Primary programming language |
| topics | list[str] | Topic tags |
| license | string | SPDX license ID (e.g. "MIT") |
| stars | integer | Star count |
| forks | integer | Fork count |
| watchers | integer | Watcher count |
| open_issues | integer | Open issues count |
| created_at | string | ISO 8601 creation date |
| updated_at | string | ISO 8601 last-updated date |
| is_archived | boolean | Whether repo is archived |
| is_fork | boolean | Whether repo is a fork |
| query | string | Search query that produced this result |
| scraped_at | string | ISO 8601 scrape timestamp |
| parse_confidence | float | Data quality score (1.0 = all fields present) |
| warnings | list[str] | Missing-field codes for low-confidence records |
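
As an illustration of how these fields relate to a raw GitHub REST API repository object, the sketch below maps one onto the record shape above. The raw field names (`stargazers_count`, `forks_count`, `license.spdx_id`, and so on) come from the GitHub API; the mapping itself is an assumption about this actor, and `parse_confidence` and `warnings` are left out because their exact computation is not documented here.

```python
from datetime import datetime, timezone
from typing import Any


def to_repo_record(raw: dict[str, Any], query: str) -> dict[str, Any]:
    """Map a raw GitHub API repo object to the documented record shape."""
    license_info = raw.get("license") or {}  # null for unlicensed repos
    owner = raw.get("owner") or {}
    return {
        "type": "repo",
        "full_name": raw.get("full_name"),
        "owner": owner.get("login"),
        "name": raw.get("name"),
        "description": raw.get("description"),
        "homepage": raw.get("homepage"),
        "html_url": raw.get("html_url"),
        "language": raw.get("language"),
        "topics": raw.get("topics") or [],
        "license": license_info.get("spdx_id"),
        "stars": raw.get("stargazers_count", 0),
        "forks": raw.get("forks_count", 0),
        "watchers": raw.get("watchers_count", 0),
        "open_issues": raw.get("open_issues_count", 0),
        "created_at": raw.get("created_at"),
        "updated_at": raw.get("updated_at"),
        "is_archived": bool(raw.get("archived", False)),
        "is_fork": bool(raw.get("fork", False)),
        "query": query,
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }
```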

User record

| Field | Type | Description |
|---|---|---|
| type | string | "user" or "organization" |
| login | string | GitHub login |
| name | string | Display name |
| bio | string | Bio text |
| company | string | Employer (@ prefix stripped) |
| location | string | Location string |
| blog | string | Blog/homepage URL |
| email | string | Public email (often null) |
| twitter_username | string | Twitter handle |
| followers | integer | Follower count |
| following | integer | Following count |
| public_repos | integer | Public repo count |
| public_gists | integer | Public gist count |
| created_at | string | ISO 8601 account creation date |
| updated_at | string | ISO 8601 profile last-updated date |
| html_url | string | GitHub profile URL |
| query | string | Username queried |
| scraped_at | string | ISO 8601 scrape timestamp |
| parse_confidence | float | Data quality score |
| warnings | list[str] | Missing-field codes |
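
The company field is described as having its @ prefix stripped. A minimal sketch of what such a normalization could look like; this is an assumption for illustration, not the actor's actual code, and the function name is hypothetical:

```python
from typing import Optional


def normalize_company(company: Optional[str]) -> Optional[str]:
    """Strip whitespace and leading '@' characters from a company value."""
    if company is None:
        return None
    cleaned = company.strip().lstrip("@").strip()
    # Treat values that were only whitespace or '@' as missing.
    return cleaned or None


print(normalize_company("@apify"))  # apify
```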

Example Use Cases

  • Lead generation: Find active developers in specific languages or topics
  • Tech stack research: Discover most-starred repos per language
  • Competitive intelligence: Track competitor repos (stars, forks, activity)
  • Developer outreach: Build lists of contributors to specific projects
  • Open source analytics: Monitor ecosystem health by topic tag

vs. Competitors

| Feature | This Actor | drobnikj/github-scraper | HTML web scrapers |
|---|---|---|---|
| Data source | Official GitHub REST API | Older GitHub API | HTML scraping |
| Proxy or anti-bot workaround needed | No | No | Sometimes |
| Rate limit handling | Yes (429 backoff, token boost) | Basic | Often breaks |
| User + org profiles | Yes (separate gh-user event) | Limited | Inconsistent |
| parse_confidence | Yes | No | No |
| Full GitHub search syntax | Yes | Partial | No |
| Cost | $2/1K repos · $3/1K profiles | ~$1-2/1K | $1-5/1K + proxy |

Use with AI agents (MCP)

An agent calls this tool to look up GitHub repos and developer profiles mid-conversation, e.g. "Find top Python ML repos with 1000+ stars", "Get the profile for torvalds", or "What repos does the Apify org maintain?"

Point your MCP client at this tool:

{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://mcp.apify.com/?tools=bovi/github-scraper",
        "--header",
        "Authorization: Bearer <YOUR_APIFY_TOKEN>"
      ]
    }
  }
}

Minimal agent input (repo search):

{
  "mode": "searchRepos",
  "searchQueries": ["apify scraper language:python stars:>100"],
  "maxItems": 20
}
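
Outside an MCP client, the same input can be sent through the Apify REST API. The sketch below uses only the standard library and the `run-sync-get-dataset-items` endpoint; the actor ID `bovi~github-scraper` is inferred from the MCP URL above and should be treated as an assumption, as should the helper names.

```python
import json
import urllib.request

# Assumed actor ID, derived from the MCP URL ("bovi/github-scraper").
ACTOR_ID = "bovi~github-scraper"
API_URL = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_run_request(apify_token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that runs the actor."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {apify_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_actor(apify_token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and return the dataset items as a list of dicts."""
    request = build_run_request(apify_token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


run_input = {
    "mode": "searchRepos",
    "searchQueries": ["apify scraper language:python stars:>100"],
    "maxItems": 20,
}
request = build_run_request("<YOUR_APIFY_TOKEN>", run_input)
print(request.get_method(), request.full_url)
```

Calling `run_actor` with a real token performs the run and bills the account, so the example stops at building the request.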

Pricing

Pay per event (PPE):

| Event | Rate | Trigger |
|---|---|---|
| gh-item | $2.00/1K | Repository records (searchRepos or repoFullNames mode) |
| gh-user | $3.00/1K | User and organization profile records (usernames mode) |

User/org profiles are priced higher because they contain contact signals (public email, Twitter handle, company) useful for developer lead-gen. You pay only for data you receive — no charge for rate-limit retries or failed lookups.

Worked examples:

  • 500 repo search results = $1.00
  • 1,000 repo search results = $2.00
  • 200 user/org profiles = $0.60
  • 500 repos + 100 user profiles = $1.30 ($1.00 repos + $0.30 profiles)
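
The arithmetic behind these examples can be reproduced with a small estimator. It uses the two rates listed above and deliberately ignores the free first 10 results and Apify compute charges; the function name is hypothetical.

```python
from decimal import Decimal

# Per-record rates from the pricing table ($2.00 and $3.00 per 1,000).
REPO_RATE = Decimal("2.00") / 1000
PROFILE_RATE = Decimal("3.00") / 1000


def estimate_cost(repos: int = 0, profiles: int = 0) -> Decimal:
    """Estimate event charges in USD for a mix of repo and profile records."""
    if repos < 0 or profiles < 0:
        raise ValueError("record counts must be non-negative")
    total = repos * REPO_RATE + profiles * PROFILE_RATE
    return total.quantize(Decimal("0.01"))


print(estimate_cost(repos=500, profiles=100))  # 1.30
```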

GitHub's public API is free — compute is billed to your Apify account based on actual run time.

FAQ

Do I need a GitHub token? A token is optional but strongly recommended for runs over ~50 items. Without one, the rate limit is 60 requests/hr; with a free token (no scopes needed) it's 5,000 requests/hr. Generate one at github.com/settings/tokens.

Can I search private repositories? No. The actor uses GitHub's public REST API — only public repos and public profile data are accessible.

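What if my search returns empty? An empty result means no public repository matched the query, so check the qualifiers for typos or overly strict filters. Note also that the GitHub Search API caps results at 1,000 per query; for common keywords, narrow your search with qualifiers (language, stars, topics). Empty results are not charged.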

What output formats are available? JSON, CSV, Excel — download from the Apify dataset, or pipe to n8n / Make / Zapier. Every record includes html_url linking directly to GitHub.


Tips

  • Supply a GitHub token (githubToken input) for large runs — without it you'll hit the 60 req/hr limit quickly.
  • For searchRepos, GitHub caps results at 1,000 per query. Use multiple specific queries to cover more ground.
  • is_fork: true repos often have low original activity — filter them out if you need organic projects.
  • parse_confidence < 1.0 means some optional fields were missing from the API response — the record is still valid.
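
One way to apply the "multiple specific queries" tip is to split a broad query into star-count bands, so each band stays under the 1,000-result cap. A sketch using GitHub's `stars:LOW..HIGH` and `stars:>=N` qualifiers; the helper name and the chosen boundaries are illustrative assumptions.

```python
def shard_by_stars(base_query: str, boundaries: list[int]) -> list[str]:
    """Split one search query into several star-range queries.

    boundaries must be strictly ascending, e.g. [100, 500, 2000] yields
    stars:100..499, stars:500..1999 and stars:>=2000.
    """
    if not boundaries or boundaries != sorted(set(boundaries)):
        raise ValueError("boundaries must be non-empty and strictly ascending")
    queries = []
    for low, high in zip(boundaries, boundaries[1:]):
        queries.append(f"{base_query} stars:{low}..{high - 1}")
    # The last band is open-ended.
    queries.append(f"{base_query} stars:>={boundaries[-1]}")
    return queries


for query in shard_by_stars("machine learning language:python", [100, 500, 2000]):
    print(query)
```

The resulting strings can be passed together in searchQueries; bands that still exceed the cap can be split further.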

Integrations

Built for developer-focused recruiters and B2B SaaS teams identifying active contributors and repo signals in their stack — the JSON/dataset output drops into the tools you already run, no glue code:

  • n8n / Make / Zapier: trigger a run or pipe every new dataset item into 500+ apps (Google Sheets, Airtable, Slack, HubSpot, your database) with no code.
  • Webhooks: fire your own endpoint the moment a run finishes, to push results straight into your pipeline.
  • MCP server: expose this actor as a tool to Claude, Cursor, or any MCP client so an AI agent can pull this data mid-conversation.
  • API & SDKs: fetch the dataset as JSON, CSV, or Excel through the Apify REST API or the Python / JS SDKs.

See all Apify integrations.